SSEARCH(1)               USER COMMANDS                 SSEARCH(1)


NAME
     ssearch - scan a protein or DNA sequence library for similar
     sequences


SYNOPSIS
     ssearch [-a -b # -d # -e -l _F_A_S_T_L_I_B_S  -r _S_T_A_T_F_I_L_E -m # -Q -s
     _S_M_A_T_R_I_X -w # ] query-sequence-file library-file

     ssearch [-Qabdelmorsw] query-file @library-name-file

     ssearch [-Qabdelmrsw] query-file "%PRMVI"

     ssearch [-aelmrsw] - interactive mode


DESCRIPTION
     ssearch compares a protein or DNA sequence to all of the
     entries in a sequence library using the rigorous
     Smith-Waterman algorithm (Smith and Waterman, J. Mol. Biol.
     (1981) 147:195-197).  For example, ssearch can compare a
     protein sequence to all of the sequences in the NBRF PIR
     protein sequence database.  ssearch automatically decides
     whether the query sequence is DNA or protein by reading the
     query sequence as protein and determining whether the
     `amino-acid composition' is more than 85% A+C+G+T.  The
     program can be invoked either with command line arguments or
     in interactive mode.

     ssearch compares a query sequence to a sequence library,
     which consists of sequence data interspersed with comments
     (see below).  The fasta programs, including ssearch, use a
     standard text format sequence file.  Lines beginning with
     '>' or ';' are treated as comments and ignored; sequences
     may be upper or lower case, and blanks, tabs, and
     unrecognizable characters are ignored.  ssearch expects
     sequences to use the single letter amino acid codes, see
     protcodes(1).  Library files for ssearch should have the
     form shown below.
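
     The DNA/protein decision described above can be sketched
     roughly as follows.  This is an illustrative Python sketch,
     not the code ssearch uses: the function name, the choice to
     skip non-alphabetic characters, and the handling of an empty
     sequence are all assumptions.  Only the 85% A+C+G+T
     threshold comes from the description.

```python
def looks_like_dna(sequence: str, threshold: float = 0.85) -> bool:
    """Guess whether a sequence is DNA from its residue composition.

    Hypothetical helper: returns True when more than `threshold`
    of the alphabetic characters are A, C, G, or T.
    """
    # Keep letters only; blanks, digits, and punctuation are skipped
    # (an assumption made for this sketch).
    residues = [c for c in sequence.upper() if c.isalpha()]
    if not residues:
        # Assumption: an empty sequence is not classified as DNA.
        return False
    acgt = sum(1 for c in residues if c in "ACGT")
    return acgt / len(residues) > threshold
```

     For instance, looks_like_dna("ACGTACGTAC") is true, while a
     string of mixed amino acid codes falls below the threshold
     and is treated as protein.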

OPTIONS
     ssearch can be directed to change the scoring matrix, search
     parameters, output format, and default search directories by
     entering options on the command line (preceded by a `-').
     All of the options should precede the file name and ktup
     arguments.  Alternatively, these options can be changed by
     setting environment variables.  The options and environment
     variables are:


     -a   (SHOWALL) Modifies the display of the two sequences in
          alignments.  Normally, both sequences are shown only
          where they overlap (SHOWALL=0).  If -a is given or the
          environment variable SHOWALL is set to 1, both
          sequences are shown in their entirety.


     -b # The number of similarity scores to be shown when the -Q
          option is used.  This value is usually calculated based
          on the actual scores.

     -d # The  number  of  alignments  to  be  shown.   Normally,
          ssearch shows the same number of alignments as similar-
          ity scores.  By using ssearch -Q  -b  200  -d  50,  one
          would  see the top scoring 200 sequences and alignments
          for the 50 best scores.

     -e   Scale the similarity scores by a factor of
          ln(n0)/ln(n1), where n0 and n1 are the lengths of the
          query and library sequences, respectively.  This has
          the effect of increasing the scores of very short
          sequences, such as partial N-terminal sequences, and
          decreasing the scores of very long sequences, which are
          more likely to match by random chance.  Unscaled scores
          are shown with the alignments.
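
          As a worked illustration of the -e factor, the scaling
          can be sketched in Python as below.  The function name
          and the rejection of lengths below 2 are assumptions
          made for this sketch; whether ssearch rounds or
          truncates the scaled score is not stated here, so the
          sketch returns a float.

```python
import math


def scale_score(score: float, n0: int, n1: int) -> float:
    """Scale a similarity score by ln(n0)/ln(n1).

    n0 is the query length and n1 the library sequence length.
    Hypothetical helper; rounding behaviour is left unspecified.
    """
    if n0 < 2 or n1 < 2:
        # ln(1) is 0, so lengths below 2 would give a zero or
        # undefined factor; rejecting them is an assumption.
        raise ValueError("sequence lengths must be at least 2")
    return score * (math.log(n0) / math.log(n1))
```

          With a 200-residue query, a library sequence of 50
          residues gets a factor greater than 1 and one of 5000
          residues gets a factor less than 1, which matches the
          behaviour described for short and long sequences.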

     -l # (FASTLIBS) The name of the library menu file.  Normally
          this  will  be  determined  by the environment variable
          FASTLIBS.  However, a library menu  file  can  also  be
          specified with -l.

     -m # (MARKX) = 0, 1, 2, or 3.  Alternate display of matches
          and mismatches in alignments.  MARKX=0 uses ":", ".",
          and " " for identities, conservative replacements, and
          non-conservative replacements, respectively.  MARKX=1
          uses " ", "x", and "X".  MARKX=2 does not show the
          second sequence, but uses the second alignment line to
          display matches with a "." for identity, or with the
          mismatched residue for mismatches.  MARKX=2 is useful
          for aligning large numbers of similar sequences.
          MARKX=3 writes out a file of library sequences in FASTA
          format.  MARKX=3 should always be used with the
          "SHOWALL" (-a) option, but this does not completely
          ensure that all of the sequences output will be
          aligned.
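
          The MARKX=2 display rule can be illustrated with a
          short Python sketch.  It assumes the two sequences have
          already been aligned to equal length; the function name
          is hypothetical, and the way ssearch itself prints gap
          characters is not covered here (the sketch simply
          copies whatever character the library line holds at a
          mismatch).

```python
def markx2_line(query_aligned: str, library_aligned: str) -> str:
    """Build a MARKX=2 style second alignment line.

    A "." marks a position identical to the query; any other
    position shows the library residue.  Hypothetical helper.
    """
    if len(query_aligned) != len(library_aligned):
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "." if q == s else s
        for q, s in zip(query_aligned, library_aligned)
    )
```

          For the toy pair WILLLSQ and WILVLSK this gives
          "...V..K", which is why the format is compact when many
          similar sequences are stacked under one query.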

     -Q   Quiet option.  This allows ssearch to search a database
          and report the results without asking any questions.
          ssearch -Q file library > output can be put in the
          background or run at a later time with the unix 'at'
          command.  The number of similarity scores and
          alignments displayed with the -Q option can be modified
          with the -b (scores) and -d (alignments) options.

     -r   _S_T_A_T_F_I_L_E Causes ssearch to write out the sequence iden-
          tifier, superfamily number (if available), and similar-
          ity scores  to  _S_T_A_T_F_I_L_E  for  every  sequence  in  the
          library.  These results are not sorted.


     -s str
          (SMATRIX) the filename of an alternative scoring matrix
          file.    For  protein  sequences,  PAM250  is  used  by
          default; PAM120 can  be  used  with  the  command  line
          option -s 120.

     -w # (LINLEN) output line length  for  sequence  alignments.
          (normally 60, can be set up to 200).


EXAMPLES
     (1)  ssearch musplfm.aa $AABANK

     Compare the amino acid sequence in the file musplfm.aa  with
     the   complete   PIR  protein  sequence  library.   This  is
     extremely slow and should almost never be done.  ssearch  is
     designed to search very small libraries of sequences.

          >LCBO bovine preprolactin
          WILLLSQ ...
          >LCHU human ...
          ...


     (2)  ssearch -a -w 80 musplfm.aa lcbo.aa

     Compare the amino acid sequence in the file musplfm.aa  with
     the sequences in the file lcbo.aa using ktup = 1.  Show both
     sequences in their entirety, with 80 residues on each output
     line.

     (3)  ssearch

     Run the ssearch program in interactive  mode.   The  program
     will  prompt  for the file name for the query sequence, list
     alternative libraries to be searched (if FASTLIBS is set),
     and prompt for the ktup.

     You can use your own sequence files for ssearch; just be
     certain to put a '>' and comment as the first line before
     the sequence.
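
     To check that a file follows this layout before searching
     it, something like the Python sketch below may help.  It is
     not part of the fasta package; the function name is
     hypothetical, and the sketch only recognizes '>' comment
     lines and keeps alphabetic characters from sequence lines.

```python
from typing import Iterable, List, Tuple


def read_library(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Collect (comment, sequence) pairs from '>'-delimited text.

    Hypothetical helper: text before the first '>' line is
    dropped, and non-alphabetic characters in sequence lines
    (blanks, tabs, digits) are discarded.
    """
    entries: List[Tuple[str, str]] = []
    comment = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith(">"):
            # A new comment line closes the previous entry.
            if comment is not None:
                entries.append((comment, "".join(chunks)))
            comment, chunks = line[1:].strip(), []
        elif comment is not None:
            chunks.append("".join(c for c in line if c.isalpha()))
    if comment is not None:
        entries.append((comment, "".join(chunks)))
    return entries
```

     Passing an open file object, for example
     read_library(open("lcbo.aa")), returns one pair per entry; an
     empty list suggests the leading '>' line is missing.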

SEE ALSO
     rss(1), align(1), fasta(1), rdf2(1), protcodes(5),
     dnacodes(5)

AUTHOR
     Bill Pearson
     wrp@virginia.EDU


Sun Release 4.1        Last change: local                       3