ALIGN(1)                     USER COMMANDS                     ALIGN(1)

NAME
     align - compute the global alignment of two protein or DNA
     sequences

     align0 - compute the global alignment of two protein or DNA
     sequences without penalizing for end-gaps

SYNOPSIS
     align [ -m # -s SMATRIX -w # ] sequence-file-1 sequence-file-2

DESCRIPTION
     align produces an optimal global alignment between two protein
     or DNA sequences. align automatically decides whether the
     query sequence is DNA or protein by reading the query sequence
     as protein and determining whether the `amino-acid
     composition' is more than 85% A+C+G+T. align uses a
     modification of the algorithm described by E. Myers and W.
     Miller in "Optimal Alignments in Linear Space", CABIOS (1988)
     4:11-17.

     The program can be invoked either with command line arguments
     or in interactive mode.

     align weights end gaps, so that an alignment of the form

          -----MACF
          SRTKIMACF

     will have a higher score than:

          MACF
          MACF

     align0 uses the same algorithm, but does not weight end gaps.
     Sometimes this can have surprising effects.

     align and align0 use the standard fasta format sequence file.
     Lines beginning with '>' or ';' are considered comments and
     ignored; sequences can be upper or lower case; blanks, tabs
     and unrecognizable characters are ignored. align expects
     sequences to use the single letter amino acid codes; see
     protcodes(1).

OPTIONS
     align can be directed to change the scoring matrix and output
     format by entering options on the command line (preceded by a
     `-', or `/' for MS-DOS). All of the options should precede the
     file name arguments. Alternatively, these options can be
     changed by setting environment variables. The options and
     environment variables are:

     -m # (MARKX) =1,2,3. Alternate display of matches and
          mismatches in alignments. MARKX=1 uses ":", ".", " " for
          identities, conservative replacements, and
          non-conservative replacements, respectively. MARKX=2 uses
          " ", "x", and "X".
          MARKX=3 does not show the second sequence, but uses the
          second alignment line to display matches with a "." for
          identity, or with the mismatched residue for mismatches.
          MARKX=3 is useful for aligning large numbers of similar
          sequences.

     -s str
          (SMATRIX) the filename of an alternative scoring matrix
          file, or "120" to use the PAM120 matrix.

     -w # (LINLEN) output line length for sequence alignments
          (normally 60, can be set up to 200).

EXAMPLES
     (1)  align musplfm.aa lcbo.aa

     Compare the amino acid sequence in the file musplfm.aa with
     the amino acid sequence in the file lcbo.aa. Each sequence
     should be in the form:

          >LCBO bovine preprolactin
          WILLLSQ ...

     (2)  align -w 80 musplfm.aa lcbo.aa > musplfm.aln

     Compare the amino acid sequence in the file musplfm.aa with
     the sequence in the file lcbo.aa. Show both sequences with 80
     residues on each output line and write the output to the file
     musplfm.aln.

     (3)  align

     Run the align program in interactive mode. The program will
     prompt for the file name of the first sequence and then of
     the second sequence.

SEE ALSO
     rdf2(1), protcodes(5), dnacodes(5)

AUTHOR
     Bill Pearson
     wrp@virginia.EDU

Sun Release 4.1          Last change: local
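
The sequence-type decision described under DESCRIPTION (read the query as protein, then call it DNA if more than 85% of its residues are A, C, G or T) can be illustrated with a short Python sketch. This is not the code align uses; the function names read_fasta_residues and looks_like_dna are hypothetical, and treating every non-alphabetic character as "unrecognizable" is a simplifying assumption.

```python
def read_fasta_residues(text):
    """Collect residues from fasta-format text.

    Lines beginning with '>' or ';' are treated as comments and skipped.
    Assumption: any non-alphabetic character (blanks, tabs, digits,
    punctuation) counts as unrecognizable and is dropped.
    """
    residues = []
    for line in text.splitlines():
        if line.startswith((">", ";")):
            continue  # comment line, ignored
        # Sequences may be upper or lower case; normalize to upper case.
        residues.extend(ch.upper() for ch in line if ch.isalpha())
    return "".join(residues)


def looks_like_dna(sequence, threshold=0.85):
    """Return True if more than `threshold` of the residues are A, C, G or T.

    The 0.85 default mirrors the 85% figure quoted in DESCRIPTION; an
    empty sequence is reported as not DNA (an assumption of this sketch).
    """
    if not sequence:
        return False
    acgt = sum(1 for ch in sequence if ch in "ACGT")
    return acgt / len(sequence) > threshold
```

For instance, looks_like_dna(read_fasta_residues(open("lcbo.aa").read())) would be expected to return False for a protein file such as the one in the EXAMPLES section, assuming that file exists in the current directory.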