Aligning Sequences Locally

The local alignment routine is based on the program SIM by Huang and Miller, an implementation of the Smith-Waterman algorithm (Huang, X.Q. & Miller, W. A Time-Efficient, Linear-Space Local Similarity Algorithm. Advances in Applied Mathematics 12, 337-357 (1991)).

SIM uses dynamic programming to find the k best non-intersecting alignments between two sequences or within a single sequence. The alignments are reported in order of decreasing similarity score and share no aligned pairs. SIM requires space proportional to the sum of the input sequence lengths and the output alignment lengths, so it accommodates 100,000-base sequences on a workstation. Both sequences must be of the same type, i.e. both DNA or both protein.
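To make the scoring concrete, here is a minimal Python sketch of a Smith-Waterman style local alignment score with separate gap-opening and per-residue gap penalties. It is not SIM: it returns only the best score, not the alignment or the k best non-intersecting alignments, and it does not use SIM's linear-space strategy for recovering alignments. The function name, the parameter names and the assumption that a gap of length k costs `gap_open + k * gap_extend` are illustrative choices, not taken from SPIN.

```python
def local_alignment_score(a, b, match=1.0, mismatch=-1.0,
                          gap_open=6.0, gap_extend=0.2):
    """Best local alignment score of a and b with affine gap penalties.

    Assumption: a gap of length k costs gap_open + k * gap_extend.
    Only the previous row is kept, so memory is proportional to len(b).
    """
    neg_inf = float("-inf")
    n = len(b)
    h_prev = [0.0] * (n + 1)       # best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * (n + 1)   # same, but ending in a gap in b (vertical move)
    best = 0.0
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * (n + 1)
        f_cur = [neg_inf] * (n + 1)
        e = neg_inf                # ending in a gap in a (horizontal move)
        for j in range(1, n + 1):
            # Either extend an existing gap or open a new one.
            e = max(e - gap_extend, h_cur[j - 1] - gap_open - gap_extend)
            f_cur[j] = max(f_prev[j] - gap_extend,
                           h_prev[j] - gap_open - gap_extend)
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The 0.0 lets an alignment start afresh anywhere (local alignment).
            h_cur[j] = max(0.0, h_prev[j - 1] + s, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best
```

For example, `local_alignment_score("AAAACCCC", "AAAAGCCCC", gap_open=1.0, gap_extend=0.5)` prefers the gapped alignment, whereas with the larger default opening penalty the same pair scores best without a gap.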

A dialogue box (shown above) requests the horizontal and vertical sequences and the ranges over which they are to be aligned (see section Selecting a sequence). Either a specified number of alignments or all alignments above a given score can be requested. For DNA sequences, the scores for a matching aligned pair, a transition and a transversion must be provided; these values are used to generate the score matrix. For protein sequences, the score matrix can be changed from the "Options" menu (see section Changing the score matrix). For both DNA and protein sequences, the penalty for opening a gap and the penalty for gap extension must also be supplied.
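As an illustration of how three DNA scores can define a full score matrix, the following Python sketch fills a 4 x 4 table from a match, a transition and a transversion score. A transition is a purine-purine (a/g) or pyrimidine-pyrimidine (c/t) substitution; every other substitution is a transversion. The function name and the dictionary layout are hypothetical, and how SPIN itself builds the matrix or scores ambiguity codes is not described here.

```python
PURINES = {"a", "g"}
PYRIMIDINES = {"c", "t"}

def dna_score_matrix(match, transition, transversion):
    """Return a dict mapping each base pair (x, y) to its score."""
    matrix = {}
    for x in "acgt":
        for y in "acgt":
            if x == y:
                matrix[(x, y)] = match
            elif {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
                # a<->g or c<->t: both bases in the same chemical class
                matrix[(x, y)] = transition
            else:
                matrix[(x, y)] = transversion
    return matrix

# Scores chosen only for illustration.
scores = dna_score_matrix(match=1, transition=-1, transversion=-2)
```

With these values, `scores[("a", "g")]` is the transition score and `scores[("a", "c")]` the transversion score.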

The alignments are displayed in the Output Window along with the percentage mismatch (see below), and on the SPIN Sequence Comparison Plot as a series of lines, each line corresponding to a single alignment. Each line traces the path of its alignment through the two sequences.

The following two plots show local alignments of two Xenopus laevis sequences. The vertical sequence (xlactcag) is genomic DNA, and the horizontal sequence (xlacacr) is the corresponding cDNA.

The first plot shows a local alignment made with a higher-than-default penalty for each residue in a gap (1 instead of 0.2) and with all alignments scoring more than 20 requested. The result is seven aligned regions, shown as seven diagonal lines in the plot. These regions correspond to the exons, which are present in both sequences; they are separated by the introns, which are present only in the genomic sequence.

The second plot shows the result for the same two sequences when the default gap penalty is accepted and when only the highest scoring alignment is displayed. This best alignment covers five of the seven exons identified in the previous plot, with the lower gap penalty allowing it to span the introns that separate them.
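The effect of the per-residue gap penalty on intron spanning can be checked with a little arithmetic. The Python sketch below assumes that a gap of length k costs the opening penalty plus k times the per-residue penalty, and uses the opening penalty of 6 from the Information example further down; the intron length of 100 and the exon score of 50 are hypothetical numbers chosen for illustration, not values from the plots.

```python
def gap_cost(length, gap_open=6.0, per_residue=0.2):
    """Cost of one gap, assuming cost = gap_open + per_residue * length."""
    return gap_open + per_residue * length

def worth_spanning(next_exon_score, intron_length,
                   gap_open=6.0, per_residue=0.2):
    """True if extending an alignment across the intron raises its score."""
    return next_exon_score > gap_cost(intron_length, gap_open, per_residue)

# Hypothetical 100-base intron followed by an exon scoring 50.
low_penalty = gap_cost(100, per_residue=0.2)    # 6 + 0.2 * 100
high_penalty = gap_cost(100, per_residue=1.0)   # 6 + 1.0 * 100
print(worth_spanning(50, 100, per_residue=0.2))
print(worth_spanning(50, 100, per_residue=1.0))
```

Under these assumptions the gap is cheaper than the exon it reaches only at the lower penalty, which is consistent with a single alignment running through several exons at the default setting and separate alignments per exon at the higher one.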

Below is a typical alignment as written to the Output Window.

```
Percentage mismatch  35.7
  438       448       458       468       478       488
h caggcctgtgaggaccagcagtgctgtcctgagatgggcggctggtctggctgggggccc
  :::::::::::   :::: ::  ::: ::       :: : :::: :   :::::: :::
m caggcctgtgacacccagaagacctgccccacacatggggcctgggcatcctggggcccc
  451       461       471       481       491       501

  498       508       518
h tgggagccttgctctgtcacctgc
  :::   ::  :::: :   :::::
m tggagcccccgctcaggatcctgc
  511       521       531
```
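The reported percentage mismatch can be reproduced from the alignment itself. The Python sketch below assumes that the figure is the number of aligned columns holding different characters divided by the total number of columns; the function name is hypothetical, and how SPIN counts columns containing a gap is not stated here (this example contains none).

```python
def percentage_mismatch(top, bottom):
    """Percentage of aligned columns whose two characters differ."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    mismatches = sum(1 for x, y in zip(top, bottom) if x != y)
    return 100.0 * mismatches / len(top)

# The two aligned sequences from the output above, blocks joined together.
h = ("caggcctgtgaggaccagcagtgctgtcctgagatgggcggctggtctggctgggggccc"
     "tgggagccttgctctgtcacctgc")
m = ("caggcctgtgacacccagaagacctgccccacacatggggcctgggcatcctggggcccc"
     "tggagcccccgctcaggatcctgc")

print(round(percentage_mismatch(h, m), 1))  # 30 of 84 columns differ: 35.7
```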

Further operations available for local alignments are:

Information
This command gives a brief description of the sequences used in the comparison and the input parameters used.
```
horizontal PERSONAL: h from 1 to 1553
vertical PERSONAL: m from 1 to 1358
number of alignments 3
score for match 1
score for transition -1
score for transversion -1
penalty for starting gap 6
penalty for each residue in gap 0.2

```
Configure
This option allows the line width and colour of the matches to be altered (see section Colour Selector). A colour browser is displayed from which the desired line width or colour can be configured. Pressing OK will update the SPIN Sequence Comparison Plot.
Display sequences
Selecting this command invokes the Sequence Comparison Display (see section Sequence Comparison Display). Moving the cursor in the sequence display will move the cursors of the same sequence in any SPIN Sequence Comparison Plot (see section Cursors). To force the sequence display to show the nearest match, use the "nearest match" button in the sequence display plot.
Hide
This option removes the points from the SPIN Sequence Comparison Plot but retains the information in memory.
Reveal
This option will redisplay previously hidden points in the SPIN Sequence Comparison Plot.
Remove
This command removes all the information regarding this particular invocation of Local alignment, and access to this data is lost.