Jemboss Alignment Editor

This is under development by the EMBOSS team.

The Jemboss Alignment Editor can be used interactively to edit a sequence alignment read from a FASTA or MSF file. It can also be run from the command line, e.g. within a script, to produce image files of the alignment.

INTERACTIVE INTERFACE

Loading Sequences

The supported sequence file formats so far are FASTA and MSF. Load the aligned sequences via "Open" in the "File" menu.

Once loaded, the alignment appears in the order in which the sequences appear in the sequence file. The name of each sequence is on the right hand side, together with its length. At the top of the editor is the relative sequence numbering. A tooltip pops up when the mouse pointer is placed over a site. The tooltip gives the name of the sequence, the residue and the position of that site.
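
As a rough illustration of the information shown for a site, the sketch below reads aligned FASTA text and reports the name, residue and position for one site. It is not Jemboss code: the function names read_fasta and site_info and the example sequences are made up for illustration, and the position is assumed to be the 1-based alignment column.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (name, sequence) pairs, in file order."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            # Assumption: the sequence name is the first word after '>'.
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def site_info(records, row, column):
    """Return (name, residue, position) for a 0-based row and a 1-based column."""
    name, seq = records[row]
    return name, seq[column - 1], column


# Hypothetical two-sequence alignment.
EXAMPLE = """>seqA
MK-LV
>seqB
MKALV
"""

records = read_fasta(EXAMPLE)
print(site_info(records, 0, 3))  # ('seqA', '-', 3)
```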

Editing Functions

Gaps can be introduced into a sequence by clicking on a site and dragging the mouse to the right. Gaps can similarly be removed by dragging to the left. Sequences can be deleted from the viewer by right-clicking on the sequence and selecting "Delete" from the pop-up menu.

Sequences can be locked by selecting their names on the right hand side of the alignment and clicking the 'Lock' button. When a gap is then introduced into one of these sequences, the gap also appears in the other sequences in the group of locked sequences. The lock can be switched off with the 'Unlock' button or with 'Unlock All Sequences' in the 'Edit' menu.
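
The effect of locking can be pictured with a small sketch. This is only a model of the behaviour described above, not the editor's implementation: insert_gap and the example alignment are hypothetical, the gap character is assumed to be '-', and sequences outside the locked group are left unpadded.

```python
def insert_gap(alignment, target, column, locked=frozenset(), gap="-"):
    """Return a new alignment with one gap inserted before `column` (0-based).

    The gap goes into `target`. If `target` belongs to the `locked` group,
    the same gap goes into every sequence in that group.
    """
    affected = set(locked) if target in locked else {target}
    return {
        name: (seq[:column] + gap + seq[column:]) if name in affected else seq
        for name, seq in alignment.items()
    }


# Hypothetical alignment with seqA and seqB locked together.
alignment = {"seqA": "MKLV", "seqB": "MKLV", "seqC": "MKLV"}
edited = insert_gap(alignment, "seqA", 2, locked={"seqA", "seqB"})
print(edited)  # {'seqA': 'MK-LV', 'seqB': 'MK-LV', 'seqC': 'MKLV'}
```

Without a locked group, the same call changes only the sequence that was edited.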

Also under the 'Edit' menu there is a utility to trim the ends of sequences. This can be used to tidy up the termini of an alignment.

Colour Schemes

There are various colour schemes for the residues; these can be found under the "View" menu. The residues can be coloured by their properties or using one of the other common colour schemes (e.g. the Taylor method). These colour schemes can be edited using the "Colour Display" window: right-click on a colour to bring up a palette to select from.

Alternatively sequences can be coloured based on the number of identities and positive matches in a column. This is the "Colour Identical/Matches" option under the "View" menu. The positive matches are defined by the scoring matrix selected. This method is based on the EMBOSS program prettyplot colour scheme. The user can define the number of identities that are required in a column for them to be set to a given colour. Also, a threshold can be set for the positive match score, above which the contributing matches are set to a given colour.
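
The identity part of this scheme can be sketched as a per-column count. The code below is a simplified illustration and not the prettyplot algorithm itself: it ignores the scoring matrix and positive matches entirely, identical_sites is a hypothetical name, gaps are assumed to be '-', and the threshold is assumed to default to the number of sequences.

```python
from collections import Counter


def identical_sites(sequences, min_id=None, gap="-"):
    """Map 0-based column index to the residue that meets the identity threshold.

    A column qualifies when its most common residue (gaps ignored) occurs
    at least `min_id` times.
    """
    if min_id is None:
        min_id = len(sequences)  # assumed default: every sequence must agree
    hits = {}
    for col, residues in enumerate(zip(*sequences)):
        counts = Counter(r.upper() for r in residues if r != gap)
        if not counts:
            continue  # column made up entirely of gaps
        residue, n = counts.most_common(1)[0]
        if n >= min_id:
            hits[col] = residue
    return hits


# Hypothetical aligned sequences of equal length.
seqs = ["MKLV", "MKAV", "MR-V"]
print(identical_sites(seqs))            # {0: 'M', 3: 'V'}
print(identical_sites(seqs, min_id=2))  # {0: 'M', 1: 'K', 3: 'V'}
```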

Scoring Matrix

Scoring matrices are used to calculate scores in the consensus. A different scoring matrix can be selected by clicking on "Matrix Display" in the "View" menu. The scoring matrix currently in use is reported at the bottom of the alignment viewer.

Consensus Sequence

The consensus sequence is calculated with the same algorithm as the EMBOSS program cons. Once calculated, the consensus appears below the other sequences in the viewer.

Consensus Plot

The consensus plot gives a graphical representation of the consensus. It is based on the EMBOSS program plotcon.

Identity Table

The identity table can be calculated on request; it gives the percentage of identical matches between each pair of sequences.
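
A pairwise identity table can be sketched as follows. The exact definition used by the editor is not stated here, so this is an assumption: identity is taken as the percentage of identical residues over the columns where neither sequence has a gap ('-'). The names percent_identity and identity_table are hypothetical.

```python
from itertools import combinations


def percent_identity(a, b, gap="-"):
    """Percentage of identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != gap and y != gap]
    if not pairs:
        return 0.0
    same = sum(1 for x, y in pairs if x == y)
    return 100.0 * same / len(pairs)


def identity_table(alignment):
    """Return {(name1, name2): percent identity} for every pair of sequences."""
    return {
        (n1, n2): percent_identity(s1, s2)
        for (n1, s1), (n2, s2) in combinations(alignment.items(), 2)
    }


# Hypothetical alignment.
table = identity_table({"seqA": "MKLV", "seqB": "MKAV", "seqC": "MR-V"})
for pair, value in table.items():
    print(pair, round(value, 1))
```

Other conventions, such as dividing by the full alignment length, give different percentages for gapped pairs.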

Inserting Annotation Sequences

Annotation sequences can be cut and pasted or read in from a file. They appear under the alignment. As well as any of the supported sequence formats, the concise output from JPred can be used directly, e.g.

jpred:-,-,-,-,-,-,-,-,-,-,-,-,-,E,E,E,E,-,-,-,-,-,-,H,H,H,H,H,H,H,H,H,H,-,-,-,
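
To show how a line of this form maps onto an annotation sequence, here is a minimal parsing sketch. It assumes the layout seen in the example above (a label, a colon, then comma-separated one-character states) and is not taken from the Jemboss source; parse_concise_line is a hypothetical name.

```python
def parse_concise_line(line):
    """Split 'label:a,b,c,' into (label, 'abc'), ignoring empty fields."""
    label, _, values = line.strip().partition(":")
    states = [v.strip() for v in values.split(",") if v.strip()]
    return label, "".join(states)


# Shortened, made-up line in the same layout as the example above.
label, annotation = parse_concise_line("jpred:-,-,E,E,E,-,H,H,H,-,")
print(label, annotation)  # jpred --EEE-HHH-
```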

COMMAND LINE INTERFACE

Command Line Usage

The following jars need to be in your classpath:

setenv CLASSPATH Jemboss.jar:jakarta-regexp-1.2.jar

java org.emboss.jemboss.editor.AlignJFrame file [options]

file The multiple sequence alignment in FASTA or MSF format.

Command Line Options

-calc Calculate the consensus and display it under the alignment. The following 3 flags can be used to define values used in the calculation:
    -plu (plurality) minimum positive match score value for there to be a consensus.
    -numid minimum number of identities for there to be a consensus.
    -case minimum positive match score for setting the consensus to upper-case.
-color Define a colour scheme. The available colour schemes are:
    taylor
    residue
    rasmol
    acid
    polar
    hydrophobic
    aromatic
    surface
    charge
    size
    base

java org.emboss.jemboss.editor.AlignJFrame file -color size

-font Set the font size.
-id Display a percentage ID pair table.
-noshow Turns off the alignment display.
-nres Number of residues on each line in a printout.
-pretty Use the EMBOSS prettyplot colour scheme. The -matrix flag can be used to define a scoring matrix for identifying positive matches.
-noBox switch off box drawing around identical and positive matches.
-minID define the minimum number of identities. The default is the number of sequences in the file.
-match define a threshold value for the number of positive matches. The default is half the total weight.
-colID define a lettering colour for the identities.
-colIDBack define a background colour for identities.
-colMatch define a lettering colour for positive matches.
-colMatchBack define a background colour for positive matches. Available colour options:
red, blue, cyan, darkGray, gray, green, lightGray, magenta, orange, pink, white, yellow, black
-print Print the alignment image. The following 2 flags can be used along with the print flag:
    -prefix prefix for image output file.
    -onePage fit the alignment to one page. This option must be used with the -nres flag to define the residues per line.
-type png or jpeg (default is jpeg).
-antialias turn anti-aliasing on.
-landscape Print as landscape (the default is portrait).
-margin Define the left, right, top and bottom margin (in cm).

java org.emboss.jemboss.editor.AlignJFrame file -matrix EBLOSUM62 \
-noshow -print -margin 0.5 0.5 0.5 0.5

-matrix Define a scoring matrix. Used with the -pretty and -calc options.
-list List the available scoring matrix files.

Command Line Examples

java org.emboss.jemboss.editor.AlignJFrame file -matrix EBLOSUM80 \
-pretty -noshow -id -print -type png

java org.emboss.jemboss.editor.AlignJFrame file -matrix EPAM250 \
-pretty -colIDBack black -colID white -print \
-margin 0.5 0.5 0.5 0.0 -noshow
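
For use within a script, the same kind of command can be assembled programmatically. The sketch below is one possible wrapper, not part of Jemboss: build_command and render are hypothetical helpers, the file name globins.msf is made up, and it assumes that java is on the PATH and that CLASSPATH is already set as described above. Only flags documented in this section are used.

```python
import subprocess

MAIN_CLASS = "org.emboss.jemboss.editor.AlignJFrame"


def build_command(alignment_file, matrix="EBLOSUM62", image_type="png"):
    """Build the argument list for a non-interactive prettyplot-style image."""
    return [
        "java", MAIN_CLASS, alignment_file,
        "-matrix", matrix,
        "-pretty",            # prettyplot colour scheme
        "-noshow",            # do not open the alignment display
        "-print",             # write the alignment image
        "-type", image_type,  # png or jpeg
    ]


def render(alignment_file, **options):
    """Run the editor for one file; raises CalledProcessError on failure."""
    return subprocess.run(build_command(alignment_file, **options), check=True)


# Show the command for a hypothetical input file without running it.
print(" ".join(build_command("globins.msf", matrix="EBLOSUM80")))
```

Calling render("globins.msf") would then launch the editor once per file, for example inside a loop over several alignments.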

 

W.R.Taylor Protein residue colour references: Protein Eng. vol.10 no. 7 pp743-746, 1997