Automated Alignment Tools

Pairwise Alignment tool

This tool lets you drop one sequence, or a selected set of sequences, onto a reference sequence. The dropped sequences are then aligned to the reference sequence, preserving whatever gaps are present in the reference sequence. If gaps need to be inserted into the reference sequence in the process, they will also be inserted into the other, non-selected sequences.

The dropdown menu for this tool allows you to change the gap opening cost (default 8) and the gap extension cost (default 3) within a sequence, as well as the equivalent costs at the ends of the sequence (default 2 and 2, respectively).

The default substitution costs are:

for DNA and RNA data, 5 for a transition and 10 for a transversion
for protein data, 5 for each substitution.

These costs can be changed using the Substitution Costs dialog box available from the tool's dropdown menu.
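
To make the default costs concrete, here is a minimal Python sketch of how they could be combined. It is an illustration only, not Mesquite's own code. The function names are hypothetical, and two details are assumptions not stated above: that identical residues cost 0, and that a gap of length n costs the opening cost plus (n - 1) times the extension cost.

```python
# Illustrative only: the default costs come from the text above; the names and
# the exact affine gap formula are assumptions, not Mesquite's implementation.
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T", "U"}


def nucleotide_substitution_cost(a, b, transition=5, transversion=10):
    """Cost of aligning nucleotide a against nucleotide b (match assumed to cost 0)."""
    a, b = a.upper(), b.upper()
    if a == b:
        return 0
    # A transition stays within purines or within pyrimidines.
    same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
    return transition if same_class else transversion


def protein_substitution_cost(a, b, substitution=5):
    """Cost of aligning amino acid a against amino acid b (match assumed to cost 0)."""
    return 0 if a.upper() == b.upper() else substitution


def gap_cost(length, terminal=False):
    """Assumed affine cost of a single gap: opening + extension * (length - 1)."""
    if length <= 0:
        return 0
    # Defaults from the text: 8 and 3 within the sequence, 2 and 2 at its ends.
    opening, extension = (2, 2) if terminal else (8, 3)
    return opening + extension * (length - 1)
```

Under these assumptions, a three-base gap inside a sequence costs more than the same gap at the end of a sequence, which is why dropped sequences that are shorter than the reference tend to receive their gaps at the ends.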

Aligning nucleotide sequences to match an amino acid alignment

This feature allows one to take a matrix of nucleotides, and an existing alignment of their translated amino acids, and have the nucleotides realigned to match the amino acid alignment. To do this, you will need to have in the same Mesquite file both the nucleotide matrix and the protein matrix. For example, you could do the following:

Assign codon positions and genetic code to a nucleotide matrix (see the main Mesquite manual for details).
Adjust each sequence so that its reading frame is correct, by using the Shift To Minimize Stops feature.
Trim any incomplete codons from the ends of the sequences: select the entire matrix, choose Matrix>Alter/Transform>Other Choices..., and select Trim Terminal Incomplete Codons. "Terminal Incomplete Codons" are nucleotides that are only part of a codon. For example, if one sequence starts at a third position, then that third position nucleotide represents only one-third of a codon, and it will be trimmed. Once this is done, only complete codons will be left in the sequence.
Translate the DNA matrix to amino acids by choosing Characters>Make New Matrix From>Translate DNA To Protein. You will now have the protein matrix in your file.
Align the protein matrix. You could use, for example, the ClustalW Align feature described below. If you instead export the matrix (e.g., using the File>Export options) and align the proteins in a separate program, you will then need to choose File>Include file to bring the output of the alignment program into your file.
Finally, go to your DNA matrix, and choose Matrix>Alter/Transform>Align DNA to Protein.
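
The idea behind the final step can be sketched in a few lines of Python. This is a hedged illustration of the general technique (placing each codon under its amino acid and three gap characters under each protein gap), not Mesquite's implementation. The function name `align_dna_to_protein` is hypothetical, and the sketch assumes the nucleotide sequence is already in frame and contains only complete codons, as arranged by the earlier steps.

```python
def align_dna_to_protein(dna, aligned_protein, gap="-"):
    """Thread the codons of an unaligned, in-frame DNA sequence onto an aligned protein row."""
    if len(dna) % 3 != 0:
        raise ValueError("DNA length must be a multiple of 3 (trim incomplete codons first)")
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    residues = sum(1 for aa in aligned_protein if aa != gap)
    if residues != len(codons):
        raise ValueError("number of codons does not match number of amino acids")
    next_codon = iter(codons)
    # Each amino acid column receives its codon; each gap column receives three gaps.
    return "".join(gap * 3 if aa == gap else next(next_codon) for aa in aligned_protein)


# Hypothetical example: protein row "M-AK" aligned against the codons ATG GCT AAA.
print(align_dna_to_protein("ATGGCTAAA", "M-AK"))  # prints ATG---GCTAAA
```

Because gaps are always inserted three nucleotides at a time, the reading frame of every sequence is preserved in the realigned nucleotide matrix.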

MAFFT Align

This feature allows one to select a single block of sequences, and then have MAFFT align them. To do this, select the block, then choose Matrix>Align Multiple Sequences>MAFFT Align...

You will first be asked whether you want to do the MAFFT alignment on a separate thread or on the same thread. Mesquite can do multiple things at once, because it can have one thing running on one computational "thread" and another thing happening on a separate thread. The main thread of the program is the one the user deals with directly; it allows you to give commands to Mesquite (via menus, etc.). If this main thread is busy with a calculation, you will not be able to ask for new things to happen in Mesquite until the calculation is done. By choosing "Separate" in the query that appears, you are asking Mesquite to create a thread separate from the main thread, which lets you do other things in Mesquite while the MAFFT alignment is proceeding. However, if you do this, you must remember not to edit the matrix or close the window showing the matrix; if you do, Mesquite will be very unhappy. The safest thing to do is to choose "No" to that query, so that the alignment runs on the main thread.
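
The difference between the two choices can be illustrated with a small, generic Python analogy. This is not Mesquite's code; the function `slow_alignment` is a hypothetical stand-in for a long-running external alignment.

```python
import threading
import time


def slow_alignment(results):
    """Hypothetical stand-in for a long-running alignment job."""
    time.sleep(0.2)
    results.append("aligned")


results = []
worker = threading.Thread(target=slow_alignment, args=(results,))
worker.start()

# While the worker runs, the main thread remains free to respond to commands.
main_thread_work = "menus still respond"

# Wait for the worker to finish before using (or modifying) what it works on.
worker.join()
print(results)
```

The warning above corresponds to the period between `start()` and `join()`: changing the data the worker depends on during that window can leave the program in an inconsistent state.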

Once you make that choice, you will see a dialog box in which you must enter the directory location of MAFFT. Mesquite needs this location in order to run MAFFT. In this dialog box you can also set options for MAFFT.

To set the directory location, you can either type in the path directly, or use the Browse button to find MAFFT and have the location filled in automatically. (On a Mac, by default the standard version of MAFFT is installed in /usr/local/bin, so the path would be /usr/local/bin/mafft )

If you wish, you can also alter the options of MAFFT. The MAFFT manual notes some standard alignment procedure settings, including some accuracy-oriented methods (e.g., L-INS-i, G-INS-i, and E-INS-i) and some speed-oriented methods (including FFT-NS-i). Mesquite offers these standard methods in the pull-down menu beside "Suggested methods". Choosing one of these will fill the "basic alignment method" field with the program arguments needed to implement that method.

You can enter additional options into the "Additional MAFFT options" text box; see the MAFFT manual for details.

If you then press OK, Mesquite will send that section of the matrix to MAFFT and ask for it to be aligned; it will then harvest the results and reincorporate that piece into the matrix.
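
For readers curious about what such a round trip involves, the following Python sketch shows one plausible shape for it: write the selected block as unaligned FASTA, run MAFFT on it, and read back the aligned FASTA that MAFFT writes to standard output. All function names are hypothetical, and this is not Mesquite's implementation. The argument strings are the ones the MAFFT manual gives for these methods; that Mesquite fills in exactly these strings is an assumption.

```python
import os
import shlex
import subprocess
import tempfile

# Arguments the MAFFT manual lists for these methods (assumed to match the presets).
SUGGESTED_METHODS = {
    "L-INS-i": ["--localpair", "--maxiterate", "1000"],
    "G-INS-i": ["--globalpair", "--maxiterate", "1000"],
    "E-INS-i": ["--genafpair", "--maxiterate", "1000"],
    "FFT-NS-i": ["--retree", "2", "--maxiterate", "2"],
}


def build_mafft_command(mafft_path, method, extra_options, input_path):
    """Assemble the argument list: program, method arguments, extra options, input file."""
    return [mafft_path, *SUGGESTED_METHODS[method], *shlex.split(extra_options), input_path]


def to_fasta(sequences, gap="-"):
    """Write a block of sequences as unaligned FASTA (existing gaps removed)."""
    return "".join(f">{name}\n{seq.replace(gap, '')}\n" for name, seq in sequences.items())


def parse_fasta(text):
    """Read FASTA text into a dict of name -> sequence, joining wrapped lines."""
    aligned, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):
            name = line[1:]
            aligned[name] = ""
        elif line and name is not None:
            aligned[name] += line
    return aligned


def align_block(mafft_path, method, extra_options, sequences):
    """Send a block to MAFFT and return the aligned sequences (requires MAFFT to be installed)."""
    with tempfile.NamedTemporaryFile("w", suffix=".fasta", delete=False) as handle:
        handle.write(to_fasta(sequences))
    try:
        command = build_mafft_command(mafft_path, method, extra_options, handle.name)
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        return parse_fasta(result.stdout)
    finally:
        os.unlink(handle.name)
```

A call such as `align_block("/usr/local/bin/mafft", "L-INS-i", "", block)` would then return the aligned rows, ready to be placed back into the matrix in the position of the original block.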

If you use MAFFT from within Mesquite, then you should cite MAFFT as appropriate; see the MAFFT manual for citation details.

ClustalW Align

This tool is just like the MAFFT Align feature described above, except that it works with the ClustalW program. The version of ClustalW used by Mesquite must be one that is executable from the command line of your operating system.

Muscle Align

This tool is just like the MAFFT Align feature described above, except that it works for Robert C. Edgar's MUSCLE program.