BIRCH

TUTORIAL: PHYLOGENETIC ANALYSIS USING DISTANCE METHODS


Oct. 15, 2014


PHYLIP Main documentation: $doc/Phylip/main.html
PHYLIP Distance methods: $doc/Phylip/distance.html
PHYLIP DNADIST: $doc/Phylip/dnadist.html
PHYLIP PROTDIST: $doc/Phylip/protdist.html
PHYLIP FITCH: $doc/Phylip/fitch.html
PHYLIP KITSCH: $doc/Phylip/kitsch.html
PHYLIP NEIGHBOR: $doc/Phylip/neighbor.html
ATV tree viewer: $doc/atv/atv_documentation.pdf

The programs in the PHYLIP package are interactive programs designed to be run at the command line. BioLegato can run these programs by generating the keystrokes you would have typed to set program parameters.

Construction of a phylogeny using distance methods involves two steps:

  1. blnalign and blpalign run DNADIST and PROTDIST, respectively, to construct a distance matrix.
  2. The distance matrix is used to construct a phylogenetic tree, using any of a number of methods implemented in the programs FITCH, KITSCH or NEIGHBOR.
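
Step 1 can be illustrated with a minimal Python sketch. It is not what DNADIST does internally: DNADIST offers several substitution models, whereas this sketch assumes the simple Jukes-Cantor correction. The function names and the three-sequence toy alignment are hypothetical and serve only to show what a pairwise distance matrix contains.

```python
import math
from itertools import combinations


def jukes_cantor(seq_a, seq_b):
    """Jukes-Cantor distance between two aligned DNA sequences."""
    # Use only alignment columns where both sequences have an unambiguous base;
    # gaps and ambiguity codes are ignored (an assumption of this sketch).
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a in "ACGT" and b in "ACGT"]
    if not pairs:
        raise ValueError("no comparable sites")
    p = sum(a != b for a, b in pairs) / len(pairs)  # proportion of differing sites
    if p >= 0.75:
        raise ValueError("sequences too divergent for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Build a symmetric distance matrix (dict of dicts) from name -> sequence."""
    names = list(alignment)
    matrix = {name: {name: 0.0} for name in names}
    for a, b in combinations(names, 2):
        d = jukes_cantor(alignment[a], alignment[b])
        matrix[a][b] = matrix[b][a] = d
    return matrix


# Hypothetical toy alignment, not taken from chitIII.mrtrans.gde.
toy = {"seqA": "ACGTACGTAC", "seqB": "ACGTACGTAT", "seqC": "ACGAACGTTT"}
dm = distance_matrix(toy)
for name in toy:
    print(name, " ".join(f"{dm[name][other]:.4f}" for other in toy))
```

The matrix produced here plays the same role as the one passed to FITCH, KITSCH or NEIGHBOR in step 2, although the numbers would differ from those of DNADIST under its own model settings.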

Example: Plant Type III Chitinases

The chitinases in plants are hydrolytic enzymes that degrade chitin, a polymer of N-acetylglucosamine. Although chitin does not occur in plants, it is a major component of the cell wall in many fungi. Not surprisingly, plants produce chitinases in response to fungi, and chitinases have been demonstrated to play an important role in plant defense responses. Six classes of chitinases have been identified so far. Most known chitinases fall into the Type I and Type II classes. This exercise will work with a smaller class of genes encoding Type III chitinases.
 

1. The dataset

The file chitIII.mrtrans.gde is a GDE format file containing protein coding sequences (CDS) from chitinase III genes. These DNA sequences have already been aligned using Pearson's mrtrans program, which reads a set of unaligned DNA sequences and aligns them according to a set of aligned proteins.

Create a directory called distance, and save chitIII.mrtrans.gde to this directory. Open the file in blnalign:


 

2. A quick phylogeny using FITCH

For routine distance tree construction, the method of Fitch and Margoliash is the method of choice. FITCH allows for variable rates of evolution in different lineages, and iteratively adjusts the tree to minimize the least-squares difference between the distances in the matrix and the distances implied by the tree, summed across the entire tree. Although Neighbor-Joining is faster, it is also much less thorough, because it considers only one tree. It is probably the least rigorous method for constructing a phylogeny. To run FITCH, choose Phylogeny --> DNA Distance methods. Fitch-Margoliash is the default method.
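
The least-squares idea behind FITCH can be sketched in Python. This is only an illustration of the criterion, not of the tree search that FITCH performs. It assumes the usual Fitch-Margoliash weighting, in which each squared difference is divided by the square of the observed distance (power 2). The function names, the three-taxon star tree and all the numbers are hypothetical.

```python
from itertools import combinations


def weighted_sum_of_squares(observed, fitted, power=2.0):
    """Sum over taxon pairs of (observed - fitted)^2 / observed^power."""
    total = 0.0
    for a, b in combinations(sorted(observed), 2):
        d_obs = observed[a][b]
        if d_obs == 0:
            # Skip identical sequences to avoid dividing by zero
            # (a simplification made for this sketch).
            continue
        total += (d_obs - fitted[a][b]) ** 2 / d_obs ** power
    return total


def star_tree_distances(branch_lengths):
    """Path distances on a three-taxon star tree: sum of the two branch lengths."""
    return {a: {b: branch_lengths[a] + branch_lengths[b]
                for b in branch_lengths if b != a}
            for a in branch_lengths}


# Hypothetical observed distances between three taxa.
observed = {
    "A": {"B": 0.10, "C": 0.30},
    "B": {"A": 0.10, "C": 0.32},
    "C": {"A": 0.30, "B": 0.32},
}

good_fit = star_tree_distances({"A": 0.04, "B": 0.06, "C": 0.26})
poor_fit = star_tree_distances({"A": 0.05, "B": 0.05, "C": 0.25})

print(weighted_sum_of_squares(observed, good_fit))
print(weighted_sum_of_squares(observed, poor_fit))
```

The first set of branch lengths reproduces the observed distances and scores essentially zero, while the second scores higher. A least-squares method prefers the tree and branch lengths with the lowest score.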


 

DNADIST will calculate a distance matrix, and then FITCH will run. By default, three windows will appear:

OUTFILE - the report on the phylogeny

 
 

TREEFILE - the machine-readable treefile, readable by programs such as DRAWTREE, DRAWGRAM, and ATV.

chitlll.dna.fitch.treefile.png

 
The treefile also pops up on a bltree window, allowing further tasks to be performed using the tree as input.
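
Because the treefile is machine-readable, it can also be processed by short scripts. PHYLIP writes trees in Newick (New Hampshire) format, a nested-parentheses notation. The Python sketch below lists the taxon names found in such a string. The function name and the toy tree are hypothetical, and the simple pattern assumes unquoted labels without spaces or punctuation.

```python
import re


def leaf_names(newick):
    """Return the taxon labels of a Newick string, in order of appearance."""
    if newick.count("(") != newick.count(")"):
        raise ValueError("unbalanced parentheses in tree")
    # A leaf label directly follows "(" or ","; internal node labels and
    # branch lengths follow ")" or ":" and are therefore not matched.
    return re.findall(r"(?<=[(,])\s*([^():,;\s]+)", newick)


# Hypothetical toy tree with a line break, as treefiles are often wrapped.
toy_tree = ("((taxonA:0.10,taxonB:0.12):0.05,\n"
            "(taxonC:0.20,taxonD:0.22):0.05,taxonE:0.40);")
print(leaf_names(toy_tree))
```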

TREEFILE - the treefile in the ATV tree editor.

chitlll.dna.fitch.atv.png


 
Hint: Each of these files needs to be saved separately, if you wish to keep it. Give them all the same base name, but different extensions, such as chitIII.dna.fitch.outfile and chitIII.dna.fitch.treefile.

Note: Do NOT save the contents of the ATV window using the .treefile extension, or you will overwrite the original treefile. ATV can save files in NHX (New Hampshire, extended) format, which will preserve any changes made in ATV. In most cases, you can just save the .treefile and read it into ATV, treetool, or other tree drawing programs whenever you want to work with it.
 

3. Phylogeny using amino acid sequences

Since it is also possible to construct distance matrices for multiple alignments of amino acid sequences, the same programs (FITCH, KITSCH, and NEIGHBOR) can be used to construct distance trees from proteins. The file chitIII.pro.tcoffee.gde contains the chitinase III proteins aligned using TCOFFEE. Open this file with blpalign.



Some of the parameters for construction of the distance matrix using PROTDIST are different from those for DNADIST.  These include several different methods for constructing distance matrices, as well as a choice of alternative genetic codes, where appropriate.

Once the distance matrix is constructed, there is no difference in computation of the phylogenetic tree, so all parameters are the same as for the DNA alignment.

FITCH will produce an outfile (chitIII.pro.fitch.outfile) and a treefile (chitIII.pro.fitch.treefile) similar to those with the DNA alignment.  For comparison, the treefile is shown in ATV below:


chitIII.pro.fitch.atv.swapped.png


While this may look like a different tree from that produced using the DNA alignment, the topologies (i.e., the order of branching) are identical. To prove this, we can choose the "Swap children" option in ATV, and then click on internal nodes to rotate the branches.



Comparison with the tree from the DNA alignment shows that these trees have identical topologies, and similar lengths for most branches.
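
Topologies can also be compared without rotating branches by hand. Two unrooted trees have the same topology when their internal branches divide the taxa into the same pairs of groups, whatever the order in which children are drawn. The Python sketch below applies that idea to Newick strings. It is a hedged illustration: the parser handles only simple unquoted labels, branch lengths are ignored, and the function names and toy trees are hypothetical rather than taken from the chitinase treefiles.

```python
def parse_newick(text):
    """Parse a Newick string into nested lists of leaf names (lengths ignored)."""
    text = "".join(text.split())  # remove spaces and line breaks
    pos = 0

    def skip_length():
        nonlocal pos
        if pos < len(text) and text[pos] == ":":
            pos += 1
            while pos < len(text) and text[pos] not in ",();":
                pos += 1

    def read_label():
        nonlocal pos
        start = pos
        while pos < len(text) and text[pos] not in ":,();":
            pos += 1
        return text[start:pos]

    def read_node():
        nonlocal pos
        if text[pos] == "(":
            pos += 1
            children = [read_node()]
            while text[pos] == ",":
                pos += 1
                children.append(read_node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1
            read_label()   # optional internal node label, discarded
            skip_length()
            return children
        name = read_label()
        skip_length()
        return name

    return read_node()


def leaf_set(node):
    """All leaf names below a node."""
    if isinstance(node, str):
        return frozenset([node])
    return frozenset().union(*(leaf_set(child) for child in node))


def splits(tree):
    """Set of bipartitions of the taxa induced by the internal branches."""
    everything = leaf_set(tree)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        for child in node:
            side = leaf_set(child)
            other = everything - side
            if len(side) > 1 and len(other) > 1:  # ignore branches to single leaves
                found.add(frozenset([side, other]))
            visit(child)

    visit(tree)
    return found


def same_topology(newick_a, newick_b):
    tree_a, tree_b = parse_newick(newick_a), parse_newick(newick_b)
    return leaf_set(tree_a) == leaf_set(tree_b) and splits(tree_a) == splits(tree_b)


# Hypothetical trees: the second is the first with children swapped.
tree_1 = "((A:0.1,B:0.2):0.05,(C:0.3,D:0.4):0.05,E:0.6);"
tree_2 = "(E:0.6,(D:0.4,C:0.3):0.05,(B:0.2,A:0.1):0.05);"
tree_3 = "((A:0.1,C:0.2):0.05,(B:0.3,D:0.4):0.05,E:0.6);"
print(same_topology(tree_1, tree_2))
print(same_topology(tree_1, tree_3))
```

Swapping children changes only the order in which groups are written, so tree_1 and tree_2 compare as equal, while tree_3 groups the taxa differently.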