Assignment 3 (Nov. 8,
This assignment is worth 20% of the course grade.
Due: Tuesday November 20, 2018.
One of the assumptions underlying phylogenetic analysis of a gene is
that the gene evolves as a single unit, and that all parts of the
gene evolve uniformly. Evolutionary processes such as exon
shuffling, unequal crossing over and gene conversion can invalidate
these assumptions. For example, in gene conversion between two
copies of a gene, the net effect is that one copy of the gene
overwrites the other. Gene conversion* can replace an entire copy of
a gene with another, or just part of a gene with sequence from a
different copy. If the two original copies of the gene diverged from
a single ancestral copy in the distant past, each copy would have a
distinct phylogenetic tree. As a consequence, a chimeric gene
resulting from gene conversion would give you different phylogenetic
trees, depending on which part of the gene you looked at.
*If you are not already familiar with gene conversion, see Forsdyke
Evolution Academy 01-53 Gene Conversion [http://www.youtube.com/watch?v=tjaU38sTJMU].
The file priglobin.fasta is a
FASTA file containing 10 gamma globin genes: four human and two
each from chimp, gorilla and orangutan. These sequences have been
aligned to maximize similarity. The corresponding GenBank entries
(5 GenBank entries containing 2 genes each) are contained in priglobin.gen. The genes in priglobin.fasta have been extracted
from these larger GenBank entries. Examination of the GenBank
entries will show that in primates, gamma globin genes are found
in two tandem copies.
Note: In priglobin.fasta, HumanA1 and HumanA2 are tandem
copies of the SAME locus as HumanB1 and HumanB2. That is, A1A2
is a haplotype of the same locus as B1B2. The GenBank entry for
the A1A2 haplotype is HUMGAMGLOA, and the entry for the B1B2
haplotype is HUMGAMGLOB in the GenBank file.
The question you need to address is: Is it valid to construct a
phylogenetic tree using the sequences in priglobin.fasta as a single
unit, or do different parts of the gene have distinct evolutionary
histories?
1. (5 points) Annotate the alignment,
showing the locations of important parts of the alignment,
particularly promoters, exons and introns.
a) Read priglobin.fasta into blnalign.
Run Alignment --> Reform to generate a view of the
alignment. Print 100 nucleotides per line, and make sure that
conserved sites are printed as dots (.), and gaps as dashes (-).
Save this file as priglobin.reform.
b) Read priglobin.reform into LibreOffice Writer. Format the
sequence to fit a standard letter-sized page as follows: Select all
and change the font to a fixed font, such as Liberation Mono, 8
point. You will probably need to adjust top, bottom, left and right
margins to fit the alignment.
c) Using the GenBank file as a guide, annotate the alignment for
features such as exons and introns. It is critical to realize that
each GenBank entry contains two tandem copies of each gene. As well,
your alignment contains gaps. For this reason, the coordinates in
the alignment will not correspond exactly with the coordinates found
in the GenBank Features Table. However, if you use the Chimp1
sequence as a reference sequence, it should be straightforward to
find the beginnings and ends of features such as exons and introns.
Add lines to your alignment document showing the start and stop
points for each feature. For an example of annotation, see labeling.odt. Save your document as
priglobin.odt. Be creative in how you annotate, but don't waste a
lot of time on this step. The main point is to have a well-annotated
copy of the alignment to make decisions on further analysis steps.
Although you can look at your document on the screen, it will
probably be most useful to print a copy.
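The coordinate shift described in step c) can also be worked out programmatically. The following Python sketch is one possible approach, not part of the course tools: it converts a position in the ungapped reference sequence into an alignment column by counting non-gap characters. The function names and the entry_offset parameter are hypothetical, and the sketch assumes that gaps are written as dashes and that you know how many bases of the GenBank entry precede the extracted gene.

```python
def ungapped_to_column(aligned_ref, position, gap="-"):
    """Return the 1-based alignment column that holds base number
    `position` (1-based) of the ungapped reference sequence."""
    if position < 1:
        raise ValueError("position must be 1 or greater")
    seen = 0
    for column, char in enumerate(aligned_ref, start=1):
        if char != gap:
            seen += 1
            if seen == position:
                return column
    raise ValueError("position lies beyond the end of the reference")


def feature_to_columns(aligned_ref, start, end, entry_offset=0):
    """Map a feature's GenBank start..end onto alignment columns.

    entry_offset (an assumption of this sketch) is the number of bases
    in the GenBank entry that come before the first base of the
    extracted gene, so that start - entry_offset is a position within
    the gene itself.
    """
    return (
        ungapped_to_column(aligned_ref, start - entry_offset),
        ungapped_to_column(aligned_ref, end - entry_offset),
    )
```

In practice aligned_ref would be the aligned Chimp1 sequence read from priglobin.fasta, and the start and end values would come from the Features Table of the corresponding GenBank entry.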
d) Although the reform output is enough to
assess polymorphism in different regions of the gene,
Jalview has several features that make it easy to visualize
the conserved vs. divergent regions. With your alignment in
blnalign, launch Jalview using Alignment --> Jalview.
First, choose Select --> Select All. Next, choose
Color --> Nucleotide colour scheme. This colours
conserved bases and leaves rare bases with a white
background, so they stick out prominently. Also choose Format
--> Show non-conserved. This will show ONLY
those nucleotides that differ from the consensus, which
helps bring out the polymorphism in the alignment. Next open
the View --> Overview window. This gives a
coloured low-resolution view of the entire alignment.
There is a red box around the region shown in the alignment
editor, and you can pull the red box left or right and the
alignment scrolls with it. This is a really great way to get
a feeling for the conservation in an alignment.
2. (5 points) Based on the results
above, create files containing specific regions of the alignment
to be used to test the hypothesis that the different parts of
these genes evolved independently.
The goal, in essence, is to find out whether different parts of the
alignment give different phylogenetic trees. Therefore, it will be
necessary to build trees using different parts of the alignment.
It is up to you to decide which regions of the alignment are best
to use. Explain your choices. Two considerations would include:
- The locations of important features, coding vs. non-coding
regions, promoters, introns, exons etc.
- Which are the most informative areas of the alignment? Which
are the least informative?
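To put a number on how informative a region is, one option is to count parsimony-informative columns, meaning columns in which at least two different bases each occur in at least two sequences. The Python sketch below illustrates that count under stated assumptions: the function names are hypothetical, only A, C, G and T are counted (gaps and ambiguity codes are ignored), and the aligned sequences are passed in as plain strings of equal length.

```python
from collections import Counter


def is_informative(column):
    """True if at least two different bases each occur at least twice."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    return sum(1 for n in counts.values() if n >= 2) >= 2


def count_informative(sequences, start, end):
    """Count parsimony-informative columns in alignment columns
    start..end (1-based, inclusive)."""
    region = [seq[start - 1:end] for seq in sequences]
    return sum(is_informative("".join(col)) for col in zip(*region))
```

Comparing these counts across candidate regions is one way to check that the regions you choose carry roughly similar amounts of signal.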
Next, create FASTA files for each region you wish to analyze.
Specific regions of the alignment can be extracted using readseq.
For example, if you wanted to extract the part of the alignment from
500 to 1000, use the following command to send output to a file:
readseq -extract=500..1000 -f fasta -o priglobin500-1000.fasta priglobin.fasta
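If you want to check what a column range contains, the same kind of extraction can be sketched in a few lines of Python. This is an illustration only, not a replacement for readseq: the function names are hypothetical, and the sketch assumes that positions are alignment columns counted from 1 with gaps included. Compare the result against the readseq output before relying on it.

```python
def read_fasta(lines):
    """Parse FASTA text (an iterable of lines) into (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:], []
        else:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def extract_columns(records, start, end):
    """Keep alignment columns start..end (1-based, inclusive) of every record."""
    return [(name, seq[start - 1:end]) for name, seq in records]


def write_fasta(records, width=60):
    """Format (name, sequence) pairs as FASTA text, wrapped at `width`."""
    out = []
    for name, seq in records:
        out.append(">" + name)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

A typical use would be to open priglobin.fasta, pass the file object to read_fasta, call extract_columns with the range you chose, and write the string returned by write_fasta to a new file.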
3. (5 points) Test the hypothesis by
constructing maximum likelihood trees.
In the tutorial entitled Phylogenetic
Analysis Using Parsimony and Maximum Likelihood, we saw a
good compromise between the speed of parsimony and the rigor of
maximum likelihood.
a) Construct a maximum likelihood tree of the entire alignment. Run
DNAPARS to generate a tree topology. Keep in mind that the
branch lengths from the consensus tree are bootstrap replicate
numbers, not real branch lengths. Consequently, it is still
necessary to save the consensus tree from the bootstrapping step,
and then run DNAML using the bootstrap consensus tree as a User
Tree, to generate a final tree with branch lengths.
Create image files as described in tree_images.html.
b) Repeat the process for each region of the alignment which you
extracted in step 2 above.
c) Compare the trees from different regions of the sequence.
Probably the best criterion for comparing trees from different
regions is the bootstrap replicate numbers from DNAPARS. If a region
gives a consistent tree across all bootstrapped replicates, it
probably has evolved as a coherent unit over time. If few branches
on the tree are consistently replicated, the region probably has a
more complex evolutionary history. Also, if branch lengths are long
in one region and short in another, it is evidence that different
mutation rates occur between the two regions. (This assumes that
regions with roughly equal numbers of informative positions are
compared.)
d) Evaluate how consistent the trees are between different regions
of the gene using the Phylip TREEDIST
program. In bltree, import all your treefiles using File --> Import Treefile, and
select the trees you wish to compare. Next, choose Evaluate --> Treedist. The
output should be saved in a file called priglobin.treedist.
Present your TREEDIST results as shown in the example in sample_tree.html.
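To help interpret the numbers, it may be useful to see how a topological tree distance is computed. The Python sketch below calculates the symmetric difference (Robinson-Foulds) distance, which is one of the distances described in the PHYLIP TREEDIST documentation. Whether Evaluate --> Treedist reports this distance or the branch score distance depends on its settings, so treat this as an illustration of the idea, not a reimplementation of TREEDIST. The function names are hypothetical, and the minimal Newick parser assumes parenthesized trees with unquoted names.

```python
import re


def leaf_sets(newick):
    """Return (set of all leaf names, list of leaf sets, one per clade)."""
    tokens = re.findall(r"[(),;]|[^(),;]+", newick.strip())
    stack, clades, leaves = [], [], set()
    previous = None
    for token in tokens:
        if token == "(":
            stack.append(set())
        elif token == ")":
            finished = stack.pop()
            clades.append(frozenset(finished))
            if stack:
                stack[-1] |= finished
        elif token not in ",;" and previous != ")":
            # A leaf; text directly after ")" is a clade label or
            # branch length and is skipped.
            name = token.split(":")[0].strip()
            if name:
                leaves.add(name)
                stack[-1].add(name)
        previous = token
    return leaves, clades


def splits(newick):
    """Return (leaves, set of non-trivial bipartitions) for a tree.

    Each bipartition is stored as the side that does not contain the
    alphabetically first leaf, so the rooting does not matter."""
    leaves, clades = leaf_sets(newick)
    anchor = min(leaves)
    result = set()
    for clade in clades:
        side = frozenset(leaves - clade) if anchor in clade else clade
        if 1 < len(side) < len(leaves) - 1:
            result.add(side)
    return leaves, result


def symmetric_difference(newick_a, newick_b):
    """Number of bipartitions found in one tree but not in the other."""
    leaves_a, splits_a = splits(newick_a)
    leaves_b, splits_b = splits(newick_b)
    if leaves_a != leaves_b:
        raise ValueError("trees must contain the same set of leaves")
    return len(splits_a ^ splits_b)
```

A distance of 0 means the two trees have the same branching pattern. Larger values mean more groupings are found in one tree and not in the other, which is the kind of disagreement you would expect between regions with different evolutionary histories.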
4. (3 points) Conclusions - What have you
Give a brief summary of your conclusions from the data, but be
sure to address the following:
- What can you conclude from the TREEDIST results?
- If different parts of the alignment show different
phylogenetic trees, what do the data tell you about their
evolutionary histories? Are there parts of the gene that have
evolved together, and other parts for which a different
hypothesis must be postulated?
5. (2 points) Presentation.
Part of the grade will be determined by the quality of your
web page(s) for the assignment, including:
- The assignment page(s) must be accessible at either of
No other URLs will be accepted.
- All links must work, and all graphics must display. Each time
I have to contact you to fix something, 1 point will be
deducted. You get no credit for anything I can't access.
- Pay attention to the organizational and stylistic hints found
2. Do what it takes to make it easy to read and to
understand the points you wish to get across.
How to get
1. Create a directory called either
public_html/PLNT4610/as3 or public_html/PLNT7690/as3. Make this
directory world-readable and world-searchable.
2. Do all work in the as3
directory. That way, all your files will already be where they
need to be.
you need to complete your assignment
Your report should include links to the following:
- priglobin.reform, priglobin.odt
- For each tree in your report you should create a table similar
to that found in sample_tree.html.
Each table contains a tree for each region, along with links to
the output files using a consistent naming convention. For
example, if the alignment for positions 1 - 500 was in a file
called priglobin1-500.fasta, you would have the following files:
How to post it
1. Create a new HTML file called as3/as3.html. Your web page
for Assignment 3 should take the form of a report that makes it
easy to figure out what you did.
2. Make all files in the as3 directory world-readable. (chmod
3. Edit either PLNT4610.html or PLNT7690.html to include a link
4. In the Firefox or SeaMonkey Browser, go to your home page
and follow all hypertext links to your assignment, and test all
links to your output files.
5. If you paste excerpts of output into a web page, change the
output section to a fixed font such as Courier, or
set the style to "Preformat". The output from most sequence
programs assumes that each character takes up an equal amount of
width, which is not true for proportional fonts such as Helvetica or Times.
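If you generate parts of your page with a script, the preformatting step can be automated. The short Python sketch below wraps a block of program output in an HTML pre element, which browsers display in a fixed-width font, and escapes characters such as < and & that would otherwise be read as markup. The function name is hypothetical.

```python
import html


def as_preformatted(text):
    """Wrap program output in <pre> tags, escaping HTML special characters."""
    return "<pre>\n" + html.escape(text.rstrip("\n")) + "\n</pre>"
```

The returned string can be pasted into the body of your HTML file where the excerpt should appear.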
Academic integrity: Your work is assumed to be your own
original work. All University policies regarding academic
On the day the assignments are due, I should be able to just go
to each person's web site and find the output. You don't need to
send me an email message saying that your assignment is complete.
If you choose not to hand in this assignment, you don't need to do