Assignment 3 (Nov. 9,
This assignment is worth 20% of the course grade.
Due: Tuesday November 21, 2017.
One of the assumptions underlying phylogenetic analysis of a gene is
that the gene evolves as a single unit, and that all parts of the
gene evolve uniformly. Evolutionary processes such as exon
shuffling, unequal crossing over and gene conversion can invalidate
these assumptions. For example, in gene conversion between two
copies of a gene, the net effect is that one copy of the gene
overwrites the other. Gene conversion* can replace an entire copy of
a gene with another, or just part of a gene with sequence from a
different copy. If the two original copies of the gene diverged from
a single ancestral copy in the distant past, each copy would have a
distinct phylogenetic tree. As a consequence, a chimeric gene
resulting from gene conversion would give you different phylogenetic
trees, depending on which part of the gene you looked at.
*If you are not already familiar with gene conversion, see Forsdyke
Evolution Academy 01-53 Gene Conversion [http://www.youtube.com/watch?v=tjaU38sTJMU].
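The chimeric gene described above can be illustrated with a toy calculation. This is a minimal Python sketch using two made-up, already-aligned sequences (not real globin data). The helpers `gene_conversion` and `p_distance` are hypothetical. The p-distance (proportion of differing sites) is only a crude stand-in for a tree, and the converted region (columns 6 to 15) is an arbitrary choice.

```python
def gene_conversion(recipient: str, donor: str, start: int, end: int) -> str:
    """Return a copy of `recipient` in which columns start..end-1 (0-based,
    half-open) are overwritten with the corresponding columns of `donor`."""
    if len(recipient) != len(donor):
        raise ValueError("sequences must be aligned to the same length")
    return recipient[:start] + donor[start:end] + recipient[end:]


def p_distance(a: str, b: str) -> float:
    """Proportion of aligned positions at which two sequences differ."""
    diffs = sum(1 for x, y in zip(a, b) if x != y)
    return diffs / len(a)


# Hypothetical toy paralogs that diverged long ago (not real sequence).
copy1 = "ACGTACGTACGTACGTACGT"
copy2 = "ACCTACGAACGTTCGTACCT"

# copy1 overwrites the middle of copy2 (alignment columns 6-15).
chimera = gene_conversion(copy2, copy1, 5, 15)

regions = {"left": (0, 5), "converted": (5, 15), "right": (15, 20)}
for label, (s, e) in regions.items():
    print(label, p_distance(chimera[s:e], copy1[s:e]))
```

Running it prints 0.2 for both flanks and 0.0 for the converted region. Compared with copy1, the chimera is identical in the middle and diverged at the ends. This region-dependent signal is what the assignment asks you to look for.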
The file priglobin.gde is a GDE file
containing 10 gamma globin genes: four human and two each from
chimp, gorilla and orangutan. These sequences have been aligned to
maximize similarity. The corresponding GenBank entries (5 GenBank
entries containing 2 genes each) are contained in priglobin.gen. The genes in
priglobin.gde have been extracted from these larger GenBank
entries. Examination of the GenBank entries will show that in
primates, gamma globin genes are found in two tandem copies.
The question you need to address is: Is it valid to construct a
phylogenetic tree using the sequences in priglobin.gde
as a single unit, or do different parts of the gene have
distinct evolutionary histories?
1. (5 points) Annotate the alignment,
showing the locations of important parts of the alignment,
particularly promoters, exons and introns.
a) Read priglobin.gde into blnalign. Run Alignment
--> Reform to generate a view of the alignment. Print 100
nucleotides per line, and make sure that conserved sites are printed
as dots (.), and gaps as dashes (-). Save this file as
b) Read priglobin.reform into LibreOffice Writer. Format the
sequence to fit a standard letter-sized page as follows: select all
and change the font to a fixed-width font, such as Liberation Mono, 8
point. You will probably need to adjust the top, bottom, left and right
margins to fit the alignment.
c) Using the GenBank file as a guide, annotate the alignment for
features such as exons and introns. It is critical to realize that
each GenBank entry contains two tandem copies of the gamma globin gene. As well,
your alignment contains gaps. For these reasons, the coordinates in
the alignment will not correspond exactly to the coordinates found
in the GenBank Features Table. However, if you use the Chimp1
sequence as a reference sequence, it should be straightforward to
find the beginnings and ends of features such as exons and introns.
Add lines to your alignment document showing the start and stop
points for each feature. For an example of annotation, see labeling.odt. Save your document as
priglobin.odt. Be creative in how you annotate, but don't waste a
lot of time on this step. The main point is to have a well-annotated
copy of the alignment to guide decisions about further analysis steps.
Although you can look at your document on the screen, it will
probably be most useful to print a copy.
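Translating GenBank coordinates into alignment columns is mostly bookkeeping. You count the non-gap characters in the reference row (e.g. Chimp1) until you reach the base you want. Below is a minimal Python sketch. It assumes gaps are written as `-` and that you know `gene_start`, a hypothetical parameter giving the GenBank coordinate of the first base of the gene copy extracted into the alignment. It also assumes the extracted gene has the same orientation as the GenBank entry.

```python
from __future__ import annotations


def build_position_map(aligned_seq: str, gap: str = "-") -> dict[int, int]:
    """Map 1-based ungapped residue numbers to 1-based alignment columns."""
    mapping: dict[int, int] = {}
    residue = 0
    for column, char in enumerate(aligned_seq, start=1):
        if char != gap:
            residue += 1
            mapping[residue] = column
    return mapping


def genbank_to_column(genbank_pos: int, gene_start: int, mapping: dict[int, int]) -> int:
    """Convert a GenBank Features Table coordinate into an alignment column.

    gene_start is the GenBank coordinate of the first base of the extracted
    gene copy (hypothetical; read it from the entry's Features Table).
    """
    residue = genbank_pos - gene_start + 1
    if residue not in mapping:
        raise ValueError(f"position {genbank_pos} lies outside the aligned gene copy")
    return mapping[residue]


# Toy reference row with gaps (not real Chimp1 sequence).
ref_row = "AT--GC-CA"
position_map = build_position_map(ref_row)
print(genbank_to_column(1003, 1000, position_map))
```

In this example, base 1003 of the entry is the 4th base of the gene copy. Because of the two-column gap earlier in the row, it sits in alignment column 6.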
d) Although the reform output is enough to
assess polymorphism in different regions of the gene,
Jalview has several features that make it easy to visualize
the conserved vs. divergent regions. With your alignment in
blnalign, launch Jalview using Alignment --> Jalview.
First, choose Select --> Select All. Next, choose
Color --> Nucleotide colour scheme. This leaves
conserved bases coloured and gives rare bases a white
background, so they stick out prominently. Also choose Format
--> Show non-conserved. This will show ONLY
those nucleotides that differ from the consensus, which
helps bring out the polymorphism in the alignment. Next open
the View --> Overview window. This gives a
coloured low-resolution view of the entire alignment.
There is a red box around the region shown in the alignment
editor, and you can pull the red box left or right and the
alignment scrolls with it. This is a really great way to get
a feeling for the conservation in an alignment.
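The idea behind the reform dots and Jalview's "Show non-conserved" display can be sketched in a few lines of Python. The sketch computes a per-column majority consensus, then replaces every base that matches it with a dot. This is a simplified illustration, not Jalview's actual algorithm. Gaps are counted like any other character when building the consensus, and ties go to the character seen first in the column.

```python
from __future__ import annotations

from collections import Counter


def consensus(alignment: list[str]) -> str:
    """Majority-rule consensus; ties go to the character seen first in the column."""
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*alignment))


def mask_conserved(alignment: list[str], reference: str, match: str = ".", gap: str = "-") -> list[str]:
    """Replace bases that agree with `reference` by `match`; gaps and differing bases are kept."""
    return [
        "".join(match if base == ref and base != gap else base for base, ref in zip(seq, reference))
        for seq in alignment
    ]


toy = ["ACGTA", "ACGTT", "ACCTA"]  # hypothetical toy alignment
cons = consensus(toy)
print(cons)
for row in mask_conserved(toy, cons):
    print(row)
```

Only the two polymorphic bases (the `T` in row 2 and the `C` in row 3) remain visible. On the real alignment, this same effect makes variable regions stand out.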
2. (5 points) Based on the results
above, create files containing specific regions of the alignment
to be used to test the hypothesis that different parts of
these genes evolved independently.
The goal, in essence, is to find out whether different parts of the
alignment give different phylogenetic trees. Therefore, it will be
necessary to build trees using different parts of the alignment.
It is up to you to decide which regions of the alignment are best
to use. Explain your choices. Two considerations would include:
- The locations of important features: coding vs. non-coding
regions, promoters, introns, exons etc.
- Which are the most informative areas of the alignment? Which
are the least informative?
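A common way to quantify how informative a region is uses parsimony-informative sites. These are columns in which at least two different bases each occur in at least two sequences. Below is a minimal Python sketch that counts them in fixed-size windows. The window size is arbitrary, gaps are ignored, and ambiguity codes such as N are treated as ordinary characters; all of these are simplifying assumptions.

```python
from __future__ import annotations

from collections import Counter


def is_informative(column: str, gap: str = "-") -> bool:
    """True if at least two different bases each occur in at least two sequences."""
    counts = Counter(base for base in column.upper() if base != gap)
    return sum(1 for n in counts.values() if n >= 2) >= 2


def informative_per_window(alignment: list[str], window: int) -> list[tuple[int, int, int]]:
    """Return (first column, last column, informative sites) per window, 1-based inclusive."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    results = []
    for start in range(0, length, window):
        end = min(start + window, length)
        count = sum(is_informative("".join(seq[i] for seq in alignment)) for i in range(start, end))
        results.append((start + 1, end, count))
    return results


toy = ["AAGT", "AAGC", "CAGT", "CATC"]  # hypothetical toy alignment
print(informative_per_window(toy, 2))
```

Column 3 (`GGGT`) is not informative, because the `T` occurs only once. These counts are also useful when comparing branch lengths between regions with roughly equal numbers of informative positions.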
First, save the alignment in a FASTA file called priglobin.fsa.
Next, create FASTA files for each region you wish to analyze.
Specific regions of the alignment can be extracted using readseq.
For example, if you wanted to extract the part of the alignment from
500 to 1000, and your FASTA file was named priglobin.fsa, use the
following command, which writes the output to priglobin500-1000.fsa:
readseq -extract=500..1000 -f fasta -o priglobin500-1000.fsa priglobin.fsa
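If readseq is not available, the same column extraction can be sketched in Python. This is not a replacement for readseq. It assumes an aligned FASTA file in which every sequence has the same length. It treats the range as 1-based and inclusive at both ends, which is how the 500..1000 range above is assumed to be interpreted. The function names are hypothetical.

```python
from __future__ import annotations


def read_fasta(path: str) -> list[tuple[str, str]]:
    """Read a FASTA file into (name, sequence) pairs, joining wrapped lines."""
    records, name, chunks = [], None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if name is not None:
                    records.append((name, "".join(chunks)))
                name, chunks = line[1:], []
            elif line:
                chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def extract_columns(records: list[tuple[str, str]], first: int, last: int) -> list[tuple[str, str]]:
    """Keep alignment columns first..last (1-based, inclusive) of every sequence."""
    return [(name, seq[first - 1:last]) for name, seq in records]


def write_fasta(records: list[tuple[str, str]], path: str, width: int = 60) -> None:
    """Write (name, sequence) pairs as FASTA, wrapping sequences at `width` characters."""
    with open(path, "w") as handle:
        for name, seq in records:
            handle.write(f">{name}\n")
            for i in range(0, len(seq), width):
                handle.write(seq[i:i + width] + "\n")
```

For example, the following call would produce a file comparable to the readseq output:

`write_fasta(extract_columns(read_fasta("priglobin.fsa"), 500, 1000), "priglobin500-1000.fsa")`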
3. (5 points) Test the hypothesis by
constructing maximum likelihood trees.
In the tutorial entitled Phylogenetic
Analysis Using Parsimony and Maximum Likelihood, we saw a
good compromise between the speed of parsimony and the rigor of
a) Construct a maximum likelihood tree of the entire alignment. Run
DNAPARS with bootstrapping to generate a tree topology. Keep in mind that the
branch lengths in the bootstrap consensus tree are bootstrap replicate
numbers, not real branch lengths. Consequently, it is
necessary to save the consensus tree from the bootstrapping step,
and then run DNAML using the bootstrap consensus tree as a User
Tree to generate a final tree with branch lengths.
Create image files as described in tree_images.html.
b) Repeat the process for each region of the alignment which you
extracted in step 2 above.
c) Compare the trees from different regions of the sequence.
Probably the best criterion for comparing trees from different
regions is the bootstrap replicate numbers from DNAPARS. If a region
gives a consistent tree across all bootstrapped replicates, it
probably has evolved as a coherent unit over time. If few branches
on the tree are consistently replicated, the region probably has a
more complex evolutionary history. Also, if branch lengths are long
in one region and short in another, this suggests that the two
regions have evolved at different rates. (This assumes that
regions of roughly equal numbers of informative positions are
d) Evaluate how consistent the trees are between different regions
of the gene using the Phylip TREEDIST
program. In bltree, import all your treefiles using File --> Import Treefile, and
select the trees you wish to compare. Next, choose Evaluate --> Treedist. The
output should be saved in a file called priglobin.treedist.
Present your TREEDIST results as shown in the example in sample_tree.html.
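TREEDIST can report either the Branch Score distance or the Symmetric Difference between trees. The Symmetric Difference (Robinson-Foulds) distance rests on one idea. Each internal branch splits the taxa into two groups, and the distance is the number of such splits found in one tree but not the other. Below is a minimal Python sketch of that measure for Newick tree strings. It ignores branch lengths, does not handle quoted labels or files containing several trees, and its helper names are hypothetical. Treat it as a way to check your intuition, and use TREEDIST for the actual assignment.

```python
from __future__ import annotations

import re

PUNCTUATION = {"(", ")", ",", ";", ":"}
TOKEN = re.compile(r"\s*([(),;:]|[^(),;:\s]+)")


def parse_newick(text: str):
    """Parse Newick into nested tuples (internal nodes) and strings (leaf names).
    Branch lengths and internal labels such as bootstrap values are discarded."""
    tokens = TOKEN.findall(text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            children = [node()]
            while tokens[pos] == ",":
                pos += 1
                children.append(node())
            if tokens[pos] != ")":
                raise ValueError(f"expected ')' but found {tokens[pos]!r}")
            pos += 1
            if pos < len(tokens) and tokens[pos] not in PUNCTUATION:
                pos += 1  # skip an internal node label (e.g. a bootstrap count)
            result = tuple(children)
        else:
            result = tokens[pos]
            pos += 1
        if pos < len(tokens) and tokens[pos] == ":":
            pos += 2  # skip ':' and the branch length that follows it
        return result

    return node()


def splits(tree) -> tuple[frozenset[str], set[frozenset[str]]]:
    """Return (all taxa, non-trivial bipartitions). Each bipartition is stored as
    the side that does not contain a fixed reference taxon, so rooted and
    unrooted drawings of the same tree give the same set."""
    clades: list[frozenset[str]] = []

    def walk(node) -> frozenset[str]:
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        clades.append(clade)
        return clade

    taxa = walk(tree)
    reference = min(taxa)
    result = set()
    for clade in clades:
        side = clade if reference not in clade else taxa - clade
        if 2 <= len(side) <= len(taxa) - 2:
            result.add(side)
    return taxa, result


def symmetric_difference(newick_a: str, newick_b: str) -> int:
    """Number of bipartitions present in exactly one of the two trees."""
    taxa_a, splits_a = splits(parse_newick(newick_a))
    taxa_b, splits_b = splits(parse_newick(newick_b))
    if taxa_a != taxa_b:
        raise ValueError("trees must contain the same taxa")
    return len(splits_a ^ splits_b)


print(symmetric_difference("((A,B),(C,D),E);", "((A,C),(B,D),E);"))
```

The two toy trees share no internal branches, so each contributes two unmatched splits and the distance is 4. Identical topologies give 0, regardless of branch lengths or bootstrap labels.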
4. (3 points) Conclusions - What have you
Give a brief summary of your conclusions from the data, but be
sure to address the following:
- What can you conclude from the TREEDIST results? If different
parts of the alignment show different phylogenetic trees, what
do the data tell you about their evolutionary histories? Are
there parts of the gene that have evolved together, and other
parts for which a different hypothesis must be postulated?
- Which of the following two hypotheses seems more consistent
with the results:
- a) The HumanA1-A2 repeat and the HumanB1-B2 repeat have a
common ancestor which also had copies 1 and 2 as a tandem
- b) At each locus, A and B, separate tandem duplications
occurred to generate copies A1-A2 and B1-B2, respectively.
5. (2 points) Presentation.
Part of the grade will be determined by the quality of your
web page(s) for the assignment, including:
- The assignment page(s) must be accessible at either of
No other URLs will be accepted.
- All links must work, and all graphics must display. Each time
I have to contact you to fix something, 1 point will be
deducted. You get no credit for anything I can't access.
- Pay attention to the organizational and stylistic hints found
2. Do what it takes to make it easy to read and to
understand the points you wish to get across.
How to get
1. Create a directory called either
public_html/PLNT4610/as3 or public_html/PLNT7690/as3. Make this
directory world-readable and world-executable.
2. Do all work in the as3
directory. That way, all your files will already be where they
need to be.
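The Unix chmod command is the usual way to set these permissions. For illustration, the same settings can be sketched in Python with pathlib and the stat module. The as3 directory gets mode 755: owner read/write/execute, everyone else read/execute. Each file in it gets read permission added for group and others, which also covers step 2 of the posting instructions. The function names and paths are assumptions; substitute PLNT7690 if that is your course.

```python
import stat
from pathlib import Path

# rwxr-xr-x: world-readable and world-executable
DIRECTORY_MODE = stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP | stat.S_IROTH | stat.S_IXOTH


def make_assignment_dir(directory: Path) -> None:
    """Create `directory` (and any missing parents) and set its mode to 755."""
    directory.mkdir(parents=True, exist_ok=True)
    directory.chmod(DIRECTORY_MODE)


def make_files_world_readable(directory: Path) -> None:
    """Add read permission for group and others to every regular file in `directory`."""
    for path in directory.iterdir():
        if path.is_file():
            current = stat.S_IMODE(path.stat().st_mode)
            path.chmod(current | stat.S_IRGRP | stat.S_IROTH)
```

Example usage, with an assumed path:

`make_assignment_dir(Path.home() / "public_html" / "PLNT4610" / "as3")`

Run `make_files_world_readable` on the same path once your output files are in place.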
you need to complete your assignment
Your report should include links to the following:
- priglobin.reform, priglobin.odt, priglobin.fsa
- For each tree in your report you should create a table similar
to that found in sample_tree.html.
Each table contains a tree for each region, along with links to
the output files using a consistent naming convention. For
example, if the alignment for positions 1 - 500 was in a file
called priglobin1-500.fsa, you would have the following files:
How to post it
1. Create a new HTML file called as3/as3.html. Your web page
for Assignment 3 should take the form of a report that makes it
easy to figure out what you did.
2. Make all files in the as3 directory world-readable. (chmod
3. Edit either PLNT4610.html or PLNT7690.html to include a link
4. In the Firefox or SeaMonkey browser, go to your home page
and follow all hypertext links to your assignment, and test all
links to your output files.
5. If you paste excerpts of output into a web page, change the
output section to a fixed font such as Courier, or
set the style to "Preformat". The output from most sequence
programs assumes that each character takes up an equal amount of
width, which is not true for proportional fonts such as Helvetica or Times.
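In hand-written HTML, the usual way to get a fixed-width font is to wrap the output in a `<pre>` element. The following minimal Python sketch uses a hypothetical helper name. It also escapes characters such as `<`, `>` and `&`, which could otherwise be interpreted as markup.

```python
import html


def preformatted(program_output: str) -> str:
    """Wrap program output in <pre>...</pre> so browsers keep its column alignment.

    html.escape() turns <, > and & (and quotes) into entities, so any of these
    characters in the output display literally instead of being read as markup.
    """
    return "<pre>\n" + html.escape(program_output) + "\n</pre>"


print(preformatted("Human1  ACGT-A\nChimp1  ....-."))
```

Browsers ignore a newline immediately after the `<pre>` start tag, so the first output line is not shifted down.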
Academic integrity: Your work is assumed to be your own
original work. All University policies regarding academic
On the day the assignments are due, I should be able to just go
to each person's web site and find the output. You don't need to
send me an email message saying that your assignment is complete.
If you choose not to hand in this assignment, you don't need to do