Coalescence package for Mesquite
Wayne Maddison
August 2005
This package of modules and library classes provides a few basic calculations,
simulations and visualizations concerning gene tree coalescence in population
genetics. Much more could be done, and there are opportunities
for other programmers to contribute coalescence calculations
to Mesquite.
Overview
Just as MacClade was designed to provide tools to ask "what if the phylogenetic
history had been like this", Mesquite is designed to extend such questions
to other realms such as population genetics. Using, for instance, the insert
node and the lineage width tools in the Tree Window, one can construct a population
history with various expansions and contractions, and explore its consequences
on its contained gene trees via simulations. The best way to understand what
the package can do is to look at the example files (see below).
The coalescence package currently includes:
- Simple neutral coalescence simulations. Coalescent gene trees are simulated either within a single population, or within a tree of populations/species. These simulate the gene tree itself (topology and branch lengths); to simulate nucleotide sequence data, you'll need to simulate character evolution on these gene trees using the stochastic character evolution package.
- Reconstruction of the history of a gene tree within a species tree so as to minimize deep coalescences, and a tree drawing routine for species trees that draws their gene trees coalescing within them.
- Counting of Slatkin & Maddison's "s" for the discord between a gene tree and the different populations from which the genes were sampled.
- Counting of "deep coalescences" in the style of Maddison (1997, Syst. Biol.).
- Various other modules that could be used outside of the context of coalescence, but which otherwise aren't part of the basic package of Mesquite modules (e.g., insert node, tree depth).
Thus, the modules are mostly focused on the topology of the gene tree and its relationship with a species tree. As shown in the examples, the package can be used to test some basic hypotheses in population genetics, especially hypotheses of population histories.
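To make the deep coalescence count concrete, here is a minimal Python sketch. It is not Mesquite's implementation. It assumes a rooted gene tree, trees written as nested tuples, and a dictionary (here called to_species, a hypothetical name) associating each gene with its species. For every branch of the species tree it counts the gene lineages that leave the top of the branch and adds up the lineages beyond the first. Automatic resolution of polytomies and unrooted gene trees are not handled.

```python
def leaf_sets(tree):
    """Return one frozenset of leaf names per node, in post-order (root last)."""
    if not isinstance(tree, tuple):
        return [frozenset([tree])]
    sets, union = [], frozenset()
    for child in tree:
        child_sets = leaf_sets(child)
        sets.extend(child_sets)
        union |= child_sets[-1]
    sets.append(union)
    return sets


def gene_edges(tree, to_species):
    """Return (species below this node, list of (child_set, parent_set) edges)."""
    if not isinstance(tree, tuple):
        return frozenset([to_species[tree]]), []
    edges, child_sets = [], []
    for child in tree:
        species_below, child_edges = gene_edges(child, to_species)
        child_sets.append(species_below)
        edges.extend(child_edges)
    here = frozenset().union(*child_sets)
    edges.extend((s, here) for s in child_sets)
    return here, edges


def deep_coalescences(species_tree, gene_tree, to_species):
    """Sum, over species-tree branches, of (lineages leaving the branch - 1)."""
    branches = leaf_sets(species_tree)[:-1]  # every node except the root
    _, edges = gene_edges(gene_tree, to_species)
    cost = 0
    for cluster in branches:
        # A gene lineage leaves this branch if it sits inside the cluster
        # but its parent node does not.
        lineages = sum(1 for child, parent in edges
                       if child <= cluster and not parent <= cluster)
        cost += max(lineages - 1, 0)
    return cost


species = (("A", "B"), "C")
assoc = {"a": "A", "b": "B", "c": "C"}
print(deep_coalescences(species, (("a", "b"), "c"), assoc))  # concordant: 0
print(deep_coalescences(species, (("a", "c"), "b"), assoc))  # discordant: 1
```

In the discordant case the lineages of a and b both pass unmerged through the branch ancestral to A and B, which gives the single extra lineage.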
Conventions
Effective population size is that of the genes themselves, as if the organisms were haploid. Branch lengths of a population or species tree are treated as measured in units of generations. When a gene tree is drawn within a species tree, its branches are drawn in green, unless polytomies were automatically resolved to optimize fit to the species tree, in which case the resolved branches are drawn in magenta.
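As an illustration of what these conventions imply, the following Python sketch uses the standard continuous-time approximation of the neutral coalescent. With a haploid effective size N and time in generations, the waiting time while k lineages remain is exponential with rate k(k-1)/(2N). This is an assumption about the usual textbook model, not a description of Mesquite's code, and the function names are hypothetical.

```python
import random


def simulate_coalescence_times(n_genes, pop_size, rng=None):
    """Return the waiting times (generations) between successive coalescences.

    pop_size is a haploid effective population size, so the rate with
    k lineages is k*(k-1)/(2*pop_size) per generation.
    """
    rng = rng or random.Random()
    times = []
    k = n_genes
    while k > 1:
        rate = k * (k - 1) / (2.0 * pop_size)
        times.append(rng.expovariate(rate))
        k -= 1
    return times


def expected_tree_depth(n_genes, pop_size):
    """Expected time to the most recent common ancestor: 2N(1 - 1/n)."""
    return 2.0 * pop_size * (1.0 - 1.0 / n_genes)


rng = random.Random(1)
waits = simulate_coalescence_times(5, 1000, rng)
print(len(waits), "coalescent events; simulated depth:", sum(waits))
print("expected depth:", expected_tree_depth(5, 1000))
```

Under these assumptions, a sample of 5 genes from a population of effective size 1000 has an expected depth of 1600 generations, although individual simulated trees vary widely around that value.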
Modules
The modules included are:
Simulations:
- NeutralCoalescence ("Neutral Coalescence") -- Simulates neutral coalescence. Employed by CoalescentTrees and ContainedCoalescence in order to simulate gene trees.
- CoalescentTrees ("Coalescent Trees") -- Supplies trees simulated by a coalescent process. The effective population size is a parameter that can be adjusted.
- ContainedCoalescence ("Contained Coalescence within Current Tree") -- Supplies trees simulated by a coalescent process within the branches of a species tree obtained from a Tree Window or other tree context.
- ContainedCoalescenceMult ("Contained Coalescence in Species Trees") -- This module is not yet ready for use.
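The idea of coalescence contained within the branches of a species tree can be sketched in Python as follows. This is a simplified illustration, not the algorithm used by the modules above. It assumes each branch has a width (haploid effective population size) and a duration in generations, and it builds gene trees as nested tuples. The function names are hypothetical.

```python
import random


def coalesce_within_branch(lineages, width, duration, rng):
    """Coalesce lineages backward in time along one species-tree branch.

    Returns the lineages that reach the ancestral end of the branch.
    """
    lineages = list(lineages)
    elapsed = 0.0
    while len(lineages) > 1:
        k = len(lineages)
        wait = rng.expovariate(k * (k - 1) / (2.0 * width))
        if elapsed + wait > duration:
            break  # the branch ends before the next coalescence
        elapsed += wait
        i, j = rng.sample(range(k), 2)  # pick two distinct lineages to join
        merged = (lineages[i], lineages[j])
        lineages = [x for idx, x in enumerate(lineages) if idx not in (i, j)]
        lineages.append(merged)
    return lineages


def leaf_names(node):
    """List the gene names at the tips of a nested-tuple gene tree."""
    if not isinstance(node, tuple):
        return [node]
    return [name for child in node for name in leaf_names(child)]


# Two sister species, two gene copies each, then their common ancestor.
rng = random.Random(42)
in_a = coalesce_within_branch(["a1", "a2"], width=1000, duration=500, rng=rng)
in_b = coalesce_within_branch(["b1", "b2"], width=1000, duration=500, rng=rng)
root = coalesce_within_branch(in_a + in_b, width=1000,
                              duration=float("inf"), rng=rng)
print(root[0])
```

Lineages that fail to coalesce within a daughter branch are passed to the ancestral branch, which is how narrow or long branches tend to produce gene trees matching the species tree while wide or short branches do not.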
Calculations with gene trees:
- RecCoalescenceHistory ("Reconstruct Deep Coalescence")
-- Reconstructs the fit of a gene tree into a species tree so as to minimize
deep coalescences in the sense of W. Maddison (1997, Syst. Biol.). Also counts
the deep coalescence cost of such a fit. Options are (1) to treat the gene
tree as rooted or unrooted and (2) to allow polytomies in the gene tree to
resolve automatically to minimize deep coalescences further.
- DeepCoalescencesG ("Deep Coalescences (gene tree)") --
Counts the cost in deep coalescences to fit a gene tree in a species tree;
treats this as a value for the gene tree.
- DeepCoalescencesSp ("Deep Coalescences (species tree)")
-- Counts the cost in deep coalescences to fit a gene tree in a species tree;
treats this as a value for the species tree.
- SlatkinMaddisonS ("s of Slatkin & Maddison") -- Counts
the s value of Slatkin and Maddison (s is a measure of the discordance between
a gene tree and a division into populations). Requires an available Association
of genes into populations.
- TreeDepth ("Tree Depth") -- Determines the depth of the
tree, measured as the sum of branch lengths from the root to the tallest terminal
node.
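As a rough illustration of the s value described above, the following Python sketch treats population membership as an unordered character and counts the minimum number of changes on the gene tree by Fitch parsimony. It assumes a rooted, fully bifurcating gene tree written as nested tuples and a dictionary (here called population_of, a hypothetical name) associating genes with populations. It is a sketch of the general idea, not the module's own code.

```python
def slatkin_maddison_s(gene_tree, population_of):
    """Return the minimum number of population changes on the gene tree."""
    steps = 0

    def fitch(node):
        nonlocal steps
        if not isinstance(node, tuple):
            return {population_of[node]}
        # Unpacking raises ValueError if a node is not bifurcating.
        left, right = (fitch(child) for child in node)
        common = left & right
        if common:
            return common
        steps += 1  # the two descendants disagree: one change is needed
        return left | right

    fitch(gene_tree)
    return steps


pops = {"a1": "P1", "a2": "P1", "a3": "P1", "b1": "P2", "b2": "P2"}
print(slatkin_maddison_s(((("a1", "a2"), "a3"), ("b1", "b2")), pops))  # 1
print(slatkin_maddison_s((("a1", "b1"), ("a2", "b2")), pops))          # 2
```

The first gene tree groups the genes cleanly by population, so a single change suffices. The second interleaves them and needs two.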
Utilities:
- aCoalescencePkgIntro ("Coalescence Package Introduction") -- Introduces the coalescence package.
- LineageWidth ("Adjust lineage width") -- Allows the user to adjust the widths (e.g., effective population sizes) of branches of a population or species tree. This is not merely a graphical widening, but attaches a width parameter to the branches of the tree.
- InsertNode ("Insert Node") -- Inserts a node along a branch of a tree. This creates a node with only a single descendant. It can be used to break a branch into pieces, each of which is assigned its own effective population size (lineage width) and duration of time in generations (branch length).
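The effect of inserting a node and then adjusting lineage widths can be sketched with a small Python data structure. The Node class and function names below are hypothetical and are not Mesquite's classes. The sketch assumes each node stores the length and width of the branch above it, and it reads "tree depth" as the largest sum of branch lengths from the root to any terminal node.

```python
class Node:
    """A tree node; length and width describe the branch above the node."""

    def __init__(self, name=None, length=0.0, width=1.0, children=None):
        self.name = name
        self.length = length      # duration in generations
        self.width = width        # effective population size
        self.children = children or []


def insert_node(parent, child, length_below):
    """Split the branch above `child`, leaving `length_below` beneath the new node."""
    if not 0.0 <= length_below <= child.length:
        raise ValueError("length_below must lie within the branch")
    new = Node(length=child.length - length_below,
               width=child.width, children=[child])
    child.length = length_below
    parent.children[parent.children.index(child)] = new
    return new  # a node with a single descendant


def tree_depth(node):
    """Largest sum of branch lengths from this node down to a terminal node."""
    if not node.children:
        return 0.0
    return max(c.length + tree_depth(c) for c in node.children)


a = Node("A", length=1000.0, width=2000.0)
b = Node("B", length=1000.0, width=2000.0)
root = Node(children=[a, b])

upper = insert_node(root, a, length_below=400.0)
upper.width = 200.0  # a bottleneck in the older 600 generations of the branch
print(tree_depth(root))  # depth is unchanged by the insertion: 1000.0
```

Splitting the branch leaves the total duration intact, so the two pieces can carry different widths without altering the timing of the tree.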
The coalescence package depends on modules and libraries from the more general Taxa Associations package. The Taxa Associations package is not yet a standalone package, and is included in service of the coalescence routines. These are the included modules from the Taxa Associations package:
Management and Utilities:
- ManageAssociations ("Manage TaxaAssociation blocks") -- Reads and writes TaxaAssociation blocks to NEXUS files, and supervises their editing and manipulation by users.
- StoredAssociations ("Stored Taxa Associations") -- Supplies stored TaxaAssociations to calculations that need information on which taxa from one taxa block (e.g., genes) are associated with which taxa in another block (e.g., species).
- ManageDistributionBlock ("Read DISTRIBUTION blocks") -- Reads DISTRIBUTION blocks (e.g., used by Rod Page's GeneTree) in NEXUS files. Subsequent writing of the information is currently to separate TAXA, TREES and TaxaAssociation blocks.
Graphics and analysis:
- ContainedAssociates ("Contained Associates") -- Draws a species tree with broad branches, within which the contained gene trees are reconstructed and drawn.
What remains to be done
Improvements to the existing modules remain to be made, including making
them more efficient.
Many other calculations taking a gene tree perspective could be done, including
those that have nucleotide sequence evolution occurring along the branches
of the gene tree. This would allow direct comparison against observed
sequences, without being forced to reconstruct a gene tree from the
sequences before comparison with Mesquite's results. The solution to this
will come from the Genesis package, which can be combined with the
coalescence package to generate nucleotide sequences evolved on
coalescence trees.
Installation
There are two folders (directories) whose contents need to be in the correct
place for Mesquite to be able to use them. These two directories are called
(1) "coalesce" and (2) "assoc". Find where you have Mesquite
installed on your hard disk. Both directories should be in the "mesquite"
directory within "Mesquite Folder".
Examples
There is a series of example data files in the directory "coalescence_examples". The files are self-explanatory; begin with the file whose name begins with "00".
© Copyright 2002-2005 W. Maddison