>>>boot
BOOTSTRAP

A bootstrap analysis (Felsenstein 1985 Evolution 39:783) can be performed with any tree-making method. The number of replicates is specified in the REPLICATES area.

Some options may be activated in the OPTION->BOOTSTRAP menu:

- The default number of replicates can be set. The initial value is 500.

- By default, for parsimony and likelihood methods, sequence order is randomly drawn for each bootstrap replicate, because the implemented algorithms are order-dependent. Turning this off (JUMBLE AT EACH REPLICATE option) may bias the results, especially when numerous equally parsimonious trees are found.

- By default, bootstrap analysis evaluates the robustness of each node (branch) of the tree reconstructed from the whole data set. When the CONSENSUS TREE option is activated, an additional tree is drawn. This consensus tree includes the branches best supported by bootstrapping (i.e. those branches that were recovered in the highest number of bootstrap trees), whether or not they are in the whole-data tree, provided they are compatible with each other. The CONSENSUS TREE option is useful to sum up the information contained in several equally most parsimonious trees.

- A functionality specific to phylo_win is activated by the ALLOW BRANCH COMBINING option. It makes a new COMBINE button appear in any subsequent tree window, provided a bootstrap analysis was requested. The main goal of the option is to evaluate the bootstrap support of combinations of adjacent branches. More generally, by selecting adjacent branches and/or unselecting species, the user defines two sets of species. The proportion of bootstrap trees including at least one internal branch that separates the two defined sets is computed. This option is useful to estimate the robustness of phylogenetic relationships among a subset of species, whatever the behaviour of the remaining species, without loss of information.
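The two ideas above (resampling sites with replacement, and counting the bootstrap trees in which at least one internal branch separates two user-defined sets of species) can be sketched as follows. This is only an illustration, not the code used by phylo_win: the function names are hypothetical, and each tree is assumed to be given as a list of internal branches, each branch being the set of species found on one of its sides.

```python
import random


def bootstrap_columns(alignment, rng):
    """Build one bootstrap replicate by drawing alignment columns with replacement.

    `alignment` maps species names to aligned sequences of equal length.
    """
    n_sites = len(next(iter(alignment.values())))
    cols = [rng.randrange(n_sites) for _ in range(n_sites)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def separates(split, all_species, set_a, set_b):
    """True if the branch `split` has set_a on one side and set_b on the other."""
    other = all_species - split
    return (set_a <= split and set_b <= other) or (set_b <= split and set_a <= other)


def combined_support(bootstrap_trees, all_species, set_a, set_b):
    """Proportion of trees with at least one branch separating set_a from set_b.

    Species belonging to neither set (the unselected ones) may fall on
    either side of the branch.
    """
    hits = sum(
        any(separates(split, all_species, set_a, set_b) for split in tree)
        for tree in bootstrap_trees
    )
    return hits / len(bootstrap_trees)
```

With species A to E, sets {A, B} and {D, E}, and C unselected, a tree counts as supporting the relationship whether C groups with A and B or with D and E, which is what makes the combined value at least as high as the support of any single branch.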
>>>dime
DISTANCE BASED METHODS

The distance-based tree-making method used in phylo_win is the Neighbor Joining method (Saitou and Nei 1987 MBE 4:406). Numerous methods are available to compute pairwise distances (DISTANCE menu).

NUCLEOTIDIC DISTANCES:

- observed divergence: observed percentage of differences between the two compared sequences.

- Jukes and Cantor distance (Jukes and Cantor 1969, p21 in Mammalian Protein Metabolism, Munro ed, vol 3): corrects for multiple substitutions according to the one-parameter model.

- Kimura distance (Kimura 1980 JME 16:111): corrects for multiple substitutions according to a 2-parameter model, allowing for unequal transition and transversion rates.

- Tajima and Nei distance (Tajima and Nei 1984 MBE 1:269): corrects for multiple substitutions according to the equal-input model, allowing for unequal A, C, G and T contents within present-day sequences.

- Galtier and Gouy distance (Galtier and Gouy 1995 PNAS 92:11317): corrects for multiple substitutions according to a non-homogeneous model, allowing for unequal G+C contents between present-day sequences.

- LogDet distance: Lake 1994 PNAS 91:1455, Lockhart et al. 1994 MBE 11:605.

PROTEIC DISTANCES:

- observed divergence: see above.

- Poisson Correction: corrects for multiple substitutions according to a one-parameter model.

CODONS DISTANCES:

- Ka: non-synonymous substitution rate according to Li 1993 JME 36:96.

- Ks: synonymous substitution rate according to Li 1993 JME 36:96.

The nuclear or the mammalian mitochondrial genetic code can be chosen for Ka and Ks computation (CODE menu). Ka and Ks may be computed from nucleotide sequences (codons containing gaps and stop codons are checked for) or from protein sequences. In the latter case, a file with the corresponding nucleotide sequences is requested. Sequences in this file need not be aligned. Gaps, if any, are ignored.
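As an illustration of what "correcting for multiple substitutions" means, here is a minimal sketch of the Jukes and Cantor and Kimura 2-parameter distances, using the standard closed-form expressions from the cited papers. It is not the phylo_win implementation: the function names are hypothetical, and as a simplification any site where either sequence is not A, C, G or T is skipped for that pair.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_differences(seq1, seq2):
    """Return (compared sites, transitions, transversions) for two aligned sequences."""
    n = ts = tv = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # simplification: ignore gaps and undefined nucleotides
        n += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1  # A<->G or C<->T
        else:
            tv += 1  # purine <-> pyrimidine
    return n, ts, tv


def observed_divergence(seq1, seq2):
    """Proportion of differing sites among the compared sites."""
    n, ts, tv = count_differences(seq1, seq2)
    return (ts + tv) / n


def jukes_cantor(seq1, seq2):
    """One-parameter correction: d = -3/4 ln(1 - 4p/3)."""
    p = observed_divergence(seq1, seq2)
    arg = 1.0 - 4.0 * p / 3.0
    if arg <= 0.0:
        raise ValueError("sequences too divergent for the Jukes and Cantor correction")
    return -0.75 * math.log(arg)


def kimura_2p(seq1, seq2):
    """Two-parameter correction: d = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)),
    with P and Q the observed proportions of transitions and transversions."""
    n, ts, tv = count_differences(seq1, seq2)
    P, Q = ts / n, tv / n
    a, b = 1.0 - 2.0 * P - Q, 1.0 - 2.0 * Q
    if a <= 0.0 or b <= 0.0:
        raise ValueError("sequences too divergent for the Kimura correction")
    return -0.5 * math.log(a * math.sqrt(b))
```

For two sequences of 10 sites differing by one transition and one transversion, the observed divergence is 0.2, while both corrected distances are slightly above 0.23: the correction accounts for substitutions hidden by later changes at the same site.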
>>>disp
NAMES AND SEQUENCES DISPLAY

General properties of the display of names and sequences are set via the DISPLAY menu.

- Sequence order can be changed using DISPLAY->MOVE and clicking-and-dragging names within the name panel. A new sequence order may be confirmed (OK button) or cancelled.

- DISPLAY->REMOVE deletes the selected sequences after confirmation.

- Sequence names can be edited using DISPLAY->RENAME. Changes may be confirmed (OK button) or cancelled.

- Groups and colors of amino acids can be changed. A new choice of groups and colors can be read from a file. This file must be either $(HOME)/phylo_win.aacolors (where $(HOME) is the user's home directory) or specified by the -aagroupfile option when calling phylo_win (try phylo_win -help for information). This file must include a single line specifying the new amino acid groups, as in the following example:

EDQNHRK,ILMV,APSGT,FY,WC

The number of amino acid groups must be at most 5. Colors are fixed to red, green, yellow, blue, and magenta. Missing letters are painted in white, as are gaps.

>>>gaop
GAP OPTIONS

Two ways of dealing with gaps are available in phylo_win.

By default, all selected gap-containing sites are removed from the data. The actual number of sites involved in the analysis is given in the tree window.

For distance-based methods, the PAIRWISE GAP REMOVAL option may be set in the OPTION menu. For each sequence pair, sites with a gap in at least one of the two sequences are removed. The distance is computed over all remaining sites. The number of sites actually used varies between sequence pairs. The mean number of sites used over all pairwise comparisons is given in the tree window.

Undefined nucleotides (different from A, C, G, T, U) or amino acids (different from A, C, D, E, F, G, H, I, K, L, M, N, P, Q, R, S, T, V, W, Y) are considered as gaps.

>>>inou
INPUT FILES

The basic input of phylo_win is a file of aligned sequences. It can also read tree files, for more elaborate uses.
The main sequence file format in phylo_win is an improved version of the MASE format (Faulkner and Jurka 1988 TIBS 13:321) called MASE+. Information about site sets, species groups and stored trees can be read from this file. A description of the MASE+ format as well as a prototype file are given in the HELP->FORMAT menu.

phylo_win can also read the following sequence file formats: CLUSTAL, FASTA, PAUP=NEXUS and PHYLIP. In a NEXUS file, only the first data block is read.

Tree files may be read by phylo_win. The required format is the standard (CLUSTAL, NEXUS=PAUP, PHYLIP, TREEVIEW...) parenthesized format.

OUTPUT FILES

The basic output of phylo_win is a MASE+ sequence file with information about species and site selections as well as stored trees. It can also write tree files and distance files.

The only sequence output file format in phylo_win is MASE+; phylo_win cannot be used as a format translator. In this file, the user can save the information set during a phylo_win session, namely site sets, species groups and reconstructed trees.

Reconstructed trees can also be saved into tree files. The file format is the standard parenthesized format.

phylo_win can write distance files. The format is a non-standard upper-right semi-matrix. A prototype is:

5
0.03 0.31 0.14 0.16
0.33 0.19 0.15
0.27 0.28
0.17
Option : K
Rat Mouse Chicken Horse Pig

5 is the number of compared sequences. "Option : K" means that Kimura's distance was computed. The distance between Rat and Mouse is 0.03; the distance between Chicken and Horse is 0.27.

>>>intr
INTRODUCTION

phylo_win is a mouse-driven interface for molecular phylogenetic purposes.
Its basic functionalities are:

- displaying a sequence alignment
- allowing an easy selection of the sequences and sites to analyse
- reconstructing phylogenetic trees according to numerous methods
- drawing and printing the reconstructed trees
- saving species groups, site sets and trees into a file

More elaborate uses include performing bootstrap analyses, jumbling sequence input order, evaluating trees according to various criteria, evaluating a combination of adjacent nodes by bootstrap, etc.

phylo_win does not allow the user to modify sequences or to edit the alignment. For these functionalities, an alignment editor such as seaview is required.

To get started with phylo_win, an input sequence file is needed. phylo_win can read the following file formats: CLUSTAL, FASTA, MASE, NEXUS=PAUP, PHYLIP. Nucleotide or protein files are allowed.

To learn more about the properties of the interface, read the topics of the HELP->GENERAL menu.

To know what the tree-making methods actually do, read the topics of the HELP->ALGORITHM menu.

To get a description of MASE+, the main file format in phylo_win, read the topics of the HELP->FORMAT menu.

>>>jumb
JUMBLE

The JUMBLE option can be used with the maximum parsimony and maximum likelihood methods. These methods depend on sequence input order. If JUMBLE is set, the tree-making process is repeated (number of replicates defined in the REPLICATES area) with a random sequence input order each time. The best tree(s) recovered over all jumble replicates is (are) displayed.

>>>mali
MAXIMUM LIKELIHOOD

The maximum likelihood algorithm used in phylo_win is that of the FASTDNAML program (Olsen et al. 1994 CABIOS 10:41, Felsenstein 1981 JME 17:368).
A few options can be set in the OPTIONS->ML PARAMETERS menu, namely:

- the assumed transition/transversion ratio (default is 2)

- the number of branches crossed by moving subtrees during global tree rearrangements (G option in PHYLIP), both during species addition (default is 1) and after all species have been added (default is 1).

>>>mapa
MAXIMUM PARSIMONY

The maximum parsimony (Fitch 1971 Syst Zool 20:406) algorithms used in phylo_win come from the PHYLIP package (Felsenstein 1993, University of Washington): programs DNAPARS and PROTPARS are used for nucleotide and protein sequences, respectively. Up to 10 equally best trees are shown.

>>>mapl
MASE+ FORMAT

A MASE+ file includes a header with general comments, followed by successive triplets (sequence comments, sequence name, sequence).

The header is composed of lines beginning with two semicolons ;; . It may include:

- general comments, which are not read

- specifications of sets of sites: begin with " ;;# of regions = " , followed by the name of the site set on the same line and the boundaries of the specified regions on a new line.

- specifications of groups of sequences: begin with " ;;@ of species = " , followed by the name of the species group on the same line and the numbers of the specified sequences on a new line.

- specifications of trees: begin with ;;$ followed by the description of the specified tree on a new line. Trees must be written in the common (PHYLIP, CLUSTAL, NEXUS ...) parenthesized format, with an optional header enclosed in brackets [ ].

Sequence comments are composed of lines beginning with one semicolon ; .

Sequence names must be written on a new line after sequence comments. Their length must be at most 20 characters.

Sequences must begin on a new line after sequence names. New lines within a sequence are allowed.

See section PROTOTYPE for an example.
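The layout described above can be read with a few lines of code. The sketch below is only an illustration built from this description, not the reader used by phylo_win: the function names are hypothetical, and it assumes that every sequence is preceded by at least one comment line (possibly a bare semicolon), as in the PROTOTYPE section.

```python
import re


def parse_mase(text):
    """Split MASE+ text into header lines and (comments, name, sequence) records."""
    header, records = [], []
    comments, name, seq = [], None, []

    def flush():
        records.append({"comments": comments, "name": name, "sequence": "".join(seq)})

    for line in text.splitlines():
        if line.startswith(";;"):
            header.append(line[2:].strip())      # header line
        elif line.startswith(";"):
            if name is not None:                 # a comment opens the next record
                flush()
                comments, name, seq = [], None, []
            comments.append(line[1:].strip())
        elif not line.strip():
            continue                             # ignore blank lines
        elif name is None:
            name = line.strip()                  # first line after the comments
        else:
            seq.append(line.strip())             # sequences may span several lines
    if name is not None:
        flush()
    return header, records


def site_sets(header):
    """Extract site sets as {name: [(start, end), ...]} from the header lines."""
    sets = {}
    for i, line in enumerate(header[:-1]):
        if line.startswith("# of regions"):
            # "# of regions = 3 first_set": the name follows the region count
            set_name = line.split("=", 1)[1].split(None, 1)[1].strip()
            bounds = [int(x) for x in re.findall(r"\d+", header[i + 1])]
            sets[set_name] = list(zip(bounds[0::2], bounds[1::2]))
    return sets
```

Species groups and trees could be extracted from the header lines in the same way, by looking for the "@ of species" and "$" prefixes.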
>>>miop
MISCELLANEOUS OPTIONS

MISC->AUTO SAVE sets the auto-save option, so that changes are automatically saved before quitting phylo_win or opening a new sequence file. The former file is not removed. If the input file is file.mase, successive auto-saved files are named file.mase~1~, file.mase~2~, etc.

MISC->OUTPUT DISTANCE MATRIX must be set to output the distance matrix when a Neighbor Joining tree is reconstructed. See HELP->GENERAL->INPUT/OUTPUT for a description of the output file format.

MISC->SMALL TREE WINDOWS. By default, the size of tree windows depends on the number of species in the tree. If set, this option forces the size to the minimum.

HIDE SEQUENCES: sequences are not printed. Useful on slow terminals.

>>>prot
MASE+ FILE : PROTOTYPE

The following MASE+ file contains 6 aligned sequences, 2 sets of sites, 1 group of species and 2 trees.

;; MASE+ format
;; general comments
;;
;;the 4 first lines are not taken into account
;;# of regions = 3 first_set
;;3, 9 15,20 30,40
;;
;; This line is useless, as well as the previous one
;;@ of species = 5 mammals
;; 1, 2, 4, 5, 6
;;# of regions = 1 all sites
;;1, 40
;;$TREE1
;;[comments on the tree TREE1]
;;((Rat, Mouse), (Horse, (Pig, Cow)));
;;$tree2
;;[branch lengths and bootstrap values are allowed , trees may be rooted -previous- or unrooted -next.]
;;(Chicken:0.3, Mouse:0.21,
;;(Horse:0.15, Cow:0.16)97:0.07);
;;new lines in trees are allowed
;;Last line of the header
;comments for sequence1
;unlimited number of lines
;sequences below are random
Rat
AGGATGCGGCAATAGC-GTAGACCAGATCC
AATGCGGTGC
;sequence 2
Horse
AGCATGCAGCAATAGT-GTAGACGAGATCC
AATGCAATGC
;sequence 3
Chicken
GGGATGCCTCAATAGCATTAGGCCAGATCC
AATGCA---C
;
Mouse
AGGATGCAGCAATAGC-GCAGACCGGATCC
AATGCGCTGC
;sequence5
Pig
AGGATGCGTCAATAGC-GTAGATCAGGTCC
AATTTGGTAT
;
Cow
AGGATACGGCAATAGC-GTTGACCAGATCC
AATTCGGTAT

>>>sise
SITES SELECTION

The bottom-center button box and the sequence panel are devoted to the selection and storage of site sets.
Clicking on a sequence site selects (or unselects) the site. Clicking-and-dragging, or holding the shift key while clicking, selects (unselects) more than one site in a row. The SELECT ALL and SELECT NONE buttons can speed up the selection process.

A set of selected sites can be saved using the ADD SET button. The name of the set is entered in the SET area. The list of stored sets is displayed. Clicking on a name in this list selects the sites of the corresponding set. A set can be removed from the list using the DELETE SET button. The currently stored sets are saved into a MASE+ file if FILE->SAVE is called.

For coding nucleotide sequences, codon positions may be specified as well (CODING menu): first, second and third positions can be retained for, or removed from, the analysis.

Only those sites that are currently selected are taken into account in the analysis. The number of selected sites is given above the sequence panel.

>>>spse
SPECIES SELECTION

The bottom-left button box and the name list are devoted to the selection and storage of species groups.

Clicking on a sequence name selects (or unselects) the sequence. Clicking-and-dragging, or holding the shift key while clicking, selects (unselects) more than one sequence in a row. The SELECT ALL and SELECT NONE buttons can speed up the selection process.

A group of selected sequences can be saved using the ADD GROUP button. The name of the group is entered in the GROUP area. The list of stored groups is displayed. Clicking on a name in this list selects the sequences of the corresponding group. A group can be removed from the list using the DEL. GROUP button. The currently stored groups are saved into a MASE+ file if FILE->SAVE is called.

Only those sequences that are currently selected are taken into account in the analysis. The number of selected sequences is given above the name list.

>>>trdi
TREE DISPLAY

Reconstructed trees are drawn in separate windows.
Branch lengths (if any) and bootstrap values (if any) can be printed.

Only rooted trees are drawn, although molecular tree-making methods recover unrooted trees. The root is arbitrarily located on the "central" branch of the tree, i.e. the branch that minimizes the difference between the average distance from the root to the taxa in the right part of the tree and that in the left part of the tree. The location of the root can be changed using the NEW OUTGROUP button.

The SWAP NODES button inverts the positions of the two descendant nodes of any internal node.

The SUBTREE button shows a magnified view of a part of the tree.

The SHOW TREE button switches back to the non-editable mode.

A tree can be saved via the STORE button. The list of stored trees is displayed. A selected stored tree can be re-drawn (DRAW TREE button), evaluated according to four criteria (EVALUATE button, see HELP->ALGORITHM menu) or deleted (DELETE button). The currently stored trees are saved into a MASE+ file if FILE->SAVE is called.

A currently drawn tree can also be written (parenthesized format) into a tree file (TREE FILE button).

A tree plot can be converted to the PostScript format and then printed if a PostScript printer is available. The PRINT TREE button either runs the printing process (if a default printer is found) or creates a PostScript file.

Trees can also be input from a file (INPUT TREE button) in order to be drawn or evaluated. The required file format is the common (PHYLIP, CLUSTAL, NEXUS) parenthesized format. A file may include several trees, separated by at least one new line.

>>>trev
TREE EVALUATION

Any stored topology may be evaluated according to four criteria. The basic goal is to compare several alternative trees, possibly input from a file.

- BRANCH LENGTHS: Least-squares optimal branch lengths according to the selected distance-computing method are computed. The tree with the new branch lengths is displayed.
The residual sum of squares (RSS) and the total length of the tree (sum of all branch lengths) are given, so that several alternative topologies can be compared according to the Least Squares or Minimum Evolution criteria (Cavalli-Sforza and Edwards 1967 Am J Hum Genet 19:233, Rzhetsky and Nei 1992 JME 35:367).

- PARSIMONY: The minimum number of steps required for the selected tree is computed.

- LIKELIHOOD: The likelihood of the selected tree according to the model implemented in FASTDNAML, described as F84 in Yang 1994 JME 39:105, is computed. Branch lengths, if any, are not taken into account: optimal branch lengths are computed, and the likelihood of the tree with these optimal branch lengths is returned.

The data set used for these evaluations is defined by:

- the species involved in the evaluated tree
- the currently selected sites.

>>>end