Dendrogram Tree Display

[screenshot]

This display type is not to be confused with the generic ACEDB "Tree" display. The Dendrogram display represents genuine multi-object ?Tree data, such as taxonomies, phylogenetic trees (of DNA or protein sequences, for instance) and developmental cell lineages. A ?Tree object is composed of a "tree" of objects in the computing sense of a data tree: the display recursively traverses the "parent-child" linked ?TreeNode objects, starting from the node given under the "Root" tag of the ?Tree object in question.



Tree Data Input

The ?Tree and associated ACEDB models guiding dendrogram construction are described at the end of this document (see "Associated ACEDB Models").

Tree data may be read into ACEDB from the popup menu of the main ACEDB class window (under "Phylogenetic Trees") as a "New Hampshire" formatted tree file (the standard Phylip format for trees; default file extension ".ph"). Users may designate the tree being input as a "Taxonomic", "DNA Sequence" or "Protein Sequence" tree, depending upon the menu item selected. If no ?Tree name is given in the .ph file (as the first "root" label), the user is prompted for the name.

Tree data (including cell lineages) may also be input directly as ACEDB formatted ?Tree and ?TreeNode objects (with attached ?Taxon, ?Sequence and/or ?Protein objects) by the usual ACEDB methods: direct object creation/updating, or .ace data files formatted as per "Associated ACEDB Models" below. Note: the "New Hampshire" input route does not currently apply to "cell lineage" trees, Embedded_Tree ?Trees, etc., which must be input using direct ACEDB methods.

In general, the display code assumes that each ?TreeNode object occurs only once in a given ?Tree (no duplicates and no cycles). Also, embedded ?Trees should probably not be directly or indirectly self-referential! Unpredictable problems may occur in dendrogram displays if these conditions are not met.
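Because the display code relies on these assumptions, it can help to check a tree's structure before displaying it. The following is a minimal Python sketch, not part of ACEDB: it assumes the tree has already been read into a hypothetical dictionary `children` mapping each ?TreeNode name to the names listed under its Child tag, and reports any node reached more than once (a duplicate or a cycle).

```python
from typing import Dict, List


def check_tree(root: str, children: Dict[str, List[str]]) -> List[str]:
    """Walk the tree from root and report nodes reached more than once.

    children is a hypothetical in-memory stand-in for the ?TreeNode Child tags.
    """
    problems: List[str] = []
    seen = set()
    stack = [root]
    while stack:
        node = stack.pop()
        if node in seen:
            # Reached twice: either duplicated in the tree or part of a cycle.
            problems.append(f"node {node!r} reached more than once (duplicate or cycle)")
            continue  # do not expand it again, so cycles cannot loop forever
        seen.add(node)
        stack.extend(children.get(node, []))
    return problems
```

A well-formed tree yields an empty list; a Child link pointing back to an ancestor yields one report per repeated visit.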

"New Hampshire" format

The "New Hampshire" format consists of parenthesized, comma-delimited nested lists of items representing the structure of a tree. A list is composed of atoms (leaves) or lists (subtrees). The outer list is terminated by a semicolon; everything in the file following the semicolon is ignored. Any item in a list (including a sublist) may be labeled and/or may carry a numeric suffix: a float designating the bootstrap value and/or a second float, prefixed with a colon and placed after any bootstrap value, designating the subtree "branch length". For example, in "(...)100:0.56", 100 is the bootstrap value, 0.56 is the distance, and (...) denotes the subtree list itself. Alternatively, the bootstrap value may be enclosed in square brackets after the branch length: "(...):0.56[100]" is an alternative notation for the previous example.

All that being said, unlabeled nodes and nodes without distances or bootstrap values are perfectly acceptable.

Valid labels must begin with an alphabetic character but may contain other characters. Because the tree parser ignores whitespace in the .ph file, a space in a label must be written as "_". If a literal underscore "_" is required in a label, the backslash character "\" can be used to "escape" the very next character. The backslash may also be used to insert a space or any of the special characters in the following string: "_\{}[](),; :" (not including the double quotation marks).
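The label rules above can be sketched as a small Python decoder. `decode_label` is a hypothetical helper, not an ACEDB function; it assumes the raw label text has already been cut out of the file, and it keeps a trailing lone backslash as-is (the rules above do not say what happens in that case).

```python
def decode_label(raw: str) -> str:
    """Turn a raw New Hampshire label into its display text (sketch)."""
    if not raw or not raw[0].isalpha():
        raise ValueError(f"label must begin with an alphabetic character: {raw!r}")
    out = []
    i = 0
    while i < len(raw):
        ch = raw[i]
        if ch == "\\" and i + 1 < len(raw):
            out.append(raw[i + 1])  # escaped character is taken literally
            i += 2
            continue
        out.append(" " if ch == "_" else ch)  # unescaped "_" stands for a space
        i += 1
    return "".join(out)
```

For instance, `Caenorhabditis_elegans` decodes to "Caenorhabditis elegans", while `gene\_1` keeps its underscore.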

Labels may be placed either before or after a list describing a subtree (or the whole tree). A label placed before the list may be used to name the ?TreeNode in taxonomic and cell lineage trees; for DNA or Protein trees, such a label is only inserted into the Label text field of the ?TreeNode and is not used as the ?TreeNode name. A label found after the list is used only as the Label value.

Valid numbers (distance and bootstrap values) must begin with a digit (even if only a zero) and may contain a decimal point. Negative signs on numbers are ignored (but the user is notified if one is encountered).
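A Python sketch of these number rules follows. `parse_nh_number` is a hypothetical helper; notifying the user is modelled here with Python's `warnings` module, which is an assumption about presentation only.

```python
import warnings

DIGITS = set("0123456789")


def parse_nh_number(token: str) -> float:
    """Parse a distance or bootstrap token under the rules above (sketch)."""
    if token.startswith("-"):
        # The sign is dropped, but the user is told about it.
        warnings.warn(f"negative sign ignored in {token!r}")
        token = token[1:]
    if not token or token[0] not in DIGITS:
        raise ValueError(f"number must begin with a digit: {token!r}")
    if token.count(".") > 1 or not all(c in DIGITS or c == "." for c in token):
        raise ValueError(f"malformed number: {token!r}")
    return float(token)
```

So "0.56" is accepted, "-2.0" becomes 2.0 with a warning, and ".5" is rejected because it starts with a decimal point.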

The following is an example of a tree encoded in New Hampshire format:

(A,B:10.0,C(D:3.0,(E:2.0):4.0,,F:2.0)1.0:13.0,(G:2.0,H:1.0):5.0[0.9]);

(approximately) representing the tree:

+-A
|
+----------B
|             +---D
|   100%      |
+-------------C--E
|             |
|             +-?  ? == an unlabeled ?TreeNode
|             |
|             +--F
|     +--G
| 90% |
+-----+
      |
      +-H
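The grammar described above (nested lists, prefix and postfix labels, a bootstrap number after a sublist, a colon-prefixed distance, and the bracketed bootstrap alternative) can be sketched as a small recursive-descent parser in Python. This is an illustrative sketch, not ACEDB's own reader: `Node`, `Parser` and `parse_new_hampshire` are hypothetical names; whitespace is simply stripped; labels are taken as plain runs of non-special characters (backslash escapes and `{rank}` fields are not handled); and a number directly after a closing parenthesis is read as the bootstrap value.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SPECIAL = set("(),:;[]{}")


@dataclass
class Node:
    label: Optional[str] = None       # leaf label or prefix label of a sublist
    post_label: Optional[str] = None  # label written after a sublist
    bootstrap: Optional[float] = None
    distance: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


class Parser:
    def __init__(self, text: str):
        body = text.split(";", 1)[0]  # everything after ';' is ignored
        self.s = "".join(body.split())  # whitespace is insignificant
        self.i = 0

    def peek(self) -> str:
        return self.s[self.i] if self.i < len(self.s) else ""

    def expect(self, ch: str) -> None:
        if self.peek() != ch:
            raise ValueError(f"expected {ch!r} at offset {self.i}")
        self.i += 1

    def label(self) -> Optional[str]:
        if not self.peek().isalpha():  # labels must start with a letter
            return None
        start = self.i
        while self.peek() and self.peek() not in SPECIAL:
            self.i += 1
        return self.s[start:self.i]

    def number(self) -> Optional[float]:
        if not self.peek().isdigit():  # numbers must start with a digit
            return None
        start = self.i
        while self.peek().isdigit() or self.peek() == ".":
            self.i += 1
        return float(self.s[start:self.i])

    def item(self) -> Node:
        node = Node(label=self.label())
        if self.peek() == "(":  # a sublist (subtree)
            self.i += 1
            node.children.append(self.item())
            while self.peek() == ",":
                self.i += 1
                node.children.append(self.item())
            self.expect(")")
            node.post_label = self.label()
            node.bootstrap = self.number()
        if self.peek() == ":":  # branch length
            self.i += 1
            node.distance = self.number()
        if self.peek() == "[":  # bracketed bootstrap after the distance
            self.i += 1
            node.bootstrap = self.number()
            self.expect("]")
        return node


def parse_new_hampshire(text: str) -> Node:
    p = Parser(text)
    root = p.item()
    if p.i != len(p.s):
        raise ValueError(f"unexpected {p.peek()!r} at offset {p.i}")
    return root
```

Applied to the example tree above, the root has four children (A, B, C and an unlabeled subtree); C carries bootstrap 1.0 and distance 13.0, and the last subtree carries distance 5.0 with the bracketed bootstrap 0.9. Empty items such as the one between ",," come back as unlabeled leaves.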

The "Phylogenetic Tree" popup menu item gives you the choice of entering a "Taxonomic", "DNA Sequence" or "Protein Sequence" tree. Depending on which option you pick, your leaf node label names are automatically used to generate ?Taxon, ?Sequence or ?Protein objects, respectively, during tree input. For "Taxonomic" trees, interior node labels are also used to generate ?Taxon objects (i.e. for higher-rank taxa such as phylum, order, family, genus, etc.; see "Associated Models" below).

Once entered, the trees may be edited using all the usual ACEDB techniques of object editing (i.e. object Updating). No specialized tool for tree editing is currently available; however, the semantics of the models are simple enough: the UNIQUE Parent and the Child entries of ?TreeNodes may be deleted or modified to "manually" prune subtrees from, or graft subtrees onto, other parts of the tree. Also, manual editing (or .ace file input) can add node data that is not easily input in the New Hampshire format, e.g. ?Taxon colour coding, associated sequences, etc.

*** ACEDB Extensions to the Basic New Hampshire Standard ***

The "C" label in the example above, placed just before a sublist, is interpreted as the "interior" node label for that subtree. Such labels, with or without Taxon embellishments (see below), may either precede (prefix) or follow (postfix) a sublist. (Developer's Note: I think that "postfix" labels are the convention in the New Hampshire tree definition standard, but the ACEDB dendrogram tree data input facility accepts both prefix and postfix labels. However, only prefix labels are used in ?TreeNode ACEDB object names; postfix labels merely set the Label text tag-value of the ?TreeNode, or the Taxon embellishment (see below).)

The "interior" node labeling facility is especially useful for Taxonomy trees or for labeling subtree cuts ("subfamilies") of phylogenetic trees. Note: One can even label the "Root" node of a tree in this way. In fact, if such a label is provided, it is also taken to be the name of the ?Tree, by default (i.e. the user is not prompted for the tree name).

An additional ACEDB-specific extension to the basic New Hampshire standard is that a ?Taxon rank may be specified between the interior label and the list, as a tag field enclosed in braces ("{" and "}"). The tag name must be listed under the "Rank" tag in the ACEDB ?Taxon model class. For example, a "Taxonomic" tree describing the nematode Caenorhabditis elegans would be written as follows (several taxa omitted for clarity):

NCBI_Taxonomy(Eukaryotae {Superkingdom}(Metazoa {Kingdom}( Nematoda {Phylum} ... (Caenorhabditis {Genus}( Caenorhabditis_elegans{Species}))))))))))));

Alternatively, the taxon name may appear inside the braces, after the rank and delimited from it by a colon. Thus, the above "Taxonomic" tree can be equivalently written:

NCBI_Taxonomy ({Superkingdom:Eukaryotae}( {Kingdom: Metazoa }( {Phylum: Nematoda } ... ({Genus: Caenorhabditis }( {Species: Caenorhabditis_elegans }))))))))))));

The merit of this latter syntax is that Sequence and Protein trees may attach a "{Species:<name>}" field to their leaf labels (which are sequence names). Note that when you input taxonomic trees in this manner, it is generally necessary to provide the whole taxonomic path, with a taxon rank label at every level, to guarantee the proper construction of the taxonomic subtree.
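Both brace forms can be resolved to a (taxon name, rank) pair with a few lines of Python. `parse_taxon_field` is a hypothetical helper that assumes the brace contents and any preceding label have already been separated out by the tree reader; the rank list is copied from the ?Taxon model under "Associated ACEDB Models".

```python
from typing import Optional, Tuple

# Ranks listed under the Rank tag of the ?Taxon model
RANKS = {"Superkingdom", "Kingdom", "Phylum", "Subphylum", "Superclass", "Class",
         "Subclass", "Superorder", "Order", "Suborder", "Superfamily", "Family",
         "Subfamily", "Genus", "Species", "No_Rank"}


def parse_taxon_field(label: Optional[str], brace: str) -> Tuple[Optional[str], str]:
    """Return (taxon_name, rank) from a prefix label and a '{...}' body."""
    rank, sep, name = brace.partition(":")
    rank, name = rank.strip(), name.strip()  # whitespace is insignificant
    if rank not in RANKS:
        raise ValueError(f"unknown rank {rank!r}")
    # "{Rank:Name}" carries the name itself; "Name {Rank}" takes it from the label.
    return (name if sep else label), rank
```

With this, `Nematoda {Phylum}` and `{Phylum: Nematoda }` both resolve to ("Nematoda", "Phylum").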

Troubleshooting a Tree Parse Failure

Sometimes ACEDB will fail to parse a given tree file. The program will generally give an error message with some indication of the source of the parse failure. Common (indirect) causes of failure include unmatched parentheses, braces or brackets, and missing or misplaced commas. If the program complains about a missing comma, also check for labels in which you have used spaces rather than underscores to indicate blank characters. Remember that labels must begin with an alphabetic character, and numbers with at least one digit (not a decimal point).



Navigating through the Tree

Both the mouse and the keyboard may be used to navigate through a tree and display associated information. Refer to the corresponding tables about keyboard and menu operations (below). Clicking the "Middle" mouse button (or "Shift+Left mouse button" on a two-button mouse) scrolls the tree display to centre it on the point of the mouse click. The "Page Up" and "Page Down" keys can also be used to scroll the display.

The active (clickable) components are the ?TreeNodes, which may be labeled or unlabeled. Each tree node has two parts: an 'anterior' node box and a 'posterior' label box (possibly empty). The main feature of the 'anterior' box is a small circle designating the node. The two boxes behave slightly differently with respect to double-click actions and pop-up menus; however, when either box is selected, the whole tree node becomes 'currently active' and is highlighted. The actual colour of a highlighted node is influenced by the setting of tags in the model.

Nodes without attached child nodes, as well as the root nodes of hidden subtrees, are treated as "leaves" of the tree for the purposes of navigation.

Displaying Associated Data

As expected, one mouse "pick" of a tree node label highlights it, and a second pick (or "double click") displays the information associated with the node. Alternatively, highlighting a node and then pressing the "Enter" key also produces such a display. The manner of display is object specific: ?Sequence objects give a "feature map" display, ?Proteins give a "peptide map" display, and Embedded_Tree ?Trees give a new dendrogram display. For user reference, the ACEDB text-only object records are also displayed alongside objects that have non-text displays. This is now a general "tag2" driven feature: any object associated with a ?TreeNode under "Contains" whose class has a non-text-only ACEDB display type is shown in its default display when the tree node is "displayed". Generally, all associated information under the "Contains" tag of the tree node object is displayed together. If no 'Contains' data is found, the ?TreeNode object itself is displayed by default (as 'text only'). Note: ?Trees may be embedded in the ?TreeNodes of other ?Trees; clicking on such a node displays the associated ?Tree in a new ACEDB display window.

Pop-up menu commands (see below) are also available to display 'Contains' tag associated data as 'text only' or to display the underlying ?TreeNode data object rather than the associated data.

?Protein and ?Sequence objects (and any other class with a Species ?Species tag-value pair) will also trigger display of any embedded "Taxon ?Taxon" or "Species ?Species" data from their own object (i.e. not the ?TreeNode but the ?Protein/?Sequence object itself). If a ?TreeNode contains a Taxon ?Taxon tag-value pair, though, this overrides the Species in any associated ?Protein or ?Sequence object.

"Pick_me_to_call" tag/value pairs may be used to invoke scripts (e.g. to display node associated jpeg or gif images). The default "ground state" of information is a ?TreeNode object record. For tree nodes without labels, the ?TreeNode associated data may always be displayed by selecting the given node then doing a "Show selected object as text" popup menu operation.

Hiding Subtrees

Double picking a tree node circle expands or contracts the subtree rooted at the given node. Hidden subtrees are indicated by an arrow and a number giving the total number of leaves in that (hidden) subtree. Double clicking the root node of a hidden subtree exposes the subtree. See also the "Delete" key and the "Hide/Show Subtree" popup menu item below. Using the Right Arrow key on the root node of a "hidden" subtree also automatically expands that subtree.
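The number shown next to a hidden subtree's arrow is its total leaf count. A minimal Python sketch of that count, assuming a hypothetical `children` dictionary that maps each ?TreeNode name to its Child node names (nodes with no children are leaves):

```python
from typing import Dict, List


def count_leaves(node: str, children: Dict[str, List[str]]) -> int:
    """Total number of leaf ?TreeNodes in the subtree rooted at node."""
    kids = children.get(node, [])
    if not kids:
        return 1  # a node without children is itself a leaf
    return sum(count_leaves(kid, children) for kid in kids)
```

Hiding a subtree does not change this number: nested hidden subtrees still contribute all of their leaves.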

Phylogenetic Tree "Bootstrap" Values

If subtree phylogenetic (float) "bootstrap" values are available in a ?TreeNode, they are displayed to the right of the tree node circle. Raw bootstrap values in the ?TreeNodes may be normalized using the ?Tree..Bootstrap_Factor tag-value (i.e. the displayed bootstrap value is ?TreeNode..Bootstrap / ?Tree..Bootstrap_Factor); Bootstrap_Factor is 1.0 by default. If the normalized bootstrap value lies in the range 0.0 to 1.0, it is displayed as a percentage (0% to 100%). Bootstrap values should be positive (if one is negative, the sign is ignored). The display of bootstrap values is globally controlled by the ?Tree..Hide_Bootstraps tag. The ?TreeNode..Hide_Bootstraps tag hides the bootstraps for a complete subtree; this latter setting is overridden for any given node by the ?TreeNode..Show_Bootstrap tag.
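The normalization rule above is sketched here in Python. `format_bootstrap` is a hypothetical helper; rounding percentages to whole numbers and printing values above 1.0 as plain numbers are assumptions about presentation, not taken from the ACEDB source.

```python
def format_bootstrap(raw: float, factor: float = 1.0) -> str:
    """Displayed text for ?TreeNode..Bootstrap given ?Tree..Bootstrap_Factor."""
    value = abs(raw) / factor  # a negative sign is ignored
    if 0.0 <= value <= 1.0:
        return f"{value * 100:.0f}%"  # shown as a percentage
    return f"{value:g}"
```

For example, a raw value of 95 with a Bootstrap_Factor of 100 gives "95%", and the 0.9 of the example tree gives "90%".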

Keyboard Operations

The keyboard may also be used to navigate through the tree. The following keys (and their effects) are available. If "Needs Active Node" is "Yes" then the key only has an effect if a node is currently selected. In most cases, the display is scrolled as necessary to bring the given tree node into view.

Key             Needs Active Node?   Effect
Home            No                   Go to first leaf in the tree
End             No                   Go to last leaf in the tree
Enter           Yes                  Display the information associated with the current Node
Right Arrow     Yes                  Go to first child Node in subtree of the current Node; expand if hidden
Left Arrow      Yes                  Go to the parent Node of the current Node
Down Arrow      Yes                  Go to the next "sibling" Node/subtree of the current Node
Up Arrow        Yes                  Go to the previous "sibling" Node/subtree of the current Node
Space Bar       Yes; Leaf Node       Go to the next leaf Node or hidden subtree from the current Node
Backspace       Yes; Leaf Node       Go to the previous leaf Node or hidden subtree from the current Node
Tab             No                   Toggle between labels only and descriptive labels (w/taxon)
Escape (Esc)    No                   Toggle between branchline/taxon "black 'n white" and colour mode
Insert          Yes                  Update the current tree Node object (using ACEDB object updating)
Delete          Yes                  Toggle expansion/contraction of the subtree rooted by the current Node
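The leaf-to-leaf keys (Home, End, Space Bar, Backspace) can be understood as stepping through the list of currently visible leaves, where the root of a hidden subtree counts as a leaf. A minimal Python sketch, assuming hypothetical `children` (node name to ordered Child names, in display order) and `hidden` (roots of hidden subtrees) structures; returning `None` at either end, rather than wrapping around, is also an assumption.

```python
from typing import Dict, List, Optional, Set


def visible_leaves(root: str, children: Dict[str, List[str]],
                   hidden: Set[str]) -> List[str]:
    """Leaves in display order; roots of hidden subtrees count as leaves."""
    out: List[str] = []
    stack = [root]
    while stack:
        node = stack.pop()
        kids = children.get(node, [])
        if not kids or node in hidden:
            out.append(node)
        else:
            stack.extend(reversed(kids))  # so the first child is visited first
    return out


def step_leaf(current: str, root: str, children: Dict[str, List[str]],
              hidden: Set[str], forward: bool = True) -> Optional[str]:
    """Space Bar (forward=True) or Backspace (forward=False)."""
    leaves = visible_leaves(root, children, hidden)
    i = leaves.index(current) + (1 if forward else -1)
    return leaves[i] if 0 <= i < len(leaves) else None
```

In this model, Home corresponds to `visible_leaves(...)[0]` and End to `visible_leaves(...)[-1]`.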




Display Information and Configuration

Node Labels

If an explicit or inferred label is associated with the tree node, it is printed after the ?TreeNode circle. Labels may either be "simple" or "long, descriptive" depending upon the current setting of the "?Tree..Descriptive_Labels" tag or selection of the popup menu "Toggle Labels" function. Simple labels are derived from the tree type specific "Contains" object name (i.e. the ?Taxon object in a Taxonomy tree, ?Sequence object in a DNA tree, ?Protein object in a Protein tree). Where no such tree type specific object exists, but where at least one "Contains" tag2 object is present (e.g. ?Gene_Family), its object name will be used as the source of the label. Otherwise, if the ?Tree type is Taxonomy or Cell Lineage, then the ?TreeNode name is used as the label source. Otherwise, the node is displayed unlabelled.

Generally, in practice, all taxonomy tree nodes will be labeled, while only the leaves of phylogenetic (sequence/protein) trees will be labeled, although one could also label subtrees by "Gene Family". For trees NOT of the "Taxonomy" type, interior nodes are generally unlabeled unless the "Label Text" field of the ?TreeNode is set; however, all tree "leaves" are labeled with the name of the associated object (?Sequence or ?Protein), or with the ?TreeNode name if the former tag values are absent from the ?TreeNode. In contrast, the interior nodes of "Taxonomy" trees are also labeled, with the ?Taxon or ?TreeNode object name. In all trees, the "Toggle Labels" popup menu item toggles labels between the ?TreeNode "Label Text" field value and a more descriptive name extracted from certain tag-value fields of an associated "Contains" object, currently: ?Taxon - "Common_name ?Text"; ?Sequence/?Protein - "Title ?Text"; ?Cell - "Fate ?Text"; or the "Description ?Text" tag-value, if it exists, for any other "Contains" tag2 object in the given ?TreeNode.
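The simple-label fallback order and the descriptive-label fields described in the two paragraphs above can be sketched in Python. All names here are hypothetical: `contains` stands in for a ?TreeNode's "Contains" entries (tag to object name, in model order) and `objects` for the tag values of those objects. The sketch follows the simple-label order given for "Node Labels" and does not model the Label Text field.

```python
from typing import Dict, Optional

# Tree type -> Contains tag holding the tree-type specific object
TYPE_SPECIFIC = {"Taxonomy": "Taxon", "DNA": "Sequence", "Protein": "Protein"}

# Contains tag -> tag that supplies the descriptive label
DESCRIPTIVE_TAG = {"Taxon": "Common_name", "Sequence": "Title",
                   "Protein": "Title", "Cell": "Fate"}


def simple_label(tree_type: str, node_name: str,
                 contains: Dict[str, str]) -> Optional[str]:
    specific = TYPE_SPECIFIC.get(tree_type)
    if specific in contains:
        return contains[specific]  # tree-type specific object name
    if contains:
        return next(iter(contains.values()))  # first other Contains object
    if tree_type in ("Taxonomy", "Cell_Lineage"):
        return node_name  # fall back to the ?TreeNode name
    return None  # displayed unlabeled


def descriptive_label(contains: Dict[str, str],
                      objects: Dict[str, Dict[str, str]]) -> Optional[str]:
    for tag, obj in contains.items():
        tag_name = DESCRIPTIVE_TAG.get(tag, "Description")
        text = objects.get(obj, {}).get(tag_name)
        if text:
            return text
    return None
```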

Copying Labels and Data to the Clipboard

The label of the currently active tree node may be copied to the system "clipboard" for pasting elsewhere in ACEDB (or in another application) by selecting "copy" in the dendrogram pop-up menu, or in the usual system-specific manner. Special Note: you may also copy the contents of the display's header message box (if visible) by picking the box and then immediately selecting "copy". Picking the message box in this manner does NOT change the currently selected tree node, and the message box itself does not change in appearance: the operation takes place silently, but it also overwrites the current contents of the ACEDB post buffer, which may previously have contained the active node's label.

Dendrogram Colouring

The displayed lines and nodes may be colour coded if suitable "#Colour" tags are set in the data objects used to construct the tree. Specifically, setting the Display Colour #Colour tag of a ?TreeNode dictates (recursively) the colour of lines and nodes in its subtree, until overridden by a similar tag in a descendant ?TreeNode (for that node's subtree).
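This inheritance rule amounts to passing each node's colour down to its children unless a child sets its own. A minimal Python sketch, assuming hypothetical `children` and `colour_tag` dictionaries (the latter holds only nodes whose Display Colour tag is set); the default of "BLACK" for uncoloured trees is also an assumption.

```python
from typing import Dict, List


def subtree_colours(root: str, children: Dict[str, List[str]],
                    colour_tag: Dict[str, str],
                    default: str = "BLACK") -> Dict[str, str]:
    """Resolve the line/node colour of every node below root."""
    resolved: Dict[str, str] = {}
    stack = [(root, default)]
    while stack:
        node, inherited = stack.pop()
        colour = colour_tag.get(node, inherited)  # own tag wins, else inherit
        resolved[node] = colour
        for kid in children.get(node, []):
            stack.append((kid, colour))
    return resolved
```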

Similarly, setting the Display "Foreground_Colour" and "Background_Colour" #Colour tags of associated ?Taxon objects will override the subtree colour of the node labels "by taxon". This latter feature is especially useful to colour code a phylogenetic (?Sequence or ?Protein) tree by species. For this latter feature, curators have a choice of either embedding a Taxon ?Taxon value directly into the ?TreeNodes or putting a "Species ?Species" tag-value pair directly into the associated ?Protein or ?Sequence object under "Contains" in a given ?TreeNode (this latter approach probably being a more sensible way to organize things ...)

Bootstrap values for subtrees may be loaded from tree files, stored in the ?TreeNodes and displayed next to the nodes. Bootstrap values are displayed globally by default unless disabled by the ?Tree "Hide_Bootstraps" tag (or hidden using the bootstrap toggle in the pop-up menu). Subtrees may have all their bootstrap values hidden by a local "Hide_Bootstraps" tag in the ?TreeNode rooting the subtree. This global and subtree-based "Hide_Bootstraps" behaviour may be overridden locally, for a single given ?TreeNode, by setting the "Show_Bootstrap" tag.
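The three bootstrap switches combine as sketched below in Python. `bootstrap_visible` and its arguments are hypothetical stand-ins for the ?Tree and ?TreeNode tags; the sketch assumes a node's own Hide_Bootstraps tag also applies to that node, not only to its descendants.

```python
from typing import Dict, List, Set


def bootstrap_visible(root: str, children: Dict[str, List[str]],
                      tree_hides: bool, hide_tag: Set[str],
                      show_tag: Set[str]) -> Dict[str, bool]:
    """tree_hides: ?Tree Hide_Bootstraps; hide_tag/show_tag: ?TreeNode tags."""
    visible: Dict[str, bool] = {}
    stack = [(root, tree_hides)]
    while stack:
        node, hidden = stack.pop()
        hidden = hidden or node in hide_tag  # subtree-wide hiding (assumed to include node)
        visible[node] = node in show_tag or not hidden  # Show_Bootstrap overrides locally
        for kid in children.get(node, []):
            stack.append((kid, hidden))
    return visible
```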

Setting the "Display Hide" tag in a ?TreeNode dictates that a given subtree rooted by the ?TreeNode is initially displayed as hidden.

The Alignment tag of the ?Tree model class controls the default alignment (Top, Middle or Bottom) of the dendrogram tree (by default, "Top" for Taxonomic trees and "Middle" for other types).

A "header" of various dendrogram information is provided. The header may be hidden by default by setting the "Display No_Header" ?Tree tag, or toggled dynamically using the "Toggle Header" popup menu item. The number of leaves, the maximum branch length and the current "Normalization" branch display scale (plus an approximate "ruler legend" for the scale) are shown at the top of the graph. A "message box" (coloured cyan) displays additional information about the currently selected node, such as node name, common name, descriptive text and associated Taxon (usually a species, if available).

Presently, the source of the descriptive text depends upon the type of tree being displayed. For DNA trees, the "Brief_identification" tag value is used. For Protein trees, the "Title" tag value is used. For Cell lineages, the "Remark" tag value is used. For Taxonomy trees, the code looks for a "Description ?Text" tag value field. If no descriptive text is found in this way, the program looks for a "Description ?Text" tag value field in the first of any "Contains" tag2 object.
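The per-type lookup above can be written as a small table plus one fallback. In this Python sketch, `message_box_text` is hypothetical, and treating `primary` as the tag values of the tree-type specific object (for example the ?Sequence in a DNA tree) is an assumption; the text does not name the object for each case.

```python
from typing import Dict, Optional

# Tree type -> tag consulted for the descriptive text
DESCRIPTION_TAG = {
    "DNA": "Brief_identification",
    "Protein": "Title",
    "Cell_Lineage": "Remark",
    "Taxonomy": "Description",
}


def message_box_text(tree_type: str, primary: Dict[str, str],
                     first_contained: Dict[str, str]) -> Optional[str]:
    """primary: tag values of the tree-type specific object (assumed);
    first_contained: tag values of the first "Contains" object."""
    text = primary.get(DESCRIPTION_TAG.get(tree_type, "Description"))
    if text:
        return text
    return first_contained.get("Description")  # fallback described above
```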

The "Normalization" value scales the tree's branch lengths into some reasonable graph display length (Default Value == 1.0). (Re)set the "Normalization" value in the ?Tree object (see below) to some reasonable number to change the scale of the displayed tree. The "Rescale" popup menu item does this too (but does not save the value to the ?Tree object; the user needs to explicitly set the object value). Taxonomic trees have a Normalization of 1.0 by default and suppress the associated metric values in the header. Printing ("to fit" horizontally) of dendrograms may be facilitated by rescaling the tree.

*** Note: Following initial tree data input from a "New Hampshire" file, the display scale is unlikely to be suitable for a reasonable display of the tree and will likely need to be changed ***

Your display.wrm must have an entry for "DtDendrogram". Your options.wrm and models.wrm files must reflect the presence of the ?Tree, ?TreeNode, ?Taxon and ?Species model classes (see "Associated Models" below). If you like, you can set the default displays in options.wrm for ?Tree, ?TreeNode, ?Taxon and ?Species to "DtDendrogram". The code acts accordingly, displaying the (first) associated dendrogram tree by default, scrolled so that the active box is the given object (except for ?Tree, where it is the first leaf).



 Menu Buttons



Pop-Up Menu items

The dendrogram display has several context specific pop-up menus to manipulate the display. Some of these menus have keyboard equivalents (noted in parentheses in the popup menu).

Display Button Menu: A right button click on the 'Display' button in the header brings up a list of globally acting commands that modify how the dendrogram is displayed. These menu items change the "look" of the currently active display only temporarily, unless the 'Save Display State' item is selected.


Main Menu: the default menu of the display, available in all contexts in which no other menu applies.


Node Box Menu: The 'anterior' node display box has a context specific pop-up menu as follows:


Label Box Menu: The 'posterior' label display box has a context specific pop-up menu as follows:



Associated ACEDB Models

The following four ACEDB classes should be supported for this display to work properly. Note: the "Contains" field in ?TreeNodes is now a "tag2" magic tag; therefore, any type of ACEDB tag-object pair associated data may be displayed by any means that shows the given node's data (node selection plus the Enter key, or double clicking the label). The dendrogram display code also knows a bit more about the ACEDB ?Sequence and ?Protein classes (and the "Title ?Text" field therein).

Several of the tags in the following required models are "magic", i.e. directly interpreted by the Dendrogram display source code. These tags (and other mandatory elements) are bold highlighted. Some classes of objects are interpreted in a special way by the program code: the existing ?Cell, ?Sequence and ?Protein classes, plus a new ?Taxon class (with an augmented ?Species class). Users may wish to use all the other noted tags, keywords and XREFs, which are, nevertheless, not strictly needed for the proper functioning of the Dendrogram display code (and thus may be modified or omitted). Some of the tag-values are only default values at graph creation, and their function may be overridden by specific popup menu commands (e.g. "Align Tree...", "Rescale tree", "Hide Subtree", etc.).


// the main "tree" object
?Tree    Description UNIQUE ?Text		// Used as graph display title if present
         Type UNIQUE Taxonomy			// Controls semantics of tree display
                     DNA
                     Protein
                     Cell_Lineage
         Root UNIQUE ?TreeNode                  // "Root" node of the current tree
         Tree_Node   ?TreeNode                  // Nodes in other ?Trees within which this ?Tree is embedded
         Display     No_Header			// Suppresses display "header"
                     Descriptive_Labels		// Show descriptive labels
                     Colour			// Taxon_colouring if present
                     Normalization    UNIQUE Float	// Normalization factor for display (defaults to 1.0)
                     Bootstrap_Factor UNIQUE Float	// Normalization factor for bootstrap values (defaults to 1.0)
                     Hide_Bootstraps
                     Alignment UNIQUE Top	// How the dendrogram is to be drawn
                                      Middle
                                      Bottom
                                      Unrooted  // Unrooted "star" tree
         Reference   ?Paper XREF Tree

?TreeNode       Label UNIQUE ?Text              // Tree vertex label, e.g. sequence name or taxon
                Id    UNIQUE Int                // Node numbering...
                Description  ?Text
                Type UNIQUE  Root               // Root ?TreeNode should be so designated!
                             Interior
                             Leaf
                Distance UNIQUE Float           // "Evolutionary distance" or branch length
                Bootstrap UNIQUE Float          // Node subtree "bootstrap" values
                Tree   UNIQUE   ?Tree
                Parent UNIQUE   ?TreeNode XREF  Child
                Child           ?TreeNode XREF  Parent
                Display  Hide                   // Hide the subtree (children) of this node
                         Colour   UNIQUE #Colour // Fixes the colour of the subtree;
                                                 // overridden by child node settings
                         Hide_Bootstraps         // Hide all bootstraps in subtree...
                         Show_Bootstrap          // ... except those with the "Show_Bootstrap" tag set
                Contains Embedded_Tree UNIQUE ?Tree XREF Tree_Node
                         Taxon    UNIQUE ?Taxon     XREF Tree_Node
                         Sequence UNIQUE ?Sequence  XREF Tree_Node
                         Protein  UNIQUE ?Protein   XREF Tree_Node
                         Cell     UNIQUE ?Cell      XREF Tree_Node
                         URL      UNIQUE ?Url       XREF Tree_Node
                Pick_me_to_call   Text   Text

?Taxon   Common_name  UNIQUE ?Text
         Other_names  Text
         Rank  UNIQUE Superkingdom
                      Kingdom
                      Phylum
                      Subphylum
                      Superclass
                      Class
                      Subclass
                      Superorder
                      Order
                      Suborder
                      Superfamily
                      Family
                      Subfamily
                      Genus
                      Species UNIQUE ?Species XREF Taxon
                      No_Rank
         Description  ?Text
         Taxonomy     UNIQUE ?TreeNode  // Unique taxonomy tree
         // Note: ?Taxon objects can be associated with more than one tree
         Tree_Node    ?TreeNode XREF Taxon
         // Optional colour code (i.e. on Phylogenetic trees)
         Display      Foreground_Colour   #Colour
                      Background_Colour   #Colour

?Species Common_name  ?Text
         Taxon UNIQUE ?Taxon XREF Species
         Loci         ?Locus
         Sequences    ?Sequence
         Proteins     ?Protein
         Reference    ?Paper

//
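As an illustration of how these models fit together, the Python sketch below renders a small tree as .ace records that could be loaded by the usual ACEDB methods. It assumes the common .ace layout of `Class : "name"` followed by one tag-value line per entry, with records separated by blank lines. The `Node` class, the `to_ace` function and the `<tree>_<n>` node-naming scheme are hypothetical and do not reproduce what the New Hampshire reader generates. Only Child is written for each node, because the model's XREF to Parent fills in the reverse link.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Minimal in-memory tree node (hypothetical, for illustration only)."""
    label: Optional[str] = None
    distance: Optional[float] = None
    bootstrap: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def to_ace(tree_name: str, tree_type: str, root: Node) -> str:
    """Render a ?Tree and its ?TreeNodes as .ace records."""
    records: List[str] = []
    counter = 0

    def emit(node: Node, is_root: bool) -> str:
        nonlocal counter
        counter += 1
        name = f"{tree_name}_{counter}"  # hypothetical node-naming scheme
        kid_names = [emit(kid, False) for kid in node.children]
        node_type = "Root" if is_root else ("Interior" if node.children else "Leaf")
        lines = [f'TreeNode : "{name}"', f"Type {node_type}", f'Tree "{tree_name}"']
        if node.label:
            lines.append(f'Label "{node.label}"')
        if node.distance is not None:
            lines.append(f"Distance {node.distance}")
        if node.bootstrap is not None:
            lines.append(f"Bootstrap {node.bootstrap}")
        if not node.children and node.label and tree_type in ("DNA", "Protein"):
            # Leaf labels name the associated ?Sequence or ?Protein object
            tag = "Sequence" if tree_type == "DNA" else "Protein"
            lines.append(f'{tag} "{node.label}"')
        # Only Child is written; the XREF in the model fills in Parent.
        lines += [f'Child "{kid}"' for kid in kid_names]
        records.append("\n".join(lines))
        return name

    root_name = emit(root, True)
    header = f'Tree : "{tree_name}"\nType {tree_type}\nRoot "{root_name}"'
    return "\n\n".join([header] + records) + "\n"
```

For example, `to_ace("demo", "DNA", Node(children=[Node("A", 1.0), Node("B", 2.0)]))` yields one Tree record and three TreeNode records, with the two leaves linked to ?Sequence objects "A" and "B".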



Known Bugs



Last edited (rbsk): June 28, 1999

HTML syntax revised: October 20, 1998 (fw)

The "Dendrogram" display is the creation of Richard Bruskiewich,
International Rice Research Institute (IRRI), r.bruskiewich@cgiar.org,
(formerly at the Sanger Centre, UK).