# Copyright (C) 2013 by Ben Morris (ben@bendmorris.com)
# This code is part of the Biopython distribution and governed by its
# license. Please see the LICENSE file that should have been included
# as part of this package.

import xml.etree.ElementTree as ET

cdao_namespaces = {
    'cdao': 'http://purl.obolibrary.org/obo/cdao.owl#',
    'obo': 'http://purl.obolibrary.org/obo/',
}


def resolve_uri(s, namespaces=cdao_namespaces, cdao_to_obo=True, xml_style=False):
    '''Converts prefixed URIs to full URIs.

    Optionally, converts CDAO named identifiers to OBO numeric identifiers.'''
    if cdao_to_obo and s.startswith('cdao:'):
        return resolve_uri('obo:%s' % cdao_elements[s[5:]], namespaces, cdao_to_obo)
    for prefix in namespaces:
        if xml_style:
            s = s.replace(prefix+':', '{%s}' % namespaces[prefix])
        else:
            s = s.replace(prefix+':', namespaces[prefix])
    return s


cdao_owl = ''' ]> Comparison of two or more biological entities of the same class when the similarities and differences of the entities are treated explicitly as the product of an evolutionary process of descent with modification. The Comparative Data Analysis Ontology (CDAO) provides a framework for understanding data in the context of evolutionary-comparative analysis. This comparative approach is used commonly in bioinformatics and other areas of biology to draw inferences from a comparison of differently evolved versions of something, such as differently evolved versions of a protein. In this kind of analysis, the things-to-be-compared typically are classes called 'OTUs' (Operational Taxonomic Units). The OTUs can represent biological species, but also may be drawn from higher or lower in a biological hierarchy, anywhere from molecules to communities. The features to be compared among OTUs are rendered in an entity-attribute-value model sometimes referred to as the 'character-state data model'. For a given character, such as 'beak length', each OTU has a state, such as 'short' or 'long'.
The differences between states are understood to emerge by a historical process of evolutionary transitions in state, represented by a model (or rules) of transitions along with a phylogenetic tree. CDAO provides the framework for representing OTUs, trees, transformations, and characters. The representation of characters and transformations may depend on imported ontologies for a specific type of character. CDAO Team Comparative Data Analysis Ontology comparative analysis; comparative data analysis; evolutionary comparative analysis; evolution; phylogeny; phylogenetics has_Character This property associates a character data matrix with a character (a column) represented in the matrix. belongs_to_Edge_as_Child The property links a Node to the Edge it belongs to in the child position. has_Ancestor The property links a node to any of the other nodes that are its ancestors in a rooted tree. has_Nucleotide_State This property associates a nucleotide character-state instance with a state value from the domain of nucleotide states. belongs_to_Edge The property links a Node to one of the edges that are incident on such node. belongs_to_Character_State_Data_Matrix has_Root The property links a rooted tree to the specific node that represents the unique root of the tree. has_Child The property links a node to a node that is an immediate descendant in the tree. has_First_Coordinate_Item The property that relates a coordinate list to the first item in the list. has_Coordinate belongs_to_Continuous_Character has_Datum This property relates a character to a state datum for the character. has_Standard_Datum subtree_of This property links two networks where the latter is a substructure of the former has_Amino_Acid_State This property associates a amino acid character-state instance with a state value from the domain of amino acid states. 
is_annotation_of has_RNA_Datum has_Left_State This property relates a transformation to a 'left' state (the state associated with the 'left' node). precedes exclude has_Node Property that associates to each Edge the Nodes it connects. nca_node_of has_External_Reference Associates a TU to some external taxonomy reference. has_Coordinate_System This property links a coordinate to the coordinate system it references. belongs_to_Nucleotide_Character connects_to has_Amino_Acid_Datum This property relates an amino acid character (a column in a protein sequence alignment) to a state datum for the character (an individual cell in the alignment column). hereditary_change_of This property relates a type of evolutionary change (an Edge_Transformation) to the character that undergoes the change. The change is a transformation_of the affected character. has_Compound_Datum This property relates a compound character (a character with some states that are subdividable) to a state datum for the character. has_Descendants reconciliation_of belongs_to_Amino_Acid_Character has_Descendant A property that links a node to any of its descendants in a rooted tree. has_Continuous_State This property associates a character-state instance with a state value on a continuous numeric scale. has_Type belongs_to_Edge_as_Parent The property links a Node to one of the Edges where the node appears in the parent position (i.e., closer to the root). has Generic 'has' property. has_Parent The property that links a node to its unique parent in a rooted tree. belongs_to_Compound_Character homologous_to This propery relates different instances of the same character, including the case when the states of the character differ (e.g., large_beak of beak_size_character of TU A is homologous_to small_beak of beak_size_character of TU B). has_Change_Component This property relates a transformation to the components that compose it. 
has_Categorical_Datum has_State This property associates a character-state instance with its state value, e.g., a state value expressed in terms of an imported domain ontology. has_Left_Node This property relates a transformation to a 'left' node (the node that has the 'left' state). has_Right_State This property relates a transformation to a 'right' state (the state associated with the 'right' node). represents_TU This property relates a TU or taxonomic unit (typically associated with character data) to a phylogenetic history (Tree). exclude_Node has_Compound_State This property associates a compound character-state instance with its compound state value. belongs_to Generic property that links a concept to another concept it is a constituent of. The property is a synonym of part_of. belongs_to_TU This property relates a character-state datum to its TU. belongs_to_Network has_Annotation part_of has_Nucleotide_Datum This property relates a nucleotide character (a column in a nucleotide alignment) to a state datum for the character (an individual cell in the alignment column). represented_by_Node This property relates a TU to a node that represents it in a network. has_Remaining_Coordinate_List The property that relates a coordinate list to the item in the list beyond the first item. has_Element exclude_Subtree belongs_to_Tree has_Parent_Node Associates to a Directed Edge the Node that is in the parent position in the edge (i.e., the node touched by the edge and closer to the root of the tree) has_Lineage_node belongs_to_Tree_as_Root has_Hereditary_Change belongs_to_Character has_Molecular_Datum has_Continuous_Datum This property relates a continuous character to a state datum for the character. has_TU This property associates a character data matrix with a TU (a row) represented in the matrix. 
has_Child_Node The property associates to a Directed Edge the Node that is in the child position in the edge, i.e., the node touched by the edge and closer to the leaves of the tree. has_Right_Node This property relates a transformation to a 'right' node (the node that has the 'right' state). has_Precision has_Point_Coordinate_Value has_Int_Value has_Support_Value has_Value has_Uncertainty_Factor has_Range_End_Value has_Float_Value has_Range_Start_Value DesoxiRibonucleotideResidueStateDatum CoordinatePoint Lineage Phylo4Tree Network ModelDescription Description of a model of transformations. This is a non-computible description of a model, not the fully specified mathematical model, which typically relates the probability of a transformation to various parameters. StandardStateDatum ContinuousCharacterLengthType ContinuousCharBayesianLengthType NEXUSTreeBlock RootedTree 1 Kimura2Parameters TreeProcedure Generic_State This class should be renamed. These are not generic states but non-concrete states including gap, unknown and missing. This concept is tied to the verbally ambiguous 'gap' concept and to the use of a gap character (often the en dash '-') in text representations of sequence alignments. In general, this represents the absence of any positively diagnosed Character-State. As such, the gap may be interpreted as an additional Character-State, as the absence of the Character, or as an unknown value. In some cases it is helpful to separate these. UnrootedSubtree UnresolvedTree BifurcatingTree ContinuousStateDatum SubstitutionModel JukesKantor DatumCoordinate 1 A positional coordinate giving the source of a character state, used for molecular sequences. drawing from seqloc categories from NCBI at http://www.ncbi.nlm.nih.gov/IEB/ToolBox/SDKDOCS/SEQLOC.HTML#_Seq-loc:_Locations_on UnresolvedRootedTree Branch 'Branch' is the domain-specific synonym for an edge of a (Phylogenetic) Tree or Network. Branches may have properties such as length and degree of support. 
CharacterStateDataMatrixAnnotation Meta-information associated with a character matrix, such as, for the case of a sequence alignment, the method of alignment. AncestralNode 1 UnresolvedUnrootedTree UncertainStateDomain ReconcileTree 2 Continuous 1 This class describes a continuous value. The link to the actual float value is through the property has_Value. It could have also other properties attached (e.g., has_Precision). AlignmentProcedure Dichotomy Molecular ContinuousCharParsimonyLengthType Categorical CDAOAnnotation Its possible that this base class should be discarded and that annotations should inherit from an imported base class if one exists. The base class of annotations in CDAO. originationEvent Polytomy 3 PolymorphicStateDomain 1.0 TreeAnnotation Standard EdgeLength Its possible that this should not be classed as an 'annotation' since it contains data rather than meta-data. The length of an edge (branch) of a Tree or Network, typically in units of evolutionary changes in character-state per character. RibonucleotideResidue Clade DiscreteCharParsimonyLengthType MolecularStateDatum PolyphyleticGroup NexusDataBlock BranchingNode 2 Compound CharacterStateDataMatrix A matrix of character-state data, typically containing observed data, though in some cases the states in the matrix might be simulated or hypothetical. 
Synonyms: character Data matrix, character-state matrix RibonucleotideResidueStateDatum TimeCalibratedLengthType SetOfNodes MRCANode 1 FASTADataMatrix evolutionaryTransition 1 1 1 EdgeLengthType cladogeneticChange anageneticChange TUAnnotation PhyloTree ContinuousCharacter PHYLIPTree Subtree Character Traits shown to be relevant for phylogenetic classification GalledTree SpeciesTree TreeFormat StandardCharacter AminoAcidResidue This class will be declared equivalent ot the amino acid class description imported geneDuplication CompoundCharacter A character that could be divided into separate characters but is not due to the non-independence of changes that would result, e.g., as in the case of a subsequence that is either present or absent as a block. SIMMAPTree CommonAncestralNode NewickTree TimeProportionalLengthType DiscreteCharDistanceLengthType StarTree FullyResolvedUnrootedTree ParaphyleticGroup geneticEvent UnrootedTree CategoricalStateDatum DiscreteCharLikelihoodLengthType CharacterStateDomain The universe of possible states for a particular type of character, e.g., the states of an Amino_Acid character come from the Amino_Acid domain. CoordinateList GammaDistribution DesoxiRibonucleotideResidueCharacter CoordinateRange ReticulateEvolution hereditaryChange 1 1 1 CharacterStateDatum 1 1 The instance of a given character for a given TU. Its state is an object property drawn from a particular character state domain, e.g., the state of an Amino_Acid_State_Datum is an object property drawn from the domain Amino_Acid. Edge 2 An edge connecting two nodes in a (Phylogenetic) Tree or Network, also known as a 'branch'. Edges may have attributes such as length, degree of support, and direction. An edge can be a surrogate for a 'split' or bipartition, since each edge in a tree divides the terminal nodes into two sets. DiscreteCharacterLengthType EdgeAnnotation FullyResolvedRootedTree GrafenLengthType CoordinateSystem A reference to an external coordinate system. 
Coordinates for data must refer to some such external coordinate system. GenBankDataMatrix DataMatrixFormat TerminalNode RibonucleotideResidueCharacter Tree CategoricalCharacter AminoAcidResidueStateDatum PHYLIPDataMatrix ContinuousCharLikelihoodLengthType MolecularCharacter hereditaryPersistance SetOfCharacters SetOfThings The class is used to describe either colletions of characters or higher order grouping (e.g., groups of groups of characters). This extends the CharSet block of NEXUS. Sequence 1 A set of ordered states, typically the residues in a macromolecular sequence. speciation cladogenesis Bifurcation 2 DiscreteCharBayesianLengthType TaxonomicLink Link to an externally defined taxonomic hierarchy. MonophyleticGroup molecularRecombination HolophyleticGroup FullyResolvedTree AminoAcidResidueCharacter recombination DesoxiRibonucleotideResidue RootedSubtree CompoundStateDatum GapCost TU A unit of analysis that may be tied to a node in a tree and to a row in a character matrix. It subsumes the traditional concepts of 'OTU' and 'HTU'. DirectedEdge 1 1 A directed edge. Rooted trees have directed edges. The direction is specified by way of the parent and child relationships of nodes that the edge connects. Node 1 1 ContinuousCharDistanceLengthType dA absent unknown gap dG rU dC dT '''

cdao_elements = {}
root = ET.fromstring(cdao_owl)
for node_type in 'ObjectProperty', 'Class', 'DatatypeProperty':
    for element in root.findall('{http://www.w3.org/2002/07/owl#}%s' % node_type):
        obo = element.attrib['{http://www.w3.org/1999/02/22-rdf-syntax-ns#}about'].split('/')[-1]
        cdao = element.find('{http://www.w3.org/2000/01/rdf-schema#}label').text
        cdao_elements[cdao] = obo