# Copyright 2018 by Adhemar Zerlotini. All rights reserved. # # This file is part of the Biopython distribution and governed by your # choice of the "Biopython License Agreement" or the "BSD 3-Clause License". # Please see the LICENSE file that should have been included as part of this # package. """Bio.SearchIO support for InterProScan output formats. This module adds support for parsing InterProScan XML output. The InterProScan is available as a command line program or on EMBL-EBI's web page. Bio.SearchIO.InterproscanIO was tested on the following version: - versions: 5.26-65.0 (interproscan-model-2.1.xsd) More information about InterProScan are available through these links: - Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3998142/ - Web interface: https://www.ebi.ac.uk/interpro/search/sequence-search - Documentation: https://github.com/ebi-pf-team/interproscan/wiki Supported format ================ Bio.SearchIO.InterproscanIO supports the following format: - XML - 'interproscan-xml' - parsing interproscan-xml ================ The interproscan-xml parser follows the InterProScan XML described here: https://github.com/ebi-pf-team/interproscan/wiki/OutputFormats +--------------+--------------------+----------------------------------------+ | Object | Attribute | XML Element | +==============+====================+========================================+ | QueryResult | target | `InterPro` | | +--------------------+----------------------------------------+ | | program | `InterProScan` | | +--------------------+----------------------------------------+ | | version | `protein-matches.interproscan-version` | +--------------+--------------------+----------------------------------------+ | Hit | accession | `signature.name` | | +--------------------+----------------------------------------+ | | id | `signature.ac` | | +--------------------+----------------------------------------+ | | description | `signature.desc` | | +--------------------+----------------------------------------+ | | dbxrefs | `IPR:entry.ac` | | | | `go-xref.id` | | | | `pathway-xref.db:pathway-xref.id` | | +--------------------+----------------------------------------+ | | attributes | | | | ['Target'] | `*-match` / `*-location` | | | ['Target version'] | `signature-library-release.library` | | | ['Hit type'] | `signature-library-release.version` | +--------------+--------------------+----------------------------------------+ | HSP | bitscore | `*-location.score` | | +--------------------+----------------------------------------+ | | evalue | `*-location.evalue` | +--------------+--------------------+----------------------------------------+ | HSPFragment | query_start | `*-location.start` | | (also via +--------------------+----------------------------------------+ | HSP) | query_end | `*-location.end` | | +--------------------+----------------------------------------+ | | hit_start | `*-location.hmm-start` | | +--------------------+----------------------------------------+ | | hit_end | `*-location.hmm-end` | | +--------------------+----------------------------------------+ | | query | `sequence` | +--------------+--------------------+----------------------------------------+ InterProScan XML files may contain a match with multiple locations or multiple matches to the same protein with a single location. In both cases, the match is uniquely stored as a HIT object and the locations as HSP objects. `HSP.*start == *start - 1` (Since every start position is 0-based in Biopython) `HSP.aln_span == query-end - query-start` The types of matches or locations (eg. hmmer3-match, hmmer3-location, coils-match, panther-location) are stored in hit.attributes['Hit type']. For instance, for every 'phobious-match', there will be a 'phobious-location'. Therefore, Hit.type will store the string excluding '-match' or '-location' ('phobious', in this example). """ from .interproscan_xml import InterproscanXmlParser # if not used as a module, run the doctest if __name__ == "__main__": from Bio._utils import run_doctest run_doctest()