# Copyright 2018 by Adhemar Zerlotini. All rights reserved.
#
# This file is part of the Biopython distribution and governed by your
# choice of the "Biopython License Agreement" or the "BSD 3-Clause License".
# Please see the LICENSE file that should have been included as part of this
# package.
"""Bio.SearchIO support for InterProScan output formats.

This module adds support for parsing InterProScan XML output. InterProScan is
available as a command-line program or through EMBL-EBI's web interface.
Bio.SearchIO.InterproscanIO was tested on the following version:

  - version: 5.26-65.0 (interproscan-model-2.1.xsd)

More information about InterProScan is available through these links:
  - Publication: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3998142/
  - Web interface: https://www.ebi.ac.uk/interpro/search/sequence-search
  - Documentation: https://github.com/ebi-pf-team/interproscan/wiki


Supported format
================

Bio.SearchIO.InterproscanIO supports the following format:

  - XML        - 'interproscan-xml'  - parsing


interproscan-xml
================

The interproscan-xml parser follows the InterProScan XML described here:
https://github.com/ebi-pf-team/interproscan/wiki/OutputFormats

+--------------+--------------------+------------------------------------------+
| Object       | Attribute          | XML Element                              |
+==============+====================+==========================================+
| QueryResult  | target             | ``InterPro``                             |
|              +--------------------+------------------------------------------+
|              | program            | ``InterProScan``                         |
|              +--------------------+------------------------------------------+
|              | version            | ``protein-matches.interproscan-version`` |
+--------------+--------------------+------------------------------------------+
| Hit          | accession          | ``signature.name``                       |
|              +--------------------+------------------------------------------+
|              | id                 | ``signature.ac``                         |
|              +--------------------+------------------------------------------+
|              | description        | ``signature.desc``                       |
|              +--------------------+------------------------------------------+
|              | dbxrefs            | ``IPR:entry.ac``                         |
|              |                    | ``go-xref.id``                           |
|              |                    | ``pathway-xref.db:pathway-xref.id``      |
|              +--------------------+------------------------------------------+
|              | attributes         |                                          |
|              | ['Target']         | ``signature-library-release.library``    |
|              | ['Target version'] | ``signature-library-release.version``    |
|              | ['Hit type']       | ``*-match`` / ``*-location``             |
+--------------+--------------------+------------------------------------------+
| HSP          | bitscore           | ``*-location.score``                     |
|              +--------------------+------------------------------------------+
|              | evalue             | ``*-location.evalue``                    |
+--------------+--------------------+------------------------------------------+
| HSPFragment  | query_start        | ``*-location.start``                     |
| (also via    +--------------------+------------------------------------------+
| HSP)         | query_end          | ``*-location.end``                       |
|              +--------------------+------------------------------------------+
|              | hit_start          | ``*-location.hmm-start``                 |
|              +--------------------+------------------------------------------+
|              | hit_end            | ``*-location.hmm-end``                   |
|              +--------------------+------------------------------------------+
|              | query              | ``sequence``                             |
+--------------+--------------------+------------------------------------------+

InterProScan XML files may contain a match with multiple locations, or
multiple matches to the same protein with a single location each. In both
cases, each match is stored once as a Hit object and its locations are stored
as HSP objects.

``HSP.*start == *start - 1`` (every start position is 0-based in Biopython)

``HSP.aln_span == query-end - query-start``

The types of matches or locations (e.g. hmmer3-match, hmmer3-location,
coils-match, panther-location) are stored in hit.attributes['Hit type'].
For instance, for every 'phobius-match' there will be a 'phobius-location'.
Therefore, hit.attributes['Hit type'] stores the string without the '-match'
or '-location' suffix ('phobius', in this example).
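
The coordinate and hit-type rules above can be illustrated with a minimal
sketch. It uses only the standard library rather than this module's parser.
The XML fragment, its accession values, and the helper names ``hit_type`` and
``to_zero_based`` are hypothetical and chosen for illustration. The sketch also
assumes that the span is computed from the 0-based start.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed fragment. Real InterProScan output is
# namespaced and carries many more elements and attributes.
XML = (
    '<protein>'
    '<hmmer3-match evalue="1.0E-5" score="42.0">'
    '<signature ac="EX00001" name="example" desc="Made-up signature"/>'
    '<locations>'
    '<hmmer3-location start="10" end="60" hmm-start="1" hmm-end="50"/>'
    '</locations>'
    '</hmmer3-match>'
    '</protein>'
)


def hit_type(tag):
    # Drop a trailing '-match' or '-location' from an element name.
    return re.sub('-(match|location)$', '', tag)


def to_zero_based(location):
    # Shift the 1-based XML start down by one and keep the end unchanged,
    # then derive the span from the shifted start (an assumption, see above).
    start = int(location.get('start')) - 1
    end = int(location.get('end'))
    return start, end, end - start


match = ET.fromstring(XML)[0]          # the <hmmer3-match> element
location = match.find('locations')[0]  # its first <hmmer3-location>
coords = to_zero_based(location)
print(hit_type(match.tag), hit_type(location.tag), coords)
# prints: hmmer3 hmmer3 (9, 60, 51)
```

With Biopython installed, the file itself would normally be read through
``Bio.SearchIO.parse(handle, 'interproscan-xml')``. The sketch only mirrors the
conventions described in this docstring.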
"""

from .interproscan_xml import InterproscanXmlParser


# if not used as a module, run the doctest
if __name__ == "__main__":
    from Bio._utils import run_doctest

    run_doctest()