org.biojavax.bio.seq
Class RichSequence.IOTools

java.lang.Object
  extended by org.biojavax.bio.seq.RichSequence.IOTools
Enclosing interface:
RichSequence

public static final class RichSequence.IOTools
extends Object

A set of convenience methods for handling common file formats.

Since:
1.5
Author:
Mark Schreiber, Richard Holland

Nested Class Summary
static class RichSequence.IOTools.SingleRichSeqIterator
          Used to iterate over a single rich sequence
 
Method Summary
static SymbolTokenization getDNAParser()
          Creates a DNA symbol tokenizer.
static SymbolTokenization getNucleotideParser()
          Creates a nucleotide symbol tokenizer.
static SymbolTokenization getProteinParser()
          Creates a protein symbol tokenizer.
static SymbolTokenization getRNAParser()
          Creates an RNA symbol tokenizer.
static RichSequenceIterator readEMBL(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
          Read an EMBL file using a custom type of SymbolList.
static RichSequenceIterator readEMBLDNA(BufferedReader br, Namespace ns)
          Iterate over the sequences in an EMBL-format stream of DNA sequences.
static RichSequenceIterator readEMBLProtein(BufferedReader br, Namespace ns)
          Iterate over the sequences in an EMBL-format stream of Protein sequences.
static RichSequenceIterator readEMBLRNA(BufferedReader br, Namespace ns)
          Iterate over the sequences in an EMBL-format stream of RNA sequences.
static RichSequenceIterator readEMBLxml(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
          Read an EMBLxml file using a custom type of SymbolList.
static RichSequenceIterator readEMBLxmlDNA(BufferedReader br, Namespace ns)
          Iterate over the sequences in an EMBLxml-format stream of DNA sequences.
static RichSequenceIterator readEMBLxmlProtein(BufferedReader br, Namespace ns)
          Iterate over the sequences in an EMBLxml-format stream of Protein sequences.
static RichSequenceIterator readEMBLxmlRNA(BufferedReader br, Namespace ns)
          Iterate over the sequences in an EMBLxml-format stream of RNA sequences.
static RichSequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, Namespace ns)
          Read a FASTA file.
static RichSequenceIterator readFasta(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
          Read a FASTA file, building a custom type of RichSequence.
static RichSequenceIterator readFastaDNA(BufferedReader br, Namespace ns)
          Iterate over the sequences in a FASTA-format stream of DNA sequences.
static RichSequenceIterator readFastaProtein(BufferedReader br, Namespace ns)
          Iterate over the sequences in a FASTA-format stream of Protein sequences.
static RichSequenceIterator readFastaRNA(BufferedReader br, Namespace ns)
          Iterate over the sequences in a FASTA-format stream of RNA sequences.
static RichSequenceIterator readFile(File file, Namespace ns)
          Guess the format of a file, then attempt to read it.
static RichSequenceIterator readFile(File file, RichSequenceBuilderFactory seqFactory, Namespace ns)
          Guess the format of a file, then attempt to read it.
static RichSequenceIterator readGenbank(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
          Read a GenBank file using a custom type of SymbolList.
static RichSequenceIterator readGenbankDNA(BufferedReader br, Namespace ns)
          Iterate over the sequences in a GenBank-format stream of DNA sequences.
static RichSequenceIterator readGenbankProtein(BufferedReader br, Namespace ns)
          Iterate over the sequences in a GenBank-format stream of Protein sequences.
static RichSequenceIterator readGenbankRNA(BufferedReader br, Namespace ns)
          Iterate over the sequences in a GenBank-format stream of RNA sequences.
static RichSequenceIterator readHashedFastaDNA(BufferedInputStream is, Namespace ns)
          Iterate over the sequences in a FASTA-format stream of DNA sequences.
static RichSequenceIterator readINSDseq(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
          Read an INSDseq file using a custom type of SymbolList.
static RichSequenceIterator readINSDseqDNA(BufferedReader br, Namespace ns)
          Iterate over the sequences in an INSDseq-format stream of DNA sequences.
static RichSequenceIterator readINSDseqProtein(BufferedReader br, Namespace ns)
          Iterate over the sequences in an INSDseq-format stream of Protein sequences.
static RichSequenceIterator readINSDseqRNA(BufferedReader br, Namespace ns)
          Iterate over the sequences in an INSDseq-format stream of RNA sequences.
static RichSequenceIterator readStream(BufferedInputStream stream, Namespace ns)
          Guess the format of a stream, then attempt to read it.
static RichSequenceIterator readStream(BufferedInputStream stream, RichSequenceBuilderFactory seqFactory, Namespace ns)
          Guess the format of a stream, then attempt to read it.
static RichSequenceIterator readUniProt(BufferedReader br, Namespace ns)
          Iterate over the sequences in a UniProt-format stream of Protein sequences.
static RichSequenceIterator readUniProt(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
          Read a UniProt file using a custom type of SymbolList.
static RichSequenceIterator readUniProtXML(BufferedReader br, Namespace ns)
          Iterate over the sequences in a UniProt XML-format stream of Protein sequences.
static RichSequenceIterator readUniProtXML(BufferedReader br, SymbolTokenization sTok, RichSequenceBuilderFactory seqFactory, Namespace ns)
          Read a UniProt XML file using a custom type of SymbolList.
static void registerFormat(Class formatClass)
          Register a new format with IOTools for auto-guessing.
static void writeEMBL(OutputStream os, SequenceIterator in, Namespace ns)
          Writes sequences from a SequenceIterator to an OutputStream in EMBL Format.
static void writeEMBL(OutputStream os, Sequence seq, Namespace ns)
          Writes a single Sequence to an OutputStream in EMBL format.
static void writeEMBLxml(OutputStream os, SequenceIterator in, Namespace ns)
          Writes sequences from a SequenceIterator to an OutputStream in EMBLxml Format.
static void writeEMBLxml(OutputStream os, Sequence seq, Namespace ns)
          Writes a single Sequence to an OutputStream in EMBLxml format.
static void writeFasta(OutputStream os, SequenceIterator in, Namespace ns)
          Writes Sequences from a SequenceIterator to an OutputStream in Fasta Format.
static void writeFasta(OutputStream os, SequenceIterator in, Namespace ns, FastaHeader header)
          Writes Sequences from a SequenceIterator to an OutputStream in Fasta Format.
static void writeFasta(OutputStream os, Sequence seq, Namespace ns)
          Writes a single Sequence to an OutputStream in Fasta format.
static void writeFasta(OutputStream os, Sequence seq, Namespace ns, FastaHeader header)
          Writes a single Sequence to an OutputStream in Fasta format.
static void writeGenbank(OutputStream os, SequenceIterator in, Namespace ns)
          Writes sequences from a SequenceIterator to an OutputStream in GenBank Format.
static void writeGenbank(OutputStream os, Sequence seq, Namespace ns)
          Writes a single Sequence to an OutputStream in GenBank format.
static void writeINSDseq(OutputStream os, SequenceIterator in, Namespace ns)
          Writes sequences from a SequenceIterator to an OutputStream in INSDseq Format.
static void writeINSDseq(OutputStream os, Sequence seq, Namespace ns)
          Writes a single Sequence to an OutputStream in INSDseq format.
static void writeUniProt(OutputStream os, SequenceIterator in, Namespace ns)
          Writes sequences from a SequenceIterator to an OutputStream in UniProt Format.
static void writeUniProt(OutputStream os, Sequence seq, Namespace ns)
          Writes a single Sequence to an OutputStream in UniProt format.
static void writeUniProtXML(OutputStream os, SequenceIterator in, Namespace ns)
          Writes sequences from a SequenceIterator to an OutputStream in UniProt XML Format.
static void writeUniProtXML(OutputStream os, Sequence seq, Namespace ns)
          Writes a single Sequence to an OutputStream in UniProt XML format.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

registerFormat

public static void registerFormat(Class formatClass)
Register a new format with IOTools for auto-guessing.

Parameters:
formatClass - the RichSequenceFormat class to register.

readStream

public static RichSequenceIterator readStream(BufferedInputStream stream,
                                              RichSequenceBuilderFactory seqFactory,
                                              Namespace ns)
                                       throws IOException
Guess the format of a stream, then attempt to read it.

Parameters:
stream - the BufferedInputStream to attempt to read.
seqFactory - a factory used to build a RichSequence
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the stream
Throws:
IOException - in case the stream is unrecognisable or problems occur in reading it.

readStream

public static RichSequenceIterator readStream(BufferedInputStream stream,
                                              Namespace ns)
                                       throws IOException
Guess the format of a stream, then attempt to read it.

Parameters:
stream - the BufferedInputStream to attempt to read.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the stream
Throws:
IOException - if the stream cannot be read.
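
One reason a format-guessing reader can take a BufferedInputStream is that such a stream supports mark() and reset(), so the first bytes can be inspected and then handed back untouched to a parser. The sketch below illustrates that idea only. The class FormatPeekSketch, its methods, and the sniffing rules are hypothetical and are not taken from IOTools or from any RichSequenceFormat implementation.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: the sniffing rules are illustrative assumptions,
// NOT the rules that IOTools applies when it guesses a format.
final class FormatPeekSketch {

    /** Peeks at the first bytes of the stream, then rewinds it. */
    static String guessFormat(BufferedInputStream in) {
        final int limit = 64;
        byte[] head = new byte[limit];
        int n = 0;
        try {
            in.mark(limit);                 // remember the current position
            while (n < limit) {
                int r = in.read(head, n, limit - n);
                if (r < 0) {
                    break;                  // end of stream
                }
                n += r;
            }
            in.reset();                     // rewind so a parser sees every byte
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String start = new String(head, 0, n, StandardCharsets.US_ASCII);
        if (start.startsWith(">")) return "FASTA";
        if (start.startsWith("LOCUS")) return "GenBank";
        if (start.startsWith("ID ")) return "EMBL or UniProt flat file";
        if (start.startsWith("<")) return "XML-based";
        return "unknown";
    }

    /** Demo helper: wraps text in a BufferedInputStream. */
    static BufferedInputStream streamOf(String text) {
        return new BufferedInputStream(
                new ByteArrayInputStream(text.getBytes(StandardCharsets.US_ASCII)));
    }

    public static void main(String[] args) {
        BufferedInputStream in = streamOf(">seq1 demo\nACGT\n");
        System.out.println(guessFormat(in));
        // The second call sees the same bytes because the stream was reset.
        System.out.println(guessFormat(in));
    }
}
```

Because the stream is reset after peeking, the same stream can be passed on to whichever parser matches the guess.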

readFile

public static RichSequenceIterator readFile(File file,
                                            RichSequenceBuilderFactory seqFactory,
                                            Namespace ns)
                                     throws IOException
Guess the format of a file, then attempt to read it.

Parameters:
file - the File to attempt to read.
seqFactory - a factory used to build a RichSequence
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the file
Throws:
IOException - in case the file is unrecognisable or problems occur in reading it.

readFile

public static RichSequenceIterator readFile(File file,
                                            Namespace ns)
                                     throws IOException
Guess the format of a file, then attempt to read it.

Parameters:
file - the File to attempt to read.
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the file
Throws:
IOException - If the file cannot be read.
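
The ns parameter of the read methods follows a three-step fallback: an explicit namespace wins, otherwise the namespace named in the file is used, otherwise the default namespace. The sketch below restates that rule as code. It is a simplified illustration: plain strings stand in for Namespace objects, and the constant DEFAULT_NAMESPACE is a hypothetical stand-in for the value of RichObjectFactory.getDefaultNamespace().

```java
// Hypothetical sketch: strings stand in for Namespace objects, and
// DEFAULT_NAMESPACE stands in for RichObjectFactory.getDefaultNamespace().
final class NamespaceFallbackSketch {

    static final String DEFAULT_NAMESPACE = "default-namespace";

    /**
     * @param nsArgument the ns value passed by the caller, possibly null
     * @param nsFromFile the namespace named in the file, or null if none
     */
    static String resolve(String nsArgument, String nsFromFile) {
        if (nsArgument != null) {
            return nsArgument;      // an explicit namespace is always used
        }
        if (nsFromFile != null) {
            return nsFromFile;      // null argument: fall back to the file
        }
        return DEFAULT_NAMESPACE;   // nothing in the file either
    }

    public static void main(String[] args) {
        System.out.println(resolve("lab", "genbank"));
        System.out.println(resolve(null, "genbank"));
        System.out.println(resolve(null, null));
    }
}
```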

readFasta

public static RichSequenceIterator readFasta(BufferedReader br,
                                             SymbolTokenization sTok,
                                             Namespace ns)
Read a FASTA file.

Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the fasta file

readFasta

public static RichSequenceIterator readFasta(BufferedReader br,
                                             SymbolTokenization sTok,
                                             RichSequenceBuilderFactory seqFactory,
                                             Namespace ns)
Read a FASTA file, building a custom type of RichSequence. For example, use RichSequenceBuilderFactory.FACTORY to emulate readFasta(BufferedReader, SymbolTokenization, Namespace) and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.

Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a RichSequence
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the fasta file
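
To give a feel for what bit-packing means for sequence symbols, the sketch below stores four unambiguous DNA symbols per byte, two bits each. It shows the general technique only. The class TwoBitPackingSketch is hypothetical, and the layout that RichSequenceBuilderFactory.PACKED actually uses (including how it treats ambiguity symbols) may well differ.

```java
// Hypothetical sketch of 2-bit packing for unambiguous DNA symbols.
// This is NOT the encoding used by RichSequenceBuilderFactory.PACKED.
final class TwoBitPackingSketch {

    private static final String SYMBOLS = "acgt"; // codes 0, 1, 2, 3

    /** Packs four symbols into each byte, lowest two bits first. */
    static byte[] pack(String dna) {
        byte[] packed = new byte[(dna.length() + 3) / 4];
        for (int i = 0; i < dna.length(); i++) {
            char symbol = Character.toLowerCase(dna.charAt(i));
            int code = SYMBOLS.indexOf(symbol);
            if (code < 0) {
                throw new IllegalArgumentException(
                        "not an unambiguous DNA symbol: " + dna.charAt(i));
            }
            packed[i / 4] |= (byte) (code << (2 * (i % 4)));
        }
        return packed;
    }

    /** Reverses pack(); the length is needed because of padding bits. */
    static String unpack(byte[] packed, int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            int code = (packed[i / 4] >> (2 * (i % 4))) & 0x3;
            sb.append(SYMBOLS.charAt(code));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] packed = pack("gattaca");
        System.out.println(packed.length + " bytes for 7 symbols");
        System.out.println(unpack(packed, 7));
    }
}
```

The trade-off is the usual one: packed storage uses less memory per symbol, at the cost of a shift and mask on every access.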

readFastaDNA

public static RichSequenceIterator readFastaDNA(BufferedReader br,
                                                Namespace ns)
Iterate over the sequences in a FASTA-format stream of DNA sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the fasta file
See Also:
readHashedFastaDNA(java.io.BufferedInputStream, org.biojavax.Namespace) for a faster version that can access sequences from memory.

readHashedFastaDNA

public static RichSequenceIterator readHashedFastaDNA(BufferedInputStream is,
                                                      Namespace ns)
                                               throws BioException
Iterate over the sequences in a FASTA-format stream of DNA sequences. In contrast to readFastaDNA, this provides a faster implementation in which all sequences are accessed from memory.

Parameters:
is - the BufferedInputStream to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the fasta file
Throws:
BioException - if something goes wrong while reading the file.
See Also:
readFastaDNA(java.io.BufferedReader, org.biojavax.Namespace)
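
The general idea of serving FASTA records from memory can be sketched with the standard library alone: read the whole input once, keep each record in a map keyed by its identifier, and answer later lookups from that map. This page does not describe how readHashedFastaDNA does it, so the class InMemoryFastaSketch and its parsing rules below are hypothetical and only illustrate the approach.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: NOT the implementation behind readHashedFastaDNA.
final class InMemoryFastaSketch {

    /** Loads every record into a map of identifier to residues. */
    static Map<String, String> load(BufferedReader br) {
        Map<String, String> records = new LinkedHashMap<>();
        String id = null;
        StringBuilder residues = new StringBuilder();
        try {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty()) {
                    continue;
                }
                if (line.startsWith(">")) {
                    if (id != null) {
                        records.put(id, residues.toString());
                    }
                    // Identifier: first whitespace-delimited token after '>'.
                    id = line.substring(1).trim().split("\\s+")[0];
                    residues.setLength(0);
                } else if (id != null) {
                    residues.append(line);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (id != null) {
            records.put(id, residues.toString());
        }
        return records;
    }

    /** Demo helper: parses FASTA text held in a String. */
    static Map<String, String> loadFromString(String fasta) {
        return load(new BufferedReader(new StringReader(fasta)));
    }

    public static void main(String[] args) {
        Map<String, String> records =
                loadFromString(">s1 first\nACGT\nAC\n>s2\nGG\n");
        System.out.println(records.keySet());
        System.out.println(records.get("s1"));
    }
}
```

Holding everything in memory makes repeated access cheap but requires the whole input to fit in the heap, which is the usual cost of this kind of speed-up.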

readFastaRNA

public static RichSequenceIterator readFastaRNA(BufferedReader br,
                                                Namespace ns)
Iterate over the sequences in a FASTA-format stream of RNA sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the fasta file

readFastaProtein

public static RichSequenceIterator readFastaProtein(BufferedReader br,
                                                    Namespace ns)
Iterate over the sequences in a FASTA-format stream of Protein sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the fasta file

readGenbank

public static RichSequenceIterator readGenbank(BufferedReader br,
                                               SymbolTokenization sTok,
                                               RichSequenceBuilderFactory seqFactory,
                                               Namespace ns)
Read a GenBank file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate convenience readers such as readGenbankDNA(BufferedReader, Namespace), and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.

Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a SymbolList
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the GenBank file

readGenbankDNA

public static RichSequenceIterator readGenbankDNA(BufferedReader br,
                                                  Namespace ns)
Iterate over the sequences in a GenBank-format stream of DNA sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the GenBank file

readGenbankRNA

public static RichSequenceIterator readGenbankRNA(BufferedReader br,
                                                  Namespace ns)
Iterate over the sequences in a GenBank-format stream of RNA sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the GenBank file

readGenbankProtein

public static RichSequenceIterator readGenbankProtein(BufferedReader br,
                                                      Namespace ns)
Iterate over the sequences in a GenBank-format stream of Protein sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the GenBank file

readINSDseq

public static RichSequenceIterator readINSDseq(BufferedReader br,
                                               SymbolTokenization sTok,
                                               RichSequenceBuilderFactory seqFactory,
                                               Namespace ns)
Read an INSDseq file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate convenience readers such as readINSDseqDNA(BufferedReader, Namespace), and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.

Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a SymbolList
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the INSDseq file

readINSDseqDNA

public static RichSequenceIterator readINSDseqDNA(BufferedReader br,
                                                  Namespace ns)
Iterate over the sequences in an INSDseq-format stream of DNA sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the INSDseq file

readINSDseqRNA

public static RichSequenceIterator readINSDseqRNA(BufferedReader br,
                                                  Namespace ns)
Iterate over the sequences in an INSDseq-format stream of RNA sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the INSDseq file

readINSDseqProtein

public static RichSequenceIterator readINSDseqProtein(BufferedReader br,
                                                      Namespace ns)
Iterate over the sequences in an INSDseq-format stream of Protein sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the INSDseq file

readEMBLxml

public static RichSequenceIterator readEMBLxml(BufferedReader br,
                                               SymbolTokenization sTok,
                                               RichSequenceBuilderFactory seqFactory,
                                               Namespace ns)
Read an EMBLxml file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate convenience readers such as readEMBLxmlDNA(BufferedReader, Namespace), and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.

Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a SymbolList
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the EMBLxml file

readEMBLxmlDNA

public static RichSequenceIterator readEMBLxmlDNA(BufferedReader br,
                                                  Namespace ns)
Iterate over the sequences in an EMBLxml-format stream of DNA sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the EMBLxml file

readEMBLxmlRNA

public static RichSequenceIterator readEMBLxmlRNA(BufferedReader br,
                                                  Namespace ns)
Iterate over the sequences in an EMBLxml-format stream of RNA sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the EMBLxml file

readEMBLxmlProtein

public static RichSequenceIterator readEMBLxmlProtein(BufferedReader br,
                                                      Namespace ns)
Iterate over the sequences in an EMBLxml-format stream of Protein sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the EMBLxml file

readEMBL

public static RichSequenceIterator readEMBL(BufferedReader br,
                                            SymbolTokenization sTok,
                                            RichSequenceBuilderFactory seqFactory,
                                            Namespace ns)
Read an EMBL file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate convenience readers such as readEMBLDNA(BufferedReader, Namespace), and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.

Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a SymbolList
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the EMBL file

readEMBLDNA

public static RichSequenceIterator readEMBLDNA(BufferedReader br,
                                               Namespace ns)
Iterate over the sequences in an EMBL-format stream of DNA sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the EMBL file

readEMBLRNA

public static RichSequenceIterator readEMBLRNA(BufferedReader br,
                                               Namespace ns)
Iterate over the sequences in an EMBL-format stream of RNA sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the EMBL file

readEMBLProtein

public static RichSequenceIterator readEMBLProtein(BufferedReader br,
                                                   Namespace ns)
Iterate over the sequences in an EMBL-format stream of Protein sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the EMBL file

readUniProt

public static RichSequenceIterator readUniProt(BufferedReader br,
                                               SymbolTokenization sTok,
                                               RichSequenceBuilderFactory seqFactory,
                                               Namespace ns)
Read a UniProt file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate convenience readers such as readUniProt(BufferedReader, Namespace), and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.

Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a SymbolList
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the UniProt file

readUniProt

public static RichSequenceIterator readUniProt(BufferedReader br,
                                               Namespace ns)
Iterate over the sequences in a UniProt-format stream of Protein sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the UniProt file

readUniProtXML

public static RichSequenceIterator readUniProtXML(BufferedReader br,
                                                  SymbolTokenization sTok,
                                                  RichSequenceBuilderFactory seqFactory,
                                                  Namespace ns)
Read a UniProt XML file using a custom type of SymbolList. For example, use RichSequenceBuilderFactory.FACTORY to emulate convenience readers such as readUniProtXML(BufferedReader, Namespace), and RichSequenceBuilderFactory.PACKED to force all symbols to be encoded using bit-packing.

Parameters:
br - the BufferedReader to read data from
sTok - a SymbolTokenization that understands the sequences
seqFactory - a factory used to build a SymbolList
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the UniProt XML file

readUniProtXML

public static RichSequenceIterator readUniProtXML(BufferedReader br,
                                                  Namespace ns)
Iterate over the sequences in a UniProt XML-format stream of Protein sequences.

Parameters:
br - the BufferedReader to read data from
ns - a Namespace to load the sequences into. Null implies that it should use the namespace specified in the file. If no namespace is specified in the file, then RichObjectFactory.getDefaultNamespace() is used.
Returns:
a RichSequenceIterator over each sequence in the UniProt XML file

writeFasta

public static void writeFasta(OutputStream os,
                              SequenceIterator in,
                              Namespace ns,
                              FastaHeader header)
                       throws IOException
Writes Sequences from a SequenceIterator to an OutputStream in Fasta Format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.

Parameters:
os - The stream to write FASTA-formatted data to
in - The source of input RichSequences
ns - a Namespace to write the RichSequences to. Null implies that it should use the namespace specified in the individual sequence.
header - the FastaHeader
Throws:
IOException - if there is an IO problem

writeFasta

public static void writeFasta(OutputStream os,
                              SequenceIterator in,
                              Namespace ns)
                       throws IOException
Writes Sequences from a SequenceIterator to an OutputStream in Fasta Format. This makes for a useful format filter where a StreamReader can be sent to the RichStreamWriter after formatting.

Parameters:
os - The stream to write FASTA-formatted data to
in - The source of input RichSequences
ns - a Namespace to write the RichSequences to. Null implies that it should use the namespace specified in the individual sequence.
Throws:
IOException - if there is an IO problem

writeFasta

public static void writeFasta(OutputStream os,
                              Sequence seq,
                              Namespace ns)
                       throws IOException
Writes a single Sequence to an OutputStream in Fasta format.

Parameters:
os - the OutputStream.
seq - the Sequence.
ns - a Namespace to write the sequences to. Null implies that it should use the namespace specified in the individual sequence.
Throws:
IOException - if there is an IO problem

writeFasta

public static void writeFasta(OutputStream os,
                              Sequence seq,
                              Namespace ns,
                              FastaHeader header)
                       throws IOException
Writes a single Sequence to an OutputStream in FASTA format, using the supplied FastaHeader to build the header line.

Parameters:
os - the OutputStream.
seq - the Sequence.
ns - the Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
header - a FastaHeader that controls the fields in the header.
Throws:
IOException - if there is an IO problem

writeGenbank

public static void writeGenbank(OutputStream os,
                                SequenceIterator in,
                                Namespace ns)
                         throws IOException
Writes the sequences from a SequenceIterator to an OutputStream in GenBank format. This is useful as a format filter: sequences read by a StreamReader in one format can be handed to a RichStreamWriter and written out as GenBank.

Parameters:
os - the stream to write GenBank-formatted data to
in - the source of input Sequences
ns - the Namespace to write the sequences to. If null, the namespace specified in each individual sequence is used.
Throws:
IOException - if there is an IO problem

writeGenbank

public static void writeGenbank(OutputStream os,
                                Sequence seq,
                                Namespace ns)
                         throws IOException
Writes a single Sequence to an OutputStream in GenBank format.

Parameters:
os - the OutputStream.
seq - the Sequence.
ns - the Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
Throws:
IOException - if there is an IO problem

writeINSDseq

public static void writeINSDseq(OutputStream os,
                                SequenceIterator in,
                                Namespace ns)
                         throws IOException
Writes the sequences from a SequenceIterator to an OutputStream in INSDseq format. This is useful as a format filter: sequences read by a StreamReader in one format can be handed to a RichStreamWriter and written out as INSDseq.

Parameters:
os - the stream to write INSDseq-formatted data to
in - the source of input Sequences
ns - the Namespace to write the sequences to. If null, the namespace specified in each individual sequence is used.
Throws:
IOException - if there is an IO problem

writeINSDseq

public static void writeINSDseq(OutputStream os,
                                Sequence seq,
                                Namespace ns)
                         throws IOException
Writes a single Sequence to an OutputStream in INSDseq format.

Parameters:
os - the OutputStream.
seq - the Sequence.
ns - the Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
Throws:
IOException - if there is an IO problem

writeEMBLxml

public static void writeEMBLxml(OutputStream os,
                                SequenceIterator in,
                                Namespace ns)
                         throws IOException
Writes the sequences from a SequenceIterator to an OutputStream in EMBLxml format. This is useful as a format filter: sequences read by a StreamReader in one format can be handed to a RichStreamWriter and written out as EMBLxml.

Parameters:
os - the stream to write EMBLxml-formatted data to
in - the source of input Sequences
ns - the Namespace to write the sequences to. If null, the namespace specified in each individual sequence is used.
Throws:
IOException - if there is an IO problem

writeEMBLxml

public static void writeEMBLxml(OutputStream os,
                                Sequence seq,
                                Namespace ns)
                         throws IOException
Writes a single Sequence to an OutputStream in EMBLxml format.

Parameters:
os - the OutputStream.
seq - the Sequence.
ns - the Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
Throws:
IOException - if there is an IO problem

writeEMBL

public static void writeEMBL(OutputStream os,
                             SequenceIterator in,
                             Namespace ns)
                      throws IOException
Writes the sequences from a SequenceIterator to an OutputStream in EMBL format. This is useful as a format filter: sequences read by a StreamReader in one format can be handed to a RichStreamWriter and written out as EMBL.

Parameters:
os - the stream to write EMBL-formatted data to
in - the source of input Sequences
ns - the Namespace to write the sequences to. If null, the namespace specified in each individual sequence is used.
Throws:
IOException - if there is an IO problem

writeEMBL

public static void writeEMBL(OutputStream os,
                             Sequence seq,
                             Namespace ns)
                      throws IOException
Writes a single Sequence to an OutputStream in EMBL format.

Parameters:
os - the OutputStream.
seq - the Sequence.
ns - the Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
Throws:
IOException - if there is an IO problem

writeUniProt

public static void writeUniProt(OutputStream os,
                                SequenceIterator in,
                                Namespace ns)
                         throws IOException
Writes the sequences from a SequenceIterator to an OutputStream in UniProt format. This is useful as a format filter: sequences read by a StreamReader in one format can be handed to a RichStreamWriter and written out as UniProt.

Parameters:
os - the stream to write UniProt-formatted data to
in - the source of input Sequences
ns - the Namespace to write the sequences to. If null, the namespace specified in each individual sequence is used.
Throws:
IOException - if there is an IO problem

writeUniProt

public static void writeUniProt(OutputStream os,
                                Sequence seq,
                                Namespace ns)
                         throws IOException
Writes a single Sequence to an OutputStream in UniProt format.

Parameters:
os - the OutputStream.
seq - the Sequence.
ns - the Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
Throws:
IOException - if there is an IO problem

writeUniProtXML

public static void writeUniProtXML(OutputStream os,
                                   SequenceIterator in,
                                   Namespace ns)
                            throws IOException
Writes the sequences from a SequenceIterator to an OutputStream in UniProt XML format. This is useful as a format filter: sequences read by a StreamReader in one format can be handed to a RichStreamWriter and written out as UniProt XML.

Parameters:
os - the stream to write UniProt XML-formatted data to
in - the source of input Sequences
ns - the Namespace to write the sequences to. If null, the namespace specified in each individual sequence is used.
Throws:
IOException - if there is an IO problem

writeUniProtXML

public static void writeUniProtXML(OutputStream os,
                                   Sequence seq,
                                   Namespace ns)
                            throws IOException
Writes a single Sequence to an OutputStream in UniProt XML format.

Parameters:
os - the OutputStream.
seq - the Sequence.
ns - the Namespace to write the sequence to. If null, the namespace specified in the sequence itself is used.
Throws:
IOException - if there is an IO problem

getDNAParser

public static SymbolTokenization getDNAParser()
Creates a DNA symbol tokenizer.

Returns:
a SymbolTokenization for parsing DNA.

getRNAParser

public static SymbolTokenization getRNAParser()
Creates an RNA symbol tokenizer.

Returns:
a SymbolTokenization for parsing RNA.

getNucleotideParser

public static SymbolTokenization getNucleotideParser()
Creates a nucleotide symbol tokenizer.

Returns:
a SymbolTokenization for parsing nucleotides.

getProteinParser

public static SymbolTokenization getProteinParser()
Creates a protein symbol tokenizer.

Returns:
a SymbolTokenization for parsing protein sequences.
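
The four parser methods above each return a tokenizer that converts between the characters in a file and the symbols of an alphabet. As a rough conceptual sketch only, the plain-JDK code below shows that two-way mapping for DNA. It does not use or reproduce BioJava's SymbolTokenization: the `Base` enum, the method names, the lower-case output and the restriction to the four unambiguous bases are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-JDK stand-in for the idea of a symbol tokenizer; not a BioJava class.
final class DnaTokenizationSketch {

    // Hypothetical four-letter DNA alphabet (no ambiguity or gap symbols).
    enum Base { A, C, G, T }

    // Characters -> symbols. Case-insensitive; rejects anything outside the alphabet.
    static List<Base> parse(String text) {
        List<Base> symbols = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            char ch = Character.toUpperCase(text.charAt(i));
            switch (ch) {
                case 'A': symbols.add(Base.A); break;
                case 'C': symbols.add(Base.C); break;
                case 'G': symbols.add(Base.G); break;
                case 'T': symbols.add(Base.T); break;
                default:
                    throw new IllegalArgumentException(
                            "Not a DNA symbol at position " + i + ": " + text.charAt(i));
            }
        }
        return symbols;
    }

    // Symbols -> characters, written in lower case (an assumed convention).
    static String tokenize(List<Base> symbols) {
        StringBuilder sb = new StringBuilder(symbols.size());
        for (Base b : symbols) {
            sb.append(Character.toLowerCase(b.name().charAt(0)));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Base> symbols = parse("ACgt");
        System.out.println(symbols);
        System.out.println(tokenize(symbols));
    }
}
```

A tokenizer of this kind is what the read methods of this class take as their `sTok` argument when a custom symbol type is needed. Choosing the DNA, RNA, nucleotide or protein variant determines which characters are accepted.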