org.biojava.bio.seq
Class DNATools

java.lang.Object
  extended by org.biojava.bio.seq.DNATools

public final class DNATools
extends Object

Useful functionality for processing DNA sequences.

Author:
Matthew Pocock, Keith James (docs), Mark Schreiber, David Huen, Richard Holland

Method Summary
static AtomicSymbol a()
           
static Symbol b()
           
static AtomicSymbol c()
           
static Symbol complement(Symbol sym)
          Complement the symbol.
static SymbolList complement(SymbolList list)
          Retrieve a complement view of list.
static ReversibleTranslationTable complementTable()
          Get a translation table for complementing DNA symbols.
static SymbolList createDNA(String dna)
          Return a new DNA SymbolList for dna.
static Sequence createDNASequence(String dna, String name)
          Return a new DNA Sequence for dna.
static GappedSequence createGappedDNASequence(String dna, String name)
          Return a new gapped DNA sequence for dna.
static Symbol d()
           
static char dnaToken(Symbol sym)
          Get a single-character token for a DNA symbol.
static SymbolList flip(SymbolList list, StrandedFeature.Strand strand)
          Returns a SymbolList that is reverse complemented if the strand is negative, and the original one if it is not.
static Symbol forIndex(int index)
          Return the symbol for an index - compatible with index.
static Symbol forSymbol(char token)
          Retrieve the symbol for a character token.
static AtomicSymbol g()
           
static FiniteAlphabet getCodonAlphabet()
          Gets the (DNA x DNA x DNA) alphabet.
static FiniteAlphabet getDNA()
          Return the DNA alphabet.
static Distribution getDNADistribution(double fractionGC)
          Return a SimpleDistribution with the specified GC content.
static FiniteAlphabet getDNAxDNA()
          Gets the (DNA x DNA) alphabet.
static Distribution getDNAxDNADistribution(double fractionGC0, double fractionGC1)
          Return a (DNA x DNA) cross-product Distribution with the specified GC content in each component alphabet.
static Symbol h()
           
static int index(Symbol sym)
          Return an integer index for a symbol - compatible with forIndex.
static Symbol k()
           
static Symbol m()
           
static Symbol n()
           
static Symbol r()
           
static SymbolList reverseComplement(SymbolList list)
          Retrieve a reverse-complement view of list.
static Symbol s()
           
static AtomicSymbol t()
           
static SymbolList toProtein(SymbolList syms)
          Convenience method that directly converts a DNA sequence to RNA then to protein.
static SymbolList toProtein(SymbolList syms, int start, int end)
          Convenience method to translate a region of a DNA sequence directly into protein.
static SymbolList toRNA(SymbolList syms)
          Converts a SymbolList from the DNA Alphabet to the RNA Alphabet.
static SymbolList transcribeToRNA(SymbolList syms)
          Transcribes DNA to RNA.
static Symbol v()
           
static Symbol w()
           
static Symbol y()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Method Detail

a

public static AtomicSymbol a()

g

public static AtomicSymbol g()

c

public static AtomicSymbol c()

t

public static AtomicSymbol t()

n

public static Symbol n()

m

public static Symbol m()

r

public static Symbol r()

w

public static Symbol w()

s

public static Symbol s()

y

public static Symbol y()

k

public static Symbol k()

v

public static Symbol v()

h

public static Symbol h()

d

public static Symbol d()

b

public static Symbol b()

getDNA

public static FiniteAlphabet getDNA()
Return the DNA alphabet.

Returns:
a flyweight version of the DNA alphabet

getDNAxDNA

public static FiniteAlphabet getDNAxDNA()
Gets the (DNA x DNA) alphabet.

Returns:
a flyweight version of the (DNA x DNA) alphabet

getCodonAlphabet

public static FiniteAlphabet getCodonAlphabet()
Gets the (DNA x DNA x DNA) alphabet.

Returns:
a flyweight version of the (DNA x DNA x DNA) alphabet

createDNA

public static SymbolList createDNA(String dna)
                            throws IllegalSymbolException
Return a new DNA SymbolList for dna.

Parameters:
dna - a String to parse into DNA
Returns:
a SymbolList created from dna
Throws:
IllegalSymbolException - if dna contains any non-DNA characters

createDNASequence

public static Sequence createDNASequence(String dna,
                                         String name)
                                  throws IllegalSymbolException
Return a new DNA Sequence for dna.

Parameters:
dna - a String to parse into DNA
name - a String to use as the name
Returns:
a Sequence created from dna
Throws:
IllegalSymbolException - if dna contains any non-DNA characters

createGappedDNASequence

public static GappedSequence createGappedDNASequence(String dna,
                                                     String name)
                                              throws IllegalSymbolException
Return a new gapped DNA sequence for dna.

Throws:
IllegalSymbolException

index

public static int index(Symbol sym)
                 throws IllegalSymbolException
Return an integer index for a symbol - compatible with forIndex.

The index for a symbol is stable across virtual machines and invocations.

Parameters:
sym - the Symbol to index
Returns:
the index for that symbol
Throws:
IllegalSymbolException - if sym is not a member of the DNA alphabet

forIndex

public static Symbol forIndex(int index)
                       throws IndexOutOfBoundsException
Return the symbol for an index - compatible with index.

The index for a symbol is stable across virtual machines and invocations.

Parameters:
index - the index to look up
Returns:
the symbol at that index
Throws:
IndexOutOfBoundsException - if index is not between 0 and 3
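The round-trip contract between index and forIndex can be illustrated without the library. The following is a stand-alone sketch on plain characters, not BioJava code: the class name DnaIndexSketch and the particular ordering of the four bases are assumptions made for the example, and the exception types only approximate those documented above.

```java
import java.util.List;

// Stand-alone illustration of an index/forIndex pair; not BioJava code.
final class DnaIndexSketch {
    // Assumed ordering, chosen for illustration only.
    private static final List<Character> ORDER = List.of('a', 'g', 'c', 't');

    // Maps an unambiguous base to its fixed position in ORDER.
    static int index(char base) {
        int i = ORDER.indexOf(Character.toLowerCase(base));
        if (i < 0) {
            throw new IllegalArgumentException("not an unambiguous DNA base: " + base);
        }
        return i;
    }

    // Inverse of index: maps a position in 0..3 back to its base.
    static char forIndex(int index) {
        if (index < 0 || index >= ORDER.size()) {
            throw new IndexOutOfBoundsException("index must be between 0 and 3: " + index);
        }
        return ORDER.get(index);
    }

    public static void main(String[] args) {
        for (int i = 0; i < 4; i++) {
            System.out.println(i + " -> " + forIndex(i) + " -> " + index(forIndex(i)));
        }
    }
}
```

Because the mapping is a fixed table, index(forIndex(i)) == i holds for every valid i, which is what makes the indices usable as array offsets.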

complement

public static Symbol complement(Symbol sym)
                         throws IllegalSymbolException
Complement the symbol.

Parameters:
sym - the symbol to complement
Returns:
a Symbol that is the complement of sym
Throws:
IllegalSymbolException - if sym is not a member of the DNA alphabet

forSymbol

public static Symbol forSymbol(char token)
                        throws IllegalSymbolException
Retrieve the symbol for a character token.

Parameters:
token - the char to look up
Returns:
the symbol for that char
Throws:
IllegalSymbolException - if the char is not a valid IUB dna code

complement

public static SymbolList complement(SymbolList list)
                             throws IllegalAlphabetException
Retrieve a complement view of list.

Parameters:
list - the SymbolList to complement
Returns:
a SymbolList that is the complement of list
Throws:
IllegalAlphabetException - if the alphabet of list cannot be complemented

reverseComplement

public static SymbolList reverseComplement(SymbolList list)
                                    throws IllegalAlphabetException
Retrieve a reverse-complement view of list.

Parameters:
list - the SymbolList to reverse-complement
Returns:
a SymbolList that is the reverse complement of list
Throws:
IllegalAlphabetException - if the alphabet of list cannot be complemented
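To make the difference between complement and reverseComplement concrete, here is a stand-alone sketch on plain strings. It is not BioJava code and does not use SymbolList: the class name ReverseComplementSketch is hypothetical, the pairing table follows the standard IUPAC nucleotide codes, and the methods return new strings, whereas the methods documented above return views.

```java
// Stand-alone illustration of complement and reverse complement; not BioJava code.
final class ReverseComplementSketch {
    // IUPAC nucleotide codes and, at the same position, their complements.
    private static final String BASES       = "acgtmrwsykvhdbn";
    private static final String COMPLEMENTS = "tgcakywsrmbdhvn";

    // Complements a single code, e.g. a -> t, r (a or g) -> y (t or c).
    static char complement(char base) {
        int i = BASES.indexOf(Character.toLowerCase(base));
        if (i < 0) {
            throw new IllegalArgumentException("not a DNA code: " + base);
        }
        return COMPLEMENTS.charAt(i);
    }

    // Complements every position, keeping the order of the input.
    static String complement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = 0; i < dna.length(); i++) {
            out.append(complement(dna.charAt(i)));
        }
        return out.toString();
    }

    // Complements every position and reverses the order.
    static String reverseComplement(String dna) {
        return new StringBuilder(complement(dna)).reverse().toString();
    }

    public static void main(String[] args) {
        System.out.println(complement("aacgt"));
        System.out.println(reverseComplement("aacgt"));
    }
}
```

Applying reverseComplement twice gives back the (lower-cased) input, which is a convenient sanity check for any complement table.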

flip

public static SymbolList flip(SymbolList list,
                              StrandedFeature.Strand strand)
                       throws IllegalAlphabetException
Returns a SymbolList that is reverse complemented if the strand is negative, and the original one if it is not.

Parameters:
list - the SymbolList to view
strand - the Strand to use
Returns:
the appropriate view of the SymbolList
Throws:
IllegalAlphabetException - if the alphabet of list cannot be complemented
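The strand-dependent behaviour of flip can be sketched on plain strings as follows. This is an illustration only, not BioJava code: FlipSketch and its Strand enum are hypothetical stand-ins (the enum for StrandedFeature.Strand), and only the four unambiguous bases are handled.

```java
// Stand-alone illustration of strand-dependent flipping; not BioJava code.
final class FlipSketch {
    // Hypothetical stand-in for StrandedFeature.Strand.
    enum Strand { POSITIVE, NEGATIVE, UNKNOWN }

    // Complements one unambiguous base.
    static char complement(char base) {
        switch (base) {
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default: throw new IllegalArgumentException("not a DNA base: " + base);
        }
    }

    // Walks the input backwards, complementing each base.
    static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            out.append(complement(dna.charAt(i)));
        }
        return out.toString();
    }

    // Reverse complements only for the negative strand; otherwise returns the input unchanged.
    static String flip(String dna, Strand strand) {
        return strand == Strand.NEGATIVE ? reverseComplement(dna) : dna;
    }

    public static void main(String[] args) {
        System.out.println(flip("aacg", Strand.POSITIVE));
        System.out.println(flip("aacg", Strand.NEGATIVE));
    }
}
```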

complementTable

public static ReversibleTranslationTable complementTable()
Get a translation table for complementing DNA symbols.

Since:
1.1

dnaToken

public static char dnaToken(Symbol sym)
                     throws IllegalSymbolException
Get a single-character token for a DNA symbol.

Throws:
IllegalSymbolException - if sym is not a member of the DNA alphabet

getDNADistribution

public static Distribution getDNADistribution(double fractionGC)
Return a SimpleDistribution with the specified GC content.

Parameters:
fractionGC - (G+C) content as a fraction.
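One plausible reading of "specified GC content" is that G and C share fractionGC equally and A and T share the remainder equally. The sketch below works under that assumption; it is not BioJava code, the class name GcDistributionSketch is hypothetical, and a plain Map stands in for the Distribution type.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-alone illustration of base weights for a given GC fraction; not BioJava code.
final class GcDistributionSketch {
    // Assumption: g and c each get fractionGC / 2, a and t each get (1 - fractionGC) / 2.
    static Map<Character, Double> dnaDistribution(double fractionGC) {
        if (fractionGC < 0.0 || fractionGC > 1.0) {
            throw new IllegalArgumentException("fractionGC must be in [0, 1]: " + fractionGC);
        }
        Map<Character, Double> weights = new LinkedHashMap<>();
        weights.put('a', (1.0 - fractionGC) / 2.0);
        weights.put('c', fractionGC / 2.0);
        weights.put('g', fractionGC / 2.0);
        weights.put('t', (1.0 - fractionGC) / 2.0);
        return weights;
    }

    public static void main(String[] args) {
        System.out.println(dnaDistribution(0.6));
    }
}
```

Under this assumption the four weights always sum to 1 (up to floating-point rounding), so the result is a valid probability distribution for any fractionGC between 0 and 1.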

getDNAxDNADistribution

public static Distribution getDNAxDNADistribution(double fractionGC0,
                                                  double fractionGC1)
Return a (DNA x DNA) cross-product Distribution with the specified GC content in each component alphabet.

Parameters:
fractionGC0 - (G+C) content of first sequence as a fraction.
fractionGC1 - (G+C) content of second sequence as a fraction.
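For a cross-product alphabet, a natural construction is to treat the two components as independent, so that the weight of a pair is the product of the two single-base weights. The sketch below assumes exactly that, together with an even split of each GC fraction between g and c; it is not BioJava code, and PairDistributionSketch and its method names are hypothetical.

```java
// Stand-alone illustration of a (DNA x DNA) weight table; not BioJava code.
final class PairDistributionSketch {
    // Row and column order of the weight table.
    static final char[] BASES = {'a', 'c', 'g', 't'};

    // Single-base weights in BASES order, assuming an even split within GC and within AT.
    static double[] marginal(double fractionGC) {
        double gc = fractionGC / 2.0;
        double at = (1.0 - fractionGC) / 2.0;
        return new double[] {at, gc, gc, at};
    }

    // Assumption: the components are independent, so joint[i][j] = first[i] * second[j].
    static double[][] pairWeights(double fractionGC0, double fractionGC1) {
        double[] first = marginal(fractionGC0);
        double[] second = marginal(fractionGC1);
        double[][] joint = new double[BASES.length][BASES.length];
        for (int i = 0; i < BASES.length; i++) {
            for (int j = 0; j < BASES.length; j++) {
                joint[i][j] = first[i] * second[j];
            }
        }
        return joint;
    }

    public static void main(String[] args) {
        double[][] joint = pairWeights(0.4, 0.7);
        for (int i = 0; i < BASES.length; i++) {
            for (int j = 0; j < BASES.length; j++) {
                System.out.println("(" + BASES[i] + " " + BASES[j] + ") = " + joint[i][j]);
            }
        }
    }
}
```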

toRNA

public static SymbolList toRNA(SymbolList syms)
                        throws IllegalAlphabetException
Converts a SymbolList from the DNA Alphabet to the RNA Alphabet.

Parameters:
syms - the SymbolList to convert to RNA
Returns:
a view on syms where Symbols have been converted to RNA. Most significantly, t's become u's. The 5' to 3' order of the Symbols is conserved.
Throws:
IllegalAlphabetException - if syms is not DNA.
Since:
1.4

transcribeToRNA

public static SymbolList transcribeToRNA(SymbolList syms)
                                  throws IllegalAlphabetException
Transcribes DNA to RNA. This method represents the biological reality more closely than toRNA(SymbolList syms) does. The DNA SymbolList passed in is assumed to be the template strand in the 5' to 3' orientation. The resulting RNA is transcribed from this template, so it is effectively the reverse complement of syms in the RNA alphabet. The method is equivalent to calling reverseComplement() and then toRNA().

If you are dealing with cDNA sequences that you want converted to RNA, you would be better off calling toRNA(SymbolList syms).

Parameters:
syms - the SymbolList to convert to RNA
Returns:
a view on syms where Symbols have been converted to RNA.
Throws:
IllegalAlphabetException - if syms is not DNA.
Since:
1.4
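The difference between toRNA and transcribeToRNA is easiest to see side by side. The following stand-alone sketch works on plain lower-case strings and is not BioJava code: the class name TranscriptionSketch is hypothetical, only the four unambiguous bases are handled, and new strings are returned rather than views.

```java
// Stand-alone illustration of toRNA versus transcribeToRNA; not BioJava code.
final class TranscriptionSketch {
    // Complements one unambiguous DNA base.
    static char complement(char base) {
        switch (base) {
            case 'a': return 't';
            case 't': return 'a';
            case 'c': return 'g';
            case 'g': return 'c';
            default: throw new IllegalArgumentException("not a DNA base: " + base);
        }
    }

    // Walks the input backwards, complementing each base.
    static String reverseComplement(String dna) {
        StringBuilder out = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            out.append(complement(dna.charAt(i)));
        }
        return out.toString();
    }

    // Same 5' to 3' order, with every t replaced by u.
    static String toRna(String dna) {
        return dna.replace('t', 'u');
    }

    // Treats the input as the template strand: reverse complement, then convert to RNA.
    static String transcribeToRna(String template) {
        return toRna(reverseComplement(template));
    }

    public static void main(String[] args) {
        System.out.println(toRna("atgc"));
        System.out.println(transcribeToRna("atgc"));
    }
}
```

For a coding-strand or cDNA input, toRna already yields the mRNA sequence; transcribeToRna is the right choice only when the input is the template strand.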

toProtein

public static SymbolList toProtein(SymbolList syms)
                            throws IllegalAlphabetException
Convenience method that directly converts a DNA sequence to RNA and then to protein. The translated protein is from the +1 reading frame of the SymbolList. The whole SymbolList is translated, although up to 2 trailing DNA residues may be dropped if they do not form a full codon.

Parameters:
syms - the sequence to be translated.
Returns:
the translated protein sequence.
Throws:
IllegalAlphabetException - if syms is not from the DNA alphabet.
Since:
1.5.1

toProtein

public static SymbolList toProtein(SymbolList syms,
                                   int start,
                                   int end)
                            throws IllegalAlphabetException
Convenience method to translate a region of a DNA sequence directly into protein. The start and end of the region can be specified; if the length of the region is not evenly divisible by three, the region is truncated at its end until it consists of complete codons.

Parameters:
syms - the DNA sequence to be translated.
start - the location to begin translation.
end - the end of the translated region.
Returns:
the translated protein sequence.
Throws:
IllegalAlphabetException - if syms is not from the DNA alphabet.
Since:
1.5.1
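The truncation rule shared by both toProtein variants can be sketched as a codon splitter. This is a stand-alone illustration on plain strings, not BioJava code: the class name CodonSketch is hypothetical, start and end are assumed to be 1-based and inclusive, and the lookup from codon to amino acid is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-alone illustration of codon splitting with truncation; not BioJava code.
final class CodonSketch {
    // Splits the whole sequence into codons in the +1 frame,
    // dropping up to 2 trailing residues that do not form a full codon.
    static List<String> codons(String dna) {
        List<String> out = new ArrayList<>();
        int usable = dna.length() - dna.length() % 3;
        for (int i = 0; i < usable; i += 3) {
            out.add(dna.substring(i, i + 3));
        }
        return out;
    }

    // Splits a region into codons. Assumption: start and end are 1-based and inclusive.
    static List<String> codons(String dna, int start, int end) {
        return codons(dna.substring(start - 1, end));
    }

    public static void main(String[] args) {
        System.out.println(codons("atggcctaag"));
        System.out.println(codons("atggcctaag", 2, 9));
    }
}
```

In the first call the final g is dropped because ten residues leave one over after three codons; in the second, the eight-residue region loses its last two residues for the same reason.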