ribosome                                                update 27 Aug 2011 

      NAME
           ribosome - translates nucleic acid into protein
      
      SYNOPSIS
           ribosome [-g gcfile] [-p] [-n] [-f1|-f2|-f3|-f123] < input > output
      
      DESCRIPTION
           ribosome reads a file of one or more nucleic acid sequences
           and writes the corresponding amino acid sequence, in the standard
           one letter code, to output. Ribosome begins translating at the
           first nucleotide in each input sequence and continues to the end.
           If the length of the translated sequence is not divisible by 3,
           ribosome pads the final codon with N's and attempts to use ambi-
           guity rules to translate the final codon. Based on the genetic
           code used, ribosome derives a set of rules to resolve all ambi-
           guities that can possibly be resolved.
	   
       OPTIONS   

           -g    read in an alternative genetic code from gcfile.  If this
                 option is not specified, ribosome uses the universal
                 genetic code.
                             
           gcfile - This file specifies an alternative genetic code. An
           example is shown below.  ribosome reads the first 64 legal
           capital letters as amino acids.  Consequently, lowercase letters
           can be used for annotation purposes, as shown in the example.
           All non-amino acid characters are ignored.

                    sgc2 - yeast mitochondrial genetic code

                                  second position
           first position  -------------------------------    third position
           (5' end)        u         c         a         g    (3' end) 
           -----------------------------------------------------------------
           u               F         S         Y         C    u
                           F         S         Y         C    c
                           L         S         *         W    a
                           L         S         *         W    g
           -----------------------------------------------------------------
           c               T         P         H         R    u
                           T         P         H         R    c
                           T         P         Q         R    a
                           T         P         Q         R    g
           -----------------------------------------------------------------
           a               I         T         N         S    u
                           I         T         N         S    c
                           M         T         K         R    a
                           M         T         K         R    g
           -----------------------------------------------------------------
           g               V         A         D         G    u
                           V         A         D         G    c
                           V         A         E         G    a
                           V         A         E         G    g

           -p - Pad the protein sequence with dashes (-) so that the amino acids in
	   the protein sequences align with the codons in the DNA. Thus, if the codon
	   is ATG, the amino acid will be printed as M-- (assuming the universal
	   genetic code). As well, proteins in the 2nd reading frame will have '-'
	   added at the beginning of the sequence, and proteins in the 3rd reading
	   frame will have '--' added to the beginning.
	   
	   -fx - Print the sequence in the reading frame(s) specified by x, where
	   x can be 1,2,3, or 123 for all three reading frames on the input strand.
	   That is -f123 will print the proteins in all 3 reading frames.
	   Currently, ribosome does NOT print proteins from the opposite strand.
	   
	   -n - append the number of the reading frame to the sequence name. Thus, a
	   protein named X88754, if translated in the 2nd reading frame, will have
	   its name printed as X88754_2.

           input - If the first line of the file begins with '>' or ';', 
           input will be read as the standard .wrp (Pearson) format, 
           such as that produced by getob:
           
           >name
           ; one or more comment lines (optional)
           sequence lines

           
           Otherwise, it will be assumed that the file ONLY contains 
           sequence, and all legal IUPAC/IUB DNA characters will be
           read as sequence. 

      SEE ALSO
            getob

     AUTHOR
       Dr. Brian Fristensky
       Dept. of Plant Science
       University of Manitoba
       Winnipeg, MB  Canada  R3T 2N2
       Phone: 204-474-6085
       FAX: 204-261-5732
       frist@cc.umanitoba.ca

     REFERENCE
       Fristensky, B. (1993) Feature expressions: creating and manipulating
       sequence datasets. Nucleic Acids Research 21:5997-6003.