ribosome                                          update 27 Aug 2011

NAME
      ribosome - translates nucleic acid into protein

SYNOPSIS
      ribosome [-g gcfile] [-p] [-n] [-f1|-f2|-f3|-f123] < input > output

DESCRIPTION
      ribosome reads a file of one or more nucleic acid sequences and
      writes the corresponding amino acid sequences, in the standard
      one-letter code, to output. ribosome begins translating at the
      first nucleotide in each input sequence and continues to the end.
      If the length of the sequence being translated is not divisible
      by 3, ribosome pads the final codon with N's and attempts to use
      ambiguity rules to translate that codon. Based on the genetic
      code used, ribosome derives a set of rules to resolve all
      ambiguities that can possibly be resolved.

OPTIONS
      -g gcfile
           Read in an alternative genetic code from gcfile. If this
           option is not specified, ribosome uses the universal
           genetic code.

      gcfile
           This file specifies an alternative genetic code. An example
           is shown below. ribosome reads the first 64 legal capital
           letters as amino acids. Consequently, lowercase letters can
           be used for annotation purposes, as shown in the example.
           All non-amino acid characters are ignored.

                  sgc2 - yeast mitochondrial genetic code

                                second position
      first position    -------------------------------  third position
         (5' end)          u       c       a       g        (3' end)
      -----------------------------------------------------------------
             u             F       S       Y       C            u
                           F       S       Y       C            c
                           L       S       *       W            a
                           L       S       *       W            g
      -----------------------------------------------------------------
             c             T       P       H       R            u
                           T       P       H       R            c
                           T       P       Q       R            a
                           T       P       Q       R            g
      -----------------------------------------------------------------
             a             I       T       N       S            u
                           I       T       N       S            c
                           M       T       K       R            a
                           M       T       K       R            g
      -----------------------------------------------------------------
             g             V       A       D       G            u
                           V       A       D       G            c
                           V       A       E       G            a
                           V       A       E       G            g

      -p   Pad the protein sequence with dashes (-) so that the amino
           acids in the protein sequence align with the codons in the
           DNA. Thus, if the codon is ATG, the amino acid will be
           printed as M-- (assuming the universal genetic code). In
           addition, proteins in the 2nd reading frame will have '-'
           added at the beginning of the sequence, and proteins in the
           3rd reading frame will have '--' added at the beginning.

      -fx  Print the sequence in the reading frame(s) specified by x,
           where x can be 1, 2, 3, or 123 for all three reading frames
           on the input strand. That is, -f123 will print the proteins
           in all 3 reading frames. Currently, ribosome does NOT print
           proteins from the opposite strand.

      -n   Append the number of the reading frame to the sequence
           name. Thus, a protein named X88754, if translated in the
           2nd reading frame, will have its name printed as X88754_2.

      input
           If the first line of the file begins with '>' or ';', input
           will be read as the standard .wrp (Pearson) format, such as
           that produced by getob:

                >name
                ; one or more comment lines (optional)
                sequence lines

           Otherwise, it will be assumed that the file ONLY contains
           sequence, and all legal IUPAC/IUB DNA characters will be
           read as sequence.

SEE ALSO
      getob

AUTHOR
      Dr. Brian Fristensky
      Dept. of Plant Science
      University of Manitoba
      Winnipeg, MB Canada R3T 2N2
      Phone: 204-474-6085
      FAX: 204-261-5732
      frist@cc.umanitoba.ca

REFERENCE
      Fristensky, B. (1993) Feature expressions: creating and
      manipulating sequence datasets. Nucleic Acids Research
      21:5997-6003.
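
EXAMPLE
      The behaviour described under DESCRIPTION, -p and -fx can be
      illustrated with a short Python sketch. This is not ribosome's
      source code. It is a minimal re-implementation under stated
      assumptions: the universal genetic code only (no gcfile
      parsing), an unresolvable codon printed as X (this page does
      not say which symbol ribosome uses), and hypothetical names
      translate, translate_codon, CODON_TABLE and IUPAC. An ambiguous
      codon is resolved by expanding it to every codon it could stand
      for and accepting the result only if all of them encode the
      same amino acid.

```python
from itertools import product

# Universal genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# IUPAC/IUB nucleotide codes and the bases each one can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def translate_codon(codon):
    """Translate one codon, resolving ambiguity codes where possible."""
    # Expand the codon into every unambiguous codon it could represent.
    candidates = product(*(IUPAC[base] for base in codon))
    amino_acids = {CODON_TABLE["".join(c)] for c in candidates}
    # Resolvable only if every candidate gives the same amino acid.
    # "X" for the unresolvable case is an assumption of this sketch.
    return amino_acids.pop() if len(amino_acids) == 1 else "X"


def translate(seq, frame=1, pad=False):
    """Translate seq in reading frame 1, 2 or 3 of the input strand."""
    # Normalise: uppercase, RNA to DNA, drop non-IUPAC characters.
    seq = "".join(b for b in seq.upper().replace("U", "T") if b in IUPAC)
    offset = frame - 1
    coding = seq[offset:]
    # Pad an incomplete final codon with N's, as DESCRIPTION states.
    coding += "N" * (-len(coding) % 3)
    residues = [translate_codon(coding[i:i + 3])
                for i in range(0, len(coding), 3)]
    if pad:
        # -p style: one residue per codon, followed by two dashes,
        # with leading dashes for frames 2 and 3.
        return "-" * offset + "".join(aa + "--" for aa in residues)
    return "".join(residues)
```

      With these definitions, translate("ATGGC") gives "MA": the
      final codon is padded to GCN, and every GCN codon encodes
      alanine. translate("ATGGCC", frame=2, pad=True) gives
      "-W--P--", matching the dash layout described under -p. The
      equivalent ribosome invocation, following SYNOPSIS, would be
      ribosome -p -f2 < input > output.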