TRANSLATE at EXPASY http://www.expasy.org/tools/dna.html
Other translation programs:
TRANSLATE at the NIH http://bimas.dcrt.nih.gov:80/molbio/translate/
TRANSLATE at EBI http://www2.ebi.ac.uk/translate/
- Amino acids in a protein are usually represented using an internationally recognized 1-letter code. (Click here for more information on the different ways to represent amino acids.)
- Remember, each amino acid is encoded by a group of three nucleotides called a codon. Therefore the protein you get is determined by the starting position in the DNA. The example below shows both strands of the sample sequence, along with the amino acids resulting from beginning at position 1, position 2, or position 3. In frame 1, "GCT TCT CAA" translates into "ASQ"; in frame 2, "CTT CTC AAA" translates into "LLK", and in frame 3 "TTC TCA AAC" translates into "FSN". (Note: asterisks (*) stand for stop codons.)
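The frame-by-frame translation described above can be sketched in a few lines of Python. This is only an illustration, not the code behind any of the TRANSLATE tools listed; the names `CODON_TABLE` and `translate` are made up for this example. It assumes the standard genetic code and an uppercase sequence containing only A, C, G and T (any other codon is shown as X).

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=1):
    """Translate dna starting at 1-based position `frame` (1, 2, or 3).

    Stop codons are written as '*'; a trailing partial codon is ignored.
    """
    codons = (dna[i:i + 3] for i in range(frame - 1, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


# First 60 bases of the sample sequence.
seq = "GCTTCTCAAACGAGAAGTTATGGTGGCAGCAAGTCGTTTGGCTCTTCTGGTGATAGACGA"
for frame in (1, 2, 3):
    print(frame, translate(seq, frame))
```

For these 60 bases, the three translations begin with ASQ, LLK and FSN, as in the example above.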
For readability, the results have been rewritten with numbering. Each reading frame is in a different color.
Sample output, original orientation
15 30 45 60
GCTTCTCAAACGAGAAGTTATGGTGGCAGCAAGTCGTTTGGCTCTTCTGGTGATAGACGA
CGAAGAGTTTGCTCTTCAATACCACCGTCGTTCAGCAAACCGAGAAGACCACTATCTGCT
A S Q T R S Y G G S K S F G S S G D R R
L L K R E V M V A A S R L A L L V I D E
F S N E K L W W Q Q V V W L F W * * T R
75 90 105 120
GGCTCCTCATCTTCTGGTACAGAGAACAAAAGTTCGGGTCCTTCTAGTTCTTCAAACCAG
CCGAGGAGTAGAAGACCATGTCTCTTGTTTTCAAGCCCAGGAAGATCAAGAAGTTTGGTC
G S S S S G T E N K S S G P S S S S N Q
A P H L L V Q R T K V R V L L V L Q T R
L L I F W Y R E Q K F G S F * F F K P D
135 150 165 180
ACTGCCAAAGTTGAGCCAATCCAGAGATGCAAAGCTAGATCTGAGAGACACCAGAGAACA
TGACGGTTTCAACTCGGTTAGGTCTCTACGTTTCGATCTAGACTCTCTGTGGTCTCTTGT
T A K V E P I Q R C K A R S E R H Q R T
L P K L S Q S R D A K L D L R D T R E H
C Q S * A N P E M Q S * I * E T P E N I
195 210 225 240
TCTGAGCGTGCGGCAGAAGCTCTTGCAGAGAAGAAACTTCGTGATCTCAAAGTCCAGAAA
AGACTCGCACGCCGTCTTCGAGAACGTCTCTTCTTTGAAGCACTAGAGTTTCAGGTCTTT
S E R A A E A L A E K K L R D L K V Q K
L S V R Q K L L Q R R N F V I S K S R K
* A C G R S S C R E E T S * S Q S P E R
255 270 285 300
GAGGAGGCAGAGAGAAATAGGCTCTCGGAAGCTCTTGATGCTGATGTCAAGCGGTGGTCG
CTCCTCCGTCTCTCTTTATCCGAGAGCCTTCGAGAACTACGACTACAGTTCGCCACCAGC
E E A E R N R L S E A L D A D V K R W S
R R Q R E I G S R K L L M L M S S G G R
G G R E K * A L G S S * C * C Q A V V E
315 330 345 360
AACGGAAAGGAAAACAACCTGCGGGCATTGCTCTAACACTCCAATATATCTTGGAGCAGA
TTGCCTTTCCTTTTGTTGGACGCCCGTAACGAGATTGTGAGGTTATATAGAACCTCGTCT
N G K E N N L R A L L * H S N I S W S R
T E R K T T C G H C S N T P I Y L G A E
R K G K Q P A G I A L T L Q Y I L E Q R
375 390
GAGTGATGGAACCATCCCTCTACTGATCTTG
CTCACTACCTTGGTAGGGAGATGACTAGAAC
E * W N H P S T D L
S D G T I P L L I L
V M E P S L Y * S
However, we often don't know the orientation of the DNA that we're sequencing with respect to the RNA. So we also have to translate the complementary strand, read in the opposite orientation, as illustrated below:
Sample output, opposite orientation
377 362 347 332
CAAGATCAGTAGAGGGATGGTTCCATCACTCTCTGCTCCAAGATATATTGGAGTGTTAGA
GTTCTAGTCATCTCCCTACCAAGGTAGTGAGAGACGAGGTTCTATATAACCTCACAATCT
Q D Q * R D G S I T L C S K I Y W S V R
K I S R G M V P S L S A P R Y I G V L E
R S V E G W F H H S L L Q D I L E C * S
317 302 287 272
GCAATGCCCGCAGGTTGTTTTCCTTTCCGTTCGACCACCGCTTGACATCAGCATCAAGAG
CGTTACGGGCGTCCAACAAAAGGAAAGGCAAGCTGGTGGCGAACTGTAGTCGTAGTTCTC
A M P A G C F P F R S T T A * H Q H Q E
Q C P Q V V F L S V R P P L D I S I K S
N A R R L F S F P F D H R L T S A S R A
257 242 227 212
CTTCCGAGAGCCTATTTCTCTCTGCCTCCTCTTTCTGGACTTTGAGATCACGAAGTTTCT
GAAGGCTCTCGGATAAAGAGAGACGGAGGAGAAAGACCTGAAACTCTAGTGCTTCAAAGA
L P R A Y F S L P P L S G L * D H E V S
F R E P I S L C L L F L D F E I T K F L
S E S L F L S A S S F W T L R S R S F F
197 182 167 152
TCTCTGCAAGAGCTTCTGCCGCACGCTCAGATGTTCTCTGGTGTCTCTCAGATCTAGCTT
AGAGACGTTCTCGAAGACGGCGTGCGAGTCTACAAGAGACCACAGAGAGTCTAGATCGAA
S L Q E L L P H A Q M F S G V S Q I * L
L C K S F C R T L R C S L V S L R S S F
S A R A S A A R S D V L W C L S D L A L
137 122 107 92
TGCATCTCTGGATTGGCTCAACTTTGGCAGTCTGGTTTGAAGAACTAGAAGGACCCGAAC
ACGTAGAGACCTAACCGAGTTGAAACCGTCAGACCAAACTTCTTGATCTTCCTGGGCTTG
C I S G L A Q L W Q S G L K N * K D P N
A S L D W L N F G S L V * R T R R T R T
H L W I G S T L A V W F E E L E G P E L
77 62 47 32
TTTTGTTCTCTGTACCAGAAGATGAGGAGCCTCGTCTATCACCAGAAGAGCCAAACGACT
AAAACAAGAGACATGGTCTTCTACTCCTCGGAGCAGATAGTGGTCTTCTCGGTTTGCTGA
F C S L Y Q K M R S L V Y H Q K S Q T T
F V L C T R R * G A S S I T R R A K R L
L F S V P E D E E P R L S P E E P N D L
17 2
TGCTGCCACCATAACTTCTCGTTTGAGAAGC
ACGACGGTGGTATTGAAGAGCAAACTCTTCG
C C H H N F S F E K
A A T I T S R L R S
L P P * L L V * E
Most protein coding sequences begin with ATG, the codon for methionine. However, the sample sequences given here are usually only gene fragments rather than complete genes, so most of them will be missing the beginning of the protein coding sequence.
We now have six different amino acid sequences, but only one can be right. Which one? An "open reading frame" (ORF) is any stretch of DNA that, read in a given frame, contains no STOP codons. The correct ORF, the one used in nature, is almost always the longest ORF. In the example, frames 1 and 2 are both fairly long.
We also need to take into account that sequencing errors could introduce frameshifts. Therefore, it would probably be best to do the database search (next page) with both amino acid sequences, to see which is the correct reading frame.
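The six-frame translation and the "longest ORF" rule of thumb can be sketched together in Python. This is a rough illustration under stated assumptions, not the method used by the tools above: the function names (`reverse_complement`, `six_frames`, `longest_open_stretch`) and the frame labels `+1` to `-3` are invented for this example. It assumes the standard genetic code and an uppercase A/C/G/T sequence, and it does not require an ATG, since the sample sequences are fragments.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the complementary strand, read in the opposite orientation."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna, frame=1):
    """Translate from 1-based position `frame`; '*' marks a stop codon."""
    codons = (dna[i:i + 3] for i in range(frame - 1, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frames(dna):
    """Translate all three frames of both orientations."""
    reverse = reverse_complement(dna)
    frames = {}
    for frame in (1, 2, 3):
        frames[f"+{frame}"] = translate(dna, frame)
        frames[f"-{frame}"] = translate(reverse, frame)
    return frames


def longest_open_stretch(protein):
    """Longest run of amino acids not interrupted by a stop codon."""
    return max(protein.split("*"), key=len)


# Last 31 bases of the sample sequence (positions 361 to 391).
fragment = "GAGTGATGGAACCATCCCTCTACTGATCTTG"
for label, protein in six_frames(fragment).items():
    print(label, protein, longest_open_stretch(protein))
```

Run on a whole sequence, the frame with the longest stop-free stretch is a first candidate for the correct reading frame. Because of possible frameshifts, it is worth keeping the runner-up as well.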