Therefore, you will compare one or more of your translated protein sequences with all sequences in GenBank, using the tblastn program. tblastn translates each DNA sequence in GenBank to protein, in all six reading frames, before comparing it with your protein. In effect, your protein is searched against the GenBank DNA database, which contains all published DNA sequences from plants, animals, fungi, bacteria, archaea, and viruses.
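To make the translation step concrete, here is a minimal sketch of six-frame translation in Python. It is not how tblastn is implemented; it only illustrates the idea. It assumes the standard genetic code (NCBI translation table 1), and the function names and the short example sequence are made up for illustration.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate from the first base; codons with unknown bases become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Map frame (+1, +2, +3, -1, -2, -3) to the translated protein string."""
    frames = {}
    rc = reverse_complement(dna)
    for offset in range(3):
        frames[offset + 1] = translate(dna[offset:])    # forward strand
        frames[-(offset + 1)] = translate(rc[offset:])  # reverse strand
    return frames


example = "ATGGCCAAGTAA"  # hypothetical sequence, not taken from GenBank
for frame, protein in sorted(six_frame_translations(example).items()):
    print(f"{frame:+d} {protein}")
```

A stop codon is shown as an asterisk (*), the same convention that appears in the Sbjct line of the alignment further down.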
SEARCH at the National Center for Biotechnology Information in Bethesda, Maryland.
For each match, a score is given. The more amino acids that match, the higher the score. The 'E value' tells the number of matches at that score that would be expected by random chance alone, given the size of the database. An E value greater than 1 means that one or more such matches are expected by chance, and therefore the match is of no statistical significance. An E value of 0.01 means that a match of this score would be seen by chance once for every 100 database searches, with different test sequences of comparable length to your sequence. E = 0.001 means that you would only see a match this good once in a thousand searches. The choice of significance level therefore depends on how important it is to eliminate false positives.
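The arithmetic behind these statements can be sketched in a few lines of Python. The first function is just the definition used above (E is an expected count per search). The second assumes that chance matches follow a Poisson distribution, which is the usual assumption in BLAST statistics but is not stated in this tutorial. The function names are made up for illustration.

```python
import math


def expected_chance_hits(e_value, n_searches):
    """Expected number of chance matches over several independent searches."""
    return e_value * n_searches


def prob_at_least_one_chance_hit(e_value):
    """P(at least one chance match in one search), assuming Poisson statistics.

    Uses expm1 so that very small E values are not rounded to zero.
    """
    return -math.expm1(-e_value)


for e in (1.0, 0.01, 0.001, 2e-40):
    print(
        f"E = {e:g}: about {expected_chance_hits(e, 100):g} chance hits "
        f"per 100 searches, P(at least one per search) = "
        f"{prob_at_least_one_chance_hit(e):.3g}"
    )
```

For small E values the probability and the E value are nearly the same number, which is why the two are often used interchangeably when a match is highly significant.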
Sequences producing significant alignments: Score (bits) E Value
gi|3286690|emb|AJ007450.1|ATH7450 Arabidopsis thaliana mRNA... 166 2e-40
gi|18410827|ref|NM_106186.1| Arabidopsis thaliana DNAJ heat... 157 2e-37
gi|7212003|gb|AC023754.3|AC023754 Arabidopsis thaliana chro... 54 4e-14
gi|12331602|gb|AC025814.7|AC025814 Arabidopsis thaliana chr... 54 4e-14
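If you want to apply your own significance cutoff to a list like this, the summary lines are easy to read with a short script. The sketch below assumes each line ends with the bit score and the E value, as in the four lines above, and that the identifier has the gi|number|database|accession|name layout shown here. The cutoff of 1e-20 is an arbitrary value chosen for illustration.

```python
HIT_LINES = """\
gi|3286690|emb|AJ007450.1|ATH7450 Arabidopsis thaliana mRNA... 166 2e-40
gi|18410827|ref|NM_106186.1| Arabidopsis thaliana DNAJ heat... 157 2e-37
gi|7212003|gb|AC023754.3|AC023754 Arabidopsis thaliana chro... 54 4e-14
gi|12331602|gb|AC025814.7|AC025814 Arabidopsis thaliana chr... 54 4e-14
"""


def parse_hit_line(line):
    """Split one summary line into identifier, description, score and E value."""
    # The last two whitespace-separated fields are the bit score and E value.
    left, bits, e_text = line.rsplit(None, 2)
    seq_id, _, description = left.partition(" ")
    return {
        "id": seq_id,
        "accession": seq_id.split("|")[3],  # fourth field of the gi|... identifier
        "description": description,
        "bits": float(bits),
        "e_value": float(e_text),
    }


hits = [parse_hit_line(line) for line in HIT_LINES.splitlines() if line.strip()]

E_CUTOFF = 1e-20  # illustrative threshold; choose one to suit your own search
significant = [hit for hit in hits if hit["e_value"] <= E_CUTOFF]

for hit in significant:
    print(hit["accession"], hit["bits"], hit["e_value"])
```

With this cutoff only the two mRNA matches are kept; the two chromosome clones, with E = 4e-14, are dropped even though they are still far below 1.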
The best match in this example is to an Arabidopsis thaliana auxilin-like protein:
These results show the test (query) sequence aligned with the Arabidopsis sequence from GenBank (Sbjct), with gaps (-) inserted to optimize the alignment. On the line between the two, amino acids that are identical in both sequences are written with the corresponding letter, while plus (+) characters indicate that the corresponding amino acids are chemically similar, though not identical.
>gi|3286690|emb|AJ007450.1|ATH7450 Arabidopsis thaliana mRNA for auxilin-like protein
Length = 1649
Score = 166 bits (421), Expect = 2e-40
Identities = 91/124 (73%), Positives = 104/124 (83%), Gaps = 1/124 (0%)
Frame = +3
Query: 1    LLKREVMVAASRLALLVIDEAPHLLVQRTKVRVLLVLQTRLPKLSQSRDAKLDLRDTREH 60
            LLK++VM AAS LALLV DEAPHLLV+RTKV+VL +LQTRLPK++QSRD KLDLRDTREH
Sbjct: 837  LLKQKVMAAASHLALLVKDEAPHLLVRRTKVQVLPILQTRLPKVNQSRDVKLDLRDTREH 1016

Query: 61   LSVRQKLLQRRNFVISKSRKRRQREIGSRKLLMLMSSGGRTERKTTCGHCS-NTPIYLGA 119
            L+ +Q+LLQRRNFVI K RK RQREI SRKLLMLMS+GGR ERKTT G  S ++ IYL
Sbjct: 1017 LTAQQRLLQRRNFVILKPRKSRQREIDSRKLLMLMSNGGRVERKTTYGR*SQHSNIYLEQ 1196

Query: 120  ESDG 123
              DG
Sbjct: 1197 RVDG 1208
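The Identities and Positives figures in the header can be recounted from the alignment itself. The sketch below does this for the first row only (query positions 1 to 60), using the three lines exactly as shown above. It also checks that the Sbjct coordinates, which count nucleotides, span three bases per amino acid, and derives the reading frame from the start coordinate. The frame rule used here (frame +1 starts at nucleotide 1) is the usual convention for forward frames and is an assumption, as are the function names.

```python
# First row of the alignment, copied from the BLAST output above.
query = "LLKREVMVAASRLALLVIDEAPHLLVQRTKVRVLLVLQTRLPKLSQSRDAKLDLRDTREH"
midline = "LLK++VM AAS LALLV DEAPHLLV+RTKV+VL +LQTRLPK++QSRD KLDLRDTREH"
subject = "LLKQKVMAAASHLALLVKDEAPHLLVRRTKVQVLPILQTRLPKVNQSRDVKLDLRDTREH"


def alignment_stats(query, midline, subject):
    """Count identities, positives and gaps in one aligned block."""
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("the three alignment lines must have equal length")
    identities = sum(ch.isalpha() for ch in midline)   # letters = identical
    positives = identities + midline.count("+")        # '+' = similar
    gaps = query.count("-") + subject.count("-")
    return {
        "identities": identities,
        "positives": positives,
        "gaps": gaps,
        "length": len(query),
    }


def forward_frame(sbjct_start):
    """Reading frame (1, 2 or 3) for a forward-strand match (assumed convention)."""
    return (sbjct_start - 1) % 3 + 1


stats = alignment_stats(query, midline, subject)
print(stats)

# Sbjct coordinates are nucleotide positions: 837..1016 covers 180 bases.
nucleotides = 1016 - 837 + 1
print(nucleotides // 3, "codons; frame", forward_frame(837))
```

For this row the count is 48 identities and 55 positives out of 60 positions, with no gaps. The 180 nucleotides from 837 to 1016 correspond to 60 codons, and the start position 837 falls in frame 3, in agreement with the "Frame = +3" line.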