PLNT4610/PLNT7690 Bioinformatics Lecture 5, part 2 of 3
Example: Human myoglobin gene (X00371)
GenBank releases are produced every two months and contain essentially all published, and many unpublished, DNA sequences worldwide. For organizational purposes, the database is split into a number of divisions:
Division | Description |
PRI | primate sequences |
ROD | rodent sequences |
MAM | other mammalian sequences |
VRT | other vertebrate sequences |
INV | invertebrate sequences |
PLN | plant, fungal, and algal sequences |
BCT | bacterial and archaeal sequences |
VRL | viral sequences |
PHG | bacteriophage sequences |
SYN | synthetic sequences, e.g. cloning vectors |
UNA | unannotated sequences |
EST | EST sequences (expressed sequence tags) |
PAT | sequences from patent applications |
STS | STS sequences (sequence tagged sites - sequences for which PCR primers are described; used in genomic mapping) |
GSS | GSS sequences (genome survey sequences) |
HTG | HTG sequences (raw, high throughput genomic sequencing reads) |
HTC | HTC sequences (raw, high throughput cDNA sequencing reads) |
ENV | environmental sampling sequences (e.g. metagenomics) |
TSA | Transcriptome Shotgun Assembly sequences |
CON | Does not contain sequence data. Contigs are described by join() statements, joining other sequences into contigs |
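To make the CON division more concrete, here is a minimal Python sketch of how a join() statement might be read. It assumes a simplified form in which each item is either accession.version:start..end or gap(N); real CON entries can contain other constructs (such as complement()), which this sketch rejects. The accession numbers XX000001.1 and XX000002.1 and the function names are made up for illustration.

```python
import re

# One segment of another entry, e.g. "XX000001.1:1..500" (hypothetical accession)
SEGMENT = re.compile(r"^(?P<acc>[A-Za-z0-9_.]+):(?P<start>\d+)\.\.(?P<end>\d+)$")
# A gap of known length, e.g. "gap(100)"
GAP = re.compile(r"^gap\((?P<length>\d+)\)$")


def parse_join(statement):
    """Split a simplified join() statement into (kind, accession, start, end) tuples."""
    text = statement.strip()
    if not (text.startswith("join(") and text.endswith(")")):
        raise ValueError("expected a join(...) statement")
    parts = []
    for item in text[len("join("):-1].split(","):
        item = item.strip()
        seg = SEGMENT.match(item)
        gap = GAP.match(item)
        if seg:
            parts.append(("segment", seg["acc"], int(seg["start"]), int(seg["end"])))
        elif gap:
            # Represent a gap as positions 1..N with no accession
            parts.append(("gap", None, 1, int(gap["length"])))
        else:
            raise ValueError(f"unsupported item in this sketch: {item!r}")
    return parts


def contig_length(parts):
    """Total length of the contig described by the parsed parts."""
    return sum(end - start + 1 for _kind, _acc, start, end in parts)


parts = parse_join("join(XX000001.1:1..500,gap(100),XX000002.1:1..300)")
print(contig_length(parts))  # 900
```

The point of the sketch is that a CON entry stores only these pointers; the sequence itself would have to be fetched from the entries that the join() statement names.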
Example: protein entries corresponding to Human myoglobin gene (GB:X00371)
Both GenBank and EMBL also generate databases containing translations of DNA sequences: GenBank produces the GenPept database, and EMBL the TrEMBL database. These databases can be thought of as raw translations of known or predicted coding sequences from DNA data. GenPept and TrEMBL exist primarily as a convenience for database searches.
In contrast, the PIR and UniProt databases are carefully annotated to produce an efficient report on each protein. In particular, when many genes encode identical proteins, only one protein entry is produced, citing all the genes and their respective GenBank/EMBL/DDBJ accession numbers. PIR and UniProt specialize in annotating features relating to protein structure or chemistry. Where 3D structures are known, links to protein structure databases are also included.
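The idea of one protein entry citing several genes can be illustrated with a short Python sketch. This is not how PIR or UniProt actually build their entries, which involves manual curation; it only shows the grouping step, assuming the input is a list of (accession, protein sequence) pairs. The accessions GENE_A, GENE_B, GENE_C and the short sequences are toy values invented for the example.

```python
from collections import defaultdict


def merge_identical_proteins(translations):
    """Group (accession, protein_sequence) pairs so each distinct protein appears once.

    Returns a list of dicts, one per unique sequence, each listing every
    accession whose translation is identical.
    """
    by_sequence = defaultdict(list)
    for accession, protein in translations:
        # Compare case-insensitively so "mglsdgew" and "MGLSDGEW" count as identical
        by_sequence[protein.upper()].append(accession)
    return [
        {"sequence": sequence, "accessions": sorted(accessions)}
        for sequence, accessions in by_sequence.items()
    ]


# Toy input: two hypothetical genes encode the same protein, a third differs
translations = [
    ("GENE_A", "MGLSDGEW"),
    ("GENE_B", "MGLSDGEW"),
    ("GENE_C", "MVLSEGEW"),
]

for entry in merge_identical_proteins(translations):
    print(entry["sequence"], entry["accessions"])
```

A raw translation database such as GenPept or TrEMBL would, by contrast, keep one record per input pair, which is why searches against it can return many identical hits.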
Unless otherwise cited or referenced, all content on this page is licensed under the Creative Commons License Attribution Share-Alike 2.5 Canada.