blastdbkit.py performs tasks on a BLAST database whose location is given in the environment variable $BLASTDB.
A full technical description of how blastdbkit.py works can be found at
http://home.cc.umanitoba.ca/~psgendb/birchhomedir/BIRCHDEV/public_html/birchadmin/blastdb/BLASTDBTechnical.html
--showall - Print an alphabetical list of the BLAST databases available at the remote FTP site. Default: ftp.ncbi.nih.gov. Use --ftpsite to specify a different FTP site.
--configure - Called by getbirch during installs and updates of a BIRCH system. If the BLASTDB environment variable is already set (i.e. a BLAST database already exists on the system), that value is set in $BIRCH/local/admin/BIRCH.settings. Otherwise, BLASTDB is set to $BIRCH/GenBank. --birchdir must be given on the command line, because --configure is called during a fresh BIRCH install, when neither the $BIRCH environment variable nor the corresponding setting in BIRCH.properties can be counted on. Not compatible with --add, --delete or --update.
--birchdir directory - Path to the BIRCH home directory ($BIRCH).
--blastdb directory - path to the BLAST database directory. In BIRCH, the default is $BIRCH/GenBank.
--reportlocal - Write a spreadsheet-ready report with statistics on the local copy of the NCBI databases. The report is a tab-separated value file written to $BLASTDB/localstats.tsv.
--reportftp - Write a spreadsheet-ready report with statistics on the remote copy of the NCBI databases. The report is a tab-separated value file written to $BLASTDB/ftpstats.tsv.
--add - Add files in dblist from the FTP site specified by --ftpsite to the BLASTDB database.
--delete - Delete files in dblist from the BLASTDB database.
--update - Update files in dblist from the FTP site specified by --ftpsite. BLAST databases are often divided into many parts, e.g. nt.00.tar.gz, nt.01.tar.gz, nt.02.tar.gz. During an update, only files that are newer than the locally installed copies are downloaded. This avoids downloading an entire database when only a few files have changed.
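The selection rule used by --update can be illustrated with a minimal sketch. This is not the code in blastdbkit.py: the function name files_to_download, the archive-name pattern, and the use of plain numbers as modification times are assumptions made for illustration only.

```python
import re

# Hypothetical pattern for archive names such as nt.00.tar.gz or taxdb.tar.gz.
ARCHIVE_RE = re.compile(r"^(?P<db>.+?)(?:\.\d+)?\.tar\.gz$")


def files_to_download(dbname, remote_mtimes, local_mtimes):
    """Return the archives of dbname whose remote copy is newer than the local one.

    Both arguments map file name -> modification time (any comparable
    value, e.g. seconds since the epoch). A file missing locally is
    always selected.
    """
    wanted = []
    for fname, remote_time in sorted(remote_mtimes.items()):
        match = ARCHIVE_RE.match(fname)
        if match is None or match.group("db") != dbname:
            continue  # not an archive, or it belongs to another database
        local_time = local_mtimes.get(fname)
        if local_time is None or remote_time > local_time:
            wanted.append(fname)
    return wanted


# Made-up timestamps: only nt.01 is newer remotely, and nt.02 is not installed.
remote = {"nt.00.tar.gz": 200, "nt.01.tar.gz": 300,
          "nt.02.tar.gz": 100, "nr.00.tar.gz": 500}
local = {"nt.00.tar.gz": 200, "nt.01.tar.gz": 250}
print(files_to_download("nt", remote, local))
```

In this example nt.00.tar.gz is skipped because the local copy is as recent as the remote one, so only two of the three nt archives would be fetched.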
FTP site | Directory for BLAST file downloads | Location |
ftp.ncbi.nih.gov | /blast/db | Bethesda, Maryland, USA |
ftp.hgc.jp | pub/mirror/ncbi/blast/db | Tokyo, Japan |
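As a rough sketch of how a listing like the one printed by --showall could be produced from these sites, the example below uses Python's standard ftplib. The names MIRRORS, database_names and list_remote_databases are hypothetical and are not taken from blastdbkit.py; the sketch also assumes anonymous FTP login and that multi-part archives carry a numeric volume suffix.

```python
from ftplib import FTP

# Sites and directories copied from the table above.
MIRRORS = {
    "ftp.ncbi.nih.gov": "/blast/db",
    "ftp.hgc.jp": "pub/mirror/ncbi/blast/db",
}


def database_names(filenames):
    """Reduce archive file names to an alphabetical list of database names."""
    names = set()
    for fname in filenames:
        if not fname.endswith(".tar.gz"):
            continue  # skip checksums and other non-archive files
        stem = fname[: -len(".tar.gz")]
        base, dot, part = stem.rpartition(".")
        # Drop a numeric volume suffix such as the "00" in nt.00.tar.gz.
        names.add(base if dot and part.isdigit() else stem)
    return sorted(names, key=str.lower)


def list_remote_databases(site):
    """List databases on the given site by anonymous FTP (needs network access)."""
    with FTP(site) as ftp:
        ftp.login()
        ftp.cwd(MIRRORS[site])
        return database_names(ftp.nlst())
```

Calling list_remote_databases("ftp.hgc.jp") would query the Tokyo mirror; database_names can be tried offline on any list of file names.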
blastdbkit.py --update --dblist all
Updates all currently-installed databases
blastdbkit.py --add --dblist all
Adds ALL databases from the remote FTP site. At this writing, that corresponds to about 850 GB!
blastdbkit.py --delete --dblist all
Deletes all currently-installed databases. For obvious reasons, this is potentially a dangerous option!
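The three examples show that 'all' means different things depending on the action. A minimal sketch of that expansion follows. The function expand_dblist is hypothetical, and two points are assumptions rather than documented behaviour: that --dblist accepts a comma-separated list of names, and that 'all' with --delete refers to the installed databases.

```python
def expand_dblist(dblist, action, installed, remote):
    """Turn a --dblist value into an explicit list of database names.

    'all' is taken to mean every installed database for update and
    delete, and every database on the FTP site for add.
    """
    if action not in ("add", "delete", "update"):
        raise ValueError(f"unknown action: {action}")
    if dblist.strip() == "all":
        source = remote if action == "add" else installed
        return sorted(source)
    # Assumed syntax: names separated by commas, surrounding spaces ignored.
    return [name.strip() for name in dblist.split(",") if name.strip()]


installed = ["pdbaa", "nt"]
remote = ["nt", "nr", "pdbaa", "taxdb"]
print(expand_dblist("all", "update", installed, remote))
print(expand_dblist("all", "add", installed, remote))
```

Resolving 'all' to explicit names before any files are touched also makes it possible to show the user exactly what a --delete would remove.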
Table 1. Codes for --dblist. GenBank BLAST version 5 databases
Code | Description |
NUCLEOTIDE |
nt | Non-redundant nucleotide |
refseq_rna | RefSeq RNA |
refseq_select_rna | RefSeq Selected RNA sequences |
human_genome | Human Genomic - RefSeq Human Chromosomal |
mouse_genome | Mouse Genomic - RefSeq Mouse Chromosomal |
ref_euk_rep_genomes | Representative Eukaryotic Genomes |
ref_prok_rep_genomes | Representative Prokaryotic Genomes |
ref_viroids_rep_genomes | Representative Viroid Genomes |
ref_viruses_rep_genomes | Representative Virus Genomes |
patnt | Patented Nucleotide |
pdbnt | Nucleotide sequences from PDB 3D nucleic acid structures |
16S_ribosomal_RNA | 16S ribosomal RNA |
18S_ribosomal_RNA | 18S ribosomal RNA |
28S_ribosomal_RNA | 28S ribosomal RNA |
ITS_RefSeq_Fungi | ITS_RefSeq_Fungi |
ITS_eukaryote_sequences | ITS_eukaryote_sequences |
LSU_eukaryote_rRNA | LSU_eukaryote_rRNA |
LSU_prokaryote_rRNA | LSU_prokaryote_rRNA |
SSU_eukaryote_rRNA | SSU_eukaryote_rRNA |
Betacoronavirus | Betacoronavirus |
PROTEIN |
nr | Non-redundant protein |
refseq_protein | RefSeq Protein |
swissprot | UniProt |
pdbaa | Protein sequences from PDB 3D protein structures |
landmark | Proteomes from reference species |
HIGH THROUGHPUT |
env_nt | Environmental - Nucleotide |
OTHER |
taxdb | Taxonomy |
Some of the ideas in blastdbkit.py have been borrowed from the NCBI script update_blastdb.pl. blastdbkit.py differs from update_blastdb.pl in a number of ways:
- update_blastdb.pl requires installation of some non-standard Perl libraries. blastdbkit.py uses only standard Python.
- blastdbkit.py adds new functions: --reportlocal, --reportftp and --delete.
- In testing, update_blastdb.pl appears to always download databases, regardless of whether they are newer than those already installed.
- blastdbkit.py adds the database name 'all' as a means of updating all databases.
- blastdbkit.py can also create the FASTA .nam files needed to search databases using FASTA programs.
- blastdbkit.py creates BioLegato menu items for searching databases in bldna and blprotein.
Dr. Brian Fristensky
Department of Plant Science
University of Manitoba
Winnipeg, MB Canada R3T 2N2
frist@cc.umanitoba.ca
http://home.cc.umanitoba.ca/~frist