Updated June 25, 2022
NAME

blastdbkit.py - Install and update local BLAST databases

SYNOPSIS


blastdbkit.py --showall [ --ftpsite url ]
blastdbkit.py --configure [ --birchdir directory --blastdb directory ]
blastdbkit.py --reportlocal
blastdbkit.py --reportftp [ --ftpsite url ]
blastdbkit.py --add [ --ftpsite url ] --dblist db[,db]
blastdbkit.py --delete --dblist db[,db]
blastdbkit.py --update [ --ftpsite url ] --dblist db[,db]

DESCRIPTION
blastdbkit.py performs tasks on a BLAST database whose location is given in the environment variable $BLASTDB.

A full technical description of how blastdbkit.py works can be found at
http://home.cc.umanitoba.ca/~psgendb/birchhomedir/BIRCHDEV/public_html/birchadmin/blastdb/BLASTDBTechnical.html

OPTIONS
--showall - Prints an alphabetical list of the BLAST databases available at the remote FTP site. Default: ftp.ncbi.nih.gov. Use --ftpsite to specify a different FTP site.
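As a rough illustration of how such a list could be derived from a directory listing, the sketch below collapses archive names such as nt.00.tar.gz into database names and sorts them. The function name database_names and the assumption that every database archive ends in .tar.gz are illustrative only and are not taken from blastdbkit.py.

```python
def database_names(filenames):
    """Return sorted, unique database names from archive file names.

    Hypothetical helper: assumes archives are named <db>.tar.gz or
    <db>.<NN>.tar.gz, where <NN> is a numeric part index.
    """
    names = set()
    for fname in filenames:
        if not fname.endswith(".tar.gz"):
            continue  # skip checksum files, READMEs and the like
        stem = fname[: -len(".tar.gz")]
        # Multi-part archives carry a numeric suffix (nt.00); drop it.
        base, _, part = stem.rpartition(".")
        if base and part.isdigit():
            stem = base
        names.add(stem)
    # Case-insensitive sort so the list reads alphabetically.
    return sorted(names, key=str.lower)
```

The input would typically be the file names returned by a directory listing of the FTP site.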

--configure - This option is called by getbirch during installs and updates of a BIRCH system. If the BLASTDB environment variable is already set (i.e. a BLAST database already exists on the system), this variable is set in $BIRCH/local/admin/BIRCH.settings. Otherwise, BLASTDB is set to $BIRCH/GenBank. The location of --birchdir must be given on the command line because --configure is called during a fresh BIRCH install, when neither the $BIRCH environment variable nor the corresponding setting in BIRCH.properties can be relied upon. Not compatible with --add, --delete or --update.

--birchdir directory - path to the BIRCH home directory. ($BIRCH)

--blastdb directory - path to the BLAST database directory. In BIRCH, the default is $BIRCH/GenBank.
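The choice of database directory described under --configure can be sketched as follows. This is a minimal illustration, not the code in blastdbkit.py: the function name resolve_blastdb is hypothetical, and giving an explicit --blastdb value priority over the environment variable is an assumption.

```python
import os

def resolve_blastdb(birchdir, blastdb=None, environ=None):
    """Pick the BLAST database directory (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    if blastdb:
        # Assumption: an explicit --blastdb value takes priority.
        return blastdb
    existing = environ.get("BLASTDB")
    if existing:
        # A BLAST database already exists on the system; keep its location.
        return existing
    # Default for a fresh BIRCH install.
    return os.path.join(birchdir, "GenBank")
```

For example, resolve_blastdb("/home/psgendb/BIRCH") on a system where BLASTDB is unset would yield the GenBank directory under that (example) BIRCH path.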

--reportlocal - Write a spreadsheet-ready report with statistics on the local copy of the NCBI databases. The report is a tab-separated value file written to $BLASTDB/localstats.tsv.

--reportftp - Write a spreadsheet-ready report with statistics on the remote copy of the NCBI databases. The report is a tab-separated value file written to $BLASTDB/ftpstats.tsv.

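Because both reports are plain tab-separated files, they can also be read outside a spreadsheet. The sketch below uses Python's csv module; the function name parse_report is hypothetical, and since the column layout of the reports is not described here, rows are returned as plain lists of strings.

```python
import csv

def parse_report(lines):
    """Split tab-separated report lines into rows of fields.

    `lines` may be an open file or any iterable of strings.
    """
    return list(csv.reader(lines, delimiter="\t"))
```

A typical call would open $BLASTDB/localstats.tsv or $BLASTDB/ftpstats.tsv with open(path, newline="") and pass the file object to parse_report.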
--add - Add files in dblist from the FTP site specified by --ftpsite to the BLASTDB database.

--delete - Delete files in dblist from the BLASTDB database.

--update - Update files in dblist from the FTP site specified by --ftpsite. BLAST databases are often divided into many parts, e.g. nt.00.tar.gz, nt.01.tar.gz, nt.02.tar.gz, etc. During an update, only files that are newer than the locally installed copies are downloaded. This avoids downloading an entire database when only a few files have changed.
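The selection of files during an update can be pictured with the sketch below, which compares modification times of remote and local files. It is an illustration under stated assumptions rather than the logic used by blastdbkit.py: the function name files_to_download is hypothetical, timestamps are assumed to be comparable numbers, and files missing locally are assumed to be downloaded as well.

```python
def files_to_download(remote_mtimes, local_mtimes):
    """Return names of remote files that need downloading.

    Both arguments map file name -> modification time (any comparable
    value, e.g. seconds since the epoch).
    """
    return sorted(
        name
        for name, remote_time in remote_mtimes.items()
        # Assumption: a file absent locally is treated as out of date.
        if name not in local_mtimes or remote_time > local_mtimes[name]
    )
```

With three remote parts of nt, of which one is newer and one is missing locally, only those two names are returned; the unchanged part is skipped.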

--ftpsite url - FTP site from which to download pre-formatted BLAST database files, e.g. ftp.ncbi.nih.gov. update_blastdb.pl will not download files if md5 checksum files are not available. Depending on the FTP site chosen, blastdbkit.py downloads files from the appropriate directory, as listed in the table below. It is usually best to download files from the FTP site geographically closest to your location.

FTP site            Directory for BLAST file downloads    Location
ftp.ncbi.nih.gov    /blast/db                             Bethesda, Maryland, USA
ftp.hgc.jp          pub/mirror/ncbi/blast/db              Tokyo, Japan
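The site-to-directory table could be represented in code as a simple lookup. The sketch below copies the two entries exactly as listed; the names FTP_DIRECTORIES and download_directory are hypothetical, and rejecting unknown sites with an error is an assumption about desirable behaviour, not a description of blastdbkit.py.

```python
# Entries copied from the table of supported FTP sites.
FTP_DIRECTORIES = {
    "ftp.ncbi.nih.gov": "/blast/db",
    "ftp.hgc.jp": "pub/mirror/ncbi/blast/db",
}

def download_directory(ftpsite):
    """Return the BLAST download directory for a known FTP site."""
    try:
        return FTP_DIRECTORIES[ftpsite]
    except KeyError:
        # Assumption: unknown sites are rejected rather than guessed at.
        raise ValueError(f"no BLAST directory known for FTP site {ftpsite!r}") from None
```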


--dblist db[,db] - a comma-separated list of databases that should be installed. All databases included in the list will be installed or updated. If a database is not included in the list, but is currently installed, it will be deleted. If the database is currently installed, it will be updated.

The argument 'all' can be used with --add, --update and --delete as follows. Note: Because of the size of these databases, 'all' should be used only after careful forethought!

blastdbkit.py --update --dblist all
Updates all currently-installed databases

blastdbkit.py --add --dblist all
Adds ALL databases from the remote FTP site. At this writing, that corresponds to about 850 Gb!

blastdbkit.py --delete --dblist all
Deletes all currently-installed databases. For obvious reasons, this is potentially a dangerous option!
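One way to picture how a --dblist value, including 'all', might be expanded for each action is sketched below. The function name expand_dblist is hypothetical. Treating 'all' under --delete as every installed database is an assumption made for this sketch.

```python
def expand_dblist(action, dblist, installed, remote):
    """Expand a --dblist value into a sorted list of database names.

    action    -- "add", "update" or "delete"
    dblist    -- comma-separated names, or the single word "all"
    installed -- names of locally installed databases
    remote    -- names of databases available at the FTP site
    """
    if dblist.strip() != "all":
        return sorted(name.strip() for name in dblist.split(",") if name.strip())
    if action == "add":
        return sorted(remote)       # everything offered by the FTP site
    if action in ("update", "delete"):
        # Assumption for "delete": 'all' means every installed database.
        return sorted(installed)
    raise ValueError(f"unsupported action: {action!r}")
```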

The following options are mutually exclusive: --configure, --reportlocal, --reportftp, --add, --delete, --update.
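A command-line parser that enforces this rule can be sketched with argparse's mutually exclusive groups. This is not the parser used by blastdbkit.py; the function name build_parser is hypothetical, and the defaults shown for --ftpsite and --dblist are assumptions made for the example.

```python
import argparse

def build_parser():
    """Build an illustrative parser for the options described above."""
    parser = argparse.ArgumentParser(prog="blastdbkit.py")
    # argparse rejects any command line that uses two options from this group.
    actions = parser.add_mutually_exclusive_group()
    for flag in ("--configure", "--reportlocal", "--reportftp",
                 "--add", "--delete", "--update"):
        actions.add_argument(flag, action="store_true")
    parser.add_argument("--showall", action="store_true")
    parser.add_argument("--ftpsite", default="ftp.ncbi.nih.gov")  # assumed default
    parser.add_argument("--birchdir")
    parser.add_argument("--blastdb")
    parser.add_argument("--dblist", default="")  # assumed default
    return parser
```

Parsing ["--add", "--delete"] with this parser makes argparse print a usage error and exit, whereas ["--update", "--dblist", "nt,nr"] is accepted.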


Table 1. Codes for --dblist. GenBank BLAST version 5 databases

Code                       Description

NUCLEOTIDE
nt                         Non-redundant nucleotide
refseq_rna                 RefSeq RNA
refseq_select_rna          RefSeq Selected RNA sequences
human_genome               Human Genomic - RefSeq Human chromosomal
mouse_genome               Mouse Genomic - RefSeq Mouse Chromosomal
ref_euk_rep_genomes        Representative Eukaryotic Genomes
ref_prok_rep_genomes       Representative Prokaryotic Genomes
ref_viroids_rep_genomes    Representative Viroid Genomes
ref_viruses_rep_genomes    Representative Virus Genomes
patnt                      Patented Nucleotide
pdbnt                      Nucleotide sequences from PDB 3D nucl. acid structures
16S_ribosomal_RNA          16S ribosomal RNA
18S_ribosomal_RNA          18S ribosomal RNA
28S_ribosomal_RNA          28S ribosomal RNA
ITS_RefSeq_Fungi           ITS_RefSeq_Fungi
ITS_eukaryote_sequences    ITS_eukaryote_sequences
LSU_eukaryote_rRNA         LSU_eukaryote_rRNA
LSU_prokaryote_rRNA        LSU_prokaryote_rRNA
SSU_eukaryote_rRNA         SSU_eukaryote_rRNA
Betacoronavirus            Betacoronavirus

PROTEIN
nr                         Non-redundant protein
refseq_protein             RefSeq Protein
swissprot                  Uniprot
pdbaa                      Protein sequences from PDB 3D protein structures
landmark                   Proteomes from reference species

HIGH THROUGHPUT
env_nt                     Environmental - Nucleotide

OTHER
taxdb                      Taxonomy

ACKNOWLEDGEMENT
Some of the ideas in blastdbkit.py have been borrowed from the NCBI script update_blastdb.pl. blastdbkit.py differs from update_blastdb.pl in a number of ways.

AUTHOR
Dr. Brian Fristensky
Department of Plant Science
University of Manitoba
Winnipeg, MB  Canada R3T 2N2
frist@cc.umanitoba.ca
http://home.cc.umanitoba.ca/~frist