Updated July 18, 2020

ncbiquery.py - Send an Entrez query to NCBI and return the result

ncbiquery.py  --db database [--qfile qfile_name] --query entrez_query_statement [--filter entrez_filter_statement] [--maxcount integer] --format sequence_format --out outfile
ncbiquery.py  --qfile qfile_name --related [--maxcount integer] --format sequence_format --out outfile
ncbiquery.py  --qfile qfile_name --target database [--maxcount integer] --format sequence_format --out outfile

ncbiquery sends a query to an NCBI database using the Entrez query syntax. Results are written to outfile in the specified format. The --qfile option makes it possible to use the document summary from a previous search to further refine the search with a new query term.

ncbiquery.py is a wrapper that hides the individual BioPython Bio.Entrez functions behind a single command line. It is specifically intended to simplify running the Entrez E-utilities from BioLegato.

If the environment variables $BL_EMAIL and $NCBI_ENTREZ_KEY are set, all requests to the Entrez Eutils will be processed using the user's Entrez API key.
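As a rough illustration of how these two variables might be picked up in Python, the sketch below reads them from the environment and builds the identification parameters that E-utilities requests accept (email and api_key). The helper name entrez_identity is hypothetical and is not taken from ncbiquery.py, which may handle this differently.

```python
import os


def entrez_identity(environ=None):
    """Return identification parameters for an E-utilities request.

    Hypothetical helper for illustration; not part of ncbiquery.py.
    """
    env = os.environ if environ is None else environ
    email = env.get("BL_EMAIL", "").strip()
    key = env.get("NCBI_ENTREZ_KEY", "").strip()

    params = {}
    if email:
        params["email"] = email
    # The API key is sent only when both variables are set,
    # mirroring the condition stated in the text above.
    if email and key:
        params["api_key"] = key
    return params


if __name__ == "__main__":
    print(entrez_identity())
```

With BioPython, the same two values would typically be assigned to Entrez.email and Entrez.api_key before any request is made.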

  --qfile qfile_name - output from a previous ncbiquery.py run. This run will refine the previous search.
  --query entrez_query_statement - search parameters
  --filter entrez_filter_statement - limit search based on other parameters
  --related - call elink to look up neighbors in current database
  --target database - call elink to find links in a different database
  --maxcount integer - (default=500) If the number of hits exceeds maxcount, do not retrieve entries, but just display the search metadata.
  --format sequence_format - (default=docsum) format as defined by NCBI EUtilities.
  --db database - NCBI database as defined for EUtilities
  --out outfile - file to which the results are written
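
The options above could be declared with Python's argparse roughly as follows. This is a minimal sketch, not the parser used by ncbiquery.py: treating --related and --target as mutually exclusive and --out as mandatory are assumptions read off the three usage lines, and should_retrieve is a hypothetical helper that restates the --maxcount rule.

```python
import argparse


def build_parser():
    """Sketch of a parser for the documented options (assumed layout)."""
    parser = argparse.ArgumentParser(prog="ncbiquery.py")
    parser.add_argument("--db", help="NCBI database as defined for EUtilities")
    parser.add_argument("--qfile", help="output from a previous ncbiquery.py run")
    parser.add_argument("--query", help="Entrez search statement")
    parser.add_argument("--filter", help="Entrez filter statement")

    # Assumption: the two elink modes cannot be combined in one run.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--related", action="store_true",
                      help="look up neighbors in the current database")
    mode.add_argument("--target", metavar="database",
                      help="find links in a different database")

    parser.add_argument("--maxcount", type=int, default=500)
    parser.add_argument("--format", default="docsum")
    parser.add_argument("--out", required=True, help="output file")
    return parser


def should_retrieve(hit_count, maxcount):
    """Entries are retrieved only if the hit count does not exceed maxcount."""
    return hit_count <= maxcount


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

With no --maxcount on the command line, args.maxcount is 500, so a search returning 501 hits would only report its metadata.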


1. Search for entries in the NCBI core nucleotide database in which the author is Fristensky and the organism contains the word Pisum. Write output to a file in the docsum XML format.

Note: ncbiquery is a bash shell script that runs ncbiquery.py. It sets PYTHONPATH, as described below.

ncbiquery --db nuccore --query 'Fristensky [AUTH] AND Pisum [ORGN]' --format docsum --out FristenskyPisum.ncbiquery
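
For orientation, the search in example 1 corresponds to an E-utilities esearch request with a db and a term parameter. The sketch below only builds the URL and does not contact NCBI. The function name esearch_url is hypothetical, and the parameters ncbiquery.py actually sends through Bio.Entrez may differ.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_url(db, term, email=None):
    """Build an esearch URL for the given database and Entrez query.

    Hypothetical helper for illustration; no request is sent.
    """
    params = {"db": db, "term": term}
    if email:
        params["email"] = email
    return f"{EUTILS}/esearch.fcgi?{urlencode(params)}"


url = esearch_url("nuccore", "Fristensky [AUTH] AND Pisum [ORGN]")
print(url)
```

urlencode takes care of escaping the spaces and square brackets in the query, which is why the statement is quoted on the shell command line but needs no manual escaping here.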

2. In the core nucleotide database, find sequences related to those found in example 1.

ncbiquery --qfile FristenskyPisum.ncbiquery --related --format docsum --out FristenskyPisumRelated.ncbiquery

3. From the sequences found in example 1, find linked sequences in the protein database.

ncbiquery --qfile FristenskyPisum.ncbiquery --target protein --format docsum --out FristenskyPisumProtLink.ncbiquery
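
Examples 2 and 3 differ only in where elink looks for links. In E-utilities terms, elink takes a source database (dbfrom) and a destination database (db): for --related the two are the same, for --target they differ. The sketch below shows that distinction. The function name elink_params and the UIDs are placeholders, not values from a real search, and ncbiquery.py may build its request differently.

```python
def elink_params(ids, source_db, target_db=None):
    """Build elink parameters for a list of UIDs.

    target_db=None models --related (neighbors in the same database);
    a different target_db models --target. Hypothetical helper.
    """
    return {
        "dbfrom": source_db,
        "db": target_db or source_db,
        "id": ",".join(str(uid) for uid in ids),
        "cmd": "neighbor",
    }


# Placeholder UIDs, standing in for the IDs stored in a qfile.
uids = ["111", "222"]

related = elink_params(uids, "nuccore")             # like --related
linked = elink_params(uids, "nuccore", "protein")   # like --target protein
print(related)
print(linked)
```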

PYTHONPATH (required) - Path to BioPython. ncbiquery sets PYTHONPATH to a platform specific directory containing BioPython, and then runs ncbiquery.py. If you run ncbiquery.py directly, you need to set PYTHONPATH manually.

BL_EMAIL (required) - Email address to accompany requests to NCBI Entrez. Required by NCBI.

NCBI_ENTREZ_KEY (optional) - Unique identifier for requests to NCBI Entrez. If no key is supplied, you may get slower retrieval times. If you do a large number of requests (e.g. more than 3 per second) you must supply a key, or NCBI will throttle your future requests. See NCBI Eutil API keys.

New API Keys for the E-utilities

NCBI Entrez Help Manual at http://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Entrez_Searching_Options.

NCBI E-utilities Quick Start http://www.ncbi.nlm.nih.gov/books/NBK25500

BioPython Bio.Entrez at http://biopython.org/DIST/docs/api/Bio.Entrez-module.html

The ncbiquery.py script expects that xtract from the NCBI EDirect Package (http://www.ncbi.nlm.nih.gov/books/NBK179288/) is present in the execution PATH. In BIRCH, this script is found in $BIRCH/script. xtract is called by ncbiquery.py to extract specific fields from DocumentSummary output generated by NCBI Entrez tools.
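
A call to xtract from Python might look like the sketch below, which checks that xtract is in PATH and pipes DocumentSummary XML to it. The function names and the field names passed to -element are illustrative assumptions. Which fields ncbiquery.py actually extracts is not stated here.

```python
import shutil
import subprocess


def xtract_command(fields):
    """Argument list asking xtract for the given DocumentSummary fields."""
    return ["xtract", "-pattern", "DocumentSummary", "-element", *fields]


def run_xtract(docsum_xml, fields):
    """Pipe docsum XML through xtract and return its tab-delimited output.

    Hypothetical helper; raises if xtract is not installed.
    """
    if shutil.which("xtract") is None:
        raise FileNotFoundError("xtract not found in PATH (NCBI EDirect)")
    result = subprocess.run(
        xtract_command(fields),
        input=docsum_xml,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# Field names here are examples only.
print(xtract_command(["Id", "Title"]))
```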

Dr. Brian Fristensky
Department of Plant Science
University of Manitoba
Winnipeg, MB  Canada R3T 2N2