Updated May 8, 2015

ncbiquery.py - Send an Entrez query to NCBI and return the result

ncbiquery.py  --db database [--qfile qfile_name] --query entrez_query_statement [--filter entrez_filter_statement] [--maxcount integer] --format sequence_format --out outfile
ncbiquery.py  --qfile qfile_name --related [--maxcount integer] --format sequence_format --out outfile
ncbiquery.py  --qfile qfile_name --target database [--maxcount integer] --format sequence_format --out outfile
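The three invocation forms above can be sketched as an argparse parser plus a small function that decides which form a command line matches. This is a hypothetical reconstruction for illustration, not the actual ncbiquery.py source; the names build_parser and classify and the error messages are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the options in the synopsis."""
    p = argparse.ArgumentParser(prog="ncbiquery.py")
    p.add_argument("--db", help="NCBI database, e.g. nuccore or protein")
    p.add_argument("--qfile", help="output file from a previous ncbiquery.py run")
    p.add_argument("--query", help="Entrez query statement")
    p.add_argument("--filter", help="Entrez filter statement")
    p.add_argument("--related", action="store_true",
                   help="find neighbors in the same database")
    p.add_argument("--target", help="database in which to find linked records")
    p.add_argument("--maxcount", type=int, default=500)
    p.add_argument("--format", default="docsum")
    p.add_argument("--out", required=True)
    return p


def classify(args: argparse.Namespace) -> str:
    """Return which of the three synopsis forms an invocation matches."""
    if args.related and args.target:
        raise ValueError("use either --related or --target, not both")
    if args.related or args.target:
        # Both link modes start from the records saved by a previous run.
        if not args.qfile:
            raise ValueError("--related and --target require --qfile")
        return "related" if args.related else "target"
    # Plain search: --qfile is optional here and refines a previous search.
    if not (args.db and args.query):
        raise ValueError("a search requires --db and --query")
    return "search"
```

Given the arguments of example 1 below, classify returns "search"; the --qfile plus --related form returns "related".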

ncbiquery.py sends a query to an NCBI database using the Entrez query syntax. Results are written to outfile in the specified format. The --qfile option makes it possible to use the document summary from a previous search to further refine the search with a new query term.

ncbiquery.py is a wrapper that hides the individual Biopython Bio.Entrez functions behind a single command line. It is specifically intended to simplify running the Entrez E-utilities from BioLegato.
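As a rough sketch of what one of those wrapped calls amounts to, the search in example 1 corresponds to an ESearch request. In Biopython that request is made with Entrez.esearch(db=..., term=...); the standard-library version below only builds the request URL, so it can be inspected offline without contacting NCBI. The esearch_url helper and the choice to pass maxcount as retmax are assumptions, not taken from ncbiquery.py.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db: str, term: str, retmax: int = 500) -> str:
    """Build an ESearch URL (hypothetical helper; retmax mirrors --maxcount by assumption)."""
    params = {"db": db, "term": term, "retmax": retmax}
    # urlencode percent-encodes the brackets and turns spaces into '+'.
    return EUTILS + "esearch.fcgi?" + urlencode(params)


url = esearch_url("nuccore", "Fristensky [AUTH] AND Pisum [ORGN]")
print(url)
```

When sending real requests, NCBI asks clients to identify themselves; in Bio.Entrez this is done by setting Entrez.email.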

  --qfile qfile_name - output from a previous ncbiquery.py run. This run will refine the previous search.
  --query entrez_query_statement - search parameters
  --filter entrez_filter_statement - limit the search using additional Entrez criteria
  --related - call ELink to find related records (neighbors) in the same database
  --target database - call ELink to find linked records in a different database
  --maxcount integer - (default=500) if the number of hits exceeds maxcount, do not retrieve the entries; only display the search metadata.
  --format sequence_format - (default=docsum) output format, as defined by the NCBI E-utilities
  --db database - NCBI database, as defined by the E-utilities (e.g. nuccore, protein)
  --out outfile - file to which results are written
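The --maxcount rule can be sketched by reading the total hit count from an ESearch XML reply (the top-level Count element of eSearchResult) and comparing it with the limit. The function name hits_within_limit and the sample reply are made up for illustration; how ncbiquery.py itself implements the check is not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Tuple


def hits_within_limit(esearch_xml: str, maxcount: int = 500) -> Tuple[int, bool]:
    """Return (hit count, whether entries should be retrieved)."""
    root = ET.fromstring(esearch_xml)
    # findtext("Count") looks only at direct children of the root, so nested
    # per-term Count elements (e.g. in the translation stack) are ignored.
    count = int(root.findtext("Count"))
    # More hits than maxcount: report the count only, retrieve nothing.
    return count, count <= maxcount


# Made-up reply with the same element layout as an ESearch result.
SAMPLE = """<eSearchResult><Count>2</Count><RetMax>2</RetMax><RetStart>0</RetStart>
<IdList><Id>1001</Id><Id>1002</Id></IdList></eSearchResult>"""

print(hits_within_limit(SAMPLE))
```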


1. Search for entries in the NCBI core nucleotide database in which the author is Fristensky and the organism contains the word Pisum. Write output to a file in the docsum XML format.
ncbiquery.py --db nuccore --query 'Fristensky [AUTH] AND Pisum [ORGN]' --format docsum --out FristenskyPisum.ncbiquery

2. In the core nucleotide database, find sequences related to those found in example 1.

ncbiquery.py --qfile FristenskyPisum.ncbiquery --related --format docsum --out FristenskyPisumRelated.ncbiquery

3. From the sequences found in example 1, find linked sequences in the protein database.

ncbiquery.py --qfile FristenskyPisum.ncbiquery --target protein --format docsum --out FristenskyPisumProtLink.ncbiquery
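Examples 2 and 3 differ only in the destination database of the ELink request: --related keeps the source database, --target switches to another one. A sketch, assuming the UIDs have already been read from the --qfile output (the values below are placeholders, not real records) and using a hypothetical elink_url helper; Biopython's Entrez.elink(dbfrom=..., db=..., id=...) takes the same parameters.

```python
from typing import Optional, Sequence
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(dbfrom: str, ids: Sequence[str], target: Optional[str] = None) -> str:
    """Build an ELink URL: no target means --related, a target means --target."""
    params = {
        "dbfrom": dbfrom,
        # --related: neighbors in the same database; --target: another database.
        "db": target if target else dbfrom,
        "id": ",".join(ids),
        "cmd": "neighbor",
    }
    return EUTILS + "elink.fcgi?" + urlencode(params)


uids = ["1001", "1002"]  # placeholder UIDs standing in for the --qfile records
related = elink_url("nuccore", uids)            # like example 2
linked = elink_url("nuccore", uids, "protein")  # like example 3
print(related)
print(linked)
```

Joining the UIDs with commas in a single id parameter asks ELink for one combined set of links; supplying separate id parameters would instead return links per UID.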

NCBI Entrez Help Manual: http://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Entrez_Searching_Options

NCBI E-utilities Quick Start: http://www.ncbi.nlm.nih.gov/books/NBK25500

Biopython Bio.Entrez: http://biopython.org/DIST/docs/api/Bio.Entrez-module.html


The ncbiquery.py script expects xtract, from the NCBI EDirect package (http://www.ncbi.nlm.nih.gov/books/NBK179288/), to be present in the execution PATH. ncbiquery.py calls xtract to extract specific fields from DocumentSummary output generated by NCBI Entrez tools. In BIRCH, xtract is found in $BIRCH/script.
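For readers without EDirect, the kind of field extraction xtract performs on DocumentSummary output can be roughly approximated in Python with ElementTree. This does not reproduce xtract's interface; docsum_table and the sample records are made up, and the Caption and Title field names are assumed from the nucleotide summary layout.

```python
import xml.etree.ElementTree as ET
from typing import List, Sequence


def docsum_table(xml_text: str, fields: Sequence[str] = ("Caption", "Title")) -> List[List[str]]:
    """Return one row per DocumentSummary: its uid attribute, then the requested child fields."""
    root = ET.fromstring(xml_text)
    rows = []
    for doc in root.iter("DocumentSummary"):
        row = [doc.get("uid", "")]
        # Missing fields become empty strings so every row has the same width.
        row.extend(doc.findtext(f, default="") for f in fields)
        rows.append(row)
    return rows


# Made-up records with a DocumentSummary-style layout.
SAMPLE = """<eSummaryResult><DocumentSummarySet status="OK">
<DocumentSummary uid="1001"><Caption>X00001</Caption><Title>Example record one</Title></DocumentSummary>
<DocumentSummary uid="1002"><Caption>X00002</Caption></DocumentSummary>
</DocumentSummarySet></eSummaryResult>"""

for row in docsum_table(SAMPLE):
    print("\t".join(row))  # tab-delimited, one line per record
```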

Dr. Brian Fristensky
Department of Plant Science
University of Manitoba
Winnipeg, MB  Canada R3T 2N2