Updated October 8, 2015
NAME

seqfetch.py - Given a set of GI (UID) or ACCESSION numbers, create a file containing the corresponding sequence entries

SYNOPSIS
seqfetch.py --infile infile --outfile outfile [--query entrez_query_statement] [--format sequence_format] [--db NCBI_database] [--sep separator]

DESCRIPTION
seqfetch.py reads infile, which contains one or more DNA, RNA or protein IDs from NCBI databases. IDs can be either GI numbers or ACCESSION numbers. Sequences are retrieved from NCBI using Entrez and written to outfile.

OPTIONS

--query entrez_query_statement  - an Entrez query statement as described in the NCBI Entrez Help.

Note: This option will only work if infile contains GI numbers. The current implementation of NCBI's epost, which is needed for a query, does not support ACCESSION numbers.

Example:
--query '1:250000[SLEN]'

would retrieve sequences less than or equal to 250,000 bp in length. This might be useful if you are interested only in sequences up to the size of BAC inserts, and not in complete chromosomes.
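
The following is a minimal sketch of one way a query statement can be applied to a set of GI numbers that has already been posted with epost. It relies on the E-utilities history feature, in which a posted set is referred to as #query_key in an esearch term. The function name build_filter_params and the WebEnv placeholder are hypothetical, and this is not necessarily how seqfetch.py does it.

```python
# Sketch only: build esearch parameters that restrict a posted ID set
# with an Entrez query. build_filter_params is a hypothetical helper.

def build_filter_params(db, query_key, webenv, query):
    """Return esearch parameters combining a posted set with a query."""
    # "#1 AND 1:250000[SLEN]" means: members of posted set 1 that also
    # match the sequence-length range.
    term = "#{} AND {}".format(query_key, query)
    return {"db": db, "term": term, "WebEnv": webenv, "usehistory": "y"}

# query_key and WebEnv would come from the epost reply; the values
# below are placeholders for illustration.
params = build_filter_params("nuccore", 1, "WEBENV_PLACEHOLDER", "1:250000[SLEN]")
print(params["term"])
```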

--format sequence_format - Sequence format for output, as described in the EDirect Appendices.
Formats may include
    -format         -mode    Report Type
    ____________    _____    _____________________________
    acc                      Accession Number
    est                      EST Report
    fasta                    FASTA
    fasta           xml      TinySeq XML
    fasta_cds_aa             FASTA of CDS Products
    fasta_cds_na             FASTA of Coding Regions
    ft                       Feature Table
    gb                       GenBank Flatfile
    gb              xml      GBSet XML
    gbc             xml      INSDSet XML
    gbwithparts              GenBank with Contig Sequences
    gp                       GenPept Flatfile
    gp              xml      GBSet XML
    gpc             xml      INSDSet XML
    gss                      GSS Report
    native          text     Seq-entry ASN.1
    native          xml      Bioseq-set XML
    seqid                    Seq-id ASN.1
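
As a rough illustration of the -format and -mode columns, the sketch below splits a format string into the rettype and retmode parameters used by the E-utilities efetch call. The helper names split_format and efetch_params are hypothetical, and it is an assumption that a two-part format such as "gb xml" is passed as a single string; seqfetch.py may handle this differently.

```python
# Sketch only: map a "-format [-mode]" string onto efetch parameters.
# split_format and efetch_params are hypothetical helper names.

def split_format(fmt):
    """Split e.g. 'gb xml' into ('gb', 'xml'); mode is None if absent."""
    parts = fmt.split()
    if not parts or len(parts) > 2:
        raise ValueError("expected 'format' or 'format mode': %r" % fmt)
    rettype = parts[0]
    retmode = parts[1] if len(parts) == 2 else None
    return rettype, retmode

def efetch_params(db, ids, fmt):
    """Build the parameter dictionary for an efetch request."""
    rettype, retmode = split_format(fmt)
    params = {"db": db, "id": ",".join(ids), "rettype": rettype}
    if retmode is not None:
        params["retmode"] = retmode
    return params

print(efetch_params("nuccore", ["508843", "4585272"], "gb xml"))
```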


--db NCBI_database - NCBI database from which to retrieve sequences. As described in the EDirect documentation, databases may include:
    protein
    nuccore
    nucleotide
    nucgss
    nucest

--sep separator - Character used to delimit GI or ACCESSION numbers in infile. Default is comma (,). This is usually needed only if more than one ID is on a line.

INPUT
infile contains a list of IDs, usually one per line (see --sep for several IDs on one line). Comments are lines beginning with a hash symbol (#) and can be placed anywhere in the file. Example:

# BLASTN 2.2.26+
# Query:
# RID: TY6949DZ014
# Database: nr
# Fields: subject gi
# 6 hits found
508843
4585272
169079
388521786
502139117
356527659
# BLAST processed 1 queries
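
A file in this layout could be read along the following lines. This is a minimal sketch under the rules stated above (comment lines start with #, IDs are separated by the --sep character); the function name read_ids is hypothetical and the actual parsing in seqfetch.py may differ.

```python
# Sketch only: collect IDs from infile-style lines, skipping comments.
# read_ids is a hypothetical helper name.

def read_ids(lines, sep=","):
    """Return the IDs found in lines, ignoring blank and comment lines."""
    ids = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        # A line may hold several IDs delimited by sep.
        for field in line.split(sep):
            field = field.strip()
            if field:
                ids.append(field)
    return ids

sample = [
    "# Fields: subject gi",
    "# 6 hits found",
    "508843",
    "4585272,169079",
    "# BLAST processed 1 queries",
]
print(read_ids(sample))  # prints ['508843', '4585272', '169079']
```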


SEE ALSO
NCBI Entrez Help Manual at http://www.ncbi.nlm.nih.gov/books/NBK3837/#EntrezHelp.Entrez_Searching_Options.

NCBI E-utilities Quick Start at http://www.ncbi.nlm.nih.gov/books/NBK25500

Biopython Bio.Entrez at http://biopython.org/DIST/docs/api/Bio.Entrez-module.html

BUGS
1. Retrieval of large numbers of sequences from NCBI in a single job may not always work. We should revise seqfetch.py to break up retrievals into chunks of perhaps 500 or 1000 sequences at a time.
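
The chunking proposed above could look roughly like this. It is a sketch of the idea only, not code from seqfetch.py; the function name chunks is hypothetical and the default of 500 is taken from the suggestion above.

```python
# Sketch only: split a list of IDs into fixed-size batches so that each
# batch can be retrieved in a separate request.

def chunks(ids, size=500):
    """Yield successive slices of ids, each at most size items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(ids), size):
        yield ids[start:start + size]

all_ids = [str(n) for n in range(1200)]
batch_sizes = [len(batch) for batch in chunks(all_ids, 500)]
print(batch_sizes)  # prints [500, 500, 200]
```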

AUTHOR
Dr. Brian Fristensky
Department of Plant Science
University of Manitoba
Winnipeg, MB  Canada R3T 2N2
frist@cc.umanitoba.ca
http://home.cc.umanitoba.ca/~frist5