Sequence query internals ======================== Sequence queries are a key feature of the EMBOSS USA (Uniform Sequence Address). In USA processing (by function seqUsaProcess) the query is parsed and stored in the AjPSeqQuery data structure. Each query field has its own string in AjPSeqQuery. Strings which are only "*" will match everything, and are removed (probably several times, just to be safe). A function ajSeqQueryWild tests for 'wildcard' queries ... those which can return more than one entry. These can use a different access method. For example, a web server query may only expect to return a single entry. In most cases, a wildcard query is not a great overhead. Queries with no wildcard in the Id are assumed to be unique. Queries where the accession and id are the same (specified as dbname:id in the USA) are treated as unique unless the id has a wildcard. All other queries are assumed to return more than one entry. To add a new field to the query processing, follow these steps. Loook for "Key" of "Sv" to find the keyword and SeqVersion query code in ajax *.c and *.h files. 1. Add the field to the AjPSeq data structure and its functions 2. Add the field to the query data structure and its functions 3. Index the field (where available) in dbi programs, using the accession number mechanism. (not yet done for most fields) 4. Read the field information in the appropriate formats (EMBL, Genbank, NCBI) 5. Write the field information in the appropriate formats (EMBL, Genbank, NCBI) 6. Check for -field in the USA and set the new field in the AjPSeqQuery in seqUsaProcess 7. Add the field to SRS and other access methods as appropriate. 8. Add the field to a "fields:" definition for the databases concerned. 9. Test with seqret using USAs containing the new field.