If you want to have this newsletter mailed to you or you want to make comments/suggestions about the format/content then send an email to acedb@sanger.ac.uk.
This month sees praise for AceDB from the WormBase Advisory Board, support for alternative genetic codes, better searching in the Tree window, a tightening up of GFF output, using the server with xinetd, some bug fixes and the usual nerdy stuff for developers.
AceDB received praise and support in the WormBase Advisory Board report for 2003, which included the following paragraph:
SUMMARY REPORT FROM THE WORMBASE ADVISORY BOARD 2003 2. The advisory committee wishes to emphasize that WormBase's use of ACEDB as an integration and content delivery database is serving the project very effectively, and has been an important factor in enabling the project to achieve its impressive two-week data updating cycle. We make this point because this has been an issue of some controversy in the past, with some critics arguing that ACEDB is too ad hoc, or insufficiently robust, or cannot meet the performance demands of a production system. In point of fact ACEDB's performance on common bioinformatically important operations, notably retrieval of complex structured objects, is significantly faster than commercial R-DBMSs, while the ACEDB data model has become an object of formal study by database researchers who see in it valuable ideas for a new kind of DBMS (Ref 1). The success of the 2-week updating process shows that robustness has not been a problem. Furthermore ACEDB now comprises several person-decades of programming investment in bioinformatics-specific viewing and analysis capabilities which a generic DBMS alone could not replace. In summary, the committee feels strongly that the principle of not fixing what ain't broke should apply in this case.
Well done to the AceDB developers and the WormBase group.
Yet another FMap shortcut key: you can use the "d" key to toggle the DNA display on/off.
(Thanks to Jean Thierry-Mieg mieg@ray.nlm.nih.gov who did the original coding for this new feature.)
Up until now AceDB has only used the standard genetic code, meaning that you could not translate sections of DNA that use a different code, e.g. mitochondrial DNA. You can now specify your own genetic code for any DNA in AceDB.
Alternative genetic codes can be specified using the new Genetic_code class:
?Genetic_code Other_name ?Text
              Translation UNIQUE Text
              Start UNIQUE Text
              Base1 UNIQUE Text
              Base2 UNIQUE Text
              Base3 UNIQUE Text
Here's how you might include it in your Sequence class:
?Sequence DNA UNIQUE ?DNA UNIQUE Int
.
.
.
Origin From_database UNIQUE ?Database UNIQUE Int
From_author ?Author XREF Sequence
.
.
.
Species ?Species
Genetic_code UNIQUE ?Genetic_code // specify a different genetic coding.
Method UNIQUE ?Method UNIQUE Float
Here's an example in ace file format:
Genetic_code : "Ascidian Mitochondrial"
Translation "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG"
Start "-----------------------------------M----------------------------"
Base1 "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
Base2 "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
Base3 "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"

Sequence : "MTCE"
DNA MTCE
Genetic_code "Ascidian Mitochondrial"
If you displayed this sequence in FMap, then when you used any of the FMap functions that translate the MTCE's DNA, the translation would be done using the genetic code specified by the "Ascidian Mitochondrial" object.
If you don't specify the code, then the standard genetic code will be used (i.e. the code that AceDB has used up until now).
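To illustrate how the Translation and Base1/Base2/Base3 strings fit together, here is a minimal sketch of a codon lookup using the "Ascidian Mitochondrial" strings above. The function name translateCodon() is hypothetical and is not part of AceDB; the sketch only assumes that position i in Translation gives the amino acid for the codon spelled by position i of Base1, Base2 and Base3.

```c
/* The four strings are copied from the "Ascidian Mitochondrial" object above. */
static const char *translation =
  "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG";
static const char *base1 =
  "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG";
static const char *base2 =
  "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG";
static const char *base3 =
  "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG";

/* Hypothetical helper, not an AceDB function: returns the one-letter amino
 * acid for an upper case codon of at least three bases, or 'X' if the codon
 * does not appear in the table. */
char translateCodon(const char *codon)
{
  int i ;

  for (i = 0 ; i < 64 ; i++)
    {
      if (codon[0] == base1[i] && codon[1] == base2[i] && codon[2] == base3[i])
        return translation[i] ;
    }

  return 'X' ;
}
```

With this table translateCodon("AGA") gives 'G' and translateCodon("TGA") gives 'W', where the standard code would give arginine and a stop respectively.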
If you use the old Source/Subsequence or the new SMap tags to create a hierarchy of perhaps "transcripts -> genes -> clones -> chromosome" for your sequence and the sequence requires an alternative genetic code, then to make biological sense you should specify just one alternative genetic code for the whole hierarchy. It may be sensible, however, to specify it in more than one place in the hierarchy. You would certainly want to specify it in the sequence object that contains the actual DNA, so that you can successfully export that object with all its neighbours from AceDB, but you should also specify it in the topmost parent of the entire hierarchy. This is because when translating the DNA for any subpart of the hierarchy, AceDB looks up through all the successive parents of that subpart for the alternative genetic code. Hence if you also specify the alternative genetic code at the highest level, all translations of objects within the hierarchy are guaranteed to use the correct code.
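The parent lookup described above can be sketched as a simple walk up a chain of parents. This is an illustration only: the SeqNode type and findGeneticCode() are hypothetical and are not AceDB code, and the sketch assumes that the object itself is checked before its parents.

```c
#include <stddef.h>

/* Hypothetical stand-in for a sequence object in a Source/Subsequence or
 * SMap hierarchy. */
typedef struct SeqNode
{
  const char *name ;
  const char *genetic_code ;          /* NULL if no Genetic_code tag is set. */
  const struct SeqNode *parent ;      /* NULL for the topmost parent. */
} SeqNode ;

/* Walks from node up through its successive parents and returns the first
 * genetic code name found, or NULL if none is set anywhere in the chain,
 * in which case the standard genetic code would apply. */
const char *findGeneticCode(const SeqNode *node)
{
  for ( ; node != NULL ; node = node->parent)
    {
      if (node->genetic_code != NULL)
        return node->genetic_code ;
    }

  return NULL ;
}
```

In this sketch a transcript with no code of its own still picks up "Ascidian Mitochondrial" if that is set on the chromosome at the top of its chain, which is why setting the code on the topmost parent covers every object below it.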
You can now search for just about anything in the tree display window: tags, text, numbers. It doesn't matter whether the search target is hidden (i.e. in a collapsed branch of the tree display) or whether you are in "Update" mode, you can still search.
You can search either by selecting the "Find" button in the tree display main menu or by using the shortcuts:
Ctrl "f", "b"     "f" to search forward, "b" to search backwards
Ctrl ",", "."     "," to find the previous occurrence of the target,
                  "." to find the next occurrence.
(For previous you can press either the "," key or the "<" key, and for next you can press the "." key or the ">" key. It's just more convenient to press the "," and "." keys because you don't have to press shift first.)
Note that search targets must be complete; searching for partial strings will be introduced next month.
You can use giface to build virtual sequences using the "gif seqget" command and then dump the features on that virtual sequence in GFF format using the "gif seqfeatures" command. As an alternative you could also just list the types of feature found in that virtual sequence. A new list option now allows you to list all the object keys that were used to make that virtual sequence with its features:
seqfeatures [-list [features | keys]]
-list features lists the feature types found in that sequence (the default)
-list keys lists the keys of objects used to construct the virtual sequence
You can use the "-list keys" function in combination with the existing "-source" or "-feature" flags to find out which keys of a particular type were used for a particular stretch of sequence.
GFF files are made up of single line records with the following format:
<seqname> <source> <feature> <start> <end> <score> <strand> <frame> [attributes] [comments]
Leaving aside the "attributes" and "comments" fields, the rest are in some senses mandatory although this is not completely clear in the GFF spec. In the past AceDB has sometimes omitted some of these mandatory fields altogether which makes parsing of the records more difficult. From now on AceDB will adopt the following standard for outputting GFF records:
<seqname | "."> <source | "."> <feature> <start> <end> <score | "."> <strand | "."> <frame | "."> [attributes] [comments]
In summary: the "seqname" to "frame" fields will always be present. The "feature", "start" and "end" fields must always be given a value, while each of the others defaults to a "." if it has no value. From this release AceDB will output all GFF files in this format.
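As a rough illustration of this output rule, the following sketch builds one GFF record and substitutes "." for any optional field that has no value. The function formatGffRecord() is hypothetical and is not the AceDB dumping code; it assumes tab separated fields, as in the GFF specification, and leaves out the optional attributes and comments.

```c
#include <stdio.h>

/* Hypothetical helper, not AceDB code: writes one tab separated GFF record
 * into buf. Pass NULL for seqname, source, score, strand or frame when there
 * is no value and a "." is written instead. feature, start and end must
 * always be supplied. Returns the value returned by snprintf(). */
int formatGffRecord(char *buf, size_t size,
                    const char *seqname, const char *source, const char *feature,
                    int start, int end,
                    const char *score, const char *strand, const char *frame)
{
  return snprintf(buf, size, "%s\t%s\t%s\t%d\t%d\t%s\t%s\t%s",
                  seqname ? seqname : ".",
                  source ? source : ".",
                  feature, start, end,
                  score ? score : ".",
                  strand ? strand : ".",
                  frame ? frame : ".") ;
}
```

Because every record then has the same number of leading fields, a parser can split on tabs and index the fields by position without special cases.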
(Thanks to Jack Chen chenn@cshl.edu of WormBase for finally finding a working configuration.)
In the past if you wanted the AceDB server to be automatically started when requests came in to your machine you could do this via inetd, an operating system utility which provided this service.
xinetd is an open source replacement for inetd. It provides many more features and much better security than inetd; you can read more about it at the xinetd website. It has replaced inetd as the default service on Linux and some other Unix versions.
Its configuration is a bit different from inetd and is as follows:
For this example we assume the following:
nickname for server/database - MyfirstDB
port for server - 20113
user owning database - mieg
path to server binary - /usr/local/bin/saceserver
path to database - /home/databases/aardvarkDB
timeout values for server - 200:200:0
In /etc/services add the line:
MyfirstDB 20113/tcp
xinetd keeps its configuration files in the directory /etc/xinetd.d. Create a file in this directory, giving it some meaningful name such as acedb, and add the following lines:
# file: /etc/xinetd.d/acedb
# default: on
# description: wormbase acedb database
# The service name matches the nickname added to /etc/services.
service MyfirstDB
{
        protocol = tcp
        socket_type = stream
        port = 20113
        flags = REUSE
        wait = yes
        user = mieg
        group = mieg
        log_on_success += USERID DURATION
        log_on_failure += USERID HOST
        server = /usr/local/bin/saceserver
        server_args = /home/databases/aardvarkDB 200:200:0
}
Note that you will need to change the port, user, group, server and server_args fields for your setup. Lines beginning with a "#" are, as usual, comments.
Warning: In the past there have been a number of problems with the way xinetd worked which prevented users from running the AceDB servers with xinetd. Currently the following combination of Linux, xinetd and AceDB is known to work:
Linux: Red Hat Linux release 7.3 (Valhalla), kernel: 2.4.18-27.7.xbigmem
xinetd: xinetd-2.3.7-4.7x
AceDB: 4_9m
If you find that you cannot get AceDB working under xinetd control then you should first check that you have up to date software.
The DNA dumping command had the wrong default: it should have dumped the spliced DNA unless otherwise specified.
The code also did not cope with an object that contained both a DNA tag directly referencing DNA and a Source_exons tag specifying the exons to be spliced out of the DNA.
You may have noticed that the GFF dumping function occasionally failed, reporting that dumping of the "reverse" sequence had not been implemented. This usually happened if the sequence had been reverse complemented and the user then tried to GFF dump. In fact this was a misunderstanding of how GFF should be dumped: for any one sequence the reverse strand features are always dumped, but their strand is "-" instead of "+".
The "gif seqfeatures" command, which does the GFF dumping, now detects a reverse complemented sequence, automatically reverse complements it for dumping and then reverse complements the sequence back afterwards. This is correct for GFF format, which is supposed to represent the features on a known sequence (not its reverse complement), with the strand of each feature given by the "strand" field.
As you will remember, the bIndexNNN() functions are a set of functions that enable you to search objects for tags and retrieve keys while making the best possible use of the AceDB index to avoid opening the objects wherever possible.
The bIndexGetTag2Key() function allows you to retrieve the trailing object key in a tag2 system, i.e. a set of tags in this format:
locator_tag user_tag1 obj_type1_key
            user_tag2 obj_type2_key
            etc.
An example of this would be the SMap tags introduced last year into AceDB:
SMap S_Parent UNIQUE <ptag2> UNIQUE <key>   // must be just _one_ parent.
                     <ptag2> UNIQUE <key>
Using the bIndexGetTag2Key() function you can get the key following the "user_tag". If the object does not contain the locator_tag then it will not be opened, which makes traversing large hierarchies of SMap'd objects much more efficient.
The function copes with the situation where there are a number of user_tags following the locator tag.
The bIndexNNN() functions are derived from code written by Jean Thierry-Mieg and include (from wh/bindex.h):
/*
* These functions find tags etc. making optimal use of the index,
* only opening objects to get data if strictly necessary.
*/
BOOL bIndexTag(KEY key, KEY tag) ;                       /* Is a tag in an object ? */
BOOL bIndexGetKey(KEY key, KEY tag, KEY *key_out) ;      /* Retrieve key following a tag if present. */
BOOL bIndexGetTag2Key(KEY key, KEY tag, KEY *key_out) ;  /* Retrieve key following tag but where tag
                                                            is part of a tag2 system. */
Some AceDB windows are implemented using purely Graph package code, some with purely Gex code and some with a mixture (e.g. FMap). There can be a problem with clashes over key bindings if both layers have uses for the same keys, e.g. the left and right arrow keys are used by the text entry boxes of Graph for FMap origin and zone setting but also by the Gex layer for horizontal scrolling. So what to do ?
The addition of two new functions to the GraphDev interface allows you to solve this. graphDevDisableKeyboard() and graphDevEnableKeyboard() allow you to disable/enable Gex layer keyboard code so that the Graph layer can intercept the events. For FMap it works like this:
This allows Graph and Gex to coexist as a crude approximation of what normally happens in windowing systems.
sMapTreeRoot() is a function that allows you to find the coordinates of any child in an SMap within the ultimate parent of that SMap. While this is often useful, there are also occasions when it's useful to find out the coordinates of a child in any one of its parents. A new function, sMapTreeCoords(), allows you to do this (from wh/smap.h):
/* Finds x1,x2 coords of key in target_parent, which must be a parent
 * of key in key's smap tree, if target_parent is set to KEY_UNDEFINED it
 * does the same as sMapTreeRoot(). */
BOOL sMapTreeCoords(KEY key, int x1, int x2, KEY *target_parent, int *y1, int *y2) ;
As can be seen from the description, sMapTreeRoot() is now simply a call to sMapTreeCoords().
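The idea of mapping a child's coordinates into a chosen parent, with the root as the default, can be sketched in standalone form as follows. This is not the AceDB implementation: the MapNode type and treeCoords() are hypothetical, and the sketch assumes forward strand, ungapped mappings where each object simply records where it starts in its immediate parent.

```c
#include <stddef.h>

/* Hypothetical stand-in for an object in an SMap tree. */
typedef struct MapNode
{
  const struct MapNode *parent ;      /* NULL for the root of the tree. */
  int start_in_parent ;               /* 1-based position of this object's
                                         first base within its parent. */
} MapNode ;

/* Maps x1,x2 in node's coordinates into target's coordinates, returned in
 * y1,y2. A NULL target means "map to the root". node must not be NULL.
 * Returns 1 on success, 0 if target is not node or one of its parents. */
int treeCoords(const MapNode *node, int x1, int x2,
               const MapNode *target, int *y1, int *y2)
{
  /* Climb one parent at a time, shifting the coordinates at each step. */
  while (node != target && node->parent != NULL)
    {
      x1 += node->start_in_parent - 1 ;
      x2 += node->start_in_parent - 1 ;
      node = node->parent ;
    }

  /* Reached the root without meeting the requested target. */
  if (target != NULL && node != target)
    return 0 ;

  *y1 = x1 ;
  *y2 = x2 ;

  return 1 ;
}
```

For a gene starting at base 201 of a clone that itself starts at base 1001 of a chromosome, bases 1 to 50 of the gene map to 201 to 250 in the clone and to 1201 to 1250 in the chromosome.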
An FMap seg mostly represents a single box drawn on the FMap screen, but going from this seg to the object from which the seg was principally derived is not completely straightforward; in fact the FMap code has a number of different ways of doing this. The fmapSeg2SourceName() function takes a seg and returns the name of the object from which the seg was principally derived. Sometimes this is seg->key, sometimes seg->parent and sometimes seg->source, depending on the seg type.
You can pick up the monthly builds from: