ACEDB User Group Newsletter - April 2003

If you want this newsletter mailed to you, or want to make comments or suggestions about its format or content, send an email to acedb@sanger.ac.uk.

This month sees praise for AceDB from the Worm Advisory board, support for alternative genetic codes, better searching in the Tree window, a tightening up of GFF output, using the server with xinetd, some bug fixes and the usual nerdy stuff for developers.

General News

AceDB powers the worm...or something like that...

AceDB received praise and support from the Wormbase Advisory Board report for 2003 which included the following paragraph:

    SUMMARY REPORT FROM THE WORMBASE ADVISORY BOARD 2003

    2.  The advisory committee wishes to emphasize that WormBase's use of
    ACEDB as an integration and content delivery database is serving the
    project very effectively, and has been an important factor in enabling
    the project to achieve its impressive two-week data updating cycle. We
    make this point because this has been an issue of some controversy in
    the past, with some critics arguing that ACEDB is too ad hoc, or
    insufficiently robust, or cannot meet the performance demands of a
    production system. In point of fact ACEDB's performance on common
    bioinformatically important operations, notably retrieval of complex
    structured objects, is significantly faster than commercial R-DBMSs,
    while the ACEDB data model has become an object of formal study by
    database researchers who see in it valuable ideas for a new kind of
    DBMS (Ref 1). The success of the 2-week updating process shows that
    robustness has not been a problem. Furthermore ACEDB now comprises
    several person-decades of programming investment in
    bioinformatics-specific viewing and analysis capabilities which a
    generic DBMS alone could not replace. In summary, the committee feels
    strongly that the principle of not fixing what ain't broke should
    apply in this case.

Well done to the AceDB developers and the Wormbase group.

New Features

DNA FMap keyboard shortcut

Yet another FMap shortcut key: you can now use the "d" key to toggle the DNA display on and off.

AceDB now supports different genetic codes

(Thanks to Jean Thierry-Mieg mieg@ray.nlm.nih.gov who did the original coding for this new feature.)

Until now AceDB has only used the standard genetic code, which meant that you could not translate sections of DNA that use a different code, e.g. mitochondrial DNA. You can now specify your own genetic code for any DNA in AceDB.

Alternative genetic codes can be specified using the new Genetic_code class:

?Genetic_code   Other_name ?Text
                Translation  UNIQUE     Text
                Start  UNIQUE   Text
                Base1 UNIQUE    Text
                Base2 UNIQUE    Text
                Base3 UNIQUE    Text

Here's how you might include it in your Sequence class:

?Sequence DNA UNIQUE ?DNA UNIQUE Int
                    .
                    .
                    .
	  Origin  From_database UNIQUE ?Database UNIQUE Int
		  From_author ?Author XREF Sequence
                              .
                              .
                              .
		  Species ?Species
                  Genetic_code  UNIQUE ?Genetic_code    // specify a different genetic coding.
		  Method UNIQUE ?Method UNIQUE Float

Here's an example in ace file format:

Genetic_code : "Ascidian Mitochondrial"
Translation    "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG"
Start          "-----------------------------------M----------------------------"
Base1          "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
Base2          "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
Base3          "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"


Sequence : "MTCE"
DNA  MTCE
Genetic_code "Ascidian Mitochondrial"

If you displayed this sequence in FMap, then when you used any of the FMap functions that translate the MTCE's DNA, the translation would be done using the genetic code specified by the "Ascidian Mitochondrial" object.

If you don't specify the code, then the standard genetic code will be used (i.e. the code that AceDB has used up until now).
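
The Translation and Base1, Base2, Base3 strings of a Genetic_code object form a column-wise table: column i describes the codon made of Base1[i], Base2[i] and Base3[i], and Translation[i] gives its amino acid ("*" for a stop). The C sketch below reads the table that way. The GeneticCode structure and translateCodon() function are hypothetical illustrations, not AceDB's own translation code, and the sketch assumes upper-case codons.

```c
#include <string.h>

/* Hypothetical in-memory copy of a Genetic_code object's text fields. */
typedef struct
{
  const char *translation ;
  const char *start ;
  const char *base1 ;
  const char *base2 ;
  const char *base3 ;
} GeneticCode ;

/* Values copied from the "Ascidian Mitochondrial" example above. */
static const GeneticCode ascidian_mito =
  {
    "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSGGVVVVAAAADDEEGGGG",
    "-----------------------------------M----------------------------",
    "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG",
    "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG",
    "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"
  } ;

/* Translate one upper-case codon by finding the column whose Base1, Base2
 * and Base3 letters match it; returns 'X' if no column matches. */
static char translateCodon(const GeneticCode *code, const char *codon)
{
  size_t i, n = strlen(code->translation) ;

  for (i = 0 ; i < n ; i++)
    {
      if (code->base1[i] == codon[0]
          && code->base2[i] == codon[1]
          && code->base3[i] == codon[2])
        return code->translation[i] ;
    }

  return 'X' ;
}
```

With this table TGA, a stop codon in the standard code, translates to "W" (tryptophan) and AGA translates to "G" (glycine).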

If you use the old Source/Subsequence or the new SMap tags to create a hierarchy of, perhaps, "transcripts -> genes -> clones -> chromosome" for your sequence, and the sequence requires an alternative genetic code, then to make biological sense you should specify just one alternative genetic code for the whole hierarchy. It may, however, be sensible to specify it in more than one place in the hierarchy. You would certainly want to specify it in the sequence object that contains the actual DNA, so that you can successfully export that object with all its neighbours from AceDB, but you should also specify it in the topmost parent of the entire hierarchy. This is because, when translating the DNA for any subpart of the hierarchy, AceDB looks up through all the successive parents of that subpart for the alternative genetic code. Hence if you also specify the alternative genetic code at the highest level, all translations of objects within the hierarchy are guaranteed to use the correct code.
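
The parent lookup described above can be pictured as a walk up a chain of parent links, stopping at the first object that names a genetic code. The C sketch below uses a hypothetical SeqNode structure in place of real AceDB objects and keys, and assumes the object itself is checked before its parents; it illustrates the idea and is not AceDB's own lookup code.

```c
#include <stddef.h>

/* Hypothetical stand-in for an object in a Source/Subsequence or SMap
 * hierarchy: each node knows its parent (NULL at the top) and may name a
 * Genetic_code object (NULL if none is given). */
typedef struct SeqNode
{
  const char *name ;
  const char *genetic_code ;
  const struct SeqNode *parent ;
} SeqNode ;

/* Return the first genetic code found on node or on any of its successive
 * parents, or NULL meaning "use the standard genetic code". */
static const char *findGeneticCode(const SeqNode *node)
{
  for ( ; node != NULL ; node = node->parent)
    {
      if (node->genetic_code != NULL)
        return node->genetic_code ;
    }

  return NULL ;
}

/* Example hierarchy where the code is given only on the topmost parent. */
static const SeqNode chromosome = { "chromosome", "Ascidian Mitochondrial", NULL } ;
static const SeqNode clone      = { "clone", NULL, &chromosome } ;
static const SeqNode gene       = { "gene", NULL, &clone } ;
static const SeqNode transcript = { "transcript", NULL, &gene } ;
```

Because the topmost parent carries the code, findGeneticCode(&transcript) finds it however deep the transcript sits in the hierarchy.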

Improved text search in tree display

You can now search for just about anything in the tree display window: tags, text, numbers. It doesn't matter whether the search target is hidden (i.e. in a collapsed branch of the tree display) or whether you are in "Update" mode, you can still search.

You can search either by selecting the "Find" button in the tree display main menu or by using the shortcuts:


Ctrl "f","b"             "f" to search forward, "b" to search backwards

Ctrl ",", "."            "," to find the previous occurrence of the target,
                         "." to find the next occurrence.

(For previous you can press either the "," key or the "<" key, and for next you can press either the "." key or the ">" key; it's just more convenient to press the "," and "." keys because you don't have to press shift first.)

Note the following:

If the target can't be found you get a message.
When the search reaches the end of the object you get a message.
Next/prev will wrap to the top/bottom of the object on reaching the bottom/top.
Ctrl "f","b" always bring up the dialogue box allowing you to change what you're searching for and always start from the top/bottom of the object respectively.
If the target is in a collapsed branch of the tree then the search will expand the branch so you can see the target. There is now a "Collapse" option in the main menu to collapse back the branch.

Please note that targets must be complete; searching for partial strings will be introduced next month.
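
The next/prev behaviour can be modelled as an exact-match search over the object's contents flattened into display order, wrapping round at either end. In the C sketch below the flattened array of strings and the findNext()/findPrev() functions are hypothetical; starting from position -1 or n_items mimics Ctrl "f" and Ctrl "b" starting at the top or bottom of the object.

```c
#include <string.h>

/* Hypothetical model: the tree's tags, text and numbers flattened, in
 * display order, into an array of strings, including entries that sit in
 * collapsed branches. Targets must match an entry exactly. */

/* Find the next exact match after position 'from', wrapping round to the
 * top of the object; returns -1 if the target is not present at all. */
static int findNext(const char **items, int n_items, const char *target, int from)
{
  int i ;

  if (n_items <= 0)
    return -1 ;

  for (i = 1 ; i <= n_items ; i++)
    {
      int pos = (from + i) % n_items ;

      if (strcmp(items[pos], target) == 0)
        return pos ;
    }

  return -1 ;
}

/* As findNext() but searching backwards, wrapping round to the bottom. */
static int findPrev(const char **items, int n_items, const char *target, int from)
{
  int i ;

  if (n_items <= 0)
    return -1 ;

  for (i = 1 ; i <= n_items ; i++)
    {
      /* Keep the index non-negative when stepping back past the top. */
      int pos = ((from - i) % n_items + n_items) % n_items ;

      if (strcmp(items[pos], target) == 0)
        return pos ;
    }

  return -1 ;
}
```

Searching for a partial string such as "MTC" in this model finds nothing, matching the "complete targets only" restriction noted above.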

A new "-list" option for the "gif seqfeatures" command.

You can use giface to build virtual sequences using the "gif seqget" command and then dump the features on that virtual sequence in GFF format using the "gif seqfeatures" command. Alternatively, you can just list the types of feature found in that virtual sequence. A new option now also allows you to list all the object keys that were used to make that virtual sequence and its features:


seqfeatures [-list [features | keys]]

             -list features    lists the feature types found in that sequence (the default)

             -list keys        lists the keys of objects used to construct the virtual sequence

You can use the "-list keys" function in combination with the existing "-source" or "-feature" flags to find out which keys of a particular type were used for a particular stretch of sequence.

Articles

Tightening up GFF output format from AceDB

GFF files are made up of single line records with the following format:

<seqname> <source> <feature> <start> <end> <score> <strand> <frame> [attributes] [comments]

Leaving aside the "attributes" and "comments" fields, the rest are in some sense mandatory, although this is not completely clear in the GFF spec. In the past AceDB has sometimes omitted some of these mandatory fields altogether, which makes parsing the records more difficult. From now on AceDB will adopt the following standard for outputting GFF records:

<seqname | "."> <source | "."> <feature> <start> <end> <score | "."> <strand | "."> <frame | "."> [attributes] [comments]

In summary: the "seqname" to "frame" fields will always be present. Any of them without a value will be output as a ".", except for the "feature", "start" and "end" fields, for which values must always be given. From this release AceDB will output all GFF files in this format.
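
The rule above amounts to writing "." for any optional field that has no value while always supplying feature, start and end. Below is a minimal C sketch of formatting one record this way. The function name and its conventions (NULL for a missing seqname or source, a has_score flag, a frame outside 0 to 2 meaning "no frame") are hypothetical rather than AceDB's own GFF code; the fields are tab separated, as the GFF specification requires.

```c
#include <stdio.h>

/* Format one GFF record into buf, substituting "." for missing optional
 * fields. feature, start and end must always be supplied by the caller.
 * Returns the snprintf() result, i.e. the length of the full record. */
static int formatGFFRecord(char *buf, size_t buf_len,
                           const char *seqname, const char *source,
                           const char *feature, int start, int end,
                           int has_score, double score,
                           char strand, int frame)
{
  char score_str[32] ;
  char frame_str[4] ;

  if (has_score)
    snprintf(score_str, sizeof(score_str), "%g", score) ;
  else
    snprintf(score_str, sizeof(score_str), ".") ;

  if (frame >= 0 && frame <= 2)
    snprintf(frame_str, sizeof(frame_str), "%d", frame) ;
  else
    snprintf(frame_str, sizeof(frame_str), ".") ;

  return snprintf(buf, buf_len, "%s\t%s\t%s\t%d\t%d\t%s\t%c\t%s",
                  seqname ? seqname : ".",
                  source ? source : ".",
                  feature, start, end,
                  score_str,
                  (strand == '+' || strand == '-') ? strand : '.',
                  frame_str) ;
}
```

For example, an exon with no source, score or frame is written as MTCE, ".", exon, 100, 250, ".", "+", "." separated by tabs, so every record has the same number of fields for a parser to split on.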

Running the AceDB server under the control of xinetd

(Thanks to Jack Chen chenn@cshl.edu of wormbase for finally finding a working configuration.)

In the past, if you wanted the AceDB server to be started automatically when requests came in to your machine, you could do this via inetd, an operating system utility which provided this service.

xinetd is an open source replacement for inetd; it provides many more features and much better security than inetd, and you can read more about it at the xinetd website. It has replaced inetd as the default service on Linux and some other Unix versions.

Its configuration is a bit different from that of inetd and is as follows:

For this example we assume the following:

nickname for server/database -  MyfirstDB
             port for server -  20113
        user owning database -  mieg
       path to server binary -  /usr/local/bin/saceserver
            path to database -  /home/databases/aardvarkDB
   timeout values for server -  200:200:0

In /etc/services add the line:

MyfirstDB  20113/tcp

xinetd keeps its configuration files in the directory /etc/xinetd.d. Create a file in this directory, giving it some meaningful name such as acedb, and add the following lines:

# file: /etc/xinetd.d/acedb
# default: on
# description: wormbase acedb database
service acedb
{
               protocol                = tcp
               socket_type             = stream
               port                    = 20113
               flags                   = REUSE
               wait                    = yes
               user                    = mieg
               group                   = mieg
               log_on_success          += USERID DURATION
               log_on_failure          += USERID HOST
               server                  = /usr/local/bin/saceserver
               server_args             = /home/databases/aardvarkDB 200:200:0
}

Note that you will need to change the port, user, group, server and server_args fields for your setup. As usual, lines beginning with a "#" are comments.

Warning: In the past there have been a number of problems with the way xinetd worked which prevented users from running the AceDB servers with xinetd. Currently the following combination of Linux, xinetd and AceDB is known to work:

Linux: Red Hat Linux release 7.3 (Valhalla), kernel: 2.4.18-27.7.xbigmem

xinetd: xinetd-2.3.7-4.7x

AceDB: 4_9m

If you find that you cannot get AceDB working under xinetd control, you should first check that your software is up to date.

Bugs Fixed

DNA dumping

The DNA dumping command had the wrong default: it should dump the spliced DNA unless otherwise specified.

The code also did not cope with objects that contained both a DNA tag directly referencing DNA and a Source_exons tag specifying the exons to be spliced out of the DNA.

Both now fixed.

GFF dumping the "reverse" strand

You may have noticed that the GFF dumping function occasionally failed, reporting that dumping of the "reverse" sequence had not been implemented. This usually happens if the sequence has been reverse complemented and the user then tries to GFF dump it. In fact this reflects a misunderstanding of how GFF should be dumped: for any one sequence, the reverse strand features are always dumped, but with a strand of "-" instead of "+".

The "gif seqfeatures" command, which does the GFF dumping, now detects a reverse complemented sequence, reverse complements it back for dumping and then restores the reverse complemented sequence afterwards. This is correct for GFF format, which is supposed to represent the features on a known sequence (not its reverse complement), with the strand of each feature given by the "strand" field.
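
The coordinate arithmetic behind this is simple: on a sequence of length L, position p of the reverse complement is position L - p + 1 of the forward sequence, so a feature's start and end swap over and its strand flips. The C sketch below shows that conversion for one feature using a hypothetical Feature structure. As described above, AceDB gets the same effect by reverse complementing the sequence back before dumping, so this is an illustration rather than its code.

```c
/* Hypothetical feature record: 1-based coordinates with start <= end. */
typedef struct
{
  int start ;
  int end ;
  char strand ;      /* '+', '-' or '.' */
} Feature ;

/* Convert a feature from a reverse complemented view of a sequence of
 * length seq_len back to the forward sequence. */
static Feature featureToForward(Feature f, int seq_len)
{
  Feature out ;

  /* Position p in the reverse complement is seq_len - p + 1 in the
   * forward sequence, so start and end swap over. */
  out.start = seq_len - f.end + 1 ;
  out.end = seq_len - f.start + 1 ;

  /* "+" on the reverse complement is "-" on the forward sequence and vice
   * versa; unstranded features stay unstranded. */
  if (f.strand == '+')
    out.strand = '-' ;
  else if (f.strand == '-')
    out.strand = '+' ;
  else
    out.strand = '.' ;

  return out ;
}
```

Applying the conversion twice returns the original feature, which is a quick sanity check on the arithmetic.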

Developers Corner

bIndexGetTag2Key() function

As you will remember, the bIndexNNN() functions are a set of functions that enable you to search objects for tags and retrieve keys while making the best possible use of the AceDB index to avoid opening the objects wherever possible.

The bIndexGetTag2Key() function allows you to retrieve the trailing object key in a tag2 system, i.e. a set of tags in this format: locator_tag user_tag1 obj_type1_key user_tag2 obj_type2_key etc.

An example of this would be the SMap tags introduced into AceDB last year:

SMap S_Parent UNIQUE <ptag2> UNIQUE <key>    // must be just _one_ parent.
                     <ptag2> UNIQUE <key>

Using the bIndexGetTag2Key() function you can get the key following the "user_tag"; if the object does not contain the locator_tag, it is not opened at all, which makes traversing large hierarchies of SMap'd objects much more efficient.

The function copes with the situation where there are a number of user_tags following the locator tag.

The bIndexNNN() functions are derived from code written by Jean Thierry-Mieg and include (from wh/bindex.h):

/*
 * These functions find tags etc. making optimal use of the index,
 * only opening objects to get data if strictly necessary.
 */
BOOL bIndexTag(KEY key, KEY tag) ;			    /* Is a tag in an object ? */
BOOL bIndexGetKey(KEY key, KEY tag, KEY *key_out) ;	    /* Retrieve key following a tag if
							       present. */
BOOL bIndexGetTag2Key(KEY key, KEY tag, KEY *key_out) ;	    /* Retrieve key following tag but
							       where tag is part of a tag2 system. */

Getting graph and gex to co-exist for key bindings

Some AceDB windows are implemented using purely Graph package code, some with purely Gex code and some with a mixture (e.g. FMap). There can be a problem with clashes over key bindings if both layers have uses for the same keys, e.g. the left and right arrow keys are used by the text entry boxes of Graph for FMap origin and zone setting but also by the Gex layer for horizontal scrolling. So what to do ?

The addition of two new functions to the GraphDev interface allows you to solve this. graphDevDisableKeyboard() and graphDevEnableKeyboard() allow you to disable/enable Gex layer keyboard code so that the Graph layer can intercept the events. For FMap it works like this:

Normally the left/right arrows control left/right scrolling of the entire FMap
When the user clicks on the zone or origin boxes, Graph disables the Gex handling of the arrow keys so that it can use them to allow the user to cursor left and right in the text entry boxes.
When the user presses enter to show they have finished, then the Gex left/right scrolling is enabled again by Graph.

This allows Graph and Gex to coexist as a crude approximation of what normally happens in windowing systems.
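
The arrangement amounts to a single flag that decides which layer sees the arrow keys. The C sketch below models it with hypothetical stand-ins (sketchDisableGexKeyboard(), sketchEnableGexKeyboard(), handleKey() and two counters for the scroll position and text cursor); it is not the real Graph, Gex or GraphDev code, which uses graphDevDisableKeyboard() and graphDevEnableKeyboard().

```c
/* Hypothetical key codes for the sketch. */
typedef enum { ARROW_LEFT, ARROW_RIGHT, ENTER_KEY, OTHER_KEY } SketchKey ;

static int gex_keyboard_enabled = 1 ;
static int fmap_scroll = 0 ;      /* stands in for the FMap's horizontal scroll */
static int entry_cursor = 0 ;     /* stands in for the text entry box cursor */

static void sketchDisableGexKeyboard(void) { gex_keyboard_enabled = 0 ; }
static void sketchEnableGexKeyboard(void)  { gex_keyboard_enabled = 1 ; }

/* Called when the user clicks in the zone or origin box. */
static void textEntryFocus(void)
{
  sketchDisableGexKeyboard() ;
}

static void handleKey(SketchKey key)
{
  if (gex_keyboard_enabled)
    {
      /* Gex layer: the arrows scroll the whole FMap. */
      if (key == ARROW_LEFT)
        fmap_scroll-- ;
      else if (key == ARROW_RIGHT)
        fmap_scroll++ ;
    }
  else
    {
      /* Graph layer: the arrows move the text entry cursor, and enter
       * finishes the entry and hands the arrows back to Gex. */
      if (key == ARROW_LEFT)
        entry_cursor-- ;
      else if (key == ARROW_RIGHT)
        entry_cursor++ ;
      else if (key == ENTER_KEY)
        sketchEnableGexKeyboard() ;
    }
}
```

Only one layer acts on each arrow key press, which is the crude focus handling described above.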

sMapTreeCoords(): a more general sMapTreeRoot()

sMapTreeRoot() is a function that allows you to find the coordinates of any child in an SMap within the ultimate parent of that SMap. While this is often useful, there are also occasions when it's useful to find the coordinates of a child in any one of its parents. A new function, sMapTreeCoords(), allows you to do this (from wh/smap.h):

/* Finds x1,x2 coords of key in target_parent, which must be a parent of
 * key in key's smap tree; if target_parent is set to KEY_UNDEFINED it
 * does the same as sMapTreeRoot(). */
BOOL sMapTreeCoords(KEY key, int x1, int x2, KEY *target_parent, int *y1, int *y2) ;

As can be seen from the description, sMapTreeRoot() is now simply a call to sMapTreeCoords().
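
The idea behind sMapTreeCoords() is to map coordinates from a child into its parent, then into that parent's parent, and so on until the target is reached. The C sketch below models this with a hypothetical MapNode structure in which each child records the parent coordinates p1,p2 of its first and last bases, with p1 > p2 assumed to mean the child is reverse oriented. Unlike the real function it takes a node pointer rather than a KEY and returns an int; it illustrates the principle only.

```c
#include <stddef.h>

/* Hypothetical SMap tree node: the child's bases 1..n lie at p1..p2 in
 * its parent (p1 > p2 means reverse orientation). */
typedef struct MapNode
{
  const struct MapNode *parent ;   /* NULL for the ultimate parent */
  int p1 ;
  int p2 ;
} MapNode ;

/* Map a 1-based child coordinate into its immediate parent. */
static int childToParent(const MapNode *child, int x)
{
  return child->p1 <= child->p2 ? child->p1 + (x - 1) : child->p1 - (x - 1) ;
}

/* Map x1,x2 in node up through successive parents until target is
 * reached; target == NULL means the ultimate parent, as in sMapTreeRoot().
 * Returns 0 if target is not node itself or one of its parents. */
static int treeCoords(const MapNode *node, int x1, int x2,
                      const MapNode *target, int *y1, int *y2)
{
  while (node != target)
    {
      if (node->parent == NULL)
        {
          if (target != NULL)
            return 0 ;
          break ;
        }

      x1 = childToParent(node, x1) ;
      x2 = childToParent(node, x2) ;
      node = node->parent ;
    }

  *y1 = x1 ;
  *y2 = x2 ;
  return 1 ;
}
```

For a gene placed reversed at 300..201 in a clone that sits at 101..600 in a chromosome, the gene's bases 1..100 map to 300..201 in the clone and to 400..301 in the chromosome.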

Getting the object from which an FMap seg was derived

An FMap seg mostly represents a single box drawn on the FMap screen, but going from this seg to the object from which the seg was principally derived is not completely straightforward; in fact the FMap code has a number of different ways of doing this. The fmapSeg2SourceName() function takes a seg and returns the name of the object from which the seg was principally derived. Depending on the seg type, this is sometimes seg->key, sometimes seg->parent and sometimes seg->source.

April monthly build now available.

You can pick up the monthly builds from:

Sanger users: ~acedb/RELEASE.DEVELOPMENT
External users

Next User Group Meeting - D319, 3.00pm, Thurs, 8th May 2003

Ed Griffiths <edgrif@sanger.ac.uk>

Last modified: Fri May 2 09:55:53 BST 2003