ACEDB User Group Newsletter - July 2003

If you want to have this newsletter mailed to you or you want to make comments/suggestions about the format/content then send an email to acedb@sanger.ac.uk.


This month sees a new "subset" command, tag counts displayed in the tree window, an unfortunate delay to cut/paste in acedb text windows, more work on the Genetic_code class, a question about acedb case sensitivity and object renaming, and future plans for a "son of FMap".


New Features

Blixem and selected homologies

When you select a homology in the top, "big picture", window of blixem, the list of homologies below now gets scrolled to the selected homologies.

New "subset" command in tace/giface

(This article is courtesy of Jean Thierry-Mieg who added this new feature mieg@ncbi.nlm.nih.gov)

There is a new subset command available from the command line in tace or giface:

subset x nx : returns the subset of the alphabetic active keyset from x >= 1 of length nx

This is an extremely efficient way to export long lists or tables to the web by pieces. You make an active keyset by some query, subset it, make a table out of the subset and export the table. This is better than making a slice of a table because this way you only compute the table slice you are really interested in.
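
As a rough illustration of the paging idea, the sketch below models the semantics of subset in Python. It is not acedb code: the keyset is a plain list of made-up names, the helper names are hypothetical, and the ordering is assumed to be simple case-insensitive alphabetical order, which may differ in detail from the ordering acedb itself uses.

```python
def subset(keyset, x, nx):
    """Return nx names from the alphabetically ordered keyset, starting at 1-based position x."""
    if x < 1:
        raise ValueError("x must be >= 1")
    # Assumption: case-insensitive alphabetical order stands in for acedb's ordering.
    ordered = sorted(keyset, key=str.lower)
    return ordered[x - 1 : x - 1 + nx]


def pages(keyset, page_size):
    """Yield successive pieces of the keyset, as a web front end might request them."""
    for start in range(1, len(keyset) + 1, page_size):
        yield subset(keyset, start, page_size)


active = ["unc-22", "dpy-10", "lin-12", "ace-1", "Him-5"]  # made-up object names
for page in pages(active, 2):
    print(page)
```

Each page would then be turned into its own small table, so only the rows actually requested are ever computed.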

New "Toggle TagCount" menu option in tree display

(This article is courtesy of Jean Thierry-Mieg who added this new feature mieg@ncbi.nlm.nih.gov)

There is a new option in the menu of the xace tree display called "Toggle TagCount". Select it, and you see in green after each tag how many objects of the same class contain this tag. What is even more interesting is to activate this option while displaying the models. In that case, you will see in red all the tags which are not used at all in your database and which you may wish to remove from your schema.

The counting relies on the internal acedb indexing which was introduced a few years ago and should be quasi instantaneous.

To see the models, double click them from the main window or select them from the 'other class...' button of the main window, or type 'find model' in a query box.

Tag count is disabled in update mode or in show-time-stamps mode.
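
To make clear what the counts mean, here is a toy Python model. It is only a sketch: the real feature reads acedb's internal index, whereas this version scans every object, and the data layout (a dict mapping object names to the tags they contain) is an assumption made for illustration.

```python
from collections import Counter


def tag_counts(model_tags, objects):
    """Count, for each tag in the model, how many objects of the class contain it."""
    counts = Counter()
    for tags in objects.values():
        counts.update(set(tags) & set(model_tags))
    return {tag: counts[tag] for tag in model_tags}


def unused_tags(model_tags, objects):
    """Tags that no object uses: the ones that would show up in red on the model."""
    return [tag for tag, n in tag_counts(model_tags, objects).items() if n == 0]


# Hypothetical class with two objects.
model_tags = ["Other_name", "Remark", "Translation", "Start", "Stop"]
objects = {
    "Standard": {"Translation", "Start"},
    "Selenocysteine": {"Translation", "Start", "Stop"},
}
print(tag_counts(model_tags, objects))
print(unused_tags(model_tags, objects))
```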


Articles

Cut, Paste and Text windows

(This article is courtesy of Rob Clack who has been working on this new feature rnc@sanger.ac.uk)

The Pfetch window accessible from fmap and the text editor window accessible from the Update Tree view have both been reverted to the way they were in release 4_9q, i.e. before we started fiddling with them.

The attempt to provide copy-and-paste functionality was doomed as long as we were using GTK 1.2.10 as there is a bug in that release which breaks popup menus.

This bug is fixed in GTK 2.2.2 but the upgrade to the new GTK is non-trivial as there are numerous incompatibilities which will need to be addressed.

We plan to do the upgrade later this year, but in the meantime the enhancements to the pfetch and text editor windows have been shelved.

This means that anyone who has continued to use older versions of Acedb because of inadequacies in the pfetch and text editor windows can safely upgrade to the new monthly version.

Genetic_code class: one more time

(Thanks to Jean Thierry-Mieg mieg@ncbi.nlm.nih.gov who did the original coding for this feature.)

A previous article described a new class which allows you to specify different genetic codes for translating DNA to peptides. The proposed model for the Genetic_code class to do this was:

?Genetic_code   Other_name ?Text
                Translation  UNIQUE     Text
                Start  UNIQUE   Text
                Base1 UNIQUE    Text
                Base2 UNIQUE    Text
                Base3 UNIQUE    Text

The next month this was then revised to include a specific tag for the rare Selenocysteine amino acid:

?Genetic_code   Other_name ?Text
                Translation UNIQUE Text
                Start UNIQUE Text
                Base1 UNIQUE Text
                Base2 UNIQUE Text
                Base3 UNIQUE Text
                Selenocysteine Remark Text   // Use the Remark to document your usage of this tag.

After some discussion this has now been revised once more (sorry). As far as is known, the TGA codon, when specifying Selenocysteine, also acts as a stop codon when it occurs at the end of a transcript, and so can be viewed as an "alternative stop" in the same way as some codons are alternative start codons.

Hence it makes sense to have a unified mechanism for both alternative start and stop codons. The new model for the Genetic_code class looks like this:

?Genetic_code   Other_name ?Text
                Remark Text
                Translation UNIQUE Text
                Start UNIQUE Text
                Stop  UNIQUE Text
                Base1 UNIQUE Text
                Base2 UNIQUE Text
                Base3 UNIQUE Text

An example for Selenocysteine taken from the Wormbase database illustrates usage of the Genetic_code class:

Genetic_code : "Selenocysteine"
Translation  "FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
Start        "---M----------------------------MMMM---------------M------------"
Stop         "----------**--*-------------------------------------------------"
Base1        "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
Base2        "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
Base3        "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"

Here the TGA codon (column 15 of the strings above) specifies the Selenocysteine amino acid (U) instead of its more usual role as a stop codon. The exception is when it occurs at the end of a transcript, in which case it specifies a stop; this is indicated by the "*" in the corresponding column of the data following the "Stop" tag.

Hence alternative Start codons are specified by an "M" in the Start data and alternative Stop codons are specified by a "*" in the Stop data.
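
The way the six strings fit together can be sketched in Python. This is an illustration, not the acedb implementation: the strings are copied from the Selenocysteine example above, and the translation rules are assumptions drawn from the description (a codon flagged in Start gives "M" only as the first codon, and a codon flagged in Stop gives "*" only as the last codon).

```python
# Data copied from the Selenocysteine example; column i of each string describes codon i.
TRANSLATION = "FFLLSSSSYY**CCUWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSSSSVVVVAAAADDEEGGGG"
START       = "---M----------------------------MMMM---------------M------------"
STOP        = "----------**--*-------------------------------------------------"
BASE1       = "TTTTTTTTTTTTTTTTCCCCCCCCCCCCCCCCAAAAAAAAAAAAAAAAGGGGGGGGGGGGGGGG"
BASE2       = "TTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGGTTTTCCCCAAAAGGGG"
BASE3       = "TCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAGTCAG"

# Read the three Base strings column by column to get the 64 codons.
CODONS = [b1 + b2 + b3 for b1, b2, b3 in zip(BASE1, BASE2, BASE3)]
AMINO_ACID = dict(zip(CODONS, TRANSLATION))
START_CODONS = {codon for codon, flag in zip(CODONS, START) if flag == "M"}
STOP_CODONS = {codon for codon, flag in zip(CODONS, STOP) if flag == "*"}


def translate(dna):
    """Translate whole codons of dna, applying the Start and Stop rows at the two ends."""
    dna = dna.upper()
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    last = len(codons) - 1
    peptide = []
    for i, codon in enumerate(codons):
        if i == 0 and codon in START_CODONS:
            peptide.append("M")          # assumed: alternative start reads as M
        elif i == last and codon in STOP_CODONS:
            peptide.append("*")          # TGA at the end of the transcript is a stop
        else:
            peptide.append(AMINO_ACID[codon])
    return "".join(peptide)


print(translate("ATGTGATTTTGA"))
```

In the made-up sequence the first TGA falls in the middle and is read as U, while the final TGA is read as a stop.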

In the case of Wormbase the Sequence class has been modified to include a reference to the Genetic_code class in the Origin section:

?Sequence DNA UNIQUE ?DNA UNIQUE Int                            // Int is the length
                             .
                             .
          Origin  From_Database UNIQUE ?Database UNIQUE Int     // release number
                             .
                             .
                  Genetic_code UNIQUE ?Genetic_code

You can see an example of this in Sequence object "C06G3.7".

You should note that when exporting sequences, acedb will not export the final codon if it translates as a valid stop codon. In FMap, however, the stop codon is displayed so that it's possible to see that the transcript ends correctly.

Case sensitivity and object names

Recently a user sent in the following problem:

When I was trying to create an instance of an object in ACEDB, I encountered some problem. The problem is as follow: I created an object called "test" with all lower case. Then I found out that I have to use upper case "TEST". Since I realized that ACEDB is case insensative, I can't just created a new object called "TEST". Therefore, I deleted old object called "test". And then create a new object called "TEST". However, somehow, when I fetched the object "TEST", I got "test" back even though I have deleted it and double checked it several times. It seems that ACEDB has some kind of memory system (still remember old deleted object :-)). Can someone tell me how to fix this problem? Thank you very much.

This "problem" is in essence caused by the way the acedb indexing system works, and in particular by the way it handles texts such as the names of objects. Acedb keeps tables of strings which are indexed in a very efficient way, and entries in these tables are not deleted until the database is rebuilt. Hence in the above case, when the object "test" is first created an entry is made in the index for the string "test". When the object is removed the entry remains in the index. This is done for efficiency: otherwise every single string would require a usage count, and object creation, tag usage etc. would become less efficient. Now when the new object "TEST" is created, the name "TEST" is compared to existing strings in the index in a case insensitive way and matches "test", hence the object is named "test", not "TEST".

This seems contrary until you start to think what would happen when you update a database. If you wished to update existing objects you would need to get the case of their names exactly right otherwise your update would create a new object. In fact the case of all the tags you specified would also have to be correct and so on and so on. In other words the whole system is much simpler with case insensitive handling of strings.

So what to do if you simply wish to change the case of an object name?

The answer is to rename the existing object. When you do this, the rename code in acedb will respect the case of the string you wish to change the object name to. There are a couple of ways to do this:

From xace:

Choose "Add/Delete/Rename/Alias Objects" from the main acedb window, specify the name and class of your object and then click on "Rename-Fuse". This will give you a dialog box where you fill in the new name in the case that you want.

From tace:

acedb> parse
-R My_Class oldname newname
ctrl-D
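
The behaviour described in this article can be modelled in a few lines of Python. This is a toy sketch, not the acedb data structure: the class and method names are hypothetical, and it keeps only two things from the description above, namely that the stored spelling of a name survives deletion of the object, and that rename stores the spelling you give it.

```python
class NameIndex:
    """Toy model of a case-insensitive name table whose entries outlive their objects."""

    def __init__(self):
        self._spelling = {}   # lower-cased name -> spelling stored in the table
        self._live = set()    # lower-cased names of objects that currently exist

    def create(self, name):
        """Create an object and return the name it actually gets."""
        key = name.lower()
        stored = self._spelling.setdefault(key, name)  # an existing spelling wins
        self._live.add(key)
        return stored

    def delete(self, name):
        """Remove the object; its entry in the name table is kept."""
        self._live.discard(name.lower())

    def rename(self, old, new):
        """Rename an existing object, respecting the case of the new name."""
        old_key = old.lower()
        if old_key not in self._live:
            raise KeyError(old)
        self._live.discard(old_key)
        self._spelling[new.lower()] = new
        self._live.add(new.lower())
        return new


index = NameIndex()
print(index.create("test"))          # the object is named "test"
index.delete("test")
print(index.create("TEST"))          # the old spelling is still in the table
print(index.rename("test", "TEST"))  # rename stores the new spelling
```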


Bugs Fixed

Database reindex messages

If you were reindexing the database using xace, the messages reporting the reindexing tended to be lost because they were sent to the terminal from which xace was started. The messages will now be displayed in popup windows so you get better feedback that the reindexing has happened.

Blixem not showing homologies correctly (part 25)

Simon Kelley has doggedly nailed a bug in blixem which led to homologies being incorrectly shown or not shown at all. This bug turned out to be another case where not all data coordinates were being correctly modified when an fmap is reverse complemented. This meant that blixem was supplied with incorrect "gap" array data for the homologies. A tale of real persistence, well done Simon.


Future Plans

Configurable error messages

There are several facilities in acedb where it would be useful to configure the way in which error messages are output and also the way in which the facility itself runs. A good example of an existing system is the parse command, which will stop at the first failure, and the pparse command, which will continue on to the end of the file, processing all objects that do not have errors and discarding ones that do. This gives the user the choice of whether to stop "on first failure" or not.

Other places where this would be advantageous include: constructing the DNA for an SMap and constructing a virtual sequence perhaps for FMap display or GFF dumping.

It would particularly help users if they could select the number of error messages, and whether to stop on "first failure", when constructing a virtual sequence. Users who have to construct large GFF files from long virtual sequences could make use of the facility when testing new data in their databases.

New code is being added to the messaging routines in acedb to provide easier control of error messaging and when to fail. The new messErrorCond() routine includes logic to detect the first failure, and whether none, just the first or all error messages should be output.
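
The kind of control being described might look something like the following Python sketch. It does not show the interface of messErrorCond(), which is not given here: the class, its options and the loader are all hypothetical, and an item of None simply stands in for an object that fails to parse.

```python
class ErrorPolicy:
    """Decides which error messages are kept and whether processing should stop."""

    def __init__(self, report="all", stop_on_first=False):
        if report not in ("none", "first", "all"):
            raise ValueError(report)
        self.report = report
        self.stop_on_first = stop_on_first
        self.error_count = 0
        self.messages = []

    def error(self, message):
        """Record one error; return True if the caller should stop now."""
        self.error_count += 1
        if self.report == "all" or (self.report == "first" and self.error_count == 1):
            self.messages.append(message)
        return self.stop_on_first


def load(items, policy):
    """Process items in order, skipping or stopping at bad ones as the policy says."""
    accepted = []
    for position, item in enumerate(items, start=1):
        if item is None:  # stand-in for an object with errors
            if policy.error(f"item {position} is invalid"):
                break
            continue
        accepted.append(item)
    return accepted


data = ["a", None, "b", None, "c"]
strict = ErrorPolicy(report="all", stop_on_first=True)      # parse-like
lenient = ErrorPolicy(report="first", stop_on_first=False)  # pparse-like, quieter
print(load(data, strict), strict.messages)
print(load(data, lenient), lenient.messages)
```

Keeping the policy in one object means each facility only has to ask "should I stop?" rather than carrying its own message-counting logic.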

This is work in progress as it will require some alterations to the SMap DNA and SMap convert code.

ZMap, Son of FMap

Simon Kelley <srk@sanger.ac.uk> and I have been working on a new acedb display, named "ZMap" for now, which will be a standalone display acting as a client that contacts database servers for sequence information: in effect, a client/server version of FMap and acedb.

Simon has been building the new display within xace to test it out and it will have a number of new features including the ability to split windows and show different parts of the same sequence.

I have been working on the main scaffold code for the application which will be multithreaded so that users will be able to cancel or "Stop" display of a particular sequence in the way that web browsers do and be able to continue to work on one display while another is being constructed.

The aim is to produce a standalone display incorporating the best parts of FMap but which can be used against any database that supports the protocol that the new display uses (likely to be DAS 2 initially).


Developers Corner

New "chrono" command in tace/giface

(Thanks to Jean Thierry-Mieg who added this new feature mieg@ncbi.nlm.nih.gov)

Jean has been revamping the chrono code and has continued this by adding a "chrono" command to the command line interface:

Chrono {start | stop | show} built in profiler of chrono aware routines

Obviously this only works for routines that have chrono calls built into them, but it's a very useful tool for localised profiling.
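
For readers unfamiliar with this style of profiling, the idea can be sketched in Python. This is not the acedb chrono code: the class, the decorator and the example routine are hypothetical, and it is an assumption of the sketch that "start" clears any earlier figures.

```python
import time
from collections import defaultdict
from functools import wraps


class Chrono:
    """Minimal start/stop/show profiler for routines that opt in."""

    def __init__(self):
        self.running = False
        self.calls = defaultdict(int)
        self.seconds = defaultdict(float)

    def start(self):
        self.calls.clear()      # assumption: starting resets earlier figures
        self.seconds.clear()
        self.running = True

    def stop(self):
        self.running = False

    def show(self):
        """Return (routine, calls, seconds) for each routine timed since start."""
        return [(name, self.calls[name], self.seconds[name]) for name in sorted(self.calls)]

    def aware(self, func):
        """Decorator marking a routine as chrono aware."""
        @wraps(func)
        def wrapper(*args, **kwargs):
            if not self.running:
                return func(*args, **kwargs)
            begin = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                self.calls[func.__name__] += 1
                self.seconds[func.__name__] += time.perf_counter() - begin
        return wrapper


chrono = Chrono()


@chrono.aware
def reverse_complement(dna):
    return dna[::-1].translate(str.maketrans("ACGT", "TGCA"))


chrono.start()
reverse_complement("AAC")
reverse_complement("GGGT")
chrono.stop()
print(chrono.show())
```

Routines without the decorator are simply invisible to the profiler, which is the sense in which only chrono aware routines are measured.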

Change to the way SMap constructs the DNA for a virtual sequence

While constructing some aquila tests for the SMap DNA code I found SMap didn't construct DNA the way I thought it did: it stops at the first DNA it finds while traversing down the hierarchy. This means that potential mismatches may be missed, depending on the way users have organised their data.

I've altered the code to find all SMap'd DNA down a hierarchy. This is a small change to the code and we can reverse it if it produces problems. It seems more natural to try to map all DNA in the hierarchy.

Users will probably not see any difference because they either have non-overlapping DNA or overlaps only at the end of clones.
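
The difference between the two behaviours can be illustrated with a toy hierarchy in Python. This is a sketch under stated assumptions, not the SMap code: the Node type, the 0-based offsets and the function names are hypothetical, and "stops at the first DNA" is modelled as not looking below an object once it has supplied DNA, which is one plausible reading of the description.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    start: int = 0      # 0-based offset within the parent (assumed convention)
    dna: str = ""       # empty when the object has no DNA of its own
    children: list = field(default_factory=list)


def collect(node, offset=0, find_all=True):
    """Return (absolute_start, dna, name) for DNA found at or below node."""
    here = offset + node.start
    found = []
    if node.dna:
        found.append((here, node.dna, node.name))
        if not find_all:
            return found  # earlier behaviour: do not look below the first DNA
    for child in node.children:
        found.extend(collect(child, here, find_all))
    return found


def mismatches(pieces):
    """Lay the pieces onto one coordinate system and report conflicting bases."""
    seen = {}
    conflicts = []
    for start, dna, name in pieces:
        for i, base in enumerate(dna):
            pos = start + i
            if pos in seen and seen[pos][0] != base:
                conflicts.append((pos, seen[pos][1], name))
            seen.setdefault(pos, (base, name))
    return conflicts


# Made-up data: a fragment whose last base disagrees with its parent clone.
fragment = Node("fragment", start=4, dna="ACGA")
clone = Node("clone", start=0, dna="ACGTACGT", children=[fragment])
link = Node("link", children=[clone])

print(mismatches(collect(link, find_all=False)))  # disagreement goes unnoticed
print(mismatches(collect(link, find_all=True)))   # disagreement is reported
```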

Adding new tests to aquila

Adding new tests to aquila. We all know we should do it but.....

Well recently I've added a couple of new tests to our overnight aquila runs to test SMap DNA construction and the new genetic code routines.

Rob (Rob Clack <rnc@sanger.ac.uk>) has been adding to the aquila documentation to describe how to add tests, and it's really much easier now that aquila handles a series of small test script files rather than one huge file.

If you add a new facility to acedb or fix a difficult bug, try to add a new test to Aquila.


July monthly build now available.

You can pick up the monthly builds from:

Sanger users
~acedb/RELEASE.DEVELOPMENT
External users
http://www.acedb.org/Software/Downloads/monthly.shtml


Next User Group Meeting - D319, 3.00pm, Thursday, 14th August 2003



Ed Griffiths <edgrif@sanger.ac.uk>
Last modified: Thu Aug 7 08:06:41 BST 2003