ACEDB User Group Newsletter - May 2001

If you want to have this newsletter mailed to you or you want to make comments/suggestions about the format/content then send an email to acedb@sanger.ac.uk.

There are still places available on the AceDB course in June. The command line interface has been improved (see the items on status and lastmodified). The usage of the CDS tag in Sequence objects has been tightened up, but the restrictions on translating Sequence objects have correspondingly been relaxed. If you use tags whose names consist only of digits, you should read the article on non-numeric tags. There are also a number of bug fixes.

General News

We are still struggling to build AceDB 4_9 on our Solaris system for a variety of excruciatingly boring reasons. We are now close to finishing the build, so Solaris users please hang on. The problems genuinely lie with the system rather than with AceDB. This means that this month's build does not at the moment include Solaris binaries.

Education

There are still places available on the 1-day AceDB introductory course on 19th June and the 3-day advanced AceDB course on 20th-22nd June at the Dept. of Genetics, Cambridge University, Cambridge.

For further details please visit:

http://www.hgmp.mrc.ac.uk/About/Courses/2001/comp.acedb.course.html

New Features

Last Modified command line interface

To help systems that interface to AceDB via the server or via tace, a last modified command has been introduced. The command returns the time at which the database was last changed by a command. This is currently a bit crude, but in a "fail-safe" way: the time may be updated for some commands that did not in fact modify anything.

The intention of this command is to allow systems such as Lincoln Stein's WormBase servers to cache features they have requested from the server for faster performance. They can use the last modified command to check that the database has not been modified since the features were cached.

Command description:

lastmodified : Returns time database was last modified as a string in AceDB time format.

Example:

acedb> lastmodified
2001-05-29_21:30:54
// 0 Active Objects
acedb>
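
A client-side cache built on this command might look like the following Python sketch. The send_command callable is hypothetical: it stands in for whatever mechanism a client uses to pass a command to the server or tace and get the reply text back. Only the lastmodified command and its time format are taken from the description above.

```python
from datetime import datetime

ACEDB_TIME_FORMAT = "%Y-%m-%d_%H:%M:%S"  # e.g. 2001-05-29_21:30:54


def parse_lastmodified(reply):
    """Extract the timestamp from the text returned by 'lastmodified'.

    Blank lines and lines starting with '//' (such as
    '// 0 Active Objects') are skipped.
    """
    for line in reply.splitlines():
        line = line.strip()
        if not line or line.startswith("//"):
            continue
        return datetime.strptime(line, ACEDB_TIME_FORMAT)
    raise ValueError("no timestamp found in reply")


class CachedClient:
    """Caches replies and discards them when the database has changed."""

    def __init__(self, send_command):
        # Hypothetical transport: takes command text, returns reply text.
        self.send_command = send_command
        self.cache = {}
        self.stamp = None

    def query(self, command):
        current = parse_lastmodified(self.send_command("lastmodified"))
        if current != self.stamp:
            # The database may have changed, so nothing cached is trusted.
            self.cache.clear()
            self.stamp = current
        if command not in self.cache:
            self.cache[command] = self.send_command(command)
        return self.cache[command]
```

Because the time is updated in a "fail-safe" way, a cache like this may occasionally be emptied when nothing has really changed, but it should not serve stale data.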

New format 'status' display

It's possible to look at AceDB program status either from graphical xace (via the "Admin" -> "Program Status" menu item from the main window) or from command line programs such as tace and giface using the "status" command.

The format of the status display has been changed to make it more readable, and from the command line it is possible to ask for one or more subsets of the status report statistics.

Command description:

Status {on | off} : toggle memory statistics
       {-code -database -disk -lexiques -hash -index
        -cache -cache1 -cache2 -array -memory -all} : select stats to print

Example:

acedb> status -code -database
 // ************************************************
 // AceDB status at 2001-05-29_23:44:41
 // 
 // - Code
 //             Program: tace
 //             Version: ACEDB 4_9a
 //               Build: May 29 2001 23:42:39
 // - Database
 //               Title: 
 //                Name: 
 //             Release: 4_0
 //           Directory: /home/edgrif/acedb/databases/bA404F10_db
 //             Session: 9
 //                User: edgrif
 //        Write Access: No
 //      Global Address: 351
 // 
 // ************************************************

// 0 Active Objects
acedb>
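
For scripts that read this report, the new layout can be split into sections and fields quite simply. The Python sketch below assumes the layout is exactly as in the example above ("// - Section" lines followed by "// Field: value" lines); the function name parse_status is made up for illustration and is not part of AceDB.

```python
def parse_status(report):
    """Turn 'status' output into a dict of {section: {field: value}}."""
    sections = {}
    current = None
    for raw in report.splitlines():
        line = raw.strip()
        if not line.startswith("//"):
            continue
        text = line[2:].strip()
        # Skip empty comment lines, the rows of asterisks and the title line.
        if not text or text.startswith("*") or text.startswith("AceDB status at"):
            continue
        if text.startswith("- "):
            # A new section such as "- Code" or "- Database".
            current = sections.setdefault(text[2:].strip(), {})
        elif current is not None and ":" in text:
            # Split on the first colon only, so values such as build
            # times keep their own colons.
            field, _, value = text.partition(":")
            current[field.strip()] = value.strip()
    return sections
```

Given the example report, parse_status(report)["Code"]["Program"] would be "tace".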

Translation of non-CDS Sequence objects now possible.

In previous versions of AceDB it was only possible to translate Sequence objects that were tagged as CDS objects. Although this seems logical, it is not very helpful, as users often want to translate other Sequence objects just to see what sort of protein might be produced. This led some database administrators to tag most Sequence objects in their databases as CDS objects just so they could do protein translations, which is not desirable. The code has been altered so that it is possible to translate all Sequence objects which are positioned on a piece of DNA. There should no longer be any need to add the CDS tag just to see a protein translation.

If you have any problems with this, please send mail to acedb@sanger.ac.uk.

Articles

Non-numeric tags

Naming of tags in AceDB is reasonably flexible, but a recent bug report has forced a rethink. The report concerned a query involving a tag whose name consisted only of digits (i.e. 0 - 9). The query failed because allowing numeric tags makes the query language inherently ambiguous: a tag whose name consists only of digits cannot be distinguished from a position following a tag. For example, the code cannot tell the user saying "find me the following tag whose name is '3'" from "find me the 3rd value over from where we are now".

It would be possible to alter the query language in some incompatible way, but it is more desirable simply not to use numeric-only tag names. The code has been modified to issue a warning every time it encounters a numeric-only tag name in a wspec/models.wrm file.

You should note that support for tags with numeric-only names is likely to be withdrawn in the near future. Tag names will probably be restricted to the regular expression "[a-zA-Z][a-zA-Z0-9_]*", i.e. tags must begin with a lower or upper case letter and must consist only of letters, digits and the underscore character.
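
If you want to check your own tag names against the proposed restriction ahead of time, the regular expression can be applied directly. The Python sketch below is only an illustration: the function names are made up, and pulling the tag names out of a models.wrm file is left to the reader.

```python
import re

# The proposed restriction on tag names, as quoted above.
TAG_NAME = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")
NUMERIC_ONLY = re.compile(r"[0-9]+")


def is_valid_tag_name(name):
    """True if the whole name matches the proposed pattern."""
    return TAG_NAME.fullmatch(name) is not None


def is_numeric_only(name):
    """True for the names that currently trigger the warning."""
    return NUMERIC_ONLY.fullmatch(name) is not None


for name in ["Source_Exons", "CDS", "3", "3prime", "_hidden"]:
    print(name, is_valid_tag_name(name), is_numeric_only(name))
```

Note that the proposed pattern is stricter than the current warning: a name such as "3prime" is not numeric-only, but it would still be rejected because it does not begin with a letter.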

Yet more on CDS tags

Following an AceDB user group meeting this month, it was decided to tighten up the use of tags for defining the position of a CDS in a Sequence object. The relevant parts of the Sequence class that can be used to define the position of a CDS within a Sequence object are as follows:

?Sequence.....
	  Structure.....
				Source_Exons Int UNIQUE Int
	  Properties.....
			Coding	CDS UNIQUE Int UNIQUE Int // start, end in spliced DNA coords,
                                                          // default:  1, length_of_Source_Exons
			End_not_found
			Start_not_found UNIQUE  Int // Gives position of start frame for protein
                                                    // translation when start of CDS is before first
                                                    // exon in this object (should be in range 1 -> 3).

Description of the tags:

Source_Exons: Define the transcription unit, only part of which may be the CDS.
CDS UNIQUE Int UNIQUE Int: Defines which section of the transcription unit is the CDS. The Ints specify the start/end of the CDS in spliced DNA coordinates, i.e. the start of the transcription unit is "1" and the end is "(sum of all exon lengths)". If the first Int is not specified it defaults to the beginning of the transcription unit; if the second Int is not specified it defaults to the end of the transcription unit.
Start_not_found UNIQUE Int: For a CDS object this tag specifies that the CDS is incomplete because there are further exon(s) upstream. In this case the reading frame for translation of the CDS may be incorrect, and the Int can be used to alter the reading frame. The Int must have the value 1, 2 or 3; it defaults to 1.
End_not_found: For a CDS object this tag specifies that the CDS is incomplete because there are further exon(s) downstream. This implies that translation of the CDS may end before the end of the CDS depending on the alignment of the reading frame and the end of the CDS.

The rules for specifying the tags are:

CDS UNIQUE Int UNIQUE Int: If the Ints do not lie within the extent of the spliced DNA of the Source_Exons, they will be ignored, the code will issue an error message, and the only translation allowed will be of the entire transcription unit.
Start_not_found UNIQUE Int: If this tag is set, then the CDS tag must specify that the CDS starts at the beginning of the transcription unit. Otherwise the code will issue an error message and the only translation allowed will be of the entire transcription unit.
End_not_found: When this tag is set the CDS tag should specify that the CDS ends at the end of the transcription unit. This condition is not enforced at the moment.
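
The defaults and rules above can be summarised in a short Python sketch. This is not the AceDB code, just one reading of the description. It assumes each Source_Exons pair is an inclusive (start, end) pair, so that an exon's length is end - start + 1. The function name and arguments are made up for illustration, and End_not_found is left out because its rule is not enforced.

```python
def resolve_cds(exons, cds=(None, None), start_not_found=None):
    """Return (start, end, frame, problems) in spliced DNA coordinates.

    exons           -- list of (start, end) pairs from Source_Exons
    cds             -- the two Ints after the CDS tag, None where omitted
    start_not_found -- None if the tag is absent, otherwise its Int
                       (pass 1 when the tag is set without an Int)
    """
    # Assumption: exon coordinates are inclusive.
    length = sum(end - start + 1 for start, end in exons)
    start = 1 if cds[0] is None else cds[0]
    end = length if cds[1] is None else cds[1]
    frame = 1
    problems = []

    if not (1 <= start <= end <= length):
        problems.append("CDS coordinates lie outside the spliced Source_Exons")

    if start_not_found is not None:
        if start_not_found not in (1, 2, 3):
            # Assumption: the text does not say how a bad value is handled.
            problems.append("Start_not_found must be 1, 2 or 3")
        elif start != 1:
            problems.append("Start_not_found is set but the CDS does not "
                            "start at the beginning of the transcription unit")
        else:
            frame = start_not_found

    if problems:
        # Fall back to translating the entire transcription unit.
        return 1, length, 1, problems
    return start, end, frame, problems


# Two exons of 100 bases each give a transcription unit of length 200.
print(resolve_cds([(1, 100), (201, 300)], cds=(10, 150)))
```

In this sketch any broken rule leads to the same outcome as described above: a message is recorded and only the entire transcription unit can be translated.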

These rules are intended to reflect the biology of translation. If you feel that this system does not, then please email acedb@sanger.ac.uk.

Bugs Fixed

DNA command misnamed files

A bug was introduced in the "dna" command (part of the tace/giface interface) which led to files being called "-f" if the user specified the file as "-f your_filename". This is now fixed, and the file can be specified in either of these forms:

"dna your_filename"
"dna -f your_filename"

Asking for coordinates off the end of clone crashed giface.

Using "seqfeatures" with "-coords" that are a long way off the start/end of the Sequence object caused giface or sgifaceserver to crash. This bug has probably not emerged before because the code coped with coordinates that were only a little off the start/end of the Sequence because it adjusted the coordinates to lie within the Sequence. Coordinates that were a long way off resulted in arrays of negative size being created which unsurprisingly crashed the code.

The bug is now fixed: the "seqfeatures" request is now terminated with an error reporting that the coordinates are outside of the Sequence object.

Underlining of features for printing.

In AceDB 4_7 it was possible to set the colour of certain features in fmap/gmap to "black" so that when printed they came out as underlined, which looks much better for printed copy. This feature was broken in AceDB 4_8 and has now been fixed in AceDB 4_9.

Build of "Static" executables failed.

In some circumstances where users built dynamically linked executables and then statically linked ones, the static build would fail because the build must be done in a certain order. This is now fixed and you should use the following make targets to do the builds:

dynamic builds: "make all"

static builds: "make all_static"

The all_static target makes sure that the build directory is cleaned up correctly before starting the build.

Some AQL queries could cause AceDB to crash.

Malformed AQL queries of the form

  select s from sequence

instead of the fuller

  select s from s in class sequence

used to crash AceDB. This is now fixed, and an error message is issued instead.

May monthly build now available.

You can pick up the monthly builds from:

Sanger users: ~acedb/RELEASE.DEVELOPMENT
External users: http://www.acedb.org/Software/Downloads/monthly.shtml

Next User Group Meeting - D319, 2.30pm, Thursday, 14th June 2001

Ed Griffiths <edgrif@sanger.ac.uk>

Last modified: Thu May 31 15:01:15 BST 2001