If you would like this newsletter mailed to you, or want to make comments or suggestions about its format or content, send an email to acedb@sanger.ac.uk.
This month sees the introduction of code to support representing mRNA exons and their associated CDS in a single object, rather than in two objects as is currently done in much of the human database. Other new features include interactive control over the reporting of DNA mismatches when displaying large links in fmap. There are also a number of important bug fixes.
This applies particularly to the acedb human chromosome databases.
Previously, where a set of mRNA exons had a known CDS within them, this was represented by two Sequence objects in the database. One object held the positions of the mRNA exons, while the other held a subset of those exons which represented the CDS. This is hard to maintain because both objects must be positioned correctly in their parent sequence object AND their exon coordinates must be kept in step with each other. It is also wasteful of space, since two objects holding largely overlapping sets of data must be stored in the database.
New code has been added to acedb to enable a single sequence object to represent a set of mRNA exons and the CDS within those exons. The following gives a brief example of how this is done.
Parent sequence:
Sequence : "bA404F10"
DNA "bA404F10" 195976
......etc
Subsequence "bA404F10.4" 126576 140451
Subsequence "bA404F10.4.mRNA" 126535 142095
......etc
CDS object:
Sequence : "bA404F10.4"
Source "bA404F10"
Source_Exons 1 55
Source_Exons 8893 9213
Source_Exons 10520 10792
Source_Exons 11003 11122
Source_Exons 12044 12147
Source_Exons 12885 12947
Source_Exons 13805 13876
CDS
......etc
mRNA object:
Sequence : "bA404F10.4.mRNA"
Source "bA404F10"
Source_Exons 1 96
Source_Exons 8934 9254
Source_Exons 10561 10833
Source_Exons 11044 11163
Source_Exons 12085 12188
Source_Exons 12926 12988
Source_Exons 13846 15561
......etc
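Note that the CDS object has the CDS tag set and that its exons are a strict subset of the mRNA exons (the two sets of Source_Exons coordinates look different only because the CDS object starts 41 bases further into the parent, at 126576 rather than 126535). The CDS tag can be followed by start/end coordinates for the CDS, but this is redundant here because the start and end of the exons in the CDS object themselves mark the start and end of the CDS.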
New parent:
Sequence : "bA404F10"
DNA "bA404F10" 195976
......etc
Subsequence "bA404F10.4" 126535 142095
......etc
and single CDS/mRNA object:
Sequence : "bA404F10.4"
Source "bA404F10"
Source_Exons 1 96
Source_Exons 8934 9254
Source_Exons 10561 10833
Source_Exons 11044 11163
Source_Exons 12085 12188
Source_Exons 12926 12988
Source_Exons 13846 15561
CDS 42 1049
......etc
Here the two objects have been merged into one. The exons are the full set of mRNA exons, and the CDS tag shows where the CDS starts and ends within them. Note that the CDS start/end positions are given in the coordinates of the exons when spliced together, not in Source_Exons coordinates. Hence (if you do the maths...ugh) the start position of "42" shows that the CDS starts 42 bases into the 96-base first exon, a little under half way through it, and the end position of "1049" shows that the CDS ends 72 bases into the last exon, which is 1716 bases long: the first six exons contribute 977 bases to the spliced sequence, and 1049 - 977 = 72.
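To check the arithmetic, here is a minimal Python sketch (not acedb code; the function names are made up for illustration) that maps a spliced CDS position back to a Source_Exons coordinate, and then to a position in the parent "bA404F10" using the Subsequence start of 126535. It assumes forward-strand exons listed in ascending order, as in the example.

```python
# Source_Exons of the single CDS/mRNA object "bA404F10.4" shown above
# (1-based, inclusive, relative to the start of the object).
EXONS = [(1, 96), (8934, 9254), (10561, 10833), (11044, 11163),
         (12085, 12188), (12926, 12988), (13846, 15561)]


def spliced_to_source(pos, exons):
    """Map a 1-based position in the spliced exons (as used on the CDS tag)
    to the corresponding Source_Exons coordinate."""
    offset = 0  # spliced bases contributed by the exons already passed
    for start, end in exons:
        length = end - start + 1
        if offset < pos <= offset + length:
            return start + (pos - offset) - 1
        offset += length
    raise ValueError(f"position {pos} is outside the spliced length {offset}")


def source_to_parent(coord, subseq_start):
    """Convert an object-relative coordinate to parent coordinates, using the
    start position from the parent's Subsequence line."""
    return subseq_start + coord - 1


if __name__ == "__main__":
    for pos in (42, 1049):  # the CDS start/end from the example
        src = spliced_to_source(pos, EXONS)
        print(pos, src, source_to_parent(src, 126535))
    # prints:
    # 42 42 126576
    # 1049 13917 140451
```

The parent positions it produces, 126576 and 140451, are exactly the start and end of the old "bA404F10.4" CDS object on the two-object parent's Subsequence line, which makes a handy sanity check when converting.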
How is the new object displayed? The CDS sections of the new object's exons can be given a different colour using the new "CDS_colour" tag in the Method object. In the example below, the non-CDS sections of the exons will be coloured blue while the CDS sections will be red (the default colour is magenta).
Sequence : "bA404F10.4"
Source "bA404F10"
Source_Exons 1 96
Source_Exons 8934 9254
Source_Exons 10561 10833
Source_Exons 11044 11163
Source_Exons 12085 12188
Source_Exons 12926 12988
Source_Exons 13846 15561
CDS 42 1049
......etc
Method "my_CDS"
Method : "my_CDS"
Colour BLUE
CDS_colour RED
......etc
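One way to picture what the display has to do is to cut each exon at the CDS boundaries and colour the pieces. The Python sketch below only illustrates that idea and is not acedb's drawing code; the function name is hypothetical, the colours are taken from the "my_CDS" Method above, and forward-strand exons in ascending order are assumed.

```python
# Source_Exons of "bA404F10.4" from the example above (object-relative).
EXONS = [(1, 96), (8934, 9254), (10561, 10833), (11044, 11163),
         (12085, 12188), (12926, 12988), (13846, 15561)]

# Colours from the "my_CDS" Method above.
COLOURS = {"CDS": "RED", "non-CDS": "BLUE"}


def colour_segments(exons, cds_start, cds_end):
    """Split each exon into "CDS" and "non-CDS" pieces. cds_start/cds_end
    are in spliced coordinates, as on the CDS tag; the pieces are returned
    as (start, end, kind) in Source_Exons coordinates."""
    segments = []
    offset = 0  # spliced bases in the exons already processed
    for start, end in exons:
        length = end - start + 1
        # Overlap of this exon with the CDS, in spliced coordinates.
        lo = max(cds_start, offset + 1)
        hi = min(cds_end, offset + length)
        if lo > hi:
            segments.append((start, end, "non-CDS"))  # no CDS in this exon
        else:
            cds_from = start + (lo - offset) - 1
            cds_to = start + (hi - offset) - 1
            if cds_from > start:
                segments.append((start, cds_from - 1, "non-CDS"))
            segments.append((cds_from, cds_to, "CDS"))
            if cds_to < end:
                segments.append((cds_to + 1, end, "non-CDS"))
        offset += length
    return segments


if __name__ == "__main__":
    for start, end, kind in colour_segments(EXONS, 42, 1049):
        print(start, end, COLOURS[kind])
```

For the example object this gives a blue piece 1-41 at the start, red pieces covering 42-96, the five internal exons and 13846-13917, and a blue piece 13918-15561 at the end.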
What if the CDS extends beyond the set of exons in the object? There are two tags in the existing models commonly used at the Sanger Centre that can be used to deal with this situation:
// #Sequence# (From models.wrm for 22ace)
?Sequence DNA UNIQUE ?DNA UNIQUE Int // Int is the length
......etc
Properties Pseudogene Text
......etc
End_not_found
Start_not_found Int
These tags have the following meaning:
"Start_not_found UNIQUE Int"
How do I go from two objects down to one? Well, the first point to note is that the new code runs perfectly well with databases that contain the "two object" representation of the mRNA/CDS exons, so objects can be merged gradually as required. It is not possible (and almost certainly not desirable) for the code to do this automatically: the existing two objects are linked only by "similar" names and a common parent sequence, which is often shared with many other objects that also contain exons. Conversion will require a specially written script to extract the two objects from the database, merge them and parse the result back into the database.
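The core of such a script might look something like the following Python sketch. It is hypothetical: it works on coordinates already extracted from the database (extraction and re-parsing are not shown), assumes forward-strand exons in ascending order, and only writes the tags used in this article. Other tags, changing the parent's Subsequence line to the mRNA extent and deleting the old ".mRNA" object are left to the real script.

```python
# Data from the two-object example above: Subsequence start positions in the
# parent "bA404F10" and each object's Source_Exons.
MRNA_START = 126535
MRNA_EXONS = [(1, 96), (8934, 9254), (10561, 10833), (11044, 11163),
              (12085, 12188), (12926, 12988), (13846, 15561)]
CDS_START = 126576
CDS_EXONS = [(1, 55), (8893, 9213), (10520, 10792), (11003, 11122),
             (12044, 12147), (12885, 12947), (13805, 13876)]


def source_to_spliced(coord, exons):
    """Map a Source_Exons coordinate to a 1-based spliced position."""
    offset = 0
    for start, end in exons:
        if start <= coord <= end:
            return offset + coord - start + 1
        offset += end - start + 1
    raise ValueError(f"coordinate {coord} is not inside any exon")


def merge_cds_into_mrna(mrna_start, mrna_exons, cds_start, cds_exons):
    """Return (start, end) for the CDS tag of the merged object, after
    checking that the CDS exons really sit inside the mRNA exons."""
    if not cds_exons:
        raise ValueError("no CDS exons")
    shift = cds_start - mrna_start  # move CDS exons into mRNA coordinates
    shifted = [(s + shift, e + shift) for s, e in cds_exons]
    hits = []
    for s, e in shifted:
        inside = [k for k, (ms, me) in enumerate(mrna_exons) if ms <= s and e <= me]
        if not inside:
            raise ValueError(f"CDS exon {s}-{e} is not inside an mRNA exon")
        hits.append(inside[0])
    if hits != list(range(hits[0], hits[0] + len(hits))):
        raise ValueError("CDS exons do not map onto consecutive mRNA exons")
    for i, ((s, e), k) in enumerate(zip(shifted, hits)):
        ms, me = mrna_exons[k]
        # Only the first CDS start and the last CDS end may fall mid-exon.
        if (i > 0 and s != ms) or (i < len(shifted) - 1 and e != me):
            raise ValueError(f"CDS exon {s}-{e} does not match mRNA exon {ms}-{me}")
    return (source_to_spliced(shifted[0][0], mrna_exons),
            source_to_spliced(shifted[-1][1], mrna_exons))


def ace_text(name, source, exons, cds):
    """Render the tags shown in this article for the merged object."""
    lines = [f'Sequence : "{name}"', f'Source "{source}"']
    lines += [f"Source_Exons {s} {e}" for s, e in exons]
    lines.append(f"CDS {cds[0]} {cds[1]}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    cds = merge_cds_into_mrna(MRNA_START, MRNA_EXONS, CDS_START, CDS_EXONS)
    print(ace_text("bA404F10.4", "bA404F10", MRNA_EXONS, cds))
```

Run on the example data, it recovers the "CDS 42 1049" line of the single-object version. The consistency checks are there because, as noted above, nothing in the database guarantees that two "similar" names really belong together.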
How can I control which sections of the single object are operated on by the various protein translation options in fmap? The fmap menu for exons now includes options to translate either the CDS section or the entire set of exons, and to display or export the result.
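To illustrate the difference between the two options, here is a small Python sketch (not the fmap code; all names are hypothetical) that splices a set of exons and translates either the CDS section or everything. It assumes forward-strand exons and the standard genetic code, and uses a made-up toy sequence rather than real data.

```python
BASES = "TCAG"
# Standard genetic code; codons are ordered TTT, TTC, TTA, TTG, TCT, ..., GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def splice(dna, exons):
    """Concatenate the exons (1-based, inclusive coordinates into dna)."""
    return "".join(dna[start - 1:end] for start, end in exons)


def translate(seq):
    """Translate in frame 1, dropping any incomplete final codon. Codons
    containing anything other than A, C, G or T become 'X'."""
    seq = seq.upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if all(base in BASES for base in codon):
            a, b, c = (BASES.index(base) for base in codon)
            protein.append(AMINO_ACIDS[16 * a + 4 * b + c])
        else:
            protein.append("X")
    return "".join(protein)


def translate_exons(dna, exons, cds=None):
    """Translate the CDS section (spliced start/end, as on the CDS tag) if
    cds is given, otherwise the entire set of spliced exons."""
    spliced = splice(dna, exons)
    if cds is not None:
        spliced = spliced[cds[0] - 1:cds[1]]
    return translate(spliced)


if __name__ == "__main__":
    # Made-up toy sequence: exon 1 = positions 1-7, intron 8-12, exon 2 = 13-21.
    dna = "ccatgaa" + "gtaag" + "agggtaacc"
    exons = [(1, 7), (13, 21)]
    print(translate_exons(dna, exons, cds=(3, 14)))  # CDS only: MKG*
    print(translate_exons(dna, exons))               # all exons: P*KGN
```

Translating the whole spliced sequence from its first base reads the untranslated region in the wrong frame, which is why the CDS-only option is usually the one you want once the CDS tag is set.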
The acedb socket server log has been renamed from database/server.log to database/serverlog.wrm, to be consistent with other acedb log/configuration files.
The records in the server log are now output in the same format as the log.wrm records, which brings the following improvements:
Originally xace would report every single mismatch between every pair of DNA objects it attempted to align. This was so irritating that the code was changed to report errors only once per pair of objects aligned. Sadly, this is still exceptionally irritating for those who are trying to construct large links from existing sequences, because the number of pairs of DNA objects to be aligned can be very large. This is made worse by the fact that the user, when first making a link, already knows the DNA is incorrectly aligned.
You can now turn the reporting of DNA mismatches on or off interactively by selecting the "Report/Don't Report DNA Mismatches" item from the main fmap menu. Once disabled, reporting stays disabled each time that fmap is reused.
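The reporting policy described above (at most one report per pair of objects, plus a per-fmap on/off switch) can be modelled in a few lines. This Python sketch is a hypothetical toy, not xace code; the class, method and object names are made up for illustration.

```python
class MismatchReporter:
    """Toy model of one fmap's mismatch reporting: each pair of DNA objects
    is reported at most once, and reporting can be switched off."""

    def __init__(self):
        self.enabled = True
        self._reported = set()  # unordered pairs already reported

    def toggle(self):
        """Stand-in for the "Report/Don't Report DNA Mismatches" menu item."""
        self.enabled = not self.enabled

    def mismatch(self, obj_a, obj_b, position):
        """Return a message for the first mismatch seen for this pair while
        reporting is enabled, otherwise None."""
        if not self.enabled:
            return None
        pair = frozenset((obj_a, obj_b))
        if pair in self._reported:
            return None
        self._reported.add(pair)
        return f"DNA mismatch between {obj_a} and {obj_b} at position {position}"


if __name__ == "__main__":
    fmap = MismatchReporter()  # the object lives as long as the fmap does
    print(fmap.mismatch("cloneA", "cloneB", 120))  # reported
    print(fmap.mismatch("cloneB", "cloneA", 450))  # same pair: None
    fmap.toggle()
    print(fmap.mismatch("cloneA", "cloneC", 30))   # reporting off: None
```

Keeping the switch on the per-fmap object is what makes the setting persist when that fmap is reused.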
Several problems have been fixed in the Tree Display menu. Some old options that had been removed, e.g. "Preserve", have been put back because users preferred them to the new ones. A bug where the "Show As Text" option disappeared from the menu has been fixed. The menu should now work as it always has, but with some extra options. If you still have problems with the menu while using the latest monthly build, please mail acedb@sanger.ac.uk.
The operating system sometimes needs to interrupt the execution of a program, perhaps because of a serious error such as the program trying to access another program's memory space. It does this by interrupting the program's execution with a "signal". The signal can be one of a number of types, such as "SIGSEGV", which means the program tried to access memory it is not allowed to (e.g. another program's memory), or "SIGFPE", which means the program tried to perform an illegal arithmetic operation such as dividing by zero. The program is allowed to "catch" these signals and try to decide what to do about them. AceDB catches signals so that it can clear up its read/write locks before exiting.
One of these signals is reserved specifically for interrupting a program and producing a snapshot of what the program was doing when interrupted: the familiar "core" file. The signal for doing this is called SIGABRT. The acedb code was erroneously catching this signal, which meant that the core file was not produced correctly or, in some cases, not produced at all.
This bug has been fixed and the following now applies for signal handling:
With serious, reproducible bugs it is sometimes useful for AceDB not to catch any signals, so that a core file is produced when the error occurs. Signal handling can now be turned off in one of two ways:
tace -nosigcatch /your/database
By default, programs run as they always have, with signal catching turned on, and this is how you should normally run the code. If you turn signal catching off and acedb crashes after you have been writing to the database, the database will not be cleaned up and may become corrupted. This facility is intended for debugging difficult errors, not as a standard way to run acedb.
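A rough Python sketch of the policy described above: catch signals to clean up locks, then let the default action happen; leave SIGABRT alone so core files still appear; and skip catching entirely when a "-nosigcatch" style flag is given. acedb's own C code catches signals such as SIGSEGV, but the Python documentation notes that catching synchronous errors like SIGSEGV or SIGFPE from Python makes little sense, so this sketch only handles SIGINT and SIGTERM. The lock file name and function names are hypothetical.

```python
import os
import signal
import sys

# Hypothetical lock file standing in for acedb's read/write locks.
LOCK_FILE = "example_database.lock"

# Signals this sketch catches for cleanup. SIGABRT is deliberately absent so
# that abort() still produces a core file. SIGSEGV and SIGFPE are also absent
# because a Python-level handler cannot usefully deal with them.
CAUGHT = (signal.SIGINT, signal.SIGTERM)


def release_locks():
    """Remove the (hypothetical) lock file if it exists."""
    try:
        os.remove(LOCK_FILE)
    except FileNotFoundError:
        pass


def cleanup_handler(signum, frame):
    release_locks()
    # Restore the default action and re-send the signal, so the process
    # still terminates the way it would have without the handler.
    signal.signal(signum, signal.SIG_DFL)
    os.kill(os.getpid(), signum)


def install_handlers(catch=True):
    """Install cleanup handlers unless signal catching has been turned off."""
    if catch:
        for sig in CAUGHT:
            signal.signal(sig, cleanup_handler)


if __name__ == "__main__":
    # Mirrors "tace -nosigcatch /your/database": the flag turns catching off.
    install_handlers(catch="-nosigcatch" not in sys.argv[1:])
```

Re-sending the signal after cleanup, rather than simply exiting, preserves whatever the default action would have done, which is the same reason for not intercepting SIGABRT at all.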
An annoying bug whereby xace would sometimes "freeze" when an attempt was made to print has now been fixed (DDTS bugs: SANgc10014 & SANgc10359).
A bug in the meaning of "hidden column" in tablemaker meant that it was mapped onto the "hidden" state of the table display system, which caused rows that differed only in hidden columns to appear multiple times. In tablemaker, "hidden" does not mean "don't show me this column" but "this is an intermediate working column which doesn't appear in the result table". The code has been changed to reflect this: columns marked as hidden are no longer included in the output table at all.
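The difference between the two meanings can be sketched in Python. This is an illustration, not tablemaker code: the function name and sample rows are hypothetical, and it assumes that rows which become identical once the working columns are removed should appear only once in the result.

```python
def result_table(rows, hidden):
    """Drop the hidden (working) columns from each row, then collapse rows
    that have become identical, keeping first-seen order."""
    seen = set()
    result = []
    for row in rows:
        visible = tuple(v for i, v in enumerate(row) if i not in hidden)
        if visible not in seen:
            seen.add(visible)
            result.append(visible)
    return result


if __name__ == "__main__":
    # Column 1 is a working column used only to reach column 2.
    rows = [("gene1", "clone1", "chr22"),
            ("gene1", "clone2", "chr22"),
            ("gene2", "clone3", "chr22")]
    print(result_table(rows, hidden={1}))
```

Merely hiding column 1 in the display would still show two "gene1 / chr22" rows; removing it before building the table leaves one.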
Two fixes for url handling by acedb:
There was a bug in the dumping code for perl-style and other formats which caused acedb to crash or give strange output if the text being dumped contained a "%". This is now fixed.
If you wish to make suggestions about any of these plans, please mail them to acedb@sanger.ac.uk.
Coming in 4_9...
From AceDB 4_9 (due out any time now...honest...), AceDB will support viewing gapped alignments in Blixem. This combines work on Blixem with the new "Smap" code, which will support a much more sophisticated way of constructing "virtual" sequences from clone, gene etc. data than the current fmap supports.
Coming soon...
Work is currently under way to output Ace data in XML format. As well as the data, AceDB will output XML Schemas that describe the data, enabling it to be validated using existing XML parsers that support XML Schema.
You can pick up the monthly builds from:
!*! Please note changed venue !*!