ACEDB User Group Newsletter - August 2000

If you want to have this newsletter mailed to you or you want to make comments/suggestions about the format/content then send an email to acedb@sanger.ac.uk.

This month sees some changes to output from the (p)parse commands to give more accurate and informative results about the parsing of acedb data. There is also a small but important change to acedb logging data to allow easier analysis of database activity. And there's a tip about what to do when acedb runs out of memory.

General News

Acedb and running out of memory

Most people who have used acedb for some time will have seen a message like this:


"Memory allocation failure, when requesting 100000 bytes, 20034340 already allocated"

If you are using xace, all you can do is click the "OK" button, after which xace terminates.

Acedb can run out of memory for a number of reasons:

If you are using acedb and memory allocation fails when trying to allocate a fairly small amount of memory, e.g. perhaps 5MB (about 5,000,000 bytes), then there is a good chance that it's because of the last reason mentioned in the list above. In this case you will probably be able to do something about it straight away.

To check this you should do the following (assuming you use C shell which is what most people at Sanger use):

1) Add up the total number of bytes that acedb needed when it crashed:


"Memory allocation failure, when requesting 5242880 bytes, 125829120 already allocated"

In this case a total of 125829120 + 5242880 = 131072000 bytes (or about 125 MB)

2) Find out what your current limit is:


griffin[edgrif]65: limit datasize
datasize        131072 kbytes

In this case the limit is 128MB. Although this is more than the 125MB acedb needs, acedb makes use of other libraries which use up memory, so this is close enough to suspect that the datasize limit is the problem.

3) Find out if you can increase your current limit:


griffin[edgrif]66: limit -h datasize
datasize        1048576 kbytes

Here the upper limit is set to 1024MB so there is plenty of scope to increase the datasize limit.

4) Increase the limit and rerun acedb:


griffin[edgrif]66: limit datasize 524288

This sets the datasize limit to 512MB; the limit can be increased further as required.
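The arithmetic in steps 1 and 2 can also be scripted. The following is a minimal Python sketch, not part of acedb: the function names and the 90% margin (allowing for memory used by other libraries) are assumptions made for illustration, and the limit figure has to be copied by hand from the output of "limit datasize".

```python
import re

# Matches the failure message quoted in this newsletter.
_FAILURE = re.compile(
    r"Memory allocation failure, when requesting (\d+) bytes, "
    r"(\d+) already allocated"
)


def bytes_needed(message):
    """Return requested + already allocated bytes from the failure message."""
    match = _FAILURE.search(message)
    if match is None:
        raise ValueError("not a memory allocation failure message")
    requested, allocated = (int(group) for group in match.groups())
    return requested + allocated


def looks_like_datasize_limit(message, limit_kbytes, margin=0.9):
    """Guess whether the datasize limit is the likely cause.

    limit_kbytes is the figure printed by `limit datasize`. The margin
    is a hypothetical allowance for memory used by other libraries.
    """
    return bytes_needed(message) >= margin * limit_kbytes * 1024


msg = ('"Memory allocation failure, when requesting 5242880 bytes, '
       '125829120 already allocated"')
print(bytes_needed(msg))                       # 131072000
print(bytes_needed(msg) / (1024 * 1024))       # 125.0
print(looks_like_datasize_limit(msg, 131072))  # True
```

With the hard limit from step 3 (1048576 kbytes) the same check returns False, which matches the conclusion that there is plenty of room to raise the limit.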

New Features

Change to output from (p)parse command

The output from the parse and pparse commands has been changed to give more complete information about the ace data parsed. Previously no reporting was done for array objects parsed (e.g. raw DNA data) or for attempts to delete objects. In addition not all errors were trapped or reported.

The new output has these main changes:

Originally the commands produced:


acedb> pparse some_file.ace
// Parsing file  some_file.ace
// Parse error near line 5 in eds_grid1 : Unknown tag "FunnyLine1"
// Parse error near line 57 in eds_grid1.2 : Unknown tag "FunnyLine2"
// Parse error near line 163 in eds_grid1.9 : Unknown tag "FunnyLine3"
// 18 objects read with 3 errors
// 15 Active Objects
acedb> 

The commands now produce:


acedb> pparse some_file.ace
// Parsing file some_file.ace
// ERROR: parse error (object) near line 5 in "grid : "eds_grid1"", error was: Unknown tag "FunnyLine1"
// ERROR: parse error (object) near line 57 in "grid : "eds_grid1.2"", error was: Unknown tag "FunnyLine2"
// ERROR: parse error (object) near line 163 in "grid : "eds_grid1.9"", error was: Unknown tag "FunnyLine3"
// objects processed: 18 found, 15 parsed ok, 3 parse failed
// 15 Active Objects
acedb> 

Individual errors are now reported like this:


// ERROR: parse error (object) near line 5 in "grid : "eds_grid1"", error was: Unknown tag "FunnyLine1"

so you get to know the type of thing being parsed ("general", "object" or "array", where general means "can't ascertain the type"). You see the first line of the object/paragraph which should contain the type and name of the object. You also get to see the original error message which describes what was wrong with the object.
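If you post-process parse output with scripts, the new error lines are regular enough to pick apart. This is a rough Python sketch based only on the example line above; the function name and dictionary keys are made up for illustration, and it is an assumption that "general" and "array" errors follow exactly the same layout as the "object" example.

```python
import re

# Pattern derived from the single example error line in this newsletter.
_ERROR = re.compile(
    r'^// ERROR: parse error \((general|object|array)\) near line (\d+) '
    r'in "(.*)", error was: (.*)$'
)


def parse_error_line(line):
    """Return the fields of one error line, or None if it does not match."""
    match = _ERROR.match(line.strip())
    if match is None:
        return None
    kind, line_no, header, message = match.groups()
    return {
        "kind": kind,          # "general", "object" or "array"
        "line": int(line_no),  # line number in the .ace file
        "header": header,      # first line of the object/paragraph
        "message": message,    # the original error message
    }


example = ('// ERROR: parse error (object) near line 5 in '
           '"grid : "eds_grid1"", error was: Unknown tag "FunnyLine1"')
print(parse_error_line(example))
```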

At the end of parsing you now see a summary:


// objects processed: 18 found, 15 parsed ok, 3 parse failed

where it should be true that: found = ok + failed.
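The found = ok + failed relation is easy to check automatically. The sketch below is in Python and is only an illustration: the function name is hypothetical and the pattern is taken from the two summary lines shown in this newsletter ("objects processed" and the "-v" form "total processed").

```python
import re

# Accepts both the normal and the "-v" form of the summary line.
_SUMMARY = re.compile(
    r"^// (?:objects|total) processed: (\d+) found, "
    r"(\d+) parsed ok, (\d+) parse failed$"
)


def check_summary(line):
    """Return (found, ok, failed, consistent) for a summary line."""
    match = _SUMMARY.match(line.strip())
    if match is None:
        raise ValueError("not a parse summary line")
    found, ok, failed = (int(group) for group in match.groups())
    return found, ok, failed, found == ok + failed


print(check_summary(
    "// objects processed: 18 found, 15 parsed ok, 3 parse failed"
))  # (18, 15, 3, True)
```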

If you specify "-v" for parse or pparse you see more detail:


// total processed: 18 found, 15 parsed ok, 3 parse failed
//         general: 0 errors
//         objects: 15 added, 0 editted, 0 deleted, 0 renamed, 0 aliased, 3 errors
//          arrays: 0 added, 0 empty, 0 deleted, 0 errors

So general errors (e.g. no class name specified for a delete operation), ordinary object operations and array object operations are each reported separately.

Change to log.wrm format

log.wrm is the file where acedb programs write information about who is using the database, what they are using the database for and also information about any serious errors.

Currently the first log record for a user starting a new session on a database looks like this:

New start User:edgrif,  ACEDB 4_8b,  compiled on: Oct  5 1999 14:43:15

whereas the standard record format for all other log messages is:

2000-04-03_17:21:09 griffin 20611	***some action***

This is not ideal for several reasons:

To address these issues the first record for a new user session has been changed so that:

All log records now have the standard prefix:


     "date_time network_id PID"

e.g. "2000-04-03_17:21:09 griffin 20611"

The new user session record format is:


     "date_time network_id PID  NEW_START: User:userid, Program:progname, Level:ace_level, compiled on: date time"

e.g. "2000-07-20_14:20:37 griffin 19877    NEW_START: User:edgrif, Program:tace, Level:ACEDB 4_8c, compiled on: Jul 20 2000 14:14:21"

This means that all log records now have the same prefix and that users can be tied to their log records via the PID and network_id. The new record also captures the time of their first use of the database, and which program and which machine they were using.
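To illustrate how the common prefix makes analysis easier, here is a small Python sketch that splits a log record into its fields and ties sessions to users. It is not an acedb tool: the function names and dictionary keys are invented for this example, and the patterns are based only on the two record formats quoted above.

```python
import re
from datetime import datetime

# Standard prefix: "date_time network_id PID", then the message text.
_PREFIX = re.compile(
    r"^(\d{4}-\d{2}-\d{2}_\d{2}:\d{2}:\d{2}) (\S+) (\d+)\s+(.*)$"
)
# New user session record, following the format quoted above.
_NEW_START = re.compile(
    r"^NEW_START: User:([^,]+), Program:([^,]+), Level:([^,]+), "
    r"compiled on: (.*)$"
)


def parse_log_record(line):
    """Split one log.wrm line into fields, or return None if no match."""
    match = _PREFIX.match(line.rstrip("\n"))
    if match is None:
        return None
    stamp, host, pid, text = match.groups()
    record = {
        "time": datetime.strptime(stamp, "%Y-%m-%d_%H:%M:%S"),
        "host": host,
        "pid": int(pid),
        "text": text,
    }
    start = _NEW_START.match(text)
    if start is not None:
        user, program, level, compiled = start.groups()
        record.update(user=user, program=program, level=level,
                      compiled=compiled)
    return record


def users_by_session(lines):
    """Map (host, pid) to the user named in that session's NEW_START."""
    users = {}
    for line in lines:
        record = parse_log_record(line)
        if record is not None and "user" in record:
            users[(record["host"], record["pid"])] = record["user"]
    return users
```

Any later record with the same network_id and PID can then be looked up in the dictionary returned by users_by_session to find out who produced it.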

Bugs Fixed

Protein Translation Bugs in fmap

Protein translation displayed in fmap window: display of protein translations for CDS's in the fmap window did not work correctly for reverse strand genes. If you set the beginning position for translation by altering:


Properties -> Coding -> CDS  <new start position>

then the protein translation did not get updated to reflect the change. If you set the end position as well then the translation was shown correctly. This is now fixed so the correct translation is always shown.

Protein translation displayed in separate protein window: Several users have noticed that when they set the end position in the CDS (as mentioned above), they get huge numbers of the character 'X' appended to the end of the translation. This is because the CDS positions must be specified in the spliced DNA coordinates, not the Source_Exons coordinates. If you use the latter then you will be specifying an end position well past the end of the spliced DNA, thus causing the code to append the 'X's to "fill in" the missing data.

Code will be added to give a warning that the specified positions are outside of the spliced DNA coordinate range.
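The difference between the two coordinate systems can be shown with a short Python sketch. This is not the acedb code: the function names are hypothetical, and it assumes exons are given as 1-based, inclusive (start, end) pairs in Source_Exons coordinates.

```python
def spliced_length(exons):
    """Length of the spliced DNA: the sum of the exon lengths."""
    return sum(end - start + 1 for start, end in exons)


def to_spliced(position, exons):
    """Convert a Source_Exons coordinate to a spliced DNA coordinate.

    Returns None if the position falls in an intron or outside the exons.
    """
    offset = 0
    for start, end in sorted(exons):
        if start <= position <= end:
            return offset + position - start + 1
        offset += end - start + 1
    return None


def cds_in_range(cds_start, cds_end, exons):
    """True if the CDS positions lie within the spliced DNA."""
    return 1 <= cds_start <= cds_end <= spliced_length(exons)


# Three exons spanning positions 1 to 450, but only 250 bases once spliced.
exons = [(1, 100), (201, 300), (401, 450)]
print(spliced_length(exons))        # 250
print(to_spliced(450, exons))       # 250
print(cds_in_range(1, 450, exons))  # False: 450 is past the spliced end
print(cds_in_range(1, 250, exons))  # True
```

In this made-up example, giving the end position as 450 (the Source_Exons coordinate) instead of 250 (the spliced coordinate) is the kind of mistake that leads to the trailing 'X' characters.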

Incompletely drawn tree display window when updating

Many users have noticed that when they try to "Update" the tree display (i.e. the class viewer window), the window is sometimes only partly drawn, with the lower part of the window being blank until the window is scrolled up and down. This very irritating bug has finally been pinned down to an error, not in acedb, but in the underlying GTK toolkit. A workaround is in place until the new fixed versions of GTK filter through.

August monthly build NOT yet available.

Currently the acedb test system is not working correctly so only half of the code has been tested. This should be fixed this week. You will receive email when testing has been completed.

Note that this means that some of the features mentioned above will not be available until testing has been completed.

Next User Group Meeting - D213, 2.30pm, Thursday 14th September.



Ed Griffiths <edgrif@sanger.ac.uk>
Last modified: Mon May 21 15:35:57 BST 2001