ACEDB User Group Newsletter - October 2001

If you want to have this newsletter mailed to you or you want to make comments/suggestions about the format/content then send an email to acedb@sanger.ac.uk.


A quiet month in preparation for the introduction of a major new facility for constructing virtual sequences in acedb: Smap. This will be the subject of next month's newsletter, when there will be coverage of new code for putting together DNA and the features on that DNA.

There are some welcome additions to the way blixem can be run, both in terms of speed and flexibility. There is also some new code for dumping just the CDS parts of objects listed in the keyset window of xace.


New Features

Running Blixem from xace: new options

If you have used blixem from within xace then you cannot have failed to notice that blixem sometimes takes a long time to retrieve the sequences it wants, and that during this time xace is effectively "frozen" and you cannot do any work. In addition, if blixem crashes for any reason then xace crashes too.

These problems are caused by two aspects of the way blixem is being run:

  1. blixem makes a separate call to "efetch" for each sequence it needs and although efetch is fast, starting it up many times is not.
  2. the blixem code is compiled into xace to form a single program, hence if blixem crashes so does xace.

Two new features have been added to address these problems:

  1. blixem can now make a single call to a "pfetch" server to retrieve all sequences it requires in one go. This is much faster.
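  2. blixem can now be called as a separate program from within xace, with the result that xace is no longer frozen while blixem retrieves its sequences and a blixem crash no longer crashes xace.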

You can control how blixem is run from the preferences menu. The following preferences have been added:

BLIXEM_EXTERNAL
Select this option if you want blixem to be called as an external program.
BLIXEM_PFETCH
Select this option if you want blixem to use "pfetch" instead of "efetch" to obtain its sequences.

You should then set the following preferences so that blixem can find the pfetch server (you should contact your local pfetch administrator if you don't know what values to use):

BLIXEM_NETID
Set this to the network address of the server, in either dotted decimal form, e.g. "999.9.99.99", or text form, e.g. "hal".
BLIXEM_PORT_NUMBER
Set this to the port number that the pfetch server is listening on, e.g. 2001

Note that BLIXEM_EXTERNAL and BLIXEM_PFETCH can be set independently of one another, so you can have any combination of the two.

If you are using blixem as a separate program and want to set it to use pfetch then you can do this in two ways:

environment variables
You can use two environment variables to make blixem use pfetch:
BLIXEM_PFETCH
e.g. "setenv BLIXEM_PFETCH hal"
BLIXEM_PORT
e.g. "setenv BLIXEM_PORT 2001"
The examples assume you use csh.
command line options
-P nodeid<:port>
You can specify the machine and port number as an option to blixem, e.g. blixem -P hal:2001 rest_of_blixem_args

blixem will use the environment variables when run from xace as well, but you are recommended to control blixem's use of pfetch via the preferences menu.
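
As an illustration of the "nodeid<:port>" form and the two environment variables described above, here is a minimal sketch in C of how such a setting could be parsed. It is not the blixem source: the names PfetchAddr, parsePfetchSpec() and resolvePfetch() are hypothetical, and the rule that the command line option takes precedence over the environment is an assumption made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type for this sketch, not part of blixem. */
typedef struct
{
    char host[256];         /* dotted decimal or text form of the address */
    int  port;              /* 0 means no port was given */
} PfetchAddr;

/* Split "nodeid<:port>" into host and port.
 * Returns 1 on success, 0 if the string is malformed. */
int parsePfetchSpec(const char *spec, PfetchAddr *out)
{
    const char *colon;
    size_t host_len;

    assert(spec != NULL && out != NULL);

    colon = strchr(spec, ':');
    host_len = colon ? (size_t)(colon - spec) : strlen(spec);
    if (host_len == 0 || host_len >= sizeof(out->host))
        return 0;

    memcpy(out->host, spec, host_len);
    out->host[host_len] = '\0';
    out->port = 0;

    if (colon)
    {
        char *end;
        long port = strtol(colon + 1, &end, 10);

        /* Reject an empty port, trailing junk or an out-of-range value. */
        if (end == colon + 1 || *end != '\0' || port < 1 || port > 65535)
            return 0;
        out->port = (int)port;
    }
    return 1;
}

/* Assumption for this sketch: a -P option, if given, wins over the
 * BLIXEM_PFETCH / BLIXEM_PORT environment variables. */
int resolvePfetch(const char *option_spec, PfetchAddr *out)
{
    const char *host, *port;

    if (option_spec)
        return parsePfetchSpec(option_spec, out);

    host = getenv("BLIXEM_PFETCH");
    if (!host || !parsePfetchSpec(host, out))
        return 0;

    port = getenv("BLIXEM_PORT");
    if (port)
    {
        char *end;
        long value = strtol(port, &end, 10);

        if (end == port || *end != '\0' || value < 1 || value > 65535)
            return 0;
        out->port = (int)value;
    }
    return 1;
}
```

With this sketch, resolvePfetch("hal:2001", &addr) fills in the host "hal" and the port 2001, while resolvePfetch(NULL, &addr) falls back to the environment variables.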

Dumping CDS from the Keyset window

A new option has been added to dump just the CDS dna or corresponding protein translation in FastA format from a set of objects in the keyset window. From the "Export.." menu, select "CDS DNA in Fasta format" or "CDS Protein in Fasta format". If an object in the window does not contain the CDS tag then nothing will be dumped for that object.
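
For readers unfamiliar with the output format, the following sketch shows what writing one FastA record involves: a ">" header line followed by the sequence broken into fixed-width lines. It is only an illustration and not the xace dumping code; formatFasta() is a hypothetical helper and the line width is a parameter chosen by the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one FastA record into "out".
 * Returns the number of characters written (excluding the final '\0'),
 * or -1 if the buffer is too small. */
int formatFasta(const char *name, const char *seq, size_t width,
                char *out, size_t out_size)
{
    size_t len, pos, i;
    int n;

    assert(name != NULL && seq != NULL && out != NULL && width > 0);

    /* Header line: '>' followed by the object name. */
    n = snprintf(out, out_size, ">%s\n", name);
    if (n < 0 || (size_t)n >= out_size)
        return -1;
    pos = (size_t)n;

    /* Sequence lines: at most "width" residues per line. */
    len = strlen(seq);
    for (i = 0; i < len; i += width)
    {
        size_t chunk = (len - i < width) ? len - i : width;

        if (pos + chunk + 2 > out_size)     /* chunk + '\n' + '\0' */
            return -1;
        memcpy(out + pos, seq + i, chunk);
        pos += chunk;
        out[pos++] = '\n';
    }
    out[pos] = '\0';
    return (int)pos;
}
```

For example, formatting the name "cds1" and the sequence "ATGGCCTAA" with a width of 4 gives the header ">cds1" followed by the lines "ATGG", "CCTA" and "A".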

Fmap cursor coordinates

Fmap now displays the cursor coordinates during a middle-button drag.


Bugs Fixed

Dotter bug fixed

A small buglet in dotter is fixed: HSP lines are now re-drawn after changing the grey-ramp.


Developers Corner

If you wish to make comments/suggestions about any of the items below, please mail them to acedb@sanger.ac.uk

New routines for the aceTmp package

Some new routines/options have been added to the aceTmp package:

aceTmpCreateDir()
Creates a directory and then creates a temporary file within that directory. This makes it more convenient to put all temporary files of a particular type into a single subdirectory of /tmp, for instance.
aceTmpNoRemove()
By default aceTmp removes any temporary files that it creates. This is not always desirable, so you can use this call to stop file removal. It is then your responsibility to remove the file; you can use the unix "unlink" call to do this:
               char *tmpfile_name ;
               ACETMP tmpfile ;

               tmpfile = aceTmpCreate( args ) ;
 
               tmpfile_name = aceTmpGetFileName(tmpfile) ;
               aceTmpNoRemove(tmpfile) ;

               /* some time later */
               unlink(tmpfile_name) ;

Some comments about the use of bIndexFind()

(Thanks to Jean Thierry-Mieg, mieg@ncbi.nlm.nih.gov, for his comments.)

There have been a number of emails about bIndexFind() recently and it's probably good to summarise some of these:

bIndexFind() was designed to speed up queries by referencing an in-memory table to determine whether an object contains a particular tag, rather than reading the object in, opening it and searching for the tag. The speed-up is enormous.

So what's the problem? The main problem is that bIndexFind() does not simply return a boolean, i.e. TRUE if it can find the tag in the index, FALSE if not. The call returns one of three possible values:

BINDEX_TAG_ABSENT (= 0)
The tag is absent from the index.
BINDEX_TAG_UNCLEAR (= 1)
Currently this can be returned for a whole host of reasons: the index is not up to date, the bindex package has not been initialised, the supplied key is a protected class or is not a B object, and quite a few other factors (see below for more on this).
BINDEX_TAG_PRESENT (= 2)
The tag is present in the index.

This means that it is not safe to write code like this, assuming that all is OK because you have entered the "if" clause:

if (bIndexFind (obj, tag))
  {
    some actions....
  }

This can fail because bIndexFind() may return BINDEX_TAG_UNCLEAR, which is non-zero and so passes the test even though the tag may not be present.

It is safer to use two functions written by Jean, which handle the bIndexFind() return values:

KEY keyGetKey(KEY key, KEY tag) ;
BOOL keyFindTag(KEY key, KEY tag) ;

These use the index where possible and only open the object if necessary.
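
The general pattern of handling all three return values can be sketched as follows. This is not Jean's implementation: findTagSafely() is a hypothetical name, and stubIndexFind() and stubOpenAndSearch() are stand-ins invented for the example so that it can be compiled without acedb. Only the three BINDEX_TAG_* values are taken from the list above.

```c
#include <assert.h>

/* Return values as listed above. */
enum { BINDEX_TAG_ABSENT = 0, BINDEX_TAG_UNCLEAR = 1, BINDEX_TAG_PRESENT = 2 };

typedef unsigned int KEY;   /* stand-in for the acedb KEY type */
typedef int BOOL;
#define TRUE  1
#define FALSE 0

int objectOpens = 0;        /* counts how often the slow path is taken */

/* Hypothetical stand-in for bIndexFind(): the answer is derived from the
 * key so that all three outcomes can be exercised without a database. */
int stubIndexFind(KEY key, KEY tag)
{
    (void)tag;
    return (int)(key % 3);  /* 0 = absent, 1 = unclear, 2 = present */
}

/* Hypothetical stand-in for opening the object and searching for the tag. */
BOOL stubOpenAndSearch(KEY key, KEY tag)
{
    (void)tag;
    objectOpens++;
    return (key % 2 == 0) ? TRUE : FALSE;
}

/* Trust the index only when it gives a definite answer; otherwise fall
 * back to opening the object. */
BOOL findTagSafely(KEY key, KEY tag)
{
    switch (stubIndexFind(key, tag))
    {
    case BINDEX_TAG_PRESENT:
        return TRUE;
    case BINDEX_TAG_ABSENT:
        return FALSE;
    default:                /* BINDEX_TAG_UNCLEAR: the index cannot say */
        return stubOpenAndSearch(key, tag);
    }
}
```

The point of the switch is that BINDEX_TAG_UNCLEAR is never treated as an answer in either direction; the object is opened only in that case.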

It is also safe to use bIndexFind() to speed up operations in some circumstances, e.g. if you were going to do this:

if ((obj = bsCreate()))
  {
    if (bsGetData (obj,tag,...))
      { ... }
    bsDestroy (obj) ;
  }

you could instead do this:

if ((bIndexFind(obj, tag) != BINDEX_TAG_ABSENT) && (obj = bsCreate()))
  {
    if (bsGetData (obj,tag,...))
      { ... }
    bsDestroy (obj) ;
  }

The latter is faster when the tag is absent from the index, because the object does not need to be opened. It is also safe, because in every other case, including BINDEX_TAG_UNCLEAR, we go on to open the object and look for the tag anyway.

Currently bIndexFind() returns BINDEX_TAG_UNCLEAR for a number of conditions which are likely to be coding errors. We should consider messcrashing for these instead, e.g. if the index has not been initialised, or if the caller tries to find a tag with a value less than "_Date", and maybe others. Send any thoughts on this to: acedb@sanger.ac.uk


October monthly build now available.

You can pick up the monthly builds from:

Sanger users
~acedb/RELEASE.DEVELOPMENT
External users
http://www.acedb.org/Software/Downloads/monthly.shtml


Next User Group Meeting - D319, 2.30pm, Thursday, 18th October 2001



Ed Griffiths <edgrif@sanger.ac.uk>
Last modified: Mon Nov 12 15:08:34 GMT 2001