Sun Computational Biology SIG Newsletter - May 26, 2006

Bioinformatics on Solaris x64

by Dr. Brian Fristensky, Department of Plant Science, University of Manitoba



An enormous number of computational tools exist for most tasks in bioinformatics, including DNA and protein sequence analysis, mining of gene expression data, pattern detection and recognition.  Many of these programs have been developed on the Solaris Sparc platform. The expansion of Solaris to the 64-bit AMD Opteron processor (x64) potentially offers new choices for the implementation of bioinformatics infrastructure. However, users in any specialized field will not move to a new hardware platform until the applications on which they depend are available on that platform. To make the problem more complex, x64 servers and workstations will typically be added to existing systems which already include Solaris-Sparc and/or Linux-Intel machines. This article describes how these problems have been addressed in the BIRCH bioinformatics system.


Coexistence of multiple platforms


BIRCH is a comprehensive bioinformatics software system, integrated using the GDE graphic interface. BIRCH is distributed with many commonly used open-source programs (e.g. NCBI, BLAST, clustal, Phylip) preconfigured and ready to run "out of the box". BIRCH provides an organizational framework with numerous system administration tools that simplify adding new programs and customizing the configuration to fit the local system.


A heterogeneous system with more than one OS/hardware platform poses numerous problems for both the user and the system administrator. The user wants things to look and work the same, regardless of which platform is being used. However, binaries and libraries are incompatible across platforms. As well, on a heterogeneous system, the sysadmin is faced with the additional work of supporting two or more platforms.

BIRCH scales from a single desktop PC to a diverse server system. While many configurations are possible, our system at the University of Manitoba has numerous Solaris-Sparc, Solaris-x64 and Linux-Intel machines which remotely mount filesystems from a central file server. The result is that any user can log into any login host from any terminal, standalone workstation or a PC running X11 software. All hosts run the same core of common desktop applications.


Re-compilation is just the first step

Table 1. BIRCH hierarchical directory structure (for simplicity, not all directories are shown).

/home/birch/
    admin/
    bin-linux-intel/
    bin-solaris-sparc/
    bin-solaris-amd64/
    doc/
    dat/
    java/
    lib-linux-intel/
    lib-solaris-sparc/
    lib-solaris-amd64/
    local/
        admin/
        bin-solaris-sparc/
        bin-linux-intel/
        bin-solaris-amd64/
        doc/
        dat/
        java/
        and so on...
    public_html/
    script/
    and so on...

Legacy Solaris-i386 binaries do execute on the Solaris-x64 platform. However, only a handful of bioinformatics applications were specifically supported by their developers on Solaris-i386. Fortunately, two factors mitigate this problem. The first is that the majority of bioinformatics programs are distributed as open source software. In a large percentage of cases, these programs have already been compiled successfully on Solaris-Sparc. In our experience, the same code recompiles on Solaris-x64 with no change, even in the Makefile. For some older C code, compilation succeeded on neither Solaris-Sparc nor Solaris-x64.

The second mitigating factor is that software development in bioinformatics over recent years has increasingly favored Java and platform-neutral scripting languages such as Perl and Python. The net result is that the vast majority of applications in BIRCH can now run on Solaris-Sparc, Solaris-x64 and Linux-Intel. However, recompiling is just the first step.

The BIRCH directory hierarchy, whose root is specified by the $BIRCH environment variable, contains script and java directories for platform-neutral scripts and Java applications, and separate binary and library directories for each platform (Table 1). As well, $BIRCH/local mirrors the directory structure of the main BIRCH directories, similar to /usr/local in Unix. All software added locally to BIRCH, or local configuration changes, are made in $BIRCH/local. When BIRCH is updated to a new version, changes made in $BIRCH/local are preserved.

When a user logs in, the appropriate bin directory is added to the user's $PATH. Programs requiring non-standard libraries are run from wrapper scripts which add the appropriate lib directory to the $LD_LIBRARY_PATH. Note that changing the $LD_LIBRARY_PATH in specific scripts is safer than setting the $LD_LIBRARY_PATH for the entire session at login. The latter approach might leave some programs unable to find their libraries.
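
As an illustration only, the wrapper idea can be sketched in Python. This is not the BIRCH wrapper code; the function names wrapper_env and run_wrapped are hypothetical, and the lib directory names are assumed to follow the lib-<platform> pattern shown in Table 1. The point is that the modified $LD_LIBRARY_PATH is handed only to the one program being launched, while the login session's environment is left untouched.

```python
import os
import subprocess


def wrapper_env(birch_root, platform_name, base_env=None):
    """Return a copy of the environment with the platform-specific lib
    directory prepended to LD_LIBRARY_PATH. The caller's own environment
    is not modified. (Hypothetical helper, not part of BIRCH.)"""
    env = dict(os.environ if base_env is None else base_env)
    lib_dir = os.path.join(birch_root, "lib-" + platform_name)
    existing = env.get("LD_LIBRARY_PATH", "")
    if existing:
        env["LD_LIBRARY_PATH"] = lib_dir + os.pathsep + existing
    else:
        env["LD_LIBRARY_PATH"] = lib_dir
    return env


def run_wrapped(program, args, birch_root, platform_name):
    """Run one program with the adjusted library path and return its
    exit status. Only this child process sees the changed variable."""
    env = wrapper_env(birch_root, platform_name)
    return subprocess.call([program] + list(args), env=env)
```

A wrapper for a given program would then amount to a single call to run_wrapped with that program's name, the value of $BIRCH, and the platform name determined at login.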



A platform-neutral desktop



A Sun Java Desktop session running on a Solaris x64 server, as displayed on a SunRay thin client. In this example, data on the plant chitinase III gene family are displayed from an in-house database using ACeDB (http://www.acedb.org) (top). An aligned set of chitinase III DNA sequences is selected in the GDE interface (center), and a phylogenetic tree is constructed using the FITCH program from Phylip (http://evolution.gs.washington.edu/phylip.html). The output is displayed in a text editor and the ATV tree editor, and in a new GDE window (bottom right).


Most programs in BIRCH are run from the GDE interface. GDE can be thought of as a generic GUI whose main purpose is to launch external programs. Running a program from GDE occurs in four steps. First, the user selects data items in the GDE main window to be used as input. Next, a program is chosen from the hierarchical GDE menus. A popup window appears which lets the user set parameters for the program. The user then launches the program by clicking 'OK'. GDE creates a Unix command to run the program by substituting parameter values into a command template. The command runs the program, using the selected data as input. When the program is finished, output is displayed. For example, PDF output would be displayed in a PDF viewer, HTML output would be displayed in a browser, and text output would be displayed in a text editor.
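
The substitution step can be pictured with a short Python sketch. It is not GDE's own code or its actual template syntax; it borrows the $name placeholder style of Python's string.Template as a stand-in, and the program name someprog, its options and the parameter names are all hypothetical.

```python
import shlex
from string import Template


def build_command(template, params):
    """Fill a command template with parameter values, quoting each value
    so that spaces or shell metacharacters cannot split the command.
    Raises KeyError if the template names a parameter that was not set."""
    quoted = {name: shlex.quote(str(value)) for name, value in params.items()}
    return Template(template).substitute(quoted)


# Hypothetical template and values, as a parameter popup might supply them.
command = build_command(
    "someprog --in $infile --out $outfile --power $power",
    {"infile": "chitinase.dist", "outfile": "tree.out", "power": 2},
)
print(command)
```

Keeping the template as data, separate from the code that fills it in, is what lets a single generic interface launch many unrelated programs.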

The menus and command templates for each program are specified in a script file called '.GDEmenus'. However, since not all programs will be available for all platforms, it is necessary to generate separate .GDEmenus files for each platform. The Python script makemenus.py reads a file called menulist, which tells which programs are available on each platform. For example, if menulist had an entry reading

Structure
    Cn3D   SL

then the Cn3D program would appear in the Structure menu for Solaris-Sparc ('S') and Linux-Intel ('L'), but not for Solaris-x64, whose code in the menulist file would be 's'.
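
The filtering that such a file makes possible might look like the following Python sketch. It is not the actual makemenus.py; the function name programs_for_platform is hypothetical, and the file layout is assumed from the example above: an unindented line names a menu, and each indented line gives a program name followed by its platform codes.

```python
def programs_for_platform(menulist_text, code):
    """Return a dict mapping each menu name to the list of programs
    whose platform codes include the given one-letter code.
    Codes are case-sensitive, so 'S' and 's' are different platforms."""
    menus = {}
    current = None
    for line in menulist_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if not line[0].isspace():
            # Unindented line: start of a new menu.
            current = line.strip()
            menus.setdefault(current, [])
        elif current is not None:
            # Indented line: program name, then platform codes.
            fields = line.split()
            name = fields[0]
            codes = fields[1] if len(fields) > 1 else ""
            if code in codes:
                menus[current].append(name)
    return menus


sample = "Structure\n    Cn3D   SL\n"
print(programs_for_platform(sample, "S"))  # Cn3D is listed
print(programs_for_platform(sample, "s"))  # Structure menu is empty
```

A menu generator could then write one .GDEmenus file per platform code, including only the entries returned for that code.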

When a BIRCH user logs in, scripts determine the platform and set environment variables needed by various programs. In this way, almost every aspect of BIRCH can be tailored at login for each platform.

As mentioned above, GDE displays output using a variety of viewers and editors. Not all are available on all platforms. For example, the Adobe Acrobat Reader is not yet available for Solaris-x64. In the BIRCH login scripts, the environment variable $GDE_PDFVIEW might be set to 'acroread' for Solaris-Sparc and Linux-Intel, and to 'gpdf' for Solaris-x64.
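
A login-time decision of this kind could be sketched in Python as follows. The actual BIRCH login scripts are not shown in this article, so everything here is an assumption made for illustration: the function names, the variable name BIRCH_PLATFORM, and the rule that a machine type beginning with 'sun4' indicates Sparc hardware. Only the viewer names and $GDE_PDFVIEW come from the text above.

```python
import platform

# Viewer choices taken from the example in the text.
PDF_VIEWERS = {
    "solaris-sparc": "acroread",
    "linux-intel": "acroread",
    "solaris-amd64": "gpdf",
}


def birch_platform(system, machine):
    """Map an OS name and machine type to a platform name matching the
    bin-/lib- directory suffixes in Table 1. The 'sun4' test for Sparc
    hardware is an assumption of this sketch."""
    if system == "SunOS":
        return "solaris-sparc" if machine.startswith("sun4") else "solaris-amd64"
    if system == "Linux":
        return "linux-intel"
    raise ValueError("unsupported platform: %s %s" % (system, machine))


def login_settings(system=None, machine=None):
    """Return the environment settings a login script might export.
    BIRCH_PLATFORM is a hypothetical variable name."""
    system = system or platform.system()
    machine = machine or platform.machine()
    name = birch_platform(system, machine)
    return {"BIRCH_PLATFORM": name, "GDE_PDFVIEW": PDF_VIEWERS[name]}
```

Because every platform-dependent choice is looked up in one table, supporting another platform means adding one row rather than editing each program's launcher.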

The result is that any user can log in to any login server or workstation, running any of (at present) three platforms, and have most of the same programs, working in exactly the same way.

Conclusion

The introduction of any new OS/hardware platform is an uphill battle. While users and sysadmins may welcome the advantages of the new platform, the main stumbling block is the lag in getting software ported to the new system, or even convincing developers that it is worthwhile to do so. Users don't want yet another learning curve, and sysadmins don't want to double their workload when adding a new platform. Sun appears to have understood these fundamental issues very well. The result is that in many ways, Solaris is Solaris, regardless of the hardware. When the Solaris-x64 platform was released in 2005, it included not only a complete Solaris but also a large set of application software, without which no one would use it. One might infer that the wide range of applications already available for x64 is evidence of how clean a port of Solaris Sun has done.

In November 2005, x64 servers first came online at the University of Manitoba. I was able to get most of BIRCH ported to x64 within one week. In part this was because BIRCH was deliberately written to support multiple platforms (Solaris-Sparc and Linux-Intel), so the task involved recompiling programs and adding a third platform. What could have been a nightmare turned out to be a largely painless and straightforward process.

One hopes that Solaris-x64 will serve as an object lesson that will revive an interest in the software community for developing platform-neutral code.


For more information

BIRCH (http://home.cc.umanitoba.ca/~psgendb/birchhomedir/)

Creating and Administering your own BIRCH site (http://home.cc.umanitoba.ca/~psgendb/birchhomedir/birchadmin/birchadmin.html)

The Solaris x64 Platform (http://www.sun.com/x64/index.html)