Sun Computational Biology SIG Newsletter - May 26, 2006
Bioinformatics on Solaris x64
by Dr. Brian Fristensky, Department of Plant Science, University of Manitoba
An enormous number of computational tools exist for most tasks in bioinformatics, including DNA and protein sequence analysis, mining of gene expression data, and pattern detection and recognition. Many of these programs were developed on the Solaris-Sparc platform. The extension of Solaris to the 64-bit AMD Opteron processor (x64) potentially offers new choices for implementing bioinformatics infrastructure. However, users in any specialized field will not move to a new hardware platform until the applications they depend on are available on that platform. To make the problem more complex, x64 servers and workstations will typically be added to existing systems that already include Solaris-Sparc and/or Linux-Intel machines. This article describes how these problems have been addressed in the BIRCH bioinformatics system.
Coexistence of multiple platforms
BIRCH is a comprehensive bioinformatics software system, integrated using the GDE graphical interface. BIRCH is distributed with many commonly used open-source programs (e.g. NCBI BLAST, clustal, Phylip), preconfigured and ready to run "out of the box". BIRCH also provides an organizational framework with numerous system administration tools that simplify adding new programs and customizing the configuration to fit the local system.
A heterogeneous system with more than one OS/hardware platform poses numerous problems for both the user and the system administrator. The user wants things to look and work the same, regardless of which platform is being used. However, binaries and libraries are incompatible across platforms. In addition, on a heterogeneous system the sysadmin faces the extra work of supporting two or more platforms.
BIRCH scales from a single desktop PC to a diverse server system. While many configurations are possible, our system at the University of Manitoba has numerous Solaris-Sparc, Solaris-x64 and Linux-Intel machines that remotely mount filesystems from a central file server. The result is that any user can log into any login host from any terminal, standalone workstation, or PC running X11 software. All hosts run the same core of common desktop applications, which include:
- The Sun Java Desktop and GNOME desktop
- Browsers (e.g. Mozilla, Firefox)
- OpenOffice, StarOffice
- Most of the applications typically available under GNOME (e.g. GIMP, Evolution)
- Most common programming languages (e.g. Java, C++, Python, Perl)
Recompilation is just the first step
Table 1. BIRCH hierarchical directory structure (for simplicity, not all directories are shown).
/home/birch/
    admin/
    bin-linux-intel/
    bin-solaris-sparc/
    bin-solaris-amd64/
    doc/
    dat/
    java/
    lib-linux-intel/
    lib-solaris-sparc/
    lib-solaris-amd64/
    local/
        admin/
        bin-solaris-sparc/
        bin-linux-intel/
        bin-solaris-amd64/
        doc/
        dat/
        java/
        and so on...
    public_html/
    script/
    and so on...
Legacy Solaris-i386 binaries do execute on the Solaris-x64 platform. However, only a handful of bioinformatics applications were specifically supported by their developers on Solaris-i386. Fortunately, most bioinformatics applications are distributed as open-source software, and in a large percentage of cases they have already been compiled successfully on Solaris-Sparc. In our experience, the same code recompiles on Solaris-x64 with no changes, not even to the Makefile. Some older C code, however, compiled on neither Solaris-Sparc nor Solaris-x64.
The second mitigating factor is that, in recent years, software development in bioinformatics has increasingly favored Java and platform-neutral scripting languages such as Perl and Python. The net result is that the vast majority of applications in BIRCH can now run on Solaris-Sparc, Solaris-x64 and Linux-Intel. However, recompiling is just the first step.
The BIRCH directory hierarchy, whose root is specified by the $BIRCH environment variable, contains script and java directories for platform-neutral scripts and Java applications, and separate binary and library directories for each platform (Table 1). In addition, $BIRCH/local mirrors the directory structure of the main BIRCH directories, much as /usr/local does in Unix. All locally added software and all local configuration changes go in $BIRCH/local, so they are preserved when BIRCH is updated to a new version.
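As a rough illustration of how the layout in Table 1 could be resolved for one platform, the Python sketch below returns the bin, lib and script directories for a given $BIRCH root and platform name. The helper name birch_dirs, and the choice to list $BIRCH/local ahead of the main tree so that local additions would be found first, are assumptions made for this example, not details taken from BIRCH itself.

```python
import os

# Platform suffixes used in the directory names of Table 1.
PLATFORMS = ("solaris-sparc", "solaris-amd64", "linux-intel")


def birch_dirs(birch_root, platform):
    """Return search lists of directories for one platform (hypothetical helper).

    $BIRCH/local is listed before the main BIRCH tree so that locally
    added software would take precedence (an assumption for illustration).
    """
    if platform not in PLATFORMS:
        raise ValueError("unknown platform: " + platform)
    local = os.path.join(birch_root, "local")
    dirs = {}
    for kind in ("bin", "lib"):
        # Platform-specific: one directory per platform, e.g. bin-solaris-amd64.
        name = kind + "-" + platform
        dirs[kind] = [os.path.join(local, name), os.path.join(birch_root, name)]
    # Platform-neutral: the same script directory serves every platform.
    dirs["script"] = [os.path.join(local, "script"),
                      os.path.join(birch_root, "script")]
    return dirs


print(birch_dirs("/home/birch", "solaris-amd64")["bin"])
# prints ['/home/birch/local/bin-solaris-amd64', '/home/birch/bin-solaris-amd64']
```

Only the platform-specific entries change from one host to another; the script entries are identical everywhere, which is what lets Perl, Python and Java programs be shared across all three platforms.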
When a user logs in, the appropriate bin directory is added to the user's $PATH. Programs requiring non-standard libraries are run from wrapper scripts that add the appropriate lib directory to $LD_LIBRARY_PATH. Changing $LD_LIBRARY_PATH in specific scripts is safer than setting it for the entire session at login, because a session-wide setting affects every program the user runs and can leave some programs unable to find the libraries they need.
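The per-program approach described above can be sketched in Python: the wrapper builds a modified copy of the environment for the one program it launches and leaves the caller's environment alone. BIRCH's actual wrappers are shell scripts; the function names wrapper_env and run_wrapped here are hypothetical.

```python
import subprocess


def wrapper_env(lib_dir, base_env):
    """Return a copy of base_env with lib_dir prepended to LD_LIBRARY_PATH.

    base_env itself is not modified, so the change applies only to the
    program the wrapper launches, never to the rest of the session.
    """
    env = dict(base_env)
    current = env.get("LD_LIBRARY_PATH")
    # ':' separates entries in LD_LIBRARY_PATH on Solaris and Linux.
    env["LD_LIBRARY_PATH"] = (lib_dir + ":" + current) if current else lib_dir
    return env


def run_wrapped(argv, lib_dir, base_env):
    """Run one program with the extended library path (hypothetical wrapper)."""
    return subprocess.run(argv, env=wrapper_env(lib_dir, base_env)).returncode
```

A call such as run_wrapped(["someprogram", "input.seq"], "/home/birch/lib-solaris-amd64", os.environ), where someprogram is a placeholder, would give only that one process the extra library directory.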
A platform-neutral desktop
Figure: A Sun Java Desktop session running on a Solaris x64 server, as displayed on a SunRay thin client. In this example, data on the plant chitinase III gene family are displayed from an in-house database using ACeDB (http://www.acedb.org) (top). An aligned set of chitinase III DNA sequences is selected in the GDE interface (center), and a phylogenetic tree is constructed using the FITCH program from Phylip (http://evolution.gs.washington.edu/phylip.html). The output is displayed in a text editor, in the ATV tree editor, and in a new GDE window (bottom right).
Most programs in BIRCH are run from the GDE interface. GDE can be thought of as a generic GUI whose main purpose is to launch external programs. Running a program from GDE takes four steps. First, the user selects data items in the GDE main window to be used as input. Next, the user chooses a program from the hierarchical GDE menus. A popup window then appears in which the user sets parameters for the program. Finally, the user launches the program by clicking 'OK'. GDE creates a Unix command to run the program by substituting the parameter values into a command template. The command runs the program, using the selected data as input. When the program finishes, its output is displayed in a viewer suited to its type: for example, PDF output in a PDF viewer, HTML output in a browser, and text output in a text editor.
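The template-substitution step can be illustrated with Python's string.Template. This is not GDE's own .GDEmenus syntax; the program name treeprog, its options, and the placeholder names are all hypothetical. The sketch shell-quotes every value so that a parameter containing spaces or shell metacharacters cannot change the structure of the generated command.

```python
import shlex
from string import Template


def build_command(template, params):
    """Fill a command template with user-chosen parameter values.

    Each value is shell-quoted before substitution. Template.substitute
    raises KeyError if any placeholder has no value, so an incomplete
    command is rejected before anything is run.
    """
    quoted = {name: shlex.quote(str(value)) for name, value in params.items()}
    return Template(template).substitute(quoted)


# Hypothetical template: "treeprog" and its options are placeholders,
# not a real program.
TEMPLATE = "treeprog -in $infile -out $outfile -bootstrap $reps"

cmd = build_command(TEMPLATE, {"infile": "chitinase.seq",
                               "outfile": "tree out.txt",
                               "reps": 100})
print(cmd)
# prints treeprog -in chitinase.seq -out 'tree out.txt' -bootstrap 100
```

Because the template is plain data, the same menu entry works on every platform; only the program it names has to exist in that platform's bin directory.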
The menus and command templates for each program are specified in a file called '.GDEmenus'. However, since not all programs are available on all platforms, a separate .GDEmenus file must be generated for each platform. The Python script makemenus.py reads a file called menulist, which specifies which programs are available on each platform. For example, if menulist had the entry
    Structure    Cn3D    SL
then the Cn3D program would appear in the Structure menu for Solaris-Sparc and Linux-Intel, but not for Solaris-x64 (whose code in menulist would be 's').
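The filtering that makemenus.py performs might look roughly like the Python sketch below. It is not the real makemenus.py: it assumes each menulist entry is one line of three whitespace-separated fields, and it only selects programs per platform rather than writing a .GDEmenus file. The code 's' for Solaris-x64 comes from the text; mapping 'S' to Solaris-Sparc and 'L' to Linux-Intel is inferred from the "SL" example.

```python
# Platform codes for menulist. 's' (Solaris-x64) is stated in the text;
# 'S' and 'L' are inferred from the "SL" example.
CODES = {"S": "solaris-sparc", "L": "linux-intel", "s": "solaris-amd64"}


def parse_menulist(lines):
    """Yield (menu, program, set_of_platforms) for each well-formed entry.

    Assumes one entry per line with three whitespace-separated fields.
    """
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue  # skip blank or malformed lines
        menu, program, codes = fields
        yield menu, program, {CODES[c] for c in codes if c in CODES}


def menus_for(lines, platform):
    """Return {menu: [programs]} listing only programs available on platform."""
    menus = {}
    for menu, program, platforms in parse_menulist(lines):
        if platform in platforms:
            menus.setdefault(menu, []).append(program)
    return menus


# The first entry is from the text; the second is a hypothetical example.
sample = ["Structure Cn3D SL", "Phylogeny fitch SLs"]
print(menus_for(sample, "solaris-amd64"))
# prints {'Phylogeny': ['fitch']}
```

With only the entry from the text, menus_for(["Structure Cn3D SL"], "solaris-amd64") returns an empty dict, so Cn3D is left out of the Solaris-x64 menus while still appearing on the other two platforms.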
When a BIRCH user logs in, scripts determine the platform and set the environment variables needed by various programs. In this way, almost every aspect of BIRCH can be tailored to each platform at login.
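The platform-detection step of the login scripts can be sketched in Python as a mapping from the operating system name and hardware type to one of the platform names used in Table 1. BIRCH's real login scripts are shell scripts. The function name is hypothetical, and treating every x86 Solaris host as 64-bit is a simplifying assumption.

```python
import platform


def birch_platform(system=None, machine=None):
    """Map an OS name and hardware type to a BIRCH platform name.

    By default the values come from the running host. 'sun4u'/'sun4v' are
    the usual uname -m values on SPARC Solaris and 'i86pc' on x86 Solaris.
    Treating every i86pc host as 64-bit (amd64) is an assumption; a real
    login script might also check the running kernel with isainfo -k.
    """
    if system is None:
        system = platform.system()
    if machine is None:
        machine = platform.machine()
    if system == "SunOS":
        if machine.startswith("sun4"):
            return "solaris-sparc"
        if machine == "i86pc":
            return "solaris-amd64"
    elif system == "Linux" and machine in ("i386", "i486", "i586", "i686", "x86_64"):
        return "linux-intel"
    raise ValueError("unsupported platform: %s %s" % (system, machine))
```

Raising an error for an unknown combination, rather than guessing, keeps a new, unsupported platform from silently picking up binaries built for a different one.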
As mentioned above, GDE displays output using a variety of viewers and editors, not all of which are available on all platforms. For example, Adobe Acrobat Reader is not yet available for Solaris-x64. In the BIRCH login scripts, the environment variable $GDE_PDFVIEW might therefore be set to 'acroread' for Solaris-Sparc and Linux-Intel, and to 'gpdf' for Solaris-x64.
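The indirection through $GDE_PDFVIEW can be shown in a few lines of Python. The viewer assignments follow the example above; the table and function names are hypothetical. Code that displays a PDF reads only the variable, so supporting another platform means adding one row to the table rather than changing every caller.

```python
# Viewer choices from the example in the text: acroread where Adobe
# Reader exists, gpdf on Solaris-x64.
PDF_VIEWER = {
    "solaris-sparc": "acroread",
    "linux-intel": "acroread",
    "solaris-amd64": "gpdf",
}


def login_settings(platform_name):
    """Return the viewer variable a login script might set for this platform."""
    return {"GDE_PDFVIEW": PDF_VIEWER[platform_name]}


def pdf_view_command(env, pdf_file):
    """Build the viewer command from $GDE_PDFVIEW alone.

    The caller never names a viewer directly, so the same code works
    unchanged on every platform.
    """
    return [env["GDE_PDFVIEW"], pdf_file]


print(pdf_view_command(login_settings("solaris-amd64"), "tree.pdf"))
# prints ['gpdf', 'tree.pdf']
```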
The result is that any user can log in to any login server or workstation, running any of (at present) three platforms, and have most of the same programs, working in exactly the same way.
Conclusion
The introduction of any new OS/hardware platform is an uphill battle. While users and sysadmins may welcome the advantages of the new platform, the main stumbling block is the lag in getting software ported to the new system, or even in convincing developers that it is worthwhile to do so. Users don't want yet another learning curve, and sysadmins don't want to double their workload when adding a new platform. Sun appears to have understood these fundamental issues very well. The result is that in many ways, Solaris is Solaris, regardless of the hardware. When the Solaris-x64 platform was released in 2005, it included not only a complete Solaris, but also a large set of application software, without which no one would use it. One might infer that the wide range of applications already available for x64 is evidence of how cleanly Sun has ported Solaris.
In November 2005, x64 servers first came online at the University of Manitoba. I was able to port most of BIRCH to x64 within one week. In part this was because BIRCH was deliberately written to support multiple platforms (Solaris-Sparc and Linux-Intel), so the task consisted of recompiling programs and adding a third platform. What could have been a nightmare turned out to be a largely painless and straightforward process.
One hopes that Solaris-x64 will serve as an object lesson that revives interest in the software community in developing platform-neutral code.
For more information
- BIRCH (http://home.cc.umanitoba.ca/~psgendb/birchhomedir/)
- Creating and Administering your own BIRCH site (http://home.cc.umanitoba.ca/~psgendb/birchhomedir/birchadmin/birchadmin.html)
- The Solaris x64 Platform (http://www.sun.com/x64/index.html)