Version 2.0					February 25, 1994

	This directory contains the source code for the PVM version of the
"complib" programs described in Deshpande, Richards, and Pearson (1991)
CABIOS 7:237-247.  Complib is a general platform for sequence
comparison programs that compares one library of sequences - the query
library - against a (typically) much larger library of sequences, such
as Swiss-Prot or PIR (for proteins) or GenBank (for DNA).  This version of
"complib" provides two comparison functions, FASTA (Pearson and
Lipman, 1988, PNAS 85:2444-2448) and Smith-Waterman (Smith and
Waterman, 1981, J. Mol. Biol.  147:195-197).  
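
	As an illustration of the second comparison function, here is a
minimal Python sketch of a Smith-Waterman local similarity score.  It
is not the code in this directory.  The function names, the simple
match/mismatch values, and the linear gap penalty are assumptions made
for the example; the real programs read a scoring matrix and are
written in C.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between sequences a and b.

    Assumed scoring (illustration only): fixed match/mismatch values
    and a linear gap penalty, not a PAM matrix.
    """
    prev = [0] * (len(b) + 1)   # previous row of the score table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                      # start a new local alignment
                prev[j - 1] + sub,      # align a[i-1] with b[j-1]
                prev[j] + gap,          # gap in b
                curr[j - 1] + gap,      # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


def rank_library(query, library):
    """Score query against every (name, sequence) entry, best first."""
    scores = [(name, smith_waterman_score(query, seq))
              for name, seq in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

	Calling rank_library("ACGT", {"x": "TTACGTTT", "y": "GGGG"})
scores the query against each library entry and lists the entry with
the highest similarity score first, which is the shape of output a
library search reports.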

	These programs run under PVM3, a parallel programming system
for networks of unix workstations. This release was tested extensively
under PVM3.2.6.  The programs provide more than 90% of the expected
speed-up on networks of 16 Sparc IPCs - the largest workstation
network we have used. The programs have also been tested on networks
of Dec ALPHA and SGI Indigo machines.  (Although PVM provides a
mechanism for running on a heterogeneous network of machines,
pvcomplib is not set up to do so yet.)
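
	To make the speed-up figure concrete, the following Python sketch
shows the arithmetic, assuming that the "expected" speed-up on N
machines is N.  The function name and the timings in the usage note
are hypothetical, not measurements from this release.

```python
def parallel_efficiency(t_serial, t_parallel, n_workers):
    """Return (speedup, efficiency) for a parallel run.

    speedup    = serial time / parallel time
    efficiency = speedup / n_workers, assuming ideal speed-up is n_workers
    """
    if t_parallel <= 0 or n_workers <= 0:
        raise ValueError("t_parallel and n_workers must be positive")
    speedup = t_serial / t_parallel
    return speedup, speedup / n_workers


def minimum_speedup(n_workers, efficiency=0.90):
    """Speed-up implied by a given efficiency on n_workers machines."""
    return efficiency * n_workers
```

	On 16 machines, 90% of the expected speed-up corresponds to a
speed-up of at least 14.4.  With hypothetical timings of 1600 seconds
on one machine and 110 seconds on 16, parallel_efficiency(1600.0,
110.0, 16) gives a speed-up of about 14.5 and an efficiency of about
0.91.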

	A version of complib that uses the P4 parallel programming
system is also available.  If you are running the programs on an Intel
PARAGON, the P4 version works better than this PVM implementation.

The Makefile produces four different main programs:

	pvcompfa  - use the FASTA algorithm and show similarity scores
	pvcompsw  - use the Smith-Waterman algorithm and show similarity scores

	pscompfa  - use the FASTA algorithm on a library with superfamilies
		    and summarize search performance
	pscompsw  - use the Smith-Waterman algorithm and summarize performance

In addition, c.workfa and c.worksw, the programs that calculate the
actual scores, are used on the remote machines.

	The programs included in this directory work correctly with
both protein and DNA sequence libraries. (Earlier versions did not
work correctly with DNA sequence libraries.)  A program that compares
a protein sequence to a DNA sequence library (TFASTA) or vice versa
(BLASTX) is not yet available.  At the moment, these programs cannot
restrict the calculation of optimized FASTA scores to the top-scoring
sequences (with the -o option, optimized scores are calculated for
every sequence in the library), and they report only similarity
scores; they do not display alignments.  (A version that calculates
optimized scores for a subset of the library may be available next
month.)  Nonetheless, they should be useful for laboratories doing
large-scale DNA sequencing that need more throughput than can be
obtained with FASTA on a single machine.  Likewise, the pvcompsw
program can provide very high performance Smith-Waterman searches on a
network of unix workstations.

	This version of the program includes a number of changes from
the standard FASTA distribution. (1) The FASTA SMATRIX file format is
no longer used for scoring matrices.  Instead, the output from S.
Altschul's "pam.c" program, which is included with the BLAST
distribution, is used. (2) The ability to search with both the forward
and reverse complement of a DNA sequence has been incorporated (-i
option).
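
	For readers unfamiliar with the second change, this Python sketch
shows what searching with both strands of a DNA query involves.  It
is an illustration only: the function names are hypothetical, and it
does not describe how the -i option is implemented in these programs.

```python
# Complement table for the four DNA bases; any other character
# (for example N) is left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def both_strands(query):
    """Return the forward query and its reverse complement."""
    return query, reverse_complement(query)
```

	A search on both strands compares each of the two sequences
returned by both_strands() against the library, so a match on either
strand of the query can be found.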

	This is the first "real" release of these programs. If you
encounter any bugs, please send mail to: 

	wrp@virginia.EDU

Remember, however, that PVM and parallel programming systems can be
much trickier than simple unix.  Please be certain that your PVM3
system is working properly before installing PVCOMPLIB.

Bill Pearson