Version 2.0
February 25, 1994

This directory contains the source code for the PVM version of the "complib" programs described in Deshpande, Richards, and Pearson (1991) CABIOS 7:237-247. Complib is a general platform for sequence comparison programs that compares one library of sequences - the query library - against a (typically) much larger library of sequences, such as Swiss-Prot or PIR (for proteins) or Genbank (for DNA). This version of "complib" provides two comparison functions, FASTA (Pearson and Lipman, 1988, PNAS 85:2444-2448) and Smith-Waterman (Smith and Waterman, 1981, J. Mol. Biol. 147:195-197).

These programs run under PVM3, a parallel programming system for networks of unix workstations. This release was tested extensively under PVM3.2.6. The programs provide more than 90% of the expected speed-up on networks of 16 Sparc IPCs - the largest workstation network we have used. The programs have also been tested on networks of Dec ALPHA and SGI Indigo machines. (Although PVM provides a mechanism for running on a heterogeneous network of machines, pvcomplib is not set up to do so yet.)

A version of complib that uses the P4 parallel programming system is also available. If you are running the programs on an Intel PARAGON, the P4 version works better than this PVM implementation.

The Makefile produces four different main programs:

    pvcompfa - use the FASTA algorithm and show similarity scores
    pvcompsw - use the Smith-Waterman algorithm and show similarity scores
    pscompfa - use the FASTA algorithm on a library with superfamilies
               and summarize search performance
    pscompsw - use the Smith-Waterman algorithm and summarize performance

In addition, c.workfa and c.worksw - the programs that calculate the actual scores - are used on the remote machines.

The programs included in this directory work correctly with both protein and DNA sequence libraries. (Earlier versions did not work correctly with DNA sequence libraries.)
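
As a rough illustration of the kind of similarity score the Smith-Waterman programs (pvcompsw, pscompsw) report for each library sequence, the following C sketch computes a best local alignment score for two sequences. It is not taken from this distribution: the function name sw_score, the MATCH/MISMATCH/GAP values, and the simple linear gap penalty are all assumptions chosen to keep the example short. The real programs score residue pairs with a scoring matrix rather than a fixed match/mismatch value.

```c
#include <stdlib.h>
#include <string.h>

/* Illustrative scoring parameters (assumptions, not pvcompsw's values). */
#define MATCH      2
#define MISMATCH (-1)
#define GAP      (-2)   /* linear penalty per gapped position */

static int max2(int a, int b) { return a > b ? a : b; }

/*
 * Hypothetical helper: best Smith-Waterman local alignment score of
 * sequences a and b. Only two rows of the dynamic programming matrix
 * are kept, since just the score (not the alignment) is needed.
 * Returns -1 if memory cannot be allocated.
 */
int sw_score(const char *a, const char *b)
{
    size_t n = strlen(a), m = strlen(b);
    int *prev = calloc(m + 1, sizeof *prev);   /* row i-1, all zeros */
    int *curr = calloc(m + 1, sizeof *curr);   /* row i */
    int best = 0;

    if (prev == NULL || curr == NULL) {
        free(prev);
        free(curr);
        return -1;
    }

    for (size_t i = 1; i <= n; i++) {
        curr[0] = 0;
        for (size_t j = 1; j <= m; j++) {
            /* extend the diagonal with a match or mismatch */
            int s = prev[j - 1] + (a[i - 1] == b[j - 1] ? MATCH : MISMATCH);
            s = max2(s, prev[j] + GAP);       /* gap in b */
            s = max2(s, curr[j - 1] + GAP);   /* gap in a */
            s = max2(s, 0);                   /* local: never below zero */
            curr[j] = s;
            if (s > best)
                best = s;
        }
        /* the row just filled becomes the previous row */
        int *tmp = prev;
        prev = curr;
        curr = tmp;
    }

    free(prev);
    free(curr);
    return best;
}
```

With these assumed parameters, sw_score("ACGT", "ACGT") returns 8 (four matches at 2 each), and two sequences with no residue in common score 0. In a library search, a score of this kind is computed once per query/library sequence pair, and those per-pair calculations are the work that is distributed to the remote machines.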
A program that compares a protein sequence to a DNA sequence library (TFASTA) or vice versa (BLASTX) is not yet available.

At the moment, these programs do not calculate optimized FASTA scores for only the top-scoring sequences (with the -o option, optimized scores are calculated for every sequence in the library), and they report only similarity scores; they do not display alignments. (A version that calculates optimized scores for a subset of the library may be available next month.) Nonetheless, they should be useful for laboratories doing large scale DNA sequencing that need more throughput than can be obtained with FASTA on a single machine. Likewise, the pvcompsw program can provide very high performance Smith-Waterman searches on a network of unix workstations.

This version of the program includes a number of changes from the standard FASTA distribution.

(1) The FASTA SMATRIX file format is no longer used for scoring matrices. Instead, the output from S. Altschul's "pam.c" program, which is included with the BLAST distribution, is used.

(2) The ability to search with both the forward and reverse complement of a DNA sequence has been incorporated (-i option).

This is the first "real" release of these programs. If you encounter any bugs, please send mail to:

    wrp@virginia.EDU

Remember, however, that PVM and parallel programming systems can be much trickier than simple unix. Please be certain that your PVM3 system is working properly before installing PVCOMPLIB.

Bill Pearson