*****************************************************************
  Announcing Rel. 4.0 of the MBCRR's Protein Pattern Library
            and Search Tool (PLSEARCH)
*****************************************************************

The MBCRR Protein Pattern Library is a  database  of  "consensus-
like"  protein sequence patterns, each pattern derived from a set
of homologous sequences in the SWISS-PROT Protein Sequence  Data-
base.   Families  of  related protein sequences are identified by
running the entire  SWISS-PROT  database  against  itself  (using
BLASTP,  the  NLM/NCBI's  new high-speed similarity search tool);
the resulting set of pair-wise  scores  is  then  clustered  into
families using a maximal-linkage clustering algorithm.  A pattern
construction algorithm (Smith and Smith 1990, PNAS 87:118-122) is
then  used to generate a single pattern for each family; the pat-
terns, which we call amino acid class covering  (AACC)  patterns,
are  functionally equivalent to 'regular expression' patterns and
represent the conserved primary sequence elements common  to  all
members  of each family.  This new release of the pattern library
(based on SWISS-PROT rel. 13) contains 5199  entries:  2026  pat-
terns  derived from all families of 2 or more members (encompass-
ing 10664 of the 13837 sequences in SWISS-PROT rel. 13) plus  the
remaining 3173 "non-related" sequences (i.e. from those loci that
did not cluster into any family).
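
As a rough illustration of what a covering pattern  is,  the  Python
sketch below reduces each column of an already-aligned family to the
smallest amino acid class that covers every residue seen there,  and
writes the result as a regular expression.  The class table, the
function names and the toy family are assumptions made for this
example only; they are not the class hierarchy or the  construction
algorithm of Smith and Smith (1990), and gaps are not handled.

```python
import re

# Hypothetical amino acid classes (illustrative only; NOT the
# hierarchy used by Smith and Smith 1990).
CLASSES = [
    frozenset("ILVM"),        # aliphatic
    frozenset("FWY"),         # aromatic
    frozenset("KRH"),         # basic
    frozenset("DE"),          # acidic
    frozenset("STNQ"),        # polar, uncharged
    frozenset("AG"),          # small
    frozenset("ILVMFWYAGC"),  # broadly hydrophobic
    frozenset("KRHDESTNQ"),   # broadly hydrophilic
]


def column_to_regex(column):
    """Return a regex fragment covering every residue in one column."""
    residues = set(column)
    if len(residues) == 1:
        return column[0]                 # fully conserved position
    covering = [c for c in CLASSES if residues <= c]
    if not covering:
        return "."                       # no class covers it: wildcard
    smallest = min(covering, key=len)    # most specific covering class
    return "[" + "".join(sorted(smallest)) + "]"


def covering_pattern(aligned):
    """Build a pattern from equal-length, gap-free aligned sequences."""
    return "".join(column_to_regex(col) for col in zip(*aligned))


family = ["MKVLD", "MRILE", "MKLVD"]     # toy family, not real data
pattern = covering_pattern(family)
print(pattern)
# Every sequence used to build the pattern matches it.
print(all(re.fullmatch(pattern, seq) for seq in family))
```

By construction, every member of the family matches its own pattern;
a sequence outside the family matches only if each of its  residues
falls in the class chosen for the corresponding column.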

The  MBCRR  distributes  the  pattern  library  with  a   dynamic
programming-based  search tool (PLSEARCH) for matching and align-
ing newly generated protein sequences against the  pattern  data-
base.   We have shown that covering patterns can be more diagnos-
tic for family membership than any of  the  individual  sequences
used to construct a pattern (see Smith and Smith, 1990).   Thus,
pattern searches can be more sensitive than traditional sequence
vs. sequence database searches.
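
The idea of matching a sequence against a pattern by  dynamic  pro-
gramming can be sketched as follows.  This is a  generic  local
alignment (Smith-Waterman style) in which each pattern position is a
set of allowed residues; it is not PLSEARCH's actual algorithm, and
the function name and the match, mismatch and gap scores are
arbitrary assumptions chosen for the example.

```python
def align_score(pattern, seq, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of seq against a class pattern.

    pattern: list of sets, one set of allowed residues per position.
    Scores are illustrative assumptions, not PLSEARCH's parameters.
    """
    prev = [0] * (len(seq) + 1)          # DP row for previous position
    best = 0
    for allowed in pattern:
        cur = [0]
        for j, residue in enumerate(seq, start=1):
            step = match if residue in allowed else mismatch
            cur.append(max(
                0,                       # start a new local alignment
                prev[j - 1] + step,      # align residue to position
                prev[j] + gap,           # skip a pattern position
                cur[j - 1] + gap,        # skip a sequence residue
            ))
            best = max(best, cur[j])
        prev = cur
    return best


# Toy pattern, written as one set of residues per position.
toy = [set("M"), set("KRH"), set("ILVM"), set("ILVM"), set("DE")]
print(align_score(toy, "AAMKVLDGG"))     # pattern embedded in sequence
print(align_score(toy, "GGGG"))          # nothing in common
```

Ranking database patterns by such a score, highest first,  is  one
simple way a search tool could report candidate families for a new
sequence.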

A related package included in the MBCRR-Package directory is  our
new   multi-sequence  alignment  program  (PIMA:  Pattern-Induced
Multi-Alignment).  This program is now being  used  routinely  by
the Human Retrovirus and AIDS Sequence Database Group (Los Alamos
Natl. Labs) to multi-align HIV protein sequences for phylogenetic
analyses.

PLSEARCH is written in 'C' and can run under both  Unix  and  VMS
operating  systems;  PIMA  employs Unix shell scripts and thus is
currently a Unix-only implementation.  Both packages are free  of
charge to non-profit organizations (a distribution fee and a non-
resale agreement are required for commercial use).


-----------------------------------------------------------------
Randall Smith and Temple Smith
Molecular Biology Computer Research Resource, LG-127
Dana-Farber Cancer Institute and School of Public Health,
Harvard University
44 Binney St., Boston MA 02115 USA
(617)732-3746
INTERNET: rsmith@mbcrr.harvard.edu
BITNET: rsmith%mbcrr@husc6.bitnet
-----------------------------------------------------------------