*****************************************************************
    Announcing Rel. 4.0 of the MBCRR's Protein Pattern Library
                  and Search Tool (PLSEARCH)
*****************************************************************

The MBCRR Protein Pattern Library is a database of
"consensus-like" protein sequence patterns, each derived from a
set of homologous sequences in the SWISS-PROT Protein Sequence
Database. Families of related protein sequences are identified
by running the entire SWISS-PROT database against itself (using
BLASTP, the NLM/NCBI's new high-speed similarity search tool);
the resulting set of pairwise scores is then clustered into
families using a maximal-linkage clustering algorithm. A pattern
construction algorithm (Smith and Smith 1990, PNAS 87:118-122)
is then used to generate a single pattern for each family. These
patterns, which we call amino acid class covering (AACC)
patterns, are functionally equivalent to "regular expression"
patterns and represent the conserved primary sequence elements
common to all members of each family.

This new release of the pattern library (based on SWISS-PROT
rel. 13) contains 5199 entries: 2026 patterns derived from all
families of 2 or more members (encompassing 10664 of the 13837
sequences in SWISS-PROT rel. 13) plus the remaining 3173
"non-related" sequences (i.e., from those loci that did not
cluster into any family).

The MBCRR distributes the pattern library with a dynamic
programming-based search tool (PLSEARCH) for matching and
aligning newly generated protein sequences against the pattern
database. We have shown that covering patterns can be more
diagnostic for family membership than any of the individual
sequences used to construct a pattern (see Smith and Smith,
1990); pattern searches can therefore be a more sensitive search
technique than traditional sequence vs. sequence database search
tools.

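As a rough illustration of what "functionally equivalent to
regular expression patterns" can mean, the Python sketch below
builds a regex from a small gap-free alignment by covering each
column with the first amino acid class that contains all of its
residues. The classes, function names, and example sequences are
hypothetical and chosen only for illustration; this is not the
published AACC construction algorithm, and it uses plain regex
matching rather than the dynamic programming alignment performed
by PLSEARCH.

```python
import re

# Hypothetical amino acid classes, for illustration only; these are
# NOT the classes defined by Smith and Smith (1990).
CLASSES = [
    set("ILVM"),  # aliphatic / hydrophobic
    set("FWY"),   # aromatic
    set("KRH"),   # basic
    set("DE"),    # acidic
    set("STNQ"),  # polar
    set("AG"),    # small
]


def covering_symbol(column):
    """Return a regex fragment that covers every residue in one column."""
    residues = set(column)
    if len(residues) == 1:
        return re.escape(column[0])  # fully conserved position
    for cls in CLASSES:
        if residues <= cls:
            return "[" + "".join(sorted(cls)) + "]"
    return "."  # no single class covers the column: wildcard


def covering_pattern(aligned):
    """Build a regex from equal-length, gap-free aligned sequences."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must have equal length")
    return "".join(covering_symbol(col) for col in zip(*aligned))


# Made-up family of three short aligned sequences.
family = ["KDLGF", "RELGY", "KDIGW"]
pattern = covering_pattern(family)

# A sequence outside the family can still match the pattern if each
# residue falls in the covering class for its position.
is_member = re.fullmatch(pattern, "HEVGF") is not None
is_nonmember = re.fullmatch(pattern, "KDLAF") is not None
```

Under these assumed classes, a query that differs from every
family member at several positions can still match, which is the
intuition behind a pattern being more diagnostic than any single
sequence in the family.
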
A related package included in the MBCRR-Package directory is our
new multi-sequence alignment program, PIMA (Pattern-Induced
Multi-Alignment). This program is now being used routinely by
the Human Retrovirus and AIDS Sequence Database Group (Los Alamos
Natl. Labs) to multi-align HIV protein sequences for phylogenetic
analyses.

PLSEARCH is written in C and runs under both Unix and VMS
operating systems; PIMA employs Unix shell scripts and thus is
currently a Unix-only implementation. Both packages are free of
charge to non-profit organizations (a distribution fee and a
non-resale agreement are required for commercial use).

-----------------------------------------------------------------
Randall Smith and Temple Smith

Molecular Biology Computer Research Resource, LG-127
Dana-Farber Cancer Institute and School of Public Health,
Harvard University
44 Binney St., Boston MA 02115 USA
(617)732-3746

INTERNET: rsmith@mbcrr.harvard.edu
BITNET:   rsmith%mbcrr@husc6.bitnet
-----------------------------------------------------------------