Manpage of PRSS3
PRSS3
Section: User Commands (1)
Updated: local
NAME
prss - test a protein sequence similarity for significance
SYNOPSIS
prss34
[-Q -A -f # -g # -H -O file -s SMATRIX -w # -Z #
-k # -v #
]
sequence-file-1 sequence-file-2
[
#-of-shuffles
]
prfx34
[-Q -A -f # -g # -H -O file -s SMATRIX -w # -z 1,3 -Z #
-k # -v #
]
sequence-file-1 sequence-file-2
[
ktup
]
[
#-of-shuffles
]
prss34(_t)/prfx34(_t)
[-AfghksvwzZ]
- interactive mode
DESCRIPTION
prss34
and
prfx34
are used to evaluate the significance of a protein:protein, DNA:DNA
(
prss34
), or translated-DNA:protein (
prfx34
) sequence similarity score
by comparing two sequences and calculating optimal similarity scores,
then repeatedly shuffling the second sequence and calculating
optimal similarity scores for each shuffle using the Smith-Waterman algorithm. An
extreme value distribution is then fit to the shuffled-sequence
scores. The characteristic parameters of the extreme value
distribution are then used to estimate the probability that each of
the unshuffled sequence scores would be obtained by chance in one
sequence, or in a number of sequences equal to the number of shuffles.
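The shuffle-and-fit procedure described above can be sketched in Python. This is a minimal illustration, not the prss34 implementation: it assumes a toy match/mismatch score with a linear gap penalty (in place of a scoring matrix with separate gap-open and gap-extend penalties), a uniform shuffle, and a method-of-moments fit of the extreme value (Gumbel) distribution. The function names sw_score, shuffled_scores, fit_gumbel and p_value are hypothetical.

```python
import math
import random
import statistics

EULER_GAMMA = 0.5772156649015329


def sw_score(a, b, match=5, mismatch=-4, gap=-6):
    """Best local alignment score (toy scoring, linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cur.append(max(0, diag, prev[j] + gap, cur[j - 1] + gap))
        best = max(best, max(cur))
        prev = cur
    return best


def shuffled_scores(seq1, seq2, n_shuffles=200, seed=0):
    """Score seq1 against n_shuffles uniformly shuffled copies of seq2."""
    rng = random.Random(seed)
    residues = list(seq2)
    scores = []
    for _ in range(n_shuffles):
        rng.shuffle(residues)
        scores.append(sw_score(seq1, "".join(residues)))
    return scores


def fit_gumbel(scores):
    """Method-of-moments estimates of Gumbel location (mu) and scale (beta).

    Assumes the shuffled scores are not all identical (beta must be > 0).
    """
    beta = statistics.stdev(scores) * math.sqrt(6) / math.pi
    mu = statistics.mean(scores) - EULER_GAMMA * beta
    return mu, beta


def p_value(score, mu, beta):
    """P(S >= score) for one comparison under the fitted distribution."""
    return -math.expm1(-math.exp(-(score - mu) / beta))
```

A typical use would be to compute sw_score(seq1, seq2) for the unshuffled pair, fit mu and beta to shuffled_scores(seq1, seq2), and pass the unshuffled score to p_value.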
This program is derived from
rdf2, described by Pearson and Lipman, PNAS (1988) 85:2444-2448, and
Pearson (Meth. Enz. 183:63-98). Use of the extreme value
distribution for estimating the probabilities of similarity scores was
described by Karlin and Altschul, PNAS (1990) 87:2264-2268. The
and expectations calculated by prdf.
prss34
calculates optimal scores using the same rigorous Smith-Waterman
algorithm (Smith and Waterman, J. Mol. Biol. (1981) 147:195-197) used by the
ssearch34
program.
prfx34
calculates scores using the FASTX algorithm (Pearson et al. (1997) Genomics 46:24-36).
prss34
and
prfx34
also allow a more sophisticated shuffling method: residues can be shuffled
within a local window, so that the order of residues 1-10, 11-20, etc.,
is destroyed but a residue in the first 10 is never swapped with a residue
outside the first 10, and so on for each local window.
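The local window shuffle can be illustrated with a short Python sketch. It assumes non-overlapping windows starting at the first residue, as in the description above; the function name window_shuffle is hypothetical and this is not the code used by prss34.

```python
import random


def window_shuffle(seq, window, rng=None):
    """Shuffle residues only within consecutive blocks of `window` residues."""
    rng = rng or random.Random()
    out = []
    for start in range(0, len(seq), window):
        block = list(seq[start:start + window])
        rng.shuffle(block)  # residues never leave their own block
        out.extend(block)
    return "".join(out)
```

With window=10, the residue composition of positions 1-10, 11-20, and so on is preserved in the shuffled sequence while the order within each block is randomized. A final block shorter than the window is shuffled on its own.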
EXAMPLES
- (1)
-
prss34
-v 10 musplfm.aa lcbo.aa
Compare the amino acid sequence in the file musplfm.aa with that
in lcbo.aa, then shuffle lcbo.aa 200 times using a local shuffle with
a window of 10. Report the significance of the
unshuffled musplfm/lcbo comparison scores with respect to the shuffled
scores.
- (2)
-
prss34
musplfm.aa lcbo.aa 1000
Compare the amino acid sequence in the file musplfm.aa with the sequences
in the file lcbo.aa, shuffling lcbo.aa 1000 times. Shuffles can also be specified with the -k # option.
- (3)
-
prfx34
mgstm1.esq xurt8c.aa 2 1000
Translate the DNA sequence in the mgstm1.esq file in all six
frames and compare it to the amino acid sequence in the file
xurt8c.aa, using ktup=2 and shuffling xurt8c.aa 1000
times. Each comparison considers the best forward or reverse
alignment with frameshifts, using the FASTX algorithm (Pearson et al.
(1997) Genomics 46:24-36).
- (4)
-
prss34/prfx34
Run prss34 or prfx34 in interactive mode. The program will prompt for the
names of the two sequence files and the number of shuffles to be
used.
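For scripted use, the non-interactive prss34 command lines in examples (1) and (2) can be assembled programmatically. The sketch below uses only options listed on this page (-v, -k is replaced by the trailing positional shuffle count, and -s); the helper name build_prss_argv is hypothetical, and running the result assumes a prss34 executable is on the PATH.

```python
import shlex


def build_prss_argv(seq1, seq2, shuffles=None, window=None, matrix=None):
    """Return an argument list for prss34: options first, then files."""
    argv = ["prss34"]
    if window is not None:
        argv += ["-v", str(window)]   # local window shuffle size
    if matrix is not None:
        argv += ["-s", matrix]        # scoring matrix, e.g. "BL62"
    argv += [seq1, seq2]
    if shuffles is not None:
        argv.append(str(shuffles))    # trailing #-of-shuffles argument
    return argv


if __name__ == "__main__":
    # Print the command lines for examples (1) and (2).
    print(shlex.join(build_prss_argv("musplfm.aa", "lcbo.aa", window=10)))
    print(shlex.join(build_prss_argv("musplfm.aa", "lcbo.aa", shuffles=1000)))
```

The returned list can be passed to subprocess.run() without a shell, which avoids quoting problems with unusual file names.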
OPTIONS
prss34/prfx34
can be directed to change the scoring matrix, gap penalties, and
shuffle parameters by entering options on the command line (preceded
by a `-'). All of the options should precede the file names and number of
shuffles.
- -A
-
Show unshuffled alignment.
- -f #
-
Penalty for opening a gap (-10 by default for proteins).
- -g #
-
Penalty for additional residues in a gap (-2 by default for proteins).
- -H
-
Do not display histogram of similarity scores.
- -k #
-
Number of shuffles (200 is the default).
- -Q -q
-
"quiet" - do not prompt for filename.
- -O filename
-
Send a copy of the results to "filename".
- -s str
-
Specify the scoring matrix. BLOSUM50 is used by default for proteins;
+5/-4 is used by default for DNA.
prss34
recognizes the same scoring matrices as fasta34, ssearch34, fastx34, etc;
e.g. BL50, P250, BL62, BL80, MD10, MD20, and other matrices in BLAST1.4
matrix format.
- -v #
-
Use a local window shuffle with a window size of #.
- -z #
-
Calculate statistical significance using the mean/variance
(moments) approach used by fasta34/ssearch or from maximum likelihood
estimates of lambda and K.
- -Z #
-
Present statistical significance as if a '#' entry database had
been searched (e.g. "-Z 50000" presents statistical significance as if
50,000 sequences had been compared).
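The effect of a database size such as "-Z 50000" can be illustrated with the usual relationship between a single-comparison probability and the number of entries searched. The sketch below assumes that convention (expected count = P x N); whether prss34 computes its reported values in exactly this way is an assumption, and the function names are hypothetical.

```python
import math


def expectation(p_single, n_entries):
    """Expected number of chance scores at least this good among n_entries."""
    return p_single * n_entries


def p_at_least_one(p_single, n_entries):
    """Probability of at least one such chance score among n_entries."""
    # Equivalent to 1 - (1 - p)**n, written to stay accurate for tiny p.
    return -math.expm1(n_entries * math.log1p(-p_single))
```

For small probabilities the two quantities are nearly equal; they diverge as the expected count approaches or exceeds 1.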
ENVIRONMENT VARIABLES
(SMATRIX)
the filename of an alternative scoring matrix file. For protein
sequences, BLOSUM50 is used by default; PAM250 can be used with the
command line option
-s P250 (or with -s pam250.mat). BLOSUM62 (-s BL62) and PAM120 (-s P120) can be selected in the same way.
SEE ALSO
ssearch3(1), fasta3(1).
AUTHOR
Bill Pearson
wrp@virginia.EDU