$Id: readme.v36 779 2011-06-14 14:38:47Z wrp $
  $Revision: 55 $

Version 3.6 of the FASTA programs is a significant update over version
3.5.  It uses the same underlying structure as FASTA35 (specifically
the strategies for ensuring accurate statistics), but it allows for
multiple high-scoring alignments to be shown, rather than just one.
This is the main functional difference between FASTA and BLAST -
BLAST could show multiple HSPs, FASTA did not.

>>June 14, 2011 (released as fasta-36.3.5a June, 2011)
(comp_lib7e.c, comp_lib8.c, compacc2.c)
Fix a serious bug in next_sequence_p() that caused a portion of the library to
be missed when long sequences filled the sequence buffer before the
slots were filled.

Make certain that thread buffers are cleared when running an expansion
script.

Return an extra '\n' before the final summary for consistency with
earlier versions.

>>June 2, 2011	 (released as fasta-36.3.5 June, 2011)
(comp_lib8.c, comp_lib5e.c, comp_lib7e.c)
Fix a bug that indicated that linked expanded sequences were
pre-loaded for alignment when they were not.

>>May 24, 2011 (released as fasta-36.3.5)
(comp_lib8.c, comp_lib7e.c, comp_lib5e.c, mshowalign2.c, compacc2.c,
initfa.c, param.h, scaleswn.c)

The in-memory versions of the program are allocating much more memory
than they actually use, causing the memory limits to cut in too soon.
Fix this by using a smaller MAXLIB_P (36000) for searches against
protein libraries, and expanding/contracting the aa1b_size more
sensibly.  Also add lost_memK value to track lost memory.  For protein
searches, lost memory is now around 15% of allocated memory (down from
40%).

Numerous fixes to improve formatting of HTML output.  Full statistics
parameters are now available with the fdata output.

Add fset_vars() to comp_lib8.c to set m_msg.max_memK properly.
Parameters have been modified to ensure less memory waste (all buffers
have 1000 sequences); Drop default 64-bit library memory limit to 8GB
(-XM8G, LIB_MEMK=8G).

>>May 25, 2011
(comp_lib8.c, comp_lib7e.c, comp_lib5e.c, mshowbest.c)

Add the '-b >1' option, which guarantees that at least 1 result is
shown, but otherwise limits results by E()-value.  The three forms of
-b are: '-b =10' shows exactly 10 results (never more or fewer if the
library is large enough); '-b 10' shows no more than 10 results,
limited by -E e_cut; and '-b >1' shows at least 1 result, but is
otherwise limited by -E e_cut.
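
The three forms can be summarized in a short sketch.  This is only an
illustration in Python, not the code in mshowbest.c: the function name
n_shown() and its arguments are hypothetical, and it assumes that a
hit is significant when its E()-value is below e_cut.

```python
def n_shown(evalues, e_cut, b_spec):
    """Return how many hits would be listed for a given -b value.

    evalues: E()-values of the library hits, best (smallest) first.
    e_cut:   the -E threshold.
    b_spec:  the text following -b, e.g. "10", "=10" or ">1".
    """
    # Assumption: "significant" means E() strictly below the threshold.
    n_signif = sum(1 for e in evalues if e < e_cut)
    if b_spec.startswith("="):
        # exactly N results, regardless of E()-value
        return min(int(b_spec[1:]), len(evalues))
    if b_spec.startswith(">"):
        # at least N results, otherwise limited by e_cut
        return min(max(int(b_spec[1:]), n_signif), len(evalues))
    # no more than N results, limited by e_cut
    return min(int(b_spec), n_signif)
```

For example, with five library hits of which three fall below e_cut,
"10" gives 3, "=10" gives 5 (the whole library), and ">1" gives 3.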

>>May 19, 2011
(comp_lib8.c, compacc2.c, param.h)
comp_lib8.c is a version of comp_lib7e.c that keeps sequences in
memory over multiple searches, but returns seqr_chains of buffers of
sequences as they are read, rather than waiting for everything to be
read. comp_lib8.c will automatically allocate up to 2 GB (32-bit
machines) or 8 GB (64-bit machines) to hold the sequence database in
a multiple query search.  This number can be increased or decreased
using the -XM# (megabytes) or -XM#G (gigabytes) option, or by setting
the LIB_MEMK environment variable.  -XM4G (LIB_MEMK=4G) makes 4GB
available for sequence libraries; -XM-1 makes all machine memory
available.
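
As a rough illustration of how the -XM values read, the sketch below
converts the text after -XM into megabytes.  It is not the parser in
comp_lib8.c: the function name is hypothetical, 1 GB is assumed to be
1024 MB, and the unit of a bare number given through LIB_MEMK is not
stated above, so only the -XM forms are covered.

```python
def lib_mem_megabytes(value):
    """Convert the text after -XM to megabytes; None means no limit."""
    text = value.strip()
    if text == "-1":
        # -XM-1: make all machine memory available
        return None
    if text[-1] in "Gg":
        # -XM#G: gigabytes (assumed 1024 MB per GB)
        return int(text[:-1]) * 1024
    # -XM#: megabytes
    return int(text)
```

With this reading, -XM4G and -XM4096 request the same amount of
memory.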

>>May 5 2011
(mshowbest.c)
Fix problems that prevented "-b align_number" from properly limiting
output with "-z -1".  "-z -1" also broke multiple HSPs (since no
threshold could be calculated); fixed.
(dropnfa.c)
Fix some offset arithmetic that prevented FASTA alignments from
extending to full length in do_walign().

>>May 4, 2011
(scaleswn.c)
Provide additional checks for division by low numbers in fit_llen2()
and fit_llens().  The similarities between fit_llen(), fit_llens(),
and fit_llen2() have been highlighted, and their differences
documented.  scaleswn.c now provides pstat_info, which writes all the
values required to re-calculate zscores or E()-values from raw scores.

>>May 2, 2011
(dropnfa.c)
Fix a problem with the traditional cgap(join)/optcut(opt) thresholds
(no longer used by default) caused by allowing ktup=3 for proteins.
The ktup=3 modification increased the cgap/opt thresholds by 6.

(comp_lib5e.c, comp_lib7e.c, comp_lib8.c)
Confirm identity of -m # and -m "F3 file.out".  Small differences fixed.

(mshowbest.c, mshowalign2.c)
Remove gi|12345 information from -m B, -m BB blast-like output.  NCBI
Blast does not display gi numbers.

>>Apr. 22, 2011
(doinit.c, initfa.c)
Several of the less common options have been changed to expanded
options, changing the meaning of -X (which now specifies expanded
options), as well as -o, -1, -B, -x, and -y.  -o now provides the
offset coordinates previously specified with -X; -B is now -XB, -o
-Xo, -x -Xx1,-1, and -y -Xy, e.g. -Xy32.

>>Apr. 19, 2011
(comp_lib7e.c, comp_lib5e.c, doinit.c, mshowbest.c)
Test latest version with -I interactive mode.  Modifications were
required to ensure that alignments go to outfd, not stdout, when a
filename is entered.  In addition, in interactive mode there can be
more scores shown than e_cut, so bbp->repeat_thresh must be set in
showbest(), not in the main() program.

>>Apr. 17, 2011
(comp_lib7e.c, doinit.c, compacc.c)

The FASTA programs now support multiple output files with different -m
out_fmt types using the -m "F# out_file" or -m "F#,#,# out_file"
option.  Normally, the -m out_fmt option applies to the default output
file, which is either stdout, or specified with -O out_file (or within
the program in interactive mode). With -m F, an output format can be
associated with a separate output file, which will contain a complete
FASTA program output.  Thus,

  ssearch36 -m 9c -m "FBB blast.out_file" -m "F10 m10.out_file" query library

will send the -m 9c output to stdout, but will also send -m BB output
to blast.out_file, and -m 10 output to m10.out_file.  Consistent -m
out_fmt commands can be sent to the same file by separating them with
','; e.g.:

  ssearch36 -m 9c -m "F9c,10 m9c_10.out_file" query library

Producing alternative format alignments in different files has little
additional computational cost.

One of the shortcomings of this approach is that it affects only the
output format, not the other options that modify the amount of output.
Thus, if you specify -E 0.001, that expect threshold will be used for
all the output files.  When a -m option can modify the output (e.g. -m
8 sets -d 0), that modification persists only for that file.
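
The structure of the -m "F#,#,# out_file" argument can be shown with
a small sketch.  This is an illustration only, not the parsing code in
doinit.c; parse_m_option() is a hypothetical name, and the sketch
assumes that a leading 'F' always introduces a separate output file.

```python
def parse_m_option(arg):
    """Split one -m argument into (format_codes, out_file).

    out_file is None when the format applies to the default output
    (stdout, or the file given with -O).
    """
    if arg.startswith("F"):
        # "F9c,10 m9c_10.out_file" -> codes "9c,10", file "m9c_10.out_file"
        codes, _, out_file = arg[1:].partition(" ")
        return codes.split(","), out_file.strip()
    return [arg], None
```

Applied to the first ssearch36 command above, the three -m arguments
give (['9c'], None), (['BB'], 'blast.out_file') and
(['10'], 'm10.out_file').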

>>Apr. 14, 2011
(initfa.c)
Fix bugs in e_cut_r calculation that made it much too low for
lalign36, and used the >1.0 divisor improperly for all programs
(change from e_cut_r = e_cut_r/divisor to e_cut_r = e_cut/divisor).

>>Apr. 11, 2011
(comp_lib5e.c, comp_lib7e.c, compacc.c)

The non-preload version of FASTA (comp_lib5.c) has been extended to
allow script expansion (comp_lib5e.c). To do this, the central score
calculation loops have been moved to getlib_buf_work(), just as
seqr_chain_work() was created for comp_lib7e.c.  Moreover, the
function used to build the link_file names, build_link_data(), is now
in compacc.c.  Differences between comp_lib5e.c and comp_lib7e.c have
been reduced.

>>Apr. 5, 2011
(comp_lib7e.c)
Fix issue with closing unopened link_lib_list_p when no results are
found. Remove no-sequence error message for link library file.

>>Apr. 1, 2011
(comp_lib7e.c)
The -e script.sh has been generalized to have all the capabilities of
a library file, in particular '@' specifies an indirect file, and
"script.sh #" allows a library type to be specified.  Thus, the
script.sh invoked by "@script.sh" should not produce a fasta file; it
should produce a file that contains the name of a fasta file (or
possibly some other format).  If '@' is used, the link_lib file
written to stdout will be prepended with '@', and treated as an
indirect file of file names.

(comp_lib5.c, comp_lib7.c, comp_lib7e.c)
Fix problem with null refstr (no Please cite:).

>>Mar. 31, 2011
(comp_lib7.c, comp_lib7e.c)
close_lib() was being called after each query.  This is incorrect for
versions (like comp_lib7) that keep the entire database in memory; the
files must be kept open to allow ranlib() to get long descriptions
(alternatively, a long description could be read initially).

(comp_lib5.c, comp_lib7.c, comp_lib7e.c) 
Fix query offset coordinates for long queries that are broken up.
Allow query library to have zero-length sequences without stopping
(queries now stop when end-of-file is reached).

(upam.h)
Fix gap penalties for BLOSUM80 matrix (change from -14, -2 to -10, -2).

>>Mar. 29, 2011
(comp_lib7e.c, doinit.c)

Add the ability to search an expanded set of sequences based on the
accessions from the initial search using "-e expand.sh" option.
If "-e expand.sh" is specified, the command:

    expand.sh link_acc_file > link_lib_file

is run by the program (fasta36, ssearch36, fastx36, etc), where
link_acc_file and link_lib_file are temporary file names produced by
the program. (The location of the temporary files can be specified
with the $TMP_DIR environment variable.)  link_acc_file contains a
list of accession strings for the statistically significant hits - the
information in the description line to the first space, e.g.

gi|121719|sp|P08010|GSTM2_RAT
gi|121746|sp|P09211|GSTP1_HUMAN

from a search against my pir1.lseg library.

"expand.sh" then reads that file, extracts the accession information,
expands the accessions to a new set of accessions, extracts the
expanded set of accessions from a database and writes them to
standard output (which is saved in the temporary link_lib_file
name). The sequences in expanded link_lib_file are then added to the
initial search, and included in the list of best scores (and
alignments) if their scores are statistically significant. The
additional sequences do not change the initial library size.

To test the expansion capability, use an expand.sh script that simply
cat's a file of homologs to stdout (which will go to link_lib_file and
be read), e.g. expand.sh contains "cat ../seq/gst.lib".

Building a program that can take an arbitrary list of accessions and
produce a library of homologs is more complicated (and slower), but
will allow a smaller database to be searched yet produce results
similar to those found from a larger database.
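
A minimal sketch of the core of such an expansion program is shown
below, in Python rather than as a shell script.  Everything in it is
an assumption made for illustration: the function names, the homologs
table (accession -> related accessions) and the library table
(accession -> description and sequence) are hypothetical, and the
accession is taken to be the fourth '|'-separated field of the
identifiers shown above.

```python
def extract_accession(line):
    """Return the accession from a line such as gi|121719|sp|P08010|GSTM2_RAT."""
    fields = line.strip().split("|")
    # Assumption: "gi|number|db|accession|name" identifiers; anything
    # else is treated as a bare accession.
    return fields[3] if len(fields) >= 4 else fields[0]


def expand(acc_lines, homologs, library):
    """Return FASTA text for the homologs of every accession in acc_lines.

    homologs: dict, accession -> list of related accessions.
    library:  dict, accession -> (description, sequence).
    """
    wanted = []
    for line in acc_lines:
        if not line.strip():
            continue
        for acc in homologs.get(extract_accession(line), []):
            if acc not in wanted:      # keep order, skip duplicates
                wanted.append(acc)
    records = []
    for acc in wanted:
        if acc in library:             # silently skip unknown accessions
            desc, seq = library[acc]
            records.append(">%s %s\n%s\n" % (acc, desc, seq))
    return "".join(records)
```

A wrapper script would read the lines of link_acc_file (its first
command-line argument), call expand(), and write the result to
standard output, which the program saves as link_lib_file.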

>>Mar. 24, 2011		(released as fasta-36.3.4)
(comp_lib7.c, dropfx.c, dropfz2.c, doinit.c)
Fix a bug in the new help display; identify and correct various memory
leaks and references to uninitialized data.

>>Mar. 15, 2011
(doc/fasta3x.me, fasta3x.tex)
The ancient, rarely updated, fasta3x.me has been replaced with
fasta3x.tex, with the goal of producing a more up-to-date, accurate,
and comprehensive document describing the capabilities of the FASTA
programs.  In addition, fasta36.1 has been updated/corrected.

(make/Makefile.os_x86_64)
Mac OS X clang 2.0, distributed with Xcode4.0, does not properly
optimize the smith_waterman_sse2_word() in smith_waterman_sse2.c when
clang -O is used to compile.

>>Mar. 4, 2011
(doinit.c)
Histograms are now turned off by default.  -H shows histograms for all
programs, not just the *_mpi (PCOMPLIB) programs.

>>Feb. 27, 2011
(make/Makefile36m.common, Makefile.pcom_t, Makefile.pcom_s)

The threaded programs are now the default, and the *_t versions of
programs have been removed from the Unix and Unix-like (Mac OS X)
distributions.  Windows versions can have either threaded or
non-threaded versions, since the threaded windows programs require an
additional library. Serial versions of the programs can still be built
by editing the make/Makefile36m.common file, and using
include Makefile.pcom_s instead of include Makefile.pcom_t.

The documentation has been edited to reflect these changes.

>>Feb. 24, 2011
(comp_lib5.c, comp_lib7.c, doinit.c, initfa.c, structs.h)
The FASTA programs have a much more informative help system.  If the
-DSHOW_HELP option is included in the Makefile, the following changes
occur: (1) the program is no longer interactive by default. To get
interaction, use the -I option (-I previously meant showing the
identity alignment in lalign; that option is now available with
-J). (2) fasta36 and fasta36 -h present a short help message. (3)
fasta36 -help provides a complete list of options.  The getopt()
option strings are now built dynamically.

>>Feb. 18-21, 2011
(doinit.c)
Fix missing -m 9i percent identity/alignment length.  Fix issues with
short sequence description in -m 6 (html) mode.

>>Feb. 17, 2011
(comp_lib5.c, comp_lib7.c, doinit.c)
Implementation of -m BB which provides completely BLAST-like output
(not just alignments).

Modification of the -b ### option.  Previously, -b 100 guaranteed 100
alignments; now -b 100 limits to 100 alignments if more than 100
alignments have E()-values less than the -E threshold. An '=' symbol
before the number reverts to the previous behavior; e.g. -b =100
guarantees 100 alignments, regardless of E()-value (-b =100 is
equivalent to -b 100 -E 100000.0, and disables other setting of the
E()-value threshold).

>>Feb. 10, 2011
(doinit.c, mshowalign2.c, c_dispn.c)
The FASTA programs have a new alignment option, "-m B", which shows
alignments in BLAST format (no context, coordinates on the same line,
BLAST symbols for matches and mismatches.)  This version does not
change the descriptions of the alignments, which are still FASTA like,
but the alignments themselves should look just like BLAST alignments.
Option -m BB makes output even more blast-like, showing not only the
alignments, but the initial set of high scoring sequences, and other
initial information, like BLAST+.

>>Feb. 9, 2011 	released as fasta-36.3.3
(dropfs2.c, initfa.c, comp_lib*.c)
Modify fasts36/fastm36 to allow up to ktup=3 for proteins; ktup=6 for
DNA (previously the max was ktup=2 for both).

Modify version string to match release version number.

>>Feb. 6, 2011
(initfa.c)
Fix bug that prevented fastm36 from working properly with DNA queries.

>>Jan. 31, 2011
(pcomp_subs2.c, work_thr2.c)
Fixes to fasty36_mpi/tfastx36_mpi problem.  Only fasty needs pascii[]
for alignments, but it wasn't being sent to workers. Fixed.  The MPI
versions of the programs have now been tested much more thoroughly.

>>Jan. 29, 2011
(comp_lib5.c, comp_lib6.c, comp_lib7.c, work_thr2.c, initfa.c,
param.h, dropfs2.c, scaleswt.c, dropfx.c)

Translated DNA shuffles (tfastx36, tfasty36) now shuffle DNA as
codons.  (1) Modify param.h pstruct to include shuffle_dna3,
initialized in resetp() [initfa.c] (2) modify buf_shuf_work() to use
ppst->zs_win and ppst->shuffle_dna3. (3) Add ppst->zs_off=0 to
scaleswt.c/process_hist(). (4) Fix some memory leaks in dropfx.c.
(5) Fix some other memory leaks in dropfs2.c.

>>Jan. 28, 2011
(initfa.c, scaleswn.c, mshowalign2.c)
Address crashes that occurred when novel scoring matrices and gap
penalties were specified, particularly for DNA.  Fix memory problem
with long (-L) sequence descriptions.

>>Jan. 23, 2011
(comp_lib7.c)
comp_lib7.c uses a more efficient strategy for reading chunks of
sequences that ensures that sequence data is contiguous for *_mpi
programs.  comp_lib7.c replaces comp_lib6.c, which will be removed.

>>Jan. 22, 2011
(many files)
Replace "mw.h" with "best_stats.h", a much more informative name.

(drop*.c, p_mw.h, w_mw.h)
Remove p_mw.h, w_mw.h from code base and update_params() from
drop*.c. These files are left over from the old p2_complib.c parallel
programs.

>>Jan. 21, 2011	released as fasta-36.3.2
(comp_lib5.c, comp_lib6.c, pcomp_subs2.c)
Fixes for MPI version of programs.  Earlier versions did not handle
DNA/translated DNA comparisons properly, because duplicated sequences
(forward/reverse strand) were not handled properly. The current code
produces the correct scores and alignments, but probably is much less
efficient than it should be.

>>Jan. 11, 2011
(initfa.c, scaleswn.c)
Re-enable DNALIB_LC (read lower-case DNA sequences as lower case).

Reset ktup to default after change for short query in multi-query
searches.

Address multiple issues associated with variable scoring matrices,
i.e. -s '?BP62'.  Introduce pst->pam_name for the actual scoring
matrix, to distinguish it from pst->pam_file, which can correspond to
the std_pam->abbrev, for values like BP62 (which encodes both a matrix
and a specific set of gap penalties).  Ensure that the new scoring
matrix is initialized and extended correctly.  Fix some issues with
scoring matrix names in scaleswn.c

>>Jan. 5, 2011
(dropnnw2.c, dropgsw2.h, global_sse2.c,h, glocal_sse2.c,h)
Include SSE2 optimization for global/global and global/local alignments
provided by Michael Farrar.  Global and glocal alignments are now 20X
faster.

>>Jan. 5, 2011	re-released as fasta-36.3.1
(initfa.c, last_tat.c)
Fix bug resetting pst.e_cut_r for DNA sequences.  Modify last_tat.c
code to use pre-loaded sequence if available. Remove last_tat.c
PCOMPLIB code.

>>Jan. 3, 2011	 released as fasta-36.3.1
(comp_lib5.c, comp_lib6.c)
Add >>><<<, >>>/// to -m 9,10 output for separating multiple query
searches.  Also clean up extra >>>query line before alignments when no
alignments are shown.

>>Dec. 16, 2010
(dropgsw2.c, dropnnw2.c, dropnsw.c, comp_lib5.c, comp_lib6.c)
Fix bug that caused ssearch to not invert coordinates for
reverse-complement DNA alignments (I never imagined using ssearch for
DNA) in dropgsw2.c, dropnnw2.c, and dropnsw.c.  Add SEQ_PAD to aa0[1]
(rev-comp copy) in comp_lib5.c, comp_lib6.c.

>>Dec. 14, 2010
Modify CIGAR strings for frameshifts, including 1F and 1R for forward
and reverse frameshifts.  Extensive documentation updates.
doc/fasta36.1 is the most comprehensive and accurate description of
FASTA options.

>>Dec. 1, 2010
(drop*.c, comp_lib5.c, comp_lib6.c)
Correct problems with copying for recursive sub-alignments.  Correct
bug in adler32_crc calculation that suggested a problem with continued
library sequences that did not exist.

(initfa.c, defs.h)
Use MAXLIB, rather than MAXLIB+MAXTST for comp_lib6.c, which
pre-allocates the sequence database.  Increase MAXLIB.

>>Nov. 24, 2010
(drop*.c, drop_func.h)
Modify drop*.c functions that do recursive sub-alignments to avoid
modifying the aa1[] sequence array, which conceivably could be in use
by other threads. do_walign() now has const *aa0 AND const *aa1.  To
prevent modification of aa1, sub-regions of aa1 are now copied into
newly allocated arrays.

>>Nov. 20, 2010
(cal_cons.c, mshowbest.c, mshowalign2.c, doinit.c)
The -m 9C option displays an alignment code in CIGAR format. (-m 9c
shows the older alignment encoding.)

>>Nov. 16, 2010		(beginning of fasta-36.3.*, verstr 36.07)
(initfa.c, apam.c, upam.h, param.h)

Provide the ability to adjust the scoring matrix based on the length
of the query sequence for alignments using a protein alphabet (this
could certainly be extended to DNA as well).  By including a '?'
before the scoring matrix, e.g. -s '?BP62', a shallower matrix will be
chosen if the entropy of the selected matrix (i.e. bit score per
aligned position) times the length of the protein query is
<= DEF_MIN_BITS (defs.h; currently 40, but this value should be set
based on the library size).  The FASTA programs include BLOSUM50 (0.49
bits/pos) and BLOSUM62 (0.58 bits/pos) but can range to MD10 (3.44
bits/position). The variable scoring matrix option searches down the
list of scoring matrices to find one with information content high
enough to produce a 40 bit alignment score.  This option is included
primarily for metagenomics scans, which can include relatively short
DNA reads, and correspondingly short protein translations.

Also correct the short-query modification to ktup, so that it works
properly with translated FASTX/FASTY searches (ktup is set to 1 when
the query_length/3 <= 20).
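
The selection rule can be sketched as follows.  This is an
illustration under stated assumptions, not the code in initfa.c or
apam.c: the function names are hypothetical, only the three matrices
quoted above are listed (the real list is longer), and the list is
assumed to be ordered from deepest to shallowest.

```python
# (name, bits per aligned position); only the values quoted in the text.
MATRICES = [("BLOSUM50", 0.49), ("BLOSUM62", 0.58), ("MD10", 3.44)]
MIN_BITS = 40.0   # DEF_MIN_BITS


def pick_matrix(query_len, requested="BLOSUM62",
                matrices=MATRICES, min_bits=MIN_BITS):
    """Walk down the list from the requested matrix until
    bits/position * query length exceeds min_bits."""
    names = [name for name, _ in matrices]
    start = names.index(requested)
    for name, bits in matrices[start:]:
        if bits * query_len > min_bits:
            return name
    # No matrix reaches min_bits: fall back to the shallowest one listed.
    return matrices[-1][0]


def short_query_ktup(dna_query_len, ktup):
    """ktup for a translated search: 1 when query_length/3 <= 20."""
    return 1 if dna_query_len / 3 <= 20 else ktup
```

With these values, a 200-residue query keeps BLOSUM62 (0.58 * 200 =
116 bits), while a 50-residue query (29 bits with BLOSUM62) moves down
the list to MD10.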

(dropnfa.c, dropfx.c, dropfz2.c)
Shuffled sequence alignment scores are calculated identically to
library alignment scores. Previously, optimized scores were calculated
for all shuffled sequences for FASTA type alignments, even though
typically 20 - 40% of library sequences were optimized.  Now the two
sampling strategies are consistent, though this may cause problems
when only a small fraction of sequences are optimized.

Small changes to provide consistent dropnfa.c, dropfx.c, dropfz2.c
parameter display, and fix display with -m 10.

>>Nov. 15, 2010
(initfa.c)
Enable statistical thresholds by default (previously, they were
enabled with -c -1 or -c 0.01 or anything < 1.0).  The "classical"
join/opt threshold behavior can be restored with -c O (upper case
letter O), or by providing an optimization threshold >
1.0. Statistical thresholds dramatically speed up searches (typically
2-fold), and provide more accurate statistical estimates.  The old
join/optimization thresholds were optimized for BLOSUM50, and other
1/3-bit scaled scoring matrices, and did not work well with BLOSUM62.
Statistical thresholds have been tested extensively, particularly with
-z 21, and produce much more reliable statistical estimates.

>>Oct. 14, 2010
(Makefile.fcom, cal_cons.c)
Edits to re-enable compilation and successful execution of
tfasta36(_t). tfasta36 has been superseded by tfastx36(_t), which is
faster, and treats frameshifts as a different type of gap.

>>Oct. 13, 2010
(mshowbest.c)
Make it more difficult to request more description/scores than are
available.

>>Sep. 30, 2010	 (released as fasta-36.2.7)
(comp_lib5.c, comp_lib6.c, dropnfa.c, dropfx.c, dropfz2.c)
Fix bugs in DEBUG versions with adler32_crc calculations on
overlapping sequences.  Add more informative error messages when
debugging.  Fix a problem with hist2.hist_a != NULL with some
compilers. Fix formats for some debugging error messages in dropnfa.c,
dropfx.c, and dropfz2.c.

Also fix repeat_threshold calculation for very short sequences, to
guarantee that all matches as good as the best match with the sequence
are found.  Fix some problems that prevented FASTA from finding short
repeats with short queries.

This version of the FASTA36 package offers an alternate main program
file, comp_lib6.c, which reads the entire database into memory before
doing the search.  Using comp_lib6.c can dramatically speed up
searches with multiple queries (there is no advantage with single
query sequences) on large multi-core computers, as each search is done
without re-reading the database.  On a 48-core processor, we see
speedups greater than 40X with ssearch36_t and fastx36_t.  To enable
comp_lib6.c, edit the make/Makefile36m.common file to comment out
lines referring to comp_lib5.c and un-comment lines referring to
comp_lib6.c.

>>Sep. 29, 2010
(comp_lib5.c, comp_lib6.c, mshowbest.c)
Added -m 8C option, which mimics BLAST+ tabular with comment lines
format.

>>Sep. 17, 2010
(dropfx.c)

Fix a bug in dropfx.c/do_walign() that modified library sequences.
(This only caused a problem with comp_lib6.c, which reads the entire
database into memory and re-uses sequence buffers.)  Check sequence
consistency with adler32 CRC calculation.

>>Sep. 15, 2010
(mshowbest.c, mshowalign2.c)
Change the output format slightly.  E2() expect values (-z 21+) no
longer contain the library size (which is always the same as the
E(library_size) value), and the -m 9 +- line no longer contains the
frame information, since it is redundant. (The redundant rev-comp
remains on the >-- HSP lines.)

>>Sep. 14, 2010
(comp_lib5.c, mshowbest.c, drop*.c, cal_cons[f].c, etc.)
Implement BLAST -m 8 tabular output.

>>Sep. 9, 2010

(compacc.c) Fix a bug in pre_load_best() that disabled
-L long sequence descriptions.

(doinit.c) Fix a bug that prevented non-overlapping alignments from
being displayed when the -E threshold was changed.  Before -E 0.001
would disable additional alignments.  Now, -E "0.001 0" is required to
disable the additional alignments.

(drop*.c) The display of search parameters has changed to ensure that
gap penalties are displayed on the same line as the scoring
matrix. Previously, the FASTA "Parameters:" section looked like:

Parameters: BL50 matrix (15:-5)xS ktup: 2
 join: 42 (0.0944), opt: 30 (0.601), open/ext: -10/-2, width:  16
 Scan time:  0.450

With fasta-36.2.7 (and later), the Parameters: section is:

Parameters: BL50 matrix (15:-5), open/ext: -10/-2
 ktup: 2, join: 42 (0.102), opt: 30 (0.574), width:  16

The [T]FAST[X/Y] Parameters: section includes the frameshift/substitution penalties (tfasty36):

Parameters: BL50 matrix (15:-5) open/ext: -12/ -2 shift: -20, subs: -24
 ktup: 2, E-join: 0.5 (0.224), E-opt: 0.1 (0.0536),  width:  16

>>Aug. 3, 2010	(released as fasta-36.2.6)
(scaleswn.c)

Modifications to calc_thresh(), proc_hist_ml(), to better accommodate
search strategies (fast?? with statistical thresholds) that provide
complete scores only for a high-scoring fraction of sequences.  For
some query sequences, the E()-values from the database were sometimes
much "worse" than E2()-values, an observation that is
counter-intuitive (if parameters are estimated against shuffled
related sequences, the E()-values should get worse, not better).  For
some queries, the result was very dramatic (E() < 1E-80, E2() <
1E-150).  This error appears to occur because the z-trim or mle_cen
thresholds are including many related sequences.  -z 2 was modified to
censor more sequences when only a subset are scored, and -z 1 was
modified to adjust z-trim more carefully.  As a result, z-trim was
reduced, excluding more sequences.  If too many sequences are excluded,
then regression statistics do not work, and the program fails over to
Altschul-Gish statistics.

-z 21+ modified so that MLE statistics are used for shuffle E2()
values if Altschul-Gish statistics are used for the library
E()-values.

>>July 30, 2010
(comp_lib5.c, pcomp_subs2.c)

Fix bug in buf_align_seq() that allowed buffer over-runs with long DNA
sequences with MPI.  Checks on buffer over-runs are now included in
pcomp_subs2.c/put_rbuf(),get_wbuf().  Aug. 1, 2010, fixed similar bug
in buf_shuf_seq().  -z 21 now works with long DNA sequences.

>>July 28, 2010
(mshowalign2.c)
Fix lalign36/showalign() to show best sub-optimal E()-value, not
bptr[0] E()-value (often identical).

>>July 19, 2010	(released as fasta-36.2.5)
(wm_align.c, dropfx.c,dropfz2.c)
Fix some off-by-one boundary calculations to ensure that every query
that can fit into a library is aligned correctly.

>>May 18, 2010
Implement comp_lib5.c, which simplifies the structure of
comp_lib4.c by moving some calculations into functions.

>>May 10, 2010
Fix problem setting nshow with small library in interactive mode.

>>May 5, 2010  fasta-36.2.3
Fix bug that prevented shuffled scores from being used properly for
small databases (prss capability was lost).

>>May 2, 2010  fasta-36.2.2
Fix problem with tat_score values from fasts and fastm.  fasta35 did
not re-calculate the z-score after last_stats().  fasta36 does, so it
must ensure that the e-value (sometimes p-value) is used correctly.

>>Apr. 29, 2010
More extensive testing of the MPI-PCOMPLIB programs revealed some
problems sending sequences when two (or more) frames for the same
sequence were used.  This problem has been addressed, and large scale
testing of fastx36_mpi (with 100K sequence queries in a run) works.

>>Apr. 16,19, 2010
(pcomp_subs2.c, comp_lib4.c, work_thr2.c)
The MPI-PCOMPLIB parallel version of the FASTA36 programs is
working. This PCOMPLIB version takes a very different approach from
the older PVM/MPI parallel programs (p2_complib2.c/p2_workcomp2.c) -
it works virtually identically to the threaded programs (sharing the
same work_thr2.c code and get_rbuf/put_rbuf() (manager) and
get_wbuf/put_wbuf() (worker/thread) functions).  As a result, in this
initial version, the database is NOT distributed to the nodes.  During
multiple searches, the library is re-read each time.  However, load is
distributed to workers exactly the way it would be for the threaded
system, so the workload should scale.

To distinguish them from the earlier mp35compsw, mp35compfa, etc, the
new versions are ssearch36_mpi, fasta36_mpi, etc.

The programs work with multiple queries, produce multiple
sub-alignments, and work with -m 9c encodings.

>>Apr. 7, 2010
(various Makefiles, comp_lib4.c, pcomp_subs2.c, thr_bufs2.h,
thr_buf_structs.h)

The MPI version of the threaded programs, ssearch36_mp, now compiles.
pcomp_subs2.c replaces pthr_subs2.c, and thr_bufs.h ->
thr_buf_structs.h, thr.h -> thr_bufs2.h, and pcomp_bufs2.h has been
added as the equivalent of thr_bufs2.h for PCOMPLIB.

>>Apr. 2, 2010
(comp_lib4.c, work_thr2.c, compacc.c)
Implement init_aa0(), which isolates code that calls init_work and
sets up aa0s, aa1s, f_str[1] (reverse complement) and qf_str so that
the same code is used by the serial, threaded, and (future) PCOMP
versions.

(work_thr2.c)
work_thr2.c now contains code for either threaded or PCOMPLIB
processes. Threaded processes get stuff from work_info; PCOMPLIB
processes get the same information via messages sent from init_thr()
called by main().

>>Mar. 30, 2010
(comp_lib4.c, work_thr2.c, thr_bufs.c +pcomp_subs2.c)

The data buffers used to communicate between workers and threads
have been restructured to separate the old buf2_str, which contained
sequence, score results, and alignment results, into three buffers,
buf2_data_s, buf2_res_s, and buf2_ares_s, separating sequence data
from scores and alignments.  This was done to simplify communication
in the MPI/PVM environment. Workers should be able to return results
directly into the appropriate buffer.

>>Mar. 25, 2010		fasta-36.2.1

(dropfx.c, dropfz2.c)
Found/removed two "static" declarations in small_global that caused problems
with [t]fastx/y with threaded alignments.

>>Mar. 24, 2010  (now version 36.06 with threaded alignments)
(dropnfa.c)
The DNA band aligner in dropnfa.c was not thread safe.  This has been
fixed.

>>Mar. 23, 2010
Code for pre-loading/threaded-aligning sequences has been
significantly cleaned up.  Checks are made before RANLIB() and
re_getlib() in showbest() and showalign() that should be consistent
with annotations AND functions that cannot encode alignments.

Add mshowalign2.c (which does not do PCOMPLIB) to provide threaded
alignments.  build_ares_code() and buf_do_align() modified to ignore
MX_M9SUMM so that alignments are produced whenever demanded (still
does not do alignment if a_res is available).

>>Mar. 22, 2010
(comp_lib4.c, work_thr2.c, thr_bufs.h)

comp_lib4.c has been modified to thread the alignment encoding
(build_ares) for -m 9c. If m_msg.quiet and alignments are required for
showbest(), then the program identifies the number of alignments
required, reads the sequences (and annotations) into a buffer, and
sends them to the threads to be encoded.  Then, when showbest() is
called, bbp->have_ares has been set, and the alignments are not
re-calculated.  This should be extended to thread actual alignment
production, and additional work is required to clean-up the sequence
and bline(description) buffers before a second search.

>>Mar. 17, 2010
(comp_lib4.c, dropnfa,fx,fz2.c)
Modifications to provide more sensible E2() statistical estimates with
threshold-heuristic comparison functions and -z 21.  Also fixed bug
that caused the wrong zs_off to be used with -z 21. dropnfa,fx,fz2.c
now optimize all scores when shuff_flg is set.

>>Mar. 16, 2010
(comp_lib4.c, scaleswn.c, drop*.c)

A new, relatively consistent, statistical estimation strategy has been
introduced for the heuristic programs that optimize only a fraction of
scores (fasta36, [t]fast[xy]36).  Statistics-based heuristic
thresholds can increase search speed 2 - 4-fold by doing band
optimization on only a small fraction of library sequences (with the
-c -1 option, about 10% of alignments are band-optimized, compared
with more than 50% with the classic thresholds).  However, optimizing
only a small part of the library produces two classes of scores,
optimized (10% or less) and non-optimized, with different statistical
properties.  fasta36 addresses this problem by calculating statistical
estimates only for the optimized scores, and then correcting the
significance of the score by accounting for the frequency of
optimization.  For example, sampling only 5% of scores increases the
z-value (std. deviation above the mean) by -logE(0.05)*sqrt(6)/Pi =
2.34 which offsets the z-score by 23.4.  This effect is only seen when
the -c option is used to specify statistical thresholds, and is most
apparent when looking at the histogram, which will be offset by the
appropriate z-score.

This strategy appears to produce more accurate statistics in general,
but can produce less accurate statistics for the heuristic programs when
the -z 21 option is used.

>>Mar. 3, 2010

(comp_lib4.c)
Fix the new stats[] sampling strategy to sample >60K sequences more
uniformly.  The old code massively over-sampled later sequences,
because of several bugs. The new code works as expected.  The first
60K sequences are represented about 30% more than the rest, but after
60K, sequences are sampled moderately uniformly.  The older
SAMP_STATS_MORE is uniform across all the scores.

(build_ares.c)
Move code to produce chains of alignments (a_res) produced by
do_walign, followed by subsequent calls to calc_id, calc_code, into a
new function, build_ares_code(), which is shared by the
serial/threaded and parallel (p2_workcomp.c) programs.  This is a
first step towards having the parallel programs produce multiple HSP
alignments.

>>Feb. 27, 2010

(lib_sel.c)
Fix problem with new chained library access that prevented more than
two files from being searched.  Also, library name string has been
lengthened to allow a list of libraries to be displayed.

>>Feb. 26, 2010

Parallel programs have been tested in both PVM and MPI versions, and
some additional bugs have been fixed.  Currently, the PVM/MPI versions
are fully functional, but only with FASTA35 capabilities.  The new
multiple HSP alignments and best-shuffle E2() scores are not yet
available.

>>Feb. 24, 2010

Fix some leaks, largely due to more complex alignment data structures
for multiple alignments.  Currently, all the major leaks are in data
structures allocated in main(), and which I don't bother to
de-allocate (mostly library buffer memory).

Change zsflag > 10 to zsflag >= 10 && zsflag < 20 in three places.
Too many shuffles were being done with zsflag==21.

>>Feb. 22, 2010

Begin conversion of p2_complib2.c/p2_workcomp.c.  Very old code to
allocate aln_d_base removed from v35 and v36. No code for best list
shuffle, or multiple high-scoring alignments.  However, the code now
works properly with statistical thresholds. (Changes made to
p2_complib2.c, p2_workcomp.c to update pst struct after last_param()).

>>Feb. 19, 2010 fasta-36x6

Fix issues with -z 26 statistics.  Add description of E2() statistics.

Added option to specify statistics routine for best-shuffled
statistics independently of library statistics by specifying a second
-z option.  Thus, -z "21 2" uses regression scaled statistics for the
library estimate, and MLE statistics for the best-shuffled estimates.

>>Feb. 17, 2010	fasta-36x5

Some of the simplifications dealing with threads in comp_lib4.c failed
on some compilers and architectures. The code for terminating threads
has been modified to allow sequence buffers with zero entries, to
simplify the empty_buffer logic.  There is now an explicit option to
terminate threads by setting lib_bhead_p->stop_thread.  However, this
flag is never set, as rbuf_done() stops the threads instead.

Also fix problem with stats_idx being associated with wrong buf2_p in
two frame searches.

>>Feb. 15, 2010	fasta-36x4

fasta36 can now calculate both "search" (E()) and "shuffled" (E2())
E()-values and display them in the best scores and
alignments. If the -z option is greater than 20, then two E()-values are
calculated, one from the search (e.g. -z 1 uses regression scaled
scores) and a second derived from shuffling the high scoring
sequences.  The high-scoring sequence shuffled scores are
approximately equivalent to doing a PRSS (pairwise shuffle), but more
efficient.  High-scoring shuffled E()-values (labeled E2()) are
typically 2 - 5-fold more conservative for average composition
proteins, and 10 - 20X more conservative for biased composition
proteins.

Fix another bug in -S alignment scores vs opt scores in ssearch36 (see
Feb. 8).

>>February 12, 2010
(prev. version 142)

Create comp_lib4.c (from comp_lib3.c), which simplifies some of the
processes for handling buffers of results (no more empty_reader_bufs)
and enables shuffles of high-scoring sequences to evaluate significance.

>>February 8, 2010

Fix a problem with scores and E()-values for SSEARCH sub-alignments
when the -S option is used.  When the -S option was used to ignore
lower-case residues in query or library for the initial score, the
final alignments include the lower-case masked residues.
SSEARCH36 was using the non-masked alignment score, rather than the
original score (FASTA36 and [T]FAST[XY]36 used the masked score).
This was incorrect, as the statistics are calculated for masked
sequences.  The corrected version calculates both a non-masked and a
masked score, where the masked score (for subalignments) uses the
non-masked alignment.

[T]FAST[XY]36 had a related problem, which is that when multiple
sequences are in the query with the same pam2p[0] (no -S) score, then
the wrong alignment could be shown with the initial scores.  Fixing
this requires that the alignment routine only work on the region
specified from the initial band (fixed in dropnfa.c, dropfx.c, and
dropfz2.c).

>>February 4, 2010

The more efficient statistical thresholds in fasta36 have been
disabled by default.  They can be turned on with -c -1, or by setting
thresholds (-c "0.05 0.2" would set E_band_opt to 0.05 - target 5% of
sequences - and E_join at 20% target).

My initial implementation produced very inaccurate statistics,
presumably because only a small fraction of unrelated sequences were
being band-optimized (fasta35 typically optimized about 60% of library
sequences, fasta36 with statistical thresholds optimizes about 2%,
which causes a 2 - 3X speed increase). The sampling strategy for
fasta36, and [t]fast[xy]36 scores has been adjusted to provide
relatively accurate scores for searches that optimize only a small
fraction of sequences.  On the cases I have tested, statistical
accuracy is comparable to, or better than, the version 35 programs,
but probably not as robust as ssearch estimates.

>>January 29, 2010

The logic to predetermine where scores went for shuffling breaks when
some scores are not calculated (e.g. -M 200 - 300).  Fix by using
nstats as the index for nstats < MAX_STATS, and then use stats_idx
afterwards.

Provide more efficient score sampling logic.  The old method (left
over from fasta34 or earlier) generated a random number for every
sequence after MAX_STATS; if it was less than MAX_STATS, the sample
was used. This logic is still available with -DSAMP_STATS_MORE.  The
new logic samples every other sequence between MAX_STATS and
2*MAX_STATS, every third between 2*MAX_STATS and 3*MAX_STATS, etc., and
randomly replaces one of the stats scores.  For 430K SwissProt, this
reduces the number of samples from 178K to about 145K, and reduces the
number of calls to the random number generator from 430K to 85K.
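
The new sampling rule can be sketched as follows.  This is an
illustration under stated assumptions, not the code in comp_lib3.c:
sample_slot() and its arguments are hypothetical names, the small
generator is a stand-in, and the sketch is not expected to reproduce
the exact sample counts quoted above.

```c
#include <stdint.h>

/* Hypothetical sketch: return the stats[] slot for the n-th scored
   sequence (n counts from 0), or -1 if the sequence is not sampled. */
long sample_slot(long n, long max_stats, uint32_t *rng_state)
{
  long block, stride;

  if (n < max_stats) return n;   /* the first max_stats scores are all kept */

  block = n / max_stats;         /* 1 for [max_stats, 2*max_stats), 2 for the next range, ... */
  stride = block + 1;            /* every 2nd sequence, then every 3rd, ... */
  if ((n - block * max_stats) % stride != 0) return -1;

  /* A sampled score replaces a randomly chosen existing entry, so the
     random number generator is called only for sampled sequences.
     The generator here is a simple LCG step (an assumption).          */
  *rng_state = *rng_state * 1664525u + 1013904223u;
  return (long)((*rng_state >> 8) % (uint32_t)max_stats);
}
```

A caller would skip the shuffle/alignment entirely when -1 is
returned, and otherwise store the score in the returned slot.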

>>January 28, 2010

(comp_lib3.c, mrandom.c) Tests of ssearch36 statistical accuracy
suggest that the default statistical estimates (-z 1) are not as
accurate as they should be with BLOSUM62, -11/-1.  Both -z 11 and -z 2
work better.  In FASTA35, -z 11 - 15 caused a 2X-slowdown (actually
more) because EVERY library sequence was shuffled, even though only a
fraction of the sequences (for libraries > 60,000) would be used for
the statistical calculation.  comp_lib3.c uses a more sophisticated
strategy for sampling scores after 60,000 so that sequences are only
shuffled and aligned if they will be used in the statistical
calculation.  Doing this on SwissProt, with 430,000 sequences, means
that ~180,000 additional shuffle alignments are done, not 430,000
additional.

However, using -z 11 with the threaded program was much more than
2X-slower -- random() is not re-entrant, and is designed to provide a
consistent set of random numbers over threads, so threads were waiting
on the random number generator, with a big performance penalty.  Using
code from Wikipedia, I implemented a random number generator
(mrandom.c) that saves a local copy of state, so threaded -z 11 has
the correct performance penalty.
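
The idea of a generator with caller-owned state can be sketched as
below.  The names (local_rng, local_rng_seed, ...) and the choice of
a linear congruential generator are assumptions for illustration;
the actual generator and interface in mrandom.c may differ.

```c
#include <stdint.h>

/* Hypothetical per-thread generator state; each thread owns one. */
struct local_rng { uint32_t state; };

void local_rng_seed(struct local_rng *r, uint32_t seed)
{
  r->state = seed;
}

/* One step of a 32-bit linear congruential generator.  Only *r is
   read or written, so no lock is needed and threads never wait on
   each other.                                                     */
uint32_t local_rng_next(struct local_rng *r)
{
  r->state = r->state * 1664525u + 1013904223u;
  return r->state >> 1;          /* drop the weakest low-order bit */
}

/* Value in [0, n), n > 0 (slight modulo bias is ignored here). */
uint32_t local_rng_below(struct local_rng *r, uint32_t n)
{
  return local_rng_next(r) % n;
}
```

Because the state is local, two threads seeded identically produce
the same sequence regardless of how their calls are interleaved.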

>>January 25, 2010 (initfa.c 36.04 January 2010)

(dropfz2.c, aln_struct.h) At long last, tfasty36 correctly produces
multiple alignments on the reverse strand. (Jan. 26, 2010) Fixed
introduced bug in fasty36 that used wrong offset in recursion.

>>January 17, 2010

Extensive changes have been made to all the drop_* functions, so that
multiple alignment results are properly sorted from highest to lowest
sw_score. dropnfa.c, dropgsw2.c, dropfx.c and dropfz2.c now all use
similar strategies to calculate non-overlapping alternative alignments.
score_thresh thresholds are applied to rst.score[ppst->score_ix]
appropriately for all recursive functions.

>>August 24, 2009

Statistical thresholds have been adjusted to produce approximately
the correct number of joins/band optimizations.  The
approximate fraction of joins/band optimizations is now shown in the
results.

>>August 21, 2009

fasta/fastx/fasty/tfastx/tfasty now use statistically based thresholds
for joining short segments and deciding to do a band optimization --
similar to the threshold strategy used by BLAST.

The statistical thresholds are set with the -c option, which was
previously used to set optcut.  The -c option now has three ranges:

-c < 0       -- use the old FASTA thresholds, calculated in the same way
0 < -c < 1.0 -- use the statistical thresholds and set E_opt_cut.
-c >= 1.0    -- use the old FASTA threshold, and specify it.

For 0 < -c < 1.0, a second argument can be supplied (-c "0.02 0.1")
for the joining E()-threshold.  If this value is < 1.0, it is used as
E_join; if it is > 1.0, E_opt_cut is multiplied by the value to get
E_join.
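
The three ranges and the optional second value can be sketched as
below.  This is not the parsing code in initfa.c; struct c_thresh,
parse_c_opt() and the field names are hypothetical.  Cases the text
does not define (-c 0, or a second value of exactly 1.0) are handled
by assumption and marked as such.

```c
#include <stdio.h>

/* Hypothetical result of parsing the -c argument. */
struct c_thresh {
  int use_stat;       /* 1: statistical thresholds; 0: old FASTA thresholds */
  int optcut;         /* explicit old-style threshold (-c >= 1.0), else 0   */
  double e_opt_cut;   /* set only when use_stat == 1                        */
  double e_join;      /* 0.0 if no second value was supplied                */
};

/* Returns 0 on success, -1 if the argument cannot be interpreted. */
int parse_c_opt(const char *arg, struct c_thresh *t)
{
  double v1 = 0.0, v2 = 0.0;
  int n = sscanf(arg, "%lf %lf", &v1, &v2);

  /* -c 0 is not covered by the documented ranges: rejected (assumption) */
  if (n < 1 || v1 == 0.0) return -1;

  t->use_stat = 0; t->optcut = 0; t->e_opt_cut = 0.0; t->e_join = 0.0;

  if (v1 < 0.0) return 0;                            /* old thresholds, calculated */
  if (v1 >= 1.0) { t->optcut = (int)v1; return 0; }  /* old threshold, specified   */

  t->use_stat = 1;                                   /* 0 < -c < 1.0 */
  t->e_opt_cut = v1;
  /* second value < 1.0 is E_join itself; otherwise it multiplies
     E_opt_cut (exactly 1.0 is treated as a multiplier: assumption) */
  if (n == 2) t->e_join = (v2 < 1.0) ? v2 : v1 * v2;
  return 0;
}
```

For example, both -c "0.02 0.1" and -c "0.02 5" give E_opt_cut = 0.02
and E_join = 0.1 in this sketch.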

>>August 19, 2009

Implement Lambda/K/H based c_gap, opt_cut in dropnfa.c, dropfx.c
(fastx), and dropfz2.c (fasty).  Add ELK_to_s() to scaleswn.c.

>>August 11, 2009

Fix bug in dropfx.c that used the wrong variables for calculating
offsets into a long DNA sequence for subset alignments.

Stop putting sw_score in score[0] when no score[0] was calculated.
Use 0 instead.

>>July 31, 2009

(dropgsw2.c) Fix problems with dropgsw2.c that allowed poor
sub-alignments to be shown.  Consolidate merge_ares_acc() for all the
functions.  Add pst.do_rep to disable multiple alignments.

>>July 6, 2009

(initfa.c, apam.c, complib2.c, p2_complib.c) move changes for
validate_novel_aa() from fasta35.

(initfa.c) Enable checks for unusual characters ('Uu' in proteins) for
many more programs with the -p option.

>>June 16, 2009

Modify statistical sampling strategy to greatly simplify the
calculation.

>>May 15, 2009

Fix bug in lav2ps.c, lav2svg.c that occurred when displaying very long
sequence alignments (e.g. genome alignments).  The maximum coordinate
is set properly now.

>>May 5, 2009

(initfa.c) Fix bug (int e_cut in pgm_def_arr[]) that prevented e_cut
from being set properly for lalign for DNA.

>>May 4, 2009

The functions that return multiple sub-alignments (HSPs) after the
best alignment have been modified to ensure that alignments are
returned sorted by score, by merging the list of alignments found to
the left and right of the best alignment. 

>>April 28, 2009

(p2_complib2.c, p2_workcomp2.c, mshowbest.c, mshowalign.c) modified to
support new coordinate system, preliminary work on multiple HSPs in
parallel environment.

>>April 14, 2009

(comp_lib2.c, nmgetaa.c) Comprehensive restructuring of library file
list from a fixed length array to a variable length linked list.  The
linked list allows library files to insert additional files into the
list, so that, for example, a file of accession numbers can refer to a
list of files for the accessions.
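
A minimal sketch of why a linked list makes this easy is shown
below.  The struct and function names are hypothetical and much
simpler than the structures in comp_lib2.c and nmgetaa.c.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical library file list entry. */
struct lib_node {
  char *name;
  struct lib_node *next;
};

/* Insert a copy of name directly after cur (or start a new list if
   cur is NULL).  Returns the new node, or NULL if out of memory.   */
struct lib_node *lib_insert_after(struct lib_node *cur, const char *name)
{
  struct lib_node *n = malloc(sizeof *n);

  if (n == NULL) return NULL;
  n->name = malloc(strlen(name) + 1);
  if (n->name == NULL) { free(n); return NULL; }
  strcpy(n->name, name);

  if (cur != NULL) { n->next = cur->next; cur->next = n; }
  else n->next = NULL;
  return n;
}

/* Free the whole list. */
void lib_free(struct lib_node *head)
{
  while (head != NULL) {
    struct lib_node *next = head->next;
    free(head->name);
    free(head);
    head = next;
  }
}
```

While an indirect file is being read, each file it names can be
inserted after the previous one, so the new files are searched before
the remaining entries of the original list and no fixed-size array
has to be resized.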

Eventually, this should allow FASTA to support .pal/.nal files from
the NCBI, and to support files of file names in most places where
file names are allowed.

>>April 2, 2009		(from fasta35)

(structs.h, comp_lib2.c, doinit.c, mshowbest.c, mshowalign.c) The code
that selects the number of high scores to display has been reorganized
to support the -F e_low option (which was not implemented properly if
-b and -d were specified).  The code is simplified; m_msg.nshow is
used to specify the number of best scores listed, and min(m_msg.nshow,
m_msg.ashow) is used to specify the number of alignments shown.

>>March 26, 2009	(from fasta35 - fa35_04_07)

(initfa.c) Fix problems with 'U' recognition in DNA pam matrix,
correct implementation of -r +mat/-mis.  Previous versions of fasta35
may not have used the correct DNA matrix when the -r +mat/-mis option
was specified.

>>March 23, 2009	(initfa.c verstr -> 36.02)

(mshowbest.c, aln_structs.h) Add loop for displaying multiple aligned
regions with -m 9, -m 9i, and -m 9c in mshowbest.c.

>>March 22, 2009

(dropgsw2.c, dropnnw2.c, wm_align.c) Rearrange code in dropgsw2.c,
dropnnw2.c (which replaces dropnnw.c) so that a single function,
wm_align.c:nsw_malign() is responsible for recursive alignments for
both dropgsw2.c (sw_walign) and dropnnw2.c (nw_walign).  The strategy
for these (Smith-Waterman, Global-Local) alignments is
identical. nsw_malign() uses a function pointer that calculates S-W or
N-W that it gets from dropgsw2.c or dropnnw2.c.
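
The shared-driver idea can be illustrated with a toy example.  This
is not nsw_malign(): all names are hypothetical, and a best-scoring
run of integers stands in for a real alignment.  What it shows is a
single recursive routine that receives the scoring function as a
pointer and then searches to the left and right of the best region.

```c
#include <stddef.h>

/* Toy "alignment": a score and a half-open region [start, end). */
struct region { int score; size_t start, end; };

/* Type of the scoring function supplied by the caller. */
typedef struct region (*align_fn)(const int *v, size_t lo, size_t hi);

/* Stand-in scoring function: best-scoring contiguous run in v[lo..hi). */
struct region best_run(const int *v, size_t lo, size_t hi)
{
  struct region best = {0, lo, lo};
  int cur = 0;
  size_t cur_start = lo, i;

  for (i = lo; i < hi; i++) {
    if (cur <= 0) { cur = 0; cur_start = i; }
    cur += v[i];
    if (cur > best.score) {
      best.score = cur; best.start = cur_start; best.end = i + 1;
    }
  }
  return best;
}

/* Shared recursive driver: find the best region with fn(), then look
   for further regions scoring >= thresh on either side of it.
   Results are stored in out[] (not sorted); returns the count.      */
size_t multi_align(align_fn fn, const int *v, size_t lo, size_t hi,
                   int thresh, struct region *out, size_t max_out)
{
  struct region r;
  size_t n;

  if (lo >= hi || max_out == 0) return 0;
  r = fn(v, lo, hi);
  if (r.score < thresh || r.end == r.start) return 0;

  out[0] = r;
  n = 1;
  n += multi_align(fn, v, lo, r.start, thresh, out + n, max_out - n);
  n += multi_align(fn, v, r.end, hi, thresh, out + n, max_out - n);
  return n;
}
```

A second scoring function with the same signature could be passed to
multi_align() without changing the driver, which is the point of the
function-pointer arrangement.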

It might make sense to use a similar strategy for the recursive
translated alignments.

>>March 19, 2009

(map_db.c, mm_file.h) Fix another bug in map_db.c that appears for
sequence files larger than 2Gb.  MM_OFF is now consistently used in
more of the places where an int64_t might be required.

>>March 17, 2009

(list_db.c) Fix a bug in list_db that caused it to misread the maximum
sequence length, and then be off by 4-bytes for all the offsets.  
Include list_db with map_db in the list of auxiliary programs.

>>Mar. 8, 2009		fa35_04_06

(comp_lib2.c, pthr_subs2.c, pthr_subs.h, doinit.c, dec_pthr_subs.c)
Dynamically allocate pthread_t *fa_threads, rather than limit it to
MAX_WORKERS.  MAX_WORKERS is no longer used in the Unix environment;
the default number of threads is taken from
sysconf(_SC_NPROCESSORS_CONF).  If sysconf() is not available,
MAX_WORKERS is used.  The threaded programs should now
automatically adjust the number of threads to the number of
processors.  Moreover, the number of threads can be set to more than
the number of processors with -T #threads.  Also, max_workers was
renamed fa_max_workers, and pthread_t *threads is now *fa_threads.
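
The selection of the thread count can be sketched as follows for a
Unix system.  The function names and the fallback value of
MAX_WORKERS are assumptions for illustration; only sysconf() and
_SC_NPROCESSORS_CONF are taken from the description above.

```c
#include <unistd.h>

#ifndef MAX_WORKERS
#define MAX_WORKERS 4    /* assumed fallback value for this sketch */
#endif

/* Default number of worker threads: the configured processor count
   if sysconf() can report it, otherwise MAX_WORKERS.               */
int default_workers(void)
{
#ifdef _SC_NPROCESSORS_CONF
  long n = sysconf(_SC_NPROCESSORS_CONF);

  if (n > 0) return (int)n;
#endif
  return MAX_WORKERS;
}

/* requested > 0 corresponds to an explicit -T #threads, which may
   exceed the processor count; otherwise the default is used.      */
int choose_workers(int requested)
{
  return (requested > 0) ? requested : default_workers();
}
```

The array of pthread_t can then be allocated with the value returned
by choose_workers() instead of being declared with a fixed size.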

>>Mar. 6, 2009

copied comp_lib2.c from v35 (fix for query offset coordinates)

>>Oct. 22, 2008

The programs that allow multiple alignments to be found include:

    ssearch36(_t)
    fasta36(_t)
    fastx36(_t)
    fasty36(_t)

fasts and fastf will probably not be updated in this way, because of
the difficulty in reconstructing alignments, but fastm may be.

Right now, the pvm/mpi versions of the programs do not support
multiple sub-alignments.

>>Sep. 25, 2008

Modify the syntax for the -E option to allow the repeat E()-value
cutoff to be specified in either of two ways. 

       -E "e_cut e_rep"

If the value of e_rep is less than one, it is taken as the absolute
E()-value threshold for additional local domains, for example:

       -E "1.0 0.05" says use 1.0 for the main E()-value threshold,
        and 0.05 as the threshold for additional local alignments.

Alternatively, if e_rep >= 1.0, it is taken as a divisor for the
E()-value threshold, thus:

       -E "1.0 10.0"

sets the E()-value threshold for additional local alignments to
1.0/10.0 = 0.1.

Finally, if e_rep <= 0.0, no multiple alignments are done (equivalent
to previous versions of FASTA).
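
The three cases for e_rep can be summarized in a short sketch.  The
function and parameter names are hypothetical, not those used in the
FASTA option parser, and the behaviour when e_rep is omitted (the
caller's defaults are left untouched) is an assumption.

```c
#include <stdio.h>

/* Parse -E "e_cut e_rep".  On return *e_rep_thresh holds the absolute
   E()-value threshold for additional alignments and *do_rep is 0 if
   multiple alignments are disabled.  Returns 0 on success, -1 if no
   number could be read.                                              */
int parse_E_opt(const char *arg, double *e_cut,
                double *e_rep_thresh, int *do_rep)
{
  double e_rep = 0.0;
  int n = sscanf(arg, "%lf %lf", e_cut, &e_rep);

  if (n < 1) return -1;
  if (n < 2) return 0;           /* e_rep not given: keep caller's defaults */

  if (e_rep <= 0.0) {
    *do_rep = 0;                 /* no multiple alignments */
  } else if (e_rep < 1.0) {
    *do_rep = 1;
    *e_rep_thresh = e_rep;       /* absolute threshold */
  } else {
    *do_rep = 1;
    *e_rep_thresh = *e_cut / e_rep;   /* divisor of the main threshold */
  }
  return 0;
}
```

With this sketch, -E "1.0 0.05" yields a repeat threshold of 0.05,
-E "1.0 10.0" yields 0.1, and -E "1.0 0" disables multiple alignments.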