Copyright 1995-2004 The Institute for Genomic Research. All rights
reserved.

----------------------------------------
Installation

1. To install Lucy, type "make" in this directory. Fix any compiler or
makefile incompatibilities, then move the executable "lucy" to your
local binary directory, such as "/usr/local/bin". This version of lucy
uses the standard POSIX Pthread library for multi-threading. If your
operating system does not support Pthreads, use lucy version 1.16s,
which is also included in this release: just "cd" to its directory
("lucy-1.16s") and type "make" there. It is functionally identical to
1.16p except that it cannot take advantage of multiple CPUs on your
machine.

2. Also, move the manual page file "lucy.1" to your local man page
directory, such as "/usr/local/man/man1". Don't forget to remake your
manual page index or you will need to type "man -F lucy" each time you
want to get the Lucy manual page. You can also type "man ./lucy.1" in
this directory to quickly see the manual page and/or make a printout.

3. A PostScript version of the manual page has been included as
well. To print the manual page, send the PostScript file "lucy.ps" to
your PostScript-capable printer.

----------------------------------------
Testing

1. To test your Lucy installation, type the command

    lucy -v PUC19 PUC19splice atie.seq atie.qul atie.2nd -debug lucy.info

in this directory. Check that the generated files lucy.seq and lucy.qul
look reasonably correct (what does that mean? :). Also, use the Unix
"diff" command to compare the information file "lucy.info" with the
included "lucy.debug" file; they should be the same. If they are not,
something may be wrong. Note that if you run lucy on a platform other
than a Linux PC, the CLZ fields may differ somewhat, but these
differences usually do not affect the outcome of lucy trimming. See
question 6 in the FAQ below if you are curious.

(If you have multiple CPUs in your computer, run the same command
with the additional option "-x CPU_count". Lucy should run
dramatically faster, and its output should be the same with or
without this option.)

2. Run Lucy on your own data and check that it works as expected.

3. For more information please see the manual page.
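Why should "-x CPU_count" leave the output unchanged? The C sketch below is a hypothetical illustration, not lucy's threading code, of one common way a POSIX Pthreads program stays deterministic: each thread handles a fixed, contiguous slice of independent jobs and writes only to its own result slots, so the combined output does not depend on the thread count. The names `run_parallel`, `worker`, `process_one`, and `MAX_THREADS` are invented for this example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical sketch, not lucy's code: n independent jobs are split
 * into contiguous slices, one slice per POSIX thread. Each job writes
 * only its own output slot, so the results do not depend on how many
 * threads are used. */

#define MAX_THREADS 64

typedef struct {
    const int *input;   /* shared, read-only input */
    int *output;        /* one result slot per job */
    size_t begin, end;  /* half-open range [begin, end) of jobs */
} work_range;

/* Stand-in for the real per-sequence work (e.g. trimming one read). */
static int process_one(int value)
{
    return value * value + 1;
}

static void *worker(void *arg)
{
    work_range *r = arg;
    for (size_t i = r->begin; i < r->end; i++)
        r->output[i] = process_one(r->input[i]);
    return NULL;
}

/* Process n jobs using `threads` threads (1..MAX_THREADS).
 * Returns 0 on success, -1 if a thread could not be created. */
static int run_parallel(const int *input, int *output, size_t n,
                        size_t threads)
{
    pthread_t tid[MAX_THREADS];
    work_range ranges[MAX_THREADS];
    size_t started = 0;
    int rc = 0;

    assert(threads >= 1 && threads <= MAX_THREADS);
    for (size_t t = 0; t < threads; t++) {
        ranges[t].input = input;
        ranges[t].output = output;
        ranges[t].begin = n * t / threads;       /* slices cover 0..n-1 */
        ranges[t].end = n * (t + 1) / threads;   /* with no overlap */
        if (pthread_create(&tid[t], NULL, worker, &ranges[t]) != 0) {
            rc = -1;
            break;
        }
        started++;
    }
    for (size_t t = 0; t < started; t++)
        pthread_join(tid[t], NULL);   /* wait for every started thread */
    return rc;
}
```

Calling `run_parallel(in, out, n, 1)` and `run_parallel(in, out, n, 4)` fills `out` with the same values; only the wall-clock time changes. On older systems you may need to compile with `-pthread`.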

I hope this program is useful to you.

----------------------------------------
Quality trimming parameters

Note: do NOT turn on Phred trimming if you intend to feed its output
directly to lucy; trimming in Phred shortens the sequences and can
prevent lucy from seeing the vector fragments at the ends, resulting in
untrimmed vector fragments. Keeping as much data as possible for lucy
is your best strategy. Lucy does a decent job of both quality and
vector trimming when you give it enough data to see the whole picture
of a sequence!

The quality trimming parameters that we use at TIGR depend on two
factors: whether the sequence was run on an ABI 377 or a 3700, and
whether the project is a BAC-end project or a non-BAC project. Here
are the four cases (the quality trimming is actually the same for all
BAC-end projects, regardless of machine type):

NonBAC3700="-error 0.025 0.02 -bracket 10 0.02 -window 50 0.03"

NonBAC377="-error 0.025 0.02 -bracket 10 0.02 -window 50 0.08 10 0.3"

BAC3700="-error 0.025 0.9 -bracket 10 0.02 -window 50 0.07"

BAC377="-error 0.025 0.9 -bracket 10 0.02 -window 50 0.07"

The real difference between the 377 and the 3700 is whether the
quality values were produced by phred or by TraceTuner, a third-party
product that calls quality values for sequences run on a 3700.

I think that the best parameters to use would be lucy's default set:

LucyDefault="-error 0.025 0.02 -bracket 10 0.02 -window 50 0.08 10 0.3"
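Assuming the decimal values in these parameter strings are per-base error probabilities (the manual page gives each option's exact meaning), they can be related to phred-scaled quality values with the standard phred formula. A worked conversion for several of the thresholds used above:

```latex
\begin{aligned}
Q &= -10\log_{10} P, \qquad P = 10^{-Q/10} \\
P = 0.025 &\;\Rightarrow\; Q \approx 16.0 \\
P = 0.02  &\;\Rightarrow\; Q \approx 17.0 \\
P = 0.03  &\;\Rightarrow\; Q \approx 15.2 \\
P = 0.07  &\;\Rightarrow\; Q \approx 11.5 \\
P = 0.08  &\;\Rightarrow\; Q \approx 11.0 \\
P = 0.3   &\;\Rightarrow\; Q \approx 5.2
\end{aligned}
```

For comparison, a phred quality value of 20 corresponds to an error probability of 0.01.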

----------------------------------------
Frequently asked questions

1. Why does lucy only provide the coordinates of good-quality regions
instead of directly removing the bad regions of sequences?

We do not recommend physically removing the bad regions from each
sequence because many sequence assembly programs can still benefit
from these so-called "overhang" regions to improve the chance of
making a successful assembly. If you must remove those bad regions
for your purposes, you can use the included simple AWK script
"zapping.awk" or write your own scripts.
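If you write your own, the C sketch below only illustrates the idea. It assumes, based on the example headers in question 6 (such as ">ATIEO93TF 0 0 0 227 384"), that the last two numbers on a lucy output header line give the good region, and that these coordinates are 1-based and inclusive; check both assumptions against the manual page. `parse_clear_range` and `trim_to_clear_range` are hypothetical helpers, not part of lucy.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of lucy): read the last two integers on
 * a lucy output header line such as ">ATIEO93TF 0 0 0 227 384" and treat
 * them as the good region. Returns 0 on success, -1 otherwise. */
static int parse_clear_range(const char *header, long *left, long *right)
{
    long last[2] = { 0, 0 };
    int count = 0;
    const char *p;
    char *end;

    assert(header != NULL && left != NULL && right != NULL);
    p = strpbrk(header, " \t");        /* skip the ">name" field */
    if (p == NULL)
        return -1;
    for (;;) {
        long v = strtol(p, &end, 10);  /* skips leading whitespace */
        if (end == p)
            break;                     /* no further integer */
        last[0] = last[1];             /* remember only the last two */
        last[1] = v;
        count++;
        p = end;
    }
    if (count < 2)
        return -1;
    *left = last[0];
    *right = last[1];
    return 0;
}

/* Copy bases left..right of seq into out, assuming 1-based inclusive
 * coordinates; out must hold right - left + 2 bytes. Returns the trimmed
 * length, or -1 if the range does not fit the sequence. */
static long trim_to_clear_range(const char *seq, long left, long right,
                                char *out)
{
    long len, n;

    assert(seq != NULL && out != NULL);
    len = (long)strlen(seq);
    if (left < 1 || right < left || right > len)
        return -1;
    n = right - left + 1;
    memcpy(out, seq + left - 1, (size_t)n);
    out[n] = '\0';
    return n;
}
```

The same coordinates would apply to the matching record in lucy.qul, where each position is a whitespace-separated quality value rather than a single character.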

2. Can I run lucy on my XYZ type machine?

At TIGR, we run Lucy on a Sun workstation, running the Solaris
operating system (version 5.5.1). Lucy has also been compiled and run
successfully under the Linux operating system, running on a
PC. Although there are no known problems with Lucy under Linux, it has
been exercised much less under Linux than under Solaris.

We use the Sun Workshop C compiler for compilation under Solaris, and
the gcc (GNU C) compiler under Linux. I do not mean to imply that
other compilers will not work, but these are the ones that have been
tried here.

Lucy has not been run on MacOS or Windows by us, although we believe
porting it to these two platforms should not be too difficult since
the source code is included. It is very likely that lucy can run
without any modification under a Windows command shell.

3. What are lucy's memory requirements?

Lucy's memory requirements are very moderate. Memory usage does
increase with the number of sequences being trimmed. However, Lucy
does not read all of the sequence and quality data into memory at one
time, but rather reads the data from disk as it is needed. For detailed
information about lucy's memory requirements, see the manual page.

4. How can I make lucy talk to my internal database server?

Lucy does not access (nor depend on) a database server. It reads its
input from ASCII text files, and writes its output also to ASCII text
files. The input sequence and quality files are in multi-FASTA format,
as are the output sequence and quality files. It is a design decision
to separate lucy from any site-specific assumptions. At TIGR, we use a
separate program, ricky, to drive lucy and provide input/output
between lucy and our database infrastructure. You may need to design
similar driver programs if you wish to automatically upload lucy's
output to your database.
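As a hypothetical starting point for such a driver (this is not ricky, whose code is not part of this release), the C sketch below turns each header line of a lucy output sequence file into a tab-separated row holding the sequence name and its good-region coordinates, a format most database bulk loaders can read. It assumes the header layout seen in question 6: a name followed by five integers, the last two being the good region. `header_to_row` and `convert_stream` are invented names.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical driver fragment: turn one lucy output header such as
 * ">ATIEO93TF 0 0 0 227 384" into the row "ATIEO93TF\t227\t384".
 * Assumes five integers follow the name and the last two are the good
 * region. Returns the row length, or -1 if the line does not match. */
static int header_to_row(const char *line, char *row, size_t rowsz)
{
    char name[256];
    long a, b, c, left, right;

    assert(line != NULL && row != NULL);
    if (sscanf(line, ">%255s %ld %ld %ld %ld %ld",
               name, &a, &b, &c, &left, &right) != 6)
        return -1;
    return snprintf(row, rowsz, "%s\t%ld\t%ld", name, left, right);
}

/* Read lucy.seq-style multi-FASTA input and write one row per record,
 * skipping the sequence lines. Returns the number of rows written. */
static int convert_stream(FILE *in, FILE *out)
{
    char line[4096], row[512];
    int rows = 0;

    while (fgets(line, sizeof line, in) != NULL) {
        if (line[0] != '>')
            continue;               /* sequence data, not a header */
        if (header_to_row(line, row, sizeof row) >= 0) {
            fprintf(out, "%s\n", row);
            rows++;
        }
    }
    return rows;
}
```

A driver could call `convert_stream(stdin, stdout)` and pipe the result into your database's loader; keeping this step outside lucy preserves the separation described above.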

5. Which base calling software should I use?

Currently, we are using phred version 0.990722.g as our base caller
for sequences from the ABI 377 sequencer, and TraceTuner (from
Paracel) for sequences from the ABI 3700. I recommend that you use
phred version 0.990722.g or later for 377 sequences. Some earlier
versions of phred never produced non-zero quality values below 15,
and older versions of lucy tend to overtrim sequences from those
earlier versions of phred. The latest version of phred can be obtained
directly from its authors <http://www.phrap.org/>.

6. I've downloaded and installed 'lucy', but I can't get it to produce
the same debug info as in the distributed 'lucy.debug' file. That,
according to the READ_ME, means something is wrong, right?

Short answer: it is not a bug in lucy. The difference in lucy output
is caused by the different random number sequences generated on the
two platforms. You can safely use lucy on either platform and it
should produce correct output.

Long answer: lucy's secondary sequence extension module calls a
random number generator to determine a real base whenever it sees a
letter such as N or B that can stand for more than one nucleotide.
This is safe because a high-quality sequence will not contain any N's
in its ABI base calls anyway. The purpose is to give lower-quality
sequences that are at the borderline of being dropped by lucy a chance
to be salvaged if their randomly resolved DNA sequences match the Phred
sequence well. Therefore, if the random number generators on two
different platforms produce different numbers, the converted ABI
sequence will be somewhat different (at those N bases) and the match
result may be different.
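The C sketch below is a hypothetical illustration of this idea, not lucy's code: it resolves IUPAC ambiguity codes such as N (any base) or B (C, G, or T) to a randomly chosen compatible base. It deliberately uses a small xorshift generator instead of a platform-supplied one such as rand(), whose sequence the C standard does not fix, so a given seed resolves the same input identically on every platform. `iupac_bases`, `xorshift32`, and `resolve_ambiguities` are invented names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* IUPAC nucleotide codes and the bases each one may stand for. */
static const char *iupac_bases(char code)
{
    switch (code) {
    case 'A': return "A";    case 'C': return "C";
    case 'G': return "G";    case 'T': return "T";
    case 'R': return "AG";   case 'Y': return "CT";
    case 'S': return "CG";   case 'W': return "AT";
    case 'K': return "GT";   case 'M': return "AC";
    case 'B': return "CGT";  case 'D': return "AGT";
    case 'H': return "ACT";  case 'V': return "ACG";
    case 'N': return "ACGT";
    default:  return NULL;   /* unknown symbol: leave it alone */
    }
}

/* Small portable PRNG (32-bit xorshift): the same seed gives the same
 * sequence on every platform, unlike the C library's rand(). */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Replace every ambiguity code in seq (in place) with one of the bases
 * it may stand for. Returns the number of positions changed. */
static int resolve_ambiguities(char *seq, uint32_t seed)
{
    uint32_t state = seed ? seed : 1u;  /* xorshift must not start at 0 */
    int changed = 0;

    assert(seq != NULL);
    for (char *p = seq; *p != '\0'; p++) {
        const char *choices = iupac_bases(*p);
        size_t n = choices ? strlen(choices) : 0;
        if (n > 1) {
            *p = choices[xorshift32(&state) % n];
            changed++;
        }
    }
    return changed;
}
```

With a fixed seed the Linux and Solaris runs in this answer would resolve the N's identically. The trade-off is unchanged, though: resolved bases are still guesses, and the match result still depends on which guesses were made.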

If you look at the diff output between lucy.debug and lucy.info, you
will notice that most matches reported in the CLZ fields there are too
short to make any real difference to lucy's final trimming output
(i.e., CLR). You can think of them as just random noise.

However, there are indeed three differences in the final lucy output
file lucy.seq between the two platforms (Linux and Solaris):

ATIEG51TR was dropped on the Solaris side but included on the Linux side.
ATIEO52TF was included on the Solaris side but dropped on the Linux side.
ATIEO93TF was included on both sides, but its reported good regions
differ between the two sides:
< >ATIEO93TF 0 0 0 227 384  Solaris
---
> >ATIEO93TF 0 0 0 49 179   Linux

These three are the only differences between the outputs of lucy
running on the two platforms with the atie test suite. If you look at
the ABI sequence file (atie.2nd) and find the above three sequences in
it, you will see a lot of N's scattered around their sequences. This
is the reason for the difference. If you run lucy without the
secondary sequence extension step (i.e., drop atie.2nd from the
argument list and run lucy again), you will get exactly the same
output on both platforms.

So this is really a user choice: do you want to include more usable
data by comparing against the ABI sequence and salvaging some data, at
the risk of including some junk, or do you want just higher-quality
data, at the risk of losing some data that are still useful? Perhaps
one answer is to run lucy with the -inform_me option and double-check
the sequences lucy reports as 'salvaged'; you will find the three
sequence names mentioned above in the lucy report, meaning that lucy
knows they are right at the borderline. :)

----------------------------------------
File list

Information:
	Copyright    - copyright notice
	HISTORY      - lucy modification history
	README.FIRST - this file you are reading now
	lucy.1       - lucy's manual page in standard Unix man page format
	lucy.ps      - lucy's manual page in PostScript format

Source files:        - source code used to build the lucy program
	Makefile
	abi.c
	lucy.c
	poly.c
	qual_trim.c
	splice.c
	vector.c

	lucy-1.16s/  - same set of source files for non-parallel lucy

Test files:          - files for testing lucy as mentioned earlier
	PUC19
	PUC19splice
	PUC19splice.for
	PUC19splice.rev
	atie.seq
	atie.qul
	atie.2nd

	pSPORT1splice - these four files were mentioned in HISTORY
	pSPORT1vector
	ARMTM40TR.seq
	ARMTM40TR.qul