(* begin module version *)
version = 5.05 of delman1    1993 January 27


ddddddd   eeeeeeee  ll        m      m     aa     n     nn  
dd    dd  ee        ll        mm    mm    aaaa    nn    nn  
dd    dd  ee        ll        mmm  mmm   aa  aa   nnn   nn  
dd    dd  eeeeeee   ll        mmmmmmmm  aa    aa  nnnn  nn  
dd    dd  ee        ll        mm mm mm  aa    aa  nn nn nn  
dd    dd  ee        ll        mm    mm  aaaaaaaa  nn  nnnn  
dd    dd  ee        ll        mm    mm  aa    aa  nn   nnn  
dd    dd  ee        ll        mm    mm  aa    aa  nn    nn  
ddddddd   eeeeeeee  llllllll  mm    mm  aa    aa  nn    nn  
                                                            
                                                            
                       11     
                      111     
                       11     
                       11     
                       11     
                       11     
                       11     
                       11     
                    11111111  
                              

                         THE DELILA SYSTEM MANUAL

                         THOMAS D. SCHNEIDER
                         COPYRIGHT (C) 1993


1. Don't Panic!  You don't have to absorb this all at once!
2. There is an index at the end of any printed copy of Delman!
3. To create Delman2, see file aa.p

(* end module version *)
(* begin module delman.intro *)


 IIIIIIII  N     NN  TTTTTTTT  RRRRRRR    OOOOO  
    II     NN    NN     TT     RR    RR  OO    OO 
    II     NNN   NN     TT     RR    RR  OO    OO
    II     NNNN  NN     TT     RR    RR  OO    OO
    II     NN NN NN     TT     RR    RR  OO    OO
    II     NN  NNNN     TT     RRRRRRR   OO    OO
    II     NN   NNN     TT     RR  RR    OO    OO
    II     NN    NN     TT     RR   RR   OO    OO
 IIIIIIII  NN    NN     TT     RR    RR    OOOOO


(* end module delman.intro *)
(* begin module delman.intro.outline.1 *)

 DELILA SYSTEM MANUAL OUTLINE

 INTRO: Introduction To The Delila System
      OUTLINE: Outline For The Delila Manual
      DESCRIPTION: What Is The Delila System?
      ORGANIZATION: Organization Of The Manual
      POLICY: Our Policies, A Disclaimer, Obtaining The Delila System,
              Our Address And Acknowledgements

 TRANSPORT: Transportation Of The Delila System
      REQUIREMENTS: What You Will Need To Get The Delila System Running
      TAPE.FORMATS: Tape Data Formats

 ASSEMBLY: Assembly Of The Delila System Programs
      INTRO: What We Mean By Assembly
      CHACHA: Changing Characters And Getting The First Program Running
      REMBLA: Removing Excess Blanks From Files
      WORCHA: The Reserved Word Problem
      MODULE: Module Libraries - What They Are And How To Use Them
      EXAMPLE: An Example Of Constructing A Delila System Program
      PROBLEMS: Problems That May Arise During Assembly

 GUIDE: Hello, Computer - A Guide To The New User
      INTRO: Introduction To The Guide And Your Computer
      ADVICE: Advice And Tips To The New User
      DELILA: How To Use The Delila System On Your Computer

 PROGRAM: System Independent Notes On Programming
      ESSAY: Suggestions On How To Learn And Do Programming
      FABLE: A Fairy Tale For Programmers

(* end module delman.intro.outline.1 *)
(* begin module delman.intro.outline.2 *)

 USE: Uses Of The Delila System
      INTRO: Introduction
      STRUCTURE: Library Structure: Trees, Nested And Named Objects
      LANGUAGE: Delila - The Language
      AUXILIARY.PROGRAMS:  Lister And Search
      DATA.FLOW: Data Flow And Data Loops
      COORDINATES: The Coordinate System Of A PIECE
      CONTROL: How To Control The Responses Of Delila
      COMPARISON: Ways To Compare Sequences
      ALIGNED.BOOKS: How To Make And Use Aligned Books
      PERCEPTRON: Use Of The Pattern Programs
      ENCODE: Use Of The Fabulous And Powerful Encode Program
      DBPULL: Using The Data Base Extraction Programs
      SEARCH: Using The Search Program

 CONSTRUCTION: Constructing Your Own Libraries
      INTRO: Introduction
      STRUCTURE: More On Library Structure - Logical Vs Physical Structure
      CATAL: Making New Libraries - The Catalogue Program
      EXAMPLE: An Example Of Constructing Delila Libraries
      DATA.ENTRY: Using Your Own Data
      LIBRARY.DESIGN: Making A Delila Data Base
      [FORM...]: The Forms For Library Module Entry

 DESCRIBE: Program And Data Descriptions
      CONVENTIONS: Notation For Naming, Writing And Running Programs
      SHORT.CLUSTER: Short Clustered Descriptions Of Delila System Files
      DOCUMENTATION: How Programs Are Documented
         The format for documentation in the Delila System is in
         file aa.p at the start of the Delman2 manual.

 INDEX
      An Alphabetical Listing Of The Pages In The Manual.
      (See The Page Named DELMAN.INTRO.ORGANIZATION
       For How To Generate The Index.)

(* end module delman.intro.outline.2 *)
(* begin module delman.intro.description *)

      WHAT IS THE DELILA SYSTEM?

      The Delila System is a collection of Pascal programs and data originally
 written at the University of Colorado, Boulder that allows one to manipulate
 and study sets of nucleic-acid sequences.  A set of sequences is called a
 library.  There is a librarian, and "her" name is Delila.  One gives Delila a
 list of instructions that name desired fragments.  Delila then searches the
 library, collects all the sequences together and produces a "book".  The book
 may then be searched for patterns, listed with translation to amino acids, or
 studied in various ways using programs other than Delila ("auxiliary"
 programs).  Since books may be small, these analyses can be efficient.

      Books have the same form as libraries.  In other words, libraries have a
 particular structure so that Delila can work with them.  Books have that same
 structure.  For example, given a Master DNA sequence library one can use
 Delila to make a subset such as a transcript library, containing sequences of
 mRNA.  From the transcript library subsets for gene initiation regions can be
 made and these are guaranteed to be sequences from mRNA.  During all these
 manipulations the numbering of the sequences remains consistent so that one
 can refer back to the original library or the literature.  (The technical
 differences between libraries and books will be discussed later.)

      Any auxiliary program that searches a library will know about the
 structure of the library.  Using this structure and the search results, the
 program can write Delila instructions that specify the locations of the found
 objects.  Once again, using Delila, one can loop back and create a book of
 these objects.  Also, the instructions (instead of the sequences) can be
 manipulated by various programs.

      A NOTE FOR PROGRAMMERS
      Each auxiliary program that reads a book or library knows about the
 library structure.  To make programming easy, a set of routines was written as
 an interface between the actual database (kept in a file) and the program
 calls and variables.  These "book reading routines" are kept together in what
 we call a Module Library, containing many chunks of Pascal code.  Each module
 performs certain kinds of tasks.  The modules are transferred from the module
 library into the source code of each auxiliary program by using the Module
 program.  In this way all changes to the interface packages can be made once
 in the Module Library, followed by a series of transfers.  We may send the
 Delila System with modules removed because there is no reason to send
 duplicate code.  After transportation you would assemble the programs.

      We hope that this section gave you a rough overview of what the Delila
 System can do.  Many more details and examples can be found in the sections
 that follow.

(* end module delman.intro.description *)
(* begin module delman.intro.references *)

    libdef - the definition of the Delila Library System (a file)
    moddef - the definition of the Module Transfer System (a file)
    doodle.info - describes Pascal graphics portable under UNIX

Some of the Delila programs and the method of moving modules around
are described in these papers:

    Schneider, T.D., G.D. Stormo, J.S. Haemer and L. Gold. (1982)
    A design for computer nucleic-acid sequence storage, retrieval and
    manipulation.
    Nucleic Acids Research, 10: 3013-3024.

    Schneider, T.D., G.D. Stormo, M.A. Yarus, and L. Gold (1984)
    Delila system tools.
    Nucleic Acids Research, 12: 129-140.

Some related papers are:
    Stormo, G.D., T.D. Schneider and L.M. Gold (1982)
    Characterization of translational initiation sites in E. coli.
    Nucleic Acids Research, 10: 2971-2996.

    Stormo, G.D., T.D. Schneider, L. Gold and A. Ehrenfeucht (1982)
    Use of the 'Perceptron' algorithm to distinguish translational
    initiation sites in E. coli.
    Nucleic Acids Research, 10: 2997-3011.

    Clift, B., D. Haussler, R. McConnell, T. D. Schneider and G. D. Stormo
    (1986)
    Sequence Landscapes.
    Nucleic Acids Research, 14: 141-158.

    Schneider, T.D., G.D. Stormo, L. Gold and A. Ehrenfeucht (1986)
    The information content of binding sites on nucleotide sequences.
    J. Mol. Biol. 188: 415-431.

    Stormo, G.D., T.D. Schneider and L. Gold (1986)
    Quantitative analysis of the relationship between nucleotide
    sequence and functional activity
    Nucleic Acids Research, 14: 6661-6679.

    T. D. Schneider (1988)
    Information and entropy of patterns in genetic switches.
    In G. J. Erickson and C. R. Smith,
    editors, Maximum-Entropy and Bayesian Methods in Science
    and Engineering, volume 2, pages 147-154,
    Dordrecht, The Netherlands, Kluwer Academic Publishers.

    T. D. Schneider and G. D. Stormo (1989)
    Excess information at bacteriophage T7 genomic promoters detected
    by a random cloning technique.
    Nucleic Acids Research, 17: 659-674.

Reference for Dotmat, Helix, Matrix and Keymat: 
    J. V. Maizel, Jr. and R. P. Lenk
    PNAS 78: 7665-7669 (1981)

A reference for Index:
    L. J. Korn, C. L. Queen and M. N. Wegman
    PNAS 74: 4401-4405 (1977)


(* end module delman.intro.references *)
(* begin module delman.intro.organization *)

      ORGANIZATION OF THE MANUAL

      The Delila Manual is broken into several somewhat independent sections.
 When Delman is paged by program PBREAK (see Technical notes below) you will
 find an index at the end.  We anticipate at least two kinds of reader:

 1) The builder who wants to get a Delila System running on a local computer.
 The section on transportation will help you get the data into your computer.
 The section on assembly will guide you through the difficult task of getting
 the programs running.  At that point the Delila Libraries will still not be
 ready to use:  you must construct catalogues as described in the section on
 CONSTRUCTING YOUR OWN LIBRARIES (DELMAN.CONSTRUCTION).  Finally you will be
 able to use the Delila System.  We suggest that you first look over the entire
 manual and associated documents.  Then begin the transport.  Good luck!

 2) The user who wants to use a Delila System that is already running on a
 local computer.  You may be interested in looking over the sections on
 transportation and assembly of the system, but this is not necessary.  If you
 don't know anything about using the computer you should start at
 DELMAN.GUIDE.  In any case, read the section on USE OF THE DELILA SYSTEM
 (DELMAN.USE).

 Each program is described in a separate manual, Delman 2.


 TECHNICAL NOTES (These will not be useful to people just starting.)
      1. The section DELMAN.GUIDE must be rewritten after transportation
 to a new computer system.

      2. DELMAN is physically broken into a set of modules.  Each module
 is a page of the manual.  The individual pages can be extracted (or
 transferred and rearranged) by using the program MODULE, as described
 in the document MODDEF and DESCRIBE.MODULE.  The pages may be looked
 at on-line with the SHOW program (DESCRIBE.SHOW).  The manual or
 extracted modules may be broken into pages for output to a lineprinter by
 using the PBREAK program with a parameter file containing:
    (* begin module
    1
  There is no closing "*)" in the trigger because many different
  module names may follow the trigger, so the trigger is for the common
  part of the module beginnings.

      You can generate another index of the contents of this manual in
 the List file of program Module if you use Delman as the Modlib and a copy
 of Delman as Sin.  (See MODDEF for the definitions of these files.)

(* end module delman.intro.organization *)
(* begin module delman.intro.policy *)

      OBTAINING THE DELILA SYSTEM BY FTP
      The Delila system is available by anonymous ftp in the archive at
 ncifcrf.gov in the directory pub/delila.

      OBTAINING THE DELILA SYSTEM BY TAPE
      We prefer not to have to write tapes or disks, but we will send the
 Delila System by tape as a single package if you do not have ftp access.
 Under most circumstances we cannot send parts of the system or subsets of the
 data.  Please send us a tape as described in delman.transport.tape.formats,
 and we will write out the entire current version and send it back to you.
 There is no fee.  You may redistribute the system.  If you receive a copy of
 the system from someone else, you may want to check back with us to see if
 there have been any major changes or corrections.  Referring to the version
 number of the program or documentation will help us know if there were any
 changes.

      DISCLAIMER
      No claim or guarantee is made that Delila System programs and data are
 free of error.  Although we send source code, we cannot guarantee that this
 code will compile and run on all computers.  We believe that our code is
 reasonably efficient, but we cannot be responsible for any costs due to using
 the Delila System.  We do not offer programming support, though we are willing
 to answer questions about the Delila System.

      We would appreciate a detailed description of any program errors (bugs)
 or data errors that you encounter.


      OUR ADDRESS

  Tom Schneider
  NCI/FCRDC Bldg 469. Room 144
  P.O. Box B
  Frederick, MD  21702-1201
  (301) 846-5581 (-5532 for messages)
  (301) 846-5598 fax
  network address: toms@ncifcrf.gov

  National Cancer Institute
  Laboratory of Mathematical Biology


      ACKNOWLEDGEMENTS

      Jeff Haemer, Mike Aden and Gary Stormo were instrumental in the original
design of the Delila system.

      Many people have helped us by reading and commenting on this manual.  We
 would like to thank:  Ginny Fonte, Larry Gold, Jeff Haemer, John Hoffhines,
 Jane Hessler (VA), Brent Hughes, Billie Lemmon, Melissa Mockensturm, Sandy
 Parkinson (UT), Pat Roche, Herb Schneider, Susan Scolman, Sidney Shinedling,
 Britta Singer, Rosemary Sweeney, and Mike Yarus.

      Computer time and resources were generously provided by the University of
 Colorado at Boulder, and the Frederick Biomedical Supercomputing Center.

      Funds for this project were provided through grants NIH 1 R01 GM28755,
 NIH 5 R01 GM19963 and ACS NP-178D.

(* end module delman.intro.policy *)
(* begin module delman.intro.comments *)

Please use this page to write comments you have about the manual
and the Delila system.  Our address is on page delman.intro.policy.  Thank you.

Name:                                    Date:

(* end module delman.intro.comments *)
(* begin module delman.transport *)


tttttttt  rrrrrrr      aa     n     nn   ssssss             
   tt     rr    rr    aaaa    nn    nn  ss    ss            
   tt     rr    rr   aa  aa   nnn   nn  ss                  
   tt     rr    rr  aa    aa  nnnn  nn   ssssss             
   tt     rr    rr  aa    aa  nn nn nn        ss            
   tt     rrrrrrr   aaaaaaaa  nn  nnnn        ss  --------  
   tt     rr  rr    aa    aa  nn   nnn        ss            
   tt     rr   rr   aa    aa  nn    nn  ss    ss            
   tt     rr    rr  aa    aa  nn    nn   ssssss             
                                                            
                                                            
ppppppp    oooooo   rrrrrrr   tttttttt  
pp    pp  oo    oo  rr    rr     tt     
pp    pp  oo    oo  rr    rr     tt     
pp    pp  oo    oo  rr    rr     tt     
pp    pp  oo    oo  rr    rr     tt     
ppppppp   oo    oo  rrrrrrr      tt     
pp        oo    oo  rr  rr       tt     
pp        oo    oo  rr   rr      tt     
pp         oooooo   rr    rr     tt     


(* end module delman.transport *)
(* begin module delman.transport.requirements *)

      TRANSPORTATION - WHAT YOU WILL NEED

      If you have obtained the Delila System by computer tape, you will need
 some way of moving the data on the tape into your computer.  We suggest that
 you find someone who has already dealt with tapes.

      All Delila System programs are written in the language Pascal.  There
 are many books available on this language, but the definition of
 the language is in:
      K. Jensen and N. Wirth
      Pascal User Manual and Report
      Springer-Verlag, New York 1978

      Some of the Delila programs have been automatically translated to C.
 See the README file for further details.

      To run Pascal programs you will need a Pascal compiler on your computer,
 and enough memory to use it.  It is impossible to make an accurate estimate
 of the memory requirements, because this depends on the computer system.
 However, we once set up an older version of the entire system on two computers:
      CDC Cyber/KRONOS 5000 pru x 640 char/pru = 3,200,000 characters
      DIGITAL VAX/VMS  7000 blocks x 512 char/block = 3,584,000 characters
 Since then more programs have been added, and we find roughly:
      4,300,000 characters of source code and files
      5,300,000 bytes of compiled code on a Pyramid 90x computer running UNIX.
 Because these estimates include object code, whose size depends on the
 compiler and computer, the amount you require may be more or less.  The
 estimates do not include memory required for running the system.
      Since transportation of programs from one computer to another is
 still a tricky business, we recommend that either you learn about
 tapes, your computer, and Pascal, or that you find local people who
 know about these things and are willing to give you help.

      The first Delila system file on the tape is called AAA (the name
 guarantees that it will be first).  It lists the names of
 all the Delila files on the tape, in the order that they were taped.
 Following AAA the other files are in alphabetical order.
 Files are described in the manual section DELMAN.DESCRIBE.

      If you keep notes on difficulties that you encounter and
 how each was solved, transportation of future versions of the
 Delila System will be easier.

(* end module delman.transport.requirements *)
(* begin module delman.transport.tape.formats *)

      TAPE DATA FORMATS

      We send the Delila System (programs and data) out on tape.
      Send us a standard 2400 foot tape.  We will send you back the tape with
      the format:

         9 track
         1600 bits per inch
         Unlabeled
         Standard ASCII character set
         80 characters per record
         10 records per block

      We can also send UNIX tar tapes.
      The first file on the tape lists the names of all the files on the tape.
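
      As a minimal sketch (not part of the Delila System), the record format
 above can be unpacked as follows once a tape file has been copied to disk as
 raw bytes.  The sketch assumes that short lines were padded with blanks to
 fill each 80 character record; the name unblock is hypothetical.  Blocking
 (10 records, or 800 characters, per block) does not matter once the bytes are
 on disk, so only the record length is used.

```python
RECORD_LEN = 80  # characters per record, from the tape format above


def unblock(raw: bytes) -> list[str]:
    """Split a raw tape-file image into text lines, one per 80-character record."""
    text = raw.decode("ascii")
    lines = []
    for start in range(0, len(text), RECORD_LEN):
        record = text[start:start + RECORD_LEN]
        # Assumption: records are blank-padded, so trailing blanks are padding.
        lines.append(record.rstrip(" "))
    return lines
```

      Dropping the padding here does the same job for line ends that the
 Rembla program does during assembly.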

(* end module delman.transport.tape.formats *)
(* begin module delman.assembly *)


    AA      SSSSSS    SSSSSS   EEEEEEEE  M      M  BBBBBBB   LL        YY    YY
   AAAA    SS    SS  SS    SS  EE        MM    MM  BB    BB  LL         YY  YY
  AA  AA   SS        SS        EE        MMM  MMM  BB    BB  LL          YYYY
 AA    AA   SSSSSS    SSSSSS   EEEE      MMMMMMMM  BBBBBBB   LL           YY
 AA    AA        SS        SS  EE        MM MM MM  BB    BB  LL           YY
 AAAAAAAA        SS        SS  EE        MM    MM  BB    BB  LL           YY
 AA    AA        SS        SS  EE        MM    MM  BB    BB  LL           YY
 AA    AA  SS    SS  SS    SS  EE        MM    MM  BB    BB  LL           YY
 AA    AA   SSSSSS    SSSSSS   EEEEEEEE  MM    MM  BBBBBBB   LLLLLLLL     YY


(* end module delman.assembly *)
(* begin module delman.assembly.intro *)

      ASSEMBLY OF THE DELILA SYSTEM PROGRAMS

      At this point we will assume that all the programs and data are in
 files on your computer.  Be sure to read the section of PROGRAM AND
 DATA DESCRIPTIONS (DELMAN.DESCRIBE.CONVENTIONS) that discusses our file
 naming and running conventions.

      This section will guide you in the construction of the Delila System
 programs.  There are several stages to this process:
      changing characters - making sure that all the characters are correct
      removing blanks - blank characters at the end of lines can be removed
         to speed processing and save memory.
      changing words - changing the words that your compiler thinks
         are reserved words in Pascal (but aren't in standard Pascal...)
      module corrections - making sure that modular chunks of code function
         correctly on your computer.
      module transfers - inserting chunks of code into programs
      compilation and debugging - making the programs and finding out why
         things don't work ("If something can go wrong, it will." - Murphy)

      We have written some tools to aid you in this process - but to use the
 tools you must first get some of them running - so the first steps must
 be done by hand.

      Remember to take dated notes about your problems and how they were
 solved.


      USE OF COMMAND FILES
      Most computer systems allow one to put commands in a file and execute
 them.  If you can do this, it will speed up assembly enormously.  One
 such "command" file could contain instructions to remove blanks,
 change characters, change words, transfer modules and perhaps even try
 to compile.  However, it would be better to have several command files,
 each of which did a small part, giving you more flexibility.

(* end module delman.assembly.intro *)
(* begin module delman.assembly.chacha *)

      CHANGING CHARACTERS

      When characters are written to tape they are encoded as binary strings.
 When your computer reads the tape, the characters are decoded for
 storage on your computer.  If the decoding does not exactly reverse
 the encoding, then the characters you receive will not be the same as
 the ones that we send.  For example, you may have a pound sign for each
 exclamation mark that we sent.  Your first task is to find out what
 changes occurred (if any).  To aid you, we provided a list of
 characters with English descriptions in the file 'chars'.
 Look at this file and write down the changes required.
      Use the editor on your computer to correct the characters in the file
 CHACHAS.  Now try to compile CHACHAS.  Determine the reasons for any
 errors.  (For example, you may have to switch double and single quotes
 to satisfy the compiler or you may have to remove the non-standard linelimit
 call.)
      The CHACHA program will now assist you in converting characters in
 the files from the tape.  You should try it out on chars, remembering
 not to destroy the original file.  NOTE: Some Pascal compilers may
 not allow programs that read "nonstandard" characters.  (Example:  small
 characters.)  You may be able to get around this by setting compiler defaults.
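
      The kind of one-for-one character change described above can be
 sketched in a few lines of Python.  This is an illustration only: it is not
 the Chacha program, it does not reproduce the format of Chacha's parameter
 file, and the name change_characters is hypothetical.  The pound sign example
 is the one given in the text.

```python
def change_characters(text: str, changes: dict[str, str]) -> str:
    """Replace each character that appears as a key in `changes`.

    All replacements are applied in a single pass, so a pair of
    characters can be swapped without one change undoing the other.
    """
    table = str.maketrans(changes)
    return text.translate(table)


# Example: the tape delivered '#' wherever '!' was sent.
fixed = change_characters("Don't Panic#", {"#": "!"})

# Example: swapping double and single quotes in one pass.
swapped = change_characters("""writeln("5'")""", {'"': "'", "'": '"'})
```

      The single pass matters for swaps: changing every double quote to a
 single quote first, and then every single quote to a double quote, would
 leave only double quotes in the file.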

(* end module delman.assembly.chacha *)
(* begin module delman.assembly.rembla *)

      REMOVING EXCESS BLANKS FROM FILES

      The files that you get off the tape may have extra blanks (spaces) at
 the ends of lines.  This may be due to transportation itself, or the source
 computer may add extra blanks to lines.  Although these blanks will not
 affect the function of most programs, they will slow down program
 execution and use up extra memory.
      Transportation can also add blank lines to the end of the file.  Some
 programs will object to this.  Catal is one example.
      The program Rembla (remove blanks) will remove all blanks from the ends
 of lines in a file, and any extra blank lines at the end.  We recommend that
 you include this as a step during assembly of programs.  It should
 also be done for data files, especially the libraries.
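
      The two clean-up rules just described can be sketched in Python as
 follows.  This is a minimal illustration of the behavior stated above, not
 the Rembla source; the name remove_blanks is hypothetical.

```python
def remove_blanks(lines: list[str]) -> list[str]:
    """Strip trailing spaces from every line, then drop blank lines at the end."""
    cleaned = [line.rstrip(" ") for line in lines]
    # Blank lines inside the file are kept; only those at the end are dropped.
    while cleaned and cleaned[-1] == "":
        cleaned.pop()
    return cleaned
```

      Leading blanks and blank lines in the middle of a file are left alone,
 since only the ends of lines and the end of the file are at issue.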

(* end module delman.assembly.rembla *)
(* begin module delman.assembly.worcha *)


      THE RESERVED WORD PROBLEM

      The language Pascal defines certain words (such as PROGRAM, VAR,
 BEGIN and END) to be reserved words.  These words cannot be used as
 variable names.  This in itself presents no difficulties for
 portability.  However, your Pascal compiler (like ours) may reserve more
 words than just the standard set.  If one of the Delila System programs
 uses a non-standard reserved word of your compiler, then the program will
 not compile.  You will not have to change all these names by hand because
 we have sent a program to do it automatically.
      Non-standard reserved words should be listed somewhere in the manual for
 your Pascal compiler.  Use this list and the program WORCHA to remove all
 the reserved names.  We suggest using new names that are not likely
 to appear in a program.  Example: MODULE could be converted to
 ZMODULE without loss of meaning.  ZMODULE is not likely to be already used in
 a program.
      Worcha will not alter literals or comments, so the program's
 operation will not be affected by this change.  If one instead makes the
 changes with a standard editor, literals and comments may be altered as
 well, and the program may then not act as described in this manual.

      (We hope that those people who design compilers will consider this
 problem in the future.)
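
      The rule that words are changed everywhere except inside literals and
 comments can be sketched in Python as below.  This is an illustration under
 stated assumptions, not the Worcha source: it assumes comments are written
 with either Pascal comment style (braces, or parenthesis-star pairs), that a
 quote inside a literal is written as two single quotes, and that names are
 matched without regard to case.  The name change_words is hypothetical.

```python
import re

# One alternative per kind of text that must be copied unchanged,
# followed by the identifier pattern that may be renamed.
TOKEN = re.compile(
    r"""
      (?P<skip> \(\*.*?\*\)        # parenthesis-star comment
              | \{.*?\}            # brace comment
              | '(?:[^']|'')*'     # string literal; two quotes embed one quote
      )
    | (?P<word> [A-Za-z_][A-Za-z0-9_]* )
    """,
    re.VERBOSE | re.DOTALL,
)


def change_words(source: str, renames: dict[str, str]) -> str:
    """Rename identifiers, leaving comments and string literals untouched."""
    lowered = {old.lower(): new for old, new in renames.items()}

    def replace(match: re.Match) -> str:
        word = match.group("word")
        if word is None:
            return match.group(0)  # comment or literal: copy as is
        return lowered.get(word.lower(), word)

    return TOKEN.sub(replace, source)
```

      Calling change_words with the pair MODULE and ZMODULE from the example
 above renames the variable wherever it is used, while a comment or a printed
 message that mentions the word stays as written.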

(* end module delman.assembly.worcha *)
(* begin module delman.assembly.module.1 *)

      ASSEMBLY USING MODULES

      First, familiarize yourself with DELMAN.DESCRIBE.CONVENTIONS.

      You are now ready to assemble a Delila auxiliary program.  The
 raw source LISTERR cannot be compiled as it now stands because
 it is missing a set of replaceable chunks of code (called modules) to read
 books (the book reading interface modules).  These are to be
 found in DELMODS, as stated in the first few lines of LISTERR.  Notice that
 DELMODS is a program - compile and run it.  This will almost certainly
 fail.  Correct those modules that cause problems.  See the section on
 assembly problems.
      Modules can be moved around using the MODULE program.  The details
 of this process are described in MODDEF, which you should study now.


--------------------------- READ MODDEF NOW --------------------------------


(* end module delman.assembly.module.1 *)
(* begin module delman.assembly.module.2 *)

      Prepare to do the module transfers by compiling MODULES.
      All programs should be tested on small inputs at first.
 Test the Module program with the example module source and library:
      MODULE(EXSIN,EXMODLI,EXSOUT,EXCT,LIST,OUTPUT)
 Exsout should be identical to the sout example in ModDef.
 Examine list and exsout.

 Now try:
      MODULE(LISTERR, DELMODS, LISTERS, DELCAT, OUTPUT)
 The OUTPUT file will tell you the progress MODULE makes during the
 transfer.  Modules in DELMODS will be copied into the right places of LISTERR
 and the result will be LISTERS (LISTER with inserts - source code).
 It will be useful to save DELCAT for further transfers from DELMODS.

 Compile LISTERS.  Run LISTER (using the default parameters):
      LISTER(EX0BK, EX0LIT)
      The file EX0LIT is a listing of the example book EX0BK.  It should be
 identical to EX0LI.  The possible exception is the begin-page character:
 some computers use a 1 to indicate jump to the next page, while others
 use control-L.

      We would now like to know that LISTER works correctly.  To do
 this requires a comparison program.  MERGE will do.  However, to
 construct MERGE requires modules from PRGMODS.  Compiling PRGMODS and
 running it will test interactive i/o.  The procedures in PRGMODS
 that may need modification are PROMPT, READCHAR and READLINE, in
 decreasing order of system dependence.  You should modify LINELIMIT
 and HALT by transferring the corrected modules from DELMODS into
 PRGMODS.  Prepare PRGMODS and run it.

      Prepare MERGE and use it to prove that EX0LIT = EX0LI.

      You may now construct the rest of the programs.  Note that some
 of them use several module libraries.  For the next stage of setting
 up the Delila System compile CATALS, LOOCATS and DELILAS.  You must
 now construct the libraries: skip to CONSTRUCTING YOUR OWN LIBRARIES,
 (DELMAN.CONSTRUCTION).

      NOTE FOR A SECOND TRANSPORTATION
      If you obtain a later version of the Delila System, then Delmods and
 other module libraries are likely to be altered.  You will want to replace
 modules in the new DELMODS and PRGMODS with your own (system dependent)
 versions.  If you did this directly, you would also replace corrections
 and changes to DELMODS.  To avoid this problem, simply construct a small
 module library (containing for example LINELIMIT, DATETIME modules and
 the interaction modules).  Then use this to change DELMODS and PRGMODS.

(* end module delman.assembly.module.2 *)
(* begin module delman.assembly.example *)

      AN EXAMPLE OF CONSTRUCTING A DELILA SYSTEM PROGRAM

      In this example we show the series of steps used to set up a Delila
 system program, given that the module libraries are ready (that is,
 they compile and run).  The example is for Patser, which requires both
 Delmods and Auxmods.  We assume that the tools needed to do this are
 already set up, as discussed on the previous pages.  As noted in
 DELMAN.ASSEMBLY.INTRO, it is frequently possible to automate these steps.

 1. Change Characters
      chacha(patserr,patser1,chachap)
 Chachap must contain the changes you determined earlier.

 2. Remove Blanks
      rembla(patser1,patser2)

 3. Change Words
      worcha(patser2,patser3,worchap)
 Worchap must contain a list of special reserved words and what they
 are to become.

 4. Insert Modules
      module(patser3,auxmods,patser4,auxcat)
      module(patser4,delmods,patsers,delcat)
 Auxcat and delcat will be generated by Module if they were empty.  You
 can reuse them later with their respective module libraries.  The
 module libraries needed are listed in the first few lines of each
 program.  It is not necessary to pick up the DESCRIBE module
 to compile the program.

 5. Compile
 Patsers is now complete source code.  Compile it with your Pascal compiler.
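
      Because the same steps are repeated for every program, they are a
 natural candidate for a command file.  As a minimal sketch, the Python
 function below only lists the calls, in the notation used on this page, for
 any program name; it does not run anything.  The intermediate file names
 follow the Patser example, the parameter file names chachap and worchap are
 taken from it, and the name assembly_steps is hypothetical.  How each call
 is actually issued depends on your computer system.

```python
def assembly_steps(program: str, module_libraries: list[tuple[str, str]]) -> list[str]:
    """List the assembly calls for `program`, in this manual's call notation.

    `module_libraries` holds (library, catalogue) pairs, for example
    [("auxmods", "auxcat"), ("delmods", "delcat")].
    """
    raw = program + "r"      # raw source, as in patserr
    final = program + "s"    # source with modules inserted, as in patsers
    steps = [
        f"chacha({raw},{program}1,chachap)",
        f"rembla({program}1,{program}2)",
        f"worcha({program}2,{program}3,worchap)",
    ]
    current = f"{program}3"
    for number, (library, catalogue) in enumerate(module_libraries, start=4):
        # The last module transfer writes the final source file.
        last = number == 3 + len(module_libraries)
        target = final if last else f"{program}{number}"
        steps.append(f"module({current},{library},{target},{catalogue})")
        current = target
    return steps
```

      Given the name patser and the two libraries used above, the function
 returns the five calls of steps 1 to 4 in order.  Keeping each step as a
 separate entry makes it easy to split the work across several small command
 files, as suggested in DELMAN.ASSEMBLY.INTRO.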

(* end module delman.assembly.example *)
(* begin module delman.assembly.problems.1 *)


      ASSEMBLY PROBLEMS

      Transportation and assembly problems occur most often because of
 unavoidable system dependent features of particular Pascal compilers.

 INTERACTIVE INPUT
      For interactive input we wrote several modules that work on our computer
 (INTERACT in PRGMODS).  These procedures may or may not be transportable,
 so you may have to modify them.  For example, interactive input on a CDC Cyber
 Pascal compiler requires the file name "input/" - you would have to remove the
 "/" for your compiler.  (This is no longer necessary, as the source
 code is now under UNIX which does not require this.)

 DATE AND TIME PROCEDURES
      The module for date and time calls (module PACKAGE.DATETIME in DELMODS)
 must be rewritten.  We strongly recommend that you keep the same form for
 the dates in libraries so that these routines remain interfaces.  Changing
 the form of the date would make transportation of libraries difficult because
 they would not have the same structure in different locations.
      Modules that will work on a VAX computer are in VAXMODS.  You may find
 it easier to adapt these to your computer rather than the ones that
 are in Delmods.
      If your computer does not have a clock, the simplest way to get this
 module running is to add DATE and TIME procedures in the form called
 by READDATETIME.  These dummy procedures could return either a fixed time
 or a random time made by a true random number generator.  The date
 and time are used to uniquely identify books and some data files.
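      As a minimal sketch of such dummy procedures, the program below
 always returns the same date and time.  The 10-character "alfa"
 parameter and the layout of the text are assumptions made for this
 illustration; use whatever form READDATETIME actually calls in your
 copy of the modules.

```pascal
program dummyclock(output);
(* Sketch only: dummy DATE and TIME procedures for a computer with no
   clock.  The alfa parameter form is an assumption, not taken from
   DELMODS. *)
const
   alfalength = 10;
type
   alfa = packed array[1..alfalength] of char;
var
   d, t: alfa; (* the date and time handed back by the dummies *)

procedure date(var a: alfa);
(* dummy: always return the same date *)
begin
   a := '93/01/27  '
end;

procedure time(var a: alfa);
(* dummy: always return the same time *)
begin
   a := '12:00:00  '
end;

begin
   date(d);
   time(t);
   writeln(output, 'date: ', d, ' time: ', t)
end.
```

 A fixed value is enough to get the programs running, but books made
 this way can no longer be told apart by their date and time.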

 QUOTES
      CDC Cyber Pascal compilers require double quotes (") where the standard is
 the single quote (').
 SOLUTION: use CHACHA to convert:
         " to '   and   ' to "
 In some cases you will have to use two single quotes so that Pascal prints
 a single quote.  Some programs that print 5' and 3' are Lister, Helix,
 Matrix and Dotmat.  To convert, simply alter the constant called 'prime'.

(* end module delman.assembly.problems.1 *)
(* begin module delman.assembly.problems.2 *)

 LINELIMIT
       In CDC Cyber Pascal compilers, output to files is limited to 1000
 lines unless the LINELIMIT procedure
 is called.  Your compiler may not require or recognize this silliness.
 SOLUTION: The calls to linelimit are isolated to the procedure
 UNLIMITLN in the module by the same name in DELMODS and PRGMODS.  Simply
 surround the call (inside the modules!!!) with comments.

 INTERNAL FILES (thanks to Sandy Parkinson)
       An "internal file", for the discussion here, is a file used
 by a Pascal program as a scratch pad.  It is not connected to the
 outside world.  Some computer systems and their Pascal compilers
 require that all files be connected to the outside, as they are not
 capable of creating temporary files.  At least two Delila programs
 use internal files: Module and Split.  Correction of this problem
 requires some programming.  It may not be possible to do it for Split.

 COMPARISONS OF PACKED ARRAYS
       Comparisons of packed arrays may cause you some problems.  One
 solution is to use arrays that are not packed and to write your own
 comparison procedure.
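       A comparison procedure of this kind might look like the sketch
 below.  The type SHORTNAME, its length and the function EQUALNAME are
 invented for the illustration and do not come from the Delila modules.

```pascal
program compare(output);
(* Sketch: compare two arrays that are not packed, one character at a
   time, instead of relying on the compiler to compare packed arrays. *)
const
   namelength = 8;
type
   shortname = array[1..namelength] of char; (* note: not packed *)
var
   a, b: shortname; (* two names to compare *)
   i: integer;      (* index into the names *)

function equalname(var x, y: shortname): boolean;
(* true if x and y hold the same character in every position *)
var
   i: integer;    (* position being compared *)
   same: boolean; (* no difference found so far *)
begin
   same := true;
   i := 1;
   while same and (i <= namelength) do begin
      if x[i] <> y[i] then same := false;
      i := i + 1
   end;
   equalname := same
end;

begin
   for i := 1 to namelength do begin
      a[i] := ' ';
      b[i] := ' '
   end;
   a[1] := 'T'; a[2] := '4';
   b[1] := 'T'; b[2] := '7';
   if equalname(a, a) then writeln(output, 'a = a');
   if not equalname(a, b) then writeln(output, 'a <> b')
end.
```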

 THINGS THAT WE HAVE NOT THOUGHT OF...
      Please tell us!  Our address is in DELMAN.INTRO.POLICY.

      For notes on the writing of transportable programs see DELMAN.PROGRAM
      and DELMAN.DESCRIBE.CONVENTIONS.WRITING.

(* end module delman.assembly.problems.2 *)
(* begin module delman.guide *)


  GGGGGG   UU    UU  IIIIIIII  DDDDDDD   EEEEEEEE
 GG    GG  UU    UU     II     DD    DD  EE
 GG        UU    UU     II     DD    DD  EE
 GG        UU    UU     II     DD    DD  EEEE
 GG        UU    UU     II     DD    DD  EE
 GG  GGGG  UU    UU     II     DD    DD  EE
 GG    GG  UU    UU     II     DD    DD  EE
 GG    GG  UU    UU     II     DD    DD  EE
  GGGGGG    UUUUUU   IIIIIIII  DDDDDDD   EEEEEEEE


(* end module delman.guide *)
(* begin module delman.guide.intro *)

      HELLO COMPUTER - A GUIDE TO THE NEW USER

      ABOUT THIS SECTION:  This section is a guide to using the computer.
 Whenever you have questions about the computer, this is the place to
 look, because the rest of the manual is about the Delila System ONLY.
 That is to say, we have split this manual into several parts - and it will
 not help for you to look for the right thing in the wrong part.  The
 reason for this is that the information about the Delila System can be
 moved from one computer to another (just like the Delila System) but
 information about computers usually cannot be moved.  DELMAN.GUIDE must be
 REWRITTEN for other computers and operating systems.


      ABOUT THIS COMPUTER:  This manual section is written specifically for
 UNIX operating systems.  (UNIX is a trademark of Bell Laboratories.)


      OTHER DOCUMENTS AND RESOURCES:
      In general, ask around.

      Type
          help
      to get pointers.

      Learn how to use the UNIX manual program (man).

      The apropos program is useful for finding things.

      There are hundreds of books on UNIX.  Find one you like.  Many
      people seem to like:
         UNIX for People by P. Birns, P. Brown and J. C. C. Muster
         Prentice-Hall, Inc, 1985

      The easiest way to learn to use a computer is to use the computer!
      Obtain a login identification and plunge in.

      DO NOT REVEAL YOUR PASSWORD TO ANYONE!!!

(* end module delman.guide.intro *)
(* begin module delman.guide.advice *)

      SOME ADVICE TO A NEW COMPUTER USER:

 1) YOU CAN'T HURT THE COMPUTER.  Don't hesitate to try things and
 to play around!

 2) After you learn how to get on and off the computer your best bet is to
 get a firm grip on what files are, how you can make them and how to
 manipulate them.  The easiest way to understand what is happening is to watch
 it happen.  You should use the commands that display your files after each
 file manipulation - until you have a good feeling about what is happening.
 If you do this you will quickly become confident about what you are doing.

 3) A lot of the general principles that you pick up will be similar
 on other computers.

 4) Be wary of the characters you type.  Notice that a zero (0) is NOT
 the same as the capital letter O - the computer can tell them apart.
 This is also true for a one (1) and the small l.

 5) Do not do any serious work while you learn to use the computer.  You
 are likely to destroy some of your files.  That will hurt you and not
 the computer.  Loss of good data can be terribly frustrating.

 6) If you have a problem TRY A SIMPLER CASE,
                          TRY TO ISOLATE THE PROBLEM.

 7) An experienced advisor is worth a thousand hours of computer time.


      UNCRITICAL ACCEPTANCE OF COMPUTER RESULTS

   "So useful has the computer become in all branches of statistical analysis
 that there may be some tendency to forget that even it has its limitations.
 The computer cannot work magic--not yet anyway.  It will do only what it is
 instructed to do, and the validity of the results is determined by the
 accuracy and adequacy of the data put in and the wisdom of the people
 writing the instructions.  Granted, the computer can perform a great
 many calculations much more rapidly than mere mortals can do them.
 Nevertheless, speed of computational work is not the same thing as
 infallibility in aiding with the decision-making process.  A statistical
 critic, of all people, should guard against being overawed by the news
 that certain information was turned out by a computer.  The mere fact
 that computers are being used these days even to cast horoscopes should
 be ample proof that a computer is no more immune to spewing out
 nonsense than are real flesh-and-blood people."
      -from FLAWS AND FALLACIES IN STATISTICAL THINKING
         by Stephen K. Campbell (N.J. Prentice-Hall Inc., 1974), p. 182

(* end module delman.guide.advice *)
(* begin module delman.guide.delila *)

      HOW TO USE THE DELILA SYSTEM ON THIS COMPUTER

 Computer:  Cutterjohn and Sparky.

 The Delila System programs and documentation are kept in the directory
     ~toms/delila
 The binary forms (which you can run) are in
     ~toms/bin
 If you put this directory in your path, then they will simply be commands.

(* end module delman.guide.delila *)
(* begin module delman.program *)


    PPPPPPP   RRRRRRR    OOOOOO    GGGGGG   RRRRRRR      AA     M      M
    PP    PP  RR    RR  OO    OO  GG    GG  RR    RR    AAAA    MM    MM
    PP    PP  RR    RR  OO    OO  GG        RR    RR   AA  AA   MMM  MMM
    PP    PP  RR    RR  OO    OO  GG        RR    RR  AA    AA  MMMMMMMM
    PP    PP  RR    RR  OO    OO  GG        RR    RR  AA    AA  MM MM MM
    PPPPPPP   RRRRRRR   OO    OO  GG  GGGG  RRRRRRR   AAAAAAAA  MM    MM
    PP        RR  RR    OO    OO  GG    GG  RR  RR    AA    AA  MM    MM
    PP        RR   RR   OO    OO  GG    GG  RR   RR   AA    AA  MM    MM
    PP        RR    RR   OOOOOO    GGGGGG   RR    RR  AA    AA  MM    MM


(* end module delman.program *)
(* begin module delman.program.essay *)

      SUGGESTIONS ON HOW TO LEARN AND DO PROGRAMMING
      (An Essay By Tom Schneider)

 ABOUT LANGUAGES
      A computer language is the meeting ground between the absolutely
 rigid requirements of a computer (it must be told exactly what to
 do) and the ambiguous and flexible uses of human languages
 (such as "go jump in a lake", "pour me a cup" etc).

      Recently many academic institutions in the USA have allowed students
 to substitute computer languages for a knowledge of human languages.
 Although a knowledge of computers is becoming increasingly important
 in our society, this change is shortsighted: no computer
 language is anywhere near as powerful or beautiful as those
 practiced by humans.  With dedication one can easily learn twenty
 computer "languages" in a few years, whereas the polyglot is rare
 indeed.  It is important to learn both kinds of language.  For one to
 substitute FORTRAN for French is preposterous cheating.

 HOW DO LANGUAGES WORK? COMPILERS
      Every kind of computer has its own internal "machine" language.
 It is difficult for a person to write or read this because it
 consists of long stretches of ones and zeros: 0100101010111010000011
 10110111101001110010100101001010...   Every "bit" (a one or a zero) must be
 exactly right or the machine will not operate correctly.  Most
 people can't deal with such immense amounts of detail.  The solution
 is to force the computer to keep track of the details and let the person
 think in word-like and sentence-like units:
      IF SUNNY THEN REJOICE
               ELSE MOPE;
 Once one has written a set of sentences in a "higher" level language,
 one must have the computer convert them to its own internal machine
 language (this is not strictly true, but we will only discuss one
 method here).  The process is called compiling.  A self-contained and
 consistent set of "sentences" and "paragraphs" is called a program.
 Obviously one also needs a program to do the compiling - that program
 is called a compiler.
      For example, one relatively modern language is called Pascal.  A
 Pascal compiler sits ("resides") in ("on" - so much for jargon)
 a particular computer.  It converts statements made in the Pascal
 language into machine zeros and ones for that computer (and only
 that computer).  In other words, it converts a SOURCE code into an
 OBJECT code.  The object code can be made to operate ("run") only
 on one kind of computer.  (Note: the word "code" means "program".  Also,
 on some computers one must convert the object code into "executable"
 code before it can be run.)
      (Here is something to puzzle over.  It is now common practice to write
 a compiler in the same language that the compiler compiles.  The
 Pascal compiler was written in Pascal.  It's like pulling oneself
 out of the mud by the bootstraps... how did it start?)

 WHY PASCAL?
      One of the first languages written was called FORTRAN.  In its day
 (the 1950's) it was a great boon because one no longer needed to write
 in machine language (or even one step up, assembly).  Since that time
 many new ideas have been incorporated into languages.  Some of them
 (such as recursion and complex data types) fall outside the range that
 FORTRAN can handle.  This evolution is to be expected.  Yet people
 still try to teach an old dog, so there have been a series of
 "improvements" to FORTRAN.  The result is a great mish-mash of
 dialects.  For these reasons (and other things like the dread
 FORMAT statement) it is difficult (although not impossible) to write good
 transportable code in FORTRAN.  ("Transportable" or "machine independent"
 means that the program will work on several different computers.)

      Pascal is a more modern language, so it includes recently developed
 concepts.  One can write excellent crystal clear code in this language.
 Unfortunately this property does not prevent one from writing poor and obscure
 code!


 TOPDOWNING: How To Write Clear Code
      There are as many ways to write code as there are people.  Yet a
 few simple principles allow one to organize one's thoughts quickly
 and efficiently.
      Writing a program is just like ... writing an outline.
 One starts at the "top" by writing the main things to be done:
   Tom's Day
      I.   Morning
      II.  Travel To Work
      III. Work
      IV.  Travel Back Home
      V.   Evening

 Then one writes the first section:
      I. Morning
         A. Get Up
         B. Shower
         C. Get Dressed
         D. Eat
         E. Put On Coat
 This is repeated for the other sections.  Eventually we get even deeper:
      I. Morning
          A. Get Up
            1. Huh?
            2. Open eyes
            3. Yawn
               ...

 In Pascal, one dispenses with the numbering of sections.  Instead,
 each section has a name.  A section is called a procedure.  Since you
 can read all about procedures, I won't go into more detail here.
      The main advantage to this method is that if one is careful, each
 procedure is isolated from all the others.  There is only one thing to
 think about at a time.
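      The outline above can be sketched as a Pascal program in which
 every section is a procedure.  The procedure names are invented for
 this illustration, and only the first section is carried one level
 deeper.

```pascal
program tomsday(output);
(* Topdown sketch: each section of the outline is a procedure. *)

procedure getup;
(* the deepest level written so far *)
begin
   writeln(output, '  huh? open eyes, yawn')
end;

procedure morning;
begin
   writeln(output, 'morning');
   getup
   (* shower, get dressed, eat and put on coat would follow here *)
end;

procedure traveltowork;
begin
   writeln(output, 'travel to work')
end;

procedure work;
begin
   writeln(output, 'work')
end;

procedure travelbackhome;
begin
   writeln(output, 'travel back home')
end;

procedure evening;
begin
   writeln(output, 'evening')
end;

begin (* tomsday: the top of the outline *)
   morning;
   traveltowork;
   work;
   travelbackhome;
   evening
end.
```

 The main program reads like the top level of the outline, and each
 procedure can be filled in later without touching the others.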

 SPAGHETTI PROGRAMMING
      Many computer languages, including Pascal, allow one to jump from one
 statement to others in the program.  These GOTO statements invariably
 lead to poor programs because one creates nests of GOTO's that jump
 all over the place.  These can be difficult to figure out.  I
 have seen a case where a professional programmer didn't know about an
 inefficient series of jumps that he had written.  Even large companies
 sell code that is a tangled mess.  Modern programmers have found that
 the solution is amazingly simple:
      DON'T USE GOTO'S
 The Delila system programs use only one GOTO, in a procedure named HALT
 which terminates the program by jumping to the end of the program. This
 is necessary because Pascal does not provide for a program abort procedure.
 (Pascal HALT is not standard.)  There are NO other circumstances when a
 GOTO is required!!
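      A halt procedure of this kind can be sketched in standard Pascal
 as follows.  The label number and the messages are arbitrary choices
 for the illustration, not a copy of the Delila procedure.  Compilers
 differ on whether a jump out of a procedure is permitted, so this may
 need adjusting for yours.

```pascal
program haltdemo(output);
(* Sketch of a halt procedure built on a single goto. *)
label
   1; (* marks the end of the program *)

procedure halt;
(* stop the program by jumping to the end of the main program *)
begin
   writeln(output, ' program error');
   goto 1
end;

begin (* haltdemo *)
   writeln(output, 'starting');
   halt;
   writeln(output, 'this line is never written');
1:
end.
```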

 A METHOD FOR WRITING PROGRAMS

      This is what I do when I write a program:  I have a stack of old
 computer paper (or standard size paper, not printer size).  I write
 one procedure on each sheet.  An entire procedure is "no longer than"
 one page.  In fact, any procedure longer than a page is usually
 a warning that I need more procedures.  It is not necessary at first
 to write the details of every procedure, only to define the
 procedures.  Starting from the top I work down a ways, realize that I
 need a set of primitive procedures (e.g. to manipulate text lines)
 so I define them, but the way they work can be written later.  So
 as the highest levels of the program are formed, the lower levels
 are defined.  Eventually it is time to write details of the lower
 levels.  Sometimes the higher level can be simplified as the lower
 levels become clearer.
      As you can tell from this description, one begins from the top, but
 the entire structure changes as one goes.  Don't be afraid to toss
 out a procedure that's no good - it's only one page and the paper
 can be recycled.

      The last point is important:  be flexible.  Don't keep banging your
 head against a logical dilemma.  I have often outlined a whole
 program - and then tossed it out because there was a
 better solution.  Learn when to drop.  Clues: you find yourself
 trying to do many things at once; the primitive procedures that
 you have devised are awkward to use; and you find it impossible to
 document a procedure.

      Document a procedure??


 DOCUMENTATION: The Key To Immortal Code

      Even in a high level language like Pascal, it is possible to have a
 functioning program that is not easy to understand.  To define a procedure
 I often write down the name of the procedure, the variables (pieces of
 information to be manipulated) that it uses and then a few English sentences
 that define exactly how the variables are to be used.  This is all one needs
 for the higher levels of the outline.  Those written sentences are called
 comments.  They are part of the documentation required to make the program
 easy to write and ... easy to read.

      It is impossible to overemphasize the importance of documentation
 because nobody EVER does enough (me included).
      If you don't document, within a short time (e.g. a month to half
 a year) you will have forgotten the details of the program - and it will be
 painful to figure it out again.  Worse than that - nobody else will be
 able to work with it!
      It is not hard to write out what you are trying to do in a particular
 section of code or procedure, and it has a real advantage: one is
 forced to think clearly.
      There are several places in a program that ought to have comments:

 PROGRAM STATEMENT - the program should state its purpose in life, how it
 should be used, who wrote it and the date of the latest version.  Some
 technical details can be included.

 CONSTANTS - Include a constant called VERSION and CHANGE THIS EVERY TIME
 THAT YOU CHANGE THE SOURCE CODE.   Write the version to all output from the
 program.  This will assure that all output can be unambiguously
 associated with a particular version of the program.  This will save you
 many headaches! (Note: some computers keep track of file versions.
 FILE VERSIONS WILL NOT SUBSTITUTE FOR AN INTERNAL CONSTANT because
 the program output is not affected and it is not transportable.)

 All CONSTANTS, TYPES and VARIABLES should have a short description of
 their purpose.  DON'T USE ONE VARIABLE FOR TWO PURPOSES - you will
 be unable to document these cases properly and the code will be
 confusing.

 Each PROCEDURE or FUNCTION should have a short description that
 tells how to use it and gives the purpose of each passed variable.
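
      The places listed above can be seen together in a small sketch.
 The program CIRCLE, its version number and its contents are made up
 for the illustration.

```pascal
program circle(output);
(* circle: print the area of a circle of fixed radius.
   A made-up example that shows where comments belong.
   The author and the date of the latest version would be given here. *)

const
   version = 1.01; (* of circle; change with every change to the source *)
   pi = 3.14159;   (* ratio of circumference to diameter *)

var
   radius: real; (* radius of the circle, in centimeters *)

function area(r: real): real;
(* return the area of a circle of radius r *)
begin
   area := pi * r * r
end;

begin
   (* the version goes to the output so results can be traced back *)
   writeln(output, 'circle ', version:4:2);
   radius := 2.0;
   writeln(output, 'area = ', area(radius):8:3)
end.
```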


 *****************************************************************************
 *    SUMMARY: programming is vastly simplified by using two simple tactics: *
 *             topdowning and documentation.                                 *
 *****************************************************************************


      A NOTE ON DATA STRUCTURES
      Higher level languages, such as Pascal (but not FORTRAN) allow one to
 describe data in forms (structures) that resemble the way one thinks
 about the problem.  To take advantage of these facilities, it pays to
 name each "variable" (a structured box into which data is put) and "type"
 (the structure of the box) carefully.  A good name will make
 operations on the variable obvious, and errors will stand out because
 they will "sound" wrong.


 LOCATING ERRORS: Debugging
      Even with top down programming and documentation, errors are made.
 These are called "bugs".  There are several kinds:
      SYNTAX - the compiler will yell at you for things like spelling mistakes
      BOMBING - the program stops abruptly when it should not
      LOGIC - the program produces strange results
      SUBTLE - the program can't handle certain rare conditions correctly

 SYNTAX - It helps to check what you type in.  Since I put one procedure
 per hand written page, this is the easiest unit to check.  Many subtle
 bugs can also be caught this way.

 BOMBING - It is often obvious where the program died.  Work backwards through
 the logic to find the error.  Clear, top-down code makes this much easier:
 one can often tell immediately where the problem is.  Tracing also can
 help.  See below.

 LOGIC and SUBTLE - Some computer systems allow one to trace the path that
 the computer follows through a program.  So far I have not found these
 useful because they are cumbersome and they put out too much data.
 A few well placed write statements will trace the program flow quite well.
 (A "write statement" could print the value of a variable out for you and
 tell you where the computer currently is in the program.)
 In Pascal, one method is to make a global constant:
      DEBUGGING = TRUE; (* FOR DEBUGGING PURPOSES *)
 and use it this way:
      IF DEBUGGING THEN WRITELN(OUTPUT, 'BEGIN PROCEDURE CIRCLE');
 By changing the value of DEBUGGING one can turn the trace on and off.
 To turn off an individual trace point, one can "comment it out":
      (* IF DEBUGGING THEN WRITELN(OUTPUT, 'BEGIN PROCEDURE CIRCLE'); *)
 The symbols "(*" and "*)" will make Pascal ignore the contents,
 because they become comments.  The advantage of this over removing the
 statement is that it allows one to reactivate it easily.

 By far, the most time saving method is to write clear, well documented code.


 TESTING CODE
      It is often worthwhile to test a program on a small set of examples that
 one has worked out by hand.  You should be aware, however, that correct
 answers to tests do not prove that the program is correct.  (This may
 seem obvious, but it is an easy mistake to make.)  Sometimes one can
 prove the correctness of a program.  This is a current field of research
 in computer science.


 HOW TO READ MANUALS
      Obtain your own copy of the manual and begin to read.  Get a general idea
 of how the language, editor or system works.  Don't worry about details
 yet.  As soon as you have an idea about how to do something, try it on
 the computer.  Play.  Later on, you can read through the manual seriously
 if you want.  However there is often a lot of detail that you would have
 to memorize.  It is simpler to know that something can be done (by reading
 it once lightly) and to look it up when you need to do it.


 WRITING TRANSPORTABLE PROGRAMS

      A program written for one computer may not run on another computer
 because the compilers for the two computers may not understand the
 same language.  Moving a program from one computer to another is called
 transportation.  If you are going to the trouble and effort to write a
 good program, then you may as well make it easy for other people to use
 it.  Your program would then be transportable.
      Obviously to be transportable, a program must be well written and
 documented.  That is not all.  You must avoid all the fancy "features"
 that your compiler advertises, because no one else has these.  If you
 are forced to use some feature, then isolate it to a few replaceable
 procedures.  We have provided you with a transportable(!) mechanism for
 replacing chunks of code like this - see the document MODDEF and the MODULE
 program.

 PROGRAM MAINTENANCE... SENILITY... AND DEATH.
      The most costly aspect of using computer programs is not their initial
 writing, but maintaining them once they are written.  This is well
 documented in the literature.  But why should a program need
 maintenance?  Aren't they fixed text that does not change?  In the
 simplest sense this is true.  But over time, bugs in the code are found
 and fixed, and needs and expectations change.  Programs are not
 static, they evolve.  Good programming techniques and documentation
 make maintenance easier during the lifetime of a program, but eventually the
 program becomes so hard to change that one must scrap it altogether
 and start a fresh design.  So programs have a birth, a life of use and
 maintenance and, finally, a senility before they die.

 REFERENCES
      "Pascal User Manual and Report", Second Edition, by Kathleen Jensen
      and Niklaus Wirth.  Springer-Verlag, 1978.

      "Software Tools in Pascal", Brian W. Kernighan and P. J. Plauger.
      Addison-Wesley Publishing Co. 1981.

      "Algorithms + Data Structures = Programs", Niklaus Wirth.
      Prentice-Hall, Inc., 1976.

      "Structured Programming", O. J. Dahl, E. W. Dijkstra and C.A.R. Hoare,
      Academic Press. London, 1977.

      "Selected Writings on Computing: A Personal Perspective",
      E. W. Dijkstra, Springer-Verlag, New York, 1982.

(* end module delman.program.essay *)
(* begin module delman.program.fable *)


      A Fairy Tale For Programmers


      The Three Most Important Concepts
         for Writing Good Code

 1. Put comments in your code.

 2. Don't ever forget that six months from now your program
    will be useless even to you without comments.

 3. Several people who published a rather well known article on
    using computers to study sequences (and whose names shall remain unsaid
    to protect the guilty) sent their programs to us two years after they
    had published their article.  It turned out that we could not use
    their programs directly because we did not have available the language
    that they used.  It was necessary to translate each line of code into
    our language before we could use their program.  Ok, fine, we know how to
    do that.  But despite the fact that these were old programs that they had
    been working on for a long time, there were almost no comments in
    their code.  That made the translation 100 times more difficult!!
    One sees an equation in the code - what does it mean?  If they do
    something in a funny way, was it a mistake or is it important to
    do it that way?  What a headache!
    We threw out their programs and wrote our own.


          MORAL: Code that is not documented in English will
                 not survive in the long run.  Therefore:
          Put In Comments.
          Comment As You Code, NOT AFTERWARDS - Comments Are Part Of The Code.
          Change The Comments When You Change The Code, NEVER PUT THIS OFF.


      Epilogue
      Years later, out of curiosity, the program called CODE
 (COmment DEnsity) was written.  We were startled to discover that
 the frequency of characters devoted to comments in our code
 averages around 30 percent!


(* end module delman.program.fable *)
(* begin module delman.use *)


 UU    UU   SSSSSS   EEEEEEEE
 UU    UU  SS    SS  EE
 UU    UU  SS        EE
 UU    UU   SSSSSS   EEEE
 UU    UU        SS  EE
 UU    UU        SS  EE
 UU    UU        SS  EE
 UU    UU  SS    SS  EE
  UUUUUU    SSSSSS   EEEEEEEE


(* end module delman.use *)
(* begin module delman.use.intro *)

         Use Of The Delila System

      INTRODUCTION
      This section of the Delila Manual assumes that you have read the
 introduction to the manual, that a Delila System is running on your
 computer, and that you know how to get on the computer, to make
 files, to modify and correct files, and to run programs (See DELMAN.GUIDE.).

      There are several sources of information that you can keep in mind:
 1) The papers in DELMAN.INTRO.REFERENCES will show you
 how we have used the Delila System.
 2) LIBDEF.  This is a technical specification of Delila and the
 libraries.  However, there is a set of detailed examples that
 can be read profitably without reading all the definitions.
 3) The section of DELMAN called Program and Data Descriptions
 (DELMAN.DESCRIBE) lists everything that is available to you.  Whenever
 you want a tool to do something, that is the place to look.

      In this section we will first discuss the structure of a Delila Library
 and how you can find your pet (pet's?) sequence in it.  Next we
 describe how to tell Delila to go and fetch your sequences.  We will
 then discuss programs that let you study the sequences.  The sequence
 analysis will bring us back to Delila.

(* end module delman.use.intro *)
(* begin module delman.use.structure.1 *)

      LIBRARY STRUCTURE

      Think about a tree.  The trunk spreads into a series of branches,
 sticks and twigs.  A Delila library looks something like that, except
 that there are several kinds of branch, stick and twig, much as each
 twig ends in a leaf, bud or a flower.
      We have given names to the kinds of branches and leaves in Delila
 libraries.  Near the trunk there are the ORGANISM and the
 RECOGNITION-CLASS.  An ORGANISM is a cluster of data pertaining to a
 real-world organism.  The term "organism" is somewhat ambiguous, so it
 is a matter of taste as to the classification of some creatures (is a
 virus a traveling plasmid?).  In our library T4, T7 and E. coli
 information is stored in ORGANISMs.
      A RECOGNITION-CLASS is a cluster of data about any process that
 recognizes specific nucleic-acid sequences.  These include chemical
 modification and restriction enzymes.  (At present this portion of
 the library is not fully implemented, so we will not discuss it further.)

      The library structure can be diagrammed in a schema:
         A-->>--B  means A has one or more of B.
         C--->--D  means C has one of D.

                             LIBRARY
                              :   :
                              V   V
                              V   V
                              :   :
                  ............:   :.............
                  :                            :
              ORGANISM                RECOGNITION-CLASS
                  :                            :
                  V                            V
                  V                            V
                  :                            :
              CHROMOSOME                       :
               : : : :                         :
               V V V V                         :
               V V V V                         :
               : : : :                         :
   ............: : : :.........                :
   :       ......: :....      :                :
   :       :           :      :                :
  MARKER  TRANSCRIPT  GENE   PIECE....       ENZYME
   : :     :           :     : : :   :         :
   V V     V           V     : : :   V         V
   : :     :           :.....: : :   :         :
   : :     :...................: :   :         :
   : :...........................:   :         :
   :                                 :         :
  SEQUENCE                       SEQUENCE  SEQUENCE

(* end module delman.use.structure.1 *)
(* begin module delman.use.structure.2 *)

      In this schema you can see that ORGANISMs have one or more
 CHROMOSOME branches.  Once again, the term CHROMOSOME is intended to
 be somewhat flexible.  In Delila it means a complete biological
 unit of nucleic acid, either DNA or RNA.  For example, both ECOLI (the
 5 million base one) and PBR322 (the 4.3 kb plasmid) are CHROMOSOMEs.
      Notice that real-world chromosomes are "inside" their organism.  In the
 same way, one can think of CHROMOSOMEs as being inside their ORGANISM and
 ORGANISMs as being inside a library.  You may think of a Delila Library
 either as a tree or a series of objects, one nested inside the other.
 A little reflection will show that these are equivalent because one
 can convert from one form to the other.
      Every ORGANISM and CHROMOSOME has a name by which it can be identified.
 For example, T4 is the name of the coliphage of rII fame, while ECOLI
 is the name for Escherichia coli.  There is other information stored
 at these branch points as well.  An ORGANISM tells us the genetic map units
 used, such as centiMorgan or kilobasepair.  The CHROMOSOME goes on to
 specify the beginning and ending of the corresponding chromosome in
 the given units.
      Now we will delve inside a CHROMOSOME.  There are MARKERs,
 TRANSCRIPTs, GENEs and PIECEs.  What is going on?  So far we have
 been leaning toward a description of an ideal situation where all
 the nucleic-acid sequence information of a chromosome would be stored inside
 a single data object -- a PIECE.  Although this fits small phages such as
 PHIX174 and FD, it is nowhere near true even for ECOLI.  There are many
 disconnected fragments of E. coli sequence now known.  As sequencing progresses,
 the fragments will connect more and more until the entire sequence is known.
 So a PIECE may be either the entire sequence information in a CHROMOSOME
 or only one of many fragments.  In this way we can store sequences
 in their natural arrangement, and still accommodate data that is
 fragmented due to technical limitations.  As more sequence is obtained,
 the SEQUENCE inside a PIECE is extended or fused to neighboring PIECEs.
      Like all the other library objects, a PIECE has a name, usually related
 to its biological functions.  To keep all the fragments straight, each
 PIECE tells its location on the genetic map.  The nucleic-acid
 sequence is stored inside a SEQUENCE, written 5' to 3'.  Besides these
 data, each PIECE stores a useful set of information: a
 coordinate system.
      For the purposes of identification, every published sequence is given
 a set of consecutive integers corresponding to basepairs or bases
 along the DNA or RNA sequence.  This numbering scheme is captured
 in the coordinates of each PIECE.  Using Delila, subfragments of a
 PIECE can be easily obtained.  These are also PIECEs and every base
 in the new PIECE has the same number that its parent did.  This has
 WONDERFUL consequences:  every printout can refer to the original
 published literature.  It is also easy to compare the results from
 several analyses.
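
      As a rough illustration of why inherited numbering is convenient, here
 is a minimal sketch in Python (it is not part of the Delila System).  The
 class Piece and its methods are hypothetical names invented for this
 example, the bases are made up, and only a linear PIECE with ascending
 numbers is modeled.

```python
from dataclasses import dataclass


@dataclass
class Piece:
    """Hypothetical stand-in for a linear PIECE with ascending numbering."""
    beginning: int   # coordinate of the first base
    sequence: str    # bases written 5' to 3'

    @property
    def ending(self) -> int:
        return self.beginning + len(self.sequence) - 1

    def base(self, coordinate: int) -> str:
        """Return the base that carries the given coordinate."""
        if not self.beginning <= coordinate <= self.ending:
            raise ValueError("coordinate is outside this PIECE")
        return self.sequence[coordinate - self.beginning]

    def sub(self, first: int, last: int) -> "Piece":
        """Cut out a subfragment; its bases keep the parent's numbers."""
        if not self.beginning <= first <= last <= self.ending:
            raise ValueError("requested range is outside this PIECE")
        start = first - self.beginning
        stop = last - self.beginning + 1
        return Piece(beginning=first, sequence=self.sequence[start:stop])


parent = Piece(beginning=1, sequence="ACGTTGCAAGCTTACG")  # made-up bases
child = parent.sub(5, 12)
print(child.beginning, child.ending, child.base(9) == parent.base(9))
```

 Because the child starts its numbering at 5 rather than at 1, asking either
 object for base 9 gives the same answer.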

(* end module delman.use.structure.2 *)
(* begin module delman.use.structure.3 *)

      Let's move on to the GENE, one of the other data-objects inside a
 CHROMOSOME.  A GENE defines the endpoints of the genetic information
 of a protein in the SEQUENCE of a PIECE.  For example, in ORGANISM ECOLI;
 CHROMOSOME ECOLI there is a PIECE LAC.  The GENE LACI refers to this
 PIECE by pointing to the first G of the GTG and the A of the TGA.
      A TRANSCRIPT is similar to a GENE, but it defines any region
 transcribed into mRNA.  For consistency, we consider a tRNA to be a
 TRANSCRIPT and not a GENE.  GENE is reserved for the coding sequence
 of polypeptide products.
      Suppose that a mutation is known for your favorite sequence.  The
 MARKER is designed to record the change made by the mutation.
 MARKERs can also record splice junctions and other interesting
 sequence features.  In the future Delila will allow one to obtain
 both a sequence and its mutated forms using MARKERs.

      Notice that MARKERs, TRANSCRIPTs and GENEs all refer or point to
 a particular PIECE.  Each PIECE therefore has a "family" of related
 branches.  It is here that the tree-like structure of the library
 begins to break down: some of the branches are connected to one
 another in a kind of network.

      Now it is time to become practical.  Obtain a copy of HUMCAT.  This
 is a catalogue of the library, the HUMan's CATalogue.  (Delila also
 has one for herself).  Look around HUMCAT.  Notice that it is
 organized by ORGANISM, CHROMOSOME, and so forth.  Find a GENE or
 TRANSCRIPT that you are interested in.  In the next section you
 will learn how to obtain it to play with.

(* end module delman.use.structure.3 *)
(* begin module delman.use.language.1 *)

      DELILA - THE LANGUAGE

      WHY WRITTEN INSTRUCTIONS?
      One of our major design decisions was the use of written instructions
 for the librarian.  While we realize that this is somewhat forbidding
 to a new user, it does have several advantages over direct interactive
 use.  One is that it is easier to correct mistakes in the list of
 sequences that are to go into the book than it is to change sequences by
 hand.  Corrections to instructions are done with a text editor.  Also, the
 amount of information necessary to obtain a fragment of sequence is usually
 less than the information in the sequence itself, so storing instructions
 instead of sequences is efficient.  Another advantage is that a complete
 and concise record may be kept.  As we will see later, the instructions can
 also be generated by auxiliary programs, allowing one to automate many
 complex manipulations.

      WHAT IS THE DELILA LANGUAGE?
      This section describes the use of the language Delila:
         DEoxyribonucleic-acid
           LIbrary
             LAnguage.
      The language is not as complex or comprehensive as a natural language
 such as English or French.  It was designed for a particular task:
 telling a nucleic-acid data base manager - the librarian - the set of
 fragments that one wants to collect for study.  (The name Delila is an
 anachronism that we can't bear to part with...)
      Since the library is structured like a tree, the language must allow
 one to specify individual branches.  Eventually a particular PIECE
 will be identified, and one can request one or more fragments from
 the PIECE.  Let us look at an example:

      TITLE "EX1: THE LACI GENE";
      ORGANISM ECOLI;
         CHROMOSOME ECOLI;
            GENE LACI;
            GET ALL GENE;

 (Note: this instruction set is kept in the file EX1IN, so you can
 try it.  All EXn examples are sent with the Delila System.)

      Statements in Delila end with a semicolon (;) - there are five
 statements above.  The first statement will give a title to the book.
 The next three specify a particular GENE in the library structure.
 One thinks of this as a series of steps climbing the library tree.
 Starting at the "root" of the library, we first named the ORGANISM
 ECOLI.  This moves us out to that ORGANISM.  Then the CHROMOSOME
 was chosen to be ECOLI - the main chromosome (as opposed to a
 plasmid such as PBR322).  Next, the particular gene, lacI, is
 specified by "GENE LACI;".
      As we noted in the section on structure, GENEs point to the
 particular PIECE that they reside on.  GENE LACI points to the PIECE LAC.
 Although we need not know this for the request, Delila knows it
 automatically.  When the GET is performed, Delila will obtain the
 sequence of lacI from the G of the GTG through the A of the TGA.
      After Delila has read each of these statements, the information
 about the object (ORGANISM, CHROMOSOME or GENE) is put into the
 book.  The GET generates a PIECE that is also placed into the book.

(* end module delman.use.language.1 *)
(* begin module delman.use.language.2 *)

      TRY IT OUT
      Type a file containing Delila instructions that specify the gene
 you chose at the end of the section on library structure.  For this
 discussion, we will use the name EX1IN, although you may use another
 name.  Find the entry on Delila (DESCRIBE.DELILA) in the back of this
 manual and run it:
      delila(ex1in,ex1bo,ex1dl)
 Look at the ex1dl file. This is the Delila Listing.  The first
 line will look like this:
   82/01/21 23:17:51     DELILA 1.20     PASS 1           PAGE 1
 Delila performs two passes through the instructions.  Pass 1 checks for
 spelling and syntax errors.  If you made a typing mistake, it will be noted
 in the listing and Delila will not begin Pass 2.  Should Pass 1 be
 successful, then Pass 2 begins.  Notice that there are several lines that look
 something like this:
 * 81/01/18 22:29:26, 80/11/19 22:17:46, LIBRARY 1: BACTERIOPHAGE
 * 81/01/18 22:29:26, 80/11/19 22:17:46, LIBRARY 2: E. COLI AND S. TYPHIMURIUM
 These are the full titles of the libraries from which you are pulling
 sequences.  Each title has three parts separated by commas:
      1) the instant (date and time in descending order) that the library
         was created.
      2) the instant that the PARENT of this library was created.
      3) the title of the library.
 Notice that Delila also prints the current date and time at the top
 of the listing (if your system has these functions).  The first line of a
 book or library contains its full title.  For this example, this is:
 * 82/01/21 23:17:51, 81/01/18 22:29:26, EX1: THE LACI GENE
 What is the "genealogy" of the book that you obtained?
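
      One way to trace such a genealogy mechanically is to split each full
 title into its three parts and compare the instants.  The following Python
 sketch does this for the two title lines shown above.  The function name
 and the field names are invented for this example; it is not a description
 of how Delila itself reads titles.

```python
def parse_full_title(line: str) -> dict:
    """Split a full title line into its three comma-separated parts."""
    body = line.strip()
    if body.startswith("*"):
        body = body[1:]
    # Split on the first two commas only, so that a title which itself
    # contains commas is kept whole.
    created, parent_created, title = (
        part.strip() for part in body.split(",", 2))
    return {"created": created,
            "parent_created": parent_created,
            "title": title}


library = parse_full_title(
    "* 81/01/18 22:29:26, 80/11/19 22:17:46, "
    "LIBRARY 2: E. COLI AND S. TYPHIMURIUM")
book = parse_full_title(
    "* 82/01/21 23:17:51, 81/01/18 22:29:26, EX1: THE LACI GENE")

# The book's parent instant equals the library's creation instant.
print(book["parent_created"] == library["created"])
```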
      Back to the listing, Pass 1.  The instructions that you typed are
 repeated on the listing.  To the left are two columns of numbers -
 the leftmost is the line number and the next is the statement number
 (there can be several statements on one line or one line may contain
 only part of a statement).  This information is sometimes useful.
      Now let's look at the listing, Pass 2.  Notice that the instructions
 that you typed are repeated again, but that there are extra lines
 inserted.  In Pass 1 Delila checked for typing errors, while in Pass
 2 Delila pulls out data items and places them into the book.  As
 each item is put into the book, it is given a number:
      2     2        ORGANISM ECOLI;
                                  #1
 This is useful for some auxiliary programs.  We will discuss control of
 the numbering in a later section.
      If your instructions worked then there will be two other numbers just
 below the get:
      5     5              GET ALL GENE;
                             #4
                                      ^29^1111
 These numbers show you the numbers of the beginning base (29) and
 the ending base (1111) for the PIECE put into the book.


(* end module delman.use.language.2 *)
(* begin module delman.use.language.3 *)

         RANGE DEFAULTS
      It is quite possible that you got an error message at this point:

      4     4           GENE LACZ;
      5     5              GET ALL GENE;
                             #4
                                      ^1234^100000
 ---ERROR(S)---------------------------^206^203
 203: OUT OF RANGE AND DEFAULT RANGE = HALT
 206: WE DO NOT KNOW THIS LIMIT (A WARNING)

 This indicates that only part of the gene you are interested in
 exists in the library.  Delila detects the fact that one end of
 the GENE goes off the end of its PIECE, and says that this limit (the
 end of the gene) is unknown.  (This is indicated by the 100000.)  Normally
 Delila will HALT when this situation is discovered.  You can change this by
 using the instruction:
      DEFAULT OUT-OF-RANGE REDUCE-RANGE;
 anywhere before the problem but after the TITLE.  This resets the default
 response to an out of range situation.
      In REDUCE-RANGE mode, Delila will attempt to find the closest edge
 of the PIECE and use that.  The listing will show a record of what
 Delila does:
      6     6              GET ALL GENE;
                             #4
                                      ^1234^100000^1419
 ---ERROR(S)---------------------------^206^208

 206: WE DO NOT KNOW THIS LIMIT (A WARNING)
 208: OUT OF RANGE AND DEFAULT RANGE = REDUCE (A WARNING)
 In this case the PIECE in the book begins at 1234 and ends at 1419.
      To cause Delila to continue without putting any PIECE down in the book
 one would use:
      DEFAULT OUT-OF-RANGE CONTINUE;
 You may use several default statements to affect how Delila responds.
 To reset the default to halting, use HALT instead of CONTINUE or
 REDUCE-RANGE.  (See DELMAN.USE.CONTROL)
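
      The three responses can be summarized in a small Python sketch.  This
 is only an illustration of the behavior described above, not Delila's own
 code.  The function and exception names are invented, only an ascending
 request on a linear PIECE is modeled, and the PIECE is assumed to begin at
 1 (only its end, 1419, appears in the listing).

```python
class OutOfRange(Exception):
    """Raised to imitate Delila halting on an out-of-range request."""


def apply_range_default(first, last, piece_first, piece_last, default="HALT"):
    """Return the (first, last) pair to use, or None if nothing is taken.

    A request that misses the PIECE entirely is not handled here.
    """
    inside = piece_first <= first and last <= piece_last
    if inside:
        return first, last
    if default == "HALT":
        raise OutOfRange("request is out of range and default is HALT")
    if default == "CONTINUE":
        return None  # skip this PIECE, keep processing later instructions
    if default == "REDUCE-RANGE":
        # Clip each end to the closest edge of the PIECE.
        return max(first, piece_first), min(last, piece_last)
    raise ValueError("unknown default: " + default)


# PIECE limits 1..1419 are assumed; see the note above.
print(apply_range_default(1234, 100000, 1, 1419, "REDUCE-RANGE"))
```

 With REDUCE-RANGE the request from 1234 to 100000 is clipped to end at 1419,
 as in the listing above.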


      Use the programs COUNT and LISTER to look at your book.

(* end module delman.use.language.3 *)
(* begin module delman.use.language.4 *)

      MORE ON INSTRUCTIONS
      There are several ways to obtain sequences in a book.  For example
 one could use:

      TITLE "EX2: AN ABSOLUTE GET";
      (* FIRST WE WILL SPECIFY THE LAC PIECE: *)
      ORGANISM ECOLI; CHROMOSOME ECOLI; PIECE LAC;
      (* NEXT WE WILL REQUEST A PARTICULAR FRAGMENT OF THAT PIECE: *)
      GET
         FROM 29  (* THE BEGINNING ABSOLUTE POSITION *)
         TO 1111; (* THE ENDING ABSOLUTE POSITION *)

 There are several things to note about these instructions.  First, there
 are five instructions and four comments.  A comment is the text between
 a (* and a *).  You should use comments freely to document what you
 are doing.  This is made easy by the fact that comments can extend over
 several lines.  Delila ignores comments.
      Several instructions can be put on one line (the specifications, above)
 and one instruction can be spread over several lines (the request).
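
      To make the layout rules concrete, the following Python sketch strips
 the comments from EX2 and splits what is left at the semicolons.  It is a
 simplified illustration and not Delila's parser: it assumes that quoted
 titles contain no semicolons or comment delimiters and that comments are
 not nested.  The function name is invented for this example.

```python
import re

COMMENT = re.compile(r"\(\*.*?\*\)", re.DOTALL)


def split_instructions(text: str):
    """Return (statements, comments) found in a block of Delila text."""
    comments = [c[2:-2].strip() for c in COMMENT.findall(text)]
    without_comments = COMMENT.sub(" ", text)
    # Collapse the white space inside each statement so that a statement
    # spread over several lines reads as one line.
    statements = [" ".join(part.split())
                  for part in without_comments.split(";")
                  if part.strip()]
    return statements, comments


EX2 = '''
TITLE "EX2: AN ABSOLUTE GET";
(* FIRST WE WILL SPECIFY THE LAC PIECE: *)
ORGANISM ECOLI; CHROMOSOME ECOLI; PIECE LAC;
(* NEXT WE WILL REQUEST A PARTICULAR FRAGMENT OF THAT PIECE: *)
GET
   FROM 29  (* THE BEGINNING ABSOLUTE POSITION *)
   TO 1111; (* THE ENDING ABSOLUTE POSITION *)
'''

statements, comments = split_instructions(EX2)
print(len(statements), len(comments))
print(statements[-1])
```

 The counts agree with the five instructions and four comments noted above,
 and the request, although typed on three lines, comes out as one statement.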
      The GET above defines two basepairs in the LAC sequence.  The sequence
 between (and including) these bases is put into the book.   Delila always
 puts sequence in the book 5' to 3'.  Thus to get the complementary strand
 of the fragment requested above, one simply reverses the coordinates:
      GET FROM 1111 TO 29;


      RELATIVE VERSUS ABSOLUTE REQUESTS
      In contrast to EX2 we could write:

      TITLE "EX3: A RELATIVE GET";
      ORGANISM ECOLI; CHROMOSOME ECOLI; GENE LACI;
      GET FROM GENE BEGINNING
            TO GENE ENDING;

 In this case we did not state absolute numbers to define our book.
 Yet in all three examples (EX1, EX2, and EX3) the same PIECE will be
 generated in the book.
      There are two ways to define a base in a sequence.  One is to give
 its exact coordinate as in EX2.  That is called an ABSOLUTE reference.
 The other way is to define the distance from a fixed point, as in
 EX3: a RELATIVE reference.
      Both absolute and relative referencing have advantages and disadvantages.
 Using absolute coordinates allows us to pinpoint particular bases.  However,
 Delila libraries evolve over time, and when two previously separate
 PIECEs are fused, only one coordinate system is kept.  An absolute
 reference will not last.  On the other hand, a relative reference
 will last because the GENE BEGINNING will always be the start of the
 gene no matter what happens to the actual coordinate system.
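
      The difference can be shown with a few lines of Python.  This is a
 sketch under stated assumptions, not Delila code: the function resolve is
 invented, the landmark values 29 and 1111 are those of GENE LACI in EX1 and
 EX2, and the shift of 500 is a purely hypothetical renumbering standing in
 for a fusion of two PIECEs.

```python
def resolve(position, landmarks):
    """Turn a position into an absolute coordinate.

    position is either an int (an absolute reference) or a tuple
    (landmark_name, offset) such as ("GENE BEGINNING", -10).
    """
    if isinstance(position, int):
        return position
    name, offset = position
    return landmarks[name] + offset


# Landmarks for GENE LACI on PIECE LAC, taken from EX1 and EX2.
before = {"GENE BEGINNING": 29, "GENE ENDING": 1111}

# Hypothetical renumbering: every coordinate is shifted by 500.
SHIFT = 500
after = {name: value + SHIFT for name, value in before.items()}

relative = (("GENE BEGINNING", 0), ("GENE ENDING", 0))
absolute = (29, 1111)

for label, landmarks in (("before", before), ("after", after)):
    rel = tuple(resolve(p, landmarks) for p in relative)
    ab = tuple(resolve(p, landmarks) for p in absolute)
    print(label, "relative ->", rel, "absolute ->", ab, "same:", rel == ab)
```

 Before the renumbering both forms name the same bases.  Afterwards the
 relative form still follows the gene, while the absolute numbers point
 somewhere else.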

(* end module delman.use.language.4 *)
(* begin module delman.use.language.5 *)

      FORMS OF REQUESTS

      By now you may have noticed that there are two kinds of GET:
      GET ALL ... ;
      GET FROM ... TO ... ;
 The two positions of the FROM-TO form are independent as long as
 one refers to locations on the same PIECE.  In absolute terms one
 can say
      GET FROM -22 TO 56; (* ABSOLUTE *)
 or one can make it relative to a gene beginning:
      GET FROM GENE BEGINNING - 10
            TO GENE BEGINNING +  5;
 One can even write instructions relative to an absolute location:
      GET FROM 56 - 10 TO 56 + 5;
 This is to be pronounced "get from fifty-six minus ten to fifty-six plus
 five".  We will come back to this form later.

      MARKERs, GENEs, TRANSCRIPTs and PIECEs all have a BEGINNING and an
 ENDING that you can use.  For example,

      TITLE "EX4: NON-CODING LAC LEADER";
      ORGANISM ECOLI; CHROMOSOME ECOLI;
      GENE LACZ; (* NOW DELILA KNOWS THE PIECE *)
      TRANSCRIPT LACZ;
      GET FROM TRANSCRIPT BEGINNING
            TO GENE BEGINNING -1;

 Notice that both a GENE and a TRANSCRIPT can be specified at the
 same time.


      AMBIGUOUS DIRECTIONS
      Consider the circular genome of ORGANISM G4.  The numbering of the
 PIECE is from 1 to 5577.  Suppose that you asked for:
      TITLE "G4 COORDINATE PUZZLE";
      ORGANISM G4; CHROMOSOME G4; PIECE G4;
      GET FROM 1 TO 10;
 This is ambiguous!  There are TWO PIECES that run from 1 to 10:
 one clockwise and the other counterclockwise.  In this case Delila
 will supply you with the clockwise fragment.  However to be more
 specific in one's request, one would write:
      GET FROM 1 TO 10 DIRECTION +;
 or
      GET FROM 1 TO 10 DIRECTION -;
 But there are still two other possibilities!
      GET FROM 10 TO 1 DIRECTION +;
      GET FROM 10 TO 1 DIRECTION -;
 Delila is capable of handling most requests like these.  (Certain
 of the most complex cases remain to be solved.)
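
      To see how different the four requests are, one can simply count the
 coordinates that each one passes through.  The Python sketch below does
 this for a circle numbered 1 to 5577.  It models only the numbering (not
 which strand is returned), the function name is invented, and it is not a
 description of how Delila resolves such requests.

```python
def circular_walk(start, stop, direction, first=1, last=5577):
    """List the coordinates visited going from start to stop on a circle.

    direction is "+" (ascending numbers) or "-" (descending numbers).
    The numbering wraps between last and first.
    """
    if direction not in ("+", "-"):
        raise ValueError("direction must be '+' or '-'")
    if not (first <= start <= last and first <= stop <= last):
        raise ValueError("start and stop must lie on the circle")
    step = 1 if direction == "+" else -1
    size = last - first + 1
    coordinates = [start]
    position = start
    while position != stop:
        # Move one base, wrapping around the ends of the numbering.
        position = (position - first + step) % size + first
        coordinates.append(position)
    return coordinates


for start, stop in ((1, 10), (10, 1)):
    for direction in "+-":
        path = circular_walk(start, stop, direction)
        print(f"GET FROM {start} TO {stop} DIRECTION {direction}; "
              f"-> {len(path)} bases")
```

 Two of the requests cover only the ten bases from 1 to 10, while the other
 two run the long way around the circle.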

(* end module delman.use.language.5 *)
(* begin module delman.use.language.6 *)


      RESPECIFICATION
      What if one wanted to specify more than one "leaf" (GENE, TRANSCRIPT,
 or MARKER) at one time?  Then one would use:

      TITLE "EX5: THE REGION BETWEEN LACI AND LACZ";
      ORGANISM ECOLI; CHROMOSOME ECOLI;
      PIECE LAC; (* NOW DELILA KNOWS THE PIECE *)
      GET FROM (GENE LACI) ENDING + 1 TO (GENE LACZ) BEGINNING - 1;

 This form is called a "respecification", to distinguish it from
 a specification.


      MULTIPLE REQUESTS
      After Delila has completed a GET, as in the last few examples, the
 specifications are still in effect and one can do more GETs,
 change the specification, do more GETs, and so on:

      TITLE "EX6: MULTIPLE SPECIFICATION AND REQUESTS";
      ORGANISM ECOLI;
         CHROMOSOME PBR322;
            GENE AMPR; GET ALL GENE; (* GET GENE OF BETA-LACTAMASE *)
         CHROMOSOME ECOLI; (* CHANGE SPECIFICATION *)
            TRANSCRIPT 16SRRNAB; GET ALL TRANSCRIPT; (* 16S RRNA *)
            TRANSCRIPT 23SRRNAB; GET ALL TRANSCRIPT; (* 23S RRNA *)
      ORGANISM PHIX174;
         CHROMOSOME PHIX174;
            (* GET TWO OVERLAPPING GENES *)
            GENE A; GET ALL GENE;
            GENE B; GET ALL GENE;


      WHEN DOES DELILA ACT?
      During Pass 2, Delila places the various items into the book.  Thus
 as ORGANISM, CHROMOSOME, GENE or TRANSCRIPT instructions are read,
 they are executed immediately.  This is not true for the PIECE in the
 example EX2 because at that point Delila does not know the endpoints
 of the sequence desired.  Delila "knows" which PIECE you are interested
 in, but not what particular bases.  When Delila reads the GET, the bases
 become apparent.  You can see this in the Pass 2 listing:  a PIECE
 is not given a number, rather the number is listed for the GET that
 generates the PIECE in the book.  The numbers are for objects in
 the book, not for those in the library.

(* end module delman.use.language.6 *)
(* begin module delman.use.auxiliary.programs *)

      AUXILIARY PROGRAMS: LISTER AND SEARCH

      In the section on language, we discussed how one can use Delila to
 generate books containing sequences one is interested in.  It is difficult
 to read the sequences in a book because they are in an awkward (from your
 viewpoint) compressed format.  In everyday use, we almost never look
 inside a book because there is a much easier way:  generate a fancy
 listing using the program LISTER.
      In the section on the Delila language you used LISTER to look
 at the books that you generated.  (If you have not done this, then
 you should do it now.)  Like other programs, LISTER will print
 sequence 5' to 3'.  If you want the complement, it is easy to use
 Delila to obtain it.
      LISTER is an example of an auxiliary program.  In contrast, Delila is
 the center of the Delila System.  The purpose of Delila is the
 manipulation of sequence information.  Other "auxiliary" programs
 perform tasks such as making listings or doing analyses.  These
 programs are explained in DELMAN.DESCRIBE.
      The only other auxiliary program that we will discuss here is the
 SEARCH program.  SEARCH will search a book for a simple pattern.  As
 you will recall, books have the same structure as libraries.  As
 SEARCH proceeds to look into an ORGANISM it will know the name of the
 ORGANISM:
      ORGANISM ECOLI;
 Then it will enter the CHROMOSOME:
      CHROMOSOME PBR322;
 Finally it begins to search a PIECE:
      PIECE PBR322;
 In other words, SEARCH can write Delila instructions that trace the
 search path.  Suppose that we had told SEARCH to search for the pattern
 5' AAGCTT 3' (HindIII).  We also tell it that the FROM should be -5 and
 the TO +10.  When SEARCH finds the site it can then write:
      GET FROM 29 -5 TO 29 +10 DIRECTION +;
 29 is the position of the first A of AAGCTT in PBR322.
 These Delila instructions are an answer to the search!
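
      The idea of a search that answers in Delila instructions can be
 sketched in Python as follows.  This is not the SEARCH program: the
 function name is invented, only the given strand is scanned, and the
 sequence is made up (it is not real PBR322 data) so that AAGCTT starts at
 coordinate 29.

```python
def search_to_instructions(sequence, pattern, from_offset, to_offset,
                           first_coordinate=1):
    """Return one GET instruction per occurrence of pattern in sequence."""
    instructions = []
    index = sequence.find(pattern)
    while index != -1:
        # Coordinate of the first base of the pattern.
        site = first_coordinate + index
        instructions.append(
            f"GET FROM {site} {from_offset:+d} "
            f"TO {site} {to_offset:+d} DIRECTION +;")
        index = sequence.find(pattern, index + 1)
    return instructions


# A made-up sequence built so that AAGCTT starts at coordinate 29.
toy = "ACGT" * 7 + "AAGCTT" + "GGCCGGCCGGCC"
for line in search_to_instructions(toy, "AAGCTT", -5, 10):
    print(line)
```

 For this toy sequence the sketch writes the same GET instruction as the one
 shown above.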

      You should try this and the other Auxiliary programs.


(* end module delman.use.auxiliary.programs *)
(* begin module delman.use.data.flow *)

      DATA FLOW AND DATA LOOPS

      In the section on Auxiliary programs we discussed the use of the
 SEARCH program to locate patterns in books.  The search results appear
 in three ways:  on the screen, in a file for printing, and as Delila
 instructions.  These instructions can be given to Delila to generate
 the sequences of found sites.  One can view this entire process as a
 flow of data between one program and the next.  Since this manual cannot
 have (nice) line figures, we strongly urge you to look at the flow
 figures in the published papers listed in DELMAN.INTRO.DESCRIPTION.
 Connecting parts of the Delila system together is much like playing
 with tinkertoys.
      Data flowing in the Delila system can pass through a program several
 times.  Our first example was the conversion of a book to a library and
 the subsequent extraction of book subsets.  The SEARCH program
 provides a more complex case where searching of a book generates
 Delila instructions that can be used to create a new book.  The new book
 is the set of located sequences.  This cyclic string of events is
 called a loop.

      Once you are acquainted with these data flow loops you can look at the
 SEPA program.  This program deals entirely with Delila instructions
 of the form:
      GET FROM 56 -40 TO 56 +60;
 along with ORGANISM, CHROMOSOME and PIECE specifications.  The
 SEARCH program produces instructions in this form.  SEPA is used to
 separate instruction sets.
      For example, suppose you are interested in all the AluI (5' AGCT 3')
 sites that are not part of PvuII (5' CAGCTG 3') sites.  You have used
 DELILA and SEARCH to generate two sets of instructions, ALUIMIX and
 PVUII.  You then can use SEPA to get the set that you want:
      SEPA(PVUII,ALUIMIX,PVUIIO,ALUI)
 PVUIIO would be a reorganized non-redundant list of the PvuII
 instructions, and ALUI would list all AluI sites that are not
 PvuII sites.  Both our second and third papers describe the way that
 we use SEPA.  (Note: to do a search like this one must be sure that the sites
 are numbered the same way.  The search rule for AluI would be #AGCT,
 while the search for PvuII would be C#AGCTG.  The # symbol tells SEARCH
 to write the number of the following base in the instructions.  This forces
 the SEARCH program to number the same A in the two cases.)
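
      The importance of numbering the same base can be seen in a short
 Python sketch.  It imitates only the idea of subtracting one set of sites
 from another; it is not how SEARCH or SEPA are implemented.  The function
 name and the sequence are made up for this example.

```python
def find_sites(sequence, rule, first_coordinate=1):
    """Return the set of coordinates reported for a rule such as 'C#AGCTG'.

    The '#' marks the base whose coordinate is reported.  A rule without
    '#' is treated as if '#' came first.
    """
    marked = rule.find("#")
    offset = max(marked, 0)          # bases that precede the numbered base
    pattern = rule.replace("#", "")
    sites = set()
    index = sequence.find(pattern)
    while index != -1:
        sites.add(first_coordinate + index + offset)
        index = sequence.find(pattern, index + 1)
    return sites


# Made-up sequence with one bare AluI site and one PvuII site.
toy = "TTAGCTTTTTCAGCTGTT"
alui_mix = find_sites(toy, "#AGCT")     # every AGCT, numbered at the A
pvuii = find_sites(toy, "C#AGCTG")      # PvuII, numbered at the same A
print(sorted(alui_mix), sorted(pvuii), sorted(alui_mix - pvuii))
```

 Had the PvuII rule been written without the # symbol, its site would have
 been numbered at the C, one base earlier, and the subtraction would have
 removed nothing.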


(* end module delman.use.data.flow *)
(* begin module delman.use.coordinates.1 *)

      THE COORDINATE SYSTEM OF A PIECE

      In the sections on library structure and the Delila language, we kept
 touching on the topic of coordinate systems for PIECEs.  Delila is
 required to maintain the numbering of sequence fragments, and a
 coordinate system is the means to do so.  This is not a simple problem,
 for one must handle both linear and circular genomes.  For the new
 user, it suffices to know that Delila can do that, and you could
 skip this section.


      Let us start with the simpler case, a linear PIECE.  The SEQUENCE
 in the library is numbered consecutively from 1 to 100.  So far so
 good.  We need to record three pieces of information:
      CONFIGURATION: LINEAR
      BEGINNING:     1
      ENDING:        100
 Any subset of the PIECE such as:
      GET FROM 40 TO 50;
 will also be linear and can be handled by these three variables.
 Notice that one could:
      GET FROM 50 TO 40;
 to obtain a complement.  In that case the BEGINNING is greater than
 the ENDING and the numbering decreases.
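
      A small Python sketch may help to picture the two requests.  It is an
 illustration only, with an invented function name and a made-up 100-base
 sequence, and it handles nothing but a linear PIECE with ascending
 numbers.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def get_linear(sequence, beginning, start, stop):
    """Return (bases, coordinates) for GET FROM start TO stop.

    sequence is stored 5' to 3' with ascending numbers from beginning.
    If start > stop the complementary strand is returned, also written
    5' to 3', and the coordinates descend.
    """
    low, high = min(start, stop), max(start, stop)
    ending = beginning + len(sequence) - 1
    if low < beginning or high > ending:
        raise ValueError("request is outside the PIECE")
    fragment = sequence[low - beginning: high - beginning + 1]
    if start <= stop:
        return fragment, list(range(start, stop + 1))
    # Reverse and complement so the other strand also reads 5' to 3'.
    reverse_complement = fragment.translate(COMPLEMENT)[::-1]
    return reverse_complement, list(range(start, stop - 1, -1))


piece = "ACGTACGTAC" * 10          # made-up 100-base PIECE numbered 1..100
print(get_linear(piece, 1, 40, 50))
print(get_linear(piece, 1, 50, 40))
```

 Both requests cover the same eleven positions.  Only the strand and the
 order of the numbers differ.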

      What if the CONFIGURATION is CIRCULAR?  Then based on our discussion
 about ambiguous directions, we should at least add a
      DIRECTION:    +
 for linear sub-fragments.  However the situation can be worse than that!
      Let us imagine a circular PIECE in the library.  It is numbered 1 to
 100 in the direction 5' to 3' of one DNA strand.  We then make a
 request:
      GET FROM 10 TO 90 DIRECTION -;
 The PIECE to be placed in the book is 21 bases long, with descending
 numbers, EXCEPT for a COMPLETELY UNPREDICTABLE DISCONTINUITY where
 the numbering jumps from 1 to 100.  Some more information about the
 "parent" coordinates must be stored.

(* end module delman.use.coordinates.1 *)
(* begin module delman.use.coordinates.2 *)

      The problem is to record the necessary coordinate information and to
 avoid becoming confused.  In the Delila System, the numbering of
 each PIECE has two parts: a COORDINATE part and a PIECE part.
      The COORDINATE part defines the location of a sequenced region on
 the genetic map.  Once that is established, the PIECE part tells what
 fragment is stored in the PIECE.  Both parts are transmitted to the
 book by Delila, but the coordinate part is fixed and unchanging while the
 PIECE part will vary depending on the fragment.  In summary so far:
      COORDINATE part = defines the relation of coordinates to the genetic map
      PIECE part = defines the relation of SEQUENCE to the COORDINATE part


 For the coordinate part:
 GENETIC MAP BEGINNING  This number locates the beginning nucleotide of the
 coordinate system on the genetic map.  We use these numbers to
 order the PIECEs in our Master library.

 The COORDINATE CONFIGURATION refers to the topological shape of the
 coordinates.  A linear genetic map could only have PIECEs with linear
 coordinates.  For a circular genetic map, circular coordinates may be
 chosen, but when only a portion of the sequence is known, each PIECE may be
 more conveniently handled as a linear coordinate system.

 A COORDINATE DIRECTION defines the orientation of the numbering system with
 respect to the genetic map.  + means "in the same direction as", - means
 "in the opposite direction as".

 The COORDINATE BEGINNING and COORDINATE ENDING nucleotides are integers
 that specify the limits of the coordinate system.  They are usually
 the ends of the largest known contiguous sequence.  The BEGINNING base
 corresponds to the genetic map beginning, the bases are consecutively
 numbered, and the ENDING is always greater than the BEGINNING number.

       The coordinate system described above provides a framework for stating
 the exact numbering of the SEQUENCE in a PIECE.  This also requires
 four items of information: configuration, direction, beginning and
 ending, all relative to the coordinate system.

 The PIECE CONFIGURATION may be circular only if the coordinate
 configuration is also circular.  When the coordinates are linear, the
 PIECE must also be linear.

 The PIECE DIRECTION may be + or - with respect to the coordinates,
 representing homology or complementarity to the coordinate system.

 The PIECE BEGINNING and ENDING are the numbers of the endpoints of the
 SEQUENCE.  Both must lie within the bounds set by the COORDINATE BEGINNING and
 ENDING.  The BEGINNING is always the 5' end of the molecule.

(* end module delman.use.coordinates.2 *)
(* begin module delman.use.coordinates.3 *)

      It turns out that this system handles all the confusing cases noted
 earlier.  To write out the nine values of coordinates we will keep
 this order:
      (GENETIC MAP BEGINNING,
       COORDINATE CONFIGURATION,
       COORDINATE DIRECTION,
       COORDINATE BEGINNING,
       COORDINATE ENDING,
       PIECE CONFIGURATION,
       PIECE DIRECTION,
       PIECE BEGINNING,
       PIECE ENDING)
 The linear piece that we began this section with would be:
      (1,LINEAR,+,1,100,LINEAR,+,1,100)
 (The GENETIC MAP BEGINNING and COORDINATE DIRECTION are arbitrary.)

 The first subset was "GET FROM 40 TO 50;":
      (1,LINEAR,+,1,100,LINEAR,+,40,50)

 The complement: "GET FROM 50 TO 40;" is:
      (1,LINEAR,+,1,100,LINEAR,-,50,40)


 The circular PIECE is:
      (1,CIRCULAR,+,1,100,CIRCULAR,+,1,100)
 The request
      GET FROM 10 TO 90 DIRECTION -;
 would make:
      (1,CIRCULAR,+,1,100,LINEAR,-,10,90)

      You should work out the results for the other three possible requests on
 this circular PIECE:
      GET FROM 10 TO 90 DIRECTION +;
      GET FROM 90 TO 10 DIRECTION +;
      GET FROM 90 TO 10 DIRECTION -;

 HINT: It helps to make diagrams.
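
      The worked tuples above follow a simple pattern, which the Python
 sketch below reproduces.  It assumes that the pattern of those examples
 generalizes: the fragment is linear, its direction is the requested one,
 and its beginning and ending are the FROM and TO values.  The parent PIECE
 is assumed to have direction +.  The class and function names are invented
 here, and this is not Delila's own algorithm.  You may find it useful for
 comparing against your own answers to the exercise.

```python
from typing import NamedTuple


class Coordinates(NamedTuple):
    map_beginning: int
    coord_configuration: str
    coord_direction: str
    coord_beginning: int
    coord_ending: int
    piece_configuration: str
    piece_direction: str
    piece_beginning: int
    piece_ending: int


def fragment_coordinates(parent, start, stop, direction=None):
    """Describe the fragment produced by GET FROM start TO stop.

    On linear coordinates the direction follows from the order of start
    and stop.  On circular coordinates it must be given, or "+" is
    assumed, matching the clockwise choice described for G4.
    """
    if parent.coord_configuration == "LINEAR":
        direction = "+" if start <= stop else "-"
    elif direction is None:
        direction = "+"
    # A fragment with two distinct ends is treated as linear.
    return parent._replace(piece_configuration="LINEAR",
                           piece_direction=direction,
                           piece_beginning=start,
                           piece_ending=stop)


linear = Coordinates(1, "LINEAR", "+", 1, 100, "LINEAR", "+", 1, 100)
circle = Coordinates(1, "CIRCULAR", "+", 1, 100, "CIRCULAR", "+", 1, 100)

print(tuple(fragment_coordinates(linear, 40, 50)))
print(tuple(fragment_coordinates(linear, 50, 40)))
print(tuple(fragment_coordinates(circle, 10, 90, "-")))
```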


      The catalogue program, described in DESCRIBE.CATAL, will list
 the coordinate systems for pieces of a book or library in tabular format.

(* end module delman.use.coordinates.3 *)
(* begin module delman.use.control.1 *)

      HOW TO CONTROL THE RESPONSES OF DELILA

      There are several situations in which Delila manipulates the information
 in a library in a way that may not always be what one wants.  That is,
 there are certain things that Delila does in the absence of any instructions.
 These default actions can be changed by using a special class of
 instructions - they are called default resets.  There are four basic
 kinds of default (as defined in LIBDEF) but we will discuss only
 three of them here.

 OUT-OF-RANGE DEFAULT
      We discussed this default in the section on the Delila language
 (DELMAN.USE.LANGUAGE).  A request may be outside the limits of a PIECE
 in a library for two reasons:
 1) The place is outside the coordinate system and is therefore
 unsequenced (Delila calls it "unknown").
 2) The place is within the coordinates, but the PIECE does not
 extend that far in the particular library being used.
      In either case, Delila's actions will be based on the RANGE default:
      DEFAULT OUT-OF-RANGE REDUCE-RANGE;
 Delila will attempt to find the nearest edges of the PIECE and use
 these.  (NOTE: there are known bugs associated with this process,
 although it works in almost all cases.)
      DEFAULT OUT-OF-RANGE CONTINUE;
 Delila will not place the requested PIECE in the book, and will
 continue to process any further instructions.
      DEFAULT OUT-OF-RANGE HALT;
 Delila will stop processing instructions.  The book will not be useable
 by auxiliary programs.
      In all cases, a warning message is put into the listing.

 KEY DEFAULT
      One can use this default to prevent the information about MARKERs,
 TRANSCRIPTs and GENEs from going into the book.  For example:
      DEFAULT KEY GENE OFF;
 will turn off printing of the GENE information.  The various data
 items in a library will contain free form notes about the object.
 (You can use the REFER program to look at these.)  This command can
 also be used to turn off the NOTEs when one wants to reduce the size
 of the resulting book.

(* end module delman.use.control.1 *)
(* begin module delman.use.control.2 *)

 NUMBERING DEFAULT
      In the section on language we discussed the numbering of the items going
 into a book.  This command is used to control the numbering.  One can
 turn it on or off:
      DEFAULT NUMBERING OFF; (* NOTHING FROM HERE ON WILL BE NUMBERED *)
 One can set numbering for particular items:
      DEFAULT NUMBERING PIECE; (* ONLY PIECES WILL BE NUMBERED *)
      DEFAULT NUMBERING TRANSCRIPT GENE; (* BOTH TRANSCRIPTS AND GENES
                                            WILL BE NUMBERED *)
 To make numbering more flexible, one can reset the number that the
 next item will get:
      DEFAULT NUMBERING 27; (* THE NEXT ITEM WILL BE NUMBERED 27 *)
 This default can be used to make sure that particular items will
 have the same numbers in different books.
      The number will be put into the notes of the item as the first line
 in the notes.  This allows them to be easily found by auxiliary
 programs.

 NOTE INSERTION
      One can put one's own notes into the next object placed in the book
 by using:
      NOTE "THIS IS THE REPLICATION ORIGIN FROM PHIX174";
      GET FROM ...
 Since this is not a default reset, it does not use the word "default".
 The new notes will follow the notes that were in the library.  By
 turning off notes from the library, and using note insertion, one can replace
 notes in a library.  Notes in PIECEs can be seen with program REFER.


      One can put these default or note insertion statements anywhere
 in a set of Delila instructions.  More details on these and other
 commands can be found in LIBDEF.


    All the defaults have initial values:

    default type       initial value
    ============       ==============
    KEY
         NOTE           ON
         MARKER         ON
         TRANSCRIPT     ON
         GENE           ON

    OUT-OF-RANGE        HALT

    NUMBERING           ON, 1, ALL


(* end module delman.use.control.2 *)
(* begin module delman.use.comparison *)

         SEQUENCE COMPARISONS AND STRUCTURE ANALYSIS

      The purpose of this section is to point out auxiliary programs that can
 be used to compare two sequences or find structures in a sequence.

      Sequence comparisons can be done with DOTMAT, which forms all possible
 pairs between sequences in two books.  For each pair, one sequence
 is put on the X axis of a coordinate system and the other is on the Y
 axis.  Both 5' ends are at the origin and X runs down the printout
 page while Y runs across the page.  (Simply rotate the page 90 degrees
 counter-clockwise to get standard Cartesian coordinates.)  The
 sequences are compared for complementarity at each possible (X,Y)
 pair formed between the two sequences.  A "dot" is placed at a coordinate
 if pairing can occur.  Notice that when a sequence is compared with itself
 the display will be symmetrical around the line Y = X.  Long stretches of
 pairing will run on diagonals (along segments of lines Y = -X + C).  To
 look for homology using
 DOTMAT, use DELILA to obtain the complement of one of the pieces.
      DOTMAT produces all possible pairings.  Sometimes one wants to
 eliminate the short helices, to make finding the longer ones easier.
 The pair of programs HELIX and MATRIX will do this.
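
      The dot matrix idea can be sketched in a few lines of Python.  This
 is an illustration only, not the source of DOTMAT, HELIX or MATRIX: the
 function names, the 0-based coordinates, and the choice to count only A-T
 and G-C pairs (no G-T pairs) are assumptions made for the sketch.

```python
# Hypothetical sketch of a complementarity dot matrix and a helix filter.
PAIRS = {("a", "t"), ("t", "a"), ("g", "c"), ("c", "g")}


def dot_matrix(x_seq, y_seq):
    """One text row per X base (X runs down the page, Y runs across)."""
    x_seq, y_seq = x_seq.lower(), y_seq.lower()
    return ["".join("." if (x, y) in PAIRS else " " for y in y_seq)
            for x in x_seq]


def helices(x_seq, y_seq, min_len):
    """Runs of pairing along the lines X + Y = C, at least min_len long.

    Returns (x_start, y_start, length); X increases and Y decreases
    along each run.
    """
    x_seq, y_seq = x_seq.lower(), y_seq.lower()
    n, m = len(x_seq), len(y_seq)
    found = []
    for c in range(n + m - 1):
        x = max(0, c - (m - 1))
        y = c - x
        run_start, run_len = None, 0
        while x < n and y >= 0:
            if (x_seq[x], y_seq[y]) in PAIRS:
                if run_len == 0:
                    run_start = (x, y)
                run_len += 1
            else:
                if run_len >= min_len:
                    found.append((run_start[0], run_start[1], run_len))
                run_len = 0
            x += 1
            y -= 1
        if run_len >= min_len:
            found.append((run_start[0], run_start[1], run_len))
    return found
```

 Raising min_len plays the role of throwing away the short helices so that
 the long ones stand out.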

      One can use these two programs to find overlaps between sequences
 obtained by shot-gun cloning.  Put the complete sequence on the X axis book
 and 20 bases from each end of the other sequence in the Y axis book.
 Search for long oligos, say 15 or longer.  If there is a significant
 overlap, you will get a response from HELIX.

      Another program that can be used for comparisons is the INDEX program.
 With this tool you can make an index of the locations of the
 oligonucleotides in a book.  The measure of the similarity between
 oligonucleotides in the final alphabetized list of oligos is related
 to sequence homologies.  This method is extremely powerful.
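
      The indexing idea can be illustrated as follows.  This is a sketch
 of the general technique (sort every oligonucleotide with its location,
 then compare neighbors in the sorted list), not the INDEX program itself;
 the function names and the 1-based positions are assumptions.

```python
# Hypothetical sketch of an alphabetized oligonucleotide index.
def oligo_index(seq, length):
    """Alphabetized list of (oligo, position); positions are 1-based."""
    seq = seq.lower()
    entries = [(seq[i:i + length], i + 1)
               for i in range(len(seq) - length + 1)]
    entries.sort()
    return entries


def shared_prefix(a, b):
    """Number of leading bases that two oligos have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def neighbor_similarity(entries):
    """For each adjacent pair in the index: (pos1, pos2, shared bases)."""
    return [(entries[k][1], entries[k + 1][1],
             shared_prefix(entries[k][0], entries[k + 1][0]))
            for k in range(len(entries) - 1)]
```

 Adjacent entries that share a long prefix point to locations that carry the
 same oligonucleotide, which is where one would look for homology.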

      MATRIX/HELIX vs INDEX
 MATRIX/HELIX
      advantage:  The 2 dimensional plot is easy to look at.
      disadvantage:  It is slow.  For two sequences M and N bases long, a
         dot matrix operation takes MxN operations.  It is so-called Order
         N Squared in computation time since the time to compare a sequence
         with itself is a function of the square of the sequence length.

 INDEX
      advantage:  It is fast, since the sorting algorithm is order NlogN.
      disadvantage:  One can't get a feeling for the results easily.  One
         method is to mark listings made with LISTER.

(* end module delman.use.comparison *)
(* begin module delman.use.aligned.books *)

      HOW TO MAKE AND USE ALIGNED BOOKS

      WHAT IS AN ALIGNED BOOK?
      To perform statistical analysis on sequence sites (e.g. ribosome binding
 sites, promoters, splice junctions, etc.) one needs a way to align a set
 of PIECEs in a book.  For ribosome binding sites, we have used the A of
 the AUG or various points in the Shine/Dalgarno.  A book is aligned by
 choosing one base from each PIECE to be the alignment point.  The alignment
 bases could be chosen by a list of coordinates, but we have found that there
 are advantages to using Delila instructions to specify the base:

      TITLE "EX7: ALIGNED BOOK";
      ORGANISM ECOLI; CHROMOSOME ECOLI;
      PIECE LAC;
      GET FROM 29 -5 TO 29 +10; (* LACI RBS *)
      GET FROM 1234 -5 TO 1234 +10; (* LACZ RBS *)

 Here, the zero point for LACI alignment is base 29 and for LACZ it is base
 1234.  The "from parameter" is -5 and the "to parameter" is +10.
 The instructions allow one to align the book that is created from the
 instructions.  WARNING: the instructions must follow a rigid format; this
 is described in DELMODS in module info.align, along with details on
 how to write programs using aligned books.
      (See also DELMAN.USE.DATA.FLOW and DESCRIBE.ALIST)

      AUXILIARY PROGRAMS FOR ALIGNED BOOKS
      After generating an aligned book (a book and an aligning instruction set)
 one can list it using program ALIST or obtain a histogram that tells the
 composition of the book at each point relative to the aligned base
 with HIST.  A chi-squared analysis of an aligned book is done using HISTAN.
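
      What alignment buys can be sketched in Python.  The code below is
 not ALIST or HIST; it only shows how a zero base plus "from" and "to"
 parameters pick out a window, and how the composition at each position
 relative to the aligned base can then be tallied.  The function names and
 the treatment of an out-of-range window as an error are assumptions.

```python
# Hypothetical sketch: aligned windows and a per-position composition table.
from collections import Counter


def aligned_window(seq, zero, frm, to):
    """Bases from zero+frm to zero+to inclusive; zero is a 1-based coordinate."""
    start, end = zero + frm, zero + to
    if start < 1 or end > len(seq):
        raise ValueError("window runs off the sequence")
    return seq[start - 1:end]


def position_histogram(windows, frm):
    """Map each relative position to a Counter of the bases seen there."""
    hist = {}
    for window in windows:
        for offset, base in enumerate(window):
            hist.setdefault(frm + offset, Counter())[base] += 1
    return hist
```

 For instance, two windows cut from -1 to +2 around an aligned "a" give a
 table whose position 0 holds two counts of "a".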

      GENERATING A SET OF ALIGNED RIBOSOME BINDING SITES
      We have provided the instructions for creating a set of aligned gene
 starts, in file GAIN.  GAIN was originally created from instructions
 of the form:
      ORGANISM ...; CHROMOSOME ...;
      GENE ...;
      GET FROM GENE BEGIN TO GENE BEGIN +2;
      ...
 This is file GRIN (genes relative to begin instructions).
 The resulting book was searched (one would use SEARCH with a rule of
 (A/G/T)TG ) to generate the instructions in aligned form.  GAIN was
 then made by replacing the from-position with the word FIRST and the
 to-position with LAST.  To use GAIN you must first create the
 transcript library from file TRAIN (TRAnscript library Instructions,
 use DELILA with LIB1 and LIB2).  Then replace FIRST and LAST with
 the desired range.  Notice that there are a few cases, marked
 "SPECIAL" that you must deal with individually.  Notice also, that genes
 that are oriented in the direction opposite the PIECE had to be set up
 by hand (this may be automated someday).  The instructions could now
 be named GAIN1, and DELILA can be used to generate the aligned book.

      A detailed example of these operations is given in
 DELMAN.CONSTRUCTION.EXAMPLE.

(* end module delman.use.aligned.books *)
(* begin module delman.use.perceptron.1 *)

      USE OF THE PATTERN PROGRAMS

      "Perceptron" is the name given to a class of algorithms for pattern
 recognition with learning capabilities.  Minsky and Papert have written an
 excellent book on the topic ("Perceptrons", MIT Press, 1969) which explores
 both the limitations and potentials of the method.  They also prove the
 "Perceptron Convergence Theorem" which guarantees that a solution will be
 found if one exists.  We have written an article (Stormo et al., 1982,
 Nucleic Acids Research, 10: 2997-3011) which describes our use of the
 algorithm to investigate translational initiation sites.
      The algorithm takes as input patterns which can be divided into two
 classes, and finds a "Weighting Function" which serves to distinguish the
 patterns in the two classes.  More rigorously, if we encode a sequence into
 a string of bits, S, the algorithm attempts to find a W such that W*S >= T
 (some "threshold") if and only if S belongs to a particular one of the two classes of
 sequences.  We mean by "*" the dot, or inner product of S and W, which are
 vectors of the same dimensions.  If we start with two sets of sequences,
 S+ and S-, and an arbitrary W and T, the algorithm can be described by
 the following three step procedure:
       Test: choose a sequence S from S+ or S-,
             if S is in S+ and W*S >= T go to Test,
             if S is in S+ and W*S <  T go to Add,
             if S is in S- and W*S <  T go to Test,
             if S is in S- and W*S >= T go to Subtract;
       Add:  replace W by W + S,
             go to Test;
       Subtract: replace W by W - S,
             go to Test.
 An example of this process is shown in our NAR paper (reference given above).
 (Note: this process can be done without goto's...)
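
      As the note says, the procedure does not need goto's.  Here is a
 minimal Python sketch of the same Test/Add/Subtract rule written as loops.
 It is not the PatLrn source: the one-hot encoding in the order A, C, G, T,
 the fixed visiting order, and the max_passes safety limit are assumptions
 made for the illustration.

```python
# Hypothetical sketch of the perceptron rule described above.
BASES = "acgt"


def encode_bits(seq):
    """Encode a sequence as a string of bits, four per base (A, C, G, T)."""
    bits = []
    for base in seq.lower():
        bits.extend(1 if base == b else 0 for b in BASES)
    return bits


def dot(w, s):
    """Inner product W*S of two vectors of the same dimension."""
    return sum(wi * si for wi, si in zip(w, s))


def learn(positives, negatives, w, threshold, max_passes=1000):
    """Return (W, number of changes) once a whole pass makes no change."""
    w = list(w)
    changes = 0
    for _ in range(max_passes):
        changed = False
        for s in positives:
            if dot(w, s) < threshold:          # Add
                w = [wi + si for wi, si in zip(w, s)]
                changes += 1
                changed = True
        for s in negatives:
            if dot(w, s) >= threshold:         # Subtract
                w = [wi - si for wi, si in zip(w, s)]
                changes += 1
                changed = True
        if not changed:
            return w, changes
    raise RuntimeError("no separating W found within max_passes")
```

 The loop stops only when every S+ sequence scores at or above T and every
 S- sequence scores below it, which is the condition the Test step checks.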

      The program which implements the perceptron algorithm to work on
 sequences is called PatLrn.  Other programs which use the output of PatLrn
 are:
       PatLst - a lister program for the output of PatLrn;
       PatAna - does some simple analyses of the output of PatLrn;
       PatVal - evaluates the aligned sequences in a book by the PatLrn output;
       PatSer - searches a book for sites which are evaluated with a given
                PatLrn W output to be above some user specified value.

(* end module delman.use.perceptron.1 *)
(* begin module delman.use.perceptron.2 *)

      EXAMPLES FOR THE PATTERN PROGRAMS

      The files "exspbk" and "exsnbk" are the sets of positive and negative
 sequences used in the example of Figure 1 of our "Perceptron" paper (NAR 10,
 2997-3011).  The file "expa1" contains the initial pattern from that same
 example.  Given these files and the program "PatLrn" you can recreate
 the example as follows:
      PatLrn(exspbk,a,exsnbk,b,pat,expa1).
 The file "pat" should be identical (except for the date/time) to the file
 "expa2" that we have provided.  You can check that with the "Merge" program
 if you want.  It is also identical to the solution pattern from the example
 and it keeps track of the number of changes needed to get to that solution.
 The files "a" and "b" are empty in this case, because we are aligning the
 sequences by their first bases.  If we wanted to align them by any other
 base those files would contain the instructions which generated the sequences
 (see DELMAN.USE.ALIGNED.BOOKS).

      Now use the program "PatAna" to do some simple analyses of the pattern.
      PatAna(pat,patan).
 The file "patan" is identical to the file "expan2" that we provided.  It
 contains some useful information about the pattern, such as the minimum and
 maximum sequence values which could be obtained from this pattern, as well
 as the average value expected for random sequences and a feeling for the
 distribution of values.

      The program "PatVal" will use a pattern to evaluate a book of sites.
 Try:
      PatVal(exspbk,a,pat,valp).
         and
      PatVal(exsnbk,b,pat,valn).
 "valp" is the evaluation of each sequence of the positive class, and "valn"
 is the evaluation of each of the negative class sequences.  Check with the
 example in the paper to see that they are correct.  Again the "a" and "b"
 files are empty because we are aligning by the first base of the sequences.

      The program "PatSer" will use a pattern to search through a sequence,
 using each base in turn as the aligned base.  Those sites which are
 evaluated above some minimum, either set by the user or taken to be the
 minimum functional from the pattern itself, are identified.  Furthermore,
 instructions to get those sites so identified are written to the file "inst".
 Try this on an example file:
      PatSer(exsebk,pat,val,inst).
 Notice that when the pattern extends beyond the sequence the sites are still
 evaluated, but the user is notified of the over-extension.

      The program "PatLst" is used to make nice horizontal printings of the
 patterns, such as for use as publishable figures.  Try this on the W51
 matrix which is from the paper and which we provide.  Read the page
 DESCRIBE.PATLST to see how to set the width of the pattern printed to
 a page to whatever you want.

(* end module delman.use.perceptron.2 *)
(* begin module delman.use.perceptron.3 *)

      A NOTE ABOUT SIGNIFICANCE

      While the example we provide in the paper, and that you have just done,
 is convenient for demonstrating the method, separating two sets of two
 sequences, each five long, is in fact trivial.  Try:
      PatLrn(exspbk,a,exsnbk,b,newpat).
 "newpat" is identical to "expa0" that we provided, and as you can see is
 not interesting.  The mathematical problem of when it becomes
 significant that one can separate two sets of sequences is still an open
 problem, but we can say some things.  As the number of sequences in each
 class gets larger the probability of separation decreases, as it does
 when the number of nucleotides in each sequence diminishes.  As a good
 rule of thumb we like to have more sequences in the smaller class
 (usually the functional class) than there are nucleotides in any one
 of the sequences.  Under these conditions one can be reasonably confident
 that a solution pattern is likely to identify features of biological
 significance.

(* end module delman.use.perceptron.3 *)
(* begin module delman.use.encode.1 *)

USE OF THE "ENCODE" PROGRAM

The program Encode was written to allow a user to encode sequences into
strings of integers in a flexible way.  For instance, one can encode
the sequences as mono-, di-, tri-, or higher oligonucleotides.  One can
assign specific oligos to certain positions or record only that they are
within some "window" of positions.  Within a window all the oligos may
be counted or only some, such as only those "in frame".
The program takes as input the book of sequences and the instruction set
which generated it and which specifies the alignment.  If the instruction
file is empty then all the sequences are aligned by their first bases.
The other input file, which must be non-empty, is the parameter file
"EncodeP" which specifies how the sequences are to be encoded.  It is
the options of the parameter file which give the program its flexibility
and power, and so they should be thoroughly understood.
The parameter file may contain any number of individual parameter records,
each of which will in turn be applied to each sequence in the book.  This
allows one to encode different regions of the sequences differently, or
to encode one region in more than one way.  Each parameter record has
five pieces of information, each written on a separate line:
      line 1 - the range over which this parameter record is to operate;  this
               line has two integers which are the bases, relative to the
               aligned base, for which to use this encoding;
      line 2 - the size of the window; the window begins at the start of the
               range and contains this many nucleotides in it;  the number
               of each base, or oligo, which occurs in this window is written
               to the output; note that positional information within the
               window is lost, so that if exact position is needed the window
               size should be 1;
      line 3 - the shift to the next window; this specifies how many bases
               to move the window over to its next position; this is repeated
               until the window begins beyond the end of the range;
      line 4 - this specifies the coding level, and the arrangement of the
               bases to be coded; the coding level is the number of bases in
               the oligos which are encoded, i.e., 1 means monos are encoded,
               2 means dis are encoded, ...; for coding levels greater than 1
               the user may allow for skips between the encoded bases;  for
               instance, one may want to encode as di-nucleotides bases which
               are separated by a nucleotide; this would be declared on this
               line by writing "2 : 1"; likewise, one could encode as a tri-
               nucleotide the first bases of three consecutive codons by the
               line "3 : 2 2", where the 3 indicates the coding level (tri-
               nucleotides) and the 2's represent the number of bases
               skipped between each encoded base; if there is no colon after
               the coding level declaration, all skips are assumed to be 0;
      line 5 - the shift to the next coding site; this allows the user to
               not count every occurrence of the oligos in the window, but
               rather to move some number of bases to the next encoded site;
               if all the oligos are wanted, this number should be 1.
The above line information constitutes a single parameter record.  The
parameter file may contain any number of these records concatenated
together.  Each sequence will be encoded by the entire list of parameter
records and the resulting string of integers will be written to the
"EncSeq" file.  The encoded string for each sequence ends with a special
"end of sequence" symbol, which is listed in the file header.
For examples of how this program works see "DELMAN.USE.ENCODE.2".
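
The five-line parameter record can be pictured as a small data structure.
The Python sketch below reads records in the layout described above; it is
not part of Encode.  The class and function names are hypothetical, and
the handling of blank lines (ignored) is an assumption.

```python
# Hypothetical reader for parameter records laid out as described above.
from typing import List, NamedTuple, Tuple


class ParamRecord(NamedTuple):
    first: int                 # line 1: start of range, relative to aligned base
    last: int                  # line 1: end of range
    window: int                # line 2: window size
    window_shift: int          # line 3: shift to the next window
    level: int                 # line 4: coding level (1 = mono, 2 = di, ...)
    skips: Tuple[int, ...]     # line 4: bases skipped between encoded bases
    coding_shift: int          # line 5: shift to the next coding site


def parse_coding(line: str) -> Tuple[int, Tuple[int, ...]]:
    """Parse line 4, e.g. '2', '2 : 1' or '3 : 2 2'."""
    head, colon, tail = line.partition(":")
    level = int(head)
    if colon:
        skips = tuple(int(word) for word in tail.split())
    else:
        skips = (0,) * (level - 1)     # no colon: all skips are 0
    if len(skips) != level - 1:
        raise ValueError("coding level %d needs %d skips" % (level, level - 1))
    return level, skips


def parse_records(text: str) -> List[ParamRecord]:
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) % 5 != 0:
        raise ValueError("each parameter record takes exactly five lines")
    records = []
    for k in range(0, len(lines), 5):
        first, last = (int(word) for word in lines[k].split())
        level, skips = parse_coding(lines[k + 3])
        records.append(ParamRecord(first, last, int(lines[k + 1]),
                                   int(lines[k + 2]), level, skips,
                                   int(lines[k + 4])))
    return records
```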

(* end module delman.use.encode.1 *)
(* begin module delman.use.encode.2 *)

EXAMPLES OF USING THE "ENCODE" PROGRAM

The files "ExEncIn" and "ExEncBk" contain the sequence around the beginning
of the rIIB gene of T4, and the instructions which align this sequence by
the ATG of the gene.  The aligned sequence looks like:

       ---                   ++
       111--------- +++++++++11
       210987654321012345678901
       ........................
       ATAAGGAAAATTATGTACAATATT

Notice that the 0 base is the A of the ATG (this is what we aligned by) and
that our sequence contains the 12 preceding bases and the 11 following.  This
is through the fourth amino acid of the protein.  If we wanted to encode only
the mono-nucleotides of the initiation codon we would make our parameter file:
   0 2
   1
   1
   1
   1

this would give the encoding:
 1 0 0 0 0 0 0 1 0 0 1 0 -1

Notice the -1 which specifies the end of the encoded sequence.  Each group of
4 integers before it specifies which base occurs at one of the three encoded
positions.
The A is encoded as 1 0 0 0, the T as 0 0 0 1, and the G as 0 0 1 0.

If we wanted to know the number of each mono-nucleotide in this whole region
and we didn't care about their positions, we would encode as:
   -12 11
   24
   24
   1
   1

This would give the encoding:
 12 1 3 8 -1

Notice that this is really just the composition of the sequence, since our
window covers the entire sequence.  We could get the di-nucleotide composition
with the parameters:
   -12 11
   24
   24
   2
   1

and get the encoding:
 5 1 1 5 1 0 0 0 1 0 1 1 4 0 1 2 -1

Notice that this encoded string is a vector of 16 integers (up to the end
of sequence mark, -1).  The number in each element of the vector is the number
of each di-nucleotide in the sequence, in the order AA,AC,AG...TC,TG,TT.
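
The three encodings above can be reproduced with a short Python sketch.
This is a reconstruction from the examples, not the Encode source.  Two
rules in it are assumptions inferred from the outputs shown: an oligo is
counted when its first base lies inside the window, and it is dropped if it
would run off the end of the sequence.  The tuple layout of a record and
the names are also hypothetical.

```python
# Hypothetical re-implementation of the encodings shown in the examples.
from itertools import product

BASES = "ACGT"
END = -1                      # end of sequence mark used in the examples


def encode(seq, zero_index, records):
    """records: (first, last, window, window_shift, level, skips, coding_shift).

    zero_index is the 0-based index in seq of the aligned (0) base.
    """
    seq = seq.upper()
    out = []
    for first, last, window, wshift, level, skips, cshift in records:
        oligos = ["".join(p) for p in product(BASES, repeat=level)]
        offsets = [0]                     # offsets of encoded bases in an oligo
        for skip in skips:
            offsets.append(offsets[-1] + skip + 1)
        wstart = first
        while wstart <= last:
            counts = dict.fromkeys(oligos, 0)
            wend = min(wstart + window - 1, last)
            pos = wstart
            while pos <= wend:
                idx = [zero_index + pos + o for o in offsets]
                if idx[0] >= 0 and idx[-1] < len(seq):
                    key = "".join(seq[i] for i in idx)
                    if key in counts:
                        counts[key] += 1
                pos += cshift
            out.extend(counts[o] for o in oligos)
            wstart += wshift
    out.append(END)
    return out


SEQ = "ATAAGGAAAATTATGTACAATATT"   # the aligned sequence shown above
ZERO = 12                          # index of the A of the ATG

print(encode(SEQ, ZERO, [(0, 2, 1, 1, 1, (), 1)]))
print(encode(SEQ, ZERO, [(-12, 11, 24, 24, 1, (), 1)]))
print(encode(SEQ, ZERO, [(-12, 11, 24, 24, 2, (0,), 1)]))
```

The three printed lists match the three encodings given above.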

Examples continued in DELMAN.USE.ENCODE.3.

(* end module delman.use.encode.2 *)
(* begin module delman.use.encode.3 *)

Examples of using the "encode" program, continued from
DELMAN.USE.ENCODE.2.

      To encode the di-nucleotide composition of the Shine and Dalgarno region
and also the mono-nucleotides of the coding sequence, each in its own position,
we would make this list of parameters:
   -10 -6
   5
   5
   2
   1
   0 11
   1
   1
   1
   1

This would give us the encoding:
 2 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 1 0 0 0 0 1
 1 0 0 0 0 1 0 0 1 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 0 1 0 0 0 1
 -1

Here the first 16 integers are the di-nucleotide composition of the Shine and
Dalgarno region, and appended to that are the mono-nucleotide encodings for
each position of the coding sequence.  We could get the di-nucleotides of
successive codon first positions by:
   0 11
   12
   12
   2 : 2
   3
or we could get the codon composition by:
   0 11
   12
   12
   3
   3
or we could get the di-nucleotide encoding of the first and last position of
each codon, including the position of the codon by:
   0 11
   3
   3
   2 : 1
   3
These are left as exercises for the user, who is encouraged to make up
other tests and try them until this program is easy to use.

(* end module delman.use.encode.3 *)
(* begin module delman.use.dbpull.define *)

  In addition to Delila, there are at least two other generally available
large nucleic acid sequence data bases. The DB program system handles both the
European Molecular Biology Laboratory (EMBL) libraries and those of the
Genetic Sequence Databank (GenBank(TM)).
  If you want to contact someone who helps operate these data bases use the
following addresses:
                   GenBank
                   c/o Computer Systems Division
                   Bolt Beranek and Newman Inc.
                   10 Moulton St.
                   Cambridge, Ma. 02238
                   USA

                   Graham Cameron
                   European Molecular Biology Laboratory
                   Postfach 10.2209, D-6900 Heidelberg, West Germany

  The DB program system is a small set of programs. DBcat prepares
catalogs for DBpull. DBpull extracts part or all of an entry of either
EMBL or GenBank format. DBbk converts database entries into the Delila
book form that Delila programs use. All of these programs handle both data
base formats even when both occur together in the same library.
  At this point, please obtain some sample library entries from both data
bases and look them over.
  EMBL and GenBank libraries are arranged in series of entries, each entry
possessing a unique entry id, a nucleic acid sequence, and other miscellaneous
information. Most of the lines in the libraries start with a word or abbreviated
code that indicates what kind of information the line contains. The
following definitions will clarify these points.

Library definitions:

Entry: An entry starts with a line which begins with an "ID" (EMBL) or a
"LOCUS" (GenBank). All subsequent lines are part of the entry until the
line that contains simply "//". "//" is the entry terminus code for both
data bases.

Entry id: On the first line of each entry, after the "LOCUS" or the "ID",
come a few spaces and then a weird looking word or code that may or may not
resemble a familiar biological name. This is the entry id; it is the name the
entry is known by and it is what DBpull uses to identify which entries it
will extract.

Line codes: The phrases "ID" and "LOCUS" are line codes. There are other line
codes in each entry such as "REFERENCE" and "ORIGIN" in GenBank and "DE" and
"SQ" in EMBL. Some lines do not have a code and some have one, but it is
indented. Other lines have codes, but there is no other information on the line.
These special cases will be discussed below in the definition of line code
request instructions.
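
The entry and entry id definitions can be illustrated with a short Python
sketch.  It is not DBcat or DBpull; the function name is hypothetical, and
the two toy entries are made up for the illustration and are not real
library entries.

```python
# Hypothetical sketch: split a library into entries using the definitions above.
def split_entries(lines):
    """Return a list of (entry_type, entry_id, entry_lines)."""
    entries, current = [], None
    for line in lines:
        if current is None:
            words = line.split()
            starts_entry = line.startswith("ID ") or line.startswith("LOCUS ")
            if starts_entry and len(words) > 1:
                kind = "EMBL" if words[0] == "ID" else "GENBANK"
                current = (kind, words[1], [line])
        else:
            current[2].append(line)
            if line.strip() == "//":      # entry terminus code
                entries.append(current)
                current = None
    return entries


TOY_LIBRARY = [                           # made-up entries, for illustration only
    "ID   TOYEMBL  made-up EMBL style entry",
    "DE   not a real sequence",
    "SQ",
    "     acgtacgt",
    "//",
    "LOCUS       TOYGB     8 bp",
    "ORIGIN",
    "        1 acgtacgt",
    "//",
]
```

Both formats may occur together in one library, as in the toy example; the
first word of the first line of each entry tells them apart.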

  Now that you are familiar with the data bases you can understand the DBpull
instruction set. Each instruction takes up only one line. Each line does one
of two things; either it indicates what entry type (GenBank or EMBL) is
requested on the following lines or it makes an actual request for part or
all of an entry identified by its entry id. Please note that the following
definitions will be made clearer by referring to the examples that follow.

(* end module delman.use.dbpull.define *)
(* begin module delman.use.dbpull.instructions *)

Note: Instructions are entirely upper case because the computer system
      on which DBpull was designed required it.

Instructions that determine entry request type of succeeding lines:

  EMBL: This indicates that requests for entries somewhere in the EMBL
  libraries will be on the following lines.
  GENBANK: Same for requests found in the GenBank libraries.
  GENB: Same as "GENBANK".

Instructions that tell which entries are to be pulled:

  Entry id: An instruction line beginning with an entry id will pull part
  or all of that entry. The parts extracted will depend on which of the
  "instructions that define extraction" (defined below) follows the id on
  the same line.

  Wildcard id: This request looks like an entry id request but somewhere
  in the entry name are one or two "*" symbols. The "*" represents any
  number of unspecified characters. It may be inserted at the beginning of
  the id, at the end, or at both the beginning and the end but not the
  middle. (Confused? See instruction example 3 below.)

  EVERY: The word "EVERY" at the start of a request line calls for every
  entry of a particular entry type. (See instruction example 4)
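
  The three kinds of request can be sketched as one matching function in
  Python.  This is an illustration of the rules as stated here, not DBpull
  itself; the function name is hypothetical.

```python
# Hypothetical sketch of entry id, wildcard id and EVERY matching.
def id_matches(request, entry_id):
    """True if an entry id satisfies a request word."""
    if request == "EVERY":
        return True
    if "*" not in request:
        return request == entry_id
    core = request.strip("*")
    if "*" in core or not core:
        raise ValueError('"*" may appear only at the beginning and/or the end')
    at_start = request.startswith("*")
    at_end = request.endswith("*")
    if at_start and at_end:
        return core in entry_id           # *RNA* : RNA anywhere in the id
    if at_start:
        return entry_id.endswith(core)    # *RNA  : id ends in RNA
    return entry_id.startswith(core)      # M*    : id starts with M
```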

Instructions that define extraction:

  Line codes: Following the instruction that tells which entry or entries
  are to be pulled, on the same line, come instructions that structure
  the extraction. One or more line codes occurring in this space will result
  in the lines of the entry which have matching codes being pulled. GenBank
  line codes are actually words. The full word or an abbreviation will work,
  but the abbreviation can not be shorter than 3 letters. "LOC", for instance,
  will pull the "LOCUS" line while "LO" would not. When there are one
  or more lines in the entry directly below a pulled line that either
  do not possess a line code, possess indented codes, or possess the code "xx",
  these additional lines will be extracted also.
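
  The line code rules can be sketched in Python as follows.  This is not
  the DBpull source.  The function names are hypothetical, the toy entry is
  made up, and treating any line that starts with a blank as "having no
  code or an indented code" is an assumption of the sketch.

```python
# Hypothetical sketch of line code matching and extraction.
def code_matches(request, line_code, genbank):
    """GenBank codes may be abbreviated to 3 or more letters; EMBL codes are exact."""
    if genbank:
        return len(request) >= 3 and line_code.startswith(request)
    return request == line_code


def pull_lines(entry_lines, requests, genbank):
    """Lines whose code matches a request, plus the lines that ride along."""
    pulled, taking = [], False
    for line in entry_lines:
        continuation = (not line.strip() or line[0].isspace()
                        or line.split()[0].lower() == "xx")
        if not continuation:
            code = line.split()[0]
            taking = any(code_matches(r, code, genbank) for r in requests)
        if taking:
            pulled.append(line)
    return pulled


TOY_ENTRY = [                             # made-up entry, for illustration only
    "LOCUS       TOYGB     8 bp",
    "DEFINITION  made-up entry",
    "REFERENCE   1",
    "  AUTHORS   Nobody",
    "ORIGIN",
    "        1 acgtacgt",
    "//",
]
```

  With the toy entry, the requests "LOC" and "REFERENCE" pull the LOCUS
  line, the REFERENCE line, and the indented AUTHORS line beneath it, while
  "LO" pulls nothing.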

  RAW: Instead of line codes one can simply insert the word "RAW". This will
  pull only the sequence of the entry without origin or coordinate labels.
  The sequence will end with a "." to separate it from other sequences and to
  make it suitable for input into Makebk (see delman.describe.makebk). Also,
  if the first request of fin is "RAW", fout will have no dateline and
  therefore it will not make a suitable secondary data base for DBpull.

  ALL: Instead of "RAW" or line codes the word "ALL" will result in an
  entire entry being extracted.

(* end module delman.use.dbpull.instructions *)
(* begin module delman.use.dbpull.examples *)

Instruction examples (DBpull input file Fin)

Example 1:

EMBL
ADCXXX ID DE SQ
GENBANK
M13 LOC REFERENCE
ANABANIFH LOCUS

Comments: The first and third lines indicate what types of entries are
requested on the following lines. If, for instance, M13 were an EMBL entry
this set of instructions would not find it.
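
How the entry type lines govern the requests beneath them can be sketched
in Python.  This is an illustration, not DBpull; the function name and the
returned tuple layout are hypothetical.

```python
# Hypothetical sketch: attach the current entry type to each request line.
TYPE_WORDS = {"EMBL": "EMBL", "GENBANK": "GENBANK", "GENB": "GENBANK"}


def parse_instructions(text):
    """Return a list of (entry_type, id_request, extraction_words)."""
    requests, entry_type = [], None
    for line in text.splitlines():
        words = line.split()
        if not words:
            continue
        if len(words) == 1 and words[0] in TYPE_WORDS:
            entry_type = TYPE_WORDS[words[0]]
        else:
            if entry_type is None:
                raise ValueError("a request came before any entry type line")
            requests.append((entry_type, words[0], words[1:]))
    return requests


EXAMPLE_1 = """EMBL
ADCXXX ID DE SQ
GENBANK
M13 LOC REFERENCE
ANABANIFH LOCUS
"""
```

Parsing Example 1 this way gives one EMBL request followed by two GenBank
requests, which is why M13 would not be found if it were an EMBL entry.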

Example 2:

GENB
T7 RAW
MS2 ALL

Comments: The two requested ids are not in alphabetical order and the
DBpull output file fout will have the same order as the requests.

Example 3:

EMBL
*RNA SQ ID
*RNA* ID SQ
GENB
M* ORI SITES
GOOGOOGAGA ALL
T7 RAW

Comments: The character "*" is a wildcard; it represents any number of
unspecified characters.
The first request will grab any entry whose id ends in "RNA", the
second any one that has "RNA" anywhere in it, and the third any id which
starts with an "M". The fourth request is a joke and, like any other
nonexistent id, will yield a "not found" message and then halt the program. If
there were no GenBank entry ids beginning with "M" a "not found" would appear
but DBpull would not halt because this id request is a wildcard. The logic
behind this distinction is that wildcards are used to search for the
possible existence of an entry, but regular ids are used only for entries
that are well known by the user. Note that "ORI" (origin) pulls sequence in
GenBank and "SITES" tells you where the genes and other features are. "SQ ID"
and "ID SQ" are equivalent; lines are pulled in the order that they occur.

Example 4:

EMBL
EVERY ID
GENB
EVERY LOC

Comments: This example would make a catalog for users of the entire EMBL
and GenBank data bases. The catalog would be alphabetical because the
catalog files used by DBpull (produced by DBcat) are presorted. If
"catalogs for humans" are provided with your libraries do not try this
example; it is very expensive. If you do try it, you might want to request
additional line codes to "LOC" and "ID" for a more informative catalog.

(* end module delman.use.dbpull.examples *)
(* begin module delman.use.search.1 *)

                  Use of the Search Program

i.  Searching DNA Sequences for Particular Strings
      The search program works on books of sequences.  Any search pattern
will be looked for in each sequence of the book.  Search patterns consist
of strings of nucleotides, such as 'aatggct'.  You may also specify
ambiguous patterns, such as 'a or g', in either of two ways: '(a/g)' or
'r'.  All possible ambiguities can be asked for, by either way.  From
within the search program type 'l' to see the list of one-letter codes
for each ambiguous base combination.  One can also include in the search
positions for which you don't care what the base is, indicated by 'n'.
For instance, 'anc' would search for a and c separated by any base.  One
can also use 'e' (for extension) to vary the spacing between specified
regions.  The 'e' is considered to be an 'n' and also as nothing.  For
example, 'aec' would search for both 'anc' and 'ac'.  We used this feature
to search for 'shine and dalgarno' sequences before 'atg's by specifying
'gga5n4eatg'.  This means 'gga followed by 5 to 9 unspecified bases followed
by atg'.
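
      The pattern notation described so far can be pictured by translating
it into a regular expression.  The Python sketch below is not the search
program: it handles only a, c, g, t, 'n', 'e', 'r', the '(a/g)' form, and a
leading repeat count, and it assumes that a count may precede any of these
symbols.  The names are hypothetical.

```python
# Hypothetical translation of simple search patterns into regular expressions.
import re

ONE_LETTER = {"a": "a", "c": "c", "g": "g", "t": "t",
              "r": "[ag]", "n": "[acgt]"}
TOKEN = re.compile(r"(\d*)(\([acgt](?:/[acgt])+\)|[acgtrne])")


def to_regex(pattern):
    parts, pos = [], 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError("unsupported pattern at position %d" % (pos + 1))
        count = int(m.group(1) or 1)
        sym = m.group(2)
        if sym == "e":                    # extension: an 'n' or nothing
            parts.append("[acgt]{0,%d}" % count)
        else:
            if sym.startswith("("):
                cls = "[" + sym[1:-1].replace("/", "") + "]"
            else:
                cls = ONE_LETTER[sym]
            parts.append(cls if count == 1 else "%s{%d}" % (cls, count))
        pos = m.end()
    return "".join(parts)


def find_all(pattern, seq):
    """1-based positions of the first base of each match (overlaps included)."""
    rx = re.compile("(?=(%s))" % to_regex(pattern))
    return [m.start() + 1 for m in rx.finditer(seq.lower())]
```

Under these assumptions 'gga5n4eatg' becomes gga, then exactly five bases,
then zero to four more, then atg, which is the "5 to 9 unspecified bases"
reading given above.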
      One can search for strings which are close to the specified sequence
by allowing mismatches.  This is done by typing 'm' as a
search command, and then specifying how many mismatches are allowed.  If
there are regions within the specified sequence where you want no mismatches,
this is stated by enclosing that region between '<' and '>'.  For example,
if mismatches were set to 1 and the pattern searched were 'aat<ggc>t', then
the 'ggc' must be found exactly, but the rest of the pattern need only be
within one of a perfect match.
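
      The mismatch rule can be sketched in Python.  This is an illustration
and not the search program; it accepts only the plain bases a, c, g, t and
the '<' and '>' markers, and the function names are hypothetical.

```python
# Hypothetical sketch of searching with mismatches and a protected region.
def parse_protected(pattern):
    """Return a list of (base, protected) pairs."""
    items, protected = [], False
    for ch in pattern:
        if ch == "<":
            protected = True
        elif ch == ">":
            protected = False
        elif ch in "acgt":
            items.append((ch, protected))
        else:
            raise ValueError("unsupported symbol %r" % ch)
    return items


def search_mismatch(pattern, seq, max_mismatches):
    """1-based positions where the pattern fits within the allowed mismatches."""
    items = parse_protected(pattern)
    seq = seq.lower()
    hits = []
    for start in range(len(seq) - len(items) + 1):
        mismatches = 0
        for (base, protected), found in zip(items, seq[start:start + len(items)]):
            if base != found:
                if protected:             # no mismatch allowed here
                    mismatches = max_mismatches + 1
                    break
                mismatches += 1
        if mismatches <= max_mismatches:
            hits.append(start + 1)
    return hits
```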
      The search program returns to you the positions of the matches found in
the book.  Unless otherwise specified, the position corresponds to the first
base of the pattern.  However, one can ask for the position to be another
base by preceding that base by '#'.   For example, 'aa#atggct' would return
as the position of the match the 'a' of the 'atg'.
      It is also possible to make searches for relations between bases.  Six
relations are allowed: identity (i); non-identity (ni); complementarity (c);
non-complementarity (nc); complementarity including g-t pairs (w); and
non-complementarity including g-t pairs (nw).
Relational searches are specified by first
the symbol '^', followed by the pattern position this base is to be related
to, followed by the relation.  For example, 'n^1i' would find all sites in
which there is a repeated base (aa, cc, gg or tt).  Notice that the base
to which the relation refers must precede the point of the relation in the
pattern.  Searching for the pattern '5n^1c' would find sites of complementary
bases separated by 4 unspecified bases.
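
      The relational notation can be sketched in Python as below.  This is
not the search program: only a, c, g, t, 'n', a leading repeat count, and
the '^' relations are handled, and the names are hypothetical.

```python
# Hypothetical sketch of relational searches such as 'n^1i' and '5n^1c'.
import re

COMPLEMENT = {"a": "t", "t": "a", "g": "c", "c": "g"}


def is_comp(x, y):
    return COMPLEMENT.get(x) == y


def is_wobble(x, y):
    return is_comp(x, y) or {x, y} == {"g", "t"}


RELATIONS = {
    "i": lambda x, y: x == y,
    "ni": lambda x, y: x != y,
    "c": is_comp,
    "nc": lambda x, y: not is_comp(x, y),
    "w": is_wobble,
    "nw": lambda x, y: not is_wobble(x, y),
}
TOKEN = re.compile(r"(\d*)([acgtn])|\^(\d+)(ni|nc|nw|i|c|w)")


def compile_pattern(pattern):
    """One test per pattern position: ('base', b) or ('rel', index, name)."""
    tests, pos = [], 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError("unsupported pattern at position %d" % (pos + 1))
        if m.group(2):
            tests.extend([("base", m.group(2))] * int(m.group(1) or 1))
        else:
            ref = int(m.group(3))
            if not 1 <= ref <= len(tests):
                raise ValueError("a relation must refer to an earlier position")
            tests.append(("rel", ref - 1, m.group(4)))
        pos = m.end()
    return tests


def search_relations(pattern, seq):
    """1-based positions of the first base of each match."""
    tests, seq, hits = compile_pattern(pattern), seq.lower(), []
    for start in range(len(seq) - len(tests) + 1):
        ok = True
        for k, test in enumerate(tests):
            base = seq[start + k]
            if test[0] == "base":
                ok = test[1] == "n" or test[1] == base
            else:
                ok = RELATIONS[test[2]](seq[start + test[1]], base)
            if not ok:
                break
        if ok:
            hits.append(start + 1)
    return hits
```

Note that the '^' item is itself a pattern position, so '5n^1c' is six
bases long: the sixth base must complement the first.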
      More information on search patterns and other commands in general
can be obtained by typing 'help' while in the program.

(* end module delman.use.search.1 *)
(* begin module delman.use.search.2 *)

ii.  Creating Delila Instruction Files
      The search program also allows one to create instruction files so
that the located sites may be put into a book for further analysis.  This
is especially useful when you want to include in the analysis regions around
the sites.  For instance, you could set the 'from' distance to -60 and the
'to' distance to +40.  Then by searching for 'gga5n4e#atg' you would get
the instructions necessary to obtain the sequences from -60 to +40 around
the atg's which are preceded by Shine and Dalgarno sequences.  Help on
using this feature of the program can be obtained by typing 'd help' while in
the program.
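
      The step from match positions to instructions can be pictured with a
few lines of Python.  The sketch only imitates the GET lines of the EX7
example in DELMAN.USE.ALIGNED.BOOKS; it is not the output routine of the
search program, the function name is hypothetical, and the exact format
that aligned-book programs require is the one defined in DELMODS.

```python
# Hypothetical sketch: turn match positions into GET instructions.
def get_instructions(positions, frm, to):
    """One GET line per position, with signed 'from' and 'to' distances."""
    return ["GET FROM %d %+d TO %d %+d;" % (p, frm, p, to) for p in positions]


for line in get_instructions([29, 1234], -5, 10):
    print(line)
```

With positions 29 and 1234 this prints the two GET lines of EX7 (without
their comments).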

(* end module delman.use.search.2 *)
(* begin module delman.construction *)


           cccccc    oooooo   n     nn            
          cc    cc  oo    oo  nn    nn            
          cc        oo    oo  nnn   nn            
          cc        oo    oo  nnnn  nn            
          cc        oo    oo  nn nn nn            
          cc        oo    oo  nn  nnnn  --------  
          cc        oo    oo  nn   nnn            
          cc    cc  oo    oo  nn    nn            
           cccccc    oooooo   nn    nn            
                                                  
                                                  
 ssssss   tttttttt  rrrrrrr   uu    uu   cccccc   tttttttt            
ss    ss     tt     rr    rr  uu    uu  cc    cc     tt               
ss           tt     rr    rr  uu    uu  cc           tt               
 ssssss      tt     rr    rr  uu    uu  cc           tt               
      ss     tt     rr    rr  uu    uu  cc           tt               
      ss     tt     rrrrrrr   uu    uu  cc           tt     --------  
      ss     tt     rr  rr    uu    uu  cc           tt               
ss    ss     tt     rr   rr   uu    uu  cc    cc     tt               
 ssssss      tt     rr    rr   uuuuuu    cccccc      tt               
                                                                      
                                                                      
          iiiiiiii   oooooo   n     nn  
             ii     oo    oo  nn    nn  
             ii     oo    oo  nnn   nn  
             ii     oo    oo  nnnn  nn  
             ii     oo    oo  nn nn nn  
             ii     oo    oo  nn  nnnn  
             ii     oo    oo  nn   nnn  
             ii     oo    oo  nn    nn  
          iiiiiiii   oooooo   nn    nn  
                                        

(* end module delman.construction *)
(* begin module delman.construction.intro *)

      CONSTRUCTION OF DELILA LIBRARIES

      Introduction
      This section assumes that you are familiar with DELMAN.USE.
 Construction of a Delila System Library involves several steps:
      - Entry of the raw sequence data (twice)
      - Correction of the sequences
      - Gathering of the information about the sequences
      - Creation of a "module" for insertion into the library
        (not the same module type as the ones used by program Module.)
      - Insertion of the module
      - Construction of a catalogue
      - Checking that the library is correct.

      When you are gathering the data to create part of a library
 (the library insertion module) you may find the forms in
 DELMAN.CONSTRUCTION.FORM useful.  Use the Module program to make
 as many copies as required.


      NOTES FOR TRANSPORTATION
      Since the libraries that we send you have already been checked, you
 need only run the CATAL program (as discussed below) to generate the
 catalogues for these libraries.  After that, Delila can be used.

(* end module delman.construction.intro *)
(* begin module delman.construction.structure *)

      MORE ON LIBRARY STRUCTURE - LOGICAL VS PHYSICAL STRUCTURE

      In DELMAN.USE.STRUCTURE we discussed the structure of a Delila
 Library.  The descriptions were about how the parts are connected,
 and what is inside each part.  This is the logical structure of the
 data base.  We did not discuss the details of how a library is actually
 constructed, because it is not necessary to know these things when
 working with the Delila System.  The description of these details
 is the description of the physical structure of the data base.

      Since we do not yet have an extensive set of tools for constructing
 Delila Libraries, it is necessary to describe the physical structure
 enough so that you can build your own libraries.  Because these details
 are rigorously stated in LIBDEF, most things are automated by program
 Makebk, and Catal does lots of checking, we will only discuss the general
 concepts here.

      The logical structure of a library follows the schema shown in LIBDEF
 or DELMAN.USE.STRUCTURE.  This structure is a two dimensional net.
 Libraries are implemented physically in files, and so are linear
 structures.  If we exclude for the moment the references to a PIECE
 by MARKERs, TRANSCRIPTs and GENEs, then the library structure is
 a tree.  Any tree can be represented as a nested series of objects
 in linear order:
      ORGANISM   (open  parenthesis for an ORGANISM)
      CHROMOSOME (open  parenthesis for a  CHROMOSOME)
      GENE       (open  parenthesis for a  GENE)
      GENE       (close parenthesis for a  GENE)
      PIECE      (open  parenthesis for a  PIECE)
      PIECE      (close parenthesis for a  PIECE)
      CHROMOSOME (close parenthesis for a  CHROMOSOME)
      ORGANISM   (close parenthesis for an ORGANISM)

 If you look at any book (eg. EX0BK) or library (eg. LIB1) you will
 see this structure.  Lines in a library either define the structure
 or are chunks of data (attributes).  Attributes are signaled by an
 asterisk (*) as the first character on the line.
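
      The nesting described above can be checked mechanically.  The
 following is only a minimal sketch in Python, not part of the Delila
 System.  It assumes that a structure line begins with one of the
 keywords used in this manual, that the same keyword both opens and
 closes an object, and that attribute lines begin with an asterisk.
 The keyword list and the name check_nesting are illustrative; LIBDEF
 remains the rigorous definition.

```python
# Hypothetical checker.  The keyword set and the rule that the same
# keyword opens and closes an object are assumptions drawn from the
# outline above, not from LIBDEF.
STRUCTURE_WORDS = {"ORGANISM", "CHROMOSOME", "MARKER", "TRANSCRIPT",
                   "GENE", "PIECE", "NOTE", "DNA"}


def check_nesting(lines):
    """Return a list of (line_number, message); empty if properly nested."""
    stack = []       # (keyword, line_number) of objects still open
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.strip()
        if not text or text.startswith("*"):
            continue                      # blank line or attribute line
        word = text.split()[0]
        if word not in STRUCTURE_WORDS:
            problems.append((number, "unrecognized line: " + text))
        elif stack and stack[-1][0] == word:
            stack.pop()                   # same keyword closes the object
        else:
            stack.append((word, number))  # keyword opens a new object
    for word, number in stack:
        problems.append((number, word + " is never closed"))
    return problems
```

 Given the eight structure lines of the outline above, check_nesting
 returns an empty list.  Catal performs the real checking; this sketch
 only illustrates the idea of a tree stored in linear order.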


      We must now allow various objects to refer to PIECEs.  This is done
 by a reference to the name of the PIECE.  For example, one of the
 attributes in a GENE is the name of the PIECE that the GENE is on.
 (In cases where the GENE spans two PIECEs, we use two GENEs.)

      To simplify the operation of the CATAL program (to be described later)
 we have added one more rule.  All objects that refer to a particular
 PIECE are called the "FAMILY" of the PIECE.  The rule is that a
 FAMILY precedes its PIECE in the physical (file) implementation.
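
      The FAMILY rule can be expressed as a simple ordering test.  This
 is a sketch in Python under stated assumptions: the objects of one
 CHROMOSOME have already been read into (kind, name, piece reference)
 triples in file order, and the function name is hypothetical.  It
 reports only family members that come after their PIECE; it does not
 verify that the referenced PIECE exists.

```python
def family_order_problems(objects):
    """objects: (kind, name, piece_reference) tuples in file order.

    piece_reference is None for a PIECE itself.  Returns messages for
    MARKERs, TRANSCRIPTs and GENEs that appear after the PIECE they
    refer to, which would break the FAMILY rule.
    """
    seen_pieces = set()
    problems = []
    for kind, name, piece_reference in objects:
        if kind == "PIECE":
            seen_pieces.add(name)
        elif piece_reference in seen_pieces:
            problems.append(kind + " " + name + " follows its PIECE "
                            + piece_reference)
    return problems
```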

(* end module delman.construction.structure *)
(* begin module delman.construction.catal *)

      MAKING NEW LIBRARIES - THE CATALOGUE PROGRAM

      The first technical difference between Libraries and Books in the Delila
 System is that Libraries have catalogues while Books do not.  Catalogues
 serve several purposes.  First, since they are a condensed list of
 the objects in a Library, they allow objects to be found quickly.
 There are catalogues for both Delila and for people (the latter is
 called a HUMCAT - HUMan's CATalogue).  These are constructed by the
 program CATAL.

      Since a library may be constructed by hand, it is also convenient to
 check the Library's physical structure at the time the catalogue is made.

      The Problem Of Duplicate Names
      Using Delila, a Book may be easily constructed that contains two objects
 with the same name within the same structure (if they are in different
 structures, it won't matter).  For example:
      ORGANISM ECOLI;
         CHROMOSOME ECOLI;
            GENE LACI; (* THIS IS ON PIECE LAC *)
            GET ALL GENE DIRECTION HOMOLOGOUS;
            GET ALL GENE DIRECTION COMPLEMENT;
      If this Book were to become a Library, then a reference to PIECE LAC
 would be ambiguous since there are two PIECEs with that name within the
 CHROMOSOME.  The CATAL program detects these cases and makes the names differ
 by adding symbols to the names of second and subsequent duplicately named
 objects.  The second technical difference between Books and Libraries is that
 Books may have duplicate names, while Libraries may not.
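
      The renaming of duplicates can be pictured with the sketch below
 (Python).  It is not the CATAL algorithm.  This manual does not say
 which symbols CATAL adds, so the ".2", ".3" suffix used here is purely
 an assumption, as is the function name.  The list passed in stands for
 the names of like objects within one structure, for example the PIECEs
 of one CHROMOSOME.

```python
def make_unique(names):
    """Return the names with second and later duplicates altered.

    The first occurrence of a name is kept unchanged.  The suffix
    style (".2", ".3", ...) is an assumption for illustration only.
    """
    used = set()
    result = []
    for name in names:
        candidate = name
        count = 1
        while candidate in used:
            count += 1
            candidate = name + "." + str(count)
        used.add(candidate)
        result.append(candidate)
    return result
```

 For the Book in the example above, the two PIECEs named LAC would
 come out with two different names, so a reference to either one is no
 longer ambiguous.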


      Notes For Transportation
      Unknown ends of objects (such as a GENE) are represented in this
 version by a number that is off the end of the coordinates of
 the PIECE.  For consistency, we have used +100000 or -100000 so
 that these can be more easily recognized (to our knowledge no
 continuous sequences are this long ... yet!).  If your computer
 cannot handle integers this large, then you can reduce these
 numbers, as long as they are outside of the individual coordinates.
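
      A small sketch (Python) of the convention just described.  It
 assumes linear coordinates, and both function names are hypothetical.
 The second function tests whether a reduced marker value, chosen for a
 computer with small integers, still lies outside every PIECE.

```python
UNKNOWN_HIGH = 100000     # convention stated in the text above
UNKNOWN_LOW = -100000


def is_unknown_end(value, piece_beginning, piece_ending):
    """True when value lies off either end of the PIECE coordinates."""
    low = min(piece_beginning, piece_ending)
    high = max(piece_beginning, piece_ending)
    return value < low or value > high


def reduced_markers_are_safe(marker, pieces):
    """pieces: (beginning, ending) pairs of coordinates.

    True if +marker and -marker both lie outside every PIECE, so that
    they can stand in for unknown ends.
    """
    return all(is_unknown_end(marker, b, e) and is_unknown_end(-marker, b, e)
               for b, e in pieces)
```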

(* end module delman.construction.catal *)
(* begin module delman.construction.example *)

      AN EXAMPLE OF CONSTRUCTING DELILA LIBRARIES

      In this example we show the series of steps used to set up the Delila
 libraries provided on the tape.  The special bracket notation ([...])
 is used here to indicate the contents of a file.  A slash (/) inside
 the brackets indicates the beginning of a new line in the file.
 Other notation is described in DELMAN.DESCRIBE.CONVENTIONS.

 1. Generate Library Catalogues
      catal(humcat,[ADVANCE DATES],lib1,cat1,newlib1,lib2,cat2,newlib2)
      copy(newlib1,lib1)
      copy(newlib2,lib2)
 The humcat should be identical to or similar to the one we send.
 (Note:  the third library file, l3, is empty, and its catalogue c3 and
 newlib3 will not be written, but your computer may require that these
 files exist as empty files in order to run Catal.  A similar situation
 holds for Delila and many other programs.)

 2. Build Transcript Book
      delila(train,trabk,tradl,lib1,cat1,lib2,cat2)
 There will be warnings that can be ignored at this point.

 3. Build Transcript Library
      catal(trahu,[ADVANCE DATES],trabk,tract,trali)
 You will see a number of cases where duplicate names are resolved.

 4. Test Grin File
      delila(grin,grbk,grdl,trali,tract)
      comp(grbk,cmp,[3])
 cmp should show 140 ATG, 7 GTG, 2 TTG.

 5. Test Gain File
      Within the Gain file, the "FIRST", "LAST" and "SPECIAL" cases must be
 replaced by numbers.  The WORCHA program comes in handy here, because it will
 do this easily:
      worcha(gain,ga3in,[FIRST/0/LAST/2/SPECIAL/0])
      delila(ga3in,ga3bk,ga3dl,trali,tract)
      comp(ga3bk,cmp,[3])
 cmp should be the same as for Grin.

 6. Expanding Gain
      You can now expand the "FIRST" to "LAST" region of Gain, taking care not
 to violate the "SPECIAL" cases.
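
      The word replacement of step 5 can be sketched as follows
 (Python).  This is not the WORCHA program.  It assumes that the
 bracketed file holds alternating lines of old word and new word, and
 that a word is an unbroken run of letters and digits.  The function
 names and the instruction line in the usage note are made up for
 illustration.

```python
import re


def read_pairs(bracket_text):
    """Turn 'FIRST/0/LAST/2/SPECIAL/0' into {'FIRST': '0', ...}."""
    parts = bracket_text.split("/")
    if len(parts) % 2 != 0:
        raise ValueError("each old word needs a new word")
    return dict(zip(parts[0::2], parts[1::2]))


def change_words(text, pairs):
    """Replace whole words only; FIRSTX is left alone."""
    def swap(match):
        word = match.group(0)
        return pairs.get(word, word)
    return re.sub(r"[A-Za-z0-9]+", swap, text)
```

 With pairs = read_pairs("FIRST/0/LAST/2/SPECIAL/0"), a hypothetical
 line "FROM -FIRST TO +LAST;" becomes "FROM -0 TO +2;".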

(* end module delman.construction.example *)
(* begin module delman.construction.data.entry *)

         RULES OF RAW SEQUENCE INSERTION

 (1) A raw sequence is a file containing only the letters A, C, G or T
 (no U is allowed, use T).  You may type these letters directly, or type
 any convenient set of keyboard characters (eg. 1234) and then convert
 them to ACGT using the program CHACHA.

 (2) For reasons of transportability and readability, each sequence
 line should fit within the width of a typical terminal:  Do not type
 more than 60 bases per line.  You can reformat the data with REFORM
 or MAKEBK.

 (3) Sequences can and should be entered in free format with spaces
 to improve the readability of the sequence during entry.  This
 also helps in the corrections described below.  Much later it helps one to
 find parts of the sequence during fusion of PIECEs.

 (4) Before entry, use a pencil to mark off intervals of sequence to
 type.  This makes entry easier since there are rest points.  I often
 check off each (or every other) interval as I go, so I rarely get
 lost and duplicate or delete intervals.  If you can keep the lines like those
 in the paper, the sequence will be easier to check and correct later
 (but remember rule 2).

 (5) Two people should INDEPENDENTLY enter the sequence.
 Independence is important: one person will FREQUENTLY make the
 same mistake twice.  Do not be fooled into having one person enter both
 a sequence and its complement.  We have had two cases where the same deletion
 was entered in the same place by one person, even though he was typing
 the sequence and its complement.  Have two people independently
 type the sequence and the complement.  By doing it this way, you
 will also catch some typographical errors if you are using a published
 source.  (Another method:  if one person is to enter both strands, be
 sure that they are typed from two copies on which different intervals
 are used.)

 The method of independent entry allows automatic correction.  It seems
 to be faster and more reliable than other methods.

 (6) I caught the deletions mentioned above by knowing how long the
 sequence should be.  You should not rely on the computer for the
 length.  Predict it and then check it.

 (7) The file names of the two copies should include the
 initials of the person who typed the file.  See the example below.

 (8) A complemented or inverted strand may be re-complemented or
 re-inverted using the program REFORM.  Note that the free format
 of (3) will be lost.  You should use the reformatted sequence only
 for checking, and not for the final Library insertion, since you
 would lose the formatting if you did.

 (9) At this point you have two files of "raw" sequence.  The sequences
 may be merged together and corrected using MERGE.

 FOR EXAMPLE:  If the sequence was OMPA, TS and MA typed the raw
 copies, and the copy of MA contains the format desired for the
 Library, you could use MERGE like this:
      MERGE(OMPAMA,OMPATS,OMPA,GARBAGE)

 (10) Be sure to save all raw files (eg. OMPAMA, OMPATS, OMPA) until
 the library insertion is completed and taped or backed-up.
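
      The idea behind rules 1, 3, 6 and 9 can be sketched in a few lines
 of Python.  This is not CHACHA or MERGE.  The key assignment 1234 to
 ACGT is an assumption (choose whatever is convenient), the function
 names are hypothetical, and the comparison handles only substitutions:
 when the two copies differ in length it simply stops, which is the
 signal to look for an insertion or deletion.

```python
KEY_TO_BASE = {"1": "A", "2": "C", "3": "G", "4": "T"}   # assumed layout


def to_bases(typed):
    """Drop spacing and line breaks, translate keys, force upper case."""
    bases = []
    for ch in typed:
        if ch.isspace():
            continue                      # free format: spacing is ignored
        base = KEY_TO_BASE.get(ch, ch.upper())
        if base not in "ACGT":
            raise ValueError("not a base: " + repr(ch))
        bases.append(base)
    return "".join(bases)


def disagreements(copy1, copy2):
    """1-based positions at which two independently typed copies differ."""
    a, b = to_bases(copy1), to_bases(copy2)
    if len(a) != len(b):
        raise ValueError("lengths differ: %d and %d" % (len(a), len(b)))
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]
```

 Following rule 6, compare len(to_bases(copy)) with the length you
 predicted before trusting either copy.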


(* end module delman.construction.data.entry *)
(* begin module delman.construction.library.design *)

      SEQUENCE INSERTION PROCEDURE

      The following procedure assures the accurate and complete insertion
 of sequences into a Delila Library.  Overview of the method:

                    REFERENCE OBTAINED
                           :
      .....................*....................
      :                    :                   :
      V                    V                   V
      :                    :                   :
 RAW SEQUENCE         RAW SEQUENCE       DESIGN BOOK
    COPY 1               COPY 2                :
      :                    :                   :
      V                    V                   :
      :                    :                   :
   CHACHA               CHACHA                 :
      :                    :                   :
      V                    V                   :
      :                    :                   :
      :.......MERGE........:                   :
                :                              :
                V                              :
                :                              :
           RAW SEQUENCE                        :
           CORRECTED COPY                      :
                :                              :
                V                              V
                :............MAKEBK............:
                               :
                               V
                               :
                    LIBRARY INSERTION MODULE
                               :
                               V
                               :
                        LIBRARY INSERTION

 I. Obtaining Sequences
      A. Sequences may be obtained from
         1) Publications and preprints
         2) Computer transfer
         3) Your lab

      B. One copy of the source article and the sequence (or two copies of
         the sequence when no paper is available) are to be made for entry to
         our reference shelf.  The photocopies must be of GOOD quality, with
         NO loss of information.

 II. Raw Sequence Insertion (See DELMAN.CONSTRUCTION.DATA.ENTRY for details)
      A. Double entry is preferred over other methods.
      B. Programs are available to make this easy: REFORM and MERGE.
         RAWBK may be used on the checked raw sequence to get results quickly.
      C. THE NAME OF THE GAME IS ACCURACY.

 III. Book Design
      A. First be sure that you understand library structure and coordinate
         systems.  See LIBDEF and DELMAN.USE.
      B. Use forms to write out inserted sections.  These can be found in the
         sections that begin with "DELMAN.CONSTRUCTION.FORM".
      C. Check the library to see if you can fuse the new sequence to
         previous sequence.
      D. Decide on a coordinate system or fuse to previously defined
         coordinates.  (NOTE: when there is no zero, add 1 to the negative
         numbers.)
         Write this information on the source copy for our reference shelf.
      E. Record the source of all fragments and special information (eg:
         no zero, negative numbers incremented) in the PIECE notes.
         Put a complete reference into the PIECE notes.  Include
         the positions on the coordinate system, such as: (-1288 to -208)
      F. Record all MARKERs, TRANSCRIPTs and GENEs in your coordinates.
         Unknown values are either +100000 or -100000, depending on which
         end of the coordinates the value is beyond.
      G. Create the Library insertion module using MAKEBK.  All MARKERs,
         TRANSCRIPTs and GENEs pointing to a PIECE must be placed immediately
         prior to the PIECE that they refer to.  They are called the "family"
         of the PIECE.  (Note: we call this piece of a Delila library a
         module, but this is not the same as the ones the Module program works
         with.  The meaning should be clear from the context.)
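
      The note in III.D is read here as follows: when a published
 numbering runs ... -2, -1, +1, +2 ... with no zero, the negative
 numbers are each raised by one so that the coordinates become
 continuous.  A minimal sketch in Python, with a hypothetical function
 name:

```python
def to_library_coordinate(published):
    """Convert a position from a numbering that has no zero.

    Negative positions move up by one (-1 becomes 0); positive
    positions are unchanged.  Zero cannot occur in such a numbering.
    """
    if published == 0:
        raise ValueError("a numbering without zero cannot contain 0")
    if published < 0:
        return published + 1
    return published
```

 The published positions -2, -1, +1, +2 then become -1, 0, 1, 2, and
 the length of any interval is simply ending minus beginning plus one.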

 IV. Insertion - With The Utmost Of Care
      A. Always insert whole Library insertion modules.  Replace old parts of
         the library by modifying a module and reinserting it (with an editor).
      B. Quickly check the book structure for blatant errors.

 V. Checking the new Library
      A. The catalogue program (CATAL) is used to check library structure
         and to generate human and librarian catalogues.
      B. Modules that contain only parts of books can be made into whole
         books by placing a shell around the module.  Example:  a PIECE and its
         family can be inserted into a shell of a fake ORGANISM and CHROMOSOME
         to check the PIECE structure.
      C. Correct modules are inserted into the library and CATAL is run on
         the entire library.  Be sure that file CATALP is empty, to ensure that
         the dates are advanced.
      D. End point checking: all coordinate numbers should be checked.
         To do this, use DELILA to pull out: COORDINATE, PIECE, GENE,
         TRANSCRIPT and MARKER endpoints.  This is painful, but it has caught
         many errors.  Example:
               GET FROM GENE BEGINNING TO GENE BEGINNING +2;
         should give mostly ATG, and a few XTG. (SOMEDAY THIS MAY BE AUTOMATED)
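
      Until the end point check is automated, a tally such as the
 following may help.  It is a sketch in Python that assumes the first
 three bases of each GENE have already been pulled out with DELILA and
 read into a list of strings; the function names are hypothetical.

```python
from collections import Counter


def tally_starts(sequences):
    """Count the first three bases of each sequence."""
    return Counter(seq[:3].upper() for seq in sequences)


def suspicious_starts(sequences):
    """Sequences whose first three bases are not of the form XTG."""
    return [seq for seq in sequences if seq[:3].upper()[1:] != "TG"]
```

 Anything returned by suspicious_starts deserves a look at the GENE
 coordinates in the insertion module.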

 VI. Listings Of The New Library
      These are often useful (program to use in parentheses):
      A. LIB (SHIFT)
      B. HUMCAT (CATAL)
      C. REF (REFER)
      D. LIS (LISTER)  may be large.

(* end module delman.construction.library.design *)
(* begin module delman.construction.form.organism *)

                   NAME:                     LIBDEF, 1980 JUNE 9


ORGANISM

*                                            SHORT NAME

*                                            LONG NAME

NOTE

*

*

*

*

NOTE

*                                            GENETIC MAP UNITS (REAL)


(INSERT A SERIES OF
ORGANISMS AT THIS
POINT)


ORGANISM


(* end module delman.construction.form.organism *)
(* begin module delman.construction.form.chromosome *)

                   NAME:                     LIBDEF, 1980 JUNE 9


CHROMOSOME

*                                            SHORT NAME

*                                            LONG NAME

NOTE

*

*

*

*

NOTE

*                                            GENETIC MAP BEGINNING (REAL)

*                                            GENETIC MAP ENDING (REAL)


(INSERT A SERIES OF
MARKERS, GENES, TRANSCRIPTS,
AND PIECES AT THIS POINT)


CHROMOSOME


(* end module delman.construction.form.chromosome *)
(* begin module delman.construction.form.marker *)

                   NAME:                      LIBDEF, 1980 JUNE 9


MARKER

*                                            SHORT NAME

*                                            LONG NAME

NOTE

*

*

*

*

NOTE

*                                            PIECE REFERENCE

*                                            GENETIC MAP BEGINNING (REAL)

*                                            DIRECTION (+/-)

*                                            BEGINNING NUCLEOTIDE (INTEGER)

*                                            ENDING NUCLEOTIDE (INTEGER)


*                                            STATE (ON/OFF)

*                                            PHENOTYPE

DNA

*

*

DNA


MARKER


(* end module delman.construction.form.marker *)
(* begin module delman.construction.form.transcript *)

                   NAME:                     LIBDEF, 1980 JUNE 9


TRANSCRIPT

*                                            SHORT NAME

*                                            LONG NAME

NOTE

*

*

*

*

NOTE

*                                            PIECE REFERENCE

*                                            GENETIC MAP BEGINNING (REAL)

*                                            DIRECTION (+/-)

*                                            BEGINNING NUCLEOTIDE (INTEGER)

*                                            ENDING NUCLEOTIDE (INTEGER)

TRANSCRIPT


(* end module delman.construction.form.transcript *)
(* begin module delman.construction.form.gene *)

                   NAME:                     LIBDEF, 1980 JUNE 9


GENE

*                                            SHORT NAME

*                                            LONG NAME

NOTE

*

*

*

*

NOTE

*                                            PIECE REFERENCE

*                                            GENETIC MAP BEGINNING (REAL)

*                                            DIRECTION (+/-)

*                                            BEGINNING NUCLEOTIDE (INTEGER)

*                                            ENDING NUCLEOTIDE (INTEGER)

GENE


(* end module delman.construction.form.gene *)
(* begin module delman.construction.form.piece *)

                   NAME:                     LIBDEF, 1980 JUNE 9


PIECE

*                                            SHORT NAME

*                                            LONG NAME

NOTE

*                                            (NOTES INCLUDE PRECISE REFERENCE

*                                            FOR EVERY BASE IN THE PIECE)

*

*

NOTE

*                                            GENETIC MAP BEGINNING (REAL)

*                                            COORDINATE CONFIGURATION
                                                (CIRCULAR/LINEAR)

*                                            COORDINATE DIRECTION (+/-)

*                                            COORDINATE BEGINNING (INTEGER)

*                                            COORDINATE ENDING (INTEGER)


*                                            PIECE CONFIGURATION
                                                (CIRCULAR/LINEAR)

*                                            PIECE DIRECTION (+/-)

*                                            PIECE BEGINNING (INTEGER)

*                                            PIECE ENDING (INTEGER)

DNA

* (INSERT SEQUENCE HERE)

DNA

PIECE

(* end module delman.construction.form.piece *)
(* begin module delman.describe *)


 DDDDDDD   EEEEEEEE   SSSSSS    CCCCCC   RRRRRRR   IIIIIIII  BBBBBBB   EEEEEEEE
 DD    DD  EE        SS    SS  CC    CC  RR    RR     II     BB    BB  EE
 DD    DD  EE        SS        CC        RR    RR     II     BB    BB  EE
 DD    DD  EEEE       SSSSSS   CC        RR    RR     II     BBBBBBB   EEEE
 DD    DD  EE              SS  CC        RR    RR     II     BB    BB  EE
 DD    DD  EE              SS  CC        RRRRRRR      II     BB    BB  EE
 DD    DD  EE              SS  CC        RR  RR       II     BB    BB  EE
 DD    DD  EE        SS    SS  CC    CC  RR   RR      II     BB    BB  EE
 DDDDDDD   EEEEEEEE   SSSSSS    CCCCCC   RR    RR  IIIIIIII  BBBBBBB   EEEEEEEE


(* end module delman.describe *)
(* begin module delman.describe.conventions.naming-parameters *)

      PROGRAM NAMING CONVENTIONS

      Every Delila System program exists in several forms:

 1) Raw source code - without modules inserted.  Example: "lister.r"
 would be the raw code for the LISTER program.  We are not sending code
 this way.

 2) Pascal source code - with all modules inserted.  This code is ready
 to compile.  Example: "lister.p".  (Our previous convention was to add
 an s to the end of the file name to indicate this.)

 3) Compiled code.  Our convention is to remove the suffix: "lister".
 To simplify the manual, programs are listed under the compiled code
 name (lister).


      PARAMETER FILE NAMES
      A file that controls the operation of a program is called a parameter
 file.  For LISTER this file is LISTERP.  For SPLIT it is ...
 SPLITP (get it? HA! HA! sorry.)

      RULES FOR PARAMETER FILES
 1) If the file is not empty then the file must contain values for all
 parameters.  With few exceptions, this should reduce the number of complex
 rules that one must deal with.

 2) Each parameter is on its own line.

 3) Parameters are left justified on the line.

 4) A parameter may be followed by one or more spaces and then any
 comment.  This lets the user write reminders of what the allowed
 values are.
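
      The four rules can be illustrated by a short reader, sketched
 here in Python.  Several things are assumptions: that an empty file
 means the program falls back on its defaults, that a parameter value
 contains no spaces, and the parameter names in the usage note, which
 belong to no real program.

```python
def read_parameters(lines, names):
    """Pair each line of a parameter file with a parameter name.

    Returns None for an empty file (defaults apply).  Otherwise every
    parameter must be present, one per line, starting in column 1;
    anything after the first run of spaces is treated as a comment.
    """
    if not lines:
        return None
    if len(lines) != len(names):
        raise ValueError("expected %d parameters, found %d"
                         % (len(names), len(lines)))
    values = {}
    for name, line in zip(names, lines):
        if not line.strip() or line[:1].isspace():
            raise ValueError(name + " is missing or not left justified")
        values[name] = line.split()[0]
    return values
```

 For example, the two lines "60   bases per line" and "n  number the
 lines? (y/n)" read with names ["width", "numbering"] give the values
 "60" and "n".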

      WHY CAN'T DEFAULT PARAMETER VALUES BE STATED IN THIS MANUAL?

 1) If default values are changed, then the manual must also be changed.
 Since there is no automatic mechanism to assure that these remain
 the same, it is likely that it will be forgotten.  The manual would
 then be out of date.

 2) The manual entry defines the program but does not enforce details
 of operation.  It is somewhat like the LIBDEF specification.

 3) It is easy to find out what the defaults are since almost every
 program states the values used in its listing.  Running a small test
 takes only two minutes.

(* end module delman.describe.conventions.naming-parameters *)
(* begin module delman.describe.conventions.writing *)

      PROGRAM WRITING CONVENTIONS

      Program source code will always follow certain rules:

 1) The first line(s) will be the Pascal PROGRAM statement.

 2) The module libraries that are sources of the modules will be stated.

 3) One of the global constants will be called VERSION.  This number
 or string identifies the particular version of the source code.  We
 change VERSION every time that we modify the source file.  The program
 name and VERSION are written to the OUTPUT file when the program runs.

 4) There will be a document module that describes the program.
 The module is identical to the one in this manual, such as
      DESCRIBE.LISTER
 It follows the format defined in
      DELMAN.DESCRIBE.DOCUMENTATION.PROGRAMS

 5) All constants, types, variables, procedures, functions
 and sections of code will have comments that describe their function.

 6) Interactive programs always have a HELP command.


      FOR TRANSPORTATION:
 1) Put non-standard features inside modules.

 2) Program lines longer than 80 characters are avoided.  (NB: This is ALWAYS
 possible in PASCAL).  The FLAG program will detect any lines that are too long.

 3) Reading into packed arrays is forbidden.  Read into unpacked arrays
 and pack or transfer values.

 4) The Pascal Users Manual suggests that PASCAL identifiers "must
 differ over their first 8 characters."  There are two problems related
 to this.  Assume that the transport is from a computer that requires
 N characters to differ, where N > 8 (eg. 10).
   a) Transport to a computer that recognizes only M < N characters may cause
   names like A23456789 and A2345678X to be considered identical, and
   compilation will be prevented.
   b) Transport to a computer that recognizes M > N will detect cases
   where one name was written two ways, with the difference in the last
   characters (between N and M).  The "most famous" such case was
   in CATAL: HUMCATLINE and HUMCATLINES were used on a computer where
   N = 10 and failed on computers where M > 10.
 The solution in both cases is to avoid names that differ only beyond
 the first 8 characters.  Is somebody willing to write a program to detect this?
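
      One possible shape for such a detector is sketched below in
 Python (not Pascal, and not part of the Delila System).  It treats
 every run of letters and digits that starts with a letter as a name,
 ignores letter case as Pascal does, and makes no attempt to skip
 comments, strings or reserved words, so it may report more than it
 should.  The function name is hypothetical.

```python
import re
from collections import defaultdict


def truncation_clashes(source, significant=8):
    """Groups of distinct names that agree in their first characters.

    Returns a sorted list of sorted name lists; each list holds names
    that differ only beyond the first `significant` characters.
    """
    groups = defaultdict(set)
    for word in re.findall(r"[A-Za-z][A-Za-z0-9]*", source):
        lowered = word.lower()            # Pascal ignores letter case
        groups[lowered[:significant]].add(lowered)
    return sorted(sorted(names) for names in groups.values()
                  if len(names) > 1)
```

 Run on a source that declares both HUMCATLINE and HUMCATLINES, it
 reports the pair humcatline and humcatlines.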

(* end module delman.describe.conventions.writing *)
(* begin module delman.describe.conventions.running *)

      PROGRAM RUNNING CONVENTIONS

 In this manual we will use a single notation to mean running a program:
      lister(book,list)
 means to run the program LISTER using a file named BOOK.  The program
 will produce output to file LIST.

 The names BOOK and LIST are not necessarily the same as the file names
 declared in the source of LISTER (LISTERS); we assume that the names
 are mapped one to one.  Also, file names to the right may not always
 be mentioned, to simplify the notation.  For example:
      edit(inst1)
         :
         :   (create Delila instructions in file INST1)
         :
      delila(inst1,book1,delist1)
             (Run DELILA to create a book named BOOK1 and
              a Delila listing DELIST1 that shows where the errors are.
              The library and catalogue are not mentioned.)
      lister(book1,list1)
             (Run the auxiliary program LISTER.
              OUTPUT and LISTERP are not mentioned.)

 The file OUTPUT will always contain messages and diagnostics intended
 for the CRT screen or teletype.
 The file INPUT is always used for interactive input by the programs.


 To fully define the files that a program uses we will write:
      LISTER(BOOK: IN; LIST: OUT; LISTERP: IN; OUTPUT: OUT)
 IN and OUT define the direction of information flow into or out of
 the program.  INOUT would mean that the source file may be modified
 (such as by an editor).  This is a symbolic way to represent the data
 flow diagrammed in our papers (see DELMAN.INTRO.DESCRIPTION).
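
      Because the full notation is regular, it can be read by a program.
 The sketch below (Python) is only an illustration with a hypothetical
 function name; it accepts the three directions named above and nothing
 else.

```python
def parse_files(declaration):
    """Split 'PROG(FILE: IN; FILE: OUT)' into a name and a file table."""
    program, _, rest = declaration.partition("(")
    files = {}
    for item in rest.rstrip(") ").split(";"):
        name, _, direction = item.partition(":")
        direction = direction.strip().upper()
        if direction not in ("IN", "OUT", "INOUT"):
            raise ValueError("bad direction in: " + item.strip())
        files[name.strip()] = direction
    return program.strip(), files
```

 Applied to the LISTER line above, it returns the program name LISTER
 and a table showing BOOK and LISTERP as IN, LIST and OUTPUT as OUT.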


 NOTE: The mapping of logical file name (the one the program knows) to
 physical file name (the actual one the computer system uses) is
 frequently done with an ASSIGN or LINK command in the job control language of
 the computer.

(* end module delman.describe.conventions.running *)
(* begin module delman.describe.short.cluster.files *)

      Short clustered descriptions of some Delila System files

 DOCUMENTS
      AAA     Names Of Delila System Files
      chars   Character List
      delman1 Delila System Manual
      delman2 Delila System Manual, for program descriptions
      libdef  Delila Library System Definition
      moddef  Module Transfer System Definition

 LIBRARIES
      humcat  Human's Catalogue For The Library
      lib1    Library 1: Bacteriophage
      lib2    Library 2: E. Coli And S. Typhimurium

 DELILA INSTRUCTIONS
      train   Transcript Library Instructions
      grin    Gene Starts In Relative Form (Use Transcript Library)
      gain    Gene Starts In Absolute Form (Use Transcript Library)

 SEARCH PROGRAM RULES
      genrule Finds Genes And Non-Genes
      enzrule Finds Restriction Enzyme Sites In Books

 WEIGHT MATRICES FOR THE PERCEPTRON
      w101    101 Wide, Finds All Genes In Transcript Library
      w71     71 Wide, Finds All Genes In Transcript Library
      w51     51 Wide, Finds All Genes And Some Nongenes

 EXAMPLES
      ex0bk   Example Book
      ex0hu   Example Catalogue For Humans
      ex0dl   Example Delila Listing
      ex0in   Example Instructions - To Create EX0BK
      ex0li   Example Listing From LISTER
      ex0lo   Example Loocat On Catalogue from EX0BK

 EXAMPLE DELILA INSTRUCTIONS FOR DELMAN
      ex0in   "ex0: example"
      ex1in   "ex1: the laci gene"
      ex2in   "ex2: an absolute get"
      ex3in   "ex3: a relative get"
      ex4in   "ex4: non-coding lac leader"
      ex5in   "ex5: the region between laci and lacz"
      ex6in   "ex6: multiple specification and requests"
      ex7in   "ex7: aligned book"
      ex8in   "ex8: non-coding lac leader- via respecification"

 EXAMPLES FOR TESTING THE MODULE PROGRAM
      exsin   example source in
      exmodli example module library

 EXAMPLES FOR TESTING AUXILIARY PROGRAMS
      expepin Delila Instructions For Testing Pemowe

 EXAMPLES FOR TESTING THE PERCEPTRON
      exspbk  Example Sequences Positive Book
      exsnbk  Example Sequences Negative Book
      expa0   Example Pattern 0, Learn EXSPBK Vs EXSNBK With Zero Start
      expa1   Example Pattern 1, An Initial Matrix For Learning
      expa2   Example Pattern 2, Learn EXSPBK Vs EXSNBK Using EXPA1 As Start
      expan2  Result Of Patana On EXPA2
      exsebk  A Book For Searching With EXPA2

 EXAMPLES FOR TESTING ENCODE PROGRAMS
      exencin Example Encode Instructions
      exencbk The Book For EXENCIN
      exencen Example Encoding Of EXENCBK

 FONTS FOR BIGLET
      font    font for the biglet program
      phont   demonstration font for the biglet program

 EXAMPLE PARAMETER FILES
      Often a program will have a file associated with it
 that controls it and is called a parameter file.  For example, the
 pbreak program uses a parameter file called pbreakp.  Many programs
 have example files.  They are not listed here, but you may want
 to look for them before you run the program.  An example is the xyplo
 program, for which there are the files xyplop.demo, xyin.demo,
 xyplop.test and xyin.test.
      As programs are modified, this section will not always be up to date.

(* end module delman.describe.short.cluster.files *)
(* begin module delman.describe.short.cluster.programs *)

      Short clustered descriptions of Delila System programs
      Documentation exists as describe.[name]

 MODULE LIBRARIES
   auxmod: modules for auxiliary programs
   delmod: delila module library
   doodle: pascal graphics library and preprocessor for pic under unix
   cybmod: specific module library for the cyber computer
   genmod: genbank access modules
   matmod: mathematics modules
   prgmod: programming modules for the delila system
   unixmod: specific module library for the unix operating system
   vaxmod: specific module library for the vax computer

 MODULE MANIPULATION
   module: module replacement program
   makemod: create a set of empty modules from a list of names
   makman: make manual entries from a source code
   maknam: make manual entry names
   modin: generate modularized delila instructions for absolute sites
   modlen: determine module lengths
   nulldate:  modules to neutralize the date-time functions
   pbreak: breaks a file into pages at a certain trigger phrase
   show: show modules in a module library
   undel: remove references to delman in modules

 TOOLS
   biglet: text enlargement program
   calc: a calculator that propagates errors
   calico: character and line counts of a file
   cap: put capital letters inside quotes of a program
   censor: removes code from a program
   chacha: changes characters in a file
   code: find the comment density of a pascal program
   column: pull defined column from input
   concat: concatenate files together
   copy: copy one file to another file
   decat: break a file into 10 files
   decom: remove comment starts from within a comment
   difint: differences between integers
   flag: points out excessively long lines
   ll: line lengths
   lig: ligation theory
   lochas: look at characters in a file
   merge: compare two files and merge them
   nocom: remove comments
   number: add line numbers to a file
   rembla: remove blanks from ends of lines in a file
   repro:  make multiple copies of a file
   same: counts the number of lines that are identical in two files
   shell: basic outline for a program
   shift: copy one file to another file, with a blank in front of each line
   short: find locations of short lines in a file
   shortline: make short lines out of long lines
   split: split a wide file into printable pages
   sqz: squeeze the input file to fit into fewer characters per line
   sumfile: sum of file sizes
   test: a simple test program for Pascal
   unshi: remove first column of characters from a file
   ver: look at the version of a program
   verbop: increment the version number of a program
   vernum: print the version number of a program
   versave: save the file under the version number
   unsqz: unsqueeze the input file
   whatch: what characters are in a file?
   worcha: word changing program
   wl: wrap lines in a file
   woco: word counting program
   wordlist: lists words in a file
   ww: word wrap

 TOOLS FOR TEX
   notex: remove tex and latex constructs
   ref2bib: refer to bibtex converter
   sortbibtex: sort a bibtex database
   untex: remove tex and latex constructs
   untitle: remove titles from bbl file
   unverb: remove verbatim sections from a latex file

 GRAPHICS
   doodle: pascal graphics library and preprocessor for pic under unix
   domod: doodle modules
   dops: pascal graphics library and preprocessor for postscript
   dosun: pascal graphics library and preprocessor for Sun graphics
   shrink: reduce size of postscript graphics

   genhis: general histogram plotter
   genpic: convert genhis output to pic input

   xyplo: plot x, y data
   log: convert columns of data to log

   dnag: graphics of dna

 LIBRARIAN
   delila: the librarian for sequence manipulation
   catal: cataloguer of delila libraries, the catalogue program
   loocat: look at a catalogue

 GENBANK
   dbbk: database to delila book conversion program
   dbcat: database catalog production and sorting program.
   dbfilter: filter GenBank databases to remove unwanted entries
   dbinst: extract Delila instructions from a GenBank database
   dblo: look at the catalogue of a genbank/embl database
   dbpull: database extraction program.

 AUXILIARY PROGRAMS FOR DATA BASE CONSTRUCTION
   makebk: make a book from a file of sequences.
   rawbk: make a raw sequence into a book
   reform: raw sequences reformatted

 AUXILIARY PROGRAMS FOR SEQUENCE LISTING
   lister: list the sequences of pieces in a book with translation
   parse: breaks a book into its components

 AUXILIARY PROGRAMS FOR ALIGNED SEQUENCES
   alist: aligned listing of a book
   gap: gaps in aligned listing of a book
   hist: make a histogram of aligned sequences.
   histan: histogram analysis.
   malign: optimal alignment of a book, based on minimum uncertainty

 AUXILIARY PROGRAMS FOR ANALYSIS
   cluster: cluster indana subindexes into groups of duplicate entries
   coda: composition file to data for genhis
   comp: determine the composition of a book.
   compan: composition analysis.
   count: counts the amount of sequence in a book
   frame: evaluator of potential reading frames
   indana: analysis of an index
   index: make an alphabetic list of oligonucleotides in a book
   pemowe: peptide molecular weights
   search: search a book for strings

 AUXILIARY PROGRAMS FOR HELIXES
   dotmat: dot matrices of two books
   helix: find helices between sequences in two books
   keymat: keyed-matrices for helices between two books
   matrix: dot matrices for helices between two books
   rep: records repeats between sequences in two books

   sorth: sort helix list
   instal: delila instruction alignment

 AUXILIARY PROGRAMS FOR PATTERN LEARNING
   patana: pattern analysis
   patlrn: pattern learning
   patlst: lister of patlrn output.
   patser: pattern searcher
   patval: pattern evaluations of aligned sequences

 AUXILIARY PROGRAMS FOR ENCODED SEQUENCES
   encfrq: encoded sequence frequency analysis
   encode: encodes a book of sequences into strings of integers
   encsum: sum of the vectors of encoded sequences

 AUXILIARY PROGRAMS FOR INFORMATION ANALYSIS
   calhnb: calculate e(hnb), var(hnb), ae(hnb), avar(hnb), e(n) 
   frese: frequency table to sequ
   palinf: find palindromes, based on information theory 
   rf: calculate Rfrequency 
   rseq: rsequence calculated from encoded sequences
   rsim: Rsequence simulation
   rsgra: rsequence graph
   dalvec: converts Rseq rsdata file to symvec format
   makelogo: make a graphical `sequence logo' for aligned sequences
   ckhelix: check that the helix location is where one wants
   alpro: frequency and information of aligned protein sequences
   alword: frequency and information of aligned words
   dirty: calculate probabilities for dirty DNA synthesis
   sites: analyse sites from randomized sequence data base
   bkdb: convert a book to database format for the sites program
   siva: site information variance
   diana: dinucleotide analysis of an aligned book
   tri: test environment for triangle array
   digrab: diagonal grabs of diana data
   da3d: diana da file to 3d graphics
   dotsba: dots to database
   Ri: Rindividual is calculated for every site in the aligned book
   scan: scan a book with a wmatrix and generate a vector
   vfilt: vector filter
   tod: to database format for sites program
   winfo: window information curve

 AUXILIARY PROGRAMS FOR OTHER USES
   refer: print the references in the pieces of a book
   sepa: separates delila instruction sets
   lenin: convert a list of lengths into Delila instructions

 RANDOM NUMBERS AND SEQUENCES
   markov: markov chain generation of a dna sequence from composition.
   tstrnd: test random generator
   gentst: test random generator
   normal: generate normally distributed random numbers
   rndseq: generate random dna sequences
   aran: aligned random sequences

 MATHEMATICS
   av: average integers
   binomial: produce the binomial probabilities for a found black to white ratio
   binplo: produce the binomial probabilities for a found black to white ratio
   cerf: complement of the error function
   cisq: circle to square
   chi: estimates chi squared from degrees of freedom
   linreg: linear regression
   mnomial: produce the multinomial distribution for base probabilities
   pcs: partial chi squared
   riden: ring density graph
   ring: z space ring
   sphere: plot density of shannon spheres
   stirling: test of stirling's formula
   zipf: Monte Carlo simulation for Peter Shenkin's problem

 MISCELLANEOUS
   aa: not actually a program, this is the header page for Delila manual
   asciicode: converts ascii table to Pascal code
   binhex: convert binary to hex
   hexbin: convert hex to binary
   mstrip: remove control m's from a file
   epsclean: clean an eps file

   kenin: create Delila instructions from Kenn's all.gen instructions
   kenbk: book from a file of sequences provided by Kenn Rudd

   tipper: copy a file to the output file with special symbols at end
   todawg: change a book into dawg format

   ev: evolution of binding sites
   evd: evolution display

   makedate: make a date file
   makessbdate: make a date file from a Sample_Sheet.bin file

 PROGRAMS TO CONTROL MACHINERY
   odti: munch od and time plates together for xyplo
   titer: analyse titertek optical density data
   spec: analyse two spectra from the camspec
   ssbread: read a sample sheet from the ABI sequencer
   tkod: read od values from tk data

(* end module delman.describe.short.cluster.programs *)
% makman 1.32
(* begin module describe.delman2 *)


ddddddd   eeeeeeee  ll        m      m     aa     n     nn  
dd    dd  ee        ll        mm    mm    aaaa    nn    nn  
dd    dd  ee        ll        mmm  mmm   aa  aa   nnn   nn  
dd    dd  eeeeeee   ll        mmmmmmmm  aa    aa  nnnn  nn  
dd    dd  ee        ll        mm mm mm  aa    aa  nn nn nn  
dd    dd  ee        ll        mm    mm  aaaaaaaa  nn  nnnn  
dd    dd  ee        ll        mm    mm  aa    aa  nn   nnn  
dd    dd  ee        ll        mm    mm  aa    aa  nn    nn  
ddddddd   eeeeeeee  llllllll  mm    mm  aa    aa  nn    nn  
                                                            
                                                            
                     222222   
                    22    22  
                          22  
                       2222   
                      22      
                     22       
                    22        
                    22        
                    22222222  
                              

Note: this page is kept in file aa.p on our UNIX system to
make it easy to make a manual of all the program documentation
with this as the first page.
   This is done by concatenating all the program source codes
together and running this through makman and pbreak:
   cat *.p | makman | pbreak > delman2.print &

   If your version of pbreak does not add blanks in front of
the lines of delman2.print, you can run delman2.print
through the program maknam to create a short listing of what
each program does.

(* end module describe.delman2 *)
version = 4.07 of aa.p delman2 1993 Jan 27 Schneider-Stormo
(* begin module describe.documentation.programs *)
<(*>
<name>
      program name<:> a one-line description of the program.
                      See description (below) for more details.

<synopsis>
      name<(>file1<: >i/o<, >file2<: >i/o<, >file3<: >i/o<, >...<)>
      This is the program statement with each file name followed
      by the input/output (i/o) use of the file:
         in     the file is used strictly for input (read-only)
           out  the file is used strictly for output (write-only)
         inout  the file is used for both input and output (read/write)
         intty  the file is used for interactive input (teletype)

<files>
      file1<: > multiple line detailed description of file 1
      file2<: > multiple line detailed description of file 2
      file3<: > multiple line detailed description of file 3
      ...

<description>
      The purpose and use of the program.
      All programs in the delila system are documented in the form shown on
      this page.
      <...>  indicates a literal, you must include it.
      <...>* these sections are optional, others are obligatory.
      This rigid style model will encourage uniformity and help the reader
      to know where to look.
         Note:  the description should be in flowing language, to introduce the
      program to people and make them interested in using it.
         Warning: do not make any describe module longer than 60 lines
      or it will not fit as a page in delman.

<examples>*
      An example of the use of this form is module describe.lister

<documentation>*
      Other sources of information or documents on the program.

<see also>*
      Other programs and related programs.

<author>
      One should be proud of one's work, and one should be
      responsible for it.

<bugs>
      Problems with the program and how to get around them (if known).
      Since in many cases, no bugs are known, this section is intended to
      include bugs in the design of the program.  How might the program be
      written better if one were to start again from scratch?

<technical notes>*
      Details about the implementation that may be relevant to a user.
      These notes are not, repeat not, to contain values of constants,
      since these may change (use the name of the constant).

<*)>
(* end module describe.documentation.programs *)
version = 1.00 of describe.documentation.programs
(* begin module describe.alist *)
(*
name
      alist: aligned listing of a book

synopsis
      alist(inst: in, book: in, alistp: in, colors: in, namebook: in,
            list: out, clist: out, output: out)

files
      inst: delila instructions of the form 'get from 56 -5 to 56 +10;'
         (This file may be empty, in which case the sequences will be
         aligned by their 5' ends.)
      book: the book generated by delila using inst
      alistp: parameters to control the program.  If empty, the range of the
         instructions is used.  Otherwise,
         1. The first line contains two integers
         defining the range to display.  This allows one to have a wide
         alignment, but look only at a portion.
         2. If the first character of the second line is 'p' the piece
         information is given in the list.
         3. If the first character of the third line is 'n' then paging
         is not done to the list.
      namebook: names of genes or transcripts from this book appear in
         the list.  If namebook is empty, then only the items specified in
         alistp are given.
      list: the aligned listing
      clist: the aligned listing, in PostScript color
      colors: colors defining the bases, see makelogo for definition.
      output: messages to the user

description
      Alist is useful for looking at aligned sets of sequences.
      The pieces in the book are aligned according to the instructions in
      file inst, and listed in the list file.  Each piece is identified, and
      a bar of numbers (called a 'numbar') that are read vertically defines
      the locations of bases around the aligning point.

example
      To generate the input set, start with a set of instructions that name
      genes and get them (as 'get from gene beginning -0 to gene beginning
      +2;').  Produce namebook.  Check for genes that are reversed relative
      to the piece (use hist and alist without instructions), and correct
      the delila instructions.  To convert these instructions to absolute
      form, use program search with 'd f -54321 t +12345 q atg gtg ttg' on
      namebook.  Now convert -54321 and +12345 to the range of interest
      (beware of absolute locations with the same numbers).  Finally,
      generate the book using delila.  (Someday this process will be simpler.)

documentation
      delman.use.aligned.books

author
      Thomas D. Schneider

bugs
      If you use relative instructions, then alist will bomb.
      I.e., do not use instructions of the form:
          get from gene beginning - 5 to gene beginning +5;

      Alist is not very smart about how it finds the instructions.  It uses
      the first letter of the line to find the instruction 'get'.
      Unfortunately, if the word 'gene' is found, alist does not know this
      and will bomb.  Simply add blanks in front of the word 'gene' if you
      want to keep the gene instruction.

      There is also an unsolved bug in alist:
      When the pieces and instructions are not 'just right', alist will
      produce listings that are thousands of characters wide...  The reason
      for this is not completely clear, but it is related to attempting
      to extend the from-to range of an aligned book, and perhaps to incorrect
      responses of delila when attempting to 'reduce' a piece beginning or
      ending that is off the end of a fragment of a circular piece.  The code
      now contains traps that halt the program when wide listings would have
      been generated.

technical notes
      variable nametype defines the kind of name picked up in namebook.

*)
(* end module describe.alist *)
version = 4.64; (* of alist.p 1993 January 26
(* begin module describe.alpro *)
(*
name
   alpro: frequency and information of aligned protein sequences

synopsis
   alpro(protseq: in, symvec: out, output: out)

files
   protseq:  Aligned protein sequences.  The first line, intended for
      identification of the entire data set, is skipped.  The header line must
      begin with an asterisk '*'.  The remaining lines are used for the
      sequences.  They are divided into `entries'.  The beginning of an entry
      has any (positive) number of identification lines, each of which begins
      with an asterisk '*'.  The sequence follows.  Gaps are indicated with
      dashes (-).  The end of the sequence is indicated by a period.
   symvec:  table of frequencies and information content.  The information
      measure is corrected for small sample size (Schneider et al, 1986).

   output: messages to the user

description
   Take an aligned set of protein sequences and produce input to the
   makelogo program for producing a logo.

   The program originally only created a vector that contained the characters
   of the alphabet, so the output was called an 'alvec'.  To reflect the use of
   symbols, the name of the output file was changed to symvec, but I like
   'alpro', and 'prosym' is so awkward that I decided to keep the name alpro.

examples

* This is an example sequence.
AG-EGCTT.
* This is the second example sequence.
* It is the last one.
YLREBS-A.

documentation
   Jotun Hein, Methods in Enzymology 183:626-645 (1990)
   Schneider et al. JMB 188:415 (1986)
   
@article{Schneider.Stephens.Logo,
author = "T. D. Schneider
 and R. M. Stephens",
title = "Sequence Logos: A New Way to Display Consensus Sequences",
journal = "Nucl. Acids Res.",
volume = "18",
pages = "6097-6100",
year = "1990"}

see also
   makelogo.p

author
   Thomas D. Schneider
   National Cancer Institute
   Laboratory of Mathematical Biology
   Frederick, Maryland  21702-1201
   toms@ncifcrf.gov

bugs

technical notes
   The feature which adjusts the stack height when there is a small amount of
   data, (described in the second paragraph of page 6100 of the logo paper),
   has been removed now because the ability to display the variance as a
   standard deviation by makelogo alerts the person that the position has
   little data in it.  Thanks to Peter Shenkin for the suggestion.

   The original feature was described as follows:

      "Positions that contain mostly spacer characters for the alignment are
      also reduced in weight by multiplying the information by the maximum
      number of sequences and dividing it by the actual number at the spacer
      position.  Thus if there are 10,000 sequences, a position with 200 A's
      would be close to 2 bits of pattern.  However, since the position
      only represents 2% of the sequences, this program would only give it a
      weight of 0.02*2 = 0.04 bits.  A better method is not known.  However,
      this prevents one from being fooled by positions that don't appear in
      most sequences."

*)
(* end module describe.alpro *)
version = 1.52; (* of alpro.p 1992 March 6
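The protseq layout described above can be read with a few lines of code.
The following is only a sketch in Python, not the code of alpro itself.
The function names read_protseq and column_counts are hypothetical, and
the sketch assumes that a sequence may continue over several lines until
the closing period is found.

```python
from collections import Counter


def read_protseq(lines):
    """Split protseq lines into (identification lines, sequence) entries.

    Assumption: a sequence may span several lines and ends at the
    first line whose last character is a period.
    """
    entries = []
    names, chunks = [], []
    # The first line identifies the whole data set and is skipped.
    for line in lines[1:]:
        line = line.strip()
        if line.startswith("*"):
            # Identification line of the entry that follows.
            names.append(line[1:].strip())
            continue
        chunks.append(line)
        if line.endswith("."):
            # Drop the closing period; dashes (gaps) are kept as symbols.
            entries.append((names, "".join(chunks)[:-1]))
            names, chunks = [], []
    return entries


def column_counts(sequences):
    """Count the symbols found at each aligned position."""
    width = max(len(s) for s in sequences)
    return [Counter(s[i] for s in sequences if i < len(s))
            for i in range(width)]
```

Per-position counts such as these are the raw material for a frequency
table like symvec.  The small sample correction and the information
calculation are not attempted here.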
(* begin module describe.alword *)
(*
name
   alword: frequency and information of aligned words

synopsis
   alword(words: in, alwordp: in, symvec: out, output: out)

files
   words:  Aligned words.  Since the input is usually to be a UNIX dictionary,
      there need not be any header lines.  However, if they exist, they must
      begin with an asterisk '*'.

      The remaining lines are used for the words.

   alwordp: parameters to control the program.  If the file is empty
      defaults are used.
      If the first line begins with the letter `e' then the words are
      aligned by their last character.
      If there is a first line, the second line must have the maximum
      word length to be included in the calculation.  Words longer than
      this will be skipped (and reported to output).

      If the first character of the second line is 'a' then all of the
      words in the file will be read.  Otherwise, only the first
      word on each line will be read.
   symvec:  table of frequencies and information content.  The information
      measure is corrected for small sample size (Schneider et al, 1986).
   output: messages to the user

description
   Take an aligned set of words and produce input to the
   makelogo program for producing a logo.

examples

* This is an example sequence.
AGGEGCTT.
* This is the second example sequence.
* It is the last one.
YLREBS.

documentation
   Jotun Hein, Methods in Enzymology 183 (1990)
   Schneider et al. JMB 188:415 (1986)

see also
   alpro.p, makelogo.p

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.alword *)
version = 2.07; (* of alword.p 1992 June 4
(* begin module describe.aran *)
(*
name
   aran: aligned random sequences

synopsis
   aran(book: in, aranp: in, list: out, sequ: out, output: out)

files
   book: the book generated by Delila

   aranp:  Parameters to control the program.
      The FIRST LINE must contain one real number which is the degree of
      conservation.  For example, if this is 0.85, then each base will have an 85%
      chance of being the same, while the other bases will be 5% each.

      The SECOND LINE must contain the number of sequences to generate.

   list: details of the run.

   sequ: the aligned sequences, for input to makebk

   output: messages to the user

description
   Aran takes a sequence as a starting point and generates random
   sequences from it.  The program simulates a very simple dirty synthesis of
   the sequence.  The synthesis is to be mostly the bases given in the
   sequence.  The probability of conserving each base (f) is defined in the
   parameter file.  If a particular base is not conserved, then the other
   three bases are assigned probabilities of (1-f)/3.

example
   See alist

documentation
   delman.use.aligned.books

author
   Thomas D. Schneider

bugs
   See alist

technical notes
   The program constant seqmax defines the length of the longest sequence
   that can be created.

*)
(* end module describe.aran *)
version = 1.15; (* of aran.p 1990 Oct 3
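The dirty synthesis that aran simulates can be sketched as follows.  This
is a minimal illustration in Python and not the code of aran.  The names
dirty_copy, base_probabilities and generate are hypothetical, and the seed
argument is an assumption added so that runs are repeatable.

```python
import random

BASES = "acgt"


def base_probabilities(base, f):
    """Probability of each base at a position whose original base is given."""
    # The original base has probability f; each other base has (1-f)/3.
    return {b: (f if b == base else (1.0 - f) / 3.0) for b in BASES}


def dirty_copy(sequence, f, rng):
    """Make one randomized copy, keeping each base with probability f."""
    copy = []
    for base in sequence:
        if rng.random() < f:
            copy.append(base)
        else:
            # Not conserved: pick one of the other three bases uniformly.
            copy.append(rng.choice([b for b in BASES if b != base]))
    return "".join(copy)


def generate(sequence, f, count, seed=0):
    """Make count randomized copies, as the two lines of aranp request."""
    rng = random.Random(seed)
    return [dirty_copy(sequence, f, rng) for _ in range(count)]
```

With f = 0.85 the three alternative bases each receive (1 - 0.85)/3 = 0.05,
which matches the example given for the first line of aranp.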
(* begin module describe.asciicode *)
(*
name
   asciicode: converts ascii table to Pascal code

synopsis
   asciicode(ascii: in, code: out, output: out)

files
   ascii:  The ascii file must contain this table:

|  0 NUL|  1 SOH|  2 STX|  3 ETX|  4 EOT|  5 ENQ|  6 ACK|  7 BEL
|  8 BS |  9 HT | 10 NL | 11 VT | 12 NP | 13 CR | 14 SO | 15 SI 
| 16 DLE| 17 DC1| 18 DC2| 19 DC3| 20 DC4| 21 NAK| 22 SYN| 23 ETB
| 24 CAN| 25 EM | 26 SUB| 27 ESC| 28 FS | 29 GS | 30 RS | 31 US 
| 32 SP | 33  ! | 34  " | 35  # | 36  $ | 37  % | 38  & | 39  ' 
| 40  ( | 41  ) | 42  * | 43  + | 44  , | 45  - | 46  . | 47  / 
| 48  0 | 49  1 | 50  2 | 51  3 | 52  4 | 53  5 | 54  6 | 55  7 
| 56  8 | 57  9 | 58  : | 59  ; | 60  < | 61  = | 62  > | 63  ? 
| 64  @ | 65  A | 66  B | 67  C | 68  D | 69  E | 70  F | 71  G 
| 72  H | 73  I | 74  J | 75  K | 76  L | 77  M | 78  N | 79  O 
| 80  P | 81  Q | 82  R | 83  S | 84  T | 85  U | 86  V | 87  W 
| 88  X | 89  Y | 90  Z | 91  [ | 92  \ | 93  ] | 94  ^ | 95  _ 
| 96  ` | 97  a | 98  b | 99  c |100  d |101  e |102  f |103  g 
|104  h |105  i |106  j |107  k |108  l |109  m |110  n |111  o 
|112  p |113  q |114  r |115  s |116  t |117  u |118  v |119  w 
|120  x |121  y |122  z |123  { |124  | |125  } |126  ~ |127 DEL

   code:  Pascal code that converts integers to these names.
   output: messages to the user

description

   This program generates a chunk of Pascal code that is useful
   for detailed investigation of file characters.

examples

documentation

see also
   lochas.p

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.asciicode *)
version = 1.01; (* of asciicode.p 1993 January 26
(* begin module describe.auxmod *)
(*
name
      auxmod: modules for auxiliary programs

synopsis
      auxmod(hst: in, cmp: in, patt: in, output: out)

files
      hst: a histogram from hist for testing, or empty
      cmp: a composition from comp for testing, or empty
      patt: a pattern matrix from patlrn for testing, or empty
      output: the version of auxmod is printed.  test results are printed.
         successful compilation and running of the program indicates that
         the modules are correct.

description
      auxmod is a collection of modules used only rarely in various
      auxiliary programs.  it includes modules for reading compositions
      (comp.), histograms (hist.), helix lists (findcolon and
      gethelix) and pattern matrices (matrix.).

see also
      delmod, module, hist, comp, patlrn

author
      gary d. stormo and thomas d. schneider

bugs
      none known

*)
(* end module describe.auxmod *)
version = 'auxmod 1.39 86 dec 12 gds/tds';
(* begin module describe.av *)
(*
name
   av: average integers

synopsis
   av(input: in, output: out)

files
   input:  give pairs of integers
   output: rounded average of the integers

description
   Genbank features are given as endpoints; we need to convert
to the central base for delila instructions.  This program lets one
do that.  The program rounds the result.

examples

documentation

see also

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.av *)
version = 1.03; (* of av.p 1992 Jun 2
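The calculation performed by av is small enough to sketch directly.  The
Python below is an illustration only.  The name rounded_average is
hypothetical, and the rule that an exact half rounds away from zero is an
assumption, since the description says only that the result is rounded.

```python
def rounded_average(first, last):
    """Average two integer endpoints and round to the nearest integer.

    Assumption: an exact half is rounded away from zero.
    """
    total = first + last
    if total >= 0:
        return (total + 1) // 2
    return -((-total + 1) // 2)


def central_bases(pairs):
    """Apply rounded_average to each (first, last) pair of endpoints."""
    return [rounded_average(first, last) for first, last in pairs]
```

For example, a feature running from 100 to 120 has its central base
at 110.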
(* begin module describe.biglet *)
(*
name
   biglet: text enlargement program

synopsis
   biglet( fin: in, font: in, bigletp: in, fout: out, output: out )

files
   fin: contains user's text to be enlarged.
   font: the first line contains the actual height and width of characters
         in the font.  The following lines contain character images.  A
         character image has two parts, a reference character and the
         letter image.  Characters in the image that match the reference
         character are printed, while a mismatch prints a space.
   bigletp: contains parameters to control enlargement.  If the file is
         empty, the fonts are not enlarged.  Otherwise, each line
         contains the height and width enlargement factors.  The line
         may also contain a character inside quote marks (single or
         double) to substitute for the matched characters of the font
         images.  Each line of bigletp corresponds to a fin text line.
         If there are no further lines, previously set values are used.
   fout: each line of fin is expanded by bigletp parameters and printed
         out in the form of the font images.
   output: messages to the user.

description
   Each letter of text (in file fin) is expanded and printed as a larger
   letter which is composed of many smaller letters.  The expansion can be
   set for each text line or for all lines with one parameter setting.
   There is an optional parameter which allows all the large letters of a
   specified line to be composed of a single character.  The larger letters
   are based on a file called font which can contain any sort of images.

examples
   For a font file whose first line is a left justified 5 4:
   f (sixth letter)     (a space)         - (a dash)
   fff-               ----                xxxx      Note: in the file each
   f---               ----                xxxx      character image must be
   fff-               ----                ---x      left justified and be
   f---               ----                xxxx      directly below the
   ----               ----                xxxx      previous image.
   Also, each image has mismatches at its right and below used for spacing.
   for bigletp:          example 1)   2 1         example 2)   3 2 'r'
                                                               1 2 'w'
   The first example magnifies the first and all subsequent text lines
   twice in height.  The second example magnifies the first line at 3 by 2
   and composes it out of 'r's.  The next line will be twice as wide as the
   font and composed of 'w's.  All subsequent fout text will also be
   twice as wide but made up of the usual font characters.

   The phont file is a demonstration font file, while the font file
   is a working font.

author
   Matthew A. Yarus

bugs
   none known

technical notes
   If your font images are larger than the program allows, change the
   constants letmaxhi and letmaxwi in the biglet source code.
*)
(* end module describe.biglet *)
version = 1.65; (* of biglet 1986 dec 15
(* begin module describe.binhex *)
(*
name
   binhex: convert binary to hex

synopsis
   binhex(input: in, output: out)

files
   input: binary representation of an image, from hexbin
   output:  hexadecimal representation of an image, PostScript
   shape: First line contains two characters to skip and then
      two integers, the width and height of the image.

description
   To allow one to work with a PostScript hex image in binary format
   it is converted.

examples

documentation
   PostScript red book p. 170

see also

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.binhex *)
version = 1.08; (* of binhex.p 1991 October 17
(* begin module describe.binomial *)
(*
name
   binomial: produce the binomial probabilities for a found black to white ratio

synopsis
   binomial(xyin: out, xyplop: out, binomialp: in, output: out)

files
   xyin: a table of probabilities of finding the given black to white
      ratio, versus the true probability.  The form is a series of lines
      that begin with '* ', followed by two columns of numbers.
      The first column is the number of blacks, and the second column is
      the corresponding value of p(black:white|pb) = the probability of
      obtaining black and white given pb, the probability of black.
      This file is direct input to the xyplo program.
   xyplop: the controls for the xyplo program to generate the graph.
      These may be modified by the user before plotting.
   binomialp: parameters to control the program, on three lines:
      blacks and whites: two integers on the first line, representing the
         number of black balls and white balls obtained in an experiment
      probability of black
      plot max: maximum number of blacks to show.

description
   Suppose there exists a large bin containing both black and white
   balls.  The true fraction of black balls in the bin is fraction, and
   the fraction of white balls is (1-fraction).  We obtain a sample of
   black and white balls from the bin, given as the first two parameters
   in binomialp.  The probability of getting this black:white sample is:

                                 (black+white)!         black             white
       p(black:white|fraction) = -------------- fraction      (1-fraction)
                                 black!white!

   The program generates these probabilities for a given fraction.
   The results are in a form that the xyplo program can use to plot.

see also
   xyplo, binplo

author
   Thomas Dana Schneider

bugs
   none known

*)
(* end module describe.binomial *)
version = 1.42; (* of binomial, 1988 feb 24 *)
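The formula given in the description of binomial can be written out as a
short function.  This is a sketch in Python rather than the code of the
program, and the name binomial_probability is hypothetical.

```python
import math


def binomial_probability(black, white, fraction):
    """p(black:white|fraction), following the formula in the description."""
    # math.comb gives (black+white)! / (black! white!).
    ways = math.comb(black + white, black)
    return ways * fraction ** black * (1.0 - fraction) ** white
```

As a check by hand, one black and one white ball drawn from a bin that is
half black gives 2 x 0.5 x 0.5 = 0.5.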
(* begin module describe.binplo *)
(*
name
   binplo: produce the binomial probabilities for a found black to white ratio

synopsis
   binplo(xyin: out, xyplop: out, binplop: in, output: out)

files
   xyin: a table of probabilities of finding the given black to white
      ratio, versus the true probability.  The form is a series of lines
      that begin with '* ', followed by two columns of numbers.
      The first column is the value of fraction, and the second column is
      the corresponding value of p(black:white|fraction) = the probability of
      obtaining black and white given fraction.  This file is direct input
      to the xyplo program.
   xyplop: the controls for the xyplo program to generate the graph.
      These may be modified by the user before plotting.
   binplop: parameters to control the program
      blacks and whites: two integers on the first line, representing the
         number of black balls and white balls obtained in an experiment
      points: one integer on the second line, how many data points should
         be generated in xyin.  If points is zero, then the program
         tests its binomial probability procedure by adding all the
         probabilities that correspond to the binomial distribution.
         For example, with 1 black and 18 white balls, the test is to
         add the probabilities for (0,19), (1,18), ... (19,0).  This
         value should be close to 1.00 if the procedure is correct.

description
   Suppose there exists a large bin containing both black and white
   balls.  The true fraction of black balls in the bin is fraction, and
   the fraction of white balls is (1-fraction).  We obtain a sample of
   black and white balls from the bin, given as the first two parameters
   in binplop.  The probability of getting this black:white sample is:

                                 (black+white)!         black             white
       p(black:white|fraction) = -------------- fraction      (1-fraction)
                                 black!white!

   The program generates these probabilities for the requested number of
   values of fraction (points in binplop), and gives the results in a form
   that the xyplo program can use to plot.
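
   The formula above can be sketched in Python as follows.  This is only an
   illustration, not the binplo source: the function names, the even spacing
   of the fraction values and the fraction used in the self test are all
   assumptions.

```python
from math import comb

def binomial_probability(black, white, fraction):
    """p(black:white|fraction) for a sample of black and white balls."""
    n = black + white
    # comb(n, black) equals (black+white)! / (black! white!)
    return comb(n, black) * fraction**black * (1.0 - fraction)**white

def curve(black, white, points):
    """(fraction, probability) pairs at evenly spaced fractions in [0, 1]."""
    step = 1.0 / (points - 1)  # assumes points is at least 2
    return [(i * step, binomial_probability(black, white, i * step))
            for i in range(points)]

def self_test(black, white, fraction=0.3):
    """Sum the probabilities of every possible split of black+white balls."""
    n = black + white
    return sum(binomial_probability(k, n - k, fraction) for k in range(n + 1))
```

   With 1 black and 18 white balls, self_test adds the terms for (0,19),
   (1,18), ... (19,0), which should come out close to 1.00.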

see also
   xyplo

author
   Thomas Dana Schneider

bugs
   none known

*)
(* end module describe.binplo *)
version = 1.29; (* of binplo, 1987 feb 10 *)
(* begin module describe.bkdb *)
(*
name
   bkdb: convert a book to database format for the sites program

synopsis
   bkdb(book: in, database: out, output: out)

files
   book: a book containing many sequences of the same size.
   database: the format used by the sites program.
   output: messages to the user

description
   The program converts a book to the database format used by the sites
   program.

examples

documentation

see also
   sites.p

author
   Thomas Dana Schneider

bugs
  It sure would be nice to have one uniform type of format, but the GenBank
  format is not yet defined (and it is 5 years after GenBank was told
  by a national advisor to do this!), so we wait.

technical notes

*)
(* end module describe.bkdb *)
version = 1.01; (* of bkdb.p 1991 January 14 *)
(* begin module describe.calc *)
(*
name
   calc: a calculator that propagates errors

synopsis
   calc(input: in, output: out)

files
   input: reverse polish calculator input
   output: results

description
   The program is based on the idea of the dc program under UNIX.  That
   program takes input as reverse polish and calculates values.  This
   program does the same, but each value carries an error estimate, so one
   may calculate and propagate errors.  Tokens (commands and numbers) are
   usually separated by spaces or carriage returns.  Tokens that begin with
   a digit or a dash (-) are numbers.  Numbers always come in pairs: the
   first is the estimate and the second is the error.

   Some of the commands are:
     h give current list of all commands and functions
     numbers (as pairs) are entered on the stack
             5 2
          means 5 +/- 2
             5'
          means 5 +/- 0, so you can avoid giving the error if you want.
          Any other legal command may replace the single quote, as in "5p".
     + add the top two numbers on the stack together
     _ (UNDERSCORE) subtract the top number from the next number
       on the stack (underscore is used to be distinct from minus sign, -)
     * multiply the top two numbers on the stack together
     / divide the top number on the stack by the next number on the stack
     s print the stack, top down
     p print the top number on the stack

   Note:  When the program is asked to do calculations silently (using the
   t command), it immediately shuts up and does not say that it is doing
   so.  This makes it easier to write programs without having them announce
   in the output that they are doing silent calculations.
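
   The pairing of estimates and errors on a reverse polish stack can be
   sketched in Python.  This is not the calc source.  It assumes independent
   errors that combine in quadrature (the usual rule in Taylor's text), it
   handles only the +, _ and * commands, and the function name run is
   hypothetical.

```python
import math

def run(tokens):
    """Evaluate reverse polish tokens; each number is an (estimate, error) pair."""
    stack = []
    pending = None  # first half of a number pair, waiting for its error
    for tok in tokens.split():
        if tok[0].isdigit() or (tok[0] == "-" and len(tok) > 1):
            value = float(tok)
            if pending is None:
                pending = value
            else:
                stack.append((pending, abs(value)))
                pending = None
        elif tok in ("+", "_", "*"):
            (b, db), (a, da) = stack.pop(), stack.pop()  # b was on top
            if tok == "+":
                stack.append((a + b, math.hypot(da, db)))
            elif tok == "_":
                stack.append((a - b, math.hypot(da, db)))
            else:
                # same as adding the relative errors in quadrature
                stack.append((a * b, math.hypot(b * da, a * db)))
        else:
            raise ValueError("unknown token: " + tok)
    return stack
```

   For example, run("5 2 3 1 +") leaves one pair on the stack: the estimate
   8 with an error equal to the square root of 5.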

documentation
   An Introduction to Error Analysis,  John R. Taylor
   University Science Books, Mill Valley, CA. 1982.

author
   Thomas Schneider

bugs
   Pascal numeric input is used, so anything that can make Pascal
   bomb will bomb this program.  For example, "- " will cause
   the program to think there is a number after the dash, and (our)
   Pascal will object.  This should be protected against now,
   so the program should never bomb (famous last words).

   The u (uncertainty) function error estimate is set to zero when
   the probability is zero.  This is a guess.

*)
(* end module describe.calc *)
version = 2.44; (* of calc.p 1992 September 3 *)
(* begin module describe.calhnb *)
(*
name
      calhnb: calculate e(hnb), var(hnb), ae(hnb), avar(hnb), e(n)

synopsis
      calhnb(fin: in, fout: out, output: out)

files
      fin: the genomic composition (integers) on one line followed by
         a set of integers, one per line representing values of n
      fout: a table showing n, e(hnb), ae(hnb) and their difference.
         the variances var(hnb) and avar(hnb) are tabulated along with
         the difference between their square roots.  this is the difference
         between the standard deviations.  e(n) is found from the genomic
         entropy minus e(hnb).
      output: messages to the user.

description
      given a genomic composition and a series of integers (n) that
      represent the number of sample sites, calhnb calculates the sampling
      error as e(hnb) and the variance var(hnb).  it also finds the
      approximations ae(hnb) and avar(hnb).  these values are presented in a
      table along with the differences between the exact and approximate
      calculations.  this table will allow a user to decide when to use the
      approximations.  beware that the exact calculation becomes very expensive
      for large n.
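
      Why the exact calculation grows expensive can be seen in a small Python
      sketch.  It assumes that e(hnb) is the expected entropy, in bits, of n
      bases drawn from the genomic composition, found by summing over every
      possible count of the four bases.  That reading follows the cited paper
      as understood here and is not taken from the calhnb source; all names
      are hypothetical.

```python
from math import factorial, log2

def entropy(probs):
    """Entropy in bits of a probability list, skipping zero terms."""
    return -sum(p * log2(p) for p in probs if p > 0)

def exact_e_hnb(n, probs):
    """Expected entropy of n bases drawn from a 4-letter composition."""
    pa, pc, pg, pt = probs
    total = 0.0
    # three nested loops: the work grows roughly as n cubed
    for na in range(n + 1):
        for nc in range(n - na + 1):
            for ng in range(n - na - nc + 1):
                nt = n - na - nc - ng
                ways = factorial(n) // (factorial(na) * factorial(nc)
                                        * factorial(ng) * factorial(nt))
                prob = ways * pa**na * pc**nc * pg**ng * pt**nt
                total += prob * entropy([k / n for k in (na, nc, ng, nt)])
    return total

def sampling_error(n, composition):
    """e(n) = genomic entropy minus e(hnb), as described for fout."""
    size = sum(composition)
    probs = [c / size for c in composition]
    return entropy(probs) - exact_e_hnb(n, probs)
```

      With an equiprobable composition, a single site (n = 1) has zero
      entropy, so the sampling error is the full 2 bits; it shrinks as n
      grows.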

documentation
      "Information content of binding sites on nucleotide sequences"
      T. D. Schneider, G. D. Stormo, L. Gold, and A. Ehrenfeucht
      JMB 188:415-431 (1986)

see also
      rseq

author
      thomas d. schneider

bugs
      none known
*)
(* end module describe.calhnb *)
version = 2.21; (* of calhnb 1988 feb 24 *)
(* begin module describe.calico *)
(*
name
      calico: character and line counts of a file

synopsis
      calico(input: in, output: out);

files
      input: a file for which one wants to know the number of characters
         and lines
      output: the number of characters and lines in input

description
      there are many circumstances when one would like to know the number of
      characters and the number of lines in a file.

examples
      will a file fit on one page?  can this file be put into the
      memory of a personal computer for transportation to another computer?

author
      susan p. scolman and thomas d. schneider

bugs
      none known

technical notes
      blanks at ends of lines are counted as characters.
      only the end of line mark is counted, not carriage return and line feed.
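
      One reading of these notes, sketched in Python: trailing blanks count
      as characters, and each end of line adds one to the line count without
      adding carriage return or line feed to the character count.  The
      function name is hypothetical and the calico source may count
      differently.

```python
def count_characters_and_lines(text):
    """Return (characters, lines) for the text of a file."""
    lines = text.splitlines()
    # trailing blanks stay in each line, so they count as characters;
    # the end-of-line marks are left out of the character total
    characters = sum(len(line) for line in lines)
    return characters, len(lines)
```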

*)
(* end module describe.calico *)
version = 1.08; (* of calico.p 1993 January 27 *)
(* begin module describe.cap *)
(*
name
      cap: put capital letters inside quotes of a program

synopsis
      cap(sin: in, sout: out, output: out)

files
      sin: the source program or file
      sout: the source program with capital letters in all
         quote strings.
      output: messages to the user

description
      A Pascal program under Unix must be written in lowercase characters, yet
      a database will often be in capital letters, so the program will
      not recognize the data.  This program makes the sin program
      have capital letters only in the quote strings.
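
      The idea can be sketched in Python.  This is not the cap source: it
      simply toggles on every single quote and does not treat apostrophes
      inside comments specially, which the real program may do.  The function
      name is hypothetical.

```python
def capitalize_quoted(source):
    """Uppercase the letters that sit inside single-quoted strings."""
    out = []
    inside = False
    for ch in source:
        if ch == "'":
            inside = not inside  # a doubled quote toggles twice
            out.append(ch)
        else:
            out.append(ch.upper() if inside else ch)
    return "".join(out)
```

      Because a doubled quote toggles the state twice, text such as
      'it''s acgt' is capitalized as one string.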

author
      thomas d. schneider

bugs
      none known

*)
(* end module describe.cap *)
version = 1.08; (* of cap.p 1989 July 8 *)
(* begin module describe.catal *)
(*
name
      catal: cataloguer of delila libraries, the catalogue program

synopsis
      catal(humcat: out, catalp: in,
            l1: in, cat1: out, lib1: out,
            l2: in, cat2: out, lib2: out,
            l3: in, cat3: out, lib3: out,
            output: out)

files
      humcat: the catalogue generated for humans.  it includes the names
         of things in the libraries and their coordinates.  humcat is quite
         wide so you will need a line-printer to print it.  alternatively
         you can use the split program.
      catalp: a parameter to control the program.  the library
         dates are not changed if the first character is 'n' (no date
         modification) or 'b' (book source of library, dates are not to
         be changed).  otherwise the dates are advanced.
      l1: the first input file of the library
      cat1: the first catalogue
      lib1: the first output library
      l2: the second input file of the library
      cat2: the second catalogue
      lib2: the second output library
      l3: the third input file of the library
      cat3: the third catalogue
      lib3: the third output library
      output: progress report and error messages

description
      the catalogue program checks all the input libraries for correct
      structure.  duplicated names are removed and a new set of library
      files is created, along with their catalogues for delila.  a catalogue
      is also generated for people to use.  each new library is associated with
      one catalogue.  under most circumstances this pair can be given to
      delila along with pairs created at different times.

documentation
      libdef (defines catal), delman.use.coordinates, delman.construction

see also
      loocat, delila, split

author
      Michael Aden and Thomas Schneider

bugs
      not all checks on the library structure are made.  some checks from
      libdef are now outdated or not done: p. 3.1 2 d, e, f, g and l.

technical notes
      the circumstances when a library-catalogue pair must not be used with
      another pair:  it is not possible for delila to check for two
      organisms with the same name that exist in different libraries.  in
      this case, run the two libraries through catal together to eliminate
      the ambiguity.  if this is not done, the results will be anomalous.

*)
(* end module describe.catal *)
version = 9.23; (* of catal.p 1992 September 14 *)
(* begin module describe.censor *)
(*
name
   censor: removes code from a program

synopsis
   censor(input: in, output: out)

files
   input:  input program with private text
   output: output program without private text

description
   The program allows one to maintain a Pascal program for personal use which
   contains features that are not yet to be made public.  The program contains
   special comment marks that delimit the text to be removed.  There are two
   situations.

   The first is the case of sections of text inside comments.  Any text
   surrounded by double brackets will not be copied to the output.  This
   includes the double brackets themselves.

   The second case is sections of normal code.  Letting '@' represent the
   asterisk (so that this description does not run into trouble when it
   is inside a Pascal comment), the text between and including the symbols
   (@@) is not copied to the output.

examples

documentation

see also

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.censor *)
version = 1.46; (* of censor.p 1991 February 20 *)
(* begin module describe.cerf *)
(*
name
   cerf: complement of the error function

synopsis
   cerf(input: in, list: out, output: out)

files
   input:  Give the z value you want evaluated.  Enter a number less than
      zero to stop the program.
   list: the complement of the error function and the error function
   output: messages to the user

description
   The area under the Gaussian distribution is found, given values of z.
   The complement of the error function is:
      erfc(y) = (2/sqrt(pi)) * integral from y to infinity of exp(-t*t) dt.
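
   The definition above can be checked numerically in Python.  The sketch
   below is not the ERFD3 algorithm used by cerf; it uses a plain trapezoid
   rule with an assumed upper limit and step count, and compares against the
   standard library function math.erfc.

```python
import math

def erfc_numeric(y, upper=10.0, steps=20000):
    """Trapezoid estimate of the erfc integral from y to upper."""
    h = (upper - y) / steps
    total = 0.5 * (math.exp(-y * y) + math.exp(-upper * upper))
    for i in range(1, steps):
        t = y + i * h
        total += math.exp(-t * t)
    return 2.0 / math.sqrt(math.pi) * total * h

def erf_from_erfc(y):
    """The error function is one minus its complement."""
    return 1.0 - math.erfc(y)
```

   The value of erfc_numeric(0.5) should agree with math.erfc(0.5) to
   several decimal places.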

documentation
   This is program ERFD3, figure 11.7, p. 330-333 in
   Pascal Programs for Scientists and Engineers
   Alan R. Miller, Sybex, 1981

author
   Thomas Dana Schneider

bugs
   none known

technical notes
   the tolerance may be adjusted, see the constants.

*)
(* end module describe.cerf *)
version = 1.04; (* of cerf 1988 September 14 *)
(* begin module describe.chacha *)
(*
name
      chacha: changes characters in a file

synopsis
      chacha(fin: in, fout: out, chachap: in, output: out)

files
      fin:     any file in which one wants to translate one set of
               characters into another set.
      fout:    the file to which the translated copy of fin is written.
      chachap: the chacha parameter file which contains the translation sets.
               chachap must only contain 2 lines.
               the first line contains the characters used in fin, typed one
                  right after the next with no blanks at the beginning.
               the second line contains the characters that the
                  characters in the first line are to be translated
                  into, typed in the same way and in corresponding order.
               if you want to change a character to blanks,
                  or vice versa, then you must have the blank character
                  in between other characters in chachap.
      output:  where error messages will appear.

description
      chacha translates characters in a file to a new set of characters.
      also, more than one character can be translated in one run
      of the program.

examples
      to convert between double and single quotes, use:
         '"
         "'
      to convert blanks to periods, use:
         j j
         j.j
      in the chachap file.
      each character on the first line on chachap will be translated
      into the character directly beneath it on the second line in
      the output file.
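
      The same mapping can be sketched in Python with str.maketrans.  This is
      an illustration rather than the chacha source; passing the two chachap
      lines as one string is an assumption of the sketch.

```python
def chacha(text, chachap):
    """Translate text using a two-line parameter string (from-set, to-set)."""
    source, target = chachap.split("\n")[:2]
    if len(source) != len(target):
        raise ValueError("both chachap lines must have the same length")
    # each character on the first line maps to the one directly beneath it
    return text.translate(str.maketrans(source, target))
```

      With the blanks-to-periods parameters shown above, chacha("a b", ...)
      gives "a.b".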

documentation
      delman.assembly.intro  and  delman.assembly.chacha

see also
      worcha

author
      patrick r. roche

bugs
      none known

technical notes
      the maximum number of characters that can be translated is
      constant top.  caution: top is also the maximum line length.

*)
(* end module describe.chacha *)
      version = 3.10; (* of chacha 1985 apr 17 *)
(* begin module describe.chi *)
(*
name
   chi: estimates chi squared from degrees of freedom

synopsis
   chi(input, output);

files
   input:  degrees of freedom
   output: messages to the user

description
   estimates chi squared, given degrees of freedom.

documentation
   @book{Finberg1978,
   author = "S. Finberg",
   title = "Analysis of Cross Classified Categorical Data",
   publisher = "MIT Press",
   address = "Cambridge, Mass?",
   year = "1978",
   comment = "from Chip Lawrence, S=Steven"}
   appendix iii

author
   Thomas Dana Schneider

bugs
   it's only an estimate

*)
(* end module describe.chi *)
version = 1.03; (* of chi 1988 July 12 *)
(* begin module describe.cisq *)
(*
name
   cisq: circle to square

synopsis
   cisq(cisqp: in, xyin: out, output: out)

files
   cisqp: parameters to control the program
      First line: lowest value of m, mlo. 
      Second line: highest value of m, mhi. 
      Third line: increment in the value of m, mstep.
      Fourth line: desired radius of a circle if m = 2, reffective.
      Fifth line: number of steps to take to move around 360 degrees.
      Sixth line: A factor by which to increase the value of theta, spinfactor.
         1 gives a square, 1.5 gives a hexagon.
   xyin: input to the xyplo program.  Curves that are close to integer
      values of m have the symbol m, others have the symbol r.  This allows
      them to be distinguished by the graphics routines.
   output: messages to the user

description
   Plot the equation
      |x|^m + |y|^m = |reffective|^m
   where reffective is the "effective" radius of the curve, |x| is the absolute
   value of x, and ^m means to raise to the mth power.  This gives a diamond
   (straight lines) if m = 1, a circle if m = 2 and approaches a square as
   m -> infinity!

   The method for producing the curves is to re-express the equation in polar
   coordinates.  One must be a bit careful to distinguish between the effective
   radius (reffective) and the current polar coordinate (r).  After making this
   distinction we can write:

      x = r cos theta
      y = r sin theta

   and rearrange to solve for r, while keeping reffective fixed as it should
   be.

   Converting to polar coordinates, dividing the basic formula by r^m (r > 0)
   and inverting both sides gives:

            (r/reffective)^m := 1 / ((|cos(theta)|)^m + (|sin(theta)|)^m);

    To do this in Pascal, we have to use the form, a^m = exp(m*ln(a)).
    This gives:

            exp(m * ln(r/reffective)) := 1 / ( exp(m * ln(abs(cos(theta))))
                                             +
                                             exp(m * ln(abs(sin(theta)))) )

    where we have also introduced the absolute function on the sine and cosine.
    One more rearrangement gives
             r := reffective * exp( ln(
                                        1 / ( exp(m * ln(abscostheta))
                                              +
                                              exp(m * ln(abssintheta)) )
                                       ) / m);

    which is the form used in the code.

    In the cases where the sine or cosine are zero (ie on the axes), we
    must not calculate at all, to avoid log of zero.  We simply
    set r = reffective in those cases.
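
    The rearranged form and the special case on the axes can be sketched in
    Python.  This is an illustration, not the cisq source: the tolerance
    tiny, the function names and the way spinfactor is applied (the radius is
    computed at spinfactor * angle but plotted at angle) are assumptions.

```python
import math

def radius(theta, m, reffective=1.0):
    """Polar radius r of |x|^m + |y|^m = |reffective|^m at angle theta."""
    abscostheta = abs(math.cos(theta))
    abssintheta = abs(math.sin(theta))
    tiny = 1e-12  # assumed tolerance for "on the axes"
    if abscostheta < tiny or abssintheta < tiny:
        return reffective  # avoid the log of zero
    denominator = (math.exp(m * math.log(abscostheta))
                   + math.exp(m * math.log(abssintheta)))
    return reffective * math.exp(math.log(1.0 / denominator) / m)

def curve(m, steps, reffective=1.0, spinfactor=1.0):
    """(x, y) points around 360 degrees; theta is sped up by spinfactor."""
    points = []
    for i in range(steps + 1):
        angle = 2.0 * math.pi * i / steps
        r = radius(spinfactor * angle, m, reffective)
        points.append((r * math.cos(angle), r * math.sin(angle)))
    return points
```

    As a check, m = 2 gives r = reffective at every angle, which is a circle.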

    The program has a special feature to speed up the angle of the calculation
    (theta) so that it moves faster than the angle at which the graph is
    plotted.  With a factor of 3/2, the four corners become 3/2 * 4 = 6
    corners, and we obtain a hexagon.

examples
   To produce a nice square, use the parameters:

0.5   First line: lowest value of m.
5.0   Second line: highest value of m.
0.1   Third line: increment in the value of m
1     Fourth line: desired radius of a circle if m = 2.
100   Fifth line: number of steps to take to move around 360 degrees.
1     Sixth line: A factor by which to increase the value of theta.
         1 gives a square, 1.5 gives a hexagon.

   To produce a hexagon transformed into a circle, use the parameters:

1.5   First line: lowest value of m, mlo.
2.0   Second line: highest value of m, mhi.
0.1   Third line: increment in the value of m, mstep.
1.0   Fourth line: desired radius of a circle if m = 2, reffective.
100   Fifth line: number of steps to take to move around 360 degrees.
1.5   Sixth line: A factor by which to increase the value of theta, spinfactor.
         1 gives a square, 1.5 gives a hexagon.

   It is not clear why the lowest value of m must be set equal to the theta
   factor (sixth parameter), but it works!  (One would have to prove that
   with these parameters one gets an exactly straight hexagon edge.)

documentation
   Inspired by:

> Article 7568 in sci.math:
> From: pvmg0487@uxa.cso.uiuc.edu
> Subject: hexagonal cone function sought
> Message-ID: <107700002@uxa.cso.uiuc.edu>
> Date: 22 Nov 89 22:25:00 GMT
> 
> I would like to generate a 3-D cone like object, but with a hexagonal
> base.  Any suggestions as to an appropriate equation?
> 
> Thanks -- Vernon

> Article 7578 in sci.math:
> From: toms@ncifcrf.gov (Tom Schneider)
> Subject: Re: hexagonal cone function sought
> Message-ID: <1405@fcs280s.ncifcrf.gov>
> Date: 25 Nov 89 01:18:14 GMT
> References: <107700002@uxa.cso.uiuc.edu>
> Reply-To: toms@fcs260c2.UUCP (Tom Schneider)
> Organization: National Cancer Institute, Frederick
> Lines: 30
>
> In article <107700002@uxa.cso.uiuc.edu> pvmg0487@uxa.cso.uiuc.edu writes:
> >
> >I would like to generate a 3-D cone like object, but with a hexagonal
> >base.  Any suggestions as to an appropriate equation?
> >
> >Thanks -- Vernon
>
> Well, that's pretty surprising, since just today I was thinking about a
> function that does almost exactly what you want!  It turns out that the
> equation x^n + y^n = r^n is a line (diamond) if n = 1, a circle if n = 2 and
> approaches a square as n -> infinity!  So all one needs to do is express this
> in polar notation, and then scrunch an extra two corners in to get what you
> want!
>
> First, use the form x^n + y^n = rmax^n (to avoid confusion!) and substitute x
> = r cos(theta), y = r sin(theta).  Divide both sides by r^n, and rearrange to
> get r expressed as a function of theta.  To get the powers, I had to use a^b
> = exp(b*ln(a)).  The thing is symmetrical around the 4 quadrants, so I
> avoided logs of negative numbers by taking the absolute values of the sine
> and cosine functions.  Also, at angles of n*pi/2, one gets division by zero,
> so just substitute the desired radius.
>
> I have done this by writing a Pascal program that will do the job.  Pretty!
> It turns out that to get a hexagon, you have to plot between n=1.5 and n=2
> because of the scrunching.  Email me if you want a copy of the program.
>
>   Tom Schneider
>   National Cancer Institute
>   Laboratory of Mathematical Biology
>   Frederick, Maryland  21701-1013
>   toms@ncifcrf.gov

> From daemon Tue Nov 28 09:34:41 1989
> Return-Path: <daemon>
> Date: Tue, 28 Nov 89 08:35:52 -0600
> From: Paul Vernon McDonald <pvmg0487@uxa.cso.uiuc.edu>
> Message-Id: <8911281435.AA01048@uxa.cso.uiuc.edu>
> To: toms@ncifcrf.gov
> Subject: Pascal code for hexagon
> 
> Tom,
> I'd be most grateful to receive your code, if you are willing to share it.
> I curently have a working version of the hexagon, done in piecewise
> fashion, but I'd be interested in a generic solution.  In fact I plan
> to use other shapes in the future, so your code may be of great help.
> 
> Thanks,
> 
> Vernon McDonald
> University of Illinois
> Department of Kinesiology
> Urbana, IL, 61801
> vmcdonald@uiuc.edu
> 
> From toms Tue Nov 28 13:17:17 1989
> To: pvmg0487@uxa.cso.uiuc.edu
> Subject: Cisq
> 
> Vernon:
>   Sure, I wrote the code mostly because of your posting.  But actually it
> has suddenly become very important to my work (it's a long story...) and so
> it is useful to me to have it.  I have to brush it up a bit and I will
> send it to you.  If you have a PostScript printer, then you may also
> want the xyplo program, which produces PostScript x-y plotting of data.
> This made writing cisq (circle square) easier because I only needed to
> create the right numbers and xyplo did the graphics for me.
> Tom

see also
   xyplo.p, the Pascal program that produces PostScript x-y plotting graphics.

author
   Thomas Dana Schneider

bugs
   One might also want to produce the hexagon for INCREASING values of m,
   rather than being confined to the region m = 1.5 to 2.  It seems that to do
   this requires that one do a fancy job of warping the square region into the
   appropriate triangular region.  This should be pretty easy with the right
   affine transformation, but the program doesn't have that feature in it.
   Fortunately, it is not necessary.

technical notes

*)
(* end module describe.cisq *)
version = 1.43; (* of cisq.p 1989 December 19 *)
(* begin module describe.ckhelix *)
(*
name
   ckhelix: check that the helix location is where one wants

synopsis
   ckhelix(makelogop: in, ckhelixp: in, output: out);

files
   makelogop: the parameter file of the makelogo program
   ckhelixp:
      wave location: the point in bases on THIS logo which is to
         align with the other logos.
         NOTE:  this is NOT necessarily the high or low point
         of the wave as given by the wave parameter file of the
         makelogo program, hence it is not read from that file.
      zero:  location of the desired center in cm on the page
   output: messages to the user

description
   The program is used to determine the position to place a sequence logo so
   that a particular point of the cosine wave (in bases of the nucleic acid
   coordinate system) is exactly at a given point on the page in cm.  This
   allows one to adjust the location of the logos so that they can overlap.

examples

documentation

see also

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.ckhelix *)
version = 1.01; (* of ckhelix.p 1992 April 28 *)
(* begin module describe.cluster *)
(*
name
   cluster: cluster indana subindexes into groups of duplicate entries

synopsis
   cluster(clusterp: in, subind: in, inst: in, book: in,
           pairs: out, clumps: out, output: out)

files
   clusterp: The cluster parameter file that consists of the following:
             FIRST LINE  'y' turns the flag on, 'n' turns it off
                  (debugging) allows one to look at raw data in the bags.
             The debugging flag controls the printing of the raw data above the
             regular output of the cluster program, which is created solely by
             procedure showRAWbag.  This can then be compared with the data in
             the chart for correctness.  Raw data consists of the series of
             coordinate pairs in the bag and the sides they are matched on,
             printed above the standard output structure.

             example: -  (  630,   69)   R
                      L  (  649,   88)   -  {20}  {20}
                      *************************************
                                          |   630       663
                      HUMUK               |     ----------
                                          |         34
                      HUMUPA              |     ----------
                                          |    69       102
                      *************************************

             It is important to note that the raw data will only appear in the
             pairs output file, and will not be written in clumps at all.  This
             means that parameter 3, writepairs, must also be turned on for
             this flag to be effective.

             SECOND LINE 'y' turns the flag on, 'n' turns it off
                  (showfragments) allows one to see pairs that are fragmented.
             The showfragments toggle controls printing the outputs of pairs
             with "imperfect" matches.  That is, in some cases a repeating
             sequence will match in several frames, causing repeated sequence
             matching and producing a large list of coordinate pairs.  This
             list can be shown if the parameter is turned on, but the statement
             "WARNING:  sequence pairs are overmatched" will appear if it is
             turned off.  The actual sequences will be shown in either case,
             so the comparison can always be done by hand by the user.  With
             the parameter turned on, the output can be excessively long.

             example:    1     acggatcgtgtgtgtgtgtgtgtgtacgatcggatcgat
                         2     acggatcgtgtgtgtgtgtgtgtgtacgatcggatcgat

             These sequences will have matches between all of the 'gt' base
             pairs, resulting in an overwhelming number of matches.  The
             maximum number of possible matches is found by taking the length
             of the sequences and dividing it by the value in the overmatched
             parameter (FIFTH LINE) times the number of instructions that
             match between any two pieces in the dbinst.  This results in
             a maximum number of matches between any two pieces.  Any pieces
             above this limit can have their output completely shown or
             can generate a warning message (see showfragments, SECOND LINE).
             In addition to preventing the example case, showfragments will
             also prevent the display of any other case that may cause an
             excessive number of matches.

             THIRD LINE 'y' turns the flag on, 'n' turns it off
                  (writepairs) controls the printing of the pairs output file.
             If writepairs is on, the original clustering pairlist will be
             printed into the output file pairs.  If it is off, this file will
             not be printed.  This parameter must be turned on to effectively
             use the debugging parameter (see FIRST LINE).

             FOURTH LINE 'y' turns the flag on, 'n' turns it off
                  (writeclumps) controls printing of the clumps output file.
             If writeclumps is on, the original clustering pairlist will be
             sent through the clumping procedures.  The output file clumps will
             contain the sequences involved in the matches on the pair in
             addition to the clumped version of the pairlist.  The clumping
             process takes an excessive amount of time for very large files,
             since the program must traverse the entire pairlist to find all
             related pairs, then put the pairs on to the clumplist, then go
             through the book and find sequences to match every instruction
             in every pair of every clump.  Although it is much easier to
             determine which pieces are true repeats through use of the clumps
             file, it is certainly possible to do so by simply using the pairs
             output file.

             FIFTH LINE any integer
                    (matchparameter) is the number of matches to be allowed
             between two instructions.  This can be determined by dividing the
             sequence length from the book by the minimum window size from the
             subindex, or a maximum number of matches between instructions can
             be set.  An integer less than or equal to 0 will calculate maximum
             matches using the above method.  Any number greater than 0 will be
             used as the new maximum matches.

             example:  if the instructions call for the sequences

               piece1: get from 100 -50 to 100 +50;
               piece2: get from 200 -50 to 200 +50;

               The sequence length is 101.  If the windowsize read from the
               subindex = 15, then 6 possible matches can occur between these
               two instructions (101 div 15 = 6).

             The TOTAL number of matches between two pieces is found by
             multiplying matchparameter by the number of instructions in a
             given pair.  If a piece has more matches than this, it is
             considered to be overmatched, the bag will not be printed, and the
             statement 'WARNING: sequence pairs have too many matches.' will
             appear.  Overmatched pairs can be printed using the showfragments
             parameter (see SECOND LINE).
   subind: a subindex from the indana program matching the inst and the book
   inst: a set of delila instructions that correspond to the book
   book: a delila book that contains the sequences being clumped
   pairs: the output list of paired sequences
   clumps: the output list of clumped sequences
   output: When errors occur, the program halts and produces an error message
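
   The FIFTH LINE arithmetic described above can be sketched in Python.  The
   function names are hypothetical and the sketch is not taken from the
   cluster source; it only restates the rules given for matchparameter.

```python
def matches_per_instruction(sequence_length, window_size, matchparameter):
    """Allowed matches between two instructions (FIFTH LINE rule)."""
    if matchparameter > 0:
        return matchparameter  # a positive value is used directly
    return sequence_length // window_size  # same as Pascal "div"

def total_allowed_matches(per_instruction, instructions_in_pair):
    """TOTAL matches allowed between two pieces."""
    return per_instruction * instructions_in_pair

def is_overmatched(match_count, per_instruction, instructions_in_pair):
    """True when a piece has more matches than the total allowed."""
    return match_count > total_allowed_matches(per_instruction,
                                               instructions_in_pair)
```

   For the example instructions, the sequence length is 50 + 50 + 1 = 101,
   so matches_per_instruction(101, 15, 0) gives 6.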

description
   Duplicate entries in the subind subindex are clustered into a unified list
   of pairs and copied to output files as sequence numbers, lengths, and
   sequence base pairs.

   Pairs are determined by the indana program, which marks sequence
   similarities with an '*'.  Cluster takes the subindex and shows the
   coordinate range and length of the similarity by pairs.  The pairs file is
   a list of relationships between two sequences, the clumps file takes this
   list of pairs and groups related ones together. The seqalign modules of the
   program then access the book and get the corresponding sequences to print
   out with the instruction number and piece name.

documentation
   none

see also
   index.p, indana.p

author
   R. Michael Stephens

bugs
   None currently known.

technical notes
   The read for the indana window size is based on the '[' character before
   the number in the subind heading.  Any changes to indana that alter this
   format must be reflected in the getwindowsize procedure.
*)
(* end module describe.cluster *)
version = 5.06; (* of cluster.p 1992 September 18 *)
(* begin module describe.coda *)
(*
name
      coda: composition file to data for genhis

synopsis
      coda(cmp: in, data: out, codap: in, output: out)

files
      cmp: a composition, the output of program comp
      data: identification lines are followed by
         the number of occurrences of each oligo and the sequence
         of the oligo, one pair per line.  the form of the
         file can be changed using the parameters in codap.
      codap: parameter file.  four parameters, one per line.
         1. composition depth to be used in the data file (integer)
         2. the least frequent oligo to record in data (integer).
         3. the most frequent oligo to record in data (integer).
         4. if the first character is 'b', the number of each
         oligo is given before the oligo, 'a' means after.
         'n' means do not give the number.  's' means the data file will be
         used as input to the search program.  no numbers are given and
         commands to search are made which will result in a list
         of the locations of the selected oligos.
            if parameters 2 to 4 are missing they default to
            0 100000 b.

      output: messages to the user

description
      coda converts a composition file from the comp program
      into a list of oligos.  unlike the original composition
      file, this list may contain all oligos of the length desired
      (to save space, comp removes an n-long oligo when the
      two n-1 long oligos inside it do not exist).  however,  coda
      can be told to only include frequent or infrequent oligos
      using the parameter file.  two ways to use the data are:
      1. use the data file as input to genhis to determine the
      distribution of the composition.
      2. use the 's' feature to generate instructions for the
      search program.  search converts the list of oligos to
      locations in a sequence.  unshi then is used to remove
      the extra blanks and genhis then gives a map of the
      locations of rare or common oligos.
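
      the effect of parameters 1 to 4 can be sketched in python as follows.
      this is a hedged illustration, not the code of coda: the function name
      select_oligos, the dictionary of counts and the exact spacing of each
      data line are assumptions, and the 's' mode is left out because the
      form of the search commands is not given here.

```python
def select_oligos(counts, depth, least=0, most=100000, mode="b"):
    """Return data lines for oligos of length `depth` with least <= count <= most."""
    lines = []
    for oligo in sorted(counts):
        n = counts[oligo]
        if len(oligo) != depth or not (least <= n <= most):
            continue
        if mode == "b":        # number before the oligo
            lines.append("%d %s" % (n, oligo))
        elif mode == "a":      # number after the oligo
            lines.append("%s %d" % (oligo, n))
        elif mode == "n":      # no number at all
            lines.append(oligo)
        else:
            raise ValueError("mode 's' (search commands) is not sketched here")
    return lines
```

      the defaults mirror the documented defaults 0 100000 b, so giving only
      the depth lists every oligo of that length with its count first.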

example
      file: datat7

see also
      comp, genhis, search, unshi

author
      thomas dana schneider

bugs
      none known
*)
(* end module describe.coda *)
version = 2.04; (* of coda, 1986 dec 15
(* begin module describe.code *)
(*
name
      code: find the comment density of a pascal program

synopsis
      code(fin: in, output: out)

files
      fin: a pascal source code.
      output: a report on the comment density of the pascal program.

description
      with the comment density program, you can find out how much of
      your program is devoted to comments.  in general, the better programs
      will have more comments than those that are poor.  the program gives
      you the percent of characters devoted to comments.  a typical value
      should probably be around 30 percent of the characters devoted to
      description.  suggestions on where to put comments are given in the
      delman manual, in the module delman.guide.programming.
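
      a minimal python sketch of the measurement follows.  it is not the code
      program itself: it assumes that only the two-character parenthesis and
      asterisk comment delimiters are recognized, that the delimiters count
      as comment characters, and that quoted strings are not treated
      specially.  the delimiters are built by concatenation so that the
      sketch can sit inside a pascal comment.

```python
OPEN = "(" + "*"    # Pascal comment start, built by concatenation
CLOSE = "*" + ")"   # Pascal comment end

def comment_percent(source):
    """Percent of characters in `source` that lie inside comments, delimiters included."""
    inside = 0
    i = 0
    in_comment = False
    while i < len(source):
        if not in_comment and source.startswith(OPEN, i):
            in_comment = True
            inside += 2
            i += 2
        elif in_comment and source.startswith(CLOSE, i):
            in_comment = False
            inside += 2
            i += 2
        else:
            if in_comment:
                inside += 1
            i += 1
    return 100.0 * inside / len(source) if source else 0.0
```

      as the bugs section notes for the real program, blanks are counted like
      any other character here, so layout style shifts the percentage.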

author
      thomas d. schneider

bugs
      the program does not keep track of blanks, so one's style with
      blanks could affect the percentage.

*)
(* end module describe.code *)
version = 2.06; (* of code 1986 dec 9
(* begin module describe.column *)
(*
name
   column: pull defined column from input

synopsis
   column(input: in, columnp: in, output: out)

files
   input:  file with several columns of data separated by spaces
   columnp: parameters:  one line:  which column to extract
      Lines in input that start with '*' are simply copied to the output.
   output: messages to the user

description
   The column program allows one to extract columns from a dataset.
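
   The extraction can be sketched in Python as below.  This is an
   illustration under assumptions, not the column program: the function name
   extract_column, the 1-based column numbering, and the choice to skip data
   lines that have too few columns are all hypothetical.

```python
def extract_column(lines, which):
    """Keep '*' lines unchanged; from other lines keep field number `which` (1-based)."""
    out = []
    for line in lines:
        if line.startswith("*"):
            out.append(line)          # header lines are copied as they are
            continue
        fields = line.split()         # columns are separated by spaces
        if len(fields) >= which:
            out.append(fields[which - 1])
    return out
```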

examples

documentation

see also

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.column *)
version = 1.03; (* of column.p 1992 September 16
(* begin module describe.comp *)
(*
name
      comp: determine the composition of a book.

synopsis
      comp(book: in, cmp: out, compp: in, output: out)

files
      book: the sequences;
      cmp: the composition, determined for mononucleotides up to
            oligonucleotides of length "compmax", see file compp;
      compp: parameter file used to set the length of the oligonucleotides for
            which the composition is to be determined ("compmax");  that number
            must be the first thing in the file; if the file is empty
            compmax is set by default to the constant "defcompmax";
      output: for messages to the user.

description
      counts the number of each oligonucleotide (from length 1 to compmax) in
      the book and prints that to file "cmp".  the output is printed in order
      of increasing length of oligonucleotide (i.e., first the monos, then the
      dis, ...).  if there are no occurrences of an oligonucleotide, but its
      one-shorter parent did occur, it will be given a zero.  none of its
      descendants will be printed in the composition file.
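
      the counting and the pruning rule can be sketched in python.  this is
      not the tree-and-spiders algorithm of comp (see the technical note); it
      uses a plain counter, handles a single sequence, and assumes that the
      "parent" of an oligo is its one-shorter prefix.  the names composition
      and report are hypothetical.

```python
from collections import Counter

def composition(sequence, compmax):
    """Count every overlapping oligo of length 1..compmax in one sequence."""
    counts = Counter()
    for length in range(1, compmax + 1):
        for start in range(len(sequence) - length + 1):
            counts[sequence[start:start + length]] += 1
    return counts

def report(counts, compmax, alphabet="acgt"):
    """List (oligo, count) by increasing length, pruning children of absent parents."""
    rows = []
    parents = [""]
    for length in range(1, compmax + 1):
        present = []
        for parent in parents:
            for base in alphabet:
                oligo = parent + base
                rows.append((oligo, counts[oligo]))  # zero if it never occurred
                if counts[oligo] > 0:
                    present.append(oligo)
        parents = present  # only oligos that occurred get descendants
    return rows
```

      for the sequence "aac" with compmax 2, the mononucleotide g is reported
      with a zero, and no dinucleotide starting with g is listed.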

see also
      compan, histan

authors
      gary stormo and tom schneider

bugs
      none known

technical note
      the algorithm is an interesting application of linked lists.  the
      composition is stored as a tree, and a number of "spiders" climb the
      tree during its construction.

*)
(* end module describe.comp *)
version = 5.25; (* of comp, 1988 oct 10 *)
(* begin module describe.compan *)
(*
name
      compan: composition analysis.

synopsis
      compan(cmp: in, anal: out, companp: in, output: out)

files
      cmp: the input composition, which is the output of program comp;
      anal: the output analysis of this program;
      companp: for parameters; should contain a single integer which specifies
         the maximum level for which the composition is analyzed.  the
         maximum allowed level is 4, or the maximum level for which the
         composition was determined.
      output: for user messages;

description
      calculates chi squared from a composition using:
         1) assumption of equal frequencies to calculate mono, di, tri
            and tet expecteds;
         2) mono frequencies to calculate di, tri  and tet expecteds;
         3) di frequencies to calculate tri and tet expecteds;
         4) tri frequencies to calculate tet expecteds;
      the partial chi squared values are given for each oligo.
      the 'information' content of the composition is also calculated,
      using the standard information theory definition:
         information = -sum(frequency * log(frequency)),
      where the sum is over each oligonucleotide of a given length
      and the log is taken to the base 2.  this gives the information
      in bits.
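
      the two quantities can be sketched in python as follows.  this is an
      illustration rather than the code of compan: the function names are
      hypothetical, and the partial chi squared is taken to be the usual
      (observed - expected)^2/expected, which is an assumption about the
      exact form compan uses.

```python
import math

def information_bits(counts):
    """-sum(f * log2(f)) over oligos of one length; zero counts contribute nothing."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total)
                for n in counts.values() if n > 0)

def partial_chi_squared(counts, expected):
    """Per-oligo (observed - expected)^2 / expected."""
    return dict((oligo, (counts[oligo] - expected[oligo]) ** 2 / expected[oligo])
                for oligo in counts)
```

      four equally frequent mononucleotides give 2 bits, the maximum for a
      four-letter alphabet.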

see also
      comp

author
      gary stormo

bugs
      the program cannot do calculations for compositions larger than 4
*)
(* end module describe.compan *)
version = 3.23;  (* of compan, 1988 oct 10
(* begin module describe.concat *)
(*
name
      concat: concatenate files together

synopsis
      concat(afile: in, bfile: in, abfile: out, output: out)

files
      afile: the first file to be copied to abfile
      bfile: the second file to be copied to abfile
      abfile: the concatenation of afile and bfile
      output: messages to the user

description
      concat joins two files, afile and bfile, into a single file
      named abfile.  afile is first copied to abfile, followed by bfile.
      a warning is given to the user if either afile or bfile is empty, but
      in this case, the program copies the other file to abfile anyway.

examples
      one can use concat to join delila instruction sets in the cyclic
      teaching of the perceptron (see our third nar paper).  note that
      delila will not accept several titles in the instructions, so be sure
      that one of the two sets has no title, or remove it by hand.

author
      billie lemmon and thomas schneider

bugs
      none known
*)

(* end module describe.concat *)
 version = 1.08; (* concat 1986 dec 9
(* begin module describe.copy *)
(*
name
      copy: copy one file to another file

synopsis
      copy(fin: in, fout: out, output: out)

files
      fin: the file to be copied
      fout: the copy of fin
      output: messages to the user

description
      copy makes one copy of the file fin on the file fout.  you may
      discover that this is a simple task that you often want to do, but
      that your system does not provide an easy way.

see also
      shift

author
      thomas d. schneider

bugs
      none known

*)
(* end module describe.copy *)
version = 1.06; (* of copy.p 1985 march 9 *)
(* begin module describe.count *)
(*
name
      count: counts the amount of sequence in a book

synopsis
      count(book: in, list: out, output: out);

files
      book: any book from the delila system
      list: the number of bases in each piece and the total number of bases
      output: messages to the user

description
      count is a tiny tool, much like a tooth pick, that is handy to have
      around.  the count is based on the coordinate system of each piece,
      not on the actual number of bases.

author
      thomas d. schneider

bugs
      if the number of bases does not match the coordinate system, then
      no warning is given to the user.

*)
(* end module describe.count *)
version = 3.07; (* of count.p 1991 Aug 6
(* begin module describe.cybmod *)
(*
name
      cybmod: specific module library for the cyber computer

synopsis
      cybmod(output: out)

files
      output: where the date and time will appear.

description
      cybmod contains modules that will replace corresponding modules in
      the other module libraries which are cyber-system dependent. this
      will allow easy transportation of the delila system to cyber computers
      running under kronos.

documentation
      moddef, delman.describe.module

see also
      delman.describe.delmod, moddef, delman.describe.module,
      delmods, prgmods, matmods, vaxmods

author
      thomas d. schneider

bugs
      none known

technical notes
      the datetime package required a const 'namelength' and a type 'alpha'.
      these are part of the book.const and book.type modules of delmod, and
      are identical to those types and consts.  note:  programs which use
      the datetime package must have these types and consts either from
      delmod or manually declared.
*)
(* end module describe.cybmod *)
version = 1.02; (* of cybmod 1986 nov 11 *)
(* begin module describe.da3d *)
(*
name
   da3d: diana da file to 3d graphics

synopsis
   da3d(da: in, da3dp: in, scene: out, output: out)

files
   da: output of the diana program; position to position correlations
   da3dp: parameter file to control scene.
           horizontal: shift of graph horizontally (in cm)

           vertical: shift of graph vertically (in cm)

           xlocation: location of viewer in bases

           ylocation: location of viewer in bases

           zlocation: location of viewer in bits

           magnify: magnification factor for whole scene, 1 = no change.

           xmagnify: magnification factor for x axis only.

           ymagnify: magnification factor for y axis only.

           zmagnify: magnification factor for z axis only.

           datacolumn: which column of da to use for the graph.

   scene: 3D scene of the da data according to da3dp.  Result is in
      PostScript
   output: messages to the user

description
   Show the position to position correlation data in three dimensions.

examples

documentation

see also

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.da3d *)
version = 1.17; (* of da3d.p 1991 December 11
(* begin module describe.dalvec *)
(*
name
   dalvec: converts Rseq rsdata file to symvec format

synopsis
   dalvec(rsdata: in, dalvecp: in, symvec: out, output: out)

files
   rsdata: data file from rseq program

   dalvecp: parameters to control dalvec
      If empty, then the normal sequence logo will be produced.
      If the first character of the first line is a 'c', then a chi-logo
      is produced.  The height of this logo is the information.  The
      heights of the individual letters are, however, not the frequencies,
      but rather their partial chi-square values.  The expected value
      is 1/4 of the number of characters.  This is compared to the observed
      value by:
        partial chi-square =(observed - expected)^2/expected
      These partial values are normalized and placed in symvec in place of
      the relative frequencies.  Thus the significance of each letter is
      used.  When the observed is less than expected, the reported value
      is made negative.  Makelogo prints these characters upside down.
   symvec: reformatting of the rsdata file for input to the makelogo program.
      A series of header lines beginning with an asterisk ("*") are produced.
      The next line contains one integer which is the number of symbols
      in the vector (4 for DNA or RNA, 20 for proteins).
      After this, the format of the file is a series of entries.  Each entry
      has two parts.  The first part is on one line and contains
         position total information
         position: the position number
         total:  the sum of the values in the vector
         information: the information content of the vector.
      The remaining parameters on the line are from the rsdata file:
         rs: sum of rsl
         varhnb: variance of rsl
         sumvar: sum of varhnb
         ehnb: 2-e(n)

      The second part consists of a list of 4 integers, representing the
      numbers of bases or amino acids at the position in an aligned
      set of sequences.

   output: messages to the user

description
   Convert the rsdata file from rseq into a format that the makelogo program
   can use.  The format is a 'symbol vector'.

   ChiLogos: If you leave the parameter file empty, then the standard sequence
   logo will be created.  However, if the first letter of the file is a 'c',
   then a new kind of logo will emerge from makelogo: the chi-logo.  The height
   is as it was before.  The height of the individual letters is different:
   instead of being proportional to the frequency of the letter, it is
   proportional to the significance of the letter, based on the chi-square
   test.  That is, the expected number of letters is the number of letters at
   that position, n(l) divided by 4 (for simplicity!).  The observed number
   comes from the rsdata file.  The partial-chi square is
   (observed-expected)^2/expected.  Note that the sum of the partials is the
   normal chi-square.  So bases that contribute strongly get big.  Also, bases
   that are underrepresented are printed UPSIDE DOWN, so you can (usually)
   tell you have a chilogo at a glance.  The chilogo allows one to see the
   importance of the infrequent letters.  The technical mechanism for making a
   letter upside down is to have its number negative in the symvec file.
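
   The chi-logo values for one position can be sketched in Python as below.
   This is not the code of dalvec.  The expected count of n(l)/4 and the
   negative sign for underrepresented letters follow the text above; the
   normalization (dividing by the sum of the magnitudes) and the function
   name chilogo_values are assumptions.

```python
def chilogo_values(counts):
    """Signed partial chi-squares for one position, scaled to sum to 1 in magnitude."""
    n = sum(counts.values())
    if n == 0:
        return dict((symbol, 0.0) for symbol in counts)
    expected = n / 4.0
    signed = dict()
    for symbol, observed in counts.items():
        partial = (observed - expected) ** 2 / expected
        # Underrepresented letters get a negative value (printed upside down).
        signed[symbol] = partial if observed >= expected else -partial
    total = sum(abs(v) for v in signed.values())
    if total == 0:
        return signed
    return dict((s, v / total) for s, v in signed.items())
```

   For counts of 10 a and 2 each of c, g and t, the partials are 9, 1, 1 and
   1, so a gets 0.75 of the height and the other three are negative.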

author
   Thomas D. Schneider

examples

see also
   rseq, makelogo

bugs

   The program originally only created a vector that contained the characters
   of the alphabet, so the output was called an 'alvec'.  To reflect the use of
   symbols, the name of the output file was changed to symvec, but I like
   'dalvec', and 'dsymvec' is so awkward that I decided to keep the name
   dalvec.

*)
(* end module describe.dalvec *)
version = 2.14; (* of dalvec.p 1991 October 1
(* begin module describe.dbbk *)
(*
name
      dbbk: database to delila book conversion program

synopsis
      dbbk(db: in, l1: out, changes: out, output: out)

files
      db: contains one or more complete entries from either the EMBL
         or GenBank genetic sequence data bases.  These entries may be
         obtained by using the original libraries or by using an entry
         extraction program.  Dbpull is the delila program for data base
         accessing; to get complete entries the instruction 'all' must
         have been used in the dbpull fin file.  (See delman.use.dbpull)
      l1: each db entry is represented in l1 by a delila style
         entry containing information extracted from the db entry.
         All of l1 has the biologically oriented structure of
         a standard delila book.  The first line of l1 is not part
         of an entry, but contains the computer system date and the
         title of the book.
      changes: Delila programs cannot handle sequences that have
         ambiguities because Delila was designed on the assumption
         that people would finish their sequences.  Unfortunately
         this is not true, and the databases contain bases other
         than acgt to indicate ambiguity.  These are converted to
         "a" and the cases are reported in this file.
      output: messages to the user.

description
      This program converts GenBank and EMBL data base entries into a
      book of delila entries.  The organism name is fused together
      with a period and is used for both organism and chromosome names.
      Organism and chromosome only change if the name changes in db.
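
      Two of the conversions described above can be sketched in Python.
      This is an illustration, not the code of dbbk: the function names are
      hypothetical, positions are assumed to be counted from 1, and the
      genuslimit truncation mentioned under bugs is not applied.

```python
def clean_sequence(sequence):
    """Replace every base other than a, c, g, t with 'a'; report (position, base)."""
    cleaned = []
    changes = []
    for position, base in enumerate(sequence.lower(), start=1):
        if base in "acgt":
            cleaned.append(base)
        else:
            cleaned.append("a")
            changes.append((position, base))  # what the changes file would record
    return "".join(cleaned), changes

def fused_name(organism):
    """Join the parts of an organism name with periods, e.g. for 'Homo sapiens'."""
    return ".".join(organism.split())
```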

see also
      delila, dbpull, libdef, catal

author
      Matthew Yarus

bugs
      databases do not have enough data on genes within each piece to make
      a book with gene sections.

      The changes file is a design bug in Delila.

      Genus names are limited to genuslimit (a constant) to avoid
      names longer than the standard Delila limit.

technical notes
      dbbk is known to convert GenBank entries from July 1989.
      It may not work on later versions.
*)
(* end module describe.dbbk *)
version = 3.13; (* of dbbk.p 1992 December 10
(* begin module describe.dbcat *)
(*
name
      dbcat: database catalog production and sorting program.

synopsis
      dbcat (dbl1, dbl2, dbl3, dbl4, dbl5, dbl6, dbl7, dbl8: inout,
             ecat: out, gcat: out, output: out )

files
      dbl1, dbl2, dbl3, dbl4, dbl5, dbl6, dbl7, dbl8:  text libraries that
         contain entries of either embl (european molecular biology laboratory)
         or genbank (genetic sequence data bank) types. in both cases the
         general format is a series of entries, each entry beginning with a
         twenty letter identification code name for a particular genetic
         sequence followed by many lines of other relevant information. all
         lines begin with a two or three letter code identifying the purpose
         of the line. however, the two entry types have different line codes
         and contain similar but not identical kinds of information.
      ecat:  catalog of embl type library entries. each catalog entry
         contains the location of the beginning of the library entry, a
         number signifying which library the entry is found in, and the
         special identification code of the entry's genetic sequence.
      gcat:  same as ecat except containing information on genbank entries.
      output:  messages to the user.

description
      this program makes catalogs for use in the program dbpull. in
      addition to sorting catalog entries in the innate alphanumeric
      order of the computer it is run on, dbcat marks both catalogs and
      libraries with the date of the run so that dbpull never uses
      mismatched sets of information.

documentation
      delman.describe.dbpull, embl and genbank libraries.

see also
      loocat, catal, dbpull.

author
      matthew yarus

bugs
      none known

technical notes
      dbcat functions on genbank(tm) release 9 (june 1, 1983)
*)
(* end module describe.dbcat *)
version = 2.10; (* of dbcat.p 1989 July 11
(* begin module describe.dbfilter *)
(*
name
   dbfilter: filter GenBank databases to remove unwanted entries  

synopsis
   dbfilter(input: in, output: out, dbfilterp: in)

files
   input: a database of GenBank entries 
   output: database after the filtration.  When errors occur, the program halts
           and produces an error message at the end of the output file. 
   dbfilterp: parameters to control the program
        FIRST LINE: the name of the organism to use, consisting of two parts
        (e.g., Homo sapiens).

description
   GenBank entries in input that contain the requested organism are copied
   to output.

   The GenBank ORGANISM contains the two part genus/species name, such as:

  ORGANISM  Homo sapiens

   Entries of an unwanted ORGANISM type are not copied from input to output.
   Those of the desired type are transferred directly.

examples
   If dbfilterp contains:
      Homo sapiens
   then only those entries with the ORGANISM type Homo sapiens will be copied 
   into output. All others will be filtered out.
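
   The filtering can be sketched in Python as follows.  This is a hedged
   illustration, not the code of dbfilter: it assumes that each GenBank entry
   ends with a line starting with "//", that the organism name is the first
   two words after ORGANISM, and it buffers a whole entry rather than using
   the maxlines constant.  The name filter_entries is hypothetical.

```python
def filter_entries(lines, organism):
    """Yield the lines of GenBank entries whose ORGANISM line names `organism`."""
    entry = []
    wanted = False
    for line in lines:
        entry.append(line)
        fields = line.split()
        if fields and fields[0] == "ORGANISM":
            wanted = " ".join(fields[1:3]) == organism
        if line.startswith("//"):     # end of one entry
            if wanted:
                for kept in entry:
                    yield kept
            entry = []
            wanted = False
```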

documentation
   none

see also
   dbinst.p dbbk.p

author
  R. Michael Stephens 

bugs
   Error messages are buried at the bottom of the output file.

technical notes
   Constant maxlines determines the greatest number of lines that can be handled
   between LOCUS and ORGANISM.

*)
(* end module describe.dbfilter *)
version = 1.08; (* of dbfilter.p 1992 November 1
(* begin module describe.dbinst *)
(*
name
   dbinst: extract Delila instructions from a GenBank database

synopsis
   dbinst(db: in,
          binst: out, einst: out,
          oinst: out, sinst: out,
          olength: out, slength: out,
          dbinstp: in, locuslist: out, missing: out, output: out)

files
   db: a set of GenBank entries
   binst: instructions for finding the beginning of a feature
   einst: instructions for finding the ending of a feature
   oinst: instructions for finding the whole feature, called the "object".
      They are given in the form "from begin + f to end + t" where f and t are
      the "from" and "to" parameters given in dbinstp.
   sinst: instructions for finding the regions between features, called
      the "space".  They have the same form as those of oinst.
   olength: list of object lengths
   slength: list of space lengths
   dbinstp: parameters to control the program
     First line: the name of the feature to use.
     Second line: two integers, the base "from" and the base "to" relative to
        the alignment point to write the instructions.
        If "from" is larger than "to" then generic names "before" and "after"
        are written.  This allows one to make a generic file of instructions
        to be copied and edited later.
     Third line: 4 characters without spaces that control which instruction
        files are to be written.  To have all 4 on, use 'beos', for begin, end,
        object and space.  Any other character means that the corresponding
        file will not be written.  The file will be rewritten however.
     Fourth line: 2 characters without spaces that control which length
        files are to be written.  To have both on, use 'os', for object and
        space.  Any other character means that the corresponding file will
        not be written.  The file will be rewritten however.
     Fifth line:  If the first character is 'r' then remove obviously
        duplicated instructions and object or space lengths.  When alternative
        splicing occurs, GenBank records the endpoint several times, so that
        the sequence instructions are identical.  By using this toggle switch,
        such cases are eliminated.
     Sixth line:  If the first character is 'f' then the coordinates of the
        instruction are written whether or not the object is off the end
        of the sequence.  This allows one to pick up objects that are
        partially on a piece.

        If the first character is 's' then select against the feature if
        either end is missing.  This makes the length list correspond
        to the instruction set.

     Seventh line:  Alignment shift.  This integer is added to the
        from and to coordinates of the instructions written out.
        Normally this should be 0.  An example helps.  If the zero
        of splice donor sites is defined as the first base of the intron,
        then if one is writing instructions based on exon coordinates
        the zero base will be 1 too low.  By making the alignment shift
        1, the instructions written out will match the expectations of
        other programs.
        Note: object coordinates are shifted accordingly; this may
        not be quite what you want if you are using them from the olength
        file!  However, the length is not affected.

   locuslist: a list of all the loci in the db that have features of interest.
      This list can be used with dbpull to create reduced databases containing
      only those entries that contain the features we want.
   missing: Features that are listed under the database COMMENT are listed
      here.  These are "EMBL features not translated to GenBank features".  We
      do not consider these to be reliable.  They are NOT included in the binst,
      einst or olength, slength instructions.
   output: messages to the user

description
   The GenBank entries in db are scanned, and Delila instructions are
   generated, according to a desired feature table item.  Four kinds of
   instruction are made:  beginning, ending, object and space.  Beginning
   appears only if the data for the beginning of the feature is in the db.
   Ending appears only if the data for the ending of the feature is in the db.
   Object appears only if both the beginning and ending are there.  Space only
   appears if there was an ending to the previous feature, and the current
   feature has a beginning.  Thus object and space instructions are guaranteed
   to be a "natural" length.

   The names for the instructions are determined as follows.  The GenBank
   ORGANISM contains the two part genus/species name, such as:

  ORGANISM  Homo sapiens

   The parts are joined into "Homo.sapiens", and this becomes the name of the
   organism and chromosome in the instructions.  The instructions for organism
   and chromosome only change when the genus/species name changes in db.  The
   LOCUS name of the entry is picked up and used as the piece name.  These
   naming conventions are the ones generated automatically by the dbbk program,
   so one need not think about it most of the time.

   In each entry, lines of the form:

    pept    <     1       46     Ig V-R-H region protein, exon x

   are located and used to generate Delila "get" statements.

   If a "<" appears before the first number, then no instruction is
   written to binst, since the beginning point is before the GenBank sequence.

   If a ">" appears before the second number, then no instruction is
   written to einst, since the ending point is after the GenBank sequence.

   If a "<" or ">" appears in the db, then no object instructions or
   lengths are written.

   If a ">" appears in the previous feature or "<" appears in the current
   feature, then no space instructions or lengths are written.

   So for the above example, only one Delila instruction would be written:

        get from 46 -10 to 46 +20;

   if the dbinstp contained -10 20, and

        get from 46 before to 46 after;

   if the dbinstp contained 10 -20.

   where "before" and "after" are replaced by the integers from dbinstp.
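
   The rules above can be sketched in Python.  This is an illustration, not
   the code of dbinst: the function names and the boolean flags for a missing
   beginning or ending are hypothetical, and the statements carry no
   organism, chromosome or piece names.

```python
def get_instruction(position, from_offset, to_offset, shift=0):
    """Build one Delila 'get' statement around an alignment point."""
    if from_offset > to_offset:
        # Generic form, meant to be copied and edited later.
        return "get from %d before to %d after;" % (position, position)
    return "get from %d %+d to %d %+d;" % (
        position, from_offset + shift, position, to_offset + shift)

def feature_instructions(begin, end, begin_missing, end_missing,
                         from_offset, to_offset):
    """Return (binst, einst, oinst) statements, or None where an endpoint is missing."""
    binst = None if begin_missing else get_instruction(begin, from_offset, to_offset)
    einst = None if end_missing else get_instruction(end, from_offset, to_offset)
    oinst = None
    if not begin_missing and not end_missing:
        oinst = "get from %d %+d to %d %+d;" % (begin, from_offset, end, to_offset)
    return binst, einst, oinst
```

   For the pept line shown above, whose beginning lies before the sequence,
   only the einst statement "get from 46 -10 to 46 +20;" is produced when
   dbinstp contains -10 20.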

examples
   If dbinstp contains:
      pept
      -10 20
   then instructions to get peptide starts (binst) and ends (einst)
   from -10 to +20 will be made.

   Instructions for the entire peptides, from -10 before the start of
   the peptide to 20 bases after will also be made.

   Instructions for the regions between peptides, from -10 inside each previous
   peptide to 20 bases into the inside of the next peptide will also be made.

documentation
   none

see also
   dbbk.p

author
   Thomas Dana Schneider

bugs
   The program does not produce the instructions for space between the first
   object and the beginning of the sequence or the space after the last object
   in the sequence.  This is possible (and perhaps should be controlled by a
   parameter) but it would not produce "natural" lengths because those space
   lengths depend on the length of the reported sequence.

   It is not clear that spaces are done properly anymore.  Possible bug
   at "SPACE PROBLEM".

   Genus names are limited to genuslimit (a constant) to avoid names longer
   than the standard Delila limit.

technical notes
   The expected column locations of the complement flag in the database, (the
   'before end of piece' and the 'after end of piece' flags) are given in the
   program constants.

*)
(* end module describe.dbinst *)
version = 3.39; (* of dbinst.p 1992 September 16
(* begin module describe.dblo *)
(*
name
      dblo: look at the catalogue of a genbank/embl database

synopsis
      dblo(cat: in, list: out, output: out)

files
      cat: a catalogue from program dbcat
      list: a listing in tabular form of the catalogue
      output: messages to the user

description
      the program dbcat creates a machine readable catalogue of the
      locations of entries in a genbank/embl database.  one cannot
      read this directly because it is a compressed internal format
      of the computer.  (that is, it is a file of records.)  to
      read the file, one must convert it into normal characters, which
      is what dblo does.

author
      thomas schneider

see also
      dbcat, dbpull, loocat, delila

bugs
      none known

*)
(* end module describe.dblo *)
version = 1.09; (* of dblo 1989 July 11
(* begin module describe.dbpull *)
(*
name
      dbpull: database extraction program.

synopsis
      dbpull (fin: in, fout: out,
              dbl1, dbl2, dbl3, dbl4, dbl5, dbl6, dbl7, dbl8: in,
              ecat: in, gcat: in, output: out)

files
      fin:  User requests for extractions from libraries.  Each request
         takes up a single line and consists of a genetic sequence
         identification code followed by either a single special extraction
         code or a series of line code requests.  If an entry request is to
         be found in embl format the request line must have a line containing
         simply 'embl' somewhere above it.  A line containing only 'genb'
         will instruct dbpull to look only for genbank format entries on
         the following request lines.  Important note: the exact form of
         fin instructions is found in delman.use.dbpull.instructions.

         If no request is given, then ALL is assumed.  This means that
         the program will now run using a raw list of entry names.
      fout:  contains fulfilled requests in the same entry order as fin.
         this file may serve, itself, as a database library for dbpull as
         long as 'id ' or 'loc' occur with every request (one of these two
         line codes identifies the beginning of each entry and holds its id).
      dbl1-dbl8:  same files, in the same order, as dbcat. see
                  delman.describe.dbcat.
      ecat:  same as in dbcat also.
      gcat:  same as in dbcat also.
      output:  messages to the user.

description
      this program uses catalogs generated by the dbcat program to quickly
      extract all or part of embl or genb type entries from data base
      libraries. the user may choose one of two special requests ('all', which
      pulls out an entire entry or 'raw', which pulls out only the genetic
      sequence) or s-he may simply request a number of line codes. the
      wildcard character '*' represents any number of unspecified characters
      in an id request. this allows one fin line to extract several entries
      whose ids have characters in common. the id 'every' extracts all ids
      it is compared to. dbpull also checks the production dates of all the
      catalogs and libraries to see that they are consistent.
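
      the id matching can be sketched in python as follows.  this is an
      illustration, not the code of dbpull: the function name id_matches is
      hypothetical, and case-sensitive comparison of whole ids is an
      assumption.

```python
import re

def id_matches(request, entry_id):
    """True if `entry_id` satisfies `request`, where '*' stands for any characters."""
    if request == "every":
        return True
    # Escape the literal pieces and let each '*' match any run of characters.
    pieces = [re.escape(piece) for piece in request.split("*")]
    return re.fullmatch(".*".join(pieces), entry_id) is not None
```

      a request such as hum followed by the wildcard would then select every
      entry whose id begins with hum.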

documentation
      dbhelp, delman.describe.dbcat, embl and genbank libraries.

see also
      dbcat.

author
      matthew yarus

bugs
      none known

technical notes
      1:dbpull functions on genbank(tm) release 9 (june 1, 1983). 2: if the
      value of the constant checknum is increased, dbpull will do a more
      complete check of its catalogs.
*)
(* end module describe.dbpull *)
version = 2.41; (* of dbpull.p 1989 November 14
(* begin module describe.decat *)
(*
name
   decat: break a file into 10 files

synopsis
   decat(input: in, decatp: in,
         f0,f1,f2,f3,f4,f5,f6,f7,f8,f9: out,
         output: out)

files
   input:  the file to be broken into parts
   decatp: parameters.  one integer, the number of bytes to put into
     each file.
   f0..f9: input split into parts
   output: messages to the user

description
   Break a file into parts.  Any excess goes into the last file.
   The files are split at the next line after the size given in decatp
   has been exceeded.  This avoids broken lines, but it means that the
   user must leave a safety margin.

   Purpose: to be able to send files larger than 50000 bytes.
   The mailer at boulder objects to ones larger.
   Test for correctness: cat f0 ... f9 >x; diff of input and x should be empty.
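
   The splitting rule can be sketched in Python as below.  This is an
   illustration, not the code of decat: the function name split_lines is
   hypothetical, and sizes are counted in characters rather than bytes.

```python
def split_lines(lines, size, nfiles=10):
    """Split `lines` into at most `nfiles` parts, cutting after the line that passes `size`."""
    parts = [[]]
    used = 0
    for line in lines:
        parts[-1].append(line)
        used += len(line)
        # Start a new part only at a line boundary; the last part takes any excess.
        if used > size and len(parts) < nfiles:
            parts.append([])
            used = 0
    return parts
```

   As with the correctness test above, joining the parts in order gives back
   the input exactly.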

author
   Thomas Dana Schneider

bugs
   fixed number of files.

*)
(* end module describe.decat *)
version = 1.13; (* of decat 1989 September 25
(* begin module describe.decom *)
(*
name
   decom: remove comment starts from within a comment

synopsis
   decom(input: in; output: out)

files
   input:  a program with comments within comments.
   output: the same program with internal comments neutralized.

description
   At the moment there are many cases in the delila system where the construct:
              ( ( )
   exists (where '(' means the begin of a comment and ')' means the end).
   This is a result of the version mechanism of the module program.
   Until this is changed, these will hang around.  The Sun compiler
   gives a warning about these, and to remove the warnings, the '*'
   after the second '(' can be removed by this program.
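
   A minimal Python sketch of this idea follows.  It is not decom's code:
   the name neutralize is hypothetical, and the delimiters are built from
   single characters so that the example itself contains no comment start
   or end.  Like decom, it makes no attempt to recognize quoted strings.

```python
OPEN = "(" + "*"   # comment start, assembled from pieces
CLOSE = "*" + ")"  # comment end, assembled from pieces


def neutralize(text: str) -> str:
    """Drop the star from any comment start found inside a comment."""
    out = []
    i, in_comment = 0, False
    while i < len(text):
        pair = text[i:i + 2]
        if pair == OPEN:
            # Inside a comment keep only the parenthesis of a nested start.
            out.append("(" if in_comment else OPEN)
            in_comment = True
            i += 2
        elif pair == CLOSE and in_comment:
            out.append(CLOSE)
            in_comment = False
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)
```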

see also
   module.p

author
   Thomas Dana Schneider

bugs
   WARNING:  Some programs have comment starts inside quotes.  DECOM
   IS NOT SMART ENOUGH TO AVOID CHANGING THESE.  If they exist, decom
   will mess up your program.  Compare the output of decom with the
   input before you accept the results.

*)
(* end module describe.decom *)
version = 1.03; (* of decom, 1988 Dec 14
(* begin module describe.delila *)
(*
name
      delila: the librarian for sequence manipulation

synopsis
      delila(inst: in, book: out, listing: out,
             lib1: in, cat1: in,
             lib2: in, cat2: in,
             lib3: in, cat3: in,
             output: out, debug: out)

files
      inst: instructions written in the language delila that tell the
         program delila what sequences to pull out of the library.
      book: the set of sequences pulled out of the library.
      listing: the instructions are listed along with errors found or
         actions taken.
      lib1: the first library from which to obtain sequences
      cat1: the first catalogue, corresponding to lib1
      lib2: the second library
      cat2: the second catalogue, corresponding to lib2
      lib3: the third library
      cat3: the third catalogue, corresponding to lib3
      debug: traces through the actions taken, for debugging delila
         (only produced if constant debugging is true.)
      output: messages to the user

description
      delila is a data base manager for nucleic acid sequences.  it takes
      a set of instructions, written in the language delila (deoxyribonucleic
      acid library language) and a large set of sequences called a library.
      the output is a listing of the actions taken (or errors) corresponding
      to the instructions, and a "book" containing the sequences desired.

examples
      see the documentation

documentation
      libdef (defines delila), delman.intro, delman.use, delman.construction

see also
      catal, loocat

author
      thomas d. schneider, gary d. stormo and paul morrisett
      useful suggestions by jeff haemer

bugs
      there are many known bugs in delila.  most are related to extracting
      linear fragments of circular sequences.  we are designing a second
      version of delila which should solve these problems.
         the following features are not available in this program:
      recognition classes and enzymes, markers,
      automatic printing to the book of structures that intersect a piece,
      get all (for org, chr, rec and enz), get every and if.

*)
(* end module describe.delila *)
version = 1.77; (* of delila.p 1989 November 14 *)
(* begin module describe.delmod *)
(*
name
      delmod: delila module library

synopsis
      delmod(book: in, output: out)

files
      book: any book from the delila system, or an empty file.
      output: the version of delmod is printed along with test results
         if the book is not empty.  Successful compilation and running of the
         program indicates that the modules are correct.

description
      Delmod is a collection of modules used by delila system programs.
      The easiest way to obtain a list of the modules is to run the module
      program using delmod for both sin and modlib (with dummy files for
      the other input).  There are a number of information modules, indicated
      by names beginning with 'info.'.  There are also a number of packages of
      modules that pick up other modules.  These begin with 'package.'.  You
      should note that some modules are constants, others types, etc.  These
      must remain in their proper location to allow compilation.

examples
      A good book to use to test delmod is ex0bk.

see also
      module

author
      Thomas D. Schneider and Gary D. Stormo

bugs
      none known

*)
(* end module describe.delmod *)
version = 'delmod 6.64 93 Jan 10 tds/gds'
(* begin module describe.diana *)
(*
name
   diana: dinucleotide analysis of an aligned book

synopsis
   diana(book: in, inst: in, dianap: in, da: out, output: out);

files
   book: the standard delila book that is to be analyzed
   inst: the instruction set that was used to make the book
   dianap: Parameters to control the program.  If there are two integers on the
      first line, then they determine the from-to range over which to do the
      analysis.  If from > to or the file is empty, the range from book and
      inst is used.
   da: the di-analysis file that contains the output triangular array.
          column 1: Tells what that row of data is.  There are three choices:
                n  : the column is a normal data element in the triangle.
                d  : the column is an element on the diagonal of the triangle,
                   where the two coordinates are equal.
                i  : the column contains the information content of a triangle.
          column 2: Tells what the dinucleotide is.  There are 16 dinucleotides
                aa, ac, ag, at, ... , tt as well as `in' or `id', which denote
                columns that contain information content.  `id' means that the
                information is for the diagonal, while `in' are off-diagonal.
                Combining in with id gives the entire information triangle.
          column 3: The position on the sequence that corresponds to the x
                coordinate on a Cartesian graph.
          column 4: The position on the sequence that corresponds to the y
                coordinate on a Cartesian graph.
          column 5: A column of constants usable by xyplo.
          column 6: A column of constants usable by xyplo.
          column 7: The number of data points at a position
          column 8: The frequency of the dinucleotide in column 2 at position
                (column 3, column 4).  If the column is an information column,
                then this is the information at that position.
          column 9: One minus the frequency (or 1 - column 8).  If this is
                an information column, then this is the chi-square value.
          column 10: A column of constants usable by xyplo.  If this is in an
                information row, then this column is the number of degrees of
                freedom at that position.
          column 11: A column of constants usable by xyplo.
          column 12: A column of constants usable by xyplo.
          column 13: Information from column 8 normalized by dividing by
                the maximum possible information (4 for non-diagonal, 2 for
                diagonal).
          column 14: 1 minus column 13
          column 15: correlation coefficient for x to y on information (in or
                id) rows
   output: error messages from the program

description
   Diana goes through a book and looks at relationships of dinucleotide pairs
   within the sequences.  The output of the program is in the form of a
   triangular array.  For every pair of coordinates, the frequency of the
   dinucleotide pair is tabulated.  The program also calculates the chi-square
   for the dinucleotide given the expected mono-nucleotide frequencies.  The
   correlation information is calculated as the information in the dinucleotide
   frequencies less the information in each of the two mononucleotides.  The
   output of the program may be sent into xyplo for graphics.  Note the
   distinction between the `in' and `id' information columns.  Information in a
   binding site is usually going to appear on the diagonal, so by making this
   distinction, one can eliminate the diagonal information peaks for statistical
   analysis with genhis.
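
   The correlation information for one pair of positions can be sketched in
   Python.  This is an illustration only, not diana's code: the function
   names are hypothetical, and, like the program, the sketch applies no
   small sample correction and takes 2 and 4 bits as the maximum
   uncertainties.

```python
from collections import Counter
from math import log2


def uncertainty(counts):
    """Shannon uncertainty in bits of a table of counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def correlation_information(seqs, x, y):
    """Bits of correlation between positions x and y of aligned sequences."""
    hx = uncertainty(Counter(s[x] for s in seqs))
    hy = uncertainty(Counter(s[y] for s in seqs))
    hxy = uncertainty(Counter(s[x] + s[y] for s in seqs))
    # (4 - Hxy) - (2 - Hx) - (2 - Hy) simplifies to Hx + Hy - Hxy.
    return hx + hy - hxy
```

   Two positions that always carry the same base, with all four bases equally
   frequent, give 2 bits; two independent positions give 0 bits.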

documentation
   none

see also
   xyplo.p, alist.p, genhis.p

author
   R. Michael Stephens

bugs
  The program has no error correction for small sample size. It assumes that 2
  bits is the maximum uncertainty for single bases and that 4 bits is the
  maximum uncertainty for dinucleotides.

technical notes
*)
(* end module describe.diana *)
version = 1.77; (* of diana.p 1992 June 12
(* begin module describe.difint *)
(*
name
   difint: differences between integers

synopsis
   difint(input: in, output: out);

files
   input: a set of integers, one per line
   output: the difference between each integer and the previous one.

description
   lines that begin with an asterisk ('*') are first copied to the output.
   then the difference between each integer in input and the previous one is
   given to the output.  the program acts as if the integer before the first
   one is zero.
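
   A minimal Python sketch of this behavior, working on a list of lines
   rather than on files (the function name mirrors the program but the code
   is only an illustration):

```python
def difint(lines):
    """Return the asterisk lines, then the successive differences."""
    comments = [ln for ln in lines if ln.startswith("*")]
    numbers = [int(ln) for ln in lines if ln.strip() and not ln.startswith("*")]
    previous = 0  # the integer before the first one is taken to be zero
    diffs = []
    for n in numbers:
        diffs.append(n - previous)
        previous = n
    return comments + [str(d) for d in diffs]
```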

author
   thomas dana schneider

bugs
   none known

*)
(* end module describe.difint *)
version = 1.05; (* of difint, 1986 dec 2 *)
(* begin module describe.digrab *)
(*
name
   digrab: diagonal grabs of diana data

synopsis
   digrab(input: in, ii: in, xyin: out, xyplop: out, output: out)

files
   input: User defines the value of n.
   ii:  output of the diana program, filtered for information lines only.
   xyin:  input to the xyplo plotting program, extracted lines.
   xyplop:  parameters controlling xyplo
   output: messages to the user

description
   This program extracts lines that have the form (x,x+n) from the da output of
   diana.  The result may be plotted with xyplo.
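
   The selection can be sketched in Python.  This is an illustration, not
   digrab's code: the name diagonal_grab is hypothetical, and the sketch
   assumes that columns 3 and 4 of each da line hold integer x and y
   positions and that header lines start with an asterisk.

```python
def diagonal_grab(lines, n):
    """Keep the lines whose coordinates have the form (x, x+n)."""
    kept = []
    for line in lines:
        fields = line.split()
        if line.startswith("*") or len(fields) < 4:
            continue  # skip header lines and lines that are too short
        x, y = int(fields[2]), int(fields[3])
        if y == x + n:
            kept.append(line)
    return kept
```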

examples

documentation

see also
   diana.p, xyplo.p

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.digrab *)
version = 1.04; (* of digrab.p 1990 August 10
(* begin module describe.dirty *)
(*
name
   dirty: calculate probabilities for dirty DNA synthesis

synopsis
   dirty(dirtyp: in, distribution: out, xyin: out, output: out)

files
   dirtyp: parameter file.
      one line giving the number of random bases that will be used (r).
      one line giving the average number of changes desired (n)
   distribution: the distribution of numbers of changes at the
      peak for n
   xyin: Graphics output of the program.  The input to xyplo for plotting.
      The graph gives three curves against the independent variable p,
      the probability of getting the correct base.  Here randoms is the
      number of random bases:

      o = probability of only one base changed: randoms (1-p) p^(randoms-1)
      m = probability of one or more bases changed: 1 - p^randoms
      n = probability of n bases changed

      I have not found this output to be too useful; I concentrate
      on the distribution file.
   output: messages to the user

description
   If one is designing a randomized ("dirty") DNA synthesis, how
   heavily should it be randomized?  To use this program, pick the size
   of the region you want to randomize, r.  Then make a guess at the
   average number of changes you want over the region, n.  Put r and n
   into dirtyp and run the program.  Look at the distribution file.
   The line for n=0 is the frequency that you will get back the
   original sequence.  You must choose whether this is tolerable.  For
   example, when I synthesized the T7 promoters, I knew that I could
   find at least 1 promoter in 100 clones by toothpicking, and I was
   willing to toothpick thousands.  This way I was sure to get some
   positives, even if they were the original sequence.  (As it turned
   out, the frequency of functional promoters was much higher than
   1%.)  If you have a strong selection, you could make this a small
   number, by increasing the number of changes per clone.  With more
   changes per clone you will get more data from the randomization, so
   make it as high as you can tolerate.

   The program calculates the ratio of bases to random bases.  In the
   experiment described in the NAR paper, the technician put 4 drops of
   the appropriate base with 1 drop of the equiprobable mix.  This made
   the dirty bottle.

example

This is the analysis used in the NAR paper.  With the dirtyp file containing:

27    the number of random bases that will be used.
4     the number of changes desired (n)

the distribution file is:
* dirty 2.38
* distribution of number of changes calculated from binomial
* 27 random positions
*  4 average number of bases changed
* p = probability of correct base = 0.85185185
* fraction of [base] : 0.80246914
* fraction of [random n] : 0.19753086
*
* ratio of [base] to [random N]: 4.06250000
*
* TO DO THE SYNTHESIS, 
* add one part of an equimolar mixture of the 4 bases
* to 4.06250000 parts of the "wild type" base
*
* In the following table,
*    n = number of changes
*    f = frequency of n changes
*    s = running sum of frequencies f (should approach 1.0)
* In the first row, where n=0, f is the frequency of wild type sequences
*
n = 0   f = 0.01317741   s = 0.01317741
n = 1   f = 0.06187652   s = 0.07505392
n = 2   f = 0.13989473   s = 0.21494866
n = 3   f = 0.20274599   s = 0.41769465
n = 4   f = 0.21156103   s = 0.62925568
n = 5   f = 0.16924883   s = 0.79850451
n = 6   f = 0.10792679   s = 0.90643130
n = 7   f = 0.05630963   s = 0.96274093
n = 8   f = 0.02448245   s = 0.98722338
n = 9   f = 0.00898872   s = 0.99621210
n =10   f = 0.00281386   s = 0.99902596
n =11   f = 0.00075629   s = 0.99978226
n =12   f = 0.00017537   s = 0.99995763
n =13   f = 0.00003519   s = 0.99999282
n =14   f = 0.00000612   s = 0.99999894
n =15   f = 0.00000092   s = 0.99999986
n =16   f = 0.00000012   s = 0.99999999
n =17   f = 0.00000001   s = 1.00000000
n =18   f = 0.00000000   s = 1.00000000
n =19   f = 0.00000000   s = 1.00000000
n =20   f = 0.00000000   s = 1.00000000
n =21   f = 0.00000000   s = 1.00000000
n =22   f = 0.00000000   s = 1.00000000
n =23   f = 0.00000000   s = 1.00000000
n =24   f = 0.00000000   s = 1.00000000
n =25   f = 0.00000000   s = 1.00000000
n =26   f = 0.00000000   s = 1.00000000
n =27   f = 0.00000000   s = 1.00000000
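
The numbers in the distribution file above can be reproduced with a short
Python sketch.  It is not the program's code: the function name is
hypothetical, and the formula for the fraction of random mix is inferred from
the printed output, assuming that a base drawn from the equimolar mix changes
the position 3 times out of 4.

```python
from math import comb


def dirty_distribution(r, n):
    """Return p, the base to random mix ratio, and the binomial frequencies."""
    p = 1 - n / r  # probability of keeping the correct base
    # Only 3 of the 4 bases in the equimolar mix produce a change.
    random_fraction = (n / r) * 4 / 3
    ratio = (1 - random_fraction) / random_fraction
    freqs = [comb(r, k) * (1 - p) ** k * p ** (r - k) for k in range(r + 1)]
    return p, ratio, freqs
```

With r = 27 and n = 4 this gives p = 0.85185185, a ratio of 4.0625 and a wild
type frequency of 0.01317741, in agreement with the file shown above.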

see also
   xyplo

documentation
   
   @article{Schneider1989,
   author = "T. D. Schneider and G. D. Stormo",
   title = "Excess Information at Bacteriophage {T7} Genomic Promoters
            Detected by a Random Cloning Technique",
   year = "1989",
   journal = "Nucleic Acids Research",
   volume = "17",
   pages = "659-674"}

author
  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland
  toms@ncifcrf.gov

bugs
   n must be an integer

*)
(* end module describe.dirty *)
version = 2.38; (* of dirty, 1989 March 28
(* begin module describe.dnag *)
(*
name
      dnag: graphics of dna
  
synopsis
      dnag(bdna: in, dooin: out, output: out) 
  
files 
      bdna: b- form dna coordinates.  lines beginning with '*' are ignored.
         on each line following is the coordinate of one atom.
         the first character is the kind of group:
            * P = phosphate, D = deoxyribose
            * A = adenine, G = guanine, C = cytosine, T = thymine
         the next character is blank
         the next two characters are the atom and its number
         then the locations are given, separated by spaces:
            radius (angstrom) - angle (degree) - z axis (angstrom)
      dooin: graph of dna in doodle format
      output: messages to the user
  
description 
      dnag generates a graph of DNA.
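
      Reading one bdna line and converting its cylindrical polar
      coordinates to Cartesian ones can be sketched in Python.  This is an
      illustration only: the function name parse_atom is hypothetical, and
      the column positions follow the description of bdna given under
      files.

```python
from math import cos, radians, sin


def parse_atom(line):
    """Return (group, atom, (x, y, z)) for one coordinate line."""
    group = line[0]   # P, D, A, G, C or T
    atom = line[2:4]  # atom and its number, after one blank
    radius, angle, z = (float(v) for v in line[4:].split()[:3])
    x = radius * cos(radians(angle))
    y = radius * sin(radians(angle))
    return group, atom, (x, y, z)
```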

documentation
      B-DNA Cylindrical Polar Coordinates from
      S. Arnott and D. W. L. Hukins
      Biochem. and Biophys. Res. Comm 47: 1504-1509 (1972)
      "Optimised Parameters for A-DNA and B-DNA"

      M. Karplus and R. N. Porter
      Atoms & Molecules
      Benjamin/Cummings Publishing Co., Menlo Park, Ca, 1970
      p. 204-7, crystal radii
  
author
      Thomas D. Schneider 
  
bugs
      The location of the strings may not be centered exactly in the circles.
      To make this easy to adjust, two fudge factors (fudgex and fudgey) are
      provided as constants.

*)
(* end module describe.dnag *)
version = 1.73; (* of dnag.p 1993 January 26
(* begin module describe.domod *) 
(*
name
   domod: doodle modules
  
synopsis
   domod(input: in, output: out) 
  
files 
   input: text.  portions surrounded by .PS and .PE are
      searched for function names.  when a function name is found,
      the parameters on the same line are read.
   output: copy of input text except that the functions detected
      during reading are translated into doodle commands.

description 
      Domod contains the doodle modules.  Calls to the procedures
   cause the corresponding doodle command to be written to the output
   file.  Since this is the same as the input, the program only reformats
   the input.  That is, in UNIXease,
      domod<a>b
      domod<b>c
      diff b c
   shows no difference between b and c.  The program serves as a module
   library for the procedures that generate doodle commands.

see also
   doodle dosun

author
   Thomas D. Schneider 
  
bugs
   domod does not copy correctly outside of pictures.  Inside
   of pictures it appears to read the entire demo and copy it to
   output correctly, such that domod<demo>a;domod<a>b;diff a b
   gives no differences.
  
technical note
   The globals picxglobal and picyglobal are updated, so
   a program that does graphics using these calls can use these
   variables to find out where it is.
*)
(* end module describe.domod *)
version = 1.40; (* of domod.p 1989 Aug 9
(* begin module describe.doodle *) 
(*
name
   doodle: pascal graphics library and preprocessor for pic under unix
  
synopsis
   doodle(input: in, output: out) 
  
files 
   input: text.  portions surrounded by .PS and .PE are
      searched for function names.  when a function name is found,
      the parameters on the same line are read.
   output: copy of input text except that the functions detected
      during reading are translated into pic commands.

description 
   Doodle is a preprocessor for the pic program.  (Yes you got it
   right... doodle is a preproprocessor for troff.)
   The pic preprocessor takes a series of commands and converts  
   them to troff input under the unix operating system.
   Commands allow one to draw pictures and imbed them into 
   text.  Doodle creates pic commands for things like lines
   and axes and spirals and other things.
      Doodle's main purpose is to be a testing shell for a general set of
   pascal graphics routines, available as modules.

see also
   the doodle manual, doodle.info, module

author
   Thomas D. Schneider 
  
bugs
   none known  
  
*)
(* end module describe.doodle *)
version = 1.95; (* of doodle, 1988 jan 6
(* begin module describe.dops *) 
(*
name
   dops: pascal graphics library and preprocessor for postscript
  
synopsis
   dops(demo: in, input: in, output: out) 
  
files 
   demo: a file for demonstration of the program.
      Start dops interactively.  Start a picture with
          .PS 81 2 2
      then type
          demo
      Graphics instructions will be read from the file 'demo', and the
      corresponding postscript will appear on the output.
      You can try instructions by hand.  Then type
          .PE
          ^d (control-d)
      to conclude.
   input: Graphics instructions.  Portions surrounded by .PS (with
      the appropriate parameters) and .PE (.PS =picture start and .PE =
      picture end) are searched for function names.  When a function
      name is found, the parameters on the same line are read.
   output: the functions detected within .PS to .PE
      are translated into PostScript graphics

description 
   Dops converts the graphical instructions made by modules from
   domod.p and produces graphics in the language PostScript.

examples
   To demonstrate the 3-D graphics, use
     .PS 81 2 2
     test3d
     .PE
     (control-d to leave the program)
   A complete test file is called 'demo', which should be run
   non-interactively.

see also
   doodle.p, domod.p, dosun.p
   PostScript Language Tutorial and Cookbook,
   PostScript Language Reference Manual
   both from Addison Wesley, 1985
   demo - file that demonstrates all functions

references
   
@article{Schneider1982,
author = "T. D. Schneider
 and G. D. Stormo
 and J. S. Haemer
 and L. Gold",
title = "A design for computer nucleic-acid sequence storage, retrieval and
manipulation",
year = "1982",
journal = "Nucleic Acids Research",
volume = "10",
pages = "3013-3024"}

@article{Schneider1984,
author = "T. D. Schneider
 and G. D. Stormo
 and M. A. Yarus
 and L. Gold",
title = "Delila system tools",
year = "1984",
journal = "Nucleic Acids Research",
volume = "12",
pages = "129-140"}

author
   Thomas D. Schneider 
   National Cancer Institute
   Laboratory of Mathematical Biology
   Frederick, Maryland
   toms@ncifcrf.gov
  
bugs
   none known  
  
technical note
   NONSTANDARD is a comment that means that this portion of the code
is dependent on non-standard pascal (or graphics) for its function.
*)
(* end module describe.dops *)
version = 2.63; (* of dops.p 1991 November 2
(* begin module describe.dosun *) 
(*
name
   dosun: pascal graphics library and preprocessor for Sun graphics
  
synopsis
   dosun(demo: in, input: in, output: out) 
  
files 
   demo: a file for demonstration of the program.
      type 'demo' to run it.
   input: text.  portions surrounded by .PS and .PE are
      searched for function names.  when a function name is found,
      the parameters on the same line are read.
   output: copy of input text except that the functions detected
      during reading are translated into Sun graphics.

description 
      Dosun is equivalent to doodle (see doodle.p) but produces
   output directly to the screen using Suncore graphics. 

see also
   doodle.p, suncore graphics manual, domod.p

author
   Thomas D. Schneider 
  
bugs
   none known  
  
technical note
   NONSTANDARD is a comment that means that this portion of the code
is dependent on non-standard pascal for its function.
*)
(* end module describe.dosun *)
version = 2.17; (* of dosun, 1988 jan 13
(* begin module describe.dotmat *)
(*
name
      dotmat: dot matrices of two books

synopsis
      dotmat(xbook: in, ybook: in, mlist: out, dotmatp: in, output: out)

files
      xbook: a book from the delila system

      ybook: a book from the delila system

      mlist: a dot matrix for each sequence pair between the two books.
         the "dots" are printed as numbers:
            1 means gt base pair
            2 means at base pair
            3 means gc base pair
         xbook sequences are written down the page and those of
         ybook go across the page.
         if mlist is wider than your printer, use the split program.

      dotmatp: parameters to control the mlist.
         if dotmatp is empty, default values are used.  otherwise
         if the first line begins with a "g" then g-u pairs are printed.

      output: messages to the user

description
      dotmat produces a dot matrix for all complementary base pairs
      between all pairs of sequences in the two books.  because a list of
      helices is not made, the program is much more efficient for short
      minimum helices than is the pair of programs helix and matrix.
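
      The numbering of the dots can be sketched in Python for one pair of
      sequences.  This is an illustration, not dotmat's code: the names are
      hypothetical, the sketch compares the bases as given without
      reversing either sequence, and show_gt stands in for the "g" option
      of dotmatp.

```python
PAIR_CODE = {
    ("g", "t"): 1, ("t", "g"): 1,
    ("a", "t"): 2, ("t", "a"): 2,
    ("g", "c"): 3, ("c", "g"): 3,
}


def dot_matrix(xseq, yseq, show_gt=False):
    """Return one text row per base of xseq, one column per base of yseq."""
    rows = []
    for xb in xseq:  # xbook sequence runs down the page
        row = ""
        for yb in yseq:  # ybook sequence runs across the page
            code = PAIR_CODE.get((xb, yb), 0)
            if code == 1 and not show_gt:
                code = 0
            row += str(code) if code else " "
        rows.append(row)
    return rows
```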

documentation
      delman.use.comparison
      J. V. Maizel, Jr. and R. P. Lenk PNAS 78: 7665-7669 (1981)

see also
      helix, matrix, split

author
      thomas d. schneider

bugs
      none known

*)
(* end module describe.dotmat *)
version = 3.06; (* of dotmat 1986 dec 12 *)
(* begin module describe.dotsba *)
(*
name
   dotsba: dots to database

synopsis
   dotsba(dots: in,  database: out, output: out)

files
   dots: dot input format of sequences.
      First line is the header line for the database.
      Second line is the standard, not to be copied to the database.
      Following lines have a period (dot, '.') replacing bases that
      are the same as the standard or a different base.  There may be
      any number of spaces.  Following this is a bar (|).  Following
      the bar are other data to be copied to the database: clone number,
      primer for sequencing, and the date.
   database:  reformatted data ready for sites program
   output: messages to the user

description
   To convert from dots format to one the sites program can use.
   It should not have been necessary to do this, but Peter Papp didn't
   type the original sequences in unfortunately.
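
   Expanding one line of dots against the standard can be sketched in
   Python.  This is an illustration, not dotsba's code: the name
   expand_dots is hypothetical and the sketch returns the pieces instead
   of writing the database format, which is not described here.

```python
def expand_dots(standard, line):
    """Return (full sequence, other data) for one line in dots format."""
    seq_part, _, extra = line.partition("|")
    symbols = seq_part.replace(" ", "")  # spaces carry no information
    bases = [std if sym == "." else sym
             for std, sym in zip(standard.replace(" ", ""), symbols)]
    return "".join(bases), extra.strip()
```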

examples

documentation

see also
   sites.p

author
   Thomas Dana Schneider

bugs
   This is a stupid program.

technical notes

*)
(* end module describe.dotsba *)
version = 1.07; (* of dotsba.p 1990 December 9
(* begin module describe.encfrq *)
(*
name
      encfrq: encoded sequence frequency analysis

synopsis
      encfrq(encseq: in, cmp: in, fout: out, output: out)

files
      encseq: the output of the encode program
      cmp: a composition from the comp program.
      fout: frequency tables for each parameter set.  these are followed
         by z values for each frequency.  if cmp is empty, then equal
         frequencies are assumed.
      output: messages to the user.

description
      the frequency of each n-tide (mono- or di- or etc) is displayed in
      fout.  the actual number of sequences passing through a particular
      n-tide and position (ie, a parameter window) is taken into account.
         a second set of tables of z values is also presented.
      these are calculated from the composition provided in cmp (p, the
      probability of obtaining the n-tide), the actual number of
      occurrences (b) and the number of sequences at that position (n).
      the distribution of b can be described as a binomial distribution,
      with mean (m) np and standard deviation (s) sqrt(npq).  b is then
      normalized to obtain z: z=(b-m)/s.  if n is large, then z is
      normally distributed, and the probabilities can be found on any
      table for the normal distribution (use a two tailed test).  a rule
      of thumb for when the normal distribution can be used is that
      both np and n(1-p) should be greater than 5.  locations that violate
      this rule are marked with a '*'.
         locations of the z table that contain z values of 3 or greater are
      displayed to the right of the z table.  since these look somewhat
      like a dna footprint, they are called z-footprints.  the output
      for dinucleotide z-footprints is very wide, so one must split
      it up using the split program.  recommended values for splitp are
      p/14/112/4, where the slash means "start a new line".
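
      The z calculation and the rule of thumb can be sketched in Python.
      The function name z_value is hypothetical; the formulas are the ones
      given above.

```python
from math import sqrt


def z_value(b, n, p):
    """Return (z, reliable) for b occurrences among n sequences."""
    mean = n * p
    sd = sqrt(n * p * (1 - p))
    z = (b - mean) / sd
    # Rule of thumb: both np and n(1-p) should be greater than 5.
    reliable = n * p > 5 and n * (1 - p) > 5
    return z, reliable
```

      A location would carry a '*' in the table when reliable is false.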

see also
      encode, comp, split

author
      thomas d. schneider

bugs
      none known

*)
(* end module describe.encfrq *)
version = 1.52; (* of encfrq.p 1993 Jan 27
(* begin module describe.encode *)
(*
name
      encode: encodes a book of sequences into strings of integers

synopsis
      encode(inst: in, book: in, encseq: out, encodep: in, output: out)

files
      inst: the instructions generating the book; for aligning the sequences
      book: the sequences to be encoded
      encseq: the encoded sequences
      encodep: parameter file for describing how the sequences are to be
            encoded.  see description section for format of this file
      output: for messages to the user

description
         this program is used to encode a book of sequences into a string of
      integers. each sequence in the book is encoded into a single string of
      integers (ended by an 'end of sequence' symbol) according to the user
      specified parameters, which are in the file 'encodep'.
         the parameters are stored as a list of parameter records, of which
      there may be any number.  each parameter record has five lines of
      information which it must include (all i's and j's are integers):
      1.  i j specify the nucleotides, relative to the aligned base,
              over which this parameter record is to operate; these may
              be any integers, but i <= j is required;
      2.  i   is the size of the windows to be encoded; within the window
              the number of each oligonucleotide of length 'coding' are
              determined and printed as part of the total sequence vector;
      3.  i   is the shift to the next window to be encoded;
      4.  i : j1 j2 j3 ...  is the 'coding'-level and arrangement; the
              'coding'-level, i, is the number of nucleotides in the oligos we
              are counting, i.e., 1 means monos, 2 means dis, ...;  if i > 1
              then we can also skip bases between the ones we are encoding;
              if the i is followed next by a colon, there must be i-1 integers
              (j1..j(i-1)) which specify the number of bases to be skipped
              between the ones which are encoded; for example, if we have the
              sequence xyz and we are interested in the di-nucleotides we can
              get the xy by the parameter '2 : 0', or we could get the xz by
              parameter '2 : 1'; if there is no colon all the skips are
              assumed to be zero;
      5.  i   is the shift to the next coding site within the window;
              this allows us to encode only some of the oligos within a window,
              such as only those that are in-frame;
      multiple parameter records can be concatenated in the encodep file
      and then each sequence in the book will be encoded according to each
      parameter record into a single vector of integers.
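
      The coding level, the skips and the shift between coding sites can
      be sketched in Python.  This is an illustration, not encode's code:
      the names are hypothetical, positions are counted from zero, and the
      sketch simply drops oligos that would run past the end of the
      sequence, which may differ from what encode does at window edges.

```python
def pick_oligo(seq, start, skips):
    """Return the bases chosen from seq beginning at start.

    skips[k] is the number of bases skipped between chosen base k and k+1,
    so a coding level of i uses i-1 skips.
    """
    positions = [start]
    for skip in skips:
        positions.append(positions[-1] + 1 + skip)
    if positions[-1] >= len(seq):
        return None  # oligo would run off the sequence
    return "".join(seq[p] for p in positions)


def count_window(seq, window_start, window_size, skips, site_shift=1):
    """Count the oligos found at each coding site within one window."""
    counts = {}
    for start in range(window_start, window_start + window_size, site_shift):
        oligo = pick_oligo(seq, start, skips)
        if oligo is not None:
            counts[oligo] = counts.get(oligo, 0) + 1
    return counts
```

      For the sequence xyz, skips of [0] pick xy and skips of [1] pick xz,
      matching the parameters '2 : 0' and '2 : 1' described above.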

documentation
      delman.use.encode, delman.use.aligned.books

author
      gary stormo

bugs
      none known
*)
(* end module describe.encode *)
version = 1.28;  (* of encode.p 1991 Jan 11
(* begin module describe.encsum *)
(*
name
      encsum: sum of the vectors of encoded sequences

synopsis
      encsum(encseq: in, sum: out, output: out)

files
      encseq: the file of encoded sequences; this is the output of
            the program 'encode'
      sum: the output of this program
      output: for messages to the user

description
      this program takes as input a file of encoded sequences, from the
      encode program, and sums the individual sequence vectors into one
      vector of their sums.  this is useful for doing histograms or
      compositions of many sequences.
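
      The summation itself can be sketched in Python, assuming each
      encoded sequence has already been read into a list of integers of
      equal length (the encseq file format is not handled here and the
      function name is hypothetical):

```python
def sum_vectors(vectors):
    """Add equal-length integer vectors element by element."""
    total = [0 for _ in vectors[0]]
    for vec in vectors:
        for i, value in enumerate(vec):
            total[i] += value
    return total
```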

see also
      encode

author
      gary stormo

bugs
      none known
*)
(* end module describe.encsum *)
version = 1.20; (* of encsum.p 1991 apr 3
(* begin module describe.epsclean *)
(*
name
   epsclean: clean an eps file

synopsis
   epsclean(input: in, output: out)

files
   input: eps file from microtex scanner
   output: cleaned eps file ready to print

description

   1. On the Mac:  scan an image into the mac using the microtex B&W scanner
   software.  Use:
      EPS format
      600 dpi
      2x2 high contrast
      line art

   2. Use the fetch program to move it to Unix with parameters:
      binary
      rawdata

   3. On Unix, the file comes over with all control-M's instead of newlines (ie
   instead of ascii 012).  However, if all control-M's were converted to
   newlines, the image would be wrecked because it apparently contains control
   M's!

   4. Run the file through this program to make it useable.  This program works
   by finding the second occurrence of the word 'dopic'.  Before this point,
   control-M's are converted to ascii 012, after this point they are left alone.
   The first occurrence of 'dopic' is the definition of the "do picture" routine
   that defines the image.  The second one calls the routine, so the raw
   data follow that point.
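
   Step 4 can be sketched in Python on a byte string.  This is an
   illustration, not epsclean's code: the name clean_eps is hypothetical,
   and two points are assumptions, namely that conversion stops exactly at
   the start of the second 'dopic' and that a file with fewer than two
   occurrences is converted throughout.

```python
def clean_eps(data: bytes) -> bytes:
    """Convert control-M to newline up to the second 'dopic' only."""
    first = data.find(b"dopic")
    second = data.find(b"dopic", first + 1) if first != -1 else -1
    if second == -1:
        # Assumption: no raw image data present, so convert everything.
        return data.replace(b"\r", b"\n")
    head, tail = data[:second], data[second:]
    return head.replace(b"\r", b"\n") + tail  # raw data in tail is untouched
```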

examples

documentation

see also

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.epsclean *)
version = 2.09; (* of epsclean.p 1993 January 26
(* begin module describe.ev *)
(*
name
      ev: evolution of binding sites

synopsis
      ev(list: out, all: inout, evp: in, output: out)

files
      list: a record of the evolutionary events that occurred
         See the description of evp for the kinds of data that can be printed.
         The data may be graphed as desired using the xyplo program.
         The first few lines of the file form an informative header.
         All of these lines begin with an asterisk, '*'.
         The data themselves are organized into individual lines
         broken by spaces into a set of tokens.  These are described in
         the header.
      all: all the variables, genome sequences and genetic structure
         to allow continuation of the evolution
      evp: parameters to control the program, one per line:
         Number of creatures (c, integer).
         Number of potential binding sites per creature (G, integer).
         Number of sites per creature (gamma, integer).
         Width of the recognizer in bases (width, integer).
         Bases per integer of the recognizer weights (bpi, integer).
         Mutation rate in hits per creature per generation (mu, integer).
         Seed: a real number between 0 and 1 used to start the random
            number generator.  The date and time is used if this
            number is outside 0 to 1.
         Cycles: number of additional generations to run (cycles, integer).
         Display interval:  for example, 10 means every 10th generation.
         Display control: the first 7 characters on the line control
            the kind of data printed to the list file:
                a = display average number of mistakes and the standard
                    deviation for the population.
                c = display changes in the number of mistakes.
                    The current Rsequence is given if r (below) is turned on.
                    This allows graphs of Rsequence vs mistakes to be made.
                g = display genomic uncertainty, Hg.  If this deviates much
                    from 2.0, then the simulation is probably bad.
                i = display individuals' mistakes
                o = display orbits: information of individual sites is shown
                r = display information (Rsequence, bits)
                s = current status is printed to the output file.
            These may be in any order.   Any other characters (eg, blanks) are
            ignored.
         Selecting: boolean. If true, then the organisms are sorted
            by their mistakes.   If false, then the organisms are randomly
            sorted.  Normally this should be 'true', but it does allow one
            to switch the selection off suddenly and watch things like no
            evolution and the decay of existing patterns by entropy increase.
            Selecting is true unless the first character on the line is an 'f'.
         StorageFrequency:  The frequency (every so many generations) with which
            to store a copy of everything in the all file.  If the computer
            crashes part way through a long run, then the run can be continued
            from the last storage.  Of course, a storage is always made
            at the end of the evolution.
      output: messages to the user, including warnings about conditions.
         If the display control in evp (see above) includes 'o', then
         the generation number and the range of mistakes are given.
         If the display control includes 'a', then the mean and standard
         deviation of the mistakes are also given.

description
      A population of evolving creatures is simulated.  Each creature
      consists of a genome made of the 4 bases.  All creatures have
      a certain number of binding sites, and the recognizer for the
      sites is encoded by a gene in each genome.  The genomes are completely
      random at first.  The recognizer of each creature is translated
      from the gene form to a perceptron-like weight matrix.  This
      matrix is scanned across the genome.  The number of mistakes
      made is counted.  There are two kinds of mistake:
          how many times the recognizer misses a real site
      and
          how many times a non-site is detected by the recognizer.
      These are weighted equally.  (If they were weighted differently
      it should affect the rate but not the final product of the simulation.)
      All creatures are ranked by their number of mistakes.  The half of the
      population with the most mistakes dies; the other half reproduces
      to take over the empty positions.  Then all creatures are
      mutated and the process repeats.
         The integer weights of the recognizer are stored as base 4
      numbers in twos complement notation.  a=00, c=01, g=10, t=11.
      If 'bases per integer' were 3, then aaa encodes 0, acg is 6, etc.
      txx and gxx (where x is any base) are negative numbers; ttt is -1.
         The threshold for recognition of a site is encoded in the genome
      just after the individual weights.  It is encoded by one integer.
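
      The weight encoding described above can be checked with a short
      Python sketch.  It only illustrates the base 4 twos complement rule
      stated in the text; the names BASE_VALUE and decode_weight are
      hypothetical and are not taken from the ev source.

```python
BASE_VALUE = dict(a=0, c=1, g=2, t=3)


def decode_weight(bases):
    """Decode a string of bases as a base 4 twos complement integer."""
    value = 0
    for base in bases:
        value = value * 4 + BASE_VALUE[base]
    # The top half of the range represents the negative numbers.
    half = 4 ** len(bases) // 2
    if value >= half:
        value -= 2 * half
    return value


for codon in ("aaa", "acg", "ctt", "gaa", "ttt"):
    print(codon, decode_weight(codon))
```

      With 3 bases per integer this gives aaa = 0, acg = 6, ctt = 31,
      gaa = -32 and ttt = -1, so the representable range is -32 to 31.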

documentation
      for information calculations, see:
      Schneider et al, J. Mol. Biol. 188: 415-431 (1986)

examples

1. A lovely evolution can be had with the following evp:
*******************************************************************************
   32 NUMBER OF CREATURES 
 1024 NUMBER OF BASES PER CREATURE, G
   64 NUMBER OF SITES PER CREATURE, gamma
    6 WIDTH OF THE RECOGNIZER IN BASES
    5 BASES PER INTEGER OF THE RECOGNIZER 
    1 MUTATION RATE IN HITS PER CREATURE PER GENERATION 
 0.50 SEED FOR THE RANDOM NUMBER GENERATOR
40000 CYCLES
   10 DISPLAY INTERVAL
cgrs567 a=av, c=change, i=indivls, g=Hg, r=Rs, o=orbit, s=status
true  SELECTING
*******************************************************************************

The list file may be plotted with this parameter file for xyplo, the xyplop:
*******************************************************************************
1 2       zerox zeroy         graph coordinate center
x 0 40000 max       (character, real, real) if zx='x' then set xaxis
y -1 6.00 zy min max          (character, real, real) if zy='y' then set yaxis
10 28     xinterval yinterval number of intervals on axes to plot
7 7       xwidth    ywidth    width of numbers in characters
0 2       xdecimal  ydecimal  number of decimal places
6 6       xsize     ysize     size of axes in inches
generation                                  
Rsequence (bits) | Hg (bits) near 2 | mistakes/gamma are connected circles
n         zc                  if zc='c' then a crosshairs put on zero of x and y
n 2       zxl base            if zxl='l' then make x axis log to the given base
n 2       zyl base            if zyl='l' then make y axis log to the given base
          ---------------------------------------------------------------------
1 3       xcolumn   ycolumn   columns of xyin that determine plot location
2         symbol column       the xyin column to read symbols from
0  0      xscolumn  yscolumn  columns of xyin that determine the symbol size
          ---------------------------------------------------------------------
          symbol to plot      'c'=circle, 'b','d'=box, 'x', '+', 'I', 'f', 'g'
r         symbol flag         character in xyin that indicates that this symbol
0.05      symbol sizex        side in inches on the x axis of the symbol.
0.05      symbol sizey        as for the x axis, get size from yscolumn
cl 0.05   connection (example for connection is c- 0.05 for dashed 0.05 inch)
n  0.05   linetype  size      linetype l.-in and size of dashes or dots
          ---------------------------------------------------------------------
          symbol to plot      'c'=circle, 'b','d'=box, 'x', '+', 'I', 'f', 'g'
g         symbol flag         character in xyin that indicates that this symbol
0.05      symbol sizex        side in inches on the x axis of the symbol.
0.05      symbol sizey        as for the x axis, get size from yscolumn
cl 0.05   connection (example for connection is c- 0.05 for dashed 0.05 inch)
n  0.05   linetype  size      linetype l.-in and size of dashes or dots
          ---------------------------------------------------------------------
c         symbol to plot      'c'=circle, 'b','d'=box, 'x', '+', 'I', 'f', 'g'
c         symbol flag         character in xyin that indicates that this symbol
0.05      symbol sizex        side in inches on the x axis of the symbol.
0.05      symbol sizey        as for the x axis, get size from yscolumn
cl 0.05   connection (example for connection is c- 0.05 for dashed 0.05 inch)
n  0.05   linetype  size      linetype l.-in and size of dashes or dots
          ---------------------------------------------------------------------
.
          ---------------------------------------------------------------------
. 0 6 0.10
. 0 5 0.10
- 0 4 0.10
. 0 3 0.10
. 0 2 0.10
. 0 1 0.10
l 0 0 1
*******************************************************************************
Note that dotted lines are placed across the graph for each bit, but a
dashed line is put for Rfrequency (= 4 bits).  The vertical axis represents
the three kinds of data in the list file.
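
The 4 bit line and the expected Hg of 2 bits can be reproduced with the
Python sketch below.  It assumes the definitions in the paper cited under
documentation: Rfrequency is log2(G/gamma) and Hg is the uncertainty of the
base composition.  Any small sample correction that ev may apply is not
included, and the function names are hypothetical.

```python
import math


def rfrequency(genome_size, sites):
    """Bits needed to pick out 'sites' positions among 'genome_size'."""
    return math.log2(genome_size / sites)


def genomic_uncertainty(base_counts):
    """Hg in bits per base from the counts of a, c, g and t."""
    total = sum(base_counts)
    hg = 0.0
    for count in base_counts:
        if count > 0:
            p = count / total
            hg -= p * math.log2(p)
    return hg


print(rfrequency(1024, 64))
print(genomic_uncertainty([256, 256, 256, 256]))
```

For the evp above (G = 1024, gamma = 64) Rfrequency is 4 bits, and an
equiprobable genome has Hg = 2 bits.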

see also
      evd, xyplo

author
      Thomas Dana Schneider

bugs
      none known

*)
(* end module describe.ev version = 2.50; (@ of ev 1988 oct 6 *)
version = 3.21; (* of ev.p 1989 December 14
(* begin module describe.flag *)
(*
name
      flag: points out excessively long lines

synopsis
      flag (fin: in, fout: out, output: out)

files
      fin: a text file; typically pascal source code.
      fout: the first line of fin followed by a list of the lines which are
            too long.  the list gives the line number of each line, the line
            itself and a flag on the last acceptable character.
      output: the number of lines in fin which contain more than 80 characters.

description
      during transportation of files from one computer to another, lines
      longer than 80 characters are often truncated to 80 characters to make
      'card images' on the tape.  this byzantine practice is left over
      from the days when cards were the state-of-the-art in talking to
      computers.  since the tape does not know what a 'card image' is,
      and since cards are going the way of the passenger pigeon,
      this is like equipping a nuclear oil tanker with oars.  maybe
      someday things will be different, but until then, flag exists
      and will detect long lines, allowing one to fix a program or file
      before transportation.  note: trailing blanks on each line are
      ignored.
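
      The check can be sketched in Python as shown below.  This is an
      illustration only, not the flag source; MAXLINE stands in for the
      constant 'maxline', and the names long_lines and marker are
      hypothetical.

```python
MAXLINE = 80  # stands in for the Pascal constant 'maxline'


def long_lines(lines, maxline=MAXLINE):
    """Return (line number, text) for lines longer than maxline."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        # Trailing blanks do not count toward the length.
        text = line.rstrip("\n").rstrip(" ")
        if len(text) > maxline:
            flagged.append((number, text))
    return flagged


def marker(maxline=MAXLINE):
    """A flag line with '^' under the last acceptable character."""
    return " " * (maxline - 1) + "^"


sample = ["x" * 80 + "   \n", "y" * 81 + "\n", "short\n"]
for number, text in long_lines(sample):
    print(number)
    print(text)
    print(marker())
```

      Only the second sample line is reported: the first is exactly 80
      characters once its trailing blanks are ignored.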

author
      john hoffhines and tom schneider

bugs
      none known

technical notes
      the constant 'maxline' defines the number of characters accepted
      on each line.  we recommend that maxline be set to 80 because this
      is the standard number of characters on a punched card.

*)
(* end module describe.flag *)
const version = 1.14; (* of flag.p 1991 Feb 20
(* begin module describe.frame *)
(*
name
      frame: evaluator of potential reading frames

synopsis
      frame(test: in, norm: in, result: out, output: out)

files
      test: encoded vectors of the sequences to be tested for reading frames
      norm: encoded vector of the sequences used as the standard for testing
      result: the results of the tests; each sequence from test is evaluated
         for each of the three possible reading frames
      output: for messages to the user

description
      this calculates correlation coefficients between the standard and each
      of the three possible frames of the test sequences.  the sequences must
      be encoded so that each of the oligos (of whatever length is desired)
      are counted in each of the three frames.

examples
      the files framet and framen are examples of test and norm

documentation
      delman.use.frame

see also
      encode

author
      gary d. stormo

bugs
      none known
*)
(* end module describe.frame *)
version = 1.16;  (* of frames 1986 dec 9
(* begin module describe.frese *)
(*
name
   frese: frequency table to sequ

synopsis
   frese(fresep: in, sequ: out, output: out)

files
   fresep: input frequency table (parameters to the program)
     a set of integers, 5 per line, representing first the coordinate
     and then the numbers of a,c,g and t to use.
   sequ: sequences which could have produced the fresep frequencies,
     ready for input to makebk.
   output: messages to the user

description
   Frese converts a table of frequencies to a set of raw sequences
   so they may be analyzed.  The raw sequences have the same frequencies,
   but, of course, are not the same as the original sequences.
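
   One way to build such sequences is sketched below in Python.  It is an
   assumption about the approach, not the frese algorithm itself: each
   position is expanded into a column holding the requested number of each
   base, and sequence k takes the k-th base of every column.  The name
   table_to_sequences is hypothetical.

```python
def table_to_sequences(rows):
    """rows: list of (coordinate, a, c, g, t) integer tuples."""
    columns = []
    for coord, na, nc, ng, nt in sorted(rows):  # order by coordinate
        columns.append("a" * na + "c" * nc + "g" * ng + "t" * nt)
    depth = len(columns[0]) if columns else 0
    if any(len(col) != depth for col in columns):
        raise ValueError("every position needs the same total count")
    # Read across the columns to form one sequence per row of the stack.
    return ["".join(col[k] for col in columns) for k in range(depth)]


table = [(-1, 2, 0, 1, 0), (0, 0, 3, 0, 0), (1, 1, 0, 0, 2)]
for seq in table_to_sequences(table):
    print(seq)
```

   For this three position table the sketch gives aca, act and gct, which
   have the tabulated base counts at every coordinate.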

examples

documentation

see also
   makebk.p

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.frese *)
version = 1.01; (* of frese.p 1991 November 30
(* begin module describe.gap *)
(*
name
      gap: gaps in aligned listing of a book

synopsis
      gap(inst: in, book: in, gapp: in, data: out, output: out)

files
      inst: delila instructions of the form 'get from 56 -5 to 56 +10;'
         (This file may be empty, in which case the sequences will be
         aligned by their 5' ends.)
      book: the book generated by delila using inst
      gapp: parameters to control the program.  If empty, the range of the
         instructions is used.  Otherwise,
         1. The first line contains two integers
         defining the range to find gaps in.  This allows one to have a wide
         alignment, but look only at a portion.  (This is equivalent to the
         alist display range.)
         2. If the first character of the second line is 'p' the piece
         information is given in the data list.
         3. minimum number of gaps to report
         4. modulus: integer. Gaps are reported if (gap mod modulus) = 0.
            For example, if modulus = 5, only gaps whose length is a
            multiple of 5 are reported.
            To get all gaps, use modulus = 1.
      data: the gap listing.  First column is the number of gaps.
         If the sequences are numbered, the second column is the sequence
         number.
         If the display is on (2nd parameter), then the piece name is given,
         followed by the coordinate of the zero base.
      output: messages to the user

description
      Gap is useful for determining the distribution of gaps in an aligned set.
      The pieces in the book are aligned according to the instructions in
      file inst, and listed in the data file.  Each piece is identified.

example

documentation

see also
   alist.p

author
      Thomas D. Schneider

bugs
   as in alist.p

technical notes

*)
(* end module describe.gap *)
version = 1.08; (* of gap.p 1993 Jan 26
(* begin module describe.genhis *)
(*
name
      genhis: general histogram plotter

synopsis
      genhis(data: in, histog: out, genhisp: in, output: out)

files
      data:  File of numbers to be histogrammed.  Header lines that begin with
         '*' may be copied from this file to 'histog' or may be skipped.  The
         column from which to read the data may also be specified.  See the
         description of the file 'genhisp' to see how to do this.  Once the
         data region has begun, (that is, there is at least one non '*' line),
         then lines that begin with '*' are also skipped.
      histog: the output histogram.  contains the header lines copied from
         file 'data', plus data about the numbers (min, max, mean, variance
         and st. dev.), and the plot.  may also contain a standard plot.
      genhisp: parameter input file.  this is used to change any of the
         parameters from default values.  any may be changed and they can
         be specified in any order.  the first character on a line tells
         what parameter is to be set, the other information sets it.  the
         parameters that can be changed, and their line codes:
         h - sets header reading;  this is followed by two integers, the
            first specifying the number of lines to copy and the second the
            number of lines to skip;  if the first number is <0 those lines
            beginning with '*' are copied;  default is -1 0.
         c - sets the column of data that is to be analyzed and plotted;
            the default is column 1;  note: a column is any string
            of nonblank characters; columns are separated by blanks;
         p - sets the standard plot;  poisson and gaussian plots are
            available and are specified by following the p by either p or g.
         x - sets the x-axis scale; this is to be followed by either an n
            or an s, and then a number;  if n, then the number of intervals
            on the x-axis is set; if s, then the size of intervals is set;
            default is to set the number of intervals to constant 'defslots'.
         r - sets the range of data to be plotted;  this is followed by two
            numbers which specify the subrange of the data for which the plot
            is desired;  default is to plot all the data.
      output: for messages to the user.

description
      This program takes numerical data from a file and plots a histogram of
      those data.  It also calculates the min, max, mean and variance of the
      data.  If desired, the user may get a standard plot, based on the mean
      and variance, plotted along with the data.  The user may specify the size
      or number of intervals on the x-axis.  The y-axis is automatically scaled
      to fit on a page.  The scaling factor is reported to the user.
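
      The statistics and the binning can be sketched in Python.  This is
      an illustration, not the genhis source.  Two assumptions are made:
      the variance uses the n - 1 divisor, and the largest value is
      placed in the last interval.  The function names are hypothetical.

```python
def summarize(data):
    """Return (min, max, mean, variance) of a list of numbers."""
    n = len(data)
    mean = sum(data) / n
    variance = 0.0
    if n > 1:
        # Assumption: sample variance with the n - 1 divisor.
        variance = sum((x - mean) ** 2 for x in data) / (n - 1)
    return min(data), max(data), mean, variance


def histogram(data, intervals):
    """Count the data in 'intervals' equal slots between min and max."""
    lo, hi = min(data), max(data)
    size = (hi - lo) / intervals or 1.0
    counts = [0] * intervals
    for x in data:
        slot = min(int((x - lo) / size), intervals - 1)
        counts[slot] += 1
    return counts


data = [1, 2, 2, 3, 3, 3, 4, 4, 5]
print(summarize(data))
for slot, count in enumerate(histogram(data, 4)):
    print(slot, "#" * count)
```

      Rerunning histogram with a different number of intervals shows how
      the shape of the plot depends on the choice of x-axis scale.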

example
      try file datat7
      with genhisp of
         x n 20
         p g

author
      Gary Stormo

bugs
      Try different x axis intervals:  regular spikes can be data artifacts!

technical notes
      The constant 'pageheight' is used to set the scaling factor so that
      the plots do not exceed the size of a page.
*)
(* end module describe.genhis *)
version = 1.67;  (* of genhis.p 1992 November 16
(* begin module describe.genmod *)
(*
name
      genmod: genbank access modules

synopsis
      genmod(entries: in, output: out) 
  
files 
      entries: a set of genbank entries for a given organism
      output: messages to the user and tests of the modules
  
description 
      these are modules containing procedures to access genbank
      entries.

author
      thomas d. schneider 
  
bugs
      none known  
  
*)
(* end module describe.genmod *)
version = 1.33; (* of genmod, 1986 feb 4
(* begin module describe.genpic *)
(*
name
      genpic: convert genhis output to pic input

synopsis
      genpic(histog: in, genpicp: in, picin: out, output: out)

files
      histog: the output of the genhis program
      genpicp: parameters to control the histogram are one per line.
            if they are missing, defaults are used. all are in inches.
         boxwidth; width of the histogram boxes.
         boxheight; height of the histogram boxes.
         intervalsize; the space for the interval number.
         histogramvalue; the space for the histogram value.
         boxshift; how much to shift the boxes up relative to the numbers.
         ifield: number of characters devoted to the interval
         idecimal: number of characters devoted to the interval's decimal places
         nfield: number of characters devoted to the number of numbers
      picin: the data in histog are converted to PostScript
      output: messages to the user.

description
      The genhis program generates a histogram in simple
      character format.  The program genpic converts this
      simple histogram into PostScript commands.  Therefore, one can
      embed output from genhis in the text of a paper.

author
      Thomas D. Schneider

bugs
      none known

technical note
       defaults for the parameters are in module genpic.const.

*)
(* end module describe.genpic *)
version = 2.01; (* of genpic.p 1992 November 16
(* begin module describe.gentst *)
(*
name
   gentst: test random generator

synopsis
   gentst(gentstp:in, data: out, output: out) 

files 
   gentstp: parameter file controlling the program.
      Three numbers, one per line:
         seed: random seed to start the process
         total: the number of numbers to generate
         components: the number of random numbers between 0 and 1 to add
            together to generate the total
   data: the input file for genhis.  this is a set of numbers which
      should have gaussian distribution if the random number generator
      is a reasonable one.  It will be N(0,1), a normal distribution
      with mean 0 and standard deviation 1.
   genhisp: control file for genhis
   output: messages to the user

description 
   test of a random number generator by creating a gaussian
   distribution of numbers for plotting by genhis
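
   The method can be sketched in Python.  The sketch uses Python's own
   generator rather than the one tested by gentst, and the rescaling is an
   assumption based on the mean (1/2) and variance (1/12) of a uniform
   number between 0 and 1.  The function names are hypothetical.

```python
import math
import random


def gaussian_sample(rng, components):
    """Sum uniform(0,1) numbers and rescale to mean 0, st. dev. 1."""
    total = sum(rng.random() for _ in range(components))
    return (total - components / 2.0) / math.sqrt(components / 12.0)


def generate(seed, total, components):
    """Return 'total' approximately N(0,1) numbers."""
    rng = random.Random(seed)
    return [gaussian_sample(rng, components) for _ in range(total)]


values = generate(0.5, 2000, 100)
mean = sum(values) / len(values)
variance = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
print(round(mean, 2), round(variance, 2))
```

   The mean should come out near 0 and the variance near 1; a histogram
   of the values should look gaussian.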

example
  seed := 0.5;
  total := 10000;
  components:= 100;

see also
   tstrnd, genhis

author
   thomas d. schneider

bugs
   none known
  
technical notes 
   the constant n in procedure randomtest determines how many times
   the random number generator will be called in a series of tests.  if n
   is small, then the test will be poor, if it is large then the test may
   take a long time.

*)
(* end module describe.gentst *)
version = 3.12; (* of gentst.p 1993 Jan 27
(* begin module describe.helix *)
(*
name
   helix: find helices between sequences in two books

synopsis
   helix(xbook: in, ybook: in, hlist: out, helixp: in, output: out)

files
   xbook: a book from the delila system
   ybook: a book from the delila system
   hlist: a list of helices between pieces in xbook and ybook.
      the first line is the program identification
      the second two lines are the x and y book titles
      the third line gives the minimum length or the maximum energy
          of helixes recorded
      the fourth line states whether or not g-u pairs are allowed
      the fifth line states whether or not energies are printed
      the following lines are the helices
      breaks between sequences are indicated.
   helixp: parameters that control the helix list.
      if helixp is empty, default values are used.
      otherwise, the file must contain three lines:
      1. if this number is a positive integer, it specifies the minimum length
         in base pairs of the helixes written in hlist.  if it is a negative
         real number, it specifies the maximum energy in kcal of the helixes
         written.
      2. if the first character is a "g" then g-u pairs are allowed,
         otherwise not.
      3. if the first character is an "e" then the energy of each helix
         will be written in hlist.
   output: messages to the user

description
   All sequences in xbook are compared to all sequences in ybook.
   The complementary helices (of some minimum length and longer or of
   some maximum energy or less) are listed in hlist by the 5' ends of the
   helix on both sequences.  This information, along with the length of
   the helix, determines the location of the helix.  One can allow g-u
   pairing if desired.  If the helix lengths desired are very short,
   it is better to use dotmat (see "technical notes" below).
      The new Rules are now used to calculate the helix.
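
   The length-based search can be sketched in Python.  This is a simple
   illustration, not the helix algorithm: energies are not calculated,
   positions are counted from 0, and g-u pairs are written with t because
   the books hold DNA.  The name find_helices is hypothetical.

```python
PAIRS = set(["at", "ta", "cg", "gc"])
GU_PAIRS = set(["gt", "tg"])  # g-u pairing, written with t


def find_helices(x, y, min_length=3, allow_gu=False):
    """Return (x5, y5, length) for maximal antiparallel helices.

    x5 and y5 are the positions of the 5' end of the helix on x and y.
    """
    allowed = PAIRS | GU_PAIRS if allow_gu else PAIRS

    def pairs(i, j):
        return 0 <= i < len(x) and 0 <= j < len(y) and x[i] + y[j] in allowed

    helices = []
    for i in range(len(x)):
        for j in range(len(y)):
            # Start only where the run cannot be extended backwards.
            if not pairs(i, j) or pairs(i - 1, j + 1):
                continue
            length = 0
            while pairs(i + length, j - length):
                length += 1
            if length >= min_length:
                helices.append((i, j - length + 1, length))
    return helices


print(find_helices("acgga", "tccgt"))
```

   The two sample sequences are fully complementary, so a single helix of
   5 base pairs is reported, starting at position 0 on both.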

documentation
   delman.use.comparison

   J. V. Maizel, Jr. and R. P. Lenk PNAS 78: 7665-7669 (1981)

   Tinoco et al. Nature New Biology vol 246 pp 40-41, 1973.

   S. M. Freier, R. Kierzek, J. A. Jaeger, N. Sugimoto,
   M. H. Caruthers, T. Nelson, and D. H. Turner,
   "Improved free-energy parameters for predictions of RNA duplex stability"
   PNAS 83: 9373-9377 (1986)

see also
   matrix, dotmat, keymat

author
   Thomas D. Schneider

bugs
   GU pairs and bulges are not done using the new data.
   An option for pair-wise (rather than multiplicative) comparisons
   of sequences would be nice.

technical notes
   The shortest length helix ever recorded in hlist is determined
   by the constant absminlength.  This overrides the parameters.

*)
(* end module describe.helix *)
      version = 3.23; (* of helix 1990 December 21 *)
(* begin module describe.hexbin *)
(*
name
   hexbin: convert hex to binary

synopsis
   hexbin(input: in, output: out)

files
   input:  hexadecimal representation of an image, PostScript
   shape: First line contains two characters to skip and then
      two integers, the width and height of the image.
   output: binary representation of an image

description
   To allow one to work with a PostScript hex image in binary format
   it is converted.
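
   The conversion can be sketched in Python.  The sketch is not the hexbin
   source.  It assumes a 1 bit per sample image whose rows are padded to
   whole bytes, as PostScript image data are, and it takes the width and
   height as arguments instead of reading them.  The names are hypothetical.

```python
def hex_to_bits(hex_text):
    """Turn hexadecimal text (white space ignored) into a string of bits."""
    digits = "".join(hex_text.split())
    return "".join(format(int(d, 16), "04b") for d in digits)


def image_rows(hex_text, width, height):
    """Split the bits into rows of 'width' samples, skipping row padding."""
    bits = hex_to_bits(hex_text)
    stride = (width + 7) // 8 * 8  # bits per padded row
    return [bits[r * stride: r * stride + width] for r in range(height)]


for row in image_rows("f8 08", 5, 2):
    print(row)
```

   The two bytes f8 and 08 hold two rows of 5 samples: 11111 and 00001.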

examples

documentation
   PostScript red book p. 170

see also

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.hexbin *)
version = 1.07; (* of hexbin.p 1991 October 17
(* begin module describe.hist *)
(*
name
      hist: make a histogram of aligned sequences.

synopsis
      hist(inst: in, book: in, hst: out, histp: in, output: out);

files
      inst: the instructions which generated the book, used to
            align the sequences; the format of the instructions must be
            'get from # -x to # +y;' - the alignment is done on the base #.
            if this file is empty the sequences are aligned by their 5
            prime ends.
      book: the sequences, not longer than 'dnamax' (see technical note);
      hst: the histogram table, giving the position occurrence
            of all oligonucleotides from oligomin to oligomax  in length
            (see file histp).  note - if the histogram is wider than can
            be printed on a page, use the program split to print hst.
      histp: to set the length of oligonucleotides searched for; contains two
            integers, one per line, the first specifying oligomin
            (the length of the shortest oligonucleotide which is searched for),
            the second oligomax (the longest oligonucleotide searched for,
            which cannot be greater than 4);
            note: if oligomin is zero, then the bases are counted.
      output: for error messages.

description
      makes a histogram of the occurrences of oligonucleotides at positions
      relative to some aligned base.  this is done for all oligonucleotides
      with lengths from  'oligomin' to 'oligomax', set by file histp.
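
      The counting can be sketched in Python for one oligonucleotide
      length.  This is an illustration, not the hist source: it assumes
      the sequences are already aligned strings sharing the same index
      for the aligned base, and the name positional_counts is
      hypothetical.

```python
def positional_counts(sequences, zero_index, oligo_length):
    """Count each oligo by its position relative to the aligned base.

    Returns a dictionary mapping (position, oligo) to a count.
    """
    counts = dict()
    for seq in sequences:
        for start in range(len(seq) - oligo_length + 1):
            key = (start - zero_index, seq[start:start + oligo_length])
            counts[key] = counts.get(key, 0) + 1
    return counts


aligned = ["acgt", "aagt", "ccgt"]
table = positional_counts(aligned, 1, 2)
for key in sorted(table):
    print(key, table[key])
```

      In the sample, the dinucleotide gt occurs three times at position
      +1 and cg occurs twice at position 0.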

see also
      split, align, achsq

author
      gary stormo

bugs
      none known

technical note
      the constant 'dnamax'  from the module book.const may be too
      large for efficient use of this program.  if you do not expect to do
      histograms on aligned sequences of greater than, say, 120 nucleotides
      you can go into the source program (hists) and change 'dnamax' before
      compiling.
*)
(* end module describe.hist *)
version = 4.24; (* of hist.p 1992 Nov 9
(* begin module describe.histan *)
(*
name
      histan: histogram analysis.

synopsis
      histan(hst: in, cmp: in, chisq: out, output: out)

files
      hst: the histogram input; is the output of program hist;
      cmp: a composition input; is the output of program comp;
      chisq: the chi-squared analysis output;
      output: for user messages;

description
      histan determines the chi-squared values at each position for a
      histogram.  the observed values come from the histogram.  if a
      composition is provided the expected values come from that, otherwise
      the expected values assume equal frequencies of all bases.  the
      chisquared is calculated for each level of oligonucleotide (i.e.,
      monos, dis, tris) for which the histogram data exists.
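
      The calculation at one position can be sketched in Python.  The
      sketch assumes the usual chi-squared sum of (observed - expected)
      squared over expected; whether histan uses exactly this form is
      not stated here.  The name chi_squared is hypothetical.

```python
def chi_squared(observed, expected_freq=None):
    """observed: counts of a, c, g and t at one position.

    expected_freq: composition frequencies; equal frequencies if omitted.
    """
    n = sum(observed)
    if expected_freq is None:
        expected_freq = [1.0 / len(observed)] * len(observed)
    chi2 = 0.0
    for obs, freq in zip(observed, expected_freq):
        expected = n * freq
        chi2 += (obs - expected) ** 2 / expected
    return chi2


print(chi_squared([10, 10, 10, 10]))
print(chi_squared([20, 0, 0, 0]))
print(chi_squared([8, 2, 2, 8], [0.4, 0.1, 0.1, 0.4]))
```

      Equal counts give 0 and a position with 20 a's and nothing else
      gives 60.  Counts that match a supplied composition give a value
      at or very near 0.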

see also
      hist, comp

author
      gary stormo

bugs
      none known
*)
(* end module describe.histan *)
version = 4.21;  (* of histan.p 1992 Nov 9
(* begin module describe.indana *)
(*
name
      indana: analysis of an index

synopsis
      indana(ind: in, ana: out, subind: out, indanap: in, output: out)

files
      ind: an index produced by the index program.  it must not be a
         teaching index.
      ana: a histogram of the similarities of the index along with the
         mean, standard deviation and frequency distribution of the
         similarities.
      subind: portions of the index selected by the parameters in indanap.
         pairs (or adjacent sets) of lines of the index are printed.  the
         similarities of the original index are maintained.  this means that
         the first similarity of a pair (or set) is not a reflection of the
         similarity to the line above it.  the ones that are 'true'
         are marked with an asterisk [*].
      indanap: parameters to control indana, containing 3 lines:
         1. the lowest similarity to put into subind
         2. the highest similarity to put into subind

description
      An index is usually quite large, so it is difficult to look at by hand.
      Indana allows one to select a portion of the index by various criteria.
      The portion is called a "sub-index".  If the original book contained a
      number of highly similar oligonucleotides, then the histogram of
      similarities will show a spike of high similarities.

see also
      index

author
      Thomas Schneider

bugs
      none known

*)
(* end module describe.indana *)
version = 5.24; (* of indana.p 1992 September 18
(* begin module describe.index *)
(*
name
      index: make an alphabetic list of oligonucleotides in a book

synopsis
      index(book: in, ind: out, indexp: in, output: out)

files
      book: the book of sequences to be indexed
      ind: the alphabetized index to the book
      indexp: parameters to control index.  if this file is empty, then
         default values are used.  otherwise there may be 4 to 6 lines:
         first line: the number of bases in the alphabetizing window
         second line: the number of bases to print before the central window
         third line: the number of bases to print in the central window
         fourth line: the number of bases to print after the central window
         fifth line: if the first letter is a 't', then the index
            will run in a teaching mode.  do not use this mode on large books.
         sixth line: if the first letter is 'f' then only the first
            oligo of each sequence is used for alphabetization.  This produces
            a drastic reduction in the number of oligos sorted.  It is
            meant to be used to sort aligned sequences, to see if there are
            identical copies.
      output: messages to the user

description
      The index program generates an index of oligonucleotide fragments in a
      book.  The first base of the alphabetizing window is stepped across all
      bases of the sequence, creating a list of overlapping oligos and their
      positions.  The oligos are then sorted along with their positions.  Three
      printing windows allow one to look at bases before the first base, from
      the first base some distance on (this is not the alphabetizing window)
      and a third set even further 3'.  It is not inefficient to make the
      alphabetizing window large when there are no long repeats in the
      sequences (as when comparing two similar genes).  Following the printing
      windows are: the sequence number of the piece in the book (provided by
      delila); the position of the first base; the orientation of the oligo;
      and the similarity.  This last item is the number of bases that an oligo
      matches the previous oligo in the index, up to the point that they
      differ.  High similarity means a repeat.
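
      The sorting and the similarity column can be sketched in Python.
      This is an illustration for a single sequence, not the index
      source: positions are counted from 0, the printing windows and
      the orientation are left out, and the name build_index is
      hypothetical.

```python
def build_index(sequence, window):
    """Return sorted (oligo, position, similarity) entries.

    similarity is the number of leading bases shared with the
    previous oligo in the sorted list.
    """
    oligos = [(sequence[p:p + window], p) for p in range(len(sequence))]
    oligos.sort()
    entries = []
    previous = ""
    for oligo, position in oligos:
        similarity = 0
        for a, b in zip(oligo, previous):
            if a != b:
                break
            similarity += 1
        entries.append((oligo, position, similarity))
        previous = oligo
    return entries


for entry in build_index("acgacg", 3):
    print(entry)
```

      In the sample, the repeated acg appears as two adjacent entries,
      and the second one has similarity 3.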

examples
      The index can be used to locate restriction enzyme sites, by simply
      'looking them up'.  It has the advantage that when new enzymes become
      available, one does not need the computer to locate their sites.  Direct
      repeats will show up as high similarity oligos, and if one gets the
      complement along with a sequence in a book (using delila) then inverted
      repeats can be found.  The first column of the alphabetizing window
      contains all the mononucleotides; the first two, the di's, etc.

documentation
      L. J. Korn, C. L. Queen and M. N. Wegman, PNAS 74: 4401-4405 (1977)

see also
      search, helix, delila, delman.use.comparison

author
      Gary Stormo and Thomas Schneider

bugs
      One cannot sort more sequence than can fit into the computer memory.

technical notes
      The constant mapmax determines the maximum number of bases indexed.
*)
(* end module describe.index *)
version = 9.24; (* of index.p 1992 September 18
(* begin module describe.instal *)
(*
name
      instal: delila instruction alignment

synopsis
      instal(xbook: in, ybook: in, shlist: in, inst: in,
             rinst: out, sinst: out, list: out, output: out)

files
      xbook: a delila system book containing one piece used to
         align the ybook pieces.
      ybook: a delila system book containing pieces to be aligned by
         the piece in xbook.
      shlist: the output of the sorth program.  these sorted helixes
         must have been generated using helix(xbook,ybook,hlist,...) and
         then sorth(hlist,shlist,...,[.../1/...]).  that is, sorth must
         have been used to select only the top 1 helix from hlist.
      inst: the instructions used to generate ybook (or a comparable set
         of instructions that correspond to the ones for ybook).
      rinst: reduced instructions: those instructions from inst that
         have a unique helix in shlist.  in other words, inst is copied to
         rinst only for instructions that have a unique alignment.
         the other instructions are also copied, but surrounded by
         comment delimiters to neutralize them.  neutralized
         instructions are followed by a delila instruction that will
         maintain the original piece numbers.
      sinst: shifted instructions: the instructions written to rinst are
         realigned by the helixes of shlist.  the new alignment
         is the coordinate where the 5 prime end of the xpiece
         would lie on each y piece.
      list: progress of the realignment.
      output: messages to the user.

description
      the purpose of instal is to automatically realign a set of instructions.
      for example, if one has a set of instructions that define the
      initiation codon of some procaryotic ribosome binding sites,
      one may want another alignment by the shine and dalgarno.  to
      do this, the following steps are needed:
      1. the instructions (inst) are converted to a book (ybook) using
      delila.  instructions that define the 3 prime end of the 16s rrna are
      written and used to create xbook.
      2.  potential helixes between xbook and ybook are found with the
      helix program, making an hlist.
      3. the strongest helixes of the hlist are selected using the
      sorth program.  thus each piece of ybook has a unique (or no)
      helix associated with it.
      4. instal is used to alter the original instructions.  instructions
      (pieces) with no unique helix are neutralized by putting them in
      comments in both rinst and sinst.  in addition, the instructions
      of sinst are 'shifted' so they are aligned by the 5 prime end of xpiece.

see also
      delila, helix, sorth

author
      thomas dana schneider

bugs
      none known

technical notes
      the largest shift that is recorded is specified by the
      constant absshift

*)
(* end module describe.instal *)
version = 1.47; (* of instal 1985 may 5
(* begin module describe.kenbk *)
(*
name
      kenbk: make a book from a file of sequences provided by Kenn Rudd

synopsis
      kenbk(sequ: in, book: out, output: out, input: intty)

files
      sequ: file of sequences in Kenn Rudd's format.

	 That format consists of header lines and sequence.  A header line
	 starts with the '>' character.  This is followed by the sequence name
	 then one or more spaces.  Then the expected size of the sequence is
	 given.  The next line begins the sequence, in capital letters.  The
	 next sequence is indicated by another '>'.

      book: the output file containing the sequences and the necessary
         information  for it to be a proper book.  the user types in the
         required information after prompts from the program.

      output: for messages and queries to the user;

      input: interactive input.

description
      kenbk takes a file of raw sequences (sequ) in Kenn Rudd's format
      and converts that into a proper delila book format, getting the
      title of the book from the user.
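
      A minimal Python sketch of reading the sequ format described above.
      It is not the kenbk code: the function name parse_sequ and the record
      layout are assumptions, and the conversion of N to A follows the note
      in the bugs section.  Writing the book itself is not shown.

```python
def parse_sequ(lines):
    """Collect (name, expected size, bases) records from '>' headed entries."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header: '>' then the sequence name, spaces, expected size.
            fields = line[1:].split()
            records.append(
                {"name": fields[0], "expected": int(fields[1]), "bases": ""}
            )
        elif records:
            # Sequence line: Delila cannot handle N, so it becomes A.
            records[-1]["bases"] += line.replace("N", "A")
    return records


for record in parse_sequ([">demo 6", "ATGN", "CC"]):
    size_ok = len(record["bases"]) == record["expected"]
    print(record["name"], record["bases"], "size ok" if size_ok else "size differs")
```

      Comparing the collected length with the expected size from the header
      gives a simple check that no sequence lines were lost.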

see also
      rawbk

author
      Thomas Schneider

bugs
      Delila cannot handle N's so they are converted to A's.  This
      should not affect searches much.
*)
(* end module describe.kenbk *)
version = 1.13; (* of kenbk.p 1991 May 31
(* begin module describe.kenin *)
(*
name
   kenin: create Delila instructions from Kenn Rudd's all.gen instructions

synopsis
   kenin(allgen: in, keninp: in, inst: out, output: out)

files
   allgen:  gene instructions of the form provided by Kenn Rudd:
      piecename l1 l2 geneA l3 l4 geneB
      These are on a single line.  The first location is the start
      of the gene.  If l1>l2, the gene is on the complementary strand
   keninp: parameters to control the program.
      First line: FROM and TO of the output instructions.
      Second line: if the first character is 'b' then both open reading frames
	 and identified genes are written to inst.  If it is 'n' then open
	 reading frame (orf) genes are not made into instructions.  If it is
	 'o' then ONLY orfs are used.
   inst:  Delila instructions corresponding to allgen:
      piece piecename;
      get from l1 -FROM to l1 + TO;
   output: messages to the user

description
   This program converts Kenn Rudd's list of gene locations in his
   database into Delila instructions.

examples

documentation

see also

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.kenin *)
version = 1.24; (* of kenin.p 1991 Mar 23
(* begin module describe.keymat *)
(*
name
   keymat: keyed-matrices for helices between two books

synopsis
   keymat(xbook: in, ybook: in, hlist: in, kmlist: out,
          keymatp: in, output: out)

files
   xbook: a book from the delila system
   ybook: a book from the delila system.  If you want to look for
      structures in one sequence, then use the program copy to make
      a copy of xbook in ybook.
   hlist: the helix listing for xbook and ybook made by program helix
   kmlist: the matrices listed out.  Sequences from the x book are printed
      vertically, while those from the y book are horizontal.
      Depending on the parameters selected in keymatp, the helices are
      printed as either numbers representing the type of base pair,
      the actual base from the xbook, the actual base from the ybook,
      or a symbol representing the energy of the helix.
      If kmlist is wider than your printer, use the split program.
   keymatp: parameters to control the kmlist
      if keymatp is empty, default values are used.  Otherwise, keymatp
      must contain at least 3 lines.
         line 1: contains a positive integer - the minimum length helix to
            record from hlist into kmlist;
         or a negative real number - the maximum energy of a helix to
         record from hlist to kmlist.
         line 2: contains 2 positive integers greater than or equal to 1.
            These are the x and y scaling factors (respectively) which
            allow you to display large matrices in a small space by
            scaling them down.
         line 3: contains either 'n', 'x', 'y', or 'e',  which define what
             symbols will be printed for the helices in the kmlist.
            'n' - helices will be printed as a set of numbers:
                  1 = g-t bp    2 = a-t bp    3 = g-c bp
            'x' - helices will contain the base from the x-book sequence.
            'y' - helices will contain the base from the y-book sequence.
            'e' - a key symbol for the energy of each helix will be printed.
               The program will produce a table of energies and their
               corresponding key symbols.
               Note: the third parameter request is overridden when either
               scale factor is larger than 1.
         line 4: resolution of energy display.  This defines
                 the resolution of the matrix w/respect to
                 energies.  Used when line 3 is 'e'.
            'n' - numbers ('0' to '9') used for the keys
            'l' - letters ('a' to 'z') used for the keys
            'a' - letters and numbers ('a' to 'z' and '0' to '9') used
                  (not available)
   output: messages to the user

description
   Keymat produces a keyed-dot matrix for the two books.  The display
   can use numbers and letters to indicate the energy of various helixes.
   One major feature is the ability to compress large regions onto a page
   using scaling factors.
   Only helices of some length (or longer) or of some maximum energy (or less)
   are printed.  The helices are made using program helix.  
   This program was based on the matrix program.

documentation
   J. V. Maizel, Jr. and R. P. Lenk PNAS 78: 7665-7669 (1981)

see also
   helix, dotmat, matrix, split

author
   patrick r. roche

bugs
   If maximum energy is strongest helix, the program may object.
   The alphanumeric range does not work.  It bombs with a 'bus error'
   as it reads in the x piece.
*)
(* end module describe.keymat *)
version = 5.37; (* of keymat 1987 feb 16
(* begin module describe.lenin *)
(*
name
   lenin: convert a list of lengths into Delila instructions

synopsis
   lenin(lengths: in, leninp: in, finst: out, linst: out, output: out)

files
   lengths: The olength or slength file from dbinst.
      The file is expected to contain comment lines that start with '*'.
      These are followed by columns of LENGTH, FIRST-POSITION,
      LAST-POSITION and PIECE-NAME.
   leninp: parameters to control the program.
       First line: FROM and TO for the finst instructions
       Second line: FROM and TO for the linst instructions
   finst:  Delila instructions constructed from the lengths file
      according to the parameter file, and the FIRST-POSITION of the object
      or space.
   linst:  Delila instructions constructed from the lengths file
      according to the parameter file, and the LAST-POSITION of the object
      or space.
   output: messages to the user

description
   The program allows one to make a set of instructions that correspond
   to the ends of objects that exist in the GenBank entries.
   Dbinst does not do this; and it is easier to do it this way.

   For the finst file, the Delila instructions created are of the form:

   piece PIECE-NAME;
   get from FIRST-POSITION FROM.first to FIRST-POSITION TO.first;

   while for the linst file, the Delila instructions created are of the form:

   piece PIECE-NAME;
   get from LAST-POSITION FROM.last to LAST-POSITION TO.last;
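
   The two instruction forms above can be sketched in Python as follows.
   This is an illustration only, not the lenin source: the function name,
   the signed-offset formatting and the piece name ECOLAC in the usage
   lines are assumptions made for the example.

```python
def lenin_instructions(lengths_lines, first_range, last_range):
    """Return (finst, linst) instruction lines built from a lengths file."""
    finst, linst = [], []
    for line in lengths_lines:
        if not line.strip() or line.startswith("*"):
            continue  # skip blank lines and '*' comment lines
        # Columns: LENGTH, FIRST-POSITION, LAST-POSITION, PIECE-NAME.
        _length, first, last, name = line.split()[:4]
        for out, position, (offset_from, offset_to) in (
            (finst, int(first), first_range),
            (linst, int(last), last_range),
        ):
            out.append(f"piece {name};")
            out.append(
                f"get from {position} {offset_from:+d} "
                f"to {position} {offset_to:+d};"
            )
    return finst, linst


finst, linst = lenin_instructions(
    ["* hypothetical lengths file", "300 101 400 ECOLAC"], (-20, 20), (-5, 5)
)
print("\n".join(finst + linst))
```

   The FROM and TO offsets are written with an explicit sign so that each
   get instruction is relative to the FIRST-POSITION or LAST-POSITION.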

examples

documentation

see also
   dbinst.p

author
   Thomas Dana Schneider

bugs
   Title of finst is not correct.  To correct, don't have dbinst put
   the title, and have lenin construct it.

technical notes

*)
(* end module describe.lenin *)
version = 1.24; (* of lenin.p 1990 August 21
(* begin module describe.lig *)
(*
name
   lig: ligation theory

synopsis
   lig(input: in, list: out, output: out)

files
   input: user commands
   list: a 'hard copy' of the inputs and the outputs
   output: ligation predictions

description
   This program computes the results of a ligation reaction
   for insertion of a linker onto both ends of a linearized plasmid. 
   The user gives 
      the size of a plasmid in KB,
      the pico moles or micrograms of plasmid (you get to choose)
      the size of an insert in KB,
      the pico moles or micrograms of insert
      the volume of the reaction in micro liters
   The program calculates whether circular or linear molecules are favored
   for the plasmid alone and with the insert.  The logic is:
   1.  The plasmid alone should circularize.
   2.  The plasmid with the insert should be linear.
   Thus, as the ligation proceeds, the first thing that happens is that
   the plasmid and insert ligate together (2 above).  Then the concentration of
   ends is lower, and circularization will be favored (1 above).
   Obviously this is all really rule of thumb, but it does seem to work
   in my experience.
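
   Since amounts may be given in either pico moles or micrograms, the
   conversion between them, and the resulting concentration of ends, can
   be sketched in Python.  This is not the calculation used inside lig:
   the figure of 660 g/mol per base pair is a common approximation, and
   both function names are assumptions of the sketch.

```python
BASE_PAIR_MASS = 660.0  # approximate g/mol per base pair (assumed average)


def micrograms_to_picomoles(micrograms, size_kb):
    """Convert a mass of double stranded DNA to pico moles of molecules."""
    grams_per_mole = size_kb * 1000.0 * BASE_PAIR_MASS
    return micrograms * 1.0e6 / grams_per_mole


def end_concentration_nanomolar(picomoles, volume_microliters):
    """Concentration of ends for linear molecules (two ends per molecule)."""
    # pico moles per micro liter is micromolar; times 1000 gives nanomolar.
    return 2.0 * picomoles / volume_microliters * 1000.0


plasmid_pmol = micrograms_to_picomoles(0.1, 3.0)
print(plasmid_pmol, end_concentration_nanomolar(plasmid_pmol, 20.0))
```

   A lower concentration of ends favors circularization, which is the
   rule of thumb used in the logic above.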

documentation
   A. Dugaiczyk, H. W. Boyer and H. M. Goodman
   J. Mol. Biol. 96: 171-184 (1975)
   'Ligation of EcoRI Endonuclease-generated DNA fragments
    into Linear and Circular Structures'

author
   Thomas Dana Schneider

bugs
   none known

*)
(* end module describe.lig *)
version = 1.27; (* of lig 1988 Jan 5
(* begin module describe.linreg *)
(*
name
      linreg: linear regression

synopsis
      linreg(input: in, output: out)

files
      input:
          first line is which pair of columns to correlate as x then y
          remaining lines are the data in columns, ending with end of file.
      output: regression results

description
      linear regression is performed between the indicated data columns.
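
      A minimal least squares sketch in Python of the calculation described.
      It is not the linreg source: the function name, the 1-based column
      numbers on the first line and the returned values (slope, intercept,
      correlation coefficient) are assumptions of the sketch.

```python
import math


def linreg(lines):
    """Least squares fit of y on x for the column pair named on line one."""
    # First line: which columns to use as x then y (assumed 1-based).
    xcol, ycol = (int(field) for field in lines[0].split()[:2])
    points = []
    for line in lines[1:]:
        fields = line.split()
        if fields:
            points.append((float(fields[xcol - 1]), float(fields[ycol - 1])))
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) * (x - mean_x) for x, _ in points)
    syy = sum((y - mean_y) * (y - mean_y) for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    correlation = sxy / math.sqrt(sxx * syy)
    return slope, intercept, correlation


print(linreg(["1 2", "1 3.1", "2 4.9", "3 7.0", "4 9.1"]))
```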

author
      thomas schneider

bugs
      none known

*)
(* end module describe.linreg *)
version = 2.00; (* of linreg 1985 dec 19
(* begin module describe.lister *)
(*
name
      lister: list the sequences of pieces in a book with translation

synopsis
      lister(book: in, list: out, listerp: in, output: out)

files
      book: any book generated by the delila system

      list: a carefully numbered listing of the sequences in book,
         with an index to the pieces at the end

      listerp: lister parameters to control the listing.
         If listerp is empty, default values are used.
         Otherwise, the file must contain seven parameters, one per line:
         1. the number of bases per line in the listing.  Note that besides
            margin characters, there will be one blank with each base.
            This must be a multiple of 3 whenever one is printing amino acids.
         2. the mode for listing amino acids:
            0 = none
            1 = predict peptides starting at aug or gug. show nonsense codons.
            2 = translate all frames
         3. an integer in the range 0 to 7.  The binary representation of
            this number determines which amino acid frames are allowed to
            be printed.  The highest bit is the highest printed frame.
         4. Amino acid code: one character
               1 = 1 letter code
               3 = 3 letter code
         5. an integer that controls the listing of the sequence.
               0 = no sequence (but show amino acids and sequence numbering)
               1 = show sequence
               2 = show sequence and complement underneath
         6. Output format:  one character
               c = computer defined page character (often will be a control-L)
               l = LaTeX document page notation
               n = no page marks
         7. Page length (integer)

      output: messages to the user

description
      Lister is a general purpose program for the listing of nucleic-
      acid sequences.  Every fifth base is carefully marked with an
      asterisk directly above it.  Every tenth base is numbered with the
      number defined by the coordinate system.
         The listing can include translation to amino acids.  The amino acid
      is set directly below the codon.  Dashes mark the frame.

examples
      If listerp contains:

30    basesperline: number of bases per line in the listing
1     aastate: 0=no aa; 1=predict peptides; 2=translate all frames
7     frameallowed: binary; highest bit is highest frame on, etc.
1     codelength: 1 or 3 letters per amino acid
2     seqlines: 0=no sequence; 1=single strand; 2=double strand
c     pageaction: c=computer; l=LaTeX; n=none
55    pagelength:  page length
      listerp: parameters for the lister program

30    the listing will be 30 bases wide,
1     with predicted peptides for
7     all three frames.
1     The translated sequence is listed in single letter code.
2     Both DNA strands will be given.
c     The computer's default will be used to page the output.
55    Each page will break at 55 lines.

      More examples for frame control (parameter 3):
         7 (111 in binary) will translate all frames
         4 (100 in binary) will translate only the first frame
         3 (011 in binary) will translate the second and third frames
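
      The frame control number can be decoded as sketched below in Python.
      The function name allowed_frames is an assumption; the mapping of
      bits to frames follows the three examples just given.

```python
def allowed_frames(frameallowed):
    """Return the list of reading frames switched on by parameter 3."""
    if not 0 <= frameallowed <= 7:
        raise ValueError("frameallowed must be in the range 0 to 7")
    # Highest bit (4) is the first frame, then 2 and 1 for frames 2 and 3.
    return [
        frame
        for frame, bit in ((1, 4), (2, 2), (3, 1))
        if frameallowed & bit
    ]


for value in (7, 4, 3):
    print(value, allowed_frames(value))
```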

author
      Thomas D. Schneider

bugs
      none known

*)
(* end module describe.lister *)
      version = 5.50; (* of lister.p 1992 December 21 *)
(* begin module describe.ll *)
(*
name
   ll: line lengths

synopsis
   ll(input: in, output: out)

files
   input: the source of the lines
   output: the length of each line

description
   The lengths of lines in the input file are given to the output file.
   A useful way to use the program is to find the longest length line in
   the file using the Unix sort routine:
       ll < myfile | sort -n | tail -1

see also
   Unix sort routine, flag.p.

author
   Thomas D. Schneider

bugs
   none known

*)
(* end module describe.ll *)
   version = 1.03; (* of ll 1991 Mar 23
(* begin module describe.lochas *)
(*
name
   lochas: look at characters in a file

synopsis
   lochas(input: in, output: out)

files
   input:  a file
   output: identification of ascii characters in the file
      each line contains:

         the first three characters: the ordinal of the character
         blank (ie " ")
         dash (ie "-")
         the character or a blank in special cases
         dash (ie "-")
         blank (ie " ")
         6 characters: the number of the character in the file
         blank (ie " ")
         the remainder contains one of:
            NULL - a null character found
            BLANK - a blank character was found
            HIGH ORDER BIT REMOVED - the character had its high order bit
                set.  To print it, this was removed.
      or
         END OF LINE
         which indicates that an end of line condition was found.
         This is counted as a single character.

description
   The program allows one to inspect the characters in a file.
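
   The layout of one output line can be sketched in Python as below.  This
   is an illustration of the column layout listed under files, not the
   lochas source.  The function name is an assumption, as is the choice to
   report the ordinal before the high order bit is removed.

```python
def lochas_line(char, count):
    """Format one report line for the character at position count."""
    code = ord(char)
    shown, note = char, ""
    if code == 0:
        shown, note = " ", "NULL"
    elif char == " ":
        note = "BLANK"
    elif code > 127:
        # Remove the high order bit so that the character can be printed.
        shown, note = chr(code - 128), "HIGH ORDER BIT REMOVED"
    # 3 characters of ordinal, blank, dash, character, dash, blank,
    # 6 characters of position in the file, blank, then the note.
    return f"{code:3d} -{shown}- {count:6d} {note}"


for number, char in enumerate("a b", start=1):
    print(lochas_line(char, number))
```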

examples

documentation

see also

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.lochas *)
version = 1.05; (* of lochas.p 1993 January 6
(* begin module describe.log *)
(*
name
   log: convert columns of data to log

synopsis
   logp: parameter file.  Base to take the log.
   input: lines starting with '*' are copied to output.
      The log is taken of the first two columns of input and this is
      written to output
   output: copied header lines and log of first two input columns

description
   The program takes the log of the first two input columns, and writes
   to the output their logs to the given base.  This lets one convert data
   for a log-log plot with the xyplo program.
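
   The transformation can be sketched in Python as follows.  This is not
   the log program itself: the function name and the output number format
   are assumptions of the sketch.

```python
import math


def log_columns(lines, base):
    """Copy '*' header lines; replace the first two columns by their logs."""
    out = []
    for line in lines:
        if line.startswith("*"):
            out.append(line.rstrip("\n"))  # header lines are copied as is
            continue
        fields = line.split()
        if len(fields) < 2:
            continue
        x, y = float(fields[0]), float(fields[1])
        out.append(f"{math.log(x, base):.6f} {math.log(y, base):.6f}")
    return out


print("\n".join(log_columns(["* demo data", "100 1000", "10 0.1"], 10)))
```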

see also
   xyplo

author
   Thomas D. Schneider

bugs
   To generalize this program, it would be nice to specify which columns
   are to be transformed, and other columns would be just copied to the
   output.  This requires that the program be able to take a list of
   columns, sort them, then skip and copy to the columns to be transformed.
   Other columns may contain non-numeric characters, so the copy must be
   of characters.  It would also be nice to have the program do other
   functions, like square root and sine.

*)
(* end module describe.log *)
version = 1.08; (* of log 1986 april 19
(* begin module describe.loocat *)
(*
name
      loocat: look at a catalogue

synopsis
      loocat(cat: in, list: out, output: out)

files
      cat: a catalogue generated by the catal program
      list: a listing of the contents of cat
      output: messages to the user

description
      loocat allows one to look at a catalogue that the librarian
      delila normally looks at.  these catalogues are files of a special
      type of record (called item) so that delila can read the information
      rapidly.  however this makes it difficult to see what the catalogue
      contains.  loocat is useful for understanding or debugging catalogues.

documentation
      libdef, delman.construction.catal

see also
      catal, delila

author
      gary stormo and thomas schneider

bugs
      none known

*)
(* end module describe.loocat *)
version = 1.10; (* of loocat 1985 apr 19
(* begin module describe.makebk *)
(*
name
      makebk: make a book from a file of sequences.

synopsis
      makebk(sequ: in, book: out, output: out, input: intty)

files
      sequ: file of raw sequences, each ending in a '.';  no characters are
         allowed in this file except the bases (a,c,g,t,u) and period and blank.
      book: the output file containing the sequences and the necessary
         information  for it to be a proper book.  the user types in the
         required information  after prompts from the program.
      output: for messages and queries to the user;
      input: interactive input.

description
      makebk takes a file of raw sequences (sequ) separated by periods (.)
      and converts that into a proper delila book format, getting the
      required information from the user.  the user may also have
      makebk fill in the piece information automatically, using
      default values.

see also
      rawbk

author
      gary stormo

bugs
      none known
*)
(* end module describe.makebk *)
version = 2.42; (* of makebk 1986 nov 14
(* begin module describe.makedate *)
(*
name
   makedate: make a date file

synopsis
   makedate(input: in, thedate: in, output: out)

files
   input: the date from the user
      The date may end with a single letter, to distinguish
      between several dates on one day.

      Acceptable formats of the date are
        1992 Jul  4
        1992 Jul 4
        1992 Jul 14
        1992 Jul 14a
        1992 Jul 1 a
        1992 Jul 1a

   thedate:  the date file created
   output: messages to the user

description
   Create a file containing a date

examples

documentation

see also
   tod.p

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.makedate *)
version = 1.27; (* of makedate.p 1992 October 30
(* begin module describe.makelogo *)
(*
name
   makelogo: make a graphical `sequence logo' for aligned sequences

synopsis
   makelogo(symvec: in, makelogop: in, colors: in, marks: in,
             logo: out, output: out)

files
   symvec: A "symbol vector" file from the alpro or dalvec program.
     If the file is empty, the alphabet is printed.  This allows one
     to determine the correction factors described below.
     If the error bars have a negative size, they are not displayed.
     This allows the sites program to control the display when it
     would not be appropriate.
     If the number of a symbol is negative in symvec, then the
     symbol will be rotated 180 degrees before being printed.
     The absolute value is used by makelogo to determine the height.
     This allows a statistical test that finds a symbol to be
     significantly rare to show this by printing the symbol
     upside down.  Notice that ACGT are all easy to distinguish
     from their upside down versions, but unfortunately this is not always
     true for protein sequences.
   makelogop: parameters to control the program.
     line 1: contains the lowest to highest range of the binding
             site to do the logo graph. (FROM to TO range)
     line 2: bar: sequence coordinate before which to print a vertical bar
             NOTE: the vertical bar takes up a small amount of horizontal space.
             This will offset the logo from that point on by a tiny amount.
     line 3: xcorner and ycorner.  This is the coordinate of the
             lower left hand corner of the logo (in cm).
             These should be real numbers.
     line 4: rotation: angle in degrees to rotate the logo.  Warning:
             rotations other than by multiples of 90 degrees may produce
             incorrect logos because character scaling depends on the
             orientation of the characters.  (Essentially, it's a design
             fault of PostScript.)
     line 5: charwidth: (real, > 0) the width of the logo characters, in cm
     line 6: barheight barwidth: (real, > 0) height of the vertical bar, in cm,
             and its width, in cm.
     line 7: barbits: (real) The height of the vertical bar, in bits, is given
	     by the absolute value of barbits.  If barbits is positive, an
	     "I-beam" will appear at the top of the symbol stack.  The I-beam
	     indicates one standard deviation of the stack height, based
	     entirely on how small the sample of sequences is.  If the value of
	     barbits is negative, the I-beam is not displayed.  Not knowing how
	     big the sampling effects are can fool one, so one should usually
	     have the I-beam, even if it is ugly.
		WARNING: it is not known how to calculate the error for data
	     derived from a dirty DNA synthesis experiment (see Schneider1989,
	     reference given below).  In that case the error could be
	     calculated (in program sites) from the number of sequences, so
	     that the error bar would be an underestimate of the variation.
	     Unfortunately, when I tried this, people interpreted the error bar
	     as the size they saw, so this does not work well visually.
	     Therefore when data come from the sites program, the I-beam is
	     suppressed.
		The combination of barheight and barbits determines the size of
	     the logo in bits per centimeter.  Both must be specified even if
	     no vertical bar is desired.
     line 8: barends: if the first character on the line is a 'b', then bars
             are put before and after each line, in addition to the other bar.
             The first bar on each line is labeled with tic marks and
             the number of bits.  If you don't want this, you can remove
             the call to maketic in the logo.
     line 9: showingbox: if the first character on the line is an 's',
             then show a dashed box around each character.
    line 10: outline: If the first character is 'o' then the characters show
             up in outline form.  Otherwise, they are solid.
    line 11: caps: if the first letter is 'c' then alphabetic
             characters are converted to capital form.
    line 12: stacksperline: number of character stacks per line output
    line 13: linesperpage: number of lines per page output
    line 14: linemove: line separation relative to the barheight
    line 15: numbering: if the first letter is 'n' then each stack is
	     numbered.  Otherwise, the number is suppressed as a PostScript
	     comment.  This allows you to modify the logo file by hand to
	     reinstate numbering for only the positions you want by removing
	     the percent (%) symbol from in front of the calls to makenumber.
    line 16: shrinking: (real)  Factor by which to shrink the characters.  If
	     shrinking <= 0 or shrinking >= 1 then the characters exactly fit
	     into the dashed box.  If shrinking > 0 and shrinking < 1, the
	     characters are shrunk inside the dashed box.  To use this feature,
	     the parameter showingbox must be on, so that the user does not create
	     a logo whose height is misleading.
    line 17: strings: the number of user defined strings to follow.  Each
	     string definition takes up two lines.  The first is the (x,y)
	     coordinate of the string, the second is the string itself.  The
	     coordinates are in centimeters relative to the coordinate
	     transforms performed above.  (This way, the title position stays
	     the same relative to the logo.)
    line 18: (x,y,s) coordinates of first user defined string (if strings >= 1)
             followed by the factor by which to scale the string.  A factor
             of 1 means no scaling.  In addition, if the x coordinate is
             negative, then the string is centered by using the string
             width, the stacksperline and charwidth.
    line 19: the first user defined string (if strings >= 1)
    line 20: (x,y,s) coordinates of second user defined string (if strings >= 2)
    line 21: the second user defined string (if strings >= 2)
             (etc. for the remaining strings.)
    The remainder of the file is ignored and may contain comments.
   colors: Defines the color of each character printed.
     Any number of lines that begin with an asterisk [*] can be used
     as comments to identify the file or portions of the file.
     Put into the file one line for each character that is to
     have a color other than black.  The line must contain:
                    character red green blue
     The last three parameters are real values between 0 and 1 (inclusive).
     The values depend on the PostScript interpreter, but 0 means black
     and a value of 1 means the most bright.
     To assign the asterisk a color, precede it with a backslash [as \*].
     To assign the backslash a color, precede it with a backslash [as \\].
     If the file is empty, the logo is made in black and white and the lower
     half of the I-beam error bar is made white so that when it is inside the
     letters it is visible.
   marks: an empty file means no marks are made.  Otherwise, a series
     of lines containing four pieces of data that define marks to be placed
     over the output:
        mark: o means open circle, b means filled circle.
        base coordinate: a real number that determines the center of the mark
        bits coordinate: a real number that determines the position of the
           mark in bits.
        scale: a positive real number by which to scale the mark.
     The symbols must be in increasing order of position in the site.
   logo: the output file, a PostScript program to display the logo.
   output: messages to the user

description
   The makelogo program generates a `sequence logo' for a set of aligned
   sequences.  A full description is in the documentation paper.  The input is
   a `symvec', or symbol-vector that contains the information at each
   position and the numbers of each symbol.  The output is in the graphics
   language PostScript.

   The program now indicates the small sample error in the logo by a small
   'I-beam' overlaid on the top of the logo.  Although the user may
   turn this off to make pretty logos, I strongly recommend use of it
   to avoid being fooled by small amounts of data.

   Making A Logo As Part of Another Figure
   ---------------------------------------

   The normal logo file is designed to stand by itself.  However, it is often
   desirable to incorporate the logo as part of another figure.  The difficulty
   is that the stand-alone logo PostScript program will erase the page (which
   wipes out any previous figure drawing) and show the page (which prints the
   page right after the logo).  To prevent these actions, the lines of
   PostScript code which do this have comments that contain the word REMOVE.
   All you have to do is remove these lines and your logo will be able to fit
   into your figure.  In Unix this can be easily done by:

   grep -v REMOVE logo > logo.ps

   If you do this, then it is advisable to do the erasepage and the
   showpage yourself.  A convenient way to do this is to have several
   files that contain postscript commands, and to use a shell script
   to concatenate them together:

   cat start.ps logo.ps end.ps > myfigure.ps

   If you have a large number of logos together in one figure, you can reduce
   the size of the final figure by another trick.  Logo files begin with a
   header which is the same from one figure to the next assuming you don't
   change colors/letter combinations.  So the first logo in the figure must
   contain this header, but later ones don't really need it.  You can remove
   the header material by using the censor program:

   censor < logo.ps > logo.no.header.ps

author
   Thomas D. Schneider
   National Cancer Institute
   Laboratory of Mathematical Biology
   NCI/FCRDC Bldg 469. Room 144
   P.O. Box B
   Frederick, MD  21702-1201
   (301) 846-5581 (-5532 for messages)
   network address: toms@ncifcrf.gov

examples
   makelogop parameters:

-15 2      FROM to TO range to make the logo over
1          sequence coordinate before which to put a bar on the logo
15 2       (xcorner, ycorner) lower left hand corner of the logo (in cm)
90         rotation: angle to rotate the graph
1.0        charwidth: (real, > 0) the width of the logo characters, in cm
10 0.1     barheight, barwidth: (real, > 0) height of vertical bar, in cm
2          barbits: (real) height of the vertical bar, in bits; < 0: no I-beam
no bars    barends: if 'b' put bars before and after each line
show       showingbox: if 's' show a dashed box around each character
no outline outline: if 'o' make each character as an outline
100        stacksperline: number of character stacks per line output
1          linesperpage: number of lines per page output
1.1        linemove: line separation relative to the barheight
numbers    numbering: if the first letter is 'n' then each stack is numbered
1          shrinking: factor by which to shrink characters inside dashed box
2          strings: the number of user defined strings to follow
2 14 1     coordinates of the first string (in cm)
First TITLE
3 13 1     coordinates of the second string (in cm)
SECOND TITLE

   colors:
* Color scheme for logos of DNA (for the makelogo program).
* color order is red-green-blue
*
* green:
A 0 1 0
a 0 1 0
*
* blue:
C 0 0 1
c 0 0 1
*
* red:
T 1 0 0
t 1 0 0
*
* orange:
G 1 0.7 0
g 1 0.7 0

  A test symvec is provided with the program, file 'symvec.demo', to be run with
  'colors.demo' and 'makelogop.demo'.
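
   For readers who want to check a colors file before running makelogo, the
   layout shown above can be read with a few lines of Python.  This is a
   sketch under assumptions drawn only from the example: lines starting with
   '*' are comments, and every other non-blank line holds a symbol followed
   by red, green and blue values.  The function name is hypothetical.

```python
def parse_colors(text):
    """Return {symbol: (red, green, blue)} from a makelogo-style colors file.

    Assumption: lines starting with '*' are comments, and every other
    non-blank line is 'symbol red green blue'.
    """
    colors = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("*"):
            continue
        symbol, red, green, blue = line.split()
        colors[symbol] = (float(red), float(green), float(blue))
    return colors


sample = """\
* green:
A 0 1 0
a 0 1 0
* orange:
G 1 0.7 0
"""
print(parse_colors(sample))
```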

documentation
   Description of Logos:
@article{Schneider.Stephens.Logo,
author = "T. D. Schneider
 and R. M. Stephens",
title = "Sequence Logos: A New Way to Display Consensus Sequences",
journal = "Nucl. Acids Res.",
volume = "18",
pages = "6097-6100",
year = "1990"}

   The Blue Book:
@book{PostScriptTutorial1985,
author = "{Adobe Systems Incorporated}",
title = "PostScript Language Tutorial and Cookbook",
publisher = "Addison-Wesley Publishing Company",
address = "Reading, Massachusetts",
callnumber = "QA76.73.P67P68",
isbn = "0-201-10179-3",
year = "1985"}

   The Red Book:
@book{PostScriptManual1985,
author = "{Adobe Systems Incorporated}",
title = "PostScript Language Reference Manual",
publisher = "Addison-Wesley Publishing Company",
address = "Reading, Massachusetts",
callnumber = "QA76.73.P67P67",
isbn = "0-201-10174-2",
year = "1985"}

   Dirty DNA synthesis experiments:
@article{Schneider1989,
author = "T. D. Schneider
 and G. D. Stormo",
title = "Excess Information at Bacteriophage {T7} Genomic Promoters
Detected by a Random Cloning Technique",
year = "1989",
journal = "Nucl. Acids Res.",
volume = "17",
pages = "659-674"}

see also
   rsgra.p, rseq.p, dalvec.p, alpro.p, sites.p, censor.p

bugs
   Some chi-logo (upside down characters) do not display on OpenWindows,
   but do print ok on the Apple LaserWriter IIntx.  The reason is completely
   obscure.

   A bug in NeWS 1.1 is that characters that are scaled too small are
   forced to be big.  This messes up the logo and can be confusing.
   Another bug in NeWS 1.1 prevents one from using the outline, but
   the dashed boxes will show up.
   Sometimes displaying a logo in NeWS 1.1 on a Sun 4 will cause an 'illegal
   instruction', after which one is thrown completely off the computer.  The
   source of this is not known, since it is not repeatable.
   The first two bugs are resolved under OpenWindows 2; the third has not
   been observed.
   These NeWS bugs do not apply to the Apple LaserWriter IIntx,
   which prints everything correctly.

technical notes
   Unfortunately PostScript fonts are not exactly the same height.  Thus if A
   and T are the standard, then C and G hang above and below the line.
   This has been solved in this version of makelogo.  As a consequence,
   the user never needs to determine any character sizes empirically, and
   the logos should work on any PostScript printer.

   Special thanks go to the following people for their help in solving this
   problem:

   Kevin Andresen [kevina@apple.com]
   "The problem facing you is that, while the PostScript language is more or
   less standard, the font shapes depend on the designer, type vendor, or
   language implementation.  The fonts used in NeWS are not exactly the same
   as those from Adobe, which are not the same as those from Bitstream, which
   are not the same as the original lead type, etc.  (This is an
   industry-wide issue.)  One way to compensate for this in PostScript is to
   use the charpath and pathbbox operators and scale appropriately."

   He provided a program, which I then rewrote and generalized.  That version
   almost worked, but not quite.  This was solved by:

   finlay@Eng.Sun.COM (John Finlay) who said:
   "It would appear that the calculation of the pathbbox for characters varies
   with the scale of the characters (I don't know why exactly but would
   speculate that there's probably some weirdness with the font hints and
   scaling).  I modified your postscript to iterate once on the size and
   recalculate the pathbbox at the scaled size.  Seems to printout OK (inside
   the boxes) on a LWI, LWII and in NeWS2.0 (though NeWS still seems to get the
   wide slightly wrong)."

   shiva@well.sf.ca.us (Kenneth Porter) was also involved and actively
   interested.  My apologies if I have forgotten someone else who contributed.

   The letter I and the vertical bar (|) are treated specially since in the
   Helvetica-Bold font they are rectangles and would completely fill the
   character space.  In addition, the letter I is centered by makelogo.

   Thanks go to Joe Mack for suggesting numbering and titles (strings) and to
   Pete Lemkin and Wojciech Kasprzak for pointing out that the shrink option
   would be helpful.  Thanks to Jeff Haemer for pointing out that the
   PostScript program should begin with '%!', and for suggesting that
   the string fonts should be different from the logos themselves.

   MISSING LOGO LETTER PROBLEM

   The OpenWindows PostScript on a Sun workstation will mess up displaying a
   stack of letters if the vertical movement is too small.  The result is that
   the letters above that point are missing.  This occurs if there is a highly
   conserved base and very few other bases.  The result is a huge gap where the
   highly conserved base should be.  Other printers do fine, so this is a
   problem with the Sun implementation of PostScript (will they ever get it
   right???).  If you don't have this window system, set the constant
   gooddisplay to true.  If you do want the logos to show up properly on the
   screen, use false.  Unfortunately, this will mean that the vertical
   translation for the small letters won't be done, so the display will be very
   slightly wrong.

*)
(* end module describe.makelogo *)
version = 7.53; (* of makelogo.p 1992 June 8
(* begin module describe.makessbdate *)
(*
name
   makessbdate: make a date file from a Sample_Sheet.bin file

synopsis
   makessbdate(input: in, thedate: in, output: out)

files
   input: a Sample_Sheet.bin file
      The date may end with a single letter, to distinguish
      between several dates on one day.

      Acceptable formats of the date are
        1992 Jul  4
        1992 Jul 4
        1992 Jul 14
        1992 Jul 14a
        1992 Jul 1 a
        1992 Jul 1a

   thedate:  the date file created
   output: messages to the user

description
   Create a file containing a date
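
   The acceptable date formats listed under 'files' can be recognized with
   one regular expression.  The sketch below is not taken from makessbdate;
   the pattern and the function name are assumptions inferred from the six
   example dates.

```python
import re

# Assumed pattern: 4-digit year, 3-letter month, 1-2 digit day, then an
# optional single letter that may be separated from the day by spaces.
DATE_PATTERN = re.compile(
    r"^\s*(\d{4})\s+([A-Za-z]{3})\s+(\d{1,2})\s*([A-Za-z])?\s*$"
)


def parse_date(text):
    """Return (year, month, day, letter) or None if the text does not match."""
    match = DATE_PATTERN.match(text)
    if match is None:
        return None
    year, month, day, letter = match.groups()
    return int(year), month, int(day), letter or ""


for example in ["1992 Jul  4", "1992 Jul 4", "1992 Jul 14",
                "1992 Jul 14a", "1992 Jul 1 a", "1992 Jul 1a"]:
    print(repr(example), "->", parse_date(example))
```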

examples

documentation

see also
   tod.p

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.makessbdate *)
version = 1.28; (* of makessbdate.p 1993 January 25
(* begin module describe.makman *)
(*
name
   makman: make manual entries from a source code

synopsis
   makman(input: in, output: out)

files
   input: a source code containing one or more modules with names
      of the form 'describe.name'.  The module must be preceded
      by a "version = " identification line.
   output: the modules with names of the form 'describe.name'.
      This is followed by the "version = " line.

description
   Modules with names of the form "describe.name" are copied from the
   input to the output.  By appending a set of such modules together
   from several programs, one can create a manual.  The pages may
   then be broken apart with the break program.

see also
   module.p, break.p, shell.p

author
   Thomas D. Schneider

bugs
   none known
*)
(* end module describe.makman *)
version = 1.32; (* of makman.p 1993 January 27
(* begin module describe.makmod *)
(*
name
      makmod: create a set of empty modules from a list of names

synopsis
      makmod(fin: in, fout: out, output: out)

files
      fin: a set of names separated by blanks or end-of-lines.
      fout: a file of modules with the names listed in fin.
      output: messages to the user.

description
      makmod creates a set of empty modules that have
      the names given in the fin file.  one may then use
      the module program to extract modules by the same names
      from a module library (for example).

examples
      if the fin file contains:
         first second
         3.rd

      the fout file will contain:                                     *)
         (* begin module first *)
         (* end module first *)

         (* begin module second *)
         (* end module second *)

         (* begin module 3.rd *)
         (* end module 3.rd *)                                        (*

see also
      moddef, module, show

author
      john hoffhines

bugs
      none known

*)
(* end module describe.makmod *)
version = 1.11; (* of makmod 1986 dec 12
(* begin module describe.maknam *)
(*
name
   maknam: make manual entry names

synopsis
   maknam(input: in, output: out)

files
   input: a source code containing one or more modules with names
      of the form 'describe.name'.  Generally though, the output
      of the makMAN program is used.
   output: The name line of each describe module, which is always assumed to be
      the third line below the '(@ begin module describe.x' line.  (@ is used
      here instead of * to prevent compilers from complaining.)

description
   For each module with a name of the form "describe.name",
   the third line of that module is copied to the output.
   This generates a file containing the name description of the program.
   (See example line above.)  This program is intended to be used on the
   concatenated output of the makman program, so that
   one can create a manual page that describes each program.

see also
   makman

author
   Thomas D. Schneider

bugs
   none known
*)
(* end module describe.maknam *)
version = 1.06; (* of maknam.p 1993 Jan 27
(* begin module describe.malign *)  
(*
name
      malign: optimal alignment of a book, based on minimum uncertainty
synopsis
      malign(inst: in, book: in, malignp: in, uncert: out, newalign: out,
         optalign: out, optinst: out, bestinst: out, output: out) 
files 
      inst: delila instructions of the form 'get from 56 -5 to 56 +10;' 
      book: the book generated by delila using inst 
      malignp: parameter file with the following parameters:
      winleft, winright: left and right ends of window for calculating
          uncertainty, relative to aligned base
      shiftmin, shiftmax: minimum and maximum shift of aligned base
      iseed: integer random seed
      nranseq: number of random sequences, or 0 to use sequences in book
      nshuffle: number of times to redo alignment after random shuffle
      ifpaired: 1 to treat each pair of sequences as complementary strands,
         0 not to
      standout: output run #, pass # and H to standard output every pass
         if 1, every run if 0, or not at all if -1
      npassout: output H and alignment every npassout passes, or only at
         end of runs if zero, or not at all if -1
      nshiftout: output L and H(L) every nshiftout sequence shifts, or only
         at end of passes if zero, or not at all if -1
      tolerance: tolerance in change of H
      ntolpass: maximum number of passes with change below tolerance
      uncert: uncertainty as function of position, for the last run, at the
      end of each pass or after selected number of sequence shifts
      newalign: values of H and the relative alignments; starting, final, and
      intermediate if selected
      optalign: user-readable listing of unique optimal relative alignments
      and number of times each was achieved
      optinst: list of unique optimal alignments in absolute coordinates,
      to be used to make inst file for selected alignment
         This file is like optalign, but the coordinates are for the
         original sequence.
      bestinst: a new inst file at the very best alignment
description
      Given a book of aligned sequences, this program searches for the alignment
      of the sequences that has the lowest uncertainty, i.e. the highest value
      of Rsequence.  The user specifies the "window" of bases within which
      uncertainty is calculated, and the maximum number of bases that each
      sequence is allowed to shift from the original alignment.  The program
      considers each sequence in turn, shifting it to an alignment with minimum
      uncertainty while holding the other sequences fixed.  A "pass" is complete
      when all sequences have been considered.  A "run" is complete when no
      alignments have changed in the preceding pass, and the alignment is then
      considered "optimal".  The first run starts with the original alignment;
      every run after that starts with a "shuffled" alignment obtained by
      shifting each sequence independently by a random amount between the
      allowed limits.  The program maintains a list of all of the unique optimal
      alignments achieved from these starting alignments, and it outputs them in
      order of increasing uncertainty.
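      The pass and run structure described above can be sketched in Python.
      This is not the malign source: the function names, the 'anchors' list
      (index of the originally aligned base in each sequence) and the example
      sequences are assumptions.  Random shuffles between runs, the tolerance
      parameters and the realignment step are left out.

```python
import math
from collections import Counter


def window_uncertainty(seqs, anchors, shifts, winleft, winright):
    """Sum over window positions of H = -sum(p log2 p), in bits."""
    total = 0.0
    for offset in range(winleft, winright + 1):
        column = [seq[anchor + shift + offset]
                  for seq, anchor, shift in zip(seqs, anchors, shifts)]
        for count in Counter(column).values():
            p = count / len(column)
            total -= p * math.log2(p)
    return total


def one_run(seqs, anchors, shifts, winleft, winright, shiftmin, shiftmax):
    """Repeat passes until a pass changes no shift; return shifts and H."""
    shifts = list(shifts)
    changed = True
    while changed:
        changed = False
        for i in range(len(seqs)):      # one pass visits every sequence
            best_shift = shifts[i]
            best_h = window_uncertainty(seqs, anchors, shifts,
                                        winleft, winright)
            for trial in range(shiftmin, shiftmax + 1):
                candidate = shifts[:i] + [trial] + shifts[i + 1:]
                h = window_uncertainty(seqs, anchors, candidate,
                                       winleft, winright)
                if h < best_h - 1e-12:  # accept strict improvements only
                    best_shift, best_h = trial, h
            if best_shift != shifts[i]:
                shifts[i] = best_shift
                changed = True
    return shifts, window_uncertainty(seqs, anchors, shifts,
                                      winleft, winright)


# Hypothetical book of three sequences sharing the motif "gatc".
seqs = ["caagatctta", "ttgatcacgt", "acgtgatcaa"]
anchors = [3, 3, 3]
start = [0, 0, 0]
print("starting H:", window_uncertainty(seqs, anchors, start, 0, 3))
print("after one run:", one_run(seqs, anchors, start, 0, 3, -2, 2))
```

      Because a shift is accepted only when it strictly lowers the
      uncertainty, each run must stop, but it may stop at a local optimum.
      That is why the real program repeats runs from shuffled starts.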
author: David Mastronarde
bugs: The realignment algorithm, which shifts all sequences by the same amount
      to attempt to keep the window near its original position, is somewhat
      ad hoc in nature and the effects of different settings for its parameters
      have not been explored.  If the window spans two real sites with competing
      alignments, many optimal but meaningless alignments with similar
      uncertainties may be obtained.  The random sequences can't be examined. *)
(* end module describe.malign *)  
version = 2.22; (* of malign.p 1991 February 8
(* begin module describe.markov *)
(*
name
      markov: markov chain generation of a dna sequence from composition.

synopsis
      markov(cmp: in, mkvseqs: out, listing: out, markovp: in, output: out)

files
      cmp: the input composition, which is the output of program comp.
      mkvseqs: the output dna sequences of this program.
      listing: contains the following information about program execution:
         program and version number.
         first three lines of the input composition file used to
            generate the sequences with.
         the four input parameters - number of sequences, length of sequences,
            the seed number, and the depth of sequence generation.
         a listing of any sequence that could not be generated from the
            prior length specified in the markovp file.
            the listing contains the sequence number, the sequence position
            and the depth of restart of sequence generation.
      markovp: for parameters; markovp must contain four numbers
         each on separate lines:
         1. number of sequences desired - an integer
         2. the length of each sequence - an integer
         3. a seed number between zero and one or outside this range
         if a computer date and time seed is desired.  the seed
         is used to start the random number generator. - a real
         4. the number of bases prior to the one about to be inserted
         which are to influence the choice of the base to be inserted.
         zero means equiprobable random sequences are desired.
         example:   20 number of sequences desired
                    100 length of sequences desired
                    2.0 for a computer date and time seed
                    3 composition depth used to generate next base
      output: for user messages.

description
      markov generates a set of random dna sequences which have
      approximately the same composition as the one in the
      composition file supplied to the program.  the user chooses
      the depth of the composition to be used.  for example, if
      trinucleotides (composition depth = 3) are used, the
      previous two bases determine the probability of the next base in
      the sequence.  this is called a markov chain.
         sometimes the program will work itself
      into a corner, when no composition exists for the
      previous few bases.  in these cases, the program restarts
      with the longest possible oligonucleotide that does exist in the
      composition.  these cases are recorded in the listing.
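
      the generation step, including the restart with a shorter
      oligonucleotide, can be sketched in python.  this is an illustration
      only: the comp file format is not reproduced, so the composition is
      counted directly from an example sequence, and all function names are
      hypothetical.  depth is assumed to be at least 1, with depth - 1
      previous bases influencing each choice as in the trinucleotide example.

```python
import random
from collections import Counter


def composition(sequence, depth):
    """Count every oligonucleotide of length 1..depth in the sequence."""
    counts = Counter()
    for length in range(1, depth + 1):
        for start in range(len(sequence) - length + 1):
            counts[sequence[start:start + length]] += 1
    return counts


def next_base(counts, previous, rng):
    """Pick the next base given the previous bases, shortening the context
    (the restart) until some oligonucleotide in the composition fits."""
    for skip in range(len(previous) + 1):
        context = previous[skip:]
        weights = [counts[context + base] for base in "acgt"]
        if sum(weights) > 0:
            return rng.choices("acgt", weights=weights)[0]
    raise ValueError("composition contains no bases")


def markov_sequence(counts, depth, length, seed):
    """Generate one sequence; depth - 1 previous bases influence each choice."""
    rng = random.Random(seed)
    bases = ""
    for _ in range(length):
        previous = bases[-(depth - 1):] if depth > 1 else ""
        bases += next_base(counts, previous, rng)
    return bases


counts = composition("acgtacgtttgacca", 3)
print(markov_sequence(counts, 3, 40, seed=1))
```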

see also
      comp, compan, rndseq

author
      john eberwein, gary stormo, tom schneider

bugs
      none known
*)
(* end module describe.markov *)
version = 3.73; (* of markov, 1989 march 5
(* begin module describe.matmod *)
(*
name
      matmod: mathematics modules

synopsis
      matmod(encseq: in, output: out)

files
      encseq: empty or the output of the encode program for testing parameter
         reading routines.
      output: the version of matmod is printed.  successful compilation
         and running of the program indicates that the modules are correct.

description
      self contained modules for mathematical manipulation.
      included is a procedure for linear regression analysis of data pairs,
      a random number generator, newton's method to find roots of functions,
      and routines for reading the parameters from the encode program.

see also
      delmod, module, encode

author
      thomas d. schneider and gary d. stormo

bugs
      none known

technical notes
      the constant n in procedure randomtest determines how many times
      the random number generator will be called in a series of tests.  if n
      is small, the test will be poor; if it is large, the test may
      take a long time.

*)
(* end module describe.matmod *)
version = 'matmod 2.05 88 dec 15 tds/gds';
(* begin module describe.matrix *)
(*
name
      matrix: dot matrices for helices between two books

synopsis
      matrix(xbook: in, ybook: in, hlist: in, mlist: out,
             matrixp: in, output: out)

files
      xbook: a book from the delila system

      ybook: a book from the delila system.  If you want to look for
         structures in one sequence, then use the program copy to make
         a copy of xbook in ybook.

      hlist: the helix listing for xbook and ybook made by program helix

      mlist: the matrices listed out.  Sequences from the x book are printed
         vertically, while those from the y book are horizontal.
         helices are printed as a set of numbers:
            1 means gt base pair
            2 means at base pair
            3 means gc base pair
         if mlist is wider than your printer, use the split program.

      matrixp: parameters to control the mlist
         If matrixp is empty, default values are used.  Otherwise, the first
         line contains one number.  If this number is a positive integer,
         it specifies the minimum length helix in base pairs from hlist to
         record in mlist;
         if this number is a negative real number, it specifies the maximum
         energy in kcal of the helices written in mlist.

      output: messages to the user

description
      Matrix produces a dot matrix for the two books.  Only helices
      of some length (or longer) or of some maximum energy (or less) are
      printed.  The helices are made using program helix.

documentation
      delman.use.comparison
      J. V. Maizel, Jr. and R. P. Lenk PNAS 78: 7665-7669 (1981)

see also
      helix, dotmat, split, keymat

author
      Thomas D. Schneider

bugs
      none known

technical notes
      The constant maxarray defines the maximum area that the program
      can handle.

*)
(* end module describe.matrix *)
      version = 3.28; (* of matrix 1987 feb 13 *)
(* begin module describe.merge *)
(*
name
      merge: compare two files and merge them

synopsis
      merge(afile: in, bfile: in, apfile: out, bpfile: out,
            output: out, input: intty)

files
      afile: the first input file
      bfile: the second input file
      apfile: the afile with corrections from bfile or the user
      bpfile: the bfile with corrections from afile or the user
      output: messages to the user
      input: interactive input from the user.

description
      the merge program was designed to aid in the entry of sequences.
      merge will also compare any two files for differences.
      two typed copies of the data are made (afile and bfile).  merge will
      compare the files, ignoring spaces and end-of-lines.  this allows
      the two data files to be typed independently by two people in two
      formats.  differences between the files are flagged, and the user
      may then indicate which file is correct and merge will fix the other
      file.  the user may also modify the files using a small
      editing facility.  the changes go to the prime files (apfile, bpfile).
      to be sure that apfile and bpfile are identical after the merge,
      you can merge them again.  several commands can be put on one line
      separated by blanks.  if you type an unrecognizable command, or ask for
      help at any time then merge will list the commands available.
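
      the comparison step alone, ignoring spaces and end-of-lines, can be
      sketched in python.  this is not the merge source: the function names
      are hypothetical, and the sketch only locates the first mismatch.  it
      does not guess at insertions or deletions and does not fix either file.

```python
def significant_characters(text):
    """Yield (line number, character) for every non-blank character."""
    for line_number, line in enumerate(text.splitlines(), start=1):
        for character in line:
            if not character.isspace():
                yield line_number, character


def first_difference(a_text, b_text):
    """Return (position, a line, b line) of the first mismatch, or None if
    the texts agree once spaces and end-of-lines are ignored."""
    a_chars = list(significant_characters(a_text))
    b_chars = list(significant_characters(b_text))
    pairs = zip(a_chars, b_chars)
    for position, (a_item, b_item) in enumerate(pairs, start=1):
        if a_item[1] != b_item[1]:
            return position, a_item[0], b_item[0]
    if len(a_chars) != len(b_chars):
        # one text ran out first; report where the longer one continues
        return min(len(a_chars), len(b_chars)) + 1, None, None
    return None


file_a = "aatccttatccctcctaatttcgtttttgct"
file_b = "aatccttacctaatttcctttttgct"
print(first_difference(file_a, file_b))
```

      the two strings are the sequences from the example below, so the
      reported position can be compared with the merge output.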

examples
      when two sequences were compared, the program gave this output:
i am 91% sure that this has a deletion in b (insertion in a) of 5 characters:
file a: line 1
aatccttatccctcctaatttcgtttttgct
       >iiiii<        x         at 9 insertion, 1 mismatch downstream
file b: line 1
aatccttacctaatttcctttttgct
       ><        x         at 9 deletion, 1 mismatch downstream
      the sequences matched before the points indicated.  file a had
      an insert of "tccct" followed several bases later by a c to g change.
      the mismatch made merge less sure of the deletion.
      one must look at the original sequence to make the correction.

author
      thomas d. schneider

bugs
      1. lines without any characters are not copied from the file to its
      pfile.  see procedure readline.
      2. excessive blank characters may fool the guess procedure, since it
      does not remove blanks before guessing.  removing blanks would make
      it difficult to autofix, and the spacings would be lost in the pfile.
      3. the program can not compare more than one line from each file at a
      time,  so the guess is limited to what is visible on a line.  the entire
      program must be rewritten to allow multiple line guessing.
*)
(* end module describe.merge *)
version = 9.53; (* of merge 1989 May 1 *)
(* begin module describe.mnomial *)
(*
name
   mnomial: produce the multinomial distribution for base probabilities

synopsis
   mnomial(mnomialp: in, list: out, output: out)

files
   mnomialp: parameters to control the program, as pairs of lines
      first line: na,nc,ng,nt
      second line: pa,pc,pg,pt
        this may repeat.
      When ng=nt=0 and pg=pt=0, the binomial distribution is calculated.
      The mean (n*pa) and standard deviation (sqrt(n*pa*pc)) are given.
   list: results
   output: messages to the user

description
   This program calculates the multinomial distribution:
                                 (na+nc+ng+nt)!   na   nc   ng   nt
    p(na,nc,ng,nt|pa,pc,pg,pt) = -------------- pa   pc   pg   pt
                                 na!nc!ng!nt!
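
   As a check on the formula, here is a small Python sketch of the same
   calculation.  It is not the mnomial source; the function name and the
   example numbers are chosen for illustration.  The counts and
   probabilities are given in the order a, c, g, t.

```python
from math import factorial, sqrt


def multinomial(counts, probs):
    """p(na,nc,ng,nt | pa,pc,pg,pt), both arguments in a, c, g, t order."""
    coefficient = factorial(sum(counts))
    for n in counts:
        coefficient //= factorial(n)    # exact at every step
    probability = float(coefficient)
    for n, p in zip(counts, probs):
        probability *= p ** n           # note that 0 ** 0 is 1 in Python
    return probability


# Equiprobable bases: two a, one c, one g, no t.
print(multinomial((2, 1, 1, 0), (0.25, 0.25, 0.25, 0.25)))

# Binomial case (ng = nt = 0 and pg = pt = 0) with its mean and
# standard deviation, n*pa and sqrt(n*pa*pc).
n, pa, pc = 4, 0.5, 0.5
print(multinomial((3, 1, 0, 0), (pa, pc, 0.0, 0.0)), n * pa, sqrt(n * pa * pc))
```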

see also
   binplo

author
   Thomas Dana Schneider

bugs
   none known

*)
(* end module describe.mnomial *)
version = 1.18; (* of mnomial, 1988 Dec 14
(* begin module describe.modin *)
(*
name
      modin: generate modularized delila instructions for absolute sites

synopsis
      modin(fin: in, inst: out, output: out)

files
      fin: sequence site positions in a special format.
      inst:  modularized delila instructions
      output: messages to the user

description
      The existence of a file containing modularized delila instructions
      allows one to pull, from the file, instructions for generating
      specific sequence sites using the module program.  For instance,
      using modin, one may make a file containing delila instructions for
      all the laci amber mutation sites, one site per module.  Then, using
      the module program, one could pull from the file the instructions for
      sites a9, a16, a19, and a21.  This would be useful if one had several
      different sets of amber mutations to analyse separately.

see also
      module, delila, describe.modin.use

author
      John Hoffhines

bugs
      The program has not been used much, so its usefulness is not known.

*)
(* end module describe.modin *)
version = 1.43; (* of modin.p 1993 Jan 27
(* begin module describe.modin.use *)
(*
name
      modin.use: more information on using the modin program

     MODIN FIN FILE FORMAT (BNF)

<BOOK TITLE><SETS>
<SETS>::=<SET>|<SET><SETS>
<SET>::=SET <KEY> <KEY> <KEY> <KEY> <SIGNED NUMBER> <SIGNED NUMBER>
            <MODULE GROUP>
<MODULE GROUP>::=<MODULE PARAMETERS>|<MODULE GROUP> <MODULE PARAMETERS>
<MODULE PARAMETERS>::=M <NUMBER> <IDENTIFIER>|<NUMBER> <IDENTIFIER>

     NOTES ON MODIN FIN FILE FORMAT

1) Any terms undefined here are defined in LIBDEF.

2) Keys designate, respectively, the words "organism", "chromosome",
    and "p" (piece), "g" (gene), or "t" (transcript).

3) The module parameter "m" sets delila instructions direction
    to - for one module only.  Default is +.

4) Within the module parameters section, "number" is DNA base position,
   and "identifier" immediately following it is that which will be
   used in that position's module name.

5) Only one set per line is allowed, without its module group.  Module
   groups follow on subsequent lines; more than one per line is allowed,
   but they may not be truncated by the end of a line.

*)
(* end module describe.modin.use *)
version = 1.43; (* of describe.modin.use 1993 Jan 27 *)
(* begin module describe.modlen *)
(*
name
      modlen: determine module lengths

synopsis
      modlen(fin: in, fout: out, modlenp: in, output: out)

files
      fin: a text file containing modules
      fout: a list of the module names and their lengths
      modlenp: parameters to control modlen.
         if the file is empty, modlen gives a complete list of
         modules and their lengths to fout, and notes those
         longer than 'a certain number of lines' lines to output.
         otherwise the file is expected to contain two integers,
         each at the start of a line.  these are the shortest and
         longest lengths to print to output.
      output: messages to the user and modules with lengths
         determined by modlenp.

description
      the delila manual consists of module pages.  this tool allows
      you to find out if the pages will fit onto your printer.

see also
      module, show, break

author
      thomas d. schneider

bugs
      none known

technical notes
      'a certain number of lines' is set by the constant defshort.
*)
(* end module describe.modlen *)
version = 1.36; (* of modlen 89 July 14 *)
(* begin module describe.module *)
(*
name
      module: module replacement program

synopsis
      module(sin: in, modlib: in, sout: out,
             modcat: inout, list: out, output: out)

files
      sin: the source program or file
      modlib: a library of modules (if empty, modules of sin are stripped)
      sout: the source program with modules replaced from modlib
      modcat: an alphabetic index to modlib that is recreated
         if it does not match modlib
      list: progress of the transfer.  meaning of the list columns:
         nesting depth:  how deeply the module was nested inside other modules
         action:  what was done with the module.  if a module was not
            transferred, a symbol on the left flags the situation:
              (blank) successful transfer
            * module not found in the source
            v no transfer because version modules can not be transferred
            ? recursive transfers were aborted because the modules may be
              infinitely nested (the depth at which this happens can be
              increased by changing the program - ask your programmer).
              (problem: can you construct this bizarre infinite situation?)
         module name: the name of the module in the source.  in recursive
            cases, these are from the modlib.
      output: messages to the user

description
      the module program allows one to construct libraries of special
      purpose program modules, which one simply 'plugs' into the
      appropriate place in a program.  this speeds up both program design
      and error correction.  module is more general-purpose than the standard
      'include' type processes because it performs a replacement rather than
      a simple insertion.  the operation is recursive, so a module may be
      composed of other modules.  the replacement mechanism also allows one to
      run the program in 'reverse' so that module-libraries are created by
      extracting modules from existing programs.  this makes the building of
      module libraries easy, and helps keep them updated with new modules and
      improvements to old ones.
         for a full description, see the documentation.

documentation
      moddef, delman.assembly.modules,
      delman.intro.organization 'technical notes'

see also
      delmod, prgmod, matmod, break, show (especially...)

author
      thomas d. schneider

bugs
      none known

*)
(* end module describe.module *)
version = 'module 6.07  88 jan 6 tds';
(* begin module describe.mstrip *)
(*
name
   mstrip: remove control m's from a file

synopsis
   mstrip(input: in, output: out)

files
   input: a file which contains control m's (^M as seen in vi)
      which one desires not to have control m's
   output: the input copied to the output without the ^M's.

description
   the tip program in conjunction with the cyber produces extra
   control m's at the ends of lines in the scripts.  this program
   removes them.
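
   the same operation takes one line in python.  this sketch is not the
   mstrip source; the function name is hypothetical, and it removes every
   control m (carriage return) in the data, not only those at line ends.

```python
def strip_control_m(data):
    """Return the bytes with every control m (carriage return) removed."""
    return data.replace(b"\r", b"")


sample = b"first line\r\nsecond line\r\n"
print(strip_control_m(sample))
```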

author
   tom schneider

bugs
   none known
*)
(* end module describe.mstrip *)
version = 1.01; (* of mstrip.p 1993 Jan 27
(* begin module describe.nocom *)
(*
name
   nocom: remove comments

synopsis
   nocom(input: in, output: out)

files
   input:  a program with comments.
   output: the same program with the contents of the comments removed

description
   This program removes comments from a Delila or Pascal source code
   so that one can compare two outputs of dbinst.

see also
   dbinst

author
   Thomas Dana Schneider

bugs
   may not apply to nocom:

   WARNING:  Some programs have comment starts inside quotes.  DECOM
   IS NOT SMART ENOUGH TO AVOID CHANGING THESE.  If they exist, nocom
   will mess up your program.  Compare the output of nocom with the
   input before you accept the results.

*)
(* end module describe.nocom *)
version = 1.03; (* of nocom.p 1990 May 14
(* begin module describe.normal *)
(*
name
   normal: generate normally distributed random numbers

synopsis
   normal(normalp:in, data: out, output: out) 

files 
   normalp: parameter file controlling the program.
      Two numbers, one per line:
         seed: random seed to start the process
         total: the number of numbers to generate
   data: This is a set of numbers which should have Gaussian distribution
      if the random number generator is a reasonable one.
      It will be N(0,1), a normal distribution
      with mean 0 and standard deviation 1.
   genhisp: control file for the genhis histogram plotting program.
   output: messages to the user

description 
   Test of a random number generator by creating a Gaussian
   distribution of numbers for plotting by genhis.
   Method: if U is uniformly distributed on the interval [0..1] and
   Un and Un+1 are two successive values of U, then define
      theta = 2 pi Un
      r     = sqrt(-2 ln(Un+1))
   then when these polar coordinates are converted to Cartesian
   coordinates, one gets two independent Normally distributed numbers,
   with mean 0 and standard deviation 1.  To get other standard deviations
   multiply by a constant, and to get other means, add a constant.

   The proof was from a friend; I only have sketch notes at the moment.
   I'm sure it is available in standard texts.  However, it works,
   as shown by the example.
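
   The method above is commonly known as the Box-Muller transform.  A
   minimal sketch in Python follows.  It is not the code of normal.p: the
   names normal_pair and generate are hypothetical, and Python's own
   random number generator stands in for the one being tested.

```python
import math
import random


def normal_pair(u1, u2):
    """Convert two uniform numbers into two independent N(0,1) numbers.

    u1 sets the angle theta and u2 sets the radius r, as in the method
    above.  u2 must be greater than 0 so that ln(u2) is defined.
    """
    theta = 2.0 * math.pi * u1
    r = math.sqrt(-2.0 * math.log(u2))
    return r * math.cos(theta), r * math.sin(theta)


def generate(seed, total):
    """Generate total normally distributed numbers (hypothetical driver)."""
    rng = random.Random(seed)      # Python's generator, not the one in normal.p
    data = []
    while len(data) < total:
        u1 = rng.random()
        u2 = 1.0 - rng.random()    # shift [0,1) to (0,1] to avoid ln(0)
        data.extend(normal_pair(u1, u2))
    return data[:total]
```

   To get other standard deviations or means, scale and shift each value,
   for example 5.0 + 2.0 * x for mean 5 and standard deviation 2.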

example
  seed := 0.5;
  total := 10000;
     The mean was 0.00 (to two places) and the standard deviation was 1.01.

see also
   gentst, tstrnd, genhis

author
  Tom Schneider
  National Cancer Institute
  Laboratory of Mathematical Biology
  Frederick, Maryland
  toms@ncifcrf.gov

bugs
   none known
  
*)
(* end module describe.normal *)
version = 3.16; (* of normal.p 1993 Jan 27
(* begin module describe.notex *) 
(*
name
   notex: remove tex and latex constructs
  
synopsis
   notex(input: in, output: out) 
  
files 
   input: a tex or latex file
   output: the file with:
      '\xxx' command words converted to spaces,
      '{$}' converted to spaces
      free floating '.' ',' '(' ')' removed
      comments (%) removed

      multiple spaces are compressed to single spaces.
      multiple lines are compressed to 2 lines (to preserve the
      paragraph structure).

      Only alphabetic characters, numbers and blanks are left behind.

description 
   This reduces the number of words counted by wc to something close to correct.
   It is harsher than untex in that it specifically filters out
   everything except numbers, alphabetic characters and the blank.

author
   Thomas D. Schneider 
  
bugs
   citations and comments on lines by themselves leave a blank line.
  
*)
(* end module describe.notex *)
version = 1.32; (* of notex.p 1991 February 1
(* begin module describe.nulldate *)
(*
name
   nulldate:  modules to neutralize the date-time functions

synopsis
   nulldate(output: out)

files
   output: where the (neutralized) date and time will appear.

description
   If transportation of a program or translation to C is hindered by the
   presence of the date-time modules, then one may want to blank out the
   function of those modules for the time being.  Thus all the dates produced
   will be zero, but one will be able to run the programs.  Nulldate contains
   modules that will replace corresponding modules in the other module
   libraries which are system dependent.  This will allow easy transportation
   of the Delila system to other computers.

documentation
   moddef, delman.describe.module

see also
   delman.describe.delmod, moddef, delman.describe.module
   delmods, prgmods, matmods, vaxmod

author
   Thomas D. Schneider

bugs
   none known

technical notes
   The datetime package required a const 'namelength' and a type 'alpha'.
   These are part of the book.const and book.type modules of delmod, and
   are identical to those types and consts.  Note:  programs which use
   the datetime package must have these types and consts either from
   delmod or manually declared.
*)
(* end module describe.nulldate *)
version = 1.03; (* of nulldate 1991 Nov 5 *)
(* begin module describe.number *)
(*
name
   number: add line numbers to a file

synopsis
   number(input: in, output: out)

files
   input:  input file
   output: input file with line numbers at the start

description
   Add line numbers to the input file.
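
   A minimal sketch in Python (not the source of number.p; the name
   number_lines and the width parameter are hypothetical, and the
   separating blank is an assumption about the output layout):

```python
def number_lines(lines, width=6):
    """Prefix each line with its line number, right-justified in width columns."""
    return [f"{n:>{width}} {line}" for n, line in enumerate(lines, start=1)]
```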

examples

documentation

see also

author
   Thomas Dana Schneider

bugs
   Perhaps should have option to define number of columns for
   the line numbers.

technical notes

*)
(* end module describe.number *)
version = 1.08; (* of number.p 1991 September 16
(* begin module describe.odti *)
(*
name
   odti: munch od and time plates together for xyplo

synopsis
   odti(od: in, time: in, odtime: out, output: out)

files
   od: a file containing just an od plate from the tk program.
      blank wells are indicated by an '*'.
   time: a file containing times.
      blank wells are indicated by an '*'.
   odtime: the od and time values are spliced together,
      lines beginning with * are copied to output, then
      the time followed by each od are put on lines by themselves.
      this is the form that the xyplo program can use for
      plotting.

description
   the od and time plates are fused together for plotting with xyplo.

author
   tom schneider

see also
   xyplo and tk (written in basic)

bugs
   does not take full tk output.

*)
(* end module describe.odti *)
version = 1.02; (* of odti.p 1993 Jan 27
(* begin module describe.palinf *)
(*
name
      palinf: find palindromes, based on information theory

synopsis
      palinf(book: in, fout: out, palinfp: in, output: out)

files
      book: a book from the delila system
      fout: locations of palindromes
      palinfp: parameters to control palinf, one per line
         1. the minimum rsequence of the palindrome to detect.
            alternatively, if the number is negative, it is the
            desired significance of the detected peaks, given in
            standard deviations.
         2. (optional) size (integer).  the largest size palindrome allowed;
            base pairs across both halves of the site.  if omitted, the
            entire sequence is used (which may be very expensive).
            if this number is even, the next higher odd number will be used.
         3. (optional) if the first character of this line is an 'm' then
            palinf will plot palindrome size (m) versus information content
            (rsequence).  a sharply rising curve indicates a good palindrome.
            'x' means plot position (x) versus information content (rsequence).
            a different character, such as 'n', means to list
            the detected palindromes.
      output: messages to the user.

description
      Each piece of the book is searched for imperfect palindromes with
      significance determined by the first parameter in palinfp.
      There are two kinds of palindrome: even and odd, referring to
      the size of the palindrome in bases.  An odd palindrome will have a
      central base, while an even one will not have one.  Method of use:
      search without the 'm' option to pick out sites of interest.  Then use
      'm' under 'stringent conditions' or on a smaller fragment to see the
      structure of the palindrome.  The final r value will be the maximum of
      r values for all smaller palindromes.
      Note: equiprobable compositions are assumed for e(hnb).

examples
      the parameters [21/71/m] will locate the E. coli lac operator
      uniquely in the 401 bases surrounding the start of the lacZ transcript.

documentation
      Schneider, T.D., G.D. Stormo, L. Gold and A. Ehrenfeucht (1986)
      The information content of binding sites on nucleotide sequences.
      J. Mol. Biol. 188: 415-431.

author
      thomas schneider

bugs
      If parameter 2 is very large, spurious sites will be found.

technical notes
      Limiting the size of the palindrome will increase the search speed.
*)
(* end module describe.palinf *)
      version = 2.28; (* of palinf 1987 feb 10
(* begin module describe.parse *)
(*
name
      parse: breaks a book into its components

synopsis
      parse(book: in, list: out, parsep: in, output: out)

files
      book: a book from the delila system
      list: a listing of the parts in the book
      parsep: parse parameters from the user
         if parsep is empty, default values are used.  otherwise, parsep
         must contain five lines corresponding to the variables
         that the user may reset.  they are:
            number of bases printed per line
            symbol to mark the end of sequences
            print header information
            print information about each sequence
            print raw sequences
         the last 3 items are boolean (true/false) values.  if you want to
         have the information, put a t (standing for true) at the beginning of
         the line.  if you do not want it, put an f (standing for false).
      output: messages to the user

description
      to parse is to break into component parts.  this program breaks a book
      into parts.  this allows one to easily look at sequences of a book
      without having to look at the book structure or the fancy listing
      provided by the lister program.

examples
      if parsep contains 60/./f/f/t then the sequences will be listed, with
      the '.' character ending each sequence.  all other information would be
      lost.

author
     thomas schneider

bugs
      only piece information is listed.

*)
(* end module describe.parse *)
      version = 2.20; (* of parse 1988 feb 24
(* begin module describe.patana *)
(*
name
      patana: pattern analysis

synopsis
      patana(pattern: in, anal: out, output: out)

files
      pattern: a pattern matrix, the output of the pattern learning
         program patlrn;
      anal: the analysis of the pattern matrix;
      output: for messages to the user;

description
      patana does some simple analyses of a pattern matrix.  for each
      position (i.e., row) of the matrix it calculates the:
         sum;
         average;
         variance;
         maximum;
         minimum.
      it also calculates the sum of each of those measures.

      the sum of sums is used in other calculations.  if all training
      sequences are the same length, this is the difference between
      the number of + class sequences and - class sequences added
      together to make the w matrix.

      the sum of the average is an estimate of the mean response to
      random sequences

      the sum of the variance is a variance that estimates the
      spread of responses to random sequences.  take the
      square root to obtain the standard deviation.

      the sum of the maxima is the largest response possible.

      the sum of the minima is the smallest response possible.
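
      The calculations above can be sketched in Python as follows.  This
      is not the patana source: the name analyze is hypothetical, and the
      use of the population variance (dividing by the number of bases in
      a row) is an assumption.

```python
def analyze(matrix):
    """Return per-row statistics and their totals for a pattern matrix.

    matrix is a list of rows, one row per position, each row holding the
    weights for the four bases.  It must contain at least one row.
    """
    rows = []
    for row in matrix:
        n = len(row)
        mean = sum(row) / n
        # Population variance: an assumption, appropriate if the four
        # bases are taken as equally likely in random sequences.
        variance = sum((w - mean) ** 2 for w in row) / n
        rows.append({"sum": sum(row), "average": mean, "variance": variance,
                     "maximum": max(row), "minimum": min(row)})
    totals = {key: sum(r[key] for r in rows) for key in rows[0]}
    return rows, totals
```

      The standard deviation of the response to random sequences is then
      the square root of totals["variance"].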

see also
      patlrn

author
      gary d. stormo (modified by tom schneider)

bugs
      none known
*)
(* end module describe.patana *)
version = 2.18; (* of patana 1987 jul 2
(* begin module describe.patlrn *)
(*
name
      patlrn: pattern learning

synopsis
      patlrn(funcbook: in, funcinst: in, nfuncbook: in, nfuncinst: in,
            pattern: out, start: in, minmax: in, ignore: in, patlrnp: in,
            output: out)

files
      funcbook: the book of sequences belonging to the functional class;
      funcinst: the instructions for funcbook, for aligning the sequences;
      nfuncbook: the book of sequences for the nonfunctional class;
      nfuncinst: the instructions for nfuncbook, for aligning the seqs;
      pattern: the resulting wmatrix which separates the classes;
      start: a matrix for initializing wmatrix to. it is initialized to
            all 0's if this file is empty;
      minmax: to set the values of funcmin (the minimum value for a functional
            sequence) and nfuncmax (the maximum value for a nonfunctional
            sequence). if this file is empty they are set to 1 and 0,
            respectively, and vary along with the matrix;
      ignore: a file specifying regions of the sequences which are to be
            ignored in the learning process; the maximum number of regions
            which can be ignored is set by the constant 'maxignore';  the
            file must contain two integers per line, the first specifying
            the 5' end and the second the 3' end of the region to be ignored.
      patlrnp: parameter file for setting maxtimes, the number of times
            through all the sequences before stopping without a solution;
      output: for messages to the user.

description
      patlrn uses the 'perceptron' algorithm to find a weighting function
      (a 'wmatrix') which serves to distinguish the sequences in the two
      classes from one another.  our paper, stormo et al., nar 10, 2995 (1982),
      describes the algorithm in detail and gives an example of its use.
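
      A rough sketch of a perceptron of this kind, in Python.  It is not
      the patlrn source, and several points are assumptions: the names
      score and learn are hypothetical, funcmin and nfuncmax are held
      fixed at 1 and 0, the start and ignore files are not modelled, and
      each misclassified sequence changes the matrix by one unit per base.

```python
BASES = "acgt"


def score(wmatrix, seq):
    """Sum the weights selected by each base of an aligned sequence."""
    return sum(wmatrix[pos][base] for pos, base in enumerate(seq))


def learn(functional, nonfunctional, maxtimes=100, funcmin=1, nfuncmax=0):
    """Perceptron sketch: return (wmatrix, solved).

    All sequences are assumed to be aligned, of equal length, and to
    contain only the letters in BASES.
    """
    length = len(functional[0])
    wmatrix = [dict.fromkeys(BASES, 0) for _ in range(length)]
    for _ in range(maxtimes):
        changed = False
        for seq in functional:
            if score(wmatrix, seq) < funcmin:      # scored too low: add it
                for pos, base in enumerate(seq):
                    wmatrix[pos][base] += 1
                changed = True
        for seq in nonfunctional:
            if score(wmatrix, seq) > nfuncmax:     # scored too high: subtract it
                for pos, base in enumerate(seq):
                    wmatrix[pos][base] -= 1
                changed = True
        if not changed:
            return wmatrix, True                   # a full pass with no errors
    return wmatrix, False                          # stopped without a solution
```

      Here maxtimes plays the role described for the patlrnp file: the
      number of passes through all the sequences before giving up.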

see also
      patlst, patana, patser, patval

author
      gary d. stormo (modified by tom schneider)

bugs
      the section of code for ignoring regions of the sequences in the
      learning process (i.e., when the file 'ignore' is not empty) has
      been overlaid over the rest of the code, rather than worked into
      it, and consequently, using this feature can be quite inefficient.

technical notes
      the program will be more efficient if the constant 'dnamax' in the
      module 'book.const' is made to be the size of the sequences used
      by the program.  for instance, setting it to whatever 'maxmatrix'
      is would be a good idea.
*)
(* end module describe.patlrn *)
version = 3.24; (* of patlrn 1986 dec 9
(* begin module describe.patlst *)
(*
name
      patlst:  lister of patlrn output.

synopsis
      patlst(pattern: in, patout: out, patlstp: in, output: out)

files
      pattern: the input pattern matrix to be reformatted;  this is the
               output of the program patlrn.
      patout:  the output reformatted pattern matrix.
      patlstp: a parameter file for specifying the pagewidth of the patout
               file;  must contain an integer as the first thing on the first
               line, which specifies the number of matrix elements to be printed
               across a page;  if this file is empty the pagewidth is set to the
               constant 'defpagewidth'.
      output:  for messages to the user.

description
      patlst takes the output from the patlrn program and reformats the
      pattern matrix to run horizontally across the page. it is broken
      so that it fits neatly on the page.  this is useful for making
      publishable copies of the pattern matrices.

see also
      patlrn, patana

author
      gary d. stormo

bugs
      none known
*)
(* end module describe.patlst *)
version = 1.08;  (* of patlst 1989 July 8
(* begin module describe.patser *)
(*
name
      patser: pattern searcher

synopsis
      patser(book: in, pattern: in, scale: in, patserp: in,
             values: out, inst: out, output: out)

files
      book: the book of sequences to be searched. only numbered sequences are
         searched.
      pattern: the pattern used to search with. this is the output of the
         pattern learning program, patlrn.
      scale: contains one integer, by which the values should be divided
         to bring them into the correct scale if a matrix from rseq was used.
      patserp: parameter file, to set the value of 'printmin', the minimum
         value of a site in order for it to be identified in the file 'values'.
         if this file is empty, 'printmin' is set to the functional sequence
         minimum of the pattern matrix.
      values: the sites, and their values, which are evaluated above 'printmin'.
      inst: the instructions to get the regions around the sites identified
         in the file 'values'.  the region obtained is identical to the pattern
         used in the search.
      output: for messages to the user.

description
      patser uses a pattern matrix, the output of the pattern learning program
      patlrn, to search a book of sequences.  each base in each sequence is
      used as the 'aligned base', and the sites which are evaluated above
      'printmin' are identified in the file 'values'.  instructions which can
      be used to obtain those sites, and the nucleotides around them over the
      region contained in the pattern matrix, are put into the file 'inst'.

      NOTE: if the pattern is off the end of the sequence it is
      no longer reported.
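
      The search can be sketched in Python as follows.  This is not the
      patser source: the name search is hypothetical, the matrix is held
      as a list of dictionaries rather than read from a pattern file, the
      reported position is that of the first base under the pattern, and
      a site is reported when its value is at least printmin.

```python
def search(sequence, wmatrix, printmin):
    """Return (position, value) for every site valued at printmin or more.

    wmatrix is a list with one dict of base weights per pattern position.
    Position is the 1-based coordinate of the first base under the
    pattern (an assumption; the real aligned base depends on the matrix).
    """
    width = len(wmatrix)
    sites = []
    # Stop early enough that the pattern never runs off the sequence end.
    for start in range(len(sequence) - width + 1):
        value = sum(wmatrix[i][sequence[start + i]] for i in range(width))
        if value >= printmin:
            sites.append((start + 1, value))
    return sites
```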

see also
      patlrn, patval

author
      gary d. stormo (modified by tom schneider)

bugs
      none known
*)
(* end module describe.patser *)
version = 2.31; (* of patser.p 1992 June 16
(* begin module describe.patval *)
(*
name
      patval: pattern evaluations of aligned sequences

synopsis
      patval(book: in, inst: in, pattern: in, scaleup: in,
            values: out, output: out)

files
      book: the book of sequences to be evaluated;
      inst: the instructions generating the book, for alignment;
      pattern: the wmatrix used to evaluate;
      scale: contains one integer, by which the values should be divided
         to bring them into the correct scale if a matrix from rseq was used.
      values: the value of each sequence in the book;
      output: for messages to the user.

description
      Patval uses a pattern matrix (the output of patlrn) to evaluate a
      book of aligned sequences.

see also
      delman.use.perceptron
      patlrn, patser, patlst, patana

author
      Gary D. Stormo

bugs
      none known
*)

(* end module describe.patval *)
version = 2.23; (* of patval 1989 Mar 29
(* begin module describe.pbreak *)
(*
name
   pbreak: breaks a file into pages at a certain trigger phrase

synopsis
   pbreak(pbreakp: in, input: in, output: out, list: out)

files
   pbreakp: The parameter file which contains the trigger on one line.
      Only one trigger is allowed in pbreakp.
      The next line may contain one integer which represents the right most
      position (in characters, 1 is the first character on a line)
      where the trigger will be looked for.  Default is an enormous number.
   input: the file to break up
   output: the broken file
   list: where messages will appear.

description
   The program pbreak will go through a file, line by line, looking for a
   "trigger" phrase.  Upon finding the trigger on a line, pbreak will insert a
   "new page" mark at the beginning of the line.  This will cause the printer
   to start a new page at this line when the file is printed.  A page number is
   added and an alphabetical index of the lines containing the trigger strings
   and their page numbers is printed at the end of the output file.  The
   trigger phrase can be any string of characters and is in the file pbreakp.
   The pbreak program is thus useful for breaking up large files into
   workable-size chunks, or to make a large file more readable.
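
   The trigger test and index can be sketched in Python as follows.  This
   is not the pbreak source: the name break_pages is hypothetical, a form
   feed is assumed as the "new page" mark, page numbers are only recorded
   in the index rather than printed on the line, and the first page is
   assumed to be page 1.

```python
FORM_FEED = "\f"   # assumed "new page" mark; the real one is system dependent


def break_pages(lines, trigger, rightmost=None):
    """Return (output lines, index) for a list of input lines.

    A line starts a new page when the trigger begins at or before
    character position rightmost (1 is the first character on a line).
    """
    out, index, page = [], [], 1
    for line in lines:
        where = line.find(trigger)          # 0-based, -1 if absent
        if where >= 0 and (rightmost is None or where < rightmost):
            page += 1
            out.append(FORM_FEED + line)
            index.append((line.strip(), page))
        else:
            out.append(line)
    index.sort()                            # alphabetical index of triggers
    return out, index
```

   With the trigger "procedure" and rightmost set to 1, only lines that
   begin with "procedure" at the left margin start a new page.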

examples
   Pbreak has been used to make pascal source code easier to read and
   work with by using the trigger "procedure" to make a file which
   when printed has one procedure to a page.  Pbreak also has been used
   to make the delila manual, delman.  Delman is one large, continuous
   textfile, and pbreak is used to break delman into its formatted pages
   by using the parameters 
        (@ begin module
        1
   which will only recognize modules that begin at the left margin.
   (Note: the @ in the example above must be replaced by a '*' to
   make the example work.  The (@ form fools the compiler, and prevents
   it from thinking I'm doing something funny.)

documentation
   delman.intro.organization: "technical notes"

author
   Patrick R. Roche
   modified by Tom Schneider

bugs
   none known

technical notes
   Three procedures, firstpage, makepage and lastpage contain instructions for
   forming new pages.  These are system dependent.  Constant pagelength
   determines the size of the page.

   If a line is too wide, the page number will not be printed; however, the
   number will overwrite the characters on the line in the index at the end.
   The constant "top" defines the maximum length of a buffer, thus the maximum
   length of an input line or a trigger.  The constant "pagewidth" defines the
   page width for numbering of pages and the printing of the index.  Pagewidth
   should be set to the desired page width - 1 to come out right.  The constant
   liston (true or false) indicates whether or not to display the index on the
   file list.

*)
(* end module describe.pbreak *)
version = 4.26; (* of pbreak.p 1993 Jan 27
(* begin module describe.pcs *)
(*
name
   pcs: partial chi squared

synopsis
   pcs(pcsp: in, list: out, output: out)

files
   pcsp:  A series of lines, each of which contains
     4 integers, representing the numbers of a,c,g,t.
   list: partial chi squares calculated for the integers
   output: messages to the user

description
   calculate the partial chi squared values of 4 bases
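
   A minimal sketch in Python (not the source of pcs.p; the name
   partial_chi_squared is hypothetical).  The expected count for each
   base is assumed to be one quarter of the total, and the total must be
   greater than zero:

```python
def partial_chi_squared(counts):
    """Return the partial chi squared value for each of the 4 base counts.

    The expected count is taken as total/4, which is an assumption of
    equally likely bases.
    """
    expected = sum(counts) / len(counts)
    return [(observed - expected) ** 2 / expected for observed in counts]
```

   The sum of the four partial values is the usual chi squared statistic
   for the line.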

examples

documentation

see also

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.pcs *)
version = 1.02; (* of pcs.p 1991 August 23
(* begin module describe.pemowe *)
(*
name
      pemowe: peptide molecular weights

synopsis
      pemowe(book: in, list: out, output: out)

files
      book: any book from the delila system.  only one weight is given per
         piece in the book.  the first triplet of the piece is the first
         codon translated.
      list: a list of the piece numbers and names in the book, along with
         the molecular weight of the peptide and
         the number of atoms of the peptide for each piece.
      output: messages to the user.

description
      pemowe is designed to find the molecular weights of polypeptides
      that might be coded by a particular sequence.  it is to be used in
      cases where one knows where a particular peptide is, not when one wants
      the weights of all possible peptides.  one should use delila to
      construct the book.  the calculation of the weights takes into account
      loss of water for each peptide bond formed.  calculation ends at stop
      codons or at the end of the piece.

examples
      sanger et al., nature 265: 687 (1977) on page 692 gives a list of
      calculated molecular weights from phix174.  these can be used to test
      pemowe, by using the delila instructions expepin.  the largest deviation
      from sanger's numbers is for gene j at 3 percent.

documentation
      data is from the crc handbook of chemistry and physics 60th ed, 1980.

author
      thomas d. schneider

bugs
      only one peptide per piece is calculated.  one could write another
      program that predicted peptides (like lister), and generated
      instructions for pulling out those peptides using delila.

*)
(* end module describe.pemowe *)
version = 2.20; (* of pemowe.p 1993 Jan 19
(* begin module describe.prgmod *)
(*
name
      prgmod: programming modules for the delila system

synopsis
      prgmod(input: intty, output: out)

files
      input: interactive file used for testing the program
      output: messages to the user.

description
      prgmod is a set of generally useful modules for programming.
      these include procedures for interactive input/output, producing
      bars of numbers for graphs (called 'numbars') and sorting of
      arrays with a very fast algorithm.
         successful compilation and running of the program indicates
      that the modules are correct.  the program is interactive, so
      to test the modules, follow the instructions prgmods provides.

see also
      delmod, module, alist (uses numbar), index (uses quicksort)

author
      thomas d. schneider

bugs
      none known

technical notes
      the interactive routines may have to be changed when the program
      is transported.

*)
(* end module describe.prgmod *)
version = 4.12; (* of prgmod.p 1993 Mar 26
(* begin module describe.quoteline *)
(*
name
   quoteline: add quote marks to the beginning of every line in a file

synopsis
   quoteline(input: in, output: out)

files
   input:  input file
   output: input file with " at the start of every line

description
   Add quotes to the input file.  This allows generation of moo notes easily.
   One may "cut and paste" pieces of the file into a @edit of a note, and the
   lines will be added to the file.  Crude but effective.

examples

documentation

see also

author
   Thomas Dana Schneider

bugs
   A poor way to do things.  Ftp would be better.

technical notes

*)
(* end module describe.quoteline *)
version = 1.01; (* of quoteline.p 1993 February 5
(* begin module describe.rara *)
(*
name
   rara: rank-rank reformulation of a data set

synopsis
   rara(data: in, xyin: out, output: out)

files
   data: a data set with two columns.  '*' on the start of lines
      are comments copied to the output xyin file.
   xyin: Doubly sorted data.  The first two columns are the original two data
      columns.  The first data column is sorted.  The third column is the rank
      of the first column (1 to n, in order).  The fourth column is the rank of
      the second column (1 to n but no longer in order).

   output: messages to the user

description
   To test data correlations but to make them insensitive to outliers, the data
   can be ranked and then graphed or the correlation coefficient found by
   xyplo.

   First, the data pairs are sorted on the second data column.  The second data
   column is then assigned ranks (1 to n).  The data are then sorted again on
   the first data column and the first data column is assigned ranks.  This
   leaves the first data column sorted.
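
   The two sorting steps can be sketched in Python as follows.  This is
   not the source of rara.p: the name rank_rank is hypothetical, and tied
   values simply receive consecutive ranks, which is an assumption.

```python
def rank_rank(pairs):
    """Return rows (x, y, rank of x, rank of y), sorted on x.

    Ties get consecutive ranks in sorted order, which is an assumption.
    """
    # Sort on the second column and assign its ranks (1 to n).
    by_y = sorted(pairs, key=lambda p: p[1])
    ranked = [(x, y, rank) for rank, (x, y) in enumerate(by_y, start=1)]
    # Sort again on the first column and assign its ranks.
    by_x = sorted(ranked, key=lambda r: r[0])
    return [(x, y, xrank, yrank)
            for xrank, (x, y, yrank) in enumerate(by_x, start=1)]
```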

examples

documentation

see also
   xyplo.p

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.rara *)
version = 1.02; (* of rara.p 1993 March 16
(* begin module describe.rawbk *)
(*
name
      rawbk: make a raw sequence into a book

synopsis
      rawbk(raw: in, book: out, input: intty, output: out)

files
      raw: a file with a sequence on it, that is only the letters
         a,c,g,t, or u, with any spacing and carriage returns.
      book: a file which contains the sequence in the book form
         such that it will interface with the delila system programs.
      input: the interactive input from the keyboard. rawbk needs to
         get some information from the user to name the sequence.
      output: where error messages will appear.

description
      The purpose of this program is to allow one to rapidly create a book
      from a raw sequence. rawbk will take a 'raw' sequence and put it into
      the standard form of a book so that the delila system programs can be
      used on the sequence.  The user is asked for one name, which will
      become the name of all things in the book (title, organism,
      chromosome and piece).
         The program reads through 'raw', keeping track of characters and lines.
      It will flag any letters other than 'a','c','g','t', or 'u', that appear
      in the file and note their locations. it will count the bases. if any
      characters were flagged, or any other error occurs, rawbk will
      put 'halt' into the book, in the same form the librarian does, to
      prevent further use of the book.  Otherwise, the book is constructed
      to contain one piece of sequence.  The coordinates begin with base 1.
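
      The checking pass can be sketched in Python as follows.  This is not
      the rawbk source: the name check_raw is hypothetical, and writing
      the book (or 'halt') is left out.

```python
def check_raw(text):
    """Return (number of bases, list of flagged characters) for raw text.

    Each flagged entry is (character, line number, position in line),
    both counted from 1.  Blanks and line ends are skipped.
    """
    bases, flagged = 0, []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for column, ch in enumerate(line, start=1):
            if ch in "acgtu":
                bases += 1
            elif not ch.isspace():
                flagged.append((ch, lineno, column))
    return bases, flagged
```

      A book would be written only when the flagged list comes back empty.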

see also
      makebk

author
      Thomas D. Schneider

bugs
      The program should use book writing routines from delmods, but it
      has not been updated yet.

*)
(* end module describe.rawbk *)
version = 3.12; (* of rawbk 1988 july 9
(* begin module describe.ref2bib *)
(*
name
   ref2bib: refer to bibtex converter

synopsis
   ref2bib(refs: in, bib: out, output: out)

files
   refs: reference list used by refer
   bib: reference list used by bibtex
   output: messages to the user

description
   The program converts from refer to bibtex reference list formats.

documentation
   man refer, LaTeX reference manual

author
   Thomas Dana Schneider
   National Cancer Institute
   Laboratory of Mathematical Biology
   Frederick, Maryland
   toms@ncifcrf.gov

bugs
   only a few of the refer types have been converted.
   see comments in the code for the entire list.

*)
(* end module describe.ref2bib *)
version = 1.46; (* of ref2bib 1988 December 14
(* begin module describe.refer *)
(*
name
      refer: print the references in the pieces of a book

synopsis
      refer(book: in, list: out, output: out)

files
      book: any book from the delila system
      list: references in the pieces of the book, organized by organism
         chromosome and map location.
      output: messages to the user.

description
      refer is a convenient way to obtain the references for pieces.
      since each piece note contains other information, this will also be
      printed.

author
      thomas schneider

bugs
      references have no standard format, so the output can not be
      formatted more than what is in the book.

*)
(* end module describe.refer *)
version = 2.06; (* refer 1986 dec 2
(* begin module describe.reform *)
(*
name
      reform: raw sequences reformatted

synopsis
      reform(fin: in, fout: out, input: intty, output: out)

files
      fin: the raw sequences to be reformatted.  the file must contain
         only the letters: 'a', 'c', 'g', 't', and/or 'u'.
      fout: the reformatted sequence.
      input: the user defines how to reformat the sequence:
         reformat - the sequence is only reformatted.
         invert - the order of the sequence stays the same, but the bases
            are complemented.
         complement - the order of the sequence is reversed and the bases
            are complemented.
         the user also specifies the number of bases to be printed on
         each line of fout.
      output: messages to the user.

description
      Reform allows one to type a file containing raw sequence data in
      whatever form is convenient, and to convert it into a form that
      the merge program can use to compare to a second typed copy.
      For example, when a sequence is to be entered from two strands,
      the sequence and its complement, one can enter both strands and then
      invert the second strand for comparison by merge.  Alternatively,
      the second strand could be entered backwards and the complement taken
      using this program.
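
      The three operations, as defined under 'files' above, can be
      sketched in Python as follows.  This is not the reform source: the
      function names follow the option names but are otherwise
      hypothetical, and treating 'u' like 't' is an assumption.

```python
PAIRS = dict(zip("acgtu", "tgcaa"))   # u is read as t (an assumption)


def invert(seq):
    """Complement every base, keeping the order of the sequence."""
    return "".join(PAIRS[base] for base in seq)


def complement(seq):
    """Reverse the order of the sequence and complement every base."""
    return invert(seq)[::-1]


def reformat(seq, per_line):
    """Split a sequence into lines of per_line bases."""
    return [seq[i:i + per_line] for i in range(0, len(seq), per_line)]
```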

see also
      merge

author
      Thomas Schneider

bugs
      The first typed line is ignored because of a problem with
      the standard input procedures on Unix.  Simply type a carriage
      return when the program starts up.

*)
(* end module describe.reform *)
version = 1.20; (* of reform 1992 September 12
(* begin module describe.rembla *)
(*
name
      rembla: remove blanks from ends of lines in a file

synopsis
      rembla(fin: in, fout: out, output: out)

files
      fin: a text file
      fout: a copy of fin with trailing blanks removed from all lines,
         any blank lines at the end of the file will also be removed.
      output: messages to the user

description
      blanks can creep onto the end of lines in a file without one knowing
      it, whether from the computer system, from transportation or from an
      editor.
      this program removes those blanks, so that less storage is needed
      for the file.  some programs require that there be no blank lines in the
      file,  yet transportation can generate blank lines at the end of
      the file.  this program will remove such lines.
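
      A minimal sketch in Python (not the rembla source; the name
      remove_blanks is hypothetical, and only the space character is
      treated as a blank):

```python
def remove_blanks(lines):
    """Strip trailing blanks from each line and drop blank lines at the end."""
    stripped = [line.rstrip(" ") for line in lines]
    while stripped and stripped[-1] == "":
        stripped.pop()
    return stripped
```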

author
      thomas d. schneider

bugs
      none known

*)
(* end module describe.rembla *)
version = 2.08; (* of rembla 1986 dec 12
(* begin module describe.rep *)
(*
name
      rep: records repeats between sequences in two books

synopsis
      rep(hlist: in, xbook: in, ybook: in,
             fout: out, pout: out, repp: in, output: out)

files
      hlist: a list of helices for xbook and ybook generated
         by the program helix.
      xbook: a book from the delila system.
      ybook: a book from the delila system.
      fout:  a file containing the following information about each repeated
         sequence that satisfies the criteria of repp:
         *  the 5 prime ends of the two occurrences of the repeat.
         * rlength, length of the repeated sequence.
         * distance: if direct repeats, the number of bases from five prime end
            to five prime end of each repeat; if inverted repeats, the number
            of bases from three prime end to five prime end (i.e., pseudo-loop
            distance). in every case, the smallest possible distance is given.
      pout: a file containing information about palindromes (only filled when
            inverted repeats are found in related sequences, see below).
      repp: input parameter file, must contain 3 characters, one per line.
               this may be followed by 4 integers, one per line.
         * mode of repeat:
            d = direct repeat (xbook and ybook have opposite directions)
            i = inverted repeat (xbook and ybook are in the same direction)
         * the types of xbook and ybook used in helix program:
            u = unrelated (any two sequences - no distances are calculated)
            r = related (sequences derived from the same piece of dna. the
                coordinate numbering of both books must be the same in order
                to calculate distances.)
         * the energies of hlist reflect the composition of the repeat
            e = "energies" are to be reported
            n = no "energies" are to be reported
         * minimum number of bases in a repeat to be recorded.
         * maximum number of bases in a repeat to be recorded.
         * minimum distance between repeated sequences to be recorded.
         * maximum distance between repeated sequences to be recorded.
      output: messages to the user.

description
      rep uses information generated by the helix program to record the
      occurrences of repeated sequences of dna. helices are interpreted as
      repeats, direct or inverted depending upon the input sequences.
      repeats that meet the criteria of minimum length and minimum and/or
      maximum distance between half repeats are reported in fout.
      palindromes are reported in pout.

see also
      helix, matrix, keymat

author
      Britta Singer and Lane Wyatt

bugs
      1.  when xbook and ybook have sequence in common, hlist reports
          each "helix" twice.  rep is able to eliminate duplicates only
          when xbook and ybook overlap completely.  thus in cases of
          partial overlap, some repeats may be duplicated.
      2.  rep uses external coordinates to calculate distances and will
          bomb with complicated coordinates.
*)
(* end module describe.rep *)
version = 1.73; (* of rep.p 1993 Jan 27
(* begin module describe.repro *)
(*
name
        repro:  make multiple copies of a file

synopsis
        repro(fin: in, fout: out, input: intty, output: out)

files
        fin: any file of which multiple copies are wanted
        fout: the new copies of the desired files
        input: the parameter file giving (n), the number of copies
           to be made. this is the interactive file.
        output: messages to the user

description
        This tool enables the user to make any number of copies
        of a file.  Each copy begins on a new page.

examples
        A typical use for this program is to make multiple copies
        of the delman manual after breaking it into pages with
        the program break.

see also
      break

author
        Billie Hall Lemmon

bugs
        none known

*)
(* end module describe.repro *)
version = 2.04; (* repro 1986 dec 12
(* begin module describe.rf *)
(*
name
   rf: calculate Rfrequency

synopsis
   rf(input: in, output: out)

files
   input: interactive input from the user
   output: messages to the user

description
   calculate Rfrequency as

   Rf = - log (number of binding sites / genome size)

   where the log is to the base 2, giving the result in bits.
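
   The formula can be checked with a few lines of Python.  This is a sketch
   of the arithmetic only, not the interactive rf program; the function and
   argument names are assumptions.

```python
import math

def rfrequency(number_of_sites, genome_size):
    """Rfrequency in bits: -log2(number of binding sites / genome size)."""
    return -math.log2(number_of_sites / genome_size)
```

   For instance, one site among 1024 possible positions gives
   rfrequency(1, 1024) = 10 bits.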

documentation
   Schneider, T.D., G.D. Stormo, L. Gold and A. Ehrenfeucht (1986)
   The information content of binding sites on nucleotide sequences.
   J. Mol. Biol. 188: 415-431.

author
   thomas d. schneider

bugs
   The program depends on interactive procedures on a particular computer
   and so may need to be modified on transportation.

*)
(* end module describe.rf *)
version = 1.07; (* of rf 1988 October 14
(* begin module describe.ri *)
(*
name
   ri: Rindividual is calculated for every site in the aligned book

synopsis
   Ri(inst: in, book: in, rsdata: in, values: in, rip: in,
      xyin: out, sequ: out, ribl: out, output: out)

files
   inst: delila instructions of the form 'get from 56 -5 to 56 +10;'
      (This file may be empty, in which case the sequences will be
      aligned by their 5' ends.)
   book: the book generated by delila using inst
   rsdata: data file from rseq program
   values: a file containing the values of the objects to which the Ri
      values are to be compared.  The file may be empty.
   rip:  Parameters to control the program.
      On the FIRST LINE are the FROM and TO over which to do the Ri
      calculation.  These must not exceed either of those in the inst/book or
      the rsdata.

      The SECOND LINE defines the column of the values file to use.

      The THIRD LINE: two numbers: the lowest and highest Ri evaluation to
      report to xyin and sequ.  If the first character of the line is 'a' then
      all evaluations are reported.  Otherwise two real numbers are expected.
      Sequences within this range are printed to xyin and to sequ depending
      also on the fifth parameter.

      The FOURTH LINE: two numbers: the lowest and highest value (from the
      values file) to report to xyin and sequ.  If the first character of the
      line is 'a' then all values are reported.  Otherwise two real numbers
      are expected.  Sequences within this range are printed to xyin and to
      sequ depending also on the fifth parameter.

      The FIFTH LINE determines whether or not to produce any raw sequences in
      the sequ file.  If the first character of the line is 'p', sequences
      selected according to the third and fourth parameters are printed to
      the sequ file.  (This is a complete on-off switch for the sequ file.)

      The SIXTH LINE determines whether or not to print the sequence of the
      site being analyzed.  If the first character is 'p' then the sequence is
      printed to the xyin file.

      The SEVENTH LINE determines whether or not to print sequences which have
      a partial site.  The problem is that if there is part of a site, then the
      Ri value is questionable, depending on where the deletion was.  The best
      analysis would not use a partial site, as it messes up the statistics.
      If the first character is:

         n  Don't print the line at all.
         i  Keep the line, but force the Ri value to be -infinity.
            This allows the lines of xyin to be correlated to the values
            still.
         -  (any other character): print as it is.

      The EIGHTH LINE determines what to do when f(b,l) = 0.  Positions for
         which f(b,l) = 0 would otherwise have negative infinity in the
         Ri(b,l) table.  The letter 's' means to use Rodger Staden's method of
         substituting 1/(n+t), where t is a non-negative integer following the
         's'.  When t = 0, it is Staden's method.  Using t=1 may be the most
         logical choice.  If there is no 's', the program expects a number
         which is the value to use for negative infinity.  It should be a
         value sufficiently below zero so that sites that are being excluded
         from the definition according to f(b,l) are separated from the true
         sites.  -1000 is a useful value, as it will always displace sites
         with exceptions far away from zero.

   xyin: input to the xyplo program.
      The Ri(b,l) table is reported in comments in the file, along with
      the value of the consensus (largest possible evaluation) and the
      anti-consensus (smallest possible evaluation).  The rest of the
      file contains these columns of data:
         piece number
         piece name
         length of region analyzed on this piece
         sequence region analyzed 
         Rindividual for the piece
         value from the values file (or 0 if values is empty)

   sequ: the raw sequences reported to xyin if any selection is made
      (third to fifth lines of the rip file).  These end in periods, so they
      can be given to makebk to create a book.

   ribl: weight matrix Ri(b,l).  The information content for each base b at
      each position l, in bits.  Lines that start with * are notes.  The next
      line contains the matrix FROM-TO coordinates, this is followed by the
      matrix in the order A, C, G, T from FROM to TO.

   output: messages to the user

description
   The program determines the individual informations of the sites in the book
   as aligned by the instructions, according to the frequency table given in
   the rsdata file.  The program calculates the Ri(b,l) table:

       Ri(b,l) := 2 - (- log2( f(b,l)))

   and sums this up for each sequence.  Ri is defined so that the average of
   the Ri's for a set of sequences is Rsequence.  However, if the sequences are
   incomplete, the average will probably be less than Rsequence.  The xyin
   output is ready to read into the xyplo program for plotting and linear
   regression.  The ribl matrix is ready to be used to scan sequences with the
   scan program.
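
   The calculation above can be sketched in Python.  This is an illustration
   under stated assumptions, not the ri source: the function names and the
   input layout (one dictionary of base counts per position) are
   hypothetical, zero frequencies are replaced by 1/(n+t) as described for
   the eighth line of rip, and any sample size correction that the real
   program may apply is left out.

```python
import math

BASES = "acgt"

def ri_table(counts, t=1):
    """Build Ri(b,l) = 2 + log2 f(b,l) from per-position base counts.

    counts holds one dict per position l, mapping each base to its count.
    A zero frequency is replaced by 1/(n+t) (assumed handling of f(b,l)=0).
    """
    table = []
    for column in counts:
        n = sum(column[b] for b in BASES)
        row = {}
        for b in BASES:
            f = column[b] / n
            if f == 0:
                f = 1 / (n + t)
            row[b] = 2 + math.log2(f)
        table.append(row)
    return table

def ri_of_sequence(table, sequence):
    """Sum Ri(b,l) over the bases of one aligned sequence."""
    return sum(row[base] for row, base in zip(table, sequence))
```

   A position where every site has the same base contributes 2 bits for that
   base, and a position with equal base frequencies contributes 0 bits.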

   The program can be used in subtle ways.  For example, one can analyze the
   individual information of the left half of a binding site.  This result can
   then be used in the values file to compare against the analysis of the right
   side of a binding site.

author
   Thomas D. Schneider

examples

rip:

-10 +10       From-to range to do the evaluation
1             column of the values file to copy to xyin
a 0 1000      lowest to highest Ri to put in xyin and sequ (a = any)
a -1000 +1000 lowest to highest Value to put in xyin and sequ (a = any)
n             p means print sequence to the sequ file
p             p means print sequence to the xyin file
-             -: accept all sites; n: no partials; i: partials -> -infinity
s 1           s: use Staden's Method, f(b,l)=1/(n+t); else negative infinity

documentation

@article{Staden1984,
author = "R. Staden",
title = "Computer methods to locate signals in nucleic acid sequences",
journal = "Nucl. Acids Res.",
volume = "12",
pages = "505-519",
year = "1984"}

and

@unpublished{SchneiderRi,
author = "T. D. Schneider",
title = "Measuring the Information of Individual Binding Sites
on Nucleotide Sequences",
comment = "indiv.tex",
note = "in preparation"}

see also
   rseq.p, xyplo.p, scan.p

bugs

technical notes

*)
(* end module describe.ri *)
version = 1.97; (* of ri.p 1993 March 21
(* begin module describe.riden *)
(*
name
   riden: ring density graph

synopsis
   riden(color: in, ridenp: in, xyin: out, output: out)

files
   color: output of the ring program
   ridenp: parameter file for this program.  Two lines:
           First line: largest radial distance recorded
           Second line: number of bins to store the data in
   xyin:  histogram of the density
   output: messages to the user

description
   This program converts the graph generated by the ring program into
   a form that allows one to see if the results are as expected.

examples

documentation

see also
   ring.p

author
   Thomas Dana Schneider

bugs
   Program only works for D=2.  The curves don't match for D=4, but
   do for higher dimensions.  It is not obvious why.

technical notes

*)
(* end module describe.riden *)
version = 1.28; (* of riden.p 1989 November 25
(* begin module describe.rila *)
(*
name
   rila: reformat the ribl table into latex format

synopsis
   rila(ribl: in, latex: out, rilap: in, output: out)

files
   ribl: output of ri program
   latex: table format for LaTeX
   rilap:  two integers that define the range of ribl to convert.  required.
   output: messages to the user

description
   Read the ribl and reformat it so it can be used in a LaTeX table.

examples

documentation

see also
   ri.p

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.rila *)
version = 1.16; (* of rila.p 1992 August 19
(* begin module describe.ring *)
(*
name
   ring: z space ring

synopsis
   ring(data: in, ringp: in, color: out, output: out)

files
   data: set of Gaussianly distributed variables from the program gentst.
   ringp: parameters:
     first line: total dimensionality D.
     second line: number of points to do.  If the end of the data is reached,
       the actual number of points generated is reported to output.
     third line: number of steps to generate the fD(r) graph.
       (The range for this is always -2.5 to +2.5 on both x and y axes.)
       If the number of steps is less than 1, then no smooth graph is done.
     fourth line: a real number, "0 <= partial <= 1", by which to multiply
       the actual fD(r) density to obtain the density reported to the
       color file.  This allows one to tone down the gray scale, or to
       avoid having the highest density of color equal the lowest (as when
       the hue is used and a hue of 1 is the same as a hue of 0).
     fifth line: printing of data on plot (one character):
       d=dimension,p=dimension+point,a=all,n=none

   color: a xyin file for input to the xyplo or riden program.  The columns are:
        1   symbols:    f = from fD(r), s = simulated point
        2   x:          x coordinate
        3   y:          y coordinate
        4   xwidth:     width of symbol on x axis
        5   ywidth:     width of symbol on y axis
        6   density:    density
        7   inverse:    1 - density (for inverse plotting)
        8   maximum:    MAXimum density
        9   minimum:    MINimum density
       10   maximum:    MAXimum density
       11   minimum:    MINimum density
       12   partial:    partial density for grey tones

      Partial is the largest density allowed.  When plotted in color, hues come
      from a color wheel in which the highest color is almost identical to the
      lowest color.  That is the color of hue=1 is almost identical to the
      color of hue = 0.  To avoid this effect, make partial less than 1.0.  A
      partial less than 1.0 also avoids completely black gray scale plots.

   output: messages to the user, number of points generated.

description
   Simulate mapping from many-dimensional to 2-dimensional Z space.  Sets of D
   Gaussian values are read from the data file, squared, summed and square
   rooted.  The x and y value in Z space is determined from an angle and a
   radius.  The angle is found from the last two Gaussian values, while the
   radius is determined by the noise (rms) for all dimensions.  The statistical
   function fD(r) is to be graphed in color or gray scale using xyplo, while
   the simulated points are graphed as points on top of the smooth fD(r)
   function.  The program output is ready to read into the xyplo plotting
   program.
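
   The mapping of one simulated point can be sketched in Python.  This is a
   reading of the description above, not the ring source.  Assumptions: the
   function name is hypothetical, the radius is the square root of the sum
   of squares with no further scaling, and the second-to-last value is taken
   along x and the last along y when the angle is formed.

```python
import math

def z_space_point(values):
    """Map one set of D Gaussian values to an (x, y) point in Z space."""
    # Radius: square, sum and take the square root over all D values.
    radius = math.sqrt(sum(v * v for v in values))
    # Angle: taken from the last two values (axis assignment is assumed).
    angle = math.atan2(values[-1], values[-2])
    return radius * math.cos(angle), radius * math.sin(angle)
```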

examples
   ringp used for generating figures:
16      total dimensionality
100     number of points to do
128     steps for plotting smooth fD(r) graph
0.50    partial
d       d=dimension,p=dimension+point,a=all,n=none

   xyplop used for generating figures:
2 2       zerox zeroy         graph coordinate center
x -2.5 2.5 zx min max         (character, real, real) if zx='x' then set xaxis
y -2.5 2.5 zy min max         (character, real, real) if zy='y' then set yaxis
10 10     xinterval yinterval number of intervals on axes to plot
4 4       xwidth    ywidth    width of numbers in characters
1 1       xdecimal  ydecimal  number of decimal places
5 5       xsize     ysize     size of axes in inches
x
y
c         zc                  if zc='c' then a crosshairs put on zero of x and y
n 2       zxl base            if zxl='l' then make x axis log to the given base
n 2       zyl base            if zyl='l' then make y axis log to the given base
          *********************************************************************
2 3       xcolumn   ycolumn   columns of xyin that determine plot location
1         symbol column       the xyin column to read symbols from
4 5       xscolumn  yscolumn  columns of xyin that determine the symbol size
10  8  7  hue saturation brightness   columns for color manipulation
          *********************************************************************
r         symbol to plot      c(circle)bd(dotted box)x+Ifgpr(rectangle)
b         symbol flag         character in xyin that indicates that this symbol
-1.0      symbol sizex        side in inches on the x axis of the symbol.
-1.0      symbol sizey        as for the x axis, get size from yscolumn
n         no connection (example for connection is c- 0.05 for dashed 0.05 inch)
n  0.05   linetype  size      linetype l.-in and size of dashes or dots
          *********************************************************************
r         symbol to plot      c(circle)bd(dotted box)x+Ifgpr(rectangle)
f         symbol flag         character in xyin that indicates that this symbol
-1.0      symbol sizex        side in inches on the x axis of the symbol.
-1.0      symbol sizey        as for the x axis, get size from yscolumn
n         no connection (example for connection is c- 0.05 for dashed 0.05 inch)
n  0.05   linetype  size      linetype l.-in and size of dashes or dots
          *********************************************************************
c         symbol to plot      c(circle)bd(dotted box)x+Ifgpr(rectangle)
s         symbol flag         character in xyin that indicates that this symbol
0.0858    symbol sizex        side in inches on the x axis of the symbol.
0.0858    symbol sizey        as for the x axis, get size from yscolumn
n         no connection (example for connection is c- 0.05 for dashed 0.05 inch)
n  0.05   linetype  size      linetype l.-in and size of dashes or dots
          *********************************************************************
g         symbol to plot      c(circle)bd(dotted box)x+Ifgpr(rectangle)
g         symbol flag         character in xyin that indicates that this symbol
-1.0      symbol sizex        side in inches on the x axis of the symbol.
-1.0      symbol sizey        as for the x axis, get size from yscolumn
n         no connection (example for connection is c- 0.05 for dashed 0.05 inch)
n  0.05   linetype  size      linetype l.-in and size of dashes or dots
          *********************************************************************
.
          *********************************************************************


Useful color parameters are:

8 6 10  Light density plot, printable on a black and white device (best).
8 7 10  Dark density plot, printable on a black and white device.
6 8 10  Color plot, red background.
7 8 10  Color plot, purple background (neat).
6 7 10  Color and density varying to make the simulated points easy to see.
        (red background)
7 6 10  Color and density varying to make the simulated points easy to see.
        (white background - lovely!)

Warning: since the program has changed, these may no longer be correct.

documentation
   ccmm

see also
   gentst.p xyplo.p riden.p

author
   Thomas Dana Schneider

bugs
   none known.  Confirm that the density distribution is correct by
   using program riden.

technical notes

*)
(* end module describe.ring *)
version = 3.00; (* of ring.p 1989 Nov 25
(* begin module describe.rndseq *)
(*
name
      rndseq: generate random dna sequences

synopsis
      rndseq(sequ: out, rndseqp: in, output: out)

files
      sequ: the random sequence
      rndseqp: parameters to control the generation of the sequence,
         on 4 lines:
         number (integer): the number of sequences to generate;
         length (integer): the length of each sequence;
         a c g t (4 integers): the proportions of bases desired;
         seed (real): a number between 0 and 1 is the starting seed
            for the random number generator.  a number outside
            this range indicates that the date and time should
            be used.  the date and time 83/10/17 20:15:32
            makes a seed of 0.235102710138.  the date-time
            is used backwards to assure that 1) the seed is
            always unique, and 2) it varies rapidly with time.
      output: messages to the user.
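
      the date-time rule for the seed can be reproduced in a few lines of
      python.  this is a sketch of the digit reversal only; the function name
      and the exact input format are assumptions.

```python
def seed_from_datetime(stamp):
    """Turn a date-time such as '83/10/17 20:15:32' into a seed in 0..1.

    The digits are read backwards, so the fastest-changing digits
    (seconds) become the most significant part of the seed.
    """
    digits = "".join(ch for ch in stamp if ch.isdigit())
    return float("0." + digits[::-1])
```

      applied to the date and time quoted above, this gives 0.235102710138.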

description
      rndseq creates randomly generated dna sequences, separated
      by periods.  the number, length and composition of the sequences
      are all specified by the user.  the user can also set the
      start point (seed) of the pseudo-random number generator.  if the
      same seed is given at a later time, then the same series of
      bases will be produced.  alternatively, the user can have the
      program use the current date and time to create a unique seed.
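
      a python sketch of the same idea follows.  it uses python's own random
      number generator, so it will not reproduce the bases made by rndseq for
      a given seed; the function names and the one-sequence-per-line layout
      are assumptions.

```python
import random

def random_sequences(number, length, proportions, seed):
    """Generate `number` sequences of `length` bases.

    proportions gives the relative amounts of a, c, g, t (4 numbers).
    The same seed always yields the same sequences.
    """
    rng = random.Random(seed)
    return [
        "".join(rng.choices("acgt", weights=proportions, k=length))
        for _ in range(number)
    ]

def format_sequences(sequences):
    """End each sequence with a period, one sequence per line (assumed layout)."""
    return "\n".join(seq + "." for seq in sequences)
```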

examples
      5       number of sequences
      100     length of each sequence in base pairs
      1 1 1 1 ratios of a, c, g, t
      2       random generator seed: 0 to 1; outside this: inverse date/time

author
      Thomas Dana Schneider

bugs
      none known

technical notes
      the number of characters per line is set by constant linelength.
*)
(* end module describe.rndseq *)
version = 1.10; (* of rndseq.p 1993 March 25
(* begin module describe.rseq *)
(*
name
      rseq: rsequence calculated from encoded sequences

synopsis
      rseq(encseq: in, cmp: in, rsdata: out, wmatrix: out, output: out)

files
      encseq: the output of the encode program
      cmp: a composition from the comp program.
          if cmp is empty, then equal frequencies are assumed.
      rsdata: a display of the information content of each position
         of the sequences, with the sampling error variance.
         This output is ready to be used as input to rsgra or as data
         for genhis for plotting.
      wmatrix: a weight matrix for searches.
      scale: contains an integer that is the amount by which the values
         in wmatrix have been multiplied.  By dividing by this scale up
         factor the wmatrix values will be normalized to bits.  This allows
         the wmatrix to contain integers.
      output: messages to the user.

description
      Encoded sequences from encseq are converted to a table of frequencies
      for each base (b) at each aligned position (l).  rsequence(l)
      and the variance var(hnb) are calculated and shown along with
      their running sums.  rsequence and the variance due to sampling
      error are shown for the whole site, but the running sums let one
      find rsequence and the variance for any subrange desired.
         n, the number of example sequences may vary with position, so
      both n and e(hnb) are shown.
         A w matrix, w(b,l) is generated that can be used to search
      for sites.  When applied to the original aligned sequences, the
      average of the individual values will be rsequence.  (this will
      not be exactly true if the number of samples varies with
      position in the site, n(l)).
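
      The core of the calculation can be sketched in Python.  This is a
      simplified illustration, not the rseq source: it assumes equal base
      frequencies (an empty cmp), the same n at every position, and it leaves
      out the small sample correction e(hnb), so the values it gives are the
      uncorrected 2 - H(l).  The function name is hypothetical.

```python
import math

def rsequence_uncorrected(sequences):
    """Return (per-position information, total) in bits for aligned sequences."""
    n = len(sequences)
    per_position = []
    for l in range(len(sequences[0])):
        column = [s[l] for s in sequences]
        uncertainty = 0.0
        for b in "acgt":
            f = column.count(b) / n
            if f > 0:
                uncertainty -= f * math.log2(f)
        # 2 bits is the maximum uncertainty for four equally likely bases.
        per_position.append(2.0 - uncertainty)
    return per_position, sum(per_position)
```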

documentation
      Schneider, T.D., G.D. Stormo, L. Gold and A. Ehrenfeucht (1986)
      The information content of binding sites on nucleotide sequences.
      J. Mol. Biol. 188: 415-431.

see also
      encode, comp, encfrq

author
      Thomas D. Schneider

bugs
      Does not handle di-nucleotides or longer oligos

technical notes
      Constants maxsize (procedure calehnb) and kickover (procedure
      makehnblist) determine the largest n for which e(hnb) is used.  Above
      this, ae(hnb) is used.  Do not set these below 50 without careful
      analysis.  Other constants are in module rseq.const.

*)
(* end module describe.rseq *)
version = 5.32; (* of rseq.p 1990 Oct 2
(* begin module describe.rsgra *)
(*
name
   rsgra: rsequence graph
  
synopsis
   rsgra(rsdata: in, picture: out, rsgrap: in, marks: in,
         output: out) 

files 
   rsdata: data file from rseq program
   picture: graph of rsequence in PostScript
   rsgrap: parameters to control the program.
      first line: two integers that define the from-to range to display
   marks: an empty file or a set of integers, one per line that are the
      locations of bases that should be specially marked on the graph.
      If the first line of the file begins with the letter 'b' and is
      followed by a real number, then this number defines the location of
      a bar to be placed on the graph immediately after the position given.
   output: messages to the user
  
description 
   Rsgra generates a graph of Rsequence versus position l.

   See the discussion about the REMOVE feature in makelogo.p.
  
author
   Thomas D. Schneider 
  
bugs
   none known  
  
*)
(* end module describe.rsgra *)
version = 4.99; (* of rsgra.p 1992 July 21
(* begin module describe.rsim *)
(*
name
      rsim: Rsequence simulation

synopsis
      rsim(rsimp: in, cmp: in, xyin: out, output: out)

files
      rsimp: parameters to control the program:
         n: number of sequences to use to generate each fbl(simulated)
         rangelow,
         rangehigh: low and high bounds of the range of the matrix
         Rs: estimated value of Rsequence from the rsgra program
         SD: Standard Deviation of Rs based on sample size from the rsgra
            program.
            This defines the range Rslower = Rs - SD; Rsupper = Rs + SD.
         seed: a real number between 0 and 1 used to start the random
            number generator.  The date and time is used if this
            number is outside 0 to 1.  (N.B. if the system random
            number generator has been used in procedure rnd, then
            this parameter will have no effect.)
         simulations: number of fbl(true) to make
         Rtlower: lower limit to Rsequence(true) to work with.  This allows
            one to remove the small ones and get on with the ones of interest.
         Rtupper: upper limit to Rsequence(true) to work with.
         selection: if the first character of the line is 's', then only
            those points which fall in the Rslower to Rsupper range are put
            into xyin.  (I.e., only the 'p' values.)  This allows very large
            crunches to be done which don't create such a large xyin file.
      cmp: composition file from comp program.  If it is empty, the program
         will assume equiprobable bases.
      xyin: output of the program, input to the xyplo program
         column 1: values of R(simulated) that fall within the Rslower
                   and Rsupper range are indicated by a 'p', others by 'n'.
         column 2: Rsequence(true)
         column 3: Rsequence(simulated)
      output: messages to the user.

description
      Rsim stands for Rsequence-simulation.  The program generates a set of
      Rsequence values to determine the variation of Rsequence for small sample
      sizes.

      Method.

      A frequency table is constructed with zero information content, namely it
      contains 0.25 in all positions (l) and bases (b).  This table, fbltrue,
      is 'evolved' by altering the frequencies until it has an information
      content Rsequence(true) (=Rtrue) at least as high as Rtlower.

      A set of n sequences is generated using the fbltrue probabilities, and
      the information content, Rsimulated, is calculated for the set.  We
      select out those Rsimulated values which fall within the range of the
      Rs+/-SD.  This is repeated many times.  The distribution of Rtrue values
      (which correspond to the selected Rsimulated values) represents the range
      of possible information contents of frequency tables which could have
      produced the observed results.  In this way, we bootstrap ourselves to
      get the range.  Note that SD is only a measure of small sample size.
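
      One round of the sampling and selection steps can be sketched in
      Python.  This is an illustration only, not the rsim source: the function
      names are hypothetical, the 'evolution' of fbltrue up to Rtrue is not
      shown, and the information content is computed without any small
      sample correction.

```python
import math
import random

BASES = "acgt"

def information(sequences):
    """Uncorrected information content in bits, summed over all positions."""
    n = len(sequences)
    total = 0.0
    for l in range(len(sequences[0])):
        total += 2.0
        for b in BASES:
            f = sum(1 for s in sequences if s[l] == b) / n
            if f > 0:
                total += f * math.log2(f)
    return total

def simulate_once(fbltrue, n, rng):
    """Draw n sequences from fbltrue and return their Rsimulated.

    fbltrue holds one list of four probabilities (a, c, g, t) per position.
    """
    sequences = [
        "".join(rng.choices(BASES, weights=column)[0] for column in fbltrue)
        for _ in range(n)
    ]
    return information(sequences)

def selected(rsimulated, rs, sd):
    """True when Rsimulated falls within Rs - SD to Rs + SD."""
    return rs - sd <= rsimulated <= rs + sd
```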

      Use.

      Run an information analysis of the sites.  This analysis determines n,
      rangelow and rangehigh for the rsimp.  From the output of rseq (rsdata
      file), determine Rs and SD over the same range.  Begin with only a few
      simulations.  It is preferable to determine how long each simulation
      takes using a timing program like the UNIX /usr/5bin/time, so that the
      time for the final simulations can be predicted.  10,000 simulations is
      sufficient for the final analysis.  Set Rtlower and Rtupper wide at first
      to be sure to capture the whole distribution.

      Graph the results with the xyplo program, using the rsim.xyplop
      file for parameters.  The output looks like:

               Rsimulated
               |                    .  . .                                 
               |                  .  .                                     
               |                 .  .                                      
   Rs + SD     |              o  o                                         
   Rs          |             oo o                                          
   Rs - SD     |           ooo                                             
               |         ..                                                
               |        .                                                  
               |    .                                                      
               |  .                                                        
               | ..                                                        
               ----------------------------                                
                        Rtrue             

                 ^                       ^
                 Rtlower                 Rtupper

      The program chooses a random number between Rtlower and Rtupper, Rtrue.
      Then it creates the fbltrue matrix with all 0.25 values.  This places
      Rtrue at 0 initially.  The matrix is evolved up to the current Rtrue
      value.  Therefore the set of all fbltrue matrices should have a flat
      information content distribution.  YOU MUST CHECK THAT THIS IS TRUE!!
      Copy the xyin file to the name 'data' and use the genhis program with
      these parameters:

c 2
x n 30

      to get a histogram of the distribution of Rtrue, coming from column 2 of
      the file.  The distribution should be reasonably flat over the entire
      region of the small circles (o) above.  If it is not, you must determine
      what is wrong before continuing.

      Those small circles represent the range that Rs +/- SD slices
      horizontally from the distribution of Rtrue versus Rsimulated.  Recall
      that each Rtrue leads to an fbltrue from which a single simulation of
      n binding sites is created; the information content of that is
      Rsimulated.

      So we want the distribution of Rtrue within the bounds of the slice.  To
      do this, we select that slice for analysis.  In UNIX, we pull out all
      lines from xyin which have 'p' in them (p means: "plot this").  Use:

          grep p xyin > data

      Then run genhis with these parameters:

c 2
p g
x n 30

      Notice how well or poorly the plotted gaussian ("p g") fits your
      distribution.  If it is a good fit you are done.  Take the standard
      deviation which genhis provides.  Use the original Rsequence value for
      the mean.   (The mean found on the genhis listing this way will be
      approximately Rsequence, but it has been created by passage through the
      simulation, so is not as good as the original data.)
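
      The selection of the 'p' lines and the spread of Rtrue can also be
      sketched in Python, as a cross-check on the grep and genhis steps.
      Assumptions: the function names are hypothetical, the xyin columns are
      separated by blanks, only lines whose first column is exactly 'p' are
      kept, and the sample standard deviation is used (genhis may define it
      differently).

```python
import statistics

def rtrue_of_selected(xyin_lines):
    """Return Rtrue (column 2) for every line whose column 1 is 'p'."""
    values = []
    for line in xyin_lines:
        fields = line.split()
        if len(fields) >= 3 and fields[0] == "p":
            values.append(float(fields[1]))
    return values

def spread_of_rtrue(xyin_lines):
    """Sample standard deviation of the selected Rtrue values."""
    return statistics.stdev(rtrue_of_selected(xyin_lines))
```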

documentation

@article{Schneider1986,
author = "T. D. Schneider
 and G. D. Stormo
 and L. Gold
 and A. Ehrenfeucht",
title = "Information content of binding sites on nucleotide sequences",
journal = "J. Mol. Biol.",
volume = "188",
pages = "415-431",
year = "1986"}

@article{Stephens.Schneider.Splice,
author = "R. M. Stephens
  and T. D. Schneider",
title = "Features of spliceosome evolution and function
inferred from an analysis of the information at human splice sites",
journal = "J. Mol. Biol.",
volume = "228",
pages = "1124-1136",
year = "1992"}

see also
      rseq, xyplo, genhis, rsim.xyplop

author
      Thomas D. Schneider

bugs
      Does not handle di-nucleotides or longer oligos

technical notes
      Constants maxsize (procedure calehnb) and kickover (procedure
      makehnblist) determine the largest n for which e(hnb) is used.  Above
      this, ae(hnb) is used.  Do not set these below 50 without careful
      analysis.  Other constants are in module rsim.const.

      Although it is possible to create more than one Rsimulated from
      each Rtrue, this causes vertical streaks on the graph, and so
      will distort the simulation.  It's better to get a completely
      clean one each time.

      Originally, a pseudo-random generator was used to create fbltrue from a
      random matrix (rather than 0.25) but this causes problems because such a
      matrix contains information and so low information points are
      under-represented and higher ones over-represented.  This distorts the
      statistics!

      The program contains a portable random number generator.  Unfortunately
      this can be 10 times slower than the non-portable one available on most
      systems.  The procedure rnd allows one to switch between the two.  When
      the system generator is used, one may find that the random numbers repeat
      exactly from one run to the next, so that the seed parameter does not
      affect the results.  To avoid this problem, the random number generator
      is run until the requested seed is produced, within the tolerance given
      by the constant seedtolerance.  The runs are displayed on the output.

*)
(* end module describe.rsim *)
version = 2.17; (* of rsim.p 1993 January 26
(* begin module describe.same *)
(*
name
      same: counts the number of lines that are identical in two files

synopsis
      same(a: in, b: in, output: out)

files
      a: any file
      b: a file to be compared to file a
      output: messages to the user.

description
      same counts the number of lines that are identical in two files,
      a and b.  if the files are identical up to the end of one file,
      but the other file continues, same counts the identical lines and
      tells the user which file is shorter.   no lines are examined after
      the first line that differs between the two files.
         blanks at the ends of lines are ignored.
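
      the comparison rule can be sketched in Python.  this is an
      illustration only, not the Pascal source: the name count_same and the
      wording of the verdicts are hypothetical, and each file is assumed to
      be already read into a list of lines.

```python
def count_same(a_lines, b_lines):
    """Count leading identical lines; return (count, verdict)."""
    count = 0
    for a, b in zip(a_lines, b_lines):
        # blanks at the ends of lines are ignored
        if a.rstrip(" \n") != b.rstrip(" \n"):
            return count, "differ"    # nothing after this line is examined
        count += 1
    if len(a_lines) < len(b_lines):
        return count, "a is shorter"
    if len(b_lines) < len(a_lines):
        return count, "b is shorter"
    return count, "identical"
```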

see also
      merge

authors
      britta swebilius singer and thomas d. schneider

bugs
      none known

*)
(* end module describe.same *)
version = 1.11; (* of same 1985 apr 20
(* begin module describe.scan *)
(*
name
   scan: scan a book with a wmatrix and generate a vector

synopsis
   scan(book: in, ribl: in, scanp: in, data: out, output: out)

files
   book: a book from the delila system
   ribl: a weight matrix from sites or ri programs.
      Lines that start with * are notes.  The next line contains the matrix
      FROM-TO coordinates.  This is followed by the matrix in the order A, C,
      G, T from FROM to TO.
   scanp: parameters to control the program.
      seqs: One integer on the first line is the number of sequences to scan to
      produce the vector.  0 = none, positive = that number; negative = all.

      Ri cutoff: One real on the second line is the information content at or
      above which sites are reported in the data file.

      Probability cutoff: One real on the third line is the lowest probability
      that is reported in the data file.  The probability of a site is
      determined from the mean and standard deviation of the Ri distribution.

      range: two integers that define the FROM-TO range of the ribl matrix to
      use.

      ways:  One integer.  2 means scan both the sequence and its complement.
         1 means simply scan the sequence.  0 means to let the program figure
         it out.  The program determines the symmetry of the matrix.  If it is
         symmetrical, it will only scan one way.  If it is asymmetrical, both
         scans are done.

   data: The results.  Comments are lines that begin with '*'.  The columns are
      defined in comments in the file.  The matrix is searched over both the
      sequence and its complement.  Ri is reported, as is the Z and probability
      based on the mean and st.dev.
   output: messages to the user

description
   The Ri(b,l) weight matrix is scanned across the sequences in the book to
   produce a vector.
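
   The scan itself can be sketched in Python.  This is an illustration under
   assumptions, not the Pascal source: the matrix is held as a list of
   dictionaries (one per matrix position), the Ri of a window is taken as the
   sum of the matrix entries for the bases in that window, and the names
   scan_sequence and complement are hypothetical.  The conversion of Z to a
   probability is left out.

```python
def scan_sequence(seq, matrix, mean, sd, ri_cutoff):
    """Slide an Ri(b,l) matrix along seq; report windows with Ri >= ri_cutoff.

    matrix is a list of dicts, one per matrix position, mapping base -> bits.
    Returns (start_index, Ri, Z) tuples, with start_index counted from 0.
    """
    width = len(matrix)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        ri = sum(matrix[l][base] for l, base in enumerate(window))
        if ri >= ri_cutoff:
            z = (ri - mean) / sd      # distance from the mean in st.dev. units
            hits.append((start, ri, z))
    return hits

def complement(seq):
    """Reverse complement, for scanning the other strand."""
    pairs = {"a": "t", "c": "g", "g": "c", "t": "a"}
    return "".join(pairs[b] for b in reversed(seq))
```

   To scan both ways, one would call scan_sequence on seq and again on
   complement(seq).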

examples

documentation

see also
   sites.p ri.p genhis.p

author
   Thomas Dana Schneider

bugs

technical notes
   The mean and standard deviation of the Ri distribution are stored just
   after the Ri(b,l) table in the ribl file.  They are produced automatically
   by the ri program.

*)
(* end module describe.scan *)
version = 1.92; (* of scan.p 1993 January 26
(* begin module describe.search *)
(*
name
      search: search a book for strings

synopsis
      search(book: in, inst: out, result: out, input: intty, output: out)

files
      book: any book from the Delila system
      inst: Delila instructions of the form 'get from 56 -5 to 56 +5;'
         that define the location of found strings.  one must turn on printing
         to the inst file to obtain these (see below).
      result: a transcript of the results seen on the output file.
         Lines not containing numerical data begin with an '*' so that
         they can be ignored by other programs such as genhis and xyplo.
      input: typed input from the user, or a file of rules.
      output: messages, results and prompts to the user.

description
      (note: in the following examples, do not type the quote marks.)
      the search program allows one to look for simple patterns in a book.
      the patterns can be like 'ggag', that is, with particular bases
      (always written 5' to 3') or it can include unknown 'spacing' bases,
      as in 'ggagnnnnnnnnnatg'.  any base will be allowed in the n positions.
      one can shorten the instruction: 'ggag9natg', and one can make some of
      the spacing 'extendable' as in 'ggag5e4natg' which allows a 5 to 9
      spacing between the two elements.  one can obtain Delila instructions
      for the strings found by turning on printing, setting 'from' and 'to'
      values and searching.  for example: 'd p f -5 t +10 q gga6e3n#atg'
      sets up printing, with from=-5, to=+10.  the search will result in
      instructions for strings centered on the a of the atg (by the # symbol).
      the form '(a/g)ct' means to search for both 'act' and 'gct'.
      you may specify numbers of mismatches, and control how much is printed.
      you can type many commands on one line, separated by spaces.
      you can also search for relations between bases.  currently the
      allowed relations are: identity, non-identity, complementarity and
      non-complementarity.  see delman.use.search or type 'help' while inside
      the program to get more information.
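
      to make the spacing notation concrete, here is a Python sketch that
      translates a small subset of the pattern syntax into a regular
      expression.  it is an illustration only and is not how search is
      implemented.  it covers only the forms described above (bases, n,
      counted n, extendable n, and alternatives in parentheses), ignores the
      # and % marks, and does not handle mismatches, relations or commands.
      the name to_regex is hypothetical.

```python
import re

TOKEN = re.compile(
    r"(\d+)e(\d+)n|(\d+)n|\(([acgt](?:/[acgt])+)\)|([acgtn])|([#%])")

def to_regex(pattern):
    """Translate a subset of search patterns into a regular expression."""
    out = []
    pos = 0
    while pos < len(pattern):
        m = TOKEN.match(pattern, pos)
        if m is None:
            raise ValueError("unsupported syntax at position %d" % pos)
        fixed, extra, count, choices, base, mark = m.groups()
        if fixed is not None:
            # 5e4n: five spacing bases, extendable by up to four more
            low = int(fixed)
            out.append("[acgt]{%d,%d}" % (low, low + int(extra)))
        elif count is not None:
            out.append("[acgt]{%d}" % int(count))
        elif choices is not None:
            out.append("[" + choices.replace("/", "") + "]")
        elif base == "n":
            out.append("[acgt]")
        elif base is not None:
            out.append(base)
        # the marks # and % only set the numbering, so they add nothing here
        pos = m.end()
    return "".join(out)
```

      for example, to_regex('ggag5e4natg') gives 'ggag[acgt]{5,9}atg'.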

      If one is working with an odd binding site (one with an odd number
      of bases) one should use the # symbol to obtain Delila instructions.
      The complement sequence will continue to number the central base.
          gaa#nttc complemented becomes gaa#nttc

      If one is working with an even binding site (one with an even number
      of bases) one should use the % symbol to obtain Delila instructions.
      The complement sequence will continue to number the following base.
          ga%attc complemented becomes ga#attc

documentation
      delman.use.search

author
      Thomas D. Schneider, modified by Gary Stormo

bugs
      there is overlap between the letters used as commands to the program
      and letters used as ambiguous bases.  for instance, h can mean (a/c/t) or
      it can mean 'help'.  the best way to avoid confusion is to
      always start search strings with either a,c,g,t,n or (.
      warning: if you use a file for input, be sure that the rules include
      a quit command and have no errors in them.  it is possible that errors
      will lead to an infinite loop.   (this may be a general problem with
      interactive i/o in pascal on your computer.)

*)
(* end module describe.search *)
version = 5.64; (* of search.p 1993 January 9
(* begin module describe.sepa *)
(*
name
      sepa: separates delila instruction sets

synopsis
      sepa(presites: in,  mixture: in,
              sites: out, nonsites: out, output: out)

files
      presites: delila instructions for sites of interest.  they are in
         any order and may contain several references to the same place.
         (let us call this a.)
      mixture: delila instructions for both sites and nonsites, as obtained
         from the search program.
         (let us call this b.)
      sites: the presites are reordered and redundant requests are removed.
         (these reordered instructions we will call a".)
      nonsites: the mixture is reordered, redundant requests and requests in
         the presites instructions are removed.
         (using the previous notation, this would be (b-a)".)
      output: messages to the user.

description
      the separate program has two main purposes:
      1) to eliminate redundancy in both the site and the nonsite sets.
      2) to eliminate the sites from the nonsite set.
      the delila instructions must be in the form output by the search
      program (as in delmods book.iw modules).  once the separation is
      completed, you may obtain the aligned book by using delila.
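
      the two operations can be sketched in Python with sets.  this is an
      illustration only: instructions are compared as plain text, they are
      ordered by the first number after the word 'from' (an assumption about
      what 'reordered' means), and the name separate is hypothetical.

```python
def separate(presites, mixture):
    """Return (sites, nonsites) as sorted lists without duplicates."""
    def key(instruction):
        # order by the first coordinate after the word "from" (assumed layout)
        words = instruction.split()
        return (int(words[words.index("from") + 1]), instruction)
    site_set = set(presites)
    sites = sorted(site_set, key=key)                     # a"
    nonsites = sorted(set(mixture) - site_set, key=key)   # (b-a)"
    return sites, nonsites
```

      because the comparison is textual, this sketch shares the limitation
      described under bugs: an instruction with and without an explicit
      direction is treated as two different requests.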

documentation
      delman.use.data.flow, nar 10(9): 2971 and 2997 1982

see also
      search, delila, alist

author
      thomas d. schneider

bugs
      sepa cannot tell that these instructions are identical:
         get from 56 -10 to 56 +10 direction -;
         get from 56 -10 to 56 +10;
      because the second one may not be direction -.  this potential problem
      can be avoided by always giving the direction.

      also, it is advisable to make aligned listings with alist to be sure
      that the new aligned book is correct.

*)
(* end module describe.sepa *)
version = 2.08; (* of sepa.p 1990 Aug 15
(* begin module describe.shell *)
(*
name
   shell: basic outline for a program

synopsis
   shell(afile: in, output: out)

files
   afile:  multiple line detailed description of file 1, etc
   output: messages to the user

description
   The purpose and use of the program.
   This page is to be copied and edited for making new programs.

examples
   An example of the use of this form is module describe.lister

documentation
   Other sources of information or documents on the program.

see also
   aa.p

author
   Thomas Dana Schneider

bugs
   problems with the program and how to get around them (if known).

technical notes
   Details about the implementation that may be relevant to a user.

*)
(* end module describe.shell *)
version = 1.00; (* of shell.p 1993 January 8
(* begin module describe.shift *)
(*
name
      shift: copy one file to another file, with a blank in front of each line

synopsis
      shift(fin: in, fout: out, output: out)

files
      fin: the file to be copied with shifting
      fout: the shift of fin
      output: messages to the user

description
      shift makes a copy of the file fin on the file fout, with an extra
      blank as the first character of each line.  this is useful on
      computer systems with a line printer that uses the first character
      for carriage control.  one can then shift files, such as programs,
      before printing them.

see also
      shift

author
      thomas d. schneider

bugs
      none known

*)
(* end module describe.shift *)
version = 1.03; (* of shift 1985 apr 25
(* begin module describe.short *)
(*
name
      short: find locations of short lines in a file

synopsis
      short(fin: in, fout: out, shortp: in, output: out)

files
      fin: the file to be analyzed
      fout: a list of lines that are short.
      shortp: a parameter to determine what 'short' means.
         this is one integer.  lines of this length or shorter
         will be reported to fout.
      output: messages to the user

description
      database programs that scan a line and assume that there are
      a certain number of characters on the line will lose track of
      the correct location if the line is shorter than they expect.
      this has happened with delila and dbcat.  the short program
      scans a file for lines shorter than a given length and
      lists them in the fout file.  the purpose of the program is
      to help debug database programs.
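
      the test applied to each line can be sketched in Python.  this is an
      illustration only; the name short_lines and the form of the report
      (line number and length) are assumptions, since the layout of fout is
      not described here.

```python
def short_lines(lines, limit):
    """Return (line_number, length) for lines of length limit or shorter."""
    report = []
    for number, line in enumerate(lines, start=1):
        length = len(line.rstrip("\n"))   # the end of line is not counted
        if length <= limit:
            report.append((number, length))
    return report
```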


author
      thomas schneider

bugs
      none known
*)
(* end module describe.short *)
version = 1.01; (* of short 1985 may 30
(* begin module describe.shortline *)
(*
name
   shortline: make short lines out of long lines

synopsis
   shortline(input: in, output: out)

files
   input:  text to be wrapped
   output: wrapped text

description
   This Pascal program takes ASCII text and filters it.  Lines longer than the
   constant maxline are forced to be maxline long by inserting carriage returns.
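
   The wrapping rule can be sketched in Python.  This is an illustration
   only: the name shorten is hypothetical, maxline is passed as an argument
   rather than fixed as a constant, and the line is cut at exactly maxline
   characters without regard to word boundaries, which is an assumption.

```python
def shorten(line, maxline):
    """Split one line into pieces of at most maxline characters."""
    if maxline < 1:
        raise ValueError("maxline must be positive")
    if line == "":
        return [""]
    return [line[i:i + maxline] for i in range(0, len(line), maxline)]
```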

author
   Thomas Dana Schneider

bugs
   the constant maxline is fixed at compile time, of course.

*)
(* end module describe.shortline *)
version = 1.00; (* of shortline.p 1991 Oct 4
(* begin module describe.show *)
(*
name
   show: show modules in a module library

synopsis
   show(modlib: in, modcat: inout, print: out, input: intty, output: out)

files
   modlib: a module library as used by program module
   modcat: a module catalogue for modlib, generated by program module or
           show.  it is used (if it is not empty) for faster startup.
   print:  modules that the user pulls out from modlib
   input:  typed instructions from the user
   output: messages to the user

description
   Among other uses, the show program lets you look at pages of the delila
   manual by using the computer.  Each page is a unit we call a 'module'.  The
   name of the module that contains the page you are reading is
   'describe.show'.  Notice that the name has two parts separated by a period.
   The show program takes advantage of this naming convention to let you select
   the section(s) of the manual that you want to see.  Show generates a list of
   the module names.  For delman this is

      1 * version
      2   delman.

   With this list of name-parts one has several choices:  you can choose to
   look at the "version" page by typing "version." or "1" (without quotes).
   The * in the list means that the page will print on the terminal.  To look
   at the list of pages that begin with "delman." you would simply type
   "delman." or "2".  The period in the list means that there are sub-parts to
   the name, such as "delman.intro".
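
   How such a list of name-parts can be derived from the module names is
   sketched below in Python.  This is an illustration only, not the show
   source: the name name_parts is hypothetical, and the module names in the
   usage note are merely examples.

```python
def name_parts(names, prefix=""):
    """List the next-level name-parts under prefix, in first-seen order.

    A trailing period marks a part that has sub-parts.
    """
    parts = []
    for name in names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if rest == "":
            continue
        head, dot, _ = rest.partition(".")
        part = head + dot
        if part not in parts:
            parts.append(part)
    return parts
```

   With the names version, delman.intro.outline.1 and delman.describe.show,
   the call name_parts(names) gives version and delman. as in the list above.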

   The names form a tree-like structure that the show program knows about.
   You can climb down the tree by either typing the name or the number given.

   One can type more than one part of a name.  For example, the command
   "delman.describe.module" would print documentation on the module program.

   Commands are separated by blanks.  Show considers any consecutive
   string of characters (with no blanks) that contains a period to be
   a module name.  Anything without a period is a command, such as "top"
   which gets one to the top of the name tree.

   Once you find a section that you want to step through page by page, you can
   use the n command.  You can also simply hit the carriage return repeatedly.

   Type "help" for a list of other commands and details.

documentation
   moddef

see also
   module

author
   Thomas D. Schneider and Billie H. Lemmon

bugs
   Some combinations of n and l commands may make the parent on the list
   incorrect.  Go to the top to correct this.

   On Unix systems, the program will ignore the first line you type.  Simply
   hit a carriage return when the program starts.

technical notes
   The names in the module library must be separated by periods for the show
   program to recognize the parts of the names.

*)
(* end module describe.show *)
   version = 3.06; (* of show.p 1989 July 8
(* begin module describe.shrink *)
(*
name
   shrink: reduce size of postscript graphics

synopsis
   shrink(input: in, output: out)

files
   input:  A PostScript program, containing a translate command.
   shrinkp: Parameter file.  The first line contains the scale factor.
   output: A copy of the input with the scale instructions.  A scale
      command is placed immediately after the translate command, so that
      the shrinking occurs toward the zero of the image.

description
   One often wants to run rsgra to look at a large region of aligned sequence,
   but the normal output won't fit on a page.  By passing the PostScript
   file through this program, one can scale the graphics to something
   that fits on a page.
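
   The edit made to the PostScript file can be sketched in Python.  This is
   an illustration only: the name shrink_lines is hypothetical, the same
   factor is assumed for both axes, and the translate command is assumed to
   be the last command on its line.

```python
def shrink_lines(lines, factor):
    """Copy PostScript lines, adding a scale command after the first translate."""
    out = []
    done = False
    for line in lines:
        out.append(line)
        if not done and "translate" in line.split():
            # scaling after the translate shrinks toward the zero of the image
            out.append("%s %s scale" % (factor, factor))
            done = True
    return out
```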

examples
   0.5  would reduce the size of the image by a factor of 2.
   Note: the 0 is necessary for most Pascal compilers.

documentation

see also
   rsgra.p

author
   Thomas Dana Schneider

bugs
   The program is very specific in what it does.

technical notes

*)
(* end module describe.shrink *)
version = 1.01; (* of shrink.p 1989 November 14
(* begin module describe.sites *)
(*
name
   sites: analyse sites from randomized sequence data base

synopsis
   sites(database: in, standard: in,
         caps: out, latex: out, list: out, sorted: out,
         stats: out, tables: out, rsdata: out, output: out)

files
   database: database consisting of DNA sequence data.
      The first line is the name of the database.
      The remaining lines consist of experimental packages.
      The start of a package is a line like:
          @ -27 11 -21 5 0.85
      The '@' must be left justified as the first character on the
      line.  The numbers are defined to be:

          @ FROM.range TO.range FROM.random TO.random fraction.canonical

      FROM.range: the coordinate of the first base reported in the database
      TO.range:   the coordinate of the last base reported in the database
      FROM.random: the coordinate of the first randomized base
      TO.random:   the coordinate of the last randomized base
      fraction.canonical:  the fraction of the canonical base during
                           chemical synthesis.
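
      Reading such a line can be sketched in Python.  This is an illustration
      only; the name parse_package_header and the dictionary keys are
      hypothetical.

```python
def parse_package_header(line):
    """Parse '@ FROM.range TO.range FROM.random TO.random fraction.canonical'."""
    if not line.startswith("@"):
        raise ValueError("the @ must be the first character on the line")
    fields = line[1:].split()
    if len(fields) != 5:
        raise ValueError("expected four integers and one real after the @")
    from_range, to_range, from_random, to_random = (int(f) for f in fields[:4])
    return {
        "from_range": from_range,
        "to_range": to_range,
        "from_random": from_random,
        "to_random": to_random,
        "fraction_canonical": float(fields[4]),
    }
```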

      The next line defines the canonical sequence which was 'randomized'.  It
      is in the format of the remaining sequences.  The first sequence in the
      package is always the standard, so do not forget to include it!

      The sequences follow the standard.  The format of the standard and the
      randomized sequences consists of:

      DNA sequence, plasmid name, primer, experiment, date (year, month, day)

      separated by one space each instead of commas.
      The sequence may contain any of the characters: "acgtxd.".
      "x" means that the base is not known.  "d" means that that base
      was deleted.  The program will reject these sequences (to make pure
      data), but this allows them to be stored in the database. "." means
      'the same as the standard sequence in this position'.  This allows
      one to enter sequences as a set of changes from the standard.

      The next experimental package begins with another '@'.  The data from
      each experimental package are gathered as frequencies and normalized by
      using the given canonical base frequency.  The normalized frequencies
      from all the packages are averaged to produce the final results.  This
      allows one to combine several experiments together, however all
      experiments are given the same weight.  This is reasonable if the
      experiments have similar canonical frequencies and numbers of sequences,
      but is probably not correct if one experiment carries more "importance"
      than another.  A method of accounting for these different weightings is
      not known.

   standard: Use the rsdata output of the rseq program from the natural
      sequences as your standard.  It is used for statistical comparison of the
      experiment to wild-type sequences.

   caps: listing of the database sorted and with capital letters showing
      changes from the standard and database errors.

   latex: just like list, but in a form that can be run through the typesetting
      program LaTeX.

   list: listing of the database in an easy-to-read format showing only the
      changes from the standard.  Also gives the tables of numbers of bases.

   sorted: the list sorted by sequence

   stats: frequency statistics of the database differences.
      summary of information results.

   tables: frequency tables for various stages of the normalization.

   rsdata:   This simulates the output of the rseq program by giving the
      numbers of bases (b) at each position (i).  When the frequency tables are
      normalized in this program, the effective number of sequences is lost.
      To make sure that the numbers reported in rsdata are accurate, they are
      multiplied by constant scaleup.  The table can be run through dalvec and
      makelogo to make a sequence logo.  The variance, varhnb, is set to be
      negative to indicate that no method is known for how to calculate it.  An
      earlier version of the program gave the minimum error based on the number
      of sequences in the database, but people tended to miss this fact when
      looking at the final sequence logo, so were unduly impressed by the
      data.

   output: messages to the user

description
   The function of the sites program is to gather, collate and analyze
   data from a randomization experiment.  See the reference given below.

   It was designed to help enter sequence data.  One may enter several copies
   of a particular sequence, and they will be joined together by merging their
   data.  Sequences of the same clone are identified by their common plasmid
   names. Inconsistent data are flagged.

   First the program sorts the data and checks that multiple entries are
   consistent with one another.  If they are not, the program halts and you
   should look into the caps file to figure out what is wrong.

   The program converts the database into a more readable form in list, and
   provides statistical analysis.  If the standard is:
gaattcaaattaatacgactcactatagggagaaagctt pTS37 kc7 ex100 87 nov 2
   and one of the data base lines is:
gaattcaaattaattcgactcactttagggaaaaagctt pTS331 1204 ex394 87 nov 2
   the program presents the data in file list as:
..............t.........t......a....... pTS331 1204 ex394 87 nov 2
   which is more readable.  This allows entry as a sequence, but display
   in a form that is easy to understand.
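
   The conversion between the two forms can be sketched in Python.  This is
   an illustration only, and the names as_changes and from_changes are
   hypothetical.  The characters x and d are passed through unchanged.

```python
def as_changes(standard, sequence):
    """Show a sequence as changes from the standard ('.' means the same)."""
    if len(standard) != len(sequence):
        raise ValueError("sequence and standard differ in length")
    return "".join("." if s == b else b for s, b in zip(standard, sequence))

def from_changes(standard, changes):
    """Expand the '.' notation back into a full sequence."""
    if len(standard) != len(changes):
        raise ValueError("sequence and standard differ in length")
    return "".join(s if c == "." else c for s, c in zip(standard, changes))
```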

   If two primers are used, and data are found for both, then the
   name becomes 'both'.

   The stats file contains tables of the wild type frequencies and
   the experimental frequencies.

documentation

@article{Schneider1989,
author = "T. D. Schneider
 and G. D. Stormo",
title = "Excess Information at Bacteriophage {T7} Genomic Promoters
Detected by a Random Cloning Technique",
year = "1989",
journal = "Nucl. Acids Res.",
volume = "17",
pages = "659-674"}

see also
   siva.p, dalvec.p, makelogo.p

author
   Tom Schneider

bugs
   For sorting, all plasmid initials are ignored; sorting is by the plasmid
   number only.

   A correction for small sample size is not known for the normalized
   experimental data.  Certainly the method given in program Calhnb is not
   right.  Therefore, the program does not report the expected variation.

*)
(* end module describe.sites *)
version = 7.91; (* of sites.p 1993 January 25
(* begin module describe.siva *)
(*
name
   siva: site information variance

synopsis
   siva(sorted: in, sivap: in, incu: out, curves: out, list: out,
        output: out)

files
   sorted: the output of the sites program that contains a sorted
      list of sites for each experiment performed.
   sivap: parameters to control the program.
      first line: two integers, from and to coordinates over which
         to do the calculations.
      second line: repeats, the number of times to take passes through
          the data removing subsets.  This improves the statistics.
   incu: the xyin input to xyplo, output of this program.  Two columns:
      first column is the number of sites used to find the information
      second column is the amount of information in bits
      The curves loop around along the axis, so they remain connected.
   curves: another xyin file, for graphing the wiggling info curves
      first column is the position across the site
      second column is the information
      The curves loop around along the axis, so they remain connected.
   list: statistical picture of the result.  Three columns:
      first column is the number of sites used to find the information
      second column is the average amount of information (corresponds
         to the second column of incu, but is the average)
      third column is the variance of the information (corresponds
         to what your eye picks out as the thickness of the incu curves)
   output: messages to the user

description
   Siva calculates the variance of the information in a set of randomized sites
   by eliminating each site in turn and keeping track of the increase in the
   information content.  The information content must increase, since with
   fewer samples there must be less variation (this is the small sample bias
   effect).  The program allows one to graph the information content versus the
   number of sites removed (incu).  When this is done repeatedly, with
   different orders of removing the sites, a thick band of curves is created.
   The thickest part of this band shows the greatest possible amount of
   variation that could be in the total set of sequences.

   To be even-handed, the program removes the first sequence, then randomly
   removes the others.  This creates the first curve.  Then the program removes
   the second sequence and randomly removes the others for the second curve.
   If there are n sequences, then n removal curves will be generated.  This is
   one complete repeat of the process.  If you want, you can do this a number
   of times to get better statistics, using the repeat parameter in sivap.
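
   The removal scheme can be sketched in Python.  This is an illustration
   under assumptions, not the siva source: the information is computed as
   the sum over positions of 2 - H with no small sample correction and no
   normalization for the canonical base frequency, at least one site is
   always kept, and the names information, removal_curve and all_curves are
   hypothetical.

```python
import math
import random

def information(sites):
    """Uncorrected information in bits: sum over positions of 2 - H."""
    n = len(sites)
    total = 0.0
    for l in range(len(sites[0])):
        counts = {}
        for site in sites:
            counts[site[l]] = counts.get(site[l], 0) + 1
        h = -sum((c / n) * math.log2(c / n) for c in counts.values())
        total += 2.0 - h
    return total

def removal_curve(sites, first, rng):
    """Remove site number first, then the others in random order.

    Returns (number of sites used, information) pairs, starting with all sites.
    """
    others = [i for i in range(len(sites)) if i != first]
    rng.shuffle(others)
    order = ([first] + others)[:len(sites) - 1]   # keep at least one site
    curve = [(len(sites), information(sites))]
    removed = set()
    for index in order:
        removed.add(index)
        remaining = [s for i, s in enumerate(sites) if i not in removed]
        curve.append((len(remaining), information(remaining)))
    return curve

def all_curves(sites, repeats=1, seed=0):
    """One curve per starting site, for each repeat."""
    rng = random.Random(seed)
    return [removal_curve(sites, first, rng)
            for _ in range(repeats) for first in range(len(sites))]
```

   With n sites and r repeats this gives n times r curves, each of which
   could be written out as two columns in the manner of incu.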

   The largest variation in the information content is surely greater than the
   variation of the information content in all the sets of removals of sites.

   For several experiments, the statistics are joined into one set.  With
   several experiments, surely the variation of the combined experiments would
   be less than the variations found for the individuals.  So if one experiment
   gives a greater variation, that will increase the variation siva reports in
   list, so the highest value in list is an upper limit on the variation.

documentation
   @article{Schneider1989,
   author = "T. D. Schneider
    and G. D. Stormo",
   title = "Excess Information at Bacteriophage {T7} Genomic Promoters
   Detected by a Random Cloning Technique",
   year = "1989",
   journal = "Nucl. Acids Res.",
   volume = "17",
   pages = "659-674"}

see also
   sites.p

author
   Thomas Dana Schneider

bugs
   none known

*)
(* end module describe.siva *)
version = 1.95; (* of siva.p 1993 January 26
(* begin module describe.sortbibtex *)
(*
name
   sortbibtex: sort a bibtex database

synopsis
   sortbibtex(fin: in, fout: out, output: out)

files
   fin: a bibtex database
   fout: bibtex database sorted by the key
   output: messages to the user, including errors in the
      structure of the database and duplicate entries.

description
   Sort a BibTeX database by the citation keys.
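
   The sort can be sketched in Python.  This is an illustration only, not the
   Pascal source: the name sort_bibtex is hypothetical, entries are taken to
   be separated by blank lines, and the key is taken to be the text between
   the opening brace and the first comma.  Unlike the program, the sketch
   also accepts separator lines that contain only blanks.

```python
import re

KEY = re.compile(r"@\w+\s*\{\s*([^,\s]+)")

def sort_bibtex(text):
    """Sort blank-line-separated BibTeX entries by citation key.

    Returns (sorted_text, duplicate_keys).
    """
    entries = [e.strip("\n") for e in re.split(r"\n[ \t]*\n", text)
               if e.strip()]
    keyed = []
    duplicates = []
    seen = set()
    for entry in entries:
        m = KEY.search(entry)
        if m is None:
            raise ValueError("no citation key found in: %r" % entry[:30])
        key = m.group(1)
        if key in seen:
            duplicates.append(key)
        seen.add(key)
        keyed.append((key, entry))
    keyed.sort(key=lambda pair: pair[0])
    return "\n\n".join(entry for _, entry in keyed) + "\n", duplicates
```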

examples

documentation

see also
   rembla.p

author
   Thomas Dana Schneider

bugs
   Entries are defined by blank lines.  Use rembla to make sure that
   there are no extra spaces on the ends of lines.

technical notes

*)
(* end module describe.sortbibtex *)
version = 2.13; (* of sortbibtex.p 1993 February 16
(* begin module describe.sorth *)
(*
name
      sorth: sort helix list

synopsis
      sorth(hlist: in, shlist: out, list: out, sorthp: in, output: out)

files
      hlist: a list of helixes generated from program helix.
      shlist: a list of helixes, where the longest or strongest helix
         has been chosen from each piece to piece comparison ('set').
      list: progress of the program.
      sorthp: parameters to control the program.
         1. characters on the first line of the file determine the priority
         order for sorting the helixes.  all commands must end with 'a' to
         indicate 'ambiguous'.  the commands are:
            ea - sort on energies (see technical notes)
            la - sort on lengths (see technical notes)
            ela - sort first on energies then on lengths.
            lea - sort first on lengths then on energies.
         2. the second line of the file must contain one integer,
         'top'.  up to 'top' of the strongest helixes will be
         written to shlist. if 'top' = 1, then any set of helixes that are
         ambiguous are not copied to the shlist.  this allows
         one to find the strongest unambiguous helix in each set.
         3. the third line is the minimum length or maximum energy
         of helixes to be sorted.
      output: messages to the user.

description
      the strongest helixes in hlist are sorted and copied to shlist.
      the user can sort on energy, length, energy then length, or
      length then energy.  the user may choose more than one helix
      to be output (eg, the top 10).
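
      the four sort orders can be sketched in Python.  this is an
      illustration under assumptions: a helix is reduced to a pair
      (energy, length), a lower energy and a greater length are both taken
      to mean a stronger helix, and the name sort_helixes is hypothetical.
      the handling of sets and of ambiguous helixes when 'top' = 1 is not
      shown.

```python
def sort_helixes(helixes, order, top):
    """Return up to top helixes, strongest first.

    helixes: list of (energy, length) pairs
    order:   'ea', 'la', 'ela' or 'lea', as in sorthp
    """
    keys = {
        "ea": lambda h: (h[0],),
        "la": lambda h: (-h[1],),
        "ela": lambda h: (h[0], -h[1]),
        "lea": lambda h: (-h[1], h[0]),
    }
    if order not in keys:
        raise ValueError("unknown sort command: " + order)
    return sorted(helixes, key=keys[order])[:top]
```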

see also
      helix

author
      thomas dana schneider

bugs
      none known

technical notes
      when only one variable is sorted on, the order of the other
      variable will not be meaningful because it is determined by
      the way the sort algorithm works.
      the constant 'maxhelix' determines the maximum number of
      helixes that can be sorted.
*)
(* end module describe.sorth *)
version = 2.40; (* of sorth 1985 may 5
(* begin module describe.spec *)
(*
name
   spec: analyse two spectra from the camspec

synopsis
   spec(csdata: in, baseline: in, xyin: out, output: out)

files
   csdata:  contains one spectrum from the Camspec to be used
     as the data.
   baseline:  contains one spectrum from the Camspec to be used
     as the baseline.
   xyin: input to xyplo program
   output: messages to the user

description
   Analysis of spectra produced by the camspec.

   Setup:  Establish communications with the Camspec.

   Give the T command.  If it does
   not respond with '2', use the A command and repeat T until it does.
   This sets the mode to transmittance.

   Then set to 450 nm with           450G
   Check that it is set with         U
   Set to 100% transmittance with    B
   Check that it is 100% with        V
   You can use the O command to do both checks at once.

   Use T and A to set the mode to 0, for reporting in absorbance.
   'AT' should do it.

   Obtain the spectrum with
      xl h
   where x is the interval:
      L 5nm
      M 1nm
      N 0.5nm
   and l is the lower wavelength, h is the high.  Eg,
      L400 700
   scans from 400 to 700 nm with steps of 5nm.

examples

documentation

see also
   xyplo.p

author
   Thomas Dana Schneider

bugs

technical notes
   The spectrum is set to -maxint at both ends so that when multiple
   xyin spectra are concatenated, the return lines all run below the graph
   (where you won't see them with xyplo).  Xyplo will object, but ignore it.

   The camspec sends absorbance data multiplied by 100.  This number is
   in constant 'correctionfactor'.

*)
(* end module describe.spec *)
version = 1.10; (* of spec.p 1992 June 15
(* begin module describe.sphere *)
(*
name
   sphere: plot density of shannon spheres

synopsis
   sphere(spherep: in, sigma: out, xyin: out, output:out)

files
   spherep: parameters.
     The first line is the step size interval (0.01 works well).
     the second line is the maximum radius to calculate out to (= maxr,
         3.1 works well).
     Each following line is a dimension to plot.
     If the dimension number is negative, it must be followed on the same line
     by the coordinates of the position to place the dimension numeral.
   sigma: lists the estimates for Rmaximum +/- sigma,
      taken as the radius when the curve passes through exp(-1/2).
   xyin: input to xyplo, the plotting program
   output: messages to the user

description
   Create a graph of radius versus density of Shannon spheres
   at various given dimensions.  The output is run through xyplo.

   The function is:

   pd(R) = R^(D-1) * exp( -sqr(R)/ (2* sqr(sigma)))

   where '^' means to exponentiate and
   where sqr(sigma) * (D-1) = sqr(Rmaximum)
   so setting Rmaximum = 1 relates sigma and D.

   The graph is in the range (0,0) to (r=maxr,1).
   The curve is normalized so that its maximum is at (1,1)
   (except when dimension = 1, where it is at (1,0)).
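
   The normalization can be checked numerically.  The sketch below is in
   Python rather than Pascal and is not taken from sphere itself.  It
   assumes the density has the usual Gaussian form with a negative
   exponent, exp(-sqr(R)/(2*sqr(sigma))), that sqr(sigma)*(D-1) equals
   sqr(Rmaximum), and that Rmaximum = 1.  The function name pd follows
   the formula; everything else is illustrative.

```python
import math

def pd(r, d):
    """Normalized density at radius r for dimension d > 1 (assumes Rmaximum = 1)."""
    sigma2 = 1.0 / (d - 1)                    # sqr(sigma) * (D-1) = sqr(Rmaximum) = 1
    raw = r ** (d - 1) * math.exp(-r * r / (2.0 * sigma2))
    peak = math.exp(-1.0 / (2.0 * sigma2))    # raw value at r = 1, the maximum
    return raw / peak

# Tabulate a few points; the middle column is always 1 by construction.
for d in (2, 10, 100):
    print(d, round(pd(0.5, d), 4), round(pd(1.0, d), 4), round(pd(1.5, d), 4))
```

   Under these assumptions the curve narrows around R = 1 as the dimension
   grows.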

   Since xyplo can't plot several separate curves, without being
   told each symbol, this program simply starts at (0,pd(0)), draws
   the curve to (maxr,pd(maxr)), then circles back by drawing lines
   to the x axis (2*maxr,0) and then the origin (0,0).  By setting
   the region that xyplo plots below maxr, one gets nice, fully
   correct curves that do not appear to be connected.

documentation
   [1988 jan 23,5]

see also
   xyplo

author
   Thomas Dana Schneider

bugs
   none known

*)
(* end module describe.sphere *)
version = 1.38; (* of sphere 1989 November 23
(* begin module describe.split *)
(*
name
      split: split a wide file into printable pages

synopsis
      split(sin: in, sout: out, splitp: in, output: out)

files
      sin: the file to be split into pages
      sout: the split result
      splitp: parameters to control split.  if splitp is empty,
         defaults are used.  otherwise splitp must contain 3 to 5 lines:
         1. if the first character is p (for 'page prompting') then the
            pagination is controlled by the sin.  (this is done by
            duplicating the first several columns to all the horizontal
            pages, as determined by the second parameter.)
            otherwise, pages begin as determined by the second parameter.
         2. for page prompting (see parameter 1) this is the number of
            columns to duplicate from the left margin to all pages.
            if not page prompting, then this is the lines per page in sin.
         3. columns per page in sin (not less than 1).
         4. number of header lines to copy to sout before splitting the rest.
         5. if 4. is negative, this is a trigger string inside quotes (").
            splitting begins -(4.) lines beyond this trigger.
         note: columns and lines per page refer to the input file, sin.
         to find the actual width of the output file pages, add 1 to
         parameter three (when not page prompting) or
         add parameter two to parameter three (when page prompting).
         one extra line is added per page for the page coordinate.
      output: messages to the user.

description
      the split program slices up the sin file into an array of pages,
      each located by an (x,y) coordinate.  in this way a file which is too
      large to print can be printed and then reconstructed.  in other words,
      if you have a program which produces output that is wider than the
      printer page (or the screen of the crt, for that matter) then you can
      run your output through split to obtain pages that will print ok.
      the upper lefthand corner of each page tells the coordinate of the page
      as (x down, y across).  a header page shows all the page coordinates.

examples
      if splitp contains: n/60/130/10 (on 4 lines) then sin will be
      split into 60 line by 130 column pages, after 10 header lines.
      if splitp contains: p/1/120/-5/"trigger" then each page will be 120
      characters wide and the first column will be copied to each page.
      the header extends 5 lines beyond and including the trigger.
      for p/5/132 the first 5 columns will be copied to each page.
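
      the slicing itself can be sketched in a few lines of python.  this
      is an illustration only, not the split source: it covers the case
      without page prompting, ignores header lines, the page coordinate
      line and the header page, and assumes coordinates start at 1.  the
      name split_pages is hypothetical.

```python
def split_pages(text_lines, lines_per_page, cols_per_page):
    """Slice text into a dict mapping (x down, y across) -> list of page lines."""
    width = max((len(s) for s in text_lines), default=0)
    pages = {}
    for x, top in enumerate(range(0, len(text_lines), lines_per_page), start=1):
        block = text_lines[top:top + lines_per_page]
        for y, left in enumerate(range(0, width, cols_per_page), start=1):
            pages[(x, y)] = [s[left:left + cols_per_page] for s in block]
    return pages

sample = ["abcdefgh", "ijklmnop", "qrstuvwx"]
for coord, page in sorted(split_pages(sample, 2, 5).items()):
    print(coord, page)
```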

author
      thomas d. schneider

bugs
      none known

technical notes
      constant pagecharacter is the (system dependent) begin page character.
*)
(* end module describe.split *)
version = 3.52; (* split 1986 nov 14
(* begin module describe.sqz *)
(*
name
   sqz: squeeze the input file to fit into fewer characters per line

synopsis
   sqz(fin: in, fout: out, output: out, sqzp: in);

files
   fin: a text file with lines longer than 80 characters
   fout: the squeezed file.  all lines that end with the endofline symbol
      are to be continued on the next line.
      the endofline symbol is written out as the first character of the
      file, so that the unsqz program can use it.
      if the endofline symbol is found anywhere in the fin file,
      then the fout will be emptied, and the program will halt.
   sqzp: if not empty, then the first character redefines
      the endofline symbol.
   output: messages to the user

description
   for transportation, this program allows a file to be compressed
   to fewer than 80 characters per line.
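
   the squeeze and its reversal by unsqz can be sketched in python as
   follows.  this is an illustration, not the sqz source.  the default
   symbol '&', the width of 79 and the choice to put the symbol on a line
   of its own are assumptions; this page states only that the symbol is
   the first character of fout.

```python
def sqz(lines, width=79, eol="&"):
    """Squeeze lines to at most `width` characters; `eol` marks a continuation."""
    if any(eol in s for s in lines):
        raise ValueError("endofline symbol occurs in the input")
    out = [eol]                               # first character of the file
    for s in lines:
        while len(s) > width:
            out.append(s[:width - 1] + eol)   # continued on the next line
            s = s[width - 1:]
        out.append(s)
    return out

def unsqz(lines):
    """Reverse sqz, reading the continuation symbol from the first line."""
    eol, out, pending = lines[0][0], [], ""
    for s in lines[1:]:
        if s.endswith(eol):
            pending += s[:-1]                 # fuse with the following line
        else:
            out.append(pending + s)
            pending = ""
    return out

text = ["x" * 200, "short", ""]
print(unsqz(sqz(text)) == text)
```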

see also
   unsqz

author
   thomas dana schneider

bugs
   none known

technical notes
   the default endofline character is defined by a global constant
*)
(* end module describe.sqz *)
version = 1.13; (* of sqz.p 1993 Jan 27
(* begin module describe.ssbread *)
(*
name
   ssbread: read a sample sheet from the ABI sequencer

synopsis
   ssbread(ssb: in, report: out, output: out)

files
   ssb: A sample sheet from the ABI Sequencer
      Each sample must have plasmid and primer names in the
      sample name section.
   report:  reading of the identifying date, sample number, plasmid and primer
      names.  There are two parts to the report.  In the first, the program
      locates sections of the file by coordinate, and shows what it finds
      where.  In the second, it reads the data which tod will read (using the
      identical procedures).

   output: messages to the user

description
   The program allows one to test the reading modules for program tod.

examples

documentation

see also
   tod.p

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.ssbread *)
version = 1.27; (* of ssbread.p 1993 January 29
(* begin module describe.stirling *)
(*
name
   stirling: test of Stirling's formula

synopsis
   stirling(output: out)

files
   output: a table of Stirling's approximation

description

examples

documentation
   Stirling's approximation for factorial is compared to the exact factorial
   function.  The results can be plotted with xyplo.
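
   The comparison can be reproduced with a short Python sketch.  This is
   not the stirling source; it uses the common form of the approximation,
   n! ~ sqrt(2 pi n) (n/e)^n, which is assumed here because this page
   does not say which form the program uses.

```python
import math

def stirling(n):
    """Stirling's approximation: n! ~ sqrt(2 pi n) * (n / e)^n."""
    return math.sqrt(2.0 * math.pi * n) * (n / math.e) ** n

# Columns: n, exact factorial, approximation, ratio of the two.
for n in (1, 2, 5, 10, 20):
    exact = math.factorial(n)
    print(n, exact, stirling(n), stirling(n) / exact)
```

   The ratio in the last column approaches 1 from below as n grows.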

see also
   xyplo

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.stirling *)
version = 1.03; (* of stirling.p 1993 January 27
(* begin module describe.sumfile *)
(*
name
   sumfile: sum of file sizes

synopsis
   sumfile(input: in, output: out)

files
   input:  the input to this program should be from the Unix command:
       du -s ~/_*
      (where the underscore should be removed; it is there to avoid
       a stupid compiler bug!!)
   output: The output is three columns:
      first column is the first column of the input, the size in kb of
         the various files
      second column is the running sum of the first column
      third column is the same as the second column of the input, the
         names of the files.

description
   The program allows one to find out how many files will fit onto a tape.
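
   The three output columns can be sketched in Python.  This is an
   illustration and not the sumfile source; the name running_sum and the
   example paths are hypothetical.

```python
def running_sum(du_lines):
    """Yield (size, running total, name) for each 'size name' line of du output."""
    total = 0
    for line in du_lines:
        if not line.strip():
            continue                          # skip blank lines
        size_text, name = line.split(None, 1)
        total += int(size_text)
        yield int(size_text), total, name.strip()

example = ["120\t/home/user/a", "30\t/home/user/b", "850\t/home/user/c"]
for size, total, name in running_sum(example):
    print("%8d %8d %s" % (size, total, name))
```

   Reading down the second column shows where the running total first
   exceeds the capacity of the tape.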

examples
   An example of the use of this form is module describe.lister

documentation
   see the man page for du.

author
   Thomas Dana Schneider

bugs
   none known

technical notes
   none.

*)
(* end module describe.sumfile *)
version = 1.00; (* of sumfile 1989 March 31
(* begin module describe.tipper *)
(*
name
      tipper: copy a file to the output file with special symbols at end

synopsis
      tipper(fin: in, tipperp: in, output: out)

files
      fin: the file to be copied
      tipperp: this file indicates what string should be copied
         to the output file after the fin file.  if tipperp
         is empty, then tipper uses the string normally recognized by
         the unix tip program to mean end of file.
         (the string is determined by typing ~seofread? to tip)
         otherwise the contents of tipperp are copied to
         output after the fin file.
      output: the copy of fin, followed by either string as
         determined by the tipperp file.  tipper will not give its
         version number unless the fin file is empty.

description
      tipper makes one copy of the file fin on the file output,
      and then appends a special string of characters to the end of
      the file.  these characters are intended to be recognized
      by the tip program under unix to indicate the end of the file.
      this makes transportation of files from a remote system to
      a unix operating system quite easy.

see also
      copy, the tip program under the unix operating system, whatch.

author
      thomas d. schneider

bugs
      tip responds to any of the characters in the special
      string.  tipper does not warn the user if those characters
      appear inside the file.  so, for example, if the source code of
      tipper is transferred, the transfer is broken by tip
      at the point that the special string is detected in the
      code and the source code begins to spill out to the screen
      rather than a file under unix.  to fix this, simply reset the tip
      eofread variable to one which is not in the fin file.
      the whatch program can be used to determine a good character.

technical notes
      the special string is defined by constant 'eofread'.
*)
(* end module describe.tipper *)
version = 1.11; (* of tipper 1985 apr 26
(* begin module describe.titer *)
(*
name
      titer: analyse titertek optical density data

synopsis
      titer(plates: in, result: out, verbose: out, output: out)

files
      plates: output of the tk program; containing a header line,
         and a series of plates that describe the names of wells,
         their optical densities at various times and od620 data.
      result: a tabulation of the beta-galactosidase values.
      verbose: more detail on how the calculations were done.
      output: messages to the user.

description
      take data from titertek plates and do the analysis:
        get sample names from id plate, find duplicates
        read in volume values
        read in od620 values
        read in od414 values for each time point
                calculate best slope from all time points
                calculate activity for each sample
        report beta-galactosidase data
           st.dev. is the standard deviation for the samples
           % dev.  is 100*st.dev./activity.  If this is larger than 10%
                   for 4 samples, we usually redo the measurement.
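
      the two statistical steps can be sketched in python.  this is not
      the titer source.  it assumes that the 'best slope' is an ordinary
      least-squares slope and that st.dev. is the sample (n-1) standard
      deviation; the formula that turns a slope into an activity is not
      given on this page and is left out.  the function names are
      hypothetical.

```python
import math

def best_slope(times, ods):
    """Least-squares slope of OD414 against time."""
    n = len(times)
    mt, mo = sum(times) / n, sum(ods) / n
    sxy = sum((t - mt) * (o - mo) for t, o in zip(times, ods))
    sxx = sum((t - mt) ** 2 for t in times)
    return sxy / sxx

def percent_dev(activities):
    """Return (mean, st.dev., % dev.) for replicate activities."""
    n = len(activities)
    mean = sum(activities) / n
    sd = math.sqrt(sum((a - mean) ** 2 for a in activities) / (n - 1))
    return mean, sd, 100.0 * sd / mean

print(best_slope([0, 10, 20], [0.1, 0.3, 0.5]))
print(percent_dev([100.0, 110.0, 90.0, 100.0]))
```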

examples
        titer.plates  example input file
        titer.result  example output file
        titer.verbose  example output file

see also
        beta-galactosidase assay protocol
        tk program (in advanced IBM basic)
        tkod.p program

author
        Gary Stormo

*)
(* end module describe.titer *)
const   version = 2.29;  (* of titer.p 1993 Jan 27
(* begin module describe.tkod *)
(*
name
   tkod: read od values from tk data

synopsis
   tkod(input: in, xyin: out, xyplop: out, output: out)

files
   input: output of the tk program; containing a header line,
      and a series of plates that describe the names of wells,
      their optical densities at various times and od620 data.
   xyin: the data from the input rearranged for xyplo
   xyplop: control file for xyplo
   output: messages to the user.

description
   Tkod takes OD620 data from 96 well plates as provided by the
   tk program and converts the data into a form that the xyplo
   program can use to plot with.

see also
   titer program
   beta-galactosidase assay protocol
   tk program (basic)
   xyplo

author
   Tom Schneider

*)
(* end module describe.tkod *)
version = 1.11;  (* of tkod, 1987 august 4 *)
(* begin module describe.tod *)
(*
name
   tod: to database format for sites program

synopsis
   tod(abi: in, thedate: in, ssb: in, todp: in,
       results: out, summary: out, db: out, output: out)

files
   abi: Raw sequences from the ABI sequencing machine.  The
      files called *_??.Seq.txt are manipulated under Unix by:

      more *_??.Seq.txt | cat > abi
      echo "" >> abi

      The more program puts each name followed by the contents, and it is smart
      enough to pipe it to cat which joins the results together.  Thus the abi
      file contains the sample names followed by the sequences.  The echo puts
      a single carriage return at the end of the file so that it ends cleanly.

   thedate:  Date that the sequences were run, output of makedate program.

   ssb:  This must be a copy of the "Sample_Sheet.bin" file from the ABI
      machine.  It contains:
           lane number
           plasmid name
           primer name
      in a funny non-ASCII format which this program extracts from.  (The
      program extracts the data from their known rigid locations.)  The sample
      name column of the sample sheet must contain the plasmid name.  Any
      number of spaces, slashes (/) or null characters are then skipped and the
      next non null word (ending in null or space) is taken as the primer
      name.  Thus the format is "plasmid/primer".  For example:  pTS421/pTS37f1

      There is a bug in the ABI code which will replace the first letter
      of the 24th lane with a null character sometimes.  To get around
      the bug, we will try to rewrite the sample sheet if this appears.

   todp: parameters to control the program
      first line:  a string of characters, called R1, which represents
         a restriction site or other sequence.
         NOTE: it should be self complementary.
      second line:  a string of characters, called R2, which represents
         a restriction site or other sequence.
         NOTE: it should be self complementary.
      following lines: Editing commands for sequences.  There is one editing
         command per line, and each consists of three integers (called N, P1
         and P2) followed by a string (S).  N is the lane number to edit.
         The P1 and P2 define two positions in the sequence.  The sequence
         between these positions is deleted and replaced by the string S.  The
         string must contain only the letters 'acgt' or the single letter 'd'.
         This allows one to insert sequence (make the P1 = P2 + 1, string is
         'acgt' form), to delete (P1 > P2 + 1, string is 'd') and to replace
         (make P1 = P2 + 1 + length of string in 'acgt' form).  Lines that
         begin with '*' are comments, copied to the results file.  Comments
         for each edit must be placed just below the edit command line.

   results: running commentary of the processing of the sequences.

      The following changes are made:
      1.  Each sequence is edited according to instructions in todp.
      2.  Each sequence is converted to lower case.
      3.  The letter 'n' is converted to 'x'.
      4.  When there is exactly one copy of R1 and one copy of R2,
          the region between R1 and R2 is printed (including R1 and R2).
          Otherwise, the entire original sequence is printed.
      5.  The sequence complement is printed if necessary to assure that R1
          is printed before R2.  The program will print the original
          sequence if R1 and R2 cannot be found on the complement.

      The sequence is then joined to the data from ssb, and the results printed.
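
      Steps 2 to 5 can be sketched in Python.  This is an illustration,
      not the tod source.  It assumes that R1 and R2 are self
      complementary (as the todp notes require), that the 'complement'
      of step 5 is the reverse complement, and that the fallback prints
      the lower case sequence.  The names trim, count and
      reverse_complement are hypothetical.

```python
COMPLEMENT = {"a": "t", "c": "g", "g": "c", "t": "a"}

def reverse_complement(seq):
    """Reverse complement; letters other than acgt (such as x) are kept."""
    return "".join(COMPLEMENT.get(b, b) for b in reversed(seq))

def count(seq, site):
    """Count (possibly overlapping) occurrences of site in seq."""
    return sum(1 for i in range(len(seq)) if seq.startswith(site, i))

def trim(raw, r1, r2):
    """Lower case, n -> x, orient so R1 precedes R2, cut from R1 through R2."""
    seq = raw.lower().replace("n", "x")
    if count(seq, r1) != 1 or count(seq, r2) != 1:
        return seq                            # step 4: sites are not unique
    for candidate in (seq, reverse_complement(seq)):
        i, j = candidate.find(r1), candidate.find(r2)
        if 0 <= i <= j:
            return candidate[i:j + len(r2)]   # region including R1 and R2
    return seq                                # step 5 fallback

print(trim("TTGAATTCACGNGGATCCAA", "gaattc", "ggatcc"))
```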

   summary: summary of the results.

   db: The sequences from abi are reformed into the database format needed
      by the sites program.

   output: messages to the user

description
   Convert output sequence from ABI sequencing machine into format usable
   by the sites program.

examples

documentation

see also
   sites.p, makedate.p, dotod

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.tod *)
version = 3.00; (* of tod.p 1993 January 28
(* begin module describe.todawg *)
(*
name
      todawg: change a book into dawg format

synopsis
      todawg(book: in, dp: out, output: out)

files
      book: a book from the delila system
      dp: a file of the sequences corresponding to the book, in the
         form needed for the dawg program
      output: messages to the user

description
      the dawg program needs a special format file to create a dawg.
      this program converts a book to that format.

documentation
      obtain from david haussler information about dawgs.

author
      thomas d. schneider

bugs
      none known
*)
(* end module describe.todawg *)
version = 1.05; (* of todawg 1986 nov 14
(* begin module describe.tstrnd *)
(*
name
      tstrnd: test random generator

synopsis
      tstrnd(output: out) 

files 
      output: the version of tstrnd is printed.  successful compilation 
         and running of the program indicates that the modules are correct. 

description 
      test of a random number generator

author
      thomas d. schneider

bugs
      none known
  
technical notes 
      the constant n in procedure randomtest determines how many times
      the random number generator will be called in a series of tests.  if n
      is small, then the test will be poor; if it is large then the test may
      take a long time.

*)
(* end module describe.tstrnd *)
version = 'tstrnd 1.06 1988 October 17';
(* begin module describe.undel *)
(*
name
      undel: remove references to delman in modules

synopsis
      undel(fin: in, fout: out, output: out)

files
      fin: a text file containing modules
      fout: a copy of fin with any modules beginning with
            'delman.describe.' replaced by 'describe.'
      output: messages to the user

description
   from this point on, manual pages will be simply describe.name.
   this program removes the old convention.

author
      thomas d. schneider

bugs
      none known
*)
(* end module describe.undel *)
version = 1.15; (* of undel.p 1993 Jan 27
(* begin module describe.unixmod *)
(*
name
      unixmod: specific module library for the unix operating system
  
synopsis
      unixmod(output: out) 
  
files 
      output: where the date and time will appear.  
  
description 
      unixmod contains modules that will replace corresponding modules in  
      the other module libraries which are cyber-system dependent. this 
      will allow easy transportation of the delila system to unix
      operating systems for the pyramid 90-x.
  
documentation 
      moddef, delman.describe.module  
  
see also
      delman.describe.delmod, moddef, delman.describe.module,
      delmods, prgmods, matmods, vaxmods
  
author
      tom schneider
  
bugs
      none known  
  
technical notes 
      the datetime package required a const 'namelength' and a type 'alpha'.  
      these are part of the book.const and book.type modules of delmod, and 
      are identical to those types and consts.  note:  programs which use 
      the datetime package must have these types and consts either from 
      delmod or manually declared.
*)
(* end module describe.unixmod *)
version = 'unixmod 1.13 86 feb 13';
(* begin module describe.unshi *)
(*
name
 unshi: remove first column of characters from a file

synopsis
 unshi(fin: in, fout: out, output: out)

files
 fin: the file to be unshifted
 fout: the unshifted file
 output: messages to the user

description
 the unshi program reverses the effects of the shift program
 by removing the first character of each line in a file.

see also
 shift

author
 patrick r. roche

bugs
 none known yet silly us

*)
(* end module describe.unshi *)
version = 1.04; (* of unshi 1985 apr 26
(* begin module describe.unsqz *)
(*
name
   unsqz: unsqueeze the input file

synopsis
   unsqz(fin: in, fout: out, output: out);

files
   fin: the output of the sqz program
   fout: the unsqueezed file.
   output: messages to the user

description
   unsqz reverses the operation of sqz.
   The first character of the fin file is used to indicate where
   lines should be fused together, so no matter what character was used
   to sqz, unsqz will always work.

see also
   sqz

author
   thomas dana schneider

bugs
   none known

*)
(* end module describe.unsqz *)
version = 1.06; (* of unsqz.p 1993 Jan 27
(* begin module describe.untex *) 
(*
name
   untex: remove tex and latex constructs
  
synopsis
   untex(input: in, output: out) 
  
files 
   input: a tex or latex file
   output: the file with:
      '\xxx' command words converted to spaces,
      '{$}' converted to spaces
      free floating '.' ',' '(' ')' removed
      comments (%) removed

      multiple spaces are compressed to single spaces.
      multiple lines are compressed to 2 lines (to preserve the
      paragraph structure).

description
   This reduces the number of words counted by wc to something close to correct.

author
   Thomas D. Schneider 
  
bugs
   citations and comments on lines by themselves leave a blank line.
  
*)
(* end module describe.untex *)
version = 1.26; (* of untex.p 1991 Mar 19
(* begin module describe.untitle *) 
(*
name
   untitle: remove titles from bbl file
  
synopsis
   untitle(input: in, output: out) 
  
files 
   input: a bbl file from bibtex.
   output: a bbl file without titles.

description 
   Titles are removed by deleting between the two copies of the
   '\newblock' strings, leaving the second one.  If the first '\newblock'
   contains the italics indicator, '{\it', then the program realizes
   that this must be a book title, and it keeps the title.

author
   Thomas D. Schneider 
  
*)
(* end module describe.untitle *)
version = 1.17; (* of untitle, 1988 july 2
(* begin module describe.unverb *) 
(*
name
   unverb: remove verbatim sections from a latex file
  
synopsis
   unverb(input: in, output: out) 
  
files 
   input: a latex file
   output: the file with verbatim sections removed

description 
   Removing verbatim sections helps to reduce the number of words counted
   by wc to something approximately correct.  To be used in conjunction
   with untex.

see also
   untex

author
   Thomas D. Schneider 
  
bugs
   none known
  
*)
(* end module describe.unverb *)
version = 1.17; (* of unverb, 1988 September 14
(* begin module describe.vaxmod *)
(*
name
      vaxmod: specific module library for the vax computer

synopsis
      vaxmod(output: out)

files
      output: where the date and time will appear.

description
      vaxmod contains modules that will replace corresponding modules in
      the other module libraries which are cyber-system dependent. this
      will allow easy transportation of the delila system to vax computers
      running under vms.

documentation
      moddef, delman.describe.module

see also
      delman.describe.delmod, moddef, delman.describe.module,
      delmods, prgmods, matmods

author
      patrick r. roche

bugs
      none known

technical notes
      the datetime package required a const 'namelength' and a type 'alpha'.
      these are part of the book.const and book.type modules of delmod, and
      are identical to those types and consts.  note:  programs which use
      the datetime package must have these types and consts either from
      delmod or manually declared.
*)
(* end module describe.vaxmod *)
version = 1.10; (* of vaxmod 1991 Mar 19
(* begin module describe.ver *)
(*
name
      ver: look at the version of a program

synopsis
      ver(input: in, output: out)

files
      input: a program source code
      output: the line that contains "version = " in input

description
      this program lets one look at the version number of a program
      source code.

author
      thomas schneider

see also
      verbop

bugs
      none known

*)
(* end module describe.ver *)
version = 2.01; (* of ver.p 1990 Dec 13
(* begin module describe.verbop *)
(*
name
      verbop: increment the version number of a program

synopsis
      verbop(source: inout, output: out)

files
      source: a program source code, with a version constant in the form
          "version = " followed by a real number.
         the version number is incremented by 0.01.
      output: the new version number is reported.

description
      if you are too lazy to change the version number of a program
      every time you alter the code, then you have no excuses any longer,
      because this program will do it for you automatically...
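
      the increment can be sketched in python.  this is an illustration,
      not the verbop source.  it assumes the version has exactly two
      digits after the decimal point, and it counts in hundredths so that
      no rounding error creeps in.  the name bump is hypothetical.

```python
import re

VERSION = re.compile(r"(version = )(\d+)\.(\d\d)")

def bump(source_text):
    """Increment the first 'version = N.NN' by 0.01; return (text, new version)."""
    m = VERSION.search(source_text)
    if m is None:
        raise ValueError("no version constant found")
    hundredths = int(m.group(2)) * 100 + int(m.group(3)) + 1
    new = "%d.%02d" % divmod(hundredths, 100)
    return source_text[:m.start()] + m.group(1) + new + source_text[m.end():], new

print(bump("version = 2.08; of verbop.p")[1])
```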

author
      Thomas Schneider

see also
      ver, code

bugs
      none known

*)
(* end module describe.verbop *)
version = 2.08; (* of verbop.p 1990 june 20
(* begin module describe.vernum *)
(*
name
      vernum: print the version number of a program

synopsis
      vernum(input: in, output: out)

files
      input: a program source code, with a version constant in the form
          "version = " followed by a real number.
      output: the version number is reported.
         If there is none, the program reports 0.

description
      the program finds the version number of a file and reports it to
      output for the purpose of saving copies.

author
      thomas schneider

see also
      ver, verbop, code

bugs
      none known

*)
(* end module describe.vernum *)
version = 1.04; (* of vernum 1988 feb 19
(* begin module describe.versave *)
(*
name
   versave: save the file under the version number

synopsis
   versave(input: in, output: out)

files
   input: a text file, with a version constant in the form
       'version = ' followed by a real number.  The name of the
       file (including dot extensions) must be found after the word 'of '.
   output: Four lines are produced:
       file
       (name of text file found after the 'of')
       version
       (the real number found after 'version = ')

description
   Generate commands for worcha on how to change a script for saving
   the file.  A script is then passed through worcha to produce the
   executable commands.

example
   For an input file containing:
      version = 1.00; (@ of versave.p 1989 April 4
   The output is:
      file
      versave.p
      version
      1.00
   This is to be placed in the worcha parameter file, worchap.

   An example script is:
      cp file old/file.version
      echo saved file in old/file.version

   Passing the script through worcha with this parameter file gives:

      cp versave.p old/versave.p.1.00
      echo saved versave.p in old/versave.p.1.00

   When executed, this will save the text.
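
   The extraction step can be sketched in Python.  This is an
   illustration, not the versave source; the regular expression and the
   name versave_lines are assumptions about one way to find the two
   values.

```python
import re

def versave_lines(text):
    """Return the four versave output lines for the first version line in text."""
    # The version number follows 'version = '; the file name follows 'of '.
    m = re.search(r"version = (\d+\.\d+).*?\bof (\S+)", text)
    if m is None:
        raise ValueError("no 'version = ... of name' line found")
    return ["file", m.group(2), "version", m.group(1)]

for line in versave_lines("version = 1.00; (@ of versave.p 1989 April 4"):
    print(line)
```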

author
   thomas schneider

see also
   worcha, verbop, ver, code

bugs
   none known

*)
(* end module describe.versave *)
version = 1.09; (* of versave.p 1989 May 8
(* begin module describe.vfilt *)
(*
name
   vfilt: vector filter

synopsis
   vfilt(data: in, fines: out, output: out)

files
   data: the output of the scan program
   vfiltp: parameters to control the program.  one integer, the
      lowest value to pass through the filter
   fines: the same form as data, but low values removed
   output: messages to the user

description
   the program eliminates the lowest values in a scan of a matrix
   against a sequence.

see also
   scan

author
   Thomas Dana Schneider

bugs
   none known

*)
(* end module describe.vfilt *)
version = 1.02; (* of vfilt 1988 jan 6
(* begin module describe.whatch *)
(*
name
      whatch: what characters are in a file?

synopsis
      whatch(fin: in, fout: out, output: out)

files
      fin: the file to be studied
      fout: an alphabetic list of the characters in the file, giving:
         the character,
         the ordinal number of the character (pascal ord function),
         how many such characters are in the file,
         and the percent of the character in the file.
      output: messages to the user

description
      sometimes it is necessary to determine what characters are in a file.
      if the file is very large, it is not possible to do this by hand.
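
      the four columns of fout can be sketched in python.  this is an
      illustration, not the whatch source; the function name and the
      tuple layout are hypothetical.

```python
from collections import Counter

def whatch(text):
    """Return rows of (character, ordinal, count, percent), sorted by character."""
    counts = Counter(text)
    total = sum(counts.values())
    return [(ch, ord(ch), n, 100.0 * n / total) for ch, n in sorted(counts.items())]

for ch, code, n, pct in whatch("hello world"):
    print(repr(ch), code, n, "%.1f" % pct)
```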

author
      thomas schneider

bugs
      none known

technical notes
      the constant maxchars determines the number of characters accepted.

*)
(* end module describe.whatch *)
version = 1.11; (* of whatch 1985 apr 20
(* begin module describe.winfo *)
(*
name
   winfo: window information curve

synopsis
   winfo(data: in, winfop: in, xyin: out, output: out)

files
   data:   output of rseq
   winfop: parameters to control the program
      First line: window size
   xyin:   input to xyplo
   output: messages to the user

description
   Make a sliding window average of an information curve
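
   A sliding window average can be sketched in Python as follows.  This
   is an illustration, not the winfo source.  How winfo treats the ends
   of the curve and where it places each average along the x axis are
   not stated here, so this sketch simply returns one mean per complete
   window.

```python
def window_average(values, window):
    """Mean of each run of `window` consecutive values (no padding at the ends)."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and the number of values")
    total = sum(values[:window])
    means = [total / window]
    for i in range(window, len(values)):
        total += values[i] - values[i - window]   # slide by one position
        means.append(total / window)
    return means

print(window_average([0, 2, 4, 6], 2))
```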

examples

documentation

see also
   rseq.p xyplo.p

author
   Thomas Dana Schneider

bugs
   not yet!

technical notes
   Constant maxwin is the largest window size allowed.

*)
(* end module describe.winfo *)
version = 1.08; (* of winfo.p 1989 November 28
(* begin module describe.wl *)
(*
name
   wl: wrap lines in a file

synopsis
   wl(input: in, output: out)

files
   input:  text to be wrapped
   output: wrapped text

description
   This Pascal program takes ASCII text and filters it.  Lines longer than the
   constant maxline are altered by inserting carriage returns.

author
   Thomas Dana Schneider

see also
    ww.p

bugs
   the constant maxline is fixed at compile time.

*)
(* end module describe.wl *)
version = 1.00; (* of wl
(* begin module describe.woco *)
(*
name
   woco: word counting program

synopsis
   woco(input: in, output: out)

files
   input:  a file to find the number of words in
   output: number of words in the file

description
   The program has a limited knowledge of latex constructs.
   A word is defined to be any contiguous string of A-Z, a-z, 0-9,
   excluding those that begin with a \.
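
   The word definition can be expressed as a regular expression.  The
   Python sketch below is an illustration, not the woco source, and it
   implements only the definition given above; any further latex
   handling in woco is not reproduced.

```python
import re

# A run of letters and digits, optionally preceded by a backslash.
WORD = re.compile(r"\\?[A-Za-z0-9]+")

def count_words(text):
    """Count runs of A-Z, a-z, 0-9, skipping runs that start with a backslash."""
    return sum(1 for w in WORD.findall(text) if not w.startswith("\\"))

print(count_words(r"\section{Intro} The 2 cats"))
```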

author
   Thomas Dana Schneider

bugs
   none known

*)
(* end module describe.woco *)
version = 1.08; (* of woco 1988 July 5
(* begin module describe.worcha *)
(*
name
      worcha: word changing program

synopsis
      worcha(fin: in, fout: out, worchap: in, output: out)

files
      fin:     the file in which words need to be changed to other words.
      fout:    the file where the copy of fin with the words changed is written.
      worchap: the parameter file containing the words that need to
               be replaced and their replacements.  Worchap must be
               constructed as follows:

               a word that needs to be changed is on the first line, the
               following line contains the replacement word,
                  next line:  word to be replaced,
                  following line: replacement word,
               and so on....etc.

               so, the odd numbered lines, (1,3,5....), have the words
               from fin that will be replaced, and the even numbered
               lines, (2,4,6...), contain the replacement words.
      output:  where error messages will appear.

description
      This program was designed to go through a pascal program and
      locate and replace 'words', (pascal identifiers).
      Worcha will sort through a file and look for the words that need to be
      changed, ignoring comments and both single and double quotes.  Upon
      finding the old words, worcha will substitute the specified new words from
      worchap when copying the input file onto the specified output file.  As
      many words as necessary may be changed at one time.  Worcha produces a
      list of the changes within a comment at the end of the fout file.
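
      The replacement rule can be sketched in Python.  This is an
      illustration, not the worcha source: it uses a regular expression
      instead of linked lists, matches words exactly (case is not
      folded), does not append the list of changes, and treats brace
      comments as well as parenthesis-star comments as comments.  The
      names read_pairs and worcha_text are hypothetical.

```python
import re

# Comments and quoted strings are matched first so they are copied unchanged;
# only bare identifiers are looked up in the replacement table.
TOKEN = re.compile(
    r"\(\*.*?\*\)|\{.*?\}|'[^'\n]*'|\"[^\"\n]*\"|[A-Za-z][A-Za-z0-9]*", re.S)

def read_pairs(worchap_lines):
    """Odd numbered lines are old words, even numbered lines their replacements."""
    return dict(zip(worchap_lines[0::2], worchap_lines[1::2]))

def worcha_text(text, pairs):
    """Return text with each identifier found in pairs replaced."""
    def swap(m):
        token = m.group(0)
        return pairs.get(token, token)
    return TOKEN.sub(swap, text)

pairs = read_pairs(["count", "total", "i", "index"])
print(worcha_text("for i := 1 to count do { count them } write('count')", pairs))
```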

documentation
      delman.assembly.worcha

author
      Patrick R. Roche

bugs
      The program will yell if word length is equal to wdlgthmax.

technical notes
      Worcha uses linked-lists for storing the words to be changed and their
      replacements.  Thus as many words as desired may be changed at one time.

*)
(* end module describe.worcha *)
version = 2.48; (* of worcha.p 1989 April 5
(* begin module describe.wordlist *)
(*
name
   wordlist: lists words in a file

synopsis
   wordlist(input: in, output: out)

files
   input:  a file to find the words in
   output: the words of the file listed one per line

description
   The program has only a limited knowledge of LaTeX constructs.
   A word is defined to be any contiguous string of A-Z, a-z, 0-9,
   excluding those that begin with a \.
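
   The word definition above can be sketched in Python.  This is an
   illustration only; the name list_words is hypothetical and the actual
   program may treat other LaTeX constructs differently.

```python
import re

# A run of letters and digits, optionally preceded by one backslash.
WORD = re.compile(r"\\?[A-Za-z0-9]+")

def list_words(text):
    """Return the words of text in order, dropping those that begin with a backslash."""
    return [w for w in WORD.findall(text) if not w.startswith("\\")]
```

   Each returned word would then be written on its own line of output.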

author
   Thomas Dana Schneider

bugs
   none known

*)
(* end module describe.wordlist *)
version = 1.13; (* of wordlist 1993 January 26
(* begin module describe.ww *)
(*
name
   ww: word wrap 

synopsis
   ww(input: in, output: out)

files
   input:  text to be wrapped
   output: wrapped text

description
   This Pascal program takes ASCII text and filters it.  Lines longer than the
constant maxline are altered by replacing the first space after position
maxline with a carriage return.  This has the effect of wrapping the lines
between 'words'.
   The original purpose was to get around a design flaw in another program.
The program fig produces graphics for X and NeWS windows.  The graphics
is converted to PostScript by another program, f2ps.  Unfortunately f2ps
was poorly designed: the PostScript produced has many lines longer than
70 characters.  When this PostScript code is sent to the (latest as
of 1988) Apple NTX LaserWriterII, the printer dies.  By running this filter,
the problem is bypassed.  Moral: never make lines longer than 80 characters!
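
   The wrapping rule can be sketched in Python.  This is an illustration,
   not the Pascal source: the name wrap_line is hypothetical, maxline is
   passed as an argument because its compiled-in value is not given here,
   and it is assumed that the rule is applied again to whatever remains
   after each break.

```python
def wrap_line(line, maxline):
    """Break line at the first space after position maxline, repeatedly."""
    pieces = []
    while len(line) > maxline:
        cut = line.find(" ", maxline)   # first space after position maxline
        if cut < 0:
            break                       # no space left, so the rest stays whole
        pieces.append(line[:cut])
        line = line[cut + 1:]           # the space itself becomes the line end
    pieces.append(line)
    return pieces
```

   Note that a piece can still be longer than maxline, since the break is
   made at the first space after that position and not before it.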

author
   Thomas Dana Schneider

bugs
   the constant maxline is fixed at compile time, of course.

*)
(* end module describe.ww *)
version = 1.05; (* of ww 1988 September 14
(* begin module describe.xycor *)
(*
name
   xycor: correlate two xyin files from the ri program

synopsis
   xycor(axyin: in, dxyin: in,
        arp: out, aip: out, ait: out, ain: out, arn: out,
        drp: out, dip: out, dit: out, din: out, drn: out,
        list: out, output: out)

files
   axyin:  the xyin file output from ri representing data
     of acceptor sites
   dxyin:  the xyin file output from ri representing data
     of donor sites
   arp, aip, ait, ain, arn,
   drp, dip, dit, din, drn:
      output files.  Each letter of the name has a meaning:

         r = an Ri value
         i = an interval (distance)

         p = previous
         t = total of previous and next intervals
         n = next

         a = acceptor
         d = donor

      Thus arp is the comparison of an acceptor Ri to the previous donor Ri.
   data:  donor and acceptor Ri compared to 5 other parameters, xyin format
      Unfortunately, missing columns cannot be handled by xyplo (yet?),
      so this approach is not as easy as the 10 separate files.
   list:  other output of this program, showing the data structure
     and all the details of the data relationships.
   output: messages to the user
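
   The naming scheme for the ten output files can be spelled out with a
   small Python sketch.  The function name decode and the table names are
   hypothetical; only the letter meanings come from the list above.

```python
SITE = {"a": "acceptor", "d": "donor"}
VALUE = {"r": "Ri value", "i": "interval (distance)"}
WHICH = {"p": "previous", "t": "total of previous and next intervals", "n": "next"}

# The ten output files named in the synopsis.
NAMES = ["arp", "aip", "ait", "ain", "arn",
         "drp", "dip", "dit", "din", "drn"]

def decode(name):
    """Return the meaning of each of the three letters of an output file name."""
    site, value, which = name
    return SITE[site], VALUE[value], WHICH[which]
```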

description
   This program determines the relationship between
   Ri for donors and acceptors with introns and exons.

   The comparisons that are needed are:

   1.    donor Ri to acceptor Ri across intron
   2.    donor Ri to acceptor Ri across exon
   3.    donor Ri to adjacent exon length
   4.    donor Ri to adjacent intron length
   5. acceptor Ri to adjacent intron length
   6. acceptor Ri to adjacent exon length
   7.    donor Ri to sum of surrounding exon and intron
   8. acceptor Ri to sum of surrounding exon and intron

or:

   D1.    donor Ri to acceptor Ri across intron
   D2.    donor Ri to acceptor Ri across exon
   D3.    donor Ri to adjacent exon length
   D4.    donor Ri to adjacent intron length
   D5.    donor Ri to sum of surrounding exon and intron

   A1. acceptor Ri to donor Ri across intron
   A2. acceptor Ri to donor Ri across exon
   A3. acceptor Ri to adjacent exon length
   A4. acceptor Ri to adjacent intron length
   A5. acceptor Ri to sum of surrounding exon and intron

D1 = A1, D2 = A2.  This form allows two output files to be created
for simple analysis.  These are dout and aout.

   This program reads the two xyin files into a single data structure that
   reconstructs the intron/exon structure.  Then the data are output for
   analysis by xyplo.

   NOTE: Because some data will be missing, and because xyplo cannot
   handle missing data items, I probably should simply have 10 files.

   Anytime that a donor is next to a donor or an acceptor next to an acceptor,
   that is a flag for alternative splicing.  It can also indicate that a
   particular site was eliminated for some reason (although why Mike Stephens
   or dbinst did that is sometimes mysterious).  In any case, the length data
   would be questionable.  I chose to eliminate such pairs from the statistics,
   since there are only 85 of them in the entire data structure.
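
   The test for neighboring sites of the same kind can be sketched in
   Python.  The representation is an assumption made for illustration:
   each site is a (kind, position) pair with kind 'd' or 'a', and the
   list is already in order along the sequence.  The data structure that
   xycor builds is not described here and may differ.

```python
def same_kind_neighbors(sites):
    """Return the adjacent pairs of sites in which both have the same kind."""
    return [(left, right)
            for left, right in zip(sites, sites[1:])
            if left[0] == right[0]]
```

   Pairs reported this way are the ones that would be left out of the
   statistics.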

examples

documentation

see also
   ri.p, xyplo.p

author
   Thomas Dana Schneider

bugs

technical notes

*)
(* end module describe.xycor *)
version = 1.44; (* of xycor.p 1993 March 11
(* begin module describe.xyplo *)
(*
name
   xyplo: plot x, y data

synopsis
   xyplo(xyin: in, xyout: out, xyplop: in, output: out)

files
   xyin:  A set of header lines that begin with asterisk ('*') are copied to
       output.  Remaining lines are the data in columns, ending with end of
       file.  Do not use tabs to separate data, as the tabs will be recognized
       as tokens!  Missing columns are not allowed.  See the demonstration file
       xyin.demo for an example.  Once the first data line has been read,
       lines that begin with an '*' will be ignored.  This allows one
       to place comments or other information deeper into the file without
       having xyplo object.

   xyplop:  Parameters to control the plot, on lines as shown.  The major
       sections of the parameter file are separated by lines that are used by
       the program as separators.  A separator line may begin with blanks, and
       these must be followed by asterisks, as shown below.  These lines simply
       make the file easier to deal with, but you must have them in the file!
       The easiest way to create a xyplop file is to copy the demonstration
       file (xyplop.demo) and modify that to suit your needs.

       xzero yzero         amounts to move the graph origin (inches)
       zx min max          (character, real, real) if zx='x' then set xaxis
       zy min max          (character, real, real) if zy='y' then set yaxis
                           These two lines set the minimum and maximum range of
                           the data to graph.  Other characters mean the
                           program automatically uses the range of the data.
       xinterval yinterval number of intervals on axes to plot
       xwidth    ywidth    width of numbers in characters
       xdecimal  ydecimal  number of decimal places
       xsize     ysize     size of axes in inches
       xlabel              the x axis label
       ylabel              the y axis label
       zc                  if zc='c' then crosshairs are put on zero of x and y
                                 'x' then only X axis is plotted
                                 'X' then only X axis and crosshairs
                                 'y' then only Y axis is plotted
                                 'Y' then only Y axis and crosshairs
                                 'n' then neither axis nor crosshairs
                                 'N' then neither axis with crosshairs
                           Otherwise, both axes are plotted without crosshairs.
       zxl base            if zxl='l' then convert the x axis to a log scale
                           using the indicated base
       zyl base            if zyl='l' then convert the y axis to a log scale
                           using the indicated base
       * define columns to read data from ***********************************
                           This section defines which column of xyin contains
                           what kind of data.  You can use a column only once.

       xcolumn   ycolumn   columns of xyin that determine the
                           location of the symbol
       symbol-column       the xyin column to read symbols from
                           if zero, then use the first symbol defined below
       xscolumn  yscolumn  columns of xyin that determine the size  of the
                           symbol.  If zero, then no data is expected.

			   NOTE: for most symbols this is the entire size of
			   the symbol.  For the I beam symbol, the yscolumn is
			   half of the total size plotted.  Thus one may use
			   standard deviations and obtain a symbol of 2
			   standard deviations high centered on the y
			   coordinate.

       hucolumn sacolumn brcolumn       hue saturation brightness columns.
                           These control the color of the rectangle symbol.
                           1 0 0 is black (assumed if columns are all zero)
                           1 0 1 is white
       * define one or more symbols *****************************************

                           Each of these sections defines one of the symbols by
                           specifying what to do for each symbol flag seen in
                           the symbol column.  There may be as many symbols as
                           will fit in memory.

			   The last of these sections must contain just a '.'
			   as the 'symbol-to-plot'.  This is required to end
			   the symbol definition section since there are an
			   indefinite number of symbols.

       symbol-to-plot      (character) Most symbols are plotted at the
			   coordinates given in xcolumn and ycolumn.

                           'c' plot a circle
                           'b' plot a box
                           'x' plot an x
                           '+' plot a plus
                           'I' plot an I beam symbol
                           'd' plot a box with central dot
                           'p' point (or dot) alone.

			   'R' plot a filled rectangle in color.  Unlike the
			   other symbols, which are centered on the data, the
			   lower right hand corner of this rectangle is placed
			   on the data.  This allows the user more control on
			   placement.

                           'r' like 'R' but gray scale.  The brightness
                           column is used for controlling the brightness.

			   'f' Means to plot the symbol-flag (defined below).
			   The 'f' type allows several symbols to be made each
			   with its own regression and connection lines, but
			   plotted with the entire flag string in xyin.  The
			   symbols are distinguished by their first character.
                           The symbol-flag in xyplop should be set to the string
                           that one desires to be recognized.

			   'g' Means 'grab bag'.  The 'g' type has lower
			   priority than any other symbol.  Xyplo searches
			   through all the available symbols looking for a
			   match to the symbol-flag.  If a symbol-flag cannot
			   be found, then the data are assigned to the
			   'grab-bag'.  The program uses the symbol-flag on the
			   graph.
                           The symbol-flag in xyplop can be anything.

                           The symbol underscore (_) in xyin is converted
                           to a blank to allow the appearance of separated
                           words.

			   One can do grab-bag connected curves without symbols
			   by setting g and the symbol-flag to ' '.  One can
			   also set the symbol-to-plot to blank (or other
			   unrecognized symbol) to get specific connected
			   curves.  In this case, the symbols MUST be connected
			   or the program will object (invisible symbol and
			   invisible connection means data loss).

       symbol-flag         The string of characters that indicates that this
			   symbol should be plotted.  Eg, if the
			   'symbol-to-plot' is I and the flag is x, then
			   whenever an x is seen in the symbol column, an I
			   beam will be plotted.  The flag can be more than one
			   character long, but (unfortunately) it cannot
			   contain blanks.

       symbol-sizex        Side in inches on the x axis of the symbol.  If this
			   value is negative, the data in xscolumn is used to
			   determine the size.  For circles, sizex determines
			   the radius, sizey is ignored.

       symbol-sizey        Side in inches on the y axis of the symbol.  If this
			   value is negative, the data in yscolumn is used to
			   determine the size.  For circles, sizeX determines
			   the radius but a positive number is still required
			   for sizey.

       connection linetype size   If the first character is 'c' then the
			   symbols will be connected by lines of linetype as
			   defined below.  (Linetype must follow the c
			   immediately, without blanks.)

       linetype  size      linetype is a character defining the kind of
                           regression line to plot for this symbol:
                           'l' means do regression line
                           'i' invisible,
                           '.' dotted
                           '-' dashed
                           'n' means no line.

			   '-' and '.' require a size in inches for the
			   spacing.  The others also require a number, but it
			   is ignored.

       * end the symbol definitions with a period (left justified!) *********
.
       * define zero or more user defined lines *****************************
       linetype m b size   One or more lines to be drawn on the plot, m and b
			   are slope and intercept.  Linetype and size are
			   defined as for the symbol connection lines.

   xyout: regression results, ready for PostScript input.  (See technical
          notes.)

   output: messages to the user
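
   The xyin layout described at the start of this section can be read with
   a few lines of Python.  This is a sketch under stated assumptions, not
   the code of xyplo: the name read_xyin is hypothetical, blank lines are
   assumed to be skipped, and a missing column is detected by comparing
   each row with the first data row.

```python
def read_xyin(lines):
    """Split xyin lines into (header, rows); every row is a list of strings."""
    header, rows = [], []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("*"):
            if not rows:
                header.append(line)   # leading '*' lines form the header
            continue                  # later '*' lines are ignored
        # Split on blanks only, so that a tab stays inside its field.
        fields = [f for f in line.split(" ") if f]
        if not fields:
            continue                  # assumption: blank lines are skipped
        if rows and len(fields) != len(rows[0]):
            raise ValueError(f"line {number}: missing or extra column")
        rows.append(fields)
    return header, rows
```

   The columns named by xcolumn, ycolumn and the others in xyplop would
   then be picked out of each row by position.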

description
   The data in the xyin file are converted to graphics in the PostScript
   language on the xyout file, under control of the parameters set in xyplop.
   There are several distinct sections of the parameters:

   1. The first set of parameters determine the overall characteristics of the
      graph.
   2. The second set of parameters defines the columns of xyin to be read.
   3. The next section of the parameter file defines one or more symbols to be
      plotted on the graph.  If desired, a linear regression is performed
      between the data columns, and this may be graphed for each symbol.  The
      invisible option allows one to obtain the regression data without the
      graph.
   4. A section with just a period ends the symbols section.
   5. The last section contains lines you define.
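
   The regression mentioned in item 3 can be illustrated with an ordinary
   least squares fit in Python.  This is a sketch only: the exact
   computation used by xyplo is not shown in this entry, and the name
   linear_regression is hypothetical.  The results are named m and b to
   match the slope and intercept of the user defined lines.

```python
def linear_regression(xs, ys):
    """Return (m, b) for the least squares line y = m*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = sxy / sxx              # fails if all x values are equal
    b = mean_y - m * mean_x
    return m, b
```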

   Recommended procedure for using xyplo: obtain a copy of xyplop.demo and
   xyin.demo, set permission to read them for yourself (on a Unix system use
   chmod), and copy them to the names xyplop and xyin.  Try them out as is.  If
   you don't get a graph, doing your own data will not do any good!  Then
   convert the xyplop to your own use by changing the xyplop.demo file and
   substituting your xyin file.  This way the complexity of xyplop can be held at
   bay.

see also
   xyplop.demo, xyin.demo,
   xyplop.test, xyin.test,
   xyplop.mul, xyin.mul,
   doodle

author
   Thomas Schneider

technical notes
   The program originally generated output in the pic format.  One could then
   run this through pic and troff to produce a graph.  However, the program has
   been modified to eliminate the pic notation (by substituting modules from
   dops rather than domods).  All lines outside the graphics now are preceded
   by a %, which is beginning of a comment in PostScript.  Thus the output of
   the program can be run directly into a PostScript interpreter.  This saves
   on both memory and speed of graphing since the intermediate file is no
   longer created.

bugs
   Minor unobvious things have prevented people from getting graphs.  Most
   problems occur when badly formed xyplop files are used, and the program has
   no way to tell what the difficulty is.  Recently, more checks have been put
   in, so the program can detect most oddly formed xyplop and xyin files.  Check
   your xyplop carefully.

*)
(* end module describe.xyplo *)
version = 7.77; (* of xyplo.p 1993 March 22
(* begin module describe.zipf *)
(*
name
   zipf: Monte Carlo simulation for Peter Shenkin's problem

synopsis
   zipf(zipfp: in, data: out, xyin: out, output: out)

files
   zipfp:  parameters to control the program
      first line: integer, number of correlation coefficients to create
      second line: integer, number of symbols for each correlation coefficient.
         eg, 20 means amino acids.
      third line: character.  't' means use Tom's method, 'p' means use Peter's.
      fourth line: character.  'g' means to graph the simplex.
   data:  a list of correlation coefficients.  This is to be input
      to the genhis program.
   xyin:  data for graphing the simplex.  The graph is generated with the
      xyplo program.
   output: messages to the user

description

   1992 Jan 13  Returned call to Stephen Altschul 496-2475.  He suggested that
   Peter Shenkin's results of rank versus log of probability are due to random
   effects.  This is easy to test with a Monte Carlo simulation:

   Tom's method
      choose s (eg 20) random numbers
      find their sum
      divide each number by the sum to produce s random numbers which
         sum to 1.
      sort the numbers
      take the log versus the rank
      determine the correlation coefficient
      repeat to get distribution of correlation coefficients.

   Peter's method
      choose s-1 random numbers between 0 and 1
      sort the numbers
      take the differences to produce s (eg 20) numbers that sum to 1
      re-sort the numbers
      take the log versus the rank
      determine the correlation coefficient
      repeat to get distribution of correlation coefficients.
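
   Both methods can be sketched in Python.  This is an illustration and
   not the Pascal source of zipf.  The function names are hypothetical,
   the numbers are assumed to be sorted from largest to smallest so that
   rank 1 is the largest, the correlation is assumed to be the usual
   Pearson coefficient, and for Peter's method the end points 0 and 1 are
   assumed to be included when the differences are taken.

```python
import math
import random

def toms_numbers(s, rng):
    """Tom's method: s random numbers, each divided by their sum."""
    raw = [rng.random() for _ in range(s)]
    total = sum(raw)
    return [x / total for x in raw]

def peters_numbers(s, rng):
    """Peter's method: gaps between s-1 sorted random numbers in 0 to 1."""
    cuts = sorted(rng.random() for _ in range(s - 1))
    edges = [0.0] + cuts + [1.0]
    return [b - a for a, b in zip(edges, edges[1:])]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def log_rank_correlation(probs):
    """Correlate rank (1 = largest) with the log of each number."""
    ranked = sorted(probs, reverse=True)
    ranks = list(range(1, len(ranked) + 1))
    logs = [math.log(p) for p in ranked]   # fails if any number is exactly 0
    return pearson(ranks, logs)

def correlations(count, s, method, seed=None):
    """Repeat the simulation count times; 't' is Tom's method, else Peter's."""
    rng = random.Random(seed)
    make = toms_numbers if method == "t" else peters_numbers
    return [log_rank_correlation(make(s, rng)) for _ in range(count)]
```

   The list returned by correlations plays the role of the data file,
   which is then given to genhis to see the distribution.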

   Graph of simplex.  The numbers all add to 1 for either method.  They are
   points in an s dimensional space.  The volume they fit into is a hyperplane
   of s-1 dimensions since they sum to 1, called a simplex.  The distribution
   of the points can be visualized by projecting onto a plane and graphing with
   the xyplo program.  The projection is done by using polar coordinates.
   There is a vector P from the center of the simplex to each point to graph.
   There is a vector, A, from the center of the simplex to the point where the
   first coordinate has value 1 and all others are zero.  The magnitude of P is
   determined, and the angle between P and A determines an angle.  These
   numbers are in polar coordinates.  They are converted to rectangular
   coordinates in the xyin file.  If s = 3, then the simplex is a simple plane
   reaching between the three points A=(1,0,0), B=(0,1,0) and C=(0,0,1).  The
   projection takes this equilateral triangle onto the xy plane.  In higher
   dimensions, the points are collapsed to the xy plane, so high dimensional
   effects are expected.  This means that the center should tend to become
   empty, and the distribution will become spherical.
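
   The projection can be sketched in Python.  This is an illustration
   under stated assumptions, not the code of zipf: the name project is
   hypothetical, and the angle is taken as the unsigned angle between P
   and A, so every point lands on or above the x axis.  How zipf assigns
   a sign to the angle is not stated in this entry.

```python
import math

def project(point):
    """Map a point on the simplex to (x, y) using the length of P and its angle to A."""
    s = len(point)
    center = 1.0 / s
    p = [value - center for value in point]         # vector P from the center
    a = [1.0 - center] + [-center] * (s - 1)        # vector A to (1, 0, ..., 0)
    length_p = math.sqrt(sum(v * v for v in p))
    length_a = math.sqrt(sum(v * v for v in a))
    if length_p == 0.0:
        return 0.0, 0.0                             # the center itself
    cosine = sum(u * v for u, v in zip(p, a)) / (length_p * length_a)
    cosine = max(-1.0, min(1.0, cosine))            # guard against rounding
    theta = math.acos(cosine)
    # Polar coordinates (length_p, theta) converted to rectangular.
    return length_p * math.cos(theta), length_p * math.sin(theta)
```

   For s = 3 the corner (1,0,0) lands on the positive x axis, and the
   corner (0,1,0) lands at an angle of 120 degrees from it.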

examples

zipfp file:
***********************************************************
10000 10000      1000 Number of correlation coefficients to print out
3 16            Number of symbols being simulated
p             t= tom's, else peter's
g             g = graph the simplex, otherwise not

zipfp:  parameters to control the zipf program.
***********************************************************

genhisp file for use with genhis
***********************************************************
x n 50
r -1 -0.5
***********************************************************

xyplop file for use with xyplo
***********************************************************
2 2       zerox zeroy         graph coordinate center
x -1 1 zx 0 25    zx min max (character, real, real) if zx='x' then set xaxis
y -1 1 zy 0 250   zy min max (character, real, real) if zy='y' then set yaxis
10 10     xinterval yinterval number of intervals on axes to plot
6 6       xwidth    ywidth    width of numbers in characters
1 1       xdecimal  ydecimal  number of decimal places
5 5       xsize     ysize     size of axes in inches
x
y
c         zc                  'c' crosshairs, axXyYnN
n 2       zxl base            if zxl='l' then make x axis log to the given base
n 2       zyl base            if zyl='l' then make y axis log to the given base
          *********************************************************************
1 2       xcolumn   ycolumn   columns of xyin that determine plot location
0         symbol column       the xyin column to read symbols from
0  0      xscolumn  yscolumn  columns of xyin that determine the symbol size
0 0 0     hue saturation brightness   columns for color manipulation
          *********************************************************************
p         symbol-to-plot      c(circle)bd(dotted box)x+Ifgpr(rectangle)
0         symbol-flag         character in xyin that indicates that this symbol
0.05      symbol sizex        side in inches on the x axis of the symbol.
0.05      symbol sizey        as for the x axis, get size from yscolumn
nl 0.05   no connection (example for connection is c- 0.05 for dashed 0.05 inch)
n  0.05   linetype  size      linetype l.-in and size of dashes or dots
          *********************************************************************
.
          *********************************************************************
***********************************************************

documentation

see also
   genhis.p, xyplo.p

author
   Thomas Dana Schneider

bugs

technical notes
   The non-standard random number generator is used (rand).
   This could be replaced by a portable one, but with the danger
   of it not giving good results.

*)
(* end module describe.zipf *)
version = 1.32; (* of zipf.p 1993 January 26
(* begin module describe.program-list *)
This is a list of the Delila programs as of
Wed Mar 31 11:41:34 EST 1993
 
program name<:> a one-line description of the program.
alist: aligned listing of a book
alpro: frequency and information of aligned protein sequences
alword: frequency and information of aligned words
aran: aligned random sequences
asciicode: converts ascii table to Pascal code
auxmod: modules for auxiliary programs
av: average integers
biglet: text enlargement program
binhex: convert binary to hex
binomial: produce the binomial probabilities for a found black to white ratio
binplo: produce the binomial probabilities for a found black to white ratio
bkdb: convert a book to database format for the sites program
calc: a calculator that propagates errors
calhnb: calculate e(hnb), var(hnb), ae(hnb), avar(hnb), e(n)
calico: character and line counts of a file
cap: put capital letters inside quotes of a program
catal: cataloguer of delila libraries, the catalogue program
censor: removes code from a program
cerf: complement of the error function
chacha: changes characters in a file
chi: estimates chi squared from degrees of freedom
cisq: circle to square
ckhelix: check that the helix location is where one wants
cluster: cluster indana subindexes into groups of duplicate entries
coda: composition file to data for genhis
code: find the comment density of a pascal program
column: pull defined column from input
comp: determine the composition of a book.
compan: composition analysis.
concat: concatenate files together
copy: copy one file to another file
count: counts the amount of sequence in a book
cybmod: specific module library for the cyber computer
da3d: diana da file to 3d graphics
dalvec: converts Rseq rsdata file to symvec format
dbbk: database to delila book conversion program
dbcat: database catalog production and sorting program.
dbfilter: filter GenBank databases to remove unwanted entries  
dbinst: extract Delila instructions from a GenBank database
dblo: look at the catalogue of a genbank/embl database
dbpull: database extraction program.
decat: break a file into 10 files
decom: remove comment starts from within a comment
delila: the librarian for sequence manipulation
delmod: delila module library
diana: dinucleotide analysis of an aligned book
difint: differences between integers
digrab: diagonal grabs of diana data
dirty: calculate probabilities for dirty DNA synthesis
dnag: graphics of dna
domod: doodle modules
doodle: pascal graphics library and preprocessor for pic under unix
dops: pascal graphics library and preprocessor for postscript
dosun: pascal graphics library and preprocessor for Sun graphics
dotmat: dot matrices of two books
dotsba: dots to database
encfrq: encoded sequence frequency analysis
encode: encodes a book of sequences into strings of integers
encsum: sum of the vectors of encoded sequences
epsclean: clean an eps file
ev: evolution of binding sites
flag: points out excessively long lines
frame: evaluator of potential reading frames
frese: frequency table to sequ
gap: gaps in aligned listing of a book
genhis: general histogram plotter
genmod: genbank access modules
genpic: convert genhis output to pic input
gentst: test random generator
helix: find helices between sequences in two books
hexbin: convert hex to binary
hist: make a histogram of aligned sequences.
histan: histogram analysis.
indana: analysis of an index
index: make an alphabetic list of oligonucleotides in a book
instal: delila instruction alignment
kenbk: make a book from a file of sequences of sequences provided by Kenn
kenin: create Delila instructions from Ken's all.gen instructions
keymat: keyed-matrices for helices between two books
lenin: convert a list of lengths into Delila instructions
lig: ligation theory
linreg: linear regression
lister: list the sequences of pieces in a book with translation
ll: line lengths
lochas: look at characters in a file
log: convert columns of data to log
loocat: look at a catalogue
makebk: make a book from a file of sequences.
makedate: make a date file
makelogo: make a graphical `sequence logo' for aligned sequences
makessbdate: make a date file from a Sample_Sheet.bin file
makman: make manual entries from a source code
makemod: create a set of empty modules from a list of names
maknam: make manual entry names
malign: optimal alignment of a book, based on minimum uncertainty
markov: markov chain generation of a dna sequence from composition.
matmod: mathematics modules
matrix: dot matrices for helices between two books
merge: compare two files and merge them
mnomial: produce the multinomial distribution for base probabilities
modin: generate modularized delila instructions for absolute sites
modin.use: more information on using the modin program
modlen: determine module lengths
module: module replacement program
mstrip: remove control m's from a file
nocom: remove comments
normal: generate normally distributed random numbers
notex: remove tex and latex constructs
nulldate:  modules to neutralize the date-time functions
number: add line numbers to a file
odti: munch od and time plates together for xyplo
palinf: find palindromes, based on information theory
parse: breaks a book into its components
patana: pattern analysis
patlrn: pattern learning
patlst:  lister of patlrn output.
patser: pattern searcher
patval: pattern evaluations of aligned sequences
pbreak: breaks a file into pages at a certain trigger phrase
pcs: partial chi squared
pemowe: peptide molecular weights
prgmod: programming modules for the delila system
quoteline: add quote marks to the beginning of every line in a file
rara: rank-rank reformulation of a data set
rawbk: make a raw sequence into a book
ref2bib: refer to bibtex converter
refer: print the references in the pieces of a book
reform: raw sequences reformatted
rembla: remove blanks from ends of lines in a file
rep: records repeats between sequences in two books
repro:  make multiple copies of a file
rf: calculate Rfrequency
ri: Rindividual is calculated for every site in the aligned book
riden: ring density graph
rila: reformat the ribl table into latex format
ring: z space ring
rndseq: generate random dna sequences
rseq: rsequence calculated from encoded sequences
rsgra: rsequence graph
rsim: Rsequence simulation
same: counts the number of lines that are identical in two files
scan: scan a book with a wmatrix and generate a vector
search: search a book for strings
sepa: separates delila instruction sets
shell: basic outline for a program
shift: copy one file to another file, with a blank in front of each line
short: find locations of short lines in a file
shortline: make short lines out of long lines
show: show modules in a module library
shrink: reduce size of postscript graphics
sites: analyse sites from randomized sequence data base
siva: site information variance
sortbibtex: sort a bibtex database
sorth: sort helix list
spec: analyse two spectra from the camspec
sphere: plot density of shannon spheres
split: split a wide file into printable pages
sqz: squeeze the input file to fit into fewer characters per line
ssbread: read a sample sheet from the ABI sequencer
stirling: test of Stirling's formula
sumfile: sum of file sizes
tipper: copy a file to the output file with special symbols at end
titer: analyse titertek optical density data
tkod: read od values from tk data
tod: to database format for sites program
todawg: change a book into dawg format
tstrnd: test random generator
undel: remove references to delman in modules
unixmod: specific module library for the unix operating system
unshi: remove first column of characters from a file
unsqz: unsqueeze the input file
untex: remove tex and latex constructs
untitle: remove titles from bbl file
unverb: remove verbatim sections from a latex file
vaxmod: specific module library for the vax computer
ver: look at the version of a program
verbop: increment the version number of a program
vernum: print the version number of a program
versave: save the file under the version number
vfilt: vector filter
whatch: what characters are in a file?
winfo: window information curve
wl: wrap lines in a file
woco: word counting program
worcha: word changing program
wordlist: lists words in a file
ww: word wrap 
xycor: correlate two xyin files from the ri program
xyplo: plot x, y data
zipf: Monte Carlo simulation for Peter Shenkin's problem
(* end module describe.program-list *)