digest.doc                                         update 10/29/90

                              DIGEST

I. Function-

DIGEST reads in a file which lists restriction sites for one or more
enzymes in a given fragment. The user can ask DIGEST to calculate the
resultant fragments from multiple digests. Either partial or complete
digests may be calculated.

II. Program Flow

Below is the screen output of a typical interactive session with
DIGEST. Program output and user responses are listed as they would
actually appear on the screen. Comments, which are listed here for
explanatory purposes but would not appear in the program, are
enclosed in the symbols (* *).

(* program begins *)

DIGEST                                             VERSION 5/15/90

Enter input filename:  B:PUC18.RES       (* IBM-PC DOS protocol *)
Enter output filename: PRN               (* IBM-PC DOS protocol *)

_____________________________________________________________________
DIGEST                        MAIN MENU
_____________________________________________________________________
Input file:   B:PUC18.RES
Output file:  PRN
_____________________________________________________________________
     1) Read in a new sequence
     2) Open a new output file
     3) Generate digests (output to screen)
     4) Generate digests (output to file)
_____________________________________________________________________
Type the number of your choice  (0 to quit program) 3

(* MAIN LOOP *)

The names of enzymes with known sites in pUC18 will be displayed
a screenful at a time. You will be asked to specify enzymes
one at a time to include in this digest.
There are 10 enzymes listed.
Press RETURN to begin.

(* User presses RETURN and enzyme names appear on screen. User types
   numbers of enzymes one at a time and 0 when done.
   A "+" appears next to enzymes that have been added to the digest. *)

  1) Ava2      2) BamH1     3) Dra1      4) EcoR1     5) Hind3
  6) Hinf1     7) Kpn1      8) Pst1      9) Pvu2     10) Taq1
Type number of an enzyme or 0 for more enzymes: 10

  1) Ava2      2) BamH1     3) Dra1      4) EcoR1     5) Hind3
  6) Hinf1     7) Kpn1      8) Pst1      9) Pvu2     10)+Taq1
Type number of an enzyme or 0 for more enzymes: 9

  1) Ava2      2) BamH1     3) Dra1      4) EcoR1     5) Hind3
  6) Hinf1     7) Kpn1      8) Pst1      9)+Pvu2     10)+Taq1
Type number of an enzyme or 0 for more enzymes: 0

Type C for complete digests, P for partial: C
Type D to generate a digest, Q to quit

(* The user may repeat the cycle and print new digests *)
(* by typing D. Typing Q ends the program.             *)
(* Below is an example of what output might look like: *)

DIGEST                                             Version 5/15/90
pUC18   Configuration: CIRCULAR   Length: 2686 bp
                          # of
                          Sites    Frags      Begin         End

Taq1        TCGA            4
Pvu2        CAGCTG          2
                                    1444       907Taq1      2350Taq1
                                     644      2351Taq1       308Pvu2
                                     276       631Pvu2       906Taq1
                                     182       449Taq1       630Pvu2
                                     110       309Pvu2       418Taq1
                                      30       419Taq1       448Taq1

EcoR1       GAATTC          1
Pvu2        CAGCTG          2
                                    2686       309Pvu2       308Pvu2
                                    2686       451EcoR1      450EcoR1
                                    2686       631Pvu2       630Pvu2
                                    2544       451EcoR1      308Pvu2
                                    2506       631Pvu2       450EcoR1
                                    2364       631Pvu2       308Pvu2
                                     322       309Pvu2       630Pvu2
                                     180       451EcoR1      630Pvu2
                                     142       309Pvu2       450EcoR1

Kpn1        GGTACC          1
Pst1        CTGCAG          1
Hind3       AAGCTT          1
                                    2643       443Kpn1       399Hind3
                                      27       416Pst1       442Kpn1
                                      16       400Hind3      415Pst1

III. What the Output Means

The column "# of Sites" tells how many sites are found for each
enzyme. See Usage Note 2 in section V for the equations predicting
the numbers of fragments in partial or complete digests. The column
"Frags" lists the fragments produced by digestion with the given
enzymes in descending order of size, as they would appear on a gel.
The columns "Begin" and "End" list the position of the cut and the
cutting enzyme at the 5' and 3' ends, respectively, of each fragment
in "Frags".

IV. Input file

Generally, DIGEST uses output from INTREST or BACHREST directly as
input. However, the user can also use DIGEST to predict restriction
fragments for molecules whose DNA sequence is not known.
If one knows the restriction sites for several enzymes, it is possible
to create a datafile in the same format as the INTREST/BACHREST
output. DIGEST does not allow much deviation from the INTREST format,
but some parts may be omitted, as discussed below.

Here is a general formula for the input file. <Items enclosed in angle
brackets> represent information that must be supplied by the user.
[Items enclosed in square brackets] are optional.

<seq.name>   Configuration: <CIRCULAR or LINEAR>   Length: <number> [bp]
<title line, may be blank>
<title line, may be blank>
<enz.name>  <recognition seq.>  <number> [([<number>])]  <num. of sites>
                                                         <site1>
                                                         <site2>
                                                         ......
                                                         <siteN>
<blank line>

In constructing such a file, the following rules apply:

1. The first line is a title line and is ignored.

2. The sequence name may be up to 20 non-blank characters.

3. All enzymes must include a name, a recognition sequence, a cutting
   site, and the number of sites found. These data items must all be
   on the same line, and must be separated by one or more blanks.

4. Restriction enzyme names must be ten or fewer characters. Blanks
   are not permitted.

5. If the recognition sequence is asymmetric (i.e., the inverse
   complement is not the same as the original site), a second cutting
   site must be included after the first. This number must be
   enclosed in parentheses, as in INTREST/BACHREST output. (Empty
   parentheses are permitted if you're too lazy to include a number.)

6. The positions of the sites found are listed below the enzyme, one
   site per line. The information in the columns "Frags", "Begin",
   and "End" may be omitted entirely.

7. A blank line must separate each enzyme listing.

V. Usage Notes

1. As many enzymes as you wish may be included in a given digest. In
   practice, this is usually not more than three.

2. The numbers of expected fragments in complete or partial digests
   of circular or linear molecules with n cutting sites are defined
   below:

                      circular                        linear
           -------------------------------------------------------------
          |                               |                             |
complete  |     f   (n) = n               |   f   (n) = n + 1           |
          |      c,c                      |    c,l                      |
          |                               |                             |
          |-------------------------------+-----------------------------|
          |                2              |                  2          |
partial   |     f   (n) = n               |             (n+1)  + (n+1)  |
          |      p,c                      |   f   (n) = --------------  |
          |                               |    p,l            2         |
           -------------------------------------------------------------
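
   The formulas in note 2 can be checked against the sample output in
   section II. The Python sketch below is an illustration only and is
   not DIGEST's own code. The function names expected_fragments and
   fragment_sizes are hypothetical, and cut positions are assumed to
   be the values shown in the "End" column (the last base before each
   cut). It also assumes at least one cutting site and, for linear
   molecules, that every cut lies strictly inside the molecule.

```python
from itertools import combinations


def expected_fragments(n, circular, partial):
    """Fragment count from the table in note 2 (n = total cutting sites)."""
    if circular:
        return n * n if partial else n
    return ((n + 1) ** 2 + (n + 1)) // 2 if partial else n + 1


def fragment_sizes(cuts, length, circular, partial):
    """Enumerate fragment sizes in descending order.

    cuts   : assumed convention, position of the last base before each cut
             (the "End" column of the sample output)
    length : length of the molecule in bp
    """
    cuts = sorted(set(cuts))
    if circular:
        if partial:
            # Any cut may start a fragment and any cut may end it; the same
            # cut used for both gives the full-length linearized molecule.
            pairs = [(a, b) for a in cuts for b in cuts]
        else:
            # Each fragment runs from one cut to the next one around the circle.
            pairs = [(cuts[i - 1], cuts[i]) for i in range(len(cuts))]
        # Distance travelling clockwise from a to b; 0 means a full circle.
        sizes = [(b - a) % length or length for a, b in pairs]
    else:
        # The two ends of a linear molecule act as fixed boundaries.
        bounds = [0] + cuts + [length]
        if partial:
            pairs = combinations(bounds, 2)
        else:
            pairs = zip(bounds, bounds[1:])
        sizes = [b - a for a, b in pairs]
    return sorted(sizes, reverse=True)


# Taq1 (4 sites) + Pvu2 (2 sites) on circular pUC18, complete digest.
taq1_pvu2 = [418, 448, 906, 2350, 308, 630]
sizes = fragment_sizes(taq1_pvu2, 2686, circular=True, partial=False)
print(sizes)                      # [1444, 644, 276, 182, 110, 30]
print(len(sizes) == expected_fragments(6, circular=True, partial=False))
```

   For the three-site EcoR1/Pvu2 example, the partial circular
   enumeration gives 9 = 3 x 3 fragments, including one full-length
   linear molecule per site, which matches the nine fragments listed
   in the sample output.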