OSP: OLIGO SELECTION PROGRAM, User Documentation

Program Author: LaDeana Hillier

INTRODUCTION

OSP is a computer program developed to aid in selecting oligonucleotide primers for DNA sequencing and for the polymerase chain reaction. OSP allows the user to specify (or use default) constraints for primer and amplified product lengths, %G+C contents, and (absolute or relative) melting temperatures; for primer 3' nucleotides; and for the maximum allowable primer-self, primer-primer, and primer-product annealing propensities. Candidate primer sequences are screened against a user-supplied data set of other sequences (e.g. repetitive element or vector sequences) to help minimize the possibility of non-specific priming. Primers meeting all constraints are ranked and displayed in order of increasing overall ``score'', which is a user-definable weighted sum of the above parameter values.

OSP is currently used routinely both for sequencing primer selection in the St. Louis/Cambridge nematode sequencing project and to generate PCR primers for the YAC/STS mapping project in the St. Louis Human Genome Center, with a high success rate in both projects.

In comparison to other primer selection programs we have examined, OSP is unusual in the range of user-settable parameters employed (for example, annealing propensity involving the internal primer sequence can be scored differently from annealing propensity at the 3' end), in allowing pre-screening against other sequences, and in the flexibility allowed the user in setting constraints and ranking candidate primers. This flexibility allows the user's selection criteria to evolve in the light of laboratory experience.

The program is available at no cost from the authors in two versions: one produces text-only output, and the other has an interactive X windows graphic interface.
The interactive X windows version of the program also provides the capability to call up trace and sequence data (using the ted trace editor) from either the ABI 373A or the Pharmacia ALF fluorescent sequencing machines.

INVOCATION

*****Text Version

OSP can be run from the command line by typing either:

    osp
OR
    osp your_constraints_file_filename

Using the second option, any constraints defined in the user's constraints file will override the program's internal default constraints (see SETTING UP A CONSTRAINTS FILE) in choosing primers.

******X Version

The program may be invoked from the command line by typing:

    ospX

One or more of the following command line options are also acceptable:

    ospX [-seq sequence_filename] [-ver version_number (0 or 1)]
         [-def default_constraints_filename] [-out output_filename]
         [-opt option_number (0,1,2,3, or 4)]

Command Line Options:

-seq sequence_filename
    The name of the file containing the sequence from which the user wishes to obtain primers. Alternatively, this filename may be entered once the X version is up and running.

-ver version_number (0 or 1)
    Version 0 is the default version of the program and has all the primer choice options available. Version 1 is purely for choosing sequencing primers and does not offer any additional options. There are several differences between version 1 and version 0.

    (1) Starting and ending sequence indices
        Version 0 allows the user to select regions within an input sequence from which to choose primers.
        Version 1 does not allow the user to select starting and ending points for the search for primers within the sequence.

    (2) Oligo output
        Version 0 outputs to a file as many primers as the user would like, along with scoring information associated with those oligos. The default output filename is sequence_filename.oligo.
        Version 1 outputs only one user-specified oligo, along with its template and serial number, to the output file.
        The default output filename is sequence_filename.p, unless the user specified -out output_filename on the command line. Once a default output filename has been selected, that same filename will appear as the default until changed by the user. Thus, this version of the program expects the user to output only their chosen oligos, to a single file.

    (3) User-specified constraints
        Version 0 allows the user to specify product, primer and annealing constraints.
        Version 1, because it is designed solely for selecting sequencing primers, does not include the product or primer-primer related constraints. The only product constraints that are included in version 1 are PROD_LEN_MIN and PROD_LEN_MAX, the minimum and maximum product lengths. In version 1, these are interpreted as the length from the end of the sequence rather than as the product length.

-def default_constraints_filename
    The name of the file containing constraint values. Any constraints defined in the user's constraints file will override the program's internal default constraints (see SETTING UP A CONSTRAINTS FILE) in choosing primers.

-out output_filename
    For version 1 of the program, this defines the output filename to be used as the default output filename.

-opt option_number
    The option number may be selected on the command line or by selecting the OPTIONS button (clicking with the leftmost mouse button on OPTIONS) after the program is up and running.
    The following option numbers are available (the default is 1):

    PRIMER PAIRS
    (0) Search for primers in TWO FLANKING REGIONS
    (1) Search for ALL possible primer pairs in one sequence
    (2) Output SCORES for two specific primers which you supply
    SINGLE PRIMERS
    (3) Search for a SINGLE primer in one sequence
    (4) Output SCORES for a single primer which you supply

EXECUTION

*****Text Version

After invoking osp, the user is prompted to choose a program option:

    Select Program Option:
    PRIMER PAIRS
    (0) Search for primers in TWO FLANKING REGIONS
    (1) Search for ALL possible primer pairs in one sequence
    (2) Output SCORES for two specific primers which you supply
    SINGLE PRIMERS
    (3) Search for a SINGLE primer in one sequence
    (4) Output SCORES for a single primer which you supply

Following selection of a program option, the user is prompted for the filename(s) and the orientation of the sequences within those files. For example, if the user selects option 0, the following sequence of events would take place:

    Would you like to:
    (1) Enter the name of ONE SEQUENCE FILE and a starting and ending
        point within that sequence for each of the two flanking regions
        or primers
    (2) Enter names of TWO SEPARATE FILES containing the two flanking
        regions or primers?

Choose option 2 and the program will prompt:

    File containing LEFT flanking region?

Provide a sequence filename and the program will prompt:

    Is this sequence in its:
    T -- Top strand orientation
    B -- Bottom strand orientation

Choose an orientation and the program will prompt:

    File containing RIGHT flanking region?
Provide a sequence filename and the program will prompt:

    Is this sequence in its:
    T -- Top strand orientation
    B -- Bottom strand orientation

***In choosing primer pairs, the user should remember that the program always looks for sense primers on the ``top strand'' and antisense primers on the ``bottom strand.'' In this example, if the user had selected top strand orientation for the right flanking region, the program would then reverse/complement the sequence and look for the antisense primer. If, however, the user had selected bottom strand orientation for the right flanking region, the program would look for the antisense primer on the input sequence file itself.

Next the program prints out the current values for each of the user-definable constraints. The user is then prompted with the following:

    Would you like to
    C -- Change specific program parameters
    R -- Read in a new file containing constraint information
    U -- Use the current constraint values

If the user answers 'c' or 'C', the program will prompt:

    ********************************************
    Please type the name of the PARAMETER and a VALUE
    EXAMPLE: PROD_LEN_MIN 17
    To SAVE current constraints to a file type:
    SAVE output_filename
    To END typing in new parameters and values type:
    *
    ********************************************

In this way, the user may change any of the current constraint values before selecting primers and save those changes to a file.

Finally, the user is prompted for starting and ending nucleotide position information.

*****X Version

Once the X Windows version is up and running, the user is presented with several options. The following command buttons are available (each of which may be "called" by pressing the leftmost mouse button while the cursor is positioned within the desired button area):

OPTIONS
    This allows the user to select a new program option. The current program option is displayed in the title bar at the top of the osp window, e.g.
    OLIGO SELECTION PROGRAM: Search for all possible primer pairs in one sequence.
    If the user selects option 0 and then chooses to input two sequence files, the user will then be prompted for the orientation of the second sequence file.

----->
    This button allows the user to select the orientation of their sequence file. The label will be changed to <----- when the user clicks on this button, thus indicating that the program should search for a primer on the reverse complement of the input sequence. If the user is searching for primer pairs and has asked to input two sequence files, the arrow indicates the orientation of the "sense" file and the user will be prompted under OPTIONS for the orientation of the "antisense" file.

    In choosing primer pairs, the user should remember that the program always looks for sense primers on the ``top strand'' and antisense primers on the ``bottom strand.'' If the user selects top strand orientation for the right flanking region, the program will reverse/complement the sequence and look for the antisense primer. If, however, the user selects bottom strand orientation for the right flanking region, the program will look for the antisense primer on the input sequence file itself.

DS
    DS stands for double stranded and pertains to the calculation of annealing scores. When clicking on the DS button, the label will toggle between DS and SS. If the user chooses DS, the program will compare any primer it finds to both the top and bottom strands of the sequence when calculating the scores for annealing propensity. If the user chooses SS, the program will compare any primer(s) it finds to only the "sense" strand of the sequence.

ANALYZE
    This button starts the analysis of the given sequence, looking for oligos. After the analysis is completed, the primers are displayed in the large window at the bottom of the osp display.
PARAMS
    This button allows the user to change any of the oligo selection constraints by providing a new window with all of the current constraints displayed. There are four options available within this window:

    CONFIRM: After making changes to any of the constraints (by placing the cursor in the desired window and typing), pressing CONFIRM exits this window and provides the newly specified values as the current defaults.

    CANCEL: If the user exits this window using the CANCEL button, changes made to the constraint values are not saved.

    INPUT CONSTRAINTS FROM FILE: If a filename is typed into the text window next to the SAVE CONSTRAINTS TO FILE: prompt, then clicking on INPUT CONSTRAINTS FROM FILE: results in the constraint values in the constraints file by that name (see SETTING UP A CONSTRAINTS FILE below) being used in the next oligo search. Alternatively, the user may supply a constraints file filename when starting up ospX (e.g. ospX -def your_filename).

    SAVE CONSTRAINTS TO FILE: If a filename is typed into the text window immediately to the right of this prompt, the current constraint values (both the constraints in this window and those found in the WEIGHTS window) are saved to that file.

WEIGHTS
    This button allows the user to change any of the weights used in ranking the chosen primers. The same options are available from this window as are available in the PARAMS window.

SAVE
    This button allows the user to save several primers along with their attributes and scores (Version 0 of the program) or a single oligo with its template and serial number (Version 1) to a file. A "popup" window is provided for the user to fill in the necessary information. In version 0, the default number of oligos saved is 10, and the default output filename is sequence_filename.oligo. In version 1, the default oligo saved is the last one the user clicked on; the default output filename is sequence_filename.p, the default output filename specified on the command line, or the output filename used the last time the user saved an oligo.

OUTPUT INFO
    A "popup" window will appear when clicking on this button which contains the contents of the file specified by the current default output filename. If you have not yet created an output file, a window will not be displayed. The user may do any editing in this window and save or cancel this edited output file.

OLIGO INFO
    A "popup" window will appear when clicking on this button which contains information pertaining to the most recent primer selection; the window will be empty if the user has not yet pressed the ANALYZE button. The information in this window reports how many primers were examined and rejected and the reasons for those rejections. It is especially helpful when 'No suitable products are found.'

QUIT
    This button quits the ospX application.

OTHER: ----->
    This button allows the user to select which strand of their file of other sequences will be used in calculating primer-other annealing scores. The sequence which is in the other sequence file is considered to be in its ``top'' strand form. If the OTHER button is ----->, primers will be compared with the sequences as they are written in the other sequence file. However, if the OTHER button is <-----, then the primer will only be compared to the reverse/complement of the sequence(s) in the user's other sequence file. Both strands will be searched, however, if the OTHER: DS button (see below) is left on DS. The label toggles between -----> and <-----.

OTHER: DS
    This button toggles between DS (double stranded) and SS (single stranded).
    When DS is the current mode, each of the primers selected by the program will be aligned with both strands of the sequence(s) in the other sequence file when calculating primer-other annealing scores. However, if the current mode is SS, the primers will be compared to the strand indicated in the neighboring OTHER: -----> button.

TYPE/PASTE IN SEQUENCE
    After clicking on this button, a "popup" window will appear which allows the user to type or paste in their desired sequence, rather than having it in a file. (To paste, use the left mouse button to "paint" over the desired sequence, then click once on the middle mouse button when the cursor is positioned in the sequence window.) The number of sequence windows that open depends on the option number the user has chosen. If two sequences are entered, the first entered sequence is assumed to be in the form indicated by the ----->/<----- button; the second sequence is assumed to be in its "top" strand form.

SEQUENCE INFORMATION WINDOW
    The sequence information window is located directly below the ANALYZE, strand direction, ... buttons. The prompts in this window will change based on the program option the user has chosen. For example, if the user has chosen option 0 (Search for primers in TWO FLANKING REGIONS) and has chosen to enter two separate sequences, the following prompts will appear:

        Sense file: Other File: Antisense file:

    If the user has chosen option 0 and chosen to enter only 1 sequence, the following prompts will appear:

        Sequence file: Start:1 End:0 Other File: Start:1 End:0

    In this way, the user knows exactly what information must be entered for the option they have selected.

    ***Note that when 0 is left as the ending point for the sequence, the program looks all the way to the end of the sequence file when choosing primers.

RESULTS WINDOW
    This is the window immediately below the TYPE/PASTE IN SEQUENCE button. This window will contain informative messages pertaining to how the analysis is proceeding (e.g.
    ....getting products....) Error messages may also appear in this window. Its main function, however, is the display of primers and their attributes.

OLIGO GRAPHICS WINDOW
    After a sequence file and starting and ending points have been entered, and the ANALYZE button has been pressed, a graphic display of your sequence and the oligos chosen will appear in this window. Clicking with the leftmost mouse button on any of these primers will result in information about that primer being displayed in the RESULTS WINDOW. If the sequence you have entered has a corresponding sequencing machine trace file (and has been edited using the ted trace editor), clicking above the bar representing your sequence will result in a ted window being opened, centered on the point at which you clicked. If this option is available to you, the RESULTS window will provide the following prompt:

        Click with the left mouse button ON any primer to have score information displayed here
        Click with the left mouse button ABOVE the sequence bar at the desired indices to open a ted window.

PROGRAM OUTPUT -- Primer Selection Information

After the program has found a set of primer pairs, it provides some information about the primer selection process. In the X windows version of the program this information can be seen when clicking on the OLIGO INFO button or when (in version 0 of the program) looking at the primer output file. In the text version of the program, the information is printed directly to the screen and can be found in the primer output file as well.
Here is a sample of that information:

    PRIMERS (Number of SS-type sequences found was 23)
    Sense:
        Total considered:                   67
        Rejected based on ambiguity (N):     0
        Rejected based on gc_content:       37
        Rejected based on primer Tm:         0
        Rejected based on self-annealing:    3
        Rejected based on other-annealing:   0
        Number accepted:                    27
    Antisense:
        Total considered:                   45
        Rejected based on ambiguity (N):     0
        Rejected based on gc_content:       21
        Rejected based on primer Tm:         0
        Rejected based on self-annealing:   20
        Rejected based on other-annealing:   0
        Number accepted:                     4
    PRODUCTS
        Total considered:                           108
        Rejected based on identical endpts:          36
        Rejected based on length:                     0
        Rejected based on primer-primer annealing:    4
        Rejected based on primer-product annealing:   0
        Rejected based on melting temperature:        0
        Rejected based on gc_content:                 0
        Rejected based on difference in Tm:          52
        Number accepted:                             16

This type of information is especially helpful when no suitable products are found or when too many products are found. By examining this information, the user can tell which constraints need to be "loosened" or "tightened" before rerunning the program.

PROGRAM OUTPUT -- Primer Output Files

The user may output their primers to a file by using the SAVE button in the X windows version or by providing a filename when prompted in the text version. The output file contains the sequence filename, the constraints used during that run of the program, the primer selection information described above, and the top 'n' (where n is specified by the user) primers selected.

For primer pairs the following output is provided:

    Primer Pair # 1
                                               5' end  3' end  length  G+C(%)  Tm
    OLIGO1: ...aaCGGATTTGGTCGTATTGGGcgc...     28      46      19      52      57.6
    OLIGO2: ...caTGTAGTTGAGGTCAATGAAGGGgtc...  128     107     22      45      58.2

    ANNEALING SCORES
             PRIMER-SELF  PRIMER-PRIMER  PRIMER-PRODUCT  PRIMER-OTHER
             3' Internal  3' Internal    3' Internal     3' Internal
    OLIGO 1: 4.0 8.0 12.0 16.0 14.0 22.0
    OLIGO 2: 4.0 10.0 4.0 10.0 14.0 18.0 16.0 22.0

    PRODUCT
    Length  G+C(%)  Tm    Total Score
    101     48      77.2  52.0

The lower case letters in the oligo are the neighboring nucleotides, while the upper case letters are the oligo itself. Oligo2 is displayed in its "ready to sequence" or "bottom strand" form. This is indicated by its 5' end index being greater than its 3' end index. If the user has entered two separate files, primer-product annealing scores and product scores are not provided. For a discussion of the annealing scores, see the section below entitled CALCULATION OF ANNEALING SCORES.

If the user has chosen single primers the following output is provided:

    Primer # 1 (Length= 119) -- OLIGO: ...caACGGATTTGGTCGTATTGggc...
                                            ANNEALING SCORES
                                            PRIMER-SELF   PRIMER-OTHER
    5' end  3' end  length  G+C(%)  Tm      3' Internal   3' Internal
    27      44      18      44      52.7    4.0 10.0      12.0 18.0
    Total Score: 18.0

CALCULATION OF ANNEALING SCORES

The ANNEALING SCORES are calculated by comparing all possible alignments of two sequences. If two primers are being compared, each possible alignment of the two primers will be considered when calculating their maximum score. For example, this would be one possible alignment:

    primer 1  5' __________________ 3'
    primer 2  3' __________________ 5'

This is another possible alignment:

    primer 1  5' __________________ 3'
    primer 2       3' __________________ 5'

To calculate the annealing score for one alignment, start adding the scores each time a pair of complementary nucleotides is seen and stop adding when two non-complementary nucleotides are encountered.
The following would give a score of 12 (given AT_SCORE=2 and CG_SCORE=4):

    TAAGGCTCGAT
    CTTCCTGGCTC

The scores seen in this case are a match of A on one strand and T on the other strand, which is worth the user-specified AT_SCORE (a default of 2), followed by another A-T combination (total score now is 4). The two matches of G on one strand with C on the other strand are each worth the user-specified CG_SCORE (a default of 4), bringing the total to 12. The next pair of nucleotides encountered is non-complementary, thus 12 is retained as the score for that run of nucleotides. The other contiguous run of bases, CGA vs GCT, only gives a score of 10. The score, then, is 12 (since 12 > 10) for this alignment of those two sequences. If 12 were the highest score found after looking for runs of contiguous bases for each possible alignment of these sequences, it would be retained as the total score for these sequences.

When calculating THREE PRIME annealing scores, the calculation starts by looking at the 3' end of the primer to see if that nucleotide is complementary to the nucleotide it is currently aligned with. If it is not, go on to the next alignment. If it is complementary, go on to the next nucleotide, summing the scores until coming to a non-complementary nucleotide. After encountering the first non-complementary nucleotide, go on to the next possible alignment.

When calculating INTERNAL annealing scores, the comparisons continue along the entire alignment. For example, if the first two nucleotides in a given alignment are complementary and the next one is not, the first score is the sum of the scores for those two pairs. Then continue along that same alignment looking for the next complementary nucleotide and start scoring again. Thus, the THREE PRIME annealing scores only look at the annealing propensity of the three prime end of the primer, whereas the INTERNAL annealing score looks at the entire sequence for contiguous complementary bases.
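The two calculations described above can be illustrated with a short Python sketch. It is not taken from the OSP source: the function names (pair_score, best_run, internal_score, three_prime_score) are hypothetical, ambiguity codes are simply treated as non-complementary, and the way the alignments are enumerated is an assumption. The default values of 2 and 4 are the AT_SCORE and CG_SCORE defaults quoted in the example.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def pair_score(a, b, at_score=2, cg_score=4):
    """Score for one aligned nucleotide pair; 0 if not complementary."""
    if COMPLEMENT.get(a) != b:
        return 0
    return at_score if a in "AT" else cg_score


def best_run(top, bottom, at_score=2, cg_score=4):
    """Best contiguous complementary run for ONE alignment.

    top is written 5'->3' and bottom 3'->5', compared position by position.
    """
    best = run = 0
    for a, b in zip(top, bottom):
        s = pair_score(a, b, at_score, cg_score)
        run = run + s if s else 0  # a mismatch ends the current run
        best = max(best, run)
    return best


def internal_score(primer, target, at_score=2, cg_score=4):
    """Maximum run score over every alignment of primer against target.

    Both arguments are given 5'->3'; the target is reversed so that the
    two strands are read antiparallel.
    """
    bottom = target[::-1]
    best = 0
    for shift in range(-(len(bottom) - 1), len(primer)):
        if shift >= 0:
            a, b = primer[shift:], bottom
        else:
            a, b = primer, bottom[-shift:]
        best = max(best, best_run(a, b, at_score, cg_score))
    return best


def three_prime_score(primer, target, at_score=2, cg_score=4):
    """Maximum run score counting only runs that start at the primer 3' end."""
    bottom = target[::-1]
    best = 0
    for j in range(len(bottom)):  # position aligned with the primer 3' end
        run = 0
        i, k = len(primer) - 1, j
        while i >= 0 and k >= 0:
            s = pair_score(primer[i], bottom[k], at_score, cg_score)
            if s == 0:
                break  # first mismatch: move on to the next alignment
            run += s
            i -= 1
            k -= 1
        best = max(best, run)
    return best
```

With the two example strands shown earlier, best_run("TAAGGCTCGAT", "CTTCCTGGCTC") evaluates to 12, matching the worked example. Keeping the single-alignment scan (best_run) separate from the enumeration of alignments mirrors the way the text describes the calculation.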
SETTING UP A CONSTRAINTS FILE

The format of the constraints file is as follows:

    CONSTRAINT_NAME value
    *

where the CONSTRAINT_NAME is in all capital letters and is a valid constraint name, and a corresponding value is specified. The value should be followed by a carriage return. The last line of the file must contain a * (indicating the end of the file) or it will not be processed correctly.

Placing a constraint name and value in the constraints file overrides the program's internal default constraint values. Thus, only constraints which differ from the program's internal defaults need be entered in the constraints file.

Two sample constraints files are included with this distribution. One includes all the parameters the user may wish to set. The other is much shorter and includes only those constraints that a user typically needs to change. One way to set up your own constraints file is to use any text editor to modify one of those sample files. Alternatively, the user could set up their own constraints file containing only those constraints which differ from the default constraints used by the program.

Other ways are also available. In the X Windows version of the program the user may change any constraints by clicking on the PARAMS or WEIGHTS buttons, and then save those changes to a file by using the SAVE CONSTRAINTS TO FILE button. In the text version the user, after answering C to change specific program parameters, may type in new constraints and values following the directions given above under text version program EXECUTION. After typing in all of their changes to the constraint values, they may type SAVE output_filename to save those constraints to a file.

Remember that calculation of annealing scores is dependent on the DS/SS buttons, and for PRIMER-OTHER can be dependent on the OTHER: ----->/<----- button.
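As an aside on the constraints file format described in this section, the following Python sketch shows one way such a file could be read and layered over a set of defaults. It is an illustration only: the function names parse_constraints and merge_with_defaults are hypothetical, values are kept as strings because some constraints take words such as avg or top, and raising an error for a missing * line is an assumption (the text says only that such a file will not be processed correctly).

```python
def parse_constraints(text):
    """Parse 'CONSTRAINT_NAME value' lines up to the terminating '*' line."""
    constraints = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines (an assumption)
        if line == "*":
            return constraints  # '*' marks the end of the file
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isupper():
            raise ValueError("expected 'CONSTRAINT_NAME value', got: %r" % line)
        name, value = parts
        constraints[name] = value.strip()
    raise ValueError("constraints file must end with a '*' line")


def merge_with_defaults(defaults, user):
    """User-supplied constraints override the internal defaults."""
    merged = dict(defaults)
    merged.update(user)
    return merged
```

For example, parse_constraints("PROD_LEN_MIN 17\n*\n") returns {"PROD_LEN_MIN": "17"}, and merging that over a defaults dictionary changes only PROD_LEN_MIN.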
The following annealing scores may be calculated:

    PRIMER-SELF: the primer to itself
    PRIMER-PRIMER: the primer to the other primer; the scores reported reflect the maximum of primer1 versus primer2 and primer2 versus primer1.
    PRIMER-PRODUCT: the primer to the product (the region between the two primers)
    PRIMER-OTHER: this score is the maximum of the following:
        1. the primer versus the rest of the input sequence
        2. the primer versus the sequences in the other sequence file.

CONSTRAINTS AND THEIR VALUES

When any constraint has its value set to 0, the program assumes the user does not wish to consider that constraint. Thus, when the maximum product length is set to 0, the program will consider products of any length greater than the minimum product length. In this way, the user may choose to ignore certain parameters. The only constraints which cannot take on a value of 0 are minimum and maximum primer length.

PRODUCT CONSTRAINTS

PROD_LEN_MIN & PROD_LEN_MAX - minimum and maximum product length (includes the length(s) of the primer(s)) for choosing primer pairs. When selecting a single primer, it is considered as the minimum or maximum distance from the end of the input sequence (including the length of the primer).

PROD_GC_MIN & PROD_GC_MAX - lower and upper boundaries on %G+C content of the product, including the primers.

PROD_TM_MIN & PROD_TM_MAX - lower and upper boundaries on the Tm of the product, including the primers. Tm is calculated using the following formula (Bolton and McCarthy, 1962):

    Tm = 62.3 + 0.41*(%G+C) - 500/N

where N is the length of the sequence.

PRIMER CONSTRAINTS

PRIM_LEN_MIN & PRIM_LEN_MAX - minimum and maximum primer lengths considered. Note that if you have set the program parameters to look at all the primers of length 18 to 22, the program will consider each possible primer of 18 to 22 bases, even those sharing an identical 3' end. However, when products are selected the following rules are applied. Given the following set of primers:

    1. Oligo1: 29 46  Oligo2: 127 108
    2. Oligo1: 28 46  Oligo2: 127 108
    3. Oligo1: 27 46  Oligo2: 127 108
    4. Oligo1: 26 46  Oligo2: 127 108
    5. Oligo1: 25 46  Oligo2: 127 108

only the 1st primer pair would be displayed -- i.e., the shortest sense primer is kept. The same would hold for 5 different length antisense primers each with the same sense primer. Similarly, in the case where the program is only choosing one primer, if five primers (25 46, 26 46, 27 46, 28 46, and 29 46) all passed the primer criteria, only the shortest primer would be kept for display in the primer output list.

PRIM_GC_MIN & PRIM_GC_MAX - lower and upper boundaries on %G+C content of the primers.

PRIM_TM_MIN & PRIM_TM_MAX - lower and upper boundaries on the Tm of the primers. Tm is calculated using the following formula:

    Tm = 62.3 + 0.41*(%G+C) - 500/N

where N is the length of the sequence.

PRIM_TM_DIFF - the difference in melting temperature between two primers. In general, given that two primers have an identical GC content, a difference in melting temperature of 2 to 3 degrees corresponds to a difference in length of 2 bp.

PRIM_NUCS - the 3' nucleotide(s) of the primers. Only primers with the specified 3' ends are considered.

ANNEALING SCORE CONSTRAINTS

Annealing scores are determined by the following two scores:

AT_SCORE - score for an A<->T pair; this value is used when calculating the maximum annealing score for two aligned sequences (primer-primer, primer-self, primer-product or primer-other). When, in a given alignment, an A on one strand aligns with a T on the other strand, AT_SCORE is the annealing score assigned to that pair.

CG_SCORE - score for a C<->G pair

When ambiguous nucleotides are encountered, annealing scores can be calculated in one of two ways, determined by the setting of the WT_AMBIG constraint.

WT_AMBIG - possible values are avg or full.

    "full" : example: C<->R score = CG_SCORE
    Setting the WT_AMBIG constraint to "full" could be referred to as a "worst-case" calculation, i.e., the calculation is made assuming the ambiguous nucleotide is the nucleotide which would give the highest possible annealing score to that nucleotide pair.

    "avg" : example: C<->R score = 1/2 * CG_SCORE
    Setting the WT_AMBIG constraint to "avg" is a less severe method for calculating annealing scores. In this method, the program assumes that an ambiguity code representing, for example, two nucleotides has a 50% chance of being one of those nucleotides and a 50% chance of being the other. Thus the score calculated is the average of those two possible scores.

Each annealing score must meet the following annealing score cutoffs:

    PRIM_SELF_I_ANN  - primer-self internal annealing
    PRIM_SELF_3_ANN  - primer-self three prime annealing
    PRIM_PRIM_I_ANN  - primer-primer internal annealing
    PRIM_PRIM_3_ANN  - primer-primer three prime annealing
    PRIM_PROD_I_ANN  - primer-product internal annealing
    PRIM_PROD_3_ANN  - primer-product three prime annealing
    PRIM_OTHER_I_ANN - primer-other internal annealing
    PRIM_OTHER_3_ANN - primer-other three prime annealing

PRIMER RANKING CONSTRAINTS

The following constraints determine how the primers are ranked. The internal default constraint values of osp simulate those that were hard-wired into the original version of this program. The weights are all floating point and are multiplied by the primer or product characteristics with which they are associated. For example, WT_PROD_LEN is multiplied by the product length, WT_PROD_GC is multiplied by the product %G+C content, etc., and the results are summed to provide a total score for a given primer or primer pair. The primers are then sorted in ascending order; the primers with the smallest scores are considered to be the "best" primers.
    WT_PROD_LEN         - product length
    WT_PROD_GC          - product %G+C content
    WT_PROD_TM          - product melting temperature
    WT_PRIM_1_GC        - primer 1 %G+C content
    WT_PRIM_2_GC        - primer 2 %G+C content
    WT_PRIM_1_LEN       - primer 1 length
    WT_PRIM_2_LEN       - primer 2 length
    WT_PRIM_1_TM        - primer 1 melting temperature
    WT_PRIM_2_TM        - primer 2 melting temperature
    WT_PRIM_TM_DIFF     - difference in primer melting temperature
    WT_PRIM_SELF_I_ANN  - primer-self internal annealing
    WT_PRIM_SELF_3_ANN  - primer-self three prime annealing
    WT_PRIM_PRIM_I_ANN  - primer-primer internal annealing
    WT_PRIM_PRIM_3_ANN  - primer-primer three prime annealing
    WT_PRIM_PROD_I_ANN  - primer-product internal annealing
    WT_PRIM_PROD_3_ANN  - primer-product three prime annealing
    WT_PRIM_OTHER_I_ANN - primer-other internal annealing
    WT_PRIM_OTHER_3_ANN - primer-other three prime annealing

NUM_STRANDS - possible values are 1 or 2, corresponding to single or double stranded; pertains to the calculation of annealing scores. If the user chooses 2 (double stranded), the program will compare any primer it finds to both the top and bottom strands of the sequence when calculating the scores for annealing propensity. If the user chooses 1 (single stranded), the program will compare primer(s) to only the "sense" strand of the sequence.

OTHER SEQUENCE FILE INFORMATION

OTHER_SEQ_NAME - name of the other sequence file (see OTHER SEQUENCE FILE FORMAT below)

OTHER_SEQ_NUM_STRAND - possible values are 1 or 2, corresponding to single or double stranded; pertains to the calculation of primer-other annealing scores. If 2 (double stranded) is selected, each of the primers selected by the program will be aligned with both strands of the sequence(s) in the other sequence file when calculating primer-other annealing scores. However, if the value is 1 (single stranded), the primers will be compared to the strand indicated in the variable OTHER_SEQ_STRAND.
OTHER_SEQ_STRAND - possible values are top or bottom; allows the user to select which strand of their file of other sequences will be used in calculating primer-other annealing scores. The sequence which is in the other sequence file is considered to be in its ``top'' strand form. If the value is top, primers will be compared with the sequences as they are written in the other sequence file. However, if the value is bottom, then the primer will only be compared to the reverse/complement of the sequence(s) in the user's other sequence file. Both strands will be searched, however, if OTHER_SEQ_NUM_STRAND is 2.

SEQUENCE FILE FORMAT

The input sequence files may be in one of several different formats. The program accepts both IUB/GCG and Staden/Sanger ambiguity codes. The sequence may contain blanks, tabs and new lines. Also, numbers in the 1st column of each sequence line are discarded as non-sequence data. Comments are only allowed on the first line of the sequence file (before the first return or newline character), and must be preceded by a '>'. Files whose first character is a semicolon, ';', are considered to be output from the ted trace editor (an editor for data from fluorescence-based sequencing machines) and are read as such.

OTHER SEQUENCE FILE FORMAT

The other sequence file has the same format as the sequence files described above, except that multiple sequences can be entered within the one file. The sequences are delineated by '>' marks. Therefore, a sample other sequence file may look like the following:

    >ras1
    AGATCGGCAGCTAGC
    >arq1
    AGACTACAAGATCTGCGATCGATCGTAGCANATAGTGCTGATGTARAGTGTW

The format is thus:

    >comment
    as many lines of sequence as you wish, which can be preceded by a number like
    100 AGATCGACTAGCATCGATCAGCTACGATCGAGTCATGCTAGC
    >comment
    AGAACGTCGATCGATCAGCTAGCTACGTAGCTAGCTAGCTATCGATGCTAGCTGAT

so that the > lines indicate the start of a new sequence.
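The other sequence file format described above can be read with a few lines of Python. This is a sketch rather than the program's own reader: the function name parse_other_sequences is hypothetical, text appearing before the first '>' line is ignored by assumption, and ted trace editor files (those beginning with ';') are not handled.

```python
def parse_other_sequences(text):
    """Return a list of (comment, sequence) pairs from an other sequence file.

    A line starting with '>' opens a new sequence; blanks and tabs inside
    sequence lines are dropped, as is a leading number on a sequence line.
    """
    records = []
    for line in text.splitlines():
        if line.startswith(">"):
            records.append((line[1:].strip(), []))
        elif records:
            tokens = line.split()
            if tokens and tokens[0].isdigit():
                tokens = tokens[1:]  # discard the number in the 1st column
            records[-1][1].append("".join(tokens))
    return [(name, "".join(chunks)) for name, chunks in records]
```

Applied to a file containing the two lines ">ras1" and "AGATCGGCAGCTAGC", the function returns [("ras1", "AGATCGGCAGCTAGC")]. Ambiguity codes such as N, R and W are passed through unchanged, since they are ordinary sequence characters at this stage.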
PROGRAM DISTRIBUTION AND INFORMATION

Additional information can be found in the following:

    Hillier, L. and Green, P. (1991) OSP: An Oligo Selection Program, In preparation.

Please direct all comments and suggestions to the author, LaDeana Hillier:

    email: lfw@elegans.wustl.edu
    Phone: 314-362-7667
    regular mail:
        Washington University Medical School
        Department of Genetics
        Box 8232
        4566 Scott
        St. Louis, MO 63110 USA