Assignment 3 - March 12, 2018

Verifying the structure of DNA constructs using restriction digests

This assignment is worth 5% of the course grade.

Due by 11:59 pm, Wednesday March 21

Rationale: It is not enough to go through the steps of creating DNA constructs using standard cloning procedures. Each construct must be verified to ensure that the resultant DNA is actually what we intended it to be. While in principle this could be done by DNA sequencing, the reality is it usually takes days or weeks to get results back from sequencing services.

Restriction digests give us a way to verify the structure of our constructs in a single experiment. The first step is to create a file containing the DNA sequence that we should get if the experiment worked. The hypothesis we want to test is that the construct is correct. Computer programs can predict the bands that should be produced by digestion with various restriction enzymes. If the plasmid DNA gives the predicted bands, the hypothesis is supported, and the construct is probably correct. If we get bands different from those predicted, the hypothesis is rejected, and we know that the construct is different from what we intended.

Create a directory containing the test sequences

Inside your PLNT2530 directory, create a sub-directory called as3, to hold materials associated with Assignment 3. Save all files in your PLNT2530/as3 directory.

Copy the following files from your Ugene/Intro directory to your as3 directory:
To make it easier to write your report, you can download template files in LibreOffice or MS-Word format. These documents contain dummy data to be replaced with your own data. For reference, a PDF file is also available.

1. (6 points) Demonstrate the effect of sequence topology on restriction digests.

It is critical to realize that the topology of a sequence, i.e., whether it is circular or linear, drastically affects the fragments predicted in a restriction digest. If the topology is set incorrectly for a sequence, the results of a restriction site search will be wrong. In this section, we will save the pBS_SK-GUS construct in FASTA format. FASTA is a commonly used sequence format that contains only the sequence and a short definition line describing it. There is no way to specify the topology of a sequence in a FASTA file. Consequently, most programs will consider sequences in this format to be linear.
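How topology enters the fragment calculation can be sketched in a few lines of Python. This is only an illustration, not BACHREST's actual implementation: the function name digest is hypothetical, and it assumes that each site position is the first base of the fragment that follows the cut, which is how the Sites and Begin columns line up in the AcoI example under "Use fixed fonts for program output". The site positions and the sequence length of 5349 are taken from that example.

```python
def digest(sites, length, circular):
    """Return (size, begin, end) tuples for each fragment, largest first.

    sites    : 1-based positions of the first base after each cut (assumed)
    length   : total length of the sequence in bp
    circular : True if position `length` is joined back to position 1
    """
    cuts = sorted(sites)
    if not cuts:
        # Simplification: an uncut molecule is reported as one fragment.
        return [(length, 1, length)]
    frags = []
    # Fragments lying between consecutive cut sites.
    for begin, nxt in zip(cuts, cuts[1:]):
        frags.append((nxt - begin, begin, nxt - 1))
    head = cuts[0] - 1            # bases before the first cut
    tail = length - cuts[-1] + 1  # bases from the last cut to the end
    if circular:
        # The ends are joined, so the tail and the head are one fragment.
        frags.append((head + tail, cuts[-1], cuts[0] - 1))
    else:
        if head > 0:
            frags.append((head, 1, cuts[0] - 1))
        frags.append((tail, cuts[-1], length))
    return sorted(frags, reverse=True)


acoi_sites = [1289, 1556, 2533]
linear = digest(acoi_sites, 5349, circular=False)
circular = digest(acoi_sites, 5349, circular=True)
print("linear  :", [size for size, _, _ in linear])
print("circular:", [size for size, _, _ in circular])
```

With circular=False the sizes reproduce the Frags column of the AcoI example. Compare the two printed lists with your own BACHREST output when answering the questions below.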

The goal is to compare restriction fragments produced by the circular and the linearized sequences for pBS_SK-GUS. Since we already have pBS_SK-GUS in its circular form, we next need to create the linearized form.
  1. Open in bldna
  2. Select the sequence, and then choose File --> View sequences. Set the format to FASTA, and click Run. The FASTA file will appear on the screen.
  3. To make it easy to distinguish this sequence from the original, change the name of the sequence (found on line 1) to pBS_SK-GUS.fsn.
  4. Save the file to your as3 directory as pBS_SK-GUS.fsn.
  5. Open pBS_SK-GUS.fsn in your bldna window.
  6. Run BACHREST searches on both sequences, and save both output files as pBS_SK-GUS.bachrest and pBS_SK-GUS.fsn.bachrest.
In your report, answer the following questions:
a) What is different about the fragments predicted if we digest a circular sequence as if it were linear?
b) In the BACHREST output, compare the results for EcoRI and HindIII between the linear and circular sequences. Aside from what you already discussed in a), one of the sites appears to be missing. Explain the cause.

Notes on Restriction digests
1) Remember that the three rightmost columns in the BACHREST output, with the column headings Frags, Begin and End, give the sizes of fragments seen in descending order of size, as they would appear on a gel, and the beginning and end of each fragment.
2) Some enzymes recognize several possible restriction sequences. For example, SmlI recognizes 5'C^TYRAG3', where Y stands for pyrimidine (C or T) and R stands for purine (A or G). See Sequence File Formats for a complete list of ambiguity symbols for nucleotides.
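As an illustration of note 2, the sketch below expands ambiguity symbols into a regular expression and searches a sequence for matches. It is a minimal example, not the method BACHREST uses. The function names and the short test sequence are made up for this example, and only the strand as given is searched.

```python
import re

# Standard IUPAC nucleotide ambiguity codes, written as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def site_regex(recognition):
    """Translate a recognition sequence such as 'C^TYRAG' into a regex string."""
    bases = recognition.replace("^", "")  # '^' marks the cut, not a base
    return "".join(IUPAC[b] for b in bases)


def find_sites(recognition, seq):
    """Return 1-based start positions of every match in seq."""
    # The lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=" + site_regex(recognition) + ")")
    return [m.start() + 1 for m in pattern.finditer(seq.upper())]


# Made-up sequence: CTCAAG and CTTGAG match C^TYRAG, CTAAAG does not.
test_seq = "AACTCAAGGGCTTGAGTTCTAAAGCC"
print(site_regex("C^TYRAG"))
print(find_sites("C^TYRAG", test_seq))
```

The third candidate, CTAAAG, is rejected because A is not a pyrimidine, so it cannot stand in for Y.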

For both parts of this question, include in your reports examples of BACHREST output for specific enzymes that support your conclusions. Present the data in the boxes provided in the template.

2. (6 points) Create a construct using pBluescript SK(+), for comparison with the construct made using pBluescript SK(-).

a) Using the same procedures as detailed in the Introduction to Ugene tutorial, create a construct by cloning the same 3 kb 35S-GUS fragment from pBI121 between the EcoRI and HindIII sites in pBluescript SK(+). Call this file
b) For each construct, create a map in the Ugene circular Overview, and export as PNG files to pBS_SK+GUS.png and pBS_SK-GUS.png.

Import your maps as shown in the template.

3. (6 points) Use BACHREST to find restriction digests that would allow us to determine which of the two Bluescript vectors was used in the real construct.

As shown in the Bluescript map, the only difference between pBS_SK(+) and pBS_SK(-) is that the f1 origin of replication is in opposite orientations in the two vectors. This fragment is in the plus orientation in pBS_SK(+) and the minus orientation in pBS_SK(-). Because there are several Bluescript vectors with similar names, unless meticulous lab notes are kept, it may not have been recorded which of the vectors was actually used for cloning. Unfortunately, this happens more often than you might think in the lab.

Fortunately, if we create the sequences for the insert cloned into both vectors, we can find restriction enzymes that will give different fragments if the insert was cloned into pBS_SK(+), than would be seen if it was cloned into pBS_SK(-). Therefore, with a few restriction digests in the lab, we could find out which vector was actually used. 
  1. Read into bldna.
  2. Run BACHREST on this sequence to compare restriction fragments that would be generated by different enzymes with those generated from pBS_SK-GUS. From the text editor, save the BACHREST output to pBS_SK+GUS.bachrest
Comparison of the digests should reveal a number of enzymes that give easily distinguishable banding patterns if each of the real constructs was run on a gel. Of these, choose enzymes for illustrating the differences, using the following criteria:
In your report, display your results in the boxes provided, showing fragments from both constructs for comparison. You should be able to find at least three digests that easily distinguish between the two constructs.
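The comparison of two sets of digests can also be sketched in code. The example below is only an illustration under stated assumptions: the enzyme names EnzA, EnzB and EnzC and all fragment sizes are invented placeholders, not real BACHREST results, and the 100 bp threshold is a hypothetical stand-in for "easily distinguishable", not one of the criteria for this assignment.

```python
def distinguishing_enzymes(digest_a, digest_b, min_diff=100):
    """Return enzymes whose banding patterns differ between two constructs.

    digest_a, digest_b : dict mapping enzyme name -> list of fragment sizes (bp)
    min_diff           : hypothetical threshold; a band counts as distinctive
                         only if no band in the other digest is within
                         min_diff bp of it
    """
    chosen = []
    for enzyme in sorted(set(digest_a) & set(digest_b)):
        a, b = digest_a[enzyme], digest_b[enzyme]
        unique_a = [x for x in a if all(abs(x - y) >= min_diff for y in b)]
        unique_b = [y for y in b if all(abs(x - y) >= min_diff for x in a)]
        if unique_a or unique_b:
            chosen.append(enzyme)
    return chosen


# Invented placeholder data for two constructs of equal length (5900 bp).
plus_construct = {"EnzA": [4000, 1900], "EnzB": [3500, 2400], "EnzC": [5000, 900]}
minus_construct = {"EnzA": [4000, 1900], "EnzB": [4700, 1200], "EnzC": [4950, 950]}

print(distinguishing_enzymes(plus_construct, minus_construct))
```

In this made-up data only EnzB qualifies: EnzA gives identical bands in both constructs, and the EnzC bands differ by just 50 bp, which falls under the assumed threshold.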

As a second means of comparison, create restriction maps of each construct in Ugene, showing just the three restriction enzymes you have chosen, plus EcoRI and HindIII. Export these maps as pBS_SK+GUSrestmap.png and pBS_SK-GUSrestmap.png. These should be included in your report as shown in the template.

Discussion: Based on the restriction digest data and the maps of the two constructs, roughly where is the f1 region on your maps? Explain how your data lead you to that conclusion.

Note on maps
Where two or more restriction sites are very close together, the precise ordering of those sites shown in the map may not be correct, depending on how Ugene draws the map.

4. Presentation of the report (2 points)

The report should include the following:
In addition to your report, also upload to the UMLearn Dropbox any new GenBank, FASTA or BACHREST files created for this assignment. Don't bother uploading the png image files. These should be embedded in your report.

Presentation guidelines

Use fixed fonts for program output
Many programs generate output that only makes sense with a fixed font. Most fonts commonly seen in documents are proportional fonts, meaning that narrow letters such as 'i' or 'l' take up very little width, whereas wide letters such as 'O' take up more space. In fixed fonts, all letters and numbers take up exactly the same width on a line of text. For example, output from BACHREST, listing restriction cutting sites and restriction fragments for a sequence cut with the enzyme AcoI, is shown in both fixed and proportional fonts. Make sure that when output is shown on a web page, it is in a fixed font.

fixed font
                                         # of
Enzyme          Recognition Sequence     Sites     Sites   Frags   Begin     End
AcoI            Y^CCGGR                      3

                                                    1289    2817    2533    5349
                                                    1556    1288       1    1288
                                                    2533     977    1556    2532
                                                             267    1289    1555

proportional font
AcoI            Y^CCGGR                      3
                                                    1289    2817    2533    5349
                                                    1556    1288       1    1288
                                                    2533     977    1556    2532
                                                             267    1289    1555

Most fonts are proportional, e.g., Helvetica, Times.
Examples of fixed fonts include Courier, Liberation Mono and Terminal.
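To see why column output depends on a fixed font, consider how a program might lay out such a table. The sketch below is an assumption about the general technique, not BACHREST's own code: each value is padded with spaces to a fixed number of characters, so the columns line up only when every character, including the space, has the same width. The column width of 8 and the function name format_row are chosen for illustration; the numbers are the AcoI rows shown above.

```python
def format_row(values, width=8):
    """Right-align each value in a column `width` characters wide."""
    cells = ["" if v is None else str(v) for v in values]
    return "".join(cell.rjust(width) for cell in cells)


# Sites, Frags, Begin, End for AcoI; None marks the empty Sites cell.
rows = [
    (1289, 2817, 2533, 5349),
    (1556, 1288, 1, 1288),
    (2533, 977, 1556, 2532),
    (None, 267, 1289, 1555),
]

lines = [format_row(row) for row in rows]
for line in lines:
    print(line)
```

Every line has the same number of characters, but in a proportional font a space is narrower than a digit, so the right-hand edges of the columns no longer line up.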

Submitting your assignment

Your PDF report, along with associated GenBank, FASTA and BACHREST files, is due by 11:59 pm, Wed. March 21 on the PLNT2530 UMLearn dropbox site in the Bioinformatics3 folder. Files in word processing formats (.doc, .docx, .rtf, .odt) are not acceptable.

If you have questions, it may help to send me a message at