BIRCH

Tutorial: Designing PCR primers to amplify a gene from genomic DNA


March 29, 2017

Primer-BLAST publication: http://www.ncbi.nlm.nih.gov/pubmed/?term=22708584


Rationale: In many cases the quickest way to clone a gene is to amplify it by PCR from genomic DNA and clone the PCR product. This approach is especially useful when we want a complete copy of the gene, including the flanking regions, which may contain important regulatory sequences that are found only in the genomic sequence.

Goal:
To demonstrate the process of finding a gene in a genomic sequence, designing PCR primers for the gene, and retrieving the expected PCR product that would be generated using those primers.

In practice, if the goal were to clone the PCR fragment, one could either add suitable restriction sites to the 5' ends of the primer sequences before the primers are synthesized, or ligate cloning adaptors to the blunt-ended PCR fragments after PCR.
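As a minimal sketch of the first option, the shell function below prepends a restriction site to the 5' end of a primer, together with a few extra "clamp" bases, since many restriction enzymes cut poorly when their site sits right at the end of a DNA fragment. The EcoRI site (GAATTC) is real, but the function name, the 2-base GC clamp and the 20-base primer are hypothetical placeholders, not the primers designed in this tutorial.

```shell
#!/bin/sh
# Hypothetical helper: prepend a clamp and a restriction site to a primer.
# Usage: add_site CLAMP SITE PRIMER   (all sequences written 5'->3')
add_site() (
    clamp=$1
    site=$2
    primer=$3
    # Join clamp + site + primer and force upper case for uniform output.
    printf '%s%s%s\n' "$clamp" "$site" "$primer" | tr 'acgt' 'ACGT'
)

# Example with a made-up 20-mer primer, a GC clamp and an EcoRI site.
add_site GC GAATTC acgtgctagctagcatcgat
```

The same function would be applied to the reverse primer, possibly with a different enzyme so that the fragment can be cloned directionally. Whichever enzymes are chosen should not cut within the roughly 5 kb PCR product itself.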

Overview:
For this tutorial, we will use the example of the Brassica napus gene LepR3, which confers resistance to the blackleg fungus, Leptosphaeria maculans.

Larkan, N. J., Lydiate, D. J., Parkin, I. A. P., Nelson, M. N., Epp, D. J., Cowling, W. A., Rimmer, S. R. and Borhan, M. H. (2013), The Brassica napus blackleg resistance gene LepR3 encodes a receptor-like protein triggered by the Leptosphaeria maculans effector AVRLM1. New Phytol, 197: 595-605. doi:10.1111/nph.12043

1. Create a working directory

I can't repeat this often enough. ALWAYS create a new directory for each project.

cd tutorials          # go into the tutorials directory
mkdir primerblast     # create a directory called primerblast
cd primerblast        # go into the primerblast directory

This directory will be used for all files associated with this tutorial.

2. Locate the Brassica napus LepR3 gene

Since we'll need to use NCBI Primer-BLAST, it's best to start by finding the gene on the NCBI web site at https://www.ncbi.nlm.nih.gov. Using the search panel at the top of the page, set the database to Gene and type in the search term as shown:


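The top part of the results is presented at right. Note that the formal annotation for this gene lists it as a receptor-like protein. This agrees with the publication cited above, which reports that LepR3 encodes a receptor-like protein triggered by the L. maculans effector AVRLM1; many plant disease resistance genes encode receptors that recognize such pathogen effectors.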

The genome annotation tells us that this gene is found on an unplaced scaffold. However, the publication for this gene places LepR3 on Chromosome 10A (i.e. in the A genome of B. napus). That suggests that although the genome assembly was unable to link this scaffold to a specific chromosome, the scaffold probably comes from Chromosome 10A.
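The Gene database search at the start of this step can also be scripted through NCBI's E-utilities. The sketch below only builds the esearch URL; it assumes the Gene database field tags [sym] (gene symbol) and [orgn] (organism), percent-encodes the square brackets so that curl does not treat them as a glob pattern, and uses a hypothetical function name. Fetching the URL requires curl and network access.

```shell
#!/bin/sh
# Hypothetical helper: build an E-utilities esearch URL for the Gene database.
# Spaces become '+', and [ ] are percent-encoded as %5B and %5D.
esearch_url() (
    term=$(printf '%s\n' "$1" | sed -e 's/ /+/g' -e 's/\[/%5B/g' -e 's/]/%5D/g')
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=gene&term=%s\n' "$term"
)

url=$(esearch_url 'LepR3[sym] AND Brassica napus[orgn]')
echo "$url"
# To run the query (requires network access):
#   curl -s "$url"
```

The reply is a short XML document listing the IDs of the matching Gene records.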



The gene can be viewed in the context of the chromosome scaffold further down the page:



The gene is annotated as spanning nucleotides 294560..298000. Note that the view is shown with respect to the orientation of the LepR3 gene, which is encoded on the reverse strand. Therefore, the coordinates in the genome viewer go from high numbers to low numbers, reading left to right.

3. Design PCR primers to amplify the gene and its flanking regions

Since the goal is to clone the genomic copy of the gene for use in plant transformation, we want not only the coding region but also the flanking regions that are likely to contain important regulatory sequences. If you were to click on the GenBank link, you would see a GenBank report spanning 3441 bp, covering ONLY the region that encodes the mRNA, plus a bit of downstream flanking sequence (see LepR3mRNA.gen). Therefore, if we want a larger PCR product that includes the flanking regions, we have to choose a larger region for primer design.

To see the gene in the larger context of its flanking sequences, click once on the Zoom out button (-).



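While it is impossible to predict the actual extent of the promoter and other regulatory sequences, let's assume that we need 1000 bp upstream and 500 bp downstream of the gene. Because LepR3 is on the reverse strand, its upstream region lies at higher coordinates (beyond 298,000) and its downstream region at lower coordinates (below 294,560). Rounding to whole kilobases, that means we want the PCR product to encompass nucleotides 294,000 to 299,000.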

Keep in mind that these coordinates define the sequence that we want to guarantee will be present in the PCR product. Therefore, the primers must be located outside this region. When designing primers, we need to give Primer-BLAST a large enough area upstream and downstream of the desired product to ensure that it can find suitable primers. For simplicity, let's say that the region searched by Primer-BLAST should include 1000 bp on either side of the desired product. That puts the coordinates for the search at 293,000 to 300,000.
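The arithmetic in the two paragraphs above can be written out as a short shell sketch. It assumes, as in this tutorial, that the gene is on the reverse strand, so "upstream" means higher scaffold coordinates; the rounding to whole kilobases mirrors the round numbers chosen above, and the variable names are illustrative only.

```shell
#!/bin/sh
gene_lo=294560      # annotated gene start on the scaffold
gene_hi=298000      # annotated gene end on the scaffold
strand=minus        # LepR3 is encoded on the reverse strand
up=1000             # bp of 5' (upstream) flank wanted
down=500            # bp of 3' (downstream) flank wanted
margin=1000         # extra sequence on each side for the primer search

# Inclusive length of the annotated gene (matches the 3441 bp GenBank report).
echo "gene length: $(( gene_hi - gene_lo + 1 ))"

# On the minus strand the 5' end of the gene is at the HIGH coordinate,
# so the upstream flank is added to gene_hi and the downstream flank is
# subtracted from gene_lo. The plus-strand case is the mirror image.
if [ "$strand" = minus ]; then
    prod_lo=$(( gene_lo - down ))
    prod_hi=$(( gene_hi + up ))
else
    prod_lo=$(( gene_lo - up ))
    prod_hi=$(( gene_hi + down ))
fi

# Widen to whole kilobases: round the low end down and the high end up.
prod_lo=$(( prod_lo / 1000 * 1000 ))
prod_hi=$(( (prod_hi + 999) / 1000 * 1000 ))
echo "product must contain: $prod_lo..$prod_hi"

# Region handed to Primer-BLAST for the primer search.
echo "search window: $(( prod_lo - margin ))..$(( prod_hi + margin ))"
```

Setting strand=plus gives the corresponding result for a gene on the forward strand.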

We are now ready to run Primer-BLAST. You can select this region with the mouse as shown below:



You can launch Primer-BLAST from the popup menu or from the Tools menu.

Primer-BLAST lists the accession number of the genomic scaffold, and the coordinates of the selected region are shown in the Forward primer and Reverse primer boxes.


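In the Primer Parameters section, we set the Minimum PCR product size to 5000 (i.e. 299,000 - 294,000). The Maximum PCR product size must be at least as large as the minimum; a value such as 7000, roughly the width of the 293,000 to 300,000 search region, leaves room for the primers.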


We also have to explicitly tell Primer-BLAST not to choose primers within the 5000 bp target region that we want included in the PCR product. Therefore, go to Advanced settings at the bottom of the page, and find the box labeled "Excluded regions" in the Advanced Primer Parameters.

Primer-BLAST specifies an Excluded region by a starting point and a length. Therefore, type in

294000,5000

(Do not use commas as thousands separators: type 294000, not 294,000. The only comma is the one separating the start from the length.)
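Because the Excluded regions box expects a start and a length rather than a start and an end, it is easy to type the end coordinate by mistake. The hypothetical helper below converts a start..end range, with or without thousands separators, into that format. It follows this tutorial's convention of length = end - start; strictly, the inclusive range 294,000..299,000 spans 5001 bases, which makes no practical difference here.

```shell
#!/bin/sh
# Hypothetical helper: turn START END into the "start,length" form typed
# into the Excluded regions box. Thousands separators are stripped first.
excluded_region() (
    start=$(printf '%s\n' "$1" | tr -d ',')
    end=$(printf '%s\n' "$2" | tr -d ',')
    # Length as used in this tutorial: end - start.
    printf '%s,%s\n' "$start" "$(( end - start ))"
)

excluded_region 294,000 299,000
```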


How does Primer-BLAST work?

The search for primers is essentially a 2-step process:

1. Use the Primer3 program to design candidate primer pairs for the target sequence. Almost all of the parameters to Primer-BLAST are actually parameters for Primer3.
2. Use MegaBLAST to search an NCBI database for matches to the primers. Any good matches to genes in the database other than the target sequence will cause candidate primers to be discarded. This step is critical to ensure that the final primer pairs will amplify the desired target, and no other sequences in the genome. By default, the RefSeq mRNA database is searched for unwanted matches.
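To make step 1 concrete, here is a rough sketch of the kind of input Primer3 itself accepts when run locally as primer3_core, written in Primer3's Boulder-IO format (TAG=value lines ending with a line containing only "="). It assumes the 293,000..300,000 search window has already been extracted as a plain sequence, so scaffold coordinates become offsets into that window (Primer3 counts from 0 by default). The helper name is hypothetical, and this illustrates Primer3's inputs, not how Primer-BLAST builds its own requests internally.

```shell
#!/bin/sh
# Hypothetical helper: emit one Primer3 Boulder-IO record.
# Arguments: id, template, window start, excluded start, excluded length,
#            minimum product size, maximum product size.
primer3_record() (
    id=$1; seq=$2; win_start=$3
    ex_start=$4; ex_len=$5; pmin=$6; pmax=$7
    # Primer3 positions are 0-based offsets into the template by default,
    # so a scaffold coordinate is shifted relative to the window start.
    ex_off=$(( ex_start - win_start ))
    printf 'SEQUENCE_ID=%s\n' "$id"
    printf 'SEQUENCE_TEMPLATE=%s\n' "$seq"
    printf 'SEQUENCE_EXCLUDED_REGION=%s,%s\n' "$ex_off" "$ex_len"
    printf 'PRIMER_PRODUCT_SIZE_RANGE=%s-%s\n' "$pmin" "$pmax"
    printf '=\n'
)

# DUMMY template: replace ACGT with the real 293000..300000 window
# (7001 bases) before passing the record to primer3_core.
primer3_record LepR3_window ACGT 293000 294000 5000 5000 7000
```

With a real template, the record could be saved to a file and run as primer3_core < record.txt. Step 2 could be approximated with a BLAST+ (blastn) search of the candidate primers against a local database, although Primer-BLAST's own specificity check is more involved than a plain BLAST search.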



The graphical view shows the products from the 10 best PCR primer pairs, superimposed on the genome map. You can mouse over any product to see its length, along with links for downloading the sequence of the product.


In the example, the shortest product, 5179 bp, was produced by primer pair 3.

If you scroll down to the Detailed Primer Reports, you will see the sequences of the primers, as well as other properties of each primer. For comparison, results for Primer Pairs 2 and 3 are shown.

Further examination of the map shows that all of the reverse primers start at about the same place within the promoter region (~299,000), and that the length differences are due primarily to the choice of forward primer, which lies in the region downstream (3') of the gene. Since we are mostly interested in maximizing the promoter region, we'll choose Primer Pair 3 for further work. To save the sequence of the PCR product along with its gene annotation, mouse over the map for Primer Pair 3 to bring up the menu shown above, and click on GenBank view. The GenBank view will appear in a new tab or window.



Note that the Accession number is the same as that of the scaffold, and the region shown in this file is the 5179 bp fragment going from 293914 to 299092. Save the file using the Send menu in the upper right corner, choosing Complete record, File, and GenBank (full).

Click on Create File and save the file as LepR3-PCR.gen.
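A similar record can be fetched from the command line with NCBI's efetch utility, which accepts seq_start and seq_stop to extract a sub-range of a sequence. The sketch below also checks that the product really contains the 294,000..299,000 region chosen in step 3. The scaffold accession is not reproduced in this tutorial, so SCAFFOLD_ACC is a hypothetical placeholder that must be replaced; curl and network access are assumed for the actual download.

```shell
#!/bin/sh
SCAFFOLD_ACC=REPLACE_WITH_SCAFFOLD_ACCESSION   # hypothetical placeholder
pcr_lo=293914      # Primer Pair 3 product, scaffold coordinates (inclusive)
pcr_hi=299092
want_lo=294000     # region that must be present in the product (step 3)
want_hi=299000

# Inclusive length; should agree with the 5179 bp reported by Primer-BLAST.
pcr_len=$(( pcr_hi - pcr_lo + 1 ))
echo "product length: $pcr_len bp"

# The product must start at or below want_lo and end at or above want_hi.
if [ "$pcr_lo" -le "$want_lo" ] && [ "$pcr_hi" -ge "$want_hi" ]; then
    echo "product covers $want_lo..$want_hi"
else
    echo "WARNING: product does not cover $want_lo..$want_hi" >&2
fi

# efetch URL for the product region as a GenBank flat file.
efetch_url="https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nuccore&id=${SCAFFOLD_ACC}&rettype=gb&retmode=text&seq_start=${pcr_lo}&seq_stop=${pcr_hi}"
echo "$efetch_url"
# With a real accession and network access:
#   curl -s "$efetch_url" > LepR3-PCR.gen
```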

Note: The GenBank entry does NOT list the sequences of the primers! While the sequence begins with the forward primer and ends with the reverse complement of the reverse primer, there is no annotation to tell precisely what those primers are. Therefore, you need to make a special note of the primer data from the Primer-BLAST results.
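One way to confirm that the primers you recorded match the saved product is to check both ends of its sequence. The sketch below assumes the product sequence has been exported as a single-line string (for example from a FASTA file with the line breaks removed) and that all sequences use the same letter case; the function names are hypothetical, and the sequences in the example are short made-up stand-ins, not the real LepR3 primers.

```shell
#!/bin/sh
# Reverse complement of a DNA sequence (A/C/G/T, either case).
# awk reverses the string; tr swaps each base for its complement.
revcomp() (
    printf '%s\n' "$1" |
        awk '{ s = ""; for (i = length($0); i > 0; i--) s = s substr($0, i, 1); print s }' |
        tr 'ACGTacgt' 'TGCAtgca'
)

# Succeeds if PRODUCT starts with FWD and ends with revcomp(REV).
# Usage: primers_match PRODUCT FWD REV   (primers written 5'->3')
primers_match() (
    rev_rc=$(revcomp "$3")
    case "$1" in
        "$2"*"$rev_rc") true ;;
        *) false ;;
    esac
)

# Made-up example: forward ATGCC, reverse TGACC (revcomp GGTCA).
primers_match ATGCCAAAAAAGGTCA ATGCC TGACC && echo "both primers found"
```

The reverse primer is reported 5'->3' on the opposite strand, which is why the product ends with its reverse complement rather than with the primer sequence itself.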