PLNT2530 PLANT BIOTECHNOLOGY   

Assignment 2 - February 2024

This assignment is worth 5% of the course grade.

Due by 11:59 pm, Monday February 26.


Purpose: To demonstrate that you can find sequences in the NCBI databases, using keywords. (Although you might be able to narrow down your search using free-form search engines such as Google, credit will only be given for demonstrating a set of search terms that lead to a small number of hits, which can then be checked by inspection of the GenBank entries for those hits. Keep in mind that the NCBI databases are not indexed in search engines such as Google.)

1. (5 points) Search for and retrieve the pBI121 cloning vector
Inside your PLNT2530 directory, create a sub-directory called as2 to hold materials associated with Assignment 2. Save all files in your PLNT2530/as2 directory.

In the Agrobacterium lab, we will be working with plants transformed using pBI121, a binary T-DNA vector used for Agrobacterium transformation. Your first task is to use blncbi to find and retrieve the GenBank entry for the cloning vector pBI121. The T-DNA region contains an NPTII gene as a selectable marker for resistance to the antibiotic kanamycin. The T-DNA also contains the GUS reporter gene encoding β-glucuronidase.

Your job is to experiment with search terms until you are certain that you have the pBI121 vector itself, and not a related vector, e.g., a vector derived from pBI121, or a parent vector. Essentially, this is a three-step process:

1. Do a search using blncbi with your search terms.
2. Retrieve hits that look promising using SEQFETCH
3. View the GenBank entries for the hits using File --> View sequences.

Before saving, you should view the GenBank entry from bldna to verify that you have the correct file. Next, save the GenBank entry in a file called pBI121.gen.

Take screenshots of the following for your report:

2. (10 points) Search for and retrieve the tan spot necrosis toxin gene, ToxA.

Later in the course, we will be working with a fungal gene encoding the tan spot toxin protein. Pyrenophora tritici-repentis causes tan spot disease in wheat. The ToxA protein from P. tritici-repentis causes necrosis of plant tissue, which is characteristic of this disease. A number of labs have cloned this gene independently. One of these clones is a cDNA clone isolated here at the University of Manitoba, with an insert size of approximately 900 bp. Based on this information, use blncbi to find and retrieve the GenBank entries for the Ptr (P. tritici-repentis) tan spot necrosis toxin cDNA clone. This is likely to take some experimentation with different search keys and terms.

The goal is to find, using as few search terms as possible, the entries that contain the necrosis toxin protein coding sequence (annotated as a CDS feature in any given GenBank entry), while excluding false positives. For example, the gene names "toxA", "ToxA", "tox-A", etc. might be used for different genes in different species, and the word "necrosis" could appear in any entry that mentions necrosis, whether or not it has anything to do with the tan spot toxin. Beware that the title fields shown in column B can be misleading; you must also look at the annotation in the GenBank files (covered in the tutorial). For example, an entry might get a hit because the terms "necrosis" and "toxin" both appear somewhere in it, perhaps only in a literature citation with both words in its title, even though the sequence has nothing to do with the tan spot toxin. Proteins annotated with ambiguous terms such as "hypothetical protein" should be considered false positives, unless other evidence in the entry explicitly identifies the protein as the tan spot necrosis toxin.

Note that different queries may identify different sets of toxA genes. While it would be ideal to find a single query that finds all true toxA genes and no false positives, 2 or 3 independent queries are fine if together they find all the genes.

Once you settle on one or a few good queries, save your query results from blncbi and the corresponding GenBank entries from bldna.

You will probably end up with a number of different tan spot toxin genes. Enough information is given above to identify the single GenBank entry containing the gene actually described (the University of Manitoba cDNA clone). For full credit on this part, you must identify which GenBank entry is the correct copy of the gene, and explain how you know this is the correct gene.
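When you inspect the GenBank entries saved from bldna, it can help to list each entry's accession, sequence length and the /product of every CDS feature side by side. This makes it easier to compare lengths against the ~900 bp insert and to spot entries whose CDS annotation is not the toxin. The sketch below is not part of BIRCH or blncbi. It is a hypothetical helper (summarize_genbank) that uses only the Python standard library and assumes the standard GenBank flat-file layout: each record ends with a // line, feature keys start in column 6, and qualifiers start in column 22. It supplements reading the annotation yourself; it does not replace it.

```python
"""List accession, length and CDS products for each entry in a GenBank flat file (sketch)."""
import sys


def summarize_genbank(text):
    """Return one dict per record with keys locus, length_bp, accession, cds_products.

    Assumes the standard flat-file layout: each record ends with a '//' line,
    feature keys start in column 6 and qualifiers start in column 22.
    """
    summaries = []
    for chunk in text.split("\n//"):
        lines = chunk.strip("\n").splitlines()
        if not any(line.startswith("LOCUS") for line in lines):
            continue  # e.g. the empty text after the final '//'
        entry = {"locus": None, "length_bp": None, "accession": None, "cds_products": []}
        in_features = False
        in_cds = False
        parts = None  # pieces of a /product value that wraps onto several lines
        for line in lines:
            if not line.strip():
                continue  # ignore blank lines
            if not line.startswith(" "):
                # Top-level keyword (LOCUS, ACCESSION, FEATURES, ORIGIN, ...)
                in_features = line.startswith("FEATURES")
                in_cds = False
                parts = None
                fields = line.split()
                if fields[0] == "LOCUS" and len(fields) > 1:
                    entry["locus"] = fields[1]
                    # LOCUS line: name, length, 'bp', molecule type, ...
                    if len(fields) > 3 and fields[3] == "bp":
                        entry["length_bp"] = int(fields[2])
                elif fields[0] == "ACCESSION" and len(fields) > 1:
                    entry["accession"] = fields[1]
                continue
            if not in_features:
                continue  # sequence lines, indented REFERENCE sub-keywords, etc.
            if len(line) > 5 and line[5] != " ":
                # A new feature key such as "     CDS             10..600"
                in_cds = line.split()[0] == "CDS"
                parts = None
                continue
            stripped = line.strip()
            if parts is not None:
                # Continuation line of a wrapped /product value
                parts.append(stripped)
                if stripped.endswith('"'):
                    entry["cds_products"].append(" ".join(parts).strip('"').strip())
                    parts = None
            elif in_cds and stripped.startswith("/product="):
                value = stripped[len("/product="):]
                if len(value) > 1 and value.endswith('"'):
                    entry["cds_products"].append(value.strip('"'))
                else:
                    parts = [value]  # value continues on the next line(s)
        summaries.append(entry)
    return summaries


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_gb.py toxA_entries.gen   (file name is hypothetical)
    with open(sys.argv[1]) as handle:
        for e in summarize_genbank(handle.read()):
            products = "; ".join(e["cds_products"]) or "(no CDS /product)"
            print(e["accession"], e["length_bp"], products, sep="\t")
```

An entry with no CDS /product, or with a product such as "hypothetical protein", is exactly the kind of hit the instructions above ask you to scrutinize before you count it as a true positive.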

Search tips:
1) blncbi lets you construct complex queries using the Boolean operators AND, OR and NOT, and group two or more terms using parentheses. For example, a search aimed at retrieving either the pUC18 or the pUC19 vector might use the query

(pUC18 [Title] OR pUC19 [Title]) AND syn [Division]

2) Don't waste your time adding trivial terms to your query that exclude specific sequences by Accession number, e.g.

(pUC18 [Title] OR pUC19 [Title]) AND syn [Division] NOT (S38358 [Accession] OR M22135 [Accession] OR X13074 [Accession] OR X13070 [Accession])


If the number of false positives is small, they will be easy to weed out by inspection of the GenBank files.

3) Whole genome shotgun (WGS) sequencing contigs and scaffolds may show up as hits. Although these probably do contain the gene you're looking for, working with contigs adds a layer of complexity, because each contig contains many genes. Ideally, we want to find a smaller GenBank sequence that has a single tan spot gene, and no other genes.
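The parentheses in tip 1 matter because NCBI's Entrez search processes AND, OR and NOT from left to right unless parentheses group the terms, so "A OR B AND C" is read as "(A OR B) AND C". As a query grows, it can help to assemble it so that every group is explicitly parenthesized. The sketch below uses hypothetical helper functions (tagged, either, both, excluding), which are not part of blncbi. It only builds the query text, which you would still paste into blncbi, and it assumes that blncbi passes that text to Entrez unchanged.

```python
"""Build an Entrez-style Boolean query string with explicit grouping (sketch)."""


def tagged(term: str, field: str) -> str:
    """Attach a search field tag, e.g. tagged("pUC18", "Title") -> 'pUC18 [Title]'."""
    return f"{term} [{field}]"


def either(*clauses: str) -> str:
    """Join clauses with OR and wrap the group in parentheses."""
    return "(" + " OR ".join(clauses) + ")"


def both(*clauses: str) -> str:
    """Join clauses with AND and wrap the group in parentheses."""
    return "(" + " AND ".join(clauses) + ")"


def excluding(query: str, clause: str) -> str:
    """Keep hits that match query but not clause, wrapped in parentheses."""
    return f"({query} NOT {clause})"


if __name__ == "__main__":
    # Rebuild the pUC18/pUC19 example from search tip 1.
    puc = both(
        either(tagged("pUC18", "Title"), tagged("pUC19", "Title")),
        tagged("syn", "Division"),
    )
    print(puc)
```

Running it prints ((pUC18 [Title] OR pUC19 [Title]) AND syn [Division]). The extra outer parentheses are redundant but harmless, so this is the same query as in tip 1. Always parenthesizing means a group cannot be silently regrouped when you later add terms in front of it.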


Saving blncbi results
The results from any blncbi window can be saved by choosing Edit --> Select All, and then choosing File --> Save SELECTION As.

Make sure that the File format is tsv, and give the file a descriptive name with the .tsv file extension.


TSV stands for "tab-separated values". This is a generic format in which each row in a table has one or more fields (columns). The values on each line are separated by TAB characters, which have the same effect as using tabs in a document. Virtually all spreadsheet programs, such as LibreOffice or MS Excel, can import TSV files.

Check your files
It is always a good idea to examine your GenBank or TSV files in a text editor to make sure that they contain what you think they contain!
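For a TSV file, one quick check beyond reading it in a text editor is to confirm that every line splits into the same number of tab-separated fields. If the file was not actually saved as TSV, this check will typically report only one field per line. The sketch below is a hypothetical helper (check_tsv) built on Python's standard csv module. Quote handling is turned off so that quotation marks inside titles are kept as literal text. It makes no assumption about which columns blncbi writes.

```python
"""Check that a saved TSV file has a consistent number of columns (sketch)."""
import csv
import sys
from collections import Counter


def check_tsv(text):
    """Return (row_count, Counter mapping fields-per-row -> number of rows).

    Quote handling is turned off so quotation marks inside titles stay literal.
    Blank lines are ignored.
    """
    reader = csv.reader(text.splitlines(), delimiter="\t", quoting=csv.QUOTE_NONE)
    widths = Counter(len(row) for row in reader if row)
    return sum(widths.values()), widths


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_tsv.py my_query_results.tsv   (file name is hypothetical)
    with open(sys.argv[1], newline="") as handle:
        rows, widths = check_tsv(handle.read())
    print(f"{rows} rows; fields per row: {dict(widths)}")
    if set(widths) == {1}:
        print("Every row has a single field: the file may not be tab-separated.")
    elif len(widths) > 1:
        print("Rows have different numbers of fields: inspect the file in a text editor.")
```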


Take screenshots of the following for your report:

3. (5 points) Create a document to show your results

Use the template document to create a report. The report template is available in LibreOffice and MS-Word formats. Replace the dummy results in the template with your own results. The report should include the following:
If you were unable to exclude all false positives, list the accession numbers of the false positives. Briefly explain why you think they were found, despite being false positives.

Export your final report to a PDF file. Reports in MS-Word or LibreOffice formats are not acceptable.

Along with your report, you should upload your tsv and GenBank files. Make sure that all files have descriptive names.

Presentation guidelines


Submitting your assignment

Your PDF report, along with the associated TSV and GenBank files, is due by 11:59 pm, Monday, February 26, on the PLNT2530 UMLearn dropbox site in the Assignment 2 folder. Files in word processing formats (.doc, .docx, .rtf, .odt) are not acceptable.


If you have questions, send me a message at frist@cc.umanitoba.ca.