BIRCH
Workflow example: Use BLASTP to search for sequences related to a query
protein, within a specific taxonomic group
The pea defense gene DRR206 confers strong resistance to fungal
pathogens when transformed into Brassica napus. Although no Brassica
homologue of this gene has been found by hybridization, we would like
to see if a homologous sequence can be found in species closely related
to B. napus. Since protein searches are more sensitive than DNA
searches, we need to get the amino acid sequence from the genomic
sequence. This involves one step to extract the protein coding sequence
(CDS), and another to translate the CDS into protein.
The first step is to read in the GenBank file for the genomic sequence.
Save the file PSU1716l.gen in your
current working directory. In bldna, choose File --> Open,
and read in the file.
Next, we need to extract the protein coding sequence from the genomic
sequence. In GenBank files, coding sequences are annotated as CDS
features. Select the sequence and choose Database -->
FEATURES - Extract by feature keys.
Set the feature key to CDS, Database to "Selected sequences", and send
output to bldna. A new bldna window pops up with the coding sequence,
which was extracted from the larger genomic sequence.
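For readers who want to see what the feature extraction amounts to, here
is a minimal Python sketch. It is not how bldna does it; it assumes a
simple forward-strand location string such as "join(3..8,18..23)", and
the function name extract_cds and the toy sequence are hypothetical.

```python
import re

def extract_cds(genomic: str, location: str) -> str:
    """Splice the segments named in a simple GenBank location string.

    Assumption: forward-strand locations only, e.g. "5..28" or
    "join(5..10,20..28)". complement(...) is not handled.
    """
    segments = re.findall(r"(\d+)\.\.(\d+)", location)
    # GenBank coordinates are 1-based and inclusive at both ends.
    return "".join(genomic[int(start) - 1:int(end)] for start, end in segments)

# Toy sequence (hypothetical, not DRR206): two exons in upper case,
# flanking sequence and one intron in lower case.
genomic = "ccATGGCTgtaagtcagTTTTAAgg"
cds = extract_cds(genomic, "join(3..8,18..23)")
print(cds)
```

The intron and flanking bases are dropped, leaving only the spliced
coding sequence.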
To translate into protein, select PSU11716:CDS1, and choose DNARNA -->
Ribosome. The translated protein will be sent to a new blprotein window.
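The translation step can be illustrated with a short Python sketch. This
is an assumption-laden stand-in for the Ribosome program, not its actual
implementation: it uses the standard genetic code, reads from the first
base, and stops at the first stop codon. The names CODON_TABLE and
translate, and the toy CDS, are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(cds: str) -> str:
    """Translate a coding sequence in frame 1 up to the first stop codon."""
    cds = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(cds) - 2, 3):
        # Codons containing ambiguous bases are reported as X.
        aa = CODON_TABLE.get(cds[i:i + 3], "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)

# Toy CDS (hypothetical, not DRR206): ATG GCT TTT TAA
print(translate("ATGGCTTTTTAA"))
```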
To run the BLASTP search, select the amino acid sequence and choose
Database --> NCBI BLASTP.
Choose the GenBank Nonredundant Protein database. To limit the search
to relatives of Brassica napus, click "Yes" for "Restrict search to
entries containing string:", and type "Brassicaceae" as the search
string. To eliminate poor matches, set "# matches expected by random
chance" to 1.0e-8 (i.e. 10^-8).
Click on Run to start the search.
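The same settings can be expressed as a request to the NCBI BLAST URL
API. The sketch below only builds the request body and does not contact
the server. The function name build_blastp_request is hypothetical, and
the use of an [Organism] qualifier in the Entrez query is an assumption
about how best to restrict the search; the BIRCH menu may pass the string
differently.

```python
from urllib.parse import urlencode, parse_qs

# NCBI BLAST URL API endpoint; the request would be sent here as a POST.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"

def build_blastp_request(protein: str, taxon: str = "Brassicaceae",
                         expect: float = 1e-8) -> str:
    """Return a URL-encoded body for submitting a BLASTP search."""
    params = {
        "CMD": "Put",                        # submit a new search
        "PROGRAM": "blastp",
        "DATABASE": "nr",                    # nonredundant protein database
        "QUERY": protein,
        "ENTREZ_QUERY": f"{taxon}[Organism]",  # assumed taxonomic restriction
        "EXPECT": str(expect),               # E-value cutoff
    }
    return urlencode(params)

# Toy query (hypothetical, not the DRR206 protein).
body = build_blastp_request("MAF")
for key, values in parse_qs(body).items():
    print(key, "=", values[0])
```

A real submission would POST this body to BLAST_URL, then poll with the
returned request ID until the results are ready.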
The query sequence is sent to the NCBI Blast server, and the results
are returned when complete. The results appear in several windows. The
BLAST report appears in one text editor window, and the accession
numbers of the hits are sent to another text editor window.
The Accession number file can be saved, and used to retrieve proteins
from the NCBI Batch Entrez service at http://www.ncbi.nlm.nih.gov/entrez/batchentrez.cgi?db=Protein.
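Before uploading, it can help to tidy the saved list. The sketch below
assumes the file holds one accession number per line, which may not
match the exact format BIRCH writes; the function name clean_accessions
and the identifiers in the example are placeholders, not real hits.

```python
def clean_accessions(text: str) -> list[str]:
    """Drop blank lines and duplicates, keeping first-seen order."""
    seen = set()
    result = []
    for line in text.splitlines():
        acc = line.strip()
        if acc and acc not in seen:
            seen.add(acc)
            result.append(acc)
    return result

# Placeholder identifiers, for illustration only.
raw = "ACC_0001.1\n\nACC_0002.1\nACC_0001.1\n"
print("\n".join(clean_accessions(raw)))
```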