BIRCH
Workflow example: Use BLASTP
to search for sequences related to a query protein and retrieve hits
The pea defense gene DRR206 confers strong resistance to fungal
pathogens when transformed into Brassica
napus. Although no Brassica
homologue of this gene has been found by hybridization, we would like
to see if a homologous sequence can be found in species closely related
to B. napus. Since protein
searches are more sensitive than DNA
searches, we need to get the amino acid sequence from the genomic
sequence.
The first step is to read in the GenBank file for the genomic sequence.
Save the file PSU1716l.gen in your
current working directory. In
GDE, choose File --> Open,
and read in the file.
Next, we need to extract the protein coding sequence from the genomic
sequence. In GenBank files, coding sequences are annotated as CDS
features. Choose Database -->
Features (extract by feature keys):
Set the feature key to CDS, Database to "Selected sequences", and send
output to GDE. A new GDE window pops up with the coding sequence which
was extracted from the larger genomic sequence. To translate into
protein, select PSU11716:CDS1, and choose DNA/RNA --> Ribosome.
Where the menu says "Send output to a new GDE window", choose "No." The
translated protein will be sent to the current GDE window.
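As a rough illustration of what the two steps above accomplish (joining the CDS segments out of the genomic sequence, then translating them), here is a minimal Python sketch. It is not the code GDE or Ribosome actually runs. The function names extract_cds and translate are hypothetical, the location parser assumes a forward-strand location made only of simple start..end ranges, and the toy sequence is invented for the example.

```python
import re

# Standard genetic code, laid out in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def extract_cds(genomic, location):
    """Join the segments named in a GenBank-style CDS location.

    Assumption: forward strand only, segments written as start..end
    with 1-based inclusive coordinates, e.g. "join(3..8,14..19)".
    """
    segments = re.findall(r"(\d+)\.\.(\d+)", location)
    return "".join(genomic[int(start) - 1:int(end)] for start, end in segments)


def translate(cds):
    """Translate a coding sequence, stopping at the first stop codon."""
    protein = []
    cds = cds.upper()
    for pos in range(0, len(cds) - 2, 3):
        residue = CODON_TABLE.get(cds[pos:pos + 3], "X")  # X = unknown codon
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


if __name__ == "__main__":
    # Invented toy sequence: two exons separated by a short intron.
    genomic = "CCATGGCTGTAAGTTTTAAGG"
    cds = extract_cds(genomic, "join(3..8,14..19)")
    print(cds)
    print(translate(cds))
```

Keeping extraction and translation as separate functions mirrors the two menu steps: the spliced CDS is an intermediate result you can inspect before translating.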
To run the BLASTP search, select the amino acid sequence and choose
Database --> NCBI BLASTP.
Choose the GenBank NonRedundant (Protein)
database. To limit the search to relatives of Brassica napus, click
"Yes" for Restrict search to entries containing string:, and type
"Brassicaceae" as the search string. To eliminate poor matches, set the
# matches expected by random chance to 1.0e-8 (i.e. 10^-8).
Click on OK to start the search.
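To make the effect of these two settings concrete, the sketch below applies the same kind of filtering to a list of hits in Python. It is only an illustration: in the real search the restriction and the cutoff are applied on the NCBI side, the function name filter_hits is hypothetical, the substring test on a lineage field is an assumption about how such a restriction could be mimicked, and the hits are invented placeholders rather than real results.

```python
def filter_hits(hits, max_evalue=1e-8, taxon=None):
    """Keep hits whose expected number of chance matches is small enough.

    hits: iterable of (accession, lineage, evalue) tuples.
    max_evalue: largest E-value to accept (1e-8 in the workflow above).
    taxon: optional string that must occur in the lineage text.
    """
    kept = []
    for accession, lineage, evalue in hits:
        if evalue > max_evalue:
            continue  # too likely to be a chance match
        if taxon is not None and taxon.lower() not in lineage.lower():
            continue  # outside the group of interest
        kept.append(accession)
    return kept


if __name__ == "__main__":
    # Placeholder hits, not real BLAST output.
    hits = [
        ("HIT_A", "Brassicaceae; Brassica", 3e-40),
        ("HIT_B", "Brassicaceae; Arabidopsis", 2e-5),
        ("HIT_C", "Fabaceae; Pisum", 1e-60),
    ]
    print(filter_hits(hits, max_evalue=1e-8, taxon="Brassicaceae"))
```

A smaller cutoff is stricter: a hit with an E-value of 2e-5 would be kept at a cutoff of 1e-3 but is dropped at 1e-8.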
The query sequence is sent to the NCBI Blast server, and the results
are returned when complete. The results appear in several windows. The
BLAST report appears in one text editor window,
and the Accession numbers of the hits are sent to another text editor
window.
The Accession numbers are also sent to a dGDE window. dGDE is a program
for working with lists of identifiers, such as accession numbers and TaxID
numbers. dGDE is designed to send requests to the SeqHound data
warehouse (http://www.blueprint.org/seqhound),
and retrieve information from SeqHound. Since SeqHound can
only retrieve sequences using GI numbers, we first need to convert the
Accession numbers into GI numbers. Choose Convert --> Convert
Name/Acc to GI. Click on "Accession to GI", and send the output
to the
current GDE window (i.e. choose "No").
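The idea behind the conversion can be sketched in Python without contacting any server. The example below does not call SeqHound and is not how dGDE performs the lookup. It assumes you already have legacy NCBI-style FASTA definition lines of the form gi|number|database|accession|, which carry both identifiers, and builds an accession-to-GI table from them. The function names and the identifiers in the example are hypothetical placeholders.

```python
import re

# Legacy NCBI defline: >gi|<GI number>|<db tag>|<accession>[.version]| ...
DEFLINE = re.compile(r"^>?gi\|(\d+)\|[a-z]+\|([A-Za-z0-9_]+)(?:\.\d+)?\|")


def build_accession_table(deflines):
    """Map each accession (version suffix dropped) to its GI number."""
    table = {}
    for line in deflines:
        match = DEFLINE.match(line.strip())
        if match:
            gi, accession = match.groups()
            table[accession] = int(gi)
    return table


def accessions_to_gis(accessions, table):
    """Return (GI numbers found, accessions with no known GI)."""
    found = [table[acc] for acc in accessions if acc in table]
    missing = [acc for acc in accessions if acc not in table]
    return found, missing


if __name__ == "__main__":
    # Placeholder deflines, not real database entries.
    deflines = [
        ">gi|111|gb|XXX00001.1| placeholder protein one",
        ">gi|222|emb|YYY00002.2| placeholder protein two",
    ]
    table = build_accession_table(deflines)
    print(accessions_to_gis(["YYY00002", "ZZZ00003"], table))
```

Reporting the unmatched accessions separately makes it easy to see which hits would be lost in a later retrieval step that accepts only GI numbers.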
The Accession numbers are sent to SeqHound, and the GI numbers are read
into the GDE window. Select the GI numbers, and choose Retrieve -->
Protein from Protein GI. Click OK.
GDE will send the GI numbers to SeqHound, calling the SeqHound method
SHoundGetFastaList. The sequences are returned to a new GDE window.