Bioinformatics Lab Assignment
Detection of Genome Rearrangements in Yeast Species
This assignment is worth 20% of the lab grade.
Due date: December 6, 2023.
Goal: To discover chromosomal rearrangements that have occurred since
the divergence of three yeast species, Saccharomyces cerevisiae, S.
arboricola and S. eubayanus, from a common ancestor.
1. Download the Fasta Genomic sequence
for Saccharomyces eubayanus using the procedure
described in the tutorial Finding and retrieving complete
eukaryotic genomes.
Find the Saccharomyces eubayanus genome SEUB3.0 on the NCBI
Genome web site.
Download the Submitted GenBank assembly file
GCA_001298625.1.zip to your tutorials/getgenome directory,
and run unzip to decompress the files. Move the fasta file
GCA_001298625.1_SEUB3.0_genomic.fna to your getgenome directory.
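The unzip-and-move step might look like the sketch below. The helper name collect_fasta is made up for illustration, and the folder layout inside the zip archive is an assumption, so the sketch searches the subdirectories for the file rather than hard-coding a path.

```shell
# Hypothetical helper: move a named FASTA file from anywhere below the
# current directory into the current directory itself.
collect_fasta() {
    # -mindepth 2 skips a copy that is already in the current directory
    find . -mindepth 2 -type f -name "$1" -exec mv {} . \;
}

# Typical use, run from your getgenome directory:
#   unzip GCA_001298625.1.zip
#   collect_fasta GCA_001298625.1_SEUB3.0_genomic.fna
```

Afterwards, 'ls' in the getgenome directory should list the .fna file.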
2. (4 points) Create new genome files with simpler chromosome names
The chromosomes in the Fasta files that were downloaded from NCBI
use Accession numbers as names, which are too long to fit on the
dotplots, and also confusing to use. You can create copies of these
files in which the names are changed to the Roman numerals I - XVI,
as described below.
Download shortnames.sh to your
getgenome directory. This is a script, in other words, a file
containing bash commands to execute. The purpose of this script is
to strip out everything from the definition lines for each sequence,
and leave only the Roman numerals. For unmatched scaffolds, the name
will be S followed by the number. Mitochondrial names will simply be
the letter 'M'.
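To give a feel for the kind of renaming involved, here is a minimal sketch. It is NOT the contents of shortnames.sh, which may work quite differently. The function name shorten_names is hypothetical, and the sketch assumes that each chromosome definition line contains the word 'chromosome' followed by a Roman numeral, and that the mitochondrial definition line contains a word beginning with 'mitochondri'. Scaffold renaming is left out because it depends on how the scaffolds are named in the file.

```shell
# NOT the real shortnames.sh: a hypothetical sketch of the same idea.
# Usage: shorten_names input.fna output.fna
shorten_names() {
    # Only lines starting with '>' are changed; sequence lines pass through.
    # 1st rule: keep only the Roman numeral after the word "chromosome".
    # 2nd rule: rename the mitochondrial sequence to M.
    sed -E \
        -e '/^>/ s/^>.*chromosome ([IVX]+).*$/>\1/' \
        -e '/^>/ s/^>.*mitochondri.*$/>M/' \
        "$1" > "$2"
}
```

For the assignment itself, use the shortnames.sh script provided, so that your names match the ones expected in the later steps.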
You need to run the chmod command to make this file executable
before you can run it:
chmod u+rx shortnames.sh
As a check, list the file with 'ls -l shortnames.sh'. The file
permissions should be rwx for the owner, as shown below:
-rwx------ 1 frist drr 381 Dec 15 11:14 shortnames.sh
Now, you can use the script to change the names in each of the three
genome files. When we run shortnames.sh, we must precede the name of
the command with './', which tells the shell to look for this
command in the current directory (represented by '.').
The commands to create new copies of the three files will be:
./shortnames.sh GCA_000146045.2_R64_genomic.fna Scer.fna
./shortnames.sh GCA_000292725.1_SacArb1.0_genomic.fna Sarb.fna
./shortnames.sh GCA_001298625.1_SEUB3.0_genomic.fna Seub.fna
Next, to verify that the scripts worked, use the grep command to
search for name lines which begin with '>', and send the output
to a file. For example,
cat Scer.fna | grep '>' > Scer.contignames
prints the contents of Scer.fna using the cat command. The output of
cat is piped into the grep command, which searches for lines
containing the '>' character, and the output of grep is
written to a file called Scer.contignames. The file should look
something like this:
>I
>II
>III
>IV
>V
>VI
>VII
>VIII
>IX
>X
>XI
>XII
>XIII
>XIV
>XV
>XVI
Use similar commands to create output files listing the name lines
for the other two genomes. The three sets of output should be pasted
into your template document.
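One way to do this for all three genomes at once is sketched below. The function name list_names is made up for illustration. It runs grep directly on each file instead of piping from cat, and also prints a count of the name lines as a quick sanity check.

```shell
# Write the name lines of each renamed genome to <genome>.contignames
# and report how many sequences each file contains.
list_names() {
    for g in "$@"; do
        grep '>' "$g.fna" > "$g.contignames"
        echo "$g: $(grep -c '>' "$g.fna") sequences"
    done
}

# Typical use, run from your getgenome directory:
#   list_names Scer Sarb Seub
```

The count includes any scaffold and mitochondrial sequences, so it can be larger than 16.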
Next, create a new directory called tutorials/compare, for the
remaining steps of the assignment. Move Scer.fna, Sarb.fna and
Seub.fna to this new directory. This is probably most easily done in
the file manager using Cut and Paste, although you could do it at
the command line (e.g. mv Scer.fna ../compare).
Use 'cd' to go to your tutorials/compare directory for all remaining
steps.
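At the command line, the steps above could be sketched as follows. The function name move_to_compare is hypothetical, and the sketch assumes that you start in tutorials/getgenome, so that ../compare is tutorials/compare.

```shell
# Run from tutorials/getgenome, so that ../compare is tutorials/compare.
move_to_compare() {
    mkdir -p ../compare &&                          # create the new directory
    mv Scer.fna Sarb.fna Seub.fna ../compare/ &&    # move the renamed genomes
    cd ../compare                                   # work there from now on
}
```

Because each command is joined to the next with '&&', the function stops at the first step that fails, rather than changing directory with files missing.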
3. (4 points) Create pairwise dotplots comparing S. cer vs. S. arb and S. cer vs. S. eub.
This part is done using procedures covered in Comparing
genomes using dotplots.
- Run lastdb to create a database for S. cerevisiae called Scer.
(Note: Use the files created by shortnames.sh (i.e. Scer.fna,
Sarb.fna and Seub.fna), NOT the original files downloaded from
NCBI.)
- Compare S. arb and S. eub genomes to S. cer, using the S. arb
and S. eub genomes as input,
- run lastal to create an alignment file (.maf)
- run last-dotplot to create an image file (.png)
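In their plainest form, the steps listed above might look like the sketch below. The helper name dotplot_vs_scer and the output file names are made up for illustration, and the tutorial may call for additional options, so treat the tutorial as the authority.

```shell
# Build the reference database once: lastdb <database name> <FASTA file>
#   lastdb Scer Scer.fna

# Hypothetical helper: align one query genome to the Scer database,
# then draw the dotplot from the resulting alignment file.
# Usage: dotplot_vs_scer Sarb
dotplot_vs_scer() {
    lastal Scer "${1}.fna" > "${1}_vs_Scer.maf" &&
    last-dotplot "${1}_vs_Scer.maf" "${1}_vs_Scer.png"
}
```

Running 'dotplot_vs_scer Sarb' and then 'dotplot_vs_scer Seub' would give one .maf file and one .png file per comparison.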
Include both dotplots in your report. It is probably best if each
dotplot is on a separate page. Use a graphics program such as
LibreOffice Draw to add labels, arrows, circles etc. to your
plot to indicate important features, and save the image as a
file that could be imported into your report.
Note: The order of
chromosomes in these dotplots will not necessarily be the
order in which they appear in the input file! last-dotplot
appears to sort the chromosomes alphabetically by name.
Check the chromosome numbers on the output. In contrast,
Mauve appears to display the chromosomes in the order in
which they appear in the input files.
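To see what an alphabetical ordering would do to the Roman numerals, you can sort the names yourself. The sketch below assumes a plain character-by-character sort, which may not be exactly the rule last-dotplot uses, so always check the labels on your own plots. The function name sorted_names is made up for illustration.

```shell
# Print the sixteen chromosome names in plain alphabetical (byte) order.
sorted_names() {
    printf '%s\n' I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI |
        LC_ALL=C sort | tr '\n' ' '
}
sorted_names; echo
# prints: I II III IV IX V VI VII VIII X XI XII XIII XIV XV XVI
```

Under that assumption, IX is listed directly after IV, ahead of V through VIII, while the other chromosomes keep their usual relative order.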
4. (4 points) Create a Mauve alignment comparing all three genomes
This part is done using procedures covered in Comparing
genomes using Mauve.
In Mauve, choose File --> Align with progressiveMauve
and read in the three genomic sequences. S. cer. should be read
first so that it is used as the reference genome. Use the name
CerArbEub as the basename for output files.
Export graphic images of your Mauve alignment comparing the three
genomes. These will be included in your report. Running Mauve is
mainly helpful for verifying which locally collinear blocks (LCBs) in
one genome correspond to which LCBs in another genome.
Hint: When you mouse-over
any LCB in a genome, the Roman numeral of the chromosome in
which it is found will appear at the bottom of the Mauve
window. This is important for finding out, in each genome,
which chromosome an LCB belongs to.
Indicate in the figure which LCBs represent the major
rearrangements. Use circles, arrows and labels, or other graphical
annotation to make this clearer. This could be done either
by importing your raw image into a drawing program such as
LibreOffice Draw, PowerPoint, Photoshop etc. or by hand. The
final annotated image should be imported into your report.
5. (4 points) Write up your results in a report.
Describe the major chromosomal rearrangements that you have
found, citing the evidence from the dotplots and from the Mauve
results. Chromosomal rearrangements might include insertions,
duplications, deletions, inversions or translocations. If you see
translocations, are they reciprocal?
Your results should be written using one of the following template
files:
These files contain places for pasting in output and screenshots,
as part of your report.
You can write the report on the Linux system using LibreOffice, found in
the Applications --> Office menu. Alternatively, you
can install the FileZilla client on your own computer and download
your files from your Linux account, to write the report there.
6. (4 points) Quality of presentation
Quality will include:
- Everything must be legible.
- Clarity of discussion. Remember, longer is not always better.
Use words economically with precise terminology and a coherent
chain of logic. Organize your thoughts first, then write.
- Figures and other graphics should present the evidence in a
readable way. When you're writing up scientific results, a bit
of creative labeling of the figure can save several paragraphs
of text. This assignment is in part designed to develop those
visual communication skills. Make the figure stand on its own.
Ask the question: "If someone just looked at the figure and
didn't read the text could they still understand what the
results are telling us?".
- When possible, paste in text, rather than a screenshot. This
saves disk space. As well, text in graphics can often be blurry.
- Screenshot does NOT mean taking a picture of the screen using
a phone. The result is usually of poor quality. The Mauve
tutorial describes how to export screen images to a file.
Submitting your assignment
- Make sure your assignment report includes your name, student
number and UM email address.
- Save your assignment report as a PDF file, and upload it to
the PLNT3140 UMLearn dropbox site in the Bioinformatics folder.
Files in word processing formats (.doc, .docx, .rtf, .odt) are NOT
acceptable.
Note on grading: In assigning a grade, some consideration
may be given to how the answer communicates your ideas. Keep in
mind the following:
- use of precise and appropriate biological terms
- organization of ideas in your answer, so that a clear chain of
logic is apparent. Formatting tools such as subheadings or
bullet points help to show the reader the structure of your
ideas
- ambiguity - Try to read your answers from the viewpoint of
your reader. Are there several possible ways to interpret what
you have said?
Note on academic integrity: The results in this assignment
obviously are derived from the research literature. It will be
considered a breach of academic integrity to search for the paper on
the Internet and simply copy the author's conclusions from the
paper.
If you don't understand how to do something, call me or stop by
my office, or send me a message at frist@cc.umanitoba.ca. Also,
remember that you can read documentation for each program using
links from the tutorials.