Bioinformatics Lab Assignment
Detection of Genome Rearrangements in
This assignment is worth 20% of the lab grade.
Due date: December 3 (Mon.), 4 (Tue.) or 5 (Wed.),
Goal: To discover chromosomal rearrangements since the divergence
of three yeast species, Saccharomyces cerevisiae, S.
arboricola and S. eubayanus, from a common ancestor.
1. Download the Fasta Genomic sequence
for Saccharomyces eubayanus using the procedure
described in the tutorial Finding and retrieving complete
Download GCF_001298625.1_SEUB3.0_genomic.fna.gz to your
tutorials/getgenome directory, and run gunzip to decompress the
2. (3 points) Create new genome files with
simpler chromosome names
The chromosomes in the Fasta files that were downloaded from NCBI
use Accession numbers as names, which are too long to fit on the
dotplots, and also confusing to use. You can create copies of these
files in which the names are changed to the Roman numerals I - XVI,
as described below.
Download shortnames.sh to your
getgenome directory. This is a script, in other words, a file
containing bash commands to execute. The purpose of this script is
to strip out everything from the definition lines for each sequence,
and leave only the Roman numerals. For unmatched scaffolds, the name
will be S followed by the number. Mitochondrial names will simply be
the letter 'M'.
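To get a feel for this kind of renaming, here is a minimal sketch that uses sed on a tiny made-up file. It is NOT the contents of shortnames.sh, which may work differently. The file names demo.fna and demo_short.fna and the three definition lines are illustrative assumptions written in the general style of NCBI definition lines, and the sketch handles only chromosomes and the mitochondrion, not unmatched scaffolds.

```shell
# Hypothetical sketch only; this is not the course's shortnames.sh.
# Work in a scratch directory so no real genome files are touched.
demo_dir=$(mktemp -d)

# Tiny made-up FASTA file with long, NCBI-style definition lines.
cat > "$demo_dir/demo.fna" <<'EOF'
>NC_001133.9 Saccharomyces cerevisiae S288C chromosome I, complete sequence
ACGTACGT
>NC_001148.4 Saccharomyces cerevisiae S288C chromosome XVI, complete sequence
GGCCTTAA
>NC_001224.1 Saccharomyces cerevisiae S288c mitochondrion, complete genome
TTAACCGG
EOF

# On definition lines only (those starting with '>'):
#   1st expression keeps just the Roman numeral that follows "chromosome ".
#   2nd expression renames a mitochondrial entry to M.
sed -E \
  -e '/^>/ s/^>.*chromosome ([IVX]+).*$/>\1/' \
  -e '/^>/ s/^>.*mitochondrion.*$/>M/' \
  "$demo_dir/demo.fna" > "$demo_dir/demo_short.fna"

# Show the new, short definition lines.
grep '^>' "$demo_dir/demo_short.fna"
```

The sequence lines pass through unchanged, because both sed expressions are restricted to lines that begin with '>'.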
You need to run the chmod command to make this file executable
before you can run it:
chmod u+rx shortnames.sh
As a check, list the file with its permissions (for example, with
ls -l shortnames.sh). The file permissions should be rwx for the
owner, as shown below:
-rwx------ 1 frist drr 381 Dec 15 11:14 shortnames.sh
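If you want to try the permission change without touching the real script, the sketch below repeats it on a throwaway file. The file name demo.sh and the scratch directory are assumptions made for illustration, and the starting permissions are set explicitly to rw for the owner only, which may differ from what your download has.

```shell
# Scratch directory and a hypothetical stand-in script (not shortnames.sh).
perm_dir=$(mktemp -d)
printf '%s\n' '#!/bin/bash' 'echo "hello"' > "$perm_dir/demo.sh"

# Start from read/write for the owner only.
chmod u=rw,go= "$perm_dir/demo.sh"
ls -l "$perm_dir/demo.sh"

# Same change as in the assignment: add read and execute for the owner.
chmod u+rx "$perm_dir/demo.sh"
ls -l "$perm_dir/demo.sh"

# The first ten characters of the listing are the permission string.
perms=$(ls -l "$perm_dir/demo.sh" | cut -c1-10)
echo "$perms"
```

The second listing should begin with -rwx------, matching the form of the listing shown for shortnames.sh.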
Now, you can use the script to change the names in each of the three
genome files. When we run shortnames.sh, we must precede the name of
the command with './', which tells the shell to look for this
command in the current directory (represented by '.').
The commands to create new copies of the three files will be:
./shortnames.sh GCF_000146045.2_R64_genomic.fna Scer.fna
Next, to verify that the scripts worked, use the grep command to
search for name lines which begin with '>', and send the output
to a file. For example,
cat Scer.fna | grep '>' > Scer.contignames
prints the contents of Scer.fna using the cat command. The output of
cat is piped into the grep command, which searches for lines
containing the '>' character, and the output of grep is
written to a file called Scer.contignames. The file should look
something like this:
Use similar commands to create output files listing the name lines
for the other two genomes. The three sets of output should be pasted
into your template document.
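The grep step described above can be tried on a small made-up file before running it on a whole genome. In this sketch the file names demo.fna and demo.contignames and the three short sequences are assumptions for illustration. It also shows two optional variations: grep can read the file directly without cat, and the pattern '^>' matches '>' only at the start of a line.

```shell
# Scratch directory with a tiny made-up FASTA file.
grep_dir=$(mktemp -d)
cat > "$grep_dir/demo.fna" <<'EOF'
>I
ACGTACGTAC
>II
GGCCTTAAGG
>M
TTAACCGGTT
EOF

# Write the name lines to a file, as in the assignment's example.
grep '^>' "$grep_dir/demo.fna" > "$grep_dir/demo.contignames"
cat "$grep_dir/demo.contignames"

# Count the name lines; this equals the number of sequences in the file.
n_seqs=$(grep -c '^>' "$grep_dir/demo.fna")
echo "$n_seqs"
```

Counting the name lines is a quick way to confirm that the renamed file still has the same number of sequences as the file you started from.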
Next, create a new directory called tutorials/compare, for the
remaining steps of the assignment. Move Scer.fna, Sarb.fna and
Seub.fna to this new directory. This is probably most easily done in
the file manager using Cut and Paste, although you could do it at
the command line (e.g., mv Scer.fna ../compare).
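For those who prefer the command line, the steps might look like the sketch below. It runs in a scratch directory with empty placeholder files so that it can be tried safely; the scratch location is an assumption, and in your own account you would start from your real tutorials/getgenome directory instead.

```shell
# Scratch stand-in for your home directory (hypothetical location).
base_dir=$(mktemp -d)
mkdir -p "$base_dir/tutorials/getgenome"
cd "$base_dir/tutorials/getgenome"

# Empty placeholders for the three renamed genome files.
touch Scer.fna Sarb.fna Seub.fna

# Create the new directory first, then move all three files into it.
# The trailing slash makes mv fail if ../compare is not a directory.
mkdir -p ../compare
mv Scer.fna Sarb.fna Seub.fna ../compare/

# Change to the new directory and list its contents.
cd ../compare
ls
```

Creating the directory before running mv matters: if ../compare did not exist, mv Scer.fna ../compare would rename the file to "compare" rather than move it into a directory.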
Use 'cd' to go to your tutorials/compare directory for all remaining
3. (5 points) Create pairwise dotplots
comparing S. cer. vs. S. arb. and S. cer. vs. S. eub.
This part is done using procedures covered in Comparing
genomes using dotplots.
Include both dotplots in your report. It is probably best if each
dotplot is on a separate page. Optional: If you wish, you may
use a graphics program such as LibreOffice Draw to add labels,
arrows, circles etc. to your plot, and save the image as a file that
could be imported into your report.
- Run lastdb to create a database for S. cerevisiae called Scer.
- Compare S. arb and S. eub genomes to S. cer, using the S. arb
and S. eub genomes as input,
- run lastal to create an alignment file (.maf)
- run last-dotplot to create an image file (.png)
Note: The order of chromosomes in these dotplots will not
necessarily be the order in which they appear in the input file!
last-dotplot appears to sort the chromosomes alphabetically by name.
Check the chromosome numbers on the output. In contrast, Mauve
appears to display the chromosomes in the order in which they appear
in the input files.
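To see what an alphabetical ordering does to Roman numerals, the short sketch below sorts the names I to XVI with the standard sort command. It only illustrates alphabetical order in general; it does not run last-dotplot, and whether last-dotplot orders names in exactly this way is an assumption based on the note above.

```shell
# Sort the sixteen chromosome names in plain byte (alphabetical) order.
# LC_ALL=C makes the ordering independent of the local language settings.
sorted_names=$(printf '%s\n' I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI \
  | LC_ALL=C sort | paste -sd' ' -)
echo "$sorted_names"
```

In this ordering IX lands between IV and V, so a plot whose axes are sorted alphabetically will not show the chromosomes in numerical order.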
4. (5 points) Create a Mauve alignment
comparing all three genomes
This part is done using procedures covered in Comparing
genomes using Mauve.
In Mauve, choose File --> Align with progressive Mauve
and read in the three genomic sequences. S. cer. should be read
first so that it is used as the reference genome. Use the name
CerArbEub as the basename for output files.
Export graphic images of your Mauve alignment comparing the three
genomes. This will be included in your report. Mostly, running Mauve
is helpful to verify which LCBs in one genome correspond to which
LCBs in another genome.
Hint: When you mouse over any LCB in a genome, the Roman numeral
of the chromosome in which it is found will appear at the bottom of
the Mauve window. This is important for finding out, in each genome,
which chromosome an LCB belongs to.
Indicate in the figure which LCBs represent the major
rearrangements. Use circles, arrows and labels, or other graphical
annotation to make this more clear. This could be done either by
importing your raw image into a drawing program such as LibreOffice
Draw, PowerPoint, Photoshop, etc., or by hand. The final
annotated image should be imported into your report.
5. (5 points) Write up your results in a
Describe the major chromosomal rearrangements that you have
found, citing the evidence from the dotplots and from the Mauve
results. Chromosomal rearrangements might include insertions,
duplications, deletions, inversions or translocations. If you see
translocations, are they reciprocal?
Your results should be written using one of the following template
These files contain places for pasting in output and screenshots,
as part of your report.
You can write the report on the Linux system using LibreOffice,
found in the Applications --> Office menu. Alternatively, you
can install the FileZilla
client on your own computer and download your files from your
Linux account, to write the report there.
6. (2 points) Quality of presentation
Quality will include:
- Clarity of discussion
- Figures and other graphics should present the evidence in a
- Make sure your assignment report includes your name, student
number and UM email address.
- Save your assignment report as a PDF file, and upload it to
the PLNT3140 UMLearn dropbox site in the Bioinformatics folder.
Files in word processing formats (.doc, .docx, .rtf, .odt) are NOT
Note on grading: In assigning a grade, some consideration
may be given to how the answer communicates your ideas. Keep in
mind the following:
- use of precise and appropriate biological terms
- organization of ideas in your answer, so that a clear chain of
logic is apparent. Formatting tools such as subheadings or
bullet points help to show the reader the structure of your
- ambiguity - Try to read your answers from the viewpoint of
your reader. Are there several possible ways to interpret what
you have said?
Note on academic integrity: The results in this assignment
obviously are derived from the research literature. It will be
considered a breach of academic integrity to search for the paper on
the Internet and simply copy the author's conclusions from the
If you don't understand how to do something, call me or stop by
my office, or send me a message
at firstname.lastname@example.org. Also,
remember that you can read documentation for each program using
links from the tutorials.