Bioinformatics Lab Assignment

Detection of Genome Rearrangements in Yeast Species

This assignment is worth 20% of the lab grade.

Due date: December 11 (Mon.), 12 (Tue.) or 13 (Wed.), 2017.

Goal: To discover chromosomal rearrangements since the divergence of three yeast species, Saccharomyces cerevisiae, S. arboricola and S. eubayanus, from a common ancestor.

1. (1 point) Download the Fasta Genomic sequence for Saccharomyces eubayanus using the procedure described in the tutorial Finding and retrieving complete eukaryotic genomes.

Download GCF_001298625.1_SEUB3.0_genomic.fna.gz to your tutorials/getgenome directory, and run gunzip to decompress the file.
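If you have not used gunzip before, the following sketch shows what it does, using a tiny made-up file rather than the real genome. The file name demo.fna.gz and its contents are assumptions for illustration only.

```shell
# Work in a scratch directory so nothing in getgenome is touched.
workdir=$(mktemp -d)
cd "$workdir"

# Build a tiny gzipped Fasta file (demo.fna.gz is a made-up name).
printf '>demo_sequence\nACGTACGT\n' | gzip > demo.fna.gz

# gunzip replaces demo.fna.gz with the uncompressed file demo.fna.
gunzip demo.fna.gz

# Print the first definition line to confirm the result is readable text.
first_line=$(head -n 1 demo.fna)
echo "$first_line"
```

For the real download, the corresponding command would be gunzip GCF_001298625.1_SEUB3.0_genomic.fna.gz, run in your tutorials/getgenome directory.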

2. (2 points) Create new genome files with simpler chromosome names

The chromosomes in the Fasta files that were downloaded from NCBI use Accession numbers as names, which are too long to fit on the dotplots, and also confusing to use. You can create copies of these files in which the names are changed to the Roman numerals I - XVI, as described below.

Download to your getgenome directory. This is a script, in other words, a file containing bash commands to execute. The purpose of this script is to strip out everything from the definition lines for each sequence, and leave only the Roman numerals. (The script will not change the definition lines for unmatched scaffolds and for the mitochondrial genomes.)
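To give a feel for the kind of editing such a script performs, here is a minimal sketch using sed on a toy file. This is not the course script, whose contents are not shown here; the definition lines, accession numbers and the sed pattern below are all assumptions made up for illustration.

```shell
# Toy input; these definition lines are invented, not copied from the real files.
demo=$(mktemp)
cat > "$demo" <<'EOF'
>NC_000001.1 Genus species strain chromosome IV, complete sequence
ACGTACGT
>NC_000002.1 Genus species strain mitochondrion, complete genome
GGCC
EOF

# On definition lines that contain "chromosome <Roman numeral>", replace the
# whole line with '>' followed by the numeral. Lines without the word
# "chromosome" (such as the mitochondrion line) are left unchanged.
renamed=$(sed -E 's/^>.*chromosome ([IVX]+).*$/>\1/' "$demo")
printf '%s\n' "$renamed"
```

In this toy case the first definition line becomes >IV, while the mitochondrial definition line and the sequence lines pass through untouched.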

You need to run the chmod command to make this file executable before you can run it:

chmod u+rx

As a check, list the file with ls -l; the file permissions should be rwx for the owner, as shown below:

-rwx------ 1 frist drr      381 Dec 15 11:14

Now you can use the script to change the names in each of the three genome files. When we run the script, we must precede the name of the command with './', which tells the shell to look for this command in the current directory (represented by '.').
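The chmod and './' steps can be tried out safely on a throwaway script first. In this sketch, hello.sh is a hypothetical stand-in and has nothing to do with the renaming script used in the assignment.

```shell
# Work in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir"

# Create a two-line bash script (hello.sh is a made-up example).
printf '#!/bin/bash\necho "hello from the script"\n' > hello.sh

# Give the owner read and execute permission, then inspect the permissions.
chmod u+rx hello.sh
ls -l hello.sh

# Run the script from the current directory using the './' prefix.
out=$(./hello.sh)
echo "$out"
```

Without the './' prefix, the shell would search only the directories listed in your PATH, which normally does not include the current directory.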

The commands to create new copies of the three files will be:

./ GCF_000146045.2_R64_genomic.fna Scer.fna
./ GCF_000292725.1_SacArb1.0_genomic.fna Sarb.fna
./ GCF_001298625.1_SEUB3.0_genomic.fna Seub.fna

Next, to verify that the script worked, use the grep command to search for name lines, which begin with '>', and send the output to a file. For example,

cat Scer.fna | grep '>'  >  Scer.contignames

prints the contents of Scer.fna using the cat command. The output of cat is piped into the grep command, which searches for lines containing the '>' (greater-than) character, and the output of grep is written to a file called Scer.contignames. The file should look something like this:

>NC_001224.1 Saccharomyces cerevisiae S288c mitochondrion, complete genome

Use similar commands to create output files listing the name lines for the other two genomes. The three sets of output should be pasted into your template document.
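The grep step can be rehearsed on a toy file, as sketched below. The file toy.fna and its three sequences are invented for illustration. The sketch also shows two optional variations that are not required by the assignment: grep can read the file directly without cat, and grep -c counts the matching lines.

```shell
# Work in a scratch directory with a made-up three-sequence Fasta file.
workdir=$(mktemp -d)
cd "$workdir"
printf '>I\nACGT\n>II\nGGCC\n>III\nTTAA\n' > toy.fna

# Same result as "cat toy.fna | grep '>'": write the name lines to a file.
grep '>' toy.fna > toy.contignames
cat toy.contignames

# Count the name lines; for this toy file there are three.
count=$(grep -c '>' toy.fna)
echo "$count"
```

Counting the name lines is a quick way to check that no sequences were lost when the definition lines were rewritten.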

Next, create a new directory called tutorials/compare for the remaining steps of the assignment. Move Scer.fna, Sarb.fna and Seub.fna to this new directory. This is probably most easily done in the file manager using Cut and Paste, although you could do it at the command line (e.g., mv Scer.fna ../compare).

Use 'cd' to go to your tutorials/compare directory for all remaining steps.
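For those who prefer the command line, the directory steps might look like the sketch below. It runs inside a temporary directory with empty placeholder files, so the paths are assumptions that only mimic the tutorials layout; in your own account you would start from your real tutorials/getgenome directory.

```shell
# Mimic the tutorials layout inside a scratch directory.
base=$(mktemp -d)
mkdir -p "$base/tutorials/getgenome"
cd "$base/tutorials/getgenome"

# Empty placeholders standing in for the three renamed genome files.
touch Scer.fna Sarb.fna Seub.fna

# Create the new directory next to getgenome and move the files into it.
mkdir ../compare
mv Scer.fna Sarb.fna Seub.fna ../compare

# Change to the compare directory and list its contents.
cd ../compare
ls
```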

3. (5 points) Create pairwise dotplots comparing S. cer. vs. S. arb. and S. cer. vs. S. eub.

This part is done using procedures covered in Comparing genomes using dotplots. Include both dotplots in your report. It is probably best if each dotplot is on a separate page. Optional:  If you wish, you may use a graphics program such as LibreOffice Draw to add labels, arrows, circles etc. to your plot, and save the image as a file that could be imported into your report.

Note: The order of chromosomes in these dotplots will not necessarily be the order in which they appear in the input file! last-dotplot appears to sort the chromosomes alphabetically by name. Check the chromosome numbers on the output. In contrast, Mauve appears to display the chromosomes in the order in which they appear in the input files.
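To see what an alphabetical ordering of Roman numerals looks like, you can sort the sixteen chromosome names with the standard sort command, as in this sketch. It only illustrates alphabetical sorting in general; it is an assumption that the ordering on your dotplot will match it exactly, so always check the labels on your own output.

```shell
# Sort the names I - XVI alphabetically (byte order) and join them on one line.
sorted=$(printf '%s\n' I II III IV V VI VII VIII IX X XI XII XIII XIV XV XVI \
         | LC_ALL=C sort | paste -sd ' ' -)
echo "$sorted"
# → I II III IV IX V VI VII VIII X XI XII XIII XIV XV XVI
```

Notice that IX lands between IV and V, so chromosomes that are neighbours numerically are not necessarily neighbours on an alphabetically sorted axis.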

4. (5 points) Create a Mauve alignment comparing all three genomes

This part is done using procedures covered in Comparing genomes using Mauve.

In Mauve, choose File --> Align with progressive Mauve and read in the three genomic sequences. S. cer. should be read first so that it is used as the reference genome. Use the name CerArbEub as the basename for output files.

Export graphic images of your Mauve alignment comparing the three genomes. These will be included in your report. Running Mauve is mainly helpful for verifying which LCBs (locally collinear blocks) in one genome correspond to which LCBs in another genome.

Hint: When you mouse-over any LCB in a genome, the Roman numeral of the chromosome in which it is found will appear at the bottom of the Mauve window. This is important for finding out, in each genome, which chromosome an LCB belongs to.

5. (5 points) Write up your results in a report.

Describe the major chromosomal rearrangements that you have found, citing the evidence from the dotplots and from the Mauve results. Chromosomal rearrangements might include insertions, duplications, deletions, inversions or translocations. If you see translocations, are they reciprocal?

Your results should be written using one of the following template files:

These files contain places for pasting in output and screenshots, as part of your report.
You can write the report on the Linux system using LibreOffice, found in the Applications --> Office menu. Alternatively, you can install the FileZilla client on your own computer, download your files from your Linux account, and write the report there.

6. (2 points) Quality of presentation

Quality will include:

Submitting your assignment

Note on grading: In assigning a grade, some consideration may be given to how the answer communicates your ideas. Keep in mind the following:

Note on academic integrity: The results in this assignment obviously are derived from the research literature. It will be considered a breach of academic integrity to search for the paper on the Internet and simply copy the author's conclusions from the paper.

If you don't understand how to do something, call me, stop by my office, or send me a message. Also, remember that you can read the documentation for each program using the links from the tutorials.