
TUTORIAL: Genome Assembly

  June 10, 2019



Fristensky B (2019) Genomic Sequencing and Assembly. PLNT4610 Bioinformatics lecture notes, Univ. of Manitoba.
If you have never done genome assembly before, this lecture is a good starting point for understanding the process, parameters and potential problems.

Goal: Assemble a genome from raw sequencing reads.


A simplified workflow for genome assembly is shown at right.

Genome assembly is carried out within the genome directory, with a separate subdirectory for each major step. By convention, each subdirectory's name records the series of programs used to generate the results it contains.
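Here is a minimal Python sketch of this naming convention. It creates a step subdirectory whose name joins the chain of steps with dots. The helper name `step_dir` is hypothetical. The dot separator is an assumption taken from the `reads.trim_galore` example.

```python
import tempfile
from pathlib import Path


def step_dir(genome_dir: Path, programs: list[str]) -> Path:
    """Create (if needed) and return the subdirectory for one assembly step.

    The directory name joins the step chain with '.', so
    ["reads", "trim_galore"] becomes "reads.trim_galore".
    """
    path = genome_dir / ".".join(programs)
    path.mkdir(parents=True, exist_ok=True)
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory; "genome" stands in for the real project dir.
    with tempfile.TemporaryDirectory() as tmp:
        genome = Path(tmp) / "genome"
        for chain in (["raw"], ["reads", "trim_galore"]):
            print(step_dir(genome, chain).name)
```

Calling `step_dir` again for an existing step is harmless, because `exist_ok=True` leaves the directory and its contents alone.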

Raw read files are saved in the raw directory, and symbolic links with short, meaningful names are created to point to them.
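The linking step can be sketched in Python with `pathlib`; at the shell, `ln -s` does the same job. This sketch assumes the links live in the same raw directory as the files they point to. The helper `link_raw_reads` and the file names in the demo are hypothetical.

```python
import tempfile
from pathlib import Path


def link_raw_reads(raw_dir: Path, aliases: dict[str, str]) -> list[Path]:
    """Create short symbolic links in raw_dir to long raw-read file names.

    aliases maps each original file name (in raw_dir) to its short alias.
    Existing links are left untouched.
    """
    links = []
    for original, alias in aliases.items():
        link = raw_dir / alias
        if not link.is_symlink():
            # Relative target, so the link still works if raw_dir is moved.
            link.symlink_to(original)
        links.append(link)
    return links


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        raw = Path(tmp) / "raw"
        raw.mkdir()
        # Hypothetical long file name, as a sequencing facility might deliver it.
        long_name = "RUN01_sampleA_S1_L001_R1_001.fastq.gz"
        (raw / long_name).write_bytes(b"")
        for link in link_raw_reads(raw, {long_name: "A_R1.fq.gz"}):
            print(link.name, "->", link.readlink())
```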

Sequencing adaptors are removed from the raw reads by trim_galore and the files containing the trimmed reads are saved in reads.trim_galore. (Alternatively, Trimmomatic could be used at this step.)
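To illustrate what adaptor removal does, here is a deliberately simplified Python sketch. It cuts an exact copy of the adaptor, or a partial adaptor at the 3' end of the read, together with everything after it. This is not how trim_galore works internally: real trimmers also tolerate mismatches and perform quality trimming.

The helper name and its `min_overlap` parameter are hypothetical. The adaptor shown is the 13 bp Illumina universal adapter prefix `AGATCGGAAGAGC`.

```python
# First 13 bp of the Illumina universal adapter.
ILLUMINA_ADAPTER = "AGATCGGAAGAGC"


def trim_adapter(read: str, adapter: str = ILLUMINA_ADAPTER, min_overlap: int = 1) -> str:
    """Remove an adaptor and everything 3' of it, using exact matching only.

    A full copy of the adaptor anywhere in the read is cut first; otherwise
    the longest adaptor prefix (>= min_overlap bases) that ends the read is cut.
    """
    start = read.find(adapter)
    if start != -1:
        return read[:start]
    # Guard: k = 0 would make read[:-0] return an empty string.
    min_overlap = max(min_overlap, 1)
    for k in range(min(len(adapter) - 1, len(read)), min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read


if __name__ == "__main__":
    print(trim_adapter("ACGTACGTAGATCGGAAGAGCTTT"))  # full adaptor inside the read
    print(trim_adapter("ACGTACGTAGATC"))             # partial adaptor at the 3' end
```

With `min_overlap=1`, a single trailing `A` is also removed. Raising `min_overlap` makes the trimming more conservative, at the cost of leaving short adaptor fragments in place.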

The trimmed reads are used as input by pollux, which corrects errors in the reads and writes the corrected reads to the pollux directory.
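Many read error correctors rely on k-mer counts. With enough coverage, a k-mer containing a sequencing error is seen far less often than the true k-mers around it. The toy Python sketch below illustrates this idea of k-mer-spectrum error detection: it flags low-count k-mers and locates a single suspected substitution. It is not Pollux's actual algorithm, and it does not attempt the correction itself. The parameters `k` and `threshold` are hypothetical.

```python
from collections import Counter
from typing import Optional


def kmer_counts(reads: list[str], k: int) -> Counter:
    """Count every k-mer across all reads."""
    counts: Counter = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_kmer_starts(read: str, counts: Counter, k: int, threshold: int) -> list[int]:
    """Start positions of k-mers in read that were seen fewer than threshold times."""
    return [i for i in range(len(read) - k + 1) if counts[read[i:i + k]] < threshold]


def suspect_position(read: str, counts: Counter, k: int, threshold: int) -> Optional[int]:
    """Return the single position shared by all weak k-mers, else None.

    For a substitution at index j away from the read ends, the k-mers starting
    at j-k+1 .. j are weak, and the only index all of them cover is j.
    """
    weak = weak_kmer_starts(read, counts, k, threshold)
    if not weak:
        return None
    # Intersection of the windows [i, i + k - 1] for every weak start i.
    lo, hi = max(weak), min(weak) + k - 1
    return lo if lo == hi else None


if __name__ == "__main__":
    reads = ["ACGTACGGTC"] * 5 + ["ACGTTCGGTC"]  # last read has an error at index 4
    counts = kmer_counts(reads, k=4)
    print(suspect_position("ACGTTCGGTC", counts, k=4, threshold=2))
```

Ambiguous cases return `None` rather than guessing. These include errors near a read end and reads carrying more than one error.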

At each step, FastQC is run to check the properties of the reads.
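One of the properties FastQC reports is per-base sequence quality. Here is a minimal Python sketch of that calculation; it is not FastQC's implementation. It assumes Phred+33 quality encoding and plain 4-line FASTQ records.

```python
def mean_quality_by_position(fastq_lines: list[str], offset: int = 33) -> list[float]:
    """Mean Phred score at each read position across 4-line FASTQ records.

    Assumes Phred+33 encoding (offset=33); reads may differ in length.
    """
    sums: list[int] = []
    counts: list[int] = []
    for qual in fastq_lines[3::4]:  # every 4th line is a quality string
        for i, ch in enumerate(qual.rstrip("\n")):
            if i == len(sums):
                sums.append(0)
                counts.append(0)
            sums[i] += ord(ch) - offset
            counts[i] += 1
    return [s / c for s, c in zip(sums, counts)]


if __name__ == "__main__":
    records = ["@r1", "ACG", "+", "II5", "@r2", "AC", "+", "I+"]
    print(mean_quality_by_position(records))  # [40.0, 25.0, 20.0]
```

For a compressed file, the lines could be read with `gzip.open(path, "rt").read().splitlines()`. Holding every line in memory is only reasonable for small test files.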

Finally, the assembly itself is done using a program such as ABySS, SPAdes or SOAPdenovo2, each of which produces contig and scaffold files. For each assembly, quast produces a report with extensive statistics that can guide the choice of the best assembly, or of which assemblies should be repeated with different parameters for improvement.
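N50 is one of the headline statistics in a quast report and a common first point of comparison between assemblies. The Python sketch below computes it from a list of contig (or scaffold) lengths, to show what the number means. It does not reproduce any length filtering quast applies, or its other metrics.

```python
def n50(lengths: list[int]) -> int:
    """Length of the shortest contig in the smallest set of longest contigs
    whose lengths sum to at least half of the total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty assembly


if __name__ == "__main__":
    print(n50([100, 200, 300, 400]))  # 300
```

Comparing the N50 of two assemblies is only meaningful when their total lengths are similar. A higher N50 from an assembly that discards much of the genome is not an improvement.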

The next 3 tutorials implement these steps: