Preprocessing of sequencing reads

Jan. 13, 2019

References

FastQC - http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

trim_galore User's Guide
trim_galore manual

SeqKit

0. Obtain sequencing read files

DATASET: Fakankun I et al. Ph.D. thesis, University of Manitoba (in progress) Rhodosporidium diobovatum.
This dataset is a random sample of about 5% of the reads from fungal genomic DNA.

raw read files (Illumina, paired end)	insert size (nt)
DL300_S1_L001_R1_001_sample.fastq.gz DL300_S1_L001_R2_001_sample.fastq.gz	300
DL400_S2_L001_R1_001_sample.fastq.gz DL400_S2_L001_R2_001_sample.fastq.gz	400
DL700_S3_L001_R1_001_sample.fastq.gz DL700_S3_L001_R2_001_sample.fastq.gz	700
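
Once the files have been downloaded, a quick check can confirm that both mates of each pair are present. The following is a minimal sketch in POSIX shell, not part of the original tutorial: the function name check_pairs is hypothetical, and the library names and insert sizes are copied from the table above.

```shell
# check_pairs DIR: for each library in the table, report whether both
# the R1 and R2 files are present in DIR (defaults to the current directory).
# The function name is hypothetical; file names follow the table above.
check_pairs() {
    dir=${1:-.}
    for entry in DL300_S1:300 DL400_S2:400 DL700_S3:700; do
        prefix=${entry%%:*}     # library name, e.g. DL300_S1
        insert=${entry##*:}     # insert size in nt, e.g. 300
        r1="${prefix}_L001_R1_001_sample.fastq.gz"
        r2="${prefix}_L001_R2_001_sample.fastq.gz"
        if [ -f "$dir/$r1" ] && [ -f "$dir/$r2" ]; then
            status=ok
        else
            status=missing
        fi
        # One tab-separated line per library: name, insert size, status
        printf '%s\t%s\t%s\n' "$prefix" "$insert" "$status"
    done
}

check_pairs .
```

A library reported as "missing" lacks at least one of its two read files in the directory that was checked.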

Since genome assembly requires several steps, it is best to organize the files for each step into a separate directory. In addition to keeping things simple, this organization gives us more flexibility for saving files offline, compressing files, or ignoring sets of files for backup.

In your tutorials directory, create a new directory called blreads/genome/raw, and then go to that directory:

mkdir blreads/genome
cd blreads/genome
mkdir raw
cd raw

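
The same directory layout can also be created in a single step. This is a minimal sketch, assuming a POSIX shell started in your tutorials directory; the -p option makes mkdir create any missing parent directories (including blreads, if it does not exist yet) and does not complain if they are already there.

```shell
# Create blreads/genome/raw, including any missing parent directories
mkdir -p blreads/genome/raw
# Move into the new directory and print the current location to confirm
cd blreads/genome/raw
pwd
```
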
Next, download the fastq.gz files found at

TUTORIAL: Genome Assembly
