BIRCH

TUTORIAL: Genome Assembly

Preprocessing of sequencing reads


  Jan. 13, 2019


References

FastQC - http://www.bioinformatics.babraham.ac.uk/projects/fastqc/

trim_galore User's Guide
trim_galore manual

SeqKit


0. Obtain sequencing read files

DATASET: Rhodosporidium diobovatum, from Fakankun I et al., Ph.D. thesis, University of Manitoba (in progress).
This dataset is a random sample of about 5% of the reads obtained from fungal genomic DNA.

raw read files (Illumina, paired end)     insert size (nt)
------------------------------------      ----------------
DL300_S1_L001_R1_001_sample.fastq.gz      300
DL300_S1_L001_R2_001_sample.fastq.gz      300
DL400_S2_L001_R1_001_sample.fastq.gz      400
DL400_S2_L001_R2_001_sample.fastq.gz      400
DL700_S3_L001_R1_001_sample.fastq.gz      700
DL700_S3_L001_R2_001_sample.fastq.gz      700
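
Because each library consists of an R1 (forward) and an R2 (reverse) file, it can be worth confirming that both mates are present once the files have been downloaded. The following is a minimal sketch, not part of the original tutorial. It assumes only the _R1_/_R2_ naming pattern shown in the table above, and the function names mate_of and check_pairs are hypothetical helpers introduced for illustration.

```shell
# Derive the R2 (reverse) file name from an R1 (forward) file name.
# Assumes the mate files differ only in the _R1_ / _R2_ field.
mate_of() {
    printf '%s\n' "$1" | sed 's/_R1_/_R2_/'
}

# Check every R1 file in the current directory for its R2 mate.
# Returns non-zero if any mate is missing.
check_pairs() {
    pairs_status=0
    for pairs_r1 in *_R1_*.fastq.gz; do
        # If the glob matched nothing, the literal pattern is skipped here.
        [ -e "$pairs_r1" ] || continue
        pairs_r2=$(mate_of "$pairs_r1")
        if [ -e "$pairs_r2" ]; then
            echo "OK       $pairs_r1 + $pairs_r2"
        else
            echo "MISSING  $pairs_r2"
            pairs_status=1
        fi
    done
    return "$pairs_status"
}
```

After defining the functions, running check_pairs in the directory that holds the read files prints one line per R1 file and reports any R2 file that is absent.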

Since genome assembly requires several steps, it is best to organize the files for each step into a separate directory. In addition to keeping things simple, this organization gives us more flexibility for saving files offline, compressing files, or ignoring sets of files for backup.
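
As a rough illustration of that flexibility, the sketch below archives one step's directory on its own and then makes a backup that skips it. It is an assumption-laden example rather than part of the tutorial: it works in a scratch directory created with mktemp, the directory name trimmed is a hypothetical later step, and the example files are placeholders.

```shell
# Work in a scratch directory so nothing in your tutorials directory is touched.
demo=$(mktemp -d)

# Hypothetical layout: raw/ is used in this tutorial; trimmed/ is an assumed
# name for a later step, used here only for illustration.
mkdir -p "$demo/genome/raw" "$demo/genome/trimmed"
echo "raw reads" > "$demo/genome/raw/example.fastq"
echo "trimmed reads" > "$demo/genome/trimmed/example.fastq"

# Archive a single step by itself, e.g. to store it offline.
tar -czf "$demo/raw.tar.gz" -C "$demo/genome" raw

# Back up the project while leaving out the bulky raw reads.
tar -czf "$demo/backup.tar.gz" -C "$demo" --exclude='genome/raw' genome

# List what ended up in the backup.
tar -tzf "$demo/backup.tar.gz"
```

Keeping each step in its own directory is what makes the single --exclude pattern sufficient; with all files in one directory, each file would have to be listed or matched individually.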

In your tutorials directory create a new directory called blreads/genome/raw, and then go to that directory:

mkdir -p blreads/genome
cd blreads/genome
mkdir raw
cd raw

Next, download the fastq.gz files found at