BIRCH


TUTORIAL: Genome Assembly

Read correction


  November 30, 2023




References

Marinier E et al. (2015) Pollux: platform independent error correction of single and mixed genomes. BMC Bioinformatics 16:10. DOI: 10.1186/s12859-014-0435-6

Pollux manual

3. Correct errors in trimmed sequencing reads using Pollux

Pollux offers a number of improvements over previous methods for correcting DNA sequencing reads.
The goal is to perform error correction on the trimmed read files (*_P.fastq) from the previous step. If you closed the blreads window, simply start another instance of blreads in the reads.Trimmomatic directory.


Running pollux

Your blreads window should now appear as shown at right. The read files will have the .fastq file extension.


As before, we need to tell blreads which read files should be paired together for paired-end reads.
We only want to process the paired read files from Trimmomatic. Choose File --> Select All, and then File --> guesspairs.py. Set the target file extension to P.fastq. (In some cases, you may need to specify more than one file extension as a comma-separated list, e.g. .fq,.fastq. Clicking on the Hints button will give a more detailed explanation of these parameters.)
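To make the pairing step concrete, here is a minimal Python sketch of how files could be matched by a target extension. It is not the code of guesspairs.py. The function name guess_pairs and the naming convention (a read number 1 or 2 directly before the extension, as in sample_1P.fastq and sample_2P.fastq) are assumptions made for illustration.

```python
import re
from collections import defaultdict


def guess_pairs(filenames, extensions=("P.fastq",)):
    """Pair files whose names differ only in the read number before the extension.

    Assumption (hypothetical convention): the part of the name before the
    target extension ends in 1 or 2, e.g. sample_1P.fastq / sample_2P.fastq.
    """
    groups = defaultdict(dict)
    for name in filenames:
        for ext in extensions:
            if name.endswith(ext):
                stem = name[: -len(ext)]
                match = re.fullmatch(r"(.*?)([12])", stem)
                if match:
                    prefix, read_number = match.groups()
                    groups[(prefix, ext)][read_number] = name
                break  # only the first matching extension is used

    pairs = []
    for key in sorted(groups):
        members = groups[key]
        # Keep only complete pairs; unmatched files are skipped.
        if "1" in members and "2" in members:
            pairs.append((members["1"], members["2"]))
    return pairs
```

With the target extension P.fastq, unpaired files such as sample_1U.fastq do not match and are left out, which mirrors the intent of processing only the paired read files.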



Clicking on Run will bring up a new blreads window with the best guess of file pairing, in two columns.

To run Pollux with these read pairs, choose Edit --> Select All, and then Reads --> Pollux.



Set "Name for output directory" to ../reads.Trimmomatic.pollux. Remember that "../" will tell BioLegato to write the output directory to the parent of the current working directory.
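As a small illustration of what "../" means, the following Python sketch resolves the output name against a working directory. The function name and the example path /home/user/project are hypothetical; this shows ordinary relative-path resolution, not how BioLegato handles the path internally.

```python
import posixpath


def resolve_output_dir(working_dir, output_name):
    """Resolve a relative output name such as ../X against the working directory."""
    return posixpath.normpath(posixpath.join(working_dir, output_name))


# Hypothetical location of the reads.Trimmomatic directory.
working_dir = "/home/user/project/reads.Trimmomatic"
print(resolve_output_dir(working_dir, "../reads.Trimmomatic.pollux"))
```

The output directory ends up beside reads.Trimmomatic rather than inside it.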

Pollux may take a while to run, so you may wish to set "Notify of completion by email" to Yes and type in an email address.



The output is written to the ../reads.Trimmomatic.pollux directory, whose contents are shown at right.

The report in pollux.log summarizes the number of errors corrected for each type, e.g. insertions, deletions, corrections, and homopolymer corrections.

The high-quality corrected reads are found in files with the .corrected.fq extension. Corrections based on low k-mer counts are written to files with the .low extension, and are generally not used.
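If you want to check the output from a script, the two kinds of files can be separated by extension and the corrected reads counted. This is a minimal sketch, assuming standard four-line FASTQ records (no wrapped sequence lines); the function names are illustrative and are not part of Pollux or BIRCH.

```python
def split_pollux_outputs(filenames):
    """Separate corrected read files from low k-mer count files by extension."""
    corrected = sorted(f for f in filenames if f.endswith(".corrected.fq"))
    low = sorted(f for f in filenames if f.endswith(".low"))
    return corrected, low


def count_fastq_reads(path):
    """Count reads in a FASTQ file, assuming exactly four lines per record."""
    with open(path) as handle:
        line_count = sum(1 for _ in handle)
    return line_count // 4
```

For example, passing the result of os.listdir("../reads.Trimmomatic.pollux") to split_pollux_outputs gives the list of .corrected.fq files to carry forward. Both files of a read pair can then be counted with count_fastq_reads, since downstream tools generally expect the two files of a pair to stay in step.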





Next: Assembly of contigs and scaffolds