-- Version history --

2.3
    1. Changed logic in misassembly computation. Fixed several minor bugs in the misassembly detection algorithm and one major bug caused by the linear representation of circular references and contigs in FASTA format.
    2. Added contig alignment plots. See details in the manual and in the QUAST paper (Fig. 1).
    3. Genome analyzer module (computation of genome fraction, duplication ratio, number of genes and operons) is parallelized.
    4. Option --test now works like an installation utility. It compiles all required binaries and checks that QUAST and metaQUAST run correctly on test datasets.
    5. Former plots.pdf upgraded with report tables and renamed to report.pdf. It is now a single file with all tables and plots generated by QUAST.
    6. A new option --no-plots added to speed up computation when plots are not needed.
    7. GeneMark license updated, instructions for manual updating added.
    8. Generation of misleading single-column histograms removed (they appeared when only a single assembly file was specified).
    9. More error and exception handlers added.
    10. Fixed bug in indel counting (it caused a slightly overestimated indel rate in some cases).
    11. Fixed several minor bugs.
    12. Code refactored.

2.2
    1. The tool now supports metagenomic assemblies. It accepts multiple references and produces several reports:
         - for all contigs and all input genomes merged into one,
         - separate reports for only the contigs aligned to a particular genome,
         - for the contigs not aligned to any reference provided.
       Usage:
           metaquast.py contigs_1 contigs_2 ... -R reference_1,reference_2,reference_3,...
       All other options for metaquast.py are the same as for quast.py.
    2. MetaGeneMark is used to find genes in metagenomic assemblies: by default in metaquast.py, and with the --meta option in quast.py.
    3. A new option --ambiguity-usage (-a) replaces --allow-ambiguity. It lets you specify how ambiguous regions are processed: -a one, -a all or -a none.
    4. A new option --labels (or -l) lets you provide human-readable assembly names. These names are used in reports, plots and logs instead of file names. For example:
           -l SPAdes,IDBA-UD
       If your labels include spaces, use quotes:
           -l "SPAdes 2.5, SPAdes 2.4, IDBA-UD"
           -l SPAdes,"Assembly 2",Assembly3
    5. Minor improvements of HTML reports.
    6. Fixed bugs in the misassembly detection algorithm.

2.1
    1. Option --strict-NA added to control computation of NAx/NGAx metrics. This option forces QUAST to break contigs at any misassembly event, including local misassemblies (as in v.2.0). By default, QUAST v.2.1 breaks contigs only at extensive misassemblies to compute NAx/NGAx (as in v.1.*).
    2. Improved indel computation. QUAST now counts consecutive single-nucleotide indels as one indel. The total length of all indels is also reported (it equals the "# indels" metric as evaluated by previous versions). Short (<= 5 bp) and long (> 5 bp) indels are reported.
    3. Option --est-ref-size added to set the estimated reference size for computing NGx metrics when a reference genome is not available.
    4. GAGE mode is parallelized.
    5. Fixed bugs in the misassembly detection algorithm.
    6. Fixed bugs in the SNP detection algorithm.
    7. Fixed bugs in processing circular chromosomes (affects Genome fraction, # genes, # operons).
    8. Fixed several minor bugs.

2.0
    1. Significantly improved assessment of large genomes. The current limit on the size of a reference genome is 536 Mbp PER CHROMOSOME instead of 536 Mbp TOTAL in the previous versions. Alignment to different chromosomes is performed in parallel.
    2. Changes in the algorithm for evaluating Genome fraction, # genes and operons. Short, ambiguous, and redundant alignments are filtered out before the evaluation. Option --use-all-alignments added for compatibility with 1.* versions.
    3. New algorithm for finding SNPs and indels.
    4. Ability to change colors, line styles, etc. in plots, and the content and metric names in reports.
    5. GlimmerHMM for predicting genes in eukaryotes.
    6. Gene finding is parallelized and its run is controlled by the --gene-finding option.
    7. Improvement of HTML reports and plotting units.
    8. Fixed several bugs.

1.3
    1. QUAST is now a multi-threaded tool: the most time-consuming step (alignment to a reference genome) is computed in parallel.
    2. A MacOS version of GeneMark.
    3. Significantly improved HTML reports.
    4. More informative error messages.
    5. A simple logic for evaluating scaffolds.
    6. New metrics: duplication ratio and largest alignment.
    7. More careful counting of misassemblies.
    8. Min contig threshold changed from 200 to 500.
    9. Fixed several bugs.

1.2
    1. Indels and N's counting.
    2. More detailed statistics on misassemblies (classification into inversions, relocations, translocations, local misassemblies).
    3. More detailed statistics on partially unaligned contigs.
    4. Text reports now also available in LaTeX format.
    5. Python 2.5 now supported.
    6. Fixed bug in reading gene annotations in GFF and NCBI formats.
    7. QUAST can now be rerun on existing Nucmer alignment files.

1.1
    1. Mismatches counting.
    2. Fixed bug in misassemblies counting (some inversions were omitted).
    3. GC content plot is logarithmically scaled.
    4. ORFs are no longer counted; GeneMark added instead (for gene finding, only on Linux).
    5. Nucmer aligner setting changed (from IDY% = 80 to 95, i.e. now all alignments are more robust).

1.0
    Initial open source release!
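
Note on the indel counting introduced in 2.1: the rule (consecutive single-nucleotide indels count as one indel, total length is reported separately, and indels are split into short, <= 5 bp, and long, > 5 bp) can be illustrated with a minimal Python sketch. This is not QUAST's own code. The function name summarize_indels, its input (a list of distinct reference positions, one per single-nucleotide indel) and the keys of the returned dictionary are assumptions made only for this illustration.

```python
def summarize_indels(positions, short_max=5):
    """Group single-nucleotide indel positions into runs and classify them.

    positions: distinct integer coordinates, one per single-nucleotide
    indel (hypothetical input format, not QUAST's internal one).
    short_max: longest run, in bp, still counted as a short indel.
    """
    runs = []  # each run is [start, end], inclusive
    for pos in sorted(positions):
        if runs and pos == runs[-1][1] + 1:
            runs[-1][1] = pos        # adjacent to the previous base: extend the run
        else:
            runs.append([pos, pos])  # gap found: start a new run

    lengths = [end - start + 1 for start, end in runs]
    return {
        "indels": len(lengths),          # consecutive bases count as one indel
        "indels_length": sum(lengths),   # total length of all indels
        "short": sum(1 for n in lengths if n <= short_max),
        "long": sum(1 for n in lengths if n > short_max),
    }


if __name__ == "__main__":
    # Three runs: 10-12 (3 bp), 40 (1 bp), 100-105 (6 bp)
    print(summarize_indels([10, 11, 12, 40, 100, 101, 102, 103, 104, 105]))
```

Under these assumptions the total length equals the number of input positions, which matches the remark in 2.1 that the total length corresponds to what earlier versions reported as the number of indels.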