3.2.1
-----
- while reading NCBI TRACEINFO XML, MIRA now complains if it sees either
  insert_size or insert_stdev but not both
- -AS:mrpc could lead to non-sequential contig numbering, fixed
- change: Sanger naming scheme now expects a postfix length of at least 3
  instead of 4
- error message when reads that are too long are encountered now clarifies
  the size of MAXREADSIZEALLOWED
- change: MIRA now checks read names for funny or illegal characters
- bugfix when loading FASTA files: \r on name lines in DOS/Windows files was
  not handled properly
- bugfix in fast2frag.tcl: when qualities were available at loading, only
  zeros were written to the fragment quality file. Bug entered sometime in
  the 3.1.x series.

3.2.1rc2 (3.3.6)
----------------
- fixed bug introduced in 3.2.1rc1 which led to more disk operations than
  necessary in SKIM hit reduction after the first pass of MIRA.

3.2.1rc1 (3.3.5)
----------------
- "--job=mapping,solexa" did not set really good values for -SK:mhpr,
  sometimes leading to sub-optimal mappings. New value should be good for
  all but the most repetitive eukaryotes.
- "--job=mapping" now uses "-AS:nop=1:rbl=1" to conform more to user
  expectations of a mapping.
- test: reducing SKIM hits now uses 30 GiB at most for the array, regardless
  of memory settings. Furthermore, the array is re-used between passes.
  Should reduce MIRA memory footprint.
- bugfix: when Solexa sequences in FASTA and FASTA quality files were not in
  the same order, wrong assignment of qualities could happen under certain
  circumstances.

3.3.4
-----
- performance bugfix: SKIM reduction in mapping projects with several
  million reads now runs considerably faster (e.g. seconds instead of 45
  minutes when using 15 million reads)
- the SSAHA2 screen reading routines now also read SMALT output files where
  the "-f ssaha" output option was used
- small enhancement: no second round of proposed end clipping if first round
  did not yield any clip.
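Note on the two FASTA loading bugfixes above (\r on name lines, and qualities
assigned by file order): the Python sketch below illustrates the general idea
only. It is not MIRA's code (MIRA is written in C++), and the function names
parse_fasta and pair_by_name are made up for this illustration. It assumes the
read name is the first whitespace-delimited token after ">" and that quality
files list integer values separated by whitespace.

```python
def parse_fasta(text):
    """Return {read_name: [payload lines]} for FASTA-style text.

    Hypothetical helper: tolerates DOS/Windows line endings by removing the
    trailing carriage return from every line, including name lines.
    """
    records = {}
    name = None
    for raw in text.split("\n"):
        line = raw.rstrip("\r")  # drop the \r left over from \r\n endings
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            records[name] = []
        elif name is not None and line.strip():
            records[name].append(line.strip())
    return records


def pair_by_name(fasta_text, qual_text):
    """Match sequences and qualities by read name, not by file order."""
    seqs = {n: "".join(p) for n, p in parse_fasta(fasta_text).items()}
    quals = {
        n: [int(q) for q in " ".join(p).split()]
        for n, p in parse_fasta(qual_text).items()
    }
    # Keep only reads present in both files.
    return {n: (seqs[n], quals[n]) for n in seqs if n in quals}


if __name__ == "__main__":
    fasta = ">r1\r\nACGT\r\n>r2\r\nGG\r\n"
    qual = ">r2\r\n30 31\r\n>r1\r\n10 11 12 13\r\n"  # different order
    for read, (seq, q) in pair_by_name(fasta, qual).items():
        print(read, seq, q)
```

Keying both files on the cleaned read name means a stray \r cannot make two
names differ, and a different record order in the quality file cannot attach
qualities to the wrong read.
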
3.3.3
-----
- bugfix: mapping with >= 8 strains is now possible again for non-Solexa
  reads
- enhancement: mapping with >= 8 total strains and <= 7 strains with Solexa
  data is now possible
- silenced error output for 'stat' calls on BSD and BSD-like systems
- reworked *_info_assembly.txt file to be a bit clearer
- reduced compiler warnings (replaced deprecated gcc hash_map with BOOST
  unordered_map)

3.3.2
-----
- bugfix: miraSearchESTSNPs failed to save correct data for step3. Error was
  introduced somewhere between 3.0.5 and 3.2.0
- expanded new pathfinder functionality which caches start sites also to EST
  assemblies
- added -MI:lcs:lcs4s
- compiling: building MIRA from a source package now works using calls like
  "../some/other/dir/configure"
- compiling: migrated 12-year-old home-brew solution for including flex
  files to now standard makefile functionality.

3.3.1
-----
- saving skim hits in binary instead of ASCII saves between 20 and 60% disk
  space for skim hits. Also improves NFS performance a bit due to added
  buffering. Also improves multithreading performance: now at almost n*100%
  where n is the number of threads. Lastly, improves performance of the SKIM
  reduction filtering algorithm.
- updated docs to reflect interference between -AS:mrpc and -OUT:sssip:stsip
- chapters in the manual now have readable (and constant) reference names
- To help 90% of the users doing a mapping assembly: the default name for
  backbones / reference sequences is now "ReferenceStrain" and not an empty
  string (-SB:bsn). Also, the default name for reads without strains is now
  "StrainX". New parameters -SB:ads:dsn.
- testing new pathfinder functionality which caches start sites (-PF:mrcbs)
- added missing -OUT:orm:otm
- fixed bug which could lead to an out-of-memory error when no overlaps were
  present in an assembly
- removed deprecated -GE:kcim parameter
- changed behaviour: in EST assemblies, statistics for large contigs are
  computed for contigs >= 1000 bases instead of 5000 as for genomes

3.2.0
-----
Changes since 3.0.5:

- Support for data from Pacific Biosciences

  Though the 3.0.x (and even earlier 2.9.x) versions could handle PacBio
  data when configured accordingly, MIRA 3.2.0rc1 now officially supports
  PacBio. Besides the usual support for non-paired and "paired-end" data,
  MIRA also has a new automatic editor for PacBio reads which should be
  useful when dealing with "elastic dark inserts" (longer stretches of
  unread bases whose length is only approximately known) in PacBio strobed
  sequencing mode. With this, MIRA should be able to deal with strobes of
  unread bases up to ~400 bases without having to split strobed sequences
  into multiple read-pairs.

  Simulations with bacterial data show that PacBio strobe reads in 200/200
  mode (sequence 200 bases, skip 200 bases, repeat) and having 3000
  sequenced bases can reconstruct a bacterial genome quite well (1 contig,
  correct genome organisation), leaving only the "clean up" work of getting
  some bases right via, e.g., a hybrid approach complementing with Solexa
  (Illumina) data.

  Though I admit that PacBio support is a long shot (I don't have real
  PacBio data at the moment), I expect MIRA to be "good enough" for first
  tests with real data (for those people out there actually having access
  to some). On the other hand, I haven't yet seen any other assembler able
  to support strobed reads without splitting them.

- Fully revamped manual

  In an exercise of self-defence (too many mails in my inbox), I've updated
  the manuals to DocBook format and considerably expanded them.
  The result: nicer manuals in HTML and PDF format, with extensive
  walkthroughs. There are also screenshots in colour. And variegated.

- The "usual" improvements and bug-fixes

  If PacBio and the revamped manuals had not been on the schedule, the
  "usual" improvements would have been described more prominently in this
  announcement. Things being what they are, I'll just mention:
  - hybrid Sanger/Solexa or 454/Solexa assemblies of bacteria now finish
    within hours instead of days
  - longer contigs
  - less memory utilisation (thanks to the Google TCMalloc library)
  - better support for SSDs
  - warnings when using NFS