--------------------------CASAVA1.8.2-----------------------------------------
Version 1.8.2a7
* BF-1176 fastq header barcode sequence is written as NNNNNNNNN when adapter masking is used
Version 1.8.2a6
* BF-1174 produceAlignStats crashes on sparse export data
* BF-1115 Adding index sequence in header of undetermined fastq files to be consistent and help troubleshooting index issues
Version 1.8.2a5
* BF-1164 Adapter sequence masking
* BF-1129 Provide a meaningful default USE_BASES for eland_pair
Version 1.8.2a4
* BF-1170 The FASTQ files should only contain the PF reads by default
* BF-1169 Excessively long dependency lists cause qmake to segfault
* BF-1165 Add the possibility of generating only 1 fastq per read (asked by customers and NEEDED BY SWiM)
* BF-1168 Missing tile data in fastq causes alignment workflow failure on plot_error_graph.pl
Version 1.8.2a3
* BF-1154 Remove unused data from validation examples
* BF-1114 Implement mismatch simulation at the barcode read boundary
* BF-1124 Allow any number of indexes, samples and projects per lane (fix for make: execvp: Temp/CASAVA-1.8.0a8.sh: Argument list too long)
* BF-1152 TEMP_DIR is ignored due to a hard-coded Temp folder name in Alignment.pm
Version 1.8.2a1
* BF-1128 ELAND MatchCache performance improvements
* BF-1148 produceAlignStats sometimes crashes on empty export files
--------------------------CASAVA1.8.1-----------------------------------------
Version 1.8.1a1
* BF-1138 target r1 is broken in bcl to fastq conversion
* BF-1141 Archival build does not include unmapped read pairs
* BF-1132 QseqToFastq converter does not properly handle PF field
* BF-1139 ClusterFinder does not output any clusters/reads if no anomalous pair reads are found
* BF-1142 Run number padding inconsistency in CASAVA is not handled in PairStats (used by Grouper).
--------------------------CASAVA1.8.0-----------------------------------------
Version 1.8.0a19
* BF-1125 Incorrect initialization of several sample-specific variables during the configuration of the alignment
* BF-1122 --sortKeepAllReads breaks post-alignment make
* BF-1123 --refSequences breaks post-alignment make
Version 1.8.0a18
* BF-1120 default validation fails to detect position files properly for 100723_EAS346_0188_FC626BWAAXX
* BF-1119 post-alignment fails to qsub any task with sge 6.2u5
* BF-1118 Autodetection of positions files should be based on RTAConfiguration.xml for newer RTAs (i.e. trunk)
Version 1.8.0a17
* BF-1113 The call to kagu in ELAND_standalone.pl should use contig names.
* BF-1112 configureBclToFastq.pl issues a WARNING about missing runParameters.xml (which exists) instead of the RTAConfiguration.xml (which is actually missing)
* BF-1106 Run Folder and Platform not specified in the Barcode_Lane_Summary.htm & Sample_Summary.htm
* BF-1110 no .xsl file prevents the BustardSummary.xml from being opened in IE
* BF-1111 provide examples in configureValidation.pl command line help
* BF-1108 configureBuild.pl command line RNA example incorrectly named as DNA example
* BF-1107 FastqConverter incorrectly parsed the filter flag in qseq and export files.
* BF-1105 sortExport.pl dies on negative alignment positions in anomalies.txt
* BF-1104 demux help script needs to change the term 'demultiplexed_dir' to 'Unaligned'.
* BF-1102 rpm installations don't have Default link to the validation dataset
Version 1.8.0a16
* BF-1103 bcl demultiplexer fails to parse cycles list on Suse 11 service pack 1
* BF-1101 convert: unable to open image `Temp/tmp-plot_error_graphs-33004-701.txt.ps': No such file or directory during alignment
* BF-979 Remove the dead code
Version 1.8.0a15
* BF-1100 Build fails to handle anomalous read pairs when duplicate marking is turned off.
* BF-1093 grouper ClusterMerger segfaults if there are no semi-aligned clusters
Version 1.8.0a14 (never built)
* BF-1099 qmake -inherit from under qsubbed script causes permanent workflow failures
* BF-1097 ELAND_standalone.pl was not handling old CASAVA FASTQ files properly.
* BF-1096 ELAND_standalone.pl was not up-to-date, causing perl syntax errors.
* BF-1095 update legal statement at the beginning of each source file
* BF-979 Remove the dead code
* BF-1018 cleanup unused makefiles
* BF-944 Chimeric Duplicate Filtering
* BF-1091 Rename 'tricker' to 'trimmer' in the next release of CASAVA 1.8.0
* BF-995 orphanAligner crashes when shadow is flagged as QC
* BF-1092 report pages have flowcell totals in lane rows
* BF-1090 In the DemultiplexedStats.htm file, change "% of lane" to "% of raw clusters per lane"
* BF-1076 Yield calculation in Demultiplexer and Alignment Summaries should be based on the number of PF clusters, not Raw clusters.
* BF-1089 Snp-caller rejects valid command-line argument
Version 1.8.0a13
* BF-1083 array overrun in tricker
* BF-1062 nonOverlapping_exon_coords.pl does not handle exons of only one base in length
* BF-1087 empty export.txt files cause sortExport to choke
* BF-1049 alignments produced with positions outside of the reference as result of eland accepting the reference that is bigger than it can handle
* BF-1084 Post-alignment task manager does not detect error when seqGeneMdGroupLabel is incorrect.
* BF-871 ELAND + ELANDv2 segfaults when chromosome file starts with GGTG
Version 1.8.0a12
* BF-1086 increase default CASAVA_MISSING_FILE_WAIT_DELAY to 75 seconds
* BF-1085 /bin/bash: Undetermined_indices/Summary_Stats_.../score.xml.tmp: No such file or directory
* BF-929 Remove production of word pattern statistics from produceAlignStats
* BF-692 ELAND reports wrong alignment descriptors for low quality reads starting with N
* BF-1013 Flicker integration
Version 1.8.0a11
* BF-931 multiseed alignments to the beginning or end of a contig cause match descriptor against a different contig.
* BF-1079 ELAND_standalone.pl should parse the input file formats in order to discover the proper lane information for the filenames.
* BF-1081 perl glob for *fa fails if more than one ELAND_GENOME is present in config.txt
* BF-1077 Use consistent units of Yield (MBase) across all Yield calculations/stats in Demultiplexing and Alignment Summaries
* BF-1076 Yield calculation in Demultiplexer and Alignment Summaries should be based on the number of PF clusters, not Raw clusters.
* BF-1080 bam target fails due to undefined subroutine
* BF-1072 gsIndex: linear indexing must flush index buffer to work correctly on very small BAM files
* BF-1074 Kagu generates unsupported nominal orientation case
* BF-1078 The circular parameter in kagu does not allow for reference sequences to be specified.
* BF-966 CBI/Build36.3/seq_gene.md.gz produces warnings on HumanNCBI36Fasta_all.fa
* BF-1014 rnaCounts target must fail to configure if seqGeneMdGroupLabel is not set for seqGeneMdFile
* BF-1023 configureRnaBuild.pl wrecks the quoted or escaped arguments when invoking configureBuild.pl
* BF-368 run.pl does not issue errors for unrecognized command-line arguments
* BF-1075 prevent empty contig names in squashGenome
* BF-1070 Missing insert size for small datasets
* BF-917 GROUPER provides no output in regions where the reference genome is lower-case
* BF-1071 don't provide standard deviations for sets of less than 3 elements in html reports
* BF-1053 aggregate all score/rescore.xml into a single per-flowcell file.
Version 1.8.0a10
* BF-1067 rename POST_RUN_COMMAND into DATASET_POST_RUN_COMMAND since the behavior is different now
* BF-1068 command line and config.txt ELAND_GENOME_MASK is ignored
* BF-1061 Memory accessed past the boundaries of an array in ELAND_inner.h
* BF-1029 Add flowcell-id to fastq files and transfer BF-1029 fix to qseq conversion workflow.
* BF-1050 Unanchored pair marking in BAM files and variant-calling filtration option.
* BF-1064 alignment does nothing if bustard config.xml contains TileRange
* BF-1063 missing dependency on Temp folder for SampleSheet.xml (breaks any alignment workflow)
* BF-1054 missing dependency on the folder containing rnaqc file (breaks eland_rna)
* BF-1052 use grep -E instead of grep -P
* BF-1031 Cypress 2 flowcell metadata support
* BF-1035 CASAVA should ignore missing control files
* BF-1039 Sample sheet entries with barcodes whose length does not match the number of index cycles must cause failure at the configuration step
* BF-1048 support Read/Number in RunInfo.xml
* BF-695 SegFault in ClusterFinder using Mate Pair data
* BF-999 Create application specific default target lists
* BF-945 ELAND crashes (failed assertion) if files which contain 2bpb in their filename are in a squashed directory
* BF-960 orphan aligner crashes on empty eland_extended files
* BF-875 eland provides no error details when the tmpfile creation fails
* BF-1042 make pair-end and rna examples in alignment and build command line help
Version 1.8.0a9
* BF-677 Return code of gnuplotimage is not checked
* BF-1041 The email notification is missing at the end of the alignment
* BF-961 Support for missing BCL files in BCL demultiplexer
* BF-1027 Install the example datasets during the "install" instead of the "build"
* BF-1036 Set small variant caller to output used base counts by default
* BF-1034 Fragment length distribution assessment does not work correctly in Kagu with small genomes (PhiX) and forgetting to use the circular flag.
* BF-1038 bash 4 on ubuntu requires bash31 compatibility to be forced to understand loggignShell.sh properly
* BF-1037 Validation installation cmake code is incompatible with ubuntu
* BF-1029 BCL Demultiplexer does not rebuild all missing files properly if killed and new make started.
* BF-989 large numbers in html reports look like 2.187298487e+09
* BF-901 Files with read-only permission in html/ folder
* BF-1012 rename ELAND_SET_SIZE to ELAND_FASTQ_FILES_PER_PROCESS
* BF-786 Error when RefFlat files contain empty first field
* BF-1033 when filter files are per-read, original read numbers must be used for filter file name composition
* BF-983 Alignment summaries are missing software version fields
* BF-1026 make all in successfully finished Unaligned folder causes all demultiplexing to be redone
* BF-1015 Add support for sample analysis
* BF-1009 RTA 1.10.36.0 data causes perl warnings due to unexpected data in RunInfo.xml
* BF-996 Mean Quality Score in Demultiplexed_Stats much lower than expected for a one tile run
* BF-1005 Support turning off generation of Undetermined_indices in configureBclToFastq.pl
* BF-1004 configureValidation must not produce makefiles with relative paths
* BF-1010 control bits missing in fastq for data from RTA 1.9 and 1.10 with controls in filter files
* BF-1007 --samtoolsRefFile occasionally causes Build workflow to fail during the reference genome integrity verification
* BF-1008 --no-eamss has no effect in demultiplexBcls
Version 1.8.0a8
* BF-1002 relative --prefix breaks installation during validation dataset unpacking
* BF-1001 ANALYSIS none datasets produce WARNING in logs
* BF-1000 pair statistics missing in barcode-lane and sample summary report pages
* BF-937 Target re-run fails on builds which use '--samtoolsRefFile'
* BF-998 Unable to configure alignment for single-tile datasets
* BF-996 Mean Quality Score in Demultiplexed_Stats much lower than expected for a one tile run
* BF-997 Post-alignment fails when given paired-end data as part of a single-end build.
* BF-990 --use-bases-mask y*,y*,y* breaks demultiplexer
* BF-972 Improvement in error reporting when encountering an error with unexpected number of reads.
* BF-994 Correct soft-clip handling in variant caller for rna-seq analysis
* BF-993 failures due to new small files or directories invisible across compute nodes
* BF-991 Restore date in the name of Build Parsed folders for GS compatibility
Version 1.8.0a7
* BF-834 Make new examples for 1.8
* BF-982 Incorrect prototypes for some perl sub
* BF-984 relative paths cause problems in configureAlignment.pl
* BF-985 Support for dataset POST_RUN_COMMAND
* BF-986 ${DEST_DIR} not working during CASAVA's compilation/installation
* BF-981 CASAVA does not recognize the correct filter file format from RTA 1.9
* BF-978 Fix qseq to fastq paired-end read ordering
* BF-770 Casava builds hang on the Hayward SGE
* BF-900 Remove | from the list of banned characters for reference files in CASAVA 1.8
* BF-971 In the new CASAVA 1.8 Summary files, change the name from Clusters to Clusters (Raw) in the Project Results Summary
Version 1.8.0a6
* BF-976 IndelFinder boost lexical cast exception when aggregating insert sizes of different export files produces number with non-zero fractional part
* BF-975 Failure to process single-ended runs (Unknown configuration variable: READ_LENGTH2)
* BF-974 Rename GERALD.pl, run.pl and runRNA.pl to configureAlignment.pl, configureBuild.pl and configureRNABuild.pl
* BF-963 AlignContig and SmallAssembler modules in Grouper require parameter changes
* BF-969 Updated ELAND_standalone.pl for CASAVA 1.8
* BF-942 Updated post-alignment CASAVA to use the new sample-oriented directory structure
* BF-932 Update illumina_export2sam and post-alignment bin/sort to handle control reads
* BF-934 Support for compressed export files
* BF-830 Conversion of the pickBestPair.pl script to C++
* BF-836 Implement BCL files split by sample
* BF-801 BCL to fastq converter
Version 1.8.0a5
* BF-965 Check to make sure contig is not aligned to N's in the genome
* BF-968 Correct slow random block access to post-alignment BAM files
Version 1.8.0a4
* BF-943 bin/sort I/O reduction
* BF-928 Re-design of SmallAssembler to process large read clusters
* BF-950 support minimum read length 8 in eland
* BF-946 Clarify run.pl target usage and reformat sort target as a plugin
Version 1.8.0a3
* BF-936 Variant-caller hangs in genomic regions with many assembled contig alignments
* Post-Run Command Fixes
Version 1.8.0a2
* BF-930 Multi-fasta references eland extended files differ from equivalent single-fasta reference
* BF-925 Separated previously shared AlignCandIndelReads and AlignContig parameters, reverting AlignCandIndelReads defaults to previous values.
* BF-929 Reduced maximum number of word patterns per type for produceAlignStats to store from 1M to 100k.
* BF-920 runGrouperBin.pl exceeds 4 Gb memory limit
* BF-927 Allow summary.xml with absent lanes
* BF-918 Grouper's SmallAssembler needs to take read length into account
* BF-924 elandv2e - inconsistent gap placement
Version 1.8.0a1
* BF-919 Add safe-mode for starling indel-realignment
* BF-913 Fixed issue: Some Grouper module parameter values specified in the workflow are not the recommended ones
* BF-915 Add handler for non-canonical reference characters to BamAlignmentReader
* BF-912 Anom pair clustering in ClusterFinder should not separate reads by strand mapping
* BF-906 Improve pericentromeric indel noise filtration in variant caller
* BF-905 Set variant caller expected het indel allele ratio as a function of indel and read size
* BF-904 Enable setting of variant caller maximum indel size at runtime
* BF-903 Variant caller incorrectly handles some open-ended contigs
* BF-909 SGE options for run.pl and taskServer.pl
* BF-876 Modified Grouper to accept BAM files rather than converted export files
* BF-896 Remove clusters with more than 10 links to reduce probability of excessive merging in ClusterMerger
* BF-907 realigned read output interferes with Genome Studio BAM import
* BF-908 Invalid realigned BAM records for GROUPER-mapped shadow reads
* BF-902 Extend bam target to support realigned reads
* BF-895 Fix SequenceUtils to accommodate seq length >100 bp
* BF-749 Refactor ELAND
* BF-816 Port Genome Studio BAM linear index creation to CASAVA and create plugin.
* BF-894 Catch blocks in grouper do not handle exception or return an error code.
* BF-892 produceAlignStats has no limit on the number of read patterns stored for stats -> hits mem limit (temp fix -> new ticket for final fix)
* BF-884 Shadows but no semi-aligned reads -> ClusterFinder segfault (simulation data)
* BF-840 Create archival build option and convert post-sort CASAVA to run directly from BAM records
* BF-887 Update illumina_export2sam to parallel minor formatting changes in post bin/sort BAM files
* BF-885 Add independent error model switch for snp-calling
* BF-883 Add option to print used allele counts in snps/sites files
* BF-868 Make numbers of SDs for anomalous insert size configurable
* BF-880 Set default mapping threshold for single-ended read applications to match CASAVA 1.7
* BF-879 Fix ClusterFinder memory leak
* BF-878 Fix ClusterMerger infinite recursion vulnerability
* BF-877 Script "buildCoverage.pl" does not handle zero or negative Match Position values correctly
* BF-874 Post-alignment RNA-Seq fails when given the workflowAuto command (-wa)
* BF-769 Grouper enhancements including use of anomalous read pairs, improved breakpoint prediction from semi-aligned reads, spanContigs.pl, open-ended alignment and other AlignContig improvements
* BF-856 calsaf support is broken by BF-814
* BF-833 Better error message when the SGE queue is not found for the PostAlignment TaskManager
* BF-639 As part of parameter checking, CASAVA should check that a supplied SGE queue exists
* BF-841 Multi-entry fasta files should not be allowed in ELAND_GENOME folder
* BF-853 Complete new variant-caller integration
* BF-772 better implementation of RNA QC stats
* BF-713 CASAVA should make its install path available to post-run commands for a build
* BF-702 Make ELAND write its temporary files into the Temp dir instead of /tmp
* BF-776 ELANDv2e - speed optimizations given repeat resolution
* BF-667 newer version of boost (1_44_0)
* BF-806 Add the BCL as a new oligo source for ELAND
* BF-854 Add starling support for open-ended GROUPER contigs
* BF-814 Support for samtools indexed reference sequences
* BF-763 SuMake squashGenome throws some error when it runs out of disk space.
* BF-839 Change task dependencies to prevent report code from running in parallel
* BF-838 Fix missing c++ headers from BF-776 merge
* BF-837 Remove CASAVA 1.0 allele caller
* BF-835 Fix temp file error in hyrax reports and check all File::Copy::move calls
* BF-819 illumina_export2sam should check export field count
* BF-813 illumina_export2sam should handle "RM" reads
* BF-743 third variant calling prototype - improved dense and overlapping indel handling
* BF-661 allow gapped eland for elandRNA
* BF-826 too many accesses to qval files, even when not using this workflow
* BF-827 BF-711 breaks BF-586
* BF-828 buggy extract_run_info() in Jerboa.pm
* BF-255 --dataSetSuffix not working with SNP files
* BF-797 Generate an error when the reference sequence is different from the one that has already been used
* BF-805 Install samtools version 0.1.8 during installation for use by CASAVA scripts.
* BF-796 Generate an error when minor version is different (e.g. 1.7.x vs 1.8.y)
* BF-761 spliceSites.pl to check and report when no splice sites could be generated
* BF-789 Do not display "/usr/local/bin/run.pl finished" when using the workflowAuto parameter until the entire job is finished.
* BF-788 Only display the run.pl options if the user types run.pl --help (i.e. do not display options every time run.pl is executed)
* BF-753 Can't return outside a subroutine at /illumina/software/casava/CASAVA-1.7.0/libexec/CASAVA-1.7.0/sorted2bam.pl line 155.
* BF-721 Error when giving run.pl a reference directory as a relative path using '~'
* BF-775 Splice site generation produces duplicate entries when the same splice site is present on both strands
* BF-768 leading zeros in export run number cause artifactual read duplication during indel calling
* BF-756 Enable installation and post-alignment demo completion on Mac OS 10.6
* BF-744 small_variants plugin does not handle bins without known sequence
* BF-742 varling_caller ignores -report-range-{begin,end} options
* BF-739 Enable compilation of blt/GROUPER/varling on Mac OS 10.6
----------------------------- Moved to CASAVA_20100609 CVS Root.
* BF-737 annoying messages when sample sheet contains empty fields
* BF-734 Remove 'bam2' Stage2 plugin
* BF-698 Support for export files subsampling in post-alignment
* BF-660 Add small_variants plugin to provide new snp-calling prototype
* BF-457 Enable indel finder to run on single read data
* BF-649 Single-read test dataset is not really single-read data
--------------------------CASAVA1.7.0-----------------------------------------
Version 1.7.0a11
- Updated README document to clarify supported platforms.
Version 1.7.0a10
* BF-731 The USE_BASES mask is not applied to the sequence files
* BF-730 Bogus 'undefined reads' message in pickBestPair
* BF-729 GERALD does not always convert '.' into 'N'
* BF-728 Increase the size of the window for the detection of optical duplicates in the diversity script
Version 1.7.0a9
* BF-722 Add chromosome label options to bam target
* BF-718 Phasing & pre-phasing info missing in post-demultiplexed BustardSummary.xml
* BF-717 "ANALYSIS none" is always set for the unknown directory when performing Demultiplexing+Alignment
* BF-716 shell command-line length limit affects copying Phasing and Matrix info
* BF-715 demultiplexer hangs on empty sample sheets
Version 1.7.0a8
* BF-712 alignment fails when there is more than one reference per lane
* BF-711 fixed template-to-config conversion
* BF-707 'default' workflow cannot be run on its own
* BF-706 incorrect ELAND_READ_LENGTH in the invocation of pickBest*
* BF-705 the lane-specific USE_BASES mask is not set for read 3
Version 1.7.0a7
* BF-703 varling_caller assertion fails : assert(min_indel_count!=0)
Version 1.7.0a6
* BF-693 failure to build sequence files
* BF-694 RD: Logic for whether to create a prerequisite checkpoint for SV is incorrect
* BF-409 pickBestAlignmentRNA does not see difference between no genomic alignments and too many genomic alignments
* BF-525 RD: validation plugin added to CASAVA_20091209/src/perl/libexec/Stage2/lib/Casava/PostAlignment/Plugins/
* BF-687 better error message when dying because of empty tiles file
* BF-607 aggregate multiple pair.xml per lane into a single pair.xml (fix case of existing but empty xml tags)
* BF-684 bomb if QSEQ files are missing
* BF-682 less accurate aggregation of pair.xml files
* BF-679 produceAlignStats does not check for empty export files
* BF-681 squash and unsquash crash on empty files
* BF-678 INTERMEDIATE files to be deleted only at the very end of a 'make' session
* BF-676 RD: bam_RD target should check for sorted.txt files only if it is the first target (existing build)
* BF-665 Fixed the support for eland repeat files
* BF-668 produceAlignStats sequentially generates the score/rescore files for all the tiles of each lane
* BF-104 QC sample summary for rnaSeq (fix for case when grep can't find anything)
* BF-671 produceAlignStats produces wrong score files when tiles are not represented
* BF-670 Spurious "8:ANALYSIS none" added to config.txt
* BF-666 Alignment analysis mode sequence_pair does not work
Version 1.7.0a5
* BF-662 '-t bam' samtools index - fails when consecutive lines are not sorted by chromosomal position - fix requires 'samtools sort'
* BF-663 failed interpretation of config.txt when SampleSheet.csv has different line endings
Version 1.7.0a4
* BF-655 failure to demultiplex + align in multiplexed RNA dataset
* BF-650 qmake/qsubmake errors when aligning, due to concurrency
Version 1.7.0a3
* BF-643 RD: Restored allowing sv to work without gaps dir
* BF-538 Delete the unused intermediary files -- leave an option to keep them
* BF-648 trailing '/' in the directory name, when demultiplexing a subset of output bin folders, doesn't work
* BF-645 --use-bases in ELAND_standalone.pl not covering all possible cases
* BF-616 Disable hash table optimizations for eland_ms_XX for XX in {16,...,31}
* BF-615 RD: Chimera ranking should take repetitive regions into account
* BF-644 Error when running demultiplexer with new demultiplexing parameters
* BF-618 Report clusters/mm2 and lane yield in MB instead of clusters/tile and lane yield in KB in Summary.htm
* BF-646 pickBestPair.pl fails if given an old style _genomesize.xml file
* BF-624 don't keep _raw_count.txt files
* BF-585 Adaptation of custom calibration workflows for parallel version of the demultiplexer
* BF-640 propagation of BF-632 to demultiplex flows
* BF-638 Orthogonalization of demultiplexing parameters
* BF-632 Set ELAND_SET_SIZE to an empty default value, forcing users to set it appropriately
* BF-594 Licensing updates for CASAVA 1.7
* BF-635 broken QC::Builder constructor in build.pl
* BF-586 Dealing with mixed indexed and non-indexed lanes in the same flowcell
* BF-598 Gracefully deal with wrong barcodes in sample sheet
* BF-634 squashGenome core dumps on large genomes
Version 1.7.0a2
* BF-598 Gracefully deal with wrong barcodes in sample sheet
* BF-630 fixed missing summary files in demultiplexing + alignment workflow
* BF-629 missing error plots after demultiplexing
* BF-607 Aggregate multiple pair.xml per lane, into a single pair.xml file
* BF-599 modified workflows to allow pickBest* to use only a subset of tiles per lane
* BF-605 post-alignment fails if ELAND_REPEAT is used in DNA alignment
* BF-613 it should be possible to restart the demultiplexer reliably at any stage
* BF-625 genesListPath default value is incorrect
* BF-622 Provide user-controllable spurious indel rate for indel genotype caller
* BF-620 RD: Invalid samMD match descriptor when CIGAR string contains "P"
* BF-619 RD: Invalid samMD match descriptor when CIGAR string contains "N"
* BF-606 RNA Example fails when using the --time option on the taskServer
* BF-578 support ELAND_RNA_GENOME_SEQ_MD_GZ for NCBI references
* BF-574 RD: Parallelise SVchimeraMerge_v4scores.pl by chromosome
* BF-610 RD: samMD2Export writes invalid export lines when SAM 'XD' field is missing
* BF-604 sorted2sam.pl: remove ISIZE/other PE-specific information during SE read conversion
* BF-608 Change sam export match descriptor from 'MD' to 'XD' to avoid collision with samtools match descriptor
* BF-570 RD: Merge SAM MD code into CASAVA
* BF-578 SnpCaller.pl does not calculate lines in the "coverage.txt" file correctly when --denseAlleleCalls is used
* BF-602 CASAVA 1.7 bin/sort fails when binSizeProject != binSizeBuild
* BF-555 document *CovCutoff=-1 behavior
* BF-601 documenting indel plugin flags
* BF-592 RD: SV workflow should finish with concatenation of per-chr feature files into whole-genome feature files
Version 1.7.0a1
* BF-597 A more robust approach when RunInfo.xml is missing
* BF-591 fixed BF-583 for recursive Makefile invocation
* BF-547 GERALD initialization at demultiplexing time
* BF-546 Added workflows for parallel demultiplexer
* BF-459 Parallelization of the demultiplexer
* BF-104 QC sample summary for rnaSeq
* BF-548 added support for dealing with any run, or combination of runs
* BF-584 move features into examples
* BF-583 qmake: *** No rule to make target `Summary.xml_rule_jerboa', needed by `Summary.xml'. Stop.
* BF-537 Inhibit production of _sequence files, unless otherwise specified
* BF-536 Inhibit production of _sorted files, unless otherwise specified
* BF-512 automatically generate splice_sites and exon_coords files in alignment and post alignment workflows
* BF-581 have CODING file in each important source tree folder explaining general rules for that folder such as naming convention, basic component requirements, etc
* BF-559 CASAVA 1.6 installation failed when using gcc 4.4
* BF-331 Reference genome filename is used as tag name in pair.xml - illegal XML if starts with digit
* BF-419 GERALD generates invalid XML in genomesize.xml file given NCBI chromosome names
* BF-330 jerboa.pl reports errors parsing a pair.xml that apparently result in an invalid Pair Summary
* BF-468 CASAVA does not support digits as the first character of a chromosome name
* BF-580 make run.pl --help more readable by removing developers documentation
* BF-577 export2sam writes incorrect CIGAR field for unmapped reads
* BF-576 Safeguard against indel caller crash due to missing bin or non-existent sorted.txt file
* BF-549 RD: SV code needs to use run.conf.xml exportRunId when run number `fixed'
* BF-573 add option to include purity filtered reads in export2sam.pl
* BF-463 ELANDv2 Speeding up the hash table access
* BF-562 CASAVA SE demo contains indelCovCutoff flag even though indels target will not be run
* BF-566 export2sam should treat reads with mapping position < 1 as unmapped
* BF-565 export2sam: format match contig name according to conversion spec / additional read pair checks
* BF-539 Post-run command in Gerald that triggers CASAVA build on a single lane
* BF-564 export2sam should limit MAPQ to 254 and write out full mapping score(s) to optional sam fields
* BF-563 change export2sam default quality interpretation to phred+64, leave solexa+64 as an option
* BF-558 remove CVS smart tags from source code
* BF-527 Cryptic error message from PA-CASAVA when insert size distribution fields are missing from pair.xml file
* BF-425 resumed workflows never finish
* BF-553 invalid @PG header line in export2sam.pl
* BF-552 Fix mismatch density filter scaling wrt window size
* BF-499 Added (initial) SAM/BAM headers.
* BF-523 Match descriptor from sorted.txt is now reverse complemented in MD field for reverse strand matches
* BF-526 Removal of bins with no reads (by runRemoveEmptyBins.pl) seems wrong
* BF-529 RD: Added new consensus sequence plugin which optionally includes indel predictions
* BF-426 squashGenome misses final base when converting from squash format to fasta
* BF-545 support for /usr/bin/time in TaskManager
* BF-541 Replicated alignment simplifications into ELAND_standalone
* BF-519 Removed _calsaf files from GERALD workflow
* BF-517 Completely removed _ub_qseq files from GERALD workflow
* BF-533 add reference consistency check to blt/varling snp-caller
* BF-532 Samtools cmd can now be specified to run.pl
* BF-518 Add switch to write dense allele files in DNA-Seq mode
* BF-506 SAM/BAM generation should handle missing chr bins or chr bins missing sorted.txt
* BF-470 Automate and rationalize the packaging mechanism
* BF-504 RD: Support legacy chromosome naming in SAM/BAM chr name conversion
* BF-498 SAM/BAM generation should support single read data
* BF-472 Add genome.bam generation from per-chromosome BAMs
* BF-490 reorganize perl subtree
* BF-496 move task manager
* BF-495 use CodeMin archive instead of CVS source tree
* BF-497 reorganize share subtree
* BF-494 Change indels description in run.pl
* BF-491 reorganize c++ subtree
* BF-492 Use +33 not +64 ASCII offset for basecall quality values in SAM/BAM
* BF-493 Reverse strandedness of match should be propagated into SAM/BAM even if mapping is non-unique
* BF-490 reorganize perl subtree
* BF-488 split off data and redistributables in CVS
----------------------------- Moved to CASAVA_20091209 CVS Root.
* BF-20 RD: Integrate Structural Variance code
* BF-443 Added the calculateDiversity script [see BF-109]
* BF-485 fixed IVC plot links
* BF-482 allele-caller should not print empty rows in RNA-Seq mode
* BF-481 fixed cpu allocation in multiplexed alignments
* BF-479 Demultiplex PE example does not produce s_1_pair.xml
* BF-475 Improving demultiplexer's documented examples
* BF-237 bin/sort optimization
--------------------------CASAVA1.6.0-----------------------------------------
Version 1.6.0a12
* BF-430 RNA feature files updated for mouse, human and rat
* BF-467 After boost installation installer fails to find boost on UBUNTU
* BF-462 configure -verbose must enable FindBoost Boost_DEBUG mode
* BF-313 Extended generation of BAM files from sorted.txt.
* BF-461 separate out Stage2 plugins in source tree
* BF-458 explicitly use /bin/bash and implicitly use perl from /usr/bin/env
* BF-434 allow indel coverage cutoff to be disabled
Version 1.6.0a11
* BF-456 Added small data set to illustrate the demultiplexer workflow
* BF-430 human splice sites updated with files not containing random chromosome data
* BF-431 readBases does not consider reverse-mapped reads at the end of the exon
* BF-438 Replace Post-Alignment CASAVA Test data with new DNA-Seq and RNA-Seq data
* BF-428 indel.htm navigation broken "%snpHomHetStats" instead of "bullet snpHomHetStats "
* BF-432 ELANDv2 drops forward strand matches that hit positions that are multiples of 2^24
* BF-440 more robust reading of SignalMeans files
* BF-439 support for alternative workflow
Version 1.6.0a10
* BF-429 ELAND_standalone.pl fails due to --gapped parameter when calling ELANDv2
* BF-427 fixed handling of non-unique SampleIDs in demultiplexer's Sample Sheet
* BF-424 Support for empty export files
* BF-395 Check for valid block ID in unsquash phase; code cleanup; checking for chromosomes only consisting of N
* BF-422 fixed the re-generation of the demultiplexed config.xml for SE runs
Version 1.6.0a9
* BF-377 Single end analyses can be done on the second end of a paired end run
* BF-416 TMPDIR is not read from the environment anymore. It can only be overridden on the make command line
* BF-415 check that the number of reads specified in USE_BASES is compatible with the analysis
* BF-414 fixed Use Bases guessing mode in single ended reads.
* BF-403 When running configure without a prefix option, installation will fail if the user needs to install cmake and doesn't have root permissions.
Version 1.6.0a8
* BF-404 workaround for an issue with qmake on sun grid engine (incorrect processing of includes)
* BF-407 need to warn users that gcc 3.x causes the installation to fail on Fedora and CentOS5
* BF-401 smart manipulation of missing lanes in multiplexedGERALD
* BF-212 cryptic error when chromosome name in export file does not match the one in genomesize.xml
* BF-398 Installing CASAVA with both DESTDIR and --prefix does not work

Version 1.6.0a7
* BF-399 re-distributed boost libraries are not always being linked with
* BF-397 added the generation of the coverage plots for the default analysis (phageAlign)
* BF-254 identify the tiles by globbing the qseq files instead of using the base calls config.xml
* BF-362 fixed the incorrect management of the lanes, reads and tiles in Jerboa

Version 1.6.0a6
* BF-392 check that the boost libraries are available at run-time
* BF-389 GERALD to retrieve the Run Folder information from config.xml
* BF-393 fixed ELAND file-type detection when the file is empty
* BF-394 fixed extraction of tile information when config.xml is broken
* BF-314 Front end Makefile for the demultiplexed GERALD folders
* BF-365 When configuration fails in CASAVA, the error messages are misleading.
* BF-387 Check that the number of seeds is smaller than or equal to 4
* BF-316 chromosomes in lexicographic order in duplicates png plot on html page
* BF-194 regenerate splice site files
installer improvements:
  --verbose causes verbose makefile
  --static forces static linking so that resulting binaries can be executed on another machine that does not have the same shared libraries
  Forced unpacking of ChartDirector to minimize the chance of installation failures.
* BF-188 cryptic error message given if project directory is missing when running the configure target

Version 1.6.0a5
* BF-358 RNA counting is out of sync with sorted.txt file creation
* BF-309 Cross-calibration of custom quality tables, from indexed runs
* BF-299 Demultiplexing of qval files
* BF-376 CASAVA_FORCE_STATIC_LINK produces binaries that are impossible to execute
* BF-101 not build system for new boost libraries
* BF-257 Better handling of cmake installation
* BF-382 fixed handling of relative paths in demultiplexer
* BF-373 fixed loss of number of 2-error matches in multiseed phase in ELAND v2
* BF-372 handling of gapped/singleseed/multi command line flags for ELAND v2
* BF-318 Added the sample information to the analysis summaries when available
* BF-280 Support for custom quality calibration in Gerald
* BF-380 cryptic error when hostname returns name that is not proper host name
* BF-371 invalid color definition in CASAVA snp gff output
* BF-369 move standard options out of unsupported section in run.pl usage

Version 1.6.0a4
* BF-334 added the full software version to the summary XML files
* BF-366 repeated running of CASAVA examples fails
* run.pl RNA example temporarily removed as the RNA test data is broken
* non-end-user part of the demultiplexer moved into libexec.
* BF-352 indel-caller/indel-summary adjustments preceding CASAVA 1.6 release
* DESTDIR installation variable made to work properly.
* BF-145 verify that reference names in different input files match
* BF-203 alignability workflow needs to be updated for multiseed eland
* Installation target folder configuration support improvements
* Console logging and error indication improvements
* Plugin framework improvements
* BF-50 support for alignability target
* BF-252 two-tier elandv2 + multiseed support
* BF-223 same target can be specified multiple times for one workflow
* BF-332 Fixed vulnerability to anomalous quality values.
* BF-362 Fixed spurious Jerboa.pm warnings for mixed read numbers analyses.
* BF-224 Fixed regex for sequence.txt header parsing.
* BF-336 Nominal orientation for pairs is now read, not hard-coded, for Summary.

Version 1.6.0a3
* BF-361 Syntax error in create_tile_thumbnails.pl
* BF-266 Reduced Depth Builds (simulateCoverage)
* BF-359 useAlleleMaxMismatchFilter not supported by run.pl

Version 1.6.0a2
fix for broken statistics perl script cmake configuration
minor documentation fix in demultiplex.pl

Version 1.6.0a1
* BF-265 wrong (overhanging) alignments to the splice junctions
* BF-226 make splice_sites-49.fa default
* BF-298 switch to readBases rna counting as default
* BF-105 Allele frequencies are wrong for RNA SNPs close to splice sites.
  readBases counting method improved to not fail when splice reads do not cover the junction point
  Splice read counting functionality restored for readBases
* BF-185 have --snpFilterSetting=0 and --snpCovCutoff=-1 by default for RNA builds
* BF-175 Pre-alignment Sample Demultiplexing
* BF-247 Front-end to GERALD for multiplexed runs
* BF-251 Demultiplexer support for Second Base Calls
* BF-275 SampleSheet and SampleDirectories files for the demultiplexer
* BF-353 Default demultiplexed mask obtained from config.xml
* BF-310 unique read names in GERALD FASTQ files
* BF-350 Crash when eland_pair and Y\d+ alignment specified from single read basecalling
* BF-238 GERALD.pl generates incorrect use bases masks (eland_extended on paired end run)
* BF-276 indel reporting table
* BF-277 step to create 1 indel file per chromosome
* BF-296 No warnings or errors are given if 'indel' is specified on command line and -rm = single
* BF-284 CASAVA alignment can produce all-N alignment descriptors using multi-coding references
* BF-190 taskServer.pl handles client error inconsistently
* BF-243 No longer possible to generate workflow for allele target
* BF-248 ELAND directly reads qseq files instead of eland_query
* BF-302 fix snp.txt labels
* BF-143 QC
module
* BF-240 Implement the eland_rna workflow

Version 1.5.0a7
* BF-292 Allow the user to use applications from the path that they specified
* BF-282 Incorrect management of the SHELL version and pipefail options
* BF-207 Indel Finder script does not forward Solexa64 quality value information

Version 1.5.0a6
* BF-271 Summary.htm misnames tiles remaining after BAD_TILES filtering.
* BF-159 Post-CASAVA hook so that we can automatically kick stuff off when CASAVA finishes (runs the command with project being the current directory)
* BF-261 shebang line is incorrect in perl scripts, also removed perl from perl script execution command line
* DAP-383 Fixed XML issue between RTA and CASAVA in Summary.htm generation
* BF-196 Allow CASAVA build to ignore missing GERALD metadata files
* BF-254 Fixed handling of upstream tile selection.
* BF-260 Added running of POST_RUN_COMMAND.
* BF-262 Added alignment tile filtering using BAD_TILES.
* BF-263 Fixed handling of single upstream lane.
* BF-114 Replaces pickBestPair.pl with the c++ version (rollback - for stability reasons decided to hold off the switchover until 1.6)

Version 1.5.0a5
* BF-253 It is impossible to control number of parallel jobs when using run.pl or runRNA.pl
* BF-114 Replaces pickBestPair.pl with the c++ version
* Made indels to be a default target for paired DNA only.
* BF-230 typo in ./CASAVA help file alignmnet->aligment
* BF-244 project specified twice in usage examples

Version 1.5.0a4
* BF-232 Added support for installing in relative paths
* BF-235 Fixed the generation of the genomesizes.xml
* Added the rule to create the lane tile file for ANALYSIS none
* Fixed the recursive definition of the CMAKE_CXX_FLAGS

Version 1.5.0a3
* Produce a warning when building in the source directory
* Added release notes pdf to ISO

Version 1.5.0a2
* BF-213 add remaining switches/configuration parameters required for CASAVA 1.5 (early access) allele-calling
* BF-221 Dead run.pl options: --indelOnlyMode and --reportMode don't do anything

Version 1.5.0a1
* BF-181 runTasks.pl deprecated, command line and usage guides improved for run.pl taskServer.pl
* BF-144 Workflow profiling support (timestamps in task file, configurable binSizeProject and task2gantt.pl plot tool), execution timing improvements, chromosome name parsing disabled
* BF-182 Gapped alignments need to be handled by downstream statistics generation
* BF-179 misordered records in sorted.txt
* BF-150 Improved clustering in IndelFinder
* BF-161 handle gapped match descriptors in blt allele-caller
* BF-160 make allelecaller2.0 compatible with RNA-Seq mode
* BF-127 integrate blt/allelecaller2.0 as a replacement for perl allele caller
* BF-170 Calculate percentage of chromosome covered
* BF-151 Distinguish between repeat and NM shadow reads in build
* BF-158 Duplicate read selection consistency
* IMP-36 ELAND handling multiple seeds internally - adding optional gapped aligner
* BF-74 running target sv never completes at taskManager
* BF-57 CASAVA not relinquishing port numbers after analysis is complete
* BF-105 Allele frequencies are wrong for RNA SNPs close to splice sites.
Use the readBases rnaCountMethod to get correct SNPs
* BF-5 Poisson plot statistics generation target
* BF-156 support for read indexing in alignment reader library
* BF-155 invalid non-unique pairs value in Reads.idx
* BF-134 Task manager attempts to listen on an empty port in range
* BF-53 SNP summary needs totals
* BF-157 snp html summary page does not report numbers for chromosome M
* BF-159 Post-CASAVA hook so that we can automatically kick stuff off when CASAVA finishes (also works with --workflowAuto and --sgeAuto)
* BF-149 modifyQVRef dropped
* BF-148 sort.count files end at 10,000,001 not 10,000,000

Version 1.2.0a1
* BF-50 Alignability workflow generation is available via runComputeAlignability.pl
* New version naming scheme ..[][]
* CMake - based build system
* Allele caller 2.0 available as alternative configurations
* Logging to STDERR instead of STDOUT

07-05-2009 Version 1.1.1.3 Minor corrections to Bug fixes patch
* Failure during parsing Summary.xml if the lane does not contain clusterCountRaw element fixed
* README and command-line help spelling corrections
* BF-125 Running target indels fails unless Indel folders exist in each chromosome directory
* BF-78 Permissions problem always seen in new releases
* BF-68 If incorrect project folder given (ie no config found), die explaining this!

20-03-2009 Version 1.1.1.0 - Refactor TaskManager
* sgeAuto - automatically runs on SGE (use with numberOfProcesses, sgeQueue)
* check if path to genome_size.xml points to a file, print error if it points to directory
* check if --ref (Reference genome path) points to directory, print error if it points to a file.
(check if all .fa exist)
* Address chr to c1 issue

09-03-2009 Version 1.1.0.12
* Implement simple log file
* Simplify error message
* Implement export/sorted to fastq parser
* Implement SRF XML generator
* exp2sra target converts all export files from GERALD folders to zipped fastq files and generates sra.xml
* Implement backward compatibility with 1.0.8
* When new CASAVA is released and run on old build, upgrade the old build to new project.conf
* soft clean - allows restarting the build without removing project.conf and run.conf.xml

22-02-2009 Version 1.1.0.10
* Implement --adjScore - removes reads with a lot of mismatches from alleleCaller (default -1 - off)
* --snpIndelFilter - removes reads with at least two consecutive mismatches from SNP caller (default OFF)
* --singleScoreForPE - filters out (from SNP calling) reads with single score below QVCutoffSingle YES|NO (default NO)

22-02-2009 Version 1.1.0.9
* user flag is available in CASAVA to turn duplicate removal off - prototype
* Integration with CASAVA GUI
* All SV finder parameters available from run.pl
* custom output file names in SVFinder, SNPCaller and IndelFinder
* Auto upgrading of project.conf to new version of CASAVA

09-02-2009 Version 1.1.0.8
* Integrate Tony's indel finder and rewrite export file access to use the Pipeline API

09-02-2009 Version 1.1.0.6
* Integrate with new indel finder

09-02-2009 Version 1.1.0.5
* Rollback to original Indel Finder and integrate

29-01-2009 Version 1.1.0.3 - R&D features
* Add Indel Finder

12-10-2008 Version 1.1.0.2 - R&D features
* Add automation for converting coverage to binary file (target cov2bin)

11-10-2008 Version 1.1.0.1 - R&D features
* Add --toNMScore=NUMBER minimum SE alignment score to put a read to NM (default 6; -1 - off)

11-10-2008 Version 1.1.0 - R&D features
* Add base filter - remove bases (from alleleCaller) below given quality
* Implement snpCovCutoff
sets the SNPCaller coverage cutoff
* Add frc parameter - produce sort.count for forward and reverse strand
* Add alignmentRecali (option to disable alignment recalibration)

15-09-2008 Version 1.0.2 - Bug fixes and snpCovCutoff
* Bug fixes
* Implement snpCovCutoff - sets the SNPCaller coverage cutoff

20-08-2008 Version 1.0.0 - Bug fixes
* Bug fixes

23-06-2008 Version 0.3.1 rc 2 - small features
* Produce exon counts file
* Produce splice junction counts file

31-07-2008 Version 1.0 Beta rc 1 - testing,
* Now always splice junctions counts + exon counts = gene counts (Normalization for read start method)
* additional stats (ribo, mito, splice,
* Developed beginning of testing framework. ./t/TEST_all.pl
* Check if machine name and run id are unique. If not then fix.
* Implement three times the chromosomal mean cut off in SNP caller
* Change name to CASAVA
* Implement testing

23-07-2008 Version 0.3.1 Candidate 1 - RNA-Seq and IO API
* Produce exon counts file
* Produce genes counts file
* Produce splice junction counts file
* Implement matching of all scaffold names to regexp and changing them to give expression
* Implement TESTs
* Implement RM counts

23-05-2008 Version 0.3.0 Candidate - single-read mode and IO API
* Implement single-read mode
* Use export/_sorted format all the way in BullFrog
* Add removeRun target
* Implement converter from _sorted to sort.txt
* remove temporary files (smaller build)
* Export Summary.html to Summary.xml

25-04-2008 Version 0.2.4 - Add GFF format and Reporting. Move runs configuration to XML
* Implement GFF format converter for SNP file.
* Read chromosome size from pipeline genome_size.xml file.
* Refactor code so that export files and directories are stored in separate export directory.
* Remove bugs from rmDuplicates - make the figures consistent.
* Add list runs and remove run targets
* Implement result html page with list of runs and lanes
* Implement duplicates statistics html report
* Implement calculate coverage in SNPcaller and store data in stats
* Add QC of data before running
* Implement HTML reports for depth and SNPs
* Implement Depth simulator and html reports

15-03-2008 Version 0.2.3 - Implement basic parallelization and move some read/write routines to IOLib
* Allow for different bin size in project directory and build directory.
* Separate directory creation from scripts (create directory structure in runBullFrog).
* Refactor code so that each analysis script runs only on one bin.
* Make BullFrog run the human genome 30X in parallel.
* Implement TaskManager - simple client-server task management application.
* Implement task status monitor to track what is going on with each task.
* Improve performance by buffering writes to files (all modules).
* Parallel version of sortExport (process each lane separately on each proc and merge results on each bin separately - big performance improvement).
* Remove Chr.pm library and refactor code to use configuration files.
* Develop common libraries SGE.pm and TaskManager.pm
* Add support for multiple runs.
* Implement RUN configuration - generic representation of all runs which will be shared with structure variance script.
* Several small changes in spliUnSorted.pl after comparing results with BullFrog Version0_2_1

15-02-2008 Version 0.2.2 - speed improvements and integrated configuration
* Add wrapper around all system calls to check the exit code and the system's signals
* Move all hard-coded names, directories and constants to BullFrog.conf file
* Move all genome-specific and project-specific settings to project.conf file
* Implemented configuration layer
* Implemented workflow restart - once BullFrog is stopped, the process can be rerun from the point where it stopped. (Current stage data gets removed)
* spliUnSorted: an improved version of mkSorted2.
First split sort.txt into two columns, then split the second column by chromosome. Sort the files and merge them into the appropriate bins.
* Add sort memory limit parameter (sortBufferSize). All sort commands can take up to sortBufferSize memory.
* Caching of baseQ value in reviseBaseScoreTable (rmDupFromSort.pl) 4 times faster.
* Caching of $params value in alleleCall (BACON4) 2 times faster
* Integrate SNPcaller into the workflow by developing runSNPcaller script (move parameters to configuration file).
* sortExport stores the list of chromosomes in the configuration file; all other scripts read it from the config file instead of detecting chromosomes from the directory structure.
* Allow for overwriting of default project parameters.

07-01-2008 Version 0.2.1 - Cosmetic changes and refactoring
* add --nosorted parameter which skips sort.txt, sort_anom.txt and sort_orph.txt files; rmDupFromSort.pl produces unsort.txt files - 10% performance improvement
* Move sorting of input files from rmDupFromSort.pl to sortExport.pl. Add sort merge with previous build. Potential 10% performance improvement when adding second experiment.
* Create runBullFrog script, a master script to run all stages
* Change BACON4.pl BACON4.pm ChrSize.pm Common.pm DataBinning.pm ExportFormat.pm mapReadToGenome.pl MapReadToGenome.pm into perl library and move to Illumina::Sequencing.
* Move executable script to bin directory
* mkSorted2.pl - Externalize sorting procedures
* $binSize = 300000 ?
* Remove hard coded library paths
* Small improvements and refactoring in sortExport.pl
* Provide example test data in TestData directory
* Other small changes

07-01-2008 Version 0.2
* BullFrog: BuiLd From Resequencing Of Genome (aka Project Folder)

11-09-2007 Version 0.1 - add scripts to CVS
* Initial revision
* Added a routine to save NM-NM matches into their own directory
* Added some counts of reads, failed filters etc to Reads.Idx