// -*- mode: c++; indent-tabs-mode: nil; -*-
//
// Copyright 2009 Illumina, Inc.
//
// This software is covered by the "Illumina Genome Analyzer Software
// License Agreement" and the "Illumina Source Code License Agreement",
// and certain third party copyright/licenses, and any user of this
// source file is bound by the terms therein (see accompanying files
// Illumina_Genome_Analyzer_Software_License_Agreement.pdf and
// Illumina_Source_Code_License_Agreement.pdf and third party
// copyright/license notices).
//
//

/// \file
///
/// \author Chris Saunders
///

#include "blt/blt_info.hh"
#include "blt_common/blt_shared.hh"
#include "blt_util/log.hh"

#include <cstdlib>
#include <iostream>


void
blt_info::
usage(const char* xmessage) const {

    log_os <<
        "\n" << name() << " - Bayesian and likelihood ratio test snp caller\n"
        "\tversion: " << version() << "\n"
        "\n"
        "usage: blt -sorted file1 [-sorted file2... options] > event_report\n"
        "\n"
        " -sorted file - Analyze reads from 'file' in sorted export format (use \"" << STDIN_FILENAME << "\" for stdin)\n"
        " If multiple -sorted flags are given the specified sorted files will be\n"
        " merged and jointly analyzed. Identical reads or files in the input set will\n"
        " not be consolidated. Gapped alignments can be processed from the sorted files,\n"
        " however alignments with deletions larger than " << MAX_READ_REF_DELETION_SIZE << " bases will be filtered out.\n"
        "\n"
        "options:\n"
        "\n"
        " -bsnp-diploid x - Use Bayesian diploid genotype snp caller with heterozygosity=x\n"
        " -bsnp-monoploid x - Use Bayesian monoploid genotype snp caller with heterozygosity=x\n"
        " -bsnp-nploid n x - Use Bayesian nploid genotype snp caller with ploidy=n and prior(snp)=x\n"
        " -lsnp-alpha x - Use likelihood ratio test snp caller with alpha=x\n"
        " -bsnp-diploid-file file\n"
        " - Write bsnp-diploid results to 'file' instead of stdout\n"
        " -bsnp-diploid-allele-file file\n"
        " - Write the most probable genotype at every position to file\n"
        " -bsnp-diploid-min-maxgt-qphred n\n"
        " - Filter out bsnp-diploid results if Q(max_gtype)n within a window of m flanking bases. (see doc)\n"
        " -min-single-align-score n\n"
        " - Reads with single align score=1 [and <= ref_size if reference specified])\n"
        " -report-range-end n\n"
        " - Event reports and coverage end after base n or min(n,ref_size) if reference\n"
        " specified.\n"
        " (default: last read aligned position >=1 [and <= ref_size if reference specified])\n"
        " -report-range-reference\n"
        " - Event reports and coverage span the entire reference sequence.\n"
        " A reference sequence is required to use this flag. This sets begin=1 and\n"
        " end=ref_size. This flag cannot be combined with -report-range-begin/-end.\n"
        " (default: as described for -report-range-begin/-end above)\n"
        "\n"
        " -bacon-allele file - Write BaCON allele-calls to 'file'\n"
        " -bacon-allele-print-empty\n"
        " - Print empty rows in the BaCON allele-call file\n"
        " -bacon-snp file - Write BaCON snp-calls to 'file' - note that anomalous depth is not filtered\n"
        " -bacon-call-thresh x\n"
        " - (default: " << DEFAULT_BACON_CALL_THRESH << ")\n"
        " -bacon-second-call-thresh x\n"
        " - (default: bacon-call-thresh value)\n"
        " -bacon-het-snp-ratio-thresh x\n"
        " - (default: " << DEFAULT_BACON_HET_SNP_RATIO_THRESH << ")\n"
        " -read-sample-rate x\n"
        " - Deterministically sub-sample reads at rate x (0n within a window of (2m+1) positions centered as close\n"
        " as possible to the present base. Note that the present base is included in the test.\n"
        " Near sequence endpoints, the closest full window is used, so that the filter gives\n"
        " the same result within the ranges [1,m+1] and [size-m,size]. Each indel occurring in\n"
        " this window counts as one mismatch.\n"
        "\n"
        "Possible events:\n"
        "LSNP - snp called by the LRT snp caller\n"
        "BSNP2 - snp called by the Bayesian diploid genotype caller\n"
        "BSNP1 - snp called by the Bayesian monoploid genotype caller\n"
        "BSNPN - snp called by the Bayesian nploid genotype caller\n"
        "ANOM_COV - anomalous strand coverage at site\n"
        "ANOM_DIS - anomalous dependency of base distribution on strand\n"
        "ALLSITES_COVERAGE - coverage statistics after alignment score filtration (see below)\n"
        "ALLSITES_COVERAGE_USED - coverage statistics after all basecall filtration (see below)\n"
        "NO_REF_N_COVERAGE - ALLSITES_COVERAGE without reference 'N' positions (see below)\n"
        "NO_REF_N_COVERAGE_USED - ALLSITES_COVERAGE_USED without reference 'N' positions (see below)\n"
        "READ_COUNTS - report on the number of reads used or filtered for various reasons (see below)\n"
        "CMDLINE - echo the blt command line\n"
        "\n"
        "When the '-print-evidence' flag is given, a multi-line basecall report is printed\n"
        "for each event position. The report starts with tag: EVIDENCE, and lists information\n"
        "about each basecall at that position.\n"
        "\n"
        "Coverage reports:\n"
        " The coverage information reported with tags ALLSITES_COVERAGE and NO_REF_N_COVERAGE\n"
        "includes all basecalls from reads that pass the alignment score filters. 'N's are included\n"
        "except where a continuous sequence of 'N's is found at the end of a read. These rules are\n"
        "also applied to the 'bcalls' column reported in the BaCON allele-call file.\n"
        " The coverage reported with tags ALLSITES_COVERAGE_USED and NO_REF_N_COVERAGE_USED\n"
        "includes only basecalls used for snp-calling. This basecall count is also reported for\n"
        "individual sites in the 'bcalls_used' column of several output files.\n"
        " The range of the coverage calculation is between the first and last base covered by\n"
        "any read, unless -report-range-begin or -report-range-end are set. Note that the default\n"
        "range inferred from the input reads could reduce the accuracy of the coverage estimate,\n"
        "so use of the -report-range-{begin,end} flags is recommended.\n"
        " ALLSITES_COVERAGE* results are generated using every position in the begin-end range\n"
        "described above. NO_REF_N_COVERAGE* results are generated using every position in this\n"
        "range where the reference base is not 'N'. NO_REF_N_COVERAGE* is only reported when a\n"
        "reference sequence is specified.\n"
        "\n"
        "Read report:\n"
        " The READ_COUNTS event reports the number of reads...\n"
        "1) used in basecalling (used)\n"
        "2) filtered by single or pair alignment score (align-score-filter)\n"
        "3) filtered because they had gapped alignments (gapped-alignment)\n"
        "4) filtered because they were not aligned (unmapped)\n"
        "5) marked for filtration in primary analysis (primary-filter)\n"
        "Note that reads falling entirely outside of the report range are not included in these\n"
        "counts. Each read within the report range can only belong to one of the above categories.\n"
        "\n"
        "Event information:\n"
        "pos: position in reference genome\n"
        "ref: reference base\n"
        "P(snp): probability of...\n"
        " ...reference allele frequency being less than one (LRT model)\n"
        " ...any non-reference genotype (Bayesian genotype models)\n"
        "Q(snp): Qphred(reference allele posterior probability)\n"
        "freq(X): maximum likelihood frequency of allele X\n"
        "max_gtype: genotype with highest posterior probability\n"
        "P(max_gtype): highest posterior probability\n"
        "Q(max_gtype): Qphred(1-highest posterior probability)\n"
        "max2_gtype: genotype with second-highest posterior probability\n"
        "P(max2_gtype): second-highest posterior probability\n"
        "\n"
        "blt can additionally output files for BaCON allele calls, BaCON snp calls or counts.\n"
        "\n"
        "BaCON allele call file:\n"
        " TBD -- see CASAVA documentation\n"
        "\n"
        "BaCON snp call file:\n"
        " TBD -- see CASAVA documentation, except note that filtration of snps at positions\n"
        "with depth >= 3 times the chromosomal mean is not done.\n"
        "\n"
        "Counts file:\n"
        " blt can also write a summary file of observation counts for every position when\n"
        "'-counts filename' is specified on the command-line. The file can start with\n"
        "a series of comment lines indicated by a leading '#' character, followed by lines\n"
        "with the following tab-delimited fields:\n"
        "1. reference sequence position number\n"
        "2. no. of A basecalls used\n"
        "3. no. of C basecalls used\n"
        "4. no. of G basecalls used\n"
        "5. no. of T basecalls used\n"
        "6. no. of unused basecalls\n"
        "\n"
        " Reads that failed alignment score filters and trailing 'N's are not included in\n"
        "the counts file. Among the counts, unambiguous bases that passed all snp calling\n"
        "filters are summarized in the 'used' counts; all others are summarized in\n"
        "the 'unused' counts.\n"
        "\n"
        "Caveats:\n"
        "- No circular genome support; negative alignment positions in reads are ignored.\n"
        "- 'N' basecalls are ignored (except in coverage tests).\n"
        "- Only coverage tests are conducted at sites with 'N' in the reference sequence.\n"
        "\n";

    exit(EXIT_SUCCESS);
}
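

// Illustrative sketch (not part of blt): one way to place the mismatch
// window described in the usage text. A window of (2m+1) positions is
// centered on the present base where possible. Near either end of the read it
// shifts to the closest full window, so every position in [1,m+1] shares one
// window and every position in [size-m,size] shares another. Each indel inside
// the window counts as one mismatch. The namespace and function names are
// hypothetical. Using the whole read as the window when the read is shorter
// than (2m+1) is an assumption, because the usage text does not cover that
// case. The comparison against the threshold n is left out because that part
// of the option text is garbled in this copy.

#include <cassert>
#include <vector>

namespace blt_sketch_window {

// 1-based, inclusive window bounds in read coordinates.
struct Window {
    int begin;
    int end;
};

// Place the (2m+1)-base window for the base at 1-based position 'pos' in a
// read of length 'size'.
inline Window
mismatch_window(const int pos, const int m, const int size) {
    assert(size >= 1 && pos >= 1 && pos <= size && m >= 0);
    const int width = 2 * m + 1;
    // Assumption: a read no longer than the window uses the whole read.
    if (width >= size) return Window{1, size};
    int begin = pos - m;                                      // centered window
    if (begin < 1) begin = 1;                                 // clamp at read start
    if (begin + width - 1 > size) begin = size - width + 1;   // clamp at read end
    return Window{begin, begin + width - 1};
}

// Count mismatches in the window for 'pos'. 'is_mismatch[i]' flags read
// position i+1. 'indel_pos' holds 1-based read positions of indels, and each
// one inside the window adds a single mismatch.
inline int
count_window_mismatches(const std::vector<bool>& is_mismatch,
                        const std::vector<int>& indel_pos,
                        const int pos, const int m) {
    const int size = static_cast<int>(is_mismatch.size());
    const Window w = mismatch_window(pos, m, size);
    int count = 0;
    for (int i = w.begin; i <= w.end; ++i) {
        if (is_mismatch[i - 1]) ++count;  // the present base is included
    }
    for (const int p : indel_pos) {
        if (p >= w.begin && p <= w.end) ++count;
    }
    return count;
}

} // namespace blt_sketch_window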
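

// Illustrative sketch (not part of blt): the Q(snp) and Q(max_gtype) event
// columns are phred-scaled probabilities. This sketch assumes the standard
// phred transform Qphred(p) = -10*log10(p). The usage text does not say
// whether blt rounds or caps the value, so this sketch does neither. All
// names are hypothetical.

#include <cassert>
#include <cmath>
#include <limits>

namespace blt_sketch_qphred {

// Phred-scale a probability p in [0,1]. p == 0 maps to +infinity.
inline double
qphred(const double p) {
    assert(p >= 0.0 && p <= 1.0);
    if (p <= 0.0) return std::numeric_limits<double>::infinity();
    return -10.0 * std::log10(p);
}

// Q(snp): Qphred of the reference allele posterior probability.
inline double
q_snp(const double ref_posterior) {
    return qphred(ref_posterior);
}

// Q(max_gtype): Qphred of one minus the highest genotype posterior.
inline double
q_max_gtype(const double max_gtype_posterior) {
    return qphred(1.0 - max_gtype_posterior);
}

} // namespace blt_sketch_qphred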
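

// Illustrative sketch (not part of blt): two rules from the "Coverage
// reports" text. First, a read's basecalls count toward coverage except for
// a continuous run of 'N's at the end of the read. Second, ALLSITES_COVERAGE*
// is taken over every position in the begin-end report range, while
// NO_REF_N_COVERAGE* skips positions whose reference base is 'N'. The usage
// text does not say which statistics blt prints, so reporting a mean depth is
// an assumption. The function names and data layout are hypothetical.

#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace blt_sketch_coverage {

// Number of leading basecalls of 'read' that contribute to coverage, which is
// everything except a trailing run of 'N's.
inline std::size_t
coverage_length(const std::string& read) {
    std::size_t len = read.size();
    while (len > 0 && read[len - 1] == 'N') --len;
    return len;
}

struct MeanCoverage {
    double allsites;   // mean depth over all positions in [begin,end]
    double no_ref_n;   // mean depth over positions where ref != 'N'
};

// 'depth[i]' and 'ref[i]' describe reference position i+1. 'begin' and 'end'
// give the 1-based, inclusive report range.
inline MeanCoverage
mean_coverage(const std::vector<unsigned>& depth, const std::string& ref,
              const int begin, const int end) {
    assert(begin >= 1 && begin <= end);
    assert(depth.size() == ref.size());
    assert(static_cast<std::size_t>(end) <= depth.size());
    double all_sum = 0.0, non_n_sum = 0.0;
    int all_count = 0, non_n_count = 0;
    for (int pos = begin; pos <= end; ++pos) {
        const unsigned d = depth[pos - 1];
        all_sum += d;
        ++all_count;
        if (ref[pos - 1] != 'N') {
            non_n_sum += d;
            ++non_n_count;
        }
    }
    MeanCoverage result;
    result.allsites = all_sum / all_count;
    // A range made only of reference 'N's has no eligible positions.
    result.no_ref_n = (non_n_count > 0) ? non_n_sum / non_n_count : 0.0;
    return result;
}

} // namespace blt_sketch_coverage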
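

// Illustrative sketch (not part of blt): a reader for the counts file format
// described in the usage text. Lines starting with '#' are comments. Every
// other line holds six integer fields: position, used A, used C, used G,
// used T, and unused basecalls. operator>> splits on any whitespace. That
// accepts the documented tab delimiter but is more permissive than strictly
// required. The struct, the function names, skipping empty lines, and
// returning false on a malformed line are all hypothetical choices.

#include <cassert>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

namespace blt_sketch_counts {

struct CountsRow {
    long pos;
    unsigned used_a, used_c, used_g, used_t;
    unsigned unused;

    // Total basecalls (used plus unused) counted at this position.
    unsigned total() const { return used_a + used_c + used_g + used_t + unused; }
};

// Append every data row in 'is' to 'rows'. Returns false at the first line
// that does not hold exactly six integer fields.
inline bool
read_counts(std::istream& is, std::vector<CountsRow>& rows) {
    std::string line;
    while (std::getline(is, line)) {
        if (line.empty() || line[0] == '#') continue;  // comment lines
        std::istringstream fields(line);
        CountsRow r;
        if (!(fields >> r.pos >> r.used_a >> r.used_c >> r.used_g
                     >> r.used_t >> r.unused)) return false;
        std::string extra;
        if (fields >> extra) return false;  // more than six fields
        rows.push_back(r);
    }
    return true;
}

} // namespace blt_sketch_counts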