NAME

validateSubProject.pl


SYNOPSIS

  validateSubProject.pl [options] <gapdirs.txt> <libinfo.txt> <sffinfo.txt>
  Options:
  -o            Output validation summary file (optional; default defined in gapRes.config file)
  -log <file>   Log file (optional; default is validateSubProject.pl.log)
  -warn <file>  Warning file (optional; default is defined in gapRes.config)
  -h            Detailed message (optional)


DESCRIPTION

This is a wrapper program for the Gap Resolution subsystem. It validates each sub project to determine whether its gap was successfully closed.

Unless otherwise noted, "anchor" here refers to the anchor sequence obtained from the left and right contigs of the gap prior to reassembly. The anchors are obtained upstream using idContigRepeats.pl.

The following tasks are done for each sub project:

  1. Create a contig orientation file of the assembly in the assembly directory
     using the 454Scaffolds.txt file.
  2. Create a read pairing info file (readinfo.txt) in the assembly directory
     from the ace file.
  3. Validate the gap assembly by calling validateGapAssembly.pl. This creates
     a validinfo.txt file within the sub project directory containing information
     pertaining to the validation of the assembly. For more info, refer to
     validateGapAssembly.pl's documentation.

For more information on the validation rules, see the documentation on validateGapAssembly.pl.

The format of the validinfo.txt file is as follows:

  leftAnchorContig=name of contig
  leftAnchorContigLength=number
  leftAnchorStart=number
  leftAnchorEnd=number
  rightAnchorContig=name of contig
  rightAnchorContigLength=number
  rightAnchorStart=number
  rightAnchorEnd=number
  anchorStart=number
  anchorEnd=number
  anchorDistance=number
  gapSize=number
  gapSizeStdDev=number
  numConsistentReadPairs=number
  numInconsistentReadPairs=number
  pctConsistent=number
  avgConsensusQualityBetweenAnchors=number
  isDistanceValid=0|1
  isReadPairingValid=0|1
  isQualityValid=0|1
  status=PASS|FAIL
  doPrimerDesign=0|1
  comment=comment entry
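
As an illustration of the key=value layout above, here is a minimal Python sketch that reads validinfo.txt content into a dictionary. The function name parse_validinfo and the sample values are hypothetical and not part of the Gap Resolution scripts. The sketch assumes one key=value pair per line and leaves all values as strings.

```python
def parse_validinfo(text):
    """Parse key=value lines into a dict; values are kept as strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and anything that is not a key=value pair.
        if not line or "=" not in line:
            continue
        # Split on the first '=' only, so a comment may itself contain '='.
        key, value = line.split("=", 1)
        info[key.strip()] = value.strip()
    return info


# Hypothetical sample content, for illustration only.
sample = "status=PASS\nisDistanceValid=1\ncomment=example comment\n"
info = parse_validinfo(sample)
print(info["status"], info["isDistanceValid"] == "1")
```

Splitting on the first "=" only is a deliberate choice so that free-text comment entries are not truncated.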

The format of the contigOrientation.txt file is as follows, with each item separated by a tab, one entry per line:

  1. contigName
  2. orientation (+|-)
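
The two-column layout above could be read as follows. This is a sketch only: the function name parse_contig_orientation and the contig names in the sample are hypothetical. It assumes exactly two tab-separated fields per line.

```python
def parse_contig_orientation(text):
    """Map each contig name to its orientation ('+' or '-')."""
    orientations = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Each entry is: contigName <tab> orientation
        name, orientation = line.split("\t")
        if orientation not in ("+", "-"):
            raise ValueError("unexpected orientation: %r" % orientation)
        orientations[name] = orientation
    return orientations


# Hypothetical contig names, for illustration only.
sample = "contigA\t+\ncontigB\t-\n"
print(parse_contig_orientation(sample))
```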

A validationSummary.txt file is created in the working directory containing the validation status of each of the sub project directories. The format of the file is as follows, with each entry separated by a tab:

  1. full path of sub project directory
  2. status (PASS or FAIL)
  3. comments
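
A downstream consumer of the summary file might tally the results as in the sketch below. The function name summarize and the sample paths are hypothetical. The sketch assumes the three tab-separated columns listed above and tolerates a missing comment column.

```python
from collections import Counter


def summarize(text):
    """Return (status counts, list of (path, comment) for failed sub projects)."""
    counts = Counter()
    failed = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        path, status = fields[0], fields[1]
        # The comment column may be empty or absent.
        comment = fields[2] if len(fields) > 2 else ""
        counts[status] += 1
        if status == "FAIL":
            failed.append((path, comment))
    return counts, failed


# Hypothetical paths and comments, for illustration only.
sample = "/work/gap1\tPASS\t\n/work/gap2\tFAIL\texample comment\n"
counts, failed = summarize(sample)
print(dict(counts), failed)
```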


DEPENDENCIES

The following scripts (configurable in the config file) must reside in the same directory as validateSubProject.pl unless their paths are defined in the config file:

  * newblerAce2ReadPair.pl
  * validateGapAssembly.pl

The input files used by validateSubProject.pl are described below:

  * gapdirs.txt - list of gap directories created by createSubProject.pl
  * libinfo.txt - library insert size and std dev file created by parseNewblerMetrics.pl
  * sffinfo.txt - location of sff files created by parseNewblerMetrics.pl

The following files are created or must exist in each sub project directory:

  * 454Scaffolds.txt - created by Newbler in <sub project directory>/Newbler/assembly.
  * contigOrientation.txt - created in <sub project directory>/Newbler/assembly
    from the 454Scaffolds.txt file. The format is contig, tab, orientation (+/-),
    one entry per line.
  * readinfo.txt - created in <sub project directory>/Newbler/assembly by
    newblerAce2ReadPair.pl.
  * scaffinfo.txt - must exist within the sub project directory (created
    elsewhere by createSubProject.pl).

For more information regarding the formats of these files, refer to the documentation of the scripts that create them.

A default config file named gapRes.config residing in <installPath>/config is used to specify the following parameters:

(configurable)

  validateSubProject.anchorSeqMinAlignPercentIdentity
    The anchor sequences are aligned to the reference sequence to determine the
    positions of the anchors in the assembly. Use this to specify the minimum
    alignment percent identity.
  validateSubProject.anchorSeqMinAlignmentLength
    The anchor sequences are aligned to the reference sequence to determine the
    positions of the anchors in the assembly. Use this to specify the minimum
    alignment length.
  
  validateSubProject.percentGapSizedPaddingForValidation
    If the left and right anchors are aligned to the same contig in the assembly,
    the distance between the anchors must be within the gap size +/- a standard
    deviation to be considered valid. This parameter is used to compute the gap
    size standard deviation, represented as a percentage of the gap size (e.g.,
    std dev = gap size * percent gap size padding). The gap size is determined
    from the required pre-existing file scaffinfo.txt (previously generated by
    createSubProject.pl).
   
  validateSubProject.percentReadPairConsistency
    When validating the read pairs for consistency, the percentage of consistent
    read pairs (valid read pairs/invalid read pairs) must be >= the threshold
    defined by this parameter.  Read pairs are considered valid if they meet all
    of the following criteria: 1) both reads are located on the same contig, 2)
    the distance between the reads is within the library insert size +/- standard
    deviation, 3) the reads are oriented toward each other.
  
  validateSubProject.libInsertSizeStdDevMultiplier
    One of the criteria for validating read pairs for consistency is determining
    whether the distance between the reads is within the library insert size +/-
    standard deviation. This parameter is used as a standard deviation multiplier
    in order to allow for the configuration of the library insert size range.
  
  validateSubProject.minAvgConsensusQualityBetweenAnchors
    One of the criteria for validating a sub project is that the average consensus
    quality between the anchors must be >= a threshold.  This parameter is used
    to specify this threshold.
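
The checks described by the parameters above can be sketched in Python as follows. This is not the code used by validateGapAssembly.pl, and all function and argument names are hypothetical. Assumptions: the gap size padding is passed as a fraction (0.2 for 20%), the gap size is non-negative, the multiplier scales the library standard deviation, and the consistency percentage is taken as consistent pairs over all pairs. That last point differs from the ratio written in the parameter description and is a guess, so check it against validateGapAssembly.pl's documentation.

```python
def is_distance_valid(anchor_distance, gap_size, pct_padding):
    """Anchor distance must lie within gap size +/- std dev."""
    # std dev = gap size * percent gap size padding (fraction assumed)
    std_dev = gap_size * pct_padding
    return gap_size - std_dev <= anchor_distance <= gap_size + std_dev


def is_read_pair_consistent(same_contig, facing, distance,
                            insert_size, insert_std_dev, multiplier):
    """A pair is consistent only if all three criteria hold."""
    # Allowed deviation from the library insert size (assumed form).
    tolerance = insert_std_dev * multiplier
    within = insert_size - tolerance <= distance <= insert_size + tolerance
    return same_contig and facing and within


def pct_consistent(n_consistent, n_inconsistent):
    """Percentage of consistent pairs over all pairs (an assumption)."""
    total = n_consistent + n_inconsistent
    return 100.0 * n_consistent / total if total else 0.0


def is_quality_valid(avg_quality, min_avg_quality):
    """Average consensus quality between anchors must meet the threshold."""
    return avg_quality >= min_avg_quality


# Hypothetical numbers, for illustration only.
print(is_distance_valid(1100, 1000, 0.2))
print(is_read_pair_consistent(True, True, 3200, 3000, 300, 1.0))
print(pct_consistent(9, 1) >= 80)
print(is_quality_valid(45.0, 40.0))
```

Keeping each rule in its own function mirrors the separate isDistanceValid, isReadPairingValid and isQualityValid flags in validinfo.txt.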
  
(system configuration)
  validateSubProject.assemblyDirectory
  validateSubProject.validateGapAssemblyOutputFile
  validateSubProject.assemblyContigsFasta
  validateSubProject.assemblyContigsQual
  validateSubProject.validationSummaryFile
  validateSubProject.createDbFileExecutable
  validateSubProject.aligner
  validateSubProject.alignerParameters


VERSION

$Revision$

$Date$


AUTHOR(S)

Stephan Trong


HISTORY