validateSubProject.pl
validateSubProject.pl [options] <gapdirs.txt> <libinfo.txt> <sffinfo.txt>
Options:
    -o            Output validation summary file (optional; default is defined in gapRes.config)
    -log <file>   Log file (optional; default is validateSubProject.pl.log)
    -warn <file>  Warning file (optional; default is defined in gapRes.config)
    -h            Detailed message (optional)
This is a wrapper program for the Gap Resolution subsystem. It validates each sub project to determine whether its gap was successfully closed.
Unless otherwise noted, "anchor" here refers to the anchor sequence obtained from the left and right contigs of the gap prior to reassembly. This step is performed upstream by idContigRepeats.pl.
The following tasks are done for each sub project:
1. Create a contig orientation file of the assembly in the assembly directory using the 454Scaffolds.txt file.
2. Create a read pairing info file (readinfo.txt) in the assembly directory from the ace file.
3. Validate the gap assembly by calling validateGapAssembly.pl. This creates a validinfo.txt file within the sub project directory containing information pertaining to the validation of the assembly.
For more information on the validation rules, see the documentation on validateGapAssembly.pl.
The format of the validinfo.txt file is as follows:
leftAnchorContig=name of contig
leftAnchorContigLength=number
leftAnchorStart=number
leftAnchorEnd=number
rightAnchorContig=name of contig
rightAnchorContigLength=number
rightAnchorStart=number
rightAnchorEnd=number
anchorStart=number
anchorEnd=number
anchorDistance=number
gapSize=number
gapSizeStdDev=number
numConsistentReadPairs=number
numInconsistentReadPairs=number
pctConsistent=number
avgConsensusQualityBetweenAnchors=number
isDistanceValid=0|1
isReadPairingValid=0|1
isQualityValid=0|1
status=PASS|FAIL
doPrimerDesign=0|1
comment=comment entry
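As an illustration only, a validinfo.txt file could be read back into a lookup table as sketched below. This assumes one key=value entry per line; the function name parse_validinfo and the sample values are hypothetical and are not part of the Gap Resolution scripts.

```python
def parse_validinfo(text):
    """Parse key=value lines into a dict; all values are kept as strings."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        # Split on the first '=' only, so a comment may itself contain '='.
        key, value = line.split("=", 1)
        info[key.strip()] = value.strip()
    return info


# Made-up sample content, for illustration only.
sample = """\
leftAnchorContig=contig00001
gapSize=250
isDistanceValid=1
status=PASS
comment=
"""

info = parse_validinfo(sample)
print(info["status"], info["gapSize"])
```

Values are left as strings so that the caller decides how to convert numeric fields and the 0|1 flags.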
The format of the contigOrientation.txt file is as follows, with each item separated by a tab, one entry per line:
1. contigName
2. orientation (+|-)
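The sketch below shows one way such a file could be derived from a scaffolds file. It assumes that 454Scaffolds.txt uses an AGP-style tab-separated layout (component type in column 5, contig name in column 6, orientation in column 9); that layout, the function name contig_orientations, and the sample rows are assumptions made for illustration, not a description of how validateSubProject.pl is implemented.

```python
def contig_orientations(scaffold_lines):
    """Yield (contigName, orientation) for each contig row.

    Assumption: AGP-style columns, where column 5 is 'W' for a contig
    and 'N' for a gap, column 6 is the contig name, and column 9 is
    the orientation.
    """
    for line in scaffold_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[4] != "W":
            continue  # skip gap rows and short or malformed rows
        yield fields[5], fields[8]


# Made-up sample rows, for illustration only.
sample = [
    "scaffold00001\t1\t500\t1\tW\tcontig00001\t1\t500\t+\n",
    "scaffold00001\t501\t600\t2\tN\t100\tfragment\tyes\n",
    "scaffold00001\t601\t900\t3\tW\tcontig00002\t1\t300\t-\n",
]

# Emit the two tab-separated columns described above, one entry per line.
for name, orientation in contig_orientations(sample):
    print(f"{name}\t{orientation}")
```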
A validationSummary.txt file is created in the working directory containing the validation status of each of the sub project directories. The format of the file is as follows, with each entry separated by a tab:
1. full path of sub project directory
2. status (PASS or FAIL)
3. comments
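A downstream script might tally the results in the summary file roughly as follows. This is a minimal sketch that assumes one tab-separated entry per line in the three-column layout described above; read_summary and the sample paths are hypothetical.

```python
from collections import Counter


def read_summary(lines):
    """Return a list of (path, status, comment) tuples from summary lines."""
    rows = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        parts = line.split("\t")
        path = parts[0]
        status = parts[1] if len(parts) > 1 else ""
        comment = parts[2] if len(parts) > 2 else ""  # comment may be absent
        rows.append((path, status, comment))
    return rows


# Made-up sample entries, for illustration only.
sample = [
    "/work/gap1\tPASS\t\n",
    "/work/gap2\tFAIL\tanchor distance out of range\n",
    "/work/gap3\tPASS\n",
]

rows = read_summary(sample)
counts = Counter(status for _, status, _ in rows)
print(counts["PASS"], counts["FAIL"])
```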
The following scripts (configurable in config file) must exist in the same path as validateSubProject.pl unless the path to the script is defined in the config file:
* newblerAce2ReadPair.pl
* validateGapAssembly.pl
The input files used by validateSubProject.pl are described below.
* gapdirs.txt - list of gap directories created by createSubProject.pl
* libinfo.txt - library insert size and std dev file created by parseNewblerMetrics.pl
* sffinfo.txt - location of sff files created by parseNewblerMetrics.pl
The following files are created or must exist in each sub project directory:
* 454Scaffolds.txt - created by Newbler in the sub project directory + Newbler/assembly.
* contigOrientation.txt - created within the sub project directory + Newbler/assembly using the 454Scaffolds.txt file. The format is contig + tab + orientation(+/-), one per line.
* readinfo.txt - created within the sub project directory + Newbler/assembly using newblerAce2ReadPair.pl.
* scaffinfo.txt - must exist within the sub project directory (created elsewhere using createSubProject.pl).
For more information on the formats of these files, refer to the documentation of the scripts that create them.
A default config file named gapRes.config residing in <installPath>/config is used to specify the following parameters:
(configurable)
validateSubProject.anchorSeqMinAlignPercentIdentity
    The anchor sequences are aligned to the reference sequence to determine the positions of the anchors in the assembly. Use this to specify the minimum alignment percent identity.

validateSubProject.anchorSeqMinAlignmentLength
    The anchor sequences are aligned to the reference sequence to determine the positions of the anchors in the assembly. Use this to specify the minimum alignment length.

validateSubProject.percentGapSizedPaddingForValidation
    If the left and right anchors are aligned to the same contig in the assembly, the distance between the anchors must be within the gap size +/- a standard deviation to be considered valid. This parameter is used to compute the gap size standard deviation, represented as a percentage of the gap size (e.g., std dev = gap size * percent gap size padding). The gap size is determined from the required pre-existing file scaffinfo.txt (previously generated by createSubProject.pl).

validateSubProject.percentReadPairConsistency
    When validating the read pairs for consistency, the percentage of consistent read pairs (valid read pairs/invalid read pairs) must be >= the threshold defined by this parameter. A read pair is considered valid if it meets all of the following criteria: 1) both reads are located on the same contig, 2) the distance between the reads is within the library insert size +/- standard deviation, 3) the reads are oriented toward each other.

validateSubProject.libInsertSizeStdDevMultiplier
    One of the criteria for validating read pairs for consistency is whether the distance between the reads of a pair is within the library insert size +/- standard deviation. This parameter is a standard deviation multiplier that allows the library insert size range to be configured.

validateSubProject.minAvgConsensusQualityBetweenAnchors
    One of the criteria for validating a sub project is that the average consensus quality between the anchors must be >= a threshold. This parameter specifies that threshold.

(system configuration)
validateSubProject.assemblyDirectory
validateSubProject.validateGapAssemblyOutputFile
validateSubProject.assemblyContigsFasta
validateSubProject.assemblyContigsQual
validateSubProject.validationSummaryFile
validateSubProject.createDbFileExecutable
validateSubProject.aligner
validateSubProject.alignerParameters
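The checks governed by the configurable parameters above can be sketched as follows. This is a hedged illustration of the rules as worded in this document, not the code in validateGapAssembly.pl: the function names and sample numbers are hypothetical, and the padding value is assumed to be expressed as a fraction (0.1 for 10 percent).

```python
def is_distance_valid(anchor_distance, gap_size, percent_padding):
    """Anchor distance must lie within gap size +/- the derived std dev."""
    # std dev = gap size * percent gap size padding (padding assumed fractional)
    std_dev = abs(gap_size * percent_padding)
    return abs(anchor_distance - gap_size) <= std_dev


def is_read_pair_valid(same_contig, facing_each_other, distance,
                       insert_size, insert_std_dev, std_dev_multiplier):
    """A pair is valid only if all three criteria hold."""
    allowed = insert_std_dev * std_dev_multiplier  # widened insert size range
    within_range = abs(distance - insert_size) <= allowed
    return bool(same_contig and facing_each_other and within_range)


def is_quality_valid(avg_consensus_quality, min_avg_quality):
    """Average consensus quality between anchors must meet the threshold."""
    return avg_consensus_quality >= min_avg_quality


# Made-up numbers, for illustration only.
print(is_distance_valid(260, 250, 0.1))
print(is_read_pair_valid(True, True, 3100, 3000, 300, 2))
print(is_quality_valid(38.5, 40))
```

Keeping each rule in its own function mirrors the separate isDistanceValid, isReadPairingValid, and isQualityValid flags reported in validinfo.txt.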
$Revision$
$Date$
Stephan Trong
S.Trong 2008/12/05 creation
S.Trong 2009/08/05
- Split validation of sub projects from the creation of fakes. Fakes creation is now handled by createSubProjectFakes.pl.
- If a sub project fails during analysis, skip it instead of generating a fatal error. The error is reported in a .warnings file.
S.Trong 2008/12/29 - added -log and -warn options.