idContigRepeats.pl - Identifies the unique/repeat boundary for contigs flanking a gap and generates unique sequence tags that can be used to check for gap closure.
idContigRepeats.pl [-h] -scaff <454 scaff file> -ace <acefile> -subdirs <list> -lib <lib info file> -od <path>
Options:
  -h        detailed message (optional)
  -scaff    454Scaffolds.txt file
  -lib      library info file
  -ace      path to acefile (required)
  -od       output directory (required) (where assemInfo and gapdirs reside)
  -subdirs  list of gap subdirectories (required)

  *subdirs should contain a file detailing the following in tab-delimited format:
    1. subproject name
    2. est. gap size
    3. left contig size
    4. left contig name
    5. right contig size
    6. right contig name
    7. scaffold name
This program is a wrapper that calls several components to identify the unique/repeat boundary for contigs in a subproject. This boundary is used by subsequent software to identify pools of reads for use in reassembly, and for primer design. A subproject represents the contig information flanking a single gap in a scaffold and should contain a file that details the following in tab-delimited format:
  1. subproject name
  2. est. gap size
  3. left contig size
  4. left contig name
  5. right contig size
  6. right contig name
  7. scaffold name
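The seven-column record described above could be read along the following lines. This is only an illustrative Python sketch, not code from idContigRepeats.pl: the names GapRecord and parse_gap_record are hypothetical, and it assumes that the gap and contig sizes are integers and that each record sits on one line.

```python
from typing import NamedTuple


class GapRecord(NamedTuple):
    """One subproject record; field order follows the seven columns listed above."""
    subproject: str
    est_gap_size: int
    left_contig_size: int
    left_contig_name: str
    right_contig_size: int
    right_contig_name: str
    scaffold: str


def parse_gap_record(line: str) -> GapRecord:
    """Split one tab-delimited line into a GapRecord (hypothetical helper)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 7:
        raise ValueError(f"expected 7 tab-delimited fields, got {len(fields)}")
    # Assumption: sizes are whole numbers; names are kept as plain strings.
    return GapRecord(
        subproject=fields[0],
        est_gap_size=int(fields[1]),
        left_contig_size=int(fields[2]),
        left_contig_name=fields[3],
        right_contig_size=int(fields[4]),
        right_contig_name=fields[5],
        scaffold=fields[6],
    )
```

Rejecting lines that do not have exactly seven fields makes a malformed subproject file fail early rather than producing shifted columns.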
The inputs to this program are the Newbler 454Scaffolds.txt file, the library info file, the acefile, and a list of subdirectories. As described above, each subdirectory should contain a text file detailing the contig information flanking a gap. The 454Scaffolds.txt file is used to reverse complement an individual contig's fasta sequence if the contig is in the negative orientation in the scaffold.
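The reverse-complement step can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes the orientation has already been read from the scaffold file as "+" or "-"; how the real components extract it is not shown here.

```python
# Translation table for the four bases plus N, in upper and lower case.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def orient_contig(seq: str, orientation: str) -> str:
    """Return seq unchanged for '+', reverse complemented for '-'.

    Assumption: orientation is encoded as '+' or '-'.
    """
    return reverse_complement(seq) if orientation == "-" else seq
```

Characters outside the table (for example IUPAC ambiguity codes) pass through uncomplemented in this sketch, so a fuller version would extend the table.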
#ace2contigs
Fasta and Qual sequence for the -ace <acefile> is generated by calling ace2contigs and is deposited in a configurable location (see config section below). (See component help menu for further details.)

#fasta2MegaBlastDb.pl
The Fasta sequence for the acefile is then used to create a blast database by calling fasta2MegaBlastDb.pl. The database is deposited in the same location as the acefile Fasta seq. (See component help menu for further details.)

#fastaParser.pl
The scaffInfo.txt file in each subdirectory listed in -subdirs <list> is parsed and fastaParser.pl is used to create the Fasta sequence for each contig. If the contig is in the - orientation it will be reverse complemented. (See component help menu for further details.)

#idRepeatBoundary.pl
The blast database and contig fastas are then used by idRepeatBoundary.pl. Each contig fasta is aligned to the database and the results are parsed for repeats that meet configurable thresholds. A configurable sliding window is used to check for the presence of a repeat nearest the gap. If a repeat is identified, the window keeps sliding away from the gap until a configurable amount of unique sequence is found. This defines the unique/repeat boundary that is used to determine which data can be trusted for reassembly and for primer design. (See component help menu for further details.)
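The sliding-window idea can be sketched roughly as follows. This is a deliberately simplified Python illustration, not the algorithm implemented by idRepeatBoundary.pl. It assumes the gap lies at position 1 of the contig, that repeats are already available as 1-based inclusive (start, end) intervals, and that "enough unique sequence" means a whole window free of repeats. The function name find_unique_start is hypothetical, and the sub-window, identity, anchor and padding parameters are ignored.

```python
from typing import List, Optional, Tuple


def find_unique_start(
    contig_len: int,
    repeats: List[Tuple[int, int]],
    window: int = 500,
    step: int = 250,
) -> Optional[int]:
    """Slide a window away from a gap at position 1 until it overlaps no repeat.

    Returns the 1-based start of the first repeat-free window, or None
    if every window overlaps a repeat.
    """
    start = 1
    while start <= contig_len:
        end = min(start + window - 1, contig_len)
        # Two inclusive intervals overlap when each starts before the other ends.
        overlaps = any(r_start <= end and r_end >= start for r_start, r_end in repeats)
        if not overlaps:
            return start
        start += step
    return None
```

With a 2000 bp contig and a single repeat at 1..300, the windows 1..500 and 251..750 both touch the repeat, so this sketch reports 501 as the first unique window start.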
The output is a subDirectory/<contigname>.boundary file and a subDirectory/<contigname>.anchor file deposited in each subdirectory in -subdirs <list>.
<contigname>.boundary:
  #uniqueStart uniqueEnd repeatStart repeatEnd
  51 39630 1 50

<contigname>.anchor:
  >contig00013
  GTCGAGCGGGATGGTGCCGGTCTCGCCGATCCCCTGCCACGCCACCGTCC
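Downstream tools would need to read these two files back. A minimal Python sketch is given here, assuming the .boundary file holds a "#" header line followed by one whitespace-separated data line, and that the .anchor file is a single-record fasta. The function names are hypothetical.

```python
from typing import Dict, Tuple


def parse_boundary(text: str) -> Dict[str, int]:
    """Return the four coordinates from the first non-comment line."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and the column header
        unique_start, unique_end, repeat_start, repeat_end = map(int, line.split())
        return {
            "unique_start": unique_start,
            "unique_end": unique_end,
            "repeat_start": repeat_start,
            "repeat_end": repeat_end,
        }
    raise ValueError("no data line found in boundary text")


def parse_anchor(text: str) -> Tuple[str, str]:
    """Return (contig name, anchor sequence) from a single-record fasta."""
    lines = [raw.strip() for raw in text.splitlines() if raw.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("anchor text does not start with a fasta header")
    return lines[0][1:], "".join(lines[1:])
```

Applied to the sample output above, the anchor sequence is 50 bases long, which matches the idRepeatBoundary.uniqueAnchorLength=50 setting shown in the config section.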
A default config file named gapRes.config residing in <installDir>/config is used to specify the name and location of the software components as well as options for each component. To specify your own config file, set the environment variable GAP_RES_CONFIG to the path and name of the custom config file.
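The lookup order described here (custom file from GAP_RES_CONFIG, otherwise the default under the install directory) might be expressed as in the Python sketch below. The function name and the install_dir argument are hypothetical; only the variable name GAP_RES_CONFIG and the file name gapRes.config come from the text above.

```python
import os
from typing import Mapping, Optional


def resolve_config_path(install_dir: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Prefer GAP_RES_CONFIG when set and non-empty, else the default config."""
    env = os.environ if environ is None else environ
    custom = env.get("GAP_RES_CONFIG")
    if custom:
        return custom
    return os.path.join(install_dir, "config", "gapRes.config")
```

Passing the environment in as an argument keeps the sketch easy to exercise without touching the real process environment.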
The config parameters used by idContigRepeats.pl are as follows (components are identified by the "script." prefix):
  #
  # idContigRepeats.pl
  #
  script.createFastaFromAce=ace2contigs
  script.createRefBlastDb=fasta2MegaBlastDb.pl
  script.createContigFasta=parsefasta2
  script.identifyRepeatBoundary=idRepeatBoundary.pl
  script.blastLocation=/usr/X11R6/bin/

  #**NOTE**
  #The blast location can also be defined by the environmental
  #variable BLAST_LOC;
  #********

  idRepeatBoundary.aligner=blastall
  idContigRepeats.cleanupTmpFiles=1
  idContigRepeats.outputLogDir=assemInfo
  idContigRepeats.scaffFileName=scaffInfo.txt
  idContigRepeats.bondaryFileExt=.boundary
  idContigRepeats.anchorFileExt=.anchor

  #
  # fasta 2 aligner Db config
  #
  fasta2MegaBlastDb.formatDbOptions=-p F -o -i

  #
  # idRepeatBoundary.pl
  #
  idRepeatBoundary.blastOptions=-p blastn -F F -e 1e-5
  idRepeatBoundary.repeatLength=100
  idRepeatBoundary.repeatIdentity=95
  idRepeatBoundary.windowLength=500
  idRepeatBoundary.subWindowLength=100
  idRepeatBoundary.windowStep=250
  idRepeatBoundary.uniqueAnchorLength=50
  idRepeatBoundary.boundaryPadLength=50
  idRepeatBoundary.aligner=blastall

  #
  # ace2contigs
  #
  ace2contigs.outputFileName=454AllContigs
  ace2contigs.options=-q
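A config in this key=value style can be read with a few lines of Python. This is an illustrative sketch only: it assumes one setting per line and "#" comment lines, and the function name parse_config is hypothetical. The actual loader used by the gap resolution software may behave differently.

```python
from typing import Dict


def parse_config(text: str) -> Dict[str, str]:
    """Collect key=value pairs, skipping blank lines and '#' comments."""
    settings: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, so option strings stay intact.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key=value line; ignored in this sketch
        settings[key.strip()] = value.strip()
    return settings
```

Splitting on the first "=" only means a value such as "-p blastn -F F -e 1e-5" is kept as a single string for the component that consumes it.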
$Revision: 1.25 $
$Date: 2010-01-07 19:42:23 $
Kurt M. LaButti 2008/10/31 creation
S.Trong 2009/08/05 - added ability to skip and create warnings file if sub project fails.