NAME

getRepeatContig.pl


SYNOPSIS

  getRepeatContig.pl [options] <scaffinfo.txt> <readinfo.txt> <fastaOfAllContigs>
    <qualOfAllContigs> <outputContigListFile>
  Options:
  -rs <number>            Length of region from gap to look for read pairs (required)
  -contig <contig name>   Name of contig adjacent to the gap (required)
  -pos <L or R>           Position of contig (L-left, R-right) relative to gap (required)
  -min <number>           Minimum repeat contig length (required)
  -max <number>           Maximum repeat contig length (required)
  -rl <number>            Minimum number of linking reads between specified contig
                          and repeat contig.
  -od <dir>               Output directory to create the fasta and qual files (optional; default is current working directory)
  -rlog <file>            Name of file containing repeat contig reads (optional; for debugging purposes)
  -fastaExt <string>      File extension to name the repeat fasta file. Prefix is contig name (required)
  -qualExt <string>       File extension to name the repeat qual file. Prefix is contig name (required)
  -h                      Help message (optional)


DESCRIPTION

Part of the Gap Resolution subsystem, this software component identifies repeat contigs based on the read pairing information for the specified contig. A fasta file and a qual file are generated for each repeat contig, and a list of repeat contig names is written to the file specified in <outputContigListFile>. The following steps are performed to identify the repeat contigs for which fasta and qual files are created.

1. Look for paired reads that lie within the specified contig and within the region that extends from the start of the gap for the length specified using the -rs option. These reads must point towards the gap. The read pairs are obtained from the <readinfo.txt> file, and only read pairs with the status P (paired) or M (multiply placed) are used.

2. If a read's pair is in a different contig and scaffold, and at least N read links (specified using the -rl option) are present, mark the repeat contig for consideration in creating fasta and qual files. The repeat contig length must be >= the minimum repeat contig length (specified using the -min option) and <= the maximum repeat contig length (specified using the -max option).
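The region test in step 1 can be sketched as follows. This is an illustrative Python sketch, not the script's Perl code. It assumes 1-based inclusive read coordinates within the contig, that a contig left of the gap (-pos L) ends at the gap while a contig right of the gap (-pos R) begins at it, and that a read must lie entirely inside the region. The function name and the '+'/'-' strand encoding are hypothetical.

```python
def read_is_near_gap(read_start, read_end, strand, contig_len, pos, region_size):
    """Return True if a read lies within region_size bases of the gap
    and points toward it.

    Assumed conventions (not taken from the script): 1-based inclusive
    coordinates, and strand '+' meaning the read runs left to right.
    """
    if pos == "L":
        # Contig is left of the gap, so the gap-side region is its right end.
        in_region = read_start > contig_len - region_size
        toward_gap = strand == "+"
    elif pos == "R":
        # Contig is right of the gap, so the gap-side region is its left end.
        in_region = read_end <= region_size
        toward_gap = strand == "-"
    else:
        raise ValueError("pos must be 'L' or 'R'")
    return in_region and toward_gap
```

With region_size playing the role of the -rs value and pos the -pos value, only reads that pass this test would have their pairs examined in step 2.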

In summary, a repeat contig must meet all of the following criteria to be considered for creating the fasta and qual files:

a. The repeat contig length must be >= the minimum repeat contig length (specified using the -min option) and <= the maximum repeat contig length (specified using the -max option).

b. A minimum of N read links (specified using the -rl option) must exist between the specified contig and the repeat contig.

c. The repeat contig must be in a different scaffold than the specified contig.

d. The read pairs in the specified contig must be located within a specified distance from the gap (using the -rs option).
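The selection criteria can be illustrated with a minimal Python sketch. It is not the script's Perl implementation; the function name and the input structures (a list of linked contig names and a name-to-(scaffold, length) mapping) are hypothetical. It assumes the gap-distance criterion has already been applied, so every entry in links comes from a read pair that lies within the -rs region.

```python
from collections import Counter


def select_repeat_contigs(links, contig_info, target_scaffold,
                          min_len, max_len, min_links):
    """Return names of contigs that pass the repeat-contig criteria.

    links:        one contig name per qualifying read pair whose mate
                  landed in that contig (hypothetical input format)
    contig_info:  dict of contig name -> (scaffold name, contig length)
    """
    link_counts = Counter(links)
    selected = []
    for name, count in sorted(link_counts.items()):
        scaffold, length = contig_info[name]
        if scaffold == target_scaffold:
            continue  # must be in a different scaffold than the specified contig
        if count < min_links:
            continue  # too few read links (-rl)
        if not (min_len <= length <= max_len):
            continue  # length outside the -min/-max range
        selected.append(name)
    return selected


# Made-up example data: only "c2" passes every check.
info = {"c2": ("s2", 500), "c3": ("s1", 500),
        "c4": ("s3", 5000), "c5": ("s4", 400)}
pairs = ["c2", "c2", "c3", "c3", "c4", "c4", "c5"]
result = select_repeat_contigs(pairs, info, "s1", 100, 1000, 2)
print(result)  # ['c2']
```

In the example, c3 is rejected for sharing the specified contig's scaffold, c4 for exceeding the maximum length, and c5 for having a single read link.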

An <outputContigListFile> is created containing the names of the repeat contigs.

Description of inputs:

  scaffinfo.txt - scaffold info file generated by createSubProject.pl
  readinfo.txt - read pairing information file generated from newblerAce2ReadPair.pl
  fastaOfAllContigs - fasta file containing specified contig and all repeat contigs
  qualOfAllContigs - qual file containing specified contig and all repeat contigs
  outputContigListFile - name of file containing the list of repeat contigs described in step 2.


VERSION

$Revision: 1.7 $

$Date: 2009-08-28 22:57:07 $


AUTHOR(S)

Stephan Trong


HISTORY