#!/usr/bin/env perl

=head1 NAME

stitchClosedSubProjects.pl

=head1 SYNOPSIS

  stitchClosedSubProjects.pl [options] <454Scaffolds.txt> <454AllContigs.fna> 
                                       <454AllContigs.qual> <fakes_dir> 
                                       <gapdirs.txt> <outputFileName> 
    
  Options:
  -d        Debug mode. Prints additional information                (optional)
  -t        Timed Debug mode. Prints additional information          (optional)
  -h        Detailed help message                                    (optional)
  
=head1 DESCRIPTION

This software component is part of the Gap Resolution sub system that is
responsible for creating scaffold fasta and qual that represent the 
original assembly that Gap Resolution was run on with the fakes of any 
closed subprojects stitched in.

It is imperitive that the same parameters used to run Gap Resolution are used
to run this script (see below for explanation)

A default config file named gapRes.config residing in <installPath>/config is 
used to specify the following parameters:

(system configurable via config file)
idContigRepeats.scaffFileName
    =used to find the scaffold info file for a subproject, this specifies the
     contigs in the subproject
idContigRepeats.boundaryFileExtension
    =used to find the boundary files for each contig in a sub project.  The
     boundary files specify the position of the unique anchors in each contig.
idRepeatBoundary.uniqueAnchorLength
    =used in conjunction with the position of the anchors and the below 
     parameter to determine contig and fake start/end
createSubProjectFakes.trimConsensusSeqToThisManyBasesAwayFromAnchor
    =used in conjunction with the above parameter to determine contig and fake
     start and end positions.


From the 454Scaffolds.txt file an AGP like data structure is created which 
lists every feature for each scaffold in order.  A feature in this case can 
be a contig or a gap (which can later be modified to a fake). The size, 
orientation, as well as the start and end coordinates are recorded for each 
feature.

The agp coordinates are then modified for the features associated with each 
closed subproject with the orientation of the feature in the original assembly 
taken into account.  The fasta and qual seqs for each feature are then pulled
from the orginal 454AllContigs.fna|qual files, reverse complemented if 
necessary, and stitched together, resulting in scaffold fasta and quals that 
represent the original assembly with gaps closed.


Example:

This particular subproject has ctg1 and ctg2 with positions "a" to "b" and 
"c" to "d" respectively, with unique anchors with positions in the 
contigs x,y and x',y' respectively and a gap of size N between the contigs. 
See the documentation of idRepeatBoundary.pl for more information how the 
repeat/unique boundary and anchors are generated.


  ctg1                           gap     ctg2
a                             b        c                             d
|-----------------------------|        |-----------------------------|
                               NNNNNNNN
                       ===                 ===  <-unique anchors
                       x y                 x'y'
                       |.....................|
                           closed gap fake

stitched contiguous fasta:

a                   x-1 x                   y'y'+1                   d
|---------------------||.....................||----------------------|


Upong reassembly of the subproject data the gap is closed and a fake is 
generated from (x - trimConsensusSeqToThisManyBasesAwayFromAnchor) to 
(y' + trimConsensusSeqToThisManyBasesAwayFromAnchor).  For this example the 
trimConsensusSeqToThisManyBasesAwayFromAnchor=0.

The coordinates for each feature are modified as follows:

# modified left contig end (b) = (x-0)-1   
           ((anchor start - trimConsensusSeqToThisManyBasesAwayFromAnchor)-1)
# modified gap feature (gap of size N) = sequence of fake  replaces gap
           (x-y)
# modified right contig end (a)= 
           ((anchor end + trimConsensusSeqToThisManyBasesAwayFromAnchor_+1)

Pulling the sequence using the modified coordinates and stitching them together
represent a contiguous block of sequence from the closed subproject.



=head1 VERSION

$Revision: 1.11 $

$Date: 2010-03-06 11:44:35 $

=head1 AUTHOR(S)

Kurt M. LaButti

=head1 COPYRIGHT

DOE Joint Genome Institute Microbial Genomics

Copyright (C) 2008 The Regents of the University of California
All rights reserved.

NOTICE: The Government is granted for itself and others acting on its
behalf a paid-up, nonexclusive irrevocable worldwide license in this
data to reproduce, prepare derivative works, and perform publicly and
display publicly. Beginning five (5) years after permission to assert
copyright is granted, subject to two possible five year renewals, the
Government is granted for itself and others acting on its behalf a
paid-up, non-exclusive, irrevocable worldwide license in this data to
reproduce, prepare derivative works, distribute copies to the public,
perform publicly and display publicly, and to permit others to do so.
NEITHER THE UNITED STATES NOR THE UNITED STATES DEPARTMENT OF ENERGY,
NOR ANY OF THEIR EMPLOYEES, MAKES ANY WARRANTY, EXPRESS OR IMPLIED,
OR ASSUMES ANY LEGAL LIABILITY OR RESPONSIBILITY FOR THE ACCURACY,
COMPLETENESS, OR USEFULNESS OF ANY INFORMATION, APPARATUS, PRODUCT,
OR PROCESS DISCLOSED, OR REPRESENTS THAT ITS USE WOULD NOT INFRINGE
PRIVATELY OWNED RIGHTS.

=head1 HISTORY

=over

=item *

Kurt M. LaButti 2009/08/24 creation

=item *

=back

=cut

use strict;
use warnings;
use Pod::Usage;
use Getopt::Long;
use Carp;
use Carp qw(cluck);
use Cwd;
use Cwd qw(abs_path);
use File::Path;
use File::Basename;
use File::Temp qw(tempdir);
use File::Path 'rmtree';
use FindBin qw($RealBin);
use lib "$RealBin/../lib";
use PGF::GapResolution::ParseScaffInfo;
use PGF::GapResolution::ParseBoundaryFile;
use PGF::GapResolution::Coords qw(reverseCompCoords);
use PGF::GapResolution::Warnings;
use PGF::Newbler::Scaffolds454;
use PGF::Utilities::FastaSeq qw(formatSeq);
use PGF::Utilities::Logger;
use PGF::Utilities::RunProcess qw(runProcess);
use vars qw($optHelp $optDebug $optTimeDebug);

#============================================================================#
# INPUT VALIDATION
#============================================================================#
my $programExecution = abs_path(dirname($0))."/".basename($0)." @ARGV";
 
if( !GetOptions(
		"d"  =>\$optDebug,
		"t"  =>\$optTimeDebug,
                "h"  =>\$optHelp,
                )
    ) { 
    usage(); 
} 
 
usage(2) if $optHelp;
 
if ( @ARGV != 6 ) {
    print STDERR "Some arguments are missing.\n\n";
    usage(); 
} 
 
#============================================================================#
# INITIALIZE VARIABLES
#============================================================================#
my $scaffoldsTxtFile   = shift @ARGV;
my $origAllContigFasta = shift @ARGV;
my $origAllContigQual  = shift @ARGV;
my $fakesDir           = shift @ARGV;
my $gapDirsList        = shift @ARGV;
my $outputFileName     = shift @ARGV;
my $outputDir = getcwd;


my $DEBUG= $optDebug ? $optDebug : 0;
#my $OBJ_454SCAFF =  PGF::Newbler::Scaffolds454->new($scaffoldsTxtFile);
my $configFile = defined $ENV{GAPRES_CONFIG} ?
   $ENV{GAPRES_CONFIG} : "$RealBin/../config/gapRes.config";
my $OBJ_PROPS = PGF::Utilities::Properties->new(-configFile=>$configFile);
   $OBJ_PROPS->setExceptionIfEntryNotFound(1); # confess if entry in config file
                                               # is not found.
my $OBJ_LOGGER = PGF::Utilities::Logger->new();
my $OBJ_WARNINGS = PGF::GapResolution::Warnings->new(
    path=>$outputDir, logger=>$OBJ_LOGGER);

my $logfile = "$outputDir/".basename($0) . ".log";

#============================================================================#
# VALIDATE INPUTS
#============================================================================#
# Set path for logging.
#
setFileForLogging($logfile);

# Log execution into log file.
#
logExecution($programExecution);

my $errMsg = '';
if (!$scaffoldsTxtFile || !(-s $scaffoldsTxtFile)) {
    $errMsg .= "Cannot find 454Scaffolds.txt $scaffoldsTxtFile.\n";
}
if (!$gapDirsList || !(-s $gapDirsList) ) {
    $errMsg .= "Cannot find gapDirsList $gapDirsList.\n";
}
if (!$origAllContigFasta ) {
    $errMsg .= "Cannot find 454AllContigs.fna $origAllContigFasta.\n";
}
if (!$origAllContigQual ) {
    $errMsg .= "Cannot find 454AllContigs.qual $origAllContigQual.\n";
}
if (!$fakesDir || !(-d $fakesDir)) {
    $errMsg .= "The input fakes directory $fakesDir does not exist.\n";
}
if ($OBJ_PROPS->getProperty("createSubProjectFakes.shredClosedGapConsensus") == 1) { 
    $errMsg .= "According to your config file fakes for this project were shredded, cannot continue.\n";
}
if ($OBJ_PROPS->getProperty("createSubProjectFakes.trimConsensusSeqToThisManyBasesAwayFromAnchor") > 0) { 
    $errMsg .= "According to your config file fakes were not trimmed back to anchors, cannot continue.\n";
}


if ( length $errMsg ) {
    print STDERR $errMsg;
    logError($errMsg);
    exit 1;
}

#============================================================================#
# MAIN
#============================================================================#


# %agp is a hash of hash of hash as detailed below:
#
# $agp{scaffold#}{ordinal#}{name} = contigName, gap, or fakename (s#c#c#.a1)
#                          {size} = length of feature
#                          {start}= start of feature seq for agp
#                          {end}  = end of feature seq for agp 
#                          {ori}  = ori of the feature in the scaffold
# The first key is the scaffold name, the second key for each scaffold is the
# ordinal feature number.  A feature can be a contig, gap, or fake of closed
# gap.  The third set of keys for a particular feature in a particular 
# scaffold contains detailed info about that feature.
 
# read in fakes directory
#
my $A_fakes = readInFakes($fakesDir);  
print STDERR "# " . ($#$A_fakes+1) ." fakes have been found\n";

# exit if there are no fakes
#
if ($#$A_fakes < 0) {
    $errMsg = "No closed gap fakes are found in $fakesDir, cannot continue.\n";
    print STDERR $errMsg;
    logError($errMsg);
    exit 1;
}


# digest gapDirsList for gapDir locations
#
print STDERR "# parsing subproject locations...\n";
my $H_gapDirLocations = parseGapDirsList($gapDirsList);


# read in scaffolds.txt and build agp data structure
#
print STDERR "# building AGP from $scaffoldsTxtFile...\n";
my ($H_agpInfo, 
    $H_scaffSizes, 
    $H_contig2Ordinal) = populateAgp($scaffoldsTxtFile);


# debug print
#
printAgpHash($H_agpInfo, 
	     $H_scaffSizes, 
	     $H_contig2Ordinal) if ($DEBUG);


# modify agp information for each closed gap
#
print STDERR "# modifying AGP coords for closed subs...\n";
modifyAGP($A_fakes, 
	  $H_contig2Ordinal, 
	  $H_agpInfo, 
	  $H_gapDirLocations);


# debug print
#
printAgpHash($H_agpInfo, 
	     $H_scaffSizes, 
	     $H_contig2Ordinal) if ($DEBUG);


# create temp fasta dir
#
my $tmpDir = "stitch_temp";
$tmpDir = createDirectory($tmpDir);


# split fasta and quals into separate files in a temp dir
#
print STDERR "# splitting orginal fasta/quals to $tmpDir...\n";
splitQueryFasta("$origAllContigFasta", $tmpDir);
splitQueryFasta("$origAllContigQual",  $tmpDir);

#print stitched together contigs/scaffolds
print STDERR "# printing stitched scaffolds...\n";
printStitchedScaffolds($H_agpInfo,
		       $H_scaffSizes, 
		       $H_contig2Ordinal, 
		       $fakesDir, 
		       $outputFileName, 
		       $tmpDir);


#cleanup tmp dir if no debug
if (! $optDebug) {
print STDERR "# cleanup.\n";
my $cmd = "rm -rf $tmpDir";
system($cmd);
}


#
#if ( $OBJ_WARNINGS->fileExists($anchorFile) && $optAceFile ) {
#    runAceTagger($optAceFile, $anchorFile);
#}

# If warning messages present, then save in .warnings.out file.
#
#$OBJ_WARNINGS->createFile() if $OBJ_WARNINGS->getNumberOfWarnings;

exit 0;
    
#============================================================================#
# SUBROUTINES
#============================================================================#

sub printDate {
    my ($msg) = @_;
    my $date = `date`;
    chomp $date;
    print STDERR "time=$date $msg\n";
}

#============================================================================#
sub modifyAGP {
    my ($A_fakes, $H_contig2Ordinal, $H_agpInfo, $H_gapDirLocations) = @_;
 
    my $boundaryFileExtension = 
        $OBJ_PROPS->getProperty("idContigRepeats.boundaryFileExtension");
    my $anchorSize = 
	$OBJ_PROPS->getProperty("idRepeatBoundary.uniqueAnchorLength");
    my $scaffInfoFileName = 
	$OBJ_PROPS->getProperty("idContigRepeats.scaffFileName");
    my $trimLenFromAnchor = 
	$OBJ_PROPS->getProperty("createSubProjectFakes.trimConsensusSeqToThisManyBasesAwayFromAnchor");
    

    #loop through each fake and calculate agp coords
    foreach my $fake (@$A_fakes) {
	chomp $fake;
	my $gapDir = $fake;
	$gapDir =~ s/\.a1//;
	
	#scaffinfo object
	my $scaffoldInfoFile = "$H_gapDirLocations->{$gapDir}/$scaffInfoFileName";

	my $OBJ_SCAFFINFO = PGF::GapResolution::ParseScaffInfo->new($scaffoldInfoFile);
	
	#gap dir contig info
	my $scaffold       = $OBJ_SCAFFINFO->getScaffoldName();
	my $leftContig     = $OBJ_SCAFFINFO->getLeftContig();
	my $leftContigLen  = $OBJ_SCAFFINFO->getLeftContigLength();
	my $rightContig    = $OBJ_SCAFFINFO->getRightContig();
	my $rightContigLen = $OBJ_SCAFFINFO->getRightContigLength();
	my $rightContigOrdinal = $H_contig2Ordinal->{$rightContig};
	my $leftContigOrdinal  = $H_contig2Ordinal->{$leftContig};
	my $rightContigOri = $H_agpInfo->{$scaffold}{$rightContigOrdinal}{'ori'};
	my $leftContigOri  = $H_agpInfo->{$scaffold}{$leftContigOrdinal }{'ori'};
	
	print "$fake $scaffold $leftContig $leftContigLen $leftContigOri $rightContig $rightContigLen $rightContigOri\n" if ($DEBUG);    
	
	#construct boundary file names
	my $rightBoundaryFile = "$H_gapDirLocations->{$gapDir}/$leftContig"  . "$boundaryFileExtension";
	my $leftBoundaryFile  = "$H_gapDirLocations->{$gapDir}/$rightContig" . "$boundaryFileExtension";
	
	#get anchor coordinates using the boundary file for each contig
	my ($leftTagStart, $leftTagEnd)   = getAnchorCoords($rightBoundaryFile,
							    $leftContig,
							    $leftContigLen, 
							    $leftContigOri, 
							    "left",
							    $anchorSize);
	my ($rightTagStart, $rightTagEnd) = getAnchorCoords($leftBoundaryFile, 
							    $rightContig,
							    $rightContigLen, 
							    $rightContigOri, 
							    "right",
							    $anchorSize);
	
	print "$leftContig $leftContigOri $leftTagStart, $leftTagEnd\t$rightContig $rightContigOri $rightTagStart, $rightTagEnd\n" if $DEBUG;

	#calculate new agp coords using anchor coords,ori,and trim len from anchor fake was created w/ 
	#modify agp coords in data structure  note* if ctg is in - ori, then tag coordinates are end, start, not start, end
	if ($leftContigOri  =~ /\+/) {  	
	    my $leftContigAgpEnd = ($leftTagStart-1)-$trimLenFromAnchor;
	    $H_agpInfo->{$scaffold}{$leftContigOrdinal}{'end'} = $leftContigAgpEnd;
	} else {
	    my $leftContigAgpStart = ($leftTagEnd+1)+$trimLenFromAnchor;  
	    $H_agpInfo->{$scaffold}{$leftContigOrdinal}{'start'} = $leftContigAgpStart;
	}
	
	if ($rightContigOri  =~ /\+/) {  	
	    my $rightContigAgpStart = ($rightTagEnd+1)+$trimLenFromAnchor;
	    $H_agpInfo->{$scaffold}{$rightContigOrdinal}{'start'} = $rightContigAgpStart;
	} else {
	    my $rightContigAgpEnd = ($rightTagStart-1)-$trimLenFromAnchor;
	    $H_agpInfo->{$scaffold}{$rightContigOrdinal}{'end'} = $rightContigAgpEnd;
	}	

	#and change the gap "fragment" to the closed gap fake	
	$H_agpInfo->{$scaffold}{$leftContigOrdinal+1}{'name'}  = $fake;
	$H_agpInfo->{$scaffold}{$leftContigOrdinal+1}{'start'} = "all";
	$H_agpInfo->{$scaffold}{$leftContigOrdinal+1}{'end'}   = "all";
	$H_agpInfo->{$scaffold}{$leftContigOrdinal+1}{'size'}  = "all";


    }


}

#============================================================================#
sub getAnchorCoords {
    my ($boundaryFile, $contig, $contigLen, $contigOri, $side, $anchorSize) = @_;
    my $tagStart;
    my $tagEnd;

    #exit if boundary file doesn't exist
    if (! -s $boundaryFile ) {
	$errMsg = "$boundaryFile does not exist or is size 0 $!\n";
	print STDERR "$errMsg\n"; 
	logError($errMsg);
	exit 1;
    }
    
    #grab info from file using the boundary object
    my $OBJ_BOUNDARY = PGF::GapResolution::ParseBoundaryFile->new($boundaryFile);
    my $uniqueStart = $OBJ_BOUNDARY->getUniqueStartPosition();
    my $uniqueEnd = $OBJ_BOUNDARY->getUniqueEndPosition();
    my $anchorStart = $OBJ_BOUNDARY->getAnchorStart($side, $anchorSize);
    my $anchorEnd = $OBJ_BOUNDARY->getAnchorEnd($side, $anchorSize);
    
    $tagStart = ($contigOri =~ /\-/) ? reverseCompCoords($contigLen, $anchorStart) : $anchorStart;  
    $tagEnd   = ($contigOri =~ /\-/) ? reverseCompCoords($contigLen, $anchorEnd) : $anchorEnd;  


   print "$boundaryFile $contig $contigLen $contigOri $side [$uniqueStart $uniqueEnd $anchorSize] $anchorStart $anchorEnd $tagStart $tagEnd\n" if $DEBUG;

 
    #if contig is in - orientation then the calculated start is actually 
    #the end and vice versa WITH RESPECT to contig in ace
    if ($contigOri =~ /\-/) {
	return ($tagEnd, $tagStart);
    } else {
	return ($tagStart, $tagEnd);
    }
        
}

#============================================================================#
sub createDirectory {
    my $dir = shift;
    my $workingDir = getcwd."/$dir";
 
    if (-d $workingDir ) {
	my $cmd = "rm -rf $workingDir";
	system($cmd);
    }
    
    unless( mkdir( $workingDir, 0777 ) ) {
	print STDERR "ERROR: failed to create directory $workingDir $!\n";
    }

    return $workingDir;
}
 
#============================================================================#
sub splitQueryFasta {
    my ($multiFastaFile, $outputDir) = @_;
    my %res = ();
    my $tag = "";
    my @tags = ();
    my $ext = $multiFastaFile;
    $ext =~ s/.*\.//;
     
    open IFILE, "$multiFastaFile";
    $_ = <IFILE>;
    while (defined $_) {
        if ( defined $_ && /^\>(\S+)/ ) {
            $tag=$1;
            push @tags, $tag;
            $_=<IFILE>;
            while (defined $_ && !/^\>(\S+)/ ) {
                if ( /\w/ && !/^\#/ ) {
                    $res{$tag}.= uc $_;
                }
                $_=<IFILE>;
            }
            $tag=$1;
        } else {
            $_=<IFILE>;
        } 
    } 
    close IFILE;
     
     
    foreach my $tag (@tags) {
        my $file = "$outputDir/$tag".'.'."$ext";
        open OFILE, ">$file" or die "Cannot create file ${tag}.$ext: $!\n";
        print OFILE ">$tag\n";
        print OFILE $res{$tag};
        close OFILE;
        $res{$tag} = 0;
    } 
} 

#============================================================================#
sub readInFakes {
    my ($fakesDir) = @_;

    unless (opendir DIR, "$fakesDir") {
	logError("ERROR: Can't open directory $fakesDir ($!)\n",1);
    }

    my @fakes = grep !/^\.|qual$/, readdir DIR;

    closedir DIR;

    return \@fakes;
}

#============================================================================#
sub parseGapDirsList {
    my ($gapDirsList) = @_;
    my %gapDirLocations;
   
    unless(open FILE, "$gapDirsList") {
	logError("ERROR: Can't open file $gapDirsList ($!)\n",1);
    }
    

    while (my $line = <FILE> ) {
	chomp $line;
	if ($line =~ /(s\d+c\d+c\d+)/) {
	    $gapDirLocations{$1} = $line;    
	}
    }
    
    close FILE;

    return \%gapDirLocations;
}

#============================================================================#
sub populateAgp {
    my ($scaffoldsTxtFile) = @_;

    my %agpInfo;
    my %scaffSizes;
    my %contig2Ordinal;

    unless(open FILE, "$scaffoldsTxtFile") {
	logError("ERROR: Can't open file $scaffoldsTxtFile ($!)\n",1);
    }
    
    while (my $line = <FILE> ) {
	chomp $line;
	my @lineInfo = split /\s+/, $line;
#record scaffold sizes
	$scaffSizes{$lineInfo[0]} = $lineInfo[2];
#record contigs ordinal position (if not a gap)
	$contig2Ordinal{$lineInfo[5]} = $lineInfo[3] unless ($lineInfo[6] eq "fragment");
#record scaff feature info
	$agpInfo{$lineInfo[0]}{$lineInfo[3]}{'name'} = 
	    ($lineInfo[4] eq "N") ? $lineInfo[6] : $lineInfo[5];
	$agpInfo{$lineInfo[0]}{$lineInfo[3]}{'size'} = 
	    ($lineInfo[4] eq "N") ? $lineInfo[5] : $lineInfo[7];
	$agpInfo{$lineInfo[0]}{$lineInfo[3]}{'start'}= "1";
	$agpInfo{$lineInfo[0]}{$lineInfo[3]}{'end'}  = 
	    $agpInfo{$lineInfo[0]}{$lineInfo[3]}{'size'};
	$agpInfo{$lineInfo[0]}{$lineInfo[3]}{'ori'}  = 
	    $lineInfo[8] unless ($lineInfo[6] eq "fragment");
    }
    
    close FILE;

    return \%agpInfo, \%scaffSizes, \%contig2Ordinal;
}

#============================================================================#
sub printAgpHash {
    my ($H_agpInfo, $H_scaffSizes, $H_contig2Ordinal) = @_;
    
    foreach my $scaffold ( sort { $H_scaffSizes->{$b} <=> $H_scaffSizes->{$a}  } keys %$H_scaffSizes ) {
	print "$scaffold\tlen=$H_scaffSizes->{$scaffold}\n";
	foreach my $ord ( sort {$a <=> $b} keys %{$H_agpInfo->{$scaffold}} ) {	    
	    print "\t$ord\tname=$H_agpInfo->{$scaffold}{$ord}{'name'} ";
	    print "size=$H_agpInfo->{$scaffold}{$ord}{'size'} ";
	    print "start=$H_agpInfo->{$scaffold}{$ord}{'start'} ";	
	    print "end=$H_agpInfo->{$scaffold}{$ord}{'end'} ";
	    if (exists $H_agpInfo->{$scaffold}{$ord}{'ori'}) {
		print "ori=$H_agpInfo->{$scaffold}{$ord}{'ori'}\n"
		} else { print "\n";}
	  #  print "ORD=$H_contig2Ordinal->{$H_agpInfo->{$scaffold}{$ord}{'name'}}\n" if (exists $H_contig2Ordinal->{$H_agpInfo->{$scaffold}{$ord}{'name'}});
	}
    }
}

#============================================================================#
sub printStitchedScaffolds {
    my ($H_agpInfo, $H_scaffSizes, $H_contig2Ordinal, $fakesDir, $outputFileName, $tmpDir) = @_;
    
    #open files for output
    open OUTF, ">$outputFileName.fasta" 
	or confess "Can't open $outputFileName.fasta $!\n";
    open OUTQ, ">$outputFileName.fasta.qual" 
	or confess "Can't open $outputFileName.fasta.qual $!\n";
    

    printDate("\tlooping through scaffolds in order of dec. size") if ($optTimeDebug);

    #go through each scaffold in order of decreasing size
    foreach my $scaffold ( sort { $H_scaffSizes->{$b} <=> $H_scaffSizes->{$a}  } keys %$H_scaffSizes ) {

	my $scaffoldFasta;
	my $scaffoldQual;

	print "printing $scaffold\tlen=$H_scaffSizes->{$scaffold}\n" if $DEBUG;

	printDate("\t\tprinting $scaffold\tlen=$H_scaffSizes->{$scaffold}\n") if ($optTimeDebug);
	
	print OUTF ">$scaffold\n";
	print OUTQ ">$scaffold\n";

	#go through each feature in order
	foreach my $ord ( sort {$a <=> $b} keys %{$H_agpInfo->{$scaffold}} ) {	    
	    
	    my $name  = $H_agpInfo->{$scaffold}{$ord}{'name'};
	    my $start = $H_agpInfo->{$scaffold}{$ord}{'start'};
	    my $end   = $H_agpInfo->{$scaffold}{$ord}{'end'};
	    my $ori   = $H_agpInfo->{$scaffold}{$ord}{'ori'};
	    

	    #feature is a contig, pull seq
	    if ($H_agpInfo->{$scaffold}{$ord}{'name'} =~ /contig/) {

		printDate("\t\tcontigFeature $H_agpInfo->{$scaffold}{$ord}{'name'}\n") if ($optTimeDebug);
		
		my $fastaSubset = getFastaSeq("$tmpDir/$name.fna", $start, $end, $ori);
		$scaffoldFasta .= $fastaSubset;
		my $qualSubset = getQualSeq("$tmpDir/$name.qual", $start, $end, $ori);
		$scaffoldQual .= $qualSubset;


		#feature is a gap, add N's
	    } elsif ($H_agpInfo->{$scaffold}{$ord}{'name'} =~ /fragment/) {

		printDate("\t\tfragmentFeature $H_agpInfo->{$scaffold}{$ord}{'name'}\n") if ($optTimeDebug);

		my $seq = "N"x$end;
		$scaffoldFasta .= $seq;
		my $qual = "0 "x$end;
		$scaffoldQual .= $qual;


		#feature is a fake of a closed gap
	    } else {

		printDate("\t\tfakeFeature $H_agpInfo->{$scaffold}{$ord}{'name'}\n") if ($optTimeDebug);
		
		my $seq = getFastaSeq("$fakesDir/$name", $start, $end, "+");
		$scaffoldFasta .= $seq;
		my $qual = getQualSeq("$fakesDir/$name.qual", $start, $end, "+");
		$scaffoldQual .= $qual;
	    }
	}
	
	
	printDate("\tcolumnifying\n") if ($optTimeDebug);
	
	#columnify and print total scaffold seq
	print "FASTA\n[$scaffoldFasta]\n" if $DEBUG;
	printDate("\t\tformatting\n") if ($optTimeDebug);
	my $formatFasta = formatSeq($scaffoldFasta,60);
       	print OUTF "$formatFasta\n";
	printDate("\t\tdone formatting, removing extra/trailing spaces in qual\n") if ($optTimeDebug);
	$scaffoldQual =~ s/\s+/ /g; #just in case there are extra spaces
	$scaffoldQual =~ s/\s+$//g; #remove trailing spaces
	print "QUAL\n[$scaffoldQual]\n" if $DEBUG;
	printDate("\tdone.formatting qual now\n") if ($optTimeDebug);
	my $formatQual = formatSeq($scaffoldQual,60);
       	print OUTQ "$formatQual\n";
	printDate("\tdone formatting qual\n")if ($optTimeDebug);
	
    }
    
    close OUTF;    
    close OUTQ;
    
}

#============================================================================#
sub getFastaSeq {
    my ($fastaFile, $start, $end, $ori) = @_;

    open FASTA, "$fastaFile" or confess "Can't open $fastaFile $!";
    my $seq;
    while (my $line = <FASTA>) {
	chomp $line;
	next if ($line =~ /^>/);
	$seq .= $line;
    }
    close FASTA;

    #pull seq between start/end coords, if its a fake pull all
    my $seqSubset = ($start =~ /all/) 
	? substr($seq, 0) 
	: substr($seq, $start-1, ($end-$start)+1);
    
    #return reversed feature if ori is negative
    return ($ori =~ /-/) ? reverseComp($seqSubset) : $seqSubset;
}

#============================================================================#
sub reverseComp {
    my ($seq) = @_;
     
    #reverse complement sequence string and return.
    # 
    my $revCompSeq = reverse($seq);
    $revCompSeq =~ tr/acgtACGT/tgcaTGCA/;
      
    return $revCompSeq;
     
} 

#============================================================================#
sub getQualSeq {
    my ($qualFile, $start, $end, $ori) = @_;

    open QUAL, "$qualFile" or confess "Can't open $qualFile $!";
    my $seq;
    while (my $line = <QUAL>) {
	next if ($line =~ /^>/);
	$seq .= $line;
    }
    close QUAL;
    
    #replace new lines with spaces
    $seq =~ s/\n/ /g;
    

    #pull seq between start/end coords, if its a fake pull all
    my @splitQuals  = split /\s+/, $seq;
    my @qualsSubset = ($start =~ /all/) 
	? @splitQuals[0..$#splitQuals]
	: @splitQuals[$start-1..$end-1];
    
    #reverse feature quals if ori is negative    
    if ($ori =~ /\-/) {
	print "WARNING:  NOT SURE THIS IS HANDLING - ORIENTED FEATURES CORRECTLY, CHECK!\n";
	@splitQuals = reverse @qualsSubset;
	@qualsSubset = @splitQuals; #added 23/12/2009
    }

    #stringify again
    $seq = join " ", @qualsSubset;
    @splitQuals = ();
    @qualsSubset = ();
    
    #ensure space at the end
    $seq = $seq . " ";  
    #ensure all spaces are single
    $seq =~ s/\s+/ /g;  

    return $seq;
}

#============================================================================#
sub usage {
    my $verbose = shift || 1;
    pod2usage(-verbose=>$verbose);
    exit 1;
}

#============================================================================#
sub setFileForLogging {
    
    my $logFile = shift;
    
    $OBJ_LOGGER->setLogOutFileAppend($logFile);
    $OBJ_LOGGER->setLogErrorFileAppend($logFile);
    
}

#============================================================================#
sub logError {
    my $message = shift;
    my $confess = shift || 0;
    
    $OBJ_LOGGER->logError($message);
    $OBJ_WARNINGS->add($message);
    
    if ( $confess ) {
        $OBJ_WARNINGS->createFile() if $OBJ_WARNINGS->getNumberOfWarnings;
        confess $message;
    }
    
}
    
#============================================================================#
sub logOutput {
    my $message = shift;
    my $printMsg = shift || 0;
    
    $OBJ_LOGGER->logOut($message);
    print "$message\n" if $printMsg;
}
    
#============================================================================#
sub logOutputFileCreation {
    
    my $outputFile = shift;
    
    $outputFile = abs_path($outputFile);
    
    my $fileExists = -s $outputFile ? 1:0;
    my $msg = $fileExists ? "Created $outputFile" :
        "ERROR: failed to create $outputFile";
    logOutput($msg,1);
    
    return $fileExists;
    
}

#============================================================================#
sub logExecution {
    
    my $programExecution = shift;
    
    my $msg = "Command: ".$programExecution."\n".
              "Current directory: ".getcwd;
    logOutput($msg);
}

