Updated March 19, 2018
NAME

guesspairs.py - Given a list of sequencing read files, guess which pairs of files should be grouped together as left and right read files. Output is a .tsv file. Each pair of files is written as two TAB-separated fields on one line. Each unpaired file is written on its own line as a single field.

SYNOPSIS

guesspairs.py --infile filename --ltag string --rtag string [--extension string] --outfile filename

OPTIONS
--infile filename - file containing one filename per line
--ltag string - part of the filename that is found only in left read files
--rtag string - part of the filename that is found only in right read files
--extension string - If a file extension is specified, only files with that extension are included in the output. Files with other extensions (e.g. .html) are ignored at input. string may be a comma-separated list of file extensions, e.g. .fq.gz,.fq,.fastq,.fastq.gz
--outfile filename - output in TAB-separated (.tsv) format. Paired-end files are written together on one output line, separated by TAB. Unpaired files are each written on a separate line.
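
The effect of --extension can be pictured with a short Python sketch. This is an illustration only, not the code used by guesspairs.py; the function names parse_extensions and filter_by_extension and the sample file names are hypothetical, and it assumes an extension is matched as a plain suffix of the filename.

```python
def parse_extensions(ext_arg):
    """Split a comma-separated --extension value into a tuple of suffixes."""
    return tuple(e.strip() for e in ext_arg.split(",") if e.strip())


def filter_by_extension(filenames, ext_arg=None):
    """Keep only names ending in one of the requested extensions.

    Assumption: with no extension given, every file is kept.
    """
    if not ext_arg:
        return list(filenames)
    suffixes = parse_extensions(ext_arg)
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return [name for name in filenames if name.endswith(suffixes)]


# Hypothetical input list, as it might appear in the --infile file.
names = ["a_R1.fq.gz", "report.html", "b.fastq", "notes.txt"]
kept = filter_by_extension(names, ".fq.gz,.fq,.fastq,.fastq.gz")
print(kept)
```

Here report.html and notes.txt are dropped because neither ends in a listed extension.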

EXAMPLE
Given the input file names.in:

illumina_control_L1_.fq.gz
illumina_control_R2.fq.gz
illumina_treatment_L1.fq.gz
illumina_treatment_R2.fq.gz
iontorrent_control1.fq.gz
iontorrent_control2.fq.gz


guesspairs.py --infile names.in --ltag L1_ --rtag R2 --outfile names.grouped

will create a file called names.grouped:

illumina_control_L1_.fq.gz<TAB>illumina_control_R2.fq.gz
illumina_treatment_L1.fq.gz<TAB>illumina_treatment_R2.fq.gz
iontorrent_control1.fq.gz
iontorrent_control2.fq.gz
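
One simple way to think about the grouping is exact tag substitution: for each name containing the left tag, replace that tag with the right tag and check whether the resulting name is also in the list. The Python sketch below follows that idea. It is a simplified illustration under stated assumptions, not the algorithm guesspairs.py itself uses, so it may not reproduce every pairing the program makes. The function guess_pairs and the sample file names are hypothetical.

```python
def guess_pairs(filenames, ltag, rtag):
    """Group names into (left, right) pairs or (single,) rows.

    Assumption: a right-read name is the left-read name with the first
    occurrence of ltag replaced by rtag.
    """
    names = list(filenames)
    available = set(names)
    mate_of = {}    # left name -> right name
    rights = set()  # right names already claimed by a left name
    for name in names:
        if ltag in name and name not in rights:
            candidate = name.replace(ltag, rtag, 1)
            if candidate != name and candidate in available and candidate not in rights:
                mate_of[name] = candidate
                rights.add(candidate)
    rows = []
    for name in names:
        if name in mate_of:
            rows.append((name, mate_of[name]))
        elif name not in rights:
            rows.append((name,))  # unpaired file: a single field
    return rows


# Hypothetical file names, chosen so that the tags match exactly.
sample = [
    "sampleA_R1.fq.gz",
    "sampleA_R2.fq.gz",
    "sampleB_R1.fq.gz",
    "sampleB_R2.fq.gz",
    "single_run.fq.gz",
]
rows = guess_pairs(sample, "_R1", "_R2")
for row in rows:
    print("\t".join(row))  # fields on a line are separated by TAB
```

With these names, the two sampleA files and the two sampleB files each share a line, and single_run.fq.gz is written on a line by itself.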


It may still be necessary to edit this file by hand to get a namefile that can be used for genome or transcriptome assembly.
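
When checking the output before editing, it can help to list the paired and unpaired entries separately. The Python sketch below reads text in the format described above. It is an illustration only; the function read_grouped is hypothetical and is not part of guesspairs.py, and the sample text simply repeats the example output shown earlier.

```python
import io


def read_grouped(handle):
    """Split a grouped .tsv listing into paired and unpaired entries."""
    paired, unpaired = [], []
    for line in handle:
        fields = [f for f in line.rstrip("\n").split("\t") if f]
        if not fields:
            continue  # skip blank lines
        if len(fields) == 2:
            paired.append((fields[0], fields[1]))
        elif len(fields) == 1:
            unpaired.append(fields[0])
        else:
            raise ValueError("expected 1 or 2 fields, got %d" % len(fields))
    return paired, unpaired


# The example output from above, held in memory instead of a file.
text = "\n".join([
    "illumina_control_L1_.fq.gz\tillumina_control_R2.fq.gz",
    "illumina_treatment_L1.fq.gz\tillumina_treatment_R2.fq.gz",
    "iontorrent_control1.fq.gz",
    "iontorrent_control2.fq.gz",
]) + "\n"

paired, unpaired = read_grouped(io.StringIO(text))
print(len(paired), "pairs;", len(unpaired), "unpaired files")
```

To read a real file instead, pass an open file handle, for example open("names.grouped"), in place of the io.StringIO object.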

AUTHOR
Dr. Brian Fristensky
Department of Plant Science
University of Manitoba
Winnipeg, MB  Canada R3T 2N2
frist@cc.umanitoba.ca
http://home.cc.umanitoba.ca/~frist5