Updated March 19, 2018
NAME
guesspairs.py - Given a list of
sequencing read files, guess which pairs of files
should be grouped together as left and right read files. Output is
a TAB-separated (.tsv) file. Each pair of files is written as two
fields on one line. Each unpaired file is written on its own line
as a single field.
SYNOPSIS
guesspairs.py --infile filename --ltag string
--rtag string [--extension string]
--outfile filename
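The synopsis above could be mirrored by a command-line parser along the following lines. This is only a sketch: it assumes Python's standard argparse module, and build_parser is a hypothetical helper name. How guesspairs.py actually parses its arguments is not documented here. Options are marked required or optional according to the brackets in the synopsis.

```python
import argparse


def build_parser():
    """Build a parser matching the documented synopsis (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="guesspairs.py",
        description="Guess which read files belong together as left/right pairs.",
    )
    parser.add_argument("--infile", required=True,
                        help="file containing one filename per line")
    parser.add_argument("--ltag", required=True,
                        help="string found only in left read filenames")
    parser.add_argument("--rtag", required=True,
                        help="string found only in right read filenames")
    # --extension is the only option shown in brackets, so it is optional here.
    parser.add_argument("--extension", default=None,
                        help="comma-separated list of file extensions to keep")
    parser.add_argument("--outfile", required=True,
                        help="output file in TAB-separated (.tsv) format")
    return parser


args = build_parser().parse_args(
    ["--infile", "names.in", "--ltag", "L1", "--rtag", "R2",
     "--outfile", "names.grouped"]
)
print(args.infile, args.ltag, args.rtag, args.extension, args.outfile)
```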
OPTIONS
--infile filename - file containing
one filename per line
--ltag string - part of the filename that is
only found in left read files
--rtag string - part of the filename that is only
found in right read files
--extension string - If a file extension is
specified, only files with that file extension are included in
the output. Files with other extensions (e.g. .html) are
ignored at input. string may be a comma-separated list of file
extensions, e.g. .fq.gz,.fq,.fastq,.fastq.gz
--outfile filename - output file in TAB-separated (.tsv)
format. The two files of a pair are written together on one output
line, separated by a TAB. Each unpaired file is written on a
separate line.
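The effect of --extension can be illustrated with a short sketch. It assumes that a file is kept when its name ends with one of the listed extensions, compared case-sensitively. The exact matching rule used by guesspairs.py is not stated above, and parse_extensions and filter_by_extension are hypothetical helper names.

```python
def parse_extensions(spec):
    """Split a comma-separated --extension value into a tuple of suffixes."""
    return tuple(ext.strip() for ext in spec.split(",") if ext.strip())


def filter_by_extension(names, spec=None):
    """Keep only names ending in one of the listed extensions.

    With no --extension value (None or empty), every name is kept.
    Assumption: matching is a plain, case-sensitive suffix test.
    """
    if not spec:
        return list(names)
    suffixes = parse_extensions(spec)
    if not suffixes:
        return list(names)
    # str.endswith accepts a tuple and returns True if any suffix matches.
    return [name for name in names if name.endswith(suffixes)]


# Hypothetical file names, chosen only for illustration.
names = ["reads_L1.fq.gz", "reads_R2.fq.gz", "report.html", "extra.fastq"]
print(filter_by_extension(names, ".fq.gz,.fq,.fastq,.fastq.gz"))
```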
EXAMPLE
Given the input file names.in:
illumina_control_L1.fq.gz
illumina_control_R2.fq.gz
illumina_treatment_L1.fq.gz
illumina_treatment_R2.fq.gz
iontorrent_control1.fq.gz
iontorrent_control2.fq.gz
guesspairs.py --infile names.in --ltag L1
--rtag R2 --outfile names.grouped
will create a file called names.grouped:
illumina_control_L1.fq.gz<TAB>illumina_control_R2.fq.gz
illumina_treatment_L1.fq.gz<TAB>illumina_treatment_R2.fq.gz
iontorrent_control1.fq.gz
iontorrent_control2.fq.gz
It may still be necessary to edit this file to get a namefile that
can be used for genome or transcriptome assembly.
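One way such a guess could be made is sketched below. The heuristic is an assumption, not a description of how guesspairs.py works internally: a file containing the left tag is paired with the file whose name is identical except that the first occurrence of the left tag is replaced by the right tag. The function names guess_pairs and format_tsv and the sample file names are hypothetical.

```python
def guess_pairs(names, ltag, rtag):
    """Group names into [left, right] pairs and [single] unpaired rows.

    Assumed heuristic (not taken from guesspairs.py itself): the mate of a
    left read file has the same name with the first occurrence of ltag
    replaced by rtag.
    """
    available = set(names)
    mate_of = {}
    # First pass: find a mate for every name that carries the left tag.
    for name in names:
        if ltag in name:
            candidate = name.replace(ltag, rtag, 1)
            if candidate != name and candidate in available:
                mate_of[name] = candidate
    paired_rights = set(mate_of.values())

    # Second pass: emit rows in input order, skipping right files that
    # were already written as the second field of a pair.
    rows = []
    for name in names:
        if name in mate_of:
            rows.append([name, mate_of[name]])
        elif name not in paired_rights:
            rows.append([name])
    return rows


def format_tsv(rows):
    """Render rows as TAB-separated lines, one row per line."""
    return "".join("\t".join(row) + "\n" for row in rows)


# Hypothetical file names, chosen only for illustration.
names = [
    "sampleA_L1.fq.gz",
    "sampleA_R2.fq.gz",
    "sampleB_R2.fq.gz",
    "sampleC.fq.gz",
]
print(format_tsv(guess_pairs(names, "L1", "R2")), end="")
```

In this sketch a right read file with no matching left file, such as sampleB_R2.fq.gz, is reported as unpaired. Lines like that are the ones most likely to need manual editing.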
AUTHOR
Dr. Brian Fristensky
Department of Plant Science
University of Manitoba
Winnipeg, MB Canada R3T 2N2
frist@cc.umanitoba.ca
http://home.cc.umanitoba.ca/~frist5