Updated Oct 2, 2011
NAME
csv2phyl.sh -
Convert molecular marker data from csv format to Phylip format
This program is obsolete. Instead,
use phylcnv.py, which handles a larger number of
file formats.
SYNOPSIS
csv2phyl.sh
[csvfile]
DESCRIPTION
This script reads a file containing
discrete data (e.g. molecular markers) and writes it in Phylip
non-interleaved format. The csvfile is a comma-separated value file of
the type produced by exporting a spreadsheet (e.g. OpenOffice Calc or
MS-Excel) to a .csv file. If csvfile is specified, the
script will read it and write a Phylip file whose name is formed by
removing the .csv or .CSV extension (if any) and adding .phyl. Otherwise,
csv2phyl.sh will take input from the standard input and write to the
standard output.
Some spreadsheets enclose each data item in double quotes (") when
exporting .csv files. csv2phyl.sh removes all double quote characters
before writing to the output file.
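
The naming and quote-removal rules described above can be sketched in
Python as follows. This is only an illustration, not the script's own
code: the function names output_name and strip_quotes are hypothetical,
and the sketch assumes that a file name without a .csv or .CSV extension
simply has .phyl appended.

```python
import os


def output_name(csv_path):
    """Derive the Phylip file name from the csv file name.

    Assumption: a name that does not end in .csv or .CSV is kept whole
    and .phyl is appended to it.
    """
    root, ext = os.path.splitext(csv_path)
    if ext in (".csv", ".CSV"):
        return root + ".phyl"
    return csv_path + ".phyl"


def strip_quotes(line):
    """Remove every double quote character from one line of csv text."""
    return line.replace('"', "")


print(output_name("markers.csv"))
print(strip_quotes('"LR210","1","0","1"'))
```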
INPUT
The input file consists of one or more
lines of comma-separated marker data, in which the first field is
the name of the marker, and all other fields are single characters.
Example:
LR210,1,0,1,1,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,1
LR211,0,1,1,1,0,1,1,0,1,1,0,0,1,0,0,0,1,0,0,1,0,1,1,1,1,0,1
LR212,0,0,1,1,0,1,0,0,1,1,0,0,1,0,0,1,1,0,0,1,0,1,1,1,1,0,1
LR213,1,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,0,1,1,0,1
LR214,0,0,1,1,1,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
LR215,1,0,1,1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0,1,0,1,1,0,1
LR216,1,1,1,1,0,1,1,0,1,1,1,0,0,0,0,1,1,0,0,1,1,1,1,1,0,0,1
LR217,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,0,1,1,0,1
OUTPUT
The output is a Phylip non-interleaved
data file. The first line has two integers giving the number of
isolates/species/strains and the number of markers. Each subsequent
line has an isolate/species/strain name of exactly 10 characters,
followed by marker data. If the name is longer than 10 characters, it
is truncated. If it is shorter than 10 characters, it is padded with
blanks to 10 characters.
Example:
8 27
LR210
101101000100100010010110101
LR211
011101101100100010010111101
LR212
001101001100100110010111101
LR213
101111011111111110000101101
LR214
001110011000100000000000000
LR215
101111011101101110000101101
LR216
111101101110000110011111001
LR217
111111011111111111000101101
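
The conversion from the INPUT format to the OUTPUT format can be
sketched in Python as follows. This is a minimal illustration under
stated assumptions, not the code of csv2phyl.sh: the function name
csv_to_phylip is hypothetical, the number of markers is taken from the
first record, and the layout (padded name on one line, marker string on
the next) mirrors the example above as it appears here.

```python
def csv_to_phylip(lines):
    """Convert csv marker lines to a list of Phylip output lines."""
    records = []
    for raw in lines:
        # Drop double quotes added by some spreadsheet exports.
        line = raw.strip().replace('"', "")
        if not line:
            continue
        fields = line.split(",")
        # Names are truncated or blank-padded to exactly 10 characters.
        name = fields[0][:10].ljust(10)
        # All remaining fields are single characters; join them.
        data = "".join(field.strip() for field in fields[1:])
        records.append((name, data))

    # Assumption: every record has as many markers as the first one.
    n_markers = len(records[0][1]) if records else 0
    out = ["%d %d" % (len(records), n_markers)]
    for name, data in records:
        out.append(name)
        out.append(data)
    return out


sample = [
    "LR210,1,0,1,1,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,1",
    '"LR211","0","1","1","1","0","1","1","0","1","1","0","0","1","0",'
    '"0","0","1","0","0","1","0","1","1","1","1","0","1"',
]
for out_line in csv_to_phylip(sample):
    print(out_line)
```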
NOTES
1. This script is used by mGDE for File
--> Import Discrete Data from CSV file.
AUTHOR
Dr. Brian Fristensky
Department of Plant Science
University of Manitoba
Winnipeg, MB Canada R3T 2N2
frist@cc.umanitoba.ca
http://home.cc.umanitoba.ca/~frist