Updated Oct 2, 2011
NAME
csv2phyl.sh - Convert molecular marker data from csv format to Phylip format
This program is obsolete. Instead, use phylcnv.py, which handles a larger number of
file formats.
SYNOPSIS
csv2phyl.sh [csvfile]

DESCRIPTION

This script reads a file containing discrete data (e.g. molecular markers) and writes it in Phylip non-interleaved format. The csvfile is a comma-separated value file of the type produced by exporting a spreadsheet (e.g. OpenOffice Calc or MS Excel) to a .csv file. If csvfile is specified, the script reads it and writes a Phylip file whose name is formed by removing the .csv or .CSV extension (if any) and appending .phyl. Otherwise, csv2phyl.sh takes input from the standard input and writes to the standard output.
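
The output file naming rule can be sketched in POSIX shell as follows. This is an illustration only, not the csv2phyl.sh source; the function name phyl_name is an assumption made for the example.

```shell
# Hypothetical helper (not taken from csv2phyl.sh): derive the Phylip
# output file name from the name of the csv file.
phyl_name() {
    base=$1
    case $base in
        *.csv) base=${base%.csv} ;;   # strip a lowercase extension
        *.CSV) base=${base%.CSV} ;;   # strip an uppercase extension
    esac
    # Names without a .csv/.CSV extension simply get .phyl appended.
    printf '%s.phyl\n' "$base"
}
```

Under these assumptions, "phyl_name markers.csv" prints markers.phyl, and a name with no extension such as "markers" also yields markers.phyl.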

Some spreadsheets enclose each data item in double quotes (") when exporting to a .csv file. csv2phyl.sh removes all double quote characters before writing to the output file.
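
One simple way to remove double quotes in a shell pipeline is tr. The sketch below is an assumption about how this step could be done, not necessarily how csv2phyl.sh does it; strip_quotes is a hypothetical name.

```shell
# Hypothetical filter (not taken from csv2phyl.sh): delete every
# double quote character from the standard input.
strip_quotes() {
    tr -d '"'
}

# Example: a quoted line as some spreadsheets export it.
printf '"LR210","1","0","1"\n' | strip_quotes
```

Because every double quote is deleted, a quote that is part of a name would be removed as well.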


INPUT
The input file consists of one or more lines of comma-separated marker data, in which the first field is the name of the marker, and all other fields are single characters.

Example:
LR210,1,0,1,1,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,1,0,1,1,0,1,0,1
LR211,0,1,1,1,0,1,1,0,1,1,0,0,1,0,0,0,1,0,0,1,0,1,1,1,1,0,1
LR212,0,0,1,1,0,1,0,0,1,1,0,0,1,0,0,1,1,0,0,1,0,1,1,1,1,0,1
LR213,1,0,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,0,0,0,0,1,0,1,1,0,1
LR214,0,0,1,1,1,0,0,1,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0
LR215,1,0,1,1,1,1,0,1,1,1,0,1,1,0,1,1,1,0,0,0,0,1,0,1,1,0,1
LR216,1,1,1,1,0,1,1,0,1,1,1,0,0,0,0,1,1,0,0,1,1,1,1,1,0,0,1
LR217,1,1,1,1,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,0,0,1,0,1,1,0,1
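
The input rule (a name followed by single-character fields) can be checked before conversion. The following is a minimal sketch, not part of csv2phyl.sh; check_marker_csv is a hypothetical name, and the choice to ignore double quotes while checking is an assumption.

```shell
# Hypothetical checker (not taken from csv2phyl.sh): report any field
# after the first that is not exactly one character long.
check_marker_csv() {
    awk -F, '
        { gsub(/"/, "") }      # ignore double quotes, as the converter does
        {
            for (i = 2; i <= NF; i++)
                if (length($i) != 1) {
                    printf "line %d, field %d: expected one character, got \"%s\"\n", NR, i, $i
                    bad = 1
                }
        }
        END { exit bad + 0 }   # non-zero exit status if any field failed
    '
}
```

The function reads the standard input, so it could be used as "check_marker_csv < markers.csv", where markers.csv is a placeholder file name.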

OUTPUT
The output is a Phylip non-interleaved data file. The first line has two integers giving the number of isolates/species/strains and the number of markers. Each subsequent line has an isolate/species/strain name of exactly 10 characters, followed by marker data. If the name is longer than 10 characters, it is truncated. If it is shorter than 10 characters, it is padded with blanks to 10 characters.

Example:
8 27
LR210           101101000100100010010110101
LR211           011101101100100010010111101
LR212           001101001100100110010111101
LR213           101111011111111110000101101
LR214           001110011000100000000000000
LR215           101111011101101110000101101
LR216           111101101110000110011111001
LR217           111111011111111111000101101
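
The conversion as described in the text can be sketched with awk. This is a hedged illustration, not the csv2phyl.sh source: csv_to_phylip is a hypothetical name, the sketch applies the 10-character name rule stated above, and the spacing of actual csv2phyl.sh output may differ.

```shell
# Hypothetical sketch (not taken from csv2phyl.sh): read comma-separated
# lines on the standard input and write Phylip non-interleaved format.
csv_to_phylip() {
    awk -F, '
        { gsub(/"/, "") }                  # drop double quotes
        NF > 1 {
            name[++n] = $1                 # first field is the name
            data[n] = ""
            for (i = 2; i <= NF; i++)      # join the remaining fields
                data[n] = data[n] $i
            if (NF - 1 > width) width = NF - 1
        }
        END {
            print n + 0, width + 0         # header: number of rows, number of columns
            for (j = 1; j <= n; j++)       # pad or truncate the name to 10 characters
                printf "%-10.10s%s\n", name[j], data[j]
        }
    '
}
```

For example, "csv_to_phylip < markers.csv > markers.phyl" would convert a file, where both file names are placeholders. The sketch buffers all rows in memory because the header line with the counts has to be written first.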

NOTES
1. This script is used by mGDE for File --> Import Discrete Data from CSV file.
 
AUTHOR
Dr. Brian Fristensky
Department of Plant Science
University of Manitoba
Winnipeg, MB  Canada R3T 2N2
frist@cc.umanitoba.ca
http://home.cc.umanitoba.ca/~frist