reform update 3 Feb 94 NAME reform - reformats multiply-aligned sequences for printing. SYNOPSIS reform [-gpcnm] [-fx] [-sn] [-ln] [file {ralign only}] or ralign file parameters | reform [-gpcn] [-sn] [-ln] file DESCRIPTION g Gaps are to be represented by dashes (-). p Bases which agree with the consensus are represented by periods (.). c Positions at which all sequences agree are capitalized in the consensus. n Sequence data is nucleic acid. Protein default fx Specify input file format, where x is r:RALIGN (default) p:PEARSON i:MBCRR-MASE (Intelligenetics) m Input file contains multiline format sequences already aligned, as opposed to ralign output. This option is obsolete, and is equivalent to -fp. ln The output linelength is set to n. Default is 70. sn numbering starts with n (default=0) file Sequence file as described in ralign docu- mentation. reform needs to re-read the sequence file read by ralign to get the names of the sequences, which ralign ignores. This filename is only included for ralign output. If -m is set, file is ignored, and sequence names must be read from the input. Note that positions in the consensus at which no nucleotide is in the majority are represented by n's (for nucleic acids) or x's (for proteins), rather than periods, as in ralign. Gaps in the input sequences may be represented by either blanks or dashes. INPUT FILE FORMATS (a) ralign (default, -fr) As described in ralign documentation, the input file (which is assumed to be ralign output) must have each sequence on a single long line. All characters on a given line will be included in the alignment. All lines must be exactly the same length. For example, if ralign had been read sequence from a file called 'allcab.seq' and written output to 'allcab.ral', the following command might be used: reform allcab.seq allcab.ref (b) Pearson (-fp, -m) Compatible with sequence files used by Pearson's fasta programs as shown: >name1 sequence1 >name2 sequence2 ... >namen sequencen Sequences may run over many lines and line length does not have to be uniform. However, both dashes ('-') and blanks (' ') will be read in as gaps in the alignment. A right arrow (>) at the beginning of a line indicates the name line at the beginning of a new sequence. Any line beginning with a semicolon (';') will be considered a comment, and will be ignored. (c) MBCRR-MASE (Intelligenetics) (-fi) Compatible with .mase files produced by MBCRR's mase and pima programs, which use the Intelligenetics format as shown: ;one or more comment lines name1 sequence1 ;one or more comment lines name2 sequence2 ... ;one or more comment lines namen sequencen Sequences may run over many lines and line length does not have to be uniform. However, both dashes ('-') and blanks (' ') will be read in as gaps in the alignment. Each sequence MUST begin with at least one comment line. When a comment line is encountered, that signals the beginning of a new sequence. The first line after the comment is read as the name, and the sequence begins on the next line after that. SEE ALSO ralign, mase AUTHOR Dr. Brian Fristensky Dept. of Plant Science University of Manitoba Winnipeg, MB Canada R3T 2N2 Phone: 204-474-6085 FAX: 204-261-5732 frist@cc.umanitoba.ca REFERENCE Fristensky, B. (1993) Feature expressions: creating and manipulating sequence datasets. Nucleic Acids Research 21:5997-6003.