consambig

Wiki

The master copies of EMBOSS documentation are available at
http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

Please help by correcting and extending the Wiki pages.

Function

Create an ambiguous consensus sequence from a multiple alignment

Description

consambig calculates a consensus sequence from a multiple sequence alignment. At each position of the alignment, the amino acid residues or nucleotides present are compared with the possible ambiguity codes. The consensus sequence uses the least ambiguous code that matches them.

Algorithm

The ambiguity codes are defined in local data files. consambig uses these files to determine which codes match all residues or nucleotides in the input sequences at each position.

Usage

Here is a sample session with consambig

% consambig
Create an ambiguous consensus sequence from a multiple alignment
Input (aligned) sequence set: dna.msf
output sequence [dna.fasta]: aligned.consambig

Command line arguments

Create an ambiguous consensus sequence from a multiple alignment
Version: EMBOSS:6.4.0.0

   Standard (Mandatory) qualifiers:
  [-sequence]          seqset     File containing a sequence alignment.
  [-outseq]            seqout     [.]
                                  Sequence filename and optional format
                                  (output USA)

   Additional (Optional) qualifiers:
   -name               string     Name of the consensus sequence (Any string)

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outseq" associated qualifiers
   -osformat2          string     Output seq format
   -osextension2       string     File name extension
   -osname2            string     Base file name
   -osdirectory2       string     Output directory
   -osdbname2          string     Database name to add
   -ossingle2          boolean    Separate file for each entry
   -oufo2              string     UFO features
   -offormat2          string     Features format
   -ofname2            string     Features file name
   -ofdirectory2       string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit

Input file format

The USA of a set of aligned sequences.
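
The rule described under Algorithm can be illustrated for nucleotide alignments with a short Python sketch. It is not the consambig source. Several points are assumptions: the IUPAC code table is hard-coded here rather than read from Ebases.iub, the function name ambiguous_consensus is hypothetical, the lower-case output for columns that contain a gap is inferred from the sample output file aligned.consambig in the usage example, and the "-" written for an all-gap column is simply a choice made for this sketch.

```python
# Standard IUPAC nucleotide ambiguity codes, hard-coded for illustration.
# Assumption: consambig reads its codes from Ebases.iub; this table only
# stands in for that file.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Reverse lookup: set of plain bases -> the single code that covers exactly it.
CODE_FOR = {frozenset(bases): code for code, bases in IUPAC.items()}

# Characters treated as alignment gaps in this sketch.
GAPS = set(".-~")


def ambiguous_consensus(aligned):
    """Return a consensus string for equal-length aligned DNA sequences."""
    if not aligned:
        raise ValueError("no sequences given")
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    consensus = []
    for column in zip(*aligned):
        bases = set()
        gapped = False
        for symbol in column:
            if symbol in GAPS:
                gapped = True
            else:
                # Expand each symbol (possibly itself ambiguous) to plain bases.
                bases |= set(IUPAC[symbol.upper()])
        if not bases:
            # Column of gaps only: hypothetical handling, not taken from EMBOSS.
            consensus.append("-")
            continue
        # The smallest code matching every base is the one whose base set
        # equals the union of the bases seen in the column.
        code = CODE_FOR[frozenset(bases)]
        # Inferred from the sample output: gapped columns appear in lower case.
        consensus.append(code.lower() if gapped else code)
    return "".join(consensus)
```

Applied to columns 41 to 50 of the dna.msf usage example (ACGTACGTAC, ACGTACGTAC and CGTACGTACG), the function returns MSKWMSKWMS, which agrees with the corresponding stretch of the sample output. Symbols outside the table raise a KeyError, so protein alignments would need a table based on Eresidues.iub instead.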
Input files for usage example

File: dna.msf

  !!NA_MULTIPLE_ALIGNMENT

   dna.msf  MSF: 120  Type: N  January 01, 1776  12:00  Check: 3196 ..

   Name: MSFM1    Len: 120  Check: 8587  Weight: 1.00
   Name: MSFM2    Len: 120  Check: 6178  Weight: 1.00
   Name: MSFM3    Len: 120  Check: 8431  Weight: 1.00

  //

  MSFM1   ACGTACGTAC GTACGTACGT ACGTACGTAC GTACGTACGT ACGTACGTAC
  MSFM2   ACGTACGTAC GTACGTACGT ....ACGTAC GTACGTACGT ACGTACGTAC
  MSFM3   ACGTACGTAC GTACGTACGT ACGTACGTAC GTACGTACGT CGTACGTACG

  MSFM1   GTACGTACGT ACGTACGTAC GTACGTACGT ACGTACGTAC GTACGTACGT
  MSFM2   GTACGTACGT ACGTACGTAC GTACGTACGT ACGTACGTAC GTACGTACGT
  MSFM3   TACGTACGTA CGTACGTACG TACGTACGTA ACGTACGTAC GTACGTACGT

  MSFM1   ACGTACGTAC GTACGTACGT
  MSFM2   ACGTACGTTG CAACGTACGT
  MSFM3   ACGTACGTAC GTACGTACGT

Output file format

The output consists of a sequence file holding the consensus sequence.

Output files for usage example

File: aligned.consambig

  >EMBOSS_001
  ACGTACGTACGTACGTACGTacgtACGTACGTACGTACGTMSKWMSKWMSKWMSKWMSKW
  MSKWMSKWMSKWMSKWMSKWACGTACGTACGTACGTACGTACGTACGTWSSWACGTACGT

Data files

consambig uses the standard files Ebases.iub and Eresidues.iub in the EMBOSS data directory.

EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA.

To see the available EMBOSS data files, run:

% embossdata -showall

To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run:

% embossdata -fetch -file Exxx.dat

Users can provide their own data files in their own directories. Project-specific files can be put in the current directory or, for tidier directory listings, in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory or, again, in a subdirectory called ".embossdata".

The directories are searched in the following order:

  * . (your current directory)
  * .embossdata (under your current directory)
  * ~/ (your home directory)
  * ~/.embossdata

Notes

None.
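
The search order listed under Data files can be sketched in Python as follows. This is only an illustration of the documented order, not EMBOSS code: the function name find_emboss_data_file and its parameters are hypothetical, and the fallback to the standard EMBOSS data directory (EMBOSS_DATA) is deliberately not modelled.

```python
from pathlib import Path


def find_emboss_data_file(filename, cwd=None, home=None):
    """Return the first user-supplied copy of filename, or None.

    Hypothetical helper: it only walks the four user directories in the
    documented order and does not consult the standard data directory.
    """
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    home = Path(home) if home is not None else Path.home()

    # Documented search order: ., ./.embossdata, ~/, ~/.embossdata
    search_dirs = [cwd, cwd / ".embossdata", home, home / ".embossdata"]

    for directory in search_dirs:
        candidate = directory / filename
        if candidate.is_file():
            return candidate
    return None
```

Calling find_emboss_data_file("Ebases.iub") from a project directory shows which user-supplied copy, if any, would take precedence under this order. A return value of None means no user copy was found in those four places.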
References

None.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

  Program name   Description
  cons           Create a consensus sequence from a multiple alignment
  megamerger     Merge two large overlapping DNA sequences
  merger         Merge two overlapping sequences

Author(s)

Jon Ison
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Please report all bugs to the EMBOSS bug team (emboss-bug (c) emboss.open-bio.org) not to the original author.

History

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None