EMBOSS Applications

Introduction
EMBOSS applications
EMBASSY applications
Application groups
Proposed new applications

Introduction

The applications are listed alphabetically in the tables below and are also organised into groups of related functionality. A separate table lists areas requiring software development, including proposed new applications. Please send suggestions for new applications to emboss@emboss.open-bio.org.

EMBOSS applications

Please send bug reports to emboss-bug@emboss.open-bio.org.

Program name	Description
aaindexextract	Extract data from AAINDEX
abiview	Reads an ABI file and displays the trace
acdc	Tests definition files for any EMBOSS application.
aligncopy	Reads and writes alignments
antigenic	Finds antigenic sites in proteins
backtranambig	Back translate a protein sequence to ambiguous codons
backtranseq	Back translate a protein sequence
banana	Bending and Curvature Plot in B-DNA
biosed	Replace or delete sequence sections
btwisted	Calculates the twisting in a B-DNA sequence
cacheensembl	Prepares an EMBOSS cache file for an Ensembl server
cai	CAI codon usage statistic
chaos	Create a chaos plot for a sequence.
charge	Protein charge plot
checktrans	ORF property statistics
chips	Codon usage statistics
cirdna	Draws circular maps of DNA constructs
codcmp	Codon usage table comparison
coderet	Extract CDS, mRNA and translations from feature tables
compseq	Counts the composition of dimer/trimer/etc words in a sequence
cons	Creates a consensus from multiple alignments
consambig	Create an ambiguous consensus sequence from a multiple alignment
cpgplot	Plot CpG rich areas
cpgreport	Reports CpG rich regions
cusp	Create a codon usage table
cutgextract	Extract data from CUTG
cutseq	Removes a specified section from a sequence.
dan	Plot melting temperatures for DNA.
dbiblast	Database indexing for BLAST 1 and 2 indexed databases
dbifasta	Index a fasta database
dbiflat	Database indexing for flat file databases
dbigcg	Database indexing for GCG formatted databases
dbxcompress	Compress an uncompressed dbx index
dbxedam	Index the EDAM ontology using b+tree indices
dbxfasta	Database b+tree indexing for fasta file databases
dbxflat	Database b+tree indexing for flat file databases
dbxgcg	Database b+tree indexing for GCG formatted databases
dbxreport	Validate index and report internals for dbx databases
dbxresource	Index a data resource catalogue using b+tree indices
dbxtax	Index NCBI taxonomy using b+tree indices
dbxuncompress	Uncompress a compressed dbx index
degapseq	Removes gap characters from sequences
descseq	Alter the name or description of a sequence.
diffseq	Find differences between nearly identical sequences
distmat	Creates a distance matrix from multiple alignments
dotmatcher	Produces a dotplot of two sequences.
dotpath	Displays a non-overlapping wordmatch dotplot of two sequences
dottup	DNA sequence dot plot
dreg	Regular expression search of a nucleotide sequence
drfinddata	Find public databases by data type
drfindformat	Find public databases by format
drfindid	Find public databases by identifier
drfindresource	Find public databases by resource
drget	Get data resource entries
drtext	Get data resource entries complete text
edamdef	Find EDAM ontology terms by definition
edamhasinput	Find EDAM ontology terms by has_input relation
edamhasoutput	Find EDAM ontology terms by has_output relation
edamisformat	Find EDAM ontology terms by is_format_of relation
edamisid	Find EDAM ontology terms by is_identifier_of relation
edamname	Find EDAM ontology terms by name
edialign	Local multiple alignment of sequences
einverted	Finds DNA inverted repeats
embossdata	Finds or fetches the data files read in by the EMBOSS programs
embossversion	Writes the current EMBOSS version number
emma	Multiple alignment program
emowse	Protein identification by mass spectrometry
entret	Reads and writes (returns) flatfile entries
epestfind	Finds PEST motifs as potential proteolytic cleavage sites
eprimer3	Picks PCR primers and hybridization oligos
eprimer32	Picks PCR primers and hybridization oligos
equicktandem	Finds tandem repeats
est2genome	Align EST and genomic DNA sequences
etandem	Looks for tandem repeats in a nucleotide sequence.
extractalign	Extract regions from a sequence alignment
extractfeat	Extract features from a sequence
extractseq	Extract regions from a sequence.
featcopy	Return a feature table
featreport	Reads and writes a feature table
feattext	Return a feature table original text
findkm	Calculates Km and Vmax for an enzyme reaction
freak	Residue/base frequency table or plot
fuzznuc	Nucleic acid pattern search
fuzzpro	Protein pattern search
fuzztran	Protein pattern search after translation
garnier	Predicts protein secondary structure
geecee	Calculates the fractional GC content of nucleic acid sequences
getorf	Finds and extracts open reading frames (ORFs)
godef	Find GO ontology terms by definition
goname	Find GO ontology terms by name
helixturnhelix	Finds nucleic acid binding domains.
hmoment	Hydrophobic moment calculation
iep	Calculates the isoelectric point of a protein
infoalign	Information on a multiple sequence alignment
infobase	Return information on a given nucleotide base
inforesidue	Return information on a given amino acid residue
infoseq	Displays some simple information about sequences
isochore	Plots isochores in large DNA sequences
jaspextract	Extract data from JASPAR
jaspscan	Scans DNA sequences for transcription factors
jembossctl	Jemboss Authentication Control
lindna	Draws linear maps of DNA constructs
listor	Writes a list file of the logical OR of two sets of sequences
makenucseq	Create random nucleotide sequences
makeprotseq	Create random protein sequences
marscan	Finds MAR/SAR sites in nucleic sequences
maskambignuc	Masks all ambiguity characters in nucleotide sequences with N
maskambigprot	Masks all ambiguity characters in protein sequences with X
maskfeat	Mask off features of a sequence
maskseq	Mask off regions of a sequence.
matcher	Local alignment of two sequences
megamerger	Merge two large overlapping nucleic acid sequences
merger	Merge two overlapping sequences
msbar	Mutate sequence beyond all recognition
mwcontam	Shows molwts that match across a set of files
mwfilter	Filter noisy molwts from mass spec output
needle	Needleman-Wunsch global alignment.
needleall	Many-to-many pairwise alignments of two sequence sets
newcpgreport	Report CpG rich areas
newcpgseek	Reports CpG rich regions
newseq	Type in a short new sequence.
nohtml	Remove mark-up (e.g. HTML tags) from an ASCII text file
noreturn	Removes carriage return from ASCII files
nospace	Remove whitespace from an ASCII text file
notab	Replace tabs with spaces in an ASCII text file
notseq	Excludes a set of sequences and writes out the remaining ones
nthseq	Writes one sequence from a multiple set of sequences
nthseqset	Reads and writes (returns) one set of sequences from many
octanol	Displays protein hydropathy
oddcomp	Finds protein sequence regions with a biased composition.
ontocount	Count ontology term(s)
ontoget	Get ontology term(s)
ontogetcommon	Get common ancestor for terms
ontogetdown	Get ontology term(s) by parent id
ontogetobsolete	Get obsolete ontology terms
ontogetroot	Get ontology root terms by child identifier
ontogetsibs	Get ontology term(s) by id with common parent
ontogetup	Get ontology term(s) by id of child
ontoisobsolete	Report whether an ontology term id is obsolete
ontotext	Get ontology term(s) original full text
palindrome	Looks for inverted repeats in a nucleotide sequence.
pasteseq	Insert one sequence into another.
patmatdb	Matching a Prosite motif against a Protein Sequence Database.
patmatmotifs	Compares a protein sequence to the PROSITE motif database.
pepcoil	Predicts coiled coil regions
pepdigest	Protein proteolytic enzyme or reagent cleavage digest
pepinfo	Plots simple amino acid properties in parallel
pepnet	Protein helical net plot
pepstats	Protein statistics
pepwheel	Shows protein sequences as helices
pepwindow	Displays protein hydropathy
pepwindowall	Displays protein hydropathy of a set of sequences
plotcon	Plots the quality of conservation of a sequence alignment
plotorf	Plot potential open reading frames
polydot	Multiple dotplot
preg	Regular expression search of a protein sequence
prettyplot	Displays aligned sequences, with colouring and boxing.
prettyseq	Output sequence with translated ranges
primersearch	Searches DNA sequences for matches with primer pairs
printsextract	Preprocesses the PRINTS database for use with the program PSCAN
profit	Scan a sequence or database with a matrix or profile
prophecy	Creates matrices/profiles from multiple alignments
prophet	Gapped alignment for profiles
prosextract	Extracts ID, AC, and PA lines from the PROSITE motif database.
pscan	Locates fingerprints (multiple motif features) in a protein sequence.
psiphi	Calculates phi and psi torsion angles from cleaned EMBOSS-style protein co-ordinate file
rebaseextract	Extract data from REBASE
recoder	Find and remove restriction sites but maintain the same translation
redata	Isoschizomers, references and Suppliers for Restriction Enzymes
remap	Display a sequence with restriction cut sites, translation etc.
restover	Finds restriction enzymes that produce a specific overhang
restrict	Finds Restriction Enzyme Cleavage Sites
revseq	Reverse and complement a sequence.
seealso	Finds programs sharing group names
seqcount	Reads and counts sequences
seqmatchall	Does an all-against-all comparison of a set of sequences
seqret	Reads and writes (returns) a sequence.
seqretsplit	Reads and writes (returns) sequences in individual files
seqxref	Retrieve all database cross-references for a sequence entry
servertell	Display information about a public server
showalign	Display a multiple sequence alignment
showdb	Displays information on the currently available databases
showfeat	Show features of a sequence.
showorf	Pretty output of DNA translations
showpep	Displays protein sequences with features in pretty format
showseq	Display a sequence with features, translation etc
showserver	Displays information on configured servers
shuffleseq	Shuffles a set of sequences maintaining composition
sigcleave	Predicts signal peptide cleavage sites
silent	Silent mutation restriction enzyme scan
sirna	Finds siRNA duplexes in mRNA
sixpack	Display a DNA sequence with 6-frame translation and ORFs
sizeseq	Sort sequences by size
skipredundant	Remove redundant sequences from an input set
skipseq	Reads and writes (returns) sequences, skipping the first few
splitsource	Split a sequence into original source sequences.
splitter	Split a sequence into (overlapping) smaller sequences.
stretcher	Global alignment of two sequences.
stssearch	Searches a DNA database for matches with a set of STS primers
supermatcher	Finds a match of a large sequence against one or more sequences
syco	Synonymous codon usage Gribskov statistic plot
taxget	Get taxon(s)
taxgetdown	Get descendants of taxon(s)
taxgetrank	Get parents of taxon(s)
taxgetspecies	Get all species under taxon(s)
taxgetup	Get parents of taxon(s)
tcode	Fickett TESTCODE statistic to identify protein-coding DNA
textget	Get text data entries
textsearch	Search sequence documentation text. SRS and Entrez are faster!
tfextract	Extract data from TRANSFAC
tfm	Displays a program's help documentation manual
tfscan	Scans DNA sequences for transcription factors.
tmap	Predict transmembrane proteins
tranalign	Align nucleic coding regions given the aligned proteins
transeq	Translates nucleic acid sequences.
trimest	Trim poly-A tails off EST sequences
trimseq	Trim ambiguous bits off the ends of sequences
trimspace	Remove extra whitespace from an ASCII text file
twofeat	Finds neighbouring pairs of features in sequences
union	Reads sequence fragments and builds one sequence
vectorstrip	Strips out DNA between a pair of vector sequences
water	Smith-Waterman local alignment.
whichdb	Search all databases for an entry
wobble	Wobble base plot
wordcount	Counts words of a specified size in a DNA sequence.
wordfinder	Match large sequences against one or more other sequences
wordmatch	Finds all exact matches of a given size between 2 sequences
wossdata	Finds programs by EDAM data
wossinput	Finds programs by EDAM data for inputs
wossname	Finds programs by keywords in their one-line documentation.
wossoperation	Finds programs by EDAM operation
wossoutput	Finds programs by EDAM data for outputs
wossparam	Finds programs by EDAM data for parameters
wosstopic	Finds programs by EDAM topic
yank	Reads a range from a sequence, appends the full USA to a list file
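
Because the table above is a simple two-column listing (program name, then a one-line description), it is easy to search by keyword. The following is a minimal Python sketch of that idea, using a handful of rows copied from the table. The function names `parse_table` and `find_programs` are hypothetical and chosen for illustration only; this is not how wossname or any other EMBOSS program is implemented.

```python
# Illustrative only: a few rows copied from the table above, laid out as
# "name<TAB>description". Not the implementation of any EMBOSS program.
TABLE = (
    "needle\tNeedleman-Wunsch global alignment.\n"
    "water\tSmith-Waterman local alignment.\n"
    "stretcher\tGlobal alignment of two sequences.\n"
    "matcher\tLocal alignment of two sequences\n"
    "revseq\tReverse and complement a sequence.\n"
)


def parse_table(text):
    """Return {program_name: description} from tab-separated lines."""
    programs = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, _, description = line.partition("\t")
        programs[name.strip()] = description.strip()
    return programs


def find_programs(programs, keyword):
    """Return sorted names whose description contains keyword (case-insensitive)."""
    wanted = keyword.lower()
    return sorted(name for name, desc in programs.items() if wanted in desc.lower())


programs = parse_table(TABLE)
print(find_programs(programs, "alignment"))
```

With the five sample rows, searching for "alignment" returns the four alignment programs and omits revseq.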

EMBASSY applications

The EMBASSY grouping includes applications and packages for specialised sequence analysis and non-sequence based analysis, as well as software included from third parties who have their own licensing terms. EMBOSS is GPL licensed. The libraries are under the Lesser GPL (LGPL). This allows the EMBOSS libraries to link to other software and only requires that software to have an LGPL-compatible licence. Phylip, for example, fits this model. EMBASSY applications have the same look and feel as EMBOSS applications.

EMBASSY - PHYLIP

The PHYLIP programs in this EMBASSY package were ported from release 3.572.

PHYLIP 3.61 has been converted as PHYLIPNEW and was released with EMBOSS 3.0.0 as a beta version.

EMBASSY - PHYLIPNEW

The PHYLIPNEW programs are EMBOSS conversions of the programs in Joe Felsenstein's PHYLIP package, version 3.61 (August 2004).

The PHYLIPNEW versions of these programs all have the prefix "f" to distinguish them from the original programs.

Program name	Description
fclique	Largest clique program
fconsense	Majority-rule and strict consensus tree
fcontml	Continuous character Maximum Likelihood method
fcontrast	Continuous character Contrasts
fdiscboot	Bootstrapped discrete sites algorithm
fdnacomp	DNA compatibility algorithm
fdnadist	Nucleic acid sequence Distance Matrix program
fdnainvar	Nucleic acid sequence Invariants method
fdnaml	Estimates phylogenies from nucleic acid sequence Maximum Likelihood
fdnamlk	Estimates phylogenies from nucleic acid sequence Maximum Likelihood with molecular clock
fdnamove	Interactive DNA parsimony
fdnapars	DNA parsimony algorithm
fdnapenny	Penny algorithm for DNA
fdollop	Dollo and polymorphism parsimony algorithm
fdolmove	Interactive Dollo and Polymorphism Parsimony
fdolpenny	Penny algorithm Dollo or polymorphism
fdrawgram	Plots a cladogram- or phenogram-like rooted tree diagram
fdrawtree	Plots an unrooted tree diagram
ffactor	Multistate to binary recoding program
ffitch	Fitch-Margoliash and Least-Squares Distance Methods
ffreqboot	Bootstrapped sequences algorithm
fgendist	Compute genetic distances from gene frequencies
fkitsch	Fitch-Margoliash method with contemporary tips
fmix	Mixed parsimony algorithm
fmove	Interactive mixed method parsimony
fneighbor	Phylogenies from distance matrix by N-J or UPGMA method
fpars	Discrete character parsimony
fpenny	Penny algorithm, branch-and-bound to find all most parsimonious trees
fproml	Protein maximum Likelihood program
fpromlk	Protein maximum Likelihood program with molecular clock
fprotdist	Protein distance algorithm
fprotpars	Protein parsimony algorithm
frestboot	Bootstrapped sequences algorithm
frestdist	Compute distance matrix from restriction sites or fragments
frestml	Restriction site maximum Likelihood method
fretree	Interactive tree rearrangement
fseqboot	Bootstrapped sequences algorithm
fseqbootall	Bootstrapped sequences algorithm
ftreedist	Distances between trees
ftreedistpair	Distances between trees
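
The "f" prefix convention described above for PHYLIPNEW can be sketched as a simple name mapping. This is only an illustration: the helper names `embassy_name` and `original_name` are hypothetical, and the unprefixed names used in the example (dnaml, neighbor, consense) are assumed to be the corresponding original PHYLIP program names.

```python
# Sketch of the PHYLIPNEW naming convention: the EMBOSS conversion of a
# PHYLIP program carries an "f" prefix. Helper names are hypothetical.
PHYLIPNEW_PREFIX = "f"


def embassy_name(original):
    """Return the PHYLIPNEW name for an original PHYLIP program name."""
    return PHYLIPNEW_PREFIX + original


def original_name(embassy):
    """Strip the PHYLIPNEW prefix; raise ValueError if it is absent."""
    if not embassy.startswith(PHYLIPNEW_PREFIX):
        raise ValueError(f"{embassy!r} does not start with {PHYLIPNEW_PREFIX!r}")
    return embassy[len(PHYLIPNEW_PREFIX):]


for name in ("dnaml", "neighbor", "consense"):
    print(name, "->", embassy_name(name))
```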

EMBASSY - DOMAINATRIX

The DOMAINATRIX programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.

Program name	Description
cathparse	Reads raw CATH classification files and writes a DCF file.
domainreso	Removes low resolution domains from a DCF file.
domainseqs	Adds sequence records to a DCF file.
domainnr	Removes redundant domains from a DCF file. The file must contain domain sequence information which can be added by using DOMAINSEQS.
domainsse	Adds secondary structure records to a DCF file.
scopparse	Reads raw SCOP classification files and writes a DCF file.
ssematch	Searches a DCF file for secondary structure matches. The file must contain domain secondary structure information which can be added by using DOMAINSSE.

EMBASSY - DOMALIGN

The DOMALIGN programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.

Program name	Description
allversusall	Does an all-versus-all global alignment for each set of sequences in an input directory and writes files of sequence similarity values.
domainrep	Reorder DCF file so that the representative structure of each user-specified node is given first.
domainalign	Generates structure-based sequence alignments for nodes in a DCF file.
seqalign	Reads a DAF file and a DHF and writes a DAF file extended with the hits.

EMBASSY - DOMSEARCH

The DOMSEARCH programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.

Program name	Description
seqsearch	Generate DHF files of database hits (sequences) from a DAF file (or other file of sequences) by using PSI-BLAST.
seqfraggle	Removes fragments from DHF files (or other files of sequences).
seqsort	Reads DHF files of database hits (sequences) and removes hits of ambiguous classification.
seqnr	Removes redundancy from DHF files (or other files of sequences).
seqwords	Generates DHF files of database hits (sequences) from Swissprot matching keywords from a keywords file.

EMBASSY - SIGNATURE

The SIGNATURE programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.

Program name	Description
libgen	Generates various types of discriminator for each alignment in a directory.
libscan	Generates hits (sequences in a domain hits file) from searches of various types of discriminator (HMMs, profiles etc) against a sequence database. Or generates hits from screening sequences against a library of such discriminators.
matgen3d	Generates a 3D-1D scoring matrix from CCF files (clean coordinate files).
rocon	Reads a DHF file of hits (sequences of unknown structural classification) and a DHF file of validation sequences (known classification) and writes a "hits file" for the hits, which are classified and rank-ordered on the basis of score.
rocplot	A generic and flexible tool for interpretation and graphical display of the performance of predictive methods using Receiver Operator Characteristic (ROC) analysis.
siggen	Generates a sparse protein signature from an alignment and residue contact data.
siggenlig	Generates ligand-binding signatures from a CON file (contacts file) of residue-ligand contacts.
sigscan	Generates a DHF of hits (sequences) from scanning a signature against a sequence database.
sigscanlig	Generates a LHF (ligand hits file) of hits (sequences) from scanning a sequence against a library of ligand-binding signatures

EMBASSY - STRUCTURE

The STRUCTURE programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.

Program name	Description
contacts	Reads CCF files and writes CON files of intra-chain residue-residue contact data.
domainer	Reads CCF files for proteins and writes CCF files for domains in a DCF file.
hetparse	Converts raw dictionary of heterogen groups to EMBL-like format.
interface	Reads protein CCF files and writes CON files of inter-chain residue-residue contact data.
pdbparse	Parses PDB files and writes CCF files for proteins.
pdbplus	Add records for residue solvent accessibility and secondary structure to a CCF file.
pdbtosp	Convert raw swissprot:PDB equivalence file to EMBL-like format.
sites	Reads CCF files and writes CON files of residue-ligand contact data for domains in a DCF file.

EMBASSY - HMMEROLD

The HMMEROLD programs are EMBOSS conversions of the programs in Sean Eddy's HMMER package, version 2.1.1.

The HMMEROLD versions of these programs all have the prefix "o" to distinguish them from the original programs.

Program name	Description
oalistat	Statistics for multiple alignment files
ohmmalign	Align sequences with an HMM
ohmmbuild	Build HMM
ohmmcalibrate	Calibrate a hidden Markov model
ohmmconvert	Convert between HMM formats
ohmmemit	Extract HMM sequences
ohmmfetch	Extract HMM from a database
ohmmindex	Index an HMM database
ohmmpfam	Align single sequence with an HMM
ohmmsearch	Search sequence database with an HMM

EMBASSY - HMMERNEW

The HMMER programs are EMBOSS conversions of the programs in Sean Eddy's HMMER package, version 2.3.2.

The HMMER versions of these programs all have the prefix "e" to distinguish them from the original programs.

Program name	Description
ealistat	Statistics for multiple alignment files
ehmmalign	Align sequences with an HMM
ehmmbuild	Build HMM
ehmmcalibrate	Calibrate a hidden Markov model
ehmmconvert	Convert between HMM formats
ehmmemit	Extract HMM sequences
ehmmfetch	Extract HMM from a database
ehmmindex	Index an HMM database
ehmmpfam	Align single sequence with an HMM
ehmmsearch	Search sequence database with an HMM

EMBASSY - VIENNA

These programs are adapted from the VIENNA RNA package.

This package is currently under development and is available only from the CVS server. We hope to make a beta release in the near future, but there is much work to be done on sequence formats and testing. The programs are listed in alphabetical order:

Program name	Author(s)	Description
vrnaalifold	Ivo Hofacker	RNA alignment folding
vrnaalifoldpf	Ivo Hofacker	RNA alignment folding with partition
vrnacofold	Ivo Hofacker	RNA cofolding
vrnacofoldconc	Ivo Hofacker	RNA cofolding with concentrations
vrnacofoldpf	Ivo Hofacker	RNA cofolding with partitioning
vrnadistance	Ivo Hofacker	RNA distances
vrnaduplex	Ivo Hofacker	RNA duplex calculation
vrnaeval	Ivo Hofacker	RNA eval
vrnaevalpair	Ivo Hofacker	RNA eval with cofold
vrnafold	Ivo Hofacker	Calculate secondary structures of RNAs
vrnafoldpf	Ivo Hofacker	Secondary structures of RNAs with partition
vrnaheat	Ivo Hofacker	RNA melting
vrnainverse	Ivo Hofacker	RNA sequences matching a structure
vrnalfold	Ivo Hofacker	Calculate locally stable secondary structures of RNAs
vrnapaln	Ivo Hofacker	RNA alignment
vrnaplot	Ivo Hofacker	Plot vrnafold output
vrnasubopt	Ivo Hofacker	Calculate RNA suboptimals

EMBASSY - OTHERS

The other EMBASSY packages each contain a single application. These are contributed single programs, or conversions of single programs.

Program name	Description
emnu	Simple menu of EMBOSS applications
esim4	Align an mRNA to a genomic DNA sequence
meme	Motif detection
mse	Conversion of Will Gilbert's MSE editor
topo	Conversion of Susan Jean Johns' TOPO
crystalball	Answers every drug discovery question you have about this sequence

Application groups

The EMBOSS applications are organised into logical groups according to their function. See the Application Groups Documentation for more information.

Proposed new applications

This is a list of areas requiring software development including putative new applications.