EMBOSS Applications

Introduction

The applications are listed in alphabetical order in the tables below. They are also organised into groups of related functionality. There is also a table of areas requiring software development which includes proposed new applications. Please send suggestions for new applications to emboss@emboss.open-bio.org.


EMBOSS applications

Please send bug reports to emboss-bug@emboss.open-bio.org.

Program name Description
aaindexextract Extract data from AAINDEX
abiview Reads an ABI file and displays the trace
acdc Tests definition files for any EMBOSS application.
aligncopy Reads and writes alignments
antigenic Finds antigenic sites in proteins
backtranambig Back translate a protein sequence to ambiguous codons
backtranseq Back translate a protein sequence
banana Bending and Curvature Plot in B-DNA
biosed Replace or delete sequence sections
btwisted Calculates the twisting in a B-DNA sequence
cacheensembl Prepares an EMBOSS cache file for an Ensembl server
cai CAI codon usage statistic
chaos Create a chaos plot for a sequence.
charge Protein charge plot
checktrans ORF property statistics
chips Codon usage statistics
cirdna Draws circular maps of DNA constructs
codcmp Codon usage table comparison
coderet Extract CDS, mRNA and translations from feature tables
compseq Counts the composition of dimer/trimer/etc words in a sequence
cons Creates a consensus from multiple alignments
consambig Create an ambiguous consensus sequence from a multiple alignment
cpgplot Plot CpG rich areas
cpgreport Reports CpG rich regions
cusp Create a codon usage table
cutgextract Extract data from CUTG
cutseq Removes a specified section from a sequence.
dan Plot melting temperatures for DNA.
dbiblast Database indexing for BLAST 1 and 2 indexed databases
dbifasta Index a fasta database
dbiflat Database indexing for flat file databases
dbigcg Database indexing for GCG formatted databases
dbxcompress Compress an uncompressed dbx index
dbxedam Index the EDAM ontology using b+tree indices
dbxfasta Database b+tree indexing for fasta file databases
dbxflat Database b+tree indexing for flat file databases
dbxgcg Database b+tree indexing for GCG formatted databases
dbxreport Validate index and report internals for dbx databases
dbxresource Index a data resource catalogue using b+tree indices
dbxtax Index NCBI taxonomy using b+tree indices
dbxuncompress Uncompress a compressed dbx index
degapseq Removes gap characters from sequences
descseq Alter the name or description of a sequence.
diffseq Find differences between nearly identical sequences
distmat Creates a distance matrix from multiple alignments
dotmatcher Produces a dotplot of two sequences.
dotpath Displays a non-overlapping wordmatch dotplot of two sequences
dottup DNA sequence dot plot
dreg Regular expression search of a nucleotide sequence
drfinddata Find public databases by data type
drfindformat Find public databases by format
drfindid Find public databases by identifier
drfindresource Find public databases by resource
drget Get data resource entries
drtext Get data resource entries complete text
edamdef Find EDAM ontology terms by definition
edamhasinput Find EDAM ontology terms by has_input relation
edamhasoutput Find EDAM ontology terms by has_output relation
edamisformat Find EDAM ontology terms by is_format_of relation
edamisid Find EDAM ontology terms by is_identifier_of relation
edamname Find EDAM ontology terms by name
edialign Local multiple alignment of sequences
einverted Finds DNA inverted repeats
embossdata Finds or fetches the data files read in by the EMBOSS programs
embossversion Writes the current EMBOSS version number
emma Multiple alignment program
emowse Protein identification by mass spectrometry
entret Reads and writes (returns) flatfile entries
epestfind Finds PEST motifs as potential proteolytic cleavage sites
eprimer3 Picks PCR primers and hybridization oligos
eprimer32 Picks PCR primers and hybridization oligos
equicktandem Finds tandem repeats
est2genome Align EST and genomic DNA sequences
etandem Looks for tandem repeats in a nucleotide sequence.
extractalign Extract regions from a sequence alignment
extractfeat Extract features from a sequence
extractseq Extract regions from a sequence.
featcopy Return a feature table
featreport Reads and writes a feature table
feattext Return a feature table original text
findkm Calculates Km and Vmax for an enzyme reaction
freak Residue/base frequency table or plot
fuzznuc Nucleic acid pattern search
fuzzpro Protein pattern search
fuzztran Protein pattern search after translation
garnier Predicts protein secondary structure
geecee Calculates the fractional GC content of nucleic acid sequences
getorf Finds and extracts open reading frames (ORFs)
godef Find GO ontology terms by definition
goname Find GO ontology terms by name
helixturnhelix Finds nucleic acid binding domains.
hmoment Hydrophobic moment calculation
iep Calculates the isoelectric point of a protein
infoalign Information on a multiple sequence alignment
infobase Return information on a given nucleotide base
inforesidue Return information on a given amino acid residue
infoseq Displays some simple information about sequences
isochore Plots isochores in large DNA sequences
jaspextract Extract data from JASPAR
jaspscan Scans DNA sequences for transcription factors
jembossctl Jemboss Authentication Control
lindna Draws linear maps of DNA constructs
listor Writes a list file of the logical OR of two sets of sequences
makenucseq Create random nucleotide sequences
makeprotseq Create random protein sequences
marscan Finds MAR/SAR sites in nucleic sequences
maskambignuc Masks all ambiguity characters in nucleotide sequences with N
maskambigprot Masks all ambiguity characters in protein sequences with X
maskfeat Mask off features of a sequence
maskseq Mask off regions of a sequence.
matcher Local alignment of two sequences
megamerger Merge two large overlapping nucleic acid sequences
merger Merge two overlapping sequences
msbar Mutate sequence beyond all recognition
mwcontam Shows molwts that match across a set of files
mwfilter Filter noisy molwts from mass spec output
needle Needleman-Wunsch global alignment.
needleall Many-to-many pairwise alignments of two sequence sets
newcpgreport Report CpG rich areas
newcpgseek Reports CpG rich regions
newseq Type in a short new sequence.
nohtml Remove mark-up (e.g. HTML tags) from an ASCII text file
noreturn Removes carriage return from ASCII files
nospace Remove whitespace from an ASCII text file
notab Replace tabs with spaces in an ASCII text file
notseq Excludes a set of sequences and writes out the remaining ones
nthseq Writes one sequence from a multiple set of sequences
nthseqset Reads and writes (returns) one set of sequences from many
octanol Displays protein hydropathy
oddcomp Finds protein sequence regions with a biased composition.
ontocount Count ontology term(s)
ontoget Get ontology term(s)
ontogetcommon Get common ancestor for terms
ontogetdown Get ontology term(s) by parent id
ontogetobsolete Get obsolete ontology terms
ontogetroot Get ontology root terms by child identifier
ontogetsibs Get ontology term(s) by id with common parent
ontogetup Get ontology term(s) by id of child
ontoisobsolete Report whether an ontology term id is obsolete
ontotext Get ontology term(s) original full text
palindrome Looks for inverted repeats in a nucleotide sequence.
pasteseq Insert one sequence into another.
patmatdb Matching a Prosite motif against a Protein Sequence Database.
patmatmotifs Compares a protein sequence to the PROSITE motif database.
pepcoil Predicts coiled coil regions
pepdigest Protein proteolytic enzyme or reagent cleavage digest
pepinfo Plots simple amino acid properties in parallel
pepnet Protein helical net plot
pepstats Protein statistics
pepwheel Shows protein sequences as helices
pepwindow Displays protein hydropathy
pepwindowall Displays protein hydropathy of a set of sequences
plotcon Plots the quality of conservation of a sequence alignment
plotorf Plot potential open reading frames
polydot Multiple dotplot
preg Regular expression search of a protein sequence
prettyplot Displays aligned sequences, with colouring and boxing.
prettyseq Output sequence with translated ranges
primersearch Searches DNA sequences for matches with primer pairs
printsextract Preprocesses the PRINTS database for use with the program PSCAN
profit Scan a sequence or database with a matrix or profile
prophecy Creates matrices/profiles from multiple alignments
prophet Gapped alignment for profiles
prosextract Extracts ID, AC, and PA lines from the PROSITE motif database.
pscan Locates fingerprints (multiple motif features) in a protein sequence.
psiphi Calculates phi and psi torsion angles from cleaned EMBOSS-style protein co-ordinate file
rebaseextract Extract data from REBASE
recoder Find and remove restriction sites but maintain the same translation
redata Isoschizomers, references and Suppliers for Restriction Enzymes
remap Display a sequence with restriction cut sites, translation etc.
restover Finds restriction enzymes that produce a specific overhang
restrict Finds Restriction Enzyme Cleavage Sites
revseq Reverse and complement a sequence.
seealso Finds programs sharing group names
seqcount Reads and counts sequences
seqmatchall Does an all-against-all comparison of a set of sequences
seqret Reads and writes (returns) a sequence.
seqretsplit Reads and writes (returns) sequences in individual files
seqxref Retrieve all database cross-references for a sequence entry
servertell Display information about a public server
showalign Display a multiple sequence alignment
showdb Displays information on the currently available databases
showfeat Show features of a sequence.
showorf Pretty output of DNA translations
showpep Displays protein sequences with features in pretty format
showseq Display a sequence with features, translation etc
showserver Displays information on configured servers
shuffleseq Shuffles a set of sequences maintaining composition
sigcleave Predicts signal peptide cleavage sites
silent Silent mutation restriction enzyme scan
sirna Finds siRNA duplexes in mRNA
sixpack Display a DNA sequence with 6-frame translation and ORFs
sizeseq Sort sequences by size
skipredundant Remove redundant sequences from an input set
skipseq Reads and writes (returns) sequences, skipping the first few
splitsource Split a sequence into original source sequences.
splitter Split a sequence into (overlapping) smaller sequences.
stretcher Global alignment of two sequences.
stssearch Searches a DNA database for matches with a set of STS primers
supermatcher Finds a match of a large sequence against one or more sequences
syco Synonymous codon usage Gribskov statistic plot
taxget Get taxon(s)
taxgetdown Get descendants of taxon(s)
taxgetrank Get parents of taxon(s)
taxgetspecies Get all species under taxon(s)
taxgetup Get parents of taxon(s)
tcode Fickett TESTCODE statistic to identify protein-coding DNA
textget Get text data entries
textsearch Search sequence documentation text. SRS and Entrez are faster!
tfextract Extract data from TRANSFAC
tfm Displays a program's help documentation manual
tfscan Scans DNA sequences for transcription factors.
tmap Predict transmembrane proteins
tranalign Align nucleic coding regions given the aligned proteins
transeq Translates nucleic acid sequences.
trimest Trim poly-A tails off EST sequences
trimseq Trim ambiguous bits off the ends of sequences
trimspace Remove extra whitespace from an ASCII text file
twofeat Finds neighbouring pairs of features in sequences
union Reads sequence fragments and builds one sequence
vectorstrip Strips out DNA between a pair of vector sequences
water Smith-Waterman local alignment.
whichdb Search all databases for an entry
wobble Wobble base plot
wordcount Counts words of a specified size in a DNA sequence.
wordfinder Match large sequences against one or more other sequences
wordmatch Finds all exact matches of a given size between 2 sequences
wossdata Finds programs by EDAM data
wossinput Finds programs by EDAM data for inputs
wossname Finds programs by keywords in their one-line documentation.
wossoperation Finds programs by EDAM operation
wossoutput Finds programs by EDAM data for outputs
wossparam Finds programs by EDAM data for parameters
wosstopic Finds programs by EDAM topic
yank Reads a range from a sequence, appends the full USA to a list file

EMBASSY applications

The EMBASSY grouping includes applications and packages for specialised sequence analysis and non-sequence-based analysis, as well as software from third parties that have their own licensing terms. EMBOSS is GPL licensed. The libraries are under the Lesser GPL (LGPL). This allows the EMBOSS libraries to link to other software and only requires that software to have an LGPL-compatible licence. Phylip, for example, fits this model. EMBASSY applications have the same look and feel as EMBOSS applications.

EMBASSY - PHYLIP

The PHYLIP programs in this EMBASSY package were ported from release 3.572.

PHYLIP 3.61 has been converted as PHYLIPNEW and was released with EMBOSS 3.0.0 as a beta version.

EMBASSY - PHYLIPNEW

The PHYLIPNEW programs are EMBOSS conversions of the programs in Joe Felsenstein's PHYLIP package, version 3.61 (August 2004).

The PHYLIPNEW versions of these programs all have the prefix "f" to distinguish them from the original programs.

Program name Description
fclique Largest clique program
fconsense Majority-rule and strict consensus tree
fcontml Continuous character Maximum Likelihood method
fcontrast Continuous character Contrasts
fdiscboot Bootstrapped discrete sites algorithm
fdnacomp DNA compatibility algorithm
fdnadist Nucleic acid sequence Distance Matrix program
fdnainvar Nucleic acid sequence Invariants method
fdnaml Estimates phylogenies from nucleic acid sequence Maximum Likelihood
fdnamlk Estimates phylogenies from nucleic acid sequence Maximum Likelihood with molecular clock
fdnamove Interactive DNA parsimony
fdnapars DNA parsimony algorithm
fdnapenny Penny algorithm for DNA
fdollop Dollo and polymorphism parsimony algorithm
fdolmove Interactive Dollo and Polymorphism Parsimony
fdolpenny Penny algorithm Dollo or polymorphism
fdrawgram Plots a cladogram- or phenogram-like rooted tree diagram
fdrawtree Plots an unrooted tree diagram
ffactor Multistate to binary recoding program
ffitch Fitch-Margoliash and Least-Squares Distance Methods
ffreqboot Bootstrapped sequences algorithm
fgendist Compute genetic distances from gene frequencies
fkitsch Fitch-Margoliash method with contemporary tips
fmix Mixed parsimony algorithm
fmove Interactive mixed method parsimony
fneighbor Phylogenies from distance matrix by N-J or UPGMA method
fpars Discrete character parsimony
fpenny Penny algorithm, branch-and-bound to find all most parsimonious trees
fproml Protein maximum Likelihood program
fpromlk Protein maximum Likelihood program with molecular clock
fprotdist Protein distance algorithm
fprotpars Protein parsimony algorithm
frestboot Bootstrapped sequences algorithm
frestdist Compute distance matrix from restriction sites or fragments
frestml Restriction site maximum Likelihood method
fretree Interactive tree rearrangement
fseqboot Bootstrapped sequences algorithm
fseqbootall Bootstrapped sequences algorithm
ftreedist Distances between trees
ftreedistpair Distances between trees
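
The "f" prefix convention described above can be illustrated with a minimal sketch. The helper names `to_phylipnew` and `to_phylip` are hypothetical and are not part of EMBOSS or PHYLIP; the sketch only assumes that a PHYLIPNEW name is the lower-case original program name with "f" prepended, as in the table.

```python
# Hypothetical helpers illustrating the PHYLIPNEW naming convention.
# They are not part of EMBOSS; they only add or strip the "f" prefix.

PHYLIPNEW_PREFIX = "f"


def to_phylipnew(original_name: str) -> str:
    """Return the PHYLIPNEW name for an original PHYLIP program name."""
    return PHYLIPNEW_PREFIX + original_name.lower()


def to_phylip(phylipnew_name: str) -> str:
    """Return the original PHYLIP name for a PHYLIPNEW program name.

    The caller must already know the name is a PHYLIPNEW name: an
    original name such as "fitch" also starts with "f", so the prefix
    alone cannot tell the two apart.
    """
    if not phylipnew_name.startswith(PHYLIPNEW_PREFIX):
        raise ValueError(f"not a PHYLIPNEW name: {phylipnew_name!r}")
    return phylipnew_name[len(PHYLIPNEW_PREFIX):]


if __name__ == "__main__":
    for name in ("dnaml", "neighbor", "fitch"):
        print(name, "->", to_phylipnew(name))
```

Some entries in the table, such as fseqbootall and ftreedistpair, may have no direct counterpart in the original package, so the reverse mapping is best treated as a naming aid and not as a lookup of existing PHYLIP programs.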

EMBASSY - DOMAINATRIX

The DOMAINATRIX programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.

Program name Description
cathparse Reads raw CATH classification files and writes a DCF file.
domainreso Removes low resolution domains from a DCF file.
domainseqs Adds sequence records to a DCF file.
domainnr Removes redundant domains from a DCF file. The file must contain domain sequence information which can be added by using DOMAINSEQS.
domainsse Adds secondary structure records to a DCF file.
scopparse Reads raw SCOP classification files and writes a DCF file.
ssematch Searches a DCF file for secondary structure matches. The file must contain domain secondary structure information which can be added by using DOMAINSSE.

EMBASSY - DOMALIGN

The DOMALIGN programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.

Program name Description
allversusall Does an all-versus-all global alignment for each set of sequences in an input directory and writes files of sequence similarity values.
domainrep Reorder DCF file so that the representative structure of each user-specified node is given first.
domainalign Generates structure-based sequence alignments for nodes in a DCF file.
seqalign Reads a DAF file and a DHF and writes a DAF file extended with the hits.

EMBASSY - DOMSEARCH

The DOMSEARCH programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.

Program name Description
seqsearch Generate DHF files of database hits (sequences) from a DAF file (or other file of sequences) by using PSI-BLAST.
seqfraggle Removes fragments from DHF files (or other files of sequences).
seqsort Reads DHF files of database hits (sequences) and removes hits of ambiguous classification.
seqnr Removes redundancy from DHF files (or other files of sequences).
seqwords Generates DHF files of database hits (sequences) from Swissprot matching keywords from a keywords file.

EMBASSY - SIGNATURE

The SIGNATURE programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.

Program name Description
libgen Generates various types of discriminator for each alignment in a directory.
libscan Generates hits (sequences in a domain hits file) from searches of various types of discriminator (HMMs, profiles etc) against a sequence database. Or generates hits from screening sequences against a library of such discriminators.
matgen3d Generates a 3D-1D scoring matrix from CCF files (clean coordinate files).
rocon Reads a DHF file of hits (sequences of unknown structural classification) and a DHF file of validation sequences (known classification) and writes a "hits file" for the hits, which are classified and rank-ordered on the basis of score.
rocplot A generic and flexible tool for interpretation and graphical display of the performance of predictive methods using Receiver Operator Characteristic (ROC) analysis.
siggen Generates a sparse protein signature from an alignment and residue contact data.
siggenlig Generates ligand-binding signatures from a CON file (contacts file) of residue-ligand contacts.
sigscan Generates a DHF of hits (sequences) from scanning a signature against a sequence database.
sigscanlig Generates an LHF (ligand hits file) of hits (sequences) from scanning a sequence against a library of ligand-binding signatures.

EMBASSY - STRUCTURE

The STRUCTURE programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.

Program name Description
contacts Reads CCF files and writes CON files of intra-chain residue-residue contact data.
domainer Reads CCF files for proteins and writes CCF files for domains in a DCF file.
hetparse Converts raw dictionary of heterogen groups to EMBL-like format.
interface Reads protein CCF files and writes CON files of inter-chain residue-residue contact data.
pdbparse Parses PDB files and writes CCF files for proteins.
pdbplus Add records for residue solvent accessibility and secondary structure to a CCF file.
pdbtosp Convert raw swissprot:PDB equivalence file to EMBL-like format.
sites Reads CCF files and writes CON files of residue-ligand contact data for domains in a DCF file.

EMBASSY - HMMEROLD

The HMMEROLD programs are EMBOSS conversions of the programs in Sean Eddy's HMMER package, version 2.1.1.

The HMMEROLD versions of these programs all have the prefix "o" to distinguish them from the original programs.

Program name Description
oalistat Statistics for multiple alignment files
ohmmalign Align sequences with an HMM
ohmmbuild Build HMM
ohmmcalibrate Calibrate a hidden Markov model
ohmmconvert Convert between HMM formats
ohmmemit Extract HMM sequences
ohmmfetch Extract HMM from a database
ohmmindex Index an HMM database
ohmmpfam Align single sequence with an HMM
ohmmsearch Search sequence database with an HMM

EMBASSY - HMMERNEW

The HMMERNEW programs are EMBOSS conversions of the programs in Sean Eddy's HMMER package, version 2.3.2.

The HMMERNEW versions of these programs all have the prefix "e" to distinguish them from the original programs.

Program name Description
ealistat Statistics for multiple alignment files
ehmmalign Align sequences with an HMM
ehmmbuild Build HMM
ehmmcalibrate Calibrate a hidden Markov model
ehmmconvert Convert between HMM formats
ehmmemit Extract HMM sequences
ehmmfetch Extract HMM from a database
ehmmindex Index an HMM database
ehmmpfam Align single sequence with an HMM
ehmmsearch Search sequence database with an HMM

EMBASSY - VIENNA

These programs are adapted from the VIENNA RNA package.

This is currently under development, and is available only from the CVS server. We hope to make a beta release in the near future, but there is much work to be done on sequence formats and testing. The programs are listed in alphabetical order:

Program name Author(s) Description
vrnaalifold Ivo Hofacker RNA alignment folding
vrnaalifoldpf Ivo Hofacker RNA alignment folding with partition
vrnacofold Ivo Hofacker RNA cofolding
vrnacofoldconc Ivo Hofacker RNA cofolding with concentrations
vrnacofoldpf Ivo Hofacker RNA cofolding with partitioning
vrnadistance Ivo Hofacker RNA distances
vrnaduplex Ivo Hofacker RNA duplex calculation
vrnaeval Ivo Hofacker RNA eval
vrnaevalpair Ivo Hofacker RNA eval with cofold
vrnafold Ivo Hofacker Calculate secondary structures of RNAs
vrnafoldpf Ivo Hofacker Secondary structures of RNAs with partition
vrnaheat Ivo Hofacker RNA melting
vrnainverse Ivo Hofacker RNA sequences matching a structure
vrnalfold Ivo Hofacker Calculate locally stable secondary structures of RNAs
vrnapaln Ivo Hofacker RNA alignment
vrnaplot Ivo Hofacker Plot vrnafold output
vrnasubopt Ivo Hofacker Calculate RNA suboptimals

EMBASSY - OTHERS

Other EMBASSY packages with single applications. These are contributed single programs, or conversions of single programs.

Program name Description
emnu Simple menu of EMBOSS applications
esim4 Align an mRNA to a genomic DNA sequence
meme Motif detection
mse Conversion of Will Gilbert's MSE editor
topo Conversion of Susan Jean Johns' TOPO
crystalball Answers every drug discovery question you have about this sequence

Application groups

The EMBOSS applications are organised into logical groups according to their function. See the Application Groups Documentation for more information.


Proposed new applications

This is a list of areas requiring software development including putative new applications.