The applications are listed in alphabetical order in the tables below. They are also organised into groups of related functionality. There is also a table of areas requiring software development which includes proposed new applications. Please send suggestions for new applications to emboss@emboss.open-bio.org.
Please send bug reports to emboss-bug@emboss.open-bio.org.
Program name | Description |
---|---|
aaindexextract | Extract data from AAINDEX |
abiview | Reads an ABI file and displays the trace |
acdc | Tests definition files for any EMBOSS application. |
aligncopy | Reads and writes alignments |
antigenic | Finds antigenic sites in proteins |
backtranambig | Back translate a protein sequence to ambiguous codons |
backtranseq | Back translate a protein sequence |
banana | Bending and Curvature Plot in B-DNA |
biosed | Replace or delete sequence sections |
btwisted | Calculates the twisting in a B-DNA sequence |
cacheensembl | Prepares an EMBOSS cache file for an Ensembl server |
cai | CAI codon usage statistic |
chaos | Create a chaos plot for a sequence. |
charge | Protein charge plot |
checktrans | ORF property statistics |
chips | Codon usage statistics |
cirdna | Draws circular maps of DNA constructs |
codcmp | Codon usage table comparison |
coderet | Extract CDS, mRNA and translations from feature tables |
compseq | Counts the composition of dimer/trimer/etc words in a sequence |
cons | Creates a consensus from multiple alignments |
consambig | Create an ambiguous consensus sequence from a multiple alignment |
cpgplot | Plot CpG rich areas |
cpgreport | Reports CpG rich regions |
cusp | Create a codon usage table |
cutgextract | Extract data from CUTG |
cutseq | Removes a specified section from a sequence. |
dan | Plot melting temperatures for DNA. |
dbiblast | Database indexing for BLAST 1 and 2 indexed databases |
dbifasta | Index a fasta database |
dbiflat | Database indexing for flat file databases |
dbigcg | Database indexing for GCG formatted databases |
dbxcompress | Compress an uncompressed dbx index |
dbxedam | Index the EDAM ontology using b+tree indices |
dbxfasta | Database b+tree indexing for fasta file databases |
dbxflat | Database b+tree indexing for flat file databases |
dbxgcg | Database b+tree indexing for GCG formatted databases |
dbxreport | Validate index and report internals for dbx databases |
dbxresource | Index a data resource catalogue using b+tree indices |
dbxtax | Index NCBI taxonomy using b+tree indices |
dbxuncompress | Uncompress a compressed dbx index |
degapseq | Removes gap characters from sequences |
descseq | Alter the name or description of a sequence. |
diffseq | Find differences between nearly identical sequences |
distmat | Creates a distance matrix from multiple alignments |
dotmatcher | Produces a dotplot of two sequences. |
dotpath | Displays a non-overlapping wordmatch dotplot of two sequences |
dottup | DNA sequence dot plot |
dreg | Regular expression search of a nucleotide sequence |
drfinddata | Find public databases by data type |
drfindformat | Find public databases by format |
drfindid | Find public databases by identifier |
drfindresource | Find public databases by resource |
drget | Get data resource entries |
drtext | Get data resource entries complete text |
edamdef | Find EDAM ontology terms by definition |
edamhasinput | Find EDAM ontology terms by has_input relation |
edamhasoutput | Find EDAM ontology terms by has_output relation |
edamisformat | Find EDAM ontology terms by is_format_of relation |
edamisid | Find EDAM ontology terms by is_identifier_of relation |
edamname | Find EDAM ontology terms by name |
edialign | Local multiple alignment of sequences |
einverted | Finds DNA inverted repeats |
embossdata | Finds or fetches the data files read in by the EMBOSS programs |
embossversion | Writes the current EMBOSS version number |
emma | Multiple alignment program |
emowse | Protein identification by mass spectrometry |
entret | Reads and writes (returns) flatfile entries |
epestfind | Finds PEST motifs as potential proteolytic cleavage sites |
eprimer3 | Picks PCR primers and hybridization oligos |
eprimer32 | Picks PCR primers and hybridization oligos |
equicktandem | Finds tandem repeats |
est2genome | Align EST and genomic DNA sequences |
etandem | Looks for tandem repeats in a nucleotide sequence. |
extractalign | Extract regions from a sequence alignment |
extractfeat | Extract features from a sequence |
extractseq | Extract regions from a sequence. |
featcopy | Return a feature table |
featreport | Reads and writes a feature table |
feattext | Return a feature table original text |
findkm | Calculates Km and Vmax for an enzyme reaction |
freak | Residue/base frequency table or plot |
fuzznuc | Nucleic acid pattern search |
fuzzpro | Protein pattern search |
fuzztran | Protein pattern search after translation |
garnier | Predicts protein secondary structure |
geecee | Calculates the fractional GC content of nucleic acid sequences |
getorf | Finds and extracts open reading frames (ORFs) |
godef | Find GO ontology terms by definition |
goname | Find GO ontology terms by name |
helixturnhelix | Finds nucleic acid binding domains. |
hmoment | Hydrophobic moment calculation |
iep | Calculates the isoelectric point of a protein |
infoalign | Information on a multiple sequence alignment |
infobase | Return information on a given nucleotide base |
inforesidue | Return information on a given amino acid residue |
infoseq | Displays some simple information about sequences |
isochore | Plots isochores in large DNA sequences |
jaspextract | Extract data from JASPAR |
jaspscan | Scans DNA sequences for transcription factors |
jembossctl | Jemboss Authentication Control |
lindna | Draws linear maps of DNA constructs |
listor | Writes a list file of the logical OR of two sets of sequences |
makenucseq | Create random nucleotide sequences |
makeprotseq | Create random protein sequences |
marscan | Finds MAR/SAR sites in nucleic sequences |
maskambignuc | Masks all ambiguity characters in nucleotide sequences with N |
maskambigprot | Masks all ambiguity characters in protein sequences with X |
maskfeat | Mask off features of a sequence |
maskseq | Mask off regions of a sequence. |
matcher | Local alignment of two sequences |
megamerger | Merge two large overlapping nucleic acid sequences |
merger | Merge two overlapping sequences |
msbar | Mutate sequence beyond all recognition |
mwcontam | Shows molwts that match across a set of files |
mwfilter | Filter noisy molwts from mass spec output |
needle | Needleman-Wunsch global alignment. |
needleall | Many-to-many pairwise alignments of two sequence sets |
newcpgreport | Report CpG rich areas |
newcpgseek | Reports CpG rich regions |
newseq | Type in a short new sequence. |
nohtml | Remove mark-up (e.g. HTML tags) from an ASCII text file |
noreturn | Removes carriage return from ASCII files |
nospace | Remove whitespace from an ASCII text file |
notab | Replace tabs with spaces in an ASCII text file |
notseq | Excludes a set of sequences and writes out the remaining ones |
nthseq | Writes one sequence from a multiple set of sequences |
nthseqset | Reads and writes (returns) one set of sequences from many |
octanol | Displays protein hydropathy |
oddcomp | Finds protein sequence regions with a biased composition. |
ontocount | Count ontology term(s) |
ontoget | Get ontology term(s) |
ontogetcommon | Get common ancestor for terms |
ontogetdown | Get ontology term(s) by parent id |
ontogetobsolete | Get obsolete ontology terms |
ontogetroot | Get ontology root terms by child identifier |
ontogetsibs | Get ontology term(s) by id with common parent |
ontogetup | Get ontology term(s) by id of child |
ontoisobsolete | Report whether an ontology term id is obsolete |
ontotext | Get ontology term(s) original full text |
palindrome | Looks for inverted repeats in a nucleotide sequence. |
pasteseq | Insert one sequence into another. |
patmatdb | Matching a Prosite motif against a Protein Sequence Database. |
patmatmotifs | Compares a protein sequence to the PROSITE motif database. |
pepcoil | Predicts coiled coil regions |
pepdigest | Protein proteolytic enzyme or reagent cleavage digest |
pepinfo | Plots simple amino acid properties in parallel |
pepnet | Protein helical net plot |
pepstats | Protein statistics |
pepwheel | Shows protein sequences as helices |
pepwindow | Displays protein hydropathy |
pepwindowall | Displays protein hydropathy of a set of sequences |
plotcon | Plots the quality of conservation of a sequence alignment |
plotorf | Plot potential open reading frames |
polydot | Multiple dotplot |
preg | Regular expression search of a protein sequence |
prettyplot | Displays aligned sequences, with colouring and boxing. |
prettyseq | Output sequence with translated ranges |
primersearch | Searches DNA sequences for matches with primer pairs |
printsextract | Preprocesses the PRINTS database for use with the program PSCAN |
profit | Scan a sequence or database with a matrix or profile |
prophecy | Creates matrices/profiles from multiple alignments |
prophet | Gapped alignment for profiles |
prosextract | Extracts ID, AC, and PA lines from the PROSITE motif database. |
pscan | Locates fingerprints (multiple motif features) in a protein sequence. |
psiphi | Calculates phi and psi torsion angles from cleaned EMBOSS-style protein co-ordinate file |
rebaseextract | Extract data from REBASE |
recoder | Find and remove restriction sites but maintain the same translation |
redata | Isoschizomers, references and Suppliers for Restriction Enzymes |
remap | Display a sequence with restriction cut sites, translation etc. |
restover | Finds restriction enzymes that produce a specific overhang |
restrict | Finds Restriction Enzyme Cleavage Sites |
revseq | Reverse and complement a sequence. |
seealso | Finds programs sharing group names |
seqcount | Reads and counts sequences |
seqmatchall | Does an all-against-all comparison of a set of sequences |
seqret | Reads and writes (returns) a sequence. |
seqretsplit | Reads and writes (returns) sequences in individual files |
seqxref | Retrieve all database cross-references for a sequence entry |
servertell | Display information about a public server |
showalign | Display a multiple sequence alignment |
showdb | Displays information on the currently available databases |
showfeat | Show features of a sequence. |
showorf | Pretty output of DNA translations |
showpep | Displays protein sequences with features in pretty format |
showseq | Display a sequence with features, translation etc |
showserver | Displays information on configured servers |
shuffleseq | Shuffles a set of sequences maintaining composition |
sigcleave | Predicts signal peptide cleavage sites |
silent | Silent mutation restriction enzyme scan |
sirna | Finds siRNA duplexes in mRNA |
sixpack | Display a DNA sequence with 6-frame translation and ORFs |
sizeseq | Sort sequences by size |
skipredundant | Remove redundant sequences from an input set |
skipseq | Reads and writes (returns) sequences, skipping the first few |
splitsource | Split a sequence into original source sequences. |
splitter | Split a sequence into (overlapping) smaller sequences. |
stretcher | Global alignment of two sequences. |
stssearch | Searches a DNA database for matches with a set of STS primers |
supermatcher | Finds a match of a large sequence against one or more sequences |
syco | Synonymous codon usage Gribskov statistic plot |
taxget | Get taxon(s) |
taxgetdown | Get descendants of taxon(s) |
taxgetrank | Get parents of taxon(s) |
taxgetspecies | Get all species under taxon(s) |
taxgetup | Get parents of taxon(s) |
tcode | Fickett TESTCODE statistic to identify protein-coding DNA |
textget | Get text data entries |
textsearch | Search sequence documentation text. SRS and Entrez are faster! |
tfextract | Extract data from TRANSFAC |
tfm | Displays a program's help documentation manual |
tfscan | Scans DNA sequences for transcription factors. |
tmap | Predict transmembrane proteins |
tranalign | Align nucleic coding regions given the aligned proteins |
transeq | Translates nucleic acid sequences. |
trimest | Trim poly-A tails off EST sequences |
trimseq | Trim ambiguous bits off the ends of sequences |
trimspace | Remove extra whitespace from an ASCII text file |
twofeat | Finds neighbouring pairs of features in sequences |
union | Reads sequence fragments and builds one sequence |
vectorstrip | Strips out DNA between a pair of vector sequences |
water | Smith-Waterman local alignment. |
whichdb | Search all databases for an entry |
wobble | Wobble base plot |
wordcount | Counts words of a specified size in a DNA sequence. |
wordfinder | Match large sequences against one or more other sequences |
wordmatch | Finds all exact matches of a given size between 2 sequences |
wossdata | Finds programs by EDAM data |
wossinput | Finds programs by EDAM data for inputs |
wossname | Finds programs by keywords in their one-line documentation. |
wossoperation | Finds programs by EDAM operation |
wossoutput | Finds programs by EDAM data for outputs |
wossparam | Finds programs by EDAM data for parameters |
wosstopic | Finds programs by EDAM topic |
yank | Reads a range from a sequence, appends the full USA to a list file |
The EMBASSY grouping includes applications and packages for specialised sequence analysis and non-sequence based analysis, as well as software included from third parties who have their own licensing terms. EMBOSS is GPL licensed. The libraries are under the Lesser GPL (LGPL). This allows the EMBOSS libraries to link to other software and only requires that software to have an LGPL-compatible licence. PHYLIP, for example, fits this model. EMBASSY applications have the same look and feel as EMBOSS applications.
The PHYLIP programs in this EMBASSY package were ported from release 3.572.
PHYLIP 3.61 has been converted as PHYLIPNEW and was released with EMBOSS 3.0.0 as a beta version.
The PHYLIPNEW programs are EMBOSS conversions of the programs in Joe Felsenstein's PHYLIP package, version 3.61 (August 2004).
The PHYLIPNEW versions of these programs all have the prefix "f" to distinguish them from the original programs.
Program name | Description |
---|---|
fclique | Largest clique program |
fconsense | Majority-rule and strict consensus tree |
fcontml | Continuous character Maximum Likelihood method |
fcontrast | Continuous character Contrasts |
fdiscboot | Bootstrapped discrete sites algorithm |
fdnacomp | DNA compatibility algorithm |
fdnadist | Nucleic acid sequence Distance Matrix program |
fdnainvar | Nucleic acid sequence Invariants method |
fdnaml | Estimates phylogenies from nucleic acid sequence Maximum Likelihood |
fdnamlk | Estimates phylogenies from nucleic acid sequence Maximum Likelihood with molecular clock |
fdnamove | Interactive DNA parsimony |
fdnapars | DNA parsimony algorithm |
fdnapenny | Penny algorithm for DNA |
fdollop | Dollo and polymorphism parsimony algorithm |
fdolmove | Interactive Dollo and Polymorphism Parsimony |
fdolpenny | Penny algorithm Dollo or polymorphism |
fdrawgram | Plots a cladogram- or phenogram-like rooted tree diagram |
fdrawtree | Plots an unrooted tree diagram |
ffactor | Multistate to binary recoding program |
ffitch | Fitch-Margoliash and Least-Squares Distance Methods |
ffreqboot | Bootstrapped sequences algorithm |
fgendist | Compute genetic distances from gene frequencies |
fkitsch | Fitch-Margoliash method with contemporary tips |
fmix | Mixed parsimony algorithm |
fmove | Interactive mixed method parsimony |
fneighbor | Phylogenies from distance matrix by N-J or UPGMA method |
fpars | Discrete character parsimony |
fpenny | Penny algorithm, branch-and-bound to find all most parsimonious trees |
fproml | Protein maximum Likelihood program |
fpromlk | Protein maximum Likelihood program with molecular clock |
fprotdist | Protein distance algorithm |
fprotpars | Protein parsimony algorithm |
frestboot | Bootstrapped sequences algorithm |
frestdist | Compute distance matrix from restriction sites or fragments |
frestml | Restriction site maximum Likelihood method |
fretree | Interactive tree rearrangement |
fseqboot | Bootstrapped sequences algorithm |
fseqbootall | Bootstrapped sequences algorithm |
ftreedist | Distances between trees |
ftreedistpair | Distances between trees |
The DOMAINATRIX programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.
Program name | Description |
---|---|
cathparse | Reads raw CATH classification files and writes a DCF file. |
domainreso | Removes low resolution domains from a DCF file. |
domainseqs | Adds sequence records to a DCF file. |
domainnr | Removes redundant domains from a DCF file. The file must contain domain sequence information which can be added by using DOMAINSEQS. |
domainsse | Adds secondary structure records to a DCF file. |
scopparse | Reads raw SCOP classification files and writes a DCF file. |
ssematch | Searches a DCF file for secondary structure matches. The file must contain domain secondary structure information which can be added by using DOMAINSSE. |
The DOMALIGN programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.
Program name | Description |
---|---|
allversusall | Does an all-versus-all global alignment for each set of sequences in an input directory and writes files of sequence similarity values. |
domainrep | Reorder DCF file so that the representative structure of each user-specified node is given first. |
domainalign | Generates structure-based sequence alignments for nodes in a DCF file. |
seqalign | Reads a DAF file and a DHF and writes a DAF file extended with the hits. |
The DOMSEARCH programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.
Program name | Description |
---|---|
seqsearch | Generate DHF files of database hits (sequences) from a DAF file (or other file of sequences) by using PSI-BLAST. |
seqfraggle | Removes fragments from DHF files (or other files of sequences). |
seqsort | Reads DHF files of database hits (sequences) and removes hits of ambiguous classification. |
seqnr | Removes redundancy from DHF files (or other files of sequences). |
seqwords | Generates DHF files of database hits (sequences) from Swissprot matching keywords from a keywords file. |
The SIGNATURE programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.
Program name | Description |
---|---|
libgen | Generates various types of discriminator for each alignment in a directory. |
libscan | Generates hits (sequences in a domain hits file) from searches of various types of discriminator (HMMs, profiles etc) against a sequence database. Or generates hits from screening sequences against a library of such discriminators. |
matgen3d | Generates a 3D-1D scoring matrix from CCF files (clean coordinate files). |
rocon | Reads a DHF file of hits (sequences of unknown structural classification) and a DHF file of validation sequences (known classification) and writes a "hits file" for the hits, which are classified and rank-ordered on the basis of score. |
rocplot | A generic and flexible tool for interpretation and graphical display of the performance of predictive methods using receiver operating characteristic (ROC) analysis. |
siggen | Generates a sparse protein signature from an alignment and residue contact data. |
siggenlig | Generates ligand-binding signatures from a CON file (contacts file) of residue-ligand contacts. |
sigscan | Generates a DHF of hits (sequences) from scanning a signature against a sequence database. |
sigscanlig | Generates an LHF (ligand hits file) of hits (sequences) from scanning a sequence against a library of ligand-binding signatures. |
The STRUCTURE programs were developed by Jon Ison and colleagues for their protein domain research. They are included as an EMBASSY package as a beta version.
Program name | Description |
---|---|
contacts | Reads CCF files and writes CON files of intra-chain residue-residue contact data. |
domainer | Reads CCF files for proteins and writes CCF files for domains in a DCF file. |
hetparse | Converts raw dictionary of heterogen groups to EMBL-like format. |
interface | Reads protein CCF files and writes CON files of inter-chain residue-residue contact data. |
pdbparse | Parses PDB files and writes CCF files for proteins. |
pdbplus | Add records for residue solvent accessibility and secondary structure to a CCF file. |
pdbtosp | Convert raw swissprot:PDB equivalence file to EMBL-like format. |
sites | Reads CCF files and writes CON files of residue-ligand contact data for domains in a DCF file. |
The HMMEROLD programs are EMBOSS conversions of the programs in Sean Eddy's HMMER package, version 2.1.1.
The HMMEROLD versions of these programs all have the prefix "o" to distinguish them from the original programs.
Program name | Description |
---|---|
oalistat | Statistics for multiple alignment files |
ohmmalign | Align sequences with an HMM |
ohmmbuild | Build HMM |
ohmmcalibrate | Calibrate a hidden Markov model |
ohmmconvert | Convert between HMM formats |
ohmmemit | Extract HMM sequences |
ohmmfetch | Extract HMM from a database |
ohmmindex | Index an HMM database |
ohmmpfam | Align single sequence with an HMM |
ohmmsearch | Search sequence database with an HMM |
The HMMER programs are EMBOSS conversions of the programs in Sean Eddy's HMMER package, version 2.3.2.
The HMMER versions of these programs all have the prefix "e" to distinguish them from the original programs.
Program name | Description |
---|---|
ealistat | Statistics for multiple alignment files |
ehmmalign | Align sequences with an HMM |
ehmmbuild | Build HMM |
ehmmcalibrate | Calibrate a hidden Markov model |
ehmmconvert | Convert between HMM formats |
ehmmemit | Extract HMM sequences |
ehmmfetch | Extract HMM from a database |
ehmmindex | Index an HMM database |
ehmmpfam | Align single sequence with an HMM |
ehmmsearch | Search sequence database with an HMM |
These programs are adapted from the VIENNA RNA package.
This is currently under development, and is available only from the CVS server. We hope to make a beta release in the near future, but there is much work to be done on sequence formats and testing. The programs are listed in alphabetical order:
Program name | Author(s) | Description |
---|---|---|
vrnaalifold | Ivo Hofacker | RNA alignment folding |
vrnaalifoldpf | Ivo Hofacker | RNA alignment folding with partition |
vrnacofold | Ivo Hofacker | RNA cofolding |
vrnacofoldconc | Ivo Hofacker | RNA cofolding with concentrations |
vrnacofoldpf | Ivo Hofacker | RNA cofolding with partitioning |
vrnadistance | Ivo Hofacker | RNA distances |
vrnaduplex | Ivo Hofacker | RNA duplex calculation |
vrnaeval | Ivo Hofacker | RNA eval |
vrnaevalpair | Ivo Hofacker | RNA eval with cofold |
vrnafold | Ivo Hofacker | Calculate secondary structures of RNAs |
vrnafoldpf | Ivo Hofacker | Secondary structures of RNAs with partition |
vrnaheat | Ivo Hofacker | RNA melting |
vrnainverse | Ivo Hofacker | RNA sequences matching a structure |
vrnalfold | Ivo Hofacker | Calculate locally stable secondary structures of RNAs |
vrnapaln | Ivo Hofacker | RNA alignment |
vrnaplot | Ivo Hofacker | Plot vrnafold output |
vrnasubopt | Ivo Hofacker | Calculate RNA suboptimals |
Other EMBASSY packages with single applications. These are contributed single programs, or conversions of single programs.
Program name | Description |
---|---|
emnu | Simple menu of EMBOSS applications |
esim4 | Align an mRNA to a genomic DNA sequence |
meme | Motif detection |
mse | Conversion of Will Gilbert's MSE editor |
topo | Conversion of Susan Jean Johns' TOPO |
crystalball | Answers every drug discovery question you have about this sequence |
The EMBOSS applications are organized into logical groups according to their function. See the Application Groups Documentation for more information.
This is a list of areas requiring software development including putative new applications.