WebLogo is a web-based application designed to make the generation of sequence logos as easy and painless as possible.
A sequence logo is a graphical representation of an amino acid or nucleic acid multiple sequence alignment. Each logo consists of stacks of symbols, one stack for each position in the sequence. The overall height of a stack indicates the sequence conservation at that position, while the height of each symbol within the stack indicates the relative frequency of that amino or nucleic acid at that position. The width of a stack is proportional to the fraction of valid symbols at that position (positions with many gaps have thin stacks). In general, a sequence logo provides a richer and more precise description of, for example, a binding site than a consensus sequence does.
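To make the stack geometry concrete, here is a minimal Python sketch for a single alignment column. It assumes the simplest setting: a DNA alphabet, an equiprobable background and no small-sample correction, so the stack height is just the maximum entropy minus the observed entropy. The function name `column_stack` is hypothetical and is not part of WebLogo's API.

```python
import math
from collections import Counter

def column_stack(column, alphabet="ACGT"):
    """Return (width_fraction, stack_height_bits, symbol_heights) for one column.

    Simplified sketch: equiprobable background, no small-sample correction.
    """
    valid = [s for s in column.upper() if s in alphabet]
    # Stack width: fraction of the column occupied by valid symbols (gaps shrink it).
    width_fraction = len(valid) / len(column)
    if not valid:
        return width_fraction, 0.0, {}
    freqs = {s: n / len(valid) for s, n in Counter(valid).items()}
    # Stack height: maximum possible entropy minus observed entropy, in bits.
    entropy = -sum(p * math.log2(p) for p in freqs.values())
    height = math.log2(len(alphabet)) - entropy
    # Each symbol receives a share of the stack proportional to its frequency.
    heights = {s: p * height for s, p in freqs.items()}
    return width_fraction, height, heights

for col in ["AAAA-", "AAGT", "ACGT"]:
    print(col, column_stack(col))
```

A fully conserved column reaches the 2-bit maximum for DNA, while a column containing all four bases equally often has zero height.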
Crooks GE, Hon G, Chandonia JM, Brenner SE. 2004. WebLogo: A sequence logo generator. Genome Research 14:1188-1190.
Schneider TD, Stephens RM. 1990. Sequence Logos: A New Way to Display Consensus Sequences. Nucleic Acids Research 18:6097-6100.
Output format | Description
PNG (print) | Print-resolution bitmap (600 DPI)
PNG | Screen-resolution bitmap (low resolution, 96 DPI)
JPEG | Screen-resolution bitmap
EPS | Encapsulated PostScript
PDF | Portable Document Format
SVG | Scalable Vector Graphics
Logo size | Stack dimensions
small | Stacks 5.4 points wide (the character width of 9 pt Courier), aspect ratio 5:1
medium | Double the width and height of small
large | Triple the width and height of small
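The size table translates into stack dimensions with simple arithmetic. The Python sketch below assumes only the facts in the table (5.4 pt small stacks, 5:1 aspect ratio, 2x and 3x scaling) plus the standard 72 points per inch; it covers a single stack, not the whole image with its axes and margins. The names `stack_size` and `SIZE_SCALE` are illustrative.

```python
# Standard logo sizes: small stacks are 5.4 pt wide with a 5:1
# height-to-width aspect ratio; medium and large scale both dimensions by 2 and 3.
SMALL_STACK_WIDTH_PT = 5.4
ASPECT_RATIO = 5
SIZE_SCALE = {"small": 1, "medium": 2, "large": 3}

def stack_size(size, dpi=None):
    """Return (width, height) of one stack in points, or in pixels if dpi is given."""
    width = SMALL_STACK_WIDTH_PT * SIZE_SCALE[size]
    height = width * ASPECT_RATIO
    if dpi is not None:
        # 1 point = 1/72 inch, so pixels = points * dpi / 72.
        width, height = width * dpi / 72, height * dpi / 72
    return width, height

for size in SIZE_SCALE:
    w_pt, h_pt = stack_size(size)
    w_px, h_px = stack_size(size, dpi=600)
    print(f"{size}: {w_pt:.1f} x {h_pt:.1f} pt, {w_px:.0f} x {h_px:.0f} px at 600 DPI")
```

The medium size works out to a 10.8 pt stack width with aspect ratio 5, the same defaults listed for the command line options --stack-width and --aspect-ratio.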
Sequence type | Description
auto | Automatically guess the sequence type from the data
protein | Amino acid sequences
dna | DNA sequences
rna | RNA sequences
Units | Description
probability | Show residue probabilities rather than information content. If compositional adjustment is disabled, these are the raw residue frequencies.
bits | Information content in bits
nats | Natural units; 1 bit = ln 2 (≈ 0.69) nats
kT | Thermal energy units in natural units (numerically the same as nats)
kJ/mol | Thermal energy (assuming T = 300 K)
kcal/mol | Thermal energy (assuming T = 300 K)
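The unit conversions in this table follow from 1 bit = ln 2 nats and 1 kT = RT per mole. The Python sketch below assumes T = 300 K as stated, the CODATA gas constant and the thermochemical calorie (4.184 J); it treats energies as positive magnitudes and does not claim to reproduce WebLogo's internal constants or sign conventions. `convert_bits` is a hypothetical helper name.

```python
import math

R = 8.314462618e-3   # molar gas constant, kJ/(mol*K)
KJ_PER_KCAL = 4.184  # thermochemical calorie
T = 300.0            # temperature assumed in the units table, kelvin

def convert_bits(value_bits, unit):
    """Express an information content given in bits in another unit."""
    nats = value_bits * math.log(2)  # 1 bit = ln 2 nats
    converted = {
        "bits": value_bits,
        "nats": nats,
        "kT": nats,  # kT units are numerically the same as nats
        "kJ/mol": nats * R * T,
        "kcal/mol": nats * R * T / KJ_PER_KCAL,
    }
    return converted[unit]

for unit in ("bits", "nats", "kT", "kJ/mol", "kcal/mol"):
    print(f"2 bits = {convert_bits(2.0, unit):.3f} {unit}")
```

At 300 K, RT is roughly 2.49 kJ/mol, so one bit corresponds to about 1.73 kJ/mol or 0.41 kcal/mol.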
The composition option specifies the background composition of the genome or proteome from which the sequences were drawn. The default, automatic option uses an equiprobable background for nucleic acids and a typical amino acid usage pattern for proteins. Alternatively, you can explicitly set the expected CG content of nucleic acid sequences, insist on an equiprobable background distribution, or turn off compositional adjustment altogether.
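One way to turn these composition choices into a background distribution for DNA is sketched below in Python. The equal split of the CG fraction between C and G (and of the remainder between A and T) is an assumption made for illustration, as is the helper name `dna_background`; the explicit-dictionary form mirrors the example given in the command line help for --composition. The 'none' option (no adjustment) is not modeled.

```python
DNA = "ACGT"

def dna_background(composition):
    """Return a background distribution over A, C, G, T.

    composition: "equiprobable", a CG percentage such as 40, or a dict of
    relative weights such as {"A": 10, "C": 40, "G": 40, "T": 10}.
    """
    if composition == "equiprobable":
        return {s: 1 / len(DNA) for s in DNA}
    if isinstance(composition, (int, float)):
        cg = composition / 100
        # Assumption: C and G split the CG fraction equally; A and T split the rest.
        return {"A": (1 - cg) / 2, "C": cg / 2, "G": cg / 2, "T": (1 - cg) / 2}
    # Explicit relative weights are normalized to probabilities.
    total = sum(composition.values())
    return {s: composition.get(s, 0) / total for s in DNA}

print(dna_background(40))
print(dna_background({"A": 10, "C": 40, "G": 40, "T": 10}))
```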
Compositional adjustment has two effects. First, the information content of a site is defined as the relative entropy of the monomer distribution at that site with respect to the background distribution. Consequently, rare monomers have higher information content (when they occur) than relatively common monomers.
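The relative entropy here is I = Σ_a p_a log2(p_a / q_a), where p_a is the frequency of monomer a at the site and q_a its background probability. A minimal Python sketch, using a hypothetical 40% CG background to show why a conserved rare monomer scores higher than a conserved common one:

```python
import math

def information_content(freqs, background):
    """Relative entropy sum_a p_a * log2(p_a / q_a) of a site, in bits.

    Symbols with p_a == 0 contribute nothing to the sum.
    """
    return sum(p * math.log2(p / background[a]) for a, p in freqs.items() if p > 0)

# Hypothetical 40% CG background: C and G are the rarer monomers.
background = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
print(information_content({"G": 1.0}, background))  # conserved rare monomer
print(information_content({"A": 1.0}, background))  # conserved common monomer
```

A site that is always G scores log2(1/0.2) ≈ 2.32 bits, while a site that is always A scores only log2(1/0.3) ≈ 1.74 bits.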
Second, the background composition is used in the small-sample correction of the information content. Briefly, if only a few sequences are available in the multiple sequence alignment, sites typically appear more conserved than they really are: small samples bias the relative entropy upwards. To compensate, we add pseudocounts to the actual counts, proportional to the expected background composition. These pseudocounts smooth the data for small samples but become irrelevant for large samples. The proportionality constant is set to 4 for nucleic acid sequences and 20 for proteins (values that have been found to give reasonable results in practice).
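The pseudocount smoothing can be sketched as (n_a + w·q_a) / (N + w), where n_a is the observed count, N the total count, q_a the background probability and w the proportionality constant (4 for nucleic acids, 20 for proteins). This Python sketch is illustrative only; `smoothed_frequencies` is a hypothetical name, not WebLogo's internal function.

```python
def smoothed_frequencies(counts, background, weight):
    """Pseudocount-smoothed frequencies: (n_a + weight * q_a) / (N + weight).

    counts: observed symbol counts at one site
    background: background probabilities q_a (summing to 1)
    weight: total pseudocount mass (4 for nucleic acids, 20 for proteins)
    """
    total = sum(counts.values()) + weight
    return {a: (counts.get(a, 0) + weight * q) / total for a, q in background.items()}

uniform_dna = {s: 0.25 for s in "ACGT"}
for n in (3, 30, 300):
    freqs = smoothed_frequencies({"A": n}, uniform_dna, weight=4)
    print(n, round(freqs["A"], 3))
```

With three identical sequences the estimated frequency of A is only 4/7, so the site no longer looks perfectly conserved; with 300 sequences the pseudocounts barely matter.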
Behind the scenes, things are more complex. We perform a full Bayesian calculation: starting with explicit Dirichlet priors based on the background composition, we add the data and then calculate both the posterior mean relative entropy (the stack height) and Bayesian 95% confidence intervals for the error bars. These details will be explained elsewhere.
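Since the exact calculation is not described here, the following Python sketch is only a Monte Carlo stand-in for the same idea, not WebLogo's method. It assumes a Dirichlet prior with parameters w·q_a, adds the observed counts to get the posterior, samples site distributions from it (via normalized Gamma variates) and reports the sample mean and central 95% interval of the relative entropy. All names are hypothetical.

```python
import math
import random

def sample_dirichlet(alphas, rng):
    """One Dirichlet draw: normalize independent Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def relative_entropy(p, q):
    """Relative entropy in bits between two aligned probability lists."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def posterior_relative_entropy(counts, background, weight, n_samples=4000, seed=1):
    """Monte Carlo posterior mean and central 95% interval of relative entropy.

    Prior: Dirichlet(weight * q_a); posterior: Dirichlet(n_a + weight * q_a).
    """
    rng = random.Random(seed)
    symbols = list(background)
    alphas = [counts.get(a, 0) + weight * background[a] for a in symbols]
    q = [background[a] for a in symbols]
    samples = sorted(relative_entropy(sample_dirichlet(alphas, rng), q)
                     for _ in range(n_samples))
    mean = sum(samples) / n_samples
    lower = samples[int(0.025 * n_samples)]
    upper = samples[int(0.975 * n_samples) - 1]
    return mean, lower, upper

uniform_dna = {s: 0.25 for s in "ACGT"}
print(posterior_relative_entropy({"A": 10}, uniform_dna, weight=4))
```

The posterior mean plays the role of the stack height and the interval the role of the error bar; both tighten as more sequences are added.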
Color scheme "base pairing":
Description | Symbols | Color
2 Watson-Crick hydrogen bonds | TAU | dark orange
3 Watson-Crick hydrogen bonds | GC | blue
Nucleotide | Color
G | orange
TU | red
C | blue
A | green
Color scheme "hydrophobicity":
Property | Symbols | Color
Hydrophilic | RKDENQ | blue
Neutral | SGHTAP | green
Hydrophobic | YVMCLFIW | black
Color scheme "chemistry":
Property | Symbols | Color
Polar | GSTYC | green
Neutral | QN | purple
Basic | KRH | blue
Acidic | DE | red
Hydrophobic | AVLIPWFM | black
Color scheme "charge":
Charge | Symbols | Color
Positive | KRH | blue
Negative | DE | red
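Each color scheme amounts to an ordered list of (symbols, color, description) rules. The Python sketch below transcribes the chemistry and charge tables above into that form and looks up a symbol's color. The rule-list representation, the `symbol_color` helper and its fallback color are assumptions for illustration; this is not WebLogo's own color scheme class.

```python
# Color schemes as ordered (symbols, color, description) rules,
# transcribed from the chemistry and charge tables above.
CHEMISTRY = [
    ("GSTYC", "green", "polar"),
    ("QN", "purple", "neutral"),
    ("KRH", "blue", "basic"),
    ("DE", "red", "acidic"),
    ("AVLIPWFM", "black", "hydrophobic"),
]
CHARGE = [
    ("KRH", "blue", "positive"),
    ("DE", "red", "negative"),
]

def symbol_color(symbol, scheme, default="black"):
    """Color of the first rule whose symbol set contains this one-letter symbol."""
    for symbols, color, _description in scheme:
        if len(symbol) == 1 and symbol.upper() in symbols:
            return color
    return default  # hypothetical fallback for symbols no rule covers

print([symbol_color(s, CHEMISTRY) for s in "KDGQL"])
print([symbol_color(s, CHARGE, default="gray") for s in "KDGQL"])
```

On the command line, the same kind of rule is expressed with the documented --color COLOR SYMBOLS DESCRIPTION option, and --default-color sets the fallback.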
The command line client, weblogo, provides many more options and greater control over the final logo appearance.
WebLogo can be installed with pip:
pip install weblogo
Alternatively, weblogo and its dependencies can be installed manually. The WebLogo source code can be downloaded from WebLogo's GitHub repository. This code is distributed under various open source licenses; please consult the LICENSE.txt file in the source distribution for details.
After unpacking the WebLogo tarfile, you should be able to create logos immediately with the command line client (provided that Python, NumPy and Ghostscript are already installed):
./weblogo --format PNG < cap.fa > cap.png
Please consult the file build_examples.sh for more examples.
To run WebLogo as a standalone web service, run the logo server command:
./weblogo --serve
It should now be possible to access WebLogo at http://localhost:8080/.
Alternatively, place the weblogo/htdocs directory somewhere within the document root of your webserver. The webserver must be able to execute the CGI script create.cgi. For Apache, you may have to add the ExecCGI option and a CGI handler in the httpd.conf configuration file, something like this:
DocumentRoot "/home/ec2-user/weblogo/weblogo/htdocs"
# Further relax access to the default document root:
<Directory "/home/ec2-user/weblogo/weblogo/htdocs">
    Options Indexes FollowSymLinks ExecCGI MultiViews
    AddHandler cgi-script .cgi
    AllowOverride All
    Require all granted
</Directory>
It may also be necessary to set the PATH and PYTHONPATH environment variables, for example:
SetEnv PYTHONPATH /path/to/weblogo/libraries
The CGI script also has to be able to find the 'gs' Ghostscript executable.
The maximum size, in bytes, of uploaded sequence data can be controlled with the WEBLOGO_MAX_FILE_SIZE environment variable:
SetEnv WEBLOGO_MAX_FILE_SIZE 1000000
weblogo: The WebLogo Command Line Interface (CLI)
See the build_examples.sh script for inspiration.
Usage: weblogo [options] < sequence_data.fa > sequence_logo.eps

    Create sequence logos from biological sequence alignments.

Options:
  --version                     show program's version number and exit
  -h --help                     show this help message and exit

  Input/Output Options:
    -f --fin FILENAME           Sequence input file (default: stdin)
    --upload URL                Upload input file from URL
    -D --datatype FORMAT        Type of multiple sequence alignment or
                                position weight matrix file: (clustal,
                                fasta, msf, genbank, nbrf, nexus, phylip,
                                stockholm, intelligenetics, table, array,
                                transfac)
    -o --fout FILENAME          Output file (default: stdout)
    -F --format FORMAT          Format of output: eps (default), png,
                                png_print, pdf, jpeg, svg, logodata

  Logo Data Options:
    -A --sequence-type TYPE     The type of sequence data: 'protein', 'rna'
                                or 'dna'.
    -a --alphabet ALPHABET      The set of symbols to count, e.g. 'AGTC'.
                                All characters not in the alphabet are
                                ignored. If neither the alphabet nor
                                sequence-type are specified then weblogo
                                will examine the input data and make an
                                educated guess. See also --sequence-type,
                                --ignore-lower-case
    -U --units UNIT             A unit of entropy ('bits' (default), 'nats',
                                'digits'), or a unit of free energy ('kT',
                                'kJ/mol', 'kcal/mol'), or 'probability' for
                                probabilities
    --composition COMP.         The expected composition of the sequences:
                                'auto' (default), 'equiprobable', 'none' (do
                                not perform any compositional adjustment), a
                                CG percentage, a species name (e.g. 'E.
                                coli', 'H. sapiens'), or an explicit
                                distribution (e.g. "{'A':10, 'C':40,
                                'G':40, 'T':10}"). The automatic option uses
                                a typical distribution for proteins and
                                equiprobable distribution for everything
                                else.
    --weight NUMBER             The weight of prior data. Default depends on
                                alphabet length
    -i --first-index INDEX      Index of first position in sequence data
                                (default: 1)
    -l --lower INDEX            Lower bound of sequence to display
    -u --upper INDEX            Upper bound of sequence to display

  Transformations:
    Optional transformations of the sequence data.

    --ignore-lower-case         Disregard lower case letters and only count
                                upper case letters in sequences.
    --reverse                   reverse sequences
    --complement                complement nucleic sequences
    --revcomp                   reverse complement nucleic sequences

  Logo Format Options:
    These options control the format and display of the logo.

    -s --size LOGOSIZE          Specify a standard logo size (small, medium
                                (default), large)
    -n --stacks-per-line COUNT  Maximum number of logo stacks per logo line.
                                (default: 40)
    -t --title TEXT             Logo title text.
    --label TEXT                A figure label, e.g. '2a'
    -X --show-xaxis YES/NO      Display sequence numbers along x-axis?
                                (default: True)
    -x --xlabel TEXT            X-axis label
    --annotate TEXT             A comma separated list of custom stack
                                annotations, e.g. '1,3,4,5,6,7'. Annotation
                                list must be same length as sequences.
    --rotate-numbers YES/NO     Draw X-axis numbers with vertical
                                orientation (default: False).
    --number-interval NUMBER    Distance between numbers on X-axis
                                (default: 5)
    -S --yaxis NUMBER           Height of yaxis in units. (Default: Maximum
                                value with uninformative prior.)
    -Y --show-yaxis YES/NO      Display entropy scale along y-axis?
                                (default: True)
    -y --ylabel TEXT            Y-axis label (default depends on plot type
                                and units)
    -E --show-ends YES/NO       Label the ends of the sequence? (default:
                                False)
    -P --fineprint TEXT         The fine print (default: weblogo version)
    --ticmarks NUMBER           Distance between ticmarks (default: 1.0)
    --errorbars YES/NO          Display error bars? (default: True)
    --reverse-stacks YES/NO     Draw stacks with largest letters on top?
                                (default: True)

  Color Options:
    Colors can be specified using CSS2 syntax. e.g. 'red', '#FF0000', etc.

    -c --color-scheme SCHEME    Specify a standard color scheme (auto, base
                                pairing, charge, chemistry, classic,
                                hydrophobicity, monochrome)
    -C --color COLOR SYMBOLS DESCRIPTION
                                Specify symbol colors, e.g. --color black
                                AG 'Purine' --color red TC 'Pyrimidine'
    --default-color COLOR       Symbol color if not otherwise specified.

  Font Format Options:
    These options provide control over the font sizes and types.

    --fontsize POINTS           Regular text font size in points (default:
                                10)
    --title-fontsize POINTS     Title text font size in points (default: 12)
    --small-fontsize POINTS     Small text font size in points (default: 6)
    --number-fontsize POINTS    Axis numbers font size in points (default:
                                8)
    --text-font FONT            Specify font for labels (default: ArialMT)
    --logo-font FONT            Specify font for logo (default:
                                Arial-BoldMT)
    --title-font FONT           Specify font for title (default: ArialMT)

  Advanced Format Options:
    These options provide fine control over the display of the logo.

    -W --stack-width POINTS     Width of a logo stack (default: 10.8)
    --aspect-ratio POINTS       Ratio of stack height to width (default: 5)
    --box YES/NO                Draw boxes around symbols? (default: no)
    --resolution DPI            Bitmap resolution in dots per inch (DPI).
                                (Default: 96 DPI, except png_print, 600
                                DPI) Low resolution bitmaps (DPI<300) are
                                antialiased.
    --scale-width YES/NO        Scale the visible stack width by the
                                fraction of symbols in the column? (I.e.
                                columns with many gaps of unknowns are
                                narrow.) (Default: yes)
    --debug YES/NO              Output additional diagnostic information.
                                (Default: False)
    --errorbar-fraction NUMBER  Sets error bars display proportion
                                (default: 0.9)
    --errorbar-width-fraction NUMBER
                                Sets error bars width display proportion
                                (default: 0.25)
    --errorbar-gray NUMBER      Sets error bars' gray scale percentage
                                (default: 0.75)

  WebLogo Server:
    Run a standalone webserver on a local port.

    --serve                     Start a standalone WebLogo server for
                                creating sequence logos.
    --port PORT                 Listen to this local port. (Default: 8080)
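When logos are generated from a Python pipeline, one option is to call the command line client as a subprocess. The sketch below builds an argument list using only flags shown in the help text (--fin, --fout, --format, --units, --composition, --color-scheme, --title) and assumes the weblogo executable is on the PATH. The helper names `weblogo_command` and `run_weblogo` are hypothetical, not part of WebLogo.

```python
import shutil
import subprocess

def weblogo_command(fin, fout, fmt="png", units=None, composition=None,
                    color_scheme=None, title=None):
    """Assemble a weblogo argument list from options documented in the help text."""
    cmd = ["weblogo", "--fin", fin, "--fout", fout, "--format", fmt]
    optional = [("--units", units), ("--composition", composition),
                ("--color-scheme", color_scheme), ("--title", title)]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd

def run_weblogo(cmd):
    """Run the command, raising if the executable is missing or exits non-zero."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH (try: pip install weblogo)")
    subprocess.run(cmd, check=True)

cmd = weblogo_command("cap.fa", "cap.png", units="nats", color_scheme="classic")
print(" ".join(cmd))
```

Passing a list rather than a shell string to subprocess.run avoids quoting problems, for example with titles that contain spaces, and check=True turns a non-zero exit status into an exception.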
The development project is hosted at https://github.com/WebLogo/weblogo. If you wish to extend WebLogo or to contribute code, you should download the full source code development package directly from the GitHub repository.
> git clone https://github.com/WebLogo/weblogo
> cd weblogo
> pip install -e .
Please consult the developer notes, DEVELOPERS.txt, and the software license, LICENSE.txt.
Outstanding bugs and feature requests are listed on the WebLogo issue tracker.
WebLogo was created by Gavin E. Crooks, Liana Lareau, Gary Hon, John-Marc Chandonia and Steven E. Brenner. Many others have provided suggestions, bug fixes and moral support.
WebLogo was originally based upon the programs alpro and makelogo, both of which are part of Tom Schneider's delila package. Many thanks are due to him for making this software freely available and for encouraging its use.
While no permanent records are kept of submitted sequences, we cannot undertake to guarantee that data sent to WebLogo remains secure. Moreover, no guarantees whatsoever are provided about data generated by WebLogo.
Suggestions on how to improve WebLogo are heartily welcomed! Please direct questions to WebLogo's issue tracker.