name
   makelogo: make a graphical `sequence logo' for aligned sequences

synopsis
   makelogo(symvec: in, makelogop: in, colors: in, marks: in,
            logo: out, output: out)

files
   symvec:  A "symbol vector" file from the alpro or dalvec program.

      If the file is empty, the alphabet is printed.  This allows one to
      determine the correction factors described below.

      If the error bars have a negative size, they are not displayed.
      This allows the sites program to control the display when it would
      not be appropriate.

      If the number of a symbol is negative in symvec, then the symbol
      is rotated 180 degrees before being printed.  The absolute value
      is used by makelogo to determine the height.  This allows
      statistical tests that find rare symbols to be significant to show
      that a symbol is rare by printing it upside down.  Notice that
      ACGT are all easy to distinguish from their upside-down versions,
      but unfortunately this is not always true for protein sequences.

   makelogop:  parameters to control the program.

      line 1: the lowest to highest range of the binding site over which
         to make the logo graph (FROM to TO range).

      line 2: bar: sequence coordinate before which to print a vertical
         bar.  NOTE: the vertical bar takes up a small amount of
         horizontal space.  This offsets the logo from that point on by
         a tiny amount.

      line 3: xcorner and ycorner: the coordinates of the lower left
         hand corner of the logo (in cm).  These should be real numbers.

      line 4: rotation: angle in degrees to rotate the logo.  Warning:
         rotations other than multiples of 90 degrees may produce
         incorrect logos because character scaling depends on the
         orientation of the characters.  (Essentially, it's a design
         fault of PostScript.)

      line 5: charwidth: (real, > 0) the width of the logo characters,
         in cm.

      line 6: barheight barwidth: (real, > 0) height of the vertical
         bar, in cm, and its width, in cm.

      line 7: barbits: (real) The height of the vertical bar, in bits,
         is given by the absolute value of barbits.
         If barbits is positive, an "I-beam" will appear at the top of
         the symbol stack.  The I-beam indicates one standard deviation
         of the stack height, based entirely on how small the sample of
         sequences is.  If the value of barbits is negative, the I-beam
         is not displayed.  Not knowing how big the sampling effects are
         can fool one, so one should usually have the I-beam, even if it
         is ugly.

         WARNING: it is not known how to calculate the error for data
         derived from a dirty DNA synthesis experiment (see
         Schneider1989, reference given below).  In that case the error
         could be calculated (in program sites) from the number of
         sequences, so that the error bar would be an underestimate of
         the variation.  Unfortunately, when I tried this, people
         interpreted the error bar as the size they saw, so this does
         not work well visually.  Therefore when data come from the
         sites program, the I-beam is suppressed.

         The combination of barheight and barbits determines the size of
         the logo in bits per centimeter.  Both must be specified even
         if no vertical bar is desired.

      line 8: barends: if the first character on the line is a 'b', then
         bars are put before and after each line, in addition to the
         other bar.  The first bar on each line is labeled with tic
         marks and the number of bits.  If you don't want this, you can
         remove the call to maketic in the logo.

      line 9: showingbox: if the first character on the line is an 's',
         then show a dashed box around each character.

      line 10: outline: If the first character is 'o' then the
         characters show up in outline form.  Otherwise, they are solid.

      line 11: caps: if the first letter is 'c' then alphabetic
         characters are converted to capital form.

      line 12: stacksperline: number of character stacks per line output

      line 13: linesperpage: number of lines per page output

      line 14: linemove: line separation relative to the barheight

      line 15: numbering: if the first letter is 'n' then each stack is
         numbered.  Otherwise, the number is suppressed as a PostScript
         comment.
         This allows you to modify the logo file by hand to reinstate
         numbering for only the positions you want by removing the
         percent (%) symbol from in front of the calls to makenumber.

      line 16: shrinking: (real) Factor by which to shrink the
         characters.  If shrinking <= 0 or shrinking >= 1 then the
         characters exactly fit into the dashed box.  If shrinking > 0
         and shrinking < 1, the characters are shrunk inside the dashed
         box.  To use this feature, the parameter showingbox must be on,
         so that the user does not create a logo whose height is
         misleading.

      line 17: strings: the number of user defined strings to follow.
         Each string definition takes up two lines.  The first is the
         (x,y) coordinate of the string, the second is the string
         itself.  The coordinates are in centimeters relative to the
         coordinate transforms performed above.  (This way, the title
         position stays the same relative to the logo.)

      line 18: (x,y,s) coordinates of first user defined string (if
         strings >= 1) followed by the factor by which to scale the
         string.  A factor of 1 means no scaling.  In addition, if the x
         coordinate is negative, then the string is centered by using
         the string width, the stacksperline and charwidth.

      line 19: the first user defined string (if strings >= 1)

      line 20: (x,y,s) coordinates of second user defined string (if
         strings >= 2)

      line 21: the second user defined string (if strings >= 2)

      (etc. for the remaining strings.)

      The remainder of the file is ignored and may contain comments.

   colors:  Defines the color of each character printed.

      Any number of lines that begin with an asterisk [*] can be used as
      comments to identify the file or portions of the file.

      Put into the file one line for each character that is to have a
      color other than black.  The line must contain:

         character red green blue

      The last three parameters are real values between 0 and 1
      (inclusive).  The values depend on the PostScript interpreter, but
      0 means black and a value of 1 means the brightest.
      To assign the asterisk a color, precede it with a backslash
      [as \*].  To assign the backslash a color, precede it with a
      backslash [as \\].

      If the file is empty, the logo is made in black and white and the
      lower half of the I-beam error bar is made white so that when it
      is inside the letters it is visible.

   marks:  an empty file means no marks are made.  Otherwise, a series
      of lines containing four pieces of data that define marks to be
      placed over the output:

         mark: o means open circle, b means filled circle.
         base coordinate: a real number that determines the center of
            the mark.
         bits coordinate: a real number that determines the position of
            the mark in bits.
         scale: a positive real number by which to scale the mark.

      The symbols must be in increasing order of position in the site.

   logo:  the output file, a PostScript program to display the logo.

   output:  messages to the user

description
   The makelogo program generates a `sequence logo' for a set of aligned
   sequences.  A full description is in the documentation paper.  The
   input is a `symvec', or symbol vector, that contains the information
   at each position and the numbers of each symbol.  The output is in
   the graphics language PostScript.

   The program now indicates the small sample error in the logo by a
   small 'I-beam' overlaid on the top of the logo.  Although the user
   may turn this off to make pretty logos, I strongly recommend use of
   it to avoid being fooled by small amounts of data.

author
   Thomas D. Schneider
   National Cancer Institute
   Laboratory of Mathematical Biology
   NCI/FCRDC Bldg 469. Room 144
   P.O. Box B
   Frederick, MD  21702-1201
   (301) 846-5581 (-5532 for messages)
   network address: toms@ncifcrf.gov

examples
   makelogop parameters:

-15 2       FROM to TO range to make the logo over
1           sequence coordinate before which to put a bar on the logo
15 2        (xcorner, ycorner) lower left hand corner of the logo (in cm)
90          rotation: angle to rotate the graph
1.0         charwidth: (real, > 0) the width of the logo characters, in cm
10 0.1      barheight, barwidth: (real, > 0) height of vertical bar, in cm
2           barbits: (real) height of the vertical bar, in bits; < 0: no I-beam
no bars     barends: if 'b' put bars before and after each line
show        showingbox: if 's' show a dashed box around each character
no outline  outline: if 'o' make each character as an outline
100         stacksperline: number of character stacks per line output
1           linesperpage: number of lines per page output
1.1         linemove: line separation relative to the barheight
numbers     numbering: if the first letter is 'n' then each stack is numbered
1           shrinking: factor by which to shrink characters inside dashed box
2           strings: the number of user defined strings to follow
2 14 1      coordinates of the first string (in cm)
First TITLE
3 13 1      coordinates of the second string (in cm)
SECOND TITLE

   colors:

* Color scheme for logos of DNA (for the makelogo program).
* color order is red-green-blue
*
* green:
A 0 1 0
a 0 1 0
*
* blue:
C 0 0 1
c 0 0 1
*
* red:
T 1 0 0
t 1 0 0
*
* orange:
G 1 0.7 0
g 1 0.7 0

   A test symvec is provided with the program, file 'symvec.demo', to be
   run with 'colors.demo' and 'makelogop.demo'.

documentation
   Description of Logos:

   @article{Schneider.Stephens.Logo,
   author = "T. D. Schneider and R. M. Stephens",
   title = "Sequence Logos: A New Way to Display Consensus Sequences",
   journal = "Nucl.
              Acids Res.",
   volume = "18",
   pages = "6097-6100",
   year = "1990"}

   The Blue Book:

   @book{PostScriptTutorial1985,
   author = "{Adobe Systems Incorporated}",
   title = "PostScript Language Tutorial and Cookbook",
   publisher = "Addison-Wesley Publishing Company",
   address = "Reading, Massachusetts",
   callnumber = "QA76.73.P67P68",
   isbn = "0-201-10179-3",
   year = "1985"}

   The Red Book:

   @book{PostScriptManual1985,
   author = "{Adobe Systems Incorporated}",
   title = "PostScript Language Reference Manual",
   publisher = "Addison-Wesley Publishing Company",
   address = "Reading, Massachusetts",
   callnumber = "QA76.73.P67P67",
   isbn = "0-201-10174-2",
   year = "1985"}

   Dirty DNA synthesis experiments:

   @article{Schneider1989,
   author = "T. D. Schneider and G. D. Stormo",
   title = "Excess Information at Bacteriophage {T7} Genomic Promoters
            Detected by a Random Cloning Technique",
   year = "1989",
   journal = "Nucl. Acids Res.",
   volume = "17",
   pages = "659-674"}

see also
   rsgra.p, rseq.p, dalvec.p, alpro.p, sites.p

bugs
   Some chi-logos (upside-down characters) do not display on
   OpenWindows, but do print ok on the Apple LaserWriter IIntx.  The
   reason is completely obscure.

   A bug in NeWS 1.1 is that characters that are scaled too small are
   forced to be big.  This messes up the logo and can be confusing.
   Another bug in NeWS 1.1 prevents one from using the outline, but the
   dashed boxes will show up.  Sometimes displaying a logo in NeWS 1.1
   on a Sun 4 will cause an 'illegal instruction', after which one is
   thrown completely off the computer.  The source of this is not known,
   since it is not repeatable.  The first two bugs are resolved under
   OpenWindows 2; the third has not been observed.  These NeWS bugs do
   not apply to the Apple LaserWriter IIntx, which prints everything
   correctly.

technical notes
   Unfortunately PostScript fonts are not exactly the same height.  Thus
   if A and T are the standard, then C and G hang above and below the
   line.  This has been solved in this version of makelogo.
   As a consequence, the user never needs to determine any character
   sizes empirically, and the logos should work on any PostScript
   printer.  Special thanks go to the following people for their help
   in solving this problem:

   Kevin Andresen [kevina@apple.com]

      "The problem facing you is that, while the PostScript language is
      more or less standard, the font shapes depend on the designer,
      type vendor, or language implementation.  The fonts used in NeWS
      are not exactly the same as those from Adobe, which are not the
      same as those from Bitstream, which are not the same as the
      original lead type, etc.  (This is an industry-wide issue.)  One
      way to compensate for this in PostScript is to use the charpath
      and pathbbox operators and scale appropriately."

   He provided a program, which I then rewrote and generalized.  That
   version almost worked, but not quite.  This was solved by:

   finlay@Eng.Sun.COM (John Finlay) who said:

      "It would appear that the calculation of the pathbbox for
      characters varies with the scale of the characters (I don't know
      why exactly but would speculate that there's probably some
      weirdness with the font hints and scaling).  I modified your
      postscript to iterate once on the size and recalculate the
      pathbbox at the scaled size.  Seems to printout OK (inside the
      boxes) on a LWI, LWII and in NeWS2.0 (though NeWS still seems to
      get the wide slightly wrong)."

   shiva@well.sf.ca.us (Kenneth Porter) was also involved and actively
   interested.  My apologies if I have forgotten someone else who
   contributed.

   The letter I and the vertical bar (|) are treated specially since in
   the Helvetica-Bold font they are rectangles and would completely fill
   the character space.  In addition, the letter I is centered by
   makelogo.

   Thanks go to Joe Mack for suggesting numbering and titles (strings)
   and to Pete Lemkin and Wojciech Kasprzak for pointing out that the
   shrink option would be helpful.

   Thanks to Jeff Haemer for pointing out that the PostScript program
   should begin with '%!', and for suggesting that the string fonts
   should be different from the logos themselves.
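
   As an illustration of the colors file format described under
   `files' (comment lines starting with an asterisk, one line per
   character giving red, green and blue values between 0 and 1, and a
   backslash in front of an asterisk or backslash), the following is a
   minimal sketch in Python of how such a file might be read.  It is
   not part of makelogo.  The function name parse_colors is
   hypothetical, and two details are assumptions because the format
   description does not state them: blank lines are skipped, and the
   character is taken to be the first character on the line.

```python
def parse_colors(text):
    """Return {character: (red, green, blue)} from colors-file text.

    Hypothetical helper for illustration only; not makelogo's own code.
    """
    colors = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # assumption: blank lines are ignored
        if line.startswith("*"):
            continue  # comment line, as described for the colors file
        if line[:2] in ("\\*", "\\\\"):
            # Escaped asterisk or backslash: the character follows the "\".
            char, rest = line[1], line[2:]
        else:
            # Assumption: the character is the first one on the line.
            char, rest = line[0], line[1:]
        fields = rest.split()
        if len(fields) < 3:
            raise ValueError("expected 'character red green blue': %r" % line)
        rgb = tuple(float(value) for value in fields[:3])
        for value in rgb:
            if not 0.0 <= value <= 1.0:
                raise ValueError("color value outside 0..1: %r" % line)
        colors[char] = rgb
    return colors


demo = "\n".join([
    "* green:",
    "A 0 1 0",
    "* orange:",
    "G 1 0.7 0",
    "\\* 0 0 1",
    "\\\\ 1 0 0",
])
table = parse_colors(demo)
```

   Characters that do not appear in the resulting table would be
   printed in black, following the description of the colors file.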