VISA:
            An interactive program for the visual analysis of
            similarities in multiple amino acid sequences.

        CONCEPT
            VISA identifies amino acid patterns that are common to
            many members of a set of amino acid sequences, and dis-
            plays the distribution of common patterns along the se-
            quences in a series of histograms. Individual peaks of
            these histograms can be assigned different colors. Com-
            mon sequence patterns inherit and display the color of
            the peak in which they occur, leading to analogous seg-
            ments in the other sequences being marked in matching
            colors. These peaks usually correspond to the conserved
            sequence motifs that are characteristic of the studied
            proteins. The resulting color graphic overview of
            sequence similarities can help in understanding the
            architecture of the protein family and in designing
            experiments to probe function.

        METHOD
            When sequences of a set of related proteins are loaded
            into VISA, the program locates amino acid triplets
            (three specific residues, separated by two short, but
            fixed length runs of nonspecific residues) that are com-
            mon to a preset fraction of sequences. A table on the
            screen shows how many common triplets can be found with
            different triplet size limits and with different com-
            monality indices. A left and a right mouse click over an
            item in this matrix sets the density of common triplets
            for the subsequent analysis. The distribution of common
            elements along the sequences is displayed on the main
            canvas in a set of histograms, one histogram per se-
            quence. Horizontal axes represent sequences; bar heights
            are proportional to the number of common patterns that
            match the sequence at given positions. With a click of
            the mouse the user selects one of the available colors,
            and with additional clicks the user puts brackets around
            a peak on one of the histograms. Common triplets in the
            sequence segment corresponding to this bracketed peak
            are then automatically painted in the selected color. Bars of
            the histograms will be repainted, and contributions of
            colored common triplets will be indicated by partially
            colored bars. Assignment of further colors to previously
            unpainted peaks - and to common triplets - can be con-
            tinued as long as more colors are available and more
            significant uncolored peaks can be found. To help the
            localization and display of global similarities the
            colored image can be further manipulated (e.g. rescaled,
            background adjusted, aligned). Management tools that al-
            low loading sequences and patterns from files, present-
            ing sequence annotation data, printing alignment in-
            formation, displaying conserved blocks in one-letter
            amino acid code, showing common triplet sets of con-
            served blocks, etc. are also at the user's disposal.

        USE
            Actions of the program are invoked mostly by clicking
            with the mouse over software control items like buttons,
            checkboxes or menus. These control items are either on
            the top control panel of the main window, or in one of
            the subwindows. The subwindows are activated by buttons
            that reside in the main control panel. Standard XView
            manipulations (resizing, refreshing, hiding, scrolling,
            ...) are effective for the windows that are controlled
            from VISA.

            The set files button activates a window where file names
            for sequence, common pattern and output data have to be
            specified. The create and load checkboxes control
            whether an existing pattern file is to be used or a new
            pattern file is to be created. Press done when filenames
            and checkboxes are correctly set. Pressing the names
            button will bring up a window that displays some
            information (IDs, sequence sizes, accession numbers and
            definition lines) about the analyzed sequences.

            Some or all of the file names for input and output data
            can be supplied on the command line; the program will
            load these names into the appropriate window items. The
            names may be given in any order. Each name is preceded
            by one of the flags -s, -p or -o (with no blank between
            flag and name), which specifies its interpretation.
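
            The flag convention can be illustrated with a short
            Python sketch. This is not VISA code. The mapping of
            -s, -p and -o to sequence, pattern and output files is
            an assumption based on the three file types named
            above, and the function name and the file names are
            hypothetical.

```python
def parse_visa_args(argv):
    """Split arguments such as '-sxyl.seq' into a role -> file name mapping."""
    # Assumed meaning of the flags; the manual does not spell it out.
    roles = {"-s": "sequence", "-p": "pattern", "-o": "output"}
    files = {}
    for arg in argv:
        # The flag is the first two characters; the name follows with no blank.
        flag, name = arg[:2], arg[2:]
        if flag not in roles or not name:
            raise ValueError(f"unrecognized argument: {arg!r}")
        files[roles[flag]] = name
    return files


# The order of the arguments does not matter.
example = parse_visa_args(["-oxyl.out", "-sxyl.seq", "-pxyl.pat"])
```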

            A new subwindow with a matrix of integers appears, ei-
            ther when new sequence and common pattern files are
            loaded or when the span/index button is clicked. Matrix
            element k at the intersection of row i and column j
            shows how many distinct triplets that are no longer
            than i residues occur in at least j sequences. One can
            set the span and commonality index parameters by click-
            ing the left mouse key (selection) then the right mouse
            key (confirmation) over the corresponding matrix ele-
            ment. The distribution of this selected set of k common
            patterns along the sequences will be displayed on the
            main canvas.
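
            As an illustration of how the entries of such a matrix
            could be computed, here is a minimal Python sketch. It
            is not the code of VISA or gcgpat1. The function names,
            the tuple representation of a triplet, and the
            assumption that the length of a triplet is counted from
            its first to its last specific residue are made up for
            this example.

```python
from collections import defaultdict


def find_triplets(sequences, max_span):
    """Map each triplet (a, gap1, b, gap2, c) to the set of sequences containing it."""
    owners = defaultdict(set)
    for idx, seq in enumerate(sequences):
        n = len(seq)
        for p in range(n):
            # q and r are the positions of the second and third specific residue;
            # the whole triplet (r - p + 1 residues) must fit into max_span.
            for q in range(p + 1, min(p + max_span - 1, n)):
                for r in range(q + 1, min(p + max_span, n)):
                    key = (seq[p], q - p - 1, seq[q], r - q - 1, seq[r])
                    owners[key].add(idx)
    return owners


def span_index_matrix(sequences, spans, indices):
    """Count distinct triplets no longer than i that occur in at least j sequences."""
    owners = find_triplets(sequences, max(spans))
    matrix = {}
    for i in spans:
        for j in indices:
            matrix[(i, j)] = sum(
                1
                for (a, g1, b, g2, c), seqs in owners.items()
                if g1 + g2 + 3 <= i and len(seqs) >= j
            )
    return matrix


# Toy input, far smaller than a real protein family.
matrix = span_index_matrix(["ACD", "ACD", "AXCD"], spans=[3, 4], indices=[1, 2, 3])
```

            The brute-force enumeration is adequate only for short
            demonstration sequences.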

            One of the eight differently colored chips of the
            control panel has a shaded background. This is the
            active color, which is used in the following bracketing,
            motif displaying and zooming operations. The active
            color is selected by a mouse click on a chip.

            When the pointer moves into the main canvas, the arrow
            changes to a crosshair. When the crosshair is placed on
            one of the sequence-representing horizontal lines and
            the left (right) mouse button is clicked, a left (right)
            bracket is deposited on the line. Common triplets that
            occur between the brackets will be assigned the active
            color. Click on the redisplay button, and the histograms
            get repainted. Some of the previously black vertical
            bars become partly or completely colored, depending on
            how many common triplets from the selected segment match
            the sequence at this position. After changing the active
            color, a second (third, etc.) peak can be bracketed, and
            additional common patterns can be assigned the new
            color. Clicks on the redisplay button update the color-
            ing of the histograms. Press the decolor button to
            delete all color assignments and to restart the coloring
            process.

            Frequently, corresponding sequence segments of
            homologous proteins will line up on the screen only if
            we offset the sequences and introduce appropriate gaps
            into them. Activating an item from the align menu will
            instruct VISA to do this. We can choose either simple
            alignment on a single color, or full alignment on all
            colors. In single color alignments all sequences are
            aligned to the sequence that has the bracket with the
            selected color (anchor sequence). Offset for a sequence
            is calculated so that the number of colored triplets
            matching the anchor sequence is maximized.  When the
            full alignment item is selected, the dominant peaks
            (peaks with most matches to the corresponding peak in
            the anchor sequence) are determined for all the colors
            and for all the sequences first. Next the dominant order
            of peaks (the order with the highest number of common
            triplets) is determined, then a longest path algorithm
            chooses which peaks can be included in the alignment.
            Appropriate gaps are then inserted into the sequences,
            and the resulting alignment is displayed. Horizontal
            rescaling might be necessary when changing to a display
            of aligned sequences, because offsets and gaps have to
            be accommodated. This rescaling is done automatically.
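
            The offset calculation of the single color alignment
            can be sketched as follows. This is an illustration
            under stated assumptions, not the routine used by VISA:
            the colored triplets are assumed to be given as
            dictionaries that map a triplet to its match positions
            in the anchor sequence and in the other sequence, and
            ties are broken in favor of the smaller shift. The
            function name and the triplet labels are hypothetical.

```python
from collections import Counter


def best_offset(anchor_hits, other_hits):
    """Return (offset, matches) for the shift that lines up the most colored triplets."""
    votes = Counter()
    for triplet, anchor_positions in anchor_hits.items():
        for a in anchor_positions:
            for o in other_hits.get(triplet, ()):
                # Shifting the other sequence by (a - o) superimposes this pair.
                votes[a - o] += 1
    if not votes:
        return 0, 0
    # Most coinciding triplets first, smaller absolute shift on ties (assumption).
    return max(votes.items(), key=lambda kv: (kv[1], -abs(kv[0])))


anchor_hits = {"t1": [10], "t2": [12], "t3": [15]}
other_hits = {"t1": [4], "t2": [6, 20], "t3": [30]}
offset, matches = best_offset(anchor_hits, other_hits)
```

            In this toy input a shift of 6 positions superimposes
            two of the three colored triplets, so that shift would
            be chosen.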

            Conserved sequence blocks at the center of single color
            alignments can be displayed with the zoom button.
            Sequence data will be displayed in one-letter amino acid
            code on the right side of the zoom window. Common
            triplets matching the colored triplets of the anchor
            sequence will be painted in the active color. A frame
            indicates the positions of brackets in the anchor
            sequence. Positions in the other sequences that align
            with the left-bracket residue of the anchor sequence are
            shown on the left, together with sequence identifiers.

            Pressing the motif button performs an implicit alignment
            on the active color, then brings up a new window. This
            window contains a list of all common triplets that match
            any sequence in the conserved block of the alignment.
            The triplets are offset according to their positions in
            this block. Each triplet is preceded by a number that
            indicates how many sequences the triplet matches with
            this offset.

            Pressing the print button will dump information about
            positions and sizes of active windows into auxiliary
            output files. Stand-alone shell scripts need these data
            for creating hard copies from the screen.

            A secondary control panel opens up when the options
            button is pressed. Two items here modify assignments of
            colors to triplets, in two different ways. When
            overpaint is checked, a new assignment will overwrite
            earlier ones; otherwise attempts to assign a color to a
            triplet that already has a color assigned will be
            rejected. With the expand paint box checked, earlier
            assignments remain in effect, and (depending on the
            setting of overpaint) repeated bracketing assigns the
            active color to additional triplets. When the box is
            unchecked, repeated bracketing with the same color will
            erase earlier assignments of that color, and only the
            latest bracket will determine which triplets receive the
            active color. When the suppress black box is checked,
            the paint routine is instructed not to show uncolored,
            black parts of the histogram. When the flip fg/bg box is
            checked, the background on the main canvas turns black,
            making light shades more visible. Two items in the
            options panel, horizontal scale and vertical scale,
            override the automatic histogram scaling. Five different
            sets of colors can be used in the analysis; the active
            palette is set by the palette selection item. The ignore
            threshold item has a role in sequence alignments. When a
            sequence has very little resemblance to the other
            sequences (its score is under t percent of the average
            resemblance of sequences to the whole block, where t is
            the actual value of the threshold parameter), it is
            omitted from the alignment.
            Sizes of sequences, and distances between peaks can be
            estimated by the use of the ruler subwindow. Scales in
            this subwindow are adjusted automatically.

        INPUT
            Sequence data should be presented in a multisequence
            file in the GCG dataset format. Every sequence in the
            file is introduced by four consecutive chevron marks
            (">>>>") followed by an identifier string. The next line
            contains the description of the sequence; additional
            lines (lengths up to 511 characters) contain sequence
            data.
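
            A file laid out this way could be read with a few lines
            of Python. The sketch below follows only the layout
            given in this section. It assumes that the rest of the
            ">>>>" line is the identifier, and it does not handle
            any further details of the GCG format. The function
            name and the sample records are hypothetical.

```python
def parse_dataset(text):
    """Parse '>>>>' records into (identifier, description, sequence) tuples."""
    records = []
    current = None
    for line in text.splitlines():
        if line.startswith(">>>>"):
            # Header line: four chevrons followed by the identifier.
            current = {"id": line[4:].strip(), "description": None, "sequence": []}
            records.append(current)
        elif current is None:
            continue  # ignore anything before the first record
        elif current["description"] is None:
            # The first line after the header is the description.
            current["description"] = line.strip()
        else:
            # All further lines are sequence data; drop embedded blanks.
            current["sequence"].append("".join(line.split()))
    return [
        (r["id"], r["description"] or "", "".join(r["sequence"]))
        for r in records
    ]


sample = (
    ">>>>SEQ_A\nfirst demo protein\nMKT LLV\nAAG\n"
    ">>>>SEQ_B\nsecond demo protein\nMKV\n"
)
records = parse_dataset(sample)
```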

            Common pattern data can be created (and saved) from
            within VISA, or read from a pre-prepared data file. To
            create this data file off-line, use gcgpat1, the same
            program that would be invoked from VISA.

        OUTPUT
            The primary output of VISA is on your screen. The top
            panel of the main window contains control tools (but-
            tons, menus). Under this control panel is the main
            canvas. Horizontal lines represent the sequences that
            are being analyzed. Lengths of lines are proportional to
            sequence lengths and can be measured by a software
            ruler. A zoom window displays aligned sequence blocks; a
            motif window shows which common triplets occur in
            aligned sequence blocks. An annotation window is used to
            display some information about the analyzed sequences.

            There are some provisions for making hard copies from
            the screen.  VISA writes data about window sizes and
            positions into auxiliary output files (visa.corners,
            zoom.corners, motif.corners). Shell scripts
            (visadump.com, zoomdump.com and motifdump.com) use these
            data and several shareware utilities to dump parts of
            the screen and convert screendumps into Portable Pixel
            Map or Postscript standard files. The names of the
            target output files are specified as command line argu-
            ments for the appropriate script.

        EXAMPLE
            Start VISA by typing "visa" at your prompt.
            Press the PROCEED button in the welcome window.
            Click on the two LOAD checkboxes, then on the DONE but-
                ton in the "read sequence & pattern file" subwindow
                and load the demonstration set of xylanase sequences
                and their common patterns.
            Move the cursor over the number 503 (row of span 11,
                column of index 8) in the span/index window, and
                click left. The number 503 gets framed; this is the
                number of triplets that are not longer than 11
                residues and occur in at least 8 of the xylanase
                sequences. Click the right mouse button to confirm
                the selection.
                Histograms appear on the main canvas, showing the
                distribution of common triplets along the sequences.
            Scroll the main canvas in both directions to see all the
                sequences.
            Move the crosshair to the line of GUX_CELFI, to the left
                edge of the main peak, and click left. A small red
                left bracket should appear on the axis.
            Move the crosshair to the right edge of this peak, about
                1 cm right from the left bracket, and click on the
                right mouse button. A right bracket appears, and
                triplets in the enclosed interval are assigned the
                red color.
            Click on the REDISPLAY button. Wherever the selected
                triplets occur in the sequences, they get colored
                red. You should see red peaks on the histograms.
            Click on MOTIF. A new subwindow shows the selected com-
                mon triplets. This motif subwindow can be removed
                using the window controlling pin (upper left
                corner).
            Click on ZOOM to see how the sequences align on these
                red peaks. You should use the horizontal scrollbar
                to center aligned block into the middle of the text
                window. The limits of the aligned block are
                determined by the color assigning brackets of the
                histogram. This zoom subwindow can be removed by the
                standard XView method (select QUIT from the frame
                menu).
            Click on the green square of the control panel, and
                select a second, uncolored peak by moving the
                crosshair and clicking the left, then the right
                mouse button. Repeat this step: select the blue chip
                and bracket a third peak.
            Click on REDISPLAY to show how the selected triplets
                cluster in the sequences. Use MOTIF and ZOOM to dis-
                play the triplets and the alignments corresponding
                to the active color (the one selected in the control
                panel).
            Push the right button over the ALIGN menu and select the
                red item of the pop-up menu. The colored histograms
                will be shifted so that their red peaks come into
                alignment. The horizontal scales get automatically
                changed.
            Select the ALL item of the ALIGN menu. Colored
                histograms will be shifted and gaps will be inserted
                into them so that their peaks come into alignment,
                if possible. Selection of NONE will remove all gaps
                and leading offsets.
            A click on DECOLOR deletes all color selections.
            Press SPAN/INDEX and select a different set of triplets.
            A click on RULER brings up a small subwindow with scal-
                ing. Drag this over your sequence to measure sizes
                and distances.
            A click on OPTIONS activates a control panel. A checked
                FLIP box inverts the screen fg/bg colors. A checked
                SUPPRESS BLACK box will instruct the display routine
                not to display uncolored triplets. Experiment with
                HORIZONTAL SCALE, VERTICAL SCALE and PALETTE SELEC-
                TION items.
            A click on PRINT writes the coordinates of the main,
                zoom and motif windows into auxiliary files.
            To create a Postscript file (named e.g. xylanase.ps)
                with the image of the main window of VISA, run the
                command "visadump.com xylanase.ps".  This command
                has to be issued from a different window.  Do not
                move, resize or hide your VISA windows between your
                last PRINT and the completion of the visadump.com
                script!
            Press QUIT to end the session and to leave VISA.

        REMARKS

            Bar heights of the histograms are calculated by counting
            the number of common triplets that match the position
            under consideration. One matching triplet contributes to
            three bars, with one for each of its three specific
            residues.
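
            The counting rule can be written down as a short Python
            sketch. It is an illustration only, not the display
            code of VISA. The function name and the representation
            of a triplet as a tuple (residue, gap, residue, gap,
            residue) are assumptions made for this example.

```python
def histogram(seq, triplets):
    """Return one bar height per sequence position."""
    bars = [0] * len(seq)
    for a, g1, b, g2, c in triplets:
        second = g1 + 1        # distance from the first to the second residue
        third = g1 + g2 + 2    # distance from the first to the third residue
        for p in range(len(seq) - third):
            if seq[p] == a and seq[p + second] == b and seq[p + third] == c:
                # One matching triplet raises three bars, one per specific residue.
                for pos in (p, p + second, p + third):
                    bars[pos] += 1
    return bars


# Two toy triplets: A-C-D without gaps, and A.DA with one nonspecific residue.
bars = histogram("ACDACD", [("A", 0, "C", 0, "D"), ("A", 1, "D", 0, "A")])
```

            Positions covered only by the gaps of a triplet receive
            no contribution from it, which is why bars inside a
            peak can differ in height.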

            Single color alignment and full alignment may result in
            different conserved blocks if the order of colored peaks
            is not the same in all of the sequences.

            Direct hard copy generation is intentionally not built
            into the program. Different users may have different
            hardware for this purpose, which would require different
            file formats. Also, dumps of color images may produce
            huge output files, and overly easy access to screendumps
            may result in quickly filling disks.

        ADDITIONAL TOOLS
            gcgpat1: to create common pattern data file.
            visadump.com (zoomdump.com, motifdump.com): makes a
                screendump of the primary (zoom, motif) canvas and
                converts this dump into a printable Postscript file.

        DISTRIBUTION by anonymous FTP
            Type "ftp vent.neb.com".
            Log in as "anonymous".
            Use your e-mail address at the password prompt.
            Type "cd /pub/software/visa" to move to the appropriate
                directory.
            To get the files type "mget *.*".
            Type "quit"  to leave ftp.
            If you have no Internet connection, send a request for
                the program to the address shown below.

        INSTALLATION
            The program runs on Sun workstations that use XView
            software. Detailed instructions are in the install.man
            file of the distribution kit.

        REFERENCE
            J. Posfai, Z. Szaraz and R.J. Roberts, "VISA: Visual
            Sequence Analysis for the comparison of multiple amino
            acid sequences", Comp. Appl. Biosci. 10 (1994),
            pp. 537-544.

        CONTACTS
            With your comments, questions or suggestions please con-
            tact:

            	Richard J. Roberts (or Janos Posfai)
            	New England Biolabs
            	32 Tozer Rd.,
            	Beverly, MA 01915.
            	phone:  508-927-5054
            	e-mail: roberts@neb.com (posfai@neb.com).