Mase(Introduction)     UNIX Programmer's Manual     Mase(Introduction)

NAME
     MASE - Multiple Aligned Sequence Editor.

SYNOPSIS
     Don Faulkner and Jerzy (Jurek) Jurka of the Molecular Biology
     Computer Research Resource have designed and implemented an editor
     for genetic sequences. MASE was designed to simplify the
     manipulation of aligned sequence sets. It contains functionality
     for many of the common tasks related to this goal. In addition,
     MASE was designed with a very modular approach, making additions
     or modifications to the code straightforward.

     MASE was published in ``Trends in Biochemical Sciences'', vol. 13,
     pp. 321-322, 1988.

DESCRIPTION
     Several routines are used throughout MASE. Some familiarity with
     these routines should make your work more efficient.

     REGULAR EXPRESSIONS
          Several of the MASE functions use REGULAR EXPRESSIONS. These
          are ``wild card'' patterns related to search strings in
          editors. The regular expression handler in MASE is from
          GNU-EMACS, and is Copyright (C) 1988, The Free Software
          Foundation, Inc. (This copyright applies only to the regular
          expression handler.) The syntax and capabilities of the
          regular expression handler are discussed in another section
          of this document.

     MENU INTERFACE
          MASE provides a common method for entering all responses. It
          exists in two ``flavors'': ``menu'' and ``non-menu''.

          When the ``menu'' mode is active, there is a finite set of
          acceptable responses. This mode may be recognized by the
          appearance of a reverse video ``M'' at the left of the prompt
          (the prompt is always terminated by an arrow, `` -> ''). To
          select from the menu, you may type part of the name or browse
          the list with the arrow keys. MASE uses an incremental
          completion method for menu selections. This works like a
          little bit of ``ESP'': it figures out what you will be typing
          next (to resolve any unambiguous portions of the string).
          When the text displayed reflects the function or value you
          desire, terminate your input with a carriage return. If you
          know some portion of the command name you desire, but not the
          FIRST part (incremental completion requires that you know the
          first part), type an asterisk, then the part of the command
          name you know. The first command which has that string in it
          will be selected. If more than one command matches this wild
          card specification, you can go up and down the list of
          matches with M-p and M-n (previous and next, respectively).

          When the non-menu mode is active, one may type input freely.
          One may retrieve previous selections by using the up and down
          arrows (if your arrows don't work, contact your MASE
          administrator for help).

          There are several special command keys that are available in
          the menu modes. These are:

          Up History                      UP ARROW, C-K, or C-P
               Retrieve earlier responses

          Down History                    DOWN ARROW or C-N
               Retrieve later responses

          Cursor Backup                   LEFT ARROW, C-B, or C-H (backspace)
               Position cursor in the current string

          Cursor Forward                  RIGHT ARROW, C-F, or C-L
               Position cursor in the current string

          Delete Back
               Delete character behind cursor

          Erase Entry                     C-U
               Erase the entire line

          Start of Entry                  C-A
               Position cursor at the beginning of the entry

          End of Entry                    C-E
               Position cursor at the end of the entry

          Expand Aliases                  M-a
               Expand the aliases in the current string. MASE currently
               has no method to introduce items into this alias list,
               however!

          Go to previous wild-card match  M-p

          Go to next wild-card match      M-n

          Expand File Names               M-f
               Expand the current selection as file names. These file
               names will be appended to the history list. For many
               purposes, one may want to expand an ambiguous file
               specification to a list of files. This subroutine
               performs this function.
               The expansion is executed using the GNU-EMACS regular
               expression handler (regex-gnu(3)) to test for matches,
               modified to be more compatible with the name expansion
               done within the shells (sh(1) and csh(1)).

               The file name is first parsed into components using
               slashes (``/'') as a delimiter. The wild card aspects
               apply only to the components; no wild card component
               may cross the boundary of a slash.

               If the file name does not specify an explicit path
               (i.e. it does not begin with a period (``.'') or a
               slash (``/'')), then each of the components of your
               ``DATA'' environment variable will be prepended to the
               filename in turn before the expansions are attempted.
               The ``DATA'' environment variable may be set via a
               command similar to

                    DATA=:/seq/nbrf-12:/seq/gb-52.nih:.:; export DATA

               from the Bourne shell (sh(1)), or

                    setenv DATA /seq/nbrf-12:/seq/gb-52.nih:.:

               from the C shell (csh(1)). As you can see, the
               directories to be searched are separated by colons
               (``:''). The elements of the ``DATA'' environment
               variable will NOT be expanded (i.e. they may not contain
               wild cards).

               If the file name has an at sign (``@'') as its first
               character, the file will be regarded as an indirect
               file: a list of other files. Each element in this
               indirect list will be passed recursively through this
               mechanism. Thus, the indirect elements may refer to
               other indirects, and may have wild cards.

               Wild cards allowed are:

               ``*''       Matches any string, including the null
                           string.
               ``?''       Matches any single character.
               ``[...]''   Matches any one of the characters enclosed.
                           A pair of characters separated by ``-''
                           matches any character lexically between the
                           pair.

               If the current text entry were ``*.seq'' and you were
               to hit M-f, all files that matched ``*.seq'' in all
               directories specified by your ``DATA'' environment
               variable would be added to the history list, from which
               you could then select via the cursor keys.

          Help                            C-?
                                          (mapped for ``?'')
               Display short help message regarding the current menu
               item.

          Apropos                         M-?
               Run APROPOS to find out about all things (commands,
               variables ...) that have a specific key string.

          Edit Help                       M-e
               Edit the file containing the short help message
               associated with this menu item. The file is edited using
               the editor specified in your ``EDITOR'' environment
               variable, or vi(1). This command is reserved for the use
               of the MASE administrator.

     VARIABLES
          The main editing mode of MASE references the INTERNAL
          VARIABLES ``COLUMN-HIGHLIGHT'', ``DISPLAY-POSITION'',
          ``HIGHLIGHT-DIFFERENCES'', and ``LOCK-WINDOWS''. They are
          used as follows:

          COLUMN-HIGHLIGHT
               If set to some non-zero integer, every
               COLUMN-HIGHLIGHT'th column will be displayed in reverse
               video. This is to serve as a ruler. For example, if you
               set COLUMN-HIGHLIGHT to ``5'', then columns numbered 5,
               10, 15, 20, 25 ... would be highlighted.

          DISPLAY-POSITION
               When set to non-zero, the position of the cursor in the
               current sequence will be printed at the bottom of the
               screen, both in ``real'' column numbers and in
               ``effective'' column numbers (ignoring gaps).

          HIGHLIGHT-DIFFERENCES
               Highlight letters that differ from the reference
               sequence specified by this variable. Sequence ``0''
               refers to the consensus sequence.

          LOCK-WINDOWS
               When set, this will cause the two windows to shift in
               complete synchrony. This mode may not be fully
               functional.

     STRING CONVERSION
          The routines OUTPUT (for output mapping strings), MAP (only
          from within TAKE files), and BIND (again, only from TAKE
          files) require one to be able to specify strings containing
          control and other weird characters.
          STRING CONVERSION is used to convert from printable, editable
          strings into these control-laden strings. The strings are
          processed as follows.

          There are several backslash escapes. These were inspired by
          the ``C'' programming language. They are:

          \n     a new line (hex 0xa, decimal 10, octal 012)
          \b     a back space (hex 0x8, decimal 8, octal 010)
          \f     a form feed (hex 0xc, decimal 12, octal 014)
          \e     an escape (hex 0x1b, decimal 27, octal 033)
          \s     a space (hex 0x20, decimal 32, octal 040) (this is
                 useful to prevent a string break)
          \M     Meta next: the high bit of the next character will be
                 set
          \Z     Control Meta next: the high bit of the next character
                 will be set, and the fifth and sixth bits will be
                 cleared
          \###   Enter a number in octal. Up to three digits will be
                 ``eaten''; it is easiest and least confusing if you
                 always use exactly three digits, as in ``\002'' for
                 octal 2.

          One may prefix a character with a caret (``^'') to controlize
          it (clear the fifth and sixth bits, counting the low-order
          bit as bit zero). Thus, ``^A'' could be used to specify a
          control-A. Note that there is overlap: ``^J'', ``^j'',
          ``\n'', and ``\012'' are all exactly equivalent.

          Also note that if one desires to use these character
          conversions to program keys for MAP and BIND, one can call
          these routines from INTERPRET-LINE; the strings will be
          passed through STRING-CONVERSION before being handed to BIND
          and MAP.

     Change Bars
          Throughout the manual, you will notice some characters in the
          right margin. These are change bars. They serve to indicate
          sections that have changed since the last release of the
          manual. A plus sign (``+'') indicates lines that have been
          added. A minus sign (``-'') indicates where lines have been
          deleted. A vertical bar (``|'') indicates lines that have
          been modified.
          Thus, one who is familiar with a version of MASE should have
          fewer problems with the manual as new releases become
          available.

     Anchored positions
          There are provisions for setting Anchor Points within
          sequences, represented by periods (``.''). This permits one
          to edit the alignment within a specific domain without
          affecting the alignment in subsequent domains. This is
          implemented through several routines: Anchor-Check,
          Anchor-Create, Anchor-Synchronize, DEL-Compensated-Backward,
          DEL-Compensated-Forward, INS-Compensated-Backward, and
          INS-Compensated-Forward. These functions were conceived by
          Drs. Jerzy Jurka and Ela Holsztynska. These functions (mainly
          Anchor-Synchronize) may be useful for automatic generation of
          alignments.

     Locus name display
          The locus names are displayed on the left side of the screen.
          The number shown with each name is the ``locus number'' of
          that sequence. These numbers reflect the locus's current
          position; they can be changed by several functions. If the
          locus has been modified, the name will be displayed in
          reverse video (see Saving your work). If the first column
          (the character at the left-hand margin) is an asterisk
          (``*''), it means that the last PATTERN-HIGHLIGHT matched
          somewhere in that sequence. This provides an at-a-glance
          indicator of which sequences matched a given pattern.

EXAMPLES

FILES

SEE ALSO
     INTRINSIC FUNCTIONS
          For a discussion of the functions within MASE.

     INTERNAL VARIABLES
          For a discussion of variables used by MASE and its functions
          to modify their behavior.

     STARTUP FILES
          For a discussion on customizing MASE behavior.
     FILE FORMATS
          For a discussion of input file formats required by various
          functions within MASE.

BUGS
     MASE should be considered an application in development; there may
     be random glitches throughout; proceed with caution.

     Since MASE is modular, and far from mature, any suggestions would
     be most appreciated (please use the GRIPE function within MASE to
     register your comments).

     Since MASE is a complex program, and is evolving, certain elements
     of the documentation may easily become out of date; the printed
     documentation may be out of date compared to the functions and the
     on-line documentation (accessed via help and help within menus).

     MASE tries to handle window size changes. Sometimes, it works.
     Often, the display will be ``not quite right''. Occasionally, MASE
     will crash and core dump. The solution? For now, resize the window
     before you start MASE. Eventually, there will be a proper solution
     for this problem...

Printed 11/7/88                        DFCI
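
The STRING CONVERSION rules described under DESCRIPTION can be
illustrated with a short Python sketch. This is not MASE source code;
the function name string_convert and the constant names are
hypothetical. Several behaviors the manual does not specify are
assumptions here: an unrecognized backslash escape yields the escaped
character itself, a trailing backslash or caret is taken literally, an
octal value is truncated to eight bits, and \M or \Z applies to the
next converted character (which may itself be an escape or a caret
sequence).

```python
"""Sketch of MASE-style STRING CONVERSION (hypothetical, not MASE source)."""

# Single-character backslash escapes listed in the manual.
SIMPLE_ESCAPES = {"n": 0x0A, "b": 0x08, "f": 0x0C, "e": 0x1B, "s": 0x20}
OCTAL_DIGITS = "01234567"
CONTROL_MASK = 0x9F  # clears bits 5 and 6 (0x20 and 0x40)
META_BIT = 0x80      # the high bit of an 8-bit character


def string_convert(text: str) -> bytes:
    """Convert a printable string into its control-laden byte string."""
    out = bytearray()
    meta = False      # pending \M or \Z: set the high bit of the next char
    control = False   # pending \Z or ^: clear bits 5 and 6 of the next char
    i = 0
    while i < len(text):
        ch = text[i]
        i += 1
        if ch == "\\" and i < len(text):
            esc = text[i]
            i += 1
            if esc == "M":
                meta = True
                continue
            if esc == "Z":
                meta = control = True
                continue
            if esc in SIMPLE_ESCAPES:
                value = SIMPLE_ESCAPES[esc]
            elif esc in OCTAL_DIGITS:
                # "Eat" up to three octal digits in total.
                start = i - 1
                while (i < len(text) and i - start < 3
                       and text[i] in OCTAL_DIGITS):
                    i += 1
                value = int(text[start:i], 8) & 0xFF  # assumption: truncate
            else:
                value = ord(esc) & 0xFF  # assumption: unknown escape = itself
        elif ch == "^" and i < len(text):
            value = ord(text[i]) & 0xFF
            i += 1
            control = True
        else:
            value = ord(ch) & 0xFF
        if control:
            value &= CONTROL_MASK
        if meta:
            value |= META_BIT
        out.append(value)
        meta = control = False
    return bytes(out)
```

With this sketch, string_convert("^J"), string_convert("^j"),
string_convert("\\n"), and string_convert("\\012") all return the
single byte 0x0a, which matches the overlap noted in the manual.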