-----------------------------------------------------------------------
MapView Utility software
Version 1.0
Contact:
Web: http://mummer.sourceforge.net
-----------------------------------------------------------------------

LICENCE: open source, included with MUMmer 3.0 and above

USAGE: see section 4, below.


1. WHAT IS MAPVIEW?
   ----------------

MapView is a utility program for displaying sequence alignments as
produced by NUCmer or PROmer. For further information regarding these
programs, please see the documentation and code at
http://mummer.sourceforge.net .

MapView takes the output from these programs and converts it to a FIG,
PDF or PS file. It can break the output into multiple files for easier
viewing and printing. Note that for very large reference genomes, FIG
files viewed in the xfig program (Unix) may be the only option that
allows the entire display to be stored in one file.


2. SYSTEM REQUIREMENTS
   -------------------

 - Perl interpreter, version 5.0 or greater.
 - fig2dev utility (see www.linux.org for the transfig rpm package and
   installation documentation).
 - xfig viewer to visualize the FIG format (see www.linux.org regarding
   the xfig rpm package).
 - Adobe Acrobat Reader for reading PDF files (free from www.adobe.com).
 - Ghostscript PostScript interpreter to view PDF and PostScript
   documents (on www.linux.org, look for the 'gv' rpm package).


3. INPUT
   -----

The input to MapView is the table generated by the "show-coords"
program in MUMmer. It is important to use the -r and -l options of
show-coords (sort by reference coordinates, include sequence lengths)
so that the table has the format MapView expects. For PROmer output,
it can be very helpful to run show-coords with the -k option as well,
to reduce the redundant matches often found in highly similar regions.
However, this option does not always select the appropriate reading
frame.

Both PROmer and NUCmer write their output in a specific format that can
be found in the *.cluster and *.delta files.
To translate this output into a human-readable format, the
"show-coords" program parses the delta alignment output of either
NUCmer or PROmer and displays summary information for each alignment.
(Note that PROmer and NUCmer include command line options that allow
them to generate the same summary information without running
"show-coords" separately.) The output of show-coords is then used by
MapView to create a FIG, PDF or PS file.

An example of the standard output of show-coords, which is used
directly as input for MapView, is shown below. It contains just the
first few lines of a large file created by aligning an assembly of
Drosophila pseudoobscura (165 million bases) to chromosome arm 2R of
Drosophila melanogaster:

/usr/local/db/euk/internal/d_melanogaster/na_arm2R_genomic_dmel_RELEASE3.FASTA celera_scaffs.fa
PROMER

   [S1]    [E1] |    [S2]    [E2] | [LEN 1] [LEN 2] | [% IDY] [% SIM] [% STP] |  [LEN R] [LEN Q] | [COV R] [COV Q] | [FRM] [TAGS]
=================================================================================================================================
   2540    2806 |    3216    3473 |     267     258 |   46.67   50.00    2.78 | 20302755    8916 |    0.00    2.89 |  2  3 2R 3211358
   2540    2806 |    1939    2196 |     267     258 |   46.67   51.11    2.22 | 20302755    2375 |    0.00   10.86 |  2  1 2R 3211430
   2540    2893 |   20172   19852 |     354     321 |   39.52   45.16    3.23 | 20302755   25647 |    0.00    1.25 |  2 -1 2R 3215406
   2806    2534 |    5291    5536 |     273     246 |   41.94   47.31    3.76 | 20302755   12414 |    0.00    1.98 | -3  2 2R 3211507
....

For more information and an explanation of this format, please see the
MUMmer manual at http://mummer.sourceforge.net/manual


4. USAGE
   -----

USAGE: mapview [options] [UTR coords] [CDS coords]

The optional UTR and CDS coordinates files, which are computed based on
the reference sequence, should be in GFF format. They contain the
coordinates of the coding sequences and untranslated regions of genes
on the reference genome, and are displayed graphically if provided.
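Before running mapview, it can help to check that a coords file really
has the PROmer "show-coords -r -l" layout shown in section 3. The
Python sketch below splits each data row on its "|" separators and
names the fields after the example's column header. The names
CoordsRow, parse_promer_row and iter_promer_rows are hypothetical and
are not part of MapView or MUMmer; the sketch only handles the PROmer
layout.

```python
from dataclasses import dataclass


@dataclass
class CoordsRow:
    """One data row of 'show-coords -r -l' output from a PROmer run."""
    s1: int          # [S1]  start on the reference
    e1: int          # [E1]  end on the reference
    s2: int          # [S2]  start on the query
    e2: int          # [E2]  end on the query
    len1: int        # [LEN 1] alignment length on the reference
    len2: int        # [LEN 2] alignment length on the query
    pct_idy: float   # [% IDY]
    pct_sim: float   # [% SIM]
    pct_stp: float   # [% STP]
    len_r: int       # [LEN R] total reference length (added by -l)
    len_q: int       # [LEN Q] total query length (added by -l)
    cov_r: float     # [COV R]
    cov_q: float     # [COV Q]
    frm_r: int       # [FRM] reading frame on the reference
    frm_q: int       # [FRM] reading frame on the query
    ref_tag: str     # [TAGS] reference sequence ID
    qry_tag: str     # [TAGS] query sequence ID


def parse_promer_row(line: str) -> CoordsRow:
    """Parse one data row; raise ValueError if the layout does not match.

    NUCmer rows lack the [% SIM], [% STP] and [FRM] columns, so they fail
    to unpack here and also raise ValueError.
    """
    groups = [g.split() for g in line.split("|")]
    if len(groups) != 7:
        raise ValueError(f"expected 7 '|'-separated groups, got {len(groups)}")
    ((s1, e1), (s2, e2), (len1, len2), (idy, sim, stp),
     (len_r, len_q), (cov_r, cov_q), (frm_r, frm_q, ref_tag, qry_tag)) = groups
    return CoordsRow(
        int(s1), int(e1), int(s2), int(e2), int(len1), int(len2),
        float(idy), float(sim), float(stp), int(len_r), int(len_q),
        float(cov_r), float(cov_q), int(frm_r), int(frm_q), ref_tag, qry_tag,
    )


def iter_promer_rows(lines):
    """Yield parsed data rows, skipping the file-name, PROMER, header,
    '====' and '....' lines (data rows are the ones starting with a digit)."""
    for line in lines:
        if line.strip()[:1].isdigit():
            yield parse_promer_row(line)


row = parse_promer_row(
    "2540 2806 | 3216 3473 | 267 258 | 46.67 50.00 2.78 | "
    "20302755 8916 | 0.00 2.89 | 2 3 2R 3211358"
)
print(row.ref_tag, row.qry_tag, row.pct_idy)
```

In practice, iter_promer_rows(open("out.coords")) would walk a whole
file; "out.coords" is a placeholder name for a saved show-coords table.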

GFF format is a tab-delimited file format with the following columns:

Options:

 -f         : output format: pdf, ps or fig. The default is "fig".

 -x1 -x2    : only display the region of the reference genome between
              positions x1 and x2. By default the whole sequence is
              displayed.

 -d         : the maximum distance (in bp) between two matches for
              them to be linked. The default is 50000 bp.
              To explain: the query sequence may contain multiple
              contigs. All matches from the same contig are linked by
              drawing lines between each successive pair of matches.
              If the matches occur too far apart, this can get very
              messy, so no line is drawn between matches that are
              further apart than this distance. This is especially
              important if the reference genome is very long and all
              the output is stored in a single graphical file.

 -m         : the magnification at which the figure is rendered. The
              default is 1.0. This is an option for fig2dev, which is
              used to transform the FIG files to PDF or PS files.

 -n         : the number of output files the display is split into.
              The default is 10. The purpose of this parameter is to
              avoid making figures that are too 'large', in the sense
              that they cannot be converted to PDF by fig2dev.

 -p         : the output file prefix. By default the output file(s)
              are named PROMER_graph_N.fig, where N is incremented for
              each output file. If you choose "-p MyName", for example,
              then the first output file will be named MyName_0.fig.

 -h         : display this help.

 -v         : verbosely list the files processed.

 -g|ref     : if the input file is produced by 'mgaps', set the
              reference sequence ID (as it appears in the first column
              of the UTR/CDS coords file).

 -I         : display the names of query sequences.

 -Ir        : display the names of reference genes.


5. OUTPUT
   ------

The output can be FIG, PDF or PS files. The program uses fig2dev to
transform FIG files to PDF or PS.

If you supply UTR and CDS coords files, then the genes are displayed
first, along the top. Alternatively spliced genes are shown on
different rows, stacked vertically.
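The linking rule described for the -d option in section 4 can be
illustrated with a short Python sketch. It is not MapView's actual
implementation: the names Match and linked_pairs are hypothetical, and
both the ordering of matches within a contig and the way the distance
is measured are assumptions, since the text does not specify them.

```python
from collections import defaultdict
from typing import List, NamedTuple, Tuple


class Match(NamedTuple):
    query_id: str   # query contig the match belongs to
    ref_start: int  # [S1]
    ref_end: int    # [E1]; may be smaller than [S1] (see the last example row)


def linked_pairs(matches: List[Match],
                 max_dist: int = 50000) -> List[Tuple[Match, Match]]:
    """Return the successive same-contig match pairs that get a line.

    Assumptions: within each contig, matches are ordered by their leftmost
    reference coordinate, and the distance is the gap on the reference
    between the right end of one match and the left end of the next.
    Overlapping matches give a negative gap and are therefore linked.
    """
    by_contig = defaultdict(list)
    for m in matches:
        by_contig[m.query_id].append(m)

    pairs = []
    for contig_matches in by_contig.values():
        contig_matches.sort(key=lambda m: min(m.ref_start, m.ref_end))
        for a, b in zip(contig_matches, contig_matches[1:]):
            gap = min(b.ref_start, b.ref_end) - max(a.ref_start, a.ref_end)
            if gap <= max_dist:  # matches further apart than -d stay unlinked
                pairs.append((a, b))
    return pairs
```

Raising max_dist links more distant matches, which is why a large value
can clutter the plot of a long reference stored in a single file.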

The CDS regions (i.e., the protein-coding portions of exons) are
displayed in light green, and the 5' and 3' UTRs are shown in different
colors. (For details, please see the legend in the left corner below
the graphic.)

The reference sequence is displayed in light blue, and the alignment
matches are shown on a row immediately below it. The alignment matches
are displayed a second time at vertical positions that depend on the
percent identity (PID) of each match, ranging from 50% to 100%. Matches
with PID < 50% (if any are included in the input file) are treated as
having PID = 50%.

For better visualization, the connecting lines between matches are
drawn in a different, randomly chosen color for each query sequence.
If these connecting lines cross, it indicates that the sequence has
been reverse complemented to achieve the match. Note, however, that if
a sequence is similar at both the protein and DNA level, matches are
often detected in multiple reading frames. NUCmer and PROmer have
options to display only one match when matches occur in multiple
frames, but they don't always choose the correct orientation.


6. KNOWN PROBLEMS
   --------------

There is a known problem with PDF output: fig2dev has trouble
converting FIG files that are too big, and will consistently produce a
PDF with errors. We recommend using the PS format for very large files,
or else breaking the output into several files with the -n option
described above.
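The vertical placement of matches described in section 5 (PID from 50%
to 100%, with lower values treated as 50%) amounts to a clamp followed
by a scale. The Python sketch below assumes a linear scale; the
function name pid_to_y and its parameters are hypothetical. The
positions for 50% and 100% are passed separately because in FIG
coordinates y grows downwards, so the 100% position is usually the
smaller value.

```python
def pid_to_y(pid: float, y_50: float, y_100: float) -> float:
    """Vertical position of a match with percent identity 'pid'.

    PIDs below 50% are clamped to 50%, as the text describes; the linear
    interpolation between y_50 and y_100 is an assumption.
    """
    clamped = min(max(pid, 50.0), 100.0)
    return y_50 + (clamped - 50.0) / 50.0 * (y_100 - y_50)


# Identities taken from the show-coords example in section 3, plus two
# reference points; both example rows below 50% land on the 50% line.
for pid in (46.67, 39.52, 75.0, 100.0):
    print(pid, pid_to_y(pid, y_50=1000.0, y_100=0.0))
```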