/***************************************************************************** # Copyright (C) 1994-2008 by David Gordon. # All rights reserved. # # This software is part of a beta-test version of the Consed/Autofinish # package. It should not be redistributed or # used for any commercial purpose, including commercially funded # sequencing, without written permission from the author and the # University of Washington. # # This software is provided ``AS IS'' and any express or implied # warranties, including, but not limited to, the implied warranties of # merchantability and fitness for a particular purpose, are disclaimed. # In no event shall the authors or the University of Washington be # liable for any direct, indirect, incidental, special, exemplary, or # consequential damages (including, but not limited to, procurement of # substitute goods or services; loss of use, data, or profits; or # business interruption) however caused and on any theory of liability, # whether in contract, strict liability, or tort (including negligence # or otherwise) arising in any way out of the use of this software, even # if advised of the possibility of such damage. # # Building Consed from source is error prone and not simple which is # why I provide executables. Due to time limitations I cannot # provide any assistance in building Consed. Even if you do not # modify the source, you may introduce errors due to using a # different version of the compiler, a different version of motif, # different versions of other libraries than I used, etc. For this # reason, if you discover Consed bugs, I can only offer help with # those bugs if you first reproduce those bugs with an executable # provided by me--not an executable you have built. # # Modifying Consed is also difficult. Although Consed is modular, # some modules are used by many other modules. Thus making a change # in one place can have unforeseen effects on many other features. 
# It may take months for you to notice these other side-effects # which may not seem connected at all. It is not feasible for me to # provide help with modifying Consed sources because of the # potentially huge amount of time involved. # #*****************************************************************************/ static char szReadMe1[] = "\n\ CONSED 20.0 DOCUMENTATION\n\ \n\ CONTENTS:\n\ 1 WHAT IS NEW IN CONSED 20.0\n\ 2 UPGRADING FROM CONSED 19.0 TO CONSED 20.0\n\ 3 INSTALLING CONSED\n\ 4 NOTE TO LINUX USERS (32 bit) (INSTALLATION)\n\ 5 NOTE TO LINUX USERS (64 BIT) (INSTALLATION)\n\ 6 NOTE TO ITANIUM LINUX USERS (INSTALLATION)\n\ 7 NOTE TO SOLARIS USERS (INSTALLATION)\n\ 8 NOTE TO MACOSX USERS (INSTALLATION)\n\ 9 QUICK TOUR OF CONSED\n\ 10 VARIOUS BATCH CONSED FEATURES\n\ 11 ALIGNING SANGER READS TO A REFERENCE SEQUENCE\n\ 12 USING AUTOPCRAMPLIFY\n\ 13 USING AUTOREPORT\n\ 14 FEATURES FOR SNP ANALYSIS\n\ 15 LESS USED CONSED FEATURES\n\ 16 CONSED CUSTOMIZATION\n\ 17 CREATING CUSTOM TAG TYPES\n\ 18 EXPANDING CONSED'S CAPABILITIES WITH A LITTLE PROGRAMMING\n\ 19 MONITORS AND MICE FOR CONSED\n\ 20 ACE FILE FORMAT\n\ 21 SAMPLE PHD BALL FORMAT\n\ 22 TIMESTAMP MISMATCH\n\ 23 CONSED REFERENCES\n\ 24 RUNNING PHRED and PHRAP\n\ 25 WHAT IS AUTOFINISH?\n\ 26 USING AUTOFINISH\n\ \n\ \n\ BIG TABLE OF CONTENTS:\n\ \n\ 1. WHAT IS NEW IN CONSED 20.0\n\ 2. UPGRADING FROM CONSED 19.0 TO CONSED 20.0\n\ 3. INSTALLING CONSED\n\ 3.10) SETTING UP TEST DIRECTORIES\n\ 3.11) PRELIMINARY TESTING OF CONSED BEFORE COMPLETING THE REST OF THE INSTALLATION\n\ 3.26) ENOUGH MEMORY FOR CONSED\n\ 3.28) TESTING THE INSTALLATION\n\ 3.29) TESTING ADDING SOLEXA READS\n\ 3.30) TESTING ADDING 454 READS\n\ 3.31) TESTING 454 READS (NEWBLER ASSEMBLY)\n\ 3.32) TESTING ADD NEW READS\n\ 3.35) TESTING RUNNING CROSS_MATCH FROM ASSEMBLY VIEW\n\ 3.36) TEST RUNNING PHREDPHRAP\n\ 3.37) TESTING MINIASSEMBLIES\n\ 3.40) FAKE READS\n\ 3.41) APPENDING EXPID TO THE PHD FILES\n\ 4.
NOTE TO LINUX USERS (32 bit) (INSTALLATION)\n\ 5. NOTE TO LINUX USERS (64 BIT) (INSTALLATION)\n\ 6. NOTE TO ITANIUM LINUX USERS (INSTALLATION)\n\ 7. NOTE TO SOLARIS USERS (INSTALLATION)\n\ 8. NOTE TO MACOSX USERS (INSTALLATION)\n\ 9. QUICK TOUR OF CONSED\n\ 9.1) USING CONSED GRAPHICALLY\n\ 9.4) SCROLLING\n\ 9.5) GOTO POSITION\n\ 9.6) COLORS\n\ 9.9) HIGHLIGHTING READ NAMES \n\ 9.10) DIMMING ENDS OF READS\n\ 9.11) TRACES AND EDITING\n\ 9.16) SCROLLING TRACES AND ALIGNED READS TOGETHER\n\ 9.17) SHOW ALL TRACES\n\ 9.18) SAVING THE ASSEMBLY\n\ 9.19) EXPORTING THE CONSENSUS\n\ 9.22) COMPLEMENTING THE CONTIG\n\ 9.23) FIND MAIN WINDOW\n\ 9.24) MULTIPLE UNDO EDIT\n\ 9.25) EXITING CONSED\n\ 9.26) CONSED -ACE\n\ 9.27) USING SOLEXA READS\n\ 9.31) SORTING OF READS\n\ 9.34) ALPHABETICAL SORTING OF READS\n\ 9.36) FINDING VARIANTS\n\ 9.41) EDITING/TAGGING SOLEXA READS\n\ 9.42) OVERSTRIKING THE CONSENSUS\n\ 9.43) NAVIGATING BY HIGH/LOW DEPTH OF COVERAGE\n\ 9.47) ADDING SOLEXA READS\n\ 9.56) ALIGNING SOLEXA READS AGAINST A LARGE GENOME AND SELECTING A SMALL REGION\n\ 9.62) USING YOUR OWN SOLEXA DATA\n\ 9.68) USING 454 READS (NEWBLER ASSEMBLY)\n\ 9.79) USING 454'S NEWBLER ON YOUR OWN DATA\n\ 9.84) USING 454 READS (ALIGNING TO REFERENCE SEQUENCE )\n\ 9.89) ADDING ADDITIONAL 454 OR SOLEXA READS (YOUR OWN DATA)\n\ 9.90) ASSEMBLY VIEW\n\ 9.91) READ DEPTH\n\ 9.92) FORWARD/REVERSE PAIR DEPTH\n\ 9.94) INCONSISTENT FORWARD/REVERSE PAIRS\n\ 9.102) SEQUENCE MATCHES\n\ 9.108) RUNNING CROSS_MATCH FOR SEQUENCE MATCHES\n\ 9.109) PULLING OUT READS AND RE-ASSEMBLYING THEM (MINIASSEMBLIES)\n\ 9.113) MINIASSEMBLIES\n\ 9.117) CONTIG ARRANGEMENT--REORDER CONTIGS\n\ 9.120) CONTIG ORIENTATION\n\ 9.121) RESTRICTION FRAGMENTS\n\ 9.122) USING ANOTHER PROGRAM TO FIND CONSENSUS SITES (SUCH AS POLYMORPHIC SITES)\n\ 9.123) NAVIGATING\n\ 9.128) CUSTOM NAVIGATION\n\ 9.129) PRIMER-PICKING\n\ "; static char szReadMe2[] = "\n\ 9.134) CHECKING WHETHER A PARTICULAR OLIGO WOULD MAKE AN ACCEPTABLE PRIMER\n\ 9.135) 
PICKING PCR PRIMER PAIRS\n\ 9.136) ORDERING OF PRIMERS\n\ 9.137) SEARCH FOR STRING\n\ 9.138) COPY AND PASTE\n\ 9.139) ADD NEW READS (SANGER--NOT SOLEXA OR 454)\n\ 9.140) TEAR CONTIG\n\ 9.142) JOIN CONTIGS\n\ 9.143) COMPARE CONTIGS WINDOW AND INVERTED REPEATS\n\ 9.145) REMOVING READS\n\ 9.147) TAGS\n\ 9.148) CREATING LONG TAGS\n\ 9.149) CONSENSUS TAGS\n\ 9.151) WHAT THE COLORS MEAN\n\ 9.152) SEARCH FOR READ NAME\n\ 9.153) ONLINE DOCUMENTATION\n\ 9.154) THE .WRK LOG FILE\n\ 9.156) PROTEIN TRANSLATION AND OPEN READING FRAMES\n\ 10. VARIOUS BATCH CONSED FEATURES\n\ 10.1) FIXING CONTIG-ENDS\n\ 10.2) CHANGING THE CONSENSUS IN BATCH\n\ 10.3) AUTOEDIT\n\ 11. ALIGNING SANGER READS TO A REFERENCE SEQUENCE\n\ 12. USING AUTOPCRAMPLIFY\n\ 13. USING AUTOREPORT\n\ 13.1) VARIANTS REPORT\n\ 13.2) EDIT PARAMETERS: HOW TO CHANGE .consedrc PARAMETERS\n\ 14. FEATURES FOR SNP ANALYSIS\n\ 14.1) CREATING A SNP-MASKED PHASTER GENOME\n\ 14.2) PHASTER2MINIASSEMBLY.PERL\n\ 14.6) PHASTER2ACE.PERL\n\ 15. LESS USED CONSED FEATURES\n\ 15.1) MULTIPLE HIGH QUALITY DISCREPANCIES VS SEARCH FOR HIGHLY\n\ 15.2) BACKING OUT EDITS AFTER YOU HAVE SAVED THE ASSEMBLY\n\ 15.3) SELECTIVELY BACKING OUT EDITS AND REMOVING READS\n\ 15.4) REMOVING READS FROM A PHRAP ASSEMBLY\n\ 15.5) ADDING READS WITHOUT CHROMATOGRAM FILES\n\ 15.6) ALIGNING READS TO A BACKBONE\n\ 15.7) COMPARING READS TO A REFERENCE SEQUENCE\n\ 15.8) TAGGING ALL READS AT ONCE\n\ 15.9) EDITING ALL READS AT ONCE\n\ 15.10) FASTER CONSED STARTUP FOR SANGER READS\n\ 15.11) VIEWING THE CHROMATOGRAM OF SINGLETS OR NON-ASSEMBLED READS\n\ 15.12) HIDING SOME TYPES OF TAGS\n\ 15.13) CUSTOM CONTIG NAMES\n\ 15.14) ERROR RATE\n\ 15.15) RESTRICTION DIGEST\n\ 15.16) RESTRICTION DIGEST AND ASSEMBLY VIEW\n\ 15.17) MULTIPLE TRACE POPUP\n\ 15.18) MAXIMUM NUMBER OF TRACES DISPLAYED\n\ 15.19) SCALING THE TRACES \n\ 15.20) HOTKEYS FOR EDITING\n\ 15.21) SCROLLING TRACES INDEPENDENTLY\n\ 15.22) MEASURING ERROR RATE AND SINGLE SUBCLONE BASES FOR A REGION\n\ 15.23) 
PREVENTING 2 USERS FROM MAKING CONFLICTING EDITS\n\ 15.24) PRINTING CONSED WINDOWS\n\ 15.25) COLOR MEANS EDITED AND TAGS\n\ 15.26) COLOR MEANS MATCH\n\ 16. CONSED CUSTOMIZATION\n\ 16.1) CUSTOMIZING NAVIGATE BY SINGLE STRANDED REGIONS AND NAVIGATE BY SINGLE\n\ 16.3) MAKING LIGHT BACKGROUND FOR SLIDES\n\ 16.4) COLOR BLINDNESS\n\ 17. CREATING CUSTOM TAG TYPES\n\ 18. EXPANDING CONSED'S CAPABILITIES WITH A LITTLE PROGRAMMING\n\ 18.1) BRINGING UP CONSED FROM A SCRIPT\n\ 18.2) CONTROL OF CONSED FROM SOME OTHER PROGRAM\n\ 18.3) REMOVING READS FROM A SCRIPT\n\ 18.4) HOW TO WRITE A CUSTOM NAVIGATION FILE\n\ 18.5) COMPRESSING CHROMATOGRAMS\n\ 18.6) READING CHROMATOGRAMS OUT OF AN EXTERNAL DATABASE\n\ 18.7) COMPRESSING ACE FILES\n\ 18.8) NO PHD FILES\n\ 18.9) ADDING TAGS FROM OTHER PROGRAMS\n\ 18.10) USER-DEFINED CONSENSUS POSITIONS\n\ 18.11) DEFINING KEYS (HOTKEYS) TO CALL EXTERNAL PROGRAMS AND/OR APPLY TAGS AND/OR\n\ 18.12) READ PREFIXES\n\ 18.13) USING FILES CREATED ON WINDOWS OR WINDOWS NT. \n\ 18.14) CREATING YOUR OWN ACE FILES (INSTEAD OF ACE FILES CREATED BY\n\ 18.15) CONSED OPTIONS\n\ 19. MONITORS AND MICE FOR CONSED\n\ 20. ACE FILE FORMAT\n\ 21. SAMPLE PHD BALL FORMAT\n\ 22. TIMESTAMP MISMATCH\n\ 23. CONSED REFERENCES\n\ 24. RUNNING PHRED and PHRAP\n\ 24.6) COMMON PROBLEMS RUNNING PHREDPHRAP\n\ 24.8) WHY ARE ALL THE READS NOT IN THE ASSEMBLY?\n\ 24.9) ARE THERE READS THAT ARE TOTALLY UNALIGNED?\n\ 24.10) CORRECTING FALSE JOINS MADE BY PHRAP\n\ 25. WHAT IS AUTOFINISH?\n\ 26. 
USING AUTOFINISH\n\ 26.3) AUTOFINISH: MINIMUM NUMBER OF ERRORS FIXED PER READ\n\ 26.4) EDIT PARAMETERS: HOW TO CHANGE CONSED/AUTOFINISH PARAMETERS\n\ 26.6) DIVERSION: UNIX LESSON\n\ 26.7) AUTOFINISH: CHANGING MELTING TEMPERATURES\n\ 26.8) AUTOFINISH: JUST CLOSING GAPS\n\ 26.9) AUTOFINISH: JUST CLOSING GAPS JUST USING WALKS\n\ 26.10) AUTOFINISH: NOT REPEATING FAILED EXPERIMENTS\n\ 26.12) AUTOFINISH: NOT USING PARTICULAR SUBCLONE TEMPLATES\n\ 26.13) AUTOFINISH: NOT USING ENTIRE LIBRARIES FOR FINISHING\n\ "; static char szReadMe3[] = "\n\ 26.14) MULTIPLE LIBRARIES WITH DIFFERENT INSERT SIZES\n\ 26.15) AUTOFINISH CLOSING GAPS WITH MINILIBRARIES\n\ 26.16) CLOSING GAPS USING PCR\n\ 26.17) AUTOFINISH: TOO MANY UNIVERSAL PRIMER READS\n\ 26.18) AUTOFINISH FOR CDNA ASSEMBLIES\n\ 26.19) AUTOFINISH FOR LISTING GAP-SPANNING TEMPLATES\n\ 26.20) FINISHING A SPECIFIC CONTIG\n\ 26.21) MARKING THE END OF THE CLONE\n\ \n\ \n\ \n\ END BIG TABLE OF CONTENTS\n\ \n\ ----------------------------------------------------------------------------\n\ \n\ 1. WHAT IS NEW IN CONSED 20.0\n\ \n\ -Better method of fixing the consensus and read alignments at the\n\ ends of contigs. This is particularly useful after adding new reads.\n\ (See -fixContigEnds in README.txt.)\n\ \n\ -Fixes a problem reading MIRA assembly output. A version of Velvet\n\ produces consed-ready ace files.\n\ \n\ -Autoedit can fix repeated-base errors in the consensus such as\n\ this:\n\ \n\ consensus: ccc\n\ read1: cc*\n\ read2: *cc \n\ \n\ -Solexa (Illumina) reads can be edited and tagged.\n\ \n\ -The consensus can be directly edited (ambiguity codes are allowed).\n\ \n\ -Batch changing of the consensus: you must supply a file of consensus\n\ positions to change.\n\ \n\ -Allows extremely high depth of coverage (over 32,000 reads\n\ deep).\n\ \n\ -add454Reads.perl used to add all of the 454 reads in an sff file.\n\ Now it also allows you to specify particular reads.\n\ \n\ -Sorting of reads.
By default, reads are now sorted by quality in a\n\ window around the cursor. In the Aligned Reads Window, a \"sort\"\n\ menu lets you easily switch from one kind of sort to another, and an\n\ indicator shows which type of sort is currently used. Reads can also\n\ be sorted by the base at the cursor position, which is useful for SNP\n\ and genotype calling.\n\ \n\ -Search for Highly Discrepant Positions now has the option of\n\ excluding positions in which the consensus has an x or an n (or any\n\ user-specified base). This is also available via autoreport.\n\ \n\ -Search for string removes any \"*\" characters before searching.\n\ \n\ -Tears and joins: when making these, tags are added with lots of\n\ information in the comment (old contig name, # of reads, and # of\n\ base pairs, new contig name, # of reads, # of base pairs, etc.)\n\ \n\ -selectRegions.perl: Allows multiple sequences in the same fasta file.\n\ Consensus positions, instead of starting at 1, are numbered with\n\ respect to the original large sequence.\n\ \n\ -Main Consed Window's Contig List: there is now a button that\n\ toggles between ordering by number of reads and ordering by\n\ contig name (Contig5, Contig6, Contig7,...)\n\ \n\ -Saving highlighted read names to a file. This feature is under the\n\ Misc menu of the Aligned Reads Window. You can choose to save just\n\ the reads in the current contig or the reads in the entire assembly.\n\ \n\ -Aligned Reads Window: Click on a read base and a horizontal line\n\ will appear, helping your eye follow along the read and find the\n\ corresponding read name.
This can be turned off.\n\ \n\ -Shift-click to select multiple contigs for \"Remove Contigs\".\n\ Shift-click to select multiple scaffolds in Assembly View's Reorient\n\ Contigs.\n\ \n\ -Bug fixes, improved error messages, and performance improvements.\n\ \n\ \n\ \n\ If you are an experienced Consed user who has completed the tutorial\n\ before, I suggest you find the tutorial section for each of the\n\ features above and work through it.\n\ \n\ \n\ \n\ \n\ ----------------------------------------------------------------------------\n\ \n\ 2. UPGRADING FROM CONSED 19.0 TO CONSED 20.0\n\ \n\ The safest route is to do a full installation. However, I recognize\n\ you might have spent much time customizing some of the scripts. The\n\ following scripts have not changed, so you do not need to reinstall\n\ and customize them again:\n\ \n\ "; static char szReadMe4[] = "\n\ ace2Fasta.perl\n\ addReads2Consed.perl\n\ alignSolexaReads2Refs.perl\n\ amplifyTranscripts.perl\n\ countEditedBases.perl\n\ fasta2Ace.perl\n\ fasta2Phd.perl\n\ filter454Reads.perl\n\ findSequenceMatchesForConsed.perl\n\ lib2Phd.perl\n\ makePhdBall.perl\n\ phredPhrap\n\ removeReads\n\ revertToUneditedRead\n\ testSocket.perl\n\ transferConsensusTags.perl\n\ \n\ All other scripts have been changed, including:\n\ \n\ add454Reads.perl\n\ addSolexaReads.perl\n\ determineReadTypes.perl\n\ fasta2PhdBall.perl\n\ orderPrimerPairs.perl\n\ selectRegions.perl\n\ \n\ Several of the sample datasets have changed, several of the programs\n\ (e.g., sff2scfAndPhd.c) have changed, and the consed executable has\n\ (of course) changed.\n\ \n\ Don't be lazy with installation--you will only cause yourself more\n\ trouble later when things don't work.\n\ \n\ \n\ ----------------------------------------------------------------------------\n\ \n\ 3. INSTALLING CONSED\n\ \n\ To install Consed, you must have some basic Unix system administration\n\ skills.
For example, you must be able to run X applications such as\n\ xterm, you must know what PATH is for and how to add something to it,\n\ you must be able to edit a file using a Unix editor (such as emacs,\n\ vi, or pico), you must be able to move around in the filesystem from\n\ the command line, and you must know how to build/compile a program.\n\ If you do not know how to do these, find someone who does to help you\n\ and make sure they finish the job, including completing the tests\n\ below.\n\ \n\ REQUIRED VERSIONS OF PROGRAMS TO WORK WITH CONSED\n\ \n\ You MUST have the following versions of programs in order to use this\n\ version of Consed. If you are using an older version of any of these\n\ programs, please upgrade to the version listed. All of these programs\n\ come with consed except for phrap, cross_match, phred, and polyphred.\n\ \n\ (Note that the versions below are dates. For example, 1.080714 means\n\ year 2008, month 07 (July), and 14th day of the month--ignore the\n\ leading \"1.\". Thus 1.080714 is later than 1.080630.)\n\ \n\ 0.000925.c or later for phred (contact bge@u.washington.edu)\n\ 1.080721 or later for phrap and cross_match (see below for how to get them)\n\ 0.990622.e or later for phd2fasta (supplied with this version of consed)\n\ any version of addReads2Consed.perl (supplied with this version \n\ of consed)\n\ 080818 of phredPhrap (supplied with this version of consed)\n\ (Note: if you have an older version of phredPhrap, some of the\n\ more recent Consed features, such as miniassemblies, will not\n\ work. Note to existing polyphred users: phredPhrap now calls\n\ polyphred with different parameters, which will cause it to apply\n\ different tags than it used to, but these different tags will\n\ give it behavior consistent with that described below in\n\ CONSED-POLYPHRED INTERACTION.
For more information, see\n\ http://droog.gs.washington.edu/PolyPhred.html )\n\ \n\ 030117 or later for transferConsensusTags.perl (supplied with this \n\ version of consed)\n\ any version of tagRepeats.perl (supplied with this version of consed)\n\ any version of determineReadTypes.perl (or your own custom\n\ modified version)\n\ 080821 or later of sff2scf (supplied with this version; discard any\n\ previous versions or you will be sorry). Type \"sff2scf -v\" and it\n\ should print the correct version. If it instead says:\n\ \"Error: Unable to open SCF file: ../chromat_dir/-v\", your\n\ version is old and should be discarded.\n\ \n\ \n\ If you are using polyphred, you must have polyphred 3.5 or more\n\ recent. USING AN OLDER VERSION OF POLYPHRED WILL CAUSE SEVERE\n\ PROBLEMS WITH CONSED WHICH WILL APPEAR AS PROBLEMS WITH FILETYPES.\n\ \n\ Everything that comes with this version of Consed should replace any\n\ of your files that have the same name (unless you really know what you \n\ are doing).\n\ \n\ Alas, phred, phrap, cross_match, and consed do not come together in a\n\ single package. Get phred from bge@u.washington.edu (Brent Ewing). Get\n\ phrap and cross_match from phg@u.washington.edu (Phil Green).\n\ \n\ To request the most recent version of phrap and cross_match, send an\n\ email to phg@u.washington.edu, with a Subject line that says \"phrap\n\ "; static char szReadMe5[] = "\n\ new version request\", and an email body that consists of the following\n\ two lines (it should be in exactly this format, to be computer\n\ readable):\n\ \n\ Request: phrap ver 1.080721 or later\n\ Registered phrap email address: [[insert address here]]\n\ \n\ The address should be the one you supplied previously when obtaining\n\ phrap; the new version will be sent to it.
If you have not previously\n\ registered for phrap, or your registered address is no longer valid,\n\ you will need to include a license agreement (with all questions\n\ answered) in the email.\n\ \n\ \n\ Summary of files you must edit (instructions are below):\n\ addReads2Consed.perl\n\ determineReadTypes.perl\n\ phredPhrap\n\ primerCloneScreen.seq\n\ primerSubcloneScreen.seq\n\ repeats.fasta\n\ vector.seq\n\ \n\ In order to run the gauntlet of phred/phd2fasta/cross_match/phrap,\n\ Consed supplies the perl script phredPhrap (listed above). YOU\n\ MUST USE THIS PERL SCRIPT. If you try to run each of these programs\n\ directly, you are on your own and you will probably spend a lot of\n\ time needlessly.\n\ \n\ \n\ 3.1) Using firefox, Safari, Internet Explorer, or some other browser\n\ on the computer you used for step 4, open this URL:\n\ \n\ http://bozeman.mbt.washington.edu/consed/consed.html#howToGet\n\ \n\ Click on the appropriate type of computer. Your browser (e.g.,\n\ firefox) will ask you what you want to name the file. Just use the\n\ default.\n\ \n\ \n\ If you are denied access, *carefully* follow the instructions on the\n\ \"Don't have a cow, man--You are not authorized to get this document\"\n\ page, including the try-to-get-consed part. Please do not email David\n\ Gordon until after you have followed these instructions.\n\ \n\ 3.2) Transfer the file to a UNIX (not a Windows) computer. Then type\n\ whichever of the following is appropriate for you. (This depends on\n\ what name you gave the file when you saved it.)\n\ \n\ \n\ zcat consed_linux.tar.gz | tar -xvf -\n\ zcat consed_mac.tar.gz | tar -xvf -\n\ zcat consed_solaris.tar.gz | tar -xvf -\n\ zcat consed_solaris_x86.tar.gz | tar -xvf -\n\ \n\ Note: You must run tar on a UNIX computer, not on a Windows computer,\n\ because the two handle line breaks differently.\n\ \n\ 3.3) I suggest you put Consed, phred, cross_match, phrap, the perl\n\ scripts, and other executables into /usr/local/genome/bin. So create\n\ /usr/local/genome/bin and \"cp\" all of the executables into it.\n\ \n\ If you can't actually use /usr/local/genome, then you could make\n\ /usr/local/genome a link to the real location--that will work just\n\ as well.\n\ \n\ As a third choice, if you want to use another location xxx, then put:\n\ \n\ setenv CONSED_HOME xxx\n\ \n\ into the .cshrc of all Consed users (for users of bash, instead put\n\ \n\ export CONSED_HOME=xxx\n\ \n\ into their .bashrc or equivalent)\n\ \n\ and create $CONSED_HOME/bin and $CONSED_HOME/lib and put all of these\n\ programs into $CONSED_HOME/bin\n\ \n\ 3.4) Figure out the correct Consed executable file to use. If you are\n\ using Linux, type the following in order. Use the first one\n\ that does not give an error but simply says \"Version 20.0\".\n\ \n\ consed_linux64bit -v\n\ consed_linux64bit_static -v\n\ consed_linux32bit_dyn -v\n\ consed_linux32bit -v\n\ consed_linux_itanium -v\n\ \n\ If it says something like \"Exec format error.
Wrong Architecture.\",\n\ try another executable.\n\ \n\ If it says something like \"error while loading shared libraries:\n\ libXp.so.6: cannot open shared object file: No such file or\n\ directory\", try another executable.\n\ \n\ \n\ \n\ If none of them work, then determine what kind of computer you have by\n\ typing:\n\ \n\ uname -a\n\ \n\ "; static char szReadMe6[] = "\n\ If it says something like this:\n\ \n\ Linux lake.interim.stanford.edu 2.6.9-78.0.1.ELsmp #1 SMP Tue Jul 22\n\ 18:11:48 EDT 2008 i686 i686 i386 GNU/Linux\n\ \n\ where there is an \"i686\" or \"i386\", then you have 32 bit linux.\n\ \n\ If it says something like this:\n\ \n\ Linux lake.interim.stanford.edu 2.6.9-67.ELsmp #1 SMP Wed Nov 7\n\ 13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux\n\ \n\ where there is an \"x86_64\" present, then you have 64 bit linux.\n\ \n\ If it says something like this:\n\ \n\ Linux lake.interim.stanford.edu 2.4.21-sgi240rp04041413_10065 #1 SMP\n\ Wed Apr 14 13:09:51 PDT 2004 ia64 unknown\n\ \n\ where there is an \"ia64\" present, then you have itanium linux. 
\n\ \n\ \n\ \n\ If you have multiple computers, some may be 64 bit and others 32 bit.\n\ \n\ \n\ \n\ For mac, try them in this order:\n\ \n\ consed_mac_intel -v\n\ consed_mac_ppc -v\n\ \n\ If it says something like \"Bad CPU type in executable\", don't use that\n\ executable.\n\ \n\ If it says \" Library not loaded: /usr/X11/lib/libX11.6.dylib\n\ Referenced from: /usr/local/genome/bin/consed\n\ Reason: Incompatible library version: consed requires version 9.0.0 or\n\ later, but libX11.6.dylib provides version 6.2.0\"\n\ try the other executable.\n\ \n\ \n\ \n\ \n\ \n\ 3.5) Put the Consed executable in /usr/local/genome/bin (or $CONSED_HOME/bin).\n\ \n\ Read the appropriate section of this document: NOTE TO SOLARIS USERS,\n\ NOTE TO MACOSX USERS, NOTE TO LINUX USERS (32 BIT), NOTE TO LINUX\n\ USERS (64 BIT) or (if you are running Linux on an Itanium--a big 64\n\ bit box) NOTE TO ITANIUM LINUX USERS.\n\ \n\ 3.6) In /usr/local/genome/bin (or $CONSED_HOME/bin):\n\ \n\ Type:\n\ ln -s (consed executable name) consed\n\ \n\ where (consed executable name) is the executable that worked in step\n\ 3.4, such as one of:\n\ \n\ consed_linux64bit\n\ consed_linux32bit\n\ consed_linux_itanium\n\ consed_mac\n\ consed_solaris\n\ consed_solaris_intel\n\ \n\ This enables you to just use \"consed\" instead of consed_linux32bit (or\n\ whatever) in all commands to consed. It is also important because the\n\ scripts refer to \"consed\" rather than to any of the names such as\n\ \"consed_linux32bit\".\n\ \n\ \n\ 3.7) Make sure that /usr/local/genome/bin (or $CONSED_HOME/bin) is in\n\ every Consed user's PATH.\n\ \n\ \n\ 3.8) Check this by logging on as a user and typing:\n\ \n\ rehash (don't worry if the rehash command says \"not found\")\n\ consed -V\n\ \n\ \n\ You should see 'Version 20.0'.
If you see something else, you have\n\ some debugging to do.\n\ \n\ \n\ 3.9) Check that the correct version of cross_match is installed by\n\ typing:\n\ \n\ cross_match\n\ \n\ You should see:\n\ \n\ \n\ \n\ > cross_match\n\ \n\ cross_match cross_match \n\ cross_match version 1.080721\n\ \n\ "; static char szReadMe7[] = "\n\ cross_match version 1.080721\n\ Reading parameters ... 1.008 Mbytes allocated -- total 1.008 Mbytes\n\ \n\ Run date:time 081205:135315\n\ Run date:time 081205:135315\n\ FATAL ERROR: Sequence files must be specified on command line. See documentation.\n\ \n\ FATAL ERROR: Sequence files must be specified on command line. See documentation.\n\ \n\ \n\ \n\ where 1.080721 is the version date (080721, in the form YYMMDD, after\n\ ignoring the leading \"1.\"). It must be this date or more recent.\n\ Otherwise, follow the instructions above for getting cross_match\n\ (which is part of the phrap package).\n\ \n\ \n\ \n\ 3.10) SETTING UP TEST DIRECTORIES\n\ \n\ Copy the test directories and their contents to some location where\n\ the users have write access. Copy--do not move them--because the\n\ users will occasionally want a fresh copy. I've written the command\n\ to make it easy for you to cut/paste from this document to the command\n\ line:\n\ \n\ cp -r 454_newbler align454reads align454reads_answer assembly_view \\n\ autofinish solexa_example solexa_example_answer polyphred standard \\n\ selectRegions selectRegionsAnswer \\n\ (new_location) \n\ \n\ cd (new_location)\n\ chmod -R a+w *\n\ \n\ \n\ 3.11) PRELIMINARY TESTING OF CONSED BEFORE COMPLETING THE REST OF THE INSTALLATION\n\ \n\ From the (new_location) where you put the test directories,\n\ type the following:\n\ \n\ cd standard/edit_dir\n\ \n\ \n\ 3.12) Start Consed by typing:\n\ consed\n\ \n\ If you get some error such as:\n\ \n\ Error: Can't open display:\n\ \n\ then the problem probably has nothing to do with Consed, but rather\n\ with X.
To test this, run some other X application (such as xclock,\n\ xterm, xeyes, or xcalc) and see if you get the same error. (If you\n\ are running on MACOSX, you must start X11 and then consed in an\n\ xterm--see NOTE TO MACOSX USERS below.) The problem may be due to\n\ your X emulator. See 'MONITORS AND MICE FOR CONSED' below.\n\ \n\ Don't worry about a message like:\n\ Warning: Cannot convert string \"helvetica\" to type FontStruct\n\ \n\ \n\ Two windows will appear. One of these will have the list of .ace\n\ files and say 'select assembly file to open' and\n\ 'standard.fasta.screen.ace.1'. Double click on\n\ \"standard.fasta.screen.ace.1\". The first window will go away.\n\ \n\ You will now see a list of one contig and a list of reads. This is the\n\ 'Consed Main Window'. \n\ \n\ Double click on 'Contig1'.\n\ \n\ The 'Aligned Reads Window' will appear. \n\ \n\ Then follow the \"COPY AND PASTE\" instructions (elsewhere in this\n\ document) and check that that works. (This will not work on some\n\ versions of macosx. 
It should work on linux--if it doesn't, read the\n\ NOTE TO LINUX USERS below.)\n\ \n\ If this all works, consider this preliminary test successful.\n\ \n\ 3.13) Build phd2fasta:\n\ Go to the misc/phd2fasta directory and type 'make'\n\ Move the phd2fasta executable to /usr/local/genome/bin (or $CONSED_HOME/bin)\n\ \n\ 3.14) Build mktrace:\n\ Go to the misc/mktrace directory and type 'make'\n\ (If you get any warnings about \"gets\", ignore them.)\n\ Move the mktrace executable to /usr/local/genome/bin (or $CONSED_HOME/bin)\n\ \n\ 3.15) Build the 454 software:\n\ Go to the misc/454 directory and type (see below for solaris)\n\ \n\ gcc sff2scf.c -o sff2scf\n\ gcc sff2scfAndPhd.c -o sff2scfAndPhd\n\ gcc sffinfo.c -o sffinfo\n\ \n\ (or substitute your compiler for gcc). Move sff2scf, sff2scfAndPhd,\n\ and sffinfo into /usr/local/genome/bin (or $CONSED_HOME/bin).\n\ \n\ For solaris, type:\n\ \n\ "; static char szReadMe8[] = "\n\ gcc -DSOLARIS sff2scf.c -o sff2scf\n\ gcc -DSOLARIS sff2scfAndPhd.c -o sff2scfAndPhd\n\ gcc -DSOLARIS sffinfo.c -o sffinfo\n\ \n\ Copy the executables to /usr/local/genome/bin (or $CONSED_HOME/bin):\n\ \n\ cp sff2scf sff2scfAndPhd sffinfo /usr/local/genome/bin\n\ \n\ 3.16) Move all perl scripts from the scripts directory to\n\ /usr/local/genome/bin (or $CONSED_HOME/bin)\n\ Make sure all are executable by typing:\n\ chmod a+x *\n\ Make sure all are readable by typing:\n\ chmod a+r *\n\ \n\ 3.17) Create a subdirectory /usr/local/genome/lib (or $CONSED_HOME/lib)\n\ \n\ 3.18) In /usr/local/genome/lib (or $CONSED_HOME/lib), put phredpar.dat,\n\ which comes with phred.\n\ \n\ 3.19) Create a subdirectory /usr/local/genome/lib/screenLibs. (If you\n\ are using a location other than /usr/local/genome for the root of all\n\ Phred/Phrap/Consed programs, create $CONSED_HOME/lib/screenLibs.)
\n\ \n\ 3.20) From the misc subdirectory, copy the following files to the\n\ directory /usr/local/genome/lib/screenLibs (or\n\ $CONSED_HOME/lib/screenLibs).\n\ \n\ filter454Reads.fa\n\ primerCloneScreen.seq\n\ primerSubcloneScreen.seq \n\ repeats.fasta\n\ sffLinkers.fa\n\ singleVectorForRestrictionDigest.fasta\n\ vector.seq \n\ \n\ \n\ filter454Reads.fa is the puc19 vector used to produce 454 reads. 454\n\ reads containing puc19 vector are eliminated.\n\ \n\ primerCloneScreen.seq is used to screen candidate primers when you use\n\ Consed's function \"Pick Primer from Clone Template\" (on the Aligned\n\ Reads Window).\n\ \n\ primerSubcloneScreen.seq is used to screen candidate primers when you\n\ use Consed's function \"Pick Primer from Subclone Template\" (on the\n\ Aligned Reads Window).\n\ \n\ repeats.fasta is used to tag repeats (to put a blue line under the bases)\n\ \n\ vector.seq is used to mask the parts of reads that are from vector\n\ rather than insert\n\ \n\ sffLinkers.fa contains the linkers for 454 reads that separate the 2\n\ reads of a read pair.\n\ \n\ \n\ Take a look at files primerCloneScreen.seq, primerSubcloneScreen.seq,\n\ repeats.fasta, and vector.seq: They are dummy files indicating the fasta\n\ format of the sequences that should be put in them. \n\ \n\ 3.21) You should put\n\ into primerCloneScreen.seq the vector sequence of the cloning vectors\n\ you are using (BAC or cosmid) and into primerSubcloneScreen.seq the\n\ sequencing vectors you are using (plasmid, M13, etc). Don't be too\n\ generous in putting lots of vectors into the files! The larger they\n\ are, the slower primer picking will be. 
Our files are only this big:\n\ \n\ -rw-r--r-- 1 root root 29938 Nov 7 1997 primerCloneScreen.seq\n\ -rw-r--r-- 1 root root 7381 Aug 13 1997 primerSubcloneScreen.seq\n\ \n\ and primer picking is quite fast enough.\n\ \n\ TESTING PRIMER PICKING\n\ \n\ 3.22) Follow the steps above under PRELIMINARY TESTING OF CONSED BEFORE\n\ COMPLETING THE REST OF THE INSTALLATION to bring up the Aligned Reads\n\ Window on Contig1.\n\ \n\ Go to some location near the right end of the contig, say base\n\ 2470. Click with the right mouse button on the consensus and click on\n\ either one of the top strand primer choices (either from subclone\n\ template or from clone template). Consed will pause a moment, and\n\ then there will appear a selection of primers that pass all of\n\ Consed's requirements. (If you get an error message, Consed might not\n\ have been correctly installed. See INSTALLING CONSED above.)\n\ Templates are also chosen for each primer. You may have to scroll the\n\ primer list to the right to see the templates. Consed lists these\n\ templates in order of quality--all of them will cover the read you\n\ want to make.\n\ \n\ 3.23) You should put into the file\n\ /usr/local/genome/lib/screenLibs/vector.seq\n\ \n\ (or $CONSED_HOME/lib/screenLibs/vector.seq if you are not using\n\ /usr/local/genome for the root of the Phred/Phrap/Consed files.)\n\ \n\ the vector sequences (in FASTA format) that you want\n\ to mask out before running phrap. In general, it is the combination of\n\ primerCloneScreen.seq and primerSubcloneScreen.seq. 
I've given you a\n\ "; static char szReadMe9[] = "\n\ dummy file, but you should replace it with your real vector sequences.\n\ \n\ 3.24) You should put into the file\n\ /usr/local/genome/lib/screenLibs/repeats.fasta\n\ \n\ (or $CONSED_HOME/lib/screenLibs/repeats.fasta if you are not using\n\ /usr/local/genome for the root of the Phred/Phrap/Consed files.)\n\ \n\ any sequences (in FASTA format) that you want to have automatically\n\ tagged (visibly marked by a blue line in Consed). These are typically\n\ Alu sequences. If you don't want to tag anything, comment out the\n\ following lines in phredPhrap by putting '#' as the first character\n\ of each line. That is, change:\n\ !system( \"$tagRepeats $szAceFileToBeProduced\" )\n\ || die \"some problem running $tagRepeats\";\n\ \n\ to:\n\ #!system( \"$tagRepeats $szAceFileToBeProduced\" )\n\ # || die \"some problem running $tagRepeats\";\n\ \n\ 3.25) You should create a file\n\ /usr/local/genome/lib/screenLibs/singleVectorForRestrictionDigest.fasta\n\ containing the cloning vector sequence. This file is used for in\n\ silico restriction digests, so the cloning vector sequence must start\n\ at precisely the site where you cut the (circular) vector to ligate\n\ the insert. It is not sufficient to just download the vector sequence\n\ from GenBank, because the GenBank entry may start the sequence at a\n\ different site.\n\ \n\ 3.26) ENOUGH MEMORY FOR CONSED\n\ \n\ Enough memory is vital with large datasets. Even if you have\n\ enough physical memory, the operating system may not allow a single\n\ process to use it all.
\n\ \n\ In csh or tcsh, type:\n\ \n\ limit\n\ \n\ You should see something like this:\n\ \n\ cputime unlimited\n\ filesize unlimited\n\ datasize 2097148 kbytes\n\ stacksize 8192 kbytes\n\ coredumpsize 0 kbytes\n\ vmemoryuse unlimited\n\ descriptors 64\n\ \n\ Type:\n\ limit datasize unlimited\n\ Then type:\n\ limit\n\ to confirm that the datasize limit has changed.\n\ \n\ (In bash, the equivalent commands are \"ulimit -a\" to show the limits\n\ and \"ulimit -d unlimited\" to raise the datasize limit.)\n\ \n\ 3.27) Make sure you have enough swap space to support the amount of RAM\n\ on the computer.\n\ \n\ \n\ \n\ To get you started with the demonstration, I've provided a\n\ singleVectorForRestrictionDigest.fasta file (see 3.25) that will work\n\ for the test datasets but will not work for your own data.\n\ \n\ 3.28) TESTING THE INSTALLATION\n\ \n\ After installing Consed, you should run all the following tests to\n\ make sure you have installed everything correctly:\n\ \n\ If one of the tests (below) fails with a message like:\n\ \n\ \"couldn't execute ...\"\n\ \n\ then you can troubleshoot the problem by going to the directory where\n\ this error occurred and typing the command that failed.
If the command\n\ includes any output redirection (e.g., 2>/dev/null or >>temp or >temp),\n\ remove the 2> or > and everything after it on the line so that all\n\ output comes to your screen.\n\ \n\ \n\ 3.29) TESTING ADDING SOLEXA READS\n\ \n\ Follow the 8 steps under \"ADDING SOLEXA READS\" (below).\n\ \n\ Troubleshooting: If you get an error like this:\n\ \n\ couldn't execute time /home/genome/BioSw/consed18/bin/cross_match\n\ reads081205_130653.fa.0 bacref.fa -discrep_lists -tags -masklevel 0\n\ -minscore 25 -gap1_only -repeat_screen 2\n\ >>alignmentFile.081205_130653.cross.0 2>/dev/null\n\ \n\ then run it on the command line without \"time\" and without the \">>\"\n\ and \"2>\" redirections (and the filenames that follow them) so you can\n\ see any errors:\n\ \n\ /home/genome/BioSw/consed18/bin/cross_match\n\ reads081205_130653.fa.0 bacref.fa -discrep_lists -tags -masklevel 0\n\ -minscore 25 -gap1_only -repeat_screen 2\n\ \n\ If this says:\n\ "; static char szReadMe10[] = "\n\ FATAL ERROR: Command line option -gap1_only not recognized\n\ then you are not running the correct version of cross_match (see\n\ above).\n\ \n\ \n\ \n\ 3.30) TESTING ADDING 454 READS\n\ \n\ Follow the 4 steps under \"USING 454 READS (ALIGNING TO REFERENCE\n\ SEQUENCE)\" (below).\n\ \n\ 3.31) TESTING 454 READS (NEWBLER ASSEMBLY)\n\ \n\ Follow the first 6 steps under \"USING 454 READS (NEWBLER ASSEMBLY)\",\n\ and be especially sure that the traces pop up.\n\ \n\ \n\ 3.32) TESTING ADD NEW READS\n\ \n\ It will make your life easier if phred, phrap, and cross_match are\n\ all where Consed expects them: in /usr/local/genome/bin.\n\ \n\ 3.33) Decide where to put phred's parameter file phredpar.dat and edit\n\ both addReads2Consed.perl and phredPhrap to reflect this location. I\n\ generally prefer to put it in /usr/local/genome/lib to keep all of the\n\ Phred/Phrap/Consed files in one place.\n\ \n\ 3.34) Next you should test the ADD NEW READS step in the Quick Tour\n\ (below).
This step requires that everything be set up correctly and\n\ in the correct location. Hopefully the error messages are clear\n\ enough to help you if you have set up anything incorrectly.\n\ \n\ 3.35) TESTING RUNNING CROSS_MATCH FROM ASSEMBLY VIEW\n\ \n\ See RUNNING CROSS_MATCH FOR SEQUENCE MATCHES (below) and make sure\n\ that step works.\n\ \n\ 3.36) TEST RUNNING PHREDPHRAP\n\ \n\ See the section RUNNING PHRED and PHRAP (below).\n\ \n\ \n\ 3.37) TESTING MINIASSEMBLIES\n\ \n\ See PULLING OUT READS AND RE-ASSEMBLYING THEM (MINIASSEMBLIES) and\n\ MINIASSEMBLIES (below) and make sure those steps work.\n\ \n\ Miniassemblies require the newer version of phredPhrap. If you have\n\ invested a lot of work customizing a much older version of phredPhrap\n\ (e.g., 10 years old) and don't want to upgrade, you can keep your\n\ customized version of phredPhrap for regular assemblies and use the\n\ new version of phredPhrap for miniassemblies. To do this, specify the\n\ name/location of the new phredPhrap (the one to use for\n\ miniassemblies) with the .consedrc parameter:\n\ \n\ consed.fullPathnameOfMiniassemblyScript: /usr/local/genome/bin/phredPhrap\n\ \n\ (See CONSED CUSTOMIZATION below.)\n\ \n\ \n\ ------ NOTE: You might be done installing consed --------\n\ \n\ \n\ The following 4 installation steps are only necessary if you are using\n\ Autofinish or Consed's primer picker *and* you are using\n\ Sanger reads.
Otherwise, you can skip:\n\ \n\ MODIFYING determineReadTypes.perl\n\ TROUBLESHOOTING YOUR CHANGES TO determineReadTypes.perl\n\ FAKE READS\n\ APPENDING EXPID TO THE PHD FILES\n\ \n\ \n\ 3.38) MODIFYING determineReadTypes.perl\n\ \n\ Read the comments in determineReadTypes.perl.\n\ \n\ Phrap, Consed's primer picking, and Consed/Autofinish all need the\n\ following information for each read:\n\ is it a universal primer forward, a universal primer reverse,\n\ or a walking read?\n\ what is its template name?\n\ \n\ If you are using different libraries that have different insert sizes,\n\ then Consed/Autofinish also need the library name for each read.\n\ \n\ Generally this information can be determined from the read name, using\n\ *your* naming convention. Modify the perl script\n\ determineReadTypes.perl to put this information at the end of the phd\n\ file using WR info items.\n\ \n\ If you don't want to do much perl programming and all your libraries\n\ have the same insert size, you have the option of using the St Louis\n\ naming convention. In this case, you needn't do anything with\n\ determineReadTypes.perl.\n\ \n\ If you do customize determineReadTypes.perl, you must also uncomment\n\ (remove the \"#\"s in column 1) the lines in the phredPhrap script that\n\ say roughly:\n\ \n\ #print \"\\n\\n--------------------------------------------------------\\n\";\n\ "; static char szReadMe11[] = "\n\ #print \"Now running determineReadTypes.perl...\\n\";\n\ #print \"--------------------------------------------------------\\n\\n\\n\";\n\ \n\ #!system( \"$determineReadTypes\" ) || die \"some problem running determineReadTypes.perl $!\\n\";\n\ \n\ But what is the St Louis naming convention? Most of it (but not all)\n\ is explained in the file phrap.doc that comes with phrap. In addition,\n\ you must never use an underscore in the name if the read is a\n\ universal primer forward or universal primer reverse read.
If the\n\ read is a walk, the template name must be followed by an underscore\n\ (_) and then a number (the oligo number).\n\ \n\ Examples of reads in the St Louis naming convention:\n\ \n\ read eeq03a01.g1 is univ rev template: eeq03a01 library: eeq03\n\ read eeq03a02.b1 is univ fwd template: eeq03a02 library: eeq03\n\ read eeq03a02.g1 is univ rev template: eeq03a02 library: eeq03\n\ read eeq03a03.b1 is univ fwd template: eeq03a03 library: eeq03\n\ read eej45h07_2.i1 is walk template: eej45h07 library: eej45\n\ read eej46c12_1.i1 is walk template: eej46c12 library: eej46\n\ \n\ \n\ Once you have correctly customized determineReadTypes.perl,\n\ uncomment the line in phredPhrap that calls determineReadTypes.perl.\n\ \n\ It is fine to assume the St Louis naming convention for the purpose of\n\ the sample dataset directories that come with Consed (\"standard\",\n\ \"assembly_view\", \"autofinish\", and \"polyphred\").\n\ \n\ 3.39) TROUBLESHOOTING YOUR CHANGES TO determineReadTypes.perl\n\ \n\ Consed allows you to check that you have correctly modified\n\ determineReadTypes.perl: on the Consed Main Window, point to 'Info',\n\ hold down the left mouse button, and release on 'Show Info for Each\n\ Read'. Study all the information and check that it is correct. If,\n\ for example, Consed thinks that some templates have 9 or more reads,\n\ you probably have not correctly customized determineReadTypes.perl.\n\ \n\ You will see a section that looks like this:\n\ \n\ template djs736a2_fp04q286 with 2 reads\n\ djs736a2_fp04q286.x2 term universal forward (from phd file)\n\ djs736a2_fp04q286.y2 term universal reverse (from phd file)\n\ \n\ You want to see the \"from phd file\" part.
If, instead of \"from phd\n\ file\", it says \"inferred from name\", that means that\n\ determineReadTypes.perl couldn't figure out what kind of read it was.\n\ \n\ If you think you have made a mistake in customizing\n\ determineReadTypes.perl, it is best to delete the PHD files (and\n\ phd.ball if you are using one) and run phredPhrap again, because\n\ otherwise the incorrect WR items will remain in the PHD files.\n\ \n\ The script determineReadTypes.perl itself contains more detailed\n\ documentation about how to customize it.\n\ \n\ CUSTOMIZING determineReadTypes.perl: SPECIAL CASES\n\ \n\ \n\ 3.40) FAKE READS\n\ \n\ By \"fake reads\" I mean reads for which there is no chromatogram (and\n\ there never was one), such as reads created from a GenBank reference\n\ sequence or from the consensus of some other assembly. If you don't\n\ use any such reads, you can skip this step.\n\ \n\ In the past, any read whose name ended with .a2 or .c3 (where 2 and 3\n\ could be any numbers) was considered a fake read. Now you can tell\n\ Autofinish not to assume this by setting the .consedrc parameter (see\n\ CONSED CUSTOMIZATION):\n\ \n\ consed.fakeReadsSpecifiedByFilenameExtension: false\n\ \n\ \n\ \n\ In that case, you must instead have determineReadTypes.perl put \"fake\"\n\ into the \"type:\" field of a \"template\" WR item. See\n\ determineReadTypes.perl for more information.\n\ \n\ \n\ 3.41) APPENDING EXPID TO THE PHD FILES\n\ \n\ If you are not using Autofinish, you can skip this step. If you are\n\ using Autofinish and would like Autofinish to tell you how well your\n\ reads are succeeding, then you must append the experiment ids\n\ (expids) to the phd files.
In the 3 Autofinish summary files (*.univReverse,\n\ *.univForwards, and *.customPrimers), you will see information like\n\ this:\n\ \n\ univ rev,,,->,-329,-249,71,Contig1,3,djs228_1034\n\ \n\ or this:\n\ \n\ tgaagaaatggctgactcc,56,1,->,3258,3338,3658,Contig1,4,djs228_2813,5,djs228_168,6,djs228_1248\n\ \n\ The '3' just before djs228_1034 on the line starting with \"univ\n\ rev\" is an experiment id (expid). There is\n\ "; static char szReadMe12[] = "\n\ also an expid '4' just before djs228_2813, an expid '5' just before\n\ djs228_168, and an expid '6' just before djs228_1248.\n\ \n\ Autofinish doesn't know what you will end up calling the reads it is\n\ telling you to make; it only knows them by the numbers 3, 4, 5, and 6.\n\ So when you make the reads, you need to tell Autofinish which read is\n\ 'experiment 3' (and so on). You do this by appending the following\n\ structure to the phd file:\n\ \n\ WR{\n\ expid addExpid 990811:140818\n\ 5\n\ }\n\ \n\ where WR stands for 'whole read item',\n\ expid is the type of the item\n\ addExpid is the name of the program that you will write to\n\ append this information\n\ 990811:140818 is the date and time in the format YYMMDD:HHMISS\n\ 5 is the expid\n\ \n\ This program must run *after* phred has created the phd files, so it\n\ must have some way of determining the expid of each read. The\n\ University of Washington Genome Center has the finishers put the expid\n\ into the filename. This makes it easy for a program to look at the\n\ phd file, figure out what the expid is, and then write the WR item\n\ into that phd file.
\n\ \n\ Alternatively, you could keep a database and, after the phd file is\n\ created, look up the expid in the database.\n\ \n\ When you have successfully added expids to the phd files, the next\n\ time you run Autofinish on this project, look at the 'EVALUATE' section\n\ of the Autofinish output file: it shows detailed information about how\n\ well the reads succeeded.\n\ \n\ \n\ \n\ \n\ --------------------------------------------------------------------------\n\ 4. NOTE TO LINUX USERS (32 bit) (INSTALLATION)\n\ \n\ Are you certain that your computer is not a 64 bit computer? If it is\n\ 64 bit, download consed_linux64bit instead of this version, because the\n\ 64 bit version will allow you to use consed on larger assemblies.\n\ \n\ We have found a large variation among different linux systems (even\n\ those with the same kernel), so I have provided 2 different\n\ executables (consed_linux32bit and consed_linux32bit_dyn) in the hope\n\ that one of them will work for you.\n\ \n\ With one of them, consed may not come up at all but instead terminate\n\ with an error such as the following:\n\ \n\ > ./consed\n\ ./consed: error while loading shared libraries: libstdc++-libc6.2-2.so.3: cannot open shared object file: No such file or directory\n\ \n\ or\n\ \n\ > ./consed: symbol regexec, version GLIBC_2.3.4 not defined in file\n\ libc.so.6 with link time reference\n\ \n\ (See below for suggestions from Consed/linux users with similar\n\ experiences.)\n\ \n\ 4.1) If Consed does come up, do the following test:\n\ \n\ Bring up Consed with the standard dataset as shown in QUICK TOUR OF\n\ CONSED (below) and open standard.fasta.screen.ace.1 as shown in the\n\ QUICK TOUR. After Consed is up, on the Main Consed Window there is a\n\ menu \"Help\" on the top right. Push the left mouse button down on the\n\ Help menu. A list of choices will appear.
While still\n\ holding down the left mouse button, drag the cursor to \"Test Exception\n\ Handling\" and release the left mouse button.\n\ \n\ If a popup window appears with a \"Dismiss\" button, you are fine (but\n\ you should still read the rest of this note). If Consed terminates,\n\ then this Consed executable does not work with the exception handling\n\ shared libraries you have installed. Try a different consed\n\ executable or find different shared libraries, as discussed below.\n\ \n\ If you are using consed_linux2.4, the following file must be in\n\ /usr/lib:\n\ libstdc++-libc6.2-2.so.3\n\ \n\ If you try to run consed and this file is missing, you will see an\n\ error message like this:\n\ \n\ > ./consed\n\ ./consed: error while loading shared libraries: libstdc++-libc6.2-2.so.3: cannot open shared object file: No such file or directory\n\ \n\ I have provided this file in case you don't have it. Just put it in\n\ /usr/lib and see if that fixes the problem.\n\ \n\ One consed user reports:\n\ \n\ did a little poking around and found that i needed:\n\ compat-libstdc++-7.3-2.96.118 RPM for i386 since i'm running fedora\n\ core 1 at the moment. ... Anyway, if anyone gets this error tell\n\ them they're missing the Standard C++ libraries for Red Hat 7.3\n\ backwards compatibility compiler and it can be downloaded here:\n\ "; static char szReadMe13[] = "\n\ \n\ http://www2.linuxforum.net/RPM/fedora/core/1/Fedora/RPMS/compat-libstdc++-7.3-2.96.118.i386.html\n\ \n\ \n\ If you get warnings like this:\n\ \n\ Warning: String to TranslationTable conversion encountered errors\n\ Warning: translation table syntax error: Unknown keysym name: osfActivate\n\ Warning: ... found while parsing ':osfActivate: \n\ PrimitiveParentActivate()'\n\ \n\ you can fix them by typing (in sh or bash):\n\ export XKEYSYMDB=/usr/share/X11/XKeysymDB\n\ \n\ or (in csh or tcsh):\n\ setenv XKEYSYMDB /usr/share/X11/XKeysymDB\n\ \n\ If you can't cut and paste (for example, if you highlight a segment of\n\ the consensus sequence and try to paste it into the search window, and\n\ it gets highlighted but nothing gets pasted), fix it by\n\ 
\n\ using the dynamic executable consed_linux2.6_dyn,\n\ which requires libXm.so.3. If you have libXm.so.4.0.0 but not\n\ libXm.so.3, you might have to create a symbolic link:\n\ ln -s /usr/lib/libXm.so.4.0.0 /usr/lib/libXm.so.3\n\ \n\ (It looks like libXm.so.4.0.0 comes from openmotif-2.3.0-0.1.9.3.)\n\ \n\ \n\ One Consed user reported the following error message:\n\ \n\ \"Consed error : could not allocate colormap entry for color \"yellow\".\n\ This may be due to a spelling error in your color name. Or this may be\n\ because some other application is a hog. If it is netscape, run\n\ netscape via netscape -install\"\n\ \n\ He said that the fix above (consed_linux2.6_dyn and libXm.so.3) fixed\n\ this problem as well.\n\ \n\ Another system administrator said:\n\ \n\ \"I got the error:\n\ \n\ ../../consed_linux: error while loading shared libraries:\n\ libstdc++-libc6.2-2.so.3: cannot open shared object file: No such file\n\ or directory\n\ \n\ In order to resolve the issue on linux boxes you need to install the\n\ compatibility libraries.\n\ \n\ \"To be on the safe side I installed the following\n\ \n\ compat-libstdc++-33.i386 3.2.3-61\n\ gcc-c++.i386 4.1.2-14.el5\n\ gcc.i386 4.1.2-14.el5\n\ cpp.i386 4.1.2-14.el5\n\ libstdc++-devel.i386 4.1.2-14.el5\n\ libgomp.i386 4.1.2-14.el5\n\ libstdc++.i386 4.1.2-14.el5\n\ libgcc.i386 4.1.2-14.el5\n\ \n\ \"I believe you may get away with just the first compat-libstdc\n\ package.\"\n\ \n\ \n\ \n\ \n\ ----------------------------------------------------------------------------\n\ 5. NOTE TO LINUX USERS (64 BIT) (INSTALLATION)\n\ \n\ I've supplied two executables: consed_amd64 (statically linked) and\n\ consed_linux64bit (dynamically linked). Try the first one first. If\n\ Consed doesn't come up at all, then try the second one.
The kind of\n\ problems you might have would cause consed to immediately terminate,\n\ so if consed comes up at all (you can see the Consed Main Window),\n\ that particular executable is fine for you. (See QUICK TOUR OF CONSED\n\ for how to start Consed--you must be in the correct directory.)\n\ \n\ If you can't copy and paste (see COPY AND PASTE elsewhere in this\n\ document: if you highlight a segment of the consensus sequence, you\n\ should be able to paste it into the search window), try the\n\ dynamically linked executable consed_linux64bit.\n\ \n\ One user (July 2010) reported:\n\ \n\ On Ubuntu 10.04, (Lucid) Static Consed Version 19.0 (090206) gave the\n\ following message:\n\ consed: relocation error: /lib/libnss_files.so.2: symbol __rawmemchr,\n\ version GLIBC_2.2.5 not defined in file libc.so.6 with link time\n\ reference\n\ \n\ Shared/Dynamic linking Consed gave:\n\ /pkg/consed/consed_linux64bit: error while loading shared libraries:\n\ libstdc++.so.5: cannot open shared object file: No such file or\n\ directory\n\ \n\ I solved the problem by downloading and directly installing an old\n\ library using dpkg :\n\ http://packages.ubuntu.com/jaunty/amd64/libstdc++5/download\n\ \n\ \n\ Another user reported the following problem:\n\ \n\ "; static char szReadMe14[] = "\n\ \"now we are unable to copy/paste into Consed from text editors such as\n\ emacs or vim. However, copying/pasting within Consed works just\n\ fine.\"\n\ \n\ He then found the following fixed the problem:\n\ \n\ \"Initially, as I was following the installation instructions and\n\ couldn't verify the version number with a 'consed -v' command with the\n\ 'consed_linux64bit' executable (it complained about a missing library,\n\ libstdc++.so.5) I switched to the 'consed_linux64bit_static' executable\n\ and it returned the version number properly. After finishing the\n\ installation and attempting to work with our assembly data we hit some\n\ strange errors. 
On a hunch and following Joel Martin's advice not to\n\ use the _static executable we installed the compat-libstdc++-296 and\n\ compat-libstdc++-33 libraries on our fedora 8 64-bit system and\n\ reverted to the non-static executable. (These were the only two in\n\ the Legacy Software directory of our Fedora 8 repository.)\"\n\ \n\ \n\ Another user reported that consed_linux64bit could not find libXp.so.6.\n\ He solved this problem by downloading\n\ xorg-x11-deprecated-libs-6.8.2-1.EL.52.x86_64.rpm\n\ from\n\ http://rpm.pbone.net/index.php3/stat/4/idpl/8965447/com/xorg-x11-deprecated-libs-6.8.2-1.EL.52.x86_64.rpm.html\n\ or from\n\ http://rpm.pbone.net/index.php3/stat/3/srodzaj/1/search/libXp.so.6()(64bit)\n\ and installing the rpm package. He then added \"/usr/X11R6/lib64/\" to\n\ \"/etc/ld.so.conf\"\n\ and ran the command \"ldconfig\".\n\ \n\ Several users reported that consed_linux64bit gave:\n\ error while loading shared libraries: libXp.so.6: cannot open shared\n\ object file: No such file or directory\n\ \n\ and they solved the problem with the command:\n\ yum install libXp\n\ \n\ \n\ --------------------------------------------------------------------------\n\ 6. NOTE TO ITANIUM LINUX USERS (INSTALLATION)\n\ \n\ In /usr/lib, there must be a file: libstdc++-libc6.2-2.so.3\n\ \n\ If you try to run consed and this file is missing, you will see an\n\ error message like this:\n\ \n\ > ./consed\n\ ./consed: error while loading shared libraries: libstdc++-libc6.2-2.so.3: cannot open shared object file: No such file or directory\n\ \n\ If you don't have this file already, I have provided it with the\n\ linux consed distribution.\n\ \n\ \n\ --------------------------------------------------------------------------\n\ 7. NOTE TO SOLARIS USERS (INSTALLATION)\n\ \n\ A.\n\ Do not use /usr/ucb/cc !!!
How can you tell if you are using it?\n\ Type:\n\ \n\ which cc\n\ \n\ If it says /usr/ucb/cc, you must get gcc or else buy the commercial cc\n\ from Sun (which is /opt/SUNWspro/bin/cc).\n\ \n\ If you use /usr/ucb/cc, strange things will happen; for example,\n\ phd2fasta will cut off the first 2 characters of read names.\n\ \n\ B.\n\ If you are using large files or datasets, please use the executable\n\ consed_solaris64. The other executable cannot handle files larger\n\ than 2 GB.\n\ \n\ \n\ \n\ \n\ ----------------------------------------------------------------------------\n\ 8. NOTE TO MACOSX USERS (INSTALLATION)\n\ \n\ There are 2 ways to create /usr/local/genome:\n\ \n\ 8.1) Within a terminal window, type:\n\ \n\ cd /usr/local\n\ sudo mkdir genome\n\ sudo chmod 777 genome\n\ \n\ (The last command allows anyone to read and write to the genome\n\ directory. If you don't want to allow this much access, read about\n\ the chmod command and adjust it according to your wishes.)\n\ \n\ 8.2) From Finder. This is tricky, since Finder normally refuses to\n\ even show you /usr, which is a hidden folder. To make Finder show\n\ hidden folders, google \"showing hidden folders macosx\". Then create a\n\ folder \"genome\" within /usr/local and return to a terminal window for\n\ the rest of the installation.\n\ \n\ Please edit phredPhrap to reflect the correct location of nice\n\ (there is a note in the phredPhrap script about this).\n\ "; static char szReadMe15[] = "\n\ \n\ \n\ There are 2 consed executables:\n\ \n\ consed_mac_intel\n\ consed_mac_ppc\n\ \n\ If consed_mac_intel works, use it. If it gives an error such as:\n\ \n\ consed_mac_intel: Bad CPU type in executable.\n\ or\n\ dyld: Library not loaded: /usr/X11/lib/libX11.6.dylib\n\ \n\ \n\ then try consed_mac_ppc.\n\ \n\ \n\ You must put /usr/local/genome/bin (or wherever you put consed and the\n\ scripts) into your path.
On MacOSX, you do this in the file .bash_profile\n\ in your home directory. Add this line:\n\ \n\ export PATH=/usr/local/genome/bin:$PATH\n\ \n\ When you log off and log back on, your new path will include consed.\n\ \n\ \n\ X-WINDOWS on MacOSX can have problems. To test X, type:\n\ \n\ xterm\n\ \n\ An xterm terminal window should appear.\n\ \n\ If not, here are some suggestions from various people, some\n\ of which may work and some of which may not:\n\ \n\ If you are using MacOSX 10.6, things should just work.\n\ \n\ If you are using MacOSX 10.5, there seem to be at least 2 problems\n\ with X11: one is that cut/paste does not work within X11. I've\n\ also heard that the $DISPLAY variable is not set automatically. Here\n\ is what one user suggests:\n\ \n\ The best workaround is to remove X11 altogether, and\n\ replace it with the previous version that was part of Mac OS 10.4. Here\n\ is how:\n\ \n\ Remove the X11 installation of Mac Os 10.5\n\ \n\ sudo rm -rf /usr/X11 /usr/X11R6\n\ sudo rm /System/Library/LaunchAgents/org.x.X11.plist\n\ (or sudo rm /System/Library/LaunchAgents/org.x.startx.plist)\n\ sudo rm /Library/Receipts/X11User.pkg\n\ sudo pkgutil --forget com.apple.pkg.X11DocumentationLeo\n\ sudo pkgutil --forget com.apple.pkg.X11User\n\ sudo pkgutil --forget com.apple.pkg.X11SDKLeo\n\ sudo pkgutil --forget org.x.X11.pkg\n\ \n\ Install the X11 installation of Mac Os 10.4\n\ (found on 10.4 installation CD:\n\ System/Installation/Packages/X11User.pkg)\n\ \n\ Install the newest xquartz (X11-2.3.2.1.dmg) found\n\ at http://xquartz.macosforge.org/trac/wiki\n\ \n\ You can start X11 from the dock and run consed just as usual.\n\ \n\ For other versions of MacOSX,\n\ one person suggests:\n\ \n\ http://sage.ucsc.edu/~wgscott/xtal/wiki/index.php/X11\n\ http://sage.ucsc.edu/~wgscott/xtal/wiki/index.php/X11_more_details\n\ \n\ Another says that for 10.5, an X-environment comes installed by\n\ default (XQuartz).
Information about XQuartz (and the newest\n\ versions) can be found at:\n\ http://xquartz.macosforge.org\n\ \n\ Another says that for older versions of macosx (10.4 and earlier):\n\ \n\ You must have an X environment on your Mac and you might need to turn\n\ it on. If you don't know how to do this, find someone locally who can\n\ help you.\n\ \n\ If you don't have an X environment already on your Mac, download one\n\ from Apple at www.apple.com/software. I suggest you use XDarwin in full\n\ screen mode. Use option-apple-A to move back and forth between the\n\ Mac desktop and the X environment.\n\ \n\ Another counters that XDarwin is not so friendly and instead suggests\n\ running the X11 version found at:\n\ \n\ http://www.apple.com/downloads/macosx/apple/x11formacosx.html\n\ \n\ or else OroborOSX (http://oroborosx.sourceforge.net/), for which a new\n\ (non-beta) version is available\n\ (http://oroborosx.sourceforge.net/download.html).\n\ \n\ Some people say that XDarwin is no longer supported.\n\ \n\ \n\ "; static char szReadMe16[] = "\n\ MICE\n\ \n\ \n\ If you have a 1-button mouse, I've found that:\n\ \n\ apple-click = right button click\n\ option-click = middle button click\n\ \n\ (With X11 up, you may need to go into X11 Preferences, Input and\n\ enable 3-button mouse emulation.)\n\ \n\ \n\ C COMPILER\n\ \n\ You will need a C compiler to compile some programs. If you can't\n\ find one on your computer, one is included in the Xcode package, which\n\ is on a CD that came with your Mac and is also available for download\n\ from Apple. You will need to get a free membership in Apple's ADC\n\ program to download it.\n\ \n\ \n\ \n\ \n\ ----------------------------------------------------------------------------\n\ \n\ 9.
QUICK TOUR OF CONSED\n\ \n\ \n\ Release 20.0\n\ \n\ Consed is a program for viewing and editing assemblies.\n\ \n\ If you are already an advanced Consed user, you should read through\n\ this and do the exercises on any features that you are unfamiliar\n\ with. I frequently run across people who have been doing something in\n\ Consed the hard way month after month and request a new feature to\n\ make it easier, when that feature is already in Consed.\n\ \n\ If you have never used Consed before, following this Quick Tour will\n\ take you less than 6 hours. I frequently run across people who do not\n\ have 6 hours to spare, so they skip the Quick Tour and then struggle\n\ for 2 days instead.\n\ \n\ When you do the Quick Tour, feel free to change the data set. If you\n\ really mess things up (for example, by changing all of a read's bases\n\ to N's), no problem: just delete the data set and start again with a\n\ fresh copy.\n\ \n\ 9.1) USING CONSED GRAPHICALLY\n\ \n\ 9.2) Type the following:\n\ \n\ cd standard/edit_dir\n\ \n\ \n\ 9.3) Start Consed by typing the appropriate command below (which\n\ command depends on what kind of computer you are running on and how\n\ your system administrator installed consed; ask him or her):\n\ \n\ consed\n\ consed_linux64bit\n\ consed_linux32bit\n\ consed_linux_itanium\n\ consed_mac\n\ consed_solaris\n\ consed_solaris_intel\n\ \n\ (Don't worry about a message like:\n\ Warning: Cannot convert string \"helvetica\" to type FontStruct )\n\ \n\ \n\ Two windows will appear. One of them will have the list of .ace\n\ files and say 'select assembly file to open' and\n\ 'standard.fasta.screen.ace.1'. Double click on\n\ \"standard.fasta.screen.ace.1\". The first window goes away.\n\ \n\ You will now see a list of one contig and a list of reads. This is the\n\ 'Consed Main Window'.\n\ \n\ Double click on 'Contig1'.\n\ \n\ The 'Aligned Reads Window' will appear.\n\ \n\ 9.4) SCROLLING\n\ \n\ Try scrolling back and forth.
Try scrolling by dragging the thumb of\n\ the scrollbar. Also try scrolling by clicking on the 4 buttons: << < > >>\n\ for scrolling by small amounts. For scrolling by tiny\n\ amounts, click on the arrows at either end of the scrollbar. For\n\ scrolling by huge amounts, use the middle mouse button and just click\n\ on some location on the scrollbar. For scrolling to the beginning or\n\ end of the contig, use the <<< or >>> buttons.\n\ \n\ (Question: why can't you just move the scrollbar to the extreme right\n\ in order to go to the end of the contig? Answer: in typical\n\ assemblies, there are reads that protrude beyond the beginning of the\n\ contig and reads that protrude beyond the end of the contig. Moving\n\ the scrollbar to the extreme right will scroll the contig to the\n\ end of the rightmost read--typically far to the right of the\n\ end of the contig. Thus you should get in the habit of using\n\ "; static char szReadMe17[] = "\n\ the <<< and >>> buttons.)\n\ \n\ 9.5) GOTO POSITION\n\ \n\ In the Aligned Reads Window, click in the 'Pos:' box in the upper\n\ right-hand corner. Type in a number, such as 540, and push the\n\ 'Return' or 'Enter' key. The Aligned Reads Window will scroll to\n\ position 540. We find this feature is particularly useful when one\n\ person wants another person to look at something in the sequence.\n\ \n\ (Little used feature: if you type in a number preceded by a \"*\" such\n\ as \"*947\", the cursor will be moved to padded position 947 which\n\ counts *'s (pads).)\n\ \n\ \n\ 9.6) COLORS\n\ \n\ Notice the colors. Scroll to position 937 and notice the read 'a'.\n\ The red bases are the ones that disagree with the consensus.\n\ \n\ Notice the different shades of grey background (around the bases).\n\ They refer to the quality (error probability) of the base. 
Quality\n\ values mean the following:\n\ \n\ A quality value of 10 means 1 error in 10 to the 1.0 power (1 in 10)\n\ A quality value of 20 means 1 error in 10 to the 2.0 power (1 in 100)\n\ A quality value of 30 means 1 error in 10 to the 3.0 power (1 in 1,000)\n\ A quality value of 40 means 1 error in 10 to the 4.0 power (1 in 10,000)\n\ \n\ and for quality values in between:\n\ \n\ A quality value of 25 means 1 error in 10 to the 2.5 power (about 1 in 316)\n\ \n\ Get the idea? In general, a quality value of Q means an error\n\ probability of 10 to the (-Q/10) power.\n\ \n\ \n\ (These values have actually been empirically verified. If you are\n\ interested in the gory details, read the phred papers:\n\ \n\ Ewing B, Hillier L, Wendl M, Green P: Basecalling of automated\n\ sequencer traces using phred. I. Accuracy assessment. Genome Research\n\ 8, 175-185 (1998).\n\ \n\ Ewing B, Green P: Basecalling of automated sequencer traces using\n\ phred. II. Error probabilities. Genome Research 8, 186-194 (1998).\n\ \n\ The same issue of the journal also contains a paper about Consed.)\n\ \n\ Also notice the upper- and lowercase bases. Case is just a cruder\n\ indication of the quality of the bases.\n\ \n\ 9.7) To see the quality value of a particular base, point at it and\n\ click with the left mouse button. You will see the quality displayed\n\ in the Info Box at the bottom of the Aligned Reads Window.\n\ \n\ \n\ These quality values are shown in grey scales:\n\ \n\ Quality 0 through 4 is shown as dark grey\n\ Quality 5 through 9 is shown a shade lighter\n\ Quality 10 through 14 is shown a shade lighter still\n\ .\n\ .\n\ .\n\ Quality 40 through 97 is shown as white (the brightest shade)\n\ \n\ A quality value of 99 is reserved for bases that have been edited when\n\ the user is absolutely sure of the base ('high quality edited').\n\ \n\ A quality value of 98 is reserved for bases that have been edited when\n\ the user is not sure of the base ('low quality edit').\n\ \n\ The ends of the reads show bases that are grey on a black\n\ background.
These are the low quality ends of the reads or the\n\
unaligned ends of reads, as determined by phrap.\n\
\n\
\n\
\n\
9.8) Click on a base on a read. Then hold down the control key and\n\
type 'a'. You will move to the beginning of the read. Hold down the\n\
control key and type 'e'. You will move to the end of the read.\n\
(Emacs users will recognize these commands.) You can also scroll by\n\
10 bp by using the \"<\" and \">\" keys on the keyboard (not using the mouse).\n\
\n\
\n\
9.9) HIGHLIGHTING READ NAMES\n\
\n\
In the Aligned Reads Window, click on a read name with the left mouse\n\
button. The name will turn magenta. Click again and it will turn\n\
yellow again. Try turning it magenta and then scrolling. This\n\
feature is helpful in keeping track of a particular read as you\n\
scroll.\n\
\n\
If you have an emacs window open (or any editor window), you can paste\n\
the read name into it just by clicking with the middle mouse button.\n\
This works because when you clicked on the read name in the Aligned\n\
Reads Window with the left mouse button, the read name was loaded into\n\
the paste buffer.\n\
\n\
\n\
9.10) DIMMING ENDS OF READS\n\
";
static char szReadMe18[] = "\n\
\n\
Scroll so that location 490 is roughly in the middle of the Aligned\n\
Reads Window. Push the left mouse button down on the menu item 'Dim'.\n\
A list of choices will appear. Drag the cursor\n\
down to 'Dim Nothing' and release. Now look at what happened to the\n\
color of the bases. The ends of the reads that used to have a\n\
black background now appear red with a grey background. You are\n\
seeing the clipped-off bases with all the same information as any\n\
other base. Since there are so many red (discrepant) bases,\n\
the screen becomes distracting and busy.
Thus by default the low\n\
quality clipped-off bases are displayed with a black background and a grey\n\
foreground so they don't distract you.\n\
\n\
Notice there is a distinction here between 'low quality ends of\n\
reads' and 'unaligned ends of reads'. Unaligned ends of reads can be\n\
low quality as well, or they can be high quality, as in the case of\n\
chimeric reads.\n\
\n\
Point with the mouse to a read name and hold down the right mouse\n\
button. You will notice there is a line that says \"high quality from\n\
nnn to nnn; aligned from nnn to nnn; chem: prim\". This gives the\n\
same information in numeric form. Highlight the read name first (see\n\
HIGHLIGHTING READ NAMES above) so you don't lose the read as you\n\
scroll. Then check that the numbers agree with the dimming.\n\
\n\
You can play with the dimming options a bit. Then set the dimming back\n\
to 'Dim Low Quality' for the rest of this tour.\n\
\n\
\n\
\n\
9.11) TRACES AND EDITING\n\
\n\
Point with the mouse at a base of one of the reads and click with the\n\
middle mouse button. (If you do not have a 3-button mouse, see\n\
MONITORS AND MICE FOR CONSED below.) The Trace Window showing the\n\
traces for that stretch of read should pop up.\n\
\n\
There are 2 rows of numbers:\n\
\n\
'con' are the consensus positions\n\
'rd' are the read positions\n\
\n\
There are 3 rows of bases in the Trace Window:\n\
\n\
'con' is the consensus\n\
'edt' is where you can edit the base calls of the read\n\
'phd' shows the original phred base calls\n\
\n\
Notice that a red rectangle (the 'cursor') blinks in the corresponding\n\
positions of the Aligned Reads Window and the Trace Window.\n\
\n\
\n\
9.12) Try editing in the Trace Window. You can click the left mouse\n\
button on a base in the 'edt' line to set the cursor (a blinking red\n\
rectangle). You can directly overstrike a base by typing a letter.\n\
Try this. Try undoing it (by clicking on 'undo').
If you want to\n\
undo more than one edit, you will have to go back to the Consed Main\n\
Window and click on the button labeled 'Undo Edit...'--you will learn\n\
that later. You can overstrike with the following characters: acgt\n\
(bases), * (a pad, in effect deleting the base), and mrwsykvhdb (IUB\n\
ambiguity codes).\n\
\n\
You can move left and right with the arrow keys.\n\
\n\
We believe that the user should change a Sanger base call only while\n\
examining the traces. That is why editing is done here--not in the\n\
Aligned Reads Window.\n\
\n\
9.13) You can insert a column of pads by pushing the space bar. Try\n\
this. (You may need to click on a base on the 'edt' line first.)\n\
\n\
(For those of you new to editing assemblies, a 'pad', which in Consed\n\
and phrap is represented by the '*' character, is used to align\n\
two or more sequences such as these:\n\
gttgacagtaatcta\n\
gttgacataatcta\n\
in which one sequence has an inserted or deleted base with respect to\n\
the other. By inserting the pad character, it is possible to get a\n\
good alignment:\n\
gttgacagtaatcta\n\
gttgaca*taatcta\n\
This is the purpose of the pad character--it is just a placeholder.)\n\
\n\
You can then overstrike a pad with a base. In this way you\n\
can insert a base and still preserve the alignment.\n\
\n\
9.14) Try highlighting a stretch of a read on the edt line by holding\n\
down the middle mouse button and dragging the cursor over some bases.\n\
They will turn yellow as you drag. Then release the mouse button. A\n\
window will pop up giving you some choices of what to do with those\n\
(yellow) bases:\n\
\n\
\n\
Make High Quality--makes the highlighted bases edited high quality\n\
(99).
This tells phrap (when it reassembles) that you are\n\
sure of the sequence here.\n\
Change Consensus--makes the highlighted bases edited high quality and\n\
changes the consensus to agree with that stretch of the read.\n\
This is a directive to phrap (upon reassembly) to use that\n\
stretch of that read as the consensus.\n\
";
static char szReadMe19[] = "\n\
Make Low Quality--makes the highlighted bases edited low quality.\n\
This tells phrap (when it reassembles) that you are not sure\n\
of the bases here and phrap can go ahead and make a join even\n\
if the bases in this region don't match perfectly.\n\
Make Low Quality to Left End--same as above, but all the way to\n\
the left end of the read.\n\
Make Low Quality to Right End--same as above, but all the way to\n\
the right end of the read.\n\
Change to n's--changes the highlighted bases to n's, which means\n\
they are unknown bases. This tells phrap (when it\n\
reassembles) not to make any join based on these bases. It is\n\
useful when you believe the bases may be in the chimeric\n\
portion of a read.\n\
Change to n's to left--same as above, but to the left end.\n\
Change to n's to right--same as above, but to the right end.\n\
Change to x's to left--changes the highlighted bases to x's, which\n\
means they are vector. This tells phrap to ignore these bases\n\
for the purpose of determining overlap.\n\
Change to x's to right--same as above, but to the right end.\n\
Add Tag--allows the user to add any tag to a stretch of read bases.\n\
Dismiss--you decided you don't really want to do anything with\n\
this stretch of bases.\n\
\n\
This popup is modal: nothing else works until you choose\n\
something. Try each of these choices, except for tags, which you'll\n\
try below.\n\
\n\
'Change Consensus' has an additional function--if a read extends out\n\
on the right beyond the end of the consensus, you can extend the\n\
consensus by using this function.
You might want to do this, for\n\
example, if cross_match did not correctly find the cloning site and\n\
thus clipped too much. You can add these bases to the consensus\n\
by using 'Change Consensus'. By default, these bases are given quality\n\
99 in both the read and the consensus, so that the next time phrap\n\
runs, it will correctly extend the consensus.\n\
\n\
However, if you aren't going to reassemble, you might want to just\n\
leave the quality values the way phred originally called them. You\n\
can do this by using a Consed parameter\n\
(consed.extendConsensusWithHighQuality), which you will learn more\n\
about later (see CONSED CUSTOMIZATION).\n\
\n\
\n\
9.15) To delete a base, overstrike it with a '*' character. (Phrap\n\
ignores '*', so this is the same as deleting the character.) If you\n\
overstrike all bases in a column with * characters so the entire\n\
column consists of *'s (including the consensus base), there is no way\n\
to remove the column. This is OK because the *'s are not exported\n\
when you export the consensus (try the exercise on EXPORTING THE\n\
CONSENSUS). While you are editing in Consed, we believe there should be\n\
a visual indication that a base was deleted.\n\
\n\
\n\
9.16) SCROLLING TRACES AND ALIGNED READS TOGETHER\n\
\n\
In the Aligned Reads Window, scroll along the contig to a\n\
different point. Click the left mouse button on a read whose trace is\n\
already up. Notice that the existing trace instantly scrolls to the\n\
corresponding location. Now go to the Trace Window and scroll the\n\
traces to a new location. Click on the edt line with the left mouse\n\
button. You will notice that the Aligned Reads Window will instantly\n\
scroll to the corresponding location. Thus you can keep the Aligned\n\
Reads Window and the traces scrolled to the same location.\n\
\n\
9.17) SHOW ALL TRACES\n\
\n\
Go to a region where there are lots of reads, say base 1660.
Push\n\
down the right mouse button and release on 'Display traces for all\n\
reads'. You will see all traces displayed in a scrolling window. You\n\
can drag the scrollbar on the right down and up to see all the traces.\n\
This feature is particularly useful for polymorphism/mutation\n\
detection work. This feature was added to work in cooperation with\n\
polyphred. (See CONSED-POLYPHRED interaction below.)\n\
\n\
In this Traces Window, point at one of the bases of one of the reads\n\
and click with the left mouse button. The base should start blinking\n\
in red. Now push the down arrow key on your keyboard. The cursor\n\
should move to the next read. Press the down arrow key repeatedly.\n\
Eventually the display should scroll so you can continue to see the\n\
read the cursor is on. Try the up arrow key as well.\n\
\n\
If there are more than 100 traces at a position, you will see those\n\
traces in batches of 100 traces. You can use the buttons at the\n\
bottom of the Traces Window labeled \"prev 100 traces\" and \"next 100\n\
traces\" to move to the previous and next batches of 100 traces.\n\
\n\
There is also a button at the top of the Traces Window that toggles\n\
between \"Show All Traces\" and \"Show Just Good Traces\". A \"good trace\"\n\
is one that meets all of the following conditions:\n\
\n\
* it has a base at the cursor location\n\
* there is no dataNeeded tag on the read\n\
\n\
(The list of tags is customizable using the resource\n\
consed.showAllTracesDoNotShowTraceIfTheseTagsPresent: )\n\
\n\
\n\
9.18) SAVING THE ASSEMBLY\n\
\n\
To save the assembly, pull down the 'File' menu on the Aligned\n\
";
static char szReadMe20[] = "\n\
Reads Window, and release on 'Save assembly'. A box will pop up with\n\
a suggested name. I suggest you always use the suggested name.
The\n\
idea is that the ace files:\n\
\n\
\n\
(project).fasta.screen.ace.1\n\
(project).fasta.screen.ace.2\n\
(project).fasta.screen.ace.3\n\
(project).fasta.screen.ace.4\n\
(project).fasta.screen.ace.5\n\
\n\
are numbered in order of age: the higher the number, the newer the\n\
file. If you feel you are taking up too much disk space, delete ace\n\
files starting with the oldest. I do not recommend that you overwrite\n\
existing ace files. The version numbers just keep growing, and that is\n\
not a problem.\n\
\n\
\n\
9.19) EXPORTING THE CONSENSUS\n\
\n\
Bring the Aligned Reads Window into view\n\
again. Hold down the left mouse button on the 'File' menu and\n\
release the button on 'Export consensus sequence'. Notice that the\n\
consensus will be stored (in this case) in a file called\n\
'Contig1.fasta'. Click 'OK'. There is now a file in your edit_dir\n\
directory called 'Contig1.fasta' that has the consensus sequence in\n\
it. If you want to see the file, bring up another Xterm (if you are\n\
UNIX literate), and type:\n\
\n\
\n\
cd standard/edit_dir\n\
more Contig1.fasta\n\
\n\
\n\
9.20) Fancier exporting of the consensus. Bring the Aligned Reads Window\n\
into view again. Hold down the left mouse button on the 'File' menu\n\
but this time release on 'Export consensus sequence (with\n\
options)...'. Just export a little snip of the consensus, from 400 to\n\
410. (You will notice this contains a pad * character.) Under \"Write\n\
Both Bases File and Qual File or Just Bases File?\" click \"Both Files\".\n\
Click 'OK'. Consed will want to call this file 'Contig1.fasta' again.\n\
You can overwrite the existing file.\n\
\n\
Look in your other Xterm at these files:\n\
\n\
more Contig1.fasta\n\
more Contig1.fasta.qual\n\
\n\
One file contains the bases (but no * pads) and the other\n\
contains the corresponding qualities of those bases.\n\
\n\
\n\
9.21) Exporting the consensus of all contigs at once: Go to the Main\n\
Consed Window.
Point to 'File', hold down the left mouse button, and\n\ release on 'Write all contigs to fasta file'. You then can choose a\n\ filename for all contigs to be written to. (In this project there is \n\ only 1 contig, so there is no difference between this option and just\n\ exporting a contig at a time.)\n\ \n\ \n\ 9.22) COMPLEMENTING THE CONTIG\n\ \n\ Push 'Compl Cont' in the Aligned Reads Window to complement the\n\ contig. This displays the opposite strand of the contig including the\n\ consensus and all reads. Push this button again to uncomplement it.\n\ \n\ \n\ 9.23) FIND MAIN WINDOW\n\ \n\ On the Aligned Reads window, click on 'Find Main Win'. This will\n\ cause the Consed Main Window to pop up in the event you have buried it under\n\ other windows or iconified it. (This may not work with some settings of\n\ your X emulator. In that case you will have to find and click on the\n\ Main Window to bring it up.)\n\ \n\ \n\ 9.24) MULTIPLE UNDO EDIT\n\ \n\ Now that the Consed Main Window is visible, click the 'Undo\n\ Edit...' button. There will be a popup indicating the most recent\n\ edit. (If it says \"no edits so far\", then bring up a trace and make\n\ several edits. Then click on 'Undo Edit...' again.) Click 'undo'.\n\ Then you will see the edit that was done before that. Click 'undo'.\n\ You can continue undoing if you like. You now know how to undo more\n\ than one edit. You cannot choose which edits to undo and which to not\n\ undo--edits can only be undone in precisely reverse order from the\n\ order you made them. Once you save the assembly, you cannot undo\n\ prior edits.\n\ \n\ 9.25) EXITING CONSED\n\ \n\ On the Aligned Reads Window, point to 'File' menu, hold down the\n\ left button and release on 'Quit Consed'. 
If it asks you some\n\
questions, answer 'Quit Without Saving and Discard .wrk File'.\n\
\n\
\n\
9.26) CONSED -ACE\n\
\n\
Try bringing up Consed like this:\n\
\n\
consed -ace standard.fasta.screen.ace.1\n\
";
static char szReadMe21[] = "\n\
\n\
This is an alternative to just typing \"consed\" and then selecting the\n\
ace file from within consed. Many users prefer this method.\n\
\n\
\n\
9.27) USING SOLEXA READS\n\
\n\
You will start with an existing solexa assembly and then, when you have learned\n\
some features using solexa reads, you will learn how to create such an\n\
assembly yourself.\n\
\n\
If Consed is running, exit it.\n\
\n\
9.28) Type:\n\
cd solexa_example_answer/edit_dir\n\
\n\
(You might need to type \"cd ../..\" first depending on which directory\n\
you are currently in.)\n\
\n\
ls\n\
\n\
You should see a file ref.ace.1\n\
\n\
\n\
9.29) Start consed and double click on \"ref.ace.1\"\n\
In the Contig List, double click on \"ref\"\n\
\n\
The Aligned Reads Window should come up.\n\
\n\
9.30) You will notice that reads continue below the bottom of the window.\n\
Make the Aligned Reads Window larger by dragging down the bottom right\n\
corner of the window. Then restore it to its original size. You can\n\
also see reads below by scrolling down using the scrollbar on the\n\
right side of the Aligned Reads Window.\n\
\n\
9.31) SORTING OF READS\n\
\n\
\n\
9.32) Scroll to position 453. Click on the G consensus base to make a\n\
green vertical line appear. Scroll up and down using the scrollbar on\n\
the right side of the window. You will notice that all of the reads\n\
that have any base at all at position 453 are at the top, and they are\n\
sorted by the quality values of the bases around position 453. It is thus\n\
immediately clear that all high quality bases are G--there is no\n\
variant at this position.\n\
\n\
Click on the consensus \"T\" at position 454 (the neighboring consensus\n\
base). Then click back on the consensus \"G\" at position 453.
Switch\n\ back and forth between the T and the G. You will notice that the\n\ reads switch their vertical order. That is because some reads have\n\ higher quality at position 453 and some have higher quality at\n\ position 454.\n\ \n\ 9.33) Turn off sorting by quality: Point to the \"Sort\" menu, hold down\n\ the left mouse button and release on 'Turn off: sorting by quality'.\n\ Now the reads are sorted by strand and then the position of the left\n\ end of the read. Scroll up and down to see.\n\ \n\ 9.34) ALPHABETICAL SORTING OF READS\n\ \n\ The reads can be ordered in other ways as well:\n\ \n\ a) alphabetically\n\ b) first all the top strand reads and then all the bottom\n\ strand reads. The top strand reads are then ordered\n\ by the left end of the reads. Same with the bottom\n\ strand reads.\n\ c) arbitrarily by a user-provided file (named readOrder.txt by default)\n\ d) by quality at the cursor position (and then by one of the other\n\ ways above if the cursor isn't visible)\n\ \n\ You have now seen b) and d).\n\ \n\ If you have reads from different patients, you might want to sort the\n\ reads by patient, so that all reads from the same patient are together.\n\ \n\ 9.35) Point to the \"Sort\" menu, hold down the left mouse button and\n\ release on \"sort options and help\". Read the instructions. Click on\n\ \"alpha\" and observe the results. The reads should be sorted in\n\ alphabetical order. 
Scroll down to the bottom and you will see that\n\
the order of the reads is:\n\
\n\
HWI-EAS94_4_1_87_843_510\n\
HWI-EAS94_4_1_9_616_723\n\
\n\
which is alphabetical (not numerical) order: comparing character by\n\
character, \"8\" comes before \"9\", so _87_ sorts before _9_ even though\n\
87 is numerically larger than 9.\n\
\n\
When you are done experimenting, return the buttons to \"Strand/Left\n\
End\" and \"by Quality\", which are the defaults.\n\
\n\
If you want to use a user-provided file to sort the reads, see\n\
CONSED CUSTOMIZATION (below) and set these resources:\n\
\n\
consed.showReadsInAlignedReadsWindowOrderedByFile: true\n\
consed.showReadsInAlignedReadsWindowOrderedByThisFile: readOrder.txt\n\
\n\
9.36) FINDING VARIANTS\n\
\n\
Click on the 'Find Main Win' button. You should see the Consed\n\
";
static char szReadMe22[] = "\n\
Main Window appear (if it was buried beneath other windows).\n\
\n\
9.37) Point to the 'Navigate' menu, hold down the left mouse button, and\n\
release on 'Search for highly discrepant positions'.\n\
\n\
9.38) Do not change any of the defaults and just click the 'Search'\n\
button. A window labeled 'Highly Discrepant Positions' will pop up,\n\
but its list will be empty.\n\
\n\
Well, that's no fun--apparently there aren't any real variants in this\n\
dataset.
Dismiss this window.\n\
\n\
So that you can see what they look like, repeat the steps above to\n\
bring up the Navigate by Highly Discrepant Regions Window again,\n\
but this time change \"Ignore Bases Below This Quality\" from 20 to 12.\n\
Click 'Search'.\n\
\n\
The Highly Discrepant Positions Window will pop up with a list of\n\
locations like this:\n\
\n\
min # of discrepant reads: 2 min quality: 12, \"r\": base of reference seq\n\
max depth of coverage: 100000 and ignoring reference seq\n\
  A           C           G           T           *          pos  contig\n\
  2 8.0%     23 92.0%r    0 0.0%      0 0.0%      0 0.0%      56  ref\n\
  3 9.1%     30 90.9%r    0 0.0%      0 0.0%      0 0.0%     252  ref\n\
  2 6.9%     27 93.1%r    0 0.0%      0 0.0%      0 0.0%     256  ref\n\
  0 0.0%      0 0.0%     20 90.9%r    2 9.1%      0 0.0%     682  ref\n\
  0 0.0%      0 0.0%     31 93.9%r    2 6.1%      0 0.0%     715  ref\n\
  2 4.8%     40 95.2%r    0 0.0%      0 0.0%      0 0.0%     742  ref\n\
  2 8.7%     21 91.3%r    0 0.0%      0 0.0%      0 0.0%     936  ref\n\
  0 0.0%      1 2.4%      1 2.4%     39 95.1%r    0 0.0%     982  ref\n\
\n\
This means, for example, that at position 56 of contig \"ref\", there are\n\
2 A's, 23 C's, 0 G's, 0 T's, and 0 *'s (deletions), that is, 8.0% A's,\n\
92.0% C's, 0% G's, and 0% T's. The \"r\" after 92.0% means that the\n\
reference sequence contains a C at this position.\n\
\n\
9.39) Click the 'Next' button on this window and watch the Aligned Reads\n\
Window. You can continue clicking the 'Next' button either in the\n\
Highly Discrepant Positions Window or else at the bottom of the\n\
Aligned Reads Window. Do this until you have reached the end of the\n\
list. This provides a rapid method of reviewing variants.\n\
\n\
9.40) Go back to the Consed Main Window, point to the 'Navigate' menu,\n\
hold down the left mouse button, and release on 'Search for highly\n\
discrepant positions'. When the window entitled 'Navigate by Highly\n\
Discrepant Regions' pops up, look at the different options. This time\n\
try 'Just list indels'. Are there any indel variants in this data\n\
set? Try it and see.
Well, actually there is one, but you will need\n\
to change the \"minimum # of discrepant reads\" to 1 to find it. Play\n\
around with these parameters a little.\n\
\n\
You can also ignore locations in which the consensus\n\
is an x or an n (or any other bases you wish to ignore). To turn on this\n\
option, click \"True\" on the line \"Ignore location if consensus\n\
base is one of:\".\n\
\n\
Here are 2 more obscure options:\n\
\n\
'maximum depth of coverage'\n\
\n\
Typically you won't use this (it is set to a ridiculously high\n\
number). It is there in case you want to avoid regions that you\n\
believe are collapsed repeats, in which apparent variants are\n\
really just differences between different copies of the repeat.\n\
\n\
'Count only first of multiple reads starting at same location'\n\
\n\
Some people believe that solexa reads that start at exactly the same\n\
location are really the same read (the same cluster) that the image\n\
analysis software has reported as multiple reads. If such a group of\n\
reads has a discrepancy in it, they want to count the group as one read\n\
with the variant rather than multiple reads with the variant.\n\
\n\
This feature is also available as a report which can be generated\n\
automatically without using consed's graphical interface. You will\n\
learn how to use consed's report feature later.\n\
\n\
9.41) EDITING/TAGGING SOLEXA READS\n\
\n\
You can edit or tag a solexa read in the same way you did with Sanger\n\
reads (above). Scroll to the right end of the contig (around position\n\
1000).\n\
\n\
Push the left mouse button down on the menu item 'Dim' and release on\n\
\"Dim Nothing\". You will see that there are a number of reads that\n\
protrude beyond the right end of the consensus and are red, indicating\n\
that they are discrepant with the consensus.
Suppose you want to extend the\n\
consensus based on those reads.\n\
\n\
Find read HWI-EAS94_4_1_59_547_158 and click on the name to highlight\n\
it so you don't have to find it again. (Later you will learn how to\n\
quickly find a read by name, but for now just use your eyes.)\n\
\n\
Click with the middle mouse button on the c base at the right end of\n\
this read. You will see a Trace Window pop up similar to the Trace\n\
Window you saw before with Sanger reads, except that the trace lines\n\
are dashed. This reminds you that this is a Solexa read and the traces\n\
are completely fictional--this window just gives you the ability to\n\
edit and tag the read.\n\
";
static char szReadMe23[] = "\n\
\n\
Point at the first base of the gccatgtcataac sequence, which is all\n\
red, hold down the middle mouse button, swipe to the last base\n\
(all should turn yellow), and then release. A \"What to Do with\n\
Selection\" window should pop up.\n\
\n\
In this window, click on the \"Change Consensus\" button. In the\n\
Aligned Reads Window, you will notice that the consensus has now been\n\
extended to include the additional bases from the read\n\
HWI-EAS94_4_1_59_547_158.\n\
\n\
But there is a problem: click on consensus base \"t\" at position\n\
1009. Now the reads are sorted by quality at that position. You will\n\
see that the highest quality bases are all G, while the consensus is a\n\
t because the read you used to extend the consensus had a t.\n\
\n\
9.42) OVERSTRIKING THE CONSENSUS\n\
\n\
You can directly edit the consensus in the Aligned Reads Window\n\
without needing to bring up any trace. This would be inappropriate\n\
for Sanger reads, but makes sense for Solexa reads, whose traces are\n\
fictional.\n\
\n\
Click on the \"t\" in the consensus at position 1009. Look\n\
down the column and notice that all 'g' bases are red and all 't'\n\
bases are not red.\n\
\n\
Type a 'g' over the 't' in the consensus.
The consensus should change,\n\
and now all t's in the column should be red and all g's should not\n\
be.\n\
\n\
9.43) NAVIGATING BY HIGH/LOW DEPTH OF COVERAGE\n\
\n\
Go back to the Consed Main Window, point to the 'Navigate' menu,\n\
hold down the left mouse button and release on 'Search for high depth\n\
of coverage regions'. A window entitled 'Navigate by High (or Low)\n\
Depth of Coverage' should pop up.\n\
\n\
9.44) Change the 'min depth of coverage' box from 10 to 50 and click\n\
the 'Search' button. A navigate window entitled 'High Depth of\n\
Coverage Regions' will pop up, listing the regions with depth of\n\
coverage 50 and over. Navigate to a few of them.\n\
\n\
9.45) You can also see an overview of the depth of coverage. On the\n\
Consed Main Window, click on Assembly View. The Assembly View Window\n\
will pop up. An error box will come up saying \"Sequence matches will\n\
not be shown in Assembly View...\". Dismiss it for now.\n\
\n\
You will learn much more about the Assembly View Window later, but\n\
for now just notice a few features:\n\
\n\
9.46) Put the pointer on the grey bar with the numbers inside it. Move\n\
the pointer left and right and notice the information displayed near\n\
the bottom of the Assembly View Window as you do this. In particular,\n\
you will see the depth of coverage and the base position change as you\n\
move the pointer. The green graph also indicates depth of coverage.\n\
\n\
9.47) ADDING SOLEXA READS\n\
\n\
Now that you know how to view and analyze solexa data, you will learn\n\
how to align the reads yourself.\n\
\n\
9.48) Exit Consed and type:\n\
cd solexa_example/edit_dir\n\
\n\
(This is not the same directory you were in for the examples above,\n\
which was\n\
solexa_example_answer/edit_dir\n\
You may need to first type\n\
cd ../..
\n\
\n\
depending on which directory you are currently in.)\n\
\n\
Type:\n\
ls\n\
\n\
You should see ref.fa and solexa_files.fof\n\
\n\
9.49) Type:\n\
more ref.fa\n\
\n\
This just contains the reference sequence in fasta format.\n\
\n\
9.50) Type:\n\
more solexa_files.fof\n\
\n\
It contains one line for each fastq file. There is\n\
just one such file here, so there is just one line:\n\
\n\
solexa_reads.fastq\n\
\n\
where solexa_reads.fastq is a solexa fastq file (note that \"solexa\n\
fastq\" is different from normal fastq--don't mix them up).\n\
\n\
[MORE TO BE ADDED ON SOLEXA MATE PAIRS]\n\
\n\
\n\
9.51) Type:\n\
ls ../solexa_dir\n\
\n\
You will see this file:\n\
";
static char szReadMe24[] = "\n\
solexa_reads.fastq\n\
\n\
\n\
9.52) First make sure you are still in solexa_example/edit_dir by\n\
typing:\n\
\n\
pwd\n\
\n\
which should print something that ends with:\n\
\n\
solexa_example/edit_dir\n\
\n\
Convert the reference sequence into an assembly by typing:\n\
\n\
fasta2Ace.perl ref.fa\n\
\n\
There should now be a file ref.ace in this directory and a file\n\
phd.ball.1 in ../phdball_dir\n\
\n\
To check that everything is fine, bring up Consed and double click on\n\
\"ref.ace\"\n\
\n\
You should see an assembly with exactly 1 read--the reference sequence\n\
called 'ref'. Terminate Consed.\n\
\n\
9.53) Type:\n\
addSolexaReads.perl ref.ace solexa_files.fof ref.fa\n\
\n\
There will be a flurry of output from various programs, ending with\n\
something like this:\n\
\n\
Inserting pads in contigs to accommodate insertions in new reads...Done\n\
ending insertPadsInContigs 0\n\
Inserting pads in reads and setting read bases...\n\
now saving assembly...
0\n\
writing ./ref.ace.1\n\
See new ace file ref.ace.1\n\
done 0\n\
See log file: ref.080627.111305.out\n\
0.0 minutes to make fasta files\n\
0.0 minutes cross_match time\n\
0.0 minutes consed time\n\
0.0 minutes total time\n\
\n\
\n\
If you instead get error messages, the Consed/cross_match package is\n\
not installed correctly. See INSTALLING CONSED (above).\n\
\n\
9.54) Type:\n\
ls\n\
\n\
You should now see the file ref.ace.1\n\
\n\
9.55) Start Consed and double click on ref.ace.1\n\
Double click on the contig 'ref' in the Contig List.\n\
\n\
The Aligned Reads Window will pop up. Scroll around a little to\n\
convince yourself that you have created exactly the same assembly as\n\
you used in the exercises above under \"USING SOLEXA READS\".\n\
\n\
\n\
\n\
9.56) ALIGNING SOLEXA READS AGAINST A LARGE GENOME AND SELECTING A SMALL REGION\n\
FOR VIEWING WITH CONSED\n\
\n\
In many applications, you will want to align your solexa reads against\n\
a large genome (such as the human genome) even though you are only\n\
interested in some part of that genome. For example, you might only\n\
be interested in reads that map *best* to the region of interest,\n\
and thus you must map them against the entire genome to be sure they\n\
don't match better to some other location.\n\
\n\
Consed handles this by letting you run cross_match against a large\n\
genome and then specify certain regions of interest to\n\
view with consed. This exercise shows you how to do this.\n\
\n\
9.57) Type:\n\
cd selectRegions/edit_dir\n\
\n\
(You may need to first type cd ../..
depending on which directory you\n\
are currently in.)\n\
\n\
ls\n\
\n\
You will see:\n\
\n\
solexa_files.fof, which is a list of the solexa Gerald fastq\n\
files.\n\
\n\
refs.fof, which is a file of filenames of the reference sequences.\n\
Typically refs.fof will be a list of the fasta files of the genome,\n\
such as the human genome, but in this case it contains just one\n\
filename, ref.fa, which is a small fasta file. (I made it small so it\n\
doesn't take you too long to download consed.)\n\
\n\
regions.txt, which is a file specifying the regions that you are\n\
interested in.\n\
\n\
9.58) Run:\n\
alignSolexaReads2Refs.perl solexa_files.fof refs.fof my_alignments.fof\n\
";
static char szReadMe25[] = "\n\
\n\
(This program creates my_alignments.fof--you can give it any\n\
name.)\n\
\n\
The last line should say \"see my_alignments.fof\"\n\
\n\
At this point you have aligned all of the solexa reads against the\n\
reference sequence ref.fa. (If you want to see those alignments, look\n\
in my_alignments.fof, then open the file listed there and look\n\
through the pages of output for the ALIGNMENT lines.)\n\
\n\
Now suppose that you are interested in the following two regions:\n\
\n\
Bases from 1 to 100, and from 901 to 1000.\n\
\n\
9.59) Type:\n\
more regions.txt\n\
\n\
This indicates to consed that you are interested in these 2 regions.\n\
It also shows the path of the fasta file (ref.fa), which contains the\n\
sequence ref.\n\
\n\
In this case there are only 2 regions specified, but there is no\n\
reason, with your own data, that you couldn't specify thousands of\n\
regions.\n\
\n\
\n\
9.60) Type:\n\
selectRegions.perl regions.txt my_alignments.fof my_new_ace.ace\n\
\n\
The last line of the output should say something like:\n\
writing my_new_ace.ace.2\n\
\n\
9.61) Type:\n\
consed -ace my_new_ace.ace.2\n\
\n\
You should see 2 contigs, ref_1 and ref_901, which are the 2 regions\n\
specified.
There should be a total of 213 reads--211 solexa reads and\n\ 2 fake reads (one for each region). Bring up the Aligned Reads Window and scroll\n\ around a bit.\n\ \n\ In the contig ref_901 you will notice that the left end of the contig\n\ is numbered 901--the consensus numbers refer to the positions in the\n\ original reference sequence. Thus if your reference sequence were,\n\ for example, a chromosome, the numbers would be chromosome positions.\n\ \n\ If you would like, you can see the consensus numbers starting at\n\ position 1. To do this, point to the \"Misc\" menu, hold down the left\n\ mouse button, and release on \"Turn On/Off User-Defined Consensus Scale\n\ Numbers\".\n\ \n\ For those of you interested in what is happening behind the scenes,\n\ you might want to look at the files in phdball_dir: phd.ball.1\n\ contains all 867 of the solexa reads, phd.ball.2 contains the 2 fake reads\n\ representing the 2 regions, and phd.ball.3 contains just the 211\n\ solexa reads that align to the 2 regions. my_new_ace.ace.2 tells\n\ Consed that it only needs to read phd.ball.2 and phd.ball.3.\n\ \n\ \n\ 9.62) USING YOUR OWN SOLEXA DATA\n\ \n\ You first must complete the exercises above using the test solexa\n\ data so you are confident you are doing the process correctly.\n\ \n\ No, do not continue reading here.
I said \"You first must complete the\n\ exercises above using the test solexa data so you are confident you\n\ are doing the process correctly.\"\n\ \n\ After you have done the exercises above, create a project directory\n\ with subdirectories:\n\ \n\ phdball_dir\n\ edit_dir\n\ solexa_dir\n\ phd_dir\n\ \n\ 9.63) Put the Gerald fastq or Bustard files (pairs of *_seq.txt and\n\ *_prb.txt) into solexa_dir (or use links, or even make solexa_dir\n\ itself a link).\n\ \n\ 9.64) In edit_dir, make a file myFiles.fof just like solexa_files.fof\n\ in the solexa_example dataset described above.\n\ \n\ 9.65) Make a fasta file myFasta.fa in edit_dir containing the reference\n\ sequences.\n\ \n\ (Note that addSolexaReads.perl is written such that lowercase bases in\n\ the reference sequences are assumed to be repeats and matches of\n\ solexa reads to such regions are largely ignored. If your\n\ reference sequence were entirely lowercase, you would get no\n\ matches--bad. If you do not want matches to lowercase regions to be\n\ ignored, you must modify addSolexaReads.perl by removing the words\n\ \"-repeat_screen 2\". See phrap.doc, which came with\n\ phrap/cross_match.
It is generally good to use repeat_screen with a\n\ repeat-screened reference sequence since it greatly speeds up\n\ cross_match, and a match within a repeat doesn't mean much.)\n\ \n\ 9.66) Convert the fasta file to an assembly by typing:\n\ fasta2Ace.perl myFasta.fa\n\ This should create myFasta.ace\n\ "; static char szReadMe26[] = "\n\ \n\ 9.67) Run addSolexaReads.perl like this:\n\ addSolexaReads.perl myFasta.ace myFiles.fof myFasta.fa\n\ \n\ This should create myFasta.ace.1, which should contain all of the\n\ aligning solexa reads.\n\ \n\ Consensus quality values are not recalculated unless you put the\n\ following into your .consedrc file:\n\ \n\ consed.addNewReadsRecalculateConsensusQuality: true\n\ \n\ (For information on how to change the .consedrc file, see EDIT\n\ PARAMETERS: HOW TO CHANGE CONSED/AUTOFINISH PARAMETERS elsewhere in\n\ this document.)\n\ \n\ 9.68) USING 454 READS (NEWBLER ASSEMBLY)\n\ \n\ \n\ The Newbler Assembler and Consed work hand in glove. To see\n\ a Newbler assembly, exit Consed and type:\n\ \n\ 9.69) cd 454_newbler/edit_dir\n\ ls\n\ \n\ (You might need to type \"cd ../..\" first depending on which directory\n\ you are currently in.)\n\ \n\ \n\ 9.70) Restart Consed.\n\ \n\ 9.71) Double click on \"454Contigs.ace.1\". You will see 2 contigs in\n\ the list:\n\ \n\ contig00001\n\ contig00002\n\ \n\ 9.72) Double click on contig00001 to bring up the Aligned Reads\n\ Window.\n\ \n\ 9.73) Using the thumb of the scroll bar at the bottom, scroll from the far left of the contig\n\ all the way to the far right to get an idea of the assembly. (It is a\n\ very small one.)\n\ \n\ 9.74) In the Aligned Reads Window, scroll to position 246 and middle\n\ mouse click on the t in read ERQJC7K01CLG7G (which is probably the\n\ second-to-bottom read on the screen).\n\ \n\ \n\ A \"trace\" should pop up.\n\ \n\ If a trace does not pop up, there is an installation problem. See\n\ INSTALLING CONSED (above).
Look closely at the error message that pops\n\ up and the error message in the xterm where Consed was started. They\n\ will indicate where Consed is expecting to find sff2scf and what the\n\ problem is.\n\ \n\ Unlike chromatograms from fluorescent sequencers, the spacing and\n\ width of the peaks are meaningless, but the height of each peak reflects the\n\ actual intensity of the light emitted during each of the 454 cycles.\n\ When the light intensity indicates that there is more than one base in\n\ a row, instead of showing one very tall peak, we break the peak up into n\n\ peaks, where n is the number of repeated bases.\n\ \n\ 9.75) Look at the 3 \"T\" peaks at positions 244 through 246 (\"con\"\n\ line) or 195 through 197 (\"rd\" line) in the Trace Window. Notice\n\ that the rightmost peak is higher than the others. The reason for\n\ this is that the intensity of the light emitted was not exactly three\n\ times that of a single base, so the trace shows the left\n\ peaks at the height of a standard peak, and the rightmost peak gets\n\ whatever intensity is left over. Look back in the\n\ Aligned Reads Window and you will see that all other reads have 4 t's\n\ instead of 3. So the reason that the 3rd peak is higher than the\n\ others is that there are probably 4 t's here instead of 3.\n\ \n\ 9.76) In the Aligned Reads Window, look at the t at the unlabelled position\n\ between 228 and 229 of read ERQJC7K01A7AUR (probably the 4th read from\n\ the top). Notice that none of the other reads have a t at this\n\ position, so this read may have a base-calling error. Middle mouse\n\ click on this t.
You will see in the Trace Window that the t peak is\n\ shorter than a normal peak, confirming this suspicion.\n\ \n\ 9.77) Let's take a look behind the scenes: Exit Consed and examine\n\ the contents of ../chromat_dir (it should be empty), ../phd_dir (it\n\ should be empty), ../phdball_dir (it should contain phd.ball.1), and\n\ ../sff_dir (it should contain reads.sff). Since there are no files in\n\ chromat_dir, there are no traces initially. When you click to see a\n\ trace, Consed runs the program sff2scf, which creates the trace for the\n\ read you are interested in by reading reads.sff (which has all the\n\ intensity information of each base in each read). Jim Knight of 454\n\ Corporation did a great job developing sff2scf. Each time a 454 trace pops\n\ up, tip your hat to Jim!\n\ \n\ For example, Consed runs sff2scf like this:\n\ \n\ sff2scf sff:-f:pairedreads.sff:ERQJC7K01C3R2X\n\ \n\ where pairedreads.sff is the name of the sff file and ERQJC7K01C3R2X\n\ is the name of the read (without the _left or _right extension). It\n\ will write the scf file into /tmp, where Consed will read it.\n\ "; static char szReadMe27[] = "\n\ \n\ 9.78) On the Consed Main Window is a button on the left labelled\n\ \"Assembly View\". Click it. You will see 2 grey bars labelled\n\ \"contig00002c\" (the \"c\" is for \"complemented\") and \"contig00001\". This\n\ indicates that the left end of contig00002 is connected to contig00001\n\ (the right end of contig00002c is the left end of contig00002).\n\ Newbler has given Consed forward-reverse pair information, which Consed\n\ has used to determine this orientation of the contigs--another great\n\ Jim Knight job. You will learn more about the other graphics here in\n\ the Assembly View section (below).\n\ \n\ \n\ 9.79) USING 454'S NEWBLER ON YOUR OWN DATA\n\ \n\ First you should run through the tutorial above so that you know\n\ that everything works with my test dataset.
\n\ \n\ 9.80) Run Newbler according to the 454 documentation using the -consed\n\ option.\n\ \n\ 9.81) Delete the .consedrc file that Newbler creates in edit_dir--it is\n\ intended for obsolete versions of consed and may cause problems with\n\ the current version.\n\ \n\ 9.82) Delete the phd.ball link in edit_dir--it is also intended for\n\ obsolete versions of consed and may cause problems with the current\n\ version.\n\ \n\ 9.83) Check that the version of sff2scf that will be used is the current one.\n\ \n\ Type \"sff2scf -v\"\n\ It should say \"080721\" (or later). If instead it says\n\ \"Error: Unable to open SCF file: ../chromat_dir/-v\",\n\ your version is old and should be discarded. Use the new version that\n\ comes with consed.\n\ \n\ \n\ 9.84) USING 454 READS (ALIGNING TO REFERENCE SEQUENCE )\n\ \n\ Consed/cross_match can quickly align 454 reads to an existing\n\ reference sequence. You start with a fasta file of the reference\n\ sequence and the 454 sff files. You end up with an assembly with all\n\ of the 454 reads aligned against the reference sequence.\n\ \n\ To do this:\n\ \n\ Exit Consed and type:\n\ 9.85) cd align454reads/edit_dir\n\ \n\ (You might need to type \"cd ../..\" first depending on which directory\n\ you are currently in.)\n\ \n\ ls\n\ \n\ \n\ You will see 2 files:\n\ \n\ reference.fa which contains the reference sequence\n\ sff.fof which is an fof file referring to the 454 sff files\n\ \n\ (Note that add454Reads.perl is written such that lowercase bases in\n\ the reference sequence are assumed to be repeats and matches of 454\n\ reads to such regions are largely ignored. If you do not want\n\ matches to lowercase regions to be ignored, you must modify\n\ add454Reads.perl by removing the words \"-repeat_screen 2\". See\n\ phrap.doc, which came with phrap/cross_match.)\n\ \n\ You might notice that there is also an align454reads_answer directory\n\ parallel to the align454reads directory.
This contains the files that\n\ you should get if you correctly follow this exercise and have Consed\n\ correctly installed. You can refer to it for troubleshooting. Also\n\ see INSTALLING CONSED (above).\n\ \n\ \n\ 9.86) Convert the reference.fa file into an assembly by typing:\n\ fasta2Ace.perl reference.fa\n\ \n\ There should now be a file reference.ace in this directory and a file\n\ phd.ball.1 in ../phdball_dir.\n\ \n\ To check that everything is fine, bring up Consed and double click on\n\ \"reference.ace\"\n\ \n\ You should see that this is an assembly with exactly 1 read--the\n\ reference sequence. Exit Consed.\n\ \n\ 9.87) Type:\n\ add454Reads.perl reference.ace sff.fof reference.fa\n\ \n\ There should be lots of output to the screen and no error messages,\n\ and it should complete in a second or two.\n\ \n\ Type: ls\n\ \n\ Now you should see reference.ace.1\n\ \n\ 9.88) Bring up Consed and double click on \"reference.ace.1\"\n\ \n\ Then double click on contig \"myreference\" to bring up the Aligned\n\ Reads Window. Scroll around a little and middle mouse click on a read\n\ "; static char szReadMe28[] = "\n\ or two to see the trace.\n\ \n\ \n\ 9.89) ADDING ADDITIONAL 454 OR SOLEXA READS (YOUR OWN DATA)\n\ \n\ You can add more 454 or solexa reads to an existing assembly.\n\ It doesn't matter whether the existing assembly is 454, solexa, or\n\ Sanger.
\n\ \n\ To add 454 reads, use:\n\ \n\ add454Reads.perl (existing ace file) (fof of sff files) (fasta)\n\ \n\ (This will add all of the reads in the sff files.)\n\ \n\ To add solexa reads, use:\n\ addSolexaReads.perl (existing ace file) (fof with prefixes) (fasta)\n\ \n\ In both cases the fasta file must precisely match the consensus of the\n\ existing ace file.\n\ \n\ If you want to add just a few 454 reads (not all of them in the sff\n\ file), and you know the names of the 454 reads you want to add, you\n\ can use a different method:\n\ \n\ In edit_dir, create a file (e.g., reads.fof) that contains the names\n\ of all the 454 reads you want to add. To get a list of all of the reads in an sff\n\ file, type:\n\ \n\ \n\ sffinfo -a ../sff_dir/my454.sff >reads.fof\n\ \n\ where my454.sff is the sff file.\n\ \n\ Create a ../phd_dir directory.\n\ \n\ Run:\n\ \n\ sff2scfAndPhd my454.sff reads.fof\n\ \n\ This will create scf files in ../chromat_dir and phd files in\n\ ../phd_dir.\n\ \n\ Then run either \"add new reads\" from within consed (see ADD NEW READS\n\ below) or automated add new reads (see ALIGNING SANGER READS TO A\n\ REFERENCE SEQUENCE (below)).\n\ \n\ \n\ \n\ SOLEXA AND 454 DATA--WHAT IS HAPPENING BEHIND THE SCENES\n\ \n\ 454 data comes in sff files (in sff_dir):\n\ \n\ 1. sff files --> phdballs (in phdball_dir) via the program\n\ consed -sff2PhdBall\n\ \n\ consed -sff2PhdBall calls a perl script\n\ filter454Reads.perl\n\ \n\ filter454Reads.perl runs cross_match to find, within each read,\n\ sffLinkers.fa, which is the 454 linker sequence (if any) that separates\n\ the _left and _right parts of a 454 paired-end read. It also looks for pUC19 contamination\n\ (filter454Reads.fa).\n\ \n\ 2. phdballs --> *.fa fasta files (in edit_dir) via the program\n\ consed -phdball2fasta\n\ 3. fasta files --> *.cross alignments (in edit_dir) via the program cross_match\n\ 4.
alignments and phdballs --> ace file (in edit_dir) via the program\n\ consed -addReads\n\ \n\ All of the above steps are run by add454Reads.perl.\n\ \n\ For solexa data, all of the steps are the same, except for the first step:\n\ \n\ solexa data comes in *.fastq files or *_seq.txt and *_prb.txt\n\ \"Bustard\" files (in solexa_dir):\n\ 1. solexa files --> phdball (in phdball_dir) via the program\n\ consed -solexa2PhdBall\n\ \n\ \n\ \n\ 9.90) ASSEMBLY VIEW\n\ \n\ Consed can show you a bird's-eye view of the assembly using\n\ forward/reverse pair information, sequence match information, read\n\ depth, etc. We have a test assembly that shows these features.\n\ \n\ Exit Consed and type:\n\ cd assembly_view/edit_dir\n\ \n\ (You might need to type \"cd ../..\" first depending on which directory\n\ you are currently in.)\n\ \n\ ls\n\ Restart Consed.\n\ \n\ Double click on \"assembly_view.fasta.screen.ace.1\"\n\ \n\ In the Consed Main Window, click on the button \"Assembly View\", which is\n\ near the upper left corner of the window.\n\ "; static char szReadMe29[] = "\n\ \n\ You should see 3 grey bars with pink labels \"2\", \"3\", and \"1\". The\n\ bars are the contigs: pink \"1\" means Contig1, pink \"2\" means Contig2,\n\ etc. Notice the scale on the contigs; it gives the contig\n\ position.\n\ \n\ 9.91) READ DEPTH\n\ \n\ We covered this briefly under Solexa Reads.\n\ \n\ You should see a dark-green graph above the contig bars. This dark\n\ green graph indicates read depth--the depth of the quality 20 (by\n\ default) region of reads. Turn off read depth as follows: Click on\n\ the button labelled \"What to Show\". A menu will pop up at that\n\ location. Click on the \"Read Depth/Multiple Discrepancies\" menu item.\n\ A box will appear labelled \"Show Read Depth/Multiple Discrepancies\".\n\ It has a square toggle button with \"show read depth\" to its\n\ right.
Click on the toggle button to change it from\n\ appearing pushed in to appearing sticking out. Then click on \"Apply\".\n\ The read depth graph should disappear. If you would like, you can try\n\ showing read depth for qualities other than 20.\n\ \n\ Note: the read depth is *not* the number of reads that have quality 20\n\ bases or above, although this number is a good approximation. For\n\ example, suppose there is a stretch of 300 Q50 bases, and in the\n\ middle of that stretch are 5 Q10 bases. Those Q10 bases will be counted\n\ toward the Q20 read depth. (In computer science terms, these bases\n\ are part of the maximal Q20 read segment.)\n\ \n\ 9.92) FORWARD/REVERSE PAIR DEPTH\n\ \n\ A \"forward/reverse pair\" is a pair of reads from the same subclone\n\ template, each of which is primed within the subclone vector, but one\n\ is primed on one side of the insert and the other is primed on the\n\ other side of the insert. Both reads of a forward/reverse pair may be assembled\n\ into the same contig, in which case they should point towards each\n\ other and be approximately the insert size apart. The two reads of a forward/reverse\n\ pair might also be in different contigs on opposite sides of a gap.\n\ \n\ 9.93) To see the graph of forward/reverse pair depth: Click on the\n\ button labelled \"What to Show\". A menu will pop up at that location.\n\ Click on \"Fwd/Rev Pairs\". A box will appear labelled \"Which Fwd/Rev\n\ Pairs to Show in Assembly View\". There is a little square (a toggle\n\ button) next to \"show consistent fwd/rev pair depth\". Click on this\n\ toggle button to change it from appearing sticking out to appearing\n\ pushed in. Then click on \"Apply\".\n\ \n\ A bright green graph should appear--this is fwd/rev pair depth. It is\n\ highest around positions 7000 to 10000 of Contig2 and around position 14000 of Contig3.\n\ The bright green graph indicates, for each base, the depth of subclone\n\ templates that have a consistent forward/reverse pair. A
A\n\ forward/reverse pair is \"consistent\" if the forward and reverse are\n\ pointing towards each other and are not too far away from each other.\n\ (\"Too far\" is defined as 3 or more standard deviations from the mean\n\ of the insert size of templates from a particular library.) In other\n\ words, the green graph tells for each base, how many consistent\n\ forward/reverse pairs have that base between the forward read and the\n\ reverse read. This forward/reverse pair depth is not the same as read\n\ depth, which is typically much less. Forward/reverse pair depth is\n\ important in that it gives a measure of the confidence of the assembly\n\ at a base. If the forward/reverse pair depth is close to zero, as it\n\ is in Contig1 position about 9300, there is a likelihood that phrap\n\ has made an incorrect join. When the forward/reverse pair depth is\n\ zero, the green line turns red, as it does on the right end of\n\ Contig3.\n\ \n\ 9.94) INCONSISTENT FORWARD/REVERSE PAIRS\n\ \n\ The red lines connect the right end of Contig3 with the middle of\n\ Contig1. These are filtered inconsisent forward/reverse pairs--they\n\ are \"inconsistent\" because they are not consistent (see above) and\n\ they are \"filtered\" in that they have another inconsistent read\n\ close by (at both ends) that is inconsistent for the same reason. If\n\ two red lines are on top of one another, it is displayed in purple so\n\ you know there is more than one there.\n\ \n\ This is a good example of a misassembly. There are many many reads at\n\ the right end of Contig3 that are paired with reads in the middle of\n\ Contig1. Notice that the forward/reverse pair depth of Contig1 is\n\ close to zero around base 9300. (You can use the \"Zoom In\" button to\n\ see this in more detail, but when you are done experimenting with the\n\ Zoom buttons and the scroll bar, click on \"Zoom Orig\" for the rest of\n\ this exercise.) This is where phrap made a bad join. 
If you tear the\n\ contig apart there, complement the left part of Contig1, and then join\n\ it to the right end of Contig3, the forward/reverse pairs will change\n\ from inconsistent to consistent. You will learn later how to do that.\n\ \n\ 9.95) Point to one of the red lines. You will notice that it turns\n\ yellow. The box near the bottom of the screen tells you a little more\n\ about what you have \"highlighted\" (turned yellow). If you want more\n\ information, click with the left mouse button. A window \"Clicked\n\ Forward/Reverse Pairs\" will appear giving information about each\n\ highlighted read. Try this. In the \"Clicked Forward/Reverse Pairs\"\n\ Window, double click on one of the reads. The Aligned Reads Window\n\ should appear with the cursor on that read. This shows how to go from\n\ the Assembly View Window to the Aligned Reads Window.\n\ \n\ 9.96) You can also go from the Aligned Reads Window to the Assembly View\n\ Window. First make sure the Assembly View Window is already\n\ open (or else open it by clicking on Assembly View in the Consed Main\n\ "; static char szReadMe30[] = "\n\ Window). In the Aligned Reads Window, point to a read name, hold down\n\ the right mouse button, and release on \"Find Read in Assembly View\"\n\ (one of the last items in the menu that appears when you push down with\n\ the right mouse button). If the read is from a subclone that has a\n\ forward/reverse pair in the assembly, then the same \"Clicked\n\ Forward/Reverse Pairs\" Window will appear. It will contain not only\n\ the read that you pointed to, but all of the other reads from the same\n\ subclone as the one you pointed to. In the Assembly View Window, all\n\ of these reads will blink yellow. You can use this procedure to jump,\n\ within the Aligned Reads Window, from a forward read to its reverse read or\n\ vice versa.\n\ \n\ 9.97) Notice the aqua and purple lines that connect the right end of\n\ Contig2 to the left end of Contig3.
These are consistent gap-spanning\n\ forward/reverse pairs. If more than one pair lies on top of another,\n\ the color is purple. These are the reads that tell you (and Consed,\n\ Autofinish, and Phrap) that the right end of Contig2 is connected to\n\ the left end of Contig3. As above, point to one to highlight it and\n\ click on it to see more information.\n\ \n\ 9.98) To see much more information, click on the \"What to\n\ Show\" button, and then when the menu pops up, click on the \"Fwd/Rev\n\ Pairs\" menu item. Up will pop the \"Which Fwd/Rev Pairs to Show in\n\ Assembly View\" Window. Click on \"All\" next to \"Show Inconsistent\n\ Forward/Reverse Pairs\". Then click \"Apply\" at the bottom of this\n\ window. In this particular example, you will see just a few more stray red\n\ lines. In a real example, you would probably see so many red lines\n\ that it would be a mess. In most cases those inconsistent\n\ forward/reverse pairs would be caused by some laboratory problem\n\ (turning a plate around, mislabelling, etc.) rather than by a\n\ misassembly. Thus I suggest that you generally leave \"Show\n\ Inconsistent Forward/Reverse Pairs\" set to \"Filtered\".\n\ \n\ 9.99) Still in the \"Which Fwd/Rev Pairs to Show in Assembly View\"\n\ Window, click on \"Show each consistent fwd/rev pair within contigs\"\n\ (so the button looks as though it is pushed in) and click \"Apply\".\n\ This will show a blue (or purple if there is more than one at a\n\ location) square for each consistent forward/reverse pair within a\n\ contig. The horizontal position of the square is the center of the\n\ subclone (midway between the forward and reverse read) and the\n\ vertical position of the square indicates the size of the subclone\n\ (higher means a larger subclone).
If you really want to see the\n\ position of the forward and reverse reads, you can do that too: Click\n\ on \"Show legs on squares for consistent fwd/rev pairs\" (\"Show each\n\ consistent fwd/rev pair within contigs\" must still be on) and click\n\ \"Apply\". What a mess! I believe most of this information is much\n\ more easily understood by just showing the \"consistent fwd/rev pair\n\ depth\" (the bright green graph described above). But it is your\n\ choice. When you want to highlight a consistent fwd/rev pair, you\n\ must point to the square--not the legs. Try it so you understand.\n\ \n\ 9.100) Suppose you have an assembly and there are some forward/reverse\n\ pairs that you specifically do not want to see in the Assembly View\n\ Window. For example, perhaps they are from a plate that was misnamed\n\ (or turned around) or from a library that is somehow less reliable.\n\ By hiding these forward/reverse pairs, the more reliable/important\n\ ones can more easily be seen. This is how you can do that:\n\ \n\ In the \"Which Fwd/Rev Pairs to Show in Assembly View\" Window, notice\n\ the line that says:\n\ Do not show templates in file doNotShowInAssemblyView.fof\n\ \n\ Underneath this are 3 buttons; the one that is selected is probably\n\ \"show all templates\". Try clicking \"do not show specified templates\"\n\ and click \"Apply\". See whether anything changes in which\n\ forward/reverse pairs are displayed. If not, switch back and forth\n\ between \"show all templates\" and \"do not show specified templates\",\n\ each time clicking \"Apply\". When you see a line that appears and\n\ disappears, click on it to find out which template it is. For example,\n\ djs736a2_fp04q146 is one such template.
Then from an xterm in the\n\ assembly_view/edit_dir directory, type:\n\ \n\ more doNotShowInAssemblyView.fof\n\ \n\ You will see the names of the templates that are hidden when \"do not\n\ show specified templates\" is selected.\n\ \n\ In order to hide particular forward/reverse pairs, put the names of\n\ their templates into this file. This file can also contain the character '*', which means\n\ \"match any characters\". For example, djs736a1_fp* would match the template\n\ \n\ djs736a1_fp04q206\n\ \n\ but not\n\ \n\ djs736a2_fp01q127\n\ \n\ \n\ 9.101) Try turning on/off each of the Fwd/Rev Pair options so you\n\ understand them. (In this example, there are no \"consistent fwd/rev\n\ pairs between different scaffolds.\")\n\ \n\ 9.102) SEQUENCE MATCHES\n\ \n\ Notice the curvy orange lines connecting Contig1 with Contig2 and\n\ Contig3. These show sequence matches. Point at the one connecting\n\ Contig1 and Contig2 and click on it. A \"Sequence Matches\" box will\n\ pop up saying that this match has 119 bases and has a similarity of\n\ 90.8%. Click on the line describing this match in that box so its background turns black. Then click\n\ on the button \"Show Alignment\". Up will pop the Compare Contigs\n\ Window with the alignment shown in the lower half of this window. You\n\ "; static char szReadMe31[] = "\n\ will learn more about this later (see \"JOIN CONTIGS\"). For now,\n\ dismiss this window.\n\ \n\ 9.103) In the Assembly View Window, click on \"What to Show\" and then when\n\ the menu pops up, click on \"Sequence Matches\". In the \"Which Sequence\n\ Matches to Show in Assembly View\" Window, try clicking off \"ok to show\n\ sequence matches between contigs\". Then click the \"Apply\" button.\n\ You should see the orange lines disappear. (Any highlighted lines\n\ will not disappear.) Click \"ok to show sequence matches between\n\ contigs\" back on and click \"Apply\", and the lines should reappear.\n\ \n\ 9.104) Also in the \"Which Sequence Matches to Show in Assembly View\"\n\ Window, change the minimum similarity from 90 to 85.
Click \"Apply\".\n\ You should see a lot more orange curvy lines, and now you should also\n\ see black curvy lines. If you look carefully, you will see that the 2\n\ lines within each pair of orange curvy lines do not cross each other,\n\ but the 2 lines within each pair of black curvy lines do. This is\n\ because orange is used to show direct repeats and black is used to\n\ show inverted repeats (relative to the orientation of the contigs in\n\ the Assembly View Window).\n\ \n\ 9.105) Also in the \"Which Sequence Matches to Show in Assembly View\"\n\ Window, click on \"filter seq matches by size\" and set the min size to\n\ 400 and the max size to some huge number (e.g., 1000000), leave\n\ minimum similarity at 85, and click \"Apply\". You will see just one\n\ direct repeat (orange curvy lines) of size 745.\n\ \n\ 9.106) Try some of the other ways of filtering the sequence matches in the\n\ \"Which Sequence Matches to Show in Assembly View\" Window.\n\ \n\ \n\ 9.107) You must learn this step if you are ever going to see sequence\n\ matches with your own data, so don't skip this step. If you have\n\ problems, it is likely that the phred/phrap/consed package has not\n\ been installed correctly and you will need help from your system\n\ administrator. Exit Consed and look at the files in\n\ assembly_view/edit_dir.\n\ \n\ Notice there is a file: assembly_view.fasta.screen.ace.1.aview\n\ \n\ This is what Consed uses to show sequence matches in the Assembly\n\ View Window.\n\ \n\ When you use your own data, you will not have this file, so you will\n\ need to learn how to create it.
Hide it from Consed by typing the following (in practice\n\ you will never do this step--this is just to simulate the .aview file\n\ not being there):\n\ \n\ mv assembly_view.fasta.screen.ace.1.aview assembly_view.fasta.screen.ace.1.aview_hide\n\ \n\ \n\ Now restart Consed and select the ace file\n\ assembly_view.fasta.screen.ace.1\n\ \n\ If you are asked whether you want to apply edits, click the \"No\" button.\n\ \n\ Click on \"Assembly View\" in the Consed Main Window.\n\ \n\ You will get the error message:\n\ \n\ Sequence matches will not be shown in Assembly View because there is\n\ no file\n\ assembly_view.fasta.screen.ace.1.aview\n\ If you want sequence matches to be shown, click on \"What to show:\n\ Sequence Matches\" and then \"run cross_match\"\n\ \n\ 9.108) RUNNING CROSS_MATCH FOR SEQUENCE MATCHES\n\ \n\ As the error message says, click on \"What to show\" and then\n\ when the popup menu appears, click on \"Sequence Matches\" and then when\n\ the \"Which Sequence Matches to Show in Assembly View\" Window comes up,\n\ click on the \"Run Cross_Match\" button.\n\ \n\ Watch the action in the xterm. There should be several pages' worth of\n\ output from cross_match that scrolls by in the xterm. If you get an\n\ error, it is likely that the phred/phrap/consed package is not\n\ correctly installed. You (or your system administrator) should track\n\ down the problems and correct them.\n\ \n\ If you are successful, then 3 orange pairs of curvy lines will appear\n\ in the Assembly View Window--the same as you saw in the steps above.\n\ \n\ \n\ \n\ 9.109) PULLING OUT READS AND RE-ASSEMBLYING THEM (MINIASSEMBLIES)\n\ \n\ When the Assembly View Window indicates (using forward-reverse pair\n\ information) that there is a misassembly, Consed provides the tools to\n\ correct that misassembly: you can first pull out the misassembled\n\ reads from their current contigs into individual contigs, with a\n\ single read per contig.
Then you can reassemble those new contigs\n\ that each contain a single read. Let's do this:\n\ \n\ \n\ 9.110) In the Assembly View Window, move your cursor so that the red and\n\ purple forward/reverse pair lines turn yellow. You will be unable to\n\ get them all yellow, but get as many as you can. Then click with the\n\ left mouse button. A window labelled \"Clicked Fwd/Rev Pairs\" should\n\ appear with a very long list of reads in it (around 53 reads).\n\ \n\ "; static char szReadMe32[] = "\n\ 9.111) In the \"Clicked Fwd/Rev Pairs\" Window, click on the button labelled\n\ \"Pull out reads\". A window labelled \"Put Reads into Their Own Contigs\"\n\ should appear.\n\ \n\ 9.112) In the \"Put Reads into Their Own Contigs\" Window, select all of\n\ the reads. You can do that by clicking with the left mouse button on\n\ the first read and then scrolling down to the bottom of the list of\n\ reads, holding down the shift key and clicking with the left mouse\n\ button on the last read. (When a read is selected, its background\n\ should be black.) Click on the button \"Remove Highlighted Reads\".\n\ The Assembly View Window will close and reopen after a few seconds and\n\ will complain about not being able to show sequence matches. Save the\n\ assembly (see \"SAVING THE ASSEMBLY\" above) and follow the instructions\n\ in \"RUNNING CROSS_MATCH FOR SEQUENCE MATCHES\" (above).\n\ \n\ The assembly will now probably contain 4 contigs with more than one read: 2-3-1c in one scaffold\n\ and 4 in the other. That is because when the misassembled reads were\n\ pulled out of Contig1, it fell apart into two contigs: the new Contig1\n\ and Contig4. The reads you pulled out are now in Contig5,\n\ Contig6, ... and approximately Contig58, each of which contains only a\n\ single read.\n\ \n\ 9.113) MINIASSEMBLIES\n\ \n\ On the Consed Main Window, click the button \"Miniassembly\". A box\n\ will pop up labelled \"Reassemble Some Contigs\".
On the left part of\n\ the box is a list of all contigs, from Contig1 to about Contig58. Notice\n\ that the contigs from Contig5 on contain only a single\n\ read. On the right will be Contig5 through approximately Contig58.\n\ You can add contigs to or delete contigs from the list on the right. For example, to delete\n\ Contig5 from the list on the right, click on it, and then click \"Clear\n\ Highlighted\". The right list should now only contain Contig6 through\n\ the last contig. Add Contig5 back to the right list by clicking on\n\ Contig5 in the left list and then clicking on the button labelled\n\ \"Move Highlighted to Right\". Contig5 will now appear at the bottom of\n\ the list on the right.\n\ \n\ 9.114) Leave all of these boxes blank: \"-minscore\", \"-minmatch\",\n\ \"-forcelevel\", and \"other phrap options:\". Keep \"Put into separate\n\ contigs\" selected rather than \"Discard from assembly\". Click the\n\ \"Reassemble\" button. If you haven't saved the assembly, a box will\n\ pop up saying \"Error You must first save the assembly before making a\n\ miniassembly\". Follow the instructions you learned above (\"SAVING THE\n\ ASSEMBLY\") to save the assembly. Then click the \"Reassemble\" button\n\ again and watch the action in the xterm. Lots of output from\n\ determineReadTypes.perl, phrap, and cross_match will scroll by in the xterm\n\ as those programs run. (If it doesn't, you haven't correctly\n\ installed all of the Consed package.)\n\ \n\ 9.115) When the miniassembly is complete, a box will pop up asking\n\ \"Are you finished miniassemblying these contigs?\" Click the \"Yes\"\n\ button.\n\ \n\ 9.116) On the Consed Main Window, click the \"Assembly View\" button.\n\ Consed will complain about not being able to show Sequence Matches, so\n\ save the assembly and follow the instructions in \"RUNNING CROSS_MATCH\n\ FOR SEQUENCE MATCHES\" (above). In the Assembly View Window, in\n\ addition to Contig1, Contig2, Contig3, and Contig4, you should see a\n\ few more contigs.
These are the result of the miniassembly of all\n\ those individual reads.\n\ \n\ \n\ 9.117) CONTIG ARRANGEMENT--REORDER CONTIGS\n\ \n\ Contigs are arranged by Consed into \"scaffolds\" using forward/reverse pair\n\ information. However, you might have some external information (such\n\ as digest information) that indicates a different arrangement. You\n\ can use Consed to rearrange the contigs. This new arrangement will be\n\ preserved even if you reassemble.\n\ \n\ 9.118) Exit Consed and then restart Consed.\n\ \n\ Double click on \"assembly_view.fasta.screen.ace.1\"\n\ \n\ (If a window pops up saying \"There is an edit history file ( a .wrk\n\ file )...\", click the \"No\" button.)\n\ \n\ Click on the \"Assembly View\" button. You will see two scaffolds: one\n\ on the top row with Contig2 and Contig3, and one on the bottom row\n\ with just Contig1. Now suppose that you believe that Contig2 and\n\ Contig1 are connected together instead of Contig2 and Contig3. To make\n\ this change:\n\ \n\ 9.119) Within the Assembly View Window, click on the \"Contig Arrangement\"\n\ button. A menu will pop up. Click on \"Reorder Contigs\". A \"Reorder\n\ Contigs\" Window will pop up. Enter the following information:\n\ \n\ Contig: 2 [Right End] connected to Contig: 1 [Left End]\n\ \n\ That is, you must enter \"2\" and \"1\" in the contig boxes, and you must\n\ click on the first \"right end\" button. \n\ \n\ Then click on the \"Add and Restart Assembly View\" button. A warning\n\ box will pop up telling you that you are crazy, because there are 12\n\ forward/reverse pairs as evidence that the scaffold as displayed in\n\ the Assembly View Window is already correct. Click on \"yes\"--that you\n\ are sure.\n\ \n\ Well, that isn't quite what you wanted. Contig2 and Contig3 are\n\ still together.
So connect the other end of Contig1:\n\ "; static char szReadMe33[] = "\n\ \n\ Contig: 1 [Right End] connected to Contig: 3 [Left End]\n\ \n\ Then click on the \"Add and Restart Assembly View\" button. A warning\n\ box will pop up again. Click on \"yes\"--that you are sure.\n\ \n\ The Assembly View Window will disappear for a second and reappear,\n\ with Contig2 and Contig1 connected together, just as you wanted.\n\ \n\ 9.120) CONTIG ORIENTATION\n\ \n\ Some users want a scaffold oriented a particular way. For\n\ example, one user might be working on a particular gene and so want to\n\ always view the top strand of that gene. Another user might be\n\ finishing a BAC and want a particular end of the BAC on the left of\n\ the scaffold. The assembler (such as Phrap), however, may not respect\n\ their wishes and might have contigs complemented from the way the\n\ users want to view them. Consed provides a way for the user to\n\ indicate his/her desired orientation, and thereafter if phrap\n\ complements a contig from that desired orientation, Consed will\n\ complement the contig back when Consed starts up.\n\ \n\ To demonstrate this, exit Consed and then restart Consed.\n\ \n\ Double click on \"assembly_view.fasta.screen.ace.1\"\n\ \n\ In the Consed Main Window, double click on Contig1. You will see read\n\ djs736a2_fp02q494.y1 pointing left. But let's suppose that you would\n\ rather the contig be in the other orientation, with read\n\ djs736a2_fp02q494.y1 pointing right. \n\ \n\ In the Consed Main Window, click on Assembly View. Then click on the\n\ button labelled \"contig arrangement\". When a popup menu comes up,\n\ click on \"Reorient Contigs\". The \"Reorient Contigs Window\" should\n\ come up. Highlight the scaffold labelled \"1\" under \"Select a\n\ scaffold\". Click on \"flip scaffold\". Then push the button labelled\n\ \"Apply and Restart Assembly View\". There will be an error box\n\ complaining about not being able to show sequence matches.
To fix\n\ that, save the assembly and follow the instructions in \"RUNNING\n\ CROSS_MATCH FOR SEQUENCE MATCHES\" (above). In the Consed Main Window,\n\ double click on Contig1 so the Aligned Reads Window comes up. Scroll\n\ to the right end. You will notice that djs736a2_fp02q494.y1 is now on \n\ the right end pointing right. \n\ \n\ What is the difference between doing this and just complementing the\n\ contig, which requires only the click of a button? The difference is\n\ that complementing the contig will be undone the next time phrap runs\n\ (when you reassemble), but the orientation set by this procedure is\n\ permanent, even if phrap complements the contig.\n\ \n\ 9.121) RESTRICTION FRAGMENTS\n\ \n\ We'll look at this feature in Assembly View after we've learned how to \n\ use the Restriction Fragment Window.\n\ \n\ 9.122) USING ANOTHER PROGRAM TO FIND CONSENSUS SITES (SUCH AS POLYMORPHIC SITES)\n\ \n\ [POLYPHRED]\n\ \n\ This example applies not just to polyphred, but to any program that\n\ finds particular positions on the consensus for you.\n\ \n\ Polyphred is a program for finding polymorphic sites; it was developed by\n\ Debbie Nickerson's group (contact them at http://droog.mbt.washington.edu).\n\ \n\ We have a test database, 'polyphred', which has had polyphred run on\n\ it already. Polyphred has put a polymorphism tag on each polymorphic\n\ site.
\n\ \n\ If Consed is running, exit it.\n\ \n\ Type:\n\ \n\ cd polyphred/edit_dir\n\ \n\ (You might need to first type \"cd ../..\" depending on which directory\n\ you are currently in.)\n\ \n\ ls\n\ \n\ Restart Consed.\n\ \n\ Double click on example2.fasta.screen.ace.1\n\ \n\ When Consed comes up, you should see 2 contigs.\n\ Double click on Contig2\n\ \n\ In the Aligned Reads Window, push the left mouse button while pointing\n\ to the 'Navigate' menu and release on: \n\ \n\ 'Toggle feature: when navigating to consensus location, pop up all\n\ traces (currently off)' \n\ \n\ That will turn this feature on.\n\ \n\ Now push the left mouse button while pointing to the 'Navigate' menu\n\ and release on 'Tags'. A list of tag types should pop up. Double\n\ click on 'polymorphism'. Polyphred has already been run, so the\n\ consensus is tagged with polymorphism tags at each polymorphic site. \n\ A window labelled 'Polymorphism Tags' will pop up with a list of\n\ "; static char szReadMe34[] = "\n\ sites. Click on 'Next'.\n\ \n\ If you correctly followed the instructions above, all the traces should\n\ pop up at the first polymorphic site. You may want to reposition the\n\ traces window to see it better. \n\ \n\ Now ignore the original 'Polymorphism Tags' window and instead click\n\ on 'Next' in the *traces* window. This will take you to the next\n\ polymorphic site. Pretty nice, huh?\n\ \n\ Many labs write programs that apply tags to the consensus, and then\n\ their staff uses Consed to review those sites using the procedure\n\ above.\n\ \n\ \n\ \n\ 9.123) NAVIGATING\n\ \n\ If Consed is running, exit it.\n\ \n\ Type:\n\ \n\ cd standard/edit_dir\n\ \n\ (You might need to type \"cd ../..\" first depending on which directory\n\ you are currently in.)\n\ \n\ Restart Consed.\n\ \n\ Double click on standard.fasta.screen.ace.1\n\ \n\ Double click on Contig1\n\ \n\ In the Aligned Reads window, pull down the Navigate menu and\n\ release on 'Low consensus quality'.
You will see a list of locations.\n\ Move the 'Low consensus quality' window down so you can see the\n\ Aligned Reads window. \n\ \n\ Repeatedly click on 'Next' until you reach the end of the list. (Low\n\ consensus quality means an area in which the bases have too high a\n\ probability of being wrong.) This saves you from having to look\n\ through large amounts of high quality data trying to find problem\n\ areas.\n\ \n\ There are 2 'Next' buttons--one on the Aligned Reads Window and one on\n\ the Low Consensus Quality Window. You can click on either, but it is\n\ probably more convenient to use the 'Next' button on the Aligned Reads\n\ Window. That way you can keep the Aligned Reads Window in\n\ front with input focus and keep the Low consensus quality window\n\ pushed out of the way.\n\ \n\ You may want to click on the 'Save' button in the Low consensus\n\ quality Window to save a copy of this list of problem areas to a file\n\ as you work through them.\n\ \n\ In our experience, this will be the most important navigate list you\n\ will use. In fact, finishing consists largely of adding reads\n\ and rephrapping until this list is reduced to nothing.\n\ \n\ 9.124) Dismiss the Low consensus quality window. Pull down the\n\ 'Navigate' menu again and release on 'High quality discrepancies as\n\ above, but omitting tagged compressions and G_dropouts'. You will\n\ probably notice there are no entries (unless you created some yourself\n\ by editing). That is because there are no high quality discrepancies\n\ in this dataset. So let's force there to be some by lowering the\n\ quality threshold. First, dismiss the High quality discrepancies\n\ window.\n\ \n\ Click on 'Find Main Win'. In the Consed Main Window, pull down the\n\ 'Options' menu and release on 'General Preferences'. Notice that the\n\ default for 'Threshold for High Quality Discrepancy' is 40.
Change it\n\ to 15 and click 'Apply & Dismiss'.\n\ \n\ Then follow the steps above to bring up the High quality discrepancies\n\ window. Now you will see several entries. Click 'next' repeatedly to\n\ go successively to the next high quality discrepancy in the Aligned\n\ Reads Window.\n\ \n\ You can also double click on a particular line in the High quality\n\ discrepancies window to go to that location. Alternatively, you can\n\ single click on a line and then click the 'Go' button.\n\ \n\ Dismiss the High quality discrepancies window.\n\ \n\ \n\ 9.125) Similarly, try the other navigate lists: Unaligned high quality\n\ regions (this list will be empty with this data set), Edits, Regions\n\ covered by only 1 strand and only 1 chemistry, and Regions covered by only 1\n\ subclone.\n\ \n\ Unaligned high quality regions are regions in which the traces are\n\ high quality, so the base calls are not in question, but the region\n\ differs so much from other reads that phrap has given up trying to\n\ align the region with the consensus. This could be due to a chimeric\n\ read, or perhaps the read belongs somewhere else.\n\ \n\ We believe that regions covered by only 1 subclone should be covered\n\ by a 2nd subclone to guard against the possibility of a deletion\n\ in the single subclone.\n\ \n\ "; static char szReadMe35[] = "\n\ There are so many different problem lists that you may forget to check\n\ one of them and thus miss a serious problem. For this reason, we combined\n\ them all into a single list. This is the first menu item: 'Low Cons/High\n\ Qual Discrep/Single Stranded/Single Subclone/Unaligned High'. We\n\ suggest you use this list.\n\ \n\ 9.126) Also try navigating by tags by selecting 'tags' under the Navigate\n\ menu: when the Select Tag Type Window appears, double click on 'compression'.\n\ (Note that you can't do anything else until you deal with this\n\ window.)
This gives a list of all tags of a particular type in a particular\n\ contig.\n\ \n\ 9.127) There is also a way of getting such a list in *ALL* contigs: Click\n\ on 'Find Main Win'. In the Consed Main Window, point to the\n\ 'Navigate' menu, hold down the left mouse button, and release on 'Tags\n\ in all contigs'. You can continue as in the previous step. (Since\n\ there is only one contig, this list will not be any different from the\n\ corresponding list for Contig1.) However, this actually gives you a\n\ lot more power: you can search not just for one tag type, but for\n\ several. Try clicking on several tag types and then\n\ clicking 'ok'. (The only problem is that there are only compression\n\ tags in this assembly. When you learn how to create tags below, you\n\ will be able to search for multiple tag types.) \n\ \n\ To speed up selecting tag types, do this: type 'heter' in the box and\n\ click 'select'. You will notice that all of the heterozygote tags are\n\ selected. Then click 'ok' to find them all.\n\ \n\ \n\ 9.128) CUSTOM NAVIGATION\n\ \n\ In the Main Window, there is also a Navigate menu. Pull it down and\n\ release on the Custom Navigation menu item. A box will pop up saying\n\ \n\ 'Select custom navigation file:' \n\ \n\ There will be a file:\n\ \n\ custom_navigation.nav\n\ \n\ Double click on it.\n\ \n\ You will see the now-familiar custom navigation box. Click 'Next'\n\ repeatedly until you get to the end of the list.\n\ \n\ This list of locations is chosen by some program other than Consed.\n\ Many labs write such programs themselves. This allows a human to\n\ quickly review the sites the program has chosen. If your lab is\n\ interested in writing such a program, see HOW TO WRITE A\n\ CUSTOM NAVIGATION FILE below.\n\ \n\ \n\ \n\ \n\ 9.129) PRIMER-PICKING\n\ \n\ \n\ Go to some location near the right end of the contig, say base\n\ 2470.
Click with the right mouse button on the consensus and click on\n\ either one of the top strand primer choices (either from subclone\n\ template or from clone template). Consed will pause a moment, and\n\ then a selection of primers that pass all of Consed's requirements\n\ will appear. (If you get an error message, Consed might not\n\ have been correctly installed. See INSTALLING CONSED above.)\n\ Templates are also chosen for each primer. You may have to scroll the\n\ primer list to the right to see the templates. Consed lists these\n\ templates in order of quality--all of them will cover the read you\n\ want to make.\n\ \n\ 9.130) Double click on one of the primers in the Primers Window. That\n\ will cause the Aligned Reads Window to scroll to show that oligo in\n\ context. Click on 'Accept Primer'. A comment box will pop up. Enter\n\ some comment and click 'OK'. Notice that a yellow oligo tag, with a\n\ little red end, is created on the consensus for that primer. The red\n\ end points in the direction of the oligo.\n\ \n\ 9.131) Point to the yellow tag, press down the right mouse button, and then\n\ release on 'Tag: oligo ... show more info?' A box will pop up with\n\ much information about the oligo--all you need to order that oligo and \n\ do the reaction. Notice the field: 'Oligo name'. The name should be \n\ something like 'standard.1'.\n\ \n\ 9.132) If you can't find the oligo, you can find it again using its name. In \n\ the Main Consed Window, point to the \"Navigate\" menu, push down the\n\ left mouse button and release on \"Search for oligo tags by name\". A\n\ box will pop up saying \"Search for Oligo Tags\". Enter \"standard.1\"\n\ (or whatever the name is). Click \"search\". The Aligned Reads Window\n\ will scroll to the location of that tag.
(To see this, you must first \n\ scroll the Aligned Reads Window so you can't see the tag.)\n\ \n\ 9.133) To check whether the primer matches some other location in the\n\ assembly, do the following: As before, in the Aligned Reads Window,\n\ point to the yellow tag, press down the right mouse button, and then\n\ release on 'Tag: oligo ... show more info?' A box will pop up. Click\n\ on the button labelled 'search for oligo bases'. \n\ \n\ Note that Consed's primer picking will generally (there are some\n\ exceptions) not pick primers that match to more than one location.\n\ However, if you have added more information and/or reassembled since\n\ that primer was picked, there could now be another location that the\n\ "; static char szReadMe36[] = "\n\ primer matches.\n\ \n\ I would suggest you just accept the first primer in the list.\n\ However, if you want to understand the differences, here is the\n\ explanation (if you want more information, see Gordon 1998 listed in\n\ the Consed references).\n\ \n\ ------matches------- min\n\ self  false  vector  qua\n\ 4     22     13      50\n\ \n\ \"4\" is a measure of the primer's match to itself (forming a loop) or to\n\ another copy of itself (forming a primer-dimer), making it less available\n\ for priming. Bigger is worse.\n\ \n\ \"22\" is a measure of a match to some other location (not the location\n\ you want) on the template. Bigger is worse.\n\ \n\ \"13\" is a measure of the match to the vector sequence(s) that are in\n\ the vector files. Bigger is worse. Typically the vector files are:\n\ /usr/local/genome/lib/screenLibs/primerSubcloneScreen.seq\n\ or\n\ /usr/local/genome/lib/screenLibs/primerCloneScreen.seq\n\ but there are .consedrc parameters that allow these files to be some\n\ place else.\n\ \n\ \"50\" is the minimum consensus quality of the primer.
Bigger is better\n\ because it gives you greater confidence that the primer sequence is\n\ correct at the location you want to prime.\n\ \n\ When picking primers (above), what is the difference between 'Pick\n\ Primer from Subclone Template' and 'Pick Primer from Clone Template'?\n\ \n\ There are 3 differences: \n\ \n\ A. Which vector file the primers are screened against. In the former\n\ case, the primer is screened against the file primerSubcloneScreen.seq\n\ and in the latter case against the file primerCloneScreen.seq. \n\ \n\ B. In checking for false matches elsewhere in the assembly, if the\n\ template is the whole clone, then Consed must check for false matches\n\ in the *entire* assembly, including all other contigs. But if the\n\ template is just going to be a subclone, Consed only needs to check\n\ elsewhere in that subclone. Actually, to be conservative, Consed\n\ checks for false matches within +/- the maximum insert size of a subclone.\n\ \n\ C. If you are picking primers for a subclone template, then the primer\n\ picker can also pick the subclone templates. If it doesn't find any\n\ suitable subclone template, it will reject the primer. (By default,\n\ picking of subclone templates is turned on. If you prefer to pick\n\ your own templates, and want Consed's primer picker to be much faster,\n\ you can turn it off temporarily or permanently. To turn it off\n\ temporarily, go to the Consed Main Window, point to the Options menu,\n\ hold down the left mouse button and release on 'Primer Picking\n\ Preferences'. Scroll down to 'Pick Subclone Templates for Primers'\n\ and click 'False'. Click on 'Apply and Dismiss'. To change this\n\ permanently, see CONSED CUSTOMIZATION below. Beware: you must\n\ correctly customize determineReadTypes.perl for template picking to\n\ work.
See INSTALLING CONSED above.)\n\ \n\ If you are interested in the details of primer-picking, type:\n\ \n\ consed -printDefaultResources\n\ \n\ which will tell you the primer-picking parameters and what they do.\n\ \n\ \n\ 9.134) CHECKING WHETHER A PARTICULAR OLIGO WOULD MAKE AN ACCEPTABLE PRIMER\n\ \n\ You can check this as follows:\n\ \n\ In the Aligned Reads Window, point to the 'Misc' menu, hold down the\n\ left mouse button and release on 'Check Primer'. Enter the left and\n\ right consensus positions of the primer, check which strand, and\n\ whether the primer is to use subclone templates or the whole clone as\n\ a template. For example, type 2340 for left and 2360 for right,\n\ select \"->\" (top strand) and subclone. Then click \"Check Primer\".\n\ A box \"What is Wrong With This Primer\" will pop up telling you what is\n\ and is not acceptable about this primer.\n\ \n\ \n\ 9.135) PICKING PCR PRIMER PAIRS\n\ \n\ In the Aligned Reads Window, go to the location where you want to pick\n\ the first PCR primer, say base 500. Point to the consensus, hold down\n\ the right mouse button and release on 'Top Strand PCR Primer'. Then\n\ scroll to the location where you want to pick the second PCR primer,\n\ say base 2200. Point to the consensus, hold down the right mouse\n\ button and release on \"Bottom Strand PCR Primer\". There will be a\n\ pause and then there will be a list of PCR primer pairs. Click on the \n\ pair you want and click \"Accept Pair\". \n\ \n\ You can modify the parameters for choosing PCR primer pairs by going\n\ to the Consed Main Window, pointing to \"Options\", holding down the\n\ left mouse button, and releasing on \"Primer Picking Preferences.\" For\n\ example, by default Consed does not display all PCR primer pairs--this\n\ would take too long and give you too many. However, you can ask it to\n\ show you all such pairs. 
In the Primer Picking Preferences, scroll\n\ down to \"Check All PCR Pairs (huge) or Just Sample?\" and click on\n\ \"All\". Then click on \"Apply and Dismiss\". Then pick PCR primers\n\ "; static char szReadMe37[] = "\n\ again, as above. Don't be surprised if you get 10,000 or more pairs\n\ of primers!\n\ \n\ (PCR primers are screened for melting temperature and length; the\n\ melting temperatures of the 2 primers must be sufficiently close to\n\ each other; each primer must not stick to itself or to the other\n\ primer; there must be no mononucleotide repeats and only ACGT's (no n's\n\ or ambiguity codes); and the primer pair must not amplify any other\n\ location. There are many more details...)\n\ \n\ \n\ 9.136) ORDERING OF PRIMERS\n\ \n\ I heard of a finisher who manually ordered 72 primers. She had to\n\ cut/paste the bases of each primer. That is not only painful, but\n\ also error prone. I've supplied a script that you can use to save\n\ all the primers you have selected to a file.\n\ \n\ The primers are saved in the ace file when you exit Consed, so\n\ exit Consed.\n\ \n\ The script is ace2Oligos.perl. It takes as command line arguments the\n\ name of an ace file and the name of the primer file. The primer file\n\ is a list of primers that have been ordered for that particular\n\ project, and looks like this:\n\ \n\ name=G1980A181.1\n\ sequence=ctgcatggctaggga\n\ template=seq from subclone\n\ date=980427 temp=52\n\ \n\ name=G1980A181.2\n\ sequence=tcttactttctgactttcattt\n\ template=seq from clone\n\ date=980427 temp=50\n\ \n\ ace2Oligos.perl finds all oligo tags in the ace file and makes sure\n\ that all of them are in this primer file.\n\ \n\ ace2Oligos.perl does not record the comments that the finisher entered\n\ when creating the oligo.
If you want to record that as well, you\n\ could use the script ace2OligosWithComments.perl, which was written by\n\ a Consed user and is therefore found in the 'contributions' directory.\n\ \n\ \n\ \n\ 9.137) SEARCH FOR STRING\n\ \n\ Try the 'Search for String' button (left side of the Aligned Reads\n\ Window). Type in a string (such as aaaca), and click 'ok'. There\n\ should be a list of 'hits'. Double click on one of the hits (or\n\ single click on it and click on 'go'). Notice that the Aligned Reads\n\ Window scrolls to that position and has the cursor on the found\n\ string. (It might be complemented.)\n\ \n\ Dismiss this window. Try this again, only this time in the Search For\n\ String Window select 'Search Just Reads'. Then click 'OK'. You will\n\ notice there are many more hits. This is because this search shows hits\n\ in each read, even if they are at the same consensus position.\n\ \n\ You can also try the approximate match search for string by clicking\n\ on 'Approximate' instead of 'Exact'. The 'Per Cent Mismatch' only\n\ applies to the Approximate match search. \n\ \n\ 9.138) COPY AND PASTE\n\ \n\ In the Aligned Reads Window, swipe some bases by dragging across them\n\ with the left mouse button held down. You should see the bases turn\n\ yellow, at least temporarily. Then click the 'Search for String' button.\n\ Use the middle mouse button to paste the bases you have just swiped into\n\ the 'Query string:' box. Notice that you can swipe bases either from the\n\ consensus or from a read.\n\ \n\ The search is case-insensitive, so don't worry about whether the\n\ pasted bases are upper or lower case.\n\ \n\ \n\ 9.139) ADD NEW READS (SANGER--NOT SOLEXA OR 454)\n\ \n\ For this to work, your system administrator must have set up\n\ everything correctly. (See INSTALLING CONSED above.)
Assuming you\n\ have set everything up correctly, you can now experiment with adding\n\ reads.\n\ \n\ From a UNIX prompt, copy the new chromatograms into the chromat_dir\n\ directory:\n\ \n\ cp ../chromats_to_add/* ../chromat_dir\n\ \n\ Exit Consed and bring it up again using the original ace file\n\ standard.fasta.screen.ace.1 \n\ \n\ If it asks if you want to apply edits, just say 'no'.\n\ \n\ On the Main Window, click on the Add New Reads button. A list of\n\ files ending with .fof will appear. These are files that contain\n\ lists of chromatograms. Double click on 'reads_to_add.fof'. (Accept\n\ the defaults for the other options in this window.)\n\ \n\ There should be lots of progress output in the xterm from which you\n\ "; static char szReadMe38[] = "\n\ started Consed. When it completes, a Reads Added Window will\n\ pop up with a report of which reads were added. In this case, it\n\ should say that 9 reads were successfully added and list them.\n\ \n\ If you get an error message, look carefully at the full error message\n\ in the xterm to diagnose the problem. Probably there is some mistake\n\ in how you installed Consed. See INSTALLING CONSED (above).\n\ \n\ \n\ 9.140) TEAR CONTIG\n\ \n\ Just so you get the same results as I do, exit Consed and bring it up\n\ again using the original ace file\n\ \n\ standard.fasta.screen.ace.1 \n\ \n\ If it asks if you want to apply edits, just say 'no'.\n\ \n\ \n\ 9.141) When phrap really screws up, you may want to just tear the contig\n\ apart in several places and then join the pieces back together in a\n\ different way. Let's try it:\n\ \n\ Go to location 1500. Point the mouse at the consensus base at 1500\n\ and push the right mouse button down. Release the button on 'Tear\n\ Contig at This Consensus Position'. You will notice that in the\n\ Aligned Reads Window, 4 read names are now colored purple:\n\ djs74-996.s2, djs74-2689.s1, djs74-564.s1, and djs74-2931.s1.
The\n\ purple reads are Consed's suggestions of which visible reads will go\n\ into the new left contig. The reads that are not colored purple will\n\ go into the new right contig. If you click on a read name in the\n\ Aligned Reads Window, it will switch back and forth between purple and\n\ not purple. Leave everything as it is and just click 'Do Tear'. (If\n\ you want to play around with which reads go into which contig, do\n\ that another time.)\n\ \n\ Now you should have 2 Aligned Reads Windows on top of each other. One\n\ should contain 'Contig2' and the other 'Contig3'. Dismiss the little\n\ window that says 'Tear Complete'.\n\ \n\ \n\ 9.142) JOIN CONTIGS\n\ \n\ Now let's join these 2 contigs back together:\n\ \n\ \n\ Click on 'Search for String' and type in the following bases:\n\ agctgccatc\n\ \n\ Click 'OK'. \n\ \n\ Search for String should find 2 locations, one in Contig2 and one in\n\ Contig3:\n\ \n\ Contig2 (consensus) 1447-1456 (uncomplemented)\n\ Contig3 (consensus) 829-838 (uncomplemented)\n\ \n\ Double click on the first one. The Aligned Reads Window for Contig2\n\ will scroll to location 1447 and the window will be raised. In that\n\ Aligned Reads Window, click on 'Compare Cont'.\n\ \n\ Now double click on the 'Contig3' line in the above Search for String\n\ results. The Aligned Reads Window for Contig3 will scroll to location\n\ 829 and be raised. In that Aligned Reads Window, click on 'Compare\n\ Cont'.\n\ \n\ Now the Compare Contigs Window should be visible. In the Compare\n\ Contigs Window, try scrolling back and forth. You can change the\n\ cursors (blinking red), but if you do, please return them to the\n\ locations 1447 and 829 for the next step. The cursors 'pin' these\n\ bases together when doing an alignment. (The algorithm is a pinned\n\ and banded Smith-Waterman alignment.)\n\ \n\ Click on Align. Try scrolling the alignment by dragging the thumb in\n\ the lower half of the Compare Contigs Window.
An 'X' means there is a\n\ discrepancy between the 2 contigs. There is also a 'P' (see if you\n\ can find it!). The P indicates the bases that you pinned together.\n\ \n\ You will also notice that some bases are lighter and some are darker.\n\ This indicates quality just as in the Aligned Reads Window. You will\n\ notice that wherever there is a discrepancy (an 'X') one of the\n\ bases is low quality. This is your cue that the discrepancy is just a\n\ base calling error rather than an indication that the two contigs really\n\ come from different but similar locations.\n\ \n\ Click a few times on \"Next Discrepancy.\" Then click on \"Prev\n\ Discrepancy.\" These buttons will take you to each discrepancy (X).\n\ Notice that as you move from X to X in this manner, the Aligned Reads\n\ Windows scroll as well. Bring up traces for one of the contigs and\n\ see how the traces will scroll also.\n\ \n\ Click with the left mouse button on either contig in the bottom\n\ alignment. You will notice that both contigs will have the red\n\ blinking cursor in the same position. Click on 'Scroll Both Aligned\n\ Reads Windows' and look at the Aligned Reads Windows to see that they\n\ scroll to the corresponding positions.\n\ \n\ The number of discrepancies and the discrepancy rate are also\n\ displayed--find them.\n\ \n\ "; static char szReadMe39[] = "\n\ Finally click 'Join Contigs'. The 2 previous Aligned Reads Windows will\n\ disappear and there will be a new one which has a new contig\n\ 'Contig4'. You have made a join!\n\ \n\ Scroll left and right. You will notice that many of the reads are\n\ highlighted. These are the reads that came from the previous \"right\"\n\ contig. To unhighlight all of these reads at once, point to the\n\ \"Misc\" menu, hold down the left mouse button and release on\n\ \"Unhighlight All Reads\".\n\ \n\ It is possible to have more than one Compare Contigs Window up at a\n\ time.
This allows you to investigate a repeat that has more than 2\n\ copies.\n\ \n\ \n\ \n\ 9.143) COMPARE CONTIGS WINDOW AND INVERTED REPEATS\n\ \n\ In the above example, we used the Compare Contigs Window to\n\ examine a sequence match between two different contigs. It is also\n\ possible to use the Compare Contigs Window to examine a sequence\n\ match between two copies of a repeat within the same contig, either\n\ direct or inverted. \n\ \n\ 9.144) To see this, restart Consed:\n\ \n\ ../../consed_(computer type) \n\ Double click on standard.fasta.screen.ace.1\n\ \n\ When it says \"There is an edit history file (a .wrk file)...Do you\n\ want to apply those edits?\", click on \"no\".\n\ \n\ Double click on Contig1 to bring up the Aligned Reads Window. Go to\n\ position 69 (use the \"Pos:\" box described above). Click the \"Compare\n\ Cont\" button on the Aligned Reads Window. The Compare Contigs Window\n\ will pop up, but move it aside. Go to position 2035 in the Aligned\n\ Reads Window. Click the \"Compare Cont\" button again on the Aligned\n\ Reads Window. In the Compare Contigs Window there are two copies of\n\ Contig1--one on top and one on the bottom. Each has a \"complement\n\ just in this window\" button. Click on the bottom one (the one that\n\ has position 2035 blinking red). After clicking on it, you should\n\ notice that the numbers on the bottom contig are reversed so they\n\ decrease to the right--a copy of Contig1 has been reversed and\n\ complemented. Now click the \"Align\" button. Suddenly, you should see\n\ the alignment appear in the bottom half of the Compare Contigs Window.\n\ You should see bases 69-78 aligned against the reverse\n\ complement of bases 2026-2035.\n\ \n\ This shows how to explore an inverted repeat.
If you wanted to\n\ examine a direct repeat, you would use the same method except you\n\ wouldn't click on the \"complement just in this window\" button.\n\ \n\ Compare Contigs is one method of exploring joins of contigs that were\n\ not made by phrap. Another method is to use the Assembly View Window\n\ (above). They are designed to work together: the Assembly View Window\n\ gives a high level view of all sequence matches and takes you to the\n\ Compare Contigs Window which shows the alignment of a single sequence\n\ match and, if the user so desires, makes a join.\n\ \n\ \n\ 9.145) REMOVING READS\n\ \n\ You can remove individual reads and put them into their own\n\ contigs. For example, in the Aligned Reads Window, go to location\n\ 2000. Point to the read name of read djs74_2664.s1 and hold down the\n\ right mouse button. Release on 'Put read djs74_2664.s1 into its own\n\ contig.' Presto-chango! The read is put into its own contig and the\n\ old contig is redrawn without the read in it. At this point you\n\ should save the assembly--you should always save the assembly after\n\ removing reads.\n\ \n\ 9.146) You can also remove many reads at once.\n\ \n\ Look at the Consed Main Window. Click on \"Remove Reads\". Type into\n\ the \"File of read names:\" box \"reads_to_remove.fof\" and either push\n\ the \"Enter\" key or click on \"Read File\". You should see a list of 2\n\ reads:\n\ \n\ djs74-2231.s1\n\ djs74-3174.s1\n\ \n\ You can click back and forth between the choices of \"Delete Reads from\n\ Assembly\" and \"Just Put Each Read into Its Own Contig\". Try each\n\ one. \n\ \n\ \n\ Delete Reads from Assembly means that the read will no longer appear\n\ in Consed. When you are using your own data and you really want to\n\ remove reads from the assembly, you must also use the UNIX \"rm\"\n\ command to remove the corresponding phd files from phd_dir and the\n\ chromatograms from chromat_dir. 
Otherwise, the next time you run\n\ phredPhrap, the reads, like the phoenix, will rise again to become part of\n\ the next assembly.\n\ \n\ Notice that you can also remove all reads in a particular contig.\n\ \n\ After you have completed this exercise, restart Consed so that you\n\ have all the reads in their original locations for the following\n\ exercises.\n\ \n\ "; static char szReadMe40[] = "\n\ \n\ There is also a method of removing reads from a script without using\n\ Consed's graphical interface. See \"consed -removeReads\" below.\n\ \n\ \n\ \n\ 9.147) TAGS\n\ \n\ Bring up a trace for a read (as above). Swipe some bases on the\n\ 'edt' line while holding the middle mouse button down. A list of\n\ choices will pop up. Select 'Add Tag'. Type a comment in the box\n\ at the bottom, and select 'comment' from the list of tag types. You\n\ will now see a blue box both in the Aligned Reads Window and in the\n\ Traces Window on that read.\n\ \n\ To see the comment, you can just point to it in the Aligned Reads\n\ Window and you will see the comment in the lower right hand corner of\n\ the Aligned Reads Window. Alternatively, you can click on that blue\n\ tag in the Aligned Reads Window with the right mouse button and\n\ release on 'Tag: comment Show more info?'. Or you can\n\ click on the blue tag in the Traces Window with the right mouse\n\ button.\n\ \n\ Try creating some other kinds of tags: again swipe some bases in the\n\ Traces Window, this time selecting a different tag type. You will notice that\n\ different tags are in different colors. You can always use the\n\ methods above to see what kind of tag it is if you forget what a\n\ particular color means.\n\ \n\ Create a tag with the comment 'lazy fox'. Then in the Consed\n\ Main Window, push down the left mouse button on 'Navigate' and\n\ release on 'Search for tags/find string in comment'. In the box,\n\ enter 'fox' and click 'Search'. The tag should appear in a navigation \n\ window.
In this manner, you can find (and go to) all tags with 'fox'\n\ in the comment.\n\ \n\ You can also define your own tag types. See CREATING CUSTOM TAG\n\ TYPES below for how to do that.\n\ \n\ 9.148) CREATING LONG TAGS\n\ \n\ You can create really, really long tags as follows: First create a\n\ short tag, as above, starting where you want the long tag to start.\n\ Then find the consensus position where you want the tag to\n\ end. In the Aligned Reads Window, click on the short tag with the\n\ right mouse button and release on 'tag: show more info?' (as above).\n\ A Tag Window will appear for that tag. In the Tag Window, simply\n\ change the End Unpadded Consensus Position to the place you want it to\n\ end. Then click 'OK'. The tag will now be as\n\ long as you wanted.\n\ \n\ 9.149) CONSENSUS TAGS\n\ \n\ You can create tags on the consensus in the same way. In the\n\ Aligned Reads Window, use the middle mouse button to swipe some bases\n\ on the consensus. A list of tag types will pop up.\n\ Click on one of them. Try it again somewhere else. Try\n\ it with the tag type being 'comment'. In this case, you must enter a\n\ comment. Notice the pretty colors! If you forget which tag type a particular\n\ color represents, just point at the colored tag with the mouse and the\n\ tag type will be displayed at the bottom of the Aligned Reads Window.\n\ \n\ 9.150) Try creating some tags that overlap each other. You will notice\n\ that the overlapping region is purple.
If you want to know which\n\ tags overlap, you can use any of the methods already discussed.\n\ \n\ \n\ 9.151) WHAT THE COLORS MEAN\n\ \n\ At this point, you should know what each of the following colors\n\ means (the answer is further below--no peeking!):\n\ \n\ Dark grey background of a base vs very light background of a base\n\ Grey base with black background\n\ Red base\n\ Black base\n\ Colored area covering lower half of a base\n\ Purple area covering lower half of a base\n\ \n\ \n\ \n\ 9.152) SEARCH FOR READ NAME\n\ \n\ Restart Consed using the original ace file:\n\ \n\ standard.fasta.screen.ace.1 \n\ \n\ If it asks if you want to apply edits, just say 'no'.\n\ \n\ Instead of clicking on a read or contig name, type a read name into\n\ the \"Find reads containing (*'s allowed):\" box. If you want to look at\n\ the location containing read djs74-2689.s1, you can just type \"2689\"\n\ and press the \"Enter\" key, and Consed will immediately bring up the\n\ Aligned Reads Window with the cursor on read djs74-2689.s1. What if\n\ more than one read matches? For example, suppose\n\ you type \"26\" and then press the \"Enter\" key. This matches 3 reads:\n\ \n\ djs74-2689.s1\n\ djs74-2679.s1\n\ djs74-2664.s1 \n\ "; static char szReadMe41[] = "\n\ \n\ Try it and see what happens...\n\ \n\ Try entering \"26*9\" and see what happens. What does the \"*\" mean?\n\ \n\ Try using \"Find 1st read starting with:\". Try typing \"djs74-2\". You will\n\ notice that as you type each letter, the first item in the list that\n\ matches the letters typed so far is highlighted. Experiment with\n\ deleting a few letters and typing others. This is a powerful method\n\ of quickly getting to the read name you are interested in.
When you\n\ get to the name in the list, you do not have to type the rest of the\n\ name--just press the \"Enter\" key or click on 'OK'.\n\ \n\ \n\ 9.153) ONLINE DOCUMENTATION\n\ \n\ On the Aligned Reads Window or on the Consed Main Window, click on\n\ the 'Help' menu and release on 'Show Documentation'. You will see\n\ this document. You can search for keywords in it. It is also on the\n\ web. Go to http://bozeman.mbt.washington.edu/consed/consed.html, and\n\ find \"complete documentation\" near the bottom of the page.\n\ \n\ \n\ \n\ 9.154) THE .WRK LOG FILE\n\ \n\ Consed keeps a log of all changes you make to an assembly: adding\n\ new reads, putting reads into their own contigs, making joins and\n\ tears, adding and removing tags, and changing bases. This log is kept\n\ in a file ending with \".wrk\". You can use this file to help you\n\ remember exactly what you did to an assembly.\n\ \n\ \n\ 9.155) You should save your edits by pulling open the 'File' menu on the\n\ Aligned Reads Window and releasing on 'Save assembly'.\n\ \n\ \n\ 9.156) PROTEIN TRANSLATION AND OPEN READING FRAMES\n\ \n\ If you would like, you can see the amino acid translation of the\n\ consensus in all reading frames. In the Aligned Reads Window, push\n\ down the left mouse button on the 'Misc' menu and release on 'Show Top\n\ Strand Protein Translation'. Try again, but this time release on 'Show\n\ Bottom Strand Protein Translation'. Notice that 2\n\ characters are shown in magenta. What are those characters? Why\n\ are they shown in a different color? To hide the protein\n\ translation, push down the left mouse button on the 'Misc' menu and\n\ release on 'Don't show protein translation'.\n\ \n\ 9.157) You can search for open reading frames (a methionine codon followed\n\ by a stop codon in the same reading frame) within a contig. In the\n\ Aligned Reads Window, push the left mouse button on 'Navigate' and\n\ release on 'Search for Open Reading Frames'.
Notice that the open\n\ reading frames are shown for all 6 reading frames and are sorted by\n\ length.\n\ \n\ \n\ 9.158) Answer to What the Colors Mean (above)\n\ \n\ Greyscale of background indicates quality\n\ Grey base with black background--clipped off part of read (either due\n\ to low quality or due to alignment)\n\ Red base--discrepant with consensus\n\ Black base--agrees with consensus\n\ Colored area covering lower half of a base--tag (see TAGS above) \n\ Purple area covering lower half of a base--more than 1 tag covering a base\n\ \n\ \n\ \n\ ----------------------------------------------------------------------------\n\ \n\ 10. VARIOUS BATCH CONSED FEATURES\n\ \n\ Some features of consed are designed to be run from the command line\n\ rather than the graphical (point and click) interface. This allows\n\ these features to be run from scripts, automatically, and/or at night.\n\ \n\ 10.1) FIXING CONTIG-ENDS\n\ \n\ When you've added reads, consed does not automatically extend the\n\ consensus using the new data, so you end up with good-quality reads\n\ sticking out of the contigs. In addition, the existing consensus\n\ might be wrong and other reads near the ends of contigs may be\n\ misaligned.
In the past, users have fixed this by pulling out reads\n\ and rejoining them, and/or by bringing up traces and using \"change consensus\".\n\ Fixing hundreds of contig ends this way is tedious.\n\ \n\ We now have a feature that fixes contig ends in batch.\n\ To run this, type:\n\ \n\ consed -ace (ace file) -fixContigEnds\n\ \n\ (where \"consed\" is replaced by whatever command brings up consed on\n\ your system).\n\ \n\ This will reassemble (using phrap) each end of each contig, extending\n\ the consensus based on the consensus of each little assembly.\n\ \n\ If you don't want all ends of all contigs reassembled, you can\n\ restrict it in 2 ways:\n\ "; static char szReadMe42[] = "\n\ \n\ consed -ace (ace file) -fixContigEnds -contigEndsFOF desired_contig_ends.fof\n\ \n\ where desired_contig_ends.fof is a file that looks like this:\n\ \n\ Contig466 left\n\ Contig466 right\n\ \n\ You can also restrict fixing to contigs that have at least a minimum\n\ number of reads by putting the following into your .consedrc file (for\n\ information on how to change the .consedrc file, see EDIT PARAMETERS:\n\ HOW TO CHANGE CONSED/AUTOFINISH PARAMETERS elsewhere in this document):\n\ \n\ consed.fixContigEndsMinNumberOfReadsInContig: 5\n\ \n\ If you have a -contigEndsFOF, a contig end will only be fixed if it\n\ also meets the minimum number of reads filter (above).\n\ \n\ Note to old-timers: do not use the following any longer:\n\ \n\ consed.addNewReadsExtendConsensusUsingProtrudingNewReads: true\n\ \n\ \"consed -fixContigEnds\" supersedes the above parameter.\n\ \n\ \n\ 10.2) CHANGING THE CONSENSUS IN BATCH\n\ \n\ consed -ace (ace file) -changeConsensus (change file) \n\ \n\ where \"change file\" is a file with lines like this:\n\ \n\ Contig21 28-30 x \n\ \n\ where Contig21 is the contig, 28-30 are the unpadded positions, and x\n\ is the new base. \n\ \n\ You can also specify padded positions like this:\n\ \n\ Contig21 *35-*40 c\n\ \n\ where 35 and 40 are *padded* positions.
You might prefer using padded\n\ to unpadded positions if, for example, you have some pads in the\n\ consensus that you want to change to other bases. Unpadded positions\n\ would not be useful because they only refer to non-pad bases.\n\ \n\ 10.3) AUTOEDIT\n\ \n\ Autoedit is a program that will read an ace file, make edits according\n\ to which options you specify, and then write out a new ace file, all\n\ without any interaction from the user. Thus Autoedit can be run\n\ automatically at night, the same way you can run phredPhrap. Autoedit\n\ has various options that are controlled from the .consedrc file, just\n\ as the .consedrc file controls Autofinish.\n\ \n\ Run AutoEdit as follows:\n\ \n\ consed -ace (name of existing ace file) -autoEdit \n\ \n\ This will create another ace file with a version number one higher\n\ than the one you started with. If you want to specify a particular new ace\n\ file name, you can do it this way:\n\ \n\ consed -ace (old ace file) -autoEdit -newAceFileName (new ace file)\n\ \n\ \n\ Autoedit has the following options (if you do not specify any of\n\ these, autoedit will do nothing):\n\ \n\ \n\ consed.autoEditConvertCloneEndBasesToXs: true\n\ bool\n\ ! If true, will convert to X's the bases of all reads that protrude beyond a\n\ ! cloneEnd tag.\n\ ! (YES)\n\ \n\ consed.autoEditTellPhrapNotToOverlapMultiplyDiscrepantReads: true\n\ bool\n\ ! This will find all locations where there are multiple identical \n\ ! discrepancies with the consensus (and some other conditions) and try\n\ ! to make most of the reads quality 99 at that location so that phrap,\n\ ! next time it is run, will not overlap those reads. This will fix\n\ ! many misassemblies.\n\ ! (YES)\n\ \n\ consed.autoEditTagEditableLowConsensusQualityRegions: true\n\ bool\n\ ! This will find regions that are low quality but in which a human\n\ ! finisher could easily determine the correct base, so\n\ ! money could be saved by not having Autofinish suggest additional\n\ !
reads overlapping the region\n\ ! (YES)\n\ \n\ consed.autoEditRecalculateHighQualitySegmentsOfReads: false\n\ bool\n\ ! If true, will recalculate the high quality segments of the reads\n\ ! (YES)\n\ \n\ \n\ \n\ \n\ "; static char szReadMe43[] = "\n\ \n\ ----------------------------------------------------------------------------\n\ \n\ 11. ALIGNING SANGER READS TO A REFERENCE SEQUENCE\n\ \n\ If you are sequencing the same region over and over and you have a\n\ reference sequence, phrap may not be a good choice for creating an\n\ assembly: phrap will take a long time to run (since many reads match\n\ each other), phrap may make several contigs when you know there should \n\ be only one, and phrap may not put all the reads into the assembly.\n\ Consed provides an alternative to phrap. Use it as follows:\n\ \n\ Create an edit_dir, phdball_dir, and chromat_dir as usual. Put the\n\ reference sequence, in fasta format, into edit_dir.\n\ \n\ Type:\n\ \n\ fasta2Ace.perl reference.fa\n\ \n\ This will create an ace file for an assembly that just contains the\n\ single reference sequence. Run consed to view it and make sure you\n\ have followed each of these steps successfully so far.\n\ \n\ (If you have multiple reference sequences, put them all in\n\ reference.fa and run fasta2Ace.perl just as shown above.)\n\ \n\ Leave your example for a moment to see how to add Sanger reads to a\n\ reference sequence with the \"standard\" dataset.\n\ \n\ 11.1) cd to the standard/edit_dir directory, as in the beginning of the\n\ QUICK TOUR. 
\n\ \n\ 11.2) Follow the instructions under \"ADD NEW READS\" above, including\n\ cp ../chromats_to_add/* ../chromat_dir\n\ \n\ 11.3) Look at the file \"reads_to_add.fof\", which contains a list of the reads to be added.\n\ \n\ 11.4) Run:\n\ \n\ consed -ace standard.fasta.screen.ace.1 -addNewReads reads_to_add.fof -newAceFilename standard.fasta.screen.ace.20\n\ \n\ When this completes, there will be a new ace file\n\ standard.fasta.screen.ace.20 with all the reads added.\n\ \n\ There will also be a custom navigation file that is named something like:\n\ \n\ standard.070913.141632.nav\n\ \n\ where 070913.141632 is the date and time, so your file name will differ.\n\ (See CUSTOM NAVIGATION below.) This file lets you visually\n\ find each added read in the assembly, if you so choose.\n\ \n\ Consed takes each read and tries to align it against the\n\ reference sequence. It thus attempts to make one contig with all\n\ of the reads in it. Some reads may not align very well against the\n\ reference sequence. In that case, you can tell consed what to\n\ do with the following parameter in the .consedrc file:\n\ \n\ consed.addNewReadsPutReadIntoItsOwnContig: ifUnaligned\n\ \n\ means that if a read does not match the reference sequence very well,\n\ it will be put into its own contig.
(For information on how to change\n\ the .consedrc file, see EDIT PARAMETERS: HOW TO CHANGE\n\ CONSED/AUTOFINISH PARAMETERS elsewhere in this document.)\n\ \n\ consed.addNewReadsPutReadIntoItsOwnContig: never\n\ \n\ means that if a read does not match the reference sequence very well,\n\ it will not be put into the assembly at all.\n\ \n\ consed.addNewReadsPutReadIntoItsOwnContig: always\n\ \n\ means that each read is not even compared to the reference sequence,\n\ but is simply put into its own contig.\n\ \n\ Consensus quality values are not recalculated unless you put the\n\ following into your .consedrc file:\n\ \n\ consed.addNewReadsRecalculateConsensusQuality: true\n\ \n\ \n\ \n\ \n\ ----------------------------------------------------------------------------\n\ \n\ 12. USING AUTOPCRAMPLIFY\n\ \n\ If you have a fasta sequence and want to amplify part of that\n\ sequence using PCR, you can select a pair of PCR primers\n\ with Consed's autoPCRAmplify function.
It can handle\n\ very high throughput: on a slow computer it takes about 5 minutes to\n\ find PCR primers for a hundred different regions.\n\ \n\ 12.1) Type:\n\ \n\ cd autoPCRAmplify\n\ \n\ (You might need to type \"cd ../..\" first depending on which directory\n\ you are currently in.)\n\ \n\ "; static char szReadMe44[] = "\n\ Type:\n\ \n\ ls\n\ \n\ You will see that there is one file, brian.fa\n\ \n\ Look at what is in this file:\n\ \n\ more brian.fa\n\ \n\ You will see that it looks like this:\n\ \n\ >AP000527.C22.6.mRNA.primerRegion 1 70 81 150 smallest\n\ ACAGGGCCCCTCGCGGGCCCTGACGCAGGATGGAGTTGAGGTGGGGGCAG\n\ CGCTGGACCCCAGGGCCCCTNNNNNNNNNNTGCCGCAGTCTTGGATGATG\n\ GGTTCCTAGAAGCTCTCAACATCTCTTCTTAATTGGAGAAAGTGTTAAGC\n\ >AC004019.C22.4.mRNA.primerRegion 1 70 81 150 smallest\n\ AGCTGTGAGCTGTGCAATCATGTAACTAACTTTGTTTAAGTATTGTTTAG\n\ TCTTTCTGGTCTCCAGATGANNNNNNNNNNTCAGACATTCCACAGCTACC\n\ TAGAGGACATCATCAACTACCGCTGGGAGCTCGAAGAAGGGAAGCCCAAC\n\ \n\ The numbers 1 70 81 150 mean that the left primer should be selected\n\ from positions 1 to 70 of the sequence (numbering from 1), so the\n\ primer should be chosen from the sequence:\n\ ACAGGGCCCCTCGCGGGCCCTGACGCAGGATGGAGTTGAGGTGGGGGCAG\n\ CGCTGGACCCCAGGGCCCCT\n\ \n\ The 81 150 means that the right primer should be selected from\n\ positions 81 to 150 of the sequence, i.e. from within:\n\ \n\ TGCCGCAGTCTTGGATGATG\n\ GGTTCCTAGAAGCTCTCAACATCTCTTCTTAATTGGAGAAAGTGTTAAGC\n\ \n\ \n\ \"smallest\":\n\ \n\ --------------------          -------------------------\n\               --->              <---\n\ \n\ \n\ \"biggest\":\n\ \n\ --------------------          -------------------------\n\   --->                                           <---\n\ \n\ \n\ The word \"smallest\" means that the primers should be chosen so that\n\ the product is as small as possible. That means that the left primer\n\ should be chosen as far as possible to the right within the 1-70\n\ region and the right primer should be chosen as far as possible to the \n\ left within the 81-150 region.
If we had put \"biggest\" instead, the\n\ primers would have been chosen to make the PCR product as\n\ large as possible.\n\ \n\ Notice that in the \"smallest\" diagram above, I didn't make it look like this:\n\ \n\ \"smallest\":\n\ \n\ --------------------          -------------------------\n\                 --->          <---\n\ \n\ (with the primers at the very edges of the regions). The reason is that,\n\ in general, the primers that would make the absolute smallest product\n\ fail other checks on the primers and are not acceptable, so the primers\n\ must be backed up. Similarly for \"biggest\".\n\ \n\ \n\ 12.2) Then run the following:\n\ \n\ amplifyTranscripts.perl brian.fa\n\ \n\ (The name comes from the fact that this perl program was originally\n\ developed to amplify cDNA transcripts.)\n\ \n\ You should see a page or two of output scroll by on the screen, ending\n\ with:\n\ \n\ \n\ ---------------------------------------------------------\n\ working on AP000527.C22.6.mRNA.primerRegion\n\ ---------------------------------------------------------\n\ \n\ \n\ \n\ working on transcript AC004019.C22.4.mRNA.primerRegion (2 out of 2...\n\ \n\ \n\ \n\ ---------------------------------------------------------\n\ working on AC004019.C22.4.mRNA.primerRegion\n\ ---------------------------------------------------------\n\ \n\ \n\ \n\ see files primers_unsorted.txt for primers and failures.txt for failures (if any)\n\ \n\ 12.3) Look at the files just created in your directory. \n\ \n\ You should see \"failures.txt\", which should be empty.
And you should\n\ see \"primers_unsorted.txt\", which should contain:\n\ "; static char szReadMe45[] = "\n\ \n\ PRIMER_PAIR {\n\ Region: AP000527.C22.6.mRNA.primerRegion Product size: 67\n\ AP000527.C22.6.mRNA.primerRegionf: AGTTGAGGTGGGGGCAGC temp: 64\n\ AP000527.C22.6.mRNA.primerRegionr: CATCATCCAAGACTGCGGC temp: 63\n\ }\n\ \n\ \n\ PRIMER_PAIR {\n\ Region: AC004019.C22.4.mRNA.primerRegion Product size: 59\n\ AC004019.C22.4.mRNA.primerRegionf: TTTAGTCTTTCTGGTCTCCAGATGA temp: 61\n\ AC004019.C22.4.mRNA.primerRegionr: TCTAGGTAGCTGTGGAATGTCTGA temp: 60\n\ }\n\ \n\ \n\ These are the primers. The top strand primer is the one ending in\n\ 'f' and the bottom strand primer is the one ending in 'r'. Both are\n\ in 5' to 3' orientation, so the 'r' primer is reverse complemented\n\ from the sequence in the original fasta file.\n\ \n\ 12.4) To put these primers into 96-well format for ordering, type:\n\ \n\ orderPrimerPairs.perl no\n\ \n\ You will see output like this:\n\ \n\ > orderPrimerPairs.perl no\n\ finished sorting\n\ attachments:\n\ /me1/gordon/sunny/autoPCRAmplify_answer/brian.fa \"brian.fa\", /me1/gordon/sunny/autoPCRAmplify_answer/primers_sorted.txt \"primers_sorted.txt\", /me1/gordon/sunny/autoPCRAmplify_answer/to_order.txt \"to_order.txt\"\n\ \n\ Type \n\ \n\ ls\n\ \n\ and check that the following 4 files have been created:\n\ \n\ to_order.txt, which contains the primers in 96-well format, tab-separated so\n\ it can easily be imported into Excel\n\ primers_unsorted_shorter.txt, which in this case is identical to \n\ primers_unsorted.txt (Don't ask.)\n\ primers_sorted.txt, which has the same primers once again, but this\n\ time they are sorted by product size. If you have thousands of \n\ primers, you may want to run all the big ones together on the\n\ thermocycler, then all the next biggest ones, etc.
\n\ primers081205.fasta (in which 081205 is the current date in YYMMDD format),\n\ which has the same primers, YET AGAIN, but this time in fasta format, in\n\ case you want to use them in some other program that wants fasta\n\ format\n\ \n\ \n\ \n\ \n\ \n\ ----------------------------------------------------------------------------\n\ 13. USING AUTOREPORT\n\ \n\ Autoreport is a command-line (non-graphical) method of running consed\n\ to report information about the assembly. \n\ \n\ \n\ \n\ \n\ 13.1) VARIANTS REPORT\n\ \n\ Let's try the consed.autoReportPrintHighlyDiscrepantRegions\n\ feature. Type:\n\ cd solexa_example_answer/edit_dir\n\ \n\ (You might need to type \"cd ../..\" first depending on which directory\n\ you are currently in.)\n\ \n\ ls\n\ \n\ You should see the file ref.ace.1\n\ \n\ We need a .consedrc file in this directory containing the following:\n\ \n\ consed.autoReportPrintHighlyDiscrepantRegions: true\n\ consed.navigateByHighlyDiscrepantPositionsIgnoreBasesBelowThisQuality: 12\n\ \n\ These options were explained above under \"Search for highly discrepant\n\ positions\".\n\ \n\ To create this file, do the following:\n\ \n\ 13.2) EDIT PARAMETERS: HOW TO CHANGE .consedrc PARAMETERS\n\ \n\ This section applies not only to autoreport, but also to autofinish,\n\ autoedit, and customizing consed.\n\ \n\ You can edit .consedrc using an editor, such as pico, or you can do it with\n\ Consed, which is far easier. To do it with Consed, bring up consed as\n\ follows:\n\ \n\ 13.3) Type:\n\ consed -editConsedrc\n\ \n\ A window should come up with many .consedrc parameters.\n\ "; static char szReadMe46[] = "\n\ \n\ 13.4) Find consed.autoReportPrintHighlyDiscrepantRegions. You can\n\ easily do this by typing \"printhighly\" in the \"Find Parameter\" box at\n\ the bottom and clicking on \"Find First\".\n\ \n\ 13.5) Click \"True\".
This item should turn red, indicating that it is\n\ now different from the default value.\n\ \n\ 13.6) Find \n\ \"consed.navigateByHighlyDiscrepantPositionsIgnoreBasesBelowThisQuality\"\n\ \n\ As in the step above, in the \"Find Parameter\" box at the bottom, type\n\ \"ignorebases\", which is enough to find it.\n\ \n\ 13.7) Change the default of 20 to 12. (The default is actually better\n\ for finding real variants, but there aren't any real variants in\n\ this dataset, so if you leave it at 20 you won't get any output.)\n\ \n\ 13.8) At the bottom of the Edit .consedrc Window, click \"Just project\".\n\ Then click \"save\". A box titled \"Name of parameter file to write\"\n\ should pop up. Click \"OK\". That box will disappear and a box saying\n\ \"Note that these new parameters will take effect only after restarting\n\ Consed/Autofinish\" will pop up. Click \"Dismiss\" on that box and click\n\ \"Dismiss\" on the \"Edit .consedrc Window\". All windows should disappear.\n\ \n\ 13.9) Back on the unix command line, type:\n\ ls -al\n\ \n\ and you should see .consedrc\n\ \n\ 13.10) Type:\n\ more .consedrc\n\ \n\ and check that it contains just this:\n\ \n\ consed.autoReportPrintHighlyDiscrepantRegions: true\n\ consed.navigateByHighlyDiscrepantPositionsIgnoreBasesBelowThisQuality: 12\n\ \n\ (Get in the habit of checking .consedrc after using Consed's Edit\n\ .consedrc Window.) \n\ \n\ Why doesn't .consedrc also contain the others below?
See if you can\n\ figure that out.\n\ \n\ consed.navigateByHighlyDiscrepantPositionsMinDiscrepantReads: 2\n\ consed.navigateByHighlyDiscrepantPositionsMaxDepthOfCoverage: 100000\n\ consed.navigateByHighlyDiscrepantPositionsJustListIndels: false\n\ consed.navigateByHighlyDiscrepantPositionsIgnoreOtherReadsStartingAtSameLocation: false\n\ \n\ \n\ 13.11) Type:\n\ consed -ace ref.ace.1 -autoreport\n\ \n\ \n\ There will be a lot of output ending with something like:\n\ see ref.ace.1.081211.160556.out\n\ \n\ where 081211.160556 will be replaced by your current date and time.\n\ \n\ 13.12) Type:\n\ more ref.ace.1.081211.160556.out (replace this with the name of your file)\n\ \n\ This file contains a huge amount of output (listing the parameters\n\ used in the run)--the important part is at the end:\n\ \n\ printHighlyDiscrepantRegions {\n\ Highly Discrepant Positions\n\ min # of discrepant reads: 2 min quality: 12 \"r\": base of reference seq\n\ max depth of coverage: 100000 and ignoring reference seq\n\ A C G T * pos contig\n\ 2 8.0% 23 92.0%r 0 0.0% 0 0.0% 0 0.0% 56 ref\n\ 3 9.1% 30 90.9%r 0 0.0% 0 0.0% 0 0.0% 252 ref\n\ 2 6.9% 27 93.1%r 0 0.0% 0 0.0% 0 0.0% 256 ref\n\ 0 0.0% 0 0.0% 20 90.9%r 2 9.1% 0 0.0% 682 ref\n\ 0 0.0% 0 0.0% 31 93.9%r 2 6.1% 0 0.0% 715 ref\n\ 2 4.8% 40 95.2%r 0 0.0% 0 0.0% 0 0.0% 742 ref\n\ 2 8.7% 21 91.3%r 0 0.0% 0 0.0% 0 0.0% 936 ref\n\ 0 0.0% 1 2.4% 1 2.4% 39 95.1%r 0 0.0% 982 ref\n\ } printHighlyDiscrepantRegions\n\ \n\ This output is explained above under \"Search for highly discrepant\n\ positions\".\n\ \n\ Programmers: if you want to run this report automatically and parse\n\ the results, note that there is also a file auto.fof, which contains the\n\ name of this output file.\n\ \n\ 13.13) cd to standard/edit_dir\n\ \n\ (You might need to type \"cd ../..\" first depending on which directory\n\ you are currently in.)\n\ \n\ ls\n\ \n\ 13.14) Type:\n\ consed -editconsedrc\n\ \n\ Follow the example above and this time make .consedrc have only
the\n\ following two lines:\n\ \n\ "; static char szReadMe47[] = "\n\ consed.autoReportPrintLowConsensusQualityRegions: true\n\ consed.autoReportPrintSingleSubcloneRegions: true\n\ \n\ 13.15) Run autoreport as follows:\n\ \n\ consed -ace standard.fasta.screen.ace.1 -autoreport\n\ \n\ (where \"consed\" must be replaced by whatever command your system\n\ administrator says to use).\n\ \n\ You will see something like this:\n\ \n\ > consed -ace standard.fasta.screen.ace.1 -autoreport\n\ couldn't open readOrder.txt--that's ok\n\ opened file standard.070918.162756.out for output\n\ Now setting quality values\n\ Number of individual phd files read: 24\n\ Total reads in assembly: 24\n\ Finished setting quality values in 0 seconds \n\ see standard.070918.162756.out\n\ \n\ 13.16) Look at standard.070918.162756.out (where 070918.162756 will\n\ be replaced by the current date and time). Scroll down to the bottom\n\ (where the important information is) and you will see:\n\ \n\ lowConsensusQualityRegions {\n\ Contig1 (consensus) 1-83 base quality below threshold\n\ Contig1 (consensus) 85-110 base quality below threshold\n\ Contig1 (consensus) 113-117 base quality below threshold\n\ Contig1 (consensus) 120-156 base quality below threshold\n\ Contig1 (consensus) 159-166 base quality below threshold\n\ Contig1 (consensus) 168-171 base quality below threshold\n\ Contig1 (consensus) 185-187 base quality below threshold\n\ Contig1 (consensus) 189-190 base quality below threshold\n\ Contig1 (consensus) 192 base quality below threshold\n\ Contig1 (consensus) 194-199 base quality below threshold\n\ Contig1 (consensus) 269 base quality below threshold\n\ Contig1 (consensus) 271-275 base quality below threshold\n\ Contig1 (consensus) 2584-2591 base quality below threshold\n\ } lowConsensusQualityRegions\n\ singleSubcloneRegions {\n\ Contig1 (consensus) 1-199 199 bp single subclone\n\ Contig1 (consensus) 2588-2591 4 bp single subclone\n\ } singleSubcloneRegions\n\ \n\ This gives the low
consensus quality regions and the single subclone\n\ regions.\n\ \n\ \n\ \n\ ----------------------------------------------------------------------------\n\ \n\ 14. FEATURES FOR SNP ANALYSIS\n\ \n\ You can ignore this section until the program phaster is released.\n\ \n\ 14.1) CREATING A SNP-MASKED PHASTER GENOME\n\ \n\ Run consed as:\n\ \n\ consed -snpGenome snp130.txt -valid validExcluded.txt -genome chromosomes.fof\n\ \n\ where snp130.txt is in UCSC snp format as downloaded from ftp site \n\ hgdownload.cse.ucsc.edu \n\ cd goldenPath/hg19/database/\n\ get snp130.txt.gz \n\ \n\ This file has fields separated by tabs, with a carriage return at the\n\ end of each line. For descriptions of the fields, see:\n\ \n\ http://www.genome.ucsc.edu/cgi-bin/hgTables, change \"table\" to\n\ snp130, and click on \"describe table schema\". Here is that schema\n\ (not formatted as nicely):\n\ \n\ Database: hg19 Primary Table: snp130 Row Count: 18,404,149\n\ Format description: Polymorphism data from dbSnp database or genotyping arrays\n\ \n\ field example SQL type description\n\ bin 585 smallint(5) unsigned Indexing field to speed chromosome range queries.\n\ chrom chr1 varchar(31) Reference sequence chromosome or scaffold\n\ chromStart 10259 int(10) unsigned Start position in chrom\n\ chromEnd 10260 int(10) unsigned End position in chrom\n\ name rs72477211 varchar(15) Reference SNP identifier or Affy SNP name\n\ score 0 smallint(5) unsigned Not used\n\ strand + enum('+','-') Which DNA strand contains the observed alleles\n\ refNCBI C blob Reference genomic from dbSNP\n\ refUCSC C blob Reference genomic from nib lookup\n\ observed A/G varchar(255) The sequences of the observed alleles from rs-fasta files\n\ molType genomic enum('unknown','genomic','cDNA') Sample type from exemplar ss\n\ class single enum('unknown','single','in-del','het','microsatellite','named','mixed','mnp','insertion','deletion') The class of variant (simple, insertion, deletion, range, etc.)\n\ valid unknown 
set('unknown','by-cluster','by-frequency','by-submitter','by-2hit-2allele','by-hapmap','by-1000genomes') The validation status of the SNP\n\ avHet 0 float The average heterozygosity from all observations\n\ avHetSE 0 float The Standard Error for the average heterozygosity\n\ func unknown set('unknown','coding-synon','intron','near-gene-3','near-gene-5','nonsense','missense','frameshift','untranslated-3','untranslated-5') The functional category of the SNP (coding-synon, coding-nonsynon, intron, etc.)\n\ locType exact enum('range','exact','between','rangeInsertion','rangeSubstitution','rangeDeletion') How the variant affects the reference sequence\n\ weight 1 int(10) unsigned The quality of the alignment\n\ \n\ \n\ validExcluded.txt contains the validity codes (or combinations of\n\ them) for which a snp should be ignored. Our validExcluded.txt\n\ "; static char szReadMe48[] = "\n\ contains:\n\ \n\ unknown\n\ by-1000genomes\n\ \n\ chromosomes.fof is a table with 3 columns:\n\ chromosome_name original_chromosome_file phaster_annotated_file\n\ \n\ For example, chromosomes.fof might contain:\n\ \n\ chrM ../../chrM.fa snp_chrM.fa\n\ chr1 ../../chr1.fa snp_chr1.fa\n\ chr2 ../../chr2.fa snp_chr2.fa\n\ chr3 ../../chr3.fa snp_chr3.fa\n\ chr4 ../../chr4.fa snp_chr4.fa\n\ chr5 ../../chr5.fa snp_chr5.fa\n\ chr6 ../../chr6.fa snp_chr6.fa\n\ chr7 ../../chr7.fa snp_chr7.fa\n\ chr8 ../../chr8.fa snp_chr8.fa\n\ chr9 ../../chr9.fa snp_chr9.fa\n\ chr10 ../../chr10.fa snp_chr10.fa\n\ chr11 ../../chr11.fa snp_chr11.fa\n\ chr12 ../../chr12.fa snp_chr12.fa\n\ chr13 ../../chr13.fa snp_chr13.fa\n\ chr14 ../../chr14.fa snp_chr14.fa\n\ chr15 ../../chr15.fa snp_chr15.fa\n\ chr16 ../../chr16.fa snp_chr16.fa\n\ chr17 ../../chr17.fa snp_chr17.fa\n\ chr18 ../../chr18.fa snp_chr18.fa\n\ chr19 ../../chr19.fa snp_chr19.fa\n\ chr20 ../../chr20.fa snp_chr20.fa\n\ chr21 ../../chr21.fa snp_chr21.fa\n\ chr22 ../../chr22.fa snp_chr22.fa\n\ chrX ../../chrX.fa snp_chrX.fa\n\ chrY ../../chrY.fa 
snp_chrY.fa\n\ \n\ Our program only uses snps with a \"weight\" of 1.\n\ \n\ Our program only looks at class/locType of \"single\"/\"exact\",\n\ \"insertion\"/\"between\", \"deletion\"/\"range\", and \"deletion\"/\"exact\". A\n\ few of these overlap each other. In that case, we just use the first\n\ one.\n\ \n\ There are 18,404,149 entries in snp130.txt:\n\ 6,664,580 remain after eliminating those with validation \"unknown\" or \"by-1000genomes\"\n\ 6,484,495 remain after only allowing those with weight = 1\n\ 6,440,943 remain after only allowing those of class/locType\n\ insertion/between or deletion/exact or deletion/range or single/exact\n\ 6,440,734 remain after eliminating those that overlap the previous\n\ polymorphism\n\ \n\ \n\ 14.2) PHASTER2MINIASSEMBLY.PERL\n\ \n\ This script is particularly useful when you have reads that you believe\n\ may come from a region that is missing from the reference\n\ sequence.\n\ \n\ You provide the script with a list of chromosome locations and a list\n\ of phaster output files. It scans the phaster output files and\n\ extracts the reads that intersect the locations. By default, both the\n\ read and its mate are extracted. Then phrap assembles all of these\n\ reads together. Fake reads are created from the reference sequence in\n\ a window around the chromosome locations, and these fake reads are\n\ aligned using cross_match to the consensus sequences created in the\n\ phrap assembly. A consed-ready ace file is created. 
You can view the\n\ assembly, observing how the reads differ from the fake reads made from\n\ the reference sequence.\n\ \n\ A small exercise that illustrates this feature is in the dataset\n\ phaster2Miniassembly, and the solution is in\n\ phaster2Miniassembly_answer.\n\ \n\ 14.3) cd phaster2Miniassembly/edit_dir\n\ \n\ (You might need to type a different cd command depending on which\n\ directory you are currently in.)\n\ \n\ 14.4) Type:\n\ phaster2Miniassembly.perl snps.txt fake_genome.fof phaster_output.fof myAce.ace\n\ \n\ There should be a flurry of output ending with:\n\ See new ace file myAce.ace\n\ done 0\n\ \n\ \n\ 14.5) Type:\n\ consed -ace myAce.ace\n\ \n\ (where \"consed\" must be replaced by whatever command your system\n\ administrator says to use).\n\ \n\ and view the assembly. The reads:\n\ \n\ chrA_19925_20075\n\ chrA_20001_20151\n\ \n\ are the fake reads created from the reference sequence chrA.fa.\n\ All of the reads starting with \"4_\" are solexa reads.\n\ \n\ "; static char szReadMe49[] = "\n\ \n\ 14.6) PHASTER2ACE.PERL\n\ \n\ This script is useful if you have some putative snp sites and want to\n\ use consed to look at the reads containing these sites.\n\ \n\ 14.7) To learn how to use phaster2Ace.perl,\n\ cd phaster2Ace/edit_dir\n\ \n\ (You might need to type a different cd command depending on which\n\ directory you are currently in.)\n\ \n\ 14.8) Type:\n\ phaster2Ace.perl snps.txt fake_genome.fof phaster.fof myAce.ace\n\ \n\ There will be a flurry of output ending with:\n\ \n\ See new ace file myAce.ace\n\ done 0\n\ \n\ 14.9) Bring up the new ace file:\n\ consed -ace myAce.ace\n\ \n\ (where \"consed\" must be replaced by whatever command your system\n\ administrator says to use).\n\ \n\ You should see 13 reads: 12 whose names start with \"phaster\" and 1\n\ named \"chr1_49901_50138\", which is a little piece of the reference\n\ sequence.\n\ \n\ There are 3 little blue polymorphism tags on the consensus.\n\ \n\ \n\ \n\ \n\ 
----------------------------------------------------------------------------\n\ \n\ 15. LESS USED CONSED FEATURES\n\ \n\ \n\ 15.1) MULTIPLE HIGH QUALITY DISCREPANCIES VS SEARCH FOR HIGHLY\n\ DISCREPANT REGIONS\n\ \n\ You have already used (above) \"Search for highly discrepant\n\ positions\". \"Multiple high quality discrepancies\" (MHQD) is similar\n\ but much less flexible. It requires that there be one read at a\n\ position that differs from the consensus and has quality of at least\n\ consed.qualityThresholdForFindingHighQualityDiscrepancies (40 by\n\ default) at that base and within a 9-base window around the base (4\n\ bases on each side, not including pads). It then requires that there be a\n\ second read, of any quality and from a different subclone, that has the same\n\ base.\n\ \n\ \n\ 15.2) BACKING OUT EDITS AFTER YOU HAVE SAVED THE ASSEMBLY\n\ \n\ If you decide that all your edits are terrible and you want to start\n\ over (perhaps you have been training a new finisher), the cleanest\n\ solution is to delete everything in phd_dir and edit_dir, but leave\n\ everything in chromat_dir, and then just reassemble (run phredPhrap) or\n\ realign the reads.\n\ \n\ \n\ 15.3) SELECTIVELY BACKING OUT EDITS AND REMOVING READS\n\ \n\ If you want to back out all edits in just particular reads, I have\n\ provided a perl script to do this:\n\ \n\ \n\ revertToUneditedRead (read name)\n\ \n\ What it does is copy the .phd.1 file to a new version numbered 1 greater\n\ than the highest existing version.\n\ \n\ Then you must reassemble using the phredPhrap script to create an ace\n\ file that has no edits for that particular read. It will keep all\n\ edits in all other reads.\n\ \n\ Why doesn't it just delete all phd files except for the\n\ .phd.1? 
In that case, Consed could not read any previous ace file,\n\ since all previous versions of ace files would refer to phd files that\n\ have been deleted.\n\ \n\ 15.4) REMOVING READS FROM A PHRAP ASSEMBLY\n\ \n\ Create a file containing the filenames of all the reads you want to\n\ remove, one filename per line.\n\ Then run the perl script\n\ \n\ removeReads \n\ \n\ Then reassemble using the phredPhrap script.\n\ \n\ \n\ 15.5) ADDING READS WITHOUT CHROMATOGRAM FILES\n\ \n\ This may happen if you, for example, download sequence from Genbank\n\ and want to assemble it along with your reads.\n\ \n\ There are 2 ways to do this, depending on whether you want to edit the \n\ "; static char szReadMe50[] = "\n\ read or not.\n\ \n\ a) If you want to edit the read, run mktrace to produce a fake trace. It\n\ will have all perfect peaks.\n\ \n\ Run:\n\ \n\ mktrace (name of file with fasta sequence)\n\ \n\ Then run the phredPhrap script normally. You will be able to bring up\n\ the traces in Consed and edit the read.\n\ \n\ b) If it is not important to edit the reads, there is a method that\n\ is a little faster. Just create a fake phd file using:\n\ \n\ fasta2Phd.perl (name of file with fasta sequence)\n\ \n\ \n\ It will create a file whose name is taken from the fasta file name:\n\ for example, if the fasta filename is Contig1.c.fasta, then the phd file\n\ will be called Contig1.c.phd.1 (the sequence name inside the fasta file is ignored).\n\ You can then put this file in phd_dir and reassemble using the\n\ phredPhrap script.\n\ \n\ If the reads are really fake (you don't want their templates to be\n\ chosen by Consed/Autofinish as templates for primers), then the read name\n\ should end with an extension .c or .a or .c1 or\n\ .c2 ... or .a1 or .a2 or ... This indicates to Consed/Autofinish\n\ that the read is a fake read.\n\ \n\ Note: when you are creating phd files such as this, you must start with\n\ (read name).phd.1 and not with (read name).phd.2 or any higher\n\ version number. 
This is because Consed looks for the .1 version in\n\ order to find the original phred calls, so it expects a .1 version\n\ to exist.\n\ \n\ There is also a publicly contributed script \"lib2Phd.perl\" that takes\n\ a fasta file containing more than one sequence and makes a phd file\n\ for each of them.\n\ \n\ \n\ 15.6) ALIGNING READS TO A BACKBONE\n\ \n\ If you sequence the same region (in different people or in different\n\ species), then you may want all of the reads aligned together, even if phrap\n\ would not assemble them all together. To align them all together,\n\ first take a reference sequence and make an assembly out of it by using\n\ mktrace or fasta2Phd.perl (see above) followed by phd2Ace.perl (see\n\ above). Then add all of the other reads using Consed's Add New Reads\n\ feature (either automated or manual--see above).\n\ \n\ \n\ 15.7) COMPARING READS TO A REFERENCE SEQUENCE\n\ \n\ The reference sequence, as in the step above, will just be another\n\ read in the assembly. Let's call it \"ref\". To compare the other\n\ reads to it, in the Aligned Reads Window, point at the Navigate Menu,\n\ hold down the left mouse button and release on \"Compare Reads To\n\ Reference Sequence\". A window labelled \"Enter Name of Reference\n\ Read\" will pop up. Enter the name of the reference read (here, \"ref\") and click \"OK\". A list\n\ of high quality read positions that disagree with the reference read\n\ will be displayed.\n\ \n\ \n\ 15.8) TAGGING ALL READS AT ONCE\n\ \n\ Follow the instructions for tagging the consensus, but when the list\n\ of tag types pops up, click the \"tag all reads\" box at the top of this\n\ list. Then continue as with tagging the consensus.\n\ \n\ 15.9) EDITING ALL READS AT ONCE\n\ \n\ Please don't do this. Not unless you REALLY know what you are doing\n\ and have a good reason for doing so. 
You should really only change a\n\ base call if you are looking at the chromatogram and thus have a basis\n\ in that read for making the change.\n\ \n\ If you are determined to do this in spite of my pleas and protests, do\n\ the following. Suppose that at a particular consensus position some reads\n\ have \"a\" and some \"c\" and you want them all to be \"a\". In the Aligned\n\ Reads Window, point to an \"a\" at that position, hold down the left\n\ mouse button and release on \"make all reads a\".\n\ \n\ The reason you shouldn't do this is that perhaps the reads that were\n\ \"c\" were actually correct and came from a different copy of a repeat.\n\ Hence the reads with \"a\" and the reads with \"c\" did not really\n\ overlap. But you just destroyed the evidence.\n\ \n\ \n\ 15.10) FASTER CONSED STARTUP FOR SANGER READS\n\ \n\ Warning: This only applies to assemblies with large numbers of Sanger\n\ reads. This will have no effect on assemblies with 454 or Solexa\n\ reads.\n\ \n\ You can greatly speed up Consed startup if you are willing to use more\n\ disk space. The extra disk space used will be about equal to the total\n\ space used by the PHD files. Try this with a large dataset (you won't\n\ notice any difference with the test datasets that come with Consed).\n\ \n\ "; static char szReadMe51[] = "\n\ To use this method of startup:\n\ \n\ 1) cd to the directory where the ace file is kept\n\ 2) type: makePhdBall.perl\n\ (This will create a file called phd.ball, which is big.)\n\ 3) start consed normally\n\ \n\ \n\ In many situations, this will greatly speed up Consed startup. The\n\ amount of speedup depends on which operating system is used: on Linux,\n\ the time to read phd files dropped from 75 seconds to 8 seconds, and\n\ thus the total time to start up consed dropped from 86 seconds to 17\n\ seconds. I saw similar speedups on Solaris, where the phd files were on\n\ an nfs-mounted disk. 
However, in one other situation\n\ the startup time did not improve.\n\ \n\ Warning: If you create phd.ball as above, Consed will read most\n\ phd files from phd.ball instead of from ../phd_dir. If you delete phd\n\ files in phd_dir, you must also delete phd.ball. Otherwise Consed\n\ will give lots of \"TIME STAMP MISMATCH\" error messages and many things\n\ will not work correctly.\n\ \n\ \n\ 15.11) VIEWING THE CHROMATOGRAM OF SINGLETS OR NON-ASSEMBLED READS\n\ \n\ \n\ If you have a chromatogram, you can use Consed to view it, even if the\n\ read hasn't been assembled into the ace file. This is common with cDNA\n\ assemblies, in which the reads don't overlap and thus phrap doesn't put\n\ them together into a contig.\n\ \n\ To do this, create the same edit_dir, phd_dir,\n\ and chromat_dir directories as above, put the chromatogram into chromat_dir, and run\n\ phred on it to generate the phd file, which goes into phd_dir.\n\ \n\ Then go to edit_dir and run:\n\ \n\ phd2Ace.perl (name of phd file)\n\ \n\ For example, if your phd file is myRead.phd.1,\n\ from edit_dir type:\n\ \n\ phd2Ace.perl myRead.phd.1\n\ \n\ This will produce myRead.ace.\n\ \n\ Then just start Consed normally:\n\ consed -ace myRead.ace\n\ and you can view the chromatogram.\n\ \n\ 15.12) HIDING SOME TYPES OF TAGS\n\ \n\ If you have many tags that overlap and thus are purple, you can\n\ hide some less relevant tag types so there is less purple and less\n\ distraction. Make sure you have a few tags visible. Then click\n\ on 'Find Main Win'. In the Main Window, open the Options menu, and\n\ release on 'Hide Some Tag Types'. A list of tag types will pop up.\n\ Select the type of the tags that you made visible (above). Then click 'OK'. Go\n\ back to the Aligned Reads Window. That tag should still be visible.\n\ Click on the button 'Some Tags' in the upper right part of the Aligned\n\ Reads Window. Your tag should disappear. The 'Some Tags' button\n\ should have changed to 'Sh All Tags'. Click on it again. 
Your tags\n\ should reappear.\n\ \n\ 15.13) CUSTOM CONTIG NAMES\n\ \n\ Normally, when you re-assemble, phrap will name the contigs\n\ differently--what was Contig31 before may become Contig32. To help\n\ you know which contig is which, Consed allows you to give a contig a name\n\ (e.g., \"A\") that will persist after re-assembling. To do\n\ this, swipe some consensus bases with the middle mouse button (as\n\ above). When the \"Select Tag Type\" box pops up, click on \"contigName\",\n\ type a name into the \"Contig Name:\" field, and then click\n\ \"OK\". The next time you re-assemble, the name \"A\" will appear in the\n\ list of contigs on the Consed Main Window.\n\ \n\ \n\ 15.14) ERROR RATE\n\ \n\ In the Aligned Reads Window is a box (upper right) labelled\n\ 'Err/10kb'. This is the estimated error rate for this contig, and it\n\ is a good indicator of when you are done (or not done) finishing.\n\ In addition, you can find the error rate for a particular region of a\n\ contig as follows: Point at the 'Misc' menu, hold down the left mouse\n\ button, pull down and release on 'Show Error Info For Region'. Fill\n\ in the boxes for the left and right consensus positions, click on\n\ 'Calculate', and you will be given the error rate and single subclone data\n\ for that region.\n\ \n\ 15.15) RESTRICTION DIGEST\n\ \n\ Restart Consed.\n\ \n\ Double click on \"standard.fasta.screen.ace.1\"\n\ \n\ In the Consed Main Window, click the \"Digest\" button. For the\n\ purpose of this exercise, the full pathname of the file of vector sequence\n\ can refer to any file of sequence in fasta format. However, when you\n\ are using it with your own data, it should refer to a file that\n\ contains the sequence of your cloning vector. For example, if you are\n\ "; static char szReadMe52[] = "\n\ sequencing a BAC, it should contain the BAC vector sequence. The sequence must\n\ start at the vector/insert junction that you used when you ligated the\n\ insert.\n\ \n\ Click \"OK\". 
You will see a comparison of in-silico fragments (those\n\ calculated from the sequence) and real fragments (those in\n\ fragSizes.txt, which supposedly came from a real gel).\n\ \n\ * If a band is red, that means that it doesn't match.\n\ * If a band has a \"v\" on it, that means it is a vector fragment.\n\ * If a band has a \"g\" on it, that means it is a gap-spanning fragment.\n\ \n\ Move the pointer over the fragments, and you will see the fragment\n\ sizes appear. Move the pointer to the in-silico fragment with size\n\ 2299. Click on it. You will see the fragment on the left side of the\n\ window become highlighted. Click on the button labeled \"right end\"\n\ (2nd row from the bottom of the window) and the Aligned Reads Window\n\ will pop up, with the cursor on the right end of the fragment.\n\ \n\ Click on \"show problems\" and navigate through the list of problems by\n\ clicking on \"next\". You will notice that the Gel Window is zoomed\n\ in. To return to the original zoom, click on \"Zoom Original\".\n\ \n\ Where it says \"Select Enzyme:\", point to \"EcoRV\", hold down the left\n\ mouse button and release on \"HindIII\". This is how you change\n\ enzymes.\n\ \n\ Click on the button labeled \"Text Output\". The text output can be saved to a\n\ file and printed out.\n\ \n\ Dismiss the restriction digest window. On the Consed Main Window,\n\ click the \"Digest\" button again. Notice the file \"fragSizes.txt\".\n\ This is a file of actual gel fragment sizes. If you don't have an\n\ actual gel, but just want to predict fragment\n\ sizes from the sequence, you can leave this box blank (erase the\n\ \"fragSizes.txt\"). Try that.\n\ \n\ \n\ fragSizes.txt has the following format:\n\ \n\ >EcoRV\n\ 448\n\ 710\n\ 1102\n\ 1197\n\ -1\n\ >HindIII\n\ 448\n\ 508\n\ 586\n\ 735\n\ 801\n\ -1\n\ \n\ where EcoRV and HindIII are enzymes and the numbers below them are the\n\ actual fragment sizes. Each enzyme list is terminated by -1. 
\n\ \n\ Consed tries to figure out which end of the clone\n\ insert is connected to which end of the vector. However, it is sometimes\n\ wrong. If you believe it is wrong, you can click \"compl vector\" to\n\ try connecting the insert to the vector in the opposite orientation\n\ and see if that produces better agreement with the actual digest.\n\ \n\ \n\ 15.16) RESTRICTION DIGEST AND ASSEMBLY VIEW\n\ \n\ Go to the assembly_view sample dataset and bring up the Assembly View\n\ Window:\n\ \n\ cd assembly_view/edit_dir\n\ \n\ (You might need to type \"cd ../..\" first, depending on which directory\n\ you are currently in.)\n\ \n\ ls\n\ Restart consed\n\ \n\ Double click on \"assembly_view.fasta.screen.ace.1\"\n\ \n\ In the Consed Main Window, click on the button \"Assembly View\", which is\n\ near the upper left corner of the window.\n\ \n\ Also on the Consed Main Window, click on Digest. The \"Select Enzyme\n\ and Contigs\" Window should appear with EcoRV and HindIII selected.\n\ Click OK. The \"Display Digest\" Window should appear.\n\ \n\ Now look at the Assembly View Window. You will notice blue, green,\n\ and red rectangles under the grey contig bars. These rectangles are\n\ the in-silico restriction fragments. Point to one of them--it will\n\ turn yellow and information will be displayed in the information box\n\ below. Point to one of the EcoRV fragments, hold down the right mouse\n\ button, and release on \"Goto fragment in digest window\". Notice that\n\ in the Display Digest Window, the selected fragment is highlighted\n\ both on the left side (the text) and on the Gel (right) side.\n\ \n\ \n\ \n\ 15.17) MULTIPLE TRACE POPUP\n\ \n\ Bring up the dataset standard. In the Aligned Reads window, scroll to\n\ "; static char szReadMe53[] = "\n\ a region that has many reads and that has some discrepancies--try\n\ position 1162. Hold down the shift key and click with the middle\n\ mouse button on the consensus. At this location 3 traces will
At this location 3 traces will\n\ pop up--these are the 2 highest quality traces that agree with the\n\ consensus (on each strand) and the highest quality trace that\n\ disagrees with the consensus. This feature is useful in areas of high\n\ coverage when you want to rapidly examine just the most significant\n\ traces rather than looking at all of them.\n\ \n\ \n\ 15.18) MAXIMUM NUMBER OF TRACES DISPLAYED\n\ \n\ Bring up dataset standard. Scroll to position 1162. Bring up 4\n\ reads and then try bringing up additional reads.You will notice that\n\ new reads are put at the top of the stack of traces and, once there\n\ are 4 traces displayed, traces are automatically removed from the\n\ bottom of the stack. If you want to change this maximum number of\n\ traces to something besides 4, you can do that: In the Consed Main\n\ Window (click on 'Find Main Win' on the Aligned Reads window), pull\n\ down the 'Options' menu, and release on 'General Preferences'. Try\n\ changing the 'Max Number of Traces Shown' to 3. Then click 'Apply and\n\ Dismiss'. Now dismiss the Trace Window and again start adding\n\ additional traces to the Trace Window. You will notice that now the\n\ number of traces shown will not exceed 3.\n\ \n\ If you want to view a large number of traces at once, you should use\n\ the SHOW ALL TRACES (described above).\n\ \n\ \n\ 15.19) SCALING THE TRACES \n\ \n\ In the Trace Window, grab the thumb of the line that is labelled \"V\"\n\ (for Vertical magnification) and move it back and forth, noticing the\n\ effect on the traces. This is useful if the traces are too small or\n\ too large. 
There are several other methods of scaling the traces that you\n\ will learn later.\n\ \n\ \n\ 15.20) HOTKEYS FOR EDITING\n\ \n\ If you do a lot of editing, you will want a faster method\n\ of making these edits than bringing up the popup menu and selecting an option.\n\ Thus the following hot keys exist:\n\ \n\ \n\ < and > (less than and greater than) to make n's to the left\n\ and the right (respectively) of the cursor\n\ control-l and control-r to make bases low quality to the left and\n\ the right (respectively) of the cursor\n\ overstriking with a capital letter (e.g., C instead of c) causes\n\ the base to become high quality rather than low quality\n\ overstriking with a lower case letter causes the base to become\n\ low quality\n\ \n\ Give these a try.\n\ \n\ 15.21) SCROLLING TRACES INDEPENDENTLY\n\ \n\ Dismiss all of your Trace Windows. Then pop up traces for 2\n\ different reads in approximately the same location. Scroll one of\n\ them. You may want to scroll by clicking the arrows or clicking to\n\ the left or right of the thumb. You will notice that both\n\ scroll. Consed will do its best to keep corresponding peaks lined up.\n\ (Consed can't line all of them up because the peak spacing is not\n\ uniform and differs from read to read.) Try removing a trace by\n\ clicking on one of the 'Remove' buttons in the Trace Window. Try\n\ adding other traces. Then click on 'No' for scrolling the traces\n\ together and try scrolling. You will now observe that they scroll\n\ separately.\n\ \n\ \n\ 15.22) MEASURING ERROR RATE AND SINGLE SUBCLONE BASES FOR A REGION\n\ \n\ Some contigs have long tails of low quality bases, and you may want\n\ to find out the error rate for the contig without that long\n\ tail. On the Aligned Reads Window, pull down the Misc menu, and release\n\ on 'Show Errors for a Region'. 
This will tell you both the error rate\n\ for the region and the number of single subclone bases for that region.\n\ \n\ \n\ 15.23) PREVENTING 2 USERS FROM MAKING CONFLICTING EDITS\n\ \n\ If 2 users are both editing in the same directory,\n\ they might both make edits to the same read.\n\ Whoever saves their assembly last will wipe out the edits of the other\n\ person, even if they were using different ace files. To help prevent\n\ this, consed can warn you if someone else is making edits in the same\n\ directory. Set the consed parameter:\n\ \n\ consed.onlyAllowOneReadWriteConsedAtATime: true\n\ \n\ The default is \"false\", so you have to set this to true to make it\n\ work (see CONSED CUSTOMIZATION).\n\ \n\ This will usually work even if the 2 users are on different computers\n\ (and the directory is nfs-mounted between them) and even if the\n\ different computers have different operating systems. I've tested the\n\ following combinations:\n\ user 1 on Solaris; user 2 on Solaris\n\ user 1 on Linux; user 2 on Linux\n\ "; static char szReadMe54[] = "\n\ user 1 on Solaris; user 2 on Alpha (Digital Unix)\n\ user 1 on Linux; user 2 on Solaris <--- does not work\n\ \n\ Only the last combination doesn't work.\n\ \n\ \n\ \n\ 15.24) PRINTING CONSED WINDOWS\n\ \n\ There is a free (or nearly free) program called \"xv\". One web site is\n\ http://www.trilon.com/xv . It was written by one of that dying breed of\n\ UNIX programmers who just *loved* UNIX and programming and sharing it.\n\ His web site is enjoyable because some of his passion comes through.\n\ With xv, you can make a postscript file from a Consed window. Then\n\ you can print the postscript file on a color printer.\n\ \n\ However, since some Consed windows are mostly black (Aligned Reads\n\ Window and Traces Window), this uses up a lot of toner and is\n\ difficult to read. So go to the Consed Main Window, pull down the\n\ 'Options' menu and release on 'General Preferences'. 
Scroll down to\n\ \"Make light background in Aligned Reads Window...\" and click on \"Do it\n\ now\". Dismiss any Aligned Reads Windows or Traces Windows and then\n\ bring them back up. You will notice the light background. A few\n\ other things (trace colors and thickness) are also customized for\n\ making color prints.\n\ \n\ 15.25) COLOR MEANS EDITED AND TAGS\n\ \n\ (For this step, first click on the 'Dim' menu and release on 'Dim\n\ Nothing'.) Point to the 'Color' menu, hold down the left mouse button\n\ and release on 'Color Means Edited and Tags'. Notice that the bases\n\ that you have edited (make sure you have edited some bases) will stand\n\ out in either white or grey (depending on whether the base was made\n\ high quality or low quality). Observe this both in the Trace Window\n\ and the Aligned Reads window. This colormode is useful for\n\ easily spotting which bases have been edited.\n\ \n\ \n\ 15.26) COLOR MEANS MATCH\n\ \n\ In the Aligned Reads Window, go to the menu labelled 'color',\n\ pull down and release on 'color means match'.\n\ \n\ Now you will notice different colors, which\n\ have the following meanings:\n\ \n\ Blue: agrees with consensus\n\ Orange: disagrees with consensus\n\ Yellow: this stretch of this read was used by phrap to form the consensus\n\ Grey: Low quality or unaligned ends of reads\n\ \n\ Return to the 'Color Means Quality and Tags' colormode as\n\ follows: point to the 'Color' menu, hold down the left mouse button\n\ and release on 'Color Means Quality and Tags'. This is the colormode\n\ most commonly used.\n\ \n\ \n\ \n\ \n\ \n\ \n\ ----------------------------------------------------------------------------\n\ \n\ \n\ \n\ 16. CONSED CUSTOMIZATION\n\ \n\ If you want to customize Consed, it helps to know how to edit files in\n\ UNIX. There is no Microsoft Word in UNIX, but there are emacs, vi,\n\ pico, nano and other editors.\n\ \n\ I suggest pico or nano for their simplicity. 
(You can get more\n\ information by googling, for example, \"pico unix editor\".)\n\ \n\ Point at the 'Info' menu on the Consed Main Window, hold down the left\n\ mouse button and release on menu item 'Show Current Consed Parameters'. This\n\ shows you which parameters you can change by putting them in your\n\ ~/.consedrc file.\n\ \n\ You can also see what is available by typing on the command line:\n\ consed -printDefaultResources\n\ \n\ Point at the 'Info' menu on the Consed Main Window, hold down the left\n\ mouse button and release on menu item 'Show Default X Resources'.\n\ This shows you which resources you can change by putting them in your\n\ ~/.Xdefaults file.\n\ \n\ Type:\n\ \n\ consed -editConsedrc\n\ \n\ This includes most of the parameters found under 'Info/Show Current\n\ Consed Parameters' (above). It provides an easy graphical way for you\n\ to edit these parameters if you are not familiar with editing under\n\ UNIX. You just change the parameter you want and click \"Save\". (See\n\ HOW TO CHANGE CONSED/AUTOFINISH PARAMETERS (far above).) For the new\n\ parameter value to take effect, you must restart Consed/Autofinish.\n\ \n\ Changes in ~/.consedrc only affect one user. If you want a\n\ change to affect all Consed users on the system, put a file in some\n\ "; static char szReadMe55[] = "\n\ central location (e.g., /usr/local/genome/lib/.consedrc) and then\n\ have every user set the environment variable CONSED_PARAMETERS to\n\ the full pathname of the file. For example, if using csh or\n\ tcsh, type:\n\ \n\ setenv CONSED_PARAMETERS /usr/local/genome/lib/.consedrc\n\ \n\ If using bash, type:\n\ \n\ CONSED_PARAMETERS=/usr/local/genome/lib/.consedrc\n\ export CONSED_PARAMETERS\n\ \n\ Anything the user puts in ~/.consedrc will override whatever is in the\n\ CONSED_PARAMETERS file.\n\ \n\ You can also have different parameters for different projects. Put a\n\ .consedrc file in the edit_dir of a particular project. 
When you are\n\ working on that project, whatever is in that .consedrc will override\n\ whatever is in your ~/.consedrc file or the CONSED_PARAMETERS file.\n\ \n\ \n\ 16.1) CUSTOMIZING NAVIGATE BY SINGLE STRANDED REGIONS AND NAVIGATE BY SINGLE\n\ SUBCLONE REGIONS\n\ \n\ You can set the parameters:\n\ \n\ consed.searchFunctionsUseUnalignedEndsOfReads: false\n\ consed.searchFunctionsUseLowQualityEndsOfReads: true\n\ \n\ If you set consed.searchFunctionsUseUnalignedEndsOfReads to false,\n\ then the unaligned ends of a read are not considered to cover the\n\ consensus.\n\ \n\ If you set consed.searchFunctionsUseLowQualityEndsOfReads to false,\n\ then the low quality ends of a read are not considered to cover the\n\ consensus.\n\ \n\ For example, if the settings are:\n\ \n\ consed.searchFunctionsUseUnalignedEndsOfReads: false\n\ consed.searchFunctionsUseLowQualityEndsOfReads: false\n\ \n\ then a base in a read is only considered to cover the consensus if it\n\ is in both the aligned portion of the read and the high quality\n\ portion of the read.\n\ \n\ 16.2) .consedrc vs .Xdefaults\n\ \n\ Although most Consed parameters now go into .consedrc, there are still\n\ a few that need to stay in .Xdefaults. 
Here is the rule: if the\n\ parameter starts with\n\ \n\ consed.\n\ \n\ such as\n\ \n\ consed.gunzipFullPath: /bin/uncompress\n\ \n\ then it goes into .consedrc\n\ \n\ If the resource (here it is called a \"resource\" rather than a\n\ \"parameter\") starts with\n\ \n\ consed*\n\ \n\ such as\n\ \n\ consed*contigwin.background: Black\n\ \n\ then it goes into .Xdefaults\n\ \n\ \n\ 16.3) MAKING LIGHT BACKGROUND FOR SLIDES\n\ \n\ Put the following into your .consedrc:\n\ \n\ consed.makeLightBackgroundInAlignedReadsWindowAndTracesWindow: true\n\ \n\ \n\ 16.4) COLOR BLINDNESS\n\ \n\ One person with Red/Green colorblindness (Deutan) found the following\n\ colors helpful:\n\ \n\ consed.colorTracesG: Yellow\n\ consed.colorTracesA: forest green\n\ consed.colorTracesC: medium blue\n\ consed.colorTracesT: light coral\n\ \n\ Put these in a .consedrc in your home directory.\n\ \n\ \n\ ----------------------------------------------------------------------------\n\ \n\ \n\ 17. CREATING CUSTOM TAG TYPES\n\ \n\ You can add your own tag types by creating a file of your custom tag\n\ types. 
The file looks like this:\n\ \n\ "; static char szReadMe56[] = "\n\ mytag1 red consensus yes\n\ mytag2 purple both yes\n\ mytag3 green read no\n\ \n\ field 1 (\"mytag1\") is the tag name\n\ field 2 (\"red\") is the color\n\ field 3 is \"consensus\", \"read\", or \"both\", depending on which kind of tag\n\ it is\n\ field 4 is \"yes\" or \"no\", depending on whether the user can add\n\ this tag in Consed (by swiping) or whether it is a tag that\n\ can only be viewed in Consed (presumably it would be added by\n\ some software of yours before the user sees it in Consed).\n\ \n\ If the file is called \"/usr/local/genome/lib/tagTypes.txt\", then in\n\ .consedrc put the following line:\n\ \n\ consed.fileOfTagTypes: /usr/local/genome/lib/tagTypes.txt\n\ so that Consed knows where the file is.\n\ \n\ Once you have done this, the user of Consed can add tags of these\n\ types using the method described in TAGS of the Quick Tour (above).\n\ \n\ The list of available colors is in the file rgb.txt, found at\n\ /usr/X11R6/share/X11/rgb.txt on macosx, /usr/lib/X11/rgb.txt on Linux\n\ or /usr/openwin/lib/rgb.txt on Solaris. For more information, consult\n\ any X-Windows reference, since this has nothing to do specifically\n\ with Consed. For your convenience, here are a few of the color names.\n\ One way to find out what they look like is to try them:\n\ \n\ mint cream DeepSkyBlue1 DeepPink4 \n\ azure DeepSkyBlue2 HotPink1 \n\ alice blue DeepSkyBlue3 HotPink2 \n\ lavender DeepSkyBlue4 HotPink3 \n\ lavender blush SkyBlue1 HotPink4 \n\ misty rose SkyBlue2 pink1 \n\ white SkyBlue3 pink2 \n\ black SkyBlue4 pink3 \n\ dark slate gray LightSkyBlue1 pink4 \n\ dim gray LightSkyBlue2 LightPink1 \n\ slate gray LightSkyBlue3 LightPink2 \n\ light slate gray LightSkyBlue4 LightPink3 \n\ gray SlateGray1 LightPink4 \n\ \n\ \n\ You can also associate data with tags. 
For example, you can have a\n\ tag type SNPprobability which gives, at a particular consensus\n\ position, the probability that a base is a SNP. Thus there needs to\n\ be a floating point number with the tag. Such a tag type can be defined in the\n\ same file /usr/local/genome/lib/tagTypes.txt (as above), but instead\n\ of a single line, its definition has a more\n\ complicated structure that allows for tag fields:\n\ \n\ TAG_TYPE\n\ NAME: tag_type\n\ CONS_OR_READ: both\n\ USER_CAN_ADD: yes\n\ COLOR: color1\n\ FIELD: name type\n\ POINTER_FIELD: name (type of tag pointed to) (optional ?*+)\n\ END_TAG_TYPE\n\ \n\ CONS_OR_READ: can be followed by \"consensus\", \"read\", or \"both\".\n\ \n\ In FIELD: name type,\n\ \"type\" is one of integer, floating, or string\n\ \n\ In POINTER_FIELD: name (type of tag pointed to) (optional ?*+)\n\ \n\ ? means 0 or 1 occurrences\n\ * means 0 or more occurrences\n\ + means 1 or more occurrences\n\ absence of any of these means exactly 1 occurrence\n\ \n\ Note that FIELD and POINTER_FIELD can both be present 0 or more times.\n\ \n\ Here is an example for a SNP with a probability:\n\ \n\ tagTypes.txt contains:\n\ \n\ TAG_TYPE\n\ NAME: SNP\n\ CONS_OR_READ: consensus\n\ USER_CAN_ADD: yes\n\ COLOR: yellow\n\ FIELD: probability floating\n\ END_TAG_TYPE\n\ \n\ Note that tagTypes.txt is not read by default.
You must set the\n\ .consedrc parameter:\n\ \n\ consed.fileOfTagTypes: tagTypes.txt\n\ \n\ The ace file will then contain SNP tags that look like this:\n\ \n\ CT{\n\ Contig1 SNP consed 1863 1870 091030:091242\n\ probability 75.2\n\ }\n\ \n\ \n\ "; static char szReadMe57[] = "\n\ \n\ Here is an example of a user-defined tag type that points to another\n\ tag of the same type:\n\ \n\ tagTypes.txt:\n\ \n\ TAG_TYPE\n\ NAME: tear2\n\ CONS_OR_READ: consensus\n\ USER_CAN_ADD: yes\n\ COLOR: yellow\n\ POINTER_FIELD: other_tear2_tag tear2\n\ END_TAG_TYPE\n\ \n\ and these tags look like this in the ace file:\n\ \n\ CT{\n\ Contig2 tear2 consed 6470 6470 090714:103935\n\ ID: 5\n\ other_tear2_tag 6\n\ }\n\ \n\ CT{\n\ Contig2 tear2 consed 6487 6487 090714:103941\n\ ID: 6\n\ other_tear2_tag 5\n\ }\n\ \n\ This means that the tear2 tag with ID 5 refers to the other tear2 tag with\n\ ID 6, and vice versa.\n\ \n\ \n\ \n\ ----------------------------------------------------------------------------\n\ \n\ 18. EXPANDING CONSED'S CAPABILITIES WITH A LITTLE PROGRAMMING\n\ \n\ Lab managers: Please do not be put off by the title of this\n\ section. You should read through it so that you are aware of\n\ what consed is capable of. If you think one of these features would\n\ be very helpful to your lab, then get a programmer to spend a day or\n\ two writing some scripts that could really help you out. But\n\ first you need to be aware of what is possible, so read through this.\n\ \n\ 18.1) BRINGING UP CONSED FROM A SCRIPT\n\ \n\ Suppose that you want to write a script that brings up consed on one\n\ ace file at a particular position, and then brings up consed on\n\ another ace file at a particular position, and then brings up consed\n\ on another ace file at a particular position, ...
you can do this by:\n\ \n\ consed -ace (name of ace file) -mainContigPos (unpadded pos)\n\ \n\ This will bring up consed with the main contig (the contig with the\n\ most reads) displayed in the Aligned Reads Window, already\n\ scrolled to position (unpadded pos).\n\ \n\ Thus you could write a script like this:\n\ \n\ cd directory1\n\ consed -ace file1.ace -mainContigPos 1050\n\ cd directory2\n\ consed -ace file2.ace -mainContigPos 2057\n\ cd directory3\n\ consed -ace file3.ace -mainContigPos 1487\n\ .\n\ .\n\ .\n\ \n\ 18.2) CONTROL OF CONSED FROM SOME OTHER PROGRAM\n\ \n\ Consed can be controlled by some other program. For example, you\n\ might have a program that displays mapping data and you would like the\n\ user to be able to click on a location and have Consed come up showing\n\ the bases in that region. This feature allows a programmer to do\n\ this.\n\ \n\ Here is an example of how to do this:\n\ \n\ cd to standard/edit_dir\n\ \n\ Start consed as follows:\n\ \n\ consed -socket 5432 -ace standard.fasta.screen.ace.1\n\ \n\ If a window pops up asking if you want to apply edits, answer \"no\".\n\ \n\ Open another xterm and cd to standard/edit_dir\n\ \n\ From this directory, run the script testSocket.perl (thanks to Bill\n\ Gilliland), which is supplied with consed in the scripts directory of\n\ the consed distribution.\n\ \n\ This script will say:\n\ \n\ issuing command Scroll Contig1 100\n\ waiting for you to type a command (such as Scroll Contig1 150)...\n\ \n\ You should immediately see consed's Aligned Reads Window open and\n\ scroll automatically to position 100.\n\ "; static char szReadMe58[] = "\n\ \n\ If you were to type (in the window in which testSocket.perl is\n\ running):\n\ \n\ Scroll Contig1 150\n\ \n\ then you would see Consed immediately scroll to position 150.\n\ \n\ \n\ Here are the details of how you could use it:\n\ \n\ The external program can start up Consed as follows:\n\ \n\ consed -socket (local port number) -ace (ace
filename)\n\ \n\ For example,\n\ \n\ consed -socket 5432 -ace standard.fasta.screen.ace.1\n\ \n\ After Consed has finished starting up (including your answering whether\n\ to apply edits), you will see this message in the xterm:\n\ \n\ success bind to local port number: 5432\n\ \n\ You will also see that Consed has created a file called\n\ consedSocketLocalPortNumber in the default directory (which is usually\n\ the directory the ace file is in).\n\ \n\ This file gives the port number of the Berkeley socket that Consed has\n\ opened and is listening on. Thus your program can read this file and\n\ create a connection to the Berkeley socket created by Consed.\n\ \n\ Once the connection is established, your program can send commands to\n\ Consed at that socket indicating to Consed which contig to display and\n\ what consensus position to scroll to. Currently, the only acceptable\n\ commands are:\n\ \n\ Scroll (contigname) (consensus position)\n\ PopupTraces (read name) (unpadded read position in the direction of sequencing)\n\ \n\ 'Unpadded read position in the direction of sequencing' means that, for\n\ a bottom strand read, the position is counted from the right end.\n\ \n\ Just send such a command to the Berkeley socket, and Consed will\n\ respond appropriately. (Currently, Consed does not cope well if another\n\ process establishes a connection and then terminates without first\n\ closing the connection.)\n\ \n\ \n\ \n\ \n\ 18.3) REMOVING READS FROM A SCRIPT\n\ \n\ Consed can remove reads from the command line (without the graphical\n\ interface) as follows:\n\ \n\ consed -ace (ace file) -removeReads (file with reads to remove)\n\ \n\ consed -ace (ace file) -removeContigs (file with contigs to remove)\n\ \n\ consed -removeReads and consed -removeContigs do not bring up the\n\ graphical interface.
What is done with the reads is governed by the\n\ .consedrc parameter consed.removeReadsPutIntoOwnContig: true or false\n\ (see CONSED CUSTOMIZATION).\n\ \n\ \n\ \n\ 18.4) HOW TO WRITE A CUSTOM NAVIGATION FILE\n\ \n\ In the Main Window, there is also a Navigate menu. Pull it down and\n\ release on the Custom Navigation menu item. A box will pop up saying\n\ 'Select custom navigation file:'\n\ There will be a file:\n\ custom_navigation.nav\n\ Double click on it.\n\ \n\ You will see the now-familiar custom navigation box. Click 'Next'\n\ repeatedly until you get to the end of the list.\n\ \n\ Consed doesn't write such a file--it just reads it. This feature\n\ allows you to write your own programs that select\n\ locations that you want your finishers to examine. Your program\n\ writes a file, the user reads that file into Consed in this manner,\n\ and the user can then go to each of the locations.\n\ \n\ The format of the file is as follows:\n\ \n\ There is a title line that looks like this:\n\ \n\ TITLE: low quality base in discrepant region\n\ \n\ and then there are blocks that look like this:\n\ \n\ BEGIN_REGION\n\ TYPE: READ\n\ READ: B11_hs1-60153193_GGor_050426.f\n\ UNPADDED_READ_POS: 34 34\n\ COMMENT: a comment\n\ END_REGION\n\ \n\ "; static char szReadMe59[] = "\n\ The block above refers to read position 34 of read\n\ B11_hs1-60153193_GGor_050426.f. Even if this read is complemented in\n\ the assembly (it is right to left), this position refers to the base\n\ position in the direction of sequencing--the same as the position within\n\ the PHD file.\n\ \n\ \n\ There is another kind of block:\n\ \n\ BEGIN_REGION\n\ TYPE: CONSENSUS\n\ CONTIG: hs21-15002178_HSap-Contig\n\ UNPADDED_CONS_POS: 1774 1784\n\ COMMENT: another comment\n\ END_REGION\n\ \n\ which refers to a position on the consensus. Notice that it is\n\ missing the \"READ:\" line, the TYPE: line is different, and instead of\n\ \"UNPADDED_READ_POS\" it has \"UNPADDED_CONS_POS\".
When\n\ the user navigates to the second kind of block, the blinking cursor is\n\ put on the consensus rather than on a read (as it is with the first\n\ kind of block).\n\ \n\ You might want to specify the consensus positions in terms of some\n\ user-defined positions (the first position of the consensus is not 1\n\ but rather is some other number). For example, you might want to use\n\ chromosome positions, rather than the position within the contig. You\n\ can let Consed know that the UNPADDED_CONS_POS numbers are\n\ user-defined positions by putting the words \"user-defined positions\"\n\ somewhere in the TITLE line like this:\n\ \n\ TITLE: low quality base in discrepant region (user-defined positions)\n\ \n\ So that Consed knows what number to start numbering the consensus at,\n\ you must have a startNumberingConsensus tag on the consensus or a read\n\ indicating the user-defined position of the left end of the contig.\n\ See USER-DEFINED CONSENSUS POSITIONS in this document.\n\ \n\ There is a 3rd type of block that you probably won't use much. It is\n\ used when you know the consensus position of a location on a read, but\n\ not the read position. Then you can use:\n\ \n\ BEGIN_REGION\n\ TYPE: READ\n\ CONTIG: hs2-105068850_HSap-Contig\n\ READ: E02_hs2-105068850_PTro_040520.f\n\ UNPADDED_CONS_POS: 295 299\n\ COMMENT: left 2\n\ END_REGION\n\ \n\ The block above refers to a position on read\n\ E02_hs2-105068850_PTro_040520.f in contig hs2-105068850_HSap-Contig at\n\ consensus positions 295-299.\n\ \n\ \n\ 18.5) COMPRESSING CHROMATOGRAMS\n\ \n\ If you are interested in compressing your chromatogram files, go into\n\ chromat_dir and gzip one of the chromatogram files.
Make sure that\n\ gunzip is in /usr/local/bin. (You can change this location with the\n\ Consed parameter\n\ \n\ consed.gunzipFullPath: /usr/local/bin/gunzip\n\ \n\ as described in CONSED CUSTOMIZATION (above), but it will be easiest\n\ for you and your users if you just put gunzip (or a link to it) in\n\ /usr/local/bin and do not have to bother with Consed parameters.)\n\ \n\ Restart Consed and bring up the corresponding trace. You will notice\n\ no appreciable delay.\n\ \n\ \n\ 18.6) READING CHROMATOGRAMS OUT OF AN EXTERNAL DATABASE\n\ \n\ Normally, chromatograms are kept in ../chromat_dir. If you want to\n\ keep them somewhere else (such as in an external database), you can do\n\ that. When the chromatogram is needed (when the user asks to view a\n\ trace), Consed will call an external program, passing it the name of\n\ the read required, and then look for the chromatogram in /tmp (by\n\ default). It will read the chromatogram and then delete it. Use the\n\ parameters:\n\ \n\ consed.alwaysRunProgramToGetChromats: true\n\ consed.programToRunToGetChromats: /usr/local/bin/programToGetChromat\n\ \n\ In this case, \"programToGetChromat\" is the name of the program that\n\ gets the chromatogram and puts it into /tmp.\n\ \n\ If you keep *some* chromats in an external database and the rest\n\ in ../chromat_dir, then set\n\ \n\ consed.alwaysRunProgramToGetChromats: last\n\ \n\ which means that Consed will first look in ../chromat_dir and, if it\n\ doesn't find the chromatogram there, will then run the program to get it.\n\ \n\ \n\ 18.7) COMPRESSING ACE FILES\n\ \n\ [COMPRESSED ACE FILES]\n\ "; static char szReadMe60[] = "\n\ \n\ You can also compress ace files, and have consed read them like this\n\ (from bash as your shell):\n\ \n\ consed -ace <(gunzip -c myAce.ace.gz)\n\ \n\ However, in tests that I did, this did not appear to improve\n\ performance.
I believe the reason is that consed is not\n\ I/O bound--when starting up, reading the ace file takes a very small\n\ portion of the time. In fact, you will notice that when consed is\n\ starting up, it uses 100% of the cpu.\n\ \n\ 18.8) NO PHD FILES\n\ \n\ Try bringing up Consed like this:\n\ \n\ consed -nophd\n\ \n\ This mode allows you to view an assembly when you have only the ace\n\ file and no phd files or chromatograms. I neither recommend\n\ nor support this option! There are so many things that do not work\n\ with this option that I haven't bothered to keep track of them, but\n\ here are a few: can't make joins, can't recalculate consensus\n\ quality, can't view traces, can't edit, autofinish will not give good\n\ results, can't view quality of the read bases, ...\n\ \n\ \n\ 18.9) ADDING TAGS FROM OTHER PROGRAMS\n\ \n\ You can also write external programs that add tags to the ace file\n\ and/or the phd files. Both RT (read) and CT (consensus) tags can be\n\ appended to the end of the ace file. BEGIN_TAG tags can be appended\n\ to the end of the phd files. Do not rewrite the ace file or the phd\n\ file--there is no need to do so and it will cause problems. See\n\ SAMPLE PHD BALL FORMAT in this document for the format of BEGIN_TAG\n\ read tags.\n\ \n\ \n\ 18.10) USER-DEFINED CONSENSUS POSITIONS\n\ \n\ Suppose instead of labeling the consensus 1, 2, 3, 4, ..., you want,\n\ for example, to number it: 100,000,001, 100,000,002, 100,000,003,\n\ 100,000,004, etc. (e.g., in chromosome positions). You can do this.\n\ Note that all bases in the consensus (except pads) will be\n\ numbered--you cannot, for example, only number exon bases and not\n\ number intron bases (pity).\n\ \n\ To start numbering the consensus at a number different from 1, add a\n\ \"startNumberingConsensus\" tag to either the consensus or a read in\n\ that contig.
The tag will look like this (this is a consensus tag in\n\ the ace file):\n\ \n\ CT{\n\ hs18-25105605_HSap-Contig startNumberingConsensus consed 1 1 041123:152840\n\ 25105605\n\ }\n\ \n\ This says that the consensus will be numbered starting at 25,105,605.\n\ \n\ You cannot add such a tag by using Consed--you must have a program add\n\ it to the ace file (or a phd file of one of the reads in the contig).\n\ \n\ See above under selectRegions for how to turn user-defined consensus\n\ positions on and off.\n\ \n\ 18.11) DEFINING KEYS (HOTKEYS) TO CALL EXTERNAL PROGRAMS AND/OR APPLY TAGS AND/OR\n\ INTEGRATE CONSED WITH EXTERNAL DATABASES\n\ \n\ [CUSTOM KEYS, USER-DEFINED KEYS]\n\ \n\ You can define keys (such as Control-N) to apply a particular tag to a\n\ single base, saving you the several steps of applying tags by hand: swiping\n\ and selecting a tag type (as shown under \"TAGS\" above). However, it\n\ is even more powerful than that. You can also define an external\n\ program to run when you type this key. That external program can be\n\ your own, and it could be, for example, a program that puts\n\ information into an external database.\n\ \n\ The first thing you need to set up a custom hotkey is a .consedrc\n\ file, which goes in edit_dir of the project you're working on (see\n\ CONSED CUSTOMIZATION above for other possible locations).\n\ \n\ Put the following in that file:\n\ \n\ consed.userDefinedKeys: 14 15\n\ ! make a space-separated list of the decimal ASCII values of the keys\n\ ! 14 means control-N, 15 means control-O\n\ \n\ consed.programsForUserDefinedKeys: /bin/echo /bin/echo\n\ ! a space-separated list of the full pathnames of the commands to run\n\ \n\ consed.argumentsToPassToUserDefinedPrograms: argument_for_first_key argument_for_second_key\n\ ! a space-separated list of the arguments to pass to each user-defined program\n\ \n\ consed.tagsToApplyWithUserDefinedKeys: none polymorphismConfirmed\n\ !
a space-separated list of the tag types to apply when the user\n\ ! presses a user-defined key. If a key is to have no associated tag,\n\ ! then enter \"none\" for that key.\n\ \n\ "; static char szReadMe61[] = "\n\ \n\ This makes control-N and control-O (\"oh\"--not zero) call \"/bin/echo\".\n\ In either the aligned reads window or the trace window,\n\ click on a base and try these keys (e.g., holding down the\n\ control key and typing 'o'). Watch in the xterm where you started\n\ Consed for output like this:\n\ \n\ argument_for_first_key djs74-561.s1 97 Contig1 2534 2581 a 51 /kw3/gordon/consed_demo/standard/edit_dir/standard.fasta.screen.ace.1 tr.window\n\ argument_for_second_key djs74-2679.s1 78 Contig1 2527 2574 c 39 /kw3/gordon/consed_demo/standard/edit_dir/standard.fasta.screen.ace.1 a.r.window\n\ \n\ argument_for_first_key the argument for this key from\n\ consed.argumentsToPassToUserDefinedPrograms\n\ djs74-561.s1 the read the user was viewing (or \"consensus\" if the\n\ cursor is on the consensus)\n\ 97 the base position in the direction of sequencing (or -1 if the\n\ cursor is on the consensus)\n\ Contig1 the contig\n\ 2534 the unpadded consensus position\n\ 2581 the padded (counts *'s) consensus position\n\ 'a' the base\n\ 51 the quality of the base\n\ /kw3/gordon/consed_demo/standard/edit_dir/standard.fasta.screen.ace.1 the ace file\n\ tr.window means the user pressed the key in the trace window (a.r.window\n\ means the aligned reads window)\n\ \n\ \n\ It's the same as if you had run the\n\ program from the shell, with command-line arguments, like this:\n\ \n\ bash%: /bin/echo argument_for_first_key djs74-561.s1 97 Contig1 2534 2581 a 51 /kw3/gordon/consed_demo/standard/edit_dir/standard.fasta.screen.ace.1 tr.window\n\ \n\ You will also see that control-O will automatically add a\n\ polymorphismConfirmed tag, but control-N will not add any tag.
That\n\ is because of consed.tagsToApplyWithUserDefinedKeys (see above).\n\ \n\ Several groups that are doing polymorphism detection have expressed\n\ interest in this feature because it enables them to have Consed\n\ directly write into an external database (e.g., Oracle or Sybase) by\n\ calling a program that then writes to the database.\n\ \n\ You can use these hotkeys from within the trace window or the aligned\n\ reads window.\n\ You are not limited to control-N and control-O: for instance, 1 is\n\ control-A, 2 is control-B, 3 is control-C, 4 is control-D, etc.\n\ \n\ If you want to pass this information to a database, you will need to know\n\ how to talk to your database, and either have the program for your hotkey\n\ write to it directly, or have it call another program that takes the\n\ arguments above and converts them into the format your database needs.\n\ \n\ control-A, control-E, and control-T already mean something in the\n\ aligned reads window, so those keys cannot be defined to be anything\n\ else. Typically control-C, control-S, and control-Q already mean\n\ something to the operating system, so you can't use those, either.\n\ \n\ \n\ 18.12) READ PREFIXES\n\ \n\ You can create a file called readPrefixes.txt in edit_dir. This file\n\ contains a list of reads and prefixes for those reads. In the Aligned\n\ Reads Window, the Consed user will see those read prefixes in a column\n\ before the read names. This can be a very helpful feature for\n\ finishers. For example, these read prefixes can indicate to the\n\ finishers which templates are available to use for making finishing\n\ reads.\n\ \n\ The format of the file is:\n\ \n\ (readname) (read prefix) (color for read prefix)\n\ \n\ The read prefix and color for read prefix are optional.
If you\n\ leave them out, you get '*' for the read prefix in blue.\n\ \n\ \n\ The consed parameters involving this feature are:\n\ \n\ consed.defaultReadPrefix: *\n\ consed.readPrefixesFile: readPrefixes.txt\n\ consed.maxCharsDisplayedForReadPrefix: 1\n\ \n\ but you probably won't need to change them.\n\ \n\ \n\ 18.13) USING FILES CREATED ON WINDOWS OR WINDOWS NT\n\ \n\ Don't. (E.g., phd files generated by a Beckman CEQ-2000.) These\n\ files initially had <CR><LF> at the end of each line instead of <LF>.\n\ CONSED chokes every time it tries to read something from these phd files.\n\ If you must use these files, you must first convert them to UNIX\n\ format, which means stripping out the CR's so that lines are separated\n\ only by <LF> (\\n, decimal 10).\n\ \n\ 18.14) CREATING YOUR OWN ACE FILES (INSTEAD OF ACE FILES CREATED BY\n\ PHRAP)\n\ \n\ Some people have created their own ace files and tried Consed on them,\n\ and, when Consed started up ok, they didn't understand why some\n\ feature in Consed later didn't work. This is because Consed does not check\n\ everything about an ace file when it starts up. If you are going to\n\ write software to create ace files, here is a partial list of Consed\n\ features you should check before you conclude that your ace files are\n\ fine for Consed:\n\ "; static char szReadMe62[] = "\n\ \n\ assembly view\n\ restriction digest\n\ read all traces\n\ complement contig and then read all traces\n\ add new reads\n\ \n\ If all of these work properly, then your ace files are probably ok.\n\ \n\ \n\ 18.15) CONSED OPTIONS\n\ \n\ You've seen quite a few consed options, such as -removeReads, -socket,\n\ -ace, -nophd, -removeContigs, etc.\n\ \n\ To see them all, type\n\ consed -help\n\ \n\ \n\ \n\ --------------------------------------------------------------------------\n\ \n\ 19.
MONITORS AND MICE FOR CONSED\n\ \n\ If your monitor is part of a Unix computer (a Linux box, a Mac or a Sun) or\n\ is an Xterminal, then you will have absolutely no problems.\n\ \n\ If your monitor is a PC running Windows (any flavor), then you must\n\ have an X emulator installed and running. X emulators include\n\ Exceed, XWin32, Reflection X, and OpenNT. Any of these will work if\n\ configured correctly (and the 'correctly' is the key). I encourage\n\ you to use single window mode (where there is one huge unix window\n\ with xterms inside it) and then use a Unix window manager such as CDE,\n\ fvwm, or mwm.\n\ \n\ If your monitor is a MAC with macosx running, see NOTE TO MACOSX\n\ USERS (above).\n\ \n\ Whatever your monitor, you must have a 3-button mouse or 3-button\n\ emulation. 3-button emulation is tricky since Consed uses all 3\n\ buttons of the mouse and it also uses Control-Middle-Mouse-Button,\n\ Shift-Middle-Mouse-Button and Control-Right-Mouse-Button. So if you\n\ are going to try to use just a 2-button mouse (or, God forbid, a\n\ 1-button mouse), you should make sure that you can emulate each of\n\ those. Often, if you push the left and right mouse buttons at the\n\ same time, your X server will interpret that to be the middle mouse\n\ button. But you must consult the documentation of your X emulator or\n\ X server to know what it will do--that is out of Consed's control.\n\ \n\ \n\ --------------------------------------------------------------------------\n\ \n\ 20. ACE FILE FORMAT\n\ \n\ \n\ Note that consed really requires both an ace file and a phd ball to\n\ function fully. If you are trying to write files that consed can\n\ read, I strongly urge you to write both files. Read the next section\n\ about phd balls.\n\ \n\ \n\ Refer to the accompanying sample_ace_file.txt (below).\n\ \n\ AS <number of contigs> <total number of reads in ace file>\n\ \n\ CO <contig name> <# of bases> <# of reads in contig> <# of base segments in contig> <U or C>\n\ \n\ This defines the contig.
The U or C indicates whether the contig has\n\ been complemented from the way phrap originally created it. Thus this\n\ is always U for an ace file created by phrap.\n\ \n\ The contig sequence follows. It includes pads--\"*\" characters which\n\ are inserted by phrap in order to make room for some read that has an\n\ extra base at that position. (Note: any position which counts the *'s is\n\ referred to as a \"padded position\". A position that does not count\n\ *'s is referred to as an \"unpadded position\".)\n\ \n\ BQ\n\ \n\ This starts the list of base qualities for the unpadded consensus\n\ bases. (NB: annoyingly, no qualities are given for *'s in the\n\ consensus.) The contig is the one from the previous CO, hence no name\n\ is needed here.\n\ \n\ \n\ AF <read name> <C or U> <padded start consensus position>\n\ \n\ This defines the location of the read within the contig.\n\ C or U means complemented or uncomplemented.\n\ <padded start consensus position> means the position of the\n\ beginning of the read, in terms of consensus bases which start at 1\n\ and do count *'s.\n\ \n\ BS <padded start consensus position> <padded end consensus position> <read name>\n\ \n\ The BS line (base segment) indicates which read phrap has chosen to be\n\ the consensus at a particular position.\n\ \n\ BS lines are now optional since they don't make much sense for\n\ assemblers other than phrap.\n\ "; static char szReadMe63[] = "\n\ \n\ If you choose to write BS lines, I suggest you choose any read\n\ which matches the consensus perfectly over the stretch of bases.\n\ No two BS lines may overlap, and each unpadded base\n\ must be included in some BS line.\n\ \n\ RD <read name> <# of padded bases> <# of whole read info items> <# of read tags>\n\ \n\ Below RD is the sequence of bases for the read. The sequence includes\n\ *'s and is in the orientation that phrap needed to align it against\n\ the consensus (thus it might be complemented from the direction it was\n\ sequenced).
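\n\ \n\ Because both the consensus and the RD sequence contain pads, a program\n\ that reads ace files often has to convert a padded position into an\n\ unpadded one. The C function below is only an illustrative sketch: it\n\ is not taken from Consed, its name (paddedToUnpadded) is hypothetical,\n\ and it assumes that the padded bases are held in a single\n\ NUL-terminated string and that positions are 1-based, as in the ace\n\ file:\n\ \n\     /* hypothetical helper, not part of Consed: returns the\n\        1-based unpadded position corresponding to the 1-based\n\        padded position paddedPos, i.e., the number of non-pad\n\        bases at or before paddedPos (if paddedPos is itself a\n\        pad, this is the position of the preceding base) */\n\     int paddedToUnpadded(const char *padded, int paddedPos)\n\     {\n\         int nUnpadded = 0;\n\         int i;\n\ \n\         for (i = 0; i < paddedPos && padded[i] != '\\0'; ++i)\n\             if (padded[i] != '*')\n\                 ++nUnpadded;\n\         return nUnpadded;\n\     }\n\ \n\ For example, in the padded sequence ACG*TA, padded position 5 (the T)\n\ is unpadded position 4.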
\n\ \n\ QA <qual clipping start> <qual clipping end> <align clipping start> <align clipping end>\n\ \n\ This line indicates which part of the read is the high quality segment\n\ (if there is any) and which part of the read is aligned against the\n\ consensus. These positions are offsets (and count *'s) from the left\n\ end of the read (left, as shown in Consed). Hence for bottom strand\n\ reads, the offsets are from the end of the read. The offsets are\n\ 1-based. That is, if the left-most base is in the aligned,\n\ high-quality region, <qual clipping start> = 1 and <align clipping start> = 1\n\ (not zero). If the entire read is low quality, then <qual clipping start>\n\ and <qual clipping end> will both be -1.\n\ \n\ DS CHROMAT_FILE: <name of chromat file> PHD_FILE: <name of phd file> TIME: <date/time of the phd file> CHEM: <chemistry> DYE: <dye> TEMPLATE: <template name>