Usage: tacg -flag option -flag option ... <infile >outfile

tacg uses stdin/stdout/stderr; use redirection or pipes for input and output.
  input:  needs an input specifier (| or <)
  output: to screen (default), >file, | nextcmd
Uses Knight's SEQIO for auto reformat on input; most ASCII formats accepted.
1 or more of: -F -g -G -l -L -O -p -P --rule --rulefile -s -S -X
flags must be specified for output.

+-----+---------+-----------+-----+----------+-----------+----+-------+
|1    | 2       | 3         | 4   | 5        | 6         | 7  |   A   |
+-----+---------+-----------+-----+----------+-----------+----+-------+
|b    | dam     | i         | o   | rule     | silent    | X  | ALL   |
|e    | dcm     | logdegens | O   | rulefile | tmppath   | #  | Pages |
|c    | example | l         | p   | r regex  | T         | ex |       |
|C    | f       | L         | P   | R        | v         |    +-------+
|clone| F       | strands   | ps  | raw      | V         |    |   i   |
|cost | g       | notics    | pdf | s        | w         |    +-------+
|D    | G       | numstart  | q   | S        | W slidwin |    | index |
|     | h help  | m/M       | Q   |          | x         |    |       |
|     | H HTML  | n         |     |          |           |    |       |
+-----+---------+-----------+-----+----------+-----------+----+-------+
(leading dashes for flags above have been removed for brevity)
Type one of the numbers indicated to show the help page for that flag.
1-7 = page for those flags, 'A' = all pages, 'i' = index page, 'q' = quit

flag | opts  | explanation (* = default; # = an integer)
-----+-------+--------------------------------------------------------------
-b {#}       beginning of DNA subsequence; 1* for 1st base of sequence.
-e {#}       end of DNA subsequence; 0* for last base of sequence.
-c           order output by # of hits per Enz, else by order in REBASE file.
-C {0*-12}   Codon Usage table to use for translation:
               0 - Standard      5 - Ciliate_Mito       10 - Ascidian_Mito
               1 - Vert_Mito     6 - Echino_Mito        11 - Flatworm
               2 - Yeast_Mito    7 - Euplotid_Nuclear   12 - Blepharisma
               3 - Mold_Mito     8 - Bacterial
               4 - Invert_Mito   9 - Alt_Yeast
--clone {#_#,#x#..} find REs that don't cut in the range #_#, but do cut in
             the range #x#. Returns RE names & sites matching each criterion.
--cost {#}   use only REs that cost >= # units/$.
             Larger #s are cheaper. Requires optional data be added to rebase
             file; REs lacking this info are excluded.
             #s>100 are cheap; #s<10 are v. expensive.
-D {0|1*-4}  controls input and analysis of degenerate sequence where:
             0  FORCES excl'n of IUPAC degen's in sequence; only 'acgtu' accepted.
             1* cut as NONdegenerate unless IUPAC char's found; then cut as '-D3'.
             2  allow IUPAC char's; ignore in KEY hex, but match outside of KEY.
             3  allow IUPAC char's; find only EXACT matches.
             4  allow IUPAC char's; find ALL POSSIBLE matches.

--dam        simulate cutting in the presence of Dam methylase (GmATC)
--dcm        simulate cutting in the presence of Dcm methylase (CmCWGG)
             for both above, REs not cutting due to methylation are listed
             as not cutting at all in summary.
--example {1-10} example code to show how to add your own flags and functions.
             Search for 'EXAMPLE' in 'SetFlags.c' and 'tacg.c' for the code.
-f {0|1*}    form (or topology) of DNA - 0 (zero) for circular; 1 for linear.
-F {0*-3}    print/sort Fragments; 0*-omit; 1-unsorted; 2-sorted; 3-both.
-g {Lo(,Hi)} prints a gel map w/ low cutoff of Lo; high cutoff of Hi bp.
             If Hi > Seq Length (or is omitted), Hi is set to Seq Length.
-G {#,X|Y|L} streams numeric data to stdout for external analysis/plotting.
             # = bases/bin (the hits for this many bases should be pooled).
             X = bins on X axis; Y = bins on Y axis;
             L = Long output as 'bins(X) data(Y)'.
-h (--help)  asks for (this) brief help page.
-H (--HTML) {0*|1} complete (0) or partial (1) HTML tags generated on the
             fly for WWW output. See man page for appropriate usage.
             0 = makes standalone HTML page, with Table of Contents.
             1 = no page headers, only TOC, to embed in other HTML pages.
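The -g cutoff rule above (Hi falls back to, and is capped at, the sequence length) can be sketched in Python. This is an illustration of the documented behavior only, not tacg's own code; the function name `gel_cutoffs` and its error handling are assumptions made for the example.

```python
def gel_cutoffs(option, seq_length):
    """Parse a '-g' style 'Lo(,Hi)' string and return (lo, hi) in bp.

    Hypothetical helper: mirrors the help text, where a missing Hi or a
    Hi larger than the sequence length is replaced by the sequence length.
    """
    parts = option.split(",")
    if len(parts) not in (1, 2) or not parts[0]:
        raise ValueError("expected 'Lo' or 'Lo,Hi', got %r" % option)
    lo = int(parts[0])
    # Hi omitted (or empty after the comma): use the sequence length.
    hi = int(parts[1]) if len(parts) == 2 and parts[1] else seq_length
    # Hi beyond the end of the sequence: cap it at the sequence length.
    if hi > seq_length:
        hi = seq_length
    return lo, hi


print(gel_cutoffs("100", 5000))
print(gel_cutoffs("100,20000", 5000))
print(gel_cutoffs("100,1000", 5000))
```

The three calls cover the omitted, oversized, and in-range cases described for -g.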
-i (--idonly) {0|1*|2} controls output for seqs that have no hits.
             0 - ID line and normal output printed regardless of hits.
             1 (default) ID line and normal output are printed ONLY IF
               there are hits.
             2 - ONLY ID line is printed if there are hits.
-l           prints a GCG-style ladder map.
-L           print a Linear map - produces LOTS of output (~10x input).
--logdegens  all degens logged for graphic output (mem intensive).
             Linear map output can be made more space efficient by using
             the next 2 options.
--strands {1|2*} in Linear map, print 1 or 2 strands per line of DNA.
--notics     in Linear map, omit the tics that indicate 5, 10 bases.
--numstart {#} in Linear map, start numbering at this number (+ or -).
-m/M {#}     minimum (-m) and/or Maximum (-M) # cuts/RE; 0* for all.
-n {3*-10}   magnitude of recognition site; 3 = all, 5 = 5,6,7....

-o {0|1*|3|5} overhang - 5=5', 3=3', 0 for blunt, 1 (default) for all.
-O {###(x),min} ORF table for selected frames; 'x' -> extra info re AA compos'n
             eg: -O 135x,25 = fr 1,3,5 w/ xtra info; min ORF len = 25aas.
             'x' -> 3 extra lines x 128 chars wide, mungs formal FASTA format.
-p {Label,pattern[,Err]} cmd line entry of (degenerate) patterns to search for
             if Err is missing, it is set = 0, also sets -S for output.
             eg: -pFindMe,gyrttnnnnnnngct,1 looks for indicated pattern
             with 1 error.
-P {Lab1,[+-lg]DistLo[-DistHi],Lab2} Proximity match'g for 2 named patterns.
             Lab1/2 patterns must be in a REBASE-format file in form:
             'Lab1 1 IUPAC_pattern 0 Err !Comments'
             where Err = max # errors allowed
             eg: 'FindMe 1 gyrttnnnnnnngct 0 1 !The pattern that I'm trying to find'
             can repeat to specify up to 10 relationships at once
             + (-)  Label1 is downstream (upstream) of Label2; default either
             l (g)  Label1 is < (>) or = to 'DistLo' from Label2
             'DistLo-DistHi' indicates an explicit distance range (obviating l,g)
--ps         writes Postscript plasmid map (tacg_Map.ps), forces circular DNA,
             notes degens around rim, can be combined with -O (above) to plot
             ORFs in any frames. Will do multiple pages with a multi-sequence
             file.
--pdf        converts above PS plasmid map to PDF (tacg_Map.pdf) via exec of
             '/usr/bin/gs', so ghostscript (gs) needs to be there.
-q (default) (quiet) DISallows sending diagnostic UDP info back to author.
-Q           (UNquiet) sends ~100 bytes of UDP data back to author

--rule 'RuleName,((LabA:m:M&LabB:m:M)|(LabC:m:M&LabD:m:M))|(LabE:m:M),window'
             explicitly selects named patterns (<16) from a REBASE file with
             per pattern min/Max limits.
             'RuleName' = name for the pattern (len<11),
             'window' = sliding window within which the rule must be true.
             Parens () enforce logic; otherwise expressions are eval'd L->R.
             LabX = Pat name; 'm' = min; M = Max;
             '&' = logical AND; '|' = OR; '^' = XOR.
             Enclosing single 'quotes' are REQUIRED. Valid patterns are logged
             to file 'tacg.patterns' in current dir for re-use.
             See manpage for info, examples
--rulefile '/path/to/rulefile' loads a series of complex rules like those
             described in --rule above. Format is as in the single quotes
             above: RuleName,(rulestring, as above),window
-r (--regex) {'Label:RegexPat'} search for RegexPat; use 'Label' for naming;
             translates IUPAC characters into std regex notat'n, escapes
             characters that require it:
             gy(tt|gc)nc{2,3}m -> g[ct]\(tt\|gc\).c\{2,3\}[ca] .
             NB: regex-matching is incompatible with regular IUPAC and
             matrix-matching.
-r (--regex) {'FILE:FileOfRegexPats'} open the FILE 'FileOfRegexPats' and
             search for all Regex pat's in it; 'FILE' must be in CAPS to
             trigger this behavior.
             NB: -r REQUIRES the ':' separator and single quotes ' to enclose
             options.
-R {alt pattern file} specifies alternative REBASE file in GCG format OR
             use it to specify the MATRIX data file in TRANSFAC format.
--raw        ANYTHING on STDIN is raw sequence; same behavior as pre-SEQIO.
-s           summary - print Table of Zero Cutters, # hits of each Enzyme.
-S           Sites - prints the actual cut Sites in tabular form.

--silent     searches for possible SILENT RE sites (those that won't cause
             trans'n to change). use -LT1,1 to see rev-trans sequence
             re-trans'd to verify that the seq is OK.
             Arg, Leu, Ser codons will not RT/FT.
--tmppath    passes a temporary path for cooperation with CGIs that need
             output in a particular place. Currently used in plasmid map
             generation.
-T {[0*|1|3|6],[1|3]} Translates frames 1, 1-3, 1-6 w/ Linear Map using 1 or
             3 letter labels. -T3,3 xlates frames 1,2,3 with 3 letter labels.
-v           prints version of the program, then dies.
-V {1-3}     Verbose/debug mode; spews diags to stderr (1=lots, 3=tons).
-w {1|#}     output width in bp's (60 < # < 210), truncated to a # mod 15
             '-w 1' = 1 line output for easier parsing by external apps.
-W (--slidwin) {#} Defines the sliding window for searching for pattern groups
             use w/ '-x' as a looser alternative to -P (Proximity matching
             in pairs)
-x {Label(,=),Label..(,C)} selects SPECIFIC REs (<=15) from the REBASE file;
             If ',=' is appended to 1 RE Label, it will tag that RE for the
             AFLP analysis. See man page for details.
             If ',C' is appended to a list of >1 Labels, requests a multiple
             digest.
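The IUPAC-to-regex translation described under -r (--regex) above can be sketched in Python. This is a minimal illustration, not tacg's implementation: the function name `iupac_to_regex` and the `basic` switch are hypothetical, and only the y, n and m mappings are confirmed by the help text's example; the other IUPAC classes follow the standard nucleotide code table. With `basic=True` the grouping, alternation and repetition characters are backslash-escaped as in the help example; without it the result uses Python `re` syntax.

```python
import re

# IUPAC nucleotide codes -> regex character classes.
# y, n and m follow the help text's example; the rest are the standard
# IUPAC definitions (an assumption about what tacg emits for them).
IUPAC_CLASSES = {
    "r": "[ag]", "y": "[ct]", "m": "[ca]", "k": "[gt]",
    "s": "[cg]", "w": "[at]", "b": "[cgt]", "d": "[agt]",
    "h": "[act]", "v": "[acg]", "n": ".",
}

# Characters that the help example shows backslash-escaped.
ESCAPED_IN_BASIC = set("()|{}")


def iupac_to_regex(pattern, basic=False):
    """Translate a degenerate pattern into a regular expression string."""
    out = []
    for ch in pattern.lower():
        if ch in IUPAC_CLASSES:
            out.append(IUPAC_CLASSES[ch])
        elif basic and ch in ESCAPED_IN_BASIC:
            out.append("\\" + ch)
        else:
            out.append(ch)
    return "".join(out)


# Same pattern as in the -r help entry.
print(iupac_to_regex("gy(tt|gc)nc{2,3}m", basic=True))

# The unescaped form can be used directly with Python's re module.
python_pattern = iupac_to_regex("gy(tt|gc)nc{2,3}m")
print(bool(re.fullmatch(python_pattern, "gcttacca")))
```

The first call reproduces the escaped string shown in the help text; the second checks one sequence that the pattern should accept.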
-X (--extract) {b,e,[0|1]} eXtracts the sequence around the pattern matched,
             from b bases preceding, to e bases following the MIDDLE of
             pattern (if a normal pattern), or the START of the pattern (if a
             regular expression). If the pattern is found in the bottom strand
             AND the last field = 1, sequence is rev-compl'ed before it's
             extracted so all patterns are in same orientation; if last
             field = 0, it is NOT reverse compl'ed.
-# {#}       Matrix matching, using TRANSFAC-format matrix input. Required #
             is the matrix cutoff as a %. Uses '-R' to specify alternative
             matrix file, if not 'matrix.data'. Also works with '-x'.
             NB: matrix-matching is incompatible with regular pattern and
             regex matching.

ex: tacg -f0 -n6 -T123,3 -sl -F3 < degen.input.file >output.file   (to file)
    tacg -f 0 -l -n 5 -F 2 < input.seq.file               (to stdout/screen)
    tacg -m 3 -T1,1 -s < Ecoli_genome.genbank | grep HindIII > out
    tacg --regex 'Rxname1:g(cc|tac)nr{2,4}yt' < input.seq.file > outfile
    tacg -x HindIII,EcoRV,bamhi,C -O 134,30 -w 90 -LlS < input.seq.file

for more help, try 'man tacg'. Type 'Ctrl+C' if the program seems locked.
Latest docs & code at: http://tacg.sf.net. Contact author: hjm@tacgi.com
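Because tacg reads its sequence from stdin, a script that drives it has to supply the input file as the process's standard input rather than as an argument. The following Python sketch shows one way to do that. The helper names `build_tacg_argv` and `run_tacg` are hypothetical, the flags are taken from the examples above, and the code assumes a `tacg` binary is on the PATH; nothing here is part of tacg itself.

```python
import shutil
import subprocess


def build_tacg_argv(flags):
    """Build an argv list from ordered (flag, value) pairs.

    A value of None means a bare flag with no option, e.g. ('-l', None).
    """
    argv = ["tacg"]
    for flag, value in flags:
        argv.append(flag)
        if value is not None:
            argv.append(str(value))
    return argv


def run_tacg(flags, sequence_path):
    """Run tacg with the sequence file on stdin and return its stdout text.

    Equivalent in spirit to: tacg <flags> < sequence_path
    """
    if shutil.which("tacg") is None:
        raise FileNotFoundError("tacg executable not found on PATH")
    with open(sequence_path, "rb") as handle:
        done = subprocess.run(
            build_tacg_argv(flags),
            stdin=handle,             # plays the role of '< input.seq.file'
            stdout=subprocess.PIPE,   # capture what would go to the screen
            check=True,               # raise if tacg exits non-zero
        )
    return done.stdout.decode()


# Mirrors the example: tacg -f 0 -l -n 5 -F 2 < input.seq.file
example_flags = [("-f", 0), ("-l", None), ("-n", 5), ("-F", 2)]
print(build_tacg_argv(example_flags))
```

Only the argv construction is exercised here; calling `run_tacg(example_flags, "input.seq.file")` would additionally need the binary and a sequence file to be present.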