pepcoil

Wiki

The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

Please help by correcting and extending the Wiki pages.

Function

Predicts coiled coil regions in protein sequences

Description

pepcoil calculates the probability of a coiled-coil structure for windows of (by default) 28 residues through a protein sequence using the method of Lupas A, van Dyke M & Stock J (1991); Science 252:1162-4. It reads one or more sequences and writes a standard EMBOSS report file giving, for each input sequence, the location and score of the putative coiled-coil structures. Optionally, coiled coil regions, non-coiled coil regions and coil frameshifts are reported.

Usage

Here is a sample session with pepcoil

% pepcoil
Predicts coiled coil regions in protein sequences
Input protein sequence(s): tsw:gcn4_yeast
Window size [28]:
Output report [gcn4_yeast.pepcoil]:

Command line arguments

Predicts coiled coil regions in protein sequences
Version: EMBOSS:6.4.0.0

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Protein sequence(s) filename and optional
                                  format, or reference (input USA)
   -window             integer    [28] Window size (Integer from 7 to 28)
  [-outfile]           report     [*.pepcoil] Output report file name
                                  (default -rformat motif)

   Additional (Optional) qualifiers: (none)

   Advanced (Unprompted) qualifiers:
   -[no]coil           boolean    [Y] Report coiled coil regions
   -frame              boolean    [Yes if -coil is true] Show coil frameshifts
   -other              boolean    [N] Report non coiled coil regions

   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -rformat2           string     Report format
   -rname2             string     Base file name
   -rextension2        string     File name extension
   -rdirectory2        string     Output directory
   -raccshow2          boolean    Show accession number in the report
   -rdesshow2          boolean    Show description in the report
   -rscoreshow2        boolean    Show the score in the report
   -rstrandshow2       boolean    Show the nucleotide strand in the report
   -rusashow2          boolean    Show the full USA in the report
   -rmaxall2           integer    Maximum total hits to report
   -rmaxseq2           integer    Maximum hits to report for one sequence

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit

Input file format

pepcoil reads one or more protein sequences.

The input is a standard EMBOSS sequence query (also known as a 'USA').

Major sequence database sources defined as standard in EMBOSS installations include srs:embl, srs:uniprot and ensembl.

Data can also be read from sequence output in any supported format written by an EMBOSS or third-party application.

The input format can be specified by using the command-line qualifier -sformat xxx, where 'xxx' is replaced by the name of the required format.
The available format names are: gff (gff3), gff2, embl (em), genbank (gb, refseq), ddbj, refseqp, pir (nbrf), swissprot (swiss, sw), dasgff and debug.

See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further information on sequence formats.

Input files for usage example

'tsw:gcn4_yeast' is a sequence entry in the example protein database 'tsw'

Database entry: tsw:gcn4_yeast

ID   GCN4_YEAST              Reviewed;         281 AA.
AC   P03069; P03068; Q70D88; Q70D91; Q70D96; Q70D99; Q70DA0; Q96UT3;
DT   21-JUL-1986, integrated into UniProtKB/Swiss-Prot.
DT   21-JUL-1986, sequence version 1.
DT   15-JUN-2010, entry version 120.
DE   RecName: Full=General control protein GCN4;
DE   AltName: Full=Amino acid biosynthesis regulatory protein;
GN   Name=GCN4; Synonyms=AAS3, ARG9; OrderedLocusNames=YEL009C;
OS   Saccharomyces cerevisiae (Baker's yeast).
OC   Eukaryota; Fungi; Dikarya; Ascomycota; Saccharomyceta;
OC   Saccharomycotina; Saccharomycetes; Saccharomycetales;
OC   Saccharomycetaceae; Saccharomyces.
OX   NCBI_TaxID=4932;
RN   [1]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RX   MEDLINE=85038531; PubMed=6387704; DOI=10.1073/pnas.81.20.6442;
RA   Hinnebusch A.G.;
RT   "Evidence for translational regulation of the activator of general
RT   amino acid control in yeast.";
RL   Proc. Natl. Acad. Sci. U.S.A. 81:6442-6446(1984).
RN   [2]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA].
RX   MEDLINE=84298088; PubMed=6433345; DOI=10.1073/pnas.81.16.5096;
RA   Thireos G., Penn M.D., Greer H.;
RT   "5' untranslated sequences are required for the translational control
RT   of a yeast regulatory gene.";
RL   Proc. Natl. Acad. Sci. U.S.A. 81:5096-5100(1984).
RN   [3]
RP   NUCLEOTIDE SEQUENCE [GENOMIC DNA], AND VARIANTS PRO-24; SER-62;
RP   ALA-82; ALA-91; ALA-125 AND GLU-196.
RC   STRAIN=CLIB 219, CLIB 382, CLIB 388, CLIB 410, CLIB 413, CLIB 556,
RC   CLIB 630, CLIB 95, K1, R12, R13, Sigma 1278B, YIIc12, and YIIc17;
RX   PubMed=15087486; DOI=10.1093/nar/gkh529;
RA   Leh-Louis V., Wirth B., Despons L., Wain-Hobson S., Potier S.,
RA   Souciet J.-L.;
RT   "Differential evolution of the Saccharomyces cerevisiae DUP240
RT   paralogs and implication of recombination in phylogeny.";
RL   Nucleic Acids Res. 32:2069-2078(2004).
RN   [4]
RP   NUCLEOTIDE SEQUENCE [LARGE SCALE GENOMIC DNA].
RC   STRAIN=ATCC 204511 / S288c / AB972;
RX   MEDLINE=97313264; PubMed=9169868;
RA   Dietrich F.S., Mulligan J.T., Hennessy K.M., Yelton M.A., Allen E.,
RA   Araujo R., Aviles E., Berno A., Brennan T., Carpenter J., Chen E.,
RA   Cherry J.M., Chung E., Duncan M., Guzman E., Hartzell G.,
RA   Hunicke-Smith S., Hyman R.W., Kayser A., Komp C., Lashkari D., Lew H.,
RA   Lin D., Mosedale D., Nakahara K., Namath A., Norgren R., Oefner P.,
RA   Oh C., Petel F.X., Roberts D., Sehl P., Schramm S., Shogren T.,
RA   Smith V., Taylor P., Wei Y., Botstein D., Davis R.W.;
RT   "The nucleotide sequence of Saccharomyces cerevisiae chromosome V.";

  [Part of this file has been deleted for brevity]

FT                                /FTId=PRO_0000076490.
FT   DOMAIN      253    274       Leucine-zipper.
FT   DNA_BIND    231    249       Basic motif.
FT   REGION       89    100       Required for transcriptional activation.
FT   REGION      106    125       Required for transcriptional activation.
FT   MOD_RES      17     17       Phosphoserine.
FT   MOD_RES     165    165       Phosphothreonine; by PHO85.
FT   MOD_RES     218    218       Phosphoserine.
FT   VARIANT      24     24       S -> P (in strain: CLIB 219).
FT   VARIANT      62     62       P -> S (in strain: CLIB 630 haplotype
FT                                Ha2).
FT   VARIANT      82     82       T -> A (in strain: CLIB 556 haplotype
FT                                Ha1).
FT   VARIANT      91     91       D -> A (in strain: CLIB 95, CLIB 219,
FT                                CLIB 382, CLIB 388, CLIB 410, CLIB 413,
FT                                CLIB 556, CLIB 630, K1, R12, R13
FT                                haplotype Ha2, Sigma 1278B haplotype Ha1,
FT                                YIIc12 and YIIc17).
FT   VARIANT     125    125       D -> A (in strain: CLIB 556 haplotype
FT                                Ha1).
FT   VARIANT     196    196       D -> E (in strain: CLIB 388, CLIB 410,
FT                                CLIB 413, CLIB 630 haplotype Ha1, K1,
FT                                YIIc12 haplotype Ha2 and YIIc17 haplotype
FT                                Ha1).
FT   MUTAGEN      97     98       FF->AA: Reduces transcriptional
FT                                activation activity; when associated with
FT                                A-107; A-110; A-113; A-120; A-123 and A-
FT                                124.
FT   MUTAGEN     107    107       M->A: Reduces transcriptional activation
FT                                activity; when associated with A-97; A-
FT                                98; A-110; A-113; A-120; A-123 and A-124.
FT   MUTAGEN     110    110       Y->A: Reduces transcriptional activation
FT                                activity; when associated with A-97; A-
FT                                98; A-107; A-113; A-120; A-123 and A-124.
FT   MUTAGEN     113    113       L->A: Reduces transcriptional activation
FT                                activity; when associated with A-97; A-
FT                                98; A-107; A-110; A-120; A-123 and A-124.
FT   MUTAGEN     120    124       WTSLF->ATSAA: Reduces transcriptional
FT                                activation activity; when associated with
FT                                A-97; A-98; A-107; A-110 and A-113.
FT   CONFLICT    239    281       ARRSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVG
FT                                ER -> PGVLVRESCKE (in Ref. 2; AAA65521).
FT   HELIX       230    248
FT   HELIX       251    280
SQ   SEQUENCE   281 AA;  31310 MW;  2ED1B8E35D509578 CRC64;
     MSEYQPSLFA LNPMGFSPLD GSKSTNENVS ASTSTAKPMV GQLIFDKFIK TEEDPIIKQD
     TPSNLDFDFA LPQTATAPDA KTVLPIPELD DAVVESFFSS STDSTPMFEY ENLEDNSKEW
     TSLFDNDIPV TTDDVSLADK AIESTEEVSL VPSNLEVSTT SFLPTPVLED AKLTQTRKVK
     KPNSVVKKSH HVGKDDESRL DHLGVVAYNR KQRSIPLSPI VPESSDPAAL KRARNTEAAR
     RSRARKLQRM KQLEDKVEEL LSKNYHLENE VARLKKLVGE R
//

Output file format

The output is a standard EMBOSS report file.

The results can be output in one of several styles by using the command-line qualifier -rformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, dasgff, debug, listfile, dbmotif, diffseq, draw, restrict, excel, feattable, motif, nametable, regions, seqtable, simple, srs, table, tagseq.

See: http://emboss.sf.net/docs/themes/ReportFormats.html for further information on report formats.

By default the report is in 'motif' format.
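
For scripted use, the qualifiers documented above (-sequence, -window, -outfile, -rformat and -auto) can be assembled into a non-interactive command line. The following is a minimal Python sketch of that idea, not part of EMBOSS: the helper name build_pepcoil_command and the output file name are hypothetical, and the sketch only builds the argument list rather than running the program.

```python
import shlex


def build_pepcoil_command(sequence, outfile, window=28, rformat="motif"):
    """Return an argument list for a non-interactive pepcoil run.

    Hypothetical helper: only qualifiers listed in the pepcoil
    documentation are used.
    """
    # The documentation gives the allowed window size as 7 to 28.
    if not 7 <= window <= 28:
        raise ValueError("window must be an integer from 7 to 28")
    return [
        "pepcoil",
        "-sequence", sequence,
        "-window", str(window),
        "-outfile", outfile,
        "-rformat", rformat,
        "-auto",  # general qualifier: turn off prompts
    ]


# Example: request GFF output instead of the default 'motif' report.
cmd = build_pepcoil_command(
    "tsw:gcn4_yeast", "gcn4_yeast.gff", window=21, rformat="gff"
)
print(shlex.join(cmd))
```

On a machine where EMBOSS is installed and the 'tsw' test database is configured, the list could be passed to subprocess.run(cmd, check=True); that step is left out here because it depends on the local installation.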
Output files for usage example

The SwissProt annotation marks the true leucine zipper motif as from 253 to 274. The leucine zipper is a special case of a coiled-coil region.

File: gcn4_yeast.pepcoil

########################################
# Program: pepcoil
# Rundate: Fri 15 Jul 2011 12:00:00
# Commandline: pepcoil
#    -sequence tsw:gcn4_yeast
# Report_format: motif
# Report_file: gcn4_yeast.pepcoil
########################################

#=======================================
#
# Sequence: GCN4_YEAST     from: 1   to: 281
# HitCount: 1
#
# Window size: 28 residues
#
#=======================================

Max_coil_pos at "*"

(1) Score 1.910 length 49 at residues 233->281
                            *
 Sequence:  ARNTEAARRSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
            |                                               |
          233                                               281
 Probability: 1.000
 Predict: coiled
 Max_coil_pos: 249

#---------------------------------------
#---------------------------------------

Data files

None.

Notes

Coiled coils are formed by two or three alpha helices, in parallel and in register, that cross at an angle of approximately 20 degrees. The helices are strongly amphipathic and display a pattern of hydrophilic and hydrophobic residues that repeats every seven residues. The seven positions of the heptad repeat are designated a through g, a and d being generally hydrophobic, while the others are hydrophilic. The parallel two-stranded alpha-helical coiled coil is the most frequently encountered subunit-oligomerization motif in proteins.

References

1. Lupas A, van Dyke M & Stock J; Predicting Coiled Coils from Protein Sequences. Science 252:1162-4 (1991)

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with a status of 0.

Known bugs

None.
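
The 'motif' report shown in the usage example is regular enough to read with a short script. The sketch below is an illustration only: the function name parse_pepcoil_motif and the regular expressions are assumptions based on that single example report, so reports produced with other qualifiers (for example -other) or other -rformat values may need different handling.

```python
import re

# Matches hit header lines such as:
#   (1) Score 1.910 length 49 at residues 233->281
HIT_RE = re.compile(
    r"\((\d+)\)\s+Score\s+([\d.]+)\s+length\s+(\d+)\s+at residues\s+(\d+)->(\d+)"
)
# Matches the per-hit "name: value" lines seen in the example report.
FIELD_RE = re.compile(r"^\s*(Probability|Predict|Max_coil_pos):\s*(\S+)")


def parse_pepcoil_motif(text):
    """Return one dict per hit found in a pepcoil 'motif' report."""
    hits = []
    for line in text.splitlines():
        hit = HIT_RE.search(line)
        if hit:
            hits.append({
                "index": int(hit.group(1)),
                "score": float(hit.group(2)),
                "length": int(hit.group(3)),
                "start": int(hit.group(4)),
                "end": int(hit.group(5)),
            })
            continue
        field = FIELD_RE.match(line)
        if field and hits:
            # Attach the field to the most recent hit; values stay strings.
            hits[-1][field.group(1)] = field.group(2)
    return hits


# Hit block copied from the example report for GCN4_YEAST.
sample = """\
(1) Score 1.910 length 49 at residues 233->281
 Sequence:  ARNTEAARRSRARKLQRMKQLEDKVEELLSKNYHLENEVARLKKLVGER
 Probability: 1.000
 Predict: coiled
 Max_coil_pos: 249
"""

for hit in parse_pepcoil_motif(sample):
    print(hit["start"], hit["end"], hit["score"], hit["Predict"])
```

Keeping the header pattern and the field pattern separate means extra fields can be added to FIELD_RE without touching the hit detection.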

See also

Program name     Description
garnier          Predicts protein secondary structure using GOR method
helixturnhelix   Identify nucleic acid-binding motifs in protein sequences
pepnet           Draw a helical net for a protein sequence
pepwheel         Draw a helical wheel diagram for a protein sequence

Author(s)

Alan Bleasby
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Please report all bugs to the EMBOSS bug team (emboss-bug (c) emboss.open-bio.org) not to the original author.

Original program "PEPCOIL" by Peter Rice (EGCG 1991)

History

Written (1999) - Alan Bleasby

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None