antigenic Wiki The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki. Please help by correcting and extending the Wiki pages. Function Finds antigenic sites in proteins Description Antigenic predicts potentially antigenic regions of a protein sequence, using the method of Kolaskar and Tongaonkar. This method is based on a single parameter and thus very simple to use. Algorithm Analysis of data from experimentally determined antigenic sites on proteins has revealed that the hydrophobic residues Cys, Leu and Val, if they occur on the surface of a protein, are more likely to be a part of antigenic sites. The method of Kolaskar and Tongaonkar to predict antigenic determinants in proteins is semi-empirical and makes use of physicochemical properties of amino acid residues and their frequencies of occurrence in experimentally known segmental epitopes. Usage Here is a sample session with antigenic % antigenic Finds antigenic sites in proteins Input protein sequence(s): tsw:actb1_takru Minimum length of antigenic region [6]: Output report [actb1_takru.antigenic]: Go to the input files for this example Go to the output files for this example Example 2 By using the '-rformat gff' qualifier, a GFF file of the predicted regions can be produced. % antigenic -rformat gff Finds antigenic sites in proteins Input protein sequence(s): tsw:actb1_takru Minimum length of antigenic region [6]: Output report [actb1_takru.antigenic]: Go to the output files for this example Command line arguments Finds antigenic sites in proteins Version: EMBOSS:6.4.0.0 Standard (Mandatory) qualifiers: [-sequence] seqall Protein sequence(s) filename and optional format, or reference (input USA) -minlen integer [6] Minimum length of antigenic region (Integer from 1 to 50) [-outfile] report [*.antigenic] Output report file name (default -rformat motif) Additional (Optional) qualifiers: (none) Advanced (Unprompted) qualifiers: (none) Associated qualifiers: "-sequence" associated qualifiers -sbegin1 integer Start of each sequence to be used -send1 integer End of each sequence to be used -sreverse1 boolean Reverse (if DNA) -sask1 boolean Ask for begin/end/reverse -snucleotide1 boolean Sequence is nucleotide -sprotein1 boolean Sequence is protein -slower1 boolean Make lower case -supper1 boolean Make upper case -sformat1 string Input sequence format -sdbname1 string Database name -sid1 string Entryname -ufo1 string UFO features -fformat1 string Features format -fopenfile1 string Features file name "-outfile" associated qualifiers -rformat2 string Report format -rname2 string Base file name -rextension2 string File name extension -rdirectory2 string Output directory -raccshow2 boolean Show accession number in the report -rdesshow2 boolean Show description in the report -rscoreshow2 boolean Show the score in the report -rstrandshow2 boolean Show the nucleotide strand in the report -rusashow2 boolean Show the full USA in the report -rmaxall2 integer Maximum total hits to report -rmaxseq2 integer Maximum hits to report for one sequence General qualifiers: -auto boolean Turn off prompts -stdout boolean Write first file to standard output -filter boolean Read first file from standard input, write first file to standard output -options boolean Prompt for standard and additional values -debug boolean Write debug output to program.dbg -verbose boolean Report some/full command line options -help boolean Report command line options and exit. More information on associated and general qualifiers can be found with -help -verbose -warning boolean Report warnings -error boolean Report errors -fatal boolean Report fatal errors -die boolean Report dying program messages -version boolean Report version number and exit Input file format The input sequence can be one or more protein sequences. Input files for usage example 'tsw:actb1_takru' is a sequence entry in the example protein database 'tsw' Database entry: tsw:actb1_takru ID ACTB1_TAKRU Reviewed; 375 AA. AC P68142; P53484; DT 25-OCT-2004, integrated into UniProtKB/Swiss-Prot. DT 25-OCT-2004, sequence version 1. DT 02-MAR-2010, entry version 39. DE RecName: Full=Actin, cytoplasmic 1; DE AltName: Full=Beta-actin A; DE Contains: DE RecName: Full=Actin, cytoplasmic 1, N-terminally processed; GN Name=actba; OS Takifugu rubripes (Japanese pufferfish) (Fugu rubripes). OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; OC Actinopterygii; Neopterygii; Teleostei; Euteleostei; Neoteleostei; OC Acanthomorpha; Acanthopterygii; Percomorpha; Tetraodontiformes; OC Tetradontoidea; Tetraodontidae; Takifugu. OX NCBI_TaxID=31033; RN [1] RP NUCLEOTIDE SEQUENCE [GENOMIC DNA], AND TISSUE SPECIFICITY. RX MEDLINE=96275651; PubMed=8683572; DOI=10.1006/jmbi.1996.0347; RA Venkatesh B., Tay B.H., Elgar G., Brenner S.; RT "Isolation, characterization and evolution of nine pufferfish (Fugu RT rubripes) actin genes."; RL J. Mol. Biol. 259:655-665(1996). CC -!- FUNCTION: Actins are highly conserved proteins that are involved CC in various types of cell motility and are ubiquitously expressed CC in all eukaryotic cells. CC -!- SUBUNIT: Polymerization of globular actin (G-actin) leads to a CC structural filament (F-actin) in the form of a two-stranded helix. CC Each actin can bind to 4 others. CC -!- SUBCELLULAR LOCATION: Cytoplasm, cytoskeleton. CC -!- TISSUE SPECIFICITY: Widely distributed. Not expressed in skeletal CC muscle. CC -!- MISCELLANEOUS: In vertebrates 3 main groups of actin isoforms, CC alpha, beta and gamma have been identified. The alpha actins are CC found in muscle tissues and are a major constituent of the CC contractile apparatus. The beta and gamma actins coexist in most CC cell types as components of the cytoskeleton and as mediators of CC internal cell motility. CC -!- MISCELLANEOUS: There are three different beta-cytoplasmic actins CC in Fugu rubripes. CC -!- SIMILARITY: Belongs to the actin family. CC ----------------------------------------------------------------------- CC Copyrighted by the UniProt Consortium, see http://www.uniprot.org/terms CC Distributed under the Creative Commons Attribution-NoDerivs License CC ----------------------------------------------------------------------- DR EMBL; U37499; AAC59889.1; -; Genomic_DNA. DR PIR; S71124; S71124. DR SMR; P68142; 2-375. DR Ensembl; ENSTRUT00000013141; ENSTRUP00000013080; ENSTRUG00000005447; Takifu gu rubripes. DR eggNOG; fiNOG04759; -. DR InParanoid; P68142; -. DR PhylomeDB; P68142; -. DR GO; GO:0005737; C:cytoplasm; IEA:UniProtKB-KW. DR GO; GO:0005856; C:cytoskeleton; IEA:UniProtKB-SubCell. DR GO; GO:0005524; F:ATP binding; IEA:UniProtKB-KW. DR GO; GO:0005515; F:protein binding; IEA:InterPro. DR InterPro; IPR004000; Actin-like. DR InterPro; IPR020902; Actin/actin-like_CS. DR InterPro; IPR004001; Actin_CS. DR PANTHER; PTHR11937; Actin_like; 1. DR Pfam; PF00022; Actin; 1. DR PRINTS; PR00190; ACTIN. DR SMART; SM00268; ACTIN; 1. DR PROSITE; PS00406; ACTINS_1; 1. DR PROSITE; PS00432; ACTINS_2; 1. DR PROSITE; PS01132; ACTINS_ACT_LIKE; 1. PE 2: Evidence at transcript level; KW Acetylation; ATP-binding; Cytoplasm; Cytoskeleton; Methylation; KW Nucleotide-binding. FT CHAIN 1 375 Actin, cytoplasmic 1. FT /FTId=PRO_0000367094. FT INIT_MET 1 1 Removed; alternate (By similarity). FT CHAIN 2 375 Actin, cytoplasmic 1, N-terminally FT processed. FT /FTId=PRO_0000000809. FT MOD_RES 1 1 N-acetylmethionine; in Actin, cytoplasmic FT 1; alternate (By similarity). FT MOD_RES 2 2 N-acetylglutamate; in Actin, cytoplasmic FT 1, N-terminally processed (By FT similarity). FT MOD_RES 73 73 Tele-methylhistidine (By similarity). SQ SEQUENCE 375 AA; 41767 MW; 9C505616D33E9495 CRC64; MEDEIAALVV DNGSGMCKAG FAGDDAPRAV FPSIVGRPRH QGVMVGMGQK DSYVGDEAQS KRGILTLKYP IEHGIVTNWD DMEKIWHHTF YNELRVAPEE HPVLLTEAPL NPKANREKMT QIMFETFNTP AMYVAIQAVL SLYASGRTTG IVMDSGDGVT HTVPIYEGYA LPHAILRLDL AGRDLTDYLM KILTERGYSF TTTAEREIVR DIKEKLCYVA LDFEQEMGTA ASSSSLEKSY ELPDGQVITI GNERFRCPEA LFQPSFLGME SCGIHETTYN SIMKCDVDIR KDLYANTVLS GGTTMYPGIA DRMQKEITAL APSTMKIKII APPERKYSVW IGGSILASLS TFQQMWISKQ EYDESGPSIV HRKCF // Output file format The output is a standard EMBOSS report file. The results can be output in one of several styles by using the command-line qualifier -rformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, dasgff, debug, listfile, dbmotif, diffseq, draw, restrict, excel, feattable, motif, nametable, regions, seqtable, simple, srs, table, tagseq. See: http://emboss.sf.net/docs/themes/ReportFormats.html for further information on report formats. By default antigenic writes a 'motif' report file. Output files for usage example File: actb1_takru.antigenic ######################################## # Program: antigenic # Rundate: Fri 15 Jul 2011 12:00:00 # Commandline: antigenic # -sequence tsw:actb1_takru # Report_format: motif # Report_file: actb1_takru.antigenic ######################################## #======================================= # # Sequence: ACTB1_TAKRU from: 1 to: 375 # HitCount: 18 #======================================= Max_score_pos at "*" (1) Score 1.207 length 9 at residues 214->222 * Sequence: EKLCYVALD | | 214 222 Max_score_pos: 218 (2) Score 1.187 length 15 at residues 131->145 * Sequence: AMYVAIQAVLSLYAS | | 131 145 Max_score_pos: 137 (3) Score 1.166 length 8 at residues 5->12 * Sequence: IAALVVDN | | 5 12 Max_score_pos: 8 (4) Score 1.164 length 12 at residues 27->38 * Sequence: PRAVFPSIVGRP | | 27 38 Max_score_pos: 32 (5) Score 1.136 length 24 at residues 160->183 * Sequence: THTVPIYEGYALPHAILRLDLAGR | | 160 183 [Part of this file has been deleted for brevity] * Sequence: SSSSLEKSYELPDGQVITI | | 232 250 Max_score_pos: 245 (13) Score 1.083 length 6 at residues 327->332 * Sequence: IKIIAP | | 327 332 Max_score_pos: 330 (14) Score 1.074 length 7 at residues 317->323 * Sequence: ITALAPS | | 317 323 Max_score_pos: 320 (15) Score 1.068 length 7 at residues 186->192 * Sequence: TDYLMKI | | 186 192 Max_score_pos: 191 (16) Score 1.066 length 7 at residues 40->46 * Sequence: HQGVMVG | | 40 46 Max_score_pos: 43 (17) Score 1.045 length 7 at residues 269->275 * Sequence: MESCGIH | | 269 275 Max_score_pos: 269 (18) Score 1.034 length 7 at residues 51->57 * Sequence: DSYVGDE | | 51 57 Max_score_pos: 52 #--------------------------------------- #--------------------------------------- Output files for usage example 2 File: actb1_takru.antigenic ##gff-version 3 ##sequence-region ACTB1_TAKRU 1 375 #!Date 2011-07-15 #!Type Protein #!Source-version EMBOSS 6.4.0.0 ACTB1_TAKRU antigenic epitope 214 222 1.207 + . ID=ACTB1_TAKRU.1;note=*pos 218 ACTB1_TAKRU antigenic epitope 131 145 1.187 + . ID=ACTB1_TAKRU.2;note=*pos 137 ACTB1_TAKRU antigenic epitope 5 12 1.166 + . ID=ACTB1_TAKRU.3;note=*pos 8 ACTB1_TAKRU antigenic epitope 27 38 1.164 + . ID=ACTB1_TAKRU.4;note=*pos 32 ACTB1_TAKRU antigenic epitope 160 183 1.136 + . ID=ACTB1_TAKRU.5;note=*pos 173 ACTB1_TAKRU antigenic epitope 367 372 1.135 + . ID=ACTB1_TAKRU.6;note=*pos 372 ACTB1_TAKRU antigenic epitope 93 108 1.116 + . ID=ACTB1_TAKRU.7;note=*pos 103 ACTB1_TAKRU antigenic epitope 295 301 1.113 + . ID=ACTB1_TAKRU.8;note=*pos 296 ACTB1_TAKRU antigenic epitope 256 266 1.110 + . ID=ACTB1_TAKRU.9;note=*pos 264 ACTB1_TAKRU antigenic epitope 336 352 1.107 + . ID=ACTB1_TAKRU.10;note=*pos 347 ACTB1_TAKRU antigenic epitope 62 76 1.102 + . ID=ACTB1_TAKRU.11;note=*pos 68 ACTB1_TAKRU antigenic epitope 232 250 1.086 + . ID=ACTB1_TAKRU.12;note=*pos 245 ACTB1_TAKRU antigenic epitope 327 332 1.083 + . ID=ACTB1_TAKRU.13;note=*pos 330 ACTB1_TAKRU antigenic epitope 317 323 1.074 + . ID=ACTB1_TAKRU.14;note=*pos 320 ACTB1_TAKRU antigenic epitope 186 192 1.068 + . ID=ACTB1_TAKRU.15;note=*pos 191 ACTB1_TAKRU antigenic epitope 40 46 1.066 + . ID=ACTB1_TAKRU.16;note=*pos 43 ACTB1_TAKRU antigenic epitope 269 275 1.045 + . ID=ACTB1_TAKRU.17;note=*pos 269 ACTB1_TAKRU antigenic epitope 51 57 1.034 + . ID=ACTB1_TAKRU.18;note=*pos 52 Data files Antigenic uses a data file called Eantigenic.dat. EMBOSS data files are distributed with the application and stored in the standard EMBOSS data directory, which is defined by the EMBOSS environment variable EMBOSS_DATA. To see the available EMBOSS data files, run: % embossdata -showall To fetch one of the data files (for example 'Exxx.dat') into your current directory for you to inspect or modify, run: % embossdata -fetch -file Exxx.dat Users can provide their own data files in their own directories. Project specific files can be put in the current directory, or for tidier directory listings in a subdirectory called ".embossdata". Files for all EMBOSS runs can be put in the user's home directory, or again in a subdirectory called ".embossdata". The directories are searched in the following order: * . (your current directory) * .embossdata (under your current directory) * ~/ (your home directory) * ~/.embossdata Here is the default Eantigenic.dat file: # Kolaskar AS and Tongaonkar PC (1990) FEBS Letters 276:172-174 # "A semi-emipirical method for prediction of antigenic determinants # on protein antigens" # # TABLE 1: Occurrence of amino acids in epitopes, proteins and on the surface, # and their antigenic propensity, A(p), values # # 169 antigenic determinants experimentally determined. Selected those 156 # which have less than 20 amino acids per determinant (total 2066 residues). # Calculated f(Ag) as frequency of occurrence of each residue in antigenic # determinants [f(Ag) = Epitope_occurrence/2066]. # # Used Hydrophilicity, Accessibility and Flexibility of Parker JMR, Guo D, # Hodges, RS (1986) Biochemistry 25:5425-5432. In a given protein, calculated # average for each 7-mer and assigned values to central residue of 7-mer. # Residue considered to be on the surface if any of the 7-mer values was above # the average for the protein. Used these results to get f(s) frequency of # occurrence of amino acids at the surface. # # Original table covers the 20 naturally occurring amino acids. # Values for B, Z, X use weighted averages from Edayhoff.dat # and are ignored when calculating totals # # Antigenic propensity column A(p) = f(Ag)/f(s) # # f(s) values below were back-calculated from the table in the paper # # Prediction algorithm: # # 1. calculate average propensity for each overlapping 7-mer, assign to # central residue (i+3) # # 2. calculate average for whole protein A(p)av # # 3. (a) if average for whole protein >= 1.0 then all residues having # >= 1.0 are potentially antigenic. # (b) if average for whole protein < 1.0 then all residues having # > average for whole protein (??? paper has a mangled # formula here :-) are potentially antigenic. # # 4. Find 6-mers where all residues are selected by step 3 above # # Antigenic Surface Antigenic # Amino -- Occurrence of amino acids in -- frequency frequency propensity # Acid Epitopes Surface Protein f(Ag) f(s) A(p) A 135 328 524 0.065 0.061 1.064 B 107 334 410 0.052 0.062 0.827 C 53 97 186 0.026 0.018 1.412 D 118 352 414 0.057 0.066 0.866 E 132 401 499 0.064 0.075 0.851 F 76 180 365 0.037 0.034 1.091 G 116 343 487 0.056 0.064 0.874 H 59 138 191 0.029 0.026 1.105 I 86 193 437 0.042 0.036 1.152 K 158 439 523 0.076 0.082 0.930 L 149 308 684 0.072 0.058 1.250 M 23 72 152 0.011 0.013 0.826 N 94 313 407 0.045 0.058 0.776 P 135 328 411 0.065 0.061 1.064 Q 99 252 332 0.048 0.047 1.015 R 106 314 394 0.051 0.058 0.873 S 168 429 553 0.081 0.080 1.012 T 141 401 522 0.068 0.075 0.909 V 128 239 515 0.062 0.045 1.383 W 19 55 103 0.009 0.010 0.893 X 118 306 453 0.057 0.057 1.025 Y 71 158 245 0.034 0.029 1.161 Z 119 342 433 0.058 0.064 0.916 Total 2066 5340 7944 Notes Application of this method to a large number of proteins has shown that their method can predict antigenic determinants with about 75% accuracy which is better than most of the known methods. References 1. Kolaskar,AS and Tongaonkar,PC (1990). A semi-empirical method for prediction of antigenic determinants on protein antigens. FEBS Letters 276: 172-174. 2. Parker,JMR, Guo,D and Hodges,RS (1986). Biochemistry 25: 5425-5432. Warnings The program will warn you if the sequence is not a protein or has ambiguity codes. Diagnostic Error Messages Exit status It exits with status 0. Known bugs None. See also Program name Description epestfind Finds PEST motifs as potential proteolytic cleavage sites fuzzpro Search for patterns in protein sequences fuzztran Search for patterns in protein sequences (translated) patmatdb Searches protein sequences with a sequence motif patmatmotifs Scan a protein sequence with motifs from the PROSITE database preg Regular expression search of protein sequence(s) pscan Scans protein sequence(s) with fingerprints from the PRINTS database sigcleave Reports on signal cleavage sites in a protein sequence Author(s) Alan Bleasby European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK Please report all bugs to the EMBOSS bug team (emboss-bug (c) emboss.open-bio.org) not to the original author. Original program "ANTIGENIC" by Peter Rice (EGCG 1991) Peter Rice European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK Please report all bugs to the EMBOSS bug team (emboss-bug (c) emboss.open-bio.org) not to the original author. History Completed 9th March 1999 Target users This program is intended to be used by everyone and everything, from naive users to embedded scripts. Comments None