sirna

Wiki

The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

Please help by correcting and extending the Wiki pages.

Function

Finds siRNA duplexes in mRNA

Description

Finds siRNA duplexes in mRNA. The output is a standard EMBOSS report file. The siRNAs are reported in order of best score first. sirna reports both the sense and antisense siRNAs as 5' to 3'.

Algorithm

  for each input sequence:
    find the start position of the CDS in the feature table
    if there is no such CDS, take the -sbegin position as the CDS start
    for each 23 base window along the sequence:
      set the score for this window = 0
      if base 2 of the window is not 'a': ignore this window
      if the window is within 50 bases of the CDS start: ignore this window
      if the window is within 100 bases of the CDS start: score = -2
      measure the %GC of the 20 bases from position 2 to 21 of the window
      for the following %GC values change the score:
        %GC <= 25% (<= 5 bases): ignore this window
        %GC 30% (6 bases): score + 0
        %GC 35% (7 bases): score + 2
        %GC 40% (8 bases): score + 4
        %GC 45% (9 bases): score + 5
        %GC 50% (10 bases): score + 6
        %GC 55% (11 bases): score + 5
        %GC 60% (12 bases): score + 4
        %GC 65% (13 bases): score + 2
        %GC 70% (14 bases): score + 0
        %GC >= 75% (>= 15 bases): ignore this window
      if the window starts with 'AA': score + 3
      if the window does not start with 'AA' and it is required: ignore this window
      if the window ends with 'TT': score + 1
      if the window does not end with 'TT' and it is required: ignore this window
      if 4 G's in a row are found: ignore this window
      if any 4 identical bases in a row are present and this is not allowed (-nopolybase): ignore this window
      if PolIII probes are required and the window is not NARN(17)YNN: ignore this window
      if the score is > 0: store this window for output
    sort the windows found by their score
    output the 23-base windows to the sequence file
    for each stored window:
      if the 'context' qualifier is specified, output window bases 1 and 2 in brackets to the report file
      take the window bases 3 to 21, add 'dTdT', output to the report file
      take the window bases 3 to 21, reverse complement, add 'dTdT', output to the report file

Usage

Here is a sample session with sirna

  % sirna
  Finds siRNA duplexes in mRNA
  Input nucleotide sequence(s): tembl:x65923
  Output report [x65923.sirna]:
  output sequence(s) [x65923.fasta]:

Example 2

Show the first two bases of the 23 base target region in brackets. These do not form part of the sequence to be ordered, but it is useful to see if the 23 base region starts with an 'AA'.

  % sirna -context
  Finds siRNA duplexes in mRNA
  Input nucleotide sequence(s): tembl:x65923
  Output report [x65923.sirna]:
  output sequence(s) [x65923.fasta]:

Command line arguments

Finds siRNA duplexes in mRNA
Version: EMBOSS:6.4.0.0

   Standard (Mandatory) qualifiers:
  [-sequence]          seqall     Nucleotide sequence(s) filename and optional
                                  format, or reference (input USA)
  [-outfile]           report     [*.sirna] The output is a table of the
                                  forward and reverse parts of the 21 base
                                  siRNA duplex. Both the forward and reverse
                                  sequences are written 5' to 3', ready to be
                                  ordered. The last two bases have been
                                  replaced by 'dTdT'. The starting position of
                                  the 23 base region and the %GC content is
                                  also given. If you wish to see the complete
                                  23 base sequence, then either look at the
                                  sequence in the other output file, or use
                                  the qualifier '-context' which will display
                                  the 23 bases of the forward sequence in this
                                  report with the first two bases in brackets.
                                  These first two bases do not form part of
                                  the siRNA probe to be ordered. (default
                                  -rformat table)
  [-outseq]            seqoutall  [.] This is a file of the sequences of the
                                  23 base regions that the siRNAs are selected
                                  from. You may use it to do searches of mRNA
                                  databases (e.g. REFSEQ) to confirm that the
                                  probes are unique to the gene you wish to
                                  use it on.

   Additional (Optional) qualifiers:
   -poliii             boolean    [N] This option allows you to select only
                                  the 21 base probes that start with a purine
                                  and so can be expressed from Pol III
                                  expression vectors. This is the NARN(17)YNN
                                  pattern that has been suggested by Tuschl et
                                  al.
   -aa                 boolean    [N] This option allows you to select only
                                  those 23 base regions that start with AA. If
                                  this option is not selected then regions
                                  that start with AA will be favoured by
                                  giving them a higher score, but regions that
                                  do not start with AA will also be reported.
   -tt                 boolean    [N] This option allows you to select only
                                  those 23 base regions that end with TT. If
                                  this option is not selected then regions
                                  that end with TT will be favoured by giving
                                  them a higher score, but regions that do not
                                  end with TT will also be reported.
   -[no]polybase       boolean    [Y] If this option is FALSE then only those
                                  23 base regions that have no repeat of 4 or
                                  more of any bases in a row will be reported.
                                  No regions will ever be reported that have 4
                                  or more G's in a row.
   -context            boolean    [N] The output report file gives the
                                  sequences of the 21 base siRNA regions ready
                                  to be ordered. This does not give you an
                                  indication of the 2 bases before the 21
                                  bases. It is often interesting to see which
                                  of the suggested possible probe regions have
                                  an 'AA' in front of them (i.e. it is useful
                                  to see which of the 23 base regions start
                                  with an 'AA'). This option displays the
                                  whole 23 bases of the region with the first
                                  two bases in brackets, e.g. '(AA)' to give
                                  you some context for the probe region. YOU
                                  SHOULD NOT INCLUDE THE TWO BASES IN BRACKETS
                                  WHEN YOU PLACE AN ORDER FOR THE PROBES.

   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-sequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-outfile" associated qualifiers
   -rformat2           string     Report format
   -rname2             string     Base file name
   -rextension2        string     File name extension
   -rdirectory2        string     Output directory
   -raccshow2          boolean    Show accession number in the report
   -rdesshow2          boolean    Show description in the report
   -rscoreshow2        boolean    Show the score in the report
   -rstrandshow2       boolean    Show the nucleotide strand in the report
   -rusashow2          boolean    Show the full USA in the report
   -rmaxall2           integer    Maximum total hits to report
   -rmaxseq2           integer    Maximum hits to report for one sequence

   "-outseq" associated qualifiers
   -osformat3          string     Output seq format
   -osextension3       string     File name extension
   -osname3            string     Base file name
   -osdirectory3       string     Output directory
   -osdbname3          string     Database name to add
   -ossingle3          boolean    Separate file for each entry
   -oufo3              string     UFO features
   -offormat3          string     Features format
   -ofname3            string     Features file name
   -ofdirectory3       string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit.
                                  More information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit

Input file format

The input is a standard EMBOSS sequence query (also known as a 'USA').

Major sequence database sources defined as standard in EMBOSS installations include srs:embl, srs:uniprot and ensembl.

Data can also be read from sequence output in any supported format written by an EMBOSS or third-party application.

The input format can be specified by using the command-line qualifier -sformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: gff (gff3), gff2, embl (em), genbank (gb, refseq), ddbj, refseqp, pir (nbrf), swissprot (swiss, sw), dasgff and debug.

See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further information on sequence formats.

Input files for usage example

'tembl:x65923' is a sequence entry in the example nucleic acid database 'tembl'

Database entry: tembl:x65923

ID   X65923; SV 1; linear; mRNA; STD; HUM; 518 BP.
XX
AC   X65923;
XX
DT   13-MAY-1992 (Rel. 31, Created)
DT   18-APR-2005 (Rel. 83, Last updated, Version 11)
XX
DE   H.sapiens fau mRNA
XX
KW   fau gene.
XX
OS   Homo sapiens (human)
OC   Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia;
OC   Eutheria; Euarchontoglires; Primates; Haplorrhini; Catarrhini; Hominidae;
OC   Homo.
XX
RN   [1]
RP   1-518
RA   Michiels L.M.R.;
RT   ;
RL   Submitted (29-APR-1992) to the EMBL/GenBank/DDBJ databases.
RL   L.M.R. Michiels, University of Antwerp, Dept of Biochemistry,
RL   Universiteisplein 1, 2610 Wilrijk, BELGIUM
XX
RN   [2]
RP   1-518
RX   PUBMED; 8395683.
RA   Michiels L., Van der Rauwelaert E., Van Hasselt F., Kas K., Merregaert J.;
RT   "fau cDNA encodes a ubiquitin-like-S30 fusion protein and is expressed as
RT   an antisense sequence in the Finkel-Biskis-Reilly murine sarcoma virus";
RL   Oncogene 8(9):2537-2546(1993).
XX
DR   H-InvDB; HIT000322806.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..518
FT                   /organism="Homo sapiens"
FT                   /chromosome="11q"
FT                   /map="13"
FT                   /mol_type="mRNA"
FT                   /clone_lib="cDNA"
FT                   /clone="pUIA 631"
FT                   /tissue_type="placenta"
FT                   /db_xref="taxon:9606"
FT   misc_feature    57..278
FT                   /note="ubiquitin like part"
FT   CDS             57..458
FT                   /gene="fau"
FT                   /db_xref="GDB:135476"
FT                   /db_xref="GOA:P35544"
FT                   /db_xref="GOA:P62861"
FT                   /db_xref="HGNC:3597"
FT                   /db_xref="InterPro:IPR000626"
FT                   /db_xref="InterPro:IPR006846"
FT                   /db_xref="InterPro:IPR019954"
FT                   /db_xref="InterPro:IPR019955"
FT                   /db_xref="InterPro:IPR019956"
FT                   /db_xref="UniProtKB/Swiss-Prot:P35544"
FT                   /db_xref="UniProtKB/Swiss-Prot:P62861"
FT                   /protein_id="CAA46716.1"
FT                   /translation="MQLFVRAQELHTFEVTGQETVAQIKAHVASLEGIAPEDQVVLLAG
FT                   APLEDEATLGQCGVEALTTLEVAGRMLGGKVHGSLARAGKVRGQTPKVAKQEKKKKKTG
FT                   RAKRRMQYNRRFVNVVPTFGKKKGPNANS"
FT   misc_feature    98..102
FT                   /note="nucleolar localization signal"
FT   misc_feature    279..458
FT                   /note="S30 part"
FT   polyA_signal    484..489
FT   polyA_site      509
XX
SQ   Sequence 518 BP; 125 A; 139 C; 148 G; 106 T; 0 other;
     ttcctctttc tcgactccat cttcgcggta gctgggaccg ccgttcagtc gccaatatgc        60
     agctctttgt ccgcgcccag gagctacaca ccttcgaggt gaccggccag gaaacggtcg       120
     cccagatcaa ggctcatgta gcctcactgg agggcattgc cccggaagat caagtcgtgc       180
     tcctggcagg cgcgcccctg gaggatgagg ccactctggg ccagtgcggg gtggaggccc       240
     tgactaccct ggaagtagca ggccgcatgc ttggaggtaa agttcatggt tccctggccc       300
     gtgctggaaa agtgagaggt cagactccta aggtggccaa acaggagaag aagaagaaga       360
     agacaggtcg ggctaagcgg cggatgcagt acaaccggcg ctttgtcaac gttgtgccca       420
     cctttggcaa gaagaagggc cccaatgcca actcttaagt cttttgtaat tctggctttc       480
     tctaataaaa aagccactta gttcagtcaa aaaaaaaa                               518
//

Output file format

The output is a
standard EMBOSS report file.

The results can be output in one of several styles by using the command-line qualifier -rformat xxx, where 'xxx' is replaced by the name of the required format. The available format names are: embl, genbank, gff, pir, swiss, dasgff, debug, listfile, dbmotif, diffseq, draw, restrict, excel, feattable, motif, nametable, regions, seqtable, simple, srs, table, tagseq.

See: http://emboss.sf.net/docs/themes/ReportFormats.html for further information on report formats.

sirna outputs a report format file. The default format is 'table'.

Output files for usage example

File: x65923.sirna

########################################
# Program: sirna
# Rundate: Fri 15 Jul 2011 12:00:00
# Commandline: sirna
#    -sequence tembl:x65923
# Report_format: table
# Report_file: x65923.sirna
########################################

#=======================================
#
# Sequence: X65923     from: 1   to: 518
# HitCount: 85
#
# CDS region found in feature table starting at 57
#
#=======================================

  Start     End  Strand   Score    GC% Sense_siRNA             Antisense_siRNA
    308     330       +   9.000   50.0 AAGUGAGAGGUCAGACUCCdTdT GGAGUCUGACCUCUCACUUdTdT
    309     331       +   9.000   50.0 AGUGAGAGGUCAGACUCCUdTdT AGGAGUCUGACCUCUCACUdTdT
    310     332       +   9.000   50.0 GUGAGAGGUCAGACUCCUAdTdT UAGGAGUCUGACCUCUCACdTdT
    351     373       +   9.000   50.0 GAAGAAGAAGACAGGUCGGdTdT CCGACCUGUCUUCUUCUUCdTdT
    166     188       +   8.000   55.0 GAUCAAGUCGUGCUCCUGGdTdT CCAGGAGCACGACUUGAUCdTdT
    279     301       +   8.000   55.0 AGUUCAUGGUUCCCUGGCCdTdT GGCCAGGGAACCAUGAACUdTdT
    330     352       +   8.000   55.0 GGUGGCCAAACAGGAGAAGdTdT CUUCUCCUGUUUGGCCACCdTdT
    354     376       +   8.000   55.0 GAAGAAGACAGGUCGGGCUdTdT AGCCCGACCUGUCUUCUUCdTdT
    357     379       +   8.000   55.0 GAAGACAGGUCGGGCUAAGdTdT CUUAGCCCGACCUGUCUUCdTdT
    393     415       +   8.000   55.0 CCGGCGCUUUGUCAACGUUdTdT AACGUUGACAAAGCGCCGGdTdT
    253     275       +   7.000   60.0 GUAGCAGGCCGCAUGCUUGdTdT CAAGCAUGCGGCCUGCUACdTdT
    280     302       +   7.000   60.0 GUUCAUGGUUCCCUGGCCCdTdT GGGCCAGGGAACCAUGAACdTdT
    339     361       +   7.000   40.0 ACAGGAGAAGAAGAAGAAGdTdT CUUCUUCUUCUUCUCCUGUdTdT
    340     362       +   7.000   40.0 CAGGAGAAGAAGAAGAAGAdTdT UCUUCUUCUUCUUCUCCUGdTdT
    348     370       +   7.000   40.0 GAAGAAGAAGAAGACAGGUdTdT ACCUGUCUUCUUCUUCUUCdTdT
    375     397       +   7.000   60.0 GCGGCGGAUGCAGUACAACdTdT GUUGUACUGCAUCCGCCGCdTdT
    408     430       +   7.000   60.0 CGUUGUGCCCACCUUUGGCdTdT GCCAAAGGUGGGCACAACGdTdT
    429     451       +   7.000   60.0 GAAGAAGGGCCCCAAUGCCdTdT GGCAUUGGGGCCCUUCUUCdTdT
    432     454       +   7.000   60.0 GAAGGGCCCCAAUGCCAACdTdT GUUGGCAUUGGGGCCCUUCdTdT
    435     457       +   7.000   60.0 GGGCCCCAAUGCCAACUCUdTdT AGAGUUGGCAUUGGGGCCCdTdT
    488     510       +   7.000   40.0 AAAGCCACUUAGUUCAGUCdTdT GACUGAACUAAGUGGCUUUdTdT
    489     511       +   7.000   40.0 AAGCCACUUAGUUCAGUCAdTdT UGACUGAACUAAGUGGCUUdTdT
    490     512       +   7.000   40.0 AGCCACUUAGUUCAGUCAAdTdT UUGACUGAACUAAGUGGCUdTdT
    491     513       +   7.000   40.0 GCCACUUAGUUCAGUCAAAdTdT UUUGACUGAACUAAGUGGCdTdT
    129     151       +   6.000   55.0 GGCUCAUGUAGCCUCACUGdTdT CAGUGAGGCUACAUGAGCCdTdT
    165     187       +   6.000   50.0 AGAUCAAGUCGUGCUCCUGdTdT CAGGAGCACGACUUGAUCUdTdT
    278     300       +   6.000   50.0 AAGUUCAUGGUUCCCUGGCdTdT GCCAGGGAACCAUGAACUUdTdT
    314     336       +   6.000   50.0 GAGGUCAGACUCCUAAGGUdTdT ACCUUAGGAGUCUGACCUCdTdT
    321     343       +   6.000   50.0 GACUCCUAAGGUGGCCAAAdTdT UUUGGCCACCUUAGGAGUCdTdT
    323     345       +   6.000   50.0 CUCCUAAGGUGGCCAAACAdTdT UGUUUGGCCACCUUAGGAGdTdT
    329     351       +   6.000   50.0 AGGUGGCCAAACAGGAGAAdTdT UUCUCCUGUUUGGCCACCUdTdT

  [Part of this file has been deleted for brevity]

    374     396       +   5.000   55.0 AGCGGCGGAUGCAGUACAAdTdT UUGUACUGCAUCCGCCGCUdTdT
    383     405       +   5.000   55.0 UGCAGUACAACCGGCGCUUdTdT AAGCGCCGGUUGUACUGCAdTdT
    387     409       +   5.000   55.0 GUACAACCGGCGCUUUGUCdTdT GACAAAGCGCCGGUUGUACdTdT
    390     412       +   5.000   55.0 CAACCGGCGCUUUGUCAACdTdT GUUGACAAAGCGCCGGUUGdTdT
    392     414       +   5.000   55.0 ACCGGCGCUUUGUCAACGUdTdT ACGUUGACAAAGCGCCGGUdTdT
    407     429       +   5.000   55.0 ACGUUGUGCCCACCUUUGGdTdT CCAAAGGUGGGCACAACGUdTdT
    428     450       +   5.000   55.0 AGAAGAAGGGCCCCAAUGCdTdT GCAUUGGGGCCCUUCUUCUdTdT
    431     453       +   5.000   55.0 AGAAGGGCCCCAAUGCCAAdTdT UUGGCAUUGGGGCCCUUCUdTdT
    434     456       +   5.000   60.0 AGGGCCCCAAUGCCAACUCdTdT GAGUUGGCAUUGGGGCCCUdTdT
    444     466       +   5.000   35.0 UGCCAACUCUUAAGUCUUUdTdT AAAGACUUAAGAGUUGGCAdTdT
    487     509       +   5.000   35.0 AAAAGCCACUUAGUUCAGUdTdT ACUGAACUAAGUGGCUUUUdTdT
    123     145       +   4.000   50.0 GAUCAAGGCUCAUGUAGCCdTdT GGCUACAUGAGCCUUGAUCdTdT
    125     147       +   4.000   50.0 UCAAGGCUCAUGUAGCCUCdTdT GAGGCUACAUGAGCCUUGAdTdT
    128     150       +   4.000   50.0 AGGCUCAUGUAGCCUCACUdTdT AGUGAGGCUACAUGAGCCUdTdT
    155     177       +   4.000   50.0 UUGCCCCGGAAGAUCAAGUdTdT ACUUGAUCUUCCGGGGCAAdTdT
    234     256       +   4.000   60.0 GGCCCUGACUACCCUGGAAdTdT UUCCAGGGUAGUCAGGGCCdTdT
    259     281       +   4.000   60.0 GGCCGCAUGCUUGGAGGUAdTdT UACCUCCAAGCAUGCGGCCdTdT
    266     288       +   4.000   40.0 UGCUUGGAGGUAAAGUUCAdTdT UGAACUUUACCUCCAAGCAdTdT
    342     364       +   4.000   40.0 GGAGAAGAAGAAGAAGAAGdTdT CUUCUUCUUCUUCUUCUCCdTdT
    347     369       +   4.000   40.0 AGAAGAAGAAGAAGACAGGdTdT CCUGUCUUCUUCUUCUUCUdTdT
    359     381       +   4.000   60.0 AGACAGGUCGGGCUAAGCGdTdT CGCUUAGCCCGACCUGUCUdTdT
    111     133       +   3.000   55.0 AACGGUCGCCCAGAUCAAGdTdT CUUGAUCUGGGCGACCGUUdTdT
    113     135       +   3.000   65.0 CGGUCGCCCAGAUCAAGGCdTdT GCCUUGAUCUGGGCGACCGdTdT
    172     194       +   3.000   70.0 GUCGUGCUCCUGGCAGGCGdTdT CGCCUGCCAGGAGCACGACdTdT
    443     465       +   3.000   35.0 AUGCCAACUCUUAAGUCUUdTdT AAGACUUAAGAGUUGGCAUdTdT
    456     478       +   3.000   35.0 AGUCUUUUGUAAUUCUGGCdTdT GCCAGAAUUACAAAAGACUdTdT
    468     490       +   3.000   30.0 UUCUGGCUUUCUCUAAUAAdTdT UUAUUAGAGAAAGCCAGAAdTdT
    484     506       +   3.000   30.0 UAAAAAAGCCACUUAGUUCdTdT GAACUAAGUGGCUUUUUUAdTdT
    108     130       +   2.000   60.0 GGAAACGGUCGCCCAGAUCdTdT GAUCUGGGCGACCGUUUCCdTdT
    135     157       +   2.000   60.0 UGUAGCCUCACUGGAGGGCdTdT GCCCUCCAGUGAGGCUACAdTdT
    139     161       +   2.000   60.0 GCCUCACUGGAGGGCAUUGdTdT CAAUGCCCUCCAGUGAGGCdTdT
    150     172       +   2.000   60.0 GGGCAUUGCCCCGGAAGAUdTdT AUCUUCCGGGGCAAUGCCCdTdT
    171     193       +   2.000   65.0 AGUCGUGCUCCUGGCAGGCdTdT GCCUGCCAGGAGCACGACUdTdT
    201     223       +   2.000   65.0 GGAUGAGGCCACUCUGGGCdTdT GCCCAGAGUGGCCUCAUCCdTdT
    204     226       +   2.000   65.0 UGAGGCCACUCUGGGCCAGdTdT CUGGCCCAGAGUGGCCUCAdTdT
    245     267       +   2.000   65.0 CCCUGGAAGUAGCAGGCCGdTdT CGGCCUGCUACUUCCAGGGdTdT
    256     278       +   2.000   65.0 GCAGGCCGCAUGCUUGGAGdTdT CUCCAAGCAUGCGGCCUGCdTdT
    285     307       +   2.000   65.0 UGGUUCCCUGGCCCGUGCUdTdT AGCACGGGCCAGGGAACCAdTdT
    338     360       +   2.000   35.0 AACAGGAGAAGAAGAAGAAdTdT UUCUUCUUCUUCUCCUGUUdTdT
    345     367       +   2.000   35.0 GAAGAAGAAGAAGAAGACAdTdT UGUCUUCUUCUUCUUCUUCdTdT
    486     508       +   2.000   35.0 AAAAAGCCACUUAGUUCAGdTdT CUGAACUAAGUGGCUUUUUdTdT

#---------------------------------------
#---------------------------------------
#---------------------------------------
# Total_sequences: 1
# Total_length: 518
# Reported_sequences: 1
# Reported_hitcount: 85
#---------------------------------------

File: x65923.fasta

>X65923_308 %GC 50.0 Score 9 H.sapiens fau mRNA
aaaagtgagaggtcagactccta
>X65923_309 %GC 50.0 Score 9 H.sapiens fau mRNA
aaagtgagaggtcagactcctaa
>X65923_310 %GC 50.0 Score 9 H.sapiens fau mRNA
aagtgagaggtcagactcctaag
>X65923_351 %GC 50.0 Score 9 H.sapiens fau mRNA
aagaagaagaagacaggtcgggc
>X65923_166 %GC 55.0 Score 8 H.sapiens fau mRNA
aagatcaagtcgtgctcctggca
>X65923_279 %GC 55.0 Score 8 H.sapiens fau mRNA
aaagttcatggttccctggcccg
>X65923_330 %GC 55.0 Score 8 H.sapiens fau mRNA
aaggtggccaaacaggagaagaa
>X65923_354 %GC 55.0 Score 8 H.sapiens fau mRNA
aagaagaagacaggtcgggctaa
>X65923_357 %GC 55.0 Score 8 H.sapiens fau mRNA
aagaagacaggtcgggctaagcg
>X65923_393 %GC 55.0 Score 8 H.sapiens fau mRNA
aaccggcgctttgtcaacgttgt
>X65923_253 %GC 60.0 Score 7 H.sapiens fau mRNA
aagtagcaggccgcatgcttgga
>X65923_280 %GC 60.0 Score 7 H.sapiens fau mRNA
aagttcatggttccctggcccgt
>X65923_339 %GC 40.0 Score 7 H.sapiens fau mRNA
aaacaggagaagaagaagaagaa
>X65923_340 %GC 40.0 Score 7 H.sapiens fau mRNA
aacaggagaagaagaagaagaag
>X65923_348 %GC 40.0 Score 7 H.sapiens fau mRNA
aagaagaagaagaagacaggtcg
>X65923_375 %GC 60.0 Score 7 H.sapiens fau mRNA
aagcggcggatgcagtacaaccg
>X65923_408 %GC 60.0 Score 7 H.sapiens fau mRNA
aacgttgtgcccacctttggcaa
>X65923_429 %GC 60.0 Score 7 H.sapiens fau mRNA
aagaagaagggccccaatgccaa
>X65923_432 %GC 60.0 Score 7 H.sapiens fau mRNA
aagaagggccccaatgccaactc
>X65923_435 %GC 60.0 Score 7 H.sapiens fau mRNA
aagggccccaatgccaactctta
>X65923_488 %GC 40.0 Score 7 H.sapiens fau mRNA
aaaaagccacttagttcagtcaa
>X65923_489 %GC 40.0 Score 7 H.sapiens fau mRNA
aaaagccacttagttcagtcaaa
>X65923_490 %GC 40.0 Score 7 H.sapiens fau mRNA
aaagccacttagttcagtcaaaa
>X65923_491 %GC 40.0 Score 7 H.sapiens fau mRNA
aagccacttagttcagtcaaaaa
>X65923_129 %GC 55.0 Score 6 H.sapiens fau mRNA
aaggctcatgtagcctcactgga

  [Part of this file has been deleted for brevity]

gaggccctgactaccctggaagt
>X65923_259 %GC 60.0 Score 4 H.sapiens fau mRNA
caggccgcatgcttggaggtaaa
>X65923_266 %GC 40.0 Score 4 H.sapiens fau mRNA
catgcttggaggtaaagttcatg
>X65923_342 %GC 40.0 Score 4 H.sapiens fau mRNA
caggagaagaagaagaagaagac
>X65923_347 %GC 40.0 Score 4 H.sapiens fau mRNA
gaagaagaagaagaagacaggtc
>X65923_359 %GC 60.0 Score 4 H.sapiens fau mRNA
gaagacaggtcgggctaagcggc
>X65923_111 %GC 55.0 Score 3 H.sapiens fau mRNA
gaaacggtcgcccagatcaaggc
>X65923_113 %GC 65.0 Score 3 H.sapiens fau mRNA
aacggtcgcccagatcaaggctc
>X65923_172 %GC 70.0 Score 3 H.sapiens fau mRNA
aagtcgtgctcctggcaggcgcg
>X65923_443 %GC 35.0 Score 3 H.sapiens fau mRNA
caatgccaactcttaagtctttt
>X65923_456 %GC 35.0 Score 3 H.sapiens fau mRNA
taagtcttttgtaattctggctt
>X65923_468 %GC 30.0 Score 3 H.sapiens fau mRNA
aattctggctttctctaataaaa
>X65923_484 %GC 30.0 Score 3 H.sapiens fau mRNA
aataaaaaagccacttagttcag
>X65923_108 %GC 60.0 Score 2 H.sapiens fau mRNA
caggaaacggtcgcccagatcaa
>X65923_135 %GC 60.0 Score 2 H.sapiens fau mRNA
catgtagcctcactggagggcat
>X65923_139 %GC 60.0 Score 2 H.sapiens fau mRNA
tagcctcactggagggcattgcc
>X65923_150 %GC 60.0 Score 2 H.sapiens fau mRNA
gagggcattgccccggaagatca
>X65923_171 %GC 65.0 Score 2 H.sapiens fau mRNA
caagtcgtgctcctggcaggcgc
>X65923_201 %GC 65.0 Score 2 H.sapiens fau mRNA
gaggatgaggccactctgggcca
>X65923_204 %GC 65.0 Score 2 H.sapiens fau mRNA
gatgaggccactctgggccagtg
>X65923_245 %GC 65.0 Score 2 H.sapiens fau mRNA
taccctggaagtagcaggccgca
>X65923_256 %GC 65.0 Score 2 H.sapiens fau mRNA
tagcaggccgcatgcttggaggt
>X65923_285 %GC 65.0 Score 2 H.sapiens fau mRNA
catggttccctggcccgtgctgg
>X65923_338 %GC 35.0 Score 2 H.sapiens fau mRNA
caaacaggagaagaagaagaaga
>X65923_345 %GC 35.0 Score 2 H.sapiens fau mRNA
gagaagaagaagaagaagacagg
>X65923_486 %GC 35.0 Score 2 H.sapiens fau mRNA
taaaaaagccacttagttcagtc

Output files for usage example 2

File: x65923.sirna

########################################
# Program: sirna
# Rundate: Fri 15 Jul 2011 12:00:00
# Commandline: sirna
#    -context
#    -sequence tembl:x65923
# Report_format: table
# Report_file: x65923.sirna
########################################

#=======================================
#
# Sequence: X65923     from: 1   to: 518
# HitCount: 85
#
# The forward sense sequence shows the first 2 bases of
# the 23 base region in brackets, this should be ignored
# when ordering siRNA probes.
# CDS region found in feature table starting at 57
#
#=======================================

  Start     End  Strand   Score    GC% Sense_siRNA                 Antisense_siRNA
    308     330       +   9.000   50.0 (AA)AAGUGAGAGGUCAGACUCCdTdT GGAGUCUGACCUCUCACUUdTdT
    309     331       +   9.000   50.0 (AA)AGUGAGAGGUCAGACUCCUdTdT AGGAGUCUGACCUCUCACUdTdT
    310     332       +   9.000   50.0 (AA)GUGAGAGGUCAGACUCCUAdTdT UAGGAGUCUGACCUCUCACdTdT
    351     373       +   9.000   50.0 (AA)GAAGAAGAAGACAGGUCGGdTdT CCGACCUGUCUUCUUCUUCdTdT
    166     188       +   8.000   55.0 (AA)GAUCAAGUCGUGCUCCUGGdTdT CCAGGAGCACGACUUGAUCdTdT
    279     301       +   8.000   55.0 (AA)AGUUCAUGGUUCCCUGGCCdTdT GGCCAGGGAACCAUGAACUdTdT
    330     352       +   8.000   55.0 (AA)GGUGGCCAAACAGGAGAAGdTdT CUUCUCCUGUUUGGCCACCdTdT
    354     376       +   8.000   55.0 (AA)GAAGAAGACAGGUCGGGCUdTdT AGCCCGACCUGUCUUCUUCdTdT
    357     379       +   8.000   55.0 (AA)GAAGACAGGUCGGGCUAAGdTdT CUUAGCCCGACCUGUCUUCdTdT
    393     415       +   8.000   55.0 (AA)CCGGCGCUUUGUCAACGUUdTdT AACGUUGACAAAGCGCCGGdTdT
    253     275       +   7.000   60.0 (AA)GUAGCAGGCCGCAUGCUUGdTdT CAAGCAUGCGGCCUGCUACdTdT
    280     302       +   7.000   60.0 (AA)GUUCAUGGUUCCCUGGCCCdTdT GGGCCAGGGAACCAUGAACdTdT
    339     361       +   7.000   40.0 (AA)ACAGGAGAAGAAGAAGAAGdTdT CUUCUUCUUCUUCUCCUGUdTdT
    340     362       +   7.000   40.0 (AA)CAGGAGAAGAAGAAGAAGAdTdT UCUUCUUCUUCUUCUCCUGdTdT
    348     370       +   7.000   40.0 (AA)GAAGAAGAAGAAGACAGGUdTdT ACCUGUCUUCUUCUUCUUCdTdT
    375     397       +   7.000   60.0 (AA)GCGGCGGAUGCAGUACAACdTdT GUUGUACUGCAUCCGCCGCdTdT
    408     430       +   7.000   60.0 (AA)CGUUGUGCCCACCUUUGGCdTdT GCCAAAGGUGGGCACAACGdTdT
    429     451       +   7.000   60.0 (AA)GAAGAAGGGCCCCAAUGCCdTdT GGCAUUGGGGCCCUUCUUCdTdT
    432     454       +   7.000   60.0 (AA)GAAGGGCCCCAAUGCCAACdTdT GUUGGCAUUGGGGCCCUUCdTdT
    435     457       +   7.000   60.0 (AA)GGGCCCCAAUGCCAACUCUdTdT AGAGUUGGCAUUGGGGCCCdTdT
    488     510       +   7.000   40.0 (AA)AAAGCCACUUAGUUCAGUCdTdT GACUGAACUAAGUGGCUUUdTdT
    489     511       +   7.000   40.0 (AA)AAGCCACUUAGUUCAGUCAdTdT UGACUGAACUAAGUGGCUUdTdT
    490     512       +   7.000   40.0 (AA)AGCCACUUAGUUCAGUCAAdTdT UUGACUGAACUAAGUGGCUdTdT
    491     513       +   7.000   40.0 (AA)GCCACUUAGUUCAGUCAAAdTdT UUUGACUGAACUAAGUGGCdTdT
    129     151       +   6.000   55.0 (AA)GGCUCAUGUAGCCUCACUGdTdT CAGUGAGGCUACAUGAGCCdTdT
    165     187       +   6.000   50.0 (GA)AGAUCAAGUCGUGCUCCUGdTdT CAGGAGCACGACUUGAUCUdTdT
    278     300       +   6.000   50.0 (UA)AAGUUCAUGGUUCCCUGGCdTdT GCCAGGGAACCAUGAACUUdTdT

  [Part of this file has been deleted for brevity]

    374     396       +   5.000   55.0 (UA)AGCGGCGGAUGCAGUACAAdTdT UUGUACUGCAUCCGCCGCUdTdT
    383     405       +   5.000   55.0 (GA)UGCAGUACAACCGGCGCUUdTdT AAGCGCCGGUUGUACUGCAdTdT
    387     409       +   5.000   55.0 (CA)GUACAACCGGCGCUUUGUCdTdT GACAAAGCGCCGGUUGUACdTdT
    390     412       +   5.000   55.0 (UA)CAACCGGCGCUUUGUCAACdTdT GUUGACAAAGCGCCGGUUGdTdT
    392     414       +   5.000   55.0 (CA)ACCGGCGCUUUGUCAACGUdTdT ACGUUGACAAAGCGCCGGUdTdT
    407     429       +   5.000   55.0 (CA)ACGUUGUGCCCACCUUUGGdTdT CCAAAGGUGGGCACAACGUdTdT
    428     450       +   5.000   55.0 (CA)AGAAGAAGGGCCCCAAUGCdTdT GCAUUGGGGCCCUUCUUCUdTdT
    431     453       +   5.000   55.0 (GA)AGAAGGGCCCCAAUGCCAAdTdT UUGGCAUUGGGGCCCUUCUdTdT
    434     456       +   5.000   60.0 (GA)AGGGCCCCAAUGCCAACUCdTdT GAGUUGGCAUUGGGGCCCUdTdT
    444     466       +   5.000   35.0 (AA)UGCCAACUCUUAAGUCUUUdTdT AAAGACUUAAGAGUUGGCAdTdT
    487     509       +   5.000   35.0 (AA)AAAAGCCACUUAGUUCAGUdTdT ACUGAACUAAGUGGCUUUUdTdT
    123     145       +   4.000   50.0 (CA)GAUCAAGGCUCAUGUAGCCdTdT GGCUACAUGAGCCUUGAUCdTdT
    125     147       +   4.000   50.0 (GA)UCAAGGCUCAUGUAGCCUCdTdT GAGGCUACAUGAGCCUUGAdTdT
    128     150       +   4.000   50.0 (CA)AGGCUCAUGUAGCCUCACUdTdT AGUGAGGCUACAUGAGCCUdTdT
    155     177       +   4.000   50.0 (CA)UUGCCCCGGAAGAUCAAGUdTdT ACUUGAUCUUCCGGGGCAAdTdT
    234     256       +   4.000   60.0 (GA)GGCCCUGACUACCCUGGAAdTdT UUCCAGGGUAGUCAGGGCCdTdT
    259     281       +   4.000   60.0 (CA)GGCCGCAUGCUUGGAGGUAdTdT UACCUCCAAGCAUGCGGCCdTdT
    266     288       +   4.000   40.0 (CA)UGCUUGGAGGUAAAGUUCAdTdT UGAACUUUACCUCCAAGCAdTdT
    342     364       +   4.000   40.0 (CA)GGAGAAGAAGAAGAAGAAGdTdT CUUCUUCUUCUUCUUCUCCdTdT
    347     369       +   4.000   40.0 (GA)AGAAGAAGAAGAAGACAGGdTdT CCUGUCUUCUUCUUCUUCUdTdT
    359     381       +   4.000   60.0 (GA)AGACAGGUCGGGCUAAGCGdTdT CGCUUAGCCCGACCUGUCUdTdT
    111     133       +   3.000   55.0 (GA)AACGGUCGCCCAGAUCAAGdTdT CUUGAUCUGGGCGACCGUUdTdT
    113     135       +   3.000   65.0 (AA)CGGUCGCCCAGAUCAAGGCdTdT GCCUUGAUCUGGGCGACCGdTdT
    172     194       +   3.000   70.0 (AA)GUCGUGCUCCUGGCAGGCGdTdT CGCCUGCCAGGAGCACGACdTdT
    443     465       +   3.000   35.0 (CA)AUGCCAACUCUUAAGUCUUdTdT AAGACUUAAGAGUUGGCAUdTdT
    456     478       +   3.000   35.0 (UA)AGUCUUUUGUAAUUCUGGCdTdT GCCAGAAUUACAAAAGACUdTdT
    468     490       +   3.000   30.0 (AA)UUCUGGCUUUCUCUAAUAAdTdT UUAUUAGAGAAAGCCAGAAdTdT
    484     506       +   3.000   30.0 (AA)UAAAAAAGCCACUUAGUUCdTdT GAACUAAGUGGCUUUUUUAdTdT
    108     130       +   2.000   60.0 (CA)GGAAACGGUCGCCCAGAUCdTdT GAUCUGGGCGACCGUUUCCdTdT
    135     157       +   2.000   60.0 (CA)UGUAGCCUCACUGGAGGGCdTdT GCCCUCCAGUGAGGCUACAdTdT
    139     161       +   2.000   60.0 (UA)GCCUCACUGGAGGGCAUUGdTdT CAAUGCCCUCCAGUGAGGCdTdT
    150     172       +   2.000   60.0 (GA)GGGCAUUGCCCCGGAAGAUdTdT AUCUUCCGGGGCAAUGCCCdTdT
    171     193       +   2.000   65.0 (CA)AGUCGUGCUCCUGGCAGGCdTdT GCCUGCCAGGAGCACGACUdTdT
    201     223       +   2.000   65.0 (GA)GGAUGAGGCCACUCUGGGCdTdT GCCCAGAGUGGCCUCAUCCdTdT
    204     226       +   2.000   65.0 (GA)UGAGGCCACUCUGGGCCAGdTdT CUGGCCCAGAGUGGCCUCAdTdT
    245     267       +   2.000   65.0 (UA)CCCUGGAAGUAGCAGGCCGdTdT CGGCCUGCUACUUCCAGGGdTdT
    256     278       +   2.000   65.0 (UA)GCAGGCCGCAUGCUUGGAGdTdT CUCCAAGCAUGCGGCCUGCdTdT
    285     307       +   2.000   65.0 (CA)UGGUUCCCUGGCCCGUGCUdTdT AGCACGGGCCAGGGAACCAdTdT
    338     360       +   2.000   35.0 (CA)AACAGGAGAAGAAGAAGAAdTdT UUCUUCUUCUUCUCCUGUUdTdT
    345     367       +   2.000   35.0 (GA)GAAGAAGAAGAAGAAGACAdTdT UGUCUUCUUCUUCUUCUUCdTdT
    486     508       +   2.000   35.0 (UA)AAAAAGCCACUUAGUUCAGdTdT CUGAACUAAGUGGCUUUUUdTdT

#---------------------------------------
#---------------------------------------
#---------------------------------------
# Total_sequences: 1
# Total_length: 518
# Reported_sequences: 1
# Reported_hitcount: 85
#---------------------------------------

The siRNAs are reported in order of best score first. sirna reports both the sense and antisense siRNAs as 5' to 3'.

Data files

None.

Notes

RNA interference (RNAi) is a phenomenon whereby small interfering RNA strands (siRNA) inhibit gene expression at the level of transcription or translation of specific genes. RNAi is a defence mechanism against viruses and is important in regulating development and genome maintenance. siRNAs are double-stranded RNA molecules in which one or the other strand is strongly complementary to a target RNA strand. Once they bind to a target, a nuclease protein guided by the siRNA cleaves the target and renders it untranslatable.

Gene silencing using RNAi has been used to determine the function of many genes in Drosophila, C. elegans, and many plant species. Knockdown by siRNA typically lasts for 7-10 days, and has been shown to transfer to daughter cells. Of further note, siRNAs are effective at quantities much lower than alternative gene silencing methodologies, including antisense and ribozyme based strategies.

Due to various mechanisms of antiviral response to long dsRNA, RNAi at first proved more difficult to establish in mammalian species. Then Tuschl, Elbashir, and others discovered that RNAi can be elicited very effectively by well-defined 21-base duplex RNAs. When these small interfering RNAs (siRNAs) are added in duplex form with a transfection agent to mammalian cell cultures, the 21-base-pair RNA acts in concert with cellular components to silence the gene with sequence homology to one of the siRNA sequences. Strategies for the design of effective siRNA sequences have been documented, most notably by Sayda Elbashir, Thomas Tuschl, et al.
Their studies of mammalian RNAi suggest that the most efficient gene-silencing effect is achieved using double-stranded siRNA having a 19-nucleotide complementary region and a 2-nucleotide 3' overhang at each end. Current models of the RNAi mechanism suggest that the antisense siRNA strand recognizes the specific gene target.

In gene-specific RNAi, the coding region (CDS) of the mRNA is usually targeted. The search for an appropriate target sequence should begin 50-100 nucleotides downstream of the start codon. UTR-binding proteins and/or translation initiation complexes may interfere with the binding of the siRNP endonuclease complex. Tuschl, Elbashir et al. say that they have successfully used siRNAs targeting the 3' UTR. To avoid interference from mRNA regulatory proteins, sequences in the 5' untranslated region or near the start codon should not be targeted.

A set of rules for the design of siRNA has been suggested (http://www.mpibpc.gwdg.de/abteilungen/100/105/sirna.html) based on the work of Tuschl, Elbashir et al.

They suggest searching for the 23-nt sequence motif AA(N19)TT (N, any nucleotide) and selecting hits with approx. 50% G/C-content (30% to 70% has also worked for them). If no suitable sequences are found, the search is extended using the motif NA(N21). The sequence of the sense siRNA corresponds to (N19)TT or N21 (position 3 to 23 of the 23-nt motif), respectively. In the latter case, they convert the 3' end of the sense siRNA to TT. The rationale for this sequence conversion is to generate a symmetric duplex with respect to the sequence composition of the sense and antisense 3' overhangs. The antisense siRNA is synthesized as the complement to position 1 to 21 of the 23-nt motif. Because position 1 of the 23-nt motif is not recognized sequence-specifically by the antisense siRNA, the 3'-most nucleotide residue of the antisense siRNA can be chosen deliberately.
However, the penultimate nucleotide of the antisense siRNA (complementary to position 2 of the 23-nt motif) should always be complementary to the targeted sequence. To simplify chemical synthesis, they always use TT.

More recently, they preferentially select siRNAs corresponding to the target motif NAR(N17)YNN, where R is purine (A, G) and Y is pyrimidine (C, U). The respective 21-nt sense and antisense siRNAs therefore begin with a purine nucleotide and can also be expressed from pol III expression vectors without a change in targeting site; expression of RNAs from pol III promoters is only efficient when the first transcribed nucleotide is a purine.

They always design siRNAs with symmetric 3' TT overhangs, believing that symmetric 3' overhangs help to ensure that the siRNPs are formed with approximately equal ratios of sense and antisense target RNA-cleaving siRNPs.

Please note that the modification of the overhang of the sense sequence of the siRNA duplex is not expected to affect targeted mRNA recognition, as the antisense siRNA strand guides target recognition. In summary, no matter what you do to your overhangs, siRNAs should still function to a reasonable extent. However, using TT in the 3' overhang will always help your RNA synthesis company to let you know when you accidentally order an siRNA sequence 3' to 5' rather than in the recommended format of 5' to 3'. sirna reports both the sense and antisense siRNAs as 5' to 3'.

Xeragon.com also suggest that choosing a region of the mRNA with a GC content as close as possible to 50% is a more important consideration than choosing a target sequence that begins with AA. They also suggest that a key consideration in target selection is to avoid having more than three guanosines in a row, since poly G sequences can hyperstack and form agglomerates that potentially interfere with the siRNA silencing mechanism.

siRNAs appear to effectively silence genes in more than 80% of cases.
Current data indicate that there are regions of some mRNAs where gene silencing does not work. To help ensure that a given target gene is silenced, it is advised that at least two target sequences, as far apart on the gene as possible, be chosen.

Coding region specification

It is possible (although the evidence is unclear) that regulatory proteins binding to regions in and near the 5' untranslated region might interfere with the RNAi process. Therefore, this program avoids choosing siRNA probes from the 5' UTR and from the first 50 bases of the coding region. The second 50 bases of the coding region carry a penalty to reduce the reporting of possible siRNA probes in this region.

If the input sequence has a feature table specifying a coding region, then this will be used. Otherwise, if the start of the coding region is known, you can specify it with the -sbegin command-line qualifier (which in all EMBOSS programs is normally used to specify the start of the region of a sequence that should be analysed).

sirna looks at the feature table of the input mRNA sequence to find the coding regions (CDS). It will ignore the 5' UTR and the first 50 bases of the CDS. It will assign a penalty of 2 points to any siRNA in positions 51 to 100 in the CDS. If there is no CDS in the feature table, you can specify the CDS by using the command-line qualifier -sbegin to indicate where the CDS should start. If there is no CDS in the feature table and you do not use the command-line qualifier -sbegin, then sirna will assume that the CDS region is not known and will look for siRNAs in the whole of the sequence, with no penalties associated with the location within the sequence.

All these confusing regions

There are a lot of references to 23 base regions, 21 base regions, 19 base regions, etc. in any description of siRNA. Perhaps an example with a sequence would be clearer?
The 23 base region, in this case starting with an AA, might typically look like:

  5' AAGUGAGAGGUCAGACUCCUATC

The sense siRNA is made from the 19 bases of positions 3 to 21 of the 23 base target region, so:

  5' GUGAGAGGUCAGACUCCUA

and then typically d(TT) is added, so:

  5' GUGAGAGGUCAGACUCCUAdTdT

The antisense siRNA sequence is made from the complement of bases 3 to 21 of the target region, so:

  5' GUGAGAGGUCAGACUCCUA   sense
  3' CACUCUCCAGUCUGAGGAU   antisense 3' -> 5'

so the antisense sequence that should be ordered, written 5' to 3' with d(TT) added, is:

  5' UAGGAGUCUGACCUCUCACdTdT   antisense 5' -> 3'

References

1. Elbashir, S. M., et al. (2001a). Duplexes of 21-nucleotide RNAs mediate RNA interference in mammalian cell culture. Nature 411: 494-498.
2. Elbashir, S. M., W. Lendeckel and T. Tuschl (2001b). RNA interference is mediated by 21 and 22 nt RNAs. Genes & Dev. 15: 188-200.

Warnings

It is assumed that the input sequence is mRNA.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0.

Known bugs

None.

See also

Program name   Description
banana         Plot bending and curvature data for B-DNA
btwisted       Calculate the twisting in a B-DNA sequence
einverted      Finds inverted repeats in nucleotide sequences
marscan        Finds matrix/scaffold recognition (MRS) signatures in DNA sequences
trimest        Remove poly-A tails from nucleotide sequences

Author(s)

Gary Williams
formerly at:
MRC Rosalind Franklin Centre for Genomics Research
Wellcome Trust Genome Campus, Hinxton, Cambridge, CB10 1SB, UK

Please report all bugs to the EMBOSS bug team (emboss-bug (c) emboss.open-bio.org), not to the original author.

History

Written (November 2002) - Gary Williams.

Target users

This program is intended to be used by everyone and everything, from naive users to embedded scripts.

Comments

None