jaspextract

Wiki

The master copies of EMBOSS documentation are available at http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki. Please help by correcting and extending the Wiki pages.

Function

Extract data from JASPAR

Description

JASPAR is a collection of transcription factor DNA-binding preferences, modelled as matrices. These can be converted into Position Weight Matrices (PWMs or PSSMs), which are used for scanning genomic sequences. JASPAR is the only database with this scope whose data can be used with no restrictions (open-source).

This program splits the JASPAR distribution into its component matrix sets (e.g. JASPAR_CORE, JASPAR_PHYLOFACTS) and copies them into the EMBOSS data directories, performing any necessary conversions.

The home page of JASPAR is: http://jaspar.genereg.net/

The EMBOSS program jaspscan will not work unless this program has been run. Running this program may be the job of your system manager.

Usage

Here is a sample session with jaspextract:

% jaspextract
Extract data from JASPAR
JASPAR database directory [.]: jaspar

Command line arguments

Extract data from JASPAR
Version: EMBOSS:6.4.0.0

   Standard (Mandatory) qualifiers:
  [-directory]         directory  The FlatFileDir directory containing the .pfm
                                  files and the matrix_list.txt file

   Additional (Optional) qualifiers: (none)
   Advanced (Unprompted) qualifiers: (none)
   Associated qualifiers:

   "-directory" associated qualifiers
   -extension1         string     Default file extension

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit.
                                  More information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit

Input file format

The input files come from the Archive.zip file provided in the html/DOWNLOAD directory of the JASPAR home page (http://jaspar.genereg.net). After unzipping the archive, specify its all_data/FlatFileDir directory when running jaspextract.

It is advisable to first delete any old data files from your EMBOSS data file area, e.g. from the /usr/local/emboss/share/EMBOSS/data/JASPAR_* directories.

Output file format

The output file format is currently the same as the JASPAR distribution format, but with the matrix files separated into directories according to the matrix set they belong to.

Output files for usage example

Directory: JASPAR_CNE
This directory contains output files.

Directory: JASPAR_CORE
This directory contains output files, for example MA0070.1.pfm, MA0071.1.pfm, MA0072.1.pfm, MA0073.1.pfm, MA0074.1.pfm, MA0075.1.pfm, MA0076.1.pfm, MA0077.1.pfm, MA0078.1.pfm, MA0079.1.pfm and matrix_list.txt.
File: JASPAR_CORE/MA0070.1.pfm
 5  3 16  1  0 17 17  0  0 16 12  8
 6  9  1  1 18  1  0  0 18  1  0  2
 2  3  1  0  0  0  0  1  0  0  1  2
 5  3  0 16  0  0  1 17  0  1  5  6

File: JASPAR_CORE/MA0071.1.pfm
15  9  6 11 21  0  0  0  0 25
 1  1 12  2  0  0  0  0 25  0
 2  0  4  5  4 25 25  0  0  0
 7 15  3  7  0  0  0 25  0  0

File: JASPAR_CORE/MA0072.1.pfm
 9 17 15 35 23  2  0 28  0  0  0  0 36 15
 8  2  0  1  0 12  0  0  0  0  0 36  0  6
 8  7  3  0  0 13  0  8 36 36  0  0  0 10
11 10 18  0 13  9 36  0  0  0 36  0  0  5

File: JASPAR_CORE/MA0073.1.pfm
 3  1  3  0  7  9  8  4  0 11  4  1  3  4  2  4  4  4  1  4
 8 10  8 11  4  2  3  6 11  0  7 10  8  6  9  5  5  6  7  4
 0  0  0  0  0  0  0  0  0  0  0  0  0  1  0  0  1  0  3  2
 0  0  0  0  0  0  0  1  0  0  0  0  0  0  0  2  1  1  0  1

File: JASPAR_CORE/MA0074.1.pfm
 3  0  0  0  0  9  4  2  2  5  0  0  1  0  7
 0  0  0  0  9  0  2  4  0  0  0  0  0  9  1
 7 10  9  0  0  1  0  2  8  5 10  0  0  0  2
 0  0  1 10  1  0  4  2  0  0  0 10  9  1  0

File: JASPAR_CORE/MA0075.1.pfm
52 59  0  0 58
 2  0  0  0  0
 4  0  1  0  1
 1  0 58 59  0

File: JASPAR_CORE/MA0076.1.pfm
16  0  0  0  0 20 16  4  1
 1 20 20  0  0  0  0  1  6
 2  0  0 20 20  0  0 15  0
 1  0  0  0  0  0  4  0 13

File: JASPAR_CORE/MA0077.1.pfm
24 54 59  0 65 71  4 24  9
 7  6  4 72  4  2  0  6  9
31  7  0  2  0  1  1 38 55
14  9 13  2  7  2 71  8  3

File: JASPAR_CORE/MA0078.1.pfm
 7  8  3 30  0  0  0  0  0
 9  8 18  0  1  0  0  0 17
 6  4  1  0  0  0 31  2 10
 9 11  9  1 30 31  0 29  4

File: JASPAR_CORE/MA0079.1.pfm
 1  2  0  0  0  2  0  0  1  2
 1  1  0  0  5  0  1  0  1  0
 4  4  8  8  2  4  5  6  6  0
 2  1  0  0  1  2  2  2  0  6

File: JASPAR_CORE/matrix_list.txt
MA0072.1 17.4248426117905 RORA_2 Zinc-coordinating ; acc "NP_599022" ; collection "CORE" ; comment "isoform type" ; family "Hormone-nuclear Receptor" ; medline "7926749" ; pazar_tf_id "TF0000048" ; species "9606" ; tax_group "vertebrates" ; type "SELEX"
MA0075.1 9.06306510239134 Prrx2 Helix-Turn-Helix ; acc "Q06348" ; collection "CORE" ; comment "-" ; family "Homeo" ; medline "7901837" ; pazar_tf_id "TF0000051" ; species "10090" ; tax_group "vertebrates" ; type "SELEX"
MA0071.1 13.1897301896459 RORA_1 Zinc-coordinating ; acc "NP_599023" ; collection "CORE" ; comment "isoform type" ; family "Hormone-nuclear Receptor" ; medline "7926749" ; pazar_tf_id "TF0000047" ; species "9606" ; tax_group "vertebrates" ; type "SELEX"
MA0073.1 22.2782723704014 RREB1 Zinc-coordinating ; acc "Q92766" ; collection "CORE" ; comment "-" ; family "BetaBetaAlpha-zinc finger" ; medline "8816445" ; pazar_tf_id "TF0000049" ; species "9606" ; tax_group "vertebrates" ; type "SELEX"
MA0070.1 14.6408952002356 PBX1 Helix-Turn-Helix ; acc "Q5T486" ; collection "CORE" ; comment "-" ; family "Homeo" ; medline "7910944" ; pazar_tf_id "TF0000046" ; species "9606" ; tax_group "vertebrates" ; type "SELEX"
MA0074.1 20.4511671987138 RXRA::VDR Zinc-coordinating ; acc "P19793,P11473" ; collection "CORE" ; comment "heterodimer between RXRA and VDR" ; family "Hormone-nuclear Receptor" ; medline "8674817" ; pazar_tf_id "TF0000050" ; species "9606" ; tax_group "vertebrates" ; type "SELEX"
MA0078.1 10.5018372361999 Sox17 Other Alpha-Helix ; acc "Q61473" ; collection "CORE" ; comment "-" ; family "High Mobility Group" ; medline "8636240" ; pazar_tf_id "TF0000054" ; species "10090" ; tax_group "vertebrates" ; type "SELEX"
MA0079.1 9.7185757452318 SP1 Zinc-coordinating ; acc "P08047" ; collection "CORE" ; comment "-" ; family "BetaBetaAlpha-zinc finger" ; medline "2192357" ; pazar_tf_id "TF0000055" ; species "9606" ; tax_group "vertebrates" ; type "SELEX"
MA0079.2 11.1288626921664 SP1 Zinc-coordinating ; acc "P08047" ; collection "CORE" ; comment "Annotations from PAZAR SP1 + SP1_MOUSE + SP1_HUMAN + SP1_RAT in the pleiades genes project (TF0000105, TF0000121, TF0000137, TF0000146)." ; family "BetaBetaAlpha-zinc finger" ; medline "17916232" ; pazar_tf_id "TF0000055" ; species "9606,10090,10116" ; tax_group "vertebrates" ; type "COMPILED"
MA0077.1 9.07881462267178 SOX9 Other Alpha-Helix ; acc "P48436" ; collection "CORE" ; comment "-" ; family "High Mobility Group" ; medline "9973626" ; pazar_tf_id "TF0000053" ; species "9606" ; tax_group "vertebrates" ; type "SELEX"
MA0076.1 14.123230134165 ELK4 Winged Helix-Turn-Helix ; acc "P28324" ; collection "CORE" ; comment "-" ; family "Ets" ; medline "8524663" ; pazar_tf_id "TF0000052" ; species "9606" ; tax_group "vertebrates" ; type "SELEX"

Directory: JASPAR_FAM
This directory contains output files.

Directory: JASPAR_PBM
This directory contains output files.

Directory: JASPAR_PBM_HLH
This directory contains output files.

Directory: JASPAR_PBM_HOMEO
This directory contains output files.

Directory: JASPAR_PHYLOFACTS
This directory contains output files.

Directory: JASPAR_POLII
This directory contains output files.

Directory: JASPAR_SPLICE
This directory contains output files.

Data files

None

Notes

The home page of JASPAR is: http://jaspar.genereg.net

Running this program may be the job of your system manager.

References

1. DNA binding sites: representation and discovery. Bioinformatics. 2000 Jan;16(1):16-23.
2. Applied bioinformatics for the identification of regulatory elements. Nat Rev Genet. 2004 Apr;5(4):276-87.

Warnings

None.

Diagnostic Error Messages

None.

Exit status

It always exits with status 0 unless an error is reported.

Known bugs

None.
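The Description notes that JASPAR matrices can be converted into PWMs for scanning genomic sequences. As a rough illustration of what the extracted .pfm and matrix_list.txt files contain, the following Python sketch reads a .pfm count matrix, converts it to log2-odds weights, scores a candidate site, and splits a matrix_list.txt record into its tags. It assumes the A, C, G, T row order of JASPAR .pfm files, a pseudocount of 1 and a uniform 0.25 background. The helper names (parse_pfm, pfm_to_pwm, score_site, parse_matrix_list_line) are hypothetical, and this is a sketch of one common PWM construction, not how jaspextract or jaspscan work internally.

```python
import math
import re

# Assumption: a JASPAR .pfm file holds four rows of counts in the order
# A, C, G, T, with one whitespace-separated column per motif position.
BASES = "ACGT"


def parse_pfm(text):
    """Parse .pfm text into {base: [counts per position]}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if len(rows) != 4:
        raise ValueError(f"expected 4 rows (A, C, G, T), got {len(rows)}")
    counts = {base: [int(x) for x in row] for base, row in zip(BASES, rows)}
    if len({len(row) for row in counts.values()}) != 1:
        raise ValueError("rows have different lengths")
    return counts


def pfm_to_pwm(counts, pseudocount=1.0, background=None):
    """Convert counts to log2-odds weights.

    The pseudocount and uniform 0.25 background are illustrative choices,
    not values taken from EMBOSS or JASPAR.
    """
    if background is None:
        background = {base: 0.25 for base in BASES}
    length = len(counts["A"])
    pwm = {base: [] for base in BASES}
    for i in range(length):
        # Column total after adding the pseudocount to each of the 4 bases.
        total = sum(counts[base][i] for base in BASES) + 4 * pseudocount
        for base in BASES:
            freq = (counts[base][i] + pseudocount) / total
            pwm[base].append(math.log2(freq / background[base]))
    return pwm


def score_site(pwm, site):
    """Sum the per-position weights for a site as long as the matrix.

    Raises KeyError for letters other than A, C, G, T.
    """
    if len(site) != len(pwm["A"]):
        raise ValueError("site length must equal the matrix length")
    return sum(pwm[base][i] for i, base in enumerate(site.upper()))


def parse_matrix_list_line(line):
    """Split one matrix_list.txt record into ID, name and key/value tags."""
    head, _, tail = line.partition(";")
    fields = head.split()
    # Tags look like: acc "Q06348" ; collection "CORE" ; ...
    tags = dict(re.findall(r'(\w+) "([^"]*)"', tail))
    return {"id": fields[0], "name": fields[2], "tags": tags}


if __name__ == "__main__":
    # Counts copied from JASPAR_CORE/MA0075.1.pfm in the example output.
    ma0075 = """
    52 59  0  0 58
     2  0  0  0  0
     4  0  1  0  1
     1  0 58 59  0
    """
    pwm = pfm_to_pwm(parse_pfm(ma0075))
    print("AATTA", round(score_site(pwm, "AATTA"), 2))
    print("CCCCC", round(score_site(pwm, "CCCCC"), 2))

    record = ('MA0075.1 9.06306510239134 Prrx2 Helix-Turn-Helix ; acc "Q06348" ; '
              'collection "CORE" ; comment "-" ; family "Homeo" ; medline "7901837" ; '
              'pazar_tf_id "TF0000051" ; species "10090" ; tax_group "vertebrates" ; '
              'type "SELEX"')
    info = parse_matrix_list_line(record)
    print(info["id"], info["name"], info["tags"]["collection"])
```

For MA0075.1, AATTA takes the most frequent base in every column, so it receives the highest score this matrix can give, while a site such as CCCCC scores strongly negative. The absolute numbers depend on the assumed pseudocount and background and are not meant to reproduce jaspscan output.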
See also

Program name     Description
aaindexextract   Extract amino acid property data from AAINDEX
cutgextract      Extract codon usage tables from CUTG database
printsextract    Extract data from PRINTS database for use by pscan
prosextract      Processes the PROSITE motif database for use by patmatmotifs
rebaseextract    Process the REBASE database for use by restriction enzyme applications
tfextract        Process TRANSFAC transcription factor database for use by tfscan

Author(s)

Alan Bleasby
European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK

Please report all bugs to the EMBOSS bug team (emboss-bug (c) emboss.open-bio.org), not to the original author.

History

Completed 23rd July 2007

Target users

This program is intended to be used by administrators responsible for software and database installation and maintenance.

Comments

None