needleall


Wiki

   The master copies of EMBOSS documentation are available at
   http://emboss.open-bio.org/wiki/Appdocs on the EMBOSS Wiki.

   Please help by correcting and extending the Wiki pages.

Function

   Many-to-many pairwise alignments of two sequence sets

Description

   needleall reads a set of input sequences and compares them all to one
   or more sequences, writing their optimal global sequence alignments to
   file. It uses the Needleman-Wunsch alignment algorithm to find the
   optimum alignment (including gaps) of two sequences along their entire
   length. The algorithm uses a dynamic programming method to ensure the
   alignment is optimum, by exploring all possible alignments and choosing
   the best. A scoring matrix is read that contains values for every
   possible residue or nucleotide match. Needleall finds the alignment
   with the maximum possible score where the score of an alignment is
   equal to the sum of the matches taken from the scoring matrix, minus
   penalties arising from opening and extending gaps in the aligned
   sequences. The substitution matrix and gap opening and extension
   penalties are user-specified.

  Algorithm

   The Needleman-Wunsch algorithm is a member of the class of algorithms
   that can calculate the best score and alignment of two sequences in the
   order of mn steps, where n and m are the sequence lengths. These
   dynamic programming algorithms were first developed for protein
   sequence comparison by Needleman and Wunsch, though similar methods
   were independently devised during the late 1960's and early 1970's for
   use in the fields of speech processing and computer science.

   An important problem is the treatment of gaps, i.e., spaces inserted to
   optimise the alignment score. A penalty is subtracted from the score
   for each gap opened (the 'gap open' penalty) and a penalty is
   subtracted from the score for the total number of gap spaces multiplied
   by a cost (the 'gap extension' penalty). Typically, the cost of
   extending a gap is set to be 5-10 times lower than the cost for opening
   a gap.

   Penalty for a gap of n positions is calculated using the following
   formula:
gap opening penalty + (n - 1) * gap extension penalty

   In a Needleman-Wunsch global alignment, the entire length of each
   sequence is aligned. The sequences might be partially overlapping or
   one sequence might be aligned entirely internally to the other. There
   is no penalty for the hanging ends of the overlap. In bioinformatics,
   it is usually reasonable to assume that the sequences are incomplete
   and there should be no penalty for failing to align the missing bases.

Usage

   Here is a sample session with needleall


% needleall -minscore 40 -stdout -auto ../data/test1_illumina.fastq

Illumina_DpnII_Gex_PCR_Primer_2 FC12044_91407_8_200_406_24 45 (41.0)
Illumina_NlaIII_Gex_PCR_Primer_2 FC12044_91407_8_200_406_24 45 (41.0)
Illumina_Small_RNA_PCR_Primer_2 FC12044_91407_8_200_406_24 45 (41.0)
Illumina_DpnII_Gex_Adapters1_1 FC12044_91407_8_200_106_131 35 (40.5)
Illumina_Paired_End_DNA_Adapters1_1 FC12044_91407_8_200_57_85 35 (41.0)
Illumina_DpnII_Gex_Adapters1_1 FC12044_91407_8_200_154_436 31 (42.0)
Illumina_Genomic_DNA_PCR_Primers1_1 FC12044_91407_8_200_83_511 64 (42.0)
Illumina_Paired_End_DNA_PCR_Primers1_1 FC12044_91407_8_200_83_511 64 (42.0)
Illumina_DpnII_Gex_Adapters1_2 FC12044_91407_8_200_303_427 33 (40.5)
Illumina_DpnII_Gex_PCR_Primer_2 FC12044_91407_8_200_303_427 51 (40.5)
Illumina_DpnII_Gex_sequencing_primer FC12044_91407_8_200_303_427 38 (44.5)
Illumina_NlaIII_Gex_Adapters1_2 FC12044_91407_8_200_303_427 36 (40.5)
Illumina_NlaIII_Gex_PCR_Primer_2 FC12044_91407_8_200_303_427 51 (40.5)
Illumina_NlaIII_Gex_sequencing_primer FC12044_91407_8_200_303_427 39 (40.5)
Illumina_Small_RNA_5p_Adapter FC12044_91407_8_200_303_427 33 (40.5)
Illumina_Small_RNA_PCR_Primer_2 FC12044_91407_8_200_303_427 51 (40.5)
Illumina_Small_RNA_sequencing_primer FC12044_91407_8_200_303_427 38 (44.5)
Illumina_Paired_End_DNA_Adapters1_1 FC12044_91407_8_200_553_135 33 (44.5)
Illumina_DpnII_Gex_PCR_Primer_2 FC12044_91407_8_200_139_74 51 (46.0)
Illumina_DpnII_Gex_sequencing_primer FC12044_91407_8_200_139_74 38 (42.0)
Illumina_NlaIII_Gex_PCR_Primer_2 FC12044_91407_8_200_139_74 51 (46.0)
Illumina_Small_RNA_PCR_Primer_2 FC12044_91407_8_200_139_74 51 (46.0)
Illumina_Small_RNA_sequencing_primer FC12044_91407_8_200_139_74 38 (42.0)

#---------------------------------------
#---------------------------------------


   Go to the input files for this example
   Go to the output files for this example

Command line arguments

Many-to-many pairwise alignments of two sequence sets
Version: EMBOSS:6.4.0.0

   Standard (Mandatory) qualifiers:
  [-asequence]         seqset     Sequence set filename and optional format,
                                  or reference (input USA)
  [-bsequence]         seqall     Sequence(s) filename and optional format, or
                                  reference (input USA)
   -gapopen            float      [10.0 for any sequence] The gap open penalty
                                  is the score taken away when a gap is
                                  created. The best value depends on the
                                  choice of comparison matrix. The default
                                  value assumes you are using the EBLOSUM62
                                  matrix for protein sequences, and the
                                  EDNAFULL matrix for nucleotide sequences.
                                  (Floating point number from 1.0 to 100.0)
   -gapextend          float      [0.5 for any sequence] The gap extension,
                                  penalty is added to the standard gap penalty
                                  for each base or residue in the gap. This
                                  is how long gaps are penalized. Usually you
                                  will expect a few long gaps rather than many
                                  short gaps, so the gap extension penalty
                                  should be lower than the gap penalty. An
                                  exception is where one or both sequences are
                                  single reads with possible sequencing
                                  errors in which case you would expect many
                                  single base gaps. You can get this result by
                                  setting the gap open penalty to zero (or
                                  very low) and using the gap extension
                                  penalty to control gap scoring. (Floating
                                  point number from 0.0 to 10.0)
  [-outfile]           align      [*.needleall] Output alignment file name
                                  (default -aformat score)

   Additional (Optional) qualifiers:
   -datafile           matrixf    [EBLOSUM62 for protein, EDNAFULL for DNA]
                                  This is the scoring matrix file used when
                                  comparing sequences. By default it is the
                                  file 'EBLOSUM62' (for proteins) or the file
                                  'EDNAFULL' (for nucleic sequences). These
                                  files are found in the 'data' directory of
                                  the EMBOSS installation.
   -endweight          boolean    [N] Apply end gap penalties.
   -endopen            float      [10.0 for any sequence] The end gap open
                                  penalty is the score taken away when an end
                                  gap is created. The best value depends on
                                  the choice of comparison matrix. The default
                                  value assumes you are using the EBLOSUM62
                                  matrix for protein sequences, and the
                                  EDNAFULL matrix for nucleotide sequences.
                                  (Floating point number from 1.0 to 100.0)
   -endextend          float      [0.5 for any sequence] The end gap
                                  extension, penalty is added to the end gap
                                  penalty for each base or residue in the end
                                  gap. (Floating point number from 0.0 to
                                  10.0)
   -minscore           float      [1.0 for any sequence] Minimum alignment
                                  score to report an alignment. (Floating
                                  point number from -10.0 to 100.0)
   -errorfile          outfile    [needleall.error] Error file to be written
                                  to

   Advanced (Unprompted) qualifiers:
   -[no]brief          boolean    [Y] Brief identity and similarity

   Associated qualifiers:

   "-asequence" associated qualifiers
   -sbegin1            integer    Start of each sequence to be used
   -send1              integer    End of each sequence to be used
   -sreverse1          boolean    Reverse (if DNA)
   -sask1              boolean    Ask for begin/end/reverse
   -snucleotide1       boolean    Sequence is nucleotide
   -sprotein1          boolean    Sequence is protein
   -slower1            boolean    Make lower case
   -supper1            boolean    Make upper case
   -sformat1           string     Input sequence format
   -sdbname1           string     Database name
   -sid1               string     Entryname
   -ufo1               string     UFO features
   -fformat1           string     Features format
   -fopenfile1         string     Features file name

   "-bsequence" associated qualifiers
   -sbegin2            integer    Start of each sequence to be used
   -send2              integer    End of each sequence to be used
   -sreverse2          boolean    Reverse (if DNA)
   -sask2              boolean    Ask for begin/end/reverse
   -snucleotide2       boolean    Sequence is nucleotide
   -sprotein2          boolean    Sequence is protein
   -slower2            boolean    Make lower case
   -supper2            boolean    Make upper case
   -sformat2           string     Input sequence format
   -sdbname2           string     Database name
   -sid2               string     Entryname
   -ufo2               string     UFO features
   -fformat2           string     Features format
   -fopenfile2         string     Features file name

   "-outfile" associated qualifiers
   -aformat3           string     Alignment format
   -aextension3        string     File name extension
   -adirectory3        string     Output directory
   -aname3             string     Base file name
   -awidth3            integer    Alignment width
   -aaccshow3          boolean    Show accession number in the header
   -adesshow3          boolean    Show description in the header
   -ausashow3          boolean    Show the full USA in the alignment
   -aglobal3           boolean    Show the full sequence in alignment

   "-errorfile" associated qualifiers
   -odirectory         string     Output directory

   General qualifiers:
   -auto               boolean    Turn off prompts
   -stdout             boolean    Write first file to standard output
   -filter             boolean    Read first file from standard input, write
                                  first file to standard output
   -options            boolean    Prompt for standard and additional values
   -debug              boolean    Write debug output to program.dbg
   -verbose            boolean    Report some/full command line options
   -help               boolean    Report command line options and exit. More
                                  information on associated and general
                                  qualifiers can be found with -help -verbose
   -warning            boolean    Report warnings
   -error              boolean    Report errors
   -fatal              boolean    Report fatal errors
   -die                boolean    Report dying program messages
   -version            boolean    Report version number and exit


Input file format

   needleall reads in two nucleotide or protein sequences inputs. Both can
   be one or more sequences. All sequences in the first ionput are aligned
   to all sequences in the second input.

   The input is a standard EMBOSS sequence query (also known as a 'USA').

   Major sequence database sources defined as standard in EMBOSS
   installations include srs:embl, srs:uniprot and ensembl

   Data can also be read from sequence output in any supported format
   written by an EMBOSS or third-party application.

   The input format can be specified by using the command-line qualifier
   -sformat xxx, where 'xxx' is replaced by the name of the required
   format. The available format names are: gff (gff3), gff2, embl (em),
   genbank (gb, refseq), ddbj, refseqp, pir (nbrf), swissprot (swiss, sw),
   dasgff and debug.

   See: http://emboss.sf.net/docs/themes/SequenceFormats.html for further
   information on sequence formats.

  Input files for usage example

  File: illumina_adapter_primer.fa

>Illumina_Genomici_DNA_Adapters1_1
GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
>Illumina_Genomic_DNA_Adapters1_2
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>Illumina_Genomic_DNA_PCR_Primers1_1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>Illumina_Genomic_DNA_PCR_Primers1_2
CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT
>Illumina_Genomic_DNA_sequencing_primer
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>Illumina_Paired_End_DNA_Adapters1_1
GATCGGAAGAGCGGTTCAGCAGGAATGCCGAG
>Illumina_Paired_End_DNA_Adapters1_2
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>Illumina_Paired_End_DNA_PCR_Primers1_1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>Illumina_Paired_End_DNA_PCR_Primers1_2
CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
>Illumina_Paired_End_DNA_sequencing_primer_1
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>Illumina_Paired_End_DNA_sequencing_primer_2
CGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT
>Illumina_DpnII_Gex_Adapters1_1
GATCGTCGGACTGTAGAACTCTGAAC
>Illumina_DpnII_Gex_Adapters1_2
ACAGGTTCAGAGTTCTACAGTCCGAC
>Illumina_DpnII_Gex_Adapters2_1
CAAGCAGAAGACGGCATACGA
>Illumina_DpnII_Gex_Adapters2_2
TCGTATGCCGTCTTCTGCTTG
>Illumina_DpnII_Gex_PCR_Primer_1
CAAGCAGAAGACGGCATACGA
>Illumina_DpnII_Gex_PCR_Primer_2
AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA
>Illumina_DpnII_Gex_sequencing_primer
CGACAGGTTCAGAGTTCTACAGTCCGACGATC
>Illumina_NlaIII_Gex_Adapters1_1
TCGGACTGTAGAACTCTGAAC
>Illumina_NlaIII_Gex_Adapters1_2
ACAGGTTCAGAGTTCTACAGTCCGACATG
>Illumina_NlaIII_Gex_Adapters2_1
CAAGCAGAAGACGGCATACGANN
>Illumina_NlaIII_Gex_Adapters2_2
TCGTATGCCGTCTTCTGCTTG
>Illumina_NlaIII_Gex_PCR_Primer_1
CAAGCAGAAGACGGCATACGA
>Illumina_NlaIII_Gex_PCR_Primer_2
AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA
>Illumina_NlaIII_Gex_sequencing_primer
CCGACAGGTTCAGAGTTCTACAGTCCGACATG
>Illumina_Small_RNA_RT_Primer
CAAGCAGAAGACGGCATACGA
>Illumina_Small_RNA_5p_Adapter
GTTCAGAGTTCTACAGTCCGACGATC
>Illumina_Small_RNA_3p_Adapter
TCGTATGCCGTCTTCTGCTTGT
>Illumina_Small_RNA_PCR_Primer_1
CAAGCAGAAGACGGCATACGA
>Illumina_Small_RNA_PCR_Primer_2
AATGATACGGCGACCACCGACAGGTTCAGAGTTCTACAGTCCGA
>Illumina_Small_RNA_sequencing_primer
CGACAGGTTCAGAGTTCTACAGTCCGACGATC


  File: test1_illumina.fastq

@FC12044_91407_8_200_406_24
GTTAGCTCCCACCTTAAGATGTTTA
+FC12044_91407_8_200_406_24
SXXTXXXXXXXXXTTSUXSSXKTMQ
@FC12044_91407_8_200_720_610
CTCTGTGGCACCCCATCCCTCACTT
+FC12044_91407_8_200_720_610
OXXXXXXXXXXXXXXXXXTSXQTXU
@FC12044_91407_8_200_345_133
GATTTTTTAACAATAAACGTACATA
+FC12044_91407_8_200_345_133
OQTOOSFORTFFFIIOFFFFFFFFF
@FC12044_91407_8_200_106_131
GTTGCCCAGGCTCGTCTTGAACTCC
+FC12044_91407_8_200_106_131
XXXXXXXXXXXXXXSXXXXISTXQS
@FC12044_91407_8_200_916_471
TGATTGAAGGTAGGGTAGCATACTG
+FC12044_91407_8_200_916_471
XXXXXXXXXXXXXXXUXXUSXXTXW
@FC12044_91407_8_200_57_85
GCTCCAATAGCGCAGAGGAAACCTG
+FC12044_91407_8_200_57_85
XFXMXSXXSXXXOSQROOSROFQIQ
@FC12044_91407_8_200_10_437
GCTGCTTGGGAGGCTGAGGCAGGAG
+FC12044_91407_8_200_10_437
USXSXXXXXXUXXXSXQXXUQXXKS
@FC12044_91407_8_200_154_436
AGACCTTTGGATACAATGAACGACT
+FC12044_91407_8_200_154_436
MKKMQTSRXMSQTOMRFOOIFFFFF
@FC12044_91407_8_200_336_64
AGGGAATTTTAGAGGAGGGCTGCCG
+FC12044_91407_8_200_336_64
STQMOSXSXSQXQXXKXXXKFXFFK
@FC12044_91407_8_200_620_233
TCTCCATGTTGGTCAGGCTGGTCTC
+FC12044_91407_8_200_620_233
XXXXXXXXXXXXXXXXXXXXXSXSW
@FC12044_91407_8_200_902_349
TGAACGTCGAGACGCAAGGCCCGCC
+FC12044_91407_8_200_902_349
XMXSSXMXXSXQSXTSQXFKSKTOF
@FC12044_91407_8_200_40_618
CTGTCCCCACGGCGGGGGGGCCTGG
+FC12044_91407_8_200_40_618
TXXXXSXXXXXXXXXXXXXRKFOXS
@FC12044_91407_8_200_83_511
GATGTACTCTTACACCCAGACTTTG
+FC12044_91407_8_200_83_511
SOXXXXXUXXXXXXQKQKKROOQSU
@FC12044_91407_8_200_76_246
TCAAGGGTGGATCTTGGCTCCCAGT
+FC12044_91407_8_200_76_246
XTXTUXXXXXRXXXTXXSUXSRFXQ
@FC12044_91407_8_200_303_427
TTGCGACAGAGTTTTGCTCTTGTCC
+FC12044_91407_8_200_303_427
XXQROXXXXIXFQXXXOIQSSXUFF
@FC12044_91407_8_200_31_299
TCTGCTCCAGCTCCAAGACGCCGCC
+FC12044_91407_8_200_31_299
XRXTSXXXRXXSXQQOXQTSQSXKQ
@FC12044_91407_8_200_553_135
TACGGAGCCGCGGGCGGGAAAGGCG
+FC12044_91407_8_200_553_135
XSQQXXXXXXXXXXSXXMFFQXTKU
@FC12044_91407_8_200_139_74
CCTCCCAGGTTCAAGCGATTATCCT
+FC12044_91407_8_200_139_74
RMXUSXTXXQXXQUXXXSQISISSO
@FC12044_91407_8_200_108_33
GTCATGGCGGCCCGCGCGGGGAGCG
+FC12044_91407_8_200_108_33
OOOSSXXSXXOMKMOFMKFOKFFFF
@FC12044_91407_8_200_980_965
ACAGTGGGTTCTTAAAGAAGAGTCG
+FC12044_91407_8_200_980_965
TOSSRXXXSSMSXMOMXIRXOXFFS
@FC12044_91407_8_200_981_857
AACGAGGGGCGCGACTTGACCTTGG
+FC12044_91407_8_200_981_857
RXMSSXXXXSXQXQXFSXQFQKMXS
@FC12044_91407_8_200_8_865
TTTCCCACCCCAGGAAGCCTTGGAC
+FC12044_91407_8_200_8_865
XXXFKOROMKOORMIMRIIKKORFF
@FC12044_91407_8_200_292_484
TCAGCCTCCGTGCCCAGCCCACTCC
+FC12044_91407_8_200_292_484
XQXOSXXXXXUXXXXIXXXXQTOXF
@FC12044_91407_8_200_675_16
CTCGGGAGGCTGAGGCAGGGGGGTT
+FC12044_91407_8_200_675_16
OXTXXXSXXQXXOXXKMXXMXOKQF
@FC12044_91407_8_200_285_136
CCAAATCTTGAATTGTAGCTCCCCT
+FC12044_91407_8_200_285_136
OSXOQXXXXXSXXUXXTXXXXTRMS

Output file format

   The output is a standard EMBOSS alignment file.

   The results can be output in one of several styles by using the
   command-line qualifier -aformat xxx, where 'xxx' is replaced by the
   name of the required format. Some of the alignment formats can cope
   with an unlimited number of sequences, while others are only for pairs
   of sequences.

   The available multiple alignment format names are: multiple, simple,
   fasta, msf, clustal, mega, meganon, nexus,, nexusnon, phylip,
   phylipnon, selex, treecon, tcoffee, debug, srs.

   The available pairwise alignment format names are: pair, markx0,
   markx1, markx2, markx3, markx10, match, sam, bam, score, srspair

   See: http://emboss.sf.net/docs/themes/AlignFormats.html for further
   information on alignment formats.

  Output files for usage example

  File: needleall.error

Alignment score (21.5) is less than minimum score(40.0) for sequences Illumina_G
enomici_DNA_Adapters1_1 vs FC12044_91407_8_200_406_24
Alignment score (24.5) is less than minimum score(40.0) for sequences Illumina_G
enomic_DNA_Adapters1_2 vs FC12044_91407_8_200_406_24
Alignment score (31.0) is less than minimum score(40.0) for sequences Illumina_G
enomic_DNA_PCR_Primers1_1 vs FC12044_91407_8_200_406_24
Alignment score (25.5) is less than minimum score(40.0) for sequences Illumina_G
enomic_DNA_PCR_Primers1_2 vs FC12044_91407_8_200_406_24
Alignment score (24.5) is less than minimum score(40.0) for sequences Illumina_G
enomic_DNA_sequencing_primer vs FC12044_91407_8_200_406_24
Alignment score (16.5) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_Adapters1_1 vs FC12044_91407_8_200_406_24
Alignment score (24.5) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_Adapters1_2 vs FC12044_91407_8_200_406_24
Alignment score (31.0) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_PCR_Primers1_1 vs FC12044_91407_8_200_406_24
Alignment score (21.0) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_PCR_Primers1_2 vs FC12044_91407_8_200_406_24
Alignment score (24.5) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_sequencing_primer_1 vs FC12044_91407_8_200_406_24
Alignment score (21.0) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_sequencing_primer_2 vs FC12044_91407_8_200_406_24
Alignment score (14.5) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters1_1 vs FC12044_91407_8_200_406_24
Alignment score (24.5) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters1_2 vs FC12044_91407_8_200_406_24
Alignment score (12.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters2_1 vs FC12044_91407_8_200_406_24
Alignment score (12.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters2_2 vs FC12044_91407_8_200_406_24
Alignment score (12.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_PCR_Primer_1 vs FC12044_91407_8_200_406_24
Alignment score (23.5) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_sequencing_primer vs FC12044_91407_8_200_406_24
Alignment score (12.5) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters1_1 vs FC12044_91407_8_200_406_24
Alignment score (27.0) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters1_2 vs FC12044_91407_8_200_406_24
Alignment score (12.0) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters2_1 vs FC12044_91407_8_200_406_24
Alignment score (12.0) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters2_2 vs FC12044_91407_8_200_406_24
Alignment score (12.0) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_PCR_Primer_1 vs FC12044_91407_8_200_406_24
Alignment score (27.5) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_sequencing_primer vs FC12044_91407_8_200_406_24
Alignment score (12.0) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_RT_Primer vs FC12044_91407_8_200_406_24
Alignment score (23.5) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_5p_Adapter vs FC12044_91407_8_200_406_24
Alignment score (13.0) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_3p_Adapter vs FC12044_91407_8_200_406_24
Alignment score (12.0) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_PCR_Primer_1 vs FC12044_91407_8_200_406_24
Alignment score (23.5) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_sequencing_primer vs FC12044_91407_8_200_406_24
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_G
enomici_DNA_Adapters1_1 vs FC12044_91407_8_200_720_610
Alignment score (31.5) is less than minimum score(40.0) for sequences Illumina_G
enomic_DNA_Adapters1_2 vs FC12044_91407_8_200_720_610
Alignment score (31.5) is less than minimum score(40.0) for sequences Illumina_G
enomic_DNA_PCR_Primers1_1 vs FC12044_91407_8_200_720_610
Alignment score (20.5) is less than minimum score(40.0) for sequences Illumina_G
enomic_DNA_PCR_Primers1_2 vs FC12044_91407_8_200_720_610
Alignment score (31.5) is less than minimum score(40.0) for sequences Illumina_G
enomic_DNA_sequencing_primer vs FC12044_91407_8_200_720_610
Alignment score (0.0) is less than minimum score(40.0) for sequences Illumina_Pa
ired_End_DNA_Adapters1_1 vs FC12044_91407_8_200_720_610
Alignment score (31.5) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_Adapters1_2 vs FC12044_91407_8_200_720_610
Alignment score (31.5) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_PCR_Primers1_1 vs FC12044_91407_8_200_720_610
Alignment score (33.5) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_PCR_Primers1_2 vs FC12044_91407_8_200_720_610
Alignment score (31.5) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_sequencing_primer_1 vs FC12044_91407_8_200_720_610
Alignment score (33.5) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_sequencing_primer_2 vs FC12044_91407_8_200_720_610
Alignment score (20.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters1_1 vs FC12044_91407_8_200_720_610
Alignment score (9.0) is less than minimum score(40.0) for sequences Illumina_Dp
nII_Gex_Adapters1_2 vs FC12044_91407_8_200_720_610
Alignment score (11.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters2_1 vs FC12044_91407_8_200_720_610
Alignment score (15.5) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters2_2 vs FC12044_91407_8_200_720_610
Alignment score (11.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_PCR_Primer_1 vs FC12044_91407_8_200_720_610
Alignment score (10.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_PCR_Primer_2 vs FC12044_91407_8_200_720_610
Alignment score (15.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_sequencing_primer vs FC12044_91407_8_200_720_610
Alignment score (20.0) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters1_1 vs FC12044_91407_8_200_720_610
Alignment score (9.5) is less than minimum score(40.0) for sequences Illumina_Nl
aIII_Gex_Adapters1_2 vs FC12044_91407_8_200_720_610
Alignment score (7.0) is less than minimum score(40.0) for sequences Illumina_Nl
aIII_Gex_Adapters2_1 vs FC12044_91407_8_200_720_610
Alignment score (15.5) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters2_2 vs FC12044_91407_8_200_720_610


  [Part of this file has been deleted for brevity]

Alignment score (13.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters1_1 vs FC12044_91407_8_200_675_16
Alignment score (17.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters1_2 vs FC12044_91407_8_200_675_16
Alignment score (13.5) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters2_1 vs FC12044_91407_8_200_675_16
Alignment score (11.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters2_2 vs FC12044_91407_8_200_675_16
Alignment score (13.5) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_PCR_Primer_1 vs FC12044_91407_8_200_675_16
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_PCR_Primer_2 vs FC12044_91407_8_200_675_16
Alignment score (22.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_sequencing_primer vs FC12044_91407_8_200_675_16
Alignment score (13.0) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters1_1 vs FC12044_91407_8_200_675_16
Alignment score (17.0) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters1_2 vs FC12044_91407_8_200_675_16
Alignment score (13.5) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters2_1 vs FC12044_91407_8_200_675_16
Alignment score (11.0) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters2_2 vs FC12044_91407_8_200_675_16
Alignment score (13.5) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_PCR_Primer_1 vs FC12044_91407_8_200_675_16
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_PCR_Primer_2 vs FC12044_91407_8_200_675_16
Alignment score (21.5) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_sequencing_primer vs FC12044_91407_8_200_675_16
Alignment score (13.5) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_RT_Primer vs FC12044_91407_8_200_675_16
Alignment score (15.0) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_5p_Adapter vs FC12044_91407_8_200_675_16
Alignment score (7.0) is less than minimum score(40.0) for sequences Illumina_Sm
all_RNA_3p_Adapter vs FC12044_91407_8_200_675_16
Alignment score (13.5) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_PCR_Primer_1 vs FC12044_91407_8_200_675_16
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_PCR_Primer_2 vs FC12044_91407_8_200_675_16
Alignment score (22.0) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_sequencing_primer vs FC12044_91407_8_200_675_16
Alignment score (21.0) is less than minimum score(40.0) for sequences Illumina_G
enomici_DNA_Adapters1_1 vs FC12044_91407_8_200_285_136
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_G
enomic_DNA_Adapters1_2 vs FC12044_91407_8_200_285_136
Alignment score (30.0) is less than minimum score(40.0) for sequences Illumina_G
enomic_DNA_PCR_Primers1_1 vs FC12044_91407_8_200_285_136
Alignment score (16.5) is less than minimum score(40.0) for sequences Illumina_G
enomic_DNA_PCR_Primers1_2 vs FC12044_91407_8_200_285_136
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_G
enomic_DNA_sequencing_primer vs FC12044_91407_8_200_285_136
Alignment score (7.0) is less than minimum score(40.0) for sequences Illumina_Pa
ired_End_DNA_Adapters1_1 vs FC12044_91407_8_200_285_136
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_Adapters1_2 vs FC12044_91407_8_200_285_136
Alignment score (30.0) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_PCR_Primers1_1 vs FC12044_91407_8_200_285_136
Alignment score (21.0) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_PCR_Primers1_2 vs FC12044_91407_8_200_285_136
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_sequencing_primer_1 vs FC12044_91407_8_200_285_136
Alignment score (18.5) is less than minimum score(40.0) for sequences Illumina_P
aired_End_DNA_sequencing_primer_2 vs FC12044_91407_8_200_285_136
Alignment score (27.5) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters1_1 vs FC12044_91407_8_200_285_136
Alignment score (13.5) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters1_2 vs FC12044_91407_8_200_285_136
Alignment score (6.0) is less than minimum score(40.0) for sequences Illumina_Dp
nII_Gex_Adapters2_1 vs FC12044_91407_8_200_285_136
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_Adapters2_2 vs FC12044_91407_8_200_285_136
Alignment score (6.0) is less than minimum score(40.0) for sequences Illumina_Dp
nII_Gex_PCR_Primer_1 vs FC12044_91407_8_200_285_136
Alignment score (12.0) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_PCR_Primer_2 vs FC12044_91407_8_200_285_136
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_D
pnII_Gex_sequencing_primer vs FC12044_91407_8_200_285_136
Alignment score (26.5) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters1_1 vs FC12044_91407_8_200_285_136
Alignment score (14.5) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters1_2 vs FC12044_91407_8_200_285_136
Alignment score (2.0) is less than minimum score(40.0) for sequences Illumina_Nl
aIII_Gex_Adapters2_1 vs FC12044_91407_8_200_285_136
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_Adapters2_2 vs FC12044_91407_8_200_285_136
Alignment score (6.0) is less than minimum score(40.0) for sequences Illumina_Nl
aIII_Gex_PCR_Primer_1 vs FC12044_91407_8_200_285_136
Alignment score (12.0) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_PCR_Primer_2 vs FC12044_91407_8_200_285_136
Alignment score (15.5) is less than minimum score(40.0) for sequences Illumina_N
laIII_Gex_sequencing_primer vs FC12044_91407_8_200_285_136
Alignment score (6.0) is less than minimum score(40.0) for sequences Illumina_Sm
all_RNA_RT_Primer vs FC12044_91407_8_200_285_136
Alignment score (15.5) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_5p_Adapter vs FC12044_91407_8_200_285_136
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_3p_Adapter vs FC12044_91407_8_200_285_136
Alignment score (6.0) is less than minimum score(40.0) for sequences Illumina_Sm
all_RNA_PCR_Primer_1 vs FC12044_91407_8_200_285_136
Alignment score (12.0) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_PCR_Primer_2 vs FC12044_91407_8_200_285_136
Alignment score (17.5) is less than minimum score(40.0) for sequences Illumina_S
mall_RNA_sequencing_primer vs FC12044_91407_8_200_285_136

   The Identity: is the percentage of identical matches between the two
   sequences over the reported aligned region (including any gaps in the
   length).

   The Similarity: is the percentage of matches between the two sequences
   over the reported aligned region (including any gaps in the length).

Data files

   For protein sequences EBLOSUM62 is used for the substitution matrix.
   For nucleotide sequence, EDNAFULL is used. Others can be specified.

   EMBOSS data files are distributed with the application and stored in
   the standard EMBOSS data directory, which is defined by the EMBOSS
   environment variable EMBOSS_DATA.

   To see the available EMBOSS data files, run:

% embossdata -showall

   To fetch one of the data files (for example 'Exxx.dat') into your
   current directory for you to inspect or modify, run:

% embossdata -fetch -file Exxx.dat


   Users can provide their own data files in their own directories.
   Project specific files can be put in the current directory, or for
   tidier directory listings in a subdirectory called ".embossdata". Files
   for all EMBOSS runs can be put in the user's home directory, or again
   in a subdirectory called ".embossdata".

   The directories are searched in the following order:
     * . (your current directory)
     * .embossdata (under your current directory)
     * ~/ (your home directory)
     * ~/.embossdata

Notes

   needleall is a true implementation of the Needleman-Wunsch algorithm
   and so produces a full path matrix. It therefore cannot be used with
   genome sized sequences unless you've a lot of memory and a lot of time.

References

    1. Needleman, S. B. and Wunsch, C. D. (1970) J. Mol. Biol. 48,
       443-453.
    2. Kruskal, J. B. (1983) An overview of squence comparison In D.
       Sankoff and J. B. Kruskal, (ed.), Time warps, string edits and
       macromolecules: the theory and practice of sequence comparison, pp.
       1-44 Addison Wesley.

Warnings

   needleall is for aligning pairs of sequences over their entire length.
   This works best with closely related sequences. If you use needleall to
   align very distantly-related sequences, it will produce a result but
   much of the alignment may have little or no biological significance.

   A true Needleman Wunsch implementation like needleall needs memory
   proportional to the product of the sequence lengths. For two sequences
   of length 10,000,000 and 1,000 it therefore needs memory proportional
   to 10,000,000,000 characters. Two arrays of this size are produced, one
   of ints and one of floats so multiply that figure by 8 to get the
   memory usage in bytes. That doesn't include other overheads. Therefore
   only use water and needle for accurate alignment of reasonably short
   sequences.

   The first input sequence set is loaded completely into memory. When
   comparing large numbers (or lengths) of sequences, the smallest set
   should be the first input to make the most efficient use of memory.

   If you run out of memory, try using stretcher instead.

Diagnostic Error Messages

Uncaught exception
 Assertion failed
 raised at ajmem.c:xxx

   Probably means you have run out of memory. Try using stretcher if this
   happens.

Exit status

   0 upon successful completion.

Known bugs

   None.

See also

                    Program name                       Description
                    est2genome   Align EST sequences to genomic DNA sequence
                    needle       Needleman-Wunsch global alignment of two sequences
                    stretcher    Needleman-Wunsch rapid global alignment of two sequences

                    When you want an alignment that covers the whole length of two
                    sequences, use needle.

                    When you are trying to find the best region of similarity between two
                    sequences, use water.

                    stretcher is a more suitable program to use to find global alignments
                    of very long sequences.

Author(s)

   Mahmut           Uludag
   European         Bioinformatics Institute, Wellcome Trust Genome Campus,
   Hinxton,         Cambridge CB10 1SD, UK

                    Please report all bugs to the EMBOSS bug team
                    (emboss-bug (c) emboss.open-bio.org) not to the original author.

History

Target users

                    This program is intended to be used by everyone and everything, from
                    naive users to embedded scripts.

Comments

                    None