Version 6.4.0 15-Jul-2011

	DBXFLAT can index FASTQ format short
	read sequence files, allowing individual sequences to be rapidly
	retrieved by name.

	GenPept format has changed since we last tested it. The LOCUS line
	is simpler. EMBOSS now supports GenPept as documented and
	distributed by NCBI.

	Sequence input in SAM format now ignores the reference sequence
	name. Previous releases saved it as the accession number, but this
	is inappropriate as it is then reported as the identifier in EMBL
	format.

	The -help output (and documentation) for align and report output
	types now includes the default format if defined in the ACD file.

	New code added to handle variation data in ajvar* source
	files. The AjPVar object will hold genetic variation data from the
	Ensembl API and from VCF input files.

	New access methods for URLs have been added as ajurlread.c and for
	URL output methods as ajurlwrite.c - supporting collecting and
	reporting of URLs as output. URLs are saved as an array of strings,
	intended to be reported as a set of links to the underlying data.

	Sequence format "raw" now only reads binary files, which means it
	cannot be used for piped data. The change was needed to avoid
	accepting binary data where a file has a NULL and then no newline,
	for example ABI data files where the initial 'ABIF' could be read
	as a valid sequence.

	Application tcode failed to plot results for more than one
	sequence.  It also reported a plplot error when reading random
	noncoding input. It also failed to report the threshold lines
	when they were outside the range of observed scores.

	Four new functions combine tables where the keys and values are of
	the same types. In each case the tables are resized to the larger
	of the hash array sizes, and then at each hash array position all
	keys in both tables are compared. The functions differ only in the
	actions taken when a match is or is not found. ajTableMergeAnd
	keeps all keys that are in both tables. ajTableMergeEor is the
	inverse, keeping only keys that are in only one table.
	ajTableMergeNot removes keys that are also in the second
	table. ajTableMergeOr adds keys from the second table that do not
	match. All remaining keys and values are deleted using the tables'
	built-in destructor functions.
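
	The four merge behaviours can be sketched with Python dictionaries
	standing in for AjPTable. This is an illustration only, not the
	AJAX implementation: the function names are hypothetical, the hash
	array resizing is omitted, and keeping the first table's value
	when a key matches is an assumption.

```python
def merge_and(first, second):
    """Keep only the keys of `first` that are also in `second`."""
    for key in list(first):
        if key not in second:
            del first[key]


def merge_eor(first, second):
    """Keep only the keys that are in exactly one of the two tables."""
    for key, value in second.items():
        if key in first:
            del first[key]
        else:
            first[key] = value


def merge_not(first, second):
    """Remove from `first` every key that is also in `second`."""
    for key in second:
        first.pop(key, None)


def merge_or(first, second):
    """Add keys from `second` that do not match a key in `first`."""
    for key, value in second.items():
        first.setdefault(key, value)


def demo():
    """Apply each merge to a fresh copy of the same first table."""
    second = {"P2": 20, "P4": 40}
    results = {}
    for name, merge in (("and", merge_and), ("eor", merge_eor),
                        ("not", merge_not), ("or", merge_or)):
        first = {"P1": 1, "P2": 2, "P3": 3}
        merge(first, second)
        results[name] = first
    return results
```

	Each function changes the first table in place, which mirrors the
	description above where unwanted keys and values are deleted
	rather than copied to a new table.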

	New application dbtell reports the attributes for a database.

	All messages written to the user are also logged to the debug file
	to help locate where they are generated when debugging.

	Applications showfeat, extractfeat and coderet are updated to
	follow the new features /subfeatures data structures.

	GFF3 format has been rewritten to comply strictly with the GFF3
	standard on the sequence ontology website. Characters are now
	escaped in tag values. The 'featflag' tag has been changed to
	convert the hex value into a readable list of flags, with some
	flags now inferred from the content of the GFF line. The GFF3
	special tags (all starting with an upper-case letter) are now
	stored separately. The ID and Parent tags are used in
	post-processing to build subfeatures which are stored under the
	feature with an ID matching their first Parent tag.
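
	A minimal Python sketch of two of the ideas above: escaping
	reserved characters in tag values, and attaching each feature to
	the feature whose ID matches its first Parent tag. The dictionary
	layout and function names are hypothetical and this is not the
	EMBOSS code. The escaped character set follows the reserved
	characters listed for the attributes column in the GFF3
	specification.

```python
RESERVED = set(";=&,%\t\n\r")


def escape_value(text):
    """Percent-encode characters that are reserved in GFF3 tag values."""
    out = []
    for ch in text:
        if ch in RESERVED or ord(ch) < 0x20 or ord(ch) == 0x7F:
            out.append("%{:02X}".format(ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def build_subfeatures(features):
    """Nest features under the feature named by their first Parent tag.

    Each feature is a dict with an optional "ID" string and an optional
    "Parent" list. Returns the features that have no known parent.
    """
    by_id = {feat["ID"]: feat for feat in features if "ID" in feat}
    top_level = []
    for feat in features:
        parents = feat.get("Parent", [])
        parent = by_id.get(parents[0]) if parents else None
        if parent is None or parent is feat:
            top_level.append(feat)
        else:
            parent.setdefault("subfeatures", []).append(feat)
    return top_level
```

	Only the first Parent value is consulted, so an exon shared by two
	transcripts is stored once, under the first transcript listed.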

	GFF3 format sequence format failed to read files with additional
	## comment records after the header block. These comments are now
	ignored.

	Feature objects have been extended. A feature may now include a
	list of subfeatures. This is intended to allow exons to be stored
	under the feature to which they belong. With this new structure,
	sorting feature tables becomes easy as there is no need to match
	group tags and sort by ID. Features simply sort by their main
	(parent) feature, with the other subfeatures (exons) unseen by the
	sort algorithm.

	Application restrict crashed when the enzyme list was empty. It
	reported invalid enzyme names, but not 'no enzyme name given'.

	Reference-counted lists are enabled with the constructor
	ajListNewRef creating a reference-counted copy. Lists are only
	deleted when the reference count falls to zero.

	Reference-counted tables are enabled with the constructor
	ajTableNewRef creating a reference-counted copy. Tables are only
	deleted when the reference count falls to zero.

	Table code has been rewritten to automatically delete keys unless
	the table is created with a Const version of the constructor. All
	table constructors are renamed, with the older names retained as
	"deprecated" functions which do not delete keys or values. All
	EMBOSS code has been changed to use the new function names.

	New functions ajTableMatch, ajTableMatchC and ajTableMatchS test
	whether a key is present in a table. They can be used where
	ajTableFetch is inadequate because the value may be NULL. Some code used
	ajTableFetchKey but this is intended only for case-insensitive keys.

	Tables (AjPTable) have defined functions to hash and compare
	keys. Two new functions can be defined to delete keys and
	values. By default these are NULL and no keys or values are
	deleted. The functions can be ajMemFree to simply free memory, or
	more complex object destructors. As these require a void** argument
	(all keys and values are void* internally) wrappers are needed
	around object destructors. We recommend appending 'Void' to the
	standard destructor name and casting the void** argument to pass
	to the object-specific destructor.

	Tables (AjPTable) can be resized using the ajTableResizeLen
	function. When adding to a table with ajTablePut the table is
	automatically resized when the number of entries exceeds an
	average of 8 per bucket.
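
	The resize rule can be illustrated with a small chained hash table
	in Python. This is a sketch under stated assumptions, not the
	AjPTable code: the class and method names are hypothetical, and
	doubling the bucket count on resize is an assumption. Only the
	threshold of an average of 8 entries per bucket comes from the
	text above.

```python
class BucketTable:
    """Toy chained hash table that grows when buckets get crowded."""

    MAX_AVERAGE = 8  # average entries per bucket before resizing

    def __init__(self, size=4):
        self.buckets = [[] for _ in range(size)]
        self.count = 0

    def resize(self, size):
        """Rebuild the bucket array with `size` buckets."""
        entries = [item for bucket in self.buckets for item in bucket]
        self.buckets = [[] for _ in range(size)]
        for key, value in entries:
            self.buckets[hash(key) % size].append((key, value))

    def put(self, key, value):
        """Insert or replace a key, then resize if the table is too full."""
        bucket = self.buckets[hash(key) % len(self.buckets)]
        for i, (existing, _) in enumerate(bucket):
            if existing == key:
                bucket[i] = (key, value)
                return
        bucket.append((key, value))
        self.count += 1
        if self.count > self.MAX_AVERAGE * len(self.buckets):
            self.resize(2 * len(self.buckets))  # growth factor is assumed

    def get(self, key, default=None):
        """Return the value stored for `key`, or `default`."""
        for existing, value in self.buckets[hash(key) % len(self.buckets)]:
            if existing == key:
                return value
        return default
```

	With 4 buckets the table holds 32 entries before the next insert
	triggers a resize.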

	Function ajMemFree now accepts a void** argument and sets the
	pointer to zero after freeing the memory. All EMBOSS code calls this
	through the AJFREE macro which is now safer to use as the pointer
	appears only once in the generated code.

	Application digest conflicted with the name of a utility on some
	systems. It has been renamed to pepdigest.

	In the emboss.standard and emboss.default files certain attributes
	can appear more than once if defined as type "ATTR_LIST" in the
	ajnam.c source file. These include a new attribute 'field:' defined
	once for each database query field, superseding the 'fields:'
	list of field names. The 'field:' attribute has a list of field
	names, with the first being the name preferred by EMBOSS and
	others acceptable on the command line. A '!' delimiter marks the
	end of the field names and the start of a free text description.
	This style of description is also allowed for other attributes,
	including 'taxon:' and the 'edam*:' attributes. The syntax is
	taken from the metadata in OBO format.
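
	As an illustration, a 'field:' value could be split as follows.
	This Python sketch is not the ajnam.c parser: the function name is
	hypothetical, the example values are invented, and separating the
	field names with white space is an assumption. Only the '!'
	delimiter and the meaning of the first name come from the text
	above.

```python
def parse_field_attribute(value):
    """Split a 'field:' value into preferred name, aliases, description."""
    names_part, _, description = value.partition("!")
    names = names_part.split()
    if not names:
        raise ValueError("field attribute has no field name")
    # The first name is the one preferred by EMBOSS; the others are
    # alternatives accepted on the command line.
    return names[0], names[1:], description.strip()
```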

	Data retrieval using the HTTP protocol now checks for redirects in
	the header and replaces the file buffer with the results from the
	new URL. This allows EMBOSS to read outdated URLs for database
	access.
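
	The header check can be sketched as a function that inspects the
	response header text. This is a hedged Python illustration, not
	the EMBOSS HTTP code: the function name is hypothetical and the
	list of status codes treated as redirects is an assumption.

```python
REDIRECT_CODES = {301, 302, 303, 307, 308}  # assumed set of redirect codes


def redirect_target(header_text):
    """Return the Location URL if the header is a redirect, else None."""
    lines = header_text.splitlines()
    if not lines:
        return None
    status = lines[0].split()
    if len(status) < 2 or not status[1].isdigit():
        return None
    if int(status[1]) not in REDIRECT_CODES:
        return None
    for line in lines[1:]:
        if not line:
            break  # a blank line ends the header block
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "location":
            return value.strip()
    return None
```

	A caller would fetch the returned URL and use that response in
	place of the original buffer.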

	New trace functions ajTableFetchTrace and ajTablePutTrace help to
	debug adding new keys to a table.

	New parsing function ajStrTokenNextParseDelimiters returns the
	delimiter string in addition to the token parsed from a string
	token handler.
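
	The idea of returning the delimiter together with the token can be
	shown in a few lines of Python. This is a sketch only: the
	function name and its position-based interface are hypothetical
	and do not reflect the ajStrTokenNextParseDelimiters signature.

```python
def next_token(text, pos, delimiters):
    """Return (token, delimiter_string, next_position) starting at pos."""
    end = len(text)
    start = pos
    while pos < end and text[pos] not in delimiters:
        pos += 1
    token = text[start:pos]
    delim_start = pos
    while pos < end and text[pos] in delimiters:
        pos += 1
    # The run of delimiter characters that ended the token is returned
    # so the caller can tell, for example, ':' from ', '.
    return token, text[delim_start:pos], pos
```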

	Application einverted could report a bad alignment if the matched
	region reached the end of the search window. Matches which go
	beyond the search window are now ignored. This bug was reported
	with a very low threshold score and was unlikely to be noticed
	with the default settings.

	Sequence format treecon failed if the only line of input started
	with a number. Failure to find a second record now simply returns
	false.

	Tables can now use integer keys and values of four types - integer
	and long, signed and unsigned. The unsigned longs are used
	internally for emblcd index reading and for b+tree index creation.

	Report output from pattern matching applications (fuzznuc,
	fuzzpro, fuzztran, dreg, preg) now includes the pattern as well
	as the pattern name in the '*pat' or 'Pattern_name' feature tag
	value.

	New applications search the EDAM ontology by each of its query
	fields, with common options to restrict the results to one of the
	7 EDAM namespaces. Further new applications look for EDAM terms
	with each of the 5 common relationships for EDAM data terms:
	has_input, has_output, is_identifier_of, is_format_of and
	is_source_of. The sixth relationship has_attribute is only used by
	the obsolete 'entity' namespace terms.

	New application dbxresource indexes the data resource catalogue
	DRCAT.dat which is distributed with EMBOSS. Most fields in DRCAT
	are indexed. The EDAM and Taxon fields are used by other
	applications to search the EDAM and TAXON databases for terms which
	are in turn used to select DRCAT entries by taxon, data type,
	format, identifier and resource.

	Any menu (list and selection ACD types) which allows all options
	to be selected now accepts "*" to select everything. This can be
	the default (e.g. for database index fields) or can be specified
	by the user with quotes to protect it from interpretation by the
	Unix shell.

	Tokens indexed with the dbx* programs now have white space indexed
	as underscores. Any index files with spaces in the tokens need to
	be re-indexed. This applies to keyword and organism indexes.

	New code added to handle short read assemblies in ajassem* source
	files. The AjPAssem object will hold large numbers of short reads
	in managed memory buffers.

	New template for adding data types with specific formats for input
	and output and data access methods. These templates are stored in
	ajwxyz* source files with a script newdatatypes.pl to
	automatically create new, properly named, stub functions in the
	AJAX core and ajaxdb libraries.

	Program nthseq now simply reports an error (not a fatal error) if
	too few sequences were read.

	Feature input and output was in one large file. This has now been
	refactored with ajfeatdata.h for the data structures, ajfeatread.c
	for input formats, ajfeatwrite.c for output formats and remaining
	feature object handling code in ajfeat.c.

	New access methods for text have been added as ajtextread.c and
	for text output methods as ajtextwrite.c - supporting text and
	(preserved) HTML and XML output. Text is saved as an array of
	strings, intended to be used as one per input record although
	storing the entire text in the first string is also possible.

	Data queries have been made general. A new AjPQuery object handles
	queries for any datatype, storing a list of field names and
	queries, plus an operator (OR, AND, NOT, EOR, ELSE) for combining
	fields. Previous releases had a hard-coded search for "id or
	accession" which now uses the new query structure. Extensions to
	the query language will allow more complex combinations, and will
	allow any field to be defined for an external data resource
	(e.g. fields for an SRSWWW server).

	All data reading access methods have been restructured. Methods
	that essentially return an open file with the pointer set to the
	start of an entry (which covers most of the original access
	methods) are moved to a new source file ajtextdb.c and use a new
	AjPTextin input object which is included within AjPSeqin for
	sequence input and AjPOboin for OBO term input. These functions
	are generalised for any input data in some text-based file
	format. Sequence access will first check for a text-based access
	method, and then for a sequence-specific method (e.g. ensembl).
	Other input datatypes can do the same. The code for OBO ontology
	terms will use the new text access methods. Code for access to
	other input data types (feature, alignment) will now be relatively
	easy to add. Text retrieval of data from a new list of data
	resources can also use these access methods.

	Program einverted required at least one base between the halves of
	an inverted repeat. Blunt joins are now reported where previous
	versions reported a 2 base gap.

	Error messages from database indexing now include the filename of
	the index file. This is useful when identifying the indexing
	operation where the problem occurred.

	EMBOSS database index files are extended to mark numeric and
	string index pages. In previous releases all were marked as
	strings. Older index files remain valid for sequence retrieval,
	but not for the new dbxreport index analysis application.

	New application dbxreport analyses the contents of an EMBOSS
	index, reporting the numbers of keys of various types, number of
	pages, and percent free space. It also checks that all pages in
	the index have been used and are linked to a higher page.

	New application dbxedam is an extended version of dbxobo which
	also indexes EDAM-specific relationships between terms.

	New application dbxobo indexes OBO format ontology files. Index
	fields are id, acc (alt_id records), name (name and synonym
	records), ns (namespace records), isa (is_a records pointing to
	the parent term) and des (def records).

	EMBOSS database index files include an extra count value
	"fullcount" for the total number of words indexed. The "count"
	value is the number of unique terms (for example, words in
	descriptions or accession numbers).

	EMBOSS database index files include an extra type value "Type"
	with the value "Identifier" for a simple primary identifier such
	as ID or accession, and "Secondary" for an index of secondary terms
	which points to the entry unique ID.

	Database indexing application dbxfasta could corrupt index files with
	long words in the description index. Dbxfasta now checks the
	maximum word length, and as an added safeguard the indexing
	library code also checks and truncates any word longer than the
	maximum.

	New application seqcount returns the number of sequences read.
	This simple application was requested on the EMBOSS mailing list
	to avoid complicated command line manipulations and unnecessary
	sequence output.

	acdpretty now writes lines up to 75 characters wide. The width was
	restricted to 50 to allow space for in-line comments but this
	restricted the length of indented text too severely.

	In emboss.default and the user's .embossrc file variables are now
	resolved at read time, including the names of include files. This
	can simplify the configuration files for sites running more than
	one installation.

	Patched: SAM format file entries with negative insert sizes are
	valid but were wrongly rejected.

	Patched: BAM format misread the quality scores. An offset of 33
	used to report values for debugging was incorrectly included in
	the stored values.
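
	To make the offset issue concrete: BAM stores each quality as a
	raw phred value in one byte, while Sanger FASTQ text adds 33 to
	get a printable character. The Python sketch below is an
	illustration with hypothetical function names, not the EMBOSS
	reader.

```python
FASTQ_OFFSET = 33  # used only when writing qualities as text


def bam_qualities(raw):
    """Return phred scores from BAM quality bytes, with no offset."""
    return list(raw)


def fastq_quality_string(scores):
    """Encode phred scores as a Sanger FASTQ quality string."""
    return "".join(chr(score + FASTQ_OFFSET) for score in scores)
```

	Adding the offset when storing the scores, rather than only when
	printing them, shifts every stored value by 33.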

	Configuration now uses autoheader and has less dependency
	on the libtool version.

Version 6.3.0 15-Jul-2010

	'ensembl' is a new access method for accessing Ensembl
	from MySQL. Queries take the form:
	   seqret ensembl:human:ENST00000262160
	   seqret ensembl:human:ENST0000026216?
	   seqret ensembl:human:ENSE00001533831
	showing that transcripts, translations and exons are retrievable
	and that partial queries are allowed. Example database
	definitions are given in the emboss.default.template file. Please
	read the note above those definitions regarding fair use of
	the public Ensembl servers.

	'sql' is a new access method for networked SQL servers
	(MySQL or PostgreSQL). The server and database are described
	using the 'url' field. As for biomart (described below) the
	database definition must include definitions of new attributes
	'sequence' (the sequence column) and 'identifier' (the
	column used in the query). Additional columns may be
	returned as description text if they are listed in the 'returns'
	attribute of the DB definition. An example definition is
	given in emboss.default.template.

	tfextract has been updated to deal with multiple pattern lines
	and empty sequence lines.

	Three automatic EMBOSS environment variables are
	added. EMBOSS_INSTALLDIRECTORY is the installation directory
	reported by embossversion -full, EMBOSS_BASEDIRECTORY is the base
	directory reported by embossversion -full, and
	EMBOSS_ROOTDIRECTORY is the root directory reported by
	embossversion -full. These are needed to allow the QA test
	database definitions to point to the test data for the current
	installation, and appear in the test/.embossrc file.

	Validation of EMBL/GenBank feature tables has been updated by
	reading EMBL release 104 (June 2010) and allowing many feature
	qualifier non-standard values that appear in that release.

	Biomart is a new access method for sequence databases. The
	database definition must include definitions of new attributes
	'sequence' (the biomart sequence attribute) and 'identifier' (the
	Biomart identifier attribute). Additional attributes may be
	returned as description text if they are listed in the 'returns'
	attribute of the DB definition. An example definition is
	given in emboss.default.template.

	Database definitions have a new attribute serverversion which is
	used by SRSWWW access to choose the best way to retrieve data.

	SRSWWW database access, for example from the EBI's srs.ebi.ac.uk
	server, had a problem processing queries returning more than 30
	entries. This is now corrected by first asking the server for the
	number of entries and then accessing the data in chunks. This will
	unfortunately slow down SRSWWW access for single entries but was
	the only solution available after checking with EBI's SRS support
	team.

	Infoseq has a new column "organism" which shows the species line
	from an EMBL or UniProt entry. In a future release this may be
	changed to show the standard name for the NCBI taxon identifier
	from an entry as the species definitions for these databases can
	be long with alternative names and possibly additional species.

	Amino acid 280nm extinction coefficients in file Eamino.dat have
	been adjusted to match those of the Expasy 'protparam' tool.
	Pepstats now reports values with cysteine residues reduced and as
	cysteine bridges.

	Database types, originally defined as simply "N" for nucleotide
	and "P" for protein, should now be named in full. The names are
	expanded automatically when reading the definitions in the
	emboss.default and .embossrc files. Expanding the types allows for
	new database types to be added in the near future.

	EMBOSS can now read and write BAM (binary SAM) sequence files to
	extract all sequences and quality scores, for example to write
	them out in FASTQ format. Although BAM data can also be read
	through a pipe as standard input, in this case the format must be
	specified on the command line as it is not currently possible for
	EMBOSS to read a buffered text file as binary data.

	Needle dynamic programming algorithm updated to allow adjacent
	gaps in opposite strands.

	Rabin-Karp multi pattern search algorithm moved into the nucleus
	library. supermatcher application seed finding step updated to use
	Rabin-Karp multi-pattern search.

	Banded Smith-Waterman algorithm used by supermatcher and
	wordfinder applications has been revised, fixing a problem with
	occasional inconsistent alignments. Basic SAM format support has
	been added for these two applications as well as for the wordmatch
	application. supermatcher treats the second sequence as the
	reference sequence while wordfinder and wordmatch treat the
	first sequence as the reference sequence.

	The acdvalid application now reads the EDAM (EMBRACE Data and
	Methods) ontology to validate EDAM references in relations
	attributes. All applications are expected to have at least one
	topic and at least one operation term. Other qualifiers can have
	any number of data terms.

	New source file ajtax.c provides parsing and validation for the
	NCBI taxonomy in its .dmp file form. The parser reads all taxonomy
	data into memory. This takes up too much space for practical use,
	so is only intended for subsets. The parser will be reused to
	develop indexing applications to provide fast lookup of taxon
	identifiers.

	New source file ajobo.c provides parsing and validation for OBO
	format ontology files. The parser includes strict warnings
	according to the OBO format documentation, but these can be turned
	off as in many cases the OBO foundry ontologies do not follow the
	exact standard. Examples include terms not in sorted order, and
	Typedef stanzas following Term stanzas, and dbxrefs to
	non-existent terms (e.g. GO:ma in the gene ontology to cite a
	curator).

	Support for PDF and SVG graphic file output has been added. SVG
	requires no additional libraries. PDF support requires the libhpdf
	library (which, somewhat confusingly, is provided by the libharu
	project). EMBOSS will attempt to find the library and development
	files automatically and add PDF support (or not) appropriately.
	However, if libhpdf is in a non-standard place, a --with-hpdf=DIR
	configuration switch can be optionally used.

	The output of showalign has changed. The reference sequence now
	appears at the top, if selected. The ticks and sequence position
	numbering are relative to a selected reference sequence. Gaps
	within the reference appear as '.' and are not counted in
	numbering. End gaps appear as '.' with 'V' and 'v' as the major
	and minor tick marks, and numbering from -1 before the start and
	from +1 after the end of the reference. The additional copy of the
	consensus is no longer reported.

	When reading ABI trace files the quality scores can now be
	read. They are undefined in ABI files, but assumed to be phred
	scores. ABI files can have two sequences and sets of quality
	scores. The first is from the instrument base calling. The second
	is from a second base caller. Where two sets are found, EMBOSS now
	reads the second set.

	Application nospace has a new -menu option to trim all, trailing, or
	excess whitespace.

	Output type outfileall is obsolete (it is essentially an outfile)
	and has been deleted. No application was using it.

	Input type filelist (comma-delimited list of filenames) now trims
	excess whitespace from the beginning and end of each filename.

	Command line qualifiers with an '=' but no value now have a
	value of an empty string. Previous releases set the value to "=".
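
	The new behaviour can be pictured with a small Python sketch. It
	is not the ACD command line parser: the function name is
	hypothetical and the qualifier names are used only as examples.

```python
def split_qualifier(argument):
    """Split '-name=value' into (name, value).

    A trailing '=' with nothing after it gives an empty string.
    A qualifier with no '=' at all gives None for the value.
    """
    name, sep, value = argument.lstrip("-").partition("=")
    return name, (value if sep else None)
```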

	The file extension for directory, dirlist and outdir ACD datatypes
	is now a qualifier. This allows it to be defined as a default in
	the ACD file but also substituted by the user. An empty string
	means 'ignore the extension'. To specify 'no extension' a single
	space can be used as the value.

	On the command line, for a parameter (with no qualifier name
	given) a single dot was used as a missing value in previous
	versions. This causes problems when specifying the current
	directory as a dot. On the command line an empty (missing) value
	must now be an empty quoted string '' or "".

	Ampersands in application descriptions have been removed. They
	confuse HTML versions of documentation.

	The QA test script qatest.pl has new options -simple to turn off
	messages when running with a local test file, and -with to cancel
	-without options.

	Output redirected to a file can now use ajSysExecOutname functions
	to pass the filename to be used for standard output and possibly
	standard error. The filename is most usefully picked up from a new
	function ajAcdGetOutfileName which closes an ACD outfile and
	returns the name of the file. The file will be empty if simply
	opened, or will have existing contents if the append attribute is
	true in the ACD file.

	The output from tfscan is now in report format, replacing the
	undefined text file produced in previous releases.

	Where a new string is created by ajStrAssignS (the standard string
	copy functions) the reserved space for the string is enough to
	hold the current string value. In past releases the reserved
	memory was the same size as the reserved memory of the string
	being copied. This wasted memory where a large string had a short
	value, especially when copying records read from a buffered input
	file.

	Sequence input formats now turn off buffering of input once they
	can no longer fail (for example, FASTA format after the header
	record will read everything until it finds another header).

	Make ajaxdb code IPv6 compliant. Remove gethostbyname config
	check.

	pcre, expat & zlib include files now install to separate
	subdirectories.

	Showfeat failed to sort features with 'join' locations. The
	sorting is corrected. A future internal change will improve
	feature sorting in all cases.

	Restriction mapping applications now process bad enzyme input
	files without crashing.

	PNG graphics output had an unwanted blank margin that did not
	appear in other output formats. This is now turned off through
	plplot.

	Prettyplot formatting is corrected to improve the centring of
	characters within boxes.

	Restriction mapping applications no longer have an upper limit on
	the number of cuts.

	Warning messages for EMBL format sequences created by ENSEMBL
	have been turned off.

	Corrected references to the EMBL/GenBank feature table
	documentation in ACD files and web pages.

	embossversion now reports the setting of debug options, and
	corrects variable name warnrange to acdwarnrange.

	Any numeric ACD type (integer, float, range or array) with
	calculated values for the minimum or maximum attributes can
	potentially have an impossible range (maximum less than minimum)
	at run time. ACD processing now discovers these calculated values,
	and requires a definition for a new attribute 'failrange'. If this
	is defined true, a 'failmessage' attribute must also be defined to
	explain why the values are invalid (e.g. input sequence too short
	for the algorithm). If 'failrange' is false, a value for another
	new attribute 'trueminimum' must be set to define which of the
	minimum or maximum values is to be used as the only accepted
	value.
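
	The decision can be sketched in Python as below. This is one
	reading of the rule, not the ACD code: the function name is
	hypothetical, and taking a true 'trueminimum' to mean "use the
	minimum" is an assumption.

```python
def resolve_range(minimum, maximum, failrange, failmessage="",
                  trueminimum=False):
    """Return the accepted (low, high) range for calculated limits."""
    if maximum >= minimum:
        return minimum, maximum
    if failrange:
        # An impossible range is an error explained by failmessage.
        raise ValueError(failmessage)
    # Otherwise a single value is accepted, chosen by trueminimum.
    only = minimum if trueminimum else maximum
    return only, only
```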

	PNG graphics output had a plplot-defined margin limiting the
	available plot space. This is now removed, allowing applications
	such as prettyplot more space to display results.

	Resource attribute identifier: is obsoleted. No code used it. It
	is no longer allowed in resource definitions.

	Database attributes identifier: description: and command: are
	obsoleted. No code used them. They are no longer allowed in
	database definitions.

Version 6.2.0 15-Jan-2010

	Fixed GFF2 and GFF3 feature formats to always have the start
	position less than the end position for features on the '-'
	strand.

	Updated sequence format refseqp to handle features for proteins in
	the latest release of refseq protein.gpff files.

	A new function ajDebugTest can be used to turn on/off specific
	debug calls. The only argument is a quoted string. A file
	.debugtest in the current directory or the user's home directory
	is read. This contains a list of tokens to be debugged, so
	ajDebugTest returns true if any of these tokens is passed in.
	Optionally, the name in .debugtest can be followed by a number
	which is the maximum number of times that token will be reported.
	ajDebugTest is intended for developers who use ajDebug calls that
	may be expensive or be excessively called.
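
	The token test can be modelled in Python as follows. This is a
	sketch of the behaviour described above, not the ajDebugTest
	source: the class name is hypothetical, and one token per line in
	.debugtest is an assumed file layout.

```python
class DebugTokens:
    """Tokens enabled for debugging, each with an optional report limit."""

    def __init__(self, text):
        self.limits = {}
        self.seen = {}
        for line in text.splitlines():
            fields = line.split()
            if not fields:
                continue
            # An optional number after the token caps how many times
            # the token is reported.
            self.limits[fields[0]] = int(fields[1]) if len(fields) > 1 else None
            self.seen[fields[0]] = 0

    def test(self, token):
        """Return True if debug output for this token should be written."""
        if token not in self.limits:
            return False
        limit = self.limits[token]
        if limit is not None and self.seen[token] >= limit:
            return False
        self.seen[token] += 1
        return True
```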

	Some attributes in ACD files may appear more than once. These
	include any relations: attribute (now being populated with
	references to the new EDAM ontology), the groups attribute for
	applications, the (currently unused) keywords attribute for
	applications, and the external attribute for applications.

	Any external application must now be defined in the ACD file with
	an external: attribute in the application section. The string
	value has the name of the application as the first word, followed
	by a message to be printed if it is not found. When the ACD file
	is parsed, before any user prompts, the external applications are
	searched for by first looking for an environment variable
	EMBOSS_appname and then checking for an executable file in the
	current directory or in the path.
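
	The search order can be sketched in Python. This is an
	illustration, not the ACD code: the function name is
	hypothetical, and building the variable name by upper-casing the
	application name is an assumption.

```python
import os
import shutil


def find_external(appname, environ=os.environ):
    """Locate an external program, or return None if it is not found."""
    # 1. An EMBOSS_appname environment variable overrides everything.
    override = environ.get("EMBOSS_" + appname.upper())
    if override:
        return override
    # 2. An executable file in the current directory.
    local = os.path.join(os.curdir, appname)
    if os.path.isfile(local) and os.access(local, os.X_OK):
        return local
    # 3. An executable found on the search path.
    return shutil.which(appname)
```

	A None result corresponds to the case where the message from the
	external: attribute would be printed.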

	All applications should be launched by using the name returned by
	the new ajAcdGetPathC or ajAcdGetPathS functions. This ensures the
	application has been found in ACD processing and any
	EMBOSS_appname variable has been tested.

	The acdvalid utility now tests for duplicate attributes.

	Format specifiers for strings and characters (%S, %s and %c) now
	have two flags U (e.g. %US) for uppercase and L for lower case
	output.

	The configure.in and main package Makefile.am files handle
	--enable-devwarnings differently. For the imported libraries this
	level of warning message is turned off. Messages are still
	generated for warnings from the main EMBOSS libraries and
	applications.

	The QA testing script qatest.pl has new options -nocheck to skip
	"make check" applications and -noembassy to skip EMBASSY packages.

	Extractfeat failed to accept all features by default.

	Extractfeat failed on reverse direction nucleotide features.

	Coderet miscounted non-coding sequences in the output table.

	Graphics devices now have improved and additional checks. 'tek'
	was rejected as an ambiguous match. 'das' is only valid for an
	xygraph - one based on sequence positions. On Windows (using
	mEMBOSS) the plplot version supports fewer devices and these are
	now excluded from selection.

	The change to graphics library access makes the ajGraphInit call
	which registered graphics functions for use by ACD parsing
	redundant. In its place we need to register data access
	functions. As all applications make use of this, we now include
	this automatically in embInit so there is no longer a need for
	applications to make a separate call before invoking
	code (e.g. ACD parsing) that may require registration of
	functions.

	The AJAX ACD code is now in a separate library. New core library
	functions store and retrieve ACD persistent data such as the
	program name, command line and list of inputs. As ACD is now
	linked separately from core AJAX and the graphics library, the
	callback mechanism for ajGraph functions to be called from ACD is
	no longer needed.

	The database access code in ajseqdb.c has been moved to a separate
	higher level library. This is where we will insert code to access
	the new ensembl library functions in AJAX, and possible future
	data access libraries. A callback mechanism is used so that the
	embInit call automatically registers data access methods to make
	them available within the core library functions that read
	sequences. This allows ajSeqRead to remain in the core library
	while calling database access methods that in turn may invoke
	ensembl access.

	The PCRE (perl-compatible regular expressions) code in AJAX has
	been updated to release 7.9 of PCRE. Previous releases were still
	at version 4.3. The code is standard PCRE code with the LINK_SIZE
	set to 4 bytes to allow matches in long sequences.

	ACD files include relations attributes with text taken from terms
	in the EMBRACE EDAM ontology. These terms are also described in
	the knowntypes.standard file and are matched to the known types
	when validating ACD files.

	EMBOSS now uses a more complete User-Agent string when
	communicating with HTTP servers.

	FASTQ short read sequence formats now read and write faster using
	lookup tables to avoid calculations in the conversion of quality
	scores.
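
	A lookup table replaces a logarithm per base with an index
	operation. The Python sketch below uses the published
	Solexa-to-phred relation and the usual offset of 64 for
	Solexa-encoded FASTQ. It illustrates the technique only and is
	not the EMBOSS table; the names are hypothetical.

```python
import math

SOLEXA_OFFSET = 64


def solexa_to_phred(score):
    """Convert one Solexa score to the nearest phred score."""
    return round(10 * math.log10(10 ** (score / 10) + 1))


# Built once: one entry per quality character from Solexa -5 to 62.
PHRED_BY_CHAR = {
    chr(score + SOLEXA_OFFSET): solexa_to_phred(score)
    for score in range(-5, 63)
}


def decode_solexa(quality):
    """Decode a Solexa FASTQ quality string using only table lookups."""
    return [PHRED_BY_CHAR[ch] for ch in quality]
```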

	FASTQ short read formats have additional warning messages for bad
	or incomplete data.

	All sequence input formats now recognize invalid partial entries
	at the end of the input data and report an error message. A
	notable exception is FASTA format where a partial entry is still a
	valid ID line - these will give errors for zero length sequence
	unless empty sequences are allowed.

	Common output formats now write faster, using lightweight output
	functions to copy strings to the output file.

	SwissProt output formats now wrap long OS lines.

	Needle has been updated with end-gap penalties support, allowing
	complete global pairwise alignments. Three new options have been added:
	the endopen and endextend options are used to specify
	the gap opening and extension penalties for the end gaps,
	while the endweight option turns on/off weighting of the end gaps.

	New application needleall for all against all global/overlap
	pairwise alignment of sequences in two multi-sequence files.

	wordmatch updated for multi-sequence files using a modified version
	of the Rabin-Karp algorithm for multi-pattern search. Also added is
	a log file with statistical information on pattern matches.
	The updated wordmatch can, for example, be used for efficiently
	finding multiple patterns in large FASTQ files.
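
	For readers unfamiliar with the technique, here is a minimal
	sketch of plain Rabin-Karp multi-pattern search with a rolling
	hash. It is not the modified version used by wordmatch. It
	assumes all patterns have the same length, and the function name
	rabin_karp_multi and the hash constants are chosen only for this
	example.

```python
def rabin_karp_multi(text, patterns):
    """Return sorted (position, pattern) hits for equal-length patterns."""
    if not patterns:
        return []
    m = len(patterns[0])
    if m == 0 or any(len(p) != m for p in patterns):
        raise ValueError("patterns must be non-empty and of equal length")
    base, mod = 256, 1_000_000_007

    def hash_of(s):
        h = 0
        for ch in s:
            h = (h * base + ord(ch)) % mod
        return h

    # Map each pattern hash to the patterns that share it.
    by_hash = {}
    for p in set(patterns):
        by_hash.setdefault(hash_of(p), []).append(p)

    hits = []
    if len(text) < m:
        return hits
    high = pow(base, m - 1, mod)          # weight of the leading character
    h = hash_of(text[:m])
    for i in range(len(text) - m + 1):
        if i > 0:
            # Roll the window: drop text[i-1], append text[i+m-1].
            h = ((h - ord(text[i - 1]) * high) * base
                 + ord(text[i + m - 1])) % mod
        for p in by_hash.get(h, ()):
            if text[i:i + m] == p:        # confirm, since hashes can collide
                hits.append((i, p))
    return sorted(hits)
```

	The text is hashed once per window position regardless of how
	many patterns are searched, which is why the approach suits large
	pattern sets.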

	Application documentation has a new format HTML table for the
	command line options. This is excluded from the text
	documentation, where the format of the help output is improved.

	Function names have been standardised in ajcod.c, ajrange.c,
	ajtranslate.c, ajgraph.c and ajhist.c, and a few other functions
	have been renamed. The old names continue to work as "deprecated"
	functions, although these will generate warning messages with the
	gcc compiler.

	Infoseq option -version is renamed -seqversion to avoid a clash
	with the new global -version qualifier.

	Three new "make check" applications entrailshtml, entrailsbook and
	entrailswiki generate tables of internal data in HTML, DocBook or
	WikiText formats. These are intended to update the website, books
	and Wiki with the latest internal details. The -tables qualifier
	specifies one or more tables to be printed. By default, all tables
	are produced. The book tables are sorted in format name order.

	Alignment output now includes headers only for EMBOSS-specific
	formats. The headers have been dropped from the FASTA MARKX0
	through MARKX10 formats to allow standard FASTA suite parsers to
	use the EMBOSS versions of these outputs.

	Fastq-solexa sequence formats converted phred scores of 1 to
	Solexa scores of -6. They now convert to the limit of -5.
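
	The arithmetic behind this fix can be sketched as follows. The
	standard phred to Solexa conversion gives about -5.87 for a phred
	score of 1, which rounds to -6, so the result is limited to -5.
	The function name phred_to_solexa and the handling of phred 0 are
	assumptions for this example, not the EMBOSS implementation.

```python
import math

SOLEXA_LIMIT = -5   # lowest Solexa score allowed in the output


def phred_to_solexa(phred):
    """Convert a phred score to the nearest Solexa score, limited to -5."""
    if phred <= 0:
        return SOLEXA_LIMIT               # logarithm undefined; use the limit
    raw = 10 * math.log10(10 ** (phred / 10) - 1)
    return max(SOLEXA_LIMIT, round(raw))
```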

	Fastq-sanger sequence format incorrectly stopped when the quality
	scores started with a '@' (phred quality 31).
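
	One way to avoid this class of error, shown here only as a
	sketch, is to collect quality characters until they match the
	sequence length rather than treating any line starting with '@'
	as a new record. The function name read_fastq is hypothetical
	and the code is not taken from EMBOSS.

```python
def read_fastq(lines):
    """Yield (name, sequence, quality) from an iterable of FASTQ lines.

    Quality is collected until it is as long as the sequence, so a
    quality line beginning with '@' is never taken for a new header.
    """
    it = iter(line.rstrip("\n") for line in lines)
    for header in it:
        if not header:
            continue                      # skip blank lines between records
        if not header.startswith("@"):
            raise ValueError("expected '@' header, got: " + header)
        seq_parts = []
        for line in it:
            if line.startswith("+"):
                break
            seq_parts.append(line)
        seq = "".join(seq_parts)
        qual = ""
        while len(qual) < len(seq):
            line = next(it, None)
            if line is None:
                raise ValueError("incomplete quality for " + header[1:])
            qual += line
        if len(qual) != len(seq):
            raise ValueError("quality length differs from sequence length")
        yield header[1:], seq, qual
```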

	Intelligenetics sequence format now correctly ignores additional
	carriage control characters.

	Genbank-like protein formats (genpept and refseqp) failed when
	reading more than one sequence. The input is now buffered when
	the format is automatically reassigned to a related parser.

	The -help output now includes the one-line documentation string
	from the ACD file and the version number information reported by
	--version.

	All applications have a -version (or --version) qualifier which
	will report the EMBOSS version number. For EMBASSY applications it
	will also report the EMBASSY package version number as
	"PACKAGE:version". All EMBASSY applications need to call embInitP
	with an additional parameter of VERSION which will be defined
	automatically by the configure.in template. If the "versionnumber"
	attribute is defined in the ACD file this will also be reported as
	the application version "programname:version".

	The ACD application attribute "version:" is renamed
	"versionnumber:" to avoid a name clash with the new -version
	qualifier. We need to use the qualifier name "-version" for
	compatibility with other systems and applications, so the renaming
	of the attribute is unavoidable. We believe it was only used (as
	originally intended) for the definition of external applications
	by SoapLab.

Version 6.1.0 15-Jul-2009

	New application showpep displays protein sequences. Showseq is now
	limited to nucleotide sequences. Many of the showseq options are
	not appropriate for proteins. Showpep makes the remaining showseq
	options available.

	A new data structure AjPSeqXref holds details of cross-references
	between a sequence object and any other data resource. The
	cross-reference attributes include a type to indicate the source
	of the cross-reference, for example XREF_DR for a reference in a
	DR line from EMBL or Swiss-Prot. The other attributes are the
	database name and up to 4 identifiers (as in the Swiss-Prot DR
	line definition) and a start and end position where the source is
	a feature table entry.

	When reading a sequence with an identifiable species, attempts are
	made to define the NCBI taxonomy identifier for the
	species. Possible sources include the OX line in Swiss-Prot, the
	taxon cross-reference in the EMBL/GenBank/DDBJ feature table
	(available only if the feature table is read) and the species name
	which can be matched to a set of common species obtained from
	NCBI.

	Swissprot entry descriptions in FASTA output no longer have a
	trailing '.'. Where the source entry has the new Swiss-Prot DE
	line format the name is built from the recommended full name with
	other names in round brackets.

	Binary files now consistently have null characters after strings
	to pad them to full length. Previous versions wrote whatever
	followed the NULL in the string object. The resulting files now
	look cleaner although any extra characters were always ignored
	when reading dbi index files.

	Test databases were updated on 24th June 2009.

	Blank lines are ignored before any sequence input. This is to
	support the use of seqret to read data pasted into web forms where
	extra blank lines are often accidentally included.

	FASTQ is now a valid sequence format and can be detected
	automatically. "fastq" format ignores all quality scores as there
	is no automatic and safe way to determine whether scores are for
	Sanger/phred or Illumina/Solexa quality. To read the quality
	scores we support formats "fastq-sanger" and "fastq-illumina". We
	also support "fastq-int" to read quality scores as integers. These
	scores are assumed to be Sanger quality. For Illumina quality
	scores out of range, a warning message is written once for each
	sequence. Sanger scores do not have out of range values as they
	allow the full set of quality characters, although high values
	(over 40) should only appear for contig consensus sequences.
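
	A minimal sketch of the once-per-sequence warning, assuming the
	usual character offset of 64 for Illumina quality and treating
	only negative results as out of range. The function name
	decode_illumina, the message text and the choice to return the
	raw scores unchanged are assumptions for this example.

```python
ILLUMINA_OFFSET = 64   # assumed: quality character = chr(score + 64)


def decode_illumina(name, quality_line):
    """Return (scores, messages) with at most one warning per sequence."""
    scores = [ord(ch) - ILLUMINA_OFFSET for ch in quality_line]
    messages = []
    bad = sum(1 for s in scores if s < 0)
    if bad:
        # One message for the whole sequence, however many scores are bad.
        messages.append(
            "%s: %d quality score(s) out of range" % (name, bad))
    return scores, messages
```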

	MEGA format has been rewritten to support the file format used by
	MEGA 4. Title can be in mixed case. Format and Gene/domain command
	lines are processed. Multiple gene/domain files are read by EMBOSS
	as separate alignment sets by seqretsetall. This may change in a
	future release as MEGA4 processes them as one alignment with
	annotated gene regions. While EMBOSS has no annotation specific to
	alignments this is a reasonable compromise.

	embossdata will now always return directory listings alphabetically.

	A new ACD function replaces an attribute value with an EMBOSS or
	environment variable. The attribute syntax is (@value:VARNAME).

	Infile datatypes in ACD have a new attribute directory: which
	defines the default directory to be searched. If the user
	specifies an explicit path the directory attribute is ignored.

	Applications writing out multiple sets of sequences now correctly
	reset the sequence output. This only affected one test application
	in EMBOSS 6.0.1 (input type seqsetall and output type seqoutall).

	Applications that use single letter qualifier names (for example
	the HMMERNEW wrappers for HMMER applications) can be confused if a
	single letter qualifier name matches uniquely an associated
	qualifier for a preceding command line qualifier. An additional
	check now ensures that a unique qualifier (for example -o) is
	correctly recognized.

	Global alignments with needle in rare cases missed the optimal
	alignment of the first 2 residues. This was a bug introduced in
	6.0.0.

	When reading data using a launched application, including the SRS
	access method which launches "getz", closing the input without
	reading to the end caused the file close function to loop
	forever. Examples included nthseq and seqret -firstonly both of
	which stop reading when they have reached the nth or first
	sequence. File closing now only waits if the input has reached end
	of file, and has a timeout on the wait to break out of the loop.

	Intelligenetics format sequence files with more than one sequence
	are now read correctly. Where the sequence ends with a number,
	intelligenetics format sequences can now be automatically
	detected.

	Add -methylation option to restrict/restover/remap/showseq
	to simulate (e.g.) dam/dcm restriction enzyme knockouts.

	remap now correctly reports restriction enzymes cutting a
	greater number of times than an optionally-supplied maximum
	value. The primary function of the application was unaffected.

	showfeat has a new option -joinfeatures to display all exons on
	one line for a join feature location. In previous releases this
	was one of the -sort options. It is now possible to use
	-joinfeatures and to select a sort order.

	Installing without X11 (using the --without-x option for
	./configure) used "x11" as the default graphics device in some
	applications. These now use "png" (if available) or "ps".

	needle and water with the -nobrief option repeated report header
	information on the longest and shortest similarity and identities
	because the previous header content was not cleared. This only
	affected results where there was more than one sequence as the
	second input.

	In the EMBL/GenBank feature table the group() and one_of()
	operators are obsolete. They are automatically converted to
	order().

	The command line syntax using the master qualifier name as a
	suffix (for example -sreverse_asequence) ignored the master
	qualifier name and set values for all matching inputs. This syntax
	is intended as a way for wrappers to better control the use of
	associated qualifiers, as it is cleaner than using a numeric
	suffix (-sreverse1 -sreverse2 etc.)

	Using -sreverse on the command line could reverse protein
	sequences for inputs that can read more than one sequence (seqall,
	seqset, seqsetall). -sreverse is now only set for nucleotide
	sequence inputs. Single sequence inputs correctly ignored the
	-sreverse value.

	Multiple sequence sets can be read as input type seqsetall, but
	when this input was used for a single sequence set input (type
	seqset) all sequence sets were read. seqset input now stops after
	the first set (for example a PHYLIP or MSF alignment).

	Genbank test data had an incorrect format. The data was extracted
	from a set of test GCG databases and had spaces in the feature
	locations.

	extractfeat now uses the new feature fetch functions and can
	retrieve features that include joins across entries.

	Feature parsing functions are added to fetch sequences from other
	entries. These depend on reusing the USA of the original sequence,
	with the identifier of the external sequence inserted in place
	of the original. This is known to work for database references and
	flat files.

	coderet was limited to EMBL/GenBank feature tables. It now
	processes any valid feature input including GFF files. The
	previous parsing functions are obsolete and have been removed
	as coderet was the only application calling them.

	Very large pairwise alignments can fail to back trace through the
	alignment because of rounding error. The alignment and traceback
	functions now use double precision to maintain accuracy.

	pepwindow and pepwindowall missed the plot value for the last
	window in the sequence.

	pepwindow and pepwindowall now process sequence ranges -sbegin and
	-send.

	pepwindow and pepwindowall now default to a window length of 19,
	ideal for transmembrane regions. The old default of 7 was short
	and gave noisy results.

	pepwindow and pepwindowall have an extra option -normalize to
	convert the amino acid data in the datafile to mean 0.0 and
	standard deviation 1.0. The default Kyte-Doolittle data is not
	normalized.
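
	The normalization can be sketched as below. The example uses
	the population standard deviation, which is an assumption. The
	function name normalize is chosen for illustration only.

```python
from statistics import mean, pstdev


def normalize(values):
    """Rescale a dict of residue -> value to mean 0.0 and s.d. 1.0."""
    mu = mean(values.values())
    sigma = pstdev(values.values())
    if sigma == 0:
        raise ValueError("cannot normalize constant data")
    return {res: (v - mu) / sigma for res, v in values.items()}
```

	Normalizing makes data files with different scales directly
	comparable on the same plot axis.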

	The EMBL/Genbank feature table definitions have been updated to
	version 8.0 (October 2008). Sequence ontology terms are now
	available for all feature types except S_region for which no
	specific SO term exists. S_region is attached to an internal term
	derived from SO:0000301 as a placeholder.

	Programs searching with regular expressions and patterns reported
	the pattern name with '1' added to the end. This was to support
	pattern and regular expression files with multiple patterns. When
	only one pattern is given on the command line the '1' is no longer
	added.

	Programs searching with regular expressions (dreg and preg) missed
	overlapping matches to the pattern. The algorithm now steps
	forward one character from the start of the match and searches
	again. Some regular expressions with wildcards may produce a large
	number of overlapping matches especially in low-complexity regions.
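
	The stepping strategy can be sketched with a standard regular
	expression library. This is an illustration of the approach
	described above, not the dreg/preg source. The function name
	overlapping_matches is hypothetical.

```python
import re


def overlapping_matches(pattern, sequence):
    """Return (start, end, text) for every match, including overlaps."""
    regex = re.compile(pattern)
    hits = []
    pos = 0
    while pos <= len(sequence):
        m = regex.search(sequence, pos)
        if m is None:
            break
        hits.append((m.start(), m.end(), m.group()))
        pos = m.start() + 1       # step one character past the match start
    return hits
```

	Searching "AAAA" for the pattern "AA" reports matches starting
	at positions 0, 1 and 2, where a non-overlapping search would
	report only positions 0 and 2.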

	Protein sequences in GFF format now use GFF3 by default. For
	release 6.0.0 protein sequences were written in GFF2 while the
	GFF3 protein feature definitions were redefined using the Sequence
	Ontology. This process is now completed.

	When a sequence is reversed by revseq the description is tagged
	with "Reversed: " so that the output and any sequence derived from
	it has a note of the history.

	EMBL and GenBank formats when used to read multiple entries failed
	to reset the list of citations. Although the first set of
	citations was reported correctly, all other entries in the same
	run included the citation list from the first entry.

	SwissProt/UniProt entries now preserve the complete entry content
	when read and rewritten. All feature types are preserved and
	feature lines wrap according to the widths in UniProt 14.8. Date
	lines are stored and written. Comments are stored in blocks.
	Database cross-references are stored in a list. The description
	lines are saved in the new SwissProt structure. Tests on a set of
	complex entries confirm that EMBOSS is able to read and write an
	exact copy of this sample set.

	Protein feature keys now use the Sequence Ontology identifiers
	as internal names. This may change the way some feature keys are
	converted between data formats. Protein feature keys have been
	updated to correct some conversions, for example to distinguish
	between "coiled coil" from pepcoil and "random coil" from garnier
	output.

	Fitch sequence format was only able to read a single
	sequence. EMBOSS can now read 'fitch' as a multiple sequence
	format.

	Extractfeat now cleanly processes minscore and maxscore as limits
	on the score. By default any score is allowed if these are
	unchanged. Previous releases required minimum and maximum to be
	equal - or minimum greater than maximum - to permit any feature
	score.

	New feature XML output format DASGFF. Feature output functions
	have a changed interface to pass the AjPFeattabOut object so that
	additional processing can handle the opening and closing of an XML
	output file.

	New sequence output formats "dasdna" and "das" write DASDNA and
	DASSEQUENCE XML outputs. Sequence output functions have a new
	capability to define a Cleanup function to write the final lines
	of an XML output file. The AjPSeqout data structure already has
	the Count attribute needed to identify the first sequence so that
	the XML header can be written.

	New environment variable EMBOSS_ACDFILENAME provides an
	alternative way to set the default output filename for EMBOSS
	applications. If set to true, the filename is used rather than the
	current behaviour of using the first sequence name as the default
	filename. When the filename is used the case of the name is
	preserved.

	Corrected display of exon ranges in showseq. Exons now display in
	their original frame (all were displayed in frame 1 in earlier
	versions). Display of 3-letter amino acid names corrected (but we
	hope nobody is using 3-letter codes any more!)

	Added create attribute for outdir datatype in ACD. If true, the
	output directory will be created if it does not already exist.
	The default is false: output directories must already exist, as
	in previous releases.

	Added attribute aligned for datatype seqoutall in ACD
	files. Applications can write multiple sequences as a seqoutset
	(aligned or unaligned) and can also write seqoutall - writing
	sequences one at a time without first storing them as a set.

	For phylogenetic applications (PHYLIPNEW) reading distance matrix
	files failed for some formats written by other
	applications. Distance matrix input now works for multiple
	matrices in square, upper-triangular and lower-triangular formats.

	The PLPLOT graphics library uses 4 environment variables to allow
	local configuration. EMBOSS uses a local copy in libeplplot. For
	sites that have the native PLPLOT also in use we have renamed the
	environment variables to use the prefix EPLPLOT. This protects
	EMBOSS from any configuration set only for the native PLPLOT.
	The variables are: EPLPLOT_BIN EPLPLOT_LIB EPLPLOT_TCL and
	EPLPLOT_HOME. Versions of EMBOSS up to 2.8.0 defined PLPLOT_LIB
	but this value is now automatically set and the environment
	variable is no longer needed.

	Command line qualifiers are renamed where the first 5 characters
	are the same. These were:
	    eprimer3 major revision of all options
	    est2genome -splice to -usesplice
	    prettyplot -boxcolval to -boxuse
	    octanol -*plot to -plot*
	    showfeat -match* to -*match; -source to -origin
	    showpep -match* to -*match
	    showseq -match* to -*match; -source to -origin
	    vectorstrip -vectorfile to -readfile; -linker* to -*linker
	and similar changes for EMBASSY applications.

	ACD processing now objects if two or more qualifiers are not
	unique in the first 6 characters. In a future release we would
	like to reduce this to a 5 character unique name. Several EMBASSY
	applications need to be modified to comply with this requirement.
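
	A check of this kind can be sketched as grouping qualifier names
	by their leading characters. The function name prefix_clashes
	and the qualifier names in the usage note are illustrative and
	this is not the ACD parser code.

```python
from collections import defaultdict


def prefix_clashes(qualifiers, length=6):
    """Group qualifier names that share their first `length` characters."""
    groups = defaultdict(list)
    for name in qualifiers:
        groups[name[:length]].append(name)
    return {prefix: names
            for prefix, names in groups.items() if len(names) > 1}
```

	For example, the names "matchsource" and "matchtag" are unique
	in their first 6 characters but clash when only the first 5 are
	compared.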

	MEMENEW updated for meme/mast version 4.0.0. ememe now
	produces fasta, html, text, xml and xsl outputs. A new variant,
	ememetext, produces only the text and fasta outputs.

	DBX index file key deletion code added for ID/ACC/SV/KW/DE/TX
	indexes.

	HTTP access now adds a User-Agent string with the EMBOSS version
	number so that servers can count the number of EMBOSS requests.

	PDB model structures failed to generate a new name for each
	model. Duplicate sequence names are not ideal. The model number
	(from the MODEL record) is now appended to each sequence name in
	"pdb" and "pdbnuc" format. The "pdbseq" and "pdbnucseq" formats
	read a single copy of each sequence from the SEQRES records.

	Added two new PDB formats to read nucleotide data. These are named
	"pdbnuc" and "pdbnucseq". They are not available by default, to
	avoid the problem of reading both protein and nucleotide sequence
	data from a structure file for an oligonucleotide binding protein.

	Alignment outputs now include most of the multiple sequence
	alignment formats that EMBOSS can write. The functions for these
	are trivial to write. New functions can be added to use any
	existing sequence output format for alignments.

	PDB entries can be read in two ways, with two named
	formats. Sequence format "pdb" reads the ATOM records. Sequence
	format "pdbseq" reads the SEQRES records. By default, only "pdb"
	format was used, and could crash on entries where the ATOM records
	were missing. Both formats now fail silently if no sequences are
	found. By default, "pdb" format is used first, and if that fails
	"pdbseq" will be tried.

	The EMBOSS logfile (defined by variable EMBOSS_LOGFILE) now
	reports two extra values: the number of cpu seconds and the
	number of elapsed time seconds.

	Extra stop codons in getorf for ORFs ending close to the end of
	the input sequence no longer appear.

	For optional qualifiers (defined as "nullok" in the ACD file) the
	command line option -no(qualname) was causing output files to
	appear by resetting the value to an empty string, which in turn
	was converted to the default filename. Now -no(qualname) turns off
	any output file defined with nullok, and -(qualname) "" asks for
	an output file that is off by default and uses the default
	filename for it.

	Report output has a new tail format that reports the total
	sequences and total sequence length read by the applications. The
	previous "Total_sequences" report was the number of sequences
	included in the report. This is renamed to "Reported_sequences".
	Where the number of hits was limited by the -rmaxseq or -rmaxall
	options, the number of unreported hits also appears. If the
	-rmaxall limit was exceeded, the report tail ends with
	"Maxhits_stop: Y". If the -rmaxseq limit is exceeded, the sequence
	report includes (as before) "HitLimit: max/total".

	Refseq protein and Genpept now use a modified genbank format to
	avoid warnings for "aa" replacing "bp" on the LOCUS line and to
	provide better control over any other differences between
	nucleotide and protein entries. Genbank format automatically calls
	refseqp format if a LOCUS line has "aa".

	Swissprot output was missing a '.' at the end of the organism line.

	vectorstrip failed if the user neither provided a filename for
	the -vectorsfile option nor specified -novectorfile to turn off
	file reading. The ACD file is changed so a vectorsfile is
	required if -vectorfile is true and a check is put into the code
	to catch the problem if the ACD interface changes in future.

	Allow user-defined -carboxyl parameter for iep.

	jaspscan now allows multiple sequences to be scanned.

Version 6.0.0 15-Jul-2008

	New application aligncopy reads a set of aligned sequences and
	prints a report in one of the standard alignment formats that can
	accept the same number of sequences. Pairwise alignment formats
	can only be used if the input has exactly two sequences.

	New application aligncopypair reads a set of aligned sequences and
	prints a report for each pair of aligned sequences in one of the
	standard alignment formats.

	New application featreport reads a sequence and a feature table,
	and writes a report in any of the standard report formats.

	New application featcopy reads and writes a feature table to
	convert feature formats.

	New applications maskambignuc and maskambigprot replace ambiguity
	characters in nucleotide sequences with 'N' and in protein
	sequences with 'X'.
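
	The masking can be sketched as below. Which characters count as
	unambiguous, and the decision to leave '-' gap characters
	untouched, are assumptions for this example and may differ from
	what maskambignuc and maskambigprot actually do.

```python
NUC_UNAMBIGUOUS = set("ACGTU")                    # assumed set
PROT_UNAMBIGUOUS = set("ACDEFGHIKLMNPQRSTVWY")    # assumed: 20 standard residues


def mask_ambiguous(sequence, unambiguous, mask):
    """Replace every character outside `unambiguous` (except '-') with mask."""
    return "".join(
        ch if ch.upper() in unambiguous or ch == "-" else mask
        for ch in sequence
    )


def mask_nuc(sequence):
    return mask_ambiguous(sequence, NUC_UNAMBIGUOUS, "N")


def mask_prot(sequence):
    return mask_ambiguous(sequence, PROT_UNAMBIGUOUS, "X")
```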

	New application consambig reports an alignment consensus sequence
	using ambiguity characters. The intended use cases are sequencing
	reads and SNP reporting.
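
	The idea of an ambiguity-code consensus can be sketched with the
	standard IUPAC nucleotide codes. Ignoring gaps and reporting 'N'
	for a column with no bases are assumptions for this example. The
	function name ambiguity_consensus is hypothetical and the code is
	not the consambig source.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by sorted base set.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "AG": "R", "CT": "Y", "CG": "S", "AT": "W", "GT": "K", "AC": "M",
    "CGT": "B", "AGT": "D", "ACT": "H", "ACG": "V", "ACGT": "N",
}


def ambiguity_consensus(aligned):
    """Return one IUPAC character per column of equal-length sequences."""
    if len({len(s) for s in aligned}) > 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned):
        # Collect the distinct bases seen in this column, ignoring gaps.
        bases = sorted({b.upper() for b in column if b.upper() in "ACGT"})
        consensus.append(IUPAC["".join(bases)] if bases else "N")
    return "".join(consensus)
```

	For the three aligned reads ACGT, ACAT and ATGT this gives the
	consensus AYRT, marking the two variable columns.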

	New application sizeseq sorts sequences in ascending or descending
	order of length. This is a port of the application seqsort from
	the domsearch EMBASSY package.

	New application skipredundant uses pairwise sequence matches to
	exclude sequences that are similar from an input set. This is a
	modified version of the application seqnr from the domsearch
	EMBASSY package.

	New applications provide utility functions for former GCG users:
	nohtml removes HTML tags, notab replaces tabs with spaces,
	nospace removes all whitespace from a file, skipspace removes
	extra whitespace from a file.

	Older EMBOSS applications can now generate a warning message
	stating that they are marked as 'obsolete' with an explanation and
	an indication of alternative programs in EMBOSS or in an EMBASSY
	package. This warning can be turned off by defining environment
	variable EMBOSS_WARNOBSOLETE with a value of "N" or by defining
	the same variable in the emboss.defaults or ~/.embossrc files. We
	will begin to mark applications as 'obsolete' in future releases.

	A new EMBASSY package "myembossdemo" contains the demonstration
	applications demoalign, demofeatures, demolist, demoreport,
	demosequence, demostring, demostringnew and demotable that
	illustrate how to use EMBOSS data types in your own
	applications. The myembossdemo package allows novice developers to
	try simple EMBOSS programming. The myemboss package is available
	for adding your own applications. The demo applications are no
	longer distributed with the main EMBOSS package. They were not
	installed and were only built with the "make check" option.

	Application short descriptions have been revised. The maximum
	length of application one line descriptions is increased from 60
	to 70 characters. The descriptions are easier to write. Output
	from wossname can now be 90 characters wide. Interfaces that use
	the description in menus may need to allow some extra space.

	Function names in ajfile.c have been standardised. Old names are
	still accepted but are marked as "deprecated" and will generate
	warnings with the gcc compiler (see ajstr below). Other compilers
	will see no difference. New source files ajfiledata.c and
	ajfileio.c have been added. The buffered file data structures are
	renamed internally to be more consistent (AjPFileBuff to AjPFilebuff).

	notseq was unable to search for IDs containing '|' characters.
	It uses string matching (not regular expressions), and these
	characters are valid in NCBI-style FASTA files if read with the
	"pearson" format, which accepts the whole ID string without parsing.

	The sequence alignment code has been updated. Sequence alignments
	with low gap penalties failed to allow two gaps (one in each
	sequence) without a match in between. The embAlign functions are
	now simplified. Scores are returned by the PathCalc functions. The
	Walk functions that walk through the path and return the aligned
	sequences are faster and need fewer parameters. Profile alignments
	occasionally duplicated residues in the sequence around gap
	positions. Fast alignments around a limited width include
	additional residues at each end and require an offset rather than
	separate start positions. The offset is the difference between the
	two start positions used in 5.0.0 and earlier releases.

	Eprimer3 citations are corrected in the help text (from the ACD
	file) and in the documentation. The citation errors were traced to
	the original primer3_core documentation which has now been
	corrected.

	Wordmatch could confuse overlapping matches. It occasionally
	extended the wrong match and missed a corresponding new match.

	Seqmatchall results were correct with the default output
	format which reports match positions, but gave incorrect results
	with some other local alignment formats that include the sequence.
	Seqmatchall now stores alignments in the same way as other local
	alignment applications, and the alignment internals are corrected
	to ensure other applications will not have the same problem.

	Emma was officially supporting clustalw 1.83. Issues with clustalw
	2.0 are now resolved and this version is supported if clustalw2 is
	installed. Emma executes an application called clustalw (not
	clustalw2) so version 2.0 must be installed under this name or an
	environment variable EMBOSS_CLUSTALW needs to be defined to point
	to the executable clustalw2 file.

	Sequence format "selex" allows invalid sequence data files to be
	accepted as input. Selex format is still available but is no
	longer included in the formats that can be automatically
	detected. When reading selex format data, users need to put
	"-sformat selex" on the command line, or specify "selex::" at the
	front of the USA. See the HMMER (old version EMBASSY package)
	documentation for examples. HMMERNEW (recommended) examples use
	Stockholm format and so are unchanged.

	Program dbxfasta now defaults to a filename of "*.fasta".
	The previous default "*.dat" is not commonly used for FASTA format
	databases.

	Program msbar block mutations were 1 longer than the specified
	block and could crash if the block size was fixed (minimum and
	maximum block sizes the same). This off-by-one error is now
	corrected.

	In GenBank output format, multiple line KEYWORD sections were not
	formatted correctly.

	ACD list and select values (the menus that appear in the user
	prompt) can now have ACD variables. Although useful for local
	application development these are not used in EMBOSS distributed
	ACD files because the variables are difficult for web and GUI
	interfaces to resolve when presenting the menu text.

	List and Table internal data structures are now cached so that
	creating and deleting temporary lists and tables is more efficient.

	In emboss.default database definitions the filename and exclude
	values can be delimited by spaces, commas or semicolons. Previous
	releases used only spaces. Parsing is now consistent with the
	fields definition which allowed all the above characters.
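
	Splitting on any of the three delimiters can be sketched as
	follows. The function name split_values is hypothetical and the
	code only illustrates the rule. It is not the emboss.default
	parser.

```python
import re


def split_values(text):
    """Split a value list on runs of spaces, commas or semicolons."""
    return [item for item in re.split(r"[ ,;]+", text.strip()) if item]
```

	With this rule "*.dat, *.seq;*.txt" yields the three values
	*.dat, *.seq and *.txt.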

	Protein sequences with pyrrolysine ('O') had 'O' converted to a
	gap because this was a gap character in early versions of
	Phylip. This was patched in 5.0.0 to allow 'O' in UniProt release
	13. The gap character is upper case only, so 'o' was correctly
	read as pyrrolysine.

	Wordfinder used the same descriptions for two pairs of qualifiers.
	The descriptions are changed to make their meaning clear in
	commandline help and in web interfaces.

	New function ajTimeDiff returns the difference in seconds between
	two time values.

	Profiling tests showed that file reading and string handling can
	be made faster. String handling called functions many levels
	deep. Making this code inline and using macro versions improved
	performance for applications (e.g. database indexing) that use
	many string calls. File input requires each input line to be
	copied. Using copy-by-reference (ajStrAssignRef) often makes this
	more efficient. Existing macros now test for undefined strings:
	MAJSTRGETLEN, MAJSTRGETPTR, MAJSTRGETRES and MAJSTRGETUSE. New
	macros are added for string handling: MAJSTRDEL,
	MAJSTRGETUNIQUESTR, MAJSTRCMPC and MAJSTRCMPS.

	Memory management includes new macros AJCRESIZE0 and AJRESIZE0,
	which provide resize functions that guarantee new memory is set to
	zero. The functions must be given the original allocated size.

	Using the GNU C run-time library, calls to mcheck and mprobe are
	available to test for memory corruption by examining the bytes
	before and after an address allocated by malloc. This can be
	turned on for any application, including Unix commands, with the
	environment variable MALLOC_CHECK_ which has values 0, 1, 2 or
	3. 1 writes to standard error when a problem is found, 2 aborts
	the program, 3 does both and 0 ignores errors. No recompilation
	is needed for this simple method. EMBOSS now has a ./configure
	option --enable-mprobe which enables two new
	functions. ajMemProbe, passed an address from malloc (AJNEW0,
	AJCNEW0, etc.) tests the bytes before and after and reports any
	errors. The advantage of using ajMemProbe rather than mprobe is
	that a macro MAJMEMPROBE also reports the file and line number
	where it was called. To avoid large numbers of messages (when
	code has problems) a limit can be set with ajMemCheckSetLimit
	after which the program will exit. Note that enable-mprobe is
	incompatible with using valgrind to test for memory leaks - as
	mprobe and mcheck have to look at illegal bytes before and after
	allocated memory blocks. Memory checking is turned on by a call to
	mcheck, passing the function ajMemCheck, in ajnam.c before the
	first memory allocation. If any program calls malloc before
	calling embInit or embInitP this call will fail and issue a
	warning (if compiled with --enable-mprobe). A special call
	ajStrProbe tests any string with mprobe. Special calls ajListProbe
	and ajListProbeData test lists and their contents. For more
	details see http://www.gnu.org/software/libc/manual/

	Protein sequences from the Staden package were read as nucleotide
	because they were missing information on the ID line to identify
	EMBL or SWISSPROT format. The sequences are now tested and
	correctly typed.

	Wordcount now accepts protein sequences as input. Previous
	releases only allowed nucleotide sequences.

	Wordfinder options had the same information prompt. These have
	been changed from "limit" to "minimum" and "maximum" to make their
	function clear.

	Prompting for values from the user now includes a test for
	standard input in use as an input file. If standard input is open,
	the default response is accepted and a message is written to the
	user. This is to avoid problems with command lines that use
	"stdin" as an input and do not include -auto.

	The acdpretty utility can now preserve comments in ACD files.
	Comments are maintained in blocks with blank lines before and
	after. Inline comments are started in column 50 unless they are
	exceptionally long. Comments themselves have white space cleaned
	up but otherwise are not reformatted.

	A new function ajAcdGetValueDefault is added to return the default
	value of an ACD qualifier. This can be combined with
	ajAcdIsUserdefined in wrappers to test for values changed by the
	user.

	Infile qualifiers in ACD have a new attribute "trydefault" which
	allows the default filename to fail. Any filename provided by the
	user has to exist. This was added to support the behaviour of the
	MIRA EMBASSY package. To allow an infile to fail the attribute
	"nullok" must also be set to "Y".

	Applications which produce an output file or graphics often
	created an empty output file when the plot was selected.
	The ACD files have been corrected to only create the file if it
	will be written to. Applications changed are charge, dan,
	freak, hmoment, iep and tcode.

	Whichdb only writes to its output file if -get is false.
	With -get it creates sequences. The outfile is no longer created
	when whichdb is in -get mode.

	String functions corrected so that Case in the name always means
	case-insensitive and works by converting to upper case. Some
	functions were defined the wrong way, with "Case" for the
	case-sensitive form.

	GFF3 format is now the default feature output.

	A new function ajFeatIsCds identifies protein coding nucleotide
	features (CDS) using the SO identifier. A new function
	ajFeattagIsNote identifies feature tags that are for the default
	feature tag.

	Protein features now use the new Sequence Ontology terms defined
	by BioSapiens. These are not yet accepted by GFF3 validators. The
	new SO identifiers are added to protein feature definitions and
	used internally.

	Feature format definitions (the Efeatures and Etags files)
	now allow #include references to other files. This allows a
	standard EMBL and Swissprot feature table definition to be
	included by the internal and GFF definitions. Redefinitions are
	allowed using + and - prefixes to add and remove tags for existing
	feature types.

	GFF3 format feature (and report) output is added.

	A new application "density" has been added. This reports the
	A+C+G+T and AT+GC densities of nucleic acid sequences within
	an adjustable sliding window. Plots of A+C+G+T or AT+GC are
	optionally produced.
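
	A minimal sketch of the sliding-window calculation follows. It is
	not taken from the density source code: the function name, the
	1-based window start and the dictionary layout are assumptions
	made for illustration only.

```python
def window_densities(seq, window):
    """Yield (start, fractions) for each full window over seq.

    fractions maps A, C, G, T, AT and GC to their share of the window.
    Illustrative sketch only, not the algorithm used by density.
    """
    seq = seq.upper()
    for start in range(len(seq) - window + 1):
        chunk = seq[start:start + window]
        # Per-base fraction of the window
        fractions = {base: chunk.count(base) / window for base in "ACGT"}
        # Combined densities
        fractions["AT"] = fractions["A"] + fractions["T"]
        fractions["GC"] = fractions["G"] + fractions["C"]
        yield start + 1, fractions  # 1-based start position
```

	For example, list(window_densities("AACCGGTT", 4)) gives five
	windows, the first with A and C each at 0.5.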

	Molecular weight programs (e.g. digest, mowse) now have a
	-mono switch to allow use of monoisotopic weights.
	By default, average molecular weights are used.

	The Eamino.dat format has changed. Molecular weight information
	has been removed and put in its own Emolwt.dat file. This latter
	now allows specification of average and monoisotopic weights. Values
	for hydrogen and oxygen are specified as well as the amino acid weights.

	The library representation of amino acid property information
	has been changed. The EmbPropTable global table has been
	removed and replaced with EmbPPropAmino and EmbPPropMolwt objects.

	Pepcoil now produces a report (replacing a text output) in "motif"
	format. The default is changed to not report non coiled-coil
	regions as they are hard to distinguish in this format.

	The "motif" report format is extended to allow two score positions
	marked with "*" and "+" and labelled internally as "pos" and
	"pos2". No application uses pos2 (it was added for pepcoil, but
	both score maximum positions are always the same).

	A new function ajAcdIsUserdefined allows wrappers to test which
	qualifiers have values changed by the user so that they can use
	shorter command lines to launch the wrapped application.

	jaspscan application added. Scans sequences for transcription
	factors using the JASPAR matrices.

	jaspextract application added to move the JASPAR matrices into the
	EMBOSS data area subdirectories.

	Alignment format "trace", used to display internal data content, is
	renamed to "debug" to be consistent with other formats. A "debug"
	format is added for feature output.

	Application documentation has been updated to remove obsolete
	references to EMBL database identifiers. These are replaced with
	the correct accession numbers.

	Two new entries have been added to the "tembl" test EMBL database
	for use in the QA tests.

	Report output now checks the sequence and feature table type. If
	the sequence is not a valid protein, protein-only formats (pir,
	swiss) will fail with an error message. Similarly, if the sequence
	is not a valid nucleotide sequence then nucleotide-only formats
	(embl, genbank) will fail with an error message.

	Garnier now uses the correct SwissProt and internal feature keys
	for protein secondary structure. The results will appear much
	better for example as a swissprot feature table. This required
	rewriting of the internals by recoding the secondary structure
	features with a "garnier" tag replacing the previous "helix",
	"sheet", "turns" and "coil" tags. The default output is
	unchanged. The results in other report formats will be changed.

	Silent no longer reports the "Dir" column. This is replaced by the
	new "Strand" column which reports "+" for a forward feature and
	"-" for a reverse feature.

	The following programs have changed default report output, with
	the strand included for nucleotide sequences: equicktandem,
	etandem, fuzznuc, fuzztran, recoder, restrict, silent, tcode,
	twofeat. The strand column can be removed with the new command line
	associated qualifier -norstrandshow.

	Reports for nucleotide sequences have confusing ways to represent
	the start and end positions for features on the complementary
	strand. A strand column has been added to these reports,
	controlled by a new -rstrandshow qualifier and attribute. By
	default the strand is shown for all nucleotide reports (see a list
	of changed program outputs above). The start position is always
	lower than the end position for features on the complementary
	strand indicating the region that should be reversed. In past
	releases the seqtable report format (fuzznuc, dreg, dan)
	confusingly reversed start and end positions to indicate the
	unreported strand. For all report formats (nametable, table) the
	start and end positions are now consistent with nucleotide feature
	formats (gff, embl, genbank).
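
	The convention can be sketched as below. This is an illustration
	and not EMBOSS library code. The function name is hypothetical,
	and it assumes a feature passed with start greater than end is on
	the complementary strand.

```python
def report_position(start, end):
    """Return (low, high, strand) for a feature given in either order.

    Assumption for this sketch: start > end means the feature lies on
    the complementary strand.
    """
    if start > end:
        return end, start, "-"
    return start, end, "+"
```

	So report_position(120, 100) is reported as 100, 120 on the "-"
	strand, the same ordering used by gff, embl and genbank features.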

	Reports from dreg incorrectly reported sequences reversed with the
	-sreverse qualifier.

	Report headers now include the text "(Reversed)" when the input
	sequence(s) are reverse complemented.

	Phylogenetic trees in newick format are now parsed into internal
	trees and converted back for use by Phylip. This allows us to
	read other tree formats and pass them to Phylip (e.g. Nexus).

	Some ACD data types did not allow the input to be NULL because
	extra tests were carried out on the results. These are all cleaned
	up and tested so that they can safely be set to nullok and missing
	in local applications.

	New sequence reading formats for PDB files. By default the ATOM
	records are used (format "pdb"). An alternative format "pdbseq"
	will read the SEQRES records which give the original sequence. The
	ATOM records give the sequence determined from the structure.

	Improved the help text for the -stdout and -filter options to
	explain output files are written to standard output. Some users
	expected graphics output (from plplot) to be controlled.

Version 5.0.0 15-jul-2007

	Extractalign is a new application to extract regions from a
	sequence alignment in the same way extractseq extracts regions
	from single sequences.

	The MRS server in Nijmegen changed its syntax just before our
	release. A new database access method "MRS3" supports the main
	MRS3 server. We have very little documentation on the changed URL
	query syntax. Access by ID appears to work at this stage. The
	database URL is defined as http://mrs.cmbi.ru.nl/mrs-3/plain.do
	The plain text output is now defined in the URL. The database
	names have all changed on the server. At present the same server
	appears to still support the old MRS access method with the URL
	http://mrs.cmbi.ru.nl/mrs/cgi-bin/mrs.cgi

	ACD parsing now allows square brackets within quoted strings.

	Functions for lists and tables have been renamed to new standard
	naming conventions. Some source files remain to be standardized
	after the release, most importantly ajfile, ajfeat and some
	remaining ajseq source files.

	Warning messages are available for sequence formats that do not
	allow additional characters. The environment variable
	EMBOSS_SEQWARN needs to be set to "Y" to enable warnings. For
	example, EMBL format allows numbers in the sequence records. Fasta
	and related formats now warn for any characters that are not
	whitespace and not known sequence characters. These warnings are
	controlled by an environment variable so they can be disabled (or
	enabled) for specific installations and/or wrappers. We expect
	many cut-and-paste inputs can generate warnings. EMBOSS will
	normally silently remove non-sequence characters.

	Regular expression pattern file names (for dreg and preg) were
	converted to upper case if the ACD file required the patterns to
	be upper case.

	The EMBOSS commandline now accepts gnu-style syntax with
	--qualifier (we allow one or two '-' characters). Users who tried
	this syntax were confused because EMBOSS treated --qualifier as a
	parameter. In many cases it was used as the output filename, which
	would give no error message but make it hard to find the output.
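
	A rough sketch of the rule (one or two leading '-' characters
	introduce a qualifier) is shown below. The function name is
	hypothetical and the real command line parser handles many more
	cases, for example negative numbers used as values.

```python
def qualifier_name(arg):
    """Return the qualifier name for '-name' or '--name', else None.

    None means the argument would be treated as a parameter value.
    """
    if arg.startswith("--"):
        name = arg[2:]
    elif arg.startswith("-"):
        name = arg[1:]
    else:
        return None
    # Reject an empty name and three or more leading '-' characters
    return name if name and not name.startswith("-") else None
```

	Both qualifier_name("-outfile") and qualifier_name("--outfile")
	give "outfile", while a plain filename gives None.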

	Antigenic now accepts any protein sequence as input (earlier
	versions did not allow ambiguity codes). B and Z are treated as
	weighted averages of D/N and E/Q. All others are converted to X
	and treated as a weighted average of all values. The data table
	used has no information for selenocysteine or pyrrolysine.

	Dottup is corrected to plot only the selected sequence range. The
	plot lines were 1 residue too long (only noticeable on very short
	sequences).

	Distance matrix data can now read multiple distance matrices from
	a single input file. This is used by three programs (fneighbor,
	ffitch and fkitsch) in the phylipnew EMBASSY package.

	Discrete states input now correctly defaults to all non-space
	characters if no characters attribute is given in the ACD file.
	This was the intention, but two programs (fpars and fdiscboot)
	were instead accepting only 0 and 1. Other phylip programs have
	their discrete state character set specified in the ACD file.

	A new function ajSystemOut calls a system command, and redirects
	standard output to a named file.

	Function names are standardised for the ajsys, ajtime and ajutil
	functions.

	New function ajStrTableFreeKey frees only the key from tables
	where the value is a constant.

	Error messages from reading badly formatted comparison matrix
	files are improved to report the line and the token that failed
	to parse.

	Test data has been updated. EMBL and SwissProt entries are updated
	to the latest versions of these entries. Swnew entries are now a
	selection from the SpTrEmbl subset in UniProt. The wormpep
	database is obsolete. We do not have current data for the gb
	directory which contained GCG reformatted genbank entries.

	NBRF (or PIR) format failed to read some entries from SRSWWW
	servers because the sequence ID does not match if the protein is a
	fragment.

	Efficiency of building large strings is greatly improved by
	doubling the reserved space each time the end is reached. This
	speeds up the reading of all long sequences.
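
	The effect of doubling can be seen in a small sketch that counts
	how many times a buffer has to grow. The starting capacity of 16
	and the fixed step of 1024 are arbitrary values chosen for this
	illustration and are not taken from the AJAX string code.

```python
def reallocations(final_length, grow):
    """Count how often a buffer must be resized to reach final_length.

    grow(capacity) returns the next capacity. The starting capacity of
    16 is an arbitrary choice for this sketch.
    """
    capacity, count = 16, 0
    while capacity < final_length:
        capacity = grow(capacity)
        count += 1
    return count

# Growing to one million characters
doubling = reallocations(1_000_000, lambda c: c * 2)       # 16 resizes
fixed_step = reallocations(1_000_000, lambda c: c + 1024)  # 977 resizes
```

	Each resize may copy the string built so far, so fewer resizes
	means far less copying for long sequences.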

	String function ajStrFmtWrap, which wraps strings for output, now
	respects newlines in the original string. A new function
	ajStrFmtWrapAt prefers to wrap at a selected character, for
	example ',' for author lists.

	Sequence objects are extended to include the full set of fields
	defined in EMBL, Genbank and UniProt database entries. The "embl"
	"genbank" and "swissprot" formats now read and write all fields,
	so that entries will be rewritten exactly as in the originals
	except for a few minor corrections (extra spaces in feature tables
	are removed). We cannot guarantee that information is preserved
	when writing out in a different format. For example, EMBL and
	Genbank formats do not contain the same information.

	GIF graphics output added where the gd library is a recent enough
	version to provide support.

	The plplot graphics library has been updated to 5.7.2. New files
	are disptab.h and pldll.h. File gd.c replaces file gdpng.c and
	needed one change for FREETYPE.

	Infoseq can now optionally display the database name.

	The acdvalid utility warns about qualifier names that do not fit
	the standard naming convention. The messages now include a
	suggested valid name, for example an input file called -sites
	will be suggested as -sitesfile.

	Sequence output in EMBL and SWISS formats now defaults to the new
	format of the databases from 2006. The previous formats are still
	available as "emblold" and "swissold". As sequence input, "embl"
	and "swiss" formats will read both versions of the files.

	Function ajTableRemove deletes an entry in a table, but only
	returns the value. This is replaced by ajTableRemoveKey which also
	returns the original key. The caller now owns both the value and
	the key, and is responsible for deleting them. ajTableRemove is now
	declared obsolete and will be removed from a future release.

	Infoseq by default uses columns with fixed width, but this fails
	to delimit long sequence names (for example, long file names and
	paths). Two changes make this better. Infoseq now inserts a space
	in column-delimited output (the default) when a string fills the
	whole column. It is also now possible to specify a tab as
	delimiter with -nocolumn -delimiter "\t" to return to 3.0.0
	behaviour. This was needed for the W2H interface and maybe some
	other wrappers.

	Renamed libplplot to libeplplot and plplot headers are now
	installed to include/eplplot. This avoids collisions with later
	versions of plplot.

Version 4.1.0 04-mar-2007

	Bugfix 1: graphics output failed to reset the title correctly in
	some applications. Prettyplot and banana badly rescaled the output
	from the second page of multipage output. Abiview produced
	additional blank pages with only the title. Abiview also had bugs
	in display when the user changed the window size or asked for
	separate plots for each trace.

	A new ACD attribute outputmodifier: "Y" identifies qualifiers that
	cause the kinds of output changes that can break parsers. An
	obvious example is the -html qualifier on many of the utility
	programs. This attribute is a warning to wrapper developers and
	maintainers that they may want to fix the value of this qualifier
	and not allow users to change it. In some cases (as with toggle
	qualifiers) it may be useful to wrap each possible value
	separately. For example, tfm can run as an HTML version (-html)
	and a text version (-nohtml -nomore).

	Backtranseq now keeps stop positions in the sequence and replaces
	them with the most common stop codon. Previous releases converted
	stops to 'X' and back translated them as 'NNN'.

	Reading sequences in NBRF (or PIR) format now only removes one '*'
	from the end, allowing protein sequences to end with a stop codon.
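
	The difference between removing one '*' and removing all trailing
	'*' characters is sketched below. The function name is
	hypothetical and this is not the NBRF parser itself.

```python
def strip_nbrf_terminator(sequence):
    """Remove only the final '*' that terminates an NBRF sequence."""
    return sequence[:-1] if sequence.endswith("*") else sequence
```

	With this rule "MKV**" becomes "MKV*", keeping the stop, whereas
	stripping every trailing '*' would give "MKV".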

	Reading NBRF format sequences in FASTA format was retaining a ';'
	in front of the sequence ID. This is now fixed.

	Pattern files and regular expression files now use the -pformat
	and -pname associated qualifiers which were ignored when they
	first appeared in 4.0.0. Pattern file formats are "fasta" for the
	original format in 4.0.0 with FASTA style identifiers, and
	"simple" for files with a single pattern on each line. The format
	defaults to testing the first character for a '>'. The pattern
	name is used to set a name of "name1", "name2" and so on if no
	name is in the FASTA file. By default patterns are called
	pattern1, regular expressions are called "regex1".
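
	The format test and default naming can be sketched as follows.
	This is an assumption-laden illustration, not the pattern list
	code: the function name is hypothetical, and it assumes unnamed
	patterns are numbered by their position in the file.

```python
def read_patterns(text, pname="pattern"):
    """Parse pattern file text into (name, pattern) pairs.

    Sketch only: 'fasta' is assumed when the first character is '>',
    otherwise 'simple' (one pattern per line). Unnamed patterns are
    numbered using the pname prefix.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    patterns = []
    if text.startswith(">"):
        for line in lines:
            if line.startswith(">"):
                patterns.append([line[1:].strip(), ""])  # new entry
            else:
                patterns[-1][1] += line                  # pattern text
    else:
        patterns = [["", line] for line in lines]
    return [(name or f"{pname}{i}", pat)
            for i, (name, pat) in enumerate(patterns, start=1)]
```

	A simple file with two lines and pname "regex" gives the names
	regex1 and regex2.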

	Added a new function to read from a buffered file and trim
	newlines. It was not needed before because input functions were
	doing their own trimming.

	Valgrind memory leak tests now cover all QA tests. The command
	line is captured and used to generate test cases. Script
	valgrind.pl knows about the few cases that need input files copied
	and preprocesses them by name. A few tests can be flagged as
	ignored. This is intended for tests known to run for a very long
	time under valgrind. Memory leaks are fixed for all programs in
	the main EMBOSS package and for the most used ones in the EMBASSY
	packages.

	A new environment variable ACDCOMMANDLINELOG takes a filename as
	its value. This saves the command line equivalent of a program
	run, converting user responses to prompts into their command line
	equivalents. A number of bugs in command line saving for report
	headers were identified and fixed.

	Two string functions had their names reversed. ajStrRemoveWhite is
	to remove all white space from a string, ajStrRemoveWhiteExcess is
	to remove white space from the ends and replace internal
	whitespace with single spaces. When function names were
	standardized these names were reversed. As function calls were
	converted automatically EMBOSS code worked as before, but
	developers will notice the functions do not behave as
	expected. This is now corrected, and all existing calls in the
	EMBOSS code have been checked and converted.
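
	The intended behaviour of the two functions, as described above,
	can be illustrated with equivalents written for this note. These
	are not the AJAX implementations and the names are hypothetical.

```python
def remove_white(text):
    """Remove all white space, as described for ajStrRemoveWhite."""
    return "".join(text.split())


def remove_white_excess(text):
    """Trim the ends and collapse internal runs of white space to
    single spaces, as described for ajStrRemoveWhiteExcess."""
    return " ".join(text.split())
```

	For the input "  ACG  T<tab>GA " the first gives "ACGTGA" and the
	second gives "ACG T GA".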

	Showseq with a sequence end position now stops output at the end
	of the user-specified range. Previous releases printed the whole
	of the line with the last base/residue.

	SRS servers use "gid" as the field name for GI numbers. The field
	name has been changed to allow GI searches with local SRS and
	remote SRSWWW access to Genbank.

	A new configure option for developers --enable-devwarnings
	turns on many more warning messages from the gcc compiler. Not all
	warnings are useful - the less useful gcc options are documented
	(and commented out) in the configure.in file devwarnings section.
	Warnings include missing function prototypes, signed/unsigned
	comparisons, potential loss of precision in casts, use of global
	names (index for example) as variables.

	Function names in ajseqwrite.c have been standardised. Old names are
	still accepted but are marked as "deprecated" and will generate
	warnings with the gcc compiler (see ajstr below). Other compilers
	will see no difference.

	Edialign is a new application, a port of the DIALIGN2 program by
	B. Morgenstern, using an ACD file written by Guy Bottu.  It takes
	as input nucleic acid or protein sequences and produces as output
	a multiple sequence alignment. The sequences need not be similar
	over their complete length, since the program constructs
	alignments from gapfree pairs of similar segments of the
	sequences.

	Wordfinder is a new application to find word-based matches of
	limited size. It is based on code from supermatcher. The inputs
	are reversed so the query sequence set (unaligned) is compared to
	a streamed database of sequences. (Supermatcher should perhaps
	have its inputs in this order too). Limits are provided for the
	length of the word match and the length of the alignment. The
	default gap penalties are also increased to limit the gaps allowed
	in alignment.

	Word-based algorithms found too many matches where both sequences
	contain runs of X (protein) or N (nucleotide). These are now
	ignored when building the word table.

	Word-based algorithms complained if a sequence was shorter than
	the wordsize. This was a problem for database searches with some
	short sequences present. They now run silently and simply return
	no word matches.

	The EMBL format sequence entry parser was able to read swissprot
	sequence data, but not the feature table. Efficiency improvements
	to set the sequence type to nucleotide for EMBL entries showed
	that swissprot entries were being read by the EMBL parser. A test
	for swissprot protein information on the ID line should redirect
	these entries to the swissprot parser. In previous releases the
	sequence type was not set, so there was no problem with the
	sequence type - although feature lines may not have been readable
	from swissprot format flat files. Database definitions specify the
	swiss or embl format so they are not affected.

	Large sequences were running very slowly. This was traced to the
	way sequence types are tested using regular expressions processed
	by calls to the PCRE library. These calls were replaced by simple
	string functions as they are only testing that a sequence is
	entirely composed of characters from an allowed set. An
	additional speedup was achieved by defining only upper case
	characters as required (almost halving the number of tests) and
	testing the upper case version of the sequence characters.
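
	The idea is sketched below with a set membership test in place of
	a regular expression. The reduced alphabet and the function name
	are assumptions for illustration; the sets used for each EMBOSS
	sequence type are larger.

```python
# Upper case characters only. A reduced alphabet chosen for this sketch.
NUCLEOTIDE = frozenset("ACGTUN")


def is_all_from(sequence, allowed_upper):
    """True if every character, upper-cased, is in allowed_upper."""
    return all(ch in allowed_upper for ch in sequence.upper())
```

	Because only upper case characters are listed, a mixed case
	sequence such as "acgtNNacgt" still passes the test.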

	Sequence translation in the reverse direction adds extra amino
	acids for partial codons. In the forward direction the overhang
	was miscalculated so these codons were missed. No users have
	complained, probably because in most cases they are translated as
	'X' (it needs a 4-base wobble in the code to convert the first 2
	bases of a codon into a single amino acid).

	Sequence translation was relatively slow, at least on very large
	sequences. Profiling with gprof indicated some changes to reduce
	the number of string handling calls (each was very fast, but
	there was a very large number of calls). The internal tables were
	resized (from 15 elements to 16) for more efficient mapping.

	Parsing NCBI format ID lines saves the database. This is available
	for writing NCBI formatted output ID lines, but is not to be used
	in reporting the USA.

	Added "refseq" as a sequence and feature format. Initially a
	simple alias of GenBank but we may let them diverge later.

	REFSEQ entries have their own idea of what a ProteinID in the
	feature table looks like, as they use REFSEQP protein IDs.
	Validation now allows the third character to be an underscore.

	Large numbers of database files could make the dbi indexing
	programs (dbiflat, dbifasta, dbigcg, dbiblast) fail at the sort
	merge stage when the index files are combined. The sort merge is
	now in 2 steps to limit the number of open files required in the
	system sort utility.
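
	The stepwise merge can be sketched as below, with lists standing
	in for sorted index files. The function name and the max_open
	parameter are hypothetical. The dbi programs use the system sort
	utility in 2 steps, whereas this sketch repeats the intermediate
	step as often as needed.

```python
import heapq


def merge_in_steps(sorted_runs, max_open):
    """Merge sorted runs while combining at most max_open at a time.

    Lists stand in for sorted index files in this sketch.
    """
    if max_open < 2:
        raise ValueError("max_open must be at least 2")
    runs = [list(run) for run in sorted_runs]
    # Intermediate merges: combine groups of max_open runs
    while len(runs) > max_open:
        runs = [list(heapq.merge(*runs[i:i + max_open]))
                for i in range(0, len(runs), max_open)]
    # Final merge of the remaining runs
    return list(heapq.merge(*runs))
```

	No single merge ever reads from more than max_open inputs, which
	is the point of splitting the work.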

	Added a script emblsplit.pl to split EMBL and UniProt database files
	into 2Gbyte chunks.

	The -sid qualifier now overwrites the sequence id if used. The
	-sid value will be used for creating the output filename and for
	reporting the sequence identifier in output files. For more than
	one sequence as input currently the same ID is used. We may change
	this in future to generate new IDs from this base name.

	New sequence format gifasta is the same as "ncbi" but uses the GI
	number as the identifier. Because the output is the same for both
	formats we have to require -sformat gifasta to be on the
	commandline. The default for such files will remain "ncbi" as the
	automatically processed format. On output if there is no GI number
	a dummy value of "000000" is currently used.

	coderet now writes non-coding sequence to a new output file.

	New feature function ajFeatLocMark marks selected features as
	lower case. Used by coderet to report non-coding regions.

	The help output now correctly reports output sequence default
	filenames.

	Phylip input distance matrices now allow integer values to be
	treated as reals, although there is a possible confusion over
	integer replicate values so the use of a trailing ".0" is strongly
	recommended.

	Sequences with NCBI deflines and no ID after the final "|" were
	using the version part of the seqversion ("1" from "AB123456.1")
	instead of the "AB123456" part to set the ID.
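
	The corrected choice amounts to taking the part of the seqversion
	before the '.', as in this small sketch (hypothetical function
	name, not the NCBI defline parser).

```python
def id_from_seqversion(seqversion):
    """Return the accession part of 'ACCESSION.VERSION'."""
    accession, _, _version = seqversion.partition(".")
    return accession
```

	For "AB123456.1" this gives "AB123456" rather than "1".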

	Graph titles were not standard on the general "graph" type output,
	but are consistent for xygraph outputs. A new attribute gdesc
	defines a prefix for graph titles which can be appended to by the
	calling program, usually with a description of the input (sequence
	USA, input filename). A new call ajGraphSetTitlePlus defines the
	text to add to the gdesc as "[gdesc] of [text]". All graphs were
	standardized except pepinfo which has 10 subplot titles already in
	the intended format. This will be corrected later to have standard
	main titles and shorter subplot titles.

	The version of plplot we use has a bug in calculating character
	sizes where the origin in user units is not the default of
	(0,0). This has been fixed in the plgchrW and plstrlW functions in
	the copy that is included with EMBOSS.

	Dreg and preg ignored sequence begin and end positions. Both
	programs now use the embpatlist function calls to process sequence
	ranges.

	Fuzznuc, fuzzpro and fuzztran lost the ability to use the sequence
	begin and end positions when we switched to pattern lists. This
	has been restored in the pattern list processing code.

	The logfile caused a file close error if it was read only (because
	it had not been successfully opened). Opening the logfile now
	tests the file is writable and ignores logging for a read-only file.

	More case-sensitive sequence comparison and matching functions
	added to be consistent about providing both versions.

	A few sequence databases have no accession number. For these a new
	database attribute hasaccession: "N" in emboss.default prevents
	EMBOSS trying to search the ACC field in addition to the ID field.

	A few databases with duplicate IDs should be treated as
	case-sensitive. The original example was a pdbprot database,
	containing FASTA format sequences of individual chains from PDB
	entries. In PDB, the entry itself is a 4-character string, and the
	chain is a single character A through Z. When an entry has more
	than 26 chains, the next 26 are labelled a through z. Pdbprot
	appends these as _A, _B, etc. PDBPROT is available from some
	public SRS servers - see the official list at
	http://downloads.lionbio.co.uk/publicsrs.html.
	This is resolved by adding a new database attribute caseidmatch in
	emboss.default. A value of "Y" will force EMBOSS to exactly match
	the case of the whole ID. This is done by post-processing and
	rejecting entries with an ID that fails to match.
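
	The post-processing step can be sketched as a filter over the
	entries returned by a case-insensitive lookup. The function name,
	the (id, data) tuples and the example IDs are assumptions for
	illustration only.

```python
def filter_case_matches(query_id, entries, caseidmatch=False):
    """Keep entries whose ID matches query_id.

    entries is a list of (entry_id, data) pairs from a case-insensitive
    lookup. With caseidmatch the case must match exactly as well.
    """
    if caseidmatch:
        return [e for e in entries if e[0] == query_id]
    return [e for e in entries if e[0].upper() == query_id.upper()]


# Hypothetical IDs for two chains that differ only in case
hits = [("1abc_A", "chain A"), ("1abc_a", "chain a")]
exact = filter_case_matches("1abc_a", hits, caseidmatch=True)
```

	Here exact holds only the "1abc_a" entry, while the default
	behaviour would return both.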

	The run date included in report output has changed format to have
	the day first and to lose the leading zero when the day is 1st to
	9th of the month.

	Program cpgplot can run on more than one input sequence, but the
	plot failed on the second sequence. Fixing this required adding a
	new function ajGraphDataReplaceI to replace the 1st, 2nd, 3rd,
	etc. subgraph. Some memory cleanup was also added to remove
	the replaced graph data objects.

	Programs pepwindow and pepwindowall can now process any
	protein sequence. In previous versions pepwindow was restricted to
	pureprotein (no ambiguity codes) while pepwindowall accepted any
	protein sequence (it has to handle gaps) but was using a score of
	zero for unknown amino acid residues. Changed so that missing amino
	acid values can be filled in using Dayhoff frequency weighted
	averages for B, J and Z and an overall average for X, J and O.
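
	A frequency weighted average of this kind can be sketched as
	below. The function name is hypothetical and the numbers are
	placeholders, not real property values or Dayhoff frequencies.

```python
def weighted_average(values, frequencies):
    """Average values[aa] weighted by frequencies[aa] over the keys
    of values."""
    total = sum(frequencies[aa] for aa in values)
    return sum(values[aa] * frequencies[aa] for aa in values) / total


# Placeholder numbers: B is resolved from D and N
score_b = weighted_average({"D": 1.0, "N": 3.0}, {"D": 3.0, "N": 1.0})
```

	With these placeholder inputs score_b is 1.5, closer to the value
	for the more frequent residue.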

	Program octanol can accept any protein sequence. Interpolated
	values are used for B, Z and J. An average over all values is used
	for X and also for O and U where there is no data. Interpolations
	and averages used the Dayhoff amino acid frequencies.

	Program iep can accept any protein sequence. Ambiguity codes B and
	Z are resolved by converting to the carboxylic acid (D or E) or
	amide (N or Q) according to the Dayhoff amino acid frequencies,
	giving a consistent value for any input protein.

	Sequence set type testing was checking whether the seqset is
	defined as protein but ignoring the type of the first
	sequence. This is now fixed.

	Program tfm looks in the obsolete install directory with the -html
	option. Changed to find the embassy package name from the
	installed ACD file and then to find the installed HTML file. If
	EMBOSS has not been installed, will also search the original
	source files.

	Modified NCBI/FASTA format to preserve the database name from the
	NCBI style ID. The database name is reported in one of the many
	and varied NCBI syntax variants, depending on whether there is a
	version or accession number, and whether there is an EMBOSS
	database name also involved (for example, an entry in a file
	indexed with dbxfasta or dbifasta).

	Modified "pearson" sequence format to keep the FASTA file ID
	complete. For historical reasons GCG-style dbname:id syntax was
	still having the db part trimmed. This will still be trimmed from
	fasta or ncbi format.

	The report for digest has Cterm and Nterm columns capitalised to
	match the rest of the report. Sequence ranges now give correct
	cterm and nterm results.

	The list file Cut.index for codon usage tables was changed to
	remove old file names (commented out list at the end) and to
	remove underscores from the species names.

	Programs water, needle, merger and prophet calculate an internal
	path size from the lengths of the input sequences. For sequences
	that are too long, a fatal error is produced. But if the sequences
	are extremely long, the test failed and the program gave a
	segmentation fault. This fix tests in a different way that will
	catch all cases. (added as a fix to 4.0.0)

	The new MRS access method used a general search. This gave strange
	results when the ID or accession appeared in any other entry. It
	appears that MRS can search for id or accession only. This worked
	on the main MRS server at least. (added as a fix to 4.0.0)

	New database access methods MRS and DBFETCH need to be explicitly
	turned on so that showdb can report them. (added as a fix to
	4.0.0)

	When deleting the last line of buffered input, failed to reset the
	pointer to the last buffered line. This only affected debug
	traces. Unfortunately, the ajFileBuffClear function does call the
	debug trace. In practice we have only seen this bug when
	processing sequence data in EMBL format from an MRS server. (added
	as a fix to 4.0.0)

	Pattern and regular expression searches failed to correctly
	reverse a nucleotide sequence. The change is to use
	ajSeqReverseForce (always reverses the sequence provided) instead
	of ajSeqReverseDo (which only reverses if the reverse flag is
	set). (added as a fix to 4.0.0)

	Reports in list format failed to write a usable USA for "asis"
	sequence input, and incorrectly reported reverse strand nucleotide
	features. (added as a fix to 4.0.0)

	The list files Matrices.nucleotide, Matrices.protein and
	Matrices.proteinstructure now have comment headers explaining
	their format.

	Fixed issues with nucleotide features in the reverse direction in
	reports. The start/end positions were stored the wrong way around
	and then reversed again when reported in one of the report
	formats. However, reporting as EMBL features showed the incorrect
	storage. ajFeatNewII now checks start/end and reverses the feature
	if start is greater than end. ajFeatNewIIRev sets the reverse
	strand and also checks that the start position is greater than (or
	equal to) the end position. (added as a fix to 4.0.0)

	To reduce the size of very large reports, for example when fuzznuc
	or fuzzpro run over very large databases, new qualifiers are added
	to report output. -rmaxseq gives the maximum hits for any one
	sequence, -rmaxall gives the total maximum number of hits. The
	report tail contains a record of the number of hits reported and
	found. The qualifiers are intended for web interfaces to control
	the maximum output they need to report. When the maximum hits
	figure is reached, ajReportWrite returns false so that programs
	can terminate at that point. (added as a fix to 4.0.0)
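
	The interaction of the two limits can be sketched as follows.
	The class and its attribute names are hypothetical and this is
	not the ajReportWrite code. The sketch assumes a limit of 0 means
	"no limit".

```python
class ReportLimiter:
    """Track hits against per-sequence and overall maxima (0 = no limit)."""

    def __init__(self, max_seq=0, max_all=0):
        self.max_seq, self.max_all = max_seq, max_all
        self.found = self.reported = 0

    def write(self, hits):
        """Record one sequence's hits. Return False once max_all is reached."""
        self.found += len(hits)
        # Per-sequence limit
        keep = hits[:self.max_seq] if self.max_seq else hits
        # Overall limit across all sequences
        if self.max_all:
            keep = keep[:max(self.max_all - self.reported, 0)]
        self.reported += len(keep)
        return not (self.max_all and self.reported >= self.max_all)
```

	A caller can stop reading sequences as soon as write returns
	False, and use found and reported for the report tail.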

	Reports now write a header and tail when closed, to make sure that
	all programs will write something to the report file. The default
	header contains the command line provenance, the tail contains the
	number of sequences and hits. (added as a fix to 4.0.0)

Version 4.0.0 15-jul-2006

	The format of the knowntypes.standard file in the emboss/acd
	directory has changed to list the knowntype first, then the
	datatype and finally the description. The file should be sorted by
	knowntype, and any description should not end in "file" so that
	file and directory prompts can be generated.

	Standard prompts can be generated from the knowntype for files,
	directories and other data types. This can reduce the need for
	special information: attributes, but to help those who maintain
	parsers and wrappers we will try to keep an information string in
	the ACD file to match the prompt generated by EMBOSS. Acdvalid
	will report cases where the information string does not match the
	generated prompt. There may be a few cases where two inputs or
	outputs of the same knowntype are needed.

	The output produced by -help provides more information about
	associated qualifiers than the HTML table view (from acdtable)
	which is included in the HTML documentation in the
	distribution. However, there is also a lot of extra information
	in the acdtable output on the default values and the allowed
	values for each qualifier. The -help output is now expanded to
	include all the information provided by the acdtable view. A
	benefit of this is that we can now remove the badly formatted
	acdtable from the text version of the documentation. This is used
	by tfm so the output of the tfm program will now be easier to read.

	The default prompts for input and output files have been very
	simple for the first 10 years. EMBOSS now has a "known type"
	defined for all files in ACD. The known type is now included in
	the automatically generated prompt for input and output files. To
	help in this process, the known type should not have the word
	"file" at the end. This will be added automatically in the prompt.

	Printing with conversion type %g could write extra zeros where the
	decimal point was stripped. In C, %g conversion removes trailing
	zeros and the decimal point if nothing remains after it. The AJAX
	print conversion functions added extra zeros at start of the
	output to extend the result up to the expected width.
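
	The standard %g behaviour is easy to check outside EMBOSS.
	Python's % operator follows the C printf rules for %g, so the
	lines below show the expected results rather than the AJAX code.

```python
# Python's % operator follows the C printf rules for %g.
plain = "%g" % 100.0     # trailing zeros and the decimal point are removed
short = "%g" % 1.50      # only the trailing zero is removed
padded = "%8g" % 100.0   # width is made up with leading spaces, not zeros

print(repr(plain), repr(short), repr(padded))
```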

        Prophet modified to use an "align:" ACD definition rather than an
	"outfile:".  A bug which was mixing up the name of the profile with
	the name of the sequence has been fixed.

        Simple XML DOM added. This has no additional library
	dependencies. This is a preliminary step in producing (revisiting)
	XML graphics output etc.

        EMBL/Genbank have agreed to add a new amino acid code 'O' for
        pyrrolysine. O has been added to EMBOSS checking for protein
        sequence data, and to the existing data files that contain 'U'
        (selenocysteine). IUPAC/IUBMB has accepted the use of O for protein
        sequences. This means that any alphabetic text is now a valid
        protein sequence. There are 20 naturally occurring amino acids,
        plus 'X' (unknown), 'B' and 'Z' ('D' or 'N' and 'E' or 'Q' for
        analysis of complete digests), 'J' ('I' or 'L' in mass spectrometry),
        plus 'U' (selenocysteine) and 'O' (pyrrolysine). There is a small
        complication - older versions of phylip sometimes use 'O' as a gap
        character. EMBOSS will still allow this in nucleotide sequences.
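
	The claim that any alphabetic text is now a valid protein
	sequence can be checked by listing the codes. The sketch below is
	illustrative Python, not the EMBOSS type-checking code. It ignores
	gap and stop characters, and its names are hypothetical.

```python
import string

STANDARD = set("ACDEFGHIKLMNPQRSTVWY")   # 20 naturally occurring residues
EXTRA = set("XBZJUO")                    # unknown, ambiguity codes, Sec, Pyl

PROTEIN_CODES = STANDARD | EXTRA


def is_protein_text(text: str) -> bool:
    """True if every character is an accepted protein code (case-insensitive)."""
    return bool(text) and set(text.upper()) <= PROTEIN_CODES
```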

        New sequence access method "mrs" uses CMBI's "Maarten's Retrieval
        System" http://mrs.cmbi.ru.nl/mrs/cgi-bin/mrs.cgi to query
        databases by ID or accession.

        New sequence access method "dbfetch" uses the EBI's dbfetch REST
        services http://www.ebi.ac.uk/cgi-bin/dbfetch to query databases
        by ID or accession.

	iep changed to allow users to specify number of modified
	(uncharged) lysines and intrachain disulphide bridges. This
	includes extensions to embIep functions to include the two new
	parameters. These updates were provided by Clemens Broger of
	F. Hoffmann-La Roche Ltd.

	Changes to splitter and union by Kim Rutherford (Artemis
	maintainer at the Sanger Institute) allow features to be preserved
	for nucleotide sequences. The default operation of both programs
	is unchanged.

	Regular expression pattern lists are accepted by dreg and preg.
	The output reports include pattern names which default to regex1,
	regex2, and so on. The "regex" prefix can be set using the new
	associated qualifier -pname on the command line.

	Prosite pattern lists are accepted by fuzznuc, fuzzpro and fuzztran.
	The output reports include pattern names which default to pattern1,
	pattern2, and so on. The "pattern" prefix can be set using the new
	associated qualifier -pname on the command line.

	Regular expressions have the same syntax as the new pattern
	datatype - they can be in a file, with pattern names, and have a
	qualifier -pname to set the name for a pattern. Regular
	expressions also have a type defined in ACD which can be
	nucleotide (e.g. for dreg), protein (e.g. for preg) and string for
	general patterns. Function ajAcdGetRegexSingle will read a single
	regular expression. ajAcdGetRegex now reads a list of regular
	expressions.

	New ACD pattern type reads a PROSITE style pattern, or @filename
	where filename contains patterns with names in FASTA
	format. Patterns in the file are concatenated if on multiple
	lines. The file may also contain mismatch=n after the ID to set
	the number of mismatches for a pattern. Patterns also have
	associated qualifiers -pmismatch and -pname for the pattern on the
	commandline or all patterns in the file.
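
	A minimal sketch of reading such a pattern file, assuming FASTA
	style ">name" header lines with an optional mismatch=n token
	after the name. The exact header syntax accepted by EMBOSS may
	differ. The function name is hypothetical and this is Python, not
	the ajPat C code.

```python
import re


def parse_pattern_file(text: str, default_mismatch: int = 0) -> list:
    """Return (name, mismatch, pattern) tuples from FASTA-style pattern text."""
    records = []
    name, mismatch, parts = None, default_mismatch, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                records.append((name, mismatch, "".join(parts)))
            fields = line[1:].split(None, 1)
            name = fields[0] if fields else ""
            rest = fields[1] if len(fields) > 1 else ""
            found = re.search(r"mismatch=(\d+)", rest)
            mismatch = int(found.group(1)) if found else default_mismatch
            parts = []
        elif name is not None:
            # Pattern text on several lines is concatenated.
            parts.append(line)
    if name is not None:
        records.append((name, mismatch, "".join(parts)))
    return records
```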

	Pattern processing is changed to use lists of patterns, as
	submitted by Henrikki Almusa of Medicel in Helsinki. This is
	implemented as new ACD data type "pattern" which required some
	nucleus embPat functions and data types to be moved to AJAX ajPat
	so that they can be called from ajacd.c

        "a2m" alignment format (which is just fasta) is now supported in
	ACD.

        New EMBASSY MEME package containing "wrapper" applications
	providing an EMBOSS-style interface to the applications in
	the original MEME package version 3.0.14 developed by Timothy
	L. Bailey.  The package is fully documented.

	New EMBASSY HMMER package contains "wrapper" applications
	providing an EMBOSS-style interface to the applications in
	the original HMMER package version 2.3.2 developed by Sean Eddy.
	The package is fully documented.

        ACD dirlist: order of list of files is now system-independent.

	fuzztran: now always generates an output file, even if there
	is no data.

	coderet: now writes any permutation of cds, mrna and protein
	sequence output to separate files.  Output file formats may
	be set independently and have the default file extensions of
	"cds", "mrna" and "prot".

	oddcomp: New ACD option to set the window size equal to length
	of the current protein. Code cleaned up.

	Restrict: alphabetic sorting fixed in the case where -limit
	is specified

	Digest changed to add ragging option. Original code was
	contributed by Gregoire R Thomas.

	infoseq: code largely rewritten.  Two new advanced ACD options
	to specify output using a user-defined delimiter or in columns.
	Output much cleaner, e.g. columns are aligned.

	Digest changed to read a sequence stream (earlier versions read
	only one sequence). Code for this was contributed by Henrikki
	Almusa of Medicel in Finland.

	Two new programs makenucseq and makeprotseq have been submitted by
	Henrikki Almusa of Medicel in Finland. They create sets of random
	sequences. Sequence composition can be specified by a codon usage
	file or by pepstats output.

	New format "swissnew", with aliases "swnew" and "swissprotnew",
	added.  UniProt has announced future changes to the UniProt entry
	format, which is still called "swiss" in EMBOSS. The ID line has
	"Reviewed" and "Unreviewed" in place of "STANDARD" and
	"PRELIMINARY", and no longer has the "PRT;" placeholder for the
	EMBL format "division" - now obsolete as EMBL has changed this
	part of their ID line in the latest release. In EMBOSS 4.0.0 we
	replace "STANDARD" with "Unreviewed" as more appropriate to
	entries that come from FASTA files and other sources.

	Programs which analyze nucleotide features now call ajFeatGet
	functions in most places. In previous releases, some of these
	programs used the internal feature data structures directly.

	GFF format feature files are designed for nucleotide
	sequences. EMBOSS supports the use of GFF for protein sequence.

	Feature keys (to use the EMBL/Genbank feature table term) are now
	defined with external names for each format and a list of internal
	names to be used by EMBOSS. This greatly simplified the
	conversion of SwissProt and PIR feature tables. The internal table
	also has a list of aliases. The internal aliases for nucleotide
	features are as far as possible identifiers from the Sequence
	Ontology SOFA (feature annotation) subset. In a few cases, where
	multiple EMBL/Genbank terms map to a single SOFA term, new terms
	have been added to extend the SOFA name uniquely (we simply append
	the EMBL/Genbank feature key).

	MSF format files with more than 5000 sequences were truncated on
	input - only the first 5000 names were being read. This limit has
	been removed. As "emma" uses MSF format for the clustalw run it
	launches, this problem limited emma to 5000 output sequences in
	previous releases.

	The EMBL database has changed its ID line. The new line has
	semicolons after each token, the primary accession instead of the
	ID (there is no ID in the new EMBL format), and the sequence
	version as a number. Internally in EMBOSS we continue to build the
	accnum.n style sequence version. We expect most other packages
	will take some time to change EMBL formats, so for output this is
	called "emblnew" format. As input, "embl" format will accept both
	the old and new style entries. For database indexing, dbiflat and
	dbxflat will read old and new formats as "embl" by looking for SV
	on the ID line. EMBL and EMBLNEW format output is also improved by
	wrapping long DE lines.
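
	As an illustration of telling the two ID line styles apart, the
	sketch below looks for "SV" as the second token and builds the
	accnum.n style version. It is Python written for this note, not
	the EMBOSS parser. The function name is hypothetical and it
	assumes a well-formed ID line.

```python
def parse_embl_id(line: str) -> dict:
    """Parse an old- or new-style EMBL ID line into a small dict."""
    body = line[2:].strip().rstrip(".")
    tokens = [t.strip() for t in body.split(";")]
    if len(tokens) > 1 and tokens[1].startswith("SV "):
        # New style: primary accession first, then "SV n".
        acc = tokens[0]
        version = tokens[1].split()[1]
        return {"style": "new", "name": acc, "seqversion": f"{acc}.{version}"}
    # Old style: the entry name is the first word; no version on this line.
    return {"style": "old", "name": tokens[0].split()[0], "seqversion": None}
```

	For example, "ID   X56734; SV 1; linear; mRNA; STD; PLN; 1859 BP."
	is treated as new style and yields the version string X56734.1.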

	Wossname will now search for each word in a phrase used as the
	search text. By default, all words must match. A new qualifier
	-noallmatch tells wossname to match any word in the
	search. Partial word matches are accepted so "restrict" will match
	"restriction". The search term is also compared to the groups and
	keywords attributes in the ACD file. A new qualifier -showkey will
	report the keywords to help explain why applications were matched.
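
	The matching rule can be sketched as follows. This is a Python
	illustration with a hypothetical function name, not the wossname
	source. It assumes a partial match means the search word occurs
	anywhere inside a word of the text.

```python
def matches(search: str, text: str, all_match: bool = True) -> bool:
    """Word search with partial matches: 'restrict' matches 'restriction'."""
    terms = search.lower().split()
    words = text.lower().split()
    hits = [any(term in word for word in words) for term in terms]
    if not hits:
        return False
    # Default: every search word must match; otherwise any one will do.
    return all(hits) if all_match else any(hits)
```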

	All ACD files have a new application attribute keywords: which
	provides keywords to search for in addition to the groups.  This
	is intended for keywords which are hard to include correctly in
	the short description. A file keywords.standard is provided with a
	list of all keywords. This is for use by utilities searching
	programs by keyword, which will be expected to check the groups
	and keywords attributes in a single query.

	Reading a sequence of type "any" sets the sequence type to
	nucleotide by default. Any x or X ambiguity codes will be
	converted to 'n' or 'N' to avoid confusion in programs that will
	convert a second nucleotide sequence (alignment programs, for
	example). X is allowed as an unknown character in nucleotide
	sequences (and N is also allowed as 'any base').

	Stockholm and Selex sequence formats, used mainly by the HMMER and
	HMMERNEW embassy packages, have been corrected for a few cases
	where automatic format detection generated errors.

	Function names in ajseq.c have been standardised. Old names are
	still accepted but are marked as "deprecated" and will generate
	warnings with the gcc compiler (see ajstr below). Other compilers
	will see no difference.

	Further correction to reversed sequence numbering for local alignments
	from water and supermatcher. For these local alignments all reversed
	alignments were ending at "1" because the end offset was not
	calculated correctly. Matcher called a different function to set
	sequence positions and reported correct positions.

	For alignments with a line of gaps, adjusted the numbering to
	report the last sequence position instead of the next at the start
	of the line.

	Program einverted output is changed to include the sequence ID
	and the program input is changed to process more than one sequence
	as input. The change to the output format was needed to indicate
	which sequence is reported. The program is also speeded up by not
	dynamically resizing the internal arrays used to hold sequence
	positions.

	Added additional information to "entrails" output (entrails is
	built by "make check" and displays internal data to assist
	developers of wrappers and interfaces). The output now includes
	application attributes and reports definitions which are aliases
	(with -full on the commandline).

	Added -mincount option to wordcount to report only words occurring
	a given number of times. The default of 1 does not change the
	previous results.
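
	A short sketch of the filter, assuming the option means "at
	least this many occurrences" (which is consistent with a default
	of 1 changing nothing). The function is hypothetical Python, not
	the wordcount source.

```python
from collections import Counter


def word_counts(seq: str, wordsize: int, mincount: int = 1) -> dict:
    """Count overlapping words and keep those seen at least mincount times."""
    counts = Counter(seq[i:i + wordsize]
                     for i in range(len(seq) - wordsize + 1))
    return {w: n for w, n in counts.items() if n >= mincount}
```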

	Oddcomp had a number of bugs. A window size equal to the sequence
	length resulted in no hits. The word size was used before reading
	the input file. A match in the last possible window was missed.

	Biosed modified to specify a position so it can be used to edit A
	to L in position 2 (for example) in a single sequence or
	throughout an alignment. Normal use is unchanged. If there is
	demand, the target could be changed from a string to a pattern.

	Clustal sequence format output is now version 1.83 with 60
	bases/residues per line. Previous EMBOSS releases reported it as
	1.4 and printed 50 bases/residues per line.

	The tmap program had an upper limit of 6000 residues and 300
	sequences. All fixed size arrays were made dynamic. The length
	limit was exceeded by one of our users.

	GCG formatted databases were found to have split entries into more
	than 1000 chunks - for example human chromosome 7 in a TPA (third
	party annotation) entry in EMBL. A regular expression is now used
	to check for any number of subsequences in GCG data.

	ajSysStrTok and ajSysStrTokR changed to match the behaviour of the
	C run time library function strtok. Both now keep their internal
	pointer at the first delimiter after the matched token. This only
	changes the result if the delimiter set is changed on the next call.

	Another code cleanup is the addition of Exit functions to all AJAX
	and NUCLEUS source files that could still have static memory
	allocated when a program ends. We aim to clean up memory for all
	the standard memory tests in test/memtest.dat. This includes
	creating a new function acdReset which resets the state of ACD
	processing so that a new ACD file could, in theory, be read once a
	program has completed. All programs need to call the embExit
	function at the end to call the NUCLEUS and AJAX cleanup
	functions. Some of these functions will also log memory usage
	statistics if debugging is turned on (-debug on the command line).

	We are working through all the library code making standard
	function names. Old function names will be retained at least until
	release 4.0.0. They are marked with the __deprecated flag, which
	causes the gcc compiler to report all uses of the old name. Other
	compilers are not affected. The first set to be processed is in
	ajstr.c (string and character functions).

	Sequence reading from website URLs now defaults to HTTP 1.1, with
	chunked blocks of data. A bug in processing small (single line)
	chunks was fixed.

	Report and alignment output now includes the full commandline used
	to run the program, with any replies to prompts included.

	Excel report format includes a column for Strand to indicate
	sequences on the reverse strand. The strand column is + for a
	forward feature (all protein features are forward) or - for a
	reverse direction feature.

	New sequence type gapstopprotein for proteins with gaps and
	internal stops.

	Translation functions in ajax/ajtranslate.c have been cleaned up.

	New program backtranambig to backtranslate as most ambiguous
	codons.

	Phylip sequence format can now read sets of alignments with blank
	lines in between. Such formats were produced by the new fseqboot
	program and used by the new phylip programs and seqsetall in ACD.

	The list of graph devices produced when an invalid device (or '?')
	is given now lists only the unique devices (those defined
	differently in the plplot library code) with alternative names
	(xwindows for x11, for example) added in brackets. Specifying an
	ambiguous device used to accept the first match found, now an
	error message is given.

	Prettyplot and cons were producing different consensus
	sequences. Comparison of the results showed two problems. Cons was
	missing consensus characters because of an error in calculating
	the plurality (since fixed in prettyplot, but the library function
	used by cons had not been corrected). Prettyplot was missing
	consensus characters for a different reason - prettyplot has a
	"collision detection" feature to skip consensus characters for
	positions where more than one amino acid or base is valid as a
	consensus character. This was turned on by default, when the ACD
	file clearly states it should be turned off. In fixing both bugs
	the two programs will give the same consensus, except for cases
	where collisions occur - in these cases prettyplot may not select
	the same character as cons, where both are equally valid.

	Programs that write sequences need to call ajSeqWriteClose before
	they exit. This forces output from sequence formats that save up
	sequences in memory and write at the end. An example is MSF, which
	has to wait for all sequences in order to calculate the file
	checksum.

	Functions that process directories now skip the '.' and '..'
	directories so that '*' wildcards will work correctly.

	Prettyplot has been revised. A debugging commandline option has
	been removed. String commandline options have been changed to
	array and select types for better validation with the same user
	responses. Colours are now corrected for proteins - in version
	3.0.0 and earlier the colours depended on the column order in the
	matrix. Nucleotide colours follow the ABI base colours used in
	abiview. The examples in the documentation showed no boxes because
	of low sequence weights in the MSF format input data. The weights
	have been updated to give the 'expected' results.

	All programs now store the command line needed to recreate the
	run. The result is logged by the database indexing programs, and
	will be added to other program outputs in a future release. The
	command line includes all non-default responses to prompts by the
	user.

	dbiflat, dbifasta, dbigcg and dbiblast set the system sort to use
	normal "C" sort order. On systems where the locale is set to a
	language other than English, sort can have strange behaviour. In
	particular, the underscore character fails to sort in the correct
	place so that indexing SwissProt/UniProt or RefSeq entries fails
	to put certain entries in the correct sort position for
	retrieval. There is now no need to set LC_ALL=C locally, although
	this is good practice whenever sort is used.
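
	The "C" sort order compares raw byte values, so the underscore
	(0x5F) falls after the upper case letters and before the lower
	case ones. The Python sketch below reproduces that ordering for
	ASCII identifiers. The function name and the sample identifiers
	are made up for illustration.

```python
def c_sort(ids: list) -> list:
    """Sort identifiers by raw byte value, as sort does under LC_ALL=C."""
    return sorted(ids, key=lambda s: s.encode("ascii"))


ids = ["A_HUMAN", "AB_HUMAN", "a_human"]
print(c_sort(ids))
```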

Version 3.0.0 15-jul-2005

	Gap penalty qualifiers were standardised for all programs.

	water, needle and other alignment programs occasionally could
	report suboptimal alignments (off by the gap extension penalty
	score). The reported alignments were correct, but rearranging the
	gaps could give a slightly higher score. Matcher and stretcher use
	different alignment functions and were unaffected.

	Cpgplot no longer has a -shift option to speed processing on long
	sequences. The output was broken. We will restore it if there is
	demand.

	Two new variables added for developers using the MYEMBOSS package
	to write their own EMBOSS programs. EMBOSS_MYEMBOSSROOT (the same
	will work for other EMBASSY packages) points to the location of
	the ACD files for an EMBASSY package which is not installed - as
	would be the case for an ordinary user developing and maintaining
	their own code using MYEMBOSS. This requires the use of embInitP
	rather than embInit to pass the package name - something all
	EMBASSY programs should (and will do). The second variable is
	EMBOSS_ACDUTILROOT and is required so that utilities such as
	acdvalid can also find the ACD files. Utilities acdvalid, acdc,
	acdhelp, acdtable and acdpretty use embInit as they know nothing
	about any package name.

	Sequence sets (seqset and seqsetall) have a new ACD attribute
	"aligned" which is true or false. If true, the sequences will be
	extended with gaps and passed to the application as a full
	alignment. It is assumed that they are already aligned. If false,
	the application needs all sequences in memory but has no need for
	aligned input. The aligned attribute is required (to help ACD
	parsers) so acdvalid will object if it is not found.

	embossdata now requires a filename, or an empty string to search
	for all files. If no filename is given, it will prompt for one
	with a default of an empty string.

	acdvalid now tests the order in which sections appear in the ACD
	file. The order must be: input, required, additional, advanced,
	output. There are already constraints on which ACD data types can
	appear in each section. All existing ACD files passed this test.
	If any external ACD files have a problem the acdvalid tests can be
	revised.
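
	The ordering test amounts to checking that section names appear
	in a fixed relative order. A hypothetical Python sketch, not the
	acdvalid code, assuming that sections may be absent but must not
	be out of sequence:

```python
SECTION_ORDER = ["input", "required", "additional", "advanced", "output"]


def sections_in_order(sections: list) -> bool:
    """True if the named sections appear in the required relative order."""
    # An unknown section name raises ValueError in this sketch.
    ranks = [SECTION_ORDER.index(name) for name in sections]
    return ranks == sorted(ranks)
```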

	Sequence format "experiment" is now correctly the Staden package
	experiment file format. The description is taken from the "EX"
	experiment description line. EMBL line types (including features)
	are allowed in this format and are supported if used before the
	sequence. The accuracy values are read and stored (one per base,
	using the highest base value if all 4 bases have individual
	numbers) and written. These values could possibly be passed to
	primer3, for example.

	Staden and GCG input formats can now parse out comments from
	anywhere in the sequence records.

	Nexus and nexusnon output formats now correctly report the
	datatype for protein alignments.

	Documentation of the @data datatype header tags updated on the
	developers webpages.

	Coderet reports the number of CDS, mRNA and translation sequences
	to an output file. Requested for easier tracing of inputs that
	gave no sequences.

	Nbrf (pir) input can now read from an SRSWWW server. The problem
	was that SRS reports an extra ">P1;seqid" header before the
	sequence. Now if there is no sequence, a duplicate header (one
	with the same ID) can be skipped.

	Clustal output format no longer writes in blocks of 10.

	Clustal and other multiple sequence formats were unable to return
	single named sequences. Fixed for all such formats.

	Phylip3 output renamed phylipnon for compatibility with other
	formats. The phylip3 name is retained for back compatibility. The
	header for phylip non-interleaved format is corrected to that
	accepted by phylip 3.6 (no need for YF on the header line, and
	correct number of sequences). Documentation of these formats (for
	seqret and general format documentation) has been updated.

	Programs chips, cusp, prettyseq and showtran used a codon usage
	table as input only to define the genetic code (amino acids for
	each codon) for the table they produce. This is no longer needed
	as a new AjPCod constructor ajCodNewCode can be given a genetic
	code (default 0 to use the standard code) and will set the amino
	acid data.

	The ajCodClear function now clears all data, including the amino
	acid assignments, for use in reading multiple codon usage
	formats. A new function ajCodClearData clears only the data and
	other values, and leaves the amino acid assignments in case other
	applications may make the same assumptions.

	Codon usage input filenames can now be used to set the output
	filename. The codcmp program for example will no longer default to
	"outfile.codcmp" for output. However, this can cause unexpected
	results when a codon usage table and a sequence are read in, so
	codon usage filenames are only used if no other input file (or
	sequence, or feature table, or other input type) has been
	read. This is done by passing a "reset" boolean when setting the
	saved first input file name so that other inputs can overwrite a
	name defined by a codon usage input. A remaining side effect is
	that if the first input is stdin (for example with -filter on)
	then a second input file can now set the default for output. The
	recommendation for anyone developing wrappers is to always
	explicitly set the output filenames if there is a need to know the
	name for a specific output.

	Codon usage tables support multiple formats. All can be read
	automatically. EMBOSS will now, for example, accept native GCG
	codon usage tables including those used by the codonusage and
	transterm databases. The format can be specified for "codon" input
	by a -format qualifier. Outcodon is now used as an ACD datatype
	for writing codon usage tables, and has a -oformat qualifier. A
	new application codcopy can inter-convert the codon usage table
	formats. The default codon usage table format is called "emboss"
	and includes structured comments to identify the species, database
	release, database division, number of CDSs and codons, and GC
	content. These values are calculated or searched for in the text
	within a file for other formats.

	In the emboss.default and .embossrc files the same name can be
	used for variables, databases, and resources. In previous versions
	a single table was used and name clashes could occur. This becomes
	an issue with the increasing use of resource definitions.

	Colours for abiview set to the ABI standard colours.

	Sequence types explicitly set in source code for cons, sixpack and
	backtranseq. GCG format output was showing nucleotide instead of
	protein sequence type.

	Correction to reversed sequence numbering for local alignments
	from water.

Version 2.10.0 03-Jan-2005

	Profile analysis with gprof indicates that the regular expressions
	(and the PCRE library) are very inefficient. Wildcards in regular
	expressions lead to millions of recursive calls to the match
	function. Although they are very readable for code maintenance,
	replaced them for EMBL sequence and feature reading to get about a
	4-fold speedup. Profile analysis will continue up to version 3.0.0

	Feature table updated for nucleotide sequences to
	EMBL/GenBank/DDBJ version 6.2. A few obsoleted qualifiers.

	tranalign now allows for the proteins to have Methionine residues
	at the start which now match a START codon in the corresponding
	nucleic acid sequence.

	diffseq has a new option '-global' which makes it treat the whole
	of the sequences as regions to be aligned, rather than the
	default which looks for the longest region of overlap and only
	reports differences within that overlapping region.  This new
	option is useful when looking at protein and mRNA sequences
	which are expected to align over their whole length.

	Alignment output issues resolved. Specifying begin and end of
	input sequences now works for all alignment formats. Markx formats
	have been rewritten as the original code we used has nasty
	dependencies on global variables which we struggled to reproduce
	for all cases. The rewritten code is much simpler. Note that the
	gap penalty reported by markx10 format is the EMBOSS
	penalty. Markx10 as used in the FASTA package subtracts the gap
	extension penalty from the gap penalty ... and adds it back when
	calculating.
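
	The two conventions give the same gap cost, as the arithmetic
	below suggests. The sketch assumes the EMBOSS gap penalty covers
	the first gap position and the extension penalty each further
	position. The function names are hypothetical and the numbers are
	example values only.

```python
def fasta_style_open(gap_open: float, gap_extend: float) -> float:
    """Gap-open value with the extension penalty subtracted."""
    return gap_open - gap_extend


def gap_cost_emboss(gap_open: float, gap_extend: float, length: int) -> float:
    # First gap position costs gap_open; each further one costs gap_extend.
    return gap_open + gap_extend * (length - 1)


def gap_cost_fasta(fasta_open: float, gap_extend: float, length: int) -> float:
    # Every gap position costs gap_extend, plus a one-off opening charge.
    return fasta_open + gap_extend * length
```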

	transeq failed to check sequence ranges in list files
	correctly. It was only using the range from the first sequence if
	the USA included a start and end. The range is now reset for each
	sequence.

	remap (and other programs that display translations) had problems
	with masking ORFs (using strange characters instead of '0'),
	caused by bad calls to an AJAX function.

	Entrez added as an access method. Sequence format must be
	genbank. Server URL is hard-coded at NCBI (for now). Works by
	finding GIs (GenInfo Identifiers) that match the query, and then
	retrieving them one at a time. This is still a prototype - more work is
	needed. Note that apparently Entrez cannot retrieve by LOCUS (id).

	Seqhound added as an access method. Sequence format must be
	genbank. Needs a URL to find the server. Works by finding GIs
	(GenInfo Identifiers) that match the query, and then retrieving
	them one at a time. This is still a prototype - more work is
	needed. Some Entrez error conditions are less graceful in
	SeqHound. Des and Key searches are turned off until SeqHound adds
	indexing for these. Org searches work, but require the numeric
	taxon ID. This is not friendly, so we are looking for a way to get
	the taxid from the species or genus.

	Direct access databases now support exclude wildcards. The syntax
	is as for emblcd indexing, but only files listed in filename are
	included.

	Database names must be letters, numbers and underscores
	only. Reading emboss.default and .embossrc now generates a warning
	message for any bad database name. Bad names were ignored by USA
	processing, leading to confusing results.
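
	The naming rule is simple enough to test before editing
	emboss.default or .embossrc. A small Python sketch with a
	hypothetical function name, assuming "letters" means ASCII
	letters:

```python
import re

_DB_NAME = re.compile(r"[A-Za-z0-9_]+")


def is_valid_db_name(name: str) -> bool:
    """True if the name uses only letters, digits and underscores."""
    return _DB_NAME.fullmatch(name) is not None
```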

	seqretsplit has a new -feature option (as for seqret)

	noreturn can write files for PC or Mac file systems using a new
	-system qualifier.

	FASTA format sequence files with a sequence ID starting P1; were
	assumed to be PIR format. These can now be read as FASTA, assuming
	that PIR format has already been tested for.

	Sequences with zero length were accepted. Sequences must now have a
	length of at least 1. Some user scripts could create FASTA format
	files with no sequence, or with the sequence on the ID line. These
	can crash many programs, including a core dump from clustalw
	(through emma).

	Added a calculated attribute "haslengths" to (phylogenetic) tree
	input in ACD for use in phylipnew interfaces

	Wossname and seealso have a new commandline option -showembassy
	which defines one embassy package to be shown. The main use is in
	finding applications when automatically building the
	documentation, but end users and interface builders may find some
	uses for this option too.

	Added an "embassy" string attribute to the application in ACD so
	that wossname can find whether an application is in EMBASSY or
	not. Wossname was depending on the source directory, but could not
	distinguish between EMBOSS and EMBASSY ACD files once they were
	installed.

	The EFUNC and EDATA databases have been enhanced to provide better
	views and links within SRS. The new versions are available at both
	HGMP and EBI. In future, EBI will probably become the sole site
	(as HGMP/RFCGR is closing in 2005).

	The official EMBOSS website has moved to emboss.sourceforge.net
	which includes redefining links in applications and major
	modifications to the scripts which maintain the application web
	pages. The sourceforge web pages are now committed to CVS under
	doc/sourceforge. The pages on sourceforge itself can only be
	modified by registering at sourceforge and joining the emboss
	project.

Version 2.9.0 15-Jul-2004

	ajListMapRead and ajListstrMapRead functions for read-only lists.
	As an added check, the functions these call for each element have
	a different prototype.

	ajStrStr function now returns const, as do various 'Get' functions.
	The few cases where a true char* is needed must now call
	ajStrStrMod with the AjPStr passed by reference so that we can
	check it is being modified. All calls to ajStrStr in EMBOSS and
	most EMBASSY packages have been resolved to remove compiler
	warning messages. ajStrFix also needs the AjPStr passed by
	reference.

	tfm -html now gives full path to image files.

	Remove need for the definition of PLPLOT_LIB.

	Add configuration for cygwin dlls.

	Allow filenames of the form drive:/filename for cygwin.

	Fixes for list files with sequence ranges in the USAs. The
	sequence input object is now reset during list processing.

	Sequence sets with begin and end positions are now automatically
	trimmed on input. This applies for example to list input with
	ranges in the USAs for programs such as polydot which were
	previously reporting the entire sequence.

	graph output now has the default title including the date in
	dd-mmm-yy format instead of the unreadable dd/mm/yy format.

	Align output for seqmatchall (like wordmatch). The algorithm is
	not maintaining the sequence accession and description
	information. They may be restored in a future update.

        infoalign now also displays the weight of the sequences in the
        alignment.  This can be turned off using '-noweight'.

	New output types in ACD for all input data types, including those
	for phylogenetics and protein structure data. Initially these are
	a new AjPOutfile type with a defined format (fixed until any of
	them has a choice).

	Programs that produce graphics or text (outfile) output now
	by default will not create the outfile if there is a graph (done
	by setting the nullok attribute of the outfile).

	Acdvalid now checks for incomplete ACD types and attributes.

	trimest now has the option '-toplower' which changes the
	poly-A tail to lower-case instead of cutting it off.

	New ACD attribute 'relation' added to all ACD types. This will
	hold some information about how output data types relate to inputs
	and parameters. The syntax of the string is not yet clear. Running
	of EMBOSS programs will not be affected - the relation string is
	defined for web services and related wrappers to maintain
	provenance better.

	New ACD function oneof added, syntax is @($(var)=={a,b,c}) to test
	for a choice of menu options. Intended to clean up some ACD files
	- but they are already clean so it may not be useful. At some
	stage the unused ACD functions should be declared obsolete for
	simplicity (and efficiency). We will leave the code in place, but
	remove them from the list of functions tested.

	acdvalid now tests the knowntype attribute for strings. ACD files
	have been cleaned up to give knowntypes for all strings (defined
	in knowntypes.standard) or to convert strings to datafile or other
	ACD types as appropriate.

	showfeat now has the qualifier '-annotation'. This allows you to
	add your own brief annotations of regions on the displayed
	figure.

	remap now has the option '-frame' which allows you to specify
	a list of the frames to be translated and displayed.

	Major cleanup of @data documentation. Added @datatype for typedef
	data types (e.g. AjBool). Checking all have attributes, and all
	attribute names and types match. Comments in the code are moved to
	the @attr documentation. Added an @cc documentation line for
	comments.

	Eprimer3 has been changed so that it runs a separate child process
	of primer3_core for every sequence. This is to cure a problem
	seen when more than about 23 sequences were input, in which there
	was some blocking contention between the input and output streams.

	Major cleanup of ACD files to match acdvalid standards. Featout
	qualifiers are now -outfeat, which means all output start with
	-out but it does clash with -outfile so -outf is not always usable
	as an abbreviation.

	Options for emma have been cleaned up. -insist is no longer used
	(use -sprotein instead) and -slowfast is now a simple boolean
	-slow. Both changes lead to a much cleaner ACD file.

	Options for eprimer3 have been cleaned up. New options -primer
	(true) and -hybridprobe (false) make the dependencies far
	simpler. The default task is now 1 (same as the old zero) and the
	-hybridprobe option is needed to calculate the hybridization
	probes. This removes a lot of dependencies on tasks 1 and 4
	(hybridprobe) and not-task-4 (primer).

	New AjPDir to hold directory path and default extension. Intended
	for domainatrix applications. This requires changing
	ajAcdGetDirectory to return an AjPDir and providing
	ajAcdGetDirectoryName to return the path as a string. Several
	programs were changed to reflect this changed call.

	New ACD type outdirectory for a directory to which files will be
	written. Must have a knowntype describing the files that will
	appear there. Expected qualifier name is -outdir.

	compseq now has the option '-calcfreq'. This makes it calculate
	the expected frequencies of the words in the sequences from the
	observed frequencies of the single bases or residues in those
	sequences.
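
	A sketch of one way to calculate such expected values,
	assuming that positions in a word are independent so that the
	expected frequency of a word is the product of the observed
	frequencies of its bases. The function names are hypothetical
	and the example handles only the four unambiguous nucleotides;
	it is not the compseq source.

```c
#include <assert.h>
#include <stddef.h>

/* Map a base to an index 0..3, or -1 for any other character. */
int demoBaseIndex(char base)
{
    switch (base) {
    case 'a': case 'A': return 0;
    case 'c': case 'C': return 1;
    case 'g': case 'G': return 2;
    case 't': case 'T': return 3;
    default:            return -1;
    }
}

/* Count the bases in seq and store their observed frequencies. */
void demoBaseFreq(const char *seq, double freq[4])
{
    size_t counts[4] = {0, 0, 0, 0};
    size_t total = 0;
    int i;

    for (; *seq; seq++) {
        i = demoBaseIndex(*seq);
        if (i >= 0) {
            counts[i]++;
            total++;
        }
    }
    for (i = 0; i < 4; i++)
        freq[i] = total ? (double) counts[i] / (double) total : 0.0;
}

/* Expected frequency of a word: the product of the observed
   frequencies of the bases it contains (independence assumed). */
double demoWordExpected(const char *word, const double freq[4])
{
    double expected = 1.0;

    for (; *word; word++) {
        int i = demoBaseIndex(*word);
        assert(i >= 0);
        expected *= freq[i];
    }
    return expected;
}
```

	For the sequence "AACG" the observed frequencies are A 0.5,
	C 0.25, G 0.25 and T 0, so the word "AC" has an expected
	frequency of 0.5 * 0.25 = 0.125.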

	HTML data from remote sites is becoming more complex. EMBOSS now
	makes a first pass to look for a single preformatted block and
	accepts this as the data (thus avoiding horrors such as the Entrez
	headers and javascript which NCBI's search service includes).
	At the same time, an old fix to patch SRS 6.1.0 output has been
	removed as this clashed with the new code.

	Optional outputs have a new behaviour. With nulldefault defined,
	an output is, by default, turned off and will return a NULL value
	to the calling program if nullok is set. Setting the value to ""
	on the command line will now ask for the standard filename to be
	generated. The "missing" attribute, if defined, allows simply
	-qualname on the commandline to request the default filename,
	although care must be taken to avoid anything following the
	qualifier appearing to be a filename. This means the qualifier
	must be last on the commandline, or must be followed by another
	qualifier.

	Indexing programs dbifasta and dbiflat no longer store the source
	directory in the division.lkp file - directory is specified in the
	database definition. This was only done originally to share index
	files with "efetch" at the Sanger Centre. With index files and
	data files in the same directory (as for efetch) it is not needed.

	All ACD files revised for new acdvalid checks.

	New ACD section "additional" added for qualifiers with
	additional:"Y" defined. These have been put in the "advanced"
	section until now. Acdvalid checks that these qualifiers are in
	the appropriate section.

	Acdvalid now checks that qualifiers are in the expected
	section. All input qualifiers (including cfile and datafile) are
	now in the input section, all output qualifiers are in the output
	section. All (remaining) standard, additional and advanced
	qualifiers are in the "required" "additional" and "advanced"
	sections.

	New ACD type "toggle" added. This is the same as "boolean" but is
	allowed in any section by "acdvalid" checks. Toggle is to be used
	for ACD qualifiers that "toggle" (turn on or off) other
	qualifiers. An example in many ACD files would be "-plot".

	Cirdna and lindna now dynamically allocate memory. For simplicity
	they do still have an upper limit for the number of groups and
	labels per group, but no longer have static arrays.

Version 2.8.0 30-Nov-2003

	tfm accepts the PAGER environment variable. It can be overridden
	by EMBOSS_PAGER.

	Fix for HTTP 1.1 lines for MacOSX added (Cedric Rossi).

	The home directory ~/.embossrc file can be turned off with
	"setenv EMBOSS_RCHOME N". This was added for cleaner QA tests
	but may have other uses.

	Report format output added (by Henrikki Almusa) for dreg, preg,
	recoder and silent.

	pestfind renamed to epestfind and handling of terminal water
	residue adjusted.

	Align formats: Added "tcoffee" as a valid -aformat which writes a
	T-Coffee library file suitable for input as -in=Lfilename to
	T-Coffee.

	Pepstats: added molar extinction coefficient and extinction
	coefficient at 1mg/ml for A280.

	Nexus format sequence input added, with new functions to parse all
	standard nexus files. Later releases will accept nexus format for
	other input data.

	Jackknifer, Mega, Treecon, Mase and Fitch formats parsed, at least in
	their EMBOSS output forms.

	Underscores are allowed in accession numbers and sequence versions
	to handle REFSEQ fasta format entries.

	New function ajRegPre returns the original string before the
	regular expression match.
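
	The idea of returning the text before a match can be sketched
	with the POSIX regex interface. This is an illustration only:
	it uses POSIX regcomp and regexec rather than the EMBOSS
	regular expression functions, and demoRegPre is a hypothetical
	name.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Copy the part of subject that precedes the first match of pattern
   into buf. Returns 0 on success, -1 if the pattern is invalid or
   does not match. */
int demoRegPre(const char *pattern, const char *subject,
               char *buf, size_t buflen)
{
    regex_t re;
    regmatch_t match;
    int status;
    size_t prelen;

    assert(buflen > 0);
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    status = regexec(&re, subject, 1, &match, 0);
    regfree(&re);
    if (status != 0)
        return -1;

    prelen = (size_t) match.rm_so;   /* offset where the match starts */
    if (prelen >= buflen)
        prelen = buflen - 1;         /* truncate to fit the buffer */
    memcpy(buf, subject, prelen);
    buf[prelen] = '\0';
    return 0;
}
```

	Matching "[0-9]+" against "abc123def" leaves "abc" in buf.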

	New function ajStrArrayDel deletes a string array.

	New function ajListstrToArrayApp appends strings in a list to the
	end of a string array.

	Sequence input changes: Allow '?' as a valid character (it has
	been seen in phylip sequences) for 'unknown' and convert to X for
	protein (or any) and 'N' for nucleotide. Note that this can give
	an X or N depending on whether the program accepts nucleotide only
	or any sequence. We may find a cleaner fix, but it would depend on
	knowing the sequence type.
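
	The conversion rule can be sketched as below. demoFixUnknown
	is a hypothetical name, and the boolean argument stands in for
	whatever the sequence reader knows about the types the program
	accepts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Replace every '?' in seq with the 'unknown' code: 'N' when the
   program accepts nucleotide input only, otherwise 'X'. */
void demoFixUnknown(char *seq, bool nucleotide_only)
{
    const char unknown = nucleotide_only ? 'N' : 'X';

    assert(seq != NULL);
    for (; *seq; seq++)
        if (*seq == '?')
            *seq = unknown;
}
```

	The same input "AC?GT" therefore becomes "ACNGT" for a
	nucleotide-only program and "ACXGT" for one that accepts any
	sequence, which is the inconsistency noted above.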

	Added binding factor output to tfscan plus option to specify a
	custom data file

	Removed the Henry Spencer regular expression libraries. There were
	a few calls to the ajPosReg functions, but only to test it worked
	the same way as ajReg. Added a case-insensitive ajRegComp and
	ajRegCompC (which the ajPosReg functions had) using
	PCRE. Farewell, Henry. You were a great servant to EMBOSS.

	Water S-W alignment program no longer truncates some matches

	Vector arithmetic added to ajax library.

	Compilation now uses large file handling by default. To disable use
	--disable-large when configuring. An effect is to make the default
	size of ajlongs 64 bits.

	Pepstats modified to allow multiple sequences

	Major (well, obvious impact on ACD authors) ACD change - the
	"required" attribute is renamed "standard" and the "optional"
	attribute is renamed "additional". They have exactly the same
	functions as before. The change is to (hopefully) make their
	meaning more obvious to those developing ACD parsers and wrappers
	for EMBOSS. ACD attribute "standardtype" clashed with "standard"
	and is renamed "knowntype".

	ACD attributes have been added for applications and for all ACD
	types to make wrappers easier to control. These new attributes are
	specifically for SoapLab from EBI, and need not have any impact on
	other wrappers (SoapLab uses ACD to define non-EMBOSS applications
	and needs extra attributes to define some additional properties).

	pepinfo now writes to a file with a standard output filename of
	(sequenceid).pepinfo instead of pepinfo.out

	Completed the standardization of ACD definitions, using "acdvalid"
	to remove all errors and allowing only selected and hard to avoid
	warnings to remain. The warnings are for calculated "required" or
	"optional" definitions (simple true/false relations to another
	boolean are accepted). In particular: all essential inputs and
	outputs are parameters, with standardtype defined. Non-essential
	inputs and outputs have the nullok attribute set. Information
	strings are defined only where there is no standard prompt.

	The definition of AjPStr and other "pointers to structs" is
	causing strange problems in specifying "const" for structs that
	are unchanged by function calls. In summary, it appears (for all
	compilers we tried) that "const" only knows it is for a pointer if
	it can see the "*" in the type. This means, for example, that
	"const AjPStr" failed but "const AjOStr*" worked. With "const" if
	it knows it is a pointer, it makes the data structure
	constant. Otherwise it makes the pointer itself constant, the
	equivalent of "AjOStr* const". We fixed this by changing AjPStr to
	be a #define of AjOStr*. This has the advantages that most code is
	unaffected and that const now works as expected. The only code
	changes we needed are lines with multiple AjPStr definitions
	(which is anyway deprecated), for example "AjPStr astr, bstr"
	which clearly fail when you think about the #define (astr is an
	AjPStr, but bstr is now an AjOStr and will give strange compiler
	errors). We may change this again to define a separate const data
	type for each struct, but probably the #define is a good solution
	and we expect to stay with it.
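
	The difference between the two definitions can be seen in a
	small self-contained example. DemoOStr and the other Demo
	names are hypothetical stand-ins for AjOStr and AjPStr; the
	typedef form is assumed to be what was used before the change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct standing in for AjOStr. */
typedef struct DemoOStr {
    char *Ptr;
    int Len;
} DemoOStr;

typedef DemoOStr *DemoPStrTypedef;   /* pointer type as a typedef */
#define DemoPStrDefine DemoOStr*     /* pointer type as a #define */

/* "const DemoPStrTypedef" means "DemoOStr* const": only the pointer
   is constant, so this function may still modify the struct. */
void demoSetLen(const DemoPStrTypedef str, int len)
{
    assert(str != NULL);
    str->Len = len;
}

/* "const DemoPStrDefine" expands to "const DemoOStr*": the struct is
   constant, so an assignment to str->Len here would not compile. */
int demoGetLen(const DemoPStrDefine str)
{
    assert(str != NULL);
    return str->Len;
}

/* With the #define, "DemoPStrDefine a, b;" declares a as a DemoOStr*
   but b as a DemoOStr, so each variable needs its own declaration. */
int demoMultiDecl(void)
{
    DemoPStrDefine a, b;

    return sizeof a == sizeof(DemoOStr *) && sizeof b == sizeof(DemoOStr);
}
```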

	PCRE is now the library of choice for regular expressions. This
	allows the full Perl regular expression syntax, and was very easy
	to integrate. Regular expressions are used internally for parsing
	and for manipulating strings such as file and directory names, and
	also for matching by programs such as dreg and preg.

	The previous Henry Spencer library functions are renamed from
	ajReg to ajHsReg. The Posix version of the Henry Spencer library
	remains available as ajPosReg but may be removed as it was not
	used by the EMBOSS distribution, and PCRE can provide the same or
	higher functionality.

	acdpretty now writes the name of the output file to standard
	output. For example "Created seqret.acdpretty".

	The ACD qualifiers -acdpretty -acdtable and -acdlog are
	removed. Programs acdpretty and acdtable do the first two tasks
	(in the same way as before). To turn on the acdlog file, use
	environment variable EMBOSS_ACDLOG.

	Graphs can now use "-graph data" to produce files compatible with
	the Staden package's spin2 and spin GUIs. This makes some ACD
	options obsolete, especially the various -data and -outfile
	combinations. Banana already wrote an output file which caused
	some confusion in these options. The outfile and the graph are
	both produced by default, but have the nullok attribute and can be
	turned off with -nooutfile or -nograph on the command line.

	graph and xygraph output can now be optional - the ACD files can
	have a nullok: "Y" attribute which allows -nograph on the command
	line.

	In ACD files alternatives for protein and nucleotide input are
	common. Added an automatic variable $(acdprotein) which is defined
	as the calculated ".protein" attribute of the first input
	sequence(s). The value will be "Y" or "N". Acdvalid will check
	that this is how proteins are tested, so the original
	"$(asequence.protein)" syntax will become obsolete. The intention is
	that any wrappers can use this to make protein and nucleotide
	versions of the ACD file, and in general to use only simple
	boolean tests in calculated ACD values.

	Added wait call to wait for a piped command to complete
	before reading data (needed for listfile input with
	many piped reads, for example getz calls from SRS databases).

Version 2.7.1 03-Jun-2003

	Corrected Jemboss for displaying emma & prettyplot forms
	Corrected display of recognition sequence for restrict -solofragment

Version 2.7.0 01-Jun-2003

	Standardtype attribute added for filelist in ACD

	Datafile for mwfilter changed from string to datafile ACD type.

	A new test application acdvalid will check for deprecated ACD
	syntax and report errors for something that should be fixed, or
	warnings for something still to be clearly defined. None of these
	"errors" will stop an ACD file from working correctly, but they do
	cause confusion to the authors and maintainers of wrappers, GUIs,
	and so on.

	Sequence types are extended to include new types for programs that
	can handle selenocysteine.

	Sequence types are simplified so that input can be converted to
	the specified type. Gaps can be removed, and unsupported
	characters can be converted to X for protein or N for nucleotide.
	A few applications may be unable to handle any ambiguity
	(pureprotein, puredna, etc.) and will require correct input. To
	make it safe to run a program over (for example) swissprot or
	embl, such programs should read single sequences only, or be
	converted to support ambiguity codes. This may take a little
	time. banana, octanol and pepwindow already read single sequences.
	In need of attention are hmoment and iep.

	In ACD files a new application attribute "external" is added where
	a third-party tool is needed. Examples include clustalw (emma) and
	primer3_core (eprimer3 and primers).

	ACD definitions for feature and featout now have a "type"
	attribute. The feature output type defaults to the sequence type,
	as for sequence output. Feature types are "protein" or
	"nucleotide" or "any".

	ACD sections now have "information" instead of merely "info" for
	consistency.

	Boundary fix for ajStrMask

	Tightened up on reporting of isoschizomer groups in 'showseq -limit'
	and 'remap -limit'.

	Added embPatRestrictPreferred.

	Added -individual option to RESTRICT. This gives the fragment
	lengths produced by restriction assuming only each named RE
	of the set that can cut the sequence is used. Results are
	added to the tail section of the report.

	Added a -equivalences option (on by default) to rebaseextract.
	This option calculates an embossre.equ file using RE
	prototypes in the withrefm file.

	A guide to the EMBASSY package domainatrix (domainatrix.doc)
	has been added to /emboss/emboss/doc/manuals

	Extractfeat now has the -describe qualifier to allow it to add
	the value of selected tags to the Description line of the output
	sequence.

	Revseq can now read in gapped nucleic acid sequences.

	Removed old corba code in preparation for adding corba server as
	an embassy package.

	Simplified error messages for sequence reading, and corrected
	handling of a bad USA as the first in a list file.

	Padded temporary filename for emma to avoid clustalw bug with
	short input filename (this will not work in all cases and
	a corrected clustalw should be used nevertheless).

	-help output modified to align all the qualifiers

	acdpretty output revised to resolve to full names

	Complete overhaul of all ACD error conditions. Parsing and command
	line validation messages are now all used, and all tested in the
	qatest suite. These tests used bad ACD files in the test/acd
	directory.

	whichdb failed to report error messages. They are now turned on -
	and most of the common errors are reported with less verbosity.

	TCODE application added. Calculates the TESTCODE statistic.

	Eprimer3 now reports the primer positions using the coordinates
	of the original sequence when -sbegin and -send are used to
	specify a sub-sequence to consider.  The input ranges, such as
	the -exclude and -target ranges are always given using the
	positions from the original sequence.

	tfm looks for documentation in EMBOSS_DOCROOT (an environment
	variable, or defined in emboss.default), then in the install
	directory, and finally the original build directory.

	In some cases, EMBOSS programs could terminate with an exit status
	of 255 (-1). Terminating with a "Die:" message exits with status 1.
	All exit calls now use either 0 (success) or the standard
	library EXIT_FAILURE value (usually 1).

	All report output fields have a new attribute (and qualifier)
	rscoreshow which defaults to "Y". Setting rscoreshow: "N" will
	remove the score from the output, except for GFF where it is
	required, and SRS format where it can be kept for use in standard
	parsers. The aim is to exclude the score value from applications
	that have no scoring method (restrict for example). For these,
	putting -rscore on the command line will override the ACD file and
	display the score.

	Showseq and showfeat both now have the qualifier '-stricttags'.
	By default if any tag/value pair in a feature matches the
	specified tag and value, then all the tags/value pairs of that
	feature will be displayed.  If '-stricttags' is set to be true,
	then only those tag/value pairs in a feature that match the
	specified tag and value will be displayed.

	Megamerger now has the qualifier '-prefer' which makes it use
	the first sequence to create the merged sequence whenever there is a
	mismatch between the two sequences.

	Sirna now has the qualifier '-context' which writes the first
	two bases (in brackets) of the 23 base target region.

	Maskseq and maskfeat now both have the qualifier '-tolower'
	which will change the masked regions to lower-case characters
	instead of replacing them with a mask character.
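
	Lower-case masking of a region can be sketched as follows.
	demoMaskToLower is a hypothetical helper that takes 1-based
	inclusive positions; it is not the maskseq or maskfeat code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Change the 1-based inclusive region begin..end of seq to lower
   case, clamping the region to the length of the sequence. */
void demoMaskToLower(char *seq, size_t begin, size_t end)
{
    size_t len;
    size_t i;

    assert(seq != NULL);
    len = strlen(seq);
    if (begin < 1)
        begin = 1;
    if (end > len)
        end = len;
    for (i = begin; i <= end; i++)
        seq[i - 1] = (char) tolower((unsigned char) seq[i - 1]);
}
```

	Masking positions 3 to 5 of "ACGTACGT" gives "ACgtaCGT", so
	the original residues remain readable in the output.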

	ACD parsing internals are rewritten to find and report errors more
	cleanly and to make the syntax stricter for other ACD parsers used
	by (for example) GUI developers.

	Sequence output types now have a 'type:' attribute which defaults
	to the type of the first input sequence. For most applications
	this is good enough as a default. For those which add gaps or
	translate DNA to protein (or vice versa) a 'type:' attribute will
	be needed. This is to improve support for automated workflow
	building by more strongly typing input and output data.

	acdpretty now wraps long lines of ACD definitions, splitting at
	any lone backslash (which defines a newline for -help output) or
	at whitespace. Attributes and sections are indented by 2 spaces.

	Until now, the ACD file syntax has allowed name=value syntax and
	the use of {} () and even <> for quoted strings just in case they
	needed both ' and " characters. These are now removed. We believe
	no ACD files were using this syntax.

	valgrind.pl is a new addition to the script directory that runs
	valgrind memory leak tests under Linux. The tests are a copy of
	those in purify.pl - they may one day move to a separate file.

	EMBOSS feature output now copies (where available) the name of the
	input sequence as the filename, so filenames match more closely to
	the sequence output. For example, "seqret -feat tembl:paamir" will
	now create 2 files called paamir.fasta and paamir.gff where the
	feature file previously was called 'unknown.gff'

	EMBOSS feature output defaults (as before) to GFF format, but the
	default format can now be set by variable EMBOSS_OUTFEATFORMAT

	All EMBOSS output files now have a default output directory
	(required by some webservices implementations that run in
	the 'wrong' default directory). Variable EMBOSS_OUTDIRECTORY
	if set becomes the default output directory for outfile, align,
	report, graph, sequence and feature output.

	The output directory can also be set from the command line (or as
	an ACD attribute) using the associated qualifier -odirectory
	(outfile), -rdirectory (report) -adirectory (align) -gdirectory
	(Graph and graphxy) -osdirectory (sequence) or -ofdirectory
	(featout).

	The "g*" attributes for graph and graphxy in ACD have been deleted as
	they have the same name (and function) as existing associated
	qualifiers - and can still be used with these names in ACD files.
	Duplicate ACD attribute and associated qualifier functions exist
	in many ACD types, but usually have different names and so are
	left for compatibility purposes.

	emboss.default and ~/.embossrc configuration files now have
	extensive error messages reporting filename and line number.
	showdb has additional validation for all database definitions.
	Environment variable EMBOSS_NAMVALID (boolean) turns this on for
	all programs.

	ajnam.c has debugging turned on by environment variable
	EMBOSS_NAMDEBUG (boolean). This processing (of emboss.default and
	~/.embossrc) happens before command line option -debug has taken
	effect. The output goes to standard error.

	Function ajFmtVPrintS is a previously missing complement to ajFmtPrintS

	EMBL/Genbank feature tables updated to FTv5.0

	SwissProt feature table '<' '>' and '?' location modifiers are
	now handled correctly.

	Added new applications acdlog, acdpretty and acdtable. Run like
	acdc they provide the same functions as the command line options
	-acdlog -acdpretty and "-acdtable -help". These -acd options are
	now obsolete and will be removed in a future release to clean up
	the ACD interface.

	Transeq now has the option '-clean' that converts all '*'
	characters to 'X's.  This may be useful because not all programs
	accept protein sequences containing '*' characters.

Version 2.6.0 20-Sep-2002

	Showdb now can display the presence of any of the extra sv, des,
	org, and key search fields that can be used to index and search in
	databases.

	Added twofeat - Finds neighbouring pairs of features in sequences.

	Extractfeat - added option (-featinname) to include the name of
	the feature as part of the ID name of the sequence that is
	written out.

	Added sirna - designs siRNA probes in mRNA.

	Sigcleave sorts results highest score first.

	Helixturnhelix sorts results highest score first and reports the
	score position as an integer.

	Added pestfind.

	Moved the following programs into the "domainatrix" embassy
	package:
	 contacts, domainer, fraggle, hetparse, hmmgen, interface,
	 pdbparse, pdbtosp, profgen, scopalign, scopnr, scopparse,
	 scoprep, scopreso, scopseqs, seqalign, seqnr, seqsearch,
	 seqsort, seqwords, siggen, sigplot, sigscan

	Palindrome no longer reports palindromes that are only composed
	of N's.

	Msbar can now check that the result doesn't match a set of
	other input sequences.  For example you could specify that it
	doesn't match the input sequence or a set of previously produced
	mutation results.

	Getorf reporting of circular genome positions tidied up - it now
	reports positions starting in the range 1 to the sequence length
	and indicates if the ORF goes through the breakpoint.  A clear
	indication of when ORFs are in the reverse sense has been added.
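
	One way to express the position handling is sketched below.
	The function names are hypothetical, positions are 1-based,
	and the breakpoint test assumes the ORF is no longer than the
	sequence; the getorf source may do this differently.

```c
#include <assert.h>
#include <stdbool.h>

/* Map a 1-based position on an unrolled circular sequence back into
   the range 1..len. */
long demoCircularPos(long pos, long len)
{
    assert(pos >= 1 && len >= 1);
    return (pos - 1) % len + 1;
}

/* An ORF running from start to end (start <= end before wrapping)
   crosses the breakpoint when its wrapped end lies before its
   wrapped start. */
bool demoCrossesBreakpoint(long start, long end, long len)
{
    return demoCircularPos(start, len) > demoCircularPos(end, len);
}
```

	On a 100 base circular sequence, an ORF from 95 to 104 is
	reported as 95 to 4 and flagged as crossing the breakpoint.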

	Pasteseq now behaves correctly when -sask2, -sbegin2 or -send2
	are used.

Version 2.5.1 12-Aug-2002

	Whichdb has a new option -showall to show which databases are
	being searched, for use where searches hang. The order of
	searching is
	undefined - it depends on the order in which databases are
	returned from the internal table, which is unrelated to the order
	in which they were defined.

	Wordmatch alignments save the entire sequence but use part only.
	Fixed all alignment formats to work with these by adding a
	SubOffset attribute.

	Duplicate IDs fix. The database indexing programs skipped
	duplicate IDs but did not reset the size of the entryname index
	file so some queries could fail to find the later IDs in the
	databases. Duplicate IDs are illegal for -nosystemsort (no easy
	way to correct because entry numbers are stored internally). For
	the default case duplicate IDs are merged even if they are
	different. REFSEQ is the main problem area.

	Writing data files used EMBOSS_DATA, or by default the install
	directory. Earlier versions, if not installed, could write to the
	source tree emboss/data directory. Fixed to continue if there is
	no install data directory, and to check EMBOSS_DATA (if defined) is
	a real directory.

	Sigcleave options pval and nval hardcoded. They depend on the
	weight matrix size - which is hardcoded as 15 in the ACD file and
	is not checked in the program. They were introduced in EGCG in
	1988 but never used because no other weight matrix length was
	tried.

Version 2.5.0 25-May-2002

	"fasta" format now uses the "ncbi" parser, so both formats report
	"fasta" as the format. "pearson" is the old "fasta" format, kept for
	the few cases (empty IDs for example) where ncbi parsing fails
	completely.

	SPLITTER changed to match documentation. Old behaviour is
	now selectable by using the -addoverlap command line
	option.

	Configuration modifications. --without-x works. Removed odd
	but harmless -I definitions. PNG detection improved.

	Corrected EMBLCD index searching for queries that start with a
	wildcard. For example, tembl-key:?* should search for all entries
	that have a keyword (key:* is regarded as 'all entries'). Entries
	with no keyword (in PIR's pir4.ref file for example) will be
	ignored.

	Updated source code docs for EFUNC and EDATA. Corrected all bad
	headers. efunc.out has no errors. efunc.check only reports
	'missing headers' for duplicated function names (#ifdef code)
	which is a known 'feature'.

	Updated source code to fix most lines over 80 bytes.

	Calculated ACD attributes now QA tested. Feature attributes will
	be correctly set, although none are used in the ACD files at present.

	purify.pl has a new option -block=n where n is a number from 1
	upwards.  1 runs the first 10 tests, 2 runs the next 10
	(blocksize=10 is hardcoded for now).
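
	The block arithmetic is simple enough to state in code. This
	is a sketch in C of the numbering described above, not the
	Perl in purify.pl; the names are hypothetical.

```c
#include <assert.h>

enum { DEMO_BLOCKSIZE = 10 };   /* tests per block, as described above */

/* Number of the first test (1-based) run for a given block. */
int demoBlockFirst(int block)
{
    assert(block >= 1);
    return (block - 1) * DEMO_BLOCKSIZE + 1;
}

/* Number of the last test (1-based, inclusive) run for a given block. */
int demoBlockLast(int block)
{
    assert(block >= 1);
    return block * DEMO_BLOCKSIZE;
}
```

	Block 1 covers tests 1 to 10 and block 2 covers tests 11 to
	20.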

	Cleaned up string position code. Inspections showed ajStrPos and
	related functions gave results from 0 to length of a string. This
	caused confusion in many other functions and applications. These
	functions are now static strPos functions because only ajstr.c had
	calls to them (though the ajStrPos versions are still available).
	All calls were checked for positions out of range. As a result,
	many calls to ajStrAssSub and ajStrCut were fixed. ajStrInsertC
	requires a value from 0 to length (start position to insert can be
	before or after the string, or any position in between). Fixed by
	passing length+1 to strPosII.

	Added a function ajUtilCatch for use in debugging with gdb. When
	a nasty special case occurs, call ajUtilCatch and make it a
	breakpoint in gdb. The resulting backtrace will give the call stack
	and all variable values.

	Cleaned up code for chunked HTML input. Added a new variable
	EMBOSS_HTTPVERSION which defaults to 1.0 (so HTTP is not chunked)
	and a DB attribute httpversion. This must be a floating point
	number, and is included in the HTTP header to specify the HTTP
	protocol version to be used. There is no check in the code to
	change behaviour for different versions. This is used in the
	SRSWWW and URL access methods.
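
	A sketch of how a floating point version might be placed in
	the request line. The function name and the "%.1f" format are
	assumptions for illustration; the note above says only that
	the value is a floating point number included in the header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write an HTTP GET request line for path into buf using the given
   protocol version. Returns the value returned by snprintf: the
   length of the full request line, excluding the terminator. */
int demoHttpRequestLine(char *buf, size_t buflen,
                        const char *path, double version)
{
    assert(buf != NULL && path != NULL);
    return snprintf(buf, buflen, "GET %s HTTP/%.1f\r\n", path, version);
}
```

	With the default version of 1.0 and the path "/index.html"
	this writes "GET /index.html HTTP/1.0" followed by CR LF.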

	Added check to qatest.pl to report any EMBOSS (rather than
	EMBASSY) applications for which there is no defined test. The
	EMBASSY test uses wossname results, checked against the names of
	ACD files in the source tree, as qatest always runs in the test/qa
	directory.

	Allowed sequences as values for EMBL rpt_unit feature qualifiers
	because so many entries have them. They are illegal according to
	the Version 4.0 (current) feature table document.

	Allow ? before from and to feature locations in SwissProt. For
	now, these are ignored, though we could add something to hold them
	for accurate output.

	Added modified Harrison solubility probability to PEPSTATS

	ACD attributes now have descriptions in the ajacd.c code which are
	reported by 'entrails'. All ACD attributes have been checked by
	inspection of the code to note those which are used/unused by ACD.
	The ACD "type" attribute for files is renamed "standardtype" to
	reflect its intended use to note standard file types for linking
	applications. Sequences and alignments still have a "type"
	attribute for protein or dna sequence types.

	Aaindexextract (new) reads the AAINDEX database and writes each
	entry to data/AAINDEX directory. New function ajFileDataDirNew to
	read data files from a named directory. New ACD datafile attribute
	'directory' passed to ajFileDataDirNew. AAINDEX directory defined
	for pepwindow and pepwindowall.

	Palindrome can now read in multiple sequences

	Palindrome now does not print a '|' in an alignment where there
	is a mismatched pair of bases.

	Added filelist datatype to ACD

	Mwcontam program added. Displays molecular weights that are common
	across a set of files.

	Showfeat - added '-sort join' to display joined features on one line.

	Diffseq - don't give summary of SNPs if the sequences are proteins.

	Inclusion of stat64 and readdir64 for offsetbits=64 (ajfile.c
	and ajsys.c)

	Workaround for broken Solaris readdir64_r (jembossctl)

	Infoseq can now optionally display GI and Sequence Version numbers.

	Notseq can now read in a file of sequence names.

	Added '-alternative' qualifier to transeq to allow reverse frame
	translations to be done using the codons counted from the start
	of the reversed sequence, rather than, by default, using the
	codons of the corresponding forward frame.

	Added the qualifier '-join' to the program extractfeat.
	If '-join' is set then joined features, such as 'CDS' and 'mRNA'
	are output as a single concatenated sequence.

	Changed the default output filename from 'stdout' to a file for
	the following:
	    infoalign
	    megamerger
	    merger
	    showalign
	    showfeat
	    showseq
	    textsearch

	Lindna/cirdna can now draw filled boxes and the user can change the
	text size on the command-line. They can also read and display
	complete genomic sequences.

	Major new revision of protein structure applications - w/o full
	documentation.

	New applications have been added:
	     pdbparse.c / acd
	     scopseqs.c / acd
	     scopnr.c / acd
	     seqsearch.c / acd
	     seqwords.c / acd
	     seqalign.c / acd
	     hetparse.c / acd
	     scopreso.c / acd
	     scoprep.c / acd
	     profgen.c / acd
	     funky.c / acd
	     hmmgen.c / acd
	     fraggle.c / acd

	Some applications have been deleted:
	     scope.c / acd
	     nrscope.c / acd
	     psiblasts.c / acd
	     swissparse.c / acd
	     alignwrap.c / acd
	     dichet.c / acd

	The deleted applications have been replaced as follows:
	     coordenew  --> pdbparse (coordnew was deleted a while back)
	     scope --> scopparse
	     nrscope --> scopnr
	     psiblasts --> seqsearch
	     swissparse --> seqwords
	     alignwrap  --> seqalign

	New versions of code have been committed:
	     pdbparse.c / acd
	     domainer.c / acd
	     contacts.c / acd
	     interface.c / acd
	     pdbtosp.c / acd
	     scopparse.c / acd
	     scopreso.c / acd
	     scopseqs.c / acd
	     scopnr.c / acd
	     scoprep.c / acd
	     scopalign.c / acd
	     seqsearch.c / acd
	     seqwords.c / acd
	     seqsort.c / acd
	     seqnr.c / acd
	     seqalign.c / acd
	     siggen.c / acd
	     sigscan.c / acd
	     sigplot.c / acd
	     hetparse.c / acd
	     profgen.c / acd
	     funky.c / acd
	     hmmgen.c / acd
	Plus
	     ajxyz.c / ajxyz.h

	Short summaries of the applications are as follows:
	     pdbparse - Parses pdb files and writes cleaned-up protein
			coordinate files.
	     domainer - Reads protein coordinate files and writes
			domains coordinate files.
	     contacts - Reads coordinate files and writes files of
			intra-chain residue-residue contact data.
	     interface- Reads coordinate files and writes files of
			inter-chain residue-residue contact data.
	     pdbtosp  - Convert raw swissprot:pdb equivalence file to
			embl-like format.
	     scopparse- Converts raw scop classification files to a
			file in embl-like format.
	     scopreso - Removes low resolution domains from a scop
			classification file.
	     scopseqs - Adds pdb and swissprot sequence records to a
			scop classification file.
	     scopnr   - Removes redundant domains from a scop
			classification file.
	     scoprep  - Reorder scop classification file so that the
			representative structure of each family is
			given first.
	     scopalign- Generate alignments for families in a scop
			classification file by using STAMP.
	     seqsearch- Generate files of hits for families in a scop
			classification file by using PSI-BLAST with
			seed alignments.
	     seqwords - Generate files of hits for scop families by
			searching swissprot with keywords.
	     seqsort  - Reads multiple files of hits and writes a
			non-ambiguous file of hits (scop families file)
			plus a validation file.
	     seqnr    - Removes redundant hits from a scop families file.
	     seqalign - Generate extended alignments for families in
			a scop families file by using CLUSTALW with seed
			alignments.
	     siggen   - Generates a sparse protein signature from an
			alignment and residue contact data.
	     sigscan  - Scans a signature against swissprot and writes
			a signature hits file.
	     sigplot  - Reads a signature hits file and validation file
			and generates gnuplot data files of signature
			performance.
	     profgen  - Generates various profiles for each alignment
			in a directory.
	     hmmgen   - Generates a hidden Markov model for each alignment
			in a directory.
	     hetparse - Converts raw dictionary of heterogen groups to
			a file in embl-like format.
	     funky    - Reads clean coordinate files and writes a file
			of protein-heterogen contact data.

	Updated "make check" program entrails. Corrected sequence format
	reports, added report and alignment formats and database access
	methods.

	Added scripts/logreport1.pl to report EMBOSS usage from the
	logfile. Takes the logfile name on the command line. Reports
	total use, most active user, and total user count.

	Extractseq now only reads one sequence as input.

Version 2.4.1 14-may-2002

	Fixed error reading multiple databases

	Fixed MacOSX reading of incomplete sequence files

	Fixed indexing of REFSEQ

Version 2.4.0 11-Apr-2002

	New Jemboss authorising server code. This uses a new set-uid
	program (jembossctl) to perform tasks as the user.

	New alignment output format "match" for wordmatch, which reports
	the length, sequence names, and range in each sequence.

	emboss.default.template has been changed to include the new SRSWWW
	access method and the fields definitions for the test databases.

	In dbiblast, renamed the -filename option to -filenames to match
	the other dbi indexing programs, and because wildcard filenames
	are supported.

	Removed the -staden option for the dbi indexing programs. This had
	no effect (it was originally included to rename files as
	division.lookup for use by internal utilities at the Sanger
	Centre).

	In qatest.pl test script, added test for missing expected file.
	This was only seen for obsolete secondary output files; no tests
	were passing that should have failed.

	Script (scripts/dbilist.pl) to report the contents of EMBLCD
	database indices created by dbiflat, dbigcg, dbifasta or dbiblast.

	Proxy HTTP access for remote servers. Define EMBOSS_PROXY as an
	environment variable, or in emboss.default. It can also be set for
	any database as proxy: "hostname:port" or overridden with
	proxy: ":" to use a local server for a database. This is used by
	both the URL and SRSWWW access methods.
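
	As a rough illustration of the precedence described above, the
	sketch below picks the proxy for one database. The function and
	argument names are hypothetical and not part of EMBOSS, and it
	assumes a database-level value always wins over EMBOSS_PROXY.

```python
def effective_proxy(global_proxy, db_proxy=None):
    """Return (host, port) for the proxy to use, or None for direct access.

    global_proxy: value of EMBOSS_PROXY ("hostname:port") or None.
    db_proxy: per-database proxy value, ":" to disable, or None if unset.
    """
    # Assumption: a database-level setting takes priority when present.
    chosen = db_proxy if db_proxy is not None else global_proxy
    if chosen is None or chosen == ":":
        return None  # ":" overrides the global proxy for this database
    host, _, port = chosen.rpartition(":")
    if not host or not port.isdigit():
        raise ValueError("expected hostname:port, got %r" % chosen)
    return host, int(port)
```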

	New ajListUnique function to remove duplicate nodes in a list.

	New embxyz.c / .h embXyzSeqsetNRRange functions added

	Report format "table" is the default for several applications. In
	this format, the sequence USA has been removed because it already
	appears in the sequence header part of the report. A new format
	"-rformat nametable" will produce the previous report output for
	users who are relying on parsing it.

	Output files defined with the "nullok" attribute in ACD are not
	created unless requested. The file name and extension are ignored.
	It is possible to add a new associated qualifier to control this
	behaviour, but its use may be confusing with more than one output
	file.

	Precision attribute for report score (default is 3). Other
	floating point report values are written as strings by the
	original application so their precision is defined in the
	code. The score is a float, as part of the internal (GFF) feature
	structure.  A zero value produces an integer score (strictly, it
	uses %.0f as the format). Set precision for etandem, fuzznuc,
	fuzzpro, fuzztran, patmatdb, patmatmotifs (integer scores) and
	restrict (no score)
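
	The effect of the precision value can be pictured with printf-style
	formatting. This is only a sketch: format_score is a hypothetical
	helper, not an EMBOSS function, and it assumes the precision is
	applied as the number of decimal places.

```python
def format_score(score, precision=3):
    """Format a report score with a fixed number of decimal places."""
    # "%.*f" takes the precision from the argument list, so a precision
    # of 0 is equivalent to "%.0f" and prints no decimal point.
    return "%.*f" % (precision, score)
```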

	Report output for equicktandem and etandem, with -origfile to
	write the original output format for sites (Sanger for example)
	who still require it. By default, the origfile output file is
	not created.

	Report output for patmatdb and patmatmotifs. For patmatmotifs the
	prosite documentation appears in the report footer, with the
	addition of the motif name and the number of matches in the
	sequence.

	Report headers and footers automatically trim last newline.

	Reports in -rformat SeqTable right-align numbers.

	Report output for marscan (-rformat GFF by default)

	Report output for fuzztran (-rformat table with the translation
	included as a report field). Using -rformat seqtable with fuzztran
	now also shows the original DNA sequence.

	Report output for fuzznuc and fuzzpro (-rformat SeqTable by default)

	New report qualifiers -raccshow to include accession in header
	and -rdesshow to include description in header

	Two access methods "file" and "offset" were defined as valid in
	database definitions, but are really reserved for simple file reading.
	They are removed from the database access methods list.

	Two access methods "cmd" and "nbrf" are obsolete (cmd was never
	implemented, nbrf is replaced by gcg which includes a query
	mechanism). Both are removed from the database access methods list,
	and the source code is commented out.

	SRS, SRSFASTA and SRSWWW database access can read all entries. This
	is not recommended for SRSWWW access because it will read
	everything into memory - all of EMBL for example - then strip out
	HTML tags before reading. For SRS it is not recommended because
	"methodall: direct" is faster. For SRSFASTA it is necessary
	because using SRSFASTA implies EMBOSS does not read the original
	data format. However, not implementing an "all" search left a gap
	in the SRS access methods which would generate a bad SRS command
	line or URL.

	NBRF sequence reading trims last character only if it is '*'
	to catch cases where SRS reports the sequence as 'plain'

	GCG database text has the spaces in ". ." strings removed.

	Database entry text and sequence saved for binary formats (GCG, BLAST)
	for use by entret and other applications

	Dbiblast indices with split databases (formatdb -v) fixed for reading
	all entries (was only reading the first file)

	Dbiblast and dbigcg indices support exclude and file definitions
	to create database subsets

	Database include and file definitions can use the simple filename.
	In some cases the full path was used. Database files are checked
	both with and without the directory path for back-compatibility.

	srswww access method created to query a remote web server.
	Preferred to using URL access as SRS queries can be built

	Sequence objects include the SeqVersion, Keyword list and Taxonomy
	list.

	The GI number is read as an alternative SeqVersion where it is
	available (GenBank and some NCBI formats). The GI number is
	reported in GenBank format if available, but the GenBank VERSION
	line may have only the SeqVersion if, for example, the sequence
	was read from an EMBL entry. "sv" queries check both the
	SeqVersion and GI number.

	Accession numbers have a strict definition, which covers the old
	and new EMBL/GenBank format, SwissProt, PIR, and REFSEQ
	(NM_nnnnnn). Earlier versions would accept any "accession number"
	in some sequence formats, especially NCBI format.

	SeqVersion (EMBL SV line, GenBank VERSION line) is used in preference
	to accession number where available. Can also be read in FASTA
	and NCBI formats. Where only the SeqVersion is available, the
	accession number is generated.

	USA queries implement searches by SV, DES, ORG and KEY. These work
	with SRS access methods (SRS, SRSFASTA, SRSWWW) by building SRS
	queries, and with direct access (simple file reading) by
	testing the sequence object.

	Key and Org queries are for full keywords (including spaces) and
	for each level of the taxonomy.

	Des queries, if the access method does not provide a mechanism
	(that is, if it does not have its own index), are applied to
	words within the description. Words start with a letter or number,
	and end with a letter or number. SRS typically does the same, but
	allows a single quote at the end. This catches words such as 3'
	and 5' but is a problem with some quoted text.
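
	A minimal sketch of the word rule above, assuming that tokens are
	separated by whitespace before being trimmed (the note does not say
	how EMBOSS splits the text). The names des_words and
	allow_trailing_quote are hypothetical; the flag imitates the
	SRS-like behaviour of keeping one single quote at the end.

```python
import re

# First letter/digit through last letter/digit within one token.
_WORD = re.compile(r"[A-Za-z0-9](?:.*[A-Za-z0-9])?")

def des_words(description, allow_trailing_quote=False):
    """Split a description into query words.

    Assumption: tokens are separated by whitespace, then trimmed so each
    word starts and ends with a letter or digit.
    """
    words = []
    for token in description.split():
        match = _WORD.search(token)
        if match is None:
            continue  # token has no letters or digits, e.g. "--"
        word = match.group(0)
        # SRS-like variant: keep one single quote directly after the word.
        if allow_trailing_quote and token[match.end():match.end() + 1] == "'":
            word += "'"
        words.append(word)
    return words
```

	With the flag set, a token such as 'quoted' keeps its closing
	quote, which is the kind of problem with quoted text mentioned
	above.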

	Queries for ID ACC SV DES ORG and KEY are valid for all file
	access methods, including URL, external, cmd, app, file and by
	default any new method added. If the internal query data is not
	flagged by the access method (to show the database has been
	queried) the sequence object is automatically tested.

	Missing description, keyword, organism, or seqversion fields cause
	queries to fail if they are used on inappropriate data.

	Dbiflat, dbigcg, dbifasta and dbiblast can index the new
	fields. All fields are available in dbiflat and dbigcg. The sv and
	des fields are available in dbifasta and dbiblast. If any specific
	formats make it possible to parse the org (or key) field they can
	be added as new formats.

	The new EMBLCD index files are named as follows: des for the
	descriptions (no obvious standard name), seqvn for the seqversion
	(no obvious standard name), keyword for keywords (EMBLCD
	distribution name) and taxon for organism (EMBLCD distribution
	name). The EMBLCD distribution also included a freetext index
	which is similar to the SRS alltext search so we did not use the
	name for the description index.

	We are working through the EMBLCD format documentation to make
	EMBOSS indices more compatible. For example, all tokens in the TRG
	index files should have trailing spaces. We use a NULL to mark the
	end of the string.

	EMBLCD index files now expand to fit the longest token, including
	the entryname index which was limited to 12 characters (only one
	site reported a problem with this in dbifasta with long ID names).

	A new qualifier -maxindex sets an upper limit (25 is recommended)
	to limit the size of all index files. Currently this applies to
	all indices. We can add separate maxima for each field if
	needed. We expect very few sites to use the extra index fields
	as SRS is a simpler alternative.

	New database definition token 'fields' with a list of indexed fields
	can be set to 'sv des org key' for SRS databases.

	USAs check the query field against the database 'fields'
	definition. ID and ACC are always allowed. dbname:name still
	searches ID and ACC (no change from previous version)
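
	The check can be sketched as below. The helper name is
	hypothetical, and the sketch assumes the 'fields' value is a
	space-separated list compared without regard to case.

```python
ALWAYS_ALLOWED = {"id", "acc"}

def field_allowed(field, fields_definition=""):
    """Check a USA query field against a database 'fields' definition."""
    # Assumption: the definition is a space-separated list such as
    # "sv des org key", and names are compared case-insensitively.
    field = field.lower()
    indexed = set(fields_definition.lower().split())
    return field in ALWAYS_ALLOWED or field in indexed
```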

	USAs with a filename can include the new query fields. The syntax is
	filename:field:query for example empro.dat:id:eclaci (the extended
	syntax is because empro.dat-id:eclaci looks like a filename ending
	in -id)
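
	A simplified sketch of splitting the filename form of a USA into
	its parts. parse_file_usa is a hypothetical name, and the sketch
	ignores other USA features (format prefixes, ranges, list files)
	and assumes the filename itself contains no colon.

```python
QUERY_FIELDS = ("id", "acc", "sv", "des", "org", "key")

def parse_file_usa(usa):
    """Split 'filename[:field]:query' into (filename, field, query)."""
    parts = usa.split(":")
    if len(parts) == 1:
        return parts[0], None, None          # whole file, no query
    if len(parts) == 2:
        return parts[0], None, parts[1]      # filename:query
    if len(parts) == 3 and parts[1].lower() in QUERY_FIELDS:
        return parts[0], parts[1].lower(), parts[2]
    raise ValueError("unrecognised USA: %r" % usa)
```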

	Application 'tranalign' added.
	This aligns nucleic coding regions based on a set of aligned proteins.

Version 2.3.1 07-mar-2002

	Est2genome fixed for large alignments (over 40Mbase for
	est * genomic sequence length).

	Sequence reading for ABI files fixed (and selex files tested).

	Genbank feature input working.

	Pepinfo PNG output larger to make the text readable (only affects
	PNG output).

	Empty sequence file input fails gracefully.

	Empty sequence input fails gracefully (and only needs one
	^D from stdin).

Version 2.3.0 03-mar-2002

	Seqretall, seqretallfeat and seqretset moved to 'make check'.
	Seqret has all the functionality of the above.

	Fix for NBRF accession number reading (ajseqread.c).

	Whichdb program added.

	Fix for dbifasta and wormpep.

	Fix for problem reading plain format sequences by primer3.

	Primer3 renamed to eprimer3 to avoid conflicts with the Whitehead
	Institute's Primer3 version 3.0.6.

	Transeq's '-frame' can have a list of values, as: '-frame=1,2,3'.

	Non-existent files in lists are again ignored.

	Various wildcard database search fixes.

	ESIM4 added as an embassy package.

Version 2.2.0 12-Jan-2002

	New applications:
	Biosed, Contacts, Dichet, Psiblasts,
	Scopalign, Sigscan, Siggen.

	Configure tidy.

	Alignment report fixes.

Version 2.1.0  24-Dec-2001

	Jemboss.

	More formats for reports and alignments.

Version 2.0.1  29-jul-2001

	Release of HMMER as an embassy package.

	DBIGCG bugfix

Version 2.0.0  15-jul-2001

	New feature table handling etc.

Version 1.13.1 25-may-2001

	Fix emboss.default.template problem

Version 1.13.0 24-may-2001

	New applications showalign and embossversion.

	Prophet fixed.

Version 1.12.0 17-Apr-2001

	New applications distmat and cai.

Version 1.11.0 10-mar-2001

	New applications charge and degapseq.

Version 1.10.1 26-Feb-2001

	Bug fixes of marscan, getorf and garnier

Version 1.10.0 18-Feb-2001

	New applications scope, nrscope, domainer.

	Initial large file model support.

Version 1.9.0 22-Jan-2001

	New applications abiview and recode.

	Linked list and string iterator code rewritten.

Version 1.8.0 20-Nov-2000

	New application coderet.

	Corba test routines

Version 1.7.0 31-Oct-2000

	New application entret.

	GCG output style changed.

	Fixed -slower & -supper input options for multiple sequences

Version 1.6.3 25-Oct-2000

	Further mods for seqed files.

	Rewrite of profile core routines.

	Added %id, %sim and fasta output to needle and water.

Version 1.6.2 23-Oct-2000

	Now reads GCG seqed mangled files.

	Phylip output fixed.

	Numerous minor changes.

Version 1.6.1 11-Oct-2000

	RedHat Linux 7.0 fpos_t fix

Version 1.6.0 06-Oct-2000

	New application cons.

Version 1.5.6 3-Oct-2000

	URL access handles new SRS6.07* format.

	Library and applications leak-free.

	Error messages made less daunting.

Version 1.5.5 28-Sep-2000

	dbigcg changes for genbank.

	Memory leaks plugged.

Version 1.5.4 23-Sep-2000

	Added blast multi-volume support for database indexing.

	More gui hints in ACD files.

Version 1.5.3 18-Sep-2000

	LinuxPPC support added.

Version 1.5.2 5-Sep-2000

	dbigcg changes for embl database in GCG format.

Version 1.5.1 09-Sep-2000

	Changes to graphics data output for GUIs.

Version 1.5.0 07-Sep-2000

	New application emowse.

Version 1.4.3 03-Sep-2000

	tfm corrected.

	HTML documentation corrected.

	More GUI work.

Version 1.4.2 29-aug-2000

	Changes to graphics data output for GUIs.

Version 1.4.1 25-aug-2000

	Minor library changes.

Version 1.4.0 20-aug-2000

	New applications silent and restrict.

Version 1.3.1 18-aug-2000

	Indexing filenamelen fix.

	Modification to diffseq.

Version 1.3.0 17-aug-2000

	New applications vectorstrip and diffseq.

Version 1.2.0 15-aug-2000

Version 1.1.0 09-aug-2000

Version 1.0.2 08-aug-2000

Version 1.0.0 15-jul-2000

Version 0.0.4 Dec-1998