AQL - Language overview (Version 4

AQL - Language overview (Version 4_8b)

Here is a simple AQL query that lists English authors with their email addresses, with an example of a possible resulting table:-

select a, a->Email from a in class Author where a->Mail = "England"

  --->
	"Durbin_R"	"rd@sanger.ac.uk"
	"Durbin_R"	"rdurbin@demon.co.uk"
	"Wobus_F"	"fw@sanger.ac.uk"

Conventional database systems store their data in tables, whereas acedb organizes its data in tree structured objects. However, the results of an AQL query are always returned in tables made up by rows and columns.

1. Basic query structure

Like in SQL the basic query construction comes in three parts:-

`select a, a->Email`	select-clause
`from a in class Author`	from-clause
`where a->Mail = "England"`	where-clause

1.1. Select-clause

The select-clause defines one or more fields that each will make up a column in the result. Whether or not those columns contain useful values and the number of rows will depend entirely on the data that matches the query.

One can think of the whole select-from-where query as an expression whose return value is a table with rows and columns. Every field in the select clause is an expression and for each iteration cycle the result value will be copied to the result table. These expressions usually extract data values from objects or combine them using arithmetic operators or functions. The variables that are referenced by select-field expressions have to be declared in the from-clause or in an enclosing scope.

Field naming

More advanced AQL features need to refer to a specific field in a row. They can do this either by name or number. The second select-field in the above query can be named automatically by the tag that it uses, but if we want to use a name for the first field we must assign it explicitly using the field naming operator '::'.

select Author::a, a->Email from ...

The use of expressions in the select-clause also requires those fields to be explicitly named, if they are to be referred to by name. Note the second select is again automatically named by it's tag.

select Age::yeardiff(a->date_of_birth, today), a->full_name from ...

1.2. From-clause

This part of the query declares all iterator variables and thereby defines the source of the data. Acedb stores all data in objects that belong to classes. The example query declares the iterator a to range over the objects in class Author. The preferred AQL notation for a class declaration is

IteratorName in class ClassName

although

class ClassName as IteratorName

class ClassName IteratorName

are valid variations that are borrowed from OQL.

Class declarations will cause the query to fill the iterator variable a with values from the Author class. Other iterators can be defined based on previously defined iterators:-

... from a in class author, p in a->Paper ...

There are many cases where a specific object is wanted and we want to refer to that object directly without searching through the entire class. The declaration:-

... from s in object("Sequence","F15H10")->Subsequence ...

finds the object F15H10 in class Sequence directly and then declares the iterator variable s to loop over the objects that are listed under the Subsequence tag in F15H10.

Variables can also be declared to be based on the result of any expression (e.g. the table-function count):-

... from num in count(a->paper) ...

This notation can be very useful if the value for num needs to be used in the select-clause as well as in a comparison in the where-clause. Just referring to the variable num saves having to calculate the count-function twice.

Note that the from-where-clause is optional. It is not needed if the select fields don't refer to any iterators that require a declaration (in a from-clause), i.e. where there are no iterators in the query or where any iterators that are present have already been declared in an enclosing scope, which can happen inside a nested query.

1.3. Where-clause

The WHERE clause is a boolean condition which can be attached to any declaration to act as a filter.

In order to evaluate the where-clause in the example query, the Text-data stored under the Address tag in every Author object is compared with "England". The range of Object-values for the iterator a in the from-clause will therefore be restricted to those objects where the comparison is TRUE.

A query can have multiple where-clauses and in queries with multiple from-clause declarations it is useful to apply the boolean condition as early as possible. The from-clauses are evaluated in turn and a FALSE where-clause will prevent any subsequent declarations to be evaluated in any cycle of the execution loop. Clever where-clauses that filter the declarations early on can therefore increase the execution speed of a query.
For example the following query finds the papers of authors who live in England. In the example, the second declaration which fetches the authors papers doesn't have to be execuetd for any author where the where-clause on the first declaration (test on english address) has already failed.

select a, b from a in class Author where a->Address = "England", b in a->Paper

2. Quoting value literals

The database schema, known in acedb as the models (stored in wspec/models.wrm), specifies a type for each value that is stored. An iterator in AQL will assume that type when it binds to the value in the database.

Values can also be quoted as literals in an AQL query, e.g. in comparisons with iterator values. Every expression in an AQL query (e.g. in a select-field) that evaluates correctly will produce a result of a specific type.

These are the legal value types as used in the acedb database schema and examples of how they might be quoted in an AQL query.

Int	`1024 -12`
Float	`3.1415 -5e3 4.2E-10 .2`
Text	`"England" "Durbin RM"`
DateType	`1998-08-04_10:21:46 `1989-11-09

To quote an object one has to use the object constructor to define an iterator, whose value will be that object. If the iterator is printed to the result table, the object name will be shown as a text-string.

Dates and Times

The AQL syntax requires DateType literals to be prefixed by a ` character. A DateType literal may specify any level of detail from just the year down to the seconds.

These are valid DateType literals:-


	 `1998-08-04_10:27:32
	 `1998-08-04_10:27
	 `1998-08-04_10	
	 `1998-08-04
	 `1998-08
	 `1998

The literals `1998-08-04_ or `1998-55 would be rejected.

The reserved keywords today and now are used to refer to the current date/time and current date respectively. For example at this moment these keywords replace the following date/time quotations:-

today	--->	`1999-12-06_11:13:20`
now	--->	`1999-12-06`

Null values

The definition of Null values in AQL is borrowed from Computer Science. The possible values for any type include the valid range of the type and the value NULL. Values usually become NULL if an iterator couldn't be evaluated, e.g. a tag is not present in an object. Rather than reporting 0 or an empty string - AQL will report NULL to indicate that no valid value could be found. This is because 0 and the empty string are valid values for Integer and String-types respectivly.

Expressions containing NULL values will fail and their result itself becomes NULL. This explains the little quirk of boolean values, which can't only be TRUE or FALSE but also NULL.
Example : select ->0, ->subsequence[1], abs (->subsequence[2] - ->subsequence[3]) > 1000 from class Sequence
The third column is a boolean value, but it can only be evaluated if the Subsequence tag and its values exist for a given object. The result will be TRUE or FALSE based on the actual numbers, but NULL of the tag didn't exist and the expression couldn't be evaluated.

Note that in a boolean context (e.g. comparison, boolean operator) Integer values are converted to booleans - zero is FALSE and non-zero is TRUE.

3. Variables

3.1. Iterators

The most important part of any useful query is the ability to bind a variable to a database value. Iterator variables serve this purpose. Iterators are declared in from clauses as ranging over a set of values. Object-valued iterators can then be dereferenced using the -> operator to extract data from inside the objects.

AQL maintains the notion of a scope for each iterator within each select-from-where expression. Any identifier can only be declared once within a single scope. An inner nested query can redeclare the identifier name, shadowing the outer declaration. The identifier is out of scope beyond the select-from-where expression in which it was declared.

Extracting data from objects

There are two different operators that can be used to extract data from inside an object. The most commonly used is the arrow operator ->, which will dereference an object valued expression, looking inside that object starting at its root (remember that acedb objects are trees). We have seen many examples of this, such as a->Email. The other data access operator is a pair of square brackets [ ], which are used to look further within the same object. This is most often seen in compound constructions, such as a->Map[Position] which extracts the Position that follows the Map entry inside the a object. i.e. this expects the a object to contain data looking something like

	a = Locus unc-3
	  Map "X" Position 18.5633

In contrast, a->Map->Position would give the Position in the Map object, i.e. the Position field within object "X" in the above example.

Either type of operator can use a tag name, as above, or a (non-negative) integer. If a tag name is used, it is interpreted according to the schema (model) of the relevant acedb class. If a number is used, it is interpreted as a number of positions rightwards in the tree representation of the object. In practice, tags are normally used with the arrow operator, and numbers with the brackets operator, because often the first action is to extract a field by its tag name, and then further information may be stored in subsequent fields to the right of it in the acedb model, which can be accessed by [1], [2] etc. e.g. select a->Subsequence, a->Subsequence[1], a->Subsequence[2] will extract the names, start and end points of subsequences according to the standard acedb Sequence class model.

The unnamed (default) iterator

The iterator identifier can be omitted for one declaration only, so many simple queries can be abbreviated like:-

select ->Paper from class Author where ->Address[Mail] = "England"

The default iterator that is declared by a shorthand declaration has no name, therefore we can't refer to it by name. We can however refer to using ->0 or [0]. Both mean exactly the same - the object itself (in this case the default iterator).

In a simple select clause which reports only the values of the iterator itself, the keyword all can be used to represent the unidentified iterator. In this case no other select-field are allowed, but the keyword from also becomes optional:-

To list all sequence object containing the DNA-tag, the query shortens drastically to:-

select all class Sequence where exists_tag ->DNA

This query is identical to select all from class Sequence where exists_tag ->DNA
select ->0 from class Sequence where exists ->DNA[0]
select [0] from class Sequence where ->DNA[0]
select s from s in class Sequence where exists_tag s->DNA

3.2. Table variables

The result of a table-expression (e.g. a select-from-where query) can be assigned to a variable. The variable name starts with @ followed by an identifier (like a list object in perl). The values in that table can be re-used in subsequent query.

Multiple table-expressions within one query are concatenated by the semi-colon ;. AQL always returns the result of the last table-expression of a query.

Example:-


@AH6subseq :=  select seq::x, x[1], x[2] 
               from x in object("Sequence","AH6")->Subsequence ;
select r:seq, (r:3 - r:2)
from r in @AH6subseq where r:3 > r:2

The table called @AH6subseq is filled with the results of a select table-expression. This table contains three column, while the first column has explicitly been renamed to 'seq' for clarification.

The from-clause of the second table-expression defines a row-variable iterator r to loop over rows in the table @AH6subseq. In every evaluation cycle the iterator r will point to the next row in that table.

A field in a row is referred to by the row-variable name followed by a colon : followed by either the column name or the column number.

The select-clause then generates a result table with 2 columns by extracting the 'seq' field from each row and the result of subtracting the second column (r:2) from the third column (r:3).

In the same way fields are specified in row-variables, they can be used to extract a whole column from a table variable. This is another re-use of the same table definition:-


@AH6subseq :=  select seq::x, x[1], x[2] 
               from x in object("Sequence","AH6")->Subsequence ;
select s, s->source
from s in @AH6subseq:seq

The from-clause of the second expression declares a table based iterator s that will take values (of type Object) from the 'seq' column of the @AH6subseq table. This column contains the objects that follow Subsequence in AH6, and it can therefore be dereferenced in the second select-clause expression.

The unnamed (default) row-variable iterator

The default iterator from the select all construct can also be used with table variables. Based on the definition of @AH6subseq from above, these two queries are equivalent, but the second query removes the need for the declaration of a row-variable for each column:-

select def:1, def:2, def:3 from def in @AH6subseq

select all @AH6subseq

The default iterator that is substituted into the select-clause is not visible in the select all construct, but it can be referenced in the same way as normal default iterators.

select all @AH6subseq where :3 > :2

lists all rows in the table, where the the third field is greater than the second field. Columns can be referenced by name (see field naming) or by number as usual.

3.3. Scalar variables

Just as the results table-expressions can be assigned to variables, the results of scalar expressions can be assigned to scalar variables, i.e. single-valued objects. The variable name of a scalar variable starts with a $ followed by an identifier (like an ordinary variable in perl).
It can be of any scalar type (Integer, Float, Text, Date) and will assume the value and type of the result of evaluating the given expression.

Example:-


$x :=  avg(select x2 - x1
           from s in object("Sequence","AH6")->Subsequence,
               x1 in a[1], x2 in a[2]
           where x2 > x1) ;
select s, x2-x1
from s in object("Sequence","AH6")->Subsequence,
    x1 in s[1], x2 in s[2]
where x2 - x1 > $x

The first part of this query is a table-expression to find all positive sizes of subsequences of AH6. Their average if then assigned to the scalar variable $x. The second part then lists those subsequences with sizes that are bigger than average. It thereby re-uses the single value for the average computed in part I.

A scalar can be assigned to the result of any valid expression, e.g. table-functions, date-functions or arithmetic expressions.

The result of a singular scalar variable assign operation will be a single column table with one row to contain the expression value. Therefore

$x := 3.1415 * 3

is perfectly valid as an AQL query, although slightly inappropriate for a query language to answer.

4. Expressions and arithmetic

4.1. Comparisons

Comparisons are boolean expressions, which means that the result is either TRUE or FALSE. Iterator variables can be compared to other iterators or to value literals or vice versa. It is important to understand the notion of value types, as incompatibilities lead to an error when the query is processed.

In general only variables or literals of equal types can be compared to each other. For simplicity however, a few automatic conversions are performed that make the language less fussy:-

Integers are promoted to Floats in numeric comparisons.
Tags or Objects are promoted to Text if required.
Remember : These are database internal codes anyway, so text is the only way to quote them literally. When objects are compared to each other, though, their internal identifiers are used, not their names as Text.

During evaluation a iterator will assume the value-types of the database-field that they are bound to. Therefore in most cases the query has to be run against the database to perform all the type-checking.

AQL will allow the following comparison operators with their usual meaning:-

     =    !=    >    <    >=    <=    like

Note : Objects and Tags, when compared to each other can only be tested for equality or inequality (i.e. = and != respectively).

When Texts are compared to Objects/Tags or each other, all operators are valid. The like comparator performs case-insensitive wildcard matching on Text values.

"Cambridge, England, UK" like "*england*" --> TRUE

Possible wildcards are the usual * and ?, where * matches any number of characters and ? matches any one character.

Apart from the like operator, all others work on Int, Float (as usual) and DateType (see next section) values.

Comparison of dates

It is easy to compare two DateType values, if they both specify the same level of detail. In this case the concept of later/earlier is unambiguous.

         1996-03 <  1997-04     -->  TRUE
      1998-06-07 >  1998-06-12  -->  FALSE
      1998-06-07 =  1998-06-07  -->  TRUE

If the two dates in question however, vary in the level of detail given, the outcome of the comparison expression becomes more difficult to understand.
A comparison will only become TRUE, if the levels of detail given in the two dates allows an unambiguous decision. The expression 1998-06 < 1998-07-09_09:51:23 is TRUE, which decided at month-level.

By the same rational 1990 = 1990-05-02 is TRUE, as both dates are equal in as much information as is given in either date-value.

An equality comparison asks if the lesser detailed date is completely contained within the other, and the above comparison evaluates TRUE, because May 2nd 1990 is in the year 1990.

A comparison involving greater-than or less-than is based on the amount of detail, that the decision can be based on:-
1998-07 < 1998-07-09 is FALSE, because one date gives a specific day in July 1998, but as the other doesn't specify the day, so we can't decide whether it is earlier and the comparison evaluates FALSE.

To demonstrate this behavior, imagine a movie database, that contains movies titles and the date on which they were released. The movie "City Hall" was released in the USA on 1996-02-16.

select all class Movie where ->Released < `1996-02
select all class Movie where ->Released > `1996-02

will both EXclude the movie 'City Hall', while

select all class Movie where ->Released <= `1996-02
select all class Movie where ->Released >= `1996-02
select all class Movie where ->Released < `1996-02-17
select all class Movie where ->Released > `1996-02-15
select all class Movie where ->Released <= `1996-02-16
select all class Movie where ->Released >= `1996-02-16

will INclude 'City Hall'.

4.2. Boolean operators

In addition to boolean comparisons, there are other boolean operators that can be evaluated in the where-clause.

The following operators are supported for boolean expressions:-

operator		boolean value
exists obj->tag	-	TRUE if the tag has a value (if obj->tag[1] is non-NULL)
exists_tag obj->tag	-	TRUE if the tag is present (if obj->tag[0] is non-NULL)
not expr	-	TRUE if FALSE, FALSE if TRUE
expr and expr	-	TRUE if both are TRUE
expr or expr	-	TRUE if one is TRUE
expr xor expr	-	TRUE if one or the other is TRUE, but not both

4.3. Arithmetic

All standard arithmetic expressions are supported, both in the select-clause to produce columns of calculated values and in the where-clause to compare expression results.

The following operators are supported for Integer and Floating point values types:-

operator		expression value
`-` expr	-	negative value
abs `(` expr `)`	-	absolute value
mod `(` expr, expr `)`	-	modulus (positive remainder)
expr `+` expr	-	Addition
expr `-` expr	-	Subtraction
expr `` expr*	-	Multiplication
expr `/` expr	-	Division

An expression that consist of Integers only will have an Integer result, but in expressions that contain Float values, all Integers are promoted to Float. An error is given if any side of an expression is not a numeric value type.

4.4. Table functions

As well as functions on individual expression value, a range of functions are provided to calculate aggregate values over lists of values. The table functions will only consider non-NULL values from the list.
Such a list is a single column in a table. Table functions therefore work on tables with only one column, or on table expressions where the selected column is explicitly specified.

In order to construct a table for the table function to work on, we have had to introduce another, internal select statement. These can either be valid on their own, or be dependent on variables defined in an outer query. The argument for a table function can also be a table variable, previously assigned.

Table function		return value
count	-	number of values
sum	-	sum of (Int,Float)values
avg	-	average over (Int,Float) values
min	-	smallest/earliest (Int,Float,DateType) value
max	-	greatest/latest (Int,Float,DateType) value
first	-	first value in the list
last	-	last value in the list

These 2 examples demonstrate the use of table functions:-

select a, a->paper from a in class author where count(select a->Paper) > 5

$x := sum(select b->price from b in class Books)

The query :-

@AH6subseq := select seq::x, size::(x[2]-x[1]) from object("Sequence","AH6")->Subsequence as x ; select min @AH6subseq:size

is equivalent to :

select min (select seq::x, size::(x[2]-x[1])) from object("Sequence","AH6")->Subsequence as x

4.5. Methods

Objects and certain value types have builtin methods. Here is an example:-

select a->Full_name from a in class author where a->Full_Name.length > 10

method name	locator type	return value
name	Object	Text-value for the object's name
class	Object	Text-value for the object's class
length	Text	Int-value for length of the Text-string
create_session	Object	UserSession Object of the session in which the object was first saved
create_time	Object	DateType-value when the object was first created
update_session	Object	UserSession Object of the session in which the object was last updated
update_time	Object	DateType-value when the object was last updated
node_session	Tag	UserSession Object of the session when the tag was last added/updated
node_time	Tag	DateType-value when the tag was last added/updated

Note : Objects are always shown by their name in the result table, which basically implies a .name method for each Object-iterator in the select-clause. It is therefor not needed explicitly.

Timestamp methods

The timestamp methods are a powerful tool to track changes in the database. The underlying mechanism has been added to the Acedb-lernel at version 4.8c. All changes to the database have to be made with database code Version 4.8c or greater for the timestamping to work correctly.

The objects themselves and each tag carry timestamps. A timestamp can be returned either as a DateType-value or as a UserSession Object. The date values can be used with the date functions like yeardiff, daydiff. The ?UserSession object belongs to a built-in system class and has the following model :-

  ?UserSession	Session UNIQUE ?Session
		Start UNIQUE DateType
		Finish UNIQUE DateType
		User UNIQUE Text

The two DateType-values Start and Finish in this object denote the time from gaining write-access and saving the changes in any acedb-session. Those times are always in the most complete date/time format YYYY-MM-DD_HH:MM:SS.
The tag Session links to another object of the system class ?Session whcih has the following built-in system model:-

  ?Session	Session UNIQUE Int
		Permanent_session     // If set, prevents automatic destruction
		Date   UNIQUE Text
		User   UNIQUE Text
		Session_Title  UNIQUE Text
		CodeRelease UNIQUE Int UNIQUE Int
		DataRelease UNIQUE Int UNIQUE Int
		Created_from UNIQUE ?Session
		Up_linked_to UNIQUE ?Session
		Destroyed_by_session UNIQUE Int
		GlobalLex   UNIQUE Int
		SessionLex  UNIQUE Int UNIQUE Int UNIQUE Int UNIQUE Int
		VocLex  UNIQUE Int UNIQUE Int UNIQUE Int UNIQUE Int
		GlobalBat UNIQUE Int UNIQUE Int  // address # blocks used
		BatPlus  UNIQUE ?Bat  UNIQUE Int // # block set
		BatMinus UNIQUE ?Bat  UNIQUE Int // # blocks freed
		IndexVersion UNIQUE Int // automatic indexing

4.6. Date functions

Dates in acedb are extremely flexible. Any level of detail (always starting with a year) can be specified. So `1998-08-12 is not automatically treated as `1998-08-12_12:07:36, it just says that the time of day on the 12th of August is not known or not important.

This leveling, however imposes the need for functionality to access specific parts of the Date/Time structure. Section 4.1 showed how tricky it can be to determine whether a date is earlier/later than another. Using comparison operators it is decided upon the deepest detail level that is shared by both dates.

In order to calculate time spans or make comparative decisions based on specific Date/Time elements a set of functions is provided. Both arguments are expressions, that will evaluate to a DateType-value. The result value is always of type integer; the result is positive, if the first argument is earlier than the second one.

select a, Age::yeardiff(a->Date_of_birth, today) from a in class Author

Note that the evaluation fails, if either date doesn't specify the requested level of detail, no value is returned in this case. e.g. monthdiff(`1997-10-12, `1998) --> NULL

The following date functions are provided:-

Date function
yeardiff
monthdiff
daydiff
hourdiff
mindiff
secdiff

5. Advanced features

5.1. Table sorting

The keyword order can be specified after any select-from-where table expression. It can have an optional parameter asc or desc. The default is ascending order. If the order keyword is used on its own like that the column sorting precedence is left to right.

With the addition of the by keyword one can specify the precise column precedence of the table sorting. The arguments of by is a list of sort columns, specified by a colon ':', followed by the column number or name.

This example uses the field naming operator assign a name to the second select-field expression, which is then used by the order by-clause to refer to that column:-

select s, size::s[2]-s[1]

from s in object("Sequence","LINK_AH10")->Subsequence

order by :size

If more than one sort column is specified (separated by a comma), the left-most is the most significant - equal values in that column will be arranged according to the next sort criterion etc.

Additionally every sort criterion can have an optional parameter asc or desc to specify sorting in ascending or descending order.

This is an example using a fictional personell database:-

select s->First_Name, s->Last_Name

from s in class Staff

order by :2 asc, :1 desc

might produce the following:-

	Walter	Adams
	Peter	Adams
	John	Adams
	Nigel	Clark
	Jim	Clark
	John	Cleese
	...

5.2. Table operators

Any two tables (either select-from-where expressions or table variables) can be combined using standard set operators.

The result is again a table with all duplicates removed (where a row occurs in both tables).

Both tables have to be precisely column-compatible - they have to have the same number of columns and the value-types of the columns must match. The table operation is performed after both tables have been evaluated individually, which can take quite long sometimes.

Table operator		result
table union table	-	all rows from both tables
table diff table	-	rows from the first table except the ones that are also in the second table, a.k.a. except
table intersect table	-	rows that are in one table but not the other

5.3. Context variables

It is possible for the program that uses AQL functionality to provide a context for a query. The context is a set of predefined variables, whose value is readily available for any subsequent query.
Note that this is an implementation dependent feature of AQL.

Tace and the active KeySet

In tace, every query has access to a preset table variable @active. It is a table of one column of Object-values that represent the KeySet from the previous query.

acedb> find author

// Found 5178 objects in this class
// 5178 Active Objects 
acedb> aql select :1 from @active where :1->Mail = "U.K."
"Berks M"
"Sulston JE"
...
"Wolstenholme AJ"
"Yeung M"
// 33 Active Objects 
acedb>

After each AQL query the new active KeySet will be the first column with Object-values from the results table of the last query.

Command line arguments in taql

In taql, the user can pass in the value of a scalar variable using the command line (the -param argument). This enable sthe use of shell or perl script to supply different values, and the pre-formed query will stay the same, because it references a scalar variable.
Type taql -h for help on command line options.

acedb@sanger.ac.uk