Here is a simple AQL query that lists English authors with their email addresses, with an example of a possible resulting table:-
select a, a->Email from a in class Author where a->Mail = "England"
---> "Durbin_R" "rd@sanger.ac.uk" "Durbin_R" "rdurbin@demon.co.uk" "Wobus_F" "fw@sanger.ac.uk"
Conventional database systems store their data in tables, whereas acedb organizes its data in tree structured objects. However, the results of an AQL query are always returned in tables made up by rows and columns.
Like in SQL the basic query construction comes in three parts:-
select a, a->Email | select-clause |
from a in class Author | from-clause |
where a->Mail = "England" | where-clause |
The select-clause defines one or more fields that each will make up a column in the result. Whether or not those columns contain useful values and the number of rows will depend entirely on the data that matches the query.
One can think of the whole select-from-where query as an expression whose return value is a table with rows and columns. Every field in the select clause is an expression and for each iteration cycle the result value will be copied to the result table. These expressions usually extract data values from objects or combine them using arithmetic operators or functions. The variables that are referenced by select-field expressions have to be declared in the from-clause or in an enclosing scope.
More advanced AQL features need to refer to a specific field in a row. They can do this either by name or number. The second select-field in the above query can be named automatically by the tag that it uses, but if we want to use a name for the first field we must assign it explicitly using the field naming operator '::'.
select Author::a, a->Email from ...
The use of expressions in the select-clause also requires those fields to be explicitly named, if they are to be referred to by name. Note the second select is again automatically named by it's tag.
select Age::yeardiff(a->date_of_birth, today), a->full_name from ...
This part of the query declares all iterator variables and thereby defines the source of the data. Acedb stores all data in objects that belong to classes. The example query declares the iterator a to range over the objects in class Author. The preferred AQL notation for a class declaration is
although
are valid variations that are borrowed from OQL.
Class declarations will cause the query to fill the iterator variable a with values from the Author class. Other iterators can be defined based on previously defined iterators:-
... from a in class author, p in a->Paper ...
There are many cases where a specific object is wanted and we want to refer to that object directly without searching through the entire class. The declaration:-
... from s in object("Sequence","F15H10")->Subsequence ...
finds the object F15H10 in class Sequence directly and then declares the iterator variable s to loop over the objects that are listed under the Subsequence tag in F15H10.
Variables can also be declared to be based on the result of any expression (e.g. the table-function count):-
... from num in count(a->paper) ...
This notation can be very useful if the value for num needs to be used in the select-clause as well as in a comparison in the where-clause. Just referring to the variable num saves having to calculate the count-function twice.
Note that the from-where-clause is optional. It is not needed if the select fields don't refer to any iterators that require a declaration (in a from-clause), i.e. where there are no iterators in the query or where any iterators that are present have already been declared in an enclosing scope, which can happen inside a nested query.
The WHERE clause is a boolean condition which can be attached to any declaration to act as a filter.
In order to evaluate the where-clause in the example query, the Text-data stored under the Address tag in every Author object is compared with "England". The range of Object-values for the iterator a in the from-clause will therefore be restricted to those objects where the comparison is TRUE.
A query can have multiple where-clauses and in queries with multiple from-clause declarations
it is useful to apply the boolean condition as early as possible. The from-clauses are evaluated
in turn and a FALSE where-clause will prevent any subsequent declarations to be evaluated
in any cycle of the execution loop. Clever where-clauses that filter the declarations
early on can therefore increase the execution speed of a query.
For example the following query finds the papers of authors who
live in England. In the example, the second declaration which fetches
the authors papers doesn't have to be execuetd for any author where
the where-clause on the first declaration (test on english address)
has already failed.
select a, b from a in class Author where a->Address = "England", b in a->Paper
The database schema, known in acedb as the models (stored in wspec/models.wrm), specifies a type for each value that is stored. An iterator in AQL will assume that type when it binds to the value in the database.
Values can also be quoted as literals in an AQL query, e.g. in comparisons with iterator values. Every expression in an AQL query (e.g. in a select-field) that evaluates correctly will produce a result of a specific type.
These are the legal value types as used in the acedb database schema and examples of how they might be quoted in an AQL query.
Int | 1024 -12 |
Float | 3.1415 -5e3 4.2E-10 .2 |
Text | "England" "Durbin RM" |
DateType | `1998-08-04_10:21:46 `1989-11-09 |
To quote an object one has to use the object constructor to define an iterator, whose value will be that object. If the iterator is printed to the result table, the object name will be shown as a text-string.
The AQL syntax requires DateType literals to be prefixed by a ` character. A DateType literal may specify any level of detail from just the year down to the seconds.
These are valid DateType literals:-
`1998-08-04_10:27:32 `1998-08-04_10:27 `1998-08-04_10 `1998-08-04 `1998-08 `1998
The literals `1998-08-04_ or `1998-55 would be rejected.
The reserved keywords today and now are used to refer to the current date/time and current date respectively. For example at this moment these keywords replace the following date/time quotations:-
today | ---> | 1999-12-06_11:13:20 |
now | ---> | 1999-12-06 |
The definition of Null values in AQL is borrowed from Computer Science. The possible values for any type include the valid range of the type and the value NULL. Values usually become NULL if an iterator couldn't be evaluated, e.g. a tag is not present in an object. Rather than reporting 0 or an empty string - AQL will report NULL to indicate that no valid value could be found. This is because 0 and the empty string are valid values for Integer and String-types respectivly.
Expressions containing NULL values will fail and their result itself becomes NULL. This explains the
little quirk of boolean values, which can't only be TRUE or FALSE but also NULL.
Example :
select ->0, ->subsequence[1], abs (->subsequence[2] - ->subsequence[3]) > 1000 from class Sequence
The third column is a boolean value, but it can only be evaluated if the Subsequence tag and its values exist for
a given object. The result will be TRUE or FALSE based on the actual numbers, but NULL of the tag didn't exist and the
expression couldn't be evaluated.
Note that in a boolean context (e.g. comparison, boolean operator) Integer values are converted to booleans - zero is FALSE and non-zero is TRUE.
The most important part of any useful query is the ability to bind a variable to a database value. Iterator variables serve this purpose. Iterators are declared in from clauses as ranging over a set of values. Object-valued iterators can then be dereferenced using the -> operator to extract data from inside the objects.
AQL maintains the notion of a scope for each iterator within each select-from-where expression. Any identifier can only be declared once within a single scope. An inner nested query can redeclare the identifier name, shadowing the outer declaration. The identifier is out of scope beyond the select-from-where expression in which it was declared.
There are two different operators that can be used to extract data from inside an object. The most commonly used is the arrow operator ->, which will dereference an object valued expression, looking inside that object starting at its root (remember that acedb objects are trees). We have seen many examples of this, such as a->Email. The other data access operator is a pair of square brackets [ ], which are used to look further within the same object. This is most often seen in compound constructions, such as a->Map[Position] which extracts the Position that follows the Map entry inside the a object. i.e. this expects the a object to contain data looking something like
a = Locus unc-3 Map "X" Position 18.5633In contrast, a->Map->Position would give the Position in the Map object, i.e. the Position field within object "X" in the above example.
Either type of operator can use a tag name, as above, or a (non-negative) integer. If a
tag name is used, it is interpreted according to the schema (model) of the relevant acedb
class. If a number is used, it is interpreted as a number of positions rightwards in the
tree representation of the object. In practice, tags are normally used with the arrow
operator, and numbers with the brackets operator, because often the first action is to
extract a field by its tag name, and then further information may be stored in subsequent
fields to the right of it in the acedb model, which can be accessed by [1], [2]
etc. e.g. select a->Subsequence, a->Subsequence[1], a->Subsequence[2]
will extract the names, start and end points of subsequences according to the standard
acedb Sequence class model.
The iterator identifier can be omitted for one declaration only, so many simple queries can be abbreviated like:-
select ->Paper from class Author where ->Address[Mail] = "England"
The default iterator that is declared by a shorthand declaration has no name, therefore we can't refer to it by name. We can however refer to using ->0 or [0]. Both mean exactly the same - the object itself (in this case the default iterator).
In a simple select clause which reports only the values of the iterator itself, the keyword all can be used to represent the unidentified iterator. In this case no other select-field are allowed, but the keyword from also becomes optional:-
To list all sequence object containing the DNA-tag, the query shortens drastically to:-
select all class Sequence where exists_tag ->DNA
This query is identical to
select all from class Sequence where exists_tag ->DNA
select ->0 from class Sequence where exists ->DNA[0]
select [0] from class Sequence where ->DNA[0]
select s from s in class Sequence where exists_tag s->DNA
Multiple table-expressions within one query are concatenated by the semi-colon ;. AQL always returns the result of the last table-expression of a query.
Example:-
@AH6subseq := select seq::x, x[1], x[2]
from x in object("Sequence","AH6")->Subsequence ;
select r:seq, (r:3 - r:2)
from r in @AH6subseq where r:3 > r:2
The table called @AH6subseq is filled with the results of a select table-expression. This table contains three column, while the first column has explicitly been renamed to 'seq' for clarification.
The from-clause of the second table-expression defines a row-variable iterator r to loop over rows in the table @AH6subseq. In every evaluation cycle the iterator r will point to the next row in that table.
A field in a row is referred to by the row-variable name followed by a colon : followed by either the column name or the column number.
The select-clause then generates a result table with 2 columns by extracting the 'seq' field from each row and the result of subtracting the second column (r:2) from the third column (r:3).
In the same way fields are specified in row-variables, they can be used to extract a whole column from a table variable. This is another re-use of the same table definition:-
@AH6subseq := select seq::x, x[1], x[2]
from x in object("Sequence","AH6")->Subsequence ;
select s, s->source
from s in @AH6subseq:seq
The from-clause of the second expression declares a table based iterator s that will take values (of type Object) from the 'seq' column of the @AH6subseq table. This column contains the objects that follow Subsequence in AH6, and it can therefore be dereferenced in the second select-clause expression.
The default iterator from the select all construct can also be used with table variables. Based on the definition of @AH6subseq from above, these two queries are equivalent, but the second query removes the need for the declaration of a row-variable for each column:-
select def:1, def:2, def:3 from def in @AH6subseq
select all @AH6subseq
The default iterator that is substituted into the select-clause is not visible in the select all construct, but it can be referenced in the same way as normal default iterators.
select all @AH6subseq where :3 > :2
lists all rows in the table, where the the third field is greater than the second field. Columns can be referenced by name (see field naming) or by number as usual.
Just as the results table-expressions can be assigned to variables,
the results of scalar expressions can be assigned to scalar variables,
i.e. single-valued objects. The variable name of a scalar variable
starts with a $ followed by an identifier (like an
ordinary variable in perl).
It can be of any scalar type (Integer, Float, Text, Date) and will
assume the value and type of the result of evaluating the given expression.
Example:-
$x := avg(select x2 - x1
from s in object("Sequence","AH6")->Subsequence,
x1 in a[1], x2 in a[2]
where x2 > x1) ;
select s, x2-x1
from s in object("Sequence","AH6")->Subsequence,
x1 in s[1], x2 in s[2]
where x2 - x1 > $x
The first part of this query is a table-expression to find all positive sizes of subsequences of AH6. Their average if then assigned to the scalar variable $x. The second part then lists those subsequences with sizes that are bigger than average. It thereby re-uses the single value for the average computed in part I.
A scalar can be assigned to the result of any valid expression, e.g. table-functions, date-functions or arithmetic expressions.
The result of a singular scalar variable assign operation will be a single column table with one row to contain the expression value. Therefore
$x := 3.1415 * 3
is perfectly valid as an AQL query, although slightly inappropriate for a query language to answer.
Comparisons are boolean expressions, which means that the result is either TRUE or FALSE. Iterator variables can be compared to other iterators or to value literals or vice versa. It is important to understand the notion of value types, as incompatibilities lead to an error when the query is processed.
In general only variables or literals of equal types can be compared to each other. For simplicity however, a few automatic conversions are performed that make the language less fussy:-
During evaluation a iterator will assume the value-types of the database-field that they are bound to. Therefore in most cases the query has to be run against the database to perform all the type-checking.
AQL will allow the following comparison operators with their usual meaning:-
= != > < >= <= like
Note : Objects and Tags, when compared to each other can only be tested for equality or inequality (i.e. = and != respectively).
When Texts are compared to Objects/Tags or each other, all operators are valid. The like comparator performs case-insensitive wildcard matching on Text values.
"Cambridge, England, UK" like "*england*"
--> TRUE
Possible wildcards are the usual * and ?, where * matches any number of characters and ? matches any one character.
Apart from the like operator, all others work on Int, Float (as usual) and DateType (see next section) values.
It is easy to compare two DateType values, if they both specify the same level of detail. In this case the concept of later/earlier is unambiguous.
1996-03 < 1997-04 --> TRUE 1998-06-07 > 1998-06-12 --> FALSE 1998-06-07 = 1998-06-07 --> TRUE
If the two dates in question however, vary in the level of detail given,
the outcome of the comparison expression becomes more difficult to understand.
A comparison will only become TRUE, if the levels of detail given in the
two dates allows an unambiguous decision. The expression
1998-06 < 1998-07-09_09:51:23 is TRUE,
which decided at month-level.
By the same rational 1990 = 1990-05-02 is TRUE, as both dates are equal in as much information as is given in either date-value.
An equality comparison asks if the lesser detailed date is completely contained within the other, and the above comparison evaluates TRUE, because May 2nd 1990 is in the year 1990.
A comparison involving greater-than or less-than is based on
the amount of detail, that the decision can be based on:-
1998-07 < 1998-07-09 is FALSE,
because one date gives a specific day in July 1998, but as the
other doesn't specify the day, so we can't decide whether it is earlier
and the comparison evaluates FALSE.
To demonstrate this behavior, imagine a movie database, that contains movies titles and the date on which they were released. The movie "City Hall" was released in the USA on 1996-02-16.
select all class Movie where ->Released < `1996-02
select all class Movie where ->Released > `1996-02
will both EXclude the movie 'City Hall', while
select all class Movie where ->Released <= `1996-02
select all class Movie where ->Released >= `1996-02
select all class Movie where ->Released < `1996-02-17
select all class Movie where ->Released > `1996-02-15
select all class Movie where ->Released <= `1996-02-16
select all class Movie where ->Released >= `1996-02-16
will INclude 'City Hall'.
In addition to boolean comparisons, there are other boolean operators that can be evaluated in the where-clause.
The following operators are supported for boolean expressions:-
operator | boolean value | |
---|---|---|
exists obj->tag | - | TRUE if the tag has a value (if obj->tag[1] is non-NULL) |
exists_tag obj->tag | - | TRUE if the tag is present (if obj->tag[0] is non-NULL) |
not expr | - | TRUE if FALSE, FALSE if TRUE |
expr and expr | - | TRUE if both are TRUE |
expr or expr | - | TRUE if one is TRUE |
expr xor expr | - | TRUE if one or the other is TRUE, but not both |
All standard arithmetic expressions are supported, both in the select-clause to produce columns of calculated values and in the where-clause to compare expression results.
The following operators are supported for Integer and Floating point values types:-
operator | expression value | |
---|---|---|
- expr | - | negative value |
abs ( expr ) | - | absolute value |
mod ( expr, expr ) | - | modulus (positive remainder) |
expr + expr | - | Addition |
expr - expr | - | Subtraction |
expr * expr | - | Multiplication |
expr / expr | - | Division |
An expression that consist of Integers only will have an Integer result, but in expressions that contain Float values, all Integers are promoted to Float. An error is given if any side of an expression is not a numeric value type.
As well as functions on individual expression value, a range of
functions are provided to calculate aggregate values over
lists of values. The table functions will only consider non-NULL values from the list.
Such a list is a single column in a table. Table functions therefore work on tables
with only one column, or on table expressions
where the selected column is explicitly specified.
In order to construct a table for the table function to work on, we have had to introduce another, internal select statement. These can either be valid on their own, or be dependent on variables defined in an outer query. The argument for a table function can also be a table variable, previously assigned.
Table function | return value | |
---|---|---|
count | - | number of values |
sum | - | sum of (Int,Float)values |
avg | - | average over (Int,Float) values |
min | - | smallest/earliest (Int,Float,DateType) value |
max | - | greatest/latest (Int,Float,DateType) value |
first | - | first value in the list |
last | - | last value in the list |
These 2 examples demonstrate the use of table functions:-
select a, a->paper from a in class author
where count(select a->Paper) > 5
$x := sum(select b->price from b in class Books)
The query :-
@AH6subseq := select seq::x, size::(x[2]-x[1])
from object("Sequence","AH6")->Subsequence as x ;
select min @AH6subseq:size
is equivalent to :
select min (select seq::x, size::(x[2]-x[1]))
from object("Sequence","AH6")->Subsequence as x
Objects and certain value types have builtin methods. Here is an example:-
select a->Full_name from a in class author where a->Full_Name.length > 10
method name | locator type | return value |
---|---|---|
name | Object | Text-value for the object's name |
class | Object | Text-value for the object's class |
length | Text | Int-value for length of the Text-string |
create_session | Object | UserSession Object of the session in which the object was first saved |
create_time | Object | DateType-value when the object was first created |
update_session | Object | UserSession Object of the session in which the object was last updated |
update_time | Object | DateType-value when the object was last updated |
node_session | Tag | UserSession Object of the session when the tag was last added/updated |
node_time | Tag | DateType-value when the tag was last added/updated |
Note : Objects are always shown by their name in the result table, which basically implies a .name method for each Object-iterator in the select-clause. It is therefor not needed explicitly.
The timestamp methods are a powerful tool to track changes in the database. The underlying mechanism has been added to the Acedb-lernel at version 4.8c. All changes to the database have to be made with database code Version 4.8c or greater for the timestamping to work correctly.
The objects themselves and each tag carry timestamps. A timestamp can be returned either as a
DateType-value or as a UserSession Object. The date values can be used with
the date functions like yeardiff
, daydiff
.
The ?UserSession
object belongs to a built-in system class and has the following
model :-
?UserSession Session UNIQUE ?Session Start UNIQUE DateType Finish UNIQUE DateType User UNIQUE Text
The two DateType-values Start
and Finish
in this object
denote the time from gaining write-access and saving the changes in any
acedb-session. Those times are always in the most complete date/time format
YYYY-MM-DD_HH:MM:SS.
The tag Session
links to another object of the system class
?Session
whcih has the following built-in system model:-
?Session Session UNIQUE Int Permanent_session // If set, prevents automatic destruction Date UNIQUE Text User UNIQUE Text Session_Title UNIQUE Text CodeRelease UNIQUE Int UNIQUE Int DataRelease UNIQUE Int UNIQUE Int Created_from UNIQUE ?Session Up_linked_to UNIQUE ?Session Destroyed_by_session UNIQUE Int GlobalLex UNIQUE Int SessionLex UNIQUE Int UNIQUE Int UNIQUE Int UNIQUE Int VocLex UNIQUE Int UNIQUE Int UNIQUE Int UNIQUE Int GlobalBat UNIQUE Int UNIQUE Int // address # blocks used BatPlus UNIQUE ?Bat UNIQUE Int // # block set BatMinus UNIQUE ?Bat UNIQUE Int // # blocks freed IndexVersion UNIQUE Int // automatic indexing
Dates in acedb are extremely flexible. Any level of detail (always starting with a year) can be specified. So `1998-08-12 is not automatically treated as `1998-08-12_12:07:36, it just says that the time of day on the 12th of August is not known or not important.
This leveling, however imposes the need for functionality to access specific parts of the Date/Time structure. Section 4.1 showed how tricky it can be to determine whether a date is earlier/later than another. Using comparison operators it is decided upon the deepest detail level that is shared by both dates.
In order to calculate time spans or make comparative decisions based on specific Date/Time elements a set of functions is provided. Both arguments are expressions, that will evaluate to a DateType-value. The result value is always of type integer; the result is positive, if the first argument is earlier than the second one.
select a, Age::yeardiff(a->Date_of_birth, today) from a in class Author
Note that the evaluation fails, if either date doesn't specify the requested level of detail, no value is returned in this case. e.g. monthdiff(`1997-10-12, `1998) --> NULL
The following date functions are provided:-
Date function |
---|
yeardiff |
monthdiff |
daydiff |
hourdiff |
mindiff |
secdiff |
The keyword order can be specified after any select-from-where table expression. It can have an optional parameter asc or desc. The default is ascending order. If the order keyword is used on its own like that the column sorting precedence is left to right.
With the addition of the by keyword one can specify the precise column precedence of the table sorting. The arguments of by is a list of sort columns, specified by a colon ':', followed by the column number or name.
This example uses the field naming operator assign a name to the second select-field expression, which is then used by the order by-clause to refer to that column:-
select s, size::s[2]-s[1]
from s in object("Sequence","LINK_AH10")->Subsequence
order by :size
If more than one sort column is specified (separated by a comma), the left-most is the most significant - equal values in that column will be arranged according to the next sort criterion etc.
Additionally every sort criterion can have an optional parameter asc or desc to specify sorting in ascending or descending order.
This is an example using a fictional personell database:-
select s->First_Name, s->Last_Name
from s in class Staff
order by :2 asc, :1 desc
might produce the following:-
Walter Adams Peter Adams John Adams Nigel Clark Jim Clark John Cleese ...
Any two tables (either select-from-where expressions or table variables) can be combined using standard set operators.
The result is again a table with all duplicates removed (where a row occurs in both tables).
Both tables have to be precisely column-compatible - they have to have the same number of columns and the value-types of the columns must match. The table operation is performed after both tables have been evaluated individually, which can take quite long sometimes.
Table operator | result | |
---|---|---|
table union table | - | all rows from both tables |
table diff table | - | rows from the first table except the ones that are also in the second table, a.k.a. except |
table intersect table | - | rows that are in one table but not the other |
It is possible for the program that uses AQL functionality to provide
a context for a query. The context is a set of predefined variables,
whose value is readily available for any subsequent query.
Note that this is an implementation dependent feature of AQL.
acedb> find author
// Found 5178 objects in this class
// 5178 Active Objects
acedb> aql select :1 from @active where :1->Mail = "U.K."
"Berks M"
"Sulston JE"
...
"Wolstenholme AJ"
"Yeung M"
// 33 Active Objects
acedb>