AQL is a proposed new query language for acedb. It borrows syntax and ideas from OQL, the ODMG's proposed query language for object-oriented databases (which is supported by O2), Lorel, a language for querying semi-structured data developed at Stanford, and Boulder, a value-attribute path-based access system developed by Lincoln Stein for the Whitehead Genome Center.
This document provides a walk through of the proposed features for AQL, presented informally and illustrated by examples. There is also a syntax definition document, and an introductory document giving some history, motivation etc.
Example 1: Authors who live in England
select
a
from a in class Author
where a->Address = "England"
The basic structure shown here is fairly standard, with a select
clause specifying the output field(s), a from
clause giving declarations of the range of variables, and a where
clause giving a boolean condition on expressions of those variables. We use the arrow notation to access information inside objects.
The behaviour of such a query is:
Note that most acedb properties can be multi-valued, so that expressions like a->Address = "England"
in the above contain implicit declarations that will range over all the Address
strings. We are assuming in Example 1 that a person's country will be stored as one of their address lines. When this line matches, the condition is satisfied and a result entry is generated, even if other Address lines do not match. Note that we always produce a set rather than a bag (equivalent to an implicit distinct
operator for SQL or OQL). Sets are so pervasively implicit in acedb that anything else makes seems meaningless.
Example 1 generated a list of objects. With the same construction we can generate a table containing the Email address of each author:
Example 2:
Authors with their email addresses select
a, a->Email
from a in class Author
We may get multiple result lines per author if they have more than one email address. If the author does not have an email address, this query will still get a result line with a null value in the second field. To get entries only for people with email we would have to give an explicit condition: where
exists a->Email.
Abbreviations
from
clause is not needed if the results fields required are exactly those specified in the declarations. We can then write simply (select a in class Author where a->Address = "England").
(select class Author where ->Address = "England").
Expansions
(select author:a, email:a->Email in class Author).
Alternatives
<var> in <set>,
one can write <set> as <var>,
or just <set> <var>.
Unexpected properties
from
and where
clauses. In fact we like it, and don't really see why we should change. If we set/build an optimiser, it can easily rearrange the order, or use this information only as a hint.Example 3:
English authors with their publications in Nature select
a,b
from a in class Author where a->Address = "England",
b in a->Paper where b = "Nature"
select
a,a->Paper … where a->Address = "England" and a->Paper = "Nature", because the two different a->Paper
expressions are not bound, so we would get all the papers in any journal written by those English authors who have published in Nature. This is not what typically happens with such a query, e.g. in SQL or OQL. Maybe we should do the necessary lexical analysis to bind these, but then if you want the alternative interpretation you have to do something weird. Perhaps just: select
a,b->Paper
from a in class Author where a->Paper = "Nature"
b in a
More complex queries
Below we give a series of queries illustrating further features of the language.
Example 4:
Authors who have published with Richard Durbin select
a->Paper->Author from a in object("Author","Durbin RM")
or just select
object("Author","Durbin RM")->Paper->Author
Here we introduce the object()
constructor, and show serial use of the object dereferencing operator.
Example 5:
Objects mapped on chromosome III with their map position select
g, gm.Position
from g in object("Map","III")->Contains[2], gm in g->Map
where gm = "III"
There are number of points here. The construction ->Contains[2]
will find the items that are 2 positions away from the tag Contains
, i.e. its grandchildren, rather than its children. This is a common construction in acedb: it allows objects from multiple arbitrary classes to be collected together while maintaining strong typing. Next note that we must require that gm = "III"
because g
may have positions on multiple maps. Finally, note that we have written gm.Position
in the select
clause, because we do not want to dereference gm
as an object pointer; instead we want to look at the attribute following gm
in the g
object.
N.B. We would like to be able to write select
g, g->Map."III".Position here.
Example 6:
Subsequences of sequence AH6, with their start and end points, saved as a table @AH6subseq. @AH6subseq := select x,x[1],x[2]
from x in object("Sequence","AH6")->Subsequence
We can save query results for temporary use later in a query sequence, and also long term in the database (final syntax for this to be decided). Note the use of x[1]
and x[2]
for the two fields following x
, which happen to be single-valued in the model.
Example 7:
Authors with more than 5 papersselect
a in class Author where count(select a->Paper) > 5
Here we see a set operator count()
applied to a recursively defined subquery. Note that the subquery has an unbound variable, a
, making it a parameterized query. The parameter is bound to the value in the main query. We can use this mechanism to create parameterized queries that can have parameters externally provided via the query package interface.
Example 8:
AH6 subsequences sorted by size (select
r[1],size:r[3]-r[2] from r in @AH6subseq where [3]>[2])
union (select [1],size:[2]-[3] from @AH6subseq where [2]>[3])
order_by .size
Quite a lot in this one! First, we have set operations union, intersection, diff
available on tables. Next, we can assign variables such as r
to range over tables (previously stored as here, or explicitly defined). We dereference these variables by field number using the [1]
notation, or by field name using the .size
notation. Finally there is an order_by
operator on tables, that takes as arguments a list of field specifications (with qualifiers for ascending or descending).