AQL Mark 1 - Walk through

AQL is a proposed new query language for acedb. It borrows syntax and ideas from OQL, the ODMG's proposed query language for object-oriented databases (which is supported by O2), Lorel, a language for querying semi-structured data developed at Stanford, and Boulder, a value-attribute path-based access system developed by Lincoln Stein for the Whitehead Genome Center.

This document provides a walk through of the proposed features for AQL, presented informally and illustrated by examples. There is also a syntax definition document, and an introductory document giving some history, motivation etc.

Example 1: Authors who live in England

	select a 
	from a in class Author
	where a->Address = "England"

The basic structure shown here is fairly standard, with a select clause specifying the output field(s), a from clause giving declarations of the range of variables, and a where clause giving a boolean condition on expressions of those variables. We use the arrow notation to access information inside objects.

The behaviour of such a query is:

loop over declarations nested inside each other
test the condition
if TRUE then evaluate the result field(s)
when done remove duplicates, i.e. make the resulting list/table a set

Note that most acedb properties can be multi-valued, so that expressions like a->Address = "England" in the above contain implicit declarations that will range over all the Address strings. We are assuming in Example 1 that a person's country will be stored as one of their address lines. When this line matches, the condition is satisfied and a result entry is generated, even if other Address lines do not match. Note that we always produce a set rather than a bag (equivalent to an implicit distinct operator for SQL or OQL). Sets are so pervasively implicit in acedb that anything else makes seems meaningless.

Example 1 generated a list of objects. With the same construction we can generate a table containing the Email address of each author:

Example 2: Authors with their email addresses

	select a, a->Email
	from a in class Author

We may get multiple result lines per author if they have more than one email address. If the author does not have an email address, this query will still get a result line with a null value in the second field. To get entries only for people with email we would have to give an explicit condition: where exists a->Email.

Abbreviations

The from clause is not needed if the results fields required are exactly those specified in the declarations. We can then write simply (select a in class Author where a->Address = "England").

There can be one default variable which can be thought of as having an empty name, allowing us to write (select class Author where ->Address = "England").

Expansions

Result fields of a table can be named, e.g. (select author:a, email:a->Email in class Author).

We would like to be able to specify (optional) types for declarations - not implemented yet.

Alternatives

In place of <var> in <set>, one can write <set> as <var>, or just <set> <var>.

Unexpected properties

Because we currently don't have an optimiser, we allow intercalation of from and where clauses. In fact we like it, and don't really see why we should change. If we set/build an optimiser, it can easily rearrange the order, or use this information only as a hint.

Example 3: English authors with their publications in Nature

	select a,b 
	from a in class Author where a->Address = "England", 
	     b in a->Paper where b = "Nature"

In the previous example, we could not write select a,a->Paper … where a->Address = "England" and a->Paper = "Nature", because the two different a->Paper expressions are not bound, so we would get all the papers in any journal written by those English authors who have published in Nature. This is not what typically happens with such a query, e.g. in SQL or OQL. Maybe we should do the necessary lexical analysis to bind these, but then if you want the alternative interpretation you have to do something weird. Perhaps just:

	select a,b->Paper
	from a in class Author where a->Paper = "Nature"
	     b in a

More complex queries

Below we give a series of queries illustrating further features of the language.

Example 4: Authors who have published with Richard Durbin

	select a->Paper->Author from a in object("Author","Durbin RM") 
	or just	select object("Author","Durbin RM")->Paper->Author

Here we introduce the object()constructor, and show serial use of the object dereferencing operator.

Example 5: Objects mapped on chromosome III with their map position

	select g, gm.Position
	from g in object("Map","III")->Contains[2], gm in g->Map
	where gm = "III"

There are number of points here. The construction ->Contains[2] will find the items that are 2 positions away from the tag Contains , i.e. its grandchildren, rather than its children. This is a common construction in acedb: it allows objects from multiple arbitrary classes to be collected together while maintaining strong typing. Next note that we must require that gm = "III" because g may have positions on multiple maps. Finally, note that we have written gm.Position in the select clause, because we do not want to dereference gm as an object pointer; instead we want to look at the attribute following gm in the g object.

N.B. We would like to be able to write select g, g->Map."III".Position here.

Example 6: Subsequences of sequence AH6, with their start and end points, saved as a table @AH6subseq.

	@AH6subseq := select x,x[1],x[2] 
		      from x in object("Sequence","AH6")->Subsequence

We can save query results for temporary use later in a query sequence, and also long term in the database (final syntax for this to be decided). Note the use of x[1] and x[2] for the two fields following x, which happen to be single-valued in the model.

Example 7: Authors with more than 5 papers

select a in class Author where count(select a->Paper) > 5

Here we see a set operator count() applied to a recursively defined subquery. Note that the subquery has an unbound variable, a, making it a parameterized query. The parameter is bound to the value in the main query. We can use this mechanism to create parameterized queries that can have parameters externally provided via the query package interface.

Example 8: AH6 subsequences sorted by size

	(select r[1],size:r[3]-r[2] from r in @AH6subseq where [3]>[2]) 
	union (select [1],size:[2]-[3] from @AH6subseq where [2]>[3]) 
	order_by .size

Quite a lot in this one! First, we have set operations union, intersection, diff available on tables. Next, we can assign variables such as r to range over tables (previously stored as here, or explicitly defined). We dereference these variables by field number using the [1] notation, or by field name using the .size notation. Finally there is an order_by operator on tables, that takes as arguments a list of field specifications (with qualifiers for ascending or descending).