AQL Mark 1 - Walk through

AQL is a proposed new query language for acedb. It borrows syntax and ideas from OQL, the ODMG's proposed query language for object-oriented databases (which is supported by O2), Lorel, a language for querying semi-structured data developed at Stanford, and Boulder, a value-attribute path-based access system developed by Lincoln Stein for the Whitehead Genome Center.

This document provides a walk through of the proposed features for AQL, presented informally and illustrated by examples. There is also a syntax definition document, and an introductory document giving some history, motivation etc.

Example 1: Authors who live in England

	select a 
	from a in class Author
	where a->Address = "England"

The basic structure shown here is fairly standard, with a select clause specifying the output field(s), a from clause giving declarations of the range of variables, and a where clause giving a boolean condition on expressions of those variables. We use the arrow notation to access information inside objects.

The behaviour of such a query is:

Note that most acedb properties can be multi-valued, so that expressions like a->Address = "England" in the above contain implicit declarations that will range over all the Address strings. We are assuming in Example 1 that a person's country will be stored as one of their address lines. When this line matches, the condition is satisfied and a result entry is generated, even if other Address lines do not match. Note that we always produce a set rather than a bag (equivalent to an implicit distinct operator for SQL or OQL). Sets are so pervasively implicit in acedb that anything else makes seems meaningless.

Example 1 generated a list of objects. With the same construction we can generate a table containing the Email address of each author:

Example 2: Authors with their email addresses

	select a, a->Email
	from a in class Author

We may get multiple result lines per author if they have more than one email address. If the author does not have an email address, this query will still get a result line with a null value in the second field. To get entries only for people with email we would have to give an explicit condition: where exists a->Email.

Abbreviations

Expansions

Alternatives

Unexpected properties

Example 3: English authors with their publications in Nature

	select a,b 
	from a in class Author where a->Address = "England", 
	     b in a->Paper where b = "Nature"
	select a,b->Paper
	from a in class Author where a->Paper = "Nature"
	     b in a

More complex queries

Below we give a series of queries illustrating further features of the language.

Example 4: Authors who have published with Richard Durbin

	select a->Paper->Author from a in object("Author","Durbin RM") 
	or just	select object("Author","Durbin RM")->Paper->Author

Here we introduce the object()constructor, and show serial use of the object dereferencing operator.

Example 5: Objects mapped on chromosome III with their map position

	select g, gm.Position
	from g in object("Map","III")->Contains[2], gm in g->Map
	where gm = "III"

There are number of points here. The construction ->Contains[2] will find the items that are 2 positions away from the tag Contains , i.e. its grandchildren, rather than its children. This is a common construction in acedb: it allows objects from multiple arbitrary classes to be collected together while maintaining strong typing. Next note that we must require that gm = "III" because g may have positions on multiple maps. Finally, note that we have written gm.Position in the select clause, because we do not want to dereference gm as an object pointer; instead we want to look at the attribute following gm in the g object.

N.B. We would like to be able to write select g, g->Map."III".Position here.

Example 6: Subsequences of sequence AH6, with their start and end points, saved as a table @AH6subseq.

	@AH6subseq := select x,x[1],x[2] 
		      from x in object("Sequence","AH6")->Subsequence

We can save query results for temporary use later in a query sequence, and also long term in the database (final syntax for this to be decided). Note the use of x[1] and x[2] for the two fields following x, which happen to be single-valued in the model.

Example 7: Authors with more than 5 papers

select a in class Author where count(select a->Paper) > 5

Here we see a set operator count() applied to a recursively defined subquery. Note that the subquery has an unbound variable, a, making it a parameterized query. The parameter is bound to the value in the main query. We can use this mechanism to create parameterized queries that can have parameters externally provided via the query package interface.

Example 8: AH6 subsequences sorted by size

	(select r[1],size:r[3]-r[2] from r in @AH6subseq where [3]>[2]) 
	union (select [1],size:[2]-[3] from @AH6subseq where [2]>[3]) 
	order_by .size

Quite a lot in this one! First, we have set operations union, intersection, diff available on tables. Next, we can assign variables such as r to range over tables (previously stored as here, or explicitly defined). We dereference these variables by field number using the [1] notation, or by field name using the .size notation. Finally there is an order_by operator on tables, that takes as arguments a list of field specifications (with qualifiers for ascending or descending).