AQL examples - C.elegans Genome Database

These queries work on the full-sized C. elegans Genome Database and most will work on the sequence-less smaller version, the mini version used in acedb courses and on other genomic databases, with only minor alterations.

The queries were originally designed to utilise all the operators or keywords provided in the old query lvanguage. The old language is still functional, but the queries have been converted to AQL. These examples make use of recent AQL features which are available in version 4.8b and above.
Some queries have been formatted over multiple lines for readability. When typing multi-line queries in the standard acedb command-shell tace, the backslash character has to be appended to the end of each line, which will allow one command to be continued over more than one input line.
Multiline queries can be typed in the AQL test-program taql. The query is executed after entering a blank line.

Only a small subset of classes are used in order to simplify matters. It is essential to know the class model of any class before writing a query.
On an active database you can select a list of all the Class models, then open the relevant one. Or you can open the models file using a text editor. This file is in the wspec directory as models.wrm.

There are several different ways of writing each query, only some of which are ilustrated below.

Questions 1 to 5 give answers in the form of tables, at their simplest 1 column tables, which therefore resemble a list. Question 6 requires a numerical answer having counted the number of objects which satisfy a condition. In question 7, the query jumps from one class to another and back again For simplicity this is hown in two stages.

Question 8 demonstrates the power of AQL to move along a horizontal line in a model from primary tag to secondary or along several consecutive data-fields.

Listing objects using conditions on tags and object names

1a. Find all Loci starting with the letters dp: select l from l in class locus where l.name like "dp*"; select all from class locus where .name like "dp*"; select all from class locus."dp*" This notation is reminiscent of the Find Locus dp* notation of the old query-language. It may run marginally quicker than the first two queries.; select l from l in class locus."dp*" The query, but using a named iterator "l".

1b. Find all Authors starting with the letters Ba: select all class author where .name like "ba*" Same as 1a. All the variations also apply.

2a. Find all Authors with email addresses: select all class Author where exists_tag ->E_mail The exists_tag keyword is a boolean expression, which causes the where-clause to succeed or fail depending on whether the iterator succeeds to find the tag.
The expression ->E_mail would normally be expanded to ->E_mail[1] (i.e. return the first data-field to the right of the tag), but exists_tag will inhibit this particular behaviour for the expression that follows.; select all from class Author where exists_tag ->E_mail; select a from a in class Author where exists a->E_mail[0] When using the keyword exists the normal behaviour on tag-expressions applies. In order to refer to the existence of the tag E_mail, we specify the 0'th position from the tag, i.e. the tag itself. This is the long-hand version of the first query, which will internally be converted to this query anyway before execution. Both versions are therefore identical.

Use of logical operators : and or xor not

2b. Find all Authors whose name begins with C, and who have email addresses: select a from a in class Author where a.name like "C*" and exists_tag a->E_mail

2c. find all the authors whose object_name (surname) starts with 'a' and contains 'o'.: select all class Author where .name like "a*" and .name like "*o*"

3a. Find all Authors that have either an address OR a published paper.: select all class Author where exists_tag ->Address or exists_tag ->Paper

3b. Find all Authors starting with C, with an address OR a paper.: select all class Author where .name like "C*" and (exists_tag ->Address or exists_tag ->Paper)

3c. Find all Authors starting with C, with address AND a Paper: select all class Author where .name like "C*" and exists_tag ->Address and exists_tag ->Paper

3d.Find all Authors starting with c, with either an address or a Paper but NOT both.: select all class Author where .name like "C*" and (exists_tag ->Address xor exists_tag ->Paper)

3e. Find all Authors starting with c, with NO address: select all class Author where not exists_tag ->Address

Conditions on tags and their values

4a. Find authors who live in California OR Japan.: select all class Author where ->Mail like "*Calif*" or ->Mail like "*Japan"; select a, a->mail from a in class Author where a->Mail like "*Calif*" or a->Mail like "*Japan"

4b. Find authors who live in USA and work in an American University.: select all class Author where ->E_Mail like "*edu"; select a, a->E_mail from a in class Author where a->E_Mail like "*edu"; select a, a->E_mail from a in class Author where a->E_Mail like "*edu" order; select a, a->E_mail from a in class Author where a->E_Mail like "*edu" order by :1 asc, :2 asc; select a, a->E_mail from a in class Author where a->E_Mail like "*edu" order by :E_mail

5. Find the Authors who have more than 5 papers published.: select all class author where count (select ->Paper) > 5; select a, count (select a->Paper) from a in class author where count (select a->Paper) > 5; select a,c from a in class author, c in count (select a->Paper) where c > 5 Here the iterator c is declared upoin the value of the count expression. We can now refer to c in various places in the query without having to repeat the count function like in the previous query.

Crossing between classes

6a. Find papers published by Ian Hope.: select ->paper from class Author."hope*"; select a, a->paper from a in class Author where a.name like "*hope*" We find the author by matching every auhtor's name against the template "*hope*". The iterator a will therefore loop over all authors, just to find one single entry.; select hope->paper from hope in object("Author", "Hope IA") The object-constructor is perfect for situations where we know the exact class and object-name. Rather than the class-based declaration a in class Author which return multiple entries, the object constructor will only return one single entry and is very quick to execute. No wildcards are allowed in the class or object-name text.; select object("Author", "Hope IA")->paper The same query but shorter. The return value from the object-constructor can be dereferenced directly. and without an explicit from-clause, this query forms a complete table-expression.

6b. Find the co-authors of papers published by Ian Hope.: select a->paper->author from a in class Author where a.name like "*hope*" Any expression, such as a->paper which returns another object-value can be further dereferenced to any tag in the model of that object.; select ca from a in class Author where a.name like "*hope*", ca in a->paper->author; select ca from a in class Author."*hope*", ca in a->paper->author; select hope->Paper->Author from hope in object ("Author", "Hope IA") - fastest; select hope->paper, hope->paper->Author from hope in object ("Author", "Hope IA")

Moving along a line in a model

7. Find Sequences from DDBJ whose Accession number contains *269*

Looking at the relevant part of the Sequence model before starting this complex query :

?Sequence
  ..
	DB_info Database ?Database Text Text
  ..

where DB_info is a major tag similar to Address, Database is a sub-tag like Town and takes the name of the database, the text fields take the Identity name followed by the Accession number.

select s from s in class Sequence, 
              db in s->Database 
                 where db = "DDBJ" and 
		       db[2] like "*269*"

We have already learned that the default position of a tag-expression will refer to the first data-field along from that tag. s->Database is therefore automatically treated as s->Database[1] in the above declaration of db. The notation db[2] will now move two position further along from s->Database[1] to return the last Text-element in the model. Moving along the model is incremental, to the last Text-element is in fact s->Database[1][2] or can also be written as s->Database[3] directly.

select s, t from s  in class Sequence,
                 db in s->Database 
		       where db = "DDBJ",
                 t  in db[2] 
		       where t like "*269*"

Same query, but this time using two iterator declarations.

8a. Find genes on chromosome IV which have been sequenced.: select l from l in class locus where l->Map = "IV" and exists_tag l->Sequence; select l, m, s from l in class locus, m in l->Map where m = "IV", s in l->Sequence; select l, m, s from l in class locus, m in l->Map where m = "IV", s in l->Sequence where s The last where-clause is treating the iterator s as a boolean expression. s is declared on l->Sequence and the where-clause whill be TRUE if there is a data attached to that tag, and FALSE when there is no data.; select l, m, iss from l in class locus, m in l->Map where m = "IV", iss in exists_tag l->Sequence where iss In this query the variable iss is a boolean variable which is declared upon the result of the exists_tag function.

8b. Variations on 8a.: try another chromosome such as III, use OR, find those NOT sequenced, any other variations.

9. List objects mapped on chromosome III with their map position: select g, gm[Position] from g in object("Map","III")->Contains[2], gm in g->Map where gm = "III" The construction ->Contains[2] will find the items that are 2 positions away from the tag Contains , i.e. its grandchildren, rather than its children.
Next note that we must require that gm = "III" because g may have positions on multiple maps. Finally, note that we have written gm[Position]in the select-clause, because we do not want to dereference gm as an object pointer; instead we want to look at the attribute following gm in the g object. The positioning inside the same object is done here by name rather than by tag-position.

Queries on objects containing hash structures.

10. Find loci between positions 5 and 10 on chromosome IV

Hash structures are inlined model-definition with their own name-space. They are defined using the # (hash) character in the model.wrm file. The use of those hash structures is totally transparent in AQL, and when constructing a query it is best to think of the hash-model to be inlined at the given position.

Class Locus contains the following line where Map_position is a # (hash) structure :

?Locus  ...
	...
	Map ?Map XREF Locus #Map_position
	...

?Map_position UNIQUE  Position UNIQUE Float
                      Ends Left UNIQUE Float
                           Right UNIQUE Float

Loci have point positions (see Position) on the genetic maps.

select l, pos from l in class Locus, 
              map in l->Map where map = "IV",
              pos in map[Position] where pos > 5.0 and pos < 10.0

11a. Find Rearrangements with a right ends beyond position 5 on chromosome IV

Class Rearrangment has the following tags where Map_position is a hash structure

?Rearrangement ...
	...
                Map ?Map XREF Rearrangement #Map_position
	...

?Map_position UNIQUE  Position UNIQUE Float
                     Ends Left UNIQUE Float
                          Right UNIQUE Float

Rearrangements are deletions or duplications of a linear chunk of the chromosome and consequently have starting (Left) and finishing points (Right) on the map.

select r, right from r in class Rearrangement, 
            right in r->Map[Right] where right > 5.0

select r, right from r in class Rearrangement,
                     map in r->Map where map = "IV",
                     right in map[right] where right > 5.0

11b. Find Rearrangements with left ends before position 4 on chromosome IV: select r, left from r in class Rearrangement, map in r->Map where map = "IV", left in map[Left] where left < 4.0

11c. Find Rearrangements with left ends before position 4 on chromosome IV and right ends beyond position 5: select r, left, right from r in class Rearrangement, map in r->Map where map = "IV", left in map[Left] where left < 4.0, right in map[right] where right > 5.0

Using table functions

Table functions can be applied to the result of any table expression. Examples of table functions are count, sum, avg, first, last etc. A normal select-from-where query is such a table expression. A table function can only be used on tables with a single column are they have to be restricted to work only on one column of the given table.

12. How many predicted genes have matching cDNA.?

Note : in the worm genome project, although clone objects (cosmids and Yacs) have been fed into the sequencing reactions, the DNA sequence of an entire clone (e.g. AH6) is attached to a sequence object, of the same name (also AH6). After gene predictions have been made in these cosmid sequences, each coding sequence has a number such as AH6.1, AH6.2. In the database these are held as subsequences of the main AH6 sequence. This particular type of subsequence has a definition called Predicted_gene in the subclass.wrm file. It is more efficient to use subclasses in queries, wherever possible.

The first answer would take a very long time examining 140,000 sequence objects. The second uses the subclass and is faster.

select s from s in class sequence 
         where exists_tag s->CDS 
          and s.name like "*.*" 
          and exists_tag s->Matching_cDNA

select g from g in class predicted_gene 
               where exists_tag s->Matching_cDNA

The class Predicted_Gene is a subclass of Sequence and the interator g will loop over a much smaller set of objects that in the first query when 140,000 objects are inspected.

select count(select g from g in class predicted_gene 
                           where exists_tag s->Matching_cDNA)

By forming a query that reports only the result of the count-function we're left with a single number as a result.

13. How many predicted genes are equivalent to Mendelian loci.?: for explanation, see 12.; select g from g in class predicted_gene where exists g->Locus; select g, l from g in class predicted_gene, l in g->Locus where l; select count (select g from g in class predicted_gene where exists g->Locus)

14. How may Clones have been finished by the Sanger centre sequencing team ?.: see explanation under 12 for use of sequence class here.; $x := count(select s from s in class sequence where exists_tag s->Finished and s->From_Laboratory = "CB") The keyword count with the query as its argument isn't a full query in its own right. To make it a valid table expression, we assign the result of the expression to a scalar variable $x. The result of the query is therefore the value of this variable, and the result table will have one row with one column containing the single result value.; select s from s in object("sequence", "ah6") where exists_tag s->Finished and s->From_Laboratory = "CB"

15. How many contigs make up chromosome IV ?, or indeed any of the chromosomes.: select num from m in class Genetic_Map where m = "IV", num in count (select m->Contig); select m, num from m in class Genetic_Map, num in count (select m->Contig)

16. Which sequence objects have DNA ?: select s from s in class Sequence where exists_tag s->DNA By using the keyword exists_tag we don't have to open every sequence object to look at its DNA-tag. This tag is indexed and the lookups to the database-index are very quick. Compare this qith the execution time of this query :; select s from s in class Sequence where exists s->DNA This query will check the existsnce of a data-value for the DNA-tag. To find this out it will have to open every sequence object, which will take a very long time.

17. How many sequences have DNA longer than 300 base pairs.: select s from s in class genome_sequence where exists_tag ->DNA, length in s->DNA[2] where length > 300

18. How many cells have great grandchildren ?: select count (select c from c in class cell, d1 in c->daughter where d1, d2 in d1->daughter where d2, d3 in d2->daughter where d3); select count (select c from c in class cell where exists c->daughter->daughter->daughter)

Some queries which were designed to be performed with TableMaker.

When converted to AQL those queries are nothing special, and creating multi-columned tables of output is the strength of AQL

19. Make a table of the email addresses and laboratories of worm authors.: select a, addr, a->laboratory from a in class author where exists_tag a->E_mail, addr in a->E_Mail; select a, addr, lab from a in class author where exists_tag a->E_mail, lab in a->laboratory where lab, addr in a->E_Mail; select a, addr, lab from a in class author addr in a->E_mail where addr, lab in a->laboratory where lab; select a, a->E_mail, a->Laboratory from a in class author where exists_tag a->E_mail and exists_tag a->Laboratory

20. Make a table of Loci whose name begins with C, and the clones to which they have been mapped.: select l, c from l in class locus, c in l->positive_Clone where c.name like "c*"

21. Make a table of (Mendelian) genes on chromosome IV in the order: Note : Mendelian genes refers to those mapped by conventional crossing procedures. They should have associated map data and alleles (other than those created by Tc1 insertions).; select l, pos from l in class locus where exists_tag l->Allele, map in l->Map where map = "X", pos in map[Position] order by :pos

Gene expression at cellular level

Which genes are expressed in named cells: select l from l in class Locus, ex in l->Expr_pattern where exists ex->cell order

Which genes are expressed in neurons.: select c->Expr_pattern->locus from c in class Cell where c->Fate like "*neuron*" or c->Fate like "*Neuron*"

Which cells are known to have loci expressed in them.: select c, ex, l from c in class Cell, ex in c->Expr_pattern, l in ex->locus where l

Which loci are expressed in embryos ?: select l from l in class Locus, ex in l->Expr_pattern where ex->Life_stage like "*embryo*"; select l from l in class Locus, ex in l->Expr_pattern where ex->Life_stage like "*larv*" Finds the loci expressed in larvae.

Which loci are expressed in embryos but not in larvae ?: select l from l in class Locus, ex in l->Expr_pattern where ex->Life_stage like "*embryo*" diff select l from l in class Locus, ex in l->Expr_pattern where not ex->Life_stage like "*larv*" We have simply combined the sets of loci expressed in embryos and not expressed in larvae using the the table operator union.
Substitute the keyword union for minus or intersect to get different ways of combining the two sets of loci.; select l from l in class Locus, ex in l->Expr_pattern where ex->Life_stage like "*embryo*" and not ex->Life_stage like "*larv*" This query produces the same result, but is much quicker to write.

Which loci have no expression information ?: select l from l in class Locus where not exists l->expr_pattern

Queries designed by Fred Wobus and Sylvia Martinelli.
The Sanger Centre, 06/Dec/1999

acedb@sanger.ac.uk