These queries work on the full-sized C. elegans Genome Database and most will work on the sequence-less smaller version, the mini version used in acedb courses and on other genomic databases, with only minor alterations.
The queries were originally designed to utilise all the operators
or keywords provided in the old query lvanguage.
The old language is still functional, but the queries have been
converted to AQL. These examples make use of recent AQL features
which are available in version 4.8b and above.
Some queries have been formatted over multiple lines for readability.
When typing multi-line queries in the standard acedb command-shell tace,
the backslash character has to be appended to the end of each line,
which will allow one command to be continued over more than one input line.
Multiline queries can be typed in the AQL test-program taql.
The query is executed after entering a blank line.
Only a small subset of classes are used in order to simplify matters. It is
essential to know the class model of any class before writing a query.
On an active database you can select a list of all the Class models,
then open the relevant one. Or you can open the models file using
a text editor. This file is in the wspec directory as models.wrm.
There are several different ways of writing each query, only some of which are ilustrated below.
Questions 1 to 5 give answers in the form of tables, at their simplest 1 column tables, which therefore resemble a list. Question 6 requires a numerical answer having counted the number of objects which satisfy a condition. In question 7, the query jumps from one class to another and back again For simplicity this is hown in two stages.
Question 8 demonstrates the power of AQL to move along a horizontal line in a model from primary tag to secondary or along several consecutive data-fields.
select l from l in class locus where l.name like "dp*"
select all from class locus where .name like "dp*"
select all from class locus."dp*"
This notation is reminiscent of the Find Locus dp* notation
of the old query-language. It may run marginally quicker than
the first two queries.
select l from l in class locus."dp*"
The query, but using a named iterator "l".
select all class author where .name like "ba*"
Same as 1a. All the variations also apply.
select all class Author where exists_tag ->E_mail
The exists_tag keyword is a boolean expression, which
causes the where-clause to succeed or fail depending on
whether the iterator succeeds to find the tag.select all from class Author where exists_tag ->E_mail
select a from a in class Author where exists a->E_mail[0]
When using the keyword exists the normal behaviour on
tag-expressions applies. In order to refer to the existence
of the tag E_mail, we specify the 0'th position from the tag, i.e.
the tag itself. This is the long-hand version of the first query,
which will internally be converted to this query anyway before
execution. Both versions are therefore identical.
select a from a in class Author
where a.name like "C*" and exists_tag a->E_mail
select all class Author where .name like "a*" and .name like "*o*"
select all class Author
where exists_tag ->Address or exists_tag ->Paper
select all class Author where .name like "C*"
and
(exists_tag ->Address or exists_tag ->Paper)
select all class Author where .name like "C*" and
exists_tag ->Address and
exists_tag ->Paper
select all class Author where .name like "C*"
and
(exists_tag ->Address xor exists_tag ->Paper)
select all class Author where not exists_tag ->Address
select all class Author
where ->Mail like "*Calif*" or ->Mail like "*Japan"
select a, a->mail from a in class Author
where a->Mail like "*Calif*" or a->Mail like "*Japan"
select all class Author where ->E_Mail like "*edu"
select a, a->E_mail from a in class Author where a->E_Mail like "*edu"
select a, a->E_mail from a in class Author where a->E_Mail like "*edu" order
select a, a->E_mail from a in class Author
where a->E_Mail like "*edu" order by :1 asc, :2 asc
select a, a->E_mail from a in class Author
where a->E_Mail like "*edu" order by :E_mail
select all class author where count (select ->Paper) > 5
select a, count (select a->Paper) from a in class author
where count (select a->Paper) > 5
select a,c from a in class author, c in count (select a->Paper) where c > 5
Here the iterator c is declared upoin the value of the count expression.
We can now refer to c in various places in the query without having
to repeat the count function like in the previous query.
select ->paper from class Author."hope*"
select a, a->paper from a in class Author where a.name like "*hope*"
We find the author by matching every auhtor's name against the
template "*hope*". The iterator a will therefore loop over all
authors, just to find one single entry.
select hope->paper from hope in object("Author", "Hope IA")
The object-constructor is perfect for situations where we know the exact
class and object-name. Rather than the class-based declaration a in class Author
which return multiple entries, the object constructor will only return
one single entry and is very quick to execute. No wildcards are allowed
in the class or object-name text.
select object("Author", "Hope IA")->paper
The same query but shorter. The return value from the object-constructor
can be dereferenced directly. and without an explicit from-clause, this
query forms a complete table-expression.
select a->paper->author from a in class Author where a.name like "*hope*"
Any expression, such as a->paper which returns another object-value
can be further dereferenced to any tag in the model of that object.
select ca from a in class Author where a.name like "*hope*", ca in a->paper->author
select ca from a in class Author."*hope*", ca in a->paper->author
select hope->Paper->Author from hope in object ("Author", "Hope IA")
- fastestselect hope->paper, hope->paper->Author from hope in object ("Author", "Hope IA")
Looking at the relevant part of the Sequence model before starting this complex query :
?Sequence .. DB_info Database ?Database Text Text ..
where DB_info is a major tag similar to Address, Database is a sub-tag like Town and takes the name of the database, the text fields take the Identity name followed by the Accession number.
select s from s in class Sequence,
db in s->Database
where db = "DDBJ" and
db[2] like "*269*"
We have already learned that the default position of a tag-expression
will refer to the first data-field along from that tag. s->Database
is therefore automatically treated as s->Database[1] in the above
declaration of db. The notation db[2] will now move two
position further along from s->Database[1] to return the last
Text-element in the model. Moving along the model is incremental,
to the last Text-element is in fact s->Database[1][2]
or can also be written as s->Database[3] directly.
select s, t from s in class Sequence,
db in s->Database
where db = "DDBJ",
t in db[2]
where t like "*269*"
Same query, but this time using two iterator declarations.
select l from l in class locus
where l->Map = "IV" and exists_tag l->Sequence
select l, m, s from l in class locus,
m in l->Map where m = "IV",
s in l->Sequence
select l, m, s from l in class locus,
m in l->Map where m = "IV",
s in l->Sequence where s
The last where-clause is treating the iterator s as a boolean
expression. s is declared on l->Sequence and the where-clause
whill be TRUE if there is a data attached to that tag, and FALSE
when there is no data.select l, m, iss from l in class locus,
m in l->Map where m = "IV",
iss in exists_tag l->Sequence where iss
In this query the variable iss is a boolean variable which is
declared upon the result of the exists_tag function.try another chromosome such as III, use OR, find those NOT sequenced, any other variations.
select g, gm[Position]
from g in object("Map","III")->Contains[2], gm in g->Map
where gm = "III"
The construction ->Contains[2] will find the items that are
2 positions away from the tag Contains , i.e. its
grandchildren, rather than its children.
Hash structures are inlined model-definition with their own name-space. They are defined using the # (hash) character in the model.wrm file. The use of those hash structures is totally transparent in AQL, and when constructing a query it is best to think of the hash-model to be inlined at the given position.
Class Locus contains the following line where Map_position is a # (hash) structure :
?Locus ... ... Map ?Map XREF Locus #Map_position ... ?Map_position UNIQUE Position UNIQUE Float Ends Left UNIQUE Float Right UNIQUE Float
Loci have point positions (see Position) on the genetic maps.
select l, pos from l in class Locus,
map in l->Map where map = "IV",
pos in map[Position] where pos > 5.0 and pos < 10.0
Class Rearrangment has the following tags where Map_position is a hash structure
?Rearrangement ... ... Map ?Map XREF Rearrangement #Map_position ... ?Map_position UNIQUE Position UNIQUE Float Ends Left UNIQUE Float Right UNIQUE Float
Rearrangements are deletions or duplications of a linear chunk of the chromosome and consequently have starting (Left) and finishing points (Right) on the map.
select r, right from r in class Rearrangement,
right in r->Map[Right] where right > 5.0
select r, right from r in class Rearrangement,
map in r->Map where map = "IV",
right in map[right] where right > 5.0
select r, left from r in class Rearrangement,
map in r->Map where map = "IV",
left in map[Left] where left < 4.0
select r, left, right from r in class Rearrangement,
map in r->Map where map = "IV",
left in map[Left] where left < 4.0,
right in map[right] where right > 5.0
Table functions can be applied to the result of any table expression. Examples of table functions are count, sum, avg, first, last etc. A normal select-from-where query is such a table expression. A table function can only be used on tables with a single column are they have to be restricted to work only on one column of the given table.
Note : in the worm genome project, although clone objects (cosmids and Yacs) have been fed into the sequencing reactions, the DNA sequence of an entire clone (e.g. AH6) is attached to a sequence object, of the same name (also AH6). After gene predictions have been made in these cosmid sequences, each coding sequence has a number such as AH6.1, AH6.2. In the database these are held as subsequences of the main AH6 sequence. This particular type of subsequence has a definition called Predicted_gene in the subclass.wrm file. It is more efficient to use subclasses in queries, wherever possible.
The first answer would take a very long time examining 140,000 sequence objects. The second uses the subclass and is faster.
select s from s in class sequence
where exists_tag s->CDS
and s.name like "*.*"
and exists_tag s->Matching_cDNA
select g from g in class predicted_gene
where exists_tag s->Matching_cDNA
The class Predicted_Gene is a subclass of Sequence and the interator g
will loop over a much smaller set of objects that in the first query
when 140,000 objects are inspected.select count(select g from g in class predicted_gene
where exists_tag s->Matching_cDNA)
By forming a query that reports only the result of the count-function
we're left with a single number as a result.for explanation, see 12.
select g from g in class predicted_gene where exists g->Locus
select g, l from g in class predicted_gene, l in g->Locus where l
select count (select g from g in class predicted_gene
where exists g->Locus)
see explanation under 12 for use of sequence class here.
$x := count(select s from s in class sequence where exists_tag s->Finished and s->From_Laboratory = "CB")
The keyword count with the query as its argument isn't a full query in its own right.
To make it a valid table expression, we assign the result of the expression to a scalar variable $x.
The result of the query is therefore the value of this variable, and the result table will
have one row with one column containing the single result value.select s from s in object("sequence", "ah6") where exists_tag s->Finished and s->From_Laboratory = "CB"
select num from m in class Genetic_Map
where m = "IV",
num in count (select m->Contig)
select m, num from m in class Genetic_Map,
num in count (select m->Contig)
select s from s in class Sequence
where exists_tag s->DNA
By using the keyword exists_tag we don't have to open
every sequence object to look at its DNA-tag. This
tag is indexed and the lookups to the database-index are very quick.
Compare this qith the execution time of this query : select s from s in class Sequence
where exists s->DNA
This query will check the existsnce of a data-value for the DNA-tag.
To find this out it will have to open every sequence object,
which will take a very long time.select s from s in class genome_sequence
where exists_tag ->DNA,
length in s->DNA[2]
where length > 300
select count (select c from c in class cell,
d1 in c->daughter where d1,
d2 in d1->daughter where d2,
d3 in d2->daughter where d3)
select count (select c from c in class cell where exists c->daughter->daughter->daughter)
When converted to AQL those queries are nothing special, and creating multi-columned tables of output is the strength of AQL
select a, addr, a->laboratory from a in class author
where exists_tag a->E_mail,
addr in a->E_Mail
select a, addr, lab from a in class author where exists_tag a->E_mail,
lab in a->laboratory where lab,
addr in a->E_Mail
select a, addr, lab from a in class author
addr in a->E_mail where addr,
lab in a->laboratory where lab
select a, a->E_mail, a->Laboratory from a in class author
where exists_tag a->E_mail and exists_tag a->Laboratory
select l, c from l in class locus,
c in l->positive_Clone where c.name like "c*"
Note : Mendelian genes refers to those mapped by conventional crossing procedures. They should have associated map data and alleles (other than those created by Tc1 insertions).
select l, pos from l in class locus where exists_tag l->Allele,
map in l->Map where map = "X",
pos in map[Position] order by :pos
select l from l in class Locus,
ex in l->Expr_pattern
where exists ex->cell order
select c->Expr_pattern->locus from c in class Cell
where c->Fate like "*neuron*" or c->Fate like "*Neuron*"
select c, ex, l from c in class Cell,
ex in c->Expr_pattern,
l in ex->locus where l
select l from l in class Locus,
ex in l->Expr_pattern
where ex->Life_stage like "*embryo*"
select l from l in class Locus,
ex in l->Expr_pattern
where ex->Life_stage like "*larv*"
Finds the loci expressed in larvae.
select l from l in class Locus,
ex in l->Expr_pattern
where ex->Life_stage like "*embryo*" diff
select l from l in class Locus,
ex in l->Expr_pattern
where not ex->Life_stage like "*larv*"
We have simply combined the sets of loci expressed in embryos and not expressed
in larvae using the the table operator union.
select l from l in class Locus,
ex in l->Expr_pattern
where ex->Life_stage like "*embryo*" and
not ex->Life_stage like "*larv*"
This query produces the same result, but is much quicker to write.
select l from l in class Locus where not exists l->expr_pattern