AQL - MovieDB Example queries

Simple listing of objects

1. list all movies

select all from class Movie
select all class Movie (the same, just shorter)

List all objects in the class movie.
Note the system is not case sensitive when referring to database related terms, such as class names or tags.
results

2. list all persons

select p from p in class Person

A locator-variable p is defined to loop over the key-values in class Person. The query works exactly like the (1), but this query can be extended easily, as p can be referred to for tag-dereferencing, column-wise sorting etc..
results


Tag dereferencing to access object attributes

3. list all movies and their directors

select m, m->Director from m in class Movie

The from-clause defines an iterator variable m that loops over all key-values of class movie.
The select-clause defines two select-fields - m is just the object value (shown as it's name) and m->Director is the object value, which is stored with the tag Directory in object in the class Movie.
results

4. list all movies and their directors and possibly their real names

select m, m->Director, m->Director->Real_name from m in class Movie

Basically the same as (3), but the result of m->Director has been dereferenced to get the Real_name tag value for the particular object from the Person super-class.
results

5. list authors of books on which movies are based

select a, a->Real_name from m in class Movie, a in m->Based_on->Author

The from-clause defines two iterators - m loops over the Movie class and a which takes the value of the Person-object, that is stored with the Based_on-object (in class Book) for movies that are based on novels or plays.
The select-clause defines two result columns which report the Author objects for such books. The second column shows how that object can be dereferenced further to gain access to tags inside the Author objects.
Note that is isn't important where the dereferencing is done (select or from-clause). The above query is the same as

select m->Based_on->Author, m->Based_on->Author->Real_name from m in class Movie

which performs the same without the need for an extra variable a, but duplicates some query elements.
results

6. list all people and all their relations

select p, p->Relations[2] from p in class Person

This query makes use of tag-positioning. All values fields have a position relating to their tag. So far we've dereferenced tags to retrieve their values, e.g. m->Based_on. This notation is in fact translated to m->Based_on[1] to say that we're interested in the data element which is one to the right of the tag. A look at the models shows a different arrangement for the Relations data in the Person class, where the real data is positioned 2 levels to the right of the Relations tag.
The bracketed number 2 means to skip the first level of data (some people have a Spouse or other tag in the database) and only report the value to the right of that tag.
results

7. list all people and their spouses

select p, p->Relations[Spouse] from p in class Person

This query positioned the tag-value by name rather than number. The level2 tag notation at Relations is resolved only at the Spouse tag in this case. This query is equivalent to :

select p, p->Spouse from p in class Person

which shows that the user does not have to specify the exact path from the root of the object. The name for a tag has to be unique within a class, so there is always only one possible path from the root to a named tag in the object.
results


Conditional expressions in the where-clause

8. list movies that have a director with movie name and directors name

select m, d from m in class Movie, d in m->Director where exists d

The where-clause puts a condition on the preceding from-clause. The AQL keyword exists is a boolean construct, which returns TRUE if a value exists, and FALSE if no value can be found.
The result table has no blank fields in the second column as all those movies without a value for d (i.e. m->Director) have been omitted.
results

9. list the names of cast members in "True Romance"

select m->Cast from m in class Movie where m = "True Romance"

The query iterates the variable m over the Movie class, but only activates the select-clause (which reports the cast of a movie) if the value of the Movie iterator is "True Romance".
Although m takes object-typed value, we can compare them by their text-value (i.e. the object's name). The objects name has to be quoted for that matter.
The same results could be achieved by looping over all persons, and only selecting those that have starred in "True Romance".

select p from p in class Person where p->Stars_in = "True Romance"

This shows an interesting aspect of AQL - It is possible to get the same result using totally different queries. Other ways of obtaining the same results will be shown later.
results

10. list the movies that have been released after 1990, i.e. in 1991 and thereafter

select m, year from m in class Movie, year in m->Release[Date] where year > `1990

This condition compares two value of DateType. A date like 1990 would easily be confused with an integer, and therefore literal date-values have to be preceded with the reverse tick `.
The where-clause also has the effect of restricting the listing of m-values to those where a release-date is stored in the database, which implies the where exists ->year condition.
The iterator year could also be defined as m->Date, but the above construct was chosen for readability. See example (7).
results

11. list all movies for which no release year information is stored

select m from m in class Movie where not exists m->Release[Date]

The not keyword negates the value of the exists m->Release[Date] condition.
results


Date functions

12. list all people with their age

select p, yeardiff(p->Date_of_Birth, today) from p in class Person

Calculations with values of type Int or Float are easily written using the standard operators +, -, / and *. For DateType values this is not so easy. A set of functions is provided to calculate the difference between two dates at various levels.
This query extract the year portion from two dates and subtracts those to compute the age. The result of those date-functions is an integer.
The AQL keyword today will be rendered to `1999-12-06 (or whatever the current date is) during evaluation. The word today is therefore a valid date-literal value.
results

13. for all people over 55, list their name and age

select p, yeardiff(p->Date_of_Birth, today) from p in class Person where yeardiff(p->Date_of_Birth, today) > 55

Same as (12), but restricts the list to people with ages greater than 55.
Note that only values for p can be included for which the Born tag exists. See example (10), where a similar behavior occurred.
results

Possible date-functions are :-

These functions will return a sensible result of the date/time value specifies the appropriate portion that the function requests.

No sensible result will be returned for monthdiff (m->Release[Date], `1996). The date-function monthdiff can only produce a good result if both dates contain month-details. Some movies in fact have release dates with month information, but the date literal '1996 does not. The result of evaluating monthdiff will therefore always produce NULL.

All date-functions return an integer value, which is positive if left argument is the smaller (i.e. earlier) date.


Table functions

14. list all people and how many movies they've starred in

select p, count(select p->Stars_in) from p in class Person

The table-function count acts on the sub-query select p->Stars_in and returns an integer value for all objects where the Stars_in-tag exists. It simply counts the number of times the Stars_in-tag is contained in each object p.
Note, that count (p->Stars_in) is not valid. Table-functions only operate over valid table-expressions. And the minimal valid table expression to list all values of the Stars_in tag needs the keyword select.
results

15. list all movies with more than 3 cast members

select m from m in class Movie where count(select m->Cast) > 3

This time the table-function is evaluated in the where-clause, which compares it's integer result with the number 3.
results

16. calculate the average age of all the actors

select avg(select yeardiff(p->Date_of_Birth, now) from p in class Actor)

The avg table function acts on the return-value of another table expression. The inner select-query returns a list of ages, over which the average value is computed.
Such complex queries should be constructed inside-out. First try out the inner query and see that it returns the desired result. Then wrap that result up in the final table-function.
results

17. get the latest movie release date

select max(select m->Date from m in class Movie)

Similar to (16), max acts on a list of dates.
results

Possible table-functions are :-

All table functions will work on either accumulated object attributes (14),(15) or on the table values of whole select-queries(16),(17). All such tables expression have to produce a table with only one column, and in most cases they have to be of a specific value-type for the table-function to work.


Scalar variables

18. how many movies are stored

$x := count(select all class movie)

Scalar variable have a perl-like '$' syntax and can assume just one value. This value is usually the return value of a table-function expression. A variable assignment is treated as a whole query expression in itself.
results

The output will be a table with one column and one row containing the single return value that the variable $x will assume. The query is therefore equivalent to
select count(select all class movie)

19. list movies with an higher than average rating

$x := avg(select m->Rating from m in class Movie);
select m, r from m in class Movie, r in m->Rating where r > $x

The first query assigns a scalar variable to be the average rating of movie (ignoring movie for which no such value is stored). The second query then uses this value to list all movies for which the rating value is higher that that.
This demonstrates how queries can be concatenated using the semi-colon ';'. Usually one or more queries will calculate variables, whereas the last query is a table expression that uses those variables to generate the output.
results

The same query could be composed without a scalar variable, but the difference in execution time is quite startling :
select m, r from m in class Movie,
  r in m->Rating
    where r > avg(select m1->Rating from m1 in class Movie)

The first query would only run avg once (to get the value for $x) and finish in a time linearly related to the number of movies. The second query will execute the nested query inside the avg function in every iteration cycle of m from the outer query. The execution time is therefore quadratically related to the number of movies.

This is an example how queries can optimized by arranging it in a clever way.


Table variables

20. show release date of first "James Bond" film

@tmp := select first(select m->Date)
    from m in class Movie, a in m->Cast
    where a[1] = "James Bond" order by :1;
select min(@tmp):1

This query is structured into two parts. The first query lists the release dates of all films with the character named "James Bond". The nature of the model for the Release tag means that multiple release dates can be stored. The table-functions first ensures we're only listing one release date per movie. The results are sorted and assigned to the table variable @tmp.
The second part of the query uses the table function min to extract the smallest value from the first column of the given table. This will be the earliest release date of any bond film.
results

21. characters that appeared in more than 1 movie listed with the movies, sorted by the movie primary release date

@characters := select c,
        count(select m1 from m1 in class Movie where m1->Cast[2] = c),
        m, first(select m->Release[Date])
  from m in class Movie, c in m->Cast[2];
select character::row:1, movie::row:3, date::row:4
from row in @characters
    where row:2 > 1
order by :character, :date

The table-variable feature is used again to separate the work into two parts. The first query does all the work, whereas the second query, just selects specific fields and does the sorting. We will learn later how this can be written all in one query.
The second query demonstrates a row-variable. The iterator row is declared as a cursor on the table @characters. In the select-clause we extract fields 1, 3 and 4 from that row whereas the where-clause uses field 2 for the comparison.
The select-clause also renames the select-fields, which makes the query more readable. Those aliased names ('character', 'movie' and 'date') can then be used for the sort-clause to refer to select-fields by their name.
results

The use of table-variables sometimes stems from the way people compose queries by incrementally building them up from less complex queries. Their use is not always necessary though. Here is the same example, but without table-variables :
select c, m, mdate
from m in class Movie, c in m->Cast[2],
     num in count(select m1 from m1 in class Movie where m1->Cast[2] = c)
         where num > 1,
     mdate in first(select m->Release[Date])
order by :c, :mdate

See example (29) for more on splitting queries into two parts and table-variables.


Bond-movie examples

The database contains a particular dataset, which is useful for testing and learning. It contains the last 20 James Bond movies. Some have little detail stored about them, but the one feature that identifies them is the character name "James Bond", which is stored with the particular Bond-actor.
The following examples will demonstrate a range of features which get progressively more complicated.

22. list bond movies

select m from m in class Movie where m->Cast[2] = "James Bond"

In all Bond-movies there is a character called "James Bond". The character name is the data-element to the right of the actors name under the Cast-tag in the Movie-object.
results

23. list bond movies together with all of the cast

select m, m->Cast from m in class Movie where m->Cast[2] = "James Bond"

The where-clause is important : it applies to the iterator m, and will be evaluated once per movie. So the list of possible values for m is restricted to Bond-movies.
The select-clause will then evaluate m->Cast again for all of those movies, and therefore report all cast-members of those movies, not just the ones playing "James Bond". This may not always be what you want, so compare with (24).
results

24. list bond movies together with all of the cast

select m, a from m in class Movie, a in m->Cast where a[1] = "James Bond"

The subtle difference to (23) is important. The where-clause now applies directly to the iterator a (for "actor"). It will only be true if the given actor a is playing "James Bond", and because the select-clause reports the value of a, only the Bond-actors are shown.
Note, how we get the value for the character name in this query. The iterator m->Cast s automatically rewritten as m->Cast[1] (the first data-element to the right is reported by default). Therefore we only have to step one more position further to the right to get to the second field, which is the character name.
results

If the query was written as

select m, m->Cast from m in class Movie, a in m->Cast where a[1] = "James Bond"

the result would be the same as (23), because the extra declaration of a is just used in the where-clause and is basically the same as referring to m->Cast[2] directly.

25. list bond films with the James Bond actor and list all the movies that actor has starred in

select m, a, a->Stars_in from m in class Movie, a in m->Cast where a[1] = "James Bond"

Basically the same as (24), but with an extra select field which lists all movies by the actor a.
results

26. list all James Bond actors and the total number of movies they've starred in

select a, count(select m1 from m1 in class Movie where m1->Cast = a)
from m in class Movie, a in m->Cast where a[1] = "James Bond"

This query makes use of a nested sub-query. For every bond actor as determined by the outer query, the nested query loops over all movies again finding the ones which where the cast includes the actor a from the outer query.
results

27. list all James Bond actors and how many non bond-films they've starred in

select a, count(select m1 from m1 in class Movie,
       a1 in m1->Cast where a1 = a and a1[1] != "James Bond")
from m in class Movie,
   a in m->Cast where a[1] = "James Bond"

The inner is similar to the one in (26), but the where-clause is more elaborate to count only the films where the current actor a (from the outer scope) played someone that wasn't called "James Bond".
results

28. for all Bond actors list their films where they weren't James Bond and who they played in those films

select a, otherm, otherc
from m in class Movie,
  a in m->Cast
    where a[1] = "James Bond",
  otherm in a->Stars_in, a1 in otherm->Cast
    where a = a1,
  otherc in a1[1]
    where otherc != "James Bond"

We don't need a nested query here to count the number of movies for each Bond-actor. The first where-clause restricts the value of a to only those actors who have played "James Bond". The iterator otherm then find all their movies (which at this stage still includes all their bond movies as well). The last clause then gets their character names from all those movies and excludes "James Bond". This will exclude all actors who only played James Bond.
results


Tricks and pitfalls

All of the "James Bond" examples have been expressed so far in just one query. The query in example (28) grows to considerable complexity. The section on table-variables showed how some queries are better split up into two or more queries. A final query can then combine the results as shown in example (22).

29. list actors from Bond-films that didn't play "James Bond"

@a := select m->Cast from m in class Movie
                          where m->Cast[2] = "James Bond";
@b := select c from m in class Movie,
                    c in m->Cast,
                    otherm in c->Stars_in where c[1] = "James Bond";
@a diff @b

The table @a lists all actors from movies where a character was called "James Bond". The second table @b will list all those actors who actually played the "James Bond" character. That leaves us to report the difference of the two table using the diff table-function.
The use of table-variables can be avoided because the diff keyword can be placed between any two expressions which return a table value, i.e. two select-from-where queries :

select m->Cast from m in class Movie
                    where m->Cast[2] = "James Bond"
 diff
select c from m in class Movie,
              c in m->Cast,
              otherm in c->Stars_in where c[1] = "James Bond"

results

The trick of the two queries which are combines by diff, leave no ambiguity. Spot the difference in this query, where everything is expressed as just one select-from-where query :

select othera
from m in class Movie
          where m->Cast[2] = "James Bond",
     othera in m->Cast
               where othera[1] != "James Bond"

results

At first sight it seems to do the right thing. We select only the movies m where a character is called "James Bond". The variable othera then lists all actors again for only those movies and we check that we only report those who didn't play "James Bond". So how is it different ?

The line "Sophie Marceau" is missing from the result. Closer inspection of the data reveals that no character name is stored for her role in "The world is not enough". The second where-clause on othera therefore has no value for othera[1] to be used in the comparison. The comparison will evaluate to NULL and no result is shown for "Sophie Marceau".

See the section on NULL values.


Queries designed by Fred Wobus.
The Sanger Centre, 06/Dec/1999
acedb@sanger.ac.uk