PLNT4610/PLNT7690 Bioinformatics

DATABASE PROJECT



Objective: You will design and construct a database pertaining to your research work, research work in your lab, or some area of interest within the topic areas of the course. This project will use the ACeDB database system. The final product will be a database, which should be clearly organized, internally consistent and well documented.

1) Try out the demonstration ACeDB database. Explore the sample database to get a feeling for how it works. To run the demonstration, type 'acedemo' at the command line.

2) Design a schema using the schema_template.odg as a starting point. Send the schema to Dr. Fristensky for comment before you implement. Update your schema in the file schema.png.

Note: I will only look at a schema ONCE before you begin implementation. Any subsequent requests for me to evaluate your schema will result in 2 points deducted from the grade.

Note: Due to technical problems with ACEDB on the CCL system, steps 3-6 will be done on one of the Fristensky lab servers, brassica.plants.umanitoba.ca or flamingo (10.53.2.50). Each student will be provided with a userid and password on one of these machines. (You must be on the campus VPN to use flamingo.) If you are doing Assignment 3, you will need to set up your account on the server in the same way as you previously set up your CCL account. Setup instructions for each of these hosts can be found on the BIRCH web site, in the Getting Started menu. You will need to do the sections on First Time Setup and BIRCH user settings.

IMPORTANT: Limitations on simultaneous users
On each server, only 5 users may run a ThinLinc session simultaneously. It is therefore critical that each user log out (NOT disconnect) from their account when not actually doing work. Please be considerate of your fellow students.



3) Tutorial: Loading a database, modifying models, and entering data

4) Create your database, using existing classes from the sample database as templates for your own customized classes:
a) Start by deleting the existing database directory and creating a new, empty database directory. Make sure this directory is world-readable and world-executable.
b) Make a copy of wspec/models.wrm (e.g. models.wrm.bak). You can use this file to copy classes. Now, delete all classes from models.wrm. (You can leave in the comment lines at the beginning of the file if you wish.)
c) Start up as3db.py and reinitialize. An empty database will be created.
d) One at a time, add classes to models.wrm. You can copy classes from models.wrm.bak and modify them. At first, it may be best to not even include all data fields in a class, but to slowly add data fields and read in the models each time.
e) During the addition of models, you may discover flaws in your schema, or things that just make more sense if done another way. Modify your schema, and modify the models to match the schema.
f) Continue adding classes until all classes have been successfully read.
Save the database at each step.
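
Steps a) and b) can also be scripted. The following is a minimal Python sketch, not part of the course materials. It assumes that the database files live in a subdirectory named database next to wspec; that layout, the function name reset_database and its argument are assumptions for illustration, so adjust them to match your own account. Deleting the classes from models.wrm is still done by hand in an editor.

```python
import os
import shutil


def reset_database(db_root):
    """Back up models.wrm and replace the database directory with an empty one.

    Assumes db_root contains 'wspec/models.wrm' and a 'database' subdirectory
    (an assumption about the layout, not something this handout specifies).
    """
    models = os.path.join(db_root, "wspec", "models.wrm")
    backup = models + ".bak"
    # Step b: keep a copy of the sample models to copy classes from later.
    # An existing backup is never overwritten.
    if not os.path.exists(backup):
        shutil.copy2(models, backup)

    # Step a: delete the existing database directory, then create an empty one.
    db_dir = os.path.join(db_root, "database")
    shutil.rmtree(db_dir, ignore_errors=True)
    os.mkdir(db_dir)
    # rwxr-xr-x: world-readable and world-executable, writable only by the owner.
    os.chmod(db_dir, 0o755)
    return db_dir
```

Setting the mode explicitly with os.chmod avoids depending on the umask of your login session.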

At any time, if you make a major error, you can exit as3db without saving, and the error will not be written to the database.

5) Populate the database with your data. Depending on the data, you can either add data using the graphic interface, or by creating and reading .ace files, as shown in the tutorial. You don't need to create a large database. Just make sure that each class has at least a few objects to illustrate how your data classes look and how they fit together.

As you add data, you may discover that the database would work better if you change one or more models. Again, update the schema and the models. When you change a model, the data fields that no longer fit the model will be 'grandfathered'. That is, the data will be retained in separate fields. If this occurs, you should re-enter the grandfathered data into the new fields, and then delete the old fields.
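
If your data already exist in a spreadsheet or table, the .ace files for step 5 can be generated rather than typed. The sketch below is only an illustration under stated assumptions: it writes each object as a header line of the form Class : "name", followed by one tag-and-value line per field and a blank line to end the object, which is the general shape of the .ace files in the tutorial. The helper names, the tag Common_name, the object "Westar" and the escaping of quotes are hypothetical; the tags you use must match your own models.wrm.

```python
def ace_value(value):
    """Format one value: numbers are written bare, everything else in quotes."""
    if isinstance(value, (int, float)):
        return str(value)
    # Assumption: backslash-escape backslashes and double quotes in text values.
    text = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{text}"'


def to_ace(class_name, obj_name, fields):
    """Format one object as an .ace paragraph.

    fields is a list of (tag, value) pairs, written one per line.
    """
    lines = [f"{class_name} : {ace_value(obj_name)}"]
    for tag, value in fields:
        lines.append(f"{tag} {ace_value(value)}")
    # A blank line ends the object.
    return "\n".join(lines) + "\n\n"


# Hypothetical example data: a Genotype object that points to a Species object.
records = [
    ("Species", "Brassica napus", [("Common_name", "canola")]),
    ("Genotype", "Westar", [("Species", "Brassica napus")]),
]

ace_text = "".join(to_ace(c, name, fields) for c, name, fields in records)
print(ace_text)
```

Writing ace_text to a file such as mydata.ace (a hypothetical name) gives a file that can be read in the same way as the tutorial's .ace files. Generating the file from one table also helps keep object names spelled identically wherever they are used as links.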

6) When your database is complete, do the following steps:

Criteria for evaluation will include:
  • (10 points) Schema
    • Schema is understandable.
    • Schema is a natural representation of real data
    • Data is not duplicated in two different classes
    • Classes are kept as small as is reasonable
    • Classes minimize use of Text and LongText, where data could be represented in more 'machine-readable' ways
  • (10 points) Database
    • All database files must be world-readable
    • The database must launch correctly from your as3db script
    • The database must be consistent with the schema
    • Where possible, links that can be made should be made. For example, in the sample database, each Genotype object should point to a Species object, because all Genotype objects belong to a Species.
    • All links work. All data should fit the models, i.e. no grandfathered data should be present from earlier versions of the models.
    • Avoid recording the same piece of information twice in the database.
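
The world-readable criterion can be checked before you submit. The following Python sketch is one possible check, not an official grading tool; the function name not_world_readable is hypothetical. It walks a directory tree and lists every file that other users cannot read and every directory that other users cannot read and enter.

```python
import os
import stat


def not_world_readable(top):
    """Return paths under top that users other than the owner cannot access."""
    problems = []
    dir_bits = stat.S_IROTH | stat.S_IXOTH  # directories need r and x for "other"
    for dirpath, dirnames, filenames in os.walk(top):
        try:
            if os.stat(dirpath).st_mode & dir_bits != dir_bits:
                problems.append(dirpath)
        except OSError:
            problems.append(dirpath)
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if not os.stat(path).st_mode & stat.S_IROTH:
                    problems.append(path)
            except OSError:
                # Unreadable entry or broken symbolic link: report it as well.
                problems.append(path)
    return problems
```

Calling not_world_readable on the top-level directory of your database should return an empty list. Any path it returns needs its permissions changed, for example with chmod.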



Resources:


For quick hints on some common database operations:

ACeDB HOWTO

The only really complete and (almost) up-to-date manual on ACEDB:

ACEDB Manual (797k, PDF)