PLNT4610/PLNT7690 Bioinformatics
Lecture 8, part 2 of 2

3. Object-oriented databases

The previous examples illustrate that one database can generate many views. This concept is treated more formally in object-oriented databases, which solve many of the problems inherent in relational databases and provide more powerful capabilities.

The object concept can be stated as follows:
  1. A class is a formula for creating an object.
  2. Everything is an object.
  3. Objects have data and methods.
  4. You can make new classes by reusing and extending an existing class.

The fundamental unit of an object-oriented database (OODB) is the Class.

A class can be thought of as a formula for an object.

A class may have many attributes and methods.

An object is an instance of a class

Objects, in OODBs, correspond to individual records in a relational database. There may therefore be many objects of a given class. A class is an abstraction. An object is a tangible thing.
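The four points of the object concept can be sketched in Python. This only illustrates the idea and is not the syntax of any particular OODB. The class names Sample and DnaSample, their attributes, and the second sample are hypothetical.

```python
class Sample:
    """A class is a formula: it says what data and methods each object gets."""

    def __init__(self, name, box):
        self.name = name   # data
        self.box = box     # data

    def label(self):       # method
        return f"{self.name} (box: {self.box})"


class DnaSample(Sample):
    """A new class made by reusing and extending an existing class."""

    def __init__(self, name, box, concentration):
        super().__init__(name, box)
        self.concentration = concentration   # extra data added by the subclass

    def label(self):
        return f"{super().label()}, {self.concentration} g/l"


# Two objects (instances) of one class: one formula, many tangible things.
first = DnaSample("SK99.pFF100", "Pisum ESTs I", 0.5)
second = DnaSample("SK100.pFF101", "Pisum ESTs I", 0.8)   # made-up second record
print(first.label())
```

Both objects share the same class definition but hold their own data, which is the sense in which the class is an abstraction and the object is a tangible thing.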

pFF100 is an instance of the class PLASMID. The class definition allows it to point to two other classes, VECTOR and DNA_SAMPLE. DNA_SAMPLE might look like this:
 
CLASS                  OBJECT
DNA_SAMPLE             SK99.pFF100
VECTOR | PLASMID       pFF100
EXPERIMENT             SK99
concentration [g/l]    0.5
BOX                    Pisum ESTs I

In the DNA_SAMPLE class, the first field points to either a VECTOR or a PLASMID. In this case, the corresponding field in SK99.pFF100 points to the PLASMID pFF100. The EXPERIMENT field points to an object of the type EXPERIMENT. The concentration field contains a floating-point number whose units are g/l. The BOX field points to an object of the type BOX, called "Pisum ESTs I".
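As a rough sketch, the DNA_SAMPLE class and the object SK99.pFF100 could be written in Python as follows. The field name `source` is an assumption standing in for the "VECTOR | PLASMID" field, and the one-attribute helper classes are simplified placeholders rather than the real class definitions.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Vector:
    name: str


@dataclass
class Plasmid:
    name: str


@dataclass
class Experiment:
    name: str


@dataclass
class Box:
    name: str


@dataclass
class DnaSample:
    name: str
    source: Union[Vector, Plasmid]   # points to a VECTOR or a PLASMID object
    experiment: Experiment           # points to an EXPERIMENT object
    concentration: float             # data held in the object itself (g/l)
    box: Box                         # points to a BOX object


pff100 = Plasmid("pFF100")
sample = DnaSample(
    name="SK99.pFF100",
    source=pff100,
    experiment=Experiment("SK99"),
    concentration=0.5,
    box=Box("Pisum ESTs I"),
)

# Following a field that points elsewhere leads to the other object itself,
# not to a copy of its data.
print(sample.source is pff100)
```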

A database is a model of something in the real world

It is important to note that of the four fields in the DNA_SAMPLE class, three are relations; that is, they point to other classes. Only one contains data. This illustrates the point that relations between objects are often as important as the data themselves. The relations between objects give the data objects characteristics similar to those of their real-world counterparts.

Methods

The implementation of methods is highly dependent on the specific database software being used. In some cases, macros (that is, sets of database commands) might be run. In other cases, an external program might be called. For example, the PLASMID class could have a method called MAP_VIEWER:
 
CLASS                  OBJECT
PLASMID                pFF100
VECTOR                 pBluescriptIISK+
insert                 2500bp BamHI frag. from GB::X6638
MAP_VIEWER filename    xv pFF100.gif
DNA_SAMPLE             SK99.pFF100

Here, the MAP_VIEWER and filename fields could be a template for a command that would launch a viewing program with a specific file. In the pFF100 object, the actual command that would be run is 'xv pFF100.gif'. This command would be passed to UNIX, launching the xv image viewer with the GIF file pFF100.gif.

Depending on the software, it might even be possible to call a different viewing program for different types of image files. For example, if a plasmid called pMU1 had a map in Adobe PDF format, it would be necessary to view its map using the Adobe Acrobat viewer, e.g., 'acroread pMU1.pdf'.
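One way such a method might choose a viewer is sketched below in Python. The lookup table and the function name map_viewer_command are hypothetical; real database software would have its own mechanism. The function only builds the command and does not launch anything.

```python
import os

# Hypothetical table mapping a file extension to a viewer program.
VIEWERS = {".gif": "xv", ".pdf": "acroread"}


def map_viewer_command(filename):
    """Build the command a MAP_VIEWER method would pass to the operating system."""
    extension = os.path.splitext(filename)[1].lower()
    try:
        viewer = VIEWERS[extension]
    except KeyError:
        raise ValueError(f"no viewer configured for {filename!r}") from None
    return [viewer, filename]


print(map_viewer_command("pFF100.gif"))
print(map_viewer_command("pMU1.pdf"))
```

The returned list could then be handed to something like Python's subprocess.run to start the viewer, provided the programs are installed.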
 

One-to-one vs. one-to-many

The tabular structure of relational databases makes one-to-many relationships awkward to implement. In OODBs, it is trivial to modify a class to allow many fields of the same type. In databases like ACeDB, all fields may be present in arbitrary numbers unless specifically implemented as UNIQUE fields.
 
CLASS         OBJECT
VECTOR        pBluescriptKSm13+
PLASMID       pI206KS
              pI49KS
              pI236KS
              pI230KS
ACCESSION     X52331
DNA_SAMPLE    AN29.pBluescriptKSm13+
              GK302.pBluescriptKSm13+
              FJ120.pBluescript.KSm13+

Here, four plasmid constructs were made using the pBluescriptKSm13+ vector, and three DNA_SAMPLEs of this vector (not the plasmids) are listed.
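A minimal Python sketch of the same idea: fields that may hold any number of values become lists, while a field meant to hold at most one value stays a single attribute. Treating ACCESSION as a UNIQUE field is an assumption made for illustration, and links are stored as plain names rather than as objects to keep the example short.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Vector:
    name: str
    accession: Optional[str] = None   # assumed UNIQUE: at most one value
    plasmids: List[str] = field(default_factory=list)      # any number of values
    dna_samples: List[str] = field(default_factory=list)   # any number of values


vector = Vector("pBluescriptKSm13+", accession="X52331")
vector.plasmids += ["pI206KS", "pI49KS", "pI236KS", "pI230KS"]
vector.dna_samples += [
    "AN29.pBluescriptKSm13+",
    "GK302.pBluescriptKSm13+",
    "FJ120.pBluescript.KSm13+",
]

print(len(vector.plasmids), len(vector.dna_samples))
```

Adding a fifth plasmid is a single append to one object; no extra table or join is needed.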
 

Independence of classes

Many changes in classes have no effect on other classes. In particular, any number of attributes can be added, without changing objects that link to a class. The one big exception to this is the case in which 2-way links are added. In that case, both classes must be modified.

One point to make here is that when a class is changed, not all objects in that class need to be modified. OODBs, and to some extent other types of databases, do not require that all objects contain data for all possible attributes defined in the class. This allows 'grandfathering' of preexisting objects. For example, if the class Cell_Stock were modified by the addition of an attribute called 'Date', listing the date on which the stock was made, it would not be necessary to go back and insert a date into potentially thousands of existing Cell_Stock objects. Dates can be included in new Cell_Stock objects as they are created.
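The Cell_Stock example might look like this in Python, where the new attribute is optional so that older objects remain valid without it. The stock names and the date are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CellStock:
    name: str
    date_made: Optional[date] = None   # attribute added later; may be absent


old_stock = CellStock("stock_0001")                      # created before the change
new_stock = CellStock("stock_2500", date(2020, 1, 15))   # created after the change

for stock in (old_stock, new_stock):
    print(stock.name, stock.date_made or "no date recorded")
```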

Modifications are done on specific objects

OODBs are the most efficient type of database to modify. Each object can be updated independently. Depending on the implementation, only a small part of the database may need to be rewritten during updates.

4. Schemas - models for objects

A schema is a model, or a formula, for how to create objects of a given class. Schemas can be expressed as diagrams for human readability, or in languages such as XML, for programmatic use.

Goals for creation of a good schema:
EXAMPLE - Schema to implement biochemical pathways

The schema at right implements biochemical pathways, using the conventions of the ACeDB system. Each pathway object points to one or more enzymes present in that pathway, and each enzyme object points to one or more pathways to which it belongs.

Reactions performed by each enzyme are conceptualized as consumption of a substrate to produce a product. The pathway class also has a Chart field, which points to an image file showing the pathway. Note that objects contain other pieces of information. For example, each compound has a molecular weight, and each enzyme has an EC number.

Each field contains a label and a data type

Databases try to create a model of real-world things as we understand them. To make this possible, it is useful to give each field a label, which describes what each piece of data is intended to represent. The label is a convenience for the human user. Each field also has a data type, which indicates the type of data used to represent that piece of information.

Common_Name is implemented as a Text field, a string of characters.
Mol_Wt (molecular weight) is a number, so it is implemented as an integer.

In a biochemical pathway, a compound can be the product of one enzyme and the substrate of another. To represent these concepts, we have two fields, Produced_by and Consumed_by. Both point to objects of the Enzyme class.

Note: Common_Name and Mol_Wt are examples of fields in which the information is contained in each object. Produced_by and Consumed_by are examples of fields which point to other objects.




Remember, objects are instances of a class. We can make as many instances of a class as we wish. So here are two Objects of the class Compound, as implemented in the ACeDB system:




This illustrates the point that Classes are abstract ideas, whereas Objects are specific instances of those abstract ideas.

As mentioned above, Common_Name and Mol_Wt contain information, whereas Produced_by and Consumed_by point to other objects, in this case, Enzyme objects.
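For readers without access to the ACeDB display, the Compound class and two of its objects can be approximated in Python. This is a sketch, not the ACeDB model syntax. The two TCA cycle compounds and the rounded molecular weights (for the acid forms) are illustrative values, not taken from the course database.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Enzyme:
    name: str


@dataclass
class Compound:
    common_name: str   # Text: data held in the object
    mol_wt: int        # Int: data held in the object
    produced_by: List[Enzyme] = field(default_factory=list)   # points to Enzyme objects
    consumed_by: List[Enzyme] = field(default_factory=list)   # points to Enzyme objects


citrate_synthase = Enzyme("citrate synthase")
aconitase = Enzyme("aconitase")
isocitrate_dh = Enzyme("isocitrate dehydrogenase")

# Two objects of the class Compound.
citrate = Compound("citrate", 192, [citrate_synthase], [aconitase])
isocitrate = Compound("isocitrate", 192, [aconitase], [isocitrate_dh])

# Both compounds point to the same Enzyme object rather than copying its data.
print(citrate.consumed_by[0] is isocitrate.produced_by[0])
```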


Pathway Demo

You can try a database that implements this schema to emulate the TCA cycle, by typing 'pathace' at the Linux prompt.


One of the best tests of a well-thought-out database occurs when you decide that the schema needs to be modified to add new concepts. For example, the existing Enzyme class could be extended to incorporate the concept of stoichiometry by adding a coefficient to each compound linked to in the Enzyme class. In the example at right, an integer (Int) gives the number of molecules of a substrate or product consumed or produced.

Adding these fields doesn't require any changes in the other data objects. If your classes are well-designed, a change in one class will not break other classes.
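A Python sketch of the stoichiometry extension follows. It assumes the coefficients are stored alongside each link. For brevity, compounds are referred to by name rather than as Compound objects, and catalase is used only because its reaction has coefficients other than 1; it is not part of the TCA cycle schema.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Enzyme:
    name: str
    # Compound name -> number of molecules consumed or produced (the Int coefficient).
    substrates: Dict[str, int] = field(default_factory=dict)
    products: Dict[str, int] = field(default_factory=dict)

    def equation(self):
        """Write the reaction, omitting coefficients equal to 1."""
        def side(terms):
            return " + ".join(
                f"{n} {name}" if n != 1 else name for name, n in terms.items()
            )
        return f"{side(self.substrates)} -> {side(self.products)}"


catalase = Enzyme(
    "catalase",
    substrates={"hydrogen peroxide": 2},
    products={"water": 2, "oxygen": 1},
)
print(catalase.equation())
```

Only the Enzyme class changed here; nothing about the compounds themselves had to be touched, which is the point of the paragraph above.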


There are other possible modifications that might be reasonable for a database of this sort. For example, the current schema doesn't have provisions for enzymatic reactions that can proceed in either direction. Databases should always be designed with the goal of creating a realistic representation of something in the real world, and building in the ability for change to occur in one part without disrupting other parts.

Guidelines for good schema design

1. The database is a model of a biological or experimental system. Make it as close to the real system as possible.

2. Keep each class simple. The fewer fields, the better.

3. Do not duplicate the same piece of information in more than one object.

4. Wherever practical, avoid free text. Use links or enumerated choices.
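Guideline 4 can be illustrated with an enumerated type in Python. The Molecule class and its three choices are hypothetical; the point is only that a value outside the allowed set is rejected instead of being stored as free text.

```python
from enum import Enum


class Molecule(Enum):
    DNA = "DNA"
    RNA = "RNA"
    PROTEIN = "protein"


def parse_molecule(text):
    """Accept only one of the enumerated choices; anything else raises ValueError."""
    return Molecule(text)


print(parse_molecule("DNA"))
try:
    parse_molecule("dna sample?")   # free text is rejected instead of stored
except ValueError as err:
    print(err)
```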

5. BioLegato applies Object-Oriented concepts to Graphical User Interfaces

The BioLegato interface is a fundamental rethinking about how to work with data. It takes as its premise the idea that objects are an intuitive way to combine information and the methods that work with that information. If the objects are structured like things that the end user is already familiar with, the fact that the user already understands the relationships between objects, and what they are expected to do, makes it easier to use the software.

blgeneric is a BioLegato interface that launches BioLegato without any menus or canvas. This is mainly for demonstration purposes, to illustrate the fact that almost all functionality of BioLegato is programmable. In the terminology of Object-Oriented programming, think of BioLegato as an abstract class that is extended to create real classes. So in a way, blgeneric is like instantiating an abstract class. To launch it, type 'blgeneric'.

To continue showing how BioLegato follows the Object-Oriented paradigm, bldna shows that all BioLegato windows have two parts: The canvas, which displays the data, and the Menus, which are the methods for the BioLegato object. In this example, bldna has a sequence canvas and menus for working with DNA.



Similarly, blncbi has a table canvas for displaying NCBI search results, and menu items for performing operations on those results, such as retrieval of hits.



Designing software tools as objects ensures that only methods appropriate for a particular kind of data accompany those objects. bldna has methods for DNA or RNA sequences. blprotein only has methods for proteins. blncbi has methods for NCBI query results. Packaging data and methods together prevents errors by making it impossible to use a method with data for which it is not suited. For example, bldna can launch BLAST searches, but only those searches that take a DNA sequence as input. Searches that take protein as input cannot be run from bldna.

OO design also simplifies the look and feel of software tools by limiting menus to only those methods that make sense for a particular type of data.
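The canvas-plus-menus idea can be mimicked with a small Python sketch. This is not BioLegato's actual code or API: the class names, the run method, and the menu item names are all hypothetical stand-ins chosen to show how packaging data with its methods keeps unsuitable operations out of reach.

```python
from abc import ABC, abstractmethod


class BioLegatoWindow(ABC):
    """Hypothetical stand-in for the abstract idea of a BioLegato window."""

    def __init__(self, canvas):
        self.canvas = canvas   # the data being displayed

    @abstractmethod
    def menus(self):
        """Return the names of the methods offered for this kind of data."""

    def run(self, menu_item):
        if menu_item not in self.menus():
            raise ValueError(f"{menu_item!r} is not available in {type(self).__name__}")
        return f"running {menu_item} on {len(self.canvas)} item(s)"


class DnaWindow(BioLegatoWindow):
    def menus(self):
        return ["blastn", "translate"]


class ProteinWindow(BioLegatoWindow):
    def menus(self):
        return ["blastp"]


dna = DnaWindow(["ATGGCC"])
print(dna.run("blastn"))
try:
    dna.run("blastp")   # a protein-input search is simply not offered here
except ValueError as err:
    print(err)
```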

6. Hypertext databases

Ways in which Web sites can be considered databases

In many people's minds, the World Wide Web is one big database. There is some element of truth in that statement. Example: The Tree of Life [http://tolweb.org/tree/phylogeny.html]

The Tree of Life is a taxonomic database edited by David R. Maddison at the University of Arizona.
Its main structure is a hierarchy of web pages, whose root is at the kingdom level. Hypertext links allow a user to traverse the phylogenetic tree from one level to another (e.g., phylum, order, class, family, genus, species). At each node, specialized data of almost any kind may be found, from images to text documents, or even links to other web sites.

DEMO: Descend Tree of life as follows:

root
Organisms with nucleated cells (Eukaryota)
Animals (Metazoa)
Bilateria
Deuterostomia

Why the Web is not, strictly speaking, a database

For comparison with the Tree of Life, the NCBI operates a taxonomy server through a relational-database engine, as part of the NCBI database [http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html].

DEMO: Descend the NCBI taxonomy as follows:

root
Eukaryota
Animals (Metazoa)
Bilateria
Deuterostomia

Although it may seem like a subtle difference, the Tree of Life IS a collection of web pages, whereas the web pages visited at NCBI are generated on the fly from the NCBI database. The web pages seen at NCBI are therefore a view of the data.
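The difference can be made concrete with a toy Python example in which a page is generated from stored data at request time. The TAXONOMY table, the render_page function, and the URL scheme are assumptions for illustration and say nothing about how the NCBI server is actually implemented.

```python
from html import escape

# Hypothetical parent -> children table standing in for a taxonomy database.
TAXONOMY = {
    "root": ["Eukaryota"],
    "Eukaryota": ["Metazoa"],
    "Metazoa": ["Bilateria"],
    "Bilateria": ["Deuterostomia"],
    "Deuterostomia": [],
}


def render_page(taxon):
    """Generate an HTML view of one node on request; no page is stored anywhere."""
    items = "".join(
        f'<li><a href="?taxon={escape(child)}">{escape(child)}</a></li>'
        for child in TAXONOMY[taxon]
    )
    return f"<h1>{escape(taxon)}</h1><ul>{items}</ul>"


print(render_page("Metazoa"))
```

Changing one entry in the table changes every page that mentions it, whereas a site made of hand-written pages must have each affected page edited.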

It should be pointed out that each approach has advantages and disadvantages. The NCBI web site is formal and structured, but primarily serves to encode a taxonomic structure, with links to database items such as sequences or literature references. The Tree of Life is rich, with images, articles, and other information, limited only by the creativity of the contributors.

Unless otherwise cited or referenced, all content on this page is licensed under the Creative Commons License Attribution Share-Alike 2.5 Canada

 