The previous examples
illustrate that one database can generate many views. This concept
is addressed in a more formal way in Object-oriented databases.
Object-oriented databases solve many of the problems that are
inherent in relational databases, and provide more powerful
The object concept can be stated as follows:
- A class is a formula for creating an object.
- An instance of a class is an object.
- Objects have Data and Methods.
- You can make new classes by reusing and extending an existing class.
The basic unit of Object-oriented databases (OODB) is the Class.
A class can
be thought of as a formula for an object.
A class may have many
attributes and methods.
- Attributes -
Attributes hold the information that is intrinsic to the
class. Data items are usually simple types of data, such as
text or numbers. Attributes can also be pointers to other
classes. There may be either one or many data items or classes
of each type.
- Methods -
Methods are procedures or tasks that an object can carry out.
A method may contain its own data fields, and may point to
An object is an instance of a class
Objects, in OODBs,
correspond to individual records in a relational database. There
may therefore be many objects of a given class. A class is an abstraction.
An object is a tangible thing.
pFF100 is an instance
of the class PLASMID. The class definition allows it to point to
two other classes, VECTOR and DNA_SAMPLE. DNA_SAMPLE might look
like this:
[Table not recovered: DNA_SAMPLE object SK99.pFF100, including a concentration [µg/µl] field and the BOX "Pisum ESTs I"]
In the DNA_SAMPLE
class, the first field points to either a VECTOR or a PLASMID.
In this case, the corresponding field in SK99.pFF100 points to
PLASMID pFF100. The EXPERIMENT field points to an object of the
type EXPERIMENT. The concentration field contains a floating
point number whose units are µg/µl. The BOX field points to an
object of the type BOX, called "Pisum ESTs I".
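The class and object described above can be sketched in Python. This is only an illustration, not how any particular OODB stores its data: the Python class and field names are assumptions, and the experiment name and concentration value are placeholders, since the text does not give them.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass(eq=False)
class Vector:
    name: str


@dataclass(eq=False)
class Experiment:
    name: str


@dataclass(eq=False)
class Box:
    name: str


@dataclass(eq=False)
class Plasmid:
    name: str
    vector: Optional[Vector] = None                                # pointer to a VECTOR
    dna_samples: List["DnaSample"] = field(default_factory=list)   # pointers to DNA_SAMPLEs


@dataclass(eq=False)
class DnaSample:
    name: str
    source: Union[Vector, Plasmid]   # relation: points to a VECTOR or a PLASMID
    experiment: Experiment           # relation: points to an EXPERIMENT
    concentration: float             # data: floating point number, in µg/µl
    box: Box                         # relation: points to a BOX


# The class is the formula; the objects below are instances made from it.
pff100 = Plasmid("pFF100")
sample = DnaSample(
    name="SK99.pFF100",
    source=pff100,
    experiment=Experiment("placeholder-experiment"),  # name not given in the text
    concentration=0.5,                                # placeholder value
    box=Box("Pisum ESTs I"),
)
pff100.dna_samples.append(sample)

print(sample.source.name, sample.box.name)
```

Three of the four DnaSample fields hold references to other objects, and only concentration holds a plain value.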
A database is a model of something in
the real world
Of the four fields in the DNA_SAMPLE class, three are relations;
that is, they point to other classes. Only one contains
data. This illustrates the point that relations between
objects are often as important as the data themselves. The
relations between objects give the data objects
characteristics similar to those of their real-world
counterparts.
The implementation of
methods is highly dependent on the specific database software
being used. In some cases, macros (that is, sets of database
commands) might be run. In other cases, an external program might
be called. For example, the PLASMID class could have a method
2500bp BamHI frag. from GB::X6638
Here, the MAP_VIEWER
and filename fields could be a template for a command that would
launch a viewing program with a specific file. In the pFF100
object, the actual command that would be run is 'xv pFF100.gif'.
This command would be passed to UNIX, launching the xv image
viewer with the GIF file pFF100.gif.
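A minimal sketch of such a method, assuming the viewer and filename are stored as two ordinary fields of the object. The Python names (map_viewer, map_file, map_command) are hypothetical. The sketch only builds the command and does not launch it, since xv may not be installed.

```python
import shlex
from dataclasses import dataclass
from typing import List


@dataclass
class Plasmid:
    name: str
    map_viewer: str   # program used to display the map, e.g. "xv"
    map_file: str     # file that holds the map image

    def map_command(self) -> List[str]:
        """Fill in the viewer/filename template and return an argument list."""
        return [self.map_viewer, self.map_file]


pff100 = Plasmid(name="pFF100", map_viewer="xv", map_file="pFF100.gif")
command = pff100.map_command()
print(shlex.join(command))   # xv pFF100.gif
```

The argument list could then be handed to the operating system, for example with subprocess.run(command).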
Depending on the
software, it might even be possible to call a different viewing
program for different types of image files. For example, if a
plasmid called pMU1 had a map in Adobe PDF format, it would be
necessary to view its map using the Adobe Acrobat viewer eg.
The tabular structure of
relational databases makes one-to-many relationships awkward to
implement. In OODBs, it is trivial to modify a class to allow many
fields of the same type. In databases like ACeDB, all fields may
be present in arbitrary numbers unless specifically implemented as
Here, four plasmid
constructs were made using the pBluescriptKSm13+ vector, and
three DNA_SAMPLEs of this vector (not the plasmids) are listed.
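In an object model, a field that may occur any number of times is simply a list. The sketch below assumes a Vector class with two such fields. The construct and sample names are hypothetical, since the original listing is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Vector:
    name: str
    plasmids: List[str] = field(default_factory=list)      # any number of entries
    dna_samples: List[str] = field(default_factory=list)   # any number of entries


ks = Vector("pBluescriptKSm13+")
# Hypothetical names standing in for the four constructs and three samples.
ks.plasmids += ["construct-1", "construct-2", "construct-3", "construct-4"]
ks.dna_samples += ["sample-1", "sample-2", "sample-3"]

print(len(ks.plasmids), len(ks.dna_samples))   # 4 3
```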
Many changes in classes
have no effect on other classes. In particular, any number of
attributes can be added, without changing objects that link to a
class. The one big exception to this is the case in which 2-way
links are added. In that case, both classes must be modified.
One point to make here
is that when a class is changed, not all objects in that class
need to be modified. OODBs, and to some extent other types of
databases, do not require that all objects contain data for all
possible attributes defined in the class. This allows a
'grandfathering' of preexisting objects. For example, if the
class Cell_Stock was modified by the addition of an attribute
called 'Date', listing the date on which the stock was made, it
would not be necessary to go back and insert a date into
potentially thousands of existing Cell_Stock objects. Dates
can be included in new Cell_Stock objects, as they are created.
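The 'grandfathering' idea can be sketched with an optional attribute. The class below is an assumption made for illustration (the stock names and the date are invented); the point is only that objects created before the attribute existed remain valid.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CellStock:
    name: str
    made_on: Optional[date] = None   # the added 'Date' attribute; may stay empty


old_stock = CellStock("stock-0001")                      # created before 'Date' was added
new_stock = CellStock("stock-2001", date(2020, 1, 15))   # created afterwards

for stock in (old_stock, new_stock):
    print(stock.name, stock.made_on or "no date recorded")
```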
Updates are done on specific objects
OODBs are the most
efficient type of database to modify. Each object can be updated
independently. Depending on the implementation, only a small part
of the database may need to be rewritten during updates.
Schemas - models for objects
A schema is a model, or a
formula, for how to create objects of a given class. Schemas can
be expressed as diagrams for human readability, or in languages
such as XML, for programmatic use.
Goals for creation of a good schema:
- Each class represents something in the real world. The closer your
classes and relations are to the real-world description, the
more the database will behave like the real thing you are
modeling.
- Classes point to other classes. Much of the information about
things has to do with their relationships to other things.
- Try to minimize the number of classes. If you see redundancy
between two or more classes, it may be better to combine them into
a single class.
- Try to minimize the size of classes. When a class gets too big,
it may be time to create a new class.
- Never store the same piece of information in two different
classes. Links (relations) can be duplicated, but not raw data
(e.g. numbers).
EXAMPLE - Schema to implement biochemical pathways
Reactions performed by
each enzyme are conceptualized as consumption of a substrate to
produce a product. The pathway class also has a Chart field, which
points to an image file showing the pathway. Note that objects
contain other pieces of information. For example, each compound
has a molecular weight, and each enzyme has an EC number.
The schema at right implements biochemical
pathways, using the conventions of the ACeDB system. Each
pathway object points to one or more enzymes present
in that pathway, and each enzyme object points to one or
more pathways to which it belongs.
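The two-way link between pathways and enzymes can be sketched as follows. This is a Python illustration rather than ACeDB syntax, and the link helper is a hypothetical function that keeps both sides of the relation in step. The enzyme name is used only as an example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(eq=False)
class Pathway:
    name: str
    enzymes: List["Enzyme"] = field(default_factory=list)


@dataclass(eq=False)
class Enzyme:
    name: str
    pathways: List[Pathway] = field(default_factory=list)


def link(pathway: Pathway, enzyme: Enzyme) -> None:
    """Record the relation on both objects so that either one leads to the other."""
    if enzyme not in pathway.enzymes:
        pathway.enzymes.append(enzyme)
    if pathway not in enzyme.pathways:
        enzyme.pathways.append(pathway)


tca = Pathway("TCA cycle")
cs = Enzyme("citrate synthase")
link(tca, cs)
link(tca, cs)   # linking twice does not create duplicate entries

print([e.name for e in tca.enzymes], [p.name for p in cs.pathways])
```

Because the link is stored in both classes, adding it means modifying both class definitions, which is the exception noted earlier for 2-way links.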
Each field contains a label and a data type
Databases try to create a model of real-world
things as we understand them. To make this possible, it is
useful to give each field a label, which describes what that
piece of data is intended to represent. The label is a
convenience for the human user. Each field also has a data
type, which indicates the type of data used to represent
that piece of information.
Common_Name is implemented as a Text field, a string of
characters.
Mol_Wt (molecular weight) is a number, so it is implemented
as an integer.
In a biochemical pathway, a compound can be a product of one
enzyme, and a substrate for one enzyme. To represent these
concepts, we have two fields, Produced_by and Consumed_by.
Both point to objects of the Enzyme class.
Note: Common_Name and Mol_Wt are examples of fields in which
the information is contained in each object. Produced_by and
Consumed_by are examples of fields which point to other
objects.
Remember, objects are instances of a class. We can make as many
instances of a class as we wish. So here are two Objects of the
class Compound, as implemented in the ACeDB system:
This illustrates the point that Classes are abstract ideas,
whereas Objects are specific instances of those
ideas.
As mentioned above, Common_Name and Mol_Wt contain information,
whereas Produced_by and Consumed_by point to other objects, in
this case, Enzyme objects.
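As a rough Python analogue of the Compound class and two of its objects, consider the sketch below. It is not ACeDB syntax, and the two compounds are chosen for illustration; they are not necessarily the objects shown in the original figure.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(eq=False)
class Enzyme:
    name: str


@dataclass(eq=False)
class Compound:
    common_name: str                       # Text: information held in the object
    mol_wt: int                            # Int: information held in the object
    produced_by: Optional[Enzyme] = None   # pointer to an Enzyme object
    consumed_by: Optional[Enzyme] = None   # pointer to an Enzyme object


citrate_synthase = Enzyme("citrate synthase")
aconitase = Enzyme("aconitase")
isocitrate_dh = Enzyme("isocitrate dehydrogenase")

# Two objects (instances) of the one class Compound.
citrate = Compound("citrate", 192, produced_by=citrate_synthase, consumed_by=aconitase)
isocitrate = Compound("isocitrate", 192, produced_by=aconitase, consumed_by=isocitrate_dh)

for compound in (citrate, isocitrate):
    print(compound.common_name, compound.mol_wt,
          compound.produced_by.name, compound.consumed_by.name)
```

Both objects point to the same aconitase object, one as its consumer and the other as its producer, so the enzyme's name is stored only once.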
You can try a database that implements this schema to
emulate the TCA cycle by typing 'pathace' at the Linux
command line.
One of the best tests of a well-thought-out database
occurs when you decide that the schema needs to be
modified to add new concepts. For example, the existing
Enzyme class could be extended to incorporate the concept
of stoichiometry by adding a coefficient to each compound
linked to in the Enzyme class. In the example at right, an
integer (Int) gives the number of molecules of a substrate
or product consumed or produced.
Adding these fields doesn't require any changes in the
other data objects. If your classes are well-designed, a
change in one class will not break other classes.
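One way to picture the stoichiometry extension is to pair each compound link in the Enzyme class with an integer. The sketch below is a Python illustration under that assumption, with hypothetical field names; the catalase reaction is used only as a familiar example. Note that the Compound class itself is unchanged.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(eq=False)
class Compound:
    common_name: str   # unchanged by the extension


@dataclass(eq=False)
class Enzyme:
    name: str
    # Each link to a Compound now carries an integer coefficient (the added Int).
    substrates: List[Tuple[Compound, int]] = field(default_factory=list)
    products: List[Tuple[Compound, int]] = field(default_factory=list)

    def equation(self) -> str:
        """Write the reaction using the stored coefficients."""
        def side(terms: List[Tuple[Compound, int]]) -> str:
            return " + ".join(f"{n} {c.common_name}" for c, n in terms)
        return f"{side(self.substrates)} -> {side(self.products)}"


h2o2 = Compound("hydrogen peroxide")
h2o = Compound("water")
o2 = Compound("oxygen")
catalase = Enzyme("catalase", substrates=[(h2o2, 2)], products=[(h2o, 2), (o2, 1)])

print(catalase.equation())   # 2 hydrogen peroxide -> 2 water + 1 oxygen
```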
There are other possible modifications that might be reasonable
for a database of this sort. For example, the current schema
doesn't have provisions for enzymatic reactions that can proceed
in either direction. Databases should always be designed with the
goal of creating a realistic representation of something in the
real world, and building in the ability for change to occur in one
part without disrupting other parts.
Good schema design
1. The database is a model of a
biological or experimental system. Make it as close to the real
system as possible.
2. Keep each class simple. The fewer fields, the better.
3. Do not
duplicate the same piece of information in more than one object.
4. Where practical, avoid free text. Use links or enumerated choices.
BioLegato applies Object-Oriented concepts to Graphical User
Interfaces
The BioLegato interface is
a fundamental rethinking of how to work with data. It takes as
its premise the idea that objects are an intuitive way to combine
information and the methods that work with that information. If
the objects are structured like things that the end user is
already familiar with, the fact that the user already understands
the relationships between objects, and what they are expected to
do, makes it easier to use the software.
blgeneric is a BioLegato interface that
launches BioLegato without any menus or canvas. This is
mainly for demonstration purposes, to illustrate the fact
that almost all functionality of BioLegato is programmable.
In the terminology of Object-Oriented programming,
think of BioLegato as an abstract class that is extended to
create real classes. So in a way, running blgeneric is like
instantiating an abstract class. To launch it, type 'blgeneric'.
To continue showing
how BioLegato follows the Object-Oriented paradigm, bldna shows
that all BioLegato windows have two parts: The canvas, which
displays the data, and the Menus, which are the methods for the
BioLegato object. In this example, bldna has a sequence canvas and
menus for working with DNA.
Similarly, blncbi has a table canvas for displaying NCBI search
results, and menu items for performing operations on those
results, such as retrieval of hits.
Designing software tools as objects ensures that only methods
appropriate for a particular kind of data accompany those
objects. bldna has methods for DNA or RNA sequences. blprotein
only has methods for proteins. blncbi has methods for NCBI query
results. Packaging data and methods together prevents errors by
making it impossible to use a method with data for which it is not
suited. For example, bldna can launch BLAST searches, but only
those searches that take a DNA sequence as input. Searches that
take protein as input cannot be run from bldna.
OO design also simplifies the look and feel of software tools by
limiting menus to only those methods that make sense for a
particular type of data.
Ways in which Web sites can be considered databases
In many people's minds,
the World Wide Web is one big database. There is some element of
truth in that statement.
Example: The Tree of Life
- Provides an efficient way to store and
Web pages could
each be considered a record of data. In this context, the
browser is the 'front end' to the database.
- Is machine readable and searchable
very obvious, but is a critical distinction between a library
and a web site. Information in a library is not directly
searchable. In contrast, some very sophisticated search
engines now exist, using highly-efficient indexing schemes,
that make it possible to quickly find almost any kind of
information on the Web.
- Is object-oriented, to some degree
There are many
kinds of objects that are almost universally-recognized by web
browsers, including HTML and text files, graphic files, Java
applets, FTP sites, telnet programs, and plugins of many
kinds. They all have attributes and methods.
- Knowledge can be encoded in the structure of
a web site
In much the
same way that links structure an OODB, links between web pages
structure web sites. In principle, an object-oriented database
could be devised in which each plasmid had a web page, and
each plasmid web page linked to corresponding pages for DNA
samples and vectors.
The Tree of Life is a
taxonomic database edited by David R. Maddison at the University
Its main structure is a hierarchy of web pages, whose root is at
the kingdom level. Hypertext links allow a user to traverse the
phylogenetic tree from one level to another (eg. phylum, order,
class, family, genus, species). At each node, specialized data
of almost any kind may be found, from images to text documents,
or even links to other web sites.
DEMO: Descend Tree of life as follows:
Organisms with nucleated
Ways in which the Web is not, strictly speaking, a database
For comparison with the
Tree of Life, the NCBI operates a taxonomy server through a
relational-database engine, as part of the NCBI database. [http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html]
- Web pages have no formal structure
A web page can
be anything, and web sites can have any underlying structure.
The structure can change from one moment to the next. The
browser knows about nothing except the current page being
- Much of the data is in the form of text,
rather than structured types
In databases, every field has a type. For example, in DNA_SAMPLE,
the concentration field was a floating point number. Many
database programs would have a straightforward way of finding
all DNA_SAMPLEs whose concentration was 0.5 µg/µl or greater.
Although text on web sites is machine searchable, the lack of
formal types makes it difficult to write programs that work
reliably with the data.
- There is no formal definition of any
'object' on the Web
There can be as
many different ways of representing data as there are web
sites. At one site, literature references could be listed as
plain text, and at another, they could be implemented with
links to authors, journals, and electronic copies of
papers. At one site, a DNA sequence might be represented as
raw nucleotides, while at another, a sequence might be
available in a format such as GenBank, EMBL, or GCG. More
importantly, the lack of any formal definition of sequence
means that even at a single web site, each sequence could be
in a different format. Thus, there is no way to write a
program to handle a sequence from such a web site, because no
formal definition exists. As well, there is no way for
database software to do automated validation and error checking.
- Structuring of data is on an ad hoc basis
In a database,
well defined objects mean that the knowledge encoded in the
database has a predictable structure. For example, given an
object of the type Paper, we can be pretty sure that there
will be authors, a journal or book, dates, pages etc. On the
Web, links to other objects exist, or not, at the whim of the
author. Any web page can link to any other web page, rather
than to other web pages of a specific type.
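To make the point about typed fields concrete, here is a minimal sketch of the concentration query mentioned above. It assumes the DNA_SAMPLE objects are available as Python objects with a floating point concentration field; the sample names and values are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class DnaSample:
    name: str
    concentration: float   # µg/µl, always a number because the field is typed


samples = [
    DnaSample("sample-A", 0.25),
    DnaSample("sample-B", 0.5),
    DnaSample("sample-C", 1.2),
]

# Because every concentration is known to be a float, the comparison is safe.
concentrated = [s.name for s in samples if s.concentration >= 0.5]
print(concentrated)   # ['sample-B', 'sample-C']
```

The same question asked of free text on a web page would first require guessing where, and in what form, each concentration was written.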
Although it may seem
like a subtle difference, the Tree of Life IS a collection of
web pages, whereas the web pages visited at NCBI are generated
on the fly from the NCBI database. The web pages seen at NCBI
are therefore a view of the data.
It should be pointed
out that each approach has advantages and disadvantages. The
NCBI web site is formal and structured, but primarily serves to
encode a taxonomic structure, with links to database items such
as sequences or literature references. The Tree of Life is rich,
with images, articles, and other information, limited only by
the creativity of the contributors.