define attributes (data) and methods
can be extended, and inherit the attributes and methods of their parent
Waterfowl
    Bill(String), Webbed Feet, Neck(String)
Duck extends Waterfowl
    Bill("wide"), Neck("short")
    Walk("waddle"), Vocalization("quack")
Goose extends Waterfowl
    Bill("long"), Neck("long")
    Walk("waddle"), Vocalization("honk")
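The waterfowl hierarchy above could be written out roughly as follows. This is only a sketch in Python: the attribute and method names (bill, neck, walk, vocalize) are taken from the outline, but the constructor layout and the choice to make webbed feet a class-level attribute are assumptions made for illustration.

```python
class Waterfowl:
    """Parent class: holds the attributes and methods common to all waterfowl."""

    # Assumption: every waterfowl has webbed feet, so this lives on the class.
    webbed_feet = True

    def __init__(self, bill, neck, walk, vocalization):
        self.bill = bill                    # e.g. "wide" or "long"
        self.neck = neck                    # e.g. "short" or "long"
        self._walk = walk
        self._vocalization = vocalization

    def walk(self):
        return self._walk

    def vocalize(self):
        return self._vocalization


class Duck(Waterfowl):
    """Duck extends Waterfowl and only supplies its own values."""

    def __init__(self):
        super().__init__(bill="wide", neck="short",
                         walk="waddle", vocalization="quack")


class Goose(Waterfowl):
    """Goose extends Waterfowl and only supplies its own values."""

    def __init__(self):
        super().__init__(bill="long", neck="long",
                         walk="waddle", vocalization="honk")
```

Neither Duck nor Goose defines walk() or vocalize(); both inherit them from Waterfowl, which is the point of extending a parent class.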
Extensible Markup Language
XML is a markup language designed for objects. Languages such as Java,
C++, Perl and Python are designed to allow networked objects to
communicate with one another. The nature of the data must be hard-coded
into the programs, or into the database definition.
Another approach to data
interchange between objects is XML. XML is a protocol for
defining data file formats. It is not a file format per se. Rather,
XML standardizes how data files are described.
Although XML is still an
evolving set of protocols, several components are critical to XML:
- Data Definition
- XML file - an instance of data following the specifications of the DTD or
schema
- DTD - Document Type Definition
Remember, XML is not so much a language as a standard for describing a
data object. For each new type of data object it is necessary to write
a DTD to specify how the object is structured. The names of all
elements and their components are defined.
Sample DTD file for GenBank entries, used by gb2xml
by Phillipe Bouige: genbank.dtd.txt
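To give a feel for what a DTD looks like, here is a very small sketch. It is not the GenBank DTD used by gb2xml: the element names (sequence, name, residues) and the strands attribute are hypothetical, chosen only for illustration. The DTD is embedded in the XML document itself and read with Python's standard xml.dom.minidom module, which parses the declarations but does not validate the document against them.

```python
import xml.dom.minidom

# A tiny XML document carrying its own (hypothetical) DTD.
# The DTD names every element and says what each one may contain.
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE sequence [
  <!ELEMENT sequence (name, residues)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT residues (#PCDATA)>
  <!ATTLIST sequence strands CDATA #REQUIRED>
]>
<sequence strands="2">
  <name>example</name>
  <residues>ACGTN</residues>
</sequence>
"""

doc = xml.dom.minidom.parseString(DOCUMENT)
root = doc.documentElement

# The document type named in the DOCTYPE declaration.
doctype_name = doc.doctype.name

# Everything in the file arrives as text, including the strand count.
strands = root.getAttribute("strands")
name = root.getElementsByTagName("name")[0].firstChild.data
residues = root.getElementsByTagName("residues")[0].firstChild.data

print(doctype_name, name, strands, residues)
```

Note that the DTD can only say that strands is character data (CDATA); it has no way of saying that it must be a number.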
The DTD is limited to the description of data elements with simple text
fields. While this model is adequate for most text documents, the
description of more complex data elements requires a schema. A schema
(written in XML) describes relationships between data objects, as in a
database. Numeric elements can also be described, unlike DTDs, in which
numbers are treated as text. Constraints can be placed on data elements
(eg. nucleotide can have the value A,G,C,T,U or N; the number of strands
may have the value 1 or 2). The specification of constraints in the
schema makes it possible to validate an XML file.
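The kind of constraint checking that a schema makes possible can be imitated by hand. The sketch below is not schema validation as such (Python's standard library does not include a schema validator); it simply applies the two example constraints with ordinary code. The element and attribute names (sequence, residues, strands) are hypothetical, and the allowed strand counts of 1 and 2 are an assumption.

```python
import xml.etree.ElementTree as ET

ALLOWED_NUCLEOTIDES = set("AGCTUN")   # constraint on nucleotide codes
ALLOWED_STRANDS = {1, 2}              # assumed constraint on strand count


def check_sequence(xml_text):
    """Return a list of constraint violations; an empty list means valid."""
    problems = []
    root = ET.fromstring(xml_text)

    # Numeric constraint: strands must be a whole number from the allowed set.
    strands_text = root.get("strands", "")
    if not strands_text.isdigit() or int(strands_text) not in ALLOWED_STRANDS:
        problems.append("strands must be 1 or 2, got %r" % strands_text)

    # Enumerated constraint: every residue must be an allowed nucleotide code.
    residues = (root.findtext("residues") or "").strip().upper()
    bad = sorted(set(residues) - ALLOWED_NUCLEOTIDES)
    if bad:
        problems.append("invalid nucleotide codes: " + "".join(bad))

    return problems


good = '<sequence strands="2"><residues>ACGTN</residues></sequence>'
bad = '<sequence strands="3"><residues>ACXG</residues></sequence>'
print(check_sequence(good))
print(check_sequence(bad))
```

With a real schema, these rules would live in the schema file rather than in each program, so every application reading the data would apply the same checks.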
Usually, an application
will check each XML file to make sure it conforms to the DTD. If valid,
the XML file is read, and the corresponding data objects are created.
EXAMPLE: The XML file
U11716.xml was created from GenBank entry U11716
by gb2xml, using the GenBank DTD.
- XSL - XML stylesheet language
One of the most important uses for XML is as an alternative to HTML.
HTML is limited to displaying text in a fairly simple fashion. XML
allows the creation
of rich data types. However, XML files, unlike HTML, don't specify how
they are to be displayed in a browser. Therefore, XML stylesheets can be
written for each type of XML file, providing a consistent structure to XML
presentation. Most of the major web browsers (eg. Netscape, IE, Mozilla)
have at least some XML support.
a - In this
example, an application has been programmed to read a specific type of
XML file. An object is created within the application based on the
specifications in the XML file.
b - Conversely to a, an
object in the application is translated to an XML representation, and
written to a file.
c - This is a complex
example. Data from a binary database is written to an XML file. At top,
a program, specialized for this particular XML data type, reads the XML
file and creates an object.
If a stylesheet exists for
this XML data type, a browser can import the same XML file and render
it as a Web page. In this case, a complex page, including text
and a Java applet, is rendered.
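Cases a and b can be sketched in a few lines of Python using the standard xml.etree.ElementTree module. The Sequence class and the element names (sequence, name, residues) are hypothetical and stand in for whatever data type a real application would handle.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Sequence:
    """Hypothetical application object for a nucleotide sequence."""
    name: str
    strands: int
    residues: str


def sequence_from_xml(xml_text):
    """Case a: read XML text and create the corresponding object."""
    root = ET.fromstring(xml_text)
    return Sequence(
        name=root.findtext("name", default=""),
        strands=int(root.get("strands", "1")),
        residues=root.findtext("residues", default=""),
    )


def sequence_to_xml(seq):
    """Case b: translate an object into its XML representation."""
    root = ET.Element("sequence", strands=str(seq.strands))
    ET.SubElement(root, "name").text = seq.name
    ET.SubElement(root, "residues").text = seq.residues
    return ET.tostring(root, encoding="unicode")


original = Sequence(name="example", strands=2, residues="ACGT")
xml_text = sequence_to_xml(original)
restored = sequence_from_xml(xml_text)
print(xml_text)
print(restored == original)
```

Here the XML is kept in a string to keep the example short; to write it to a file as in case b, the string could be saved with an ordinary file write.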
XML is still evolving! Use with caution!
The standards for XML are still under development, particularly with
regard to XSL stylesheets, XML schemas, and XML browsers. As well, in
particular fields, numerous mutually incompatible DTDs may exist for
the same type of data object! For example, an XML file for a DNA
sequence or a protein, produced by one program, may be unreadable by
another program, even though both programs can read XML.
Most of the widely-used languages (Java, C++, Python, Perl) have
extensive libraries for reading, writing and manipulating XML objects.
For more information, see XML for
Molecular Biology by Paul Gordon