A Molecular Biology Database Directory and Schema Library
Purpose
Constructing Multidatabase Systems
The purpose of the Molecular Biology Database Directory and Schema Library
is to document and assemble a variety of molecular biology databases (MBDs)
into a multidatabase system (federation).
The Directory constitutes the basis for the OPM multidatabase
query system that supports queries across multiple MBDs.
The Library will promote the reuse and sharing of MBD schemas
with the goal of reducing the time and complexity of the schema design
process. A scientist developing a new schema will be able to browse
the Library and start from the reference or alternative schema designs,
and thus make use of the experience gained in developing other schemas.
Exploring Multiple Heterogeneous Databases
The Directory facilitates exploring multiple
(possibly heterogeneous) MBDs. The Directory contains information
about multiple MBDs expressed in a common (OPM) format, and is used
by the OPM Multidatabase System.
The OPM multidatabase query system provides facilities for:
- processing ad hoc multidatabase queries via uniform OPM interfaces; and
- assisting scientists in specifying and interpreting multidatabase queries.
Incorporating an MBD into the Directory involves
constructing one or more OPM views of the MBD, and entering information
about this MBD and its views into the Directory together with
information about how it can be accessed.
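As a rough illustration of the registration step just described, the sketch below models a Directory entry as a record holding a database name, its views, and access information. All names here (`OPMView`, `DirectoryEntry`, `Directory`, the `access` string, and the example databases) are hypothetical and chosen only for illustration; the actual Directory format is not specified in this text.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class OPMView:
    """Hypothetical record for one OPM view of a database."""
    name: str
    classes: List[str]


@dataclass
class DirectoryEntry:
    """Hypothetical Directory record: a database, its views, and access info."""
    database: str
    access: str  # free-text placeholder for "how it can be accessed"
    views: List[OPMView] = field(default_factory=list)


class Directory:
    """Minimal in-memory stand-in for the Directory."""

    def __init__(self) -> None:
        self._entries: Dict[str, DirectoryEntry] = {}

    def register(self, entry: DirectoryEntry) -> None:
        # The text requires one or more views per incorporated database.
        if not entry.views:
            raise ValueError("an entry needs at least one OPM view")
        self._entries[entry.database] = entry

    def databases_with_class(self, class_name: str) -> List[str]:
        # List the databases that expose a class with this name in any view.
        return sorted(
            e.database
            for e in self._entries.values()
            if any(class_name in v.classes for v in e.views)
        )


directory = Directory()
directory.register(
    DirectoryEntry("ExampleMapDB", "placeholder access info",
                   [OPMView("default", ["Gene", "Locus"])])
)
directory.register(
    DirectoryEntry("ExampleSeqDB", "placeholder access info",
                   [OPMView("default", ["Gene", "Sequence"])])
)
```

With these two example entries, `directory.databases_with_class("Gene")` lists both databases, which is the kind of lookup a multidatabase query system would need before deciding where to send a query.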
Queries in an OPM-based multidatabase system are expressed
in the OPM multidatabase query language (OPM*QL).
OPM*QL extends OPM-QL with constructs needed for querying
multiple databases, such as constructs for expressing conditions
across multiple classes from distinct databases,
and for navigating between classes of multiple databases following
inter-database links.
Processing OPM*QL queries involves generating OPM-QL queries
over individual databases in the multidatabase system, and
combining the results of these queries
using a local query processor. The stages of generating OPM-QL
queries and manipulating data locally may be interleaved depending on
the particular query evaluation strategy being pursued.
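The processing stages described above can be sketched as follows. This is not OPM*QL or OPM-QL syntax, and it is not the actual query processor: the two functions standing in for member databases, the sample rows, and the `gene_ref` link attribute are all assumptions made for illustration. The sketch shows one possible interleaved strategy, in which the result of the first single-database query restricts the second, and the two results are then combined locally.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def query_genes(min_length: int) -> List[Row]:
    """Hypothetical single-database query over a gene database."""
    genes = [
        {"gene_id": "G1", "length": 1200},
        {"gene_id": "G2", "length": 300},
        {"gene_id": "G3", "length": 2500},
    ]
    return [g for g in genes if g["length"] >= min_length]


def query_sequences(gene_ids: List[str]) -> List[Row]:
    """Hypothetical single-database query over a sequence database."""
    sequences = [
        {"seq_id": "S10", "gene_ref": "G1"},
        {"seq_id": "S11", "gene_ref": "G3"},
        {"seq_id": "S12", "gene_ref": "G3"},
        {"seq_id": "S13", "gene_ref": "G2"},
    ]
    wanted = set(gene_ids)
    return [s for s in sequences if s["gene_ref"] in wanted]


def run_multidatabase_query(min_length: int) -> List[Row]:
    # Stage 1: run a single-database query against the first database.
    genes = query_genes(min_length)
    # Stage 2 (interleaved): use the first result to restrict the second query.
    sequences = query_sequences([g["gene_id"] for g in genes])
    # Stage 3: combine the results locally by following the inter-database link.
    genes_by_id = {g["gene_id"]: g for g in genes}
    return [
        {
            "gene_id": s["gene_ref"],
            "length": genes_by_id[s["gene_ref"]]["length"],
            "seq_id": s["seq_id"],
        }
        for s in sequences
    ]


result = run_multidatabase_query(1000)
```

A different evaluation strategy could issue both single-database queries independently and join everything locally; which is preferable depends on result sizes and on what each database can filter efficiently.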
Modeling by Schema Reuse
The Library can be employed for modeling purposes as follows:
- Top-Down Schema Reuse Modeling.
A user can first find the relevant schemas in the schema library,
and then copy and/or combine these schemas to form a draft
schema. Classes or attributes that are not needed in the
draft schema can be removed. The draft schema can then be
refined by adding new classes and attributes
or modifying the definitions of existing classes and attributes.
- Bottom-Up Class Reuse Modeling.
A user starts from a (possibly empty) draft schema.
Relevant classes from any schema in the library can then be selected
and included in the draft schema.
Classes related to a selected class
via abstract attributes or subclass-superclass relationships
can also be added to the draft schema.
This class addition process can be continued until all relevant
classes selected from the library are added to the draft schema.
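The bottom-up class addition process can be viewed as a graph traversal. The sketch below assumes a hypothetical library represented as a mapping from each class name to the classes it is related to (through abstract attributes or subclass-superclass relationships); the class names are invented. It adds every reachable related class automatically, whereas in the process described above the user decides which related classes to include.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical library: class name -> related class names.
LIBRARY: Dict[str, List[str]] = {
    "Gene": ["Locus", "Sequence"],
    "Locus": ["Chromosome"],
    "Sequence": [],
    "Chromosome": [],
    "Clone": ["Sequence"],
}


def add_with_related(draft: Set[str], selected: str,
                     library: Dict[str, List[str]]) -> Set[str]:
    """Return a new draft containing `selected` and all classes reachable from it."""
    result = set(draft)
    queue = deque([selected])
    while queue:
        name = queue.popleft()
        if name in result:
            continue  # already in the draft; do not expand it again
        result.add(name)
        queue.extend(library.get(name, []))
    return result


# Start from an empty draft schema and select one class.
draft = add_with_related(set(), "Gene", LIBRARY)
```

Here `draft` ends up holding `Gene`, `Locus`, `Sequence`, and `Chromosome`, while `Clone` is left out because nothing reachable from `Gene` refers to it.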
Search facilities provided by web browsers can be used for
finding relevant schema components for reuse. However, these facilities
are limited to keyword searches over an entire schema. No
facilities are provided for searching only certain types
of components (e.g., only attributes or only classes)
nor for finding similar components.
In the future, we plan to provide a schema reuse tool that will
find relevant schema components based on similarity
information about such components, and
help reuse schemas from the schema library by supporting
schema customization operations such as:
- merging schemas;
- merging classes and attributes;
- deleting classes and attributes;
- replacing classes;
- changing attribute value classes or constraints.
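To make a few of these customization operations concrete, here is a minimal sketch over a deliberately simplified schema representation: a dictionary mapping each class name to its attributes and their value classes. The representation, the function names, the primitive type names, and the example schemas are assumptions for illustration only; they are not the planned tool's interface, and real OPM schemas carry more information than this.

```python
from typing import Dict

# Simplified schema: class name -> {attribute name -> value class}.
Schema = Dict[str, Dict[str, str]]


def merge_schemas(first: Schema, second: Schema) -> Schema:
    """Union of two schemas; same-named classes have their attributes merged.

    On an attribute name clash, the definition from `second` wins.
    """
    merged = {cls: dict(attrs) for cls, attrs in first.items()}
    for cls, attrs in second.items():
        merged.setdefault(cls, {}).update(attrs)
    return merged


def delete_class(schema: Schema, cls: str) -> Schema:
    """Remove a class and every attribute whose value class is that class."""
    return {
        name: {a: v for a, v in attrs.items() if v != cls}
        for name, attrs in schema.items()
        if name != cls
    }


def change_value_class(schema: Schema, cls: str, attr: str,
                       new_value_class: str) -> Schema:
    """Return a copy of the schema with one attribute's value class changed."""
    if attr not in schema.get(cls, {}):
        raise KeyError(f"{cls}.{attr} is not defined")
    updated = {name: dict(attrs) for name, attrs in schema.items()}
    updated[cls][attr] = new_value_class
    return updated


# Two invented library schemas that both define a Gene class.
mapping = {
    "Gene": {"name": "STRING", "locus": "Locus"},
    "Locus": {"position": "INTEGER"},
}
sequencing = {
    "Gene": {"sequence": "Sequence"},
    "Sequence": {"residues": "STRING"},
}

draft = merge_schemas(mapping, sequencing)
draft = delete_class(draft, "Locus")
draft = change_value_class(draft, "Sequence", "residues", "TEXT")
```

Each function returns a new schema rather than modifying its input, so the library schemas stay unchanged while a draft is being customized.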