A Molecular Biology Database Directory and Schema Library

Purpose

Constructing Multidatabase Systems

The purpose of the Molecular Biology Database Directory and Schema Library is to document and assemble a variety of molecular biology databases (MBDs) into a multidatabase system (federation). The Directory constitutes the basis for the OPM multidatabase query system, which supports queries across multiple MBDs. The Library will promote the reuse and sharing of MBD schemas, with the goal of reducing the time and complexity of the schema design process. A scientist developing a new schema will be able to browse the Library, start from reference or alternative schema designs, and thus benefit from the experience gained in developing other schemas.

Exploring Multiple Heterogeneous Databases

The Directory facilitates exploring multiple (possibly heterogeneous) MBDs. It contains information about multiple MBDs expressed in a common (OPM) format, and is used by the OPM Multidatabase System. The OPM multidatabase query system provides facilities for:
  1. processing ad-hoc multidatabase queries via uniform OPM interfaces; and
  2. assisting scientists in specifying and interpreting multidatabase queries.
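To illustrate the kind of information the Directory records, the following Python sketch keeps, for each MBD, a description, an access descriptor, and its OPM views, and answers one lookup a multidatabase query planner might need: which MBDs define a given class. All names (Directory, DirectoryEntry, OPMClass, databases_with_class, access_for) are hypothetical, and the OPM class model is reduced to a class name plus attribute names. This is a minimal sketch, not the actual Directory format.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class OPMClass:
    """Simplified OPM class: a name and its attribute names (hypothetical model)."""
    name: str
    attributes: list[str] = field(default_factory=list)


@dataclass
class DirectoryEntry:
    """Hypothetical Directory record describing one MBD."""
    mbd_name: str
    description: str
    access: str  # hypothetical access descriptor, e.g. a server address
    # One MBD may have several OPM views: view name -> classes in that view.
    views: dict[str, list[OPMClass]] = field(default_factory=dict)


class Directory:
    """In-memory registry of MBDs, keyed by MBD name."""

    def __init__(self) -> None:
        self._entries: dict[str, DirectoryEntry] = {}

    def register(self, entry: DirectoryEntry) -> None:
        # Re-registering an MBD replaces its previous entry.
        self._entries[entry.mbd_name] = entry

    def access_for(self, mbd_name: str) -> str:
        """How to reach the named MBD (raises KeyError if unregistered)."""
        return self._entries[mbd_name].access

    def databases_with_class(self, class_name: str) -> list[str]:
        """Sorted names of MBDs whose OPM views define a class with this name."""
        return sorted(
            name
            for name, entry in self._entries.items()
            if any(
                cls.name == class_name
                for classes in entry.views.values()
                for cls in classes
            )
        )
```

Keeping several views per MBD mirrors the point that an MBD may be described by one or more OPM views; a lookup by class name then searches across all of them.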
Incorporating an MBD into the Directory involves constructing one or more OPM views of the MBD, and entering information about this MBD and its views into the Directory together with information about how it can be accessed. Queries in an OPM-based multidatabase system are expressed in the OPM multidatabase query language (OPM*QL). OPM*QL extends OPM-QL with constructs needed for querying multiple databases, such as constructs for expressing conditions across multiple classes from distinct databases, and for navigating between classes of multiple databases following inter-database links. Processing OPM*QL queries involves generating OPM-QL queries over individual databases in the multidatabase system, and combining the results of these queries using a local query processor. The stages of generating OPM-QL queries and manipulating data locally may be interleaved depending on the particular query evaluation strategy being pursued.
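The two processing stages described above can be sketched in Python for the simplest case: one condition per database plus one inter-database link. This is a minimal sketch under stated assumptions and does not reproduce OPM*QL or OPM-QL syntax. Member databases are simulated as in-memory dictionaries, single_db_query is a hypothetical stand-in for sending an OPM-QL query to one MBD, and the local combination step is a hash join on the link attribute.

```python
from __future__ import annotations

from typing import Callable

Record = dict[str, object]          # one object: attribute name -> value
Database = dict[str, list[Record]]  # simulated MBD: class name -> objects


def single_db_query(db: Database, class_name: str,
                    where: Callable[[Record], bool]) -> list[Record]:
    """Hypothetical stand-in for an OPM-QL query evaluated by one MBD."""
    return [obj for obj in db.get(class_name, []) if where(obj)]


def multidatabase_query(
    db1: Database, class1: str, where1: Callable[[Record], bool], link_attr: str,
    db2: Database, class2: str, where2: Callable[[Record], bool], key_attr: str,
) -> list[tuple[Record, Record]]:
    """Pairs (x, y) with x in class1 of db1 and y in class2 of db2 such that
    both conditions hold and x[link_attr] == y[key_attr]."""
    # Stage 1: generate and evaluate one single-database query per MBD.
    left = single_db_query(db1, class1, where1)
    right = single_db_query(db2, class2, where2)

    # Stage 2: combine the results locally by following the inter-database
    # link; a hash join indexes the db2 results by their key attribute.
    index: dict[object, list[Record]] = {}
    for y in right:
        index.setdefault(y[key_attr], []).append(y)
    return [(x, y) for x in left for y in index.get(x.get(link_attr), [])]
```

Here both single-database queries run before any local processing. A strategy that interleaves the stages could instead use the link values returned by the first query to restrict the query sent to the second database.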

Modeling by Schema Reuse

The Library can be employed for modeling purposes as follows:
  1. Top-Down Schema Reuse Modeling. A user can first find the relevant schemas in the schema library, and then copy and/or combine these schemas to form a draft schema. Classes or attributes that are not needed in the draft schema can be removed. The draft schema can then be refined by adding new classes and attributes or by modifying the definitions of existing classes and attributes.
  2. Bottom-Up Class Reuse Modeling. A user starts from a (possibly empty) draft schema. Relevant classes from any schema in the library can then be selected and included in the draft schema. Classes related to a selected class via abstract attributes or subclass-superclass relationships can also be added to the draft schema. This class addition process can be continued until all relevant classes selected from the library have been added to the draft schema.
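The class addition process of bottom-up reuse amounts to computing a closure over class relationships. The Python sketch below assumes a hypothetical library layout (schema name, then class name, then ClassDef) and, unlike the interactive process described above, adds every related class automatically instead of letting the user choose. Starting from the selected classes, it follows superclasses, subclasses, and the value classes of abstract attributes breadth-first. The names ClassDef, related_classes, and bottom_up_draft are illustrative only.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass, field


@dataclass
class ClassDef:
    name: str
    superclasses: list[str] = field(default_factory=list)
    # Abstract attributes: attribute name -> value class in the same schema.
    abstract_attrs: dict[str, str] = field(default_factory=dict)


Library = dict[str, dict[str, ClassDef]]  # schema name -> class name -> ClassDef
ClassRef = tuple[str, str]                # (schema name, class name)


def related_classes(schema: dict[str, ClassDef], name: str) -> set[str]:
    """Classes linked to `name` by a superclass, subclass, or abstract attribute."""
    cls = schema[name]
    related = set(cls.superclasses) | set(cls.abstract_attrs.values())
    related |= {c.name for c in schema.values() if name in c.superclasses}
    return {r for r in related if r in schema}  # ignore references outside the schema


def bottom_up_draft(library: Library, selected: list[ClassRef],
                    draft: set[ClassRef] | None = None) -> set[ClassRef]:
    """Add the selected classes, then keep adding related classes until none remain."""
    result = set(draft or ())
    visited: set[ClassRef] = set()
    queue = deque(selected)
    while queue:
        ref = queue.popleft()
        if ref in visited:
            continue
        visited.add(ref)
        result.add(ref)
        schema_name, class_name = ref
        for other in related_classes(library[schema_name], class_name):
            queue.append((schema_name, other))
    return result
```

Classes are identified by (schema, class) pairs so that identically named classes from different library schemas are not conflated.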
Search facilities provided by web browsers can be used for finding relevant schema components for reuse. However, these facilities are limited to keyword searches over an entire schema: they provide no way to search only certain types of components (e.g., only attributes or only classes), nor to find similar components. In the future, we plan to provide a schema reuse tool that will find relevant schema components based on similarity information about such components, and will help users reuse schemas from the schema library by supporting schema customization operations such as:
  1. merging schemas;
  2. merging classes and attributes;
  3. deleting classes and attributes;
  4. replacing classes;
  5. changing attribute value classes or constraints.
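The two search capabilities that browser keyword search lacks, restricting a search to one component type and finding similar components, can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the library layout (schema, then class, then attribute names) and the names Component, flatten, search, and similar are hypothetical, and name similarity is computed with the standard difflib string matcher as a stand-in, since the similarity information the planned tool will use is not specified.

```python
from __future__ import annotations

from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Component:
    schema: str
    kind: str  # "class" or "attribute"
    name: str  # attributes are qualified as "Class.attribute"


def local_name(qualified: str) -> str:
    """Last dotted segment, lower-cased: "Gene.symbol" -> "symbol"."""
    return qualified.split(".")[-1].lower()


def flatten(library: dict[str, dict[str, list[str]]]) -> list[Component]:
    """Hypothetical library layout: schema -> class -> attribute names."""
    components: list[Component] = []
    for schema, classes in library.items():
        for cls, attrs in classes.items():
            components.append(Component(schema, "class", cls))
            components.extend(Component(schema, "attribute", f"{cls}.{a}") for a in attrs)
    return components


def search(components: list[Component], keyword: str,
           kind: str | None = None) -> list[Component]:
    """Keyword search on component names, optionally restricted to one type."""
    kw = keyword.lower()
    return [c for c in components
            if (kind is None or c.kind == kind) and kw in local_name(c.name)]


def similar(components: list[Component], name: str, kind: str,
            threshold: float = 0.6) -> list[Component]:
    """Components of the given type whose names resemble `name`, best match first.

    difflib string similarity is only a stand-in for real similarity information.
    """
    target = local_name(name)
    scored = [(SequenceMatcher(None, target, local_name(c.name)).ratio(), c)
              for c in components if c.kind == kind]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for score, c in scored if score >= threshold]
```

Matching on the unqualified name keeps an attribute search for "gene" from returning every attribute of a class named Gene.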