Library of Information Models for Biological Collections: Purpose
Main Purpose
The purpose of the Library of Information Models for Biological
Collections is to present and document a reference model
for biological collections, and record alternative models derived
from or related to this reference model.
The Library will promote the reuse and sharing of models (schemas)
with the goal of reducing the time and complexity of the schema design
process. A scientist developing a new schema will be able to browse
the Library and start from the reference or alternative schema designs,
and thus make use of the experience gained in developing other schemas.
Furthermore, the Library will aid the development of
software for interconnecting multiple biological collections databases
(BCDs) and eventually help scientists to
specify queries across multiple BCDs.
Modeling by Schema Reuse
The Library can be employed for modeling purposes as follows:
- Top-Down Schema Reuse Modeling.
A user can first find the relevant schemas in the schema library,
and then copy and/or combine these schemas to form a draft
schema. Classes or attributes that are not needed in the
draft schema can be removed. The draft schema can then be
refined by adding new classes and attributes
or by modifying the definitions of existing classes and attributes.
- Bottom-Up Class Reuse Modeling.
A user starts from a (possibly empty) draft schema.
Relevant classes from any schema in the library can then be selected
and included in the draft schema.
Classes related to a selected class
via abstract attributes or subclass-superclass relationships
can also be added to the draft schema.
This class addition process can be continued until all relevant
classes selected from the library are added to the draft schema.
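The bottom-up process above amounts to following class relationships until no new classes appear. The following is a minimal sketch of that idea, not part of the OPM toolkit: the class names, the `LIBRARY` dictionary, and the `add_with_related` helper are all hypothetical, and a schema is reduced to a mapping from each class to the classes it refers to (through abstract attributes or as superclasses).

```python
from collections import deque

# Hypothetical library schema: class name -> names of related classes
# (value classes of abstract attributes, plus superclasses).
LIBRARY = {
    "Specimen": ["Taxon", "CollectingEvent"],
    "CollectingEvent": ["Locality", "Agent"],
    "Taxon": ["Taxon"],  # e.g. a parent-taxon attribute
    "Locality": [],
    "Agent": [],
    "Loan": ["Specimen", "Agent"],
}

def add_with_related(draft, library, selected):
    """Add `selected` and every class reachable from it to `draft`."""
    queue = deque([selected])
    while queue:
        name = queue.popleft()
        if name in draft:
            continue  # already in the draft schema
        draft[name] = list(library[name])
        queue.extend(library[name])
    return draft

draft = add_with_related({}, LIBRARY, "CollectingEvent")
print(sorted(draft))  # ['Agent', 'CollectingEvent', 'Locality']
```

This sketch always adds every related class. Since the text describes the addition of related classes as optional, an interactive tool would presumably let the user accept or reject each one.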
Search facilities provided by web browsers can be used for
finding relevant schema components for reuse. However, these facilities
are limited to keyword searches over an entire schema. No
facilities are provided for searching only certain types
of components (e.g., only attributes or only classes)
or for finding similar components.
In the future, we plan to provide a schema reuse tool that will
find relevant schema components based on similarity
information about such components, and
help reuse schemas from the schema library by supporting
schema customization operations such as:
- merging schemas;
- merging classes and attributes;
- deleting classes and attributes;
- replacing classes;
- changing attribute value classes or constraints.
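To make these operations concrete, here is a rough sketch of how some of them might behave on a simplified schema representation. It does not reflect the planned tool or OPM syntax: a schema is assumed to be a plain mapping from class name to a mapping of attribute name to value class, and the function names, class names, and primitive type names (`STRING`, `INTEGER`) are illustrative only.

```python
def merge_schemas(a, b):
    """Union of two schemas; attributes of same-named classes are merged.
    On a conflicting attribute name, the definition from `b` wins."""
    merged = {cls: dict(attrs) for cls, attrs in a.items()}
    for cls, attrs in b.items():
        merged.setdefault(cls, {}).update(attrs)
    return merged

def delete_class(schema, cls):
    """Remove a class and every attribute whose value class it was."""
    return {
        c: {a: v for a, v in attrs.items() if v != cls}
        for c, attrs in schema.items()
        if c != cls
    }

def replace_class(schema, old, new):
    """Rename a class and retarget the attributes that referred to it."""
    return {
        (new if c == old else c): {a: (new if v == old else v) for a, v in attrs.items()}
        for c, attrs in schema.items()
    }

def change_value_class(schema, cls, attr, value_class):
    """Return a copy of the schema with one attribute's value class changed."""
    out = {c: dict(attrs) for c, attrs in schema.items()}
    out[cls][attr] = value_class
    return out

# Two hypothetical library schemas.
botany = {"Specimen": {"taxon": "Taxon", "sheet_no": "STRING"},
          "Taxon": {"name": "STRING"}}
zoology = {"Specimen": {"taxon": "Taxon", "preparation": "Preparation"},
           "Preparation": {"method": "STRING"}}

draft = merge_schemas(botany, zoology)
draft = delete_class(draft, "Preparation")
draft = replace_class(draft, "Taxon", "TaxonName")
draft = change_value_class(draft, "Specimen", "sheet_no", "INTEGER")
print(draft)
```

Each function returns a new schema rather than modifying its input, so the library schemas stay intact while a draft is being customized.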
Exploring Multiple Heterogeneous Databases
The Library will facilitate exploring multiple (possibly heterogeneous)
collections databases. The Library will eventually
contain information about multiple collections databases expressed
in a common (OPM) format. The OPM toolkit contains tools that
allow exploring such databases in the context of an
OPM Multidatabase System.
The multidatabase OPM tools provide facilities for:
- assembling heterogeneous databases into a multidatabase
system, while documenting their schemas and inter-database links;
- processing ad-hoc multidatabase queries via uniform OPM interfaces; and
- assisting scientists in specifying and interpreting multidatabase queries.
Incorporating a collections database into an OPM multidatabase system involves
constructing one or more OPM views of the database, and entering information
about this database and its views into the Library together with
information about how it can be accessed.
Queries in an OPM-based multidatabase system are expressed
in the OPM multidatabase query language (OPM*QL).
OPM*QL extends OPM-QL with constructs needed for querying
multiple databases, such as constructs for expressing conditions
across multiple classes from distinct databases,
and for navigating between classes of multiple databases following
inter-database links.
Processing OPM*QL queries involves generating OPM-QL queries
over individual databases in the multidatabase system, and
combining the results of these queries
using a local query processor. The stages of generating OPM-QL
queries and manipulating data locally may be interleaved depending on
the particular query evaluation strategy being pursued.
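The two stages described above (per-database queries, then local combination) can be illustrated with a small sketch. This is not OPM*QL or OPM-QL syntax and does not use the OPM toolkit: the two databases are stood in for by in-memory rows, and the names `HERBARIUM`, `GAZETTEER`, `query_database`, and `multidatabase_query`, as well as the `locality_id` link, are hypothetical.

```python
# Two hypothetical collections databases, stood in for by in-memory rows.
HERBARIUM = [
    {"specimen_id": "H1", "taxon": "Quercus alba", "locality_id": "L1"},
    {"specimen_id": "H2", "taxon": "Acer rubrum", "locality_id": "L2"},
]
GAZETTEER = [
    {"locality_id": "L1", "country": "USA"},
    {"locality_id": "L2", "country": "Canada"},
]

def query_database(rows, condition):
    """Stand-in for a single-database query: select the matching rows."""
    return [r for r in rows if condition(r)]

def multidatabase_query(genus, country):
    # Stage 1: generate and run one query per component database.
    specimens = query_database(
        HERBARIUM, lambda r: r["taxon"].startswith(genus + " "))
    localities = query_database(
        GAZETTEER, lambda r: r["country"] == country)
    # Stage 2: combine the results locally by following the
    # inter-database link (here, a shared locality_id).
    by_id = {loc["locality_id"]: loc for loc in localities}
    return [
        {**s, "country": by_id[s["locality_id"]]["country"]}
        for s in specimens
        if s["locality_id"] in by_id
    ]

print(multidatabase_query("Quercus", "USA"))
# [{'specimen_id': 'H1', 'taxon': 'Quercus alba', 'locality_id': 'L1', 'country': 'USA'}]
```

An interleaved strategy would instead run the specimen query first and pass the resulting `locality_id` values into the second query, trading an extra round trip for a smaller intermediate result.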