Library of Information Models for Biological Collections: Purpose

Main Purpose

The purpose of the Library of Information Models for Biological Collections is to present and document a reference model for biological collections, and record alternative models derived from or related to this reference model. The Library will promote the reuse and sharing of models (schemas) with the goal of reducing the time and complexity of the schema design process. A scientist developing a new schema will be able to browse the Library and start from the reference or alternative schema designs, and thus make use of the experience gained in developing other schemas. Furthermore, the Library will aid the development of software for interconnecting multiple biological collections databases (BCDs) and eventually help scientists to specify queries across multiple BCDs.

Modeling by Schema Reuse

The Library can be employed for modeling purposes as follows:
  1. Top-Down Schema Reuse Modeling. A user first finds the relevant schemas in the schema library, and then copies and/or combines these schemas to form a draft schema. Classes or attributes that are not needed in the draft schema can be removed. The draft schema can then be refined by adding new classes and attributes or by modifying the definitions of existing classes and attributes.
  2. Bottom-Up Class Reuse Modeling. A user starts from a (possibly empty) draft schema. Relevant classes from any schema in the library can then be selected and included in the draft schema. Classes related to a selected class via abstract attributes or subclass-superclass relationships can also be added to the draft schema. This class addition process can be continued until all relevant classes selected from the library have been added to the draft schema.
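The two strategies above can be illustrated with a minimal Python sketch. It is not part of the OPM toolkit: the `ClassDef` structure, the `PRIMITIVES` set, the example class names, and the `top_down` and `bottom_up` helpers are all hypothetical, and the sketch assumes that an abstract attribute is simply an attribute whose value class is another class in the schema. Note that `bottom_up` pulls in every related class automatically, whereas the text leaves that choice to the user.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional

# Assumed primitive value classes; anything else is treated as a class reference.
PRIMITIVES = {"STRING", "INTEGER", "DATE"}

@dataclass
class ClassDef:
    name: str
    superclass: Optional[str] = None
    # attribute name -> value class (a primitive or another class name)
    attributes: Dict[str, str] = field(default_factory=dict)

def related_classes(cls: ClassDef) -> set:
    """Classes reachable via the superclass link or abstract attributes."""
    related = {v for v in cls.attributes.values() if v not in PRIMITIVES}
    if cls.superclass:
        related.add(cls.superclass)
    return related

def top_down(library: Dict[str, ClassDef], unneeded: Iterable[str]) -> Dict[str, ClassDef]:
    """Copy a whole library schema, then drop the classes that are not needed."""
    drop = set(unneeded)
    return {name: cls for name, cls in library.items() if name not in drop}

def bottom_up(library: Dict[str, ClassDef], selected: Iterable[str],
              draft: Optional[Dict[str, ClassDef]] = None) -> Dict[str, ClassDef]:
    """Add the selected classes, plus every class related to them, to a draft."""
    draft = dict(draft or {})
    pending = list(selected)
    while pending:
        name = pending.pop()
        if name in draft or name not in library:
            continue
        draft[name] = library[name]
        pending.extend(related_classes(library[name]))
    return draft

# Hypothetical library schema, used for illustration only.
LIBRARY = {c.name: c for c in [
    ClassDef("Specimen", attributes={"catalog_number": "STRING",
                                     "taxon": "Taxon",
                                     "collected_at": "Locality"}),
    ClassDef("PreservedSpecimen", superclass="Specimen",
             attributes={"preservation_method": "STRING"}),
    ClassDef("Taxon", attributes={"name": "STRING"}),
    ClassDef("Locality", attributes={"name": "STRING"}),
    ClassDef("Loan", attributes={"specimen": "Specimen", "due": "DATE"}),
]}

draft = bottom_up(LIBRARY, ["PreservedSpecimen"])
print(sorted(draft))  # ['Locality', 'PreservedSpecimen', 'Specimen', 'Taxon']
```

Selecting `PreservedSpecimen` brings in its superclass `Specimen` and, through the abstract attributes of `Specimen`, the classes `Taxon` and `Locality`. `Loan` is left out because nothing selected refers to it.
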
Search facilities provided by web browsers can be used to find relevant schema components for reuse. However, these facilities are limited to keyword searches over an entire schema. They cannot restrict a search to certain types of components (e.g., only attributes or only classes), nor can they find similar components. In the future, we plan to provide a schema reuse tool that will find relevant schema components based on similarity information about such components, and will help users reuse schemas from the schema library by supporting schema customization operations such as:
  1. merging schemas;
  2. merging classes and attributes;
  3. deleting classes and attributes;
  4. replacing classes;
  5. changing attribute value classes or constraints.
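
To make some of these operations concrete, the following Python sketch implements merging, deleting, and changing attribute value classes over a deliberately simple representation: a schema is a dictionary mapping each class name to a dictionary of attribute names and value classes. The function names and the conflict rule (raise an error when two schemas disagree on an attribute's value class) are assumptions made for illustration; the planned tool may behave differently.

```python
from typing import Dict

Schema = Dict[str, Dict[str, str]]  # class name -> {attribute name: value class}

def merge_classes(a: Dict[str, str], b: Dict[str, str]) -> Dict[str, str]:
    """Union of two attribute sets; conflicting value classes are an error."""
    merged = dict(a)
    for attr, value_class in b.items():
        if attr in merged and merged[attr] != value_class:
            raise ValueError(f"conflicting value classes for attribute {attr!r}")
        merged[attr] = value_class
    return merged

def merge_schemas(s1: Schema, s2: Schema) -> Schema:
    """Union of two schemas; classes with the same name are merged."""
    merged = {name: dict(attrs) for name, attrs in s1.items()}
    for name, attrs in s2.items():
        merged[name] = merge_classes(merged.get(name, {}), attrs)
    return merged

def delete_class(schema: Schema, name: str) -> Schema:
    """Return a copy of the schema without the named class."""
    return {n: dict(a) for n, a in schema.items() if n != name}

def delete_attribute(schema: Schema, cls: str, attr: str) -> Schema:
    """Return a copy of the schema with one attribute removed from one class."""
    result = {n: dict(a) for n, a in schema.items()}
    result[cls].pop(attr, None)
    return result

def change_value_class(schema: Schema, cls: str, attr: str, new_class: str) -> Schema:
    """Return a copy of the schema with a new value class for one attribute."""
    result = {n: dict(a) for n, a in schema.items()}
    if attr not in result[cls]:
        raise KeyError(f"{cls} has no attribute {attr!r}")
    result[cls][attr] = new_class
    return result

# Hypothetical schemas, used for illustration only.
herbarium = {"Specimen": {"catalog_number": "STRING"}, "Loan": {"borrower": "STRING"}}
taxonomy = {"Specimen": {"taxon": "Taxon"}, "Taxon": {"name": "STRING"}}

draft = merge_schemas(herbarium, taxonomy)
draft = delete_class(draft, "Loan")
draft = change_value_class(draft, "Specimen", "catalog_number", "INTEGER")
print(draft)
```

Each function returns a new schema rather than modifying its input, so the library schemas stay unchanged while a draft is being customized.
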

Exploring Multiple Heterogeneous Databases

The Library will facilitate exploring multiple (possibly heterogeneous) collections databases. The Library will eventually contain information about multiple collections databases expressed in a common (OPM) format. The OPM toolkit contains tools that allow exploring such databases in the context of an OPM Multidatabase System. The multidatabase OPM tools provide facilities for:
  1. assembling heterogeneous databases into a multidatabase system, while documenting their schemas and inter-database links;
  2. processing ad-hoc multidatabase queries via uniform OPM interfaces; and
  3. assisting scientists in specifying and interpreting multidatabase queries.
Incorporating a collections database into an OPM multidatabase system involves constructing one or more OPM views of the database, and entering information about this database and its views into the Library together with information about how it can be accessed. Queries in an OPM-based multidatabase system are expressed in the OPM multidatabase query language (OPM*QL). OPM*QL extends OPM-QL with constructs needed for querying multiple databases, such as constructs for expressing conditions across multiple classes from distinct databases, and for navigating between classes of multiple databases following inter-database links. Processing OPM*QL queries involves generating OPM-QL queries over individual databases in the multidatabase system, and combining the results of these queries using a local query processor. The stages of generating OPM-QL queries and manipulating data locally may be interleaved depending on the particular query evaluation strategy being pursued.
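
The decompose-and-combine pattern described above can be sketched in Python. The sketch does not use OPM*QL or OPM-QL syntax or any OPM toolkit interface: the two in-memory lists stand in for separate databases, `query_single` stands in for a query over an individual database, and the shared `taxon_id` field plays the role of an inter-database link. All of these names and records are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]

# Two stand-in databases, each holding the records of one class.
SPECIMEN_DB: List[Record] = [
    {"specimen_id": "S1", "taxon_id": "T1", "year": 1994},
    {"specimen_id": "S2", "taxon_id": "T2", "year": 1987},
    {"specimen_id": "S3", "taxon_id": "T1", "year": 1979},
]
TAXON_DB: List[Record] = [
    {"taxon_id": "T1", "family": "Fagaceae"},
    {"taxon_id": "T2", "family": "Pinaceae"},
]

def query_single(db: List[Record], condition: Callable[[Record], bool]) -> List[Record]:
    """Stand-in for evaluating one query over an individual database."""
    return [record for record in db if condition(record)]

def query_multi(left_db: List[Record], left_cond: Callable[[Record], bool],
                right_db: List[Record], right_cond: Callable[[Record], bool],
                link: str) -> List[Record]:
    """Evaluate a two-database query and combine the results locally."""
    # Step 1: query the first database.
    left = query_single(left_db, left_cond)
    # Step 2 (interleaved): use the link values from step 1 to narrow
    # the query sent to the second database.
    keys = {record[link] for record in left}
    right = query_single(right_db, lambda r: r[link] in keys and right_cond(r))
    # Step 3: local join along the inter-database link.
    # Assumes the link value is unique in the second database.
    by_key = {record[link]: record for record in right}
    return [{**l, **by_key[l[link]]} for l in left if l[link] in by_key]

# Specimens collected after 1980 whose taxon belongs to the family Fagaceae.
result = query_multi(
    SPECIMEN_DB, lambda r: r["year"] > 1980,
    TAXON_DB, lambda r: r["family"] == "Fagaceae",
    link="taxon_id",
)
print(result)
```

Step 2 is one example of interleaving: the second single-database query is generated only after the first has returned, so that its results can restrict what is requested. A different evaluation strategy could issue both queries independently and do all filtering in the local join.
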