Next: Converting Schemas Up: Data Models for Previous: Object-Protocol Model (OPM)

Comments

The Genera data model is a strict subset of OPM, and the EER model is less expressive than both OPM and ACEDB. OPM and ACEDB support different flavors of similar constructs, but each also has some constructs with no direct counterpart in the other model. The relational model can be used for expressing object structures, but only with substantially larger, and therefore less comprehensible, schemas than those of the object data models.
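The size difference between object and relational schemas can be illustrated with a minimal sketch. It assumes a toy object class description and a simple first-normal-form mapping in which every multivalued attribute is moved to its own table. The class `Gene`, its attributes, and the helper `to_relational` are hypothetical names chosen for illustration; this is not the mapping used by the OPM, EER, or Genera tools.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    type: str                  # a primitive type name or the name of another class
    multivalued: bool = False  # True for set- or list-valued attributes


@dataclass
class ObjectClass:
    name: str
    attributes: list[Attribute] = field(default_factory=list)


def to_relational(cls: ObjectClass) -> dict[str, list[str]]:
    """Map one object class to tables (table name -> column names).

    Hypothetical mapping: single-valued attributes become columns of the
    class table; each multivalued attribute gets a separate table that
    refers back to the owning object through its key.
    """
    key = f"{cls.name.lower()}_id"
    tables = {cls.name: [key]}
    for attr in cls.attributes:
        if attr.multivalued:
            # A relational column holds one atomic value, so a multivalued
            # attribute cannot stay in the class table.
            tables[f"{cls.name}_{attr.name}"] = [key, attr.name]
        else:
            tables[cls.name].append(attr.name)
    return tables


# One hypothetical object class with two multivalued attributes.
gene = ObjectClass("Gene", [
    Attribute("symbol", "string"),
    Attribute("aliases", "string", multivalued=True),
    Attribute("citations", "Reference", multivalued=True),
])

tables = to_relational(gene)
for table_name, columns in tables.items():
    print(table_name, columns)
```

Under these assumptions a single class definition expands into three tables, which suggests how a relational schema can grow relative to the object schema it represents.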

The discussion above has not considered the ASN.1 notation used at NCBI for specifying structured data files [16], since ASN.1 is not a data model per se. However, except for class hierarchies, the constructs of the reference data model can be expressed using ASN.1.

EERM, Genera, ACEDB, and OPM are affected by implementation considerations. For example, since the EER, Genera, and OPM tools target primarily relational DBMSs, the EERM, Genera, and OPM data types reflect the data types supported by these DBMSs; the null constraints in the ACEDB data model have different semantics than in other data models because they reflect ACEDB's implementation of tree structures.

A data model can be judged by the conciseness and clarity of the schemas expressed with its constructs. Some data models are easier to use for modeling certain structures than others, but all data models impose certain restrictions. A data model cannot be considered well defined if it has ambiguous constructs. Whether a data model is well defined is very important from a computer science point of view, but seems to be of little concern to developers and users of molecular biology databases.

Developers of molecular biology databases are somewhat concerned about the size of database schemas, since large schemas are hard to maintain, but very few (if any) seem to care about the clarity of the data model used for specifying these schemas. Scientists using molecular biology databases are not interested in the size, clarity, or even lack of ambiguity of database schemas, and are concerned only about the ease of interaction with these databases. Thus, ACEDB is popular because of its graphical user interfaces, tolerance of incomplete data, and data exchange and reorganization facilities; Genera is used mainly for its ability to provide high-level Web-based interfaces on top of (existing or new) Sybase databases; EER and OPM are known primarily because of their associated tools, which provide object-based interfaces on top of relational DBMSs.

The report on the first Meeting on Interconnection of Molecular Biology Databases [9] indicates that there is agreement on the need to use object data models for modeling molecular biology databases (MBDs), but no agreement on which data model or individual constructs are preferable. Developing a standard data model for MBDs has been promoted by some as a way of facilitating MBD interoperation. Standardization, however, could have a negative effect on MBD modeling. In particular, object constructs are not always appropriate for modeling molecular biology data: they are not sufficient for accurately modeling molecular biology laboratory experiments [4], and additional constructs are needed to better model sequences and maps. Such extensions should not be denied via standardization as long as they are justified and rigorously defined. Furthermore, standardization is not needed for interconnecting MBDs, which can be achieved by developing converters between the data models underlying MBDs. Most of these data models are already closely related. Translating from one data model to another depends only on providing clear and complete semantic definitions for the constructs of these data models. Unfortunately, such definitions are not always available.







VMMarkowitz@lbl.gov
Jul 13, 1995