Electronic Notebooks (ENs) are used to manage scientific experiments and to record the data these experiments generate. ENs must provide facilities for capturing, organizing, analysing, and structuring information, as well as efficient access to that information [11].
To keep track of and manipulate experimental data, ENs need data management facilities. Commercial database management systems (DBMSs) provide facilities for data storage and manipulation, but they do not provide the capabilities ENs require for tasks such as describing, browsing, and querying data in terms familiar to scientists. These capabilities are required because scientific data are extremely complex and because developing scientific databases can be error-prone and time-consuming.
We propose to develop a toolkit providing EN-specific data management facilities in the context of the Object-Protocol Model (OPM) and the OPM tools. The EN toolkit will allow the construction of diverse EN data repositories that consist of multiple databases and files and interoperate within a CORBA architecture.
The Object-Protocol Model (OPM) is an object data model that supports the specification of database schemas in terms of objects and protocols [1]. OPM loosely follows the ODMG-93 standard [12], extending it with additional constructs for modeling scientific experiments (protocols) and versions. The value of OPM lies in its data management tools, which provide facilities for constructing databases with commercial relational DBMSs, such as Sybase and Oracle; for constructing OPM views of existing relational databases and certain structured files; and for querying such databases in the OPM query language through OPM interfaces based on the OPM query translator (OPM-QLT). Furthermore, OPM provides an open framework for adding new constructs deemed important for large classes of applications and for developing new tools that can be combined with the core OPM tools to achieve new goals. For example, the OPM multidatabase (OPM*) tools, currently under development, provide facilities for constructing and exploring multidatabase systems involving heterogeneous databases.
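The distinction between objects and protocols can be illustrated with a minimal sketch. It is written in Python rather than in OPM's own schema or query language, and every name in it (Sample, GelImage, GelRun, images_for_organism) is hypothetical. It also assumes, for illustration only, that a protocol relates the objects an experiment step consumes to the objects it produces.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Sample:
    """Hypothetical object class: a biological sample."""
    sample_id: str
    organism: str


@dataclass
class GelImage:
    """Hypothetical object class: an image produced by an experiment."""
    image_id: str
    file_path: str


@dataclass
class GelRun:
    """Hypothetical protocol class: one experiment step.

    Assumption: a protocol links the objects it consumes (inputs)
    to the objects it produces (outputs).
    """
    run_id: str
    inputs: List[Sample]
    outputs: List[GelImage] = field(default_factory=list)


def images_for_organism(runs: List[GelRun], organism: str) -> List[str]:
    """Return ids of images produced by runs that used a sample of `organism`."""
    return [
        image.image_id
        for run in runs
        if any(sample.organism == organism for sample in run.inputs)
        for image in run.outputs
    ]
```

A query phrased in these terms ("which images came from experiments on a given organism?") follows the experiment structure directly, without reference to the tables that would store it in a relational DBMS.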
OPM and the OPM data management tools have been successfully used for developing several large genome community databases, such as the Genome Data Base (GDB) at the Johns Hopkins School of Medicine in Baltimore and the Protein Data Bank (PDB) at the Brookhaven National Laboratory. OPM documents, papers, and data management tools are available from http://gizmo.lbl.gov/opm.html.
To support ENs, the OPM toolkit will be extended with additional facilities that support (i) special data types (e.g., images and graphs) and operations on such data types (e.g., the visualization, analysis, and annotation of images and graphs); (ii) the construction and maintenance of EN repositories consisting of multiple databases and files; and (iii) EN-specific operations, such as experiment tracking and steering. Furthermore, to facilitate operation over heterogeneous distributed environments, EN repositories will be developed in the framework of a CORBA architecture.
The rest of this document is organized as follows. Section 2 contains a brief overview of OPM. The OPM data management tools are reviewed in Section 3. The OPM Web-based interfaces are described in Section 4. A CORBA-based architecture for ENs is presented in Section 5. Section 6 describes the OPM extensions required for supporting ENs.
This document is closely related to