Biological Collections and Databases
Biological specimens in museum collections provide verifiable, physical documentation (vouchers) of the Earth's biodiversity and indispensable resources to the science of biological systematics. Access to collections data is crucial to both biodiversity assessment and the rapid progress of systematics, but currently access is impeded by heterogeneity among the hundreds of independently conceived and developed collections databases. To enhance the accessibility and value of collections data, the collections community must develop interoperability among its collection databases. A critical component of this interoperability is the semantic reconciliation of, or the mapping of correspondences among, our heterogeneous data structures, across both institutions and systematics disciplines.
Molokai cloud forest. Photo by Clyde Imada, courtesy Hawaii Biological Survey, Bishop Museum
The first attempt to describe biological collections data across disciplines, above the level of a single institution, and with a modern methodology was a 1992 NSF-sponsored workshop of the Association of Systematics Collections' Computerization and Networking Committee (ASC-CNC) at Cornell University (Julian Humphries and Janet Gomon, Co-Chairs). The workshop produced a high-level entity-relationship (ER) model (ASC, 1993) containing more than 50 entities and 100 pages of description, but relatively little detail about data elements. In 1992 the model was arguably ahead of its time, but it was subsequently used as the point of departure for further modeling and database development efforts. Awareness of the model's significance has grown in the intervening years, and there have been repeated calls for the model to be updated and made more explicit.
The ASC-CNC effort to integrate collections data across disciplines has now been revived through a year-long grant from the NSF Database Activities Program to the Bishop Museum (Allison and Blum, Co-PIs). The centerpiece of the project will be this Library of Information Models for Biological Collections. The library will function as a publicly accessible repository where models of collection databases can be deposited, compared, studied, and ultimately, recombined into templates for "next-generation" collection databases. Within the framework of this library, the ASC model will serve as a reference, or to use a navigational metaphor, a series of conceptual landmarks that will enable users to stay oriented while examining alternative data structures. The existing ASC model will be revised as input is gathered from local institutions through regional workshops and site visits during the first half of 1997. (If you would like to attend one of these regional workshops or schedule a site visit at your institution, please contact the "librarian", sblum@bishop.bishop.hawaii.org.)
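The "conceptual landmarks" idea can be illustrated with a minimal sketch. It assumes each deposited schema records, for every local field, the reference-model concept it corresponds to; two schemas can then be compared through those shared concepts rather than field by field. The database names, field names, and concept labels below are hypothetical and are not taken from the ASC model or from any deposited schema.

```python
from collections import defaultdict

# Hypothetical mappings: local field name -> reference concept ("Entity.attribute").
# None of these names come from the ASC model; they are placeholders.
BOTANY_DB = {
    "coll_name": "CollectingEvent.collector",
    "coll_date": "CollectingEvent.date",
    "sci_name": "Determination.taxon_name",
    "sheet_no": "CollectionObject.catalog_number",
}
FISH_DB = {
    "Collector": "CollectingEvent.collector",
    "DateCollected": "CollectingEvent.date",
    "Taxon": "Determination.taxon_name",
    "LotCount": "CollectionObject.specimen_count",
}


def correspondences(schema_a, schema_b):
    """Pair fields of two schemas that map to the same reference concept."""
    by_concept = defaultdict(list)
    for field, concept in schema_b.items():
        by_concept[concept].append(field)
    pairs = []
    for field, concept in schema_a.items():
        for other in by_concept.get(concept, []):
            pairs.append((field, other, concept))
    return pairs


def unmatched(schema_a, schema_b):
    """Fields of schema_a whose reference concept is absent from schema_b."""
    concepts_b = set(schema_b.values())
    return sorted(f for f, c in schema_a.items() if c not in concepts_b)


if __name__ == "__main__":
    for a, b, concept in correspondences(BOTANY_DB, FISH_DB):
        print(f"{a} <-> {b}  (via {concept})")
    print("Botany-only fields:", unmatched(BOTANY_DB, FISH_DB))
    print("Fish-only fields:", unmatched(FISH_DB, BOTANY_DB))
```

With this arrangement, each schema needs only one mapping to the reference model instead of a separate mapping to every other schema. Fields that have no counterpart are reported as unmatched, which points to places where the reference model or a local schema may need revision.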
The concept for this particular schema library, and the collaboration between the ASC and OPM projects, grew from a recent workshop at the San Diego Supercomputer Center (October 1996), in which a small group of collections information specialists and computer scientists met to discuss the ASC project and its relationship to current thinking in computer science. The technology to compile, maintain, and use the library is being developed as part of the OPM Project at the Lawrence Berkeley National Laboratory.