
Semantics of Data Exploration

MBDs are usually explored via specially constructed schemas or views. These views do not necessarily preserve the information capacity [16,17] of the component MBDs. The tool-based strategy described in the next section, for example, involves constructing OPM views of component MBDs, where an OPM view may entail constraints (e.g., referential integrity constraints) that are not enforced in the underlying MBD. Consequently, access to an underlying MBD through an OPM view is restricted to the data that comply with these constraints; all other data are discarded. Discrepancies between the information capacities of the views employed for exploring heterogeneous MBDs and those of the underlying component MBDs can be a source of confusion if they are not properly documented and explained. This is especially critical for data warehouses, where data are converted from the format of the component MBDs into that of the data warehouse and are subsequently physically loaded into the data warehouse. Anecdotal evidence suggests that information loss occurs during some of the data conversion processes underlying IGD, but the causes and extent of this information loss have not been examined or documented.
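To make the information-capacity discrepancy concrete, here is a minimal Python sketch. It assumes a view that enforces a referential integrity constraint that the component MBD itself does not enforce. The Locus record, its map_id field, and the function view_with_referential_integrity are hypothetical illustrations. They are not part of OPM, IGD, or any actual MBD. Records whose references do not resolve are excluded from the view, so the loss goes unnoticed unless the discarded records are reported.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    # Hypothetical record from a component MBD. map_id refers to a map entry,
    # but the component MBD does NOT check that the entry exists.
    name: str
    map_id: str


def view_with_referential_integrity(loci, map_ids):
    """Split loci into those admitted by a (hypothetical) view that enforces
    'every locus references an existing map entry' and those it discards."""
    kept, discarded = [], []
    for locus in loci:
        if locus.map_id in map_ids:
            kept.append(locus)
        else:
            discarded.append(locus)
    return kept, discarded


# Hypothetical component data: D2 carries a dangling reference to map "m9".
loci = [Locus("D1", "m1"), Locus("D2", "m9"), Locus("D3", "m2")]
map_ids = {"m1", "m2"}

kept, discarded = view_with_referential_integrity(loci, map_ids)
print([l.name for l in kept])       # ['D1', 'D3']
print([l.name for l in discarded])  # ['D2'], invisible to users of the view
```

Returning the discarded records alongside the admitted ones is one way to document the discrepancy instead of hiding it. A view that returned only the first list would silently present a smaller database than the component MBD actually holds.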

Often the power of the facilities provided for exploring heterogeneous MBDs is not properly characterized. The query capabilities provided by MBDs range between two extremes. At one end, systems such as SRS [10] support only queries with limited keyword-matching capabilities; Entrez [21] provides two query interfaces, NetEntrez and WebEntrez, both of which support form-based queries; and the query language of IGD consists of, and is limited to, the ACeDB query language. At the other end, systems such as Kleisli allow users to query databases using powerful programming languages such as CPL. Although users of these systems can submit very complex queries, it is difficult to imagine a biologist mastering such languages.

Often, interfaces for exploring heterogeneous MBDs provide no help in clarifying the semantics of the queries users specify or in interpreting the semantics of the query results. For example, the Entrez interface provides a number of query forms and various choices of attributes on which to search, but offers no description of the extents over which these queries search or of the semantics of the individual attributes. These problems are compounded by the extremely large and complex molecular biology nomenclature and by differing interpretations of this nomenclature within the molecular biology community.

Users of many MBD systems, such as Entrez and ENQUire (see http://csl.ncsa.uiuc.edu:80/ENQUire/), interact directly with a Web query interface and may not be aware that a global schema exists. It is difficult or even impossible for users of these systems to detect whether component MBDs conflict, or to learn how such conflicts are resolved. Consequently, when a user receives uninterpreted answers to a query involving conflicting MBDs, all information about the conflicts and how they were resolved remains hidden from the user.






& Markowitz
Thu Mar 14 15:45:38 PST 1996