Next: Criteria for Characterizing Up: Characterizing Heterogeneous Molecular Previous: Introduction

Classification Criteria for Heterogeneous Database Systems

A database provides multiple autonomous applications with a centralized and homogeneous view of data. The data in a database are structured according to a schema (database definition) specified in a data definition language (DDL), and are manipulated using operations specified in a data manipulation language (DML). Data definition and manipulation languages are based on a data model that defines the semantics of the constructs and operations provided by these languages.

Managing data in multiple pre-existing databases entails dealing with their data distribution, system (e.g., DBMS) heterogeneity, and semantic (e.g., schema) heterogeneity. Approaches to managing heterogeneous databases include linking heterogeneous databases via the World Wide Web (WWW), organizing them into database federations or multidatabase systems, and constructing data warehouses. Common to these approaches is allowing component databases to preserve their autonomy, that is, their local definitions, applications, and policy of exchanging data with other databases (Bright et al. 1992).

Heterogeneous database systems have been traditionally classified by the type of schemas, extent of data sharing, and data access facilities they support.

Schemas supported by a heterogeneous database system include (Sheth and Larson 1990):

  1. local views representing the schemas of component databases, expressed in the DDL of the local databases; and

  2. a global schema expressed in a common DDL, providing a unified view of the schemas of all component databases.

Thus, every database in a heterogeneous database system can provide a subset of its schema as its export schema interface to other databases; each database, in turn, can have import schemas describing the export schemas of other databases in its local DDL (Heimbigner and McLeod 1985). The global schema of a heterogeneous database system can range from a loose collection of export schemas to a fully integrated schema. Similarly, local views of the system can range from a loose collection of import schemas to an integration of the local schema with all import schemas. For example, a database federation can have a global (federation) schema that provides users with a uniform view of the federation and thus insulates them from the component databases, or local views that provide users with multiple views of the federation. A data warehouse represents the materialization of a global schema, that is, the warehouse database defined by the global schema is loaded with data from the component databases. Unlike database federations and data warehouses, multidatabase systems are collections of loosely coupled databases without global schemas.
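The relationship between local, export, and import schemas can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems: the class `ComponentDatabase`, the function `loose_global_schema`, and the database and relation names are all hypothetical, and schemas are reduced to plain mappings from relation names to attribute lists. The translation of an import schema into the importing database's own DDL is omitted.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ComponentDatabase:
    """Hypothetical component database; a schema maps relation -> attributes."""
    name: str
    local_schema: dict[str, list[str]]
    # Relations this database chooses to make visible to other databases.
    exported: set[str] = field(default_factory=set)
    # Export schemas of other databases, keyed by the other database's name.
    import_schemas: dict[str, dict[str, list[str]]] = field(default_factory=dict)

    def export_schema(self) -> dict[str, list[str]]:
        """Return the subset of the local schema offered to other databases."""
        return {rel: attrs for rel, attrs in self.local_schema.items()
                if rel in self.exported}

    def import_from(self, other: ComponentDatabase) -> None:
        """Record another database's export schema as an import schema."""
        self.import_schemas[other.name] = other.export_schema()


def loose_global_schema(databases: list[ComponentDatabase]) -> dict[str, list[str]]:
    """Loosest form of global schema: export schemas side by side, not integrated."""
    return {f"{db.name}.{rel}": attrs
            for db in databases
            for rel, attrs in db.export_schema().items()}


# Illustrative (made-up) component databases.
genes = ComponentDatabase(
    name="genes_db",
    local_schema={"gene": ["gene_id", "symbol"],
                  "curation_log": ["entry", "curator"]},
    exported={"gene"},  # curation_log stays private, preserving autonomy
)
maps = ComponentDatabase(
    name="maps_db",
    local_schema={"locus": ["locus_id", "chromosome", "gene_symbol"]},
    exported={"locus"},
)

maps.import_from(genes)
global_schema = loose_global_schema([genes, maps])
```

Here `global_schema` sits at the "loose collection of export schemas" end of the range described above; a fully integrated schema would additionally reconcile names and structures, for example by recognizing that `gene.symbol` and `locus.gene_symbol` denote the same attribute.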

Data sharing in a heterogeneous database system can be at the level of:

  1. linking specific data items in the component databases; or

  2. generic (schema driven) correlations across component databases.

Individual data item links (e.g., hypertext links) between databases do not require or comply with schema correlations across databases. For schema correlations, data links need to be consistent with the constraints entailed by these correlations, such as inter-database referential integrity constraints.
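As a rough illustration of an inter-database referential integrity constraint, the following Python sketch checks that every link from one component database points at a key that exists in the other. The function `dangling_links` and the sample identifiers are hypothetical and serve only to make the constraint concrete.

```python
def dangling_links(links, target_keys):
    """Return the (source, target) links whose target key is absent
    from the target database, i.e. the referential integrity violations."""
    return [(src, tgt) for src, tgt in links if tgt not in target_keys]


# Made-up links from loci in one database to gene identifiers in another.
locus_to_gene = [("locus_1", "G001"), ("locus_2", "G002"), ("locus_3", "G999")]

# Made-up primary keys currently present in the gene database.
gene_keys = {"G001", "G002", "G003"}

violations = dangling_links(locus_to_gene, gene_keys)
```

With schema-driven correlations, a non-empty `violations` list would signal an inconsistency that must be repaired; with individual hypertext links no such check is implied, so a link such as `("locus_3", "G999")` could go stale unnoticed.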

Data access facilities in a heterogeneous database system can range from:

  1. browsing across component databases; to

  2. querying a centralized data warehouse; to

  3. querying multiple databases.

Browsing across component databases is usually based on traversing WWW hyperlinks from data items in one database to data items in another, and does not require schema correlations. Querying a data warehouse amounts to querying a single database, where the data of all component databases are represented according to the global schema of the warehouse. Querying multiple databases is carried out by expressing queries over the global schema of the heterogeneous database system or over the local views of its component databases; query translators convert queries expressed over the global schema or local views to queries for component databases. Alternatively, a heterogeneous database system can be provided with a multidatabase query language (Litwin 1987) that allows expressing queries that refer directly to elements of component databases.
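The role of a query translator can be sketched in Python using the standard `sqlite3` module, with two in-memory databases standing in for component databases. This is a simplified illustration under stated assumptions, not the mechanism of any particular system: the `MAPPINGS` table, the functions `translate` and `query_global`, and all table and column names are hypothetical, and only single-relation selections with one equality condition are handled.

```python
import sqlite3

# Hypothetical mapping from one global relation with attributes
# (symbol, chromosome) to each component database's local table and columns.
MAPPINGS = {
    "db_a": {"table": "genes",
             "columns": {"symbol": "sym", "chromosome": "chrom"}},
    "db_b": {"table": "locus",
             "columns": {"symbol": "gene_symbol", "chromosome": "chrom_name"}},
}


def translate(attributes, where_attr, mapping):
    """Rewrite a selection over the global schema into SQL for one component."""
    cols = ", ".join(f'{mapping["columns"][a]} AS {a}' for a in attributes)
    local_where = mapping["columns"][where_attr]
    return f'SELECT {cols} FROM {mapping["table"]} WHERE {local_where} = ?'


def query_global(connections, attributes, where_attr, value):
    """Run the translated query on every component and merge the results."""
    rows = []
    for name, conn in connections.items():
        sql = translate(attributes, where_attr, MAPPINGS[name])
        rows.extend(conn.execute(sql, (value,)).fetchall())
    return sorted(set(rows))  # drop duplicates reported by several components


# Two component databases with different local schemas (illustrative rows).
db_a = sqlite3.connect(":memory:")
db_a.execute("CREATE TABLE genes (sym TEXT, chrom TEXT)")
db_a.executemany("INSERT INTO genes VALUES (?, ?)",
                 [("BRCA1", "17"), ("TP53", "17"), ("CFTR", "7")])

db_b = sqlite3.connect(":memory:")
db_b.execute("CREATE TABLE locus (gene_symbol TEXT, chrom_name TEXT)")
db_b.executemany("INSERT INTO locus VALUES (?, ?)",
                 [("TP53", "17"), ("HBB", "11")])

connections = {"db_a": db_a, "db_b": db_b}
result = query_global(connections, ["symbol", "chromosome"], "chromosome", "17")
```

The user states the query once, in terms of the global attribute names, and never sees `sym` or `gene_symbol`. A data warehouse would instead run the translated queries ahead of time and load the merged rows into a single database defined by the global schema.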

A data warehouse is sometimes confused with consolidating heterogeneous databases into a centralized database which subsumes and replaces its component databases. However, consolidating heterogeneous databases is far more complex and expensive than constructing data warehouses. Unlike data warehouses, which do not disturb component databases, consolidation eventually discards component databases and therefore requires consensus on common (global) names, data structures, values, and policy. Furthermore, all existing applications on component databases must be converted in order to comply with the new consolidated elements. This conversion process is usually very costly and not always feasible. Finally, manipulating (e.g., updating) and maintaining (e.g., reorganizing) a large database are inherently more complex processes than for smaller component databases.

In this paper we do not intend to discuss the advantages, disadvantages, architectures, or complexity of heterogeneous database systems providing different types of schemas, data sharing, and data access facilities. Such discussions are abundant in the literature on heterogeneous database systems (e.g., see Sheth and Larson 1990). We note only that the facilities provided by heterogeneous database systems are not sufficient for a comprehensive comparison of such systems. For example, determining that IGD and GT are both based on global schemas and support data warehouse querying provides an incomplete and therefore unsatisfactory comparison of these two systems.







& Markowitz
Tue Nov 14 17:16:09 PST 1995