A database provides multiple autonomous applications with a centralized and homogeneous view of data. The data in a database are structured according to a schema (database definition) specified in a data definition language (DDL), and are manipulated using operations specified in a data manipulation language (DML). Data definition and manipulation languages are based on a data model that defines the semantics of the constructs and operations provided by these languages.
Managing data in multiple pre-existing databases entails dealing with their data distribution, system (e.g., DBMS) heterogeneity, and semantic (e.g., schema) heterogeneity. Approaches to managing heterogeneous databases include linking heterogeneous databases via the World Wide Web (WWW), organizing them into database federations or multidatabase systems, and constructing data warehouses. Common to these approaches is allowing component databases to preserve their autonomy, that is, their local definitions, applications, and policy of exchanging data with other databases (Bright et al. 1992).
Heterogeneous database systems have been traditionally classified by the type of schemas, extent of data sharing, and data access facilities they support.
Schemas supported by a heterogeneous database system include (Sheth and Larson 1990):
Thus, every database in a heterogeneous database system can provide a
subset of its schema as its export schema, which serves as its interface to
other databases; each database, in turn,
can have import schemas describing the export schemas of other
databases in its local DDL (Heimbigner and McLeod 1985).
The global schema of a heterogeneous database system can range from
a loose collection of export schemas to a fully integrated schema.
Similarly, local views of the system can range from a loose collection
of import schemas to an integration of the local schema with
all import schemas.
For example, a database federation can have a global
(federation) schema that provides users with a uniform view of
the federation and thus insulates them from the component databases,
or local views that provide users with multiple views of the
federation.
A data warehouse represents the materialization of a
global schema, that is, the warehouse database defined by the global
schema is loaded with data from the component databases.
Unlike database federations and data warehouses,
multidatabase systems are collections of
loosely coupled databases without global schemas.
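The relationship between local schemas, export schemas, and a materialized global schema can be sketched in a few lines of Python. This is only an illustration under stated assumptions: all database, relation, and attribute names (db_a, staff, person_id, and so on) are hypothetical, and the attribute-renaming mapping stands in for what would in practice be a much richer schema integration step.

```python
from typing import Dict, List

Row = Dict[str, object]
Database = Dict[str, List[Row]]

# Hypothetical component databases: relation name -> list of rows.
COMPONENT_DBS: Dict[str, Database] = {
    "db_a": {"staff": [{"id": 1, "name": "Ada", "salary": 100}]},
    "db_b": {"employee": [{"emp_no": 7, "full_name": "Bob"}]},
}

# Export schema: the subset of each local schema offered to other databases.
EXPORT_SCHEMAS = {
    "db_a": {"staff": ["id", "name"]},  # salary is kept local
    "db_b": {"employee": ["emp_no", "full_name"]},
}

# Assumed mapping from exported attributes to attributes of a global relation.
GLOBAL_MAPPINGS = {
    ("db_a", "staff"): {"id": "person_id", "name": "person_name"},
    ("db_b", "employee"): {"emp_no": "person_id", "full_name": "person_name"},
}


def export_view(db_name: str) -> Database:
    """Return only the exported relations and attributes of one database."""
    exported: Database = {}
    for relation, attrs in EXPORT_SCHEMAS[db_name].items():
        rows = COMPONENT_DBS[db_name][relation]
        exported[relation] = [{a: row[a] for a in attrs} for row in rows]
    return exported


def materialize_warehouse() -> List[Row]:
    """Copy exported data into the global schema; sources are left untouched."""
    warehouse: List[Row] = []
    for (db_name, relation), renames in GLOBAL_MAPPINGS.items():
        for row in export_view(db_name)[relation]:
            global_row: Row = {renames[a]: v for a, v in row.items()}
            global_row["source"] = db_name  # record where the row came from
            warehouse.append(global_row)
    return warehouse
```

Calling materialize_warehouse() yields rows expressed in the global attributes, while the component databases keep their local definitions and their non-exported data.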
Data sharing in a heterogeneous database system can be at the level of:
Links between individual data items (e.g., hypertext links) in different databases neither require nor need to comply with schema correlations across databases. When data sharing is based on schema correlations, data links must be consistent with the constraints entailed by these correlations, such as inter-database referential integrity constraints.
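As a minimal sketch of what an inter-database referential integrity constraint requires of data links, the following Python fragment reports links whose target item does not exist in the referenced database. The link representation (source_key, target_key) and all key values are assumptions made for illustration only.

```python
from typing import Dict, List, Set


def dangling_links(
    links: List[Dict[str, str]], target_keys: Set[str]
) -> List[Dict[str, str]]:
    """Return the links whose target key is absent from the target database."""
    return [link for link in links if link["target_key"] not in target_keys]


# Hypothetical keys of the data items in the referenced (target) database.
TARGET_KEYS = {"T1", "T2"}

# Hypothetical links from items in a source database to items in the target.
LINKS = [
    {"source_key": "S1", "target_key": "T1"},
    {"source_key": "S2", "target_key": "T9"},  # no such item in the target
]

violations = dangling_links(LINKS, TARGET_KEYS)
```

An empty result means every link satisfies the constraint; plain hypertext links carry no such guarantee.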
Data access facilities in a heterogeneous database system can range from:
Browsing across component databases is usually based on traversing WWW hyperlinks from data items in one database to data items in another, and does not require schema correlations. Querying a data warehouse amounts to querying a single database, in which the data of all component databases are represented according to the global schema of the warehouse. Querying multiple databases is carried out by expressing queries over the global schema of the heterogeneous database system or over the local views that component databases have of the system; query translators convert queries expressed over the global schema or local views into queries for component databases. Alternatively, a heterogeneous database system can be provided with a multidatabase query language (Litwin 1987) that allows expressing queries that refer directly to elements of component databases.
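The role of a query translator can be illustrated with a small Python sketch. It assumes a very restricted query form (a projection plus equality conditions over a single global relation) and a hypothetical attribute mapping; the names db_a, staff, person_id, and so on are invented, and real translators must handle far more general queries and mappings.

```python
from typing import Dict, List, Tuple

# Assumed mapping: (component database, local relation)
#   -> {global attribute: local attribute}
MAPPINGS = {
    ("db_a", "staff"): {"person_id": "id", "person_name": "name"},
    ("db_b", "employee"): {"person_id": "emp_no", "person_name": "full_name"},
}


def translate(
    select: List[str], where: Dict[str, object]
) -> Dict[str, Tuple[str, list]]:
    """Rewrite a query over global attributes into one query per component."""
    translated: Dict[str, Tuple[str, list]] = {}
    for (db_name, relation), g2l in MAPPINGS.items():
        # Project local attributes, renamed back to their global names.
        columns = ", ".join(f"{g2l[g]} AS {g}" for g in select)
        sql = f"SELECT {columns} FROM {relation}"
        params: list = []
        if where:
            conditions = []
            for g, value in where.items():
                conditions.append(f"{g2l[g]} = ?")
                params.append(value)  # values are passed as bound parameters
            sql += " WHERE " + " AND ".join(conditions)
        translated[db_name] = (sql, params)
    return translated


queries = translate(["person_name"], {"person_id": 7})
```

Each entry of queries pairs a component database with a query phrased in that database's local names, so the user never refers to the component schemas directly.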
Constructing a data warehouse is sometimes confused with consolidating
heterogeneous databases into a centralized database that subsumes and
replaces its component databases.
However, consolidating heterogeneous
databases is far more complex and expensive than constructing
data warehouses. Unlike data warehouses, which do not disturb component
databases, consolidation eventually discards component databases
and therefore requires consensus on common (global) names, data
structures, values, and policy. Furthermore, all existing applications
on component databases must be converted in order to comply with the
new consolidated elements. This conversion process is usually very
costly and not always feasible. Finally, manipulating
(e.g., updating) and maintaining (e.g., reorganizing) a large database
are inherently more complex processes
than they are for smaller component databases.
In this paper, we do not intend to discuss the advantages, disadvantages, architectures, or complexity of heterogeneous database systems providing different types of schemas, data sharing, and data access facilities. Such discussions are abundant in the literature on heterogeneous database systems (e.g., see Sheth and Larson 1990). We note only that the facilities provided by heterogeneous database systems are not sufficient for a comprehensive comparison of such systems. For example, determining that IGD and GT are both based on global schemas and support data warehouse querying provides an incomplete and therefore unsatisfactory comparison of these two systems.