Next: Technological Alternatives Up: Pursuing the OPM Previous: The Multidatabase Directory

Supporting Multidatabase Queries

The current (first) version of the OPM multidatabase query translator [8] was developed between October 1995 and January 1996. This version supports queries that combine (join) and manipulate data from multiple MBDs, and relies on the information about the OPM schemas and remote access facilities of these MBDs that is contained in the Multidatabase Directory.

The multidatabase query processing strategy currently pursued involves two stages:

  1. OPM multidatabase queries are decomposed into component OPM queries for each component database involved in the query; these single-database OPM queries are evaluated using the existing OPM query translator [7].
  2. Data retrieved by the single-database OPM queries are assembled locally into the result of the multidatabase query; the local query processor is capable of performing joins and evaluating conditions over complex nested data structures.
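
The two stages above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the OPM translator itself: the in-memory lists stand in for remote component databases, the rows and the names `run_component_query` and `local_join` are hypothetical, and the accession values are invented. Only the class names and the `gdb_xref` attribute come from the text.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]

# Hypothetical stand-ins for two component databases (invented data).
GDB_GENES: List[Row] = [
    {"accession": "GDB:1", "symbol": "geneA"},
    {"accession": "GDB:2", "symbol": "geneB"},
    {"accession": "GDB:3", "symbol": "geneC"},
]
GSDB_GENES: List[Row] = [
    {"id": "GSDB:a", "gdb_xref": "GDB:1"},
    {"id": "GSDB:b", "gdb_xref": "GDB:3"},
    {"id": "GSDB:c", "gdb_xref": "GDB:9"},
]


def run_component_query(rows: List[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Stage 1: evaluate one single-database query (here, a plain filter)."""
    return [row for row in rows if predicate(row)]


def local_join(left: List[Row], right: List[Row], left_key: str, right_key: str) -> List[Row]:
    """Stage 2: assemble the component results locally with a hash join."""
    index: Dict[str, List[Row]] = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[left_key], [])]


def two_stage_query() -> List[Row]:
    # Each component query runs independently of the other.
    gdb = run_component_query(GDB_GENES, lambda r: r["symbol"] in {"geneA", "geneC"})
    gsdb = run_component_query(GSDB_GENES, lambda r: True)  # fetches every GSDB gene
    return local_join(gdb, gsdb, "accession", "gdb_xref")
```

Note that the second component query returns every GSDB gene, even though only two of them survive the join.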

Although this query processing strategy is simple and general, it can be inefficient for certain types of queries. Consider a query that selects a small number of genes from the GDB class Gene and then finds the related genes in the GSDB Gene class. Retrieving all the GDB and GSDB genes separately and then comparing their accession numbers would be inefficient; it would be cheaper to look up the GSDB genes directly, using the accession numbers of the genes retrieved from GDB. A more efficient strategy could find an order for the subqueries and evaluate them in sequence, so that the results of each subquery are used to restrict the next subquery in the sequence. However, such a strategy would be considerably more difficult to implement, since determining an optimal evaluation order requires statistics on the sizes of individual classes and the selectivity of constraints. Although we are considering such strategies for the future, in the short term we plan to increase the efficiency of multidatabase query processing by using inter-database links.
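
The sequenced alternative described in this paragraph can be illustrated as follows. This is a minimal sketch under assumptions, not the planned implementation: `ComponentDB` is a hypothetical stand-in for a remote database that counts the rows it ships, the evaluation order (GDB first, then GSDB) is fixed by hand rather than chosen from statistics, and the field name `accession` is invented.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, str]


class ComponentDB:
    """Hypothetical remote database that counts how many rows it ships."""

    def __init__(self, rows: List[Row]) -> None:
        self.rows = rows
        self.rows_shipped = 0

    def select(self, key: str, allowed: Optional[Set[str]] = None) -> List[Row]:
        """Return all rows, or only those whose `key` value is in `allowed`."""
        out = [r for r in self.rows if allowed is None or r[key] in allowed]
        self.rows_shipped += len(out)
        return out


def sequenced_lookup(gdb: ComponentDB, gsdb: ComponentDB, symbols: Set[str]) -> List[Row]:
    # First subquery: select a small number of GDB genes.
    selected = gdb.select("symbol", allowed=symbols)
    # Its result restricts the second subquery to the matching accessions.
    accessions = {gene["accession"] for gene in selected}
    return gsdb.select("gdb_xref", allowed=accessions)
```

With this ordering, GSDB ships only the genes whose `gdb_xref` matches a selected GDB accession, whereas an unrestricted `gsdb.select("gdb_xref")` would ship the whole class.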

Inter-database links are known connections between heterogeneous databases that are recorded in the Multidatabase Directory together with the metadata on component databases. An example of an inter-database link is the link between the Gene class in GSDB and the Gene class in GDB, represented by the attribute gdb_xref of class Gene in GSDB; this attribute contains GDB accession numbers and thus indirectly points to GDB Gene objects. Following such a link makes it possible to retrieve from a component database only the objects that are involved in specific links, instead of all the objects in a class. Following inter-database links also predetermines a query evaluation order.
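
One way to picture a recorded link is as a small record naming both ends of the connection. The sketch below is an assumption about shape only: the Multidatabase Directory format is not described here, so the `InterDatabaseLink` fields, the `follow_link` helper, and the target key name `accession` are all hypothetical. Only the GSDB/GDB Gene classes and the `gdb_xref` attribute are taken from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


@dataclass(frozen=True)
class InterDatabaseLink:
    """Hypothetical directory record describing one inter-database link."""
    source_db: str
    source_class: str
    source_attribute: str  # attribute holding the foreign accession number
    target_db: str
    target_class: str
    target_key: str        # assumed name of the key attribute in the target


# The example from the text: GSDB Gene.gdb_xref points to GDB Gene objects.
GENE_LINK = InterDatabaseLink("GSDB", "Gene", "gdb_xref", "GDB", "Gene", "accession")


def follow_link(link: InterDatabaseLink, source_objects: List[Row],
                target_objects: List[Row]) -> List[Row]:
    """Return only the target objects referenced by the given source objects.

    The filter runs over a local list here; in a real system the restriction
    would be sent to the target database instead.
    """
    wanted = {obj[link.source_attribute] for obj in source_objects
              if obj.get(link.source_attribute)}
    return [t for t in target_objects if t[link.target_key] in wanted]
```

Because the link has a direction (source attribute to target key), following it fixes which side is evaluated first.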

From the perspective of a user constructing OPM multidatabase queries, inter-database links look like regular OPM abstract attributes (which represent intra-database links), except that the result of following such a link will be an object in another database rather than an object in a different class of the same database. The Multidatabase Directory will therefore associate an attribute name with each inter-database link, augmenting the list of attribute names already associated with an OPM class. These attributes can then be used to include inter-database links in the attribute paths of a query.
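
The effect on attribute paths can be sketched as follows. This is illustrative Python, not OPM query syntax, and everything named here is an assumption: the link attribute name `gdb_gene`, the ordinary attributes `name`, `accession`, and `symbol`, and the two lookup tables are invented for the example.

```python
from typing import Dict, List, Tuple

ClassRef = Tuple[str, str]  # (database, class)

# Hypothetical ordinary attributes of each class.
CLASS_ATTRIBUTES: Dict[ClassRef, List[str]] = {
    ("GSDB", "Gene"): ["name", "gdb_xref"],
    ("GDB", "Gene"): ["accession", "symbol"],
}

# Hypothetical link attributes added by the directory: name -> target class.
LINK_ATTRIBUTES: Dict[ClassRef, Dict[str, ClassRef]] = {
    ("GSDB", "Gene"): {"gdb_gene": ("GDB", "Gene")},
}


def attributes_of(db: str, cls: str) -> List[str]:
    """List a class's own attributes, augmented with its link attributes."""
    return CLASS_ATTRIBUTES[(db, cls)] + sorted(LINK_ATTRIBUTES.get((db, cls), {}))


def resolve_path(db: str, cls: str, path: List[str]) -> List[Tuple[str, str, str]]:
    """Walk an attribute path, recording where each step lands."""
    steps = []
    for name in path:
        links = LINK_ATTRIBUTES.get((db, cls), {})
        if name in links:
            db, cls = links[name]  # the step crosses into another database
        elif name not in CLASS_ATTRIBUTES[(db, cls)]:
            raise KeyError(f"{name!r} is not an attribute of {db}.{cls}")
        steps.append((name, db, cls))
    return steps
```

For instance, `resolve_path("GSDB", "Gene", ["gdb_gene", "symbol"])` starts in GSDB and ends on the `symbol` attribute of the GDB Gene class, which is how a link would appear inside an ordinary-looking path.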

Inter-database links do not subsume the general multidatabase joins already implemented, but rather complement them: multiple MBDs can be queried using multidatabase joins (as in our current implementation), inter-database links, or a combination of the two. Users are therefore not confined to the links already determined and included in the MBD Link Library, but can also define their own correspondences between databases. Using a combination of multidatabase joins, inter-database links, and other locally performed data manipulations, it should be possible to express very general and efficient multidatabase queries.

To help users understand the semantics of multidatabase queries, the OPM multidatabase query processor will also provide support for interpreting queries in terms of the semantics of both the target MBDs and the query processing operations.






& Markowitz
Thu Mar 14 15:45:38 PST 1996