There are commercial distributed-join software tools, such as the Sybase Enterprise CONNECT family of products, that allow querying multiple relational databases. Such tools, however, do not help with constructing multidatabase systems: one still needs to understand the component databases in their relational representation, as well as their semantics and the links between them. The OPM tools support higher-level representations of databases, using abstract constructs that are better suited for representing biological data. In addition, the Multidatabase Directory substantially simplifies the task of exploring and understanding multiple databases.
A distributed-join tool could underlie the processing of OPM multidatabase queries: an OPM multidatabase query would first be translated into multidatabase SQL queries, these SQL queries would then be processed by the distributed-join tool, and the query results would finally be converted into OPM data format. This query processing strategy differs from our current strategy, which translates an OPM multidatabase query into OPM queries over the individual databases.
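To make the alternative strategy concrete, the following Python sketch mimics its three steps on toy data: flat tables stand in for two component databases, a hash join stands in for the distributed-join tool, and a grouping step stands in for conversion into OPM data format. All table names, column names, and functions (`hash_join`, `to_nested`) are hypothetical illustrations, not part of the OPM tools or of any commercial product. Note that the set-valued `keywords` attribute must live in its own table, so the flat plan needs an extra join just to reassemble it.

```python
from collections import defaultdict

# Hypothetical toy data; not drawn from any real database.
# Component database A stores a set-valued attribute (keywords) in its own
# table, as a relational schema must.
seq = [
    {"seq_id": 1, "locus": "locA"},
    {"seq_id": 2, "locus": "locB"},
]
seq_keyword = [
    {"seq_id": 1, "keyword": "repeat"},
    {"seq_id": 1, "keyword": "polymorphic"},
    {"seq_id": 2, "keyword": "coding"},
]
# Component database B maps each locus to a chromosome.
locus_map = [
    {"locus": "locA", "chromosome": "1"},
    {"locus": "locB", "chromosome": "17"},
]


def hash_join(left, right, key):
    """Equi-join two lists of rows on a shared column; a stand-in for the
    cross-database join a distributed-join tool would execute."""
    index = defaultdict(list)
    for rrow in right:
        index[rrow[key]].append(rrow)
    return [{**lrow, **rrow} for lrow in left for rrow in index.get(lrow[key], [])]


def to_nested(rows):
    """Regroup flat rows into one object per seq_id with a set-valued
    'keywords' attribute; a stand-in for converting results to OPM format."""
    objects = {}
    for row in rows:
        obj = objects.setdefault(row["seq_id"], {
            "seq_id": row["seq_id"],
            "locus": row["locus"],
            "chromosome": row["chromosome"],
            "keywords": set(),
        })
        obj["keywords"].add(row["keyword"])
    return list(objects.values())


# Steps 1-2: the translated flat query, executed as two joins (the first
# exists only to reassemble the set-valued attribute).
flat = hash_join(hash_join(seq, seq_keyword, "seq_id"), locus_map, "locus")
# Step 3: convert the flat result back into nested objects.
result = to_nested(flat)
print(result)
```

Each stage in this sketch is a separate pass over intermediate flat rows; in the real alternative strategy those stages would be split between a query translator, an external tool, and a result converter.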
Although we plan to examine this alternative query processing strategy in terms of cost and performance, we are aware of several problems inherent in it. First, a distributed-join tool can be used only for the data sources it supports: usually major commercial DBMSs or widely used standards, but not the more specialized data sources frequently used for molecular biology databases (e.g., ASN.1, ACeDB). For data sources not supported by such a tool, additional programming would still be required. Moreover, such a tool is restricted to the constructs of a standard relational query language (e.g., SQL). Such query languages have been found to be overly restrictive and difficult to use for querying complex MBDs: for example, an SQL query over the relational schema of GSDB 2.0 will in general involve substantially more tables, and will be considerably more complex, than an equivalent OPM query expressed over the OPM view of GSDB. Furthermore, the OPM multidatabase query translator is based on a more powerful nested relational algebra, which directly supports operations on nested sets and complex data structures. Finally, using a distributed-join tool for processing OPM multidatabase queries would make the performance of the OPM multidatabase query translator dependent on that tool. With our current query processing approach, we have the flexibility to experiment with any query optimization strategy, and we hope thereby to achieve better query performance.
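The advantage of operating directly on nested sets can be illustrated with two characteristic operators of nested relational algebras, nest and unnest. The Python sketch below is a minimal illustration on hypothetical data, not the OPM translator's implementation: a query that returns every sequence tagged with a given keyword, together with all of its keywords, is a single selection over the nested relation, whereas the flat formulation must unnest, filter, re-select the matching rows, and nest the result again. The functions `unnest` and `nest` and the relation `sequences` are illustrative names introduced here.

```python
from typing import Any

Row = dict[str, Any]


def unnest(rel: list[Row], attr: str) -> list[Row]:
    """Unnest: replace the set-valued attribute `attr` with one flat row per
    element. Rows whose set is empty disappear, as in the standard operator."""
    return [
        {**{k: v for k, v in r.items() if k != attr}, attr: x}
        for r in rel
        for x in sorted(r[attr])
    ]


def nest(rel: list[Row], attr: str) -> list[Row]:
    """Nest: group rows that agree on every attribute except `attr` and
    collect the `attr` values of each group into a frozenset."""
    groups: dict[tuple, set] = {}
    for r in rel:
        key = tuple(sorted((k, v) for k, v in r.items() if k != attr))
        groups.setdefault(key, set()).add(r[attr])
    return [{**dict(key), attr: frozenset(vals)} for key, vals in groups.items()]


# Hypothetical nested relation: each sequence carries a set of keywords.
sequences = [
    {"seq_id": 1, "keywords": frozenset({"repeat", "polymorphic"})},
    {"seq_id": 2, "keywords": frozenset({"coding"})},
]

# Nested plan: one selection with a set-membership predicate; each answer
# keeps its complete keyword set.
nested_answer = [r for r in sequences if "repeat" in r["keywords"]]

# Flat plan: unnest, find the matching ids, re-select all of their rows
# (a semijoin of the flat table with itself), then nest again.
flat = unnest(sequences, "keywords")
ids = {r["seq_id"] for r in flat if r["keywords"] == "repeat"}
flat_answer = nest([r for r in flat if r["seq_id"] in ids], "keywords")

print(nested_answer == flat_answer)
```

Round-tripping `nest(unnest(r, a), a)` reproduces `r` here only because no keyword set is empty; in general unnest drops tuples with empty sets, which is one way a flat encoding loses information that a nested model keeps.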