next up previous contents
Next: The OPM Multidatabase Up: OPM*QS: The Object-Protocol Previous: The OPM Data

The OPM Multi-Database Query Language

The Object Protocol Model multi-database query language (OPM*QL) supports ad hoc queries across multiple databases that have an OPM interface.

OPM*QL is intended to be similar to the single-database OPM query language, OPM_QL. However, unlike OPM_QL, OPM*QL supports only retrieval (i.e., ``select'') queries, and does not support inserting, deleting or updating multiple databases. OPM*QL also includes a number of extentions to OPM_QL necessary for dealing with multiple databases. These extensions include the ability to query multiple classes, possibly from distinct databases, constructs that allow navigation between the classes of multiple databases, and the ability to rename fields of a query in order to resolve potential naming conflicts between multiple databases.

An example of a simple OPM query over database GSDB is:

        SELECT ID = GSDB:Gene, GSDBRef = GSDB:Gene.gdb_xref
        FROM GSDB:Gene
        WHERE GSDB:Gene.gdb_xref IS NOT NULL
        AND   GSDB:Gene.name = "ACHE" ;

In the query above, the term `` GSDB:Gene'' refers to class Gene of database GSDB, while term `` GSDB:Gene.name'' refers to attribute name of class Gene. If a class name is unique among all the classes listed in the database directory, the database name can be omitted from a term, for example `` Gene'' can be used instead of `` GSDB:Gene''.

The structure of an OPM*QL query is defined below:

<query> ::= <select_stat> <from_stat> <where_stat> ';'
          | <select_stat> <from_stat> ';'
          ;
<select_stat> ::= SELECT <selected_terms>
                ;
<selected_terms> ::= <selected_terms> ',' <selected_term>
                   | <selected_term>
                   ;
<selected_term> ::= <class_spec> '.' <comp_term>
                  | <attr_id> '=' <class_spec> '.' <comp_term>
                  ;
<comp_term> ::= <composition element list> <last element>
              ;
<composition element list> ::= <null>
                             | <composition element list> <composition element>
                             ;
<last element> ::= <attr_id>
                 | <attr_id> '[' <class_id> ']'
                 | '!' <attr_id> '[' <class_id> ']'
                 ;
<composition element> ::= <attr_id> '.'
                        | <attr_id> '[' <class_id> ']'
                        | '!' <attr_id> '[' <class_id> ']'
                        ;
<from_stat> ::= FROM <class_specs>
              ;
<class_specs> ::= <class_specs> ',' <class_spec>
                | <class_spec>
                ;
<class_spec> ::= <db_id> ':' <class_id>
             | <class_id>
             ;
<where_stat> ::= WHERE <conditions>
               ;

The SELECT and FROM parts of a query are mandatory; the WHERE part of a query is optional. Each term (which can be an attribute name or a path expression) in the select statement can be renamed using the prefix `` attr_id =''. If this prefix is omitted then the attribute name of the term will be used. For example, the select statement

        SELECT ID = GSDB:Gene.id, GSDBRef =  GSDB:Gene.gdb_xref

will return instances with attributes ID and GDBRef, while the select statement

        SELECT GSDB:Gene.id, GSDB:Gene.gdb_xref

will return instances with attributes id and gdb_xref.

The WHERE statement is followed by a list of conditions separated by the keyword AND. The syntax for conditions is defined below:

<conditions> ::= <condition>
               | <conditions> AND <condition>
               ;
<condition> ::= <class_comp_term> IS NULL
              | <class_comp_term> IS NOT NULL
              | <class_comp_term> <comp op> <a primitive value>
              | <class_comp_term> <comp op> <set of values>
              | <class_comp_term> <comp op> <class_comp_term>
              | <class_comp_term> <match op> <string>
              | <class_comp_term> <in op> <set of values>
              | <class_comp_term> <in op> <class_comp_term>
              | <class_comp_term> <contains op> <a primitive value>
              | <class_comp_term> <contains op> <set of values>
              | <class_comp_term> <contains op> <class_comp_term>
              ;
<class_comp_term> ::= <class_spec> '.' <comp_term>
                    ;
<comp op> ::= '=' <any all> | '!=' <any all> | '>' <any all>
            | '>=' <any all> | '<' <any all> | '<=' <any all>
            ;
<any all> ::= <null> | ANY | ALL
            ;
<match op> ::= MATCH | NOT MATCH
             ;
<contains op> ::= CONTAINS | NOT CONTAINS
                ;
<in op> ::= IN | NOT IN
          ;
<set of values> ::= '{' <set of numbers> '}'
                  | '{' <set of strings> '}'
                  ;
<set of numbers> ::= <number>
                   | <set of numbers> ',' <number>
                   ;
<set of strings> ::= <string>
                   | <set of strings> ',' <string>
                   ;

(The ALL and ANY modifiers for comparison operators are not yet supported.)

Conditions can involve multiple classes, possibly from different databases. For example, the following query involves joining the Gene classes from the GSDB and HGD databases:

     SELECT Name = GSDB:Gene.name, Reason = HGD:Gene.reason,
            Annotation = HGD:Gene.annotation
     FROM GSDB:Gene, HGD:Gene
     WHERE HGD:Gene.accessionID = GSDB:Gene.gdb_xref
     AND   GSDB:Gene.name = "ACHE" ;

Note that the condition HGD:Gene.accessionID = GSDB:Gene.gdb_xref compares values arising from two different databases.



next up previous contents
Next: The OPM Multidatabase Up: OPM*QS: The Object-Protocol Previous: The OPM Data



& Markowitz
Wed Jan 17 16:39:09 PST 1996