Next: Pursuing the OPM Up: An Example Previous: The GDB-GSDB Multidatabase

Examples of Queries Expressed over GDB and GSDB

The following OPM multidatabase queries are typical of the queries expressed over GDB 6.0 and GSDB 2.0. These queries were suggested by Chris Fields of the National Center for Genome Resources, Santa Fe, and were specified with help from Ken Fasman and Stan Letovsky of the Johns Hopkins School of Medicine, Baltimore, and Carol Harger of the National Center for Genome Resources.

Query 1:
Find the protein kinase genes on chromosome 4. To identify protein kinase genes in GSDB, it is necessary first to find protein kinase products in the GSDB Product class, and then to find the Genes associated with the same Feature as each Product. The corresponding Gene in GDB can then be accessed by following the gdb_xref attribute from the GSDB Gene, if present, and equating it with the GDB accessionID attribute. Some string reformatting was needed in this query to resolve discrepancies between the representations of accession numbers in GDB and GSDB; this reformatting was implemented using functions built into the OPM multidatabase query language, but is omitted from the query shown below. To test whether a Gene occurs on chromosome 4, one can then follow the path in GDB from Gene to MapElement to Map to Chromosome.

     SELECT GDB:Gene.displayName, GDB:Gene.accessionID, Feature.products.name
     FROM   GSDB:Feature, GDB:Gene
     WHERE  Feature.products.name MATCH "%protein kinase%"
       AND  Feature.genes.gdb_xref = GDB:Gene.accessionID
       AND  GDB:Gene.mapElements.map.chromosome.displayName = "4";
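
The string reformatting mentioned above is not shown in the query. As a rough illustration only, the following Python sketch reduces two differently spelled accession strings to a common key before comparing them. The spellings used here (`GDB:120231` and `G00-120-231`), the digits-only rule, and the function names `normalize_accession` and `same_gene` are assumptions made for this example; they are not taken from the GDB or GSDB schemas, and the actual reformatting was done with functions built into the OPM multidatabase query language.

```python
import re


def normalize_accession(raw: str) -> str:
    """Reduce an accession string to its digits, without leading zeros.

    Hypothetical rule: both databases are assumed to encode the same
    numeric identifier, differing only in prefix and separators.
    """
    digits = re.sub(r"\D", "", raw)  # drop every non-digit character
    if not digits:
        raise ValueError(f"no digits in accession: {raw!r}")
    return digits.lstrip("0") or "0"


def same_gene(gsdb_xref: str, gdb_accession: str) -> bool:
    """Compare a GSDB cross-reference with a GDB accession ID."""
    return normalize_accession(gsdb_xref) == normalize_accession(gdb_accession)
```

Under these assumptions, `same_gene("G00-120-231", "GDB:120231")` holds, which plays the role of the equality test between gdb_xref and accessionID in the WHERE clause.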

Query 2:
Find sequenced regions on chromosome 17 with length greater than 100,000. Map elements on chromosome 17 are selected from the GDB class MapElement using the path from class MapElement to class Map to class Chromosome. Links from MapElement objects to GSDB Entry objects are found using the GDB SequenceLink class. From the GSDB Entry the corresponding sequence can be found and tested to see if its length is greater than 100,000.

     SELECT Entry.accession_number, Entry.sequences.length
     FROM   GDB:MapElement, GDB:SequenceLink, GSDB:Entry
     WHERE  MapElement.map.chromosome.displayName = "17"
       AND  SequenceLink.dBObject = MapElement.segment
       AND  SequenceLink.externalDB.displayName = "GSDB"
       AND  SequenceLink.accessionID = Entry.accession_number
       AND  Entry.sequences.length > 100000;
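
The chain of joins in Query 2 can be mimicked over in-memory records. The Python sketch below is a minimal illustration, assuming flattened dictionaries whose field names (`segment`, `chromosome`, `db_object`, `external_db`, `accession_id`, `sequence_lengths`) are invented for this example. It shows only the order in which the classes are linked, not how the OPM query processor evaluates the query.

```python
def sequenced_regions(map_elements, sequence_links, entries,
                      chromosome="17", min_length=100_000):
    """Yield (accession_number, length) pairs along Query 2's join path."""
    # GSDB side: index entries by accession number for the final join.
    entries_by_accession = {e["accession_number"]: e for e in entries}

    # GDB side: segments of the map elements on the requested chromosome.
    segments = {m["segment"] for m in map_elements
                if m["chromosome"] == chromosome}

    for link in sequence_links:
        # Keep only links that point into GSDB from one of those segments.
        if link["external_db"] != "GSDB" or link["db_object"] not in segments:
            continue
        entry = entries_by_accession.get(link["accession_id"])
        if entry is None:
            continue
        # Final condition: sequence length above the threshold.
        for length in entry["sequence_lengths"]:
            if length > min_length:
                yield entry["accession_number"], length
```

Each filter corresponds to one condition of the WHERE clause: the chromosome test, the link from SequenceLink to the MapElement's segment, the restriction to GSDB links, the accession-number join, and the length test.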

Query 3:
Find the sequences of ESTs mapped between 4q21.1 and 4q21.2. Currently this query requires two sub-queries: the first finds the coordinate range, and the second finds the ESTs with coordinates in that range, together with their sequences. Planned extensions to the multidatabase query system will allow this query to be expressed as a single OPM query.

The first part of the query finds the coordinates of the points q21.1 and q21.2 in the Cytogenetic Map of chromosome 4:

     SELECT MapElement.coordinate, MapElement.point, MapElement.segment.displayName
     FROM   GDB:MapElement
     WHERE  MapElement.map.objectClass = "CytogeneticMap"
       AND  MapElement.map.chromosome.displayName = "4"
       AND  MapElement.segment.displayName IN {"q21.1", "q21.2"};

Next, one can retrieve the expressed Amplimers occurring between these coordinates and look up the corresponding sequences in GSDB.

     SELECT Amplimer.displayName, Entry.accession_number, 
            Entry.sequences.length, Entry.sequences.sequence
     FROM   GDB:Amplimer, GDB:SequenceLink, GSDB:Entry
     WHERE  Amplimer.isExpressed = "Yes"
       AND  Amplimer.mapElements.map.chromosome.displayName = "4"
       AND  Amplimer.mapElements.sortCoord >= START_COORD
       AND  Amplimer.mapElements.sortCoord <= END_COORD
       AND  SequenceLink.dbObject = Amplimer
       AND  SequenceLink.externalDB.displayName = "GSDB"
       AND  SequenceLink.accessionID = Entry.accession_number;

where START_COORD and END_COORD are the values from the previous query.
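
Passing the result of the first sub-query to the second is a manual step. The Python sketch below shows one way it might be scripted, assuming the rows of the first sub-query are available as dictionaries with a numeric `coordinate` key and that the range of interest runs from the smallest to the largest returned coordinate. The helper names `coordinate_range` and `bind_coordinates` are hypothetical and are not part of the OPM tools.

```python
def coordinate_range(rows):
    """Return (start, end) spanning the coordinates in the given rows.

    Assumption: each row is a dict with a numeric "coordinate" key, and
    the range extends from the smallest to the largest coordinate.
    """
    coords = [row["coordinate"] for row in rows]
    if not coords:
        raise ValueError("first sub-query returned no rows")
    return min(coords), max(coords)


def bind_coordinates(query_text, start, end):
    """Substitute the START_COORD and END_COORD placeholders in a query."""
    return (query_text
            .replace("START_COORD", str(start))
            .replace("END_COORD", str(end)))
```

The text returned by `bind_coordinates` would then be submitted as the second sub-query.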






& Markowitz
Thu Mar 14 15:45:38 PST 1996