
Typical Queries Expressed over GDB and GSDB

The following queries return results based on data in both the HGD database of GDB 6.0 and the GSDB 2.0 database. They were suggested by Chris Fields (National Center for Genome Resources, Santa Fe) and were specified with help from Ken Fasman and Stan Letovsky (Johns Hopkins School of Medicine, Baltimore) and Carol Harger (National Center for Genome Resources).

Query 1:
Find the protein kinase genes on chromosome X. To identify protein kinase genes in GSDB, it is necessary first to find protein kinase products in the GSDB Product class and then to find the Genes associated with the same Feature as each product. The corresponding Gene in the GDB Human Genome Database (HGD) can then be accessed by following the gdb_xref attribute from the GSDB Gene, if present, and equating it with the HGD accessionID attribute. (In fact, some string reformatting is needed here because GDB and GSDB currently represent GDB accession numbers in incompatible ways. The reformatting can be implemented using string functions built into the OPM multi-db query language, but it is ignored in the following examples.) To test whether a Gene lies on chromosome X, we can then follow the path in HGD from Gene to MapElement to Map to Chromosome.

SELECT HGD:Gene.displayName, HGD:Gene.accessionID
FROM   GSDB:Feature, HGD:Gene
WHERE  Feature.products.name MATCH "%kinase%"
AND    Feature.genes.gdb_xref = HGD:Gene.accessionID
AND    HGD:Gene.mapElements.map.chromosome.displayName = "X";
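The chain of steps behind Query 1 can also be traced by hand. The following Python sketch is only an illustration over in-memory toy records, not part of the OPM tools: the record layout, the accession strings, and the function name kinase_genes_on_chromosome are all hypothetical, and MATCH "%kinase%" is assumed to behave like a simple substring test.

```python
# Toy stand-ins for GSDB Feature objects and HGD Gene objects.
# Every value below is invented for illustration only.
gsdb_features = [
    {"products": [{"name": "protein kinase C"}],
     "genes": [{"gdb_xref": "GDB:0001"}]},
    {"products": [{"name": "hemoglobin beta"}],
     "genes": [{"gdb_xref": "GDB:0002"}]},
    {"products": [{"name": "tyrosine kinase"}],
     "genes": [{"gdb_xref": None}]},  # no cross-reference to GDB
]

hgd_genes = [
    {"displayName": "GENE_A", "accessionID": "GDB:0001",
     "mapElements": [{"map": {"chromosome": {"displayName": "X"}}}]},
    {"displayName": "GENE_B", "accessionID": "GDB:0002",
     "mapElements": [{"map": {"chromosome": {"displayName": "X"}}}]},
]


def kinase_genes_on_chromosome(features, genes, chromosome):
    """Return (displayName, accessionID) pairs for kinase genes on a chromosome."""
    # Steps 1 and 2: Features with a kinase product, then their Genes' gdb_xref.
    xrefs = set()
    for feature in features:
        if any("kinase" in product["name"] for product in feature["products"]):
            for gene in feature["genes"]:
                if gene["gdb_xref"] is not None:
                    xrefs.add(gene["gdb_xref"])

    # Steps 3 and 4: match gdb_xref to accessionID, then test the chromosome
    # by walking Gene -> MapElement -> Map -> Chromosome.
    result = []
    for gene in genes:
        on_chromosome = any(
            element["map"]["chromosome"]["displayName"] == chromosome
            for element in gene["mapElements"]
        )
        if gene["accessionID"] in xrefs and on_chromosome:
            result.append((gene["displayName"], gene["accessionID"]))
    return result


matches = kinase_genes_on_chromosome(gsdb_features, hgd_genes, "X")
```

In this toy data only GENE_A qualifies: GENE_B is on chromosome X but its Feature has no kinase product, and the third Feature has a kinase product but no gdb_xref to follow.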

Query 2:
Find sequenced regions on chromosome 17 with length greater than 10,000. MapElements on chromosome 17 are selected from the HGD class MapElement using the path from MapElement to Map to Chromosome. Links from these MapElements to GSDB Entries are found using the HGD SequenceLink class. From each GSDB Entry the corresponding sequence can be found and tested to see whether its length is greater than 10,000.

SELECT Entry.accession_number, Entry.sequences.length
FROM   HGD:MapElement, HGD:SequenceLink, GSDB:Entry
WHERE  MapElement.map.chromosome.displayName = "17"
AND    SequenceLink.dBObject = MapElement.segment
AND    SequenceLink.externalDB.displayName = "GSDB"
AND    SequenceLink.accessionID = Entry.accession_number
AND    Entry.sequences.length > 10000;

Query 3:
Find the sequences of ESTs mapped between 4q21.1 and 4q21.2. Currently this requires two queries: the first finds the coordinate range, and the second finds the ESTs with coordinates in that range together with their sequences. Future extensions to the multi-database query system will allow this to be expressed as a single query.

The first part of the query finds the coordinates of the points q21.1 and q21.2 in the Cytogenetic Map of chromosome 4:

SELECT MapElement.coordinate, MapElement.point,
       MapElement.segment.displayName
FROM   HGD:MapElement
WHERE  MapElement.map.objectClass = "CytogeneticMap"
AND    MapElement.map.chromosome.displayName = "4"
AND    MapElement.segment.displayName IN {"q21.1", "q21.2"};

Next we can retrieve the expressed Amplimers occurring between these coordinates and look up the corresponding sequences in GSDB.

SELECT Amplimer.displayName, Entry.accession_number, 
       Entry.sequences.length
FROM   HGD:Amplimer, HGD:SequenceLink, GSDB:Entry
WHERE  Amplimer.isExpressed = "Yes"
AND    Amplimer.mapElements.map.chromosome.displayName = "4"
AND    Amplimer.mapElements.sortCoord >= START_COORD
AND    Amplimer.mapElements.sortCoord <= END_COORD
AND    SequenceLink.dbObject = Amplimer
AND    SequenceLink.externalDB.displayName = "GSDB"
AND    SequenceLink.accessionID = Entry.accession_number;

Here START_COORD and END_COORD stand for the coordinate values returned by the previous query.
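Until the two steps can be expressed as a single query, the substitution of START_COORD and END_COORD has to be done outside the query language. The following Python sketch shows one way this might be scripted. It is only an illustration: the row layout, the helper names coordinate_range and build_second_query, and the example coordinate values are all hypothetical, and it assumes that the coordinates returned by the first query can be compared directly with sortCoord.

```python
# Text of the second query, with placeholders for the two coordinates.
SECOND_QUERY_TEMPLATE = """\
SELECT Amplimer.displayName, Entry.accession_number,
       Entry.sequences.length
FROM   HGD:Amplimer, HGD:SequenceLink, GSDB:Entry
WHERE  Amplimer.isExpressed = "Yes"
AND    Amplimer.mapElements.map.chromosome.displayName = "4"
AND    Amplimer.mapElements.sortCoord >= {start}
AND    Amplimer.mapElements.sortCoord <= {end}
AND    SequenceLink.dbObject = Amplimer
AND    SequenceLink.externalDB.displayName = "GSDB"
AND    SequenceLink.accessionID = Entry.accession_number;
"""


def coordinate_range(rows):
    """Return (smallest, largest) coordinate among the first query's rows.

    Each row is assumed to be a dict with a numeric "coordinate" entry.
    """
    coords = [row["coordinate"] for row in rows]
    if not coords:
        raise ValueError("first query returned no rows")
    return min(coords), max(coords)


def build_second_query(rows):
    """Fill START_COORD and END_COORD into the second query's text."""
    start, end = coordinate_range(rows)
    return SECOND_QUERY_TEMPLATE.format(start=start, end=end)


# Hypothetical result rows; the coordinate values are made up.
example_rows = [
    {"segment": "q21.1", "coordinate": 100.0},
    {"segment": "q21.2", "coordinate": 120.0},
]
query_text = build_second_query(example_rows)
```

Taking the minimum and maximum keeps the sketch independent of the order in which the first query returns the q21.1 and q21.2 rows.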






& Markowitz
Wed Jan 17 16:39:09 PST 1996