
The GDB-GSDB Multidatabase System

The Genome Data Base (GDB) is an archival MBD of genomic mapping data maintained at the Johns Hopkins School of Medicine, Baltimore [12]. The new version of GDB, GDB 6.0 (see http://wwwtest.gdb.org/gdb/), was developed with the Sybase DBMS using the OPM toolkit [4]. GDB contains objects that are identified by accession numbers and grouped into classes organized in a class hierarchy. The main classes of this hierarchy contain objects representing genomic data, literature references, and information on people and organizations.

The Genome Sequence Database (GSDB) is an archival MBD of genome sequence data maintained at the National Center for Genome Resources, Santa Fe. The current version of GSDB, GSDB 2.0 (see http://www.ncgr.org/gsdb/gsdb.html), has also been developed with the Sybase DBMS, but without the OPM toolkit. For GSDB 2.0, an OPM view (see http://gizmo.lbl.gov/DM_TOOLS/OPM/opm_4.html) has been constructed using the OPM Retrofitting tool; this view allows GSDB to be accessed with the OPM query tools [7]. GSDB 2.0 is structured around one main class of objects, Entry, whose objects represent DNA sequences identified by accession numbers; the actual sequences (strings) are represented by objects of another class, Sequence. GSDB 2.0 also contains objects representing various entities, including genes, products, sources, and references.

Both GDB and GSDB have a Gene class. In GSDB 2.0, genes are considered to be a kind of Feature and are characterized by gene names and by references to external MBDs, such as GDB, that contain additional information on genes. In GDB, genes are represented by objects of class Gene and are characterized by information that includes the reason a genomic region is considered a gene, links to the gene families to which the gene belongs, mapping information, and references to derived sequences.
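Because a GSDB gene carries references to external MBDs such as GDB, the two Gene classes can in principle be connected by following those references. The sketch below illustrates this under loudly stated assumptions: the class names `GsdbGene` and `GdbGene`, their attribute names, and the `(database, accession)` pair encoding of external references are all hypothetical simplifications, not the actual GDB or GSDB schemas, and `link_genes` is an illustrative helper rather than part of any OPM tool.

```python
from dataclasses import dataclass, field

# HYPOTHETICAL, simplified stand-ins for the two Gene classes; attribute
# names are illustrative and are not taken from the real GDB/GSDB schemas.

@dataclass
class GsdbFeature:
    """GSDB 2.0: genes are a kind of Feature, modelled here as a base class."""
    feature_id: str

@dataclass
class GsdbGene(GsdbFeature):
    gene_name: str
    # External references as (database name, accession number) pairs (assumed encoding).
    external_refs: list[tuple[str, str]] = field(default_factory=list)

@dataclass
class GdbGene:
    accession: str  # GDB objects are identified by accession numbers
    symbol: str
    gene_families: list[str] = field(default_factory=list)

def link_genes(gsdb_genes, gdb_genes):
    """Pair each GSDB gene with every GDB gene that one of its external references points to."""
    gdb_by_acc = {g.accession: g for g in gdb_genes}
    pairs = []
    for gene in gsdb_genes:
        for db, acc in gene.external_refs:
            # Only references into GDB that resolve to a known GDB gene are kept.
            if db == "GDB" and acc in gdb_by_acc:
                pairs.append((gene, gdb_by_acc[acc]))
    return pairs
```

Building the accession index once keeps the join linear in the number of references, which is roughly what a multidatabase query over the two Gene classes would have to do when crossing from GSDB into GDB.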

Sequences are represented in GSDB by objects of class Sequence. Sequence data include the actual sequence, sequence length, and information on the source of the sequence. Sequence information in GDB is represented by objects of class SequenceLink. These objects contain annotations linking primary GDB objects to external sequence MBDs such as GSDB, as well as information regarding the beginning and end points of sequences.
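A GDB SequenceLink can be read as a pointer from a GDB object into a sequence held in an external MBD such as GSDB. The following minimal sketch resolves such a link against an in-memory index of GSDB sequences. It rests on assumptions not stated in the text: the class and attribute names are hypothetical, each sequence is keyed directly by its Entry's accession number (collapsing the Entry/Sequence distinction), and the begin and end points are taken to be 1-based inclusive positions.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class GsdbSequence:
    accession: str  # SIMPLIFICATION: accession of the owning GSDB Entry
    residues: str   # the actual sequence string
    source: str     # information on the source of the sequence

    @property
    def length(self) -> int:
        return len(self.residues)

@dataclass
class GdbSequenceLink:
    gdb_object: str    # accession of the primary GDB object being annotated
    external_db: str   # name of the external sequence MBD, e.g. "GSDB"
    external_acc: str  # accession number in that external MBD
    begin: int         # ASSUMPTION: 1-based, inclusive start position
    end: int           # ASSUMPTION: 1-based, inclusive end position

def resolve(link: GdbSequenceLink, gsdb_index: dict) -> Optional[str]:
    """Follow a SequenceLink into GSDB and return the linked region, or None."""
    if link.external_db != "GSDB":
        return None  # links to other sequence MBDs are out of scope here
    seq = gsdb_index.get(link.external_acc)
    if seq is None or not (1 <= link.begin <= link.end <= seq.length):
        return None  # unknown accession or positions outside the sequence
    # Convert 1-based inclusive positions to a Python slice.
    return seq.residues[link.begin - 1 : link.end]
```

For example, under this convention a link with begin 2 and end 5 into the sequence ACGTACGT resolves to CGTA.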

Both GDB and GSDB contain classes representing products. In GDB, products are limited to gene products, while in GSDB a product can be associated with any feature. In both GDB and GSDB, these classes appear to serve primarily as a way of referencing external MBDs, such as protein MBDs.

Both GSDB and GDB contain data representing references or citations. In GSDB, a Reference object is considered a kind of (i.e., a specialization of) Feature object. References in GSDB are characterized by titles, publication status, lists of authors and editors, and external references to the Medline bibliographic database. In GDB, citations are represented by objects of class Citation and are further classified into subclasses of Citation representing books, journals, articles, and so on.
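The two databases thus place bibliographic data at different points in their class hierarchies: GSDB makes Reference a specialization of Feature, while GDB specializes Citation into subclasses. The sketch below mirrors that structure with Python subclassing, so that a specialization relationship corresponds to `isinstance`. The attribute names (and the `medline_ids` list for Medline references) are hypothetical simplifications, GDB's Citation attributes are not described in the text and are reduced here to an accession number and a title, and `citations_of_kind` is an illustrative helper only.

```python
from dataclasses import dataclass, field

# GSDB side (simplified): Reference is a specialization of Feature.
@dataclass
class Feature:
    feature_id: str

@dataclass
class Reference(Feature):
    title: str
    publication_status: str
    authors: list[str] = field(default_factory=list)
    editors: list[str] = field(default_factory=list)
    medline_ids: list[str] = field(default_factory=list)  # external refs to Medline

# GDB side (simplified): Citation is specialized into kinds of publication.
@dataclass
class Citation:
    accession: str  # HYPOTHETICAL minimal attributes
    title: str

class Book(Citation):
    pass

class Journal(Citation):
    pass

class Article(Citation):
    pass

def citations_of_kind(citations, kind):
    """Select the GDB citations belonging to one Citation subclass, e.g. Article."""
    return [c for c in citations if isinstance(c, kind)]
```

Because subclasses inherit their parent's constructor, a query over class Citation naturally includes books, journals, and articles, while a query over one subclass narrows the result.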






& Markowitz
Thu Mar 14 15:45:38 PST 1996