Historically, bitmap indexing has provided an important database
capability to accelerate queries. However, only a few database systems
have implemented these indexes because of the difficulties of modifying
fundamental assumptions in the low-level design of a database system and
in the expectations of customers, both of which have developed in an
environment that does not support bitmap indexes. Another problem that
arises, and one that may more easily be addressed by a research article,
is that there is no definitive design for bitmap indexes; bitmap index
designs in Oracle, Sybase IQ, Vertica and MODEL 204 are idiosyncratic,
and some of them were designed for older machine architectures.
To investigate an efficient design on modern processors, this
paper provides details of the Set Query benchmark and a comparison of
two research implementations of bitmap indexes. One, called RIDBit, uses
the N-ary storage model to organize table rows, and implements a
strategy that gracefully switches between the well-known B-tree RID-list
structure and a bitmap structure. The other, called FastBit is based on
vertical organization of the table data, where all columns are
individually stored. It implements a compressed bitmap index, with a
linear organization of the bitmaps to optimize disk accesses. Through
this comparison, we evaluate the pros and cons of various design
choices. Our analysis adds a number of subtleties to the conventional
indexing wisdom commonly quoted in the database community.