When using an out-of-core indexing method to answer a query, it is
generally assumed that the I/O cost dominates the overall query response
time. Because of this, most research on indexing methods concentrate on
reduceing the sizes of indices. For bitmap indices, compression has
been used for this purpose. However, in most cases, operations on these
compressed bitmaps, mostly bitwise logical operations such as AND, OR,
and NOT, spend more time in CPU than in I/O. To speedup these
operations, a number of specialized bitmap compression schemes have been
developed; the best known of which is the byte-aligned bitmap code
(BBC). They are usually faster in performing logical operations than
the general purpose compression schemes, but, the time spent in CPU
still dominates the total query response time. To reduce the query
response time, we designed a CPU-friendly scheme named the
word-aligned hybrid (WAH) code. In this paper, we prove that the
sizes of WAH compressed bitmap indices are about two words per row for
large range of attributes. This size is smaller than typical sizes of
commonly used indices, such as a B-tree. Therefore, WAH compressed
indices are not only appropriate for low cardinality attributes but also
for high cardinality attributes.
In the worst case, the time to operate on compressed bitmaps is
proportional to the total size of the bitmaps involved. The total
size of the bitmaps required to answer a query on one attribute is
proportional to the number of hits. These indicate that WAH compressed
bitmap indices are optimal. To verify their effectiveness, we generated
bitmap indices for four different datasets and measured the response
time of many range queries. Tests confirm that sizes of compressed
bitmap indices are indeed smaller than B-tree indices, and query
processing with WAH compressed indices is much faster than with BBC
compressed indices, projection indices and B-tree indices. In addition,
we also verified that the average query response time is proportional to
the index size. This indicates that the compressed bitmap indices are
efficient for very large datasets.