Bitmap indices are efficient data structures for processing
complex, multi-dimensional queries in data warehouse applications and
scientific data analysis. For high-cardinality attributes, a common approach
is to build bitmap indices with binning. This technique partitions
the attribute values into a number of ranges, called bins, and uses bitmap
vectors to represent bins (attribute ranges) rather than distinct values.
In order to yield exact query answers, parts of the original data values
have to be read from disk for checking against the query constraint. This
process is referred to as candidate check and usually dominates the total
query processing time.
In this paper we study several strategies for optimizing the candidate check cost for multi-dimensional queries. We present an efficient candidate check algorithm based on attribute value distribution, query distribution as well as query selectivity with respect to each dimension. We also show that re-ordering the dimensions during query evaluation can be used to reduce I/O costs. We tested our algorithm on data with various attribute value distributions and query distributions. Our approach shows a significant improvement over traditional binning strategies for bitmap indices.