Notes on Design and Implementation of Compressed Bit Vectors

Kesheng Wu, Ekow J. Otoo, Arie Shoshani and Henrik Nordberg


Many bitmap based indexing schemes have been shown to work effectively in database applications. To further improve their effectiveness, we study compression schemes that have the potential to reduce the index size without increasing query processing time. A bitmap index is stored as a collection of bitmaps and the most frequent operations on these bitmaps are bitwise logical operations. Since we are interested in using these indices on large databases, reducing the sizes of the indices is essential. In addition, we also want to be able to perform logical operations on the compressed bitmaps efficiently. This note contains a study on a number of generic lossless compression algorithms and a number of specialized compression schemes. Generic compression schemes, such as gzip, are effective in reducing the storage requirement, but don't support faster bitwise logical operations. Most specialized bitmap compression schemes are byte based, i.e., they access memory one byte at a time, and can not take full advantage of today's computing hardware that supports fast word operations. In addition to studying the performance of these schemes, we also introduce a number of word-based compression schemes with the expectation that they may support faster logical operations than the byte-based schemes. What is surprising is that the word based scheme can be dozens of times faster than the byte based ones. In many cases, our word based compressed bitmap also perform bitwise logical operations faster than the uncompressed bitmaps.

full text of PUB-3161 (PDF)

More research work by John Wu
Bitmap Index
Connected Component Labeling
Eigenvalue Computation
Inforamtion available elsewhere on the web
Google Scholar
Contact us

John Wu