FastBit
  Summary Applications Compression Range Queries Publications Software  
To FastBit front page

Word-Aligned Hybrid Compression for Bitmap Indices

 

FastBit implements a patented bitmap compression technique to speed up the bitmap indices. The key to the new compression scheme is that it is much more suited for implementation on CPU's operating on words rather than on bytes. Though others observed the same need to align the compressed data with the computer architecture, our compression technique has the distinct advantage of being much simpler to encode and decode. More importantly, it enables efficient bitwise logical operations that are needed to answer user queries.

A word in WAH compressed bitmaps has two possible interpretation. It is either a literal word or a fill word. Aside from a flag indicating it is a literal word, the rest of the bits stores the bits from the uncompressed bitmap. A fill word stores a number of consecutive bits that are either 0 or 1. Most of the bits in a fill word are used to store the number of bits it represents. One key that improve the operational efficiency is a restriction that the number of bits represented by a fill word must be an integer multiple of the number of bits in a literal word. We refer to this restriction as the word-alignment requirement.

The earliest description of this compression scheme was reported in technical report LBNL/PUB-3161, titled "Notes on Design and Implementation of Compressed Bit Vectors." A more careful description with some minor improvements were reported in Technical Report LBNL-49626, titled "An Efficient Compression Scheme For Bitmap Indices."

Related sites:
Berkeley Lab
Computing Sciences
Computational Research
Scientific Data Management Group