FastBit
Summary Applications Compression Publications Software Documentation
2022/02: FastBit source moved to github
2011/10: DBLunch seminar
2010/7: FastBit in nProbe
2009/9: FastBit in SciDAC review
2008/12: ASCR Disvovery article
2008/07: R&D 100 award
2007/08: FastBit speed up drug discovery tool
2006/03: Prove formal optimality [Draft]
2005/05: Grid Collector wins ISC Award.
2004/12: WAH patent issued.

Organization: LBNL » CRD » SDM » FastBit


FastBit: An Efficient Compressed Bitmap Index Technology

FastBit is an open-source data processing library following the spirit of NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in the column-oriented manner similar to well-known database management systems such as Sybase IQ, MonetDB, and Vertica. It is designed to accelerate user's data selection tasks without imposing undue requirements. In particular, the user data is NOT required to be under the control of FastBit software, which allows the user to continue to use their existing data analysis tools.

Software

The FastBit software is distributed under the BSD license. The software is available at code.lbl.gov. The most recent release tar ball is at https://sdm.lbl.gov/fastbit/fastbit-latest.tar.gz. .

Technology

The key technology underlying the FastBit software is a set of compressed bitmap indexes. In database systems, an index is a data structure to accelerate data accesses and reduce the query response time. Most of the commonly used indexes are variants of the B-tree, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations, but are somewhat slower to update after a modification of an individual record.

A key innovation in FastBit is the Word-Aligned Hybrid compression (WAH) for the bitmaps. The figure on the right shows a set of timing measurements using a high-energy physics data set. It shows that FastBit compressed index is more than 10x faster than the compressed bitmap index implementation from a popuplar commercial database management (DBMS) product.
A formal analysis of the WAH compressed bitmap index is published in ACM TODS v31 in 2006.
** FastBit performance **
Another innovation in FastBit is the multi-level bitmap encoding methods. In the performance measurements shown on the right, the commonly used bitmap encoding method is labeled "E1"; the binary encoding (also known as the bit-sliced index) is labeled "BN"; the two-level encoding methods are labeled "EE", "RE" and "IE". From this figure, we see that the new two-level methods are never slower than "E1" or "BN". In some cases, it can be up to 10x faster than "E1", which is used the popular commercial DBMS mentioned earlier.
An analysis of the multi-level methods is published in ACM TODS v35 in 2010.
** PLEASE DESCRIBE THIS IMAGE **
A list of FastBit research publications is available on-line

Use Cases

The development of FastBit was originally motivated by the need of a high-energy physics experiment called STAR. That eventually led to the development of a Grid-based system named Grid Collector. Since then FastBit has been used in many more applications, some of which are listed in this web page.

We are constantly amazed by the innovative ways FastBit is used. For example, a group of German researches have applied it to a molecular docking problem and published a detailed description at J. Chem. Inf. Model., 2009, Jose Nazario and Brent Pedersen have posted a python binding to FastBit, Andreas Streichardt has put together an extension of PHP for FastBit, and Olaf Walkowiak and friends have developed an Alternative Native Interface for Java.

There is an active FastBit mailing list. If you make use of FastBit please join the mailing list and feel free to provide us with you feedback.

Acknowledgments

This work is supported in part by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 with University of California. Some tests are performed with resources of the National Energy Research Scientific Computing Center.

Make It A Bit Faster
Related sites:
University of California
Berkeley Lab
Computing Sciences
Computational Research
Scientific Data Management Group