Organization: LBNL » CRD » SDM » FastBit
FastBit is an open-source data processing library following the spirit of NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in the column-oriented manner similar to well-known database management systems such as Sybase IQ, MonetDB, and Vertica. It is designed to accelerate user's data selection tasks without imposing undue requirements. In particular, the user data is NOT required to be under the control of FastBit software, which allows the user to continue to use their existing data analysis tools.
The FastBit software is distributed under the BSD license. The software is available at code.lbl.gov. The most recent release is FastBit 2.0.3; it comes as a source tar ball named fastbit-2.0.3.tar.gz. The latest development version is available from http://goo.gl/Ho7ty.
The key technology underlying the FastBit software is a set of compressed bitmap indexes. In database systems, an index is a data structure to accelerate data accesses and reduce the query response time. Most of the commonly used indexes are variants of the B-tree, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations, but are somewhat slower to update after a modification of an individual record.
|A key innovation in FastBit is the Word-Aligned Hybrid compression (WAH) for the bitmaps. The figure on the right shows a set of timing measurements using a high-energy physics data set. It shows that FastBit compressed index is more than 10x faster than the compressed bitmap index implementation from a popuplar commercial database management (DBMS) product.
A formal analysis of the WAH compressed bitmap index is published in ACM TODS v31 in 2006.
|Another innovation in FastBit is the multi-level bitmap encoding methods. In the performance measurements shown on the right, the commonly used bitmap encoding method is labeled "E1"; the binary encoding (also known as the bit-sliced index) is labeled "BN"; the two-level encoding methods are labeled "EE", "RE" and "IE". From this figure, we see that the new two-level methods are never slower than "E1" or "BN". In some cases, it can be up to 10x faster than "E1", which is used the popular commercial DBMS mentioned earlier.
An analysis of the multi-level methods is published in ACM TODS v35 in 2010.
|A list of FastBit research publications is available on-line|
The development of FastBit was originally motivated by the need of a high-energy physics experiment called STAR. That eventually led to the development of a Grid-based system named Grid Collector. Since then FastBit has been used in many more applications, some of which are listed in this web page.
We are constantly amazed by the innovative ways FastBit is used. For example, a group of German researches have applied it to a molecular docking problem and published a detailed description at J. Chem. Inf. Model., 2009, Jose Nazario and Brent Pedersen have posted a python binding to FastBit, Andreas Streichardt has put together an extension of PHP for FastBit, and Olaf Walkowiak and friends have developed an Alternative Native Interface for Java.
There is an active FastBit mailing list. If you make use of FastBit please join the mailing list and feel free to provide us with you feedback.
More creative uses of FastBit
This work is supported in part by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231 with University of California. Some tests are performed with resources of the National Energy Research Scientific Computing Center.