Summary Applications Compression Publications Software Documentation

2008/12: ASCR Disvovery article
2008/07: R&D 100 award
2008/02: FastBit source in new home
2007/08: FastBit speed up drug discovery tool
2007/06: Physical Design Reviewed
2006/03: Prove formal optimality [ Draft]
2006/02: Work on Enron data made headline at PRIMEUR.
2005/05: Appeared in ACM TechNews.
2005/05: Grid Collector wins ISC Award.
2005/01: CRD news report on FastBit.
2004/12: WAH patent issued.

FastBit: An Efficient Compressed Bitmap Index Technology

As computers become more pervasive, many scientific and commercial endeavors are collecting or generating tremendous amount of data. Typically, a relative smaller number of records contain the keys to new insight or new trends [ 1, 2]. One of the most daunting challenges in data analyses or data mining is to quickly retrieve these useful records. FastBit answers this challenge with efficient compressed bitmap indexes. In computational complexity analysis, we proved that our index scheme is optimal [ 3, 4]. In a number of tests, we observed that our indexing scheme can answer range queries tens of times faster than the well-known indexing schemes [ 5]. By integrating FastBit with a number of existing technologies, such as the Storage Storage Manager [6], we are able to significantly ease the data management tasks in many data intensive science projects. An overview article about FastBit applications has appeared in SciDAC 2005 conference proceedings and more information about the applications is available at applications.html.


Recent highlights December 2008: FastBit featured in DOE - Science - ASCR Discovery.

November 2008: FastBit significantly speed up the analysis of Laser Wakefield Particle Accelerator simulation data [ Project description] [ SC08 publication ( Abstract)] [ movie]

July 2008: !FastBit recognized as one of 100 most innovative new products by R&D magzine [ Note from Representative Ellen Tauscher] [ News release][Poster for R&D 100 Award Recepition][Photo 1 from the R&D 100 Award Receiption] [ Photo 2 from the R&D 100 Award Receiption]

March 2008: FastBit made its way to UC Davis Visualization group, appeared in work on Bin-Hash Indexing: A Parallel GPU-Based Method For Fast Query Processing .

August 2007: A research group in Germany has applied FastBit technology to virtual screening for molecular docking used by pharmaceutical companies among others. At the ACS Fall 2007 meeting, Jochen Schlosser and Matthias Rarey presented this new virtual screening tool named TrixX-BMI, and showed that it can screen libraries of ligands 140--250 times faster than the state of art screening tools (see page 18 of their presentation at ACS Fall 2007 meeting).

June 2007: A paper jointly produced by the developers of RIDBit and FastBit examines the physical design aspects of the two packages, and reveals that the design choices of FastBit are indeed effective. The paper is to be presented at IDEAS 2007 conference.


Comparison with other implementations of bitmap index

The potential of using bitmap index to significantly improve the query performance did not go unnoticed by the database vendors, for example, ORACLE and Sybase IQ have implemented different bitmap indexes. However, bitmap indexes are not as popular as B-trees because most of the existing database products are designed initially for transactional data, where any index on data must be updated quickly as data records are modified. This is required in transactional applications, but for majority of the data analysis applications, such data warehousing or scientific data management, the data to be queried is not modified, at least not modified frequently. In these cases, a new physical data layout and a new set of indexing techniques are more appropriate. One prominent example of a new approach is the Vertica database product.

In the database research community, vertical data organization has been studied extensively, for example MonetDB was first released in 1994. FastBit combines both vertical data organization and efficient bitmap indexes to provide a unique method of processing queries on large data sets.

Compression

FastBit implements a patented bitmap compression technique to speed up the bitmap indices [7]. The key to the new compression scheme is that it is much more suited for implementation on CPU's operating on words rather than on bytes. Though others observed the same need to align the compressed data with the computer architecture, our compression technique has the distinct advantage of being much simpler to encode and decode. More importantly, it enables efficient bitwise logical operations that are needed to answer user queries.

More information about compression available at this link.

Range Queries

Using bitmap indices, range queries can be answered with bitwise logical operations. Since bitwise logical operations are generally very well supported by computer hardware, uncompressed bitmaps involving relatively smaller number of bitmaps can be efficiently answered. In most scientific applications, the number of bitmaps in a bitmap index is typically large, say more than 1000. This causes the uncompressed bitmaps to be very large. Our compression scheme addresses the size problem by reducing the compressed size to the level that is below a typical B-tree implementation [3]. In addition, because our compression scheme supports fast bitwise logical operations on the compressed bitmaps, the time to answer a one-dimensional range query is a linear function of the number of hits [3]. This is theoretically optimal, since any system to answer a query must at least list all the hits and therefore has the same theoretical time complexity as our approach. This theoretical property is remarkable as only a few most efficient indexing schemes including B+tree and B*tree have it. There are two factors that make FastBit even more remarkable. Since answers to one-dimensional queries produced from bitmap indices can be easily combined to answer multi-dimensional queries, bitmap index scheme that is efficient for one-dimensional range queries is also efficient for multi-dimensional range queries. The same is not true for other optimal indexing schemes. In addition, FastBit is not only theoretically efficient; it is also efficient in tests involving various different application datasets [3, 4, 5].

More information about performance of range queries available at rangeQueries.html.

Applications

Here is a short list of application stories. More FastBit applications are given at applications.html.

  • Grid Collector for STAR data analysis: The application that initially motivated our research was a high-energy physics experiment called STAR [7]. One of the main goals of the experiment was to investigate the existence of such state of matter as Quark Gluon Plasma. One indication of the existence of Quark Gluon Plasma is a phenomenon called jet quenching. However, only a small fraction of the high-energy collisions may have this phenomenon. Effectively identifying these key events out of hundreds of millions of collision events stored mostly on tape is a significant challenge. To meet this challenge, we clearly need efficient searching strategies. In addition, we also need to address the file management issues because the distributed nature of the storage systems and analysis resources involved.
  • Feature tracking in combustion data analysis: In a number of scientific fields, simulation is the predominant instrument of study. As in an analysis of simulation of hydrogen-air autoignition process, a common analysis task is to track the progression of some particular feature, such as the evolution of the ignition kernel, the propagation of the flame front, or the procession of a shock wave. There are two general categories of existing approaches, one based on visualization techniques or one based on database techniques. The former typically can only performs very simple search operations, therefore unable to process features involving many variables or complex conditions; while the later is better at processing complex conditions, it typically handles spatial characters of the features very slowly. Our compressed bitmap based approach has the unique advantage of efficiently handle both multi-dimensional conditions and complex spatial characters.
  • DEX for query-driven visualization: This is an extension of the work on feature tracking. One of the aims is to provide a flexible visual data exploration tool that can interactively process complex user queries on datasets. FastBit is uniquely suited to fill this role of searching, which is very limited in existing visualization tools. In a demonstration at Supercomputing 2004 conference, we show that DEX can find isocontours significantly faster than commonly used visualization tools such as VTK. Clearly, finding isocontours is one of the simplest capabilities of DEX. In the immediate future, our goal is to design an intuitive interface so that the complex capability can be easily accessible to users.
  1. J. Gray, D. T. Liu, M. Nieto-Santisteban, A. Szalay, D. DeWitt, and G. Heber. Scientific Data Management in the Coming Decade. CT Watch Quarterly. February, 2005.
  2. J. Becla and D. L. Wang, Lessons Learned from Managing a Petabyte, CIDR, 2005.
  3. K. Wu, E. Otoo, and A. Shoshani. Optimizing bitmap indices with efficient compression. ACM Transactions on Database Systems, v 31, pages 1-38, 2006. [ Preprint]
  4. K. Wu, E. J. Otoo and A. Shoshani. On the Performance of Bitmap Indices for High Cardinality Attributes. VLDB'2004, pp. 24 - 35. 2004. [ Preprint]
  5. K. Wu, E. J. Otoo and A. Shoshani. Compressing Bitmap Indexes for Faster Search Operations. SSDBM'2002, pp. 99-108. 2002. [ Abstract] [ Preprint]
  6. A. Shoshani, A. Sim, and J. Gu. Storage Resource Managers: Essential Components for the Grid, in Grid Resource Management: State of the Art and Future Trends, Edited by J. Nabrzyski, J. M. Schopf, J. weglarz, Kluwer Academic Publishers, 2003. [Preprint] Information about Storage Resource Manager (SRM) is also available at /Projects/SRMproject.
  7. K. Wu, A. Shoshani and E. J. Otoo. Word aligned bitmap compression method, data structure, and apparatus. US Patent 6,831,575. 2004.
  8. More information about STAR is available at this main website http://www.star.bnl.gov. Information about using FastBit in Grid Collector is also available.
Make It A Bit Faster
You are here:   Projects/FastBit/WebHome
Last Updated r32 on 2009-06-11 18:42:25 by Kwu