This quick start outlines the steps for a few common tasks with the FastBit software: preparing data, querying, and changing indexing options. Only the basic instructions are given here; more detailed instructions for these tasks and information on troubleshooting are provided elsewhere, as indicated.
This section briefly describes the FastBit data format and the tools available for converting ASCII data files.
If you have your data in an ASCII format known as Comma Separated Values
(CSV), which many database systems utilize for input and output, you can
use the command line tool
ardea for converting the CSV
files into directories that can be used by FastBit. (The executable
ardea is built by command
make all from the
top level directory of the FastBit source code.) The following command
line converts the file tests/test0.csv into a FastBit data partition:
    examples/ardea -d tmp -m "a:int, b:float, c:short" -t tests/test0.csv
It reads the first column as 32-bit integers with column name
a, the second column as 32-bit floating-point values with column name
b, and the third column as 16-bit integer values with column name
c. The resulting binary files and the metadata file are written to directory
tmp. The following is a listing of the files in
tmp. The file sizes of a, b, and
c should be exactly 400 bytes, 400 bytes and 200 bytes, respectively, as shown.
    -rw-r--r-- 1 kwu Users 402 Jul 30 13:38 -part.txt
    -rw-r--r-- 1 kwu Users 400 Jul 30 13:38 a
    -rw-r--r-- 1 kwu Users 400 Jul 30 13:38 b
    -rw-r--r-- 1 kwu Users 200 Jul 30 13:38 c
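The file sizes in the listing follow from the row count multiplied by the width of each column type. The sketch below illustrates that arithmetic. It assumes the test file holds 100 rows (inferred from 400 bytes / 4 bytes per value, not stated in the text) and that column files carry no header or padding; the function name expectedColumnBytes is hypothetical and not part of FastBit.

```cpp
#include <cstddef>
#include <cstdint>

// Expected size in bytes of a raw binary column file that stores `rows`
// fixed-width values of `elementSize` bytes each.
// Assumption: the file contains only the values, with no header or padding.
constexpr std::uint64_t expectedColumnBytes(std::uint64_t rows,
                                            std::size_t elementSize) {
    return rows * static_cast<std::uint64_t>(elementSize);
}
```

With 100 rows, a 32-bit int or float column comes to 400 bytes and a 16-bit short column to 200 bytes, matching the listing. A size that differs from this product is a quick sign that the conversion did not read every row.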
For additional information about preparing data, read dataLoading.html.
FastBit treats data as tables with rows and columns. A large table may be partitioned into many data partitions. To prepare data for FastBit, one needs to build these partitions separately. Each partition is stored in a directory in your file system, with each column stored as a separate file in raw binary form. The name of the data file is the name of the column. A column name can contain only alphanumeric characters plus the underscore (_) and must start with a letter. Furthermore, all column names are case insensitive.
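Because the column name doubles as the file name, it can help to check names before writing any data. The following is a minimal sketch of the naming rules stated above, assuming ASCII letters and digits under the default "C" locale. The helper names isValidColumnName and sameColumnName are hypothetical; this is not FastBit's own validation code.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// True if `name` uses only letters, digits and '_' and starts with a letter.
bool isValidColumnName(const std::string& name) {
    if (name.empty()) return false;
    if (!std::isalpha(static_cast<unsigned char>(name[0]))) return false;
    for (char ch : name) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (!std::isalnum(c) && c != '_') return false;
    }
    return true;
}

// True if two column names are equal when letter case is ignored.
bool sameColumnName(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i) {
        if (std::tolower(static_cast<unsigned char>(a[i])) !=
            std::tolower(static_cast<unsigned char>(b[i]))) {
            return false;
        }
    }
    return true;
}
```

Under these rules, names such as f1 and a_b2 pass, while 1f, _a and a-b are rejected. Since names are case insensitive, two columns called Temp and temp would refer to the same column.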
There must be a metadata file named
-part.txt in the directory
for a data partition. This file contains information such as the name
of the partition, the number of rows in the partition, the number of
columns, column names and so on. Here is a minimal example:
    BEGIN HEADER
    name=testData
    Number_of_rows=1000000
    Number_of_columns=2
    END HEADER
    BEGIN Column
    name=f1
    data_type=float
    END Column
    BEGIN Column
    name=j2
    data_type=unsigned
    END Column
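When data files are produced by one's own program rather than by a conversion tool, the metadata file has to be written as well. The sketch below builds the text of such a file with the same keys as the example above. It assumes one entry per line, which is inferred from the example rather than taken from a format specification; the names ColumnSpec and makePartMetadata are hypothetical, and the caller is responsible for saving the result as -part.txt in the partition directory.

```cpp
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// A (column name, data_type) pair, e.g. {"f1", "float"}.
using ColumnSpec = std::pair<std::string, std::string>;

// Build the text of a minimal metadata file for one data partition.
// Assumption: one key=value entry per line, as in the example above.
std::string makePartMetadata(const std::string& partName,
                             std::uint64_t rows,
                             const std::vector<ColumnSpec>& columns) {
    std::ostringstream out;
    out << "BEGIN HEADER\n"
        << "name=" << partName << "\n"
        << "Number_of_rows=" << rows << "\n"
        << "Number_of_columns=" << columns.size() << "\n"
        << "END HEADER\n";
    for (const ColumnSpec& col : columns) {
        out << "BEGIN Column\n"
            << "name=" << col.first << "\n"
            << "data_type=" << col.second << "\n"
            << "END Column\n";
    }
    return out.str();
}
```

Calling makePartMetadata("testData", 1000000, {{"f1", "float"}, {"j2", "unsigned"}}) yields the same entries as the example. Deriving Number_of_columns from the column list keeps the header and the column blocks consistent.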
Once the binary data files and the metadata file are
in place, FastBit can make use of the data partition and we can query
the data with the command line tool named
ibis built from
FastBit typically reads a whole data file containing a column into memory when part of the file is needed. This imposes a practical limit on the number of rows that can be stored in a partition. In addition, the size of a partition is internally recorded as a 32-bit unsigned integer, which sets a hard upper bound of about 2^32 (roughly 4.3 billion) rows per partition. Typically, for a machine with a few gigabytes of memory, we recommend that a data partition contain between 1 million and 100 million rows.
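To make the sizing advice concrete: a column of 4-byte values with 100 million rows occupies about 400 MB when read into memory. The sketch below shows one way to split a large table into partitions of bounded size and to check a row count against the 32-bit limit. The function names planPartitions and fitsInPartition are hypothetical, and the even, sequential split is an assumption rather than something FastBit prescribes.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Split `totalRows` into consecutive partitions holding at most
// `maxRowsPerPartition` rows each; returns the row count of each partition.
// Returns an empty plan if there are no rows or the maximum is zero.
std::vector<std::uint64_t> planPartitions(std::uint64_t totalRows,
                                          std::uint64_t maxRowsPerPartition) {
    std::vector<std::uint64_t> sizes;
    if (maxRowsPerPartition == 0) return sizes;
    while (totalRows > 0) {
        const std::uint64_t n =
            totalRows < maxRowsPerPartition ? totalRows : maxRowsPerPartition;
        sizes.push_back(n);
        totalRows -= n;
    }
    return sizes;
}

// True if a row count can be represented by a 32-bit unsigned integer.
bool fitsInPartition(std::uint64_t rows) {
    return rows <= std::numeric_limits<std::uint32_t>::max();
}
```

For example, 250 million rows with a cap of 100 million per partition gives three partitions of 100, 100 and 50 million rows, each well within the 32-bit bound.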
Larger samples and more usage examples
After preparing a data partition, we can try some queries.
thula can be used to accomplish
this task. Assuming directory
tmp has been prepared with the above
ardea command line, the following two commands should both
produce 1 hit.
    examples/ibis -d tmp -q "where a = 0"
    examples/thula -d tmp -w "a = 0"
The following two commands both produce 9 hits.
    examples/ibis -d tmp -q "where a = b and c < 10"
    examples/thula -d tmp -w "a = b and c < 10"
The above commands directly use the data directories specified on
command line. To specify more than one directory per command or to
specify additional parameters to control the execution of FastBit, one
may use a configuration file. For example, the following configuration
file specifies two data directories and tells FastBit where to store temporary files:
    timestep1.dataDir=/data/jwu/ts1
    timestep2.dataDir=/data/jwu/ts2
    CacheDirectory=/tmp/FastBitCache
The configuration file can be passed to any command line tool with option
With all command line tools, a useful option is
-h, which will
cause them to print out usage information. Another useful option is
-v, which can be used to instruct them to print out more
information about their progress. Multiple
-v options can
be used if more verbose output is desired.
Note that both
thula answer queries, but
they exercise different interfaces to the underlying indexing functions. In
ibis also supports a lot more operations. A
description of these functions is available in ibisCommandLine.html.
FastBit implements more than a dozen different bitmap indexes and also
offers a number of ways to control which one is used for query
processing. An easy way to build an index is to use the
program, such as
    examples/ibis -v -d tmp -b "<binning none/><encoding equality/>"
which builds the basic bitmap index (no binning, equality encoding). Other possible indexes are described in indexSpec.html.
When an index exists, the above command does not check whether the
existing index is of the specified type. To remove the existing indexes
and build new ones, add
-z in addition to option
-b. The existing query processing function can only work
with one index per column (per data partition). There is no way to
build multiple indexes for one column in a data partition.
Alternatively, one may manually remove all index (
from a directory or edit the file
-part.txt to specify
indexing options to the whole partition or a specific column. For
example, the following
-part.txt modifies the above one by
adding an indexing option to the whole partition and one for column
    BEGIN HEADER
    name=testData
    Number_of_rows=1000000
    Number_of_columns=2
    index=<binning precision=2/><encoding equality/>
    END HEADER
    BEGIN Column
    name=f1
    data_type=float
    END Column
    BEGIN Column
    name=j2
    data_type=unsigned
    index=<binning none/><encoding equality/>
    END Column
All useful FastBit functions and classes are in the name space of
ibis. There are two levels of interfaces, one is more abstract
and the other is more concrete. The abstract interface is implemented as
ibis::table, and the concrete interface is implemented as
ibis::part (a shorthand for partition). We will next
briefly describe a handful of key functions from these two classes.
Note that before doing anything, one should call
read the configuration file if there is one.
One normally instantiates an
ibis::table object by calling
ibis::table::create with the data directory as the
argument. This is done in file
There are two functions,
ibis::table::estimate, which gives a
lower and an upper bound on the number of hits, and
ibis::table::select, which produces another table containing
the selected values. Both of these functions are documented in table.h.
The following is a snippet of source code that builds an
ibis::table object from directory
tmp, and estimates
the number of hits of query condition "a = 0".
    ibis::table *tbl = ibis::table::create("tmp");
    uint64_t nhmin, nhmax;
    tbl->estimate("a = 0", nhmin, nhmax);
For a more realistic example, see file
Additional information about the functions defined in
ibis::table and related classes can be found in table.h.
The more concrete interface is represented by class
ibis::part, which corresponds to exactly one data directory
described above. Currently, its constructor
requires two arguments, both being directory names. It is safe to
pass the data directory name as the first argument and pass a nil
pointer as the second argument; see
examples/ibis.cpp for an example.
To process any query with
ibis::part, one needs to instantiate
ibis::query objects. The following code snippet demonstrates
how to create an
ibis::part from a data directory and an
ibis::query object for the query condition "a = b and c < 10",
then evaluate the query and find out the number of hits.
    ibis::part par("tmp", 0);
    ibis::query que("username", &par);
    que.setWhereClause("a = b and c < 10");
    que.evaluate();
    long nhits = que.getNumHits();
The class
ibis::part is documented in part.h, and the class
ibis::query is documented in query.h. A more realistic example can be found in
The most interesting component of FastBit is the class hierarchy of
ibis::index. A potentially useful task is to extend these
compressed bitmap indexes in this class hierarchy. The class
ibis::index is documented in index.h.
Inside FastBit, an index is created through the class
ibis::column::indexLock, which in turn calls
ibis::column::loadIndex, which then invokes the index factory
ibis::index::create. The main reason for using
ibis::column::indexLock is to track how many
queries are using an index simultaneously. It is possible to bypass all
these layers of code by directly invoking the constructor of a concrete
There are two types of indexes in FastBit, binned and unbinned. The
binned classes all derive from
ibis::bin, and most of the
unbinned classes derive from
ibis::relic that implements the
basic bitmap index. The class
ibis::bin is defined in ibin.h
and the class
ibis::relic is defined in irelic.h.
Note that all concrete index classes are defined in files starting with