ibis.cpp File Reference

IBIS – Interactive Bitmap Index Search.

#include <ibis.h>
#include <mensa.h>
#include <twister.h>
#include <sstream>
#include <algorithm>
#include <memory>
#include <iomanip>

Namespaces

 ibis
 The current implementation of FastBit is code-named IBIS; most data structures and functions are in the namespace ibis.
 

Macros

#define maxselect   8
 

Typedefs

typedef std::vector< joinspec * > ibis::joinlist
 
typedef std::pair< const char *, const char * > namepair
 

Functions

template<typename T >
void findMissingValuesT (const ibis::column &col, const ibis::bitvector &ht0, const ibis::bitvector &ht1)
 
int main (int argc, char **argv)
 
void * thFun (void *arg)
 

Detailed Description

IBIS – Interactive Bitmap Index Search.

Sample code that exercises the main features of the FastBit bitmap indexing and search capabilities. It can ingest data through append operations, build indexes, and answer a limited version of the SQL select statement. These SQL statements may be entered either as command-line arguments or from standard input.

The queries are specified in a simplified SQL statement of the form:

[SELECT ...] [FROM ...] WHERE ... [ORDER BY colname [ASC | DESC] [colname [ASC | DESC]]] [LIMIT ...]

The SELECT clause contains a list of column names and any of the following one-argument functions: AVG, MAX, MIN, SUM, VARPOP, VARSAMP, STDPOP, STDSAMP, and DISTINCT, e.g., "SELECT a, b, AVG(c), MIN(d)". If specified, the named columns of qualified records will be displayed as the result of the query. The unqualified variables (column names that appear outside any function) are used to group the selected records; for each group the values of the functions are evaluated. This is equivalent to listing all unqualified variables in a "GROUP BY" clause. Note that the printout always lists the unqualified variables first, followed by the values of the functions, and always has an implicit "count(*)" at the end of each line.
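The implicit grouping can be pictured with a minimal sketch. It is not FastBit code: the `Row` and `GroupResult` types and the `groupByA` function are hypothetical names, and the rows are assumed to sit in memory. For a query like "SELECT a, AVG(c)", it groups on the bare column a, evaluates AVG(c) per group, and carries the implicit count(*) as the last field.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>
#include <vector>

// Hypothetical row type for illustration only.
struct Row {
    int a;     // unqualified column, acts as the implicit GROUP BY key
    double c;  // argument of AVG(c)
};

// One output line: the key, the function value, then the implicit count(*).
struct GroupResult {
    int a;
    double avg_c;
    std::size_t count;
};

// Mimics "SELECT a, AVG(c)": group rows by a, then evaluate AVG(c) per group.
std::vector<GroupResult> groupByA(const std::vector<Row>& rows) {
    std::map<int, std::pair<double, std::size_t>> acc;  // key -> (sum, count)
    for (const Row& r : rows) {
        auto& slot = acc[r.a];  // value-initialized to (0.0, 0) on first use
        slot.first += r.c;
        slot.second += 1;
    }
    std::vector<GroupResult> out;
    for (const auto& kv : acc) {
        assert(kv.second.second > 0);  // every group holds at least one row
        out.push_back({kv.first, kv.second.first / kv.second.second, kv.second.second});
    }
    return out;
}
```

Because `std::map` iterates in key order, the groups in this sketch come out sorted by a; that is a property of the sketch, not a statement about the order the program prints.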

The FROM clause contains a list of data partition names. If specified, the search will be performed only on the named partitions. Otherwise, the search is performed on all known tables.

The column names and partition names can be delimited by either ',' or ';'. Leading and trailing spaces of each name are removed, and no space is allowed in the middle of a name.
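The delimiter and trimming rule could be implemented along the following lines. This is a sketch under assumptions, not the parser used by ibis.cpp: `splitNames` is a hypothetical helper, and skipping empty pieces is a choice made here for simplicity.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Split a list such as " a, b ;c " on ',' or ';' and trim each name.
std::vector<std::string> splitNames(const std::string& list) {
    std::vector<std::string> names;
    std::size_t start = 0;
    while (start <= list.size()) {
        std::size_t end = list.find_first_of(",;", start);
        if (end == std::string::npos) end = list.size();
        // Trim leading and trailing whitespace of the current piece.
        std::size_t b = start, e = end;
        while (b < e && std::isspace(static_cast<unsigned char>(list[b]))) ++b;
        while (e > b && std::isspace(static_cast<unsigned char>(list[e - 1]))) --e;
        if (e > b) names.push_back(list.substr(b, e - b));  // skip empty pieces
        start = end + 1;
    }
    return names;
}
```

The sketch does not reject names with embedded spaces; a stricter version would report them as errors.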

The WHERE clause specifies the condition of the query. It is specified as range queries of the form

RANGE LOGICAL_OP RANGE

where LOGICAL_OP can be one of "and", "or", "xor", "minus", "&&", "&", "||", "|", "^", and "-". Note that the logical "minus" operation can be viewed as shorthand for "AND NOT", i.e., "A minus B" is exactly the same as "A AND NOT B".
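The equivalence of "minus" and "AND NOT" is easy to check on small bit vectors. The sketch below uses `std::bitset` as a stand-in for a hit vector over eight hypothetical rows; FastBit's own `ibis::bitvector` class is not used here, and the function names are illustrative.

```cpp
#include <bitset>

// Hit vector over 8 hypothetical rows; bit i is set when row i satisfies a range.
using Hits = std::bitset<8>;

Hits logicalAnd(const Hits& a, const Hits& b)   { return a & b; }
Hits logicalOr(const Hits& a, const Hits& b)    { return a | b; }
Hits logicalXor(const Hits& a, const Hits& b)   { return a ^ b; }
// "A minus B" keeps the rows in A that are not in B, i.e. A AND NOT B.
Hits logicalMinus(const Hits& a, const Hits& b) { return a & ~b; }
```

For example, with A = 00001111 and B = 00111100, `logicalMinus(A, B)` yields 00000011, the same as `logicalAnd(A, ~B)`.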

A range is specified on one column in the form

ColumnA CMP Constant

where CMP can be one of =, ==, !=, >, >=, <, <=.

The ranges and expressions can also be negated with either '!' or '~'.
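A single range condition and its negation can be sketched as a plain scan over one column. This is only an illustration: the `Cmp` enum, `evalRange`, and `negateHits` are hypothetical names, and a sequential scan is assumed in place of the bitmap indexes FastBit would normally consult.

```cpp
#include <cstddef>
#include <vector>

// Comparison operators allowed in a range; "=" and "==" both map to EQ.
enum class Cmp { EQ, NE, GT, GE, LT, LE };

// Evaluate "column CMP constant" row by row, returning one flag per row.
std::vector<bool> evalRange(const std::vector<double>& column, Cmp op, double constant) {
    std::vector<bool> hits(column.size(), false);
    for (std::size_t i = 0; i < column.size(); ++i) {
        const double v = column[i];
        switch (op) {
            case Cmp::EQ: hits[i] = (v == constant); break;
            case Cmp::NE: hits[i] = (v != constant); break;
            case Cmp::GT: hits[i] = (v >  constant); break;
            case Cmp::GE: hits[i] = (v >= constant); break;
            case Cmp::LT: hits[i] = (v <  constant); break;
            case Cmp::LE: hits[i] = (v <= constant); break;
        }
    }
    return hits;
}

// Negation ('!' or '~' in the query syntax) flips every flag.
std::vector<bool> negateHits(std::vector<bool> hits) {
    hits.flip();
    return hits;
}
```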

The ORDER BY clause and the LIMIT clause are applied after the implicit GROUP BY operation has been performed. The expressions in the ORDER BY clause must be a proper subset of those in the SELECT clause. The modifiers ASC and DESC are optional. By default, ascending (ASC) order is used; use DESC to switch to descending order.

The LIMIT clause limits the maximum number of output rows. Only a number may follow the LIMIT keyword. This clause has an effect only if the preceding WHERE clause selects more than the specified number of rows (after applying the implicit group-by operation).
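The order of operations (group first, then sort, then truncate) can be sketched as follows. The `OutRow` type and `orderAndLimit` function are hypothetical, the rows are assumed to be already grouped, and sorting on a single integer column is a simplification.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical grouped output row: a key column and one function value.
struct OutRow {
    int a;
    double avg_c;
};

// Sort grouped rows on column a (ascending unless descending is true),
// then keep at most limit rows.
std::vector<OutRow> orderAndLimit(std::vector<OutRow> rows, bool descending, std::size_t limit) {
    std::stable_sort(rows.begin(), rows.end(),
                     [descending](const OutRow& x, const OutRow& y) {
                         return descending ? (y.a < x.a) : (x.a < y.a);
                     });
    if (rows.size() > limit) rows.resize(limit);  // truncate to the row cap
    return rows;
}
```

Calling `orderAndLimit(rows, false, 2)` corresponds roughly to "ORDER BY a LIMIT 2", and passing `true` corresponds to "ORDER BY a DESC".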

Command line options:

-append data_dir [output_dir / partition_name]
-build-indexes [numThreads|indexSpec] -z[ap-existing-indexes]
-conf conf_file
-datadir data_dir
-estimation-only
-f query-file-name
-help
-interactive
-independent-parts
-join part1 part2 join-column conditions1 conditions2 [columns ...]
-keep-temporary-files
-log logfilename
-mesh-query
-no-estimation
-o[utput-[with-header|as-binary]] name
-query [SELECT ...] [FROM ...] WHERE ...
-s <sequential-scan or sort-option>
-rid-check [filename]
-reorder data_dir[:colname1,colname2...]
-t[=| ]n
-v[=| ]n
-yank filename|conditions

An explanation of these command-line arguments is provided at http://lbl.gov/~kwu/fastbit/doc/ibisCommandLine.html.

Note
Options can be specified with their minimal distinguishing prefix, which in most cases is just the first letter.
Options -no-estimation and -estimation-only are mutually exclusive; the one that appears later overrides the one that appears earlier on the same command line.
Option -t is interpreted as self-testing if no query is specified on the same command line; however, if there is any query, it is interpreted as the number of threads to use.
The select clause of "count(*)" produces a result table with one row and one column to hold the content of "count(*)" following the SQL standard. If no select clause is specified at all, this program will print the number of hits. In either case, one gets back the number of hits, but different handling is required.
Only the implicit group-by operation is performed. This program does NOT accept a group by clause!
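The idea of a minimal distinguishing prefix can be illustrated with a small lookup. This is a sketch under assumptions and not the option parser in ibis.cpp: `resolveOption` is a hypothetical helper, and treating an ambiguous prefix as "no match" is a choice made here; the real program may resolve such cases differently.

```cpp
#include <string>
#include <vector>

// Return the single option name that starts with prefix, or "" when the
// prefix matches no option or more than one.
std::string resolveOption(const std::vector<std::string>& options, const std::string& prefix) {
    std::string match;
    if (prefix.empty()) return match;
    for (const std::string& opt : options) {
        if (opt.compare(0, prefix.size(), prefix) == 0) {
            if (!match.empty()) return "";  // ambiguous prefix
            match = opt;
        }
    }
    return match;
}
```

With the option names "help", "interactive", "independent-parts", and "query", the prefix "q" resolves to "query", while "i" is ambiguous in this sketch and "ind" is needed to select "independent-parts".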
