IBIS [1] is an Implementation of Bitmap Indexing
System named FastBit.
This document explains the command line tool named ibis,
which is a shorthand for
A table is physically organized into one or more data partitions, so
that one column from a partition can comfortably fit in computer memory.
Each data partition is stored in a directory on a file system, and the
command line tool ibis works with these data directories.
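A rough way to check that one column of a partition fits in memory is to multiply the number of rows by the width of one value. The Python helpers below are hypothetical and not part of ibis. They assume a 4-byte INT column and use the record count (1766540) reported in the sample run later in this document. The `headroom` factor is an arbitrary stand-in for "comfortably".

```python
def column_bytes(n_rows: int, bytes_per_value: int) -> int:
    """Bytes needed to hold one fixed-width column of n_rows values."""
    return n_rows * bytes_per_value


def max_rows_per_partition(memory_budget: int, bytes_per_value: int,
                           headroom: float = 0.5) -> int:
    """Largest row count whose column uses at most `headroom` of the budget.

    `headroom` is a hypothetical safety factor, not an ibis parameter.
    """
    return int(memory_budget * headroom) // bytes_per_value


if __name__ == "__main__":
    rows = 1_766_540                    # records in the sample partition
    size = column_bytes(rows, 4)        # assumption: INT column, 4 bytes per value
    print(size, round(size / 2**20, 2))       # 7066160 6.74 (bytes, MiB)
    print(max_rows_per_partition(2**30, 4))   # 134217728 rows for a 1 GiB budget
```

At about 6.7 MiB per integer column, the sample partition is far below typical memory sizes. The same arithmetic shows when a growing table should be split into more partitions.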
Using the machine falkland.jgi-psf.org as an example, the
following command prints all the machine names (mchn) and the
temperature values (tmpr) where the temperature is not one of
the nominal values (55 for MegaBACE and 60 for ABI).
/home/kwu/bin/ibis -c /home/kwu/bin/ibis.rc -v -q "select mchn, tmpr where ! (tmpr == 55 || tmpr == 60)"
On this particular machine, the most current version of the ibis executable is
/home/kwu/bin/ibis. The file name following
option -c is the configuration file name. Alternatively, one
may directly specify the data directory on the command line using '-d
data-directory-path'. The particular file used here
contains the current version of the JGI trace data header information. The
attribute names [2] are available in the data directories
/psf/QC/Projects/IBIS/Datasets.
The main option is -q, which is followed by a query string. The
basic syntax follows that of SQL; however, only the basic features of
SQL's select statement are implemented. Here we will first mention a
few limitations that might cause non-descriptive failures of ibis.
The option -v tells ibis to be verbose. If this option is not
supplied, only the number of hits and the result of the select clause
are printed. The result of the select clause may be appended to a file
instead of being printed to standard output. To use this option, specify
'-output output-file'.
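Ibis writes no type information to this file, so a script that reads the output of '-output' has to convert the values itself. Below is a minimal Python sketch that follows the layout described under the -o option: comma-separated values with one header line of variable and function names. The helper name is hypothetical, and the sample text in the test is made up. Because -o appends, a file written by several runs may hold more than one header line, and this sketch does not handle that case.

```python
import csv
import io


def read_ibis_csv(text: str):
    """Parse select output written with -o into a list of dicts.

    Assumptions: the first line holds the variable/function names, fields
    are comma-separated (optionally followed by spaces), and every value is
    kept as a string because the file carries no type information.
    """
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    header = next(reader, None)
    if header is None:                  # empty file: no header, no rows
        return []
    return [dict(zip(header, row)) for row in reader if row]
```

A caller then converts the columns it needs, for example `int(row["tmpr"])`.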
-a[ppend] data_dir [to table_name]
Append the data in data_dir to the table named table_name. The key word
to must appear before the table name. If a table name is
not specified, ibis attempts to use the meta tags contained in the table
to generate a name. If no meta tags are found, it generates a
random name.
More information about loading data can be found in dataLoading.html.
-b[uild-index] [num-threads|indexing-option]
The option -b may be followed by one of two arguments: a decimal
number indicating the number of threads to use simultaneously when
building the indexes, or a string argument indicating the type of
indexes to build. An indexing option string might look like
"<binning nbins=1000/> <encoding range ncomp=2/>".
More about the indexing options is available at indexSpec.html.
-c[onf] conf_file
Specify a configuration file. If no configuration file is
specified, ibis looks for a file named "ibis.rc" in the current
working directory. If that file does not exist, it checks
the environment variable named IBISRC; if the environment variable is
defined, its value is taken to be the name of the rc file. The most
notable entries defined in the rc file are "DataDir1" and
"DataDir2", which define the data directories used by IBIS. On
UNIX systems, ibis also recursively traverses these
directories to find directory pairs with the same name and matching
-part.txt files. Each such pair defines a partition. If
different data partitions have the same name, only the last one is
kept.
Using a pair of directories for a data partition was intended to improve
reliability and reduce the transition time when appending data. In most
cases, it is fine to use only one directory for each data partition, in
which case one simply does not specify "DataDir2". More information
about the configuration file is available in dataLoading.html.
-d[atadir] data_dir [backup_dir]
As an alternative to specifying the data directories in a configuration
file, one may specify them directly on the command line. The effects
of "data_dir" and "backup_dir" are the same as those of "DataDir1" and
"DataDir2" in the RC file.
-e[stimation-only]
Output a range for the number of hits rather than the exact number
of hits. Note that the estimation is applicable only to queries
containing simple range conditions without negation. Otherwise, the
estimation may return 0 as both the upper and lower bounds of the number
of hits.
-h[elp]
Print a short usage statement.
-i[nteractive]
Tell ibis to enter an interactive mode after it finishes processing
the command line arguments. In the interactive mode, the user may
directly enter the SELECT statement described below without the leading
option -q. There are also a small number of other commands
that can be used in the interactive mode. Type "help" in the
interactive mode to see a list.
-l[ogfile] logfilename
Redirect all messages printed out by ibis to the named
file. The file is opened in append mode, therefore the existing content
is preserved. The only message that may still be printed to standard
output is one indicating the name of the message file.
NOTE: this file contains the error messages and other
information. If option -o is also specified, the file
specified in that option will contain the results of select statements.
-n[o-estimation]
Force ibis to evaluate the number of hits without
first performing an estimation.
NOTE: options -no-estimation and
-estimation-only are mutually exclusive; the one that
appears later overrides the one that appears earlier on the same
command line.
-o[utput] filename
Tell ibis to append the result of select statements to
the named file rather than to standard output. The output is in
comma-separated values (CSV) format.
NOTE: the output file contains a header of the variable names
and functions. No type information is provided.
NOTE: this file contains the results of the select statements
only. Other messages, such as errors, progress information, and debug
information, may be redirected with option -l.
-p[rint] [Parts|Columns|Distributions|column-name [: conditions]]
Print information about the tables known to the ibis program.
-q[uery] [SELECT ...] [FROM ...] WHERE ...
On most systems, the string following -q must be quoted so that
it is treated as a single argument.
-r[id-check] [filename]
This option tells ibis to verify that the RIDs (row identifiers) can be
used in queries of their own. If the optional file name is present, the
RIDs are also written to the named file.
-s[quential-scan]
Force ibis to check the answer produced by the index operations against
a sequential scan of the raw data.
-t[=n]
Instructs ibis to perform predefined self tests. If a positive
integer is provided, ibis takes it as the number of tests to
perform.
-v[=n]
Any integer can be specified as the verbose level. Typical values
range from 1 to 20. Use a negative number to mute ibis.
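The configuration lookup described under -c uses an explicit -c file first. If there is none, ibis tries ibis.rc in the current working directory, and after that the file named by IBISRC. The Python sketch below only illustrates that order and is not code from ibis. The function names are hypothetical. The rc syntax it parses (one 'name = value' entry per line, with '#' starting a comment) is an assumption, since this page does not describe the format (see dataLoading.html).

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_rc_file(explicit: Optional[str] = None,
                 cwd: Optional[Path] = None,
                 environ: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Pick the configuration file in the order described for -c."""
    if explicit:                          # -c conf_file wins outright
        return Path(explicit)
    local = (cwd or Path.cwd()) / "ibis.rc"
    if local.is_file():                   # then ./ibis.rc
        return local
    value = environ.get("IBISRC")         # then $IBISRC, taken as given
    return Path(value) if value else None


def read_data_dirs(rc_text: str) -> dict:
    """Collect DataDir1/DataDir2 from lines of the assumed form 'name = value'."""
    dirs = {}
    for raw in rc_text.splitlines():
        line = raw.split("#", 1)[0].strip()   # assumption: '#' starts a comment
        if "=" not in line:
            continue
        name, value = (part.strip() for part in line.split("=", 1))
        if name in ("DataDir1", "DataDir2"):
            dirs[name] = value
    return dirs
```

Following the text, the IBISRC value is used without checking that the file exists. A missing "DataDir2" simply leaves that key out, which matches the single-directory setup described under -c.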
The command line tool ibis supports a limited version of
the SQL select statement. It recognizes three key words, SELECT, FROM
and WHERE. The key words are not case-sensitive, neither are the names
of variables and functions described next.
The key word SELECT must be followed by a comma-separated list of attribute names or calls to the four functions AVG, MAX, MIN and SUM. The attribute names must be from the available datasets. If a name is not in the available datasets, IBIS treats it as if no attribute were provided. If no attribute is provided, the key word SELECT must be omitted, in which case only the number of hits is printed. Each of the four functions takes one argument that must be a column name of an available dataset. The variables not appearing in any function are implicitly passed to an SQL 'GROUP BY' clause, and the functions are evaluated on the groups defined by this implicit 'GROUP BY' clause. For example, the select clause 'SELECT mchn, avg(q20), min(snra)' will order the selected records according to machine name (mchn), and for each machine the average Q20 score and the minimum SNRA value will be computed. In the printout, the number of selected records is printed at the end. This is equivalent to adding 'count(*)' to the end of every (non-empty) select clause.
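The following Python sketch shows the semantics of the implicit GROUP BY over made-up in-memory records. It does not reflect how IBIS computes the result, since IBIS works from its bitmap indexes and data files. The function name and data layout are hypothetical. Columns outside any function form the grouping key, and each function is applied per group. The group size is appended to mirror the implicit count(*).

```python
from collections import defaultdict
from statistics import mean

# The four aggregate functions named in the text, keyed by lower-case name
# because function names are not case-sensitive.
AGGREGATES = {"avg": mean, "max": max, "min": min, "sum": sum}


def select_with_implicit_group_by(rows, plain_columns, functions):
    """Group rows by the plain (non-function) columns and aggregate the rest.

    rows          -- iterable of dicts, the records that satisfied WHERE
    plain_columns -- e.g. ["mchn"]
    functions     -- (name, column) pairs, e.g. [("AVG", "q20"), ("min", "snra")]
    Each output tuple ends with the group size, mirroring the implicit count(*).
    """
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[c] for c in plain_columns)].append(row)
    result = []
    for key in sorted(groups):            # ordered by the grouping columns
        members = groups[key]
        aggregates = tuple(AGGREGATES[name.lower()]([r[col] for r in members])
                           for name, col in functions)
        result.append(key + aggregates + (len(members),))
    return result
```

With no plain columns, every record falls into the single group `()`, so the functions are computed over all selected records.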
The key word FROM must be followed by a list of table names. Conceptually, the data under the management of IBIS are organized into tables, and each table must have a name. The table names may contain the wild cards '%' and '_', where '%' matches zero or more characters of any kind and '_' matches exactly one character, as in an SQL "LIKE" expression. If no table name is specified, the key word FROM must be omitted, in which case all known data tables are queried.
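The wild-card rules for table names are the same as those of SQL LIKE, so they translate directly into regular expressions. Below is a minimal Python sketch with hypothetical helper names. It treats table names as case-sensitive. That is an assumption, because the text does not say whether table names ignore case.

```python
import re


def like_to_regex(pattern: str):
    """Compile an SQL LIKE pattern: '%' = any run of characters, '_' = one."""
    pieces = []
    for ch in pattern:
        if ch == "%":
            pieces.append(".*")
        elif ch == "_":
            pieces.append(".")
        else:
            pieces.append(re.escape(ch))       # everything else is literal
    return re.compile("".join(pieces), re.DOTALL)


def tables_for_from(patterns, known_tables):
    """Tables selected by a FROM list; an absent FROM means all known tables."""
    if not patterns:
        return list(known_tables)
    compiled = [like_to_regex(p) for p in patterns]
    return [t for t in known_tables if any(rx.fullmatch(t) for rx in compiled)]
```

For example, `tables_for_from(["2005%"], ["200506", "200412"])` keeps only "200506". `fullmatch` is used because a LIKE pattern must cover the whole name, not just a prefix.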
The key word WHERE must be followed by a set of range conditions joined
by the logical operators 'AND', 'OR', 'XOR', and '!'. A range condition
can be one-sided, as in "A = 5" or "B > 10", or two-sided, as in "10 <= B < 20".
The supported operators are = (alternatively ==), <, <=, >, and >=.
Range conditions that involve only one attribute with constant bounds are
known as simple conditions, which IBIS can process very efficiently.
A range condition can also involve multiple attributes, such as
"A < B <= 5", or even arithmetic expressions, such as "sin(A) + fabs(B)
< sqrt(cx*cx+cy*cy)". Note that all one-argument and two-argument arithmetic
functions available in math.h are supported. The key word
WHERE and the conditions following it are essential to a query and
cannot be omitted.
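Conceptually, a bitmap index answers each simple condition with a bit vector holding one bit per row, and the logical operators then become bitwise operations. The Python sketch below uses plain integers as uncompressed bit vectors over a made-up tmpr column. With them it evaluates the condition of the sample query, a two-sided condition, and an XOR. FastBit's real bitmaps are compressed and read from index files, which this sketch does not model. It also covers only simple conditions. The helper names are hypothetical.

```python
def bitmap(values, predicate):
    """Uncompressed bit vector: bit i is set when predicate(values[i]) holds."""
    bits = 0
    for i, v in enumerate(values):
        if predicate(v):
            bits |= 1 << i
    return bits


def rows_of(bits):
    """Row numbers whose bit is set, in ascending order."""
    return [i for i in range(bits.bit_length()) if bits >> i & 1]


def demo():
    tmpr = [55, 60, -128, 55, 8, 60]                # hypothetical column values
    all_rows = (1 << len(tmpr)) - 1                  # needed to complement ('!')
    eq55 = bitmap(tmpr, lambda t: t == 55)
    eq60 = bitmap(tmpr, lambda t: t == 60)
    in_range = bitmap(tmpr, lambda t: 0 <= t < 60)   # two-sided: 0 <= tmpr < 60
    return {
        "not_nominal": rows_of(all_rows & ~(eq55 | eq60)),  # ! (tmpr == 55 || tmpr == 60)
        "and": rows_of(eq55 & in_range),                     # tmpr == 55 AND 0 <= tmpr < 60
        "xor": rows_of(eq55 ^ in_range),                     # tmpr == 55 XOR 0 <= tmpr < 60
    }
```

The complement needs the all-ones mask of the partition size. That is why negation is handled differently from AND, OR, and XOR, which combine two existing bit vectors directly.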
ibis::part::readTDC (Tue Jun 14 03:30:32 2005 UTC) --- failed to open file "/home/xhe/TRAC/tmp/dir1/-part.txt" -- No such file or directory
buildTables: directory /home/xhe/TRAC/tmp/dir1 does not contain a valid -part.txt file or contains an empty table.
Make sure the parameter Number_of_events has the correct value
Completed construction of an ibis::part named 200506.
1766540 records each with 36 attributes
./ibis: batch mode, log level 1
Tables:
200506
Query:
select mchn, tmpr where ! (tmpr == 55 or tmpr == 60)
doQuery (Tue Jun 14 03:31:13 2005 UTC) --- processing query ! (tmpr == 55 or tmpr == 60) on table 200506
query[LT55J4AkJu400000]::setWhereClause -- WHERE "! (tmpr == 55 or tmpr == 60)"
query[LT55J4AkJu400000]::setSelectClause -- SELECT mchn,tmpr
ibis::column[200506.Tmpr](INT)::readIndex -- the basic index was read (/home/xhe/TRAC/tmp/dir1/200506) in 0 sec(CPU), 0.00101352 sec(elapsed)
query[LT55J4AkJu400000]::estimate -- time to compute the bounds: 0.01 sec(CPU), 0.00476098 sec(elapsed).
query[LT55J4AkJu400000]::estimate -- # of hits for query "! (tmpr == 55 or tmpr == 60)" is 0
doQuery (Tue Jun 14 03:31:13 2005 UTC) --- number of hits in [0, 0]
query[LT55J4AkJu400000]::evaluate -- time to compute the 15744 hits: 0 sec(CPU), 0.000848532 sec(elapsed).
query[LT55J4AkJu400000]::evaluate -- user kwu SELECT mchn,tmpr FROM 200506 WHERE ! (tmpr == 55 or tmpr == 60) ==> 15744 hit(s).
doQuery (Tue Jun 14 03:31:13 2005 UTC) --- number of hits = 15744
doQuery:: evaluate(! (tmpr == 55 or tmpr == 60)) took 0.00626969 sec(elapsed)
ibis::part[200506]::getRIDs -- number of RIDs (0) does not match the size of the mask (1766540)
Tue Jun 14 03:31:13 2005 UTC
Warning -- query[LT55J4AkJu400000]::getRIDs -- got 0 row IDs from table 200506, expected 15744
ibis::part[200506]::getRIDs -- number of RIDs (0) does not match the size of the mask (1766540)
ibis::column[200506.MCHN](KEY)::readIndex -- the basic index was read (/home/xhe/TRAC/tmp/dir1/200506) in 0 sec(CPU), 0.000895739 sec(elapsed)
Query LT55J4AkJu400000 produces 32 distinct tuples of columns mchn,tmpr
MegaBACE # MB424 -128
MegaBACE # MB424 -120
MegaBACE # MB424 -112
MegaBACE # MB424 -104
MegaBACE # MB424 -96
MegaBACE # MB424 -88
MegaBACE # MB424 -80
MegaBACE # MB424 -72
MegaBACE # MB424 -64
MegaBACE # MB424 -56
MegaBACE # MB424 -48
MegaBACE # MB424 -40
MegaBACE # MB424 -32
MegaBACE # MB424 -24
MegaBACE # MB424 -16
MegaBACE # MB424 -8
MegaBACE # MB424 0
MegaBACE # MB424 8
MegaBACE # MB424 16
MegaBACE # MB424 24
MegaBACE # MB424 32
MegaBACE # MB424 40
MegaBACE # MB424 48
MegaBACE # MB424 56
MegaBACE # MB424 64
MegaBACE # MB424 72
MegaBACE # MB424 80
MegaBACE # MB424 88
MegaBACE # MB424 96
MegaBACE # MB424 104
MegaBACE # MB424 112
MegaBACE # MB424 120
Cleaning up table 200506
Cleaning up the file manager
Total pages accessed through read(unistd.h) is 145
The number of hits is printed in the following line:
query[LT55J4AkJu400000]::evaluate -- user ... ==> 15744 hit(s).
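When ibis is driven from a script, the hit count can be pulled out of the verbose log by matching the '==> N hit(s)' pattern in the line above. Below is a minimal Python sketch with a hypothetical helper name. The regular expression comes only from this sample output and may need adjusting for other versions of ibis.

```python
import re

# Pattern taken from the sample log line "... ==> 15744 hit(s)."
HITS_RE = re.compile(r"==>\s*(\d+)\s+hit\(s\)")


def hits_from_log(log_text: str) -> list:
    """Return every hit count reported by an '::evaluate' summary line."""
    return [int(m.group(1)) for m in HITS_RE.finditer(log_text)]
```

A log with several queries yields one count per query, in the order they were evaluated.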
The SELECT clause produced output with the following heading:
Query LT55J4AkJu400000 produces 32 distinct tuples of columns mchn,tmpr
In this particular case, it prints the machine name with the abnormal temperature, 'MegaBACE # MB424', and the abnormal temperature values, which incidentally all appear to be multiples of 8.
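The printed temperatures run from -128 to 120 in steps of 8. That gives (120 - (-128))/8 + 1 = 248/8 + 1 = 32 values, consistent with the 32 distinct tuples announced in the heading. The hypothetical Python helper below performs the same cross-check mechanically. It reads the tuple count from the heading, compares it with the number of printed rows, and tests each temperature for divisibility by 8. It assumes the temperature is the last whitespace-separated field on each row, as in this sample.

```python
import re

HEADING_RE = re.compile(r"produces (\d+) distinct tuples")


def summarize(heading: str, lines):
    """Compare the heading's tuple count with the printed rows and check
    whether every temperature (last field of each row) is a multiple of 8."""
    expected = int(HEADING_RE.search(heading).group(1))
    temps = [int(line.rsplit(maxsplit=1)[1]) for line in lines]
    return {
        "count_matches": expected == len(lines),
        "all_multiples_of_8": all(t % 8 == 0 for t in temps),
        "range": (min(temps), max(temps)),
    }
```

The rows can be regenerated with `[f"MegaBACE # MB424 {t}" for t in range(-128, 121, 8)]`. Passing them together with the heading line reports a matching count, all multiples of 8, and the range (-128, 120).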