Classes | Public Types | Public Member Functions | Static Public Member Functions | Protected Member Functions | Static Protected Member Functions | Protected Attributes | List of all members
ibis::index Class Referenceabstract

The base index class. More...

#include <index.h>

Inheritance diagram for ibis::index:
ibis::bin ibis::direkte ibis::keywords ibis::relic ibis::ambit ibis::bak ibis::bak2 ibis::egale ibis::fuge ibis::mesa ibis::pack ibis::pale ibis::range ibis::zone ibis::bylt ibis::fade ibis::fuzz ibis::skive ibis::zona

Classes

class  barrel
 A specialization that adds function setValue. More...
 
class  bitmapReader
 A simple container to hold the function pointer given by user for reading the serialized bitmaps. More...
 

Public Types

typedef std::map< double, uint32_t > histogram
 
enum  INDEX_TYPE {
  BINNING =0, RANGE, MESA, AMBIT,
  PALE, PACK, ZONE, RELIC,
  ROSTER, SKIVE, FADE, SBIAD,
  SAPID, EGALE, MOINS, ENTRE,
  BAK, BAK2, KEYWORDS, MESH,
  BAND, DIREKTE, GENERIC, BYLT,
  FUZZ, ZONA, FUGE, SLICE,
  EXTERN
}
 The integer values of this enum type are used in the index files to differentiate the indexes. More...
 
typedef std::map< double, ibis::bitvector * > VMap
 

Public Member Functions

void addBins (uint32_t ib, uint32_t ie, ibis::bitvector &res) const
 Add the sum of bits[ib] through bits[ie-1] to res. More...
 
void addBins (uint32_t ib, uint32_t ie, ibis::bitvector &res, const ibis::bitvector &tot) const
 Compute the sum of bit vectors [ib, ie). More...
 
virtual long append (const char *, const char *, uint32_t)
 Extend the index.
 
virtual void binBoundaries (std::vector< double > &) const
 The function binBoundaries and binWeights return bin boundaries and counts of each bin respectively. More...
 
virtual void binWeights (std::vector< uint32_t > &) const
 
virtual int contractRange (ibis::qContinuousRange &) const
 
virtual indexdup () const =0
 Duplicate the content of an index object.
 
bool empty () const
 The index object is considered empty if there is no bitmap or getNRows returns 0. More...
 
virtual void estimate (const ibis::qContinuousRange &, ibis::bitvector &lower, ibis::bitvector &upper) const
 Computes an approximation of hits as a pair of lower and upper bounds. More...
 
virtual uint32_t estimate (const ibis::qContinuousRange &) const
 Returns an upper bound on the number of hits.
 
virtual void estimate (const ibis::qDiscreteRange &expr, ibis::bitvector &lower, ibis::bitvector &upper) const
 Estimate the hits for discrete ranges, i.e., those translated from 'a IN (x, y, ..)'. More...
 
virtual uint32_t estimate (const ibis::qDiscreteRange &expr) const
 
virtual void estimate (const ibis::index &idx2, const ibis::deprecatedJoin &expr, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 Estimate the pairs for the range join operator.
 
virtual void estimate (const ibis::index &idx2, const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 Estimate the pairs for the range join operator. More...
 
virtual void estimate (const ibis::index &idx2, const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, const ibis::qRange *const range1, const ibis::qRange *const range2, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 
virtual int64_t estimate (const ibis::index &idx2, const ibis::deprecatedJoin &expr) const
 Estimate an upper bound for the number of pairs.
 
virtual int64_t estimate (const ibis::index &idx2, const ibis::deprecatedJoin &expr, const ibis::bitvector &mask) const
 Estimate an upper bound for the number of pairs produced from marked records. More...
 
virtual int64_t estimate (const ibis::index &idx2, const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, const ibis::qRange *const range1, const ibis::qRange *const range2) const
 
virtual void estimate (const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, const ibis::qRange *const range1, const ibis::qRange *const range2, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 Evaluating a join condition with one (likely composite) index.
 
virtual int64_t estimate (const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, const ibis::qRange *const range1, const ibis::qRange *const range2) const
 
virtual double estimateCost (const ibis::qContinuousRange &) const
 Estimate the cost of evaluating a range condition.
 
virtual double estimateCost (const ibis::qDiscreteRange &) const
 Estimate the cost of evaluating a range condition.
 
virtual long evaluate (const ibis::qContinuousRange &expr, ibis::bitvector &hits) const =0
 To evaluate the exact hits. More...
 
virtual long evaluate (const ibis::qDiscreteRange &, ibis::bitvector &) const
 To evaluate the exact hits. More...
 
virtual int expandRange (ibis::qContinuousRange &) const
 The functions expandRange and contractRange expands or contracts the boundaries of a range condition so that the new range will have exact answers using the function estimate. More...
 
virtual const ibis::bitvectorgetBitvector (uint32_t i) const
 Return a pointer to the ith bitvector used in the index (may be 0).
 
virtual long getCumulativeDistribution (std::vector< double > &bds, std::vector< uint32_t > &cts) const
 Cumulative distribution of the data. More...
 
virtual long getDistribution (std::vector< double > &bbs, std::vector< uint32_t > &cts) const
 Binned distribution of the data. More...
 
virtual double getMax () const
 The maximum value recorded in the index.
 
virtual double getMin () const
 The minimum value recorded in the index.
 
uint32_t getNRows () const
 Return the number of rows represented by this object.
 
virtual double getSum () const
 Compute the approximate sum of all the values indexed. More...
 
virtual const char * name () const =0
 Returns the name of the index, similar to the function type, but returns a string instead. More...
 
virtual uint32_t numBitvectors () const
 Returns the number of bit vectors used by the index.
 
virtual void print (std::ostream &out) const =0
 Prints human readable information. More...
 
virtual int read (const char *name)=0
 Reconstructs an index from the named file. More...
 
virtual int read (ibis::fileManager::storage *st)=0
 Reconstructs an index from an array of bytes. More...
 
virtual long select (const ibis::qContinuousRange &, void *) const =0
 Evaluate the range condition and select values.
 
virtual long select (const ibis::qContinuousRange &, void *, ibis::bitvector &) const =0
 Evaluate the range condition, select values, and record the positions.
 
virtual void serialSizes (uint64_t &, uint64_t &, uint64_t &) const =0
 Compute the size of arrays that would be generated by the serializatioin function (write). More...
 
float sizeInBytes () const
 Estiamte the size of this index object measured in bytes. More...
 
virtual void speedTest (std::ostream &) const
 Time some logical operations and print out their speed.
 
void sumBins (uint32_t ib, uint32_t ie, ibis::bitvector &res) const
 Sum up bits[ib:ie-1] and place the result in res. More...
 
void sumBins (uint32_t ib, uint32_t ie, ibis::bitvector &res, uint32_t ib0, uint32_t ie0) const
 Compute a new sum for bit vectors [ib, ie) by taking advantage of the old sum for bitvectors [ib0, ie0). More...
 
void sumBins (uint32_t ib, uint32_t ie, ibis::bitvector &res, uint32_t *buf) const
 Sum up bits[ib:ie-1] and place the result in res. More...
 
void sumBins (const ibis::array_t< uint32_t > &, ibis::bitvector &) const
 Sum up the bits in in the specified bins.
 
virtual INDEX_TYPE type () const =0
 Returns an index type identifier.
 
virtual float undecidable (const ibis::qContinuousRange &expr, ibis::bitvector &iffy) const
 Mark the position of the rows that can not be decided with this index. More...
 
virtual float undecidable (const ibis::qDiscreteRange &expr, ibis::bitvector &iffy) const
 
virtual int write (const char *name) const =0
 Save index to a file. More...
 
virtual int write (ibis::array_t< double > &, ibis::array_t< int64_t > &, ibis::array_t< uint32_t > &) const =0
 Save index to three arrays. Serialize the index in memory.
 
virtual ~index ()
 The destructor.
 

Static Public Member Functions

static void addBits (const array_t< bitvector * > &bits, uint32_t ib, uint32_t ie, ibis::bitvector &res)
 Add the pile[ib:ie-1] to res. More...
 
static indexcreate (const column *c, const char *name=0, const char *spec=0, int inEntirety=0)
 Index factory. More...
 
static void divideCounts (array_t< uint32_t > &bounds, const array_t< uint32_t > &cnt)
 Determine how to split the array cnt, so that each group has roughly the same total value. More...
 
static bool isIndex (const char *f, INDEX_TYPE t)
 Is the named file an index file? Read the header of the named file to determine if it contains an index of the specified type. More...
 
template<typename E >
static void mapValues (const array_t< E > &val, VMap &bmap)
 
template<typename E >
static void mapValues (const array_t< E > &val, histogram &hist, uint32_t count=0)
 
template<typename E >
static void mapValues (const array_t< E > &val, array_t< E > &bounds, std::vector< uint32_t > &cnts)
 
template<typename E1 , typename E2 >
static void mapValues (const array_t< E1 > &val1, const array_t< E2 > &val2, array_t< E1 > &bnd1, array_t< E2 > &bnd2, std::vector< uint32_t > &cnts)
 Compute a two-dimensional histogram. More...
 
static void printHeader (std::ostream &, const char *)
 
static void setBases (array_t< uint32_t > &bases, uint32_t card, uint32_t nbase=2)
 Fill the array bases with the values that cover the range [0, card). More...
 
static void sumBits (const array_t< bitvector * > &bits, uint32_t ib, uint32_t ie, ibis::bitvector &res)
 Sum up pile[ib:ie-1] and place the result in res. More...
 
static void sumBits (const array_t< bitvector * > &bits, const ibis::bitvector &tot, uint32_t ib, uint32_t ie, ibis::bitvector &res)
 Sum up pile[ib:ie-1] and add the result to res. More...
 

Protected Member Functions

virtual void activate () const
 Regenerate all bitvectors from the underlying storage. More...
 
virtual void activate (uint32_t i) const
 Regenerate the ith bitvector from the underlying storage.
 
virtual void activate (uint32_t i, uint32_t j) const
 Regenerate bitvectors i (inclusive) through j (exclusive) from the underlying storage. More...
 
virtual void clear ()
 Clear the existing content. More...
 
void computeMinMax (const char *f, double &min, double &max) const
 
void dataFileName (std::string &name, const char *f=0) const
 Generate data file name from "f". More...
 
virtual size_t getSerialSize () const throw ()
 Compute the size of the serialized version of the index. More...
 
 index (const ibis::column *c=0)
 Default constructor. More...
 
 index (const ibis::column *c, ibis::fileManager::storage *s)
 Constructor with a storage object. More...
 
 index (const index &)
 Copy constructor.
 
void indexFileName (std::string &name, const char *f=0) const
 Generates index file name from "f". More...
 
void initBitmaps (int fdes)
 Prepare the bitmaps using the given file descriptor. More...
 
void initBitmaps (ibis::fileManager::storage *st)
 Prepare bitmaps from the given storage object. More...
 
void initBitmaps (uint32_t *st)
 Prepare bitmaps from the given raw pointer. More...
 
void initBitmaps (void *ctx, FastBitReadBitmaps rd)
 Prepare bitmaps from the user provided function pointer and context. More...
 
int initOffsets (int64_t *, size_t)
 Initialize the offsets from the given data array. More...
 
int initOffsets (int fdes, const char offsize, size_t start, uint32_t nobs)
 Read in the offset array. More...
 
int initOffsets (ibis::fileManager::storage *st, size_t start, uint32_t nobs)
 Regenerate the offsets array from the given storage object. More...
 
void mapValues (const char *f, VMap &bmap) const
 Map the positions of each individual value. More...
 
void mapValues (const char *f, histogram &hist, uint32_t count=0) const
 Generate a histogram. More...
 
indexoperator= (const index &)
 Assignment operator.
 
void optionalUnpack (array_t< ibis::bitvector * > &bits, const char *opt)
 A function to decide whether to uncompress the bitvectors. More...
 

Static Protected Member Functions

static void indexFileName (std::string &name, const ibis::column *col1, const ibis::column *col2, const char *f=0)
 Generate the index file name for the composite index fromed on two columns. More...
 

Protected Attributes

array_t< ibis::bitvector * > bits
 A list of bitvectors.
 
bitmapReaderbreader
 The functor to read serialized bitmaps from a more complex source.
 
const ibis::columncol
 Pointer to the column this index is for.
 
const char * fname
 The name of the file containing the index.
 
uint32_t nrows
 The number of rows represented by the index. More...
 
array_t< int32_t > offset32
 Starting positions of the bitvectors.
 
array_t< int64_t > offset64
 Starting positions of the bitvectors. More...
 
ibis::fileManager::storagestr
 The underlying storage. More...
 

Detailed Description

The base index class.

Class ibis::index contains the common definitions and virtual functions of the class hierarchy. It is assumed that an ibis::index is for only one column. The user is to create an new index through the function ibis::index::create and only use the functions defined in this class.

Member Enumeration Documentation

The integer values of this enum type are used in the index files to differentiate the indexes.

Enumerator
BINNING 

ibis::bin.

Fix this as 0 so that the index type indicator will be known all versions of the program. This should ensure the index files from different version of FastBit code are recognized correctly.

RANGE 

ibis::range.

MESA 

ibis::interval.

AMBIT 

ibis::ambit, range-range two level encoding on bins.

PALE 

ibis::pale, equality-range encoding on bins.

PACK 

ibis::pack, range-equality encoding on bins.

ZONE 

ibis::zone, equality-equality encoding on bins.

RELIC 

ibis::relic, the basic bitmap index.

ROSTER 

ibis::roster, RID list.

SKIVE 

ibis::skive, binary encoding with recoding of key values.

FADE 

ibis::fade, multicomponent range encoding (unbinned).

SBIAD 

ibis::sbiad, multicomponent interval encoding (unbinned).

SAPID 

ibis::sapid, multicomponent equality encoding (unbinned).

EGALE 

ibis::egale, multicomponent equality encoding on bins.

MOINS 

ibis::moins, multicomponent range encoding on bins.

ENTRE 

ibis::entre, multicomponent interval encoding on bins.

BAK 

ibis::bak, reduced precision mapping, equality code.

BAK2 

ibis::bak2, splits each BAK bin in three, one less than the mapped value, one greater than the mapped value, and one equal to the mapped value.

This is used to implement the low-precision binning scheme.

KEYWORDS 

ibis::keywords, boolean term-document matrix.

MESH 

not used.

BAND 

not used.

DIREKTE 

ibis::direkte, hash value to bitmaps.

GENERIC 

not used.

BYLT 

ibis::bylt, unbinned range-equality encoding.

FUZZ 

ibis::fuzz, unbinned interval-equality encoding.

ZONA 

ibis::zona, unbinned equality-equality encoding.

FUGE 

ibis::fuge, binned interval-equality encoding.

SLICE 

ibis::slice, bit-sliced index.

EXTERN 

externally defined index.

Constructor & Destructor Documentation

ibis::index::index ( const ibis::column c = 0)
inlineprotected

Default constructor.

Protect the constructor so that ibis::index can not be instantiated directly. Protecting it also reduces the size of public interface.

ibis::index::index ( const ibis::column c,
ibis::fileManager::storage s 
)
protected

Constructor with a storage object.

Both the column object and the storage object are expected to be valid. However, this function only make uses of the storage object.

References ibis::fileManager::storage::begin(), col, ibis::column::fullname(), and nrows.

Member Function Documentation

void ibis::index::activate ( ) const
protectedvirtual
void ibis::index::activate ( uint32_t  i,
uint32_t  j 
) const
protectedvirtual

Regenerate bitvectors i (inclusive) through j (exclusive) from the underlying storage.

References ibis::array_t< T >::read(), and UnixOpen.

void ibis::index::addBins ( uint32_t  ib,
uint32_t  ie,
ibis::bitvector res 
) const

Add the sum of bits[ib] through bits[ie-1] to res.

Always explicitly use bits[ib] through bits[ie-1]. The most important difference between this function and sumBins is that this function always use bits[ib] through bits[ie-1]. This is similar to the function addBits.

References ibis::bitvector::adjustSize(), ibis::bitvector::bitsPerLiteral(), ibis::bitvector::bytes(), ibis::bitvector::cnt(), ibis::bitvector::copy(), ibis::horometer::CPUTime(), ibis::bitvector::decompress(), ibis::horometer::realTime(), ibis::bitvector::set(), ibis::bitvector::size(), ibis::horometer::start(), and ibis::horometer::stop().

void ibis::index::addBins ( uint32_t  ib,
uint32_t  ie,
ibis::bitvector res,
const ibis::bitvector tot 
) const

Compute the sum of bit vectors [ib, ie).

If computing a complement is faster, assume all bit vectors add up to tot. This is basically a copy of the function sumBins (without the 4th argument). There are two changes: (1) if res has the same number of bits as tot, the new sum is added to the existing bitvector, and (2) when it computes the sum through complements, it performs a subtraction from tot.

References ibis::bitvector::adjustSize(), ibis::bitvector::bitsPerLiteral(), ibis::bitvector::bytes(), ibis::bitvector::cnt(), ibis::bitvector::copy(), ibis::horometer::CPUTime(), ibis::bitvector::decompress(), ibis::horometer::realTime(), ibis::bitvector::set(), ibis::bitvector::size(), ibis::horometer::start(), and ibis::horometer::stop().

void ibis::index::addBits ( const array_t< bitvector * > &  pile,
uint32_t  ib,
uint32_t  ie,
ibis::bitvector res 
)
static

Add the pile[ib:ie-1] to res.

This function always use bitvectors pile[ib] through pile[ie-1] and expects the caller to have filled these bitvectors already.

Note
The caller need to activate the required bit vectors! In other words, pile[ib:ie-1] must be in memory.
If pile[i] is a null pointer, it is skipped, which is equivalent to it being a bitvector of all 0s.

References ibis::bitvector::bytes(), ibis::bitvector::cnt(), ibis::bitvector::compress(), ibis::bitvector::copy(), ibis::horometer::CPUTime(), ibis::bitvector::decompress(), ibis::util::logMessage(), ibis::horometer::realTime(), ibis::bitvector::set(), ibis::array_t< T >::size(), ibis::bitvector::size(), ibis::horometer::start(), and ibis::horometer::stop().

virtual void ibis::index::binBoundaries ( std::vector< double > &  ) const
inlinevirtual

The function binBoundaries and binWeights return bin boundaries and counts of each bin respectively.

Reimplemented in ibis::bak2, ibis::bak, ibis::egale, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, ibis::range, ibis::bin, ibis::relic, ibis::keywords, and ibis::direkte.

Referenced by ibis::part::coarsenBins(), and ibis::part::get2DDistributionI().

void ibis::index::clear ( )
protectedvirtual
ibis::index * ibis::index::create ( const column c,
const char *  dfname = 0,
const char *  spec = 0,
int  readopt = 0 
)
static

Index factory.

It creates a specific concrete index object. It attempts to read the existing index if a location is specified. If it fails to read an index or no expilicit location is given, it attempts to create a new index based on the current data file and index specification. Any newly created index will be written to a file.

Parameters
ca pointer to a ibis::column object. This argument must be present.
dfnamedata file name, may also be the name of the index file, or the directory containing the data file. If the name ends with '.idx' is treated as an index file, and the content of the file is read. If the name does not end with '.idx', it is assumed to be the data file name, unless it is determined to be a directory name. If it is a directory name, the data file is assumed to be in the specified directory with the same name as the column. Once a data file is found, the content of the data file is read to construct a new index according to the return value of function indexSpec. The argument dfname can be nil, in which case, the data file name is constructed by concatenate the return from partition()->currentDataDir() and the column name.
specthe index specification. This string contains the parameters for how to create an index. The most general form is
*  <binning .../> <encoding .../> <compression .../>.
* 
Here is one example (it is the default for some integer columns)
*  <binning none /> <encoding equality />
* 
FastBit always compresses every bitmap it ever generates. The compression option is to instruct it to uncompress some bitmaps or not compress indexes at all. The compress option is usually not used.

If the argument spec is not specified, this function checks the specification in the following order.

– use the index specification for the column being indexed; – use the index specification for the table containing the column being indexed; – use the most specific index specification relates to the column be indexed in the global resources (gParameters).

It stops looking as soon as it finds the first non-empty string, which follows the principle of a more specific index specification override a general specification.

Parameters
readoptDepending on whether this value is positive, zero or negative, the index is read from the index file in three different ways.

(1) If this value is positive, the content of the index file is read into memory and there is no need for further I/O operation to use the index.

(2) If this value is zero, the content of a large index file is loaded into memory through memory map and the content of a small index file will be read into memory. This is the default option.

(3) If this value is negative, then only the metadata is read into memory. This option requires the least amount of memory, but requires more I/O operations later when bitmaps are needed to answer queries. To use option (1) and (2), there must be enough memory to hold the index file in memory. Furthermore, to use the memory map option, FastBit must be able to hold the index file open indefinitely and the operating system must support memory map function mmap.

These three options have different start up cost and different query processing cost. Typically, reading the whole file in this function will take the longest time, but since it requires no further I/O operations, its query processing cost is likely the lowest. The memory map option only need to load the page table into memory and read part of the metadata, it is likely to be relatively inexpensive to reconstruct the index object this way. Since the memory map option can read the necessary portion of the index file into memory pretty efficiently, the query processing cost should have reasonable performance. The third of reading metadata only in this function requires the least amount of memory, but it might actually read more bytes in this function than the memory map option because this option actually needs to read all the bytes representing the metadata while the memory map option only need to create a memory map for the index file. Later when the index is used, the needed bitmaps are read into memory, which is likely take more time than accessing the memory mapped bitmaps. Additionally, the third option also causes the bitmaps to be placed in unpredictable memory locations, while the first two options place all bitmaps of an index consecutively in memory. This difference in memory layout could cause the memory accesses to take different amounts of time; typically, accessing consecutive memory locations is more efficient.

The default value of readopt is 0, which prefers the memory map option.

Returns
This function returns a pointer to the index created. The caller is responsible for freeing the pointer. In case of error, it returns a nil pointer. It captures and absorbs exceptions in most cases.
Note
An index can NOT be built correctly if it does not fit in memory! This is the most likely reason for the function to fail. If this does happen, try to build indexes one at a time, use a machine with more memory, or break up a large partition into a number of smaller ones. Normally, we recommand one to not put more than 100 million rows in a data partition.
Set dfname to null to build a brand new index and discard the existing index.
The index specification passed to this function will be attached to the column object if a new index is to be built. This is the only possible change to the column object.

References ibis::fileManager::storage::begin(), ibis::BIT, ibis::BLOB, ibis::horometer::CPUTime(), ibis::part::currentDataDir(), ibis::column::dataFileName(), ibis::fileManager::flushFile(), ibis::column::fullname(), getNRows(), ibis::gParameters(), indexFileName(), ibis::part::indexSpec(), ibis::fileManager::instance(), ibis::resource::isTrue(), ibis::part::name(), ibis::column::name(), name(), ibis::part::nRows(), print(), ibis::column::purgeIndexFile(), ibis::horometer::realTime(), ibis::horometer::start(), ibis::horometer::stop(), ibis::fileManager::tryGetFile(), ibis::column::type(), UnixOpen, ibis::UNKNOWN_TYPE, and write().

Referenced by ibis::column::append(), ibis::column::loadIndex(), and ibis::part::loadIndexes().

void ibis::index::dataFileName ( std::string &  iname,
const char *  f = 0 
) const
protected

Generate data file name from "f".

Invokes ibis::column::dataFileName to do the actual work.

Referenced by ibis::direkte::direkte(), and ibis::keywords::keywords().

void ibis::index::divideCounts ( array_t< uint32_t > &  bdry,
const array_t< uint32_t > &  cnt 
)
static

Determine how to split the array cnt, so that each group has roughly the same total value.

Note
The array bdry stores the dividers. The first group is [0, bdry[0]), the second group is [bdry[0], bdry[1]), and so on. Ths function uses the size of array bdry to determine the number of groups to use.

References ibis::array_t< T >::clear(), ibis::array_t< T >::push_back(), ibis::array_t< T >::resize(), and ibis::array_t< T >::size().

Referenced by ibis::part::adaptive2DBins(), ibis::part::adaptive3DBins(), ibis::part::adaptiveFloats(), ibis::part::adaptiveFloatsDetailed(), ibis::part::adaptiveInts(), ibis::part::adaptiveIntsDetailed(), ibis::part::coarsenBins(), ibis::part::equalWeightBins(), ibis::part::get2DDistributionI(), ibis::part::getCumulativeDistribution(), and ibis::part::getDistribution().

bool ibis::index::empty ( ) const
inline

The index object is considered empty if there is no bitmap or getNRows returns 0.

References bits, and nrows.

Referenced by ibis::column::indexLock::indexLock().

virtual void ibis::index::estimate ( const ibis::qContinuousRange ,
ibis::bitvector lower,
ibis::bitvector upper 
) const
inlinevirtual

Computes an approximation of hits as a pair of lower and upper bounds.

Parameters
exprthe query expression to be evaluated.
lowera bitvector marking a subset of the hits. All rows marked with one (1) are definitely hits.
uppera bitvector marking a superset of the hits. All hits are marked with one, but some of the rows marked one may not be hits. If the variable upper is empty, the variable lower is assumed to contain the exact answer.

Reimplemented in ibis::entre, ibis::moins, ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, ibis::range, ibis::skive, ibis::keywords, ibis::bin, ibis::relic, and ibis::direkte.

References nrows, and ibis::bitvector::set().

void ibis::index::estimate ( const ibis::qDiscreteRange expr,
ibis::bitvector lower,
ibis::bitvector upper 
) const
virtual

Estimate the hits for discrete ranges, i.e., those translated from 'a IN (x, y, ..)'.

A trivial implementation to indicate the index can not determine any row.

Reimplemented in ibis::relic, and ibis::direkte.

References ibis::qDiscreteRange::colName(), and ibis::bitvector::set().

void ibis::index::estimate ( const ibis::index idx2,
const ibis::deprecatedJoin expr,
const ibis::bitvector mask,
ibis::bitvector64 lower,
ibis::bitvector64 upper 
) const
virtual

Estimate the pairs for the range join operator.

Only records that are masked are evaluated.

References ibis::bitvector64::clear(), ibis::util::outerProduct(), and ibis::bitvector64::set().

int64_t ibis::index::estimate ( const ibis::index idx2,
const ibis::deprecatedJoin expr,
const ibis::bitvector mask 
) const
virtual

Estimate an upper bound for the number of pairs produced from marked records.

References ibis::bitvector::cnt().

virtual long ibis::index::evaluate ( const ibis::qContinuousRange expr,
ibis::bitvector hits 
) const
pure virtual
virtual long ibis::index::evaluate ( const ibis::qDiscreteRange ,
ibis::bitvector  
) const
inlinevirtual

To evaluate the exact hits.

On success, return the number of hits, otherwise a negative value is returned.

Reimplemented in ibis::entre, ibis::moins, ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, ibis::range, ibis::sapid, ibis::sbiad, ibis::fade, ibis::skive, ibis::bin, ibis::relic, and ibis::direkte.

virtual int ibis::index::expandRange ( ibis::qContinuousRange ) const
inlinevirtual

The functions expandRange and contractRange expands or contracts the boundaries of a range condition so that the new range will have exact answers using the function estimate.

The default implementation provided does nothing since this is only meaningful for indices based on bins.

Reimplemented in ibis::bak2, ibis::bak, ibis::range, and ibis::bin.

long ibis::index::getCumulativeDistribution ( std::vector< double > &  bds,
std::vector< uint32_t > &  cts 
) const
virtual

Cumulative distribution of the data.

A brute-force approach to get an accurate cumulative distribution.

Reimplemented in ibis::bin, ibis::relic, and ibis::direkte.

References ibis::util::compactValue().

long ibis::index::getDistribution ( std::vector< double > &  bbs,
std::vector< uint32_t > &  cts 
) const
virtual

Binned distribution of the data.

A brute-force approach to get an accurate distribution.

Reimplemented in ibis::bin, ibis::relic, and ibis::direkte.

size_t ibis::index::getSerialSize ( ) const
throw (
)
protectedvirtual

Compute the size of the serialized version of the index.

This the fallback implementation which always returns 0.

Reimplemented in ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::zona, ibis::mesa, ibis::bylt, ibis::range, ibis::fuzz, ibis::fade, ibis::skive, ibis::bin, ibis::relic, ibis::keywords, and ibis::direkte.

virtual double ibis::index::getSum ( ) const
inlinevirtual

Compute the approximate sum of all the values indexed.

If it decides that computing the sum directly from the vertical partition is more efficient, it will return NaN immediately.

Reimplemented in ibis::entre, ibis::moins, ibis::egale, ibis::pack, ibis::ambit, ibis::mesa, ibis::range, ibis::fade, ibis::skive, ibis::bin, ibis::relic, ibis::keywords, and ibis::direkte.

void ibis::index::indexFileName ( std::string &  iname,
const char *  f = 0 
) const
protected

Generates index file name from "f".

Invokes ibis::column::dataFileName to do most of the work.

Referenced by create().

void ibis::index::indexFileName ( std::string &  iname,
const ibis::column col1,
const ibis::column col2,
const char *  dir = 0 
)
staticprotected

Generate the index file name for the composite index fromed on two columns.

May use argument "dir" if it is not null.

References ibis::part::currentDataDir(), and ibis::column::name().

void ibis::index::initBitmaps ( int  fdes)
protected

Prepare the bitmaps using the given file descriptor.

It clears the existing content of the array bits and resize the array to have nobs elements. It reconstructs all the bitmaps if the file name (fname) is not a valid pointer. It reads the first bitmap if the compiler macro FASTBIT_READ_BITVECTOR0 is defined. It is to be used by the constructors of a concrete index classes after initOffsets has been called.

References ibis::bitvector::set(), ibis::bitvector::size(), and ibis::bitvector::sloppySize().

Referenced by ibis::bin::bin(), and ibis::relic::relic().

void ibis::index::initBitmaps ( ibis::fileManager::storage st)
protected

Prepare bitmaps from the given storage object.

Used by constructors to initialize the array bits based on the content of offset32 and offset64. The member variable nrows is expected to be set to the correct value as well.

References ibis::fileManager::storage::isFileMap(), ibis::bitvector::set(), ibis::bitvector::size(), and ibis::bitvector::sloppySize().

void ibis::index::initBitmaps ( uint32_t *  st)
protected

Prepare bitmaps from the given raw pointer.

Used by constructors to initialize the array bits after the content of offset32 and offset64 have been initialized correctly. It expects all bitmaps are serialized and packed into this single array.

The member variable nrows is expected to be set to the correct value as well.

References ibis::bitvector::size(), and ibis::bitvector::sloppySize().

void ibis::index::initBitmaps ( void *  ctx,
FastBitReadBitmaps  rd 
)
protected

Prepare bitmaps from the user provided function pointer and context.

This is intended for reading serialized bitmaps placed in a more complex setting, however, we still view the content as if it is written as 1-D array.

int ibis::index::initOffsets ( int64_t *  buf,
size_t  noffsets 
)
protected

Initialize the offsets from the given data array.

Note
The incoming argument buf is from the write function that stores offsets in the number 4-byte words and need to translated into offsets in bytes.

References ibis::array_t< T >::print().

Referenced by ibis::bin::bin(), and ibis::relic::relic().

int ibis::index::initOffsets ( int  fdes,
const char  offsize,
size_t  start,
uint32_t  nobs 
)
protected

Read in the offset array.

The offset size has been read by the caller and so has the number of bitmaps to expect.

int ibis::index::initOffsets ( ibis::fileManager::storage st,
size_t  start,
uint32_t  nobs 
)
protected

Regenerate the offsets array from the given storage object.

It determines the size of offsets based on the 7th bytes in the storage object, and the offset size is expected to be either 4-byte or 8-byte. Unexpected offset size will cause an exception to be raised. It is to be used by the constructors of a concrete index classes.

References ibis::fileManager::storage::begin().

bool ibis::index::isIndex ( const char *  f,
INDEX_TYPE  t 
)
static

Is the named file an index file? Read the header of the named file to determine if it contains an index of the specified type.

Returns true if the correct header is found, else return false.

References ibis::util::logMessage(), and UnixOpen.

Referenced by ibis::bak::read(), and ibis::bak2::read().

template<typename E1 , typename E2 >
void ibis::index::mapValues ( const array_t< E1 > &  val1,
const array_t< E2 > &  val2,
array_t< E1 > &  bnd1,
array_t< E2 > &  bnd2,
std::vector< uint32_t > &  cnts 
)
static

Compute a two-dimensional histogram.

Given two arrays of the same size, count the number of appearance of each combinations defined by bnd1 and bnd2. If the arrays bnd1 or bnd2 contain values in ascending order, their values are directly used in this function. The empty one will be replaced by a linear division of actual range into 256 bins. The array counts stores the 2-D bins in raster-scan order with the second variable, val2, as the faster varying variables. More specifically, the bins for variable 1 are defined as:

/// (..., bnd1[0]) [bnd1[1], bin1[2]) [bnd1[2], bin1[3) ... [bnd1.back(), ...)
/// 

Note that '[' denote the left boundary is inclusive and ')' denote the right boundary is exclusive. Similarly, the bins for variable 2 are

/// (..., bnd2[0]) [bnd2[1], bin2[2]) [bnd2[2], bin2[3) ... [bnd2.back(), ...)
/// 

The counts are for the following bins

/// (..., bin1[0]) (.... bin2[0])
/// (..., bin1[0]) [bin2[0], bin2[1])
/// (..., bin1[0]) [bin2[1], bin2[2])
/// ...
/// (..., bin1[0]) [bin2.back(), ...)
/// [bin1[0], bin1[1]) (..., bin2[0])
/// [bin1[0], bin1[1]) [bin2[0], bin2[1])
/// [bin1[0], bin1[1]) [bin2[1], bin2[2])
/// ...
/// [bin1[0], bin1[1]) [bin2.back(), ...)
/// ...
/// 

References ibis::array_t< T >::find(), ibis::array_t< T >::push_back(), ibis::array_t< T >::reserve(), and ibis::array_t< T >::size().

void ibis::index::mapValues ( const char *  f,
VMap &  bmap 
) const
protected

Map the positions of each individual value.

Map the locations of the values of one column.

Given a file containing the values of a column, this function maps the position of each individual values and stores the result in a set of bitmaps.

Note
IMPROTANT ASSUMPTION. A value of any supported type is supposed to be able to fit in a double with no rounding, no approximation and no overflow.

References ibis::bitvector::adjustSize(), ibis::bitvector::bitsPerLiteral(), ibis::CATEGORY, ibis::DOUBLE, ibis::FLOAT, ibis::fileManager::getFile(), ibis::util::getFileSize(), ibis::bitvector::indexSet::indices(), ibis::fileManager::instance(), ibis::INT, ibis::bitvector::indexSet::isRange(), ibis::LONG, ibis::bitvector::indexSet::nIndices(), ibis::horometer::realTime(), ibis::bitvector::set(), ibis::bitvector::setBit(), ibis::SHORT, ibis::array_t< T >::size(), ibis::bitvector::size(), ibis::horometer::start(), ibis::horometer::stop(), ibis::TEXT, ibis::TYPESTRING, ibis::UBYTE, ibis::UINT, ibis::ULONG, and ibis::USHORT.

void ibis::index::mapValues ( const char *  f,
histogram &  hist,
uint32_t  count = 0 
) const
protected

Generate a histogram.

Compute a histogram of a column.

Given a property file containing the values of a column, this function counts the occurances of each distinct values. Argument count is the number of samples to be used for building the histogram. If it is zero or greater than half of the number of values in the data files, all values are used, otherwise, approximately count values will be sampled with nearly uniform distances from each other.

Note
IMPROTANT ASSUMPTION. A value of any supported type is supposed to be able to fit in a double with no rounding, no approximation and no overflow.

References ibis::bitvector::adjustSize(), ibis::bitvector::appendFill(), ibis::bitvector::bitsPerLiteral(), ibis::CATEGORY, ibis::DOUBLE, ibis::FLOAT, ibis::fileManager::getFile(), ibis::bitvector::indexSet::indices(), ibis::fileManager::instance(), ibis::INT, ibis::bitvector::indexSet::isRange(), ibis::LONG, ibis::bitvector::indexSet::nIndices(), ibis::util::rand(), ibis::horometer::realTime(), ibis::SHORT, ibis::array_t< T >::size(), ibis::bitvector::size(), ibis::horometer::start(), ibis::horometer::stop(), ibis::TEXT, ibis::UBYTE, ibis::UINT, ibis::ULONG, and ibis::USHORT.

virtual const char* ibis::index::name ( ) const
pure virtual
void ibis::index::optionalUnpack ( array_t< ibis::bitvector * > &  bits,
const char *  opt 
)
protected

A function to decide whether to uncompress the bitvectors.

Decide whether to uncompress the bitmaps.

References ibis::gParameters(), and ibis::array_t< T >::size().

Referenced by ibis::bin::bin(), ibis::keywords::keywords(), ibis::mesa::mesa(), and ibis::range::range().

virtual void ibis::index::print ( std::ostream &  out) const
pure virtual
virtual int ibis::index::read ( const char *  name)
pure virtual

Reconstructs an index from the named file.

The name can be the directory containing an index file. In this case, the name of the index file must be the name of the column followed by ".idx" suffix.

Implemented in ibis::bak2, ibis::bak, ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::zona, ibis::bylt, ibis::fuzz, ibis::range, ibis::fade, ibis::skive, ibis::keywords, ibis::direkte, ibis::bin, and ibis::relic.

virtual int ibis::index::read ( ibis::fileManager::storage st)
pure virtual

Reconstructs an index from an array of bytes.

Intended for internal use only!

Implemented in ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::zona, ibis::bylt, ibis::fuzz, ibis::range, ibis::fade, ibis::skive, ibis::keywords, ibis::direkte, ibis::bin, and ibis::relic.

virtual void ibis::index::serialSizes ( uint64_t &  ,
uint64_t &  ,
uint64_t &   
) const
pure virtual

Compute the size of arrays that would be generated by the serializatioin function (write).

Implemented in ibis::keywords, ibis::direkte, ibis::bin, and ibis::relic.

void ibis::index::setBases ( array_t< uint32_t > &  bases,
uint32_t  card,
uint32_t  ncomp = 2 
)
static

Fill the array bases with the values that cover the range [0, card).

Assumes at least two components. Since the base size of each component can not be less two, the maximum number of components could be used is to have each component uses base size two. If the input argument ncomp is larger than ceiling(log_2(card)), the return array bases shall have ceiling(log_2(card)) elements.

References ibis::array_t< T >::push_back(), ibis::array_t< T >::reserve(), ibis::array_t< T >::resize(), and ibis::array_t< T >::size().

Referenced by ibis::egale::egale().

float ibis::index::sizeInBytes ( ) const

Estiamte the size of this index object measured in bytes.

Do not intend to be precise, but should be good enough for operations such as comparing index size against base data size to determine which operation to use for answering a query.

References ibis::util::getFileSize().

void ibis::index::sumBins ( uint32_t  ib,
uint32_t  ie,
ibis::bitvector res 
) const

Sum up bits[ib:ie-1] and place the result in res.

The bitmaps (bits) are held by this index object and may be regenerated as needed. It uses the combined strategy that was determined in an series of earlier tests. The basic rule is as follows.

  • If there are two or less bit vectors, use |= operator directly.
  • Compute the total size of the bitmaps involved.
  • If the total size times log(number of bitvectors involved) is less than the size of an uncompressed bitvector, use a priority queue to store all input bitvectors and intermediate solutions,
  • or else, decompress the first bitvector and use inplace bitwise OR operator to complete the operations.

References ibis::bitvector::bitsPerLiteral(), ibis::bitvector::bytes(), ibis::bitvector::cnt(), ibis::bitvector::copy(), ibis::horometer::CPUTime(), ibis::bitvector::decompress(), ibis::bitvector::flip(), ibis::util::logMessage(), ibis::horometer::realTime(), ibis::bitvector::set(), ibis::bitvector::size(), ibis::horometer::start(), and ibis::horometer::stop().

Referenced by ibis::category::patternSearch().

void ibis::index::sumBins ( uint32_t  ib,
uint32_t  ie,
ibis::bitvector res,
uint32_t  ib0,
uint32_t  ie0 
) const

Compute a new sum for bit vectors [ib, ie) by taking advantage of the old sum for bitvectors [ib0, ie0).

This function attempts to take advantage of existing results of a previously computed sum.

  • On input, res = sum_{i=ib0}^{ie0-1} bits[i].
  • On exit, res = sum_{i=ib}^{ie-1} bits[i].

References ibis::bitvector::bytes(), ibis::bitvector::cnt(), and ibis::bitvector::size().

void ibis::index::sumBins ( uint32_t  ib,
uint32_t  ie,
ibis::bitvector res,
uint32_t *  buf 
) const

Sum up bits[ib:ie-1] and place the result in res.

The bitmaps (bits) are stored in the argument buf and have to be regenerated based on the information in offset64. The basic rule is as follows.

  • If there are two or less bit vectors, use |= operator directly.
  • Compute the total size of the bitmaps involved.
  • If the total size times log(number of bitvectors involved) is less than the size of an uncompressed bitvector, use a priority queue to store all input bitvectors and intermediate solutions,
  • or else, decompress the first bitvector and use inplace bitwise OR operator to complete the operations.

References ibis::bitvector::bitsPerLiteral(), ibis::bitvector::bytes(), ibis::bitvector::clear(), ibis::bitvector::cnt(), ibis::bitvector::copy(), ibis::horometer::CPUTime(), ibis::bitvector::decompress(), ibis::horometer::realTime(), ibis::bitvector::size(), ibis::horometer::start(), and ibis::horometer::stop().

void ibis::index::sumBits ( const array_t< bitvector * > &  pile,
uint32_t  ib,
uint32_t  ie,
ibis::bitvector res 
)
static

Sum up pile[ib:ie-1] and place the result in res.

Note
This function either uses pile[ib:ie-1] or pile[0:ib-1] and pile[ie:nobs-1] depending which set has more bit vectors! This requires the caller to determine with set of them are to be used and then load them to memory before calling this function.
This function always uses the operator |=. Tests show that using the function setBit is always slower.

References ibis::bitvector::bitsPerLiteral(), ibis::bitvector::bytes(), ibis::bitvector::cnt(), ibis::bitvector::compress(), ibis::bitvector::copy(), ibis::horometer::CPUTime(), ibis::bitvector::decompress(), ibis::bitvector::flip(), ibis::util::logMessage(), ibis::array_t< T >::push_back(), ibis::horometer::realTime(), ibis::array_t< T >::reserve(), ibis::bitvector::set(), ibis::array_t< T >::size(), ibis::bitvector::size(), ibis::horometer::start(), and ibis::horometer::stop().

Referenced by ibis::part::adaptiveFloatsDetailed(), ibis::part::adaptiveIntsDetailed(), and ibis::mesa::mesa().

void ibis::index::sumBits ( const array_t< bitvector * > &  pile,
const ibis::bitvector tot,
uint32_t  ib,
uint32_t  ie,
ibis::bitvector res 
)
static

Sum up pile[ib:ie-1] and add the result to res.

It is assumed that all pile add up to tot. In the other version of sumBits without this argument tot, it was assumed that all bitmaps add up to a bit vector of all ones. The decision of whether to use pile[ib:ie-1] directly or use the subtractive version (with pile[0:ib-1] and pile[ie:nobs-1]) are based on the number of bit vectors. The caller is responsible to ensuring the necessary bitmaps are already in memory before calling this function.

References ibis::bitvector::bitsPerLiteral(), ibis::bitvector::bytes(), ibis::bitvector::cnt(), ibis::bitvector::copy(), ibis::horometer::CPUTime(), ibis::bitvector::decompress(), ibis::horometer::realTime(), ibis::bitvector::set(), ibis::array_t< T >::size(), ibis::bitvector::size(), ibis::horometer::start(), and ibis::horometer::stop().

virtual float ibis::index::undecidable ( const ibis::qContinuousRange expr,
ibis::bitvector iffy 
) const
inlinevirtual

Mark the position of the rows that can not be decided with this index.

Parameters
exprthe range conditions to be evaluated.
iffythe bitvector marking the positions of rows that can not be decided using the index. Return value is the expected fraction of undecided rows that might satisfy the range conditions.

Reimplemented in ibis::egale, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, ibis::range, ibis::keywords, ibis::bin, ibis::relic, and ibis::direkte.

virtual int ibis::index::write ( const char *  name) const
pure virtual

Save index to a file.

Outputs the index in a compact binary format to the named file or directory. The index file contains a header that can be identified by the function isIndex.

Implemented in ibis::bak2, ibis::bak, ibis::entre, ibis::moins, ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::zona, ibis::bylt, ibis::mesa, ibis::fuzz, ibis::range, ibis::sapid, ibis::sbiad, ibis::fade, ibis::slice, ibis::skive, ibis::keywords, ibis::direkte, ibis::bin, and ibis::relic.

Referenced by ibis::column::append(), and create().

Member Data Documentation

uint32_t ibis::index::nrows
protected
array_t<int64_t> ibis::index::offset64
mutableprotected

Starting positions of the bitvectors.

This is the 64-bit version of offset32 to deal with large indexes. All functions that requires these offsets will attempt to use the 64-bit first.

Referenced by ibis::direkte::append(), ibis::bylt::bylt(), ibis::skive::estimateCost(), ibis::fuge::fuge(), ibis::fuzz::fuzz(), operator=(), ibis::relic::relic(), and ibis::zona::zona().

ibis::fileManager::storage* ibis::index::str
mutableprotected

The underlying storage.

It may be nil if bitvectors are not from a storage object managed by the file manager.

Referenced by ibis::direkte::append(), and operator=().


The documentation for this class was generated from the following files:

Make It A Bit Faster
Contact us
Disclaimers
FastBit source code
FastBit mailing list archive