ibis::bin Class Reference

The equality encoded bitmap index with binning. More...

#include <ibin.h>

Inheritance diagram for ibis::bin:
ibis::bin inherits from ibis::index and is inherited by ibis::ambit, ibis::bak, ibis::bak2, ibis::egale, ibis::fuge, ibis::mesa, ibis::pack, ibis::pale, ibis::range, and ibis::zone.

Classes

struct  comparevalpos
 The comparator used to build a min-heap based on positions. More...
 
struct  granule
 A data structure to assist the mapping of values to lower precisions. More...
 
struct  valpos
 A list of values and their positions. More...
 

Public Types

typedef std::map< double, granule * > granuleMap
 
- Public Types inherited from ibis::index
typedef std::map< double, uint32_t > histogram
 
enum  INDEX_TYPE {
  BINNING =0, RANGE, MESA, AMBIT,
  PALE, PACK, ZONE, RELIC,
  ROSTER, SKIVE, FADE, SBIAD,
  SAPID, EGALE, MOINS, ENTRE,
  BAK, BAK2, KEYWORDS, MESH,
  BAND, DIREKTE, GENERIC, BYLT,
  FUZZ, ZONA, FUGE, SLICE,
  EXTERN
}
 The integer values of this enum type are used in the index files to differentiate the indexes. More...
 
typedef std::map< double, ibis::bitvector * > VMap
 

Public Member Functions

virtual long append (const char *dt, const char *df, uint32_t nnew)
 Create index for the data in df and append the result to the index in dt.
 
long append (const ibis::bin &tail)
 Append the tail to this index.
 
long append (const array_t< uint32_t > &ind)
 Append a list of integers representing bin numbers.
 
 bin (const ibis::bin &rhs)
 Copy constructor. It performs a deep copy.
 
 bin (const ibis::column *c=0, const char *f=0)
 Constructor. Construct a bitmap index from current data.
 
 bin (const ibis::column *c, ibis::fileManager::storage *st, size_t offset=8)
 
 bin (const ibis::column *c, const char *f, const array_t< double > &bd)
 Constructor. Construct an index with the given bin boundaries.
 
 bin (const ibis::column *c, const char *f, const std::vector< double > &bd)
 Constructor. Construct an index with the given bin boundaries.
 
 bin (const ibis::column *c, uint32_t nb, double *keys, int64_t *offs)
 Reconstruct an object from keys and offsets.
 
 bin (const ibis::column *c, uint32_t nb, double *keys, int64_t *offs, uint32_t *bms)
 Reconstruct an object from keys and offsets.
 
 bin (const ibis::column *c, uint32_t nb, double *keys, int64_t *offs, void *bms, FastBitReadBitmaps rd)
 Reconstruct an object from keys and offsets.
 
virtual void binBoundaries (std::vector< double > &) const
 The functions binBoundaries and binWeights return the bin boundaries and the counts of each bin, respectively. More...
 
virtual void binWeights (std::vector< uint32_t > &) const
 
long checkBin (const ibis::qRange &cmp, uint32_t jbin, ibis::bitvector &res) const
 Candidate check using the binned values. More...
 
long checkBin (const ibis::qRange &cmp, uint32_t jbin, const ibis::bitvector &mask, ibis::bitvector &res) const
 Candidate check using the binned values. More...
 
void construct (const char *)
 Construct a binned bitmap index. More...
 
template<typename E >
void construct (const array_t< E > &varr)
 Construction function for in-memory data. More...
 
virtual int contractRange (ibis::qContinuousRange &rng) const
 
virtual index * dup () const
 Duplicate the content of an index object.
 
virtual void estimate (const ibis::qContinuousRange &expr, ibis::bitvector &lower, ibis::bitvector &upper) const
 Provide an estimation based on the current index. More...
 
virtual uint32_t estimate (const ibis::qContinuousRange &expr) const
 Compute an upper bound on the number of hits.
 
virtual void estimate (const ibis::deprecatedJoin &expr, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 Estimate the hits for symmetric joins. More...
 
virtual void estimate (const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 
virtual void estimate (const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, const ibis::qRange *const range1, const ibis::qRange *const range2, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 Evaluating a join condition with one (likely composite) index.
 
virtual int64_t estimate (const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, const ibis::qRange *const range1, const ibis::qRange *const range2) const
 
virtual void estimate (const ibis::bin &idx2, const ibis::deprecatedJoin &expr, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 Estimate the number of hits for nonsymmetric joins.
 
virtual void estimate (const ibis::bin &idx2, const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 
virtual void estimate (const ibis::bin &idx2, const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, const ibis::qRange *const range1, const ibis::qRange *const range2, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 
virtual int64_t estimate (const ibis::bin &idx2, const ibis::deprecatedJoin &expr) const
 
virtual int64_t estimate (const ibis::bin &idx2, const ibis::deprecatedJoin &expr, const ibis::bitvector &mask) const
 
virtual int64_t estimate (const ibis::bin &idx2, const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, const ibis::qRange *const range1, const ibis::qRange *const range2) const
 
virtual double estimateCost (const ibis::qContinuousRange &expr) const
 Estimate the cost of evaluating a range condition.
 
virtual double estimateCost (const ibis::qDiscreteRange &expr) const
 Estimate the cost of evaluating a range condition.
 
virtual long evaluate (const ibis::qContinuousRange &expr, ibis::bitvector &hits) const
 To evaluate the exact hits. More...
 
virtual long evaluate (const ibis::qDiscreteRange &expr, ibis::bitvector &hits) const
 To evaluate the exact hits. More...
 
virtual int expandRange (ibis::qContinuousRange &rng) const
 The functions expandRange and contractRange expand or contract the boundaries of a range condition so that the new range will have exact answers using the function estimate. More...
 
virtual long getCumulativeDistribution (std::vector< double > &bds, std::vector< uint32_t > &cts) const
 Compute the cumulative distribution from the binned index.
 
virtual long getDistribution (std::vector< double > &bbs, std::vector< uint32_t > &cts) const
 Compute a histogram from the binned index.
 
virtual double getMax () const
 Compute the actual maximum value from the binned index.
 
virtual double getMin () const
 Compute the actual minimum value from the binned index.
 
virtual double getSum () const
 Compute the approximate value of the sum from the binned index.
 
array_t< uint32_t > * indices (const ibis::bitvector &mask) const
 
virtual void locate (const ibis::qContinuousRange &expr, uint32_t &cand0, uint32_t &cand1) const
 Find the outer boundaries of the range expression. More...
 
virtual void locate (const ibis::qContinuousRange &expr, uint32_t &cand0, uint32_t &cand1, uint32_t &hit0, uint32_t &hit1) const
 Find the bins related to the range expression. More...
 
virtual const char * name () const
 Returns the name of the index, similar to the function type, but returns a string instead. More...
 
virtual uint32_t numBins () const
 
virtual void print (std::ostream &out) const
 Prints human readable information. More...
 
virtual int read (const char *idxfile)
 Read from a file named f.
 
virtual int read (ibis::fileManager::storage *st)
 Read from a reference counted piece of memory.
 
int read (int fdes, size_t offset, const char *fname, const char *header)
 Read an ibis::bin embedded inside a file. More...
 
virtual long select (const ibis::qContinuousRange &, void *) const
 Select the rows that satisfy the range condition. More...
 
virtual long select (const ibis::qContinuousRange &, void *, ibis::bitvector &) const
 Select the rows that satisfy the range condition. More...
 
virtual void serialSizes (uint64_t &, uint64_t &, uint64_t &) const
 Compute the size of arrays that would be generated by the serialization function (write). More...
 
virtual void speedTest (std::ostream &out) const
 Time some logical operations and print out their speed.
 
virtual INDEX_TYPE type () const
 Returns an index type identifier.
 
virtual float undecidable (const ibis::qContinuousRange &expr, ibis::bitvector &iffy) const
 Mark the position of the rows that can not be decided with this index. More...
 
virtual int write (ibis::array_t< double > &, ibis::array_t< int64_t > &, ibis::array_t< uint32_t > &) const
 Save index to three arrays. Serialize the index in memory.
 
virtual int write (const char *dt) const
 Write the index to the named directory or file.
 
- Public Member Functions inherited from ibis::index
void addBins (uint32_t ib, uint32_t ie, ibis::bitvector &res) const
 Add the sum of bits[ib] through bits[ie-1] to res. More...
 
void addBins (uint32_t ib, uint32_t ie, ibis::bitvector &res, const ibis::bitvector &tot) const
 Compute the sum of bit vectors [ib, ie). More...
 
bool empty () const
 The index object is considered empty if there is no bitmap or getNRows returns 0. More...
 
virtual void estimate (const ibis::qDiscreteRange &expr, ibis::bitvector &lower, ibis::bitvector &upper) const
 Estimate the hits for discrete ranges, i.e., those translated from 'a IN (x, y, ..)'. More...
 
virtual uint32_t estimate (const ibis::qDiscreteRange &expr) const
 
virtual void estimate (const ibis::index &idx2, const ibis::deprecatedJoin &expr, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 Estimate the pairs for the range join operator.
 
virtual void estimate (const ibis::index &idx2, const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 Estimate the pairs for the range join operator. More...
 
virtual void estimate (const ibis::index &idx2, const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, const ibis::qRange *const range1, const ibis::qRange *const range2, ibis::bitvector64 &lower, ibis::bitvector64 &upper) const
 
virtual int64_t estimate (const ibis::index &idx2, const ibis::deprecatedJoin &expr) const
 Estimate an upper bound for the number of pairs.
 
virtual int64_t estimate (const ibis::index &idx2, const ibis::deprecatedJoin &expr, const ibis::bitvector &mask) const
 Estimate an upper bound for the number of pairs produced from marked records. More...
 
virtual int64_t estimate (const ibis::index &idx2, const ibis::deprecatedJoin &expr, const ibis::bitvector &mask, const ibis::qRange *const range1, const ibis::qRange *const range2) const
 
virtual const ibis::bitvector * getBitvector (uint32_t i) const
 Return a pointer to the ith bitvector used in the index (may be 0).
 
uint32_t getNRows () const
 Return the number of rows represented by this object.
 
virtual uint32_t numBitvectors () const
 Returns the number of bit vectors used by the index.
 
float sizeInBytes () const
 Estimate the size of this index object measured in bytes. More...
 
void sumBins (uint32_t ib, uint32_t ie, ibis::bitvector &res) const
 Sum up bits[ib:ie-1] and place the result in res. More...
 
void sumBins (uint32_t ib, uint32_t ie, ibis::bitvector &res, uint32_t ib0, uint32_t ie0) const
 Compute a new sum for bit vectors [ib, ie) by taking advantage of the old sum for bitvectors [ib0, ie0). More...
 
void sumBins (uint32_t ib, uint32_t ie, ibis::bitvector &res, uint32_t *buf) const
 Sum up bits[ib:ie-1] and place the result in res. More...
 
void sumBins (const ibis::array_t< uint32_t > &, ibis::bitvector &) const
 Sum up the bits in the specified bins.
 
virtual float undecidable (const ibis::qDiscreteRange &expr, ibis::bitvector &iffy) const
 
virtual ~index ()
 The destructor.
 

Static Public Member Functions

static uint32_t parseNbins (const ibis::column &)
 Parse the index specs to determine eqw and nbins. More...
 
static unsigned parsePrec (const ibis::column &)
 Parse the index spec to extract precision.
 
static unsigned parseScale (const ibis::column &)
 Parse the specification about scaling. More...
 
static unsigned parseScale (const char *)
 
- Static Public Member Functions inherited from ibis::index
static void addBits (const array_t< bitvector * > &bits, uint32_t ib, uint32_t ie, ibis::bitvector &res)
 Add the pile[ib:ie-1] to res. More...
 
static index * create (const column *c, const char *name=0, const char *spec=0, int inEntirety=0)
 Index factory. More...
 
static void divideCounts (array_t< uint32_t > &bounds, const array_t< uint32_t > &cnt)
 Determine how to split the array cnt, so that each group has roughly the same total value. More...
 
static bool isIndex (const char *f, INDEX_TYPE t)
 Is the named file an index file? Read the header of the named file to determine if it contains an index of the specified type. More...
 
template<typename E >
static void mapValues (const array_t< E > &val, VMap &bmap)
 
template<typename E >
static void mapValues (const array_t< E > &val, histogram &hist, uint32_t count=0)
 
template<typename E >
static void mapValues (const array_t< E > &val, array_t< E > &bounds, std::vector< uint32_t > &cnts)
 
template<typename E1 , typename E2 >
static void mapValues (const array_t< E1 > &val1, const array_t< E2 > &val2, array_t< E1 > &bnd1, array_t< E2 > &bnd2, std::vector< uint32_t > &cnts)
 Compute a two-dimensional histogram. More...
 
static void printHeader (std::ostream &, const char *)
 
static void setBases (array_t< uint32_t > &bases, uint32_t card, uint32_t nbase=2)
 Fill the array bases with the values that cover the range [0, card). More...
 
static void sumBits (const array_t< bitvector * > &bits, uint32_t ib, uint32_t ie, ibis::bitvector &res)
 Sum up pile[ib:ie-1] and place the result in res. More...
 
static void sumBits (const array_t< bitvector * > &bits, const ibis::bitvector &tot, uint32_t ib, uint32_t ie, ibis::bitvector &res)
 Sum up pile[ib:ie-1] and add the result to res. More...
 

Protected Member Functions

void addBounds (double lbd, double rbd, uint32_t nbins, uint32_t eqw)
 The function used by setBoundaries() to actually generate the bounds.
 
virtual void adjustLength (uint32_t nrows)
 Fill the bitmaps to the specified size. More...
 
 bin (const ibis::column *c, const uint32_t nbits, ibis::fileManager::storage *st, size_t offset=8)
 Constructor. A constructor to accommodate multicomponent encodings. More...
 
void binning (const char *f, const std::vector< double > &bd)
 Generate bins according to the specified boundaries. More...
 
void binning (const char *f, const array_t< double > &bd)
 
void binning (const char *f)
 Read the data file and partition the values into bins according to the specified bin boundary. More...
 
template<typename E >
void binning (const array_t< E > &varr)
 
template<typename E >
void binning (const array_t< E > &varr, const array_t< double > &bd)
 
template<typename E >
void binningT (const char *fname)
 Read the data file, partition the values, and write out the bin ordered data with .bin suffix. More...
 
long binOrder (const char *fname) const
 Write bin-ordered values.
 
template<typename E >
long binOrderT (const char *fname) const
 Write bin-ordered values.
 
template<typename E >
long checkBin0 (const ibis::qRange &cmp, uint32_t jbin, ibis::bitvector &res) const
 
template<typename E >
long checkBin1 (const ibis::qRange &cmp, uint32_t jbin, const ibis::bitvector &mask, ibis::bitvector &res) const
 
virtual void clear ()
 Clear the existing content. More...
 
virtual double computeSum () const
 Compute the sum of values from the information in the index. More...
 
void convertGranules (granuleMap &gmap)
 Convert the granule map into binned index. More...
 
void divideBitmaps (const array_t< bitvector * > &bms, std::vector< unsigned > &parts) const
 Partition the bitmaps into groups that take about the same amount of storage. More...
 
virtual size_t getSerialSize () const throw ()
 Compute the size of the serialized version of the index. More...
 
virtual uint32_t locate (const double &val) const
 Find the bin containing val. More...
 
template<typename E >
void mapGranules (const array_t< E > &, granuleMap &gmap) const
 
template<typename T >
long mergeValues (const ibis::qContinuousRange &, ibis::array_t< T > &) const
 Extract values only. More...
 
template<typename T >
long mergeValues (const ibis::qContinuousRange &, ibis::array_t< T > &, ibis::bitvector &) const
 Extract values and record the positions. More...
 
void printGranules (std::ostream &out, const granuleMap &gmap) const
 
void readBinBoundaries (const char *name, uint32_t nb)
 Read a file containing a list of floating-point numbers. More...
 
template<typename E >
void scanAndPartition (const array_t< E > &, unsigned)
 
void scanAndPartition (const char *, unsigned, uint32_t nbins=0)
 Partition the range based on the (approximate) histogram of the data. More...
 
void setBoundaries (const char *f)
 Set bin boundaries. More...
 
void setBoundaries (array_t< double > &bnds, const ibis::bin &bin0) const
 
void setBoundaries (array_t< double > &bnds, const ibis::bin &idx1, const array_t< uint32_t > cnt1, const array_t< uint32_t > cnt0) const
 
template<typename E >
void setBoundaries (const array_t< E > &varr)
 
void swap (bin &rhs)
 Swap the content of the index.
 
int write32 (int fptr) const
 Write the content to a file already open.
 
int write64 (int fptr) const
 Write the content to a file already open.
 
- Protected Member Functions inherited from ibis::index
virtual void activate () const
 Regenerate all bitvectors from the underlying storage. More...
 
virtual void activate (uint32_t i) const
 Regenerate the ith bitvector from the underlying storage.
 
virtual void activate (uint32_t i, uint32_t j) const
 Regenerate bitvectors i (inclusive) through j (exclusive) from the underlying storage. More...
 
void computeMinMax (const char *f, double &min, double &max) const
 
void dataFileName (std::string &name, const char *f=0) const
 Generate data file name from "f". More...
 
 index (const ibis::column *c=0)
 Default constructor. More...
 
 index (const ibis::column *c, ibis::fileManager::storage *s)
 Constructor with a storage object. More...
 
 index (const index &)
 Copy constructor.
 
void indexFileName (std::string &name, const char *f=0) const
 Generates index file name from "f". More...
 
void initBitmaps (int fdes)
 Prepare the bitmaps using the given file descriptor. More...
 
void initBitmaps (ibis::fileManager::storage *st)
 Prepare bitmaps from the given storage object. More...
 
void initBitmaps (uint32_t *st)
 Prepare bitmaps from the given raw pointer. More...
 
void initBitmaps (void *ctx, FastBitReadBitmaps rd)
 Prepare bitmaps from the user provided function pointer and context. More...
 
int initOffsets (int64_t *, size_t)
 Initialize the offsets from the given data array. More...
 
int initOffsets (int fdes, const char offsize, size_t start, uint32_t nobs)
 Read in the offset array. More...
 
int initOffsets (ibis::fileManager::storage *st, size_t start, uint32_t nobs)
 Regenerate the offsets array from the given storage object. More...
 
void mapValues (const char *f, VMap &bmap) const
 Map the positions of each individual value. More...
 
void mapValues (const char *f, histogram &hist, uint32_t count=0) const
 Generate a histogram. More...
 
index & operator= (const index &)
 Assignment operator.
 
void optionalUnpack (array_t< ibis::bitvector * > &bits, const char *opt)
 A function to decide whether to uncompress the bitvectors. More...
 

Protected Attributes

array_t< double > bounds
 The nominal boundaries.
 
array_t< double > maxval
 The maximal values in each bin.
 
array_t< double > minval
 The minimal values in each bin.
 
uint32_t nobs
 Number of bitvectors.
 
- Protected Attributes inherited from ibis::index
array_t< ibis::bitvector * > bits
 A list of bitvectors.
 
bitmapReader * breader
 The functor to read serialized bitmaps from a more complex source.
 
const ibis::column * col
 Pointer to the column this index is for.
 
const char * fname
 The name of the file containing the index.
 
uint32_t nrows
 The number of rows represented by the index. More...
 
array_t< int32_t > offset32
 Starting positions of the bitvectors.
 
array_t< int64_t > offset64
 Starting positions of the bitvectors. More...
 
ibis::fileManager::storage * str
 The underlying storage. More...
 

Friends

class ibis::ambit
 
class ibis::band
 
class ibis::mesa
 
class ibis::mesh
 
class ibis::pack
 
class ibis::pale
 
class ibis::range
 
class ibis::zone
 

Additional Inherited Members

- Static Protected Member Functions inherited from ibis::index
static void indexFileName (std::string &name, const ibis::column *col1, const ibis::column *col2, const char *f=0)
 Generate the index file name for the composite index formed on two columns. More...
 

Detailed Description

The equality encoded bitmap index with binning.

The exact bin boundary assignment is controlled by indexing options '<binning ... />'.

The 0th bit vector represents x < bounds[0]; the (nobs-1)st bit vector represents x >= bounds[nobs-2]; the ith bit vector represents bounds[i-1] <= x < bounds[i], for 0 < i < nobs-1.
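
The mapping from a value to its bit vector described above can be sketched with a binary search. This is a minimal illustration, not FastBit code: the helper name binOf is hypothetical, and a plain std::vector<double> holding only the nobs-1 boundaries referenced in the rule stands in for the bounds array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of FastBit): return the index of the bit
// vector that covers x, given ascending bin boundaries.
//   result 0               -> x < bounds[0]
//   result i               -> bounds[i-1] <= x < bounds[i]
//   result bounds.size()   -> x >= bounds.back()
inline uint32_t binOf(const std::vector<double>& bounds, double x) {
    assert(std::is_sorted(bounds.begin(), bounds.end()));
    // upper_bound finds the first boundary strictly greater than x,
    // i.e. the smallest i such that bounds[i] > x.
    return static_cast<uint32_t>(
        std::upper_bound(bounds.begin(), bounds.end(), x) - bounds.begin());
}
```

With three boundaries {1.0, 2.0, 3.0} there are four bit vectors; a value equal to a boundary falls into the bin to its right because the lower edge of each bin is inclusive.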

Constructor & Destructor Documentation

ibis::bin::bin ( const ibis::column * c,
const uint32_t  nbits,
ibis::fileManager::storage * st,
size_t  start = 8 
)
protected

Constructor.

A constructor to accommodate multicomponent encodings.

Reconstruct from content of fileManager::storage.

The common portion of the index files written by derived classes of bin:
8-byte header
nrows (uint32_t) -- number of bits in each bit vector
nobs (uint32_t) -- number of bit vectors
offsets (intxx_t[nobs+1]) -- the starting positions of the bit sequences (i.e., bit vectors) plus the end position of the last one
(padding to ensure the next data element is on 8-byte boundary)
bounds (double[nobs]) -- the end positions (right sides) of the bins
maxval (double[nobs]) -- the maximum of all values that fall in the bin
minval (double[nobs]) -- the minimum of all values that fall in the bin
the bit sequences as bit vectors

ibis::bin::bin(const ibis::column* c, const uint32_t nbits,
               ibis::fileManager::storage* st, size_t start)
    : ibis::index(c, st),
      // nobs is the second uint32_t after the header
      nobs(*(reinterpret_cast<uint32_t*>(st->begin()+start+sizeof(uint32_t)))),
      // bounds, maxval, and minval follow the offsets, aligned to 8 bytes;
      // (*st)[6] is the size in bytes of each offset
      bounds(st, 8*((start+(*st)[6]*(nobs+1)+2*sizeof(uint32_t)+7)/8),
             8*((start+(*st)[6]*(nobs+1)+2*sizeof(uint32_t)+7)/8) +
             sizeof(double)*nobs),
      maxval(st, 8*((start+(*st)[6]*(nobs+1)+2*sizeof(uint32_t)+7)/8) +
             sizeof(double)*nobs,
             8*((start+(*st)[6]*(nobs+1)+2*sizeof(uint32_t)+7)/8) +
             sizeof(double)*nobs*2),
      minval(st, 8*((start+(*st)[6]*(nobs+1)+2*sizeof(uint32_t)+7)/8) +
             sizeof(double)*nobs*2,
             8*((start+(*st)[6]*(nobs+1)+2*sizeof(uint32_t)+7)/8) +
             sizeof(double)*nobs*3) {
    try {
        // nrows is the first uint32_t after the header
        nrows = *(reinterpret_cast<uint32_t*>(st->begin()+start));
        // LOGGER(c->partition()->getState() == ibis::part::STABLE_STATE &&
        //        nrows != c->partition()->nRows() && ibis::gVerbose > 2)
        //     << "Warning -- bin[" << col->fullname() << "]::bin found nrows ("
        //     << nrows << ") to be different from that of the data partition "
        //     << c->partition()->name() << " (" << c->partition()->nRows() << ")";
        int ierr = initOffsets(st, start+2*sizeof(uint32_t), nobs);
        if (ierr < 0) {
            LOGGER(ibis::gVerbose > 0)
                << "Warning -- bin[" << (col ? col->fullname() : "?.?")
                << "]::bin failed to initialize bitmap offsets"
                << " from storage object @ " << st << " with start = " << start
                << ", ierr = " << ierr;
            throw "bin::ctor failed to initOffsets from storage" IBIS_FILE_LINE;
        }
        initBitmaps(st);
        if (ibis::gVerbose > 2) {
            ibis::util::logger lg;
            lg() << "bin[" << (col ? col->fullname() : "?.?")
                 << "]::ctor -- initialization completed with "
                 << nobs << " bin" << (nobs>1?"s":"") << " for "
                 << nrows << " row" << (nrows>1?"s":"")
                 << " from a storage object @ " << st << " offset " << start;
            if (ibis::gVerbose > 6) {
                lg() << "\n";
                print(lg());
            }
        }
    }
    catch (...) {
        // release partially constructed content before rethrowing
        clear();
        throw;
    }
} // ibis::bin::bin

References ibis::fileManager::storage::begin(), ibis::index::col, ibis::column::fullname(), ibis::index::initBitmaps(), ibis::index::initOffsets(), ibis::index::nrows, and print().
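
The offset arithmetic in the constructor above can be read as a small layout calculation. The sketch below restates it in isolation; the struct BinLayout and the function binLayout are hypothetical names introduced only for illustration, and it assumes, as the expression (*st)[6] suggests, that each entry of the offsets array occupies either 4 or 8 bytes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical description of where the three double arrays start,
// measured in bytes from the beginning of the storage object.
struct BinLayout {
    std::size_t bounds;  // first byte of bounds[0]
    std::size_t maxval;  // first byte of maxval[0]
    std::size_t minval;  // first byte of minval[0]
    std::size_t end;     // first byte after minval[nobs-1]
};

// start   -- position of nrows (8 when the index follows the 8-byte header)
// offsize -- bytes per entry of the offsets array (assumed 4 or 8)
// nobs    -- number of bit vectors
inline BinLayout binLayout(std::size_t start, std::size_t offsize,
                           uint32_t nobs) {
    assert(offsize == 4 || offsize == 8);
    // nrows and nobs (two uint32_t), then nobs+1 offsets
    const std::size_t raw =
        start + 2 * sizeof(uint32_t) + offsize * (nobs + 1u);
    // round up to the next multiple of 8
    const std::size_t b = 8 * ((raw + 7) / 8);
    const std::size_t arr = sizeof(double) * nobs;
    return BinLayout{b, b + arr, b + 2 * arr, b + 3 * arr};
}
```

For example, with start = 8, 4-byte offsets, and nobs = 2, the offsets end at byte 28, so four bytes of padding place bounds at byte 32.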

Member Function Documentation

void ibis::bin::adjustLength ( uint32_t  nr)
protectedvirtual

Fill the bitmaps to the specified size.

Fill the bitvectors with zeros so that they all contain nrows bits.

Truncate the bitvectors if they have more bits.
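
The pad-or-truncate behavior can be pictured with uncompressed bitmaps. This is only a sketch: std::vector<bool> stands in for ibis::bitvector, and adjustLengthSketch is a hypothetical name, not the FastBit implementation (which delegates to ibis::bitvector::adjustSize()).

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bitmap = std::vector<bool>;  // stand-in for ibis::bitvector

// Make every bitmap exactly nrows bits long: shorter bitmaps are padded
// with zeros, longer ones are truncated.
inline void adjustLengthSketch(std::vector<Bitmap>& bits, uint32_t nrows) {
    for (Bitmap& b : bits) {
        b.resize(nrows, false);
        assert(b.size() == nrows);
    }
}
```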

Reimplemented in ibis::fuge, ibis::zone, ibis::pack, ibis::pale, and ibis::ambit.

References ibis::bitvector::adjustSize().

Referenced by ibis::ambit::adjustLength(), ibis::pale::adjustLength(), ibis::pack::adjustLength(), ibis::zone::adjustLength(), and ibis::fuge::adjustLength().

void ibis::bin::binBoundaries ( std::vector< double > &  ) const
virtual

The functions binBoundaries and binWeights return the bin boundaries and the counts of each bin, respectively.

Reimplemented from ibis::index.

Reimplemented in ibis::bak2, ibis::bak, ibis::egale, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, and ibis::range.

void ibis::bin::binning ( const char *  f,
const std::vector< double > &  bd 
)
protected

Generate bins according to the specified boundaries.

This version of the binning function takes externally specified bin boundaries; if the array is too small to be valid, it uses the default option.

Note
This function does not attempt to clear the content of the current data structure; the caller is responsible for this task!

References ibis::DOUBLE, ibis::FLOAT, ibis::INT, ibis::LONG, ibis::SHORT, ibis::TYPESTRING, ibis::UBYTE, ibis::UINT, ibis::ULONG, and ibis::USHORT.

Referenced by append(), bin(), and ibis::egale::egale().

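void ibis::bin::binning ( const char *  f)
protected

Read the data file and partition the values into bins according to the specified bin boundary.

template<typename E >
void ibis::bin::binningT ( const char *  fname)
protected

Read the data file, partition the values, and write out the bin ordered data with .bin suffix.
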
long ibis::bin::checkBin ( const ibis::qRange &  cmp,
uint32_t  jbin,
ibis::bitvector &  res 
) const

Candidate check using the binned values.

Returns the number of hits if successful, otherwise it returns a negative value.

References ibis::bitvector::clear(), ibis::horometer::CPUTime(), ibis::DOUBLE, ibis::FLOAT, ibis::INT, ibis::LONG, ibis::horometer::realTime(), ibis::SHORT, ibis::horometer::start(), ibis::horometer::stop(), ibis::TYPESTRING, ibis::UBYTE, ibis::UINT, ibis::ULONG, and ibis::USHORT.

long ibis::bin::checkBin ( const ibis::qRange &  cmp,
uint32_t  jbin,
const ibis::bitvector &  mask,
ibis::bitvector &  res 
) const

Candidate check using the binned values.

The bitvector mask marks the actual values in the bin (because the bitmaps stored in bits do not directly correspond to the bin).

References ibis::bitvector::clear(), ibis::bitvector::cnt(), ibis::horometer::CPUTime(), ibis::DOUBLE, ibis::FLOAT, ibis::INT, ibis::LONG, ibis::horometer::realTime(), ibis::SHORT, ibis::bitvector::size(), ibis::horometer::start(), ibis::horometer::stop(), ibis::TYPESTRING, ibis::UBYTE, ibis::UINT, ibis::ULONG, and ibis::USHORT.

void ibis::bin::clear ( )
protectedvirtual

Clear the existing content.

Free the bitmap objects common to all index objects.

Reimplemented from ibis::index.

Reimplemented in ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, and ibis::ambit.

References ibis::index::clear().

Referenced by append(), bin(), ibis::ambit::clear(), ibis::pale::clear(), ibis::pack::clear(), ibis::zone::clear(), ibis::fuge::clear(), ibis::egale::clear(), ibis::mesa::mesa(), and ibis::range::range().

double ibis::bin::computeSum ( ) const
protectedvirtual

Compute the sum of values from the information in the index.

Compute the approximate sum of all values using the binned index.
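
One way to approximate a sum from binned information is to weight a representative value of each bin by the number of rows in it. The sketch below uses the midpoint of the recorded minimum and maximum as that representative; this choice is an assumption made for illustration and is not confirmed by this page, and approxSum with its three input vectors is a hypothetical stand-in for minval, maxval, and the per-bin counts.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Approximate the sum of all values from per-bin statistics.
// minval[i] / maxval[i] -- smallest / largest value observed in bin i
// counts[i]             -- number of rows that fall in bin i
inline double approxSum(const std::vector<double>& minval,
                        const std::vector<double>& maxval,
                        const std::vector<uint32_t>& counts) {
    assert(minval.size() == maxval.size() && minval.size() == counts.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < counts.size(); ++i) {
        // skip empty bins, whose min/max carry no information
        if (counts[i] > 0 && minval[i] <= maxval[i])
            sum += 0.5 * (minval[i] + maxval[i]) * counts[i];
    }
    return sum;
}
```

The error of such an estimate is bounded by half the width of each bin's observed range times its row count, which is why narrower bins give a tighter approximation.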

Reimplemented in ibis::entre, ibis::moins, ibis::egale, ibis::pack, ibis::ambit, ibis::mesa, and ibis::range.

void ibis::bin::construct ( const char *  df)

Construct a binned bitmap index.

It reads data from disk. The argument df can be the name of the directory containing the data or the name of the data file. The actual file name is determined by the function ibis::column::dataFilename.

This construction function is designed to handle the full spectrum of binning specifications.

References ibis::DOUBLE, ibis::FLOAT, ibis::fileManager::getFile(), ibis::gParameters(), ibis::fileManager::instance(), ibis::INT, ibis::LONG, ibis::util::reorder(), ibis::bitvector::set(), ibis::SHORT, ibis::TYPESTRING, ibis::UBYTE, ibis::UINT, ibis::ULONG, and ibis::USHORT.

Referenced by bin(), ibis::range::construct(), and ibis::mesa::construct().

template<typename E >
void ibis::bin::construct ( const array_t< E > &  varr)

Construction function for in-memory data.

It reads the indexing option using the function ibis::column::indexSpec.

References ibis::array_t< T >::size().

void ibis::bin::convertGranules ( granuleMap &  gmap)
protected

Convert the granule map into binned index.

The bitmaps that are not empty are transferred to the array bits, and the empty bitmaps are deleted. Therefore, the content of gmap is no longer valid after calling this function. The only thing that could be done to the granuleMap object is to free it.

References ibis::util::clear(), and ibis::util::compactValue().

void ibis::bin::divideBitmaps ( const array_t< bitvector * > &  bms,
std::vector< unsigned > &  parts 
) const
protected

Partition the bitmaps into groups that take about the same amount of storage.

References ibis::array_t< T >::size().

Referenced by ibis::pack::pack(), and ibis::zone::zone().

void ibis::bin::estimate ( const ibis::qContinuousRange &  expr,
ibis::bitvector &  lower,
ibis::bitvector &  upper 
) const
virtual

Provide an estimation based on the current index.

Set bits in lower are certain hits; set bits in upper are candidates. Set bits in (upper - lower) should be checked to verify which ones are actually hits. If the bitvector upper contains fewer bits than the bitvector lower, the content of upper is assumed to be the same as lower.

Note
This function will not do anything if the estimated cost is high.
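
The contract between lower and upper can be illustrated with a small resolution step that turns the two estimates into exact hits. This is a sketch under stated assumptions: std::vector<bool> stands in for ibis::bitvector, resolveHits is a hypothetical helper, the raw column values are assumed to be available in memory, and "fewer bits" is interpreted as a shorter bitvector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Combine the certain hits (lower) with the candidates (upper) by checking
// the raw value of every row that is a candidate but not a certain hit.
template <typename Pred>
std::vector<bool> resolveHits(const std::vector<bool>& lower,
                              const std::vector<bool>& upper,
                              const std::vector<double>& values,
                              Pred pred) {
    assert(values.size() == lower.size());
    // A shorter upper means "same as lower": nothing is left to check.
    if (upper.size() < lower.size())
        return lower;
    std::vector<bool> hits(lower.size(), false);
    for (std::size_t i = 0; i < lower.size(); ++i) {
        if (lower[i])
            hits[i] = true;             // definite hit, no value lookup
        else if (upper[i])
            hits[i] = pred(values[i]);  // candidate, must be verified
    }
    return hits;
}
```

Only rows in (upper - lower) trigger a lookup of the raw value, so the cost of the verification step grows with the number of rows in the boundary bins rather than with the table size.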

Reimplemented from ibis::index.

Reimplemented in ibis::entre, ibis::moins, ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, and ibis::range.

References ibis::bitvector::clear(), ibis::bitvector::cnt(), ibis::bitvector::copy(), and ibis::bitvector::set().

void ibis::bin::estimate ( const ibis::deprecatedJoin &  expr,
ibis::bitvector64 &  lower,
ibis::bitvector64 &  upper 
) const
virtual

Estimate the hits for symmetric joins.

Evaluate the range join condition using the ibis::bin index.

Record the definite hits in lower, and all possible hits in upper. NOTE: upper includes all entries in lower.

References ibis::bitvector64::clear(), ibis::bitvector64::cnt(), ibis::horometer::CPUTime(), ibis::math::term::eval(), ibis::util::logMessage(), ibis::horometer::realTime(), ibis::bitvector64::set(), ibis::bitvector64::size(), ibis::horometer::start(), and ibis::horometer::stop().

long ibis::bin::evaluate ( const ibis::qContinuousRange &  expr,
ibis::bitvector &  hits 
) const
virtual

To evaluate the exact hits.

On success, return the number of hits, otherwise a negative value is returned.

Implements ibis::index.

Reimplemented in ibis::entre, ibis::moins, ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, and ibis::range.

References ibis::bitvector::cnt(), ibis::bitvector::copy(), ibis::bitvector::set(), and ibis::bitvector::size().

virtual long ibis::bin::evaluate ( const ibis::qDiscreteRange &  ,
ibis::bitvector &  
) const
inlinevirtual

To evaluate the exact hits.

On success, return the number of hits, otherwise a negative value is returned.

Reimplemented from ibis::index.

Reimplemented in ibis::entre, ibis::moins, ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, and ibis::range.

References ibis::index::evaluate().

int ibis::bin::expandRange ( ibis::qContinuousRange & ) const
virtual

The functions expandRange and contractRange expand or contract the boundaries of a range condition so that the new range will have exact answers using the function estimate.

The default implementation provided does nothing since this is only meaningful for indices based on bins.

Reimplemented from ibis::index.

Reimplemented in ibis::bak2, ibis::bak, and ibis::range.

References ibis::util::compactValue(), ibis::qContinuousRange::leftBound(), and ibis::qContinuousRange::rightBound().
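
The idea of expanding a range to bin boundaries can be sketched as follows. This is a simplified illustration, not the FastBit implementation: it assumes a half-open condition lo <= x < hi, a sorted vector of bin boundaries, and a hypothetical Range struct in place of ibis::qContinuousRange. The returned count of moved endpoints is also an assumption made for this sketch.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical stand-in for a range condition lo <= x < hi.
struct Range {
    double lo;
    double hi;
};

// Move each endpoint outward to the nearest bin boundary so that the range
// covers whole bins only. Endpoints that fall outside all boundaries are
// left unchanged. Returns the number of endpoints that were moved.
inline int expandToBins(const std::vector<double>& bounds, Range& r) {
    int moved = 0;
    // Greatest boundary that is <= r.lo.
    auto it = std::upper_bound(bounds.begin(), bounds.end(), r.lo);
    if (it != bounds.begin() && *(it - 1) != r.lo) {
        r.lo = *(it - 1);
        ++moved;
    }
    // Smallest boundary that is >= r.hi.
    auto jt = std::lower_bound(bounds.begin(), bounds.end(), r.hi);
    if (jt != bounds.end() && *jt != r.hi) {
        r.hi = *jt;
        ++moved;
    }
    return moved;
}
```

With boundaries 0, 10, 20, 30, the condition 12 <= x < 25 would become 10 <= x < 30 in this sketch, while 10 <= x < 20 would be left as it is because both endpoints already coincide with bin boundaries.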

size_t ibis::bin::getSerialSize ( ) const
throw (
)
protectedvirtual

Compute the size of the serialized version of the index.

Return the size in bytes.

Reimplemented from ibis::index.

Reimplemented in ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, and ibis::range.

void ibis::bin::locate ( const ibis::qContinuousRange expr,
uint32_t &  cand0,
uint32_t &  cand1 
) const
virtual

Find the outer boundaries of the range expression.

Locate the outer reaches of a continuous range expression.

Reimplemented in ibis::bak2, ibis::bak, and ibis::range.

References ibis::qContinuousRange::leftBound(), and ibis::qContinuousRange::rightBound().

Referenced by ibis::bak::expandRange(), ibis::bak2::expandRange(), ibis::range::locate(), ibis::bak::locate(), and ibis::bak2::locate().

void ibis::bin::locate ( const ibis::qContinuousRange expr,
uint32_t &  cand0,
uint32_t &  cand1,
uint32_t &  hit0,
uint32_t &  hit1 
) const
virtual

Find the bins related to the range expression.

Locate the bins for all candidates and hits.

Reimplemented in ibis::bak2, ibis::bak, and ibis::range.

References ibis::qContinuousRange::leftBound(), and ibis::qContinuousRange::rightBound().

uint32_t ibis::bin::locate ( const double &  val) const
protectedvirtual

Find the bin containing val.

Find the smallest i such that bounds[i] > val.

Reimplemented in ibis::bak2, ibis::bak, and ibis::range.
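
The search described above ("the smallest i such that bounds[i] > val") corresponds to an upper-bound binary search. A minimal sketch, assuming the boundaries are kept in a sorted std::vector<double>; the name locateBin is hypothetical, and returning bounds.size() when no boundary exceeds val is an assumption of this sketch rather than documented behavior:

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a FastBit function): return the smallest i such
// that bounds[i] > val, or bounds.size() if no boundary is larger than val.
// Assumes bounds is sorted in ascending order.
inline std::uint32_t locateBin(const std::vector<double>& bounds, double val) {
    assert(std::is_sorted(bounds.begin(), bounds.end()));
    auto it = std::upper_bound(bounds.begin(), bounds.end(), val);
    return static_cast<std::uint32_t>(it - bounds.begin());
}
```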

template<typename T >
long ibis::bin::mergeValues ( const ibis::qContinuousRange cmp,
ibis::array_t< T > &  vals 
) const
protected

Extract values only.

This function requires the clustered version of values to be present. The clustered version is created with the option 'reorder' in the binning specification. Currently, the clustered values are stored in a file with .bin extension.

References ibis::array_t< T >::clear(), ibis::qContinuousRange::inRange(), ibis::array_t< T >::read(), ibis::array_t< T >::reserve(), ibis::array_t< T >::resize(), ibis::array_t< T >::size(), and UnixOpen.

template<typename T >
long ibis::bin::mergeValues ( const ibis::qContinuousRange cmp,
ibis::array_t< T > &  vals,
ibis::bitvector hits 
) const
protected

virtual const char* ibis::bin::name ( ) const
inlinevirtual

Returns the name of the index, similar to the function type, but returns a string instead.

Implements ibis::index.

Reimplemented in ibis::bak2, ibis::bak, ibis::entre, ibis::moins, ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, and ibis::range.

uint32_t ibis::bin::parseNbins ( const ibis::column c)
static

Parse the index specs to determine eqw and nbins.

Parse the index specification to determine the number of bins, returns IBIS_DEFAULT_NBINS if it is not specified.

References ibis::gParameters(), ibis::part::indexSpec(), ibis::part::name(), and ibis::column::name().

unsigned ibis::bin::parseScale ( const ibis::column c)
static

Parse the specification about scaling.

Parse scale=xx (or the old form equalxx) in the index specification. The result is one of the following codes:

  • 0 – simple linear scale (used when "scale=" is present, but is not "scale=linear" or "scale=log")
  • 1 – equal length [linear scale]
  • 2 – equal ratio [log scale]
  • 10 – equal weight
  • UINT_MAX – default value if no index specification is found.

References ibis::gParameters(), ibis::part::indexSpec(), ibis::part::name(), and ibis::column::name().
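
The mapping from a specification string to the codes listed above can be sketched as follows. This is a hypothetical illustration, not the FastBit parser: the function name scaleCode is invented, matching is assumed to be case-sensitive, the old form is assumed to follow the pattern equal, an optional '_' or '-', then weight, length or ratio, and any non-empty specification that mentions neither form is assumed to yield UINT_MAX.

```cpp
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper (not a FastBit function): translate an index
// specification string into the scale codes documented for parseScale.
inline unsigned scaleCode(const std::string& spec) {
    if (spec.empty()) return UINT_MAX;  // no index specification
    std::size_t pos = spec.find("scale=");
    if (pos != std::string::npos) {
        pos += 6;  // skip past "scale="
        if (spec.compare(pos, 6, "linear") == 0) return 1;  // equal length
        if (spec.compare(pos, 3, "log") == 0) return 2;     // equal ratio
        return 0;  // "scale=" present, but neither linear nor log
    }
    pos = spec.find("equal");
    if (pos != std::string::npos) {
        pos += 5;  // skip past "equal"
        if (pos < spec.size() && (spec[pos] == '_' || spec[pos] == '-')) ++pos;
        if (spec.compare(pos, 6, "weight") == 0) return 10;
        if (spec.compare(pos, 6, "length") == 0) return 1;
        if (spec.compare(pos, 5, "ratio") == 0) return 2;
    }
    return UINT_MAX;  // assumption: nothing about scaling was specified
}
```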

void ibis::bin::print ( std::ostream &  out) const
virtual

Prints human readable information.

Outputs information about the index as text to the specified output stream.

Implements ibis::index.

Reimplemented in ibis::bak2, ibis::bak, ibis::entre, ibis::moins, ibis::egale, ibis::fuge, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, and ibis::range.

Referenced by bin().

int ibis::bin::read ( int  fdes,
size_t  start,
const char *  fn,
const char *  header 
)

Read an ibis::bin embedded inside a file.

Read from a file starting from an arbitrary start position.

This is intended to be used by multi-level indices. The size of the bitmap offsets is defined in header[6] and the full index type is defined in header[5].

References ibis::util::clear(), ibis::fileManager::instance(), ibis::fileManager::recordPages(), and ibis::util::strnewdup().

void ibis::bin::readBinBoundaries ( const char *  fnm,
uint32_t  nb 
)
protected

Read a file containing a list of floating-point numbers.

If nb > 0, read the first nb values or until the end of the file, whichever comes first. The file contains one value per line. Since this function only reads the first value on each line, a line may contain other text after the value. The sharp symbol '#' indicates comments in the file.

The file name can use either an absolute path or a relative path (relative to the current data directory of the data partition). The file name can be specified through the binFile="filename" option of a bin specification.
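
A reader for the file format described above can be sketched as follows. This is an illustration under stated assumptions, not the FastBit implementation: the function name readBoundaries is hypothetical, a '#' is assumed to start a comment that runs to the end of the line, lines without a leading number are skipped, and nb == 0 is assumed to mean "read every value".

```cpp
#include <cstddef>
#include <cstdlib>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not a FastBit function): read the first value of each
// line from in, stopping after nb values (or at end of input when nb == 0).
inline std::vector<double> readBoundaries(std::istream& in, std::size_t nb) {
    std::vector<double> vals;
    std::string line;
    while ((nb == 0 || vals.size() < nb) && std::getline(in, line)) {
        // Assumption: '#' starts a comment that runs to the end of the line.
        const std::size_t hash = line.find('#');
        if (hash != std::string::npos) line.erase(hash);
        // Parse only the first value; anything after it is ignored.
        const char* s = line.c_str();
        char* end = nullptr;
        const double v = std::strtod(s, &end);
        if (end != s) vals.push_back(v);  // keep the line only if a number was parsed
    }
    return vals;
}
```

The function takes a std::istream so that it can be tried with a std::istringstream as well as a std::ifstream opened on an actual boundaries file.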

void ibis::bin::scanAndPartition ( const char *  f,
unsigned  eqw,
uint32_t  nbins = 0 
)
protected

Partition the range based on the (approximate) histogram of the data.

The optional argument nbins can either be set outside or set to be the return value of function parseNbins.

References ibis::array_t< T >::clear(), ibis::util::compactValue(), ibis::DOUBLE, ibis::FLOAT, ibis::util::incrDouble(), ibis::array_t< T >::push_back(), ibis::array_t< T >::reserve(), and ibis::array_t< T >::size().

long ibis::bin::select ( const ibis::qContinuousRange cmp,
void *  vals 
) const
virtual

Select the rows that satisfy the range condition.

Output the values in vals. The values are returned in unspecified order to reduce the amount of processing needed in this function, in the spirit of the SQL standard.

Note
This function only works with integers and floating-point values.

Implements ibis::index.

Reimplemented in ibis::entre, ibis::moins, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, and ibis::range.

References ibis::DOUBLE, ibis::FLOAT, ibis::INT, ibis::LONG, ibis::SHORT, ibis::TYPESTRING, ibis::UBYTE, ibis::UINT, ibis::ULONG, and ibis::USHORT.

long ibis::bin::select ( const ibis::qContinuousRange cmp,
void *  vals,
ibis::bitvector hits 
) const
virtual

Select the rows that satisfy the range condition.

Output the rows in the bit vector hits and the corresponding values in vals.

Note
This function only works with integers and floating-point values.

Implements ibis::index.

Reimplemented in ibis::entre, ibis::moins, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, and ibis::range.

References ibis::DOUBLE, ibis::FLOAT, ibis::INT, ibis::LONG, ibis::SHORT, ibis::TYPESTRING, ibis::UBYTE, ibis::UINT, ibis::ULONG, and ibis::USHORT.

void ibis::bin::serialSizes ( uint64_t &  ,
uint64_t &  ,
uint64_t &   
) const
virtual

Compute the size of arrays that would be generated by the serialization function (write).

Implements ibis::index.

void ibis::bin::setBoundaries ( const char *  f)
protected

Set bin boundaries.

Parse the index specification to determine the bin boundaries and store the result in member variable bounds.

The bin specification can take any of the following forms, where all fields are optional.

  • equal([_-]?)[weight|length|ratio])
  • no=xx|nbins=xx|bins:(\[begin, end, no=xx\))+
  • <binning (start=begin end=end nbins=xx scale=[linear|log])* />
  • <binning binFile=file-name[, nbins=xx] />

The bin specification can be read from the column object, the table object containing the column, or the global ibis::gParameters object under the name of table-name.column-name.index. If no index specification is found, this function attempts to generate approximate equal weight bins.

Note
If equal weight is specified, it takes precedence over other specifications.

References ibis::util::compactValue(), ibis::DOUBLE, ibis::FLOAT, and ibis::util::readString().

Referenced by ibis::egale::egale().
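
For the start/end/nbins/scale form of the specification, the two scales can be sketched as follows: a linear scale gives bins of equal length and a log scale gives bins of equal ratio. This is a simplified illustration, not the FastBit implementation. The function name makeBounds is hypothetical, and returning nbins + 1 boundary points (including both ends) is an assumption of this sketch; the actual contents of the member variable bounds may differ.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a FastBit function): generate nbins + 1 boundary
// points from begin to end, spaced by equal length (linear scale) or by
// equal ratio (log scale). Returns an empty vector for unusable input.
inline std::vector<double> makeBounds(double begin, double end,
                                      std::uint32_t nbins, bool logScale) {
    std::vector<double> b;
    if (nbins == 0 || !(begin < end) || (logScale && begin <= 0.0)) return b;
    b.reserve(static_cast<std::size_t>(nbins) + 1);
    for (std::uint32_t i = 0; i <= nbins; ++i) {
        const double t = static_cast<double>(i) / nbins;
        b.push_back(logScale ? begin * std::pow(end / begin, t)
                             : begin + t * (end - begin));
    }
    b.back() = end;  // avoid round-off at the last boundary
    return b;
}
```

For example, begin=1, end=1000, nbins=3 on a log scale would give boundaries near 1, 10, 100 and 1000, while begin=0, end=10, nbins=5 on a linear scale would give 0, 2, 4, 6, 8 and 10.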

float ibis::bin::undecidable ( const ibis::qContinuousRange expr,
ibis::bitvector iffy 
) const
virtual

Mark the positions of the rows that cannot be decided with this index.

Parameters
expr  the range conditions to be evaluated.
iffy  the bitvector marking the positions of rows that cannot be decided using the index. The return value is the expected fraction of undecided rows that might satisfy the range conditions.

Reimplemented from ibis::index.

Reimplemented in ibis::egale, ibis::zone, ibis::pack, ibis::pale, ibis::ambit, ibis::mesa, and ibis::range.

References ibis::bitvector::copy(), ibis::qContinuousRange::leftBound(), ibis::qContinuousRange::rightBound(), and ibis::bitvector::set().


The documentation for this class was generated from the following files:
