ibis::tablex Class Reference (abstract)

The class for expandable tables.

#include <table.h>

Inheritance diagram for ibis::tablex:
ibis::tafel

Public Member Functions

virtual int addColumn (const char *cname, ibis::TYPE_T ctype, const char *cdesc=0, const char *idx=0)=0
 Add a column.
 
virtual int append (const char *cname, uint64_t begin, uint64_t end, void *values)=0
 Add values to the named column.
 
virtual int appendRow (const ibis::table::row &)=0
 Add one row.
 
virtual int appendRow (const char *line, const char *delimiters=0)=0
 Append a row stored in ASCII form.
 
virtual int appendRows (const std::vector< ibis::table::row > &)=0
 Add multiple rows.
 
virtual uint32_t bufferCapacity () const
 Capacity of the memory buffer.
 
virtual void clearData ()=0
 Remove all data recorded.
 
virtual void describe (std::ostream &) const =0
 Print a description of the table to the specified output stream.
 
virtual const char * getASCIIDictionary (const char *) const =0
 Retrieve the name of the ASCII dictionary file associated with a column of categorical values.
 
virtual uint32_t getPartitionMax () const
 Get the recommended number of rows in a data partition.
 
virtual uint32_t mColumns () const =0
 The number of columns in this table.
 
virtual uint32_t mRows () const =0
 The number of rows in memory.
 
virtual int parseNamesAndTypes (const char *txt)
 Parse names and data types in string form.
 
virtual int readCSV (const char *inputfile, int memrows=0, const char *outputdir=0, const char *delimiters=0)=0
 Read the content of the named file as comma-separated values.
 
virtual int readNamesAndTypes (const char *filename)
 Read a file containing the names and types of columns.
 
virtual int readSQLDump (const char *inputfile, std::string &tname, int memrows=0, const char *outputdir=0)=0
 Read a SQL dump from database systems such as MySQL.
 
virtual int32_t reserveBuffer (uint32_t)
 Reserve enough buffer space for the specified number of rows.
 
virtual void setASCIIDictionary (const char *, const char *)=0
 Set the name of the ASCII dictionary file for a column of categorical values.
 
virtual void setPartitionMax (uint32_t m)
 Set the recommended number of rows in a data partition.
 
virtual table * toTable (const char *nm=0, const char *de=0)=0
 Stop expanding the current set of data records.
 
virtual int write (const char *dir, const char *tname=0, const char *tdesc=0, const char *idx=0, const char *nvpairs=0) const =0
 Write the in-memory data records to the specified directory and update the metadata on disk.
 
virtual int writeMetaData (const char *dir, const char *tname=0, const char *tdesc=0, const char *idx=0, const char *nvpairs=0) const =0
 Write out the information about the columns.
 

Static Public Member Functions

static ibis::tablex * create ()
 Create a minimalistic table exclusively for entering new records.
 

Protected Member Functions

 tablex ()
 Protected default constructor.
 

Protected Attributes

uint32_t ipart
 Current partition number being used for writing.
 
uint32_t maxpart
 Recommended size of data partitions to be created.
 

Detailed Description

The class for expandable tables.

It is designed to store data in memory temporarily and then write the records out through the function write. After creating an object of this type, the user must first add columns by calling addColumn. New data records may then be added one column at a time or one row at a time. An example of using this class is in examples/ardea.cpp.

Note
Most functions that return an integer return 0 on success, a negative value on error, and a positive number as advisory information.
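
The convention in the note above can be captured in a small helper. This is a minimal sketch, not part of FastBit: the names Outcome and classify are hypothetical. Some functions documented below (appendRow, write, readCSV) return counts on success, so a positive value from those is a normal result and this helper applies only to functions that follow the general convention.

```cpp
#include <cassert>

// Hypothetical helper (not part of FastBit): interpret an integer return
// value under the convention "0 = success, negative = error,
// positive = advisory information".
enum class Outcome { Success, Error, Advisory };

inline Outcome classify(int rc) {
    if (rc == 0) return Outcome::Success;
    return rc < 0 ? Outcome::Error : Outcome::Advisory;
}
```

A caller would typically treat only Outcome::Error as fatal and log the advisory case.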

Constructor & Destructor Documentation

ibis::tablex::tablex ( )
inline protected

Protected default constructor.

Derived classes need a default constructor.

Member Function Documentation

virtual int ibis::tablex::append ( const char *  cname,
uint64_t  begin,
uint64_t  end,
void *  values 
)
pure virtual

Add values to the named column.

The column name must be in the table already. The first value is to be placed at row begin (the row numbers start with 0) and the last value before row end. The array values must contain (end - begin) values of the type specified through addColumn.

The expected types of values are "const std::vector<std::string>*" for string-valued columns, and "const T*" for a fixed-size column of type T. For example, if the column type is float, the type of values is "const float*"; if the column type is category, the type of values is "const std::vector<std::string>*".

Note
Since each column may have a different number of rows filled, the number of rows in the table is considered to be the maximum number of rows filled in any column.
This function cannot be used to introduce new columns in a table. A new column must be added with addColumn.
See also
appendRow

Implemented in ibis::tafel.

Referenced by fastbit_add_values().
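
The begin/end semantics and the row-count rule described for append can be illustrated with a toy model. This is a sketch under stated assumptions, not the ibis::tafel implementation: ToyTable is a hypothetical stand-in that holds only float columns, returns -1 for any rejected call, and leaves skipped rows at 0 rather than marking them null.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Toy stand-in for an expandable table of float columns (hypothetical).
struct ToyTable {
    std::map<std::string, std::vector<float>> cols;

    // Mirrors the rule that a column must be declared before use.
    void addColumn(const std::string& name) { cols[name]; }

    // Place (end - begin) values at rows [begin, end).
    // Returns 0 on success, -1 if the column is unknown or the range is empty.
    int append(const std::string& name, std::uint64_t begin, std::uint64_t end,
               const float* values) {
        auto it = cols.find(name);
        if (it == cols.end() || end <= begin || values == nullptr) return -1;
        std::vector<float>& col = it->second;
        if (col.size() < end) col.resize(end);  // skipped rows are left as 0 here
        std::copy(values, values + (end - begin),
                  col.begin() + static_cast<std::ptrdiff_t>(begin));
        return 0;
    }

    // The table's row count is the largest number of rows filled in any column.
    std::uint64_t mRows() const {
        std::uint64_t n = 0;
        for (const auto& kv : cols)
            n = std::max<std::uint64_t>(n, kv.second.size());
        return n;
    }
};
```

Appending three values to one column and one value to another yields a table of three rows, which matches the note about columns being filled to different lengths.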

virtual int ibis::tablex::appendRow ( const ibis::table::row &  )
pure virtual

Add one row.

If an array of names has the same number of elements as the array of values, the names are used as column names. If the names are not specified explicitly, the values are assigned to the columns of the same data type in the order in which they were specified through addColumn, or in the order in which they were recreated from an existing dataset (which is typically alphabetical).

Return the number of values added to the new row.

Note
The column names are not case-sensitive.
Like append, this function cannot be used to introduce new columns in a table. A new column must be added with addColumn.
Since the various columns may have different numbers of rows filled, the number of rows in the table is assumed to be the largest number of rows filled so far. The new row appended here increases the number of rows in the table by 1. The unfilled rows are assumed to be null.
A null value is internally denoted with a mask kept separately from the data values. However, since the rows corresponding to the null values must still be filled with some value in this implementation, they are filled as follows. A null value of an integer column is filled with the maximum value representable by that integer type. A null value of a floating-point column is filled with a quiet NaN (Not-a-Number). A null value of a string-valued column is filled with an empty string.

Implemented in ibis::tafel.
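
The fill values described in the last note map directly onto std::numeric_limits. The following is a minimal sketch; nullFill and nullFillString are hypothetical names and not FastBit functions. It shows only the placeholder values, not the separate null mask.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>
#include <string>
#include <type_traits>

// Placeholder for a null entry in an integer column: the largest value
// representable by the integer type.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, T>::type nullFill() {
    return std::numeric_limits<T>::max();
}

// Placeholder for a null entry in a floating-point column: a quiet NaN.
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, T>::type nullFill() {
    return std::numeric_limits<T>::quiet_NaN();
}

// Placeholder for a null entry in a string-valued column: an empty string.
inline std::string nullFillString() { return std::string(); }
```

Because these placeholders are also legal data values, a reader of the data cannot tell a null from a genuine maximum or empty string without consulting the mask.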

virtual int ibis::tablex::appendRow ( const char *  line,
const char *  delimiters = 0 
)
pure virtual

Append a row stored in ASCII form.

The values in ASCII form are assumed to be separated by a comma (,) or a space, but additional delimiters may be added through the second argument.

Return the number of values added to the new row.

Implemented in ibis::tafel.
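
The delimiter rule for the ASCII form of appendRow can be sketched with a simple tokenizer. This is an illustration only: splitFields is a hypothetical function, and it makes simplifying assumptions that FastBit may not share, namely that runs of delimiters collapse into one separator and that quoted fields are not handled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical tokenizer: comma and space always separate fields; extra
// delimiter characters may be supplied through the second argument.
inline std::vector<std::string> splitFields(const std::string& line,
                                            const char* delimiters = nullptr) {
    std::string delims = ", ";
    if (delimiters != nullptr) delims += delimiters;

    std::vector<std::string> fields;
    std::string::size_type pos = 0;
    // Skip leading delimiters, then cut the field at the next delimiter.
    while ((pos = line.find_first_not_of(delims, pos)) != std::string::npos) {
        std::string::size_type stop = line.find_first_of(delims, pos);
        if (stop == std::string::npos) {
            fields.push_back(line.substr(pos));
            break;
        }
        fields.push_back(line.substr(pos, stop - pos));
        pos = stop;
    }
    return fields;
}
```

For example, splitFields("a;b,c", ";") treats the semicolon as an additional separator alongside the default comma and space.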

virtual int ibis::tablex::appendRows ( const std::vector< ibis::table::row > &  )
pure virtual

Add multiple rows.

Rows in the incoming vector are processed one after another. The ordering of the values in earlier rows is automatically carried over to the later rows until another set of names is specified.

Return the number of new rows added.

See also
appendRow

Implemented in ibis::tafel.

virtual uint32_t ibis::tablex::bufferCapacity ( ) const
inline virtual

Capacity of the memory buffer.

Report the maximum number of rows that can be stored with this object before more memory will be allocated. A return value of zero (0) may also indicate that the object does not know its capacity.

Note
For string-valued columns, the reservation does not necessarily allocate the space required for the actual string values. Thus it is possible to run out of memory before the number of rows reported by mRows reaches the value returned by this function.

Reimplemented in ibis::tafel.

virtual void ibis::tablex::clearData ( )
pure virtual

Remove all data recorded.

Keeps the information about columns. It is intended to prepare for new rows after invoking the function write.

Implemented in ibis::tafel.

ibis::tablex * ibis::tablex::create ( )
static

Create a minimalistic table exclusively for entering new records.

Create a tablex for entering new data.

virtual const char* ibis::tablex::getASCIIDictionary ( const char *  ) const
pure virtual

Retrieve the name of the ASCII dictionary file associated with a column of categorical values.

Implemented in ibis::tafel.

virtual uint32_t ibis::tablex::mRows ( ) const
pure virtual

The number of rows in memory.

It is the maximum number of rows in any column.

Implemented in ibis::tafel.

int ibis::tablex::parseNamesAndTypes ( const char *  txt)
virtual

Parse names and data types in string form.

A column name must start with a letter or an underscore (_); it can be followed by any number of alphanumeric characters (including underscores). For each built-in data type, the type names recognized are as follows:

If no type can be found but a valid name is found, the type is assumed to be int.

Note
Column names are not case-sensitive and all types should be specified in lower case letters.

Characters following '#' or '--' on a line will be treated as comments and discarded.

References ibis::tafel::addColumn(), ibis::BLOB, ibis::CATEGORY, ibis::DOUBLE, ibis::FLOAT, ibis::INT, ibis::LONG, ibis::SHORT, ibis::TEXT, ibis::UBYTE, ibis::UINT, ibis::ULONG, and ibis::USHORT.

Referenced by readNamesAndTypes().
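
The naming rule and the comment rule for parseNamesAndTypes can be sketched as two small checks. Both functions are hypothetical and are not FastBit code; the sketch assumes the second comment marker is a double hyphen and uses the C locale's notion of letters and digits.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <string>

// True if s starts with a letter or underscore and continues with
// alphanumeric characters or underscores only.
inline bool isValidColumnName(const std::string& s) {
    if (s.empty()) return false;
    const unsigned char first = static_cast<unsigned char>(s[0]);
    if (!(std::isalpha(first) || first == '_')) return false;
    for (char ch : s) {
        const unsigned char c = static_cast<unsigned char>(ch);
        if (!(std::isalnum(c) || c == '_')) return false;
    }
    return true;
}

// Drop everything from the first '#' or "--" to the end of the line.
inline std::string stripComment(const std::string& line) {
    const std::string::size_type cut =
        std::min(line.find('#'), line.find("--"));
    return cut == std::string::npos ? line : line.substr(0, cut);
}
```

A line such as "temperature:float # sensor reading" would thus be reduced to the name:type pair before it is parsed.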

virtual int ibis::tablex::readCSV ( const char *  inputfile,
int  memrows = 0,
const char *  outputdir = 0,
const char *  delimiters = 0 
)
pure virtual

Read the content of the named file as comma-separated values.

Append the records to this table. If the argument memrows is greater than 0, this function will reserve space to read this many records. If the total number of records is more than memrows and the output directory name is specified, then the records will be written to outputdir and the memory is made available for later records. If outputdir is not specified, this function attempts to expand the memory allocated, which may exhaust the available memory. Furthermore, repeated allocations can be time-consuming.

By default the records are delimited by comma (,) and blank space. One may specify alternative delimiters using the last argument.

Upon successful completion of this function, the return value is the number of rows processed. However, not all of them may remain in memory because earlier rows may have been written to disk.
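
The difference between rows processed and rows still in memory can be made concrete with a small simulation. This is a sketch under stated assumptions and not how readCSV is implemented: simulateLoad and LoadResult are hypothetical, and the exact point at which FastBit flushes its buffer is assumed here to be the moment the buffer already holds memrows rows and another row arrives.

```cpp
#include <cassert>

// Hypothetical bookkeeping for a buffered load.
struct LoadResult {
    long processed;  // what the return value would report
    long inMemory;   // what mRows would report afterwards
    long flushes;    // number of times the buffer was written out
};

inline LoadResult simulateLoad(long totalRows, long memrows, bool haveOutputDir) {
    LoadResult r{0, 0, 0};
    for (long i = 0; i < totalRows; ++i) {
        // Flushing is possible only when both a row limit and an output
        // directory were given; otherwise the buffer simply grows.
        if (memrows > 0 && haveOutputDir && r.inMemory >= memrows) {
            ++r.flushes;
            r.inMemory = 0;
        }
        ++r.inMemory;
        ++r.processed;
    }
    return r;
}
```

With 10 input rows and memrows set to 4, the model processes 10 rows, flushes twice, and leaves 2 rows in memory; without an output directory all 10 rows stay in memory.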

Note
Information about column names and types must be provided before calling this function.
The return value is intentionally left as a 32-bit integer, which limits the maximum number of rows that can be correctly handled.
This function processes the input text file one line at a time, using the standard unix read function to perform the actual I/O operations. Depending on the I/O libraries used, it may expect the end-of-line character to be unix-style. If the lines of your text file are not terminated with the unix-style end-of-line character, this function may interpret the lines incorrectly. If you see an entire line being read as one single field, you likely have a problem with the end-of-line character. Convert the end-of-line characters and try again.

Implemented in ibis::tafel.
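
One common cause of the end-of-line problem described in the note is a file with Windows-style CR LF line endings. A minimal sketch of normalizing a line before it is handed to a parser follows; chompCR is a hypothetical helper, not a FastBit function, and it handles only a single trailing carriage return.

```cpp
#include <cassert>
#include <string>

// Remove one trailing carriage return, as left behind when a CR LF
// terminated line is split on LF only.
inline std::string chompCR(std::string line) {
    if (!line.empty() && line.back() == '\r') line.pop_back();
    return line;
}
```

Files that use a bare CR as the line terminator need a full conversion of the file instead, since no LF is present to split on.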

int ibis::tablex::readNamesAndTypes ( const char *  filename)
virtual

Read a file containing the names and types of columns.

The content of the file is either a simple list of "name:type" pairs or the more verbose version used in '-part.txt' files. In the plain 'name:type' pair form, the pairs can be specified either one at a time or a group at a time. This function attempts to read one line at a time and will automatically grow its internal buffer if the existing buffer is too small to hold a long line. However, it is typically a good idea to keep the lines relatively short so they can be examined manually if necessary.

References ibis::fileManager::buffer< T >::address(), parseNamesAndTypes(), ibis::util::readString(), ibis::fileManager::buffer< T >::resize(), and ibis::fileManager::buffer< T >::size().

virtual int ibis::tablex::readSQLDump ( const char *  inputfile,
std::string &  tname,
int  memrows = 0,
const char *  outputdir = 0 
)
pure virtual

Read a SQL dump from database systems such as MySQL.

The entire file will be read into memory in one shot unless both memrows and outputdir are specified. In cases where both memrows and outputdir are specified, this function reads a maximum of memrows rows before writing the data to outputdir under the name tname, which leaves no more than memrows rows in memory. The value returned from this function is the number of rows processed, including those written to disk. Use the function mRows to determine how many are still in memory.

If the SQL dump file contains a statement to create a table, then the existing metadata is overwritten. Otherwise, it reads insert statements and converts the ASCII data into binary format in memory.

Implemented in ibis::tafel.

virtual int32_t ibis::tablex::reserveBuffer ( uint32_t  )
inline virtual

Reserve enough buffer space for the specified number of rows.

Return the number of rows that can be stored, or a negative number to indicate an error. Since the return value is a 32-bit signed integer, it cannot represent a number greater than or equal to 2^31 (~2 billion); the caller shall not attempt to reserve space for 2^31 rows or more.

The intention is to minimize the number of dynamic memory allocations needed to expand the memory used to hold the data. Implementing this function is optional, and the user is not required to call it.

Reimplemented in ibis::tafel.
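
The 2^31 limit on reserveBuffer can be checked before the call is made. The guard below is a minimal sketch; canReserve is a hypothetical name and not part of FastBit.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// True when `rows` can be reported back through a 32-bit signed return
// value, that is, when rows < 2^31.
inline bool canReserve(std::uint64_t rows) {
    return rows <= static_cast<std::uint64_t>(
                       std::numeric_limits<std::int32_t>::max());
}
```

The argument of reserveBuffer is a uint32_t, so values from 2^31 up to 2^32 - 1 can be passed in but cannot be returned; the guard rejects exactly that range and anything larger.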

virtual void ibis::tablex::setASCIIDictionary ( const char *  ,
const char *   
)
pure virtual

Set the name of the ASCII dictionary file for a column of categorical values.

Implemented in ibis::tafel.

virtual table* ibis::tablex::toTable ( const char *  nm = 0,
const char *  de = 0 
)
pure virtual

Stop expanding the current set of data records.

Convert a tablex object into a table object so that the data records can participate in queries. The data records held by the tablex object are transferred to the table object; however, the metadata remains with this object.

Implemented in ibis::tafel.

virtual int ibis::tablex::write ( const char *  dir,
const char *  tname = 0,
const char *  tdesc = 0,
const char *  idx = 0,
const char *  nvpairs = 0 
) const
pure virtual

Write the in-memory data records to the specified directory and update the metadata on disk.

If the table name (tname) is a null string or an empty string, the last component of the directory name is used. If the description (tdesc) is a null string or an empty string, a time stamp will be printed in its place. If the specified directory already contains data, the new records will be appended to the existing data. In this case, the table name specified here will overwrite the existing name, but the existing name and description will be retained if the current arguments are null strings or empty strings. The data type associated with this table will overwrite the existing data type information. If the index specification is not null, the existing index specification will be overwritten.

  • dir The output directory name. Must be a valid directory name. The named directory will be created if it does not already exist.
  • tname Table name. Should be a valid string, otherwise, a random name is generated as FastBit requires a name for each table.
  • tdesc Table description. An optional description of the table. It can be an arbitrary string.
  • idx Indexing option for all columns of the table without its own indexing option. More information about indexing options is available elsewhere.
  • nvpairs An arbitrary list of name-value pairs to be associated with the data table. Any number of name-value pairs may be given here; however, FastBit may not be able to do much with them. One useful pair, of the form "columnShape=(nd1, ..., ndk)", tells FastBit that the table is defined on a simple regular k-dimensional mesh of size nd1 x ... x ndk. Internally, the name-value pairs associated with a data table are known as meta tags or simply tags.

Return the number of rows written to the specified directory on successful completion.

Implemented in ibis::tafel.

Referenced by fastbit_flush_buffer().
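
Two of the argument rules for write lend themselves to small helpers: the fallback to the last directory component when tname is null or empty, and the "columnShape=(nd1, ..., ndk)" form of nvpairs. Both functions below are hypothetical sketches, not FastBit code. The name fallback assumes '/' is the only path separator, and the exact spacing FastBit accepts inside columnShape is assumed to match the form quoted in the parameter list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <vector>

// Name that would be used under the "last component of the directory name"
// rule when tname is null or empty.
inline std::string effectiveTableName(const char* tname, const std::string& dir) {
    if (tname != nullptr && *tname != '\0') return tname;
    const std::string::size_type last = dir.find_last_not_of('/');
    if (last == std::string::npos) return std::string();  // "" or only slashes
    const std::string::size_type slash = dir.find_last_of('/', last);
    const std::string::size_type start =
        (slash == std::string::npos) ? 0 : slash + 1;
    return dir.substr(start, last - start + 1);
}

// Build a "columnShape=(nd1, ..., ndk)" pair from mesh dimensions.
inline std::string columnShape(const std::vector<std::uint64_t>& dims) {
    std::ostringstream out;
    out << "columnShape=(";
    for (std::size_t i = 0; i < dims.size(); ++i) {
        if (i > 0) out << ", ";
        out << dims[i];
    }
    out << ")";
    return out.str();
}
```

The string produced by columnShape would be passed as the nvpairs argument, for example for a 100 x 200 mesh.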

virtual int ibis::tablex::writeMetaData ( const char *  dir,
const char *  tname = 0,
const char *  tdesc = 0,
const char *  idx = 0,
const char *  nvpairs = 0 
) const
pure virtual

Write out the information about the columns.

It will write the metadata file containing the column information and index specifications if no metadata file already exists. It returns the number of columns written to the metadata file upon successful completion, returns 0 if a metadata file already exists, and returns a negative number to indicate errors. If there is no column in memory, nothing is written to the output directory.

Note
The formal arguments of this function are exactly the same as those of ibis::tablex::write.
Warning
This function does not preserve the existing metadata! Use with care.

Implemented in ibis::tafel.

