Large-scale scientific data is often stored in scientific
data formats such as FITS, netCDF, and HDF. These storage
formats are of particular interest to the scientific user community
since they provide multi-dimensional storage and retrieval.
However, one drawback of these storage formats
is that they do not support semantic indexing, which is
important for interactive data analysis in which scientists search
for features of interest, such as "Find all supernova explosions
where energy > 10^5 and temperature > 10^6."
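To make the example query concrete, here is a minimal sketch of how it must be answered when the format offers no semantic index: a full scan that reads every value of both variables. It assumes, for illustration only, that `energy` and `temperature` are small 2-D grids held as Python lists; the function name `scan_query` and all data values are hypothetical.

```python
from typing import List, Tuple

Grid = List[List[float]]


def scan_query(energy: Grid, temperature: Grid,
               e_min: float, t_min: float) -> List[Tuple[int, int]]:
    """Return (row, col) of every cell with energy > e_min and temperature > t_min.

    Hypothetical helper: with no index, every element of both grids is read.
    """
    hits = []
    for i, (e_row, t_row) in enumerate(zip(energy, temperature)):
        for j, (e, t) in enumerate(zip(e_row, t_row)):
            if e > e_min and t > t_min:
                hits.append((i, j))
    return hits


# Hypothetical 2x3 grids (made-up values, for illustration only).
energy = [[2e5, 5e4, 3e5],
          [1e3, 8e5, 1e6]]
temperature = [[2e6, 9e6, 5e5],
               [3e6, 7e6, 4e6]]
print(scan_query(energy, temperature, 1e5, 1e6))  # [(0, 0), (1, 1), (1, 2)]
```

The cost of this scan grows with the total number of elements, regardless of how few of them satisfy the condition.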
In this paper we present a novel approach called HDF5-FastQuery
that accelerates access to large HDF5 files by introducing
multi-dimensional semantic indexing.
Our implementation leverages an efficient indexing technology
called bitmap indexing, which has been widely used
in the database community. Bitmap indices are especially
well suited for interactive exploration of large-scale read-only
data. Storing the bitmap indices in the HDF5 file
has the following advantages: (a) a significant speedup
when accessing subsets of multi-dimensional data, and
(b) portability of the indices across multiple computer platforms.
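The idea behind a binned bitmap index can be sketched as follows. This is a minimal, hypothetical Python illustration, not the authors' implementation and not the API of any bitmap-index library. Each variable's value range is split into bins at assumed edges, and each bin keeps one bitmap marking which elements fall into it. A condition such as "value > x" is answered by OR-ing the bitmaps of bins that lie entirely above x and checking raw values only in the single bin that straddles x. Conditions on different variables are then combined with a bitwise AND, and the resulting bitmap is mapped back to multi-dimensional coordinates under an assumed row-major layout. Bitmaps are stored as uncompressed Python integers.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


class BinnedBitmapIndex:
    """Binned, equality-encoded bitmap index over a flattened (row-major) array.

    Hypothetical sketch: bit k of bitmaps[b] is set when element k falls in bin b.
    Bin 0 holds values < edges[0]; bin i holds edges[i-1] <= v < edges[i];
    the last bin holds values >= edges[-1]. Bitmaps are uncompressed Python ints.
    """

    def __init__(self, values: Sequence[float], edges: Sequence[float]):
        self.values = list(values)          # raw data, needed for candidate checks
        self.edges = sorted(edges)
        self.bitmaps = [0] * (len(self.edges) + 1)
        for k, v in enumerate(self.values):
            self.bitmaps[bisect_right(self.edges, v)] |= 1 << k

    def greater_than(self, x: float) -> int:
        """Return a bitmap of the positions whose value is > x."""
        b = bisect_right(self.edges, x)     # the bin that contains x
        result = 0
        for bm in self.bitmaps[b + 1:]:     # bins entirely above x: index only
            result |= bm
        cand = self.bitmaps[b]              # boundary bin: check raw values
        while cand:
            low = cand & -cand              # isolate the lowest set bit
            k = low.bit_length() - 1
            if self.values[k] > x:
                result |= low
            cand ^= low
        return result


def to_coords(bitmap: int, ncols: int) -> List[Tuple[int, int]]:
    """Map set bits (row-major offsets) to (row, col) coordinates."""
    coords = []
    while bitmap:
        low = bitmap & -bitmap
        coords.append(divmod(low.bit_length() - 1, ncols))
        bitmap ^= low
    return coords


# Hypothetical 2x3 grids, flattened row-major (made-up values).
energy = [2e5, 5e4, 3e5, 1e3, 8e5, 1e6]
temperature = [2e6, 9e6, 5e5, 3e6, 7e6, 4e6]
e_idx = BinnedBitmapIndex(energy, edges=[1e4, 1e5, 1e6])
t_idx = BinnedBitmapIndex(temperature, edges=[1e6, 5e6])

# energy > 1e5 AND temperature > 1e6: a single bitwise AND across variables.
hits = e_idx.greater_than(1e5) & t_idx.greater_than(1e6)
print(to_coords(hits, ncols=3))  # [(0, 0), (1, 1), (1, 2)]
```

Only the boundary bin touches the raw data; all other work is bitwise operations on the index. A production index would also compress the bitmaps, which this sketch omits.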
We present an API that simplifies the execution
of queries on HDF5 files for general scientific applications
and data analysis. The design is flexible enough to accommodate
arbitrary indexing technologies for semantic
range queries. We also provide a detailed performance
analysis of HDF5-FastQuery on both synthetic and
scientific data. The results demonstrate that our proposed
approach to multi-dimensional queries is up to a factor of
2 faster than HDF5.
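A design that accommodates arbitrary indexing technology implies a small abstraction layer between the query API and the index itself. The following is a hypothetical Python sketch of that idea, not the HDF5-FastQuery API: `RangeIndex`, `ScanIndex`, `SortedIndex`, and `query` are invented names, and a sorted-array index stands in for "another indexing technology" purely for illustration.

```python
from abc import ABC, abstractmethod
from bisect import bisect_right
from typing import Dict, Optional, Sequence, Set


class RangeIndex(ABC):
    """Hypothetical interface: any index that can answer 'value > x'."""

    @abstractmethod
    def greater_than(self, x: float) -> Set[int]:
        """Return the row-major offsets of elements whose value is > x."""


class ScanIndex(RangeIndex):
    """Fallback with no index structure: reads every value on each query."""

    def __init__(self, values: Sequence[float]):
        self.values = list(values)

    def greater_than(self, x: float) -> Set[int]:
        return {k for k, v in enumerate(self.values) if v > x}


class SortedIndex(RangeIndex):
    """Stand-in for another technology: sort once, answer by binary search."""

    def __init__(self, values: Sequence[float]):
        pairs = sorted((v, k) for k, v in enumerate(values))
        self.keys = [v for v, _ in pairs]
        self.offsets = [k for _, k in pairs]

    def greater_than(self, x: float) -> Set[int]:
        # Every key after the insertion point of x is strictly greater than x.
        return set(self.offsets[bisect_right(self.keys, x):])


def query(indices: Dict[str, RangeIndex], thresholds: Dict[str, float]) -> Set[int]:
    """Conjunctive range query: AND of 'variable > threshold' conditions.

    In this sketch an empty set of conditions yields an empty result.
    """
    result: Optional[Set[int]] = None
    for name, t in thresholds.items():
        hits = indices[name].greater_than(t)
        result = hits if result is None else result & hits
    return result if result is not None else set()


# Hypothetical flattened data; each variable may use a different index type.
energy = [2e5, 5e4, 3e5, 1e3, 8e5, 1e6]
temperature = [2e6, 9e6, 5e5, 3e6, 7e6, 4e6]
indices = {"energy": SortedIndex(energy), "temperature": ScanIndex(temperature)}
print(sorted(query(indices, {"energy": 1e5, "temperature": 1e6})))  # [0, 4, 5]
```

Because `query` depends only on the `greater_than` method, a bitmap index or any other structure could be substituted per variable without changing the caller.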