Modern scientific datasets present numerous data management and analysis
challenges. State-of-the-art index and query technologies are critical
for facilitating interactive exploration of large datasets, but
significant challenges remain in designing a system for processing
general scientific datasets. Such a system needs to run on
distributed multi-core platforms, efficiently utilize underlying I/O
infrastructure, and scale to massive datasets.
We present FastQuery, a novel software framework that addresses these
challenges. FastQuery utilizes a state-of-the-art index and query
technology (FastBit) and is designed to process massive datasets on
modern supercomputing platforms. We apply FastQuery to the processing of a
massive 50 TB dataset generated by a large-scale accelerator modeling
code. We demonstrate the scalability of the tool to 11,520
cores. Motivated by the scientific need to search for interesting
particles in this dataset, we use our framework to reduce search time
from hours to tens of seconds.