This paper presents a new parallel indexing data
structure for answering queries. The index, called Bin-Hash,
offers extremely high levels of concurrency, and is therefore wellsuited
for the emerging commodity of parallel processors, such
as multi-cores, cell processors, and general purpose graphics
processing units (GPU). The Bin-Hash approach first bins the
base data, and then partitions and separately stores the values
in each bin as a perfect spatial hash table. To answer a query,
we first determine whether or not a record satisfies the query
conditions based on the bin boundaries. For the bins with records
that can not be resolved, we examine the spatial hash tables.
The procedures for examining the bin numbers and the spatial
hash tables offer the maximum possible level of concurrency; all
records are able to be evaluated by our procedure independently
in parallel. Additionally, our Bin-Hash procedures access much
smaller amounts of data than similar parallel methods, such as
the projection index. This smaller data footprint is critical for
certain parallel processors, like GPUs, where memory resources
are limited.
To demonstrate the effectiveness of Bin-Hash, we implement it
on a GPU using the data-parallel programming language CUDA.
The concurrency offered by the Bin-Hash index allows us to
fully utilize the GPU's massive parallelism in our work; over
12,000 records can be simultaneously evaluated at any one time.
We show that our new query processing method is an order
of magnitude faster than current state-of-the-art CPU-based
indexing technologies. Additionally, we compare our performance
to existing GPU-based projection index strategies.