Grid Collector: Facilitating Efficient Selective Access from Data Grids

Kesheng Wu, Junmin Gu, Jerome Lauret, Arthur M. Poskanzer, Arie Shoshani, Alexander Sim, and Wei-Ming Zhang


The Grid Collector is a system that facilitates the effective analysis and spontaneous exploration of scientific data. It combines an efficient indexing technology with a Grid file management technology to speed up common analysis jobs on high-energy physics data and to enable some previously impractical analysis jobs. To analyze a set of high-energy collision events, one typically specifies the files containing the events of interest, reads all the events in those files, and filters out the unwanted ones. Since most analysis jobs filter out a significant number of events, a considerable amount of time is wasted reading unwanted events. The Grid Collector removes this inefficiency by allowing users to specify more precisely which events are of interest and to read only the selected events. This speeds up most analysis jobs. In existing analysis frameworks, the responsibility for bringing files from tertiary storage or remote sites to local disks falls on the users. This forces most analysis jobs to be performed at centralized computer facilities, where commonly used files are kept on large shared file systems. The Grid Collector automates file management tasks and eliminates labor-intensive manual file transfers. This makes it much easier to perform analyses that require data files on tertiary storage or at remote sites. It also makes more computer resources available for analysis jobs, since they are no longer bound to the centralized facilities.
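The contrast drawn above, between reading every event and then filtering versus consulting an index and reading only the selected events, can be illustrated with a small sketch. This is not the Grid Collector's interface or its indexing technology: the `Event` fields, the file names, the flat list used as an index, and the selection condition are all hypothetical and chosen only to make the idea concrete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    event_id: int
    energy: float    # hypothetical event attribute
    n_tracks: int    # hypothetical event attribute


# Hypothetical data files, each holding a few events.
FILES = {
    "run1.dat": [Event(0, 1.0, 10), Event(1, 5.5, 120), Event(2, 2.0, 30)],
    "run2.dat": [Event(3, 0.5, 5), Event(4, 1.5, 20)],
    "run3.dat": [Event(5, 7.0, 200), Event(6, 6.1, 90)],
}


def wanted(energy, n_tracks):
    """Hypothetical selection condition on event attributes."""
    return energy > 5.0 and n_tracks > 100


def read_all_then_filter(files):
    """Conventional approach: read every event in every file, then discard."""
    events_read = 0
    selected = []
    for events in files.values():
        for ev in events:
            events_read += 1
            if wanted(ev.energy, ev.n_tracks):
                selected.append(ev.event_id)
    return sorted(selected), events_read


def build_index(files):
    """Toy index: one (file, position, attributes) row per event,
    stored separately from the data files themselves."""
    return [
        (name, pos, ev.energy, ev.n_tracks)
        for name, events in files.items()
        for pos, ev in enumerate(events)
    ]


def select_with_index(files, index):
    """Index-driven approach: evaluate the condition on the index,
    then read only the matching events from the files that hold them."""
    hits = {}
    for name, pos, energy, n_tracks in index:
        if wanted(energy, n_tracks):
            hits.setdefault(name, []).append(pos)
    events_read = 0
    selected = []
    for name, positions in hits.items():
        for pos in positions:
            selected.append(files[name][pos].event_id)
            events_read += 1
    return sorted(selected), events_read, sorted(hits)


INDEX = build_index(FILES)
full_ids, full_reads = read_all_then_filter(FILES)
idx_ids, idx_reads, files_needed = select_with_index(FILES, INDEX)
```

Both approaches return the same events, but the index-driven one touches fewer events and identifies in advance which files are needed at all. In a Grid setting, that list of needed files is what a file management layer could use to decide which files to fetch from tertiary storage or remote sites.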

full text of LBNL-57677 (PDF)

Published in International Supercomputer Conference 2005
Closely related
More research work by John Wu
Bitmap Index
Connected Component Labeling
Eigenvalue Computation
Information available elsewhere on the web
Google Scholar