LBNL has long been involved in developing data management technology for scientific applications. More recently, several projects have focused on research and development for scientific applications that generate very large quantities of data; data volumes may reach hundreds of terabytes per year and must be stored on robotic tape systems. In particular, the OPTIMASS project [CDLK+95a, CDLK+95b] developed technology to reorganize and access spatio-temporal data on robotic tape systems. Another project, called the HENP Grand Challenge [BNRS98, SBNRS98, SBNRS99], developed the technology and software to manage terabytes of High-Energy and Nuclear Physics data and their movement to a shared disk cache. This system, called STACS (Storage Access Coordination System) [STACS99], was integrated into the production analysis system for the STAR and PHENIX experiments at BNL.
STACS is directly relevant to this proposed project. It has three main components, corresponding to its three functions: 1) The Query Estimator (QE) uses the index to determine which files, and which chunks within each file (called "events" in physics applications), are needed to satisfy a given range query. 2) The Query Monitor (QM) keeps track of which queries are executing at any time, which files are cached on behalf of each query, which files are not in use but are still in cache, and which files still need to be cached. 3) The Cache Manager (CM) is responsible for interfacing to the mass storage system (HPSS) to perform all the actions of staging files to and purging files from the disk cache. The QE was designed for particle physics data and can be readily used for this project. The QM will be used only in a local system, since the Globus, Condor, and SRB services can be used to manage distributed resources. The CM will be used to manage the transfer of data from HPSS to a local cache, and will be extended to use services that move data between distributed caches.
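To make the division of labor among the three components concrete, the following is a minimal sketch of the coordination pattern described above. All class and method names here are hypothetical illustrations, not the actual STACS interfaces; the real QE consults a physics event index, and the real CM issues staging and purging requests to HPSS rather than manipulating an in-memory list.

```python
class QueryEstimator:
    """QE sketch: maps a range query over event ids to the files holding them."""
    def __init__(self, index):
        # index: {filename: set of event ids contained in that file} (assumed layout)
        self.index = index

    def files_for(self, lo, hi):
        wanted = set(range(lo, hi + 1))
        return sorted(f for f, events in self.index.items() if events & wanted)


class CacheManager:
    """CM sketch: stages files into a bounded disk cache, purging unpinned files."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.cached = []  # staged files, oldest first

    def stage(self, filename, pinned):
        if filename in self.cached:
            return  # already on disk
        while len(self.cached) >= self.capacity:
            # purge the oldest file not pinned by any active query
            victim = next((f for f in self.cached if f not in pinned), None)
            if victim is None:
                raise RuntimeError("cache full of pinned files")
            self.cached.remove(victim)
        self.cached.append(filename)  # stands in for a stage-from-tape request


class QueryMonitor:
    """QM sketch: tracks active queries and which files are cached on their behalf."""
    def __init__(self, estimator, cache):
        self.estimator = estimator
        self.cache = cache
        self.active = {}  # query id -> set of files pinned for it

    def run_query(self, qid, lo, hi):
        files = self.estimator.files_for(lo, hi)
        pinned = set().union(*self.active.values()) if self.active else set()
        for f in files:
            self.cache.stage(f, pinned | set(files))
        self.active[qid] = set(files)
        return files

    def finish(self, qid):
        self.active.pop(qid, None)  # files become purge candidates
```

The sketch preserves the key design point: the QM, not the CM, decides which files a query needs and which remain pinned, while the CM only carries out staging and purging against a fixed-size cache.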