Earth System Grid I

In the project "Prototyping an Earth System Grid" (ESG-I), initially funded under DOE's NGI program and continued with follow-on support from OBER and MICS, we took the first steps toward realizing the ESG vision. We developed Data Grid technologies for managing the movement and replication of large datasets, and applied them in practical settings by creating an ESG-enabled data browser based on the PCMDI data analysis tools. We also worked to deploy these technologies at DOE laboratories and NCAR. The results of this work have been demonstrated on multiple occasions, including the 2000 NCAR Climate System Model meeting and the SC'2000 conference in Dallas. At the latter event, we demonstrated interactive analysis of remote climate data located at sites across the country, achieved cross-country transfer rates of more than 500 Mb/s, and won the "Hottest Infrastructure" award in the Network Challenge event.

The figure below illustrates the key elements of the ESG-I prototype, which are:

1. PCMDI, an application developed at LLNL for the analysis and visualization of climate simulation data. PCMDI users can specify input to an analysis in terms of data attributes, such as geographic position, time, or simulation variable. A metadata database is used to map the specified attributes to a logical file name that identifies which elements of the simulation data sets contain the data of interest (a sketch of such an attribute lookup appears after this list).

2. PCMDI forwards the logical file name to a request manager that is responsible for selecting among one or more replicas of the desired data and transferring the data back to PCMDI. The request manager uses a Replica Catalog to map the supplied logical file name to the one or more physical storage systems that hold replicas of the needed data. It then uses a network performance measurement system, the Network Weather Service (NWS), to select the best source for the data, and uses GridFTP, a secure, high-performance, parallel file transfer service, to transfer the data from the preferred site to the site on which PCMDI is running (see the replica-selection sketch following this list). On HPSS platforms, it is desirable to manage the staging of data off tape to a local disk cache before transferring it to its destination via GridFTP; a Hierarchical Resource Manager developed at LBNL is used for this purpose.

3. A transfer monitoring tool (not shown) is used to track the progress of data movement. This status information is relayed back to the user and reported via a graphical display (see the monitoring sketch following this list).
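
To make the attribute-to-name mapping in step 1 concrete, the following sketch shows how a query over data attributes might resolve to a logical file name. The attribute names, catalog layout, and logical-name scheme are illustrative assumptions, not the actual ESG-I metadata schema.

```python
# Hypothetical sketch: mapping data attributes to a logical file name.
# The attributes, entries, and naming scheme below are assumptions for
# illustration, not the real ESG-I metadata database.

METADATA_CATALOG = [
    {"variable": "surface_temp", "region": "pacific", "years": (1950, 2000),
     "logical_name": "lfn://esg/ccsm/surface_temp/pacific_1950-2000.nc"},
    {"variable": "precipitation", "region": "global", "years": (1900, 2000),
     "logical_name": "lfn://esg/ccsm/precipitation/global_1900-2000.nc"},
]

def lookup_logical_name(variable, region, year):
    """Return the logical file name whose attributes cover the query."""
    for entry in METADATA_CATALOG:
        lo, hi = entry["years"]
        if (entry["variable"] == variable
                and entry["region"] == region
                and lo <= year <= hi):
            return entry["logical_name"]
    raise KeyError(f"no dataset matches {variable}/{region}/{year}")

print(lookup_logical_name("surface_temp", "pacific", 1975))
```

The essential point is the indirection: the user names data by its scientific attributes, and only the catalog knows which files hold it.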
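The request-manager flow of step 2 can be sketched in the same spirit: a Replica Catalog maps the logical name to candidate physical locations, and a forecast of network performance (standing in here for an NWS query) selects the source to hand to the transfer service. The catalog entries, host names, and bandwidth figures below are invented for illustration.

```python
# Hypothetical sketch of the request-manager flow: logical name ->
# replica locations -> bandwidth-based source selection. The real system
# would query NWS for forecasts and invoke GridFTP for the transfer.

REPLICA_CATALOG = {
    "lfn://esg/ccsm/surface_temp/pacific_1950-2000.nc": [
        "gsiftp://hpss.llnl.gov/esg/surface_temp.nc",
        "gsiftp://mss.ncar.edu/esg/surface_temp.nc",
    ],
}

def predicted_bandwidth_mbps(source_host):
    """Stand-in for an NWS query returning a forecast transfer rate."""
    forecasts = {"hpss.llnl.gov": 240.0, "mss.ncar.edu": 410.0}  # assumed values
    return forecasts.get(source_host, 10.0)

def select_best_replica(logical_name):
    """Pick the physical replica with the highest forecast bandwidth."""
    replicas = REPLICA_CATALOG[logical_name]
    return max(replicas,
               key=lambda url: predicted_bandwidth_mbps(url.split("/")[2]))

best = select_best_replica("lfn://esg/ccsm/surface_temp/pacific_1950-2000.nc")
print("transfer from:", best)  # the real request manager would now call GridFTP
```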
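Finally, a minimal sketch of the progress reporting in step 3: a chunked transfer loop invokes a callback after each chunk, the kind of hook a graphical status display could consume. The chunk size and output format are arbitrary choices.

```python
# Hypothetical sketch of transfer monitoring: a transfer loop that reports
# progress through a callback, as a GUI status display might consume.

def transfer_with_progress(total_bytes, chunk_bytes, on_progress):
    """Simulate a chunked transfer, invoking on_progress after each chunk."""
    moved = 0
    while moved < total_bytes:
        moved = min(moved + chunk_bytes, total_bytes)
        on_progress(moved, total_bytes)

def report(moved, total):
    print(f"transferred {moved / total:6.1%} ({moved}/{total} bytes)")

transfer_with_progress(total_bytes=10_000_000, chunk_bytes=2_500_000,
                       on_progress=report)
```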

ESG-I was extremely successful in two regards: it developed a rich technology base that is now seeing use in multiple other disciplines, and it demonstrated the feasibility and power of using a Grid environment for climate analysis applications. However, ESG-I did not address the task of putting an operational Earth System Grid into service for the broad community. One reason was simply timescale: creating a complete Earth System Grid is a large task, and ESG-I did not run long enough to complete it. A more significant reason is focus. As befits an NGI project, ESG-I was concerned above all with the high-speed movement of large climate model datasets and with analysis by "fat clients" (specifically, the PCMDI tools) running on powerful workstations. Yet the vast majority of the climate modeling community works on relatively low-powered workstations. For these users, "thin" clients capable of calling upon powerful remote storage and computing resources are essential.