ExaHDF5

Delivering Efficient Parallel I/O
on Exascale Computing Systems

Exascale Computing Project (ECP)
Software Technology area

Overview

Background

In pursuit of more accurate modeling of real-world systems, scientific applications at exascale will generate and analyze massive amounts of data. A critical requirement of these applications is the capability to access and manage this data efficiently on exascale systems. Parallel I/O, the key technology for moving data between compute nodes and storage, faces monumental challenges from new application workflows as well as from the memory, interconnect, and storage architectures being considered in exascale system designs. As the storage hierarchy expands to include node-local persistent memory and burst buffers alongside disk- and tape-based storage, data movement among these layers must be efficient, and parallel I/O libraries of the future must be capable of handling file sizes of many terabytes and beyond. Easy-to-use interfaces to access and move data are required to alleviate the burden on scientific application developers and to improve their productivity. Exascale I/O systems must also be fault tolerant, handling failures of compute, network, memory, and storage components, given the sheer number of hardware components at these scales.

With the goal of addressing efficiency, fault tolerance, and other challenges posed by data management and parallel I/O on exascale architectures, we propose to develop new capabilities in HDF5, the most widely used library for performing parallel I/O in scientific applications on existing HPC systems at the leadership computing facilities (LCFs). Many of the proposed exascale applications and co-design centers require HDF5 for their I/O, and enhancing the HDF5 software to handle the unique challenges of exascale architectures will play an instrumental role in the success of the Exascale Computing Project (ECP).
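For context, the sketch below illustrates what parallel I/O with HDF5 typically looks like from an application's perspective: each MPI rank writes its own slice of a shared dataset through HDF5's MPI-IO driver using a collective transfer. This is a minimal illustration, not code from this project; the file name, dataset name, and array sizes are hypothetical, and error checking is omitted for brevity.

/* Minimal sketch: collective parallel write with HDF5 over MPI-IO.
 * File name, dataset name, and sizes are hypothetical. */
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank, nprocs;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Open the file collectively through the MPI-IO driver. */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);
    hid_t file = H5Fcreate("example.h5", H5F_ACC_TRUNC, H5P_DEFAULT, fapl);

    /* One row of 1024 doubles per rank in a shared 2-D dataset. */
    hsize_t dims[2] = {(hsize_t)nprocs, 1024};
    hid_t filespace = H5Screate_simple(2, dims, NULL);
    hid_t dset = H5Dcreate(file, "data", H5T_NATIVE_DOUBLE, filespace,
                           H5P_DEFAULT, H5P_DEFAULT, H5P_DEFAULT);

    /* Each rank selects its own row of the dataset. */
    hsize_t start[2] = {(hsize_t)rank, 0};
    hsize_t count[2] = {1, 1024};
    H5Sselect_hyperslab(filespace, H5S_SELECT_SET, start, NULL, count, NULL);
    hid_t memspace = H5Screate_simple(2, count, NULL);

    double buf[1024];
    for (int i = 0; i < 1024; i++) buf[i] = rank + 0.001 * i;

    /* Request collective transfer so the MPI-IO layer can aggregate writes. */
    hid_t dxpl = H5Pcreate(H5P_DATASET_XFER);
    H5Pset_dxpl_mpio(dxpl, H5FD_MPIO_COLLECTIVE);
    H5Dwrite(dset, H5T_NATIVE_DOUBLE, memspace, filespace, dxpl, buf);

    H5Pclose(dxpl);
    H5Sclose(memspace);
    H5Sclose(filespace);
    H5Dclose(dset);
    H5Pclose(fapl);
    H5Fclose(file);
    MPI_Finalize();
    return 0;
}

Requesting a collective transfer (H5FD_MPIO_COLLECTIVE) lets the underlying MPI-IO layer aggregate the per-rank writes, which is typically the starting point for parallel I/O tuning on large systems and one of the layers this project targets for exascale optimization.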

Scope of work

In this project, we will productize features and techniques prototyped in the ExaHDF5 and Fast Forward I/O projects, explore optimization strategies on upcoming architectures, maintain and optimize existing HDF5 features for ECP applications, and release these new features in HDF5 for broad deployment on HPC systems. Focusing on the challenges of exascale I/O, we will develop technologies that exploit the massively parallel storage hierarchies being built into pre-exascale systems. We will enhance the HDF5 software to achieve efficient parallel I/O on exascale systems in ways that will impact a large number of DOE science applications.

The ExaHDF5 team consists of researchers and engineers from three organizations: The HDF Group (THG), Lawrence Berkeley National Laboratory (LBNL), and Argonne National Laboratory (ANL). THG will release new HDF5 capabilities after rigorous software testing. The PIs of this project will work with ASCR and NNSA supercomputing facilities to make these new HDF5 releases available to scientific applications, and the project team will provide support and maintenance for issues related to software defects and performance. The ExaHDF5 team will collaborate with exascale application teams and software technology teams to achieve superior I/O performance on upcoming systems.

Previous ASCR-funded ExaHDF5 project



Funded by: the Exascale Computing Project (ECP), U.S. Department of Energy