Abstract
Hierarchical storage subsystems that include multiple layers of burst buffers (BB) and disk-based parallel file systems (PFS) are becoming an essential part of HPC systems to address the I/O performance gap. However, state-of-the-art software for managing these hierarchical storage subsystems, such as Cray DataWarp, requires users to move data among storage layers themselves. Such manual data movement can perform poorly because of resource contention: the I/O servers of a layer must serve data movement within the hierarchy as well as regular read/write requests. In this work, we propose a new system, called Data Elevator, for transparently and efficiently moving data in hierarchical storage. Users specify the final destination for their data, typically a PFS. The Data Elevator library intercepts the I/O calls, stages data on a fast persistent storage layer (for example, an SSD-based burst buffer), and then asynchronously transfers the data to the final destination in the background. Data Elevator reduces resource contention on BB servers by offloading data movement from a fixed number of BB server nodes to compute nodes, and the number of these compute nodes is configurable based on the data movement load. Data Elevator also enables optimizations such as overlapping read and write operations, choosing I/O modes, and aligning buffer boundaries. In our tests with large-scale scientific applications, Data Elevator is up to 4.2X faster than Cray DataWarp and 4X faster than writing data directly to the PFS.
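The following is a minimal Python sketch of the general stage-then-drain pattern described above. It is not Data Elevator's implementation or API. It assumes a single process in which one local directory stands in for the burst buffer and another stands in for the PFS. `StagingWriter`, `write`, `close`, and `num_movers` are hypothetical names chosen for illustration. Data Elevator itself intercepts I/O calls and runs its data movers on compute nodes; in this sketch, background threads play the mover role.

```python
import os
import queue
import shutil
import tempfile
import threading


class StagingWriter:
    """Hypothetical sketch: land writes on a fast staging layer, then move
    them to their final destination with a configurable pool of movers."""

    def __init__(self, staging_dir, num_movers=2):
        self.staging_dir = staging_dir
        self._queue = queue.Queue()
        # num_movers loosely mirrors the configurable number of mover nodes.
        self._movers = [
            threading.Thread(target=self._drain, daemon=True)
            for _ in range(num_movers)
        ]
        for t in self._movers:
            t.start()

    def write(self, final_path, data):
        # Stage the bytes on the fast layer under a unique name so that
        # files with the same basename cannot overwrite each other.
        fd, staged = tempfile.mkstemp(dir=self.staging_dir)
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        # Hand the move to the background; the caller does not wait for the slow layer.
        self._queue.put((staged, final_path))

    def _drain(self):
        while True:
            item = self._queue.get()
            try:
                if item is None:  # shutdown signal
                    return
                staged, final_path = item
                parent = os.path.dirname(final_path)
                if parent:
                    os.makedirs(parent, exist_ok=True)
                # Copies then deletes if the two layers are different filesystems.
                shutil.move(staged, final_path)
            finally:
                self._queue.task_done()

    def close(self):
        # Block until every staged file has been moved, then stop the movers.
        self._queue.join()
        for _ in self._movers:
            self._queue.put(None)
        for t in self._movers:
            t.join()
```

A caller uses `w.write(final_path, data)` and regains control as soon as the bytes reach the fast layer. `w.close()` blocks until every staged file has reached its destination. Raising `num_movers` adds concurrent movers. This follows the abstract's point that the amount of data-moving resources can be matched to the load, rather than being fixed by the number of storage servers.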

Publications

  • Bharti Wadhwa, Suren Byna, and Ali R. Butt, "Toward Transparent Data Management in Multi-layer Storage Hierarchy for HPC Systems", IEEE International Conference on Cloud Engineering (IC2E), 2018. [Preprint version]
  • Bin Dong, Suren Byna, Kesheng Wu, Prabhat, Hans Johansen, Jeffrey N. Johnson, and Noel Keen, "Data Elevator: Low-contention Data Movement in Hierarchical Storage System", The 23rd Annual IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), 2016. [Preprint version]

Software
Data Elevator: transparent data movement in hierarchical storage systems (works with HDF5)