Proactive Data Containers

Abstract
Efficiently reading data from and writing data to storage systems is necessary for most scientific simulations to achieve good performance at scale. Many software solutions have been developed to alleviate the I/O bottleneck. One well-known strategy, in the context of collective I/O operations, is the two-phase I/O scheme. This strategy consists of selecting a subset of processes to aggregate contiguous pieces of data before performing reads/writes. In this paper, we present TAPIOCA, an MPI-based library implementing an efficient topology-aware two-phase I/O algorithm. We show how TAPIOCA can take advantage of double-buffering and one-sided communication to minimize idle time during data aggregation. We also introduce our cost model, which leads to a topology-aware aggregator placement that optimizes data movement. We validate our approach at large scale on two leadership-class supercomputers: Mira (IBM BG/Q) and Theta (Cray XC40). We present the results obtained with TAPIOCA on a micro-benchmark and on the I/O kernel of a large-scale simulation. On both architectures, we show a substantial improvement in I/O performance compared with the default MPI I/O implementation. On BG/Q+GPFS, for instance, our algorithm improves performance by a factor of twelve, while on the Cray XC40 system with a Lustre filesystem, we achieve an improvement by a factor of four.
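To make the two-phase scheme described in the abstract concrete, the following is a minimal, MPI-free sketch in Python. It is not TAPIOCA's implementation or API. The function names, the one-dimensional "hop distance" between ranks, and the toy cost (bytes moved times hops) are all assumptions made for illustration only, and the sketch further assumes that each group of ranks holds contiguous file data.

```python
from typing import Dict, List, Tuple


def choose_aggregator(ranks: List[int], sizes: Dict[int, int], io_node: int) -> int:
    """Pick the rank with the lowest toy cost (hypothetical model).

    Cost = bytes gathered from each rank times its hop distance to the
    candidate, plus all gathered bytes times the candidate's distance to
    the I/O node. Distance is abs(rank difference): an assumed 1D topology.
    """
    def cost(candidate: int) -> int:
        gather = sum(sizes[r] * abs(r - candidate) for r in ranks)
        flush = sum(sizes[r] for r in ranks) * abs(candidate - io_node)
        return gather + flush

    return min(ranks, key=cost)


def two_phase_write(
    chunks: Dict[int, Tuple[int, bytes]], group_size: int, io_node: int
) -> Tuple[bytearray, List[int]]:
    """Simulate a two-phase write.

    chunks maps rank -> (file offset, data).
    Returns the resulting file image and the aggregator chosen per group.
    """
    ranks = sorted(chunks)
    total = max(off + len(data) for off, data in chunks.values())
    file_image = bytearray(total)
    aggregators: List[int] = []

    for start in range(0, len(ranks), group_size):
        group = ranks[start:start + group_size]
        sizes = {r: len(chunks[r][1]) for r in group}
        aggregators.append(choose_aggregator(group, sizes, io_node))

        # Phase 1 (aggregation): gather the group's pieces in offset order
        # into one contiguous buffer held by the aggregator.
        pieces = sorted(chunks[r] for r in group)
        base = pieces[0][0]
        buffer = bytearray()
        for off, data in pieces:
            if off != base + len(buffer):
                raise ValueError("sketch assumes contiguous data within a group")
            buffer += data

        # Phase 2 (I/O): the aggregator issues a single large write.
        file_image[base:base + len(buffer)] = buffer

    return file_image, aggregators
```

For example, with four ranks that each hold two bytes and groups of two ranks, `two_phase_write({0: (0, b"aa"), 1: (2, b"bb"), 2: (4, b"cc"), 3: (6, b"dd")}, group_size=2, io_node=3)` reassembles the file as `aabbccdd` and, under this toy cost, picks the rank nearest the I/O node in each group as aggregator. A real implementation would exchange data with MPI communication and use a cost model based on the machine's actual network topology.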

Publications

François Tessier, Venkat Vishwanath, and Emmanuel Jeannot, "TAPIOCA: An I/O library for optimized topology-aware data aggregation on large-scale supercomputers", The IEEE Cluster Conference 2017
François Tessier, Preeti Malakar, Venkatram Vishwanath, Emmanuel Jeannot and Florin Isaila, "Topology-Aware Data Aggregation for Intensive I/O on Large-Scale Supercomputers", 1st Workshop on Optimization of Communication in HPC runtime systems (IEEE COM-HPC16), held in conjunction with ACM/IEEE SuperComputing'16 Conference

TAPIOCA for optimized topology-aware data aggregation

Contact: