
Optimized Parallel I/O for Big Data Analytics at a Trillion Particle Scale

Investigators: Suren Byna (LBNL), P. Carns (ANL)

Program Manager: Lucy Nowell

Scientific Achievement

Applying Big Data analytics algorithms to the tens of terabytes produced by trillion-particle-scale cosmology and plasma physics simulations requires high-performance data read/write (I/O) functions that keep the time spent in I/O as low as possible. LBNL researchers developed I/O methods and optimizations to minimize that I/O time.

Clusters identified in a 1.4-trillion-particle space weather simulation: Spatial distribution of clusters identified by the clustering algorithm developed in this work. The high-density clusters are mainly localized within the current sheet and appear as narrow structures elongated along the direction of the local magnetic field. Scientists interpret this to mean that the particles comprising the clusters were accelerated in a process where they gained a fixed amount of energy in a relatively narrow region of space. This observation helped the scientists understand principles of plasma interactions in space weather. (Image Credit: V. Roytershteyn, LANL/SSI)

Significance and Impact

Researchers achieved near-peak I/O performance on NERSC file systems, enabling the DBSCAN and k-nearest neighbor algorithms to scale to 100,000 cores and leading to first-of-a-kind data analysis and visualizations that helped scientists understand principles of space weather.

Research Details