Prabhat (PI), Suren Byna, Oliver Rubel, John Wu, LBNL

Objectives

Ability to analyze very large datasets quickly to enhance scientific understanding and discovery
Enhance I/O performance of HDF5, a popular file format used by many application domains
Demonstrate these capabilities on a trillion-particle plasma physics simulation

Accomplishments

Trillion-particle plasma physics simulation conducted on 120,000 cores at NERSC
Enhanced parallel HDF5 reached a peak of 35 GB/s and an 80% sustained I/O rate
FastBit was used to index a 30 TB timestep in 10 minutes and query it in 3 seconds

Impact

Software enabled scientists to search and gain insights from the trillion particle dataset for the first time:
- Confinement of energetic particles by the flux ropes
- Asymmetric distribution of particles near the reconnection hot-spot

[Figure: Wu.LBNL.DM.trillion.particles-fig1]

Magnetic reconnection from a plasma physics simulation (Left). Scientists were able to query and find an asymmetric distribution of particles near the reconnection event (Right) using our software tools.

Surendra Byna et al., "Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation". Supercomputing conference, SC’12, November 2012.

Notes:

The slide highlights recent accomplishments from the ExaHDF5 project, carried out in collaboration with SDAV staff.

1) Parallel I/O with HDF5
We ran a trillion-particle simulation on 120K cores on Hopper. The code produced 30 TB of particle data *per timestep*. To the best of our knowledge, this is the first time that anyone has demonstrated writes to a single, shared 30 TB HDF5 file.
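Writing to a single shared file requires each process to know where its data belongs. The notes do not describe the layout that was used, so the following is only a minimal sketch under an assumed scheme: each rank writes one contiguous slab of a 1-D dataset, with its start offset given by an exclusive prefix sum of the per-rank particle counts. The function name `slab_offsets` and the example counts are hypothetical.

```python
from itertools import accumulate

def slab_offsets(counts):
    """Return (start, stop) element ranges for each rank's contiguous slab.

    counts[r] is the number of particles held by rank r. The start offset
    of rank r is the exclusive prefix sum of counts[:r]. This layout is an
    assumption for illustration, not the layout confirmed by the source.
    """
    stops = list(accumulate(counts))           # inclusive prefix sums
    starts = [0] + stops[:-1]                  # exclusive prefix sums
    return list(zip(starts, stops))

# Hypothetical example: four ranks with uneven particle counts.
ranges = slab_offsets([5, 3, 0, 4])
total_particles = ranges[-1][1]
```

In an MPI program the same offsets would typically come from a parallel scan across ranks rather than a list held in one process.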

We hit peak I/O rates on Hopper (~35 GB/s) for brief intervals during the run and averaged ~23 GB/s, which is a new record for parallel HDF5 performance.
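To put these rates in context, here is a back-of-the-envelope estimate of the write time for one 30 TB timestep. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes), which the notes do not specify, and the helper name `write_time_seconds` is illustrative.

```python
TB = 1e12  # assumed decimal terabyte
GB = 1e9   # assumed decimal gigabyte

def write_time_seconds(size_bytes, rate_bytes_per_s):
    """Time to write size_bytes at a constant rate, ignoring all overheads."""
    return size_bytes / rate_bytes_per_s

timestep_bytes = 30 * TB
t_average = write_time_seconds(timestep_bytes, 23 * GB)  # at the ~23 GB/s average
t_peak = write_time_seconds(timestep_bytes, 35 * GB)     # if ~35 GB/s were sustained

print(f"average rate: {t_average / 60:.1f} min per timestep")
print(f"peak rate:    {t_peak / 60:.1f} min per timestep")
```

Under these assumptions one timestep takes roughly 22 minutes to write at the average rate, and about 14 minutes if the peak rate could be held for the whole write.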

2) FastBit-based analysis

We developed a novel hybrid parallel version of FastBit to index and query the dataset.
This was the first time that we used FastBit and FastQuery to index and query a dataset with a trillion entries.
We were able to index the dataset in 10 minutes and query it in 3 seconds.
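FastBit is a bitmap indexing library. The toy sketch below illustrates the general idea of a binned bitmap index answering a range query such as "energy > threshold"; it is not FastBit's API or its actual implementation, and all names, bin edges, and values are made up for illustration. Bins that lie entirely above the threshold are answered from the bitmaps alone, and only the bin that straddles the threshold needs a check against the raw values.

```python
from bisect import bisect_right

def build_binned_index(values, edges):
    """Build one bitmap (a Python int) per bin; bit i is set if row i is in the bin.

    Bin 0 holds v < edges[0]; bin k holds edges[k-1] <= v < edges[k];
    the last bin holds v >= edges[-1].
    """
    bitmaps = [0] * (len(edges) + 1)
    for i, v in enumerate(values):
        bitmaps[bisect_right(edges, v)] |= 1 << i
    return bitmaps

def query_greater_than(values, edges, bitmaps, threshold):
    """Return sorted row ids whose value is strictly greater than threshold."""
    b = bisect_right(edges, threshold)   # bin that contains the threshold
    sure_hits = 0
    for bitmap in bitmaps[b + 1:]:       # bins entirely above the threshold
        sure_hits |= bitmap
    candidates = bitmaps[b]              # straddling bin: needs a raw-value check
    return [
        i for i in range(len(values))
        if (sure_hits >> i) & 1
        or ((candidates >> i) & 1 and values[i] > threshold)
    ]

# Hypothetical particle energies and bin edges.
energies = [0.2, 1.7, 3.1, 0.9, 2.5, 1.1]
edges = [1.0, 2.0, 3.0]
index = build_binned_index(energies, edges)
energetic_ids = query_greater_than(energies, edges, index, 1.5)
```

The index is built once and reused across queries, which is why indexing time and query time are reported separately.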

3) Scientific insights

This is the first time that our science collaborators have been able to examine the trillion-particle dataset. Previously, they had largely ignored the particle data or looked only at a coarse-grained version.
Our collaborators had made a number of conjectures and hypotheses regarding the interplay between particles and the magnetic fields, and the multi-dimensional phase-space distribution of particles. Using these new tools, they were able to confirm these hypotheses quantitatively. More specifically, the scientists found:
- a preferential acceleration of particles in a direction parallel to the magnetic field
- energetic particles carrying a significant amount of current, even at early timesteps in the simulation
- predominant distribution of energetic particles in the current sheet, suggesting that flux ropes can confine these particles
- agyrotropic (asymmetric) distribution of particles near the magnetic reconnection event
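The notes do not say how the parallel acceleration was measured. As a minimal sketch of the underlying quantity, a particle's velocity can be split into components parallel and perpendicular to the local magnetic field using v_par = (v · B) / |B|. The function name and the sample vectors below are hypothetical.

```python
import math

def parallel_perp_speed(v, b):
    """Split velocity v into components relative to the magnetic field b.

    Returns (v_par, v_perp): v_par is the signed component along b,
    v_perp is the magnitude of the component perpendicular to b.
    """
    b_mag = math.sqrt(sum(c * c for c in b))
    if b_mag == 0.0:
        raise ValueError("magnetic field vector must be non-zero")
    v_par = sum(vi * bi for vi, bi in zip(v, b)) / b_mag
    v_sq = sum(c * c for c in v)
    # Guard against tiny negative values from floating-point rounding.
    v_perp = math.sqrt(max(v_sq - v_par * v_par, 0.0))
    return v_par, v_perp

# Hypothetical particle: field along z, velocity mostly along z.
v_par, v_perp = parallel_perp_speed((1.0, 2.0, 2.0), (0.0, 0.0, 1.0))
```

Comparing the two components across many particles is one way a preference for field-aligned motion could be quantified.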

DOE researchers: Prabhat (PI), Suren Byna, Oliver Rubel and John Wu (LBNL)
Scientific collaborators: Homa Karimabadi (UCSD), Vadim Roytershteyn (UCSD) and Bill Daughton (LANL)
Simulation code used in the study is VPIC, developed at LANL.

Scaling Parallel I/O and Analysis to a Trillion Particles
