Prabhat (PI), Suren Byna, Oliver Rubel, John Wu, LBNL

Objectives

  • Ability to analyze very large datasets quickly to enhance scientific understanding and discovery
  • Enhance I/O on HDF5, a popular file format used by many application domains
  • Demonstrate these capabilities on a trillion-particle plasma physics simulation

Accomplishments

  • Trillion-particle plasma physics simulation conducted on 120,000 cores at NERSC
  • Enhanced parallel HDF5 reached a peak I/O rate of 35 GB/s and an 80% sustained I/O rate
  • FastBit indexed a 30 TB timestep in 10 minutes and answered queries in 3 seconds

Impact

  • Software enabled scientists to search and gain insights from the trillion particle dataset for the first time:
    • Confinement of energetic particles by the flux ropes
    • Asymmetric distribution of particles near the reconnection hot-spot

Magnetic reconnection from a plasma physics simulation (Left). Scientists were able to query and find an asymmetric distribution of particles near the reconnection event (Right) using our software tools.


Surendra Byna et al., "Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation," Supercomputing Conference (SC’12), November 2012.

Notes:

The slide highlights recent accomplishments from the ExaHDF5 project, carried out in collaboration with SDAV staff.

1) Parallel I/O with HDF5
We ran a trillion-particle simulation on 120K cores of Hopper. The code produced 30 TB of particle data *per timestep*. To the best of our knowledge, this is the first time that anyone has demonstrated writes to a single, shared 30 TB HDF5 file.

  • We hit the peak I/O rate on Hopper (~35 GB/s) for brief intervals during the run and averaged ~23 GB/s, which is a new record for parallel HDF5 performance
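
As a rough back-of-envelope check on the figures above, the sketch below estimates how long one 30 TB timestep takes to write at the quoted peak and average rates. It assumes decimal units (1 TB = 10^12 bytes, 1 GB = 10^9 bytes) and exactly 10^12 particles; the function and constant names are illustrative only and do not come from the ExaHDF5 code.

```python
# Back-of-envelope estimate. Decimal units are an assumption, not stated in the source.
TB = 1e12  # bytes
GB = 1e9   # bytes


def write_time_seconds(size_bytes, rate_bytes_per_s):
    """Time needed to write size_bytes at a constant rate."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


TIMESTEP_BYTES = 30 * TB   # particle data per timestep
N_PARTICLES = 1e12         # "trillion particle", taken here as exactly 10^12
PEAK_RATE = 35 * GB        # approximate peak rate, bytes per second
AVERAGE_RATE = 23 * GB     # approximate average rate, bytes per second

if __name__ == "__main__":
    for label, rate in (("peak", PEAK_RATE), ("average", AVERAGE_RATE)):
        minutes = write_time_seconds(TIMESTEP_BYTES, rate) / 60
        print(f"{label:>7}: {minutes:.1f} min per 30 TB timestep")
    print(f"bytes per particle: {TIMESTEP_BYTES / N_PARTICLES:.0f}")
```

Under these assumptions, one timestep takes roughly 14 minutes to write at the peak rate and roughly 22 minutes at the average rate.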


2) FastBit-based analysis

  • We developed a novel hybrid parallel version of FastBit to do the indexing/querying on the dataset
  • This was the first time that we used FastBit and FastQuery to index and query a dataset with a trillion entries
  • We were able to index the dataset in 10 minutes and query the dataset in 3 seconds
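
FastBit is built on bitmap indexes. The toy sketch below illustrates the general idea of a binned bitmap index on a small synthetic array. It is not FastBit's or FastQuery's API, it omits bitmap compression and all parallelism, and every name in it (`build_bitmap_index`, `query_at_least`, the `energy` values, the bin edges) is hypothetical.

```python
import random
from bisect import bisect_right


def build_bitmap_index(values, bin_edges):
    """One bitmap (a Python int) per bin; bit i is set when values[i] lies in that bin.

    Bin b covers the half-open interval [bin_edges[b], bin_edges[b + 1]).
    """
    n_bins = len(bin_edges) - 1
    bitmaps = [0] * n_bins
    for i, v in enumerate(values):
        if not (bin_edges[0] <= v < bin_edges[-1]):
            raise ValueError(f"value {v!r} is outside the binned range")
        b = bisect_right(bin_edges, v) - 1
        bitmaps[b] |= 1 << i
    return bitmaps


def bitmap_to_indices(bitmap):
    """Positions of the set bits, in ascending order."""
    indices = []
    while bitmap:
        low = bitmap & -bitmap            # isolate the lowest set bit
        indices.append(low.bit_length() - 1)
        bitmap ^= low                     # clear that bit
    return indices


def query_at_least(values, bin_edges, bitmaps, threshold):
    """Bitmap of the rows i with values[i] >= threshold."""
    hits = 0
    for b, bitmap in enumerate(bitmaps):
        lo, hi = bin_edges[b], bin_edges[b + 1]
        if lo >= threshold:
            hits |= bitmap                # whole bin qualifies: index only
        elif hi > threshold:
            # Boundary bin: the index cannot decide, so check the raw values.
            for i in bitmap_to_indices(bitmap):
                if values[i] >= threshold:
                    hits |= 1 << i
        # Bins with hi <= threshold contain no hits and are skipped.
    return hits


if __name__ == "__main__":
    random.seed(0)
    energy = [random.random() for _ in range(1000)]  # synthetic stand-in data
    edges = [k / 10 for k in range(11)]              # 10 equal-width bins on [0, 1)
    index = build_bitmap_index(energy, edges)
    hits = query_at_least(energy, edges, index, 0.95)
    print(len(bitmap_to_indices(hits)), "rows with energy >= 0.95")
```

In this sketch, most bins are resolved from the index alone with bitwise ORs, and only the single bin that straddles the threshold requires a look at the raw values.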


3) Scientific insights

  • This is the first time that our science collaborators have been able to examine the trillion-particle dataset. Previously, they had largely ignored the particle data or looked only at a coarse-grained version
  • Our collaborators had made a number of conjectures and hypotheses regarding the interplay between particles and the magnetic fields and the multi-dimensional phase-space distribution of particles. Using these new tools, they were able to confirm these hypotheses quantitatively. More specifically, the scientists found:
    • a preferential acceleration of particles in a direction parallel to the magnetic field
    • energetic particles carrying a significant amount of current, even at early timesteps in the simulation
    • a predominant distribution of energetic particles in the current sheet, suggesting that flux ropes can confine these particles
    • agyrotropic (asymmetric) distribution of particles near the magnetic reconnection event


DOE researchers: Prabhat (PI), Suren Byna, Oliver Rubel and John Wu (LBNL)
Scientific collaborators: Homa Karimabadi (UCSD), Vadim Roytershteyn (UCSD) and Bill Daughton (LANL)
The simulation code used in the study is VPIC, developed at LANL.