Simulations are generating unprecedented amounts of data, driven by the rapidly increasing computational capabilities of leading computing resources. This presents two significant challenges. The first lies in hardware trends: the enormous increases in compute power are not being matched by corresponding increases in bandwidth to storage, and cost and power constraints limit the feasibility of dramatically larger storage deployments. The second lies in extracting knowledge from these volumes of data. Research in data management infrastructure has created capabilities that can assist in this process, but the available tools are not yet widely deployed or used. These are not just future challenges: they already cause bottlenecks that substantially impact the quality and productivity of scientific research performed on HPC machines.

Leaders:

[name not shown] (ORNL), [name not shown] (ANL)

Team Members:

[name not shown], ANL

[name not shown], Rutgers

[name not shown], ORNL

[name not shown], NCSU

[name not shown], GA Tech

[name not shown], LBNL

[name not shown], ANL

[name not shown], GA Tech

[name not shown], LBNL

Projects:

I/O Frameworks

In Situ Processing and Code Coupling

Indexing

In Situ Data Compression

Parallel I/O and File Formats

FastBit - Efficient Search Technology for Data Driven Science

John Wu, Arie Shoshani (LBNL)
Problem: Quickly find records satisfying user-specified conditions from a large, complex data set.
Example: High-energy physics data. From billions of events, find collision events with a given energy level and a specified number of tracks.
Solution: Developed new indexing techniques and a new compression method for the…

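
The kind of query described above (events with a given energy level and a specified number of tracks) can be illustrated with a toy bitmap index. This is only a sketch under stated assumptions: it does not use FastBit's API, it applies no compression, and the function names, bin edges, and event values are all hypothetical.

```python
from bisect import bisect_right


def build_binned_bitmaps(values, bin_edges):
    """One integer bitmap per bin; bit i is set when values[i] falls in that bin."""
    bitmaps = [0] * (len(bin_edges) + 1)
    for row, value in enumerate(values):
        # bisect_right places a value in the bin [edge[k-1], edge[k]).
        bitmaps[bisect_right(bin_edges, value)] |= 1 << row
    return bitmaps


def build_equality_bitmaps(values):
    """One integer bitmap per distinct value; bit i is set when values[i] equals it."""
    bitmaps = {}
    for row, value in enumerate(values):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row)
    return bitmaps


def bitmap_to_rows(bitmap):
    """List the row numbers whose bits are set."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


# Hypothetical event table: one energy and one track count per event.
energies = [12.0, 47.5, 88.1, 52.3, 91.7, 30.2]
tracks = [3, 5, 5, 2, 5, 5]

energy_bitmaps = build_binned_bitmaps(energies, bin_edges=[25.0, 50.0, 75.0])
track_bitmaps = build_equality_bitmaps(tracks)

# Query: energy >= 75.0 AND tracks == 5. The energy bound coincides with a
# bin edge, so the last bin answers it exactly; a bound that falls inside a
# bin would also need a check against the raw values.
matching = energy_bitmaps[3] & track_bitmaps.get(5, 0)
matching_rows = bitmap_to_rows(matching)
print(matching_rows)  # [2, 4]
```

Each condition is answered from its own index, and the conditions are combined with a single bitwise AND, so the raw event table is never scanned for this query.
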
Figure: FastQuery API

FastQuery: providing database capabilities for scientific files

K. Wu, S. Byna, A. Shoshani, in collaboration with the LBNL Vis group
Key Ideas: Provide uniform array interface for scientific data in…

Scaling Parallel I/O and Analysis to a Trillion Particles

Prabhat (PI), Suren Byna, Oliver Rubel, John Wu (LBNL)
Objectives: Ability to analyze very large datasets quickly to enhance scientific…

ADIOS Visualization Schema for VisIt

Dave Pugmire, Gary Liu, Scott Klasky (ORNL)
VisIt and ADIOS
VisIt: Large data parallel visualization tool. Rich set of visualization and…

15% More Accuracy in Seasonal Hurricane Forecasts through Comparative Climate Networks Analytics

Nagiza Samatova (NCSU/ORNL), Fred Semazzi (NCSU)
Objectives: Develop predictive forecasting methodology for climate extremes (e.g., hurricanes,…

Figure: Optimization

Accelerating Science Input/Output on Leadership Platforms

Rob Ross, ANL
Objectives: Standards-based Input/Output (I/O) interfaces are a cornerstone of DOE science codes. The ROMIO MPI-IO…

Figure: Visualization of data from SPECFEM3D. Simulation by J. Tromp (Princeton)

Visualization for Geo Sciences

Dave Pugmire, ORNL
VisIt and ADIOS
VisIt: Large data parallel visualization tool. Rich set of visualization and analysis plots and…

Visualization Support for Fusion

Dave Pugmire, ORNL
VisIt and ADIOS
VisIt: Large data parallel visualization tool. Rich set of visualization and analysis plots and…

Figure: DIY usage and library organization

DIY: Enabling large-scale data-parallel analysis

Tom Peterka, ANL
Main Ideas and Objectives: Decouple analysis technique (user) from data-intensive parallelism (DIY). Enable large-scale…

Figure: Conceptual overview of the in-memory data indexing and querying framework and its components

Scalable In-Memory Data Indexing and Querying for Scientific Simulation Workflows

Manish Parashar, Rutgers
Target Application: S3D combustion simulation (Jacqueline Chen and Hemanth Kolla, Sandia National Laboratories)…

Figure: In situ execution of simulation and visualization processes on a multi-core platform

In situ code coupling and analysis - an essential capability for advanced large scale simulations

Manish Parashar, Rutgers
Objectives: Provide tools for online and in situ data analytics (e.g., visualization, feature tracking). Enable…

Figure: Data automatically translated from full resolution (left) to reduced resolution (right) to fit within the limited available memory.

Facilitating In-Situ Analytics for Complex AMR-based Simulation Workflows

Manish Parashar, Rutgers
Objective: Manage dynamic data processing requirements at extreme scales using coordinated algorithm, middleware…

Figure: Histogram of I/O access sizes in a FLASH plot file

Darshan: Improving I/O performance for scientific applications

Robert Ross, ANL
Application: Darshan collects concise I/O access pattern information from large-scale applications.
Goal, for users: improve the…

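
A histogram of I/O access sizes, like the one the figure caption for this entry refers to, can be sketched as a simple bucketing step. This is a minimal illustration only: it does not read Darshan logs or call any Darshan tool, and the bucket boundaries, labels, and sample access sizes are assumptions made for the example.

```python
from bisect import bisect_right

# Illustrative bucket boundaries in bytes (an assumption for this sketch).
BUCKET_EDGES = [100, 1024, 10 * 1024, 100 * 1024, 1024 * 1024]
BUCKET_LABELS = ["0-100", "100-1K", "1K-10K", "10K-100K", "100K-1M", "1M+"]


def access_size_histogram(access_sizes):
    """Count how many accesses fall into each size bucket."""
    counts = [0] * len(BUCKET_LABELS)
    for size in access_sizes:
        # A size equal to a boundary is counted in the larger bucket.
        counts[bisect_right(BUCKET_EDGES, size)] += 1
    return dict(zip(BUCKET_LABELS, counts))


# Hypothetical sizes (in bytes) of individual write calls.
sizes = [8, 64, 512, 4096, 4096, 65536, 2 * 1024 * 1024]
histogram = access_size_histogram(sizes)
for label, count in histogram.items():
    print(f"{label:>9}: {count}")
```

A profile dominated by the smallest buckets usually suggests many tiny accesses, which is the sort of pattern such a summary is meant to make visible.
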
Big Data Means Big Issues for Exascale Visualization

DOE ASCR DISCOVERY: New Faces. Posted August 8, 2012.
When exascale computers begin calculating at a billion billion operations each second, gaining…

Figure: Supernova Simulation

I/O bottlenecks and analysis challenges faced by applications running on leadership systems

Visualization of a Type Ia supernova explosion from a FLASH simulation.
FLASH is a multi-scale, multi-physics code used in domains including…