
Manish Parashar, Rutgers

Objectives

  • Provide tools for online and in-situ data analytics
    • E.g. visualization, feature tracking
  • Enable integrated and coupled multi-physics simulations
    • E.g. integrated climate modeling, fusion simulation, subsurface modeling, material science workflows

Impact

  • Enable in-situ execution of coupled scientific workflows
  • Enable coupled simulation / data analytics / data processing workflows composed as a DAG of tasks
  • The DataSpaces tool provides a shared-space programming abstraction to coordinate data sharing for in-memory code coupling (see the sketch after this list)
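
As an illustration of the shared-space coupling pattern, the sketch below shows a producer publishing a named array and a consumer retrieving it with no file I/O in between. This is not the actual DataSpaces API: the space_put / space_get names and the single in-process buffer are hypothetical placeholders for illustration only.

    /* Shared-space coupling pattern (illustrative only).
     * space_put / space_get are hypothetical placeholders, not the DataSpaces API;
     * a real shared space is backed by distributed staging memory and indexed by
     * variable name plus a multi-dimensional region, not a single static buffer. */
    #include <stdio.h>
    #include <string.h>

    #define N 8

    static double shared_space[N];   /* stand-in for the staging area */

    static void space_put(const char *var, const double *data, int n) {
        (void)var;                   /* real systems key on name + index range */
        memcpy(shared_space, data, (size_t)n * sizeof(double));
    }

    static void space_get(const char *var, double *data, int n) {
        (void)var;
        memcpy(data, shared_space, (size_t)n * sizeof(double));
    }

    int main(void) {
        double produced[N], consumed[N];
        for (int i = 0; i < N; i++) produced[i] = 0.1 * i;  /* simulation output */

        space_put("temperature", produced, N);   /* producer (simulation) side */
        space_get("temperature", consumed, N);   /* consumer (analytics) side */

        printf("consumed[3] = %.1f\n", consumed[3]);
        return 0;
    }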

Accomplishments

  • Two workflow scenarios evaluated on a Cray XT5
  • Significant savings in the amount of data transferred over the network by co-locating data producers and consumers on the same processor
  • Data transfer time (and energy) decreased because much of the coupled data is retrieved directly from on-processor shared memory (see the sketch after this list)
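
A minimal sketch of the co-location idea, assuming POSIX shared memory and a producer and consumer placed on the same node (both roles shown in one process, segment name hypothetical, error handling abbreviated): the consumer maps the same segment the producer filled, so the coupled field is read from on-node memory rather than pulled over the network.

    /* Co-located producer/consumer sharing a field through on-node memory.
     * Assumes POSIX shared memory (link with -lrt on older Linux systems). */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    #define N 8
    #define SEG "/coupled_field"     /* hypothetical segment name */

    int main(void) {
        size_t bytes = N * sizeof(double);

        /* Producer: create the segment and publish one field. */
        int fd = shm_open(SEG, O_CREAT | O_RDWR, 0600);
        if (fd < 0 || ftruncate(fd, (off_t)bytes) != 0) return 1;
        double *out = mmap(NULL, bytes, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (out == MAP_FAILED) return 1;
        for (int i = 0; i < N; i++) out[i] = 1.0 * i;

        /* Consumer (co-located): attach to the same segment and read in place. */
        double *in = mmap(NULL, bytes, PROT_READ, MAP_SHARED, fd, 0);
        if (in == MAP_FAILED) return 1;
        printf("in[5] = %.1f\n", in[5]);

        munmap(out, bytes);
        munmap(in, bytes);
        close(fd);
        shm_unlink(SEG);
        return 0;
    }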

Publication

F. Zhang, C. Docan, M. Parashar, S. Klasky, N. Podhorszki, H. Abbasi, "Enabling In-situ Execution of Coupled Scientific Workflow on Multi-core Platform," IPDPS 2012.

 

In-situ execution of simulation and visualization processes on a multi-core platform

Notes:

Motivation:

  • Emerging scientific workflows are composed of heterogeneous coupled component applications that interact and exchange significant volumes of data at runtime
  • On-chip data sharing is much cheaper than off-chip data transfers
    • Large volumes of data movement over communication fabrics → contention, latency, and energy consumption
  • High-end systems have increasing core counts per compute node
    • E.g., Cray XK6 (Titan) -- 16 cores per AMD processor; IBM BG/Q (Mira and Sequoia) -- 17 cores per processor


Problems with the Traditional, Disk-Based Approach to Data Sharing:

  • Increasing performance gap between computation and disk I/O speeds; I/O becomes the bottleneck
  • Couplers can become the bottleneck limiting scalability
  • Larger data-sharing latency, since data is moved twice (producer to disk, then disk to consumer; see the sketch below)
  • Large volumes of network data movement → increasing costs in terms of time and energy!
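
For contrast, a minimal sketch of the traditional file-based coupling path (file name hypothetical, error handling abbreviated): each shared field is written to disk by the producer and read back by the consumer, so it crosses the I/O system twice before any analysis happens.

    /* Traditional disk-based coupling: the field makes two trips through I/O. */
    #include <stdio.h>

    #define N 8

    int main(void) {
        double produced[N], consumed[N];
        for (int i = 0; i < N; i++) produced[i] = 2.0 * i;

        /* Hop 1: producer -> disk */
        FILE *f = fopen("coupled_field.bin", "wb");
        if (!f) return 1;
        fwrite(produced, sizeof(double), N, f);
        fclose(f);

        /* Hop 2: disk -> consumer */
        f = fopen("coupled_field.bin", "rb");
        if (!f || fread(consumed, sizeof(double), N, f) != N) return 1;
        fclose(f);

        printf("consumed[2] = %.1f\n", consumed[2]);
        return 0;
    }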