
Combining In-situ and In-transit Processing to Enable Extreme-Scale Scientific Analysis

Janine C. Bennett, Hasan Abbasi, Peer-Timo Bremer, Ray Grout, Attila Gyulassy, Tong Jin, Scott Klasky, Hemanth Kolla, Manish Parashar, Valerio Pascucci, Philippe Pebay, David Thompson, Hongfeng Yu, Fan Zhang, and Jacqueline Chen

Sandia National Laboratories, Oak Ridge National Laboratory, Lawrence Livermore National Laboratory, National Renewable Energy Laboratory, University of Utah, Rutgers University, Kitware

Abstract

With the onset of extreme-scale computing, I/O

constraints make it increasingly difficult for scientists to save

a sufficient amount of raw simulation data to persistent storage.

One potential solution is to change the data analysis pipeline from

a post-process centric to a concurrent approach based on either

in-situ or in-transit processing. In this context computations are

considered in-situ if they utilize the primary compute resources,

while in-transit processing refers to offloading computations to

a set of secondary resources using asynchronous data transfers.

In this paper we explore the design and implementation of three

common analysis techniques typically performed on large-scale

scientific simulations: topological analysis, descriptive statistics,

and visualization. We summarize algorithmic developments,

describe a resource scheduling system to coordinate the execution of

various analysis workflows, and discuss our implementation using

the DataSpaces and ADIOS frameworks that support efficient

data movement between in-situ and in-transit computations. We

demonstrate the efficiency of our lightweight, flexible framework

by deploying it on the Jaguar XK6 to analyze data generated

by S3D, a massively parallel turbulent combustion code. Our

framework allows scientists dealing with the data deluge at

extreme scale to perform analyses at increased temporal resolutions,

mitigate I/O costs, and significantly improve the time to insight.

I. INTRODUCTION

While the steady increase in available computing resources

enables ever more detailed and sophisticated simulations, I/O

constraints are beginning to impede their scientific impact.

Even though the time scales resolved by modern simulations

continue to decrease, the interval between time steps saved to

disk typically increases. For example, turbulent combustion

direct numerical simulations (DNS) currently resolve

intermittent phenomena that occur on the order of 10 simulation

timesteps (e.g., the creation of ignition kernels). However, in

order to maintain I/O overheads at a reasonable level, typically

only every 400th timestep is saved to persistent storage for

post-processing and, as a result, the data pertaining to these

intermittent phenomena is lost. Figure 1 illustrates such subtle

vortical structures identified in a large and complex flow field

of turbulent combustion. This problem is widely predicted to

become even more pressing on future architectures, motivating

a fundamental shift away from a post-process centric data

analysis paradigm.
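The figures quoted above (features lasting on the order of 10 timesteps, snapshots written every 400th timestep) can be turned into a rough estimate of how often such a feature is seen at all in post-processing. The following is only an illustrative sketch: it assumes, beyond what the text states, that a feature's first timestep is uniformly distributed relative to the save schedule, and the function names are hypothetical.

```python
from fractions import Fraction


def capture_probability(feature_lifetime: int, save_stride: int) -> Fraction:
    """Chance that a feature alive for `feature_lifetime` consecutive
    timesteps appears in at least one saved snapshot, when every
    `save_stride`-th timestep is written to disk.

    Assumption (not from the paper): the feature's start step is
    uniformly random relative to the save schedule.
    """
    # A window of L consecutive steps contains a multiple of S with
    # probability min(L, S) / S.
    return Fraction(min(feature_lifetime, save_stride), save_stride)


def brute_force(feature_lifetime: int, save_stride: int) -> Fraction:
    """Check the closed form by enumerating every possible start phase."""
    hits = sum(
        any((start + k) % save_stride == 0 for k in range(feature_lifetime))
        for start in range(save_stride)
    )
    return Fraction(hits, save_stride)


if __name__ == "__main__":
    p = capture_probability(10, 400)
    print(p, float(p))  # prints: 1/40 0.025
```

Under this assumption, only about one in forty such short-lived features would appear in even a single saved snapshot, and none could be tracked across snapshots.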

Fig. 1: Top: A small vortical structure in a turbulent flow

field is highlighted in the red box. Bottom: The highlighted

structure is tracked over time (left 5 images). The right-most

image shows the overlap between the 1st and 5th images.

Such connectivity indicators are lost with conventional

post-processing when the temporal length-scale of features is

shorter than the interval at which data is written to disk.

One promising direction is to move towards a concurrent

analysis framework in which raw simulation data is processed

as it is computed, decoupling the analysis from file I/O. The

two most commonly considered variants are in-situ and

in-transit processing. Both are based on the idea of performing

analyses as the simulation is running, storing only the results,

which are usually several orders of magnitude smaller than

the original, and thus mitigating the effects of limited disk

bandwidth or capacity. Their difference lies in how and where

the computation is performed. In-situ analysis typically shares

the primary simulation compute resources. In contrast, when

analyses are performed in-transit, some or all of the data is

transferred to different processors, either on the same machine

or on different computing resources altogether.

SC12, November 10-16, 2012, Salt Lake City, Utah, USA. 978-1-4673-0806-9/12/$31.00 © 2012 IEEE

Both of these approaches have inherent advantages and

disadvantages. In principle, in-transit analysis minimally impacts

the scientific simulation. By using asynchronous data transfers

to offload computations to secondary resources, the simulation

can resume operation much more quickly than if it were to wait

for a set of in-situ analyses to complete. However, in practice,

transferring even a subset of the raw data over the network

may become prohibitive, and furthermore, the memory and/or

computing capabilities of the secondary resources can quickly

be surpassed. In-situ analyses are not faced with the same

resource limitations because the entirety of the simulation data

is locally available. However, scientists will typically tolerate

only a minimal impact on simulation performance, which

places significant restrictions on the analysis. First, simulations

are often memory bound and thus all analyses must operate

within a very limited amount of scratch space. Second, the

analysis is usually allotted only a short time window to

execute. The latter restriction is particularly challenging as

many data analysis algorithms are global in nature and few

are capable of scaling satisfactorily.

To address these challenges, this paper proposes a hybrid

approach based on decomposing analysis algorithms into two

stages: a highly efficient and massively parallel in-situ stage,

and a small-scale parallel or serial in-transit stage connected

via a transparent data staging framework. The key insight is

that many analysis algorithms can be formulated to perform

various amounts of filtering and aggregation, resulting in a set

of intermediate data that is often orders of magnitude smaller

than the raw simulation output. By asynchronously transferring

these partial results, we are able to both minimize simulation

impact and reduce in-transit data transfer costs. We demonstrate

our framework using three common post-processing

tasks: visualization, descriptive statistical summaries, and a

sophisticated topological analysis. All three approaches perform

an entirely local set of in-situ computations and transfer their

intermediate results asynchronously to a staging area where

computations are completed in-transit. The staging

framework automatically pipelines in-transit computations using

different processes for successive time steps via a pull-based

scheduling model to manage execution heterogeneity. This

almost entirely decouples the time necessary to complete the

analysis of a time step from the time required to advance the

simulation. In particular, we demonstrate how our framework

enables analysis and visualization of a large-scale combustion

simulation at temporal frequencies infeasible for traditional

post-processing approaches, while minimizing impact on the

primary simulation. Our contributions in detail are:

• A new formulation of three common analysis approaches into a massively parallel in-situ and a small-scale or serial in-transit stage;

• A flexible data staging and coordination framework to transparently transfer intermediate data from the primary to a set of secondary computing resources;

• A temporally multiplexed approach to decouple the performance of the analysis from that of the simulation; and

• A case study demonstrating a wide range of analyses applied to a large-scale turbulent combustion simulation at unprecedented temporal frequencies.
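To make the two-stage decomposition concrete, the sketch below applies it to descriptive statistics: each simulation rank reduces its local block to a small partial summary (the in-situ stage), and a single process merges those summaries into global results (the in-transit stage). This is an illustration under stated assumptions, not the authors' implementation: it uses the well-known single-pass (Welford) update and pairwise merge formulas for mean and variance, and the names `Partial`, `in_situ_summarize`, `merge`, and `in_transit_finalize` are hypothetical. Data movement between the stages is omitted.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable, Tuple


@dataclass(frozen=True)
class Partial:
    """Per-rank summary: sample count, mean, and sum of squared deviations."""
    n: int
    mean: float
    m2: float


def in_situ_summarize(block: Iterable[float]) -> Partial:
    """In-situ stage: one local pass over a rank's block (Welford update)."""
    n, mean, m2 = 0, 0.0, 0.0
    for x in block:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
    return Partial(n, mean, m2)


def merge(a: Partial, b: Partial) -> Partial:
    """Combine two partial summaries without revisiting the raw data."""
    n = a.n + b.n
    if n == 0:
        return Partial(0, 0.0, 0.0)
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Partial(n, mean, m2)


def in_transit_finalize(partials: Iterable[Partial]) -> Tuple[int, float, float]:
    """In-transit stage: merge all partials, return (count, mean, variance)."""
    total = reduce(merge, partials, Partial(0, 0.0, 0.0))
    variance = total.m2 / (total.n - 1) if total.n > 1 else 0.0
    return total.n, total.mean, variance


if __name__ == "__main__":
    # Three hypothetical ranks, each holding a block of one scalar field.
    blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0]]
    partials = [in_situ_summarize(b) for b in blocks]  # three numbers per rank
    print(in_transit_finalize(partials))
```

Only three numbers per rank cross the network regardless of block size, which is the kind of reduction in intermediate data that the hybrid approach relies on.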

Overall, our framework represents a crucial first step towards

a practical approach for the concurrent analysis of massively

parallel simulations. Our approach is flexible, extensible to a

wide range of analysis algorithms, applicable to virtually all

high performance computing environments, and promises to

significantly improve the time to insight for modern scientific

simulations.
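The pull-based scheduling model mentioned above can be illustrated with a small thread-based sketch in which idle workers ask a shared queue for the next time step's analysis, so that slow and fast tasks balance themselves without a central assignment. This is a simplified stand-in, assuming all tasks are queued up front and using `time.sleep` in place of real analysis; `run_staging` and the task tuples are hypothetical names, and this is not the DataSpaces or ADIOS API.

```python
import queue
import threading
import time
from typing import List, Tuple


def run_staging(tasks: List[Tuple[int, float]], num_workers: int = 3) -> List[Tuple[int, int]]:
    """Run (timestep, cost_seconds) tasks on workers that pull work when idle.

    Returns (timestep, worker_id) pairs in completion order.
    """
    pending: "queue.Queue[Tuple[int, float]]" = queue.Queue()
    for task in tasks:
        pending.put(task)

    completed: List[Tuple[int, int]] = []
    lock = threading.Lock()

    def worker(worker_id: int) -> None:
        while True:
            try:
                # Pull model: the worker requests work only when it is free.
                timestep, cost = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            time.sleep(cost)  # stand-in for the in-transit analysis
            with lock:
                completed.append((timestep, worker_id))

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed


if __name__ == "__main__":
    # Nine time steps with uneven (hypothetical) analysis costs.
    tasks = [(step, 0.001 * (1 + step % 3)) for step in range(9)]
    for timestep, worker_id in run_staging(tasks):
        print(f"timestep {timestep} analyzed by worker {worker_id}")
```

Because successive time steps go to whichever worker is free, a long-running analysis of one time step does not block the next, which is the decoupling the temporally multiplexed approach aims for. The completion order depends on thread timing and will vary between runs.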

II. RELATED WORK

In-situ and In-transit processing: The increasing

performance gap between compute and I/O capabilities has

motivated recent developments in both in-situ and in-transit

data processing paradigms. Largely data-parallel operations,

including visualization [1]–[5], and statistical compression and

queries [6], have been directly integrated into simulation

routines, enabling them to operate on in-memory simulation data.

Another approach, used by FP [7] and CoDS [8], performs in-

situ data operations on-chip using separate dedicated processor

cores on multi/many-core nodes.

The use of a data staging area, i.e., a set of additional compute

nodes allocated by users when launching parallel simulations,

has been investigated in projects such as DataStager [9],

PreDatA [10], JITStaging [11], DataSpaces [12]/ActiveSpaces [13],

and Glean [14]. Most of these existing data

staging solutions primarily focus on fast and asynchronous

data movement off simulation nodes to lessen the impact of

expensive I/O operations. They typically support limited data

operations within the staging area, such as pre-processing

and transformations, often resulting in under-utilization of the

staging nodes’ compute power. In contrast, we present a hybrid

in-situ/in-transit processing framework in which a multi-stage

pipeline supporting various simultaneous analyses fully

utilizes both the data buffering and computation capabilities of

staging nodes.

Analytics: Visualization is a largely data-parallel operation

that has been the focus of many of the existing in-situ analytics

efforts. Among the earlier work are several parallel run-time

visualizations whose problem and system scales were fairly

small [15]–[17]. One of the primary advantages of simulation-

time visualization is the ability it grants scientists to visually

monitor their simulation while it is running. For example,

SCIRun [18] provides a computational steering environment

that supports run-time simulation tracking. Tu et al. [1] were

the first to demonstrate how to effectively monitor a terascale

earthquake simulation running on thousands of processors of

a supercomputer. Over a wide-area network, they were able to

interactively change visualization parameters used to visually

monitor simulation runs [2]. Yu et al. [3] demonstrate in-situ

visualization of particle and scalar field data from a large-scale

combustion simulation, creating a scalable solution in which

in-situ visualization only accounts for a small fraction of

overall simulation time. Recent efforts also allow for the coupling

of simulation codes with popular visualization tools, VisIt [4]

and ParaView [5]. Both works aim to reduce integration efforts

required by the user and minimize performance impact to the

simulation.

Descriptive statistics are a common tool used by scientists

to provide succinct summaries of their data. The R [19]

software package contains a subset of algorithms which have

been fully parallelized [20]. The work of [6] provides a

framework for performing statistical queries on massive data.

This paper describes the in-situ and in-transit deployment of







