Combining In-situ and In-transit Processing to
Enable Extreme-Scale Scientific Analysis
Janine C. Bennett, Hasan Abbasi†, Peer-Timo Bremer‡, Ray Grout§, Attila Gyulassy¶,
Tong Jin‖, Scott Klasky†, Hemanth Kolla, Manish Parashar‖, Valerio Pascucci¶,
Philippe Pebay, David Thompson, Hongfeng Yu, Fan Zhang‖, and Jacqueline Chen
Sandia National Laboratories, †Oak Ridge National Laboratory,
‡Lawrence Livermore National Laboratory, §National Renewable Energy Laboratory,
¶University of Utah, ‖Rutgers University, Kitware
Abstract: With the onset of extreme-scale computing, I/O
constraints make it increasingly difficult for scientists to save
a sufficient amount of raw simulation data to persistent storage.
One potential solution is to change the data analysis pipeline from
a post-process centric to a concurrent approach based on either
in-situ or in-transit processing. In this context computations are
considered in-situ if they utilize the primary compute resources,
while in-transit processing refers to offloading computations to
a set of secondary resources using asynchronous data transfers.
In this paper we explore the design and implementation of three
common analysis techniques typically performed on large-scale
scientific simulations: topological analysis, descriptive statistics,
and visualization. We summarize algorithmic developments, de-
scribe a resource scheduling system to coordinate the execution of
various analysis workflows, and discuss our implementation using
the DataSpaces and ADIOS frameworks that support efficient
data movement between in-situ and in-transit computations. We
demonstrate the efficiency of our lightweight, flexible framework
by deploying it on the Jaguar XK6 to analyze data generated
by S3D, a massively parallel turbulent combustion code. Our
framework allows scientists dealing with the data deluge at ex-
treme scale to perform analyses at increased temporal resolutions,
mitigate I/O costs, and significantly improve the time to insight.
I. INTRODUCTION
While the steady increase in available computing resources
enables ever more detailed and sophisticated simulations, I/O
constraints are beginning to impede their scientific impact.
Even though the time scales resolved by modern simulations
continue to decrease, the interval between time steps saved to
disk typically increases. For example, turbulent combustion
direct numerical simulations (DNS) currently resolve inter-
mittent phenomena that occur on the order of 10 simulation
timesteps (e.g., the creation of ignition kernels). However, in
order to maintain I/O overheads at a reasonable level, typically
only every 400th timestep is saved to persistent storage for
post-processing and, as a result, the data pertaining to these
intermittent phenomena is lost. Figure 1 illustrates such subtle
vortical structures identified in a large and complex flow field
of turbulent combustion. This problem is widely predicted to
become even more pressing on future architectures, motivating
a fundamental shift away from a post-process centric data
analysis paradigm.
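To make the sampling gap concrete, the short sketch below estimates how often a short-lived feature would appear in any saved snapshot. It assumes, purely for illustration, that a feature persists for a fixed number of consecutive timesteps and that its start time is uniformly distributed relative to the save schedule; the function names are hypothetical and not part of S3D or any framework discussed here.

```python
def capture_fraction(lifetime_steps: int, save_stride: int) -> float:
    """Fraction of start times for which a feature lasting `lifetime_steps`
    consecutive timesteps overlaps at least one saved timestep, when every
    `save_stride`-th timestep is written to disk (assumed uniform start)."""
    if lifetime_steps <= 0 or save_stride <= 0:
        raise ValueError("lifetime_steps and save_stride must be positive")
    return min(1.0, lifetime_steps / save_stride)


def capture_fraction_brute(lifetime_steps: int, save_stride: int) -> float:
    """Same quantity by direct enumeration of all start offsets in one stride."""
    hits = 0
    for start in range(save_stride):
        alive = range(start, start + lifetime_steps)
        # Saved timesteps are taken to be the multiples of save_stride.
        if any(t % save_stride == 0 for t in alive):
            hits += 1
    return hits / save_stride


if __name__ == "__main__":
    # Values from the text: features lasting about 10 steps, every 400th step saved.
    print(capture_fraction(10, 400))
```

Under these assumptions only about one such feature in forty would be visible in even a single snapshot, and none could be tracked across snapshots.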
Fig. 1: Top: A small vortical structure in a turbulent flow
field is highlighted in the red box. Bottom: The highlighted
structure is tracked over time (left 5 images). The right-most
image shows the overlap between the 1st and 5th images.
Such connectivity indicators are lost with conventional post-
processing when the temporal length-scale of features is
shorter than the interval at which data is written to disk.
One promising direction is to move towards a concurrent
analysis framework in which raw simulation data is processed
as it is computed, decoupling the analysis from file I/O. The
two most commonly considered variants are in-situ and in-transit
processing. Both are based on the idea of performing
analyses as the simulation is running, storing only the results,
which are usually several orders of magnitude smaller than
the original, and thus mitigating the effects of limited disk
bandwidth or capacity. Their difference lies in how and where
the computation is performed. In-situ analysis typically shares
the primary simulation compute resources. In contrast, when
analyses are performed in-transit, some or all of the data is
transferred to different processors, either on the same machine
or on different computing resources altogether.
Both of these approaches have inherent advantages and dis-
advantages. In principle, in-transit analysis minimally impacts
the scientific simulation. By using asynchronous data transfers
to offload computations to secondary resources, the simulation
can resume operation much more quickly than if it were to wait
for a set of in-situ analyses to complete. However, in practice,
transferring even a subset of the raw data over the network
may become prohibitive, and furthermore, the memory and/or
computing capabilities of the secondary resources can quickly
be surpassed. In-situ analyses are not faced with the same
resource limitations because the entirety of the simulation data
is locally available. However, scientists will typically tolerate
only a minimal impact on simulation performance, which
places significant restrictions on the analysis. First, simulations
are often memory bound and thus all analyses must operate
within a very limited amount of scratch space. Second, the
analysis is usually allotted only a short time window to
execute. The latter restriction is particularly challenging as
many data analysis algorithms are global in nature and few
are capable of scaling satisfactorily.
SC12, November 10-16, 2012, Salt Lake City, Utah, USA
978-1-4673-0806-9/12/$31.00 © 2012 IEEE
To address these challenges, this paper proposes a hybrid
approach based on decomposing analysis algorithms into two
stages: a highly efficient and massively parallel in-situ stage,
and a small-scale parallel or serial in-transit stage connected
via a transparent data staging framework. The key insight is
that many analysis algorithms can be formulated to perform
various amounts of filtering and aggregation, resulting in a set
of intermediate data that is often orders of magnitude smaller
than the raw simulation output. By asynchronously transferring
these partial results, we are able to both minimize simulation
impact and reduce in-transit data transfer costs. We demon-
strate our framework using three common post-processing
tasks: visualization, descriptive statistical summaries, and a so-
phisticated topological analysis. All three approaches perform
an entirely local set of in-situ computations and transfer their
intermediate results asynchronously to a staging area where
computations are completed in-transit. The staging frame-
work automatically pipelines in-transit computations using
different processes for successive time steps via a pull-based
scheduling model to manage execution heterogeneity. This
almost entirely decouples the time necessary to complete the
analysis of a time step from the time required to advance the
simulation. In particular, we demonstrate how our framework
enables analysis and visualization of a large-scale combustion
simulation at temporal frequencies infeasible for traditional
post-processing approaches, while minimizing impact on the
primary simulation. Our contributions in detail are:
• A new formulation of three common analysis approaches
  into a massively parallel in-situ and a small-scale or serial
  in-transit stage;
• A flexible data staging and coordination framework to
  transparently transfer intermediate data from the primary
  to a set of secondary computing resources;
• A temporally multiplexed approach to decouple the
  performance of the analysis from that of the simulation; and
• A case study demonstrating a wide range of analyses
  applied to a large-scale turbulent combustion simulation
  at unprecedented temporal frequencies.
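The split into a local in-situ stage and a small in-transit stage can be illustrated with descriptive statistics. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: each simulation rank reduces its local block to a handful of numbers, and a single in-transit process merges those partial results using the standard pairwise update for mean and variance. The names `Partial`, `in_situ_summary`, `merge`, and `in_transit_finalize` are hypothetical.

```python
import math
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Partial:
    """Intermediate result shipped from one rank to the staging area."""
    n: int        # number of samples in the block
    mean: float   # block mean
    m2: float     # sum of squared deviations from the block mean
    lo: float     # block minimum
    hi: float     # block maximum


def in_situ_summary(block):
    """In-situ stage: purely local reduction of one rank's data block."""
    n = len(block)
    if n == 0:
        raise ValueError("block must not be empty")
    mean = math.fsum(block) / n
    m2 = math.fsum((x - mean) ** 2 for x in block)
    return Partial(n, mean, m2, min(block), max(block))


def merge(a, b):
    """Combine two partial results without revisiting the raw data."""
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta * delta * a.n * b.n / n
    return Partial(n, mean, m2, min(a.lo, b.lo), max(a.hi, b.hi))


def in_transit_finalize(partials):
    """In-transit stage: serial merge of all partials into global statistics."""
    total = reduce(merge, partials)
    variance = total.m2 / (total.n - 1) if total.n > 1 else 0.0
    return {"n": total.n, "mean": total.mean, "variance": variance,
            "min": total.lo, "max": total.hi}


if __name__ == "__main__":
    blocks = [[1.0, 2.0, 3.0], [4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
    print(in_transit_finalize([in_situ_summary(b) for b in blocks]))
```

Only five numbers per rank cross the network in this sketch, regardless of block size, which is the kind of reduction in intermediate data the hybrid formulation relies on.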
Overall, our framework represents a crucial first step towards
a practical approach for the concurrent analysis of massively
parallel simulations. Our approach is flexible, extensible to a
wide range of analysis algorithms, applicable to virtually all
high performance computing environments, and promises to
significantly improve the time to insight for modern scientific
simulations.
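The pull-based, temporally multiplexed scheduling described in this section can be sketched with a shared work queue. This is an illustrative model only: Python threads stand in for staging processes, an in-memory queue stands in for the data staging layer, and `run_staging` and `analyze` are hypothetical names rather than DataSpaces or ADIOS calls.

```python
import queue
import threading


def run_staging(tasks, num_workers, analyze):
    """Process (timestep, payload) pairs with a pool of pulling workers.

    Each worker pulls the next available timestep when it becomes idle, so a
    slow analysis of one timestep does not block the analysis of later ones.
    """
    work = queue.Queue()
    results = {}
    results_lock = threading.Lock()
    stop = object()  # sentinel telling a worker to exit

    def worker():
        while True:
            item = work.get()
            if item is stop:
                break
            step, payload = item
            outcome = analyze(payload)  # in-transit computation
            with results_lock:
                results[step] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()

    # Producer side (the simulation): enqueue and continue without waiting.
    for item in tasks:
        work.put(item)

    # One sentinel per worker, queued after all real work.
    for _ in threads:
        work.put(stop)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    steps = [(t, list(range(t))) for t in range(8)]
    print(run_staging(steps, num_workers=3, analyze=sum))
```

Because idle workers request work rather than having it assigned in advance, uneven analysis times across timesteps are absorbed by the pool, which is the motivation for a pull model when execution is heterogeneous.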
II. RELATED WORK
In-situ and In-transit processing: The increasing performance
gap between compute and I/O capabilities has
motivated recent developments in both in-situ and in-transit
data processing paradigms. Largely data-parallel operations,
including visualization [1]–[5], and statistical compression and
queries [6], have been directly integrated into simulation rou-
tines, enabling them to operate on in-memory simulation data.
Another approach, used by FP [7] and CoDS [8], performs in-
situ data operations on-chip using separate dedicated processor
cores on multi/many-core nodes.
The use of a data staging area, i.e., a set of additional com-
pute nodes allocated by users when launching parallel simula-
tions, has been investigated in projects such as DataStager [9],
PreDatA [10], JITStaging [11], DataSpaces [12]/ActiveS-
paces [13], and Glean [14]. Most of these existing data
staging solutions primarily focus on fast and asynchronous
data movement off simulation nodes to lessen the impact of
expensive I/O operations. They typically support limited data
operations within the staging area, such as pre-processing
and transformations, often resulting in under-utilization of the
staging nodes’ compute power. In contrast, we present a hybrid
in-situ/in-transit processing framework in which a multi-stage
pipeline supporting various simultaneous analyses fully uti-
lizes both the data buffering and computation capabilities of
staging nodes.
Analytics: Visualization is a largely data-parallel operation
that has been the focus of many of the existing in-situ analytics
efforts. Among the earlier work are several parallel run-time
visualizations whose problem and system scales were fairly
small [15]–[17]. One of the primary advantages of simulation-
time visualization is the ability it grants scientists to visually
monitor their simulation while it is running. For example,
SCIRun [18] provides a computational steering environment
that supports run-time simulation tracking. Tu et al. [1] were
the first to demonstrate how to effectively monitor a terascale
earthquake simulation running on thousands of processors of
a supercomputer. Over a wide-area network, they were able to
interactively change visualization parameters used to visually
monitor simulation runs [2]. Yu et al. [3] demonstrate in-situ
visualization of particle and scalar field data from a large-scale
combustion simulation, creating a scalable solution in which
in-situ visualization only accounts for a small fraction of over-
all simulation time. Recent efforts also allow for the coupling
of simulation codes with popular visualization tools, VisIt [4]
and ParaView [5]. Both efforts aim to reduce the integration effort
required of the user and to minimize the performance impact on the
simulation.
Descriptive statistics are a common tool used by scientists
to provide succinct summaries of their data. The R [19]
software package contains a subset of algorithms which have
been fully parallelized [20]. The work of [6] provides a
framework for performing statistical queries on massive data.
This paper describes the in-situ and in-transit deployment of