Online Data Analysis and Reduction at Extreme Scales
Slide 1
Computing Just What You Need:
Online Data Analysis and Reduction at Extreme Scales
Ian Foster, Argonne & U.Chicago
August 31, 2017
EuroPar, Santiago de Compostela
https://www.researchgate.net/publication/317703782
Slide 2
What I won’t talk about: globus.org
• 5 major services
• 13 national labs use Globus
• 300 PB transferred
• 10,000 active endpoints
• 50 billion files processed
• 70,000 registered users
• 99.5% uptime
• 65+ institutional subscribers
• 1 PB largest single transfer to date
• 3 months longest continuously managed transfer
• 300+ federated campus identities
• 12,000 active users/year
Slide 4
Three messages
Dramatic changes in HPC system geography …
… are driving new application structures …
… resulting in exciting new computer science challenges
Slide 6
Geography: (part of) what determines how long it takes to get from A to B
The memory hierarchy plays a big role in computing geography.
Slide 7
Geography: (part of) what determines how long it takes to get from A to B
• Computing geography is changing rapidly
• Despite continued exponential growth in many technologies
• Different rates mean that resources are getting farther away
[Figure: relative rates of technology improvement, ~1980-2000: CPU high, disk low. Patterson, CACM, 2004]
Slide 8
[Figure: Titan supercomputer. A. C. Bauer et al., EuroVis 2016]
Slide 13
Exascale climate goal: ensembles of 1 km models at 15 simulated years/24 hours
Full state once per model day: 260 TB every 16 seconds, or 1.4 EB/day
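As a quick consistency check on those rates (all numbers from the slide):

```latex
\frac{260~\text{TB}}{16~\text{s}} \approx 16~\text{TB/s},
\qquad
16~\text{TB/s} \times 86{,}400~\text{s/day} \approx 1.4\times10^{6}~\text{TB/day} = 1.4~\text{EB/day}
```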
Slide 14
Model selection in deep learning
Evaluate 1M alternative models, each with 100M parameters: 10^14 parameter values
https://de.mathworks.com/company/newsletters/articles/cancer-diagnostics-with-deep-learning-and-photonic-time-stretch.html
Slide 15
Real-time analysis and experimental steering
• Current experimental protocols typically process and validate data only after an experiment has completed, which can lead to undetected errors and prevents online steering.
• We built an autonomous stream processing system that allows data streamed from beamline computers to be processed in real time on a remote supercomputer, with a control feedback loop used to make decisions during experimentation.
• The system has been tested in a real-world setting at the TXM beamline (32-ID@APS) during cement wetting experiments (2 experiments, each with 8 hours of data acquisition time).
[Figures: sustained projections/second vs. circular buffer size and reconstruction frequency; image quality (similarity score) vs. number of streamed projections; reconstructed image sequence]
Tekin Bicer et al., eScience 2017
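The eScience 2017 paper describes the actual system; the sketch below only illustrates the control-loop pattern the slide describes. All components are hypothetical stand-ins (not the Bicer et al. API), and the buffer size and reconstruction frequency are arbitrary.

```python
from collections import deque
import random

BUFFER_SIZE = 128      # circular buffer capacity (projections)
RECON_EVERY = 32       # reconstruction frequency (projections)

def projection_stream(n=1000):          # stand-in for the beamline feed
    for _ in range(n):
        yield [random.random() for _ in range(64)]

def reconstruct(projs):                 # stand-in for tomographic reconstruction
    return [sum(col) / len(projs) for col in zip(*projs)]

def quality(image):                     # stand-in for a similarity score
    return sum(image) / len(image)

buffer = deque(maxlen=BUFFER_SIZE)
for i, proj in enumerate(projection_stream(), start=1):
    buffer.append(proj)
    if i % RECON_EVERY == 0 and len(buffer) == BUFFER_SIZE:
        score = quality(reconstruct(list(buffer)))
        if score < 0.45:                # feedback: steer the experiment
            print(f"projection {i}: score {score:.3f}, adjusting acquisition")
```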
Slide 16
Other examples
• Materials science
  • Billion-atom atomistic simulations with femtosecond time steps
  • Simulations may run for simulated seconds
  • Want to study vibrational responses at 10s of femtoseconds
• Fusion science
  • Full-device simulations may generate 100 PB
  • Need to reduce 1000:1 for effective output
  • Eventual goal is real-time response during fusion experiments
Slide 17
HPC applications: Synopsis
Application structures vary along two axes – single program vs. multiple programs, and offline vs. online analysis:
• Simulation + analysis
• Multiple simulations
• Multiple simulations + analyses
• Many tasks: reliable or unreliable, loosely or tightly coupled, static or dynamic
New challenges: efficient logistics!
• “Amateurs talk strategy while professionals study logistics” – Robert Barrow
• “The line between disorder and order lies in logistics...” – Sun Tzu
Slides 18-19
The need for online data analysis and reduction
Traditional approach: simulate, output, analyze
• Write simulation output to secondary storage; read back for analysis
• Decimate in time when the simulation output rate exceeds the output rate of the computer
• Online: y = F(x)
• Offline: a = A(y), b = B(y), …
New approach: online data analysis & reduction
• Co-optimize simulation, analysis, and reduction for performance and information output
• Substitute CPU cycles for I/O, via data (de)compression and/or online data analysis
• (a) Online: a = A(F(x)), b = B(F(x)), …
• (b) Online: r = R(F(x)); Offline: a = A′(r), b = B′(r), or a = A(U(r)), b = B(U(r)) [R = reduce, U = un-reduce]
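The two regimes can be made concrete in a few lines. This is a minimal runnable sketch with toy stand-ins for F, A, B, R, and U – illustrative placeholders only, not CODAR interfaces:

```python
import numpy as np

# Toy stand-ins: F = simulation, A/B = analyses, R/U = reduce/un-reduce.
F = lambda x: np.sin(x)            # "simulation" producing state y = F(x)
A = lambda y: y.mean()             # analysis a = A(y)
B = lambda y: y.max()              # analysis b = B(y)
R = lambda y: y[::10]              # reduce: decimate 10:1
U = lambda r: np.repeat(r, 10)     # un-reduce: crude reconstruction

x = np.linspace(0, 10, 1000)

# (a) Full online analysis: a and b computed from in-memory state.
a_online, b_online = A(F(x)), B(F(x))

# (b) Online reduction, offline analysis: persist only r = R(F(x));
#     later apply the original analyses to the reconstruction U(r).
r = R(F(x))
a_offline, b_offline = A(U(r)), B(U(r))

print(a_online - a_offline, b_online - b_offline)  # information lost to reduction
```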
Slide 20
But reduction comes with challenges
• Handling high entropy
• Performance – no benefit otherwise
• Not only errors in the variable itself: $E \equiv \lVert f - \hat{f} \rVert$
• Must also consider impact on derived quantities: $E \equiv \lVert g(f(x,t)) - g(\hat{f}(x,t)) \rVert$
S. Klasky
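A small numeric illustration of the last point (field and noise model made up for the example): a pointwise error bound on f says little about the error in a derived quantity g, here a finite-difference derivative, because differentiation amplifies high-frequency noise.

```python
import numpy as np

x = np.linspace(0, 2 * np.pi, 1000)
f = np.sin(x)
# Compressed surrogate with a strict pointwise bound: |f - f_hat| <= 1e-3.
f_hat = f + 1e-3 * np.random.choice([-1, 1], size=f.size)

g = lambda v: np.gradient(v, x)    # derived quantity: df/dx

print(np.abs(f - f_hat).max())         # ~1e-3: within the bound on f
print(np.abs(g(f) - g(f_hat)).max())   # ~0.16: two orders of magnitude larger
```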
Slide 21
Data reduction challenges
Key research challenge: how to manage the impact of errors on derived quantities?
[Figure: “Where did it go???” – a feature of the data lost after reduction]
S. Klasky
Slide 22
CODAR: Center for Online Data Analysis and Reduction
A U.S. Department of Energy Exascale Computing Project codesign center
[Diagram: CODAR at the intersection of applications, data services, and exascale platforms]
Slide 23
Infrastructure – Matthew Wolf (Lead)
• Cheetah: Bryce Allen, Kshitij Mehta, Tahsin Kurc, Li Tang
• Savanna: Justin Wozniak, Manish Parashar, Philip Davis
• Chimbuko: Abid Malik, Line Pouchard
Data Reduction – Franck Cappello (Lead)
• Multilevel: Mark Ainsworth, Ozan Tugluk, Jong Choi
• Z-checker: Julie Bessac, Sheng Di
Data Analysis – Shinjae Yoo (Lead)
• Blobs: Tom Peterka, Hanqi Guo
• Hierarchical: Stefan Wild, Wendy Di
• Functional: George Ostrouchov
• Visual Analytics: Klaus Mueller, Wei Xu
Management – Ian Foster (Lead)
• Scott Klasky
• Kerstin Kleese van Dam
• Todd Munson (Project Management)
Slide 24
Cross-cutting research questions
• What are the best data analysis and reduction algorithms for different application classes, in terms of speed, accuracy, and resource requirements? How can we implement those algorithms to achieve scalability and performance portability?
• What are the tradeoffs in data analysis accuracy, resource needs, and overall application performance between using various data reduction methods to reduce file size prior to offline data reconstruction and analysis vs. performing more online data analysis? How do these tradeoffs vary with hardware and software choices?
• How do we effectively orchestrate online data analysis and reduction to reduce associated overheads? How can hardware and software help with orchestration?
Slide 25
Prototypical CODAR data analysis and reduction pipeline
A running simulation feeds the CODAR runtime through the CODAR data API. Within the runtime:
• CODAR data analysis: multivariate statistics, feature analysis, outlier detection
• CODAR data reduction: application-aware transforms and encodings
• CODAR data monitoring: error calculation, refinement hints
Reduced output and reconstruction info flow through the I/O system to offline data analysis, which reads them back through the CODAR data API. The whole pipeline is informed by simulation knowledge: application, models, numerics, performance optimization, …
Slide 26
Overarching data reduction challenges
• Understanding the science requires massive data reduction
• How do we reduce
  • the time spent in reducing the data to knowledge?
  • the amount of data moved on the HPC platform?
  • the amount of data read from the storage system?
  • the amount of data stored in memory, on the storage system, and moved over the WAN?
  … without removing the knowledge
• Requires deep dives into application post-processing routines and simulations
• Goal is to create both (a) co-design infrastructure and (b) reduction and analysis routines
  • General: e.g., reduce N bytes to M bytes, M << N
  • Motif-specific: e.g., finite difference mesh vs. particles vs. finite elements
  • Application-specific: e.g., reduced physics allows us to understand deltas
Slide 27
HPC floating point compression
• Current interest is in lossy algorithms, some using preprocessing
• Lossless may achieve up to ~3x reduction
Compress each variable separately:
• ISABELA
• SZ
• ZFP
• Linear auditing
• SVD
• Adaptive gradient methods
Several variables simultaneously:
• PCA
• Tensor decomposition
• …
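As a concrete example of the error-bounded lossy mode these tools provide, here is a minimal round trip assuming the zfpy Python bindings for ZFP are installed; the field, shape, and tolerance are illustrative:

```python
import numpy as np
import zfpy  # Python bindings for the ZFP compressor

# Synthetic smooth field standing in for simulation output.
x, y, z = np.meshgrid(*[np.linspace(0, 2 * np.pi, 128)] * 3, indexing="ij")
field = np.sin(x) * np.cos(y) * np.sin(z)

# Fixed-accuracy mode: pointwise absolute error bounded by the tolerance.
compressed = zfpy.compress_numpy(field, tolerance=1e-4)
restored = zfpy.decompress_numpy(compressed)

print(f"ratio:   {field.nbytes / len(compressed):.1f}x")
print(f"max err: {np.abs(field - restored).max():.2e}")  # <= 1e-4
```

A smooth field like this compresses well; truly random data (see the next slide) will barely compress at any useful error bound.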
Slide 28
Lossy compression with SZ
No existing compressor can reduce hard-to-compress datasets by more than a factor of 2.
• Objective 1: Reduce hard-to-compress datasets by one order of magnitude
• Objective 2: Add user-required error controls (error bound, shape of error distribution, spectral behavior of error function, etc.)
[Examples: NCAR atmosphere simulation output (1.5 TB); WRF hurricane simulation output; Advanced Photon Source mouse brain data. What we need to compress (bit map of 128 floating point numbers): random noise]
Franck Cappello
Slide 31
Z-checker: Analysis of data reduction error
Community tool to enable comprehensive assessment of lossy data reduction error:
• Collection of data quality criteria from applications
• Community repository for datasets, reduction quality requirements, and compression performance
• Modular design enables contributed analysis modules (C and R) and format readers (ADIOS, HDF5, etc.)
• Offline/online parallel statistical, spectral, and point-wise distortion analysis with static & dynamic visualization
Franck Cappello, Julie Bessac, Sheng Di
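A sketch of a few of the pointwise distortion metrics such a tool reports – maximum absolute error, RMSE, and PSNR – written directly in NumPy for illustration (this is not Z-checker’s implementation; the data pair is synthetic):

```python
import numpy as np

def distortion_report(orig: np.ndarray, recon: np.ndarray) -> dict:
    """Pointwise distortion metrics between original and reduced data."""
    err = orig - recon
    rmse = float(np.sqrt(np.mean(err ** 2)))
    value_range = float(orig.max() - orig.min())
    return {
        "max_abs_error": float(np.abs(err).max()),
        "rmse": rmse,
        "psnr_db": 20 * np.log10(value_range / rmse) if rmse > 0 else np.inf,
    }

orig = np.random.rand(64, 64)
recon = orig + np.random.normal(scale=1e-4, size=orig.shape)
print(distortion_report(orig, recon))
```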
Slide 32
Science-driven decompositions
• Information-theoretically derived methods like SZ, ISABELA, and ZFP make for good generic capabilities
• If scientists can provide additional details on how to determine features of interest, we can use those to drive further optimizations. E.g., if they can select:
  • Regions of high gradient
  • Regions near turbulent flow
  • Particles with velocities > two standard deviations
• How can scientists help define features?
Slide 33
Multilevel compression techniques
A hierarchical reduction scheme produces multiple levels of partial decompression of the data, so that users can work with reduced representations that require minimal storage whilst achieving the user-specified tolerance.
[Figure: compression vs. user-specified tolerance, for a turbulence dataset: extremely large, inherently non-smooth, resistant to compression]
Mark Ainsworth
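A toy one-dimensional sketch of the hierarchical idea – a Haar-like average/detail split, not Ainsworth’s actual scheme: store a coarse level plus detail corrections, and truncate corrections below the user-specified tolerance, which bounds the pointwise reconstruction error by that tolerance.

```python
import numpy as np

def reduce_two_level(f, tol):
    coarse = 0.5 * (f[0::2] + f[1::2])     # level-1 averages
    detail = 0.5 * (f[0::2] - f[1::2])     # level-1 corrections
    detail[np.abs(detail) < tol] = 0.0     # drop corrections below tolerance
    return coarse, detail

def reconstruct(coarse, detail):
    f = np.empty(2 * coarse.size)
    f[0::2] = coarse + detail
    f[1::2] = coarse - detail
    return f

f = np.sin(np.linspace(0, 4 * np.pi, 1024))
coarse, detail = reduce_two_level(f, tol=1e-3)
print(np.abs(f - reconstruct(coarse, detail)).max())       # <= tol
print(np.count_nonzero(detail), "of", detail.size, "corrections kept")
```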
Slide 34
Manifold learning for change detection and adaptive sampling
• A single molecular dynamics trajectory can generate 32 PB
• Use online data analysis to detect relevant or significant events
• Project MD trajectories across time into a two-dimensional manifold space (dimensionality reduction)
• Change detection in manifold space is more robust than in the original full coordinate space, as it removes local vibrational noise
• Apply an adaptive sampling strategy based on accumulated changes of trajectories
[Figure: low-dimensional manifold projection of different states of MD trajectories]
Shinjae Yoo
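A simplified sketch of this workflow, substituting plain PCA (scikit-learn) for the manifold learning used in the actual work, on synthetic trajectory data with one injected conformational change:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
frames = rng.normal(size=(500, 3000))   # 500 frames x 3000 atomic coordinates
frames[250:] += 2.0                     # injected "significant event" at frame 250

# Project trajectories into a two-dimensional space.
embedding = PCA(n_components=2).fit_transform(frames)

# Change score: displacement between consecutive frames in the projected space.
change = np.linalg.norm(np.diff(embedding, axis=0), axis=1)
print("largest change at frame", int(change.argmax()) + 1)   # ~250
```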
35. 35
Critical points extracted
with topology analysis
Tracking blobs in XGC fusion simulations
Blobs, regions of high turbulence that can
damage the Tokamak, can run along the edge
wall down toward the diverter and damage it.
Blob extraction and tracking enables the
exploration and analysis of high-energy blobs
across timesteps. Our new visualizations will
help scientists understand the behavior of blob
dynamics in greater detail than previously
possible.
Research Details
• Access data with ADIOS I/O in high performance
• Precondition the input data with robust PCA
• Detect blobs as local extrema with topology analysis
• Track blobs over time with combinatorial feature flow
field method
A method to extract, track, and visualize blobs in large scale 5D gyrokinetic Tokamak simulations.
Hanqi Guo, Tom Peterka
Tracking graph that visualizes the dynamics of blobs
(birth, merge, split, and death) over time
Data preconditioning
with robust PCA
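A greatly simplified stand-in for the detection step only: thresholded local maxima on a synthetic 2D field via SciPy. The actual pipeline’s robust PCA preconditioning, topological critical-point extraction, and feature-flow tracking are not reproduced here.

```python
import numpy as np
from scipy import ndimage

rng = np.random.default_rng(1)
field = ndimage.gaussian_filter(rng.normal(size=(256, 256)), sigma=8)

# Candidate blobs: local maxima that are also high-amplitude outliers.
local_max = field == ndimage.maximum_filter(field, size=15)
blobs = local_max & (field > field.mean() + 2 * field.std())

labels, n = ndimage.label(blobs)
print(n, "blobs at", np.argwhere(blobs)[:5])
```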
Slide 36
Reduction for visualization
“An extreme scale simulation … calculates temperature and density over 1000 time steps. For both variables, a scientist would like to visualize 10 isosurface values and X, Y, and Z cut planes for 10 locations in each dimension. One hundred different camera positions are also selected, in a hemisphere above the dataset pointing towards the data set. We will run the in situ image acquisition for every time step. These parameters will produce: 2 variables x 1000 time steps x (10 isosurface values + 3 x 10 cut planes) x 100 camera positions x 3 images (depth, float, lighting) = 2.4 x 10^7 images.”
J. Ahrens et al., SC’14
Compare: 10^3 time steps × 10^15 B of state per time step = 10^18 B, vs. 2.4 × 10^7 images × 1 MB/image (megapixel, 4 B) ≈ 2.4 × 10^13 B.
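The image count in the quote checks out:

```latex
2~\text{vars} \times 1000~\text{steps} \times (10 + 3\times10)~\text{views} \times 100~\text{cameras} \times 3~\text{images} = 2.4\times10^{7}~\text{images}
```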
Slide 37
Fusion whole device model
[Diagram: XGC and GENE coupled through an interpolator, with multiple analysis consumers. Data volumes: 100+ PB total; PB/day on Titan today, 10+ PB/day in the future; 10 TB/day on Titan today, 100+ TB/day in the future; each analysis reads 10-100 PB.]
http://bit.ly/2fcyznK
Slide 39
Fusion whole device model: integrates multiple technologies
• ADIOS staging (DataSpaces) for coupling
• Sirius (ADIOS + Ceph) for storage
• ZFP, SZ, Dogstar for reduction
• VTK-M services for visualization
• TAU for instrumenting the code
• Cheetah + Savanna to test the different configurations (same node, different node, hybrid combination) to determine where to place the different services
• Flexpath for staged write from XGC to storage
• Ceph + ADIOS to manage the storage hierarchy
• Swift for workflow automation
[Diagram: XGC and GENE, each with reduction, output, and visualization stages instrumented by TAU, feeding comparative and performance visualization across NVRAM, PFS, and tape tiers; Cheetah + Savanna drive codesign experiments]
Slide 40
Savanna: Swift workflows coupled with ADIOS
[Diagram: co-design experiment architecture. A CODAR campaign definition drives Cheetah, which handles experiment configuration, dispatch, and job launch, plus user monitoring and control of multiple pipeline instances. Each pipeline couples a science application with reduction, analysis, and Z-Check components; multi-node workflow components communicate application data over ADIOS. Co-design data – experiment metadata, Chimbuko performance data, and other output (e.g., Z-Checker) – is captured in a store.]
Slide 41
Transformation layer
• Designed for data conversions, compression, and transformation
  • zlib, bzip2, szip, ISOBAR, ALACRITY, FastBit
• Can transform local data on each processor
• Transparent to users
  • User code reads/writes the original untransformed data
• Applications:
  • Compressed output
  • Automatically indexed data
  • Local data reorganization
  • Data reduction
• Released in ADIOS 1.6 in 2013 with compression transformations
[Diagram: a user application writes variables through ADIOS; the data transform layer converts regular variables into transformed variables via read/write transform plugins before they reach the I/O transport layer (BP file, staging area, etc.)]
Slide 42
Codesign questions to be addressed
• How can we couple multiple codes? Files, staging on the same node, different nodes, synchronous, asynchronous?
• How can we test different placement strategies for memory and performance optimizations?
• What are the best reduction technologies to allow us to capture all relevant information during a simulation (e.g., performance vs. accuracy)?
• How can we create visualization services that work on different architectures and use the data models in the codes?
• How do we manage data across storage hierarchies?
Slide 43
CODAR summary
• Infrastructure development and deployment
  • Enable rapid composition of applications and “data services” (data reduction methods, data analysis methods, etc.)
  • Support CODAR-developed and other data services
• Method development: new reduction & analysis routines
  • Motif-specific: e.g., finite difference mesh vs. particles vs. finite elements
  • Application-specific: e.g., reduced physics to understand deltas
• Application engagement
  • Understand data analysis and reduction requirements
  • Integrate, deploy, evaluate impact
https://codarcode.github.io | codar-info@cels.anl.gov
Slide 44
Dramatic changes in HPC system geography …
… are driving new application structures …
… resulting in exciting new computer science challenges
Thanks to the US Department of Energy and the CODAR team.