Spark Summit East Keynote by Ion Stoica
A long-standing grand challenge in computing is to enable machines to act autonomously and intelligently: to rapidly and repeatedly take appropriate actions based on information in the world around them. To address this challenge, at UC Berkeley we are starting a new five year effort that focuses on the development of data-intensive systems that provide Real-Time Intelligence with Secure Execution (RISE). Following in the footsteps of AMPLab, RISELab is an interdisciplinary effort bringing together researchers across AI, robotics, security, and data systems. In this talk I’ll present our research vision and then discuss some of the applications that will be enabled by RISE technologies.
8. Why?
What does this mean?
• Faster decisions better than slower decisions
• Decisions on fresh data better than decisions on stale data
• Decisions on personalized data better than on aggregate data
8
Data only as valuable as the decisions it enables
9. Goal
Real-time decisions
on live data
with strong security
9
decide in ms
the current stateof theenvironment
privacy, confidentiality, integrity
10. Typical decision system
Decision System
Query
Decision
Environment
+
sensors &
actuators
Observations, Feedback
Preprocess Intermediate
data
Decision
Engine
13. Example of decision systems
Decision System
Query
Decision
Training
Models
(diff.tradeoffs
complexity/
accuracy)
Model
Serving
Feedback
Observations, Feedback
ML Pipeline
(e.g., Clipper +
Spark/Tensorflow)
Decision System
Obs.
Action
Update
Policy
Policy
obs à
action
Query
Policy
Observations, Rewards
Reinforcement
Learning Systems
(e.g., Ray)
Pre-
process
Interm-
ediate
Data
Decision
Engine
14. What else do we want from decisions?
Intelligent:complexdecisions in uncertain environments
Robust: handle complexnoise, unforeseeninputs, failures
Explainable:ability to explainnon-obvious decisions
16. Some Proposed Research
Secure Real-time Decisions Stack (SRDS)
• Open source platform to develop of RISE apps
• Secure from ground up
• Reinforcement Learning (RL) as one of key app patterns
Learning control hierarchies: speeduplearning, training
Shared learning: learn over confidential data
16
17. Secure Real-time Decisions Stack (SRDS)
• Open source platform to develop of RISE apps
• Secure from ground up
• Reinforcement Learning (RL) as one of key app patterns
Learning control hierarchies: speeduplearning, training
Shared learning: learn over confidential data
17
Some Proposed Research
18. Secure Real-time Decision Stack (SRDS)
scheduler object store
RISE μkernel
Ray Clipper …
Ground(data contextservice)
Time
Machine
19. scheduler object store
RISE μkernel
Ray Clipper …
Ground(data contextservice)
Time
Machine
Minimalist executionengine:
• Support both data flow and task-parallel execution models
• High-throughput, low-latency: ~ 1M tasks/sec @ ms latency
Secure Real-time Decision Stack (SRDS)
20. scheduler object store
RISE μkernel
Ray Clipper …
Ground(data contextservice)
Time
Machine
Central repositoryfor models,APIs to capture the context
in whichdata gets used and produced
Status:ongoing project with industry partners
Secure Real-time Decision Stack (SRDS)
21. scheduler object store
RISE μkernel
Ray Clipper …
Ground(data contextservice)
Time
Machine
Replaying of apps at fine granularity
• Simplify development, debugging
• Robustness: replay against perturbed inputs
• Explainability: identify inputs causing decision
• Security: confirm vulnerabilities, test security
patches, compliance auditing
Secure Real-time Decision Stack (SRDS)
22. scheduler object store
RISE μkernel
Ray Clipper …
Ground(data contextservice)
Time
Machine
Dramatically simplify development of RISE applications
• Apache Spark: improve latency and security
• Clipper: model serving for Apache Spark, Scikit learn, etc
• Ray: framework for RL applications
Secure Real-time Decision Stack (SRDS)
31. ImprovingApache Spark
Drizzle
• Decrease latency of Structured Streaming and ML algorithms by ~10x
• Techniques: group scheduling, shared variables
• Some of these techniques will make their way to Apache Spark
Opaque
• Full data encryption, authentication, and verification (Intel’s SGX)
• Oblivious mode: hide data access pattern
• Support most SparkSQL functionality
• See Wenting’s talk later
31
32. RISELab
Already promising results
Expect muchmore over the next five years!
32
Goal: Develop open source platforms,tools, and
algorithms for intelligent real-timedecisions on live-
data