1. Practical Considerations for Continual Learning
Tom Diethe
tdiethe@amazon.com
Continual AI Meetup:
“Real-world Applications of Continual Learning”
April 28 2020
2. Continual Learning at Amazon
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 1 / 11
3. Alexa AI
What is Alexa?
A cloud-based voice service that can help
you with tasks, entertainment, general
information, shopping, and more
The more you talk to Alexa, the more
Alexa adapts to your speech patterns,
vocabulary, and personal preferences
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 2 / 11
4. Alexa AI
What is Alexa?
A cloud-based voice service that can help
you with tasks, entertainment, general
information, shopping, and more
The more you talk to Alexa, the more
Alexa adapts to your speech patterns,
vocabulary, and personal preferences
How do we ensure that ...
we create robust and efficient AI systems?
we ensure that the privacy of customer
data is safeguarded?
customers are treated fairly by ML
algorithms?
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 2 / 11
5. Failure Modes
Unintentional failures: ML system produces a formally correct but completely unsafe
outcome
Outliers/anomalies
Dataset shift
Limited memory
Intentional failures: failure is caused by an active adversary attempting to subvert the
system to attain her goals, such as to:
misclassify the result
infer private training data
steal the underlying algorithm
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 3 / 11
6. FX (xt1 , . . . , xtn ) = FX (xt1+τ , . . . , xtn+τ )
for all τ, t1, . . . , tn
for all n ∈ N
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 4 / 11
8. Robustness & Transparency via Continual Learning
Data arrive continually
(Possibly) non-IID
Tasks may change over time (e.g. trends/fashions in
shopping)
New tasks may emerge (e.g. new product
categories, new marketplaces)
Robustness How can we adapt to new data whilst
retaining existing knowledge?
Transparency: How can we have systems can
signal they’re going wrong?
Standard approaches:
Train individual models on each task. Train
combination
Maintain single model and use regularization to fix
influential parameters
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 6 / 11
9. Robustness & Transparency via Continual Learning
Data arrive continually
(Possibly) non-IID
Tasks may change over time (e.g. trends/fashions in
shopping)
New tasks may emerge (e.g. new product
categories, new marketplaces)
Robustness How can we adapt to new data whilst
retaining existing knowledge?
Transparency: How can we have systems can
signal they’re going wrong?
Standard approaches:
Train individual models on each task. Train
combination
Maintain single model and use regularization to fix
influential parameters
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 6 / 11
10. Robustness & Transparency via Continual Learning
Data arrive continually
(Possibly) non-IID
Tasks may change over time (e.g. trends/fashions in
shopping)
New tasks may emerge (e.g. new product
categories, new marketplaces)
Robustness How can we adapt to new data whilst
retaining existing knowledge?
Transparency: How can we have systems can
signal they’re going wrong?
Standard approaches:
Train individual models on each task. Train
combination
Maintain single model and use regularization to fix
influential parameters
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 6 / 11
11. Robustness & Transparency via Continual Learning
Data arrive continually
(Possibly) non-IID
Tasks may change over time (e.g. trends/fashions in
shopping)
New tasks may emerge (e.g. new product
categories, new marketplaces)
Robustness How can we adapt to new data whilst
retaining existing knowledge?
Transparency: How can we have systems can
signal they’re going wrong?
Standard approaches:
Train individual models on each task. Train
combination
Maintain single model and use regularization to fix
influential parameters
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 6 / 11
12. Robustness & Transparency via Continual Learning
Data arrive continually
(Possibly) non-IID
Tasks may change over time (e.g. trends/fashions in
shopping)
New tasks may emerge (e.g. new product
categories, new marketplaces)
Robustness How can we adapt to new data whilst
retaining existing knowledge?
Transparency: How can we have systems can
signal they’re going wrong?
Standard approaches:
Train individual models on each task. Train
combination
Maintain single model and use regularization to fix
influential parameters
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 6 / 11
13. Robustness & Transparency via Continual Learning
Data arrive continually
(Possibly) non-IID
Tasks may change over time (e.g. trends/fashions in
shopping)
New tasks may emerge (e.g. new product
categories, new marketplaces)
Robustness How can we adapt to new data whilst
retaining existing knowledge?
Transparency: How can we have systems can
signal they’re going wrong?
Standard approaches:
Train individual models on each task. Train
combination
Maintain single model and use regularization to fix
influential parameters
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 6 / 11
14. Robustness & Transparency via Continual Learning
Data arrive continually
(Possibly) non-IID
Tasks may change over time (e.g. trends/fashions in
shopping)
New tasks may emerge (e.g. new product
categories, new marketplaces)
Robustness How can we adapt to new data whilst
retaining existing knowledge?
Transparency: How can we have systems can
signal they’re going wrong?
Standard approaches:
Train individual models on each task. Train
combination
Maintain single model and use regularization to fix
influential parameters
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 6 / 11
15. Robustness & Transparency via Continual Learning
Data arrive continually
(Possibly) non-IID
Tasks may change over time (e.g. trends/fashions in
shopping)
New tasks may emerge (e.g. new product
categories, new marketplaces)
Robustness How can we adapt to new data whilst
retaining existing knowledge?
Transparency: How can we have systems can
signal they’re going wrong?
Standard approaches:
Train individual models on each task. Train
combination
Maintain single model and use regularization to fix
influential parameters
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 6 / 11
16. Bayesian Continual Learning [Nguyen 2018]
Given e.g. data in task t as Dt = x
(nt )
t , y
(nt )
t
Nt
n=1
, parameters θ (e.g. BLR, BNN, GP ...)
p(θ|D1:T ) ∝ p(θ)p(D1:T |θ)
= p(θ)
T
t−1
NT
n=1
p y
(nt )
t |θ, x
(nt )
t
= p(θ|D1:T−1)p(DT |θ).
Natural recursive algorithm!
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 7 / 11
17. Bayesian Continual Learning [Nguyen 2018]
Given e.g. data in task t as Dt = x
(nt )
t , y
(nt )
t
Nt
n=1
, parameters θ (e.g. BLR, BNN, GP ...)
p(θ|D1:T ) ∝ p(θ)p(D1:T |θ)
= p(θ)
T
t−1
NT
n=1
p y
(nt )
t |θ, x
(nt )
t
= p(θ|D1:T−1)p(DT |θ).
Natural recursive algorithm!
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 7 / 11
18. Engineering a Continual Learning System
Automating Data Retention Policies:
Sketcher/Compressor: when the data rate is too high
Joiner: when labels arrive late
Shared infrastructure: optimal use of space, like an OS cache
Automating Monitoring and Quality Control:
Data monitoring: dataset shift detection, anomaly detection
Prediction monitoring: monitor performance of models
Automating the ML Life-Cycle:
Trainer and HPO: store provenance, warm start training
Model policy engine: ensure re-training performed at right cadence
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 8 / 11
19. “Zero-Touch” Machine Learning
Model Policy
Engine
Streams
Model
Stream
Trainer
HPO
Data
Statistics
Data Monitoring
Anomaly Detection,
Distribution Shift
Measurement
Retrain
Rollback
Prediction
statistics
Prediction
Statistics
Prediction
Monitoring
Accuracy, Shift
Predictor
Business Metrics
Business Logic
Business metrics
Costs
Desired accuracy
Joiner
System State
DB
Diagnostic
Logs
Sketcher/
Sampler
Predictions
Predictions
Shared Infrastructure
Model DB
Training Data
Reservoir
Validation Data
Reservoir
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 9 / 11
20. Summary: Continual Learning
Continual Learning
Bayesian methods are a natural fit for continual learning
However it’s tricky to make them work well with deep learning methods
Many interesting methodological improvements happening, but most are still not
production ready
Engineering viewpoint is also required
Tom Diethe (Amazon) Practical Considerations for Continual Learning April 28 2020 10 / 11