Systems for performing ML inference are increasingly important, but are far slower than they could be because they use techniques designed for conventional data serving workloads, neglecting the statistical nature of ML inference. As an alternative, this talk presents Willump, an optimizer for ML inference.
Willump: Optimizing Feature Computation in ML Inference
1. Willump: A Statistically-Aware End-to-end Optimizer for ML Inference
Peter Kraft, Daniel Kang, Deepak Narayanan, Shoumik Palkar,
Peter Bailis, Matei Zaharia
Stanford University
2. Problem: ML Inference
● Often performance-critical.
● Recent focus on tools for ML prediction serving.
3. A Common Bottleneck: Feature Computation
● Many applications are bottlenecked by feature computation.
● A pipeline of transformations computes numerical features from raw data for the model (sketched below):
Receive Raw Data → Compute Features → Predict With Model
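As a concrete illustration (not from the talk), a feature pipeline of this shape might look like the following sketch; the dataset columns, transformations, and model are all hypothetical.

```python
import numpy as np

# Hypothetical pipeline: receive raw data -> compute features -> predict.
def compute_features(raw):
    # Turn raw columns into a numerical feature matrix for the model.
    return np.stack([
        raw["price"],                    # passthrough numeric feature
        np.log1p(raw["num_purchases"]),  # log-scaled count feature
        (raw["country"] == "US") * 1.0,  # binary categorical feature
    ], axis=1)

def serve_query(model, raw):
    features = compute_features(raw)     # usually the expensive step
    return model.predict(features)
```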
4. A Common Bottleneck: Feature Computation
● Feature computation is the bottleneck when models are inexpensive (boosted trees, not DNNs).
● Common on tabular/structured data!
5. A Common Bottleneck: Feature Computation
[Figure: time breakdown of a production Microsoft sentiment analysis pipeline, in which feature computation takes >99% of the time and model run time is a sliver. Source: Pretzel (OSDI '18).]
6. Current State-of-the-art
● Apply traditional serving optimizations, e.g. caching (Clipper) and compiler optimizations (Pretzel).
● Neglect the unique statistical properties of ML apps.
8. Statistical Properties of ML
● Amenability to approximation.
Easy input: "Definitely not a dog." Hard input: "Maybe a dog?"
9. Statistical Properties of ML
● Amenability to approximation.
Existing systems: use the expensive model for both the easy input ("definitely not a dog") and the hard input ("maybe a dog?").
10. Statistical Properties of ML
● Amenability to approximation.
Statistically-aware systems: use the cheap model on the easy input (the bucket), the expensive model on the hard input (the cat).
12. Statistical Properties of ML
● The model is often part of a bigger app (e.g. a top-K query).
Problem: return the top 10 artists.

Artist             Score  Rank
Beatles            9.7       1
Bruce Springsteen  9.5       2
…                  …         …
Justin Bieber      5.6     999
Nickelback         4.1    1000
13. Statistical Properties of ML
● The model is often part of a bigger app (e.g. a top-K query).
Existing systems: use the expensive model for everything! (Same artist table as above.)
14. Statistical Properties of ML
● The model is often part of a bigger app (e.g. a top-K query).
Statistically-aware systems: rank high-value items precisely and return them; approximate and discard low-value items. (Same artist table as above.)
15. Prior Work: Statistically-Aware Optimizations
● Statistically-aware optimizations exist in the literature.
● Always application-specific and custom-built.
● Never automatic!
Source: Cheng et al. (DLRS '16), Kang et al. (VLDB '17)
16. ML Inference Dilemma
● ML inference systems:
  ○ Easy to use.
  ○ Slow.
● Statistically-aware systems:
  ○ Fast.
  ○ Require a lot of work to implement.
17. Can an ML inference system be fast and easy to use?
27. Background: Model Cascades
● Classify "easy" inputs with a cheap model.
● Cascade to an expensive model for "hard" inputs (see the sketch below).
Easy input: "Definitely not a dog." Hard input: "Maybe a dog?"
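A minimal sketch of this cascade logic, assuming scikit-learn-style models with predict_proba and integer class labels; the 0.9 confidence threshold is an illustrative choice.

```python
import numpy as np

def cascade_predict(cheap_model, expensive_model, X, threshold=0.9):
    probs = cheap_model.predict_proba(X)   # cheap model runs on every input
    confidence = probs.max(axis=1)
    preds = probs.argmax(axis=1)           # predicted class indices
    hard = confidence < threshold          # inputs the cheap model can't settle
    if hard.any():
        # Cascade only the "hard" inputs to the expensive model.
        preds[hard] = expensive_model.predict(X[hard])
    return preds
```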
28. Background: Model Cascades
● Used for image classification and object detection.
● Existing systems are application-specific and custom-built.
Source: Viola-Jones (CVPR '01), Kang et al. (VLDB '17)
29. Our Optimization: End-to-end Cascades
● Compute only some features for "easy" data inputs; cascade to computing all features for "hard" inputs.
● Automatic and model-agnostic, unlike prior work, thanks to:
  ○ Estimates of the runtime performance & accuracy of a feature set.
  ○ An efficient search process for tuning parameters.
33. End-to-end Cascades: Final Pipeline
Original pipeline: Compute All Features → Model → Prediction.
Cascade pipeline (sketched below): Compute Selected Features → Approximate Model → Confidence > Threshold? If yes, return the prediction; if no, Compute Remaining Features → Original Model → Prediction.
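A minimal sketch of this final pipeline for a single query, assuming the selected and remaining feature computations return Python lists and both trained models are given; all names are illustrative.

```python
def cascade_query(raw, selected_feats, remaining_feats,
                  approx_model, original_model, threshold):
    # Stage 1: compute only the cheap, selected features.
    s = selected_feats(raw)
    probs = approx_model.predict_proba([s])[0]
    if probs.max() > threshold:
        return probs.argmax()              # confident: approximate prediction
    # Stage 2: compute the remaining features; use the original model.
    f = s + remaining_feats(raw)           # all features together
    return original_model.predict([f])[0]
```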
34. End-to-end Cascades: Constructing Cascades
● Construct cascades during model training.
● Need the model training set and an accuracy target.
35. End-to-end Cascades: Selecting Features
[Cascade pipeline diagram as on slide 33.]
Key question: which features should be selected?
36. End-to-end Cascades: Selecting Features
● Goal: select the features that minimize expected query time given the accuracy target.
37. End-to-end Cascades: Selecting Features
[Cascade pipeline diagram as on slide 33.]
Two possibilities for a query: it can be approximated, or it cannot.
40. End-to-end Cascades: Selecting Features
In the cascade, a query takes one of two paths. With probability P(approx), the approximate model is confident and the query pays only cost(S), the cost of computing the selected feature set S. With probability P(~approx), it is not confident and the query pays cost(F), the cost of computing all features F. The expected query time to minimize is therefore:

min_S P(approx)·cost(S) + P(~approx)·cost(F)
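As an illustrative worked example with made-up numbers: if P(approx) = 0.8, cost(S) = 1 ms, and cost(F) = 10 ms, the expected query time is 0.8 · 1 + 0.2 · 10 = 2.8 ms, versus 10 ms for always computing every feature.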
41. End-to-end Cascades: Selecting Features
● Goal: select the feature set S that minimizes expected query time:
min_S P(approx)·cost(S) + P(~approx)·cost(F)
42. End-to-end Cascades: Selecting Features
● Goal: select the feature set S that minimizes expected query time (equation above).
● Approach (sketched in code below):
  ○ Choose several potential values of cost(S).
  ○ Find the best feature set for each cost(S).
  ○ Train a model & find a cascade threshold for each set.
  ○ Pick the best overall.
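A minimal sketch of this search loop, assuming hypothetical helpers find_best_feature_set and train_and_tune that implement the second and third steps; the cost fractions are illustrative choices.

```python
def construct_cascade(all_features, total_cost, train_set, accuracy_target):
    best = None
    # Choose several potential cost budgets for the selected features.
    for frac in (0.1, 0.25, 0.5):
        budget = frac * total_cost
        # Find the best feature set within this budget (knapsack step, below).
        S = find_best_feature_set(all_features, budget, train_set)
        # Train an approximate model on S and tune its cascade threshold.
        cand = train_and_tune(S, train_set, accuracy_target)
        # Keep the candidate with the lowest expected query time.
        if best is None or cand.expected_query_time < best.expected_query_time:
            best = cand
    return best
```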
51. End-to-end Cascades: Selecting Features
● Subgoal: find the S minimizing query time when cost(S) = s_max.
● Solution: find the S maximizing approximate model accuracy.
  ○ Problem: computing the true accuracy of every candidate set is expensive.
  ○ Solution: estimate accuracy via permutation importance, reducing feature selection to a knapsack problem (see the sketch below).
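A minimal sketch of this step, assuming a scikit-learn-style model, a held-out set, and known per-feature costs; the greedy ratio heuristic stands in for an exact knapsack solver.

```python
import numpy as np

def permutation_importance(model, X, y):
    # Accuracy lost when each feature's column is shuffled
    # (i.e., its information is destroyed).
    rng = np.random.default_rng(0)
    base = (model.predict(X) == y).mean()
    scores = []
    for j in range(X.shape[1]):
        Xp = X.copy()
        rng.shuffle(Xp[:, j])
        scores.append(base - (model.predict(Xp) == y).mean())
    return scores

def select_features(scores, costs, budget):
    # Greedy knapsack: take features by importance per unit cost
    # until the cost budget runs out.
    order = sorted(range(len(scores)),
                   key=lambda j: scores[j] / costs[j], reverse=True)
    chosen, spent = [], 0.0
    for j in order:
        if spent + costs[j] <= budget:
            chosen.append(j)
            spent += costs[j]
    return chosen
```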
55. End-to-end Cascades: Selecting Features
● Subgoal: train the model & find the cascade threshold for S.
● Solution: compute both empirically on held-out data (see the sketch below):
  ○ Train the approximate model on S.
  ○ Predict the held-out set, then determine the cascade threshold empirically using the accuracy target.
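A minimal sketch of the threshold search, assuming scikit-learn-style models: it returns the smallest confidence threshold (hence the most approximated queries) whose cascade still meets the accuracy target on the held-out set. All names are illustrative.

```python
import numpy as np

def tune_threshold(approx_model, original_preds, X_held, y_held, accuracy_target):
    probs = approx_model.predict_proba(X_held)
    conf = probs.max(axis=1)
    approx_preds = probs.argmax(axis=1)
    for t in np.linspace(0.5, 1.0, 51):
        # Above the threshold, use the approximate prediction;
        # otherwise fall back to the original model's prediction.
        preds = np.where(conf > t, approx_preds, original_preds)
        if (preds == y_held).mean() >= accuracy_target:
            return t
    return 1.0  # never approximate if no threshold meets the target
```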
57. End-to-end Cascades: Results
● Speedups of up to 5x without statistically significant accuracy loss.
● Full evaluation at the end of the talk!
59. Top-K Approximation: Query Overview
● Top-K problem: rank the K highest-scoring items of a dataset.
● Example: find the 10 artists a user would like most (recommender system).
60. Top-K Approximation: Asymmetry
● High-value items must be predicted and ranked precisely.
● Low-value items need only be identified as low-value.
High-value: rank precisely, return. Low-value: approximate, discard. (Same artist table as above.)
61. Top-K Approximation: How it Works
● Use an approximate model to identify and discard low-value items.
● Rank the remaining high-value items with the powerful model (see the sketch below).
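A minimal sketch of the idea, assuming scikit-learn-style regressors produce the scores; the candidate-pool multiplier ratio is a hypothetical tuning knob standing in for Willump's automatically tuned parameters.

```python
import numpy as np

def approximate_top_k(cheap_model, expensive_model, X, k, ratio=4):
    rough = cheap_model.predict(X)                    # cheap scores for all items
    candidates = np.argsort(rough)[-k * ratio:]       # keep likely high-value items
    precise = expensive_model.predict(X[candidates])  # precise scores for survivors
    top = candidates[np.argsort(precise)[-k:]]
    return top[::-1]                                  # top-K indices, best first
```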
62. Top-K Approximation: Prior Work
● Existing systems have similar ideas.
● However, we automatically generate approximate models for any ML application; prior systems don't.
● Similar challenges as in cascades.
Source: Cheng et al. (DLRS '16)
63. Top-K Approximation: Automatic Tuning
● Automatically selects features and tunes parameters to maximize performance given an accuracy target.
● Works similarly to cascades.
● See the paper for details!
70. Summary
The statistical nature of ML enables new optimizations: Willump applies them automatically for 10x speedups.
github.com/stanford-futuredata/Willump