Speaker: Alberto Scolari, PhD student @ Politecnico di Milano, Italy - alberto.scolari@polimi.it
Politecnico di Milano, Milano, 19/10/2018
Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Marco D.
Santambrogio, Markus Weimer, Matteo Interlandi
PRETZEL: Opening the Black Box
of Machine Learning Prediction Serving Systems
ML-as-a-Service
• ML models are learnt from data during training
• Trained models are deployed on cloud platforms for Prediction Serving
• Key requirements:
1. Performance: latency/throughput
2. Minimal resource usage: minimal service cost
• State-of-the-art deployment strategy: Black Box
Inside the black box: interesting facts
• Applications host multiple models per machine
(10-100s)
• Deployed models are often similar in structure and
state
- Customer personalization, Templates, Transfer
Learning
• Inside, models are DAGs
of different operators
• But with black boxes, you can apply only
external optimizations: caching, batching, …
Breaking the black-box model
We need to know structure and state: PRETZEL white-box model
1. To generate an optimised version of a model on deployment: higher performance
2. To allocate shared state only once, and share resources among models: higher density
Outline
‣ Limitations of Black Box approaches
‣ PRETZEL, White Box Prediction Serving System
‣ Evaluation
‣ Conclusions and Future Work
LIMITATIONS OF BLACK BOX
APPROACHES
Case study
• 250 Sentiment Analysis models in ML.Net (C#), run 100 times
• The first (warm-up) execution is cold; the 99 following executions are hot
• Long-tail latency, especially for cold executions: Service-Level Objectives (SLOs) cannot be ensured
• Overheads: JIT, memory allocation
• Profiling shows no clear bottleneck: the ML operator LogReg accounts for only 0.3% of runtime for simple models
• Black Box models cannot share resources
• Each model has its own container/process/
thread: overhead, poor scalability
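The cold/hot protocol of the case study can be sketched as follows. This is a minimal Python illustration, not the ML.Net/C# harness used in the talk: `make_toy_model` is a hypothetical stand-in whose first call pays a one-off setup cost, and `percentile` uses the nearest-rank definition, which is an assumption about how P99 is computed.

```python
import math
import time

def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list, with p in (0, 100]."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

def measure(predict, query, runs=100):
    """Run one model `runs` times; return (cold latency, list of hot latencies) in seconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(query)
        latencies.append(time.perf_counter() - start)
    # The first execution is the cold one, all following executions are hot.
    return latencies[0], latencies[1:]

def make_toy_model():
    """Hypothetical model: lazily builds its state on the first prediction."""
    state = {}
    def predict(text):
        if "vocab" not in state:  # one-off initialisation, paid by the cold run
            state["vocab"] = {str(n): n for n in range(50_000)}
        return sum(state["vocab"].get(token, 0) for token in text.split())
    return predict

cold, hot = measure(make_toy_model(), "1 2 3")
summary = {"cold": cold, "hot_p99": percentile(hot, 99)}
```

Repeating `measure` over every deployed model and collecting the per-model cold latencies gives the long-tail distribution the slide refers to.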
Resource waste
• Each model has its own state
• But many operators have similar/equal state, and models are
deployed together
Related work
• Optimisations for single operators, like DNNs [1-2]
• TensorFlow Serving [3] deploys models as Servable Python objects, ML.Net as zip files with state files and DLLs
• Clipper [4] and Rafiki [5] deploy pipelines as Docker containers
– They schedule requests based on latency target
– Can apply caching and batching
• MauveDB [6] accepts regression and interpolation models and optimises them as DB views
• Tensor Comprehensions [7] optimizes DNN models only via tensors
[1] https://www.microsoft.com/en-us/research/blog/microsoft-unveils-project-brainwave/
[2] In-Datacenter Performance Analysis of a Tensor Processing Unit, arXiv, Apr. 2017
[3] C. Olston, et al., TensorFlow-Serving: Flexible, high-performance ML serving. In Workshop on ML Systems at NIPS, 2017
[4] D. Crankshaw, et al., Clipper: A low-latency online prediction serving system. In NSDI, 2017
[5] W. Wang, et al., Rafiki: Machine Learning as an Analytics Service System. ArXiv e-prints, Apr. 2018
[6] A. Deshpande and S. Madden, MauveDB: Supporting model-based user views in database systems. In SIGMOD, 2006
[7] N. Vasilache, et al., Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
PRETZEL, WHITE-BOX PREDICTION
SERVING SYSTEM
Design principles
White Box Prediction Serving: make pipelines co-exist better, schedule better
1. End-to-end Optimizations: inspect models and optimise internal execution
2. Multi-model Optimizations: share data, code and resources
Model optimizations
End-to-end
1. Ahead-of-Time compilation at deployment, to minimise JIT
2. Vector pooling: pre-allocate data structures
Multi-model
1. Use the Object Store to share operators' parameters/weights
2. Sub-plan materialisation to re-use intermediate results across models
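Sub-plan materialisation can be pictured as a cache of intermediate results keyed by the chain of operators that produced them. The Python sketch below is only an illustration under stated assumptions: operators are deterministic, each operator carries a signature string that identifies both its code and its state, and the names `SubPlanCache`, `model_a` and `model_b` are hypothetical, not part of PRETZEL.

```python
class SubPlanCache:
    """Caches the output of every operator prefix so models sharing a prefix reuse it."""

    def __init__(self):
        self._results = {}
        self.executions = 0  # number of operators actually executed

    def run(self, pipeline, record):
        prefix = ()
        value = record
        for signature, fn in pipeline:
            prefix = prefix + (signature,)
            key = (prefix, record)  # same operator chain + same input => same result
            if key not in self._results:
                self._results[key] = fn(value)
                self.executions += 1
            value = self._results[key]
        return value

# Two hypothetical models that share featurization but differ in the final operator.
tokenize = ("tokenize", lambda text: tuple(text.lower().split()))
count = ("count", lambda tokens: len(tokens))
model_a = [tokenize, count, ("scale:2", lambda n: 2 * n)]
model_b = [tokenize, count, ("scale:3", lambda n: 3 * n)]

cache = SubPlanCache()
result_a = cache.run(model_a, "a b c")
result_b = cache.run(model_b, "a b c")
```

Running both models on the same record executes four operators instead of six, because the shared `tokenize` and `count` results are materialised once.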
Off-line phase - Flour+Oven
Flour API
var fContext = ...;
var Tokenizer = ...;
return fPrgm.Plan();
Oven Optimiser
[Figure 6 diagram: (1) Flour Transforms, (2) Optimization, (3) Compilation. The Model Plan comprises Logical Stages (S1, S2, S3), Physical Stages (S1, S2, S3), Params and Stats.]
Figure 6: Model optimization and compilation in PRETZEL. In (1), a model is translated into a Flour program. (2) Oven Optimizer generates a DAG of logical stages from the program. Additionally, parameters and statistics are extracted. (3) A DAG of physical stages is generated by the Oven Compiler using logical stages, parameters, and statistics. A model plan is the union of all the elements and is fed to the runtime.
… recognize that the Linear Regression can be pushed into Char and WordNgram, therefore bypassing the execution of Concat. Additionally, Tokenizer can be reused between Char and WordNgram, therefore it will be pipelined with CharNgram (in one stage) and a dependency between CharNgram and WordNGram (in another stage) will be created. The final plan will therefore be composed by … stages, versus the initial 4 operators (and vectors) of ML.Net.
Model Plan Compiler: Model plans are composed of two DAGs: a DAG composed of logical stages, and a DAG of physical stages. Logical stages are an abstraction of the stages output of the Oven Optimizer, with related parameters; physical stages contain the actual code that will be executed by the PRETZEL runtime. For each given DAG, there is a 1-to-n mapping between logical and physical stages so that a logical stage can represent the execution code of different physical implementations. A physical implementation is selected based on the parameters characterizing a logical … Plan compilation is a two…
… (such as most featurizers) are pipelined together in a single pass over the data. This strategy achieves best data locality because records are likely to reside in CPU registers [33, 38]. Compute-intensive transformations (e.g., …
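The 1-to-n mapping between logical and physical stages can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `LogicalStage` class, the `density` statistic, the 0.5 threshold and the dense/sparse pair of implementations are hypothetical and are not taken from the Oven Compiler.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class LogicalStage:
    name: str
    params: Dict[str, List[float]]  # e.g. trained weights
    stats: Dict[str, float]         # e.g. expected fraction of non-zero inputs

def dense_dot(weights, features):
    """features: list of floats, same length as weights."""
    return sum(w * x for w, x in zip(weights, features))

def sparse_dot(weights, features):
    """features: dict index -> value holding only the non-zero entries."""
    return sum(weights[i] * x for i, x in features.items())

def compile_stage(stage, density_threshold=0.5):
    """Pick one physical implementation for a logical stage from its params and stats."""
    weights = stage.params["weights"]
    if stage.stats.get("density", 1.0) < density_threshold:
        return "sparse", lambda features: sparse_dot(weights, features)
    return "dense", lambda features: dense_dot(weights, features)

weights = [0.5, -1.0, 2.0, 0.0]
sparse_kind, sparse_run = compile_stage(LogicalStage("linear", {"weights": weights}, {"density": 0.25}))
dense_kind, dense_run = compile_stage(LogicalStage("linear", {"weights": weights}, {"density": 1.0}))
```

The same logical stage compiles to different executable code depending on the statistics attached to it, which is the point of keeping logical and physical stages separate.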
Oven optimizations
• Oven optimizes models much like DB queries
• Uses a rule-based optimiser
– repeatedly looks for patterns of operators within the model DAG
– merges operators into stages
Optimization steps:
1. Initial model DAG
2. Push linear predictor back and remove Concat
3. Apply rules and group into stages
4. Add statistics and create Model Plan
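The "push linear predictor back and remove Concat" rewrite rests on a simple identity: a dot product over concatenated feature vectors equals the sum of the dot products over each part, with the weights split accordingly. The Python sketch below checks this on toy data; the function names and values are illustrative assumptions, not Oven code.

```python
def concat(*vectors):
    """Materialise one feature vector from several, as the Concat operator does."""
    out = []
    for vector in vectors:
        out.extend(vector)
    return out

def dot(weights, features):
    return sum(w * x for w, x in zip(weights, features))

def predict_with_concat(weights, char_feats, word_feats):
    """Original plan: concatenate the n-gram features, then apply the linear predictor."""
    return dot(weights, concat(char_feats, word_feats))

def predict_pushed(weights, char_feats, word_feats):
    """Rewritten plan: split the weights and score each branch, no Concat buffer needed."""
    split = len(char_feats)
    return dot(weights[:split], char_feats) + dot(weights[split:], word_feats)

weights = [0.5, -1.0, 2.0, 0.25]
char_feats = [1.0, 2.0]
word_feats = [4.0, 8.0]
before = predict_with_concat(weights, char_feats, word_feats)
after = predict_pushed(weights, char_feats, word_feats)
```

Both plans return 8.5 on this input, but the rewritten one never allocates the concatenated vector.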
On-line phase
• Two main components:
– Runtime, with an Object Store
– Scheduler
• Runtime handles physical resources: threads and buffers
• Object Store caches objects of all models
– models register and retrieve state objects via a key, such as the MD5 of the file
• Scheduler is event-based, each stage being an event
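A content-keyed Object Store can be sketched in a few lines of Python. This is an illustrative assumption of how registration by key might work: the `ObjectStore` class and its methods are hypothetical, and the key is the MD5 of a canonical JSON serialisation rather than of a state file on disk.

```python
import hashlib
import json

class ObjectStore:
    """Keeps a single copy of each distinct state object, addressed by content hash."""

    def __init__(self):
        self._objects = {}

    def register(self, state):
        """Return (key, shared object); equal states map to the same stored instance."""
        payload = json.dumps(state, sort_keys=True).encode("utf-8")
        key = hashlib.md5(payload).hexdigest()
        return key, self._objects.setdefault(key, state)

    def get(self, key):
        return self._objects[key]

    def __len__(self):
        return len(self._objects)

store = ObjectStore()
vocab_a = {"good": 1.0, "bad": -1.0}  # state loaded by a first model
vocab_b = {"bad": -1.0, "good": 1.0}  # identical state loaded by a second model
key_a, shared_a = store.register(vocab_a)
key_b, shared_b = store.register(vocab_b)
```

The second registration returns the object already held by the store, so the duplicate copy can be dropped and memory grows with the number of distinct states rather than the number of models.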
EVALUATION
Workload and testbed
• Two model classes written in ML.Net, running in ML.Net and PRETZEL
– 250 Sentiment Analysis (SA) models
– 250 Attendee Count (AC) models
• Testbed representing a small production server
– 2 × 8-core Xeon E5-2620 v4 at 2.10 GHz, HT disabled
– 32 GB RAM
– Windows 10
– .NET Core 2.0
Memory
• Experiments with all 250 AC models, which are smaller than the SA models
• With SA, only PRETZEL can load all models

Setting                     | Shared Objects | Shared Runtime
ML.Net + Clipper            |                |
ML.Net                      |                | ✓
PRETZEL without ObjectStore |                | ✓
PRETZEL                     | ✓              | ✓
Latency
• Micro-benchmark with stand-alone system, no communication
• All 250 SA models

             | ML.Net | PRETZEL
P99 (hot)    | 0.6    | 0.2
P99 (cold)   | 8.1    | 0.8
Worst (cold) | 280.2  | 6.2
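The latency figures can be turned into unit-free improvement ratios with a short Python calculation. The numbers are copied from the slide (which does not state the unit); the dictionary layout and the rounding to one decimal are choices made for this sketch.

```python
# (ML.Net, PRETZEL) latency pairs as reported on the slide.
latency = {
    "P99 (hot)": (0.6, 0.2),
    "P99 (cold)": (8.1, 0.8),
    "Worst (cold)": (280.2, 6.2),
}

# Ratio of ML.Net latency to PRETZEL latency, rounded to one decimal place.
speedup = {
    metric: round(ml_net / pretzel, 1)
    for metric, (ml_net, pretzel) in latency.items()
}
```

The ratios come out at about 3x for hot P99, about 10x for cold P99 and about 45x for the worst cold case, so the gain is largest in the cold tail.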
Throughput
• 250 AC models, run 1000 times each
• 1000 queries in a batch
• ML.Net vs PRETZEL
CONCLUSIONS AND FUTURE WORK
Conclusions and Limitations
• We addressed performance/density bottlenecks in ML inference for Model-as-a-Service
• We advocate the adoption of a white-box approach
• We apply DB query optimization techniques to ML Prediction Serving
• The paper was accepted at OSDI ’18
• Limitations:
- PRETZEL currently supports a subset of ML.Net operators
- No NN operators
- No automated code generation: implementing stages still involves some manual work
Future Work 1
• NUMA-aware Scheduler and Runtime
• Fully automated code-generation of stages:
- hardware-specific templates [8]
- Halide-based generator for CPU and GPU: no JIT anymore
• Support user-coded operators for filtering and pre-processing
[8] K. Krikellas, S. Viglas, et al., Generating code for holistic query evaluation. In ICDE, pages 613–624. IEEE Computer Society, 2010
Future Work 2
• Supporting ML.Net operators, including ONNX [9], is complex
• Not just manpower: Oven rules need to scale fairly with the number of operators
- Cannot write rules for all possible (sequences of) operators
• We need a formal framework to describe operators
- Something like Relational Algebra for query optimisers
- Maybe Tensor Algebra, like Tensor Comprehensions [10]?
[9] Open Neural Network Exchange (ONNX). https://onnx.ai, 2017
[10] Announcing Tensor Comprehensions. https://research.fb.com/announcing-tensor-comprehensions/
QUESTIONS?
Y. Lee, A. Scolari, B.-G. Chun, M. D. Santambrogio, M. Weimer, M. Interlandi
PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
https://arxiv.org/abs/1810.06115

Contenu connexe

Tendances

pMatlab on BlueGene
pMatlab on BlueGenepMatlab on BlueGene
pMatlab on BlueGenevsachde
 
11.dynamic instruction scheduling for microprocessors having out of order exe...
11.dynamic instruction scheduling for microprocessors having out of order exe...11.dynamic instruction scheduling for microprocessors having out of order exe...
11.dynamic instruction scheduling for microprocessors having out of order exe...Alexander Decker
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterIRJET Journal
 
My mapreduce1 presentation
My mapreduce1 presentationMy mapreduce1 presentation
My mapreduce1 presentationNoha Elprince
 
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup  - Alex PerrierLarge data with Scikit-learn - Boston Data Mining Meetup  - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup - Alex PerrierAlexis Perrier
 
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013James McGalliard
 
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMGRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMIJCSEA Journal
 
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...Sunny Kr
 
Enhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniqueEnhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniquejournalBEEI
 
A multi objective hybrid aco-pso optimization algorithm for virtual machine p...
A multi objective hybrid aco-pso optimization algorithm for virtual machine p...A multi objective hybrid aco-pso optimization algorithm for virtual machine p...
A multi objective hybrid aco-pso optimization algorithm for virtual machine p...eSAT Publishing House
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Intel® Software
 
MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco
MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco
MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco IJECEIAES
 
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdl
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdlIaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdl
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdlIaetsd Iaetsd
 
Design and Estimation of delay, power and area for Parallel prefix adders
Design and Estimation of delay, power and area for Parallel prefix addersDesign and Estimation of delay, power and area for Parallel prefix adders
Design and Estimation of delay, power and area for Parallel prefix addersIJERA Editor
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsHPCC Systems
 

Tendances (20)

cug2011-praveen
cug2011-praveencug2011-praveen
cug2011-praveen
 
T180304125129
T180304125129T180304125129
T180304125129
 
Cluster Schedulers
Cluster SchedulersCluster Schedulers
Cluster Schedulers
 
pMatlab on BlueGene
pMatlab on BlueGenepMatlab on BlueGene
pMatlab on BlueGene
 
11.dynamic instruction scheduling for microprocessors having out of order exe...
11.dynamic instruction scheduling for microprocessors having out of order exe...11.dynamic instruction scheduling for microprocessors having out of order exe...
11.dynamic instruction scheduling for microprocessors having out of order exe...
 
Enhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop ClusterEnhancing Performance and Fault Tolerance of Hadoop Cluster
Enhancing Performance and Fault Tolerance of Hadoop Cluster
 
I012255862
I012255862I012255862
I012255862
 
D0212326
D0212326D0212326
D0212326
 
My mapreduce1 presentation
My mapreduce1 presentationMy mapreduce1 presentation
My mapreduce1 presentation
 
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup  - Alex PerrierLarge data with Scikit-learn - Boston Data Mining Meetup  - Alex Perrier
Large data with Scikit-learn - Boston Data Mining Meetup - Alex Perrier
 
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013
SOME WORKLOAD SCHEDULING ALTERNATIVES 11.07.2013
 
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEMGRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
GRAPH MATCHING ALGORITHM FOR TASK ASSIGNMENT PROBLEM
 
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardin...
 
Enhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce TechniqueEnhancing Big Data Analysis by using Map-reduce Technique
Enhancing Big Data Analysis by using Map-reduce Technique
 
A multi objective hybrid aco-pso optimization algorithm for virtual machine p...
A multi objective hybrid aco-pso optimization algorithm for virtual machine p...A multi objective hybrid aco-pso optimization algorithm for virtual machine p...
A multi objective hybrid aco-pso optimization algorithm for virtual machine p...
 
Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*Data Analytics and Simulation in Parallel with MATLAB*
Data Analytics and Simulation in Parallel with MATLAB*
 
MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco
MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco
MetOp Satellites Data Processing for Air Pollution Monitoring in Morocco
 
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdl
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdlIaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdl
Iaetsd vlsi architecture for exploiting carry save arithmetic using verilog hdl
 
Design and Estimation of delay, power and area for Parallel prefix adders
Design and Estimation of delay, power and area for Parallel prefix addersDesign and Estimation of delay, power and area for Parallel prefix adders
Design and Estimation of delay, power and area for Parallel prefix adders
 
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC SystemsImproving Efficiency of Machine Learning Algorithms using HPCC Systems
Improving Efficiency of Machine Learning Algorithms using HPCC Systems
 

Similaire à PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems

Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Rusif Eyvazli
 
Pretzel: optimized Machine Learning framework for low-latency and high throu...
Pretzel: optimized Machine Learning framework for  low-latency and high throu...Pretzel: optimized Machine Learning framework for  low-latency and high throu...
Pretzel: optimized Machine Learning framework for low-latency and high throu...NECST Lab @ Politecnico di Milano
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsLightbend
 
Operationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML ModelsOperationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML ModelsLightbend
 
Exploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerExploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerSambit Banerjee
 
Traffic Simulator
Traffic SimulatorTraffic Simulator
Traffic Simulatorgystell
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and EngineeringVijayananda Mohire
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and EngineeringVijayananda Mohire
 
Intelligent Systems Project: Bike sharing service modeling
Intelligent Systems Project: Bike sharing service modelingIntelligent Systems Project: Bike sharing service modeling
Intelligent Systems Project: Bike sharing service modelingAlessio Villardita
 
Model-Based Integration for FMI Co-Simulation and Heterogeneous Simulations o...
Model-Based Integration for FMI Co-Simulation and Heterogeneous Simulations o...Model-Based Integration for FMI Co-Simulation and Heterogeneous Simulations o...
Model-Based Integration for FMI Co-Simulation and Heterogeneous Simulations o...Modelon
 
Uber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache FlinkUber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache FlinkWenrui Meng
 
Machine learning at scale with Google Cloud Platform
Machine learning at scale with Google Cloud PlatformMachine learning at scale with Google Cloud Platform
Machine learning at scale with Google Cloud PlatformMatthias Feys
 
Report: Test49 Geant4 Monte-Carlo Models Testing Tools
Report: Test49 Geant4 Monte-Carlo Models Testing ToolsReport: Test49 Geant4 Monte-Carlo Models Testing Tools
Report: Test49 Geant4 Monte-Carlo Models Testing ToolsRoman Atachiants
 
Evolutionary Multi-Goal Workflow Progress in Shade
Evolutionary  Multi-Goal Workflow Progress in ShadeEvolutionary  Multi-Goal Workflow Progress in Shade
Evolutionary Multi-Goal Workflow Progress in ShadeIRJET Journal
 
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTINGDYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTINGcscpconf
 
School of Computing, Science & EngineeringAssessment Briefin.docx
School of Computing, Science & EngineeringAssessment Briefin.docxSchool of Computing, Science & EngineeringAssessment Briefin.docx
School of Computing, Science & EngineeringAssessment Briefin.docxanhlodge
 
Cs 568 Spring 10 Lecture 5 Estimation
Cs 568 Spring 10  Lecture 5 EstimationCs 568 Spring 10  Lecture 5 Estimation
Cs 568 Spring 10 Lecture 5 EstimationLawrence Bernstein
 
Parallel machines flinkforward2017
Parallel machines flinkforward2017Parallel machines flinkforward2017
Parallel machines flinkforward2017Nisha Talagala
 

Similaire à PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems (20)

Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...Scaling Application on High Performance Computing Clusters and Analysis of th...
Scaling Application on High Performance Computing Clusters and Analysis of th...
 
Pretzel: optimized Machine Learning framework for low-latency and high throu...
Pretzel: optimized Machine Learning framework for  low-latency and high throu...Pretzel: optimized Machine Learning framework for  low-latency and high throu...
Pretzel: optimized Machine Learning framework for low-latency and high throu...
 
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data StreamsMachine Learning At Speed: Operationalizing ML For Real-Time Data Streams
Machine Learning At Speed: Operationalizing ML For Real-Time Data Streams
 
Operationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML ModelsOperationalizing Machine Learning: Serving ML Models
Operationalizing Machine Learning: Serving ML Models
 
Exploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access LayerExploring Neo4j Graph Database as a Fast Data Access Layer
Exploring Neo4j Graph Database as a Fast Data Access Layer
 
Traffic Simulator
Traffic SimulatorTraffic Simulator
Traffic Simulator
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 
Intelligent Systems Project: Bike sharing service modeling
Intelligent Systems Project: Bike sharing service modelingIntelligent Systems Project: Bike sharing service modeling
Intelligent Systems Project: Bike sharing service modeling
 
Model-Based Integration for FMI Co-Simulation and Heterogeneous Simulations o...
Model-Based Integration for FMI Co-Simulation and Heterogeneous Simulations o...Model-Based Integration for FMI Co-Simulation and Heterogeneous Simulations o...
Model-Based Integration for FMI Co-Simulation and Heterogeneous Simulations o...
 
Uber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache FlinkUber Business Metrics Generation and Management Through Apache Flink
Uber Business Metrics Generation and Management Through Apache Flink
 
Machine learning at scale with Google Cloud Platform
Machine learning at scale with Google Cloud PlatformMachine learning at scale with Google Cloud Platform
Machine learning at scale with Google Cloud Platform
 
Parallel Processor for Graphics Acceleration
Parallel Processor for Graphics AccelerationParallel Processor for Graphics Acceleration
Parallel Processor for Graphics Acceleration
 
Report: Test49 Geant4 Monte-Carlo Models Testing Tools
Report: Test49 Geant4 Monte-Carlo Models Testing ToolsReport: Test49 Geant4 Monte-Carlo Models Testing Tools
Report: Test49 Geant4 Monte-Carlo Models Testing Tools
 
Evolutionary Multi-Goal Workflow Progress in Shade
Evolutionary  Multi-Goal Workflow Progress in ShadeEvolutionary  Multi-Goal Workflow Progress in Shade
Evolutionary Multi-Goal Workflow Progress in Shade
 
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTINGDYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
DYNAMIC TASK PARTITIONING MODEL IN PARALLEL COMPUTING
 
COCOMO
COCOMOCOCOMO
COCOMO
 
School of Computing, Science & EngineeringAssessment Briefin.docx
School of Computing, Science & EngineeringAssessment Briefin.docxSchool of Computing, Science & EngineeringAssessment Briefin.docx
School of Computing, Science & EngineeringAssessment Briefin.docx
 
Cs 568 Spring 10 Lecture 5 Estimation
Cs 568 Spring 10  Lecture 5 EstimationCs 568 Spring 10  Lecture 5 Estimation
Cs 568 Spring 10 Lecture 5 Estimation
 
Parallel machines flinkforward2017
Parallel machines flinkforward2017Parallel machines flinkforward2017
Parallel machines flinkforward2017
 

Plus de NECST Lab @ Politecnico di Milano

Embedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingEmbedding based knowledge graph link prediction for drug repurposing
Embedding based knowledge graph link prediction for drug repurposingNECST Lab @ Politecnico di Milano
 
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...
PLASTER - PYNQ-based abandoned object detection using a map-reduce approach o...NECST Lab @ Politecnico di Milano
 
EMPhASIS - An EMbedded Public Attention Stress Identification System
 EMPhASIS - An EMbedded Public Attention Stress Identification System EMPhASIS - An EMbedded Public Attention Stress Identification System
EMPhASIS - An EMbedded Public Attention Stress Identification SystemNECST Lab @ Politecnico di Milano
 
Maeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingMaeve - Fast genome analysis leveraging exact string matching
Maeve - Fast genome analysis leveraging exact string matchingNECST Lab @ Politecnico di Milano
 


PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems

  • 1. PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems
      Yunseong Lee, Alberto Scolari, Byung-Gon Chun, Marco D. Santambrogio, Markus Weimer, Matteo Interlandi
      Speaker: Alberto Scolari, PhD student @ Politecnico di Milano, Italy - alberto.scolari@polimi.it
      Politecnico di Milano, Milano, 19/10/2018
  • 2. ML-as-a-Service
      • ML models are learnt from data during training
      • They are then deployed on cloud platforms for Prediction Serving
      • Key requirements:
          1. Performance: latency/throughput
          2. Minimal resource usage: minimal service cost
      • State-of-the-art deployment strategy: Black Box
  • 3. Inside the black box: interesting facts
      • Applications host multiple models per machine (10s-100s)
      • Deployed models are often similar in structure and state
          - Customer personalization, Templates, Transfer Learning
      • Inside, models are DAGs of different operators
      • But with black boxes, only external optimizations can be applied: caching, batching, …
  • 4. Breaking the black-box model
      We need to know structure and state: the PRETZEL white-box model
          1. To generate an optimised version of a model on deployment: higher performance
          2. To allocate shared state only once, and share resources among models: higher density
  • 5. Outline
      ‣ Limitations of Black Box approaches
      ‣ PRETZEL, White Box Prediction Serving System
      ‣ Evaluation
      ‣ Conclusions and Future Work
  • 6. LIMITATIONS OF BLACK BOX APPROACHES
  • 7. Case study
      • 250 Sentiment Analysis models in ML.Net (C#), each run 100 times
      • The first (warm-up) execution is cold, the 99 following executions are hot
      • Long-tail latency, especially when cold: cannot ensure Service-Level Objectives (SLOs)
      • Overheads: JIT, memory allocation
      • Profiling shows no clear bottleneck, with the ML operator LogReg accounting for 0.3% of runtime for simple models
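The cold/hot protocol of the case study can be sketched as follows. This is only an illustration of the measurement pattern (one cold call, then 99 hot calls), not the harness used in the talk; `LazyModel` and `measure` are hypothetical names, and the lazy weight allocation merely stands in for one-off costs such as JIT and memory allocation.

```python
import time


class LazyModel:
    """Hypothetical stand-in for a model that pays one-off setup costs on first use."""

    def __init__(self):
        self._weights = None

    def predict(self, x):
        if self._weights is None:
            # Cold path: one-off initialisation (assumed stand-in for JIT/allocation).
            self._weights = [0.5] * 10_000
        return sum(w * x for w in self._weights[:10])


def measure(model, runs=100):
    """Return (cold_latency, hot_latencies) in seconds for `runs` predictions."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        model.predict(1.0)
        latencies.append(time.perf_counter() - start)
    return latencies[0], latencies[1:]


cold, hot = measure(LazyModel())
```

Keeping the first measurement separate from the rest is what makes the long tail visible: averaging all 100 runs together would hide the cold execution.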
  • 8. Resource waste
      • Black Box models cannot share resources
      • Each model has its own container/process/thread: overhead, poor scalability
      • Each model has its own state
      • But many operators have similar/equal state, and models are deployed together
  • 9. Related work
      • Optimisations for single operators, like DNNs [1-2]
      • TensorFlow Serving [3] deploys models as Servable Python objects, ML.Net as zip files with state files and DLLs
      • Clipper [4] and Rafiki [5] deploy pipelines as Docker containers
          – They schedule requests based on latency target
          – Can apply caching and batching
      • MauveDB [6] accepts regression and interpolation models and optimises them as DB views
      • Tensor Comprehensions [7] optimizes DNN models only via tensors
      [1] https://www.microsoft.com/en-us/research/blog/microsoft-unveils-project-brainwave/
      [2] In-Datacenter Performance Analysis of a Tensor Processing Unit, arXiv, Apr. 2017
      [3] C. Olston, et al., TensorFlow-Serving: Flexible, high-performance ML serving. In Workshop on ML Systems at NIPS, 2017
      [4] D. Crankshaw, et al., Clipper: A low-latency online prediction serving system. In NSDI, 2017
      [5] W. Wang, et al., Rafiki: Machine Learning as an Analytics Service System. ArXiv e-prints, Apr. 2018
      [6] A. Deshpande and S. Madden, MauveDB: Supporting model-based user views in database systems. In SIGMOD, 2006
      [7] N. Vasilache, et al., Tensor Comprehensions: Framework-Agnostic High-Performance Machine Learning Abstractions
  • 11. Design principles
      White Box Prediction Serving: make pipelines co-exist better, schedule better
          1. End-to-end Optimizations: inspect models and optimise internal execution
          2. Multi-model Optimizations: share data, code and resources
  • 12. Model optimizations
      End-to-end
          1. Ahead-of-Time compilation at deployment, to minimise JIT
          2. Vector pooling: pre-allocate data structures
      Multi-model
          1. Use the Object Store to share operators' parameters/weights
          2. Sub-plan materialisation to re-use intermediate results across models
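Vector pooling can be illustrated with a minimal sketch: buffers are allocated once up front and recycled, so the prediction path does not allocate. `VectorPool` and its methods are hypothetical names chosen for this example; they are not PRETZEL's API, and the fallback on exhaustion is an assumption.

```python
class VectorPool:
    """Pre-allocates fixed-size buffers once and recycles them between requests."""

    def __init__(self, count, size):
        self._size = size
        self._free = [[0.0] * size for _ in range(count)]
        self.allocations = count  # number of buffers ever allocated

    def acquire(self):
        if self._free:
            return self._free.pop()
        # Pool exhausted: fall back to a fresh allocation (assumed policy).
        self.allocations += 1
        return [0.0] * self._size

    def release(self, buf):
        # Reset the contents in place so the next user sees a clean buffer.
        for i in range(len(buf)):
            buf[i] = 0.0
        self._free.append(buf)


pool = VectorPool(count=2, size=4)
a = pool.acquire()
a[0] = 1.0
pool.release(a)
b = pool.acquire()  # the buffer released above is handed out again
```

The design choice is to trade a fixed memory footprint at deployment time for the absence of allocation work on the request path.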
  • 13. Off-line phase - Flour+Oven
      Slide labels: Flour API, Oven Optimiser
      Flour API code fragment (elided on the slide): var fContext = ...; var Tokenizer = ...; return fPrgm.Plan();
      [Figure 6: Model optimization and compilation in PRETZEL. In (1), a model is translated into a Flour program. (2) Oven Optimizer generates a DAG of logical stages from the program. Additionally, parameters and statistics are extracted. (3) A DAG of physical stages is generated by the Oven Compiler using logical stages, parameters, and statistics. A model plan is the union of all the elements and is fed to the runtime.]
      Paper excerpts shown on the slide (truncated in the source):
      • "… (such as most featurizers) are pipelined together in a single pass over the data. This strategy achieves best data locality because records are likely to reside in CPU registers [33, 38]. Compute-intensive transformations (e.g., …"
      • "… recognize that the Linear Regression can be pushed into Char and WordNgram, therefore bypassing the execution of Concat. Additionally, Tokenizer can be reused between Char and WordNgram, therefore it will be pipelined with CharNgram (in one stage) and a dependency between CharNgram and WordNGram (in another stage) will be created. The final plan will therefore be composed by … stages, versus the initial 4 operators (and vectors) of ML…"
      • "Model Plan Compiler: Model plans are composed of two DAGs: a DAG composed of logical stages, and a DAG of physical stages. Logical stages are an abstraction of the stages output of the Oven Optimizer, with related parameters; physical stages contains the actual code that will be executed by the PRETZEL runtime. For each given DAG, there is a 1-to-n mapping between logical to physical stages so that a logical stage can represent the execution code of different physical implementations. A physical implementation is selected based on the parameters characterizing a logical stage … Plan compilation is a two…"
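The 1-to-n mapping from a logical stage to physical implementations can be sketched as below. This is an assumption-laden illustration, not the Oven Compiler: the `density` statistic, the 0.5 threshold, and the dense/sparse pair of implementations are all hypothetical choices made only to show selection driven by stage statistics.

```python
def dot_dense(weights, features):
    """Physical implementation for dense feature vectors (a plain list)."""
    return sum(w * x for w, x in zip(weights, features))


def dot_sparse(weights, features):
    """Physical implementation for sparse features given as {index: value}."""
    return sum(weights[i] * v for i, v in features.items())


# One logical operation, several candidate physical implementations.
PHYSICAL_IMPLS = {"dense": dot_dense, "sparse": dot_sparse}


def compile_stage(logical_stage):
    """Pick a physical implementation from the logical stage's statistics."""
    density = logical_stage["stats"]["density"]
    kind = "dense" if density >= 0.5 else "sparse"  # hypothetical threshold
    return kind, PHYSICAL_IMPLS[kind]


kind, impl = compile_stage({"op": "LinearPredictor", "stats": {"density": 0.01}})
score = impl([1.0, 2.0, 3.0], {2: 2.0})
```

Because the choice is made once at compilation time, the runtime only ever executes the already-selected function.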
  • 14. Oven optimizations
      • Oven optimizes models much like DB queries
      • Uses a rule-based optimiser
          – repeatedly looks for patterns of operators within the model DAG
          – merges operators into stages
      Steps shown on the slide:
          1. Initial model DAG
          2. Push linear predictor back and remove Concat
          3. Apply rules and group into stages
          4. Add statistics and create Model Plan
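A rule-based optimiser of this kind can be sketched as a loop that applies rewrite rules until none fires. The sketch below is deliberately simplified and is not Oven's rule set: it works on a linear chain of operators rather than a DAG, has a single hypothetical rule (`fuse_adjacent`), and the `PIPELINEABLE` tagging is an assumption made for illustration.

```python
# Hypothetical tagging of operators that may be pipelined into one stage.
PIPELINEABLE = {"Tokenizer", "CharNgram"}


def fuse_adjacent(stages):
    """Rule: merge two neighbouring stages if all their operators are pipelineable.

    Returns the rewritten stage list, or None when the pattern is not found.
    """
    for i in range(len(stages) - 1):
        left, right = stages[i], stages[i + 1]
        if all(op in PIPELINEABLE for op in left + right):
            return stages[:i] + [left + right] + stages[i + 2:]
    return None


RULES = [fuse_adjacent]


def optimise(ops):
    """Start with one stage per operator and apply rules until a fixpoint."""
    stages = [[op] for op in ops]
    changed = True
    while changed:
        changed = False
        for rule in RULES:
            result = rule(stages)
            if result is not None:
                stages, changed = result, True
    return stages


plan = optimise(["Tokenizer", "CharNgram", "WordNgram", "Concat", "LinearPredictor"])
```

Adding a new optimisation then means appending a function to `RULES`, which is also why the number of rules becomes a scaling concern as the operator set grows.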
  • 15. On-line phase
      • Two main components:
          – Runtime, with an Object Store
          – Scheduler
      • Runtime handles physical resources: threads and buffers
      • Object Store caches objects of all models
          – models register and retrieve state objects via a key, like the file MD5
      • Scheduler is event-based, each stage being an event
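The key-based sharing of state can be sketched with a content-addressed store: two models whose state files have the same MD5 end up holding the same in-memory object. `ObjectStore`, `register`, and `get` are hypothetical names for this illustration, and the `loader` callback is an assumption about how raw state becomes an object.

```python
import hashlib


class ObjectStore:
    """Caches state objects under the MD5 of their serialized bytes."""

    def __init__(self):
        self._objects = {}

    def register(self, state_bytes, loader):
        """Load the state only if no object with the same MD5 key exists yet."""
        key = hashlib.md5(state_bytes).hexdigest()
        if key not in self._objects:
            self._objects[key] = loader(state_bytes)
        return key

    def get(self, key):
        return self._objects[key]

    def __len__(self):
        return len(self._objects)


def load_vocab(raw):
    """Hypothetical loader: turn raw bytes into a vocabulary list."""
    return raw.decode().split()


store = ObjectStore()
key_a = store.register(b"good bad great", load_vocab)  # model A's vocabulary
key_b = store.register(b"good bad great", load_vocab)  # model B, identical state
```

Here both models receive the same key, so the vocabulary is materialised once regardless of how many models reference it.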
  • 17. Workload and testbed
      • Two model classes written in ML.NET, running in ML.NET and PRETZEL
          – 250 Sentiment Analysis (SA) models
          – 250 Attendee Count (AC) models
      • Testbed representing a small production server
          – 2 × 8-core Xeon E5-2620 v4 at 2.10 GHz, HT disabled
          – 32 GB RAM
          – Windows 10
          – .Net Core 2.0
  • 18. Memory
      • Experiments with all 250 AC models, which are smaller than the SA models
      • With SA, only PRETZEL can load all models

      Setting                        Shared Objects   Shared Runtime
      ML.Net + Clipper
      ML.Net                                          ✓
      PRETZEL without ObjectStore                     ✓
      PRETZEL                        ✓                ✓
  • 19. Latency
      • Micro-benchmark with stand-alone system, no communication
      • All 250 SA models

                      ML.Net   PRETZEL
      P99 (hot)       0.6      0.2
      P99 (cold)      8.1      0.8
      Worst (cold)    280.2    6.2
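The gap between the two systems is easier to read as ratios. The short computation below uses only the figures from the latency table; the ratios are unit-free, so no assumption about the table's unit is needed.

```python
# (ML.Net, PRETZEL) latencies as listed in the table above.
latency = {
    "P99 (hot)": (0.6, 0.2),
    "P99 (cold)": (8.1, 0.8),
    "Worst (cold)": (280.2, 6.2),
}

# Ratio of ML.Net latency to PRETZEL latency, rounded to one decimal place.
speedup = {name: round(mlnet / pretzel, 1) for name, (mlnet, pretzel) in latency.items()}

for name, ratio in speedup.items():
    print(f"{name}: {ratio}x")
```

The ratios come out at roughly 3x for hot P99, 10x for cold P99, and 45x for the worst cold case, so the largest gain is on the cold tail.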
  • 20. Throughput
      • 250 AC models, run 1000 times each
      • 1000 queries in a batch
      • ML.Net vs PRETZEL
  • 22. Conclusions and Limitations
      • We addressed performance/density bottlenecks in ML inference for Model-as-a-Service
      • We advocate the adoption of a white-box approach
      • We apply DB query optimization techniques to ML Prediction Serving
      • We were accepted at OSDI ’18
      • Limitations:
          - PRETZEL currently supports a subset of ML.Net operators
          - No NN operators
          - No automated code generation: implementing stages still involves some manual work
  • 23. Future Work 1
      • NUMA-aware Scheduler and Runtime
      • Fully automated code-generation of stages:
          - hardware-specific templates [8]
          - Halide-based generator for CPU and GPU: no JIT anymore
      • Support user-coded operators for filtering and pre-processing
      [8] K. Krikellas, S. Viglas, et al., Generating code for holistic query evaluation. In ICDE, pages 613–624. IEEE Computer Society, 2010
  • 24. Future Work 2
      • Supporting ML.Net operators, including ONNX [9], is complex
      • Not just manpower: Oven rules need to scale fairly with the number of operators
          - Cannot write rules for all possible (sequences of) operators
      • We need a formal framework to describe operators
          - Something like Relational Algebra for query optimisers
          - Maybe Tensor Algebra, like Tensor Comprehensions [10]?
      [9] Open Neural Network Exchange (ONNX). https://onnx.ai, 2017
      [10] Announcing Tensor Comprehensions. https://research.fb.com/announcing-tensor-comprehensions/
      QUESTIONS?
      Y. Lee, A. Scolari, B.-G. Chun, M. D. Santambrogio, M. Weimer, M. Interlandi, PRETZEL: Opening the Black Box of Machine Learning Prediction Serving Systems. https://arxiv.org/abs/1810.06115