SlideShare a Scribd company logo
1 of 49
Download to read offline
Matteo Interlandi, Karla Saur
Hummingbird
Overview
Machine Learning Prediction Serving
Machine Learning Prediction Serving
Machine Learning Prediction Serving
1. Models are learned from data
Data
Training
Model
Learn
Machine Learning Prediction Serving
1. Models are learned from data
2. Models are deployed and served together
Prediction serving
Users
Server
Data
Training
Model
Learn Deploy
Model Serving
Specialized Systems have been developed
Model Serving
Specialized Systems have been developed Focus: Deep Learning (DL)
Model Serving
Specialized Systems have been developed
Support for traditional ML methods is largely overlooked
Focus: Deep Learning (DL)
Traditional ML Models
2019 Kaggle Survey: The State of Data Science & Machine Learning Data Science through the looking glass: https://arxiv.org/abs/1912.09536
Problem: Lack of Optimizations for
Traditional ML Serving
Systems for training traditional ML models are not optimized for serving
Traditional ML models are expressed using imperative code in an ad-hoc
fashion, not using a shared logical abstraction
Traditional ML models cannot natively exploit hardware acceleration
Tokeniz
er
How do “Traditional ML Models” look inside?
<Example: Binary Classification>
Char
Ngram
Word
Ngram
Conca
t
Logistic
Regressio
n
0 vs 1
Traditional ML Model
A B C D
a 0.1 c 0.5
Split
How do “Traditional ML Models” look inside?
Scaler
OneHot
Conca
t
Logistic
Regressio
n
DAG of Operators (aka pipeline)
A B C D
a 0.1 c 0.5 0 vs 1
<Example: Binary Classification>
Split
How do “Traditional ML Models” look inside?
Scaler
OneHot
Conca
t
Logistic
Regressio
n
DAG of Operators (aka pipeline)
A B C D
a 0.1 c 0.5 0 vs 1
<Example: Binary Classification>
Featurizers
Split
How do “Traditional ML Models” look inside?
Scaler
OneHot
Conca
t
Logistic
Regressio
n
DAG of Operators (aka pipeline)
Featurizers
Predictor
A B C D
a 0.1 c 0.5 0 vs 1
<Example: Binary Classification>
Split
How do “Traditional ML Models” look inside?
Scaler
OneHot
Conca
t
Logistic
Regressio
n
DAG of Operators (aka pipeline)
A B C D
a 0.1 c 0.5 0 vs 1
<Example: Binary Classification>
Split
How do “Traditional ML Models” look inside?
Scaler
OneHot
Conca
t
Logistic
Regressio
n
DAG of Operators (aka pipeline)
Split input
into
cat  num
A B C D
a 0.1 c 0.5 0 vs 1
<Example: Binary Classification>
Split
How do “Traditional ML Models” look inside?
Scaler
OneHot
Conca
t
Logistic
Regressio
n
DAG of Operators (aka pipeline)
Normalize
num
A B C D
a 0.1 c 0.5 0 vs 1
<Example: Binary Classification>
Split input
into
cat  num One hot
encode cat
Split
How do “Traditional ML Models” look inside?
Scaler
OneHot
Conca
t
Logistic
Regressio
n
DAG of Operators (aka pipeline)
Merge two
vectors
A B C D
a 0.1 c 0.5 0 vs 1
<Example: Binary Classification>
Split input
into
cat  num
Normalize
num
One hot
encode cat
Split
How do “Traditional ML Models” look inside?
Scaler
OneHot
Conca
t
Logistic
Regressio
n
DAG of Operators (aka pipeline)
Merge two
vectors
Compute
final score
A B C D
a 0.1 c 0.5 0 vs 1
<Example: Binary Classification>
Split input
into
cat  num
Normalize
num
One hot
encode cat
Deep Learning
Primarily relies on the abstraction of tensors
Deep Learning
DL models are expressed as a DAG of tensor operators
w1 b1
X
Mat
Mul
Add ReLU
Mat
Mul
Add Sigmoid
w1 b1
User
Input
Systems for DL Prediction Serving
Exploit the abstraction of tensor operations to support multiple DL frameworks
on multiple target environments
Mat
Mul
Add ReLU …
✔ Efficient implementations
✔ Declarative
✔ Seamless hardware acceleration
✔ Reduced engineering efforts
Benefit
s:
Hummingbird
Mat
Mul
Add ReLU …
Deep
Learning
Traditional
ML
DL Prediction
Serving Systems
Converting ML Operators into Tensor Operations
Observation: pipelines are composed of two classes of operators
Algebraic Operations: E.g., Linear Regression
Algorithmic Operations: E.g., RandomForest, OneHotEncoder
Y = wX + b
Complex data access patterns and control-flow patterns!
Introduce redundancies, both computational and storage
Make data access patterns and control flow uniform for all inputs
Our Solution:
Depending on the level of redundancy introduced there can be
more than one potential compilation approach
Hummingbird picks the one that works given pipeline statistics
Converting Decision tree-based models
Converting Decision tree-based models
Converting Decision tree-based models
Converting Decision tree-based models
Converting Decision tree-based models
<=
Converting Decision tree-based models
<=
Converting Decision tree-based models
<=
Converting Decision tree-based models
<=
Converting Decision tree-based models
<=
Compiling Decision Tree-based Models
Above approach (GEMM approach) essentially evaluates all paths in a
decision tree model: computation redundancy.
Works surprisingly well on modern hardware for many cases!
Two other tree traversal-based methods that exploit the tree structure.
For tall trees (e.g., LightGBM) For bushy trees (e.g., XGBoost)
Tree Traversal Method
Tree Traversal Method
Initial: 0
repeat while < max tree depth
Gather
Feature Ids
Featu
re id
X
Gather Feature
value
Gather
Thresholds
Threshold
value
<
Co
nd.
Where
Lefts
Rights
Tr
ue
Fal
se
Perfect Tree Traversal Method
3
F3
< 0.5
F2
< 2.0 F5
< 5.5
C1 C2 C1
F3
< 2.4
C2 C1
true false
true true
true
false false
false
F3
< 0.5
F2
< 2.0 F5
< 5.5
C1 C1
F3
< 2.4
C2 C1
true false
true true
true
false false
false
C1 C2 C2 C1
Operator Group Supported Operators
Linear Classifiers Logistic Regression, Linear SVC, LinearSVR, SVC, SVR, NuSVC, SGDClassifier,
LogisticRegressionCV
Tree Methods DecisionTreeClassifier/Regressor, RandomForestClassifier/Regressor,
(Hist)GradientBoostingClassifier/Regressor, ExtraTreesClassifier/Regressor,
XGBClassifier/Regressor, LGBMClassifier/Regressor/Ranker
Neural Networks MLPClassifier
Others BernouliNB, Kmeans, MeanShift
Feature Selectors SelectKBest
Decomposition PCA, TruncatedSVD
Feature Pre-Processing SimpleImputer, Imputer, ColumnTransformer, RobustScaler, MaxAbsScaler,
MinMaxScaler, StandardScaler, Binarizer, KBinsDiscretizer, Normalizer,
PolynomialFeatures, OneHotEncoder, LabelEncoder, FeatureHasher
Supported Operators
High-level System Design
Hummingbird
Trained
Traditional
ML Pipelines
DL Prediction
Serving Systems
End-to-End Pipeline Evaluation
4
Hardware Setup Experimental Workload
Hummingbird can translate 2328 pipelines (88%).
Perform inference on 20% of the dataset.
TorchScript as the backend for Hummingbird.
Scikit-Learn pipelines for OpenML-CC18
benchmark which has 72 datasets.
Azure NC6 v2 machine
Intel Xeon E5-2690
v4@ 2.6GHz (6 cores)
112 GB RAM
Nvidia P100 Ubuntu 18.04,
PyTorch 1.3, TVM 0.6, CUDA 10,
RAPIDS 0.9
End-to-End Pipeline Evaluation
4
CPU
60%
1200 X
60 X
End-to-End Pipeline Evaluation
4
CPU GPU
60%
1200 X
60 X
73%
1000 X
130 X
Main reasons for slowdowns: Sparse input data, small inference datasets.
43
Demo
Hummingbird
Updates
• Hummingbird has reached > 21K PyPI
downloads and 2.4k stars
• Demoed at Microsoft Ignite
• Integrated with ONNX converter tools
• OSDI paper
• New features include:
• Pandas Dataframes
• PySparkML support
• TVM support
• Looking for new users/contributors!
45
Thank you!
hummingbird-dev@microsoft.com
Tree-Models Microbenchmark
46
Experimental Workload: Nvidia Gradient Boosting Algorithm Benchmark*
Dataset Rows #Features Task
Fraud 285k 28 BinaryClass
Year 512k 90 Regression
Covtype 581k 54 MultiClass
Epsilon 500k 2000 BinaryClass
3 Models: RandomForest, XGBoost, LightGBM.
80/20 train/test split.
Batch inference (batch size 10k w/ and w/o
GPU.
(* https://github.com/NVIDIA/gbm-bench)
Tree-Models Microbenchmark
47
Algorithm Dataset
Sklearn
(CPU Baseline)
Hummingbird (CPU) RAPIDS
(GPU Baseline)
Hummingbird (GPU)
TorchScript TVM TorchScript TVM
Rand. Forest
Fraud
Year
Covtype
Epsilon
LightGBM
Fraud
Year
Covtype
Epsilon
XGBoost
Fraud
Year
Covtype
Epsilon
(All runtimes are reported in seconds. More datasets and experimental results in the paper.)
Tree-Models Microbenchmark
48
Algorithm Dataset
Sklearn
(CPU Baseline)
Hummingbird (CPU) RAPIDS
(GPU Baseline)
Hummingbird (GPU)
TorchScript TVM TorchScript TVM
Rand. Forest
Fraud 2.5 7.8 3.0
Year 1.9 7.7 1.4
Covtype 5.9 16.5 6.8
Epsilon 9.8 13.9 6.6
LightGBM
Fraud 3.4 7.6 1.7
Year 5.0 7.6 1.6
Covtype 51.1 79.5 27.2
Epsilon 10.5 14.5 4.0
XGBoost
Fraud 1.9 7.6 1.6
Year 3.1 7.6 1.6
Covtype 42.3 79.0 26.4
Epsilon 7.6 14.8 4.2
(All runtimes are reported in seconds. More datasets and experimental results in the paper.)
Tree-Models Microbenchmark
49
Algorithm Dataset
Sklearn
(CPU Baseline)
Hummingbird (CPU) RAPIDS
(GPU Baseline)
Hummingbird (GPU)
TorchScript TVM TorchScript TVM
Rand. Forest
Fraud 2.5 7.8 3.0 !SUPPORTED 0.044 0.015
Year 1.9 7.7 1.4 !SUPPORTED 0.045 0.026
Covtype 5.9 16.5 6.8 !SUPPORTED 0.110 0.047
Epsilon 9.8 13.9 6.6 !SUPPORTED 0.130 0.13
LightGBM
Fraud 3.4 7.6 1.7 0.014 0.044 0.014
Year 5.0 7.6 1.6 0.023 0.045 0.025
Covtype 51.1 79.5 27.2 !SUPPORTED 0.620 0.250
Epsilon 10.5 14.5 4.0 0.150 0.130 0.120
XGBoost
Fraud 1.9 7.6 1.6 0.013 0.044 0.015
Year 3.1 7.6 1.6 0.022 0.045 0.026
Covtype 42.3 79.0 26.4 !SUPPORTED 0.620 0.250
Epsilon 7.6 14.8 4.2 0.150 0.130 0.120
(All runtimes are reported in seconds. More datasets and experimental results in the paper.)

More Related Content

What's hot

Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowDatabricks
 
Restoring and Non-Restoring division algo for CSE
Restoring and Non-Restoring division algo for CSERestoring and Non-Restoring division algo for CSE
Restoring and Non-Restoring division algo for CSEARoy10
 
Building a Data Science as a Service Platform in Azure with Databricks
Building a Data Science as a Service Platform in Azure with DatabricksBuilding a Data Science as a Service Platform in Azure with Databricks
Building a Data Science as a Service Platform in Azure with DatabricksDatabricks
 
Neural Networks and Genetic Algorithms Multiobjective acceleration
Neural Networks and Genetic Algorithms Multiobjective accelerationNeural Networks and Genetic Algorithms Multiobjective acceleration
Neural Networks and Genetic Algorithms Multiobjective accelerationArmando Vieira
 
15CS44 MP & MC Module 2
15CS44 MP & MC Module  215CS44 MP & MC Module  2
15CS44 MP & MC Module 2RLJIT
 
Machine Learning Classifiers
Machine Learning ClassifiersMachine Learning Classifiers
Machine Learning ClassifiersMostafa
 
Binary Class and Multi Class Strategies for Machine Learning
Binary Class and Multi Class Strategies for Machine LearningBinary Class and Multi Class Strategies for Machine Learning
Binary Class and Multi Class Strategies for Machine LearningPaxcel Technologies
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation MethodSHUBHAM GUPTA
 
Cross validation
Cross validationCross validation
Cross validationRidhaAfrawe
 
Addressing modes of 8086
Addressing modes of 8086Addressing modes of 8086
Addressing modes of 8086Dr. AISHWARYA N
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisJaclyn Kokx
 
Parallel Algorithm Models
Parallel Algorithm ModelsParallel Algorithm Models
Parallel Algorithm ModelsMartin Coronel
 
Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning ClusteringRupak Roy
 
Logistic regression : Use Case | Background | Advantages | Disadvantages
Logistic regression : Use Case | Background | Advantages | DisadvantagesLogistic regression : Use Case | Background | Advantages | Disadvantages
Logistic regression : Use Case | Background | Advantages | DisadvantagesRajat Sharma
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Jen Aman
 
Back propagation
Back propagationBack propagation
Back propagationNagarajan
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningEng Teong Cheah
 
Knowledge Discovery Using Data Mining
Knowledge Discovery Using Data MiningKnowledge Discovery Using Data Mining
Knowledge Discovery Using Data Miningparthvora18
 

What's hot (20)

Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
 
Parallel Processing Concepts
Parallel Processing Concepts Parallel Processing Concepts
Parallel Processing Concepts
 
Restoring and Non-Restoring division algo for CSE
Restoring and Non-Restoring division algo for CSERestoring and Non-Restoring division algo for CSE
Restoring and Non-Restoring division algo for CSE
 
Building a Data Science as a Service Platform in Azure with Databricks
Building a Data Science as a Service Platform in Azure with DatabricksBuilding a Data Science as a Service Platform in Azure with Databricks
Building a Data Science as a Service Platform in Azure with Databricks
 
Neural Networks and Genetic Algorithms Multiobjective acceleration
Neural Networks and Genetic Algorithms Multiobjective accelerationNeural Networks and Genetic Algorithms Multiobjective acceleration
Neural Networks and Genetic Algorithms Multiobjective acceleration
 
15CS44 MP & MC Module 2
15CS44 MP & MC Module  215CS44 MP & MC Module  2
15CS44 MP & MC Module 2
 
Machine Learning Classifiers
Machine Learning ClassifiersMachine Learning Classifiers
Machine Learning Classifiers
 
Binary Class and Multi Class Strategies for Machine Learning
Binary Class and Multi Class Strategies for Machine LearningBinary Class and Multi Class Strategies for Machine Learning
Binary Class and Multi Class Strategies for Machine Learning
 
K-Folds Cross Validation Method
K-Folds Cross Validation MethodK-Folds Cross Validation Method
K-Folds Cross Validation Method
 
Cross validation
Cross validationCross validation
Cross validation
 
Addressing modes of 8086
Addressing modes of 8086Addressing modes of 8086
Addressing modes of 8086
 
AI_session 23 Resolution.pptx
AI_session 23 Resolution.pptxAI_session 23 Resolution.pptx
AI_session 23 Resolution.pptx
 
Introduction to Linear Discriminant Analysis
Introduction to Linear Discriminant AnalysisIntroduction to Linear Discriminant Analysis
Introduction to Linear Discriminant Analysis
 
Parallel Algorithm Models
Parallel Algorithm ModelsParallel Algorithm Models
Parallel Algorithm Models
 
Machine Learning Clustering
Machine Learning ClusteringMachine Learning Clustering
Machine Learning Clustering
 
Logistic regression : Use Case | Background | Advantages | Disadvantages
Logistic regression : Use Case | Background | Advantages | DisadvantagesLogistic regression : Use Case | Background | Advantages | Disadvantages
Logistic regression : Use Case | Background | Advantages | Disadvantages
 
Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow Large Scale Deep Learning with TensorFlow
Large Scale Deep Learning with TensorFlow
 
Back propagation
Back propagationBack propagation
Back propagation
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Knowledge Discovery Using Data Mining
Knowledge Discovery Using Data MiningKnowledge Discovery Using Data Mining
Knowledge Discovery Using Data Mining
 

Similar to Tensors Are All You Need: Faster Inference with Hummingbird

StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkSri Ambati
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitDatabricks
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model CompressionApache MXNet
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...inside-BigData.com
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemDatabricks
 
11.1. PPT on How to crack ML Competitions all steps explained.pptx
11.1. PPT on How to crack ML Competitions all steps explained.pptx11.1. PPT on How to crack ML Competitions all steps explained.pptx
11.1. PPT on How to crack ML Competitions all steps explained.pptxhu153574
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMsSylvainGugger
 
Data Con LA 2022 - AutoDC + AutoML = your AI development superpower
Data Con LA 2022 - AutoDC + AutoML = your AI development superpowerData Con LA 2022 - AutoDC + AutoML = your AI development superpower
Data Con LA 2022 - AutoDC + AutoML = your AI development superpowerData Con LA
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...PyData
 
IncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudIncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudGábor Szárnyas
 
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnPrediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnJosef A. Habdank
 
Generalized Linear Models with H2O
Generalized Linear Models with H2O Generalized Linear Models with H2O
Generalized Linear Models with H2O Sri Ambati
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Spark Summit
 
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D... Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...Databricks
 
AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...Institute of Contemporary Sciences
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...Chester Chen
 
Revolutionise your Machine Learning Workflow using Scikit-Learn Pipelines
Revolutionise your Machine Learning Workflow using Scikit-Learn PipelinesRevolutionise your Machine Learning Workflow using Scikit-Learn Pipelines
Revolutionise your Machine Learning Workflow using Scikit-Learn PipelinesPhilip Goddard
 
Spark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit
 

Similar to Tensors Are All You Need: Faster Inference with Hummingbird (20)

StackNet Meta-Modelling framework
StackNet Meta-Modelling frameworkStackNet Meta-Modelling framework
StackNet Meta-Modelling framework
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
 
AI On the Edge: Model Compression
AI On the Edge: Model CompressionAI On the Edge: Model Compression
AI On the Edge: Model Compression
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
 
Clipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving SystemClipper: A Low-Latency Online Prediction Serving System
Clipper: A Low-Latency Online Prediction Serving System
 
11.1. PPT on How to crack ML Competitions all steps explained.pptx
11.1. PPT on How to crack ML Competitions all steps explained.pptx11.1. PPT on How to crack ML Competitions all steps explained.pptx
11.1. PPT on How to crack ML Competitions all steps explained.pptx
 
Fine tuning large LMs
Fine tuning large LMsFine tuning large LMs
Fine tuning large LMs
 
Data Con LA 2022 - AutoDC + AutoML = your AI development superpower
Data Con LA 2022 - AutoDC + AutoML = your AI development superpowerData Con LA 2022 - AutoDC + AutoML = your AI development superpower
Data Con LA 2022 - AutoDC + AutoML = your AI development superpower
 
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
Driving Moore's Law with Python-Powered Machine Learning: An Insider's Perspe...
 
IncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the CloudIncQuery-D: Incremental Queries in the Cloud
IncQuery-D: Incremental Queries in the Cloud
 
System mldl meetup
System mldl meetupSystem mldl meetup
System mldl meetup
 
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearnPrediction as a service with ensemble model in SparkML and Python ScikitLearn
Prediction as a service with ensemble model in SparkML and Python ScikitLearn
 
Generalized Linear Models with H2O
Generalized Linear Models with H2O Generalized Linear Models with H2O
Generalized Linear Models with H2O
 
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)Advanced Data Science on Spark-(Reza Zadeh, Stanford)
Advanced Data Science on Spark-(Reza Zadeh, Stanford)
 
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D... Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
Apache Spark Based Hyper-Parameter Selection and Adaptive Model Tuning for D...
 
AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...
 
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
SF Big Analytics & SF Machine Learning Meetup: Machine Learning at the Limit ...
 
Revolutionise your Machine Learning Workflow using Scikit-Learn Pipelines
Revolutionise your Machine Learning Workflow using Scikit-Learn PipelinesRevolutionise your Machine Learning Workflow using Scikit-Learn Pipelines
Revolutionise your Machine Learning Workflow using Scikit-Learn Pipelines
 
Spark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef HabdankSpark Summit EU talk by Josef Habdank
Spark Summit EU talk by Josef Habdank
 

More from Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

More from Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Recently uploaded

PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schscnajjemba
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...Health
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样wsppdmt
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...gajnagarg
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraGovindSinghDasila
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制vexqp
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdftheeltifs
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...gajnagarg
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxVivek487417
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1ranjankumarbehera14
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss ConfederationEfruzAsilolu
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制vexqp
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...nirzagarg
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样wsppdmt
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........EfruzAsilolu
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Klinik kandungan
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxParas Gupta
 

Recently uploaded (20)

PLE-statistics document for primary schs
PLE-statistics document for primary schsPLE-statistics document for primary schs
PLE-statistics document for primary schs
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
Top profile Call Girls In Chandrapur [ 7014168258 ] Call Me For Genuine Model...
 
Aspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - AlmoraAspirational Block Program Block Syaldey District - Almora
Aspirational Block Program Block Syaldey District - Almora
 
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
怎样办理纽约州立大学宾汉姆顿分校毕业证(SUNY-Bin毕业证书)成绩单学校原版复制
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
Top profile Call Girls In dimapur [ 7014168258 ] Call Me For Genuine Models W...
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
 
Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1Lecture_2_Deep_Learning_Overview-newone1
Lecture_2_Deep_Learning_Overview-newone1
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
 
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
SR-101-01012024-EN.docx  Federal Constitution  of the Swiss ConfederationSR-101-01012024-EN.docx  Federal Constitution  of the Swiss Confederation
SR-101-01012024-EN.docx Federal Constitution of the Swiss Confederation
 
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
怎样办理伦敦大学毕业证(UoL毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
 
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
一比一原版(UCD毕业证书)加州大学戴维斯分校毕业证成绩单原件一模一样
 
Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........Switzerland Constitution 2002.pdf.........
Switzerland Constitution 2002.pdf.........
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 

Tensors Are All You Need: Faster Inference with Hummingbird

  • 1. Matteo Interlandi, Karla Saur Hummingbird
  • 4. Machine Learning Prediction Serving 1. Models are learned from data Data Training Model Learn
  • 5. Machine Learning Prediction Serving 1. Models are learned from data 2. Models are deployed and served together Prediction serving Users Server Data Training Model Learn Deploy
  • 6. Model Serving Specialized Systems have been developed
  • 7. Model Serving Specialized Systems have been developed Focus: Deep Learning (DL)
  • 8. Model Serving Specialized Systems have been developed Support for traditional ML methods is largely overlooked Focus: Deep Learning (DL)
  • 9. Traditional ML Models 2019 Kaggle Survey: The State of Data Science & Machine Learning Data Science through the looking glass: https://arxiv.org/abs/1912.09536
  • 10. Problem: Lack of Optimizations for Traditional ML Serving Systems for training traditional ML models are not optimized for serving Traditional ML models are expressed using imperative code in an ad-hoc fashion, not using a shared logical abstraction Traditional ML models cannot natively exploit hardware acceleration
  • 11. Tokeniz er How do “Traditional ML Models” look inside? <Example: Binary Classification> Char Ngram Word Ngram Conca t Logistic Regressio n 0 vs 1 Traditional ML Model A B C D a 0.1 c 0.5
  • 12. Split How do “Traditional ML Models” look inside? Scaler OneHot Conca t Logistic Regressio n DAG of Operators (aka pipeline) A B C D a 0.1 c 0.5 0 vs 1 <Example: Binary Classification>
  • 13. Split How do “Traditional ML Models” look inside? Scaler OneHot Conca t Logistic Regressio n DAG of Operators (aka pipeline) A B C D a 0.1 c 0.5 0 vs 1 <Example: Binary Classification> Featurizers
  • 14. Split How do “Traditional ML Models” look inside? Scaler OneHot Conca t Logistic Regressio n DAG of Operators (aka pipeline) Featurizers Predictor A B C D a 0.1 c 0.5 0 vs 1 <Example: Binary Classification>
  • 15. Split How do “Traditional ML Models” look inside? Scaler OneHot Conca t Logistic Regressio n DAG of Operators (aka pipeline) A B C D a 0.1 c 0.5 0 vs 1 <Example: Binary Classification>
  • 16. Split How do “Traditional ML Models” look inside? Scaler OneHot Conca t Logistic Regressio n DAG of Operators (aka pipeline) Split input into cat num A B C D a 0.1 c 0.5 0 vs 1 <Example: Binary Classification>
  • 17. Split How do “Traditional ML Models” look inside? Scaler OneHot Conca t Logistic Regressio n DAG of Operators (aka pipeline) Normalize num A B C D a 0.1 c 0.5 0 vs 1 <Example: Binary Classification> Split input into cat num One hot encode cat
  • 18. Split How do “Traditional ML Models” look inside? Scaler OneHot Conca t Logistic Regressio n DAG of Operators (aka pipeline) Merge two vectors A B C D a 0.1 c 0.5 0 vs 1 <Example: Binary Classification> Split input into cat num Normalize num One hot encode cat
  • 19. Split How do “Traditional ML Models” look inside? Scaler OneHot Conca t Logistic Regressio n DAG of Operators (aka pipeline) Merge two vectors Compute final score A B C D a 0.1 c 0.5 0 vs 1 <Example: Binary Classification> Split input into cat num Normalize num One hot encode cat
  • 21. Primarily relies on the abstraction of tensors Deep Learning DL models are expressed as a DAG of tensor operators w1 b1 X Mat Mul Add ReLU Mat Mul Add Sigmoid w1 b1 User Input
  • 22. Systems for DL Prediction Serving Exploit the abstraction of tensor operations to support multiple DL frameworks on multiple target environments Mat Mul Add ReLU … ✔ Efficient implementations ✔ Declarative ✔ Seamless hardware acceleration ✔ Reduced engineering efforts Benefit s:
  • 24. Converting ML Operators into Tensor Operations Observation: pipelines are composed of two classes of operators Algebraic Operations: E.g., Linear Regression Algorithmic Operations: E.g., RandomForest, OneHotEncoder Y = wX + b Complex data access patterns and control-flow patterns! Introduce redundancies, both computational and storage Make data access patterns and control flow uniform for all inputs Our Solution: Depending on the level of redundancy introduced there can be more than one potential compilation approach Hummingbird picks the one that works given pipeline statistics
  • 34. Compiling Decision Tree-based Models Above approach (GEMM approach) essentially evaluates all paths in a decision tree model: computation redundancy. Works surprisingly well on modern hardware for many cases! Two other tree traversal-based methods that exploit the tree structure. For tall trees (e.g., LightGBM) For bushy trees (e.g., XGBoost)
  • 36. Tree Traversal Method Initial: 0 repeat while < max tree depth Gather Feature Ids Featu re id X Gather Feature value Gather Thresholds Threshold value < Co nd. Where Lefts Rights Tr ue Fal se
  • 37. Perfect Tree Traversal Method 3 F3 < 0.5 F2 < 2.0 F5 < 5.5 C1 C2 C1 F3 < 2.4 C2 C1 true false true true true false false false F3 < 0.5 F2 < 2.0 F5 < 5.5 C1 C1 F3 < 2.4 C2 C1 true false true true true false false false C1 C2 C2 C1
  • 38. Operator Group Supported Operators Linear Classifiers Logistic Regression, Linear SVC, LinearSVR, SVC, SVR, NuSVC, SGDClassifier, LogisticRegressionCV Tree Methods DecisionTreeClassifier/Regressor, RandomForestClassifier/Regressor, (Hist)GradientBoostingClassifier/Regressor, ExtraTreesClassifier/Regressor, XGBClassifier/Regressor, LGBMClassifier/Regressor/Ranker Neural Networks MLPClassifier Others BernouliNB, Kmeans, MeanShift Feature Selectors SelectKBest Decomposition PCA, TruncatedSVD Feature Pre-Processing SimpleImputer, Imputer, ColumnTransformer, RobustScaler, MaxAbsScaler, MinMaxScaler, StandardScaler, Binarizer, KBinsDiscretizer, Normalizer, PolynomialFeatures, OneHotEncoder, LabelEncoder, FeatureHasher Supported Operators
  • 39. High-level System Design Hummingbird Trained Traditional ML Pipelines DL Prediction Serving Systems
  • 40. End-to-End Pipeline Evaluation 4 Hardware Setup Experimental Workload Hummingbird can translate 2328 pipelines (88%). Perform inference on 20% of the dataset. TorchScript as the backend for Hummingbird. Scikit-Learn pipelines for OpenML-CC18 benchmark which has 72 datasets. Azure NC6 v2 machine Intel Xeon E5-2690 v4@ 2.6GHz (6 cores) 112 GB RAM Nvidia P100 Ubuntu 18.04, PyTorch 1.3, TVM 0.6, CUDA 10, RAPIDS 0.9
  • 42. End-to-End Pipeline Evaluation 4 CPU GPU 60% 1200 X 60 X 73% 1000 X 130 X Main reasons for slowdowns: Sparse input data, small inference datasets.
  • 44. Hummingbird Updates • Hummingbird has reached > 21K PyPI downloads and 2.4k stars • Demoed at Microsoft Ignite • Integrated with ONNX converter tools • OSDI paper • New features include: • Pandas Dataframes • PySparkML support • TVM support • Looking for new users/contributors!
  • 46. Tree-Models Microbenchmark 46 Experimental Workload: Nvidia Gradient Boosting Algorithm Benchmark* Dataset Rows #Features Task Fraud 285k 28 BinaryClass Year 512k 90 Regression Covtype 581k 54 MultiClass Epsilon 500k 2000 BinaryClass 3 Models: RandomForest, XGBoost, LightGBM. 80/20 train/test split. Batch inference (batch size 10k w/ and w/o GPU. (* https://github.com/NVIDIA/gbm-bench)
  • 47. Tree-Models Microbenchmark 47 Algorithm Dataset Sklearn (CPU Baseline) Hummingbird (CPU) RAPIDS (GPU Baseline) Hummingbird (GPU) TorchScript TVM TorchScript TVM Rand. Forest Fraud Year Covtype Epsilon LightGBM Fraud Year Covtype Epsilon XGBoost Fraud Year Covtype Epsilon (All runtimes are reported in seconds. More datasets and experimental results in the paper.)
  • 48. Tree-Models Microbenchmark 48 Algorithm Dataset Sklearn (CPU Baseline) Hummingbird (CPU) RAPIDS (GPU Baseline) Hummingbird (GPU) TorchScript TVM TorchScript TVM Rand. Forest Fraud 2.5 7.8 3.0 Year 1.9 7.7 1.4 Covtype 5.9 16.5 6.8 Epsilon 9.8 13.9 6.6 LightGBM Fraud 3.4 7.6 1.7 Year 5.0 7.6 1.6 Covtype 51.1 79.5 27.2 Epsilon 10.5 14.5 4.0 XGBoost Fraud 1.9 7.6 1.6 Year 3.1 7.6 1.6 Covtype 42.3 79.0 26.4 Epsilon 7.6 14.8 4.2 (All runtimes are reported in seconds. More datasets and experimental results in the paper.)
  • 49. Tree-Models Microbenchmark 49 Algorithm Dataset Sklearn (CPU Baseline) Hummingbird (CPU) RAPIDS (GPU Baseline) Hummingbird (GPU) TorchScript TVM TorchScript TVM Rand. Forest Fraud 2.5 7.8 3.0 !SUPPORTED 0.044 0.015 Year 1.9 7.7 1.4 !SUPPORTED 0.045 0.026 Covtype 5.9 16.5 6.8 !SUPPORTED 0.110 0.047 Epsilon 9.8 13.9 6.6 !SUPPORTED 0.130 0.13 LightGBM Fraud 3.4 7.6 1.7 0.014 0.044 0.014 Year 5.0 7.6 1.6 0.023 0.045 0.025 Covtype 51.1 79.5 27.2 !SUPPORTED 0.620 0.250 Epsilon 10.5 14.5 4.0 0.150 0.130 0.120 XGBoost Fraud 1.9 7.6 1.6 0.013 0.044 0.015 Year 3.1 7.6 1.6 0.022 0.045 0.026 Covtype 42.3 79.0 26.4 !SUPPORTED 0.620 0.250 Epsilon 7.6 14.8 4.2 0.150 0.130 0.120 (All runtimes are reported in seconds. More datasets and experimental results in the paper.)