Gradient Boosted Regression Trees
scikit-learn
Peter Prettenhofer (@pprett)
DataRobot
Gilles Louppe (@glouppe)
Université de Liège, Belgium
Motivation
Outline
1 Basics
2 Gradient Boosting
3 Gradient Boosting in Scikit-learn
4 Case Study: California housing
About us
Peter
• @pprett
• Python & ML ∼ 6 years
• sklearn dev since 2010
Gilles
• @glouppe
• PhD student (Liège, Belgium)
• sklearn dev since 2011
Chief tree hugger
Outline
1 Basics
2 Gradient Boosting
3 Gradient Boosting in Scikit-learn
4 Case Study: California housing
Machine Learning 101
• Data comes as...
• A set of examples {(x_i, y_i) | 0 ≤ i < n_samples}, with
• Feature vector x ∈ R^n_features, and
• Response y ∈ R (regression) or y ∈ {−1, 1} (classification)
• Goal is to...
• Find a function ŷ = f(x)
• Such that the error L(y, ŷ) on new (unseen) x is minimal
Classification and Regression Trees [Breiman et al, 1984]
[Decision tree diagram: root split MedInc <= 5.04; left subtree splits on MedInc <= 3.07, then AveRooms <= 4.31 and AveOccup <= 2.37, with leaf values 1.62, 1.16, 2.79, 1.88; right subtree splits on MedInc <= 6.82, then AveOccup <= 2.74 and MedInc <= 7.82, with leaf values 3.39, 2.56, 3.73, 4.57]
sklearn.tree.DecisionTreeClassifier|Regressor
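A minimal sketch of fitting such a tree (the use of fetch_california_housing and the depth are assumptions for illustration; split values will differ from the diagram):

from sklearn.datasets import fetch_california_housing
from sklearn.tree import DecisionTreeRegressor

# Fit a depth-3 regression tree on median house values (8 leaves, as above)
data = fetch_california_housing()
est = DecisionTreeRegressor(max_depth=3).fit(data.data, data.target)
print(est.predict(data.data[:1]))  # predicted house value for one block group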
Function approximation with Regression Trees
[Plot: ground truth vs. regression tree fits of depth d=1, d=3, and d=20 over x ∈ [0, 10]]
Deprecated
• Nowadays seldom used alone
• Ensembles: Random Forest, Bagging, or Boosting (see sklearn.ensemble)
Outline
1 Basics
2 Gradient Boosting
3 Gradient Boosting in Scikit-learn
4 Case Study: California housing
Gradient Boosted Regression Trees
Advantages
• Handles heterogeneous data (features measured on different scales)
• Supports different loss functions (e.g., Huber)
• Automatically detects (non-linear) feature interactions
Disadvantages
• Requires careful tuning
• Slow to train (but fast to predict)
• Cannot extrapolate
Boosting
AdaBoost [Y. Freund & R. Schapire, 1995]
• Ensemble: each member is an expert on the errors of its predecessor
• Iteratively re-weights training examples based on errors
[Figure: four panels showing the decision surface over (x0, x1) after successive boosting iterations]
sklearn.ensemble.AdaBoostClassifier|Regressor
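A minimal usage sketch (the dataset and n_estimators are illustrative assumptions, not from the slides):

from sklearn.ensemble import AdaBoostClassifier
from sklearn.datasets import make_hastie_10_2

# Each stage fits a weak learner on re-weighted training examples
X, y = make_hastie_10_2(n_samples=10000)
est = AdaBoostClassifier(n_estimators=200).fit(X, y)
print(est.score(X, y))  # mean training accuracy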
Huge success
• Viola-Jones Face Detector (2001)
• Freund & Schapire won the Gödel Prize in 2003
Gradient Boosting [J. Friedman, 1999]
Statistical view on boosting
• Generalization of boosting to arbitrary loss functions
Residual fitting
[Figure: ground truth ≈ tree 1 + tree 2 + tree 3; each successive tree is fit on the residual of the previous ones]
sklearn.ensemble.GradientBoostingClassifier|Regressor
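The residual-fitting idea in a few lines of plain scikit-learn (an illustrative sketch on a synthetic 1-D problem; the GradientBoosting* estimators do this internally):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.linspace(2, 10, 100)[:, np.newaxis]
y = np.sin(X.ravel()) + rng.normal(scale=0.3, size=100)

f = np.zeros(100)                  # current ensemble prediction
for m in range(3):                 # tree 1 + tree 2 + tree 3
    residual = y - f               # what the ensemble still gets wrong
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    f += tree.predict(X)           # each tree corrects its predecessors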
Functional Gradient Descent
Least Squares Regression
• Squared loss: L(y_i, f(x_i)) = (y_i − f(x_i))²
• The residual ∼ the negative gradient: −∂L(y_i, f(x_i)) / ∂f(x_i)
Steepest Descent
• Regression trees approximate the (negative) gradient
• Each tree is a successive gradient descent step
[Figure, left: regression losses L(y, f(x)) vs. y − f(x): squared error, absolute error, Huber error.
Figure, right: classification losses L(y, f(x)) vs. y·f(x): zero-one loss, log loss, exponential loss]
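Only the pointwise negative gradient that the next tree fits changes with the loss. A sketch of the three regression losses above (own derivation, not code from the slides; the squared loss is scaled by 1/2 so its negative gradient is exactly the residual):

import numpy as np

def neg_gradient_squared(y, f):
    # L = (y - f)^2 / 2  =>  -dL/df = y - f (the plain residual)
    return y - f

def neg_gradient_absolute(y, f):
    # L = |y - f|  =>  -dL/df = sign(y - f)
    return np.sign(y - f)

def neg_gradient_huber(y, f, delta=1.0):
    # Quadratic near zero, linear in the tails: robust to outliers
    r = y - f
    return np.where(np.abs(r) <= delta, r, delta * np.sign(r))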
Outline
1 Basics
2 Gradient Boosting
3 Gradient Boosting in Scikit-learn
4 Case Study: California housing
GBRT in scikit-learn
How to use it
>>> from sklearn.ensemble import GradientBoostingClassifier
>>> from sklearn.datasets import make_hastie_10_2
>>> X, y = make_hastie_10_2(n_samples=10000)
>>> est = GradientBoostingClassifier(n_estimators=200, max_depth=3)
>>> est.fit(X, y)
...
>>> # get predictions
>>> pred = est.predict(X)
>>> est.predict_proba(X)[0] # class probabilities
array([ 0.67, 0.33])
Implementation
• Written in pure Python/Numpy (easy to extend).
• Builds on top of sklearn.tree.DecisionTreeRegressor (Cython).
• Custom node splitter that uses pre-sorting (better for shallow trees).
Example
from sklearn.ensemble import GradientBoostingRegressor
import matplotlib.pyplot as plt

est = GradientBoostingRegressor(n_estimators=2000, max_depth=1).fit(X, y)
for pred in est.staged_predict(X):  # prediction after each boosting stage
    plt.plot(X[:, 0], pred, color='r', alpha=0.1)
[Plot: staged GBRT (d=1) predictions sweeping from high bias / low variance to low bias / high variance, against the ground truth and single regression trees d=1 and d=3]
Model complexity & Overfitting
import numpy as np
import matplotlib.pyplot as plt

n_estimators = len(est.estimators_)
test_score = np.empty(n_estimators)
for i, pred in enumerate(est.staged_predict(X_test)):
    test_score[i] = est.loss_(y_test, pred)  # loss on held-out data at stage i
plt.plot(np.arange(n_estimators) + 1, test_score, label='Test')
plt.plot(np.arange(n_estimators) + 1, est.train_score_, label='Train')
[Plot: train and test error vs. n_estimators; the test error bottoms out ("lowest test error") while the train-test gap keeps widening]
Regularization
GBRT provides a number of knobs to control overfitting:
• Tree structure
• Shrinkage
• Stochastic Gradient Boosting
Regularization: Tree structure
• The max_depth of the trees controls the degree of feature interactions
• Use min_samples_leaf to ensure a sufficient number of samples per leaf (see the sketch below)
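Illustrative settings (the values are assumptions, not recommendations from the slides):

from sklearn.ensemble import GradientBoostingRegressor

est = GradientBoostingRegressor(n_estimators=1000,
                                max_depth=2,         # at most two-way interactions
                                min_samples_leaf=9)  # smoother leaf estimates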
Regularization: Shrinkage
• Slow down learning by shrinking each tree's predictions with 0 < learning_rate ≤ 1
• A lower learning_rate requires a higher n_estimators
[Plot: train/test error vs. n_estimators with and without learning_rate=0.1; shrinkage requires more trees but reaches a lower test error]
Regularization: Stochastic Gradient Boosting
• Samples: grow each tree on a random subset of the training set (subsample)
• Features: consider a random subset of features per split (max_features)
• Improved accuracy and reduced runtime (a sketch follows below)
[Plot: train/test error vs. n_estimators; subsample=0.5 combined with learning_rate=0.1 yields an even lower test error, while subsampling alone does poorly]
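Combining shrinkage with stochastic gradient boosting, a minimal sketch (subsample and learning_rate are the values quoted above; the rest are illustrative):

from sklearn.ensemble import GradientBoostingRegressor

est = GradientBoostingRegressor(n_estimators=1000,
                                learning_rate=0.1,  # shrink each tree's contribution
                                subsample=0.5,      # random half of the samples per tree
                                max_features=0.3)   # random 30% of the features per split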
Hyperparameter tuning
1. Set n_estimators as high as possible (e.g., 3000)
2. Tune the other hyperparameters via grid search:
from sklearn.grid_search import GridSearchCV  # sklearn.model_selection in newer releases

param_grid = {'learning_rate': [0.1, 0.05, 0.02, 0.01],
              'max_depth': [4, 6],
              'min_samples_leaf': [3, 5, 9, 17],
              'max_features': [1.0, 0.3, 0.1]}
est = GradientBoostingRegressor(n_estimators=3000)
gs_cv = GridSearchCV(est, param_grid).fit(X, y)
gs_cv.best_params_  # best hyperparameter setting
3. Finally, set n_estimators even higher and re-tune learning_rate.
Outline
1 Basics
2 Gradient Boosting
3 Gradient Boosting in Scikit-learn
4 Case Study: California housing
Case Study
California Housing dataset
• Predict log(medianHouseValue)
• Block groups in the 1990 census
• 20,640 groups with 8 features (median income, median age, lat, lon, ...)
• Evaluation: mean absolute error on an 80/20 split
Challenges
• Heterogeneous features
• Non-linear interactions
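A sketch of this setup (fetch_california_housing as the data source and the split seed are assumptions; the deck's exact hyperparameters are not shown):

import numpy as np
from sklearn.datasets import fetch_california_housing
from sklearn.cross_validation import train_test_split  # sklearn.model_selection in newer releases
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

data = fetch_california_housing()
y = np.log(data.target)  # predict log(medianHouseValue)
X_train, X_test, y_train, y_test = train_test_split(data.data, y,
                                                    test_size=0.2, random_state=0)
est = GradientBoostingRegressor(n_estimators=3000).fit(X_train, y_train)
print(mean_absolute_error(y_test, est.predict(X_test)))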
Predictive accuracy & runtime
          Train time [s]   Test time [ms]     MAE
Mean                   -                -  0.4635
Ridge              0.006             0.11  0.2756
SVR                 28.0          2000.00  0.1888
RF                  26.3           605.00  0.1620
GBRT               192.0           439.00  0.1438
[Plot: GBRT train/test error vs. n_estimators (0 to 3000) on the California housing data]
Model interpretation
Which features are important?
>>> est.feature_importances_
array([ 0.01, 0.38, ...])
[Bar chart: relative feature importances, ascending: HouseAge, Population, AveBedrms, Latitude, AveOccup, Longitude, AveRooms, MedInc]
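A sketch of the bar chart (continuing from the fitted est and the data object of the setup sketch above; the matplotlib usage is illustrative):

import numpy as np
import matplotlib.pyplot as plt

idx = np.argsort(est.feature_importances_)  # ascending, least important first
plt.barh(np.arange(len(idx)), est.feature_importances_[idx])
plt.yticks(np.arange(len(idx)), np.array(data.feature_names)[idx])
plt.xlabel('Relative importance')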
Model interpretation
What is the effect of a feature on the response?
from sklearn.ensemble import partial_dependence as pd

features = ['MedInc', 'AveOccup', 'HouseAge', 'AveRooms',
            ('AveOccup', 'HouseAge')]
fig, axs = pd.plot_partial_dependence(est, X_train, features,
                                      feature_names=names)
[Figure: one-way partial dependence plots for MedInc, AveOccup, HouseAge, and AveRooms, plus a two-way contour plot for (AveOccup, HouseAge)]
Partial dependence of house value on non-location features for the California housing dataset
Model interpretation
Automatically detects spatial effects
[Figure: two-way partial dependence of median house value on longitude and latitude, revealing spatial effects]
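A sketch of the spatial plot (assuming the feature_names of the California housing data, and that plot_partial_dependence accepts a tuple of two feature names for two-way plots):

from sklearn.ensemble import partial_dependence as pd

fig, axs = pd.plot_partial_dependence(est, X_train, [('Longitude', 'Latitude')],
                                      feature_names=data.feature_names)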
Summary
• Flexible non-parametric classification and regression technique
• Applicable to a variety of problems
• Solid, battle-worn implementation in scikit-learn
Thanks! Questions?
Benchmarks
[Figure: error, train time, and test time of gbm vs. sklearn-0.15 across datasets: Arcene, Boston, California, Covtype, Example 10.2, Expedia, Madelon, Solar, Spam, YahooLTRC, bioresp]
Tips & Tricks 1
Input layout
Use dtype=np.float32 to avoid memory copies, and Fortran layout for a slight runtime benefit:
X = np.asfortranarray(X, dtype=np.float32)
Tips & Tricks 2
Feature interactions
GBRT automatically detects feature interactions, but explicit interaction features often help.
Trees required to approximate X1 − X2: 10 (left), 1000 (right).
[Figure: two approximations of the surface x − y over the unit square, using 10 trees (left) and 1000 trees (right)]
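A sketch of adding the explicit interaction (X here stands for any two-column feature array; purely illustrative):

import numpy as np

X_aug = np.c_[X, X[:, 0] - X[:, 1]]  # append x - y as an explicit feature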
Tips & Tricks 3
Categorical variables
Sklearn requires categorical variables to be encoded as numerics. Tree-based methods work well with ordinal encoding:
import numpy as np
import pandas as pd

df = pd.DataFrame(data={'icao': ['CRJ2', 'A380', 'B737', 'B737']})
# ordinal encoding
df_enc = pd.DataFrame(data={'icao': np.unique(df.icao,
                                              return_inverse=True)[1]})
X = np.asfortranarray(df_enc.values, dtype=np.float32)