Ensembling & Boosting
An Introduction to the Concepts
Wayne Chen
201608
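Purpose of this talk
Build your intuition for the data-analysis field
So that when you meet someone who says they have competed in data-science contests, you won't feel intimidated and assume they are a genius
Maybe you will never use these methods directly, but the concepts are still worth borrowing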
If Deep Learning changed the rules of the ML game...
XGBoost: the Kaggle winning solution
Giuliano Janson: Won two games and retired from Kaggle
Persistence: every Kaggler nowadays can put up a great model in a few hours
and usually achieve 95% of final score. Only persistence will get you the
remaining 5%.
Ensembling: need to know how to do it "like a pro". Forget about averaging
models. Nowadays many Kagglers build meta-models, and meta-meta-models.
Why is ensembling needed?
Occam's Razor
● An explanation of the data should be made as simple as possible, but no simpler.
Simple methods beat complex ones. Simple is good. Any waste is bad.
Combining several simple models often works better than a single complex model:
● The training data might not provide enough information to choose a single best learner.
● The search processes of the learning algorithms might be imperfect (it is hard to reach the unique
best hypothesis).
● The hypothesis space being searched might not contain the true target function.
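What counts as a "simple method"?
Tree-based methods: ID3, C4.5, CART …
Entropy
Example: find the people who like to spend money, splitting on gender. 5 like spending (1 M, 4 F); 9 do not (6 M, 3 F).
● E_all = -5/14 * log2(5/14) - 9/14 * log2(9/14) ≈ 0.940
● With log base 2, entropy is 1 for a 50%/50% split and 0 for a 100%/0% split
Information Gain
● How much the entropy drops after choosing attribute a as the split attribute
● E_gender = P(M) * E(1,6) + P(F) * E(4,3), with P(M) = P(F) = 7/14; Gain = E_all - E_gender ≈ 0.152
http://www.saedsayad.com/decision_tree.htm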
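The entropy and information-gain arithmetic above can be checked with a short Python sketch. It assumes base-2 logarithms (which is what makes a 50/50 split score exactly 1) and reuses the counts from the spending example. The function names are illustrative.

```python
import math

def entropy(*counts):
    """Shannon entropy (base 2) of a class distribution given as raw counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    """Parent entropy minus the size-weighted average entropy of the children."""
    total = sum(parent_counts)
    weighted = sum(sum(child) / total * entropy(*child) for child in children_counts)
    return entropy(*parent_counts) - weighted

# 5 people like spending, 9 do not.
e_all = entropy(5, 9)
# Split on gender: males = (1 like, 6 do not), females = (4 like, 3 do not).
gain = information_gain((5, 9), [(1, 6), (4, 3)])
print(f"E_all = {e_all:.3f}, Gain(gender) = {gain:.3f}")  # E_all = 0.940, Gain(gender) = 0.152
```

A tree learner such as ID3 evaluates this gain for every candidate attribute and splits on the one with the largest value.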
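What is the problem with this?
The more precisely a model fits the training data, the more skewed toward that data (overfit) it may be.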
http://blogs.sas.com/content/jmp/2013/03/25/partitioning-a-quadratic-in-jmp/
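Boosting ensembles in one sentence
There is no greater virtue than correcting your mistakes.
Learning means reinforcing the memory of your mistakes, over and over, and then improving.
A mistake cannot be undone; learn the lesson and work hard not to repeat it.
1. A mistake is a mistake: don't throw it away, but don't dwell on it either.
2. Remember where you went wrong, and put more weight on it next time.
3. Keep learning until you score 100 on every exam (just kidding).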
Learn to use ensembling in one second
You have probably already tried a few different models:
● Decision trees, NNs, SVMs, regression, ...
Ensemble your Kaggle submission CSV files. → It works!
Majority Voting
● Three models: 70%, 70%, 70%
● If their errors are independent, the majority-vote ensemble will score ~78%.
● Averaging predictions often reduces overfitting.
http://mlwave.com/kaggle-ensembling-guide/
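The ~78% figure follows from the binomial distribution, but only under the assumption that the three models make independent errors (real models rarely do). A minimal Python sketch:

```python
from math import comb

def majority_vote_accuracy(p, n):
    """Probability that a strict majority of n independent classifiers,
    each correct with probability p, is correct (n assumed odd)."""
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))

print(round(majority_vote_accuracy(0.7, 3), 3))  # 0.784
print(round(majority_vote_accuracy(0.7, 5), 3))  # 0.837
```

With three voters the ensemble is right whenever at least two are right: 3 × 0.7² × 0.3 + 0.7³ = 0.441 + 0.343 = 0.784.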
The pitfalls of ensembling
If you put Kobe, Curry, and LBJ on one team, will they win the championship?
Ensembles of uncorrelated models usually perform better.
Base models should be as accurate as possible, and as diverse as possible.
Common mechanisms: Majority Vote, Weighted Averaging
Voting Ensemble → RandomForest → GradientBoostingMachine
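Highly correlated models (true labels are all 1):
1111111100 = 80% accuracy
1111111100 = 80% accuracy
1011111100 = 70% accuracy
Majority vote → 1111111100 = 80% accuracy
Less correlated models:
1111111100 = 80% accuracy
0111011101 = 70% accuracy
1000101111 = 60% accuracy
Majority vote → 1111111101 = 90% accuracy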
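The two groups of predictions above can be re-voted with a few lines of Python. The sketch assumes, as the accuracies imply, that the true label of every one of the ten examples is 1.

```python
def majority_vote(*predictions):
    """Position-wise majority vote over equal-length 0/1 prediction strings."""
    return "".join(
        "1" if sum(int(p[i]) for p in predictions) * 2 > len(predictions) else "0"
        for i in range(len(predictions[0]))
    )

def accuracy(pred, truth):
    return sum(a == b for a, b in zip(pred, truth)) / len(truth)

truth = "1111111111"  # assumed ground truth: every label is 1
correlated = ["1111111100", "1111111100", "1011111100"]
diverse = ["1111111100", "0111011101", "1000101111"]

for name, models in [("correlated", correlated), ("diverse", diverse)]:
    vote = majority_vote(*models)
    print(name, vote, accuracy(vote, truth))
# correlated 1111111100 0.8
# diverse 1111111101 0.9
```

The correlated models all miss the same last two examples, so voting cannot fix them. The diverse models make their mistakes in different places, so the vote outvotes most of them.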
The ensemble method you have surely heard of: Random Forest
● Randomly samples not only the data but also the features
● Majority vote
● Minimal tuning
● Outperforms many complex methods
Parameters:
n: subsample size
m: feature-subset size
tree size, number of trees
http://www.slideshare.net/0xdata/jan-vitek-distributedrandomforest522013
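Here is a sketch of the two sampling steps, with n and m as defined on the slide. It draws the feature subset once per tree (the "random subspace" style). Breiman's Random Forest instead re-draws m candidate features at every split. The helper name `forest_samples` is hypothetical.

```python
import random

def forest_samples(n_rows, n_features, n_trees, n, m, seed=0):
    """For each tree, draw n row indices with replacement (bootstrap sample)
    and m distinct feature indices without replacement."""
    rng = random.Random(seed)
    plans = []
    for _ in range(n_trees):
        rows = [rng.randrange(n_rows) for _ in range(n)]       # bootstrap rows
        features = sorted(rng.sample(range(n_features), m))     # random feature subset
        plans.append((rows, features))
    return plans

for rows, features in forest_samples(n_rows=10, n_features=6, n_trees=3, n=10, m=2):
    print("rows:", rows, "features:", features)
```

Each tree is then trained only on its own rows and features, and the trees are combined by majority vote.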
Ensemble keywords
Base Learner: the basic model that gets ensembled, e.g. a single tree or a simple neural network
● Trained by a base learning algorithm (e.g. decision tree, neural network, ...)
Three main families of training methods:
● Boosting: boost weak learners into strong learners (sequential learners)
● Bagging: like RandomForest, sample from the data or the features
● Stacking: the idea of bundling learners together (parallel learners)
● Employs different learning algorithms to train the individual learners
● The individual learners are then combined by a second-level learner, which is
called the meta-learner.
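Bagging Ensemble: Bootstrap Aggregating
Each round, draw m data points with replacement (a bootstrap sample) and train a base learner on it by calling the base
learning algorithm
● Choosing the sampling ratio is an art in itself
● You can even train different models on sub-datasets built from different features
○ Cherkauer (1996): a volcano-identification project using 32 NNs, split by different input features
● Inject randomness
○ e.g. random initialization in backpropagation, random feature selection in trees
● Majority voting
Advantage: preserves the diversity of the individual hypotheses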
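Here is a minimal bagging sketch in Python. It assumes 1-D inputs, 0/1 labels, and decision stumps as the base learner (all illustrative choices). Each stump is trained on its own bootstrap sample, and the stumps are combined by majority vote.

```python
import random

def fit_classification_stump(xs, ys):
    """Fit a 1-D decision stump for 0/1 labels by minimizing training errors.
    It predicts 1 when x > t (or when x <= t if `flip` is True)."""
    best = None
    for t in sorted(set(xs)):
        for flip in (False, True):
            errors = sum(int((x > t) != flip) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, flip)
    _, t, flip = best
    return lambda x: int((x > t) != flip)

def bagging(xs, ys, n_models=25, seed=0):
    """Train n_models stumps on bootstrap samples; combine them by majority vote."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(len(xs)) for _ in range(len(xs))]   # sample with replacement
        models.append(fit_classification_stump([xs[i] for i in idx], [ys[i] for i in idx]))

    def predict(x):
        votes = sum(m(x) for m in models)
        return int(votes * 2 > len(models))                        # majority vote
    return predict

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
model = bagging(xs, ys)
print([model(x) for x in xs])
```

Stumps trained on different bootstrap samples place their thresholds slightly differently, and the vote averages that variation out.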
Boost Family
● AdaBoost (Adaptive Boosting)
● Gradient Tree Boosting
● XGBoost
Combination of additive models
Learning converges efficiently
Risk of amplifying noise
● Bagging can significantly reduce the variance
● Boosting can significantly reduce the bias
http://slideplayer.com/slide/4816467/
AdaBoost assigns equal weights to all training examples at the start,
then increases the weights of the incorrectly classified examples in each round.
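AdaBoost characteristics
In most cases it performs very well, but it has to
overcome its tendency to amplify noise.
In each round, we raise the chance that points misclassified
so far are classified correctly next time, i.e. we
lower their chance of being misclassified again.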
http://www.37steps.com/exam/adaboost_comp/html/adaboost_comp.html
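The reweighting loop can be sketched as discrete AdaBoost with decision stumps on 1-D data, with labels in {+1, -1}. This is a minimal illustration, not a reference implementation. The toy data are chosen so that no single stump can fit them, yet three boosting rounds can.

```python
import math

def stump_predict(x, t, sign):
    """Decision stump: output `sign` if x > t, otherwise -sign (labels are +1/-1)."""
    return sign if x > t else -sign

def fit_weighted_stump(xs, ys, w):
    """Pick the threshold/sign with the smallest *weighted* training error."""
    best = None
    for t in sorted(set(xs)) + [min(xs) - 1]:      # last threshold = constant stump
        for sign in (1, -1):
            err = sum(wi for x, y, wi in zip(xs, ys, w) if stump_predict(x, t, sign) != y)
            if best is None or err < best[0]:
                best = (err, t, sign)
    return best

def adaboost(xs, ys, rounds=3):
    n = len(xs)
    w = [1.0 / n] * n                               # equal weights to start
    ensemble = []
    for _ in range(rounds):
        err, t, sign = fit_weighted_stump(xs, ys, w)
        err = min(max(err, 1e-10), 1 - 1e-10)       # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # this stump's vote weight
        ensemble.append((alpha, t, sign))
        # Misclassified points (y * h(x) = -1) grow by e^alpha; correct ones shrink by e^-alpha.
        w = [wi * math.exp(-alpha * y * stump_predict(x, t, sign)) for x, y, wi in zip(xs, ys, w)]
        total = sum(w)
        w = [wi / total for wi in w]                # renormalize to sum to 1

    def predict(x):
        score = sum(a * stump_predict(x, t, s) for a, t, s in ensemble)
        return 1 if score >= 0 else -1
    return predict

xs = [1, 2, 3, 4, 5, 6]
ys = [1, 1, -1, -1, 1, 1]   # no single stump can fit this pattern
model = adaboost(xs, ys, rounds=3)
print([model(x) for x in xs])
```

After each round, the points the current stump got wrong hold half of the total weight. This forces the next stump to focus on them.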
Gradient Boosting
Additive training
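● Each new predictor is fitted to move the model in the direction opposite to the
gradient of the loss, so as to minimize the loss function.
Trees in GBDT are shallow: their depth generally does not exceed 5, and the number of leaf nodes generally does not exceed 10.
● Boosted-tree variants: GBDT, GBRT, MART, LambdaMART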
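Here is a minimal sketch of additive training. It assumes squared loss, whose negative gradient is simply the residual y - F(x), and depth-1 regression trees as base learners. `learning_rate` plays the role of the shrinkage factor described under Training Tips. The data and names are illustrative.

```python
def fit_regression_stump(xs, residuals):
    """Depth-1 regression tree: one threshold, predict the mean residual on each side."""
    best = None
    for t in sorted(set(xs))[:-1]:               # both sides stay non-empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv

def gradient_boost(xs, ys, n_trees=100, learning_rate=0.1):
    """Additive training: F_m(x) = F_{m-1}(x) + learning_rate * h_m(x)."""
    f0 = sum(ys) / len(ys)                       # F_0: constant prediction (the mean)
    preds = [f0] * len(xs)
    trees = []
    for _ in range(n_trees):
        # For squared loss (y - F)^2 / 2 the negative gradient is the residual y - F.
        residuals = [y - p for y, p in zip(ys, preds)]
        tree = fit_regression_stump(xs, residuals)
        trees.append(tree)
        preds = [p + learning_rate * tree(x) for x, p in zip(xs, preds)]
    return lambda x: f0 + learning_rate * sum(tree(x) for tree in trees)

def mse(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9, 5.2, 5.0]
for n in (1, 10, 100):
    print(n, round(mse(gradient_boost(xs, ys, n_trees=n), xs, ys), 4))
```

Each stump only nudges the model a fraction of the way toward the residuals, so the training error falls gradually as trees are added.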
Gradient Boosting Model Steps
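● Each leaf carries a weight (score); each tree is scored by a cost computed from its leaves
● Additive training: add one new model at a time, choosing the one whose
addition reduces the cost the most
● Build each new tree greedily, starting from a single leaf
● Update the weights using the gradients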
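The leaf weight and the cost reduction of a split can be made concrete with the second-order formulas from the XGBoost paper (Chen and Guestrin):

- leaf weight: w* = -G / (H + λ)
- split gain: ½[G_L²/(H_L+λ) + G_R²/(H_R+λ) - (G_L+G_R)²/(H_L+H_R+λ)] - γ

Here G and H are sums of the first and second derivatives of the loss. The sketch below assumes logistic loss with every current prediction at p = 0.5. It illustrates the formulas and is not XGBoost's code. `lam` and `gamma` correspond to XGBoost's lambda and gamma parameters, and the hypothetical helper `split_allowed` shows the min_child_weight check.

```python
def leaf_weight(G, H, lam=1.0):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -G / (H + lam)

def split_gain(gl, hl, gr, hr, lam=1.0, gamma=0.0):
    """Loss reduction from splitting one leaf into left/right children, minus gamma."""
    def score(G, H):
        return G * G / (H + lam)
    return 0.5 * (score(gl, hl) + score(gr, hr) - score(gl + gr, hl + hr)) - gamma

def split_allowed(hl, hr, min_child_weight=1.0):
    """Each child's hessian sum must reach min_child_weight."""
    return hl >= min_child_weight and hr >= min_child_weight

# Logistic loss, current prediction p = 0.5 for all four examples:
# gradient g = p - y, hessian h = p * (1 - p).
ys = [1, 1, 0, 0]
p = 0.5
g = [p - y for y in ys]
h = [p * (1 - p) for _ in ys]
gl, hl = sum(g[:2]), sum(h[:2])   # left child: the two positives
gr, hr = sum(g[2:]), sum(h[2:])   # right child: the two negatives

print(split_gain(gl, hl, gr, hr))             # about 0.667
print(split_gain(gl, hl, gr, hr, gamma=1.0))  # negative: with gamma = 1 the split is not worth it
print(leaf_weight(gl, hl), leaf_weight(gr, hr))
print(split_allowed(hl, hr))                  # False: each child's hessian sum is only 0.5
```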
Training Tips
Shrinkage
● Reduces the influence of each individual tree and leaves space for
future trees to improve the model.
● It is better to improve the model with many small steps than with a few large ones.
Subsampling, Early Stopping, Post-Pruning
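Early stopping can be sketched as follows: watch a validation loss after each boosting round, and stop once it has not improved for `patience` rounds. XGBoost's `early_stopping_rounds` follows the same idea. The loss values below are made up for illustration.

```python
def early_stopping_round(val_losses, patience=3):
    """Return the (0-based) round with the best validation loss, stopping once
    `patience` consecutive rounds have failed to improve on it."""
    best_round, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_round, best_loss = i, loss
        elif i - best_round >= patience:
            break                      # no improvement for `patience` rounds
    return best_round

losses = [0.70, 0.55, 0.48, 0.45, 0.44, 0.445, 0.45, 0.46, 0.43]
print(early_stopping_round(losses, patience=3))   # 4
```

With `patience=3` the search stops at round 7 and never sees the later dip at round 8. A larger patience trades extra training time for a chance to find such late improvements.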
Why is XGBoost so powerful?
● Of the 29 challenge-winning solutions published on Kaggle's blog in 2015, 17 used XGBoost (deep neural
nets: 11)
● In KDDCup 2015, every winning team in the top 10 used it.
● Using it can put you straight into the leaderboard top 10
Scalability lets data scientists process hundreds of millions of examples
on a desktop.
● OpenMP CPU multi-threading
● DMatrix
● Cache-aware and sparsity-aware
Column Block for Parallel Learning
The most time-consuming part of tree learning is getting the data into sorted
order.
Data is stored in in-memory blocks in compressed column format, with each column sorted by the
corresponding feature value. Block compression and block sharding.
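The "sort once, reuse for every split" idea can be sketched in plain Python. Each column's row order is computed once, and every split search is then a single linear scan that accumulates gradient statistics. This simplified sketch assumes dense list-of-lists data and reuses the split-gain formula from the XGBoost paper. It does not model block compression or sharding.

```python
def presort_columns(X):
    """Compute, once, the row order of every feature column (the 'column block' idea)."""
    return [sorted(range(len(X)), key=lambda i: X[i][j]) for j in range(len(X[0]))]

def best_split(X, g, h, sorted_cols, lam=1.0, gamma=0.0):
    """Exact greedy search: one linear scan per presorted column, no re-sorting."""
    G, H = sum(g), sum(h)
    best = (0.0, None, None)                   # (gain, feature index, threshold)
    for j, order in enumerate(sorted_cols):
        gl = hl = 0.0
        for k, i in enumerate(order[:-1]):
            gl += g[i]                         # running gradient sum of the left child
            hl += h[i]                         # running hessian sum of the left child
            nxt = order[k + 1]
            if X[i][j] == X[nxt][j]:
                continue                       # cannot split between equal values
            gr, hr = G - gl, H - hl
            gain = 0.5 * (gl**2 / (hl + lam) + gr**2 / (hr + lam) - G**2 / (H + lam)) - gamma
            if gain > best[0]:
                best = (gain, j, (X[i][j] + X[nxt][j]) / 2)
    return best

X = [[1, 10], [2, 40], [3, 20], [4, 30]]
g = [-0.5, -0.5, 0.5, 0.5]                     # logistic-loss gradients at p = 0.5 for y = [1, 1, 0, 0]
h = [0.25] * 4
cols = presort_columns(X)                      # sorted once, reusable for every node and tree
print(best_split(X, g, h, cols))               # best split: feature 0 at threshold 2.5
```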
Results
Use it in Python
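from xgboost import XGBClassifier

# n_jobs and random_state are the scikit-learn-style names for the older nthread and seed arguments.
xgb_model = XGBClassifier(learning_rate=0.1, n_estimators=1000,
                          max_depth=5, min_child_weight=1, gamma=0, subsample=0.8,
                          colsample_bytree=0.8, objective='binary:logistic', n_jobs=8,
                          scale_pos_weight=1, random_state=27)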
● gamma: Minimum loss reduction required to make a further partition on a leaf node of the tree.
● min_child_weight: Minimum sum of instance weight (hessian) needed in a child.
● colsample_bytree: Subsample ratio of columns when constructing each tree.
Ensemble in Kaggle
Voting ensembles, Weighted majority vote, Bagged Perceptrons, Rank
averaging, Historical ranks, Stacking & Blending (Netflix)
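Rank averaging, one of the techniques listed above, is useful when models output scores on different scales. Each model's predictions are replaced by their ranks, the ranks are averaged, and the result is rescaled to [0, 1]. Here is a minimal sketch with made-up scores.

```python
def to_ranks(scores):
    """Replace each score by its rank (0 = smallest); tied scores share their average rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    k = 0
    while k < len(order):
        m = k
        while m + 1 < len(order) and scores[order[m + 1]] == scores[order[k]]:
            m += 1                               # extend the run of tied scores
        for idx in order[k:m + 1]:
            ranks[idx] = (k + m) / 2
        k = m + 1
    return ranks

def rank_average(*model_scores):
    """Average the per-model ranks, then rescale the result to [0, 1]."""
    rank_lists = [to_ranks(s) for s in model_scores]
    n = len(model_scores[0])
    avg = [sum(r[i] for r in rank_lists) / len(rank_lists) for i in range(n)]
    lo, hi = min(avg), max(avg)
    return [(a - lo) / (hi - lo) if hi > lo else 0.5 for a in avg]

model_a = [0.10, 0.40, 0.35, 0.80]   # hypothetical well-spread probabilities
model_b = [0.52, 0.51, 0.53, 0.60]   # hypothetical squashed probabilities
print(rank_average(model_a, model_b))
```

Averaging the raw probabilities here would let `model_a` dominate simply because its scores are more spread out. Ranks give both models an equal say.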
Ensemble in Kaggle
Image classification competition
● Voting ensemble of around 30 convnets. The best single model scored
0.93170; the final ensemble scored 0.94120.
No Free Lunch
An ensemble is usually much better than a single learner.
Bias-variance tradeoff → boost it (to lower bias) or average/vote it (to lower variance).
● Hard to interpret, like DNNs and non-linear SVMs
● No ensemble method consistently outperforms all other ensemble methods
Selecting some of the base learners, instead of using all of them, to compose an
ensemble is often the better choice: selective ensembles
XGBoost (tabular data) vs. Deep Learning (larger and more complex data, harder to tune)
References
● Alexey Natekin and Alois Knoll, "Gradient boosting machines, a tutorial"
● Tianqi Chen and Carlos Guestrin, "XGBoost: A Scalable Tree Boosting System"
● NTU CMLab learning tutorials: http://www.cmlab.csie.ntu.edu.tw/~cyy/learning/tutorials/
● MLWave, "Kaggle Ensembling Guide": http://mlwave.com/kaggle-ensembling-guide/
