Valencian Summer School in Machine Learning
3rd edition
September 14-15, 2017
BigML, Inc 2
Ensembles
Making trees unstoppable
Poul Petersen
CIO, BigML, Inc
What is an Ensemble?
• Rather than build a single model…
• Combine the output of several typically “weaker” models into
a powerful ensemble…
• Q1: Why is this necessary?
• Q2: How do we build “weaker” models?
• Q3: How do we “combine” models?
No Model is Perfect
• A given ML algorithm may simply not be able to exactly
model the “real solution” of a particular dataset.
• Try to fit a line to a curve
• Even if the model is very capable, the “real solution” may be
elusive
• Decision trees / neural networks can model any decision boundary with
enough training data, but finding the optimal solution is NP-hard
• Practical algorithms involve random processes and may
arrive at different, yet equally good, “solutions” depending
on the starting conditions, local optima, etc.
• If that wasn’t bad enough…
No Data is Perfect
• Not enough data!
• Always working with finite training data
• Therefore, every “model” is an approximation of the “real
solution” and there may be several good approximations.
• Anomalies / Outliers
• The model is trying to generalize from discrete training
data.
• Outliers can “skew” the model, by overfitting
• Mistakes in your data
• Does the model have to do everything for you?
• But really, there are always mistakes in your data
Ensemble Techniques
• Key Idea:
• By combining several good “models”, the combination
may be closer to the best possible “model”
• We want to ensure diversity: it's not useful to use an
ensemble of 100 models that are all the same
• Training Data Tricks
• Build several models, each with only some of the data
• Introduce randomness directly into the algorithm
• Add training weights to “focus” the additional models on
the mistakes made
• Prediction Tricks
• Model the mistakes
• Model the output of several different algorithms
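The key idea, that many diverse, individually weak models combine into a stronger one, can be sketched with simulated voters. The accuracies and the 25-model count below are hypothetical, chosen only to make the effect visible:

```python
import random

random.seed(0)

def weak_prediction(truth, accuracy=0.7):
    """A simulated weak model: returns the correct class with probability `accuracy`."""
    return truth if random.random() < accuracy else 1 - truth

def majority_vote(votes):
    # Plurality over two classes: 1 wins if more than half the votes are 1
    return 1 if sum(votes) > len(votes) / 2 else 0

n_trials = 10_000
single_correct = 0
ensemble_correct = 0
for _ in range(n_trials):
    truth = random.choice([0, 1])
    votes = [weak_prediction(truth) for _ in range(25)]  # 25 diverse weak models
    single_correct += votes[0] == truth
    ensemble_correct += majority_vote(votes) == truth

print(single_correct / n_trials)    # close to 0.70: one weak model alone
print(ensemble_correct / n_trials)  # well above 0.9: the majority vote
```

Because the simulated models err independently (maximal diversity), the majority vote is far more accurate than any single voter; 100 identical models would gain nothing.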
Simple Example
Partition the data… then model each partition…
For predictions, use the model for the same partition
Decision Forest

DATASET → SAMPLE 1…4 → MODEL 1…4 → PREDICTION 1…4 → COMBINER → PREDICTION
Ensembles Demo #1
Decision Forest Config
• Individual tree parameters are still available
• Balanced objective, Missing splits, Node Depth, etc.
• Number of models: How many trees to build
• Sampling options:
• Deterministic / Random
• Replacement:
• Allows sampling the same instance more than once
• A full-size sample drawn with replacement contains ≈ 63.21% (1 − 1/e) of the distinct instances
• "Full size" samples with zero covariance (a good thing)
• At prediction time
• Combiner…
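The ≈ 63.21% figure can be checked empirically; it is the expected fraction of distinct instances (1 − 1/e) in a full-size sample drawn with replacement:

```python
import random

random.seed(1)
n = 10_000
# Draw n instances WITH replacement from a dataset of n instances, then count
# how many distinct instances ended up in the sample.
sample = [random.randrange(n) for _ in range(n)]
distinct_fraction = len(set(sample)) / n
print(distinct_fraction)  # ≈ 0.632, i.e. 1 - 1/e
```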
Quick Review

Classification: the label is the categorical "action" field:
animal | state | … | proximity | action
tiger | hungry | … | close | run
elephant | happy | … | far | take picture
… | … | … | … | …

Regression: the label is the numeric "min_kmh" field:
animal | state | … | proximity | min_kmh
tiger | hungry | … | close | 70
hippo | angry | … | far | 10
… | … | … | … | …
Ensemble Combiners
• Regression: average of the per-tree predictions, plus an expected error
• Classification:
• Plurality: majority wins
• Confidence Weighted: majority wins, but each vote is weighted by its confidence
• Probability Weighted: each tree votes with the class distribution at its leaf node
• K Threshold: the specified class wins only if at least k trees vote for it. For example, predict "True" if and only if at least 9 out of 10 trees vote "True"
• Confidence Threshold: the specified class wins only if the minimum confidence is met
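A sketch of the first two combiners, with hypothetical votes and confidences (not BigML's internal implementation):

```python
from collections import Counter

def plurality(votes):
    """Plurality combiner: the class predicted by the most trees wins."""
    return Counter(votes).most_common(1)[0][0]

def confidence_weighted(votes_with_conf):
    """Each tree's vote counts proportionally to its confidence."""
    totals = Counter()
    for cls, conf in votes_with_conf:
        totals[cls] += conf
    return totals.most_common(1)[0][0]

print(plurality(["True", "False", "True"]))   # "True": 2 votes to 1
print(confidence_weighted([("True", 0.51), ("False", 0.95), ("True", 0.40)]))
# "False": its single confident vote (0.95) outweighs 0.51 + 0.40 = 0.91
```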
Ensembles Demo #2
Outlier Example

Diameter | Color | Shape | Fruit
4 | red | round | plum
5 | red | round | apple
5 | red | round | apple
6 | red | round | plum
7 | red | round | apple

What is a round, red 6cm fruit?
A model built on All Data predicts "plum"; models built on Sample 1, Sample 2, and Sample 3 predict "plum", "apple", "apple" → the ensemble predicts "apple".
Random Decision Forest

DATASET → SAMPLE 1…4 → MODEL 1…4 (random feature candidates at each split) → PREDICTION 1…4 → COMBINER → PREDICTION
RDF Config
• Individual tree parameters are still available
• Balanced objective, Missing splits, Node Depth, etc.
• Decision Forest parameters are still available
• Number of models, Sampling, etc.
• Random candidates:
• The number of features to consider at each split
Ensembles Demo #3
Boosting

Iteration 1: train MODEL 1 to predict the sale price:

ADDRESS | BEDS | BATHS | SQFT | LOT SIZE | YEAR BUILT | LATITUDE | LONGITUDE | LAST SALE PRICE
1522 NW Jonquil | 4 | 3 | 2424 | 5227 | 1991 | 44.594828 | -123.269328 | 360000
7360 NW Valley Vw | 3 | 2 | 1785 | 25700 | 1979 | 44.643876 | -123.238189 | 307500
4748 NW Veronica | 5 | 3.5 | 4135 | 6098 | 2004 | 44.5929659 | -123.306916 | 600000
411 NW 16th | 3 | | 2825 | 4792 | 1938 | 44.570883 | -123.272113 | 435350

MODEL 1 → PREDICTED SALE PRICE: 360750 | 306875 | 587500 | 435350
ERROR: 750 | -625 | -12500 | 0

Iteration 2: train MODEL 2 on the same inputs, with MODEL 1's ERROR as the objective:

ADDRESS | BEDS | BATHS | SQFT | LOT SIZE | YEAR BUILT | LATITUDE | LONGITUDE | ERROR
1522 NW Jonquil | 4 | 3 | 2424 | 5227 | 1991 | 44.594828 | -123.269328 | 750
7360 NW Valley Vw | 3 | 2 | 1785 | 25700 | 1979 | 44.643876 | -123.238189 | 625
4748 NW Veronica | 5 | 3.5 | 4135 | 6098 | 2004 | 44.5929659 | -123.306916 | 12500
411 NW 16th | 3 | | 2825 | 4792 | 1938 | 44.570883 | -123.272113 | 0

MODEL 2 → PREDICTED ERROR: 750 | 625 | 12393.83333 | 6879.67857

Why stop at one iteration?

"Hey Model 1, what do you predict is the sale price of this home?"
"Hey Model 2, how much error do you predict Model 1 just made?"
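The two-model dialogue generalizes to many iterations. Below is a minimal sketch (hypothetical, not BigML's implementation) using one-split regression stumps on the sqft feature, where each new model fits the previous model's errors:

```python
# Gradient-boosting sketch on the housing example: "Model k predicts
# Model k-1's error", and the final prediction is the sum of contributions.
sqft  = [2424, 1785, 4135, 2825]
price = [360000, 307500, 600000, 435350]

def fit_stump(xs, ys):
    """Return the single-split predictor on xs that minimizes squared error."""
    best = None
    for t in xs:
        left  = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if not left or not right:
            continue
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((y - (ml if x <= t else mr)) ** 2 for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t, ml, mr)
    _, t, ml, mr = best
    return lambda x: ml if x <= t else mr

pred = [0.0] * len(price)
for _ in range(50):                               # iterations
    resid = [y - p for y, p in zip(price, pred)]  # Model k's objective
    stump = fit_stump(sqft, resid)
    pred = [p + 0.5 * stump(x)                    # learning rate 0.5
            for p, x in zip(pred, sqft)]

print([round(p) for p in pred])  # close to the actual sale prices
```

After enough iterations the summed stump contributions track the training prices closely; the learning rate damps each step, trading speed of fit against overfitting.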
Boosting

Iteration 1: DATASET → MODEL 1 → PREDICTION 1
Iteration 2: DATASET 2 → MODEL 2 → PREDICTION 2
Iteration 3: DATASET 3 → MODEL 3 → PREDICTION 3
Iteration 4: DATASET 4 → MODEL 4 → PREDICTION 4
etc…
Final PREDICTION = SUM of the per-iteration predictions
Boosting Config
• Number of iterations - similar to number of models for DF/RDF
• Iterations can be limited with Early Stopping:
• Early out of bag: tests with the out-of-bag samples
Boosting Config

Early out of bag, illustrated: at each iteration the model trains on a sample of the dataset, and the "OUT OF BAG" SAMPLES (the instances left out of that sample) are used to test whether another iteration still helps:

Iteration 1: DATASET → MODEL 1 → PREDICTION 1
Iteration 2: DATASET 2 → MODEL 2 → PREDICTION 2
Iteration 3: DATASET 3 → MODEL 3 → PREDICTION 3
Iteration 4: DATASET 4 → MODEL 4 → PREDICTION 4
etc…
Final PREDICTION = SUM
Boosting Config
• Early holdout: tests with a portion of the dataset
• None: performs all iterations. Note: in general, it is better to use a high number of iterations and let the early stopping work.
Iterations

Boosted Ensemble #1: iterations 1 2 3 … 50; the early stop triggers before the # iterations limit.
This is OK: the early stop means the iterative improvement is small, and we have "converged" before being forcibly stopped by the # iterations limit.

Boosted Ensemble #2: iterations 1 2 3 … 50; the # iterations limit is hit before any early stop.
This is NOT OK: the hard limit on iterations stopped the boosting before there were enough iterations to achieve the best quality.
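The early-stop logic can be sketched as follows, with hypothetical held-out error values and tolerance:

```python
# Early-stopping sketch: keep iterating while the held-out error still
# improves by more than a small tolerance.
holdout_error = [10.0, 6.0, 4.0, 3.1, 3.0, 2.99, 2.99, 2.99]
tolerance = 0.05

stopped_at = len(holdout_error)
for i in range(1, len(holdout_error)):
    if holdout_error[i - 1] - holdout_error[i] < tolerance:
        stopped_at = i  # improvement too small: stop early
        break
print(stopped_at)  # stops after iteration 5: 3.0 -> 2.99 improves by only 0.01
```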
Boosting Config
• Learning Rate: controls how aggressively boosting will fit the data
• Larger values: possibly a quicker fit, but a greater risk of overfitting
• You can combine sampling with Boosting!
• Samples with Replacement
• Add Randomize
Boosting Randomize

The same boosting flow, but with Randomize enabled each iteration's model considers only a random subset of the features at each split (as in RDF):

Iteration 1: DATASET → MODEL 1 → PREDICTION 1
Iteration 2: DATASET 2 → MODEL 2 → PREDICTION 2
Iteration 3: DATASET 3 → MODEL 3 → PREDICTION 3
Iteration 4: DATASET 4 → MODEL 4 → PREDICTION 4
etc…
Final PREDICTION = SUM
Boosting Config
• Number of iterations - similar to number of models for DF/RDF
• Iterations can be limited with Early Stopping:
• Early out of bag: tests with the out-of-bag samples
• Early holdout: tests with a portion of the dataset
• None: performs all iterations. Note: In general, it is better to
use a high number of iterations and let the early stopping
work.
• Learning Rate: Controls how aggressively boosting will fit the data:
• Larger values ~ maybe quicker fit, but risk of overfitting
• You can combine sampling with Boosting!
• Samples with Replacement
• Add Randomize
• Individual tree parameters are still available
• Balanced objective, Missing splits, Node Depth, etc.
Ensembles Demo #4
Wait a Second… what about classification?

pregnancies | plasma glucose | blood pressure | triceps skin thickness | insulin | bmi | diabetes pedigree | age | diabetes
6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | TRUE
1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | FALSE
8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | TRUE
1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | FALSE

MODEL 1 → predicted diabetes: TRUE | TRUE | FALSE | FALSE
ERROR: ? | ? | ? | ?

…we could try encoding the classes as numbers:

pregnancies | plasma glucose | blood pressure | triceps skin thickness | insulin | bmi | diabetes pedigree | age | diabetes
6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 1
1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | 0
8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 1
1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | 0

MODEL 1 → predicted diabetes: 1 | 1 | 0 | 0
ERROR: 0 | -1 | 1 | 0
Wait a Second… but then what about multiple classes?

pregnancies | plasma glucose | blood pressure | triceps skin thickness | insulin | bmi | diabetes pedigree | age | favorite color
6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | RED
1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | GREEN
8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | BLUE
1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | RED

MODEL 1 → predicted favorite color: BLUE | GREEN | RED | GREEN
ERROR: ? | ? | ? | ?
Boosting Classification

Iteration 1: for each class, train a one-vs-rest model of that class's probability:

pregnancies | plasma glucose | blood pressure | triceps skin thickness | insulin | bmi | diabetes pedigree | age | favorite color
6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | RED
1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | GREEN
8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | BLUE
1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | RED

MODEL 1 RED/NOT RED → Class RED probability: 0.9 | 0.7 | 0.46 | 0.12
Class RED error: 0.1 | -0.7 | 0.54 | -0.12
MODEL 1 BLUE/NOT BLUE → Class BLUE probability: 0.1 | 0.3 | 0.54 | 0.88
Class BLUE error: -0.1 | 0.7 | -0.54 | 0.12
…and repeat for each class.

Iteration 2: for each class, train a model to predict iteration 1's error:

pregnancies | plasma glucose | blood pressure | triceps skin thickness | insulin | bmi | diabetes pedigree | age | ERROR
6 | 148 | 72 | 35 | 0 | 33.6 | 0.627 | 50 | 0.1
1 | 85 | 66 | 29 | 0 | 26.6 | 0.351 | 31 | -0.7
8 | 183 | 64 | 0 | 0 | 23.3 | 0.672 | 32 | 0.54
1 | 89 | 66 | 23 | 94 | 28.1 | 0.167 | 21 | -0.12

MODEL 2 RED/NOT RED ERR → PREDICTED ERROR: 0.05 | -0.54 | 0.32 | -0.22
…and repeat for each class at each iteration.
Boosting Classification

Iteration 1: DATASET → MODELS 1 (per class) → PREDICTIONS 1 (per class)
Iteration 2: DATASETS 2 (per class) → MODELS 2 (per class) → PREDICTIONS 2 (per class)
Iteration 3: DATASETS 3 (per class) → MODELS 3 (per class) → PREDICTIONS 3 (per class)
Iteration 4: DATASETS 4 (per class) → MODELS 4 (per class) → PREDICTIONS 4 (per class)
etc…
Combiner → PROBABILITY per class
Ensembles Demo #5
Stacked Generalization

SOURCE → DATASET → MODEL / ENSEMBLE / LOGISTIC REGRESSION (each trained on the dataset)
→ BATCH PREDICTION (one per model) → EXTENDED DATASET (the original fields plus each model's prediction)
→ LOGISTIC REGRESSION trained on the extended dataset
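The "extended dataset" step can be sketched with two hypothetical toy base models; each base model's prediction becomes a new feature for the level-1 logistic regression:

```python
# Stacking sketch: the level-0 models below are crude threshold rules,
# standing in for a real model / ensemble / logistic regression.
rows = [(0.2, 0), (0.4, 0), (0.6, 1), (0.8, 1)]  # (feature, true class)

def model_a(x):   # hypothetical base model
    return 1 if x > 0.5 else 0

def model_b(x):   # a second, different base model
    return 1 if x > 0.3 else 0

# EXTENDED DATASET: original field + each model's batch prediction + objective
extended = [(x, model_a(x), model_b(x), y) for x, y in rows]
print(extended)
```

A level-1 model trained on `extended` learns how much to trust each base model's vote, rather than treating them equally.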
Which Ensemble Method?
• The one that works best!
• Ok, but seriously. Did you evaluate?
• For "large" / "complex" datasets
• Use DF/RDF with deeper node threshold
• Even better, use Boosting with more iterations
• For "noisy" data
• Boosting may overfit
• RDF preferred
• For "wide" data
• Randomize features (RDF) will be quicker
• For "easy" data
• A single model may be fine
• Bonus: also has the best interpretability!
• For classification with "large" number of classes
• Boosting will be slower
• For "general" data
• DF/RDF likely better than a single model or Boosting.
• Boosting will be slower since the models are processed serially
Too Many Parameters?
• How many trees?
• How many nodes?
• Missing splits?
• Random candidates?
• Too many parameters?
SMACdown!
Summary
• Models have shortcomings: ability to fit, NP-hard, etc
• Data has shortcomings: not enough, outliers, mistakes, etc
• Ensemble Techniques can improve on single models
• Sampling: partitioning, Decision Tree bagging
• Adding Randomness: RDF
• Modeling the Error: Boosting
• Modeling the Models: Stacking
• Guidelines for knowing which one might work best in a given
situation
Logistic Regressions
Modeling probabilities
Logistic Regression

Potential Confusion: Logistic Regression is a classification algorithm.
• Classification implies a discrete objective. How can this be a regression?
• Why do we need another classification algorithm?
• more questions…
Linear Regression

Polynomial Regression
Regression

Key Take-Away: Regression is the process of "fitting" a function to the data.
• Linear Regression: β₀ + β₁·(INPUT) ≈ OBJECTIVE
• Quadratic Regression: β₀ + β₁·(INPUT) + β₂·(INPUT)² ≈ OBJECTIVE
• Decision Tree Regression: DT(INPUT) ≈ OBJECTIVE
• Problem:
• What if we want to do a classification problem: T/F or 1/0?
• What function can we fit to discrete data?
Discrete Data Function?
Logistic Function

f(x) = 1 / (1 + e⁻ˣ)

Goal: as x → −∞, f(x) → 0; as x → ∞, f(x) → 1
• Looks promising, but still not "discrete"
• What about the "green" in the middle?
• Let's change the problem…
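A quick check of the limits above (`logistic` here is a local helper, not a BigML API):

```python
import math

def logistic(x):
    """The logistic function: squashes any real x into the interval (0, 1)."""
    return 1 / (1 + math.exp(-x))

print(logistic(-10))  # ≈ 0, the x → -∞ limit
print(logistic(0))    # 0.5, the middle of the "green" region
print(logistic(10))   # ≈ 1, the x → ∞ limit
```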
Modeling Probabilities

P ≈ 0 on the left, 0 < P < 1 in between, P ≈ 1 on the right
Logistic Regression

Clarification: LR is a classification algorithm… that uses a regression… to model the probability of the discrete objective.
Caveats:
• Assumes that the output is linearly related to the "predictors"
• What? (hang in there…)
• Sometimes we can "fix" this with feature engineering
• Question: how do we "fit" the logistic function to real data?
Logistic Regression

P(x) = 1 / (1 + e^−(β₀ + β₁x))

• Given training data consisting of inputs x and probabilities P:
• Solve for β₀ and β₁ to fit the logistic function
• β₀ is the "intercept"; β₁ is the "coefficient"
• How? The inverse of the logistic function is called the "logit":

ln( P(x) / (1 − P(x)) ) = β₀ + β₁x

• In which case solving is now a linear regression
• But this is only one dimension, that is, one feature x…
Logistic Regression

For "i" dimensions, X = [x₁, x₂, ⋯, xᵢ], we solve:

P(X) = 1 / (1 + e^−f(X))

where:

f(X) = β₀ + 𝞫·X = β₀ + β₁x₁ + ⋯ + βᵢxᵢ
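A direct transcription of these two formulas, with hypothetical coefficients and inputs:

```python
import math

def logistic_predict(beta0, betas, xs):
    """P(X) = 1 / (1 + e^-f(X)) with f(X) = b0 + b1*x1 + ... + bi*xi."""
    f = beta0 + sum(b * x for b, x in zip(betas, xs))
    return 1 / (1 + math.exp(-f))

# Hypothetical coefficients and inputs: f(X) = -1 + 2*1.0 + 0.5*2.0 = 2
p = logistic_predict(-1.0, [2.0, 0.5], [1.0, 2.0])
print(p)  # ≈ 0.88
```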
Interpreting Coefficients

• LR computes the intercept β₀ and a coefficient βⱼ for each feature xⱼ:
f(X) = β₀ + ⋯ + βⱼxⱼ + ⋯ + βⱼ₊₁[xⱼ ≡ Missing]
• negative βⱼ → negatively correlated: xⱼ↑ then P(X)↓
• positive βⱼ → positively correlated: xⱼ↑ then P(X)↑
• "larger" βⱼ → more impact: a small change in xⱼ gives a large change in P(X)
• "smaller" βⱼ → less impact: even a large change in xⱼ gives a small change in P(X)
• βⱼ "size" should not be confused with field importance
• Can include a coefficient for "missing" (if enabled): the βⱼ₊₁[xⱼ ≡ Missing] term above
• Binary classification (true/false) coefficients are complementary: P(True) ≡ 1 − P(False)
LR Demo #1
LR Parameters
1. Default Numeric: Replaces missing numeric values
2. Missing Numeric: Adds a field for missing numerics
3. Stats: Extended statistics, ex: p-value (runs slower)
4. Bias: Enables/Disables the intercept term - 𝛽₀
• Don’t disable this…
5. Regularization: Reduces over-fitting by minimizing 𝛽𝑗
• L1: prefers reducing individual coefficients
• L2 (default): prefers reducing all coefficients
6. Strength "C": Higher values reduce regularization
7. EPS: The minimum error between steps to stop
Larger values stop earlier but quality may be less
8. Auto-scaling: Ensures that all features contribute equally
• Don’t change this unless you have a specific reason
LR Questions

Questions:
• How do we handle multiple classes?
• A binary class (True/False) only needs a solution for one class: P(True) ≡ 1 − P(False)
• What about non-numeric inputs?
• Text/Items fields
• Categorical fields
LR - Multi Class

• Instead of a binary class, e.g. [ true, false ], we have multi-class, e.g. [ red, green, blue, … ]
• "k" classes: C = [ c₁, c₂, ⋯, c_k ]
• Solve a one-vs-rest LR for each class:

ln( P(c₁) / (1 − P(c₁)) ) = β₁,₀ + 𝞫₁·X
ln( P(c₂) / (1 − P(c₂)) ) = β₂,₀ + 𝞫₂·X
⋯
ln( P(c_k) / (1 − P(c_k)) ) = β_k,₀ + 𝞫_k·X

• Result: 𝞫ⱼ for each class cⱼ
• Apply a combiner to ensure all probabilities add to 1
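One simple combiner is to renormalize the per-class one-vs-rest probabilities so they sum to 1. A sketch with hypothetical outputs; BigML's actual combiner may differ:

```python
def normalize(probs):
    """Combiner sketch: rescale per-class probabilities to sum to 1."""
    total = sum(probs.values())
    return {c: p / total for c, p in probs.items()}

raw = {"red": 0.6, "green": 0.3, "blue": 0.3}  # hypothetical one-vs-rest outputs
result = normalize(raw)
print(result)  # {'red': 0.5, 'green': 0.25, 'blue': 0.25}
```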
LR - Field Codings

• LR is expecting numeric values to perform the regression.
• How do we handle categorical values, or text?

One-hot encoding:
Class | color=red | color=blue | color=green | color=NULL
red | 1 | 0 | 0 | 0
blue | 0 | 1 | 0 | 0
green | 0 | 0 | 1 | 0
MISSING | 0 | 0 | 0 | 1

• Only one feature is "hot" for each class
• This is the default
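A sketch of one-hot encoding matching the table above (a hypothetical helper, not the BigML implementation):

```python
def one_hot(value, classes):
    """One-hot encode `value` against `classes`; anything else hits the NULL column."""
    row = [0] * (len(classes) + 1)
    if value in classes:
        row[classes.index(value)] = 1
    else:
        row[-1] = 1  # the color=NULL column for missing values
    return row

classes = ["red", "blue", "green"]
print(one_hot("blue", classes))  # [0, 1, 0, 0]
print(one_hot(None, classes))    # [0, 0, 0, 1]
```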
LR - Field Codings

Dummy Encoding:
Class | color_1 | color_2 | color_3
*red* | 0 | 0 | 0
blue | 1 | 0 | 0
green | 0 | 1 | 0
MISSING | 0 | 0 | 1

• Chooses a *reference class* (here, red)
• Requires one less degree of freedom
LR - Field Codings

Contrast Encoding:
Class | field | "influence"
red | 0.5 | positive
blue | -0.25 | negative
green | -0.25 | negative
MISSING | 0 | excluded

• Field values must sum to zero
• Allows comparison between classes
LR - Field Codings

Which one to use?
• One-hot is the default
• Use this unless you have a specific need
• Dummy
• Use when there is a control group in mind, which becomes the reference class
• Contrast
• Allows for testing specific hypotheses about relationships
• Ex: customers give a "rating" of bad / ok / good

rating | Contrast Encoding
bad | -0.66
ok | 0.33
good | 0.33
Hypothesis: a good and an ok review have the same impact, but a bad review has a negative impact twice as great.

rating | Contrast Encoding
bad | -0.5
ok | 0
good | 0.5
Hypothesis: a good and a bad review have an equal but opposite impact, while an ok rating has no impact.
LR - Field Codings

Text / Items
• Text/Items field types are handled by creating a field for each text token/item and setting it to 1 or 0:

Text | "hippo" | "safari" | "zebra"
"we saw hippos and zebras…" | 1 | 0 | 1
"The best safari for seeing zebras" | 0 | 1 | 1
"The Oregon coast is rainy in winter" | 0 | 0 | 0
"Have you ever tried a hippo burger" | 1 | 0 | 0
LR Demo #2
Curvilinear LR
• Logistic Regression is expecting a linear relationship
between the features and the objective
• Remember - it’s a linear regression under the hood
• Non-linear relationships are actually pretty common in natural datasets
• And they will impact model quality
• This can be addressed by adding non-linear
transformations to the features
• Knowing which transformations requires
• domain knowledge
• experimentation
• both
Curvilinear LR

Instead of: β₀ + β₁x₁
we could add a feature: β₀ + β₁x₁ + β₂x₂
where: x₂ ≡ x₁²

It is possible to add any higher-order terms or other functions to match the shape of the data.
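A tiny illustration of why the squared feature helps, assuming data generated by y = x² (hypothetical numbers):

```python
# Data from y = x², which no straight line in x alone can fit.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [4.0, 1.0, 0.0, 1.0, 4.0]

# Engineer the extra feature x2 ≡ x1²; each row becomes (x1, x2).
rows = [(x, x * x) for x in xs]

# In the extended feature space, y is exactly linear: y = 0 + 0·x1 + 1·x2
assert all(y == x2 for (x1, x2), y in zip(rows, ys))
print(rows)
```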
LR Demo #3
LR vs DT

Logistic Regression:
• Expects a "smooth" linear relationship with the predictors
• LR is concerned with the probability of a discrete outcome
• Lots of parameters to get wrong: regularization, scaling, codings
• Slightly less prone to over-fitting
• Because it fits a shape, might work better when less data is available

Decision Tree:
• Adapts well to ragged non-linear relationships
• No concern: classification, regression, multi-class are all fine
• Virtually parameter free
• Slightly more prone to over-fitting
• Prefers surfaces parallel to the parameter axes, but given enough data will discover any shape
LR Demo #4
Summary
• Logistic Regression is a classification algorithm that
models the probabilities of each class
• How the algorithm works and why this is important
• Expects a linear relationship between the features
and the objective, and how to fix it
• Categorical encodings
• LR outputs a set of coefficients, and how to interpret them
• Scale relates to impact
• Sign relates to direction of impact
• Guidelines for comparing to Decision Trees
VSSML17 L2. Ensembles and Logistic Regressions
Digital Transformation and Process Optimization in ManufacturingBigML, Inc
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationBigML, Inc
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceBigML, Inc
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesBigML, Inc
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector BigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionBigML, Inc
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLBigML, Inc
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLBigML, Inc
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyBigML, Inc
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorBigML, Inc
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsBigML, Inc
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsBigML, Inc
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleBigML, Inc
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIBigML, Inc
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object DetectionBigML, Inc
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image ProcessingBigML, Inc
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureBigML, Inc
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorBigML, Inc
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotBigML, Inc
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...BigML, Inc
 

Plus de BigML, Inc (20)

Digital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in ManufacturingDigital Transformation and Process Optimization in Manufacturing
Digital Transformation and Process Optimization in Manufacturing
 
DutchMLSchool 2022 - Automation
DutchMLSchool 2022 - AutomationDutchMLSchool 2022 - Automation
DutchMLSchool 2022 - Automation
 
DutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML ComplianceDutchMLSchool 2022 - ML for AML Compliance
DutchMLSchool 2022 - ML for AML Compliance
 
DutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective AnomaliesDutchMLSchool 2022 - Multi Perspective Anomalies
DutchMLSchool 2022 - Multi Perspective Anomalies
 
DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector DutchMLSchool 2022 - My First Anomaly Detector
DutchMLSchool 2022 - My First Anomaly Detector
 
DutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly DetectionDutchMLSchool 2022 - Anomaly Detection
DutchMLSchool 2022 - Anomaly Detection
 
DutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in MLDutchMLSchool 2022 - History and Developments in ML
DutchMLSchool 2022 - History and Developments in ML
 
DutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End MLDutchMLSchool 2022 - End-to-End ML
DutchMLSchool 2022 - End-to-End ML
 
DutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven CompanyDutchMLSchool 2022 - A Data-Driven Company
DutchMLSchool 2022 - A Data-Driven Company
 
DutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal SectorDutchMLSchool 2022 - ML in the Legal Sector
DutchMLSchool 2022 - ML in the Legal Sector
 
DutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe StadiumsDutchMLSchool 2022 - Smart Safe Stadiums
DutchMLSchool 2022 - Smart Safe Stadiums
 
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing PlantsDutchMLSchool 2022 - Process Optimization in Manufacturing Plants
DutchMLSchool 2022 - Process Optimization in Manufacturing Plants
 
DutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at ScaleDutchMLSchool 2022 - Anomaly Detection at Scale
DutchMLSchool 2022 - Anomaly Detection at Scale
 
DutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AIDutchMLSchool 2022 - Citizen Development in AI
DutchMLSchool 2022 - Citizen Development in AI
 
Democratizing Object Detection
Democratizing Object DetectionDemocratizing Object Detection
Democratizing Object Detection
 
BigML Release: Image Processing
BigML Release: Image ProcessingBigML Release: Image Processing
BigML Release: Image Processing
 
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your FutureMachine Learning in Retail: Know Your Customers' Customer. See Your Future
Machine Learning in Retail: Know Your Customers' Customer. See Your Future
 
Machine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail SectorMachine Learning in Retail: ML in the Retail Sector
Machine Learning in Retail: ML in the Retail Sector
 
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a LawyerbotML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
ML in GRC: Machine Learning in Legal Automation, How to Trust a Lawyerbot
 
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
ML in GRC: Supporting Human Decision Making for Regulatory Adherence with Mac...
 

Dernier

Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...PrithaVashisht1
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerPavel Šabatka
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionajayrajaganeshkayala
 
Optimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsOptimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsThinkInnovation
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxDwiAyuSitiHartinah
 
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxCCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxdhiyaneswaranv1
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Guido X Jansen
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introductionsanjaymuralee1
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityAggregage
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxVenkatasubramani13
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Vladislav Solodkiy
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?sonikadigital1
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best PracticesDataArchiva
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationGiorgio Carbone
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructuresonikadigital1
 
Rock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxRock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxFinatron037
 

Dernier (16)

Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...Elements of language learning - an analysis of how different elements of lang...
Elements of language learning - an analysis of how different elements of lang...
 
The Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayerThe Universal GTM - how we design GTM and dataLayer
The Universal GTM - how we design GTM and dataLayer
 
CI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual interventionCI, CD -Tools to integrate without manual intervention
CI, CD -Tools to integrate without manual intervention
 
Optimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in LogisticsOptimal Decision Making - Cost Reduction in Logistics
Optimal Decision Making - Cost Reduction in Logistics
 
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptxTINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
TINJUAN PEMROSESAN TRANSAKSI DAN ERP.pptx
 
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptxCCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
CCS336-Cloud-Services-Management-Lecture-Notes-1.pptx
 
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
Persuasive E-commerce, Our Biased Brain @ Bikkeldag 2024
 
Virtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product IntroductionVirtuosoft SmartSync Product Introduction
Virtuosoft SmartSync Product Introduction
 
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for ClarityStrategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
Strategic CX: A Deep Dive into Voice of the Customer Insights for Clarity
 
Mapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptxMapping the pubmed data under different suptopics using NLP.pptx
Mapping the pubmed data under different suptopics using NLP.pptx
 
Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023Cash Is Still King: ATM market research '2023
Cash Is Still King: ATM market research '2023
 
How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?How is Real-Time Analytics Different from Traditional OLAP?
How is Real-Time Analytics Different from Traditional OLAP?
 
5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices5 Ds to Define Data Archiving Best Practices
5 Ds to Define Data Archiving Best Practices
 
Master's Thesis - Data Science - Presentation
Master's Thesis - Data Science - PresentationMaster's Thesis - Data Science - Presentation
Master's Thesis - Data Science - Presentation
 
ChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics InfrastructureChistaDATA Real-Time DATA Analytics Infrastructure
ChistaDATA Real-Time DATA Analytics Infrastructure
 
Rock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptxRock Songs common codes and conventions.pptx
Rock Songs common codes and conventions.pptx
 

VSSML17 L2. Ensembles and Logistic Regressions

  • 1. Valencian Summer School in Machine Learning 3rd edition September 14-15, 2017
  • 2. BigML, Inc 2 Ensembles Making trees unstoppable Poul Petersen CIO, BigML, Inc
  • 3. BigML, Inc 3Ensembles what is an Ensemble? • Rather than build a single model… • Combine the output of several typically “weaker” models into a powerful ensemble… • Q1: Why is this necessary? • Q2: How do we build “weaker” models? • Q3: How do we “combine” models?
  • 4. BigML, Inc 4Ensembles No Model is Perfect • A given ML algorithm may simply not be able to exactly model the “real solution” of a particular dataset. • Try to fit a line to a curve • Even if the model is very capable, the “real solution” may be elusive • DT/NN can model any decision boundary with enough training data, but the solution is NP-hard • Practical algorithms involve random processes and may arrive at different, yet equally good, “solutions” depending on the starting conditions, local optima, etc. • If that wasn’t bad enough…
  • 5. BigML, Inc 5Ensembles No Data is Perfect • Not enough data! • Always working with finite training data • Therefore, every “model” is an approximation of the “real solution” and there may be several good approximations. • Anomalies / Outliers • The model is trying to generalize from discrete training data. • Outliers can “skew” the model by overfitting • Mistakes in your data • Does the model have to do everything for you? • But really, there are always mistakes in your data
  • 6. BigML, Inc 6Ensembles Ensemble Techniques • Key Idea: • By combining several good “models”, the combination may be closer to the best possible “model” • We want to ensure diversity. It’s not useful to use an ensemble of 100 models that are all the same • Training Data Tricks • Build several models, each with only some of the data • Introduce randomness directly into the algorithm • Add training weights to “focus” the additional models on the mistakes made • Prediction Tricks • Model the mistakes • Model the output of several different algorithms
  • 9. BigML, Inc 9Ensembles Simple Example Partition the data… then model each partition… For predictions, use the model for the same partition ?
  • 10. BigML, Inc 10Ensembles Decision Forest MODEL 1 DATASET SAMPLE 1 SAMPLE 2 SAMPLE 3 SAMPLE 4 MODEL 2 MODEL 3 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 PREDICTION COMBINER
  • 12. BigML, Inc 12Ensembles Decision Forest Config • Individual tree parameters are still available • Balanced objective, Missing splits, Node Depth, etc. • Number of models: How many trees to build • Sampling options: • Deterministic / Random • Replacement: • Allows sampling the same instance more than once • A full-size sample then effectively contains ≈ 63.21% of the distinct instances • “Full size” samples with zero covariance (a good thing) • At prediction time • Combiner…
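The ≈ 63.21% figure on this slide follows from bootstrap sampling: a full-size sample drawn with replacement contains on average 1 − 1/e ≈ 63.21% of the distinct instances. A minimal stdlib sketch demonstrating this (illustration only, not BigML code):

```python
import random

def bootstrap_unique_fraction(n, seed=0):
    """Draw a bootstrap sample of size n (sampling with replacement)
    and return the fraction of distinct original instances it holds."""
    rng = random.Random(seed)
    sample = [rng.randrange(n) for _ in range(n)]
    return len(set(sample)) / n

# For large n the fraction approaches 1 - 1/e ≈ 0.6321.
print(round(bootstrap_unique_fraction(100_000), 3))
```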
  • 13. BigML, Inc 13Ensembles Quick Review animal state … proximity action tiger hungry … close run elephant happy … far take picture … … … … … Classification animal state … proximity min_kmh tiger hungry … close 70 hippo angry … far 10 … …. … … … Regression label
  • 14. BigML, Inc 14Ensembles Ensemble Combiners • Regression: Average of the predictions and expected error • Classification: • Plurality - majority wins. • Confidence Weighted - majority wins but each vote is weighted by the confidence. • Probability Weighted - each tree votes the distribution at its leaf node. • K Threshold - votes the specified class only if the required number of trees agree. For example, allowing a “True” vote if and only if at least 9 out of 10 trees vote “True”. • Confidence Threshold - only votes the specified class if the minimum confidence is met.
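Two of these combiners can be sketched directly; the (class, confidence) vote pairs below are hypothetical tree outputs, not BigML API objects:

```python
from collections import defaultdict

def plurality(votes):
    """Majority wins: each (class, confidence) vote counts once."""
    counts = defaultdict(int)
    for cls, _ in votes:
        counts[cls] += 1
    return max(counts, key=counts.get)

def confidence_weighted(votes):
    """Majority wins, but each vote is weighted by its confidence."""
    weights = defaultdict(float)
    for cls, conf in votes:
        weights[cls] += conf
    return max(weights, key=weights.get)

# One very confident "apple" vote can outweigh two weak "plum" votes.
votes = [("apple", 0.95), ("plum", 0.40), ("plum", 0.35)]
print(plurality(votes))            # -> plum
print(confidence_weighted(votes))  # -> apple
```

Note how a single very confident vote flips the confidence-weighted result relative to plain plurality.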
  • 16. BigML, Inc 16Ensembles Outlier Example Diameter Color Shape Fruit 4 red round plum 5 red round apple 5 red round apple 6 red round plum 7 red round apple All Data: “plum” Sample 2: “apple” Sample 3: “apple” Sample 1: “plum” }“apple” What is a round, red 6cm fruit?
  • 17. BigML, Inc 17Ensembles Random Decision Forest MODEL 1 DATASET SAMPLE 1 SAMPLE 2 SAMPLE 3 SAMPLE 4 MODEL 2 MODEL 3 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 SAMPLE 1 PREDICTION COMBINER
  • 18. BigML, Inc 18Ensembles RDF Config • Individual tree parameters are still available • Balanced objective, Missing splits, Node Depth, etc. • Decision Forest parameters still available • Number of models, sampling, etc. • Random candidates: • The number of features to consider at each split
  • 20. BigML, Inc 20Ensembles Boosting ADDRESS BEDS BATHS SQFT LOT SIZE YEAR BUILT LATITUDE LONGITUDE LAST SALE PRICE 1522 NW Jonquil 4 3 2424 5227 1991 44.594828 -123.269328 360000 7360 NW Valley Vw 3 2 1785 25700 1979 44.643876 -123.238189 307500 4748 NW Veronica 5 3.5 4135 6098 2004 44.5929659 -123.306916 600000 411 NW 16th 3 2825 4792 1938 44.570883 -123.272113 435350 MODEL 1 PREDICTED SALE PRICE 360750 306875 587500 435350 ERROR 750 -625 -12500 0 ADDRESS BEDS BATHS SQFT LOT SIZE YEAR BUILT LATITUDE LONGITUDE ERROR 1522 NW Jonquil 4 3 2424 5227 1991 44.594828 -123.269328 750 7360 NW Valley Vw 3 2 1785 25700 1979 44.643876 -123.238189 625 4748 NW Veronica 5 3.5 4135 6098 2004 44.5929659 -123.306916 12500 411 NW 16th 3 2825 4792 1938 44.570883 -123.272113 0 MODEL 2 PREDICTED ERROR 750 625 12393.83333 6879.67857 Why  stop  at  one  iteration? "Hey Model 1, what do you predict is the sale price of this home?" "Hey Model 2, how much error do you predict Model 1 just made?"
  • 21. BigML, Inc 21Ensembles Boosting DATASET MODEL 1 DATASET 2 MODEL 2 DATASET 3 MODEL 3 DATASET 4 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 PREDICTION SUM Iteration  1 Iteration  2 Iteration  3 Iteration  4   etc…
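The iteration loop in this diagram, each model fitting the previous iterations' errors, can be sketched for regression with one-split stumps standing in for full trees (toy data; a simplified illustration, not BigML's implementation):

```python
def fit_stump(xs, ys):
    """Fit a one-split regression stump minimizing squared error.
    Returns (threshold, left_mean, right_mean)."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        lm = sum(left) / len(left) if left else 0.0
        rm = sum(right) / len(right) if right else 0.0
        err = sum((y - (lm if x <= t else rm)) ** 2
                  for x, y in zip(xs, ys))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]

def boost(xs, ys, n_iter=20, rate=0.5):
    """Each iteration fits a stump to the current residuals: 'Hey
    Model k, how much error did the previous models just make?'"""
    stumps, residuals = [], list(ys)
    for _ in range(n_iter):
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        residuals = [r - rate * (lm if x <= t else rm)
                     for x, r in zip(xs, residuals)]
    # The final prediction is the SUM of all shrunken stump outputs.
    return lambda x: sum(rate * (lm if x <= t else rm)
                         for t, lm, rm in stumps)

xs = [1, 2, 3, 4, 5, 6]
ys = [10, 12, 11, 30, 33, 31]
predict = boost(xs, ys)
```

The `rate` parameter plays the role of the learning rate described later: larger values apply more of each stump's correction, fitting faster but risking overfitting.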
  • 22. BigML, Inc 22Ensembles Boosting Config • Number of iterations - similar to number of models for DF/RDF • Iterations can be limited with Early Stopping: • Early out of bag: tests with the out-of-bag samples
  • 23. BigML, Inc 23Ensembles Boosting Config “OUT OF BAG” SAMPLES DATASET MODEL 1 DATASET 2 MODEL 2 DATASET 3 MODEL 3 DATASET 4 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 PREDICTION SUM Iteration  1 Iteration  2 Iteration  3 Iteration  4   etc…
  • 24. BigML, Inc 24Ensembles Boosting Config • Number of iterations - similar to number of models for DF/RDF • Iterations can be limited with Early Stopping: • Early out of bag: tests with the out-of-bag samples • Early holdout: tests with a portion of the dataset • None: performs all iterations. Note: In general, it is better to use a high number of iterations and let the early stopping work.
  • 25. BigML, Inc 25Ensembles Iterations Boosted  Ensemble  #1 1 2 3 4 5 6 7 8 9 10 ….. 41 42 43 44 45 46 47 48 49 50 Early Stop # Iterations 1 2 3 4 5 6 7 8 9 10 ….. 41 42 43 44 45 46 47 48 49 50 Boosted  Ensemble  #2 Early Stop# Iterations This is OK because the early stop means the iterative improvement is small and we have "converged" before being forcibly stopped by the # iterations. This is NOT OK because the hard limit on iterations stopped the boosting before there were enough iterations to achieve the best quality.
  • 26. BigML, Inc 26Ensembles Boosting Config • Number of iterations - similar to number of models for DF/RDF • Iterations can be limited with Early Stopping: • Early out of bag: tests with the out-of-bag samples • Early holdout: tests with a portion of the dataset • None: performs all iterations. Note: In general, it is better to use a high number of iterations and let the early stopping work. • Learning Rate: Controls how aggressively boosting will fit the data: • Larger values ~ maybe quicker fit, but risk of overfitting • You can combine sampling with Boosting! • Samples with Replacement • Add Randomize
  • 27. BigML, Inc 27Ensembles Boosting Randomize DATASET MODEL 1 DATASET 2 MODEL 2 DATASET 3 MODEL 3 DATASET 4 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 PREDICTION SUM Iteration  1 Iteration  2 Iteration  3 Iteration  4   etc…
  • 28. BigML, Inc 28Ensembles Boosting Randomize DATASET MODEL 1 DATASET 2 MODEL 2 DATASET 3 MODEL 3 DATASET 4 MODEL 4 PREDICTION 1 PREDICTION 2 PREDICTION 3 PREDICTION 4 PREDICTION SUM Iteration  1 Iteration  2 Iteration  3 Iteration  4   etc…
  • 29. BigML, Inc 29Ensembles Boosting Config • Number of iterations - similar to number of models for DF/RDF • Iterations can be limited with Early Stopping: • Early out of bag: tests with the out-of-bag samples • Early holdout: tests with a portion of the dataset • None: performs all iterations. Note: In general, it is better to use a high number of iterations and let the early stopping work. • Learning Rate: Controls how aggressively boosting will fit the data: • Larger values ~ maybe quicker fit, but risk of overfitting • You can combine sampling with Boosting! • Samples with Replacement • Add Randomize • Individual tree parameters are still available • Balanced objective, Missing splits, Node Depth, etc.
  • 31. BigML, Inc 31Ensembles Wait a Second… pregnancies plasma glucose blood pressure triceps skin thickness insulin bmi diabetes pedigree age diabetes 6 148 72 35 0 33.6 0.627 50 TRUE 1 85 66 29 0 26.6 0.351 31 FALSE 8 183 64 0 0 23.3 0.672 32 TRUE 1 89 66 23 94 28.1 0.167 21 FALSE MODEL 1 predicted diabetes TRUE TRUE FALSE FALSE ERROR ? ? ? ? …  what  about  classification? pregnancies plasma glucose blood pressure triceps skin thickness insulin bmi diabetes pedigree age diabetes 6 148 72 35 0 33.6 0.627 50 1 1 85 66 29 0 26.6 0.351 31 0 8 183 64 0 0 23.3 0.672 32 1 1 89 66 23 94 28.1 0.167 21 0 MODEL 1 predicted diabetes 1 1 0 0 ERROR 0 -1 1 0 …  we  could  try
  • 32. BigML, Inc 32Ensembles Wait a Second… pregnancies plasma glucose blood pressure triceps skin thickness insulin bmi diabetes pedigree age favorite color 6 148 72 35 0 33.6 0.627 50 RED 1 85 66 29 0 26.6 0.351 31 GREEN 8 183 64 0 0 23.3 0.672 32 BLUE 1 89 66 23 94 28.1 0.167 21 RED MODEL 1 predicted favorite color BLUE GREEN RED GREEN ERROR ? ? ? ? …  but  then  what  about  multiple  classes?
  • 33. BigML, Inc 33Ensembles Boosting Classification pregnancies plasma glucose blood pressure triceps skin thickness insulin bmi diabetes pedigree age favorite color 6 148 72 35 0 33.6 0.627 50 RED 1 85 66 29 0 26.6 0.351 31 GREEN 8 183 64 0 0 23.3 0.672 32 BLUE 1 89 66 23 94 28.1 0.167 21 RED MODEL 1 RED/NOT RED Class RED Probability 0.9 0.7 0.46 0.12 Class RED ERROR 0.1 -0.7 0.54 -0.12 pregnancies plasma glucose blood pressure triceps skin thickness insulin bmi diabetes pedigree age ERROR 6 148 72 35 0 33.6 0.627 50 0.1 1 85 66 29 0 26.6 0.351 31 -0.7 8 183 64 0 0 23.3 0.672 32 0.54 1 89 66 23 94 28.1 0.167 21 -0.12 MODEL 2 RED/NOT RED ERR PREDICTED ERROR 0.05 -0.54 0.32 -0.22 MODEL 1 BLUE/NOT BLUE Class BLUE Probability 0.1 0.3 0.54 0.88 Class BLUE ERROR -0.1 0.7 -0.54 0.12 …and  repeat  for  each     class  at  each  iteration …and  repeat  for  each     class  at  each  iteration Iteration  1 Iteration  2
  • 34. BigML, Inc 34Ensembles Boosting Classification DATASET MODELS 1 per class DATASETS 2 per class MODELS 2 per class PREDICTIONS 1 per class PREDICTIONS 2 per class PREDICTIONS 3 per class PREDICTIONS 4 per class Comb PROBABILITY per class MODELS 3 per class MODELS 4 per class DATASETS 3 per class DATASETS 4 per class Iteration  1 Iteration  2 Iteration  3 Iteration  4   etc…
  • 36. BigML, Inc 36Ensembles Stacked Generalization ENSEMBLE LOGISTIC REGRESSION SOURCE DATASET MODEL BATCH PREDICTION BATCH PREDICTION BATCH PREDICTION EXTENDED DATASET EXTENDED DATASET EXTENDED DATASET LOGISTIC REGRESSION
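A toy version of this flow, assuming hand-written rules as the base models and a training-accuracy-weighted vote standing in for the meta-level logistic regression (all names hypothetical, stdlib only):

```python
# Toy base "models": hand-written rules standing in for trained trees.
def model_a(row):
    return 1.0 if row[0] > 0.5 else 0.0

def model_b(row):
    return 1.0 if row[1] > 0.5 else 0.0

def extend(dataset):
    """Batch-predict with each base model and append the outputs as
    new columns -- the "extended dataset" fed to the meta-model."""
    return [row + [model_a(row), model_b(row)] for row in dataset]

def train_meta(extended, labels):
    """Stand-in meta-model: weight each base model by its training
    accuracy, then combine by weighted vote."""
    n = len(labels)
    w_a = sum(row[-2] == y for row, y in zip(extended, labels)) / n
    w_b = sum(row[-1] == y for row, y in zip(extended, labels)) / n
    def predict(row):
        score = w_a * model_a(row) + w_b * model_b(row)
        return 1.0 if score >= (w_a + w_b) / 2 else 0.0
    return predict

data = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.9], [0.1, 0.2]]
labels = [1.0, 0.0, 1.0, 0.0]
meta = train_meta(extend(data), labels)
```

The key idea survives the simplification: the meta-level model learns how much to trust each base model from their predictions on training data.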
  • 37. BigML, Inc 37Ensembles Which Ensemble Method • The one that works best! • Ok, but seriously. Did you evaluate? • For "large" / "complex" datasets • Use DF/RDF with deeper node threshold • Even better, use Boosting with more iterations • For "noisy" data • Boosting may overfit • RDF preferred • For "wide" data • Randomize features (RDF) will be quicker • For "easy" data • A single model may be fine • Bonus: also has the best interpretability! • For classification with "large" number of classes • Boosting will be slower • For "general" data • DF/RDF likely better than a single model or Boosting. • Boosting will be slower since the models are processed serially
  • 38. BigML, Inc 38Ensembles Too Many Parameters? • How many trees? • How many nodes? • Missing splits? • Random candidates? • Too many parameters? SMACdown!
  • 39. BigML, Inc 39Ensembles Summary • Models have shortcomings: ability to fit, NP-hard, etc • Data has shortcomings: not enough, outliers, mistakes, etc • Ensemble Techniques can improve on single models • Sampling: partitioning, Decision Tree bagging • Adding Randomness: RDF • Modeling the Error: Boosting • Modeling the Models: Stacking • Guidelines for knowing which one might work best in a given situation
  • 40. BigML, Inc 2 Logistic Regressions Modeling probabilities Poul Petersen CIO, BigML, Inc
  • 41. BigML, Inc 3Logistic Regressions Logistic Regression • Potential Confusion: Logistic Regression is a classification algorithm • Classification implies a discrete objective. How can this be a regression? • Why do we need another classification algorithm? • more questions…
  • 42. BigML, Inc 4Logistic Regressions Linear Regression
  • 43. BigML, Inc 5Logistic Regressions Linear Regression
  • 44. BigML, Inc 6Logistic Regressions Polynomial Regression
  • 45. BigML, Inc 7Logistic Regressions Regression • Linear Regression: 𝛽₀+𝛽₁·(INPUT) ≈ OBJECTIVE • Quadratic Regression: 𝛽₀+𝛽₁·(INPUT)+𝛽₂·(INPUT)² ≈ OBJECTIVE • Decision Tree Regression: DT(INPUT) ≈ OBJECTIVE • Problem: • What if we want to do a classification problem: T/F or 1/0 • What function can we fit to discrete data? • Key Take-Away: Regression is the process of "fitting" a function to the data
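For the linear case, "fitting" has a closed form; a minimal ordinary-least-squares sketch on toy data that exactly satisfies OBJECTIVE = 1 + 2·INPUT:

```python
def fit_line(xs, ys):
    """Ordinary least squares for beta0 + beta1*x ≈ y."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    beta1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    beta0 = my - beta1 * mx
    return beta0, beta1

beta0, beta1 = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(beta0, beta1)  # recovers 1.0 and 2.0
```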
  • 46. BigML, Inc 8Logistic Regressions Discrete Data Function?
  • 47. BigML, Inc 9Logistic Regressions Discrete Data Function? ????
  • 48. BigML, Inc 10Logistic Regressions Logistic Function 𝑓(𝑥) = 1 / (1 + 𝑒^(−𝑥)) • As 𝑥➝−∞, 𝑓(𝑥)➝0; as 𝑥➝∞, 𝑓(𝑥)➝1 • Looks promising, but still not "discrete" • What about the "green" in the middle? • Let’s change the problem…
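The logistic function on this slide in code, checking its two limits:

```python
import math

def logistic(x):
    """f(x) = 1 / (1 + e^(-x)): squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# f(x) -> 0 as x -> -inf, f(x) -> 1 as x -> inf, and f(0) = 0.5 --
# the uncertain "green" middle that motivates modeling probabilities.
print(logistic(-10), logistic(0), logistic(10))
```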
  • 49. BigML, Inc 11Logistic Regressions Modeling Probabilities 𝑃≈0, 0<𝑃<1, 𝑃≈1
  • 50. BigML, Inc 12Logistic Regressions Logistic Regression • Clarification: LR is a classification algorithm… that uses a regression to model the probability of the discrete objective • Caveats: • Assumes that output is linearly related to "predictors" • What? (hang in there…) • Sometimes we can "fix" this with feature engineering • Question: how do we "fit" the logistic function to real data?
  • 51. BigML, Inc 13Logistic Regressions Logistic Regression • Given training data consisting of inputs 𝑥 and probabilities 𝑃 • Solve for 𝛽₀ and 𝛽₁ to fit the logistic function: 𝑃(𝑥) = 1 / (1 + 𝑒^−(𝛽₀+𝛽₁𝑥)) • 𝛽₀ is the "intercept", 𝛽₁ is the "coefficient" • How? The inverse of the logistic function is called the "logit": ln(𝑃(𝑥) / (1 − 𝑃(𝑥))) = 𝛽₀ + 𝛽₁𝑥 • In which case solving is now a linear regression • But this is only one dimension, that is, one feature 𝑥…
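To see why the logit turns this into a linear problem, here is a small numeric check (the coefficients are arbitrary, chosen only for illustration): applying the logit to 𝑃(𝑥) recovers the linear form 𝛽₀ + 𝛽₁𝑥 exactly.

```python
import math

b0, b1 = -2.0, 0.5  # arbitrary illustration coefficients

def p(x):
    """Logistic function with a linear argument b0 + b1*x."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

def logit(q):
    """Inverse of the logistic function: ln(q / (1 - q))."""
    return math.log(q / (1.0 - q))

for x in (0.0, 2.0, 10.0):
    # the logit of P(x) is exactly the line b0 + b1*x
    assert abs(logit(p(x)) - (b0 + b1 * x)) < 1e-9
```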
  • 52. BigML, Inc 14Logistic Regressions Logistic Regression • For "𝑖" dimensions, 𝑿 = [𝑥₁, 𝑥₂, ⋯, 𝑥ᵢ], we solve: 𝑃(𝑿) = 1 / (1 + 𝑒^−𝑓(𝑿)) where 𝑓(𝑿) = 𝛽₀ + 𝞫·𝑿 = 𝛽₀ + 𝛽₁𝑥₁ + ⋯ + 𝛽ᵢ𝑥ᵢ
  • 53. BigML, Inc 15Logistic Regressions Interpreting Coefficients • LR computes 𝛽₀ and coefficients 𝛽ⱼ for each feature 𝑥ⱼ • negative 𝛽ⱼ → negatively correlated: 𝑥ⱼ↑ then 𝑃(𝑿)↓ • positive 𝛽ⱼ → positively correlated: 𝑥ⱼ↑ then 𝑃(𝑿)↑ • "larger" 𝛽ⱼ → more impact: a small change in 𝑥ⱼ causes a large change in 𝑃(𝑿) • "smaller" 𝛽ⱼ → less impact: a large change in 𝑥ⱼ causes only a small change in 𝑃(𝑿) • 𝛽ⱼ "size" should not be confused with field importance • Can include a coefficient for "missing" (if enabled): 𝑃(𝑿) = 𝛽₀ + ⋯ + 𝛽ⱼ𝑥ⱼ + 𝛽ⱼ₊₁[𝑥ⱼ ≡ Missing] + ⋯ • Binary classification (true/false) coefficients are complementary: 𝑃(True) ≡ 1 − 𝑃(False)
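The sign and size rules can be checked directly with toy coefficients (not taken from any real model):

```python
import math

def prob(x, b0, b1):
    """P(X) for a single feature x with intercept b0 and coefficient b1."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# positive coefficient: P rises as x rises
assert prob(2.0, 0.0, 1.5) > prob(1.0, 0.0, 1.5)
# negative coefficient: P falls as x rises
assert prob(2.0, 0.0, -1.5) < prob(1.0, 0.0, -1.5)
# larger |coefficient|: the same step in x moves P further
small_step = prob(1.0, 0.0, 0.1) - prob(0.0, 0.0, 0.1)
large_step = prob(1.0, 0.0, 2.0) - prob(0.0, 0.0, 2.0)
assert large_step > small_step
```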
  • 54. BigML, Inc 16Logistic Regressions LR Demo #1
  • 55. BigML, Inc 17Logistic Regressions LR Parameters 1. Default Numeric: Replaces missing numeric values 2. Missing Numeric: Adds a field for missing numerics 3. Stats: Extended statistics, ex: p-value (runs slower) 4. Bias: Enables/Disables the intercept term 𝛽₀ • Don’t disable this… 5. Regularization: Reduces over-fitting by minimizing the 𝛽ⱼ • L1: prefers reducing individual coefficients • L2 (default): prefers reducing all coefficients 6. Strength "C": Higher values reduce regularization 7. EPS: The minimum error between steps to stop; larger values stop earlier but may reduce quality 8. Auto-scaling: Ensures that all features contribute equally • Don’t change this unless you have a specific reason
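scikit-learn's LogisticRegression exposes analogous knobs (penalty ≈ L1/L2, C ≈ strength, tol ≈ EPS, fit_intercept ≈ bias); a sketch showing that stronger regularization shrinks the coefficients. These are sklearn's parameter names, used here only as a stand-in for BigML's.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

weak = LogisticRegression(penalty="l2", C=100.0).fit(X, y)   # high C: little regularization
strong = LogisticRegression(penalty="l2", C=0.01).fit(X, y)  # low C: heavy regularization

# heavier regularization pushes the beta_j toward zero
assert np.abs(strong.coef_).sum() < np.abs(weak.coef_).sum()
```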
  • 56. BigML, Inc 18Logistic Regressions LR Questions • Questions: • How do we handle multiple classes? • A binary class (True/False) only needs solving for one: 𝑃(True) ≡ 1 − 𝑃(False) • What about non-numeric inputs? • Text/Items fields • Categorical fields
  • 57. BigML, Inc 19Logistic Regressions LR - Multi Class • Instead of a binary class, ex: [true, false], we have a multi-class, ex: [red, green, blue, …] • "𝑘" classes: 𝑪 = [𝑐₁, 𝑐₂, ⋯, 𝑐ₖ] • solve a one-vs-rest LR for each class: ln(𝑃(𝑐₁) / (1 − 𝑃(𝑐₁))) = 𝛽₁,₀ + 𝞫₁·𝑿 … ln(𝑃(𝑐₂) / (1 − 𝑃(𝑐₂))) = 𝛽₂,₀ + 𝞫₂·𝑿 … ⋯ … ln(𝑃(𝑐ₖ) / (1 − 𝑃(𝑐ₖ))) = 𝛽ₖ,₀ + 𝞫ₖ·𝑿 • Result: 𝞫ⱼ for each class 𝑐ⱼ • apply a combiner to ensure all probabilities add to 1
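The one-vs-rest scheme plus the normalizing combiner can be sketched with scikit-learn (a stand-in for BigML's implementation; the iris dataset is just a convenient 3-class example):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)  # 3 classes -> 3 one-vs-rest binary models

ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X, y)
probs = ovr.predict_proba(X)

assert len(ovr.estimators_) == 3            # one beta vector per class
assert np.allclose(probs.sum(axis=1), 1.0)  # combiner: each row sums to 1
```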
  • 58. BigML, Inc 20Logistic Regressions LR - Field Codings • LR is expecting numeric values to perform regression. • How do we handle categorical values, or text? • One-hot encoding: only one feature is "hot" for each class • This is the default

    Class     color=red   color=blue   color=green   color=NULL
    red           1            0            0             0
    blue          0            1            0             0
    green         0            0            1             0
    MISSING       0            0            0             1
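A tiny hand-rolled sketch of the encoding in the table above (the helper name one_hot is made up for illustration):

```python
def one_hot(value, classes=("red", "blue", "green")):
    """One 0/1 indicator column per class, plus one for missing values."""
    key = value if value in classes else "NULL"
    return {f"color={c}": int(c == key) for c in list(classes) + ["NULL"]}

print(one_hot("red"))   # color=red is 1, everything else 0
print(one_hot(None))    # only color=NULL is 1
```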
  • 59. BigML, Inc 21Logistic Regressions LR - Field Codings • Dummy Encoding: chooses a *reference class* • requires one less degree of freedom

    Class     color_1   color_2   color_3
    *red*        0         0         0
    blue         1         0         0
    green        0         1         0
    MISSING      0         0         1
  • 60. BigML, Inc 22Logistic Regressions LR - Field Codings • Contrast Encoding: field values must sum to zero • allows comparison between classes

    Class     field     "influence"
    red        0.5       positive
    blue      -0.25      negative
    green     -0.25      negative
    MISSING    0         excluded
  • 61. BigML, Inc 23Logistic Regressions LR - Field Codings • Which one to use? • One-hot is the default: use this unless you have a specific need • Dummy: use when there is a control group in mind, which becomes the reference class • Contrast: allows for testing specific hypotheses of relationships • Ex: customers give a "rating" of bad / ok / good

    rating   Contrast Encoding          rating   Contrast Encoding
    bad          -0.66                  bad          -0.5
    ok            0.33                  ok            0
    good          0.33                  good          0.5

  Left: hypothesis is that a good and an ok review have the same impact, but a bad review has a negative impact twice as great. Right: hypothesis is that a good and a bad review have an equal but opposite impact, while an ok rating has no impact.
  • 62. BigML, Inc 24Logistic Regressions LR - Field Codings • Text / Items: Text/Items field types are handled by creating a field for each text token/item and setting it to 1 or 0

    Text                                      "hippo"   "safari"   "zebra"
    "we saw hippos and zebras…"                  1          0          1
    "The best safari for seeing zebras"          0          1          1
    "The Oregon coast is rainy in winter"        0          0          0
    "Have you ever tried a hippo burger"         1          0          0
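The token table above corresponds to binary presence/absence features; scikit-learn's CountVectorizer with binary=True is one way to sketch it (a stand-in, not BigML's tokenizer, and the vocabulary is pinned to three of the slide's tokens):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = [
    "we saw hippos and zebras",
    "The best safari for seeing zebras",
    "The Oregon coast is rainy in winter",
]
# binary=True: record presence/absence rather than counts
vec = CountVectorizer(binary=True, vocabulary=["hippos", "safari", "zebras"])
X = vec.fit_transform(docs).toarray()
print(X)  # rows match the slide's table: [1 0 1], [0 1 1], [0 0 0]
```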
  • 63. BigML, Inc 25Logistic Regressions LR Demo #2
  • 64. BigML, Inc 26Logistic Regressions Curvilinear LR • Logistic Regression is expecting a linear relationship between the features and the objective • Remember - it’s a linear regression under the hood • Non-linear relationships are actually pretty common in natural datasets • But non-linear relationships will impact model quality • This can be addressed by adding non-linear transformations to the features • Knowing which transformations to use requires • domain knowledge • experimentation • or both
  • 65. BigML, Inc 27Logistic Regressions Curvilinear LR • Instead of 𝛽₀ + 𝛽₁𝑥₁ we could add a feature: 𝛽₀ + 𝛽₁𝑥₁ + 𝛽₂𝑥₂ where 𝑥₂ ≡ 𝑥₁² • Possible to add any higher-order terms or other functions to match the shape of the data
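A sketch of why the engineered square helps (synthetic data, scikit-learn as a stand-in): a true boundary at 𝑥² = 4 is invisible to a plain LR on 𝑥 alone, but becomes linear once the 𝑥² feature is added.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=(400, 1))
y = (x[:, 0] ** 2 > 4.0).astype(int)   # class boundary is x^2 = 4, non-linear in x

plain = LogisticRegression(max_iter=1000).fit(x, y)
engineered = LogisticRegression(max_iter=1000).fit(np.hstack([x, x ** 2]), y)

# the engineered feature makes the boundary linear, so accuracy jumps
assert engineered.score(np.hstack([x, x ** 2]), y) > plain.score(x, y)
```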
  • 66. BigML, Inc 28Logistic Regressions LR Demo #3
  • 67. BigML, Inc 29Logistic Regressions LR vs DT • Logistic Regression: • Expects a "smooth" linear relationship with predictors • LR is concerned with probability of a discrete outcome • Lots of parameters to get wrong: regularization, scaling, codings • Slightly less prone to over-fitting • Because it fits a shape, might work better when less data is available • Decision Tree: • Adapts well to ragged non-linear relationships • No concern: classification, regression, multi-class all fine • Virtually parameter free • Slightly more prone to over-fitting • Prefers surfaces parallel to parameter axes, but given enough data will discover any shape
  • 68. BigML, Inc 30Logistic Regressions LR Demo #4
  • 69. BigML, Inc 31Logistic Regressions Summary • Logistic Regression is a classification algorithm that models the probabilities of each class • How the algorithm works and why this is important • Expects a linear relationship between the features and the objective, and how to fix it when that does not hold • Categorical encodings • LR outputs a set of coefficients, and how to interpret them • Scale relates to size of impact • Sign relates to direction of impact • Guidelines for comparing to Decision Trees