As organizations increasingly leverage data and machine learning methods, people throughout those organizations need to build basic "data literacy" in these topics. In this session, data scientist and instructor Brian Lange provides simple, visual, equation-free explanations of a variety of classification algorithms, geared toward helping anyone understand how they work. Now with Python code examples!
13. things to know
- you need data labeled with the correct answers to
“train” these algorithms before they work
- feature = dimension = column = attribute of the data
- class = category = label = Harry Potter house
14. BIG CAVEAT
Oftentimes, choosing/creating good features or gathering more data will help more than changing algorithms...
15. [scatter plot: spam vs. not spam emails, plotted by “% of email body that is all-caps” vs. “# of mentions of brand names”]
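As a concrete version of “labeled data”, the plot above boils down to a feature matrix X and a label vector y. A minimal sketch (the numbers are made up purely for illustration):
# each row is one email: [% of body that is all-caps, # of mentions of brand names]
X = [[0.72, 8],
     [0.05, 0],
     [0.40, 3],
     [0.02, 1]]
# the correct answer ("class") for each row
y = ['spam', 'not spam', 'spam', 'not spam']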
44. logistic regression
“divide it with a logistic function”
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()  # fit a logistic function to the labeled training data
model.fit(X, y)               # X = feature matrix, y = class labels
predicted = model.predict(z)  # predict classes for the new points z
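Because logistic regression models a probability directly, you can also ask for that instead of a hard class label. A minimal sketch (predict_proba is standard scikit-learn API):
probabilities = model.predict_proba(z)  # per-class probabilities for the new points z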
48. SVMs (support vector machines)
“*advanced* draw a line through it”
- better definition of “terrible”
- lines can turn into non-linear
shapes if you transform your data
57. SVMs (support vector machines)
“*advanced* draw a line through it”
figure credit: scikit-learn documentation
58-63. [series of plots of the spam example: “% of email body that is all-caps” vs. “# of mentions of brand names”]
64. SVMs (support vector machines)
“*advanced* draw a line through it”
from sklearn.svm import SVC
model = SVC(kernel='poly', degree=2)  # transform the data with a degree-2 polynomial kernel
model.fit(X, y)
predicted = model.predict(z)
65. SVMs (support vector machines)
“*advanced* draw a line through it”
from sklearn.svm import SVC
model = SVC(kernel='rbf')  # radial basis function (Gaussian) kernel
model.fit(X, y)
predicted = model.predict(z)
83. decision tree learners
“make a flow chart of it”
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()  # learn a flow chart of if/else splits from the data
model.fit(X, y)
predicted = model.predict(z)
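Because the fitted tree is literally a flow chart, scikit-learn can print it as text. A minimal sketch using export_text (a real scikit-learn function; the feature names are illustrative, borrowed from the spam example):
from sklearn.tree import export_text
# prints the learned tree as nested if/else rules
print(export_text(model, feature_names=['pct_all_caps', 'brand_mentions']))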
102. other spins on this
Random Forest - like bagging, but at each split, randomly constrain the features to choose from
Extra Trees - make each split randomly, non-optimally; compensate by training a ton of trees
Voting - combine a bunch of different models of your own design and have them “vote” on the correct answer
Boosting - train models in sequence, making the later ones focus on the points the earlier ones missed
103. from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                                   ExtraTreesClassifier, VotingClassifier,
                                   AdaBoostClassifier, GradientBoostingClassifier)
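All of these follow the same fit/predict pattern as the earlier models. A minimal sketch, assuming the same X, y, and z as before (the parameter values are illustrative, not from the talk):
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
# bagging plus a random constraint on features at each split
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)
predicted = model.predict(z)
# boosting: trees trained in sequence, each focusing on the previous ones' mistakes
model = GradientBoostingClassifier()
model.fit(X, y)
predicted = model.predict(z)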
115.
                     nonlinear decision boundary?   provides probability estimates?   tells how important a feature is to the model?
Logistic Regression  no                             yes                               yes, if you scale
SVMs                 yes, with kernel               no                                no
KNN                  yes                            kinda (% of nearby points)        no
Naïve Bayes          yes                            yes                               no
Decision Tree        yes                            no                                yes (number of times the feature is used)
Ensemble models      yes                            kinda (% of models that agree)    yes, depending on component parts
Boosted models       yes                            kinda (% of models that agree)    yes, depending on component parts
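For the models that do provide them, probability estimates and feature importances are available directly in scikit-learn. A minimal sketch (predict_proba, coef_, and feature_importances_ are standard scikit-learn attributes; X, y, and z are the same as in the earlier examples):
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
lr = LogisticRegression().fit(X, y)
print(lr.predict_proba(z))       # probability estimate for each class
print(lr.coef_)                  # per-feature weights (comparable only if features are scaled)
rf = RandomForestClassifier().fit(X, y)
print(rf.feature_importances_)   # how much each feature contributed to the splits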
124.
                     can be updated with new training data?        easy to parallelize?
Logistic Regression  kinda                                         kinda
SVMs                 kinda, depending on kernel                    yes for some kernels, no for others
KNN                  yes                                           yes
Naïve Bayes          yes                                           yes
Decision Tree        no                                            no (but it’s very fast)
Ensemble models      kinda, by adding new models to the ensemble   yes
Boosted models       kinda, by adding new models to the ensemble   no
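In scikit-learn terms, "can be updated" roughly corresponds to a model having a partial_fit method, and "easy to parallelize" to it accepting an n_jobs argument. A minimal sketch (GaussianNB.partial_fit and RandomForestClassifier's n_jobs are real scikit-learn API; X_new and y_new stand in for hypothetical new training data):
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
nb = GaussianNB()
nb.partial_fit(X, y, classes=['spam', 'not spam'])  # first call must list every class
nb.partial_fit(X_new, y_new)                        # later, update with new training data only
rf = RandomForestClassifier(n_jobs=-1)              # grow the trees on all available cores
rf.fit(X, y)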
131. Other quirks
SVMs - have to pick a kernel
KNN - you need to define what “similarity” means in a good way (see the sketch after this list); fast to train, slow to classify (compared to other methods)
Naïve Bayes - have to choose the distribution; can deal with missing data
Decision Tree - can provide literal flow charts; very sensitive to outliers
Ensemble models - less prone to overfitting than their component parts
Boosted models - many parameters to tweak; more prone to overfitting than normal ensembles; most popular Kaggle winners use these
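Since “similarity” in KNN is just the distance metric, scikit-learn lets you choose it explicitly. A minimal sketch (KNeighborsClassifier and its metric parameter are real scikit-learn API; the values are illustrative):
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=5, metric='euclidean')  # "similarity" = straight-line distance
model.fit(X, y)               # training just stores the points, so it is fast
predicted = model.predict(z)  # classifying searches the stored points, so it is the slower step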