As organizations increasingly leverage data and machine learning methods, people throughout those organizations need to build basic "data literacy" in these topics. In this session, data scientist and instructor Brian Lange provides simple, visual, equation-free explanations of a variety of classification algorithms, geared toward helping anyone understand how they work. Now with Python code examples!
13. things to know
- you need data labeled with the correct answers to
“train” these algorithms before they work
- feature = dimension = column = attribute of the data
- class = category = label = Harry Potter house
14. BIG CAVEAT
Oftentimes, choosing/creating good features or gathering more data will help more than changing algorithms...
15. [scatter plot: spam vs. not spam emails, plotted by “% of email body that is all-caps” vs. “# of mentions of brand names”]
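As a concrete version of “labeled data”, the plot above boils down to a feature matrix X and a label vector y. A minimal sketch (the numbers are made up purely for illustration):
# each row is one email: [% of body that is all-caps, # of mentions of brand names]
X = [[0.72, 8],
     [0.05, 0],
     [0.40, 3],
     [0.02, 1]]
# the correct answer ("class") for each row
y = ['spam', 'not spam', 'spam', 'not spam']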
44. logistic regression
“divide it with a logistic function”
from sklearn.linear_model import LogisticRegression
model = LogisticRegression()  # fit a logistic function to the labeled training data
model.fit(X, y)               # X = feature matrix, y = class labels
predicted = model.predict(z)  # predict classes for the new points z
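Because logistic regression models a probability directly, you can also ask for that instead of a hard class label. A minimal sketch (predict_proba is standard scikit-learn API):
probabilities = model.predict_proba(z)  # per-class probabilities for the new points z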
48. SVMs (support vector machines)
“*advanced* draw a line through it”
- better definition of “terrible”
- lines can turn into non-linear
shapes if you transform your data
57. SVMs (support vector machines)
“*advanced* draw a line through it”
figure credit: scikit-learn documentation
58-63. [series of plots of the spam example: “% of email body that is all-caps” vs. “# of mentions of brand names”]
64. SVMs (support vector machines)
“*advanced* draw a line through it”
from sklearn.svm import SVC
model = SVC(kernel='poly', degree=2)  # transform the data with a degree-2 polynomial kernel
model.fit(X, y)
predicted = model.predict(z)
65. SVMs (support vector machines)
“*advanced* draw a line through it”
from sklearn.svm import SVC
model = SVC(kernel='rbf')  # radial basis function (Gaussian) kernel
model.fit(X, y)
predicted = model.predict(z)
83. decision tree learners
“make a flow chart of it”
from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()  # learn a flow chart of if/else splits from the data
model.fit(X, y)
predicted = model.predict(z)
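Because the fitted tree is literally a flow chart, scikit-learn can print it as text. A minimal sketch using export_text (a real scikit-learn function; the feature names are illustrative, borrowed from the spam example):
from sklearn.tree import export_text
# prints the learned tree as nested if/else rules
print(export_text(model, feature_names=['pct_all_caps', 'brand_mentions']))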
102. other spins on this
Random Forest - like bagging, but at each split, randomly constrain the features to choose from
Extra Trees - make each split randomly, non-optimally; compensate by training a ton of trees
Voting - combine a bunch of different models of your own design and have them “vote” on the correct answer
Boosting - train models in sequence, making the later ones focus on the points the earlier ones missed
103. from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
                                   ExtraTreesClassifier, VotingClassifier,
                                   AdaBoostClassifier, GradientBoostingClassifier)
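All of these follow the same fit/predict pattern as the earlier models. A minimal sketch, assuming the same X, y, and z as before (the parameter values are illustrative, not from the talk):
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
# bagging plus a random constraint on features at each split
model = RandomForestClassifier(n_estimators=100)
model.fit(X, y)
predicted = model.predict(z)
# boosting: trees trained in sequence, each focusing on the previous ones' mistakes
model = GradientBoostingClassifier()
model.fit(X, y)
predicted = model.predict(z)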
115.
                     nonlinear decision boundary?   provides probability estimates?   tells how important a feature is to the model?
Logistic Regression  no                             yes                               yes, if you scale
SVMs                 yes, with kernel               no                                no
KNN                  yes                            kinda (% of nearby points)        no
Naïve Bayes          yes                            yes                               no
Decision Tree        yes                            no                                yes (number of times the feature is used)
Ensemble models      yes                            kinda (% of models that agree)    yes, depending on component parts
Boosted models       yes                            kinda (% of models that agree)    yes, depending on component parts
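For the models that do provide them, probability estimates and feature importances are available directly in scikit-learn. A minimal sketch (predict_proba, coef_, and feature_importances_ are standard scikit-learn attributes; X, y, and z are the same as in the earlier examples):
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
lr = LogisticRegression().fit(X, y)
print(lr.predict_proba(z))       # probability estimate for each class
print(lr.coef_)                  # per-feature weights (comparable only if features are scaled)
rf = RandomForestClassifier().fit(X, y)
print(rf.feature_importances_)   # how much each feature contributed to the splits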
124.
                     can be updated with new training data?        easy to parallelize?
Logistic Regression  kinda                                         kinda
SVMs                 kinda, depending on kernel                    yes for some kernels, no for others
KNN                  yes                                           yes
Naïve Bayes          yes                                           yes
Decision Tree        no                                            no (but it’s very fast)
Ensemble models      kinda, by adding new models to the ensemble   yes
Boosted models       kinda, by adding new models to the ensemble   no
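In scikit-learn terms, "can be updated" roughly corresponds to a model having a partial_fit method, and "easy to parallelize" to it accepting an n_jobs argument. A minimal sketch (GaussianNB.partial_fit and RandomForestClassifier's n_jobs are real scikit-learn API; X_new and y_new stand in for hypothetical new training data):
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
nb = GaussianNB()
nb.partial_fit(X, y, classes=['spam', 'not spam'])  # first call must list every class
nb.partial_fit(X_new, y_new)                        # later, update with new training data only
rf = RandomForestClassifier(n_jobs=-1)              # grow the trees on all available cores
rf.fit(X, y)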
131. Other quirks
SVMs - have to pick a kernel
KNN - you need to define what “similarity” means in a good way (see the sketch after this list); fast to train, slow to classify (compared to other methods)
Naïve Bayes - have to choose the distribution; can deal with missing data
Decision Tree - can provide literal flow charts; very sensitive to outliers
Ensemble models - less prone to overfitting than their component parts
Boosted models - many parameters to tweak; more prone to overfitting than normal ensembles; most popular Kaggle winners use these
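Since “similarity” in KNN is just the distance metric, scikit-learn lets you choose it explicitly. A minimal sketch (KNeighborsClassifier and its metric parameter are real scikit-learn API; the values are illustrative):
from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=5, metric='euclidean')  # "similarity" = straight-line distance
model.fit(X, y)               # training just stores the points, so it is fast
predicted = model.predict(z)  # classifying searches the stored points, so it is the slower step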