Machine Learning
Ludovic Samper
Antidot
September 1st, 2015
Antidot
Software vendor since 1999
Paris, Lyon, Aix-en-Provence
45 employees
Founders : Fabrice Lacroix CEO, Stéphane Loesel CTO, Jérôme Mainka Chief Scientist Officer
Software products and solutions
Antidot Finder Suite (AFS) search engine
Antidot Information Factory (AIF) a pipe & filters framework
SaaS, Hosted License, On-site License
50% of the revenue invested in R&D
Antidot
Machine Learning
Automatic text document classification
Named Entity Extraction
Compound Splitter (for German words)
Clustering algorithm (for news aggregation)
Open Data, Semantic Web
http://www.rechercheisidore.fr/ Social Sciences and
Humanities research platform. Enriched with open resources
https://github.com/antidot/db2triples/ open source library
to export a db in RDF
Antidot is a Partner organization in WDAqua project
Tutorial
Study a classical task in Machine Learning : text classification
Present the scikit-learn.org Python machine learning library
Follow the “Working with text data” tutorial :
http://scikit-learn.org/stable/tutorial/text_analytics/
working_with_text_data.html
Additional material on http://blog.antidot.net/
Summary of the tutorial
1 Problem definition
Supervised classification
Evaluation metrics
2 Extracting features from text files
Bag of words model
Term frequency inverse document frequency (tfidf)
3 Algorithms for classification
Naïve Bayes
Support Vector Machine (SVM)
Tuning parameters
Cross validation
Grid search
4 Conclusion
Methodology
Contents
1 Problem definition
Supervised classification
Evaluation metrics
2 Extracting features from text files
3 Algorithms for classification
4 Conclusion
20 newsgroups dataset
http://qwone.com/~jason/20Newsgroups/
20 newsgroups
Documents from 20 newsgroups, collected in the 90's
The label is the newsgroup the document belongs to
A popular collection
18846 documents : 11314 in train, 7532 in test
wiss-ml.ipynb#The-20-newsgroups-dataset
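As a rough companion to the notebook (a minimal sketch, not taken from it, assuming a recent scikit-learn), the dataset can be loaded as follows:

```python
from sklearn.datasets import fetch_20newsgroups

# Download (or read from the local cache) the two official splits
train = fetch_20newsgroups(subset='train')
test = fetch_20newsgroups(subset='test')

print(len(train.data), len(test.data))      # 11314 7532
print(train.target_names[train.target[0]])  # newsgroup label of the first document
```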
Classification
Problem statement
One label per document
Given a set of documents and their labels, automatically determine the label of an unseen document
A supervised classification problem
Training
Set of documents and their labels
Build a model
Inference
Given a new document, use the model to predict its label
Precision and Recall I
Binary classification
                  ∈ C                   ∉ C
Labeled C         TP (True Positive)    FP (False Positive)
Not labeled C     FN (False Negative)   TN (True Negative)

Precision
$$\text{Precision} = \frac{TP}{TP + FP} = P(e \in C \mid e \text{ labeled } C)$$

Recall
$$\text{Recall} = \frac{TP}{TP + FN} = P(e \text{ labeled } C \mid e \in C)$$
Precision and Recall II
F1
$$F_1 = 2\,\frac{P \times R}{P + R}$$
Harmonic mean of Precision and Recall

Accuracy
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
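A small sketch of these four metrics with scikit-learn (toy labels invented for illustration):

```python
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

y_true = [1, 1, 0, 1, 0, 0, 1, 0]   # toy ground truth (1 = belongs to C)
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # toy predictions (1 = labeled C)

print(precision_score(y_true, y_pred))  # TP / (TP + FP)
print(recall_score(y_true, y_pred))     # TP / (TP + FN)
print(f1_score(y_true, y_pred))         # 2 * P * R / (P + R)
print(accuracy_score(y_true, y_pred))   # (TP + TN) / (TP + TN + FP + FN)
```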
Multiclass I
$N_C$ = number of classes

Macro Average
$$B_{macro} = \frac{1}{N_C}\sum_{k=1}^{N_C} B_{binary}(TP_k, FP_k, TN_k, FN_k)$$
Average measure by class : large classes count as much as small ones.

Micro Average
$$B_{micro} = B_{binary}\Big(\sum_{k=1}^{N_C} TP_k,\ \sum_{k=1}^{N_C} FP_k,\ \sum_{k=1}^{N_C} TN_k,\ \sum_{k=1}^{N_C} FN_k\Big)$$
Average measure by instance.
Multiclass II
Micro average in single-label multiclass
Each misclassified document counts as one FN (for its true class) and one FP (for the predicted class), so
$$\sum_{k=1}^{N_C} FN_k = \sum_{k=1}^{N_C} FP_k \qquad\text{and}\qquad \sum_{k=1}^{N_C} TN_k = \sum_{k=1}^{N_C} TP_k$$
Then,
$$\text{Precision}_{micro} = \text{Recall}_{micro} = \text{Accuracy} = \frac{\sum_{k=1}^{N_C} TP_k}{N_{doc}}$$
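The identity above can be checked numerically; a sketch with made-up multiclass labels (the `average` parameter of scikit-learn's metrics selects macro or micro averaging):

```python
import numpy as np
from sklearn.metrics import f1_score, precision_score, recall_score, accuracy_score

y_true = np.array([0, 0, 1, 1, 1, 2, 2, 2, 2, 2])  # toy single-label multiclass data
y_pred = np.array([0, 1, 1, 1, 2, 2, 2, 2, 0, 2])

print(f1_score(y_true, y_pred, average='macro'))   # every class weighs the same
print(f1_score(y_true, y_pred, average='micro'))   # every instance weighs the same

# In single-label multiclass, micro precision = micro recall = accuracy:
print(precision_score(y_true, y_pred, average='micro'),
      recall_score(y_true, y_pred, average='micro'),
      accuracy_score(y_true, y_pred))
```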
Contents
1 Problem definition
2 Extracting features from text files
Bag of words model
Term frequency inverse document frequency (tfidf)
3 Algorithms for classification
4 Conclusion
Bag of words
From text to features
Count the number of occurrences of words in text
“bag” because position isn’t taken into account
Extensions
Remove stop words
Remove too frequent words (max_df)
Lowercase
N-grams (ngram_range) : tokenize n-grams instead of single words, useful to partially take word order into account
wiss-ml.ipynb#Bag-of-words
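A minimal bag-of-words sketch (not from the notebook; toy documents, recent scikit-learn assumed):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["The cat sat on the mat", "the dog ate the cat"]

# Options from the slide: stop_words='english', max_df to drop too frequent
# words, ngram_range=(1, 2) to also count bigrams
vect = CountVectorizer(lowercase=True)
X = vect.fit_transform(docs)           # sparse matrix: documents x vocabulary
print(vect.get_feature_names_out())    # the learned vocabulary
print(X.toarray())                     # word counts, positions are lost
```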
Term frequency inverse document frequency (tfidf) I
Intuition
Take into account the relative importance of each word with respect to the whole dataset
If a word occurs in every document, it doesn’t hold any information
Term frequency inverse document frequency (tfidf) II
Definition
Term frequency × inverse document frequency
$$\text{tfidf}(w, d) = \text{tf}(w, d) \times \text{idf}(w)$$
where $\text{tf}(w, d)$ is the frequency of word $w$ in doc $d$ and
$$\text{idf}(w) = \log\Big(\frac{N_{doc}}{\text{doc\_freq}(w)}\Big)$$
In scikit-learn :
$$\text{tfidf}(w, d) = \text{tf}(w, d) \times (\text{idf}(w) + 1)$$
so that terms that occur in all documents ($\text{idf} = 0$) are not completely ignored.
Term frequency inverse document frequency (tfidf) III
Options
Normalisation : $\|d\| = 1$. E.g. for the L2 norm, $\sum_{w \in d} \text{tfidf}(w, d)^2 = 1$
Smoothing : add one to document frequencies, as if an extra document contained every term of the collection exactly once :
$$\text{idf}(w) = \log\Big(\frac{N_{doc} + 1}{\text{doc\_freq}(w) + 1}\Big)$$

Example
Show the most significant words of a doc : wiss-ml.ipynb#Tfidf
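A sketch of these options with TfidfVectorizer (scikit-learn's defaults already match the slide: smoothing on, the +1 added to the idf, L2 normalisation; toy documents):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog ate the cat", "a bird flew by"]

vect = TfidfVectorizer(norm='l2', smooth_idf=True)
X = vect.fit_transform(docs)
print(vect.idf_)                  # idf(w) = log((N_doc + 1) / (doc_freq(w) + 1)) + 1
print(X.multiply(X).sum(axis=1))  # every document has squared L2 norm 1
```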
Contents
1 Problem definition
2 Extracting features from text files
3 Algorithms for classification
Naïve Bayes
Support Vector Machine (SVM)
Tuning parameters
Cross validation
Grid search
4 Conclusion
Supervised classification problem I
Notations
$x = (x_1, \cdots, x_n)$ : feature vector
$\{(x_d, y_d)\}_{0 \le d < D}$ : the training set
$\forall d,\ x_d \in \mathbb{R}^n$ : $x_d$ is the feature vector of document $d$
$n$ : dimension of the feature space
$\forall d,\ y_d \in \{1, \cdots, N_C\}$ : $y_d$ is the class of document $d$
$N_C$ : the number of classes
$\hat{y}$ : class prediction. For a new vector $x$, $\hat{y}$ is the predicted class of $x$.
Supervised classification problem II
Goal
Find a function
$$F : \mathbb{R}^n \to \{1, \cdots, N_C\},\quad x \mapsto \hat{y}$$
In 20newsgroups I
Values in 20 newsgroups
$n = 130107$ features (number of unique terms)
$D = 11314$ training samples
$N_C = 20$ classes

Goal
Find a function $F$ that, given a new document, predicts its class
Naïve Bayes Algorithm I

Bayes' theorem
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
Naïve Bayes Algorithm II

Posterior probability of class C
$$P(C \mid x) = \frac{P(x \mid C)\,P(C)}{P(x)}$$
$P(x)$ does not depend on $C$, so
$$P(C \mid x) \propto P(x \mid C)\,P(C)$$
Naïve Bayes independence assumption : each feature $i$ is conditionally independent of every other feature $j$ :
$$P(C \mid x) \propto P(C) \times \prod_{i=1}^{n} P(x_i \mid C)$$
Naïve Bayes Algorithm III

Classifier from the probability model
$$\hat{y} = \arg\max_{k \in \{1, \cdots, N_C\}} P(y = k) \times \prod_{i=1}^{n} P(x_i \mid y = k)$$
Parameter estimation in the Naïve Bayes classifier

Prior of a class
$$P(y = k) = \frac{\text{nb of samples in class } k}{\text{total nb of samples}}$$
Can also be uniform : $P(y = k) = \frac{1}{N_C}$
Multinomial Naïve Bayes I

Naïve Bayes
$$P(x \mid y = k) = \prod_{i=1}^{n} P(x_i \mid y = k)$$

Multinomial distribution
The event "word is $i$" follows a multinomial distribution with parameters $(p_1, \cdots, p_n)$, where $p_i = P(word = i)$ and $\sum_i p_i = 1$ :
$$P(x_1, \cdots, x_n) = \prod_{i=1}^{n} p_i^{x_i}$$
One distribution for each class $y$.
Multinomial Naïve Bayes II

One multinomial distribution for each class :
$$P(i \mid y = k) = \frac{\text{sum of occurrences of word } i \text{ in class } k}{\text{total nb of words in class } k} = \frac{\sum_{d \in k} x_{di}}{\sum_{0 \le j < n}\sum_{d \in k} x_{dj}}$$
With smoothing,
$$P(i \mid y = k) = \frac{\sum_{d \in k} x_{di} + \alpha}{\sum_{0 \le j < n}\sum_{d \in k} x_{dj} + \alpha n}$$
Multinomial Naïve Bayes III

Inference in Multinomial Naïve Bayes
$$\begin{aligned} \hat{y} &= \arg\max_k P(y = k \mid x) \\ &= \arg\max_k P(y = k)\prod_{0 \le i < n} P(i \mid y = k)^{x_i} \\ &= \arg\max_k\ \log P(y = k) + \sum_{0 \le i < n} x_i \log P(i \mid y = k) \end{aligned}$$
Multinomial Naïve Bayes IV

A linear model
In the log space,
$$(\log P(y = k \mid x))_k \propto W_0 + W^T x$$
$W_0$ is the vector of priors : $W_{0k} = \log P(y = k)$
$W$ is the matrix of distributions : $W = (w_{ik})$, $i \in [1, n]$, $k \in [1, N_C]$, with $w_{ik} = \log P(i \mid y = k)$
Multinomial Naïve Bayes V

Example step-by-step
http://www.antidot.net/wiss2015/wiss-ml.html#Naive-Bayes
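Beyond the step-by-step example, the linear-model view can be checked directly on a fitted scikit-learn estimator: class_log_prior_ is $W_0$ and feature_log_prob_ is $W$. A sketch (not from the notebook, recent scikit-learn assumed):

```python
import numpy as np
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train = fetch_20newsgroups(subset='train')
X = CountVectorizer().fit_transform(train.data)
clf = MultinomialNB(alpha=1.0).fit(X, train.target)  # alpha = smoothing parameter

# log P(y=k|x) up to a constant: W0 + W^T x
scores = clf.class_log_prior_ + X[:5] @ clf.feature_log_prob_.T
print(np.array_equal(scores.argmax(axis=1), clf.predict(X[:5])))  # True
```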
Contents
1 Problem definition
2 Extracting features from text files
3 Algorithms for classification
Naïve Bayes
Support Vector Machine (SVM)
Tuning parameters
Cross validation
Grid search
4 Conclusion
A linear classifier
[Five figure slides illustrating a linear classifier; images not included]
Support Vector Machine, notations
Problem
$S$, the training set : $\{(x_i, y_i),\ x_i \in \mathbb{R}^n,\ y_i \in \{-1, 1\}\}_{0 \le i < D}$
Find a linear function $\langle w, x \rangle + b$ such that :
$$\text{sign}(\langle w, x_i \rangle + b) = y_i$$
SVM, maximum margin classifier
Margin
For $x_+$ and $x_-$ on the margins, i.e. $\langle w, x_+ \rangle + b = 1$ and $\langle w, x_- \rangle + b = -1$ :
$$\begin{aligned} \text{distance}(x_+, x_-) &= \Big\langle \frac{w}{\|w\|},\ x_+ - x_- \Big\rangle \\ &= \frac{1}{\|w\|}\big(\langle w, x_+ \rangle - \langle w, x_- \rangle\big) \\ &= \frac{1}{\|w\|}\big((\langle w, x_+ \rangle + b) - (\langle w, x_- \rangle + b)\big) \\ &= \frac{1}{\|w\|}\big(1 - (-1)\big) = \frac{2}{\|w\|} \end{aligned}$$
SVM, maximum margin classifier
Solving an optimization problem using the Lagrangian

Primal problem
$$\text{minimize}_{w,b}\ f(w, b)$$
under the constraints $h_i(w, b) \ge 0$

Lagrange function
$$L(w, b, \alpha) = f(w, b) - \sum_i \alpha_i h_i(w, b)$$
Let $g(\alpha) = \inf_{w,b} L(w, b, \alpha)$. Then $\forall w, b,\ g(\alpha) \le L(w, b, \alpha)$.
Moreover, for feasible $(w, b)$ and $\alpha_i \ge 0$, $L(w, b, \alpha) \le f(w, b)$.
Thus, $\forall \alpha_i \ge 0$, $g(\alpha) \le \min_{w,b} f(w, b)$.
And with the Karush-Kuhn-Tucker (KKT) optimality condition,
$$\max_\alpha g(\alpha) = \min_{w,b} f(w, b) \iff \alpha_i h_i(w, b) = 0$$
Support Vector Machine, problem
Primal problem
$$\text{minimize}_{(w,b)}\ \frac{\|w\|^2}{2}$$
under the constraints $\forall\, 0 \le i < D$, $y_i(\langle w, x_i \rangle + b) \ge 1$

Lagrange function
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i\big(y_i(\langle w, x_i \rangle + b) - 1\big)$$

Dual problem :
$$\text{maximize}_{\alpha}\ \inf_{w,b} L(w, b, \alpha) \quad\text{with } \alpha_i \ge 0$$
The optimum in $(w, b)$ is a saddle point with $\alpha$.
Support Vector Machine, problem
The derivatives in $w$ and $b$ need to vanish :
$$\frac{\partial}{\partial w} L(w, b, \alpha) = w - \sum_i \alpha_i y_i x_i = 0$$
$$\frac{\partial}{\partial b} L(w, b, \alpha) = \sum_i \alpha_i y_i = 0$$

Dual problem
$$\text{maximize}_{\alpha}\ -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i$$
under the constraints $\sum_i \alpha_i y_i = 0$ and $\alpha_i \ge 0$
Support Vectors
Support vectors
$$w = \sum_i y_i \alpha_i x_i$$
Karush-Kuhn-Tucker (KKT) optimality condition : Lagrange multiplier times constraint equals zero,
$$\alpha_i\big(y_i(\langle w, x_i \rangle + b) - 1\big) = 0$$
Thus, either $\alpha_i = 0$, or $\alpha_i > 0 \Rightarrow y_i(\langle w, x_i \rangle + b) = 1$
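A sketch of these conditions on a small separable 2-D problem (toy data; SVC with a linear kernel and a large C approximates the hard-margin problem):

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0., 0.], [1., 1.], [1., 0.], [3., 3.], [4., 3.], [3., 4.]])
y = np.array([-1, -1, -1, 1, 1, 1])

clf = SVC(kernel='linear', C=1e6).fit(X, y)

print(clf.support_vectors_)                # the x_i with alpha_i > 0
# dual_coef_ stores y_i * alpha_i, so w = sum_i y_i alpha_i x_i:
w = clf.dual_coef_ @ clf.support_vectors_
print(w, clf.coef_)                        # the two coincide for a linear kernel
```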
Experiments with separable space
SVMvaryingC.ipynb
What happens if space is not separable
Adding slack variable
Problem was
$$\text{minimize}_{(w,b)}\ \frac{\|w\|^2}{2} \quad\text{with } y_i(\langle w, x_i \rangle + b) \ge 1$$

With slack
$$\text{minimize}_{(w,b)}\ \frac{\|w\|^2}{2} + C\sum_i \xi_i \quad\text{with } y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i,\ \xi_i \ge 0$$
Support Vector Machine, without slack
Primal problem
$$\text{minimize}_{(w,b)}\ \frac{\|w\|^2}{2} \quad\text{with } y_i(\langle w, x_i \rangle + b) \ge 1$$

Lagrange function
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_i \alpha_i\big(y_i(\langle w, x_i \rangle + b) - 1\big)$$

Dual problem :
$$\text{maximize}_{\alpha}\ \inf_{w,b} L(w, b, \alpha)$$
The optimum in $(w, b)$ is a saddle point with $\alpha$.
Support Vector Machine, with slack
Primal problem
$$\text{minimize}_{(w,b)}\ \frac{\|w\|^2}{2} + C\sum_i \xi_i \quad\text{with } y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i,\ \xi_i \ge 0$$

Lagrange function
$$L(w, b, \xi, \alpha, \eta) = \frac{1}{2}\|w\|^2 + C\sum_i \xi_i - \sum_i \alpha_i\big(y_i(\langle x_i, w \rangle + b) + \xi_i - 1\big) - \sum_i \eta_i \xi_i$$

Dual problem :
$$\text{maximize}_{\alpha,\eta}\ \inf_{w,b,\xi} L(w, b, \xi, \alpha, \eta)$$
The optimum in $(w, b, \xi)$ is a saddle point with $(\alpha, \eta)$.
Support Vector Machine, problem
The derivatives in $w$, $b$ and $\xi$ need to vanish :
$$\frac{\partial}{\partial w} L(w, b, \xi, \alpha, \eta) = w - \sum_i \alpha_i y_i x_i = 0$$
$$\frac{\partial}{\partial b} L(w, b, \xi, \alpha, \eta) = \sum_i \alpha_i y_i = 0$$
$$\frac{\partial}{\partial \xi_i} L(w, b, \xi, \alpha, \eta) = C - \alpha_i - \eta_i = 0 \ \Rightarrow\ \eta_i = C - \alpha_i$$

Dual problem
$$\text{maximize}_{\alpha}\ -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i$$
under the constraints $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$
Support Vectors
Support vectors
$$w = \sum_i y_i \alpha_i x_i$$
Karush-Kuhn-Tucker (KKT) optimality condition : Lagrange multiplier times constraint equals zero,
$$\alpha_i\big(y_i(\langle w, x_i \rangle + b) + \xi_i - 1\big) = 0$$
$$\eta_i \xi_i = 0 \iff (C - \alpha_i)\xi_i = 0$$
Thus,
$$\begin{cases} \alpha_i = 0 & \Rightarrow\ y_i(\langle w, x_i \rangle + b) \ge 1 \\ 0 < \alpha_i < C & \Rightarrow\ y_i(\langle w, x_i \rangle + b) = 1 \\ \alpha_i = C & \Rightarrow\ y_i(\langle w, x_i \rangle + b) \le 1 \end{cases}$$
Support Vector Machine, Loss functions
Primal problem
$$\text{minimize}_{(w,b)}\ \frac{\|w\|^2}{2} + C\sum_i \xi_i \quad\text{with } y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i,\ \xi_i \ge 0$$

With loss function
$$\text{minimize}_{(w,b)}\ \frac{\|w\|^2}{2} + C\sum_i \max\big(0,\ 1 - y_i(\langle w, x_i \rangle + b)\big)$$
here,
$$\text{loss}(x_i, y_i) = \max\big(0,\ 1 - y_i(\langle w, x_i \rangle + b)\big) = \max\big(0,\ 1 - y_i f(x_i)\big)$$
Support Vector Machine, Common loss functions
Common loss functions
hinge loss (L1 loss) : $\max\big(0,\ 1 - y_i(\langle w, x_i \rangle + b)\big)$
squared hinge (L2 loss) : $\max\big(0,\ 1 - y_i(\langle w, x_i \rangle + b)\big)^2$
logistic loss : $\log\big(1 + \exp(-y_i(\langle w, x_i \rangle + b))\big)$
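A small numpy sketch of the three losses as a function of the margin $m_i = y_i(\langle w, x_i \rangle + b)$ (margin values invented for illustration):

```python
import numpy as np

m = np.array([2.0, 1.0, 0.3, -0.5])   # example margins y_i * (w.x_i + b)

hinge = np.maximum(0, 1 - m)          # L1 hinge: zero once the margin exceeds 1
sq_hinge = np.maximum(0, 1 - m) ** 2  # L2 (squared) hinge: smoother near m = 1
logistic = np.log(1 + np.exp(-m))     # logistic: never exactly zero
print(hinge, sq_hinge, logistic, sep='\n')
```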
Experiments with different values for C
SVMvaryingC.ipynb#Varying-C-parameter
Non linearly separable data
[Figure slide, image not included]

Non linearly separable data, $\Phi(x) = (x, x^2)$
[Two figure slides, images not included]
Linear case
Primal Problem
$$\text{minimize}_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_i \xi_i$$
subject to $y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$

Dual Problem
$$\text{maximize}_{\alpha}\ -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i$$
subject to $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$

Support vector expansion
$$f(x) = \sum_i \alpha_i y_i \langle x_i, x \rangle + b$$
With a transformation Φ : x → Φ(x)
Primal Problem
$$\text{minimize}_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_i \xi_i$$
subject to $y_i(\langle w, \Phi(x_i) \rangle + b) \ge 1 - \xi_i$ and $\xi_i \ge 0$

Dual Problem
$$\text{maximize}_{\alpha}\ -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j \langle \Phi(x_i), \Phi(x_j) \rangle + \sum_i \alpha_i$$
subject to $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$

Support vector expansion
$$f(x) = \sum_i \alpha_i y_i \langle \Phi(x_i), \Phi(x) \rangle + b$$
The kernel trick
Kernel function
$$k(x, x') = \langle \Phi(x), \Phi(x') \rangle$$
We only need to compute the dot product in the new space.

Dual Problem
$$\text{maximize}_{\alpha}\ -\frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, k(x_i, x_j) + \sum_i \alpha_i$$
subject to $\sum_i \alpha_i y_i = 0$ and $0 \le \alpha_i \le C$

Support vector expansion
$$f(x) = \sum_i \alpha_i y_i\, k(x_i, x) + b$$
Kernels
Kernel functions
linear : $k(x, x') = \langle x, x' \rangle$
polynomial : $k(x, x') = (\gamma \langle x, x' \rangle + r)^d$
rbf : $k(x, x') = \exp(-\gamma \|x - x'\|^2)$
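These three kernels are available as pairwise functions in scikit-learn; a quick sketch on toy points:

```python
import numpy as np
from sklearn.metrics.pairwise import linear_kernel, polynomial_kernel, rbf_kernel

X = np.array([[0., 1.], [1., 1.], [2., 0.]])

print(linear_kernel(X))                                    # <x, x'>
print(polynomial_kernel(X, degree=2, gamma=1., coef0=1.))  # (gamma <x, x'> + r)^d
print(rbf_kernel(X, gamma=0.5))                            # exp(-gamma ||x - x'||^2)
```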
RBF kernels imply an infinite-dimensional space

Here we are in dimension 1, $x \in \mathbb{R}$ (with $\gamma = 1$) :
$$k(x, x') = \exp(-(x - x')^2) = \exp(-x^2)\exp(-x'^2)\exp(2xx')$$
With a Taylor expansion,
$$k(x, x') = \exp(-x^2)\exp(-x'^2)\sum_{k=0}^{\infty}\frac{2^k x^k x'^k}{k!} = \Big\langle \Big(\cdots, \sqrt{\tfrac{2^k}{k!}}\, e^{-x^2} x^k, \cdots\Big),\ \Big(\cdots, \sqrt{\tfrac{2^k}{k!}}\, e^{-x'^2} x'^k, \cdots\Big) \Big\rangle$$
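The expansion can be checked numerically: truncating the series after a few terms already reproduces the kernel value (a sketch, toy scalars):

```python
import numpy as np
from math import factorial

x, y = 0.7, -0.3
exact = np.exp(-(x - y) ** 2)

# phi_k(t) = sqrt(2^k / k!) * exp(-t^2) * t^k, truncated at K terms
K = 20
phi = lambda t: np.array([np.sqrt(2.0 ** k / factorial(k)) * np.exp(-t * t) * t ** k
                          for k in range(K)])

print(exact, phi(x) @ phi(y))   # the truncated dot product matches closely
```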
Experiments with different kernels
www.antidot.net/wiss2015/SVMvaryingC.html#Non-linear-kernels
SVM in multiclass
one-vs-the-rest
$N_C$ binary classifiers (but each involving the whole dataset)
At prediction time, choose the class with the maximum decision value

one-vs-one
$\frac{N_C(N_C - 1)}{2}$ binary classifiers
At prediction time, vote
SVM in scikit-learn
SVC : Support Vector Classification
sklearn.svm.LinearSVC
based on Liblinear library
strategy : one-vs-the rest
only linear kernel
loss can be : ‘hinge’ or ‘squared hinge’
sklearn.svm.SVC
based on libSVM
multiclass strategy : one-vs-one
kernel can be : linear, polynomial, RBF, sigmoid, precomputed
only hinge loss
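A sketch of the linear case on 20 newsgroups (not from the notebook; the exact accuracy depends on the vectorizer settings):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

train = fetch_20newsgroups(subset='train')
test = fetch_20newsgroups(subset='test')

vect = TfidfVectorizer()
Xtr = vect.fit_transform(train.data)
Xte = vect.transform(test.data)

# liblinear, one-vs-the-rest; scales to 130107 features much better than SVC
clf = LinearSVC(C=1.0, loss='squared_hinge').fit(Xtr, train.target)
print(clf.score(Xte, test.target))   # mean accuracy on the test split
```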
Contents
1 Problem definition
2 Extracting features from text files
3 Algorithms for classification
Naïve Bayes
Support Vector Machine (SVM)
Tuning parameters
Cross validation
Grid search
4 Conclusion
Cross validation I
http://scikit-learn.org/stable/modules/cross_validation.html
Overfitting
Tuning the parameters on the test set can lead to overfitting : the chosen parameters are the best for this particular test set, but not in the general case.

Train, test and validation dataset
A solution :
tune the parameters on a separate validation set
keep the test set for the final evaluation
Drawback : fewer data remain for training
Cross validation II
Cross validation
k-fold cross validation :
split the training data into k partitions of the same size
train the model on k − 1 partitions
evaluate it on the k-th partition
repeat for each of the k partitions
Cross validation III
[Figure slide, image not included]
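A sketch of k-fold cross validation with scikit-learn (a Pipeline keeps the vectorizer inside each fold so no test fold leaks into the vocabulary; recent scikit-learn assumed, the 2015 module was sklearn.cross_validation):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train = fetch_20newsgroups(subset='train')
pipe = make_pipeline(TfidfVectorizer(), MultinomialNB())

# 5 folds: train on 4 partitions, evaluate on the 5th, rotate
scores = cross_val_score(pipe, train.data, train.target, cv=5)
print(scores.mean(), scores.std())
```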
Grid Search
http://scikit-learn.org/stable/modules/grid_search.html
Grid search
Try every combination of the candidate parameter values : a brute-force way to find the best value for each parameter

In scikit-learn
Automatically runs k × (number of parameter combinations) trainings
Keeps the best model

Demo with scikit-learn
http://www.antidot.net/wiss2015/grid_search_20newsgroups.html
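A minimal GridSearchCV sketch in the spirit of the demo (parameter grid invented for illustration; recent scikit-learn assumed):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

train = fetch_20newsgroups(subset='train')
pipe = Pipeline([('tfidf', TfidfVectorizer()), ('nb', MultinomialNB())])
grid = {'tfidf__ngram_range': [(1, 1), (1, 2)],
        'nb__alpha': [0.01, 0.1, 1.0]}

# cv=3 runs 3 x 6 = 18 trainings, then refits the best model on all the data
search = GridSearchCV(pipe, grid, cv=3).fit(train.data, train.target)
print(search.best_params_, search.best_score_)
```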
Contents
1 Problem definition
2 Extracting features from text files
3 Algorithms for classification
4 Conclusion
Methodology
Methodology
To solve a problem using Machine Learning, you have to :
1 Understand the data
2 Choose an evaluation measure
3 Be able to test the model
4 Find the main features
5 Try the algorithms, with different parameters
Conclusion
Machine Learning has a lot of applications
With libraries like scikit-learn, no need to implement algorithms
yourself
Questions ?
References
Machine Learning in Python :
http://scikit-learn.org
Alex Smola's very good lecture on Machine Learning at CMU :
http://alex.smola.org/teaching/10-701-15/
Kernels : https://www.youtube.com/watch?v=0Nis-oMLbDs
SVM : https://www.youtube.com/watch?v=bsbpqNIKQzU
Bernoulli Naïve Bayes

Features
$x_i = 1$ iff word $i$ is present in the document, else $x_i = 0$
The number of occurrences of word $i$ doesn't matter

Bernoulli
For each feature $i$,
$$P(x_i \mid y = k) = P(i \mid y = k)\,x_i + (1 - P(i \mid y = k))(1 - x_i)$$
The absence of a feature is explicitly taken into account.

Estimation of $P(i \mid y = k)$
$$P(i \mid y = k) = \frac{1 + \text{nb of documents in class } k \text{ that contain word } i}{\text{nb of documents in class } k}$$
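A sketch of the Bernoulli variant in scikit-learn (toy documents; BernoulliNB also binarizes counts itself via binarize=0.0 by default):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

docs = ["the cat sat", "the cat sat on the cat", "dogs bark"]
labels = [0, 0, 1]

# binary=True keeps only presence/absence, matching the Bernoulli model
X = CountVectorizer(binary=True).fit_transform(docs)
clf = BernoulliNB(alpha=1.0).fit(X, labels)
print(clf.predict(X))
```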

Contenu connexe

Tendances

Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsFabian Pedregosa
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019Travis Oliphant
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataTravis Oliphant
 
TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...
TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...
TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...Edureka!
 
Python as number crunching code glue
Python as number crunching code gluePython as number crunching code glue
Python as number crunching code glueJiahao Chen
 
Learning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldLearning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldKai-Wen Zhao
 
PyTorch for Deep Learning Practitioners
PyTorch for Deep Learning PractitionersPyTorch for Deep Learning Practitioners
PyTorch for Deep Learning PractitionersBayu Aldi Yansyah
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEHONGJOO LEE
 
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTKStatistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTKOlivier Grisel
 
SciPy Latin America 2019
SciPy Latin America 2019SciPy Latin America 2019
SciPy Latin America 2019Travis Oliphant
 
PAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and BasicsPAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and BasicsAtsunori Kanemura
 
Deep Learning with PyTorch
Deep Learning with PyTorchDeep Learning with PyTorch
Deep Learning with PyTorchMayur Bhangale
 
Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...
Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...
Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...AMIDST Toolbox
 
Processing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patternsProcessing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patternsGael Varoquaux
 
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | EdurekaTensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | EdurekaEdureka!
 
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017Yu-Hsun (lymanblue) Lin
 
Random Forest for Big Data
Random Forest for Big DataRandom Forest for Big Data
Random Forest for Big Datatuxette
 
Max Entropy
Max EntropyMax Entropy
Max Entropyjianingy
 

Tendances (20)

Asynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and AlgorithmsAsynchronous Stochastic Optimization, New Analysis and Algorithms
Asynchronous Stochastic Optimization, New Analysis and Algorithms
 
Keynote at Converge 2019
Keynote at Converge 2019Keynote at Converge 2019
Keynote at Converge 2019
 
Array computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyDataArray computing and the evolution of SciPy, NumPy, and PyData
Array computing and the evolution of SciPy, NumPy, and PyData
 
TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...
TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...
TensorFlow Tutorial | Deep Learning Using TensorFlow | TensorFlow Tutorial Py...
 
Python as number crunching code glue
Python as number crunching code gluePython as number crunching code glue
Python as number crunching code glue
 
5 csp
5 csp5 csp
5 csp
 
Learning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifoldLearning to discover monte carlo algorithm on spin ice manifold
Learning to discover monte carlo algorithm on spin ice manifold
 
PyTorch for Deep Learning Practitioners
PyTorch for Deep Learning PractitionersPyTorch for Deep Learning Practitioners
PyTorch for Deep Learning Practitioners
 
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOMEEuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
EuroPython 2017 - PyData - Deep Learning your Broadband Network @ HOME
 
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTKStatistical Machine Learning for Text Classification with scikit-learn and NLTK
Statistical Machine Learning for Text Classification with scikit-learn and NLTK
 
SciPy Latin America 2019
SciPy Latin America 2019SciPy Latin America 2019
SciPy Latin America 2019
 
PAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and BasicsPAKDD2016 Tutorial DLIF: Introduction and Basics
PAKDD2016 Tutorial DLIF: Introduction and Basics
 
Deep Learning with PyTorch
Deep Learning with PyTorchDeep Learning with PyTorch
Deep Learning with PyTorch
 
Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...
Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...
Parallel Filter-Based Feature Selection Based on Balanced Incomplete Block De...
 
Processing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patternsProcessing biggish data on commodity hardware: simple Python patterns
Processing biggish data on commodity hardware: simple Python patterns
 
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | EdurekaTensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
TensorFlow In 10 Minutes | Deep Learning & TensorFlow | Edureka
 
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
[Update] PyTorch Tutorial for NTU Machine Learing Course 2017
 
Random Forest for Big Data
Random Forest for Big DataRandom Forest for Big Data
Random Forest for Big Data
 
Max Entropy
Max EntropyMax Entropy
Max Entropy
 
TensorFlow Object Detection API
TensorFlow Object Detection APITensorFlow Object Detection API
TensorFlow Object Detection API
 

Similaire à WISS 2015 - Machine Learning lecture by Ludovic Samper

Accelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnAccelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnGilles Louppe
 
Massive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filteringMassive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filteringArthur Mensch
 
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET Journal
 
Programming Math in Java - Lessons from Apache Commons Math
Programming Math in Java - Lessons from Apache Commons MathProgramming Math in Java - Lessons from Apache Commons Math
Programming Math in Java - Lessons from Apache Commons MathPhil Steitz
 
Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...
Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...
Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...Philippe Laborie
 
Machine Learning part 2 - Introduction to Data Science
Machine Learning part 2 -  Introduction to Data Science Machine Learning part 2 -  Introduction to Data Science
Machine Learning part 2 - Introduction to Data Science Frank Kienle
 
Simple big data, in Python
Simple big data, in PythonSimple big data, in Python
Simple big data, in PythonGael Varoquaux
 
DeustoTech Internet at TASS 2015: Sentiment analysis and polarity classifica...
DeustoTech Internet at TASS 2015:  Sentiment analysis and polarity classifica...DeustoTech Internet at TASS 2015:  Sentiment analysis and polarity classifica...
DeustoTech Internet at TASS 2015: Sentiment analysis and polarity classifica...Juan Sixto
 
Matrix Factorizations for Recommender Systems
Matrix Factorizations for Recommender SystemsMatrix Factorizations for Recommender Systems
Matrix Factorizations for Recommender SystemsDmitriy Selivanov
 
Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Lear...
Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Lear...Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Lear...
Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Lear...Oleksandr Zaitsev
 
Functional programming-advantages
Functional programming-advantagesFunctional programming-advantages
Functional programming-advantagesSergei Winitzki
 
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...Turi, Inc.
 
DSD-INT 2018 Algorithmic Differentiation - Markus
DSD-INT 2018 Algorithmic Differentiation - MarkusDSD-INT 2018 Algorithmic Differentiation - Markus
DSD-INT 2018 Algorithmic Differentiation - MarkusDeltares
 
MS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmMS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmKaniska Mandal
 
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques  Methodological Study Of Opinion Mining And Sentiment Analysis Techniques
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques ijsc
 
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...AIST
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Dev Sahu
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401butest
 

Similaire à WISS 2015 - Machine Learning lecture by Ludovic Samper (20)

Accelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-LearnAccelerating Random Forests in Scikit-Learn
Accelerating Random Forests in Scikit-Learn
 
Massive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filteringMassive Matrix Factorization : Applications to collaborative filtering
Massive Matrix Factorization : Applications to collaborative filtering
 
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
IRJET- Unabridged Review of Supervised Machine Learning Regression and Classi...
 
Programming Math in Java - Lessons from Apache Commons Math
Programming Math in Java - Lessons from Apache Commons MathProgramming Math in Java - Lessons from Apache Commons Math
Programming Math in Java - Lessons from Apache Commons Math
 
Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...
Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...
Self-Adapting Large Neighborhood Search: Application to single-mode schedulin...
 
Machine Learning part 2 - Introduction to Data Science
Machine Learning part 2 -  Introduction to Data Science Machine Learning part 2 -  Introduction to Data Science
Machine Learning part 2 - Introduction to Data Science
 
Simple big data, in Python
Simple big data, in PythonSimple big data, in Python
Simple big data, in Python
 
DeustoTech Internet at TASS 2015: Sentiment analysis and polarity classifica...
DeustoTech Internet at TASS 2015:  Sentiment analysis and polarity classifica...DeustoTech Internet at TASS 2015:  Sentiment analysis and polarity classifica...
DeustoTech Internet at TASS 2015: Sentiment analysis and polarity classifica...
 
Matrix Factorizations for Recommender Systems
Matrix Factorizations for Recommender SystemsMatrix Factorizations for Recommender Systems
Matrix Factorizations for Recommender Systems
 
Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Lear...
Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Lear...Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Lear...
Suggesting Descriptive Method Names: An Exploratory Study of Two Machine Lear...
 
Review_Cibe Sridharan
Review_Cibe SridharanReview_Cibe Sridharan
Review_Cibe Sridharan
 
Functional programming-advantages
Functional programming-advantagesFunctional programming-advantages
Functional programming-advantages
 
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
Splash: User-friendly Programming Interface for Parallelizing Stochastic Lear...
 
DSD-INT 2018 Algorithmic Differentiation - Markus
DSD-INT 2018 Algorithmic Differentiation - MarkusDSD-INT 2018 Algorithmic Differentiation - Markus
DSD-INT 2018 Algorithmic Differentiation - Markus
 
MS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning AlgorithmMS CS - Selecting Machine Learning Algorithm
MS CS - Selecting Machine Learning Algorithm
 
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques  Methodological Study Of Opinion Mining And Sentiment Analysis Techniques
Methodological Study Of Opinion Mining And Sentiment Analysis Techniques
 
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
Oleksandr Frei and Murat Apishev - Parallel Non-blocking Deterministic Algori...
 
Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier Sentiment analysis using naive bayes classifier
Sentiment analysis using naive bayes classifier
 
Triggering patterns of topology changes in dynamic attributed graphs
Triggering patterns of topology changes in dynamic attributed graphsTriggering patterns of topology changes in dynamic attributed graphs
Triggering patterns of topology changes in dynamic attributed graphs
 
Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401Machine Learning: Foundations Course Number 0368403401
Machine Learning: Foundations Course Number 0368403401
 

Plus de Antidot

Comment l'intelligence artificielle améliore la recherche documentaire
Comment l'intelligence artificielle améliore la recherche documentaireComment l'intelligence artificielle améliore la recherche documentaire
Comment l'intelligence artificielle améliore la recherche documentaireAntidot
 
Antidot Content Classifier - Valorisez vos contenus
Antidot Content Classifier - Valorisez vos contenusAntidot Content Classifier - Valorisez vos contenus
Antidot Content Classifier - Valorisez vos contenusAntidot
 
Comment l’intelligence artificielle réinvente la fouille de texte
Comment l’intelligence artificielle réinvente la fouille de texteComment l’intelligence artificielle réinvente la fouille de texte
Comment l’intelligence artificielle réinvente la fouille de texteAntidot
 
Antidot Content Classifier
Antidot Content ClassifierAntidot Content Classifier
Antidot Content ClassifierAntidot
 
Cas client CAIJ
Cas client CAIJCas client CAIJ
Cas client CAIJAntidot
 
Du Big Data à la Smart Information : comment valoriser les actifs information...
Du Big Data à la Smart Information : comment valoriser les actifs information...Du Big Data à la Smart Information : comment valoriser les actifs information...
Du Big Data à la Smart Information : comment valoriser les actifs information...Antidot
 
Compte rendu de la matinée "E-commerce B2B : les leviers de croissance"
Compte rendu de la matinée "E-commerce B2B : les leviers de croissance"Compte rendu de la matinée "E-commerce B2B : les leviers de croissance"
Compte rendu de la matinée "E-commerce B2B : les leviers de croissance"Antidot
 
Web sémantique et Web de données, et si on passait à la pratique ?
Web sémantique et Web de données, et si on passait à la pratique ?Web sémantique et Web de données, et si on passait à la pratique ?
Web sémantique et Web de données, et si on passait à la pratique ?Antidot
 
Machine learning, deep learning et search : à quand ces innovations dans nos ...
Machine learning, deep learning et search : à quand ces innovations dans nos ...Machine learning, deep learning et search : à quand ces innovations dans nos ...
Machine learning, deep learning et search : à quand ces innovations dans nos ...Antidot
 
Flyer AFS@Store 2015 FR
Flyer AFS@Store 2015 FRFlyer AFS@Store 2015 FR
Flyer AFS@Store 2015 FRAntidot
 
Do’s and don'ts : la recherche interne aux sites de ecommerce
Do’s and don'ts : la recherche interne aux sites de ecommerceDo’s and don'ts : la recherche interne aux sites de ecommerce
Do’s and don'ts : la recherche interne aux sites de ecommerceAntidot
 
Boostez votre taux de conversion et augmentez vos ventes grâce au searchandis...
Boostez votre taux de conversion et augmentez vos ventes grâce au searchandis...Boostez votre taux de conversion et augmentez vos ventes grâce au searchandis...
Boostez votre taux de conversion et augmentez vos ventes grâce au searchandis...Antidot
 
Synergie entre intranet collaboratif et recherche sémantique : le cas des hôp...
Synergie entre intranet collaboratif et recherche sémantique : le cas des hôp...Synergie entre intranet collaboratif et recherche sémantique : le cas des hôp...
Synergie entre intranet collaboratif et recherche sémantique : le cas des hôp...Antidot
 
En 2015, quelles sont les bonnes pratiques du searchandising ?
En 2015, quelles sont les bonnes pratiques du searchandising ?En 2015, quelles sont les bonnes pratiques du searchandising ?
En 2015, quelles sont les bonnes pratiques du searchandising ?Antidot
 
Comment tirer profit des données publiques ouvertes dans un mashup web grâce ...
Comment tirer profit des données publiques ouvertes dans un mashup web grâce ...Comment tirer profit des données publiques ouvertes dans un mashup web grâce ...
Comment tirer profit des données publiques ouvertes dans un mashup web grâce ...Antidot
 
Vous utilisez Prestashop ? Changez votre moteur de recherche interne pour boo...
Vous utilisez Prestashop ? Changez votre moteur de recherche interne pour boo...Vous utilisez Prestashop ? Changez votre moteur de recherche interne pour boo...
Vous utilisez Prestashop ? Changez votre moteur de recherche interne pour boo...Antidot
 
Boostez votre taux de conversion en tirant profit des bonnes pratiques du sea...
Boostez votre taux de conversion en tirant profit des bonnes pratiques du sea...Boostez votre taux de conversion en tirant profit des bonnes pratiques du sea...
Boostez votre taux de conversion en tirant profit des bonnes pratiques du sea...Antidot
 
Améliorer le searchandising d’un site spécialisé : retour d'expérience de Cui...
Améliorer le searchandising d’un site spécialisé : retour d'expérience de Cui...Améliorer le searchandising d’un site spécialisé : retour d'expérience de Cui...
Améliorer le searchandising d’un site spécialisé : retour d'expérience de Cui...Antidot
 
Comment sélectionner, qualifier puis exploiter les données ouvertes
Comment sélectionner, qualifier puis exploiter les données ouvertesComment sélectionner, qualifier puis exploiter les données ouvertes
Comment sélectionner, qualifier puis exploiter les données ouvertesAntidot
 
Wikidata : quand Wikipédia s'intéresse aux données
Wikidata : quand Wikipédia s'intéresse aux donnéesWikidata : quand Wikipédia s'intéresse aux données
Wikidata : quand Wikipédia s'intéresse aux donnéesAntidot
 

Plus de Antidot (20)

Comment l'intelligence artificielle améliore la recherche documentaire
Comment l'intelligence artificielle améliore la recherche documentaireComment l'intelligence artificielle améliore la recherche documentaire
Comment l'intelligence artificielle améliore la recherche documentaire
 
Antidot Content Classifier - Valorisez vos contenus
Antidot Content Classifier - Valorisez vos contenusAntidot Content Classifier - Valorisez vos contenus
Antidot Content Classifier - Valorisez vos contenus
 
Comment l’intelligence artificielle réinvente la fouille de texte
Comment l’intelligence artificielle réinvente la fouille de texteComment l’intelligence artificielle réinvente la fouille de texte
Comment l’intelligence artificielle réinvente la fouille de texte
 
Antidot Content Classifier
Antidot Content ClassifierAntidot Content Classifier
Antidot Content Classifier
 
Cas client CAIJ
Cas client CAIJCas client CAIJ
Cas client CAIJ
 
Du Big Data à la Smart Information : comment valoriser les actifs information...
Du Big Data à la Smart Information : comment valoriser les actifs information...Du Big Data à la Smart Information : comment valoriser les actifs information...
Du Big Data à la Smart Information : comment valoriser les actifs information...
 
Compte rendu de la matinée "E-commerce B2B : les leviers de croissance"
Compte rendu de la matinée "E-commerce B2B : les leviers de croissance"Compte rendu de la matinée "E-commerce B2B : les leviers de croissance"
Compte rendu de la matinée "E-commerce B2B : les leviers de croissance"
 
Web sémantique et Web de données, et si on passait à la pratique ?
Web sémantique et Web de données, et si on passait à la pratique ?Web sémantique et Web de données, et si on passait à la pratique ?
Web sémantique et Web de données, et si on passait à la pratique ?
 
Machine learning, deep learning et search : à quand ces innovations dans nos ...
Machine learning, deep learning et search : à quand ces innovations dans nos ...Machine learning, deep learning et search : à quand ces innovations dans nos ...
Machine learning, deep learning et search : à quand ces innovations dans nos ...
 
Flyer AFS@Store 2015 FR
Flyer AFS@Store 2015 FRFlyer AFS@Store 2015 FR
Flyer AFS@Store 2015 FR
 
Do’s and don'ts : la recherche interne aux sites de ecommerce
Do’s and don'ts : la recherche interne aux sites de ecommerceDo’s and don'ts : la recherche interne aux sites de ecommerce
Do’s and don'ts : la recherche interne aux sites de ecommerce
 
Boostez votre taux de conversion et augmentez vos ventes grâce au searchandis...
Boostez votre taux de conversion et augmentez vos ventes grâce au searchandis...Boostez votre taux de conversion et augmentez vos ventes grâce au searchandis...
Boostez votre taux de conversion et augmentez vos ventes grâce au searchandis...
 
Synergie entre intranet collaboratif et recherche sémantique : le cas des hôp...
Synergie entre intranet collaboratif et recherche sémantique : le cas des hôp...Synergie entre intranet collaboratif et recherche sémantique : le cas des hôp...
Synergie entre intranet collaboratif et recherche sémantique : le cas des hôp...
 
En 2015, quelles sont les bonnes pratiques du searchandising ?
En 2015, quelles sont les bonnes pratiques du searchandising ?En 2015, quelles sont les bonnes pratiques du searchandising ?
En 2015, quelles sont les bonnes pratiques du searchandising ?
 
Comment tirer profit des données publiques ouvertes dans un mashup web grâce ...
Comment tirer profit des données publiques ouvertes dans un mashup web grâce ...Comment tirer profit des données publiques ouvertes dans un mashup web grâce ...
Comment tirer profit des données publiques ouvertes dans un mashup web grâce ...
 
Vous utilisez Prestashop ? Changez votre moteur de recherche interne pour boo...
Vous utilisez Prestashop ? Changez votre moteur de recherche interne pour boo...Vous utilisez Prestashop ? Changez votre moteur de recherche interne pour boo...
Vous utilisez Prestashop ? Changez votre moteur de recherche interne pour boo...
 
Boostez votre taux de conversion en tirant profit des bonnes pratiques du sea...
Boostez votre taux de conversion en tirant profit des bonnes pratiques du sea...Boostez votre taux de conversion en tirant profit des bonnes pratiques du sea...
Boostez votre taux de conversion en tirant profit des bonnes pratiques du sea...
 
Améliorer le searchandising d’un site spécialisé : retour d'expérience de Cui...
Améliorer le searchandising d’un site spécialisé : retour d'expérience de Cui...Améliorer le searchandising d’un site spécialisé : retour d'expérience de Cui...
Améliorer le searchandising d’un site spécialisé : retour d'expérience de Cui...
 
Comment sélectionner, qualifier puis exploiter les données ouvertes
Comment sélectionner, qualifier puis exploiter les données ouvertesComment sélectionner, qualifier puis exploiter les données ouvertes
Comment sélectionner, qualifier puis exploiter les données ouvertes
 
Wikidata : quand Wikipédia s'intéresse aux données
Wikidata : quand Wikipédia s'intéresse aux donnéesWikidata : quand Wikipédia s'intéresse aux données
Wikidata : quand Wikipédia s'intéresse aux données
 

Dernier

SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Swan(sea) Song – personal research during my six years at Swansea ... and bey...
Swan(sea) Song – personal research during my six years at Swansea ... and bey...Alan Dix
 
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetHyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your Budget
Hyderabad Call Girls Khairatabad ✨ 7001305949 ✨ Cheap Price Your BudgetEnjoy Anytime
 
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 365
Tech-Forward - Achieving Business Readiness For Copilot in Microsoft 3652toLead Limited
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Next-generation AAM aircraft unveiled by Supernal, S-A2
Next-generation AAM aircraft unveiled by Supernal, S-A2Hyundai Motor Group
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Artificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraArtificial intelligence in the post-deep learning era
Artificial intelligence in the post-deep learning eraDeakin University
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
Azure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAzure Monitor & Application Insight to monitor Infrastructure & Application
Azure Monitor & Application Insight to monitor Infrastructure & ApplicationAndikSusilo4
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
WISS 2015 - Machine Learning lecture by Ludovic Samper

  • 11. Multiclass I
N_C = number of classes.
Macro average:
    B_{macro} = \frac{1}{N_C} \sum_{k=1}^{N_C} B_{binary}(TP_k, FP_k, TN_k, FN_k)
Average of the measure over classes: large classes count as much as small ones.
Micro average:
    B_{micro} = B_{binary}\left(\sum_{k=1}^{N_C} TP_k, \sum_{k=1}^{N_C} FP_k, \sum_{k=1}^{N_C} TN_k, \sum_{k=1}^{N_C} FN_k\right)
Average of the measure over instances.
  • 12. Multiclass II
Micro average in single-label multiclass: each misclassified document counts as one false negative (for its true class) and one false positive (for the predicted class), so
    \sum_{k=1}^{N_C} FN_k = \sum_{k=1}^{N_C} FP_k
Then,
    Precision_{micro} = Recall_{micro} = Accuracy = \frac{\sum_{k=1}^{N_C} TP_k}{N_{doc}}
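As a quick illustration of the two averaging schemes, a minimal sketch with made-up toy labels (not data from the lecture), using scikit-learn's metrics module:

```python
from sklearn.metrics import precision_recall_fscore_support

# Toy single-label multiclass predictions: 3 classes, 6 documents.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Macro: average per-class scores, so small classes weigh as much as large ones.
print(precision_recall_fscore_support(y_true, y_pred, average="macro"))

# Micro: pool all instances; in single-label multiclass,
# micro precision = micro recall = accuracy (here 4/6).
print(precision_recall_fscore_support(y_true, y_pred, average="micro"))
```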
  • 13. Outline
1. Problem definition
2. Extracting features from text files
   - Bag of words model
   - Term frequency inverse document frequency (tfidf)
3. Algorithms for classification
4. Conclusion
  • 14. Bag of words
From text to features:
- Count the number of occurrences of each word in the text.
- "Bag" because word positions are not taken into account.
Extensions:
- Remove stop words.
- Remove too frequent words (max_df).
- Lowercase.
- N-grams (ngram_range): tokenize n-grams instead of single words, useful to take word positions into account.
Demo: wiss-ml.ipynb#Bag-of-words
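A minimal bag-of-words sketch with scikit-learn's CountVectorizer, wiring up the extensions listed above. The two toy documents are invented for illustration, and get_feature_names_out is the method name in recent scikit-learn versions:

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["the cat sat on the mat", "the dog ate the cat"]

# lowercase + English stop-word removal; max_df drops terms present in
# more than 80% of documents; ngram_range=(1, 2) also counts bigrams,
# which partially reintroduces word order.
vectorizer = CountVectorizer(lowercase=True, stop_words="english",
                             max_df=0.8, ngram_range=(1, 2))
X = vectorizer.fit_transform(docs)  # sparse matrix: documents x terms

print(vectorizer.get_feature_names_out())
print(X.toarray())  # each cell = number of occurrences
```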
  • 15. Term frequency inverse document frequency (tfidf) I
Intuition:
- Take into account the relative importance of each word with respect to the whole dataset.
- If a word occurs in every document, it does not carry any information.
  • 16. Term frequency inverse document frequency (tfidf) II
Definition: term frequency × inverse document frequency,
    tfidf(w, d) = tf(w, d) \times idf(w)
where tf(w, d) is the frequency of word w in document d, and
    idf(w) = \log\left(\frac{N_{doc}}{doc\_freq(w)}\right)
In scikit-learn: tfidf(w, d) = tf(w, d) \times (idf(w) + 1), so that terms occurring in all documents (idf = 0) are not ignored entirely.
  • 17. Term frequency inverse document frequency (tfidf) III
Options:
- Normalisation: ||d|| = 1; e.g. for the L2 norm, \sum_{w \in d} tfidf(w, d)^2 = 1.
- Smoothing: add one to document frequencies, as if an extra document contained every term of the collection exactly once:
    idf(w) = \log\left(\frac{N_{doc} + 1}{doc\_freq(w) + 1}\right)
Example: show the most significant words of a document, wiss-ml.ipynb#Tfidf
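A small sketch of these options with TfidfVectorizer (the parameter names are scikit-learn's; the toy documents are invented, and the check at the end simply verifies the L2 normalisation):

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["the cat sat on the mat", "the dog ate the cat"]

# norm="l2" rescales each document vector to unit norm;
# smooth_idf=True adds the "extra document containing every term once".
vectorizer = TfidfVectorizer(norm="l2", smooth_idf=True)
X = vectorizer.fit_transform(docs)

# The sum of squared tfidf weights per document should be 1.
print(np.asarray(X.multiply(X).sum(axis=1)).ravel())
```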
  • 18. Outline
1. Problem definition
2. Extracting features from text files
3. Algorithms for classification
   - Naïve Bayes
   - Support Vector Machine (SVM)
   - Tuning parameters: cross validation, grid search
4. Conclusion
  • 19. Supervised classification problem I
Notations:
- x = (x_1, \dots, x_n) \in R^n, a feature vector; n is the dimension of the feature space.
- \{(x_d, y_d)\}_{0 \le d < D}, the training set; x_d \in R^n is the feature vector of document d.
- \forall d, y_d \in \{1, \dots, N_C\}, the class of document d; N_C is the number of classes.
- \hat{y}, the class prediction: for a new vector x, \hat{y} is the predicted class of x.
  • 20. Supervised classification problem II
Goal: find a function
    F : R^n \to \{1, \dots, N_C\}, \quad x \mapsto \hat{y}
  • 21. In 20newsgroups I
Values in 20 newsgroups:
- n = 130107 features (number of unique terms)
- D = 11314 training samples
- N_C = 20 classes
Goal: find a function F that, given a new document, predicts its class.
  • 22. Naïve Bayes Algorithm I
Bayes' theorem:
    P(A|B) = \frac{P(B|A) \, P(A)}{P(B)}
  • 23. Naïve Bayes Algorithm II
Posterior probability of class C:
    P(C|x) = \frac{P(x|C) \, P(C)}{P(x)}
P(x) does not depend on C, so P(C|x) \propto P(x|C) \, P(C).
Naïve Bayes independence assumption: each feature i is conditionally independent of every other feature j, hence
    P(C|x) \propto P(C) \prod_{i=1}^{n} P(x_i|C)
  • 24. Naïve Bayes Algorithm III
Classifier derived from the probability model:
    \hat{y} = \arg\max_{k \in \{1, \dots, N_C\}} P(y = k) \prod_{i=1}^{n} P(x_i | y = k)
  • 25. Parameter estimation in the Naïve Bayes classifier
Prior of a class:
    P(y = k) = \frac{\text{nb of samples in class } k}{\text{total nb of samples}}
It can also be uniform: P(y = k) = 1 / N_C.
  • 26. Multinomial Naïve Bayes I
Naïve Bayes: P(x | y = k) = \prod_{i=1}^{n} P(x_i | y = k).
Multinomial distribution: the event "the word is i" follows a multinomial distribution with parameters (p_1, \dots, p_n), where p_i = P(word = i) and \sum_i p_i = 1:
    P(x_1, \dots, x_n) \propto \prod_{i=1}^{n} p_i^{x_i}
One distribution for each class y.
  • 27. Multinomial Naïve Bayes II
One multinomial distribution for each class:
    P(i | y = k) = \frac{\text{sum of occurrences of word } i \text{ in class } k}{\text{total nb of words in class } k} = \frac{\sum_{d \in k} x_{d,i}}{\sum_{0 \le j < n} \sum_{d \in k} x_{d,j}}
With smoothing,
    P(i | y = k) = \frac{\sum_{d \in k} x_{d,i} + \alpha}{\sum_{0 \le j < n} \sum_{d \in k} x_{d,j} + \alpha n}
  • 28. Multinomial Naïve Bayes III
Inference in Multinomial Naïve Bayes:
    \hat{y} = \arg\max_k P(y = k | x)
            = \arg\max_k P(y = k) \prod_{0 \le i < n} P(i | y = k)^{x_i}
            = \arg\max_k \log P(y = k) + \sum_{0 \le i < n} x_i \log P(i | y = k)
  • 29. Multinomial Naïve Bayes IV
A linear model: in log space,
    (\log P(y = k | x))_k \propto W_0 + W^T x
where W_0 is the vector of log priors, W_{0,k} = \log P(y = k), and W = (w_{ik}), i \in [1, n], k \in [1, N_C], is the matrix of log distributions, w_{ik} = \log P(i | y = k).
  • 30. Multinomial Naïve Bayes V
Example step-by-step: http://www.antidot.net/wiss2015/wiss-ml.html#Naive-Bayes
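Beyond the step-by-step notebook, a self-contained sketch of Multinomial Naïve Bayes on 20 newsgroups might look as follows (alpha=0.01 is an arbitrary choice for the smoothing parameter of slide 27, not a value given in the lecture):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score
from sklearn.naive_bayes import MultinomialNB

train = fetch_20newsgroups(subset="train")
test = fetch_20newsgroups(subset="test")

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(train.data)  # fit the vocabulary on train only
X_test = vectorizer.transform(test.data)

clf = MultinomialNB(alpha=0.01)  # alpha = smoothing parameter of slide 27
clf.fit(X_train, train.target)
print(accuracy_score(test.target, clf.predict(X_test)))
```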
  • 31. Outline
1. Problem definition
2. Extracting features from text files
3. Algorithms for classification
   - Naïve Bayes
   - Support Vector Machine (SVM)
   - Tuning parameters: cross validation, grid search
4. Conclusion
  • 32–36. A linear classifier (series of figure slides).
  • 37. Support Vector Machine, notations
Problem: S, the training set, \{(x_i, y_i), \; x_i \in R^n, \; y_i \in \{-1, 1\}\}_{i \in 0..D}.
Find a linear function \langle w, x \rangle + b such that
    \mathrm{sign}(\langle w, x_i \rangle + b) = y_i
  • 38. SVM, maximum margin classifier (figure slide).
  • 39. Margin
For support vectors x_+ and x_- on the margins (\langle w, x_+ \rangle + b = 1 and \langle w, x_- \rangle + b = -1):
    distance(x_+, x_-) = \left\langle \frac{w}{||w||}, x_+ - x_- \right\rangle
                       = \frac{1}{||w||} (\langle w, x_+ \rangle - \langle w, x_- \rangle)
                       = \frac{1}{||w||} ((\langle w, x_+ \rangle + b) - (\langle w, x_- \rangle + b))
                       = \frac{1}{||w||} (1 - (-1)) = \frac{2}{||w||}
  • 40. SVM, maximum margin classifier (figure slide).
  • 41. Solving an optimization problem using the Lagrangian
Primal problem: minimize_{w,b} f(w, b) under the constraints h_i(w, b) \ge 0.
Lagrange function:
    L(w, b, \alpha) = f(w, b) - \sum_i \alpha_i h_i(w, b)
Let g(\alpha) = \inf_{w,b} L(w, b, \alpha); then \forall w, b, \; g(\alpha) \le L(w, b, \alpha).
Moreover, for feasible (w, b) and \alpha_i \ge 0, L(w, b, \alpha) \le f(w, b).
Thus, \forall \alpha_i \ge 0, \; g(\alpha) \le \min_{w,b} f(w, b).
And with the Karush-Kuhn-Tucker (KKT) optimality condition,
    \max_\alpha g(\alpha) = \min_{w,b} f(w, b) \iff \alpha_i h_i(w, b) = 0
  • 42. Support Vector Machine, problem
Primal problem: minimize_{w,b} \frac{||w||^2}{2} under the constraints \forall 0 < i \le D, \; y_i(\langle w, x_i \rangle + b) \ge 1.
Lagrange function:
    L(w, b, \alpha) = \frac{1}{2} ||w||^2 - \sum_i \alpha_i (y_i(\langle w, x_i \rangle + b) - 1)
Dual problem: maximize L(w, b, \alpha) over \alpha with \alpha_i \ge 0; the optimum in (w, b) is a saddle point with \alpha.
  • 43. Support Vector Machine, problem
The derivatives in w and b must vanish:
    \frac{\partial}{\partial w} L(w, b, \alpha) = w - \sum_i \alpha_i y_i x_i = 0
    \frac{\partial}{\partial b} L(w, b, \alpha) = \sum_i \alpha_i y_i = 0
Dual problem:
    maximize_\alpha \; -\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i
under the constraints \sum_i \alpha_i y_i = 0 and \alpha_i \ge 0.
  • 44. Support Vectors
Support vectors: w = \sum_i y_i \alpha_i x_i.
Karush-Kuhn-Tucker (KKT) optimality condition: Lagrange multiplier times constraint equals zero,
    \alpha_i (y_i(\langle w, x_i \rangle + b) - 1) = 0
Thus, either \alpha_i = 0, or \alpha_i > 0 \Rightarrow y_i(\langle w, x_i \rangle + b) = 1 (x_i lies on the margin: a support vector).
  • 45. Experiments with separable space: SVMvaryingC.ipynb
  • 46. What happens if the space is not separable (figure slide).
  • 47. Adding slack variables
The problem was: minimize_{w,b} \frac{||w||^2}{2} with y_i(w \cdot x_i + b) \ge 1.
With slack variables \xi_i:
    minimize_{w,b} \; \frac{||w||^2}{2} + C \sum_i \xi_i
with y_i(w \cdot x_i + b) \ge 1 - \xi_i and \xi_i \ge 0.
  • 48. Support Vector Machine, without slack
Primal problem: minimize_{w,b} \frac{||w||^2}{2} with y_i(w \cdot x_i + b) \ge 1.
Lagrange function:
    L(w, b, \alpha) = \frac{1}{2} ||w||^2 - \sum_i \alpha_i (y_i(\langle w, x_i \rangle + b) - 1)
Dual problem: maximize L(w, b, \alpha) over \alpha; the optimum in (w, b) is a saddle point with \alpha.
  • 49. Support Vector Machine, with slack
Primal problem: minimize_{w,b} \frac{||w||^2}{2} + C \sum_i \xi_i with y_i(w \cdot x_i + b) \ge 1 - \xi_i and \xi_i \ge 0.
Lagrange function:
    L(w, b, \xi, \alpha, \eta) = \frac{1}{2} ||w||^2 + C \sum_i \xi_i - \sum_i \alpha_i (y_i(\langle x_i, w \rangle + b) + \xi_i - 1) - \sum_i \eta_i \xi_i
Dual problem: maximize L(w, b, \xi, \alpha, \eta) over (\alpha, \eta); the optimum in (w, b, \xi) is a saddle point with (\alpha, \eta).
  • 50. Support Vector Machine, problem
The derivatives in w, b, \xi must vanish:
    \frac{\partial}{\partial w} L(w, b, \xi, \alpha, \eta) = w - \sum_i \alpha_i y_i x_i = 0
    \frac{\partial}{\partial b} L(w, b, \xi, \alpha, \eta) = \sum_i \alpha_i y_i = 0
    \frac{\partial}{\partial \xi_i} L(w, b, \xi, \alpha, \eta) = C - \alpha_i - \eta_i = 0 \Rightarrow \eta_i = C - \alpha_i
Dual problem:
    maximize_\alpha \; -\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i
under the constraints \sum_i \alpha_i y_i = 0 and 0 \le \alpha_i \le C.
  • 51. Support Vectors
Support vectors: w = \sum_i y_i \alpha_i x_i.
Karush-Kuhn-Tucker (KKT) optimality conditions (Lagrange multiplier times constraint equals zero):
    \alpha_i (y_i(\langle w, x_i \rangle + b) + \xi_i - 1) = 0
    \eta_i \xi_i = 0 \iff (C - \alpha_i) \xi_i = 0
Thus:
    \alpha_i = 0 \Rightarrow y_i(\langle w, x_i \rangle + b) \ge 1
    0 < \alpha_i < C \Rightarrow y_i(\langle w, x_i \rangle + b) = 1
    \alpha_i = C \Rightarrow y_i(\langle w, x_i \rangle + b) \le 1
  • 52. Support Vector Machine, loss functions
Primal problem: minimize_{w,b} \frac{||w||^2}{2} + C \sum_i \xi_i with y_i(w \cdot x_i + b) \ge 1 - \xi_i and \xi_i \ge 0.
Equivalently, with a loss function:
    minimize_{w,b} \; \frac{||w||^2}{2} + C \sum_i \max(0, 1 - y_i(w \cdot x_i + b))
Here, loss(x_i, y_i) = \max(0, 1 - y_i(w \cdot x_i + b)) = \max(0, 1 - y_i f(x_i)), with f(x_i) = w \cdot x_i + b.
  • 53. Support Vector Machine, common loss functions
- hinge loss (L1 loss): \max(0, 1 - y_i(w \cdot x_i + b))
- squared hinge (L2 loss): \max(0, 1 - y_i(w \cdot x_i + b))^2
- logistic loss: \log(1 + \exp(-y_i(w \cdot x_i + b)))
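The three losses are easy to compare numerically as functions of the margin y_i(w \cdot x_i + b); a minimal numpy sketch (the sample margin values are arbitrary):

```python
import numpy as np

def hinge(margin):          # L1 loss
    return np.maximum(0.0, 1.0 - margin)

def squared_hinge(margin):  # L2 loss
    return np.maximum(0.0, 1.0 - margin) ** 2

def logistic(margin):       # logistic loss
    return np.log(1.0 + np.exp(-margin))

# margin > 1: correctly classified outside the margin, so zero hinge loss,
# while the logistic loss only tends to zero asymptotically.
for m in (-1.0, 0.0, 0.5, 1.0, 2.0):
    print(m, hinge(m), squared_hinge(m), logistic(m))
```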
  • 54. (Untitled figure slide.)
  • 55. Experiments with different values for C: SVMvaryingC.ipynb#Varying-C-parameter
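If the notebook is not at hand, a sketch of the same experiment on 20 newsgroups (module paths follow current scikit-learn; the grid of C values is an arbitrary choice):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

train = fetch_20newsgroups(subset="train")
X = TfidfVectorizer().fit_transform(train.data)

# Small C allows more slack (wider, softer margin); large C approaches
# the hard-margin classifier and can overfit.
for C in (0.01, 0.1, 1.0, 10.0, 100.0):
    scores = cross_val_score(LinearSVC(C=C), X, train.target, cv=3)
    print(C, scores.mean())
```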
  • 56. Non linearly separable data (figure slide).
  • 57–58. Non linearly separable data with the mapping \Phi(x) = (x, x^2) (figure slides).
  • 59. Linear case
Primal problem: minimize_{w,b} \frac{1}{2}||w||^2 + C \sum_i \xi_i subject to y_i(\langle w, x_i \rangle + b) \ge 1 - \xi_i and \xi_i \ge 0.
Dual problem: maximize_\alpha \; -\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle + \sum_i \alpha_i subject to \sum_i \alpha_i y_i = 0 and 0 \le \alpha_i \le C.
Support vector expansion: f(x) = \sum_i \alpha_i y_i \langle x_i, x \rangle + b.
  • 60. With a transformation \Phi : x \mapsto \Phi(x)
Primal problem: minimize_{w,b} \frac{1}{2}||w||^2 + C \sum_i \xi_i subject to y_i(\langle w, \Phi(x_i) \rangle + b) \ge 1 - \xi_i and \xi_i \ge 0.
Dual problem: maximize_\alpha \; -\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \langle \Phi(x_i), \Phi(x_j) \rangle + \sum_i \alpha_i subject to \sum_i \alpha_i y_i = 0 and 0 \le \alpha_i \le C.
Support vector expansion: f(x) = \sum_i \alpha_i y_i \langle \Phi(x_i), \Phi(x) \rangle + b.
  • 61. The kernel trick
Kernel function: k(x, x') = \langle \Phi(x), \Phi(x') \rangle. We only need to compute the dot product in the new space.
Dual problem: maximize_\alpha \; -\frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j k(x_i, x_j) + \sum_i \alpha_i subject to \sum_i \alpha_i y_i = 0 and 0 \le \alpha_i \le C.
Support vector expansion: f(x) = \sum_i \alpha_i y_i k(x_i, x) + b.
  • 62. Kernels
Kernel functions:
- linear: k(x, x') = \langle x, x' \rangle
- polynomial: k(x, x') = (\gamma \langle x, x' \rangle + r)^d
- rbf: k(x, x') = \exp(-\gamma ||x - x'||^2)
  • 63. The RBF kernel implies an infinite-dimensional space
Here in dimension 1, x \in R:
    k(x, x') = \exp(-(x - x')^2) = \exp(-x^2)\exp(-x'^2)\exp(2xx')
With the Taylor expansion of \exp(2xx'),
    k(x, x') = \exp(-x^2)\exp(-x'^2) \sum_{k=0}^{\infty} \frac{2^k x^k x'^k}{k!}
             = \left\langle \left(\dots, \sqrt{\tfrac{2^k}{k!}} \, e^{-x^2} x^k, \dots\right), \left(\dots, \sqrt{\tfrac{2^k}{k!}} \, e^{-x'^2} x'^k, \dots\right) \right\rangle
  • 64. Experiments with different kernels: www.antidot.net/wiss2015/SVMvaryingC.html#Non-linear-kernels
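A compact way to see the effect of the kernel, using a dataset that is not linearly separable (make_circles is a scikit-learn toy generator, not part of the lecture material):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: no separating hyperplane in the input space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

for kernel in ("linear", "poly", "rbf"):
    clf = SVC(kernel=kernel, gamma="scale").fit(X, y)
    print(kernel, clf.score(X, y))  # rbf should separate almost perfectly
```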
  • 65. SVM in multiclass
- one-vs-the-rest: N_C binary classifiers (but each trained on the whole dataset); at prediction time, choose the class with the maximum decision value.
- one-vs-one: N_C(N_C - 1)/2 binary classifiers; at prediction time, vote.
  • 66. SVM in scikit-learn (SVC: Support Vector Classification)
- sklearn.svm.LinearSVC: based on the Liblinear library; strategy: one-vs-the-rest; linear kernel only; loss can be 'hinge' or 'squared hinge'.
- sklearn.svm.SVC: based on the libSVM library; multiclass strategy: one-vs-one; kernel can be linear, polynomial, RBF, sigmoid or precomputed; hinge loss only.
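The two estimators side by side, as a minimal sketch of the options just listed (the parameter values are illustrative defaults, not recommendations from the lecture):

```python
from sklearn.svm import SVC, LinearSVC

# Liblinear: one-vs-the-rest, linear kernel, choice of hinge losses.
linear_clf = LinearSVC(C=1.0, loss="squared_hinge")

# libSVM: one-vs-one, arbitrary kernel, hinge loss.
kernel_clf = SVC(C=1.0, kernel="rbf", gamma="scale")
```

Both then expose the usual fit/predict interface of scikit-learn estimators.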
  • 67. Outline
1. Problem definition
2. Extracting features from text files
3. Algorithms for classification
   - Naïve Bayes
   - Support Vector Machine (SVM)
   - Tuning parameters: cross validation, grid search
4. Conclusion
  • 68. Cross validation I
http://scikit-learn.org/stable/modules/cross_validation.html
Overfitting: estimating parameters on the test set can lead to overfitting; the parameters are the best for this test set, but not in the general case.
Train, test and validation datasets. One solution:
- tweak the parameters on the test set;
- validate on a separate validation dataset;
drawback: only few data remain in the training dataset.
  • 69. Cross validation II
k-fold cross validation:
- split the training data into k partitions of the same size;
- train the model on k - 1 partitions;
- evaluate on the k-th partition;
and repeat so that each partition is used once for evaluation.
  • 70. Cross validation III (figure slide).
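In scikit-learn, k-fold cross validation is a single call; a sketch on 20 newsgroups (cv=5 is an arbitrary choice of k):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB

train = fetch_20newsgroups(subset="train")
X = TfidfVectorizer().fit_transform(train.data)

# 5 models: each trained on 4/5 of the data, scored on the held-out fifth.
scores = cross_val_score(MultinomialNB(), X, train.target, cv=5)
print(scores, scores.mean())
```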
  • 71. Grid Search
http://scikit-learn.org/stable/modules/grid_search.html
Grid search: test each value of each parameter, a brute-force way to find the best value for each parameter.
In scikit-learn: automatically runs k trainings (one per cross-validation fold) for every combination of parameter values, and keeps the best model.
Demo with scikit-learn: http://www.antidot.net/wiss2015/grid_search_20newsgroups.html
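A sketch of GridSearchCV over a small, made-up grid (two tfidf settings × three smoothing values with 3-fold cross validation, i.e. 18 trainings; the grid itself is not from the lecture):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([("tfidf", TfidfVectorizer()),
                     ("clf", MultinomialNB())])
params = {"tfidf__ngram_range": [(1, 1), (1, 2)],
          "clf__alpha": [1.0, 0.1, 0.01]}

train = fetch_20newsgroups(subset="train")
search = GridSearchCV(pipeline, params, cv=3)
search.fit(train.data, train.target)  # 6 combinations x 3 folds
print(search.best_params_, search.best_score_)
```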
  • 72. Outline
1. Problem definition
2. Extracting features from text files
3. Algorithms for classification
4. Conclusion
   - Methodology
  • 73. Outline
1. Problem definition: supervised classification; evaluation metrics
2. Extracting features from text files: bag of words model; term frequency inverse document frequency (tfidf)
3. Algorithms for classification: Naïve Bayes; Support Vector Machine (SVM); tuning parameters (cross validation, grid search)
4. Conclusion: methodology
  • 74. Methodology
To solve a problem using Machine Learning, you have to:
1. Understand the data.
2. Choose an evaluation measure.
3. Be able to test the model.
4. Find the main features.
5. Try the algorithms, with different parameters.
  • 75. Conclusion
Machine Learning has a lot of applications. With libraries like scikit-learn, there is no need to implement the algorithms yourself.
  • 76. Questions?
  • 77. References
- Machine Learning in Python: http://scikit-learn.org
- Alex Smola's very good lecture on Machine Learning at CMU: http://alex.smola.org/teaching/10-701-15/
- Kernels: https://www.youtube.com/watch?v=0Nis-oMLbDs
- SVM: https://www.youtube.com/watch?v=bsbpqNIKQzU
  • 78. Bernoulli Naïve Bayes
Features: x_i = 1 iff word i is present in the document, else x_i = 0; the number of occurrences of word i does not matter.
Bernoulli model: for each feature i,
    P(x_i | y = k) = P(i | y = k) \, x_i + (1 - P(i | y = k))(1 - x_i)
The absence of a feature is explicitly taken into account.
Estimation of P(i | y = k):
    P(i | y = k) = \frac{1 + \text{nb of documents in class } k \text{ that contain word } i}{\text{nb of documents in class } k}
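A closing sketch of this Bernoulli variant in scikit-learn (binary=True makes the vectorizer record presence/absence only; BernoulliNB would also binarize counts itself through its binarize parameter):

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import BernoulliNB

train = fetch_20newsgroups(subset="train")

# x_i = 1 iff word i occurs in the document, whatever its count.
X = CountVectorizer(binary=True).fit_transform(train.data)
clf = BernoulliNB().fit(X, train.target)
```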