Text Classification and Naïve Bayes
An example of text classification
Definition of a machine learning problem
A refresher on probability
The Naive Bayes classifier
Google News
Different ways for classification
- Human labor (people assign categories to every incoming article)
- Hand-crafted rules for automatic classification
  - If article contains: stock, Dow, share, Nasdaq, etc. → Business
  - If article contains: set, breakpoint, player, Federer, etc. → Tennis
- Machine learning algorithms
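The hand-crafted rule approach can be sketched in a few lines of Python. This is only an illustration: the `RULES` table and the `classify_by_rules` helper are hypothetical names, and the keyword lists are just the two examples given on the slide.

```python
from typing import Optional

# Hypothetical keyword rules, copied from the two examples on the slide.
RULES = {
    "Business": {"stock", "dow", "share", "nasdaq"},
    "Tennis": {"set", "breakpoint", "player", "federer"},
}

def classify_by_rules(article: str) -> Optional[str]:
    """Return the first category whose keyword set overlaps the article, else None."""
    words = set(article.lower().split())
    for category, keywords in RULES.items():
        if words & keywords:
            return category
    return None

print(classify_by_rules("Federer saves a breakpoint in the second set"))
print(classify_by_rules("Nasdaq and Dow close higher"))
```

Every new category needs a new hand-written keyword list, which is the maintenance cost that motivates learning the classifier from data instead.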
What is Machine Learning?
Definition: A computer program is said to learn from
experience E when its performance P at a task T
improves with experience E.
Tom Mitchell, Machine Learning, 1997
Examples:
- Learning to recognize spoken words
- Learning to drive a vehicle
- Learning to play backgammon
Components of a ML System (1)
Experience (a set of examples that pair input and output for a task)
- Text categorization: document + category
- Speech recognition: spoken text + written text
Experience is referred to as Training Data. When labeled training data is available, we speak of Supervised Learning.
Performance metrics
- Error or accuracy on the Test Data
- Test Data are not present in the Training Data
- When there are few training data, methods like ‘leave-one-out’ or ‘ten-fold cross validation’ are used to measure error.
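As a rough sketch of how cross validation measures error, the code below splits example indices into k folds and averages the test error over the folds. The majority-class learner is only a stand-in assumption so that the example is self-contained; `k_fold_splits` and `cross_validated_error` are hypothetical helper names.

```python
from collections import Counter

def k_fold_splits(n_examples, k):
    """Yield (train_indices, test_indices) pairs for k-fold cross validation."""
    indices = list(range(n_examples))
    for fold in range(k):
        test = indices[fold::k]              # every k-th example is held out
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

def cross_validated_error(labels, k):
    """Average test error of a majority-class baseline over k folds."""
    errors = []
    for train, test in k_fold_splits(len(labels), k):
        # "Training": pick the most frequent label in the training part.
        majority = Counter(labels[i] for i in train).most_common(1)[0][0]
        wrong = sum(labels[i] != majority for i in test)
        errors.append(wrong / len(test))
    return sum(errors) / len(errors)

labels = ["China", "China", "China", "not China"]
# Leave-one-out is the special case where k equals the number of examples.
print(cross_validated_error(labels, k=len(labels)))  # 0.25
```

Each example is used for testing exactly once and never appears in the training part of its own fold, which keeps the test data separate from the training data.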
Components of a ML System (2)
Type of knowledge to be learned (known as the target function, that will map between input and output)
Representation of the target function
- Decision trees
- Neural networks
- Linear functions
The learning algorithm
- C4.5 (learns decision trees)
- Gradient descent (learns a neural network)
- Linear programming (learns linear functions)
Task
Defining Text Classification
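- $d \in \mathbb{X}$: the document in the multi-dimensional space $\mathbb{X}$
- $\mathbb{C} = \{c_1, c_2, \ldots, c_J\}$: a set of classes (categories, or labels)
- $\mathbb{D}$: the training set of labeled documents $\langle d, c \rangle \in \mathbb{X} \times \mathbb{C}$
- Target function: $\gamma : \mathbb{X} \to \mathbb{C}$, with $\gamma(d) = c$
- Learning algorithm: $\Gamma(\mathbb{D}) = \gamma$
Example: $\langle d, c \rangle = \langle$“Beijing joins the World Trade Organization”, China$\rangle$, so $\gamma(d) = \text{China}$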
Naïve Bayes Learning
Target Function: $\gamma(d) = c$
$$c_{MAP} = \arg\max_{c \in \mathbb{C}} P(c|d) = \arg\max_{c \in \mathbb{C}} P(c)\,P(d|c)$$
The generative process:
- $P(c)$: a priori probability of choosing a category
- $P(d|c)$: the conditional probability of generating d, given the fixed c
- $P(c|d)$: a posteriori probability that c generated d
Learning Algorithm: Naïve Bayes
$$c_{MAP} = \arg\max_{c \in \mathbb{C}} \hat{P}(c|d) = \arg\max_{c \in \mathbb{C}} \hat{P}(c) \prod_{1 \le k \le n_d} \hat{P}(t_k|c)$$
A Refresher on Probability
Visualizing probability
A is a random variable that denotes an uncertain event
- Example: A = “I’ll get an A+ in the final exam”
P(A) is “the fraction of possible worlds where A is true”
[Figure: the event space of all possible worlds, with area 1. A blue circle marks the worlds in which A is true; the remaining area holds the worlds in which A is false. P(A) = area of the blue circle. Slide: Andrew W. Moore]
Axioms and Theorems of Probability
Axioms:
- 0 <= P(A) <= 1
- P(True) = 1
- P(False) = 0
- P(A or B) = P(A) + P(B) - P(A and B)
Theorems:
- P(not A) = P(~A) = 1 - P(A)
- P(A) = P(A ^ B) + P(A ^ ~B)
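The "fraction of possible worlds" reading makes these identities easy to check by enumeration. The sketch below assumes a toy event space of four equally likely worlds (one per truth assignment to A and B); that uniform space is an illustrative assumption, not part of the slides.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy event space: each world fixes a truth value for A and B,
# and all four worlds are assumed to be equally likely.
WORLDS = list(product([True, False], repeat=2))

def prob(event):
    """P(event) = fraction of possible worlds in which the event holds."""
    return Fraction(sum(1 for a, b in WORLDS if event(a, b)), len(WORLDS))

p_a = prob(lambda a, b: a)
p_b = prob(lambda a, b: b)
p_a_or_b = prob(lambda a, b: a or b)
p_a_and_b = prob(lambda a, b: a and b)

# Axiom: P(A or B) = P(A) + P(B) - P(A and B)
assert p_a_or_b == p_a + p_b - p_a_and_b
# Theorem: P(not A) = 1 - P(A)
assert prob(lambda a, b: not a) == 1 - p_a
# Theorem: P(A) = P(A ^ B) + P(A ^ ~B)
assert p_a == p_a_and_b + prob(lambda a, b: a and not b)
```

Exact fractions are used so the equalities hold without floating point rounding.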
Conditional Probability
P(A|B) = the probability of A being true, given that we know that B is true
- H = “I have a headache”
- F = “Coming down with flu”
- P(H) = 1/10
- P(F) = 1/40
- P(H|F) = 1/2
Headaches are rare and flu is even rarer, but if you have the flu, there is a 50-50 chance you’ll have a headache.
[Figure: the regions F and H in the event space. Slide: Andrew W. Moore]
Deriving the Bayes Rule
Conditional probability: $P(A|B) = \frac{P(A \wedge B)}{P(B)}$
Chain rule: $P(A \wedge B) = P(A|B)\,P(B)$
$P(A \wedge B) = P(B \wedge A) = P(B|A)\,P(A)$
Bayes rule: $P(B|A) = \frac{P(A|B)\,P(B)}{P(A)}$
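As a worked instance, the rule can be applied to the headache and flu numbers given earlier (P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2) to get the probability of flu given a headache. The function name `bayes_rule` is a hypothetical helper for this sketch.

```python
from fractions import Fraction

def bayes_rule(p_a_given_b, p_b, p_a):
    """Return P(B|A) = P(A|B) * P(B) / P(A)."""
    return p_a_given_b * p_b / p_a

p_h = Fraction(1, 10)          # P(H): headache
p_f = Fraction(1, 40)          # P(F): flu
p_h_given_f = Fraction(1, 2)   # P(H|F)

p_f_given_h = bayes_rule(p_h_given_f, p_f, p_h)
print(p_f_given_h)  # 1/8
```

So a headache raises the probability of flu from 1/40 to 1/8, which is still far from certain.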
Back to the Naïve Bayes Classifier
Deriving the Naïve Bayes
$P(B|A) = \frac{P(A|B)\,P(B)}{P(A)}$ (Bayes Rule)
Given two classes $c_1, c_2$ and the document $d'$:
$$P(c_1|d') = \frac{P(c_1)\,P(d'|c_1)}{P(d')}$$
$$P(c_2|d') = \frac{P(c_2)\,P(d'|c_2)}{P(d')}$$
We are looking for a $c_i$ that maximizes the a posteriori probability $P(c_i|d')$.
$P(d')$ (the denominator) is the same in both cases.
Thus: $c_{MAP} = \arg\max_{c \in \mathbb{C}} P(c)\,P(d|c)$
Estimating parameters for the target function
We are looking for the estimates $\hat{P}(c)$ and $\hat{P}(d|c)$.
P(c) is the fraction of possible worlds where c is true:
$$\hat{P}(c) = \frac{N_c}{N}$$
- $N$: number of all documents
- $N_c$: number of documents in class c
d is a vector in the space $\mathbb{X}$ where each dimension is a term:
$$P(d|c) = P(t_1, t_2, \ldots, t_{n_d} | c)$$
By using the chain rule $P(A \wedge B) = P(A|B)\,P(B)$ we have:
$$P(t_1, t_2, \ldots, t_{n_d} | c) = P(t_1 | t_2, \ldots, t_{n_d}, c)\,P(t_2, \ldots, t_{n_d} | c) = \ldots$$
Naïve assumptions of independence
1. All attribute values are independent of each other given the class (conditional independence assumption).
2. The conditional probabilities for a term are the same regardless of its position in the document.
We assume the document is a “bag-of-words”.
$$P(d|c) = P(t_1, t_2, \ldots, t_{n_d} | c) = \prod_{1 \le k \le n_d} P(t_k|c)$$
Finally, we get the target function from the “Naïve Bayes Learning” slide:
$$c_{MAP} = \arg\max_{c \in \mathbb{C}} \hat{P}(c|d) = \arg\max_{c \in \mathbb{C}} \hat{P}(c) \prod_{1 \le k \le n_d} \hat{P}(t_k|c)$$
Again about estimation
For each term, t, we need to estimate P(t|c):
$$\hat{P}(t|c) = \frac{T_{ct}}{\sum_{t' \in V} T_{ct'}}$$
Because an estimate will be 0 if a term does not appear with a class in the training data, we need smoothing.
Laplace smoothing:
$$\hat{P}(t|c) = \frac{T_{ct} + 1}{\sum_{t' \in V} (T_{ct'} + 1)} = \frac{T_{ct} + 1}{\left(\sum_{t' \in V} T_{ct'}\right) + |V|}$$
- $|V|$ is the number of terms in the vocabulary
- $T_{ct}$ is the count of term t in all documents of class c
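A minimal sketch of the add-one estimate follows. The function name `smoothed_estimates`, the three-term vocabulary, and the counts are all made up for illustration; only the formula comes from the slide.

```python
from fractions import Fraction

def smoothed_estimates(term_counts, vocabulary):
    """Add-one (Laplace) estimates of P(t|c) for every term in the vocabulary.

    term_counts maps term -> T_ct for a single class c; unseen terms count as 0.
    """
    # Denominator: (sum of T_ct' over the vocabulary) + |V|
    denominator = sum(term_counts.get(t, 0) for t in vocabulary) + len(vocabulary)
    return {t: Fraction(term_counts.get(t, 0) + 1, denominator) for t in vocabulary}

# Hypothetical counts for one class, for illustration only.
vocabulary = ["ball", "goal", "stock"]
counts = {"ball": 3, "goal": 1}      # "stock" never occurs with this class

estimates = smoothed_estimates(counts, vocabulary)
for term, p in estimates.items():
    print(term, p)
```

The unseen term receives a small non-zero probability instead of 0, and the estimates still sum to 1 over the vocabulary.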
An Example of Classification with Naïve Bayes
Example 13.1 (Part 1)
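|              | docID | words in document                   | c = China? |
|--------------|-------|-------------------------------------|------------|
| Training set | 1     | Chinese Beijing Chinese             | Yes        |
|              | 2     | Chinese Chinese Shanghai            | Yes        |
|              | 3     | Chinese Macao                       | Yes        |
|              | 4     | Tokyo Japan Chinese                 | No         |
| Test set     | 5     | Chinese Chinese Chinese Tokyo Japan | ?          |

Two classes: “China” ($c$), “not China” ($\bar{c}$)
N = 4, $\hat{P}(c) = 3/4$, $\hat{P}(\bar{c}) = 1/4$
V = {Beijing, Chinese, Japan, Macao, Shanghai, Tokyo}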
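The priors and the vocabulary can be read off the training set by counting. The sketch below assumes whitespace tokenization and uses the spelling "Shanghai"; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction

# Training documents 1-4 from the example, as (text, class) pairs.
training_set = [
    ("Chinese Beijing Chinese", "China"),
    ("Chinese Chinese Shanghai", "China"),
    ("Chinese Macao", "China"),
    ("Tokyo Japan Chinese", "not China"),
]

n_docs = len(training_set)                                   # N
class_counts = Counter(label for _, label in training_set)   # N_c per class
priors = {c: Fraction(n_c, n_docs) for c, n_c in class_counts.items()}

# Vocabulary: every distinct term seen in the training documents.
vocabulary = sorted({term for text, _ in training_set for term in text.split()})

print(priors)
print(vocabulary, len(vocabulary))
```

Counting this way gives six distinct terms, which matches the |V| = 6 that appears in the smoothed estimates of the example.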
Example 13.1 (Part 2)
Same training set (documents 1-4) and test document 5 (Chinese Chinese Chinese Tokyo Japan) as in Part 1.
Estimation:
$$\hat{P}(\text{Chinese}|c) = (5+1)/(8+6) = 6/14 = 3/7$$
$$\hat{P}(\text{Tokyo}|c) = \hat{P}(\text{Japan}|c) = (0+1)/(8+6) = 1/14$$
$$\hat{P}(\text{Chinese}|\bar{c}) = (1+1)/(3+6) = 2/9$$
$$\hat{P}(\text{Tokyo}|\bar{c}) = \hat{P}(\text{Japan}|\bar{c}) = (1+1)/(3+6) = 2/9$$
Classification:
$$P(c|d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k|c)$$
$$P(c|d_5) \propto 3/4 \cdot (3/7)^3 \cdot 1/14 \cdot 1/14 \approx 0.0003$$
$$P(\bar{c}|d_5) \propto 1/4 \cdot (2/9)^3 \cdot 2/9 \cdot 2/9 \approx 0.0001$$
Since 0.0003 > 0.0001, the classifier assigns the test document to c = China.
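The two products can be checked with exact fractions. The sketch below takes the priors and smoothed estimates of the example as given inputs rather than re-deriving them; `score` is a hypothetical helper name.

```python
from fractions import Fraction

# Estimates taken directly from the example (not recomputed here).
priors = {"China": Fraction(3, 4), "not China": Fraction(1, 4)}
cond_probs = {
    "China": {"Chinese": Fraction(3, 7), "Tokyo": Fraction(1, 14), "Japan": Fraction(1, 14)},
    "not China": {"Chinese": Fraction(2, 9), "Tokyo": Fraction(2, 9), "Japan": Fraction(2, 9)},
}
test_doc = "Chinese Chinese Chinese Tokyo Japan".split()

def score(c):
    """P(c) times the product of P(t_k|c) over all term positions of the test document."""
    result = priors[c]
    for term in test_doc:
        result *= cond_probs[c][term]
    return result

scores = {c: score(c) for c in priors}
c_map = max(scores, key=scores.get)

for c, s in scores.items():
    print(c, s, round(float(s), 4))
print("c_MAP =", c_map)
```

The scores are proportional to the posteriors rather than equal to them, because the common denominator P(d) is dropped; that does not change which class attains the maximum.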
Summary: Miscellaneous
Naïve Bayes is linear in the time it takes to scan the data.
When we have many terms, the product of probabilities will cause a floating point underflow, therefore:
$$c_{MAP} = \arg\max_{c \in \mathbb{C}} \Big[\log \hat{P}(c) + \sum_{1 \le k \le n_d} \log \hat{P}(t_k|c)\Big]$$
For a large training set, the vocabulary is large. It is better to select only a subset of terms. “Feature selection” is used for that (Section 13.5).
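The underflow problem and the log-space remedy can be seen in a small experiment. The document length (1000 term positions) and the probability value 0.001 are arbitrary assumptions chosen to trigger the underflow; `log_score` is a hypothetical helper name.

```python
import math

def log_score(prior, term_probs):
    """log P(c) + sum over k of log P(t_k|c): a sum of logs instead of a product."""
    return math.log(prior) + sum(math.log(p) for p in term_probs)

# Hypothetical long document: 1000 term positions, each with P(t_k|c) = 0.001.
term_probs = [0.001] * 1000

product = 0.5
for p in term_probs:
    product *= p      # the running product becomes too small for a double

print(product)                      # 0.0
print(log_score(0.5, term_probs))   # finite, roughly -6908.4
```

Because the logarithm is monotonically increasing, the class with the highest log score is also the class with the highest product, so the decision is unchanged.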
Text classification

  • 1. Text Classification and Naïve Bayes An example of text classification Definition of a machine learning problem A refresher on probability The Naive Bayes classifier 1
  • 3. Different ways for classification Human labor (people assign categories to every incoming article) Hand-crafted rules for automatic classification  If article contains: stock, Dow, share, Nasdaq, etc.  Business  If article contains: set, breakpoint, player, Federer, etc.  Tennis Machine learning algorithms 3
  • 4. What is Machine Learning? 4 Definition: A computer program is said to learn from experience E when its performance P at a task T improves with experience E. Tom Mitchell, Machine Learning, 1997 Examples: - Learning to recognize spoken words - Learning to drive a vehicle - Learning to play backgammon
  • 5. Components of a ML System (1). Experience: a set of examples that pair input and output for a task (text categorization: document + category; speech recognition: spoken text + written text). Experience is referred to as training data; when training data is available, we speak of supervised learning. Performance metrics: error or accuracy on the test data. Test data must not appear in the training data. When training data is scarce, methods such as leave-one-out or ten-fold cross-validation are used to measure error.
  • 6. Components of a ML System (2). Type of knowledge to be learned: the target function, which maps input to output. Representation of the target function: decision trees, neural networks, linear functions. The learning algorithm: C4.5 (learns decision trees), gradient descent (learns a neural network), linear programming (learns linear functions).
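  • 7. Defining Text Classification.
      $d \in \mathbb{X}$: the document in the multi-dimensional space $\mathbb{X}$.
      $\mathbb{C} = \{c_1, c_2, \ldots, c_J\}$: a set of classes (categories, or labels).
      $\mathbb{D}$: the training set of labeled documents $\langle d, c \rangle$, with $\langle d, c \rangle \in \mathbb{X} \times \mathbb{C}$.
      Target function: $\gamma : \mathbb{X} \to \mathbb{C}$, with $\gamma(d) = c$.
      Learning algorithm: $\Gamma(\mathbb{D}) = \gamma$.
      Example: $\langle d, c \rangle = \langle$“Beijing joins the World Trade Organization”, China$\rangle$, so $\gamma(d) = \text{China}$.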
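  • 8. Naïve Bayes Learning.
      Learning algorithm: Naïve Bayes.
      Target function: $\gamma(d) = c_{\mathrm{MAP}}$, where $c_{\mathrm{MAP}} = \arg\max_{c \in \mathbb{C}} P(c \mid d) = \arg\max_{c \in \mathbb{C}} P(c)\, P(d \mid c)$.
      With estimated parameters: $c_{\mathrm{MAP}} = \arg\max_{c \in \mathbb{C}} \hat{P}(c \mid d) = \arg\max_{c \in \mathbb{C}} \hat{P}(c) \prod_{1 \le k \le n_d} \hat{P}(t_k \mid c)$.
      The generative process: $P(c)$ is the a priori probability of choosing a category; $P(d \mid c)$ is the conditional probability of generating $d$, given the fixed $c$; $P(c \mid d)$ is the a posteriori probability that $c$ generated $d$.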
  • 9. A Refresher on Probability
  • 10. Visualizing probability. A is a random variable that denotes an uncertain event, e.g. A = “I’ll get an A+ in the final exam”. P(A) is “the fraction of possible worlds where A is true”. [Figure: the event space of all possible worlds, with area 1; a blue circle marks the worlds in which A is true, and the rest are worlds in which A is false; P(A) = area of the blue circle.] Slide: Andrew W. Moore.
  • 11. Axioms and Theorems of Probability. Axioms: 0 <= P(A) <= 1; P(True) = 1; P(False) = 0; P(A or B) = P(A) + P(B) - P(A and B). Theorems: P(not A) = P(~A) = 1 - P(A); P(A) = P(A ^ B) + P(A ^ ~B).
  • 12. Conditional Probability. P(A|B) = the probability of A being true, given that we know that B is true. Example: H = “I have a headache”, F = “Coming down with flu”; P(H) = 1/10, P(F) = 1/40, P(H|F) = 1/2. Headaches are rare and flu is even rarer, but if you have the flu, there is a 50-50 chance you’ll have a headache. [Figure: event space with regions F and H.] Slide: Andrew W. Moore.
  • 13. Deriving the Bayes Rule.
      Conditional probability: $P(A \mid B) = \dfrac{P(A \wedge B)}{P(B)}$
      Chain rule: $P(A \wedge B) = P(A \mid B)\, P(B)$
      By symmetry: $P(A \wedge B) = P(B \wedge A) = P(B \mid A)\, P(A)$
      Bayes rule: $P(B \mid A) = \dfrac{P(A \mid B)\, P(B)}{P(A)}$
  • 14. Back to the Naïve Bayes Classifier
  • 15. Deriving the Naïve Bayes.
      Bayes rule: $P(B \mid A) = \dfrac{P(A \mid B)\, P(B)}{P(A)}$
      Given two classes $c_1, c_2$ and the document $d'$:
      $P(c_1 \mid d') = \dfrac{P(c_1)\, P(d' \mid c_1)}{P(d')}$ and $P(c_2 \mid d') = \dfrac{P(c_2)\, P(d' \mid c_2)}{P(d')}$
      We are looking for a $c_i$ that maximizes the a posteriori probability $P(c_i \mid d')$. The denominator $P(d')$ is the same in both cases.
      Thus: $c_{\mathrm{MAP}} = \arg\max_{c \in \mathbb{C}} P(c)\, P(d \mid c)$
  • 16. Estimating parameters for the target function. We are looking for the estimates $\hat{P}(c)$ and $\hat{P}(d \mid c)$.
      $P(c)$ is the fraction of possible worlds where $c$ is true: $\hat{P}(c) = \dfrac{N_c}{N}$, where $N$ is the number of all documents and $N_c$ is the number of documents in class $c$.
      $d$ is a vector in the space $\mathbb{X}$ where each dimension is a term: $P(d \mid c) = P(\langle t_1, t_2, \ldots, t_{n_d} \rangle \mid c)$.
      By using the chain rule $P(A \wedge B) = P(A \mid B)\, P(B)$ we have: $P(t_1, t_2, \ldots, t_{n_d} \mid c) = P(t_1 \mid t_2, \ldots, t_{n_d}, c)\, P(t_2, \ldots, t_{n_d} \mid c) = \ldots$
  • 17. Naïve assumptions of independence.
      1. All attribute values are independent of each other given the class (conditional independence assumption).
      2. The conditional probabilities for a term are the same regardless of its position in the document. We assume the document is a “bag of words”.
      $P(d \mid c) = P(\langle t_1, t_2, \ldots, t_{n_d} \rangle \mid c) = \prod_{1 \le k \le n_d} P(t_k \mid c)$
      Finally, we get the target function of Slide 8: $c_{\mathrm{MAP}} = \arg\max_{c \in \mathbb{C}} \hat{P}(c \mid d) = \arg\max_{c \in \mathbb{C}} \hat{P}(c) \prod_{1 \le k \le n_d} \hat{P}(t_k \mid c)$
  • 18. Again about estimation. For each term $t$, we need to estimate $P(t \mid c)$:
      $\hat{P}(t \mid c) = \dfrac{T_{ct}}{\sum_{t' \in V} T_{ct'}}$
      where $T_{ct}$ is the count of term $t$ in all documents of class $c$.
      Because this estimate is 0 if a term does not appear with a class in the training data, we need smoothing.
      Laplace smoothing: $\hat{P}(t \mid c) = \dfrac{T_{ct} + 1}{\sum_{t' \in V} (T_{ct'} + 1)} = \dfrac{T_{ct} + 1}{\left(\sum_{t' \in V} T_{ct'}\right) + |V|}$
      where $|V|$ is the number of terms in the vocabulary.
  • 19. An Example of classification with Naïve Bayes
  • 20. Example 13.1 (Part 1). Two classes: “China” ($c$) and “not China” ($\bar{c}$).
      Set       docID  Words in document                    c = China?
      Training  1      Chinese Beijing Chinese              Yes
      Training  2      Chinese Chinese Shanghai             Yes
      Training  3      Chinese Macao                        Yes
      Training  4      Tokyo Japan Chinese                  No
      Test      5      Chinese Chinese Chinese Tokyo Japan  ?
      $N = 4$, $\hat{P}(c) = 3/4$, $\hat{P}(\bar{c}) = 1/4$
      $V = \{\text{Beijing}, \text{Chinese}, \text{Japan}, \text{Macao}, \text{Shanghai}, \text{Tokyo}\}$, so $|V| = 6$.
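  • 21. Example 13.1 (Part 2), using the training and test sets of slide 20. Class $c$ contains 8 term occurrences, class $\bar{c}$ contains 3, and $|V| = 6$.
      Estimation:
      $\hat{P}(\text{Chinese} \mid c) = (5 + 1)/(8 + 6) = 3/7$
      $\hat{P}(\text{Tokyo} \mid c) = \hat{P}(\text{Japan} \mid c) = (0 + 1)/(8 + 6) = 1/14$
      $\hat{P}(\text{Chinese} \mid \bar{c}) = (1 + 1)/(3 + 6) = 2/9$
      $\hat{P}(\text{Tokyo} \mid \bar{c}) = \hat{P}(\text{Japan} \mid \bar{c}) = (1 + 1)/(3 + 6) = 2/9$
      Classification, with $P(c \mid d) \propto P(c) \prod_{1 \le k \le n_d} P(t_k \mid c)$:
      $\hat{P}(c \mid d_5) \propto 3/4 \cdot (3/7)^3 \cdot 1/14 \cdot 1/14 \approx 0.0003$
      $\hat{P}(\bar{c} \mid d_5) \propto 1/4 \cdot (2/9)^3 \cdot 2/9 \cdot 2/9 \approx 0.0001$
      The classifier therefore assigns the test document to $c$ = China.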
  • 22. Summary: Miscellaneous.
      Naïve Bayes is linear in the time it takes to scan the data.
      When we have many terms, the product of probabilities will cause a floating-point underflow, so we sum logarithms instead: $c_{\mathrm{MAP}} = \arg\max_{c \in \mathbb{C}} \left[ \log \hat{P}(c) + \sum_{1 \le k \le n_d} \log \hat{P}(t_k \mid c) \right]$
      For a large training set, the vocabulary is large. It is better to select only a subset of terms, which is done with “feature selection” (Section 13.5).
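
The estimation and classification steps of slides 16 to 22 can be sketched in a few lines of Python. This is a minimal illustration, not the slides' own code: the function names (`train_multinomial_nb`, `log_scores`, `classify`) and the constants `TRAINING_DOCS` and `TEST_DOC` are hypothetical, documents are assumed to be pre-tokenized lists of terms, and test terms outside the training vocabulary are simply skipped (an assumption; the slides do not cover that case). It uses Laplace smoothing as on slide 18 and sums logarithms as on slide 22.

```python
import math
from collections import Counter

# Training and test data of Example 13.1 (slides 20 and 21).
TRAINING_DOCS = [
    (["Chinese", "Beijing", "Chinese"], "China"),
    (["Chinese", "Chinese", "Shanghai"], "China"),
    (["Chinese", "Macao"], "China"),
    (["Tokyo", "Japan", "Chinese"], "not China"),
]
TEST_DOC = ["Chinese", "Chinese", "Chinese", "Tokyo", "Japan"]


def train_multinomial_nb(docs):
    """Estimate P(c) and Laplace-smoothed P(t|c) from (tokens, label) pairs."""
    vocab = {t for tokens, _ in docs for t in tokens}
    docs_per_class = Counter(label for _, label in docs)
    # Prior: fraction of training documents in each class, N_c / N.
    priors = {c: n / len(docs) for c, n in docs_per_class.items()}
    cond_prob = {}
    for c in docs_per_class:
        # T_ct: count of term t over all documents of class c.
        counts = Counter(t for tokens, label in docs if label == c for t in tokens)
        total = sum(counts.values())
        cond_prob[c] = {t: (counts[t] + 1) / (total + len(vocab)) for t in vocab}
    return priors, cond_prob, vocab


def log_scores(tokens, priors, cond_prob, vocab):
    """Return log P(c) + sum_k log P(t_k|c) for every class c."""
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for t in tokens:
            if t in vocab:  # assumption: ignore terms never seen in training
                score += math.log(cond_prob[c][t])
        scores[c] = score
    return scores


def classify(tokens, priors, cond_prob, vocab):
    """Return the class with the highest log score (the MAP class)."""
    scores = log_scores(tokens, priors, cond_prob, vocab)
    return max(scores, key=scores.get)


priors, cond_prob, vocab = train_multinomial_nb(TRAINING_DOCS)
print(classify(TEST_DOC, priors, cond_prob, vocab))  # prints "China"
```

Exponentiating the two log scores recovers the unnormalized values of slide 21 (about 0.0003 and 0.0001), so the log-space version ranks the classes exactly as the product form does.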

Editor's notes

  1. Q: How does this definition differ from the way we describe other kinds of computer programs? A: For other programs we speak only of the task and the performance criteria, not of experience. Q: If the task T is speech recognition, what would E and P be? A: E would be examples of spoken text paired with the written text, so that the computer can match the written words to the spoken words. P (performance) would be the number of words that the computer recognizes correctly.
  2. We give the target function at the beginning and say that its derivation will be explained later, after the refresher on probability. Use the example of selecting a topic for the class project, which corresponds to selecting c. Then, given c, the choice of d is conditional: P(d|c).
  3. Calculating all the parameters that result from applying the chain rule is clearly infeasible. Therefore, we need the naïve independence assumptions on the next slide.
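
The infeasibility mentioned in note 3 can be made concrete by counting parameters. The sketch below is an illustration under stated assumptions, not part of the slides: it considers documents of one fixed length, and the function names are hypothetical. Without any independence assumption, a class needs one probability per possible term sequence (minus one, because they sum to 1). Under the two naïve assumptions, it needs only one probability per vocabulary term (again minus one).

```python
def full_joint_parameters(vocab_size, doc_length):
    """Free parameters per class for P(t_1, ..., t_n | c) with no independence assumption."""
    # There are vocab_size ** doc_length distinct term sequences of this length.
    return vocab_size ** doc_length - 1


def naive_bayes_parameters(vocab_size):
    """Free parameters per class for P(t | c) under the bag-of-words assumptions."""
    return vocab_size - 1


# Sizes taken from Example 13.1: 6 vocabulary terms, a 5-term test document.
print(full_joint_parameters(6, 5))  # 7775
print(naive_bayes_parameters(6))    # 5
```

Even for this toy vocabulary the full model needs thousands of parameters per class, and the count grows exponentially with document length, whereas the naïve model grows only linearly with the vocabulary.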