Christof Monz
Informatics Institute
University of Amsterdam
Data Mining
Week 1: Probabilities Refresher
Today’s Class
Quick refresher of probabilities
Essential Information Theory
Calculus in one slide
Probabilities: Refresher
Experiment (trial): Repeatable procedure with
well-defined possible outcomes
Sample Space (S): the set of all possible
outcomes (finite or infinite)
• Example: coin toss experiment possible outcomes:
S = {heads, tails}
• Example: die toss experiment possible outcomes:
S = {1,2,3,4,5,6}
Probabilities: Sample Space
Definition of sample space depends on what we
are asking
Sample Space (S): the set of all possible
outcomes
Example: die toss experiment for whether the
number is even or odd
• possible outcomes: {even, odd}
• not {1,2,3,4,5,6}
Probabilities: Definitions
An event is any subset of outcomes from the
sample space
Example: let A represent the event such that
the outcome of the die toss experiment is
divisible by 3
• A = {3,6}
• A is a subset of the sample space S= {1,2,3,4,5,6}
Example: suppose sample space S =
{heart,spade,club,diamond} (deck of cards)
• let A represent the event of drawing a heart: A =
{heart}
• let B represent the event of drawing a red card: B =
{heart,diamond}
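The definitions above map directly onto set operations. The following is a minimal sketch, assuming Python sets as the representation; the variable names are illustrative and not part of the original slides.

```python
# Sample spaces and events modelled as Python sets (names are illustrative).
S = {1, 2, 3, 4, 5, 6}                 # die toss sample space
A = {x for x in S if x % 3 == 0}       # event: outcome divisible by 3

suits = {"heart", "spade", "club", "diamond"}
heart = {"heart"}                      # event: drawing a heart
red = {"heart", "diamond"}             # event: drawing a red card

print(sorted(A))        # [3, 6]
print(A <= S)           # True: every event is a subset of the sample space
print(heart <= red)     # True: drawing a heart implies drawing a red card
```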
Probability Function
The probability law assigns to every event A a
nonnegative number P(A), called the
probability of A
P(A) encodes our knowledge or belief about the
collective likelihood of all the elements of A
Probability law must satisfy certain properties
Probability Axioms
Non-negativity: P(A) ≥ 0, for every event A
Additivity: If A and B are two disjoint events,
then the probability of their union satisfies:
P(A ∪ B) = P(A) + P(B)
Normalization: The probability of the entire
sample space S is equal to 1, i.e. P(S) = 1
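The axioms can be checked mechanically for a finite sample space. The sketch below assumes a probability law given as a dictionary from outcomes to probabilities; `prob` and `is_valid` are hypothetical helper names, and exact fractions are used to avoid rounding issues.

```python
from fractions import Fraction

def prob(event, p):
    """P(event): sum of the probabilities of the outcomes in the event."""
    return sum((p[o] for o in event), Fraction(0))

def is_valid(p):
    """Check non-negativity and normalization for a finite probability law."""
    return all(v >= 0 for v in p.values()) and prob(p.keys(), p) == 1

fair_die = {o: Fraction(1, 6) for o in range(1, 7)}
odd, even = {1, 3, 5}, {2, 4, 6}       # two disjoint events

print(is_valid(fair_die))              # True
# Additivity for the disjoint events odd and even:
print(prob(odd | even, fair_die) == prob(odd, fair_die) + prob(even, fair_die))  # True
```

Because `prob` sums over outcomes, additivity for disjoint events holds by construction here; the last line only illustrates it.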
Probabilities: Example
An experiment involving a single coin toss
There are two possible outcomes, H and T, i.e.
the sample space S = {H,T}
If the coin is fair, one should assign equal
probabilities to the two outcomes
P({H}) = 0.5
P({T}) = 0.5
P({H,T}) = P({H})+P({T}) = 1.0
Probabilities: Example II
Experiment involving 3 coin tosses
Each outcome is a string of length 3 over {H, T}: S =
{HHH,HHT,HTH,HTT,THH,THT,TTH,TTT}
Assume each outcome is equiprobable
(“Uniform distribution”)
What is the probability of the event that exactly 2
heads occur?
A = {HHT,HTH,THH}
P(A) = P({HHT})+P({HTH})+P({THH})
P(A)= 1/8 + 1/8 + 1/8 = 3/8
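The same result can be reproduced by enumerating the sample space. This is a small sketch under the uniform-distribution assumption stated above; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All strings of length 3 over {H, T}
S = ["".join(t) for t in product("HT", repeat=3)]
# Event: exactly two heads
A = [s for s in S if s.count("H") == 2]

# Under a uniform distribution, P(A) = |A| / |S|
p_A = Fraction(len(A), len(S))

print(A)      # ['HHT', 'HTH', 'THH']
print(p_A)    # 3/8
```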
Joint and Conditional Probabilities
The joint probability P(A,B) is the probability
of two events (A and B) occurring together
The conditional probability P(A|B): Assume
event B is the case, what is the probability of event
A being the case as well?
Note: in general P(A|B) ≠ P(A,B)
Definition: $P(A|B) = \frac{P(A,B)}{P(B)}$, provided P(B) > 0
P(A,B) = P(B,A), but in general P(A|B) ≠ P(B|A)
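A worked example on a fair die makes the difference between joint and conditional probabilities concrete. The events chosen here (A: divisible by 3, B: even) and the helper names `P` and `conditional` are assumptions for illustration only.

```python
from fractions import Fraction

S = set(range(1, 7))   # fair six-sided die
A = {3, 6}             # event: outcome divisible by 3
B = {2, 4, 6}          # event: outcome is even

def P(event):
    """Probability of an event under the uniform distribution on S."""
    return Fraction(len(event), len(S))

def conditional(a, b):
    """P(a|b) = P(a,b) / P(b), where the joint event is the intersection."""
    return P(a & b) / P(b)

print(P(A & B))            # 1/6  joint probability P(A,B)
print(conditional(A, B))   # 1/3  P(A|B)
print(conditional(B, A))   # 1/2  P(B|A)
```

All three values differ, which illustrates that neither P(A|B) = P(A,B) nor P(A|B) = P(B|A) holds in general.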
Bayes’ Rule
Chain Rule:
P(A,B) = P(A|B)P(B) = P(B|A)P(A)
Bayes’ rule lets us swap the order of dependence
between events
$P(A|B) = \frac{P(B|A)\,P(A)}{P(B)}$
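As a numeric check of Bayes' rule, consider a fair die with A = "divisible by 3" and B = "even" (an illustrative choice, not from the slides). Then P(A) = 1/3, P(B) = 1/2, and P(B|A) = 1/2, since of the outcomes {3, 6} only 6 is even.

```python
from fractions import Fraction

p_A = Fraction(1, 3)            # P(A): outcome divisible by 3
p_B = Fraction(1, 2)            # P(B): outcome is even
p_B_given_A = Fraction(1, 2)    # P(B|A)

# Bayes' rule: P(A|B) = P(B|A) P(A) / P(B)
p_A_given_B = p_B_given_A * p_A / p_B
print(p_A_given_B)              # 1/3

# Chain rule: both factorizations give the same joint probability P(A,B)
print(p_A_given_B * p_B == p_B_given_A * p_A)   # True
```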
Determining Probabilities
So far we have assumed that the values that P
assigns to events are given
Determining P is an important part of machine
learning
In an empirical setting, P is often estimated
using relative frequencies:
• $P(A) = \frac{\mathrm{freq}(A)}{N}$
where freq(A) is the frequency of A in a sample set, and
N is the size of the sample set
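Relative-frequency estimation can be sketched in a few lines. The function name `relative_frequencies` and the sample of coin tosses are made up for illustration.

```python
from collections import Counter

def relative_frequencies(sample):
    """Estimate P(outcome) as freq(outcome) / N for each observed outcome."""
    counts = Counter(sample)
    n = len(sample)
    return {outcome: count / n for outcome, count in counts.items()}

# Hypothetical sample of 8 coin tosses (5 heads, 3 tails)
sample = ["H", "T", "H", "H", "T", "H", "T", "H"]
estimate = relative_frequencies(sample)

print(estimate["H"])   # 0.625
print(estimate["T"])   # 0.375
```

Outcomes that never occur in the sample receive no entry, which corresponds to an estimated probability of zero.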
Entropy
Entropy measures the amount of uncertainty in
a variable (the variable ranges over points in the
sample space)
The amount of uncertainty is commonly
measured in bits
$H(p) = H(X) = -\sum_{x \in X} p(x) \log_2 p(x)$
Entropy: Example
Let X represent the result of rolling a (fair)
8-sided die
$H(X) = -\sum_{x \in X} p(x) \log_2 p(x)$
$H(X) = -\sum_{x \in X} \frac{1}{8} \log_2 \frac{1}{8}$
$H(X) = -\sum_{x \in X} \frac{1}{8} \cdot (-3) = 3$ bits
Each equiprobable outcome can be represented
by 3 bits:
outcome  1    2    3    4    5    6    7    8
code     001  010  011  100  101  110  111  000
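The entropy formula translates into a short function. This is a minimal sketch; `entropy` is an illustrative name, and outcomes with probability zero are skipped, following the usual convention that 0 · log 0 = 0.

```python
import math

def entropy(probs):
    """Entropy in bits of a distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

fair_8_sided = [1 / 8] * 8
print(entropy(fair_8_sided))   # 3.0
```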
Entropy: Better Encoding
If the probability distribution is not uniform, the
entropy is lower than in the uniform case, so fewer
bits are needed on average
Example: Consider an unfair 4-sided die
value  probability
1      0.5
2      0.125
3      0.125
4      0.25
$H(X) = -(0.5 \log_2 0.5 + 0.25 \log_2 0.25 + 2 \cdot 0.125 \log_2 0.125) = 1.75$ bits
Entropy: Better Encoding
value  probability  code1  code2
1      0.5          00     0
2      0.125        01     110
3      0.125        10     111
4      0.25         11     10
Average number of bits (values 2 and 3 are combined: 2 · 0.125 = 0.25):
• code1: 0.5 · 2 bits + 0.25 · 2 bits + 0.25 · 2 bits = 2 bits
• code2: 0.5 · 1 bit + 0.25 · 3 bits + 0.25 · 2 bits = 1.75 bits
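The averages above can be recomputed directly from the table. The sketch below uses the probabilities and codewords from the slide; the function names `average_length` and `entropy` are illustrative.

```python
import math

probs = {1: 0.5, 2: 0.125, 3: 0.125, 4: 0.25}
code1 = {1: "00", 2: "01", 3: "10", 4: "11"}
code2 = {1: "0", 2: "110", 3: "111", 4: "10"}

def average_length(code, probs):
    """Expected codeword length in bits under the distribution probs."""
    return sum(probs[v] * len(code[v]) for v in probs)

def entropy(probs):
    """Entropy in bits of a value -> probability mapping."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

print(average_length(code1, probs))   # 2.0
print(average_length(code2, probs))   # 1.75
print(entropy(probs))                 # 1.75
```

For this distribution the average length of code2 equals the entropy, so code2 cannot be improved on average.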
Entropy: Saving Bits
Coding tree: How many yes-no questions must
be asked to determine each message?
[Figure: coding tree with leaves 0, 10, 110, 111]
In general, the optimal number of bits can be
computed as:
$-\log_2 p(x)$ bits for each message x ∈ X
or: $\log_2 \frac{1}{p(x)}$ bits for each message x ∈ X
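To see the formula at work, the optimal length can be compared with the codeword lengths of code2 from the unfair 4-sided die example. This is a sketch; `optimal_bits` is a hypothetical helper name.

```python
import math

def optimal_bits(p):
    """Optimal code length in bits for a message with probability p."""
    return -math.log2(p)

# Probabilities and code2 from the unfair 4-sided die example
probs = {1: 0.5, 2: 0.125, 3: 0.125, 4: 0.25}
code2 = {1: "0", 2: "110", 3: "111", 4: "10"}

for value, p in probs.items():
    # value, optimal number of bits, actual codeword length
    print(value, optimal_bits(p), len(code2[value]))
```

The lengths match exactly here because every probability is a power of 1/2; for other distributions the optimal value is generally not an integer.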
Tiny Calculus Refresher
Derivative: The (first) derivative of a function
allows us to compute the rate of change at any
point
Rate of change: slope of the tangent
For multi-variable functions we compute the
partial derivatives for each variable separately
Derivatives are computed by applying
differentiation rules:
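• $\frac{\partial}{\partial x}(\varphi + \psi) = \frac{\partial}{\partial x}\varphi + \frac{\partial}{\partial x}\psi$
• $\frac{\partial}{\partial x} c x^n = c n x^{n-1}$
• $\frac{\partial}{\partial x} f(g(x)) = f'(g(x))\, g'(x)$ (chain rule)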
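The three rules can be sanity-checked numerically with a central-difference approximation of the derivative. The functions, the evaluation point, and the step size below are arbitrary choices for illustration, not taken from the slides.

```python
def derivative(f, x, h=1e-6):
    """Central-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

x0 = 2.0
tol = 1e-4

# Sum rule: d/dx (x^2 + x^3) = 2x + 3x^2
sum_ok = abs(derivative(lambda x: x**2 + x**3, x0) - (2 * x0 + 3 * x0**2)) < tol

# Power rule: d/dx 3x^4 = 12x^3
power_ok = abs(derivative(lambda x: 3 * x**4, x0) - 12 * x0**3) < tol

# Chain rule: f(u) = u^2, g(x) = 3x + 1, so d/dx f(g(x)) = 2(3x + 1) * 3
chain_ok = abs(derivative(lambda x: (3 * x + 1) ** 2, x0) - 2 * (3 * x0 + 1) * 3) < tol

print(sum_ok, power_ok, chain_ok)   # True True True
```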
Recap
Probability distributions (joint, conditional)
Bayes’ rule
Entropy
