Lecture9 - Bayesian-Decision-Theory
1. Introduction to Machine Learning
Lecture 9
Bayesian decision theory – An introduction
Albert Orriols i Puig
aorriols@salle.url.edu
Artificial Intelligence – Machine Learning
Enginyeria i Arquitectura La Salle
Universitat Ramon Llull
2. Recap of Lectures 5-8
LET’S START WITH DATA
CLASSIFICATION
3. Recap of Lectures 5-8
We want to build decision trees
How can I automatically generate these types of trees?
Decide which attribute we should put in each node
Decide a split point
Rely on information theory
We also saw many other improvements
4. Recap of Lectures 5-8
From kNN to CBR
[Figure panels: 15-NN and 1-NN]
Key aspects
Value of k
Distance functions
5. Today’s Agenda
Could we use probability to classify?
Where it all began
Some anecdotes on the correct use of probabilities
6. Why Bother about Prob.?
The world is a very uncertain place
Almost 40 years of AI and ML dealing with uncertain
domains
Some researchers decided to employ ideas from
probability to model concepts
Before saying more… let's go to the beginning
7. Meeting the Reverend Thomas Bayes
Two main works:
Divine Benevolence, or an Attempt to Prove That the Principal End of the Divine Providence and Government is the Happiness of His Creatures (1731)
An Introduction to the Doctrine of Fluxions, and a Defence of the Mathematicians Against the Objections of the Author of the Analyst (published anonymously in 1736)
But we are especially interested in:
An Essay Towards Solving a Problem in the Doctrine of Chances (1764), which was actually published posthumously by Richard Price
8. Where Did These Ideas Come From?
Bayes built his theory upon several ideas
Immanuel Kant (1724-1804)
Copernican revolution: our understanding of the external world has its foundations not merely in experience, but in both experience and a priori concepts, thus offering a non-empiricist critique of rationalist philosophy
Isaac Newton (1643-1727)
Universal gravitation
three laws of motion which dominated
the scientific view of the physical universe
for the next three centuries
9. What Was Bayes' Point?
Bayesian probability
Notion of probability interpreted as partial belief rather than as
frequency
Bayesian estimation
Calculate the validity of a proposition
On the basis of a prior estimate of its probability and new
relevant evidence
E.g.:
Before Bayes: forward probability
given a specified number of white and black balls in an urn, what is the probability of drawing a black ball?
Bayes turned his attention to the converse problem
given that one or more balls have been drawn, what can be said
about the number of white and black balls in the urn?
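The converse problem can be sketched numerically. The snippet below is a minimal illustration, not Bayes' original derivation: it assumes an urn with a known total of 5 balls, a uniform prior over how many of them are black, and draws made with replacement. The function name `urn_posterior` and all numbers are made up for the example.

```python
from fractions import Fraction

def urn_posterior(total_balls, draws):
    """Posterior over the number of black balls in the urn.

    Assumptions (illustrative only): uniform prior over 0..total_balls
    black balls, and draws made with replacement.
    draws is a string such as "BBW" (B = black, W = white).
    """
    prior = Fraction(1, total_balls + 1)
    unnormalized = {}
    for k in range(total_balls + 1):
        p_black = Fraction(k, total_balls)
        likelihood = Fraction(1)
        for ball in draws:
            likelihood *= p_black if ball == "B" else 1 - p_black
        unnormalized[k] = likelihood * prior
    evidence = sum(unnormalized.values())  # P(draws), summed over all hypotheses
    return {k: v / evidence for k, v in unnormalized.items()}

# Two black balls and one white ball were drawn.
posterior = urn_posterior(total_balls=5, draws="BBW")
for k, p in posterior.items():
    print(f"{k} black balls: {float(p):.3f}")
```

Under these assumptions the most probable composition is 3 black balls out of 5, and the compositions with 0 or 5 black balls are ruled out because both colors were observed.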
10. Bayes’ Theorem
Outputs the most probable hypothesis h∈H, given the data D and knowledge about the prior probabilities of the hypotheses in H
Terminology:
P(h|D): posterior probability of h, i.e. the probability (our confidence) that h holds given the data D
P(h): prior probability of h (the background knowledge we have about h being a correct hypothesis, before seeing D)
P(D): prior probability that the training data D will be observed
P(D|h): probability of observing D given that h holds
$$P(h \mid D) = \frac{P(D \mid h)\,P(h)}{P(D)}$$
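When the hypotheses in H are mutually exclusive and exhaustive, the denominator P(D) can be expanded with the law of total probability. This is a standard identity rather than something stated on the slide; h' is just a summation variable ranging over H:

```latex
P(D) = \sum_{h' \in H} P(D \mid h')\,P(h')
\quad\Longrightarrow\quad
P(h \mid D) = \frac{P(D \mid h)\,P(h)}{\sum_{h' \in H} P(D \mid h')\,P(h')}
```

Written this way, the posterior needs only the priors and the likelihoods, which is usually all we have in practice.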
11. Bayes’ Theorem
Given H, the space of possible hypotheses
The most probable hypothesis is the one that maximizes P(h|D):
$$h_{MAP} \equiv \arg\max_{h \in H} P(h \mid D) = \arg\max_{h \in H} \frac{P(D \mid h)\,P(h)}{P(D)} = \arg\max_{h \in H} P(D \mid h)\,P(h)$$
P(D) can be dropped in the last step because it does not depend on h
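A minimal sketch of the MAP rule over a finite hypothesis space follows. The helper name `map_hypothesis`, the hypothesis labels, and all probabilities are hypothetical and chosen only to illustrate the formula.

```python
def map_hypothesis(priors, likelihoods):
    """Return the hypothesis h maximizing P(D|h) * P(h).

    priors:      dict mapping hypothesis -> P(h)
    likelihoods: dict mapping hypothesis -> P(D|h)
    P(D) is never computed: it is the same for every h.
    """
    return max(priors, key=lambda h: likelihoods[h] * priors[h])

# Hypothetical numbers, for illustration only.
priors = {"h1": 0.5, "h2": 0.3, "h3": 0.2}
likelihoods = {"h1": 0.1, "h2": 0.4, "h3": 0.5}

h_map = map_hypothesis(priors, likelihoods)
h_ml = max(likelihoods, key=likelihoods.get)  # ignores the priors
print(h_map, h_ml)
```

With these numbers the MAP hypothesis is h2 (0.4 × 0.3 = 0.12), whereas picking the largest likelihood alone would select h3. The difference comes entirely from the priors.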
12. Is the Pope the Pope?
The chances that a randomly chosen human being is the Pope are about 1 in 6 billion
Benedict XVI is the Pope
What are the chances that Benedict XVI is human?
(Beck-Bornholdt and Dubben, 1996)
Analogy to syllogistic reasoning: 1 in 6 billion
13. So, Is the Pope an Alien?
Where is the trick?
Probability of the data given a hypothesis H: P(D|H)
Probability of the hypothesis given the data: P(H|D)
P(D|H) is different from P(H|D)
So, is the Pope an alien?
Probability of being an alien P(A)
Probability of being human P(H)
Probability that the Pope is an alien
$$P(\text{Alien} \mid \text{Pope}) = \frac{P(\text{Pope} \mid \text{Alien})\,P(\text{Alien})}{P(\text{Pope} \mid \text{Human})\,P(\text{Human}) + P(\text{Pope} \mid \text{Alien})\,P(\text{Alien})}$$
14. So, Is the Pope an Alien?
What's missing?
P(Pope|Alien)
P(Human)
P(Alien)
Considering
Low values of P(Alien) and P(Pope|Alien)
And large values of P(Human)
We could “probably” say that the pope is not an alien!
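To make the argument concrete, the sketch below plugs numbers into Bayes' theorem for P(Alien | Pope). Only P(Pope | Human), about 1 in 6 billion, comes from the lecture. The values for P(Alien) and P(Pope | Alien) are hypothetical placeholders, since the slides point out that these quantities are missing.

```python
def p_alien_given_pope(p_pope_given_alien, p_alien, p_pope_given_human, p_human):
    """Bayes' theorem with two exhaustive hypotheses: Alien and Human."""
    numerator = p_pope_given_alien * p_alien
    return numerator / (p_pope_given_human * p_human + numerator)

p_pope_given_human = 1 / 6e9   # from the lecture: about 1 in 6 billion

# Hypothetical placeholders (NOT from the lecture): small, as the slide suggests.
p_alien = 1e-9
p_human = 1 - p_alien
p_pope_given_alien = 1e-12

result = p_alien_given_pope(p_pope_given_alien, p_alien,
                            p_pope_given_human, p_human)
print(f"P(Alien | Pope) = {result:.2e}")
```

With any choice of small P(Alien) and P(Pope | Alien), the result stays tiny, which matches the qualitative conclusion on the slide.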
15. More examples: Monty Hall
Stick or switch
16. Stick or Switch
I chose door number 3
Door 2 is opened and contains a sheep
They give me the chance to change doors
Should I?
Use probability, not faith, to give an answer!
17. Stick or Switch
I should switch!
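The answer can be checked with Bayes' theorem. The sketch below assumes the standard rules of the game, which the slides do not spell out: the car is equally likely to be behind any door, and the host always opens a door that is neither the player's pick nor the car, choosing at random when two such doors exist. The function name `posterior_car_location` is made up for this example.

```python
from fractions import Fraction

def posterior_car_location(pick, opened, doors=(1, 2, 3)):
    """P(car is behind each door | player's pick, door opened by the host)."""
    prior = Fraction(1, len(doors))
    unnormalized = {}
    for car in doors:
        # Doors the host may open: not the player's pick and not the car.
        allowed = [d for d in doors if d != pick and d != car]
        likelihood = Fraction(1, len(allowed)) if opened in allowed else Fraction(0)
        unnormalized[car] = likelihood * prior
    evidence = sum(unnormalized.values())  # P(host opens this door)
    return {car: p / evidence for car, p in unnormalized.items()}

# The situation on the slide: I chose door 3 and door 2 was opened.
posterior = posterior_car_location(pick=3, opened=2)
for door, p in posterior.items():
    print(f"door {door}: {p}")
```

Under these assumptions, sticking with door 3 wins with probability 1/3, while switching to door 1 wins with probability 2/3.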
18. Yet Another Example: The Defendant’s Fallacy
The story of a murder
A suspect was caught
The DNA test was positive
The DNA test fails only 1 time in 1 million
So, my suspect must be guilty, right?
More specifically, the suspect is guilty with p = 0.999999. Agree?
19. The Defendant’s Fallacy
Where is the trick now?
P(coincides | innocent) as opposed to P(innocent | coincides)
P(coincides | innocent) is commonly misused as the probability of being innocent
P(innocent | coincides) is the actual probability of being innocent given that the test was positive!
Does this really matter?
Let's assume a city of 10 million inhabitants
We apply the test to all 10 million inhabitants
How many of them will be positive?
About 10 by chance alone (10 million × 1/1,000,000)
20. The Defendant’s Fallacy
Two arguments
The prosecutor: there is a probability of 0.000001 that the suspect is innocent
The defendant: in this city of 10M people, the probability of the suspect being innocent is approximately 90%
Who is right?
The defendant
The proof? You do the math
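One way to do the math is sketched below. It relies on assumptions the slides leave implicit: exactly one culprit lives in the city, the culprit always tests positive, every inhabitant is equally likely to be the culprit before the test, and there is no evidence other than the DNA match. The function name `p_innocent_given_match` is made up for this example.

```python
def p_innocent_given_match(population, false_positive_rate):
    """P(innocent | DNA match) under the assumptions stated above."""
    p_guilty = 1 / population            # uniform prior over the inhabitants
    p_innocent = 1 - p_guilty
    p_match_and_innocent = false_positive_rate * p_innocent
    p_match_and_guilty = 1.0 * p_guilty  # the culprit always matches (assumed)
    return p_match_and_innocent / (p_match_and_innocent + p_match_and_guilty)

population = 10_000_000
false_positive_rate = 1e-6               # the test fails 1 time in 1 million

expected_false_matches = (population - 1) * false_positive_rate
p = p_innocent_given_match(population, false_positive_rate)
print(f"expected innocent matches: {expected_false_matches:.1f}")
print(f"P(innocent | match) = {p:.3f}")
```

About 10 innocent inhabitants are expected to match alongside the single culprit, so a matching person is innocent with probability close to 10/11, roughly 0.91. That agrees with the defendant's figure of approximately 90%.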
21. Next Class
How we can use these concepts in machine learning