6334 Day 3 slides: Spanos-lecture-2
PHIL 6334 - Probability/Statistics Lecture Notes 2:
Conditional Probabilities and Bayes’ theorem
Aris Spanos [Spring 2014]
1 The view from the $(S, \mathcal{F}, P(\cdot))$ perspective
1.1 Conditional probability
Consider the probability setup described by the probability space $(S, \mathcal{F}, P(\cdot))$, where $S$ is the set of all possible outcomes, $\mathcal{F}$ is the field of events of interest, and $P(\cdot)$ is a probability set function assigning probabilities to events in $\mathcal{F}$.
For any two events $A$ and $B$ in $\mathcal{F}$ the following formula for conditional probability holds:
$$P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad P(B) > 0. \tag{1}$$
This formula treats the events $A$ and $B$ symmetrically, and thus:
$$P(B|A) = \frac{P(A \cap B)}{P(A)}, \quad P(A) > 0. \tag{2}$$
Solving (1) and (2) for $P(A \cap B)$ yields the multiplication rule:
$$P(A \cap B) = P(A|B)\cdot P(B) = P(B|A)\cdot P(A). \tag{3}$$
Substituting (3) into (1) yields the conditional probability:
$$P(A|B) = \frac{P(B|A)\cdot P(A)}{P(B)}, \quad P(B) > 0. \tag{4}$$
Example. Consider the random experiment of tossing a fair coin twice:
$$S = \{(HH), (HT), (TH), (TT)\}.$$
Let the events of interest be:
$$A = \{(HH), (HT), (TH)\}, \quad P(A) = .75,$$
$$B = \{(HT), (TH), (TT)\}, \quad P(B) = .75.$$
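The conditional probability of $A$ given $B$ takes the form:
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{.5}{.75} = \frac{2}{3}, \tag{5}$$
since $P(A \cap B) = P((HT), (TH)) = .5$. Notice also that:
$$P(B|A) = \frac{.5}{.75} = \frac{2}{3} \ \rightarrow\ P(A \cap B) = P(B|A)\cdot P(A) = \left(\frac{.5}{.75}\right)(.75) = .5.$$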
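The calculation in (5) can be checked by enumerating the sample space. The following is a minimal Python sketch, assuming equally likely outcomes and taking $A$ = "at least one head" and $B$ = "at least one tail"; the helper names `prob` and `cond_prob` are illustrative and not part of the notes.

```python
from fractions import Fraction

# Sample space for two tosses of a fair coin; each outcome has probability 1/4.
S = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT", "TH"}   # at least one head
B = {"HT", "TH", "TT"}   # at least one tail

def prob(event):
    """P(event) under equally likely outcomes in S."""
    return Fraction(len(event & S), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a ∩ b) / P(b), defined only when P(b) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(b)

print(prob(A), prob(B))            # 3/4 3/4
print(cond_prob(A, B))             # 2/3, as in (5)
print(cond_prob(B, A) * prob(A))   # 1/2, the multiplication rule (3)
```

Exact fractions are used so that the results match (5) without rounding.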
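Now consider introducing a third event:
$$C = \{(HH), (HT), (TT)\}, \quad P(C) = .75.$$
What is the conditional probability of $A$ given $B$ and $C$?
$$P(A|B \cap C) = \frac{P(A \cap [B \cap C])}{P(B \cap C)}, \quad P(B \cap C) > 0,$$
where, in light of the fact that $P(B \cap C) = P(B|C)\cdot P(C)$, the condition $P(B \cap C) > 0$ requires $P(B|C) > 0$ and $P(C) > 0$. In this example:
$$P(A \cap C) = P((HH), (HT)) = .5, \quad P(B \cap C) = P((HT), (TT)) = .5, \quad P(A \cap B \cap C) = P((HT)) = .25,$$
which imply that:
$$P(A|B \cap C) = \frac{.25}{.5} = \frac{1}{2} \neq P(A|B) = \frac{2}{3}.$$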
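Conditioning on the intersection of two events can be checked the same way. This is a sketch only: it assumes equally likely outcomes and takes the third event to be $C = \{(HH), (HT), (TT)\}$, and the helper names are hypothetical.

```python
from fractions import Fraction

S = {"HH", "HT", "TH", "TT"}
A = {"HH", "HT", "TH"}
B = {"HT", "TH", "TT"}
C = {"HH", "HT", "TT"}   # assumed third event

def prob(event):
    """P(event) under equally likely outcomes in S."""
    return Fraction(len(event & S), len(S))

def cond_prob(a, b):
    """P(a | b) = P(a ∩ b) / P(b), defined only when P(b) > 0."""
    if prob(b) == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / prob(b)

print(prob(B & C))           # 1/2
print(prob(A & B & C))       # 1/4
print(cond_prob(A, B & C))   # 1/2
print(cond_prob(A, B))       # 2/3
```

Conditioning on $B \cap C$ rather than on $B$ alone changes the probability of $A$ from 2/3 to 1/2.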
1.2 Bayes’ theorem from the $(S, \mathcal{F}, P(\cdot))$ perspective
The conditional probability formula in (4) is transformed into an updating rule by interpreting the two events $A$ and $B$ as a hypothesis $H$ and evidence $E$ to yield Bayes’ formula:
$$P(H|E) = \frac{P(E|H)\cdot P(H)}{P(E)}, \quad P(E) > 0, \tag{6}$$
where:
(i) $P(H|E)$ is interpreted as the posterior probability of $H$,
(ii) $P(E|H)$ is interpreted as the likelihood of $H$,
(iii) $P(H)$ is interpreted as the prior probability of $H$, and
(iv) $P(E)$ is interpreted as the initial probability of the evidence.
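As an arithmetic illustration of (6), the sketch below computes a posterior from the three ingredients (ii)-(iv). The function name `posterior` and the numbers are assumptions chosen for illustration; they do not come from the notes.

```python
from fractions import Fraction

def posterior(likelihood, prior, evidence):
    """Bayes' formula (6): P(H|E) = P(E|H) * P(H) / P(E), with P(E) > 0."""
    if evidence <= 0:
        raise ValueError("P(E) must be positive")
    joint = likelihood * prior   # P(H ∩ E), by the multiplication rule (3)
    if joint > evidence:
        raise ValueError("incoherent inputs: P(H ∩ E) cannot exceed P(E)")
    return joint / evidence

# Illustrative values: P(E|H) = 4/5, P(H) = 3/10, P(E) = 1/2.
print(posterior(Fraction(4, 5), Fraction(3, 10), Fraction(1, 2)))  # 12/25
```

The coherence check reflects that $H \cap E$ is a subset of $E$, so the three inputs cannot be chosen independently of one another.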
Remark 1: Viewed from the probability space $(S, \mathcal{F}, P(\cdot))$ perspective, (6) makes mathematical sense only when the hypothesis $H$ and evidence $E$ belong to the same field $\mathcal{F}$. This is potentially problematic because in empirical modeling $H$ lives in Plato’s world and $E$ lives in the real world. Hence, (6) presumes that the two worlds can be easily merged, with $H$ and $E$ constituting overlapping events. However, Bayesians are reluctant to introduce $(H \cap E)$ and assign it a probability using Bayes’ formula:
$$P(H|E) = \frac{P(H \cap E)}{P(E)}, \quad P(E) > 0.$$
Instead, they replace $P(H \cap E)$ with $P(E|H)\cdot P(H)$, which is mathematically equivalent but allows the terms $P(E|H)$ and $P(H)$ to be given more beguiling interpretations! These issues become more insidious when Bayes’ formula is viewed from the $\{f(\mathbf{x}; \theta),\ \theta \in \Theta,\ \mathbf{x} \in \mathbb{R}^{n}\}$ perspective.
The most problematic of the probabilistic assignments (i)-(iv) is $P(E)$ because it’s not obvious where the probability could come from. The Bayesians seek to address this conundrum by defining (iv) in terms of (ii)-(iii). In particular, they use $H$ and not-$H$, denoted by ($\neg H$, the "catch-all"), to define a partition of $S$:
$$S = H \cup \neg H,$$
and then use $P(H \cup \neg H) = P(H) + P(\neg H) = P(S) = 1$ to deduce the total probability rule:
$$P(E) = P(H)\cdot P(E|H) + P(\neg H)\cdot P(E|\neg H). \tag{7}$$
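This rule holds for any set of events $(A_1, A_2, \ldots, A_n)$ that constitutes a partition of $S$, in the sense that if:
$$(A_1 \cup A_2 \cup \cdots \cup A_n) = S, \quad A_i \cap A_j = \emptyset \ \text{ for any } i \neq j, \ i, j = 1, 2, \ldots, n,$$
then:
$$P(E) = \sum_{i=1}^{n} P(A_i)\cdot P(E|A_i).$$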
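The partition version of the total probability rule is a weighted sum, which the following sketch makes concrete. The function name `total_probability` and the three-event partition are hypothetical; exact fractions are assumed so that the check that the partition probabilities sum to 1 is exact (with floats a tolerance would be needed).

```python
from fractions import Fraction

def total_probability(partition_probs, conditional_probs):
    """P(E) = sum_i P(A_i) * P(E | A_i) for a partition A_1, ..., A_n of S."""
    if len(partition_probs) != len(conditional_probs):
        raise ValueError("need one conditional probability per partition event")
    if sum(partition_probs) != 1:
        raise ValueError("probabilities of a partition must sum to 1")
    return sum(p * c for p, c in zip(partition_probs, conditional_probs))

# Illustrative partition with P(A_i) = 1/2, 1/3, 1/6
# and P(E | A_i) = 1/10, 1/2, 1.
p_e = total_probability(
    [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)],
    [Fraction(1, 10), Fraction(1, 2), Fraction(1, 1)],
)
print(p_e)  # 23/60
```

With $n = 2$ and the partition $\{H, \neg H\}$ this reduces to (7).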
The rule in (7) is often used to write Bayes’ formula as:
$$P(H|E) = \frac{P(E|H)\cdot P(H)}{P(H)\cdot P(E|H) + P(\neg H)\cdot P(E|\neg H)}, \quad P(E) > 0. \tag{8}$$
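Formula (8) needs only the prior and the two likelihoods, since $P(E)$ is computed from them. A minimal sketch, with a hypothetical function name and illustrative numbers:

```python
from fractions import Fraction

def posterior_catch_all(prior, lik_h, lik_not_h):
    """Bayes' formula (8), with P(E) obtained from the total probability rule (7).

    prior     = P(H)
    lik_h     = P(E | H)
    lik_not_h = P(E | not-H), the "catch-all" likelihood
    """
    evidence = prior * lik_h + (1 - prior) * lik_not_h   # P(E) from (7)
    if evidence == 0:
        raise ValueError("P(E) must be positive")
    return lik_h * prior / evidence

# Illustrative values: P(H) = 3/10, P(E|H) = 4/5, P(E|not-H) = 2/5.
print(posterior_catch_all(Fraction(3, 10), Fraction(4, 5), Fraction(2, 5)))  # 6/13
```

The sketch makes visible that the result depends on a value for $P(E|\neg H)$, which has to be supplied before $P(E)$ can be computed.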
Remark 2: It is important to distinguish between the formula for conditional probabilities (4), which is totally noncontroversial, and Bayes’ formula (8), which is controversial because:
(a) it assumes that a hypothesis $H$ and evidence $E$ are just overlapping events in the same field $\mathcal{F}$, and
(b) it invokes the total probability formula to assign a probability to $E$.
1.3 Bayesian Confirmation Theory
The Bayesian confirmation theory relies on comparing the prior with the posterior probability of hypothesis $H$:
[i] Confirmation: $P(H|E) > P(H)$,
[ii] Disconfirmation: $P(H|E) < P(H)$.
In case [i] evidence $E$ confirms hypothesis $H$, and in case [ii] evidence $E$ disconfirms hypothesis $H$.
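The degree of confirmation is measured using some measure $\mathrm{c}(H, E)$ of the "degree to which $E$ raises the probability of $H$". The most popular such Bayesian measures are:
$$d(H, E) = P(H|E) - P(H),$$
$$s(H, E) = P(H|E) - P(H|\neg E),$$
$$m(H, E) = P(E|H) - P(E),$$
$$n(H, E) = P(E|H) - P(E|\neg H),$$
$$r(H, E) = \frac{P(H|E)}{P(H)},$$
$$l(H, E) = \frac{P(E|H)}{P(E|\neg H)}.$$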
One can use any one of the above measures to argue that: according to measure $\mathrm{c}(H, E)$, evidence $E$ favors hypothesis $H_1$ over $H_0$ iff:
$$\mathrm{c}(H_1, E) > \mathrm{c}(H_0, E).$$
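To see how such measures behave numerically, the sketch below evaluates six of them for a single hypothesis. It assumes the difference and ratio forms common in the confirmation literature, labeled here $d, s, m, n, r, l$; the function name, the labels as dictionary keys, and the input values are illustrative assumptions.

```python
from fractions import Fraction

def confirmation_measures(prior, lik_h, lik_not_h):
    """Six confirmation measures for hypothesis H and evidence E.

    prior = P(H), lik_h = P(E|H), lik_not_h = P(E|not-H).
    """
    p_e = prior * lik_h + (1 - prior) * lik_not_h   # P(E), total probability rule
    if not (0 < prior < 1 and 0 < p_e < 1 and lik_not_h > 0):
        raise ValueError("measures are undefined for these inputs")
    post = lik_h * prior / p_e                      # P(H|E)
    post_not_e = (1 - lik_h) * prior / (1 - p_e)    # P(H|not-E)
    return {
        "d": post - prior,          # P(H|E) - P(H)
        "s": post - post_not_e,     # P(H|E) - P(H|not-E)
        "m": lik_h - p_e,           # P(E|H) - P(E)
        "n": lik_h - lik_not_h,     # P(E|H) - P(E|not-H)
        "r": post / prior,          # P(H|E) / P(H)
        "l": lik_h / lik_not_h,     # P(E|H) / P(E|not-H)
    }

# Illustrative values: P(H) = 3/10, P(E|H) = 4/5, P(E|not-H) = 2/5.
measures = confirmation_measures(Fraction(3, 10), Fraction(4, 5), Fraction(2, 5))
for name, value in measures.items():
    print(name, value)
```

For these inputs the four difference measures are positive and the two ratio measures exceed 1, so every measure registers confirmation. Comparing $H_1$ with $H_0$ amounts to evaluating the chosen measure once per hypothesis, with inputs that imply the same $P(E)$.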
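For instance, using the ratio measure $r(H, E) = P(H|E)/P(H)$ in the case of two competing hypotheses $H_0$ and $H_1$:
$$\frac{P(H_1|E)}{P(H_1)} > \frac{P(H_0|E)}{P(H_0)} \ \overset{\text{Bayes}}{\Longleftrightarrow}\ \frac{P(E|H_1)}{P(E)} > \frac{P(E|H_0)}{P(E)} \ \Longleftrightarrow\ \frac{P(E|H_1)}{P(E|H_0)} > 1,$$
where $\frac{P(E|H_1)}{P(E|H_0)}$ is the (Bayesian) likelihood ratio.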
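The equivalence above can be verified numerically: the priors cancel in the ratio measure, so the comparison reduces to the likelihood ratio. The sketch uses hypothetical function names and illustrative probabilities, chosen so that $P(E|H_1)P(H_1) + P(E|H_0)P(H_0) \le P(E)$.

```python
from fractions import Fraction

def ratio_measure(prior, likelihood, p_e):
    """P(H|E) / P(H), computing the posterior P(H|E) via Bayes' formula."""
    post = likelihood * prior / p_e
    return post / prior

def likelihood_ratio(lik_h1, lik_h0):
    """P(E|H1) / P(E|H0)."""
    return lik_h1 / lik_h0

# Illustrative values (assumptions, not from the notes).
p_e = Fraction(1, 2)
prior_h1, lik_h1 = Fraction(1, 5), Fraction(3, 4)
prior_h0, lik_h0 = Fraction(1, 2), Fraction(1, 2)

r1 = ratio_measure(prior_h1, lik_h1, p_e)
r0 = ratio_measure(prior_h0, lik_h0, p_e)
lr = likelihood_ratio(lik_h1, lik_h0)

print(r1, r0, lr)             # 3/2 1 3/2
print((r1 > r0) == (lr > 1))  # True
```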
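For comparison purposes let us contrast this to the ratio of the posteriors:
$$\frac{P(H_1|E)}{P(H_0|E)} = \frac{\;\dfrac{P(E|H_1)\cdot P(H_1)}{P(E)}\;}{\;\dfrac{P(E|H_0)\cdot P(H_0)}{P(E)}\;} = \frac{P(E|H_1)\cdot P(H_1)}{P(E|H_0)\cdot P(H_0)} > 1,$$
which is the product of the likelihood ratio $\frac{P(E|H_1)}{P(E|H_0)}$ and the ratio of the priors $\frac{P(H_1)}{P(H_0)}$.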
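A short numerical check of this factorization follows. The values are illustrative assumptions and the function name is hypothetical.

```python
from fractions import Fraction

def posterior(likelihood, prior, p_e):
    """P(H|E) = P(E|H) * P(H) / P(E)."""
    return likelihood * prior / p_e

# Illustrative values (assumptions, not from the notes).
p_e = Fraction(1, 2)
prior_h1, lik_h1 = Fraction(1, 5), Fraction(3, 4)
prior_h0, lik_h0 = Fraction(1, 2), Fraction(1, 2)

posterior_ratio = posterior(lik_h1, prior_h1, p_e) / posterior(lik_h0, prior_h0, p_e)
lik_ratio = lik_h1 / lik_h0
prior_ratio = prior_h1 / prior_h0

print(posterior_ratio)          # 3/5
print(lik_ratio * prior_ratio)  # 3/5
```

In this example the likelihood ratio is 3/2, which exceeds 1, while the ratio of the posteriors is 3/5, because the ratio of the priors is 2/5. The two criteria can therefore point in opposite directions.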
Remark 3: It is important to note that the above measures are considered different only when they are not ordinally equivalent, where two measures are ordinally equivalent when they give rise to the same ranking. This, however, raises serious questions about the appropriateness of such measures, since ordinal measures render the differences within the same ranking uninterpretable; how can one interpret such differences as measuring the degree of confirmation?
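As an illustration of measures that are not ordinally equivalent, the sketch below compares the difference measure $P(H|E) - P(H)$ with the ratio measure $P(H|E)/P(H)$ for two hypotheses. The priors and posteriors are made-up values chosen for illustration, and the function names are hypothetical.

```python
from fractions import Fraction

def difference_measure(prior, post):
    """P(H|E) - P(H)."""
    return post - prior

def ratio_measure(prior, post):
    """P(H|E) / P(H)."""
    return post / prior

# (prior, posterior) pairs; illustrative assumptions only.
h1 = (Fraction(1, 10), Fraction(3, 10))
h0 = (Fraction(2, 5), Fraction(13, 20))

d1, d0 = difference_measure(*h1), difference_measure(*h0)
r1, r0 = ratio_measure(*h1), ratio_measure(*h0)

print(d1, d0)  # 1/5 1/4
print(r1, r0)  # 3 13/8
```

The difference measure ranks $H_0$ above $H_1$ while the ratio measure ranks $H_1$ above $H_0$, so the two measures produce different rankings for the same evidence.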