3. KAIST AIPR Lab.
Background
• Bayes’ rule
From the product rule, $P(X \wedge Y) = P(X \mid Y)\,P(Y) = P(Y \mid X)\,P(X)$
$P(Y \mid X) = \dfrac{P(X \mid Y)\,P(Y)}{P(X)} = \alpha\,P(X \mid Y)\,P(Y)$, where $\alpha$ is the normalization constant
Combining evidence e
$P(Y \mid X, e) = \dfrac{P(X \mid Y, e)\,P(Y \mid e)}{P(X \mid e)}$
• Conditional independence
$P(X, Y \mid Z) = P(X \mid Z)\,P(Y \mid Z)$ when $X \perp Y \mid Z$
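The normalization form of Bayes' rule can be checked numerically. The sketch below is only an illustration: the function name `posterior` and the prior and likelihood values are hypothetical, chosen to show how $\alpha$ is obtained as one over the sum of the unnormalized terms.

```python
def posterior(prior, likelihood):
    """Return P(Y | X=x) from P(Y) and P(X=x | Y) via alpha-normalization."""
    unnormalized = {y: likelihood[y] * prior[y] for y in prior}
    alpha = 1.0 / sum(unnormalized.values())  # alpha = 1 / P(X=x)
    return {y: alpha * p for y, p in unnormalized.items()}


# Hypothetical numbers: P(Y) and P(X=x | Y) for a Boolean Y.
prior = {True: 0.01, False: 0.99}
likelihood = {True: 0.9, False: 0.1}
post = posterior(prior, likelihood)
print(post)
```

Because $\alpha$ is recovered from the unnormalized terms, $P(X)$ never has to be computed separately.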
Bayesian Network
• Causal relationships among random variables
• Directed acyclic graph
Node $X_i$: a random variable
Directed links: probabilistic relationships between variables
Acyclic: no directed cycles, so the nodes can be numbered such that no link goes from a node to a lower-numbered node
• A link from node X to node Y means X is a parent of Y, $X \in Parents(Y)$
• Conditional probability distribution of $X_i$
$P(X_i \mid Parents(X_i))$
Effect of the parents on the node $X_i$
Example of Bayesian Network
• Burglary network
Structure: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls
Conditional Probability Tables
P(B) = 0.001    P(E) = 0.002
B  E  P(A|B,E)
T  T  0.95
T  F  0.94
F  T  0.29
F  F  0.001
A  P(J|A)
T  0.90
F  0.05
A  P(M|A)
T  0.70
F  0.01
JohnCalls is directly influenced only by Alarm: $P(J \mid M \wedge A \wedge E \wedge B) = P(J \mid A)$
Semantics of Bayesian Network
• Full joint probability distribution
Notation: $P(x_1, \ldots, x_n)$ abbreviated from $P(X_1 = x_1 \wedge \ldots \wedge X_n = x_n)$
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$,
where $parents(X_i)$ denotes the specific values of the variables in $Parents(X_i)$
• Constructing Bayesian networks
$P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid x_{i-1}, \ldots, x_1)$ by chain rule
For every variable $X_i$ in the network,
• $P(X_i \mid X_{i-1}, \ldots, X_1) = P(X_i \mid Parents(X_i))$ provided that $Parents(X_i) \subseteq \{X_{i-1}, \ldots, X_1\}$
Correctness
• Choose parents for each node such that this property holds
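The product form of the full joint distribution can be evaluated directly from the burglary network's conditional probability tables. The following is a minimal sketch: the helper names `bernoulli` and `joint` are illustrative choices, and only the CPT numbers come from the slides.

```python
from itertools import product

# P(var = True | parent values), copied from the burglary network CPTs.
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}
P_M = {True: 0.70, False: 0.01}


def bernoulli(p_true, value):
    """Probability that a Boolean variable takes `value` when P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(b, e, a, j, m):
    """P(b, e, a, j, m) as the product of one CPT entry per node."""
    return (bernoulli(P_B, b) * bernoulli(P_E, e) * bernoulli(P_A[(b, e)], a)
            * bernoulli(P_J[a], j) * bernoulli(P_M[a], m))


# The alarm sounds and both neighbours call, with no burglary and no earthquake.
p = joint(b=False, e=False, a=True, j=True, m=True)
# Summing the joint over all 2^5 assignments should give 1.
total = sum(joint(*world) for world in product([True, False], repeat=5))
print(p, total)
```

Here `p` is the product 0.999 × 0.998 × 0.001 × 0.90 × 0.70, roughly 0.00063.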
Semantics of Bayesian Network (cont’d)
• Compactness
Locally structured system
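• Each component interacts directly with only a bounded number of other components
A complete network of n Boolean variables is specified by at most $n \cdot 2^k$ conditional probabilities
when each node has at most k parents (the full joint distribution needs $2^n$)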
• Node ordering
Add “root causes” first
Then add the variables they influence, and so on
Until reaching the “leaves”
• “Leaves”: variables with no direct causal influence on others
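To make the compactness claim concrete, the sketch below compares the $n \cdot 2^k$ bound with the $2^n$ entries of a full joint table. The function names and the values n = 30, k = 5 are illustrative assumptions; the parent sets are those of the burglary network.

```python
def network_bound(n, k):
    """Upper bound on CPT entries for n Boolean nodes with at most k parents each."""
    return n * 2 ** k


def full_joint_size(n):
    """Number of entries in the full joint table over n Boolean variables."""
    return 2 ** n


# Hypothetical sizes, for illustration only.
print(network_bound(30, 5), full_joint_size(30))

# Burglary network: one CPT row per combination of parent values.
parents = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
burglary_numbers = sum(2 ** len(ps) for ps in parents.values())
print(burglary_numbers, full_joint_size(len(parents)))
```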
Three examples of 3-node graphs
Tail-to-Tail Connection
• Node c is said to be tail-to-tail
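Graph: a ← c → b
$P(a, b, c) = P(a \mid c)\,P(b \mid c)\,P(c)$
When c is unobserved: $P(a, b) = \sum_{c} P(a \mid c)\,P(b \mid c)\,P(c)$, which in general does not factorize into $P(a)\,P(b)$, so $a \not\perp b \mid \emptyset$
When c is observed: $P(a, b \mid c) = \dfrac{P(a, b, c)}{P(c)} = P(a \mid c)\,P(b \mid c)$, so $a \perp b \mid c$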
• When node c is observed,
Node c blocks the path from a to b
Variables a and b are independent
Three examples of 3-node graphs
Head-to-Tail Connection
• Node c is said to be head-to-tail
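Graph: a → c → b
$P(a, b, c) = P(a)\,P(c \mid a)\,P(b \mid c)$
When c is unobserved: $P(a, b) = P(a) \sum_{c} P(c \mid a)\,P(b \mid c) = P(a)\,P(b \mid a)$, so $a \not\perp b \mid \emptyset$
When c is observed: $P(a, b \mid c) = \dfrac{P(a, b, c)}{P(c)} = \dfrac{P(a)\,P(c \mid a)\,P(b \mid c)}{P(c)} = P(a \mid c)\,P(b \mid c)$, so $a \perp b \mid c$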
• When node c is observed,
Node c blocks the path from a to b
Variables a and b are independent
Three examples of 3-node graphs
Head-to-Head Connection
• Node c is said to be head-to-head
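Graph: a → c ← b
$P(a, b, c) = P(a)\,P(b)\,P(c \mid a, b)$
When c is unobserved: $P(a, b) = \sum_{c} P(a, b, c) = P(a)\,P(b) \sum_{c} P(c \mid a, b) = P(a)\,P(b)$, so $a \perp b \mid \emptyset$
When c is observed: $P(a, b \mid c) = \dfrac{P(a, b, c)}{P(c)} = \dfrac{P(a)\,P(b)\,P(c \mid a, b)}{P(c)}$, which in general does not factorize into $P(a \mid c)\,P(b \mid c)$, so $a \not\perp b \mid c$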
• When node c is unobserved,
Node c blocks the path from a to b
Variables a and b are independent
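The three connection types can be checked numerically on the burglary network, which contains one of each: tail-to-tail (JohnCalls ← Alarm → MaryCalls), head-to-tail (Burglary → Alarm → JohnCalls) and head-to-head (Burglary → Alarm ← Earthquake). The sketch below uses brute-force summation over the joint. The helper names `prob` and `cond` are illustrative, and the CPT values are the ones from the burglary slide.

```python
from itertools import product

VARS = ("B", "E", "A", "J", "M")
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}


def joint(b, e, a, j, m):
    """Full joint P(b, e, a, j, m) of the burglary network."""
    def pr(p_true, value):
        return p_true if value else 1.0 - p_true
    return (pr(0.001, b) * pr(0.002, e) * pr(P_A[(b, e)], a)
            * pr(0.90 if a else 0.05, j) * pr(0.70 if a else 0.01, m))


def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(J=True, M=True)."""
    total = 0.0
    for values in product([True, False], repeat=len(VARS)):
        world = dict(zip(VARS, values))
        if all(world[name] == value for name, value in fixed.items()):
            total += joint(*values)
    return total


def cond(query, given):
    """Conditional probability P(query | given); both are dicts of assignments."""
    return prob(**query, **given) / prob(**given)


# Tail-to-tail: J and M are dependent, but independent once A is observed.
print(prob(J=True, M=True), prob(J=True) * prob(M=True))
print(cond({"J": True, "M": True}, {"A": True}),
      cond({"J": True}, {"A": True}) * cond({"M": True}, {"A": True}))
# Head-to-tail: B influences J, but not once A is observed.
print(cond({"J": True}, {"B": True}), prob(J=True))
print(cond({"J": True}, {"A": True, "B": True}), cond({"J": True}, {"A": True}))
# Head-to-head: B and E are independent until A is observed.
print(prob(B=True, E=True), prob(B=True) * prob(E=True))
print(cond({"B": True}, {"A": True}), cond({"B": True}, {"A": True, "E": True}))
```

The last line shows the "explaining away" effect: once the alarm is known to have sounded, also learning that an earthquake occurred sharply lowers the probability of a burglary.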
D-separation
• Let A, B, and C be arbitrary nonintersecting sets of nodes
• A path from A to B is blocked if it includes a node such that either
the arrows on the path meet head-to-tail or tail-to-tail at the node, and the node is in C, or
the arrows meet head-to-head at the node, and neither the node nor any of its descendants is in C
• A is d-separated from B by C if
every possible path from any node in A to any node in B is blocked
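[Figure: two copies of a graph over the nodes a, f, e, b, c. Left, with c observed: $a \not\perp b \mid c$. Right, with f observed: $a \perp b \mid f$]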
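A d-separation test can be sketched in a few lines. The version below does not enumerate paths as in the definition; it uses the equivalent ancestral moral graph criterion (restrict to the ancestors of A ∪ B ∪ C, connect co-parents, drop directions, delete C, then test reachability). The edge list a → e, f → e, f → b, e → c for the five-node example is an assumption, since the figure itself is not reproduced in the text, and the function name `d_separated` is illustrative.

```python
def d_separated(parents, A, B, C):
    """Return True if node sets A and B are d-separated by C in the given DAG.

    `parents` maps each node to a tuple of its parents (roots may be omitted).
    """
    # 1. Keep only A, B, C and all of their ancestors.
    relevant, stack = set(), list(A | B | C)
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents.get(node, ()))
    # 2. Moralize: link each node to its parents and link co-parents together.
    neighbours = {node: set() for node in relevant}
    for child in relevant:
        ps = list(parents.get(child, ()))
        for i, p in enumerate(ps):
            neighbours[child].add(p)
            neighbours[p].add(child)
            for q in ps[i + 1:]:
                neighbours[p].add(q)
                neighbours[q].add(p)
    # 3. Delete C and check whether any node of B is reachable from A.
    frontier = [node for node in A if node not in C]
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        if node in B:
            return False
        for nxt in neighbours[node]:
            if nxt not in C and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True


# Assumed edges of the example graph: a -> e, f -> e, f -> b, e -> c.
example = {"e": ("a", "f"), "b": ("f",), "c": ("e",)}
print(d_separated(example, {"a"}, {"b"}, {"c"}))  # c observed
print(d_separated(example, {"a"}, {"b"}, {"f"}))  # f observed
```

Under the assumed edges, observing c (a descendant of the head-to-head node e) leaves the path a, e, f, b unblocked, while observing the tail-to-tail node f blocks it.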
Conditional Independence Relations
• A node X is conditionally independent of its non-descendants, given its parents
• A node X is conditionally independent of all other nodes, given its Markov blanket*
• In general, d-separation is used for deciding independence
[Figures: node X with parents $U_1, \ldots, U_m$, children $Y_1, \ldots, Y_n$, and the children’s other parents $Z_{1j}, \ldots, Z_{nj}$]
* Markov blanket: parents, children, and children’s other parents
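The Markov blanket definition translates directly into code. This is a minimal sketch that assumes the graph is stored as a dictionary mapping each node to its parents; the function name `markov_blanket` is illustrative, and the graph is the burglary network.

```python
def markov_blanket(parents, node):
    """Parents, children, and children's other parents of `node`."""
    children = [c for c, ps in parents.items() if node in ps]
    blanket = set(parents.get(node, ()))
    blanket.update(children)
    for child in children:
        blanket.update(parents[child])  # the child's parents, including `node`
    blanket.discard(node)               # a node is not in its own blanket
    return blanket


burglary = {"B": (), "E": (), "A": ("B", "E"), "J": ("A",), "M": ("A",)}
print(markov_blanket(burglary, "A"))
print(markov_blanket(burglary, "B"))
print(markov_blanket(burglary, "J"))
```

Earthquake belongs to the blanket of Burglary only because both are parents of Alarm.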
Probabilistic Inference In Bayesian Networks
• Notation
X: the query variable
E: the set of evidence variables, E1,…,Em
e: a particular observed event, i.e. an assignment of values to the evidence variables
• Compute posterior probability distribution P( X | e)
• Exact inference
Inference by enumeration
Variable elimination algorithm
• Approximate inference
Direct sampling methods
Markov chain Monte Carlo (MCMC) algorithm
Exact Inference In Bayesian Networks
Inference By Enumeration
• $P(X \mid e) = \alpha\,P(X, e) = \alpha \sum_{y} P(X, e, y)$, where y ranges over the values of the hidden variables Y
• Recall, $P(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i))$
• Computing sums of products of conditional probabilities
• In Burglary example,
$P(B \mid j, m) = \alpha\,P(B, j, m) = \alpha \sum_{e} \sum_{a} P(B, e, a, j, m)$
$P(b \mid j, m) = \alpha \sum_{e} \sum_{a} P(b)\,P(e)\,P(a \mid b, e)\,P(j \mid a)\,P(m \mid a)$
$= \alpha\,P(b) \sum_{e} P(e) \sum_{a} P(a \mid b, e)\,P(j \mid a)\,P(m \mid a)$
• $O(2^n)$ time complexity for n Boolean variables
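The nested sums for the burglary query can be written out directly. The following sketch mirrors the expression $\alpha\,P(B) \sum_e P(e) \sum_a P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)$; the function and helper names are illustrative, and the numbers are the CPT values from the burglary slide.

```python
P_B = 0.001
P_E = 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}
P_J = {True: 0.90, False: 0.05}   # P(j | A)
P_M = {True: 0.70, False: 0.01}   # P(m | A)
BOOL = (True, False)


def pr(p_true, value):
    """Probability of `value` for a Boolean variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def burglary_given_calls():
    """P(B | j, m) by enumeration over the hidden variables E and A."""
    unnormalized = {}
    for b in BOOL:
        outer = 0.0
        for e in BOOL:
            inner = 0.0
            for a in BOOL:
                inner += pr(P_A[(b, e)], a) * P_J[a] * P_M[a]
            outer += pr(P_E, e) * inner
        unnormalized[b] = pr(P_B, b) * outer
    alpha = 1.0 / sum(unnormalized.values())
    return {b: alpha * p for b, p in unnormalized.items()}


posterior = burglary_given_calls()
print(posterior)  # P(b | j, m) is approximately 0.284
```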
Exact Inference In Bayesian Networks
Variable Elimination Algorithm
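• Eliminating the repeated calculations of enumeration
$P(B \mid j, m) = \alpha\,P(B) \sum_{e} P(e) \sum_{a} P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)$
Repeated calculations: the product $P(j \mid a)\,P(m \mid a)$ is recomputed for every value of e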
Exact Inference In Bayesian Networks
Variable Elimination Algorithm (cont’d)
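• Evaluating in right-to-left order (bottom-up)
$P(B \mid j, m) = \alpha\,P(B) \sum_{e} P(e) \sum_{a} P(a \mid B, e)\,P(j \mid a)\,P(m \mid a)$
• Each part of the expression makes a factor
$f_M(A) = \begin{pmatrix} P(m \mid a) \\ P(m \mid \neg a) \end{pmatrix}$, $f_J(A) = \begin{pmatrix} P(j \mid a) \\ P(j \mid \neg a) \end{pmatrix}$
• Pointwise product
$f_{JM}(A) = f_J(A) \times f_M(A) = \begin{pmatrix} P(j \mid a)\,P(m \mid a) \\ P(j \mid \neg a)\,P(m \mid \neg a) \end{pmatrix}$
$f_{\bar{A}JM}(B, E) = \sum_{a} f_A(a, B, E) \times f_J(a) \times f_M(a)$
$f_{\bar{E}\bar{A}JM}(B) = \sum_{e} f_E(e) \times f_{\bar{A}JM}(B, e)$
$P(B \mid j, m) = \alpha\,f_B(B) \times f_{\bar{E}\bar{A}JM}(B)$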
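The factor operations above can be sketched with plain dictionaries, one entry per assignment of a factor's variables. This is an illustration for this single query only, not a general variable elimination implementation; the variable names follow the factor names on the slide, and the numbers are the burglary CPT values.

```python
BOOL = (True, False)
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}

# Initial factors; f_J and f_M already have the evidence j = m = true fixed.
f_B = {True: 0.001, False: 0.999}
f_E = {True: 0.002, False: 0.998}
f_A = {(a, b, e): (P_A[(b, e)] if a else 1.0 - P_A[(b, e)])
       for a in BOOL for b in BOOL for e in BOOL}
f_J = {True: 0.90, False: 0.05}
f_M = {True: 0.70, False: 0.01}

# Pointwise product of f_J and f_M, computed once and reused for every (b, e).
f_JM = {a: f_J[a] * f_M[a] for a in BOOL}

# Sum out A, then sum out E.
f_AJM = {(b, e): sum(f_A[(a, b, e)] * f_JM[a] for a in BOOL)
         for b in BOOL for e in BOOL}
f_EAJM = {b: sum(f_E[e] * f_AJM[(b, e)] for e in BOOL) for b in BOOL}

# Multiply by f_B and normalize.
unnormalized = {b: f_B[b] * f_EAJM[b] for b in BOOL}
alpha = 1.0 / sum(unnormalized.values())
posterior = {b: alpha * p for b, p in unnormalized.items()}
print(posterior)  # P(b | j, m) is approximately 0.284
```

Storing `f_JM` is what removes the repeated calculation: enumeration would recompute those two products for each value of e.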
Exact Inference In Bayesian Networks
Variable Elimination Algorithm (cont’d)
• Repeatedly remove any leaf node that is not a query variable or an evidence variable
• In Burglary example, $P(J \mid B = true)$
$P(J \mid b) = \alpha\,P(b) \sum_{e} P(e) \sum_{a} P(a \mid b, e)\,P(J \mid a) \sum_{m} P(m \mid a)$
$= \alpha\,P(b) \sum_{e} P(e) \sum_{a} P(a \mid b, e)\,P(J \mid a)$, since $\sum_{m} P(m \mid a) = 1$
• Time and space complexity
Dominated by the size of the largest factor
In the worst case, exponential time and space complexity
Approximate Inference In Bayesian Networks
Direct Sampling Methods
• Generate samples from a known probability distribution
• Sample each variable in topological order
• function Prior-Sample(bn) returns an event sampled from the prior specified by bn
    inputs: bn, a Bayesian network specifying joint distribution P(X1,…,Xn)
    x ← an event with n elements
    for i = 1 to n do
        xi ← a random sample from P(Xi | parents(Xi))
    return x
• $S_{PS}(x_1, \ldots, x_n)$: the probability that Prior-Sample generates a specific event
$S_{PS}(x_1, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid parents(X_i)) = P(x_1, \ldots, x_n)$
$\lim_{N \to \infty} \dfrac{N_{PS}(x_1, \ldots, x_n)}{N} = S_{PS}(x_1, \ldots, x_n) = P(x_1, \ldots, x_n)$ (Consistent estimate)
where $N_{PS}(x_1, \ldots, x_n)$ is the frequency of the event $x_1, \ldots, x_n$
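Prior-Sample is short enough to write out in full. The sketch below assumes a hypothetical representation of the burglary network as a list of `(name, parents, cpt)` triples in topological order, where each `cpt` maps a tuple of parent values to P(name = true); the seed and sample count are arbitrary.

```python
import random

# Burglary network in topological order; cpt maps parent values to P(True).
NETWORK = [
    ("B", (), {(): 0.001}),
    ("E", (), {(): 0.002}),
    ("A", ("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    ("J", ("A",), {(True,): 0.90, (False,): 0.05}),
    ("M", ("A",), {(True,): 0.70, (False,): 0.01}),
]


def prior_sample(rng):
    """Sample one complete event, visiting variables parents-first."""
    x = {}
    for name, parents, cpt in NETWORK:
        p_true = cpt[tuple(x[p] for p in parents)]
        x[name] = rng.random() < p_true
    return x


rng = random.Random(0)      # fixed seed so the run is repeatable
n = 50_000
freq_j = sum(prior_sample(rng)["J"] for _ in range(n)) / n
print(freq_j)               # relative frequency of JohnCalls = true
```

By the consistency result above, `freq_j` approaches P(j) as the number of samples grows; exact enumeration gives P(j) ≈ 0.052 for these CPTs.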
Approximate Inference In Bayesian Networks
Rejection Sampling Methods
• Reject samples that are inconsistent with the evidence
• Estimate by counting how often X = x occurs in the remaining samples
$\hat{P}(X \mid e) = \alpha\,N_{PS}(X, e) = \dfrac{N_{PS}(X, e)}{N_{PS}(e)} \approx \dfrac{P(X, e)}{P(e)} = P(X \mid e)$ (Consistent estimate)
• The fraction of samples consistent with the evidence drops exponentially as the number of evidence variables grows, so most samples are rejected
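Rejection sampling needs only a prior sampler and a consistency check. The sketch below estimates P(B | j, m) on the burglary network. The network representation, the function names, the seed and the sample count are illustrative assumptions; the function also returns how many samples survived, which makes the wastefulness visible.

```python
import random

# Burglary network in topological order; cpt maps parent values to P(True).
NETWORK = [
    ("B", (), {(): 0.001}),
    ("E", (), {(): 0.002}),
    ("A", ("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    ("J", ("A",), {(True,): 0.90, (False,): 0.05}),
    ("M", ("A",), {(True,): 0.70, (False,): 0.01}),
]


def prior_sample(rng):
    """Sample one complete event from the prior, parents first."""
    x = {}
    for name, parents, cpt in NETWORK:
        x[name] = rng.random() < cpt[tuple(x[p] for p in parents)]
    return x


def rejection_sampling(query, evidence, n, rng):
    """Estimate P(query | evidence); also return the number of accepted samples."""
    counts = {True: 0, False: 0}
    for _ in range(n):
        x = prior_sample(rng)
        if all(x[var] == value for var, value in evidence.items()):
            counts[x[query]] += 1      # keep only samples consistent with e
    accepted = counts[True] + counts[False]
    if accepted == 0:
        raise ValueError("no sample was consistent with the evidence")
    return {v: c / accepted for v, c in counts.items()}, accepted


n = 300_000
estimate, accepted = rejection_sampling("B", {"J": True, "M": True}, n,
                                        random.Random(0))
print(estimate, accepted, accepted / n)
```

Since P(j, m) is only about 0.002 in this network, roughly 99.8% of the samples are discarded even with just two evidence variables.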
Approximate Inference In Bayesian Networks
Likelihood weighting
• Generating only consistent events w.r.t. the evidence
Fixes the values for the evidence variables E
Samples only the remaining variables X and Y
• function Likelihood-Weighting(X, e, bn, N) returns an estimate of P(X|e)
    local variables: W, a vector of weighted counts over X, initially zero
    for i = 1 to N do
        x, w ← Weighted-Sample(bn, e)
        W[x] ← W[x] + w where x is the value of X in x
    return Normalize(W[X])
  function Weighted-Sample(bn, e) returns an event and a weight
    x ← an event with n elements; w ← 1
    for i = 1 to n do
        if Xi has a value xi in e
            then w ← w × P(Xi = xi | parents(Xi))
            else xi ← a random sample from P(Xi | parents(Xi))
    return x, w
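The two functions above map almost line for line onto Python. The sketch below runs them on the burglary network for the query P(J | B = true). The network representation, the seed and the sample count are assumptions made for illustration.

```python
import random

# Burglary network in topological order; cpt maps parent values to P(True).
NETWORK = [
    ("B", (), {(): 0.001}),
    ("E", (), {(): 0.002}),
    ("A", ("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    ("J", ("A",), {(True,): 0.90, (False,): 0.05}),
    ("M", ("A",), {(True,): 0.70, (False,): 0.01}),
]


def weighted_sample(evidence, rng):
    """Fix evidence variables, sample the rest, and return (event, weight)."""
    x, w = {}, 1.0
    for name, parents, cpt in NETWORK:
        p_true = cpt[tuple(x[p] for p in parents)]
        if name in evidence:
            x[name] = evidence[name]
            w *= p_true if x[name] else 1.0 - p_true   # likelihood of evidence
        else:
            x[name] = rng.random() < p_true
    return x, w


def likelihood_weighting(query, evidence, n, rng):
    """Estimate P(query | evidence) from n weighted samples."""
    weights = {True: 0.0, False: 0.0}
    for _ in range(n):
        x, w = weighted_sample(evidence, rng)
        weights[x[query]] += w
    total = weights[True] + weights[False]
    return {v: s / total for v, s in weights.items()}


estimate = likelihood_weighting("J", {"B": True}, 20_000, random.Random(0))
print(estimate)  # exact value of P(j | b) is approximately 0.849
```

Every sample is used, unlike in rejection sampling. When the evidence sits at the leaves (for example J and M), the weights vary widely and many more samples are needed for a stable estimate.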
Approximate Inference In Bayesian Networks
Likelihood weighting (cont’d)
• Sampling distribution $S_{WS}$ by Weighted-Sample
$S_{WS}(z, e) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i))$, where $Z = \{X\} \cup Y$
• The likelihood weight w(z, e)
$w(z, e) = \prod_{i=1}^{m} P(e_i \mid parents(E_i))$
• Weighted probability of a sample
$S_{WS}(z, e)\,w(z, e) = \prod_{i=1}^{l} P(z_i \mid parents(Z_i)) \prod_{i=1}^{m} P(e_i \mid parents(E_i)) = P(z, e)$
Approximate Inference In Bayesian Networks
Markov Chain Monte Carlo Algorithm
• Generate each event by a random change to one of the nonevidence variables $Z_i$
• $Z_i$ is sampled conditioned on the current values of the variables in its Markov blanket
• A state specifies a value for every variable
• The long-run fraction of time spent in each state is proportional to its posterior probability $P(X \mid e)$
• function MCMC-Ask(X, e, bn, N) returns an estimate of P(X|e)
    local variables: N[X], a vector of counts over X, initially zero
                     Z, the nonevidence variables in bn
                     x, the current state of the network, initially copied from e
    initialize x with random values for the variables in Z
    for j = 1 to N do
        for each Zi in Z do
            sample the value of Zi in x from P(Zi | mb(Zi)) given the values of mb(Zi) in x
            N[x] ← N[x] + 1 where x is the value of X in x
    return Normalize(N[X])
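A Gibbs-sampling version of MCMC-Ask can be sketched as follows for the burglary query P(B | j, m). The dictionary representation of the network, the helper names, the seed and the number of sweeps are assumptions for illustration. Counts are updated after every single-variable resampling step, which is one possible reading of the pseudocode.

```python
import random

# name -> (parents, cpt), where cpt maps parent values to P(name = True).
NETWORK = {
    "B": ((), {(): 0.001}),
    "E": ((), {(): 0.002}),
    "A": (("B", "E"), {(True, True): 0.95, (True, False): 0.94,
                       (False, True): 0.29, (False, False): 0.001}),
    "J": (("A",), {(True,): 0.90, (False,): 0.05}),
    "M": (("A",), {(True,): 0.70, (False,): 0.01}),
}
CHILDREN = {v: [c for c, (ps, _) in NETWORK.items() if v in ps] for v in NETWORK}


def local_prob(var, state):
    """P(var = state[var] | parents(var)) under the current state."""
    parents, cpt = NETWORK[var]
    p_true = cpt[tuple(state[p] for p in parents)]
    return p_true if state[var] else 1.0 - p_true


def resample(var, state, rng):
    """Sample var from P(var | mb(var)), proportional to its own CPT entry
    times the CPT entries of its children."""
    weight = {}
    for value in (True, False):
        state[var] = value
        w = local_prob(var, state)
        for child in CHILDREN[var]:
            w *= local_prob(child, state)
        weight[value] = w
    state[var] = rng.random() < weight[True] / (weight[True] + weight[False])


def gibbs_ask(query, evidence, n, rng):
    """Estimate P(query | evidence) from n sweeps over the nonevidence variables."""
    hidden = [v for v in NETWORK if v not in evidence]
    state = dict(evidence)
    for v in hidden:
        state[v] = rng.random() < 0.5          # random initial values
    counts = {True: 0, False: 0}
    for _ in range(n):
        for v in hidden:
            resample(v, state, rng)
            counts[state[query]] += 1
    total = counts[True] + counts[False]
    return {v: c / total for v, c in counts.items()}


estimate = gibbs_ask("B", {"J": True, "M": True}, 50_000, random.Random(0))
print(estimate)  # exact value of P(b | j, m) is approximately 0.284
```

The sketch discards no burn-in samples and does no thinning, so the first few counts reflect the random initial state.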
Approximate Inference In Bayesian Networks
Markov Chain Monte Carlo Algorithm (cont’d)
• Markov chain on the state space
$q(x \to x')$: the probability of a transition from state x to state x'
• Consistency
Let $\bar{X}_i$ be all the hidden variables other than $X_i$
$q(x \to x') = q((x_i, \bar{x}_i) \to (x_i', \bar{x}_i)) = P(x_i' \mid \bar{x}_i, e)$, called the Gibbs sampler
A distribution that satisfies detailed balance with q is a stationary distribution of the Markov chain
Summary
• Bayesian network
Directed acyclic graph expressing causal relationships
• Conditional independence
D-separation property
• Inference in Bayesian network
Enumeration: intractable
Variable elimination: efficient, but sensitive to topology
Direct sampling: estimate posterior probabilities
MCMC algorithm: powerful method for computing with
probability models
References
[1] Stuart Russell et al., “Probabilistic Reasoning”, Artificial Intelligence: A Modern Approach, Chapter 14, pp. 492-519
[2] Eugene Charniak, “Bayesian Networks without Tears”, 1991
[3] C. Bishop, “Graphical Models”, Pattern Recognition and Machine Learning, Chapter 8, pp. 359-418
Appendix 1. Example of Bad Node Ordering
• Adding the nodes in the order ① MaryCalls, ② JohnCalls, ③ Alarm, ④ Burglary, ⑤ Earthquake yields a network with two more links than the original and requires unnatural probability judgments
Appendix 2. Consistency of Likelihood Weighting
• $\hat{P}(x \mid e) = \alpha \sum_{y} N_{WS}(x, y, e)\,w(x, y, e)$ from Likelihood-Weighting
$\approx \alpha' \sum_{y} S_{WS}(x, y, e)\,w(x, y, e)$ for large N
$= \alpha' \sum_{y} P(x, y, e)$
$= \alpha'\,P(x, e)$
$= P(x \mid e)$ (Consistent estimate)
Appendix 3. State Distribution of MCMC
• Detailed balance
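Let $\pi_t(x)$ be the probability of the system being in state x at time t
$\pi(x)\,q(x \to x') = \pi(x')\,q(x' \to x)$ for all x, x'
• Gibbs sampler, $q(x \to x') = q((x_i, \bar{x}_i) \to (x_i', \bar{x}_i)) = P(x_i' \mid \bar{x}_i, e)$
$\pi(x)\,q(x \to x') = P(x \mid e)\,P(x_i' \mid \bar{x}_i, e) = P(x_i, \bar{x}_i \mid e)\,P(x_i' \mid \bar{x}_i, e)$
$= P(x_i \mid \bar{x}_i, e)\,P(\bar{x}_i \mid e)\,P(x_i' \mid \bar{x}_i, e)$ by chain rule on $P(x_i, \bar{x}_i \mid e)$
$= P(x_i \mid \bar{x}_i, e)\,P(x_i', \bar{x}_i \mid e)$ by backwards chain rule
$= q(x' \to x)\,\pi(x')$
• Stationary distribution if $\pi_t = \pi_{t+1}$
$\pi_{t+1}(x') = \sum_{x} \pi(x)\,q(x \to x') = \sum_{x} \pi(x')\,q(x' \to x)$
$= \pi(x') \sum_{x} q(x' \to x) = \pi(x')$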