This document discusses Bayesian networks and their application in bioinformatics. It begins with an introduction to Bayesian networks, including how they can represent joint probability distributions and be used for classification. It then discusses learning Bayesian network structures from data and performing probabilistic inference. The document applies these concepts to analyzing microarray gene expression and drug activity data from cancer cell lines. It describes preprocessing the NCI60 dataset and learning a Bayesian network to model dependencies between genes, drugs and cancer types for purposes of target discovery.
2. Copyright (c) 2002 by SNU CSE Biointelligence Lab 2
Contents
Bayesian networks – preliminaries
Bayesian networks vs. causal networks
PDAG (partially directed acyclic graph) representation of the Bayesian network
Structural learning of the Bayesian network
Classification using Bayesian networks
Microarray data analysis with Bayesian networks
Experimental results on the NCI60 data set
Term Project #3
Diagnosis using Bayesian networks
Bayesian Networks
The joint probability distribution over all the variables in
the Bayesian network.
P(X_1, X_2, ..., X_n) = ∏_{i=1}^{n} P(X_i | Pa_i)
Example (network with edges A → C, B → C, B → D, C → E), applying the chain rule and then the conditional independencies encoded by the graph:

P(A, B, C, D, E) = P(A) P(B|A) P(C|A, B) P(D|A, B, C) P(E|A, B, C, D)
                 = P(A) P(B) P(C|A, B) P(D|B) P(E|C)
Local probability distribution for X_i:
Pa_i : the set of parents of X_i among (X_1, ..., X_n)
θ_ijk : parameter for P(X_i | Pa_i)
q_i : # of configurations for Pa_i
r_i : # of states for X_i
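As a concrete toy sketch of this factorization, the joint probability of the A–E example above can be computed as a product of the local terms. All CPT values below are invented purely for illustration:

```python
# Joint probability as a product of local distributions for the
# example network A -> C <- B, B -> D, C -> E.
# All CPT values below are invented purely for illustration.

def joint(a, b, c, d, e, cpts):
    """P(A,B,C,D,E) = P(A) P(B) P(C|A,B) P(D|B) P(E|C)."""
    return (cpts["A"][a]
            * cpts["B"][b]
            * cpts["C"][(a, b)][c]
            * cpts["D"][b][d]
            * cpts["E"][c][e])

cpts = {
    "A": {0: 0.6, 1: 0.4},
    "B": {0: 0.7, 1: 0.3},
    "C": {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
          (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}},
    "D": {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}},
    "E": {0: {0: 0.75, 1: 0.25}, 1: {0: 0.1, 1: 0.9}},
}

# A valid joint distribution: the 2^5 assignments sum to 1.
total = sum(joint(a, b, c, d, e, cpts)
            for a in (0, 1) for b in (0, 1) for c in (0, 1)
            for d in (0, 1) for e in (0, 1))
```

Because each local table sums to 1 for every parent configuration, the product automatically defines a proper joint distribution.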
Knowing
the Joint Probability Distribution
We can calculate any conditional probability from the
joint probability distribution in principle.
Gene B
Class
Gene F Gene G
Gene A
Gene C Gene D
Gene E Gene H
This Bayesian network can classify
the examples by calculating the
appropriate conditional
probabilities.
P(Class| other variables)
Classification by Bayesian Networks I
Calculate the conditional probability of ‘Class’ variable
given the value of the other variables.
Infer the conditional probability from the joint probability
distribution.
For example,

P(Class | Gene A, Gene B, ..., Gene H)
  = P(Gene A, Gene B, ..., Gene H, Class) / Σ_Class P(Gene A, Gene B, ..., Gene H, Class)

where the summation is taken over all the possible class values.
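A minimal sketch of this normalization step. The toy joint table below is invented for illustration; in practice the joint would come from the Bayesian network factorization:

```python
# P(Class | evidence) obtained by normalizing the joint over class values.

def posterior(joint_p, evidence, class_values):
    """P(Class=c | e) = joint(c, e) / sum over c' of joint(c', e)."""
    scores = {c: joint_p(c, evidence) for c in class_values}
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Invented toy joint P(Class, Gene) over a single binary gene attribute.
table = {("ALL", 0): 0.30, ("ALL", 1): 0.20,
         ("AML", 0): 0.10, ("AML", 1): 0.40}

post = posterior(lambda c, g: table[(c, g)], 1, ["ALL", "AML"])
```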
Knowing the Causal Structure
Gene B
Class
Gene F Gene G
Gene A
Gene C Gene D
Gene E Gene H
Gene C regulates Gene E and F.
Gene D regulates Gene G and H.
Class has an effect on Gene F and G.
Bayesian Networks vs. Causal Networks
What the network structure encodes:
Bayesian networks: conditional independencies, read off via the d-separation property of the structure.
Causal networks: causal relationships.
The network structure asserts that every node is conditionally independent of all of its non-descendants given the values of its immediate parents.
Two Equivalent DAGs
X → Y     X ← Y
These two DAGs assert the same conditional independencies: X and Y are dependent on each other. DAGs asserting the same conditional independencies form an equivalence class.
Causal relationships are hard to learn from observational data.
Verma and Pearl’s Theorem
Theorem:
Two DAGs are equivalent if and only if they have the same
skeleton and the same v-structures.
X → Z ← Y
v-structure (X, Z, Y): X and Y are parents of Z and are not adjacent to each other.
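The theorem translates directly into a structural test. A sketch, with DAGs encoded as child → parent-set dictionaries:

```python
def skeleton(dag):
    """Undirected edges of the DAG (given as a child -> parent-set dict)."""
    return {frozenset((p, c)) for c, ps in dag.items() for p in ps}

def v_structures(dag):
    """Triples (X, Z, Y): X and Y are parents of Z and not adjacent."""
    skel = skeleton(dag)
    vs = set()
    for z, parents in dag.items():
        ps = sorted(parents)
        for i in range(len(ps)):
            for j in range(i + 1, len(ps)):
                if frozenset((ps[i], ps[j])) not in skel:
                    vs.add((ps[i], z, ps[j]))
    return vs

def equivalent(d1, d2):
    """Verma-Pearl: same skeleton and same v-structures."""
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)

# X -> Y vs. Y -> X: equivalent (same skeleton, no v-structure).
pair = {"X": set(), "Y": {"X"}}
flipped = {"Y": set(), "X": {"Y"}}
# X -> Z <- Y vs. the chain X -> Z -> Y: same skeleton, but only the
# collider has the v-structure (X, Z, Y), so they are not equivalent.
collider = {"X": set(), "Y": set(), "Z": {"X", "Y"}}
chain = {"X": set(), "Z": {"X"}, "Y": {"Z"}}
```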
PDAG Representations
Minimal PDAG representations of the equivalence class
The only directed edges are those that participate in v-structures.
Completed PDAG representation
Every directed edge corresponds to a compelled edge, and every
undirected edge corresponds to a reversible edge.
Example: PDAG Representations
[Figure: an equivalence class of two DAGs over nodes X, Y, Z, W, V, shown together with its minimal PDAG and its completed PDAG.]
Learning Bayesian Networks
Metric approach
Use a scoring metric to measure how well a particular structure fits an observed set of cases; a search algorithm finds a high-scoring structure → a canonical form of an equivalence class.
Independence approach
An independence oracle (approximated by some statistical test) is queried to identify the equivalence class that captures the independencies in the distribution from which the data was generated → search for a PDAG.
Scoring Metrics for Bayesian Networks
Likelihood: L(G, Θ_G, C) = P(C | G^h, Θ_G)
G^h: the hypothesis that the data (C) was generated by a distribution that can be factored according to G.
The maximum likelihood metric of G:

M_ML(G, C) = max_{Θ_G} L(G, Θ_G, C)

This metric prefers the complete graph structure, since adding parameters can only raise the maximized likelihood.
Information Criterion Scoring Metrics
The Akaike information criterion (AIC) metric:

M_AIC(G, C) = log M_ML(G, C) − Dim(G)

The Bayesian information criterion (BIC) metric:

M_BIC(G, C) = log M_ML(G, C) − (1/2) Dim(G) log N

where Dim(G) is the number of free parameters of G and N is the number of cases.
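A sketch of the BIC computation for a single node family under the multinomial local model, where counts[j][k] plays the role of N_ijk and the maximized log-likelihood term is Σ N_ijk log(N_ijk / N_ij):

```python
import math

def bic_node(counts):
    """BIC contribution of one node: log ML minus (1/2) Dim log N.
    counts[j][k] = N_ijk for parent configuration j and node state k."""
    loglik = 0.0
    n_total = 0
    for j_counts in counts:
        n_ij = sum(j_counts)
        n_total += n_ij
        for n_ijk in j_counts:
            if n_ijk > 0:
                loglik += n_ijk * math.log(n_ijk / n_ij)
    q = len(counts)        # parent configurations
    r = len(counts[0])     # states of the node
    dim = q * (r - 1)      # free parameters of this family
    return loglik - 0.5 * dim * math.log(n_total)
```

For the full network, a decomposable score like BIC is simply the sum of such per-family terms.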
MDL Scoring Metrics
The minimum description length (MDL) metric 1:

M_MDL1(G, C) = log P(G) + M_BIC(G, C)

The minimum description length (MDL) metric 2:

M_MDL2(G, C) = log M_ML(G, C) − |E| log N − c · Dim(G)

where |E| is the number of edges in G and c is a constant.
Bayesian Scoring Metrics
A Bayesian metric:

M(G, C) = log P(G^h) + log P(C | G^h) + c

The BDe (Bayesian Dirichlet & likelihood equivalence) metric:

P(C | G^h) = ∏_{i=1}^{n} ∏_{j=1}^{q_i} [ Γ(N'_ij) / Γ(N'_ij + N_ij) ] ∏_{k=1}^{r_i} [ Γ(N'_ijk + N_ijk) / Γ(N'_ijk) ]

where N_ijk is the number of cases with X_i in state k and Pa_i in configuration j, N'_ijk are the corresponding Dirichlet hyperparameters, N_ij = Σ_k N_ijk, and N'_ij = Σ_k N'_ijk.
Greedy Search Algorithm
for Bayesian Network Learning
Generate the initial Bayesian network structure G0.
For m = 1, 2, 3, …, until convergence.
Among all the possible local changes (insertion of an edge, reversal of an edge, and deletion of an edge) in Gm–1, the one that leads to the largest improvement in the score is performed. The resulting graph is Gm.
Stopping criterion
Score(Gm–1) == Score(Gm).
At each iteration (when learning a Bayesian network of n variables), O(n^2) local changes must be evaluated to select the best one.
Random restarts are usually adopted to escape local maxima.
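A compact sketch of this greedy procedure over a generic structure score. The toy score in the example simply rewards matching a target edge set; a real score would be BIC, BDe, etc.:

```python
from itertools import permutations

def is_acyclic(nodes, edges):
    """DFS cycle check; edges is a set of (parent, child) pairs."""
    parents = {v: {u for (u, w) in edges if w == v} for v in nodes}
    seen, done = set(), set()
    def visit(v):
        if v in done:
            return True
        if v in seen:          # back edge: cycle found
            return False
        seen.add(v)
        ok = all(visit(p) for p in parents[v])
        done.add(v)
        return ok
    return all(visit(v) for v in nodes)

def neighbors(nodes, edges):
    """All single-edge insertions, deletions, and reversals."""
    for u, v in permutations(nodes, 2):
        if (u, v) in edges:
            yield edges - {(u, v)}                  # deletion
            yield (edges - {(u, v)}) | {(v, u)}     # reversal
        elif (v, u) not in edges:
            yield edges | {(u, v)}                  # insertion

def greedy_search(nodes, score, edges=frozenset()):
    """Apply the best local change until the score stops improving."""
    while True:
        best, best_score = edges, score(edges)
        for cand in neighbors(nodes, edges):
            if is_acyclic(nodes, cand) and score(cand) > best_score:
                best, best_score = frozenset(cand), score(cand)
        if best == edges:      # Score(G_{m-1}) == Score(G_m): stop
            return edges
        edges = best

# Example: a toy score rewarding agreement with a target edge set.
target = {("A", "B"), ("B", "C")}
learned = greedy_search(["A", "B", "C"], lambda e: -len(set(e) ^ target))
```

For random restarts one would rerun `greedy_search` from several random initial structures and keep the best result.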
Probabilistic Inference
Calculate the conditional probability given the values of
the observed variables.
Junction tree algorithm
Sampling method
General probabilistic inference is intractable.
However, calculation of the conditional probability for the
classification is rather straightforward because of the property of
the Bayesian network structure.
The Markov Blanket
All the variables of interest
X = {X1, X2, …, Xn}
For a variable Xi, its Markov blanket MB(Xi) is a subset of X − {Xi} which satisfies:

P(Xi | X − {Xi}) = P(Xi | MB(Xi))

Markov boundary
The minimal Markov blanket.
Markov Blanket in Bayesian Networks
Given the Bayesian network structure, the determination
of the Markov blanket of a variable is straightforward.
By the conditional independence assertions.
Gene B
Class
Gene F Gene G
Gene A
Gene C Gene D
Gene E Gene H
The Markov blanket of a node in the Bayesian network consists of all of its parents, its children, and its spouses (the other parents of its children).
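Reading the blanket off the structure is a few set operations. A sketch, using the figure's network (edges inferred from the slides: A, B → Class; Class, C → F; Class, D → G; C → E; D → H):

```python
def markov_blanket(dag, x):
    """Parents, children, and spouses of x in a child -> parents dict."""
    parents = dag.get(x, set())
    children = {c for c, ps in dag.items() if x in ps}
    spouses = {p for c in children for p in dag[c]} - {x}
    return parents | children | spouses

# The network from the figure: Class has parents Gene A and Gene B and
# children Gene F and Gene G; Gene C and Gene D are the children's
# other parents; Gene E and Gene H lie outside the blanket.
dag = {
    "Class": {"Gene A", "Gene B"},
    "Gene E": {"Gene C"},
    "Gene F": {"Gene C", "Class"},
    "Gene G": {"Class", "Gene D"},
    "Gene H": {"Gene D"},
}
mb = markov_blanket(dag, "Class")
```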
Classification by Bayesian Networks II
Writing A for Gene A, ..., H for Gene H:

P(Class | A, B, C, D, E, F, G, H)
  = P(A, B, C, D, E, F, G, H, Class) / Σ_Class P(A, B, C, D, E, F, G, H, Class)
  = [P(A) P(B) P(C) P(D) P(Class|A, B) P(E|C) P(F|C, Class) P(G|Class, D) P(H|D)]
    / (Σ_Class P(A) P(B) P(C) P(D) P(Class|A, B) P(E|C) P(F|C, Class) P(G|Class, D) P(H|D))
  = P(Class|A, B) P(F|C, Class) P(G|Class, D)
    / (Σ_Class P(Class|A, B) P(F|C, Class) P(G|Class, D))

The factors that do not contain Class, namely P(A), P(B), P(C), P(D), P(E|C), and P(H|D), are constant with respect to the summation and cancel. Only the local distributions involving Class and its children remain: the computation uses exactly the Markov blanket of Class.
DNA Microarrays
Monitor thousands of gene expression levels simultaneously, in contrast to traditional one-gene-at-a-time experiments.
Fabricated by high-speed robotics.
[Figure: microarray with known probes.]
A Comparative
Hybridization Experiment
[Figure: a comparative hybridization experiment, followed by image analysis.]
Mining on
Gene Expression and Drug Activity Data
Relationships among human cancer, gene expression, and drug
activity
Revealing these relationships can expose the causes and mechanisms of cancer development and suggest new molecular targets for anti-cancer drugs.
[Figure: triangle relating human cancer, gene expression, and drug activity.]
NCI (National Cancer Institute)
Drug Discovery Program
NCI60 cell lines data set
NCI60 Cell Lines Data Set
From 60 human cancer cell lines
Colorectal, renal, ovarian, breast, prostate, lung, and central
nervous system origin cancers, as well as leukemias and
melanomas
Gene expression patterns
cDNA microarray
Drug activity patterns
Sulphorhodamine B assay: measures changes in total cellular protein after 48 hours of drug treatment
Schematic View
of the Modeling Approach
[Figure: schematic of the modeling approach]
Gene expression data + drug activity data
→ Preprocessing (thresholding, clustering, discretization)
→ Selected gene, drug, and cancer-type nodes (e.g. Gene A, Gene B, Drug A, Drug B, Cancer)
→ Learned Bayesian network
→ Dependency analysis and probabilistic inference
Data Preparation
cDNA microarray data
Gene expression profiles on 60 cell lines: a 1376 × 60 matrix
Drug activity data
Drug activity patterns on the 60 cell lines: a 118 × 60 matrix
Combined: a (1376 + 118) × 60 data matrix (1376 genes and 118 drugs over the 60 samples)
Preprocessing
Thresholding
Elimination of unknown ESTs → 805 genes
Elimination of drugs which have more than 4 missing values → 84 drugs
(1376 genes + 118 drugs) × 60 samples → (805 genes + 84 drugs) × 60 samples
Discretization
Local probability model for Bayesian networks: multinomial distribution
Each value is mapped to {-1, 0, 1} using thresholds -c and +c.
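The thresholding rule can be sketched in a few lines; the exact handling of values equal to the boundary is an assumption here:

```python
def discretize(value, c):
    """Map an expression/activity value to {-1, 0, 1} via thresholds +/-c.
    Boundary handling (<= vs. <) is an assumption of this sketch."""
    if value <= -c:
        return -1
    if value >= c:
        return 1
    return 0

# Invented example row of log-ratio values, discretized with c = 0.5.
row = [-0.9, -0.2, 0.1, 0.55, 0.7]
codes = [discretize(v, 0.5) for v in row]
```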
Bayesian Network Learning
for Gene-Drug Analysis
Large-scale Bayesian network
Several hundred nodes (up to 890)
General greedy search is inapplicable because of time and space complexity.
Search heuristics
Local-to-global search heuristics
Exploit the locality of Bayesian networks to reduce the entire search space.
The local structure: the Markov blanket
Find the candidate Markov blanket (of pre-determined size k) of each node → this reduces the global search space.
Local to Global Search Heuristics
Input:
- A data set D.
- An initial Bayesian network structure B0.
- A decomposable scoring metric,
Output: A Bayesian network structure B.
Loop for n = 1, 2, …, until convergence.
- Local Search Step:
* Based on D and Bn–1, select for each Xi a set CBi^n (|CBi^n| ≤ k) of candidate Markov blanket members of Xi.
* For each set {Xi, CBi^n}, learn the local structure and determine the Markov blanket of Xi, BLn(Xi), from this local structure.
* Merge all Markov blanket structures G({Xi, BLn(Xi)}, Ei) into a global network structure Hn (could be cyclic).
- Global Search Step:
* Find the Bayesian network structure Bn ⊆ Hn which maximizes Score(Bn, D) and retains all non-cyclic edges in Hn.

The scoring metric is decomposable:

Score(B, D) = Σ_i Score(X_i | Pa_B(X_i), D)
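Decomposability, which the local search step relies on, means the global score is a sum of per-family terms, so a local change to one node's parent set only requires rescoring that family. A sketch, with an invented stand-in for the family score:

```python
def total_score(parents, family_score, data):
    """Score(B, D) = sum over i of Score(X_i | Pa_B(X_i), D)."""
    return sum(family_score(x, pa, data) for x, pa in parents.items())

# Invented stand-in for a real family score (BIC, BDe, ...):
# penalize large parent sets.
toy_family_score = lambda x, pa, data: -len(pa)

structure = {"A": set(), "B": {"A"}, "C": {"A", "B"}}
s = total_score(structure, toy_family_score, None)
```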
Dimensionality Problem
The number of attributes (nodes) >> the sample size. Consequences:
Unreliable structure of the learned Bayesian networks
Probabilistic inference is nearly impossible.
Remedy: downsize the number of attributes by clustering, in the preprocessing step.
Prototype: the mean of all members in a cluster.
Bayesian Network with 45 Prototypes
Node types (46 nodes in all)
40 gene prototypes
5 drug prototypes
Cancer label
Discretization boundary: -c, +c
Bayesian network learning
Varying candidate Markov blanket size (k = 5 ~ 15); select the best one
Three data sets (c = 0.43, 0.50, 0.60) → three Bayesian networks
Probabilistic inference

Distribution ratio per discretization threshold c:
c      -1      0       1
0.43   33.3%   33.3%   33.3%
0.50   30.8%   38.3%   30.8%
0.60   27.4%   45.1%   27.4%
Correlations between
ASNS and L-Asparaginase
Part of the Bayesian network (c = 0.60):
[Figure: the prototype containing ASNS and SID W 484773, PYRROLINE-5-CARBOXYLATE REDUCTASE [5':AA037688, 3':AA037689], connected to the prototype for L-Asparaginase.]

Conditional probability table:
P(D2|G4)   D2 = -1    D2 = 0     D2 = 1
G4 = -1    0.32096    0.27086    0.40818
G4 = 0     0.31387    0.41247    0.27366
G4 = 1     0.32167    0.34920    0.32913
Bayesian Networks
on Subset of Genes and Drugs
Node types (17 nodes in all)
12 genes
4 drugs
Cancer label
Discretization boundary: -c, +c
Clustering of genes and drugs together; the selected nodes come from neighboring clusters
Bayesian network learning
General greedy search with restarts (100 times); select the best one
Three data sets (c = 0.43, 0.50, 0.60) → three Bayesian networks
Probabilistic inference

Distribution ratio per discretization threshold c:
c      -1      0       1
0.43   33.3%   33.3%   33.3%
0.50   30.8%   38.3%   30.8%
0.60   27.4%   45.1%   27.4%
Around the L-Asparaginase
< Part of the Bayesian network (c = 0.6) >
Outline
Task 1: Structural learning of the Bayesian network
Data generation from the ALARM network
Structural learning of Bayesian networks using more than two
kinds of algorithms and scores
Compare the learned results w.r.t. the edge errors according to
the various sample sizes and the learning algorithms
Task 2: Classification using Bayesian networks
Arbitrarily divide the Leukemia data set between the training set
and the test set
Learn the Bayesian network from the training data set using one
of the metric-based approaches
Evaluate the performance of the Bayesian network as a
classifier (classification accuracy)
Data Generation
Using the Netica Software (http://www.norsys.com)
The ALARM network
# of nodes: 37
# of edges: 46
Structural Learning
Independence method
BN Power constructor
(http://www.cs.ualberta.ca/~jcheng/bnsoft.htm)
Metric-based method
LearnBayes (http://www.cs.huji.ac.il/labs/compbio/LibB/)
MDL, BIC, BD, and likelihood scores can be used.
The Leukemia Data Set
Class type
ALL (acute lymphoblastic leukemia) or AML (acute myeloid
leukemia)
Data set
# of attributes: 50 gene expression levels (0 or 1)
# of samples: 72