Bayesian Networks
in Bioinformatics
Kyu-Baek Hwang
Biointelligence Lab
School of Computer Science and Engineering
Seoul National University
kbhwang@bi.snu.ac.kr
Copyright (c) 2002 by SNU CSE Biointelligence Lab 2
Contents
 Bayesian networks – preliminaries
 Bayesian networks vs. causal networks
 Partially directed acyclic graph (PDAG) representation of the Bayesian network
 Structural learning of the Bayesian network
 Classification using Bayesian networks
 Microarray data analysis with Bayesian networks
 Experimental results on the NCI60 data set
 Term Project #3
 Diagnosis using Bayesian networks
Bayesian Networks
 A Bayesian network represents the joint probability distribution over all of its variables:
P(X1, X2, ..., Xn) = ∏_{i=1}^{n} P(Xi | Pa_i)

Example DAG (A → C, B → C, B → D, C → E):

P(A, B, C, D, E) = P(A) P(B|A) P(C|A, B) P(D|A, B, C) P(E|A, B, C, D)
                 = P(A) P(B) P(C|A, B) P(D|B) P(E|C)

Local probability distribution for Xi:
 Pa_i : the set of parents of Xi (a subset of {X1, ..., Xi−1})
 θ : parameter for P(Xi | Pa_i)
 q_i : # of configurations for Pa_i
 r_i : # of states of Xi
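As a concrete check of the factorization, the joint distribution of the example network can be assembled from its local tables; the CPT numbers below are hypothetical, not taken from the slides.

```python
from itertools import product

# Hypothetical conditional probability tables for the example DAG
# (A -> C, B -> C, B -> D, C -> E); the numbers are illustrative only.
P_A = {0: 0.6, 1: 0.4}
P_B = {0: 0.5, 1: 0.5}
P_C_given_AB = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.5, 1: 0.5},
                (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.1, 1: 0.9}}
P_D_given_B = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.3, 1: 0.7}}
P_E_given_C = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.25, 1: 0.75}}

def joint(a, b, c, d, e):
    """P(A,B,C,D,E) = P(A) P(B) P(C|A,B) P(D|B) P(E|C)."""
    return (P_A[a] * P_B[b] * P_C_given_AB[(a, b)][c]
            * P_D_given_B[b][d] * P_E_given_C[c][e])

# A product of normalized local tables sums to 1 over all 2^5 assignments.
total = sum(joint(*x) for x in product((0, 1), repeat=5))
```

Because each local table is a proper conditional distribution, the factored product is automatically a normalized joint distribution.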
Knowing
the Joint Probability Distribution
 We can calculate any conditional probability from the
joint probability distribution in principle.
Gene B
Class
Gene F Gene G
Gene A
Gene C Gene D
Gene E Gene H
This Bayesian network can classify
the examples by calculating the
appropriate conditional
probabilities.
 P(Class| other variables)
Classification by Bayesian Networks I
 Calculate the conditional probability of the ‘Class’ variable given the values of the other variables.
 Infer the conditional probability from the joint probability
distribution.
 For example,

P(Class | Gene A, Gene B, ..., Gene H)
  = P(Class, Gene A, Gene B, ..., Gene H) / Σ_Class P(Class, Gene A, Gene B, ..., Gene H),

 where the summation is taken over all the possible class values.
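A minimal sketch of this normalization, using a toy three-variable joint table (the probabilities are made up, and the real network has nine variables):

```python
# Toy joint distribution P(Class, Gene1, Gene2) as an explicit table
# (hypothetical numbers that sum to 1).
joint = {(0, 0, 0): 0.10, (0, 0, 1): 0.05, (0, 1, 0): 0.15, (0, 1, 1): 0.20,
         (1, 0, 0): 0.05, (1, 0, 1): 0.15, (1, 1, 0): 0.10, (1, 1, 1): 0.20}

def p_class_given_genes(g1, g2):
    """P(Class | Gene1=g1, Gene2=g2) = P(Class, g1, g2) / sum over Class."""
    numerators = {cls: joint[(cls, g1, g2)] for cls in (0, 1)}
    z = sum(numerators.values())          # summation over all class values
    return {cls: v / z for cls, v in numerators.items()}
```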
Knowing the Causal Structure
Gene B
Class
Gene F Gene G
Gene A
Gene C Gene D
Gene E Gene H
Gene C regulates Gene E and F.
Gene D regulates Gene G and H.
Class has an effect on Gene F and G.
Bayesian Networks vs. Causal Networks
What the network structure represents:
 Bayesian networks: conditional independencies (by the d-separation property of the Bayesian network structure)
 Causal networks: causal relationships
 The network structure asserts that every node is conditionally independent of all of its non-descendants given the values of its immediate parents.
Two Equivalent DAGs
X → Y        X ← Y
These two DAGs assert
that X and Y are
dependent on each other.
 the same conditional
independencies
 equivalence class
Causal relationships are hard to learn from observational data.
Verma and Pearl’s Theorem
 Theorem:
 Two DAGs are equivalent if and only if they have the same
skeleton and the same v-structures.
X → Z ← Y
v-structure (X, Z, Y): X and Y are parents of Z and are not adjacent to each other.
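The theorem gives a direct equivalence test on edge lists; a sketch, representing a DAG as a set of (parent, child) pairs (the function names are my own):

```python
from itertools import combinations

def skeleton(edges):
    """Undirected version of the graph: edges with direction dropped."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """All (x, z, y) with x and y parents of z and x, y not adjacent."""
    adj = skeleton(edges)
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    result = set()
    for z, ps in parents.items():
        for x, y in combinations(sorted(ps), 2):
            if frozenset((x, y)) not in adj:
                result.add((x, z, y))
    return result

def equivalent(g1, g2):
    """Verma-Pearl: equivalent iff same skeleton and same v-structures."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)
```

For instance, X → Y and X ← Y pass the test, while the chain X → Z → Y and the collider X → Z ← Y share a skeleton but differ in v-structures.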
PDAG Representations
 Minimal PDAG representations of the equivalence class
 The only directed edges are those that participate in v-structures.
 Completed PDAG representation
 Every directed edge corresponds to a compelled edge, and every
undirected edge corresponds to a reversible edge.
Example: PDAG Representations
[Figure: two DAGs over {X, Y, Z, W, V} forming an equivalence class, together with the minimal PDAG and the completed PDAG representing that class.]
Learning Bayesian Networks
 Metric approach
 Use a scoring metric to measure how well a particular structure
fits an observed set of cases.
 A search algorithm is used → find a canonical form of an equivalence class.
 Independence approach
 An independence oracle (approximated by some statistical test) is queried to identify the equivalence class that captures the independencies in the distribution from which the data was generated → search for a PDAG.
Scoring Metrics for Bayesian Networks
 Likelihood: L(G, Θ_G, C) = P(C | G^h, Θ_G)
 G^h: the hypothesis that the data (C) was generated by a distribution that can be factored according to G.
 The maximum likelihood metric of G:

M_ML(G, C) = max_{Θ_G} L(G, Θ_G, C)

→ prefers the complete graph structure.
Information Criterion Scoring Metrics
 The Akaike information criterion (AIC) metric:

M_AIC(G, C) = log M_ML(G, C) − Dim(G)

 The Bayesian information criterion (BIC) metric:

M_BIC(G, C) = log M_ML(G, C) − (1/2) Dim(G) log N

where Dim(G) is the number of free parameters of G and N is the number of cases in C.
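Because the multinomial log-likelihood decomposes over node families, the BIC term can be computed per node from counts; a sketch on an illustrative tiny data set (function and variable names are my own):

```python
import math
from collections import Counter

def bic_family(data, child, parents, arity):
    """BIC contribution of one node: log ML of its family minus (1/2) Dim log N."""
    N = len(data)
    fam = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    par = Counter(tuple(row[p] for p in parents) for row in data)
    # Maximized multinomial log-likelihood: sum over cells of N_jk * log(N_jk / N_j).
    loglik = sum(n * math.log(n / par[pa]) for (pa, _), n in fam.items())
    q = math.prod(arity[p] for p in parents)   # parent configurations (q_i)
    dim = q * (arity[child] - 1)               # free parameters for this node
    return loglik - 0.5 * dim * math.log(N)
```

Summing this term over all nodes gives M_BIC(G, C) for the whole structure, which is what makes local edge changes cheap to score during search.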
MDL Scoring Metrics
 The minimum description length (MDL) metric 1:

M_MDL1(G, C) = log P(G) + M_BIC(G, C)

 The minimum description length (MDL) metric 2:

M_MDL2(G, C) = log M_ML(G, C) − |E| log N − c · Dim(G)

where |E| is the number of edges in G and c is the number of bits used to store each parameter.
Bayesian Scoring Metrics
 A Bayesian metric:

M(G, C) = log P(G^h) + log P(C | G^h) + c

 The BDe (Bayesian Dirichlet & likelihood equivalence) metric:

P(C | G^h) = ∏_{i=1}^{n} ∏_{j=1}^{q_i} [ Γ(N′_ij) / Γ(N′_ij + N_ij) ] ∏_{k=1}^{r_i} [ Γ(N′_ijk + N_ijk) / Γ(N′_ijk) ]

where N_ijk is the number of cases with Xi in its k-th state and Pa_i in its j-th configuration, N′_ijk is the corresponding Dirichlet hyperparameter, N_ij = Σ_k N_ijk, and N′_ij = Σ_k N′_ijk.
Greedy Search Algorithm
for Bayesian Network Learning
 Generate the initial Bayesian network structure G0.
 For m = 1, 2, 3, …, until convergence.
 Among all the possible local changes (insertion of an edge, reversal of an edge, and deletion of an edge) in Gm–1, the one that leads to the largest improvement in the score is performed. The resulting graph is Gm.
 Stopping criterion
 Score(Gm–1) == Score(Gm).
 At each iteration (when learning a Bayesian network consisting of n variables)
 O(n²) local changes should be evaluated to select the best one.
 Random restarts are usually adopted to escape local maxima.
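The loop above can be sketched as hill climbing over edge sets, with a pluggable scoring function and an acyclicity check on every candidate change; a sketch, not the original implementation:

```python
def is_acyclic(nodes, edges):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed."""
    indeg = {v: 0 for v in nodes}
    for _, v in edges:
        indeg[v] += 1
    queue = [v for v in nodes if indeg[v] == 0]
    seen = 0
    while queue:
        u = queue.pop()
        seen += 1
        for a, b in edges:
            if a == u:
                indeg[b] -= 1
                if indeg[b] == 0:
                    queue.append(b)
    return seen == len(nodes)

def neighbors(nodes, edges):
    """All DAGs one edge insertion, deletion, or reversal away from `edges`."""
    for u in nodes:
        for v in nodes:
            if u != v and (u, v) not in edges:
                cand = edges | {(u, v)}                    # insertion
                if is_acyclic(nodes, cand):
                    yield cand
    for e in edges:
        yield edges - {e}                                  # deletion
        cand = (edges - {e}) | {(e[1], e[0])}              # reversal
        if is_acyclic(nodes, cand):
            yield cand

def greedy_search(nodes, score, start=frozenset()):
    """Apply the best local change until the score stops improving."""
    current = frozenset(start)
    while True:
        best = max(neighbors(nodes, current), key=score, default=current)
        if score(best) <= score(current):                  # stopping criterion
            return current
        current = frozenset(best)
```

With a decomposable score, only the changed families need rescoring; this sketch rescoring everything is the naive O(n²)-evaluations-per-iteration version described above.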
Probabilistic Inference
 Calculate the conditional probability given the values of
the observed variables.
 Junction tree algorithm
 Sampling method
 General probabilistic inference is intractable.
 However, calculation of the conditional probability for the
classification is rather straightforward because of the property of
the Bayesian network structure.
The Markov Blanket
 All the variables of interest
 X = {X1, X2, …, Xn}
 For a variable Xi, its Markov blanket MB(Xi) is the
subset of X – Xi which satisfies the following:
 Markov boundary
 Minimal Markov blanket
P(Xi | X − Xi) = P(Xi | MB(Xi)).
Markov Blanket in Bayesian Networks
 Given the Bayesian network structure, the determination
of the Markov blanket of a variable is straightforward.
 By the conditional independence assertions.
Gene B
Class
Gene F Gene G
Gene A
Gene C Gene D
Gene E Gene H
The Markov blanket of a node in
the Bayesian network consists of
all of its parents, spouses, and
children.
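Reading the blanket off an edge list is one set comprehension per role. The edge list below encodes the slides' gene network, assuming Gene A and Gene B are the parents of Class (consistent with the factorization used on the classification slide):

```python
def markov_blanket(edges, x):
    """Parents, children, and the children's other parents (spouses) of x."""
    parents = {u for (u, v) in edges if v == x}
    children = {v for (u, v) in edges if u == x}
    spouses = {u for (u, v) in edges if v in children and u != x}
    return parents | children | spouses

# Edge list for the gene network in the slides: A, B -> Class; C -> E, F;
# D -> G, H; Class -> F, G (the A, B -> Class edges are an assumption).
edges = [("Gene A", "Class"), ("Gene B", "Class"), ("Class", "Gene F"),
         ("Class", "Gene G"), ("Gene C", "Gene E"), ("Gene C", "Gene F"),
         ("Gene D", "Gene G"), ("Gene D", "Gene H")]
```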
Classification by Bayesian Networks II
(Abbreviating Gene A, ..., Gene H as A, ..., H:)

P(Class | A, B, C, D, E, F, G, H)
  = P(Class, A, ..., H) / Σ_Class P(Class, A, ..., H)
  = P(A) P(B) P(C) P(D) P(Class|A,B) P(E|C) P(F|C,Class) P(G|Class,D) P(H|D)
    / Σ_Class [ P(A) P(B) P(C) P(D) P(Class|A,B) P(E|C) P(F|C,Class) P(G|Class,D) P(H|D) ]
  = P(Class|A,B) P(F|C,Class) P(G|Class,D)
    / Σ_Class [ P(Class|A,B) P(F|C,Class) P(G|Class,D) ]

Only the factors involving Class (its Markov blanket) remain; all the other factors cancel.
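With only the Markov blanket factors surviving, classification reduces to normalizing a product of three local tables; a sketch with hypothetical CPT entries (binary variables, illustrative numbers only):

```python
# Hypothetical CPTs for the three factors that survive the cancellation.
P_class_given_AB = {(0, 0): {0: 0.9, 1: 0.1}, (0, 1): {0: 0.6, 1: 0.4},
                    (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}}
P_F_given_C_class = {(0, 0): {0: 0.7, 1: 0.3}, (0, 1): {0: 0.5, 1: 0.5},
                     (1, 0): {0: 0.4, 1: 0.6}, (1, 1): {0: 0.2, 1: 0.8}}
P_G_given_class_D = {(0, 0): {0: 0.8, 1: 0.2}, (0, 1): {0: 0.6, 1: 0.4},
                     (1, 0): {0: 0.3, 1: 0.7}, (1, 1): {0: 0.1, 1: 0.9}}

def p_class(a, b, c, d, f, g):
    """P(Class | evidence) from the Markov blanket factors only."""
    w = {cls: (P_class_given_AB[(a, b)][cls]
               * P_F_given_C_class[(c, cls)][f]
               * P_G_given_class_D[(cls, d)][g]) for cls in (0, 1)}
    z = sum(w.values())
    return {cls: v / z for cls, v in w.items()}
```

Note that E and H never enter the computation; this is why classification stays tractable even when general inference is not.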
DNA Microarrays
 Monitor thousands of gene expression levels simultaneously (vs. traditional one-gene-at-a-time experiments).
 Fabricated by high-speed robotics.
Known
probes
A Comparative
Hybridization Experiment
Image
analysis
Mining on
Gene Expression and Drug Activity Data
 Relationships among human cancer, gene expression, and drug
activity
 Revealing these relationships can shed light on:
 Cause and mechanisms of the cancer development
 New molecular targets for anti-cancer drugs
Human cancer
Gene expression Drug activity
NCI (National Cancer Institute)
Drug Discovery Program
NCI 60
cell lines
data set
NCI60 Cell Lines Data Set
 From 60 human cancer cell lines
 Colorectal, renal, ovarian, breast, prostate, lung, and central
nervous system origin cancers, as well as leukemias and
melanomas
 Gene expression patterns
 cDNA microarray
 Drug activity patterns
 Sulphorhodamine B assay → measures changes in total cellular protein after 48 hours of drug treatment
Schematic View
of the Modeling Approach
Gene B
Cancer
Drug B
Drug A
Gene A
- Selected genes, drugs
and cancer type node
Drug A
Cancer
Drug B
Gene B
Gene A
< Learned Bayesian network >
- Dependency analysis
- Probabilistic inference
Drug activity
Data
Gene Expression
Data
Preprocessing
- Thresholding
- Clustering
- Discretization
Data Preparation
 cDNA microarray data
 Gene expression profiles on
60 cell lines
 1376  60 matrix
 Drug activity data
 Drug activity patterns on 60
cell lines
 118  60 matrix
(1376 + 118)  60 data matrix
60 samples
Gene
expressions
60 samples
Drug
activities
1376
genes
118
drugs
Preprocessing
 Thresholding
 Elimination of unknown ESTs → 805 genes
 Elimination of drugs which have more than 4 missing values → 84 drugs
 Discretization
 Local probability model for Bayesian networks: multinomial distribution
 Each variable is discretized into three states: values ≥ μ + cσ map to 1, values ≤ μ − cσ map to −1, and values in between map to 0.

(1376 genes + 118 drugs) × 60 samples → (805 genes + 84 drugs) × 60 samples
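A sketch of the three-state discretization, assuming the cut points are the mean ± c standard deviations (consistent with the c values and distribution ratios reported on the later slides):

```python
import statistics

def discretize(values, c):
    """Map each value to 1 (>= mu + c*sigma), -1 (<= mu - c*sigma), or 0."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)   # population standard deviation
    hi, lo = mu + c * sigma, mu - c * sigma
    return [1 if v >= hi else -1 if v <= lo else 0 for v in values]
```

For normally distributed values, c = 0.43 puts roughly a third of the mass in each state, matching the 33.3% / 33.3% / 33.3% ratio shown later.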
Bayesian Network Learning
for Gene-Drug Analysis
 Large-scale Bayesian network
 Several hundred nodes (up to 890)
 General greedy search is inapplicable because of time and space
complexity.
 Search heuristics
 Local to global search heuristics
 Exploit the locality of Bayesian networks to reduce the entire
search space.
 The local structure: Markov blanket
 Find the candidate Markov blanket (of pre-determined size k) of each node → reduces the global search space
Local to Global Search Heuristics
Input:
- A data set D.
- An initial Bayesian network structure B0.
- A decomposable scoring metric, Score(B, D) = Σ_i Score(Xi | Pa_B(Xi), D).
Output: A Bayesian network structure B.
Loop for n = 1, 2, …, until convergence.
- Local Search Step:
  * Based on D and Bn–1, select for each Xi a set CBi^n (|CBi^n| ≤ k) of candidate Markov blanket members of Xi.
  * For each set {Xi, CBi^n}, learn the local structure and determine the Markov blanket of Xi, BLn(Xi), from this local structure.
  * Merge all Markov blanket structures G({Xi, BLn(Xi)}, Ei) into a global network structure Hn (could be cyclic).
- Global Search Step:
  * Find the Bayesian network structure Bn ⊆ Hn which maximizes Score(Bn, D) and retains all non-cyclic edges in Hn.
Dimensionality Problem
 The number of attributes (nodes) >> sample size
 Unreliable structure of the learned Bayesian networks
 Probabilistic inference is nearly impossible.
 Downsize the number of attributes by clustering
 Prototype: mean of all members in a cluster
In the
preprocessing step
Bayesian Network with 45 Prototypes
 Node types (46 nodes in all)
 40 gene prototypes
 5 drug prototypes
 Cancer label
 Discretization boundary
 μ − cσ, μ + cσ
 Bayesian network learning
 Varying candidate Markov
blanket size (k = 5 ~ 15)
 Select the best one
 Three data sets (c = 0.43, 0.50, 0.60) → three Bayesian networks
 Probabilistic inference

Distribution ratio by c:
  c      -1      0       1
  0.43   33.3%   33.3%   33.3%
  0.50   30.8%   38.3%   30.8%
  0.60   27.4%   45.1%   27.4%
Correlations between
ASNS and L-Asparaginase
 Part of the Bayesian network (c = 0.60)
Prototype for ASNS and SID W
484773, PYRROLINE-5-
CARBOXYLATE REDUCTASE
[5':AA037688, 3':AA037689]
Prototype for L-Asparaginase
P(D2|G4) D2 = -1 D2 = 0 D2 = 1
G4 = -1 0.32096 0.27086 0.40818
G4 = 0 0.31387 0.41247 0.27366
G4 = 1 0.32167 0.34920 0.32913
< Conditional probability table >
Bayesian Networks
on Subset of Genes and Drugs
 Node types (17 nodes in all)
 12 genes
 4 drugs
 Cancer label
 Discretization boundary
 μ − cσ, μ + cσ
 Bayesian network learning
 General greedy search with
restart (100 times)
 Select the best one
 Three data sets (c = 0.43, 0.50, 0.60) → three Bayesian networks
 Probabilistic inference

Distribution ratio by c:
  c      -1      0       1
  0.43   33.3%   33.3%   33.3%
  0.50   30.8%   38.3%   30.8%
  0.60   27.4%   45.1%   27.4%
Clustering of genes and drugs
together
- From neighboring clusters
Around the L-Asparaginase
< Part of the Bayesian network (c = 0.6) >
Probabilistic Relationships
Around the L-Asparaginase
 Cancer type unobserved
 D1: L-Asparaginase
 G1: ASNS gene
 G2: PYRROLINE-5-CARBOXYLATE
REDUCTASE
P(D1|G1) D1 = -1 D1 = 0 D1 = 1
G1 = -1 0.19857 0.27471 0.52672
G1 = 0 0.31110 0.49795 0.19095
G1 = 1 0.42159 0.36279 0.21561
 Cancer type observed (= leukemia)
 D1: L-Asparaginase
 G1: ASNS gene
 G2: PYRROLINE-5-CARBOXYLATE
REDUCTASE
P(D1|G2) D1 = -1 D1 = 0 D1 = 1
G2 = -1 0.27510 0.35226 0.37263
G2 = 0 0.31621 0.41072 0.27307
G2 = 1 0.33837 0.39664 0.26499
P(D1|G1,L) D1 = -1 D1 = 0 D1 = 1
G1 = -1 0.17536 0.22838 0.59626
G1 = 0 0.27128 0.53790 0.19081
G1 = 1 0.38500 0.42437 0.19063
P(D1|G2,L) D1 = -1 D1 = 0 D1 = 1
G2 = -1 0.23812 0.33853 0.42335
G2 = 0 0.27978 0.42666 0.29356
G2 = 1 0.30371 0.42108 0.27520
Term Project #3:
Diagnosis Using Bayesian Networks
Outline
 Task 1: Structural learning of the Bayesian network
 Data generation from the ALARM network
 Structural learning of Bayesian networks using more than two
kinds of algorithms and scores
 Compare the learned results w.r.t. the edge errors according to
the various sample sizes and the learning algorithms
 Task 2: Classification using Bayesian networks
 Arbitrarily divide the Leukemia data set between the training set
and the test set
 Learn the Bayesian network from the training data set using one
of the metric-based approaches
 Evaluate the performance of the Bayesian network as a
classifier (classification accuracy)
Data Generation
 Using the Netica Software (http://www.norsys.com)
 The ALARM network
 # of nodes: 37
 # of edges: 46
Structural Learning
 Independence method
 BN Power constructor
(http://www.cs.ualberta.ca/~jcheng/bnsoft.htm)
 Metric-based method
 LearnBayes (http://www.cs.huji.ac.il/labs/compbio/LibB/)
 MDL, BIC, BD, and likelihood scores can be used.
The Leukemia Data Set
 Class type
 ALL (acute lymphoblastic leukemia) or AML (acute myeloid
leukemia)
 Data set
 # of attributes: 50 gene expression levels (0 or 1)
 # of samples: 72
Submission
 Deadline: 2002. 11. 27
 Location: 301-419

More Related Content

Similar to Project3.ppt

Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]
JULIO GONZALEZ SANZ
 
Ijarcet vol-2-issue-7-2292-2296
Ijarcet vol-2-issue-7-2292-2296Ijarcet vol-2-issue-7-2292-2296
Ijarcet vol-2-issue-7-2292-2296
Editor IJARCET
 
Ijarcet vol-2-issue-7-2292-2296
Ijarcet vol-2-issue-7-2292-2296Ijarcet vol-2-issue-7-2292-2296
Ijarcet vol-2-issue-7-2292-2296
Editor IJARCET
 

Similar to Project3.ppt (20)

Tutorial on Polynomial Networks at CVPR'22
Tutorial on Polynomial Networks at CVPR'22Tutorial on Polynomial Networks at CVPR'22
Tutorial on Polynomial Networks at CVPR'22
 
Bayes Nets Meetup Sept 29th 2016 - Bayesian Network Modelling by Marco Scutari
Bayes Nets Meetup Sept 29th 2016 - Bayesian Network Modelling by Marco ScutariBayes Nets Meetup Sept 29th 2016 - Bayesian Network Modelling by Marco Scutari
Bayes Nets Meetup Sept 29th 2016 - Bayesian Network Modelling by Marco Scutari
 
Sgg crest-presentation-final
Sgg crest-presentation-finalSgg crest-presentation-final
Sgg crest-presentation-final
 
論文紹介:Learning With Neighbor Consistency for Noisy Labels
論文紹介:Learning With Neighbor Consistency for Noisy Labels論文紹介:Learning With Neighbor Consistency for Noisy Labels
論文紹介:Learning With Neighbor Consistency for Noisy Labels
 
A GPU-accelerated bioinformatics application for large-scale protein interact...
A GPU-accelerated bioinformatics application for large-scale protein interact...A GPU-accelerated bioinformatics application for large-scale protein interact...
A GPU-accelerated bioinformatics application for large-scale protein interact...
 
Materials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum ComputationMaterials Design in the Age of Deep Learning and Quantum Computation
Materials Design in the Age of Deep Learning and Quantum Computation
 
3 article azojete vol 7 24 33
3 article azojete vol 7 24 333 article azojete vol 7 24 33
3 article azojete vol 7 24 33
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
 
Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]Introduction to bayesian_networks[1]
Introduction to bayesian_networks[1]
 
Lucas Theis - Compressing Images with Neural Networks - Creative AI meetup
Lucas Theis - Compressing Images with Neural Networks - Creative AI meetupLucas Theis - Compressing Images with Neural Networks - Creative AI meetup
Lucas Theis - Compressing Images with Neural Networks - Creative AI meetup
 
Performance analysis of transformation and bogdonov chaotic substitution base...
Performance analysis of transformation and bogdonov chaotic substitution base...Performance analysis of transformation and bogdonov chaotic substitution base...
Performance analysis of transformation and bogdonov chaotic substitution base...
 
Cornell Pbsb 20090126 Nets
Cornell Pbsb 20090126 NetsCornell Pbsb 20090126 Nets
Cornell Pbsb 20090126 Nets
 
20320140503011
2032014050301120320140503011
20320140503011
 
009_20150201_Structural Inference for Uncertain Networks
009_20150201_Structural Inference for Uncertain Networks009_20150201_Structural Inference for Uncertain Networks
009_20150201_Structural Inference for Uncertain Networks
 
Ijarcet vol-2-issue-7-2292-2296
Ijarcet vol-2-issue-7-2292-2296Ijarcet vol-2-issue-7-2292-2296
Ijarcet vol-2-issue-7-2292-2296
 
Ijarcet vol-2-issue-7-2292-2296
Ijarcet vol-2-issue-7-2292-2296Ijarcet vol-2-issue-7-2292-2296
Ijarcet vol-2-issue-7-2292-2296
 
Analysis of the Iriscode Bioencoding Scheme
Analysis of the Iriscode Bioencoding SchemeAnalysis of the Iriscode Bioencoding Scheme
Analysis of the Iriscode Bioencoding Scheme
 
Fitness function X-means for prolonging wireless sensor networks lifetime
Fitness function X-means for prolonging wireless sensor  networks lifetimeFitness function X-means for prolonging wireless sensor  networks lifetime
Fitness function X-means for prolonging wireless sensor networks lifetime
 
Field-programmable gate array design of image encryption and decryption usin...
Field-programmable gate array design of image encryption and  decryption usin...Field-programmable gate array design of image encryption and  decryption usin...
Field-programmable gate array design of image encryption and decryption usin...
 
Reading revue of "Inferring Multiple Graphical Structures"
Reading revue of "Inferring Multiple Graphical Structures"Reading revue of "Inferring Multiple Graphical Structures"
Reading revue of "Inferring Multiple Graphical Structures"
 

More from ssuser30e7d2 (7)

PACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptxPACT_conference_2019_Tutorial_02_gpgpusim.pptx
PACT_conference_2019_Tutorial_02_gpgpusim.pptx
 
isca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptxisca22-feng-menda_for sparse transposition and dataflow.pptx
isca22-feng-menda_for sparse transposition and dataflow.pptx
 
xeon phi_mattan erez.pptx
xeon phi_mattan erez.pptxxeon phi_mattan erez.pptx
xeon phi_mattan erez.pptx
 
xeon phi_mattan erez.pptx
xeon phi_mattan erez.pptxxeon phi_mattan erez.pptx
xeon phi_mattan erez.pptx
 
FOSDEM_2019_Buildroot_RISCV.pdf
FOSDEM_2019_Buildroot_RISCV.pdfFOSDEM_2019_Buildroot_RISCV.pdf
FOSDEM_2019_Buildroot_RISCV.pdf
 
Gunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdfGunjae_ISCA15_slides.pdf
Gunjae_ISCA15_slides.pdf
 
shift register.ppt
shift register.pptshift register.ppt
shift register.ppt
 

Recently uploaded

SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
gindu3009
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
Sérgio Sacani
 

Recently uploaded (20)

❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)GBSN - Biochemistry (Unit 1)
GBSN - Biochemistry (Unit 1)
 
GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
High Class Escorts in Hyderabad ₹7.5k Pick Up & Drop With Cash Payment 969456...
 
American Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptxAmerican Type Culture Collection (ATCC).pptx
American Type Culture Collection (ATCC).pptx
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICESAMASTIPUR CALL GIRL 7857803690  LOW PRICE  ESCORT SERVICE
SAMASTIPUR CALL GIRL 7857803690 LOW PRICE ESCORT SERVICE
 
module for grade 9 for distance learning
module for grade 9 for distance learningmodule for grade 9 for distance learning
module for grade 9 for distance learning
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Botany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdfBotany 4th semester series (krishna).pdf
Botany 4th semester series (krishna).pdf
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Zoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdfZoology 4th semester series (krishna).pdf
Zoology 4th semester series (krishna).pdf
 

Project3.ppt

  • 1. Bayesian Networks in Bioinformatics Kyu-Baek Hwang Biointelligence Lab School of Computer Science and Engineering Seoul National University kbhwang@bi.snu.ac.kr
  • 2. Copyright (c) 2002 by SNU CSE Biointelligence Lab 2 Contents  Bayesian networks – preliminaries  Bayesian networks vs. causal networks  Partially DAG representation of the Bayesian network  Structural learning of the Bayesian network  Classification using Bayesian networks  Microarray data analysis with Bayesian networks  Experimental results on the NCI60 data set  Term Project #3  Diagnosis using Bayesian networks
  • 3. Copyright (c) 2002 by SNU CSE Biointelligence Lab 3 Bayesian Networks  The joint probability distribution over all the variables in the Bayesian network.   n i i i n X P X X X P 1 2 1 ) | ( ) ,..., , ( Pa ) | ( ) | ( ) , | ( ) ( ) ( ) , , , | ( ) , , | ( ) , | ( ) | ( ) ( ) , , , , ( C E P B D P B A C P B P A P D C B A E P C B A D P B A C P A B P A P E D C B A P   B A C D E Local probability distribution for Xi i i i i i i iq i i i i X r q X P X i for states of # : for ions configurat of # : ) | ( for parameter ~ ) ,..., ( of parents of set the : 1 Pa Pa Pa    
  • 4. Copyright (c) 2002 by SNU CSE Biointelligence Lab 4 Knowing the Joint Probability Distribution  We can calculate any conditional probability from the joint probability distribution in principle. Gene B Class Gene F Gene G Gene A Gene C Gene D Gene E Gene H This Bayesian network can classify the examples by calculating the appropriate conditional probabilities.  P(Class| other variables)
  • 5. Copyright (c) 2002 by SNU CSE Biointelligence Lab 5 Classification by Bayesian Networks I  Calculate the conditional probability of ‘Class’ variable given the value of the other variables.  Infer the conditional probability from the joint probability distribution.  For example,  where the summation is taken over all the possible class values. , ) , , , , , , , , ( ) , , , , , , , , ( ) , , , , , , , | (   Class H Gene G Gene F Gene E Gene D Gene C Gene B Gene A Gene Class P H Gene G Gene F Gene E Gene D Gene C Gene B Gene A Gene Class P H Gene G Gene F Gene E Gene D Gene C Gene B Gene A Gene Class P
  • 6. Copyright (c) 2002 by SNU CSE Biointelligence Lab 6 Knowing the Causal Structure Gene B Class Gene F Gene G Gene A Gene C Gene D Gene E Gene H Gene C regulates Gene E and F. Gene D regulates Gene G and H. Class has an effect on Gene F and G.
  • 7. Copyright (c) 2002 by SNU CSE Biointelligence Lab 7 Bayesian Networks vs. Causal Networks Bayesian networks Causal networks Network structure Conditional independencies Causal relationships By d-separation property of the Bayesian network structure  The network structure asserts that every node is conditionally independent from all of its non- descendants given the values of its immediate parents.
  • 8. Copyright (c) 2002 by SNU CSE Biointelligence Lab 8 Equivalent Two DAGs X Y X Y These two DAGs assert that X and Y are dependent on each other.  the same conditional independencies  equivalence class Causal relationships are hard to learn from the observational data.
  • 9. Copyright (c) 2002 by SNU CSE Biointelligence Lab 9 Verma and Pearl’s Theorem  Theorem:  Two DAGs are equivalent if and only if they have the same skeleton and the same v-structures. X Y Z v-structure (X, Z, Y) : X and Y are parents of Z and not adjacent to each other.
  • 10. Copyright (c) 2002 by SNU CSE Biointelligence Lab 10 PDAG Representations  Minimal PDAG representations of the equivalence class  The only directed edges are those that participate in v-structures.  Completed PDAG representation  Every directed edge corresponds to a compelled edge, and every undirected edge corresponds to a reversible edge.
  • 11. Copyright (c) 2002 by SNU CSE Biointelligence Lab 11 Example: PDAG Representations X Y Z W V X Y Z W V X Y Z W V X Y Z W V An equivalence class Minimal PDAG Completed PDAG
  • 12. Copyright (c) 2002 by SNU CSE Biointelligence Lab 12 Learning Bayesian Networks  Metric approach  Use a scoring metric to measure how well a particular structure fits an observed set of cases.  A search algorithm is used.  Find a canonical form of an equivalence class.  Independence approach  An independence oracle (approximated by some statistical test) is queried to identify the equivalence class that captures the independencies in the distribution from which the data was generated.  Search for a PDAG
  • 13. Scoring Metrics for Bayesian Networks
  - Likelihood: L(G, Θ_G, C) = P(C | Θ_G, G^h)
    - G^h: the hypothesis that the data C was generated by a distribution that can be factored according to G.
  - The maximum likelihood metric of G: M_ML(G, C) = max_{Θ_G} L(G, Θ_G, C)
    - → prefers the complete graph structure
  • 14. Information Criterion Scoring Metrics
  - The Akaike information criterion (AIC) metric: M_AIC(G, C) = log M_ML(G, C) − Dim(G)
  - The Bayesian information criterion (BIC) metric: M_BIC(G, C) = log M_ML(G, C) − (1/2) Dim(G) log N
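A minimal sketch of these metrics for a discrete network (not from the slides; the nested-list count layout and the toy A → B counts are my own illustration). For a multinomial local model, log M_ML(G, C) = Σ_ijk N_ijk log(N_ijk / N_ij) and Dim(G) = Σ_i q_i (r_i − 1), using the slide-3 notation:

```python
import math

def log_ml(counts):
    """counts[i][j][k] = N_ijk: count for node i, parent config j, state k.
    Log maximum-likelihood score: sum_ijk N_ijk * log(N_ijk / N_ij)."""
    total = 0.0
    for node in counts:
        for row in node:                 # one parent configuration j
            n_ij = sum(row)
            for n_ijk in row:
                if n_ijk > 0:
                    total += n_ijk * math.log(n_ijk / n_ij)
    return total

def dim(counts):
    """Independent parameters: sum_i q_i * (r_i - 1)."""
    return sum(len(node) * (len(node[0]) - 1) for node in counts)

def aic(counts):
    return log_ml(counts) - dim(counts)

def bic(counts, n_samples):
    return log_ml(counts) - 0.5 * dim(counts) * math.log(n_samples)

# Toy network A -> B over binary variables, 10 samples:
counts = [
    [[6, 4]],            # node A (single empty parent configuration)
    [[3, 3], [1, 3]],    # node B given A = 0 and A = 1
]
print(aic(counts), bic(counts, 10))
```

With N = 10 the BIC penalty (0.5 · 3 · log 10) already exceeds the AIC penalty (3), so BIC scores this structure lower, illustrating its stronger preference for sparse graphs.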
  • 15. MDL Scoring Metrics
  - The minimum description length (MDL) metric 1: M_MDL1(G, C) = log P(G) + M_BIC(G, C)
  - The minimum description length (MDL) metric 2: M_MDL2(G, C) = log M_ML(G, C) − |E_G| log N − c · Dim(G)
  • 16. Bayesian Scoring Metrics
  - A Bayesian metric: M(G^h, C, ξ) = log P(G^h | ξ) + log P(C | G^h, ξ) + c
  - The BDe (Bayesian Dirichlet & likelihood equivalence) metric:
    P(C | G^h) = ∏_{i=1}^{n} ∏_{j=1}^{q_i} [ Γ(N′_ij) / Γ(N′_ij + N_ij) ] ∏_{k=1}^{r_i} [ Γ(N′_ijk + N_ijk) / Γ(N′_ijk) ]
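The BDe product is computed in log space with the log-gamma function to avoid overflow. A sketch (not from the slides): the uniform prior N′_ijk = 1 is my own simplifying assumption, since the slides leave the Dirichlet hyperparameters unspecified:

```python
from math import lgamma

def log_bde(counts, n_prime=1.0):
    """log P(C | G^h) under Dirichlet hyperparameters N'_ijk = n_prime
    (a uniform choice, assumed here for illustration).
    counts[i][j][k] = N_ijk: count for node i, parent config j, state k."""
    score = 0.0
    for node in counts:
        r_i = len(node[0])                 # number of states of X_i
        for row in node:                   # one parent configuration j
            n_ij_prime = n_prime * r_i     # N'_ij = sum_k N'_ijk
            score += lgamma(n_ij_prime) - lgamma(n_ij_prime + sum(row))
            for n_ijk in row:
                score += lgamma(n_prime + n_ijk) - lgamma(n_prime)
    return score

# Toy counts for a network A -> B over binary variables:
counts = [[[6, 4]], [[3, 3], [1, 3]]]   # node A; node B given A = 0 / A = 1
print(log_bde(counts))
```

The result is a log marginal likelihood (always negative); comparing it across candidate structures, plus the log structure prior, gives the Bayesian metric above.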
  • 17. Greedy Search Algorithm for Bayesian Network Learning
  - Generate the initial Bayesian network structure G0.
  - For m = 1, 2, 3, …, until convergence:
    - Among all the possible local changes (insertion of an edge, reversal of an edge, and deletion of an edge) in Gm–1, the one that leads to the largest improvement in the score is performed. The resulting graph is Gm.
  - Stopping criterion: Score(Gm–1) == Score(Gm).
  - At each iteration (when learning a Bayesian network of n variables), O(n²) local changes must be evaluated to select the best one.
  - Random restarts are usually adopted to escape local maxima.
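The loop above can be sketched directly (my own minimal implementation, not the authors' code; the scoring function is pluggable and random restarts are omitted for brevity). Each step enumerates the acyclic single-edge insertions, deletions, and reversals and keeps the best-scoring neighbor:

```python
from itertools import permutations

def is_acyclic(parents):
    """Kahn-style cycle check on a graph given as {node: set of parents}."""
    indeg = {n: len(ps) for n, ps in parents.items()}
    children = {n: [c for c, ps in parents.items() if n in ps] for n in parents}
    queue = [n for n, d in indeg.items() if d == 0]
    seen = 0
    while queue:
        n = queue.pop()
        seen += 1
        for c in children[n]:
            indeg[c] -= 1
            if indeg[c] == 0:
                queue.append(c)
    return seen == len(parents)

def neighbors(parents):
    """All single-edge insertions, deletions, and reversals (acyclic only)."""
    for x, y in permutations(parents, 2):
        g = {n: set(ps) for n, ps in parents.items()}
        if x in g[y]:
            g[y].discard(x)                    # deletion of x -> y
            yield g
            g2 = {n: set(ps) for n, ps in g.items()}
            g2[x].add(y)                       # reversal: now y -> x
            if is_acyclic(g2):
                yield g2
        else:
            g[y].add(x)                        # insertion of x -> y
            if is_acyclic(g):
                yield g

def greedy_search(initial, score):
    current, best = initial, score(initial)
    while True:
        improved = False
        for g in neighbors(current):
            s = score(g)
            if s > best:                       # largest improvement wins
                current, best, improved = g, s, True
        if not improved:                       # Score(G_{m-1}) == Score(G_m)
            return current

# Toy score that simply rewards the edge A -> B:
result = greedy_search({"A": set(), "B": set()},
                       lambda g: 1 if "A" in g["B"] else 0)
print(result)
```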
  • 18. Probabilistic Inference
  - Calculate the conditional probability given the values of the observed variables.
    - Junction tree algorithm
    - Sampling methods
  - General probabilistic inference is intractable.
  - However, calculating the conditional probability for classification is rather straightforward because of the properties of the Bayesian network structure.
  • 19. The Markov Blanket
  - All the variables of interest: X = {X1, X2, …, Xn}
  - For a variable Xi, its Markov blanket MB(Xi) is a subset of X − {Xi} that satisfies: P(Xi | X − {Xi}) = P(Xi | MB(Xi)).
  - Markov boundary: the minimal Markov blanket.
  • 20. Markov Blanket in Bayesian Networks
  - Given the Bayesian network structure, determining the Markov blanket of a variable is straightforward, by the conditional independence assertions.
  - The Markov blanket of a node in the Bayesian network consists of all of its parents, spouses, and children.
  [Figure: the example network over Class and Genes A–H, with the Markov blanket of Class highlighted]
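Reading the Markov blanket off the structure takes a few lines (my own sketch, not from the slides; the edge set below is inferred from the factorization on slide 21 and should be treated as an assumption about the figure):

```python
def markov_blanket(target, parents):
    """Markov blanket = parents, children, and children's other parents
    (spouses), given a DAG encoded as {node: set of parents}."""
    mb = set(parents[target])                  # parents
    for child, ps in parents.items():
        if target in ps:
            mb.add(child)                      # children
            mb |= ps - {target}                # spouses
    return mb

# The slides' example network (edges assumed from slide 21's factorization):
dag = {
    "Gene A": set(), "Gene B": set(), "Gene C": set(), "Gene D": set(),
    "Class":  {"Gene A", "Gene B"},
    "Gene E": {"Gene C"},
    "Gene F": {"Class", "Gene C"},
    "Gene G": {"Class", "Gene D"},
    "Gene H": {"Gene D"},
}
print(sorted(markov_blanket("Class", dag)))
# ['Gene A', 'Gene B', 'Gene C', 'Gene D', 'Gene F', 'Gene G']
```

Genes E and H fall outside the blanket, which is why their factors cancel in the classification derivation on the next slide.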
  • 21. Classification by Bayesian Networks II
  P(Class | Gene A, Gene B, Gene C, Gene D, Gene E, Gene F, Gene G, Gene H)
  = P(Class, Gene A, …, Gene H) / Σ_Class P(Class, Gene A, …, Gene H)
  = [P(A) P(B) P(C) P(D) P(Class | A, B) P(E | C) P(F | Class, C) P(G | Class, D) P(H | D)]
    / [P(A) P(B) P(C) P(D) P(E | C) P(H | D) Σ_Class P(Class | A, B) P(F | Class, C) P(G | Class, D)]
  = P(Class | A, B) P(F | Class, C) P(G | Class, D)
    / Σ_Class [P(Class | A, B) P(F | Class, C) P(G | Class, D)]
  where the summations are taken over all the possible class values: only the factors that mention Class (its Markov blanket) survive the cancellation.
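The cancellation makes classification cheap: multiply the class-dependent factors and renormalize. A sketch (my own, not the authors' code; the CPT numbers are made up and the example uses binary variables for brevity):

```python
def posterior(class_values, factors, evidence):
    """P(Class | evidence) is proportional to the product of the factors
    that mention Class; every other factor cancels, as on the slide."""
    scores = {c: 1.0 for c in class_values}
    for f in factors:
        for c in class_values:
            scores[c] *= f(c, evidence)
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

# Hypothetical class-dependent factors with made-up probabilities:
factors = [
    lambda c, e: [[0.7, 0.3], [0.2, 0.8]][e["A"]][c],   # P(Class | Gene A)
    lambda c, e: [[0.4, 0.1], [0.6, 0.9]][e["F"]][c],   # P(Gene F | Class)
]
post = posterior([0, 1], factors, {"A": 0, "F": 1})
print(post)
```

With this evidence the unnormalized scores are 0.7 · 0.6 = 0.42 for class 0 and 0.3 · 0.9 = 0.27 for class 1, so class 0 wins after normalization.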
  • 22. DNA Microarrays
  - Monitor thousands of gene expression levels simultaneously (vs. traditional one-gene-at-a-time experiments).
  - Fabricated by high-speed robotics, using known probes.
  • 23. A Comparative Hybridization Experiment
  [Figure: the comparative hybridization workflow, ending in image analysis]
  • 24. Mining on Gene Expression and Drug Activity Data
  - Relationships among human cancer, gene expression, and drug activity.
  - Revealing these relationships →
    - causes and mechanisms of cancer development
    - new molecular targets for anti-cancer drugs
  • 25. NCI (National Cancer Institute) Drug Discovery Program
  - The NCI60 cell lines data set
  • 26. NCI60 Cell Lines Data Set
  - From 60 human cancer cell lines: cancers of colorectal, renal, ovarian, breast, prostate, lung, and central nervous system origin, as well as leukemias and melanomas.
  - Gene expression patterns: cDNA microarray.
  - Drug activity patterns: sulphorhodamine B assay (changes in total cellular protein after 48 hours of drug treatment).
  • 27. Schematic View of the Modeling Approach
  - Inputs: gene expression data and drug activity data.
  - Preprocessing (thresholding, clustering, discretization) → selected gene, drug, and cancer-type nodes.
  - Learn a Bayesian network over these nodes → dependency analysis and probabilistic inference.
  • 28. Data Preparation
  - cDNA microarray data: gene expression profiles on 60 cell lines → a 1376 × 60 matrix.
  - Drug activity data: drug activity patterns on 60 cell lines → a 118 × 60 matrix.
  - Combined: a (1376 + 118) × 60 data matrix (1376 genes and 118 drugs over 60 samples).
  • 29. Preprocessing
  - Thresholding
    - Elimination of unknown ESTs → 805 genes.
    - Elimination of drugs with more than 4 missing values → 84 drugs.
  - Discretization
    - Local probability model for Bayesian networks: multinomial distribution.
    - Values below −c map to −1, values between −c and +c map to 0, and values above +c map to 1.
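The discretization rule can be sketched in a few lines (my own illustration, not the authors' code; the choice to map values exactly at ±c to the outer states is an assumption, since the slides do not specify boundary handling):

```python
def discretize(value, c):
    """Map an expression/activity value to {-1, 0, 1} using thresholds +/-c.
    Boundary values are assigned to the outer states (an assumption)."""
    if value <= -c:
        return -1
    if value >= c:
        return 1
    return 0

row = [-0.8, -0.2, 0.1, 0.55, 0.43]
print([discretize(v, 0.43) for v in row])  # [-1, 0, 0, 1, 1]
```

Repeating this over the whole (1376 + 118) × 60 matrix, for each of c = 0.43, 0.50, 0.60, yields the three ternary data sets used later.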
  • 30. Bayesian Network Learning for Gene-Drug Analysis
  - Large-scale Bayesian networks: several hundred nodes (up to 890).
  - General greedy search is inapplicable because of time and space complexity.
  - Search heuristics: local-to-global search.
    - Exploit the locality of Bayesian networks to reduce the entire search space.
    - The local structure: the Markov blanket.
    - Find the candidate Markov blanket (of pre-determined size k) of each node → reduce the global search space.
  • 31. Local-to-Global Search Heuristics
  Input:
  - A data set D.
  - An initial Bayesian network structure B0.
  - A decomposable scoring metric: Score(B, D) = Σ_i Score(Xi | Pa_B(Xi), D).
  Output: A Bayesian network structure B.
  Loop for n = 1, 2, …, until convergence:
  - Local search step:
    - Based on D and Bn–1, select for each Xi a set CBi^n (|CBi^n| ≤ k) of candidate Markov blanket members of Xi.
    - For each set {Xi, CBi^n}, learn the local structure and determine the Markov blanket of Xi, BLn(Xi), from this local structure.
    - Merge all Markov blanket structures G({Xi, BLn(Xi)}, Ei) into a global network structure Hn (which could be cyclic).
  - Global search step:
    - Find the Bayesian network structure Bn ⊆ Hn which maximizes Score(Bn, D) and retains all non-cyclic edges in Hn.
  • 32. Dimensionality Problem
  - The number of attributes (nodes) >> the sample size.
    - Unreliable structure of the learned Bayesian networks.
    - Probabilistic inference is nearly impossible.
  - Downsize the number of attributes by clustering in the preprocessing step.
    - Prototype: the mean of all members in a cluster.
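The prototype construction itself is simple (my own sketch, not from the slides; the row-per-attribute matrix layout and the example values are assumptions for illustration). Given cluster memberships from any clustering algorithm, each prototype is the per-sample mean of its members:

```python
def prototypes(matrix, clusters):
    """matrix[g] = profile of attribute g across samples;
    clusters = list of member-index lists.
    Each prototype is the per-sample mean of the cluster's members."""
    protos = []
    for members in clusters:
        n_samples = len(matrix[members[0]])
        proto = [sum(matrix[m][s] for m in members) / len(members)
                 for s in range(n_samples)]
        protos.append(proto)
    return protos

# Three attributes over two samples, grouped into two clusters:
expr = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
print(prototypes(expr, [[0, 1], [2]]))  # [[2.0, 3.0], [0.0, 0.0]]
```

Replacing the 805 genes and 84 drugs with such prototypes is what brings the network down to the 46 nodes of the next slide.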
  • 33. Bayesian Network with 45 Prototypes
  - Node types (46 nodes in all): 40 gene prototypes, 5 drug prototypes, and the cancer label.
  - Discretization boundaries: −c, +c.
  - Bayesian network learning: varying the candidate Markov blanket size (k = 5–15) and selecting the best one.
  - Three data sets (c = 0.43, 0.50, 0.60) → three Bayesian networks → probabilistic inference.
  - Distribution ratios (−1 / 0 / 1): c = 0.43: 33.3% / 33.3% / 33.3%; c = 0.50: 30.8% / 38.3% / 30.8%; c = 0.60: 27.4% / 45.1% / 27.4%.
  • 34. Correlations between ASNS and L-Asparaginase
  - Part of the Bayesian network (c = 0.60): the prototype for ASNS and SID W 484773, PYRROLINE-5-CARBOXYLATE REDUCTASE [5':AA037688, 3':AA037689], and the prototype for L-Asparaginase.
  - Conditional probability table P(D2 | G4), for D2 = −1 / 0 / 1:
    G4 = −1: 0.32096 / 0.27086 / 0.40818
    G4 = 0:  0.31387 / 0.41247 / 0.27366
    G4 = 1:  0.32167 / 0.34920 / 0.32913
  • 35. Bayesian Networks on a Subset of Genes and Drugs
  - Node types (17 nodes in all): 12 genes, 4 drugs, and the cancer label (genes and drugs clustered together, taken from neighboring clusters).
  - Discretization boundaries: −c, +c.
  - Bayesian network learning: general greedy search with restarts (100 times), selecting the best one.
  - Three data sets (c = 0.43, 0.50, 0.60) → three Bayesian networks → probabilistic inference.
  - Distribution ratios (−1 / 0 / 1): c = 0.43: 33.3% / 33.3% / 33.3%; c = 0.50: 30.8% / 38.3% / 30.8%; c = 0.60: 27.4% / 45.1% / 27.4%.
  • 36. Around the L-Asparaginase
  [Figure: part of the Bayesian network (c = 0.6)]
  • 37. Probabilistic Relationships Around the L-Asparaginase
  - D1: L-Asparaginase; G1: ASNS gene; G2: PYRROLINE-5-CARBOXYLATE REDUCTASE.
  - Cancer type unobserved, for D1 = −1 / 0 / 1:
    P(D1 | G1): G1 = −1: 0.19857 / 0.27471 / 0.52672; G1 = 0: 0.31110 / 0.49795 / 0.19095; G1 = 1: 0.42159 / 0.36279 / 0.21561
    P(D1 | G2): G2 = −1: 0.27510 / 0.35226 / 0.37263; G2 = 0: 0.31621 / 0.41072 / 0.27307; G2 = 1: 0.33837 / 0.39664 / 0.26499
  - Cancer type observed (= leukemia, L):
    P(D1 | G1, L): G1 = −1: 0.17536 / 0.22838 / 0.59626; G1 = 0: 0.27128 / 0.53790 / 0.19081; G1 = 1: 0.38500 / 0.42437 / 0.19063
    P(D1 | G2, L): G2 = −1: 0.23812 / 0.33853 / 0.42335; G2 = 0: 0.27978 / 0.42666 / 0.29356; G2 = 1: 0.30371 / 0.42108 / 0.27520
  • 38. Term Project #3: Diagnosis Using Bayesian Networks
  • 39. Outline
  - Task 1: Structural learning of the Bayesian network
    - Data generation from the ALARM network.
    - Structural learning of Bayesian networks using more than two kinds of algorithms and scores.
    - Compare the learned results w.r.t. edge errors, across the various sample sizes and learning algorithms.
  - Task 2: Classification using Bayesian networks
    - Arbitrarily divide the Leukemia data set into a training set and a test set.
    - Learn a Bayesian network from the training set using one of the metric-based approaches.
    - Evaluate the performance of the Bayesian network as a classifier (classification accuracy).
  • 40. Data Generation
  - Using the Netica software (http://www.norsys.com)
  - The ALARM network: 37 nodes, 46 edges.
  • 41. Structural Learning
  - Independence method: BN PowerConstructor (http://www.cs.ualberta.ca/~jcheng/bnsoft.htm)
  - Metric-based method: LearnBayes (http://www.cs.huji.ac.il/labs/compbio/LibB/)
    - The MDL, BIC, BD, and likelihood scores can be used.
  • 42. The Leukemia Data Set
  - Class type: ALL (acute lymphoblastic leukemia) or AML (acute myeloid leukemia).
  - Data set: 50 attributes (gene expression levels, 0 or 1), 72 samples.
  • 43. Submission
  - Deadline: 2002. 11. 27
  - Location: 301-419