PRMs Random generation Population Conclusion & ongoing work
Random Generation of Relational Bayesian Networks
Mouna Ben Ishak (1,2), Philippe Leray (2) and Nahla Ben Amor (1)
(1) Laboratoire de Recherche Opérationnelle de Décision et de Contrôle de Processus (LARODEC), ISG Tunis, Tunisia
(2) Laboratoire d’Informatique de Nantes Atlantique (LINA), UMR CNRS 6241, Université de Nantes, France
Random generation of relational Bayesian networks, JFRB’14, 25-27 June, IHP, Paris, France
Motivation (1/3)
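Figure: flat data representation. The training set is a single table of observations (rows x1 ... xn) described by the same features (columns f1 ... fm). A learning algorithm takes this table as input and outputs a learned model.
      f1   f2   f3   …   fm
x1    v1   v3   v2   …   v1
x2    v2   v1   v3   …   v1
x3    v1   v2   v3   …   v2
…     …    …    …    …   …
xn    v1   v3   v2   …   v1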
Motivation (2/3)
Figure: an application with presentation, business logic and data layers, where the data are stored in a relational representation.
How to use relational data with classical machine learning algorithms?
Motivation (3/3)
Propositionalization
Propositionalization (flattening relational data into a single table) has been shown not to be always appropriate for learning in relational domains (Maier et al., 10)
Relational transition
Extend classical machine learning techniques to relational data representations
Outline
1. PRMs
2. Random generation
2.1. Relational schema random generation
2.2. PRM random generation
3. Population
4. Conclusion & ongoing work
Bayesian networks (BN) (Pearl, 85)
Definition
A BN is defined by:
G: a qualitative description of the conditional dependencies / independencies between variables, as a directed acyclic graph (DAG)
Θ: a quantitative description of these dependencies, as conditional probability distributions (CPDs)
Example: Age → Occupation ← Gender, with
P(Age): Low 0.4, Middle 0.3, High 0.3
P(Gender): M 0.4, F 0.6
P(Occupation | Age, Gender):
Age, Gender   Oc1   Oc2   Oc3
Low, F        0.5   0.1   0.4
Low, M        0.3   0.5   0.2
Middle, F     0.2   0.4   0.4
Middle, M     0.9   0.1   0
High, F       0.3   0.5   0.2
High, M       0.2   0.3   0.5
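The factorization encoded by this example can be checked numerically. The Python sketch below transcribes the three CPDs of the example. It computes a joint probability through the chain rule P(Age, Gender, Occupation) = P(Age) P(Gender) P(Occupation | Age, Gender), then a marginal by summing out the parents. The function names are illustrative and are not taken from any library.

```python
from itertools import product

# CPDs transcribed from the example network Age -> Occupation <- Gender.
p_age = {"Low": 0.4, "Middle": 0.3, "High": 0.3}
p_gender = {"M": 0.4, "F": 0.6}
p_occ = {  # P(Occupation | Age, Gender)
    ("Low", "F"): {"Oc1": 0.5, "Oc2": 0.1, "Oc3": 0.4},
    ("Low", "M"): {"Oc1": 0.3, "Oc2": 0.5, "Oc3": 0.2},
    ("Middle", "F"): {"Oc1": 0.2, "Oc2": 0.4, "Oc3": 0.4},
    ("Middle", "M"): {"Oc1": 0.9, "Oc2": 0.1, "Oc3": 0.0},
    ("High", "F"): {"Oc1": 0.3, "Oc2": 0.5, "Oc3": 0.2},
    ("High", "M"): {"Oc1": 0.2, "Oc2": 0.3, "Oc3": 0.5},
}

def joint(age, gender, occ):
    """Chain rule: P(A, G, O) = P(A) * P(G) * P(O | A, G)."""
    return p_age[age] * p_gender[gender] * p_occ[(age, gender)][occ]

def marginal_occupation(occ):
    """P(O = occ), obtained by summing the joint over Age and Gender."""
    return sum(joint(a, g, occ) for a, g in product(p_age, p_gender))
```

For instance, marginal_occupation("Oc1") sums six joint terms and gives about 0.39.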
BN structure learning
Constraint-based methods
BN = independence model
⇒ find conditional independencies (CI) in the data in order to build the DAG
e.g., IC (Pearl & Verma, 91), PC (Spirtes et al., 93)
problem: reliability of CI statistical tests (ok for n ≈ 100 variables)
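The following sketch makes the notion of a CI test concrete. It is a minimal Pearson chi-square test of X ⫫ Y | Z on discrete data, with the statistic summed over the strata of Z. It relies on simplifying assumptions: data are lists of dicts, alpha is fixed at 0.05, and only small degrees of freedom (up to 5) are supported, through a hard-coded table of critical values. It is not the exact test used in IC or PC implementations.

```python
from collections import Counter
from itertools import product

# Chi-square critical values at alpha = 0.05, indexed by degrees of freedom.
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def chi2_ci_test(data, x, y, z=None):
    """Return True if X is judged independent of Y given Z (alpha = 0.05).

    data: list of dicts mapping variable names to discrete values.
    The Pearson statistic is summed over the strata defined by Z.
    """
    strata = {}
    for row in data:
        strata.setdefault(row[z] if z is not None else None, []).append(row)
    x_vals = sorted({row[x] for row in data})
    y_vals = sorted({row[y] for row in data})
    stat = 0.0
    for rows in strata.values():
        n = len(rows)
        n_xy = Counter((row[x], row[y]) for row in rows)
        n_x = Counter(row[x] for row in rows)
        n_y = Counter(row[y] for row in rows)
        for a, b in product(x_vals, y_vals):
            expected = n_x[a] * n_y[b] / n
            if expected > 0:
                stat += (n_xy[(a, b)] - expected) ** 2 / expected
    df = (len(x_vals) - 1) * (len(y_vals) - 1) * len(strata)
    if df == 0:
        return True  # a constant variable carries no dependence
    return stat < CHI2_CRIT_05[df]  # KeyError when df > 5 in this sketch
```

Take data where X = Y = Z. The test rejects marginal independence of X and Y but accepts it given Z. Constraint-based methods collect exactly this kind of statement.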
Score-based methods
BN = probabilistic model that must fit the data as well as possible
⇒ search the DAG space in order to maximize a scoring function
e.g., Maximum Weighted Spanning Tree (Chow & Liu, 68), Greedy Search (Chickering, 95), evolutionary approaches (Larrañaga et al., 96), (Wang & Yang, 10)
problem: size of the search space (ok for n ≈ 1000 variables)
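Score-based methods usually rely on decomposable scores such as BIC. With such a score, a greedy search only needs to re-score the families it modifies. The sketch below computes the BIC contribution of one family (a child and its parent set) from discrete data. The function name and the data layout (a list of dicts) are assumptions made for illustration.

```python
import math
from collections import Counter

def bic_local_score(data, child, parents):
    """BIC score of one family (child | parents) for discrete data.

    log-likelihood  sum_{j,k} N_jk * log(N_jk / N_j)
    minus penalty   0.5 * log(N) * q * (r - 1),
    with r the child arity and q the number of parent configurations
    (both estimated from the values observed in the data).
    """
    n = len(data)
    r = len({row[child] for row in data})
    q = 1
    for p in parents:
        q *= len({row[p] for row in data})
    n_jk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    n_j = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / n_j[j]) for (j, _), c in n_jk.items())
    return loglik - 0.5 * math.log(n) * q * (r - 1)
```

For each candidate edge addition, removal or reversal, a greedy search compares the sums of such local scores. It keeps the best move as long as the total score increases.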
Hybrid / local search methods
local search / neighbor identification (statistical tests)
global (score) optimization
usually for scalability reasons (ok for high n)
e.g., MMHC algorithm (Tsamardinos et al., 06)
Evaluating structure learning algorithms
Standard practice
generating data from a reference model
applying a structure learning algorithm to this data
comparing the learned and reference models
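A common measure for the comparison step is the Structural Hamming Distance (SHD), as used in (Tsamardinos et al., 06). The sketch below implements a simplified variant that compares DAGs directly, whereas the original compares equivalence classes (CPDAGs). It counts missing, extra and reversed edges, and its function name is hypothetical.

```python
def structural_hamming_distance(reference, learned):
    """Simplified SHD between two DAGs given as sets of directed edges (u, v).

    missing  = adjacencies of the reference absent from the learned graph
    extra    = adjacencies of the learned graph absent from the reference
    reversed = adjacencies present in both but with opposite orientation
    """
    ref_adj = {frozenset(edge) for edge in reference}
    learned_adj = {frozenset(edge) for edge in learned}
    missing = len(ref_adj - learned_adj)
    extra = len(learned_adj - ref_adj)
    reversed_edges = sum(1 for u, v in learned if (v, u) in reference)
    return missing + extra + reversed_edges
```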
Which reference model?
existing reference benchmarks (e.g., Asia, Alarm, ...)
randomly generated models (Ide et al., 04)
arbitrarily large BNs built by tiling (Tsamardinos et al., 06)
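As a point of comparison for randomly generated reference models, the following sketch draws a random DAG. It uses a much simpler scheme than the Markov-chain method of (Ide et al., 04): a random node order plus independent forward edges. It does not sample DAGs uniformly, and the parameter edge_prob is an assumption of this sketch. A Kahn-style acyclicity check is included for verification.

```python
import random
from collections import deque

def random_dag(n_nodes, edge_prob, seed=None):
    """Draw a random node order, then add each forward edge
    (earlier node -> later node) independently with probability edge_prob.
    Every edge follows the order, so the result is always acyclic."""
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)
    return {(order[i], order[j])
            for i in range(n_nodes)
            for j in range(i + 1, n_nodes)
            if rng.random() < edge_prob}

def is_acyclic(nodes, edges):
    """Kahn's algorithm: a directed graph is a DAG iff every node can be
    removed in topological order."""
    indegree = {v: 0 for v in nodes}
    for _, v in edges:
        indegree[v] += 1
    queue = deque(v for v, d in indegree.items() if d == 0)
    removed = 0
    while queue:
        u = queue.popleft()
        removed += 1
        for a, b in edges:
            if a == u:
                indegree[b] -= 1
                if indegree[b] == 0:
                    queue.append(b)
    return removed == len(indegree)
```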
Which kind of data?
BN learning from data... but which kind of data?
How to deal with structured data?
Relational schema
Figure: three classes, Movie (RealiseDate, Genre), User (Gender, Age, Occupation) and Vote (Rating), where each Vote references one Movie and one User.
A relational schema R
classes + their attributes (relational variables)
reference slots (e.g., Vote.Movie, Vote.User)
slot chain = a sequence of reference slots
allows walking through the relational schema to create new variables
e.g., Vote.User.User⁻¹.Movie: all the movies voted by a particular user
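Slot chains can be read as navigation queries over a relational skeleton. The following sketch encodes the Movie/User/Vote schema together with a tiny hypothetical instance (objects V1 to V3, U1, U2 and M1 to M3, not taken from the slides). It then follows Vote.User.User⁻¹.Movie, writing inverse slots with the ASCII suffix "^-1". The dict layout and function name are assumptions.

```python
# Reference slots per class: slot name -> referenced class (illustrative encoding).
SCHEMA = {"Movie": {}, "User": {}, "Vote": {"Movie": "Movie", "User": "User"}}

# A tiny hypothetical skeleton: class -> object id -> {reference slot: referenced id}.
SKELETON = {
    "Movie": {"M1": {}, "M2": {}, "M3": {}},
    "User": {"U1": {}, "U2": {}},
    "Vote": {
        "V1": {"User": "U1", "Movie": "M1"},
        "V2": {"User": "U1", "Movie": "M2"},
        "V3": {"User": "U2", "Movie": "M3"},
    },
}

def follow_slot_chain(cls, objs, chain):
    """Walk a slot chain such as ["User", "User^-1", "Movie"] from the objects
    `objs` of class `cls`; return (class reached, set of objects reached).

    A plain step follows a reference slot of the current class. A step
    "slot^-1" goes backwards to the objects whose `slot` points into the
    current set (the first class owning such a slot is used).
    """
    current = set(objs)
    for step in chain:
        if step.endswith("^-1"):
            slot = step[:-3]
            owner = next(c for c, slots in SCHEMA.items() if slots.get(slot) == cls)
            current = {o for o, refs in SKELETON[owner].items()
                       if refs.get(slot) in current}
            cls = owner
        else:
            current = {SKELETON[cls][o][step] for o in current}
            cls = SCHEMA[cls][step]
    return cls, current
```

Starting from vote V1, the chain reaches user U1, then both of U1's votes, then the movies M1 and M2.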
Probabilistic Relational Models
(Koller & Pfeffer, 98)
Definition
A PRM associated to R:
a qualitative dependency structure S (possibly with long slot chains and aggregation functions)
a set of parameters Θ_S
Figure: a PRM over the Movie/User/Vote schema (Movie: RealiseDate, Genre; User: Gender, Age, Occupation; Vote: Rating), with two of its CPDs:
P(User.Gender): M 0.4, F 0.6
P(Vote.Rating | Movie.Genre, User.Gender):
Movie.Genre, User.Gender   Low   High
Drama, M                   0.5   0.5
Drama, F                   0.3   0.7
Horror, M                  0.2   0.8
Horror, F                  0.9   0.1
Comedy, M                  0.5   0.5
Comedy, F                  0.6   0.4
Aggregators
Vote.User.User⁻¹.Movie.Genre → Vote.Rating
the rating a user gives to a movie can depend on the genres of all the movies voted by this user
how to describe a dependency on an unknown number of parents?
solution: use an aggregated value, e.g., MODE
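The MODE aggregator turns the variable-size multiset of genres reached by the slot chain into a single parent value. An ordinary CPD table can then be used. In this sketch, the table is the Vote.Rating CPD of the PRM example. We assume, for illustration only, that its genre column is indexed by the aggregated value. Breaking ties by first occurrence is also a choice of this sketch.

```python
from collections import Counter

# P(Vote.Rating | genre, User.Gender), transcribed from the PRM example.
RATING_CPD = {
    ("Drama", "M"): {"Low": 0.5, "High": 0.5},
    ("Drama", "F"): {"Low": 0.3, "High": 0.7},
    ("Horror", "M"): {"Low": 0.2, "High": 0.8},
    ("Horror", "F"): {"Low": 0.9, "High": 0.1},
    ("Comedy", "M"): {"Low": 0.5, "High": 0.5},
    ("Comedy", "F"): {"Low": 0.6, "High": 0.4},
}

def mode(values):
    """MODE aggregator: most frequent value; ties go to the first one seen."""
    return Counter(values).most_common(1)[0][0]

def rating_distribution(genres_voted_by_user, user_gender):
    """CPD row of Vote.Rating when the genre parent is MODE(...) over a
    multiset of any size, e.g. the values of Vote.User.User^-1.Movie.Genre."""
    return RATING_CPD[(mode(genres_voted_by_user), user_gender)]
```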
Ground Bayesian Network
GBN
BN created from one PRM and an instantiated database
= relational skeleton + probabilistic dependencies
used for probabilistic inference
Figure: GBN for a skeleton with three users U1, U2, U3 (Age, Gender, Occupation), five movies M1 to M5 (Genre, RealiseDate) and nine votes, each with its own Rating node: (U1, M1), (U1, M2), (U2, M1), (U2, M3), (U2, M4), (U3, M1), (U3, M2), (U3, M3), (U3, M5).
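Building a GBN amounts to unrolling the PRM over the skeleton. Each (object, attribute) pair becomes a node, and each object receives edges copied from the class-level dependencies. The sketch below uses the skeleton of the figure. As an assumption, it unrolls only the dependency Vote.Rating ← (Vote.Movie.Genre, Vote.User.Gender) suggested by the Vote.Rating CPD of the PRM example, and omits other dependencies for brevity. The function name is hypothetical.

```python
# Skeleton transcribed from the figure: users, movies and votes as (user, movie) pairs.
USERS = ["U1", "U2", "U3"]
MOVIES = ["M1", "M2", "M3", "M4", "M5"]
VOTES = [("U1", "M1"), ("U1", "M2"), ("U2", "M1"), ("U2", "M3"), ("U2", "M4"),
         ("U3", "M1"), ("U3", "M2"), ("U3", "M3"), ("U3", "M5")]

def ground_bn(users, movies, votes):
    """Unroll the class-level dependency Vote.Rating <- (Vote.Movie.Genre,
    Vote.User.Gender) over a skeleton.

    Returns the GBN nodes, as (object, attribute) pairs, and its directed edges.
    A vote object is identified here by its (user, movie) pair.
    """
    nodes = [(u, a) for u in users for a in ("Age", "Gender", "Occupation")]
    nodes += [(m, a) for m in movies for a in ("Genre", "RealiseDate")]
    edges = []
    for user, movie in votes:
        rating = ((user, movie), "Rating")
        nodes.append(rating)
        edges.append(((movie, "Genre"), rating))   # via reference slot Vote.Movie
        edges.append(((user, "Gender"), rating))   # via reference slot Vote.User
    return nodes, edges
```

Every Rating node reuses the single Vote.Rating CPD of the PRM. Only the graph grows with the skeleton, while the parameters stay shared.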
PRM structure learning
Constraint-based methods
relational PC (Maier et al., 10), relational CD (Maier et al., 13)
do not deal with aggregation functions
Score-based methods
greedy search (Getoor et al., 07)
Hybrid methods
relational MMHC (Ben Ishak et al., in progress)
Criticism of previous works
lack of an evaluation process within a common framework
absence of relational benchmarks for evaluating algorithms
absence of a relational data generation process
Proposal
a synthetic approach to randomly generate and populate PRMs and databases
PRMs random generation
Related work
(Maier et al., 10, 13)
relational schemas are generated as tree structures, which is too simple
(Wuillemin et al., 12)
object-oriented paradigm rather than a relational one
neither population of, nor interaction with, a relational database
The overall process
Figure: the overall process.
Model generation: relational schema + probabilistic dependencies ⇒ PRM
Instance generation: relational skeleton + probabilistic dependencies ⇒ (instantiate) ground BN ⇒ (sample) DB instance
The overall platform
Figure: PRM API under the PILGRIM platform. Components: RDB, visualization, inference, PRM learning (parameter learning: statistical learning + Bayesian learning; structure learning: score-based + constraint-based + hybrid), benchmarking + evaluation.
Generating the relational schema
Hypotheses
with respect to the relational model definition (Date, 08):
avoid referential cycles when generating constraints
∀ Xi, Xj ∈ X, there exists a referential path between Xi and Xj
⇒ search for DAG structures with a single connected component
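Both hypotheses can be met by construction, as in the hedged sketch below, which is not the authors' algorithm. Classes are numbered, and each class k > 0 first references one random earlier class, which links all classes into a single connected component. Extra references only point to earlier classes, so no referential cycle can appear. The parameter extra_ref_prob and the function name are assumptions.

```python
import random

def random_relational_schema(n_classes, extra_ref_prob, seed=None):
    """Random referential structure over classes 0..n_classes-1.

    An edge (i, j) means that class i holds a foreign key referencing class j.
    Each class k > 0 first references one random earlier class, which links
    every class into a single connected component; further references to
    earlier classes are added with probability extra_ref_prob. Because every
    reference points to an earlier class, no referential cycle can appear.
    """
    rng = random.Random(seed)
    refs = set()
    for k in range(1, n_classes):
        refs.add((k, rng.randrange(k)))
        for j in range(k):
            if rng.random() < extra_ref_prob:
                refs.add((k, j))
    return refs
```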
Generating the PRM
Goal
randomly generate the probabilistic dependencies S between the attributes of the generated classes
sample the CPDs as for usual BNs
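One way to sample CPDs "as for usual BNs" is to draw one distribution per parent configuration from a symmetric Dirichlet prior. The sketch below does so with normalized Gamma draws from Python's standard random module. The Dirichlet choice, the concentration parameter alpha and the function name are assumptions, not necessarily what the authors used.

```python
import random
from itertools import product

def random_cpd(child_card, parent_cards, alpha=1.0, seed=None):
    """Sample a random CPD for a discrete child.

    Returns {parent configuration: [P(child = 0), ..., P(child = r - 1)]},
    one distribution per configuration, each drawn from a symmetric
    Dirichlet(alpha) by normalising independent Gamma(alpha, 1) draws.
    A root node (no parents) gets a single entry keyed by ().
    """
    rng = random.Random(seed)
    cpd = {}
    for config in product(*(range(card) for card in parent_cards)):
        draws = [rng.gammavariate(alpha, 1.0) for _ in range(child_card)]
        total = sum(draws)
        cpd[config] = [d / total for d in draws]
    return cpd
```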
Hypothesis
the dependency structure S should be a DAG
a descriptive attribute may depend on another one, but through which slot chain?
⇒ we need a user-defined maximum slot chain length Kmax
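Given a maximum length Kmax, a breadth-first walk over reference slots and their inverses enumerates the candidate slot chains from a class. The encoding below is a hypothetical one chosen for this sketch: triples (owner class, slot, referenced class), with the suffix "^-1" marking an inverse slot.

```python
# Reference slots as (owner class, slot name, referenced class); illustrative encoding.
SLOTS = [("Vote", "Movie", "Movie"), ("Vote", "User", "User")]

def slot_chains(start, k_max, slots=SLOTS):
    """Enumerate all slot chains of length 1..k_max starting at class `start`.

    Each step either follows a reference slot (owner -> referenced class)
    or its inverse, written "slot^-1" (referenced class -> owner).
    Returns a list of (chain as a tuple of step names, class reached).
    """
    chains = []
    frontier = [((), start)]
    for _ in range(k_max):
        next_frontier = []
        for chain, cls in frontier:
            for owner, name, target in slots:
                if owner == cls:
                    next_frontier.append((chain + (name,), target))
                if target == cls:
                    next_frontier.append((chain + (name + "^-1",), owner))
        chains.extend(next_frontier)
        frontier = next_frontier
    return chains
```

The number of chains can grow exponentially with their length, which is one reason to bound it with Kmax.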
Principle
step I: add dependencies while keeping a DAG structure, first within classes, then between classes
step II: random choice of a legal slot chain, weighted by its length
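Both steps can be sketched under stated assumptions. Attributes are (class, attribute) pairs. In step I, a candidate dependency is rejected when it would close a directed cycle; for brevity, the sketch ignores the within/between-class ordering. In step II, a legal slot chain is drawn with weight 1/length. The slides only say that the choice is weighted by length, so the 1/length weighting, which favors short chains, is our assumption, as are all function names.

```python
import random

def creates_cycle(edges, u, v):
    """True if adding u -> v to the directed graph `edges` would close a
    cycle, i.e. if u is already reachable from v."""
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(b for a, b in edges if a == node)
    return False

def add_random_dependencies(attributes, n_deps, seed=None, max_tries=1000):
    """Step I sketch: add random parent -> child dependencies between
    attributes, rejecting any candidate that would create a cycle."""
    rng = random.Random(seed)
    edges = set()
    for _ in range(max_tries):
        if len(edges) >= n_deps:
            break
        u, v = rng.sample(attributes, 2)
        if (u, v) not in edges and not creates_cycle(edges, u, v):
            edges.add((u, v))
    return edges

def pick_slot_chain(chains, rng):
    """Step II sketch: choose a legal slot chain with weight 1 / length
    (an assumed weighting that favours short chains)."""
    weights = [1.0 / len(chain) for chain in chains]
    return rng.choices(chains, weights=weights, k=1)[0]
```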
Example
Figure: a randomly generated PRM over four classes (Clazz0 to Clazz3), with descriptive attributes (att0 to att3) and foreign-key attributes such as clazz1fkatt10, clazz0fkatt03 and clazz2fkatt23; some dependencies follow slot chains whose values are aggregated with MODE.
GBN creation and sampling
Generating the relational skeleton
by generating a random number of objects per class
adding links between objects : all referencing classes have
their generated objects related to objects from referenced
classes
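A minimal sketch of skeleton generation follows. It assumes a uniform number of objects per class in [n_min, n_max] and uniformly chosen referenced objects. Both choices are assumptions, since the slides only say "random". Object identifiers such as User0, as well as the function name, are hypothetical.

```python
import random

def random_skeleton(refs, n_min, n_max, seed=None):
    """Generate a random relational skeleton.

    refs: class -> {reference slot: referenced class}.
    Each class receives a uniform random number of objects in [n_min, n_max]
    (n_min >= 1 so that referenced classes are never empty); every object of
    a referencing class then points, for each of its reference slots, to a
    uniformly chosen object of the referenced class.
    Returns class -> {object id: {reference slot: referenced object id}}.
    """
    rng = random.Random(seed)
    skeleton = {cls: {f"{cls}{i}": {} for i in range(rng.randint(n_min, n_max))}
                for cls in refs}
    for cls, slots in refs.items():
        for links in skeleton[cls].values():
            for slot, target in slots.items():
                links[slot] = rng.choice(sorted(skeleton[target]))
    return skeleton
```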
Creating the GBN
the GBN is constructed using the CPDs already defined by the PRM
Populating the database
sampling from the GBN
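Populating the database can then be done by forward (ancestral) sampling from the GBN. The sampler visits nodes in a topological order and draws each value from its CPD, given the already sampled parents. The sketch below is generic. In a GBN, all nodes of the same class attribute (e.g., every Rating node) would share the CPD defined once in the PRM. The data layout (parents and cpds dicts) and the function name are assumptions.

```python
import random

def forward_sample(parents, cpds, rng):
    """Ancestral (forward) sampling of one joint instantiation of a BN/GBN.

    parents: node -> tuple of parent nodes.
    cpds: node -> {tuple of parent values: {value: probability}}.
    Nodes are first put in a topological order so parents are sampled first.
    """
    order, placed = [], set()
    while len(order) < len(parents):
        progressed = False
        for node, pars in parents.items():
            if node not in placed and all(p in placed for p in pars):
                order.append(node)
                placed.add(node)
                progressed = True
        if not progressed:
            raise ValueError("the graph contains a cycle")
    sample = {}
    for node in order:
        dist = cpds[node][tuple(sample[p] for p in parents[node])]
        values = list(dist)
        sample[node] = rng.choices(values, weights=[dist[v] for v in values], k=1)[0]
    return sample
```

Each call yields one joint instantiation, that is, one set of attribute values to write into the database tables.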
Conclusion - Perspectives
Conclusion
we proposed a process to randomly generate PRMs and instantiate them to populate a relational database
Ongoing work
propose a new approach to learn the PRM structure from relational data
compare it with existing state-of-the-art approaches, on databases produced by our random generation process
extend our generation approach to other relational probabilistic graphical models (e.g., DAPER)
Coming up :-)
Thursday 9:30 - Ghada Trabelsi - Evaluating structure learning algorithms for dynamic BNs
Thursday 10:00 - Anthony Coutant - Learning an extension of PRMs
Friday 10:30 - Maroua Haddad - Learning possibilistic networks
Any questions?