Motivations
Dissimilarities and distances between vertices
Kernel SOM
Application and comments
Graph mining with kernel self-organizing map
Nathalie Villa-Vialaneix
http://www.nathalievilla.org
Joint work with Fabrice Rossi, INRIA, Rocquencourt, France
Institut de Mathématiques de Toulouse - IUT de Carcassonne, Université de Perpignan, France
SanTouVal, February 1st, 2008
Nathalie Villa - nathalie.villa@math.univ-toulouse.fr SanTouVal - Feb. 2008
Table of contents
1 Motivations
2 Dissimilarities and distances between vertices
3 Kernel SOM
4 Application and comments
Exploring a big historic database
Data
1000 agrarian contracts,
from four seigneuries (about 10 villages) in the south-west of France,
established between 1250 and 1350 (before the Hundred Years’ War).
Historian’s questions:
family or geographical social links?
central people having a main social role?
...
⇒ Data mining is required.
A graph clustering problem
From the database, building a weighted graph:
with 615 vertices $x_1, \dots, x_n$ := peasants found in the contracts;
with weights $(w_{i,j})_{i,j=1,\dots,n}$, where $w_{i,j}$ := number of contracts in which $x_i$ and $x_j$ are both mentioned.
Number of vertices: 615
Number of edges: 4193
Total of weights: 40 329
Diameter: 10
Density: 2.2%
Goal: clustering the vertices into homogeneous social groups to understand the structure of the peasant community.
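To make the construction concrete, here is a minimal sketch in Python (NumPy), assuming each contract is available as a list of the people it mentions. The toy `contracts` list and the function names `build_weight_matrix` and `density` are hypothetical and are not taken from the historical database. The density is computed as the number of edges divided by n(n-1)/2, which gives the reported 2.2% for 615 vertices and 4193 edges.

```python
import itertools
import numpy as np

def build_weight_matrix(contracts, n):
    """w[i, j] = number of contracts mentioning both person i and person j."""
    w = np.zeros((n, n), dtype=int)
    for people in contracts:
        # Each unordered pair of distinct people in a contract adds 1 to the weight.
        for i, j in itertools.combinations(sorted(set(people)), 2):
            w[i, j] += 1
            w[j, i] += 1
    return w

def density(n_vertices, n_edges):
    """Fraction of the n(n-1)/2 possible edges that are present."""
    return n_edges / (n_vertices * (n_vertices - 1) / 2)

# Toy data (hypothetical): 4 people, 3 contracts.
contracts = [[0, 1, 2], [0, 1], [2, 3]]
w = build_weight_matrix(contracts, n=4)
n_edges = int(np.count_nonzero(np.triu(w, k=1)))
print(n_edges, density(4, n_edges))

# Figures reported on the slide: 615 vertices, 4193 edges.
slide_density_percent = round(100 * density(615, 4193), 1)
print(slide_density_percent)
```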
Other fields modelled by large graphs
Computer science: World Wide Web, P2P networks...
Social networks
Biology: protein interactions, neuronal networks...
Business, management: transportation networks, industry partnerships...
Question: understanding the structure of these large graphs
Clustering: building relevant homogeneous groups;
Graph drawing: giving a global representation of the graph.
Here: Self-Organizing Map for non-vectorial data.
Usual dissimilarities between vertices
The Dice (Jaccard) index:
$$D(x_i, x_j) = \frac{|\Gamma(x_i) \cap \Gamma(x_j)|}{|\Gamma(x_i)| + |\Gamma(x_j)|}$$
(non-weighted graphs);
Dissimilarities based on the shortest paths;
Dissimilarities or distances based on the Laplacian matrix: spectral clustering.
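The index above can be sketched in a few lines of Python, assuming that $\Gamma(x)$ denotes the set of neighbours of $x$ (the slide does not define it) and that the graph is stored as a 0/1 adjacency matrix. The formula is implemented exactly as written on the slide; the function name `dice_index` and the toy graph are hypothetical.

```python
import numpy as np

def dice_index(adj, i, j):
    """|G(i) & G(j)| / (|G(i)| + |G(j)|) for a 0/1 adjacency matrix."""
    neigh_i = set(np.flatnonzero(adj[i]).tolist())
    neigh_j = set(np.flatnonzero(adj[j]).tolist())
    denom = len(neigh_i) + len(neigh_j)
    # Two isolated vertices share nothing: return 0 rather than dividing by 0.
    return len(neigh_i & neigh_j) / denom if denom else 0.0

# Toy graph (hypothetical): edges 0-1, 0-2, 1-2, 2-3.
adj = np.zeros((4, 4), dtype=int)
for a, b in [(0, 1), (0, 2), (1, 2), (2, 3)]:
    adj[a, b] = adj[b, a] = 1

d01 = dice_index(adj, 0, 1)   # neighbours {1, 2} and {0, 2}, one in common
d03 = dice_index(adj, 0, 3)   # neighbours {1, 2} and {2}, one in common
print(d01, d03)
```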
Laplacian
Definitions
For a graph with vertices $V = \{x_1, \dots, x_n\}$ having positive weights $(w_{i,j})_{i,j=1,\dots,n}$ such that, for all $i, j = 1, \dots, n$, $w_{i,j} = w_{j,i}$ and $d_i = \sum_{j=1}^{n} w_{i,j}$,
Laplacian: $L = (L_{i,j})_{i,j=1,\dots,n}$ where
$$L_{i,j} = \begin{cases} -w_{i,j} & \text{if } i \neq j \\ d_i & \text{if } i = j \end{cases}$$
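As a small illustration of this definition, the following Python sketch builds $L$ from a weight matrix, assuming the matrix is symmetric with a zero diagonal (no self-loops). The function name `laplacian` and the toy weights are hypothetical.

```python
import numpy as np

def laplacian(w):
    """L = D - W with d_i = sum_j w[i, j]; assumes w symmetric, zero diagonal."""
    w = np.asarray(w, dtype=float)
    return np.diag(w.sum(axis=1)) - w

# Toy weights (hypothetical): vertex 0 linked to 1 (weight 2) and to 2 (weight 1).
w = np.array([[0, 2, 1],
              [2, 0, 0],
              [1, 0, 0]])
L = laplacian(w)
print(L)
```

Each row of $L$ sums to zero, which is why constant vectors belong to its kernel.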
Laplacian: property I [von Luxburg, 2007]
Connected subgraphs
$\operatorname{Ker} L = \operatorname{Span}\{\mathbb{1}_{A_1}, \dots, \mathbb{1}_{A_k}\}$ where $A_i$ indicates the positions of the vertices of the $i$th connected component of the graph.
Example (figure: graph on the vertices 1 to 5 with connected components {1, 4, 5} and {2, 3}):
$$\operatorname{Ker} L = \operatorname{Span}\left\{ (1, 0, 0, 1, 1)^T ;\ (0, 1, 1, 0, 0)^T \right\}$$
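This property can be checked numerically. The sketch below, in Python, uses the two components of the example, {1, 4, 5} and {2, 3}; the edges inside each component are an assumption, since the figure itself is not available here.

```python
import numpy as np

# Hypothetical edges (1-based labels); only the two components
# {1, 4, 5} and {2, 3} come from the example.
edges = [(1, 4), (4, 5), (2, 3)]
n = 5
w = np.zeros((n, n))
for a, b in edges:
    w[a - 1, b - 1] = w[b - 1, a - 1] = 1.0
L = np.diag(w.sum(axis=1)) - w

# The dimension of Ker L is the number of (numerically) zero eigenvalues.
eigvals = np.linalg.eigvalsh(L)
kernel_dim = int(np.sum(np.abs(eigvals) < 1e-10))

# Indicator vectors of the two components.
ind_a1 = np.array([1.0, 0.0, 0.0, 1.0, 1.0])
ind_a2 = np.array([0.0, 1.0, 1.0, 0.0, 0.0])
print(kernel_dim)
print(np.allclose(L @ ind_a1, 0.0), np.allclose(L @ ind_a2, 0.0))
```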
Laplacian: property II [Boulet et al., 2008]
Perfect community: complete subgraph (clique) whose vertices share the same neighbors outside the clique.
Laplacian and perfect communities
For a non-weighted graph,
the graph has a perfect community with $m$ vertices
⇔
$L$ has $m$ eigenvectors that all vanish on the same $n - m$ coordinates.
Application (figure not recovered).
But: only 1/3 of the graph can be drawn this way.
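Laplacian: property III [von Luxburg, 2007]
Min Cut problem: suppose that we have a connected graph. Finding a classification of the vertices of the graph, $A_1, \dots, A_k$, such that
$$\frac{1}{2} \sum_{i=1}^{k} \sum_{j \in A_i,\, j' \notin A_i} w_{j,j'}$$
is minimum is equivalent to finding
$$H = \arg\min_{h \in \mathbb{R}^{n \times k}} \operatorname{Tr}\left(h^T L h\right) \quad \text{subject to} \quad \begin{cases} h^T h = I \\ h_i = \frac{1}{\sqrt{|A_i|}} \mathbb{1}_{A_i} \end{cases}$$
where $h_i$ is the $i$th column of $h$. (With this normalization, each term of the trace is the cut of $A_i$ divided by $|A_i|$, which is the RatioCut criterion of [von Luxburg, 2007].)
⇒ NP-complete problem.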
This problem can be approached by the relaxed problem
$$H = \arg\min_{h \in \mathbb{R}^{n \times k}} \operatorname{Tr}\left(h^T L h\right) \quad \text{subject to} \quad h^T h = I.$$
Spectral clustering: find the eigenvectors of $L$ associated with its $k$ smallest eigenvalues, stored as the columns of $H$, and make the classification on the rows of $H$.
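The spectral clustering recipe can be sketched as follows in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the rows of $H$ are clustered with a plain k-means (Lloyd iterations with a farthest-point start), which is one common choice that the slide does not specify, and the toy graph and the function names are hypothetical.

```python
import numpy as np

def spectral_embedding(L, k):
    """Columns = eigenvectors of L for its k smallest eigenvalues (n x k)."""
    _, vecs = np.linalg.eigh(L)   # eigh returns eigenvalues in ascending order
    return vecs[:, :k]

def kmeans(X, k, n_iter=50):
    """Plain Lloyd iterations with a deterministic farthest-point start."""
    centers = [X[0]]
    for _ in range(1, k):
        d2 = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d2)])
    centers = np.array(centers)
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Toy graph (hypothetical): two triangles joined by one weak edge.
w = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    w[a, b] = w[b, a] = 1.0
w[2, 3] = w[3, 2] = 0.1
L = np.diag(w.sum(axis=1)) - w

H = spectral_embedding(L, k=2)
labels = kmeans(H, k=2)
print(labels)
```

On this toy graph the two triangles end up in different classes, because the second eigenvector takes opposite signs on the two sides of the weak edge.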
A regularized version of L
Regularization: the diffusion matrix: for $\beta > 0$,
$$K^\beta = e^{-\beta L} = \sum_{k=0}^{+\infty} \frac{(-\beta L)^k}{k!}.$$
⇒
$$k^\beta : V \times V \to \mathbb{R}, \quad (x_i, x_j) \mapsto K^\beta_{i,j}$$
diffusion kernel (or heat kernel).
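For a symmetric $L$, the matrix exponential can be computed from the eigendecomposition $L = U \Lambda U^T$ as $K^\beta = U e^{-\beta \Lambda} U^T$. The Python sketch below does this and compares the result with a partial sum of the exponential series starting at $k = 0$; the function names and the toy graph are hypothetical.

```python
import numpy as np

def diffusion_kernel(L, beta):
    """K = exp(-beta * L) for a symmetric L, via its eigendecomposition."""
    eigvals, eigvecs = np.linalg.eigh(L)
    return (eigvecs * np.exp(-beta * eigvals)) @ eigvecs.T

def truncated_series(L, beta, n_terms=30):
    """Partial sum of sum_{k>=0} (-beta L)^k / k! (used only as a check)."""
    term = np.eye(L.shape[0])
    total = term.copy()
    for k in range(1, n_terms):
        term = term @ (-beta * L) / k
        total += term
    return total

# Toy graph (hypothetical): path 0 - 1 - 2 with unit weights.
w = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = np.diag(w.sum(axis=1)) - w
K = diffusion_kernel(L, beta=0.5)

print(np.allclose(K, truncated_series(L, beta=0.5)))
print(K.sum(axis=1))   # rows sum to 1 because L sends constant vectors to 0
```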
Diffusion process on the graph
If $Z_0 = (1\ 1\ 1\ \dots\ 1\ 1)^T$ is the “energy” of each vertex at time 0 and if a small fraction $\epsilon$ of this energy is propagated along the edges of the graph at each time step, then after $t$ steps, the energy of the vertices of the graph is:
$$Z_t = (I - \epsilon L)^t Z_0$$
Limits: replacing the unit time step by $\Delta t$, i.e. $t \to t/(\Delta t)$ and $\epsilon \to \epsilon \Delta t$, then letting $(\Delta t) \to 0$ (continuous process) gives
$$\lim_{\Delta t \to 0} Z_t = e^{-\epsilon t L} Z_0 = K^{\epsilon t} Z_0$$
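The passage to the limit can be illustrated numerically. The Python sketch below assumes the sign convention $K^\beta = e^{-\beta L}$ used for the diffusion matrix, so that one step of length $\Delta t$ multiplies the energy vector by $I - \epsilon \Delta t L$; the values of `eps` and `t` and the toy graph are hypothetical.

```python
import numpy as np

# Toy graph (hypothetical): path 0 - 1 - 2 with unit weights.
w = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = np.diag(w.sum(axis=1)) - w
eps, t = 0.1, 5.0

# Continuous-time limit exp(-eps * t * L), via the eigendecomposition of L.
vals, vecs = np.linalg.eigh(L)
K_limit = (vecs * np.exp(-eps * t * vals)) @ vecs.T

def discrete_diffusion(dt):
    """(I - eps*dt*L)^(t/dt): t/dt steps, each propagating a fraction eps*dt."""
    steps = int(round(t / dt))
    return np.linalg.matrix_power(np.eye(3) - eps * dt * L, steps)

# Largest entrywise gap to the limit for decreasing time steps.
errors = [float(np.abs(discrete_diffusion(dt) - K_limit).max())
          for dt in (1.0, 0.1, 0.01)]
print(errors)
```

The gap shrinks as the time step decreases, as expected from $(1 - a/m)^m \to e^{-a}$.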
Properties
1 Diffusion on the graph: $k^\beta(x_i, x_j)$ is the quantity of energy accumulated in $x_j$ after a given time if energy 1 is injected in $x_i$ at time 0 and if diffusion is done continuously along the edges. $\beta$: intensity of diffusion;
2 Regularization operator: for $u \in \mathbb{R}^n \sim V$, $u^T (K^\beta)^{-1} u$ is higher for vectors $u$ that vary a lot over “close” vertices of the graph. $\beta$: intensity of regularization (for small $\beta$, direct neighbors are more important);
3 Reproducing kernel property: $k^\beta$ is symmetric and positive ⇒ there exist a Hilbert space $(\mathcal{H}, \langle \cdot, \cdot \rangle)$ and $\phi : V \to \mathcal{H}$ such that
$$k^\beta(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle.$$
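For a finite vertex set, one possible feature map can be written down explicitly. The Python sketch below takes $\mathcal{H} = \mathbb{R}^n$ and builds $\phi(x_i)$ from the eigendecomposition of the kernel matrix; this is only one valid choice among many, and the function name `feature_map`, the toy graph and the value $\beta = 1$ are hypothetical.

```python
import numpy as np

def feature_map(K):
    """Rows are phi(x_i) in R^n, built so that Phi @ Phi.T reproduces K."""
    vals, vecs = np.linalg.eigh(K)
    vals = np.clip(vals, 0.0, None)   # guard against tiny negative round-off
    return vecs * np.sqrt(vals)

# Toy graph (hypothetical): path 0 - 1 - 2 with unit weights.
w = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]])
L = np.diag(w.sum(axis=1)) - w
lap_vals, lap_vecs = np.linalg.eigh(L)
K = (lap_vecs * np.exp(-1.0 * lap_vals)) @ lap_vecs.T   # diffusion kernel, beta = 1

Phi = feature_map(K)
# Inner products of the mapped vertices give back the kernel values.
print(np.allclose(Phi @ Phi.T, K))
```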
Kohonen map
Mapping the data onto a 2-dimensional map
Each neuron of the map, $i = 1, \dots, M$, is associated with a prototype, $p_i \in \mathcal{H}$;
Neurons are related to each other by a neighborhood relationship (“distance”: $d$).
Classifying the vertices on the map
Each $x_i$ is associated with a neuron (cluster or class) of the map, $f(x_i)$.
Preserving the initial topology
Energy
The goal is to minimize the energy of the map:
$$E = \sum_{i=1}^{M} \int h(d(f(x), i)) \, \|x - p_i\|_{\mathcal{H}}^2 \, dP(x)$$
where $h$ is a decreasing function (e.g. $h(t) = \alpha e^{-t/2\sigma^2}$).
The energy is approximated by its empirical version:
$$E^n = \sum_{j=1}^{n} \sum_{i=1}^{M} h(d(f(x_j), i)) \, \|x_j - p_i\|_{\mathcal{H}}^2$$
and its minimization is approximated by the SOM algorithm.
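The empirical energy can be evaluated directly when the data are ordinary vectors. The Python sketch below makes several assumptions that are not stated on the slide: $f(x_j)$ is taken to be the neuron with the nearest prototype, $d$ is the Euclidean distance between neuron positions on the grid, and $h(t) = \alpha e^{-t/2\sigma^2}$ with $\alpha = \sigma = 1$. The function name and the toy data are hypothetical.

```python
import numpy as np

def empirical_energy(X, prototypes, coords, sigma=1.0, alpha=1.0):
    """E^n = sum_j sum_i h(d(f(x_j), i)) * ||x_j - p_i||^2 for vector data."""
    # Squared distances between every observation and every prototype: (n, M).
    d2 = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    f = d2.argmin(axis=1)   # assumed assignment: nearest prototype
    # Grid distances between neurons: (M, M).
    grid_d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2))
    h = alpha * np.exp(-grid_d / (2.0 * sigma ** 2))
    # h[f] has shape (n, M): row j holds h(d(f(x_j), i)) for all neurons i.
    return float((h[f] * d2).sum()), f

# Toy data (hypothetical): 4 points in the plane, 2 neurons on a 1 x 2 grid.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 0.0], [5.0, 1.0]])
prototypes = np.array([[0.0, 0.5], [5.0, 0.5]])
coords = np.array([[0.0, 0.0], [0.0, 1.0]])
energy, f = empirical_energy(X, prototypes, coords)
print(f, energy)
```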
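Batch kernel SOM [Villa and Rossi, 2007]
Initialize randomly $\gamma^0_{ji} \in \mathbb{R}$ ($i = 1, \dots, n$; $j = 1, \dots, M$) and $p^0_j = \sum_{i=1}^{n} \gamma^0_{ji} \phi(x_i)$.
Then, for $l = 1, \dots, n$ repeat
Assignment step
for all $x_i$,
$$f^l(x_i) = \arg\min_{j=1,\dots,M} \left\| \phi(x_i) - \sum_{i'=1}^{n} \gamma^{l-1}_{ji'} \phi(x_{i'}) \right\|_{\mathcal{H}}$$
Representation step
for all $j = 1, \dots, M$,
$$\gamma^l_j = \arg\min_{\gamma \in \mathbb{R}^n} \sum_{i=1}^{n} h(d(f^l(x_i), j)) \left\| \phi(x_i) - \sum_{i'=1}^{n} \gamma_{i'} \phi(x_{i'}) \right\|_{\mathcal{H}}^2$$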
With the kernel trick, both steps can be written in terms of $k^\beta$ only:
Assignment step
for all $x_i$,
$$f^l(x_i) = \arg\min_{j=1,\dots,M} \left( \sum_{u,u'=1}^{n} \gamma^{l-1}_{ju} \gamma^{l-1}_{ju'} k^\beta(x_u, x_{u'}) - 2 \sum_{u=1}^{n} \gamma^{l-1}_{ju} k^\beta(x_u, x_i) \right)$$
Representation step
$$\gamma^l_{ji} = \frac{h(d(f^l(x_i), j))}{\sum_{i'=1}^{n} h(d(f^l(x_{i'}), j))}$$
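The two kernelized steps can be sketched in Python as below. This is a minimal illustration under stated assumptions rather than the implementation of [Villa and Rossi, 2007]: the neurons sit on a rectangular grid, $d$ is the Euclidean distance on that grid, $h(t) = e^{-t/2\sigma^2}$ with a fixed $\sigma$, the initial coefficients are random and normalized by row, and the number of iterations is a parameter. The function name `batch_kernel_som`, its parameters and the toy graph are hypothetical.

```python
import numpy as np

def batch_kernel_som(K, grid_shape=(2, 2), n_iter=20, sigma=1.0, seed=0):
    """Batch kernel SOM on a precomputed (n x n) kernel matrix K."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    rows, cols = grid_shape
    M = rows * cols
    # Neuron positions on the grid and neighborhood values h(d(j, j')).
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    grid_d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=2))
    h = np.exp(-grid_d / (2.0 * sigma ** 2))

    # Random initial coefficients gamma[j, i], each row normalized to sum to 1.
    gamma = rng.random((M, n))
    gamma /= gamma.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Assignment step: squared distance to prototype j, up to k(x_i, x_i).
        quad = np.einsum("ju,uv,jv->j", gamma, K, gamma)     # shape (M,)
        cross = gamma @ K                                     # shape (M, n)
        f = np.argmin(quad[:, None] - 2.0 * cross, axis=0)    # shape (n,)
        # Representation step: gamma[j, i] = h(d(f(x_i), j)) / sum_i' h(d(f(x_i'), j)).
        weights = h[:, f]                                     # shape (M, n)
        gamma = weights / weights.sum(axis=1, keepdims=True)
    return f, gamma

# Toy graph (hypothetical): two triangles joined by one weak edge.
w = np.zeros((6, 6))
for a, b in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    w[a, b] = w[b, a] = 1.0
w[2, 3] = w[3, 2] = 0.1
L = np.diag(w.sum(axis=1)) - w
vals, vecs = np.linalg.eigh(L)
K = (vecs * np.exp(-0.5 * vals)) @ vecs.T   # diffusion kernel with beta = 0.5

f, gamma = batch_kernel_som(K, grid_shape=(1, 2))
print(f)
```

Only the kernel matrix is needed: the prototypes are never formed explicitly, they are represented by the rows of `gamma`.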
Results on a 7 × 7 rectangular map
Expected developments
1 Hierarchical clustering;
2 Achieving a classification based on a density criterion (joint work with S. Gadat);
3 Adapting the algorithm to very large graphs (thousands of vertices).
References
Boulet, R., Jouve, B., Rossi, F., and Villa, N. (2008). Batch kernel SOM and related Laplacian methods for social network analysis. Neurocomputing. To appear.
Villa, N. and Rossi, F. (2007). A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph. In Proceedings of the 6th Workshop on Self-Organizing Maps (WSOM 07), Bielefeld, Germany.
von Luxburg, U. (2007). A tutorial on spectral clustering. Technical Report TR-149, Max Planck Institut für biologische Kybernetik. Available at http://www.kyb.mpg.de/publications/attachments/luxburg06_TR_v2_4139%5B1%5D.pdf.
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 

Graph mining with kernel self-organizing map

  • 1. Motivations Dissimilarities and distances between vertices Kernel SOM Application and comments Graph mining with kernel self-organizing map Nathalie Villa-Vialaneix http://www.nathalievilla.org Joint work with Fabrice Rossi, INRIA, Rocquencourt, France Institut de Mathématiques de Toulouse, - IUT de Carcassonne, Université de Perpignan France SanTouVal, February 1st, 2008 Nathalie Villa - nathalie.villa@math.univ-toulouse.fr SanTouVal - Feb. 2008
  • 2. Table of contents: 1 Motivations; 2 Dissimilarities and distances between vertices; 3 Kernel SOM; 4 Application and comments.
  • 5. Exploring a big historic database. Data: 1000 agrarian contracts, from four seigneuries (about 10 villages) in the South West of France, established between 1250 and 1350 (before the Hundred Years' War). Historian's questions: are the social links driven by family or by geography? Are there central people playing a major social role? ... ⇒ Data mining is required.
  • 9. A graph clustering problem. From the database, a weighted graph is built: with 615 vertices $x_1, \dots, x_n$ := peasants found in the contracts; with weights $(w_{i,j})_{i,j=1,\dots,n}$ := number of contracts in which $x_i$ and $x_j$ are both mentioned. Number of vertices: 615. Number of edges: 4193. Total of weights: 40 329. Diameter: 10. Density: 2.2%. Goal: clustering the vertices into homogeneous social groups to understand the structure of the peasant community.
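The graph construction described on this slide can be sketched in Python. This is only an illustration under stated assumptions: each contract is taken to be a list of the names it mentions, the weight $w_{i,j}$ is taken to be the number of contracts naming both people, and the helper `build_weight_matrix` as well as the toy names are hypothetical (they do not come from the historic database). The density is computed as the number of edges divided by $n(n-1)/2$, which agrees with the slide's figures (4193 / 188 805 ≈ 2.2%).

```python
from itertools import combinations

import numpy as np


def build_weight_matrix(contracts, people):
    """W[i, j] = number of contracts that mention both people[i] and people[j]."""
    index = {name: i for i, name in enumerate(people)}
    W = np.zeros((len(people), len(people)), dtype=int)
    for contract in contracts:
        # set() so that a person named twice in one contract is counted once
        for a, b in combinations(sorted(set(contract)), 2):
            W[index[a], index[b]] += 1
            W[index[b], index[a]] += 1
    return W


# Hypothetical toy data, for illustration only
people = ["Arnaud", "Bernard", "Guilhem", "Pierre"]
contracts = [
    ["Arnaud", "Bernard"],
    ["Arnaud", "Bernard", "Guilhem"],
    ["Pierre", "Guilhem"],
]

W = build_weight_matrix(contracts, people)
n = len(people)
upper = np.triu(W, k=1)                    # each undirected edge counted once
n_edges = int(np.count_nonzero(upper))
total_weight = int(upper.sum())
density = n_edges / (n * (n - 1) / 2)      # edges / possible edges
```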
  • 12. Other fields modelled by large graphs. Computer science: World Wide Web, P2P networks...; social networks; biology: protein interactions, neuronal networks...; business and management: transportation networks, industry partnerships... Question: understanding the structure of these large graphs. Clustering: building relevant homogeneous groups; graph drawing: giving a global representation of the graph. Here: Self-Organizing Map for non-vectorial data.
  • 16. Usual dissimilarities between vertices. The Dice (Jaccard) index: $D(x_i, x_j) = \frac{|\Gamma(x_i) \cap \Gamma(x_j)|}{|\Gamma(x_i)| + |\Gamma(x_j)|}$, where $\Gamma(x)$ denotes the set of neighbors of $x$ (non-weighted graphs); dissimilarities based on the shortest paths; dissimilarities or distances based on the Laplacian matrix: spectral clustering.
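As a small illustration of the index on this slide, the sketch below evaluates it on a toy unweighted graph. It assumes the graph is stored as a dictionary of neighbor sets; the function name `dice_index` and the toy graph are hypothetical. The formula is implemented exactly as written on the slide (the commonly used Dice coefficient carries an extra factor 2 in the numerator).

```python
def dice_index(adjacency, i, j):
    """|G(i) & G(j)| / (|G(i)| + |G(j)|), with G(x) the neighbor set of x."""
    gamma_i, gamma_j = adjacency[i], adjacency[j]
    denominator = len(gamma_i) + len(gamma_j)
    if denominator == 0:          # two isolated vertices: no shared neighbor
        return 0.0
    return len(gamma_i & gamma_j) / denominator


# Hypothetical toy graph: vertex -> set of neighbors
adjacency = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}

d23 = dice_index(adjacency, 2, 3)   # one shared neighbor (vertex 1), degrees 2 and 2
d14 = dice_index(adjacency, 1, 4)   # no shared neighbor
```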
  • 17. Laplacian. Definitions: for a graph with vertices $V = \{x_1, \dots, x_n\}$ having positive weights $(w_{i,j})_{i,j=1,\dots,n}$ such that, for all $i, j = 1, \dots, n$, $w_{i,j} = w_{j,i}$, and with degrees $d_i = \sum_{j=1}^n w_{i,j}$, the Laplacian is $L = (L_{i,j})_{i,j=1,\dots,n}$ where $L_{i,j} = -w_{i,j}$ if $i \neq j$ and $L_{i,i} = d_i$.
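The definition translates directly into a few lines of NumPy. This is a minimal sketch; the function name `laplacian` and the toy weight matrix are illustrative assumptions.

```python
import numpy as np


def laplacian(W):
    """L[i, j] = -w_ij for i != j and L[i, i] = d_i = sum_j w_ij."""
    W = np.asarray(W, dtype=float)
    if not np.allclose(W, W.T):
        raise ValueError("the weight matrix must be symmetric")
    L = -W
    np.fill_diagonal(L, W.sum(axis=1))
    return L


# Hypothetical 3-vertex weighted graph without self-loops
W = np.array([[0, 2, 1],
              [2, 0, 0],
              [1, 0, 0]])
L = laplacian(W)
```

Without self-loops every row of L sums to zero, and L is symmetric positive semi-definite.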
  • 18. Laplacian: property I [von Luxburg, 2007]. Connected subgraphs: $\mathrm{Ker}\, L = \mathrm{Span}\{\mathbb{1}_{A_1}, \dots, \mathbb{1}_{A_k}\}$ where $A_i$ indicates the positions of the vertices of the $i$-th connected component of the graph. Example with 5 vertices and two connected components, $\{1, 4, 5\}$ and $\{2, 3\}$: $\mathrm{Ker}\, L = \mathrm{Span}\{(1, 0, 0, 1, 1)^T, (0, 1, 1, 0, 0)^T\}$.
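This property can be checked numerically on the 5-vertex example. The slide only gives the two components, so the edges used below (1-4, 4-5 and 2-3) are an assumption chosen to produce the components {1, 4, 5} and {2, 3}.

```python
import numpy as np

n = 5
edges = [(1, 4), (4, 5), (2, 3)]      # assumed edges, 1-based vertex labels
W = np.zeros((n, n))
for a, b in edges:
    W[a - 1, b - 1] = W[b - 1, a - 1] = 1.0

L = np.diag(W.sum(axis=1)) - W

# Dimension of Ker L = number of (numerically) zero eigenvalues
eigenvalues = np.linalg.eigvalsh(L)
kernel_dim = int(np.sum(np.isclose(eigenvalues, 0.0, atol=1e-10)))

# Indicator vectors of the two connected components
ind_a = np.array([1.0, 0.0, 0.0, 1.0, 1.0])   # component {1, 4, 5}
ind_b = np.array([0.0, 1.0, 1.0, 0.0, 0.0])   # component {2, 3}
```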
  • 19. Laplacian: property II [Boulet et al., 2008]. Perfect community: complete subgraph (clique) whose vertices share the same neighbors outside the clique. Laplacian and perfect communities: for a non-weighted graph, the graph has a perfect community with $m$ vertices ⇔ $L$ has $m$ eigenvectors that all vanish on the same $n - m$ coordinates.
  • 21. Laplacian: property II [Boulet et al., 2008]. Application to the drawing of the graph [figure]. But: only 1/3 of the graph can be drawn this way.
  • 23. Laplacian: property III [von Luxburg, 2007]. Min Cut problem: suppose that the graph is connected. Finding a classification $A_1, \dots, A_k$ of the vertices of the graph such that $\frac{1}{2} \sum_{i=1}^k \sum_{j \in A_i,\, j' \notin A_i} w_{j,j'}$ is minimum is equivalent to solving $H = \arg\min_{h \in \mathbb{R}^{n \times k}} \mathrm{Tr}(h^T L h)$ subject to $h^T h = I$ and $h_i = \frac{1}{\sqrt{|A_i|}} \mathbb{1}_{A_i}$ ⇒ NP-complete problem.
  • 25. Laplacian: property III [von Luxburg, 2007]. The Min Cut problem can be approached by dropping the constraint on the form of the columns $h_i$: $H = \arg\min_{h \in \mathbb{R}^{n \times k}} \mathrm{Tr}(h^T L h)$ subject to $h^T h = I$. Spectral clustering: compute $H$, whose columns are the eigenvectors of $L$ associated with the $k$ smallest eigenvalues, and make the classification on the rows of $H$.
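A minimal sketch of this spectral clustering recipe is given below. The slide does not say how the rows of H are classified; k-means is the usual choice and is assumed here, written as a small deterministic Lloyd loop so that the example only needs NumPy. The function names and the toy graph (two triangles joined by one edge) are illustrative.

```python
import numpy as np


def kmeans(points, k, n_iter=50):
    """Lloyd iterations with a deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        d = np.min([((points - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            points[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def spectral_clustering(W, k):
    """Cluster the rows of H, the eigenvectors of L for the k smallest eigenvalues."""
    L = np.diag(W.sum(axis=1)) - W
    _, eigenvectors = np.linalg.eigh(L)   # eigenvalues returned in ascending order
    H = eigenvectors[:, :k]
    return kmeans(H, k)


# Toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = np.zeros((6, 6))
for a, b in edges:
    W[a, b] = W[b, a] = 1.0

labels = spectral_clustering(W, 2)
```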
  • 26. A regularized version of $L$. Regularization: the diffusion matrix: for $\beta > 0$, $K^\beta = e^{-\beta L} = \sum_{k=0}^{+\infty} \frac{(-\beta L)^k}{k!}$. ⇒ $k^\beta : V \times V \to \mathbb{R}$, $(x_i, x_j) \mapsto K^\beta_{i,j}$ is the diffusion kernel (or heat kernel).
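Since L is symmetric, the diffusion matrix can be computed from an eigendecomposition instead of summing the series. The sketch below does this with NumPy and compares the result with a truncated power series; the function names and the toy graph are assumptions made for the example.

```python
import numpy as np


def diffusion_kernel(L, beta):
    """K = exp(-beta L) via L = V diag(lam) V^T, so K = V diag(exp(-beta lam)) V^T."""
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    return (eigenvectors * np.exp(-beta * eigenvalues)) @ eigenvectors.T


def diffusion_kernel_series(L, beta, n_terms=40):
    """Truncated series sum_{k=0}^{n_terms-1} (-beta L)^k / k!, for comparison only."""
    n = L.shape[0]
    term = np.eye(n)          # k = 0 term
    total = np.eye(n)
    for k in range(1, n_terms):
        term = term @ (-beta * L) / k
        total = total + term
    return total


# Toy graph: a triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 2
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

K = diffusion_kernel(L, beta=0.5)
K_series = diffusion_kernel_series(L, beta=0.5)
```

Because the constant vector lies in the kernel of L, each row of K sums to 1.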
  • 28. Diffusion process on the graph. If $Z_0 = (1\ 1\ 1\ \dots\ 1\ 1)^T$ is the "energy" of each vertex at time 0 and if a small fraction $\epsilon$ of this energy is propagated along the edges of the graph at each time step, then after $t$ steps the energy of the vertices of the graph is $Z_t = (I - \epsilon L)^t Z_0$. Limit: refining the time step to $\Delta t$ by $t \to t/\Delta t$ and $\epsilon \to \epsilon \Delta t$, then letting $\Delta t \to 0$ (continuous process), gives $\lim Z_t = e^{-\epsilon t L} Z_0 = K^{\epsilon t} Z_0$.
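The passage from the discrete process to the continuous one can be observed numerically. The sketch below assumes the sign convention of the diffusion matrix, $K^\beta = e^{-\beta L}$, so that one discrete step multiplies the energy vector by $I - (\beta/t) L$; the function names and the toy graph are illustrative.

```python
import numpy as np


def heat_kernel(L, beta):
    """exp(-beta L) through the eigendecomposition of the symmetric matrix L."""
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    return (eigenvectors * np.exp(-beta * eigenvalues)) @ eigenvectors.T


def discrete_diffusion(L, beta, t):
    """t diffusion steps of size beta / t: (I - (beta / t) L)^t."""
    n = L.shape[0]
    return np.linalg.matrix_power(np.eye(n) - (beta / t) * L, t)


# Toy graph: a triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 2
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W

beta = 1.0
K = heat_kernel(L, beta)
err_coarse = np.abs(discrete_diffusion(L, beta, 10) - K).max()
err_fine = np.abs(discrete_diffusion(L, beta, 10_000) - K).max()

# A uniform initial energy is a fixed point of the process, since L @ ones = 0
Z_t = discrete_diffusion(L, beta, 100) @ np.ones(4)
```

The finer the time step, the closer the discrete process gets to the heat kernel.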
  • 31. Properties. 1 Diffusion on the graph: $k^\beta(x_i, x_j)$ is the quantity of energy accumulated in $x_j$ after a given time if energy 1 is injected in $x_i$ at time 0 and if diffusion is done continuously along the edges; $\beta$ is the intensity of diffusion. 2 Regularization operator: for $u \in \mathbb{R}^n \sim V$, $u^T K^\beta u$ is higher for vectors $u$ that vary a lot over "close" vertices of the graph; $\beta$ is the intensity of regularization (for small $\beta$, direct neighbors are more important). 3 Reproducing kernel property: $k^\beta$ is symmetric and positive ⇒ there exist a Hilbert space $(\mathcal{H}, \langle \cdot, \cdot \rangle)$ and $\phi : V \to \mathcal{H}$ such that $k^\beta(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$.
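The reproducing kernel property (item 3) is what makes distances between vertices computable from the kernel alone: $\|\phi(x_i) - \phi(x_j)\|^2 = K_{i,i} + K_{j,j} - 2 K_{i,j}$. The sketch below checks this on a toy graph, using one possible explicit feature map built from the eigendecomposition of L; this particular choice of $\phi$ is an assumption for the example (the feature map is not unique).

```python
import numpy as np

# Toy graph: a triangle {0, 1, 2} with a pendant vertex 3 attached to vertex 2
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = np.diag(W.sum(axis=1)) - W
beta = 0.5

eigenvalues, eigenvectors = np.linalg.eigh(L)
K = (eigenvectors * np.exp(-beta * eigenvalues)) @ eigenvectors.T

# One explicit feature map: row i of Phi plays the role of phi(x_i)
Phi = eigenvectors * np.exp(-0.5 * beta * eigenvalues)


def feature_distance_sq(K, i, j):
    """Squared distance between phi(x_i) and phi(x_j), from kernel values only."""
    return K[i, i] + K[j, j] - 2.0 * K[i, j]


d03 = feature_distance_sq(K, 0, 3)
```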
  • 33. Kohonen map. Mapping the data onto a 2-dimensional map: each neuron of the map, $i = 1, \dots, M$, is associated with a prototype $p_i \in \mathcal{H}$; neurons are related to each other by a neighborhood relationship ("distance": $d$). Classifying the vertices on the map: each $x_i$ is associated with a neuron (cluster or class) of the map, $f(x_i)$.
  • 35. Preserving the initial topology. Energy: the goal is to minimize the energy of the map, $E = \sum_{i=1}^M \int h(d(f(x), i)) \, \|x - p_i\|^2_{\mathcal{H}} \, dP(x)$, where $h$ is a decreasing function (e.g. $h(t) = \alpha e^{-t/2\sigma^2}$). The energy is approached by its empirical version, $E_n = \sum_{j=1}^n \sum_{i=1}^M h(d(f(x_j), i)) \, \|x_j - p_i\|^2_{\mathcal{H}}$, and the minimization is approached by the SOM algorithm.
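The empirical energy can be evaluated directly once the data, the prototypes and the assignments are given. The sketch below is a simplified illustration: it takes the data space to be $\mathbb{R}^d$ with explicit vectors and uses the Euclidean distance between neuron positions on the grid for $d$. Both are assumptions, as are the function names and the toy values.

```python
import numpy as np


def neighborhood(t, alpha=1.0, sigma=1.0):
    """Decreasing function h(t) = alpha * exp(-t / (2 sigma^2))."""
    return alpha * np.exp(-t / (2.0 * sigma ** 2))


def empirical_energy(X, prototypes, grid, assignment, alpha=1.0, sigma=1.0):
    """E_n = sum_j sum_i h(d(f(x_j), i)) * ||x_j - p_i||^2.

    X: (n, d) data, prototypes: (M, d), grid: (M, 2) neuron positions,
    assignment: (n,) index f(x_j) of the neuron of each observation.
    """
    # d(f(x_j), i): distance on the grid between the neuron of x_j and neuron i
    grid_dist = np.linalg.norm(grid[assignment][:, None, :] - grid[None, :, :], axis=2)
    sq_dist = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return float((neighborhood(grid_dist, alpha, sigma) * sq_dist).sum())


# Toy example: two observations, two neurons whose prototypes sit on the observations
X = np.array([[0.0, 0.0], [3.0, 4.0]])
prototypes = X.copy()
grid = np.array([[0.0, 0.0], [0.0, 1.0]])
assignment = np.array([0, 1])

E = empirical_energy(X, prototypes, grid, assignment)
```

Even when every observation coincides with the prototype of its own neuron, the energy is not zero: neighboring neurons still contribute through h.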
  • 38. Batch kernel SOM [Villa and Rossi, 2007]. Initialize randomly $\gamma^0_{ji} \in \mathbb{R}$ ($i = 1, \dots, n$; $j = 1, \dots, M$) and $p^0_j = \sum_{i=1}^n \gamma^0_{ji} \phi(x_i)$. Then, for $l = 1, \dots, n$ repeat: Assignment step: for all $x_i$, $f^l(x_i) = \arg\min_{j=1,\dots,M} \left\| \phi(x_i) - \sum_{u=1}^n \gamma^{l-1}_{ju} \phi(x_u) \right\|_{\mathcal{H}}$. Representation step: $\gamma^l_j = \arg\min_{\gamma \in \mathbb{R}^n} \sum_{i=1}^n h(d(f^l(x_i), j)) \left\| \phi(x_i) - \sum_{u=1}^n \gamma_u \phi(x_u) \right\|^2_{\mathcal{H}}$.
  • 39. Batch kernel SOM [Villa and Rossi, 2007], written with the kernel only. Assignment step: for all $x_i$, $f^l(x_i) = \arg\min_{j=1,\dots,M} \sum_{u,u'=1}^n \gamma^{l-1}_{ju} \gamma^{l-1}_{ju'} k^\beta(x_u, x_{u'}) - 2 \sum_{u=1}^n \gamma^{l-1}_{ju} k^\beta(x_u, x_i)$. Representation step: $\gamma^l_{ji} = \frac{h(d(f^l(x_i), j))}{\sum_{i'=1}^n h(d(f^l(x_{i'}), j))}$.
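The two kernelized steps can be sketched with NumPy as follows. This is a simplified illustration and not the authors' implementation: it assumes a fixed neighborhood width `sigma` (no annealing), a Euclidean grid distance between neurons, $h(t) = e^{-t/2\sigma^2}$, a fixed number of iterations, and random initial coefficients normalized so that each prototype is a convex combination of the $\phi(x_i)$. The function name `batch_kernel_som` and all parameter names are hypothetical.

```python
import numpy as np


def batch_kernel_som(K, grid, n_iter=20, sigma=1.0, seed=0):
    """Batch kernel SOM on a kernel matrix K (n, n) and neuron positions grid (M, 2).

    Returns the last assignment f (n,) and the coefficients gamma (M, n) computed
    from it, with prototypes p_j = sum_i gamma[j, i] * phi(x_i).
    """
    rng = np.random.default_rng(seed)
    n, M = K.shape[0], grid.shape[0]
    grid_dist = np.linalg.norm(grid[:, None, :] - grid[None, :, :], axis=2)  # (M, M)

    gamma = rng.random((M, n))
    gamma /= gamma.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Assignment step: ||phi(x_i) - p_j||^2 up to the constant k(x_i, x_i)
        quad = np.einsum("ju,uv,jv->j", gamma, K, gamma)     # gamma_j^T K gamma_j
        cross = gamma @ K                                    # sum_u gamma_ju k(x_u, x_i)
        f = np.argmin(quad[:, None] - 2.0 * cross, axis=0)   # (n,)

        # Representation step: gamma_ji proportional to h(d(f(x_i), j))
        H = np.exp(-grid_dist[:, f] / (2.0 * sigma ** 2))    # (M, n)
        gamma = H / H.sum(axis=1, keepdims=True)
    return f, gamma


# Toy usage: diffusion kernel of two triangles joined by one edge, 1 x 3 map
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = np.zeros((6, 6))
for a, b in edges:
    W[a, b] = W[b, a] = 1.0
L = np.diag(W.sum(axis=1)) - W
eigenvalues, eigenvectors = np.linalg.eigh(L)
K = (eigenvectors * np.exp(-0.5 * eigenvalues)) @ eigenvectors.T

grid = np.array([[0.0, 0.0], [0.0, 1.0], [0.0, 2.0]])
f, gamma = batch_kernel_som(K, grid)
```

Only kernel values are needed: the prototypes are never formed explicitly, which is why the method applies to vertices of a graph.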
  • 41. Results on a 7 × 7 rectangular map [figures, slides 41 to 43].
  • 44. Expected developments: 1 Hierarchical clustering; 2 Achieving a classification based on a density criterion (joint work with S. Gadat); 3 Adapting the algorithm to very large graphs (thousands of vertices).
  • 45. References. Boulet, R., Jouve, B., Rossi, F., and Villa, N. (2008). Batch kernel SOM and related Laplacian methods for social network analysis. Neurocomputing. To appear. Villa, N. and Rossi, F. (2007). A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph. In Proceedings of the 6th Workshop on Self-Organizing Maps (WSOM 07), Bielefeld, Germany. von Luxburg, U. (2007). A tutorial on spectral clustering. Technical Report TR-149, Max Planck Institut für biologische Kybernetik. Available at http://www.kyb.mpg.de/publications/attachments/luxburg06_TR_v2_4139%5B1%5D.pdf.