Graph mining 2
Statistical approaches for graph mining
Nathalie Villa-Vialaneix
nathalie.villa@toulouse.inra.fr
http://www.nathalievilla.org
Advanced mathematics for network analysis
Luchon, May 3rd 2016
Nathalie Villa-Vialaneix | Graph mining 2 1/48
Talk map...
Who am I? Statistician working in biostatistics at INRA Toulouse
My research interests are: data mining, network inference and
mining, machine learning
Purpose of this talk: presenting a few statistical tools for graph
mining (graph structure, important vertices) and clustering
Background
Unless stated otherwise, G is:
an undirected and connected graph;
with vertices $V = \{x_1, \dots, x_n\}$;
with set of edges $E$;
possibly with (positive and symmetric) weights on the edges, $w_{ij}$ (s.t. $w_{ii} = 0$: no self loop);
with adjacency matrix $A = (w_{ij})_{i,j=1,\dots,n}$.
Examples are made with...
the toy example “Les Misérables” (co-appearance network in
Hugo’s novel)
[Figure: the “Les Misérables” co-appearance network, vertices labeled by character name]
the R software and especially the R package igraph;
the full script and the dataset are available on my website at:
http://www.nathalievilla.org/teaching/toconet.html
Basic description of the graph
lesmis
## IGRAPH U--- 77 254 --
## + attr: layout (g/n), id (v/n), label (v/c), value (e/n)
## + edges:
## [1] 1-- 2 1-- 3 1-- 4 3-- 4 1-- 5 1-- 6 1-- 7 1-- 8 1-- 9 1--10
## [11] 11--12 4--12 3--12 1--12 12--13 12--14 12--15 12--16 17--18 17--19
## [21] 18--19 17--20 18--20 19--20 17--21 18--21 19--21 20--21 17--22 18--22
## [31] 19--22 20--22 21--22 17--23 18--23 19--23 20--23 21--23 22--23 17--24
## [41] 18--24 19--24 20--24 21--24 22--24 23--24 13--24 12--24 24--25 12--25
## [51] 25--26 24--26 12--26 25--27 12--27 17--27 26--27 12--28 24--28 26--28
## [61] 25--28 27--28 12--29 28--29 24--30 28--30 12--30 24--31 31--32 12--32
## [71] 24--32 28--32 12--33 12--34 28--34 12--35 30--35 12--36 35--36 30--36
## + ... omitted several edges
U--- means: Undirected, not Named (no name attribute for the
vertices), not Weighted (no weight attribute for the edges) and not
Bipartite
System information
## R version 3.2.5 (2016-04-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.4 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] igraph_1.0.1 knitr_1.12.3
##
## loaded via a namespace (and not attached):
## [1] magrittr_1.5 formatR_1.3 tools_3.2.5 stringi_1.0-1
## [5] highr_0.5.1 stringr_1.0.0 evaluate_0.8.3
Outline
Numerical characteristics
Clustering
Modularity optimization
Spectral clustering
Model based clustering
Sketch of this section
Issue at stake:
a graph is given;
numerical characteristics describing the graph and its nodes are a standard way to describe it;
how can we know whether the observed values are unexpected with respect to a so-called “null model”?
Standard (global) characteristics
density: $\frac{|E|}{n(n-1)/2}$ (graph.density)
number of triangles (triangles, see also motifs)
transitivity: three times the number of triangles divided by the number of connected triplets (triplets of vertices with at least two edges) (transitivity)
diameter: length of the longest shortest path between two nodes (diameter)
radius: minimal length, over all vertices in the graph, of the longest shortest path linking this vertex to another vertex (radius)
girth: length of the shortest cycle in the graph (girth)
cohesion: minimum number of vertices to remove to disconnect the graph (cohesion)
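The global quantities above can be recomputed by hand from an adjacency matrix. The following base R sketch assumes an unweighted, undirected graph without self loops; the function name `global_characteristics` is hypothetical (not an igraph function). It uses the fact that the trace of $A^3$ counts every triangle six times.

```r
# Global characteristics of an undirected, unweighted graph without loops,
# given by its symmetric 0/1 adjacency matrix A.
global_characteristics <- function(A) {
  n <- nrow(A)
  n_edges <- sum(A) / 2                     # each edge appears twice in A
  density <- n_edges / (n * (n - 1) / 2)
  # trace(A^3) counts closed walks of length 3: 6 per triangle
  n_triangles <- sum(diag(A %*% A %*% A)) / 6
  deg <- rowSums(A)
  # connected triplets = paths of length 2, counted by their middle vertex
  n_triplets <- sum(deg * (deg - 1) / 2)
  c(density = density,
    triangles = n_triangles,
    transitivity = 3 * n_triangles / n_triplets)
}

# Example: two triangles {1, 2, 3} and {4, 5, 6} linked by the edge 3-4
A <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
A[edges] <- 1
A[edges[, 2:1]] <- 1
global_characteristics(A)  # density 7/15, 2 triangles, transitivity 0.6
```

For a simple unweighted graph, these three values are expected to coincide with graph.density, length(triangles(.))/3 and transitivity.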
Standard (global) characteristics for “Les misérables”
graph.density(lesmis); triangles(lesmis); length(triangles(lesmis))/3
## [1] 0.08680793
## + 1401/77 vertices:
## [1] 12 1 3 12 1 4 12 3 4 12 24 32 12 24 13 12 24 25 12 24 30 12 25
## [24] 71 12 25 70 12 25 69 12 25 27 12 26 24 12 26 25 12 26 27 12 26 72 12
## [47] 26 71 12 26 70 12 26 69 12 27 73 12 27 52 12 27 50 12 27 44 12 28 73
## [70] 12 28 24 12 28 25 12 28 26 12 28 27 12 28 29 12 28 30 12 28 32 12 28
## [93] 34 12 28 44 12 28 72 12 28 59 12 28 69 12 28 70 12 28 71 12 29 45 12
## [116] 30 39 12 30 38 12 30 37 12 30 35 12 30 36 12 35 39 12 35 38 12 35 36
## [139] 12 35 37 12 36 39 12 36 38 12 36 37 12 37 39 12 37 38 12 38 39 12 49
## [162] 26 12 49 28 12 49 56 12 49 59 12 49 65 12 49 69 12 49 70 12 49 72 12
## [185] 50 52 12 56 26 12 56 27 12 56 65 12 56 50 12 56 52 12 56 59 12 59 71
## [208] 12 59 65 12 69 72 12 69 71 12 69 70 12 70 72 12 70 71 12 71 72 49 26
## + ... omitted several vertices
## [1] 467
transitivity(lesmis); diameter(lesmis); radius(lesmis); girth(lesmis)
## [1] 0.4989316
## [1] 5
## [1] 3
## $girth
## [1] 3
##
## $circle
## + 3/77 vertices:
## [1] 3 1 4
Comparison with random graphs...
Erdos-Renyi model with the same number of nodes and the same number of edges as the original graph (uniform probability of observing an edge between two given nodes)
Method: compare the observed values with those of a large number of randomly generated graphs (without loops; only connected graphs are kept)
sample_gnm(vcount(lesmis), ecount(lesmis))
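The Monte Carlo procedure can also be sketched in base R without igraph. This is a minimal sketch under stated assumptions: graphs are stored as 0/1 adjacency matrices, the helper names (`sample_gnm_adj`, `is_connected_adj`, `null_distribution_gnm`) are hypothetical, and disconnected draws are simply rejected, as on the slide.

```r
# Erdos-Renyi G(n, m): m edges drawn uniformly without replacement among
# the n(n - 1)/2 possible pairs (no loops, no multiple edges).
sample_gnm_adj <- function(n, m) {
  pairs <- which(upper.tri(matrix(0, n, n)), arr.ind = TRUE)
  chosen <- pairs[sample(nrow(pairs), m), , drop = FALSE]
  A <- matrix(0, n, n)
  A[chosen] <- 1
  A[chosen[, 2:1, drop = FALSE]] <- 1
  A
}

# Breadth-first search from vertex 1: is every vertex reached?
is_connected_adj <- function(A) {
  visited <- seq_len(nrow(A)) == 1
  frontier <- 1
  while (length(frontier) > 0) {
    frontier <- which(colSums(A[frontier, , drop = FALSE]) > 0 & !visited)
    visited[frontier] <- TRUE
  }
  all(visited)
}

# Keep drawing until B connected graphs are obtained; return the statistic
# computed on each of them (its null distribution).
null_distribution_gnm <- function(n, m, B, statistic) {
  values <- numeric(0)
  while (length(values) < B) {
    A <- sample_gnm_adj(n, m)
    if (is_connected_adj(A)) values <- c(values, statistic(A))
  }
  values
}
```

For instance, `null_distribution_gnm(77, 254, 500, function(A) sum(diag(A %*% A %*% A)) / 6)` would give a null distribution for the number of triangles. Note that the loop never stops if connected graphs are (almost) impossible to obtain with m edges.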
Results of the comparison with random graphs...
For B = 500 graphs (only connected graphs are kept), we have:
## density triangles transitivity diameter
## Min. :0.08681 Min. :31.00 Min. :0.05834 Min. :4.000
## 1st Qu.:0.08681 1st Qu.:43.00 1st Qu.:0.07907 1st Qu.:4.000
## Median :0.08681 Median :47.00 Median :0.08701 Median :5.000
## Mean :0.08681 Mean :47.55 Mean :0.08660 Mean :4.627
## 3rd Qu.:0.08681 3rd Qu.:52.00 3rd Qu.:0.09415 3rd Qu.:5.000
## Max. :0.08681 Max. :67.00 Max. :0.11793 Max. :6.000
## radius girth cohesion
## Min. :3.000 Min. :3 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3 1st Qu.:1.000
## Median :3.000 Median :3 Median :2.000
## Mean :3.004 Mean :3 Mean :1.599
## 3rd Qu.:3.000 3rd Qu.:3 3rd Qu.:2.000
## Max. :4.000 Max. :3 Max. :3.000
compared to the observed values: 0.0868079, 467, 0.4989316, 5, 3, 3, 1
⇒ all values are standard except for:
the number of triangles and the transitivity, which are larger: local connectivity is stronger than expected in Erdos-Renyi random graphs;
the cohesion, which is among the lowest values expected in Erdos-Renyi random graphs: this again indicates a stronger local connectivity.
Standard (local) characteristics
... for the vertex xi:
degree: $|\{x_j : (x_i, x_j) \in E,\ j \neq i\}|$ (degree), or strength for the weighted version, $\sum_{j \neq i} w_{ij}$ (strength)
betweenness (or betweenness centrality): sum, over all pairs of vertices different from $x_i$, of the fraction of the shortest paths between them that pass through $x_i$ (betweenness)
eccentricity: maximal length of the shortest paths going from $x_i$ to any other vertex in the graph (eccentricity)
closeness (or closeness centrality): $\frac{1}{\sum_{j \neq i} d(x_i, x_j)}$, in which $d(x_i, x_j)$ is the length of the shortest path between $x_i$ and $x_j$ (closeness)
...and their distributions among all vertices.
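As an illustration of these definitions, here is a base R sketch computing the degree, strength, eccentricity and closeness of every vertex from a weighted adjacency matrix. Assumptions: the graph is connected, shortest paths ignore the weights, and the function names are hypothetical. Betweenness is left out because it requires counting shortest paths (e.g., with Brandes' algorithm).

```r
# All shortest-path lengths (number of edges), by BFS from every vertex.
distance_matrix <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      frontier <- which(colSums(A[frontier, , drop = FALSE]) > 0 &
                          is.infinite(D[s, ]))
      D[s, frontier] <- d
    }
  }
  D
}

# Local characteristics from a symmetric weight matrix W (connected graph).
local_characteristics <- function(W) {
  A <- (W > 0) * 1                     # unweighted adjacency matrix
  D <- distance_matrix(A)              # weights ignored for path lengths
  data.frame(degree = rowSums(A),
             strength = rowSums(W),
             eccentricity = apply(D, 1, max),
             closeness = 1 / rowSums(D))   # D[i, i] = 0: sum over j != i
}

# Weighted path 1 - 2 - 3 with w_12 = 2 and w_23 = 1
W <- matrix(0, 3, 3)
W[1, 2] <- W[2, 1] <- 2
W[2, 3] <- W[3, 2] <- 1
local_characteristics(W)
```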
Standard (local) characteristics for “Les misérables”
summary(degree(lesmis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 6.000 6.597 10.000 36.000
summary(betweenness(lesmis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 0.00 62.36 22.92 1624.00
summary(eccentricity(lesmis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 4.00 4.00 4.13 5.00 5.00
summary(closeness(lesmis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.003378 0.004484 0.005181 0.005123 0.005435 0.008475
Comparison with random graphs...
Erdos-Renyi model with the same number of nodes and the same number of edges as the original graph (uniform probability of observing an edge between two given nodes)
Method: compare the observed values (averages of the degree, betweenness, eccentricity and closeness) with those of a large number of randomly generated graphs (without loops; only connected graphs are kept)
sample_gnm(vcount(lesmis), ecount(lesmis))
Results of the comparison with random graphs...
For B = 500 graphs (only connected graphs are kept), we have:
## degree betweenness eccentricity closeness
## Min. :6.597 Min. :54.64 Min. :3.597 Min. :0.005249
## 1st Qu.:6.597 1st Qu.:55.93 1st Qu.:3.779 1st Qu.:0.005322
## Median :6.597 Median :56.32 Median :3.857 Median :0.005340
## Mean :6.597 Mean :56.36 Mean :3.863 Mean :0.005340
## 3rd Qu.:6.597 3rd Qu.:56.71 3rd Qu.:3.909 3rd Qu.:0.005361
## Max. :6.597 Max. :58.79 Max. :4.688 Max. :0.005430
compared to the observed values: 6.597, 62.364, 4.13, 0.00512
⇒ the observed average betweenness is higher, and the observed average closeness smaller, than for all the randomly generated graphs: this suggests that, on average, shortest paths in the graph are longer than expected for graphs with a uniform distribution of the edges.
Degree distribution for “Les misérables”
[Figure: degree distribution in log-log scale: log(P(k)) versus log(k)]
Power law fit (estimated α = 1.49) obtained with fit_power_law(degree(lesmis) + 1, implementation = "R.mle")
Comparison with random graphs...
Scale-free model with a power-law parameter identical to the one previously estimated and the same number of nodes: $P(\text{degree} = k) \propto k^{-\alpha}$.
The Barabási-Albert model is used, with a number of edges added at each step chosen so that the final number of edges resembles that of the original graph (3 edges, which gives 225 edges in the final graph, compared to 254)
Method: compare the observed values with those of a large number of randomly generated graphs
sample_pa(vcount(lesmis), m = 3, power = ..., directed = FALSE)
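To make the growth mechanism concrete, here is a base R sketch of linear preferential attachment. It is not igraph's `sample_pa`: the attachment weight (degree + 1), the sequential sampling without replacement and the function name are assumptions of this sketch. It does reproduce the edge count mentioned above, since vertex v adds min(3, v − 1) edges, i.e. 1 + 2 + 3 × 74 = 225 edges for 77 vertices.

```r
# Linear preferential attachment, undirected (sketch, not igraph's sample_pa).
# Vertex v = 2, ..., n is linked to min(m, v - 1) distinct earlier vertices,
# drawn with probability proportional to degree + 1 (so that vertex 1, of
# degree 0 at the start, can be chosen). Returns a two-column edge list.
sample_pa_edges <- function(n, m) {
  deg <- numeric(n)
  edges <- matrix(numeric(0), ncol = 2)
  for (v in 2:n) {
    k <- min(m, v - 1)
    targets <- sample.int(v - 1, k, prob = deg[seq_len(v - 1)] + 1)
    edges <- rbind(edges, cbind(v, targets))
    deg[targets] <- deg[targets] + 1
    deg[v] <- deg[v] + k
  }
  edges
}

set.seed(1)
pa_edges <- sample_pa_edges(77, 3)
nrow(pa_edges)  # 225 edges: 1 + 2 + 3 * 74
```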
Results of the comparison with random graphs...
For B = 500 graphs, we have:
## density triangles transitivity diameter
## Min. :0.0769 Min. : 72 Min. :0.1075 Min. :3.000
## 1st Qu.:0.0769 1st Qu.:102 1st Qu.:0.1250 1st Qu.:4.000
## Median :0.0769 Median :112 Median :0.1307 Median :4.000
## Mean :0.0769 Mean :113 Mean :0.1303 Mean :3.988
## 3rd Qu.:0.0769 3rd Qu.:124 3rd Qu.:0.1359 3rd Qu.:4.000
## Max. :0.0769 Max. :153 Max. :0.1530 Max. :5.000
## radius girth cohesion degree betweenness
## Min. :2.000 Min. :3 Min. :3 Min. :5.844 Min. :41.86
## 1st Qu.:2.000 1st Qu.:3 1st Qu.:3 1st Qu.:5.844 1st Qu.:47.88
## Median :2.000 Median :3 Median :3 Median :5.844 Median :49.55
## Mean :2.314 Mean :3 Mean :3 Mean :5.844 Mean :49.35
## 3rd Qu.:3.000 3rd Qu.:3 3rd Qu.:3 3rd Qu.:5.844 3rd Qu.:50.97
## Max. :3.000 Max. :3 Max. :3 Max. :5.844 Max. :55.73
## eccentricity closeness
## Min. :2.935 Min. :0.005407
## 1st Qu.:3.130 1st Qu.:0.005695
## Median :3.221 Median :0.005788
## Mean :3.234 Mean :0.005805
## 3rd Qu.:3.325 3rd Qu.:0.005901
## Max. :3.662 Max. :0.006334
compared to: 0.087, 467, 0.499, 5, 3, 3, 1, 6.597, 62.364, 4.13, 0.00512
⇒ the number of triangles, the transitivity, the average degree, the average betweenness and the average eccentricity are larger than in the simulated power-law graphs with power 1.495 (the radius being at the top of the simulated range), whereas the cohesion and the closeness are smaller.
Limits of the previous approaches
Until now, we have compared the real graph to graphs randomly generated according to a given random model, but:
this approach only gives information about global characteristics of the observed graph;
the distributions of the observed characteristics are not preserved during the process, in particular the degree distribution, which is central for controlling local/global connectivity, counts of specific patterns...
A null model closer to the real graph...
Sketch of statistical tests on graphs
1. sample at random, B times, within the set of graphs with the same degree sequence (hence the same degree distribution) as the observed graph;
2. compute a numerical statistic for each of these randomly generated graphs;
3. by comparing the observed value of the statistic with its distribution over the random graphs, derive an (empirical) p-value (for B large enough).
Two main approaches to sample at random with fixed degrees:
configuration model [Bender and Canfield, 1978]
permutation approach [Rao et al., 1996, Roberts Jr., 2000]
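Step 3 can be made explicit as follows. This is a sketch assuming that the statistic has already been computed on B random graphs; the "+1" correction (counting the observed graph among the simulations) is a common convention, not necessarily the one used in the original script, and the function name is hypothetical.

```r
# Empirical one-sided p-values of an observed statistic, given its values on
# B random graphs drawn under the null model.
empirical_pvalues <- function(observed, simulated) {
  B <- length(simulated)
  # "+ 1": the observed graph is counted as one draw of the null model
  c(greater = (1 + sum(simulated >= observed)) / (B + 1),
    less    = (1 + sum(simulated <= observed)) / (B + 1))
}

empirical_pvalues(10, c(1, 2, 3, 4))  # greater = 0.2, less = 1
```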
Sampling at random within the set of graphs with a given
degree distribution
Aim:
every graph of the set can be reached by the sampling procedure;
all graphs have the same probability of being sampled
⇒ MCMC approach
Method:
1: Start from the observed graph $G = (V, E)$
2: for $t = 1 \to T$ do
3:   Select uniformly at random two edges $e^1 = (x^1_i, x^1_j)$ and $e^2 = (x^2_i, x^2_j) \in E$
4:   $E' \leftarrow (E \setminus \{e^1, e^2\}) \cup \{e^1_s, e^2_s\}$ with $e^1_s = (x^1_i, x^2_j)$ and $e^2_s = (x^2_i, x^1_j)$
5:   if $G' = (V, E')$ is simple and connected then
6:     $G \leftarrow G'$
7:   end if
8: end for
9: return $G$
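A direct base R transcription of the algorithm above might look as follows. It is a sketch under assumptions: the graph is stored as a two-column edge list on vertices 1..n, the helper names are hypothetical, and the orientation of the second edge is drawn at random so that both possible switches of a pair of edges can be proposed.

```r
# Unordered edge identifiers: (3, 1) and (1, 3) both give "1-3".
edge_key <- function(u, v) paste(pmin(u, v), pmax(u, v), sep = "-")

# Breadth-first search on the graph defined by a two-column edge list.
is_connected_edges <- function(edges, n) {
  A <- matrix(FALSE, n, n)
  A[edges] <- TRUE
  A[edges[, 2:1, drop = FALSE]] <- TRUE
  visited <- seq_len(n) == 1
  frontier <- 1
  while (length(frontier) > 0) {
    frontier <- which(colSums(A[frontier, , drop = FALSE]) > 0 & !visited)
    visited[frontier] <- TRUE
  }
  all(visited)
}

# Degree-preserving edge switching: (a, b), (x, y) -> (a, y), (x, b),
# accepted only if the new graph is simple and connected.
switch_edges <- function(edges, n, n_steps = 100) {
  for (step in seq_len(n_steps)) {
    idx <- sample(nrow(edges), 2)
    a <- edges[idx[1], 1]; b <- edges[idx[1], 2]
    x <- edges[idx[2], 1]; y <- edges[idx[2], 2]
    if (runif(1) < 0.5) { tmp <- x; x <- y; y <- tmp }  # random orientation
    proposed <- rbind(c(a, y), c(x, b))
    others <- edges[-idx, , drop = FALSE]
    keys <- edge_key(proposed[, 1], proposed[, 2])
    simple <- all(proposed[, 1] != proposed[, 2]) && keys[1] != keys[2] &&
      !any(keys %in% edge_key(others[, 1], others[, 2]))
    if (simple) {
      candidate <- rbind(others, proposed)
      if (is_connected_edges(candidate, n)) edges <- candidate
    }
  }
  edges
}
```

Each accepted switch keeps the degrees of a, b, x and y unchanged, so the degree sequence of the observed graph is preserved along the chain.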
In practice...
This method is used in [Milo et al., 2004] with T = 100. It can be performed using rewire(lesmis, keeping_degseq(niter = 100))
[Figure: histograms (Frequency) of the number of triangles and of the transitivity over the rewired graphs]
In practice... for the vertex characteristics
Find an (empirical) p-value for every vertex, indicating whether its betweenness is higher or lower than expected given its degree: it is the proportion of random graphs in which the betweenness of the corresponding vertex is higher (resp. lower) than the observed one; vertices whose observed betweenness is higher (resp. lower) than 95% of the betweennesses of the corresponding vertex in the random graphs are flagged.
[Figure: the “Les Misérables” network with the following vertices labeled: Myriel, Valjean, Listolier, Fameuil, Blacheville, Favourite, Dahlia, Zephine, Fantine, Judge, Champmathieu, Brevet, Chenildieu, Cochepaille, LtGillenormand, Marius, Combeferre, Prouvaire, Feuilly, Courfeyrac, Bahorel, Joly, Grantaire, Gueulemer, Babet, Claquesous, Montparnasse, Brujon, MmeHucheloup]
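The per-vertex test can be sketched as follows, assuming the betweenness of each vertex has already been computed on the observed graph and on B degree-preserving random graphs (stored as a B × n matrix). The function name and the `level` argument are illustrative.

```r
# observed:  betweenness of the n vertices in the observed graph
# simulated: B x n matrix, row b = betweenness of the same vertices in the
#            b-th random graph with the same degree sequence
flag_vertices <- function(observed, simulated, level = 0.95) {
  # proportion of random graphs in which the vertex is at least as central
  p_higher <- colMeans(sweep(simulated, 2, observed, ">="))
  # proportion of random graphs in which the vertex is at most as central
  p_lower <- colMeans(sweep(simulated, 2, observed, "<="))
  data.frame(p_higher = p_higher, p_lower = p_lower,
             higher = p_higher <= 1 - level,
             lower = p_lower <= 1 - level)
}

# Toy example with 3 vertices and B = 4 random graphs
simulated <- cbind(rep(1, 4), rep(3, 4), c(1, 2, 3, 4))
flag_vertices(c(5, 0, 2), simulated)
# vertex 1 flagged as "higher", vertex 2 as "lower", vertex 3 not flagged
```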
More on random graphs generation
Sometimes, one wants to compare the observed graph with a more sophisticated (constrained) null model (taking into account some additional information on edges or nodes, for instance):
This can be achieved using the same principle and throwing away the random graphs that do not satisfy the constraints.
Warning: the more sophisticated the model, the more costly the simulation. For instance, only removing graphs with multiple edges and graphs that are not connected leads to throwing away 47 simulations out of 500.
Possible solution: [Tabourier and Cointet, 2011] use multiple edge switching to improve such simulations.
Sketch of this section
Issue at stake:
short overview of different types of methods for vertex
clustering
only simple clustering (although some methods for
overlapping clustering, clustering according to vertex/edge
attributes, clustering of bipartite graphs... also exist)
statistical relevance and comparison of clustering results
A short overview of vertex clustering
Purpose: find communities or modules (i.e., groups of vertices) s.t. vertices inside a community are strongly connected whereas vertices in different communities are weakly connected.
Some approaches to perform such a task:
optimizing a given criterion (e.g., modularity maximization)
spectral clustering
model based clustering
... (see [Fortunato and Barthélémy, 2007, Schaeffer, 2007, Brohée and van Helden, 2006])
Clustering based on criterion optimization
“Cut” criteria: given a number of clusters, K, find the partition $A_1, \dots, A_K$ of V that solves the mincut problem, i.e., that minimizes
$$\mathrm{cut}(A_1, \dots, A_K) = \frac{1}{2} \sum_{k=1}^{K} \sum_{x_i \in A_k,\, x_j \notin A_k} w_{ij}$$
Problem: the mincut problem often separates individual vertices from the rest of the graph.
“RatioCut” problem: given K, find the partition $A_1, \dots, A_K$ of V that minimizes
$$\mathrm{RatioCut}(A_1, \dots, A_K) = \frac{1}{2} \sum_{k=1}^{K} \frac{\sum_{x_i \in A_k,\, x_j \notin A_k} w_{ij}}{|A_k|}$$
(forces larger communities than the mincut problem).
“NCut” problem: given K, find the partition $A_1, \dots, A_K$ of V that minimizes
$$\mathrm{NCut}(A_1, \dots, A_K) = \frac{1}{2} \sum_{k=1}^{K} \frac{\sum_{x_i \in A_k,\, x_j \notin A_k} w_{ij}}{\mathrm{Vol}(A_k)}$$
in which $\mathrm{Vol}(A_k) = \sum_{x_i \in A_k} d_i$ is the sum of the degrees $d_i = \sum_{j \neq i} w_{ij}$ of the vertices of $A_k$ (also forces larger communities than the mincut problem).
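The three cut criteria can be evaluated for any candidate partition with a few lines of base R. This sketch (hypothetical function name) takes a symmetric weight matrix and a membership vector, and uses $\mathrm{Vol}(A_k) = \sum_{x_i \in A_k} d_i$, the sum of the degrees of the vertices of $A_k$.

```r
# cut, RatioCut and NCut of the partition given by `membership`.
cut_criteria <- function(W, membership) {
  d <- rowSums(W)                                  # degrees d_i
  crit <- c(cut = 0, RatioCut = 0, NCut = 0)
  for (k in unique(membership)) {
    in_k <- membership == k
    w_out <- sum(W[in_k, !in_k])                   # weight leaving A_k
    crit["cut"] <- crit["cut"] + w_out
    crit["RatioCut"] <- crit["RatioCut"] + w_out / sum(in_k)
    crit["NCut"] <- crit["NCut"] + w_out / sum(d[in_k])   # Vol(A_k)
  }
  crit / 2                                         # the 1/2 factor
}

# Two triangles {1, 2, 3} and {4, 5, 6} linked by the edge 3-4
W <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W[edges] <- 1
W[edges[, 2:1]] <- 1
cut_criteria(W, c(1, 1, 1, 2, 2, 2))  # cut = 1, RatioCut = 1/3, NCut = 1/7
```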
“Modularity” criterion [Newman and Girvan, 2004]: given a number of clusters, K, find the partition $A_1, \dots, A_K$ of V that maximizes
$$Q(A_1, \dots, A_K) = \frac{1}{2m} \sum_{k=1}^{K} \sum_{x_i, x_j \in A_k} (w_{ij} - P_{ij})$$
with $P_{ij}$ the weight of a “null model” (graph with the same degree distribution in which edges are placed regardless of the communities): $P_{ij} = \frac{d_i d_j}{2m}$, with $d_i = \sum_{j \neq i} w_{ij}$ and $m = \frac{1}{2} \sum_i d_i$ (total weight of the edges).
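The modularity formula translates almost literally into matrix operations. The base R sketch below (hypothetical name `modularity_sketch`, chosen so as not to mask igraph's `modularity`) keeps the diagonal terms $i = j$ in the double sum, as written above.

```r
# Modularity Q of the partition given by `membership` (W symmetric).
modularity_sketch <- function(W, membership) {
  d <- rowSums(W)                               # degrees d_i
  two_m <- sum(d)                               # 2m
  P <- outer(d, d) / two_m                      # null model P_ij = d_i d_j / 2m
  same <- outer(membership, membership, "==")   # x_i, x_j in the same A_k
  sum((W - P)[same]) / two_m
}

# Two triangles {1, 2, 3} and {4, 5, 6} linked by the edge 3-4
W <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W[edges] <- 1
W[edges[, 2:1]] <- 1
modularity_sketch(W, c(1, 1, 1, 2, 2, 2))  # 5/14, about 0.357
modularity_sketch(W, rep(1, 6))            # a single community gives 0
```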
Advantages and drawbacks
mincut is not adapted to vertex clustering in practice (clusters with isolated vertices);
the other three problems are NP-hard to solve...
modularity takes into account the heterogeneity of the degree distribution by correcting the importance of a vertex by its degree: it is often better adapted to real-life graphs;
[Fortunato and Barthélémy, 2007] showed that modularity has a resolution limit (communities smaller than a given scale may not be detected). [Bickel and Chen, 2009] gave conditions for the consistency of the clusters obtained by modularity optimization in Stochastic Block Models (SBM).
Remark: relaxing the RatioCut and NCut problems gives spectral clustering. Modularity optimization is usually solved by approximation methods.
A short description of approximation methods for
modularity optimization
simple greedy algorithms ([Newman, 2004] and [Clauset et al., 2004] for a fast version): hierarchical clustering that successively merges the pair of communities giving the highest increase in modularity (cluster_fast_greedy)
multi-level greedy algorithms ([Blondel et al., 2008], also known as the “Louvain algorithm”, and [Noack and Rotta, 2009] for an improved version): hierarchical approach in which vertices are greedily re-assigned to a different community (cluster_louvain)
simulated annealing ([Reichardt and Bornholdt, 2006] uses a spin-glass model which, in some cases, is equivalent to modularity maximization) (cluster_spinglass(..., gamma = 1, update.rule = "config"))
...to be compared (when usable) with the exact optimization (cluster_optimal).
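To illustrate the simple greedy principle, here is a naive base R sketch of the agglomerative approach: start from singletons and repeatedly merge the two communities with the largest modularity gain $\Delta Q_{kl} = 2(e_{kl} - a_k a_l)$, where $e_{kl}$ is the fraction of edge ends linking communities k and l and $a_k = \sum_l e_{kl}$; it stops when no merge increases modularity. It is a dense-matrix illustration with a hypothetical function name, not the heap-based implementation behind cluster_fast_greedy.

```r
# Naive greedy agglomerative modularity optimization.
greedy_modularity <- function(W) {
  n <- nrow(W)
  e <- W / sum(W)                    # e[k, l]: fraction of edge ends k <-> l
  groups <- as.list(seq_len(n))      # current communities (vertex indices)
  while (length(groups) > 1) {
    a <- rowSums(e)
    dQ <- 2 * (e - outer(a, a))      # modularity gain of merging k and l
    diag(dQ) <- -Inf
    best <- which(dQ == max(dQ), arr.ind = TRUE)[1, ]
    if (dQ[best[1], best[2]] <= 0) break          # no improving merge left
    k <- min(best); l <- max(best)
    e[k, ] <- e[k, ] + e[l, ]                     # merge l into k
    e[, k] <- e[, k] + e[, l]
    e <- e[-l, -l, drop = FALSE]
    groups[[k]] <- c(groups[[k]], groups[[l]])
    groups[[l]] <- NULL
  }
  membership <- integer(n)
  for (k in seq_along(groups)) membership[groups[[k]]] <- k
  membership
}

# Two triangles {1, 2, 3} and {4, 5, 6} linked by the edge 3-4
W <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W[edges] <- 1
W[edges[, 2:1]] <- 1
greedy_modularity(W)  # 1 1 1 2 2 2
```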
Examples
res_time <- cbind(
  hierarchical = system.time(res_hierarchical <- cluster_fast_greedy(lesmis)),
  multilevel = system.time(res_multilevel <- cluster_louvain(lesmis)),
  annealing = system.time(res_annealing <- cluster_spinglass(lesmis)),
  exact = system.time(res_exact <- cluster_optimal(lesmis))
)[3, ]  # keep the third row of system.time: elapsed time (seconds)
## hierarchical multilevel annealing exact
## 0.002 0.002 1.907 21.656
Computational time (greedy approaches)
Comparison of the computational times of the first two approaches (100 evaluations each):
library(microbenchmark)
res_micro <- microbenchmark(cluster_fast_greedy(lesmis),
cluster_louvain(lesmis))
[Figure: computation times (Time [microseconds]) of cluster_fast_greedy(lesmis) and cluster_louvain(lesmis)]
Accuracy of the clustering
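[Figure: one panel per method, titled method − modularity − number of clusters]
hierarchical: modularity 0.5006, 5 clusters
multilevel: modularity 0.5556, 6 clusters
simulated annealing: modularity 0.5596, 7 clusters
exact: modularity 0.56, 6 clusters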
Assessing the relevance of a clustering
Given a graph, modularity optimization always returns a clustering: how can we know whether this clustering is meaningful (i.e., whether its modularity is large)?
As previously, compare the maximum modularity with the maximum modularity obtained over a large number of randomly generated graphs (with the same degree sequence).
[Figure: histogram (Frequency) of the maximum modularity over the random graphs, Modularity axis from 0.30 to 0.55]
Relation between RatioCut and Laplacian
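[von Luxburg, 2007] shows that minimizing
$$\mathrm{RatioCut}(A_1, A_2) = \frac{1}{2} \sum_{k=1}^{2} \frac{\sum_{x_i \in A_k,\, x_j \notin A_k} w_{ij}}{|A_k|}$$
is equivalent to the following constrained problem:
$$\min_{A_1, A_2} v^\top L v \quad \text{s.t. } v \perp \mathbf{1}_n \text{ and } \|v\| = \sqrt{n}$$
for $v$ the vector of $\mathbb{R}^n$ obtained from the partition by:
$$v_i = \begin{cases} \sqrt{|A_2| / |A_1|} & \text{if } x_i \in A_1 \\ -\sqrt{|A_1| / |A_2|} & \text{otherwise} \end{cases}$$
and $L$ is the Laplacian of the graph, the $n \times n$ matrix with entries:
$$L_{ij} = \begin{cases} -w_{ij} & \text{if } i \neq j \\ d_i = \sum_{j \neq i} w_{ij} & \text{otherwise.} \end{cases}$$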
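The equivalence can be checked numerically. In this base R sketch (hypothetical helper names), the vector v is built from a two-class partition; with the factor 1/2 in the RatioCut definition above, one gets $v^\top L v = 2n \, \mathrm{RatioCut}(A_1, A_2)$, and this constant factor does not change the minimizer.

```r
laplacian <- function(W) diag(rowSums(W)) - W     # L = D - W

# RatioCut as defined above (with the factor 1/2).
ratio_cut <- function(W, membership) {
  total <- 0
  for (k in unique(membership)) {
    in_k <- membership == k
    total <- total + sum(W[in_k, !in_k]) / sum(in_k)
  }
  total / 2
}

# Vector v associated with the partition (A1 = in_A1, A2 = its complement).
partition_vector <- function(in_A1) {
  n1 <- sum(in_A1)
  n2 <- sum(!in_A1)
  ifelse(in_A1, sqrt(n2 / n1), -sqrt(n1 / n2))
}

# Two triangles linked by the edge 3-4, unbalanced partition A1 = {1, 2}
W <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W[edges] <- 1
W[edges[, 2:1]] <- 1
in_A1 <- seq_len(6) <= 2
v <- partition_vector(in_A1)
c(sum_v = sum(v), norm2_v = sum(v^2),
  vLv = drop(t(v) %*% laplacian(W) %*% v),
  two_n_ratiocut = 2 * 6 * ratio_cut(W, ifelse(in_A1, 1, 2)))
# sum_v = 0, norm2_v = 6, vLv = 9, two_n_ratiocut = 9 (up to rounding)
```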
... and more remarks
this is a discrete (since v can only take two values) and NP-hard problem;
the same relation holds between the NCut problem and the normalized Laplacian $D^{-1/2} L D^{-1/2}$, in which $D = \mathrm{Diag}(d_1, \dots, d_n)$;
a generalization of these results exists for K > 2.
Some properties of the Laplacian
Relations with the graph structure: for a graph with five vertices, 1 to 5, whose connected components are {1, 2, 3} and {4, 5} [figure], L has a null space spanned by the vectors $(1, 1, 1, 0, 0)^\top$ and $(0, 0, 0, 1, 1)^\top$.
More generally, the null space of L is spanned by the indicator vectors of the connected components of the graph: in particular, the vector $\mathbf{1}_n$ spans the null space for connected graphs.
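The link between the null space and the connected components can be observed numerically: the number of (numerically) zero eigenvalues of L equals the number of connected components. The tolerance, the function names and the example graph (a triangle plus a separate edge) are assumptions of this sketch.

```r
laplacian <- function(W) diag(rowSums(W)) - W     # L = D - W

# Number of eigenvalues of L that are zero up to a numerical tolerance.
n_zero_eigenvalues <- function(W, tol = 1e-8) {
  values <- eigen(laplacian(W), symmetric = TRUE, only.values = TRUE)$values
  sum(abs(values) < tol)
}

# Hypothetical five-vertex graph: triangle {1, 2, 3} and edge {4, 5}
W5 <- matrix(0, 5, 5)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5))
W5[edges] <- 1
W5[edges[, 2:1]] <- 1
n_zero_eigenvalues(W5)                   # 2 connected components
laplacian(W5) %*% c(1, 1, 1, 0, 0)       # indicator of {1, 2, 3}: all zeros
```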
Random walk point of view: consider a random walk on the graph with probability $\frac{w_{ij}}{d_i}$ of jumping from node $x_i$ to node $x_j$. Then NCut(A1, A2) can be interpreted in terms of the probabilities, for this random walk at stationarity, of going from A1 to A2 and from A2 to A1.
Similarly, for this random walk, the average time needed to go from one node to another and back (commute time) can be computed from $L^+$, the Moore-Penrose pseudo-inverse of L [Fouss et al., 2007].
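Following [Fouss et al., 2007], the commute time between $x_i$ and $x_j$ is $\mathrm{vol}(G)\,(l^+_{ii} + l^+_{jj} - 2 l^+_{ij})$, where $\mathrm{vol}(G) = \sum_i d_i$ and $l^+_{ij}$ are the entries of $L^+$. Here is a base R sketch that computes the pseudo-inverse from the eigendecomposition of L; the tolerance and the function names are assumptions.

```r
laplacian <- function(W) diag(rowSums(W)) - W     # L = D - W

# Moore-Penrose pseudo-inverse of a symmetric matrix via its eigendecomposition
pseudo_inverse_sym <- function(M, tol = 1e-10) {
  ev <- eigen(M, symmetric = TRUE)
  keep <- abs(ev$values) > tol                    # drop the null space
  V <- ev$vectors[, keep, drop = FALSE]
  V %*% diag(1 / ev$values[keep], nrow = sum(keep)) %*% t(V)
}

# Commute times n(i, j) = vol(G) * (l+_ii + l+_jj - 2 l+_ij)
commute_times <- function(W) {
  Lp <- pseudo_inverse_sym(laplacian(W))
  vol <- sum(W)                                   # sum of the degrees
  l <- diag(Lp)
  vol * (outer(l, l, "+") - 2 * Lp)
}

# Path 1 - 2 - 3: going from 1 to 3 and back takes 8 steps on average
P3 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0), 3, 3)
round(commute_times(P3), 6)
```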
Spectral clustering: relaxing the constraints
K has to be given. Solve the relaxed problem $\min_{U} \mathrm{Tr}(U^\top L U)$ over $n \times K$ matrices U s.t. $U^\top U = I_K$:
1. Compute the first K eigenvectors of L (associated with the K smallest eigenvalues), $u^1, \dots, u^K$, and write $U = (u^1, \dots, u^K)$ (an $n \times K$ matrix).
2. For $i = 1, \dots, n$, denote by $u_i \in \mathbb{R}^K$ the i-th row of U. Cluster the points $(u_i)_{i=1,\dots,n}$ using a clustering algorithm (e.g., k-means).
embed_laplacian_matrix(..., no = ..., which = "sa", scaled = ...) and kmeans(..., centers = ..., nstart = 10)
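The two steps above can be sketched directly with base R's `eigen` and `kmeans`, as an alternative to `embed_laplacian_matrix`. This sketch uses the unnormalized Laplacian L; the function name and the default `nstart = 10` are assumptions.

```r
# Unnormalized spectral clustering (sketch): K eigenvectors of L associated
# with the K smallest eigenvalues, then k-means on the rows of U.
spectral_clustering <- function(W, K, nstart = 10) {
  n <- nrow(W)
  L <- diag(rowSums(W)) - W
  ev <- eigen(L, symmetric = TRUE)                  # decreasing eigenvalues
  U <- ev$vectors[, n:(n - K + 1), drop = FALSE]    # n x K matrix
  kmeans(U, centers = K, nstart = nstart)$cluster
}

# Two triangles {1, 2, 3} and {4, 5, 6} linked by the edge 3-4
W <- matrix(0, 6, 6)
edges <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W[edges] <- 1
W[edges[, 2:1]] <- 1
set.seed(1)
spectral_clustering(W, K = 2)   # separates {1, 2, 3} from {4, 5, 6}
```

Because `eigen` returns eigenvalues in decreasing order, the K smallest ones correspond to the last K columns of `ev$vectors`.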
Spectral clustering in practice
library(igraph)
res_time_spec <- system.time({
  # embedding on the eigenvectors of the 6 smallest eigenvalues of L (unscaled)
  spec_embed <- embed_laplacian_matrix(lesmis, no = 6, which = "sa",
                                       scaled = FALSE)
  # k-means on the rows of the embedding; the first column (presumably the
  # constant eigenvector of eigenvalue 0) is dropped
  res_spectral <- kmeans(spec_embed$X[ , -1], centers = 6, nstart = 1)
})[3]
res_time_spec
## elapsed
## 0.017
The computation time lies between that of the greedy approaches and that of simulated annealing for modularity optimization.
Accuracy of the clustering
method               modularity   number of clusters
spectral clustering  0.4461       6
exact                0.56         6
Modularity is smaller (as expected, since the exact approach maximizes modularity) and clusters tend to be more unbalanced. An empirical comparison between the performance of spectral clustering and modularity optimization is provided in [Bickel and Chen, 2009]. [Lei and Rinaldo, 2015] give conditions for the consistency of spectral clustering in stochastic block models.
A mixture model for networks
[Snijders and Nowicki, 1997]: the observed network G is assumed to be the realization of a random graph model in which vertices are organized in groups.
Description of the model:
vertices $x_i$ belong to an unknown class in $\{C_1, \dots, C_K\}$ (K is given) ⇒ latent (unobserved) variables
$Z_i \sim \mathcal{M}(1, \alpha = (\alpha_1, \dots, \alpha_K))$
in which $\alpha_k$ is the probability that $x_i$ belongs to $C_k$ ($Z_{ik} = 1$ if $x_i \in C_k$);
given the class memberships, the edges between pairs of vertices are independent and distributed as
$w_{ij} \mid \{Z_{ik} Z_{jk'} = 1\} \sim \mathcal{L}(\cdot, \pi_{kk'})$
for a given distribution $\mathcal{L}$.
Typically, $\mathcal{L}$ is the Bernoulli distribution with parameter $\pi_{kk'}$, where $\pi_{kk'} = p_1$ if $k = k'$ and $\pi_{kk'} = p_0$ if $k \neq k'$, with $p_1 > p_0$.
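To make the generative model concrete, here is a minimal base-R sketch that simulates an undirected binary SBM: classes are drawn from $\mathcal{M}(1, \alpha)$, then each pair i < j is linked independently with probability $\pi_{z_i z_j}$. The function `simulate_sbm` and the parameter values (K = 3, p1 = 0.5, p0 = 0.05) are illustrative assumptions.

```r
# Simulate an undirected binary SBM (Bernoulli edges, no self loops).
simulate_sbm <- function(n, alpha, Pi) {
  K <- length(alpha)
  stopifnot(all(dim(Pi) == c(K, K)), isSymmetric(Pi), abs(sum(alpha) - 1) < 1e-8)
  z <- sample.int(K, n, replace = TRUE, prob = alpha)  # latent classes Z_i
  P <- Pi[z, z]                                        # P[i, j] = pi_{z_i z_j}
  W <- matrix(0L, n, n)
  up <- upper.tri(W)
  W[up] <- rbinom(sum(up), 1, P[up])                   # independent Bernoulli edges
  W <- W + t(W)                                        # symmetrize; diagonal stays 0
  list(z = z, W = W)
}

set.seed(42)
K <- 3
Pi <- matrix(0.05, K, K)   # p0 between groups
diag(Pi) <- 0.5            # p1 within groups
sim <- simulate_sbm(60, alpha = rep(1 / K, K), Pi = Pi)
table(sim$z)
```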
Basic principle for using SBM
1. assignment of vertices to groups;
2. parameter estimation ($(\alpha_k)_k$ and $(\pi_{kk'})_{k,k'}$);
3. estimation of the number of groups.
Estimation is performed with Bayesian approaches or with frequentist approaches based on variational EM (see e.g., [Daudin et al., 2008] for the latter, which is computationally more efficient). The number of groups can be chosen using ICL [Biernacki et al., 2000].
All of this is implemented in the R package blockmodels [Léger, 2016]:
res_sbm <- BM_bernoulli("SBM_sym", as_adjacency_matrix(..., sparse = FALSE))
res_sbm$estimate()
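To illustrate step 3, here is a base-R sketch of an ICL criterion for the undirected binary SBM, evaluated for a given hard assignment z with plug-in estimates of α and π. The penalty form, (K(K+1)/2)/2 · log(n(n-1)/2) for π plus (K-1)/2 · log(n) for α, is assumed to follow [Daudin et al., 2008]; blockmodels works with variational (soft) memberships, so its values may differ. The function `icl_sbm` and the toy graph are hypothetical.

```r
# ICL of a hard assignment z for an undirected binary SBM (sketch).
icl_sbm <- function(W, z) {
  n <- nrow(W)
  K <- max(z)
  xlogy <- function(x, y) ifelse(x == 0, 0, x * log(y))  # convention 0 log 0 = 0
  nk <- tabulate(z, nbins = K)
  loglik <- sum(xlogy(nk, nk / n))            # sum_i log(alpha_{z_i})
  up <- upper.tri(W)                          # each pair i < j counted once
  for (k in seq_len(K)) {
    for (l in k:K) {
      # pairs with one vertex in class k and the other in class l
      pair <- up & (outer(z == k, z == l, "&") | outer(z == l, z == k, "&"))
      m <- sum(pair)
      e <- sum(W[pair])
      if (m > 0) {
        p <- e / m                            # estimate of pi_{kl}
        loglik <- loglik + xlogy(e, p) + xlogy(m - e, 1 - p)
      }
    }
  }
  # assumed penalty: K(K+1)/2 connection parameters and K - 1 proportions
  loglik - (K * (K + 1) / 2) / 2 * log(n * (n - 1) / 2) - (K - 1) / 2 * log(n)
}

# Two disjoint 5-cliques: the 2-group assignment should get a higher ICL
W <- matrix(0, 10, 10)
W[1:5, 1:5] <- 1
W[6:10, 6:10] <- 1
diag(W) <- 0
z_true <- rep(1:2, each = 5)
c(K1 = icl_sbm(W, rep(1, 10)), K2 = icl_sbm(W, z_true))
```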
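SBM in practice
library(igraph)       # as_adjacency_matrix()
library(blockmodels)  # BM_bernoulli()
res_time_sbm <- system.time({
  # binary symmetric (undirected) SBM fitted on the adjacency matrix
  res_sbm <- BM_bernoulli("SBM_sym",
                          as_adjacency_matrix(lesmis, sparse = FALSE))
  res_sbm$estimate()  # fits the model for several numbers of groups
})[3]
res_time_sbm
## elapsed
## 1.821
opt_K <- which.max(res_sbm$ICL)  # number of groups maximizing ICL
opt_K
## [1] 6
# hard assignment: most probable group of each vertex
sbm_clust <- apply(res_sbm$memberships[[opt_K]]$Z, 1, which.max)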
Accuracy of the clustering
method               modularity   number of clusters
SBM clustering       0.4556       6
exact                0.56         6
Modularity is smaller (as expected, since the exact approach maximizes modularity), but groups can be interpreted as sets of vertices with similar connection patterns.
Comparing clusterings
Various metrics (similarities or dissimilarities) exist to compare clusterings, among which:
Rand Index [Rand, 1971], compare(..., method = "rand"): the fraction of pairs of vertices on which the two clusterings agree (the two vertices are either in the same cluster in both clusterings or in different clusters in both):
$\mathrm{RI} = \dfrac{\text{number of agreements between the two clusterings}}{\binom{n}{2}}$
Normalized Mutual Information [Danon et al., 2005], compare(..., method = "nmi"), built from the mutual information
$I = \sum_{k=1}^{K_1} \sum_{k'=1}^{K_2} \frac{n_{kk'}}{n} \log\left(\frac{n_{kk'}\, n}{n^1_k\, n^2_{k'}}\right)$
in which $K_j$ is the number of clusters in clustering j, $n^j_k$ is the number of vertices classified into cluster k for clustering j and $n_{kk'}$ is the number of vertices classified into cluster k for clustering 1 and into cluster k' for clustering 2. The similarity is normalized so that it lies between 0 and 1 (1 is for a perfect match).
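Both measures can be sketched in base R as below. The NMI is normalized as 2I/(H1 + H2), where H_j is the entropy of clustering j, which is the normalization of [Danon et al., 2005]; the function names and the small label vectors are hypothetical, and igraph's `compare` remains the reference implementation used in the slides.

```r
# Rand index and normalized mutual information between two clusterings.
rand_index <- function(c1, c2) {
  stopifnot(length(c1) == length(c2))
  same1 <- outer(c1, c1, "==")      # TRUE if i and j are together in clustering 1
  same2 <- outer(c2, c2, "==")
  up <- upper.tri(same1)            # the choose(n, 2) pairs i < j
  mean(same1[up] == same2[up])      # fraction of pairs on which they agree
}

nmi <- function(c1, c2) {
  n <- length(c1)
  tab <- table(c1, c2)              # n_{kk'}
  n1 <- rowSums(tab)                # n^1_k
  n2 <- colSums(tab)                # n^2_{k'}
  nz <- tab > 0                     # empty cells contribute 0
  I <- sum(tab[nz] / n * log(tab[nz] * n / outer(n1, n2)[nz]))
  H1 <- -sum(n1 / n * log(n1 / n))  # entropy of clustering 1
  H2 <- -sum(n2 / n * log(n2 / n))  # entropy of clustering 2
  if (H1 + H2 == 0) return(1)       # convention chosen here: two single-cluster partitions
  2 * I / (H1 + H2)
}

rand_index(c(1, 1, 2, 2), c(1, 1, 1, 2))  # 0.5: 3 of the 6 pairs agree
nmi(c(1, 1, 2, 2), c(2, 2, 1, 1))         # 1: same partition, labels swapped
```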
How do clusterings relate?
Method:
1. compute a dissimilarity between each pair of clusterings as 1 − (Rand index) or 1 − NMI;
2. cluster the clusterings themselves (i.e., the results of the vertex clustering methods) with hierarchical clustering (hclust).
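The method can be sketched in base R: compute 1 - Rand index between every pair of clusterings, then apply hierarchical clustering with complete linkage (as in the figure). The three clusterings below are hypothetical; A and B are the same partition with swapped labels, so they should be merged first, at height 0.

```r
# Hierarchical clustering of clusterings with the dissimilarity 1 - Rand index.
rand_index <- function(c1, c2) {
  same1 <- outer(c1, c1, "==")
  same2 <- outer(c2, c2, "==")
  up <- upper.tri(same1)
  mean(same1[up] == same2[up])      # fraction of agreeing pairs
}

cluster_clusterings <- function(clusterings) {
  m <- length(clusterings)
  D <- matrix(0, m, m, dimnames = list(names(clusterings), names(clusterings)))
  for (a in seq_len(m)) {
    for (b in seq_len(m)) {
      D[a, b] <- 1 - rand_index(clusterings[[a]], clusterings[[b]])
    }
  }
  hclust(as.dist(D), method = "complete")
}

clusterings <- list(A = c(1, 1, 1, 2, 2, 2),
                    B = c(2, 2, 2, 1, 1, 1),
                    C = c(1, 2, 1, 2, 1, 2))
hc <- cluster_clusterings(clusterings)
hc$merge    # first row: A and B merged together
hc$height   # merge heights; plot(hc) draws the dendrogram
```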
[Figure: dendrograms obtained with hclust (complete linkage) of the clusterings sbm, spectral, hierarchical, multilevel, annealing and exact; left panel "Rand index" (as.dist(compare_rand)), right panel "NMI" (as.dist(compare_nmi)); vertical axis: Height.]
Any questions?
Bender, E. and Canfield, E. (1978).
The asymptotic number of labeled graphs with given degree sequences.
Journal of Combinatorial Theory, Series A, 24(3):296–307.
Bickel, P. and Chen, A. (2009).
A nonparametric view of network models and Newman-Girvan and other modularities.
Proceedings of the National Academy of Sciences, USA, 106(50):21068–21073.
Biernacki, C., Celeux, G., and Govaert, G. (2000).
Assessing a mixture model for clustering with the integrated completed likelihood.
IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):719–725.
Blondel, V., Guillaume, J., Lambiotte, R., and Lefebvre, E. (2008).
Fast unfolding of communities in large networks.
Journal of Statistical Mechanics: Theory and Experiment, P10008.
Brohée, S. and van Helden, J. (2006).
Evaluation of clustering algorithms for protein-protein interaction networks.
BMC Bioinformatics, 7(488).
Clauset, A., Newman, M. E. J., and Moore, C. (2004).
Finding community structure in very large networks.
Physical Review E, 70:066111.
Danon, L., Diaz-Guilera, A., Duch, J., and Arenas, A. (2005).
Comparing community structure identification.
Journal of Statistical Mechanics, page P09008.
Daudin, J., Picard, F., and Robin, S. (2008).
A mixture model for random graphs.
Statistics and Computing, 18:173–183.
Fortunato, S. and Barthélémy, M. (2007).
Resolution limit in community detection.
Proceedings of the National Academy of Sciences, USA, 104(1):36–41.
doi:10.1073/pnas.0605965104; URL: http://www.pnas.org/content/104/1/36.abstract.
Fouss, F., Pirotte, A., Renders, J., and Saerens, M. (2007).
Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation.
IEEE Transactions on Knowledge and Data Engineering, 19(3):355–369.
Léger, J. (2016).
Blockmodels: a R-package for estimating in LBM and SBM, with many pdf, with or without covariates.
Preprint arXiv 1602.07587v1. Submitted for publication.
Lei, J. and Rinaldo, A. (2015).
Consistency of spectral clustering in stochastic block models.
The Annals of Statistics, 43(1):215–237.
Milo, R., Kashtan, N., Itzkovitz, S., Newman, M., and Alon, U. (2004).
On the uniform generation of random graphs with prescribed degree sequences.
eprint arXiv: cond-mat/0312028v2.
Newman, M. (2004).
Fast algorithm for detecting community structure in networks.
Physical Review E, 69:066133.
Newman, M. and Girvan, M. (2004).
Finding and evaluating community structure in networks.
Physical Review E, 69:026113.
Noack, A. and Rotta, R. (2009).
Multi-level algorithms for modularity clustering.
In SEA 2009: Proceedings of the 8th International Symposium on Experimental Algorithms, pages 257–268,
Berlin, Heidelberg. Springer-Verlag.
Rand, W. (1971).
Objective criteria for the evaluation of clustering methods.
Journal of the American Statistical Association, 66(336):846–850.
Rao, A., Jana, R., and Bandyopadhyay, S. (1996).
A Markov chain Monte Carlo method for generating random (0, 1)-matrices with given marginals.
Sankhyā: The Indian Journal of Statistics, Series A (1961-2002), 58(2):225–242.
Reichardt, J. and Bornholdt, S. (2006).
Statistical mechanics of community detection.
Physical Review E, 74:016110.
Roberts Jr., J. (2000).
Simple methods for simulating sociomatrices with given marginal totals.
Social Networks, 22(3):273–283.
Schaeffer, S. (2007).
Graph clustering.
Computer Science Review, 1(1):27–64.
Snijders, T. and Nowicki, K. (1997).
Estimation and prediction for stochastic block-structures for graphs with latent block structure.
Journal of Classification, 14:75–100.
Tabourier, L., Roth, C., and Cointet, J. (2011).
Generating constrained random graphs using multiple edge switches.
ACM Journal of Experimental Algorithmics, 16(1):1.7.
von Luxburg, U. (2007).
A tutorial on spectral clustering.
Statistics and Computing, 17(4):395–416.
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 

Dernier

SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...jana861314
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxUmerFayaz5
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...Sérgio Sacani
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )aarthirajkumar25
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...anilsa9823
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000Sapana Sha
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSSLeenakshiTyagi
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPirithiRaju
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINsankalpkumarsahoo174
 

Dernier (20)

SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
Traditional Agroforestry System in India- Shifting Cultivation, Taungya, Home...
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Animal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptxAnimal Communication- Auditory and Visual.pptx
Animal Communication- Auditory and Visual.pptx
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
Lucknow 💋 Russian Call Girls Lucknow Finest Escorts Service 8923113531 Availa...
 
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
9953056974 Young Call Girls In Mahavir enclave Indian Quality Escort service
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
DIFFERENCE IN BACK CROSS AND TEST CROSS
DIFFERENCE IN  BACK CROSS AND TEST CROSSDIFFERENCE IN  BACK CROSS AND TEST CROSS
DIFFERENCE IN BACK CROSS AND TEST CROSS
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATINChromatin Structure | EUCHROMATIN | HETEROCHROMATIN
Chromatin Structure | EUCHROMATIN | HETEROCHROMATIN
 

Graph mining 2: Statistical approaches for graph mining

  • 1. Graph mining 2 Statistical approaches for graph mining Nathalie Villa-Vialaneix nathalie.villa@toulouse.inra.fr http://www.nathalievilla.org Advanced mathematics for network analysis Luchon, May 3rd 2016 Nathalie Villa-Vialaneix | Graph mining 2 1/48
  • 2. Talk map... Who am I? A statistician working in biostatistics at INRA Toulouse. My research interests are data mining, network inference and mining, and machine learning. Purpose of this talk: to present a few statistical tools for graph mining (graph structure, important vertices) and for clustering.
  • 5. Background. Unless stated otherwise, G is: an undirected and connected graph; with vertex set $V = \{x_1, \dots, x_n\}$; with edge set $E$; possibly with (positive and symmetric) edge weights $w_{ij}$ (such that $w_{ii} = 0$: no self-loops); with adjacency matrix $A = (w_{ij})_{i,j=1,\dots,n}$ (for an unweighted graph, $w_{ij} = 1$ if $(x_i, x_j) \in E$ and $0$ otherwise).
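As a minimal illustration of this notation, the base R sketch below builds the adjacency matrix A of a small hypothetical weighted graph from an edge list. The edge list and its column names (from, to, weight) are assumptions made for the example; they are not part of the lesmis data.

```r
# Hypothetical weighted edge list (undirected, positive weights, no self-loop).
edges <- data.frame(from = c(1, 1, 2, 3), to = c(2, 3, 3, 4),
                    weight = c(1, 2, 1, 3))
n <- 4

# Adjacency matrix A = (w_ij): symmetric, with zero diagonal (w_ii = 0).
A <- matrix(0, nrow = n, ncol = n)
A[cbind(edges$from, edges$to)] <- edges$weight
A[cbind(edges$to, edges$from)] <- edges$weight  # symmetry: w_ji = w_ij
print(A)
```

With igraph, as_adjacency_matrix(g, attr = "weight") should return the same matrix (in sparse form by default) for a graph carrying a weight edge attribute.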
  • 8. Examples are made with: the toy example “Les Misérables” (co-appearance network of the characters of Hugo’s novel); the R software, and especially the R package igraph; the full script and the dataset are available on my website at: http://www.nathalievilla.org/teaching/toconet.html [Figure: the “Les Misérables” co-appearance network, each vertex labelled with a character name (Myriel, Valjean, Fantine, Cosette, Javert, Marius, ...)]
  • 9. Basic description of the graph:
lesmis
## IGRAPH U--- 77 254 --
## + attr: layout (g/n), id (v/n), label (v/c), value (e/n)
## + edges:
## [1] 1-- 2 1-- 3 1-- 4 3-- 4 1-- 5 1-- 6 1-- 7 1-- 8 1-- 9 1--10
## [11] 11--12 4--12 3--12 1--12 12--13 12--14 12--15 12--16 17--18 17--19
## [21] 18--19 17--20 18--20 19--20 17--21 18--21 19--21 20--21 17--22 18--22
## [31] 19--22 20--22 21--22 17--23 18--23 19--23 20--23 21--23 22--23 17--24
## [41] 18--24 19--24 20--24 21--24 22--24 23--24 13--24 12--24 24--25 12--25
## [51] 25--26 24--26 12--26 25--27 12--27 17--27 26--27 12--28 24--28 26--28
## [61] 25--28 27--28 12--29 28--29 24--30 28--30 12--30 24--31 31--32 12--32
## [71] 24--32 28--32 12--33 12--34 28--34 12--35 30--35 12--36 35--36 30--36
## + ... omitted several edges
The graph has 77 vertices and 254 edges; U--- means: Undirected, not Named (no name attribute for the vertices), not Weighted (no weight attribute for the edges) and not Bipartite.
  • 10. System information
## R version 3.2.5 (2016-04-14)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 14.04.4 LTS
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] igraph_1.0.1 knitr_1.12.3
##
## loaded via a namespace (and not attached):
## [1] magrittr_1.5 formatR_1.3 tools_3.2.5 stringi_1.0-1
## [5] highr_0.5.1 stringr_1.0.0 evaluate_0.8.3
  • 11. Outline: Numerical characteristics; Clustering (modularity optimization, spectral clustering, model-based clustering)
  • 14. Sketch of this section. Issue at stake: a graph is given; numerical characteristics describing the graph and its nodes are a standard way to describe it; how can we know whether the observed values are unexpected with respect to a so-called “null model”?
  • 15. Standard (global) characteristics (the corresponding igraph function is given in brackets): density: $|E| / \big(n(n-1)/2\big)$ [graph.density]; number of triangles [triangles] (see also motifs); transitivity: three times the number of triangles divided by the number of connected triplets (triplets of vertices linked by at least two edges, counted once per central vertex) [transitivity]; diameter: length of the longest shortest path between two nodes [diameter]; radius: minimum, over all vertices of the graph, of the length of the longest shortest path linking this vertex to another vertex [radius]; girth: length of the shortest cycle in the graph [girth]; cohesion: minimum number of vertices to remove to disconnect the graph [cohesion].
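The first three definitions can be checked by hand on a small example. The base R sketch below (the toy graph is hypothetical) computes the density, the number of triangles (via the trace of A³, in which each triangle is counted 6 times) and the transitivity as three times the number of triangles over the number of connected triplets. On an unweighted graph, these should agree with igraph's graph.density(g), length(triangles(g))/3 and transitivity(g).

```r
# Hypothetical toy graph: triangle 1-2-3 plus the pendant edge 3-4.
e <- rbind(c(1, 2), c(1, 3), c(2, 3), c(3, 4))
n <- 4
A <- matrix(0, n, n)
A[e] <- 1
A[e[, 2:1]] <- 1                         # undirected: symmetric matrix

m <- sum(A) / 2                          # |E|
graph_density <- m / (n * (n - 1) / 2)   # |E| / (n(n-1)/2)

A3 <- A %*% A %*% A
n_triangles <- sum(diag(A3)) / 6         # each triangle counted 6 times in trace(A^3)

deg <- rowSums(A)
connected_triplets <- sum(deg * (deg - 1) / 2)  # pairs of edges sharing a central vertex
transitivity_global <- 3 * n_triangles / connected_triplets

cat(graph_density, n_triangles, transitivity_global, "\n")
```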
  • 16. Standard (global) characteristics for “Les Misérables”
graph.density(lesmis); triangles(lesmis); length(triangles(lesmis))/3
## [1] 0.08680793
## + 1401/77 vertices:
## [1] 12 1 3 12 1 4 12 3 4 12 24 32 12 24 13 12 24 25 12 24 30 12 25
## [24] 71 12 25 70 12 25 69 12 25 27 12 26 24 12 26 25 12 26 27 12 26 72 12
## [47] 26 71 12 26 70 12 26 69 12 27 73 12 27 52 12 27 50 12 27 44 12 28 73
## [70] 12 28 24 12 28 25 12 28 26 12 28 27 12 28 29 12 28 30 12 28 32 12 28
## [93] 34 12 28 44 12 28 72 12 28 59 12 28 69 12 28 70 12 28 71 12 29 45 12
## [116] 30 39 12 30 38 12 30 37 12 30 35 12 30 36 12 35 39 12 35 38 12 35 36
## [139] 12 35 37 12 36 39 12 36 38 12 36 37 12 37 39 12 37 38 12 38 39 12 49
## [162] 26 12 49 28 12 49 56 12 49 59 12 49 65 12 49 69 12 49 70 12 49 72 12
## [185] 50 52 12 56 26 12 56 27 12 56 65 12 56 50 12 56 52 12 56 59 12 59 71
## [208] 12 59 65 12 69 72 12 69 71 12 69 70 12 70 72 12 70 71 12 71 72 49 26
## + ... omitted several vertices
## [1] 467
transitivity(lesmis); diameter(lesmis); radius(lesmis); girth(lesmis)
## [1] 0.4989316
## [1] 5
## [1] 3
## $girth
## [1] 3
##
## $circle
## + 3/77 vertices:
## [1] 3 1 4
triangles() lists the vertices of all triangles, three vertices per triangle (1401 = 3 × 467), hence the division by 3 to count them.
  • 18. Comparison with random graphs... Erdős-Rényi model with the same number of nodes and the same number of edges as the original graph (uniform probability of observing an edge between any two given nodes). Method: compare the observed values with those of a large number of randomly generated graphs (without loops; only connected graphs are kept), obtained with sample_gnm(vcount(lesmis), ecount(lesmis))
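This null model can be sketched without igraph: draw m vertex pairs uniformly among the n(n-1)/2 possible ones (no loops, no multiple edges), then reject the draw if the graph is not connected. The base R sketch below, with hypothetical helper names, mimics sample_gnm followed by a connectivity filter; it is not igraph's implementation.

```r
# Erdos-Renyi G(n, m): m edges drawn uniformly among all n(n-1)/2 vertex pairs.
sample_gnm_adj <- function(n, m) {
  pairs <- which(upper.tri(matrix(0, n, n)), arr.ind = TRUE)  # all pairs i < j
  chosen <- pairs[sample.int(nrow(pairs), m), , drop = FALSE]
  A <- matrix(0, n, n)
  A[chosen] <- 1
  A[chosen[, 2:1]] <- 1
  A
}

# Connectivity check by breadth-first search from vertex 1.
is_connected_adj <- function(A) {
  n <- nrow(A)
  seen <- c(TRUE, rep(FALSE, n - 1))
  frontier <- 1
  while (length(frontier) > 0) {
    nb <- which(colSums(A[frontier, , drop = FALSE]) > 0 & !seen)
    seen[nb] <- TRUE
    frontier <- nb
  }
  all(seen)
}

# Rejection step: draw again until the sampled graph is connected.
sample_connected_gnm <- function(n, m) {
  repeat {
    A <- sample_gnm_adj(n, m)
    if (is_connected_adj(A)) return(A)
  }
}

set.seed(1)
G <- sample_connected_gnm(77, 254)  # same size as lesmis
```

Repeating the last call B = 500 times and computing the statistics of the previous slides on each draw gives the kind of reference distribution summarized on the next slide.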
  • 19. Results of the comparison with random graphs... For B = 500 graphs (only connected graphs are kept), we have:
## density triangles transitivity diameter
## Min. :0.08681 Min. :31.00 Min. :0.05834 Min. :4.000
## 1st Qu.:0.08681 1st Qu.:43.00 1st Qu.:0.07907 1st Qu.:4.000
## Median :0.08681 Median :47.00 Median :0.08701 Median :5.000
## Mean :0.08681 Mean :47.55 Mean :0.08660 Mean :4.627
## 3rd Qu.:0.08681 3rd Qu.:52.00 3rd Qu.:0.09415 3rd Qu.:5.000
## Max. :0.08681 Max. :67.00 Max. :0.11793 Max. :6.000
## radius girth cohesion
## Min. :3.000 Min. :3 Min. :1.000
## 1st Qu.:3.000 1st Qu.:3 1st Qu.:1.000
## Median :3.000 Median :3 Median :2.000
## Mean :3.004 Mean :3 Mean :1.599
## 3rd Qu.:3.000 3rd Qu.:3 3rd Qu.:2.000
## Max. :4.000 Max. :3 Max. :3.000
compared to the observed values (density, triangles, transitivity, diameter, radius, girth, cohesion): 0.0868079, 467, 0.4989316, 5, 3, 3, 1 ⇒ all values are typical except for: the number of triangles and the transitivity, which are much larger: local connectivity is stronger than expected in Erdős-Rényi random graphs; the cohesion, which lies among the lowest values expected in Erdős-Rényi random graphs: this again indicates a stronger local connectivity.
  • 20. Standard (local) characteristics... for the vertex $x_i$: degree: $|\{x_j : (x_i, x_j) \in E,\ j \neq i\}|$ [degree] (or strength for the weighted version, $\sum_{j \neq i} w_{ij}$ [strength]); betweenness (betweenness centrality): sum, over all pairs of vertices other than $x_i$, of the fraction of the shortest paths between them that pass through $x_i$ [betweenness]; eccentricity: maximal length of the shortest paths going from $x_i$ to any other vertex in the graph [eccentricity]; closeness (closeness centrality): $1 / \sum_{j \neq i} d(x_i, x_j)$, in which $d(x_i, x_j)$ is the length of the shortest path between $x_i$ and $x_j$ [closeness]; ...and their distributions among all vertices.
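For small graphs, all four definitions can be computed from breadth-first searches. In the base R sketch below (helper names and toy graphs are hypothetical), bfs_paths returns both the shortest path lengths and the numbers of shortest paths between all pairs, and betweenness is obtained by brute force from the fact that a vertex v lies on a shortest s-t path exactly when d(s, v) + d(v, t) = d(s, t). Dedicated algorithms are much faster (e.g. Brandes' algorithm for betweenness); this is only meant to make the definitions concrete.

```r
# Shortest path lengths (D) and numbers of shortest paths (S) between all
# pairs of vertices of an unweighted graph, one breadth-first search per source.
bfs_paths <- function(A) {
  n <- nrow(A)
  D <- matrix(Inf, n, n)
  S <- matrix(0, n, n)
  for (s in seq_len(n)) {
    D[s, s] <- 0
    S[s, s] <- 1
    frontier <- s
    d <- 0
    while (length(frontier) > 0) {
      d <- d + 1
      links <- A[frontier, , drop = FALSE]
      nb <- which(colSums(links) > 0 & is.infinite(D[s, ]))
      D[s, nb] <- d
      # shortest paths to a new vertex = sum over its neighbours one level closer
      S[s, nb] <- colSums(links * S[s, frontier])[nb]
      frontier <- nb
    }
  }
  list(D = D, S = S)
}

# Betweenness by brute force: sum over pairs s < t (both different from v)
# of the fraction of shortest s-t paths going through v.
betweenness_brute <- function(D, S) {
  n <- nrow(D)
  b <- numeric(n)
  for (v in seq_len(n)) for (s in seq_len(n)) for (t in seq_len(n)) {
    if (s < t && s != v && t != v && D[s, v] + D[v, t] == D[s, t]) {
      b[v] <- b[v] + S[s, v] * S[v, t] / S[s, t]
    }
  }
  b
}

# Hypothetical toy example: the path graph 1-2-3-4.
e <- rbind(c(1, 2), c(2, 3), c(3, 4))
A <- matrix(0, 4, 4)
A[e] <- 1
A[e[, 2:1]] <- 1
P <- bfs_paths(A)

deg <- rowSums(A)
ecc <- apply(P$D, 1, max)        # eccentricity
clo <- 1 / rowSums(P$D)          # closeness: 1 / sum_{j != i} d(x_i, x_j)
btw <- betweenness_brute(P$D, P$S)
diam <- max(ecc)                 # diameter = largest eccentricity
rad <- min(ecc)                  # radius = smallest eccentricity
```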
  • 21. Standard (local) characteristics for “Les Misérables”
summary(degree(lesmis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 6.000 6.597 10.000 36.000
summary(betweenness(lesmis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00 0.00 0.00 62.36 22.92 1624.00
summary(eccentricity(lesmis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.00 4.00 4.00 4.13 5.00 5.00
summary(closeness(lesmis))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.003378 0.004484 0.005181 0.005123 0.005435 0.008475
  • 23. Comparison with random graphs... Erdős-Rényi model with the same number of nodes and the same number of edges as the original graph (uniform probability of observing an edge between any two given nodes). Method: compare the observed values (average degree, betweenness, eccentricity and closeness over all vertices) with those of a large number of randomly generated graphs (without loops; only connected graphs are kept), obtained with sample_gnm(vcount(lesmis), ecount(lesmis))
  • 24. Results of the comparison with random graphs... For B = 500 graphs (only connected graphs are kept), we have (averages over the vertices):
## degree betweenness eccentricity closeness
## Min. :6.597 Min. :54.64 Min. :3.597 Min. :0.005249
## 1st Qu.:6.597 1st Qu.:55.93 1st Qu.:3.779 1st Qu.:0.005322
## Median :6.597 Median :56.32 Median :3.857 Median :0.005340
## Mean :6.597 Mean :56.36 Mean :3.863 Mean :0.005340
## 3rd Qu.:6.597 3rd Qu.:56.71 3rd Qu.:3.909 3rd Qu.:0.005361
## Max. :6.597 Max. :58.79 Max. :4.688 Max. :0.005430
compared to the observed averages: 6.597, 62.364, 4.13, 0.00512 ⇒ the observed average betweenness is higher, and the observed average closeness is smaller, than for any of the randomly generated graphs: this suggests that, on average, shortest paths in the observed graph are longer than expected for graphs with a uniform distribution of edges.
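The link between average betweenness and path lengths can be made explicit. Assuming, as here, a connected undirected graph whose betweenness counts each unordered pair of vertices once, every shortest path between $x_s$ and $x_t$ has $d(x_s, x_t) - 1$ intermediate vertices, and averaging over the shortest paths of a pair does not change this count. Summing over pairs gives the identity below, where $\ell$ denotes the average shortest path length; plugging in the averages reported above ($n = 77$) is only a back-of-the-envelope check, not a result from the original analysis.

```latex
\sum_{i=1}^{n} b(x_i) = \sum_{s<t} \big(d(x_s, x_t) - 1\big)
\quad\Longrightarrow\quad
\bar{b} = \frac{1}{n}\sum_{i=1}^{n} b(x_i) = \frac{(n-1)(\ell - 1)}{2},
\qquad
\ell = \frac{2}{n(n-1)} \sum_{s<t} d(x_s, x_t).

% With n = 77, (n - 1)/2 = 38:
\text{observed: } \ell = 1 + \frac{62.364}{38} \approx 2.64,
\qquad
\text{random graphs (median): } \ell = 1 + \frac{56.32}{38} \approx 2.48.
```

So a higher average betweenness is equivalent to a longer average shortest path, which is consistent with the interpretation given on the slide.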
  • 25. Degree distribution for “Les Misérables” [Figure: log-log plot of the degree distribution, log(P(k)) versus log(k)]. Estimated power law exponent: α = 1.49, obtained with fit_power_law(degree(lesmis) + 1, implementation = "R.mle")
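For intuition on what such a fit estimates, the continuous maximum likelihood estimator of a power law exponent has the closed form α̂ = 1 + N / Σ ln(k_i / k_min) over the N observations k_i ≥ k_min. The base R sketch below implements this continuous approximation only, under a hypothetical function name; it is not claimed to reproduce what fit_power_law computes, and applied to the lesmis degrees it need not give the value α = 1.49 reported above.

```r
# Continuous power-law MLE (closed form), applied to the observations >= kmin.
# For integer data such as degrees this is only an approximation.
powerlaw_alpha_continuous <- function(k, kmin = min(k)) {
  k <- k[k >= kmin]
  # Undefined (division by zero) if every observation equals kmin.
  1 + length(k) / sum(log(k / kmin))
}

# Toy check: log(k / kmin) = 0, 1, 2, so alpha = 1 + 3 / 3 = 2.
alpha_toy <- powerlaw_alpha_continuous(exp(0:2), kmin = 1)
```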
  • 27. Comparison with random graphs... Scale-free model with the same number of nodes and a power law parameter identical to the one previously estimated: $P(\deg = k) \propto k^{-\alpha}$. The Barabási-Albert model is used, with the number of edges added at each step chosen so that the final number of edges resembles that of the original graph (3 edges per step, which gives 225 edges in the final graph, compared to 254). Method: compare the observed values with those of a large number of randomly generated graphs, obtained with sample_pa(vcount(lesmis), m = 3, power = ..., directed = FALSE). Note that the power argument of sample_pa is the exponent of the preferential attachment (how fast the probability of receiving a new edge grows with the degree), which is not the same quantity as the exponent α of the resulting degree distribution.
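To see where the figure of 225 edges comes from, here is a minimal base R sketch of undirected preferential attachment. The attachment weight degree^power + 1 and the absence of multiple edges are assumptions chosen to resemble the defaults of sample_pa; they are not taken from the slides, and the function name is hypothetical. Each new vertex connects to min(m, number of existing vertices) distinct older vertices, so with n = 77 and m = 3 the graph has 1 + 2 + 3 × 74 = 225 edges.

```r
# Preferential attachment sketch (requires n >= 2): vertex v joins the graph
# and links to k = min(m, v - 1) distinct older vertices, chosen with
# probability proportional to degree^power + 1 (assumed weighting).
sample_pa_edges <- function(n, m, power = 1) {
  deg <- numeric(n)
  edges <- matrix(integer(0), ncol = 2)
  for (v in 2:n) {
    old <- seq_len(v - 1)
    k <- min(m, v - 1)
    w <- deg[old]^power + 1
    targets <- old[sample.int(length(old), k, prob = w)]  # without replacement
    edges <- rbind(edges, cbind(v, targets))
    deg[targets] <- deg[targets] + 1
    deg[v] <- deg[v] + k
  }
  edges
}

set.seed(2)
E_pa <- sample_pa_edges(77, 3)
nrow(E_pa)  # number of edges
```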
  • 28. Results of the comparison with random graphs... For B = 500 graphs, we have:
## density triangles transitivity diameter
## Min. :0.0769 Min. : 72 Min. :0.1075 Min. :3.000
## 1st Qu.:0.0769 1st Qu.:102 1st Qu.:0.1250 1st Qu.:4.000
## Median :0.0769 Median :112 Median :0.1307 Median :4.000
## Mean :0.0769 Mean :113 Mean :0.1303 Mean :3.988
## 3rd Qu.:0.0769 3rd Qu.:124 3rd Qu.:0.1359 3rd Qu.:4.000
## Max. :0.0769 Max. :153 Max. :0.1530 Max. :5.000
## radius girth cohesion degree betweenness
## Min. :2.000 Min. :3 Min. :3 Min. :5.844 Min. :41.86
## 1st Qu.:2.000 1st Qu.:3 1st Qu.:3 1st Qu.:5.844 1st Qu.:47.88
## Median :2.000 Median :3 Median :3 Median :5.844 Median :49.55
## Mean :2.314 Mean :3 Mean :3 Mean :5.844 Mean :49.35
## 3rd Qu.:3.000 3rd Qu.:3 3rd Qu.:3 3rd Qu.:5.844 3rd Qu.:50.97
## Max. :3.000 Max. :3 Max. :3 Max. :5.844 Max. :55.73
## eccentricity closeness
## Min. :2.935 Min. :0.005407
## 1st Qu.:3.130 1st Qu.:0.005695
## Median :3.221 Median :0.005788
## Mean :3.234 Mean :0.005805
## 3rd Qu.:3.325 3rd Qu.:0.005901
## Max. :3.662 Max. :0.006334
compared to: 0.087, 467, 0.499, 5, 3, 3, 1, 6.597, 62.364, 4.13, 0.00512 ⇒ the number of triangles, the transitivity, the radius, the average degree, the average betweenness and the average eccentricity are larger than in the simulated power law graphs (power 1.495), whereas the cohesion and the average closeness are smaller.
  • 29. Limits of the previous approaches. Until now, we have compared the real graph to graphs randomly generated according to a given random model, but: this approach only gives information about global characteristics of the observed graph; none of the distributions of the studied characteristics is preserved by the random models, in particular not the degree distribution, which is central since it constrains local/global connectivity, counts of specific patterns...
  • 31. A null model closer to the real graph... Sketch of statistical tests on graphs: 1. sample B graphs at random within the set of graphs with the same degrees as the observed graph; 2. compute a numerical statistic for each of these randomly generated graphs; 3. by comparing the observed value of the statistic with its distribution over the random graphs, derive a p-value (for B large enough). Two main approaches to sample at random with fixed degrees: the configuration model [Bender and Canfield, 1978]; the permutation approach [Rao et al., 1996, Roberts Jr., 2000].
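Step 3 can be made concrete with a Monte Carlo p-value. The base R sketch below uses the common (1 + count)/(B + 1) convention, an assumption rather than something stated on the slides, which avoids reporting a p-value of exactly zero; simulated stands for the B values of the statistic computed on the random graphs, and the function name is hypothetical.

```r
# One-sided Monte Carlo p-values for an observed statistic, given its values
# on B graphs simulated under the null model.
empirical_pvalues <- function(observed, simulated) {
  B <- length(simulated)
  c(greater = (1 + sum(simulated >= observed)) / (B + 1),  # unusually large?
    less    = (1 + sum(simulated <= observed)) / (B + 1))  # unusually small?
}

pv <- empirical_pvalues(observed = 10, simulated = 1:9)
```

For instance, none of the 500 Erdős-Rényi graphs used earlier reached the observed transitivity (0.499 versus a maximum of 0.118), so this convention would give a one-sided p-value of 1/501 ≈ 0.002, the smallest value attainable with B = 500.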
  • 33. Sampling at random within the set of graphs with a given degree distribution. Aim: every such graph can be reached by the sampler, and all graphs have the same probability of being sampled ⇒ MCMC approach. Method:
1: Start from the observed graph G = (V, E)
2: for t = 1 → T do
3: Select uniformly at random two edges $e^1 = (x^1_i, x^1_j)$ and $e^2 = (x^2_i, x^2_j) \in E$
4: $E' \leftarrow (E \setminus \{e^1, e^2\}) \cup \{e^1_s, e^2_s\}$ with $e^1_s = (x^1_i, x^2_j)$ and $e^2_s = (x^2_i, x^1_j)$
5: if $G' = (V, E')$ is simple and connected then
6: $G \leftarrow G'$
7: end if
8: end for
9: return G
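The edge-switching algorithm can be sketched in base R as follows. Edges are stored as a two-column matrix (an assumption of this sketch), and the orientation of the second edge is flipped at random, since for undirected edges two different switches are possible and the pseudo-code writes only one of them. Proposals that create a self-loop, a multiple edge or a disconnected graph are rejected, as in steps 5 to 7. This is a didactic sketch with hypothetical names, not the implementation behind igraph's rewire.

```r
# Degree-preserving edge switching.
# edges: two-column matrix, one row per undirected edge; n: number of vertices.
switch_edges <- function(edges, n, T_steps = 100) {
  pair_key <- function(ed) paste(pmin(ed[, 1], ed[, 2]), pmax(ed[, 1], ed[, 2]))
  is_connected <- function(ed) {
    A <- matrix(0, n, n)
    A[ed] <- 1
    A[ed[, 2:1]] <- 1
    seen <- c(TRUE, rep(FALSE, n - 1))
    frontier <- 1
    while (length(frontier) > 0) {
      nb <- which(colSums(A[frontier, , drop = FALSE]) > 0 & !seen)
      seen[nb] <- TRUE
      frontier <- nb
    }
    all(seen)
  }
  for (step in seq_len(T_steps)) {
    ij <- sample.int(nrow(edges), 2)            # two distinct edges e1 and e2
    e1 <- edges[ij[1], ]
    e2 <- edges[ij[2], ]
    if (runif(1) < 0.5) e2 <- rev(e2)           # random orientation of e2
    new1 <- c(e1[1], e2[2])                     # e1s = (x1_i, x2_j)
    new2 <- c(e2[1], e1[2])                     # e2s = (x2_i, x1_j)
    proposal <- edges
    proposal[ij[1], ] <- new1
    proposal[ij[2], ] <- new2
    simple <- new1[1] != new1[2] && new2[1] != new2[2] &&
      anyDuplicated(pair_key(proposal)) == 0
    if (simple && is_connected(proposal)) edges <- proposal  # otherwise keep G
  }
  edges
}

# Hypothetical toy graph: an 8-cycle with three chords.
toy <- rbind(cbind(1:8, c(2:8, 1)), c(1, 5), c(2, 6), c(3, 7))
set.seed(3)
rewired <- switch_edges(toy, n = 8, T_steps = 200)
```

Each accepted switch removes one endpoint from and adds one endpoint to each of the four vertices involved, so every vertex keeps its degree; repeating the call B times yields the random graphs needed for the test sketched on the previous slide.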
  • 34. In practice... This method is used in [Milo et al., 2004] with T = 100. It can be performed using rewire(lesmis, keeping_degseq(n = 100)) [Figure: two histograms, “Number of triangles” and “transitivity”, showing their frequencies over the rewired random graphs]
  • 35. In practice... for the vertex characteristics: find an empirical p-value for each vertex, indicating whether its betweenness is higher or lower than expected given its degree, i.e. the proportion of random graphs in which the betweenness of this vertex is at least as high (resp. as low) as the observed one; a vertex is flagged when its observed betweenness is higher (resp. lower) than 95% of its betweenness values in the random graphs. [Figure: the “Les Misérables” network with the following vertices labelled: Myriel, Valjean, Listolier, Fameuil, Blacheville, Favourite, Dahlia, Zephine, Fantine, Judge, Champmathieu, Brevet, Chenildieu, Cochepaille, LtGillenormand, Marius, Combeferre, Prouvaire, Feuilly, Courfeyrac, Bahorel, Joly, Grantaire, Gueulemer, Babet, Claquesous, Montparnasse, Brujon, MmeHucheloup]
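The per-vertex test works like the global one, applied column by column. In the base R sketch below, sim is a hypothetical B × n matrix whose row b holds the betweenness of every vertex in the b-th rewired graph, and observed holds the betweenness in the real graph; a small "higher" (resp. "lower") p-value, e.g. below 0.05, indicates a betweenness higher (resp. lower) than expected given the degrees. The toy numbers are made up for illustration.

```r
# Per-vertex empirical p-values: column j of 'sim' holds the values of the
# statistic for vertex j over the B random graphs.
vertex_pvalues <- function(observed, sim) {
  list(higher = colMeans(sweep(sim, 2, observed, ">=")),  # share at least as high
       lower  = colMeans(sweep(sim, 2, observed, "<=")))  # share at least as low
}

# Made-up toy values: B = 4 random graphs, n = 2 vertices.
sim <- matrix(c(1, 2, 3, 4,
                10, 10, 10, 10), nrow = 4)
p <- vertex_pvalues(observed = c(5, 10), sim = sim)
higher_than_expected <- which(p$higher < 0.05)
lower_than_expected <- which(p$lower < 0.05)
```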
  • 38. More on random graph generation. Sometimes, one wants to compare the observed graph with a more sophisticated (constrained) null model, for instance one that takes into account additional information on edges or nodes. This can be achieved using the same principle and discarding the random graphs that do not satisfy the constraints. Warning: the more sophisticated the model, the more costly the simulation. For instance, only discarding graphs with multiple edges and graphs that are not connected already leads to throwing away 47 simulations out of 500. Possible solution: [Tabourier et al., 2011] use multiple edge switches to improve such simulations.
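The rejection principle can be illustrated with the configuration model [Bender and Canfield, 1978]: vertex "stubs" are matched at random, and any matching that gives a self loop, a multiple edge or a disconnected graph is discarded. The following is a base R sketch, assuming the hypothetical helper names below. The returned `rejected` count shows how many simulations were thrown away before an admissible graph was found.

```r
# Hypothetical helper: connectivity check from an adjacency matrix, by
# repeatedly adding the neighbours of the vertices already reached.
is_connected_adj <- function(A) {
  reached <- c(TRUE, rep(FALSE, nrow(A) - 1))
  repeat {
    grown <- reached | as.vector(A %*% as.numeric(reached) > 0)
    if (all(grown == reached)) break
    reached <- grown
  }
  all(reached)
}

# Hypothetical helper: configuration model with rejection of the graphs that
# have self loops, multiple edges or are not connected.
sample_configuration <- function(degrees, max_tries = 1000) {
  n <- length(degrees)
  stubs <- rep(seq_len(n), times = degrees)      # vertex i appears d_i times
  stopifnot(length(stubs) %% 2 == 0)
  for (attempt in seq_len(max_tries)) {
    pairs <- matrix(stubs[sample.int(length(stubs))], ncol = 2, byrow = TRUE)
    keys <- paste(pmin(pairs[, 1], pairs[, 2]), pmax(pairs[, 1], pairs[, 2]))
    if (any(pairs[, 1] == pairs[, 2]) || anyDuplicated(keys) > 0) next
    A <- matrix(0, n, n)
    A[pairs] <- 1
    A[pairs[, 2:1, drop = FALSE]] <- 1
    if (is_connected_adj(A)) {
      return(list(adjacency = A, rejected = attempt - 1))
    }
  }
  stop("no admissible graph found within max_tries attempts")
}

set.seed(42)
res <- sample_configuration(c(2, 2, 2, 2))
res$rejected   # number of discarded simulations
```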
  • 39. Outline: Numerical characteristics; Clustering (Modularity optimization, Spectral clustering, Model based clustering)
  • 40. Sketch of this section. Issue at stake: a short overview of different types of methods for vertex clustering, restricted to simple (non-overlapping) clustering (methods also exist for overlapping clustering, clustering according to vertex/edge attributes, clustering of bipartite graphs...); statistical relevance and comparison of clustering results.
  • 42. A short overview of vertex clustering. Purpose: find communities or modules (i.e., groups of vertices) such that vertices inside a community are densely connected whereas vertices in different communities are sparsely connected. Some approaches to perform this task: optimizing a given criterion (e.g., modularity maximization); spectral clustering; model-based clustering; ... (see [Fortunato and Barthélémy, 2007, Schaeffer, 2007, Brohée and van Helden, 2006]).
  • 44. Clustering based on criterion optimization. “Cut” criteria: given a number of clusters K, find the partition $A_1, \dots, A_K$ of V that solves the mincut problem, i.e., that minimizes
$$\mathrm{cut}(A_1, \dots, A_K) = \frac{1}{2} \sum_{k=1}^{K} \sum_{x_i \in A_k,\, x_j \notin A_k} w_{ij}.$$
Problem: the mincut problem often separates individual vertices from the rest of the graph.
  • 45. Clustering based on criterion optimization. “Cut” criteria: given a number of clusters K, find the partition $A_1, \dots, A_K$ of V that solves the “RatioCut” problem, i.e., that minimizes
$$\mathrm{RatioCut}(A_1, \dots, A_K) = \frac{1}{2} \sum_{k=1}^{K} \frac{\sum_{x_i \in A_k,\, x_j \notin A_k} w_{ij}}{|A_k|}$$
(this forces larger communities than the mincut problem).
  • 46. Clustering based on criterion optimization. “Cut” criteria: given a number of clusters K, find the partition $A_1, \dots, A_K$ of V that solves the “NCut” problem, i.e., that minimizes
$$\mathrm{NCut}(A_1, \dots, A_K) = \frac{1}{2} \sum_{k=1}^{K} \frac{\sum_{x_i \in A_k,\, x_j \notin A_k} w_{ij}}{\mathrm{Vol}(A_k)}$$
in which $\mathrm{Vol}(A_k) = \sum_{x_i \in A_k} \sum_{j} w_{ij} = \sum_{x_i \in A_k} d_i$ (this also forces larger communities than the mincut problem).
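The three cut criteria can be evaluated for a given partition with a short base R sketch. The function name is hypothetical. `W` is a symmetric weight matrix with a zero diagonal, `cl` gives the cluster of each vertex, and $\mathrm{Vol}(A_k)$ is taken as the sum of the degrees of the vertices in $A_k$.

```r
cut_criteria <- function(W, cl) {
  d <- rowSums(W)
  groups <- unique(cl)
  # total weight of the edges leaving each cluster A_k
  out_weight <- sapply(groups, function(k) sum(W[cl == k, cl != k]))
  size <- sapply(groups, function(k) sum(cl == k))        # |A_k|
  volume <- sapply(groups, function(k) sum(d[cl == k]))   # Vol(A_k)
  c(cut = sum(out_weight) / 2,
    RatioCut = sum(out_weight / size) / 2,
    NCut = sum(out_weight / volume) / 2)
}

# two triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4
E <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W <- matrix(0, 6, 6)
W[E] <- 1
W[E[, 2:1]] <- 1
cut_criteria(W, cl = c(1, 1, 1, 2, 2, 2))   # cut = 1, RatioCut = 1/3, NCut = 1/7
```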
  • 47. Clustering based on criterion optimization. “Modularity” criterion [Newman and Girvan, 2004]: given a number of clusters K, find the partition $C_1, \dots, C_K$ of V that maximizes
$$Q(C_1, \dots, C_K) = \frac{1}{2m} \sum_{k=1}^{K} \sum_{x_i, x_j \in C_k} (w_{ij} - P_{ij})$$
where $P_{ij}$ is the weight of a “null model” (a graph with the same degree distribution but with edges placed at random): $P_{ij} = \frac{d_i d_j}{2m}$ with $d_i = \sum_{j \neq i} w_{ij}$ and $m = \frac{1}{2} \sum_i d_i$ (the total edge weight).
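The modularity of a given partition can be computed directly from the formula above, as in the following base R sketch. The function name is hypothetical. Like the formula, it keeps the $i = j$ terms of the double sum. In practice, igraph's `modularity()` would be the natural tool.

```r
modularity_score <- function(W, cl) {
  d <- rowSums(W)
  two_m <- sum(d)                        # 2m = sum of the degrees
  P <- outer(d, d) / two_m               # null model: P_ij = d_i d_j / (2m)
  same <- outer(cl, cl, "==")            # TRUE when x_i and x_j are in the same cluster
  sum((W - P)[same]) / two_m
}

# two triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4
E <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W <- matrix(0, 6, 6)
W[E] <- 1
W[E[, 2:1]] <- 1
modularity_score(W, c(1, 1, 1, 2, 2, 2))   # 5/14
```

Putting all vertices in a single cluster always gives a modularity of 0, because the observed and null-model weights then sum to the same total.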
  • 50. Advantages and drawbacks. mincut is not adapted to vertex clustering in practice (it produces clusters made of isolated vertices); the other three problems are NP-hard... Modularity takes into account the heterogeneity of the degree distribution by correcting the importance of a vertex by its degree, so it is often better adapted to real-life graphs. [Fortunato and Barthélémy, 2007] showed that modularity has a resolution limit: it can fail to detect communities smaller than a scale that depends on the size of the graph. [Bickel and Chen, 2009] gave conditions for the consistency of the clusters obtained by modularity optimization in Stochastic Block Models (SBM). Remark: relaxing the RatioCut and NCut problems gives spectral clustering. Modularity optimization is often solved by approximation methods.
  • 51. A short description of approximation methods for modularity optimization simple greedy algorithms ([Newman, 2004] and [Clauset et al., 2004] for a fast version): hierarchical clustering which merges pairs of vertices with the highest contribution to modularity cluster_fast_greedy Nathalie Villa-Vialaneix | Graph mining 2 31/48
  • 52. A short description of approximation methods for modularity optimization simple greedy algorithms ([Newman, 2004] and [Clauset et al., 2004] for a fast version): hierarchical clustering which merges pairs of vertices with the highest contribution to modularity cluster_fast_greedy multi-level greedy algorithms ([Blondel et al., 2008], also known as “Louvain algorithm” and [Noack and Rotta, 2009] for an improved version): hierarchical approach in which vertices are sometimes re-assigned to a different community in a greedy way cluster_louvain Nathalie Villa-Vialaneix | Graph mining 2 31/48
  • 53. A short description of approximation methods for modularity optimization simple greedy algorithms ([Newman, 2004] and [Clauset et al., 2004] for a fast version): hierarchical clustering which merges pairs of vertices with the highest contribution to modularity cluster_fast_greedy multi-level greedy algorithms ([Blondel et al., 2008], also known as “Louvain algorithm” and [Noack and Rotta, 2009] for an improved version): hierarchical approach in which vertices are sometimes re-assigned to a different community in a greedy way cluster_louvain simulated annealing ([Reichardt and Bornholdt, 2006] uses a spin-glass model which, in some cases, is equivalent to modularity maximization) cluster_spinglass(..., gamma = 1, update.rule = "config") Nathalie Villa-Vialaneix | Graph mining 2 31/48
  • 54. A short description of approximation methods for modularity optimization:
simple greedy algorithms ([Newman, 2004], and [Clauset et al., 2004] for a fast version): agglomerative hierarchical clustering that merges, at each step, the pair of communities giving the largest increase in modularity: cluster_fast_greedy;
multi-level greedy algorithms ([Blondel et al., 2008], also known as the “Louvain algorithm”, and [Noack and Rotta, 2009] for an improved version): hierarchical approach in which vertices are re-assigned to a different community in a greedy way: cluster_louvain;
simulated annealing ([Reichardt and Bornholdt, 2006] use a spin-glass model which, in some cases, is equivalent to modularity maximization): cluster_spinglass(..., gamma = 1, update.rule = "config");
...to be compared (when usable) with the exact optimization: cluster_optimal.
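The simple greedy approach can be illustrated with a naive base R sketch of agglomerative modularity optimization in the spirit of [Newman, 2004]. This is not `cluster_fast_greedy`: the sketch recomputes the community-level matrix after every merge, whereas [Clauset et al., 2004] rely on efficient data structures. The function name is hypothetical. Merges continue until no pair of connected communities remains, and the partition with the largest modularity along the way is returned.

```r
greedy_modularity <- function(W) {
  two_m <- sum(W)                                   # 2m = sum of the degrees
  cl <- seq_len(nrow(W))                            # each vertex starts alone
  best <- list(Q = -Inf, membership = cl)
  repeat {
    labs <- sort(unique(cl))
    Z <- 1 * outer(cl, labs, "==")                  # n x c community indicator matrix
    e_comm <- t(Z) %*% W %*% Z / two_m              # fraction of edge weight between communities
    a <- rowSums(e_comm)
    Q <- sum(diag(e_comm)) - sum(a^2)               # modularity of the current partition
    if (Q > best$Q) best <- list(Q = Q, membership = match(cl, labs))
    if (length(labs) == 1) break
    gain <- 2 * (e_comm - outer(a, a))              # modularity change when merging two communities
    gain[e_comm == 0] <- -Inf                       # only merge connected communities
    diag(gain) <- -Inf
    if (all(gain == -Inf)) break
    ij <- which(gain == max(gain), arr.ind = TRUE)[1, ]
    cl[cl == labs[ij[2]]] <- labs[ij[1]]            # merge the two communities
  }
  best
}

# two triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4
E <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W <- matrix(0, 6, 6)
W[E] <- 1
W[E[, 2:1]] <- 1
res <- greedy_modularity(W)
res$membership   # the two triangles
res$Q            # 5/14
```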
  • 55. Examples
res_time <- cbind(
  hierarchical = system.time(res_hierarchical <- cluster_fast_greedy(lesmis)),
  multilevel = system.time(res_multilevel <- cluster_louvain(lesmis)),
  annealing = system.time(res_annealing <- cluster_spinglass(lesmis)),
  exact = system.time(res_exact <- cluster_optimal(lesmis))
)[3, ]  # keep the elapsed times
res_time
## hierarchical   multilevel    annealing        exact
##        0.002        0.002        1.907       21.656
  • 56. Computational time (greedy approaches). Comparison of the computational times of the first two approaches (100 evaluations):
library(microbenchmark)
res_micro <- microbenchmark(cluster_fast_greedy(lesmis), cluster_louvain(lesmis))
[Figure: distributions of the computational times (Time [microseconds]) of cluster_fast_greedy(lesmis) and cluster_louvain(lesmis)]
  • 57. Accuracy of the clustering
method               modularity   number of clusters
hierarchical         0.5006       5
multilevel           0.5556       6
simulated annealing  0.5596       7
exact                0.56         6
  • 59. Assessing the relevance of a clustering. Given a graph, modularity optimization always returns a clustering. How can we know whether this clustering is meaningful, i.e., whether its modularity is larger than expected by chance? As previously, compare the maximum modularity of the observed graph with the maximum modularities obtained for a large number of randomly generated graphs (with the same degree sequence).
[Figure: histogram of the maximum modularity over the random graphs (x-axis: Modularity, y-axis: Frequency)]
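  • 60. Relation between RatioCut and the Laplacian. [von Luxburg, 2007] shows that minimizing
$$\mathrm{RatioCut}(A_1, A_2) = \frac{1}{2} \sum_{k=1}^{2} \frac{\sum_{x_i \in A_k,\, x_j \notin A_k} w_{ij}}{|A_k|}$$
is equivalent to the following constrained problem:
$$\min_{A_1, A_2} v^\top L v \quad \text{s.t. } v \perp \mathbf{1}_n \text{ and } \|v\| = \sqrt{n}$$
for $v$ the vector of $\mathbb{R}^n$ obtained from the partition by
$$v_i = \begin{cases} \sqrt{|A_2| / |A_1|} & \text{if } x_i \in A_1,\\ -\sqrt{|A_1| / |A_2|} & \text{otherwise,} \end{cases}$$
and $L$ is the Laplacian of the graph, the $n \times n$ matrix with entries
$$L_{ij} = \begin{cases} -w_{ij} & \text{if } i \neq j,\\ d_i = \sum_{j \neq i} w_{ij} & \text{otherwise.} \end{cases}$$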
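The identity behind this equivalence can be checked numerically. With the RatioCut definition above (including the factor 1/2), $v^\top L v = 2n \cdot \mathrm{RatioCut}(A_1, A_2)$. Because this is a constant multiple, minimizing one quantity minimizes the other. The following base R sketch (hypothetical function name) computes both sides and the two constraints for a given partition.

```r
ratiocut_check <- function(W, in_A1) {
  n <- nrow(W)
  L <- diag(rowSums(W)) - W                            # unnormalized Laplacian
  n1 <- sum(in_A1)
  n2 <- n - n1
  v <- ifelse(in_A1, sqrt(n2 / n1), -sqrt(n1 / n2))    # vector built from the partition
  between <- sum(W[in_A1, !in_A1])                     # weight of the edges between A1 and A2
  ratio_cut <- (between / n1 + between / n2) / 2
  c(vLv = drop(t(v) %*% L %*% v),
    two_n_ratiocut = 2 * n * ratio_cut,
    sum_v = sum(v),                # 0: v is orthogonal to 1_n
    squared_norm_v = sum(v^2))     # n: ||v|| = sqrt(n)
}

# two triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4
E <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W <- matrix(0, 6, 6)
W[E] <- 1
W[E[, 2:1]] <- 1
ratiocut_check(W, in_A1 = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE))
```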
  • 63. ... and more remarks: this is a discrete problem (since v can only take two values) and it is NP-hard; the same relation holds between the NCut problem and the normalized Laplacian $D^{-1/2} L D^{-1/2}$, in which $D = \mathrm{Diag}(d_1, \dots, d_n)$; a generalization of these results exists for K > 2.
  • 64. Some properties of the Laplacian. Relations with the graph structure: for the example graph on the vertices 1, 2, 3, 4, 5 (figure), whose connected components are {1, 2, 3} and {4, 5}, the Laplacian has a null space spanned by the vectors $(1, 1, 1, 0, 0)^\top$ and $(0, 0, 0, 1, 1)^\top$.
  • 65. Some properties of the Laplacian. Relations with the graph structure: for a connected graph, the null space of L is spanned by the vector $\mathbf{1}_n$.
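The link between the null space of L and the connected components can be checked numerically on a small example. The edges used below are an assumption, since the slide's figure is not available: a triangle on {1, 2, 3} and the edge 4-5. Any graph with the components {1, 2, 3} and {4, 5} gives the same null space. More generally, the multiplicity of the eigenvalue 0 of L equals the number of connected components [von Luxburg, 2007].

```r
E <- rbind(c(1, 2), c(2, 3), c(1, 3), c(4, 5))   # assumed edges of the example
W <- matrix(0, 5, 5)
W[E] <- 1
W[E[, 2:1]] <- 1
L <- diag(rowSums(W)) - W                        # unnormalized Laplacian
eigenvalues <- eigen(L, symmetric = TRUE, only.values = TRUE)$values
n_zero <- sum(abs(eigenvalues) < 1e-10)          # dimension of the null space
n_zero                                           # 2 = number of connected components
L %*% c(1, 1, 1, 0, 0)                           # zero vector
L %*% c(0, 0, 0, 1, 1)                           # zero vector
```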
  • 66. Some properties of the Laplacian. Random walk point of view: consider a random walk on the graph in which the probability to jump from $x_i$ to $x_j$ is $w_{ij}/d_i$. Then, with the definition of NCut given above, $\mathrm{NCut}(A_1, A_2) = \frac{1}{2}\left(P(A_2 \mid A_1) + P(A_1 \mid A_2)\right)$, where $P(A_2 \mid A_1)$ is the probability that the walk, started from its stationary distribution, goes from $A_1$ to $A_2$ in one step.
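This interpretation can be checked numerically with the following base R sketch (hypothetical function name). It builds the transition matrix $P_{ij} = w_{ij}/d_i$ and the stationary distribution $d_i / \sum_j d_j$ of the walk. It then compares NCut with half the sum of the two one-step jump probabilities.

```r
ncut_random_walk <- function(W, in_A1) {
  d <- rowSums(W)
  P <- W / d                                   # P[i, j] = w_ij / d_i (rows sum to 1)
  stat <- d / sum(d)                           # stationary distribution of the walk
  # probability to jump from A to B in one step, starting in A at stationarity
  jump <- function(A, B) {
    sum(stat[A] * rowSums(P[A, B, drop = FALSE])) / sum(stat[A])
  }
  between <- sum(W[in_A1, !in_A1])
  ncut <- (between / sum(d[in_A1]) + between / sum(d[!in_A1])) / 2
  c(NCut = ncut,
    half_sum_jumps = (jump(in_A1, !in_A1) + jump(!in_A1, in_A1)) / 2)
}

# two triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4
E <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W <- matrix(0, 6, 6)
W[E] <- 1
W[E[, 2:1]] <- 1
ncut_random_walk(W, c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))   # both equal 1/7
```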
  • 67. Some properties of the Laplacian. Random walk point of view: for the same random walk, the average commute time between two nodes (the expected number of steps to go from one node to the other and back) can be computed from the Moore-Penrose pseudo-inverse $L^+$ of $L$ [Fouss et al., 2007]: $n(i, j) = \mathrm{Vol}(G)\,(l^+_{ii} + l^+_{jj} - 2 l^+_{ij})$, with $\mathrm{Vol}(G) = \sum_i d_i$.
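The commute times can be computed with the following base R sketch (hypothetical function name). It assumes a connected graph. In that case, the pseudo-inverse can be obtained as $L^+ = (L + J)^{-1} - J$ with $J = \mathbf{1}_n \mathbf{1}_n^\top / n$, and this identity is used here instead of a general pseudo-inverse routine.

```r
commute_times <- function(W) {
  n <- nrow(W)
  L <- diag(rowSums(W)) - W
  J <- matrix(1 / n, n, n)                 # projection on the constant vector
  L_plus <- solve(L + J) - J               # pseudo-inverse of L (connected graph only)
  vol <- sum(W)                            # Vol(G): sum of the degrees
  l_diag <- diag(L_plus)
  vol * (outer(l_diag, l_diag, "+") - 2 * L_plus)
}

# path graph 1 - 2 - 3
W <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 1, 0), nrow = 3, byrow = TRUE)
round(commute_times(W), 10)   # commute times: 4 (1-2), 8 (1-3), 4 (2-3)
```

For the path, the walk needs on average 4 steps to go from one end to the other, and 8 steps to come back as well.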
  • 70. Spectral clustering: relaxing the constraints. K has to be given. The relaxed problem is $\min_{U \in \mathbb{R}^{n \times K},\ U^\top U = I_K} \mathrm{Tr}(U^\top L U)$, and it is solved in two steps:
1. compute the first K eigenvectors of L (those associated with the K smallest eigenvalues), $u^1, \dots, u^K$, and write $U = (u^1, \dots, u^K)$ (an n × K matrix);
2. for i = 1, ..., n, denote by $u_i \in \mathbb{R}^K$ the i-th row of U, and cluster the points $(u_i)_{i=1,\dots,n}$ using a clustering algorithm (e.g., k-means).
embed_laplacian_matrix(..., no = ..., which = "sa", scaled = ...) and kmeans(..., centers = ..., nstart = 10)
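The two steps can be sketched in base R without igraph (hypothetical function name). The sketch uses `eigen()` instead of `embed_laplacian_matrix()`. For symmetric matrices, `eigen()` returns the eigenvalues in decreasing order, so the K smallest correspond to the last K columns. For a connected graph, the eigenvector of eigenvalue 0 is constant. Keeping it therefore does not change the k-means result.

```r
spectral_clustering <- function(W, K, nstart = 10) {
  n <- nrow(W)
  L <- diag(rowSums(W)) - W                         # unnormalized Laplacian
  ev <- eigen(L, symmetric = TRUE)                  # eigenvalues in decreasing order
  U <- ev$vectors[, (n - K + 1):n, drop = FALSE]    # eigenvectors of the K smallest eigenvalues
  kmeans(U, centers = K, nstart = nstart)$cluster   # k-means on the rows u_i of U
}

# two triangles {1, 2, 3} and {4, 5, 6} joined by the edge 3-4
E <- rbind(c(1, 2), c(1, 3), c(2, 3), c(4, 5), c(4, 6), c(5, 6), c(3, 4))
W <- matrix(0, 6, 6)
W[E] <- 1
W[E[, 2:1]] <- 1
set.seed(1)
cl <- spectral_clustering(W, K = 2)
cl
```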
  • 71. Spectral clustering in practice
res_time_spec <- system.time({
  spec_embed <- embed_laplacian_matrix(lesmis, no = 6, which = "sa", scaled = FALSE)
  res_spectral <- kmeans(spec_embed$X[, -1], centers = 6, nstart = 1)
})[3]
res_time_spec
## elapsed
##   0.017
The computational time lies between that of the greedy approaches and that of simulated annealing for modularity optimization.
  • 72. Accuracy of the clustering
method               modularity   number of clusters
spectral clustering  0.4461       6
exact                0.56         6
Modularity is smaller (as expected) and clusters tend to be more unbalanced. An empirical comparison of the performance of spectral clustering and modularity optimization is provided in [Bickel and Chen, 2009]. [Lei and Rinaldo, 2015] give conditions for the consistency of spectral clustering in stochastic block models.
  • 74. A mixture model for networks [Snijders and Nowicki, 1997]: the observed network G is assumed to be the realization of a random graph model in which vertices are organized in groups. Description of the model: vertices $x_i$ belong to an unknown class in $\{C_1, \dots, C_K\}$ (K is given) ⇒ latent (unobserved) variables $Z_i \sim \mathcal{M}(1, \alpha = (\alpha_1, \dots, \alpha_K))$, in which $\alpha_k$ is the probability that $x_i$ belongs to $C_k$. Given the class memberships, the edges between pairs of vertices are all independent and obtained by $w_{ij} \mid Z_{ik} Z_{jk'} = 1 \sim \mathcal{L}(\cdot, \pi_{kk'})$ for a given distribution $\mathcal{L}$.
  • 75. A mixture model for networks [Snijders and Nowicki, 1997]: typically, $\mathcal{L}$ is the Bernoulli distribution with probability $\pi_{kk'}$, where $\pi_{kk'} = p_1$ if $k = k'$ and $\pi_{kk'} = p_0$ if $k \neq k'$, for $p_1 > p_0$.
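The generative model can be made concrete with the following base R sketch, which simulates a binary SBM with the structure above. The function and argument names are hypothetical. Classes are drawn from $\mathcal{M}(1, \alpha)$, and each pair of vertices is then linked independently with probability $p_1$ (same class) or $p_0$ (different classes).

```r
simulate_sbm <- function(n, alpha, p1, p0) {
  K <- length(alpha)
  z <- sample.int(K, n, replace = TRUE, prob = alpha)   # latent classes Z_i ~ M(1, alpha)
  Pi <- matrix(p0, K, K)
  diag(Pi) <- p1                                        # pi_kk' = p1 if k = k', p0 otherwise
  prob <- Pi[z, z]                                      # edge probability for each pair (i, j)
  upper <- upper.tri(prob)
  A <- matrix(0, n, n)
  A[upper] <- rbinom(sum(upper), size = 1, prob = prob[upper])  # independent Bernoulli edges
  list(adjacency = A + t(A), classes = z)               # symmetric, no self loop
}

set.seed(1)
sim <- simulate_sbm(30, alpha = c(0.5, 0.5), p1 = 0.5, p0 = 0.05)
table(sim$classes)
```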
  • 78. Basic principle for using SBM: 1. assignment of vertices to groups; 2. parameter estimation ($(\alpha_k)_k$ and $(\pi_{kk'})_{k,k'}$); 3. estimation of the number of groups. Estimation is performed with Bayesian or frequentist approaches, using variational EM (see e.g., [Daudin et al., 2008] for the more computationally efficient frequentist approach). The number of groups can be chosen using ICL [Biernacki et al., 2000]. All this is implemented in the package blockmodels [Léger, 2016]:
res_sbm <- BM_bernoulli("SBM_sym", as_adjacency_matrix(..., sparse = FALSE))
res_sbm$estimate()
  • 79. SBM in practice
library(blockmodels)
res_time_sbm <- system.time({
  res_sbm <- BM_bernoulli("SBM_sym", as_adjacency_matrix(lesmis, sparse = FALSE))
  res_sbm$estimate()
})[3]
res_time_sbm
## elapsed
##   1.821
opt_K <- which.max(res_sbm$ICL)  # number of groups maximizing ICL
opt_K
## [1] 6
# assign each vertex to its most probable group
sbm_clust <- apply(res_sbm$memberships[[opt_K]]$Z, 1, which.max)
  • 80. Accuracy of the clustering
method           modularity   number of clusters
SBM clustering   0.4556       6
exact            0.56         6
Modularity is smaller (as expected), but groups can be interpreted as sets of vertices with similar connection patterns.
  • 81. Comparing clusterings. Various metrics ((dis)similarities) exist to compare clusterings. Among them:
the Rand Index [Rand, 1971], compare(..., method = "rand"): the number of pairs of vertices on which the two clusterings agree (both in the same cluster or both in different clusters), divided by the number of pairs $n(n-1)/2$;
the Normalized Mutual Information [Danon et al., 2005], compare(..., method = "nmi"), based on the mutual information
$$\sum_{k=1}^{K_1} \sum_{k'=1}^{K_2} \frac{n_{kk'}}{n} \log\left(\frac{n_{kk'}\, n}{n^1_k\, n^2_{k'}}\right)$$
in which $K_j$ is the number of clusters in clustering j, $n^j_k$ is the number of vertices classified into cluster k in clustering j, and $n_{kk'}$ is the number of vertices classified into cluster k in clustering 1 and into cluster k' in clustering 2. This similarity is normalized so that it lies between 0 and 1 (1 corresponds to a perfect match).
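Both similarities can be written in a few lines of base R. The function names below are hypothetical. For NMI, the sketch assumes the normalization of [Danon et al., 2005]: twice the mutual information divided by the sum of the entropies of the two clusterings.

```r
rand_index <- function(c1, c2) {
  pairs <- upper.tri(diag(length(c1)))       # each unordered pair of vertices once
  same1 <- outer(c1, c1, "==")[pairs]        # TRUE if the pair is together in clustering 1
  same2 <- outer(c2, c2, "==")[pairs]
  mean(same1 == same2)                       # proportion of pairs on which both agree
}

nmi <- function(c1, c2) {
  n <- length(c1)
  tab <- table(c1, c2)                       # n_kk'
  n1 <- rowSums(tab)                         # n^1_k
  n2 <- colSums(tab)                         # n^2_k'
  nz <- tab > 0                              # empty cells contribute 0
  mi <- sum(tab[nz] / n * log(tab[nz] * n / outer(n1, n2)[nz]))
  h1 <- -sum(n1 / n * log(n1 / n))           # entropy of clustering 1
  h2 <- -sum(n2 / n * log(n2 / n))           # entropy of clustering 2
  2 * mi / (h1 + h2)
}

c1 <- c(1, 1, 1, 2, 2, 2)
c2 <- c(1, 1, 2, 2, 3, 3)
rand_index(c1, c2)   # 10 agreeing pairs out of 15
nmi(c1, c2)
```

Both measures ignore cluster labels, so relabelling the clusters of one clustering does not change the result.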
  • 82. How do clusterings relate? Method: 1. compute a dissimilarity based on the Rand index or NMI (1 − value); 2. cluster the results of the vertex clusterings using hierarchical clustering (hclust).
  • 83. How do clusterings relate?
[Figure: dendrograms (hclust, "complete" linkage) of the six clusterings sbm, spectral, hierarchical, multilevel, annealing and exact, built from as.dist(compare_rand) (Rand index) and from as.dist(compare_nmi) (NMI); y-axis: Height]
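The procedure of slide 82 can be sketched as follows. The list of memberships, its names and the helper functions are hypothetical, and the Rand index is redefined inline so that the block is self-contained. Complete linkage matches the "complete" method shown on the dendrograms.

```r
rand_index <- function(c1, c2) {
  pairs <- upper.tri(diag(length(c1)))               # each unordered pair once
  mean(outer(c1, c1, "==")[pairs] == outer(c2, c2, "==")[pairs])
}

# memberships: named list of membership vectors (one per clustering method)
cluster_clusterings <- function(memberships, similarity = rand_index) {
  m <- length(memberships)
  S <- matrix(1, m, m, dimnames = list(names(memberships), names(memberships)))
  for (a in seq_len(m)) {
    for (b in seq_len(m)) {
      S[a, b] <- similarity(memberships[[a]], memberships[[b]])
    }
  }
  hclust(as.dist(1 - S), method = "complete")        # dissimilarity = 1 - similarity
}

memberships <- list(A = c(1, 1, 2, 2), B = c(2, 2, 1, 1), C = c(1, 2, 1, 2))
hc <- cluster_clusterings(memberships)
hc$merge    # A and B (identical up to labels) are merged first
hc$height
```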
  • 84. Any questions?
  • 85. Bender, E. and Canfield, E. (1978). The asymptotic number of labeled graphs with given degree sequences. Journal of Combinatorial Theory, Series A, 24(3):296–307.
Bickel, P. and Chen, A. (2009). A nonparametric view of network models and Newman-Girvan and other modularities. Proceedings of the National Academy of Sciences, USA, 106(50):21068–21073.
Biernacki, C., Celeux, G., and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(7):719–725.
Blondel, V., Guillaume, J., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008.
Brohée, S. and van Helden, J. (2006). Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics, 7:488.
Clauset, A., Newman, M. E. J., and Moore, C. (2004). Finding community structure in very large networks. Physical Review E, 70:066111.
Danon, L., Diaz-Guilera, A., Duch, J., and Arenas, A. (2005). Comparing community structure identification. Journal of Statistical Mechanics: Theory and Experiment, P09008.
Daudin, J., Picard, F., and Robin, S. (2008). A mixture model for random graphs. Statistics and Computing, 18:173–183.
Fortunato, S. and Barthélémy, M. (2007). Resolution limit in community detection. Proceedings of the National Academy of Sciences, USA, 104(1):36–41. doi:10.1073/pnas.0605965104; URL: http://www.pnas.org/content/104/1/36.abstract.
Fouss, F., Pirotte, A., Renders, J., and Saerens, M. (2007). Random-walk computation of similarities between nodes of a graph, with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3):355–369.
Léger, J. (2016). Blockmodels: a R-package for estimating in LBM and SBM, with many pdf, with or without covariates. Preprint arXiv 1602.07587v1. Submitted for publication.
Lei, J. and Rinaldo, A. (2015). Consistency of spectral clustering in stochastic block models. The Annals of Statistics, 43(1):215–237.
Milo, R., Kashtan, N., Itzkovitz, S., Newman, M., and Alon, U. (2004). On the uniform generation of random graphs with prescribed degree sequences. Preprint arXiv cond-mat/0312028v2.
Newman, M. (2004). Fast algorithm for detecting community structure in networks. Physical Review E, 69:066133.
Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review E, 69:026113.
Noack, A. and Rotta, R. (2009). Multi-level algorithms for modularity clustering. In SEA 2009: Proceedings of the 8th International Symposium on Experimental Algorithms, pages 257–268, Berlin, Heidelberg. Springer-Verlag.
Rand, W. (1971). Objective criteria for the evaluation of clustering methods. Journal of the American Statistical Association, 66(336):846–850.
Rao, A., Jana, R., and Bandyopadhyay, S. (1996). A Markov chain Monte Carlo method for generating random (0, 1)-matrices with given marginals. Sankhyā: The Indian Journal of Statistics, Series A (1961-2002), 58(2):225–242.
Reichardt, J. and Bornholdt, S. (2006). Statistical mechanics of community detection. Physical Review E, 74:016110.
Roberts Jr., J. (2000). Simple methods for simulating sociomatrices with given marginal totals. Social Networks, 22(3):273–283.
Schaeffer, S. (2007). Graph clustering. Computer Science Review, 1(1):27–64.
Snijders, T. and Nowicki, K. (1997). Estimation and prediction for stochastic block-structures for graphs with latent block structure. Journal of Classification, 14:75–100.
Tabourier, L., Roth, C., and Cointet, J. (2011). Generating constrained random graphs using multiple edge switches. ACM Journal of Experimental Algorithmics, 16(1):1.7.
von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4):395–416.