SlideShare une entreprise Scribd logo
1  sur  83
Télécharger pour lire hors ligne
Visualiser et fouiller des réseaux
Méthodes et exemples dans R
Nathalie Villa-Vialaneix - nathalie.villa@toulouse.inra.fr
http://www.nathalievilla.org
Unité MIA-T, INRA, Toulouse
AG PEPI IBIS - 1er avril 2014
AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 1 / 39
Outline
1 Import data
2 Visualization
3 Global characteristics
4 Numerical characteristics calculation
5 Clustering
AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 2 / 39
What is a network/graph? réseau/graphe
Mathematical object used to model relational data between entities.
AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 3 / 39
What is a network/graph? réseau/graphe
Mathematical object used to model relational data between entities.
The entities are called the nodes or the vertexes (vertices in British)
n÷uds/sommets
AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 3 / 39
What is a network/graph? réseau/graphe
Mathematical object used to model relational data between entities.
A relation between two entities is modeled by an edge
arête
AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 3 / 39
(non biological) Examples
Social network: nodes: persons - edges: 2 persons are connected
(friends), as in my facebook network:
In the following: this network will be used to illustrate the presentation.
Note: if you want to test the script with your own facebook network, you can extract it
at http://shiny.nathalievilla.org/fbs.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 4 / 39
(non biological) Examples
Modeling a large corpus of medieval documents
Notarial acts (mostly baux à ef, more
precisely, land charters) established in a
seigneurie named Castelnau Montratier,
written between 1250 and 1500, involving
tenants and lords.
a
a
http://graphcomp.univ-tlse2.fr
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 4 / 39
(non biological) Examples
Modeling a large corpus of medieval documents
• nodes: transactions and individuals
(3 918 nodes)
• edges: an individual is directly involved
in a transaction (6 455 edges)
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 4 / 39
(non biological) Examples
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 4 / 39
Standard issues associated with networks
Inference
Giving data, how to build a graph whose edges represent the direct links
between variables?
Example: co-expression networks built from microarray data (nodes = genes; edges =
signicant direct links between expressions of two genes)
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 5 / 39
Standard issues associated with networks
Inference
Giving data, how to build a graph whose edges represent the direct links
between variables?
Graph mining (examples)
1 Network visualization: nodes are not a priori associated to a given
position. How to represent the network in a meaningful way?
Random positions
Positions aiming at representing
connected nodes closer
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 5 / 39
Standard issues associated with networks
Inference
Giving data, how to build a graph whose edges represent the direct links
between variables?
Graph mining (examples)
1 Network visualization: nodes are not a priori associated to a given
position. How to represent the network in a meaningful way?
2 Network clustering: identify communities (groups of nodes that are
densely connected and share a few links (comparatively) with the other groups)
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 5 / 39
More complex relational models
Nodes may be labeled by a factor
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 6 / 39
More complex relational models
Nodes may be labeled by a factor
... or by a numerical information. [Villa-Vialaneix et al., 2013]
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 6 / 39
More complex relational models
Nodes may be labeled by a factor
... or by a numerical information. [Villa-Vialaneix et al., 2013]
Edges may also be labeled (type of the relation) or weighted (strength of
the relation) or directed (direction of the relation).
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 6 / 39
Before we start...
Available material
• this slide (on slideshare or on my website, page seminars);
• data: my facebook network with two les: fbnet-el.txt (edge list)
and fbnet-name.txt (node names and list)
• script: a R script with all command lines included in this slide,
fb-Rscript.R
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 7 / 39
Before we start...
Available material
• this slide (on slideshare or on my website, page seminars);
• data: my facebook network with two les: fbnet-el.txt (edge list)
and fbnet-name.txt (node names and list)
• script: a R script with all command lines included in this slide,
fb-Rscript.R
Outline
Introduce basic concepts on network mining (not inference)
Illustrate them with R (required package: igraph) on my facebook
network
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 7 / 39
Import data
Outline
1 Import data
2 Visualization
3 Global characteristics
4 Numerical characteristics calculation
5 Clustering
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 8 / 39
Import data
Import a graph from an edge list with igraph
edgelist - as.matrix(read.table(fbnet -el.txt))
vnames - read.table(fbnet -name.txt)
vnames - read.table(fbnet -name.txt, sep=,,
stringsAsFactor=FALSE ,
na.strings=)
The graph is built with:
# with ` graph . edgelist '
fbnet0 - graph.edgelist(edgelist , directed=FALSE)
fbnet0
# IGRAPH U --- 152 551 --
See also help(graph.edgelist) for more graph constructors (from an
adjacency matrix, from an edge list with a data frame describind the nodes,
...)
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 9 / 39
Import data
Vertexes, vertex attributes
The graph's vertexes are accessed and counted with:
V(fbnet0)
# Vertex sequence :
# [1] 1 2 3 4 5...
vcount(fbnet0)
# [1] 152
Vertexes can be described by attributes:
# add an attribute for vertices
V(fbnet0)$initials - vnames [,1]
V(fbnet0)$list - vnames [,2]
fbnet0
# IGRAPH U --- 152 551 --
# + attr : initials (v/c), list (v/c)
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 10 / 39
Import data
Edges, edge attributes
The graph's edges are accessed and counted with:
E(fbnet0)
# [1] 11 -- 1
# [2] 41 -- 1
# [3] 52 -- 1
# [4] 69 -- 1
# [5] 74 -- 1
# [6] 75 -- 1
# ...
ecount(fbnet0)
# 551
igraph can also handle edge attributes (and also graph attributes).
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 11 / 39
Import data
Settings
Notations
In the following, a graph G = (V , E, W ) with:
• V : set of vertexes {x1, . . . , xp};
• E: set of edges;
• W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 12 / 39
Import data
Settings
Notations
In the following, a graph G = (V , E, W ) with:
• V : set of vertexes {x1, . . . , xp};
• E: set of edges;
• W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0.
The graph is said to be connected/connexe if any node can be reached
from any other node by a path/un chemin.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 12 / 39
Import data
Settings
Notations
In the following, a graph G = (V , E, W ) with:
• V : set of vertexes {x1, . . . , xp};
• E: set of edges;
• W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0.
The graph is said to be connected/connexe if any node can be reached
from any other node by a path/un chemin.
The connected components/composantes connexes of a graph are all its
connected subgraphs.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 12 / 39
Import data
Settings
Notations
In the following, a graph G = (V , E, W ) with:
• V : set of vertexes {x1, . . . , xp};
• E: set of edges;
• W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0.
Example 1: Natty's FB network has 21 connected components with 122
vertexes (professional contacts, family and closest friends) or from 1 to 5
vertexes (isolated nodes)
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 12 / 39
Import data
Settings
Notations
In the following, a graph G = (V , E, W ) with:
• V : set of vertexes {x1, . . . , xp};
• E: set of edges;
• W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0.
Example 2: Medieval network: 10 542 nodes and the largest connected
component contains 10 025 nodes (giant component / composante
géante).
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 12 / 39
Import data
Connected components
is.connected(fbnet0)
# [1] FALSE
As this network is not connected, the connected components can be
extracted:
fb.components - clusters(fbnet0)
names(fb.components)
# [1]  membership   csize   no 
head(fb.components$membership , 10)
# [1] 1 1 2 2 1 1 1 1 3 1
fb.components$csize
# [1] 122 5 1 1 2 1 1 1 2 1
# [11] 1 2 1 1 2 3 1 1 1 1
# [21] 1
fb.components$no
# [1] 21
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 13 / 39
Import data
Largest connected component
fbnet.lcc - induced.subgraph(fbnet0 ,
fb.components$membership ==
which.max(fb.components$csize))
# main characteristics of the LCC
fbnet.lcc
# IGRAPH U --- 122 535 --
# + attr : initials (v/x)
is.connected(fbnet.lcc)
# [1] TRUE
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 14 / 39
Visualization
Outline
1 Import data
2 Visualization
3 Global characteristics
4 Numerical characteristics calculation
5 Clustering
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 15 / 39
Visualization
Visualization tools help understand the graph
macro-structure
Purpose: How to display the nodes in a meaningful and aesthetic way?
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 16 / 39
Visualization
Visualization tools help understand the graph
macro-structure
Purpose: How to display the nodes in a meaningful and aesthetic way?
Standard approach: force directed placement algorithms (FDP)
algorithmes de forces (e.g., [Fruchterman and Reingold, 1991])
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 16 / 39
Visualization
Visualization tools help understand the graph
macro-structure
Purpose: How to display the nodes in a meaningful and aesthetic way?
Standard approach: force directed placement algorithms (FDP)
algorithmes de forces (e.g., [Fruchterman and Reingold, 1991])
• attractive forces: similar to springs along the edges
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 16 / 39
Visualization
Visualization tools help understand the graph
macro-structure
Purpose: How to display the nodes in a meaningful and aesthetic way?
Standard approach: force directed placement algorithms (FDP)
algorithmes de forces (e.g., [Fruchterman and Reingold, 1991])
• attractive forces: similar to springs along the edges
• repulsive forces: similar to electric forces between all pairs of vertexes
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 16 / 39
Visualization
Visualization tools help understand the graph
macro-structure
Purpose: How to display the nodes in a meaningful and aesthetic way?
Standard approach: force directed placement algorithms (FDP)
algorithmes de forces (e.g., [Fruchterman and Reingold, 1991])
• attractive forces: similar to springs along the edges
• repulsive forces: similar to electric forces between all pairs of vertexes
iterative algorithm until stabilization of the vertex positions.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 16 / 39
Visualization
Visualization
• package igraph1
[Csardi and Nepusz, 2006] (static
representation with useful tools for graph mining)
1
http://igraph.sourceforge.net/
2
http://gephi.org, http://tulip.labri.fr/TulipDrupal/, http://www.cytoscape.org/
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 17 / 39
Visualization
Visualization
• package igraph1
[Csardi and Nepusz, 2006] (static
representation with useful tools for graph mining)
• free interactive software: Gephi, Tulip,
Cytoscape, ...
2
1
http://igraph.sourceforge.net/
2
http://gephi.org, http://tulip.labri.fr/TulipDrupal/, http://www.cytoscape.org/
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 17 / 39
Visualization
Network visualization
Dierent layouts are implemented in igraph to visualize the graph:
plot(fbnet.lcc , layout=layout.random ,
main=random layout, vertex.size=3,
vertex.color=pink, vertex.frame.color=pink
vertex.label.color=darkred,
edge.color=grey,
vertex.label=V(fbnet.lcc)$initials)
Try also layout.circle, layout.kamada.kawai,
layout.fruchterman.reingold... Network are generated with some
randomness. See also help(igraph.plotting) for more information on
network visualization
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 18 / 39
Visualization
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 19 / 39
Visualization
Plot attributes
igraph integrates a pre-dened graph attribute layout:
V(fbnet.lcc)$label - V(fbnet.lcc)$initials
and label and color node attributes
V(fbnet.lcc)$label - V(fbnet.lcc)$initials
V(fbnet.lcc)$color - rainbow(length(unique(
V(fbnet.lcc)$list )))[
as.numeric(factor(V(fbnet.lcc)$list ))]
plot(fbnet.lcc , vertex.size=3,
vertex.label.color=black, edge.color=grey)
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 20 / 39
Visualization
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 21 / 39
Global characteristics
Outline
1 Import data
2 Visualization
3 Global characteristics
4 Numerical characteristics calculation
5 Clustering
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 22 / 39
Global characteristics
Density / Transitivity Densité / Transitivité
Density: Number of edges divided by the number of pairs of vertexes. Is
the network densely connected?
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 23 / 39
Global characteristics
Density / Transitivity Densité / Transitivité
Density: Number of edges divided by the number of pairs of vertexes. Is
the network densely connected?
Examples
Example 1: Natty's FB network
• 152 vertexes, 551 edges ⇒ density = 551
152×151/2
4.8%;
• largest connected component: 122 vertexes, 535 edges ⇒ density
7.2%.
Example 2: Medieval network (largest connected component): 10 025
vertexes, 17 612 edges ⇒ density 0.035%.
Projected network (individuals): 3 755 vertexes, 8 315 edges ⇒ density
0.12%.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 23 / 39
Global characteristics
Density / Transitivity Densité / Transitivité
Density: Number of edges divided by the number of pairs of vertexes. Is
the network densely connected?
Transitivity: Number of triangles divided by the number of triplets
connected by at least two edges. What is the probability that two friends of
mine are also friends?
Density is equal to
4
4×3/2
= 2/3 ; Transitivity is equal to 1/3.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 23 / 39
Global characteristics
Density / Transitivity Densité / Transitivité
Density: Number of edges divided by the number of pairs of vertexes. Is
the network densely connected?
Transitivity: Number of triangles divided by the number of triplets
connected by at least two edges. What is the probability that two friends of
mine are also friends?
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 23 / 39
Global characteristics
Density / Transitivity Densité / Transitivité
Density: Number of edges divided by the number of pairs of vertexes. Is
the network densely connected?
Transitivity: Number of triangles divided by the number of triplets
connected by at least two edges. What is the probability that two friends of
mine are also friends?
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 23 / 39
Global characteristics
Density / Transitivity Densité / Transitivité
Density: Number of edges divided by the number of pairs of vertexes. Is
the network densely connected?
Transitivity: Number of triangles divided by the number of triplets
connected by at least two edges. What is the probability that two friends of
mine are also friends?
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 23 / 39
Global characteristics
Density / Transitivity Densité / Transitivité
Density: Number of edges divided by the number of pairs of vertexes. Is
the network densely connected?
Transitivity: Number of triangles divided by the number of triplets
connected by at least two edges. What is the probability that two friends of
mine are also friends?
Examples
Example 1: Natty's FB network
• density 4.8%, transitivity 56.2%;
• largest connected component: density 7.2%, transitivity 56.0%.
Example 2: Medieval network (projected network, individuals): density
0.12%, transitivity 6.1%.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 23 / 39
Global characteristics
Global characteristics
graph.density(fbnet.lcc)
# [1] 0.0724834
transitivity(fbnet.lcc)
# [1] 0.5604524
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 24 / 39
Numerical characteristics calculation
Outline
1 Import data
2 Visualization
3 Global characteristics
4 Numerical characteristics calculation
5 Clustering
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 25 / 39
Numerical characteristics calculation
Extracting important nodes
1 vertex degree degré: number of edges adjacent to a given vertex or di = j
Wij .
Vertexes with a high degree are called hubs: measure of the vertex
popularity.
Number of nodes (y-axis) with a given degree (x-axis)
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 26 / 39
Numerical characteristics calculation
Extracting important nodes
1 vertex degree degré: number of edges adjacent to a given vertex or di = j
Wij .
Vertexes with a high degree are called hubs: measure of the vertex
popularity.
Two hubs are students who have been hold back at school and the
other two are from my most recent class.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 26 / 39
Numerical characteristics calculation
Extracting important nodes
1 vertex degree degré: number of edges adjacent to a given vertex or di = j
Wij .
Vertexes with a high degree are called hubs: measure of the vertex
popularity.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 26 / 39
Numerical characteristics calculation
Extracting important nodes
1 vertex degree degré: number of edges adjacent to a given vertex or di = j
Wij .
The degree distribution is known to t a power law loi de puissance
in most real networks:
1 2 5 10 20 50 100 200 500
1550500
Names
Transactions
This distribution indicates preferential attachement attachement
préférentiel.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 26 / 39
Numerical characteristics calculation
Extracting important nodes
1 vertex degree degré: number of edges adjacent to a given vertex or di = j
Wij .
The degree distribution is known to t a power law loi de puissance
in most real networks:
2 vertex betweenness centralité: number of shortest paths between all pairs of
vertexes that pass through the vertex. Betweenness is a centrality measure
(vertexes that are likely to disconnect the network if removed).
The orange node's degree is equal to 2, its betweenness to 4.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 26 / 39
Numerical characteristics calculation
Extracting important nodes
1 vertex degree degré: number of edges adjacent to a given vertex or di = j
Wij .
The degree distribution is known to t a power law loi de puissance
in most real networks:
2 vertex betweenness centralité: number of shortest paths between all pairs of
vertexes that pass through the vertex. Betweenness is a centrality measure
(vertexes that are likely to disconnect the network if removed).
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 26 / 39
Numerical characteristics calculation
Extracting important nodes
1 vertex degree degré: number of edges adjacent to a given vertex or di = j
Wij .
The degree distribution is known to t a power law loi de puissance
in most real networks:
2 vertex betweenness centralité: number of shortest paths between all pairs of
vertexes that pass through the vertex. Betweenness is a centrality measure
(vertexes that are likely to disconnect the network if removed).
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 26 / 39
Numerical characteristics calculation
Extracting important nodes
1 vertex degree degré: number of edges adjacent to a given vertex or di = j
Wij .
The degree distribution is known to t a power law loi de puissance
in most real networks:
2 vertex betweenness centralité: number of shortest paths between all pairs of
vertexes that pass through the vertex. Betweenness is a centrality measure
(vertexes that are likely to disconnect the network if removed).
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 26 / 39
Numerical characteristics calculation
Extracting important nodes
1 vertex degree degré: number of edges adjacent to a given vertex or di = j
Wij .
The degree distribution is known to t a power law loi de puissance
in most real networks:
2 vertex betweenness centralité: number of shortest paths between all pairs of
vertexes that pass through the vertex. Betweenness is a centrality measure
(vertexes that are likely to disconnect the network if removed).
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 26 / 39
Numerical characteristics calculation
Extracting important nodes
1 vertex degree degré: number of edges adjacent to a given vertex or di = j
Wij .
The degree distribution is known to t a power law loi de puissance
in most real networks:
2 vertex betweenness centralité: number of shortest paths between all pairs of
vertexes that pass through the vertex. Betweenness is a centrality measure
(vertexes that are likely to disconnect the network if removed).
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 26 / 39
Numerical characteristics calculation
Extracting important nodes
1 vertex degree degré: number of edges adjacent to a given vertex or di = j
Wij .
The degree distribution is known to t a power law loi de puissance
in most real networks:
2 vertex betweenness centralité: number of shortest paths between all pairs of
vertexes that pass through the vertex. Betweenness is a centrality measure
(vertexes that are likely to disconnect the network if removed).
Vertexes with a high be-
tweenness ( 3 000) are
2 political gures.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 26 / 39
Numerical characteristics calculation
Extracting important nodes
1 vertex degree degré: number of edges adjacent to a given vertex or di = j
Wij .
The degree distribution is known to t a power law loi de puissance
in most real networks:
2 vertex betweenness centralité: number of shortest paths between all pairs of
vertexes that pass through the vertex. Betweenness is a centrality measure
(vertexes that are likely to disconnect the network if removed).
Example 2: In the medieval network: moral persons such as the
Chapter of Cahors or the Church of Flaugnac have a high
betweenness despite a low degree.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 26 / 39
Numerical characteristics calculation
Degree and betweenness
fbnet.degrees - degree(fbnet.lcc)
summary(fbnet.degrees)
# Min . 1 st Qu . Median Mean 3 rd Qu . Max .
# 1.00 2.00 6.00 8.77 15.00 31.00
fbnet.between - betweenness(fbnet.lcc)
summary(fbnet.between)
# Min . 1 st Qu . Median Mean 3 rd Qu . Max .
# 0.00 0.00 14.03 301.70 123.10 3439.00
and their distributions:
par(mfrow=c(1,2))
plot(density(fbnet.degrees), lwd=2,
main=Degree distribution, xlab=Degree,
ylab=Density)
plot(density(fbnet.between), lwd=2,
main=Betweenness distribution,
xlab=Betweenness, ylab=Density)
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 27 / 39
Numerical characteristics calculation
Degree and betweenness distribution
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 28 / 39
Numerical characteristics calculation
Combine visualization and individual characteristics
par(mar=rep(1,4))
# set node attribute `size ' with degree
V(fbnet.lcc)$size - 2*sqrt(fbnet.degrees)
# set node attribute ` color ' with betweenness
bet.col - cut(log(fbnet.between +1),10,
labels=FALSE)
V(fbnet.lcc)$color - heat.colors (10)[11 - bet.col]
plot(fbnet.lcc , main=Degree and betweenness,
vertex.frame.color=heat.colors (10)[ bet.col],
vertex.label=NA, edge.color=grey)
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 29 / 39
Numerical characteristics calculation
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 30 / 39
Clustering
Outline
1 Import data
2 Visualization
3 Global characteristics
4 Numerical characteristics calculation
5 Clustering
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 31 / 39
Clustering
Vertex clustering classication
Cluster vertexes into groups that are densely connected and share a few
links (comparatively) with the other groups. Clusters are often called
communities communautés (social sciences) or modules modules
(biology).
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 32 / 39
Clustering
Vertex clustering classication
Cluster vertexes into groups that are densely connected and share a few
links (comparatively) with the other groups. Clusters are often called
communities communautés (social sciences) or modules modules
(biology).
Example 1: Natty's facebook network
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 32 / 39
Clustering
Vertex clustering classication
Cluster vertexes into groups that are densely connected and share a few
links (comparatively) with the other groups. Clusters are often called
communities communautés (social sciences) or modules modules
(biology).
Example 2: medieval network
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 32 / 39
Clustering
Vertex clustering classication
Cluster vertexes into groups that are densely connected and share a few
links (comparatively) with the other groups. Clusters are often called
communities communautés (social sciences) or modules modules
(biology).
Several clustering methods:
• min cut minimization minimizes the number of edges between clusters;
• spectral clustering [von Luxburg, 2007] and kernel clustering uses
eigen-decomposition of the Laplacian/Laplacien
Lij =
−wij if i = j
di otherwise
(matrix strongly related to the graph structure);
• Generative (Bayesian) models [Zanghi et al., 2008];
• Markov clustering simulate a ow on the graph;
• modularity maximization
• ... (clustering jungle... see e.g., [Fortunato and Barthélémy, 2007,
Schaeer, 2007, Brohée and van Helden, 2006])
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 32 / 39
Clustering
Find clusters by modularity optimization modularité
The modularity [Newman and Girvan, 2004] of the partition
(C1, . . . , CK) is equal to:
Q(C1, . . . , CK) =
1
2m
K
k=1 xi ,xj ∈Ck
(Wij − Pij)
with Pij: weight of a null model (graph with the same degree distribution
but no preferential attachment):
Pij =
didj
2m
with di = 1
2 j=i Wij.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 33 / 39
Clustering
Interpretation
A good clustering should maximize the modularity:
• Q when (xi, xj) are in the same cluster and Wij Pij
• Q when (xi, xj) are in two dierent clusters and Wij Pij
(m = 20)
Pij = 7.5
Wij = 5 ⇒ Wij − Pij = −2.5
di = 15 dj = 20
i and j in the same cluster decreases the modularity
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 34 / 39
Clustering
Interpretation
A good clustering should maximize the modularity:
• Q when (xi, xj) are in the same cluster and Wij Pij
• Q when (xi, xj) are in two dierent clusters and Wij Pij
(m = 20)
Pij = 0.05
Wij = 5 ⇒ Wij − Pij = 4.95
di = 1 dj = 2
i and j in the same cluster increases the modularity
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 34 / 39
Clustering
Interpretation
A good clustering should maximize the modularity:
• Q when (xi, xj) are in the same cluster and Wij Pij
• Q when (xi, xj) are in two dierent clusters and Wij Pij
• Modularity
• helps separate hubs (= spectral clustering or min cut criterion);
• is not an increasing function of the number of clusters: useful to
choose the relevant number of clusters (with a grid search: several
values are tested, the clustering with the highest modularity is kept)
but modularity has a small resolution default (see
[Fortunato and Barthélémy, 2007])
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 34 / 39
Clustering
Interpretation
A good clustering should maximize the modularity:
• Q when (xi, xj) are in the same cluster and Wij Pij
• Q when (xi, xj) are in two dierent clusters and Wij Pij
• Modularity
• helps separate hubs (= spectral clustering or min cut criterion);
• is not an increasing function of the number of clusters: useful to
choose the relevant number of clusters (with a grid search: several
values are tested, the clustering with the highest modularity is kept)
but modularity has a small resolution default (see
[Fortunato and Barthélémy, 2007])
Main issue: Optimization = NP-complete problem (exhaustive search is
not not usable)
Dierent solutions are provided in
[Newman and Girvan, 2004, Blondel et al., 2008,
Noack and Rotta, 2009, Rossi and Villa-Vialaneix, 2011] (among
others) and some of them are implemented in the R package igraph.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 34 / 39
Clustering
Node clustering
One of the function to perform node clustering is multilevel.community
(probably not the best with large graphs):
fbnet.clusters - multilevel.community(fb.lcc)
modularity(fbnet.clusters)
# Graph community structure calculated with the mul
# Number of communities ( best split ): 7
# Modularity ( best split ): 0.566977
# Membership vector :
# [1] 3 2 2 2 2 3 5 3 5 5 4 3 3 2 ...
table(fbnet.clusters$membership)
# 1 2 3 4 5 6 7
# 2 18 26 32 12 8 24
See help(communities) for more information.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 35 / 39
Clustering
Combine clustering and visualization
# create a new attribute
V(fbnet.lcc)$community - fbnet.clusters$membership
fbnet.lcc
# IGRAPH U --- 122 535 --
#+ attr : layout (g/n), initials (v/c), list (v/c),
# label (v/c), color (v/c), community (v/n),
# size (v/n)
Display the clustering:
par(mfrow=c(1,1))
par(mar=rep(1,4))
plot(fbnet.lcc , main=Communities,
vertex.frame.color=
rainbow (9)[ fbnet.clusters$membership],
vertex.color=
rainbow (9)[ fbnet.clusters$membership],
vertex.label=NA, edge.color=grey)
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 36 / 39
Clustering
Combine clustering and visualization
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 37 / 39
Clustering
Export the graph
The graphml format can be used to export the graph (it can be read by
most of the graph visualization programs and includes information on node
and edge attributes)
write.graph(fbnet.lcc , file=fblcc.graphml,
format=graphml)
see help(write.graph) for more information of graph exportation formats
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 38 / 39
Clustering
Further references...
on clustering...
• overlapping communities communautés recouvrantes;
• hierarchical clustering [Rossi and Villa-Vialaneix, 2011] provides an approach;
• organized clustering (projection on a small dimensional grid) and
clustering for visualization [Boulet et al., 2008,
Rossi and Villa-Vialaneix, 2010, Rossi and Villa-Vialaneix, 2011];
• ...
on R
CRAN Task View: gRaphical Models in R
on network inference
you can start with the course An introduction to network inference and
mining
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 39 / 39
Clustering
References
Blondel, V., Guillaume, J., Lambiotte, R., and Lefebvre, E. (2008).
Fast unfolding of communites in large networks.
Journal of Statistical Mechanics: Theory and Experiment, P10008:17425468.
Boulet, R., Jouve, B., Rossi, F., and Villa, N. (2008).
Batch kernel SOM and related Laplacian methods for social network analysis.
Neurocomputing, 71(7-9):12571273.
Brohée, S. and van Helden, J. (2006).
Evaluation of clustering algorithms for protein-protein interaction networks.
BMC Bioinformatics, 7(488).
Csardi, G. and Nepusz, T. (2006).
The igraph software package for complex network research.
InterJournal, Complex Systems.
Fortunato, S. and Barthélémy, M. (2007).
Resolution limit in community detection.
In Proceedings of the National Academy of Sciences, volume 104, pages 3641.
doi:10.1073/pnas.0605965104; URL: http://www.pnas.org/content/104/1/36.abstract.
Fruchterman, T. and Reingold, B. (1991).
Graph drawing by force-directed placement.
Software, Practice and Experience, 21:11291164.
Newman, M. and Girvan, M. (2004).
Finding and evaluating community structure in networks.
Physical Review, E, 69:026113.
Noack, A. and Rotta, R. (2009).
Multi-level algorithms for modularity clustering.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 39 / 39
Clustering
In SEA '09: Proceedings of the 8th International Symposium on Experimental Algorithms, pages 257268,
Berlin, Heidelberg. Springer-Verlag.
Rossi, F. and Villa-Vialaneix, N. (2010).
Optimizing an organized modularity measure for topographic graph clustering: a deterministic annealing
approach.
Neurocomputing, 73(7-9):11421163.
Rossi, F. and Villa-Vialaneix, N. (2011).
Représentation d'un grand réseau à partir d'une classication hiérarchique de ses sommets.
Journal de la Société Française de Statistique, 152(3):3465.
Schaeer, S. (2007).
Graph clustering.
Computer Science Review, 1(1):2764.
Villa-Vialaneix, N., Liaubet, L., Laurent, T., Cherel, P., Gamot, A., and SanCristobal, M. (2013).
The structure of a gene co-expression network reveals biological functions underlying eQTLs.
PLoS ONE, 8(4):e60045.
von Luxburg, U. (2007).
A tutorial on spectral clustering.
Statistics and Computing, 17(4):395416.
Zanghi, H., Ambroise, C., and Miele, V. (2008).
Fast online graph clustering via erdös-rényi mixture.
Pattern Recognition, 41:35923599.
AG PEPI IBIS (01/04/2014) Network  R Nathalie Villa-Vialaneix 39 / 39

Contenu connexe

Tendances

Dijkstra's Algorithm
Dijkstra's AlgorithmDijkstra's Algorithm
Dijkstra's Algorithmguest862df4e
 
Core–periphery detection in networks with nonlinear Perron eigenvectors
Core–periphery detection in networks with nonlinear Perron eigenvectorsCore–periphery detection in networks with nonlinear Perron eigenvectors
Core–periphery detection in networks with nonlinear Perron eigenvectorsFrancesco Tudisco
 
Bump Hunting in the Dark - ICDE15 presentation
Bump Hunting in the Dark - ICDE15 presentationBump Hunting in the Dark - ICDE15 presentation
Bump Hunting in the Dark - ICDE15 presentationMichael Mathioudakis
 
Graph Algorithms
Graph AlgorithmsGraph Algorithms
Graph AlgorithmsAshwin Shiv
 
Line drawing algorithm and antialiasing techniques
Line drawing algorithm and antialiasing techniquesLine drawing algorithm and antialiasing techniques
Line drawing algorithm and antialiasing techniquesAnkit Garg
 
Elliptic Curve Cryptography and Zero Knowledge Proof
Elliptic Curve Cryptography and Zero Knowledge ProofElliptic Curve Cryptography and Zero Knowledge Proof
Elliptic Curve Cryptography and Zero Knowledge ProofArunanand Ta
 
Lines and curves algorithms
Lines and curves algorithmsLines and curves algorithms
Lines and curves algorithmsMohammad Sadiq
 
Mid point line Algorithm - Computer Graphics
Mid point line Algorithm - Computer GraphicsMid point line Algorithm - Computer Graphics
Mid point line Algorithm - Computer GraphicsDrishti Bhalla
 
Graphics6 bresenham circlesandpolygons
Graphics6 bresenham circlesandpolygonsGraphics6 bresenham circlesandpolygons
Graphics6 bresenham circlesandpolygonsKetan Jani
 
Bresenham Line Drawing Algorithm
Bresenham Line Drawing AlgorithmBresenham Line Drawing Algorithm
Bresenham Line Drawing AlgorithmMahesh Kodituwakku
 
Graphics6 bresenham circlesandpolygons
Graphics6 bresenham circlesandpolygonsGraphics6 bresenham circlesandpolygons
Graphics6 bresenham circlesandpolygonsThirunavukarasu Mani
 

Tendances (17)

Pixelrelationships
PixelrelationshipsPixelrelationships
Pixelrelationships
 
2.5 graph dfs
2.5 graph dfs2.5 graph dfs
2.5 graph dfs
 
Dijkstra's Algorithm
Dijkstra's AlgorithmDijkstra's Algorithm
Dijkstra's Algorithm
 
Core–periphery detection in networks with nonlinear Perron eigenvectors
Core–periphery detection in networks with nonlinear Perron eigenvectorsCore–periphery detection in networks with nonlinear Perron eigenvectors
Core–periphery detection in networks with nonlinear Perron eigenvectors
 
Optimisation random graph presentation
Optimisation random graph presentationOptimisation random graph presentation
Optimisation random graph presentation
 
Discrete time signals on MATLAB
Discrete time signals on MATLABDiscrete time signals on MATLAB
Discrete time signals on MATLAB
 
Bump Hunting in the Dark - ICDE15 presentation
Bump Hunting in the Dark - ICDE15 presentationBump Hunting in the Dark - ICDE15 presentation
Bump Hunting in the Dark - ICDE15 presentation
 
Graph Algorithms
Graph AlgorithmsGraph Algorithms
Graph Algorithms
 
Line drawing algorithm and antialiasing techniques
Line drawing algorithm and antialiasing techniquesLine drawing algorithm and antialiasing techniques
Line drawing algorithm and antialiasing techniques
 
Elliptic Curve Cryptography and Zero Knowledge Proof
Elliptic Curve Cryptography and Zero Knowledge ProofElliptic Curve Cryptography and Zero Knowledge Proof
Elliptic Curve Cryptography and Zero Knowledge Proof
 
Lines and curves algorithms
Lines and curves algorithmsLines and curves algorithms
Lines and curves algorithms
 
DDA algorithm
DDA algorithmDDA algorithm
DDA algorithm
 
Mid point line Algorithm - Computer Graphics
Mid point line Algorithm - Computer GraphicsMid point line Algorithm - Computer Graphics
Mid point line Algorithm - Computer Graphics
 
Graphics6 bresenham circlesandpolygons
Graphics6 bresenham circlesandpolygonsGraphics6 bresenham circlesandpolygons
Graphics6 bresenham circlesandpolygons
 
bresenham circles and polygons in computer graphics(Computer graphics tutorials)
bresenham circles and polygons in computer graphics(Computer graphics tutorials)bresenham circles and polygons in computer graphics(Computer graphics tutorials)
bresenham circles and polygons in computer graphics(Computer graphics tutorials)
 
Bresenham Line Drawing Algorithm
Bresenham Line Drawing AlgorithmBresenham Line Drawing Algorithm
Bresenham Line Drawing Algorithm
 
Graphics6 bresenham circlesandpolygons
Graphics6 bresenham circlesandpolygonsGraphics6 bresenham circlesandpolygons
Graphics6 bresenham circlesandpolygons
 

En vedette

Inferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOInferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOtuxette
 
Slides Lycée Jules Fil 2014
Slides Lycée Jules Fil 2014Slides Lycée Jules Fil 2014
Slides Lycée Jules Fil 2014tuxette
 
Mining co-expression network
Mining co-expression networkMining co-expression network
Mining co-expression networktuxette
 
Interpretable Sparse Sliced Inverse Regression for digitized functional data
Interpretable Sparse Sliced Inverse Regression for digitized functional dataInterpretable Sparse Sliced Inverse Regression for digitized functional data
Interpretable Sparse Sliced Inverse Regression for digitized functional datatuxette
 
Inferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOInferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOtuxette
 
Inferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOInferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOtuxette
 
Random Forest for Big Data
Random Forest for Big DataRandom Forest for Big Data
Random Forest for Big Datatuxette
 
A short introduction to statistical learning
A short introduction to statistical learningA short introduction to statistical learning
A short introduction to statistical learningtuxette
 
Integrating Tara Oceans datasets using unsupervised multiple kernel learning
Integrating Tara Oceans datasets using unsupervised multiple kernel learningIntegrating Tara Oceans datasets using unsupervised multiple kernel learning
Integrating Tara Oceans datasets using unsupervised multiple kernel learningtuxette
 
Multiple kernel learning applied to the integration of Tara oceans datasets
Multiple kernel learning applied to the integration of Tara oceans datasetsMultiple kernel learning applied to the integration of Tara oceans datasets
Multiple kernel learning applied to the integration of Tara oceans datasetstuxette
 

En vedette (10)

Inferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOInferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSO
 
Slides Lycée Jules Fil 2014
Slides Lycée Jules Fil 2014Slides Lycée Jules Fil 2014
Slides Lycée Jules Fil 2014
 
Mining co-expression network
Mining co-expression networkMining co-expression network
Mining co-expression network
 
Interpretable Sparse Sliced Inverse Regression for digitized functional data
Interpretable Sparse Sliced Inverse Regression for digitized functional dataInterpretable Sparse Sliced Inverse Regression for digitized functional data
Interpretable Sparse Sliced Inverse Regression for digitized functional data
 
Inferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOInferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSO
 
Inferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSOInferring networks from multiple samples with consensus LASSO
Inferring networks from multiple samples with consensus LASSO
 
Random Forest for Big Data
Random Forest for Big DataRandom Forest for Big Data
Random Forest for Big Data
 
A short introduction to statistical learning
A short introduction to statistical learningA short introduction to statistical learning
A short introduction to statistical learning
 
Integrating Tara Oceans datasets using unsupervised multiple kernel learning
Integrating Tara Oceans datasets using unsupervised multiple kernel learningIntegrating Tara Oceans datasets using unsupervised multiple kernel learning
Integrating Tara Oceans datasets using unsupervised multiple kernel learning
 
Multiple kernel learning applied to the integration of Tara oceans datasets
Multiple kernel learning applied to the integration of Tara oceans datasetsMultiple kernel learning applied to the integration of Tara oceans datasets
Multiple kernel learning applied to the integration of Tara oceans datasets
 

Similaire à Visualiser et fouiller des réseaux - Méthodes et exemples dans R

Financial Networks: II. Fundamentals of Network Theory and FNA
Financial Networks: II. Fundamentals of Network Theory and FNAFinancial Networks: II. Fundamentals of Network Theory and FNA
Financial Networks: II. Fundamentals of Network Theory and FNAKimmo Soramaki
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...tuxette
 
Social network-analysis-in-python
Social network-analysis-in-pythonSocial network-analysis-in-python
Social network-analysis-in-pythonJoe OntheRocks
 
Graph mining with kernel self-organizing map
Graph mining with kernel self-organizing mapGraph mining with kernel self-organizing map
Graph mining with kernel self-organizing maptuxette
 
Dagstuhl seminar talk on querying big graphs
Dagstuhl seminar talk on querying big graphsDagstuhl seminar talk on querying big graphs
Dagstuhl seminar talk on querying big graphsArijit Khan
 
Large network analysis : visualization and clustering
Large network analysis : visualization and clusteringLarge network analysis : visualization and clustering
Large network analysis : visualization and clusteringtuxette
 
Design the implementation of NMEA Get GPS Data from Record
Design the implementation of NMEA Get GPS Data from RecordDesign the implementation of NMEA Get GPS Data from Record
Design the implementation of NMEA Get GPS Data from RecordAnkita Tiwari
 
Knowledge Graphs and Milestone
Knowledge Graphs and MilestoneKnowledge Graphs and Milestone
Knowledge Graphs and MilestoneBarry Norton
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rYanchang Zhao
 
Network analysis for computational biology
Network analysis for computational biologyNetwork analysis for computational biology
Network analysis for computational biologytuxette
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquestuxette
 
Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search
Visual-Textual Joint Relevance Learning for Tag-Based Social Image SearchVisual-Textual Joint Relevance Learning for Tag-Based Social Image Search
Visual-Textual Joint Relevance Learning for Tag-Based Social Image SearchBABU MALVIYA
 
Financial Networks IV. Analyzing and Visualizing Exposures
Financial Networks IV. Analyzing and Visualizing ExposuresFinancial Networks IV. Analyzing and Visualizing Exposures
Financial Networks IV. Analyzing and Visualizing ExposuresKimmo Soramaki
 
Exploitations of wireless interface via network scanning
Exploitations of wireless interface via network scanningExploitations of wireless interface via network scanning
Exploitations of wireless interface via network scanningbrkavyashree
 
Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}Takashi J OZAKI
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Predictiontuxette
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènestuxette
 
WF ED 540, Class Meeting 3, 10 Septmber 2015
WF ED 540, Class Meeting 3, 10 Septmber 2015WF ED 540, Class Meeting 3, 10 Septmber 2015
WF ED 540, Class Meeting 3, 10 Septmber 2015Penn State University
 

Similaire à Visualiser et fouiller des réseaux - Méthodes et exemples dans R (20)

Financial Networks: II. Fundamentals of Network Theory and FNA
Financial Networks: II. Fundamentals of Network Theory and FNAFinancial Networks: II. Fundamentals of Network Theory and FNA
Financial Networks: II. Fundamentals of Network Theory and FNA
 
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
Rserve, renv, flask, Vue.js dans un docker pour intégrer des données omiques ...
 
Social network-analysis-in-python
Social network-analysis-in-pythonSocial network-analysis-in-python
Social network-analysis-in-python
 
Awalin viz sec
Awalin viz secAwalin viz sec
Awalin viz sec
 
Graph mining with kernel self-organizing map
Graph mining with kernel self-organizing mapGraph mining with kernel self-organizing map
Graph mining with kernel self-organizing map
 
Dagstuhl seminar talk on querying big graphs
Dagstuhl seminar talk on querying big graphsDagstuhl seminar talk on querying big graphs
Dagstuhl seminar talk on querying big graphs
 
Large network analysis : visualization and clustering
Large network analysis : visualization and clusteringLarge network analysis : visualization and clustering
Large network analysis : visualization and clustering
 
Design the implementation of NMEA Get GPS Data from Record
Design the implementation of NMEA Get GPS Data from RecordDesign the implementation of NMEA Get GPS Data from Record
Design the implementation of NMEA Get GPS Data from Record
 
Knowledge Graphs and Milestone
Knowledge Graphs and MilestoneKnowledge Graphs and Milestone
Knowledge Graphs and Milestone
 
RDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-rRDataMining slides-network-analysis-with-r
RDataMining slides-network-analysis-with-r
 
Network analysis for computational biology
Network analysis for computational biologyNetwork analysis for computational biology
Network analysis for computational biology
 
R studio
R studio R studio
R studio
 
Méthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiquesMéthodologies d'intégration de données omiques
Méthodologies d'intégration de données omiques
 
Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search
Visual-Textual Joint Relevance Learning for Tag-Based Social Image SearchVisual-Textual Joint Relevance Learning for Tag-Based Social Image Search
Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search
 
Financial Networks IV. Analyzing and Visualizing Exposures
Financial Networks IV. Analyzing and Visualizing ExposuresFinancial Networks IV. Analyzing and Visualizing Exposures
Financial Networks IV. Analyzing and Visualizing Exposures
 
Exploitations of wireless interface via network scanning
Exploitations of wireless interface via network scanningExploitations of wireless interface via network scanning
Exploitations of wireless interface via network scanning
 
Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}Visualization of Supervised Learning with {arules} + {arulesViz}
Visualization of Supervised Learning with {arules} + {arulesViz}
 
Graph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype PredictionGraph Neural Network for Phenotype Prediction
Graph Neural Network for Phenotype Prediction
 
Méthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènesMéthodes à noyaux pour l’intégration de données hétérogènes
Méthodes à noyaux pour l’intégration de données hétérogènes
 
WF ED 540, Class Meeting 3, 10 Septmber 2015
WF ED 540, Class Meeting 3, 10 Septmber 2015WF ED 540, Class Meeting 3, 10 Septmber 2015
WF ED 540, Class Meeting 3, 10 Septmber 2015
 

Plus de tuxette

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathstuxette
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-Ctuxette
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?tuxette
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...tuxette
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquestuxette
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeantuxette
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquestuxette
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...tuxette
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...tuxette
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation datatuxette
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?tuxette
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysistuxette
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricestuxette
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelstuxette
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random foresttuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICStuxette
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICStuxette
 
Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...tuxette
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNNtuxette
 
La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...tuxette
 

Plus de tuxette (20)

Racines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en mathsRacines en haut et feuilles en bas : les arbres en maths
Racines en haut et feuilles en bas : les arbres en maths
 
Projets autour de l'Hi-C
Projets autour de l'Hi-CProjets autour de l'Hi-C
Projets autour de l'Hi-C
 
Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?Can deep learning learn chromatin structure from sequence?
Can deep learning learn chromatin structure from sequence?
 
Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...Multi-omics data integration methods: kernel and other machine learning appro...
Multi-omics data integration methods: kernel and other machine learning appro...
 
ASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiquesASTERICS : une application pour intégrer des données omiques
ASTERICS : une application pour intégrer des données omiques
 
Autour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWeanAutour des projets Idefics et MetaboWean
Autour des projets Idefics et MetaboWean
 
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiquesApprentissage pour la biologie moléculaire et l’analyse de données omiques
Apprentissage pour la biologie moléculaire et l’analyse de données omiques
 
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
Quelques résultats préliminaires de l'évaluation de méthodes d'inférence de r...
 
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
Intégration de données omiques multi-échelles : méthodes à noyau et autres ap...
 
Journal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation dataJournal club: Validation of cluster analysis results on validation data
Journal club: Validation of cluster analysis results on validation data
 
Overfitting or overparametrization?
Overfitting or overparametrization?Overfitting or overparametrization?
Overfitting or overparametrization?
 
Selective inference and single-cell differential analysis
Selective inference and single-cell differential analysisSelective inference and single-cell differential analysis
Selective inference and single-cell differential analysis
 
SOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatricesSOMbrero : un package R pour les cartes auto-organisatrices
SOMbrero : un package R pour les cartes auto-organisatrices
 
A short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction modelsA short and naive introduction to using network in prediction models
A short and naive introduction to using network in prediction models
 
Explanable models for time series with random forest
Explanable models for time series with random forestExplanable models for time series with random forest
Explanable models for time series with random forest
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
Présentation du projet ASTERICS
Présentation du projet ASTERICSPrésentation du projet ASTERICS
Présentation du projet ASTERICS
 
Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...Kernel methods and variable selection for exploratory analysis and multi-omic...
Kernel methods and variable selection for exploratory analysis and multi-omic...
 
A review on structure learning in GNN
A review on structure learning in GNNA review on structure learning in GNN
A review on structure learning in GNN
 
La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...La statistique et le machine learning pour l'intégration de données de la bio...
La statistique et le machine learning pour l'intégration de données de la bio...
 

Dernier

Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfAdmir Softic
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin ClassesCeline George
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxnegromaestrong
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhikauryashika82
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingTechSoup
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsTechSoup
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docxPoojaSen20
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701bronxfugly43
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxVishalSingh1417
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxVishalSingh1417
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSCeline George
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...christianmathematics
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfciinovamais
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxheathfieldcps1
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibitjbellavia9
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptxMaritesTamaniVerdade
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxAmanpreet Kaur
 

Dernier (20)

Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Seal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptxSeal of Good Local Governance (SGLG) 2024Final.pptx
Seal of Good Local Governance (SGLG) 2024Final.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Grant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy ConsultingGrant Readiness 101 TechSoup and Remy Consulting
Grant Readiness 101 TechSoup and Remy Consulting
 
Introduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The BasicsIntroduction to Nonprofit Accounting: The Basics
Introduction to Nonprofit Accounting: The Basics
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701ComPTIA Overview | Comptia Security+ Book SY0-701
ComPTIA Overview | Comptia Security+ Book SY0-701
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Unit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptxUnit-V; Pricing (Pharma Marketing Management).pptx
Unit-V; Pricing (Pharma Marketing Management).pptx
 
How to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POSHow to Manage Global Discount in Odoo 17 POS
How to Manage Global Discount in Odoo 17 POS
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Activity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdfActivity 01 - Artificial Culture (1).pdf
Activity 01 - Artificial Culture (1).pdf
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
2024-NATIONAL-LEARNING-CAMP-AND-OTHER.pptx
 
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptxSKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
SKILL OF INTRODUCING THE LESSON MICRO SKILLS.pptx
 

Visualiser et fouiller des réseaux - Méthodes et exemples dans R

  • 1. Visualiser et fouiller des réseaux Méthodes et exemples dans R Nathalie Villa-Vialaneix - nathalie.villa@toulouse.inra.fr http://www.nathalievilla.org Unité MIA-T, INRA, Toulouse AG PEPI IBIS - 1er avril 2014 AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 1 / 39
  • 2. Outline 1 Import data 2 Visualization 3 Global characteristics 4 Numerical characteristics calculation 5 Clustering AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 2 / 39
  • 3. What is a network/graph? réseau/graphe Mathematical object used to model relational data between entities. AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 3 / 39
  • 4. What is a network/graph? réseau/graphe Mathematical object used to model relational data between entities. The entities are called the nodes or the vertexes (vertices in British) n÷uds/sommets AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 3 / 39
  • 5. What is a network/graph? réseau/graphe Mathematical object used to model relational data between entities. A relation between two entities is modeled by an edge arête AG PEPI IBIS (01/04/2014) Network & R Nathalie Villa-Vialaneix 3 / 39
  • 6. (non biological) Examples Social network: nodes: persons - edges: 2 persons are connected (friends), as in my facebook network: In the following: this network will be used to illustrate the presentation. Note: if you want to test the script with your own facebook network, you can extract it at http://shiny.nathalievilla.org/fbs. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 4 / 39
  • 7. (non biological) Examples Modeling a large corpus of medieval documents Notarial acts (mostly baux à ef, more precisely, land charters) established in a seigneurie named Castelnau Montratier, written between 1250 and 1500, involving tenants and lords. a a http://graphcomp.univ-tlse2.fr AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 4 / 39
  • 8. (non biological) Examples Modeling a large corpus of medieval documents • nodes: transactions and individuals (3 918 nodes) • edges: an individual is directly involved in a transaction (6 455 edges) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 4 / 39
  • 9. (non biological) Examples AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 4 / 39
  • 10. Standard issues associated with networks Inference Giving data, how to build a graph whose edges represent the direct links between variables? Example: co-expression networks built from microarray data (nodes = genes; edges = signicant direct links between expressions of two genes) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 5 / 39
  • 11. Standard issues associated with networks Inference Giving data, how to build a graph whose edges represent the direct links between variables? Graph mining (examples) 1 Network visualization: nodes are not a priori associated to a given position. How to represent the network in a meaningful way? Random positions Positions aiming at representing connected nodes closer AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 5 / 39
  • 12. Standard issues associated with networks Inference Giving data, how to build a graph whose edges represent the direct links between variables? Graph mining (examples) 1 Network visualization: nodes are not a priori associated to a given position. How to represent the network in a meaningful way? 2 Network clustering: identify communities (groups of nodes that are densely connected and share a few links (comparatively) with the other groups) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 5 / 39
  • 13. More complex relational models Nodes may be labeled by a factor AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 6 / 39
  • 14. More complex relational models Nodes may be labeled by a factor ... or by a numerical information. [Villa-Vialaneix et al., 2013] AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 6 / 39
  • 15. More complex relational models Nodes may be labeled by a factor ... or by a numerical information. [Villa-Vialaneix et al., 2013] Edges may also be labeled (type of the relation) or weighted (strength of the relation) or directed (direction of the relation). AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 6 / 39
  • 16. Before we start... Available material • this slide (on slideshare or on my website, page seminars); • data: my facebook network with two les: fbnet-el.txt (edge list) and fbnet-name.txt (node names and list) • script: a R script with all command lines included in this slide, fb-Rscript.R AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 7 / 39
  • 17. Before we start... Available material • this slide (on slideshare or on my website, page seminars); • data: my facebook network with two les: fbnet-el.txt (edge list) and fbnet-name.txt (node names and list) • script: a R script with all command lines included in this slide, fb-Rscript.R Outline Introduce basic concepts on network mining (not inference) Illustrate them with R (required package: igraph) on my facebook network AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 7 / 39
  • 18. Import data Outline 1 Import data 2 Visualization 3 Global characteristics 4 Numerical characteristics calculation 5 Clustering AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 8 / 39
  • 19. Import data Import a graph from an edge list with igraph edgelist - as.matrix(read.table(fbnet -el.txt)) vnames - read.table(fbnet -name.txt) vnames - read.table(fbnet -name.txt, sep=,, stringsAsFactor=FALSE , na.strings=) The graph is built with: # with ` graph . edgelist ' fbnet0 - graph.edgelist(edgelist , directed=FALSE) fbnet0 # IGRAPH U --- 152 551 -- See also help(graph.edgelist) for more graph constructors (from an adjacency matrix, from an edge list with a data frame describind the nodes, ...) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 9 / 39
  • 20. Import data Vertexes, vertex attributes The graph's vertexes are accessed and counted with: V(fbnet0) # Vertex sequence : # [1] 1 2 3 4 5... vcount(fbnet0) # [1] 152 Vertexes can be described by attributes: # add an attribute for vertices V(fbnet0)$initials - vnames [,1] V(fbnet0)$list - vnames [,2] fbnet0 # IGRAPH U --- 152 551 -- # + attr : initials (v/c), list (v/c) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 10 / 39
  • 21. Import data Edges, edge attributes The graph's edges are accessed and counted with: E(fbnet0) # [1] 11 -- 1 # [2] 41 -- 1 # [3] 52 -- 1 # [4] 69 -- 1 # [5] 74 -- 1 # [6] 75 -- 1 # ... ecount(fbnet0) # 551 igraph can also handle edge attributes (and also graph attributes). AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 11 / 39
  • 22. Import data Settings Notations In the following, a graph G = (V , E, W ) with: • V : set of vertexes {x1, . . . , xp}; • E: set of edges; • W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 12 / 39
  • 23. Import data Settings Notations In the following, a graph G = (V , E, W ) with: • V : set of vertexes {x1, . . . , xp}; • E: set of edges; • W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0. The graph is said to be connected/connexe if any node can be reached from any other node by a path/un chemin. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 12 / 39
  • 24. Import data Settings Notations In the following, a graph G = (V , E, W ) with: • V : set of vertexes {x1, . . . , xp}; • E: set of edges; • W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0. The graph is said to be connected/connexe if any node can be reached from any other node by a path/un chemin. The connected components/composantes connexes of a graph are all its connected subgraphs. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 12 / 39
  • 25. Import data Settings Notations In the following, a graph G = (V , E, W ) with: • V : set of vertexes {x1, . . . , xp}; • E: set of edges; • W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0. Example 1: Natty's FB network has 21 connected components with 122 vertexes (professional contacts, family and closest friends) or from 1 to 5 vertexes (isolated nodes) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 12 / 39
  • 26. Import data Settings Notations In the following, a graph G = (V , E, W ) with: • V : set of vertexes {x1, . . . , xp}; • E: set of edges; • W : weights on edges s.t. Wij ≥ 0, Wij = Wji and Wii = 0. Example 2: Medieval network: 10 542 nodes and the largest connected component contains 10 025 nodes (giant component / composante géante). AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 12 / 39
  • 27. Import data Connected components is.connected(fbnet0) # [1] FALSE As this network is not connected, the connected components can be extracted: fb.components - clusters(fbnet0) names(fb.components) # [1] membership csize no head(fb.components$membership , 10) # [1] 1 1 2 2 1 1 1 1 3 1 fb.components$csize # [1] 122 5 1 1 2 1 1 1 2 1 # [11] 1 2 1 1 2 3 1 1 1 1 # [21] 1 fb.components$no # [1] 21 AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 13 / 39
  • 28. Import data Largest connected component fbnet.lcc - induced.subgraph(fbnet0 , fb.components$membership == which.max(fb.components$csize)) # main characteristics of the LCC fbnet.lcc # IGRAPH U --- 122 535 -- # + attr : initials (v/x) is.connected(fbnet.lcc) # [1] TRUE AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 14 / 39
  • 29. Visualization Outline 1 Import data 2 Visualization 3 Global characteristics 4 Numerical characteristics calculation 5 Clustering AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 15 / 39
  • 30. Visualization Visualization tools help understand the graph macro-structure Purpose: How to display the nodes in a meaningful and aesthetic way? AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 16 / 39
  • 31. Visualization Visualization tools help understand the graph macro-structure Purpose: How to display the nodes in a meaningful and aesthetic way? Standard approach: force directed placement algorithms (FDP) algorithmes de forces (e.g., [Fruchterman and Reingold, 1991]) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 16 / 39
  • 32. Visualization Visualization tools help understand the graph macro-structure Purpose: How to display the nodes in a meaningful and aesthetic way? Standard approach: force directed placement algorithms (FDP) algorithmes de forces (e.g., [Fruchterman and Reingold, 1991]) • attractive forces: similar to springs along the edges AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 16 / 39
  • 33. Visualization Visualization tools help understand the graph macro-structure Purpose: How to display the nodes in a meaningful and aesthetic way? Standard approach: force directed placement algorithms (FDP) algorithmes de forces (e.g., [Fruchterman and Reingold, 1991]) • attractive forces: similar to springs along the edges • repulsive forces: similar to electric forces between all pairs of vertexes AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 16 / 39
  • 34. Visualization Visualization tools help understand the graph macro-structure Purpose: How to display the nodes in a meaningful and aesthetic way? Standard approach: force directed placement algorithms (FDP) algorithmes de forces (e.g., [Fruchterman and Reingold, 1991]) • attractive forces: similar to springs along the edges • repulsive forces: similar to electric forces between all pairs of vertexes iterative algorithm until stabilization of the vertex positions. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 16 / 39
  • 35. Visualization Visualization • package igraph1 [Csardi and Nepusz, 2006] (static representation with useful tools for graph mining) 1 http://igraph.sourceforge.net/ 2 http://gephi.org, http://tulip.labri.fr/TulipDrupal/, http://www.cytoscape.org/ AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 17 / 39
  • 36. Visualization Visualization • package igraph1 [Csardi and Nepusz, 2006] (static representation with useful tools for graph mining) • free interactive software: Gephi, Tulip, Cytoscape, ... 2 1 http://igraph.sourceforge.net/ 2 http://gephi.org, http://tulip.labri.fr/TulipDrupal/, http://www.cytoscape.org/ AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 17 / 39
  • 37. Visualization Network visualization Dierent layouts are implemented in igraph to visualize the graph: plot(fbnet.lcc , layout=layout.random , main=random layout, vertex.size=3, vertex.color=pink, vertex.frame.color=pink vertex.label.color=darkred, edge.color=grey, vertex.label=V(fbnet.lcc)$initials) Try also layout.circle, layout.kamada.kawai, layout.fruchterman.reingold... Network are generated with some randomness. See also help(igraph.plotting) for more information on network visualization AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 18 / 39
  • 38. Visualization AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 19 / 39
  • 39. Visualization Plot attributes igraph integrates a pre-dened graph attribute layout: V(fbnet.lcc)$label - V(fbnet.lcc)$initials and label and color node attributes V(fbnet.lcc)$label - V(fbnet.lcc)$initials V(fbnet.lcc)$color - rainbow(length(unique( V(fbnet.lcc)$list )))[ as.numeric(factor(V(fbnet.lcc)$list ))] plot(fbnet.lcc , vertex.size=3, vertex.label.color=black, edge.color=grey) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 20 / 39
  • 40. Visualization AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 21 / 39
  • 41. Global characteristics Outline 1 Import data 2 Visualization 3 Global characteristics 4 Numerical characteristics calculation 5 Clustering AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 22 / 39
  • 42. Global characteristics Density / Transitivity Densité / Transitivité Density: Number of edges divided by the number of pairs of vertexes. Is the network densely connected? AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 23 / 39
  • 43. Global characteristics Density / Transitivity Densité / Transitivité Density: Number of edges divided by the number of pairs of vertexes. Is the network densely connected? Examples Example 1: Natty's FB network • 152 vertexes, 551 edges ⇒ density = 551 152×151/2 4.8%; • largest connected component: 122 vertexes, 535 edges ⇒ density 7.2%. Example 2: Medieval network (largest connected component): 10 025 vertexes, 17 612 edges ⇒ density 0.035%. Projected network (individuals): 3 755 vertexes, 8 315 edges ⇒ density 0.12%. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 23 / 39
  • 44. Global characteristics Density / Transitivity Densité / Transitivité Density: Number of edges divided by the number of pairs of vertexes. Is the network densely connected? Transitivity: Number of triangles divided by the number of triplets connected by at least two edges. What is the probability that two friends of mine are also friends? Density is equal to 4 4×3/2 = 2/3 ; Transitivity is equal to 1/3. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 23 / 39
  • 45. Global characteristics Density / Transitivity Densité / Transitivité Density: Number of edges divided by the number of pairs of vertexes. Is the network densely connected? Transitivity: Number of triangles divided by the number of triplets connected by at least two edges. What is the probability that two friends of mine are also friends? AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 23 / 39
  • 46. Global characteristics Density / Transitivity Densité / Transitivité Density: Number of edges divided by the number of pairs of vertexes. Is the network densely connected? Transitivity: Number of triangles divided by the number of triplets connected by at least two edges. What is the probability that two friends of mine are also friends? AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 23 / 39
  • 47. Global characteristics Density / Transitivity Densité / Transitivité Density: Number of edges divided by the number of pairs of vertexes. Is the network densely connected? Transitivity: Number of triangles divided by the number of triplets connected by at least two edges. What is the probability that two friends of mine are also friends? AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 23 / 39
  • 48. Global characteristics Density / Transitivity Densité / Transitivité Density: Number of edges divided by the number of pairs of vertexes. Is the network densely connected? Transitivity: Number of triangles divided by the number of triplets connected by at least two edges. What is the probability that two friends of mine are also friends? Examples Example 1: Natty's FB network • density 4.8%, transitivity 56.2%; • largest connected component: density 7.2%, transitivity 56.0%. Example 2: Medieval network (projected network, individuals): density 0.12%, transitivity 6.1%. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 23 / 39
  • 49. Global characteristics Global characteristics graph.density(fbnet.lcc) # [1] 0.0724834 transitivity(fbnet.lcc) # [1] 0.5604524 AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 24 / 39
  • 50. Numerical characteristics calculation Outline 1 Import data 2 Visualization 3 Global characteristics 4 Numerical characteristics calculation 5 Clustering AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 25 / 39
  • 51. Numerical characteristics calculation Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex or di = j Wij . Vertexes with a high degree are called hubs: measure of the vertex popularity. Number of nodes (y-axis) with a given degree (x-axis) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 26 / 39
  • 52. Numerical characteristics calculation Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex or di = j Wij . Vertexes with a high degree are called hubs: measure of the vertex popularity. Two hubs are students who have been hold back at school and the other two are from my most recent class. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 26 / 39
  • 53. Numerical characteristics calculation Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex or di = j Wij . Vertexes with a high degree are called hubs: measure of the vertex popularity. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 26 / 39
  • 54. Numerical characteristics calculation Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex or di = j Wij . The degree distribution is known to t a power law loi de puissance in most real networks: 1 2 5 10 20 50 100 200 500 1550500 Names Transactions This distribution indicates preferential attachement attachement préférentiel. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 26 / 39
  • 55. Numerical characteristics calculation Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex or di = j Wij . The degree distribution is known to t a power law loi de puissance in most real networks: 2 vertex betweenness centralité: number of shortest paths between all pairs of vertexes that pass through the vertex. Betweenness is a centrality measure (vertexes that are likely to disconnect the network if removed). The orange node's degree is equal to 2, its betweenness to 4. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 26 / 39
  • 56. Numerical characteristics calculation Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex or di = j Wij . The degree distribution is known to t a power law loi de puissance in most real networks: 2 vertex betweenness centralité: number of shortest paths between all pairs of vertexes that pass through the vertex. Betweenness is a centrality measure (vertexes that are likely to disconnect the network if removed). AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 26 / 39
  • 57. Numerical characteristics calculation Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex or di = j Wij . The degree distribution is known to t a power law loi de puissance in most real networks: 2 vertex betweenness centralité: number of shortest paths between all pairs of vertexes that pass through the vertex. Betweenness is a centrality measure (vertexes that are likely to disconnect the network if removed). AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 26 / 39
  • 58. Numerical characteristics calculation Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex or di = j Wij . The degree distribution is known to t a power law loi de puissance in most real networks: 2 vertex betweenness centralité: number of shortest paths between all pairs of vertexes that pass through the vertex. Betweenness is a centrality measure (vertexes that are likely to disconnect the network if removed). AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 26 / 39
  • 59. Numerical characteristics calculation Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex or di = j Wij . The degree distribution is known to t a power law loi de puissance in most real networks: 2 vertex betweenness centralité: number of shortest paths between all pairs of vertexes that pass through the vertex. Betweenness is a centrality measure (vertexes that are likely to disconnect the network if removed). AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 26 / 39
  • 60. Numerical characteristics calculation Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex or di = j Wij . The degree distribution is known to t a power law loi de puissance in most real networks: 2 vertex betweenness centralité: number of shortest paths between all pairs of vertexes that pass through the vertex. Betweenness is a centrality measure (vertexes that are likely to disconnect the network if removed). AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 26 / 39
  • 61. Numerical characteristics calculation Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex or di = j Wij . The degree distribution is known to t a power law loi de puissance in most real networks: 2 vertex betweenness centralité: number of shortest paths between all pairs of vertexes that pass through the vertex. Betweenness is a centrality measure (vertexes that are likely to disconnect the network if removed). Vertexes with a high be- tweenness ( 3 000) are 2 political gures. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 26 / 39
  • 62. Numerical characteristics calculation Extracting important nodes 1 vertex degree degré: number of edges adjacent to a given vertex or di = j Wij . The degree distribution is known to t a power law loi de puissance in most real networks: 2 vertex betweenness centralité: number of shortest paths between all pairs of vertexes that pass through the vertex. Betweenness is a centrality measure (vertexes that are likely to disconnect the network if removed). Example 2: In the medieval network: moral persons such as the Chapter of Cahors or the Church of Flaugnac have a high betweenness despite a low degree. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 26 / 39
  • 63. Numerical characteristics calculation Degree and betweenness fbnet.degrees - degree(fbnet.lcc) summary(fbnet.degrees) # Min . 1 st Qu . Median Mean 3 rd Qu . Max . # 1.00 2.00 6.00 8.77 15.00 31.00 fbnet.between - betweenness(fbnet.lcc) summary(fbnet.between) # Min . 1 st Qu . Median Mean 3 rd Qu . Max . # 0.00 0.00 14.03 301.70 123.10 3439.00 and their distributions: par(mfrow=c(1,2)) plot(density(fbnet.degrees), lwd=2, main=Degree distribution, xlab=Degree, ylab=Density) plot(density(fbnet.between), lwd=2, main=Betweenness distribution, xlab=Betweenness, ylab=Density) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 27 / 39
  • 64. Numerical characteristics calculation Degree and betweenness distribution AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 28 / 39
  • 65. Numerical characteristics calculation Combine visualization and individual characteristics par(mar=rep(1,4)) # set node attribute `size ' with degree V(fbnet.lcc)$size - 2*sqrt(fbnet.degrees) # set node attribute ` color ' with betweenness bet.col - cut(log(fbnet.between +1),10, labels=FALSE) V(fbnet.lcc)$color - heat.colors (10)[11 - bet.col] plot(fbnet.lcc , main=Degree and betweenness, vertex.frame.color=heat.colors (10)[ bet.col], vertex.label=NA, edge.color=grey) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 29 / 39
  • 66. Numerical characteristics calculation AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 30 / 39
  • 67. Clustering Outline 1 Import data 2 Visualization 3 Global characteristics 4 Numerical characteristics calculation 5 Clustering AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 31 / 39
  • 68. Clustering Vertex clustering classication Cluster vertexes into groups that are densely connected and share a few links (comparatively) with the other groups. Clusters are often called communities communautés (social sciences) or modules modules (biology). AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 32 / 39
  • 69. Clustering Vertex clustering classication Cluster vertexes into groups that are densely connected and share a few links (comparatively) with the other groups. Clusters are often called communities communautés (social sciences) or modules modules (biology). Example 1: Natty's facebook network AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 32 / 39
  • 70. Clustering Vertex clustering classication Cluster vertexes into groups that are densely connected and share a few links (comparatively) with the other groups. Clusters are often called communities communautés (social sciences) or modules modules (biology). Example 2: medieval network AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 32 / 39
  • 71. Clustering Vertex clustering classication Cluster vertexes into groups that are densely connected and share a few links (comparatively) with the other groups. Clusters are often called communities communautés (social sciences) or modules modules (biology). Several clustering methods: • min cut minimization minimizes the number of edges between clusters; • spectral clustering [von Luxburg, 2007] and kernel clustering uses eigen-decomposition of the Laplacian/Laplacien Lij = −wij if i = j di otherwise (matrix strongly related to the graph structure); • Generative (Bayesian) models [Zanghi et al., 2008]; • Markov clustering simulate a ow on the graph; • modularity maximization • ... (clustering jungle... see e.g., [Fortunato and Barthélémy, 2007, Schaeer, 2007, Brohée and van Helden, 2006]) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 32 / 39
  • 72. Clustering Find clusters by modularity optimization modularité The modularity [Newman and Girvan, 2004] of the partition (C1, . . . , CK) is equal to: Q(C1, . . . , CK) = 1 2m K k=1 xi ,xj ∈Ck (Wij − Pij) with Pij: weight of a null model (graph with the same degree distribution but no preferential attachment): Pij = didj 2m with di = 1 2 j=i Wij. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 33 / 39
  • 73. Clustering Interpretation A good clustering should maximize the modularity: • Q when (xi, xj) are in the same cluster and Wij Pij • Q when (xi, xj) are in two dierent clusters and Wij Pij (m = 20) Pij = 7.5 Wij = 5 ⇒ Wij − Pij = −2.5 di = 15 dj = 20 i and j in the same cluster decreases the modularity AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 34 / 39
  • 74. Clustering Interpretation A good clustering should maximize the modularity: • Q when (xi, xj) are in the same cluster and Wij Pij • Q when (xi, xj) are in two dierent clusters and Wij Pij (m = 20) Pij = 0.05 Wij = 5 ⇒ Wij − Pij = 4.95 di = 1 dj = 2 i and j in the same cluster increases the modularity AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 34 / 39
  • 75. Clustering Interpretation A good clustering should maximize the modularity: • Q when (xi, xj) are in the same cluster and Wij Pij • Q when (xi, xj) are in two dierent clusters and Wij Pij • Modularity • helps separate hubs (= spectral clustering or min cut criterion); • is not an increasing function of the number of clusters: useful to choose the relevant number of clusters (with a grid search: several values are tested, the clustering with the highest modularity is kept) but modularity has a small resolution default (see [Fortunato and Barthélémy, 2007]) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 34 / 39
  • 76. Clustering Interpretation A good clustering should maximize the modularity: • Q when (xi, xj) are in the same cluster and Wij Pij • Q when (xi, xj) are in two dierent clusters and Wij Pij • Modularity • helps separate hubs (= spectral clustering or min cut criterion); • is not an increasing function of the number of clusters: useful to choose the relevant number of clusters (with a grid search: several values are tested, the clustering with the highest modularity is kept) but modularity has a small resolution default (see [Fortunato and Barthélémy, 2007]) Main issue: Optimization = NP-complete problem (exhaustive search is not not usable) Dierent solutions are provided in [Newman and Girvan, 2004, Blondel et al., 2008, Noack and Rotta, 2009, Rossi and Villa-Vialaneix, 2011] (among others) and some of them are implemented in the R package igraph. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 34 / 39
  • 77. Clustering Node clustering One of the function to perform node clustering is multilevel.community (probably not the best with large graphs): fbnet.clusters - multilevel.community(fb.lcc) modularity(fbnet.clusters) # Graph community structure calculated with the mul # Number of communities ( best split ): 7 # Modularity ( best split ): 0.566977 # Membership vector : # [1] 3 2 2 2 2 3 5 3 5 5 4 3 3 2 ... table(fbnet.clusters$membership) # 1 2 3 4 5 6 7 # 2 18 26 32 12 8 24 See help(communities) for more information. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 35 / 39
  • 78. Clustering Combine clustering and visualization # create a new attribute V(fbnet.lcc)$community - fbnet.clusters$membership fbnet.lcc # IGRAPH U --- 122 535 -- #+ attr : layout (g/n), initials (v/c), list (v/c), # label (v/c), color (v/c), community (v/n), # size (v/n) Display the clustering: par(mfrow=c(1,1)) par(mar=rep(1,4)) plot(fbnet.lcc , main=Communities, vertex.frame.color= rainbow (9)[ fbnet.clusters$membership], vertex.color= rainbow (9)[ fbnet.clusters$membership], vertex.label=NA, edge.color=grey) AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 36 / 39
  • 79. Clustering Combine clustering and visualization AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 37 / 39
  • 80. Clustering Export the graph The graphml format can be used to export the graph (it can be read by most of the graph visualization programs and includes information on node and edge attributes) write.graph(fbnet.lcc , file=fblcc.graphml, format=graphml) see help(write.graph) for more information of graph exportation formats AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 38 / 39
  • 81. Clustering Further references... on clustering... • overlapping communities communautés recouvrantes; • hierarchical clustering [Rossi and Villa-Vialaneix, 2011] provides an approach; • organized clustering (projection on a small dimensional grid) and clustering for visualization [Boulet et al., 2008, Rossi and Villa-Vialaneix, 2010, Rossi and Villa-Vialaneix, 2011]; • ... on R CRAN Task View: gRaphical Models in R on network inference you can start with the course An introduction to network inference and mining AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 39 / 39
  • 82. Clustering References Blondel, V., Guillaume, J., Lambiotte, R., and Lefebvre, E. (2008). Fast unfolding of communites in large networks. Journal of Statistical Mechanics: Theory and Experiment, P10008:17425468. Boulet, R., Jouve, B., Rossi, F., and Villa, N. (2008). Batch kernel SOM and related Laplacian methods for social network analysis. Neurocomputing, 71(7-9):12571273. Brohée, S. and van Helden, J. (2006). Evaluation of clustering algorithms for protein-protein interaction networks. BMC Bioinformatics, 7(488). Csardi, G. and Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems. Fortunato, S. and Barthélémy, M. (2007). Resolution limit in community detection. In Proceedings of the National Academy of Sciences, volume 104, pages 3641. doi:10.1073/pnas.0605965104; URL: http://www.pnas.org/content/104/1/36.abstract. Fruchterman, T. and Reingold, B. (1991). Graph drawing by force-directed placement. Software, Practice and Experience, 21:11291164. Newman, M. and Girvan, M. (2004). Finding and evaluating community structure in networks. Physical Review, E, 69:026113. Noack, A. and Rotta, R. (2009). Multi-level algorithms for modularity clustering. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 39 / 39
  • 83. Clustering In SEA '09: Proceedings of the 8th International Symposium on Experimental Algorithms, pages 257268, Berlin, Heidelberg. Springer-Verlag. Rossi, F. and Villa-Vialaneix, N. (2010). Optimizing an organized modularity measure for topographic graph clustering: a deterministic annealing approach. Neurocomputing, 73(7-9):11421163. Rossi, F. and Villa-Vialaneix, N. (2011). Représentation d'un grand réseau à partir d'une classication hiérarchique de ses sommets. Journal de la Société Française de Statistique, 152(3):3465. Schaeer, S. (2007). Graph clustering. Computer Science Review, 1(1):2764. Villa-Vialaneix, N., Liaubet, L., Laurent, T., Cherel, P., Gamot, A., and SanCristobal, M. (2013). The structure of a gene co-expression network reveals biological functions underlying eQTLs. PLoS ONE, 8(4):e60045. von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4):395416. Zanghi, H., Ambroise, C., and Miele, V. (2008). Fast online graph clustering via erdös-rényi mixture. Pattern Recognition, 41:35923599. AG PEPI IBIS (01/04/2014) Network R Nathalie Villa-Vialaneix 39 / 39