13. Complex networks as rhizomes
“Unlike trees or their roots, the rhizome connects any point to any other point”
Gilles Deleuze & Félix Guattari, “A Thousand Plateaus”, 1980
“The main feature of a net is that every point can be connected with every other
point, and where the connections are not yet designed, they are, however,
conceivable and designable.
A net is an unlimited territory”
Umberto Eco, “Semiotics and the Philosophy of Language”, 1986
36. The 3 visual variables of the analysis:
1. node position – layout
2. node size – ranking
3. node color – partitions
Gephi.org
37. Network analysis in 6 questions
Applying a force-vector spatialization
1. What are the debates/discursive communities?
(identification of the node clusters)
2. Which sites are at the center of the debates/communities?
(identification of the central nodes in the network and in the clusters)
3. Which sites connect the debates/communities?
(identification of the bridges between the different clusters)
Applying a ranking by in-/out-degree
4. Which sites are the opinion leaders of the online debate?
(identification of the authorities of the graph)
5. Which sites federate the online debate?
(identification of the hubs of the graph)
Applying a coloring by partition
6. How are the different categories of sites distributed?
(assessment of the topology/categorization consistency)
38. Applying a force-vector spatialization
(ForceAtlas 2)
• LinLog mode
(maximizes the legibility of clusters)
• Prevent overlap
(enhances legibility, but distorts the spatialization)
• Scaling
(increases/decreases all distances proportionally)
• Gravity
(pulls everything towards the center, prevents
dispersion, but distorts the spatialization)
• Approximate repulsion
(reduces the time required to spatialize large graphs,
but distorts the spatialization)
39. What are the debates/discursive communities?
(identification of the node clusters)
40. What are the debates/discursive communities?
(identification of the node clusters)
Because of the monstrous size of the Web, there are two types of maps of it: the maps that try to be exhaustive and to trace the entire Web or most of it (and fail)…
… and the good ones.
27/08/12
A good map of the Web is always limited in its ambition: it tries to represent a limited portion of the Web, and the better this portion is delimited, the better the map. The example shows a very interesting map of the French political blogosphere, produced by Linkfluence (a research partner of the médialab).
Indeed, the carving process that we just described is precisely what allows us to go from a pseudo-exhaustive (in fact, poorly delimited) network to a legible one.
This is a model of a tiny web corpus. It only has some 80 nodes and yet it looks like a plate of spaghetti (or a hairball).
Now that we know about the power law, however, we can try to de-spaghetticize this graph. To do so, we will first change the size of the nodes according to their in-degree (the number of hyperlinks that they receive).
Secondly, we will re-order the nodes on the Y axis, again according to their in-degree.
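The two steps above (sizing nodes by in-degree, then ordering them on the Y axis by in-degree) can be sketched in plain Python. The edge list below is a made-up miniature corpus, not data from the original slides:

```python
from collections import Counter

def in_degrees(edges):
    """Count incoming hyperlinks per node from a directed edge list."""
    return Counter(dst for _src, dst in edges)

# Hypothetical tiny web corpus: (source, target) hyperlinks.
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("hub", "a"), ("c", "b")]
nodes = {"a", "b", "c", "hub"}

deg = in_degrees(edges)

# 1. Node size proportional to in-degree (plus a minimum size).
sizes = {n: 5 + 10 * deg.get(n, 0) for n in nodes}

# 2. Vertical position: re-order the nodes on the Y axis by in-degree.
y = {n: deg.get(n, 0) for n in nodes}

print(sizes["hub"], y["hub"])  # → 35 3 ("hub" receives 3 links)
```

The size formula (minimum 5, plus 10 per incoming link) is an arbitrary choice for the sketch; Gephi's Ranking palette lets you pick the min/max sizes interactively.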
Focusing on visibility: the higher layer contains the websites that are highly visible, appear on the first page of search engines’ results and can easily be found by anyone; the middle layer contains the websites that are less visible, appear in the following pages of search engines’ results and can only be found by experts; the lower layer contains the websites that are almost invisible, don’t show up in search engines and are almost impossible to find.
Focusing on connectivity: the higher layer contains the websites that are highly connected both locally and globally; the middle layer contains the websites that are highly connected locally but poorly connected globally; the lower layer contains the websites that are poorly connected both locally and globally.
The three layers can also be distinguished by looking at the direction of the links. The World Wide Web is characterized by a very peculiar reverse gravity, where the less visible websites point toward the more visible ones (thereby making them even more visible)…
… but not the other way around.
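This “reverse gravity” can be checked on an edge list by counting how many links point up the in-degree hierarchy. A toy sketch with made-up data:

```python
from collections import Counter

def upward_fraction(edges):
    """Fraction of links that point from a less-visible node to a
    more-visible one (i.e. to a node with strictly higher in-degree)."""
    deg = Counter(dst for _s, dst in edges)
    up = sum(1 for s, d in edges if deg[d] > deg[s])
    return up / len(edges)

# Hypothetical star-like corpus: everyone links to the visible center.
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("d", "hub")]
print(upward_fraction(edges))  # → 1.0: all links point up the hierarchy
```

On a real web corpus this fraction will not be exactly 1.0, but the reverse-gravity claim above predicts it will be well above one half.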
The reason why we want to exclude these websites is that it is impossible to define where they are located. The websites that are too high in the in-degree hierarchy are connected to everyone and are therefore everywhere. The websites that are too low in the in-degree hierarchy are connected to no one and are therefore nowhere. Only the websites in the middle are somewhere, because they are connected only to someone. Of course, where exactly this first cut is made depends entirely on the level of specificity of the research that you are doing. And this is why this cut is arbitrary and relatively easy.
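A minimal sketch of this first cut, keeping only the “somewhere” nodes; the node names and thresholds are hypothetical, and the thresholds are exactly the arbitrary, research-dependent choice described above:

```python
from collections import Counter

def middle_layer(edges, low, high):
    """Keep only nodes whose in-degree falls strictly between the two
    (arbitrary, research-dependent) thresholds."""
    nodes = {n for e in edges for n in e}
    deg = Counter(dst for _src, dst in edges)
    return {n for n in nodes if low < deg.get(n, 0) < high}

edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("d", "hub"),
         ("hub", "a"), ("d", "a"), ("x", "y")]
# "hub" (in-degree 4) is everywhere, "x" (in-degree 0) is nowhere:
print(middle_layer(edges, low=0, high=4))
```

Here the cut keeps "a" and "y", discarding both the over-connected hub and the unconnected periphery.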
This cut is more difficult because the thematic separation on the Web is, as we said, a question of density and rarefaction, and separating one theme from another is more a question of ripping than of cutting.
Through these two operations it is possible to delimit a thematic corpus. This corpus is composed of websites of the intermediary layer, but also of the upper and lower layers. In particular, the websites of the higher layer constitute the core of the corpus, which is surrounded by a nebula of websites of the middle layer and several tendrils in the lower layer.
Now that we have extracted our scientometrics network from Scopus, we can analyse it with Gephi.
Gephi is a very complex piece of software and here I will only have the time for a quick introduction. However, if you want to know more about Gephi and its usage, I strongly encourage you to have a look at the documentation on the Gephi website ( http://gephi.org/users/ ), which is extremely well done.
Very quickly, Gephi has three main windows. The first is the ‘Overview’, where you can manipulate and analyze your graph (and where you will spend most of your time).
The second window is the ‘Data Laboratory’ where you have a table view of the nodes and the edges of your graph and their attributes.
Finally, the ‘Preview’ window allows you to tweak the visualization parameters of your graph and to export the result of your work as a static image (PDF, PNG, SVG).
Back in the ‘Overview’ window, there are three main palettes that we will employ in the analysis: 1. the ‘Layout’ palette, to change the position of the nodes; 2. the ‘Ranking’ palette, to change the size of the nodes; 3. the ‘Partitions’ palette, to change the color of the nodes.
As you see, the cells of the table are colored with four different colors that indicate the four steps of the analysis: 1. identification of clusters (layout); 2. characterization of clusters (layout); 3. remarkable nodes (layout & ranking); 4. categories projection (partitions).
To identify the clusters, therefore, the first thing to do is to spatialize the network using a force-vector algorithm. The first action that we will perform on our graph is to spatialize it with the ForceAtlas 2 layout. This algorithm can be tweaked by changing several parameters, the most important of which are: LinLog mode (maximizes the legibility of clusters); Prevent overlap (enhances legibility, but distorts the spatialization); Scaling (increases/decreases all distances proportionally); Gravity (pulls everything towards the center, prevents dispersion, but distorts the spatialization); Approximate repulsion (reduces the time required to spatialize large graphs, but distorts the spatialization).
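To make the force-vector idea concrete, here is a toy spatialization in the spirit of (but much simpler than) ForceAtlas 2: connected nodes attract each other, all pairs repel, and a weak gravity pulls everything towards the center. The constants and the tiny graph are invented for the sketch; the real algorithm adds degree weighting, the LinLog attraction model, adaptive speeds, overlap prevention, etc.

```python
import math
import random

def force_layout(nodes, edges, steps=500, scaling=1.0, gravity=0.05):
    """Toy force-vector layout: edge attraction, all-pairs repulsion,
    and gravity towards the origin, iterated a fixed number of steps."""
    random.seed(1)
    pos = {n: [random.uniform(-1, 1), random.uniform(-1, 1)] for n in nodes}
    for _ in range(steps):
        disp = {n: [0.0, 0.0] for n in nodes}
        for a in nodes:                      # repulsion between all pairs
            for b in nodes:
                if a == b:
                    continue
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                d2 = dx * dx + dy * dy + 1e-9
                disp[a][0] += scaling * dx / d2
                disp[a][1] += scaling * dy / d2
        for a, b in edges:                   # attraction along edges
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            disp[a][0] += 0.05 * dx
            disp[a][1] += 0.05 * dy
            disp[b][0] -= 0.05 * dx
            disp[b][1] -= 0.05 * dy
        for n in nodes:                      # gravity towards the origin
            disp[n][0] -= gravity * pos[n][0]
            disp[n][1] -= gravity * pos[n][1]
            pos[n][0] += 0.1 * disp[n][0]
            pos[n][1] += 0.1 * disp[n][1]
    return pos

nodes = ["a", "b", "c"]
edges = [("a", "b")]                         # only a and b are linked
pos = force_layout(nodes, edges)
# The linked pair should end up closer together than the unlinked pair:
print(math.dist(pos["a"], pos["b"]) < math.dist(pos["a"], pos["c"]))
```

This is exactly why proximity in the layout can be read as connectivity: linked nodes settle near each other, unlinked ones drift apart.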
… it is easy to identify the areas which contain few or no nodes, also called structural holes …
…
- Central clusters (located in the middle of the network), because centrality in a spatialized graph is a sign of high and highly diverse connectivity. - Bridging clusters (located in-between two clusters), because these clusters play a crucial role in allowing the circulation of things in the network.
- The in-degree, corresponding to the number of incoming edges (the number of connections pointing toward the node). The in-degree of a node is also called its ‘authority score’, because receiving many connections is generally correlated with the fact that the node is considered ‘important’ or ‘remarkable’ by the other nodes of the network.
- The out-degree, corresponding to the number of outgoing edges (the number of connections starting from the node). The out-degree of a node is also called its ‘hub score’. Hubs are important in networks because they play a crucial role in the circulation of information. Of course, in-degree and out-degree can only be computed in directed graphs (graphs in which the connections have a direction). In non-directed graphs (such as a graph of friendship, if we assume that friendship is always mutual), it is however possible to compute the degree of nodes (the number of edges connected to each node).
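All three measures can be computed directly from an edge list; a sketch on a made-up directed graph:

```python
from collections import Counter

# Hypothetical directed graph as a list of (source, target) edges.
edges = [("a", "b"), ("a", "c"), ("c", "b"), ("b", "a")]

in_deg = Counter(dst for _src, dst in edges)   # 'authority score'
out_deg = Counter(src for src, _dst in edges)  # 'hub score'
deg = in_deg + out_deg                         # plain degree (undirected view)

print(in_deg["b"], out_deg["a"], deg["a"])  # → 2 2 3
```

Note that the plain degree is just the sum of in- and out-degree, which is what remains meaningful once the direction of the connections is dropped.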
But it is also interesting to observe whether topology and classification are consistent (whether most of the nodes of a given type are located within the same cluster and, conversely, whether clusters are formed by nodes of the same type).
If topology and classification are consistent, it is then interesting to zoom in on the exceptions and have a closer look at the nodes that have an unusual position compared to the other nodes of the same type.
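One simple way to quantify this consistency is the purity of each cluster: the share of its nodes belonging to the cluster's dominant category (1.0 means topology and classification fully agree). The cluster and category assignments below are hypothetical:

```python
from collections import Counter

def cluster_purity(cluster_of, category_of):
    """For each cluster, the share of nodes belonging to its dominant
    category; 1.0 means topology and categorization fully agree."""
    by_cluster = {}
    for node, cl in cluster_of.items():
        by_cluster.setdefault(cl, []).append(category_of[node])
    return {cl: Counter(cats).most_common(1)[0][1] / len(cats)
            for cl, cats in by_cluster.items()}

# Hypothetical output of a clustering and of a manual categorization.
cluster_of = {"s1": 0, "s2": 0, "s3": 0, "s4": 1, "s5": 1}
category_of = {"s1": "blog", "s2": "blog", "s3": "media",
               "s4": "media", "s5": "media"}
print(cluster_purity(cluster_of, category_of))
```

In this toy example cluster 1 is pure (all "media") while cluster 0 has one exception ("s3"), which is exactly the kind of unusually placed node worth zooming in on.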