A comparative study of social network analysis tools
1.
2.
3.
4.
5.
6.
7.
8.
9.
10.
11.
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
Notes de l'éditeur
24 minutesGood morning,mynameis DC. I come fromLaHC , University of St-Etienne. The aim of mythesis I amhere to presentyou an article titled « A comparative study of social network analysistools » writtenwith Christine Largeron, Előd Egyed-Zsigmond and Mathias Géry.
(for the Gartner …)Many experts insist about social network new behavioursappearing in the enterprise and necessity to takethemintoaccount.Gartner Institute predicts #CITATION#.It alsoconsidersthat SNA isgetting mature.So progresscanbe made in business use of SNA for workflow study intended to permit adaptation of the management to the real communication flow in a company;And Identify key actors in networks, ie. for viral marketing aware campains
A previoussurvey by Huisman & Van Duijn has been published.Havingpreviousbench, werealizedthatitshouldbeupdated, as networks and needs have changed.Main Changes in networks concerns the sizes (networks are far largerthanbefore) and the type of information (as networks tend to getricher, withcontextual data).
There are 4 main domains of functionalities :the Representations of the network as datafilesVisualizationCharacterization by quantitativeindicatorsCommunitydetectioncapabilities
There are different sorts of indicators, which help discribe the network or just one actor.First, among global indicatorsyoucanfindsomegeneral values such as the number of nodes of the graph, the number of edges, the diameter (longestshortest path, how far at the maximum are two nodes in the graph). You can compose these values and deduce the density of the graph (whichdepends on the maximum number of links youcan put in the graph ie. the complete graph). An highdensity (high links ratio) defines a dense graph. A graph with a lowdensity value issaid to besparse.The second type of indicatorsis local indicators. You cancalculateitwithonly a small part of the graph information. An exampleis the number of neighboors (called the degree of a node).You canalso use distances between 2 nodes (or evenedges). The shortestpathis one of them.
Nowlet’sadress the benchmark itself.
The methodology for the selection of the tools takes into account the following points:First, in our vision blocking criteria for selection are that:The tools must provide the basic metricsused in the Social Network Analysisdomain ;and the networks size can reach tens of thousands of nodes, as well as smallerones.The toolsstudied in depthwerechoosenbecausetheyrepresentlegacy, wellestablished software, or new developments base on recent standards such as modularization and ergonomics.The datasetsused for experimentation are the Zachary’skarate-club dataset and the DBLP database.
Seven areas of functionalities were evaluated:Input/output formatsCustom attribute handlingBipartite graphs specific functionsLongitudinal analysis (ietimestamped events)IndicatorsVisualizationClustering capabilities
Hereis the list of studied software.Several of them are vizualisationoriented: Pajek, Gephi, GraphViz, GUESS, Tulip.Others claim to permit graph manipulation: igraph, NetworkX, and JUNG.Bothapproaches have led for manytools in incorporating SNA metrics.
The retained softwareis split in twocategories: Stand-alone softwares and libraries.The Stand-alone software are Pajek and Gephi.The libraries are igraph and NetworkX.Some of the software are emblematic of an approach (Gephi has somecommonalitieswithTulip for example).Nowlet’sdetailtheseones.
This table summarizes how well each tool performs in the seven areas.Pajek performs quite well at the seven domains.If you need more than a simple visualization of a small network, you should avoid NetworkX.Igraph is very good at providing indicator, visualization and community detection.
Hereis an otherrepresentation of the results of the benchmark.Weseeclearlywiththis radar viewthatNetworkXisvery good at custom attributeshandling and providesmanyindicators.No software performsfairlywellattemporality and bipartite graphs management.
Inorder to concludethispresentation, wecansaythere aredifferentscientificdomainsconcerned by SN, and thisexplainwhythere are somuch approches and toolsavailable.The advancesshould come in short or midtermfrom smart modularisation of user interface, algorithms and data, with the goal of allowing the reuse of methodsbetween software. Contextualanalysisneedtools for longitudinal analysis and attributesstudy.Hierarchical graph visualizationisnow an important subjecttoo for smart manipulation of large graphs.