Gephi icwsm-tutorial

  1. 1. ICWSM’11 Tutorial: Exploratory Network Analysis with Gephi. Instructors: Sébastien Heymann (seb@gephi.org), Julian Bilcke (julian.bilcke@gephi.org). July 17, 2011 | 1 PM - 4 PM
  2. 2. Exploratory Network Analysis with Gephi. This tutorial is an introduction to Gephi, the open source software for visualizing and manipulating network graphs. Gephi aims to cover the complete chain, from data import to aesthetic refinement and interaction. Users interact with the visualization and manipulate structures, shapes and colors to reveal hidden properties. The goal is to help data analysts form hypotheses and intuitively discover patterns or errors in large data collections. By the end, participants will have the practical knowledge they need to use Gephi for their own projects.
  3. 3. Exploratory Network Analysis with Gephi. The tutorial starts with a brief introduction to the network exploration process and a hands-on demonstration of the essential functionalities of Gephi. Participants are guided step by step through the complete chain of representation, manipulation, layout, analysis and aesthetic refinement. Next, teams work on real datasets and then present their preliminary results. The tutorial concludes with a general question and answer session.
  4. 4. Requirements. Bring your own laptop with Java and Gephi installed. Gephi should be up to date (menu Help > Check for Updates). Bring a mouse with a wheel. Bring a dataset of your own if you want, and check that it loads properly in Gephi [1]. [1] http://gephi.org/users/supported-graph-formats/
  5. 5. Workshop Schedule - Part I
     Exploratory Network Analysis
     • Exploratory Data Analysis
     • Exploratory Network Analysis
     • Looking for Orderness in Data
     • Examples
     • Guideline
     Introduction to Gephi
     • Approach and Community
     • Networked Data
     • Quick Start Demo
     * 30 min break *
  6. 6. Workshop Schedule - Part II
     Hands-On!
     • Team Work on a Dataset
     • Presentation of Preliminary Results
     Q&A
  7. 7. Exploratory Data Analysis, started with John Tukey (1962). Confirmatory: results. Exploratory: intuition, serendipity, surprise. “The greatest value of a picture is when it forces us to notice what we never expected to see.” (John Tukey)
  8. 8. Exploratory Data Analysis. Figure: the non-linear processing chain of Ben Fry, from Computational Information Design (2004).
  9. 9. Dummy Example. Figure: P2P file size distribution (Latapy et al., 2008). Observation: visual saliences at specific file sizes. External knowledge: these sizes correspond to films. New hypothesis on the data: films are heavily exchanged, so the study might dig in this direction.
  10. 10. Exploratory Network Analysis
     1. See the network. 1st graph viz tool: Pajek (1996), by Vladimir Batagelj and Andrej Mrvar.
     2. Interact in real time: group, filter, compute metrics... Gephi prototype (2008).
     3. Build a visual language: size by rank, color by partition, label, curved edges, thickness...
  11. 11. Looking for a “Simple Small Truth”? Drew Conway, What Data Visualization Should Do: 1. Make complex things simple. 2. Extract small information from large data. 3. Present truth, do not deceive. http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/
  12. 12. Looking for Orderness in Data. Vary 3 cursors simultaneously to extract meaningful patterns:
     • at different levels (MICRO level to MACRO level)
     • on multiple dimensions (1 dimension to N dimensions)
     • along the time scale (T+0 to T+N)
  13. 13. “Zoom” cursor on Quantitative Data (MICRO level to MACRO level)
     Global: connectivity, density, centralization
     Local: communities, bridges between communities, local centers vs periphery
     Individual: centrality, distances, neighborhood, location, local authority vs hub
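As a rough illustration of two of these zoom levels, the sketch below computes one global metric (density) and one individual metric (normalized degree centrality) on a small directed graph. The toy graph and the function names are assumptions made for this example; they are not Gephi's API, and Gephi computes such metrics through its own Statistics tools.

```python
from collections import Counter

# Hypothetical toy graph, chosen only for illustration.
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("a", "d"), ("b", "c"), ("e", "a"), ("c", "e")]


def density(nodes, edges):
    """Global level: share of all possible directed edges that are present."""
    n = len(nodes)
    return len(edges) / (n * (n - 1)) if n > 1 else 0.0


def degree_centrality(nodes, edges):
    """Individual level: (in-degree + out-degree) / (n - 1) for each node."""
    degree = Counter()
    for source, target in edges:
        degree[source] += 1
        degree[target] += 1
    scale = max(len(nodes) - 1, 1)  # avoid dividing by zero on a 1-node graph
    return {node: degree[node] / scale for node in nodes}


graph_density = density(nodes, edges)
centrality = degree_centrality(nodes, edges)
```

Here `graph_density` describes the network as a whole, while `centrality` gives one value per node, which is the kind of quantity that can later drive node size in a ranking.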
  14. 14. “Crossing” cursor on Qualitative Data (1 dimension to N dimensions)
     Social: who with whom, communities, brokerage, influence and power, homophily
     Semantic: topics, thematic clusters
     Geographic: spatial phenomena
  15. 15. “Timeline” cursor on Temporal Data (T+0 to T+N)
     • Evolution of social ties
     • Evolution of communities
     • Evolution of topics
  16. 16. Mapping an Innovation Center. Collaborations on projects at Images et Réseaux: themes and content, actors, territory. Franck Ghitalla & Ecole de Design de Nantes
  17. 17. Mapping Scientific Cooperations
  18. 18. Network Map: a Series of Choices. Choices about corpus, data, graphical operations, algorithms, communication, thresholds and goals.
  19. 19. Guideline, by number of nodes
     1 - 100: lists, with edges as a bonus; focus on qualitative data
     How do attributes explain the structure?
     100 - 1,000:
     • easy to read, “obvious” patterns
     • focus on entities (in context)
     • metrics are tools to describe the graph (centrality, bridging...)
     • links help to build and interpret categories of entities
     challenge: mix attribute crossing and connectivity
     How does the structure explain attributes?
     1,000 - 50,000:
     • hard to read, problem of “hidden signals”: track patterns with various layouts and filtering
     • focus on structures
     • metrics are tools to build the graph (cosine similarity...)
     • categories help to understand the structure
     challenge: pattern recognition
     > 50,000: requires high computational power
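The guideline mentions cosine similarity as a metric used to build a graph. A minimal sketch of that idea follows, assuming each entity is described by a numeric vector and that an edge is kept when the similarity reaches a threshold. The vectors, the threshold value and the function names are hypothetical and do not come from the tutorial.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def similarity_edges(vectors, threshold):
    """Return undirected weighted edges for pairs at or above the threshold."""
    edges = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(sorted(vectors.items()), 2):
        weight = cosine_similarity(vec_a, vec_b)
        if weight >= threshold:
            edges.append((name_a, name_b, weight))
    return edges


# Hypothetical term-count vectors for three documents.
vectors = {"doc1": [1, 0, 2], "doc2": [2, 0, 4], "doc3": [0, 3, 0]}
edges = similarity_edges(vectors, threshold=0.5)
```

The threshold is one of the choices that shapes the resulting map: lowering it yields a denser graph, raising it keeps only the strongest links.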
  20. 20. Gephi now!
  21. 21. Gephi in a Nutshell. « Like Photoshop™ for graphs. » Helps data analysts reveal patterns and trends, highlight outliers and tell stories with their data.
     • Network visualization platform
     • Open source, supported by a community
     • Built for performance and usability
     • Extensible by plug-ins
     • Windows, Mac OS X, Linux
  22. 22. Gephi Community. Nonprofit organization. Communities. Contributors: Mathieu Bastian, Mathieu Jacomy, Eduardo Ramos Ibañez, Sébastien Heymann, Guillaume Ceccarelli, André Panisson, Antonio Patriarca, Cezary Bartosiak, Martin Škurla, Patrick McSweeney, Yi Du, Hélder Suzuki, Daniel Bernardes, Ernesto Aneiro, Keheliya Gallaba, Luiz Ribeiro, Urban Škudnik, Vojtech Bardiovsky, Yudi Xue
  23. 23. Community Mission
     • Provide “sustainable” software
     • Maintain the technical ecosystem
     • Build a business ecosystem
     • Face cutting-edge technological challenges with a long-term vision
     • Distribute the software as open source
  24. 24. Community Values. Open innovation: ideas and features come from the entire community. Decisions are made transparently. We consider this technology a public good, and will keep it open source.
  25. 25. Diversity of Usages: business, leisure :-), communication, academic, art
  26. 26. Diversity of Network Encoding
     Textual:
       V = { a, b, c, d, e }
       E = { (a,b), (a,d), (b,c), (e,a), (c,e) }
     XML:
       <graph>
         <nodes>
           <node id="a" />
           <node id="b" />
           <node id="c" />
           <node id="d" />
           <node id="e" />
         </nodes>
         <edges>
           <edge source="a" target="b" />
           <edge source="a" target="d" />
           <edge source="b" target="c" />
           <edge source="e" target="a" />
           <edge source="c" target="e" />
         </edges>
       </graph>
     Tabular:
           a b c d e
         a - 1 - 1 -
         b - - 1 - -
         c - - - - 1
         d - - - - -
         e 1 - - - -
     Graphical (node-link diagram), and many others...
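To make the equivalence between these encodings concrete, the sketch below converts the textual form (V, E) into the tabular form (an adjacency matrix) and into the simplified XML shown on the slide. The function names are illustrative assumptions. The XML mirrors the slide's simplified structure only; a real GEXF or GraphML file needs additional elements and attributes defined by its specification.

```python
import xml.etree.ElementTree as ET

# Textual encoding, as on the slide.
V = ["a", "b", "c", "d", "e"]
E = [("a", "b"), ("a", "d"), ("b", "c"), ("e", "a"), ("c", "e")]


def to_matrix(nodes, edges):
    """Tabular encoding: matrix[i][j] is 1 when an edge goes from node i to node j."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] = 1
    return matrix


def to_xml(nodes, edges):
    """Simplified XML encoding with <nodes> and <edges> sections (not full GEXF)."""
    graph = ET.Element("graph")
    nodes_el = ET.SubElement(graph, "nodes")
    for node in nodes:
        ET.SubElement(nodes_el, "node", id=node)
    edges_el = ET.SubElement(graph, "edges")
    for source, target in edges:
        ET.SubElement(edges_el, "edge", source=source, target=target)
    return ET.tostring(graph, encoding="unicode")


matrix = to_matrix(V, E)
xml_text = to_xml(V, E)
```

The matrix grows with the square of the number of nodes, while the edge list and XML forms grow with the number of edges, which is one practical reason sparse networks are usually stored as lists.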
  27. 27. Software I/O
     Input: databases (MySQL, PostgreSQL, SQL Server), Neo4j, user input
     Input files: CSV, Pajek NET, Guess GDF, GEXF, GraphML, Graphviz DOT, UCInet DL, Netdraw VNA, Tulip TLP, Excel Spreadsheet
     Output files: CSV, Pajek NET, Guess GDF, GEXF, GraphML, Excel Spreadsheet, SVG, PDF, PNG
     Graph streaming
  28. 28. Choosing a File Format. Table of features supported by Gephi.
     Features (columns): Edge List/Matrix Structure, XML Structure, Edge Weight, Attributes, Attribute Default Value, Visualization Attributes, Dynamics, Hierarchical Graphs
     Formats (rows): CSV, DL Ucinet, DOT Graphviz, GDF, GEXF, GML, GraphML, NET Pajek, TLP Tulip, VNA Netdraw, Spreadsheet*
     * spreadsheets can be loaded in the Data Laboratory
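  29. 29. Do you need... Chart of file formats by number of features and by file type (XML, Tabular, Text). From the “Many features” end toward the “Few features” end: GEXF, Spreadsheet, GraphML, Guess GDF, GML, UCINet DL, Netdraw VNA, Graphviz DOT, Pajek NET, CSV, Tulip TLP.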
  30. 30. Using Gephi: DEMO
  31. 31. Team work
     1. Create a team of 2 to 3 people.
     2. Choose a dataset.
     3. Explore it for 1 hour.
     4. Two teams present their preliminary findings.
  32. 32. Dataset #1: GitHub Software Repository. “GitHub is an application used by nearly a million people to store over two million code repositories, making GitHub the largest code host in the world.” Started in 2008, it provides the features of an online social network and a software repository to lower the barriers to collaboration and make code easier to contribute to. https://github.com
  33. 33. Dataset #1: GitHub Software Repository. Data extracted by Franck Cuny* at Linkfluence SAS. 1st release in March 2010 -> this poster. 2nd release in June 2011 -> your data.
     Network of user profiles
     Nodes: people with at least one repository who are followed by at least two other people
     Edges: A follows B
     Network of repositories
     Nodes: repositories
     Edges: A shares a developer with B
     Very few research publications on this OSN!
     * franck.cuny@linkfluence.net
  34. 34. Dataset #1: GitHub Software Repository. Data extracted by a crawl using the GitHub API.
     Seed: 10 well-known contributors in the Perl community
     Networks by country: Japan, France, United States
     Networks by language: Perl, PHP, Python, Ruby
     Node attributes:
     • user country
     • number of followers
     • main programming language
     Edges:
     • directed
     • weight = number of projects A has forked from B
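The edge definition above (directed, weight = number of projects A has forked from B) can be sketched as a small aggregation step. The fork events and user names below are invented for illustration and are not taken from the dataset. The `Source`, `Target` and `Weight` column names are an assumption modeled on a common edge-table convention, so check the import documentation of your tool before relying on them.

```python
import csv
import io
from collections import Counter

# Hypothetical fork events: (user who forked, owner of the original project).
fork_events = [
    ("alice", "bob"),
    ("alice", "bob"),
    ("carol", "bob"),
    ("bob", "alice"),
]


def weighted_edges(events):
    """Collapse repeated (A, B) events into directed edges weighted by their count."""
    counts = Counter(events)
    return sorted((a, b, weight) for (a, b), weight in counts.items())


def edges_to_csv(edges):
    """Serialize the edges as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    writer.writerows(edges)
    return buffer.getvalue()


edges = weighted_edges(fork_events)
csv_text = edges_to_csv(edges)
```

Because the edges are directed, `("alice", "bob")` and `("bob", "alice")` stay separate rows with their own weights.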
  35. 35. Dataset #1: GitHub Software Repository. Your mission (should you decide to accept it): find research hypotheses based on your exploration. Example question: are the Perl communities based on geography?
  36. 36. Dataset #2: The Irish Blogosphere. “Identifying Representative Textual Sources in Blog Networks”. K. Wade, D. Greene, C. Lee, D. Archambault, P. Cunningham (2011) http://mlg.ucd.ie/blogs
     Blogroll Network
     Nodes: blogs with more than two blogroll links
     Edges: blogroll link (in-link)
     Post-link Network
     Nodes: blogs with more than two blogroll links
     Edges: hyperlink inside a post from one blog to another (post-link)
  37. 37. Dataset #2: The Irish Blogosphere. Data extracted by a crawl at distance 2 from the seed for the in-links, and by Google Blog Search for the post-links.
     Seed: 21 popular blogs, winners of the “2010 Irish Blog Awards”
     Node attributes:
     • post count = total number of posts by blog
     • category = from the Irish blog index at www.irishblogdirectory.com, where available
     • infomap_comm = community to which a node belongs (Infomap algorithm)
     • gce_comms = overlapping communities (GCE algorithm)
     • moses_comms = overlapping communities (MOSES algorithm)
     Edges:
     • directed
     • weight = number of hyperlinks in the Post-link network
     Figure: crawl at distance 2 from the seed
  38. 38. Dataset #2: The Irish Blogosphere. Your mission: explore and try to confirm the official results.
  39. 39. Hands-On!
     Start:
     • Load a graph
     • Apply a layout
     • Color the nodes by a qualitative variable in Partition Panel
     • Size the nodes by a quantitative variable in Ranking Panel
     • Start to explore... compute metrics, filter the network
     End:
     • Export maps to PDF in Preview Tab
     • Save
  40. 40. Presentations: GitHub Repository, Irish Blogosphere
  41. 41. Gephi Documentation
     Web Site: http://gephi.org
     Support: http://forum.gephi.org
     Wiki: http://wiki.gephi.org
     Source code: https://launchpad.net/gephi
     Online Tutorials
     • http://gephi.org/users/quick-start/
     • http://gephi.org/users/tutorial-visualization/
     • http://gephi.org/users/tutorial-layouts/
     • http://wiki.gephi.org/index.php/Import_CSV_Data
     • http://wiki.gephi.org/index.php/Import_Dynamic_Data
     Tutorial in Spanish
     • https://code.google.com/p/camon/wiki/Taller_Gephi
     Supported Graph Formats
     • http://gephi.org/users/supported-graph-formats/
  42. 42. Thank You! Caspar David Friedrich - Wanderer Above the Sea of Fog
  43. 43. Credits
     [slide 11] images from Drew Conway, http://www.dataists.com/2010/10/what-data-visualization-should-do-simple-small-truth/
     [slide 22 top left] Benoît Vidal at MFG Labs
     [slide 22 bottom center] Franck Ghitalla at UTC
     [slide 22 right] Studies in MA Digital Fashion at LCF by Peter Jeun Ho Tsang, http://jeunhotsang.com/blog/2010/12/07/prototype/
     [slide 27] sketches from Ben Fry, Computational Information Design
     Special thanks to Franck Ghitalla and Mathieu Jacomy for their insightful discussions.