
Social Media Data Collection & Network Analysis with Netlytic and R


The workshop presented at HKBU
http://www.comm.hkbu.edu.hk/acr/eng/brochure.pdf



  1. Social Media Data Collection & Network Analysis with Netlytic and R. Anatoliy Gruzd (gruzd@ryerson.ca, Twitter: @gruzd), Canada Research Chair in Social Media Data Stewardship; Associate Professor, Ted Rogers School of Management; Director, Social Media Lab, Ryerson University. HKBU, Hong Kong, Dec 3, 2015.
  2. Research at the Social Media Lab
  3. Presentation Slides: http://bit.ly/hk15slides
  4. Social media sites have become an integral part of our daily lives! Growth of Social Media Data: Facebook 1.5B users; Instagram 400M users; Twitter 300M users.
  5. How to Make Sense of Social Media Data? Data sources: self-collected/reported, public APIs, data resellers. Decision making in domains such as politics, health care and education.
  6. How to Make Sense of Social Media Data? Big Data Technology. Credit: Nathan Lapierre
  7. Social Media Analytics Tools: http://socialmedialab.ca/apps/social-media-toolkit/
  8. How to Make Sense of Social Media Data? Data -> Visualizations -> Understanding
  9. How to Make Sense of Social Media Data? Example: Geo-based Analysis
  10. How to Make Sense of Social Media Data? Example: Geo-based Analysis. Geography of Twitter Networks
  11. How to Make Sense of Social Media Data? Example: Geo-based + Content Analysis. Tracking Hate Speech on Twitter. Source: http://www.fenuxe.com/tag/geo-coded
  12. How to Make Sense of Social Media Data? Social Network Analysis (SNA): • Nodes = People • Edges/Ties (lines) = Relations: “Who retweeted/replied/mentioned whom”
  13. Advantages of Social Network Analysis: it makes it much easier to understand what is going on in a group. Once the network is discovered, we can find out: • how people interact with each other, • who the most/least active members are, • who is influential in a group, • who is susceptible to being influenced, etc. Figure legend: Liberal, Conservative, Spam, Unknown & Undecided, NDP, Left, Green, Bloc, Other. Source: Gruzd, A. and Roy, J. (2014). Political Polarization on Social Media: Do Birds of a Feather Flock Together on Twitter? Policy & Internet.
  14. How Do We Collect Information About Online Social Networks? Common approach for collecting social network data: surveys or interviews. Problems with surveys or interviews: • time-consuming • questions can be too sensitive • answers are subjective or incomplete • participants can forget people and interactions • different people perceive events and relationships differently. Self-reported social network data may not be available/accurate.
  15. Studying Online Social Networks: forum networks; blog networks; friends’ networks (Facebook, Twitter, Google+, etc.); networks of like-minded people (YouTube, Flickr, etc.). (http://www.visualcomplexity.com/vc)
  16. How Do We Collect Information About Online Social Networks? Goal: automated network discovery. Challenge: figuring out which content-based features of online interactions can help to uncover nodes and ties between group members.
  17. Automated Discovery of Social Networks: Emails (example: Nick, Rick, Dick). • Nodes = People • Ties = “Who talks to whom” • Tie strength = The number of messages exchanged between individuals
  18. Automated Discovery of Social Networks: “Many to Many” Communication: Chat, Mailing listserv, Forum, Comments.
  19. Automated Discovery of Social Networks: Twitter Networks (example: @John, @Peter, @Paul). • Nodes = People • Ties = “Who retweeted/replied/mentioned whom” • Tie strength = The number of retweets, replies or mentions
  20. Automated Discovery of Social Networks: Twitter Data Examples. Network ties: @Cheeflo -> @JoeProf, @Cheeflo -> @VMosco, @JoeProf -> @VMosco. Network tie: @Gruzd -> @SidneyEve. Connection types shown: Mention, Reply.
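The tie-discovery idea on the slides above (nodes are people, a tie is "who mentioned/replied to whom", tie strength is the count) can be sketched in a few lines of Python. This is a minimal illustration, not Netlytic's actual procedure: the message texts are invented, `extract_ties` is a hypothetical helper, and every `@name` in a message is treated as a tie without distinguishing retweets, replies and mentions.

```python
import re
from collections import Counter

# Assumption: a Twitter handle is "@" followed by word characters.
MENTION = re.compile(r"@(\w+)")

def extract_ties(tweets):
    """Count directed ties (author -> mentioned user) across (author, text) pairs."""
    ties = Counter()
    for author, text in tweets:
        for target in MENTION.findall(text):
            if target.lower() != author.lower():  # ignore self-mentions
                ties[(author, target)] += 1
    return ties

# Hypothetical messages, invented to reproduce the ties listed on the slide.
tweets = [
    ("Cheeflo", "Great talk by @JoeProf and @VMosco today"),
    ("JoeProf", "@VMosco thanks for the pointer"),
    ("Gruzd", "@SidneyEve see you at the workshop"),
]

ties = extract_ties(tweets)
for (source, target), strength in sorted(ties.items()):
    print(f"@{source} -> @{target} (strength {strength})")
```

Repeated mentions between the same pair simply raise the count, which is what the slides call tie strength.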
  21. Sample Twitter Searches: #ELECTION2016 and #HONGKONG. 3557 records (Dec 3, 2015); 1394 records (Oct 29, 2015).
  22. Sample Twitter Searches: #ELECTION2016 and #HONGKONG. 3557 records (Dec 3, 2015); 1394 records (Oct 29, 2015).
  23. Sample Twitter Searches: #ELECTION2016 and #HONGKONG. 3557 records (Dec 3, 2015); 1394 records (Oct 29, 2015). What do these visualizations tell us?
  24. SNA Measures. Micro-level: in-degree centrality; out-degree centrality; betweenness centrality; other centrality measures (e.g., closeness, eigenvector). Macro-level: density; diameter; reciprocity; centralization; modularity.
  25. SNA Measures (micro-level): in-degree centrality. In-degree suggests “prestige”, highlighting the most mentioned or replied-to Twitter users.
  26. In-degree centrality: #HongKong Twitter network. Note: SEVENTEEN (SVT) is a South Korean boy group formed by Pledis Entertainment.
  27. SNA Measures (micro-level): out-degree centrality. Out-degree reveals active Twitter users with a good awareness of others in the network.
  28. Out-degree centrality: #HongKong Twitter network. Note: a music fan (many retweets and replies to others).
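In-degree and out-degree can be read straight off a list of directed ties. The sketch below uses an invented edge list (the account names are hypothetical, loosely modelled on the fan/band pattern noted on the slides) and counts each distinct tie once, ignoring tie strength.

```python
from collections import Counter

# Hypothetical directed ties: (source, target) means source mentioned or replied to target.
edges = [
    ("fan1", "band"), ("fan2", "band"), ("fan3", "band"),
    ("fan1", "fan2"), ("fan1", "fan3"), ("band", "fan1"),
]

# In-degree: how many distinct users point at a node ("prestige").
in_degree = Counter(target for _, target in edges)
# Out-degree: how many distinct users a node points at (activity/awareness).
out_degree = Counter(source for source, _ in edges)

most_mentioned = in_degree.most_common(1)[0]
most_active = out_degree.most_common(1)[0]
print("Highest in-degree: ", most_mentioned)
print("Highest out-degree:", most_active)
```

Here the account that everyone talks to tops the in-degree ranking, while the account that talks to everyone tops the out-degree ranking.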
  29. SNA Measures (micro-level): betweenness centrality. Betweenness shows actors who are located on the largest number of information paths and who often connect different groups of users in the network.
  30. Betweenness centrality: #HongKong Twitter network. Note: a fan (retweets/replies to messages from two different fan communities/sites).
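To make "located on the largest number of information paths" concrete, here is a small pure-Python sketch of shortest-path betweenness, using Brandes' algorithm on an unweighted graph. The assumptions are mine: ties are treated as undirected, the toy network (two tight groups joined by one account named `bridge`) is invented, and the slides do not say how Netlytic itself computes or normalizes the measure.

```python
from collections import deque

def undirected(edges):
    """Build an adjacency map (node -> set of neighbours) from undirected edges."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj

def betweenness(adj):
    """Unnormalized shortest-path betweenness for an undirected, unweighted graph."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies, farthest nodes first.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair was counted from both endpoints, so halve the totals.
    return {v: total / 2 for v, total in score.items()}

# Hypothetical network: two fan groups connected only through "bridge".
edges = [
    ("A", "B"), ("A", "C"), ("B", "C"),
    ("C", "bridge"), ("bridge", "D"),
    ("D", "E"), ("D", "F"), ("E", "F"),
]
scores = betweenness(undirected(edges))
top = max(scores, key=scores.get)
print(top, scores[top])
```

The account linking the two groups gets the highest score even though it has only two ties, which is the pattern the slide's note describes.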
  31. Sample Twitter Searches: #ELECTION2016 and #HONGKONG. 3557 records (Dec 3, 2015); 1394 records (Oct 29, 2015).
  32. SNA Measures (macro-level): density. Density indicates the overall connectivity in the network (the total number of connections divided by the total number of possible connections). It is equal to 1 when everyone is connected to everyone. Example: User1, User2 and User3 all connected to one another, density = 1.
  33. Network measures for the two sample searches (remaining rows, Diameter through Modularity, are blank on this slide):
      Measure          #Election2016    #HongKong
      Nodes            491              2570
      Edges            1075             2447
      Density          0.005 (0.5%)     0.0004 (0.04%)
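As a rough check on the density figures above, the definition (connections divided by possible connections) can be applied to the node and edge counts from the table. The sketch assumes directed ties without self-loops, so the number of possible connections is n(n-1); the slides do not state which convention or rounding Netlytic uses, so only the order of magnitude should be expected to agree.

```python
def directed_density(nodes, edges):
    """Share of possible directed ties (no self-loops) that are actually present."""
    return edges / (nodes * (nodes - 1))

# Node and edge counts taken from the table on the slide.
election = directed_density(491, 1075)
hongkong = directed_density(2570, 2447)
# Three users all connected in both directions: 6 of 6 possible ties.
triad = directed_density(3, 6)

print(f"#Election2016: {election:.5f}")
print(f"#HongKong:     {hongkong:.5f}")
print(f"Fully connected triad: {triad}")
```

Under this assumption both results land close to the reported 0.005 and 0.0004, and the fully connected three-user example gives exactly 1.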
  34. SNA Measures (macro-level): diameter. Diameter gives a general idea of how “wide” the network is: it is the longest of the shortest paths between any two nodes in the network. Example: four users (User1 to User4) joined by three hops (#1, #2, #3), diameter = 3.
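"The longest of the shortest paths" can be computed with a breadth-first search from every node. The sketch below assumes an undirected, connected network; the particular chain ordering of the four users is my assumption, chosen only so that the result matches the diameter of 3 in the slide's example.

```python
from collections import deque

def shortest_path_lengths(adj, source):
    """Hop counts from source to every reachable node (breadth-first search)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist

def diameter(adj):
    """Longest shortest path over all pairs of mutually reachable nodes."""
    return max(max(shortest_path_lengths(adj, s).values()) for s in adj)

# Assumed layout: four users linked in a simple chain.
chain = {
    "User1": {"User2"},
    "User2": {"User1", "User3"},
    "User3": {"User2", "User4"},
    "User4": {"User3"},
}
print(diameter(chain))  # 3
```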
  35. Network measures for the two sample searches (remaining rows, Reciprocity through Modularity, are blank on this slide):
      Measure          #Election2016    #HongKong
      Nodes            491              2570
      Edges            1075             2447
      Density          0.005 (0.5%)     0.0004 (0.04%)
      Diameter         28               14
  36. SNA Measures (macro-level): reciprocity. Reciprocity shows how many online participants are having two-way conversations. When everyone replies to everyone, the reciprocity value is 1. Example: User1 to User4, reciprocity = 1.
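One common way to quantify two-way conversation is the share of directed ties that are returned. The sketch below uses that edge-based definition as an assumption; the slides do not say which variant Netlytic reports, and the user names are illustrative.

```python
def reciprocity(edges):
    """Fraction of distinct directed ties (a -> b) whose reverse (b -> a) also exists."""
    edge_set = {(a, b) for a, b in edges if a != b}  # drop self-loops and duplicates
    if not edge_set:
        return 0.0
    mutual = sum(1 for a, b in edge_set if (b, a) in edge_set)
    return mutual / len(edge_set)

users = ["User1", "User2", "User3", "User4"]

# Everyone replies to everyone: every tie is returned.
everyone = [(a, b) for a in users for b in users if a != b]

# Only User1 and User2 talk both ways; the others just address User1.
mostly_one_way = [
    ("User1", "User2"), ("User2", "User1"),
    ("User3", "User1"), ("User4", "User1"),
]

print(reciprocity(everyone))
print(reciprocity(mostly_one_way))
```

Values as low as those in the sample networks (well under 1%) indicate that almost no mentions or replies were returned.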
  37. Network measures for the two sample searches (remaining rows, Centralization and Modularity, are blank on this slide):
      Measure          #Election2016    #HongKong
      Nodes            491              2570
      Edges            1075             2447
      Density          0.005 (0.5%)     0.0004 (0.04%)
      Diameter         28               14
      Reciprocity      0.006 (0.6%)     0.003 (0.3%)
  38. SNA Measures (macro-level): centralization. Centralization indicates whether a network is dominated by a few central participants (values closer to 1), or whether more people are contributing to discussion and information dissemination (values closer to 0). Example: User1 to User4, centralization = 1.
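A standard way to put a number on "dominated by a few central participants" is Freeman's degree centralization, sketched below for an undirected network. Treating this as the measure behind the slide's values is an assumption; the star and ring layouts are illustrative, with the star being the textbook case that scores exactly 1.

```python
def degree_centralization(adj):
    """Freeman degree centralization for an undirected graph (node -> set of neighbours).

    Sum of (max degree - degree) over all nodes, divided by the largest value
    that sum can take for n nodes, which is (n - 1) * (n - 2) for a star.
    """
    n = len(adj)
    if n < 3:
        raise ValueError("centralization needs at least three nodes")
    degrees = [len(neighbours) for neighbours in adj.values()]
    top = max(degrees)
    return sum(top - d for d in degrees) / ((n - 1) * (n - 2))

# Star: User1 is connected to everyone, nobody else is connected to each other.
star = {
    "User1": {"User2", "User3", "User4"},
    "User2": {"User1"},
    "User3": {"User1"},
    "User4": {"User1"},
}

# Ring: everyone has the same number of ties, so nobody dominates.
ring = {
    "User1": {"User2", "User4"},
    "User2": {"User1", "User3"},
    "User3": {"User2", "User4"},
    "User4": {"User3", "User1"},
}

print(degree_centralization(star))
print(degree_centralization(ring))
```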
  39. Network measures for the two sample searches (the Modularity row is blank on this slide):
      Measure          #Election2016    #HongKong
      Nodes            491              2570
      Edges            1075             2447
      Density          0.005 (0.5%)     0.0004 (0.04%)
      Diameter         28               14
      Reciprocity      0.006 (0.6%)     0.003 (0.3%)
      Centralization   0.05             0.11
  40. SNA Measures (macro-level): modularity. Modularity provides an estimate of whether a network consists of one coherent group of participants who are engaged in the same conversation and who are paying attention to each other (values closer to 0), or whether a network consists of different conversations and communities with a weak overlap (values closer to 1).
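The behaviour described above (near 0 for one coherent group, near 1 for many weakly overlapping communities) can be illustrated with Newman's modularity score for a given grouping of nodes. In this sketch the grouping is supplied by hand and ties are undirected and unweighted; real tools first detect communities with an algorithm, and the slides do not state which one Netlytic uses. All node names are invented.

```python
def modularity(edges, community):
    """Newman modularity Q for undirected edges and a node -> community mapping.

    Q = sum over communities of (internal edges / m) - (total degree / 2m)^2,
    where m is the number of edges.
    """
    m = len(edges)
    internal = {}    # edges with both ends in the same community
    degree_sum = {}  # total degree of the nodes in each community
    for a, b in edges:
        ca, cb = community[a], community[b]
        degree_sum[ca] = degree_sum.get(ca, 0) + 1
        degree_sum[cb] = degree_sum.get(cb, 0) + 1
        if ca == cb:
            internal[ca] = internal.get(ca, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )

def triangle(prefix):
    """Three fully connected nodes named prefix0, prefix1, prefix2."""
    return [(f"{prefix}0", f"{prefix}1"), (f"{prefix}1", f"{prefix}2"), (f"{prefix}0", f"{prefix}2")]

# Case 1: everyone placed in a single community.
pair = triangle("a") + triangle("b") + [("a0", "b0")]
one_group = modularity(pair, {node: "all" for edge in pair for node in edge})

# Case 2: ten separate triangles, each treated as its own community.
islands, membership = [], {}
for i in range(10):
    for a, b in triangle(f"g{i}_"):
        islands.append((a, b))
        membership[a] = membership[b] = i
many_groups = modularity(islands, membership)

print(round(one_group, 3), round(many_groups, 3))
```

With a single all-inclusive group the score is 0, and it climbs toward 1 as the network splits into more, fully separate clusters.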
  41. Network measures for the two sample searches:
      Measure          #Election2016    #HongKong
      Nodes            491              2570
      Edges            1075             2447
      Density          0.005 (0.5%)     0.0004 (0.04%)
      Diameter         28               14
      Reciprocity      0.006 (0.6%)     0.003 (0.3%)
      Centralization   0.05             0.11
      Modularity       0.42             0.92
  42. Practice with Netlytic + R. Twitter hashtag: #HongKong. Instructions at http://bit.ly/hknet15
