Analyzing Twitter data
Issues
  Challenges
    and
      Opportunities



RC33 Conference, Sydney, Australia,
9-13 July 2012



Maurice Vergeer
m.vergeer@maw.ru.nl / www.mauricevergeer.nl / blog.mauricevergeer.nl
Radboud University Nijmegen, the Netherlands
   Many platforms
    -   Facebook
    -   Twitter
    -   LinkedIn
    -   Hyves
    -   RenRen
    -   Cyworld
    -   Orkut
    -   YouTube
    -   Flickr
    -   Plurk
    -   Sina Weibo
    -   Etc.

   Empty platform / infrastructure
    -   Facility

   User generated content
    -   Text
    -   Audio
    -   Video
    -   Pictures



Social media
[Figure: Number of articles on politics, Internet and social media, per year, 1995-2012. Y-axis: number of articles (0 to 180). Series: Internet and politics (query 1); Social media and politics (query 2); Internet, social media and politics (query 3).]

Source: Vergeer (in press / 2012) in New Media & Society
Focus on Twitter
The Netherlands



  A special case?
   Opportunities
    ◦ Methodological/technical
       Time series analysis
       Network analysis
        ◦ Actors
        ◦ Content
        ◦ Diffusion of information through online social networks
        ◦ Social media activities

   Limitations
    ◦ Twitter
       Reliability of Twitter API




Outline
•   Within Twitter (using the API)
    • Username
    • Account creation date
    • # of followers
      • And the actual usernames of these followers
    • # of accounts followed
      • And the actual usernames of those being followed
    • Tweet text

    • And many more (see dev.twitter.com)




Data sources
   Tweet
    ◦ Tweet text

    ◦ Whether or not it was a reply to another tweet
       To whom it was a reply (username/screenname and numerical
        userid)

    ◦ Whether or not it was a retweet (according to Twitter)
       Which tweet was retweeted (numerical tweetid)
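
The fields listed on the two slides above can be kept in one fixed record layout per account and per tweet. The following is a minimal sketch, not the structure used in the study: all class and field names are hypothetical, and the example values are invented.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class TwitterUser:
    """One account; field names are hypothetical, not Twitter API names."""
    username: str
    created_at: datetime                                  # account creation date
    followers: List[str] = field(default_factory=list)    # usernames following this account
    following: List[str] = field(default_factory=list)    # usernames this account follows

    @property
    def n_followers(self) -> int:
        return len(self.followers)

    @property
    def n_following(self) -> int:
        return len(self.following)


@dataclass
class Tweet:
    """One tweet with the reply and retweet information listed above."""
    tweet_id: int
    username: str
    text: str
    reply_to_username: Optional[str] = None     # screen name replied to, if any
    reply_to_user_id: Optional[int] = None      # numerical userid replied to, if any
    retweeted_tweet_id: Optional[int] = None    # numerical tweetid retweeted, if any

    @property
    def is_reply(self) -> bool:
        return self.reply_to_username is not None or self.reply_to_user_id is not None

    @property
    def is_retweet(self) -> bool:
        return self.retweeted_tweet_id is not None


# Invented example values, for illustration only.
user = TwitterUser("example_user", datetime(2009, 3, 1),
                   followers=["a", "b"], following=["c"])
reply = Tweet(tweet_id=1, username="example_user", text="@a thanks",
              reply_to_username="a", reply_to_user_id=42)
print(user.n_followers, user.n_following, reply.is_reply, reply.is_retweet)
```

Keeping the layout identical across data collections makes it easier to accumulate data over time and to compare cases.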
   Message of tweet

   Whether or not it was a directed tweet
    (sent to someone in particular)
    ◦ Identified by an @-sign


   Whether or not it was a retweet
    ◦ Identified by RT




Type of content
   Undirected tweet
    ◦ RCMP Commissioner appearing before Public Safety Cmte now.
      What a popular guy - he has his own paparazzi!

   Directed tweet
    ◦ Fantastic blog by my good friend @GlenPearson -
      http://bit.ly/hlAKXp #lpc

   Directed tweet to two usernames
    ◦ @miken32 @CBCEdmonton probably because that is NOT what I
      said--more commercially viable is different than not needed.

   Retweet
    ◦ RT @liberal_party: Think Durham deserves better than Bev Oda?
      Join @BobRaeMP for a rally tomorrow at 1pm http://lpc.ca/durham
      #cdnpoli #lpc




Tweet examples
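
The tweet types shown above can be told apart with two simple text rules: an "RT" marker for retweets and an @-sign for directed tweets. The sketch below is one possible implementation, assuming that a retweet is marked by "RT" (any case) directly followed by an @-address; the function names and the regular expressions are illustrative choices, not part of the Twitter API.

```python
import re

# A username follows "@"; the look-behind skips e-mail addresses such as a@b.nl.
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]+)")
# "RT" as a separate token, followed by an @-address, in upper or lower case.
RETWEET = re.compile(r"(?:^|\s)RT\s*@", re.IGNORECASE)


def mentions(text):
    """Return all usernames addressed in the tweet, in order of appearance."""
    return MENTION.findall(text)


def classify_tweet(text):
    """Classify a tweet as 'retweet', 'directed' or 'undirected'."""
    if RETWEET.search(text):
        return "retweet"
    if mentions(text):
        return "directed"
    return "undirected"


examples = [
    "RCMP Commissioner appearing before Public Safety Cmte now. "
    "What a popular guy - he has his own paparazzi!",
    "Fantastic blog by my good friend @GlenPearson - http://bit.ly/hlAKXp #lpc",
    "@miken32 @CBCEdmonton probably because that is NOT what I "
    "said--more commercially viable is different than not needed.",
    "RT @liberal_party: Think Durham deserves better than Bev Oda? "
    "Join @BobRaeMP for a rally tomorrow at 1pm http://lpc.ca/durham "
    "#cdnpoli #lpc",
]

for text in examples:
    print(classify_tweet(text), mentions(text))
```

The retweet rule is checked first because a retweet also contains an @-address and would otherwise be counted as a directed tweet.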
   Traditional material
    ◦ Produced by professional actors
    ◦ Newspapers
    ◦ Public administration documents

   Social media
    ◦ Produced by
       professional actors
       general public




Content analysis of tweets
   Large quantities of data

   Word frequencies
    ◦ Identifying the most important words in the corpus
    ◦ Code these words into more general categories

   Switch to SPSS (or other type of data management tool)
    ◦ Search for the words in the actual tweets
    ◦ Assign tweet to a specific code

   Improvements in SPSS
    ◦ Compute command facilitates many new text operators
    ◦ Char.index, Char.substr, etc

   Alternative
    ◦ Regular expressions
    ◦ complex




Data extraction
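
The two steps above (word frequencies first, then assigning tweets to codes) can also be done outside SPSS. The sketch below uses Python and a regular expression as a stand-in; the codebook and its categories are hypothetical and serve only to show the mechanics.

```python
import re
from collections import Counter

# A token is a run of word characters, optionally led by "#" or "@".
TOKEN = re.compile(r"[#@]?\w+")


def tokenize(text):
    """Split a tweet into lower-case tokens, keeping hashtags and @-addresses."""
    return [tok.lower() for tok in TOKEN.findall(text)]


def word_frequencies(tweets):
    """Step 1: count how often each token occurs in the corpus."""
    counts = Counter()
    for text in tweets:
        counts.update(tokenize(text))
    return counts


# Step 2: hypothetical codebook mapping general categories to keywords.
CODEBOOK = {
    "politics": ["#lpc", "#cdnpoli"],
    "event": ["rally"],
}


def code_tweet(text, codebook=CODEBOOK):
    """Step 3: return the sorted list of categories whose keywords occur in the tweet."""
    tokens = set(tokenize(text))
    return sorted(cat for cat, words in codebook.items() if tokens & set(words))


tweets = [
    "Fantastic blog by my good friend @GlenPearson - http://bit.ly/hlAKXp #lpc",
    "RT @liberal_party: Think Durham deserves better than Bev Oda? "
    "Join @BobRaeMP for a rally tomorrow at 1pm http://lpc.ca/durham "
    "#cdnpoli #lpc",
]

freq = word_frequencies(tweets)
print(freq.most_common(5))
for text in tweets:
    print(code_tweet(text))
```

In practice the frequency list is inspected by hand before the codebook is written, so the keywords reflect the corpus rather than prior expectations.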
   Publicly available data sources on
    parliament, election council

   Time series
    ◦ Identifying relevant societal/political events
      relevant for the study at hand
      Ex.1 Temporary shutdown of the election campaign
       due to the passenger plane crash of a Dutch airliner in
       Libya, May 2010
      Ex.2 Deregistration of the People's Political Power
       Party of Canada




External data sources
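
One way to bring an external event into a time series is an event dummy: a variable that is 1 on the days of the event and 0 otherwise. The sketch below shows the idea only. The daily counts are made up, and the event window is a hypothetical placeholder; real dates have to come from the external source.

```python
from datetime import date, timedelta


def event_dummy(days, start, end):
    """Return 1 for each day inside [start, end] (inclusive), else 0."""
    return [1 if start <= day <= end else 0 for day in days]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Made-up daily tweet counts, for illustration only.
days = [date(2010, 5, 10) + timedelta(days=i) for i in range(6)]
tweets_per_day = [120, 135, 60, 40, 110, 150]

# Hypothetical event window; not taken from the presentation.
dummy = event_dummy(days, date(2010, 5, 12), date(2010, 5, 13))

during = mean([n for n, d in zip(tweets_per_day, dummy) if d == 1])
outside = mean([n for n, d in zip(tweets_per_day, dummy) if d == 0])
print(dummy, during, outside)
```

The same dummy can be entered as a predictor in a regression or time series model instead of comparing means.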
[Figure: bar chart by media type (newspaper, broadcasting, radio, news agency, magazine, online only, local), comparing institutional Twitter accounts and personal Twitter accounts. Value axis from 0 to 900, unlabeled.]

Source: Vergeer & Hermans (forthcoming / 2013)
in Journal of Computer-Mediated Communication
[Figure: daily time series per political party, 1 May 2010 to 9 June 2010. Value axis from 0 to 1000, unlabeled. Parties: CDA, PvdA, SP, VVD, PVV, GL, CU, D66, PvdD, SGP, NN, TON, MenS, HNL, Partij1, Piraten.]

   Date and time

   For longitudinal analysis and cross-national comparisons
    ◦ take note of the time differences and correct if necessary.
        Time zones
        Daylight saving time

   What to do with countries having multiple time zones?
    ◦ Depends on RQs
       Communication patterns: keep a single time zone
       Focus on individual daily patterns: adjust for time zones
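
Correcting for time zones and daylight saving time can be sketched as follows, assuming the tweet time stamps are stored in UTC and Python 3.9 or later is available (the `zoneinfo` module needs the system time zone database). The helper name `to_local` and the example time stamps are illustrative.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def to_local(utc_dt, tz_name):
    """Convert a UTC time stamp to local time in the named time zone."""
    if utc_dt.tzinfo is None:
        # Assumption: naive time stamps are in UTC.
        utc_dt = utc_dt.replace(tzinfo=timezone.utc)
    return utc_dt.astimezone(ZoneInfo(tz_name))


summer = datetime(2010, 6, 9, 19, 30, tzinfo=timezone.utc)
winter = datetime(2010, 1, 15, 19, 30, tzinfo=timezone.utc)

# Same UTC clock time, different local hour because of daylight saving time.
print(to_local(summer, "Europe/Amsterdam"))   # UTC+2 in summer
print(to_local(winter, "Europe/Amsterdam"))   # UTC+1 in winter

# One country, several time zones: individual daily patterns shift by hours.
print(to_local(summer, "America/Toronto"))
print(to_local(summer, "America/Vancouver"))
```

For communication patterns across users, keeping everything in UTC (or one reference zone) preserves the true order of messages; the conversion is only needed when the local time of day matters.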
   Total tweets by candidates, followers and followed:
    ◦ 4,536,854 tweets

   Breakdown
    ◦ Tweets among candidates: approx. 2%
    ◦ Tweets to inner circles (followers or being followed): approx. 18%
    ◦ Tweets to outer circle: approx. 33%
    ◦ Tweets not directed to anyone in particular: approx. 49%

    ◦ Extracting users from tweets (@-addresses)




Communication network analysis
 Communication network based on
  candidates identified in tweets
 Excluding the general public




Communication network analysis
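
Extracting @-addresses turns tweets into a directed edge list (sender to addressee), which can then be restricted to candidates only. The sketch below is one way to do this with text extraction; the sender names and the candidate list are hypothetical, since the slides do not state who sent the example tweets.

```python
import re
from collections import Counter

# A username follows "@"; the look-behind skips e-mail addresses.
MENTION = re.compile(r"(?<![A-Za-z0-9_])@([A-Za-z0-9_]+)")


def mention_edges(tweets, candidates=None):
    """Count directed edges (sender, addressee) from (sender, text) pairs.

    If `candidates` is given, only edges between two candidates are kept,
    which excludes the general public from the network.
    """
    keep = None if candidates is None else {name.lower() for name in candidates}
    edges = Counter()
    for sender, text in tweets:
        source = sender.lower()
        for addressee in MENTION.findall(text):   # finds every @-address, not only the first
            target = addressee.lower()
            if source == target:
                continue                          # ignore self-references
            if keep is not None and (source not in keep or target not in keep):
                continue
            edges[(source, target)] += 1
    return edges


# Hypothetical senders paired with tweet texts.
tweets = [
    ("sender_a", "@miken32 @CBCEdmonton probably because that is NOT what I said"),
    ("sender_b", "RT @liberal_party: Join @BobRaeMP for a rally tomorrow at 1pm"),
]

print(mention_edges(tweets))
print(mention_edges(tweets, candidates=["sender_b", "BobRaeMP"]))
```

Usernames are lower-cased because Twitter screen names are not case-sensitive; the resulting counts can be used as edge weights in network analysis software.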
   See http://tinyurl.com/blzajsl for
    an animated version.
   Retrospective
    ◦ 3200 tweets back in time

   Cost / technical
    ◦ Access to the firehose for real-time data




Limitations in data collection
   Date of tweet
    ◦ A very small fraction of tweets is time-stamped with the wrong date
   Solution
    ◦ Estimate date and time using the tweetid

   Status of tweet as retweet
    ◦ RT
   Solution:
       Use text search operators to identify real retweets (“RT ”, “rt ”)
        Also see http://tinyurl.com/bohhjzn

   Reply to tweets
    ◦ Only the first address is identified
   Solution
    ◦ Search for multiple @-addresses using text extraction methods



Reliability of data as provided by
the API
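
For the first solution above, one known route is decoding the time stamp that is embedded in the tweetid. This sketch applies only to Snowflake IDs, which Twitter introduced in late 2010: the bits above the lowest 22 hold the milliseconds since the Twitter epoch. It is not necessarily the method used in this study, because tweets from May and June 2010 predate Snowflake; for such older, sequential IDs the date would have to be estimated differently, for example by interpolating between tweets with trusted time stamps (not shown here). Function names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Twitter Snowflake epoch: 2010-11-04 01:42:54.657 UTC, in milliseconds.
TWITTER_EPOCH_MS = 1288834974657
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def snowflake_to_datetime(tweet_id):
    """Decode the creation time (UTC) embedded in a Snowflake tweetid."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=ms)


def datetime_to_min_snowflake(dt):
    """Smallest Snowflake id that could have been created at `dt` (UTC-aware)."""
    ms = (dt - UNIX_EPOCH) // timedelta(milliseconds=1)
    return (ms - TWITTER_EPOCH_MS) << 22


# Round trip with a constructed id, so no real tweetid is assumed.
moment = datetime(2012, 7, 9, 12, 0, tzinfo=timezone.utc)
constructed_id = datetime_to_min_snowflake(moment)
print(constructed_id, snowflake_to_datetime(constructed_id))
```

Comparing the decoded time with the time stamp delivered by the API flags tweets whose date looks wrong.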
BIG DATA

The buzzword of the day
 Not gigabytes, not terabytes,
 But petabytes and exabytes of data
 Only for the few
 Specific hardware requirements
    ◦ Computing power
    ◦ Data storage
   The data presented in this presentation
    ◦ Approx. 4.5 million records equal approx. 1
      gigabyte: not that big
There is still so much to be done
with…
•   Focus on specific cases
     - political communication
         politicians – candidates in elections
     - fan studies
         celebrities
         cast of popular soap operas
     - journalism studies
         journalists and newspapers





Focus on specific cases
 actor information
 information on societal events
 accumulate data over time using the
  same data structure
    ◦ Prolonged analysis
    ◦ Multiple case studies, cross-national
      comparative analysis




Enrich existing Twitter data with
external data
   Traditional process (textbook approach)
    ◦ RQ -> research design

   Practice, particularly with secondary (i.e. third party) data
    ◦ Data -> RQ -> research design
    ◦ Data -> research design -> RQ

Twitter
    Content analysis
    Longitudinal analysis
    Network analysis

   Different research designs require different techniques
   Collaborate



Look at the data from different
angles, i.e. research designs
Thank you for your attention
