SlideShare une entreprise Scribd logo
1  sur  43
Researching Social Media – Big 
Data and Social Media Analysis 
Farida Vis 
Information School, University of Sheffield 
@flygirltwo | @VisSocMedLab 
Social Media for Researchers: A Sheffield Universities Social Media 
Symposium, 23 September 2014
• Academic – working on social media and Big Data (methods) 
• Have been studying social media for nearly a decade 
• Sometime data journalist for The Guardian 
• Co-author of the Data Journalism Handbook 
• Led social media analysis of ‘Reading the Riots on Twitter’ (2.5M) 
• Fellow recipient of inaugural Data Journalism Award (for Twitter 
rumour visualization as part of Reading the Riots) 
• Now: verification – spread on (mis)information online 
• Key concern for the World Economic Forum (top ten trend for 2014) 
• World Economic Forum Global Agenda Council for Social Media 
• Border runner – talk/work across academia | government | industry 
• Advising various research councils on funding social media research 
• New projects: Picturing the Social and Visual Social Media Lab 
• MA module (option): Researching Social Media/ Research Methods 
for Social Media (40 students 2012/2013; 65 students 2013/2014)
"Picturing the Social: transforming our 
understanding of images in social media and 
Big Data research.” 
ESRC Transformative Research grant
Large interdisciplinary team 
Farida Vis (PI) – Media and Communication 
Simon Faulkner – Art History/Visual Culture 
James Aulich - Art History/Visual Culture 
Olga Gorgiunova – Software Studies/Sociology 
Mike Thelwall – Information Science/software 
Francesco D’Orazio – Industry/Media/software 
+ Anne Burns (RA) – Digital Ethnography
“Qualitative data on a quantitative scale” 
(D’Orazio, 2013)
Users share 750 million images daily 
Building new theory and method 
Structures 
Users 
Content
Structures 
Users 
Content
@VisSocMedLab = building community + engaging in conversation
FREE ONE DAY CONFERENCE 
Picturing the Social: 
Analysing Social Media Images 
Part of ESRC Festival of Social Science 
7 November: ICOSS, University of Sheffield 
Registration opens 1 October 
visualsocialmedialab.org
SOCIAL MEDIA = BIG DATA
REAL-TIME ANALYTICS
Industry definition 
“Big data” is high-volume, -velocity and –variety information 
assets that demand cost-effective, innovative forms of 
information processing for enhanced insight and decision 
making’ (Gartner in Sicular, 2013). 
Huge industry now built around ‘social data’ and ‘listening 
platforms’ feeding on this data (Many tools not suitable for 
academic use, black box).
Academic definition 
• Technology: maximizing computation power and algorithmic 
accuracy to gather, analyze, link, and compare large data 
sets. 
• Analysis: drawing on large data sets to identify patterns in 
order to make economic, social, technical, and legal claims. 
• Mythology: the widespread belief that large data sets offer a 
higher form of intelligence and knowledge that can generate 
insights that were previously impossible, with the aura of truth, 
objectivity, and accuracy. 
(boyd and Crawford p. 663).
Critiques of Big Data 
• Important to make visible inherent claims about objectivity 
• Problematic focus on quantitative methods 
• How can data answer questions it was not designed to 
answer? 
• How can the right questions be asked? 
• Inherent biases in large linked error prone datasets 
• Focus on text and numbers that can be mined algorithmically 
• Data fundamentalism (correlation always indicates causation)
How do we ground online data? 
In the offline: assessing findings against what we know about 
an offline population (census data) in order to better understand 
online data. Problems with over/under representation in online 
data? 
In the online: premised on the idea that data derived from the 
Internet should be grounded in other online data in order to 
understand it. So comparing Facebook use to what we know 
about Facebook use, rather than connecting it to offline 
measurements about citizens. 
Pioneered by Richard Rogers: online is the baseline.
Important considerations for online research 
1. Asking the right question – research should be question 
driven rather than data or tool driven. 
2. Accept poor data quality & users gaming metrics – once 
online metrics have value users will try to game them. 
3. Limitations of tools (often built in disconnected way) 
4. Transparency – researchers should be upfront about 
limitations of research and research design. Can the data 
answer the questions?
Organic data / data in the wild 
SOCIAL MEDIA SIMPLY AS (BIG) DATA 
VS 
SOCIAL MEDIA AS A RESEARCH AREA
DOMAIN EXPERTISE
TO UNDERSTAND 
TWITTER DATA YOU 
NEED TO UNDERSTAND 
TWITTER
+ BE ON TWITTER* 
* Or relevant platform you’re interested in!
Finding cutting edge social media research
UGC is ‘false’ until verified
Social media VS social data 
• Social Media: User-generated content where one user 
communicates and expresses themselves and that content is 
delivered to other users. Examples of this are platforms such 
as Twitter, Facebook, YouTube, Tumblr and Disqus. Social 
media is delivered in a great user experience, and is focused 
on sharing and content discovery. Social media also offers 
both public and private experiences with the ability to share 
messages privately. 
• Social Data: Expresses social media in a computer-readable 
format (e.g. JSON) and shares metadata about the content to 
help provide not only content, but context. Metadata often 
includes information about location, engagement and links 
shared. Unlike social media, social data is focused strictly on 
publicly shared experiences. (Cairns, 2013)
Social media 
Social data
Enriched metadata 
Location and influence 
Where are users? 
Are they influential?
Twitter expanding/enriching metadata 
Hawskey (2013)
New Profile Geo Enrichment
Geo-locating tweets 
Exact location 
Lat/long coordinates 
Gold standard geo data 
Problem: only 1% of users 
-> Only 2% of firehose tweets 
Early adopters, highly skewed 
Where in the world are you? 
No Lat/long coordinates 
Text field – enter anything 
Advantage: more than half of all 
tweets contain profile location 
Much more evenly distributed
Profile geo-enrichment 
‘our customers can now hear from the whole world of Twitter 
users and not just 1%’ (Cairs, 2013 on Gnip company Blog) 
• Activity Location – 1% that provide lat/long 
• Profile Location – Place provided in their profile. May or may 
not be posting from there. 
• Mentioned Location – Places a user talks about
Profile geo-enrichment + linking data 
‘Profile location data can be used to unlock demographic data 
and other information that is not otherwise possible with activity 
location. For instance, US Census Bureau statistics are 
aggregated at the locality level and can provide basic stats like 
household income. Profile location is also a strong indicator of 
activity location when one isn’t provided. (Cairns, 2013)
Dangers of mapping/tracking/predicting 
Did it really work? 
Beware of the fall of Google flu trends! 
Results are rarely independently verified 
or triangulated (can’t share social media data).
Data collection problems 
- Searching for keywords/tags: problematic. What are the 
search terms you need/know about? Significant disadvantage 
in searching for a limited number of hashtags: (1) not all 
tweets pertaining to the relevant topic will contain a hashtag; 
(2) how to identify all the relevant keywords/hashtags? 
- Identifying users: problematic. How do you identify them? 
Through information in their bio or additional registration 
information accessible through the API (YouTube)? Self-declared 
information. 
- Searching for users/topics/keywords through networks 
(Facebook, YouTube, Twitter). What does the network show? 
What kind of connections?
Social Network Analysis 
Not new, applied to social media. Many sites structure information 
through social graphs and connections between users/tags/keywords. 
Could identify new information, important view on the data. 
Limits: (1) What is the nature of the connection? What does a ‘friend’ 
connection mean? Could be friend/relative/ acquaintance/fan/ 
stranger/fake -> requires critical reflection, qualitative inspections. 
(2) Difficult to account for changes over time. The network 
visualisation only gives a snapshot in time. Not clear what that shows. 
(3) How is the network graph constructed by the social media 
company? How is the Facebook graph constructed? How are nodes 
connected? Which ‘friends’ are shown? Which are left out? 
(4) How does the visualisation tool (Gephi) show the network?
Sentiment analysis 
Developed for short texts (tweets, comments on YouTube, short 
Facebook updates, comments) to establish positive and negative 
sentiment. Great interest in sentiment analysis from both industry, 
policy makers and academia (‘mapping the mood of the nation’ type 
projects; identifying negative sentiment as an early warning sign). A 
range of tools have been developed in this area, which allow for the 
rapid processing of short structured texts. 
Limits: (1) The stated text must include a measurable sentiment (a lot 
of tweets and short texts are simply statements). 
(2) The danger of reading out of context. Texts must be placed back 
into their original context to make sense. Often impossible to do. 
(3) Establishing sentiment is difficult for both human coder & machine
Hashtag analysis 
Hashtags originally emerged on Twitter, but now also widely used on 
Facebook. By clicking them other content tagged in this way is 
revealed. This data can be collected through APIs to gaining insight 
into topics/events through their hashtag and the communities that can 
be identified around them. These can arise quickly (breaking news); 
establish around a popular TV show (#masterchef); a set of interests 
(#foodporn); part of wider social media cultures (#ff #fail #facepalm). 
Limits: (1) Sampling concerns around how Twitter makes data 
available through API structure. Not clear how collected sample relates 
to the firehose. Concern for social science researchers. 
(2) How do you know you have ‘all’ the hashtags? Only partial data.
Ethics 
Ethics acknowledged as an important and hotly debated area of 
social media research. Updated ethical guidelines from the 
Association of Internet Researchers (AoIR) published last year. 
Slow to update. Do we need more flexible frameworks and 
frequent sharing of good practice? Is it feasible to seek informed 
consent? Often seen as context specific. 
Is the data public if it is publicly available? Does the notion of 
‘public’ change when user registration data (for example age, 
gender) can only be accessed with the API, but is not publicly 
visible.
Ethics 
Most social media data is created within specific context. 
Emerging approaches/best practice. 
Quantitative: data shown in aggregate, not at individual level. 
Users not identifiable. 
Qualitative: Consent is sought, no data shared that can be 
linked back to a user. ‘Agile ethics’: do to users what you feel 
comfortable with in use of your own data. Problem with difficult to 
identify underage users, most: from 13 years+
f.vis@sheffield.ac.uk 
@flygirltwo 
visualsocialmedialab.org

Contenu connexe

Tendances

Twitter Data Analytics
Twitter Data AnalyticsTwitter Data Analytics
Twitter Data Analytics
rupika08
 
Sentiment Analysis and Social Media: How and Why
Sentiment Analysis and Social Media: How and WhySentiment Analysis and Social Media: How and Why
Sentiment Analysis and Social Media: How and Why
Davide Feltoni Gurini
 
Social Information & Browsing March 6
Social Information & Browsing   March 6Social Information & Browsing   March 6
Social Information & Browsing March 6
sritikumar
 

Tendances (20)

30 Tools and Tips to Speed Up Your Digital Workflow
30 Tools and Tips to Speed Up Your Digital Workflow 30 Tools and Tips to Speed Up Your Digital Workflow
30 Tools and Tips to Speed Up Your Digital Workflow
 
Twitter Data Analytics
Twitter Data AnalyticsTwitter Data Analytics
Twitter Data Analytics
 
Just What Is Social in Social Media? An Actor-Network Critique of Twitter Age...
Just What Is Social in Social Media? An Actor-Network Critique of Twitter Age...Just What Is Social in Social Media? An Actor-Network Critique of Twitter Age...
Just What Is Social in Social Media? An Actor-Network Critique of Twitter Age...
 
Measuring the Success of Your Social Media Initiatives
Measuring the Success of Your Social Media InitiativesMeasuring the Success of Your Social Media Initiatives
Measuring the Success of Your Social Media Initiatives
 
Integrating Behavioural Science in Government Communication
Integrating Behavioural Science in Government CommunicationIntegrating Behavioural Science in Government Communication
Integrating Behavioural Science in Government Communication
 
Sentiment Analysis and Social Media: How and Why
Sentiment Analysis and Social Media: How and WhySentiment Analysis and Social Media: How and Why
Sentiment Analysis and Social Media: How and Why
 
Strategy Before Tactics
Strategy Before TacticsStrategy Before Tactics
Strategy Before Tactics
 
Prof. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network AnalysisProf. Hendrik Speck - Social Network Analysis
Prof. Hendrik Speck - Social Network Analysis
 
Jf2516311637
Jf2516311637Jf2516311637
Jf2516311637
 
Defrag Keynote Nov. 4, 2008
Defrag Keynote Nov. 4, 2008Defrag Keynote Nov. 4, 2008
Defrag Keynote Nov. 4, 2008
 
NLP journal paper
NLP journal paperNLP journal paper
NLP journal paper
 
Social Graph Symposium Panel - May 2010
Social Graph Symposium Panel - May 2010Social Graph Symposium Panel - May 2010
Social Graph Symposium Panel - May 2010
 
Social Media Mining - Chapter 8 (Influence and Homophily)
Social Media Mining - Chapter 8 (Influence and Homophily)Social Media Mining - Chapter 8 (Influence and Homophily)
Social Media Mining - Chapter 8 (Influence and Homophily)
 
Social Media Mining - Chapter 9 (Recommendation in Social Media)
Social Media Mining - Chapter 9 (Recommendation in Social Media)Social Media Mining - Chapter 9 (Recommendation in Social Media)
Social Media Mining - Chapter 9 (Recommendation in Social Media)
 
Social network analysis
Social network analysisSocial network analysis
Social network analysis
 
Social Media Mining - Chapter 10 (Behavior Analytics)
Social Media Mining - Chapter 10 (Behavior Analytics)Social Media Mining - Chapter 10 (Behavior Analytics)
Social Media Mining - Chapter 10 (Behavior Analytics)
 
GitHub as Transparency Device in Data Journalism, Open Data and Data Activism
GitHub as Transparency Device in  Data Journalism, Open Data and Data ActivismGitHub as Transparency Device in  Data Journalism, Open Data and Data Activism
GitHub as Transparency Device in Data Journalism, Open Data and Data Activism
 
Computational Social Science
Computational Social Science Computational Social Science
Computational Social Science
 
Fact Checking & Information Retrieval
Fact Checking & Information RetrievalFact Checking & Information Retrieval
Fact Checking & Information Retrieval
 
Social Information & Browsing March 6
Social Information & Browsing   March 6Social Information & Browsing   March 6
Social Information & Browsing March 6
 

En vedette

Social and-psychological-impact-of-social-networking-sites
Social and-psychological-impact-of-social-networking-sitesSocial and-psychological-impact-of-social-networking-sites
Social and-psychological-impact-of-social-networking-sites
Sumit Malhotra
 

En vedette (20)

Big data and Social Media Analytics
Big data and Social Media AnalyticsBig data and Social Media Analytics
Big data and Social Media Analytics
 
FIM seminar - Social Media and Big Data
FIM seminar - Social Media and Big DataFIM seminar - Social Media and Big Data
FIM seminar - Social Media and Big Data
 
Social media with big data analytics
Social media with big data analyticsSocial media with big data analytics
Social media with big data analytics
 
Social network architecture - Part 3. Big data - Machine learning
Social network architecture - Part 3. Big data - Machine learningSocial network architecture - Part 3. Big data - Machine learning
Social network architecture - Part 3. Big data - Machine learning
 
When size matters Is social media data really that BIG
When size matters Is social media data really that BIGWhen size matters Is social media data really that BIG
When size matters Is social media data really that BIG
 
Big Data, Social Media & Wine
Big Data, Social Media & WineBig Data, Social Media & Wine
Big Data, Social Media & Wine
 
Big Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network ApproachBig Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network Approach
 
Saving Lives with Big Data from Social Media in Emergencies
Saving Lives with Big Data from Social Media in EmergenciesSaving Lives with Big Data from Social Media in Emergencies
Saving Lives with Big Data from Social Media in Emergencies
 
Big data ppt
Big  data pptBig  data ppt
Big data ppt
 
What is Big Data?
What is Big Data?What is Big Data?
What is Big Data?
 
The value of open networking for researchers
The value of open networking for researchers The value of open networking for researchers
The value of open networking for researchers
 
Shifting perception and making an impact through Social Media and Influencer ...
Shifting perception and making an impact through Social Media and Influencer ...Shifting perception and making an impact through Social Media and Influencer ...
Shifting perception and making an impact through Social Media and Influencer ...
 
Curating Social Media Data And Compiling Them Together
Curating Social Media Data And Compiling Them TogetherCurating Social Media Data And Compiling Them Together
Curating Social Media Data And Compiling Them Together
 
Become A Social Media Influencer: Get Followers and Get Noticed!
Become A Social Media Influencer: Get Followers and Get Noticed!Become A Social Media Influencer: Get Followers and Get Noticed!
Become A Social Media Influencer: Get Followers and Get Noticed!
 
ARE BIG DATA AND SOCIAL MEDIA BRINGING DISRUPTION TO HUMAN RESOURCES PRACTICES?
ARE BIG DATA AND SOCIAL MEDIA BRINGING DISRUPTION TO HUMAN RESOURCES PRACTICES?ARE BIG DATA AND SOCIAL MEDIA BRINGING DISRUPTION TO HUMAN RESOURCES PRACTICES?
ARE BIG DATA AND SOCIAL MEDIA BRINGING DISRUPTION TO HUMAN RESOURCES PRACTICES?
 
Social and-psychological-impact-of-social-networking-sites
Social and-psychological-impact-of-social-networking-sitesSocial and-psychological-impact-of-social-networking-sites
Social and-psychological-impact-of-social-networking-sites
 
Social network architecture - Part 1. Core user
Social network architecture - Part 1. Core userSocial network architecture - Part 1. Core user
Social network architecture - Part 1. Core user
 
Social network architecture - Part 2. News feed
Social network architecture - Part 2. News feedSocial network architecture - Part 2. News feed
Social network architecture - Part 2. News feed
 
Making it big in software (ibm post doctoral fellow symposium keynote slidesh...
Making it big in software (ibm post doctoral fellow symposium keynote slidesh...Making it big in software (ibm post doctoral fellow symposium keynote slidesh...
Making it big in software (ibm post doctoral fellow symposium keynote slidesh...
 
Social Media, Cloud Computing and architecture
Social Media, Cloud Computing and architectureSocial Media, Cloud Computing and architecture
Social Media, Cloud Computing and architecture
 

Similaire à Researching Social Media – Big Data and Social Media Analysis

What Your Tweets Tell Us About You, Speaker Notes
What Your Tweets Tell Us About You, Speaker NotesWhat Your Tweets Tell Us About You, Speaker Notes
What Your Tweets Tell Us About You, Speaker Notes
KrisKasianovitz
 
RUNNING HEAD BIG DATA IN SOCIAL MEDIA .docx
RUNNING HEAD BIG DATA IN SOCIAL MEDIA                            .docxRUNNING HEAD BIG DATA IN SOCIAL MEDIA                            .docx
RUNNING HEAD BIG DATA IN SOCIAL MEDIA .docx
cheryllwashburn
 
Netnography online course part 1 of 3 17 november 2016
Netnography online course part 1 of 3 17 november 2016Netnography online course part 1 of 3 17 november 2016
Netnography online course part 1 of 3 17 november 2016
suresh sood
 

Similaire à Researching Social Media – Big Data and Social Media Analysis (20)

Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...Working with Social Media Data: Ethics & good practice around collecting, usi...
Working with Social Media Data: Ethics & good practice around collecting, usi...
 
s00146-014-0549-4.pdf
s00146-014-0549-4.pdfs00146-014-0549-4.pdf
s00146-014-0549-4.pdf
 
What Your Tweets Tell Us About You, Speaker Notes
What Your Tweets Tell Us About You, Speaker NotesWhat Your Tweets Tell Us About You, Speaker Notes
What Your Tweets Tell Us About You, Speaker Notes
 
Accessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science KnowledgeAccessing and Using Big Data to Advance Social Science Knowledge
Accessing and Using Big Data to Advance Social Science Knowledge
 
Eavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging SiteEavesdropping on the Twitter Microblogging Site
Eavesdropping on the Twitter Microblogging Site
 
Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...Practical Applications for Social Network Analysis in Public Sector Marketing...
Practical Applications for Social Network Analysis in Public Sector Marketing...
 
Lecture series: Using trace data or subjective data, that is the question dur...
Lecture series: Using trace data or subjective data, that is the question dur...Lecture series: Using trace data or subjective data, that is the question dur...
Lecture series: Using trace data or subjective data, that is the question dur...
 
‘Big Social Data’ in Context: Connecting Social Media Data and Other Sources
‘Big Social Data’ in Context: Connecting Social Media Data and Other Sources‘Big Social Data’ in Context: Connecting Social Media Data and Other Sources
‘Big Social Data’ in Context: Connecting Social Media Data and Other Sources
 
Blurring the Boundaries? Ethical challenges in using social media for social...
Blurring the Boundaries? Ethical challenges in using social media for social...Blurring the Boundaries? Ethical challenges in using social media for social...
Blurring the Boundaries? Ethical challenges in using social media for social...
 
Big Data for International Development
Big Data for International DevelopmentBig Data for International Development
Big Data for International Development
 
RUNNING HEAD BIG DATA IN SOCIAL MEDIA .docx
RUNNING HEAD BIG DATA IN SOCIAL MEDIA                            .docxRUNNING HEAD BIG DATA IN SOCIAL MEDIA                            .docx
RUNNING HEAD BIG DATA IN SOCIAL MEDIA .docx
 
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
IRJET - Socirank Identifying and Ranking Prevalent News Topics using Social M...
 
Netnography online course part 1 of 3 17 november 2016
Netnography online course part 1 of 3 17 november 2016Netnography online course part 1 of 3 17 november 2016
Netnography online course part 1 of 3 17 november 2016
 
Science communication via social media
Science communication via social mediaScience communication via social media
Science communication via social media
 
Science communication via social media
Science communication via social mediaScience communication via social media
Science communication via social media
 
The Implementation of Social Media for Educational Objectives
The Implementation of Social Media for Educational ObjectivesThe Implementation of Social Media for Educational Objectives
The Implementation of Social Media for Educational Objectives
 
Web2.0 for research: The Spin, blag and blog.
Web2.0 for research: The Spin, blag and blog.Web2.0 for research: The Spin, blag and blog.
Web2.0 for research: The Spin, blag and blog.
 
Information Retrieval and Social Media
Information Retrieval and Social MediaInformation Retrieval and Social Media
Information Retrieval and Social Media
 
Big data divided (24 march2014)
Big data divided (24 march2014)Big data divided (24 march2014)
Big data divided (24 march2014)
 
Tn T Horizons April 28 2008
Tn T Horizons April 28 2008Tn T Horizons April 28 2008
Tn T Horizons April 28 2008
 

Plus de Farida Vis

Everyday Growing and Digging Cultures
Everyday Growing and Digging CulturesEveryday Growing and Digging Cultures
Everyday Growing and Digging Cultures
Farida Vis
 
What do Data Visualisations want?
What do Data Visualisations want?What do Data Visualisations want?
What do Data Visualisations want?
Farida Vis
 
London Police Tweets
London Police Tweets London Police Tweets
London Police Tweets
Farida Vis
 
Allotment (publics): an open data and data driven journalism perspective
Allotment (publics): an open data and data driven journalism perspective Allotment (publics): an open data and data driven journalism perspective
Allotment (publics): an open data and data driven journalism perspective
Farida Vis
 
FutureEverything 2012: Social Media, Social Change
FutureEverything 2012: Social Media, Social Change FutureEverything 2012: Social Media, Social Change
FutureEverything 2012: Social Media, Social Change
Farida Vis
 
Growing Back to the Future
Growing Back to the Future Growing Back to the Future
Growing Back to the Future
Farida Vis
 
Future of allotments in the UK: Manchester City Camp
Future of allotments in the UK: Manchester City CampFuture of allotments in the UK: Manchester City Camp
Future of allotments in the UK: Manchester City Camp
Farida Vis
 
Digging into citizenship: land, waiting lists, broken promises and creating c...
Digging into citizenship: land, waiting lists, broken promises and creating c...Digging into citizenship: land, waiting lists, broken promises and creating c...
Digging into citizenship: land, waiting lists, broken promises and creating c...
Farida Vis
 

Plus de Farida Vis (13)

Vacantacres
VacantacresVacantacres
Vacantacres
 
Everyday Growing and Digging Cultures
Everyday Growing and Digging CulturesEveryday Growing and Digging Cultures
Everyday Growing and Digging Cultures
 
Where do images fit in the era of ‘Big Data’?
Where do images fit in the era of ‘Big Data’?Where do images fit in the era of ‘Big Data’?
Where do images fit in the era of ‘Big Data’?
 
How not to measure Twitter Influence
How not to measure Twitter InfluenceHow not to measure Twitter Influence
How not to measure Twitter Influence
 
What do Data Visualisations want?
What do Data Visualisations want?What do Data Visualisations want?
What do Data Visualisations want?
 
London Police Tweets
London Police Tweets London Police Tweets
London Police Tweets
 
Allotment (publics): an open data and data driven journalism perspective
Allotment (publics): an open data and data driven journalism perspective Allotment (publics): an open data and data driven journalism perspective
Allotment (publics): an open data and data driven journalism perspective
 
FutureEverything 2012: Social Media, Social Change
FutureEverything 2012: Social Media, Social Change FutureEverything 2012: Social Media, Social Change
FutureEverything 2012: Social Media, Social Change
 
Reading The Riots on Twitter at LIFT12
Reading The Riots on Twitter at LIFT12Reading The Riots on Twitter at LIFT12
Reading The Riots on Twitter at LIFT12
 
Growing Back to the Future
Growing Back to the Future Growing Back to the Future
Growing Back to the Future
 
Future of allotments in the UK: Manchester City Camp
Future of allotments in the UK: Manchester City CampFuture of allotments in the UK: Manchester City Camp
Future of allotments in the UK: Manchester City Camp
 
Digging into citizenship: land, waiting lists, broken promises and creating c...
Digging into citizenship: land, waiting lists, broken promises and creating c...Digging into citizenship: land, waiting lists, broken promises and creating c...
Digging into citizenship: land, waiting lists, broken promises and creating c...
 
Amazon.com and the outbreak narrative
Amazon.com and the outbreak narrativeAmazon.com and the outbreak narrative
Amazon.com and the outbreak narrative
 

Dernier

Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7
Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7
Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Capstone slidedeck for my capstone final edition.pdf
Capstone slidedeck for my capstone final edition.pdfCapstone slidedeck for my capstone final edition.pdf
Capstone slidedeck for my capstone final edition.pdf
eliklein8
 
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
Health
 
Jual Obat Aborsi Palu ( Taiwan No.1 ) 085657271886 Obat Penggugur Kandungan C...
Jual Obat Aborsi Palu ( Taiwan No.1 ) 085657271886 Obat Penggugur Kandungan C...Jual Obat Aborsi Palu ( Taiwan No.1 ) 085657271886 Obat Penggugur Kandungan C...
Jual Obat Aborsi Palu ( Taiwan No.1 ) 085657271886 Obat Penggugur Kandungan C...
ZurliaSoop
 

Dernier (20)

Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7
Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7
Call Girls in Chattarpur (delhi) call me [9953056974] escort service 24X7
 
Film show pre-production powerpoint for site
Film show pre-production powerpoint for siteFilm show pre-production powerpoint for site
Film show pre-production powerpoint for site
 
Film show investigation powerpoint for the site
Film show investigation powerpoint for the siteFilm show investigation powerpoint for the site
Film show investigation powerpoint for the site
 
Interpreting the brief for the media IDY
Interpreting the brief for the media IDYInterpreting the brief for the media IDY
Interpreting the brief for the media IDY
 
College & House wife Call Girls in Paharganj 9634446618 -Best Escort call gi...
College & House wife  Call Girls in Paharganj 9634446618 -Best Escort call gi...College & House wife  Call Girls in Paharganj 9634446618 -Best Escort call gi...
College & House wife Call Girls in Paharganj 9634446618 -Best Escort call gi...
 
Improve Your Brand in Waco with a Professional Social Media Marketing Company
Improve Your Brand in Waco with a Professional Social Media Marketing CompanyImprove Your Brand in Waco with a Professional Social Media Marketing Company
Improve Your Brand in Waco with a Professional Social Media Marketing Company
 
Capstone slide deck on the TikTok revolution
Capstone slide deck on the TikTok revolutionCapstone slide deck on the TikTok revolution
Capstone slide deck on the TikTok revolution
 
Capstone slidedeck for my capstone final edition.pdf
Capstone slidedeck for my capstone final edition.pdfCapstone slidedeck for my capstone final edition.pdf
Capstone slidedeck for my capstone final edition.pdf
 
Finance-and-Operations-in-the-Azure-Cloud.pdf
Finance-and-Operations-in-the-Azure-Cloud.pdfFinance-and-Operations-in-the-Azure-Cloud.pdf
Finance-and-Operations-in-the-Azure-Cloud.pdf
 
SEO Expert in USA - 5 Ways to Improve Your Local Ranking - Macaw Digital.pdf
SEO Expert in USA - 5 Ways to Improve Your Local Ranking - Macaw Digital.pdfSEO Expert in USA - 5 Ways to Improve Your Local Ranking - Macaw Digital.pdf
SEO Expert in USA - 5 Ways to Improve Your Local Ranking - Macaw Digital.pdf
 
Pondicherry Call Girls Book Now 8617697112 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 8617697112 Top Class Pondicherry Escort Servi...Pondicherry Call Girls Book Now 8617697112 Top Class Pondicherry Escort Servi...
Pondicherry Call Girls Book Now 8617697112 Top Class Pondicherry Escort Servi...
 
Unlock the power of Instagram with SocioCosmos. Start your journey towards so...
Unlock the power of Instagram with SocioCosmos. Start your journey towards so...Unlock the power of Instagram with SocioCosmos. Start your journey towards so...
Unlock the power of Instagram with SocioCosmos. Start your journey towards so...
 
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
+971565801893>> ORIGINAL CYTOTEC ABORTION PILLS FOR SALE IN DUBAI AND ABUDHABI<<
 
Generate easy money from tiktok using this simple steps on the book.
Generate easy money from tiktok using this simple steps on the book.Generate easy money from tiktok using this simple steps on the book.
Generate easy money from tiktok using this simple steps on the book.
 
Production diary Film the city powerpoint
Production diary Film the city powerpointProduction diary Film the city powerpoint
Production diary Film the city powerpoint
 
Ignite Your Online Influence: Sociocosmos - Where Social Media Magic Happens
Ignite Your Online Influence: Sociocosmos - Where Social Media Magic HappensIgnite Your Online Influence: Sociocosmos - Where Social Media Magic Happens
Ignite Your Online Influence: Sociocosmos - Where Social Media Magic Happens
 
BDSM⚡Call Girls in Sector 76 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 76 Noida Escorts >༒8448380779 Escort ServiceBDSM⚡Call Girls in Sector 76 Noida Escorts >༒8448380779 Escort Service
BDSM⚡Call Girls in Sector 76 Noida Escorts >༒8448380779 Escort Service
 
Jual Obat Aborsi Palu ( Taiwan No.1 ) 085657271886 Obat Penggugur Kandungan C...
Jual Obat Aborsi Palu ( Taiwan No.1 ) 085657271886 Obat Penggugur Kandungan C...Jual Obat Aborsi Palu ( Taiwan No.1 ) 085657271886 Obat Penggugur Kandungan C...
Jual Obat Aborsi Palu ( Taiwan No.1 ) 085657271886 Obat Penggugur Kandungan C...
 
Craft Your Legacy: Invest in YouTube Presence from Sociocosmos"
Craft Your Legacy: Invest in YouTube Presence from Sociocosmos"Craft Your Legacy: Invest in YouTube Presence from Sociocosmos"
Craft Your Legacy: Invest in YouTube Presence from Sociocosmos"
 
Social media marketing/Seo expert and digital marketing
Social media marketing/Seo expert and digital marketingSocial media marketing/Seo expert and digital marketing
Social media marketing/Seo expert and digital marketing
 

Researching Social Media – Big Data and Social Media Analysis

  • 1. Researching Social Media – Big Data and Social Media Analysis Farida Vis Information School, University of Sheffield @flygirltwo | @VisSocMedLab Social Media for Researchers: A Sheffield Universities Social Media Symposium, 23 September 2014
  • 2. • Academic – working on social media and Big Data (methods) • Have been studying social media for nearly a decade • Sometime data journalist for The Guardian • Co-author of the Data Journalism Handbook • Led social media analysis of ‘Reading the Riots on Twitter’ (2.5M) • Fellow recipient of inaugural Data Journalism Award (for Twitter rumour visualization as part of Reading the Riots) • Now: verification – spread on (mis)information online • Key concern for the World Economic Forum (top ten trend for 2014) • World Economic Forum Global Agenda Council for Social Media • Border runner – talk/work across academia | government | industry • Advising various research councils on funding social media research • New projects: Picturing the Social and Visual Social Media Lab • MA module (option): Researching Social Media/ Research Methods for Social Media (40 students 2012/2013; 65 students 2013/2014)
  • 3.
  • 4. "Picturing the Social: transforming our understanding of images in social media and Big Data research.” ESRC Transformative Research grant
  • 5. Large interdisciplinary team Farida Vis (PI) – Media and Communication Simon Faulkner – Art History/Visual Culture James Aulich - Art History/Visual Culture Olga Gorgiunova – Software Studies/Sociology Mike Thelwall – Information Science/software Francesco D’Orazio – Industry/Media/software + Anne Burns (RA) – Digital Ethnography
  • 6. “Qualitative data on a quantitative scale” (D’Orazio, 2013)
  • 7. Users share 750 million images daily Building new theory and method Structures Users Content
  • 9.
  • 10. @VisSocMedLab = building community + engaging in conversation
  • 11. FREE ONE DAY CONFERENCE Picturing the Social: Analysing Social Media Images Part of ESRC Festival of Social Science 7 November: ICOSS, University of Sheffield Registration opens 1 October visualsocialmedialab.org
  • 12. SOCIAL MEDIA = BIG DATA
  • 14. Industry definition “Big data” is high-volume, -velocity and –variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making’ (Gartner in Sicular, 2013). Huge industry now built around ‘social data’ and ‘listening platforms’ feeding on this data (Many tools not suitable for academic use, black box).
  • 15. Academic definition • Technology: maximizing computation power and algorithmic accuracy to gather, analyze, link, and compare large data sets. • Analysis: drawing on large data sets to identify patterns in order to make economic, social, technical, and legal claims. • Mythology: the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity, and accuracy. (boyd and Crawford p. 663).
  • 16. Critiques of Big Data • Important to make visible inherent claims about objectivity • Problematic focus on quantitative methods • How can data answer questions it was not designed to answer? • How can the right questions be asked? • Inherent biases in large linked error prone datasets • Focus on text and numbers that can be mined algorithmically • Data fundamentalism (correlation always indicates causation)
  • 17. How do we ground online data? In the offline: assessing findings against what we know about an offline population (census data) in order to better understand online data. Problems with over/under representation in online data? In the online: premised on the idea that data derived from the Internet should be grounded in other online data in order to understand it. So comparing Facebook use to what we know about Facebook use, rather than connecting it to offline measurements about citizens. Pioneered by Richard Rogers: online is the baseline.
  • 18. Important considerations for online research 1. Asking the right question – research should be question driven rather than data or tool driven. 2. Accept poor data quality & users gaming metrics – once online metrics have value users will try to game them. 3. Limitations of tools (often built in disconnected way) 4. Transparency – researchers should be upfront about limitations of research and research design. Can the data answer the questions?
  • 19. Organic data / data in the wild SOCIAL MEDIA SIMPLY AS (BIG) DATA VS SOCIAL MEDIA AS A RESEARCH AREA
  • 21. TO UNDERSTAND TWITTER DATA YOU NEED TO UNDERSTAND TWITTER
  • 22. + BE ON TWITTER* * Or relevant platform you’re interested in!
  • 23. Finding cutting edge social media research
  • 24. UGC is ‘false’ until verified
  • 25. Social media VS social data • Social Media: User-generated content where one user communicates and expresses themselves and that content is delivered to other users. Examples of this are platforms such as Twitter, Facebook, YouTube, Tumblr and Disqus. Social media is delivered in a great user experience, and is focused on sharing and content discovery. Social media also offers both public and private experiences with the ability to share messages privately. • Social Data: Expresses social media in a computer-readable format (e.g. JSON) and shares metadata about the content to help provide not only content, but context. Metadata often includes information about location, engagement and links shared. Unlike social media, social data is focused strictly on publicly shared experiences. (Cairns, 2013)
  • 27.
  • 28. Enriched metadata Location and influence Where are users? Are they influential?
  • 30. New Profile Geo Enrichment
  • 31. Geo-locating tweets Exact location Lat/long coordinates Gold standard geo data Problem: only 1% of users -> Only 2% of firehose tweets Early adopters, highly skewed Where in the world are you? No Lat/long coordinates Text field – enter anything Advantage: more than half of all tweets contain profile location Much more evenly distributed
  • 32. Profile geo-enrichment ‘our customers can now hear from the whole world of Twitter users and not just 1%’ (Cairs, 2013 on Gnip company Blog) • Activity Location – 1% that provide lat/long • Profile Location – Place provided in their profile. May or may not be posting from there. • Mentioned Location – Places a user talks about
  • 33. Profile geo-enrichment + linking data ‘Profile location data can be used to unlock demographic data and other information that is not otherwise possible with activity location. For instance, US Census Bureau statistics are aggregated at the locality level and can provide basic stats like household income. Profile location is also a strong indicator of activity location when one isn’t provided. (Cairns, 2013)
  • 34. Dangers of mapping/tracking/predicting Did it really work? Beware of the fall of Google flu trends! Results are rarely independently verified or triangulated (can’t share social media data).
  • 35. Data collection problems - Searching for keywords/tags: problematic. What are the search terms you need/know about? Significant disadvantage in searching for a limited number of hashtags: (1) not all tweets pertaining to the relevant topic will contain a hashtag; (2) how to identify all the relevant keywords/hashtags? - Identifying users: problematic. How do you identify them? Through information in their bio or additional registration information accessible through the API (YouTube)? Self-declared information. - Searching for users/topics/keywords through networks (Facebook, YouTube, Twitter). What does the network show? What kind of connections?
  • 36. Social Network Analysis Not new, applied to social media. Many sites structure information through social graphs and connections between users/tags/keywords. Could identify new information, important view on the data. Limits: (1) What is the nature of the connection? What does a ‘friend’ connection mean? Could be friend/relative/ acquaintance/fan/ stranger/fake -> requires critical reflection, qualitative inspections. (2) Difficult to account for changes over time. The network visualisation only gives a snapshot in time. Not clear what that shows. (3) How is the network graph constructed by the social media company? How is the Facebook graph constructed? How are nodes connected? Which ‘friends’ are shown? Which are left out? (4) How does the visualisation tool (Gephi) show the network?
  • 37.
  • 38. Sentiment analysis Developed for short texts (tweets, comments on YouTube, short Facebook updates, comments) to establish positive and negative sentiment. Great interest in sentiment analysis from both industry, policy makers and academia (‘mapping the mood of the nation’ type projects; identifying negative sentiment as an early warning sign). A range of tools have been developed in this area, which allow for the rapid processing of short structured texts. Limits: (1) The stated text must include a measurable sentiment (a lot of tweets and short texts are simply statements). (2) The danger of reading out of context. Texts must be placed back into their original context to make sense. Often impossible to do. (3) Establishing sentiment is difficult for both human coder & machine
  • 39. Hashtag analysis Hashtags originally emerged on Twitter, but now also widely used on Facebook. By clicking them other content tagged in this way is revealed. This data can be collected through APIs to gaining insight into topics/events through their hashtag and the communities that can be identified around them. These can arise quickly (breaking news); establish around a popular TV show (#masterchef); a set of interests (#foodporn); part of wider social media cultures (#ff #fail #facepalm). Limits: (1) Sampling concerns around how Twitter makes data available through API structure. Not clear how collected sample relates to the firehose. Concern for social science researchers. (2) How do you know you have ‘all’ the hashtags? Only partial data.
  • 40. Ethics Ethics acknowledged as an important and hotly debated area of social media research. Updated ethical guidelines from the Association of Internet Researchers (AoIR) published last year. Slow to update. Do we need more flexible frameworks and frequent sharing of good practice? Is it feasible to seek informed consent? Often seen as context specific. Is the data public if it is publicly available? Does the notion of ‘public’ change when user registration data (for example age, gender) can only be accessed with the API, but is not publicly visible.
  • 41.
  • 42. Ethics Most social media data is created within specific context. Emerging approaches/best practice. Quantitative: data shown in aggregate, not at individual level. Users not identifiable. Qualitative: Consent is sought, no data shared that can be linked back to a user. ‘Agile ethics’: do to users what you feel comfortable with in use of your own data. Problem with difficult to identify underage users, most: from 13 years+