Analyzing Social Media with Digital Methods
Possibilities, Requirements, and Limitations
Bernhard Rieder
Universiteit van Amsterdam
Mediastudies Department
The starting point
Social media are playing important roles in contemporary society, from the
very personal to the very public.
Many disciplines have begun to study social media, applying various
methodologies (ethnography, questionnaires, etc.), but there is an
explosion in data-driven research that relies on the computational analysis
of data gleaned from social media platforms.
The promise is (cheap and detailed) access to what people do, not what
they say they do; to their behavior, exchange, ideas, and sentiments.
This presentation
This talk introduces social media analysis using digital methods from a
theoretically involved yet "practical" perspective.
Instead of laying out an overarching "logic" of social media data analysis, I
focus on the basic setup and the rich reservoir of analytical gestures that
constitute the practice of data analysis.
1 / A (long) introduction
2 / Three examples covering Facebook, Twitter, and YouTube
3 / Some conclusions and recommendations
1 / Introduction
Social media services host an increasing number of relevant phenomena,
including everyday practices, political presentation and debate, social and
political activism, disaster communication, etc.
A number of preliminary remarks:
☉ The phenomena one is interested in may not happen or resonate on social media;
many things happen elsewhere.
☉ Even if one's research focus is on social media, one may not get the data.
☉ One requires at least some technical competence and the willingness to confront
and learn about a number of technical matters.
☉ Every social media "platform" (Gillespie 2013) is different and requires a different
approach; cf. "medium-specificity".
1 / Introduction
Hypothetico-deductive approaches are certainly possible, but this
presentation espouses inductive "exploratory data analysis" (Tukey 1962)
that emphasizes iteration, methodological flexibility, adjustment of
questions, and "grounded theory" (Glaser & Strauss 1965).
"Far better an approximate answer to the right question, which is often
vague, than an exact answer to the wrong question, which can always be
made precise. Data analysis must progress by approximate answers, at
best, since its knowledge of what the problem really is will at best be
approximate." (Tukey 1962)
1 / Introduction
How does social media analysis with digital methods work?
Diagram: layers of technical mediation that one might want to think about
☉ Social media platform (e.g. Twitter, Facebook): users communicate, interact, express, publish, etc. through "grammars of action" (forms and functions) rendered in software.
☉ API: technical interface to the data, defined in technical, legal, and logistical terms.
☉ Extraction software (e.g. DMI-TCAT, Netvizz): makes calls to the API, creates "views" by combining data into specific sets or metrics, and produces one of two output types:
   output type 1: a widget that provides a visual or textual representation of the view, e.g. an interactive chart;
   output type 2: data in a standard file format, e.g. CSV.
☉ Analysis software (e.g. Excel, gephi): allows analyzing these files in various ways, e.g. statistics, graph analysis.
1 / Introduction - a / the platform
Social media services channel communication, interaction, etc. through
"grammars of action" (forms and functions) rendered in software; users
appropriate these affordances.
Every service is different. Every service changes over time, both in terms
of technology and user practices.
Homogeneous interfaces do not mean homogeneous practices. Platforms
strive to capture large audiences and leave important margins to users.
Social media platforms are organized
around instances of predefined
types of entities (users, messages,
hashtags, posts, etc.) and
connections between them.
They formalize and channel expression, exchange, and coordination; their data fields mirror the forms and functions of the platform.
Social media are different from the "open" Web because most data is
formalized in fields and a "semantic data model".
The more detailed the formalization, the more salient the data.
Social media platforms are essentially large databases.
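To make the "large database" view concrete, the following Python sketch models a few entity types and the connections between them, then derives one simple view (comments per post). The class and field names are hypothetical illustrations and do not reproduce any platform's actual schema or API fields.

```python
from dataclasses import dataclass, field
from datetime import datetime

# Hypothetical entity types; real platforms define their own, much richer schemas.
@dataclass
class User:
    user_id: str
    name: str

@dataclass
class Post:
    post_id: str
    author_id: str                  # connection: post -> user who published it
    created: datetime
    text: str
    liked_by: list[str] = field(default_factory=list)  # connection: users -> post

@dataclass
class Comment:
    comment_id: str
    post_id: str                    # connection: comment -> post
    author_id: str                  # connection: comment -> user
    created: datetime
    text: str

def comments_per_post(comments: list[Comment]) -> dict[str, int]:
    """A simple "view": count one entity type along one connection."""
    counts: dict[str, int] = {}
    for comment in comments:
        counts[comment.post_id] = counts.get(comment.post_id, 0) + 1
    return counts

# Example records (made up).
page_admin = User("u1", "page admin")
post = Post("p1", page_admin.user_id, datetime(2010, 6, 10), "first post")
comments = [
    Comment("c1", "p1", "u2", datetime(2010, 6, 10, 12), "a"),
    Comment("c2", "p1", "u3", datetime(2010, 6, 11), "b"),
    Comment("c3", "p2", "u2", datetime(2010, 6, 12), "c"),
]
```

Every metric discussed later (counts, degrees, rankings) is some aggregation along such predefined entities and connections, which is why the platform's data model shapes what can be measured.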
1 / Introduction - a / the platform
Very large numbers and variety in users,
contents, purposes, arrangements, etc.
Social media are built around simple
point-to-point principles; this allows
for a variety of configurations to
emerge over time.
Every account is the same, but there
are vast differences in scale. We need
to begin with technical fieldwork and
conceptualization of the platform.
1 / Introduction - b / the APIs
There are two ways to collect data automatically from social media platforms: scraping the user interface or collecting through the application programming interfaces (APIs) that platforms provide.
APIs specify (technically, legally, logistically):
☉ What data can be retrieved (certain fields may be inaccessible or incomplete);
☉ How much data can be retrieved (all APIs have rate limits);
☉ The span of coverage (temporal limitations often apply);
☉ The perceptivity of coverage (privacy or personalization can skew access);
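Rate limits and pagination shape how collection scripts are written. The sketch below follows pagination cursors while staying under a fixed number of calls per sliding time window. `fetch_page`, the cursor scheme, and the limit values are assumptions for illustration only; each real API defines its own pagination and limits, which have to be checked in its documentation. The clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable, Iterator, Optional

# Hypothetical page fetcher: takes a cursor (None for the first page) and returns
# (items, next_cursor); next_cursor is None when there are no further pages.
FetchPage = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]

def collect_all(fetch_page: FetchPage, max_calls: int, window_s: float,
                sleep: Callable[[float], None] = time.sleep,
                clock: Callable[[], float] = time.monotonic) -> Iterator[dict]:
    """Yield every item across all pages, making at most `max_calls` calls
    in any sliding window of `window_s` seconds."""
    call_times: list[float] = []
    cursor: Optional[str] = None
    while True:
        now = clock()
        # Drop calls that have left the sliding window.
        call_times = [t for t in call_times if now - t < window_s]
        if len(call_times) >= max_calls:
            # Wait until the oldest remaining call leaves the window, then re-check.
            sleep(window_s - (now - call_times[0]))
            continue
        call_times.append(now)
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            return

# Usage with a fake, in-memory "API" and a fake clock, so nothing is fetched.
pages = {
    None: ([{"id": 1}, {"id": 2}], "p2"),
    "p2": ([{"id": 3}], "p3"),
    "p3": ([{"id": 4}], None),
}
fake_time = [0.0]

def fake_sleep(seconds: float) -> None:
    fake_time[0] += seconds

items = list(collect_all(lambda c: pages[c], max_calls=2, window_s=60,
                         sleep=fake_sleep, clock=lambda: fake_time[0]))
```

With two calls allowed per 60 seconds, the third page is only requested after the simulated clock has advanced by a full window; at platform scale, such waits are what turns "collecting everything" into a logistical question.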
For example, Facebook (currently) provides these variables for each post:
                       comment   like   share
count                  yes       yes    yes
individual user list   yes       yes    no
time-stamp             yes       no     no
Social media users produce
detailed data traces; data
pools in social media are
centralized and retrievable.
Structure of APIs is closely
related to given formalizations.
In order to select, process, and
interpret data we need to
understand the platform:
entities, relations, modes of
aggregation, metrics, etc.
Every platform is different and
we thus need medium-specific
data analysis.
1 / Introduction - c / the extraction software
Extraction software comprises the programs that connect to the APIs, retrieve data, and produce specific outputs.
It can range from custom-written scripts to one-click visualization widgets.
These programs work with API data, but add their own "epistemological
twist", i.e. produce particular views on the data. Sampling is often
difficult, therefore n = all is the norm.
Extraction software can be very simple and completely free or have steep
technical, logistical, and financial requirements.
Example of a widget:
Example of a commercial service: Topsy
Example of an open source analytics suite: DMI-TCAT
There are many different tools out there, with different conceptual
underpinnings, ease of use, depth, etc.
Data analysis (statistics): Excel, SPSS, Tableau, Wizard, Mondrian, …
Data analysis (graph): Gephi, NodeXL, Pajek, …
Data analysis (other): Rapidminer, SentiStrength, Wordij, …
Data analysis (custom): R, Python (NLTK, NumPy & SciPy), …
This presentation relies mostly on R (R Core Team 2014) and Gephi
(Bastian, Heymann, Jacomy 2009).
1 / Introduction - d / the analysis software
Analysis software provides analytical gestures to apply to the data; it may or may not be integrated into the extraction software.
We investigate the structure of data by creating "views" of the data.
Analytical gestures produce orderings, lists, tables, charts, coefficients, etc. that say something about the data and thus about the phenomenon.
Flusser (1991) describes gestures as having convention and structure, but as different from reflexes, because they translate a moment of freedom.
The notion of gesture indicates that data does not speak for itself: we approach it with particular epistemic techniques (methods) related to a sense of purpose, a "will to know" (Foucault 1976).
Analytical gestures develop from the tension between a "research
purpose" (question, exploration, etc.) and the available data:
The technical dimension of data (via platform, API, extraction software):
☉ Available units, variables, etc.
☉ Temporal coverage, completeness, perceptivity, etc.
☉ Technical formats, available "views", etc.
The semantic dimension of data (aspects of practice):
☉ Demographic (age, sex, income, etc.)
☉ Post-demographic (tastes, preferences, etc.)
☉ Behavioral (trajectories, interaction, etc.)
☉ Expressive (messages, comments, etc.)
☉ Technical (informing on the platform's functioning)
1 / Introduction - d / the analysis software
Statistics
Observed: objects and properties ("cases")
Data representation: the table
Visual representation: quantity charts
Inferred: relations between properties
Grouping: class (similar properties)
Graph theory
Observed: objects and relations
Data representation: the adjacency matrix
Visual representation: network diagrams
Inferred: structure of relations between objects
Grouping: clique (dense relations)
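The contrast between the two traditions can be seen by putting the same small dataset into both representations: a table with one row per case, and an adjacency matrix for a binary relation between the same objects. The names, values, and the "mentions" relation are invented for illustration.

```python
# Table: one row per observed object ("case"), one column per property.
table = [
    {"user": "a", "posts": 12, "followers": 340},
    {"user": "b", "posts": 3, "followers": 25},
    {"user": "c", "posts": 40, "followers": 1200},
]

def column(rows, key):
    """Extract one property across all cases, e.g. as input for a quantity chart."""
    return [row[key] for row in rows]

def adjacency_matrix(nodes, edges, directed=False):
    """Matrix for a binary relation: cell [i][j] is 1 if node i relates to node j."""
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for src, dst in edges:
        matrix[index[src]][index[dst]] = 1
        if not directed:
            matrix[index[dst]][index[src]] = 1  # symmetric for undirected relations
    return matrix

# A relation between the same objects, e.g. "user x mentioned user y" (made up).
mentions = [("a", "b"), ("b", "c")]
```

`column(table, "posts")` gives `[12, 3, 40]`, a list of properties to summarize or chart, while `adjacency_matrix(["a", "b", "c"], mentions)` gives `[[0, 1, 0], [1, 0, 1], [0, 1, 0]]`, a structure of relations whose patterns (such as dense cliques) become the object of analysis.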
1 / Introduction - d / the analysis software
Quetelet 1827, Galton 1885, Pearson 1901
Regression, PCA, etc. are potentially useful.
1 / Introduction - d / the analysis software
Entities seem straightforward because data is well structured, but
variations in scale and practice call for caution.
Descriptive statistics for social media often profit from attention to the form of a distribution;
visualization, multi-point summaries, and metrics like kurtosis or skewness are very useful.
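A minimal sketch of distribution-aware descriptive statistics in plain Python: a five-point summary plus moment-based skewness and excess kurtosis. It uses the population (biased) moment formulas and the default "exclusive" quartile method of Python's `statistics.quantiles`; R and other packages offer other variants, so values may differ slightly. The function name is a hypothetical choice.

```python
import statistics

def distribution_summary(values):
    """Five-point summary plus mean, skewness, and excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    if m2 == 0:
        raise ValueError("all values are identical; shape statistics are undefined")
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    q1, median, q3 = statistics.quantiles(values, n=4)  # needs at least two values
    return {
        "min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values),
        "mean": mean,
        "skewness": m3 / m2 ** 1.5,          # > 0: long tail to the right
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # > 0: heavier tails than a normal curve
    }
```

For a symmetric list such as `[1, 2, 3, 4, 5]` the skewness is 0, whereas a list like `[1, 1, 1, 1, 2, 2, 3, 5, 40]`, typical of per-user activity counts, has a mean well above its median and a clearly positive skewness.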
1 / Introduction - d / the analysis software
Moreno 1934, Forsythe and Katz 1946
Graph theory, "a mathematical model for any
system involving a binary relation" (Harary 1969)
Three different force-based layouts of my FB profile
OpenOrd, ForceAtlas, Fruchterman-Reingold
Non force-based layouts
Circle diagram, parallel bubble lines, arc diagram
Nine measures of centrality (Freeman 1979)
Network statistics (e.g.
degrees, distances, density,
etc.) can help describe and
compare networks.
Graph theory also provides
many mathematical tools to
derive metrics from the
structure of a network (e.g.
"centrality", "influence",
"authority", etc.), to identify
groupings, etc.
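Basic network statistics can be computed in a few lines. This sketch assumes an undirected, unweighted graph without self-loops, given as an edge list, and averages shortest path lengths over reachable pairs only (one common convention for disconnected graphs). Gephi and other tools may use different conventions, so figures are not guaranteed to match.

```python
from collections import deque

def network_stats(edges):
    """Basic statistics for an undirected, unweighted graph given as (u, v) pairs.
    Assumes no self-loops; duplicate edges are collapsed."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj.values()) // 2
    total_dist, reachable_pairs = 0, 0
    for source in adj:
        # Breadth-first search yields shortest path lengths in an unweighted graph.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in dist:
                    dist[neighbor] = dist[node] + 1
                    queue.append(neighbor)
        total_dist += sum(dist.values())
        reachable_pairs += len(dist) - 1
    return {
        "nodes": n,
        "edges": m,
        "avg_degree": 2 * m / n,
        "density": 2 * m / (n * (n - 1)),
        "avg_path_length": total_dist / reachable_pairs if reachable_pairs else 0.0,
    }
```

For the path a–b–c, this gives 3 nodes, 2 edges, an average degree of 4/3, a density of 2/3, and an average path length of 4/3 (distances 1, 1, and 2, each counted in both directions).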
"Facebook Likes can be used to automatically and
accurately predict a range of highly sensitive
personal attributes including: sexual orientation,
ethnicity, religious and political views,
personality traits, intelligence, happiness, use of
addictive substances, parental separation, age,
and gender." (Kosinski, Stillwell, Graepel 2013)
There are many new(ish) techniques
coming from computer science for
automatic classification, prediction,
sentiment analysis, etc.
1 / Introduction - conclusion
Four layers of technical mediation to take into account: the platform itself,
the API, the extraction software, the analytical techniques.
To do productive work, attention to these four layers needs to be
combined with theoretical resources and case knowledge.
Bringing this together requires iteration and flexibility; it's “detective work
– numerical detective work – or counting detective work – or graphical
detective work” (Tukey, 1977).
2 / Examples - a / Facebook
Facebook is the largest social media platform, with 1.5B monthly active
users. It incorporates networked communication (friend-to-friend), group
communication (Facebook Groups), and "mass" communication (Facebook Pages).
Many analytical possibilities disappeared in April 2015 due to a
comprehensive push for more privacy; open FB Groups and FB Pages are
now the main entry points.
Extraction tool used: Netvizz (Rieder 2013)
Main example: Kullena Khaled Said Page (Rieder et al. 2015)
FB Pages allow for retrieval
of historical data without
time limit.
14K posts, 1.9M active
users, 6.8M comments
(99.9% Arabic), 32M likes
Kullena Khaled Said was
created in June 2010 by
Wael Ghonim after Khaled
Said was beaten to death
by Egyptian police.
There is a lot of material for
analysis, but these numbers
need extensive data critique.
Data quality is high but the
platform is complex and
changing over time.
Is the linked content part
of the data?
These elements can drown
in a large data set and
skew it.
The quantitative is full of
qualitative considerations.
Kullena Khaled Said, June 2010 – July 2013 posts per
comment (timescatter)
Kullena Khaled Said, June 2010 – July 2013 posts per
comment (timescatter), y-scale log10
Kullena Khaled Said, June 2010 – July 2013 page
posts (n=14,072) by type, per month
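A chart like the monthly breakdown of post types rests on a simple group-and-count "view". The sketch assumes posts are available as dictionaries with a `created` datetime and a `type` string; these field names and the example records are hypothetical and will differ from the columns of any particular export, such as Netvizz's.

```python
from collections import Counter
from datetime import datetime

def posts_by_type_per_month(posts):
    """Count posts per (month, type) pair, e.g. as input for a stacked bar chart."""
    counts = Counter()
    for post in posts:
        month = post["created"].strftime("%Y-%m")
        counts[(month, post["type"])] += 1
    return counts

# Hypothetical records; a real export has its own column names and formats.
example_posts = [
    {"created": datetime(2010, 6, 10), "type": "status"},
    {"created": datetime(2010, 6, 20), "type": "photo"},
    {"created": datetime(2010, 6, 21), "type": "status"},
    {"created": datetime(2010, 7, 1), "type": "link"},
]
monthly = posts_by_type_per_month(example_posts)
```

Because a `Counter` returns 0 for missing keys, months without a given post type need no special handling when the counts are laid out as a month-by-type table.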
Kullena Khaled Said, June 2010 – July 2013
Overview statistics
Kullena Khaled Said, June 2010 – July 2013
Comment speed
Kullena Khaled Said, June 2010 – July 2013
Comment length in characters
Kullena Khaled Said, June 2010 – July 2013
Rank-size distribution of ranked users (n = 1.9M) and likes/comments
"Distant reading" 1:
Tag cloud tool for
comments on a post
Distant reading 2:
The comment search tool
allows for exploration of
comment contents.
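The counts behind a tag cloud or a first pass of "distant reading" can be sketched as simple token frequencies. This assumes a naive regular-expression tokenizer: Python's `\w` matches Unicode letters, so Arabic text is split on spaces and punctuation, but proper Arabic analysis would also need normalization and handling of clitics, which this sketch does not attempt. The stopword set is left to the caller.

```python
import re
from collections import Counter

def term_frequencies(texts, stopwords=frozenset(), top=50):
    """Return the `top` most frequent lowercased tokens across all texts."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"\w+", text.lower())
        counts.update(token for token in tokens if token not in stopwords)
    return counts.most_common(top)
```

Such frequencies only point to where to look; the comments themselves still have to be read (and, here, translated) to know what a frequent term actually means in context.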
Manual translation: we used
quantitative indicators to
select posts and comments
for qualitative analysis
Bipartite comment network
June 2010 – July 2013
Nodes: posts (date: heat scale) / users (grey)
Edges: commenting (invisible)
Bipartite comment network
June 2010 – July 2013
Nodes: users (degree: heat scale)
Edges: commenting (invisible)
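The bipartite comment network links users to the posts they commented on. A minimal sketch follows, assuming comments are available as (user, post) pairs (a hypothetical input format). It computes the weighted bipartite edges, the user degree used for coloring in the second figure, and, as an optional further step not shown in these figures, a one-mode projection onto users. For a page with millions of commenters, the projection grows quadratically with the number of commenters per post and can become very large.

```python
from collections import Counter, defaultdict
from itertools import combinations

def bipartite_edges(comments):
    """comments: iterable of (user, post) pairs, one per comment.
    Returns weighted user-post edges (weight = number of comments)."""
    return Counter(comments)

def user_degrees(edges):
    """Degree of each user node: number of distinct posts the user commented on."""
    degrees = Counter()
    for user, _post in edges:
        degrees[user] += 1
    return degrees

def project_onto_users(edges):
    """Link two users if they commented on the same post; weight = shared posts."""
    users_by_post = defaultdict(set)
    for user, post in edges:
        users_by_post[post].add(user)
    shared = Counter()
    for users in users_by_post.values():
        for u, v in combinations(sorted(users), 2):
            shared[(u, v)] += 1
    return shared

# Made-up example: u1 commented twice on p1.
comments = [("u1", "p1"), ("u1", "p1"), ("u2", "p1"),
            ("u1", "p2"), ("u2", "p2"), ("u3", "p2")]
edges = bipartite_edges(comments)
```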
SIOTW Page Network, from DMI
project on right-wing extremism
and anti-Islamism
FB like network, seed: SIOTW, depth: 2,
size: in-degree, color: modularity
FB like network, seed: SIOTW, depth: 2,
size: in-degree, color (heat): PageRank
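PageRank, used for coloring in the like network, rewards nodes that receive links from nodes that are themselves well linked. The sketch below is the textbook power-iteration formulation with a damping factor and even redistribution of "dangling" nodes (nodes without out-links); it is not necessarily identical to Gephi's implementation, for instance in how edge weights or convergence are handled. Edges are assumed to point from the liking page to the liked page.

```python
def pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank for a directed graph given as (src, dst) pairs."""
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)
    n = len(nodes)
    rank = {node: 1 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        new_rank = {node: (1 - damping) / n + damping * dangling / n for node in nodes}
        for src in nodes:
            if out_links[src]:
                share = damping * rank[src] / len(out_links[src])
                for dst in out_links[src]:
                    new_rank[dst] += share
        change = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank
```

In the small graph a→c, b→c, c→a, page c ends up highest (two incoming likes), a second (one like, but from c), and b lowest at the baseline value of 0.15/3 = 0.05; the scores always sum to 1.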
2 / Examples – a / Facebook
For Kullena Khaled Said, we were not only able to confirm the importance
of the page for the Egyptian revolution, but also to gain a much better
understanding of the dynamics of "connective action" (Bennett &
Segerberg) and of what we called "connective leadership".
For the SIOTW network of self-declared affiliations, we were able to
nuance the complicated and skewed relationship between right-wing anti-
Islamism and Israeli actors and institutions.
While API-based research into private relations and interactions on
Facebook has become practically impossible, there are many opportunities
for investigating public (Pages) and semi-public (Groups) settings.
2 / Examples – b / Twitter
While Twitter has fewer users than Facebook (320M MAU), it is used heavily
in media debate, political conversation, and activism.
Twitter has very few privacy limitations, but data needs to be captured in
real time: access to the archive of historical tweets has to be paid for, although a 1% sample is available.
Extraction tool used: DMI-TCAT (Borra & Rieder 2014)
Main example: #gamergate
#gamergate project preliminary exploration:
is it about "ethics in game journalism" or a
neo-conservative hate movement?
There are counts everywhere,
but anything here can be
exploited for analysis.
Because of temporal limitations,
Twitter analysis means creating
databases of collected tweets.
DMI-TCAT, analysis interface
#gamergate in September 2015
DMI-TCAT allows tracking keywords,
user accounts, and the 1% sample.
DMI-TCAT, analysis interface
#gamergate in September 2015
Medium specificity: legal elements
Medium specificity: technical and functional elements
DMI-TCAT & gephi, #gamergate
in September 2015
Top 5000 user network
DMI-TCAT & gephi, #gamergate
in September 2015
Top 5000 users mention stats:
Mean: 89
Median: 8
p90: 124 / p95: 279 / p99: 1943
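The gap between the mean (89) and the median (8) indicates a heavily skewed distribution, which is why high percentiles are reported alongside. The sketch below uses the nearest-rank definition of a percentile, which is only one of several conventions; DMI-TCAT or other tools may compute percentiles differently, and the example values in the usage note are made up.

```python
import math
import statistics

def percentile(values, p):
    """Nearest-rank percentile: smallest value with at least p% of the data at or below it."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def summarize_counts(counts):
    """Mean, median, and upper percentiles of per-user counts (e.g. mentions)."""
    summary = {"mean": statistics.mean(counts), "median": statistics.median(counts)}
    for p in (90, 95, 99):
        summary[f"p{p}"] = percentile(counts, p)
    return summary
```

For 90 users with one mention, nine with 50, and a single account with 5,000, the median is 1 while the mean is 55.4, pulled up almost entirely by that one account.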
DMI-TCAT & gephi, #gamergate
in September 2015
Top 5000 user network:
Avg. degree: 33
Avg. weighted degree: 67.3
Avg. path length: 2.97
DMI-TCAT & gephi, #gamergate
in September 2015
Co-hashtag analysis, size:
frequency, color: degree
DMI-TCAT & gephi, #gamergate
in September 2015
Co-hashtag analysis, size:
frequency, color: user diversity
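Co-hashtag analysis links hashtags that appear in the same tweet. The sketch assumes tweets are available as (user, hashtags) pairs and takes "user diversity" to mean the number of distinct users per hashtag; both are assumptions, and DMI-TCAT's own definitions may differ. Comparing frequency with user diversity helps spot hashtags pushed by a handful of (possibly automated) accounts.

```python
from collections import Counter, defaultdict
from itertools import combinations

def cohashtag_network(tweets):
    """tweets: iterable of (user, hashtags) pairs.
    Returns hashtag frequency, user diversity, and co-occurrence edge weights."""
    frequency = Counter()
    users = defaultdict(set)
    edges = Counter()
    for user, tags in tweets:
        tags = sorted({tag.lower() for tag in tags})  # count each hashtag once per tweet
        for tag in tags:
            frequency[tag] += 1
            users[tag].add(user)
        for a, b in combinations(tags, 2):
            edges[(a, b)] += 1
    diversity = {tag: len(tag_users) for tag, tag_users in users.items()}
    return frequency, diversity, edges

# Made-up example tweets.
tweets = [
    ("u1", ["#GamerGate", "#ethics"]),
    ("u1", ["#gamergate", "#ethics"]),
    ("u2", ["#gamergate", "#sjw"]),
]
frequency, diversity, edges = cohashtag_network(tweets)
```

Here #ethics appears twice but is used by only one account, a pattern that, at scale, is worth checking by reading the underlying tweets.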
DMI-TCAT (cascade interface), x: time, y: user account
point: tweet, arc: retweet, bots in red
Associational profile around
#feminism in #gamergate dataset
2 / Examples – b / Twitter
Twitter is a very open platform; the main problem is the need to
anticipate events or react quickly, since historical tweets are costly.
Since tweets can easily be sent by bots and automation tools, we have to be
very careful with metrics and always check them from a number of different perspectives.
For #gamergate, first findings show a very densely connected community
organized around a group of highly active and visible accounts.
Hashtag use (discounting bots) is dominated by outrage against perceived
"minority favoritism", "social justice warriors", and anti-abuse measures;
"ethics in journalism" is not prominent at all.
2 / Examples - c / YouTube
YouTube is perhaps the large social media platform (1B+ users) that has
been least studied with digital methods.
YouTube is probably the most open social media platform, with very few
limitations on the API level.
YouTube Data Tools (YTDT), a new tool, is an attempt to facilitate
data-driven research.
YouTube Data Tools
Extracts Data from YouTube
Channel Network uses data from the
"Featured Channels", which allows for
self-affiliation with other channels.
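Both the Facebook like network (depth 2) and the YouTube channel network (depth 1) are built by crawling outward from seeds along self-declared links. A breadth-first sketch of such a depth-limited crawl follows; `get_links` is a hypothetical stand-in for whatever API call returns a page's likes or a channel's featured channels. In this sketch, nodes at the maximum depth are kept but not expanded, so links among them are not collected; actual tools may handle that last layer differently.

```python
from collections import Counter, deque

def crawl(seeds, get_links, depth):
    """Breadth-first crawl from seed nodes up to `depth` steps.
    get_links(node) -> iterable of linked nodes (hypothetical API wrapper).
    Returns the set of visited nodes and the directed edges found."""
    level = {seed: 0 for seed in seeds}
    edges = []
    queue = deque(level)  # each distinct seed once
    while queue:
        node = queue.popleft()
        if level[node] >= depth:
            continue  # nodes at the maximum depth are not expanded
        for target in get_links(node):
            edges.append((node, target))
            if target not in level:
                level[target] = level[node] + 1
                queue.append(target)
    return set(level), edges

def in_degree(edges):
    """Number of incoming links per node, e.g. for sizing or coloring nodes."""
    return Counter(dst for _src, dst in edges)

# Made-up "featured channels" links.
links = {"A": ["B", "C"], "B": ["D"], "C": ["A"], "D": ["E"]}
nodes, edges = crawl(["A"], lambda n: links.get(n, []), depth=2)
```

With `depth=2`, the crawl reaches A, B, C, and D but stops before E; which seeds are chosen and how deep one crawls directly shapes the resulting network and should be reported with it.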
Gamergate channel network, via YouTube
channel search, depth: 1;
Size: subscriber count / Color: seed or not
Gamergate channel network, via YouTube
channel search, depth: 1;
Size: subscriber count / Color: in-degree
Gamergate channel network, via YouTube
channel search, depth: 1;
Size: subscriber count / Color: betweenness
3 / Conclusions
Social media analysis with digital methods relies on the "natively digital
objects" (Rogers 2013) that platforms are built around; technical
mediation intervenes in all stages of the research process.
Despite the promise of easy access to well-structured data, there are
considerable difficulties and limitations.
Digital methods is not a one-click type of research, but requires
considerable time and critical interrogation to produce robust results:
which objects to take into account, how to create a sample / collection,
how to analyze it, how to interpret, how to make findings.
3 / Conclusions
In order to deal with big and complex datasets, we need exploratory
approaches that combine micro/macro and qualitative/quantitative in
various ways:
☉ Investigate the platform in detail to account for technical pitfalls.
☉ Qualify quantities.
☉ Gain a sense of practices to orient quantitative methods.
☉ Use quantitative indicators to decide on qualitative focus.
☉ Read content to understand outliers.
☉ Make explicit plausibility tests based on reading.
☉ Interpret the small in relation to the large and the other way round.
With n = all, these articulations have become much more feasible.
Every analytical gesture shows different things; combining them completes the
picture. We need "flexibility of attack, willingness to iterate" (Tukey 1962).
3 / Conclusions
There is a lot of excitement about social media data analysis, but our
techniques are often still experimental and far from standardized.
We need interrogation and critiques of methodology that are developed
from engagement and historical / conceptual investigation.
We need analytical gestures that are more closely tied to concepts from
the humanities and social sciences.
Visualization and simple tools are very interesting, but require technical
and conceptual literacy to deliver more than (deceptive) illustrations.
3 / Conclusions
Data analysis for social media requires (in my view):
☉ Robust understanding of the social media platform;
☉ A sense of purpose;
☉ Conceptual understanding of methods and analytical gestures;
☉ Knowledge of software tools for data analysis;
☉ Considerable domain expertise.
If you think that these approaches can be interesting for your research, I
would recommend simply trying out some of the tools to get first-hand experience.
Thank You!
All mentioned data extraction tools are freely available via and

Truth, Justice, and Technicity: from Bias to the Politics of SystemsBernhard Rieder
On Digital Markets, Data, and Concentric Diversification
On Digital Markets, Data, and Concentric DiversificationOn Digital Markets, Data, and Concentric Diversification
On Digital Markets, Data, and Concentric DiversificationBernhard Rieder
On the Diversity of the Accountability Problem. Machine Learning and Knowing ...
On the Diversity of the Accountability Problem. Machine Learning and Knowing ...On the Diversity of the Accountability Problem. Machine Learning and Knowing ...
On the Diversity of the Accountability Problem. Machine Learning and Knowing ...Bernhard Rieder
Tweets are Not Created Equal. Intersecting Devices in the 1% Sample
Tweets are Not Created Equal. Intersecting Devices in the 1% SampleTweets are Not Created Equal. Intersecting Devices in the 1% Sample
Tweets are Not Created Equal. Intersecting Devices in the 1% SampleBernhard Rieder
Digitale Methoden und soziale Netzwerkplattformen. Zwischen Mediumspezifizitä...
Digitale Methoden und soziale Netzwerkplattformen. Zwischen Mediumspezifizitä...Digitale Methoden und soziale Netzwerkplattformen. Zwischen Mediumspezifizitä...
Digitale Methoden und soziale Netzwerkplattformen. Zwischen Mediumspezifizitä...Bernhard Rieder
Platforms and Analytical Gestures
Platforms and Analytical GesturesPlatforms and Analytical Gestures
Platforms and Analytical GesturesBernhard Rieder
Interactive visualization and exploration of network data with gephi
Interactive visualization and exploration of network data with gephiInteractive visualization and exploration of network data with gephi
Interactive visualization and exploration of network data with gephiBernhard Rieder
Figures of the Many - Quantitative Concepts for Qualitative Thinking
Figures of the Many - Quantitative Concepts for Qualitative ThinkingFigures of the Many - Quantitative Concepts for Qualitative Thinking
Figures of the Many - Quantitative Concepts for Qualitative ThinkingBernhard Rieder
ORDER BY column_name: The Relational Database as Pervasive Cultural Form
ORDER BY column_name: The Relational Database as Pervasive Cultural FormORDER BY column_name: The Relational Database as Pervasive Cultural Form
ORDER BY column_name: The Relational Database as Pervasive Cultural FormBernhard Rieder
Hyperurbain.2 - Atelier Google Maps
Hyperurbain.2 - Atelier Google MapsHyperurbain.2 - Atelier Google Maps
Hyperurbain.2 - Atelier Google MapsBernhard Rieder

Plus de Bernhard Rieder (12)

From Algorithms to Diagrams: How to Study Platforms?
From Algorithms to Diagrams: How to Study Platforms?From Algorithms to Diagrams: How to Study Platforms?
From Algorithms to Diagrams: How to Study Platforms?
De l’algorithme au diagramme: comment étudier l’objet « plateforme » ?
De l’algorithme au diagramme: comment étudier l’objet « plateforme » ?De l’algorithme au diagramme: comment étudier l’objet « plateforme » ?
De l’algorithme au diagramme: comment étudier l’objet « plateforme » ?
Truth, Justice, and Technicity: from Bias to the Politics of Systems
Truth, Justice, and Technicity: from Bias to the Politics of SystemsTruth, Justice, and Technicity: from Bias to the Politics of Systems
Truth, Justice, and Technicity: from Bias to the Politics of Systems
On Digital Markets, Data, and Concentric Diversification
On Digital Markets, Data, and Concentric DiversificationOn Digital Markets, Data, and Concentric Diversification
On Digital Markets, Data, and Concentric Diversification
On the Diversity of the Accountability Problem. Machine Learning and Knowing ...
On the Diversity of the Accountability Problem. Machine Learning and Knowing ...On the Diversity of the Accountability Problem. Machine Learning and Knowing ...
On the Diversity of the Accountability Problem. Machine Learning and Knowing ...
Tweets are Not Created Equal. Intersecting Devices in the 1% Sample
Tweets are Not Created Equal. Intersecting Devices in the 1% SampleTweets are Not Created Equal. Intersecting Devices in the 1% Sample
Tweets are Not Created Equal. Intersecting Devices in the 1% Sample
Digitale Methoden und soziale Netzwerkplattformen. Zwischen Mediumspezifizitä...
Digitale Methoden und soziale Netzwerkplattformen. Zwischen Mediumspezifizitä...Digitale Methoden und soziale Netzwerkplattformen. Zwischen Mediumspezifizitä...
Digitale Methoden und soziale Netzwerkplattformen. Zwischen Mediumspezifizitä...
Platforms and Analytical Gestures
Platforms and Analytical GesturesPlatforms and Analytical Gestures
Platforms and Analytical Gestures
Interactive visualization and exploration of network data with gephi
Interactive visualization and exploration of network data with gephiInteractive visualization and exploration of network data with gephi
Interactive visualization and exploration of network data with gephi
Figures of the Many - Quantitative Concepts for Qualitative Thinking
Figures of the Many - Quantitative Concepts for Qualitative ThinkingFigures of the Many - Quantitative Concepts for Qualitative Thinking
Figures of the Many - Quantitative Concepts for Qualitative Thinking
ORDER BY column_name: The Relational Database as Pervasive Cultural Form
ORDER BY column_name: The Relational Database as Pervasive Cultural FormORDER BY column_name: The Relational Database as Pervasive Cultural Form
ORDER BY column_name: The Relational Database as Pervasive Cultural Form
Hyperurbain.2 - Atelier Google Maps
Hyperurbain.2 - Atelier Google MapsHyperurbain.2 - Atelier Google Maps
Hyperurbain.2 - Atelier Google Maps


Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxMaryGraceBautista27
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfTechSoup
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomnelietumpap1
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4MiaBumagat1
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Celine George
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...JhezDiaz1
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfMr Bounab Samir
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxCarlos105
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu

Dernier (20)

Science 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptxScience 7 Quarter 4 Module 2: Natural Resources.pptx
Science 7 Quarter 4 Module 2: Natural Resources.pptx
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
ENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choomENGLISH6-Q4-W3.pptxqurter our high choom
ENGLISH6-Q4-W3.pptxqurter our high choom
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17Computed Fields and api Depends in the Odoo 17
Computed Fields and api Depends in the Odoo 17
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
ENGLISH 7_Q4_LESSON 2_ Employing a Variety of Strategies for Effective Interp...
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdfLike-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Like-prefer-love -hate+verb+ing & silent letters & citizenship text.pdf
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners

Analyzing Social Media with Digital Methods. Possibilities, Requirements, and Limitations

  • 1. Analyzing Social Media with Digital Methods Possibilities, Requirements, and Limitations Bernhard Rieder Universiteit van Amsterdam Mediastudies Department
  • 2. The starting point Social media are playing important roles in contemporary society, from the very personal to the very public. Many disciplines have begun to study social media, applying various methodologies (ethnography, questionnaires, etc.), but there is an explosion in data-driven research that relies on the computational analysis of data gleaned from social media platforms. The promise is (cheap and detailed) access to what people do, not what they say they do; to their behavior, exchange, ideas, and sentiments.
  • 3. This presentation This talk introduces social media analysis using digital methods from a theoretically involved yet "practical" perspective. Instead of laying out an overarching "logic" of social media data analysis, I focus on the basic setup and the rich reservoir of analytical gestures that constitute the practice of data analysis. 1 / A (long) introduction 2 / Three examples covering Facebook, Twitter, and YouTube 3 / Some conclusions and recommendations
  • 4. 1 / Introduction Social media services host an increasing number of relevant phenomena, including everyday practices, political presentation and debate, social and political activism, disaster communication, etc. A number of preliminary remarks: ☉ The phenomena one is interested in may not happen or resonate on social media; many things happen elsewhere. ☉ Even if one's research focus is on social media, one may not get the data. ☉ One requires at least some technical competence and the willingness to confront and learn about a number of technical matters. ☉ Every social media "platform" (Gillespie 2013) is different and requires a different approach; cf. "medium-specificity".
  • 5. 1 / Introduction Hypothetico-deductive approaches are certainly possible, but this presentation espouses inductive "exploratory data analysis" (Tukey 1962) that emphasizes iteration, methodological flexibility, adjustment of questions, and "grounded theory" (Glaser & Strauss 1965). "Far better an approximate answer to the right question, which is often vague, than an exact answer to the wrong question, which can always be made precise. Data analysis must progress by approximate answers, at best, since its knowledge of what the problem really is will at best be approximate." (Tukey 1962)
  • 6. 1 / Introduction How does social media analysis with digital methods work? Four layers of technical mediation that one might want to think about: (1) the social media platform, e.g. Twitter, Facebook: users communicate, interact, express, publish, etc. through "grammars of action" (forms and functions) rendered in software; (2) the API: the technical interface to the data, defined in technical, legal, and logistical terms; (3) the extraction software, e.g. DMI-TCAT, Netvizz: makes calls to the API, creates "views" by combining data into specific sets or metrics, and produces outputs, either a widget (output type 1) that provides a visual or textual representation of a view, e.g. an interactive chart, or a file (output type 2) with data in a standard file format, e.g. CSV; (4) the analysis software, e.g. Excel, gephi: allows analyzing files in various ways, e.g. statistics, graph theory.
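As a minimal sketch of output type 2 (a file), assuming records have already been retrieved from an API as Python dictionaries: the extraction step selects a few fields and writes them to CSV, producing a "view" that any analysis software can open. The field names (`id`, `type`, `comments_count`) and the sample records are hypothetical and do not reflect any particular platform's schema.

```python
import csv
import io


def records_to_csv(records: list[dict], fields: list[str]) -> str:
    """Write the selected fields of API records (dicts) as CSV text."""
    buffer = io.StringIO()
    # extrasaction="ignore" drops keys not listed in `fields`;
    # keys missing from a record are written as empty cells.
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical post records, as an extraction tool might receive them.
posts = [
    {"id": "p1", "type": "photo", "comments_count": 12, "raw": {"nested": True}},
    {"id": "p2", "type": "status", "comments_count": 3},
]
print(records_to_csv(posts, ["id", "type", "comments_count"]))
```

The choice of fields is itself an analytical decision: whatever is not written to the file is invisible to the analysis software downstream.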
  • 7. 1 / Introduction - a / the platform Social media services channel communication, interaction, etc. through "grammars of action" (forms and functions) rendered in software; users appropriate these affordances. Every service is different. Every service changes over time, both in terms of technology and user practices. Homogeneous interfaces do not mean homogeneous practices. Platforms strive to capture large audiences and leave important margins to users.
  • 8. Social media platforms are organized around instances of predefined types of entities (users, messages, hashtags, posts, etc.) and connections between them. They formalize and channel expression, exchange, and coordination, and data fields are closely related to these formalizations.
  • 9. Data fields mirror forms and functions of the platform.
  • 10. Social media are different from the "open" Web because most data is formalized in fields and a "semantic data model". The more detailed the formalization, the more salient the data. Social media platforms are essentially large databases. 1 / Introduction - a / the platform
  • 11. Very large numbers and variety in users, contents, purposes, arrangements, etc.
  • 12. Social media are built around simple point-to-point principles; this allows for a variety of configurations to emerge over time. Every account is the same, but there are vast differences in scale. We need to begin with technical fieldwork and conceptualization of the platform.
  • 13. 1 / Introduction - b / the APIs There are two ways to collect data automatically from social media platforms: scraping the user interface or collecting via specified application programming interfaces (APIs). APIs specify (technically, legally, logistically): ☉ What data can be retrieved (certain fields may be inaccessible or incomplete); ☉ How much data can be retrieved (all APIs have rate limits); ☉ The span of coverage (temporal limitations often apply); ☉ The perceptivity of coverage (privacy or personalization can skew access). For example, Facebook (currently) provides these variables for each post:

                              comment   like   share
      count                   yes       yes    yes
      individual user list    yes       yes    no
      time-stamp              yes       no     no
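Because all APIs have rate limits, extraction software typically pages through results while staying under a call budget. The following is a minimal sketch under stated assumptions: `fetch_page` is a hypothetical function that takes a cursor (None for the first page) and returns `(items, next_cursor)`, and the limit is modeled as a fixed number of calls per time window. Real platforms define their own cursor schemes and limit rules, which this does not reproduce.

```python
import time
from typing import Callable, Iterator, Optional

# Hypothetical page fetcher: cursor in, (items, next_cursor) out;
# next_cursor is None when there are no further pages.
FetchPage = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def paginate(fetch_page: FetchPage, max_calls_per_window: int,
             window_seconds: float, sleep=time.sleep) -> Iterator[dict]:
    """Yield items page by page, pausing when the call budget for a window is spent."""
    cursor: Optional[str] = None
    calls_in_window = 0
    window_start = time.monotonic()
    while True:
        if calls_in_window >= max_calls_per_window:
            # Wait out the remainder of the current window, then start a new one.
            elapsed = time.monotonic() - window_start
            if elapsed < window_seconds:
                sleep(window_seconds - elapsed)
            calls_in_window = 0
            window_start = time.monotonic()
        items, cursor = fetch_page(cursor)
        calls_in_window += 1
        yield from items
        if cursor is None:
            break


# Usage with a fake, in-memory "API" of three pages.
fake_pages = {None: ([{"id": 1}, {"id": 2}], "a"), "a": ([{"id": 3}], "b"), "b": ([], None)}
for item in paginate(lambda c: fake_pages[c], max_calls_per_window=2, window_seconds=1.0):
    print(item["id"])
```

Passing `sleep` as a parameter keeps the waiting logic testable without actually pausing.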
  • 14. Social media users produce detailed data traces; data pools in social media are centralized and retrievable. Structure of APIs is closely related to given formalizations. In order to select, process, and interpret data we need to understand the platform: entities, relations, modes of aggregation, metrics, etc. Every platform is different and we thus need medium-specific data analysis.
  • 15. 1 / Introduction - c / the extraction software Extraction software consists of the programs that connect to the APIs, retrieve data, and produce specific outputs; it can range from custom-written scripts to one-click visualization widgets. These programs work with API data, but add their own "epistemological twist", i.e. they produce particular views on the data. Sampling is often difficult; therefore, n = all is the norm. Extraction software can be very simple and completely free or have steep technical, logistical, and financial requirements.
  • 16. Example of a widget: Hashtagify
  • 17. Example of a commercial service: Topsy
  • 18. Example of an open-source analytics suite: DMI-TCAT
  • 19. Example of an open-source analytics suite: DMI-TCAT
  • 20. There are many different tools out there, with different conceptual underpinnings, ease of use, depth, etc. Data analysis (statistics): Excel, SPSS, Tableau, Wizard, Mondrian, … Data analysis (graph): Gephi, NodeXL, Pajek, … Data analysis (other): Rapidminer, SentiStrength, Wordij, … Data analysis (custom): R, Python (NLTK, NumPy & SciPy), … This presentation relies mostly on R (R Core Team 2014) and Gephi (Bastian, Heymann, Jacomy 2009). 1 / Introduction - d / the analysis software
  • 21. 1 / Introduction - d / the analysis software Analysis software provides analytical gestures to apply to the data; it may or may not be integrated into the extraction software. We investigate the structure of data by creating "views" of the data. Analytical gestures produce orderings, lists, tables, charts, coefficients, etc. that say something about the data and thus the phenomenon. Flusser (1991) describes gestures as having convention and structure, but as different from reflexes because they translate a moment of freedom. The notion of gesture indicates that data does not speak for itself; we approach it with particular epistemic techniques (methods) related to a sense of purpose, a "will to know" (Foucault 1976).
  • 22. Analytical gestures develop from the tension between a "research purpose" (question, exploration, etc.) and the available data: The technical dimension of data (via platform, API, extraction software): ☉ Available units, variables, etc. ☉ Temporal coverage, completeness, perceptivity, etc. ☉ Technical formats, available "views", etc. The semantic dimension of data (aspects of practice): ☉ Demographic (age, sex, income, etc.) ☉ Post-demographic (tastes, preferences, etc.) ☉ Behavioral (trajectories, interaction, etc.) ☉ Expressive (messages, comments, etc.) ☉ Technical (informing on the platform's functioning) 1 / Introduction - d / the analysis software
  • 23. 1 / Introduction - d / the analysis software
    Statistics: ☉ Observed: objects and properties ("cases") ☉ Data representation: the table ☉ Visual representation: quantity charts ☉ Inferred: relations between properties ☉ Grouping: class (similar properties)
    Graph theory: ☉ Observed: objects and relations ☉ Data representation: the adjacency matrix ☉ Visual representation: network diagrams ☉ Inferred: structure of relations between objects ☉ Grouping: clique (dense relations)
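To make the graph-theory data representation concrete, here is a small sketch in plain Python (the slides themselves rely on R and gephi) that turns a list of relations into an adjacency matrix: every row and column stands for an object, and a cell counts the relations between two objects. The account names are hypothetical.

```python
def adjacency_matrix(edges, directed=False):
    """Return (nodes, matrix); matrix[i][j] counts relations from nodes[i] to nodes[j]."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for source, target in edges:
        matrix[index[source]][index[target]] += 1
        if not directed and source != target:
            # Undirected relations are mirrored, so the matrix is symmetric.
            matrix[index[target]][index[source]] += 1
    return nodes, matrix


# Hypothetical "friend" relations between three accounts.
nodes, matrix = adjacency_matrix([("alice", "bob"), ("bob", "carol")])
for name, row in zip(nodes, matrix):
    print(f"{name:>6}", row)
```

A statistical table, by contrast, would have one row per account and one column per property (e.g. number of posts), with no cell relating two accounts to each other.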
  • 24. 1 / Introduction - d / the analysis software Quetelet 1827, Galton 1885, Pearson 1901. Regression, PCA, etc. are potentially useful.
  • 25. Entities seem straightforward because data is well structured, but variations in scale and practice require being careful. Descriptive statistics for social media often profit from attention to the form of a distribution; visualization, multi-point summaries, and metrics like kurtosis or skewness are very useful. 1 / Introduction - d / the analysis software
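A minimal sketch of attending to the form of a distribution with Python's standard library: a five-point summary plus moment-based skewness and excess kurtosis. It assumes at least two non-identical values and uses population (uncorrected) moments; statistics packages that apply small-sample corrections will report slightly different numbers.

```python
import statistics


def describe_shape(values: list[float]) -> dict:
    """Five-point summary plus moment-based skewness and excess kurtosis."""
    n = len(values)
    mean = statistics.fmean(values)
    # Population central moments (no small-sample correction).
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {
        "min": min(values), "q1": q1, "median": median, "q3": q3, "max": max(values),
        "mean": mean,
        "skewness": m3 / m2 ** 1.5,          # > 0: long tail to the right
        "excess_kurtosis": m4 / m2 ** 2 - 3,  # > 0: heavier tails than a normal distribution
    }


# Hypothetical comment counts per post: most posts get few comments, one gets many.
print(describe_shape([1, 1, 2, 3, 50]))
```

On social media data, a large gap between mean and median together with high skewness is usually the first sign that a few entities dominate.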
  • 26. 1 / Introduction - d / the analysis software Moreno 1934, Forsythe and Katz 1946 Graph theory, "a mathematical model for any system involving a binary relation" (Harary 1969)
  • 27. Three different force-based layouts of my FB profile: OpenOrd, ForceAtlas, Fruchterman-Reingold
  • 28. Non-force-based layouts: circle diagram, parallel bubble lines, arc diagram
  • 29. Nine measures of centrality (Freeman 1979). Network statistics (e.g. degrees, distances, density, etc.) can help describe and compare networks. Graph theory also provides many mathematical tools to derive metrics from the structure of a network (e.g. "centrality", "influence", "authority", etc.), to identify groupings, etc.
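As an illustration of deriving metrics from network structure, here is a sketch of two of the simplest: density and normalized degree centrality (degree divided by the maximum possible degree, n - 1). It assumes a simple undirected graph given as an edge list; self-loops and duplicate edges are ignored, and at least two connected nodes are required. Other measures such as closeness or betweenness need shortest-path computations, which tools like gephi provide.

```python
from collections import defaultdict


def degree_centrality_and_density(edges):
    """Normalized degree centrality per node and density of a simple undirected graph."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops; sets also collapse duplicate edges
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    m = sum(len(nb) for nb in neighbors.values()) // 2  # each edge counted twice
    centrality = {node: len(nb) / (n - 1) for node, nb in neighbors.items()}
    density = 2 * m / (n * (n - 1))
    return centrality, density


# A hypothetical star: one hub connected to three accounts.
centrality, density = degree_centrality_and_density([("hub", "a"), ("hub", "b"), ("hub", "c")])
print(centrality, density)
```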
  • 30. "Facebook Likes can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender." (Kosinski, Stillwell, Graepel 2013) There are many new(ish) techniques coming from computer science for automatic classification, prediction, sentiment analysis, etc. 1 / Introduction - d / the analysis software
  • 31. 1 / Introduction - conclusion Four layers of technical mediation to take into account: the platform itself, the API, the extraction software, the analytical techniques. To do productive work, attention to these four layers needs to be combined with theoretical resources and case knowledge. Bringing this together requires iteration and flexibility; it's “detective work – numerical detective work – or counting detective work – or graphical detective work” (Tukey, 1977).
  • 32. 2 / Examples - a / Facebook Facebook is the largest social media platform with 1.5B monthly active users. It incorporates networked communication (friend-to-friend), group communication (Facebook Groups), and "mass" communication (Facebook Pages). A lot of analytical possibilities disappeared in April 2015 due to a comprehensive push for more privacy; open FB Groups and FB Pages are now the main entryways. Extraction tool used: Netvizz (Rieder 2013) Main example: Kullena Khaled Said Page (Rieder et al. 2015)
  • 33. FB Pages allow for retrieval of historical data without time limit. 14K posts, 1.9M active users, 6.8M comments (99.9% Arabic), 32M likes. Kullena Khaled Said was created in June 2010 by Wael Ghonim after Khaled Said was beaten to death by Egyptian police.
  • 34. Variables available per post:

                              comment   like   share
      count                   yes       yes    yes
      individual user list    yes       yes    no
      time-stamp              yes       no     no

    There is a lot of material for analysis, but these numbers need extensive data critique.
  • 35. Data quality is high but the platform is complex and changing over time. Is the linked content part of the data? These elements can drown in a large data set and skew it. The quantitative is full of qualitative considerations.
  • 37. Kullena Khaled Said, June 2010 – July 2013: posts per comment (timescatter), y-scale log10. [Chart: x-axis date (2010-06-10 to 2013-07-03), y-axis comments_count_fb; points colored by post type: link, music, photo, question, status, video.]
  • 39. Kullena Khaled Said, June 2010 – July 2013 Overview statistics
  • 40. Kullena Khaled Said, June 2010 – July 2013 Comment speed
  • 41. Kullena Khaled Said, June 2010 – July 2013 Comment length in characters
  • 42. Kullena Khaled Said, June 2010 – July 2013 Rank-size distribution of ranked users (n = 1.9M) and likes/comments
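A rank-size distribution of this kind can be sketched by counting activity per user and ordering users from most to least active; plotting count against rank, typically on log-log axes, shows how concentrated participation is. The input format (one user ID per comment or like) and the IDs are assumptions for illustration.

```python
from collections import Counter


def rank_size(user_ids):
    """user_ids: one entry per comment (or like).

    Returns (rank, user, count) tuples from most to least active user."""
    counts = Counter(user_ids)
    return [(rank, user, n) for rank, (user, n) in enumerate(counts.most_common(), start=1)]


# Hypothetical commenters on a Page.
for rank, user, n in rank_size(["u1", "u2", "u1", "u3", "u1", "u2"]):
    print(rank, user, n)
```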
  • 43. "Distant reading" 1: Tag cloud tool for comments on a post
  • 44. Distant reading 2: The comment search tool allows for exploration of comment contents (search terms shown: corruption, torture).
  • 45. Manual translation: we used quantitative indicators to select posts and comments for qualitative analysis
  • 46. Bipartite comment network June 2010 – July 2013 Nodes: posts (date: heat scale) / users (grey) Edges: commenting (invisible)
  • 47. Bipartite comment network June 2010 – July 2013 Nodes: users (degree: heat scale) Edges: commenting (invisible)
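A bipartite comment network has two kinds of nodes, posts and users, and an edge wherever a user commented on a post. As a minimal sketch, assuming comments are available as one `(user_id, post_id)` pair per comment (hypothetical IDs), the edge weights and each user's degree (the number of distinct posts they commented on, as used for the heat scale above) can be computed as:

```python
from collections import Counter


def build_bipartite(comments):
    """comments: one (user_id, post_id) pair per comment.

    Returns edge weights (number of comments per user-post pair) and each
    user's degree (number of distinct posts the user commented on)."""
    weights = Counter(comments)
    degree = Counter(user for user, _post in weights)
    return weights, degree


comments = [("u1", "p1"), ("u2", "p1"), ("u1", "p1"), ("u1", "p2"), ("u2", "p2"), ("u3", "p2")]
weights, degree = build_bipartite(comments)
print(weights, degree)
```

The resulting edge list can be exported (e.g. as CSV) and laid out in gephi.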
  • 48. SIOTW Page Network, from DMI project on right-wing extremism and anti-Islamism
  • 49. FB like network, seed: SIOTW, depth: 2, size: in-degree, color: modularity
  • 50. FB like network, seed: SIOTW, depth: 2, size: in-degree, color (heat): PageRank
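PageRank scores a node highly when it is liked by other highly scored nodes. The sketch below is a textbook power-iteration version on a directed edge list (source likes target), with a damping factor of 0.85 and a fixed number of iterations; it is not gephi's implementation, so values may differ in details such as the stopping criterion. Nodes without outgoing edges spread their score uniformly so that scores keep summing to 1. The page names are hypothetical.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank on a directed edge list of (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    out = {n: [] for n in nodes}
    for source, target in edges:
        out[source].append(target)
    N = len(nodes)
    rank = {n: 1 / N for n in nodes}
    for _ in range(iterations):
        # Score held by nodes without outgoing edges is redistributed uniformly.
        dangling = sum(rank[n] for n in nodes if not out[n])
        new = {n: (1 - damping) / N + damping * dangling / N for n in nodes}
        for source in nodes:
            for target in out[source]:
                new[target] += damping * rank[source] / len(out[source])
        rank = new
    return rank


# Hypothetical like relations between three Pages.
print(pagerank([("page_a", "page_c"), ("page_b", "page_c"), ("page_c", "page_a")]))
```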
  • 51. FB like network, seed: SIOTW, depth: 2, size: in-degree, color: modularity
  • 52. FB like network, seed: SIOTW, depth: 2, size: in-degree, color: modularity
  • 53. 2 / Examples – a / Facebook For Kullena Khaled Said, we were not only able to confirm the importance of the page for the Egyptian revolution, but also gained a much better understanding of the dynamics of "connective action" (Bennett & Segerberg) and what we called "connective leadership". For the SIOTW network of self-declared affiliations, we were able to nuance the complicated and skewed relationship between right-wing anti-Islamism and Israeli actors and institutions. While API-based research into private relations and interactions on Facebook has become practically impossible, there are many opportunities for investigating public (Pages) and semi-public (Groups) settings.
  • 54. 2 / Examples – b / Twitter While Twitter has fewer users than Facebook (320M MAU), it is used a lot in the context of media debate, political conversation, and activism. Twitter has very few privacy limitations, but data needs to be captured in real time. To access the archive, one has to pay. But there is a 1% sample. Extraction tool used: DMI-TCAT (Borra & Rieder 2014) Main example: #gamergate
  • 55. #gamergate project preliminary exploration: is it about "ethics in game journalism" or a neo-conservative hate movement?
  • 56. There are counts everywhere, but anything here can be exploited for analysis. Because of temporal limitations, Twitter analysis means creating databases of collected tweets.
  • 57. DMI-TCAT, analysis interface #gamergate in September 2015 DMI-TCAT allows tracking keywords, user accounts, and the 1% sample.
  • 60. Medium specificity: legal elements / Medium specificity: technical and functional elements
  • 62. DMI-TCAT & gephi, #gamergate in September 2015 Top 5000 user network
  • 63. DMI-TCAT & gephi, #gamergate in September 2015 Top 5000 users mention stats: Mean: 89 Median: 8 p90: 124 / p95: 279 / p99: 1943
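A mean more than ten times the median, as in these mention statistics, signals a heavily skewed distribution, which is why high percentiles are reported alongside it. A minimal sketch of such a summary, assuming the nearest-rank definition of percentiles (DMI-TCAT or other tools may use a different interpolation method, which changes values slightly on small samples); the counts in the usage line are hypothetical.

```python
import statistics


def nearest_rank_percentile(values, p):
    """p-th percentile (integer 1..100) by the nearest-rank method."""
    ordered = sorted(values)
    rank = -(-p * len(ordered) // 100)  # integer ceiling of p * n / 100
    return ordered[rank - 1]


def mention_summary(mention_counts):
    """Mean, median, and upper percentiles of per-user mention counts."""
    return {
        "mean": statistics.fmean(mention_counts),
        "median": statistics.median(mention_counts),
        **{f"p{p}": nearest_rank_percentile(mention_counts, p) for p in (90, 95, 99)},
    }


# Hypothetical mention counts for a handful of users.
print(mention_summary([1, 1, 2, 3, 5, 8, 13, 400]))
```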
  • 64. DMI-TCAT & gephi, #gamergate in September 2015 Top 5000 user network: Avg. degree: 33 Avg. weighted degree: 67.3 Avg. path length: 2.97
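Average degree, average weighted degree, and average path length can be sketched as follows, assuming an undirected graph given as `(a, b, weight)` triples with no duplicate pairs or self-loops and at least one edge. Path length here counts hops (weights are ignored) and is averaged over all reachable ordered pairs, found by breadth-first search. Because this treats the network as undirected, its figures will generally differ from gephi's for a directed mention network.

```python
from collections import defaultdict, deque


def network_stats(weighted_edges):
    """Average degree, average weighted degree, and average shortest path length (hops)."""
    adj = defaultdict(dict)
    for a, b, w in weighted_edges:
        adj[a][b] = w
        adj[b][a] = w
    nodes = list(adj)
    avg_degree = sum(len(nb) for nb in adj.values()) / len(nodes)
    avg_weighted = sum(sum(nb.values()) for nb in adj.values()) / len(nodes)
    total, pairs = 0, 0
    for source in nodes:
        # Breadth-first search gives hop distances from `source` to every reachable node.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return avg_degree, avg_weighted, total / pairs


# Hypothetical mention ties: a-b mentioned once, b-c three times.
print(network_stats([("a", "b", 1), ("b", "c", 3)]))
```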
  • 65. DMI-TCAT & gephi, #gamergate in September 2015 Co-hashtag analysis, size: frequency, color: degree
  • 66. DMI-TCAT & gephi, #gamergate in September 2015 Co-hashtag analysis, size: frequency, color: user diversity
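Co-hashtag analysis links two hashtags whenever they appear in the same tweet. A minimal sketch, assuming tweets are given as `(user, hashtags)` pairs: it returns hashtag frequency (node size), co-occurrence counts (edge weights), and, as one plausible reading of "user diversity", the number of distinct users per hashtag; DMI-TCAT's own definition may differ. Hashtags are case-folded and counted once per tweet. The sample tweets are illustrative, not data from the study.

```python
from collections import Counter, defaultdict
from itertools import combinations


def cohashtag_network(tweets):
    """tweets: iterable of (user, hashtags) pairs, one per tweet."""
    frequency = Counter()
    cooccurrence = Counter()
    users = defaultdict(set)
    for user, tags in tweets:
        unique = sorted({t.lower() for t in tags})  # case-fold, count once per tweet
        frequency.update(unique)
        cooccurrence.update(combinations(unique, 2))  # every unordered pair in the tweet
        for tag in unique:
            users[tag].add(user)
    diversity = {tag: len(u) for tag, u in users.items()}
    return frequency, cooccurrence, diversity


tweets = [("u1", ["gamergate", "GamerGate", "ethics"]), ("u2", ["gamergate", "sjw"]),
          ("u1", ["gamergate", "ethics"])]
print(cohashtag_network(tweets))
```

Because bots can inflate both frequency and co-occurrence, comparing frequency with user diversity is one way to check whether a hashtag is widely used or pushed by a few accounts.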
  • 69. DMI-TCAT (cascade interface), x: time, y: user account point: tweet, arc: retweet, bots in red
  • 70. Associational profile around #feminism in #gamergate dataset
  • 71. 2 / Examples – b / Twitter Twitter is a very open platform; the main problem is the need to anticipate events or react quickly, since historical tweets are costly. Since tweets can easily be sent by bots and automation services, we have to be very careful with metrics and always check them from a number of different perspectives. For #gamergate, first findings show a very densely connected community organized around a group of highly active and visible accounts. Hashtag use (discounting bots) is dominated by outrage against perceived "minority favoritism", "social justice warriors", and anti-abuse measures; "ethics in journalism" is not prominent at all.
  • 72. 2 / Examples – c / YouTube YouTube is perhaps the most understudied (with digital methods) of the large social media platforms (1B+ users). YouTube is probably the most open social media platform, with very few limitations at the API level. YouTube Data Tools (YTDT), a new tool, is an attempt to facilitate data-driven research.
  • 73. YouTube Data Tools Extracts Data from YouTube
  • 74. YouTube Data Tools Channel Network uses data from the "Featured Channels", which allows for self-affiliation with other channels.
  • 75. Gamergate channel network, via YouTube channel search, depth: 1; Size: subscriber count / Color: seed or not
  • 76. Gamergate channel network, via YouTube channel search, depth: 1; Size: subscriber count / Color: in-degree
  • 77. Gamergate channel network, via YouTube channel search, depth: 1; Size: subscriber count / Color: betweenness
  • 83. 3 / Conclusions Social media analysis with digital methods relies on the "natively digital objects" (Rogers 2013) that platforms are built around; technical mediation intervenes in all stages of the research process. Despite the promise of easy access to well-structured data, there are considerable difficulties and limitations. Digital methods is not a one-click type of research, but requires considerable time and critical interrogation to produce robust results: which objects to take into account, how to create a sample / collection, how to analyze it, how to interpret, how to make findings.
  • 84. 3 / Conclusions In order to deal with big and complex datasets, we need exploratory approaches that combine micro/macro and qualitative/quantitative in various ways: ☉ Investigate the platform in detail to account for technical pitfalls. ☉ Qualify quantities. ☉ Gain a sense of practices to orient quantitative methods. ☉ Use quantitative indicators to decide on qualitative focus. ☉ Read content to understand outliers. ☉ Make explicit plausibility tests based on reading. ☉ Interpret the small in relation to the large and the other way round. Because n = all, these articulations have become much more feasible. Every analytical gesture shows different things; combining them completes the picture. We need "flexibility of attack, willingness to iterate" (Tukey 1962).
  • 85. 3 / Conclusions There is a lot of excitement about social media data analysis, but our techniques are often still experimental and far from standardized. We need interrogation and critiques of methodology that are developed from engagement and historical / conceptual investigation. We need analytical gestures that are more closely tied to concepts from the humanities and social sciences. Visualization and simple tools are very interesting, but require technical and conceptual literacy to deliver more than (deceptive) illustrations.
  • 86. 3 / Conclusions Data analysis for social media requires (in my view): ☉ Robust understanding of the social media platform; ☉ A sense of purpose; ☉ Conceptual understanding of methods and analytical gestures; ☉ Knowledge of software tools for data analysis; ☉ Considerable domain expertise. If you think that these approaches can be interesting for your research, I would recommend simply trying out some of the tools to get a first-hand impression.
  • 87. Thank You! @RiederB All mentioned data extraction tools are freely available via and
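The mention statistics on slide 63 (a mean of 89 against a median of 8, with very high upper percentiles) are the signature of a heavily skewed distribution. The following Python sketch shows how such summary figures can be computed from per-user mention counts. The toy counts are made up, and the nearest-rank percentile method is an assumption, since the slide does not say which method was used.

```python
import math
import statistics


def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]


def summarize(mention_counts):
    """Mean, median and upper percentiles of per-user mention counts."""
    return {
        "mean": statistics.mean(mention_counts),
        "median": statistics.median(mention_counts),
        "p90": nearest_rank_percentile(mention_counts, 90),
        "p95": nearest_rank_percentile(mention_counts, 95),
        "p99": nearest_rank_percentile(mention_counts, 99),
    }


# Hypothetical toy data (100 users): most are mentioned rarely, a few very often.
counts = [1] * 50 + [5] * 30 + [20] * 15 + [200] * 4 + [5000]
stats = summarize(counts)
print(stats)
```

With data this skewed, the median describes the typical account far better than the mean, which is pulled up by a handful of heavily mentioned accounts.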
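Slide 64 characterizes the top 5000 user network by average degree, average weighted degree, and average path length. As a rough, stdlib-only illustration of what these measures capture, the sketch below computes them for an undirected graph given as (user, user, weight) tuples, with each pair listed once. Treating the mention network as undirected and the tiny example graph are assumptions; Gephi's own conventions (for instance for directed graphs) may differ from this sketch.

```python
from collections import deque


def network_stats(weighted_edges):
    """Average degree, average weighted degree and average shortest-path
    length of an undirected graph given as (u, v, weight) tuples."""
    adjacency, strength = {}, {}
    for u, v, w in weighted_edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
        strength[u] = strength.get(u, 0) + w
        strength[v] = strength.get(v, 0) + w
    n = len(adjacency)
    avg_degree = sum(len(nbrs) for nbrs in adjacency.values()) / n
    avg_weighted_degree = sum(strength.values()) / n

    # Breadth-first search from every node gives unweighted hop distances;
    # only reachable pairs are counted.
    total, pairs = 0, 0
    for source in adjacency:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nbr in adjacency[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        total += sum(dist.values())
        pairs += len(dist) - 1
    avg_path_length = total / pairs if pairs else 0.0
    return avg_degree, avg_weighted_degree, avg_path_length


# Hypothetical path graph a-b-c-d with mention counts as weights.
edges = [("a", "b", 1), ("b", "c", 2), ("c", "d", 3)]
deg, wdeg, apl = network_stats(edges)
print(deg, wdeg, round(apl, 2))
```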
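Slides 65 and 66 size hashtags by frequency and color them either by degree or by user diversity. A minimal sketch of the underlying co-hashtag network follows. It assumes tweets arrive as (user, hashtags) pairs and takes "user diversity" to mean the number of distinct users posting a hashtag; DMI-TCAT's exact definition may differ.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_hashtag_network(tweets):
    """tweets: iterable of (user, [hashtags]) pairs (hypothetical format).
    Returns hashtag frequency, degree, user diversity and weighted edges."""
    frequency = Counter()
    users = defaultdict(set)
    edges = Counter()
    for user, hashtags in tweets:
        tags = sorted({t.lower() for t in hashtags})  # dedupe within a tweet
        for tag in tags:
            frequency[tag] += 1
            users[tag].add(user)
        for a, b in combinations(tags, 2):  # every co-occurring pair
            edges[(a, b)] += 1
    degree = Counter()
    for a, b in edges:  # number of distinct co-hashtags per hashtag
        degree[a] += 1
        degree[b] += 1
    diversity = {tag: len(u) for tag, u in users.items()}
    return frequency, degree, diversity, edges


tweets = [
    ("u1", ["gamergate", "sjw"]),
    ("u1", ["gamergate", "sjw"]),
    ("u1", ["gamergate", "sjw"]),
    ("u2", ["gamergate", "journalism"]),
    ("u3", ["GamerGate", "gamedev"]),
]
frequency, degree, diversity, edges = co_hashtag_network(tweets)
for tag in sorted(frequency, key=frequency.get, reverse=True):
    print(tag, frequency[tag], degree[tag], diversity[tag])
```

In the toy data, #sjw is frequent but posted by a single account, which is exactly the kind of discrepancy that coloring by user diversity can reveal.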
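Slides 69 and 71 stress that bots and automation services distort metrics, and the speaker notes mention temporal and retweet patterns as a means of detecting bots, as well as #gamedev hashtag hijacking via IFTTT. The sketch below illustrates one possible temporal heuristic, not DMI-TCAT's method: accounts posting at near-constant intervals are flagged, and hashtag counts are then recomputed without them. The data format, the coefficient-of-variation measure, and both thresholds are assumptions chosen for illustration.

```python
import statistics
from collections import Counter


def regularity(timestamps):
    """Coefficient of variation of the gaps between consecutive tweets
    (timestamps in seconds). Values near 0 mean clockwork-like posting."""
    ts = sorted(timestamps)
    gaps = [b - a for a, b in zip(ts, ts[1:])]
    if len(gaps) < 2:
        return None  # too few tweets to judge
    mean_gap = statistics.mean(gaps)
    if mean_gap == 0:
        return 0.0
    return statistics.pstdev(gaps) / mean_gap


def flag_suspected_bots(tweets, max_cv=0.1, min_tweets=10):
    """tweets: list of (user, timestamp, hashtags). Thresholds are arbitrary."""
    times = {}
    for user, ts, _ in tweets:
        times.setdefault(user, []).append(ts)
    flagged = set()
    for user, ts_list in times.items():
        cv = regularity(ts_list)
        if len(ts_list) >= min_tweets and cv is not None and cv <= max_cv:
            flagged.add(user)
    return flagged


def hashtag_counts(tweets, exclude=frozenset()):
    """Hashtag frequencies, skipping tweets by excluded accounts."""
    counts = Counter()
    for user, _, hashtags in tweets:
        if user not in exclude:
            counts.update(h.lower() for h in hashtags)
    return counts


# Hypothetical data: one account posts every 600 seconds, two humans do not.
tweets = [("autobot", 600 * i, ["gamedev", "gamergate"]) for i in range(20)]
tweets += [("alice", t, ["gamergate", "sjw"]) for t in (0, 50, 4000, 4100, 9000)]
tweets += [("bob", t, ["gamergate"]) for t in (10, 7000, 7200)]

bots = flag_suspected_bots(tweets)
raw = hashtag_counts(tweets)
clean = hashtag_counts(tweets, exclude=bots)
print(bots, raw.most_common(3), clean.most_common(3))
```

Such a flag is only a prompt for closer reading: regular posting can also come from legitimate scheduled accounts, so flagged accounts should be checked qualitatively before being discounted.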
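Slides 74 to 77 describe a channel network built from "Featured Channels" at crawl depth 1 and colored by seed status, in-degree, or betweenness. The sketch below mimics that pipeline on a hard-coded lookup table. `get_featured` is a hypothetical stand-in for whatever call retrieves a channel's featured channels in practice (via the YouTube API, not reproduced here), and this crawl only records links from channels it actually visits, which may not match exactly how YTDT builds its network. Betweenness uses Brandes' algorithm for directed, unweighted graphs, without normalization.

```python
from collections import deque


def crawl(seeds, get_featured, depth=1):
    """Follow featured-channel links from the seeds for `depth` steps.
    Returns all channels seen and the directed edges (channel, featured)."""
    seen = set(seeds)
    edges = set()
    frontier = list(seeds)
    for _ in range(depth):
        next_frontier = []
        for channel in frontier:
            for target in get_featured(channel):
                edges.add((channel, target))
                if target not in seen:
                    seen.add(target)
                    next_frontier.append(target)
        frontier = next_frontier
    return seen, edges


def in_degree(nodes, edges):
    degree = {n: 0 for n in nodes}
    for _, target in edges:
        degree[target] += 1
    return degree


def betweenness(nodes, edges):
    """Brandes' algorithm: unnormalized betweenness, directed and unweighted."""
    successors = {n: [] for n in nodes}
    for source, target in edges:
        successors[source].append(target)
    score = {n: 0.0 for n in nodes}
    for s in nodes:
        stack, preds = [], {n: [] for n in nodes}
        sigma = {n: 0 for n in nodes}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in successors[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {n: 0.0 for n in nodes}
        while stack:  # accumulate dependencies in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return score


# Hypothetical featured-channel lookup standing in for real API calls.
featured = {"A": ["B"], "B": ["C"], "X": ["B"]}
nodes, edges = crawl(["A", "B", "X"], lambda c: featured.get(c, []), depth=1)
print(in_degree(nodes, edges), betweenness(nodes, edges))
```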

Speaker notes

  1. Data can be thought of as a kind of "observation" rather than survey-based research.
  2. This is what you can do with a tweet.
  3. People do a lot of different things on Twitter, Facebook, etc. – and just because you and your immediate vicinity seem to have coherent practices, this does not mean others have. Entities and types of relation are formalized in "domain specific ways" => FB social graph
  4. Differentiation of scales (topological forms) is produced through technical means and emerges through social dynamics. Variations in scale are less institutional and more topological (example: big Twitter accounts). The idea that this would foster equality comes from the fact that, indeed, everybody is a node. We think in terms of properties, not in terms of structure/dynamics. Status is not what you are, but how you are connected. => Variety in topics, variety in scales. Size is the main differentiator.
  5. Very large scale systems with very diverse uses on the one side, but highly concentrated data repositories on the other.
  8. Instead of getting an interface, you're getting a file.
  9. R Core Team (2014). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL Bastian M., Heymann S., Jacomy M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media.
  10. This is where we start bringing together our knowledge of the platform, the case, etc.
  11. Allows for all kinds of folding, combinations, etc. – Math is not homogeneous, but sprawling! Different forms of reasoning, different modes of aggregation. These are already analytical frameworks, different ways of formalizing. There is a fast growing variety of analytical gestures focusing on large numbers of formalized and classed objects.
  12. In statistics, regression analysis is a technique for estimating the relationships among variables (cf. correlation). A probabilistic relationship: height and weight are correlated; if you are very tall, there is a good chance that you also weigh more. This is a statistical, not a deterministic, relationship. Erosion of determinism in the 19th century. Title: Recherches sur la population, les naissances, les décès, les prisons, les dépôts de mendicité, etc., dans le royaume des Pays-Bas, par M. A. Quételet,… 1827
  13. Forsythe and Katz, 1946 – "adjacency matrix", Moreno, 1934
  14. Visualization is, again, one type of analysis. Which properties of the network are "made salient" by an algorithm? Models behind: spring simulation, simulated annealing (
  15. Visual / spatial analysis is already very interesting, but graph theory allows us to do much more. Networks are eminently calculable. All in all, this process resulted in the specification of nine centrality measures based on three conceptual foundations. Three are based on the degrees of points and are indexes of communication activity. Three are based on the betweenness of points and are indexes of potential for control of communication. And three are based on closeness and are indexes either of independence or efficiency. (Freeman 1979) What concepts are they based on?
  16. Graph shows prediction accuracy from likes. But this is still based on our "direct data", i.e. the things I liked. Kosinski, Michal, David Stillwell, and Thore Graepel. "Private traits and attributes are predictable from digital records of human behavior." Proceedings of the National Academy of Sciences 110.15 (2013): 5802-5805.
  17. B. Rieder (2013). Studying Facebook via data extraction: the Netvizz application. In WebSci '13 Proceedings of the 5th Annual ACM Web Science Conference (pp. 346-355). New York: ACM. Rieder, B., Abdulla, R., Poell, T., Woltering, R., & Zack, L. (forthcoming). Data Critique and Analytical Opportunities for Very Large Facebook Pages. Lessons Learned from Exploring “We Are All Khaled Said”. Big Data & Society.
  18. Khaled Said was beaten to death by the Egyptian police in Alexandria on June 6, 2010. The page was created by Wael Ghonim (a Google employee) and is considered to be a central place for the sparking of the Egyptian Revolution of 2011 (second man: AbdelRahman Mansour). We are interested in a number of questions, in particular the role of the page in the Egyptian Revolution (broad question).
  19. And although we think that this is basically about getting data out of a database, it's simply not that easy. Activity on posts continues over time; because comments have timestamps, we can cut them at a given date, but likes have no timestamp and cannot be cut. Numbers need to be qualified on different levels.
  20. Issues: data access, changing FB platform (e.g. threaded comments) The communicational situation on this page is that only the admins can post. Comments can no longer be read for quantitative and logistical reasons. One of our research angles concerned polling as proto-democratic practice, so this is important. For some things we can correct, for others we can't.
  21. Simply plotting events is an analytical gesture. (=> pattern)
  22. Visualization is great for getting a first overview, maybe also finding out problems.
  23. Notice the dip in photos in February 2011. Photos are really the drivers of motivation. In the whole period there were only 19 days without a post. Shared content, but meticulous curation.
  24. Start of a revolutionary dynamic when a threshold is crossed. We can see that in the comments of these days, when many declare they no longer care about their safety.
  25. The revolutionary phase is followed by a phase of reflection leading up to the constitutional referendum.
  26. Interestingly, we do not have a power law. The highly active group is larger than a power law would indicate.
  27. We're not limited to merely quantitative perspectives, but there are so many comments! Two "distant reading" tools.
  28. This is really the limit of what one can do with our resources. Here, one needs to understand the layout algorithm to make interesting readings.
  29. The top user commented on nearly 4K different posts. The topology indicates that the top users have different priorities. We could qualify the most active users on the page.
  30. From DMI Workshop on Anti-Islamism. Pages can like each other, a kind of declaration of affinity.
  31. Starting point: stop Islamization of the World. Color: modularity algorithm (community detection) What does this mean?
  32. Starting point: stop Islamization of the World. What does this mean?
  33. I am using this case to walk you through some of the things one can do with DMI-TCAT.
  34. User and network statistics give us a good idea, here. We have a very dense community, with a number of highly active and visible top users.
  35. #sjw appears 3573 times, #journalism 120 times => "ethics in gaming journalism"?
  36. The #gamedev tweets come from hashtag hijacking via IFTTT
  37. Cascade interface: typical qual-quant. Temporal and retweet patterns as a means to detect bots.
  38. We again see one-sided association: the gamergaters connect to mainstream gaming channels, but those rarely link back. But subscriptions are not taken into account!
  39. Not only featured channels but also subscriptions! But with subscriptions, one quickly arrives at a much larger network.
  40. Now looking into patterns in interaction.
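Speaker note 12 describes correlation and regression as probabilistic rather than deterministic relationships. A minimal sketch with made-up height and weight values: the Pearson coefficient is high but below 1, and the least-squares line predicts weight from height only approximately, leaving non-zero residuals.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares_line(xs, ys):
    """Slope and intercept of the ordinary least-squares line y = a * x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    return slope, my - slope * mx


# Made-up sample: height in cm, weight in kg.
heights = [160, 165, 170, 175, 180, 185]
weights = [55, 62, 63, 70, 74, 80]
r = pearson_r(heights, weights)
slope, intercept = least_squares_line(heights, weights)
residuals = [w - (slope * h + intercept) for h, w in zip(heights, weights)]
print(round(r, 3), round(slope, 2), [round(e, 1) for e in residuals])
```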
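Speaker notes 13 and 15 point to the adjacency matrix (Forsythe and Katz; Moreno) and to Freeman's centrality measures based on degree, betweenness, and closeness. The sketch below builds an undirected adjacency matrix from a toy edge list and derives degree (row sums) and closeness, here taken as (n - 1) divided by the sum of shortest-path distances, which assumes a connected graph. The edge list is hypothetical.

```python
from collections import deque


def adjacency_matrix(edges):
    """Undirected 0/1 adjacency matrix; rows and columns follow sorted node names."""
    nodes = sorted({n for edge in edges for n in edge})
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1
    return nodes, matrix


def degree_centrality(matrix):
    # In an undirected 0/1 matrix, a node's degree is its row sum.
    return [sum(row) for row in matrix]


def closeness_centrality(matrix):
    """(n - 1) / sum of shortest-path distances; assumes a connected graph."""
    n = len(matrix)
    scores = []
    for source in range(n):
        dist = [None] * n
        dist[source] = 0
        queue = deque([source])
        while queue:  # breadth-first search over the matrix rows
            i = queue.popleft()
            for j, linked in enumerate(matrix[i]):
                if linked and dist[j] is None:
                    dist[j] = dist[i] + 1
                    queue.append(j)
        scores.append((n - 1) / sum(dist))
    return scores


nodes, matrix = adjacency_matrix([("alice", "bob"), ("bob", "carol"), ("bob", "dave")])
print(nodes, degree_centrality(matrix), closeness_centrality(matrix))
```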
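Speaker note 26 observes that commenting activity does not follow a power law because the highly active group is too large. One crude way to probe such a claim is to fit a power-law exponent by maximum likelihood (continuous approximation) and compare the observed share of very active users with the share the fitted law predicts. The comment counts, `xmin`, and threshold below are made up, and this is not a proper goodness-of-fit test.

```python
import math


def fit_alpha(values, xmin):
    """Continuous maximum-likelihood power-law exponent for values >= xmin."""
    tail = [v for v in values if v >= xmin]
    return 1 + len(tail) / sum(math.log(v / xmin) for v in tail)


def tail_check(values, xmin, threshold):
    """Observed vs. power-law-predicted share of values >= threshold
    (among values >= xmin), using P(X >= x) = (x / xmin) ** (1 - alpha)."""
    tail = [v for v in values if v >= xmin]
    alpha = fit_alpha(values, xmin)
    observed = sum(1 for v in tail if v >= threshold) / len(tail)
    predicted = (threshold / xmin) ** (1 - alpha)
    return alpha, observed, predicted


# Made-up comments-per-user counts with an unusually large highly active group.
comments = [1, 1, 2, 4, 100, 100, 100, 100]
alpha, observed, predicted = tail_check(comments, xmin=1, threshold=50)
print(round(alpha, 2), observed, round(predicted, 2))
```

Here the observed share of heavy commenters is clearly above the fitted prediction, which is the kind of discrepancy the note describes; real data would call for a proper fitting and testing procedure.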
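Speaker note 31 mentions coloring the network by a modularity algorithm (community detection). Modularity-based detection, as offered in Gephi, searches for a partition that scores high on modularity; the sketch below only scores a given partition, using Newman's modularity for an undirected, unweighted graph, to show what is being maximized. The two-triangle graph is a toy example.

```python
def modularity(edges, community):
    """Newman's modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs, each edge listed once.
    community: dict mapping every node to a community label.
    Q = sum over communities of (internal edges / m) - (degree sum / 2m) ** 2
    """
    m = len(edges)
    internal, degree_sum = {}, {}
    for u, v in edges:
        for node in (u, v):
            label = community[node]
            degree_sum[label] = degree_sum.get(label, 0) + 1
        if community[u] == community[v]:
            internal[community[u]] = internal.get(community[u], 0) + 1
    return sum(
        internal.get(label, 0) / m - (d / (2 * m)) ** 2
        for label, d in degree_sum.items()
    )


# Toy graph: two triangles joined by a single edge (c-d).
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {n: 0 for n in two_groups}
print(round(modularity(edges, two_groups), 3), modularity(edges, one_group))
```

Splitting along the bridge yields a clearly positive score, while lumping everything into one community yields zero.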