Grounded theory meets big data: One way to marry ethnography and digital methods
1. @dhirajmurthy 1
Grounded theory meets Big Data:
One way to marry ethnography and digital methods
May 2016
Dhiraj Murthy | @dhirajmurthy | d.murthy@gold.ac.uk
CAST: Social Media Research Cluster
2. @dhirajmurthy 2
Objectives
• There are unique challenges associated with data collection
and analysis on social media platforms
• How do we integrate and weigh Big Data questions and
more in-depth contextualized analysis of social media
content?
• How do we categorize textual and visual content,
addressing issues of ontology?
• How can grounded theory be applied to coding schemes?
3. @dhirajmurthy 3
Starting points
• Big data methods have been successfully applied to Twitter
data (indeed, 16% of research on Twitter employed sentiment
analysis; Zimmer and Proferes 2014)
• We may think that anything about human behavior can be
deciphered from Twitter data, but that simply is not true.
• There are also challenges associated with data collection
and analysis on Twitter (boyd & Crawford, 2012).
• Closed coding systems are thought to be the best for
studying Twitter data
• However, social media data involves very ‘messy’
elements and mixed approaches can have high utility
4. @dhirajmurthy 4
New ontologies
So perhaps we need to …
challenge traditional ontological assumptions!
Hardt and Negri (2005, p. 312) argue that this type of a
critical ‘new ontology’ is part of their desire not to engage
in “repeating old rituals”, but, rather, “launching a new
investigation in order to formulate a new science of society
and politics [… that] is not about piling up statistics or
mere sociological facts [… but] immersing ourselves in the
movements of history and the anthropological
transformations of subjectivity.”
5. @dhirajmurthy 5
First: So what does Twitter API data look like?
"user": {
  "name": "dhirajmurthy",
  "friendsCount": 771,
  "followersCount": 1534,
  "listedCount": 100,
  "statusesCount": 2609
}
This is an excerpt of API-delivered JavaScript Object
Notation (JSON) data for my Twitter ID
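As a minimal sketch of how such an excerpt might be read programmatically, the Python below wraps the fragment in braces so that it parses as a JSON object and pulls out a few profile-level values. The field names come from the slide; the `profile_summary` helper and the followers-per-friend ratio are hypothetical additions for illustration only.

```python
import json

# The excerpt from the slide, wrapped in braces so it parses as a JSON object.
raw = """
{
  "user": {
    "name": "dhirajmurthy",
    "friendsCount": 771,
    "followersCount": 1534,
    "listedCount": 100,
    "statusesCount": 2609
  }
}
"""


def profile_summary(record):
    """Return profile-level metadata that could be coded alongside tweet text.

    Hypothetical helper: the ratio is an illustrative derived value,
    not something the API delivers.
    """
    user = record.get("user", {})
    friends = user.get("friendsCount", 0)
    followers = user.get("followersCount", 0)
    return {
        "name": user.get("name"),
        "followers_per_friend": round(followers / friends, 2) if friends else None,
        "statuses": user.get("statusesCount"),
    }


summary = profile_summary(json.loads(raw))
print(summary)
```

Reading the profile object next to the tweet body is one way to keep metadata in view during coding, rather than treating the text as the whole record.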
6. @dhirajmurthy 6
What is often missing in Twitter-based research
• Be open in the inquiry, allowing coding to be emergent.
• Ask what is happening in the tweet (not just body text).
Think about JSON data holistically.
• What are these tweet data helping us study, speaking
broadly?
• Are we being reflexive on the point of view/standpoint
we are interpreting?
• Are we being flexible or following prescribed rules?
7. @dhirajmurthy 7
Beyond induction and deduction…
• ‘Big data is […] most effective when researchers take
account of the complex methodological processes that
underlie the analysis of that data’ (boyd & Crawford, 2012,
p. 668)
• And inductive and deductive methods have their own
limitations
8. @dhirajmurthy 8
Beyond induction and deduction…
• Abductive methods, a form of reasoning ‘for finding the
best explanations among a set of possible ones’ (Paul,
1993), are an alternative approach
• Retroduction: a type of abductive method that
emphasizes “asking why” (Olsen, 2012: 215). Researchers
are able to probe the data regularly and to “avoid
overgeneralisation but searching for reasons and
causes” (p. 216) instead.
Or put another way, “the retroductive researcher, unlike
the inductive researcher, has something to look
for” (Blaikie, 2004).
9. @dhirajmurthy 9
Methods
Emergent coding methods can be implemented operationally in a
systematic fashion to build critical, reflective, conceptual
knowledge of Twitter-derived data.
Theory building, Adapted from Goulding, C. (2002), Grounded Theory: Sage, p. 115
10. @dhirajmurthy 10
In Practice
• Be open in the inquiry, allowing coding to be emergent.
• Tweets are not merely bits of text. Ask what is happening
in the tweet (not just body text). Think about JSON object
data holistically (cf. Manovich’s (2001) ‘digital objects’).
• What are these tweet data helping us study, speaking
broadly?
• Are we being reflexive on the point of view / standpoint we
are interpreting?
• Are we being flexible or following prescribed rules?
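One way to act on the "not just body text" point is to record which non-text elements a tweet carries before coding it. The sketch below is only illustrative: the record and its field names (`hashtags`, `urls`, `media`, `inReplyTo`) are hypothetical and do not reproduce any particular API schema.

```python
# Hypothetical non-text fields a coder might want to look at.
CONTEXT_FIELDS = ("hashtags", "urls", "media", "inReplyTo", "user")


def context_present(tweet):
    """Return the names of non-text fields that carry content in this record."""
    return [name for name in CONTEXT_FIELDS if tweet.get(name)]


# Invented example record, for illustration only.
tweet = {
    "text": "Example tweet body about #accidentalracist",
    "hashtags": ["accidentalracist"],
    "urls": [],
    "media": [{"type": "photo"}],
    "inReplyTo": None,
    "user": {"name": "example_user"},
}

present = context_present(tweet)
print(present)
```

A coder could use such a list as a prompt: a tweet with media or a reply target may need a different reading from one that is text only.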
12. @dhirajmurthy 12
Data collection and relationship model.
Figure adapted from Corbin, J. and
Strauss, A. (2015), Basics of qualitative
research: techniques and procedures for
developing grounded theory, Thousand
Oaks: Sage, p. 8
Continuous open coding Twitter data model
applied to #accidentalracist, a hashtag associated
with a 2013 duet by Brad Paisley and LL Cool J
13. @dhirajmurthy 13
• Operationalizing this ontology requires several stages of
coding
• Memo making during collection and analysis is integral to
both coding development and theory building
• Comparisons across diverse data at each stage provide
reflexivity and triangulation
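The coding stages, memos, and comparisons described above could be tracked with a very small data structure. The following is a sketch under stated assumptions, not the workflow used in the study: the `Codebook` class, the stage name, the code labels, and the memo text are all hypothetical.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Codebook:
    """Holds emergent codes and the memos written while applying them."""

    # Maps each code label to the set of item ids it has been applied to.
    assignments: dict = field(default_factory=lambda: defaultdict(set))
    # Each memo is a (stage, code label, note) tuple.
    memos: list = field(default_factory=list)

    def code(self, stage, item_id, label, memo=None):
        """Apply a code to an item; new labels may appear at any stage."""
        self.assignments[label].add(item_id)
        if memo:
            self.memos.append((stage, label, memo))

    def counts(self):
        """Number of items per code, for comparison across stages or sources."""
        return Counter({label: len(ids) for label, ids in self.assignments.items()})


# Invented usage: two tweets coded during a hypothetical "open" stage.
book = Codebook()
book.code("open", "t1", "humor", memo="Joke relies on the song title")
book.code("open", "t2", "humor")
book.code("open", "t2", "critique", memo="Overlaps with humor; revisit")
counts = book.counts()
print(counts, len(book.memos))
```

Because labels are not fixed in advance, the structure stays open to emergent codes, while the stored memos preserve the reasoning behind each coding decision.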
14. @dhirajmurthy 14
Computational method first
• One can effectively use machine learning approaches such as
Latent Dirichlet allocation (LDA) to derive topic clusters
around a Twitter corpus
• This can be used to inform what coding categories are
deployed for not only tweet content, but profiles and other
metadata
• Example: Topic clusters derived from 90,986 cancer-related
tweets (with keywords: cancer, mammogram, lymphoma,
melanoma, and cancer survivor)
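To make the LDA step concrete, here is a small pure-Python collapsed Gibbs sampler run on a toy corpus. It is a sketch only: the slides do not say which LDA implementation or settings were used, the documents below are invented (they merely reuse some of the listed keywords), and the hyperparameters `alpha` and `beta` are assumed values. For a corpus of real size, a library implementation would be more practical.

```python
import random


def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return per-topic word counts."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]  # tokens per (doc, topic)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]
    topic_total = [0] * num_topics  # tokens per topic
    assignments = []

    # Give every token a random initial topic.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            k = rng.randrange(num_topics)
            zs.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # Unnormalized conditional probability of each topic.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1
    return topic_word


def top_words(topic_word, n=3):
    """Return up to n highest-count words for each topic."""
    result = []
    for counts in topic_word:
        ranked = sorted(counts, key=counts.get, reverse=True)
        result.append([w for w in ranked if counts[w] > 0][:n])
    return result


# Invented toy documents (already tokenized), for illustration only.
docs = [
    ["cancer", "survivor", "support", "survivor"],
    ["mammogram", "screening", "cancer"],
    ["melanoma", "sunscreen", "skin"],
    ["lymphoma", "survivor", "support"],
    ["skin", "melanoma", "screening"],
]
topic_word = lda_gibbs(docs, num_topics=2)
for index, words in enumerate(top_words(topic_word)):
    print(index, words)
```

The top words per topic are the kind of output a researcher might read through when deciding which coding categories to start from; they are prompts for interpretation rather than finished categories.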
15. @dhirajmurthy 15
Conclusions
• Social media are complex sociotechnical spaces
• Presentation of the self is often highly nuanced – a case
particularly complicated with uses of humor, a frequent
theme on Twitter
• Coded content can present different perspectives on
social interactions and these data are complementary to
computational methods
• Combining emergent grounded theory with machine
learning or vice versa can advance both qualitative and
computational methods
17. @dhirajmurthy 17
References
Blaikie, N. (2004). Retroduction. In M. S. Lewis-Beck, A. Bryman & T. F. Liao (Eds.), The
SAGE Encyclopedia of Social Science Research Methods (p. 973). Thousand Oaks: Sage.
boyd, d., & Crawford, K. (2012). Critical questions for Big Data: Provocations for a cultural,
technological, and scholarly phenomenon. Information, Communication & Society, 15(5),
662-679.
Corbin, J., & Strauss, A. (2015). Basics of qualitative research: techniques and procedures for developing
grounded theory. Los Angeles: Sage.
Hardt, M., & Negri, A. (2005). Multitude: War and democracy in the age of Empire. New
York: Penguin.
Murthy, D. (2011). Emergent digital ethnographic methods for social research. Handbook of
Emergent Technologies in Social Research, Oxford University Press, Oxford, 158-179.
Olsen, W. K. (2012). Data collection: key debates and methods in social research. London; Thousand
Oaks, Calif.: SAGE.
Paul, G. (1993). Approaches to abductive reasoning: an overview. Artificial Intelligence Review,
7(2), 109-152.
Zimmer, M., & Proferes, N. J. (2014). A topology of Twitter research: disciplines, methods,
and ethics. Aslib Journal of Information Management, 66(3), 250-261.
doi:10.1108/AJIM-09-2013-0083.
18. @dhirajmurthy 18
Selected Work
Most can be downloaded from http://www.dhirajmurthy.com/about/
Twitter: Social Communication in the Twitter Age. Polity Press, 2013.
‘Big Data Solutions On a Small Scale: Evaluating Accessible High Performance Computing for Social
Research’, Big Data and Society (with Bowman, S.), 2014.
‘Modeling virtual organizations with Latent Dirichlet Allocation: A case for natural language processing’,
Neural Networks (with Gross, A.), Volume 58, pp. 38-49, 2014.
‘Social Media, Collaboration, and Scientific Organizations’, American Behavioral Scientist (with Lewis,
J.P.), 2014.
‘Comparing Print Coverage and Tweets in Elections: a Case Study of the 2011-2012 US Republican
Primaries’, Social Science Computer Review (with Petto, L.), 2014.
‘Twitter and Disasters: the uses of Twitter during the 2010 Pakistan floods’, Information
Communication and Society, Volume 16, Issue 6, 2013, pp. 837-855.
‘Emergent Data Mining Tools for Social Network Analysis’ in Data Mining in Dynamic Social Networks
and Fuzzy Systems (Bhatnagar, V. ed.), pp. 40-57 (with Gross, A. and Takata, A.), 2013.
‘Evaluation and Development of Data Mining Tools for Online Social Networks’ in Mining Social
Networks and Security Informatics (Özyer, T. et al. eds.), pp. 183-202 (with Gross, A., Takata, A.,
Bond, S.), 2013.
Murthy, D., Gross, A., Oliveira, D. ‘Understanding Cancer-based Networks in Twitter using Social
Network Analysis’ in IEEE International Conference on Semantic Computing Proceedings. Palo
Alto, California, 2011.