Presentation given at the HEA Social Sciences learning and teaching summit 'Exploring the implications of ‘the era of big data’ for learning and teaching'.
A blog post outlining the issues discussed at the summit is available via: http://bit.ly/1lCBUIB
Making our mark: the important role of social scientists in the ‘era of big data’ - Rebecca Eynon
1. Making our mark: the important
role of social scientists in the ‘era
of big data’
Dr Rebecca Eynon
Oxford Internet Institute
University of Oxford
3. Overview
Big data: hype and reality
Use of big data should not be a specialism of only a few social
scientists
What kinds of skills and knowledge do social scientists need?
5. Big data: the end of social science as we
know it?
“Petabytes allow us to say: “Correlation is enough.” We can
stop looking for models. We can analyze the data without
hypotheses about what it might show. We can throw the
numbers into the biggest computing clusters the world has ever
seen and let statistical algorithms find patterns where science
cannot.”
The end of theory: the data deluge makes the scientific method
obsolete (Chris Anderson, Wired Magazine, 2008)
6. The coming crisis of empirical sociology
“A world inundated with complex processes of social and
cultural digitization; a world in which commercial forces
predominate; a world in which we, as sociologists, are losing
whatever jurisdiction we once had over the study of the ‘social’
as the generation, mobilization and analysis of social data
become ubiquitous” (Savage and Burrows, 2009:763)
7. A valuable addition or a radical rethink?
An open question
Big data is not perfect
But it is not just hype
8. Why big data is not perfect (1)
Big data prioritises certain people
Who has access to data is not straightforward
Data as a commodity
Commercial vs. public
Availability of data tends to drive the questions
Questions that are difficult to measure / collect data on are dropped
9. Why big data is not perfect (2)
Just because data is available does not mean we should use it
Privacy in public, public trust and accountability
How we use results from big data approaches matters
Risks of misuse of data, power structures in society
10. Social science is well positioned to
address these issues
But are we doing enough?
We are at risk of handing over aspects of social science to
computer scientists, physicists and engineers
Few social science journals publish findings from big data
A lot of funding is going outside social science for questions that we
used to be solely responsible for addressing
11. Learning & teaching
Data science courses have options in social science
Few social science courses offer data science
Students have to seek out opportunities for themselves
If data scientists can learn about social science then social
scientists can learn about data science
12. What kinds of skills and knowledge do social
scientists need?
On a continuum
We do not all need to be experts, but we need to know enough
Undergraduate & postgraduate
Ultimately, the use of big data will always be a team exercise
13. Language of multidisciplinary work
Need to be able to speak multiple ‘languages’ of the different
disciplines
Or learn how to build a common vocabulary within specific
project teams about the data, the different methods, the
findings, etc.
14. Awareness of cultural differences
“In many cases when we analyse big datasets we see patterns
that are not intuitive. Of course we need to build a theory
(model, in our language) to explain the observation, but in
many cases I was asked why I think the data looks like this and
even sometimes: "your observation cannot be correct". I guess
this is rooted in the differences in the disciplines. In social
sciences usually you build a theory and then gather the data to
support it, whereas in data-driven sciences you first observe
something and then try to build a theory. Usually the
observation can't be wrong (unless your measurements are
wrong for technical reasons).”
(Data Scientist, OII)
15. Ethics of big data
Clear understandings of the ethical implications of gathering,
storing and using big data
Personal codes vs institutional arrangements
Difference between law and ethical practice
Recognition of “privacy in public” and general respect for people
Care over what we do with the data and how our work is used
Commitment to public debate and transparency about the use
of this data
16. Understanding the data
Thinking about data differently, and what constitutes data
Understanding the representation of the data
Linking data sets
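One way to picture "linking data sets" is a join on a shared identifier. The following is a minimal sketch, assuming two hypothetical sources (survey responses and platform activity counts) keyed by the same participant ID. The names `survey`, `activity` and `link`, and all values, are invented for illustration and do not come from the presentation. Unmatched records are reported rather than silently dropped, so it stays visible which people each source leaves out.

```python
# Hypothetical survey responses keyed by participant ID (illustrative data only).
survey = {
    "p1": {"age_group": "18-24"},
    "p2": {"age_group": "25-34"},
    "p3": {"age_group": "35-44"},
}

# Hypothetical platform activity counts keyed by the same kind of ID.
activity = {
    "p1": {"posts": 12},
    "p3": {"posts": 3},
    "p4": {"posts": 7},
}


def link(left, right):
    """Join two dict-of-dict data sets on their shared keys.

    Returns the linked records plus the IDs found in only one source.
    """
    shared = left.keys() & right.keys()
    linked = {pid: {**left[pid], **right[pid]} for pid in shared}
    only_left = sorted(left.keys() - right.keys())
    only_right = sorted(right.keys() - left.keys())
    return linked, only_left, only_right


linked, survey_only, activity_only = link(survey, activity)
print(linked)
print("In survey only:", survey_only)
print("In activity data only:", activity_only)
```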
17. Being clear about the data
“Usually data people are careless with words. They tend to give
names to their observed parameters which can be misleading.
They count how many times two people have called each other
during a 6 month period and call this quantity "friendship
strength". They count how many times people have mentioned
Obama in their tweets and call it the "political index" of the
user.... What I like about social scientists is that they are very
careful with words and terms and their definitions.”
(Data Scientist, OII)
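The naming point in the quote above can be made concrete with a minimal sketch, assuming a hypothetical list of call records of the form (caller, callee, date). The code counts calls between each pair of people within a six-month window and names the result for what it measures (a call count) rather than for what it is hoped to indicate. All names and data here are illustrative assumptions, not taken from any project described in the presentation.

```python
from collections import Counter
from datetime import date

# Hypothetical call records: (caller, callee, date of call). Illustrative data only.
calls = [
    ("ana", "ben", date(2014, 1, 5)),
    ("ben", "ana", date(2014, 2, 11)),
    ("ana", "cat", date(2014, 3, 2)),
    ("ana", "ben", date(2014, 9, 30)),  # falls outside the window used below
]


def call_counts(records, start, end):
    """Count calls per unordered pair of people with start <= date < end."""
    counts = Counter()
    for caller, callee, day in records:
        if start <= day < end:
            # The direction of the call is ignored: (a, b) and (b, a) are one pair.
            pair = tuple(sorted((caller, callee)))
            counts[pair] += 1
    return counts


# Named for what is measured: a count of calls, not "friendship strength".
six_month_call_count = call_counts(calls, date(2014, 1, 1), date(2014, 7, 1))
print(six_month_call_count)
```

Whether a call count says anything about friendship is then a separate, explicit step of interpretation rather than something built into the variable name.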
18. Awareness and use of mixed method designs
Working within a pragmatic paradigm
Three levels of data
structural description (patterns of interactions);
thin description, which notes the content of the interaction;
thick description, to provide rich context and convey the meaning of
events to those who participated in them (Welser et al., 2008)
Linking methods at three different levels can be very valuable
19.
20. Understanding the analysis
Having an intuition for which processes/algorithms are being applied to
datasets, and knowing the application domain well enough to be able to
refine those approaches
“[The Sociologist] always asks me, “Okay show me a code and explain to
me which part of the code is doing which part, just very brief
understanding of how this computer program is working”. So I was
learning some sociology from her and she is learning some computer
science programming skills from me so it’s kind of mutual.”
(Sloan Big Data Project, http://www.oii.ox.ac.uk/research/projects/?id=98)
21. Interpretation
Crucial – the core role of the social scientist in big data projects
An ability to write “the story” for different audiences
Not possible if we do not understand (at least at some level)
what has happened at all stages of the research process
23. Learning & teaching within the wider
ecology of HE
Training for policy makers
Training for current academics
Interdisciplinary support structures across universities
Assessment process for student work
Challenges for the individual doctoral student
REF, early career support and job opportunities
Editor's notes
(think of how we were able to converge upon more meaningful clusters with some iteration)