Keynote talk for NCRM Stream Analytics workshop, 19 January 2017, Manchester.
My talk is called “New and Emerging Forms of Data: Past, Present, and Future” and I will be giving a perspective from my role as one of the ESRC Strategic Advisers for Data Resources, in which I was responsible for new and emerging forms of data and real-time analytics. The talk also includes some of the current work in the Oxford e-Research Centre on Social Machines (the SOCIAM project) and an introduction to the PETRAS Internet of Things project.
The talk raises a number of important issues looking ahead, including the massive scale of data already being supplied by the Internet of Things, the implications of automation in our research, and reproducibility and confidence in research results. I will also ask how new forms of data and new research methods can enable social scientists to work in new ways, and whether we can move on from the dependence on the traditional investment in longitudinal studies.
4. [Diagram: axes "More people" and "More machines". Labels: Conventional computing; High Performance Computing; Big Data; Web 2; Social Media; e-infrastructure; online R&D; New and Emerging Forms of Data; deeply about society.]
9. Social Media Triangle
▶ Social media data and analytics
▶ Social media for engagement with research
▶ Social media as a subject of research
Sam McGregor
10. New Forms of Data
▶ Internet data, derived from social
media and other online interactions
(including data gathered by
connected people and devices, eg
mobile devices, wearable
technology, Internet of Things)
▶ Tracking data, monitoring the
movement of people and objects
(including GPS/geolocation data,
traffic and other transport sensor
data, CCTV images etc)
▶ Satellite and aerial imagery (eg
Google Earth, Landsat, infrared,
radar mapping etc)
http://www.oecd.org/sti/sci-tech/new-data-for-understanding-the-human-condition.htm
11. What do we mean by real-time analytics?
▶ Live data streams vs live data analysis
▶ Different kinds of data, at a different pace
▶ Time-critical integration and analysis
▶ Influencing processes as they unfold, at speed & at scale
▶ New methodological apparatus
▶ New computational methods and infrastructure
▶ Not just social media – but social media is a rehearsal
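The distinction between a live data stream and live data analysis can be illustrated with a minimal sketch. It is not taken from the talk: the class name, the 60-second window, and the sample timestamps are hypothetical, and it assumes events arrive in time order. The live version updates its answer as each event arrives, while the batch version recomputes the same answer afterwards from stored data.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen in the last `window_seconds` of a stream.

    Hypothetical illustration; assumes timestamps arrive in non-decreasing order.
    """

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def add(self, timestamp):
        """Record one event and return the count within the current window."""
        self._timestamps.append(timestamp)
        cutoff = timestamp - self.window_seconds
        # Drop events that have fallen out of the window.
        while self._timestamps and self._timestamps[0] <= cutoff:
            self._timestamps.popleft()
        return len(self._timestamps)


def batch_count(timestamps, window_seconds, now):
    """Retrospective analysis: count stored events in the window ending at `now`."""
    return sum(1 for t in timestamps if now - window_seconds < t <= now)


# Made-up event times in seconds (for example, posts mentioning a topic).
events = [0, 10, 20, 70, 75, 130]

counter = SlidingWindowCounter(window_seconds=60)
live_counts = [counter.add(t) for t in events]  # answer available at every step

print(live_counts)
print(batch_count(events, 60, now=130))  # same answer, but only after the fact
```

The live counter keeps only the events inside the window, which is one reason streaming analysis can keep pace with data arriving at speed and at scale.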
14. Real life is and must be full of all kinds of social
constraint – the very processes from which society
arises. Computers can help if we use them to create
abstract social machines on the Web: processes in
which the people do the creative work and the
machine does the administration... The stage is set
for an evolutionary growth of new social engines.
The ability to create new forms of social process
would be given to the world at large, and
development would be rapid.
Berners-Lee, Weaving the Web, 1999 (pp. 172–175)
Social Machines
16. Observing Social Machines
[Diagram: roles numbered 1 to 7 arranged around Social Machines (SM). Labels:]
▶ Human participants in Social Machine
▶ Human participants in multiple Social Machines
▶ Observer of one social machine
▶ Observer of multiple social machines
▶ Observers using third party observatory
▶ Observer of Social Machine infrastructure
De Roure, D., Hooper, C., Page, K., Tarte, S., and Willcox, P. 2015. Observing Social Machines Part 2: How to Observe? ACM Web Science.
17. STORYTELLING AS A STETHOSCOPE
FOR SOCIAL MACHINES
1. Sociality through storytelling potential
and realization
2. Sustainability through reactivity and
interactivity
3. Emergence through collaborative
authorship and mixed authority
Zooniverse is a highly
storified Social Machine
Facebook doesn’t allow
for improvisation
Wikipedia assigns
authority rights rigidly
Tarte, S. M., De Roure, D., and Willcox, P. Working out the plot: the role of stories in social machines. In Proceedings of the
companion publication of the 23rd international conference on World wide web companion (2014), International World Wide Web
Conferences Steering Committee, pp. 909–914.
19. Seizing the tiger by the tail
▶ The Internet of Things
describes a world in which
everyday objects are
connected to a network so that
data can be shared
▶ But it is really as much about
people as the inanimate objects
▶ It is impossible to anticipate
all the social changes that
could be created by connecting
billions of devices
https://www.gov.uk/government/publications/internet-of-things-blackett-review
21. PETRAS Privacy, Ethics, Trust, Reliability, Acceptability, and Security
for the Internet of Things
Principles
• Use an integrated approach of collaborative social and
physical science expertise
• Remove barriers to the beneficial adoption of Internet of
Things
• Address generic knowledge gaps through case study
approaches covering major sectors
• Use innovative methodologies including ‘in the wild’ and
citizen science
22. PETRAS Privacy, Ethics, Trust, Reliability, Acceptability, and Security
for the Internet of Things
Key Facts about PETRAS
• 9 world-leading universities via
the core and spoke model (4
from the Alan Turing Institute)
• Combined hub value: £23m
• Blackett Review expertise
• 47 partners at submission
combining presence in the UK,
Central Europe and America
(giving international links and
perspective)
• Inter- and multi-disciplinary
focus
24. [Diagram repeated from slide 4, with added labels: The future; increasing automation; machine learning.]
26. Edwards, P. N., et al. (2013) Knowledge Infrastructures: Intellectual Frameworks and Research
Challenges. Ann Arbor: Deep Blue. http://hdl.handle.net/2027.42/97552
28. Jameson L. Toole, Yu-Ru Lin, Erich Muehlegger, Daniel Shoag, Marta C.
González, David Lazer. Tracking employment shocks using mobile phone data.
Journal of the Royal Society Interface. Volume 12, issue 107.
Published 27 May 2015. DOI: 10.1098/rsif.2015.0185
31. What are we trying to achieve?
My reflection is that the reason we seek
“reproducible research” is principally to achieve
two ends:
1. Confidence in results, because they inform
policy, decision-making, and further research
2. Sharing and citation of methods, data,
and software, to make it easier to stand on each
other’s shoulders, not toes
Let’s focus on (1)…
32. Research in the Wild (West)
Imagine you are a conference chair… or responsible for urban
planning, or security. Confidence in results is getting harder:
▶ Trusting the analysis that is occurring
– Automation of workflows, crowd-sourced data reduction,
software vulnerabilities, increasing adoption of machine learning,
and no critical human in the loop
▶ Knowing what the data is, where it has come from,
and what we can do with it
– Multiple and partial data sources, at speed and scale, in an evolving
ecosystem of data processing intermediaries, with complexity in
permissions for data use
What interventions should we make to improve confidence
and quality? What (socio-)technology can we adopt?
33. Provocation One
▶ Are there questions which are answered using
longitudinal studies data today that could be
answered in other ways?
▶ There is a massive (voluntary) supply of data about
individuals
▶ The supply is set to increase with the Internet of Things
▶ This data is “real time” (Fitbit, smartphone,
accelerometer methods…)
34. Provocation Two
▶ Sometimes we really do need a longitudinal study
in order to answer a question
▶ So can we do that longitudinal study in a new way?
▶ By:
– Supplementing existing studies, using linkage
– Using new techniques with easy reporting at scale
– Working internationally, regionally—shining the torch
35. Provocation Three
▶ Are we planning for how the world will be in
5 years?
▶ What have we learned from the rehearsal so far?
▶ Increasing automation, bots, robots
▶ Behaviour in the digital world (physical-digital world)
▶ Changing data ecosystem, e.g. personal data stores
37. Closing reflections
1. Not just new forms of data, but new social
processes and new research questions
2. What can we learn from the social media
analytics rehearsal?
3. Are we ready?
– for the data supply ahead
– for inevitable automation
4. How do we ensure the quality of research?
38. david.deroure@oerc.ox.ac.uk
@dder
Thanks to Peter Elias, Wendy Hall, Sam McGregor, Mark
Sandler, Nigel Shadbolt, Jeremy Watson, also Grant Miller,
Petar Radanliev, Ségolène Tarte, and Pip Willcox.
http://www.slideshare.net/dder/new-and-emerging-forms-of-data