The Big Picture
[Diagram labels: More people; More machines; Conventional Computation; Big Data; Big Compute; "Big Social"; Social Networks; e-infrastructure; online R&D; Big Data Production & Analytics; deeply about society]
RCUK and Big Data
▶ "Big data is a term for a collection of datasets so
large and complex that it is beyond the ability of
typical database software tools to capture, store,
manage, and analyse them. 'Big' is not defined as
being larger than a certain number of 'bytes' because
as technology advances over time, the size of datasets
that qualify as big data will also increase" (RCUK)
▶ But why do we want it?
New forms of data enable us to
1. Answer existing research questions in new ways
2. Ask entirely new research questions
NERC Big Data
...as diverse as our science
• From micro- to macro-scale
• Many sources:
• Monitoring campaigns
• Field sites & sensors
• State-of-the-art laboratories
• Ships & aircraft
• Remote Sensing & EO
• Regulator networks
• Volunteers/citizen science
• Model output
• Long-term and unique!
[Figure labels: 10µm; 100 TB]
Big data:
• time-based media, including film, TV and CCTV footage
• retail data
• geospatial data
• email and social media
• images and associated metadata
• performance data, including raw data of recordings, choreography and performance structure
• open government data
• music
• large-scale digital scans
Research benefits of new data
▶ Undertaking research on pressing policy-related issues
without the need for new data collection
• Food consumption, social background and obesity
• Energy consumption, housing type and climatic
conditions
• Rural location, private/public transport alternatives and
incomes
• School attainment, higher education participation,
subject choices, student debt and later incomes
▶ New data such as social media enable us to ask big
questions, about big populations, and in real time – this is
transformative
Real life is and must be full of all kinds of social
constraint – the very processes from which
society arises. Computers can help if we use
them to create abstract social machines on the
Web: processes in which the people do the
creative work and the machine does the
administration... The stage is set for an
evolutionary growth of new social engines. The
ability to create new forms of social process
would be given to the world at large, and
development would be rapid.
(Berners-Lee, Weaving the Web, 1999, pp. 172–175)
The Order of Social Machines
Some Social Machines
SOCIAM: The Theory and Practice of Social Machines is funded by the UK Engineering and Physical Sciences Research Council (EPSRC) under grant number EP/J017728/1 and comprises the Universities of Southampton, Oxford and Edinburgh. See sociam.org
Edwards, P. N., et al. (2013) Knowledge Infrastructures: Intellectual Frameworks and
Research Challenges. Ann Arbor: Deep Blue. http://hdl.handle.net/2027.42/97552
Big data elephant versus sense-making network?
The challenge is to foster the co-constituted socio-technical
system on the right, i.e. a computationally enabled
sense-making network of expertise, data, models and narratives.
Iain Buchan
Join the W3C Community Group www.w3.org/community/rosc
Jun Zhao
www.researchobject.org
Take-homes
▶ New forms of data enable us to answer old
questions in new ways and to ask entirely
new questions
▶ There are multiple shifts occurring:
– Volumes of data
– Real-time analytics
– Computational infrastructure
– Dataflows vs datasets (and curation infrastructure)
– Correlation vs causation
– Increasing automation
– Machine-to-machine communication in the Internet of Things
EPSRC: Under 'Big Data' we are considering both very large and also complex data, including dynamic and heterogeneous data from all the various sources, including sensors, social media, industry, etc.
Our research is underpinned by our extensive data holdings, which range in scale from the micro- to the macro-scale and are derived from a wide range of sources, including... What makes these data so special is that many of them are very long-term: our earliest data date back to the 19th century. They are also unique and irreplaceable, providing a resource that helps provide early warnings of environmental change and places such changes in the correct historical context for decision makers.
The Sloan Digital Sky Survey, the most ambitious astronomical survey ever undertaken, comprises 40 terabytes of information, while Steven Spielberg's Survivors of the Shoah Visual History project comprises 200 terabytes.
[Figure labels: 100 terabytes; 500 GB]
Source: http://historyonics.blogspot.com/2011/06/culturomics-big-data-code-breakers-and.html
ESRC was allocated £64m, much of which is being used to set up the ESRC Big Data Network. The Network will support the development of a set of innovative investments that will strengthen the UK's competitive advantage in Big Data for the social sciences. Its core aim is to facilitate access to different types of data, thereby stimulating innovative research and developing new methods to undertake that research. Note, however, that the diagram is only illustrative of how the UKDS and ADS will work across the Network (that is still under discussion) and of the number of Business and Local Government Data Research Centres.

The Network has been divided into three phases. In Phase 1, the ESRC has invested in the development of the Administrative Data Research Network (ADRN), which will provide access to de-identified administrative data collected by government departments for research use; this is the focus of this meeting and of all your grants.

A few words about Phases 2 and 3 before we pass to Vanessa to talk about the ADRN some more. Phase 2 is currently being commissioned and will deal primarily with business data and/or local government data. Phase 3, further details of which will be released in late autumn/winter, will focus primarily on third-sector data and social media data. It is expected that there will be opportunities for interaction across all elements of the ESRC Big Data Network, and that they will all work together towards the wider objectives of facilitating access to different forms of data and of ensuring that maximum impact is generated from the use of that data, for the mutual benefit of data owners and researchers and, through the research facilitated by the Network, of society and the economy more generally.
Thanks to Simon Hettrick for additional input to this slide.