Slide 2: Etymology: The etymology of the term ‘Big Data’ can be traced back to the mid-1990s, when John Mashey first used it to refer to the handling and analysis of massive datasets. However, by 2013 some commentators were already declaring ‘Big Data’ obsolescent as a meaningful term, because it was too wide-ranging and vaguely defined (e.g. de Goes, 2013).
Slide 6: Vagaries: Kitchin argues that it is velocity, together with the additional key characteristics listed on Slide 4, that sets Big Data apart and makes them a “disruptive innovation – one that radically changes the nature of data and what can be done with them” (Kitchin, 2014). However, there is no single characteristic profile that all Big Data fit; they can take multiple forms.
Slide 8: Ethics: Several ethical questions have been raised about the scope of the data being generated and retained, such as those concerning privacy, informed consent, and protection from harm.
These questions raise wider issues about what kinds of data should be combined and analysed, and the purposes to which the resulting information should be put.
Slide 9: Inequalities: Challenges of inequality have also been posed:
Whose data traces will be analysed? It is likely that mainly those who are better off will be represented, as they are more likely to use social media and similar platforms.
Access to open data, and the ability to use it, are unlikely to be equally distributed because of existing structural inequalities (Eynon, 2013).
Slide 11: What do Big Data actually tell us? Eynon (2013) argues that Big Data are concerned with capturing and examining patterns, and tell us more about what people actually do than about what they say they do. However, this is not sufficient for all kinds of social science research: we also need to understand the meanings of behaviours, which cannot be inferred simply by tracking patterns.
To ensure that Big Data are used appropriately, we need to understand what kinds of research can and cannot be carried out using them. Big Data should not be seen as [a] “technical fix” for research, but should be used to empower, support and facilitate practice and critical research.
3. Definitions: 3 Vs?
“huge in volume – consisting of terabytes or petabytes of data
high in velocity – being created in or near real-time
diverse in variety in type – being structured and unstructured in nature, and often temporally and spatially referenced”
(Kitchin, 2014)
4. Other key characteristics
exhaustive in scope (n = all)
fine-grained in resolution
indexical in identification (able to be uniquely labelled and identified)
relational in nature (different datasets can be conjoined)
flexible – can add new fields easily
scalable – can expand in size rapidly
5. Small and Big Data
| Characteristic | Small data | Big Data |
|---|---|---|
| Volume | Limited to large | Very large |
| Velocity | Slow, freeze-framed, bundled | Fast, continuous |
| Variety | Limited to wide | Wide |
| Exhaustivity | Samples | Entire populations |
| Resolution and indexicality | Coarse and weak to tight and strong | Tight and strong |
| Relationality | Weak to strong | Strong |
| Extensionality and scalability | Low to middling | High |
7. The mythology of Big Data
“the widespread belief that large data sets offer a higher form of intelligence and knowledge that can generate insights that were previously impossible, with the aura of truth, objectivity and accuracy.”
boyd & Crawford (2012)
10. Practicalities
Implications for the training of future academics – that’s you!
Institutional and cross-institutional infrastructures are needed to support data storage and processing capacity
Agreements and incentives for sharing data need to be drawn up (e.g. the Concordat on Open Research Data)
Ethical guidelines and protocols are needed
11. What do Big Data actually tell us?
What people actually do (not what they say they do)
Patterns of behaviour
12. boyd with a small b
Big Data changes the definition of knowledge
Claims to objectivity and accuracy are misleading
Bigger data are not always better data
Taken out of context, Big Data loses its meaning
Just because it is accessible does not make it ethical
Limited access to Big Data creates new digital divides
These points should be considered carefully before using Big Data in research.
16. Sources
• boyd, d. and Crawford, K. (2012) ‘Critical questions for Big Data’, Information, Communication & Society, 15(5), pp. 662-679.
• Davidag (2011) ‘Drive Thru’. Available at: http://flic.kr/p/9X8hpQ. Accessed 9th August 2017.
• Dinnen, P. (2010) ‘Sketch of Twitter Data Visualization’. Available at: http://flic.kr/p/7MH2rf. Accessed 8th August 2017.
• Eynon, R. (2013) ‘The rise of Big Data: what does it mean for education, technology, and media research?’, Learning, Media and Technology, 38(3), pp. 237-240.
• G4ll4is (2013) ‘Privacy’. Available at: http://flic.kr/p/dZ2y6b. Accessed 8th August 2017.
• Kitchin, R. (2014) The Data Revolution, London: SAGE.
• Kitchin, R. and McArdle, G. (2016) ‘What makes Big Data, Big Data? Exploring the ontological characteristics of 26 datasets’, Big Data & Society, January-June 2016, pp. 1-10.
17. Sources (2)
• Lebied, M. (2017) ‘5 big data examples in your real life at bars, restaurants and casinos’, Datapine. Available at: http://www.datapine.com/blog/big-data-examples-in-real-life/. Accessed 9th August 2017.
• Marr, B. (2016) ‘The most practical big data use cases of 2016’, Forbes. Available at: https://www.forbes.com/sites/bernardmarr/2016/08/25/the-most-practical-big-data-use-cases-of-2016. Accessed 9th August 2017.
• System of Ideas (2012) ‘V’. Available at: http://flic.kr/p/bi2CPn. Accessed 8th August 2017.
• Yassan Yukky (2011) ‘Cooking’. Available at: http://flic.kr/p/9tU7BB. Accessed 9th August 2017.