Ce diaporama a bien été signalé.

Accessing and Using Big Data to Advance Social Science Knowledge

2

Partager

Chargement dans…3
×
1 sur 21
1 sur 21

Accessing and Using Big Data to Advance Social Science Knowledge

2

Partager

Télécharger pour lire hors ligne

Description

'Brown bag' presentation at the Oxford Internet Institute, June 2014.

Transcription

  1. 1. Accessing and Using Big Data to Advance Social Science Knowledge Eric Meyer, Ralph Schroeder, Linnet Taylor and Josh Cowls
  2. 2. Overview • Project Introduction (Eric) – Papers – Conferences and outreach – Data sources • Big data theory, definition, limits to science and its uses (Ralph) • Case study: using big data to gauge public opinion (Josh)
  3. 3. Accessing and Using Big Data to Advance Social Science Knowledge • October 2012 – August 2014 • Funded by the Alfred P. Sloan Foundation • Data sources • 120+ interviews, mainly with social scientists • Workshops and conferences • No representative sample, but some patterns of disciplinary and skills background and career trajectory
  4. 4. Project scope to date • Economics • Business (Nemode project with Greg, Monica) • Wikipedia • Social Media • Development • Political Science • Causation and Theory • Policy • (20+ blog posts on a range of other issues)
  5. 5. Outcomes and outreach • A series of workshops hosted at the Oxford Internet Institute: – 2013, March. Big Data: Rewards and Risks for the Social Sciences. Oxford Internet Institute, Oxford. http://www.oii.ox.ac.uk/events/?id=557 – 2013, March. Qualitative methods to study data practices: An international comparative workshop. Oxford Internet Institute, Oxford – 2013, January. Towards a Sociology of Data. Oxford Internet Institute, Oxford. http://www.oii.ox.ac.uk/events/?id=560 – 2014, May. Big data and social change in the developing world, Rockefeller Foundation Bellagio Centre conference – 2014, August. American Political Science Association Panel on ‘Big Data and Political Science’ • MSc option course ‘Big Data and Society’
  6. 6. Definition • ‘Big data’ – the advance of knowledge via a leap in the scale and scope in relation to a given object or phenomenon • ‘Data’ – Belongs to the object – ‘taking…before interpreting’ (Ian Hacking) • the view that ‘all data are of their nature interpreted’ is misleading: ‘data are made, but as a good first approximation, the making and taking come before interpreting’ – The most atomizable useful unit of analysis
  7. 7. Digital Objects and their Referents Digital Object (Examples: Twitter, Tesco Loyalty card information Real World (People / Physical Objects) Represent / Manipulate
  8. 8. Representing Manipulating Limits Digital Data
  9. 9. Uses and Limits • Big data research uses (academic, commercial, government) are limited to the exploitation of suitable objects, and the objects which ‘give off’ digital data, and the phenomena they lay bare, are limited • The knowledge produced is aimed at ‘sorting people’ and advancing ‘representing and intervening’ (but without ‘manipulating’, except where this is warranted by practical economic and political objectives) • Difference commercial versus academic world is that knowledge provides competitive and practical advantage as against advancing (high-consensus rapid-discovery) knowledge – The limits in both cases are the objects (to which the data ‘belong’), and that need to have available digitally manipulable data points • How available these objects are differs, but also… – Causation and theoretical embedding matters for academic social science – For commercial (and non-academic uses), ‘predicting’ consumer choices and other behaviours, for limited purposes and without increasing scientific knowledge, is good enough • There are many objects, for non-academics and scientists to humanities scholars (physical, human, cultural), but they are not infinite • This availability, not skills or other issues, determines the future of big data research
  10. 10. (Big) data definition enables pinpointing impacts and threats • ‘Google Plus may not be much of a competitor to Facebook as a social network, but…some analysts…say that Google understands more about people’s social activity than Facebook does.’ – New York Times, 15.2. 2014, p. A1 ‘The Plus in Google Plus? It’s Mostly for Google’. • Facebook Likes: ‘Predicting users’ individual attributes and preferences can beused to improve numerous products and services. For instance, digital systems and devices (such as online stores or cars) could be designed to adjust their behavior to best fit each user’s inferred profile…online insurance…advertisements might emphasize security when facing emotionally unstable (neurotic) users but stress potential threats when dealing with emotionally stable ones’ – ‘Private traits and attributes are predictable from digital records of human behavior.’ Kosinski M, Stillwell D, Graepel T.,Proc Natl Acad Sci 2013 Apr 9;110(15):5802-5. • More powerful knowledge will enable better services, and more manipulation
  11. 11. Ethical and Social Issues in Big Data Research • Objects with ‘total’ knowledge (universes) – Danger is inferring behaviour not of individuals, but of classes of people • Asymmetry of knower and the subjects of knowledge is greater than elsewhere • Based not on individuals’ but on aggregate behaviour – Hence only utilitarian, not Kantian justification? • Why does prediction or uncovering laws of behaviour ‘grate’? • Benefits: greater scientific power and more specific details • Relation to smaller data? ‘Creep’ • Solution: ethical = greater researcher and public awareness, regulatory (would apply to academic researchers?) = prevent legal and specific harms
  12. 12. Outlook and Implications • There is an overlap between real world research and the world of academic research which is closer than elsewhere – because this is the research front in both – because they share common objects • For research – Develop theoretical frame in which to embed big data (for social media), including power/function, relation to traditional media, and role in society • For society – Awareness of how research can generate transparency and manipulability • Big Brother? – Yes, but also Brave New World of Omniscience, with Social Science as Handmaiden
  13. 13. Exemplar case: using big data to gauge public opinion Opportunities and challenges
  14. 14. The traditional model • The notion of public opinion was enlivened by the coffee houses of 17th Century Britain • Inferential statistics provided a rigorous, replicable basis for reporting public opinion, based on a random sample and MOE • Remained expensive, random sample difficult to construct, response rates dwindling
  15. 15. Using big data approaches • n=all: beyond the sample • Cheaper (after initial investment) • More granularity, more insight? Cost Utility Traditional inferential model Big data model
  16. 16. But challenges remain… • Representativeness • Reliability • Replicability
  17. 17. The challenge of representativeness ≠ Amber Boydstun: anyone who does a Twitter study has to really work hard, I've noticed, to justify why we should care about Twitter because Twitter is not representative of the United States population or the world population, right. And it's not. And even if it was representative, or even if we don't care that it's not representative, it's really hard to figure out in any given study whether you're getting an over- sampling of those users who are just more active than other users. Data from Dutton, W.H. and Blank, G., with Groselj, D. (2013) Cultures of the Internet: The Internet in Britain. Oxford Internet Survey 2013. Oxford Internet Institute, University of Oxford.
  18. 18. The challenge of reliability Mike Thelwall: really the big problem that we haven’t cracked is that if someone tweets a sentiment it’s not necessarily what they’re feeling, it can be for a variety of reasons, so it doesn’t really reflect directly what they feel necessarily … so it’s quite a stretch to say that if someone tweets, “I’m happy” that they’re actually happy, to give a simple example • Difficult to establish the meaning of latent messages • Platform specific behaviours (e.g. hashtags, likes) are not always understood • Political discourse often laced with sarcasm
  19. 19. The challenge of replicability • Social data is often proprietary – getting access can be difficult, expensive or impossible • Sometimes access is limited to output – analysis takes inside black box • Challenges basic Popperian assumption of falsifiability Nick Anstead: there are all these companies that do all this wonderful stuff, but actually as an academic researcher, using them is expensive … what do you actually get from working with these companies? Do you get raw data sets that you go and do stuff with yourself? More commonly, I would suggest, what you probably get is access to, sort of, a black box tool.
  20. 20. … and the implications aren’t just academic Shelton, T et al, ‘Mapping the data shadows of Hurricane Sandy: Uncovering the sociospatial dimensions of ‘big data’. Geoforum Volume 52, March 2014, pps 167-79
  21. 21. Project Papers Schroeder, Ralph (Forthcoming). ‘Big Data: Towards a More Scientific Social Science and Humanities’ in Mark Graham and William H Dutton (eds.), Society and the Internet: How Networks of Information are Changing our Lives. Forthcoming. Schroeder, Ralph, & Taylor, Linnet (Forthcoming). ‘Is bigger better? The emergence of big data as a tool for international development policy.’ GeoJournal. Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet (2013, August). ‘Big Data in the Study of Twitter, Facebook and Wikipedia: On the Uses and Disadvantages of Scientificity for Social Research.’ Paper presented at the proceedings of the Annual Meeting of the American Sociological Association. (being submitted) Schroeder, Ralph, & Taylor, Linnet. ‘Big Data and Wikipedia Research: Social Science Knowledge across Disciplinary Divides’. Submitted to Information, Communication and Society. Taylor, Linnet. ‘No place to hide? The ethics and analytics of tracking mobility using African mobile phone data. Submitted to Population, Space and Place. Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet. ‘Big Data in the Social Sciences: Towards a New Research Paradigm?’ (being submitted). Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet (2013, November). ‘The Boundaries of Big Data.’ Paper presented at SIG-SI Symposium, ASIST 2013, November 1-6, 2013, Montreal, Quebec, Canada. Schroeder, Ralph and Cowls, Josh. ‘Answering Questions and Questioning Answers in the Era of Big Data.’ In preparation. Taylor, Linnet, Meyer, Eric T., & Schroeder, Ralph. ‘Bigger and better, or more of the same? Emerging practices and perspectives on big data analysis in economics”. Forthcoming in Big Data & Society. Cowls, Josh. ‘The Crowd in the Cloud?’, forthcoming presentation and IPP 2014’ Cowls, Josh ‘Big Data and Policy Implementation’, in preparation. Schroeder, Ralph ‘Big Data and Policy Implications’, in preparation.

Description

'Brown bag' presentation at the Oxford Internet Institute, June 2014.

Transcription

  1. 1. Accessing and Using Big Data to Advance Social Science Knowledge Eric Meyer, Ralph Schroeder, Linnet Taylor and Josh Cowls
  2. 2. Overview • Project Introduction (Eric) – Papers – Conferences and outreach – Data sources • Big data theory, definition, limits to science and its uses (Ralph) • Case study: using big data to gauge public opinion (Josh)
  3. 3. Accessing and Using Big Data to Advance Social Science Knowledge • October 2012 – August 2014 • Funded by the Alfred P. Sloan Foundation • Data sources • 120+ interviews, mainly with social scientists • Workshops and conferences • No representative sample, but some patterns of disciplinary and skills background and career trajectory
  4. 4. Project scope to date • Economics • Business (Nemode project with Greg, Monica) • Wikipedia • Social Media • Development • Political Science • Causation and Theory • Policy • (20+ blog posts on a range of other issues)
  5. 5. Outcomes and outreach • A series of workshops hosted at the Oxford Internet Institute: – 2013, March. Big Data: Rewards and Risks for the Social Sciences. Oxford Internet Institute, Oxford. http://www.oii.ox.ac.uk/events/?id=557 – 2013, March. Qualitative methods to study data practices: An international comparative workshop. Oxford Internet Institute, Oxford – 2013, January. Towards a Sociology of Data. Oxford Internet Institute, Oxford. http://www.oii.ox.ac.uk/events/?id=560 – 2014, May. Big data and social change in the developing world, Rockefeller Foundation Bellagio Centre conference – 2014, August. American Political Science Association Panel on ‘Big Data and Political Science’ • MSc option course ‘Big Data and Society’
  6. 6. Definition • ‘Big data’ – the advance of knowledge via a leap in the scale and scope in relation to a given object or phenomenon • ‘Data’ – Belongs to the object – ‘taking…before interpreting’ (Ian Hacking) • the view that ‘all data are of their nature interpreted’ is misleading: ‘data are made, but as a good first approximation, the making and taking come before interpreting’ – The most atomizable useful unit of analysis
  7. 7. Digital Objects and their Referents Digital Object (Examples: Twitter, Tesco Loyalty card information Real World (People / Physical Objects) Represent / Manipulate
  8. 8. Representing Manipulating Limits Digital Data
  9. 9. Uses and Limits • Big data research uses (academic, commercial, government) are limited to the exploitation of suitable objects, and the objects which ‘give off’ digital data, and the phenomena they lay bare, are limited • The knowledge produced is aimed at ‘sorting people’ and advancing ‘representing and intervening’ (but without ‘manipulating’, except where this is warranted by practical economic and political objectives) • Difference commercial versus academic world is that knowledge provides competitive and practical advantage as against advancing (high-consensus rapid-discovery) knowledge – The limits in both cases are the objects (to which the data ‘belong’), and that need to have available digitally manipulable data points • How available these objects are differs, but also… – Causation and theoretical embedding matters for academic social science – For commercial (and non-academic uses), ‘predicting’ consumer choices and other behaviours, for limited purposes and without increasing scientific knowledge, is good enough • There are many objects, for non-academics and scientists to humanities scholars (physical, human, cultural), but they are not infinite • This availability, not skills or other issues, determines the future of big data research
  10. 10. (Big) data definition enables pinpointing impacts and threats • ‘Google Plus may not be much of a competitor to Facebook as a social network, but…some analysts…say that Google understands more about people’s social activity than Facebook does.’ – New York Times, 15.2. 2014, p. A1 ‘The Plus in Google Plus? It’s Mostly for Google’. • Facebook Likes: ‘Predicting users’ individual attributes and preferences can beused to improve numerous products and services. For instance, digital systems and devices (such as online stores or cars) could be designed to adjust their behavior to best fit each user’s inferred profile…online insurance…advertisements might emphasize security when facing emotionally unstable (neurotic) users but stress potential threats when dealing with emotionally stable ones’ – ‘Private traits and attributes are predictable from digital records of human behavior.’ Kosinski M, Stillwell D, Graepel T.,Proc Natl Acad Sci 2013 Apr 9;110(15):5802-5. • More powerful knowledge will enable better services, and more manipulation
  11. 11. Ethical and Social Issues in Big Data Research • Objects with ‘total’ knowledge (universes) – Danger is inferring behaviour not of individuals, but of classes of people • Asymmetry of knower and the subjects of knowledge is greater than elsewhere • Based not on individuals’ but on aggregate behaviour – Hence only utilitarian, not Kantian justification? • Why does prediction or uncovering laws of behaviour ‘grate’? • Benefits: greater scientific power and more specific details • Relation to smaller data? ‘Creep’ • Solution: ethical = greater researcher and public awareness, regulatory (would apply to academic researchers?) = prevent legal and specific harms
  12. 12. Outlook and Implications • There is an overlap between real world research and the world of academic research which is closer than elsewhere – because this is the research front in both – because they share common objects • For research – Develop theoretical frame in which to embed big data (for social media), including power/function, relation to traditional media, and role in society • For society – Awareness of how research can generate transparency and manipulability • Big Brother? – Yes, but also Brave New World of Omniscience, with Social Science as Handmaiden
  13. 13. Exemplar case: using big data to gauge public opinion Opportunities and challenges
  14. 14. The traditional model • The notion of public opinion was enlivened by the coffee houses of 17th Century Britain • Inferential statistics provided a rigorous, replicable basis for reporting public opinion, based on a random sample and MOE • Remained expensive, random sample difficult to construct, response rates dwindling
  15. 15. Using big data approaches • n=all: beyond the sample • Cheaper (after initial investment) • More granularity, more insight? Cost Utility Traditional inferential model Big data model
  16. 16. But challenges remain… • Representativeness • Reliability • Replicability
  17. 17. The challenge of representativeness ≠ Amber Boydstun: anyone who does a Twitter study has to really work hard, I've noticed, to justify why we should care about Twitter because Twitter is not representative of the United States population or the world population, right. And it's not. And even if it was representative, or even if we don't care that it's not representative, it's really hard to figure out in any given study whether you're getting an over- sampling of those users who are just more active than other users. Data from Dutton, W.H. and Blank, G., with Groselj, D. (2013) Cultures of the Internet: The Internet in Britain. Oxford Internet Survey 2013. Oxford Internet Institute, University of Oxford.
  18. 18. The challenge of reliability Mike Thelwall: really the big problem that we haven’t cracked is that if someone tweets a sentiment it’s not necessarily what they’re feeling, it can be for a variety of reasons, so it doesn’t really reflect directly what they feel necessarily … so it’s quite a stretch to say that if someone tweets, “I’m happy” that they’re actually happy, to give a simple example • Difficult to establish the meaning of latent messages • Platform specific behaviours (e.g. hashtags, likes) are not always understood • Political discourse often laced with sarcasm
  19. 19. The challenge of replicability • Social data is often proprietary – getting access can be difficult, expensive or impossible • Sometimes access is limited to output – analysis takes inside black box • Challenges basic Popperian assumption of falsifiability Nick Anstead: there are all these companies that do all this wonderful stuff, but actually as an academic researcher, using them is expensive … what do you actually get from working with these companies? Do you get raw data sets that you go and do stuff with yourself? More commonly, I would suggest, what you probably get is access to, sort of, a black box tool.
  20. 20. … and the implications aren’t just academic Shelton, T et al, ‘Mapping the data shadows of Hurricane Sandy: Uncovering the sociospatial dimensions of ‘big data’. Geoforum Volume 52, March 2014, pps 167-79
  21. 21. Project Papers Schroeder, Ralph (Forthcoming). ‘Big Data: Towards a More Scientific Social Science and Humanities’ in Mark Graham and William H Dutton (eds.), Society and the Internet: How Networks of Information are Changing our Lives. Forthcoming. Schroeder, Ralph, & Taylor, Linnet (Forthcoming). ‘Is bigger better? The emergence of big data as a tool for international development policy.’ GeoJournal. Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet (2013, August). ‘Big Data in the Study of Twitter, Facebook and Wikipedia: On the Uses and Disadvantages of Scientificity for Social Research.’ Paper presented at the proceedings of the Annual Meeting of the American Sociological Association. (being submitted) Schroeder, Ralph, & Taylor, Linnet. ‘Big Data and Wikipedia Research: Social Science Knowledge across Disciplinary Divides’. Submitted to Information, Communication and Society. Taylor, Linnet. ‘No place to hide? The ethics and analytics of tracking mobility using African mobile phone data. Submitted to Population, Space and Place. Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet. ‘Big Data in the Social Sciences: Towards a New Research Paradigm?’ (being submitted). Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet (2013, November). ‘The Boundaries of Big Data.’ Paper presented at SIG-SI Symposium, ASIST 2013, November 1-6, 2013, Montreal, Quebec, Canada. Schroeder, Ralph and Cowls, Josh. ‘Answering Questions and Questioning Answers in the Era of Big Data.’ In preparation. Taylor, Linnet, Meyer, Eric T., & Schroeder, Ralph. ‘Bigger and better, or more of the same? Emerging practices and perspectives on big data analysis in economics”. Forthcoming in Big Data & Society. Cowls, Josh. ‘The Crowd in the Cloud?’, forthcoming presentation and IPP 2014’ Cowls, Josh ‘Big Data and Policy Implementation’, in preparation. Schroeder, Ralph ‘Big Data and Policy Implications’, in preparation.

Plus De Contenu Connexe

Livres associés

Gratuit avec un essai de 30 jours de Scribd

Tout voir

Livres audio associés

Gratuit avec un essai de 30 jours de Scribd

Tout voir

×