Big data in social sciences
and humanities: from
epistemology to data power
Teresa Numerico
Dept. Philosophy, communication and
performing arts
University of Rome Three
teresa.numerico@uniroma3.it
Luiss - Media Politics and Democracy.
A Challenging Topic for Social Sciences
21-22 May 2015
Questionable Big data
examples: Ethical,
juridical, political and
social doubts
Facebook experiments, Google Flu Trends,
culturomics
Facebook experiment on
textual emotional contagion
• In June 2014 the journal PNAS published
the description of a Facebook
experiment measuring negative and
positive emotional contagion by
altering the news feeds of 689,003
English-language users
• The paper was written by Adam Kramer
(core data science team Facebook) and
two scholars in social sciences who
worked at the Dept. of Communication
and information science, Cornell
University
See Schroeder 2014 for a complete analysis of the
Facebook experiment
Informed consent
• There is a discussion about informed
consent of the people who were involved
in the experiment
• Users tested in the experiment did not
obtain any prior information or opt-out
opportunity
• Because Facebook is a company and not a
research institution, there was no need
to ask for any consent beyond that
already obtained in the service
agreement
• The defence of Facebook with respect to
this point is based on the fact that the
company always manipulates user
experience (Yarkoni 2014, boyd 2014)
IRB approval
• Because the research was conducted
independently by Facebook and Professor
Hancock had access only to results – and
not to any individual, identifiable data
at any time – Cornell University’s
Institutional Review Board concluded
that he was not directly engaged in
human research and that no review by the
Cornell Human Research Protection
Program was required
Press release, Cornell University, 30 June 2014:
http://mediarelations.cornell.edu/2014/06/30/media-statement-on-c
/
Data collection and
interpretations
• The collection of the data and their
interpretation raise not only ethical
and legal doubts but also
epistemological controversies.
• Positive and negative emotional words
were counted using the Linguistic Inquiry
and Word Count software (LIWC 2007), which
relies on a generic, univocal,
context-free definition of words, each judged
as positive or negative. The system
interprets posts by registering the presence
of positive or negative expressions
Kramer et al. 2014, passim
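The limits of context-free word counting can be illustrated with a minimal sketch. The word lists below are a toy lexicon invented for illustration, not the LIWC dictionary, and the function names are hypothetical; the sketch only assumes that a post is labelled by the mere presence of listed words.

```python
import re

# Toy lexicon: illustrative assumption, NOT the actual LIWC 2007 dictionary.
POSITIVE = {"happy", "good", "great", "love"}
NEGATIVE = {"sad", "bad", "awful", "hate"}

def count_emotion_words(post):
    """Count positive and negative words with no regard for context."""
    words = re.findall(r"[a-z']+", post.lower())
    positive = sum(1 for w in words if w in POSITIVE)
    negative = sum(1 for w in words if w in NEGATIVE)
    return positive, negative

def classify(post):
    """Label a post by the mere presence of positive or negative words."""
    positive, negative = count_emotion_words(post)
    return {"positive": positive > 0, "negative": negative > 0}

# Negation and irony are invisible to a presence-based count.
print(classify("I am not happy at all"))
print(classify("Oh great, another Monday"))
```

Both example posts are labelled positive although a human reader would likely read them as negative, which is the epistemological problem the slide points to.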
Technological determinism or
exploitation of a dominant
position?
• Prediction and manipulation are based on
the hypothesis that human behaviour is
stable and mechanically alterable
• No replication of the experiment
according to the standard scientific
methodology is possible
• The scientists involved in the
interpretation process, Jamie Guillory
and Jeffrey Hancock, had no control
over data acquisition
• However their reputations as social
scientists were used by the Facebook
team to validate their data science
research results
Social sciences:
representing while intervening
• According to Evelyn Fox Keller (1991), a
feminist philosopher of science, and to
Ian Hacking (1983, 1992), it is not
possible to represent something without
intervening and transforming it
• The Facebook experiment is a clear
example of a representation that needs
intervention: understanding the
emotional reactions of the human beings
who were the objects of
representation implied manipulating
them
• Scientists are like sorcerer's apprentices:
they describe emotional reactions while
inducing them during the experiment
Google Flu Trends (GFT)
failure
• GFT did not predict flu trends correctly:
its estimates were almost double the figures
from the Centers for Disease Control and
Prevention (CDC)
• Instability of the data
• Continuous changes in the search algorithms
that influenced the GFT data
• Unclear which indicators were adopted
• Impossible to repeat the experiments to
check the results
• Measurement systems impossible to analyse
• The risk of ‘red team’ attacks on the
monitored systems, attempting to manipulate
results for economic or political gain
Lazer et al. 2014
Facebook filter bubble
study
• Bakshy et al.: Exposure to ideologically diverse news and opinion on Facebook,
Science, 7 May 2015
• David Lumb: Why Scientists Are Upset About the Facebook Filter Bubble Study
• https://www.fastcompany.com/3046111/fast-feed/why-scientists-are-upset-over-the-facebook-filter-bubble-study
• Christian Sandvig: The Facebook “It’s Not Our Fault” Study
• http://socialmediacollective.org/2015/05/07/the-facebook-its-not-our-fault-study/
• Eli Pariser: Did Facebook’s Big New Study Kill My Filter Bubble Thesis?
• https://medium.com/backchannel/facebook-published-a-big-new-study-on-the-filter-bubble-here-s-what-it-says-ef31a292da95
• Zeynep Tufekci: How Facebook’s Algorithm Suppresses Content Diversity
(Modestly) and How the Newsfeed Rules Your Clicks
• https://medium.com/message/how-facebook-s-algorithm-suppresses-content-diversity-modestly-how-the-newsfeed-rules-the-clicks-b5f8a4bb7bab
• John Wihbey | May 7, 2015: Does Facebook drive political polarization? Data
science and research
http://journalistsresource.org/studies/society/social-media/facebook-political-polar
#
Facebook data science
and politics
• Winter Mason, 28/10/2014:
Politics and Culture on
Facebook in the 2014 Midterm
Elections
https://www.facebook.com/notes/facebook-data-science/politics-and-culture-on-facebook-in-the-2014-midterm-elections/10152598396348859
Epistemology and politics:
research and power
Changes in thinking about knowledge creation and
their consequences
Researching or spying?
• How to be a knowledge scientist after the Snowden
revelations? (Berendt, Büchler, Rockwell 2015, see
also van Dijck 2014)
• The digital humanist is losing innocence,
experiencing his/her own ‘Manhattan Project’
syndrome: there is no neutral technology
• Technologies are already oriented once they are
used in the research/battlefield
• An ethics of knowledge science is needed, but it is
very difficult if we disclaim responsibility for our
creations as soon as we invent them
• There is a power of data, not only because they
are never raw, not only because they are often
proprietary but also because they are used for
political reasons and every generic ‘neutral’
manipulation is a transformation of the observed
object with no way back
Knowing is transforming, AKA
Fox Keller's vision
• There is no such thing as pure science with bad applications
• Knowledge is action not only with respect to
power in society but also with respect to the
object of research
• After the knowledge process the object will
never be the same
• Language’s role in science is never
considered enough
• The evocative character of language and its
vague, ambiguous status introduce
uncontrolled leaps of meaning, metaphors
and pre-scientific arguments
Fox Keller 2011
Rhetoric of BD/1: Computers are
better problem solvers than humans
• It’s human nature to focus on the
problems […] where human skill and
ingenuity are most valuable. And it’s
normal human prejudice to undervalue
the problems [of] the domain where
data-driven intelligence really
shines. But […] what problems can
computers solve that we can’t? And
how, when we put that ability
together with human intelligence, can
we combine the two to do more than
either is capable of alone?
Nielsen, 2011, p. 255
Rhetoric of BD/2:
data-driven science
• Science is no longer guided by
interpretation, models and theory
• Science is “data-driven”, which - in the
BD jargon - means that there is no
interpretation and no theory prior to
the data, because the data supposedly make
sense by themselves
• But this is just rhetoric: in
order to find correlations among
data series you need to look for them by
choosing the right machine learning
algorithms, or you risk finding
correlations that are just random,
particularly with high dimensionality
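The risk of random correlations in high dimensionality can be made concrete with a small simulation. This is a sketch under stated assumptions: all series are independent Gaussian noise, and the function names, the seed and the sizes (20 data points, up to 1000 candidate series) are arbitrary choices for illustration.

```python
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def strongest_chance_correlation(n_series, n_points, seed=0):
    """Largest |r| between a random target and n_series unrelated random series."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best

# The more unrelated series are searched, the stronger the best "finding".
for n_series in (10, 100, 1000):
    print(n_series, round(strongest_chance_correlation(n_series, n_points=20), 2))
```

Because every series is pure noise, any strong correlation the search reports is an artefact of how many series were examined, not a property of the data.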
No BD without solid
replicable methodologies
• Machine-learning methods are a
valuable part of our toolkit in
understanding behavior, but we do not
yet understand the precise limits of
their applicability
• The biggest contributions before us
are not new algorithms or new social
theories but new methodologies for
decomposing hard questions in the
social sciences into a series of
robust analyses that are replicable
and composable
Raghavan 2014
BD can be useful provided we
understand the epistemological
implications
• According to Kitchin 2014a we
need to develop a “situated,
reflexive and contextually
nuanced epistemology” in order to
effectively use the methods in
social sciences and humanities
• But to understand the problematic
epistemological implications means
to reduce the rhetoric and
comprehend the savoir/pouvoir
(knowledge/power) relationships that are implied
in data-driven results
Let’s ask some final questions on
BD experiments and results
• Who owns the data?
• Who owns the machines on which the data
are processed?
• Who plans the algorithms to make sense
of the data (is the data scientist
working with or without the field
expert)?
• What do we consider as definite results
of the data-driven procedures?
• Who is going to take advantage of the
results?
• Is it possible to replicate the process,
on different machines with different
algorithms to be sure of the stability
of the results?
Bibliographic sources/1
• Berendt B., Büchler M., Rockwell G. (2015) “Is it research or is it
spying?” Pre-print of paper published in Künstliche Intelligenz
2015. (C) Springer, URL of this pre-print:
http://people.cs.kuleuven.be/~bettina.berendt/Papers/berendt_buechler_rockwell
• danah boyd (1 July 2014), “What does the Facebook experiment teach
us?”, in The Message, URL:
https://medium.com/message/what-does-the-facebook-experiment-teach-us-c858c08e287f
• Hacking I. (1983) Representing and Intervening, Cambridge University
Press, Cambridge
• Hacking I. (1992) “The self-vindication of the laboratory
sciences”, in: Pickering A. (ed.) Science as Practice and Culture,
University of Chicago Press, Chicago, pp. 29–64.
• Halevy A., Norvig P., Pereira F. (2009) “The unreasonable
effectiveness of data”, IEEE Intelligent Systems, March/April 2009,
vol. 24, no. 2, pp. 8-12,
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/35179.pdf
• Fox Keller E. (2010) The mirage of a space between nature and
nurture, Duke University Press, Durham & London.
• Kitchin R. (2014a) “Big Data, new epistemologies and paradigm
shifts”, in Big Data and Society, April-June 2014, 1-12.
• Kitchin, R. (2014). The Data Revolution. Big Data, Open Data, Data
Infrastructures & Their Consequences. London: Sage.
Bibliographic sources/2
• Kramer A.D.I. et al. (2014) “Experimental evidence of massive-scale
emotional contagion through social networks”, in PNAS, June 17,
2014, vol. 111, no. 24, 8788–8790,
www.pnas.org/cgi/doi/10.1073/pnas.1320040111
• Lazer D., Kennedy R., King G., Vespignani A. (2014) “The parable of
Google Flu: traps in Big Data analysis”, in Science, vol. 343, 14
March 2014, pp. 1203–1205.
• Leetaru, K. H. (5 September 2011). "Culturomics 2.0: Forecasting
Large-Scale Human Behavior Using Global News Media Tone In Time And
Space". First Monday 16 (9),URL:
http://firstmonday.org/ojs/index.php/fm/article/view/3663/3040#p7
• Licklider J.C.R. (1965): Libraries of the future, The MIT Press,
Cambridge, MA.
• Mayer-Schönberger V., Cukier K. (2013) Big Data. A revolution that
will transform how we live, work and think, Houghton Mifflin
Harcourt, Boston.
• Michel, J.B., Liberman Aiden, E. (14 Jan. 2011). "Quantitative
Analysis of Culture Using Millions of Digitized Books". Science 331
(6014): 176–182.
• Nielsen M. (2012) Reinventing discovery: the new era of networked
science, Princeton University Press, Princeton.
Bibliographic sources/3
• Porsdam H. (2013) “Digital Humanities: On Finding the Proper
Balance between Qualitative and Quantitative Ways of Doing Research
in the Humanities”, in Digital Humanities Quarterly 2013, Volume 7,
Number 3,
http://www.digitalhumanities.org/dhq/vol/7/3/000167/000167.html
• Raghavan P. (2014) “It’s time to scale the science in the social
sciences”, in Big Data and society, Apr-June 2014, pp.1-4.
• Schroeder R. (2014) “Big Data and the brave new world of social
media research”, in Big Data and Society, July-Dec 2014, 1-11,
bds.sagepub.com.
• Taylor Bob, oral interview, 1989,
http://conservancy.umn.edu/bitstream/107666/1/oh154rt.pdf
• Yarkoni T. (July 2014) “In defense of In Defense of Facebook”, in
[citation needed], URL:
http://www.talyarkoni.org/blog/2014/07/01/in-defense-of-in-defense-of-facebook/
• Van Dijck J. (2014) “Datafication, dataism and dataveillance: Big
Data between scientific paradigm and ideology”, in Surveillance and
Society, 2014, vol. 12(2), 197-208.
• Wiener, N. (1950): The Human Use of Human Beings. Houghton Mifflin,
Boston.

Contenu connexe

Tendances

Research Panel Wcet Oct 2009
Research Panel Wcet Oct 2009Research Panel Wcet Oct 2009
Research Panel Wcet Oct 2009Terry Anderson
 
Using Agent-Based Simulation to integrate micro/qualitative evidence, macro-q...
Using Agent-Based Simulation to integrate micro/qualitative evidence, macro-q...Using Agent-Based Simulation to integrate micro/qualitative evidence, macro-q...
Using Agent-Based Simulation to integrate micro/qualitative evidence, macro-q...Bruce Edmonds
 
Ecel2007 Social Learning
Ecel2007 Social LearningEcel2007 Social Learning
Ecel2007 Social Learningcplp
 
Effect of Multitasking on GPA - Research Paper
Effect of Multitasking on GPA - Research PaperEffect of Multitasking on GPA - Research Paper
Effect of Multitasking on GPA - Research PaperDivya Kothari
 
Esc 2016 crime as alternative pauwels
Esc 2016 crime as alternative pauwelsEsc 2016 crime as alternative pauwels
Esc 2016 crime as alternative pauwelsLieven J.R. Pauwels
 
Chapter 3 research design
Chapter 3 research designChapter 3 research design
Chapter 3 research designFaisal Pak
 
Chapter 1 Social Research
Chapter 1 Social ResearchChapter 1 Social Research
Chapter 1 Social Researcharpsychology
 
Opinion Dynamics on Networks
Opinion Dynamics on NetworksOpinion Dynamics on Networks
Opinion Dynamics on NetworksMason Porter
 
Carma internet research module: Sampling for internet
Carma internet research module: Sampling for internetCarma internet research module: Sampling for internet
Carma internet research module: Sampling for internetSyracuse University
 
06 Network Study Design: Ethical Considerations and Safeguards
06 Network Study Design: Ethical Considerations and Safeguards06 Network Study Design: Ethical Considerations and Safeguards
06 Network Study Design: Ethical Considerations and Safeguardsdnac
 
ASSIGNMENT B - RT ready
ASSIGNMENT B - RT readyASSIGNMENT B - RT ready
ASSIGNMENT B - RT readyMilesWeb
 
Social Media Analytics: Concepts, Models, Methods, & Tools - Ravi Vatrapu
Social Media Analytics: Concepts, Models, Methods, & Tools - Ravi VatrapuSocial Media Analytics: Concepts, Models, Methods, & Tools - Ravi Vatrapu
Social Media Analytics: Concepts, Models, Methods, & Tools - Ravi VatrapuCBS Competitiveness Platform
 
#lak2013, Leuven, DC slides, #learninganalytics
#lak2013, Leuven, DC slides, #learninganalytics#lak2013, Leuven, DC slides, #learninganalytics
#lak2013, Leuven, DC slides, #learninganalyticsSoudé Fazeli
 
The Relationship Between SNS Usage and Academic Performances: The Mediating R...
The Relationship Between SNS Usage and Academic Performances: The Mediating R...The Relationship Between SNS Usage and Academic Performances: The Mediating R...
The Relationship Between SNS Usage and Academic Performances: The Mediating R...carolzhu
 

Tendances (20)

Research Panel Wcet Oct 2009
Research Panel Wcet Oct 2009Research Panel Wcet Oct 2009
Research Panel Wcet Oct 2009
 
Using Agent-Based Simulation to integrate micro/qualitative evidence, macro-q...
Using Agent-Based Simulation to integrate micro/qualitative evidence, macro-q...Using Agent-Based Simulation to integrate micro/qualitative evidence, macro-q...
Using Agent-Based Simulation to integrate micro/qualitative evidence, macro-q...
 
Ecel2007 Social Learning
Ecel2007 Social LearningEcel2007 Social Learning
Ecel2007 Social Learning
 
Effect of Multitasking on GPA - Research Paper
Effect of Multitasking on GPA - Research PaperEffect of Multitasking on GPA - Research Paper
Effect of Multitasking on GPA - Research Paper
 
Esc 2016 crime as alternative pauwels
Esc 2016 crime as alternative pauwelsEsc 2016 crime as alternative pauwels
Esc 2016 crime as alternative pauwels
 
Chapter 3 research design
Chapter 3 research designChapter 3 research design
Chapter 3 research design
 
Research Traditions 2010
Research Traditions 2010Research Traditions 2010
Research Traditions 2010
 
Research Traditions
Research TraditionsResearch Traditions
Research Traditions
 
Chapter 1 Social Research
Chapter 1 Social ResearchChapter 1 Social Research
Chapter 1 Social Research
 
Internet-based research
Internet-based researchInternet-based research
Internet-based research
 
Opinion Dynamics on Networks
Opinion Dynamics on NetworksOpinion Dynamics on Networks
Opinion Dynamics on Networks
 
Carma internet research module: Sampling for internet
Carma internet research module: Sampling for internetCarma internet research module: Sampling for internet
Carma internet research module: Sampling for internet
 
Ch9
Ch9Ch9
Ch9
 
06 Network Study Design: Ethical Considerations and Safeguards
06 Network Study Design: Ethical Considerations and Safeguards06 Network Study Design: Ethical Considerations and Safeguards
06 Network Study Design: Ethical Considerations and Safeguards
 
ASSIGNMENT B - RT ready
ASSIGNMENT B - RT readyASSIGNMENT B - RT ready
ASSIGNMENT B - RT ready
 
Social Media Analytics: Concepts, Models, Methods, & Tools - Ravi Vatrapu
Social Media Analytics: Concepts, Models, Methods, & Tools - Ravi VatrapuSocial Media Analytics: Concepts, Models, Methods, & Tools - Ravi Vatrapu
Social Media Analytics: Concepts, Models, Methods, & Tools - Ravi Vatrapu
 
Science Data, Responsibly
Science Data, ResponsiblyScience Data, Responsibly
Science Data, Responsibly
 
Pick a Crowd
Pick a CrowdPick a Crowd
Pick a Crowd
 
#lak2013, Leuven, DC slides, #learninganalytics
#lak2013, Leuven, DC slides, #learninganalytics#lak2013, Leuven, DC slides, #learninganalytics
#lak2013, Leuven, DC slides, #learninganalytics
 
The Relationship Between SNS Usage and Academic Performances: The Mediating R...
The Relationship Between SNS Usage and Academic Performances: The Mediating R...The Relationship Between SNS Usage and Academic Performances: The Mediating R...
The Relationship Between SNS Usage and Academic Performances: The Mediating R...
 

En vedette

accidentInvestigationFFM2015
accidentInvestigationFFM2015accidentInvestigationFFM2015
accidentInvestigationFFM2015Teresa Valentin
 
ILMU KAYU ULTRA STRUKTUR KAYU
ILMU KAYU ULTRA STRUKTUR KAYUILMU KAYU ULTRA STRUKTUR KAYU
ILMU KAYU ULTRA STRUKTUR KAYUEDIS BLOG
 
Fedorova svetlana, contest winner!
Fedorova svetlana, contest winner! Fedorova svetlana, contest winner!
Fedorova svetlana, contest winner! bilimland
 
Презентація 1
Презентація 1Презентація 1
Презентація 1salger01
 
ΣΗΜΕΙΑ ΣΤΙΞΗΣ
ΣΗΜΕΙΑ ΣΤΙΞΗΣΣΗΜΕΙΑ ΣΤΙΞΗΣ
ΣΗΜΕΙΑ ΣΤΙΞΗΣEleni Kots
 
Управление инвестиционными проектами
Управление инвестиционными проектамиУправление инвестиционными проектами
Управление инвестиционными проектамиМКД Партнер
 
Innovation Mavericks meet Corporate Rebels
Innovation Mavericks meet Corporate RebelsInnovation Mavericks meet Corporate Rebels
Innovation Mavericks meet Corporate RebelsPeter Vander Auwera
 
第2回全国大会分科会(山浦)
第2回全国大会分科会(山浦)第2回全国大会分科会(山浦)
第2回全国大会分科会(山浦)human-edu
 
The Neural Basis for Creativity
The Neural Basis for CreativityThe Neural Basis for Creativity
The Neural Basis for CreativityPlanning-ness
 
A short story of Ken Little
A short story of Ken LittleA short story of Ken Little
A short story of Ken Littledekalb walcott
 
Building the TAD ecosystem
Building the TAD ecosystemBuilding the TAD ecosystem
Building the TAD ecosystemAlan Quayle
 

En vedette (14)

accidentInvestigationFFM2015
accidentInvestigationFFM2015accidentInvestigationFFM2015
accidentInvestigationFFM2015
 
ILMU KAYU ULTRA STRUKTUR KAYU
ILMU KAYU ULTRA STRUKTUR KAYUILMU KAYU ULTRA STRUKTUR KAYU
ILMU KAYU ULTRA STRUKTUR KAYU
 
Fedorova svetlana, contest winner!
Fedorova svetlana, contest winner! Fedorova svetlana, contest winner!
Fedorova svetlana, contest winner!
 
Online Tools for Assessment & Treatment
Online Tools for Assessment & TreatmentOnline Tools for Assessment & Treatment
Online Tools for Assessment & Treatment
 
Ispring & edmodo
Ispring & edmodoIspring & edmodo
Ispring & edmodo
 
Презентація 1
Презентація 1Презентація 1
Презентація 1
 
ΣΗΜΕΙΑ ΣΤΙΞΗΣ
ΣΗΜΕΙΑ ΣΤΙΞΗΣΣΗΜΕΙΑ ΣΤΙΞΗΣ
ΣΗΜΕΙΑ ΣΤΙΞΗΣ
 
Управление инвестиционными проектами
Управление инвестиционными проектамиУправление инвестиционными проектами
Управление инвестиционными проектами
 
Innovation Mavericks meet Corporate Rebels
Innovation Mavericks meet Corporate RebelsInnovation Mavericks meet Corporate Rebels
Innovation Mavericks meet Corporate Rebels
 
第2回全国大会分科会(山浦)
第2回全国大会分科会(山浦)第2回全国大会分科会(山浦)
第2回全国大会分科会(山浦)
 
Presentation on UCT MOOCs to UWC's Public Health workshop
Presentation on UCT MOOCs to UWC's Public Health workshopPresentation on UCT MOOCs to UWC's Public Health workshop
Presentation on UCT MOOCs to UWC's Public Health workshop
 
The Neural Basis for Creativity
The Neural Basis for CreativityThe Neural Basis for Creativity
The Neural Basis for Creativity
 
A short story of Ken Little
A short story of Ken LittleA short story of Ken Little
A short story of Ken Little
 
Building the TAD ecosystem
Building the TAD ecosystemBuilding the TAD ecosystem
Building the TAD ecosystem
 

Similaire à Big data luiss

Internet Research Ethics CSSWS2015 Tutorial
Internet Research Ethics CSSWS2015 TutorialInternet Research Ethics CSSWS2015 Tutorial
Internet Research Ethics CSSWS2015 TutorialKa_Kinder
 
paradigms-190305093939 (1).pdf
paradigms-190305093939 (1).pdfparadigms-190305093939 (1).pdf
paradigms-190305093939 (1).pdfssuser31c469
 
1 Nature of Inquiry and Research.pptx
1 Nature of Inquiry and Research.pptx1 Nature of Inquiry and Research.pptx
1 Nature of Inquiry and Research.pptxCharizaPitogo2
 
2-kinds-and-importance-of-research.pptx
2-kinds-and-importance-of-research.pptx2-kinds-and-importance-of-research.pptx
2-kinds-and-importance-of-research.pptxJenniferApollo
 
Research paradigms ii (1)
Research paradigms ii (1)Research paradigms ii (1)
Research paradigms ii (1)Amina Tariq
 
Aronson 6e ch2_research
Aronson 6e ch2_researchAronson 6e ch2_research
Aronson 6e ch2_researchmrkramek
 
Pet735 week 4 presentation
Pet735 week 4 presentationPet735 week 4 presentation
Pet735 week 4 presentationaemachamer
 
Pet735 week 4 pres.
Pet735 week 4 pres. Pet735 week 4 pres.
Pet735 week 4 pres. aemachamer
 
Week 12 13 march 3rd 2012
Week 12 13 march 3rd 2012Week 12 13 march 3rd 2012
Week 12 13 march 3rd 2012icleir_
 
Learning Analytics – Ethical questions and dilemmas
Learning Analytics  – Ethical questions and dilemmasLearning Analytics  – Ethical questions and dilemmas
Learning Analytics – Ethical questions and dilemmasTore Hoel
 
SOCIAL RESEARCH.pptx
SOCIAL RESEARCH.pptxSOCIAL RESEARCH.pptx
SOCIAL RESEARCH.pptxrupasi13
 
Big data, new epistemologies and paradigm shifts
Big data, new epistemologies and paradigm shiftsBig data, new epistemologies and paradigm shifts
Big data, new epistemologies and paradigm shiftsrobkitchin
 
Evolving and emerging scholarly communication services in libraries: public a...
Evolving and emerging scholarly communication services in libraries: public a...Evolving and emerging scholarly communication services in libraries: public a...
Evolving and emerging scholarly communication services in libraries: public a...Claire Stewart
 
QE. Strength of Ties under conditions of anonymity
QE. Strength of Ties under conditions of anonymityQE. Strength of Ties under conditions of anonymity
QE. Strength of Ties under conditions of anonymityHerbert Eng
 

Similaire à Big data luiss (20)

Internet Research Ethics CSSWS2015 Tutorial
Internet Research Ethics CSSWS2015 TutorialInternet Research Ethics CSSWS2015 Tutorial
Internet Research Ethics CSSWS2015 Tutorial
 
Social physics
Social physicsSocial physics
Social physics
 
Social media & research
Social media & researchSocial media & research
Social media & research
 
paradigms-190305093939 (1).pdf
paradigms-190305093939 (1).pdfparadigms-190305093939 (1).pdf
paradigms-190305093939 (1).pdf
 
Paradigms
ParadigmsParadigms
Paradigms
 
1 Nature of Inquiry and Research.pptx
1 Nature of Inquiry and Research.pptx1 Nature of Inquiry and Research.pptx
1 Nature of Inquiry and Research.pptx
 
2-kinds-and-importance-of-research.pptx
2-kinds-and-importance-of-research.pptx2-kinds-and-importance-of-research.pptx
2-kinds-and-importance-of-research.pptx
 
Research paradigms ii (1)
Research paradigms ii (1)Research paradigms ii (1)
Research paradigms ii (1)
 
Aronson 6e ch2_research
Aronson 6e ch2_researchAronson 6e ch2_research
Aronson 6e ch2_research
 
Pet735 week 4 presentation
Pet735 week 4 presentationPet735 week 4 presentation
Pet735 week 4 presentation
 
Pet735 week 4 pres.
Pet735 week 4 pres. Pet735 week 4 pres.
Pet735 week 4 pres.
 
Week 12 13 march 3rd 2012
Week 12 13 march 3rd 2012Week 12 13 march 3rd 2012
Week 12 13 march 3rd 2012
 
Culture Design 101
Culture Design 101Culture Design 101
Culture Design 101
 
Learning Analytics – Ethical questions and dilemmas
Learning Analytics  – Ethical questions and dilemmasLearning Analytics  – Ethical questions and dilemmas
Learning Analytics – Ethical questions and dilemmas
 
Lo1 Research & Statistics
Lo1 Research & StatisticsLo1 Research & Statistics
Lo1 Research & Statistics
 
SOCIAL RESEARCH.pptx
SOCIAL RESEARCH.pptxSOCIAL RESEARCH.pptx
SOCIAL RESEARCH.pptx
 
Big data, new epistemologies and paradigm shifts
Big data, new epistemologies and paradigm shiftsBig data, new epistemologies and paradigm shifts
Big data, new epistemologies and paradigm shifts
 
Evolving and emerging scholarly communication services in libraries: public a...
Evolving and emerging scholarly communication services in libraries: public a...Evolving and emerging scholarly communication services in libraries: public a...
Evolving and emerging scholarly communication services in libraries: public a...
 
classJan11.ppt
classJan11.pptclassJan11.ppt
classJan11.ppt
 
QE. Strength of Ties under conditions of anonymity
QE. Strength of Ties under conditions of anonymityQE. Strength of Ties under conditions of anonymity
QE. Strength of Ties under conditions of anonymity
 

Big data luiss

  • 1. Big data in social sciences and humanities: from epistemology to data power Teresa Numerico Dept. Philosophy, communication and performing arts University of Rome Three teresa.numerico@uniroma3.it Luiss - Media Politics and Democracy. A Challenging Topic for Social Sciences 21-22 May 2015
  • 2. Questionable Big data examples: Ethical, juridical, political and social doubts Facebook experiments, google flu trends, culturonomics
  • 3. Facebook experiment on textual emotional contagion • In June 2014 PNAS journal published the description of a Facebook experiment on measuring emotional negative and positive contagion by altering the news feed of 689,003 English users • The paper was written by Adam Kramer (core data science team Facebook) and two scholars in social sciences who worked at the Dept. of Communication and information science, Cornell University See Schroeder 2014 for a complete analysis of the Facebook experiment
  • 4. Informed consent • There is a discussion about informed consent of the people who were involved in the experiment • Users tested in the experiment did not obtain any prior information or opt-out opportunity • Because Facebook is a company and not a research institution there was no need to ask for any extra consent than that which is obtained in the service agreement • The defence of Facebook with respect to this point is based on the fact that the company always manipulates user experience (Yarkoni 2014, boyd 2014)
  • 5. IRB approval • Because the research was conducted independently by Facebook and Professor Hancock had access only to results – and not to any individual, identifiable data at any time – Cornell University’s Institutional Review Board concluded that he was not directly engaged in human research and that no review by the Cornell Human Research Protection Program was required Press release Cornell University 30 june 2014http ://mediarelations.cornell.edu/2014/06/30/media-statement-on-c /
  • 6. Data collection and interpretations • The collection of the data and their interpretations raises not only ethical and legal doubts but also epistemological controversies. • Positive and negative emotional words were counted using a linguistic inquiry and word count software (LIWC 2007) that implies the use of a generic, univocal, context free definition of words, judged as positive or negative. The system interprets posts by listing the presence of positive or negative expressions Kramer and al. 2014, passim
  • 7. Technological determinism or exploitation of a dominant position? • Prediction and manipulation are based on the hypothesis that human behaviour is stable and mechanically alterable • No replication of the experiment according to the standard scientific methodology is possible • No control on data acquisition from scientists that were involved in the interpretation process, Jamie Guillory and Jeffrey Hancock • However their reputations as social scientists were used by the Facebook team to validate their data science research results
  • 8. Social sciences: representing while intervening • According to Evelyn Fox Keller (1991), a feminist philosopher of science and to Ian Hacking (1983, 1992) it is not possible to represent something without intervening and transforming it • The Facebook experiment is a clear example of a representation that need intervention: understanding the emotional reactions of the human beings - which were the objects of representation - implied manipulating them • Scientists are like apprentice sorcerer: they describe emotional reactions, while inducing them during the experiment
  • 9. Google Flu Trends (GFT) failure • GFT did not give the right predictions on flu trends, their value almost doubled the data preview by the Center for disease control and prevention (CDC) • Instability of the data • Continuous changes in the search algorithms that influenced the GFT data • Not clear indicators adopted • Impossible to repeat experiments for controlling results • Measurement systems impossible to analyse • The risk of ‘red teams’ attack on the monitored systems, that attempt to manipulate results for economic or political gain Lazer and al. 2014
  • 10. Facebook filter bubble study • Bakshy et al. Exposure to ideologically diverse news and opinion on Facebook, Science, 7 may 2015 • David Lumb: Why Scientists Are Upset About the Facebook Filter Bubble Study • https://www.fastcompany.com/3046111/fast-feed/why-scientists-are-upset-over- the-facebook-filter-bubble-study • Christian Sandvig: The Facebook “It’s Not Our Fault” Study • http://socialmediacollective.org/2015/05/07/the-facebook-its-not-our-fault- study/ • Eli Pariser: Did Facebook’s Big New Study Kill My Filter Bubble Thesis? • https://medium.com/backchannel/facebook-published-a-big-new-study-on-the- filter-bubble-here-s-what-it-says-ef31a292da95 • Zeynep Tufekci: How Facebook’s Algorithm Suppresses Content Diversity (Modestly) and How the Newsfeed Rules Your Clicks • https://medium.com/message/how-facebook-s-algorithm-suppresses-content- diversity-modestly-how-the-newsfeed-rules-the-clicks-b5f8a4bb7bab • John Wihbey | May 7, 2015: Does Facebook drive political polarization? Data science and research http://journalistsresource.org/studies/society/social-media/facebook-political-polar #
  • 11. Facebook data science and politics • Winter Mason (28/10/2014): Politics and Culture on Facebook in the 2014 Midterm Elections • https://www.facebook.com/notes/facebook-data-science/politics-and-culture-on-facebook-in-the-2014-midterm-elections/10152598396348859
  • 12. Epistemology and politics: research and power Changes in thinking about knowledge creation and their consequences
  • 13. Researching or spying • How to be a knowledge scientist after the Snowden revelations? (Berendt, Büchler, Rockwell 2015, see also van Dijck 2014) • The digital humanist is losing innocence, experiencing his/her own ‘Manhattan Project’ syndrome: there is no neutral technology • Technologies are already oriented once they are used in the research/battle field • An ethics of knowledge science is needed, but it is very difficult to achieve if we disclaim responsibility for our creations as soon as we invent them • There is a power of data, not only because they are never raw, not only because they are often proprietary, but also because they are used for political reasons, and every generic ‘neutral’ manipulation is a transformation of the observed object with no way back
  • 14. Knowing is transforming, a.k.a. Fox Keller's vision • There is no such thing as pure science with merely bad applications • Knowledge is action, not only with respect to power in society but also with respect to the object of research • After the knowledge process the object will never be the same • The role of language in science is never considered enough • The evocative character of language and its vague, ambiguous status introduce uncontrolled leaps of meaning, metaphors, and pre-scientific arguments Fox Keller 2010
  • 15. Rhetoric of BD/1: Computers are better problem solvers than humans • It’s human nature to focus on the problems […] where human skill and ingenuity are most valuable. And it’s normal human prejudice to undervalue the problems [of] the domain where data-driven intelligence really shines. But […] what problems can computers solve that we can’t? And how, when we put that ability together with human intelligence, can we combine the two to do more than either is capable of alone? Nielsen, 2011, p. 255
  • 16. Rhetoric of BD/2: data-driven science • Science is no longer guided by interpretation, models and theory • Science is “data-driven”, which - in the BD jargon - means that there is no interpretation and no theory prior to the data, because the data are supposed to make sense by themselves • But this is just rhetoric: in order to find correlations among data series you need to look for them by choosing the right machine learning algorithms, or you risk finding correlations that are merely random, particularly when the data have high dimensionality
  • 17. No BD without solid replicable methodologies • Machine-learning methods are a valuable part of our toolkit in understanding behavior, but we do not yet understand the precise limits of their applicability • The biggest contributions before us are not new algorithms or new social theories but new methodologies for decomposing hard questions in the social sciences into a series of robust analyses that are replicable and composable Raghavan 2014
  • 18. BD can be useful provided we understand the epistemological implications • According to Kitchin 2014a we need to develop a “situated, reflexive and contextually nuanced epistemology” in order to use these methods effectively in social sciences and humanities • But understanding the problematic epistemological implications means reducing the rhetoric and comprehending the savoir/pouvoir (knowledge/power) relationships that are implied in data-driven results
  • 19. Let’s ask some final questions on BD experiments and results • Who owns the data? • Who owns the machines on which the data are processed? • Who designs the algorithms that make sense of the data (is the data scientist working with or without the field expert)? • What do we consider to be definitive results of the data-driven procedures? • Who is going to take advantage of the results? • Is it possible to replicate the process on different machines with different algorithms, to be sure of the stability of the results?
  • 20. Bibliographic sources/1 • Berendt B., Büchler M., Rockwell G. (2015) “Is it research or is it spying?” Pre-print of paper published in Künstliche Intelligenz 2015. (C) Springer, URL of this pre-print: http://people.cs.kuleuven.be/~bettina.berendt/Papers/berendt_buechler_rockwell • danah boyd (1 July 2014) “What does the Facebook experiment teach us?”, in The Message, URL: https://medium.com/message/what-does-the-facebook-experiment-teach-us-c858c08e287f • Hacking I. (1983) Representing and Intervening, Cambridge University Press, Cambridge. • Hacking I. (1992) “The self-vindication of the laboratory sciences”, in Pickering A. (ed.) Science as Practice and Culture, University of Chicago Press, Chicago, pp. 29–64. • Halevy A., Norvig P., Pereira F. (2009) “The unreasonable effectiveness of data”, IEEE Intelligent Systems, March/April 2009, vol. 24 n. 9, pp. 8-12, http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/35179.pdf • Keller Fox E. (2010) The mirage of a space between nature and nurture, Duke University Press, Durham & London. • Kitchin R. (2014a) “Big Data, new epistemologies and paradigm shifts”, in Big Data and Society, April-June 2014, 1-12. • Kitchin R. (2014) The Data Revolution. Big Data, Open Data, Data Infrastructures & Their Consequences, Sage, London.
  • 21. Bibliographic sources/2 • Kramer A.D.I. et al. (2014) “Experimental evidence of massive-scale emotional contagion through social networks”, in PNAS, June 17, 2014, vol. 111, no. 24, 8788–8790, www.pnas.org/cgi/doi/10.1073/pnas.1320040111 • Lazer D., Kennedy R., King G., Vespignani A. (2014) “The parable of Google Flu: traps in Big Data analysis”, in Science, vol. 343, 14 March 2014, pp. 1203-1205. • Leetaru K. H. (5 September 2011) “Culturomics 2.0: Forecasting Large-Scale Human Behavior Using Global News Media Tone In Time And Space”, First Monday 16 (9), URL: http://firstmonday.org/ojs/index.php/fm/article/view/3663/3040#p7 • Licklider J.C.R. (1965) Libraries of the future, The MIT Press, Cambridge, MA. • Mayer-Schönberger V., Cukier K. (2013) Big Data. A revolution that will transform how we live, work and think, Houghton Mifflin Harcourt, Boston. • Michel J.B., Liberman Aiden E. (14 Jan. 2011) “Quantitative Analysis of Culture Using Millions of Digitized Books”, Science 331 (6014): 176–182. • Nielsen M. (2012) Reinventing discovery: the new era of networked science, Princeton University Press, Princeton.
  • 22. Bibliographic sources/3 • Porsdam H. (2013) “Digital Humanities: On Finding the Proper Balance between Qualitative and Quantitative Ways of Doing Research in the Humanities”, in Digital Humanities Quarterly 2013, Volume 7 Number 3, http://www.digitalhumanities.org/dhq/vol/7/3/000167/000167.html • Raghavan P. (2014) “It’s time to scale the science in the social sciences”, in Big Data and Society, Apr-June 2014, pp. 1-4. • Schroeder R. (2014) “Big Data and the brave new world of social media research”, in Big Data and Society, July-Dec 2014, 1-11, bds.sagepub.com. • Taylor Bob, oral interview, 1989, http://conservancy.umn.edu/bitstream/107666/1/oh154rt.pdf • Yarkoni T. (July 2014) “In defense of in defense of Facebook”, in [citation needed], URL: http://www.talyarkoni.org/blog/2014/07/01/in-defense-of-in-defense-of-facebook/ • Van Dijck J. (2014) “Datafication, dataism and dataveillance: Big Data between scientific paradigm and ideology”, in Surveillance and Society, 2014, vol. 12(2), 197-208. • Wiener N. (1950) The Human Use of Human Beings, Houghton Mifflin, Boston.
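
A minimal sketch of the warning on slide 16 that, with high dimensionality, correlations may be purely random. It is an illustration under stated assumptions, not part of the original talk: every series is independent Gaussian noise, and the function names `pearson` and `max_spurious_correlation` are hypothetical helpers introduced here. The more unrelated series we search through, the stronger the best "correlation" we are likely to find with a target that is itself noise.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def max_spurious_correlation(n_samples, n_features, seed=0):
    """Largest |correlation| between a noise target and n_features noise series.

    All series are independent by construction (an assumption of this sketch),
    so any correlation found is spurious.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_features):
        feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        best = max(best, abs(pearson(feature, target)))
    return best


if __name__ == "__main__":
    # Same 30 observations, increasingly many candidate series to search through.
    for n_features in (10, 100, 1000):
        print(n_features, round(max_spurious_correlation(30, n_features), 2))
```

With a fixed seed, the smaller runs search a prefix of the series used by the larger runs, so the best spurious correlation can only grow as more series are added. This is why a data-driven search needs a prior model, a correction for multiple comparisons, or a replication step before a correlation is treated as a result.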

Editor's notes

  1. To live effectively is to live with adequate information. Thus communication and control belong to the essence of man’s inner life, even as they belong to his life in society. The needs and the complexity of modern life make greater demands on this process of information than ever before, and our press, our museums, our scientific laboratories, our universities, our libraries and textbooks, are obliged to meet the needs of this process or fail their purpose (Wiener 1950: 18). Property rights in information suffer from the necessary disadvantage that a piece of information, in order to contribute to the general information of the community, must say something substantially different from the community’s previous common stock of information.
  2. The part of the fund of knowledge that interacts with nature during an experiment therefore is only that part that is stored inside the experimenter's head, plus the small amounts that come into his head from books he reads or from calls he makes to the library while his experiment is running, or that are implicit in the design of his experimental apparatus (Licklider 1965, pp. 22-23).
  3. The experiment raised a huge discussion about what scientific research on human behaviour ought to be. Various positions emerged, and we will discuss some of them, because this is a very interesting case that raises many epistemological, political and, of course, ethical issues. I am a philosopher of science and I am particularly interested in the epistemological part of the discussion, but I must underline that the ethical and research-policy issues cannot easily be separated from the rest of the epistemological discourse. I will start with a quick discussion of the ethical and legal issues, and then I will illustrate my position on the epistemological problems raised by the Facebook result.
  4. We should be aware that these social experiments are based on intervention as well as on representation
  5. As if computers were not organized and managed by humans. This just introduces another list of experts: those who are expert in machine learning algorithms instead of people who are expert in the field of the research, data scientists instead of humanities scholars or social scientists. See also Wiener: if responsibility is applied to machines they will answer with a tempest…. There is no way to stop the mechanism once it is in place.
  6. Vice president of engineering at Google