Big data luiss
1. Big data in social sciences
and humanities: from
epistemology to data power
Teresa Numerico
Dept. of Philosophy, Communication and
Performing Arts
University of Rome Three
teresa.numerico@uniroma3.it
Luiss - Media Politics and Democracy.
A Challenging Topic for Social Sciences
21-22 May 2015
2. Questionable Big data
examples: Ethical,
juridical, political and
social doubts
Facebook experiments, Google Flu Trends,
culturomics
3. Facebook experiment on
textual emotional contagion
• In June 2014 the journal PNAS published
the description of a Facebook experiment
measuring positive and negative emotional
contagion by altering the news feed of
689,003 English-language users
• The paper was written by Adam Kramer
(core data science team Facebook) and
two scholars in social sciences who
worked at the Dept. of Communication
and information science, Cornell
University
See Schroeder 2014 for a complete analysis of the
Facebook experiment
4. Informed consent
• There is a discussion about informed
consent of the people who were involved
in the experiment
• Users tested in the experiment did not
receive any prior information or opt-out
opportunity
• Because Facebook is a company and not a
research institution, there was no need
to ask for any consent beyond that
obtained in the service agreement
• The defence of Facebook with respect to
this point is based on the fact that the
company always manipulates user
experience (Yarkoni 2014, boyd 2014)
5. IRB approval
• Because the research was conducted
independently by Facebook and Professor
Hancock had access only to results – and
not to any individual, identifiable data
at any time – Cornell University’s
Institutional Review Board concluded
that he was not directly engaged in
human research and that no review by the
Cornell Human Research Protection
Program was required
Press release, Cornell University, 30 June 2014:
http://mediarelations.cornell.edu/2014/06/30/media-statement-on-c/
6. Data collection and
interpretations
• The collection of the data and their
interpretation raise not only ethical
and legal doubts but also
epistemological controversies.
• Positive and negative emotional words
were counted using the Linguistic Inquiry
and Word Count software (LIWC2007), which
implies a generic, univocal,
context-free definition of words, judged
as positive or negative. The system
interprets posts by registering the presence
of positive or negative expressions
Kramer et al. 2014, passim
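The epistemological limit of context-free word counting can be sketched in a few lines. This is only an illustration under stated assumptions: the two word lists are a hypothetical mini-lexicon, not the LIWC dictionaries, and the functions `count_emotion_words` and `label` are invented for this example rather than taken from the software used in the experiment.

```python
import re

# Hypothetical mini-lexicon, for illustration only. The real LIWC
# dictionaries are far larger and are not reproduced here.
POSITIVE = {"happy", "good", "great", "love"}
NEGATIVE = {"sad", "bad", "awful", "hate"}


def count_emotion_words(post):
    """Count lexicon hits in a post, ignoring all context."""
    words = re.findall(r"[a-z']+", post.lower())
    positive = sum(1 for w in words if w in POSITIVE)
    negative = sum(1 for w in words if w in NEGATIVE)
    return positive, negative


def label(post):
    """Label a post by comparing the two counts, nothing more."""
    positive, negative = count_emotion_words(post)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


posts = [
    "I am so happy today",             # plain positive statement
    "I am not happy at all",           # negation is invisible to the counter
    "Oh great, another awful Monday",  # irony: "great" cancels "awful"
]
labels = [label(p) for p in posts]
for post, result in zip(posts, labels):
    print(f"{result:8s} <- {post}")
```

The second post is labelled positive and the third neutral, because negation and irony are invisible to a counter that treats each word as having one fixed emotional value.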
7. Technological determinism or
exploitation of a dominant
position?
• Prediction and manipulation are based on
the hypothesis that human behaviour is
stable and mechanically alterable
• No replication of the experiment
according to the standard scientific
methodology is possible
• No control over data acquisition by the
scientists who were involved in the
interpretation process, Jamie Guillory
and Jeffrey Hancock
• However, their reputations as social
scientists were used by the Facebook
team to validate its data science
research results
8. Social sciences:
representing while intervening
• According to Evelyn Fox Keller (1991), a
feminist philosopher of science and to
Ian Hacking (1983, 1992) it is not
possible to represent something without
intervening and transforming it
• The Facebook experiment is a clear
example of a representation that needs
intervention: understanding the
emotional reactions of the human beings
- who were the objects of
representation - implied manipulating
them
• Scientists are like sorcerer's apprentices:
they describe emotional reactions while
inducing them during the experiment
9. Google Flu Trends (GFT)
failure
• GFT did not predict flu trends correctly:
its estimates were almost double the data
reported by the Centers for Disease Control and
Prevention (CDC)
• Instability of the data
• Continuous changes in the search algorithms
that influenced the GFT data
• Unclear indicators adopted
• Impossible to repeat experiments in order to
check results
• Measurement systems impossible to analyse
• The risk of ‘red team’ attacks on the
monitored systems, which attempt to manipulate
results for economic or political gain
Lazer et al. 2014
10. Facebook filter bubble
study
• Bakshy et al.: Exposure to ideologically diverse news and opinion on Facebook,
Science, 7 May 2015
• David Lumb: Why Scientists Are Upset About the Facebook Filter Bubble Study
https://www.fastcompany.com/3046111/fast-feed/why-scientists-are-upset-over-the-facebook-filter-bubble-study
• Christian Sandvig: The Facebook “It’s Not Our Fault” Study
http://socialmediacollective.org/2015/05/07/the-facebook-its-not-our-fault-study/
• Eli Pariser: Did Facebook’s Big New Study Kill My Filter Bubble Thesis?
https://medium.com/backchannel/facebook-published-a-big-new-study-on-the-filter-bubble-here-s-what-it-says-ef31a292da95
• Zeynep Tufekci: How Facebook’s Algorithm Suppresses Content Diversity
(Modestly) and How the Newsfeed Rules Your Clicks
https://medium.com/message/how-facebook-s-algorithm-suppresses-content-diversity-modestly-how-the-newsfeed-rules-the-clicks-b5f8a4bb7bab
• John Wihbey, 7 May 2015: Does Facebook drive political polarization? Data
science and research
http://journalistsresource.org/studies/society/social-media/facebook-political-polar#
11. Facebook data science
and politics
• Winter Mason, 28/10/2014:
Politics and Culture on
Facebook in the 2014 Midterm
Elections
https://www.facebook.com/notes/facebook-data-science/politics-and-culture-on-facebook-in-the-2014-midterm-elections/10152598396348859
13. Researching or spying
• How to be a knowledge scientist after the Snowden
revelations? (Berendt, Büchler, Rockwell 2015, see
also van Dijck 2014)
• The digital humanist is losing innocence,
experiencing his/her own ‘Manhattan Project’
syndrome: there is no neutral technology
• Technologies are already oriented once they are
used in the research/battle field
• An ethics of knowledge science is needed, but it is
very difficult to achieve if we disclaim responsibility
for our creations as soon as we invent them
• There is a power of data, not only because they
are never raw, not only because they are often
proprietary but also because they are used for
political reasons and every generic ‘neutral’
manipulation is a transformation of the observed
object with no way back
14. Knowing is transforming, AKA
Fox Keller's vision
• There is no ‘pure science’ as opposed to ‘bad applications’
• Knowledge is action, not only with respect to
power in society but also with respect to the
object of research
• After the knowledge process the object will
never be the same
• Language’s role in science is never
considered enough
• The evocative character of language and its
vague, ambiguous status introduce
uncontrolled leaps of meaning, metaphors
and pre-scientific arguments
Fox Keller 2010
15. Rhetoric of BD/1: Computers are
better problem solvers than humans
• It’s human nature to focus on the
problems […] where human skill and
ingenuity are most valuable. And it’s
normal human prejudice to undervalue
the problems [of] the domain where
data-driven intelligence really
shines. But […] what problems can
computers solve that we can’t? And
how, when we put that ability
together with human intelligence, can
we combine the two to do more than
either is capable of alone?
Nielsen, 2011, p. 255
16. Rhetoric of BD/2:
data-driven science
• Science is no longer guided by
interpretation, models and theory
• Science is “data-driven”, which - in the
BD jargon - means that there is no
interpretation and no theory prior to
data, because the data supposedly make
sense by themselves
• But this is just rhetoric: in order to
find correlations among data series you
have to look for them, choosing the right
machine learning algorithms, or you risk
finding correlations that are merely
random, particularly with high
dimensionality
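The point about random correlations in high-dimensional data can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: every series is independent Gaussian noise, so any correlation found is spurious by construction. The names `pearson` and `strongest_spurious_correlation` are invented for this example.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_correlation(n_series, n_points, seed=0):
    """Return the largest |r| between one noise series and n_series others.

    All series are independent random noise, so there is no real
    relationship to discover.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_series):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


# Same seed, so the 5 candidates of the first run are also the first
# 5 candidates of the second run; only the number of candidates grows.
few = strongest_spurious_correlation(n_series=5, n_points=20)
many = strongest_spurious_correlation(n_series=5000, n_points=20)
print(f"best |r| among    5 noise series: {few:.2f}")
print(f"best |r| among 5000 noise series: {many:.2f}")
```

The more candidate series are searched, the stronger the best "correlation" looks, although nothing connects the series. Some prior choice of what to look for, and how, is therefore unavoidable.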
17. No BD without solid
replicable methodologies
• Machine-learning methods are a
valuable part of our toolkit in
understanding behavior, but we do not
yet understand the precise limits of
their applicability
• The biggest contributions before us
are not new algorithms or new social
theories but new methodologies for
decomposing hard questions in the
social sciences into a series of
robust analyses that are replicable
and composable
Raghavan 2014
18. BD can be useful provided we
understand the epistemological
implications
• According to Kitchin 2014a we
need to develop a “situated,
reflexive and contextually
nuanced epistemology” in order to
use the methods effectively in
social sciences and humanities
• But understanding the problematic
epistemological implications means
reducing the rhetoric and
comprehending the savoir/pouvoir
(knowledge/power) relationships
which are implied in data-driven results
19. Let’s ask some final questions on
BD experiments and results
• Who owns the data?
• Who owns the machines on which the data
are processed?
• Who plans the algorithms to make sense
of the data (is the data scientist
working with or without the field
expert)?
• What do we consider as definite results
of the data-driven procedures?
• Who is going to take advantage of the
results?
• Is it possible to replicate the process,
on different machines with different
algorithms to be sure of the stability
of the results?
20. Bibliographic sources/1
• Berendt B., Büchler M., Rockwell G. (2015) “Is it research or is it
spying?”, pre-print of paper published in Künstliche Intelligenz
2015, (C) Springer, URL of this pre-print:
http://people.cs.kuleuven.be/~bettina.berendt/Papers/berendt_buechler_rockwell
• boyd d. (1 July 2014) “What does the Facebook experiment teach
us?”, in The Message, URL:
https://medium.com/message/what-does-the-facebook-experiment-teach-us-c858c08e287f
• Hacking I. (1983) Representing and Intervening, Cambridge University
Press, Cambridge.
• Hacking I. (1992) “The self-vindication of the laboratory
sciences”, in Pickering A. (ed.) Science as Practice and Culture,
University of Chicago Press, Chicago, pp. 29–64.
• Halevy A., Norvig P., Pereira F. (2009) “The unreasonable
effectiveness of data”, in IEEE Intelligent Systems, March/April 2009,
vol. 24 n. 2, pp. 8-12,
http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en//pubs/archive/35179.pdf
• Keller E. Fox (2010) The mirage of a space between nature and
nurture, Duke University Press, Durham & London.
• Kitchin R. (2014a) “Big Data, new epistemologies and paradigm
shifts”, in Big Data and Society, April-June 2014, 1-12.
• Kitchin R. (2014) The Data Revolution. Big Data, Open Data, Data
Infrastructures & Their Consequences, Sage, London.
21. Bibliographic sources/2
• Kramer A.D.I. et al. (2014) “Experimental evidence of massive-scale
emotional contagion through social networks”, in PNAS, June 17,
2014, vol. 111, no. 24, 8788–8790,
www.pnas.org/cgi/doi/10.1073/pnas.1320040111
• Lazer D., Kennedy R., King G., Vespignani A. (2014) “The parable of
Google Flu: traps in Big data analysis”, in Science, vol. 343, 14
March 2014, pp. 1203-1205.
• Leetaru K. H. (5 September 2011) “Culturomics 2.0: Forecasting
Large-Scale Human Behavior Using Global News Media Tone In Time And
Space”, in First Monday 16 (9), URL:
http://firstmonday.org/ojs/index.php/fm/article/view/3663/3040#p7
• Licklider J.C.R. (1965) Libraries of the future, The MIT Press,
Cambridge, MA.
• Mayer-Schönberger V., Cukier K. (2013) Big Data. A revolution that
will transform how we live, work and think, Houghton Mifflin
Harcourt, Boston.
• Michel J.B., Lieberman Aiden E. (14 Jan. 2011) “Quantitative
Analysis of Culture Using Millions of Digitized Books”, in Science 331
(6014): 176–182.
• Nielsen M. (2012) Reinventing discovery: the new era of networked
science, Princeton University Press, Princeton.
22. Bibliographic sources/3
• Porsdam H. (2013) “Digital Humanities: On Finding the Proper
Balance between Qualitative and Quantitative Ways of Doing Research
in the Humanities”, in Digital Humanities Quarterly 2013, Volume 7
Number 3,
http://www.digitalhumanities.org/dhq/vol/7/3/000167/000167.html
• Raghavan P. (2014) “It’s time to scale the science in the social
sciences”, in Big Data and Society, Apr-June 2014, pp. 1-4.
• Schroeder R. (2014) “Big Data and the brave new world of social
media research”, in Big Data and Society, July-Dec 2014, 1-11,
bds.sagepub.com.
• Taylor B. (1989) Oral interview,
http://conservancy.umn.edu/bitstream/107666/1/oh154rt.pdf
• Yarkoni T. (July 2014) “In defense of in defense of Facebook”, in
[citation needed], URL:
http://www.talyarkoni.org/blog/2014/07/01/in-defense-of-in-defense-of-facebook/
• Van Dijck J. (2014) “Datafication, dataism and dataveillance: big
data between scientific paradigm and ideology”, in Surveillance and
Society, 2014, vol. 12(2), 197-208.
• Wiener N. (1950) The Human Use of Human Beings, Houghton Mifflin,
Boston.
Editor's notes
To live effectively is to live with adequate information. Thus communication and control belong to the essence of man’s inner life, even as they belong to his life in society
The needs and the complexity of modern life make greater demands on this process of information than ever before, and our press, our museums, our scientific laboratories, our universities, our libraries and textbooks, are obliged to meet the needs of this process or fail their purpose
Wiener 1950: 18
Property rights in information suffer from the necessary disadvantage that a piece of information, in order to contribute to the general information of the community, must say something substantially different from the community’s previous common stock of information
The part of the fund of knowledge that interacts with nature during an experiment therefore is only that part that is stored inside the experimenter’s head, plus the small amounts that come into his head from books he reads or from calls he makes to the library while his experiment is running or that are implicit in the design of his experimental apparatus
Licklider 1965, pp. 22-23
The experiment raised a huge discussion about what scientific research on human behaviour ought to be. Various positions were taken, and we will discuss some of them, because this is a very interesting case that raises many epistemological, political and, of course, ethical issues. I am a philosopher of science and I am particularly interested in the epistemological part of the discussion, but I must underline that the ethical and research-policy issues cannot easily be separated from the rest of the epistemological discourse.
I will start with a quick discussion of the ethical and legal issues, and then I will illustrate my position on the epistemological problems raised by the Facebook results
We should be aware that these social experiments are based on intervention as well as on representation
This is said as if computers were not organized and managed by humans. They simply bring in another list of experts: those who are expert in machine learning algorithms instead of those who are expert in the field of the research, data scientists instead of humanities scholars or social scientists
See also Wiener: if responsibility is applied to machines they will answer with a tempest…. There is no way to stop the mechanism once it is in place