3. The problems with
either method
Traditional quantitative methods:
• data collection:
standardized collection of discourses risks hiding heterogeneity
• data treatment:
statistical comparison risks hiding divergences
Traditional qualitative methods:
• data collection:
risk of not being representative (beyond small controversies)
• data treatment:
problem of weighting different discourses
4. The problem with both
methods
rich data, small populations
large populations, poor data
Wide angle VS. telephoto
6. … is reified in social
theory
The collective self is not a simple epiphenomenon of
its morphological base, just as the individual self is
not a simple efflorescence of the nervous system.
For the collective self to appear, a sui generis synthesis
of individual selves has to be produced. This synthesis
creates a world of feelings, ideas and images that, once
they have come to life, follow their own laws.
Emile Durkheim, 1912
Les formes élémentaires de la vie
religieuse
7. Emergence
The emergent is unlike its components insofar as
these are incommensurable, and it cannot be reduced
to their sum or their difference (p. 412)
George Henry Lewes, 1875
Problems of Life and Mind
8. Cats and mice Jack Cohen, 2000
The Collapse of Chaos:
Discovering Simplicity in a Complex World
15. Diving in magma T. Venturini (2010)
Public Understanding of Science 19(3)
16. The Tarde vs Durkheim
controversy
Gabriel Tarde vs Emile Durkheim
17. Against emergence
It is surprising to see men of science, so ready to
repeat that nothing is ever created from nothing,
implicitly admitting (as if it were self-evident) that the
connections among different beings can become beings
themselves (p. 67)
Tarde, 1893
Monadologie et sociologie
18. Against emergence
Let us suppose for a moment that one of our human States, composed not of a few thousand
but of a few quadrillion or quintillion men, hermetically sealed and individually inaccessible
(a kind of China infinitely more populous still and more closed), were known to us
only through the data of its statisticians, whose figures, bearing on very
large numbers, would recur with extreme regularity. When a political or social
revolution, revealed to us by a sudden swelling or collapse
of some of these figures, occurred in this State, we might well be
certain that it was a fact caused by individual ideas and passions, yet we
would avoid losing ourselves in superfluous conjectures on the nature of these causes, the only true ones
but impenetrable, and the wisest course would seem to us to explain the abnormal figures
as best we could by ingenious comparisons with skilfully handled normal figures.
We would thus arrive at least at clear results and symbolic truths. Nevertheless, it
would be important to remind ourselves from time to time of the purely symbolic character of these
truths.
Tarde, 1893
Monadologie et sociologie
22. And then the web arrived…
<a href="http://www.medialab.sciencespo.fr/index.php"> click here </a>
23. And then the web arrived…
and Google with it
Brin, S., & Page, L. (1998).
The Anatomy of a Large-Scale Hypertextual Web Search Engine.
Computer Networks and ISDN Systems, 30(1-7), 107–117
24. Digital
traceability
Latour, B. (2007). Beware, your imagination leaves digital traces.
Times Higher Literary Supplement.
Owen Gingerich, the great historian of astronomy, spent a life-time retrieving all the
annotations of all the copies of Copernicus’s first edition. He could thus give a precise
meaning to the rather empty notion of “Copernican revolution” and could show which parts
of the book everyone had read and misinterpreted. Nowadays, any scientist can do the same
for each portion of each article he or she has published so long as the local library has
bought a good package of digital data banks. But what is more extraordinary is that any
journalist can do so as well for the latest Madonna video or the dirtiest rumour about Prince
Harry’s love affairs.
In other words, the former distinction between the circulation of facts and the
dissemination of opinions has been erased in such a way that they are both graduating to
the same type of visibility — not a small advantage if we wish to disentangle the mixture
of facts and opinions that has become our usual diet of information
25. Digital
traceability
Once you can get information as bores, bytes, modem, sockets,
cables and so on, you have actually a more material way of looking
at what happens in Society.
Virtual Society thus, is not a thing of the future, it’s the
materialisation, the traceability of society. It renders visible because
of the obsessive necessity of materialising information into cables,
into data.
Latour, B. 1998
“Thought Experiments in Social Science: from the Social
Contract to Virtual Society”
26. From digital
traceability …
Bruno Latour (1998), argued that the Web is mainly of importance to
social science insofar as it makes possible new types of descriptions of
social life. According to Latour, the social integration of the Web
constitutes an event for social science because the social link becomes
traceable in this medium. Thus, social relations are established in a
tangible form as a material network connection. We take Latour’s claim of
the tangibility of the social as a point of departure in our search (p. 342).
Rogers, R., and Marres, N. 2002
“French scandals on the Web, and on the streets:
A small experiment in stretching the limits of reported reality.”
Asian Journal of Social Science 30(2): 339-353.
27. … to digital
methods
The Internet is employed as a site of research for far more than just
online culture. The issue no longer is how much of society and
culture is online, but rather how to diagnose cultural change and
societal conditions with the Internet. The conceptual point of
departure for the research program is the recognition that the
Internet is not only an object of study, but also a source.
Rogers, R. 2009
The End of the Virtual: Digital Methods. Amsterdam
University Press.
41. How to search/query
Bisphenol
http://en.wikipedia.org/wiki/Bisphenol_A
Bisphenol heart diseases controversy
http://www.foxnews.com/health/2012/03/07/bpa-chemical-may-be-tied-to-heart-disease/
Bisphenol Melzer controversy
http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(12)60496-6/fulltext
BPA
Polycarbonate
endocrine disruptor
heart disease
David Melzer
Food and Drug Administration
coronary artery disease
Monica Lind
Jeremy Pearson
Steven Hentges
Polycarbonate Global Group
42. Subject-specific keywords
• Proper names
• Names of institutions
• Toponyms
• Scientific/technical terminology
• Scientific references
• …
43. Improving your
query
• Exploit linguistic differences
• Go advanced (use search fields)
• Limit the time span
• Use search operators
• “exact” / -exclude / ~synonyms / * / OR / AND
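As an illustration of the operators listed above, here is a minimal sketch of how a query could be assembled from subject-specific keywords. The function and parameter names (`build_query`, `exact`, `any_of`, `exclude`) are hypothetical, and the exact operator syntax is an assumption: support for each operator varies between search engines and changes over time.

```python
def quote(term):
    """Wrap multi-word terms in double quotes so they are matched as a phrase."""
    return f'"{term}"' if " " in term else term


def build_query(exact=(), any_of=(), exclude=()):
    """Compose a search query string from three groups of keywords.

    exact   -- phrases that must appear verbatim ("exact" operator)
    any_of  -- alternatives joined with OR
    exclude -- terms to remove from the results (-exclude operator)
    """
    parts = [f'"{phrase}"' for phrase in exact]
    if any_of:
        parts.append("(" + " OR ".join(quote(t) for t in any_of) + ")")
    parts.extend("-" + quote(t) for t in exclude)
    return " ".join(parts)


# Keywords taken from the Bisphenol example; "recipe" is an invented noise term.
query = build_query(
    exact=["bisphenol A"],
    any_of=["BPA", "heart disease"],
    exclude=["recipe"],
)
print(query)
```

Each added keyword or exclusion narrows the result set, which is the movement of reduction that a good query aims for.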
45. Whose data
is this?
• Proliferation of new devices, genres and formats for the documentation
of social life… explosion of digital technologies that enable people to
report and comment upon social life.
• Routine generation of data about social life as part of social life. ‘Social
media’ platforms… embed the process of social data generation in
everyday practices.
• Development of online platforms and tools for the analysis of digital
social data. These days, most online platforms come with ‘analytics’
attached: a set of tools and services facilitating the analysis of the data
generated by said platforms.
Marres, N. (2011).
Re-distributing Methods:
Interventions in Digital Social Research.
46. Redistribution of
research methods
• Methods as usual (ex. Andrew Abbott, )
The techniques used by digital platforms have been long used in social sciences.
• Big methods (ex. Newman et al, 2007)
Digital traceability increases the quantity of social data thereby demanding use of
mathematical techniques of analysis.
• Virtual methods (ex. Christine Hine, 2000, 2005)
Digital media transform the quality of social practices and therefore demand increased
efforts of observation and interpretation.
• Digital methods (ex. Richard Rogers, 2009)
Digital platforms have their own methods that need to be understood and re-purposed
for social research.
• Re-mediation of methods (ex. Noortje Marres, 2011)
The techniques used by digital platforms have been long used in social sciences, but are
radically transformed by the new context of their use.
Marres, N. (2011).
Re-distributing Methods:
Interventions in Digital Social Research.
(Axis: from less to more redistribution)
On the one hand, social sciences could use quantitative methods (surveys and statistics) to collect data on large populations, but the data they collected would necessarily be relatively poor and superficial. On the other hand, they could use qualitative methods (interviews, focus groups, observations) to collect rich and detailed data, but they were then forced to limit their investigation to small populations.
27/08/12
Social sciences could observe many things from far away (quantitative methods = wide angle) or take a close look at a few things (qualitative methods = telephoto). They could never maintain the span and the focus of their observation at the same time, nor change their focal length continuously.
Up until now, social sciences could not use natural experiments either, because this type of experiment requires detailed knowledge of a large number of subjects (Snow, for instance, had the complete map of London's water distribution system, which told him which water company served each specific household). Unfortunately, these two conditions (detailed knowledge and a large number of subjects) are seldom met together in social sciences. Since their foundation, social sciences have always had to deal with a sort of methodological strabismus.
To use another metaphor, this is what I call ‘Gulliver sociology’.
In the previous unit we learnt how difficult it is to study controversies. In this unit, we will discover that, luckily, there is at least one thing that can help us in this otherwise impossible mission and make the task of controversy mapping less hopeless.
Hop-o'-My-Thumb
But this situation started to change as soon as social scientists stopped considering media (and electronic media in particular) just as an object of study…
… and started considering them also as a possible source of data. Digital media have, in fact, a very interesting feature: all the interactions that they mediate become easily traceable and are often actually traced. Though these traces are not collected for the sake of social science (but for surveillance, marketing or technical optimization), they can nonetheless be exploited by social scientists, giving social sciences, for the first time in their history, access to plenty of data.
These data concern huge populations, as about one third of the world population has access to the Internet and about half of it owns a mobile phone. Digital media are spreading like an immense sheet of carbon paper, tracing social phenomena to an extent that has never been possible before. As a proof of concept, in the image in the slide Paul Butler showed how it is possible to generate a very detailed map of the world by mapping friendship connections on Facebook.
At the same time, these data are also as rich as the data collected with qualitative methods. As a proof of concept, see the documentary on the life of America Online (AOL) user 711391. Drawing on an accidental leak of AOL data, the documentary reports the complete three-month search history of this user. The sequence of her queries (and nothing else) allows disturbingly intimate access to the life of this “religious middle-aged and somewhat obese middle-aged lady from Houston Texas who is looking for a way to rejuvenate her sex life” (as we come to discover).
Most importantly, thanks to digital traceability it is now possible to collect data that are rich and cover large populations at the same time, as convincingly demonstrated by the famous Google study on the detection of flu epidemics.
In this study Google engineers identified the 45 search queries that best matched the flu curves released by the U.S. Centers for Disease Control (CDC). Then they combined the curves of these 45 queries and built an indicator that has an incredible mean correlation of 0.97 with CDC data.
The advantage is that, whereas the CDC needs about two weeks to collect and release the data on US flu epidemics, Google can calculate its indicator every day.
(Google has also made the same type of research available to anyone, on any subject, through Google Insights for Search and Google Correlate.)
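The logic of the flu study (rank queries by how well their curves match a reference curve, keep the best ones, combine them into one indicator) can be sketched in a few lines. This is a simplified illustration, not Google's actual pipeline: it assumes plain Pearson correlation and an unweighted sum, and all series and query names below are invented toy data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def top_queries(reference, query_series, k):
    """Return the k queries whose curves correlate best with the reference."""
    ranked = sorted(
        query_series,
        key=lambda q: pearson(reference, query_series[q]),
        reverse=True,
    )
    return ranked[:k]


def combined_indicator(query_series, selected):
    """Sum the selected query curves, period by period."""
    length = len(reference)
    return [sum(query_series[q][t] for q in selected) for t in range(length)]


# Toy data: a reference curve (standing in for official flu figures)
# and the volume of three hypothetical search queries over six periods.
reference = [1.0, 2.0, 4.0, 7.0, 5.0, 2.0]
query_series = {
    "flu symptoms": [10, 22, 41, 68, 52, 19],
    "fever remedy": [8, 15, 30, 50, 41, 20],
    "holiday flights": [30, 28, 12, 9, 14, 31],
}

selected = top_queries(reference, query_series, k=2)
indicator = combined_indicator(query_series, selected)
print(selected, round(pearson(reference, indicator), 3))
```

Note that most of the work lies in discarding the queries that do not match, not in collecting them.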
From the point of view of social science, the change is dramatic. For the first time, it is possible to start imagining methods having both a large scope and a detailed focus, thereby overcoming the limitations of both quantitative and qualitative methods. The image in the slide is a good proof of concept. In this map of the US blogosphere in 2006 realized by Ben Fry, it is possible to zoom out to see the big picture and observe large-scale patterns (like the fact that the less visible websites link to the more visible ones, but not the other way around: the so-called preferential attachment), but also to zoom in and observe each individual connection. A new generation of quali-quantitative methods therefore becomes possible …
This is a map of the digital tools and methods that we use at the médialab of Sciences Po. In this course (and in particular in the second semester) you will learn to use most of them.
… and it becomes possible to move from the sociology of Gulliver to the sociology of Alice (as you know, in her trip to Wonderland Alice can change her size at will by drinking a magical potion and eating a magical cookie).
The first challenge consists in taking the data mining metaphor seriously. Anyone who has ever visited a gold mine knows well that what is striking about this type of landscape is the feeling of absence that dominates it. Where a mountain is supposed to be, there is a huge hole instead. Describing mining as the act of collecting gold and other precious materials is mistaking the aim for the practice. 0.1% of mining is about collecting precious substances, 99.9% of it is about removing tons and tons of rocks, sand and earth. Gold is the product of such absence, what is left when everything else is gone. The same is true for information mining: it is not about collecting as much data as possible; it is about getting rid of most of it. This is important, because the current ‘data deluge’ ideology, obsessed as it is with the question of collecting, storing and exploiting data, forgets that the careful selection of data is the most important part of any scientific protocol.
An example will make our argument clear. The so-called Internet map is, to our knowledge, the largest publicly available map of the Web. As you can see, very little knowledge can be extracted from this map. All that we can see is that the Web is polarized by language (the color of the nodes) and that some nodes are (far) more connected than others (size of the nodes). None of this is a surprise.
Beautiful and breathtaking as they may be, maps of this kind are useless for research purposes. This is not data mining, this is compulsive hoarding: a syndrome that is growing more and more serious among the data deluge fans.
A good map of the Web is always limited in its ambition: it tries to represent a limited portion of the Web, and the better this portion is delimited, the better the map. The example shows an interesting map of the French political blogosphere, realized by Linkfluence (a research partner of the médialab).
Because the selection of the websites has been done carefully, it is possible to use this map as a research tool and discover, for example, that the extreme left and the extreme right occupy two very different positions in French online politics: the first being small, spread out and central; the second being massive, clustered and off-center.
0.1% of Web-crawling is about collecting relevant websites, 99.9% of it is about removing irrelevant ones. That is why the most important button in all the crawling tools that we develop at the médialab (in the slide you see the old Navicrawler and the soon-to-be-released Hyphe) is the one allowing the exclusion of a website from the corpus. Providing us with tools for filtering, delimiting and sieving data is the first contribution that we would like to have from CHI experts.
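To make the idea of the "exclude" button concrete, here is a minimal sketch of pruning a crawled corpus by domain. It does not reproduce the Navicrawler or Hyphe interfaces; the function names (`domain`, `prune`) and the `example.com` entry are hypothetical, and matching on the host name alone is a simplifying assumption.

```python
from urllib.parse import urlparse


def domain(url):
    """Return the host name of a URL, without a leading 'www.'."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def prune(urls, excluded_domains):
    """Keep only the URLs whose domain has not been excluded from the corpus."""
    excluded = set(excluded_domains)
    return [url for url in urls if domain(url) not in excluded]


# A tiny crawled corpus: three pages from the Bisphenol example
# plus one invented, irrelevant page.
crawled = [
    "http://en.wikipedia.org/wiki/Bisphenol_A",
    "http://www.foxnews.com/health/2012/03/07/bpa-chemical-may-be-tied-to-heart-disease/",
    "http://www.thelancet.com/journals/lancet/article/PIIS0140-6736(12)60496-6/fulltext",
    "http://www.example.com/unrelated-page",
]

corpus = prune(crawled, excluded_domains=["example.com"])
print(len(crawled), "->", len(corpus))
```

In practice the list of excluded domains grows as the researcher inspects the crawl, which is where the 99.9% of the work goes.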
The first skill is ‘searching’, that is to say using a search engine. This is, by far, the most common way of finding information on the Web. All of you have already used search engines countless times. And yet, it is important (and not only for the sake of controversy mapping) for you to understand the very specific movement of search engine querying. Contrary to what you may think, this movement should not aim at expansion (finding more information), but at reduction. The problem with search engines is not that they return too little information, but that they return too much (and most of it is not relevant). Improving one’s queries is therefore an effort to find more and more specific words capable of reducing the information returned by the search engine.
In fact, the movement just described needs to be made more precise. The aim of the search, of course, is not to reduce the quantity of information found, but to reduce the irrelevant information and increase the relevant information. This movement of concentration (or distillation) requires identifying a number of ‘specific keywords’ clearly focused on the subject of the research.
These subject-specific keywords can include proper names, names of institutions, toponyms, scientific/technical terminology, scientific references and in general all words or expressions that are not polysemic or vague.
And here is some further advice on how to improve your queries.
In order to understand the revolution brought by digital traceability to controversy mapping and, more generally, to social science, we have to go back to a famous study conducted by the British epidemiologist John Snow in the middle of the nineteenth century. John Snow was trying to understand the mechanisms of diffusion of cholera (one of the main causes of death in the UK). At the time, the dominant theory was that cholera was caused by pollution or a noxious form of “bad air”. Snow, however, criticized this theory and claimed instead that cholera germs were transported by infected water. Snow first tried to prove his theory by showing that one particularly severe cholera outbreak in London was centered around a particular water pump located in Broad Street. But how could he prove that this particular observation could be generalized to all cholera epidemics?
Snow, of course, could not prove his theory by direct experiments on human beings, and yet experimental evidence was exactly what he needed to convince the scientific community. Trying to solve this conundrum, Snow came up with the idea of the ‘natural experiment’. First of all, he observed that the mortality rate in different households was strongly correlated with the company that provided them with water. In particular, in the houses supplied by the Southwark Company the mortality was almost six times higher than in the houses supplied by the Lambeth Company.
But this proof was not sufficient, as other differences between the households could have explained the difference. Snow, however, had at his disposal the detailed map of the London water system and observed that the distribution networks of Southwark and Lambeth intermingled in central London. Since in these districts the households supplied by the two water companies were side by side, Snow could safely assume that all other conditions were equal. In other words, it was as if London's population had been divided randomly into an experimental group and a control group: a perfect experimental setting, except that Snow had not prepared it himself, but just found it in ‘nature’.
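The comparison at the heart of Snow's natural experiment is a ratio of mortality rates between two groups that differ only in their water supplier. The sketch below shows the arithmetic; the death and household counts are invented for illustration (chosen so that the ratio comes out at "almost six") and are not Snow's actual figures.

```python
def deaths_per_10k(deaths, houses):
    """Mortality rate expressed as deaths per 10,000 houses."""
    return 10_000 * deaths / houses


def rate_ratio(exposed, control):
    """Ratio between the mortality rates of two (deaths, houses) groups."""
    return deaths_per_10k(*exposed) / deaths_per_10k(*control)


# Hypothetical (deaths, houses) counts for two suppliers serving
# neighbouring households in the same districts.
company_a = (300, 10_000)  # stands in for the Southwark Company
company_b = (52, 10_000)   # stands in for the Lambeth Company

ratio = rate_ratio(company_a, company_b)
print(f"Mortality is {ratio:.1f} times higher for company A")
```

The ratio is only meaningful because the two groups are comparable in every other respect, which is exactly what the intermingled distribution networks guaranteed.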
One of the main differences between the natural sciences and the social sciences is that the latter cannot reproduce the phenomena that they study in the controlled setting of the laboratory. Social sciences cannot rely on controlled experiments to investigate collective dynamics (and this is why the comics in the slide are funny). But can social sciences at least employ natural experiments?