Figures of the Many - Quantitative Concepts for Qualitative Thinking
1. Figures of the Many
Quantitative Concepts for Qualitative Thinking
Bernhard Rieder
Universiteit van Amsterdam
Mediastudies Department
2. Context
Terms like "big data", "computational social science", "digital humanities",
"digital methods", etc. are receiving a lot of attention.
They point to a set of practices for knowledge production: data analysis,
visualization, modeling, etc.
Instead of a totalizing search for a "logic" of data analysis, we could
inquire into the vocabulary of analytical gestures that constitute the
practice of data analysis.
A twofold approach to methods:
☉ Engagement, development, application => digital methods
☉ Conceptual, historical, and political analysis and critique => software studies
3. This presentation
How do we talk about data? How do we analyze them? What is our frame
of thought? How do we go further in terms of imagination, expressivity?
☉ 1 / Confronting "the many"
☉ 2 / Two kinds of mathematics
☉ Objects and their properties => Statistics
☉ Objects and their relations => Graph theory
Engage the theory of knowledge (epistemology) mobilized in data analysis,
but through the actual techniques and not generalizing concepts.
4. What styles of reasoning?
Hacking (1991) building the concept of "style of reasoning" on A. C.
Crombie’s (1994) "styles of scientific thinking":
☉ postulation and deduction
☉ experiment and empirical research
☉ reasoning by analogy
☉ ordering by comparison and taxonomy
☉ statistical analysis of regularities and probabilities
☉ genetic development
What kind of reasoning are we mobilizing in data analysis?
Is the history of styles of reasoning simply intellectual progress, or
adaptation to a changing world, or co-constitutive of that world?
What is our world like?
7. "It is hard to believe that we still have to absorb the same types of
actors, the same number of entities, the same profiles of beings, and
the same modes of existence into the same types of collectives as
Comte, Durkheim, Weber, or Parson [sic], especially after science and
technology have massively multiplied the participants to be cooked in
the melting pot." (Latour 2005, 260)
8. The proliferation of actors and facilitation of transversal connectivity have
led to large and complex forms of socio-technical grouping and
structuring.
Forms of organization take the shape of (multi-sided) markets based
around technological platforms that facilitate transactions.
Social media use simple but flexible grammars of connectivity
(combination of point to point and list forms), exchange, and aggregation
that accommodate various practices and levels of scale.
The diversity of practices, contents, geographies, topologies, intensities,
motivations, etc. makes it hard to generalize and theorize dynamics of use.
1 / The many
10. At the same time, they
produce detailed data
traces that are highly
centralized and searchable.
11. Quality / quantity
"One of my favorite fantasies is a dialogue between Mills and Lazarsfeld in which the former
reads to the latter the first sentence of The Sociological Imagination: 'Nowadays men often
feel that their private lives are a series of traps.' Lazarsfeld immediately replies: 'How many
men, which men, how long have they felt this way, which aspects of their private lives
bother them, do their public lives bother them, when do they feel free rather than trapped,
what kinds of traps do they experience, etc., etc., etc.' If Mills succumbed, the two of them
would have to apply to the National Institute of Mental Health for a million-dollar grant to
check out and elaborate that first sentence. They would need a staff of hundreds, and when
finished they would have written Americans View Their Mental Health rather than The
Sociological Imagination, provided that they finished at all, and provided that either of them
cared enough at the end to bother writing anything." (Maurice Stein, cit. in Gitlin 1978)
Theory vs. empiricism, macro vs. micro, qualitative vs. quantitative, inductive vs.
deductive, associative vs. formalistic, etc.
The promise of data analysis tools, applied to exhaustive (and cheap) data, is to
bridge the gap, to allow zooming, "quali-quanti" (Latour 2010).
12. “facts and statistics collected together for reference or analysis. See also datum.
- Computing: the quantities, characters, or symbols on which operations are performed by a
computer, being stored and transmitted in the form of electrical signals and recorded on
magnetic, optical, or mechanical recording media.
- Philosophy: things known or assumed as facts, making the basis of reasoning or
calculation.” (Oxford American Dictionary)
Define: data
Reasoning (OAD): "think rationally", "use one's mind", "calculate", "make sense
of", "come to the conclusion", "judge", "persuade", etc.
Reasoning as "giving reasons" – what counts as a good reason? What counts as a
good argument? As a proof? What is "good" knowledge?
Reasoning as a series of techniques, e.g. science, engineering, etc.
13. Why does the astronaut step into the space shuttle?
14. A short history of reasoning the "more"
Commercial Capitalism (13th +)
calculating for trade, arithmetic, sharing risk and profit in long-distance commerce
Rise of the Nation State (17th +)
"art of the state", mercantilism, scientific revolution
Industrialization (19th +)
urbanization, scientific management, large bureaucracies
☉ Fibonacci, "Liber Abaci", Calculating with Arabic numerals (Pisa, 1202)
☉ Unknown, "Arte dell'Abbaco", Practical arithmetic (Venice, 1478)
☉ Pacioli, "Summa de arithmetica, geometria, proportioni et proportionalità", Double entry
bookkeeping (Venice, 1494)
☉ William Petty & John Graunt, Political Arithmetick (17th century)
☉ Hermann Conring & Gottfried Achenwall, Statistik (17th & 18th century)
☉ Adolphe Quetelet, Statistical regularities and the "average man" (19th century)
☉ Francis Galton & Karl Pearson, Public health and eugenics (late 19th century)
15. Liber Abaci, Fibonacci, 1202
Calculation for accounting,
money-changing, insurance,
lending, measurement, etc.
16. "Having proved that there die about 3,506 persons at Paris unnecessarily, to the
damage of France, we come next to compute the value of the said damage, and
of the remedy thereof, as follows, viz., the value of the said 3,506 at 60 livres
sterling per head, being about the value of Algier slaves (which is less than the
intrinsic value of people at Paris), the whole loss of the subjects of France in that
hospital seems to be 60 times 3,506 livres sterling per annum, viz., 210,360
livres sterling, equivalent to about 2,524,320 French livres." (Petty 1655)
17. The Assurance of Lives,
Charles Babbage, 1826
First life tables were
assembled in the 17th
century by John Graunt.
Babbage builds a machine
to produce tables faster.
18. Essai sur la statistique de
la population française,
Adolphe d'Angeville, 1836
population census, tax
register, house numbers, etc.
modern statistics, large
bureaucracies, quantitative
social sciences, etc.
19. Over the last centuries, scientific thinking has become the dominant way
of producing knowledge and making decisions in most societies.
Scientific thinking implies various styles of reasoning, different ways of
"giving reasons", different analytical gestures, etc.
Styles are intrinsically connected to our "lifeworld" (Husserl 1936).
Two diagnoses:
☉ Our lifeworld is changing in significant ways => "the many"
☉ We need new ways of making sense of it => data analysis
What is the style of data analysis? Its epistemology? One or many?
What are its techniques, its analytical gestures?
Some conclusions for part 1
20. 2 / Two kinds of mathematics
Can there be data analysis without math? No.
Does this imply epistemological commitments? Yes.
But there are choices, e.g. between:
☉ Confirmatory data analysis => deductive
☉ Exploratory data analysis (Tukey 1962) => inductive
There is a fast growing variety of analytical gestures focusing on large
numbers of formalized and classed objects.
21. 2 / Two kinds of mathematics
Statistics
Observed: objects and properties
Inferred: relations
Data representation: the table
Visual representation: quantity charts
Grouping: class (similar properties)
Graph theory
Observed: objects and relations
Inferred: structure
Data representation: the matrix
Visual representation: network diagrams
Grouping: clique (dense relations)
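The contrast between the two representations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the objects, property names, and relations below are invented and do not come from the data discussed in the talk.

```python
from collections import defaultdict

# Statistics: objects and their properties, stored as a table (one row per object).
# All names and values are invented for illustration.
table = [
    {"id": "a", "type": "photo", "likes": 12},
    {"id": "b", "type": "link", "likes": 3},
    {"id": "c", "type": "photo", "likes": 7},
    {"id": "d", "type": "link", "likes": 5},
]

# Grouping by class: objects that share a property value.
classes = defaultdict(list)
for row in table:
    classes[row["type"]].append(row["id"])

# Graph theory: objects and their relations, stored as an adjacency matrix.
nodes = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
index = {name: i for i, name in enumerate(nodes)}
matrix = [[0] * len(nodes) for _ in nodes]
for u, v in edges:
    matrix[index[u]][index[v]] = 1
    matrix[index[v]][index[u]] = 1  # the relation is treated as undirected


def is_clique(group):
    """Return True if every pair of nodes in the group is directly connected."""
    return all(
        matrix[index[u]][index[v]] == 1
        for i, u in enumerate(group)
        for v in group[i + 1:]
    )


print(dict(classes))
print(is_clique(["a", "b", "c"]), is_clique(["a", "b", "d"]))
```

The class groups objects by what they are; the clique groups them by how they connect.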
22. Facebook Page "ElShaheeed", June 2010 – June 2011, (Poell / Rieder, forthcoming)
7K posts, 700K users, 3.6M comments, 10M likes (tool: netvizz), work in progress!
23. New media platforms funnel practices into reduced and largely formal
"grammars of action" (Agre 1989); data is therefore very clean, very
complete, and very detailed.
Can be imported with great ease into standard packages that come with
many analytical gestures built in (R, Excel, SPSS, RapidMiner, etc.).
Tools are easy, concepts are hard.
Statistics
36. 2 / Two kinds of mathematics
Statistics
Observed: objects and properties
Inferred: relations
Data representation: the table
Visual representation: quantity charts
Grouping: class (similar properties)
Graph theory
Observed: objects and relations
Inferred: structure
Data representation: the matrix
Visual representation: network diagrams
Grouping: clique (dense relations)
37. 3 / The mathematics of structure
Graph theory has a long prehistory; social network analysis starts in the
1930s with Jacob Moreno's work.
Graph theory is "a mathematical model for any system involving a binary
relation" (Harary 1969); it makes relational structure calculable.
40. Network statistics
betweenness centrality
degree
Relational elements of graphs can
be represented as tables (nodes
have properties) and analyzed
through statistics.
Network statistics bridge the gap
between individual units and the
structural forms they are
embedded in.
This is currently an extremely
prolific field of research.
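To make the two measures named on this slide concrete, here is a minimal sketch in plain Python. The five-node graph is an invented example, and the betweenness function follows the commonly used Brandes approach for unweighted, undirected graphs; it is not the tooling used for the talk.

```python
from collections import deque

# Invented example graph as an adjacency list (undirected, unweighted).
adj = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c"],
    "e": ["c"],
}


def degree(adj):
    """Degree: the number of direct neighbours of each node."""
    return {v: len(neighbours) for v, neighbours in adj.items()}


def betweenness(adj):
    """Unnormalized betweenness: shortest paths between other nodes that pass through a node."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        order = []                         # nodes in order of distance from s
        preds = {v: [] for v in adj}       # predecessors on shortest paths
        paths = {v: 0 for v in adj}        # number of shortest paths from s
        dist = {v: -1 for v in adj}
        paths[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                       # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):          # accumulate dependencies, farthest first
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was visited from both sides.
    return {v: value / 2 for v, value in score.items()}


print(degree(adj))
print(betweenness(adj))
```

Both functions return one value per node, so the results can be placed in a table next to other node properties and analyzed statistically, which is the bridging move described above.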
42. Helpful: baseline sampling
Twitter's API offers a random 1% statuses/sample endpoint that does
not require privileged access.
It provides datasets for researching certain types of questions and allows us to
"contextualize" (baseline) other collections.
We (Gerlitz / Rieder 2013) explored 24 hours of the 1% sample and
captured 4,376,230 tweets, sent from 3,370,796 accounts, at an average
rate of 50.65 tweets per second, leading to about 1.3GB of uncompressed
and unindexed MySQL tables.
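The reported rate can be checked from the two counts above. The sketch below only redoes that arithmetic; the extrapolation to the full stream is an assumption that holds only if the endpoint really delivers a uniform 1% of all tweets.

```python
tweets = 4_376_230          # tweets captured in the 1% sample (from the slide)
accounts = 3_370_796        # distinct accounts in the sample (from the slide)
seconds = 24 * 60 * 60      # 24 hours of capture

rate = tweets / seconds
per_account = tweets / accounts

# Assumption: the sample is exactly 1% of the full stream.
estimated_full_stream = tweets * 100

print(f"{rate:.2f} tweets per second")
print(f"{per_account:.2f} tweets per account in the sample")
print(f"{estimated_full_stream:,} tweets per day if the sample is exactly 1%")
```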
43. A baseline provides reference points
Beware of averages in non-normal distributions! But the 1% sample is
sufficiently large to allow representative exploration of subsamples.
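Why averages mislead in skewed distributions can be shown with a toy example. The follower counts below are invented for illustration: seven ordinary accounts and one celebrity-scale account.

```python
from statistics import mean, median

# Invented follower counts: seven ordinary accounts and one very large one.
followers = [10, 12, 15, 20, 25, 30, 40, 1_000_000]

avg = mean(followers)
mid = median(followers)

# Share of accounts that fall below the mean.
share_below_avg = sum(f < avg for f in followers) / len(followers)

print(f"mean: {avg:,.0f}")
print(f"median: {mid}")
print(f"share of accounts below the mean: {share_below_avg:.1%}")
```

In this toy case a single account pulls the mean far above every other value, while the median stays close to what most accounts look like.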
We can qualify structures and individual elements with the help
of statistics and graph theory.
55. Conclusions
There is a lot of excitement about data analysis, but our understanding of
styles and analytical gestures is still very poor.
We need interrogation and critiques of methodology that are developed
from engagement and historical/conceptual investigation.
We need analytical gestures that are more closely tied to concepts from
the humanities and social sciences; exploration rather than confirmation.
Visualization and simpler tools are very interesting but require technical
and conceptual literacy to deliver more than illustrations.
This is probably not a fad.
56. "Incite, induce, deviate, make easy or difficult, enlarge or limit, render more or
less probable… These are the categories of power." (Deleuze 1986, 77)
57. Thank You
rieder@uva.nl
https://www.digitalmethods.net
http://thepoliticsofsystems.net
"Far better an approximate answer to the right
question, which is often vague, than an exact answer to
the wrong question, which can always be made precise.
Data analysis must progress by approximate answers, at
best, since its knowledge of what the problem really is will
at best be approximate." (Tukey 1962)
Editor's notes
An almost classic kind of reasoning about "more". Image: http://www.prweb.com/releases/information/digital/prweb509640.htm
Every one of us possesses a large number of objects, many of them computers.
People do a lot of different things on Twitter, Facebook, etc. – and just because you and your immediate vicinity seem to have coherent practices, this does not mean others have.
Anatomy of a tweet. https://twitter.com/ICIJorg/status/321585235491962880 https://api.twitter.com/1/statuses/show/321585235491962880.json
Very large scale systems on the one side, but highly concentrated data repositories on the other. The promise of data analysis is, of course, to use that data to make sense of all the complexity.
C. Wright Mills vs. Paul Lazarsfeld. Many people argue that we no longer need that grant, we already have the data.
Reasoning then guides practice. Description => decision-making.
"Why does the Astronaut step into the Space Shuttle?", does not seem like a sensible idea. What reasons are given that we do not think about astronauts as suicidal?
Cost-benefit analysis! How to price a life? (today: expected future earnings)
http://www.youtube.com/watch?v=zFl6p4D59AA http://www.videohippy.com/video/11216/Little-Britain-Computer-says-No Example: opening a bank account at ABN-Amro (credit rating) http://www.creditchecker.nl/ Questions: Should I give that person money? How much? At what interest rate?
Questions: I am the government, what should I do? Where should I invest? How does the economy work? Adolphe d'Angeville: Essai sur la Statistique de la Population Française, 1836 - Full document: http://www.europeana.eu/portal/record/03486/DE44EEC02EA9F56E94AD9D3BD077AB298A92514E.html
Making decisions: in particular on the interpersonal level!
Allows for all kinds of folding, combinations, etc. – Math is not homogeneous, but sprawling! Different forms of reasoning, different modes of aggregation. These are already analytical frameworks, different ways of formalizing.
http://www.facebook.com/ElShaheeed (Created by Wael Ghonim, considered to be a central place for the sparking of the Egyptian Revolution) http://apps.facebook.com/netvizz/ (tool used for extraction)
Simply plotting events is an analytical gesture. (=> pattern)
Changing scales, analytical gesture, "tame" large numbers and heighten visibility
Adding variables => allow for comparisons
Count per interval (here: day).
Different visuals, change counting interval, very different effect
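A minimal sketch of counting per interval, and of how the choice of interval changes the picture. The post dates are invented for illustration and are not the ElShaheeed data.

```python
from collections import Counter
from datetime import date

# Invented post dates, for illustration only.
posts = [
    date(2011, 1, 24),
    date(2011, 1, 25), date(2011, 1, 25),
    date(2011, 1, 28), date(2011, 1, 28), date(2011, 1, 28),
    date(2011, 2, 1),
]

# Count per day: each distinct date is one interval.
per_day = Counter(posts)

# Count per week: the key is (ISO year, ISO week number).
per_week = Counter(d.isocalendar()[:2] for d in posts)

for day, count in sorted(per_day.items()):
    print(day.isoformat(), "#" * count)
for week, count in sorted(per_week.items()):
    print(week, "#" * count)
```

The daily counts show a peak on one day; the weekly counts fold that peak into a single tall bar.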
But if we look at the number of posts published on the page, this is a very different picture! So we want to compare!
Find outliers and interesting moments not only in terms of values, but relationships between values.
Looking at "central tendencies" in data. When does it make sense? Here it does, because there is no power law.
What do the averages characterize here? Not much – there is no "typical" post.
Regression analysis is a statistical technique for estimating the relationships among variables (correlation). A probabilistic relationship: height and weight are correlated; if you are very tall, there is a good chance that you also weigh more. This is a statistical, not a deterministic, relationship. Erosion of determinism in the 19th century. Title: Recherches sur la population, les naissances, les décès, les prisons, les dépôts de mendicité, etc., dans le royaume des Pays-Bas, par M. A. Quételet, … 1827 http://gallica.bnf.fr/ark:/12148/bpt6k81568v.r=.langEN
Positive correlation, but it's not 1:1
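A positive but imperfect correlation can be illustrated with the height and weight example. The five data points are invented, and the function is a plain implementation of the Pearson correlation coefficient rather than the output of any particular statistics package.

```python
from math import sqrt

# Invented measurements for five people (illustration only).
heights_cm = [150, 160, 170, 180, 190]
weights_kg = [52, 58, 69, 74, 90]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r = pearson(heights_cm, weights_kg)
print(f"r = {r:.3f}")
```

A value close to, but below, 1 means taller people in this toy sample tend to weigh more, without height determining weight.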
And now to graph theory.
Forsythe and Katz, 1946 – "adjacency matrix", Moreno, 1934
Visualization is, again, one type of analysis. Which properties of the network are "made salient" by an algorithm? http://thepoliticsofsystems.net/2010/10/one-network-and-four-algorithms/ Models behind: spring simulation, simulated annealing (http://wiki.cns.iu.edu/pages/viewpage.action?pageId=1704113)
So, what can we do? Logistics are important, because they determine who can do what kind of research, requirements for groups, etc. 1% easy to handle for modern hardware; but for how long?
A platform that hosts many different practices, from interpersonal communication to mass-media-like outlets like Lady Gaga's account, which has 36M followers. But means or medians are still reference points!
We can of course produce descriptive statistics! Baselining allows us to make "drawing the line" more informed. Does not evacuate bias – there is no "view from nowhere" – but maybe more conscious.
Extend word lists (what am I missing?), account for refraction.
Compare
Larger roles of hashtags, not all are issue markers!
"All in all, this process resulted in the specification of nine centrality measures based on three conceptual foundations. Three are based on the degrees of points and are indexes of communication activity. Three are based on the betweenness of points and are indexes of potential for control of communication. And three are based on closeness and are indexes either of independence or efficiency." (Freeman 1979) What concepts are they based on?
Network metrics are highly dependent on individual variables.
There is no need to analyze and visualize a graph as a network. Characterize hashtags in relation to a whole (their role beyond my sample), better understand our fishing pole and the weight it carries. Tbt: throwback thursday
How do we interpret this: understand the platform, understand the context of the phenomenon, understand the algorithm, etc.
How do we interpret something like this?
Quantitative forms allow us to fill this with "content".