"Visual analysis of controversy in user-generated encyclopedias": A critical summary
1. Summarizing
Visual analysis of controversy in
user-generated encyclopedias
by Brandes, U., & Lerner, J. (2008). Information Visualization, 7(1),
34-48.
Diego Maranan
IAT 814
22 October 2009
2. About the article
• In a nutshell: contributing to the understanding of authorship
dynamics in collaboratively-edited articles on controversial
topics
• 16 pages
• 18 citations on Google Scholar
• University of Konstanz, Germany
• Social network analysis and graph layouts since 1999
3. Article contents
Why this paper matters 693 words
What’s been done before 635 words
Their unique contribution 228 words
How they acquire the data 307 words
How they parse and filter the data 121 words
How they mine and represent the data 3757 words
How they make the visualization interactive 362 words
Examples 724 words
Conclusion 329 words
4. Why this matters
• Wikipedia: reliability depends on its “neutrality”
• Neutrality depends on dynamics of author community
– Who contributes?
– What are the disputes?
– What are the roles of the authors?
• Roles
– Agree or disagree
– Regular or sporadic
– Contribute a lot or a little
– Revise or be revised
(“Reactionary” or “revolutionary”)
5. What’s been done before
• Wikipedia research
– Disagreement based on the number of reverts (Kittur et al 2007)
• NLP-based research on Web 2.0 applications
– Analysis of polarizing factors in user opinions (i.e., the reasons people like The Da
Vinci Code differ from the reasons people hate it) (Chen et al 2006)
– Identifying polarizing language and partitioning into opinion groups
(Nigam and Hurst 2004)
– Metrics for polarity + buzz + author dispersion (Glance et al 2005)
• Non-NLP
– Vocabulary used by opposing sides tends to be largely identical (Han &
Kamber 2006)
– Links carry less noisy information than text (Agrawal et al 2003)
– Simpler than NLP-based approaches
• Visualization-related
– Force-directed layout (equally-spacing nodes)
– Multidimensional scaling
7. Legend
• Opposition (to another node)
– Distance from other node
– Thickness of edge connecting to other node
• Involvement (number of edits made)
– Node area
– Number and aggregate thickness of edges
• Role (revisor vs. revisee)
– Luminance of edge (in relation to a particular author)
– Shape (in relation to all authors in general)
• Variance in edit frequency
– Node brightness
8. Ingredients:
Wikipedia XML dump of revision history file
This visualization uses the
following revision data:
• Timestamp
• Author name (or IP address)
• A flag to denote whether a
revision was a revert
… that's it.
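As a rough illustration of how little data is involved, the three fields can be pulled from a revision history with the standard library. This is a minimal sketch, not the authors' parser: the XML fragment is an invented toy shaped like a MediaWiki export (real dumps carry an XML namespace and many more fields), `extract_revisions` is a hypothetical helper name, and the revert flag is derived here under the assumption that a revert restores an earlier text verbatim.

```python
import xml.etree.ElementTree as ET

# Toy fragment shaped like a MediaWiki export; all content is invented.
SAMPLE = """
<page>
  <title>Example</title>
  <revision>
    <timestamp>2008-01-01T10:00:00Z</timestamp>
    <contributor><username>Alice</username></contributor>
    <text>A</text>
  </revision>
  <revision>
    <timestamp>2008-01-01T10:05:00Z</timestamp>
    <contributor><ip>192.0.2.1</ip></contributor>
    <text>B</text>
  </revision>
  <revision>
    <timestamp>2008-01-01T10:06:00Z</timestamp>
    <contributor><username>Alice</username></contributor>
    <text>A</text>
  </revision>
</page>
"""

def extract_revisions(xml_text):
    """Return (timestamp, author, is_revert) tuples in document order."""
    page = ET.fromstring(xml_text.strip())
    seen_texts = set()
    rows = []
    for rev in page.iter("revision"):
        timestamp = rev.findtext("timestamp")
        # Registered users carry <username>; anonymous edits carry <ip>.
        author = (rev.findtext("contributor/username")
                  or rev.findtext("contributor/ip"))
        text = rev.findtext("text") or ""
        # Assumption: a revision counts as a revert if it restores
        # an earlier text of the page verbatim.
        is_revert = text in seen_texts
        seen_texts.add(text)
        rows.append((timestamp, author, is_revert))
    return rows

revisions = extract_revisions(SAMPLE)
for row in revisions:
    print(row)
```

Everything downstream (who revises whom, conflict weights, layout) would be computed from tuples like these.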
9. How to get from XML dump to network visualization?
Three major problems
• Inferring who revises who
(data mining issue)
• Obtaining a metric for conflict between authors
(data mining issue)
• Meaningfully mapping the authors into a 2D network based on conflict
(visual representation issue)
Remaining problems are relatively easy
• Visually coding individual involvement and participation variance
• Visually coding conflict
• Presenting aggregate involvement (bar chart)
• Choosing and designing interactive features
10. Problem 1: Inferring who revises who
• You are probably revising a revision immediately previous to yours if
– A short amount of time has passed since the latest revision
– You revise several times in a row
– Your revision is a revert
• Example: Alice and Bob
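The three cues above can be combined into a single score. The sketch below is purely illustrative and is not the formula from the paper: the exponential decay, the time scale `tau`, the way edit streaks are combined, and the function name `revise_weight` are all assumptions made here.

```python
import math

def revise_weight(gap_seconds, is_revert, same_author_streak=1, tau=3600.0):
    """Hypothetical score in [0, 1] that a revision targets the one before it."""
    if is_revert:
        # A revert is treated as a certain revision of the previous edit.
        return 1.0
    # Shorter gaps give weights closer to 1; tau is an assumed time scale.
    base = math.exp(-gap_seconds / tau)
    # Several edits in a row by the same author raise the score (assumed form).
    return 1.0 - (1.0 - base) ** same_author_streak

# Alice edits one minute after Bob versus one day after Bob.
quick = revise_weight(60, is_revert=False)
slow = revise_weight(86400, is_revert=False)
print(quick > slow)
```

Under these assumptions, Alice's edit one minute after Bob's scores far higher than an edit made a day later, which matches the intuition on the slide.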
11. Problem 1: Inferring who revises who
Therefore…
• Assumption: authors promptly revise the revisions they are interested in
• Does not detect revisions where the revisor is revising old content (Assumption:
these are a minority compared to intensely conflictual revisions, which
happen very quickly)
• The probability that a revision by author u following a revision by author v
can be interpreted as "u is revising v" is approximated by
12. Problem 2: Obtaining a metric of conflict between authors
Agrawal et al (2003)
• Quoting an author from a previous Usenet post creates a quotation link
between authors (Link => relationship)
• Quotation link implies disagreement in controversies ("You said such-and-
such. Well, here's what I think.")
• Limited applicability; “assumes a single topic per posting and poster is ‘for’
or ‘against’ that standpoint” (Nigam)
13. Problem 2: Obtaining a metric for conflict between authors
Therefore…
• Authors revise only what they don't agree with. (They don't revise to support
a statement.)
• A revision between authors is a "vote" for conflict between authors
• The degree of conflict $d_{uv}$ for authors u, v is given by
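One simple reading of the "vote" idea, assumed here for illustration only and not taken from the paper, is to add up the revision weights in both directions for each pair of authors. The helper name `conflict_degrees` and the event tuples are hypothetical.

```python
from collections import defaultdict

def conflict_degrees(revise_events):
    """Sum revision weights per unordered author pair.

    revise_events: iterable of (revisor, revisee, weight) tuples.
    Returns a dict keyed by a sorted (author, author) tuple, so the
    degree for (u, v) is the same as for (v, u).
    """
    degrees = defaultdict(float)
    for revisor, revisee, weight in revise_events:
        if revisor == revisee:
            continue  # self-revisions say nothing about conflict
        pair = tuple(sorted((revisor, revisee)))
        degrees[pair] += weight
    return dict(degrees)

events = [
    ("Alice", "Bob", 1.0),
    ("Bob", "Alice", 0.5),
    ("Carol", "Alice", 0.25),
    ("Alice", "Alice", 0.9),
]
d = conflict_degrees(events)
print(d)
```

Keying on the sorted pair makes the result symmetric by construction, which the layout step relies on.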
14. Problem 3: Mapping authors to a 2D space
A celebration of matrix math
Symmetric adjacency matrix. ($d_{uv} = d_{vu}$)
Let $x_u$ and $x_v$ be the x-coordinates of authors u and v respectively.
Therefore, if u and v are in conflict and sit on opposite sides of the origin, $x_u x_v d_{uv}$ is large and negative.
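Summed over all pairs, the per-pair term becomes a quadratic form in the coordinate vector. The unit-length constraint below is an assumption added here to exclude the trivial solution x = 0 and unbounded solutions; the slides do not state it.

```latex
\min_{x}\ \sum_{u,v} d_{uv}\, x_u x_v \;=\; \min_{x}\ x^{\top} A x
\quad \text{subject to } \lVert x \rVert = 1,
\qquad A = (d_{uv}),\quad d_{uv} = d_{vu}.
```

For a symmetric A, this constrained minimum equals the smallest eigenvalue of A and is attained at a corresponding eigenvector (the Rayleigh quotient result).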
15. Problem 3: Mapping authors to a 2D space
Solve for all x by minimizing the sum
How? Find the smallest eigenvalue, $\lambda_{\min}$, associated with A. The associated
eigenvector has all the x-coordinates. Solve this:
Claim: this results in an optimized arrangement
of authors along the x-axis.
Do the same for the y-axis by taking the
second smallest eigenvalue, $\lambda'_{\min}$
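The eigenvector step can be tried on a toy matrix. This is a sketch under stated assumptions, not the authors' implementation: the 4-author matrix is invented (authors 0 and 1 in one camp, 2 and 3 in the other), `conflict_layout` is a hypothetical name, and NumPy's `numpy.linalg.eigh` is used because it handles symmetric matrices and returns eigenvalues in ascending order.

```python
import numpy as np

# Invented conflict matrix: entries are d_uv, zero within a camp.
A = np.array([
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
])

def conflict_layout(matrix):
    """Return x and y coordinates plus the two smallest eigenvalues."""
    # eigh returns eigenvalues in ascending order, eigenvectors as columns.
    eigenvalues, eigenvectors = np.linalg.eigh(matrix)
    x = eigenvectors[:, 0]  # eigenvector of the smallest eigenvalue
    y = eigenvectors[:, 1]  # eigenvector of the second smallest eigenvalue
    return x, y, eigenvalues[0], eigenvalues[1]

x, y, lam_min, lam_next = conflict_layout(A)
print("x-coordinates:", x)
print("smallest eigenvalues:", lam_min, lam_next)
```

In this toy case the two camps land on opposite sides of the origin along x. The overall sign of an eigenvector is arbitrary, so only the relative placement is meaningful, and with a single bipolar conflict the y-coordinates carry no additional structure.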
16. Problem 3: Mapping authors to a 2D space
Results
Authors with the largest $d_{uv}$ entry are furthest away from each other
[Figure: authors placed along the x axis]
Two unrelated bipolar conflicts are separated along x and y axes. (They prove
this for the general case.)
[Figure: authors placed in the x-y plane]
17. Problem 3: Mapping authors to a 2D space
Scaling and tripolar conflicts
"Real arguments are not quite so polarized," say the authors.
Degrees of tripolar conflict
Claim: scale the y-values by $\lambda_{\min} / \lambda'_{\min}$
... really?
18. Problem 3: Mapping authors to a 2D space
[Figure axes: Conflict 1 axis, Conflict 2 axis]
• 1 set of opposing opinions
– Ex: Ripe vs unripe bananas
• 2 sets of opposing opinions (conflict 1 is independent of conflict 2)
– Ex: Ripe vs unripe bananas & Green vs red apples
• What does this mean? How do you represent 3-polar conflicts in a Cartesian
presentation space, which inherently employs greater-than and less-than ordering?
– Ex: Bananas vs apples vs oranges
19. Problem 3: Mapping authors to a 2D space
The explanation about scaling makes more sense after normalization around
an ellipse.
Place all the points around an ellipse by "exploding" them from the center.
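One way to read "exploding" is a radial projection: each point keeps its direction from the center but is pushed out until it sits on the ellipse. The sketch below assumes that reading; the slides do not specify the exact procedure, and the function name `explode_to_ellipse` and the semi-axes `a` and `b` are hypothetical.

```python
import math

def explode_to_ellipse(x, y, a=1.0, b=0.5):
    """Move (x, y) radially onto the ellipse (x/a)^2 + (y/b)^2 = 1."""
    r = math.sqrt((x / a) ** 2 + (y / b) ** 2)
    if r == 0.0:
        # The center has no direction to move along; leave it in place.
        return (0.0, 0.0)
    # Dividing both coordinates by r keeps the direction and lands on the ellipse.
    return (x / r, y / r)

px, py = explode_to_ellipse(0.1, 0.05)
print(px, py)
```

After this step, only the angle of a point encodes its side in the conflicts, which may be why the axis scaling is easier to interpret on the normalized picture.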
20. Legend
• Opposition (to another node)
– Distance from other node
– Thickness of edge connecting to other node
• Involvement (number of edits made)
– Node area
– Number and aggregate thickness of edges
• Role (revisor vs. revisee)
– Luminance of edge (in relation to a particular author)
– Shape (in relation to all authors in general)
• Variance in edit frequency
– Node brightness
22. Contributions
• Extends multidimensional scaling by visualizing independent conflict(s) (2 bipolar
conflicts)
• Builds on the work by Agrawal and Kittur
Some limitations
• Only a representation; ignores lesser conflicts (but highlights important ones)
• Major assumptions about nature of content based on limited data! Very specific
application
• Extends the work of Kittur et al to include non-revert based disagreements
Some criticisms
• Question about meaning of the 2D space. "Absolute value of x and y coordinate
indicates involvement in the conflict"
• Illustrations don’t match the description in the text
23. Question for you
How do you represent
multipolar conflict in general in 2D?
(Thanks.)