"Visual analysis of controversy in user-generated encyclopedias": A critical summary

Summarizing

Visual analysis of controversy in
user-generated encyclopedias
by Brandes, U., & Lerner, J. (2008). Information Visualization, 7(1),
34-48.

Diego Maranan
IAT 814
22 October 2009

About the article

• In a nutshell: contributing to the understanding of authorship
dynamics in collaboratively edited articles on controversial
topics
• 16 pages
• 18 citations on Google Scholar
• University of Konstanz, Germany
• Social network analysis and graph layouts since 1999

Article contents

Why this paper matters 693 words
What’s been done before 635 words
Their unique contribution 228 words
How they acquire the data 307 words
How they parse and filter the data 121 words
How they mine and represent the data 3757 words
How they make the visualization interactive 362 words
Examples 724 words
Conclusion 329 words

Why this matters

• Wikipedia: reliability depends on its “neutrality”
• Neutrality depends on dynamics of author community
– Who contributes?
– What are the disputes?
– What are the roles of the authors?
• Roles
– Agree or disagree
– Regular or sporadic
– Contribute a lot or a little
– Revise or be revised
(“Reactionary” or “revolutionary”)

What’s been done before

• Wikipedia research
– Disagreement based on the number of reverts (Kittur et al 2007)
• NLP-based research on Web 2.0 applications
– Analysis of polarizing factors in user opinions (e.g., people who like The Da
Vinci Code do so for different reasons than those who hate it) (Chen et al 2006)
– Identifying polarizing language and partitioning into opinion groups
(Nigam and Hurst 2004)
– Metrics for polarity + buzz + author dispersion (Glance et al 2005)
• Non-NLP
– Vocabulary used by opposing sides tends to be largely identical (Han &
Kamber 2006)
– Links carry less noisy information than text (Agrawal et al 2003)
– Simpler
• Visualization-related
– Force-directed layout (spaces nodes evenly)
– Multidimensional scaling

Finished product:
"Who-revises-whom"-network in 2D space

Legend (attribute: visual encoding)

• Opposition (to another node): distance from the other node; thickness of the edge connecting to it
• Involvement (number of edits made): node area; number and aggregate thickness of edges
• Role (revisor vs. revisee): luminance of the edge (relative to a particular author); shape (relative to all authors in general)
• Variance in edit frequency: node brightness
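Encoding involvement as node area (rather than radius) means the radius should grow with the square root of the edit count; otherwise very active authors look disproportionately large. A minimal sketch follows. The `node_radii` helper, its dict input format and the `max_radius` parameter are hypothetical. The slides do not give the paper's exact scaling.

```python
import math

def node_radii(edit_counts, max_radius=20.0):
    """Map each author's number of edits to a node radius so that node
    AREA (not radius) is proportional to involvement.

    edit_counts: dict author -> number of edits (hypothetical input format)
    max_radius: radius given to the most active author (hypothetical)
    """
    if not edit_counts:
        return {}
    most_edits = max(edit_counts.values())
    if most_edits <= 0:
        return {author: 0.0 for author in edit_counts}
    # area ~ r^2, so r ~ sqrt(edits) keeps area proportional to edits
    return {
        author: max_radius * math.sqrt(count / most_edits)
        for author, count in edit_counts.items()
    }

print(node_radii({"Alice": 100, "Bob": 25, "Carol": 0}))
# {'Alice': 20.0, 'Bob': 10.0, 'Carol': 0.0}
```

Bob made a quarter of Alice's edits, so his node gets a quarter of her area, which is half her radius.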

Ingredients:
Wikipedia XML dump of the revision history

This visualization uses the
following revision data:

• Timestamp
• Author name (or IP address)
• A flag to denote whether a
revision was a revert

… that's it.
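Extracting these three fields from a dump is a short pass over the `<revision>` elements. Registered users appear as `<username>` and anonymous editors as `<ip>`. The dump has no explicit revert flag, so the sketch below assumes a common identity-revert heuristic: a revision counts as a revert if its text is byte-identical to some earlier revision. This is an assumption for illustration and may not be how the authors flag reverts. `parse_revisions` and the tiny sample dump are hypothetical.

```python
import hashlib
import xml.etree.ElementTree as ET

def _local(tag):
    # MediaWiki dumps namespace their tags, e.g. "{http://...}revision"
    return tag.rsplit("}", 1)[-1]

def parse_revisions(xml_text):
    """Return (timestamp, author, is_revert) tuples in dump order.

    ASSUMPTION: is_revert is True when the revision text is identical to
    the text of any earlier revision (identity-revert heuristic).
    """
    root = ET.fromstring(xml_text)
    seen_digests = set()
    revisions = []
    for rev in root.iter():
        if _local(rev.tag) != "revision":
            continue
        fields = {_local(child.tag): child for child in rev}
        timestamp = fields["timestamp"].text
        contributor = {_local(c.tag): c.text for c in fields.get("contributor", [])}
        # Registered users have <username>; anonymous edits have <ip>
        author = contributor.get("username") or contributor.get("ip")
        text_elem = fields.get("text")
        text = (text_elem.text or "") if text_elem is not None else ""
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        is_revert = digest in seen_digests
        seen_digests.add(digest)
        revisions.append((timestamp, author, is_revert))
    return revisions

SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Example</title>
    <revision>
      <timestamp>2008-01-01T10:00:00Z</timestamp>
      <contributor><username>Alice</username></contributor>
      <text>Bananas are yellow.</text>
    </revision>
    <revision>
      <timestamp>2008-01-01T10:05:00Z</timestamp>
      <contributor><ip>192.0.2.7</ip></contributor>
      <text>Bananas are green.</text>
    </revision>
    <revision>
      <timestamp>2008-01-01T10:06:00Z</timestamp>
      <contributor><username>Alice</username></contributor>
      <text>Bananas are yellow.</text>
    </revision>
  </page>
</mediawiki>"""

for row in parse_revisions(SAMPLE):
    print(row)
```

Alice's second edit restores her original text exactly, so only that revision is flagged as a revert.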

How to get from XML dump to network visualization?

Three major problems

• Inferring who revises whom
(data mining issue)
• Obtaining a metric for conflict between authors
(data mining issue)
• Meaningfully mapping the authors into a 2D network based on conflict
(visual representation issue)

Remaining problems are relatively easy

• Visually coding individual involvement and participation variance
• Visually coding conflict
• Presenting aggregate involvement (bar chart)
• Choosing and designing interactive features

Problem 1: Inferring who revises whom

• You are probably revising the revision immediately preceding yours if
– A short amount of time has passed since the latest revision
– You revise several times in a row
– Your revision is a revert

• Example: Alice and Bob
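These cues can be turned into weighted who-revises-whom edges. The slides do not reproduce the paper's probability formula, so the sketch below substitutes a hypothetical one:

* A revert always counts as revising, with probability 1.
* Any other revision counts with probability `exp(-gap / tau_seconds)`, where `gap` is the time since the immediately preceding revision and `tau_seconds` is a hypothetical decay constant.

Each revision is attributed to the most recent revision by a different author. As a result, a quick run of consecutive edits by one author all point back at the same revisee.

```python
import math
from collections import defaultdict

def revision_weights(revisions, tau_seconds=3600.0):
    """Estimate how strongly each author u revises each author v.

    revisions: list of (time_in_seconds, author, is_revert), sorted by time.
    Returns {(u, v): summed probability that u revised v}.

    HYPOTHETICAL weighting, not the paper's formula: reverts count 1.0;
    other revisions count exp(-gap / tau_seconds), where gap is the time
    since the immediately preceding revision (by anyone).
    """
    weights = defaultdict(float)
    for i in range(1, len(revisions)):
        time_now, author, is_revert = revisions[i]
        time_before = revisions[i - 1][0]
        # Walk back past this author's own consecutive edits to find the revisee
        j = i - 1
        while j >= 0 and revisions[j][1] == author:
            j -= 1
        if j < 0:
            continue  # nobody else has edited yet, so there is no revisee
        revisee = revisions[j][1]
        gap = time_now - time_before
        probability = 1.0 if is_revert else math.exp(-gap / tau_seconds)
        weights[(author, revisee)] += probability
    return dict(weights)

history = [(0, "Alice", False), (60, "Bob", False),
           (120, "Bob", False), (180, "Alice", True)]
print(revision_weights(history))
```

Both of Bob's quick edits are attributed to Alice. Alice's revert is attributed to Bob with full weight.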

Problem 1: Inferring who revises whom

Therefore…

• Assumption: authors promptly revise the revisions they are interested in

• Does not detect revisions where the revisor is revising old content (assumption:
these are a minority compared with intensely conflictual revisions, which
happen very quickly)

• The probability that a revision by author u, following a revision by author v,
can be interpreted as "u is revising v" is approximated by

Problem 2: Obtaining a metric for conflict between authors

Agrawal et al (2003)

• Quoting an author from a previous Usenet post creates a quotation link
between authors (link => relationship)

• Quotation link implies disagreement in controversies ("You said such-and-
such. Well, here's what I think.")

• Limited applicability; “assumes a single topic per posting and poster is ‘for’
or ‘against’ that standpoint” (Nigam)

Problem 2: Obtaining a metric for conflict between authors

Therefore…

• Authors revise only what they don't agree with. (They don't revise to support
a statement.)
• Each revision of one author's work by another is a "vote" for conflict between the two
• The degree of conflict $d_{uv}$ between authors u and v is given by

Problem 3: Mapping authors to a 2D space

A celebration of matrix math

Symmetric adjacency matrix A ($d_{uv} = d_{vu}$)
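The slide's formula for the conflict degree is not reproduced here. Purely for illustration, the sketch below assumes that $d_{uv}$ adds up the revision "votes" in both directions. Under that assumption the matrix is symmetric by construction. `conflict_matrix` and its `{(u, v): weight}` input format are hypothetical.

```python
import numpy as np

def conflict_matrix(weights):
    """Build a symmetric conflict matrix from {(u, v): weight} votes.

    ASSUMPTION: d_uv = weight(u revises v) + weight(v revises u),
    so A[i, j] == A[j, i] by construction.
    Returns (authors, A) where row/column i of A belongs to authors[i].
    """
    authors = sorted({a for pair in weights for a in pair})
    index = {author: i for i, author in enumerate(authors)}
    A = np.zeros((len(authors), len(authors)))
    for (u, v), w in weights.items():
        if u == v:
            continue  # revising yourself is not a conflict
        A[index[u], index[v]] += w
        A[index[v], index[u]] += w
    return authors, A

authors, A = conflict_matrix({("Alice", "Bob"): 1.0,
                              ("Bob", "Alice"): 0.5,
                              ("Carol", "Alice"): 2.0})
print(authors)
print(A)
```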

Let $x_u$ and $x_v$ be the x-coordinates of authors u and v, respectively.
If u and v are in conflict and lie on opposite sides of the origin, $x_u x_v d_{uv}$ is large and negative.


Solve for all x by minimizing the sum $\sum_{u,v} x_u x_v d_{uv} = x^\top A x$ (subject to $\|x\| = 1$)

How? Find the smallest eigenvalue, $\lambda_{\min}$, of A. The associated
eigenvector contains all the x-coordinates. Solve: $A x = \lambda_{\min} x$

Claim: this yields an optimal arrangement
of authors along the x-axis.

Do the same for the y-axis using the eigenvector of the
second-smallest eigenvalue, $\lambda'_{\min}$
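For a symmetric matrix, the unit vector minimizing $x^\top A x$ is the eigenvector of the smallest eigenvalue (the Rayleigh-quotient minimum). The y-axis then comes from the next eigenvector, which is orthogonal to the first. The sketch below uses NumPy's `numpy.linalg.eigh`, which handles symmetric matrices and returns eigenvalues in ascending order. The `spectral_layout` helper and the toy matrix are hypothetical.

```python
import numpy as np

def spectral_layout(A):
    """Place authors in 2D from a symmetric conflict matrix A.

    x = eigenvector of the smallest eigenvalue (minimizes x^T A x, ||x|| = 1)
    y = eigenvector of the second-smallest eigenvalue
    """
    A = np.asarray(A, dtype=float)
    if not np.allclose(A, A.T):
        raise ValueError("conflict matrix must be symmetric (d_uv == d_vu)")
    # eigh is for symmetric matrices; eigenvalues come back in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(A)
    x = eigenvectors[:, 0]
    y = eigenvectors[:, 1]
    return x, y, eigenvalues[0], eigenvalues[1]

# One bipolar conflict: authors 0 and 1 both oppose authors 2 and 3
A = np.array([[0, 0, 1, 1],
              [0, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)
x, y, lam_min, lam_second = spectral_layout(A)
print(np.round(lam_min, 6))  # -2.0
print(np.round(x, 3))
```

The two camps land on opposite sides of the origin. Eigenvectors are defined only up to sign, so which camp ends up on the left is arbitrary.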


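Results
Authors with the largest $d_{uv}$ entries are placed furthest away from each other.

[Figure: single bipolar conflict, x-axis]

Two unrelated bipolar conflicts are separated along the x and y axes. (They prove
this for the general case.)

[Figure: two bipolar conflicts, x- and y-axes]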
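The separation claim can be checked on a toy case. The sketch below builds two unrelated bipolar conflicts with deliberately different, hypothetical strengths so that the two smallest eigenvalues are distinct. It then takes x and y from the first two eigenvectors, as described above. Under these assumptions, the stronger conflict spreads along x while its bystanders sit at x ≈ 0, and the weaker conflict spreads along y. `bipolar_block` is a hypothetical helper.

```python
import numpy as np

def bipolar_block(weight):
    # Two camps of two authors; every cross-camp pair conflicts with `weight`
    camp = np.ones((2, 2))
    zero = np.zeros((2, 2))
    return weight * np.block([[zero, camp], [camp, zero]])

# Block-diagonal matrix: authors 0-3 fight over one issue, 4-7 over another
A = np.zeros((8, 8))
A[:4, :4] = bipolar_block(2.0)  # stronger conflict (hypothetical weight)
A[4:, 4:] = bipolar_block(1.0)  # weaker, unrelated conflict

eigenvalues, eigenvectors = np.linalg.eigh(A)  # ascending eigenvalues
x, y = eigenvectors[:, 0], eigenvectors[:, 1]
print(np.round(eigenvalues[:2], 6))
print(np.round(np.column_stack([x, y]), 3))
```

If both conflicts had equal strength, the smallest eigenvalue would be repeated. The solver could then return any rotation of the two axes, which is one reason the scaling question on the next slide matters.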


Scaling and tripolar conflicts

"Real arguments are not quite so polarized," say the authors.

Degrees of tripolar conflict

Claim: scale the y-values by $\lambda_{\min} / \lambda'_{\min}$

... really?

[Figure: conflict 1 axis vs. conflict 2 axis]

• 1 set of opposing opinions. Ex: ripe vs. unripe bananas
• 2 sets of opposing opinions (conflict 1 is independent of conflict 2). Ex: ripe vs. unripe bananas & green vs. red apples
• What does this mean? How do you represent 3-polar conflicts in a Cartesian presentation space, which inherently employs greater-than and less-than ordering? Ex: bananas vs. apples vs. oranges


The explanation of the scaling makes more sense after the points are normalized
onto an ellipse.

Place all the points around an ellipse by "exploding" them from the center.
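One way to read "exploding" the points onto an ellipse is as a radial projection. Each author keeps its direction from the center, and the distance is rescaled so the point lands on the ellipse. This interpretation is an assumption, as are the semi-axes `a` and `b` and the `project_onto_ellipse` helper. The slide does not give the paper's exact normalization.

```python
import math

def project_onto_ellipse(points, a=1.0, b=1.0):
    """Radially move each (x, y) point onto the ellipse (x/a)^2 + (y/b)^2 = 1.

    The angle from the center is preserved; only the distance changes.
    A point exactly at the center has no direction and is left in place.
    """
    projected = []
    for x, y in points:
        r = math.sqrt((x / a) ** 2 + (y / b) ** 2)
        if r == 0.0:
            projected.append((x, y))
        else:
            projected.append((x / r, y / r))
    return projected

print(project_onto_ellipse([(0.2, 0.0), (0.0, -0.1), (0.3, 0.4)], a=2.0, b=1.0))
```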

Filtering by time intervals and by number of edges shown
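Both filters can be expressed as a small data step that runs before layout and drawing. The sketch keeps only revisions inside a time window, sums the weights per (reviser, revisee) pair, and keeps the heaviest `max_edges` pairs. The function name, its tuple input format and the tie-breaking rule are hypothetical. The slides do not specify the interactive implementation.

```python
def filter_edges(weighted_revisions, start, end, max_edges):
    """Keep revisions with start <= time < end and return the strongest edges.

    weighted_revisions: list of (time_in_seconds, reviser, revisee, weight)
    (hypothetical input format). Weights of repeated pairs are summed,
    then only the `max_edges` heaviest (reviser, revisee) pairs are kept.
    """
    totals = {}
    for time, reviser, revisee, weight in weighted_revisions:
        if start <= time < end:
            key = (reviser, revisee)
            totals[key] = totals.get(key, 0.0) + weight
    # Heaviest first; ties broken by name so the result is deterministic
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:max_edges]

events = [(10, "Bob", "Alice", 0.75), (20, "Alice", "Bob", 1.0),
          (30, "Carol", "Alice", 0.25), (40, "Bob", "Alice", 0.5),
          (500, "Dave", "Bob", 1.0)]
print(filter_edges(events, 0, 100, 2))
# [(('Bob', 'Alice'), 1.25), (('Alice', 'Bob'), 1.0)]
```

Dave's revision falls outside the window. Carol's weak edge is cut by the two-edge limit.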

Contributions
Extends multidimensional scaling to visualize independent conflicts (e.g., two bipolar
conflicts)
Builds on the work of Agrawal et al and Kittur et al

Some limitations
Only a representation; ignores lesser conflicts (but highlights important ones)

Major assumptions about the nature of content, based on limited data! Very specific
application
Extends the work of Kittur et al to include non-revert-based disagreements

Some criticisms
Unclear meaning of the 2D space: "Absolute value of x and y coordinate
indicates involvement in the conflict"
Illustrations don’t match the description in the text

Question for you
How do you represent
multipolar conflict in general in 2D?

(Thanks.)
