The document summarizes a presentation on analyzing Wikipedia data to understand the relationship between geographic and temporal information.
It discusses how Wikipedia pages can be tagged with geographic coordinates and how this data has been used to study patterns in editor contributions and relationships between concepts and locations.
Two specific research areas are highlighted: analyzing editors' edit histories to correlate contributors with locations, and mapping Wikipedia concepts to understand connections between topics, places and time periods.
Examples of findings are presented, such as many editors focusing their contributions on locations near where they live, and which countries and cities are most associated with different historical periods of philosophy.
1. Social Geography &
Wikipedia
a quick overview
Maurizio Napolitano
(SoNet internal research meeting)
FBK 27/08/2010
2. SoNet Research Meetings
These slides were used for an internal presentation of the
SoNet group.
Every week, one member of the SoNet group presents one or more
research papers to the other members. The papers discussed
are therefore written by other researchers.
Being internal presentations, these slides might be a bit
rough and unpolished.
You can find more information (including this
presentation) about the SoNet group at
http://sonet.fbk.eu
3. Summary
• Introduction: the wikification of GIS
• Wikipedia and geodata
• Some research questions
– You Are Where You Edit:
Locating Wikipedia Contributors Through Edit
Histories
– Spatiotemporal Mapping of Wikipedia
Concepts
5. Introduction
Sui, D.Z. The wikification of GIS and its
consequences: or Angelina Jolie's new tattoo
and the future of GIS. Comp. Env. Urb. Sys.
2008, 32, 1-5.
6. The wikifications
• GIS has changed
– Better hardware → easy management
– Data production → crowdsourcing projects
(WikiMapia, OpenStreetMap, Mapufacture,
GeoCommons, TierraWiki, FixMyStreet, WhoIsSick
… ) and GeoTag
– People → Organizations
… NEOGEOGRAPHY ...
7. Wikipedia and geodata – applications (1/2)
• Space-time exploration
• Space-time selection
• Space-Wikipedia relationship exploration
• Space-Wikipedia relationship selection
Hecht, B.; Rohs, M.; Schöning, J.; and Krüger, A. 2007. WikEye - using magic
lenses to explore georeferenced Wikipedia content. In Proc. of the 3rd
International Workshop on Pervasive Mobile Interaction Devices.
8. Wikipedia and geodata – applications (2/2)
Spatial feature-edge-feature relationships in Wikipedia
Berlin article temporal reference profile
10. Wikipedia geopages – the infobox
{{Infobox Settlement
…
|latd = 37 |latm = 18 |lats = 15 |latNS = N
|longd = 121 |longm = 52 |longs = 22 |longEW = W
…
}}
You Are Where You Edit - Michael D. Lieberman - ICWSM 2009, San Jose, CA
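The infobox above stores latitude and longitude as separate degree, minute and second fields plus a hemisphere letter. The following is a minimal Python sketch of turning those fields into signed decimal degrees. It assumes a flat template with no nested templates (real wiki markup varies far more, as the next slide points out), and the helper names are hypothetical.

```python
import re

# Matches "|name = value" pairs; simplified, ignores nested templates and links.
FIELD_RE = re.compile(r"\|\s*(\w+)\s*=\s*([^|}\n]*)")


def parse_fields(template: str) -> dict:
    """Map each infobox field name to its stripped value."""
    return {name: value.strip() for name, value in FIELD_RE.findall(template)}


def dms_to_decimal(deg: str, minutes: str, seconds: str, hemi: str) -> float:
    """Degrees/minutes/seconds plus a hemisphere letter -> signed decimal degrees."""
    value = float(deg) + float(minutes or 0) / 60 + float(seconds or 0) / 3600
    return -value if hemi.strip().upper() in ("S", "W") else value


def infobox_coordinates(template: str) -> tuple:
    """Return (lat, lon) from the latd/latm/lats/latNS and longd/longm/longs/longEW fields."""
    f = parse_fields(template)
    lat = dms_to_decimal(f["latd"], f.get("latm", ""), f.get("lats", ""), f.get("latNS", "N"))
    lon = dms_to_decimal(f["longd"], f.get("longm", ""), f.get("longs", ""), f.get("longEW", "E"))
    return lat, lon


# The infobox from the slide, with the elided lines left out.
INFOBOX = """{{Infobox Settlement
|latd = 37 |latm = 18 |lats = 15 |latNS = N
|longd = 121 |longm = 52 |longs = 22 |longEW = W
}}"""

lat, lon = infobox_coordinates(INFOBOX)
print(round(lat, 3), round(lon, 3))  # 37.304 -121.873
```

The result, about (37.304, -121.873), agrees with the San Jose, CA point listed on the Features With Extent slide.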
11. Wikipedia geopages – problem and solution
Problem: must process the page's wiki markup to identify geographic templates and extract coordinates
– Wiki markup language continually evolves
– Geographic templates continually evolve
– Over 20 distinct template forms at this time for different coordinate systems and feature types
Solution: DBpedia
– Public ontology derived from Wikipedia, including extracted geographic coordinates
– Amounts to a primitive gazetteer of geographic entities in Wikipedia
12. Features With Extent
All geopages are tagged with a single lat/lon point
Tradeoff between simplicity and accuracy
Examples: country or state → center or capital city; road → midpoint; river → source
We want to distinguish these features, since the tagged point may
be geographically distant from the contributor's other edits
In Wikipedia, more precise coordinates generally
indicate a smaller extent
California: (37, -120)
San Jose, CA: (37.304, -121.873)
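One way to act on the last observation is to treat the number of decimal places as a proxy for extent. The sketch below is only an illustration: the one-decimal-place threshold and the function names are assumptions, not values from the paper. Coordinates are kept as strings because a float would lose the trailing digits that carry the precision signal.

```python
def decimal_places(coord: str) -> int:
    """Count digits after the decimal point in a decimal-degree string."""
    _, _, fraction = coord.strip().partition(".")
    return len(fraction)


def likely_has_extent(lat: str, lon: str, max_places: int = 1) -> bool:
    """Hypothetical heuristic: coarse coordinates suggest a large feature.

    max_places=1 is an illustrative threshold, not a value from the paper.
    """
    return max(decimal_places(lat), decimal_places(lon)) <= max_places


print(likely_has_extent("37", "-120"))          # True  (California)
print(likely_has_extent("37.304", "-121.873"))  # False (San Jose, CA)
```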
13. Some research questions
1. You Are Where You Edit
2. Spatiotemporal Mapping of Wikipedia Concepts
14. You Are Where You Edit:
Locating Wikipedia Contributors Through Edit
Histories
• Contributors tend to add what they know and self-
organize into groups based on interest
– Can contributors be further categorized based on their edits
to geographic pages? (= geopages)
• Identify Wikipedia contributors who:
– Edit geopages in a constrained geographic area
– Mostly edit one or two “pet” geopages
– Identify reasons for the above patterns
Worked only on the English Wikipedia dump of 8 Oct 2008 (61.7 GB of data).
15. Wikipedia/DBpedia dump statistics.
Stat type     Class                  Total       Geo   Geo%
pages         –                   14915993    328393   2.2%
contributors  both                16235895   2011828  12.4%
              anon                13795118   1655135  12.0%
              named                2440777    356693  14.6%
edits         both               224473397  15341937   6.8%
              anon                55571407   4519807   8.1%
              named (all)        168901990  10822130   6.4%
              named, non-minor   114844836   6357558   5.5%
              named, minor        54057154   4464572   8.3%
A considerable number of pages (~330k) are tagged with geographic coordinates
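The Geo% column is simply Geo / Total. This short Python check recomputes it from the counts as transcribed from the table above, and also verifies that the anon and named rows add up to the "both" rows (and the minor and non-minor rows to the named total). The row labels are our own shorthand.

```python
# (row, total, geo, Geo% as printed) transcribed from the dump statistics table.
ROWS = [
    ("pages", 14915993, 328393, 2.2),
    ("contributors, both", 16235895, 2011828, 12.4),
    ("contributors, anon", 13795118, 1655135, 12.0),
    ("contributors, named", 2440777, 356693, 14.6),
    ("edits, both", 224473397, 15341937, 6.8),
    ("edits, anon", 55571407, 4519807, 8.1),
    ("edits, named (all)", 168901990, 10822130, 6.4),
    ("edits, named non-minor", 114844836, 6357558, 5.5),
    ("edits, named minor", 54057154, 4464572, 8.3),
]


def geo_percent(total: int, geo: int) -> float:
    """Geo / Total as a percentage, rounded to one decimal like the table."""
    return round(100 * geo / total, 1)


for row, total, geo, printed in ROWS:
    print(f"{row:<24}{geo_percent(total, geo):>6.1f}%   table: {printed}%")

# Sub-rows should add up to their parent rows (checks the transcription).
by_row = {row: (total, geo) for row, total, geo, _ in ROWS}


def combined(a: str, b: str) -> tuple:
    """Element-wise sum of the (total, geo) pairs of two rows."""
    return tuple(x + y for x, y in zip(by_row[a], by_row[b]))


assert combined("contributors, anon", "contributors, named") == by_row["contributors, both"]
assert combined("edits, anon", "edits, named (all)") == by_row["edits, both"]
assert combined("edits, named non-minor", "edits, named minor") == by_row["edits, named (all)"]
```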
16. Basic Observations
• Named contributors are outnumbered by
anonymous ones by about 5 to 1, but are
responsible for 2–3 times as many geopage
edits
• A nontrivial number of named contributors have
made at least one non-minor edit to a geopage
(14.6%)
• Most geopage edits by named contributors are non-minor
(58.7%)
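The ratios quoted in these observations can be recomputed from the slide 15 table. In this sketch the 58.7% figure is reproduced as the non-minor share of named contributors' geopage edits, which is our reading of that bullet. The anonymous-to-named ratio comes out at about 5.7 to 1, which the slide rounds to "about 5 to 1".

```python
# Counts from the dump statistics table (slide 15).
ANON_CONTRIBUTORS, NAMED_CONTRIBUTORS = 13795118, 2440777
ANON_GEO_EDITS, NAMED_GEO_EDITS = 4519807, 10822130
NAMED_GEO_NONMINOR = 6357558

anon_per_named = ANON_CONTRIBUTORS / NAMED_CONTRIBUTORS
named_vs_anon_geo_edits = NAMED_GEO_EDITS / ANON_GEO_EDITS
nonminor_share = 100 * NAMED_GEO_NONMINOR / NAMED_GEO_EDITS

print(f"anonymous contributors per named contributor: {anon_per_named:.1f}")        # 5.7
print(f"named vs anonymous geopage edits:            {named_vs_anon_geo_edits:.1f}")  # 2.4
print(f"non-minor share of named geopage edits:      {nonminor_share:.1f}%")        # 58.7%
```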
17. Geopages: country distribution
• Vast majority of geopages tagged to the US and Europe
• Possibly reflects the geographic distribution of contributors to the English Wikipedia
Country   Count
USA       83871
France    37730
UK        26651
Poland    16050
Germany   15939
Russia    10964
Canada     8970
Italy      8772
Spain      6603
India      5683
19. Wikipedia Edit Histories
• Easily-parsed XML format
• Information saved for each edit:
– Username (or IP address, if anonymous)
– Timestamp
– Whether edit is “minor” (spelling, formatting)
• Excluded anonymous edits
– Not allowed to be marked minor, to avoid abuse
– Most Wikipedia vandalism perpetrated anonymously
• Also excluded minor edits
– Geopages tend to have mostly non-minor edits
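A minimal sketch of this filtering step with Python's standard xml.etree.ElementTree. It assumes the MediaWiki export layout (page, title, revision, timestamp, contributor holding either username or ip, and an empty minor element on minor edits) and strips namespace prefixes so it does not depend on the declared export schema version. The sample XML is made up. For a 61.7 GB dump one would stream with ET.iterparse rather than load the whole file.

```python
import xml.etree.ElementTree as ET


def local(tag: str) -> str:
    """Drop a namespace prefix such as '{http://www.mediawiki.org/...}page'."""
    return tag.rsplit("}", 1)[-1]


def named_nonminor_edits(xml_text: str) -> list:
    """Return (title, username, timestamp) for named, non-minor revisions."""
    root = ET.fromstring(xml_text.strip())
    edits = []
    for page in root.iter():
        if local(page.tag) != "page":
            continue
        title = next((c.text for c in page if local(c.tag) == "title"), None)
        for rev in page:
            if local(rev.tag) != "revision":
                continue
            children = {local(c.tag): c for c in rev}
            if "minor" in children:
                continue  # minor edit (spelling, formatting): excluded
            contributor = children.get("contributor")
            who = {local(c.tag): c.text for c in contributor} if contributor is not None else {}
            if "username" not in who:
                continue  # anonymous edit (IP address only): excluded
            stamp = children["timestamp"].text if "timestamp" in children else None
            edits.append((title, who["username"], stamp))
    return edits


SAMPLE = """
<mediawiki>
  <page>
    <title>San Jose, California</title>
    <revision>
      <timestamp>2008-01-01T00:00:00Z</timestamp>
      <contributor><username>Alice</username><id>1</id></contributor>
    </revision>
    <revision>
      <timestamp>2008-01-02T00:00:00Z</timestamp>
      <contributor><ip>192.0.2.1</ip></contributor>
    </revision>
    <revision>
      <timestamp>2008-01-03T00:00:00Z</timestamp>
      <contributor><username>Bob</username><id>2</id></contributor>
      <minor/>
    </revision>
  </page>
</mediawiki>
"""

print(named_nonminor_edits(SAMPLE))
# [('San Jose, California', 'Alice', '2008-01-01T00:00:00Z')]
```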
21. Identify edit area contributors
• Large number of edits to geopages
• Geopage edits constrained to a small area
• At least K edited geopages
• Area α of the convex hull of edited geopage coordinates smaller than A (edit area)
K = 3 and A = 1 deg² ≈ 112 × 112 km
Of 356693 contributors with at least one edit to a geopage, only 102271 (28.7%) have user pages. Also, of the 93195 contributors with at least five edits to geopages, only 47623 (51.1%) have user pages.
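The edit-area test (at least K edited geopages, convex hull area below A) can be sketched with Andrew's monotone-chain convex hull and the shoelace formula for its area. This is an illustration under stated assumptions: coordinates are (lon, lat) pairs in degrees, one per distinct edited geopage, and area is measured in square degrees as on the slide, ignoring that a degree of longitude shrinks away from the equator. The function names and sample points are hypothetical.

```python
def cross(o, a, b):
    """Z component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula; 0.0 for fewer than three vertices."""
    n = len(poly)
    if n < 3:
        return 0.0
    twice = sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                for i in range(n))
    return abs(twice) / 2


def is_edit_area_contributor(coords, k=3, max_area=1.0):
    """coords: one (lon, lat) per distinct edited geopage; area in deg^2."""
    return len(coords) >= k and polygon_area(convex_hull(coords)) < max_area


bay_area = [(-121.87, 37.30), (-121.90, 37.35), (-121.80, 37.40)]
coast_to_coast = [(-121.87, 37.30), (-74.00, 40.70), (-87.60, 41.90)]
print(is_edit_area_contributor(bay_area))        # True
print(is_edit_area_contributor(coast_to_coast))  # False
```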
22. Accounting For Outliers
• Local edit patterns may be muddled
by “outlier” edits
• For each contributor, select a fraction
F of edited geopages with smallest
convex hull area
• Simple approximation scheme:
1. For each geopage P:
a. Sort edited geopages by distance from P
b. Compute convex hull H_P of the first F (fraction) of these geopages
2. Select the H_P with the smallest area α
• Example: 71 deg² → 10 deg²
(5k × 5k mi → 112 × 112 km)
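The approximation scheme above can be written out as follows; the block carries its own convex-hull and shoelace helpers so it stands alone. Assumptions not stated on the slide: distances are plain Euclidean distances in degree space, and "the first F" is taken as the ceil(F·n) geopages nearest to P, P itself included. The function name and sample data are hypothetical.

```python
import math


def cross(o, a, b):
    """Z component of (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula; 0.0 for fewer than three vertices."""
    n = len(poly)
    if n < 3:
        return 0.0
    return abs(sum(poly[i][0] * poly[(i + 1) % n][1] - poly[(i + 1) % n][0] * poly[i][1]
                   for i in range(n))) / 2


def min_fraction_hull_area(coords, f):
    """For each geopage P, hull the ceil(f * n) geopages nearest P; return the smallest area."""
    m = max(1, math.ceil(f * len(coords)))
    best = math.inf
    for p in coords:
        nearest = sorted(coords, key=lambda q: math.dist(p, q))[:m]
        best = min(best, polygon_area(convex_hull(nearest)))
    return best


edits = [(0.0, 0.0), (0.5, 0.0), (0.0, 0.5), (10.0, 10.0)]  # last one is an outlier
print(min_fraction_hull_area(edits, 1.0))   # 5.0
print(min_fraction_hull_area(edits, 0.75))  # 0.125
```

Trying every geopage as the seed P costs O(n² log n) per contributor, which keeps the sketch simple; it is an approximation, not an exact search over all subsets.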
23. Contributor Locality
Computed minimum edit
area sizes for
F = {95%, 80%}, both
(a) with and (b) without
features with extent
30–35% of contributors
have edit areas smaller
than 1 deg²
Over 50% of contributors
with fewer than 5 geopage
edits are highly local
24. Pet Geopages
• Statistics for users with:
– 5–20 edits (~93k)
– over 20 edits (~28k)
• Over 50% of contributors
with 5–20 edits, and 25%
of contributors with over 20
edits, have over 80% of
geopage edits confined to
two geopages
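The "pet geopage" criterion amounts to asking what share of a contributor's geopage edits fall on their two most-edited pages. A minimal sketch with collections.Counter, assuming each edit is recorded as the title of the edited geopage and that the list is non-empty; the function names and the edit log are hypothetical, and 0.8 mirrors the "over 80%" on the slide.

```python
from collections import Counter


def top_two_share(edited_pages):
    """Fraction of a contributor's geopage edits on their two most-edited pages."""
    counts = Counter(edited_pages)
    top_two = sum(count for _, count in counts.most_common(2))
    return top_two / len(edited_pages)


def has_pet_geopages(edited_pages, threshold=0.8):
    """True if more than `threshold` of the edits are confined to two geopages."""
    return top_two_share(edited_pages) > threshold


history = ["Trento"] * 7 + ["Rovereto"] * 2 + ["Bolzano"]  # hypothetical edit log
print(top_two_share(history))     # 0.9
print(has_pet_geopages(history))  # True
```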
25. Reasons for Tight Edit Areas
Randomly selected 100
contributors with at least 10
edits to geopages and small
edit areas
• Concurrently examined contributors’ user pages and the
set of edited geopages to determine their interests
• Contributors with small edit areas tend to have been born in,
or to live in, the region defined by their edit areas
26. Some research questions
1. You Are Where You Edit
2. Spatiotemporal Mapping of Wikipedia Concepts
27. The question
“Where” and “when” are important implicit aspects
of a wide variety of concepts.
Wikipedia offers:
• Geopages
• Biographies (birth and death dates)
• Temporal and spatial information concerning concepts (Romanticism, Scholasticism)
HOW CAN THIS INFORMATION BE ASSOCIATED?
The solution starts from DBpedia, using the pages common to these language editions:
English, German, French, Italian, Spanish, Dutch and Portuguese.
28. Results about spatiotemporal
mapping of Wikipedia concepts
Topics explored:
1. Concepts and geolocation
2. Biography and country
3. Cultural interaction between countries
4. Historical periods of literature and philosophy
and related countries
29. Countries represented in Wikipedia
geopages
Distribution of geotagged articles in different countries (log2 values are plotted).
30. Top 20 cities (geotagged)
14,000 total
Top 20 cities by number of geotagged articles.
31. People by century
Distribution of Wikipedia person articles per century of lifespan.
Log2 values are shown.
The total number of century-people associations is higher than 423,846 because many persons are
associated with 2 centuries
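The double counting mentioned above happens whenever a lifespan crosses a century boundary. A small sketch, assuming AD years and the usual convention that the 19th century runs from 1801 to 1900 (the convention actually used in the paper is not stated here); the lifespans are hypothetical.

```python
def century(year: int) -> int:
    """Century of an AD year: 1-100 -> 1, 1801-1900 -> 19, 1901-2000 -> 20."""
    if year < 1:
        raise ValueError("this sketch only handles AD years")
    return (year - 1) // 100 + 1


def lifespan_centuries(birth: int, death: int) -> list:
    """Every century touched by a lifespan from birth year to death year."""
    return list(range(century(birth), century(death) + 1))


people = [(1770, 1827), (1805, 1872), (1901, 1950)]  # hypothetical lifespans
associations = [c for b, d in people for c in lifespan_centuries(b, d)]
print(len(people), len(associations))  # 3 4
```

Three persons yield four century associations because the first lifespan touches both the 18th and the 19th century.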
34. Cultural interactions
Top 5 sources (incoming) and destinations (outgoing) of cultural interactions
for 10 countries.
Statistics computed from the study of locations associated with persons present in Wikipedia.
The size of displayed countries is proportional to the log2 of the country's score.
35. Top 5 cities and countries in philosophy
for different periods
Top 5 cities and countries in philosophy for different periods.
Starting with the 1st century, five-century time slots were used.
36. Top 5 cities and countries in literature
from the 15th century to nowadays.
One-century time slots were used.
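Both rankings bin years into fixed-width slots: five centuries wide from the 1st century for philosophy, one century wide from the 15th century for literature. A minimal sketch of that binning, assuming slot boundaries follow the AD century convention (the 1st century is years 1 to 100, the 15th is 1401 to 1500); the function name is hypothetical.

```python
def time_slot(year: int, width: int, start: int) -> tuple:
    """Return (first_year, last_year) of the fixed-width slot containing `year`.

    Slots are [start, start + width - 1], [start + width, start + 2 * width - 1], ...
    """
    if year < start:
        raise ValueError("year precedes the first slot")
    first = start + (year - start) // width * width
    return first, first + width - 1


# Philosophy: five-century slots starting with the 1st century (year 1).
print(time_slot(1250, 500, 1))     # (1001, 1500)
# Literature: one-century slots starting with the 15th century (year 1401).
print(time_slot(1827, 100, 1401))  # (1801, 1900)
```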
37. Creative Commons
Attribution-ShareAlike 2.5
You are free:
● to copy, distribute, display, and perform the work
● to make derivative works
● to make commercial use of the work
Under the following conditions:
Attribution. You must attribute the work in the manner specified by the author or
licensor.
Share Alike. If you alter, transform, or build upon this work, you may distribute the
resulting work only under a license identical to this one.
For any reuse or distribution, you must make clear to others the license terms of
this work.
Any of these conditions can be waived if you get permission from the copyright
holder.
Your fair use and other rights are in no way affected by the above.
More info at http://creativecommons.org/licenses/by-sa/2.5/
All images are taken from the respective papers.
38. Bibliography
Sui, D.Z. The wikification of GIS and its consequences: or Angelina Jolie's new tattoo
and the future of GIS. Comp. Env. Urb. Sys. 2008, 32, 1-5.
http://geog.tamu.edu/~sui/publication/pub2008/SuiCEUSeditorial.pdf
Hecht, B.; Rohs, M.; Schöning, J.; and Krüger, A. 2007. WikEye - using magic lenses to
explore georeferenced Wikipedia content. In Proc. of the 3rd International Workshop on
Pervasive Mobile Interaction Devices
http://www.deutsche-telekom-laboratories.de/~rohs/papers/Hecht-WikEye.pdf
Lieberman, M.D.; and Lin, J. 2009. You Are Where You Edit: Locating Wikipedia Contributors Through Edit
Histories. In Proc. of ICWSM 2009, San Jose, CA.
http://www.umiacs.umd.edu/~jimmylin/publications/Lieberman_Lin_ICWSM2009.pdf
Popescu, A.; and Grefenstette, G. 2010. Spatiotemporal Mapping of Wikipedia Concepts. In Proc. of JCDL 2010,
June 21-25, Brisbane, Australia.
http://portal.acm.org/ft_gateway.cfm?id=1816142&type=pdf