This set of slides illustrates the growing interest people have in Wikipedia, the changes in relative interest between languages, and how interest in Wikipedia is distributed across different language zones.
2. The x-axis is months since Jan 2008 and the y-axis is the total number of hits to all Wikipedia pages.
There are some sharp variations that are probably caused by data quality problems. In particular, around index 30 (2010-06 and 2010-07) we see a drop in hits, followed by a very high number of hits in November 2010 (2010-11). I think there may be a few weeks of data missing somewhere in that time range.
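To check which calendar months an x-axis index refers to, here is a minimal sketch, assuming index 0 is January 2008. The function names `index_to_month` and `month_to_index` are hypothetical helpers, not part of the original analysis.

```python
# Assumption: index 0 on the x-axis corresponds to January 2008.
START_YEAR, START_MONTH = 2008, 1


def index_to_month(i: int) -> str:
    """Convert a 'months since Jan 2008' index to a YYYY-MM label."""
    total = (START_MONTH - 1) + i
    return f"{START_YEAR + total // 12:04d}-{total % 12 + 1:02d}"


def month_to_index(label: str) -> int:
    """Inverse of index_to_month: 'YYYY-MM' -> months since Jan 2008."""
    year, month = map(int, label.split("-"))
    return (year - START_YEAR) * 12 + (month - START_MONTH)


# The months mentioned in the note above.
for i in (29, 30, month_to_index("2010-11")):
    print(i, index_to_month(i))
```

Under this assumption, index 29 is 2010-06, index 30 is 2010-07, and the high month 2010-11 sits at index 34.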
3. The y-axis here is the fraction of hits to the English Wikipedia. At the beginning, more than 50% of the traffic went to the “en” Wikipedia, but that share has fallen off and “en” now represents a bit more than 1/3 of the traffic.
“en” is still dominant, but others are catching up.
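A fraction series like this one can be computed from monthly per-project totals. The sketch below assumes a hypothetical input shape (one dict per month mapping a project code such as "en" to its hit count) and assumes the denominator is the sum over all projects present in that month's data; the original slides do not state exactly how the total was formed.

```python
from typing import Dict, List

# Hypothetical input shape: one dict per month, mapping a project code
# (e.g. "en", "de", "ja") to that month's aggregated hit count.
MonthlyHits = List[Dict[str, int]]


def fraction_series(monthly: MonthlyHits, project: str) -> List[float]:
    """Fraction of each month's total hits that went to `project`."""
    series = []
    for month in monthly:
        total = sum(month.values())
        # Guard against a month with no data (e.g. missing dump files).
        series.append(month.get(project, 0) / total if total else 0.0)
    return series


# Toy numbers, not real Wikipedia data.
toy_monthly = [
    {"en": 60, "de": 25, "ja": 15},
    {"en": 40, "de": 30, "ja": 30},
]
print(fraction_series(toy_monthly, "en"))
```

The same function gives the “de”, “ja”, “fr”, “zh” and “es” series on the following slides by changing the `project` argument.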
4. The y-axis here is the fraction of traffic to the German (“de”) Wikipedia. Like “en”, the fraction falls over time. Note the high spike in December 2008.
5. The y-axis here is the fraction of hits to the Japanese (“ja”) Wikipedia, and the story is similar to “de”, except that the big spike happens around March 2013.
6. The fraction of traffic in the francophone region (“fr”) actually looks stable over time.
7. The fraction of hits to the Korean-language Wikipedia has actually been increasing (something has to be, if “en”, “de” and “ja” are declining).
8. The fraction of hits to the Chinese (“zh”) Wikipedia has grown over time, but there is a drop during the period that looks unstable on the summary graph at the beginning (slide 2), as well as another large spike.
9. The fraction of traffic in the “es” (Spanish-language) cultural zone seems to show strong seasonal variation.
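One simple way to check a seasonal pattern like this is to average the fraction series by calendar month and look at the spread between the highest and lowest monthly averages. This is a minimal sketch, assuming the series starts in January 2008; `calendar_month_profile` and `seasonal_range` are hypothetical names, and this is not necessarily how the slide was produced.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List


def calendar_month_profile(series: List[float]) -> Dict[int, float]:
    """Average a monthly fraction series by calendar month (1 = January).

    Assumes series[0] is January 2008, so series[i] falls in calendar
    month (i % 12) + 1.
    """
    buckets: Dict[int, List[float]] = defaultdict(list)
    for i, value in enumerate(series):
        buckets[i % 12 + 1].append(value)
    return {month: mean(values) for month, values in sorted(buckets.items())}


def seasonal_range(series: List[float]) -> float:
    """Gap between the highest and lowest calendar-month averages."""
    profile = calendar_month_profile(series)
    return max(profile.values()) - min(profile.values())


# Toy series (not real data): a flat 10% share that dips to 5% every July.
toy = [0.05 if i % 12 == 6 else 0.10 for i in range(36)]
profile = calendar_month_profile(toy)
print(profile[7], profile[1])
print(round(seasonal_range(toy), 3))
```

A large range relative to the overall level suggests seasonality; a trend in the data can inflate it, so detrending first would be a reasonable refinement.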
10. Top 15 Wikimedia sites ordered by fraction of all-time hits.
Note that “ja” is Japanese, “zh” is Chinese, and “tr” is Turkish.
en.mw and ja.mw each come up with only a single URI, so they probably represent a redirect somewhere.
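A ranking like this, and the single-URI observation, can be reproduced with two small helpers. This is a sketch under assumptions: the monthly input shape (one dict per month mapping project code to hits) and the `(project, title)` row format are hypothetical, as are the function names.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def top_sites(monthly: List[Dict[str, int]], n: int = 15) -> List[Tuple[str, float]]:
    """Rank projects by their share of hits summed over every month."""
    totals: Counter = Counter()
    for month in monthly:
        totals.update(month)  # adds this month's per-project counts
    grand_total = sum(totals.values())
    return [(project, hits / grand_total) for project, hits in totals.most_common(n)]


def distinct_uris(rows: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct page titles (URIs) seen for each project."""
    titles: Dict[str, set] = defaultdict(set)
    for project, title in rows:
        titles[project].add(title)
    return {project: len(seen) for project, seen in titles.items()}


# Toy data, not real Wikipedia counts.
toy_monthly = [{"en": 50, "ja": 30, "zh": 20}, {"en": 30, "ja": 40, "zh": 30}]
for project, share in top_sites(toy_monthly, n=15):
    print(f"{project}: {share:.1%}")
```

A project whose `distinct_uris` count is 1 is the pattern described above for en.mw and ja.mw.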
11. Notes on data sources
• Original source: http://dumps.wikimedia.org/other/pagecounts-raw/
• Hourly files were aggregated at the month level; a few invalid (empty or full of HTML) files were removed, as were a few lines that did not parse. The content-size field was dropped.
• URIs that got fewer than 10 hits in a month were removed from the monthly aggregates (this reduces the number of URIs roughly tenfold!)
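The cleaning steps in these notes can be sketched as a small pipeline. This assumes each pagecounts-raw line has four space-separated fields (project, page title, hit count, content size) and that hourly files are gzip-compressed; the helper names and the HTML-detection heuristic are hypothetical, not the exact code used for these slides.

```python
import gzip
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

MIN_MONTHLY_HITS = 10  # URIs with fewer monthly hits than this are dropped


def parse_line(line: str) -> Optional[Tuple[str, str, int]]:
    """Parse 'project title count size'; return None if the line doesn't parse.

    The trailing content-size field is read but deliberately discarded.
    """
    parts = line.rstrip("\n").split(" ")
    if len(parts) != 4:
        return None
    project, title, count, _size = parts
    try:
        return project, title, int(count)
    except ValueError:
        return None


def looks_invalid(lines: List[str]) -> bool:
    """Heuristic: treat an hourly file as invalid if it is empty or looks like HTML."""
    return not lines or lines[0].lstrip().startswith("<")


def read_hourly(path: str) -> List[str]:
    """Read one gzipped hourly file as text lines."""
    try:
        with gzip.open(path, "rt", encoding="utf-8", errors="replace") as f:
            return f.readlines()
    except (OSError, EOFError):
        return []  # undecodable or truncated files end up treated as invalid


def aggregate_month(hourly_files: Iterable[List[str]]) -> Counter:
    """Sum hits per (project, title) across all hourly files of one month."""
    monthly: Counter = Counter()
    for lines in hourly_files:
        if looks_invalid(lines):
            continue  # skip empty or HTML-filled files
        for line in lines:
            parsed = parse_line(line)
            if parsed is None:
                continue  # skip lines that do not parse
            project, title, count = parsed
            monthly[(project, title)] += count
    return monthly


def drop_rare(monthly: Counter, threshold: int = MIN_MONTHLY_HITS) -> Dict[Tuple[str, str], int]:
    """Keep only URIs with at least `threshold` hits in the month."""
    return {key: hits for key, hits in monthly.items() if hits >= threshold}
```

For one month, usage would look like `drop_rare(aggregate_month(read_hourly(p) for p in paths))`, where `paths` is a hypothetical list of that month's hourly file paths.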