Summaries of Wikipedia
Usage Data
Paul Houle, Ontology2
The x-axis is months since Jan 2008; the y-axis is the total number of hits to all
Wikipedia pages.
There are some violent variations that are
probably caused by data quality problems. In
particular, around index 30 (2010-06 and
2010-07) we see a drop in hits, then a very
high number of hits in 2010-11. I think
there may be a few weeks of data missing
somewhere in that time range.
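
To read month indices off the x-axis, a small helper like the one below can map an index back to a calendar month. This is only a sketch: it assumes index 0 corresponds to Jan 2008 (the slide says "months since Jan 2008"), and the function name `month_label` is made up for illustration.

```python
def month_label(index: int, start_year: int = 2008, start_month: int = 1) -> str:
    """Convert a zero-based month index into a 'YYYY-MM' label.

    Assumes index 0 is the start month (Jan 2008 by default).
    """
    years, month0 = divmod(start_month - 1 + index, 12)
    return f"{start_year + years:04d}-{month0 + 1:02d}"


# Under the zero-based assumption, the unstable stretch sits around these indices.
for i in (29, 30, 34):
    print(i, month_label(i))
```

With this convention, indices 29 and 30 land on 2010-06 and 2010-07, and index 34 lands on 2010-11.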
The y-axis here is the fraction of hits to the English
Wikipedia. At the beginning, more than 50% of
the traffic went to the “en” Wikipedia, but that
has fallen off and now “en” represents a bit more
than 1/3 of the traffic.
“en” is still dominant, but others are catching up.
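
The per-language fractions plotted on these slides can be thought of as each project's hits divided by the total hits for the month. A minimal sketch follows, assuming the monthly totals are already available as a mapping from project code to hit count; the function name `project_shares` and the sample numbers are invented for illustration and are not taken from the data.

```python
def project_shares(hits_by_project: dict) -> dict:
    """Return each project's fraction of the total hits for one month."""
    total = sum(hits_by_project.values())
    if total == 0:
        # Avoid division by zero for an empty or all-zero month.
        return {project: 0.0 for project in hits_by_project}
    return {project: hits / total for project, hits in hits_by_project.items()}


# Made-up counts, only to show the shape of the calculation.
example_month = {"en": 50, "de": 30, "ja": 20}
print(project_shares(example_month))
```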
The y-axis here is the fraction of traffic to the
German Wikipedia. Like “en”, the fraction falls
over time. Note that there is a high spike in Dec
2008.
The y-axis here is the fraction of hits to the Japanese Wikipedia,
and the story is similar to “de”, except the crazy
spike happens around March 2013.
The fraction of traffic in the francophone region,
“fr”, actually looks stable over time
The fraction of hits to the Korean-language
Wikipedia has actually been increasing
(something has to if “en”, “de” and “ja” are
declining).
The fraction of hits to the Chinese Wikipedia has
grown over time, but there is a drop during the time frame
that looks unstable on the summary graph at the
beginning, and another crazy spike.
The fraction of traffic in the “es” cultural zone seems to
have a strong seasonal variation
Top 15 Wikimedia sites ordered by fraction of all-time hits.
Note that “ja” is Japanese, “zh” is Chinese, and “tr” is Turkish.
en.mw and ja.mw both come up with a single URI, so these probably represent a
redirect somewhere.
Notes on data sources
• Original source: http://dumps.wikimedia.org/other/pagecounts-raw/
• Hourly files were aggregated at the month level; a few invalid (empty
or full of HTML) files were removed, as were a few lines that did not
parse. Content sizes were removed.
• URIs that got fewer than 10 hits a month were removed from the
monthlies (this reduces the number of URIs roughly tenfold!)
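
The aggregation steps above can be sketched roughly as follows. This is not the code used to produce the slides; it is a minimal illustration that assumes each pagecounts-raw line has four space-separated fields (project code, page title, hit count, content size), and the function names `parse_line` and `aggregate_month` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple

MIN_MONTHLY_HITS = 10  # threshold stated in the notes above


def parse_line(line: str) -> Optional[Tuple[str, str, int]]:
    """Return (project, title, hits), or None if the line does not parse."""
    parts = line.split()
    if len(parts) != 4:
        return None
    project, title, hits, _size = parts  # the content size is discarded
    if not hits.isdecimal():
        return None
    return project, title, int(hits)


def aggregate_month(lines: Iterable[str], min_hits: int = MIN_MONTHLY_HITS) -> dict:
    """Sum hourly hit counts per (project, title) and drop rarely hit URIs."""
    totals = Counter()
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not parse (for example stray HTML)
        project, title, hits = parsed
        totals[(project, title)] += hits
    # Keep only URIs with at least min_hits hits in the month.
    return {key: n for key, n in totals.items() if n >= min_hits}


# Made-up sample lines standing in for the contents of several hourly files.
sample = [
    "en Main_Page 7 12345",
    "en Main_Page 5 999",
    "de Foo 3 100",
    "<html>garbage</html>",
]
print(aggregate_month(sample))
```

In practice the lines would be streamed from the hourly files for one month rather than held in a list, but the summing and thresholding logic would stay the same.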
