Dissecting Wikipedia




                                     Andrew Gray

           Wikipedian in Residence, British Library

              andrew.gray@bl.uk // @generalising
Wikipedia & Wikimedia



   Wikimedia
      Movement and charitable body
      80-100,000 contributors in 280 languages
        and eleven core projects
      Image repository, dictionary, news site…
      …used by almost 500,000,000 people



   Wikipedia
      25,000,000 articles, 4,000,000 in English
      representing 8-9,000,000 topics & entities
      6,500 articles and 235,000 edits per day

    (…and twelve years ago, all this was fields…)
…so what is Wikipedia?



   …an encyclopedia (more or less)

   …written neutrally

   …and verifiably

   …using previously published information

   …free to use, distribute, or reuse

   …a collaborative community

   …with no firm rules
A developing internal infrastructure



   All edits are visible through watchlists and page histories
      About 7% of edits are vandalism or otherwise malicious; processes exist to detect these
      Median time to correction is under 2 minutes… but some stay much longer

   Individual discussion pages for all articles – “talk”

   Quality review and assessment process

   Specialised working groups and central noticeboards
      e.g. content topics; style; dispute resolution; copyright; etc.
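Page histories are also exposed programmatically. The sketch below builds a request to the MediaWiki Action API (`action=query`, `prop=revisions`) and flattens the JSON it returns. It assumes the English Wikipedia endpoint and the newer `formatversion=2` response layout; the function names are hypothetical and the sample response is hand-written for illustration, not real data.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"

def history_url(title: str, limit: int = 50) -> str:
    """Build an Action API request for a page's most recent revisions."""
    params = {
        "action": "query",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp|user|sha1",  # sha1 = hash of the revision text
        "rvlimit": limit,
        "format": "json",
        "formatversion": 2,  # "pages" comes back as a list, not a dict keyed by page ID
    }
    return API + "?" + urlencode(params)

def parse_history(response: dict) -> list[tuple[int, str, str, str]]:
    """Flatten a formatversion=2 response into (revid, timestamp, user, sha1) rows.
    Revisions arrive newest first by default."""
    rows = []
    for page in response["query"]["pages"]:
        for rev in page.get("revisions", []):
            # user and sha1 can be absent if they have been hidden on that revision
            rows.append((rev["revid"], rev["timestamp"], rev.get("user", ""), rev.get("sha1", "")))
    return rows

# Hypothetical response in the shape the API returns (values are made up)
SAMPLE_RESPONSE = {"query": {"pages": [{"pageid": 1, "title": "Example", "revisions": [
    {"revid": 102, "parentid": 101, "user": "Alice",
     "timestamp": "2013-01-02T10:01:30Z", "sha1": "aaa"},
    {"revid": 101, "parentid": 100, "user": "203.0.113.5",
     "timestamp": "2013-01-02T10:00:00Z", "sha1": "bbb"},
]}]}}

print(history_url("Albert Einstein", limit=10))
for row in parse_history(SAMPLE_RESPONSE):
    print(row)
```

Fetching the URL (with any HTTP client) and passing the decoded JSON to `parse_history` gives the same edit-by-edit view that the page history shows in the browser.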
Quality of Wikipedia as a source



   On average… it’s not bad
      In 2005 (Nature): about four errors per article, versus three in Britannica
      In 2011, in English, Spanish & Arabic:
            “…the Wikipedia articles in this sample scored higher overall than the
            comparison articles with respect to accuracy, references,
            style/readability and overall judgment…”

   With millions of articles, many are individually problematic
      Various ways of identifying “signs” of quality
      Markers for quality are both obvious and subtle



   Very effective “springboard” tool
Moving to other content



   Other languages – not translations, and may have more content

   Mousing over footnote markers

   Within the references:
      Links through DOIs and other identifiers
      ISBNs go to a special landing page
           …and then out to libraries, booksellers, etc
      ISSNs go to WorldCat
      For articles on authors, look for authority control links
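The identifier links in references can be reproduced outside Wikipedia. A minimal sketch follows, assuming the link targets are `Special:BookSources/<ISBN>` on English Wikipedia, `https://doi.org/<DOI>` for DOIs and `https://www.worldcat.org/issn/<ISSN>` for ISSNs; the helper names are hypothetical. It also applies the standard ISBN-10, ISBN-13 and ISSN check-digit rules, which catch most mistyped identifiers before a link is built.

```python
import re

BOOKSOURCES = "https://en.wikipedia.org/wiki/Special:BookSources/"
WORLDCAT_ISSN = "https://www.worldcat.org/issn/"
DOI_RESOLVER = "https://doi.org/"

def _clean(value: str) -> str:
    """Drop hyphens and spaces; uppercase a trailing 'x' check character."""
    return re.sub(r"[\s-]", "", value).upper()

def isbn_is_valid(isbn: str) -> bool:
    s = _clean(isbn)
    if len(s) == 13 and s.isdigit():
        # ISBN-13: weights alternate 1, 3; the total must be a multiple of 10
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(s))
        return total % 10 == 0
    if len(s) == 10 and s[:9].isdigit() and (s[9].isdigit() or s[9] == "X"):
        # ISBN-10: weights 10 down to 1, 'X' stands for 10; total must be a multiple of 11
        values = [int(d) for d in s[:9]] + [10 if s[9] == "X" else int(s[9])]
        total = sum(v * w for v, w in zip(values, range(10, 0, -1)))
        return total % 11 == 0
    return False

def issn_is_valid(issn: str) -> bool:
    s = _clean(issn)
    if len(s) != 8 or not s[:7].isdigit() or not (s[7].isdigit() or s[7] == "X"):
        return False
    # Weights 8 down to 2 on the first seven digits; check = (11 - total mod 11) mod 11
    total = sum(int(d) * w for d, w in zip(s[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    return s[7] == ("X" if check == 10 else str(check))

def identifier_link(kind: str, value: str) -> str:
    """Return an outbound URL for a DOI, ISBN or ISSN, or raise ValueError."""
    if kind == "doi":
        # DOIs with unusual characters may need percent-encoding in practice
        return DOI_RESOLVER + value.strip()
    if kind == "isbn" and isbn_is_valid(value):
        return BOOKSOURCES + _clean(value)
    if kind == "issn" and issn_is_valid(value):
        s = _clean(value)
        return WORLDCAT_ISSN + s[:4] + "-" + s[4:]
    raise ValueError(f"unsupported or invalid {kind}: {value!r}")

print(identifier_link("isbn", "978-0-306-40615-7"))  # → https://en.wikipedia.org/wiki/Special:BookSources/9780306406157
print(identifier_link("issn", "0028-0836"))          # → https://www.worldcat.org/issn/0028-0836
```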
Other research tools



   Some tools are available: the “toolserver” allows live database queries
      Complex to use, but rewarding




   CatScan: look for intersection of categories
      “all physicists born in 1912” – 53 in English, 35 in German




   Full dumps of all data available – http://dumps.wikipedia.org/



   Reusers – Freebase, DBpedia, Wolfram Alpha
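A CatScan-style category intersection can also be run offline against a dump. The sketch assumes an uncompressed `pages-articles` XML file (elements in the MediaWiki export namespace, one `<page>` per article) and reads categories only from explicit `[[Category:…]]` links in the wikitext. Unlike CatScan it does not descend into subcategories, and it misses categories added by templates; the function names and sample pages are hypothetical.

```python
import io
import re
import xml.etree.ElementTree as ET

CATEGORY_LINK = re.compile(r"\[\[\s*Category\s*:\s*([^\]|]+)", re.IGNORECASE)

def _local(tag: str) -> str:
    """Strip the '{namespace}' prefix that dump elements carry."""
    return tag.rsplit("}", 1)[-1]

def _normalise(name: str) -> str:
    """Treat underscores as spaces and capitalise the first letter, as MediaWiki does."""
    name = name.strip().replace("_", " ")
    return name[:1].upper() + name[1:]

def categories_by_page(xml_stream):
    """Yield (title, set_of_categories) for each <page> element in the stream."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        if _local(elem.tag) != "page":
            continue
        title, text = None, ""
        for child in elem.iter():
            name = _local(child.tag)
            if name == "title":
                title = child.text
            elif name == "text" and child.text:
                text = child.text  # with several revisions, the last one wins
        yield title, {_normalise(m) for m in CATEGORY_LINK.findall(text)}
        elem.clear()  # release this page's contents once processed

def intersect(pages, *wanted):
    """Titles of pages that carry every one of the wanted categories."""
    wanted_set = {_normalise(c) for c in wanted}
    return sorted(title for title, cats in pages if wanted_set <= cats)

# Hypothetical three-page extract in dump format
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Physicist A</title><revision><text>... [[Category:1912 births]] [[Category:American physicists]]</text></revision></page>
  <page><title>Chemist B</title><revision><text>[[Category:1912 births]] [[Category:Chemists]]</text></revision></page>
  <page><title>Physicist C</title><revision><text>[[category:American_physicists|C]] [[Category:1950 births]]</text></revision></page>
</mediawiki>"""

pages = categories_by_page(io.BytesIO(SAMPLE.encode("utf-8")))
print(intersect(pages, "1912 births", "American physicists"))  # → ['Physicist A']
```

For a real dump, the compressed file can be streamed with `bz2.open(path, "rb")` in place of the `BytesIO` sample, so the multi-gigabyte XML never has to be unpacked to disk.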
Wikidata



   Wikidata: our new linked data repository
      Phase I: cross-language links
      Phase II: structured data elements
      Phase III: dynamic lists




   Very loosely defined schema

   Currently harvesting structured data from WP

   Public API, open to reusers

   CC0-licensed data – fully open
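Phase I data (cross-language links) is already queryable through the public API. A minimal sketch, assuming the `wbgetentities` module with `props=sitelinks`; the sample response is trimmed by hand to two links for item Q42 (Douglas Adams), and the helper names are hypothetical. Sitelinks cover sister projects too (e.g. `enwikiquote`), not only Wikipedias.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def sitelinks_url(qid: str) -> str:
    """Request only the sitelinks (cross-language links) of one item."""
    params = {"action": "wbgetentities", "ids": qid, "props": "sitelinks", "format": "json"}
    return WIKIDATA_API + "?" + urlencode(params)

def language_links(response: dict, qid: str) -> dict[str, str]:
    """Map site IDs such as 'enwiki' or 'dewiki' to the linked page title."""
    sitelinks = response["entities"][qid].get("sitelinks", {})
    return {site: link["title"] for site, link in sitelinks.items()}

# Hand-trimmed example of the response shape (real items have many more links)
SAMPLE = {"entities": {"Q42": {"type": "item", "id": "Q42", "sitelinks": {
    "enwiki": {"site": "enwiki", "title": "Douglas Adams", "badges": []},
    "dewiki": {"site": "dewiki", "title": "Douglas Adams", "badges": []},
}}}}

print(sitelinks_url("Q42"))
print(language_links(SAMPLE, "Q42"))  # → {'enwiki': 'Douglas Adams', 'dewiki': 'Douglas Adams'}
```

Because one item holds the links for every language, comparing its sitelinks against a list of site IDs shows at a glance which Wikipedias still lack an article on the topic.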
Research about Wikipedia



   Thriving research around Wikimedia communities & content
      By mid-2011: 2,100 peer-reviewed articles and 38 PhD theses
      Active research committee and WMF support

   Regular community-produced monthly newsletter
      http://enwp.org/meta:Research // @wikiresearch

   Topics include:
      Community and content creation
      Reading and researching by users
      Quality of content
      Technical research
      Large-scale content examination
Research on communities



   Research on the Wikipedia communities:


        Dynamics of community conflict, discussions, collaboration,
         voting, contribution, mentoring…
        Demographics, motivation and specialisms of contributors
        Patterns of growth and content creation/deletion
        Effect of central programs on volunteer activity
        Cross-cultural interaction
Visualisation: discussion dynamics




                                     http://notabilia.net/
Editor activity and motivation




                        http://commons.wikimedia.org/wiki/File:Effect_of_barnstars_on_productivity.png
Research on users



   Research on usage of Wikipedia:


        Specific searching behaviour
        Patterns of usage (yearly, daily)
        Tracking external events through Wikipedia
        Search engine rankings
        Change in usage by students
        Effect of Wikipedia publication on wider literature
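Studies of daily and yearly usage patterns often start by grouping page-view counts by calendar unit. The sketch assumes you already have a daily series of `(ISO date, views)` pairs for an article, for example exported from Wikimedia's published page-view statistics; the function name and the numbers are made up.

```python
from collections import defaultdict
from datetime import date

def mean_views_by(series, key):
    """Average daily views per group; key maps a datetime.date to a group label."""
    totals, days = defaultdict(int), defaultdict(int)
    for iso_day, views in series:
        group = key(date.fromisoformat(iso_day))
        totals[group] += views
        days[group] += 1
    return {g: totals[g] / days[g] for g in sorted(totals)}

# Hypothetical daily counts for one article
series = [("2013-01-07", 100), ("2013-01-12", 40), ("2013-01-14", 120), ("2013-02-04", 90)]

print(mean_views_by(series, lambda d: d.weekday()))  # keys: 0 = Monday ... 6 = Sunday
print(mean_views_by(series, lambda d: d.month))
```

Swapping the key (`d.year`, `d.month`, `d.weekday()`) gives yearly, seasonal or weekly profiles; hourly patterns would need hourly rather than daily counts.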
Visualising editing patterns




                       http://commons.wikimedia.org/wiki/File:WikiTrip_egyptian_revolution_screenshot.png
Research on content



   Research on the content of Wikipedia:


        Evolution of content
        Accuracy, coverage and quality
        Biases – geographic, cultural, gender
        Linguistic analysis
        Effect of external publications on Wikipedia
Quality assessment comparisons




           http://commons.wikimedia.org/wiki/File:Boxplot_of_Average_Article_Feedback_ratings_by_project_rated_quality.svg
Research on technical aspects



   Research on the technical side of Wikipedia:


      Extensive work on scaling open-content services
      Tools for detecting and handling vandalism
      Algorithmic detection and identification of bias, spam
      Practical research on uses of wikis
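One common building block in vandalism research is revert detection: an edit that restores the exact text of an earlier revision (an "identity revert") undoes everything in between. The sketch assumes an oldest-first list of `(timestamp, sha1)` pairs, such as the `sha1` revision hashes the MediaWiki API can return, and reports the median delay between the first undone edit and its revert, the kind of figure behind the "median time to correction" quoted earlier. It detects reverts, not vandalism as such: a revert can also undo a good-faith edit. The 15-revision window and all names are assumptions.

```python
from datetime import datetime
from statistics import median

def parse_ts(ts: str) -> datetime:
    """MediaWiki API timestamps look like 2013-01-02T10:00:00Z."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ")

def identity_reverts(revisions, radius=15):
    """revisions: oldest-first list of (timestamp, sha1).
    Returns (reverting_index, reverted_indices) for each revision whose hash
    matches one of the previous `radius` revisions, skipping its direct parent."""
    found = []
    for i, (_, sha1) in enumerate(revisions):
        # Search backwards from i - 2 so the nearest earlier identical state wins
        for j in range(i - 2, max(-1, i - radius - 1), -1):
            if revisions[j][1] == sha1:
                found.append((i, list(range(j + 1, i))))
                break
    return found

def median_minutes_to_revert(revisions, radius=15):
    """Median minutes from the first undone edit to the reverting edit, or None."""
    delays = []
    for i, reverted in identity_reverts(revisions, radius):
        start = parse_ts(revisions[reverted[0]][0])
        end = parse_ts(revisions[i][0])
        delays.append((end - start).total_seconds() / 60)
    return median(delays) if delays else None

# Hypothetical history: two bad edits, each reverted to the prior text
revs = [
    ("2013-01-02T10:00:00Z", "aaa"),
    ("2013-01-02T10:05:00Z", "bbb"),  # undone 1.5 minutes later
    ("2013-01-02T10:06:30Z", "aaa"),
    ("2013-01-02T11:00:00Z", "ccc"),
    ("2013-01-02T11:10:00Z", "ddd"),  # undone 3 minutes later
    ("2013-01-02T11:13:00Z", "ccc"),
]
print(identity_reverts(revs))          # → [(2, [1]), (5, [4])]
print(median_minutes_to_revert(revs))  # → 2.25
```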
Research using content



   Research using content from Wikipedia

   Hard to distinguish from “conventional” research, but some
    examples:


      Geographical analysis
      Visualisations of content
      Source for extracted datasets


        ...and Wikidata still to come!
Visualising art history




                          http://commons.wikimedia.org/wiki/File:Wikiarthistory.png
Visualising place




                    https://commons.wikimedia.org/wiki/File:Imageworld-artphp3.png
