Merrilee Proffitt and
Max Klein
OCLC Research
August 24 2012
• 45 years old
• Almost 30K libraries contributing from 170 countries
• More than 271 M items
• 1200 employees
• 21 offices worldwide
• Since 1978
• 46 people
• 3 locations (Dublin, San Mateo, Leiden)

• Pure research
  • not product R&D
  • not market research
• Wikipedians still complain about the Vector skin
• Although content creation is fast, internal policy progress is glacial and conservative

• Consensus model over asynchronous and near-anonymous discussion
• “The free bureaucracy, that anyone can legislate.” ~ San Francisco Wiknic 2012
• Community originated.
  • 27,456 instances
• 2009 “Linkspam” accusations against OCLC.
  • Cause: links to Amazon and B&N on the WorldCat page.
  • Original accuser was banned for being argumentative.
• Crux: Should Wikipedia promote any organization?
  • Open question in the community
• Disambiguation
• Collation
• Authority file matching
• During creation used Wikipedia data
• 2013. Wikipedia will be promoted to “source” rather than reference.
• English Wikipedia
  • 4,000 instances
• German Wikipedia
  • 220,000 instances
• Wikimedia Commons
  • 45,000 instances
…
• Added by hand
• Rules vary by language
…

Load VIAF Data → Check Deutsche Wikipedia → Edit English Wikipedia
• English only, for now
• Targets 260,000 pages
  • 1/16th of English Wikipedia
• Still won’t be fully synched with Deutsche Wikipedia
• https://github.com/notconfusing/VIAFbot
• Uses Pywikipediabot
• In community code review: running within the next month
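
The load, check, edit flow described above can be sketched roughly as follows. This is a hypothetical illustration, not VIAFbot's actual code: the function name, the data shapes, the sample identifiers, and the rule of skipping pages where the German identifier disagrees are all assumptions made for the example.

```python
# Hypothetical sketch of a load / check / edit planning step.
# Nothing here is taken from the VIAFbot repository; names and data are made up.

def plan_edits(viaf_links, dewiki_viaf, enwiki_already_tagged):
    """Decide which English Wikipedia pages could receive a VIAF identifier.

    viaf_links: dict of English article title -> VIAF ID (loaded from VIAF data)
    dewiki_viaf: dict of English article title -> VIAF ID found on the matching
                 Deutsche Wikipedia article
    enwiki_already_tagged: set of English titles that already carry an identifier
    """
    to_add, conflicts = {}, {}
    for title, viaf_id in viaf_links.items():
        if title in enwiki_already_tagged:
            continue  # nothing to do, the page was already edited by hand or bot
        german_id = dewiki_viaf.get(title)
        if german_id is not None and german_id != viaf_id:
            # Assumed policy: leave disagreements for a human to review.
            conflicts[title] = (viaf_id, german_id)
        else:
            to_add[title] = viaf_id
    return to_add, conflicts


# Made-up sample data, only to show the shape of the inputs.
viaf = {"Author A": "1", "Author B": "2", "Author C": "3"}
german = {"Author A": "1", "Author B": "9"}
tagged = {"Author C"}

additions, disagreements = plan_edits(viaf, german, tagged)
print(additions)
print(disagreements)
```

Separating the planning step from the actual page edits keeps the decision logic easy to review, which matters when a bot run is subject to community approval.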
• Transclusion & Sugarcoated HTML
• Transclusion
  • You can draw in text from other pages (typically templates)
  • Can send parameters
• Templates can perform
  • Simple logic operations
  • Simple text manipulation

• Still Wikitext, not fully query-able
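
A toy model may help make transclusion with parameters concrete. The sketch below is not the MediaWiki parser: it handles only flat, non-nested calls of the form `{{Name|key=value}}`, and the template name `Person ids` and its body are hypothetical.

```python
import re

# Hypothetical template store: page name -> template body.
# {{{VIAF}}} marks where the parameter named VIAF is substituted.
TEMPLATES = {
    "Person ids": "VIAF: {{{VIAF}}}",
}

# Matches a flat call such as {{Person ids|VIAF=12345}} (no nesting supported).
CALL = re.compile(r"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}")
PARAM = re.compile(r"\{\{\{([^{}]+)\}\}\}")


def transclude(text, templates):
    """Replace each template call in text with the expanded template body."""
    def expand(match):
        name = match.group(1).strip()
        body = templates.get(name)
        if body is None:
            return match.group(0)  # unknown template: leave the call untouched
        params = {}
        for part in match.group(2).split("|")[1:]:
            key, _, value = part.partition("=")
            params[key.strip()] = value.strip()
        # Missing parameters expand to an empty string in this toy model.
        return PARAM.sub(lambda m: params.get(m.group(1).strip(), ""), body)

    return CALL.sub(expand, text)


page = "Jane Doe was a writer. {{Person ids|VIAF=12345}}"
print(transclude(page, TEMPLATES))
```

The result is still a string of text, which illustrates the limitation noted above: the identifier is displayed, but it is not stored as data that can be queried.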
“The way you always thought Wikipedia worked.”
~Merrilee Proffitt
• Phase 1
  • Revamping interlanguage links
• Phase 2
  • Data, Templates and Infoboxes
• Phase 3
  • Semantic querying
• Now: Added by hand or bot
• Soon: Wikidata concept page
• Soon: Properties for a concept
• Soon: This won’t be a monumental effort.
• The end of the assumption that Wikipages store Wikitext.
• On Wikidata they store JSON.
• All the work VIAFbot is doing will be accessible across 270 Wikis.
• Plus language-specific lookup…
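
To illustrate what storing JSON instead of Wikitext makes possible, here is a minimal sketch. The record layout, the field names (`labels`, `properties`, `viaf`), and the identifier value are invented for the example and are not the actual Wikidata schema.

```python
import json

# Hypothetical concept record; the structure is an assumption for illustration.
RAW = """
{
  "id": "concept-1",
  "labels": {"en": "Example Author", "de": "Beispielautor"},
  "properties": {"viaf": "0000000"}
}
"""


def describe(raw, lang, fallback="en"):
    """Return the concept's label in the requested language plus its VIAF value."""
    concept = json.loads(raw)
    labels = concept["labels"]
    label = labels.get(lang, labels[fallback])  # fall back when no label exists
    return f'{label} (VIAF {concept["properties"]["viaf"]})'


print(describe(RAW, "de"))
print(describe(RAW, "fr"))  # no French label in the sample, so English is used
```

Because the identifier lives in one structured record, every language edition can read the same value and only the label lookup varies by language.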
• RDF Data
• Backers: Google, Paul Allen Institute for Artificial Intelligence, Gordon and Betty Moore Foundation.
• Release Date: January 2013
• Caveat: Requires adoption by each individual language wiki – by consensus.
• Wikipedias having found consensus so far: …
• Hungarian Wikipedia
• Bibliographic data is both:
  • An element of citation
  • An article in its own right
• 411,274 citations of books
• 244,236 citations of journals
• 57,868 citations of encyclopedias
• 342,470 citations of newspapers
• 1,055,845 total print citations
• 1,169,495 citations of web
http://en.wikipedia.org/wiki/User:Maximilianklein/Citations
• 154,978 citations of Google Books
• 38,328 citations of Amazon
• 7,695 citations of WorldCat
http://webempires.org/wikirank-wikipedias-top-sources/wiki_top/




• Must make it easier to link to libraries.
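
As a rough check on the gap behind this point, the three source counts listed above can be turned into shares. The counts are copied from the slide; comparing only these three sources against each other is an assumption made for the example.

```python
# Counts as given on the slide; the comparison covers only these three sources.
CITATIONS = {"Google Books": 154_978, "Amazon": 38_328, "WorldCat": 7_695}


def shares(counts):
    """Return each source's fraction of the combined count."""
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


for name, share in shares(CITATIONS).items():
    print(f"{name}: {share:.1%}")
```

Among these three, WorldCat accounts for a little under 4% of the citations.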
• Wikipedia features bidirectional linking.
  • We follow links forward all the time; why not backwards?
• Could add “what cites this”
• A Wikipedia article could be a good way of declaring the aboutness of a record.
~Asaf Bartov (User:Ijon)
• Could add “what’s about this”
• Dream
  • Take your browser history
• Would still have to create bidirectional links between WorldCat and Wikipedia
• There is the practical solution.

• VIAFbot is the prototype of the link reciprocation solution
• Have to gain Wikipedia approval to reciprocate links with a bot
  • Subject to community approval
• Requires maintenance
  • Can become unsynchronized
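
The maintenance problem can be pictured as a periodic comparison of the two link sets. The sketch below is an assumption about how such a check might look, not a description of any existing OCLC or Wikipedia tool; the function name and the sample pairs are made up.

```python
def unreciprocated(forward, backward):
    """Find links that exist in only one direction.

    forward: set of (wikipedia_title, viaf_id) pairs, links from Wikipedia to VIAF
    backward: set of (viaf_id, wikipedia_title) pairs, links from VIAF to Wikipedia
    """
    # Flip the backward pairs so both sets use the same (title, id) order.
    mirrored = {(title, viaf_id) for viaf_id, title in backward}
    return {
        "missing_backlink": sorted(forward - mirrored),
        "missing_forward_link": sorted(mirrored - forward),
    }


# Made-up sample data.
wikipedia_to_viaf = {("Author A", "1"), ("Author B", "2")}
viaf_to_wikipedia = {("1", "Author A"), ("3", "Author C")}

report = unreciprocated(wikipedia_to_viaf, viaf_to_wikipedia)
print(report)
```

Running a comparison like this on a schedule is one way to notice when the two sides drift out of sync.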
• Seaplanes
  • Imitated bidirectional

• Islands
  • Wikipedia, VIAF, WorldCat

• Data Archipelago
Max Klein and Merrilee Proffitt
@notconfusing and @merrileeiam

Wikipedia and Libraries: Island Hopping the Data Archipelago
