Verifiable, Linked Open Knowledge
That Anyone can Edit
Dario Taraborelli
@readermeter

My keynote talk at VIVO '16 on Wikidata, WikiCite, and collaboratively created linked open knowledge.

  1. 1. Verifiable, Linked Open Knowledge That Anyone can Edit Dario Taraborelli @readermeter
  2. 2. A short history of Wikipedia A website that anyone can edit The largest reference work on the internet A multi-language online encyclopedia
  3. 3. A short history of Wikipedia A website that anyone can edit The largest reference work on the internet A multi-language online encyclopedia
  4. 4. A short history of Wikipedia A website that anyone can edit The largest reference work on the internet A multi-language online encyclopedia
  5. 5. Wikipedia: unintended outcomes accelerate the dissemination of scholarship support open scientific research enable distributed fact-checking and curation of scientific knowledge
  6. 6. accelerate the dissemination of scholarship support open scientific research enable distributed fact-checking and curation of scientific knowledge
  7. 7. Wikipedia: unintended outcomes accelerate the dissemination of scholarship support open scientific research enable distributed fact-checking and curation of scientific knowledge
  8. 8. Outline 1. Wikipedia as the front matter to all research 2. A new kind of open knowledge 3. Wikidata: Collaboratively curated linked open data 4. WikiCite: Building the sum of all human citations 5. Applications 6. Concluding remarks
  9. 9. Wikipedia as the front matter to all research
  10. 10. “Wikipedia is not the bottom layer of authority, nor the top, but in fact the highest layer without formal vetting. In this unique role, it serves as an ideal bridge between the validated and unvalidated Web.” Casper Grathwohl Chronicle of Higher Education http://chronicle.com/article/article-content/125899/
  11. 11. Top sources of DOI resolutions http://crosstech.crossref.org/2014/02/many-metrics-such-data-wow.html http://blog.crossref.org/2016/05/https-and-wikipedia.html
  12. 12. The world’s most accessed online medical resource? Heilman and West (2015) doi.org/10.2196/jmir.4069
  13. 13. Most visited resource on Ebola in West Africa. Heilman (2016) http://tinyurl.com/jfuyduv Most used internet site in Liberia, Sierra Leone and Guinea for Ebola during the 2014 outbreak; greater than CNN, CDC and WHO
  14. 14. A new kind of open knowledge
  15. 15. The backbone of the linked open data ecosystem Schmachtenberg et al. (2014) http://lod-cloud.net [CC BY SA]
  16. 16. Challenges Biases / errors Coverage Diversity and inclusiveness Verifiability
  17. 17. Machine-readable linked open data Editable by anyone Supporting human + algorithmic curation Comprehensive Transparently verifiable
  18. 18. Machine-readable linked open data Editable by anyone Supporting human + algorithmic curation Comprehensive Transparently verifiable
  19. 19. Machine-readable linked open data Editable by anyone Supporting human + algorithmic curation Comprehensive Transparently verifiable
  20. 20. Wikidata Collaboratively curated linked open data
  21. 21. Wikidata Free knowledge base that anyone can edit Launched in 2012 Integrated with Wikipedia and other sister projects Statistics (Aug 2016) Nearly 20M items Over 100M statements
  22. 22. Wikidata: Growth http://reportcard.wmflabs.org/graphs/active_editors English Wikipedia Wikidata
  23. 23. Wikidata: Growth http://reportcard.wmflabs.org/graphs/very_active_editors English Wikipedia Wikidata
  24. 24. Wikidata’s anatomy https://www.wikidata.org/wiki/Wikidata:Introduction
  25. 25. Wikidata’s anatomy Linked data, San Francisco, Jeblad https://commons.wikimedia.org/wiki/File:Linked_Data_-_San_Francisco.svg [CC BY SA]
  26. 26. SPARQL: http://tinyurl.com/zelqrwp Paintings by Gustav Klimt Wikidata query examples
  27. 27. SPARQL: https://t.co/cDR4Lt7V6P Birth place of people employed by MIT
  28. 28. SPARQL: http://tinyurl.com/h5x5q4q Children of Genghis Khan
  29. 29. Expert curation of scientific open data Benjamin Good (2016) Opportunities and challenges presented by Wikidata in the context of biocuration http://tinyurl.com/hk9qrmz
  30. 30. Expert curation of scientific open data Gene Wiki: Wikidata SPARQL examples https://bitbucket.org/sulab/wikidatasparqlexamples/overview Get a list of all diseases treated by Metformin; get all the gene ontology evidence codes used in Wikidata; get all known drug-drug interactions for Methadone via its CHEMBL id
  31. 31. WikiCite Building the sum of all human citations Randall Munroe, Wikipedian protester http://tinyurl.com/p3rodlb [CC BY]
  32. 32. the disappearance of provenance http://bit.ly/SumOfAllCitations
  33. 33. the disappearance of provenance
  34. 34. the disappearance of provenance http://wapo.st/1Y5Smm6
  35. 35. Linking is a small act of generosity that sends people away from your site to some other that you think shows the world in a way worth considering. [...] [Sources] that are not generous with linking [...] are a stopping point in the ecology of information. That’s the operational definition of authority: The last place you visit when you’re looking for an answer. If you are satisfied with the answer, you stop your pursuit of it. Take the links out and you think you look like more of an authority. D. Weinberger (2012) Linking is a public good http://www.hyperorg.com/blogger/2012/02/26/2b2k-linking-is-a-public-good/
  36. 36. a provenance-preserving answer engine
  37. 37. a provenance-preserving answer engine The sum of all human knowledge The sum of all data and sources backing human knowledge +
  38. 38. https://twitter.com/egonwillighagen/status/718474906858582016
  39. 39. Benjamin Good (2016) Opportunities and challenges presented by Wikidata in the context of biocuration http://tinyurl.com/hk9qrmz
  40. 40. References in Wikidata [chart: 2013 to 2016; highlighted figure: 77%] https://tools.wmflabs.org/wikidata-todo/stats.php https://www.wikidata.org/wiki/Wikidata_talk:WikiProject_Source_MetaData#Sources_used_as_references_on_Wikidata
  41. 41. The molecular origins of insulin go at least as far back as the simplest unicellular [[eukaryotes]].<ref name='LeRoith'>{{cite journal | vauthors = LeRoith D, Shiloach J, Heffron R, Rubinovitz C, Tanenbaum R, Roth J | title = Insulin-related material in microbes: similarities and differences from mammalian insulins | journal = Can. J. Biochem. Cell Biol. | volume = 63 | issue = 8 | pages = 839–49 | year = 1985 | pmid = 3933801 | doi = 10.1139/o85-106 }}</ref> Apart from animals, insulin-like proteins are also known to exist in Fungi and Protista kingdoms. References in Wikipedia
  42. 42. WikiCite: goals Lay the foundations for building a repository of all Wikimedia citations and source metadata as structured data. Design data models and technology to improve the coverage, quality, standards-compliance and machine-readability of citations and source metadata in Wikimedia projects https://meta.wikimedia.org/wiki/WikiCite_2016
  43. 43. Wikidata as the solution Vision Technology Community Scale Licensing Independence
  44. 44. https://meta.wikimedia.org/wiki/WikiCite_2016
  45. 45. https://tools.wmflabs.org/sqid/#/view?id=P2860 cites property now used in 350,000+ statements
  46. 46. https://twitter.com/harej/status/765336072997842944
  47. 47. The Zika corpus Open citation graph layer Bibliographic metadata layer Expert annotation layer Encyclopedic layer
  48. 48. The Zika corpus Encyclopedic layer
  49. 49. The Zika corpus Expert annotation layer Encyclopedic layer
  50. 50. The Zika corpus Bibliographic metadata layer Expert annotation layer Encyclopedic layer
  51. 51. The Zika corpus Open citation graph layer Bibliographic metadata layer Expert annotation layer Encyclopedic layer
  52. 52. Applications
  53. 53. Co-author graphs for individual researchers SPARQL: http://tinyurl.com/zml3jox
  54. 54. Most cited authors in the research corpus on Zika SPARQL: http://tinyurl.com/jb8da68
  55. 55. Semi-automated recommendation of missing statements or sources for unsourced statements https://www.wikidata.org/wiki/Wikidata:Primary_sources_tool https://meta.wikimedia.org/wiki/Grants:IEG/StrepHit:_Wikidata_Statements_Validation_via_References
  56. 56. Tools for crowdsourcing entity matching / disambiguation http://www.generalist.org.uk/blog/2014/wikidata-identifiers-and-the-odnb-where-next/ http://www.generalist.org.uk/blog/2014/wikidata-and-identifiers-part-2-the-matching-process/
  57. 57. New opportunities for linked open knowledge curation and discovery: all statements citing a New York Times article; the most popular scholarly journals used as citations for statements in any item that is a subclass of economics; all statements citing the works of Joseph Stiglitz; all statements citing journal articles by physicists from Oxford University; all statements citing a journal article that was retracted; all statements citing a source that cites a journal article that was retracted https://meta.wikimedia.org/wiki/WikiCite_2016/Report/Group_5
  58. 58. Concluding remarks
  59. 59. Liberate public domain bibliographic and citation data Support new forms of open curation and distributed fact-checking Accelerate open scientific research Verifiable, Linked Open Knowledge That Anyone can Edit
  60. 60. meta.wikimedia.org/wiki/WikiCite • @wikicite
  61. 61. Thank you Acknowledgments Daniel Mietchen, Jonathan Dugan, Lydia Pintscher, Cameron Neylon, James Hare, James Heilman, Magnus Manske, the Gene Wiki team (especially Andra Waagmeester and Benjamin Good), the University of Chicago Knowledge Lab, all WikiCite 2016 participants and Wikidata Source Metadata project contributors. Additional image credits Printing press, M. Wirth https://thenounproject.com/term/printing/11880/ [CC BY] Cocitation network for openfMRI papers, F. Å. Nielsen https://twitter.com/fnielsen/status/752860630932156416 dario@wikimedia.org • @readermeter • @Wikidata • @WikiCite • @WikiResearch
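
The SPARQL examples on slides 26 to 28 and the "cites" property on slide 45 are given only as short links. As a rough illustration of what such queries might look like, the following Python sketch builds two simple query shapes and a request URL. It is not the content of the linked queries. The endpoint URL, the helper names (`values_of`, `subjects_with`, `query_url`), and every identifier except P2860 are assumptions recalled from Wikidata and should be checked before use. Nothing is sent over the network.

```python
from urllib.parse import urlencode

# Assumed public endpoint of the Wikidata Query Service (not named in the slides).
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed identifiers; only P2860 ("cites") is confirmed by slide 45.
GENGHIS_KHAN = "Q720"    # assumption
GUSTAV_KLIMT = "Q34661"  # assumption
CHILD = "P40"            # assumption: "child" property
CREATOR = "P170"         # assumption: "creator" property
CITES = "P2860"          # from slide 45

# Label service clause used by the query service to return English labels.
LABELS = 'SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }'


def values_of(item: str, prop: str) -> str:
    """Query for every ?value with <item> <prop> ?value (e.g. children of a person)."""
    return (
        "SELECT ?value ?valueLabel WHERE {\n"
        f"  wd:{item} wdt:{prop} ?value .\n"
        f"  {LABELS}\n"
        "}"
    )


def subjects_with(prop: str, item: str) -> str:
    """Query for every ?subject with ?subject <prop> <item> (e.g. works citing a source)."""
    return (
        "SELECT ?subject ?subjectLabel WHERE {\n"
        f"  ?subject wdt:{prop} wd:{item} .\n"
        f"  {LABELS}\n"
        "}"
    )


def query_url(sparql: str) -> str:
    """Build a GET URL for the endpoint, asking for JSON results."""
    return WDQS_ENDPOINT + "?" + urlencode({"query": sparql, "format": "json"})


if __name__ == "__main__":
    # Children of Genghis Khan (slide 28), under the assumed identifiers.
    print(values_of(GENGHIS_KHAN, CHILD))
    # Works whose creator is Gustav Klimt (a looser version of slide 26,
    # which restricts the results to paintings).
    print(query_url(subjects_with(CREATOR, GUSTAV_KLIMT)))
```

Passing `CITES` and the identifier of a source item to `subjects_with` gives the simplest form of the citation queries listed on slide 57: everything in Wikidata that cites that source.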