SlideShare a Scribd company logo
1 of 26
Download to read offline
Data-Hacking with
Wikimedia Projects:
Learn by Example, Including
Wikipedia, Wikidata, and beyond!
@notconfusing (Max Klein)
@wrought (Matt Senate)
Who is in the room?
●
Data hackers?
●
Programmers?
●
Artists/designers?
●
Open Access folks?
●
Academics?
●
Wikipedians?
●
Wikimedians?
●
Wiki-people?
Wikimedia Movement
●
wikipedia.org
●
commons.wikimedia.org
●
wikisource.org
●
wikidata.org
Wiki Context
●
Wikipedia: far-largest in
size and user base.
●
Projects often
organized by language.
●
Each language-project
has an independent
user community.
Wiki Context
●
See Wikimedia
projects as a form of:
“curated database”
●
Web's Least Common
Denominator for data.
●
Wiki Paradox:
– Low-barrier-to-entry
– High-barrier-to-entry
Buneman, et al. Curated Databases.
https://peerlibrary.org/p/rxQ6WBd89XviMF4Tk
How do Wikimedia projects
work?
●
Community
●
Opt-in
●
Reputation
●
Cultural Protocol
●
Bureaucracy
●
Adhocracy
●
Coordination
– WikiProjects
History of Wiki Data-
Hacking
●
Rambot
– First “bot”
– 2002
●
US Census Data
●
Created 200,000
articles
●
More than doubled
Wikipedia's size at
the time
●
No permissions
“Ignore All Rules” -->
Calcification
●
To get things done:
– Need to issue a
“Request for
Comment” or RFC
●
Everybody,
regardless of
expertise has
trouble with this.
– Sometimes, not
everyone is acting in
good faith, but try to
assume it, it will help.
Häskell und Grepl
●
Hänsel und Gretel ●
Wikidata
https://commons.wikimedia.org/wiki/File:Hansel_and_Gretel.jpg
●
Rural Hunger
Problems
●
The Wikimedia
datascape having
cross-language data
sharing problems.
●
Häskell and Grepl
are sent to the
Forest alone.
●
Data is sent to live in
Templates, living
alone.
●
Häskell and Grepl
first invent a
successful
breadcrumb system
●
Wikimedia commons
allows images to be
shared across Wikis.
●
Lucky if their
breadcrumbs are not
eaten / or whatever
put.
●
Lucky if we knew on
which pages the
Data was stored.
http://commons.wikimedia.org/wiki/File:Flickr_-_Per_Ola_Wiberg_~_mostly_away_-_-_
%22No_bread,_just_a_camera...huh,_quack_%22.jpg
●
A magical
gingerbread house
●
A magical data store
– Called “Wikidata”
– Interwiki data
sharing
– Plus with extra
sweeties
http://commons.wikimedia.org/wiki/Gingerbread_house#mediaviewer/File:Pe
pparkakshus.JPG
●
And the house
includes many
different sweeties.
●
Semantic Triples
●
Qualifiers
●
Ranks
http://commons.wikimedia.org/wiki/Category:Liquorice_candy#mediaviewer/File:Flickr_-
_cyclonebill_-_Slik_%281%29.jpg
●
Häskell and Grepl
eat the roof
hungrily.
●
In this story, the
User's start adding
to Wikidata.
– Importing Wikipedia
– Foreign Database
– Manual adds.
●
The evil witch trap ●
The evil data
witches, normally
keep the
information as a silo.
●
Grepl's cunning
defeat of the witch
●
Identifers
– Think of this as a foreign
database key (Brian
Jacobs)
– Max imported 400,000
biographical identifers.
– This started an Identiifer
craze on Wikidata
●
Tennis Player
●
Swiss Parliament
●
Danish Companies
●
Grepl's cunning
defeat of the witch
●
All Wikis can
transclude arbitrary
data (eventually).
●
And Citations can be
represented as
semantic properties.
See how sources are handled with “FRBR” format:
http://www.wikidata.org/wiki/Help:Sources#Scientific.2C_newspaper_or_magazine_article
Signalling Open Access
●
WikiProject Open
Access
– On English Wikipedia
●
(Data) Problem:
Signalling “Open
Access” is hard!
●
Solution: Use clear
signals directly to
the relevant data.
– Copyright license
– Source content
– Metadata
What?
How?
●
Text WikiSource→
●
Media Commons→
●
Metadata Wikidata→
●
Signals Wikipedia→
– Including license!
●
Public domain, CC0, CC-
BY, CC-BY-SA, etc
●
RFC RecitationBot→
●
RFC RecitationBot→
●
RFC RecitationBot→
●
RFC RecitationBot→
In the Wikimedia Universe
●
Where does
“Signalling Open
Access” fit in the
Wikimedia narrative?
●
(Data) Problem:
Managing citations &
references is hard!
●
Possible Solutions:
– Templates (Many)
– Categories (Many)
– Namespace (FR)
– WikiScholar (Dead)
– VisualEditor (Zotero?)
“Signalling OA”
Opportunities
●
Take pass at citation
management
– Quality & experience
●
Integrate metadata
with Wikidata
– Where it belongs
– Sensitivity and rigor
●
Forge deep
knowledge resources
on Wikipedia
– Snapshots of sources,
deeper linking
●
Automate, but use
human judgment
– Save time and energy,
improve accuracy.
Paths for Data Hackers
●
Reputation
– Create a user account
– Contribute in good faith
to wikimedia projects
●
Cultural protocol
– Identify the scope,
concerns, and nature of
a given project
– Learn to navigate
●
History and Context
– Form a narrative for
your hack based on
past endeavors
– Seek consent and
build consensus
●
Community
– Reach out, on IRC
and mailing lists!
Of course we're Open Source
github.com/wpoa/OA-signalling
github.com/wpoa/recitation-bot
Thanks!
@notconfusing (Max)
@wrought (Matt)

More Related Content

What's hot

DBpedia/association Introduction The Hague 12.2.2016
DBpedia/association Introduction The Hague 12.2.2016DBpedia/association Introduction The Hague 12.2.2016
DBpedia/association Introduction The Hague 12.2.2016Sebastian Hellmann
 
DBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataDBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataSebastian Hellmann
 
Open data easy, explicit and fast
Open data easy, explicit and fastOpen data easy, explicit and fast
Open data easy, explicit and fastMetaSolutions AB
 
Aileen O'Carroll - DRI Training UCC: Introduction to Metadata
Aileen O'Carroll - DRI Training UCC: Introduction to MetadataAileen O'Carroll - DRI Training UCC: Introduction to Metadata
Aileen O'Carroll - DRI Training UCC: Introduction to Metadatadri_ireland
 
Wikidata at Wikipeda Day 15 (2016) NYC
Wikidata at Wikipeda Day 15 (2016) NYCWikidata at Wikipeda Day 15 (2016) NYC
Wikidata at Wikipeda Day 15 (2016) NYCLucie-Aimée Kaffee
 
Video game controlled vocabulary in wikidata
Video game controlled vocabulary in wikidataVideo game controlled vocabulary in wikidata
Video game controlled vocabulary in wikidatapeterchanws
 
Digital Narratives for Transylvania DH
Digital Narratives for Transylvania DHDigital Narratives for Transylvania DH
Digital Narratives for Transylvania DHShawn Day
 
Contropedia: Critical learning through Wikipedia's edit history
Contropedia: Critical learning through Wikipedia's edit historyContropedia: Critical learning through Wikipedia's edit history
Contropedia: Critical learning through Wikipedia's edit historyDavid Laniado
 
Digital game preservation conference 12 25-2018
Digital game preservation conference   12 25-2018Digital game preservation conference   12 25-2018
Digital game preservation conference 12 25-2018peterchanws
 
Kathryn Cassidy - DRI Training Series: 4. Metadata and XML
Kathryn Cassidy - DRI Training Series: 4. Metadata and XMLKathryn Cassidy - DRI Training Series: 4. Metadata and XML
Kathryn Cassidy - DRI Training Series: 4. Metadata and XMLdri_ireland
 
DBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, DublinDBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, Dublinm_ackermann
 
DOI registration with DataCite - COOPEUS, ENVRI, EUDAT workshop 2013
DOI registration with DataCite - COOPEUS, ENVRI, EUDAT workshop 2013DOI registration with DataCite - COOPEUS, ENVRI, EUDAT workshop 2013
DOI registration with DataCite - COOPEUS, ENVRI, EUDAT workshop 2013Frauke Ziedorn
 
An Introduction to Linked Data and Microdata
An Introduction to Linked Data and MicrodataAn Introduction to Linked Data and Microdata
An Introduction to Linked Data and MicrodataDLFCLIR
 
Beyond 2022 project presentation 2021
Beyond 2022 project presentation 2021Beyond 2022 project presentation 2021
Beyond 2022 project presentation 2021Fabrizio Orlandi
 
Semantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsSemantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsdgarijo
 
Contract Cheating in Canada (for University of Calgary) - 17 October 2018
Contract Cheating in Canada (for University of Calgary) - 17 October 2018Contract Cheating in Canada (for University of Calgary) - 17 October 2018
Contract Cheating in Canada (for University of Calgary) - 17 October 2018Thomas Lancaster
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Fabrizio Orlandi
 

What's hot (20)

DBpedia/association Introduction The Hague 12.2.2016
DBpedia/association Introduction The Hague 12.2.2016DBpedia/association Introduction The Hague 12.2.2016
DBpedia/association Introduction The Hague 12.2.2016
 
DBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of DataDBpedia: A Public Data Infrastructure for the Web of Data
DBpedia: A Public Data Infrastructure for the Web of Data
 
Linked Open Data
Linked Open DataLinked Open Data
Linked Open Data
 
Open data easy, explicit and fast
Open data easy, explicit and fastOpen data easy, explicit and fast
Open data easy, explicit and fast
 
Aileen O'Carroll - DRI Training UCC: Introduction to Metadata
Aileen O'Carroll - DRI Training UCC: Introduction to MetadataAileen O'Carroll - DRI Training UCC: Introduction to Metadata
Aileen O'Carroll - DRI Training UCC: Introduction to Metadata
 
DBpedia InsideOut
DBpedia InsideOutDBpedia InsideOut
DBpedia InsideOut
 
20140521 sem-tech-biz-guest-lecture
20140521 sem-tech-biz-guest-lecture20140521 sem-tech-biz-guest-lecture
20140521 sem-tech-biz-guest-lecture
 
Wikidata at Wikipeda Day 15 (2016) NYC
Wikidata at Wikipeda Day 15 (2016) NYCWikidata at Wikipeda Day 15 (2016) NYC
Wikidata at Wikipeda Day 15 (2016) NYC
 
Video game controlled vocabulary in wikidata
Video game controlled vocabulary in wikidataVideo game controlled vocabulary in wikidata
Video game controlled vocabulary in wikidata
 
Digital Narratives for Transylvania DH
Digital Narratives for Transylvania DHDigital Narratives for Transylvania DH
Digital Narratives for Transylvania DH
 
Contropedia: Critical learning through Wikipedia's edit history
Contropedia: Critical learning through Wikipedia's edit historyContropedia: Critical learning through Wikipedia's edit history
Contropedia: Critical learning through Wikipedia's edit history
 
Digital game preservation conference 12 25-2018
Digital game preservation conference   12 25-2018Digital game preservation conference   12 25-2018
Digital game preservation conference 12 25-2018
 
Kathryn Cassidy - DRI Training Series: 4. Metadata and XML
Kathryn Cassidy - DRI Training Series: 4. Metadata and XMLKathryn Cassidy - DRI Training Series: 4. Metadata and XML
Kathryn Cassidy - DRI Training Series: 4. Metadata and XML
 
DBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, DublinDBpedia Tutorial - Feb 2015, Dublin
DBpedia Tutorial - Feb 2015, Dublin
 
DOI registration with DataCite - COOPEUS, ENVRI, EUDAT workshop 2013
DOI registration with DataCite - COOPEUS, ENVRI, EUDAT workshop 2013DOI registration with DataCite - COOPEUS, ENVRI, EUDAT workshop 2013
DOI registration with DataCite - COOPEUS, ENVRI, EUDAT workshop 2013
 
An Introduction to Linked Data and Microdata
An Introduction to Linked Data and MicrodataAn Introduction to Linked Data and Microdata
An Introduction to Linked Data and Microdata
 
Beyond 2022 project presentation 2021
Beyond 2022 project presentation 2021Beyond 2022 project presentation 2021
Beyond 2022 project presentation 2021
 
Semantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologistsSemantic web 101: Benefits for geologists
Semantic web 101: Benefits for geologists
 
Contract Cheating in Canada (for University of Calgary) - 17 October 2018
Contract Cheating in Canada (for University of Calgary) - 17 October 2018Contract Cheating in Canada (for University of Calgary) - 17 October 2018
Contract Cheating in Canada (for University of Calgary) - 17 October 2018
 
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
Benchmarking RDF Metadata Representations: Reification, Singleton Property an...
 

Viewers also liked

2014 05-21 poster on ORCID identifiers in Wikipedia, Wikidata & sister projects
2014 05-21 poster on ORCID identifiers in Wikipedia, Wikidata & sister projects2014 05-21 poster on ORCID identifiers in Wikipedia, Wikidata & sister projects
2014 05-21 poster on ORCID identifiers in Wikipedia, Wikidata & sister projectsAndy Mabbett
 
2015-04-03 WikiArabia GLAM-Wiki presentation by Andy Mabbett
2015-04-03 WikiArabia GLAM-Wiki presentation by Andy Mabbett2015-04-03 WikiArabia GLAM-Wiki presentation by Andy Mabbett
2015-04-03 WikiArabia GLAM-Wiki presentation by Andy MabbettAndy Mabbett
 
The Near Future of CSS
The Near Future of CSSThe Near Future of CSS
The Near Future of CSSRachel Andrew
 
Classroom Management Tips for Kids and Adolescents
Classroom Management Tips for Kids and AdolescentsClassroom Management Tips for Kids and Adolescents
Classroom Management Tips for Kids and AdolescentsShelly Sanchez Terrell
 
The Presentation Come-Back Kid
The Presentation Come-Back KidThe Presentation Come-Back Kid
The Presentation Come-Back KidEthos3
 

Viewers also liked (6)

GLAMHerbert
GLAMHerbertGLAMHerbert
GLAMHerbert
 
2014 05-21 poster on ORCID identifiers in Wikipedia, Wikidata & sister projects
2014 05-21 poster on ORCID identifiers in Wikipedia, Wikidata & sister projects2014 05-21 poster on ORCID identifiers in Wikipedia, Wikidata & sister projects
2014 05-21 poster on ORCID identifiers in Wikipedia, Wikidata & sister projects
 
2015-04-03 WikiArabia GLAM-Wiki presentation by Andy Mabbett
2015-04-03 WikiArabia GLAM-Wiki presentation by Andy Mabbett2015-04-03 WikiArabia GLAM-Wiki presentation by Andy Mabbett
2015-04-03 WikiArabia GLAM-Wiki presentation by Andy Mabbett
 
The Near Future of CSS
The Near Future of CSSThe Near Future of CSS
The Near Future of CSS
 
Classroom Management Tips for Kids and Adolescents
Classroom Management Tips for Kids and AdolescentsClassroom Management Tips for Kids and Adolescents
Classroom Management Tips for Kids and Adolescents
 
The Presentation Come-Back Kid
The Presentation Come-Back KidThe Presentation Come-Back Kid
The Presentation Come-Back Kid
 

Similar to Häskell und Grepl: Data Hacking Wikimedia Projects Exampled With Open Access Signalling Project

Bot programming in Wikimedia Commons with Pywikibot
Bot programming in Wikimedia Commons with PywikibotBot programming in Wikimedia Commons with Pywikibot
Bot programming in Wikimedia Commons with PywikibotMiguel-Angel Monjas
 
Wikimedia 재단과 MediaWiki 위키 소프트웨어 조사
Wikimedia 재단과 MediaWiki 위키 소프트웨어 조사Wikimedia 재단과 MediaWiki 위키 소프트웨어 조사
Wikimedia 재단과 MediaWiki 위키 소프트웨어 조사Chris
 
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...Olivier Grisel
 
Exploring Article Networks on Wikipedia with NodeXL
Exploring Article Networks on Wikipedia with NodeXLExploring Article Networks on Wikipedia with NodeXL
Exploring Article Networks on Wikipedia with NodeXLShalin Hai-Jew
 
N2Y3: Mashups Require Commons
N2Y3: Mashups Require CommonsN2Y3: Mashups Require Commons
N2Y3: Mashups Require CommonsMike Linksvayer
 
The Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating HadoopThe Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating Hadoopcneudecker
 
Wmf wikimedia conference japan feb 3 en pdf
Wmf wikimedia conference japan feb 3 en pdfWmf wikimedia conference japan feb 3 en pdf
Wmf wikimedia conference japan feb 3 en pdfWikimedia Foundation
 
From Hyperlinks to Semantic Web Properties using Open Knowledge Extraction
From Hyperlinks to Semantic Web Properties using Open Knowledge ExtractionFrom Hyperlinks to Semantic Web Properties using Open Knowledge Extraction
From Hyperlinks to Semantic Web Properties using Open Knowledge ExtractionSTLab
 
Wikidata: A New Way to Disseminate Structured Data
Wikidata: A New Way to Disseminate Structured DataWikidata: A New Way to Disseminate Structured Data
Wikidata: A New Way to Disseminate Structured DataLuca Martinelli
 
Open Culture - How Wiki loves art and data - Romaine
Open Culture - How Wiki loves art and data - RomaineOpen Culture - How Wiki loves art and data - Romaine
Open Culture - How Wiki loves art and data - RomaineOpen Knowledge Belgium
 
Open Access and Wikipedia : Taking accessible research to the global public"
Open Access and  Wikipedia : Taking accessible research to the global public"Open Access and  Wikipedia : Taking accessible research to the global public"
Open Access and Wikipedia : Taking accessible research to the global public"Nick Sheppard
 
Opportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationOpportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationBenjamin Good
 
Beyond the Encylcopedia: The Frontiers of Free Knowledge
Beyond the Encylcopedia: The Frontiers of Free KnowledgeBeyond the Encylcopedia: The Frontiers of Free Knowledge
Beyond the Encylcopedia: The Frontiers of Free KnowledgeErikMoeller
 
The Elephant in the Library
The Elephant in the LibraryThe Elephant in the Library
The Elephant in the LibraryDataWorks Summit
 
Wikidata Introductory Workshop
Wikidata Introductory WorkshopWikidata Introductory Workshop
Wikidata Introductory WorkshopBeat Estermann
 
Mphil Computational Biology Seminar Series Presentation (20201111)
Mphil Computational Biology Seminar Series Presentation (20201111)Mphil Computational Biology Seminar Series Presentation (20201111)
Mphil Computational Biology Seminar Series Presentation (20201111)ShweataNHegde
 

Similar to Häskell und Grepl: Data Hacking Wikimedia Projects Exampled With Open Access Signalling Project (20)

Bot programming in Wikimedia Commons with Pywikibot
Bot programming in Wikimedia Commons with PywikibotBot programming in Wikimedia Commons with Pywikibot
Bot programming in Wikimedia Commons with Pywikibot
 
Wikimedia 재단과 MediaWiki 위키 소프트웨어 조사
Wikimedia 재단과 MediaWiki 위키 소프트웨어 조사Wikimedia 재단과 MediaWiki 위키 소프트웨어 조사
Wikimedia 재단과 MediaWiki 위키 소프트웨어 조사
 
Tel Vortrag
Tel VortragTel Vortrag
Tel Vortrag
 
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
 
Exploring Article Networks on Wikipedia with NodeXL
Exploring Article Networks on Wikipedia with NodeXLExploring Article Networks on Wikipedia with NodeXL
Exploring Article Networks on Wikipedia with NodeXL
 
ConfrencePres
ConfrencePresConfrencePres
ConfrencePres
 
N2Y3: Mashups Require Commons
N2Y3: Mashups Require CommonsN2Y3: Mashups Require Commons
N2Y3: Mashups Require Commons
 
Intranet 2.0: Using Wikis
Intranet 2.0: Using WikisIntranet 2.0: Using Wikis
Intranet 2.0: Using Wikis
 
The Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating HadoopThe Elephant in the Library - Integrating Hadoop
The Elephant in the Library - Integrating Hadoop
 
Wmf wikimedia conference japan feb 3 en pdf
Wmf wikimedia conference japan feb 3 en pdfWmf wikimedia conference japan feb 3 en pdf
Wmf wikimedia conference japan feb 3 en pdf
 
From Hyperlinks to Semantic Web Properties using Open Knowledge Extraction
From Hyperlinks to Semantic Web Properties using Open Knowledge ExtractionFrom Hyperlinks to Semantic Web Properties using Open Knowledge Extraction
From Hyperlinks to Semantic Web Properties using Open Knowledge Extraction
 
Wikidata: A New Way to Disseminate Structured Data
Wikidata: A New Way to Disseminate Structured DataWikidata: A New Way to Disseminate Structured Data
Wikidata: A New Way to Disseminate Structured Data
 
Open Culture - How Wiki loves art and data - Romaine
Open Culture - How Wiki loves art and data - RomaineOpen Culture - How Wiki loves art and data - Romaine
Open Culture - How Wiki loves art and data - Romaine
 
Open Access and Wikipedia : Taking accessible research to the global public"
Open Access and  Wikipedia : Taking accessible research to the global public"Open Access and  Wikipedia : Taking accessible research to the global public"
Open Access and Wikipedia : Taking accessible research to the global public"
 
Opportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocurationOpportunities and challenges presented by Wikidata in the context of biocuration
Opportunities and challenges presented by Wikidata in the context of biocuration
 
Beyond the Encylcopedia: The Frontiers of Free Knowledge
Beyond the Encylcopedia: The Frontiers of Free KnowledgeBeyond the Encylcopedia: The Frontiers of Free Knowledge
Beyond the Encylcopedia: The Frontiers of Free Knowledge
 
The Elephant in the Library
The Elephant in the LibraryThe Elephant in the Library
The Elephant in the Library
 
Wikidata Introductory Workshop
Wikidata Introductory WorkshopWikidata Introductory Workshop
Wikidata Introductory Workshop
 
Mphil Computational Biology Seminar Series Presentation (20201111)
Mphil Computational Biology Seminar Series Presentation (20201111)Mphil Computational Biology Seminar Series Presentation (20201111)
Mphil Computational Biology Seminar Series Presentation (20201111)
 
Big dataorig
Big dataorigBig dataorig
Big dataorig
 

Recently uploaded

Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendArshad QA
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number SystemsJheuzeDellosa
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software DevelopersVinodh Ram
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsAndolasoft Inc
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackVICTOR MAESTRE RAMIREZ
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 

Recently uploaded (20)

Test Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and BackendTest Automation Strategy for Frontend and Backend
Test Automation Strategy for Frontend and Backend
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
What is Binary Language? Computer Number Systems
What is Binary Language?  Computer Number SystemsWhat is Binary Language?  Computer Number Systems
What is Binary Language? Computer Number Systems
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...Call Girls In Mukherjee Nagar 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
Call Girls In Mukherjee Nagar 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SE...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Professional Resume Template for Software Developers
Professional Resume Template for Software DevelopersProfessional Resume Template for Software Developers
Professional Resume Template for Software Developers
 
How To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.jsHow To Use Server-Side Rendering with Nuxt.js
How To Use Server-Side Rendering with Nuxt.js
 
Cloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStackCloud Management Software Platforms: OpenStack
Cloud Management Software Platforms: OpenStack
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 

Häskell und Grepl: Data Hacking Wikimedia Projects Exampled With Open Access Signalling Project

  • 1. Data-Hacking with Wikimedia Projects: Learn by Example, Including Wikipedia, Wikidata, and beyond! @notconfusing (Max Klein) @wrought (Matt Senate)
  • 2. Who is in the room? ● Data hackers? ● Programmers? ● Artists/designers? ● Open Access folks? ● Academics? ● Wikipedians? ● Wikimedians? ● Wiki-people?
  • 4. Wiki Context ● Wikipedia: far-largest in size and user base. ● Projects often organized by language. ● Each language-project has an independent user community.
  • 5. Wiki Context ● See Wikimedia projects as a form of: “curated database” ● Web's Least Common Denominator for data. ● Wiki Paradox: – Low-barrier-to-entry – High-barrier-to-entry Buneman, et al. Curated Databases. https://peerlibrary.org/p/rxQ6WBd89XviMF4Tk
  • 6. How do Wikimedia projects work? ● Community ● Opt-in ● Reputation ● Cultural Protocol ● Bureaucracy ● Adhocracy ● Coordination – WikiProjects
  • 7. History of Wiki Data- Hacking ● Rambot – First “bot” – 2002 ● US Census Data ● Created 200,000 articles ● More than doubled Wikipedia's size at the time ● No permissions
  • 8. “Ignore All Rules” --> Calcification ● To get things done: – Need to issue a “Request for Comment” or RFC ● Everybody, regardless of expertise has trouble with this. – Sometimes, not everyone is acting in good faith, but try to assume it, it will help.
  • 9. Häskell und Grepl ● Hänsel und Gretel ● Wikidata https://commons.wikimedia.org/wiki/File:Hansel_and_Gretel.jpg
  • 10. ● Rural Hunger Problems ● The Wikimedia datascape having cross-language data sharing problems.
  • 11. ● Häskell and Grepl are sent to the Forest alone. ● Data is sent to live in Templates, living alone.
  • 12. ● Häskell and Grepl first invent a successful breadcrumb system ● Wikimedia commons allows images to be shared across Wikis.
  • 13. ● Lucky if their breadcrumbs are not eaten / or whatever put. ● Lucky if we knew on which pages the Data was stored. http://commons.wikimedia.org/wiki/File:Flickr_-_Per_Ola_Wiberg_~_mostly_away_-_-_ %22No_bread,_just_a_camera...huh,_quack_%22.jpg
  • 14. ● A magical gingerbread house ● A magical data store – Called “Wikidata” – Interwiki data sharing – Plus with extra sweeties http://commons.wikimedia.org/wiki/Gingerbread_house#mediaviewer/File:Pe pparkakshus.JPG
  • 15. ● And the house includes many different sweeties. ● Semantic Triples ● Qualifiers ● Ranks http://commons.wikimedia.org/wiki/Category:Liquorice_candy#mediaviewer/File:Flickr_- _cyclonebill_-_Slik_%281%29.jpg
  • 16. ● Häskell and Grepl eat the roof hungrily. ● In this story, the User's start adding to Wikidata. – Importing Wikipedia – Foreign Database – Manual adds.
  • 17. ● The evil witch trap ● The evil data witches, normally keep the information as a silo.
  • 18. ● Grepl's cunning defeat of the witch ● Identifers – Think of this as a foreign database key (Brian Jacobs) – Max imported 400,000 biographical identifers. – This started an Identiifer craze on Wikidata ● Tennis Player ● Swiss Parliament ● Danish Companies
  • 19. ● Grepl's cunning defeat of the witch ● All Wikis can transclude arbitrary data (eventually). ● And Citations can be represented as semantic properties. See how sources are handled with “FRBR” format: http://www.wikidata.org/wiki/Help:Sources#Scientific.2C_newspaper_or_magazine_article
  • 20. Signalling Open Access ● WikiProject Open Access – On English Wikipedia ● (Data) Problem: Signalling “Open Access” is hard! ● Solution: Use clear signals directly to the relevant data. – Copyright license – Source content – Metadata
  • 21. What?
  • 22. How? ● Text WikiSource→ ● Media Commons→ ● Metadata Wikidata→ ● Signals Wikipedia→ – Including license! ● Public domain, CC0, CC- BY, CC-BY-SA, etc ● RFC RecitationBot→ ● RFC RecitationBot→ ● RFC RecitationBot→ ● RFC RecitationBot→
  • 23. In the Wikimedia Universe ● Where does “Signalling Open Access” fit in the Wikimedia narrative? ● (Data) Problem: Managing citations & references is hard! ● Possible Solutions: – Templates (Many) – Categories (Many) – Namespace (FR) – WikiScholar (Dead) – VisualEditor (Zotero?)
  • 24. “Signalling OA” Opportunities ● Take pass at citation management – Quality & experience ● Integrate metadata with Wikidata – Where it belongs – Sensitivity and rigor ● Forge deep knowledge resources on Wikipedia – Snapshots of sources, deeper linking ● Automate, but use human judgment – Save time and energy, improve accuracy.
  • 25. Paths for Data Hackers ● Reputation – Create a user account – Contribute in good faith to wikimedia projects ● Cultural protocol – Identify the scope, concerns, and nature of a given project – Learn to navigate ● History and Context – Form a narrative for your hack based on past endeavors – Seek consent and build consensus ● Community – Reach out, on IRC and mailing lists!
  • 26. Of course we're Open Source github.com/wpoa/OA-signalling github.com/wpoa/recitation-bot Thanks! @notconfusing (Max) @wrought (Matt)