Getting Semantics from the Crowd

Gianluca Demartini
eXascale Infolab, University of Fribourg
Switzerland
  
Semantic Web 2.0

•  Not the Web 3.0
•  Getting semantics from (non-expert) people
   –  From few publishers and many consumers (SW 1.0)
   –  To many publishers and many consumers (SW 2.0)

27-Apr-12   Gianluca Demartini, eXascale Infolab
  
read/write SW

•  Wikidata: http://meta.wikimedia.org/wiki/Wikidata
•  Semantics is about the meaning
•  Get people in the loop!
•  Social computing for SemWeb applications
  
Crowdsourcing

•  Exploit human intelligence to solve
   –  Tasks simple for humans, complex for machines
   –  With a large number of humans (the Crowd)
   –  Small problems: micro-tasks (Amazon MTurk)
•  Examples
   –  Wikipedia, Flickr
•  Incentives
   –  Financial, fun, visibility
  
Crowdsourcing

•  Success Stories
   –  Training set for ML
   –  Image tagging
   –  Document annotation/translation
   –  IR evaluation [Blanco et al. SIGIR 2011]
   –  CrowdDB [Franklin et al. SIGMOD 2011]
  
Crowd-powered SW apps

•  Entity Linking [ZenCrowd at WWW12]
•  Create/validate sameAs links
•  Schema matching
•  ... Add your own favorite application!
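
As a concrete illustration of the sameAs links mentioned above, the sketch below serializes one such link as an N-Triples statement. The two entity URIs and the helper name `same_as_triple` are made-up placeholders; only the owl:sameAs predicate URI is the standard one.

```python
# Sketch: what a crowd-validated sameAs link could look like as an RDF triple.
# The entity URIs are hypothetical placeholders, not taken from the slides.

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # standard OWL predicate


def same_as_triple(subject_uri: str, object_uri: str) -> str:
    """Serialize one owl:sameAs statement as an N-Triples line."""
    return f"<{subject_uri}> <{OWL_SAME_AS}> <{object_uri}> ."


link = same_as_triple(
    "http://example.org/datasetA/Fribourg",
    "http://example.org/datasetB/city/42",
)
print(link)
```

Creating a link means emitting such a statement; validating one means asking whether the two URIs really denote the same real-world entity.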
  
[Figure labels: HTML+RDFa Pages, LOD Cloud]

ZenCrowd

•  Combine both algorithmic and manual linking
•  Automate manual linking via crowdsourcing
•  Dynamically assess human workers with a probabilistic reasoning framework

[Figure labels: Crowd, Algorithms, Machines]
  
ZenCrowd Architecture

[Architecture diagram. Input: HTML Pages. Output: HTML+RDFa Pages. ZenCrowd components: Entity Extractors, LOD Index (Get Entity), Algorithmic Matchers, Decision Engine, Micro-Task Manager, Probabilistic Network. External components: LOD Open Data Cloud, Crowdsourcing Platform (Micro Matching Tasks, Workers' Decisions).]
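
One possible reading of the architecture is as a routing step: algorithmic matchers score candidate links, and a decision engine chooses which candidates need a micro-task. The sketch below is an assumption about that flow, not the ZenCrowd implementation; the `Candidate` type, the `route` function and both thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    entity: str   # entity mention extracted from an HTML page
    uri: str      # candidate URI looked up in the LOD index
    score: float  # confidence from an algorithmic matcher, in [0, 1]


def route(candidates: List[Candidate],
          accept_at: float = 0.9,
          reject_below: float = 0.2) -> Dict[str, List[Candidate]]:
    """Split candidates into auto-accepted, crowd-bound and discarded.

    Both thresholds are illustrative assumptions, not values from the slides.
    """
    routed: Dict[str, List[Candidate]] = {"accept": [], "crowd": [], "discard": []}
    for c in candidates:
        if c.score >= accept_at:
            routed["accept"].append(c)   # confident enough without humans
        elif c.score < reject_below:
            routed["discard"].append(c)  # too unlikely to be worth a task
        else:
            routed["crowd"].append(c)    # uncertain: create a micro-task
    return routed


candidates = [
    Candidate("Fribourg", "http://example.org/city/Fribourg", 0.95),
    Candidate("Fribourg", "http://example.org/canton/Fribourg", 0.55),
    Candidate("Fribourg", "http://example.org/band/Fribourg", 0.05),
]
routed = route(candidates)
print({bucket: [c.uri for c in items] for bucket, items in routed.items()})
```

Sending only the uncertain middle band to the crowd keeps the number of paid micro-tasks small, which is the usual motivation for combining algorithmic and manual linking.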
  
The micro-task
  
Entity Factor Graphs

•  Graph components
   –  Workers, links, clicks
   –  Prior probabilities
   –  Link factors
   –  Constraints
•  Probabilistic inference
   –  Select all links with posterior prob > τ

[Figure: example factor graph with 2 workers (w1, w2), 6 clicks (c11, c12, c13, c21, c22, c23) and 3 candidate links (l1, l2, l3). Worker priors: pw1(), pw2(). Link priors: pl1(), pl2(), pl3(). Link factors: lf1(), lf2(), lf3(). Observed variables: the clicks. SameAs constraint: sa1-2(). Dataset unicity constraint: u2-3().]
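
The selection rule "posterior prob > τ" can be illustrated with a much simpler model than the factor graph on the slide. The sketch below treats each link independently and assumes each worker has a fixed reliability (the probability that a click is correct); it ignores the SameAs and unicity constraints entirely. All numbers, the value of τ and the function name `link_posterior` are illustrative assumptions.

```python
from typing import Dict, List, Tuple


def link_posterior(link_prior: float,
                   clicks: List[Tuple[float, bool]]) -> float:
    """Posterior that a link is correct, given independent worker clicks.

    Each click is (worker_reliability, says_link_is_valid).
    Constraints between links are not modelled in this sketch.
    """
    p_true, p_false = link_prior, 1.0 - link_prior
    for reliability, says_valid in clicks:
        if says_valid:
            p_true *= reliability
            p_false *= 1.0 - reliability
        else:
            p_true *= 1.0 - reliability
            p_false *= reliability
    total = p_true + p_false
    # Fall back to the prior if the evidence is contradictory with certainty.
    return p_true / total if total > 0 else link_prior


# Same shape as the slide's example: 2 workers, 3 candidate links, 6 clicks.
w1, w2 = 0.8, 0.6  # assumed worker reliabilities
clicks_per_link: Dict[str, List[Tuple[float, bool]]] = {
    "l1": [(w1, True), (w2, True)],
    "l2": [(w1, True), (w2, False)],
    "l3": [(w1, False), (w2, False)],
}
tau = 0.8  # assumed threshold
posteriors = {l: link_posterior(0.5, c) for l, c in clicks_per_link.items()}
selected = [l for l, p in posteriors.items() if p > tau]
print(posteriors, selected)
```

With these assumed values only the link both workers agree on passes the threshold; the disagreement on l2 leaves its posterior below τ even though the more reliable worker voted for it.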
  
ZenCrowd: Lessons Learnt

•  Crowdsourcing + prob reasoning works!
•  But
   –  Different worker communities perform differently
   –  No differences w/ different contexts
   –  Completion time may vary (based on reward)
   –  Many low quality workers + spam
  
ZenCrowd

•  Worker Selection

[Plot: Worker Precision (0 to 1) against Number of Tasks (0 to 500). Series: US Workers, IN Workers. Annotation: Top US Worker.]
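
A minimal sketch of precision-based worker selection follows, assuming each worker's answers can be compared against tasks with known answers. Precision here means the fraction of a worker's answers that were correct; the `min_tasks` and `min_precision` cut-offs and the worker IDs are hypothetical, not values from the talk.

```python
from typing import Dict, List, Tuple

Answer = Tuple[bool, bool]  # (worker_answer, correct_answer)


def worker_precision(answers: List[Answer]) -> float:
    """Fraction of a worker's answers that match the known correct answer."""
    if not answers:
        return 0.0
    correct = sum(1 for given, truth in answers if given == truth)
    return correct / len(answers)


def select_workers(history: Dict[str, List[Answer]],
                   min_tasks: int = 3,
                   min_precision: float = 0.7) -> List[str]:
    """Keep workers with enough completed tasks and high enough precision."""
    return sorted(
        worker
        for worker, answers in history.items()
        if len(answers) >= min_tasks
        and worker_precision(answers) >= min_precision
    )


history = {
    "w_a": [(True, True), (False, False), (True, True), (True, False)],
    "w_b": [(True, False), (True, False), (True, True), (True, True)],
    "w_c": [(True, True)],  # perfect so far, but too few tasks to judge
}
print(select_workers(history))
```

Requiring a minimum number of tasks avoids trusting a worker whose precision is high only because very few answers have been observed.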
  
Challenges for Crowd-SW

•  How to design the micro-task
•  Where to find the crowd
   –  MTurk, Facebook (900M users)
•  Evaluation
   –  Which ground truth?!
•  Quality control / Spam
   –  Need for spam benchmarks in Crowdsourcing [Mechanical Cheat at CrowdSearch 2012]
  