Query Optimization
over Crowdsourced Data
Hyunjung Park, Jennifer Widom
Stanford University
Deco: Declarative Crowdsourcing
•  Give me a Spanish-speaking country.
•  Give me a country.
•  What language do they speak in country X?
•  What is the capital of country X?
8/27/2013 Hyunjung Park 2
“Find the capitals of eight Spanish-speaking countries”
[Figure: the Deco System answers the request by asking the crowd the questions above and storing the answers in a DBMS table]
country    language    capital
Italy      Italian     Rome
Spain      Spanish     Madrid
…          …           …
Deco Query Optimization
•  Crowd incurs monetary cost
•  Some query plans are much cheaper than others
•  Cost estimation is complicated by:
–  Previously collected data
–  Unknown database state
–  Inconsistency of human answers
Outline
•  Motivating example
•  Deco data model and queries
•  Cost and cardinality estimation
•  Experimental results
Everything implemented in full prototype
Motivating Example: Plan 1
[Figure: Plan 1 for “Find the capitals of eight Spanish-speaking countries”, repeated until 8 answers are found (8x)]
1.  Give me a country. → only unseen countries continue (T/F)
2.  What language do they speak in country X? → only Spanish-speaking countries continue (T/F)
3.  What is the capital of country X?
Motivating Example: Plan 2
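[Figure: Plan 2 for “Find the capitals of eight Spanish-speaking countries”, repeated until 8 answers are found (8x)]
1.  Give me a Spanish-speaking country. → only unseen countries continue (T/F)
2.  What language do they speak in country X? → only Spanish-speaking countries continue (T/F)
3.  What is the capital of country X?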
Preview of Experimental Results
[Figure: actual costs ($) spent on Mechanical Turk for Plan 1 and Plan 2 (scale 0 to 15), broken down by question: “Give me a country.”, “Give me a Spanish-speaking country.”, “What language do they speak in country X?”, “What is the capital of country X?”]
Outline
•  Motivating example
•  Deco data model and queries
•  Cost and cardinality estimation
•  Experimental results
Deco: Data Model (1/2)
•  Conceptual Relation: visible to end-users
Country (country, language, capital)
•  Resolution Rules: cleanse raw data using UDFs
country: dupElim
language: majority(3)
capital: majority(3)
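The slide names the resolution functions but not their exact behavior. Below is a minimal Python sketch under stated assumptions: `dup_elim` keeps the first occurrence of each raw value, and `majority(k)` returns a value as soon as it holds a strict majority of k answers (2 of 3), returning `None` while more crowd answers are still needed. Both names are hypothetical stand-ins for Deco's UDFs, not its actual code.

```python
from collections import Counter
from typing import Hashable, Iterable, List, Optional


def dup_elim(values: Iterable[Hashable]) -> List[Hashable]:
    """Drop repeated raw values, keeping the first occurrence of each."""
    seen = set()
    resolved = []
    for v in values:
        if v not in seen:
            seen.add(v)
            resolved.append(v)
    return resolved


def majority(answers: Iterable[Hashable], k: int = 3) -> Optional[Hashable]:
    """Return the first answer to reach a strict majority of k votes.

    Assumption: majority(3) resolves as soon as some value has been given
    by 2 of 3 workers; None means more crowd answers are still needed.
    """
    needed = k // 2 + 1
    counts: Counter = Counter()
    for a in answers:
        counts[a] += 1
        if counts[a] >= needed:
            return a
    return None


print(dup_elim(["Spain", "Peru", "Spain", "Chile"]))  # ['Spain', 'Peru', 'Chile']
print(majority(["Lima", "Quito", "Lima"]))            # Lima
```

Under this reading a value needs either 2 or 3 answers to resolve, which is at least consistent with a majority(3) selectivity of about 1/2.5.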
Deco: Data Model (2/2)
•  Fetch Rules: “access methods” for the crowd
language => country   [$0.05]
“Give me a {language}-speaking country.”
Ø => country   [$0.01]
“Give me a country.”
country => language   [$0.02]
“What language do they speak in {country}?”
country => capital   [$0.03]
“What is the capital of {country}?”
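A fetch rule pairs an input attribute set with an output attribute, a question template, and a price. The sketch below models that as a small Python dataclass holding the four rules from this slide, with prices matched to rules in the order they are listed. `FetchRule`, `question`, and `FETCH_RULES` are hypothetical names, and treating the price as dollars per crowd answer is an assumption.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Mapping, Optional


@dataclass(frozen=True)
class FetchRule:
    """A crowd "access method": given values for `lhs`, ask for `rhs`."""
    lhs: Optional[str]  # None stands for Ø (the question needs no input)
    rhs: str
    template: str       # question text with {attribute} placeholders
    price: Decimal      # assumed: dollars paid per crowd answer

    def question(self, binding: Mapping[str, str]) -> str:
        """Fill the template with concrete input values."""
        return self.template.format(**binding)


FETCH_RULES = [
    FetchRule("language", "country", "Give me a {language}-speaking country.", Decimal("0.05")),
    FetchRule(None, "country", "Give me a country.", Decimal("0.01")),
    FetchRule("country", "language", "What language do they speak in {country}?", Decimal("0.02")),
    FetchRule("country", "capital", "What is the capital of {country}?", Decimal("0.03")),
]

print(FETCH_RULES[0].question({"language": "Spanish"}))  # Give me a Spanish-speaking country.
```

Keeping the template separate from the input values lets one rule generate any number of concrete crowd tasks.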
Deco: Queries
•  Deco query: SQL query over conceptual relations
SELECT country, capital
FROM Country
WHERE language=‘Spanish’
MINTUPLES 8
•  Query processor (query optimizer + query execution engine): accesses the crowd as needed to produce the query result while:
1.  Minimizing monetary cost
2.  Reducing latency
Query Optimization
•  Find the best query plan in terms of estimated
monetary cost
•  As in a traditional query optimizer:
1.  Cost and cardinality estimation
2.  Search space
3.  Plan enumeration algorithm
Cost Estimation
•  Total monetary cost = \( \sum_{\text{Fetch } F} F.\mathit{price} \times F.\mathit{cardinality} \)
–  Existing data is “free”
•  Definition of cardinality in Deco
–  Expected total number of output tuples from an operator until query execution terminates
•  Cardinality estimation
–  The final database state must be estimated simultaneously
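The cost formula above translates directly into code. This is a minimal Python sketch, not Deco's implementation: `total_cost` is a hypothetical name, cardinalities are passed in as already-estimated numbers of new crowd answers, and "existing data is free" is modeled simply by not counting it. `Decimal` is used so dollar amounts add exactly.

```python
from decimal import Decimal
from typing import Mapping


def total_cost(price: Mapping[str, Decimal], cardinality: Mapping[str, int]) -> Decimal:
    """Sum of F.price * F.cardinality over all Fetch operators F.

    cardinality[F] is the expected number of answers F will obtain from the
    crowd before execution terminates. Data already in the database costs
    nothing, so it is simply not counted in cardinality.
    """
    return sum((price[f] * n for f, n in cardinality.items()), Decimal(0))


# Hypothetical inputs, for illustration only.
prices = {"Ø => country": Decimal("0.01"), "country => language": Decimal("0.02")}
cards = {"Ø => country": 10, "country => language": 5}
print(total_cost(prices, cards))  # 0.20
```

The hard part is not this sum but estimating each `cardinality[F]`, which depends on selectivities and on the database state at termination.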
Cardinality Estimation: Setting
•  $0.05 for all fetch rules
•  No existing data
•  Selectivity factors
–  language=‘Spanish’: 0.1
–  dupElim: 0.8
–  majority(3): 0.4 (=1/2.5)
Cardinality Estimation: Plan 1
SELECT country, capital
FROM Country
WHERE language=‘Spanish’
MINTUPLES 8
[Figure: Plan 1 query plan with operators MinTuples[8], Project[co,ca], DLOJoin[co] (×2), Resolve[dupeli], Resolve[maj3] (×2), Filter[la='Spanish'], Scan[CtryA], Scan[CtryD1], Scan[CtryD2], Fetch[Ø→co], Fetch[co→la], Fetch[co→ca]]
Estimated fetches:
–  Ø => country: 100
–  country => language: 200
–  country => capital: 20
Cost estimation: $0.05 × (100 + 200 + 20) = $16.00
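The Plan 1 numbers can be reproduced from the selectivity factors by working backwards from MINTUPLES 8. The Python sketch below is one back-of-the-envelope reading that matches the slide's figures; it is not the paper's estimation algorithm, which also tracks the evolving database state. `Fraction` keeps the arithmetic exact, and the variable names are hypothetical.

```python
from fractions import Fraction

MIN_TUPLES = 8                 # MINTUPLES 8
PRICE = Fraction(5, 100)       # $0.05 per fetch for every rule in this setting
SEL_SPANISH = Fraction(1, 10)  # language='Spanish'
SEL_DUPELIM = Fraction(8, 10)  # dupElim
SEL_MAJ3 = Fraction(4, 10)     # majority(3) = 1/2.5

# Each of the 8 output tuples needs a resolved capital (2.5 answers each).
capital_fetches = MIN_TUPLES / SEL_MAJ3           # 20
# Only 1 in 10 countries passes the filter, so 80 languages must be resolved.
countries_needed = MIN_TUPLES / SEL_SPANISH       # 80
language_fetches = countries_needed / SEL_MAJ3    # 200
# dupElim keeps 80% of "Give me a country." answers as new countries.
country_fetches = countries_needed / SEL_DUPELIM  # 100

total = PRICE * (country_fetches + language_fetches + capital_fetches)
print(f"${float(total):.2f}")  # $16.00
```

The expensive term is `country => language`: because the filter is applied only after the crowd has answered, Plan 1 pays to resolve the language of roughly ten times as many countries as it returns.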
Cardinality Estimation: Plan 2
[Figure: Plan 2 query plan, the same as Plan 1 except that Fetch[Ø→co] is replaced by Fetch[la→co]; operators MinTuples[8], Project[co,ca], DLOJoin[co] (×2), Resolve[dupeli], Resolve[maj3] (×2), Filter[la='Spanish'], Scan[CtryA], Scan[CtryD1], Scan[CtryD2], Fetch[la→co], Fetch[co→la], Fetch[co→ca]]
SELECT country, capital
FROM Country
WHERE language=‘Spanish’
MINTUPLES 8
Estimated fetches:
–  language => country: 10
–  country => language: 20
–  country => capital: 20
Cost estimation: $0.05 × (10 + 20 + 20) = $2.50
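Both plans fit one simple estimator that differs only in which rule supplies countries and what fraction of them pass the filter. The Python sketch below assumes countries returned by `language => country` all pass (though their language is still verified through `country => language`), reproduces both slides' costs, and then picks the cheaper plan. `estimate_plan`, `cost`, and `plans` are hypothetical names; comparing two hand-built plans is only a toy stand-in for Deco's plan enumeration.

```python
from fractions import Fraction
from typing import Dict

MIN_TUPLES = 8
PRICE = Fraction(5, 100)       # $0.05 per fetch for every rule
SEL_DUPELIM = Fraction(8, 10)  # dupElim
SEL_MAJ3 = Fraction(4, 10)     # majority(3)


def estimate_plan(first_rule: str, pass_rate: Fraction) -> Dict[str, Fraction]:
    """Estimated fetches per rule for a plan that obtains countries via
    `first_rule`, of which a fraction `pass_rate` is Spanish-speaking."""
    countries = MIN_TUPLES / pass_rate  # countries whose language is resolved
    return {
        first_rule: countries / SEL_DUPELIM,
        "country => language": countries / SEL_MAJ3,
        "country => capital": MIN_TUPLES / SEL_MAJ3,
    }


def cost(fetches: Dict[str, Fraction]) -> Fraction:
    return sum(PRICE * n for n in fetches.values())


plans = {
    "Plan 1": estimate_plan("Ø => country", Fraction(1, 10)),
    # Assumption: every country returned by language => country passes the filter.
    "Plan 2": estimate_plan("language => country", Fraction(1)),
}
best = min(plans, key=lambda name: cost(plans[name]))

for name, fetches in plans.items():
    print(name, {r: int(n) for r, n in fetches.items()}, f"${float(cost(fetches)):.2f}")
print("cheapest:", best)  # cheapest: Plan 2
```

Moving the language condition into the fetch rule removes the 1-in-10 filter loss, which is where most of Plan 1's cost comes from.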
Experimental Results
[Figure: actual cost ($) for Plan 1 (scale 0 to 15) and Plan 2 (scale 0 to 3), broken down by fetch rule: Ø => country, language => country, country => language, country => capital]
Experimental Results
[Figure: actual vs. estimated cost ($) for Plan 1 (scale 0 to 15) and Plan 2 (scale 0 to 3), broken down by fetch rule: Ø => country, language => country, country => language, country => capital]
Related Work
•  Declarative approach for crowdsourcing
–  Arnold, CrowdDB, CrowdSearcher, Jabberwocky, Qurk, ...
•  Crowd-powered algorithms/operations
–  Filter, sort, join, max, entity resolution, …
•  Also:
–  Traditional query optimization
–  Heterogeneous or federated database systems
Summary
•  Cost estimation in Deco
–  Distinguish between existing data and new data
–  Estimate cardinality and final database state
simultaneously
•  In the paper:
–  Full description of cost estimation and plan
enumeration algorithms
–  More experimental results
Thank you!
