1
On Leveraging Crowdsourcing Techniques for Schema Matching Networks
Nguyen Quoc Viet Hung, Nguyen Thanh Tam, Karl Aberer 
École Polytechnique Fédérale de Lausanne, Switzerland 
Zoltán Miklós 
Université de Rennes 1, IRISA, France 
DASFAA 2013, Part II, LNCS 7826, pp. 139 – 154, 2013
2 
Database schema matching is an active research field: 
Surveys: [1], [2] 
Applications: data transformation, data migration, data alignment, … 
Automatic Matching Tools: COMA++, AMC, OpenII, Falcon, … 
Schema matching is the task of establishing correspondences that connect related 
attributes in two (independently developed) database schemas. 
Example: two schemas SA and SB with attributes BirthName, BirthDate, and Address; BirthName and Address appear in both schemas.
[1] Rahm, E. et al. “A Survey of Approaches to Automatic Schema Matching”. The VLDB Journal, 2001
[2] Bernstein, P.A. et al. “Generic Schema Matching, Ten Years Later”. PVLDB, 2011
3 
Automatic schema matchers will 
(sometimes) fail to identify the correct 
correspondences 
There is a need for post-matching reconciliation through human input.
This human effort is the “real cost” for the company.
Schemas do not appear alone; they are part of a matching network.
The network‐level consistency constraints 
are very important for business users
4 
Real‐world scenario: a repository of schemas in the same domain 
Schema matching network: connect schemas by pair‐wise matchings 
Network‐level consistency constraints 
Automatic tools produce incorrect correspondences → these need validation by humans
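A minimal sketch of a schema matching network, assuming each attribute is represented as a (schema, attribute) pair and each correspondence as an unordered pair of attributes from two different schemas. The helpers build_network, closed_circles, and one_to_one_conflicts are hypothetical names for illustration. They only locate where the two network-level constraints discussed later could apply (closed circles across three schemas, and one attribute matched to several attributes of the same schema); they do not decide which correspondences are correct.

```python
from collections import defaultdict
from itertools import combinations

# Assumed representation: an attribute is (schema, attribute_name).
Attribute = tuple[str, str]

def build_network(matchings: dict[tuple[str, str], list[tuple[str, str]]]) -> set[frozenset]:
    """Flatten pairwise matchings {(S1, S2): [(attr1, attr2), ...]} into one set of correspondences."""
    network = set()
    for (s1, s2), pairs in matchings.items():
        for a1, a2 in pairs:
            network.add(frozenset({(s1, a1), (s2, a2)}))
    return network

def closed_circles(network: set[frozenset]) -> list[tuple[Attribute, Attribute, Attribute]]:
    """Attribute triples from three distinct schemas that are pairwise connected (brute force, O(n^3))."""
    attributes = sorted({attr for corr in network for attr in corr})
    circles = []
    for x, y, z in combinations(attributes, 3):
        if len({x[0], y[0], z[0]}) < 3:
            continue  # a circle must pass through three different schemas
        if all(frozenset(p) in network for p in ((x, y), (y, z), (x, z))):
            circles.append((x, y, z))
    return circles

def one_to_one_conflicts(network: set[frozenset]) -> dict:
    """Attributes that have several correspondences into the same other schema."""
    targets = defaultdict(set)
    for corr in network:
        x, y = sorted(corr)
        targets[(x, y[0])].add(y)  # x matched into y's schema
        targets[(y, x[0])].add(x)  # y matched into x's schema
    return {key: vals for key, vals in targets.items() if len(vals) > 1}

# Toy network: a-b-c forms a circle, and a is matched to both c and d in S3.
matchings = {
    ("S1", "S2"): [("a", "b")],
    ("S2", "S3"): [("b", "c")],
    ("S1", "S3"): [("a", "c"), ("a", "d")],
}
network = build_network(matchings)
print(closed_circles(network))
print(one_to_one_conflicts(network))
```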
7 
DASFAA’2013, BDA’2013: On Leveraging 
Crowdsourcing Techniques for Schema 
Matching Networks 
ER’2013: Minimizing Human Effort in 
Reconciling Match Networks 
CoopIS’2013: Collaborative Schema Matching Reconciliation
ICDE’2014: Pay‐as‐you‐go Reconciliation in 
Schema Matching Networks
“Crowdsourcing is the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers.” (Wikipedia)
Our context: employ many workers (users) to validate the same correspondences and combine their answers.
Surveys: [1], [2] 
A wide range of applications (e.g. CrowdSearch) have been developed on top of 
more than 70 crowdsourcing platforms (e.g. Amazon Mechanical Turk). 
8 
Our contribution: 
Define network-level constraints in schema matching networks
Design questions for workers to validate correspondences
Leverage network-level constraints to reduce user effort
[1] Law, E. et al. “Human Computation”. Morgan & Claypool Publishers, 2011
[2] Doan, A. et al. “Crowdsourcing Systems on the World Wide Web”. CACM, 2011
11 
Three elements of questions: 
Object asked about: a correspondence
Possible choices: a simple YES/NO question
Support information: alternatives, constraint satisfactions, constraint violations
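The three elements of a question can be held in one small data structure. This is a sketch under assumptions: the Question class, its field names, and the plain-text render format are hypothetical, not the interface used in the paper.

```python
from dataclasses import dataclass, field

@dataclass
class Question:
    """Hypothetical container for the three elements of a validation question."""
    correspondence: tuple[str, str]                  # object asked about
    choices: tuple[str, ...] = ("YES", "NO")         # possible choices
    # Support information shown to the worker:
    alternatives: list[tuple[str, str]] = field(default_factory=list)
    satisfied_constraints: list[str] = field(default_factory=list)
    violated_constraints: list[str] = field(default_factory=list)

    def render(self) -> str:
        """Format the question as plain text, listing only the support information present."""
        a, b = self.correspondence
        lines = [f"Does '{a}' correspond to '{b}'? ({'/'.join(self.choices)})"]
        if self.alternatives:
            lines.append("Alternatives: " + ", ".join(f"{x} - {y}" for x, y in self.alternatives))
        if self.satisfied_constraints:
            lines.append("Satisfies: " + ", ".join(self.satisfied_constraints))
        if self.violated_constraints:
            lines.append("Violates: " + ", ".join(self.violated_constraints))
        return "\n".join(lines)

q = Question(("S.a", "T.b1"), alternatives=[("S.a", "T.b2")])
print(q.render())
```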
12 
User feedback:
User  Question  Answer
U1    C         Yes
U2    C         Yes
U3    C         No
User quality:
User  Reliability
U1    r1
U2    r2
U3    r3
where r1 = Pr(C = true | U1 = yes) = Pr(C = false | U1 = no).
Answer aggregation: a probabilistic model (*) computes Pr(C), from which we derive <a, e>, the aggregation and its error rate:
Corr  Aggregation  Error Rate
C     True         0.19
(*) Majority Voting, Expectation Maximization, … See full paper for details.
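One possible probabilistic model, sketched under explicit assumptions: workers answer independently, the prior on C is uniform, and each worker's reliability r is known and symmetric (r = Pr(C = true | yes) = Pr(C = false | no)). The function name aggregate is hypothetical, and this is plain Bayesian combination, not necessarily the model used in the full paper. The error rate is taken as 1 − Pr(chosen value).

```python
def aggregate(answers: list[tuple[bool, float]]) -> tuple[bool, float, float]:
    """Combine independent YES/NO answers about one correspondence C.

    answers: list of (said_yes, r), with r = Pr(C=true | yes) = Pr(C=false | no).
    Returns (aggregation, error_rate, pr_true), assuming a uniform prior on C.
    """
    like_true = like_false = 1.0  # uniform prior cancels in the normalization
    for said_yes, r in answers:
        # A worker with reliability r gives the correct answer with probability r.
        like_true *= r if said_yes else 1.0 - r
        like_false *= (1.0 - r) if said_yes else r
    pr_true = like_true / (like_true + like_false)
    decision = pr_true >= 0.5
    error_rate = 1.0 - max(pr_true, 1.0 - pr_true)
    return decision, error_rate, pr_true

# U1 and U2 answer yes, U3 answers no; all three assumed to have r = 0.6.
print(aggregate([(True, 0.6), (True, 0.6), (False, 0.6)]))
```

With equal reliabilities this reduces to majority voting; unequal reliabilities let a trusted worker outweigh several unreliable ones.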
13
To achieve higher accuracy, we need more answers → cost-accuracy tradeoff
Figure (worker reliability r = 0.6), with the target marked “Goal”
Solution: leverage constraints to reduce the error rate
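To see why more answers buy accuracy only slowly, assume majority voting (one of the aggregation options listed above) over an odd number n of independent workers, each correct with probability r = 0.6. The majority is wrong exactly when more than n // 2 workers are wrong, which gives a binomial tail. The function name majority_error is hypothetical.

```python
from math import comb

def majority_error(n: int, r: float) -> float:
    """Probability that a majority of n independent workers (n odd), each correct with prob. r, is wrong."""
    if n % 2 == 0:
        raise ValueError("use an odd number of workers to avoid ties")
    # Sum over the numbers of wrong answers k that outvote the correct ones.
    return sum(comb(n, k) * (1 - r) ** k * r ** (n - k) for k in range(n // 2 + 1, n + 1))

for n in (1, 3, 5, 11, 21):
    print(n, round(majority_error(n, 0.6), 3))
```

Each extra answer costs money, while the error rate at r = 0.6 shrinks only gradually (0.4 with one worker, 0.352 with three), which is the tradeoff the slide refers to.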
14 
Idea: correspondences support each other if they satisfy a constraint 
1-1 constraint: each source attribute matches at most one target attribute
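Figure: attribute a of schema S with candidate correspondences ab1 (to attribute b1) and ab2 (to attribute b2) of schema T.
Pr(ab1 = true) = 0.8
Pr(ab2 = false) = 0.6
By independence, each joint probability is the product of the marginals (e.g. T F: 0.8 x 0.6 = 0.48):
ab1  ab2  Prob  1-1 constraint
T    T    0.32  not satisfied
T    F    0.48  satisfied
F    T    0.08  satisfied
F    F    0.12  satisfied
$$\Pr(ab_2 = \text{false} \mid \gamma_{1-1}) = \frac{0.48 + 0.12}{0.48 + 0.08 + 0.12} \approx 0.88$$
Without constraint:
Corr  Aggregation  Error Rate
ab2   False        0.4 (*)
With constraint:
Corr  Aggregation  Error Rate
ab2   False        0.12 (**)
Error rate without constraint > error rate with constraint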
(*) Error Rate = $1 - \Pr(ab_2 = \text{false})$
(**) Error Rate = $1 - \Pr(ab_2 = \text{false} \mid \gamma_{1-1})$
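The 1-1 computation above can be reproduced by brute-force enumeration: list every truth assignment, weight it by the product of the marginals (independence), keep only assignments that satisfy the constraint, and renormalize. A minimal sketch; the names conditional_prob and one_to_one are hypothetical, and enumeration is exponential in the number of correspondences, so it only suits small examples like this one. Note that Pr(ab2 = true) = 1 − 0.6 = 0.4.

```python
from itertools import product

def conditional_prob(marginals, constraint, target, value):
    """Pr(target = value | constraint holds) for independent binary correspondences.

    marginals: {name: Pr(name = True)}; constraint: predicate over an assignment dict.
    """
    names = list(marginals)
    satisfied = matching = 0.0
    for values in product((True, False), repeat=len(names)):
        assignment = dict(zip(names, values))
        # Joint probability under independence: product of the marginals.
        p = 1.0
        for name, v in assignment.items():
            p *= marginals[name] if v else 1.0 - marginals[name]
        if constraint(assignment):
            satisfied += p
            if assignment[target] == value:
                matching += p
    return matching / satisfied

def one_to_one(assignment):
    """1-1 constraint: attribute a may not match both b1 and b2."""
    return not (assignment["ab1"] and assignment["ab2"])

p = conditional_prob({"ab1": 0.8, "ab2": 0.4}, one_to_one, "ab2", False)
print(round(p, 2), round(1 - p, 2))  # 0.88 0.12
```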
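15
Circle constraint: a sequence of correspondences forms a closed circle across schemas
Δ: probability of compensating errors along the circle (*)
Figure: three schemas S1, S2, S3 with attributes a, b, c connected in a circle; Pr(ab = T) = 0.8, Pr(bc = T) = 0.8, Pr(ac = T) = 0.8.
By independence, each joint probability is the product of the marginals (e.g. T T T: 0.8 x 0.8 x 0.8 = 0.512):
ab  bc  ac  Prob   Pr(γ_circle | row)
T   T   T   0.512  1.0
T   T   F   0.128  0.0
T   F   T   0.128  0.0
T   F   F   0.032  Δ
F   T   T   0.128  0.0
F   T   F   0.032  Δ
F   F   T   0.032  Δ
F   F   F   0.008  Δ
$$\Pr(ab = T \mid \gamma_{\text{circle}}) = \frac{0.512 + \Delta \times 0.032}{0.512 + 3 \times \Delta \times 0.032 + \Delta \times 0.008} \approx 0.973 \quad \text{with } \Delta = 0.2$$
Without constraint:
Corr  Aggregation  Error Rate
ab    True         0.2 (**)
With constraint:
Corr  Aggregation  Error Rate
ab    True         0.027 (***)
Error rate without constraint > error rate with constraint
(**) Error Rate = $1 - \Pr(ab = T)$
(***) Error Rate = $1 - \Pr(ab = T \mid \gamma_{\text{circle}})$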
(*) Cudré-Mauroux, P. et al. “Probabilistic Message Passing in Peer Data Management Systems”. ICDE, 2006
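The circle computation differs from the 1-1 case in that the constraint is soft: each assignment is weighted by the probability that the circle closes (1.0, 0.0, or Δ, as in the table above) instead of being kept or discarded. A minimal sketch of that weighting by enumeration; the names circle_weight and pr_ab_true_given_circle are hypothetical.

```python
from itertools import product

def circle_weight(ab: bool, bc: bool, ac: bool, delta: float) -> float:
    """Pr(circle constraint holds | truth values of the three correspondences)."""
    correct = ab + bc + ac  # number of correct correspondences on the circle
    if correct == 3:
        return 1.0  # all correct: the circle closes
    if correct == 2:
        return 0.0  # exactly one wrong: the circle cannot close
    return delta    # zero or one correct: errors may compensate with probability delta

def pr_ab_true_given_circle(p_ab: float, p_bc: float, p_ac: float, delta: float) -> float:
    num = den = 0.0
    for ab, bc, ac in product((True, False), repeat=3):
        # Joint probability under independence, times the constraint weight.
        p = (p_ab if ab else 1 - p_ab) * (p_bc if bc else 1 - p_bc) * (p_ac if ac else 1 - p_ac)
        w = p * circle_weight(ab, bc, ac, delta)
        den += w
        if ab:
            num += w
    return num / den

p = pr_ab_true_given_circle(0.8, 0.8, 0.8, delta=0.2)
print(round(p, 3), round(1 - p, 3))  # 0.973 0.027
```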
16 
Settings: 
Real‐world schemas. Use ground truth to simulate users/workers. 
Error threshold = 0.1: make a decision when the error rate < 0.1; otherwise, continue to ask users.
Metric: Cost = 
Observation: Cost (With Constraints) < Cost (Without Constraints)
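The stopping rule in these settings can be simulated as follows. This is a sketch under assumptions: each simulated worker is independently correct with a fixed reliability, answers are combined by Bayesian updating with a uniform prior, and the number of answers is used only as a stand-in for cost (the slides' own cost metric may differ). The function name ask_until_confident and the max_answers cap are hypothetical.

```python
import random

def ask_until_confident(truth: bool, reliability: float, threshold: float = 0.1,
                        max_answers: int = 50, seed: int = 0):
    """Ask simulated workers until the error rate drops below threshold (max_answers >= 1).

    Returns (decision, error_rate, number_of_answers).
    """
    rng = random.Random(seed)
    like_true = like_false = 1.0  # uniform prior on the correspondence
    for n in range(1, max_answers + 1):
        # Use the ground truth: the worker answers correctly with probability `reliability`.
        said_yes = truth if rng.random() < reliability else not truth
        like_true *= reliability if said_yes else 1 - reliability
        like_false *= (1 - reliability) if said_yes else reliability
        pr_true = like_true / (like_true + like_false)
        error_rate = 1 - max(pr_true, 1 - pr_true)
        if error_rate < threshold:
            break  # confident enough: stop paying for answers
    return pr_true >= 0.5, error_rate, n

print(ask_until_confident(truth=True, reliability=0.8))
```

In this model, a constraint that raises the posterior (as in the 1-1 and circle examples) would push the error rate under the threshold with fewer answers, which is the intuition behind the observation above.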
We model a crowdsourcing process for schema matching networks, addressing two optimization goals: minimizing monetary cost and maximizing accuracy (minimizing the error rate).
We design a variety of questions with different support information.
We leverage consistency constraints to reduce the error rate and, in turn, the monetary cost.
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
 
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night StandHot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
Hot Call Girls |Delhi |Hauz Khas ☎ 9711199171 Book Your One night Stand
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)How is AI changing journalism? (v. April 2024)
How is AI changing journalism? (v. April 2024)
 
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
 
On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024On Starlink, presented by Geoff Huston at NZNOG 2024
On Starlink, presented by Geoff Huston at NZNOG 2024
 
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICECall Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With RoomVIP Kolkata Call Girl Dum Dum 👉 8250192130  Available With Room
VIP Kolkata Call Girl Dum Dum 👉 8250192130 Available With Room
 
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
 
Radiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsRadiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girls
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
 
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
 

On Leveraging Crowdsourcing Techniques for Schema Matching Networks

  • 1. Nguyen Quoc Viet Hung, Nguyen Thanh Tam, Karl Aberer (École Polytechnique Fédérale de Lausanne, Switzerland); Zoltán Miklós (Université de Rennes 1, IRISA, France). DASFAA 2013, Part II, LNCS 7826, pp. 139–154, 2013.
  • 2. Database schema matching is an active research field (surveys: [1], [2]). Applications include data transformation, data migration, data alignment, …; automatic matching tools include COMA++, AMC, OpenII, Falcon, … Schema matching is the task of establishing correspondences that connect related attributes in two (independently developed) database schemas. Figure: example schemas SA and SB with attributes BirthName, BirthDate, and Address. [1] Rahm, E., Bernstein, P.A. "A Survey of Approaches to Automatic Schema Matching". VLDB Journal, 2001. [2] Bernstein, P.A. et al. "Generic Schema Matching, Ten Years Later". PVLDB, 2011.
  • 3. Automatic schema matchers sometimes fail to identify the correct correspondences, so post-matching reconciliation through human input is needed. This effort is the "real cost" for a company. Moreover, schemas do not appear alone: they are part of a matching network, and network-level consistency constraints are very important for business users.
  • 4. Real-world scenario: a repository of schemas in the same domain. Schema matching network: schemas connected by pairwise matchings, subject to network-level consistency constraints. Automatic tools produce incorrect correspondences → these need validation by humans.
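A matching network of this kind can be sketched in a few lines of Python. This is a minimal, hypothetical representation, not the authors' data model: each attribute is a `(schema, attribute)` pair, each correspondence is an unordered pair of attributes, and two network-level checks are shown, namely correspondences that map one attribute to two attributes of the same schema, and closed circles through three schemas. All schema and attribute names are made up for illustration.

```python
from itertools import combinations

# Hypothetical representation: an attribute is a (schema, attribute) pair and a
# correspondence is an unordered pair of attributes from two different schemas.


def norm(u, v):
    """Store each correspondence in a canonical order so (u, v) == (v, u)."""
    return tuple(sorted((u, v)))


def one_to_one_violations(corrs):
    """Pairs of correspondences that map one attribute to two attributes of the same schema."""
    edges = sorted({norm(u, v) for u, v in corrs})
    found = []
    for e1, e2 in combinations(edges, 2):
        shared = set(e1) & set(e2)
        if len(shared) != 1:
            continue  # the two correspondences do not touch the same attribute
        x = shared.pop()
        y1 = e1[1] if e1[0] == x else e1[0]
        y2 = e2[1] if e2[0] == x else e2[0]
        if y1[0] == y2[0]:  # both partners of x live in the same schema
            found.append((e1, e2))
    return found


def circles(corrs):
    """Triples of attributes from three distinct schemas that are pairwise matched."""
    edges = {norm(u, v) for u, v in corrs}
    attrs = sorted({a for e in edges for a in e})
    result = []
    for a, b, c in combinations(attrs, 3):
        if len({a[0], b[0], c[0]}) < 3:
            continue  # a circle must pass through three different schemas
        if {norm(a, b), norm(b, c), norm(a, c)} <= edges:
            result.append((a, b, c))
    return result


network = [
    (("S1", "a"), ("S2", "b")),
    (("S2", "b"), ("S3", "c")),
    (("S1", "a"), ("S3", "c")),
    (("S", "a"), ("T", "b1")),
    (("S", "a"), ("T", "b2")),
]
print(circles(network))
print(one_to_one_violations(network))
```

On this toy network, `circles` reports the single circle S1.a, S2.b, S3.c, and `one_to_one_violations` reports the pair S.a ↔ T.b1, S.a ↔ T.b2.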
  • 7. Publications: DASFAA 2013 and BDA 2013: On Leveraging Crowdsourcing Techniques for Schema Matching Networks; ER 2013: Minimizing Human Effort in Reconciling Match Networks; CoopIS 2013: Collaborative Schema Matching Reconciliation; ICDE 2014: Pay-as-you-go Reconciliation in Schema Matching Networks.
  • 8. "Crowdsourcing is the practice of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers." (Wikipedia). Our context: employ many workers (users) to validate the same correspondences and combine their answers. Surveys: [1], [2]. A wide range of applications (e.g. CrowdSearch) have been built on top of more than 70 crowdsourcing platforms (e.g. Amazon Mechanical Turk). Our contributions: (1) define network-level constraints in a schema matching network; (2) design questions for workers to validate correspondences; (3) leverage network-level constraints to reduce user effort. [1] E. Law et al. "Human Computation". Morgan & Claypool Publishers, 2011. [2] A. Doan et al. "Crowdsourcing Systems on the World-Wide Web". CACM, 2011.
  • 11. A question has three elements: (1) asking object: a correspondence; (2) possible choices: a simple YES/NO answer; (3) support information: alternatives, constraint satisfactions, and constraint violations.
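The three elements of a question can be captured in a small record. The sketch below is hypothetical: the field names (`alternatives`, `satisfied`, `violated`), the rendered wording, and the attribute `SB.FullName` are assumptions for illustration, not the interface used in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Question:
    """A YES/NO validation question about one correspondence (hypothetical layout)."""

    correspondence: tuple  # asking object: (source attribute, target attribute)
    alternatives: list = field(default_factory=list)  # competing correspondences
    satisfied: list = field(default_factory=list)  # constraints the correspondence satisfies
    violated: list = field(default_factory=list)  # constraints the correspondence violates
    choices: tuple = ("YES", "NO")  # possible choices

    def render(self) -> str:
        """Turn the question and its support information into text for a worker."""
        src, tgt = self.correspondence
        lines = [f"Does {src} correspond to {tgt}? ({'/'.join(self.choices)})"]
        if self.alternatives:
            lines.append("Alternatives: " + ", ".join(f"{a} <-> {b}" for a, b in self.alternatives))
        if self.satisfied:
            lines.append("Satisfies: " + "; ".join(self.satisfied))
        if self.violated:
            lines.append("Violates: " + "; ".join(self.violated))
        return "\n".join(lines)


q = Question(
    ("SA.BirthName", "SB.BirthName"),
    alternatives=[("SA.BirthName", "SB.FullName")],
    violated=["1-1 constraint"],
)
print(q.render())
```

Keeping support information optional lets the same record express a bare YES/NO question or one enriched with alternatives and constraint feedback.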
  • 12. Answer aggregation. User feedback on correspondence C: U1 answers Yes, U2 answers Yes, U3 answers No. User quality: U1, U2, U3 have reliabilities r1, r2, r3, where r1 = Pr(C = true | U1 = yes) = Pr(C = false | U1 = no). A probabilistic model (*) combines the feedback with the user reliabilities into Pr(C), from which we compute ⟨a, e⟩: the aggregated value a and its error rate e. Here C is aggregated to True with error rate 0.19. (*) Majority voting, expectation maximization, …; see the full paper for details.
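As one concrete instance of such a probabilistic model, the sketch below uses Bayesian aggregation with known, independent worker reliabilities. This is an assumption: the slide lists majority voting and expectation maximization as options, and the paper's exact model may differ. With a uniform prior, the likelihood form used here agrees with the slide's definition r_i = Pr(C = true | U_i = yes). The reliability values in the example are hypothetical.

```python
def aggregate(answers, reliabilities, prior=0.5):
    """Return (aggregation, error rate) for one correspondence C.

    Assumes independent workers and a known reliability r_i with
    Pr(U_i = yes | C = true) = Pr(U_i = no | C = false) = r_i.
    """
    if len(answers) != len(reliabilities):
        raise ValueError("need exactly one reliability per answer")
    like_true, like_false = prior, 1.0 - prior
    for yes, r in zip(answers, reliabilities):
        like_true *= r if yes else 1.0 - r
        like_false *= (1.0 - r) if yes else r
    p_true = like_true / (like_true + like_false)  # Pr(C = true | answers)
    decision = p_true >= 0.5
    error = 1.0 - (p_true if decision else 1.0 - p_true)
    return decision, error


def majority_vote(answers):
    """Plain majority voting: ignores reliabilities (ties resolve to False)."""
    return 2 * sum(answers) > len(answers)


# U1 = yes, U2 = yes, U3 = no, with hypothetical reliabilities.
print(aggregate([True, True, False], [0.8, 0.8, 0.8]))
print(aggregate([True, True, False], [0.6, 0.6, 0.95]), majority_vote([True, True, False]))
```

With equal reliabilities of 0.8, two YES and one NO leave Pr(C = true) at 0.8 (error rate about 0.2). With a highly reliable dissenter (r3 = 0.95), the aggregation flips to False with error rate about 0.106, while plain majority voting still returns True.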
  • 13. To achieve higher accuracy, we need more answers, which creates a cost-accuracy tradeoff (figure with r = 0.6 and the goal marked). Solution: leverage constraints to reduce the error rate.
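One way to see the cost-accuracy tradeoff, assuming plain majority voting over an odd number n of independent workers who each answer correctly with probability r = 0.6, is to compute from the binomial distribution the probability that the majority is wrong. The function name is hypothetical, and the slide's figure may have been produced with a different aggregation.

```python
from math import comb


def majority_error(n, r):
    """Pr(the majority of n independent workers is wrong), each worker correct w.p. r.

    n must be odd so that there are no ties.
    """
    if n % 2 == 0:
        raise ValueError("n must be odd")
    # The majority is wrong when at most n // 2 workers answer correctly.
    return sum(comb(n, k) * r**k * (1 - r) ** (n - k) for k in range(n // 2 + 1))


for n in (1, 3, 5, 11, 21, 51):
    print(f"{n:3d} answers -> error rate {majority_error(n, 0.6):.3f}")
```

Under these assumptions, the first few answers reduce the error only modestly: 0.4 for one answer, 0.352 for three, and about 0.317 for five. Every extra answer is paid for, so cost grows quickly as the target error rate shrinks.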
  • 14. Idea: correspondences support each other if they jointly satisfy a constraint. 1-1 constraint: one source attribute matches at most one target attribute. Example: source attribute a in S, target attributes b1 and b2 in T, with Pr(ab1 = true) = 0.8 and Pr(ab2 = false) = 0.6. Assuming independence (e.g. 0.8 × 0.6 = 0.48), the joint distribution is: (ab1, ab2) = (T, T): 0.32, violates the constraint; (T, F): 0.48, satisfies; (F, T): 0.08, satisfies; (F, F): 0.12, satisfies. Conditioning on the constraint gives $\Pr(ab_2 = \text{false} \mid \gamma_{1-1}) = \frac{0.48 + 0.12}{0.48 + 0.08 + 0.12} \approx 0.88$. Without the constraint, ab2 is aggregated to False with error rate 0.4 (*); with the constraint, ab2 is aggregated to False with error rate 0.12 (**). (*) Error rate = 1 - Pr(ab2 = false). (**) Error rate = $1 - \Pr(ab_2 = \text{false} \mid \gamma_{1-1})$.
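The 1-1 computation above can be reproduced by brute-force enumeration. The sketch assumes, as the slide does, that correspondences are independent before conditioning. `conditional_marginal` and `one_to_one` are hypothetical helper names. Enumeration is exponential in the number of correspondences, so it only suits small examples like this one.

```python
from itertools import product


def conditional_marginal(priors, weight, index, value):
    """Pr(C_index = value | constraint) by enumerating all joint truth assignments.

    priors[i] = Pr(C_i = true); correspondences are independent before conditioning.
    weight(assignment) in [0, 1] says how compatible an assignment is with the constraint.
    """
    num = den = 0.0
    for assignment in product((True, False), repeat=len(priors)):
        p = 1.0
        for truth, prior in zip(assignment, priors):
            p *= prior if truth else 1.0 - prior
        w = p * weight(assignment)
        den += w
        if assignment[index] == value:
            num += w
    return num / den


def one_to_one(assignment):
    """1-1 constraint: a cannot match both b1 and b2 of the same target schema."""
    ab1, ab2 = assignment
    return 0.0 if (ab1 and ab2) else 1.0


# Pr(ab1 = true) = 0.8 and Pr(ab2 = true) = 1 - 0.6 = 0.4, as on the slide.
p = conditional_marginal([0.8, 0.4], one_to_one, index=1, value=False)
print(f"Pr(ab2 = false | 1-1) = {p:.2f}, error rate = {1 - p:.2f}")
```

It prints `Pr(ab2 = false | 1-1) = 0.88, error rate = 0.12`, matching the numbers on the slide.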
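  • 15. Circle constraint: a sequence of correspondences that forms a closed circle. Example: attributes a in S1, b in S2, and c in S3, with Pr(ab = T) = Pr(bc = T) = Pr(ac = T) = 0.8. Assuming independence (e.g. 0.8 × 0.8 × 0.8 = 0.512), the joint distribution and its compatibility with the circle constraint are: (ab, bc, ac) = (T, T, T): 0.512, weight 1.0; (T, T, F), (T, F, T), (F, T, T): 0.128 each, weight 0.0; (T, F, F), (F, T, F), (F, F, T): 0.032 each, weight Δ; (F, F, F): 0.008, weight Δ. Here Δ is the probability of compensating errors along the circle (*). Conditioning on the constraint gives $\Pr(ab = \text{T} \mid \gamma_{\text{circle}}) = \frac{0.512 + \Delta \times 0.032}{0.512 + 3 \times \Delta \times 0.032 + \Delta \times 0.008} \approx 0.973$ with $\Delta = 0.2$. Without the constraint, ab is aggregated to True with error rate 0.2 (**); with the constraint, ab is aggregated to True with error rate 0.027 (***). (**) Error rate = 1 - Pr(ab = T). (***) Error rate = $1 - \Pr(ab = \text{T} \mid \gamma_{\text{circle}})$. (*) Cudré-Mauroux et al. "Probabilistic message passing in peer data management systems". ICDE 2006.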
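The same enumeration handles the circle constraint once each assignment of (ab, bc, ac) gets a weight: 1.0 when all three correspondences hold, 0.0 when exactly one fails, and Δ when two or more fail, following the table above. The helper names are hypothetical. The comment explaining the 0.0 rows (two true links imply the third) is an interpretation, not a statement from the slide.

```python
from itertools import product


def conditional_marginal(priors, weight, index, value):
    """Pr(C_index = value | constraint) by enumerating all joint truth assignments."""
    num = den = 0.0
    for assignment in product((True, False), repeat=len(priors)):
        p = 1.0
        for truth, prior in zip(assignment, priors):
            p *= prior if truth else 1.0 - prior
        w = p * weight(assignment)
        den += w
        if assignment[index] == value:
            num += w
    return num / den


def circle_weight(delta):
    """Compatibility of an (ab, bc, ac) assignment with the circle constraint."""

    def weight(assignment):
        wrong = assignment.count(False)
        if wrong == 0:
            return 1.0  # the circle closes consistently
        if wrong == 1:
            return 0.0  # two true links imply the third, so one false link is inconsistent
        return delta  # two or more false links: errors may compensate each other

    return weight


p = conditional_marginal([0.8, 0.8, 0.8], circle_weight(0.2), index=0, value=True)
print(f"Pr(ab = T | circle) = {p:.3f}, error rate = {1 - p:.3f}")
```

With Δ = 0.2 it prints `Pr(ab = T | circle) = 0.973, error rate = 0.027`. Setting Δ closer to 0 makes the constraint stricter and pushes the posterior further toward 1.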
  • 16. Settings: real-world schemas; the ground truth is used to simulate users (workers). Error threshold = 0.1: a decision is made once the error rate drops below 0.1; otherwise, more users are asked. Metric: cost. Observation: Cost (with constraints) < Cost (without constraints).
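The decision rule in these settings can be simulated for a single correspondence. This sketch covers only the "without constraints" baseline. It assumes simulated workers who share one reliability r (0.6 here) and Bayesian aggregation with a uniform prior. `ask_until_confident` and its parameters (including the `max_answers` cap) are hypothetical.

```python
import random


def posterior_true(answers, r, prior=0.5):
    """Pr(C = true | answers) for independent workers with equal reliability r."""
    like_true, like_false = prior, 1.0 - prior
    for yes in answers:
        like_true *= r if yes else 1.0 - r
        like_false *= (1.0 - r) if yes else r
    return like_true / (like_true + like_false)


def ask_until_confident(truth, r=0.6, threshold=0.1, max_answers=500, seed=0):
    """Ask simulated workers about one correspondence until error rate < threshold.

    Workers are simulated from the ground truth: each answers correctly with probability r.
    Returns (decision, error rate, number of answers used).
    """
    rng = random.Random(seed)
    answers = []
    while True:
        correct = rng.random() < r
        answers.append(truth if correct else not truth)
        p = posterior_true(answers, r)
        decision = p >= 0.5
        error = 1.0 - max(p, 1.0 - p)
        if error < threshold or len(answers) >= max_answers:
            return decision, error, len(answers)


costs = [ask_until_confident(True, seed=s)[2] for s in range(200)]
print("average answers per correspondence:", sum(costs) / len(costs))
```

Under these assumptions, the posterior depends only on the difference between YES and NO answers. With r = 0.6 the error rate drops below 0.1 only once that difference reaches six, so each correspondence costs at least six answers.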
  • 17. We model a crowdsourcing process for schema matching networks that addresses two optimization goals: minimizing the monetary cost and maximizing accuracy (i.e., minimizing the error rate). We design a variety of questions with different support information. We leverage consistency constraints to reduce the error rate, which in turn reduces the monetary cost.