HARE: A Hybrid SPARQL Engine to Enhance
Query Answers via Crowdsourcing
Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal
Motivation (1)
Example triple pattern: (dbr:Bad_Hair, dbp:producer, ?x)
Due to the semi-structured nature of RDF, incomplete values cannot be easily detected.
Motivation (2)
Retrieve movies that have producers and have been filmed in New York City by Universal Pictures.

SELECT DISTINCT ?movie WHERE {
  ?movie rdf:type schema.org:Movie .
  ?movie dbp:producer ?producer .
  ?movie dct:subject dbc:Universal_Pictures_film .
  ?movie dct:subject dbc:Films_shot_in_New_York_City .
}

Result: 39 movies (DBpedia v. 2015-04).
Motivation (2), continued
Without the dbp:producer triple pattern, the same query returns 46 movies: there are 7 movies without producers (DBpedia v. 2015-04).
Motivation
Movies (shot in NYC by Universal Pictures) with no producers in DBpedia (v. 2015-04):
•  dbr:Legal_Eagles
•  dbr:Wanderlust
•  dbr:Barney’s_Version_(film)
•  dbr:Non_Stop_(film)
•  dbr:The_Wolf_of_Wall_Street_(2013_film) (Leonardo DiCaprio is a producer)
•  dbr:Broadway_Love
•  dbr:Trainwreck_(film)
All images licensed under Fair use via Wikipedia.
Problem Definition
Given an RDF data set D and a SPARQL query Q against D, consider D* the virtual data set that contains all the data that should be in D.
P1) Identifying portions of Q that yield missing values
P2) Resolving missing values
Example: [[(?movie, dbp:producer, ?producer)]]_D ⊂ [[(?movie, dbp:producer, ?producer)]]_D*
For μ = {movie → dbr:The_Wolf_of_Wall_Street_(2013_film), producer → dbr:Leonardo_DiCaprio}:
μ ∉ [[(?movie, dbp:producer, ?producer)]]_D (does not belong to DBpedia) ∧ μ ∈ [[(?movie, dbp:producer, ?producer)]]_D* (should belong to DBpedia)
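The relationship between D and D* can be illustrated with a toy example. This is only a sketch: the two data sets below are tiny hand-written stand-ins, the helper name `evaluate` is hypothetical, and in practice D* is virtual and cannot be materialized.

```python
# Toy illustration of the problem definition.
# D is the actual data set; D_STAR stands in for the virtual complete data set.
# Both sets are hand-written examples, not real DBpedia extracts.

def evaluate(predicate, dataset):
    """Evaluate the triple pattern (?movie, predicate, ?producer) over a set of triples."""
    return {
        frozenset({"movie": s, "producer": o}.items())
        for (s, p, o) in dataset
        if p == predicate
    }

D = {
    ("dbr:The_Interpreter", "dbp:producer", "dbr:Eric_Fellner"),
}
D_STAR = D | {
    ("dbr:The_Wolf_of_Wall_Street_(2013_film)", "dbp:producer", "dbr:Leonardo_DiCaprio"),
}

answers_d = evaluate("dbp:producer", D)
answers_d_star = evaluate("dbp:producer", D_STAR)

# Mappings that should be in the answer but are not: the missing values.
missing = answers_d_star - answers_d
mu = frozenset({
    "movie": "dbr:The_Wolf_of_Wall_Street_(2013_film)",
    "producer": "dbr:Leonardo_DiCaprio",
}.items())
```

Here `mu` is in `answers_d_star` but not in `answers_d`, which is the situation HARE tries to detect (P1) and repair (P2).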
OUR APPROACH: HARE
HARE
•  A hybrid machine/human SPARQL query engine that is able to enhance the size of query answers.
•  Based on a novel RDF completeness model, HARE implements query optimization and execution techniques for P1) identifying portions of queries that yield missing values.
•  HARE resorts to microtask crowdsourcing for P2) resolving missing values.
  
HARE Architecture
[Figure: HARE architecture. Input: SPARQL query Q and threshold τ. Components: RDF Completeness Model, Query Optimizer, Query Engine, Microtask Manager, and Crowd Knowledge (CKB+, CKB–, CKB~). Data sources: RDF data set (LOD Cloud) and the crowd. Labeled flows: query plan, RDF data, crowdsourcing triple patterns, tasks, human input, aggregated human input, bindings from the crowd. Output: results for Q.]
RDF Completeness Model (1)
Movies have producers (e.g. dbr:The_Interpreter).
[Figure: dbr:The_Interpreter, dbr:Tower_Heist, dbr:Bad_Hair, … are of rdf:type schema.org:Movie. dbr:The_Interpreter has the dbp:producer values dbr:Eric_Fellner, dbr:Tim_Bevan and dbr:Kevin_Misher; two other dbp:producer edges point to unknown values (?).]
RDF Completeness Model (2)
①  Predicate multiplicity of an RDF resource
Number of different objects that a resource has for a certain predicate.
Example: dbr:The_Interpreter has the dbp:producer values dbr:Eric_Fellner, dbr:Tim_Bevan and dbr:Kevin_Misher, so
MD(dbr:The_Interpreter | dbp:producer) = 3
  
RDF Completeness Model (3)
②  Aggregated predicate multiplicity of a class
Given a predicate, the median number of distinct objects over all the resources that belong to a class.
AMD(schema.org:Movie | dbp:producer) = 3
MD(dbr:The_Interpreter | dbp:producer) = 3
MD(dbr:Legal_Eagles | dbp:producer) = 2
  
RDF Completeness Model (4)
③  Completeness of an RDF resource (with respect to a predicate)
Given a predicate, the completeness of an RDF resource is determined by the aggregated predicate multiplicity of the classes that it belongs to.
Comp_D(dbr:The_Interpreter | dbp:producer) = 3/3
Comp_D(dbr:Legal_Eagles | dbp:producer) = 2/3
Comp_D(dbr:Bad_Hair | dbp:producer) = 0/3
The numerator is computed in ①, the denominator in ②.
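The three definitions of the completeness model can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, only a single class is considered, the median is taken over resources that have at least one value for the predicate, and the ratio is capped at 1.0. All three choices are assumptions of this sketch. The producers of dbr:Legal_Eagles and the resource ex:Movie_A are made-up placeholders.

```python
from statistics import median

def md(data, subject, predicate):
    """Predicate multiplicity: number of distinct objects of (subject, predicate)."""
    return len({o for (s, p, o) in data if s == subject and p == predicate})

def amd(data, cls, predicate):
    """Aggregated multiplicity: median MD over instances of cls.
    Assumption: only instances with at least one value are counted."""
    instances = {s for (s, p, o) in data if p == "rdf:type" and o == cls}
    counts = [md(data, s, predicate) for s in instances]
    counts = [c for c in counts if c > 0]
    return median(counts) if counts else 0

def comp(data, subject, cls, predicate):
    """Completeness of subject w.r.t. predicate (assumption: capped at 1.0)."""
    expected = amd(data, cls, predicate)
    if not expected:
        return 1.0
    return min(1.0, md(data, subject, predicate) / expected)

def producers(movie, names):
    return [(movie, "dbp:producer", n) for n in names]

# Toy data set; ex:* resources are hypothetical placeholders.
movies = ["dbr:The_Interpreter", "dbr:Legal_Eagles", "dbr:Bad_Hair", "ex:Movie_A"]
D = (
    [(m, "rdf:type", "schema.org:Movie") for m in movies]
    + producers("dbr:The_Interpreter",
                ["dbr:Eric_Fellner", "dbr:Tim_Bevan", "dbr:Kevin_Misher"])
    + producers("dbr:Legal_Eagles", ["ex:p1", "ex:p2"])
    + producers("ex:Movie_A", ["ex:p3", "ex:p4", "ex:p5"])
)

for movie in movies[:3]:
    print(movie, comp(D, movie, "schema.org:Movie", "dbp:producer"))
```

With this toy data the aggregated multiplicity is 3, so the three movies from the slides get completeness 3/3, 2/3 and 0/3 respectively.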
  
Crowd Knowledge
•  The knowledge collected from the crowd is captured in three knowledge bases: CKB = (CKB+, CKB–, CKB~)
•  CKB+, CKB–, CKB~ are fuzzy sets over RDF data composed of 4-tuples of the form (subject, predicate, object, membership_degree), where (subject, predicate, object) is an RDF triple.
  
Crowd Knowledge: Types of Crowd Knowledge Bases
•  CKB+: “Brian Grazer is a producer of Tower Heist.” (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
•  CKB–: “Tower Heist does not have a producer.” (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
•  CKB~: “I am not sure if Bad Hair has a producer.” (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
Entries in CKB+ and CKB– for the same subject and predicate indicate contradiction; entries in CKB~ indicate uncertainty.
  
Crowd Knowledge: Measuring Contradiction
•  Contradiction occurs when triples with the same subject and predicate belong to CKB+ and CKB–.
•  It is measured as follows:
CKB+: (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
CKB–: (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
Contradiction(dbr:Tower_Heist | dbp:producer) = 1 – |0.9 – 0.05| = 0.15
•  Contradiction values close to 0.0 indicate high consensus.
  
Crowd Knowledge: Measuring Uncertainty
•  When a triple belongs to CKB~, the value of the triple object is unknown or uncertain.
•  Uncertainty is measured as follows:
CKB~: (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
Uncertainty(dbr:Bad_Hair | dbp:producer) = avg({0.78}) = 0.78
•  Uncertainty values close to 1.0 indicate that the crowd has shown itself to be unknowledgeable about the fact to be vetted.
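The two crowd-knowledge measures can be reproduced on the running example as follows. This is a hedged sketch: the slides only show single-entry examples, so averaging the membership degrees when several entries exist for contradiction, and treating an empty knowledge base as average 0.0, are assumptions made here. The function and variable names are hypothetical.

```python
# Crowd knowledge bases as lists of (subject, predicate, object, membership_degree).
CKB_PLUS = [("dbr:Tower_Heist", "dbp:producer", "dbr:Brian_Grazer", 0.9)]
CKB_MINUS = [("dbr:Tower_Heist", "dbp:producer", "_:o1", 0.05)]
CKB_UNKNOWN = [("dbr:Bad_Hair", "dbp:producer", "_:o2", 0.78)]

def degrees(ckb, subject, predicate):
    """Membership degrees of all entries with the given subject and predicate."""
    return [m for (s, p, _o, m) in ckb if s == subject and p == predicate]

def avg(values):
    """Average of values; assumption: an empty list averages to 0.0."""
    return sum(values) / len(values) if values else 0.0

def contradiction(subject, predicate):
    """1 - |m+ - m-|, with m+ and m- the average degrees in CKB+ and CKB-."""
    m_plus = avg(degrees(CKB_PLUS, subject, predicate))
    m_minus = avg(degrees(CKB_MINUS, subject, predicate))
    return 1.0 - abs(m_plus - m_minus)

def uncertainty(subject, predicate):
    """Average membership degree of the matching entries in CKB~."""
    return avg(degrees(CKB_UNKNOWN, subject, predicate))

print(round(contradiction("dbr:Tower_Heist", "dbp:producer"), 2))
print(uncertainty("dbr:Bad_Hair", "dbp:producer"))
```

On the slide data this yields a contradiction of 0.15 for dbr:Tower_Heist and an uncertainty of 0.78 for dbr:Bad_Hair.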
  
Query Optimizer (1)
•  Heuristic-based optimizer that decomposes the BGPs of a SPARQL query into two subsets:
–  SQD: triple patterns executed against the data set D,
–  SQCROWD: triple patterns to be crowdsourced.
  
Query Optimizer (2)
•  Given a SPARQL query Q:
–  Triple patterns in Q with variables in the subject position and object position are added to SQCROWD.
–  The rest of the triple patterns in Q are added to SQD.

SELECT DISTINCT ?movie WHERE {
  ?movie rdf:type schema.org:Movie .                     # t1: SQD
  ?movie dbp:producer ?producer .                        # t2: SQCROWD
  ?movie dct:subject dbc:Universal_Pictures_film .       # t3: SQD
  ?movie dct:subject dbc:Films_shot_in_New_York_City .   # t4: SQD
}
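The decomposition heuristic stated above is simple enough to sketch directly. The representation of triple patterns as string tuples and the names `is_var` and `decompose` are assumptions of this example, not part of HARE.

```python
def is_var(term):
    """A term is a variable if it starts with '?' (SPARQL convention)."""
    return term.startswith("?")

def decompose(bgp):
    """Split a basic graph pattern into (SQD, SQCROWD).

    Patterns with variables in both subject and object position go to SQCROWD;
    all other patterns go to SQD.
    """
    sq_d, sq_crowd = [], []
    for pattern in bgp:
        subject, _predicate, obj = pattern
        if is_var(subject) and is_var(obj):
            sq_crowd.append(pattern)
        else:
            sq_d.append(pattern)
    return sq_d, sq_crowd

t1 = ("?movie", "rdf:type", "schema.org:Movie")
t2 = ("?movie", "dbp:producer", "?producer")
t3 = ("?movie", "dct:subject", "dbc:Universal_Pictures_film")
t4 = ("?movie", "dct:subject", "dbc:Films_shot_in_New_York_City")

sq_d, sq_crowd = decompose([t1, t2, t3, t4])
print("SQD:", sq_d)
print("SQCROWD:", sq_crowd)
```

For the running query only t2 ends up in SQCROWD, matching the labels on the slide.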
  
Query Optimizer (3)
•  The optimizer builds a query plan TQ for query Q.
•  Triple patterns from SQD are grouped into star-shaped sub-queries in a bushy tree [Vidal et al.].
•  Triple patterns in SQCROWD are added to the plan TQ in a left-linear fashion.
[Figure: query plan TQ with t1, t3 and t4 under SQD and t2 under SQCROWD.]
  
Query Engine (1)
•  Executes the query plan TQ.
•  Sub-queries that are part of SQD (here t1, t3 and t4) are executed against the data set:
Ω = {{movie → dbr:Tower_Heist}, {movie → dbr:Legal_Eagles}, …}
•  For each mapping contained in Ω, the engine instantiates the triple patterns in SQCROWD.
  
Query Engine (2)
Example of an Iteration
•  The engine processes {movie → dbr:Tower_Heist}.
•  Following the running example:
CKB+: (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
CKB–: (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
CKB~: (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
Comp(dbr:Tower_Heist | dbp:producer) = 1/3 = 0.33
Contradiction(dbr:Tower_Heist | dbp:producer) = 0.15
Uncertainty(dbr:Tower_Heist | dbp:producer) = 0.0
Query Engine (3)
Example of an Iteration
•  The algorithm computes the probability of crowdsourcing the triple pattern (dbr:Tower_Heist, dbp:producer, ?producer):
P(CROWD | μ(s), p) = α (1 – 0.33) + (1 – α) min{0.15, 1 – 0.0} = 0.41
The first term, α (1 – 0.33), is the estimated incompleteness; the second term, (1 – α) min{0.15, 1 – 0.0}, is the crowd reliability.
•  α is a score weight between 0.0 and 1.0 (0.5 in the example).
•  If P(CROWD | μ(s), p) is greater than a user threshold τ, then the algorithm crowdsources the triple pattern (μ(s), p, o).
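The crowdsourcing decision for one instantiated triple pattern can be sketched as below. The symbolic form of the formula is read off the numeric example on the slide (0.33 as completeness, 0.15 as contradiction, 0.0 as uncertainty); the function name and the threshold value `tau = 0.4` are hypothetical.

```python
def crowd_probability(comp, contradiction, uncertainty, alpha=0.5):
    """Probability of crowdsourcing a triple pattern.

    alpha * (1 - comp)                               -> estimated incompleteness
    (1 - alpha) * min(contradiction, 1 - uncertainty) -> crowd term
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0.0 and 1.0")
    incompleteness = 1.0 - comp
    crowd_term = min(contradiction, 1.0 - uncertainty)
    return alpha * incompleteness + (1.0 - alpha) * crowd_term

# Values of the running example for (dbr:Tower_Heist, dbp:producer, ?producer).
p = crowd_probability(comp=0.33, contradiction=0.15, uncertainty=0.0, alpha=0.5)

tau = 0.4  # hypothetical user threshold
should_crowdsource = p > tau
print(round(p, 2), should_crowdsource)
```

With α = 0.5 the sketch reproduces the value 0.41 from the slide; whether the pattern is actually crowdsourced depends on the user-supplied τ.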
  
Query Engine (4)
•  The engine combines mappings obtained from the data set D and mappings from the crowd stored in CKB+.
•  The query evaluation terminates when all the sub-queries are executed.
Theorem 1: The HARE query engine does not increase the time complexity of executing a SPARQL query.
  
Microtask Manager (1)
•  Receives triple patterns to crowdsource, for example: (dbr:Tower_Heist, dbp:producer, ?p)
•  Creates human tasks.
•  Submits tasks to the crowdsourcing platform.
  
Microtask Manager (2)
HARE exploits the semantics encoded in RDF resources. Resource and property pairs listed on the slide (the object values are not captured in this transcript):
•  (dbr:Tower_Heist, rdfs:label, …)
•  (dbp:producer, rdfs:label, …)
•  (dbr:Tower_Heist, foaf:depiction, …)
•  (dbr:Tower_Heist, dbo:abstract, …)
•  (dbr:Tower_Heist, foaf:primaryTopic, …)
  
Microtask Manager (3)
[Figure: CKB+, CKB–, CKB~]
EXPERIMENTAL STUDY
Experimental Set-Up
•  Benchmark: 50 queries against DBpedia (v. 2014).
–  Ten queries in each of five knowledge domains: History, Life Sciences, Movies, Music, and Sports.
•  Implementation details:
–  HARE is implemented in Python 2.7.6.
–  CrowdFlower is used as the crowdsourcing platform.
•  Crowdsourcing configuration:
–  Four different RDF triples per task, 0.07 US$ per task.
–  At least three judgments were collected per task.
•  Total RDF triple patterns crowdsourced: 502
•  Total answers collected from the crowd: 1,609
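The "four different RDF triples per task" configuration amounts to batching the triple patterns selected for crowdsourcing. A minimal sketch follows; grouping consecutive patterns is an assumption, since the slides do not say how triples are assigned to tasks, and the function name and the `ex:` resources are hypothetical.

```python
def batch_into_tasks(triple_patterns, triples_per_task=4):
    """Group triple patterns into microtasks of at most triples_per_task each."""
    if triples_per_task < 1:
        raise ValueError("triples_per_task must be at least 1")
    return [
        triple_patterns[i:i + triples_per_task]
        for i in range(0, len(triple_patterns), triples_per_task)
    ]

# Ten hypothetical triple patterns to crowdsource.
patterns = [("ex:movie_%d" % i, "dbp:producer", "?p") for i in range(10)]
tasks = batch_into_tasks(patterns)
print([len(task) for task in tasks])
```

Ten patterns produce two full tasks and one task with the remaining two patterns.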
  
Results: Size of Query Answer (1)
Metric: Number of answers when queries are executed.
[Figure: bar charts of #Answers per query (Q1 to Q10), split into Data Set Answers and Crowd Answers, for the domains Sports, Music and Life Sciences. Panel annotations: 1.25 – 2.00, 1.50 – 2.00, 1.08 – 1.92.]
HARE identifies sub-queries that produce incomplete answers. Crowdsourcing is a feasible solution to resolve missing values.
Results: Size of Query Answer (2)
Metric: Number of answers when queries are executed.
[Figure: bar charts of #Answers per query (Q1 to Q10), split into Data Set Answers and Crowd Answers, for the domains Movies and History. Panel annotations: 1.05 – 3.13, 1.10 – 1.89.]
HARE identifies sub-queries that produce incomplete answers. Crowdsourcing is a feasible solution to resolve missing values.
Results: Crowd Response Time (1)
Metric: Elapsed time since the first task until the last answer is retrieved.
[Figure: judgments completed (%) over time (min) for Q1 to Q10 in the domains Sports, Music and Life Sciences. Panel annotations at the 12th minute: 77%, 82%, 97%.]
At the 12th minute after the first task is submitted the crowd produces at least 75% of the answers.
  
Results: Crowd Response Time (2)
Metric: Elapsed time since the first task until the last answer is retrieved.
[Figure: judgments completed (%) over time (min) for Q1 to Q10 in the domains Movies and History. Panel annotations at the 12th minute: 98%, 75%.]
At the 12th minute after the first task is submitted the crowd produces at least 75% of the answers.
Results: Quality of Crowd Answers
Metric: A true positive is a mapping that belongs to the query answer.

Recall
       Sports  Music  Life Sciences  Movies  History
Q1     1.00    1.00   0.67           0.88    1.00
Q2     1.00    1.00   1.00           0.96    1.00
Q3     1.00    1.00   0.89           0.79    0.67
Q4     0.55    0.67   1.00           1.00    0.96
Q5     0.86    0.67   1.00           1.00    0.95
Q6     0.69    0.83   1.00           1.00    0.96
Q7     1.00    0.63   0.71           1.00    0.57
Q8     1.00    0.67   0.88           0.94    0.72
Q9     0.46    0.73   1.00           1.00    0.64
Q10    0.92    0.49   1.00           1.00    0.95
Avg    0.85    0.77   0.91           0.96    0.84

Precision
       Sports  Music  Life Sciences  Movies  History
Q1     1.00    1.00   1.00           0.47    1.00
Q2     1.00    0.29   1.00           1.00    1.00
Q3     1.00    1.00   1.00           1.00    1.00
Q4     0.83    1.00   1.00           1.00    1.00
Q5     1.00    0.86   1.00           1.00    1.00
Q6     1.00    1.00   1.00           1.00    0.96
Q7     1.00    1.00   1.00           1.00    0.84
Q8     1.00    1.00   1.00           1.00    0.78
Q9     1.00    1.00   1.00           1.00    0.92
Q10    1.00    1.00   1.00           1.00    0.98
Avg    0.98    0.91   1.00           0.95    0.95

The crowd exhibits heterogeneous performance within domains. This supports the importance of HARE's triple-based approach.
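Precision and recall over mappings can be computed with plain set operations. The sketch below assumes a gold standard of correct mappings is available for each query; the function name and the example mappings are hypothetical and are not taken from the evaluation.

```python
def precision_recall(crowd_answers, gold_answers):
    """Precision and recall of crowd mappings against a gold standard.

    A true positive is a crowd mapping that belongs to the gold query answer.
    """
    crowd, gold = set(crowd_answers), set(gold_answers)
    true_positives = len(crowd & gold)
    precision = true_positives / len(crowd) if crowd else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall

# Hypothetical (movie, producer) mappings for a single query.
crowd = {("ex:m1", "ex:p1"), ("ex:m2", "ex:p2"), ("ex:m3", "ex:p9")}
gold = {("ex:m1", "ex:p1"), ("ex:m2", "ex:p2"), ("ex:m3", "ex:p3"), ("ex:m4", "ex:p4")}

precision, recall = precision_recall(crowd, gold)
print(round(precision, 2), round(recall, 2))
```

In this made-up case two of three crowd mappings are correct and two of four gold mappings are found.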
  
RELATED WORK
Summary of Related Work
Human/computer query processing architectures, by how crowdsourcing is triggered:
•  Manual specification: CrowdDB [Franklin et al.]: tables, columns. Deco [Park and Widom]: rules. Qurk [Marcus et al.]: microtask I/O.
•  Automatically: HARE relies on the RDF graph and crowd knowledge to resort to crowdsourcing.
  
Summary of Related Work
Crowdsourcing in other contexts of Data Management (SPARQL- or RDF-based)
•  OASSIS [Amsterdamer et al.], recommendation system: mines crowdsourced patterns specified in a SPARQL-like language.
•  KATARA [Chu et al.], tabular data cleansing: compares tabular data against RDF data sets via crowdsourced mappings.
•  HARE, SPARQL query processing: resorts to crowdsourcing to complete missing values in RDF data sets.
  
CONCLUSIONS &
FUTURE WORK
Conclusions
•  HARE: Hybrid query engine against RDF data sets.
•  Supports microtasks to enhance query answers on-the-fly.
•  Experimental results confirmed that:
–  Size of query answer: increased by up to 3.13 times.
–  Crowd response time: up to 98% of the answers by the 12th minute.
–  Accuracy: 0.84 – 0.96.
Future work
•  Study further approaches to capture crowd reliability.
•  Consider other quality dimensions on the knowledge collected from the crowd.
  
References
•  [Amsterdamer et al.] Y. Amsterdamer, S. B. Davidson, T. Milo, S. Novgorodov, and A. Somech. OASSIS: query driven crowd mining. In SIGMOD, pages 589–600, 2014.
•  [Chu et al.] X. Chu, J. Morcos, I. F. Ilyas, M. Ouzzani, P. Papotti, N. Tang, and Y. Ye. Katara: A data cleaning system powered by knowledge bases and crowdsourcing. In SIGMOD, pages 1247–1261, 2015.
•  [Marcus et al.] A. Marcus, D. R. Karger, S. Madden, R. Miller, and S. Oh. Counting with the crowd. PVLDB, 6(2):109–120, 2012.
•  [Park and Widom] H. Park and J. Widom. Query optimization over crowdsourced data. PVLDB, 6(10):781–792, 2013.
•  [Vidal et al.] M.E. Vidal, E. Ruckhaus, T. Lampo, A. Martínez, J. Sierra, and A. Polleres. Efficiently joining group patterns in SPARQL queries. In ESWC, pages 228–242, 2010.
  
HARE: A Hybrid SPARQL Engine to Enhance
Query Answers via Crowdsourcing
Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal!
SPARQL Query Q, τ"
RDF
Completeness
Model !
Tasks!
Human
input!
Crowd Knowledge!
Query Engine!
Crowd!
CKB+! CKB-! CKB~!
Query
Optimizer!
Microtask
Manager!
LOD Cloud!
Query plan!
Crowdsourcing triple patterns!
RDF !
Data Set!
Input!
Results for Q"
Bindings from
the crowd!
RDF
data!
Output!
Aggregated!
Human Input!

More Related Content

What's hot

Hive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! ScaleHive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! ScaleDataWorks Summit
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAAdam Doyle
 
Apache pulsar - storage architecture
Apache pulsar - storage architectureApache pulsar - storage architecture
Apache pulsar - storage architectureMatteo Merli
 
Splunk Spark Integration
Splunk Spark IntegrationSplunk Spark Integration
Splunk Spark IntegrationGang Tao
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query ParsingErik Hatcher
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Worksconfluent
 
Polylog: A Log-Based Architecture for Distributed Systems
Polylog: A Log-Based Architecture for Distributed SystemsPolylog: A Log-Based Architecture for Distributed Systems
Polylog: A Log-Based Architecture for Distributed SystemsLongtail Video
 
How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021StreamNative
 
Kafka tiered-storage-meetup-2022-final-presented
Kafka tiered-storage-meetup-2022-final-presentedKafka tiered-storage-meetup-2022-final-presented
Kafka tiered-storage-meetup-2022-final-presentedSumant Tambe
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022StreamNative
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkDatabricks
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used forAljoscha Krettek
 
How to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams SafeHow to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams Safeconfluent
 
Flex Your Database on 12c's Flex ASM and Flex Cluster
Flex Your Database on 12c's Flex ASM and Flex ClusterFlex Your Database on 12c's Flex ASM and Flex Cluster
Flex Your Database on 12c's Flex ASM and Flex ClusterMaaz Anjum
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environmentconfluent
 
AI made easy with Flink AI Flow
AI made easy with Flink AI FlowAI made easy with Flink AI Flow
AI made easy with Flink AI FlowJiangjie Qin
 
Apache Arrow and Pandas UDF on Apache Spark
Apache Arrow and Pandas UDF on Apache SparkApache Arrow and Pandas UDF on Apache Spark
Apache Arrow and Pandas UDF on Apache SparkTakuya UESHIN
 
Apache Pulsar First Overview
Apache PulsarFirst OverviewApache PulsarFirst Overview
Apache Pulsar First OverviewRicardo Paiva
 

What's hot (20)

Hive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! ScaleHive and Apache Tez: Benchmarked at Yahoo! Scale
Hive and Apache Tez: Benchmarked at Yahoo! Scale
 
Apache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEAApache Iceberg Presentation for the St. Louis Big Data IDEA
Apache Iceberg Presentation for the St. Louis Big Data IDEA
 
Apache pulsar - storage architecture
Apache pulsar - storage architectureApache pulsar - storage architecture
Apache pulsar - storage architecture
 
Splunk Spark Integration
Splunk Spark IntegrationSplunk Spark Integration
Splunk Spark Integration
 
Solr Query Parsing
Solr Query ParsingSolr Query Parsing
Solr Query Parsing
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Works
 
Polylog: A Log-Based Architecture for Distributed Systems
Polylog: A Log-Based Architecture for Distributed SystemsPolylog: A Log-Based Architecture for Distributed Systems
Polylog: A Log-Based Architecture for Distributed Systems
 
How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021How Pulsar Stores Your Data - Pulsar Summit NA 2021
How Pulsar Stores Your Data - Pulsar Summit NA 2021
 
Integrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data LakesIntegrating Apache Spark and NiFi for Data Lakes
Integrating Apache Spark and NiFi for Data Lakes
 
Kafka tiered-storage-meetup-2022-final-presented
Kafka tiered-storage-meetup-2022-final-presentedKafka tiered-storage-meetup-2022-final-presented
Kafka tiered-storage-meetup-2022-final-presented
 
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
Beam + Pulsar: Powerful Stream Processing at Scale - Pulsar Summit SF 2022
 
Building Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache SparkBuilding Robust ETL Pipelines with Apache Spark
Building Robust ETL Pipelines with Apache Spark
 
Apache Flink and what it is used for
Apache Flink and what it is used forApache Flink and what it is used for
Apache Flink and what it is used for
 
How to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams SafeHow to Lock Down Apache Kafka and Keep Your Streams Safe
How to Lock Down Apache Kafka and Keep Your Streams Safe
 
Flex Your Database on 12c's Flex ASM and Flex Cluster
Flex Your Database on 12c's Flex ASM and Flex ClusterFlex Your Database on 12c's Flex ASM and Flex Cluster
Flex Your Database on 12c's Flex ASM and Flex Cluster
 
Integrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your EnvironmentIntegrating Apache Kafka Into Your Environment
Integrating Apache Kafka Into Your Environment
 
AI made easy with Flink AI Flow
AI made easy with Flink AI FlowAI made easy with Flink AI Flow
AI made easy with Flink AI Flow
 
Apache Arrow and Pandas UDF on Apache Spark
Apache Arrow and Pandas UDF on Apache SparkApache Arrow and Pandas UDF on Apache Spark
Apache Arrow and Pandas UDF on Apache Spark
 
Apache Pulsar First Overview
Apache PulsarFirst OverviewApache PulsarFirst Overview
Apache Pulsar First Overview
 
HDFS Analysis for Small Files
HDFS Analysis for Small FilesHDFS Analysis for Small Files
HDFS Analysis for Small Files
 

Similar to HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing

SPARQL in the Semantic Web
SPARQL in the Semantic WebSPARQL in the Semantic Web
SPARQL in the Semantic WebJan Beeck
 
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...
HARE: An Engine for Enhancing Answer Completeness of SPARQL Queries via Crowd...Maribel Acosta Deibe
 
Data translation with SPARQL 1.1
Data translation with SPARQL 1.1Data translation with SPARQL 1.1
Data translation with SPARQL 1.1andreas_schultz
 
Context-Aware Access Control for RDF Graph Stores
Context-Aware Access Control for RDF Graph StoresContext-Aware Access Control for RDF Graph Stores
Context-Aware Access Control for RDF Graph StoresSerena Villata
 
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016
Spark After Dark 2.0 - Apache Big Data Conf - Vancouver - May 11, 2016Chris Fregly
 
RDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsRDF Stream Processing and the role of Semantics
RDF Stream Processing and the role of SemanticsJean-Paul Calbimonte
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationCraig Chao
 
Dependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQLDependency Parsing-based QA System for RDF and SPARQL
Dependency Parsing-based QA System for RDF and SPARQLFariz Darari
 
CliqueSquare processing
CliqueSquare processingCliqueSquare processing
CliqueSquare processingINRIA-OAK
 
Querying data on the Web – client or server?
Querying data on the Web – client or server?Querying data on the Web – client or server?
Querying data on the Web – client or server?Ruben Verborgh
 
Finding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic WebFinding knowledge, data and answers on the Semantic Web
Finding knowledge, data and answers on the Semantic Webebiquity
 
The Lonesome LOD Cloud
The Lonesome LOD CloudThe Lonesome LOD Cloud
The Lonesome LOD CloudRuben Verborgh
 
An efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraAn efficient data mining solution by integrating Spark and Cassandra
An efficient data mining solution by integrating Spark and CassandraStratio
 
Towards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIsTowards Virtual Knowledge Graphs over Web APIs
Towards Virtual Knowledge Graphs over Web APIsSpeck&Tech
 
Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02Mon norton tut_queryinglinkeddata02
Mon norton tut_queryinglinkeddata02eswcsummerschool
 
Clojure at BackType
Clojure at BackTypeClojure at BackType
Clojure at BackTypenathanmarz
 
Semantic web assignment 2
Semantic web assignment 2Semantic web assignment 2
Semantic web assignment 2BarryK88
 
Applying large scale text analytics with graph databases
Applying large scale text analytics with graph databasesApplying large scale text analytics with graph databases
Applying large scale text analytics with graph databasesData Ninja API
 

HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing

  • 1. HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing. Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal. Title graphic: dbr:Bad_Hair dbp:producer ?x
  • 2. Motivation (1). HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing – Acosta et al.
  • 3. Motivation (1). Due to the semi-structured nature of RDF, incomplete values cannot be easily detected.
  • 4. Motivation (2). Retrieve movies that have producers and have been filmed in New York City by Universal Pictures.
      SELECT DISTINCT ?movie WHERE {
        ?movie rdf:type schema.org:Movie .
        ?movie dbp:producer ?producer .
        ?movie dct:subject dbc:Universal_Pictures_film .
        ?movie dct:subject dbc:Films_shot_in_New_York_City .
      }
      Result: 39 movies (v. 2015-04).
  • 5. Motivation (2). Retrieve movies that have producers and have been filmed in New York City by Universal Pictures.
      SELECT DISTINCT ?movie WHERE {
        ?movie rdf:type schema.org:Movie .
        ?movie dbp:producer ?producer .
        ?movie dct:subject dbc:Universal_Pictures_film .
        ?movie dct:subject dbc:Films_shot_in_New_York_City .
      }
      Result: 46 movies (there are 7 movies without producers) (v. 2015-04).
  • 6. Motivation. Movies (shot in NYC by Universal Pictures) with no producers in DBpedia (v. 2015-04): dbr:Legal_Eagles, dbr:Wanderlust, dbr:Barney’s_Version_(film), dbr:Non_Stop_(film), dbr:The_Wolf_of_Wall_Street_(2013_film), dbr:Broadway_Love, dbr:Trainwreck_(film). Annotation: Leonardo DiCaprio is a producer. All images licensed under Fair use via Wikipedia.
  • 7. Problem Definition. Given an RDF data set D and a SPARQL query Q against D, consider D* the virtual data set that contains all the data that should be in D.
      P1) Identifying portions of Q that yield missing values.
      P2) Resolving missing values.
      Example: [[(?movie, dbp:producer, ?producer)]]_D ⊂ [[(?movie, dbp:producer, ?producer)]]_D*
      For µ = {movie → dbr:The_Wolf_of_Wall_Street_(2013_film), producer → dbr:Leonardo_DiCaprio}:
      µ ∉ [[(?movie, dbp:producer, ?producer)]]_D (does not belong to DBpedia) ∧ µ ∈ [[(?movie, dbp:producer, ?producer)]]_D* (should belong to DBpedia).
  • 8. OUR APPROACH: HARE
  • 9. HARE.
      • A hybrid machine/human SPARQL query engine that is able to enhance the size of query answers.
      • Based on a novel RDF completeness model, HARE implements query optimization and execution techniques for P1) identifying portions of queries that yield missing values.
      • HARE resorts to microtask crowdsourcing for P2) resolving missing values.
  • 10. HARE Architecture. [Diagram] Input: SPARQL query Q and threshold τ. Components: Query Optimizer, Query Engine, RDF Completeness Model, Crowd Knowledge (CKB+, CKB–, CKB~), Microtask Manager. External: RDF data set (LOD Cloud), Crowd. Edge labels: query plan; crowdsourcing triple patterns; tasks; human input; aggregated human input; bindings from the crowd; RDF data. Output: results for Q.
  • 11. HARE Architecture. [Same diagram as slide 10.]
  • 12. RDF Completeness Model (1). Movies have producers (e.g. dbr:The_Interpreter). [Example graph] dbr:The_Interpreter, dbr:Tower_Heist and dbr:Bad_Hair (among others) are of rdf:type schema.org:Movie. dbr:The_Interpreter has three dbp:producer values: dbr:Eric_Fellner, dbr:Tim_Bevan and dbr:Kevin_Misher. For other movies, such as dbr:Bad_Hair, the dbp:producer value is shown as unknown ("?").
  • 13. RDF Completeness Model (2). ① Predicate multiplicity of an RDF resource: the number of different objects that a resource has for a certain predicate. Example: M_D(dbr:The_Interpreter | dbp:producer) = 3, because dbr:The_Interpreter has the dbp:producer values dbr:Eric_Fellner, dbr:Tim_Bevan and dbr:Kevin_Misher.
  • 14. RDF Completeness Model (3). ② Aggregated predicate multiplicity of a class: for a given predicate, the median number of distinct objects over all the resources that belong to the class. Example: M_D(dbr:The_Interpreter | dbp:producer) = 3 and M_D(dbr:Legal_Eagles | dbp:producer) = 2; AM_D(schema.org:Movie | dbp:producer) = 3.
  • 15. RDF Completeness Model (4). ③ Completeness of an RDF resource (with respect to a predicate): given a predicate, the completeness of an RDF resource is determined by the aggregated predicate multiplicity of the classes that it belongs to. The numerator is computed in ① and the denominator in ②.
      Comp_D(dbr:The_Interpreter | dbp:producer) = 3/3
      Comp_D(dbr:Legal_Eagles | dbp:producer) = 2/3
      Comp_D(dbr:Bad_Hair | dbp:producer) = 0/3
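The three definitions in slides 13 to 15 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not HARE's actual code: the function names, the in-memory list of triples, the resources `ex:Movie_A` and `ex:Movie_B` and every `ex:` producer are hypothetical, each resource is assumed to belong to a single class, and returning 1.0 when the aggregated multiplicity is 0 is an assumption as well.

```python
from statistics import median

RDF_TYPE = "rdf:type"

def multiplicity(triples, subject, predicate):
    """(1) Number of distinct objects that `subject` has for `predicate`."""
    return len({o for s, p, o in triples if s == subject and p == predicate})

def aggregated_multiplicity(triples, cls, predicate):
    """(2) Median multiplicity for `predicate` over all resources typed as `cls`."""
    members = sorted({s for s, p, o in triples if p == RDF_TYPE and o == cls})
    if not members:
        return 0
    return median([multiplicity(triples, s, predicate) for s in members])

def completeness(triples, subject, cls, predicate):
    """(3) Multiplicity of the resource divided by the class-level aggregate."""
    am = aggregated_multiplicity(triples, cls, predicate)
    if am == 0:
        return 1.0  # assumption: nothing is expected, so nothing is missing
    return multiplicity(triples, subject, predicate) / am

def movie(name, producers):
    """Helper (hypothetical): emit the type triple and producer triples of a movie."""
    yield (name, RDF_TYPE, "schema.org:Movie")
    for producer in producers:
        yield (name, "dbp:producer", producer)

# Toy data set. Only the multiplicities 3, 2 and 0 come from the slides;
# the ex: resources are made up so that the median over the class is 3.
triples = [
    *movie("dbr:The_Interpreter",
           ["dbr:Eric_Fellner", "dbr:Tim_Bevan", "dbr:Kevin_Misher"]),
    *movie("dbr:Legal_Eagles", ["ex:P1", "ex:P2"]),
    *movie("dbr:Bad_Hair", []),
    *movie("ex:Movie_A", ["ex:P3", "ex:P4", "ex:P5"]),
    *movie("ex:Movie_B", ["ex:P6", "ex:P7", "ex:P8"]),
]

for m in ("dbr:The_Interpreter", "dbr:Legal_Eagles", "dbr:Bad_Hair"):
    comp = completeness(triples, m, "schema.org:Movie", "dbp:producer")
    print(m, round(comp, 2))
```

On this toy data the three movies from the slides get completeness 3/3, 2/3 and 0/3, matching slide 15.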
  • 16. HARE Architecture. [Same diagram as slide 10.]
  • 17. Crowd Knowledge.
      • The knowledge collected from the crowd is captured in three knowledge bases: CKB = (CKB+, CKB–, CKB~).
      • CKB+, CKB–, CKB~ are fuzzy sets over RDF data composed of 4-tuples of the form (subject, predicate, object, membership_degree), where (subject, predicate, object) is an RDF triple.
  • 18. Crowd Knowledge: Types of Crowd Knowledge Bases.
      CKB+: “Brian Grazer is a producer of Tower Heist.” (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
      CKB–: “Tower Heist does not have a producer.” (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
      CKB~: “I am not sure if Bad Hair has a producer.” (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
  • 19. Crowd Knowledge: Types of Crowd Knowledge Bases. [Same examples as slide 18, annotated with "Contradiction" and "Uncertainty".]
      CKB+: “Brian Grazer is a producer of Tower Heist.” (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
      CKB–: “Tower Heist does not have a producer.” (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
      CKB~: “I am not sure if Bad Hair has a producer.” (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
  • 20. Crowd Knowledge: Measuring Contradiction.
      • Contradiction occurs when triples with the same subject and predicate belong to CKB+ and CKB–.
      • It is measured as follows (formula not preserved in the transcript; worked example below).
      • Contradiction values close to 0.0 indicate high consensus.
      CKB+: (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
      CKB–: (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
      Contradiction(dbr:Tower_Heist | dbp:producer) = 1 – |0.9 – 0.05| = 0.15
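The worked example in slide 20 suggests the shape 1 – |m+ – m–|, where m+ and m– are the membership degrees found in CKB+ and CKB–. The sketch below follows that shape; it is not taken from HARE's code. The function name and the list-of-tuples representation are hypothetical, averaging several degrees per knowledge base is an assumption, and so is returning 0.0 when one of the two knowledge bases has no matching entry.

```python
from statistics import mean

def contradiction(ckb_pos, ckb_neg, subject, predicate):
    """Contradiction between CKB+ and CKB- for a (subject, predicate) pair."""
    pos = [m for s, p, _o, m in ckb_pos if (s, p) == (subject, predicate)]
    neg = [m for s, p, _o, m in ckb_neg if (s, p) == (subject, predicate)]
    if not pos or not neg:
        return 0.0  # assumption: no opposing statements, no contradiction
    # Assumption: several degrees per knowledge base are averaged first.
    return 1.0 - abs(mean(pos) - mean(neg))

# Entries from the slide.
ckb_pos = [("dbr:Tower_Heist", "dbp:producer", "dbr:Brian_Grazer", 0.9)]
ckb_neg = [("dbr:Tower_Heist", "dbp:producer", "_:o1", 0.05)]

value = contradiction(ckb_pos, ckb_neg, "dbr:Tower_Heist", "dbp:producer")
print(round(value, 2))
```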
  • 21. Crowd Knowledge: Measuring Uncertainty.
      • When a triple belongs to CKB~, the value of the triple object is unknown or uncertain.
      • Uncertainty is measured as follows (formula not preserved in the transcript; worked example below).
      • Uncertainty values close to 1.0 indicate that the crowd has shown itself to be unknowledgeable about the fact being vetted.
      CKB~: (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
      Uncertainty(dbr:Bad_Hair | dbp:producer) = avg({0.78}) = 0.78
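Slide 21 computes uncertainty as the average membership degree of the matching CKB~ entries. A minimal Python sketch of that average follows; the function name and the tuple layout are hypothetical. Returning 0.0 when CKB~ holds no matching entry agrees with the value Uncertainty(dbr:Tower_Heist | dbp:producer) = 0.0 used later in slide 27.

```python
def uncertainty(ckb_unknown, subject, predicate):
    """Average membership degree of the CKB~ entries for (subject, predicate)."""
    degrees = [m for s, p, _o, m in ckb_unknown if (s, p) == (subject, predicate)]
    if not degrees:
        return 0.0  # no "I am not sure" answers recorded for this pair
    return sum(degrees) / len(degrees)

# Entry from the slide.
ckb_unknown = [("dbr:Bad_Hair", "dbp:producer", "_:o2", 0.78)]

print(uncertainty(ckb_unknown, "dbr:Bad_Hair", "dbp:producer"))
print(uncertainty(ckb_unknown, "dbr:Tower_Heist", "dbp:producer"))
```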
  • 22. HARE Architecture. [Same diagram as slide 10.]
  • 23. Query Optimizer (1). Heuristic-based optimizer that decomposes the basic graph patterns (BGPs) of a SPARQL query into two subsets:
      – SQ_D: triple patterns executed against the data set D,
      – SQ_CROWD: triple patterns to be crowdsourced.
  • 24. Query Optimizer (2). Given a SPARQL query Q:
      – Triple patterns in Q with variables in both the subject position and the object position are added to SQ_CROWD.
      – The rest of the triple patterns in Q are added to SQ_D.
      SELECT DISTINCT ?movie WHERE {
        ?movie rdf:type schema.org:Movie .                            (t1, SQ_D)
        ?movie dbp:producer ?producer .                               (t2, SQ_CROWD)
        ?movie dct:subject dbc:Universal_Pictures_film .              (t3, SQ_D)
        ?movie dct:subject dbc:Films_shot_in_New_York_City .          (t4, SQ_D)
      }
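The decomposition rule in slide 24 is simple enough to sketch directly. The Python below is an illustration only: triple patterns are modeled as plain string tuples, a term is treated as a variable when it starts with `?`, and the names `is_variable` and `decompose` are hypothetical.

```python
def is_variable(term):
    """A term is a SPARQL variable if it starts with '?' (simplified check)."""
    return term.startswith("?")

def decompose(bgp):
    """Split a basic graph pattern into (SQ_D, SQ_CROWD).

    Patterns whose subject AND object are variables go to SQ_CROWD;
    every other pattern goes to SQ_D.
    """
    sq_d, sq_crowd = [], []
    for pattern in bgp:
        subject, _predicate, obj = pattern
        if is_variable(subject) and is_variable(obj):
            sq_crowd.append(pattern)
        else:
            sq_d.append(pattern)
    return sq_d, sq_crowd

# The running example query, t1 to t4.
bgp = [
    ("?movie", "rdf:type", "schema.org:Movie"),
    ("?movie", "dbp:producer", "?producer"),
    ("?movie", "dct:subject", "dbc:Universal_Pictures_film"),
    ("?movie", "dct:subject", "dbc:Films_shot_in_New_York_City"),
]

sq_d, sq_crowd = decompose(bgp)
print(len(sq_d), "patterns in SQ_D;", len(sq_crowd), "in SQ_CROWD")
```

Only t2 has variables in both positions, so it is the single pattern routed to the crowd.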
  • 25. Query Optimizer (3).
      • The optimizer builds a query plan T_Q for query Q.
      • Triple patterns from SQ_D are grouped into star-shaped sub-queries in a bushy tree [Vidal et al.].
      • Triple patterns in SQ_CROWD are added to the plan T_Q in a left-linear fashion.
      Example plan: t1, t3 and t4 in SQ_D; t2 in SQ_CROWD.
  • 26. Query Engine (1).
      • Executes the query plan T_Q.
      • Sub-queries that are part of SQ_D are executed against the data set. For t1, t3 and t4: Ω = {{movie → dbr:Tower_Heist}, {movie → dbr:Legal_Eagles}, …}
      • For each mapping contained in Ω, the engine instantiates the triple patterns in SQ_CROWD.
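Instantiating a triple pattern with a mapping from Ω amounts to substituting the bound variables. A small Python sketch, assuming mappings are dictionaries keyed by variable name without the leading `?` (as written in the slide) and using the hypothetical helper name `instantiate`:

```python
def instantiate(pattern, mapping):
    """Replace every variable of `pattern` that is bound in `mapping`."""
    return tuple(
        mapping.get(term[1:], term) if term.startswith("?") else term
        for term in pattern
    )

# First two mappings of the slide's example; the rest is elided there.
omega = [{"movie": "dbr:Tower_Heist"}, {"movie": "dbr:Legal_Eagles"}]
t2 = ("?movie", "dbp:producer", "?producer")

instantiated = [instantiate(t2, mu) for mu in omega]
for pattern in instantiated:
    print(pattern)
```

Unbound variables such as `?producer` are left in place; they are what the crowd is later asked to fill in.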
  • 27. Query Engine (2): Example of an Iteration.
      • The engine processes {movie → dbr:Tower_Heist}.
      • Following the running example:
      CKB+: (dbr:Tower_Heist, dbp:producer, dbr:Brian_Grazer, 0.9)
      CKB–: (dbr:Tower_Heist, dbp:producer, _:o1, 0.05)
      CKB~: (dbr:Bad_Hair, dbp:producer, _:o2, 0.78)
      Comp(dbr:Tower_Heist | dbp:producer) = 1/3 = 0.33
      Contradiction(dbr:Tower_Heist | dbp:producer) = 0.15
      Uncertainty(dbr:Tower_Heist | dbp:producer) = 0.0
  • 28. Query Engine (3): Example of an Iteration.
      • The algorithm computes the probability of crowdsourcing the triple pattern (dbr:Tower_Heist, dbp:producer, ?producer):
        P(CROWD | μ(s), p) = α (1 – 0.33) + (1 – α) min{0.15, 1 – 0.0} = 0.41
        The first term is the estimated incompleteness; the second is the crowd reliability.
      • α is a score weight between 0.0 and 1.0 (0.5 in the example).
      • If P(CROWD | μ(s), p) is greater than a user threshold τ, then the algorithm crowdsources the triple pattern (μ(s), p, o).
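The score in slide 28 combines the three measures from the earlier slides. The sketch below reproduces the arithmetic of the example; the function names are hypothetical, and the threshold τ = 0.25 is an arbitrary illustrative value, not one taken from the slides.

```python
def crowd_probability(comp, contradiction, uncertainty, alpha=0.5):
    """alpha * (1 - Comp) + (1 - alpha) * min(Contradiction, 1 - Uncertainty)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0.0 and 1.0")
    estimated_incompleteness = 1.0 - comp
    crowd_reliability = min(contradiction, 1.0 - uncertainty)
    return alpha * estimated_incompleteness + (1.0 - alpha) * crowd_reliability

def should_crowdsource(probability, tau):
    """Crowdsource the triple pattern only if the score exceeds the threshold."""
    return probability > tau

# Values for dbr:Tower_Heist from slide 27, with alpha = 0.5.
p = crowd_probability(comp=1 / 3, contradiction=0.15, uncertainty=0.0, alpha=0.5)
print(round(p, 2))                      # rounds to 0.41, as in the slide
print(should_crowdsource(p, tau=0.25))  # tau = 0.25 is a made-up threshold
```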
  • 29. Query Engine (4).
      • The engine combines mappings obtained from the data set D and mappings from the crowd stored in CKB+.
      • The query evaluation terminates when all the sub-queries are executed.
      Theorem 1: The HARE query engine does not increase the time complexity of executing a SPARQL query.
  • 30. HARE Architecture. [Same diagram as slide 10.]
  • 31. Microtask Manager (1).
      • Receives triple patterns to crowdsource, for example: (dbr:Tower_Heist, dbp:producer, ?p)
      • Creates human tasks.
      • Submits tasks to the crowdsourcing platform.
  • 32. Microtask Manager (2). HARE exploits the semantics encoded in RDF resources. [Screenshot of a task] Properties annotated on the task elements: rdfs:label of dbr:Tower_Heist, rdfs:label of dbp:producer, foaf:depiction of dbr:Tower_Heist, dbo:abstract of dbr:Tower_Heist, foaf:primaryTopic of dbr:Tower_Heist.
  • 33. Microtask Manager (3). [Screenshot of a task, annotated with CKB+, CKB– and CKB~.]
  • 34. EXPERIMENTAL STUDY
  • 35. Experimental Set-Up.
      • Benchmark: 50 queries against DBpedia (v. 2014).
        – Ten queries in each of five knowledge domains: History, Life Sciences, Movies, Music, and Sports.
      • Implementation details:
        – HARE is implemented in Python 2.7.6.
        – CrowdFlower is used as the crowdsourcing platform.
      • Crowdsourcing configuration:
        – Four different RDF triples per task, 0.07 US$ per task.
        – At least three judgments were collected per task.
      • Total RDF triple patterns crowdsourced: 502
      • Total answers collected from the crowd: 1,609
  • 36. Results: Size of Query Answer (1). Metric: number of answers when queries are executed. [Bar charts, one per domain (Sports, Music, Life Sciences): number of answers per query Q1–Q10, split into crowd answers and data set answers. Panel annotations, in extraction order: 1.25–2.00, 1.50–2.00, 1.08–1.92.] HARE identifies sub-queries that produce incomplete answers. Crowdsourcing is a feasible solution to resolve missing values.
  • 37. Results: Size of Query Answer (2). Metric: number of answers when queries are executed. [Bar charts, one per domain (Movies, History): number of answers per query Q1–Q10, split into crowd answers and data set answers. Panel annotations, in extraction order: 1.05–3.13, 1.10–1.89.] HARE identifies sub-queries that produce incomplete answers. Crowdsourcing is a feasible solution to resolve missing values.
  • 38. Results: Crowd Response Time (1). Metric: elapsed time from the first task until the last answer is retrieved. [Line charts, one per domain (Sports, Music, Life Sciences): judgments completed (%) over time (min) for Q1–Q10. Annotations at the 12th minute, in extraction order: 77%, 82%, 97%.] At the 12th minute after the first task is submitted, the crowd has produced at least 75% of the answers.
  • 39. Results: Crowd Response Time (2). Metric: Elapsed time from the first task until the last answer is retrieved. [Figure: judgments completed (%) versus time (min) for queries Q1–Q10, one panel each for Movies and History; panel annotations at the 12th minute: 98%, 75%.] At the 12th minute after the first task is submitted, the crowd has produced at least 75% of the answers.
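The response-time metric on slides 38 and 39 can be illustrated with a minimal sketch. It assumes each judgment carries a completion time in minutes, measured from the submission of the first task; the function names and the sample times are hypothetical and are not taken from the HARE evaluation.

```python
def percent_completed(completion_minutes, t):
    """Percentage of judgments completed within t minutes of the first task."""
    if not completion_minutes:
        return 0.0
    done = sum(1 for m in completion_minutes if m <= t)
    return 100.0 * done / len(completion_minutes)


def elapsed_time(completion_minutes):
    """Elapsed time (min) from the first task until the last answer arrives."""
    return max(completion_minutes)


# Hypothetical completion times (minutes) for the judgments of one query.
times = [1, 3, 4, 7, 9, 11, 12, 20]

share_at_12 = percent_completed(times, 12)   # 7 of 8 judgments are done by minute 12
total_minutes = elapsed_time(times)
```

Evaluating `percent_completed` over a range of `t` values gives the kind of curve plotted per query in the figures.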
  • 40. Results: Quality of Crowd Answers. Metric: A true positive is a mapping that belongs to the query answer. Two tables, captioned "Recall" and "Precision" on the slide:

      Query  Sports  Music  Life Sciences  Movies  History
      Q1     1.00    1.00   0.67           0.88    1.00
      Q2     1.00    1.00   1.00           0.96    1.00
      Q3     1.00    1.00   0.89           0.79    0.67
      Q4     0.55    0.67   1.00           1.00    0.96
      Q5     0.86    0.67   1.00           1.00    0.95
      Q6     0.69    0.83   1.00           1.00    0.96
      Q7     1.00    0.63   0.71           1.00    0.57
      Q8     1.00    0.67   0.88           0.94    0.72
      Q9     0.46    0.73   1.00           1.00    0.64
      Q10    0.92    0.49   1.00           1.00    0.95
      Avg    0.85    0.77   0.91           0.96    0.84

      Query  Sports  Music  Life Sciences  Movies  History
      Q1     1.00    1.00   1.00           0.47    1.00
      Q2     1.00    0.29   1.00           1.00    1.00
      Q3     1.00    1.00   1.00           1.00    1.00
      Q4     0.83    1.00   1.00           1.00    1.00
      Q5     1.00    0.86   1.00           1.00    1.00
      Q6     1.00    1.00   1.00           1.00    0.96
      Q7     1.00    1.00   1.00           1.00    0.84
      Q8     1.00    1.00   1.00           1.00    0.78
      Q9     1.00    1.00   1.00           1.00    0.92
      Q10    1.00    1.00   1.00           1.00    0.98
      Avg    0.98    0.91   1.00           0.95    0.95

    The crowd exhibits heterogeneous performance within domains. This supports the importance of HARE's triple-based approach.
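As a minimal sketch of how precision and recall could be computed under the slide's definition of a true positive (a mapping that belongs to the query answer): the gold-standard set, the function name, and the sample mappings below are assumptions for illustration, not data from the HARE experiments.

```python
def precision_recall(crowd_mappings, gold_mappings):
    """Precision and recall of crowd mappings against a gold-standard answer.

    A true positive is a crowd mapping that also belongs to the gold answer.
    """
    crowd = set(crowd_mappings)
    gold = set(gold_mappings)
    true_positives = len(crowd & gold)
    precision = true_positives / len(crowd) if crowd else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical mappings of (?movie, ?producer) pairs.
crowd = [("movie1", "producerA"), ("movie2", "producerB"), ("movie3", "producerX")]
gold = [("movie1", "producerA"), ("movie2", "producerB"),
        ("movie4", "producerC"), ("movie5", "producerD")]

precision, recall = precision_recall(crowd, gold)  # 2 true positives out of 3 crowd and 4 gold mappings
```

Treating mappings as set elements means duplicate crowd answers for the same mapping count once, which is one possible design choice among several.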
  • 41. RELATED WORK
  • 42. Summary of Related Work: Human/computer query processing architectures. Crowdsourcing by manual specification: CrowdDB [Franklin et al.]: tables, columns; Deco [Park and Widom]: rules; Qurk [Marcus et al.]: microtask I/O. Crowdsourcing decided automatically: HARE, which relies on the RDF graph and crowd knowledge to resort to crowdsourcing.
  • 43. Summary of Related Work: Crowdsourcing in other contexts of Data Management (SPARQL- or RDF-based). HARE (SPARQL query processing): resorts to crowdsourcing to complete missing values in RDF data sets. OASSIS [Amsterdamer et al.] (recommendation system): mines crowdsourced patterns specified in a SPARQL-like language. KATARA [Chu et al.] (tabular data cleansing): compares tabular data against RDF data sets via crowdsourced mappings.
  • 44. CONCLUSIONS & FUTURE WORK
  • 45. Conclusions
      •  HARE: Hybrid query engine against RDF data sets.
      •  Supports microtasks to enhance query answers on-the-fly.
      •  Experimental results confirmed that: Size of query answer: 3.13 times. Crowd response time: (12th min.): 98%. Accuracy: 0.84 – 0.96.
    Future work
      •  Study further approaches to capture crowd reliability.
      •  Consider other quality dimensions on the knowledge collected from the crowd.
  • 46. References
      •  [Amsterdamer et al.] Y. Amsterdamer, S. B. Davidson, T. Milo, S. Novgorodov, and A. Somech. OASSIS: query driven crowd mining. In SIGMOD, pages 589–600, 2014.
      •  [Chu et al.] X. Chu, J. Morcos, I. F. Ilyas, M. Ouzzani, P. Papotti, N. Tang, and Y. Ye. Katara: A data cleaning system powered by knowledge bases and crowdsourcing. In SIGMOD, pages 1247–1261, 2015.
      •  [Marcus et al.] A. Marcus, D. R. Karger, S. Madden, R. Miller, and S. Oh. Counting with the crowd. PVLDB, 6(2):109–120, 2012.
      •  [Park and Widom] H. Park and J. Widom. Query optimization over crowdsourced data. PVLDB, 6(10):781–792, 2013.
      •  [Vidal et al.] M.E. Vidal, E. Ruckhaus, T. Lampo, A. Martínez, J. Sierra, and A. Polleres. Efficiently joining group patterns in SPARQL queries. In ESWC, pages 228–242, 2010.
  • 47. HARE: A Hybrid SPARQL Engine to Enhance Query Answers via Crowdsourcing. Maribel Acosta, Elena Simperl, Fabian Flöck, Maria-Esther Vidal. [Figure: HARE architecture. Input: SPARQL query Q, τ; RDF data set (LOD Cloud). Components: RDF Completeness Model, Query Optimizer (query plan), Query Engine (RDF data, crowdsourcing triple patterns, bindings from the crowd), Microtask Manager (tasks, human input, aggregated human input), Crowd, Crowd Knowledge (CKB+, CKB-, CKB~). Output: results for Q.]