Summary Models for Routing Keywords to Linked Data Sources
1. Summary Models for Routing Keywords to Linked Data Sources
Thanh Tran, Lei Zhang, Rudi Studer
AIFB Institute, KIT
Thanh Tran, AIFB Institute, KIT, ducthanh.tran@kit.edu KIT – University of the State of Baden-Wuerttemberg and
1 National Laboratory of the Helmholtz Association
2. Agenda
Introduction
Opportunities & challenges
Contributions
Problem Definition
LOD Data
Keyword Query Answer
Keyword Query Routing
Summary Models
Keyword sets
Element-level vs. schema-level vs.
source-level Summary
Validity of Results vs. complexity
Theo. / Exp. Results
Conclusions
3. Semantic Data
- 203 linked datasets serve 25 billion RDF triples interconnected by 395 million links
- As of 09-2010 + other data (e.g. LON, ontologies, RDFa) + increasing rapidly...
4. Opportunities
“Articles from awarded researchers at Stanford”
Freebase contains data about people
DBPedia contains information about awards
DBLP contains bibliographic data
More complex information needs
More precise results
More integrated results
5. Problems
“Articles from awarded researchers at Stanford”
Large number of unknown & irrelevant sources!
What is in there?
What is relevant?
USABILITY: Formulating queries is a hard task!
• Which data sources?
• Which schema elements?
SCALABILITY: Processing queries is expensive!
• Process against all data sources?
$(z).\ \exists x, y.\ prizes(x, \text{Turing Award}) \land worksAt(x, y) \land name(y, \text{Stanford}) \land publication(x, z)$
6. Keyword Query Routing
Given the needs expressed as sets of keywords,
are there “corresponding answers” in linked data?
and what combination of data sources can be used to
produce them?
Identify valid combinations of sources using keywords
Let user choose combination of sources
Present schema elements for the user to formulate query
Process only relevant combinations of sources
7. Contributions
Introduce the novel problem of keyword query routing
Propose the multi-level relationship graph to capture its
search space.
Introduce various summary models, which aim to
compactly represent the search space.
Investigate the resulting trade-offs between result quality
and efficiency through theoretical analysis and practical
experiments using publicly available linked data sources.
8. Agenda
Introduction
Opportunities & challenges
Contributions
Problem Definition
LOD Data
Keyword Query Answer
Keyword Query Routing
Summary Models
Keyword sets
Element-level vs. schema-level vs.
source-level Summary
Validity of Results vs. complexity
Theo. / Exp. Results
Conclusions
9. LOD Element-level Graph
Web data modeled as a set of interlinked data graphs
Each data graph represents a source
Element-level graph vs. schema-level graph vs. source-level graph
[Figure: element-level graphs of Freebase, DBLP and DBPedia. Data elements (uni1, per1 to per4, pub1 to pub3, prize1, prize2) are connected by employ, author, prizes and sameAs edges and carry name, title and label values such as "Stanford University", "John McCarthy" and "Turing Award".]
10. LOD Schema-level Graph
Web data modeled as a set of interlinked data graphs
Each data graph represents a source
Element-level graph vs. schema-level graph vs. source-level graph
[Figure: schema-level graphs of Freebase, DBLP and DBPedia with the classes University, Person, Article, Written Work, Author and Prize, connected by employ, author, prizes and sameAs edges.]
11. LOD Source-level Graph
Web data modeled as a set of interlinked data graphs
Each data graph represents a source
Element-level graph vs. schema-level graph vs. source-level graph
[Figure: source-level graph with the nodes Freebase, DBLP and DBPedia and the edge labels author and sameAs.]
12. “Corresponding” Answers
User information need: "stanford article award"
[Figure: element-level graphs of Freebase, DBLP and DBPedia for the running example, including the class Article (via type) and elements labelled "Stanford University", "John McCarthy" and "Turing Award".]
13. Problem Definition
A keyword query result (also called Steiner graph) is a
subgraph of the union of the data- and schema-level graphs
that contains a matching element for every keyword, such that
these elements are pairwise connected over a path.
A d-max Steiner graph is a Steiner graph in which the paths
between keyword elements have length d-max or less.
Keyword query routing: compute valid sets of data sources,
each called a keyword routing plan. A plan is valid if its sources
produce non-empty keyword query results.
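The definitions above can be made concrete with a small brute-force sketch. It is only an illustration under stated assumptions, not the algorithm from the paper: the toy graph, the `MATCHES` table and the function names are hypothetical, edges are treated as undirected, and only shortest paths between keyword elements are considered when collecting the sources of a plan.

```python
from collections import deque
from itertools import combinations, product

# Toy data loosely modeled on the running example (assumed, not from the slides).
EDGES = [
    ("uni1", "per2"),    # employ  (inside Freebase)
    ("per2", "per1"),    # sameAs  (Freebase to DBLP)
    ("pub1", "per1"),    # author  (inside DBLP)
    ("per1", "per3"),    # sameAs  (DBLP to DBPedia)
    ("per3", "prize1"),  # prizes  (inside DBPedia)
]
SOURCE = {"uni1": "Freebase", "per2": "Freebase", "pub1": "DBLP",
          "per1": "DBLP", "per3": "DBPedia", "prize1": "DBPedia"}
MATCHES = {"stanford": {"uni1"}, "article": {"pub1"}, "award": {"prize1"}}


def neighbours(edges):
    """Build an undirected adjacency map from an edge list."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def shortest_path(adj, start, goal):
    """Breadth-first search; returns the node list of a shortest path or None."""
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in adj.get(node, ()):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None


def routing_plans(edges, source, matches, keywords, d_max):
    """Return the source sets of all d-max Steiner graphs found by brute force."""
    adj = neighbours(edges)
    plans = set()
    # Try every way of picking one matching element per keyword.
    for elems in product(*(sorted(matches.get(k, ())) for k in keywords)):
        used = set(elems)
        ok = True
        for a, b in combinations(elems, 2):
            path = shortest_path(adj, a, b)
            if path is None or len(path) - 1 > d_max:
                ok = False
                break
            used.update(path)
        if ok:
            plans.add(frozenset(source[n] for n in used))
    return plans
```

With this toy graph, the query "stanford article award" yields the single plan {Freebase, DBLP, DBPedia} for d-max = 4 and no plan for d-max = 3, because the path from uni1 to prize1 has length 4.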
14. A Valid Keyword Routing Plan
User information need: "stanford article award"
[Figure: the element-level graphs of Freebase, DBLP and DBPedia from the running example, illustrating a valid keyword routing plan for the query.]
15. The Search Space
Multi-level inter-relationship graphs capture the entire search space
Relationships between elements
and between different levels
Search space is too large!
Naïve solution not applicable: apply existing approaches to
keyword search for computing Steiner graphs
Steiner graphs might span several linked sources
Search space grows exponentially with the number of
sources and their associated links
16. Agenda
Introduction
Opportunities & challenges
Contributions
Problem Definition
LOD Data
Keyword Query Answer
Keyword Query Routing
Summary Models
Keyword sets
Element-level vs. schema-level vs.
source-level KERG
Validity of Results vs. complexity
Theo. / Exp. Results
Conclusions
17. Keyword Sets
One keyword set for every data source
Elements stand for distinct keywords mentioned in a source
[Figure: the element-level graphs of Freebase, DBLP and DBPedia, each annotated with its keyword set (keywords such as Stanford, University, John, McCarthy, Smith, Music, Turing, Award).]
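A minimal sketch of the keyword-set (KS) baseline follows, assuming each source is described simply by the labels it contains. The sample labels and the function names are hypothetical. Routing plans are formed by picking, for every query keyword, one source that covers it; no relationships between keywords are checked.

```python
from itertools import product

# Hypothetical labels per source, loosely following the running example.
LABELS = {
    "Freebase": ["Stanford University", "John McCarthy"],
    "DBLP": ["John McCarthy", "John Smith", "Article"],
    "DBPedia": ["John McCarthy", "Turing Award", "Music Award"],
}


def build_keyword_sets(labels_by_source):
    """One keyword set per source: the distinct lower-cased tokens it mentions."""
    return {
        src: {tok.lower() for label in labels for tok in label.split()}
        for src, labels in labels_by_source.items()
    }


def ks_routing_plans(keyword_sets, keywords):
    """Combine, per keyword, one covering source; coverage is the only criterion."""
    covering = [
        [src for src, kws in sorted(keyword_sets.items()) if k in kws]
        for k in keywords
    ]
    # If some keyword is covered by no source, product() yields nothing.
    return {frozenset(choice) for choice in product(*covering)}
```

For the keywords "john" and "award", this sketch returns three plans ({DBPedia}, {DBLP, DBPedia} and {Freebase, DBPedia}), whether or not the matching elements are actually connected in the data.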
18. Element-level Keyword-Element Relationship Graph (E-KERG)
A keyword-element captures a keyword k and the data element mentioning k
A relationship between two keyword-elements exists iff there is a path between
their associated data elements
In d-max KERG, the paths to be considered have length d-max or less
[Figure: E-KERG for the running example over Freebase, DBLP and DBPedia. Each keyword-element pairs a keyword with a data element (for example Stanford with uni1, John with per4, Award with prize2), and keyword-elements are linked when their data elements are connected.]
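The construction of a d-max E-KERG can be sketched as follows. This is an illustrative reading of the definition above, not the paper's implementation: the adjacency map is assumed to be symmetric (undirected), the function names are hypothetical, and two keywords mentioned by the same element are treated as related through a path of length 0.

```python
from collections import deque
from itertools import combinations


def within_distance(adj, start, d_max):
    """Return {node: hop distance} for all nodes at most d_max hops from start."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == d_max:
            continue  # do not expand beyond the distance bound
        for nxt in adj.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def build_e_kerg(adj, mentions, d_max):
    """mentions maps a data element to the keywords it mentions.

    Returns (nodes, edges): nodes are (keyword, element) pairs, and an edge
    joins two nodes whose elements are at most d_max hops apart.
    """
    nodes = {(k, n) for n, kws in mentions.items() for k in kws}
    reach = {n: within_distance(adj, n, d_max) for n in mentions}
    edges = set()
    for a, b in combinations(sorted(nodes), 2):
        if b[1] in reach[a[1]]:
            edges.add((a, b))
    return nodes, edges
```

Because every pair of keyword-elements is tested, the number of stored relationships can grow quadratically in the number of keyword-elements, which is one way to see why the slides later summarize at schema and source level.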
19. Schema-level Keyword-Element Relationship Graph (S-KERG)
A keyword-element captures a keyword k and the schema element which contains
some instances (data elements) mentioning k
A relationship between two keyword-elements exists if there is a path between some
instances of their associated schema elements
Groups elements (relationships) when they capture the same pair of keywords in the
same class (same keyword relationships between the same pair of classes)
[Figure: S-KERG for the running example over Freebase, DBLP and DBPedia. Keyword-elements pair a keyword with a class (University, Person, Author, Article, Prize) rather than with an individual data element.]
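The grouping step can be sketched as a simple collapse of element-level relationships onto classes. The function name, the edge representation and the count kept per summary edge are assumptions made for illustration; the slides only state that relationships with the same keywords between the same classes are grouped.

```python
def summarize_by_class(e_edges, class_of):
    """Collapse E-KERG relationships onto classes.

    e_edges: iterable of ((keyword, element), (keyword, element)) pairs.
    class_of: maps each data element to its class.
    Returns {summary edge: number of element-level relationships it groups}.
    """
    summary = {}
    for (k1, n1), (k2, n2) in e_edges:
        # Sort the two endpoints so that the edge is direction-independent.
        key = tuple(sorted([(k1, class_of[n1]), (k2, class_of[n2])]))
        summary[key] = summary.get(key, 0) + 1
    return summary
```

In the hypothetical example below, three element-level relationships shrink to two schema-level ones, since per3 and per4 are both instances of Person:

```python
e_edges = [
    (("john", "per1"), ("award", "prize1")),
    (("john", "per3"), ("award", "prize1")),
    (("john", "per4"), ("award", "prize2")),
]
class_of = {"per1": "Author", "per3": "Person", "per4": "Person",
            "prize1": "Prize", "prize2": "Prize"}
summary = summarize_by_class(e_edges, class_of)
```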
20. Data-Source-level Keyword-Element Relationship Graph (D-KERG)
A keyword-element captures a keyword k and the source which contains some
instances (data elements) mentioning k
A relationship between two keyword-elements exists if there is a path between some
instances of their associated sources
Groups elements (relationships) when they capture the same pair of keywords in the
same source (same keyword relationships between the same pair of sources)
[Figure: D-KERG for the running example. Keyword-elements pair a keyword with a source (Freebase, DBLP, DBPedia) rather than with a class or an individual data element.]
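A sketch of how routing plans might be read off a D-KERG is given below. The function names and data are hypothetical and the plan derivation is a simplified brute-force reading of the definition: one keyword-source node is chosen per keyword, and a plan is emitted when all chosen nodes are pairwise related in the summary.

```python
from itertools import combinations, product


def build_d_kerg(e_edges, source_of):
    """Collapse element-level relationships onto (keyword, source) nodes."""
    d_edges = set()
    for (k1, n1), (k2, n2) in e_edges:
        d_edges.add(frozenset([(k1, source_of[n1]), (k2, source_of[n2])]))
    return d_edges


def d_kerg_plans(d_edges, keywords):
    """Source sets whose keyword-source nodes are pairwise related in the D-KERG."""
    sources_for = {k: set() for k in keywords}
    for edge in d_edges:
        for k, src in edge:
            if k in sources_for:
                sources_for[k].add(src)
    plans = set()
    for choice in product(*(sorted(sources_for[k]) for k in keywords)):
        nodes = list(zip(keywords, choice))
        if all(frozenset([a, b]) in d_edges for a, b in combinations(nodes, 2)):
            plans.add(frozenset(choice))
    return plans
```

The example below is constructed so that each pair of keywords is connected through different elements. The summary still yields the plan {Freebase, DBLP, DBPedia}, although no single triple of elements is pairwise connected, which illustrates how a source-level summary can produce plans that are not valid.

```python
e_edges = [
    (("stanford", "uni1"), ("article", "pub1")),
    (("stanford", "uni2"), ("award", "prize1")),
    (("article", "pub2"), ("award", "prize2")),
]
source_of = {"uni1": "Freebase", "uni2": "Freebase", "pub1": "DBLP",
             "pub2": "DBLP", "prize1": "DBPedia", "prize2": "DBPedia"}
plans = d_kerg_plans(build_d_kerg(e_edges, source_of),
                     ["stanford", "article", "award"])
```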
21. Agenda
Introduction
Opportunities & challenges
Contributions
Problem Definition
LOD Data
Keyword Query Answer
Keyword Query Routing
Summary Models
Keyword sets
Element-level vs. schema-level vs.
source-level KERG
Validity of Results vs. complexity
Theo. / Exp. Results
Conclusions
22. Theoretical Results
When Steiner graphs can be found for K in the
data, there will be a keyword routing plan that
can be found in the KERG.
The keyword routing plans derived from the
summary are not necessarily valid, i.e., there might
be no corresponding Steiner graph in the data.
Detailed results, algorithms and complexity results are in
the paper!
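The two statements can be written compactly as follows. The notation is introduced here only for illustration and is not taken from the slides: $W$ is the Web graph, $SG(K, W)$ the set of Steiner graphs for the keyword query $K$ in $W$, and $RP_{sum}(K)$ the set of routing plans derived from a KERG summary. The symbol `\nRightarrow` requires the `amssymb` package.

```latex
\[
SG(K, W) \neq \emptyset \;\Longrightarrow\; RP_{sum}(K) \neq \emptyset,
\qquad
RP_{sum}(K) \neq \emptyset \;\nRightarrow\; SG(K, W) \neq \emptyset
\]
```

Read together, a summary never misses a query that has answers in the data, but a plan it returns may still turn out to have no answer.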
23. Experiments
Chunk of the BTC dataset containing 10M RDF
triples from 154 sources, linked via 500K mappings
30 manually crafted, valid multi-data-source keyword
queries, i.e., queries that produce non-empty keyword
answers and involve more than 2 sources
Examples: "Town River America", "Beijing Conference Database 2007"
24. Validity
P@k measures the percentage of plans that are valid out of the top-k plans
P@5 up to 100% for E-KERG (dmax = 4), P@5 for KS only 6%
More valid plans were computed when a higher value was used for dmax
dmax = 3 seems to be a good trade-off
Queries with a larger number of keywords resulted in lower precision
[Figure: two panels plotting P@5 (0.0 to 1.0) for E-KERG, S-KERG, D-KERG and KS; left panel against dmax (0 to 4), right panel against |K| (2 to 5).]
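The P@k measure can be computed as in the sketch below. The function name and the validity predicate are hypothetical. One assumption is made that the slides do not settle: when fewer than k plans are returned, the fraction is taken over the plans actually returned.

```python
def precision_at_k(ranked_plans, is_valid, k):
    """Fraction of the top-k ranked plans for which is_valid(plan) is true."""
    top = ranked_plans[:k]
    if not top:
        return 0.0  # no plans returned: treated as precision 0 (assumption)
    return sum(1 for plan in top if is_valid(plan)) / len(top)
```

For example, if two of the five highest-ranked plans are valid, P@5 is 0.4; averaging this value over all queries gives the kind of numbers reported on the slide.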
25. Performance
Times increased with higher values for dmax
Sharply for E-KERG and S-KERG
Relatively stable for D-KERG
Times increased with the number of keywords
All models except D-KERG performed poorly on complex queries
E-KERG needed more than 100s for queries with more than 2 keywords
Time for D-KERG was no more than 10ms on average
[Figure: two panels plotting query processing time in ms (logarithmic axis, 1 to 1,000,000) for S-KERG, D-KERG, KS and E-KERG; left panel against dmax (0 to 4), right panel against |K| (2 to 5).]
26. Conclusions
Keyword query routing helps users without knowledge of linked data
and schemas to find combinations of sources that contain answers
corresponding to their needs
Summarizing relationships is essential for dealing with the large-scale
linked data Web (E-KERG achieved poor performance, requiring more
than 100s for complex queries)
Summarizing at the level of sources (D-KERG) represents the most
practical trade-off, producing results in less than 10ms, of which
every second one was valid
However, validity is still low for complex queries (<30% with 4 keywords)
Baseline approaches for novel problem
Further improve validity and consider relevance!
Combine keyword query routing with source and structured query
processing to compute final results!
27. Thanks for Your Attention!
Institute AIFB, KIT
ducthanh.tran@kit.edu
Editor's notes
More complex information needs More precise results More integrated results
So far, these requirements have proven to be a large burden. Given that the amount of linked data is large and continuously evolving, it is inherently difficult to know what is in there (i.e., the data and the schema) and to formulate the corresponding structured queries for addressing some given information needs. Hence, it is desirable to have a mechanism that allows users to express information needs in their own words. Another aspect of dealing with the large Web of linked data is scalability. Processing the needs against the entire Web might be too time-consuming and unnecessary, especially when users are interested in and want to choose some particular sources of information. Processing against a relevant subset of linked data identified by the user is more scalable and possibly the only practical solution for the large Web of linked data.
(Rank combination of sources) (Automatically process relevant combination of sources)
Concerning these problems, the question we deal with is: given the needs expressed by users as sets of keywords, are there corresponding answers in linked data, and what combination of data sources shall be used to produce them? Further, the aim is not to directly compute results but to quickly identify, and let users and system focus on, the combinations of sources that produce non-empty results.
They recognized the fact that the computational complexity resulting from a large-scale setting can be partially addressed when allowing users to choose and retrieve answers from only some particular databases. Given a set of keywords, the goal is to find and rank the single most relevant databases that contain the answers. Following this line, we propose specific solutions for the linked data context. The differ-
This novel keyword query routing problem raises additional challenges. Most notably, query keywords may be covered by several linked sources, resulting in a large search space. The size of this search space grows exponentially with the number of sources and their associated links. Targeting this problem of scale, we report the following contributions in this paper:
- We propose solutions for keyword query routing which enable the exploitation of linked data. Without putting any burden on the users, this kind of approach helps to find relevant sources containing complex answers to ad-hoc information needs in the large and evolving Web of linked data.
- We propose a multi-level relationship graph to capture the search space of the keyword query routing problem. Based on this, we elaborate on a family of summary models, which compactly represent the Web of linked data. These models capture information at different levels, representing summaries of different granularities. In a theoretical analysis, we prove that finer-grained models can improve the result quality. This, however, comes at the expense of higher complexity. Thus, the models represent different trade-offs between effectiveness and efficiency.
- In the experiments, we investigate these trade-offs by analyzing the precision and the processing time needed using different models. The experiments were carried out in a real-world setting using more than 150 publicly available datasets, and an open-source implementation we made available at http://code.google.com/p/rdfstores/. Results of using summaries are promising. While the "best" one shall be determined w.r.t. a concrete application, there is one model that seems to represent the most practical trade-off: the D-KERG model, which summarizes elements according to sources, produces results in less than 10ms, out of which every second is a valid one.
Linked data can be conceived as a set of data graphs, each representing a particular source. As a working definition, we present a simple graph-based model of linked data called the Web graph. In that model, we distinguish between the Web data graph, representing relationships between individual data elements; the Web schema graph, which captures information about groups of elements; and the Web source graph, which contains information at the level of data sources. This is a simple model of linked data that omits details not necessary for this work. In particular, data elements may correspond to RDF resources, blank nodes or literals. Schema elements might stand for classes or data types. For keyword query routing, these distinctions are not relevant; what matters is that the elements can be recognized via their labels. While different kinds of links can be established, the ones frequently found are sameAs links, which denote that two RDF resources or two classes are the same. There is also no need to distinguish the types of links. Only the fact that sources can be reached via some kind of link $m \in M$ matters.
A valid plan in our example is RP = {Freebase, DBLP, DBPedia}. Note that validity does not imply relevance. That is, a valid plan ensures that results can be produced, but for the users, these results may differ in relevance. A proper account of relevance and the ranking of routing plans based on the relevance of their results go beyond the scope of this paper, which is focused on efficiency aspects of computing valid plans. We assume a fixed ranking function, which equally applies to all summaries discussed in this paper. We refer the interested readers to our report [8], which discusses relevance and the ranking function.
Does not consider RELEVANCE, focus on EFFICIENCY
- Keywords map against elements of the entire data web
- Routing simply based on coverage
- Consider further factors for data source identification, i.e. characteristics of the data, the data sources and links between them
- Keyword query routing: keyword routing in a truly distributed setting such that several data sources might be used to answer a set of keywords
Only the highly relevant data sources are selected to answer the user query
Elements stand for all the keywords that are mentioned in elements of the graphs $G$. Every $n^{KS}_k \in N^{KS}_K$ is in fact a tuple $(k, G_k)$ that represents a keyword $k$ and the graphs $G_k \subseteq G$ mentioning $k$.
As opposed to E-KERG, this one is indeed a summary model because it clusters two element-level relationships $(\langle k_i, n^K_i(n_i, g_i, K_i)\rangle, \langle k_j, n^K_j(n_j, g_j, K_j)\rangle)$ and $(\langle k_v, n^K_v(n_v, g_v, K_v)\rangle, \langle k_w, n^K_w(n_w, g_w, K_w)\rangle)$ to one schema-level relationship when they capture the same keyword relationships (i.e., $k_i = k_v$ and $k_j = k_w$) between the same classes (i.e., $n'_i = n'_v$ and $n'_j =$
Intuitively speaking, this procedure simply retrieves sources that cover the keywords and, in order to cover all |K| query keywords, it uses |K|-combinations of these sources as routing plans.
Valid plans (D-KERG) ≤ valid plans (S-KERG) ≤ valid plans (E-KERG)
All plans are valid for E-KERG when d-max (summary) ≥ d-max (Steiner graph)
This procedure is the same for all KERGs. Given that the underlying data contain results, we provide proofs in the report [8] to show that applying this procedure on the S-KERG summary will yield routing plans, i.e., when Steiner graphs can be found for K in the data, then there will be corresponding graphs that can be found in the summary. Thus, given K, the procedure will output a non-empty set of RP if W contains a result for K. In the same manner, it is straightforward to show that E-KERG and D-KERG can provide this guarantee. However, we show formally in [8] that the other way around is not true, i.e., the graphs derived from the summary are not necessarily valid, such that there might be no corresponding Steiner graph in the data. Thus, the fact that a routing plan can be derived from the summaries does not guarantee that there exists a result for K. This formal result is interesting because it makes clear that while the
In summary, the percentage of valid plans for D-KERG is less than or equal to that for S-KERG, which in turn is less than or equal to that for E-KERG. When the $d^{sum}_{max}$ value of E-KERG is sufficiently large to cover all paths relevant for Steiner graph computation, i.e., $d^{sum}_{max} = d^{data}_{max}$, this percentage is 100 for E-KERG. By chance, the percentage of valid plans for KS might be higher than that for the summary models, but in general it is expected to be lower (because relationships between elements are not considered).
Compared to the KERG models, KS does not capture relationships between keywords at all. Given two keywords $k_i, k_j$, the sources which cover these keywords can be derived from KS, e.g. the graphs $n''_i, n''_j$. However, this does not imply that there exist two elements $n_i \in n''_i$ and $n_j \in n''_j$ with $n_i \rightarrow n_j$. More generally, a combination of sources derived from KS covers all keywords but does not ensure that elements matching these keywords are connected, and thus does not necessarily correspond to a Steiner graph.
values represent the average computed for all 30 queries. Using E-KERG, precision was up to 100 percent, i.e., for $d^{sum}_{max} = d^{data}_{max} = 4$. With P@5 being always above 0.6 when dmax > 1, S-KERG and D-KERG also achieved relatively good results. P@5 for KS was only 6%. Clearly, dmax had a positive effect. More valid plans were computed when a higher value was used for dmax. However, using dmax = 4 instead of 3 did not yield clear improvements.
Fig. 4b shows the effect of query length |K|. Quite clearly, queries with a larger number of keywords resulted in lower precision. It dropped as low as 0.23 when using D-KERG for queries with 5 keywords.
KS is the model that produces only very few valid plans. This result was improved by one order of magnitude when relationships between keywords were used. The more fine-grained a model captures the relationships, the larger was the percentage of valid plans. Even a summary at the level of sources produced reasonably high quality results, i.e., every second plan was a valid one.
Performance is measured as the average response time for computing routing plans. Fig. 5a shows the performance for queries at various settings using different values for dmax. This parameter had no effect on the KS results but clearly influenced the performance achieved with KERG summaries. Times increased with higher values for dmax. While this increase was sharp for E-KERG and S-KERG, the time performance of D-KERG was relatively stable. In particular, the time required by D-KERG was no more than 10ms on average.
While the times shown are the actual times obtained for the other models, only the lower bound was shown for E-KERG. This is because we applied a timeout of 6min. Fig. 5c shows the exact times obtained for E-KERG and the queries that had to be aborted due to timeout. For dmax = 4, for instance, one out of every three queries was aborted.
As expected, more time was needed when the number of query keywords increased, as illustrated in Fig. 5b. It seems that all models except D-KERG had poor performance w.r.t. complex queries.
We presented a solution to the novel problem of keyword query routing. It helps users without knowledge of the evolving linked data and schema to find combinations of sources that contain answers corresponding to their needs. This solution also partially addresses the aspect of efficiency, as queries can then be evaluated against the relevant sources identified by the user, instead of using the entire Web of linked data.
We have proposed a family of summary models. Through theoretical and experimental analysis, we showed that it is important to capture keyword relationships. Compared to the KS model representing the naive baseline that stores only single keywords, the KERG models relying on relationships could produce a much