Top-k Linked Data Query Processing
   Andreas Wagner, Duc Thanh Tran, Günter Ladwig,
   Andreas Harth, and Rudi Studer



Institute of Applied Informatics and Formal Description Methods (AIFB)




KIT – University of the State of Baden-Wuerttemberg and
National Research Center of the Helmholtz Association                    www.kit.edu
      Introduction and Motivation
      Top-k Linked Data Query Processing
      Evaluation Results




INTRODUCTION & MOTIVATION


Linked Data Query Processing


      [Figure: the Linked Data query processing engine dereferences source URIs via HTTP lookup and retrieves data from the data sources.]

      Problems: Efficiency and Scalability




Top-K Query Processing

      Users are usually interested in only a few results
      Top-K query processing addresses the efficiency and
      scalability issues

      Example data sources:
        Src. 1:   ex:beatles foaf:name "The Beatles";
                             ex:album ex:sgt_pepper;
                             ex:album ex:help.
        Src. 2:   ex:sgt_pepper foaf:name "Sgt. Pepper";
                                ex:song "Lucy".
        Src. 3:   ex:help foaf:name "Help!";
                          ex:song "Help!".

      Example query:
        SELECT * WHERE
        {
           ex:beatles ex:album ?album .
           ?album ex:song ?song .
        }
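
To make the example concrete, here is a minimal sketch of what answering the example query means once all three sources have been retrieved. It assumes the triples are merged into one in-memory list; the list layout and the helper name `evaluate_example_query` are illustrative and not part of the slides.

```python
# Triples from the three example sources (Src. 1, Src. 2, Src. 3).
TRIPLES = [
    ("ex:beatles", "foaf:name", "The Beatles"),
    ("ex:beatles", "ex:album", "ex:sgt_pepper"),
    ("ex:beatles", "ex:album", "ex:help"),
    ("ex:sgt_pepper", "foaf:name", "Sgt. Pepper"),
    ("ex:sgt_pepper", "ex:song", "Lucy"),
    ("ex:help", "foaf:name", "Help!"),
    ("ex:help", "ex:song", "Help!"),
]


def evaluate_example_query(triples):
    """Join TP1 (ex:beatles ex:album ?album) with TP2 (?album ex:song ?song)."""
    # Bindings for ?album from the first triple pattern.
    albums = [o for s, p, o in triples if s == "ex:beatles" and p == "ex:album"]
    results = []
    for album in albums:
        # Bindings for ?song from the second pattern, joined on ?album.
        for s, p, o in triples:
            if s == album and p == "ex:song":
                results.append({"album": album, "song": o})
    return results


if __name__ == "__main__":
    for binding in evaluate_example_query(TRIPLES):
        print(binding)
```

This naive evaluation touches every source before it returns anything, which is the cost that top-k processing tries to avoid.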



Contributions

      Transfer top-k query processing to the Linked Data setting

      Linked Data specific improvements of the top-k approach

      Evaluation using real-world data




TOP-K LINKED DATA QUERY PROCESSING

Top-K Query Processing in a Linked Data Setting (1) – Requirements (1)

      Source index mapping triple patterns to sources containing
      bindings (e.g., [1,2])
      Ranking function determining the relevance of triple pattern
      bindings

      TP1: ex:beatles ex:album ?album .
      TP2: ?album ex:song ?song .

      [Figure: the Linked Data query processing engine consults the source index, which maps TP1 to Src. 1 and TP2 to Src. 2 and Src. 3.]

      Src. 1 (score ∈ [0,1]):   ex:beatles foaf:name "The Beatles";
                                           ex:album ex:sgt_pepper;
                                           ex:album ex:help.
      Src. 2 (score ∈ [1,2]):   ex:sgt_pepper foaf:name "Sgt. Pepper";
                                              ex:song "Lucy".
      Src. 3 (score ∈ [2,3]):   ex:help foaf:name "Help!";
                                        ex:song "Help!".
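
The two requirements can be sketched as a small lookup structure. This is only an illustration under simplifying assumptions: the index is keyed by the predicate of a triple pattern, each source carries the score interval shown in the example, and the names `SourceEntry`, `SOURCE_INDEX` and `relevant_sources` are hypothetical.

```python
from typing import NamedTuple


class SourceEntry(NamedTuple):
    source: str
    min_score: float  # lower end of the source's score interval
    max_score: float  # upper end of the source's score interval


# Hypothetical source index keyed by predicate (a simplification: a real
# source index may also take subjects and objects into account).
SOURCE_INDEX = {
    "ex:album": [SourceEntry("Src. 1", 0, 1)],
    "ex:song": [SourceEntry("Src. 2", 1, 2), SourceEntry("Src. 3", 2, 3)],
}


def relevant_sources(pattern):
    """Return the sources that may contain bindings for a triple pattern.

    The pattern is a (subject, predicate, object) tuple. Sources are
    returned with the highest possible score first.
    """
    _, predicate, _ = pattern
    entries = SOURCE_INDEX.get(predicate, [])
    return sorted(entries, key=lambda entry: entry.max_score, reverse=True)


if __name__ == "__main__":
    print(relevant_sources(("ex:beatles", "ex:album", "?album")))
    print(relevant_sources(("?album", "ex:song", "?song")))
```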


Top-K Query Processing in a Linked Data Setting (2) – Requirements (2)

      Sorted access on each join input




      TP1: ex:beatles ex:album ?album      Src. 1 (score ∈ [0,1])
      TP2: ?album ex:song ?song            Src. 3 (score ∈ [2,3]), Src. 2 (score ∈ [1,2])

      Scheduling strategy: load Src. 1 first (1), then Src. 3 (2), then Src. 2 (3),
      so that each join input delivers bindings with descending scores
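
One way to obtain sorted access over several sources is sketched below. It assumes that each source has a known upper score bound and that a source's triples become available only once the source is loaded; the `load` callable stands in for the HTTP lookup, and the individual triple scores are assumptions chosen inside the score intervals of the example.

```python
import heapq

# Hypothetical inputs for TP2 (?album ex:song ?song).
TP2_SOURCES = [("Src. 2", 2), ("Src. 3", 3)]  # (source, upper score bound)
TP2_DATA = {
    "Src. 2": [(2, ("ex:sgt_pepper", "ex:song", "Lucy"))],
    "Src. 3": [(3, ("ex:help", "ex:song", "Help!"))],
}


def sorted_access(sources, load):
    """Yield (score, triple) pairs of one triple pattern in descending score order.

    Sources are loaded lazily, best upper bound first. A buffered binding
    is emitted only when no source that is still unloaded could beat it.
    """
    pending = sorted(sources, key=lambda s: s[1], reverse=True)
    heap = []  # max-heap emulated with negated scores
    while pending or heap:
        next_bound = pending[0][1] if pending else float("-inf")
        if heap and -heap[0][0] >= next_bound:
            neg_score, triple = heapq.heappop(heap)
            yield -neg_score, triple
            continue
        name, _ = pending.pop(0)
        for score, triple in load(name):
            heapq.heappush(heap, (-score, triple))


if __name__ == "__main__":
    for score, triple in sorted_access(TP2_SOURCES, TP2_DATA.__getitem__):
        print(score, triple)
```

Because the generator is lazy, a consumer that stops after the first binding never triggers the lookup for Src. 2.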



Top-K Query Processing in a Linked Data Setting (3) – Push Bound Rank Join (1)

      Scheduling strategy: load source 1, then source 3

      Score    Query Bindings – Output Queue

      Score    Seen Triples (TP1)                      Score    Seen Triples (TP2)
        1      ex:beatles ex:album ex:sgt_pepper         3      ex:help ex:song "Help!"
        1      ex:beatles ex:album ex:help

      Sorted access for                                Sorted access for
      ex:beatles ex:album ?album .                     ?album ex:song ?song
      (Src. 1)                                         (Src. 3)
Top-K Query Processing in a Linked Data Setting (4) – Push Bound Rank Join (2)

      Threshold: 4

      Score    Query Bindings – Output Queue
        4      ex:beatles ex:album ex:help .
               ex:help ex:song "Help!" .

      Score    Seen Triples (TP1)                      Score    Seen Triples (TP2)
        1      ex:beatles ex:album ex:sgt_pepper         3      ex:help ex:song "Help!"
        1      ex:beatles ex:album ex:help

      Found query binding with score ≥ threshold: STOP

      Sorted access for                                Sorted access for
      ex:beatles ex:album ?album .                     ?album ex:song ?song
                                                       (Src. 2 not yet loaded)
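
The walkthrough can be reproduced with a small rank join sketch. It is a simplified, pull-based stand-in for the push-based operator named on the slide: it reads both sorted inputs alternately, keeps seen triples in hash tables, and reports a query binding once its score reaches the threshold max { max_1 + min_2 , max_2 + min_1 }. The input layout `(score, join_key, triple)` and all function names are assumptions, as is the score 2 for the Src. 2 triple.

```python
import heapq
from itertools import count

NEG_INF = float("-inf")


def threshold(top, last, done):
    """max{max_1 + min_2, max_2 + min_1}; exhausted inputs have no unseen bindings."""
    if top[0] is None or top[1] is None:
        return float("inf")  # no bound before both inputs have been read
    unseen = [NEG_INF if done[i] else last[i] for i in (0, 1)]
    return max(top[0] + unseen[1], top[1] + unseen[0])


def rank_join(inputs, k):
    """Top-k join of two inputs, each yielding (score, join_key, triple) by descending score."""
    inputs = [iter(x) for x in inputs]
    seen = ({}, {})  # per input: join_key -> [(score, triple), ...]
    top, last, done = [None, None], [None, None], [False, False]
    queue, results, tie = [], [], count()
    turn = 0
    while len(results) < k:
        if queue and -queue[0][0] >= threshold(top, last, done):
            neg_score, _, binding = heapq.heappop(queue)
            results.append((-neg_score, binding))
            continue
        if all(done):
            break
        i = turn if not done[turn] else 1 - turn
        turn = 1 - i
        item = next(inputs[i], None)
        if item is None:
            done[i] = True
            continue
        score, key, triple = item
        top[i] = score if top[i] is None else top[i]
        last[i] = score
        seen[i].setdefault(key, []).append((score, triple))
        # Join the new triple with everything seen on the other input.
        for other_score, other_triple in seen[1 - i].get(key, []):
            pair = (triple, other_triple) if i == 0 else (other_triple, triple)
            heapq.heappush(queue, (-(score + other_score), next(tie), pair))
    return results


TP1 = [(1, "ex:sgt_pepper", "ex:beatles ex:album ex:sgt_pepper"),
       (1, "ex:help", "ex:beatles ex:album ex:help")]
TP2 = [(3, "ex:help", 'ex:help ex:song "Help!"'),
       (2, "ex:sgt_pepper", 'ex:sgt_pepper ex:song "Lucy"')]

if __name__ == "__main__":
    print(rank_join([TP1, TP2], k=1))
```

With k = 1 the loop stops as soon as the binding with score 4 is found, so the second TP2 triple is never read.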

Improving the Threshold Estimation (1)

       Threshold estimation:
           Threshold: max { max_1 + min_2 , max_2 + min_1 }
           max_i: highest score among the seen triples of TPi (upper bound for seen bindings)
           min_i: lowest score among the seen triples of TPi (upper bound for unseen bindings)


       We improve the threshold estimation:
           Star-shaped entity query bounds
           Look-ahead bounds

Improving the Threshold Estimation (2)
     Star-shaped Entity Query Bounds

       Observation: Results for entity queries come from one single
       source
       Idea: Upper bound scores for triple pattern bindings via the
       maximal possible triple score


       Entity query (star-shaped):   ?x ex:song ?y .
                                     ?x foaf:name ?z .

       Src. 2 (score ∈ [1,2]):   ex:sgt_pepper foaf:name "Sgt. Pepper";
                                               ex:song "Lucy".
       Src. 3 (score ∈ [2,3]):   ex:help foaf:name "Help!";
                                         ex:song "Help!".

       Upper bound for triple bindings of ?x ex:song ?y: 3
       Upper bound for triple bindings of ?x foaf:name ?z: 3
       Upper bound for entity query bindings: 3 + 3
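
The arithmetic of this bound can be sketched as follows. The sketch assumes that the per-pattern bound is the largest upper score bound among the sources relevant to that pattern, and that the entity query bound is the sum over all patterns; the dictionary layout and function names are hypothetical.

```python
# Score intervals of the sources relevant to the entity query (from the example).
SOURCE_SCORE_RANGES = {"Src. 2": (1, 2), "Src. 3": (2, 3)}

# Hypothetical mapping from each triple pattern to its relevant sources.
PATTERN_SOURCES = {
    "?x ex:song ?y": ["Src. 2", "Src. 3"],
    "?x foaf:name ?z": ["Src. 2", "Src. 3"],
}


def triple_upper_bound(sources):
    """Maximal possible triple score over the given sources."""
    return max(SOURCE_SCORE_RANGES[s][1] for s in sources)


def entity_query_upper_bound(pattern_sources):
    """Sum of the per-pattern upper bounds of a star-shaped entity query."""
    return sum(triple_upper_bound(srcs) for srcs in pattern_sources.values())


if __name__ == "__main__":
    for pattern, sources in PATTERN_SOURCES.items():
        print(pattern, triple_upper_bound(sources))
    print("entity query bound:", entity_query_upper_bound(PATTERN_SOURCES))
```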
Improving the Threshold Estimation (3)
     Look-ahead Bounds
       Idea: Provide a more accurate upper bound for the scores of unseen
       bindings via the "next possible" score
       Threshold with last seen score (min_2 = 3):    max { 1 + 3 , 1 + 3 } = 4
       Threshold with look-ahead bound (min_2 = 2):   max { 1 + 2 , 1 + 3 } = 4

       Score    Query Bindings – Output Queue
         4      ex:beatles ex:album ex:help .
                ex:help ex:song "Help!" .

       Score    Seen Triples (TP1)                      Score    Seen Triples (TP2)
         1      ex:beatles ex:album ex:sgt_pepper         3      ex:help ex:song "Help!"   (Src. 3)
         1      ex:beatles ex:album ex:help

       max_1 = 1, min_1 = 1                             max_2 = 3, min_2 = 3 (last seen) or 2 (look-ahead)

       Sorted access for                                Sorted access for
       ex:beatles ex:album ?album .                     ?album ex:song ?song
                                                        remaining: Src. 2 (score ∈ [1,2])
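
The difference between the two bounds can be sketched in a few lines. The sketch assumes that the "next possible" score is the best score among bindings that are buffered but not yet consumed and the upper score bounds of sources that are not yet loaded; the function names are hypothetical.

```python
def threshold(max_1, min_1, max_2, min_2):
    """Rank join threshold: max{max_1 + min_2, max_2 + min_1}."""
    return max(max_1 + min_2, max_2 + min_1)


def look_ahead_bound(buffered_scores, unloaded_source_bounds):
    """Upper bound for unseen bindings of one input: the next possible score."""
    candidates = list(buffered_scores) + list(unloaded_source_bounds)
    return max(candidates) if candidates else float("-inf")


if __name__ == "__main__":
    # Values from the example: max_1 = min_1 = 1, max_2 = 3.
    last_seen_min_2 = 3
    # TP2 has no buffered bindings; only Src. 2 (score ∈ [1,2]) is unloaded.
    look_ahead_min_2 = look_ahead_bound([], [2])

    print("first term, last seen: ", 1 + last_seen_min_2)    # 1 + 3
    print("first term, look-ahead:", 1 + look_ahead_min_2)   # 1 + 2
    print("threshold, last seen:  ", threshold(1, 1, 3, last_seen_min_2))
    print("threshold, look-ahead: ", threshold(1, 1, 3, look_ahead_min_2))
```

In this example the look-ahead bound tightens the first term from 4 to 3, while the overall threshold stays at 4 because the second term is unchanged.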
EVALUATION


Evaluation – Setting

       We implemented three systems
           Push-based symmetric hash join operator [2,5]
           Standard top-k operator [6]
           Improved top-k operator


       Query set: 20 queries (8 FedBench queries and 12 of our own) with
       varying result sizes (1 to ~10,000) and complexity (2 to 5 triple
       patterns)

       Data set: ~2,000,000 triples, distributed over ~700,000 sources

       Parameters: k ∈ {1,5,10,20} and score distributions ∈ {uniform,
       normal, exponential}



Evaluation – Results (1)

       Overall Results




               Overview of processing times for all queries (k = 1, d = n)


       Top-k strategies lead to a runtime improvement of 35% on average
       (compared to standard Linked Data processing)

       Tighter bounds lead to a further improvement of 12% on average
       (compared to standard top-k processing)
Evaluation – Results (2)

       Effect of K and Score Distributions




CONCLUSION


Conclusion

      We showed that top-k processing techniques are applicable
      to the Linked Data setting.

      Top-k strategies lead to significant time savings for small
      values of k (in our experiments 35% on average)

      We showed that our improved top-k strategy leads to further
      runtime advantages (in our experiments 12% on average)




QUESTIONS


REFERENCES


References
     [1]   A. Harth, K. Hose, M. Karnstedt, A. Polleres, K. Sattler, and J. Umbrich. Data
           summaries for on-demand queries over linked data. In World Wide Web,
           2010.
     [2]   G. Ladwig and T. Tran. Linked Data Query Processing Strategies. In ISWC,
           2010.
     [3]   M. Wu, L. Berti-Equille, A. Marian, C. M. Procopiuc, and D. Srivastava.
           Processing top-k join queries. Proc. VLDB Endow., pages 860–870, 2010.
     [4]   A. Harth, S. Kinsella, and S. Decker. Using naming authority to rank data and
           ontologies for web search. In ISWC, pages 277–292, 2009.
     [5]   G. Ladwig and T. Tran. SIHJoin: Querying Remote and Local Linked Data. In
           ESWC, 2011.
     [6]   K. Schnaitter and N. Polyzotis. Optimal algorithms for evaluating rank joins in
           database systems. ACM Trans. Database Syst., 35:6:1–6:47, 2010.




BACKUP SLIDES


Early Pruning of Partial Results

       Motivation: Top-k join processing can be quite costly in terms of
       memory consumption
       Idea: Prune partial query results that cannot contribute to
       a final top-k result

       Entity query:   ?x ex:song ?y .
                       ?x foaf:name ?z .

       Currently known top-2 results:
       Rank    Query Bindings – Output Queue
         6     ex:help foaf:name "Help!".
               ex:help ex:song "Help!" .
         4     ex:sgt_pepper foaf:name "Sgt. Pepper".
               ex:sgt_pepper ex:song "Lucy".

       Currently known partial results:
       Rank    Triple Pattern Binding
         1     ex:sgt_pepper ex:song "Getting Better".

       Upper bound for triple bindings of ?x foaf:name ?z: 3
       Maximal score of the partial result: 3 + 1 = 4 ≤ 4 (lowest score among the top-2 results)
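
The pruning test from this example can be sketched as a single comparison. The sketch assumes that a partial result is described by its current score plus the upper bounds of the triple patterns that are still unbound; the function name `can_prune` and its parameters are hypothetical.

```python
def can_prune(partial_score, remaining_upper_bounds, top_k_scores, k):
    """Return True if a partial result cannot contribute to the final top-k.

    partial_score:          score of the triple pattern bindings found so far
    remaining_upper_bounds: upper bounds for the patterns that are still unbound
    top_k_scores:           scores of the currently known complete results
    """
    if len(top_k_scores) < k:
        return False  # fewer than k results known, so nothing can be ruled out
    best_possible = partial_score + sum(remaining_upper_bounds)
    kth_score = sorted(top_k_scores, reverse=True)[k - 1]
    return best_possible <= kth_score


if __name__ == "__main__":
    # Partial result with score 1, one unbound pattern with upper bound 3,
    # currently known top-2 results with scores 6 and 4.
    print(can_prune(1, [3], [6, 4], k=2))
```

For the example values the maximal score is 3 + 1 = 4, which does not exceed the lowest top-2 score of 4, so the partial result is dropped.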

Contenu connexe

Similaire à Linked Data Top-K Query Processing

Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Rich Heimann
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown BagDataTactics
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce worldYu Liu
 
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...Amazon Web Services
 

Similaire à Linked Data Top-K Query Processing (6)

Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)Data Tactics Data Science Brown Bag (April 2014)
Data Tactics Data Science Brown Bag (April 2014)
 
Data Science and Analytics Brown Bag
Data Science and Analytics Brown BagData Science and Analytics Brown Bag
Data Science and Analytics Brown Bag
 
Intro.ppt
Intro.pptIntro.ppt
Intro.ppt
 
Tree representation in map reduce world
Tree representation  in map reduce worldTree representation  in map reduce world
Tree representation in map reduce world
 
main_presentation
main_presentationmain_presentation
main_presentation
 
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...SQL to NoSQL   Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
SQL to NoSQL Best Practices with Amazon DynamoDB - AWS July 2016 Webinar Se...
 

Dernier

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdfssuser54595a
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxmanuelaromero2013
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991RKavithamani
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...EduSkills OECD
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfsanyamsingh5019
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphThiyagu K
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityGeoBlogs
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeThiyagu K
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfJayanti Pande
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationnomboosow
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionSafetyChain Software
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactdawncurless
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104misteraugie
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxSayali Powar
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfchloefrazer622
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdfQucHHunhnh
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application ) Sakshi Ghasle
 

Dernier (20)

18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
18-04-UA_REPORT_MEDIALITERAСY_INDEX-DM_23-1-final-eng.pdf
 
How to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptxHow to Make a Pirate ship Primary Education.pptx
How to Make a Pirate ship Primary Education.pptx
 
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
Industrial Policy - 1948, 1956, 1973, 1977, 1980, 1991
 
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
Presentation by Andreas Schleicher Tackling the School Absenteeism Crisis 30 ...
 
Sanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdfSanyam Choudhary Chemistry practical.pdf
Sanyam Choudhary Chemistry practical.pdf
 
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
Mattingly "AI & Prompt Design: Structured Data, Assistants, & RAG"
 
Z Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot GraphZ Score,T Score, Percential Rank and Box Plot Graph
Z Score,T Score, Percential Rank and Box Plot Graph
 
Paris 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activityParis 2024 Olympic Geographies - an activity
Paris 2024 Olympic Geographies - an activity
 
Measures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and ModeMeasures of Central Tendency: Mean, Median and Mode
Measures of Central Tendency: Mean, Median and Mode
 
Web & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdfWeb & Social Media Analytics Previous Year Question Paper.pdf
Web & Social Media Analytics Previous Year Question Paper.pdf
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptxPOINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
POINT- BIOCHEMISTRY SEM 2 ENZYMES UNIT 5.pptx
 
Arihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdfArihant handbook biology for class 11 .pdf
Arihant handbook biology for class 11 .pdf
 
Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1Código Creativo y Arte de Software | Unidad 1
Código Creativo y Arte de Software | Unidad 1
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Hybridoma Technology ( Production , Purification , and Application )
Hybridoma Technology  ( Production , Purification , and Application  ) Hybridoma Technology  ( Production , Purification , and Application  )
Hybridoma Technology ( Production , Purification , and Application )
 

Linked Data Top-K Query Processing

  • 1. Top-k Linked Data Query Processing Andreas Wagner, Duc Thanh Tran, Günter Ladwig, Andreas Harth, and Rudi Studer Institute of Applied Informatics and Formal Description Methods (AIFB) KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association www.kit.edu
  • 2. Introduction and Motivation Top-k Linked Data Query Processing Evaluation Results 2 Andreas Wagner, Duc Thanh Tran, Günter Ladwig, Institute of Applied Informatics and Formal Andreas Harth, and Rudi Studer Description Methods (AIFB)
  • 3. INTRODUCTION & MOTIVATION 3 Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 4. Linked Data Query Processing Linked Data Query Processing Engine HTTP lookup data URI Src. data sources Problems: Efficiency and Scalability 4 Andreas Wagner, Duc Thanh Tran, Günter Ladwig, Institute of Applied Informatics and Formal Andreas Harth, and Rudi Studer Description Methods (AIFB)
  • 5. Top-K Query Processing Users are usually interested in only a few results Top-K query processing addresses the efficiency and scalability issues ex:sgt_pepper foaf:name "Sgt. Pepper"; ex:song "Lucy". ex:beatles foaf:name Src. 1 "The Beatles"; Src. 2 ex:album ex:sgt_pepper; ex:album ex:help. SELECT * WHERE Src. 3 { ex:beatles ex:album ?album . ex:help foaf:name ?album ex:song ?song . "Help!"; } ex:song "Help!". 5 Andreas Wagner, Duc Thanh Tran, Günter Ladwig, Institute of Applied Informatics and Formal Andreas Harth, and Rudi Studer Description Methods (AIFB)
  • 6. Contributions Transfer top-k query processing to the Linked Data setting Linked Data specific improvements of the top-k approach Evaluation using real-world data 6 Andreas Wagner, Duc Thanh Tran, Günter Ladwig, Institute of Applied Informatics and Formal Andreas Harth, and Rudi Studer Description Methods (AIFB)
  • 7. TOP-K LINKED DATA QUERY PROCESSING 7 Institute of Applied Informatics and Formal Description Methods (AIFB)
  • 8. Top-K Query Processing in a Linked Data Setting (1) – Requirements (1) Source index mapping triple patterns to sources containing bindings (e.g., [1,2]) Ranking function determining the relevance of triple pattern bindings TP1: ex:beatles ex:album ?album . Linked Data TP2: ?album ex:song ?song . Query Processing source Engine index TP2 TP1 TP2 ex:sgt_pepper foaf:name score∈ [0,1] "Sgt. Pepper"; score ∈ [2,3] Src. 3 ex:song "Lucy". ex:beatles foaf:name Src. 1 "The Beatles"; ex:help foaf:name ex:album ex:sgt_pepper; "Help!"; ex:album ex:help. Src. 2 score∈ [1,2] ex:song "Help!". 8 Andreas Wagner, Duc Thanh Tran, Günter Ladwig, Institute of Applied Informatics and Formal Andreas Harth, and Rudi Studer Description Methods (AIFB)
  • 9. Top-K Query Processing in a Linked Data Setting (2) – Requirements (2) Sorted access on each join input 2 Src. 3 score ∈ [2,3] Scheduling 1 Strategy Src. 1 3 Src. 2 score ∈ [0,1] Bindings with TP1: score ∈ [1,2] descending ex:beatles ex:album ?album TP2: ?album ex:song ?song scores 9 Andreas Wagner, Duc Thanh Tran, Günter Ladwig, Institute of Applied Informatics and Formal Andreas Harth, and Rudi Studer Description Methods (AIFB)
  • 10. Top-K Query Processing in a Linked Data Setting (3) – Push Bound Rank Join (1). Scheduling strategy: load sources 1 and 3. Sorted access for ex:beatles ex:album ?album . from Src. 1 (ex:beatles foaf:name "The Beatles"; ex:album ex:sgt_pepper; ex:album ex:help.). Sorted access for ?album ex:song ?song . from Src. 3 (ex:help foaf:name "Help!"; ex:song "Help!".). Seen triples (TP1): score 1: ex:beatles ex:album ex:sgt_pepper; score 1: ex:beatles ex:album ex:help. Seen triples (TP2): score 3: ex:help ex:song "Help!". Output queue (score, query bindings): filled by joining the seen triples.
  • 11. Top-K Query Processing in a Linked Data Setting (4) – Push Bound Rank Join (2). Threshold: 4. Output queue (score, query bindings): 4: ex:beatles ex:album ex:help . ex:help ex:song "Help!" . Seen triples (TP1): score 1: ex:beatles ex:album ex:sgt_pepper; score 1: ex:beatles ex:album ex:help. Seen triples (TP2): score 3: ex:help ex:song "Help!". Found a query binding with score ≥ threshold: STOP. Src. 2, the remaining source for ?album ex:song ?song, is not loaded.
  • 12. Improving the Threshold Estimation (1). Threshold estimation: threshold = max { max_1 + min_2 , max_2 + min_1 }, where max_i is the upper bound for the seen triples of TPi and min_i is the upper bound for the unseen triples of TPi. We improve the threshold estimation via star-shaped entity query bounds and look-ahead bounds.
  • 13. Improving the Threshold Estimation (2): Star-shaped Entity Query Bounds. Observation: results for entity queries come from one single source. Idea: upper-bound the scores for triple pattern bindings via the maximal possible triple score. Entity query: ?x ex:song ?y ; foaf:name ?z . Data: ex:sgt_pepper foaf:name "Sgt. Pepper"; ex:song "Lucy". (source score ∈ [1,2]) and ex:help foaf:name "Help!"; ex:song "Help!". (source score ∈ [2,3]). Upper bound for triple bindings of ex:song ?y: 3. Upper bound for triple bindings of foaf:name ?z: 3. Upper bound for entity query bindings: 3 + 3.
  • 14. Improving the Threshold Estimation (3): Look-ahead Bounds. Idea: provide a more accurate upper bound for the scores of unseen bindings via the "next possible" score. Seen triples (TP1): score 1: ex:beatles ex:album ex:sgt_pepper; score 1: ex:beatles ex:album ex:help (max_1 = 1, min_1 = 1). Seen triples (TP2), from Src. 3: score 3: ex:help ex:song "Help!" (max_2 = 3, min_2 = 3). Threshold: max { 1 + 3 , 1 + 3 } = 4. The next source for ?album ex:song ?song is Src. 2 with score ∈ [1,2], so the look-ahead bound replaces min_2 = 3 with min_2 = 2. Output queue (score, query bindings): 4: ex:beatles ex:album ex:help . ex:help ex:song "Help!" .
  • 15. EVALUATION
  • 16. Evaluation – Setting. We implemented three systems: a push-based symmetric hash join operator [2,5], a standard top-k operator [6], and an improved top-k operator. Query set: 20 queries (8 FedBench and 12 own queries) with varying result size (1 to ~10,000) and complexity (2 to 5 triple patterns). Data set: ~2,000,000 triples, distributed over ~700,000 sources. Parameters: k ∈ {1, 5, 10, 20} and score distributions ∈ {uniform, normal, exponential}.
  • 17. Evaluation – Results (1): Overall Results. Overview of processing times for all queries (k = 1, d = n). Top-k strategies lead to a runtime improvement of 35% on average (compared to standard Linked Data processing). Tighter bounding leads to further improvements of 12% on average (compared to standard top-k processing).
  • 18. Evaluation – Results (2): Effect of k and Score Distributions
  • 19. CONCLUSION
  • 20. Conclusion. We showed that top-k processing techniques are applicable to the Linked Data setting. Top-k strategies lead to significant time savings for small values of k (in our experiments, 35% on average). We showed that our improved top-k strategy leads to further runtime advantages (in our experiments, 12% on average).
  • 21. QUESTIONS
  • 22. REFERENCES
  • 23. References. [1] A. Harth, K. Hose, M. Karnstedt, A. Polleres, K. Sattler, and J. Umbrich. Data summaries for on-demand queries over linked data. In World Wide Web, 2010. [2] G. Ladwig and T. Tran. Linked Data Query Processing Strategies. In ISWC, 2010. [3] M. Wu, L. Berti-Equille, A. Marian, C. M. Procopiuc, and D. Srivastava. Processing top-k join queries. Proc. VLDB Endow., pages 860–870, 2010. [4] A. Harth, S. Kinsella, and S. Decker. Using naming authority to rank data and ontologies for web search. In ISWC, pages 277–292, 2009. [5] G. Ladwig and T. Tran. SIHJoin: Querying Remote and Local Linked Data. In ESWC, 2011. [6] K. Schnaitter and N. Polyzotis. Optimal algorithms for evaluating rank joins in database systems. ACM Trans. Database Syst., 35:6:1–6:47, 2010.
  • 24. BACKUP SLIDES
  • 25. Early Pruning of Partial Results. Motivation: top-k join processing can be quite costly in terms of memory consumption. Idea: prune partial query results that cannot contribute to a final top-k result. Query: ?x ex:song ?y ; foaf:name ?z . Currently known top-2 results (rank, query bindings): 6: ex:help foaf:name "Help!" . ex:help ex:song "Help!" .; 4: ex:sgt_pepper foaf:name "Sgt. Pepper" . ex:sgt_pepper ex:song "Lucy" . Currently known partial results (rank, triple pattern binding): 1: ex:sgt_pepper ex:song "Getting Better" . Upper bound for the triple bindings of the remaining pattern: 3. Maximal score of the partial result: 3 + 1 = 4, which is ≤ the smallest current top-2 score.
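
The threshold test from slides 11, 12 and 14 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `corner_bound_threshold` and `can_stop` and the `look_ahead_*` parameters are hypothetical, and the sketch assumes exactly two join inputs whose binding scores are summed, as in the slides' example.

```python
def corner_bound_threshold(seen_1, seen_2, look_ahead_1=None, look_ahead_2=None):
    """Upper bound on the score of any join result not yet produced.

    seen_1, seen_2: scores of the bindings seen so far on each join input,
    delivered in descending order (sorted access).
    look_ahead_1, look_ahead_2: optional tighter bounds for unseen bindings,
    e.g. the maximal score of the next source to be loaded (assumption).
    """
    max_1, max_2 = max(seen_1), max(seen_2)
    # Without look-ahead, the lowest seen score bounds every unseen score.
    min_1 = min(seen_1) if look_ahead_1 is None else look_ahead_1
    min_2 = min(seen_2) if look_ahead_2 is None else look_ahead_2
    return max(max_1 + min_2, max_2 + min_1)


def can_stop(output_scores, threshold, k):
    """True once the k best results found so far all reach the threshold."""
    top = sorted(output_scores, reverse=True)[:k]
    return len(top) == k and all(score >= threshold for score in top)


# State from the slides: TP1 has seen two bindings with score 1,
# TP2 has seen one binding with score 3, and one result scores 1 + 3 = 4.
threshold = corner_bound_threshold([1, 1], [3])
tighter = corner_bound_threshold([1, 1], [3], look_ahead_2=2)
print(threshold, tighter, can_stop([4], threshold, k=1))  # 4 4 True
```

In this particular state the look-ahead value does not lower the threshold, because the term max_2 + min_1 = 3 + 1 already dominates; it only tightens the term that depends on min_2.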

Editor's notes

  1. Introduction: challenges in current Linked Data query processing; processing of ranked Linked Data; our contributions. Top-k: top-k query processing in a Linked Data setting; improving the threshold estimation; eager pruning of partial results.
  2. Special case of federated query processing. Only HTTP lookups are available for data access. Entire sources have to be retrieved.
  3. Provides strategies for computing only the k top-ranked results. Other (less relevant) results are not materialized. For computing the top-1 result, no data from Src. 2 is needed.
  4. Tighter threshold estimation and early pruning of partial results.
  5. For instance, scores for triples can be obtained through PageRank-inspired ranking [4]. However, no triples are indexed (i.e., each source must be scanned).
  6. Join inputs must be accessible in descending score order. We store the min/max triple score per source and allow sources to be accessed in descending score order (via a scheduling strategy).
  7. Given our ranking function, sorted access, and source index, we can employ a push-based rank join.
  8. The threshold allows us to estimate the scores of the unseen query result bindings and terminate early.
  9. Systems: push-based symmetric hash join operator (shj); rank-join operator with corner bound (rj-cc) [6]; rank-join operator with tighter corner bound and early pruning (rj-tc). All use push-based join processing and left-deep join trees. Due to network latency issues, sources were downloaded and Linked Data access was simulated on a single machine.
  10. Differences are due to less input data being retrieved. Some queries (e.g., q10 or q20) perform equally because the result set is too small (i.e., all data had to be retrieved). Differences between rj-cc and rj-tc do not show properly in (a) because the evaluation ran on a local machine. Outlier q19 is due to an implementation issue. Q9: early pruning saved 8% of the buffered data. However, this had no real impact on efficiency; the main aspect here is the number of sources to be retrieved.
  11. (b) Average number of sources (different k, d = n). (c) Average evaluation time (different k, d = n). (d) Average evaluation time (different n, k = 10). (e) Average evaluation time with varying number of triple patterns (k = 1, d = n).
  12. Q9: early pruning saved 8% of the buffered data. However, this had no real impact on efficiency; the main aspect here is the number of sources to be retrieved.
  13. ("seen" and "output" buffers.) That is, prune any partial result whose (partial) score, together with the maximal possible score for the unevaluated query part, is ≤ the currently smallest top-k score.
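
The pruning condition in note 13 can be written down as a small check. The following Python sketch is illustrative only: the function name `can_prune` and its parameters are hypothetical, and it assumes that a result's score is the sum of its triple scores, as in the slides' example.

```python
def can_prune(partial_score, remaining_bounds, top_k_scores, k):
    """Decide whether a buffered partial result can be discarded.

    partial_score: score of the bindings already joined.
    remaining_bounds: upper bounds for the triple patterns not yet evaluated.
    top_k_scores: scores of the complete results found so far.
    """
    if len(top_k_scores) < k:
        # Fewer than k complete results: every partial result may still matter.
        return False
    best_possible = partial_score + sum(remaining_bounds)
    kth_best = sorted(top_k_scores, reverse=True)[k - 1]
    return best_possible <= kth_best


# Backup-slide example: the partial result has score 1, the remaining pattern
# is bounded by 3, and the current top-2 results have scores 6 and 4.
print(can_prune(1, [3], [6, 4], k=2))  # True, since 1 + 3 = 4 <= 4
```

Using ≤ rather than < follows the wording of the note: a partial result that can at best tie with the current k-th result is dropped as well.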