Top-k Exploration of Query Candidates
for Efficient Keyword Search on Graph-
           Shaped (RDF) Data

    Thanh Tran1, Haofen Wang2, Sebastian Rudolph1,
                      Philipp Cimiano3
      1Institute AIFB, University Karlsruhe, Germany
     2APEX Lab, Shanghai Jiao Tong University, China
    3Web Information Systems, TU Delft, Netherlands
Motivation
• Semantic search
   – Access to KB facts and semantically described documents
   – Support for expressive / precise information need
• How to capture the user’s information need?
   – Expressive queries with difficult syntax (SQL, SPARQL) vs.
     limited but intuitive queries (Keywords)
   – Expressive power is crucial!
   – Supporting the user in specifying information needs in an
     intuitive way is also crucial!
• Goal: Interpreting Complex Information Needs by
  Translating Keywords to Expressive Formal Queries
Related Work
• Translation of NL questions
  – Can the user specify a precise question when the
    information need is vague?
• Relaxed-structure query models
  – Require some knowledge about the query syntax and
    the structure of the underlying data
• Labeled query models
  – Require some knowledge about schema elements
• In keyword search, the user does not need to
  know about the query syntax and data schema
  – Crucial for environments like the Web, where most data
    sources to be queried are unknown to the user
Scenario – Interpreting Information Needs
[Figure: User Information Need, Query Specification, Query Translation, and Query Processing on the RDF Data Graph]
Query Specification: „2006 Philipp Cimiano X-Media“
Query Translation:
SELECT ?x, ?y, ?z WHERE {
  ?x type Publication . ?x year 2006 .
  ?x author ?y . ?y name 'P. Cimiano' .
  ?y worksAt ?z . ?z name 'AIFB' }
Keyword Search – An Overview
• Mapping of keywords to "labels" of data elements
   – Results in a set of keyword elements
   – Through imprecise matching, the user does not even need to know the
     labels of data elements (cf. precise matching in [G. Bhalotia et al.])
• Data Graph exploration
   – Search for substructures (query graph) connecting keyword elements
   – Query graph vs. answer trees [H. He et al.]
   – Exploration of query graphs operates on summary of data graph only
• Top-k computation
   – Search guided by a scoring function to output only the top-k results
   – Guaranteed top-k vs. approximate top-k [V. Kacholia et al.]
• Mapping query graph to conjunctive query
• Processing the conjunctive query using a standard query engine
Keyword Search – The Workflow
• Offline: Summarization, Scoring, Term Expansion
• Online: Query Computation, Query Processing
Graph Summarization
• Goal: preserve sufficient information to compute elements and
  structure of the query, while reducing the exploration space
• Summary graph captures relations between entity classes, thus
  preserving structural information of the original data graph
[Figure: Example RDF Graph and the resulting Summary Graph]
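
The summarization idea can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes triples are plain `(subject, predicate, object)` tuples, that class membership is given by a `type` predicate, and that untyped entities fall back to a hypothetical class `Thing`. The function name `summarize` and the sample data are made up for the example.

```python
from collections import defaultdict

def summarize(triples):
    """Project entity-to-entity relations onto (class, relation, class) edges."""
    # Pass 1: record the class(es) of each entity from 'type' triples.
    classes = defaultdict(set)
    for s, p, o in triples:
        if p == "type":
            classes[s].add(o)
    # Pass 2: keep one class-level edge per relation between typed entities.
    summary = set()
    for s, p, o in triples:
        if p != "type" and o in classes:
            # Assumption: an untyped subject is treated as class 'Thing'.
            for cs in classes.get(s, {"Thing"}):
                for co in classes[o]:
                    summary.add((cs, p, co))
    return summary

# Hypothetical sample data in the spirit of the running example.
triples = [
    ("pub1", "type", "Publication"),
    ("pub2", "type", "Publication"),
    ("per1", "type", "Person"),
    ("per2", "type", "Person"),
    ("inst1", "type", "Institute"),
    ("pub1", "author", "per1"),
    ("pub2", "author", "per2"),
    ("per1", "worksAt", "inst1"),
    ("pub1", "year", "2006"),
]
summary = summarize(triples)
print(sorted(summary))
```

Two `author` edges between individual entities collapse into one class-level edge, which is where the reduction of the exploration space comes from. Attribute values such as `2006` are left out at this stage.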
Keyword Mapping & Graph Augmentation
•   Summary graph captures information for exploration of query structure
•   Online augmentation with elements & scores obtained from keyword mapping
•   Augmented graph contains further information for exploration of query elements
[Figure: Keyword Query „2006 Philipp Cimiano AIFB“, Summary Graph, and Augmented Summary Graph]
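
A rough sketch of keyword mapping and augmentation follows. It assumes imprecise matching can be approximated with the standard library's `difflib.SequenceMatcher` (the slides do not say which matcher is used), and that a hypothetical `value_index` records which class and attribute each value belongs to. All names, the threshold, and the data are illustrative.

```python
from difflib import SequenceMatcher

def match_score(keyword, label):
    """Similarity in [0, 1]; a stand-in for the imprecise matching step."""
    return SequenceMatcher(None, keyword.lower(), label.lower()).ratio()

def map_keywords(keywords, labelled_elements, threshold=0.6):
    """Return {keyword: [(element, score), ...]}, best match first."""
    mapping = {}
    for kw in keywords:
        hits = [(el, match_score(kw, label)) for el, label in labelled_elements]
        hits = [h for h in hits if h[1] >= threshold]
        mapping[kw] = sorted(hits, key=lambda h: -h[1])
    return mapping

def augment(summary_edges, value_index, mapping):
    """Add an attribute edge (class, attribute, value) for every matched value."""
    augmented = set(summary_edges)
    scores = {}
    for hits in mapping.values():
        for element, score in hits:
            if element in value_index:  # class elements are already in the summary
                cls, attr = value_index[element]
                augmented.add((cls, attr, element))
                scores[element] = score
    return augmented, scores

# Hypothetical data for the keyword query "2006 Philipp Cimiano AIFB".
summary_edges = {("Publication", "author", "Person"), ("Person", "worksAt", "Institute")}
labelled_elements = [("2006", "2006"), ("P. Cimiano", "P. Cimiano"),
                     ("AIFB", "AIFB"), ("Publication", "Publication")]
value_index = {"2006": ("Publication", "year"),
               "P. Cimiano": ("Person", "name"),
               "AIFB": ("Institute", "name")}

mapping = map_keywords(["2006", "Philipp Cimiano", "AIFB"], labelled_elements)
augmented, scores = augment(summary_edges, value_index, mapping)
print(sorted(augmented))
```

Here "Philipp Cimiano" still reaches the element labelled "P. Cimiano" even though the strings differ, which is the point of imprecise matching.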
Top-k Graph Exploration
• Cost-directed exploration of the graph, starting from keyword elements Nk
• Explore all possible distinct paths starting from each nk ∈ Nk
• At each step, take the cursor (“path”) with the lowest cost from the queues for exploration
• When a connecting element nc is found,
   • Paths from nk to nc are merged to construct the query graph
   • Top-k is invoked to add the query graph to the candidate list
• Top-k terminates when the highest cost in the candidate list (the cost of the k-ranked
  query graph) is found to be lower than the lowest possible cost that can be
  achieved with paths in the queues yet to be explored
[Figure: Augmented Summary Graph and Explored Paths]
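
The exploration loop can be sketched as below. This is a simplified illustration under stated assumptions, not the algorithm from the talk: it keeps only the cheapest path per keyword and element (a Dijkstra-style shortcut, whereas the slide describes exploring all distinct paths), it uses a single heap instead of one queue per keyword element, it scores a query graph as the sum of its path costs, and it uses the cost of the cheapest remaining cursor as a conservative lower bound for termination. The function `top_k_explore`, the uniform costs, and the graph are hypothetical.

```python
import heapq

def top_k_explore(graph, cost, keyword_elements, k):
    """graph: {node: [neighbours]}; cost: {node: non-negative float};
    keyword_elements: one list of start elements per keyword."""
    n_kw = len(keyword_elements)
    heap = []  # cursors: (path cost, keyword index, path)
    for i, starts in enumerate(keyword_elements):
        for n in starts:
            heapq.heappush(heap, (cost[n], i, (n,)))
    best = {}        # (keyword index, node) -> (cost, path)
    candidates = []  # (total cost, connecting element, paths), kept sorted
    while heap:
        c, i, path = heapq.heappop(heap)  # cheapest cursor over all queues
        # Terminate once the k-th candidate is cheaper than anything left.
        if len(candidates) >= k and candidates[k - 1][0] < c:
            break
        node = path[-1]
        if (i, node) in best:
            continue  # simplification: keep only the cheapest path
        best[(i, node)] = (c, path)
        # A connecting element has been reached from every keyword.
        if all((j, node) in best for j in range(n_kw)):
            total = sum(best[(j, node)][0] for j in range(n_kw))
            paths = [best[(j, node)][1] for j in range(n_kw)]
            candidates.append((total, node, paths))
            candidates.sort(key=lambda cand: cand[0])
        for nxt in graph[node]:
            if (i, nxt) not in best:
                heapq.heappush(heap, (c + cost[nxt], i, path + (nxt,)))
    return candidates[:k]

# Hypothetical augmented summary graph (undirected), uniform element costs.
graph = {
    "2006": ["Publication"],
    "Publication": ["2006", "Person"],
    "Person": ["Publication", "P. Cimiano", "Institute"],
    "P. Cimiano": ["Person"],
    "Institute": ["Person", "AIFB"],
    "AIFB": ["Institute"],
}
cost = {n: 1.0 for n in graph}
result = top_k_explore(graph, cost, [["2006"], ["P. Cimiano"], ["AIFB"]], k=1)
print(result)
```

In this toy graph the cheapest connecting element is `Person`, and merging the three paths that meet there yields the query graph linking publication, person, and institute.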
Mapping Query Graph to Conjunctive Query

•   Conjunctive query obtained by exhaustive application of mapping rules
     • Every value vertex v_vertex → a term
     • Every class vertex c_vertex → a distinct variable
     • Every A-edge e(c_vertex, v_vertex) → a query predicate e[var(c_vertex), term(v_vertex)]
     • Every R-edge e(c_vertex1, c_vertex2) → a query predicate e[var(c_vertex1), var(c_vertex2)]
•   Treat all query variables as distinguished
•   Specific mechanisms can be provided for the user to choose distinguished variables
•   The query chosen by the user is finally translated to the query formalism supported by the
    query engine (SPARQL) for retrieving query answers
[Figure: Query Graph and the resulting Conjunctive Query]
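
The four mapping rules can be sketched as a small function. The representation is an assumption made for illustration: A-edges are `(label, class_vertex, value)` tuples, R-edges are `(label, class_vertex1, class_vertex2)` tuples, and variables are named `?x0`, `?x1`, and so on. Type atoms such as `type(?x, Publication)` from the scenario query would need an extra rule that is not listed on the slide, so they are omitted here.

```python
def to_conjunctive_query(class_vertices, a_edges, r_edges):
    """Apply the mapping rules; returns (distinguished variables, atoms)."""
    # Rule: every class vertex maps to a distinct variable.
    var = {c: f"?x{i}" for i, c in enumerate(class_vertices)}

    # Rule: every value vertex maps to a term (rendered as a quoted literal).
    def term(value):
        return f"'{value}'"

    # Rule: an A-edge e(c, v) maps to the predicate e(var(c), term(v)).
    atoms = [f"{e}({var[c]}, {term(v)})" for e, c, v in a_edges]
    # Rule: an R-edge e(c1, c2) maps to the predicate e(var(c1), var(c2)).
    atoms += [f"{e}({var[c1]}, {var[c2]})" for e, c1, c2 in r_edges]
    # All query variables are treated as distinguished.
    return list(var.values()), atoms

# Hypothetical query graph for the running example.
class_vertices = ["Publication", "Person", "Institute"]
a_edges = [("year", "Publication", "2006"),
           ("name", "Person", "P. Cimiano"),
           ("name", "Institute", "AIFB")]
r_edges = [("author", "Publication", "Person"),
           ("worksAt", "Person", "Institute")]

distinguished, atoms = to_conjunctive_query(class_vertices, a_edges, r_edges)
print(distinguished)
print(" AND ".join(atoms))
```

Each atom corresponds to one triple pattern, so a translation to SPARQL amounts to rewriting `e(s, o)` as `s e o .` inside the `WHERE` clause.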
Rich Client Demo – xXploreKnow!




      http://ontoware.org/projects/xxplore/
Web Demo – Q2Semantic




   http://q2semantic.apexlab.org/UI.html
Evaluation – Effectiveness
• 12 users provide 30 keyword queries on DBLP, along with the
  NL description of the information need
• Reciprocal Rank = 1/r, where r is the rank of the correct query
• A query is correct if it matches the information need
• Information need can be interpreted in most cases, in
  particular when path length, matching score as well as
  popularity of graph elements are incorporated into scoring
  function (C3)
[Figure: MRRs of different Scoring Functions on DBLP; y-axis from 0 to 1, x-axis queries Q1 to Q29, series C1, C2, C3]
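
The metric above can be computed as in this short sketch. The function names and the sample ranking are illustrative, and returning 0 when no correct query appears in the list is a common convention assumed here, not something stated on the slide.

```python
def reciprocal_rank(ranked_queries, is_correct):
    """Return 1/r for the rank r of the first correct query, else 0.0."""
    for r, query in enumerate(ranked_queries, start=1):
        if is_correct(query):
            return 1.0 / r
    return 0.0  # assumption: no correct query in the list counts as 0

def mean_reciprocal_rank(reciprocal_ranks):
    """Average the reciprocal ranks over several keyword queries."""
    return sum(reciprocal_ranks) / len(reciprocal_ranks)

# Hypothetical ranking in which the second candidate matches the information need.
rr = reciprocal_rank(["q_a", "q_b", "q_c"], lambda q: q == "q_b")
mrr = mean_reciprocal_rank([1.0, 0.5, 0.0])
print(rr, mrr)
```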
Evaluation – Usability of Query Interpretation
- Standard approaches return top-k results
- Our approach is based on the interpretation of keywords as
  queries, i.e. it computes top-k queries instead of top-k
  answer trees [V. Kacholia et al.] [H. He et al.]
- Queries are then transformed to simple natural
  language and presented to the user
- 90% of users prefer to obtain the question first, since it
  facilitates understanding of the results
- All users prefer to do refinement on the structured
  query, rather than on the keywords, since the
  structured query can be manipulated in a more
  precise and predictable way
Evaluation – Efficiency
• Comparison with bidirectional search [V. Kacholia et al.] and search based on
  graph indexing (1000 BFS, 1000 METIS, 300 BFS, 300 METIS in [H. He et al.])
• We measure time for query computation + time for processing several
  queries until finding 10 answers
• Outperforms bidirectional search by at least one order of magnitude
• Performs fairly well when compared to indexing-based approaches

[Figure: Query Performance on DBLP Data; logarithmic axis from 1 to 100000, queries Q1 to Q10, series: Our Solution, Bidirect, 1000 BFS, 1000 METIS, 300 BFS, 300 METIS]
Conclusions and Future Work
• Conclusions
   – A new approach for keyword search on graph-structured
     data, RDF in particular
   – Novel algorithms for the top-k exploration of subgraphs to
     compute queries as an additional intermediate step
   – Query computing is performed on an aggregated graph
     while query processing can leverage optimization
     capability of the database
• Future Work
   – Indexing connectivity and scores for further speed-up
   – Consider special query operations (e.g. filters) as keywords
Thank you for your attention!

            Q&A

Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph-Shaped (RDF) Data

  • 1. Top-k Exploration of Query Candidates for Efficient Keyword Search on Graph- Shaped (RDF) Data Thanh Tran1, Haofen Wang2, Sebastian Rudolph1, Philipp Cimiano3 1Institute AIFB, University Karlsruhe, Germany 2APEX Lab, Shanghai Jiao Tong University, China 3Web Information Systems, TU Delft, Netherlands
  • 2. Motivation • Semantic search – Access to KB facts and semantically described documents – Support for expressive / precise information need • How to capture the user’s information need? – Expressive queries with difficult syntax (SQL, SPARQL) vs. limited but intuitive queries (Keywords) – Expressive power is crucial! – Support the user in specifying information needs in an intuitive way is also crucial! • Goal: Interpreting Complex Information Needs by Translating Keywords to Expressive Formal Queries
  • 3. Related Work • Translation of NL questions – Can the user specify a precise question when the information need is vague? • Relaxed-structure query models – Require some knowledge about the query syntax and the structure of the underlying data • Labeled query models – Require some knowledge about schema elements • In keyword search, the user does not need to know about the query syntax and data schema – Crucial for environment like the Web where most data sources to be queried are unknown to the user
  • 4. Scenario – Interpreting Information Needs User Information Need RDF Data Graph Query Specification "2006 Philipp Cimiano X-Media" Query Translation Query Processing SELECT ?x, ?y, ?z WHERE { ?x type Publication . ?x year 2006 . ?x author ?y . ?y name 'P. Cimiano' . ?y worksAt ?z . ?z name 'AIFB' }
  • 5. Keyword Search – An Overview • Mapping of keywords to "labels" of data elements – Results in a set of keyword elements – Through imprecise matching, the user does not even need to know the labels of data elements (cf. precise matching in [G. Bhalotia et al.]) • Data graph exploration – Search for substructures (query graphs) connecting keyword elements – Query graph vs. answer trees [H. He et al.] – Exploration of query graphs operates on a summary of the data graph only • Top-k computation – Search guided by a scoring function to output only the top-k results – Guaranteed top-k vs. approximate top-k [V. Kacholia et al.] • Mapping the query graph to a conjunctive query • Processing the conjunctive query using a standard query engine
  • 6. Keyword Search – The Workflow • Offline: Summarization, Scoring, Term Expansion • Online: Query Computation, Query Processing
  • 7. Graph Summarization • Goal: preserve sufficient information to compute the elements and structure of the query, while reducing the exploration space • The summary graph captures relations between entity classes, thus preserving the structural information of the original data graph Summary Graph Example RDF Graph
  • 8. Keyword Mapping & Graph Augmentation • Summary graph captures information for exploration of query structure • Online augmentation with elements & scores obtained from keyword mapping • Augmented graph contains further information for exploration of query elements "2006 Philipp Cimiano AIFB" Keyword Query Summary Graph Augmented Summary Graph
  • 9. Top-k Graph Exploration • Cost-directed exploration of the graph, starting from the keyword elements N_k • Explore all possible distinct paths starting from each n_k ∈ N_k • At each step, take the cursor ("path") with the lowest cost from the queues for exploration • When a connecting element n_c is found, • Paths from n_k to n_c are merged to construct the query graph • Top-k is invoked to add the query graph to the candidate list • Top-k terminates when the highest cost in the candidate list (the cost of the k-ranked query graph) is found to be lower than the lowest possible cost that can be achieved with paths in the queues yet to be explored Augmented Summary Graph Explored Paths
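The exploration loop on slide 9 can be sketched roughly as follows. This is a simplified illustration and not the authors' algorithm: it assumes an undirected graph with non-negative edge costs, keeps only the cheapest path per keyword element at each node (the slides explore all distinct paths), and uses one shared priority queue. The function name `top_k_query_graphs` and the toy graph in the usage note are hypothetical.

```python
import heapq
from itertools import count


def top_k_query_graphs(edges, keyword_elements, k):
    """Return up to k (cost, edge_set) candidates, cheapest first.

    Assumptions (not from the slides): edges are (u, v, cost) triples of an
    undirected graph, costs are non-negative, keyword elements are distinct.
    """
    adj = {}
    for u, v, c in edges:
        adj.setdefault(u, []).append((v, c))
        adj.setdefault(v, []).append((u, c))

    tie = count()  # tie-breaker so the heap never compares paths
    # A cursor is (cost, tie, current node, origin keyword element, path).
    queue = [(0.0, next(tie), n, n, (n,)) for n in keyword_elements]
    heapq.heapify(queue)

    reached = {}  # node -> {origin: (cost, path)}
    best = {}     # candidate query graph (set of edges) -> lowest cost seen

    while queue:
        # Termination test: the k-th best candidate is already cheaper than
        # the cheapest cursor that is still waiting in the queue.
        if len(best) >= k:
            kth_cost = heapq.nsmallest(k, best.values())[-1]
            if kth_cost < queue[0][0]:
                break

        cost, _, node, origin, path = heapq.heappop(queue)
        paths_here = reached.setdefault(node, {})
        if origin in paths_here:
            continue  # simplification: keep only the cheapest path per origin
        paths_here[origin] = (cost, path)

        # A connecting element is a node reached from every keyword element.
        if len(paths_here) == len(keyword_elements):
            total = sum(c for c, _ in paths_here.values())
            graph = frozenset(
                frozenset(pair)
                for _, p in paths_here.values()
                for pair in zip(p, p[1:])
            )
            if total < best.get(graph, float("inf")):
                best[graph] = total

        # Expand the cursor to neighbours that are not yet on its own path.
        for nxt, c in adj.get(node, []):
            if nxt not in path:
                heapq.heappush(
                    queue, (cost + c, next(tie), nxt, origin, path + (nxt,))
                )

    ranked = sorted(best.items(), key=lambda item: item[1])[:k]
    return [(cost, graph) for graph, cost in ranked]
```

On a toy summary graph with unit edge costs, built from the labels in the running example, `top_k_query_graphs([("2006", "Publication", 1), ("Publication", "Person", 1), ("P. Cimiano", "Person", 1), ("Person", "Institute", 1), ("AIFB", "Institute", 1)], ["2006", "P. Cimiano", "AIFB"], 1)` returns one candidate that connects all three keyword elements through `Person`.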
  • 10. Mapping Query Graph to Conjunctive Query • Conjunctive query obtained by exhaustive application of mapping rules • Every value vertex v_vertex → a term • Every class vertex c_vertex → a distinct variable • Every A-edge e(c_vertex, v_vertex) → a query predicate e[var(c_vertex), term(v_vertex)] • Every R-edge e(c_vertex1, c_vertex2) → a query predicate e[var(c_vertex1), var(c_vertex2)] • Treat all query variables as distinguished • Specific mechanisms can be provided for the user to choose distinguished variables • The query chosen by the user is finally translated to the query formalism supported by the query engine (SPARQL) for retrieving query answers Query Graph Conjunctive Query
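A minimal sketch of the mapping rules on slide 10, under some assumptions: the query graph is given as a list of class vertices plus lists of A-edges and R-edges, variables are named `?x1`, `?x2`, ..., and atoms are rendered as plain strings. The function name `to_conjunctive_query` and this input layout are hypothetical and not taken from the slides.

```python
def to_conjunctive_query(class_vertices, a_edges, r_edges):
    """Apply the slide's mapping rules to a query graph.

    class_vertices: list of class vertex labels
    a_edges: list of (edge_label, class_vertex, value_vertex)
    r_edges: list of (edge_label, class_vertex1, class_vertex2)
    Returns (distinguished_variables, atoms).
    """
    # Every class vertex becomes a distinct variable.
    var = {c: f"?x{i}" for i, c in enumerate(class_vertices, start=1)}

    atoms = []
    # Every A-edge e(c, v) becomes e[var(c), term(v)];
    # the value vertex itself serves as the term.
    for label, c, value in a_edges:
        atoms.append(f"{label}({var[c]}, {value!r})")
    # Every R-edge e(c1, c2) becomes e[var(c1), var(c2)].
    for label, c1, c2 in r_edges:
        atoms.append(f"{label}({var[c1]}, {var[c2]})")

    # All query variables are treated as distinguished.
    return list(var.values()), atoms


variables, atoms = to_conjunctive_query(
    ["Publication", "Person", "Institute"],
    [
        ("year", "Publication", "2006"),
        ("name", "Person", "P. Cimiano"),
        ("name", "Institute", "AIFB"),
    ],
    [("author", "Publication", "Person"), ("worksAt", "Person", "Institute")],
)
print(variables)
print(" AND ".join(atoms))
```

Translating the resulting atoms into SPARQL triple patterns is a separate, engine-specific step that this sketch leaves out.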
  • 11. Rich Client Demo – xXploreKnow! http://ontoware.org/projects/xxplore/
  • 12. Web Demo – Q2Semantic http://q2semantic.apexlab.org/UI.html
  • 13. Evaluation – Effectiveness • 12 users provide 30 keyword queries on DBLP, along with the NL description of the information need • Reciprocal Rank = 1/r, where r is the rank of the correct query • A query is correct if it matches the information need • The information need can be interpreted in most cases, in particular when path length, matching score, and popularity of graph elements are all incorporated into the scoring function (C3) [Figure: MRRs of different scoring functions (C1, C2, C3) on DBLP, per keyword query]
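The reciprocal rank measure on slide 13 can be computed as in the sketch below. The function names and the sample ranking are illustrative only, and returning 0 when no correct query appears in the list is a common convention assumed here, not something the slides state.

```python
def reciprocal_rank(ranked_queries, is_correct):
    """Return 1/r for the first correct query at rank r (1-based)."""
    for rank, query in enumerate(ranked_queries, start=1):
        if is_correct(query):
            return 1.0 / rank
    return 0.0  # assumed convention: no correct query in the list


def mean_reciprocal_rank(reciprocal_ranks):
    """Average the reciprocal ranks over several keyword queries."""
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Illustrative ranking: the correct interpretation is listed second.
rr = reciprocal_rank(["q_a", "q_b", "q_c"], lambda q: q == "q_b")
print(rr)
```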
  • 14. Evaluation – Usability of Query Interpretation - Standard approaches return top-k results - Our approach is based on the interpretation of keywords as queries, i.e. it computes top-k queries instead of top-k answer trees [V. Kacholia et al.] [H. He et al.] - Queries are then transformed to simple natural language and presented to the user - 90% of users prefer to obtain the question first, since it facilitates understanding of the results - All users prefer to refine the structured query rather than the keywords, since the structured query can be manipulated in a more precise and predictable way
  • 15. Evaluation – Efficiency • Comparison with bidirectional search [V. Kacholia et al.] and search based on graph indexing (1000 BFS, 1000 METIS, 300 BFS, 300 METIS in [H. He et al.]) • We measure time for query computation + time for processing several queries until finding 10 answers • Outperforms bidirectional search by at least one order of magnitude • Performs fairly well when compared to indexing-based approaches [Figure: Query Performance on DBLP Data for Q1 to Q10, axis ticks from 1 to 100000; series: Our Solution, Bidirect, 1000 BFS, 1000 METIS, 300 BFS, 300 METIS]
  • 16. Conclusions and Future Work • Conclusions – A new approach for keyword search on graph-structured data, RDF in particular – Novel algorithms for the top-k exploration of subgraphs to compute queries as an additional intermediate step – Query computation is performed on an aggregated graph, while query processing can leverage the optimization capabilities of the database • Future Work – Indexing connectivity and scores for further speed-up – Consider special query operations (e.g. filters) as keywords
  • 17. Thank you for your attention! Q&A