Combining Inverted Indices and
Structured Search for
Ad-hoc Object Retrieval
Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux
eXascale Infolab - University of Fribourg - Switzerland
{firstname.lastname}@unifr.ch
SIGIR2012 - Monday, August 13th 2012
2
Motivation
• Many search engine queries are about entities.
• An increasingly large amount of entity data is online.
• Often represented as huge graphs (e.g., the LOD cloud, Google Knowledge Graph, Facebook social graph).
• Globally unique entity identifiers (e.g., URIs).
• Hard to discover and/or memorize.
3
Ad-hoc Object Retrieval
(informal definition)
• “Given the description of an entity, give me back its identifier”
• Description can be keywords (e.g., “Harry Potter”).
• More than one identifier per entity (e.g., DBpedia + Freebase).
• How to evaluate returned results?
Ad-hoc Object Retrieval
(formal definition by Pound et al.)
• Input: unstructured query q and data graph G.
• Output: ranked list of resource identifiers (URIs) from G.
• Evaluation: results (URIs) scored by a judge with access to all the information contained in or linked to the resource.
• Standard collections exist.
http://ex.plode.us/tag/harry+potter
http://www.vox.com/explore/interests/harry%20potter
http://www.flickr.com/groups/harrypotterandthedeathlyhallows/
http://harrypotter.wizards.pro/

http://dbpedia.org/resource/Harry_Potter_and_the_Deathly_Hallows
http://www.aktors.org/scripts/EdinburghPeople.dome#stephenp.aiai.ed.ac.uk
http://harrypotter.wizards.pro/
http://ebiquity.umbc.edu/person/html/Harry/Chen/
http://dbpedia.org/resource/Ceramist
5
Overview of Our Solution
Inverted indices on
the LOD Cloud...
...and RDF store
containing the data.
Simple NLP techniques,
Autocompletion,
Pseudo-relevance feedback
BM25,
BM25F
6
A Simple Example
Query: SIGIR
Query processing: NLP techniques, query auto-completion, pseudo-relevance feedback
II + ranking function(s):
1. http://dbpedia.org/…/SIGIR
2. http://dbpedia.org/…/IRAQ
3. …
Graph traversals + final ranking function:
1. http://dbpedia.org/…/SIGIR
2. http://freebase.com/…/sigir
3. http://dbpedia.org/…/IRAQ
…
Open questions:
• How to build the II?
• Which properties should we follow?
• How to rank new results?
7
Outline
1. Inverted Indices
2. Graph-Based Entity Search
2.1. Object Properties vs Datatype Properties
2.2. Properties to Follow
3. Experimental Results
3.1. Experimental Setting
3.2. IR Techniques: Experimental Results
3.3. Evaluation of the Hybrid Approaches
3.4. Overhead of the Graph Traversal
8
1. Inverted Indices (IIs)
• Simple inverted index:
• index all literals attached to each node in the input graph.
• “movie” → http://…types/film
• Structured inverted index with three fields:
• URI - tokenized URIs identifying entities.
• Label - manually selected datatype properties pointing to textual descriptions of the entity (e.g., label, title, name, full-name, …).
• Attributes - all other literals.
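The three-field index described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the tokenizer, the `LABEL_PREDICATES` set, the `example.org` URIs, and the function names are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical label predicates; the slides only say they were selected manually.
LABEL_PREDICATES = {"label", "title", "name", "full-name"}

def tokenize(text):
    """Lowercase the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]

def local_name(uri):
    """Return the last path or fragment segment of a URI."""
    return re.split(r"[/#]", uri.rstrip("/"))[-1]

def build_structured_index(triples):
    """Build field -> token -> set of entity URIs from (subject, predicate, literal) triples."""
    index = {
        "uri": defaultdict(set),
        "label": defaultdict(set),
        "attributes": defaultdict(set),
    }
    for subject, predicate, literal in triples:
        # URI field: tokens taken from the entity identifier itself.
        for token in tokenize(local_name(subject)):
            index["uri"][token].add(subject)
        # Label field for the selected predicates, attributes field for everything else.
        field = "label" if local_name(predicate) in LABEL_PREDICATES else "attributes"
        for token in tokenize(literal):
            index[field][token].add(subject)
    return index

# Toy data with made-up example.org URIs.
triples = [
    ("http://example.org/resource/Harry_Potter", "http://example.org/prop/label", "Harry Potter"),
    ("http://example.org/resource/Harry_Potter", "http://example.org/prop/genre", "fantasy movie"),
]
index = build_structured_index(triples)
```

A simple (unstructured) index would be the same loop with a single field that receives every literal.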
BM25(F), query auto-completion, query extension, relevance feedback
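For readers unfamiliar with BM25, a minimal scoring sketch follows. It uses the common textbook form with a smoothed, non-negative IDF; the parameter values `k1=1.2` and `b=0.75` are widely used defaults and are an assumption here, not settings reported in the slides. BM25F extends the same idea by combining per-field term frequencies with field weights, which is not shown.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs, k1=1.2, b=0.75):
    """Score each tokenized document (dict: doc id -> list of tokens) against the query."""
    n_docs = len(docs)
    avg_len = sum(len(tokens) for tokens in docs.values()) / n_docs
    # Document frequency: number of documents containing each term.
    doc_freq = Counter()
    for tokens in docs.values():
        doc_freq.update(set(tokens))
    scores = {}
    for doc_id, tokens in docs.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query_tokens:
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5) + 1.0)
            # Length normalization: longer-than-average documents are penalized.
            norm = tf + k1 * (1.0 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1.0) / norm
        scores[doc_id] = score
    return scores

# Toy collection with hypothetical entity ids.
docs = {
    "a": ["harry", "potter"],
    "b": ["harry", "chen"],
    "c": ["ceramist"],
}
scores = bm25_scores(["harry", "potter"], docs)
```

In this toy collection, document "a" matches both query terms and therefore scores highest, while "c" matches none and scores zero.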
9
2. Graph-Based Entity Search
[Figure: IR results → top-N entities → properties p1, p2, …, p_m → new URIs → filter sim(e, q) > τ → assign scores (e.g., 0.284, 1.428, 0.556) → merged re-ranked results]
• Take top-N docs.
• Follow links/properties and get new URIs.
• Filter new results by text similarity w.r.t. the user query.
• Assign scores, then merge and re-rank the results.
Scoring functions: count sim > τ, avg sim > τ, sum sim, avg sim, sum BM25 - ε
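The steps on this slide can be sketched as below. This is one possible reading, stated as an assumption: a newly found URI that passes the similarity filter receives the sum of the BM25 scores of the top-N entry points that reach it, minus a small ε, so that it ranks just below its strongest source. The function name, the `neighbors` and `sim` callables, and the prefixed identifiers are hypothetical.

```python
from collections import defaultdict

def expand_and_rerank(ir_results, neighbors, sim, top_n=3, tau=0.0, epsilon=0.001):
    """Expand the top-N IR results through the graph and merge the new URIs.

    ir_results: list of (uri, bm25_score), sorted by score in descending order.
    neighbors:  callable, uri -> iterable of URIs reached via the followed properties.
    sim:        callable, uri -> text similarity between that entity and the query.
    """
    original = dict(ir_results)
    gathered = defaultdict(float)
    for uri, bm25 in ir_results[:top_n]:
        for new_uri in set(neighbors(uri)):
            # Keep only URIs that are new and similar enough to the query.
            if new_uri not in original and sim(new_uri) > tau:
                gathered[new_uri] += bm25
    merged = dict(original)
    for new_uri, total in gathered.items():
        merged[new_uri] = total - epsilon  # the "sum BM25 - ε" scoring function
    return sorted(merged.items(), key=lambda item: item[1], reverse=True)

# Toy data loosely modeled on the SIGIR example; all identifiers are made up.
graph = {"dbpedia:SIGIR": ["freebase:sigir", "dbpedia:Ceramist"]}
sims = {"freebase:sigir": 0.9, "dbpedia:Ceramist": 0.0}
ranked = expand_and_rerank(
    [("dbpedia:SIGIR", 2.0), ("dbpedia:IRAQ", 1.0)],
    neighbors=lambda uri: graph.get(uri, []),
    sim=lambda uri: sims.get(uri, 0.0),
)
```

With this toy input the new URI lands between the two original results, and the neighbor with zero similarity is filtered out.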
10
2.1. Object Properties vs Datatype Properties
• Object properties:
• connect different entities
• explore all the graph
• Datatype properties:
• give additional info about entities
• explore just the neighborhood of a node
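The distinction can be illustrated with a small sketch that splits triples by the kind of object they point to. It assumes a toy convention in which any object string starting with `http://` or `https://` is a URI and everything else is a literal; real RDF libraries carry explicit term types instead. The triples and function names are made up for the example.

```python
def is_uri(value):
    """Toy convention for this sketch: a term is a URI if it starts with http:// or https://."""
    return value.startswith(("http://", "https://"))

def split_properties(triples):
    """Separate triples whose object is another entity from those whose object is a literal."""
    object_props, datatype_props = [], []
    for subject, predicate, obj in triples:
        target = object_props if is_uri(obj) else datatype_props
        target.append((subject, predicate, obj))
    return object_props, datatype_props

# Hypothetical example data.
triples = [
    ("http://example.org/A", "http://www.w3.org/2002/07/owl#sameAs", "http://example.org/B"),
    ("http://example.org/A", "http://example.org/prop/label", "Entity A"),
]
object_props, datatype_props = split_properties(triples)
```

Only the object-property triples lead to other nodes, which is why following them can reach the rest of the graph.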
11
2.2. Properties to Follow
• RDF graph queried with SPARQL queries.
• Scope 1 queries vs Scope 2 queries.
• Set of predicates to follow selected using:
• Common sense (e.g., sameAs)
• Statistics from the data
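As a rough illustration, the sketch below builds two SPARQL query strings. It assumes that a Scope 1 query follows one property from the entry-point entity and that a Scope 2 query follows two properties through an intermediate node; the slides do not define the terms, so this reading is an assumption. The function names and the `example.org` URIs are hypothetical, while `owl:sameAs` is the standard OWL predicate.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def scope1_query(entity_uri, predicate_uri):
    """One hop: entities directly linked to the entry point by the given predicate."""
    return (
        "SELECT DISTINCT ?x WHERE { "
        f"<{entity_uri}> <{predicate_uri}> ?x . "
        "}"
    )

def scope2_query(entity_uri, first_predicate, second_predicate):
    """Two hops: entities reached through one intermediate node."""
    return (
        "SELECT DISTINCT ?x WHERE { "
        f"<{entity_uri}> <{first_predicate}> ?mid . "
        f"?mid <{second_predicate}> ?x . "
        "}"
    )

# Hypothetical entry point returned by the inverted index.
entry_point = "http://example.org/resource/SIGIR"
q1 = scope1_query(entry_point, OWL_SAME_AS)
q2 = scope2_query(entry_point, "http://example.org/prop/related", OWL_SAME_AS)
```

The resulting strings would be sent to whatever SPARQL endpoint fronts the RDF store.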
12
Properties to Follow: Two Examples
Entry point
given by the II
13
3. Experimental Results
14
3.1 Experimental Setting
• SemSearch 2010 and 2011 test sets:
• Billion Triple Challenge 2009 (BTC2009)
• 1.3 billion RDF triples crawled from the LOD cloud.
• 92 and 50 queries, respectively.
• Evaluation of systems with depth-10 pooling by means of crowdsourcing.
• Measures taken into consideration: Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG), early Precision (P10)
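The three measures can be computed per query as in the sketch below; MAP is then the mean of the per-query average precision. The NDCG variant shown uses the raw relevance grade as gain with a log2 discount, which is one common formulation and an assumption here, since the slides do not say which variant was used.

```python
import math

def precision_at_k(ranked, relevant, k=10):
    """Fraction of the top-k results that are relevant."""
    return sum(1 for uri in ranked[:k] if uri in relevant) / k

def average_precision(ranked, relevant):
    """Mean of the precision values at the rank of each relevant result."""
    hits, total = 0, 0.0
    for rank, uri in enumerate(ranked, start=1):
        if uri in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0

def ndcg(ranked, gains, k=10):
    """NDCG@k, where gains maps each URI to its graded relevance."""
    dcg = sum(
        gains.get(uri, 0) / math.log2(rank + 1)
        for rank, uri in enumerate(ranked[:k], start=1)
    )
    ideal = sorted(gains.values(), reverse=True)[:k]
    idcg = sum(g / math.log2(rank + 1) for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Toy ranking with hypothetical ids: "a" and "c" are the relevant results.
ranked = ["a", "b", "c", "d"]
ap = average_precision(ranked, {"a", "c"})
p2 = precision_at_k(ranked, {"a", "c"}, k=2)
score = ndcg(ranked, {"a": 1, "c": 1})
```

Here the average precision is (1/1 + 2/3) / 2, since the two relevant results appear at ranks 1 and 3.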
15
Completing Relevance by
Crowdsourcing Judgements
• We obtained relevance judgments for unjudged entities in the top-10 results of our runs by using Amazon MTurk.
• To be fair, we used the same design and settings that were used for the AOR task of SemSearch.
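Finding which entities still need a judgment amounts to pooling the top-10 results of each run and subtracting the entities already judged. A minimal sketch, with hypothetical data structures and names:

```python
def unjudged_top_k(runs, judged, k=10):
    """Collect, per query, the top-k results that have no relevance judgment yet.

    runs:   dict, query id -> list of ranked URI lists (one list per run).
    judged: dict, query id -> set of URIs that already have a judgment.
    """
    to_judge = {}
    for query_id, ranked_lists in runs.items():
        pool = set()
        for ranked in ranked_lists:
            pool.update(ranked[:k])  # depth-k pooling over all runs
        missing = pool - judged.get(query_id, set())
        if missing:
            to_judge[query_id] = missing
    return to_judge

# Toy example with made-up query ids and entity ids.
runs = {"q1": [["e1", "e2"], ["e2", "e3"]], "q2": [["e4"]]}
judged = {"q1": {"e1", "e2"}, "q2": {"e4"}}
pending = unjudged_top_k(runs, judged)
```

Only the entities in `pending` would need to be sent to crowd workers.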
16
3.2. IR Techniques: Experimental Results
Our Baseline.
18
3.3. Evaluation of Hybrid Approaches
N = 3, τ = 0, score = sum BM25 - ε
19
3.4. Overhead of the Graph Traversal
• Time in milliseconds needed for each part of the hybrid approaches.
• Measurements taken on a single machine with cold cache.
Surprisingly small overhead (17% for best results).
20
Conclusions
• AOR = “Given the description of an entity, give me back its identifier”
• Disappointing results using simple IR techniques for the AOR task.
• Hybrid system for AOR:
• combining classic IR techniques + structured database storing graph data.
• Our evaluation shows that the new approach leads to significantly better results (up to +25% MAP over the BM25 baseline).
• For the best working configuration found, the overhead caused by the graph traversal part is limited (17% more than running the chosen baseline).
21
Thank you for your attention
• You can find the new relevance judgments at
http://diuf.unifr.ch/xi/HybridAOR.
• More info at www.exascale.info.
• In the following days you’ll find our paper, this presentation,
and the new crowdsourced relevance judgements at
www.exascale.info/AOR.

Boost PC performance: How more available memory can improve productivity
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 

Combining Inverted Indices and Structured Search for Ad-hoc Object Retrieval

  • 1. Combining Inverted Indices and Structured Search for Ad-hoc Object Retrieval. Alberto Tonon, Gianluca Demartini, Philippe Cudré-Mauroux. eXascale Infolab - University of Fribourg - Switzerland. {firstname.lastname}@unifr.ch. SIGIR 2012 - Monday, August 13th 2012
  • 2. Motivation • Many search engine queries are about entities. • An increasingly large amount of entity data is available online. • Often represented as huge graphs, e.g., the LOD cloud, Google Knowledge Graph, Facebook social graph. • Globally unique entity identifiers (e.g., URIs). • Hard to discover and/or memorize.
  • 3. Ad-hoc Object Retrieval (informal definition) • “Given the description of an entity, give me back its identifier.” • Description can be keywords (e.g., “Harry Potter”). • More than one identifier per entity (e.g., dbpedia + freebase). • How to evaluate returned results?
  • 4. Ad-hoc Object Retrieval (formal definition by Pound et al.) • Input: unstructured query q and data graph G. • Output: ranked list of resource identifiers (URIs) from G. • Evaluation: results (URIs) scored by a judge with access to all the information contained in or linked to the resource. • Standard collections exist. • Example result URIs (from the figure): http://ex.plode.us/tag/harry+potter, http://www.vox.com/explore/interests/harry%20potter, http://www.flickr.com/groups/harrypotterandthedeathlyhallows/, http://harrypotter.wizards.pro/, http://dbpedia.org/resource/Harry_Potter_and_the_Deathly_Hallows, http://www.aktors.org/scripts/EdinburghPeople.dome#stephenp.aiai.ed.ac.uk, http://ebiquity.umbc.edu/person/html/Harry/Chen/, http://dbpedia.org/resource/Ceramist
  • 5. Overview of Our Solution • Inverted indices on the LOD Cloud and an RDF store containing the data. • Simple NLP techniques, autocompletion, pseudo-relevance feedback. • BM25, BM25F.
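
The slide names BM25 as the ranking function over the inverted indices. A minimal sketch of BM25 scoring follows, assuming pre-tokenized documents and the common default parameters k1 = 1.2 and b = 0.75; the function name and parameter values are illustrative assumptions, not taken from the slides.

```python
import math
from collections import Counter


def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score each tokenized document in `docs` against `query_terms` with BM25.

    k1 and b are common defaults, assumed here; the slides do not state them.
    """
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: number of documents that contain each term.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            )
            score += idf * norm
        scores.append(score)
    return scores


docs = [["harry", "potter", "book"], ["harry", "chen"], ["ceramist"]]
scores = bm25_scores(["harry", "potter"], docs)
```

In this toy collection the first document matches both query terms and scores highest, while the third matches none and scores zero.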
  • 6. A Simple Example • Query “SIGIR”: NLP techniques, query auto-completion, pseudo-relevance feedback. • II + ranking function(s) (How to build the II?): 1. http://dbpedia.org/…/SIGIR 2. http://dbpedia.org/…/IRAQ 3. … • Graph traversals (Which properties should we follow?). • Final ranking function (How to rank new results?): 1. http://dbpedia.org/…/SIGIR 2. http://freebase.com/…/sigir 3. http://dbpedia.org/…/IRAQ …
  • 7. Outline 1. Inverted Indices 2. Graph-Based Entity Search 2.1. Object Properties vs Datatype Properties 2.2. Properties to Follow 3. Experimental Results 3.1. Experimental Setting 3.2. IR Techniques: Experimental Results 3.3. Evaluation of the Hybrid Approaches 3.4. Overhead of the Graph Traversal
  • 8. 1. Inverted Indices (IIs) • Simple inverted index: index all literals attached to each node in the input graph (e.g., “movie” → http://…types/film). • Structured inverted index with three fields: • URI: tokenized URIs identifying entities. • Label: manually selected datatype properties pointing to textual descriptions of the entity (e.g., label, title, name, full-name, …). • Attributes: all other literals. • BM25(F), query auto-completion, query extension, relevance feedback.
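
The three-field structured index described above could be sketched as follows. This is an illustrative assumption about the data layout, not the authors' implementation: entities are given as a dictionary from URI to (property, literal) pairs, the property names in `LABEL_PROPERTIES` are the examples listed on the slide, and the example URI is hypothetical.

```python
import re
from collections import defaultdict

# Label-like properties named on the slide; a real list would be curated by hand.
LABEL_PROPERTIES = {"label", "title", "name", "full-name"}


def tokenize(text):
    """Lowercase and split on anything that is not a letter or digit."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_structured_index(entities):
    """Build {field: {token: set of URIs}} for the fields uri, label, attributes.

    `entities` maps each URI to a list of (property, literal) pairs.
    """
    index = {
        "uri": defaultdict(set),
        "label": defaultdict(set),
        "attributes": defaultdict(set),
    }
    for uri, literals in entities.items():
        for tok in tokenize(uri):
            index["uri"][tok].add(uri)
        for prop, literal in literals:
            field = "label" if prop in LABEL_PROPERTIES else "attributes"
            for tok in tokenize(literal):
                index[field][tok].add(uri)
    return index


def lookup(index, field, term):
    """Return the URIs whose `field` contains `term` (empty set if none)."""
    return index[field].get(term.lower(), set())


# Hypothetical entity used only for illustration.
HP = "http://example.org/resource/Harry_Potter"
entities = {HP: [("label", "Harry Potter"), ("abstract", "A series of fantasy novels")]}
index = build_structured_index(entities)
```

Keeping the fields separate is what lets a field-weighted ranking function such as BM25F treat a match in the label differently from a match in an arbitrary attribute.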
  • 9. 2. Graph-Based Entity Search • Take the top-N docs from the IR results. • Follow links/properties (p1, p2, …, p_m) and get new URIs. • Filter new results by text similarity with respect to the user query (sim(e, q) > τ?). • Assign scores, then merge and re-rank the results. • Scoring functions: count sim > τ, avg sim > τ, sum sim, avg sim, sum BM25 - ε.
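
The steps on this slide can be sketched as a small expansion and re-ranking routine. Several details are assumptions made only for illustration: Jaccard word overlap stands in for the unspecified text similarity, "sum BM25 - ε" is read as "the summed BM25 score of the seed results that reach a new entity, minus a small ε", and all URIs and the graph layout are hypothetical.

```python
def jaccard(a, b):
    """Word-overlap similarity; a stand-in for the unspecified sim(e, q)."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def expand_and_rerank(query, ir_results, graph, texts, n=3, tau=0.0, eps=1e-3):
    """Expand the top-n IR results through the graph and re-rank.

    ir_results: list of (uri, bm25_score), sorted by decreasing score.
    graph: dict uri -> list of (property, neighbour_uri).
    texts: dict uri -> text used to compare an entity with the query.
    """
    base = dict(ir_results)
    found = {}  # new URI -> summed BM25 score of the seeds that reach it
    for seed, seed_score in ir_results[:n]:
        for _prop, target in graph.get(seed, []):
            if target in base:
                continue  # already retrieved by the inverted index
            if jaccard(query, texts.get(target, "")) > tau:
                found[target] = found.get(target, 0.0) + seed_score
    merged = dict(base)
    for uri, total in found.items():
        merged[uri] = total - eps  # assumed reading of "sum BM25 - eps"
    return sorted(merged.items(), key=lambda kv: kv[1], reverse=True)


ir_results = [("ex:SIGIR", 2.0), ("ex:IRAQ", 1.0)]
graph = {"ex:SIGIR": [("sameAs", "fb:sigir"), ("link", "ex:Unrelated")]}
texts = {"fb:sigir": "SIGIR conference", "ex:Unrelated": "ceramics"}
ranked = expand_and_rerank("sigir", ir_results, graph, texts)
```

Under this reading, an entity reached from a single seed ranks just below that seed, which mirrors the example in which the freebase entry for SIGIR is inserted between the two original results.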
  • 10. 2.1. Object Properties vs Datatype Properties • Object properties: connect different entities; explore the whole graph. • Datatype properties: give additional info about entities; explore just the neighborhood of a node.
  • 11. 2.2. Properties to Follow • RDF graph queried with SPARQL queries. • Scope 1 queries vs Scope 2 queries. • Set of predicates to follow selected using: common sense (e.g., sameAs); statistics from the data.
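
The slide does not define Scope 1 and Scope 2. The sketch below assumes that a Scope 1 query follows the selected predicates one hop from the entry point and a Scope 2 query follows them two hops; that reading, the function names, and the example seed URI are all assumptions. The queries use the `VALUES` clause from SPARQL 1.1.

```python
def _values(predicates):
    """Render predicate IRIs as the body of a SPARQL VALUES clause."""
    return " ".join(f"<{p}>" for p in predicates)


def scope1_query(seed_uri, predicates):
    """One hop from the seed along any of the selected predicates (assumed reading)."""
    return (
        "SELECT DISTINCT ?x WHERE { "
        f"VALUES ?p {{ {_values(predicates)} }} "
        f"<{seed_uri}> ?p ?x . "
        "FILTER(isIRI(?x)) }"
    )


def scope2_query(seed_uri, predicates):
    """Two hops from the seed along the selected predicates (assumed reading)."""
    return (
        "SELECT DISTINCT ?x WHERE { "
        f"VALUES ?p1 {{ {_values(predicates)} }} "
        f"VALUES ?p2 {{ {_values(predicates)} }} "
        f"<{seed_uri}> ?p1 ?mid . "
        "?mid ?p2 ?x . "
        "FILTER(isIRI(?x)) }"
    )


SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
SEED = "http://example.org/resource/SIGIR"  # hypothetical entry point from the II
q1 = scope1_query(SEED, [SAME_AS])
q2 = scope2_query(SEED, [SAME_AS])
```

The `FILTER(isIRI(?x))` restriction keeps only resources, since literals reached through datatype properties cannot be returned as entity identifiers.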
  • 12. Properties to Follow: Two Examples • Entry point given by the II.
  • 14. 3.1. Experimental Setting • SemSearch 2010 and 2011 test sets: • Billion Triple Challenge 2009 (BTC2009): 1.3 billion RDF triples crawled from the LOD cloud. • 92 and 50 queries, respectively. • Evaluation of systems with depth-10 pooling by means of crowdsourcing. • Measures taken into consideration: Mean Average Precision (MAP), Normalized Discounted Cumulative Gain (NDCG), early precision (P10).
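
The three measures named above can be computed from a ranked list of relevance grades. The sketch below uses the standard textbook definitions with a log2 discount for NDCG; the exact gain and discount used in the SemSearch evaluation are not given in the slides, so those choices are assumptions.

```python
import math


def precision_at_k(relevances, k=10):
    """Fraction of the top-k results that are relevant (grade > 0)."""
    return sum(1 for r in relevances[:k] if r > 0) / k


def average_precision(relevances, n_relevant):
    """Average of precision values at the ranks of the relevant results.

    n_relevant is the total number of relevant items for the query.
    """
    hits, total = 0, 0.0
    for rank, r in enumerate(relevances, start=1):
        if r > 0:
            hits += 1
            total += hits / rank
    return total / n_relevant if n_relevant else 0.0


def mean_average_precision(per_query):
    """MAP over a list of (relevances, n_relevant) pairs, one per query."""
    return sum(average_precision(r, n) for r, n in per_query) / len(per_query)


def ndcg(relevances, all_grades, k=10):
    """NDCG@k with a log2 discount; all_grades lists every judged grade."""

    def dcg(grades):
        return sum(g / math.log2(i + 1) for i, g in enumerate(grades[:k], start=1))

    ideal = dcg(sorted(all_grades, reverse=True))
    return dcg(relevances) / ideal if ideal else 0.0


run = [1, 0, 1]  # relevance grades of the returned results, in rank order
```

For the toy run above, the relevant results sit at ranks 1 and 3, so average precision is (1/1 + 2/3) / 2.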
  • 15. Completing Relevance Judgements by Crowdsourcing • We obtained relevance judgments for unjudged entities in the top-10 results of our runs by using Amazon MTurk. • To be fair, we used the same design and settings that were used for the AOR task of SemSearch.
  • 16. 3.2. IR Techniques: Experimental Results • Our Baseline.
  • 17. 3.3. Evaluation of Hybrid Approaches • N = 3, τ = 0, score = sumBM25 - ε
  • 18. 3.4. Overhead of the Graph Traversal • Time in milliseconds needed for each part of the hybrid approaches. • Measures taken on a single machine with cold cache. • Surprisingly small overhead (17% for best results).
  • 19. Conclusions • AOR = “Given the description of an entity, give me back its identifier.” • Disappointing results using simple IR techniques for the AOR task. • Hybrid system for AOR: combining classic IR techniques + a structured database storing graph data. • Our evaluation shows that the new approach leads to significantly better results (up to +25% MAP over the BM25 baseline). • For the best working configuration found, the overhead caused by the graph traversal part is limited (17% more than running the chosen baseline).
  • 20. Thank you for your attention • You can find the new relevance judgments at http://diuf.unifr.ch/xi/HybridAOR. • More info at www.exascale.info. • In the following days you’ll find our paper, this presentation, and the new crowdsourced relevance judgments at www.exascale.info/AOR.

Editor's notes

  1. Many search engine queries are about entities (more than half); there is the task...
  2. tell that literals are strings attached to some node
  3. just the only scoring function
  4. tell what sameAs is
  5. The data is a graph; the inverted index gives us an entry point and then we walk
  6. TREC-like collection/test set; depth-10 pooling: everyone here knows it!
  7. Say that the simple index is “or”; UL, LA, ULA are “and”. Say disappointment with the first result with BM25: we tried to do just II but it didn’t work, and then we decided to go for graph… NO GOOGLE
  8. Compare JUST s_1 with s_2 (lower recall but higher precision)
  9. s2_3 doesn’t follow wikilinks. Indices and database were resident on the machine. We didn’t focus on efficiency