Challenges for industrial-strength
Information Retrieval on Databases
R. Cornacchia, M. Hildebrand, A.P. de Vries, F. Dorssers
KARS2017 - 21 March 2017, Venice, IT
About Spinque
○ Since 2010
○ Spin-off of CWI, Amsterdam
○ “Search by Strategy”
Outline
1. Search is everywhere
2. Tailored search is expected
3. Tailored search needs modelling
4. Search modelling by information specialists
5. Search modelling needs flexible IR & DB
6. IR on DB: it works
Search is everywhere
Real world scenarios
Technical
Desktop
Coding content assistant
Product recommendation
Personalised
newsfeed
Let’s pick a simple one: autocompletion
iphone 7
iphone 5c
iphone 6s
ipho| “autocompletion is trivial”
.. not so fast!
Tailored search is expected
autocompletion
iphone 7
iphone 5c
iphone 6s
ipho|
Basic - products
○ Any matching term from the index
○ Suggest products
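A minimal sketch of this basic variant, assuming the index is simply a list of product names held in memory. The function name `suggest` and the sample catalogue are illustrative, not taken from any product.

```python
import bisect

def suggest(index_terms, prefix, limit=3):
    """Return up to `limit` indexed terms that start with `prefix`."""
    prefix = prefix.lower()
    terms = sorted(t.lower() for t in index_terms)
    # Binary search for the first term that is >= the prefix.
    start = bisect.bisect_left(terms, prefix)
    out = []
    for term in terms[start:]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match
        out.append(term)
        if len(out) == limit:
            break
    return out

# Hypothetical catalogue for illustration.
products = ["iPhone 7", "iPhone 5c", "iPhone 6s", "iPad Air", "Pixel"]
print(suggest(products, "ipho"))
```

Everything that matches the prefix is returned in alphabetical order; stock, popularity and the user play no role yet.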
iphone 7
iphone 5c
iphone 6 cases
ipho|
Basic - products & categories
○ Any matching term from the index
○ Suggest products & categories
iphone 7
iphone 6 cases
iphone 6s
ipho|
Filtered
○ Any matching term from the index
○ “iPhone 5c” out of stock
iphone 8
iphone 7
iphone 6 cases
ipho|
Filtered & ranked
○ “iPhone 5c” out of stock
○ “iPhone 8” the most requested
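Filtering and ranking can already be phrased as one database query. The sketch below assumes a hypothetical `product` table with an `in_stock` flag and a `requests` counter; the schema and the numbers are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product (name TEXT, in_stock INTEGER, requests INTEGER);
INSERT INTO product VALUES
  ('iphone 8',       1, 900),
  ('iphone 7',       1, 700),
  ('iphone 6 cases', 1, 400),
  ('iphone 5c',      0, 300),
  ('ipad air',       1, 800);
""")

def suggest_ranked(prefix, limit=3):
    """Prefix match, drop out-of-stock items, most requested first."""
    # Note: '%' or '_' inside `prefix` would act as LIKE wildcards.
    rows = conn.execute(
        "SELECT name FROM product "
        "WHERE name LIKE ? || '%' AND in_stock = 1 "
        "ORDER BY requests DESC LIMIT ?",
        (prefix, limit),
    ).fetchall()
    return [name for (name,) in rows]

print(suggest_ranked("ipho"))  # ['iphone 8', 'iphone 7', 'iphone 6 cases']
```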
iphone cases
iphone adapters
iphone 7
ipho|
Exploratory
○ First suggest categories..
○ .. then products
iphone 7 cases
iphone 7 adapters
iphone 8
ipho|
Personalised
○ I already own an “iPhone 7”
○ Suggest compatible accessories
○ Suggest upgrade
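One way to sketch the personalised variant, assuming the catalogue records which model an accessory fits and which model a product succeeds. The `Item` fields, the catalogue and the scoring weights are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Item:
    name: str
    compatible_with: str = ""  # accessories: the model they fit
    successor_of: str = ""     # products: the model they upgrade

# Hypothetical catalogue for illustration.
CATALOGUE = [
    Item("iphone 7"),
    Item("iphone 8", successor_of="iphone 7"),
    Item("iphone 7 cases", compatible_with="iphone 7"),
    Item("iphone 7 adapters", compatible_with="iphone 7"),
    Item("iphone 6 cases", compatible_with="iphone 6"),
]

def suggest_personalised(prefix, owned, limit=3):
    """`owned` is a set of product names the user already has."""
    scored = []
    for item in CATALOGUE:
        if not item.name.startswith(prefix) or item.name in owned:
            continue  # no match, or the user already owns it
        if item.compatible_with in owned:
            score = 2  # accessory for an owned product
        elif item.successor_of in owned:
            score = 1  # upgrade of an owned product
        else:
            score = 0
        scored.append((score, item.name))
    # Highest score first, ties broken alphabetically.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]

print(suggest_personalised("ipho", {"iphone 7"}))
```

The point is less the scoring itself than the fact that it needs ownership and compatibility data, which a plain term index does not hold.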
Tailored search is expected
What if my search API isn’t enough?
Tailored search needs modelling
iphone 7 cases
iphone 7 adapters
iphone 8
ipho|
<your favourite autocompletion>
○ Out-of-the-box API may fall short
○ Build custom search API
○ Who? How?
http://localhost:8983/solr/suggest?q=ipho
How do we build custom search APIs?
Search modelling by information specialists
data modelling search modelling
Spinque: Empower the information specialist
Empowering the information specialist
data modelling search modelling
Search modelling by information specialists
Data modelling
Search modelling needs flexible IR & DB
business transactions social media
Search modelling
standard autocompletion custom autocompletion
Search modelling by information specialists
http://spinque/suggest?q=ipho http://spinque/suggest_ranked?q=ipho
The IR & DB challenge
Search modelling needs flexible IR & DB
○ IR & DB both needed even for trivial tasks
○ Different technologies / focus
○ How / where to integrate task results?
○ Do they stay black boxes?
○ Can we express them in the same platform,
and when does this make sense?
http://spinque/suggest_ranked?q=ipho
Text retrieval by strategy
Search modelling needs flexible IR & DB
text retrieval.. ..is just another DB query
○ strategy-driven “collection” and “documents”
○ on-demand indexing
○ it takes just standard SQL
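To illustrate the claim that ranking can be an ordinary query, here is a sketch that scores documents in plain SQL on SQLite. The weighting (term frequency divided by document frequency) is a deliberately simple stand-in chosen for this example, not the ranking model used by Spinque; the table names `doc` and `tf` are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE doc (id INTEGER PRIMARY KEY, body TEXT);
-- One row per (document, term) with its occurrence count: the "index".
CREATE TABLE tf (doc_id INTEGER, term TEXT, n INTEGER);
""")

# Hypothetical collection for illustration.
DOCS = {1: "red pen with red ink", 2: "blue pen", 3: "red notebook"}

for doc_id, body in DOCS.items():
    conn.execute("INSERT INTO doc VALUES (?, ?)", (doc_id, body))
    counts = {}
    for term in body.split():
        counts[term] = counts.get(term, 0) + 1
    conn.executemany("INSERT INTO tf VALUES (?, ?, ?)",
                     [(doc_id, t, n) for t, n in counts.items()])

RANK_SQL = """
WITH doc_freq AS (SELECT term, COUNT(*) AS n_docs FROM tf GROUP BY term)
SELECT tf.doc_id, SUM(tf.n * 1.0 / doc_freq.n_docs) AS score
FROM tf JOIN doc_freq ON doc_freq.term = tf.term
WHERE tf.term IN ({placeholders})
GROUP BY tf.doc_id
ORDER BY score DESC, tf.doc_id
"""

def search(query):
    """Return (doc_id, score) pairs, best match first."""
    terms = query.split()
    sql = RANK_SQL.format(placeholders=",".join("?" * len(terms)))
    return conn.execute(sql, terms).fetchall()

print(search("red pen"))
```

Here the `tf` table is filled eagerly; building it only when a strategy first needs it would be the "on-demand indexing" idea.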
Graph DB by strategy
Search modelling needs flexible IR & DB
Visual modelling Relational Algebra Graph
subject  property      object
123      name          pen
123      availability  in stock
123      price         9.99
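The triple table above can be queried with ordinary self-joins. A minimal sketch on SQLite, using the column names `subj`, `rel`, `obj` from the SQL later in the deck; product 456 is a hypothetical addition so that the filter has something to exclude.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE graph (subj TEXT, rel TEXT, obj TEXT);
INSERT INTO graph VALUES
  ('123', 'name',         'pen'),
  ('123', 'availability', 'in stock'),
  ('123', 'price',        '9.99'),
  ('456', 'name',         'pencil'),
  ('456', 'availability', 'out of stock'),
  ('456', 'price',        '1.50');
""")

# Name and price of every subject that is in stock:
# one self-join per property we want to read.
in_stock = conn.execute("""
SELECT n.obj AS name, p.obj AS price
FROM graph a
JOIN graph n ON n.subj = a.subj AND n.rel = 'name'
JOIN graph p ON p.subj = a.subj AND p.rel = 'price'
WHERE a.rel = 'availability' AND a.obj = 'in stock'
""").fetchall()

print(in_stock)  # [('pen', '9.99')]
```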
we want DB & ranking
together & seamlessly
what if this.. ..could work on this?
subject  property      object    p
123      name          pen       1.0
123      availability  in stock  0.8
123      price         9.99      1.0
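With a probability column on every triple, a join can carry scores along. The sketch below multiplies the probabilities of the joined triples, which assumes they are independent; product 456 and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE graph (subj TEXT, rel TEXT, obj TEXT, p REAL);
INSERT INTO graph VALUES
  ('123', 'name',         'pen',      1.0),
  ('123', 'availability', 'in stock', 0.8),
  ('123', 'price',        '9.99',     1.0),
  -- Hypothetical second product with a less certain name.
  ('456', 'name',         'pen',      0.6),
  ('456', 'availability', 'in stock', 1.0);
""")

# Pens that are in stock, ranked by the product of both probabilities.
ranked = conn.execute("""
SELECT n.subj, n.p * a.p AS score
FROM graph n
JOIN graph a ON a.subj = n.subj
WHERE n.rel = 'name' AND n.obj = 'pen'
  AND a.rel = 'availability' AND a.obj = 'in stock'
ORDER BY score DESC
""").fetchall()

print(ranked)
```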
Rank. Everything. Always.
Search modelling needs flexible IR & DB
rank products.. ..get ranked orders and customers
Fuhr, Rölleke, 1997, A probabilistic relational algebra for the integration of IR and DB
SQL:
SELECT g.obj, (o.p * g.p) AS p
FROM graph g,
     ranked_orders o
WHERE g.subj = o.id
  AND g.rel = 'orderedBy';
PRA:
PROJECT [$3] (
  JOIN INDEPENDENT [$1=$1] (
    SELECT [$2='orderedBy'] (g),
    ranked_orders ) )
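The correspondence between the two notations can be checked on toy data. The sketch below runs the SQL on SQLite and evaluates the same plan with three small Python functions standing in for the PRA operators. The sample rows are hypothetical, and `project` here does not merge duplicate tuples, matching the SQL shown rather than a full probabilistic algebra.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE graph (subj TEXT, rel TEXT, obj TEXT, p REAL);
CREATE TABLE ranked_orders (id TEXT, p REAL);
INSERT INTO graph VALUES
  ('order1', 'orderedBy', 'alice', 1.0),
  ('order2', 'orderedBy', 'bob',   1.0),
  ('order1', 'contains',  '123',   1.0);
INSERT INTO ranked_orders VALUES ('order1', 0.8), ('order2', 0.5);
""")

sql_rows = conn.execute("""
SELECT g.obj, (o.p * g.p) AS p
FROM graph g, ranked_orders o
WHERE g.subj = o.id
  AND g.rel = 'orderedBy'
ORDER BY 2 DESC
""").fetchall()

# Toy PRA operators: rows are tuples whose last element is the probability.
def select(rows, pred):
    return [r for r in rows if pred(r)]

def join_independent(left, right, li, ri):
    # Independence assumption: probabilities of joined rows multiply.
    return [l[:-1] + r[:-1] + (l[-1] * r[-1],)
            for l in left for r in right if l[li] == r[ri]]

def project(rows, cols):
    return [tuple(r[c] for c in cols) + (r[-1],) for r in rows]

g = conn.execute("SELECT subj, rel, obj, p FROM graph").fetchall()
o = conn.execute("SELECT id, p FROM ranked_orders").fetchall()

# Column positions are 0-based here, so $3 in the plan is index 2.
pra_rows = project(
    join_independent(select(g, lambda r: r[1] == "orderedBy"), o, 0, 0),
    [2],
)
pra_rows.sort(key=lambda r: -r[1])

print(sql_rows)
print(pra_rows)
```

Ranked orders go in, ranked customers come out: the ranking survives the join instead of being lost at a system boundary.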
What about efficiency?
IR on DB: it works
1.1M docs, 2.3GB
4-core i7-3770s, 16GB RAM, 256GB SSD
find documents: 20ms
8M lots, 25K auctions (10GB raw data)
VM (8 CPUs) on Xeon E5-2620, 16GB RAM, 256GB SSD
find lots: 150ms
pre-compute what can be pre-computed.. ..but do it query-driven
○ Index on demand
○ Cache result of relational expressions
○ Algebraic analysis to determine cache
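A rough sketch of query-driven caching, assuming relational expressions are represented as nested tuples. Structural equality of expression trees stands in for the algebraic analysis mentioned above; the `Engine` class and its operators are hypothetical and much simpler than a real engine.

```python
class Engine:
    """Evaluates tiny relational expressions and caches every result."""

    def __init__(self, tables):
        self.tables = tables
        self.cache = {}       # expression tree -> materialised result
        self.evaluations = 0  # counts real (non-cached) evaluations

    def evaluate(self, expr):
        if expr in self.cache:
            return self.cache[expr]
        self.evaluations += 1
        op = expr[0]
        if op == "table":      # ("table", name)
            result = list(self.tables[expr[1]])
        elif op == "select":   # ("select", column, value, child)
            _, col, value, child = expr
            result = [r for r in self.evaluate(child) if r[col] == value]
        elif op == "project":  # ("project", columns, child)
            _, cols, child = expr
            result = [tuple(r[c] for c in cols) for r in self.evaluate(child)]
        else:
            raise ValueError(f"unknown operator: {op}")
        self.cache[expr] = result
        return result

engine = Engine({"graph": [("123", "name", "pen"),
                           ("123", "price", "9.99"),
                           ("456", "name", "ink")]})

# Two queries that share the same selection sub-expression.
names = ("select", 1, "name", ("table", "graph"))
q1 = ("project", (2,), names)
q2 = ("project", (0,), names)

print(engine.evaluate(q1))  # [('pen',), ('ink',)]
print(engine.evaluate(q2))  # [('123',), ('456',)]
print(engine.evaluations)   # 4: the shared selection ran only once
```

Nothing is computed before a query asks for it, but once computed, the shared selection is reused by the second query.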
choose it carefully.. ..then enjoy
○ Main benefits of IR on DB
○ IR as a DB optimisation problem
○ No custom extensions, no vendor lock-in
○ Column-store, CPU-friendly DB engine
Hey, we made our join 20% faster.
You are welcome.
IR on DB: when does it make sense?
○ If you just need text retrieval on documents
○ A Lucene-like engine will serve you well
○ Information needs tend to be more complex
○ Solving them at application level: common and painful
○ A one-platform approach pays off
Conclusions
1. Search is everywhere
○ In the real world..
2. Tailored search is expected
○ ..no two searches are alike.
3. Tailored search needs modelling
○ Someone has to put effort into it..
4. Search modelling by information specialists
○ ..who better than the right person for the job?
5. Search modelling needs flexible IR & DB
○ Who takes care of the low-level details then?
6. IR on DB: it works
○ The right tools. The right architecture.
Challenges ahead
○ Live updates
○ ACID transaction overhead
○ Scale out
○ It’s more than “just an inverted file” to be distributed
○ Even better support for information specialists
○ Strategy auto-tuning
supporting information specialists
Don’t program search engines,
design them

On Starlink, presented by Geoff Huston at NZNOG 2024
 
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Sukhdev Vihar Delhi 💯Call Us 🔝8264348440🔝
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya ShirtChallengers I Told Ya Shirt
Challengers I Told Ya ShirtChallengers I Told Ya Shirt
 
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130  Available With RoomVIP Kolkata Call Girl Alambazar 👉 8250192130  Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
 
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort ServiceEnjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
Enjoy Night⚡Call Girls Dlf City Phase 3 Gurgaon >༒8448380779 Escort Service
 
Russian Call girls in Dubai +971563133746 Dubai Call girls
Russian  Call girls in Dubai +971563133746 Dubai  Call girlsRussian  Call girls in Dubai +971563133746 Dubai  Call girls
Russian Call girls in Dubai +971563133746 Dubai Call girls
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
 

Challenges for Industrial-strength Information Retrieval on Databases

  • 1. Challenges for industrial-strength Information Retrieval on Databases R. Cornacchia, M. Hildebrand, A.P. de Vries, F. Dorssers KARS2017 - 21 March 2017, Venice, IT
  • 2. ○ Since 2010 ○ Spin-off of CWI, Amsterdam ○ “Search by Strategy” About Spinque
  • 3. Outline 1. Search is everywhere 2. Tailored search is expected 3. Tailored search needs modelling 4. Search modelling by information specialists 5. Search modelling needs flexible IR & DB 6. IR on DB: it works
  • 4. Search is everywhere Real world scenarios Technical Desktop Coding content assistant Product recommendation Personalised newsfeed
  • 5. Let’s pick a simple one: autocompletion iphone 7 iphone 5c iphone 6s ipho| “autocompletion is trivial” .. not so fast! Tailored search is expected
  • 6. autocompletion iphone 7 iphone 5c iphone 6s ipho| Basic - products ○ Any matching term from the index ○ Suggest products Tailored search is expected
  • 7. autocompletion iphone 7 iphone 5c iphone 6 cases ipho| Basic - products & categories ○ Any matching term from the index ○ Suggest products & categories Tailored search is expected
  • 8. autocompletion iphone 7 iphone 6 cases iphone 6s ipho| Filtered ○ Any matching term from the index ○ “iPhone 5c” out of stock Tailored search is expected
  • 9. autocompletion iphone 8 iphone 7 iphone 6 cases ipho| Filtered & ranked ○ “iPhone 5c” out of stock ○ “iPhone 8” the most requested Tailored search is expected
  • 10. autocompletion iphone cases iphone adapters iphone 7 ipho| Exploratory ○ First suggest categories.. ○ .. then products Tailored search is expected
  • 11. autocompletion iphone 7 cases iphone 7 adapters iphone 8 ipho| Personalised ○ I already own an “iPhone 7” ○ Suggest compatible accessories ○ Suggest upgrade Tailored search is expected
  • 12. What if my search API isn’t enough? Tailored search needs modelling iphone 7 cases iphone 7 adapters iphone 8 ipho| <your favourite autocompletion> ○ Out-of-the-box API may fall short ○ Build custom search API ○ Who? How? http://localhost:8983/solr/suggest?q=ipho
  • 13. How do we build custom search APIs? Search modelling by information specialists data modelling search modelling Spinque: Empower the information specialist
  • 14. Empowering the information specialist data modelling search modelling Search modelling by information specialists
  • 15. Data modelling Search modelling needs flexible IR & DB business transactions social media
  • 16. Search modelling standard autocompletion custom autocompletion Search modelling by information specialists http://spinque/suggest?q=ipho http://spinque/suggest_ranked?q=ipho
  • 17. The IR & DB challenge Search modelling needs flexible IR & DB ○ IR & DB both needed even for trivial tasks ○ Different technologies / focus ○ How / where to integrate task results? ○ Do they stay black boxes? ○ Can we express them in the same platform, and when does this make sense? http://spinque/suggest_ranked?q=ipho
  • 18. Text retrieval by strategy Search modelling needs flexible IR & DB text retrieval.. ..is just another DB query ○ strategy-driven “collection” and “documents” ○ on-demand indexing ○ it takes just standard SQL
  • 19. Graph DB by strategy Search modelling needs flexible IR & DB Visual modelling Relational Algebra Graph (subject, property, object): (123, name, pen); (123, availability, in stock); (123, price, 9.99)
  • 20. Graph DB by strategy Search modelling needs flexible IR & DB we want DB & ranking together & seamlessly what if this.. ..could work on this? (subject, property, object, p): (123, name, pen, 1.0); (123, availability, in stock, 0.8); (123, price, 9.99, 1.0)
  • 21. Rank. Everything. Always. Search modelling needs flexible IR & DB rank products.. ..get ranked orders and customers Fuhr, Rölleke, 1997, A probabilistic relational algebra for the integration of IR and DB SQL: SELECT g.obj, (o.p * g.p) AS p FROM graph g, ranked_orders o WHERE g.subj = o.id AND g.rel = 'orderedBy'; PRA: PROJECT [$3] (JOIN INDEPENDENT [$1=$1] (SELECT [$2='orderedBy'] (g), ranked_orders))
  • 22. What about efficiency? IR on DB: it works ○ 1.1M docs, 2.3GB, on a 4-core i7-3770s, 16GB RAM, 256GB SSD: find documents in 20ms ○ 8M lots, 25K auctions (10GB raw data), on a VM (8 CPUs) on Xeon E5-2620, 16GB RAM, 256GB SSD: find lots in 150ms
  • 23. What about efficiency? IR on DB: it works pre-compute what can be pre-computed.. ..but do it query-driven ○ Index on demand ○ Cache result of relational expressions ○ Algebraic analysis to determine cache
  • 24. What about efficiency? IR on DB: it works choose it carefully.. ..then enjoy ○ Main benefits of IR on DB ○ IR as a DB optimisation problem ○ No custom extensions, no vendor lock-in ○ Column-store, CPU-friendly DB engine Hey, we made our join 20% faster. You are welcome.
  • 25. IR on DB: when does it make sense? IR on DB: it works ○ If you just need text retrieval on documents ○ A Lucene-like engine will serve you well ○ Information needs tend to be more complex ○ Solving them at application level: common and painful ○ A one-platform approach pays off
  • 26. Conclusions 1. Search is everywhere ○ In the real world.. 2. Tailored search is expected ○ ..there is no search like another. 3. Tailored search needs modelling ○ Someone will put effort in it.. 4. Search modelling by information specialists ○ ..who better than the right person for the job? 5. Search modelling needs flexible IR & DB ○ Who takes care of the low-level details then? 6. IR on DB: it works ○ The right tools. The right architecture.
  • 27. ○ Live updates ○ ACID transactions overhead ○ Scale out ○ It’s more than “just an inverted file” to be distributed ○ Even better support for information specialists ○ Strategy auto-tuning Challenges ahead
  • 28. supporting information specialists Don’t program search engines, design them
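
The "rank everything" join on slide 21 can be tried out with a few lines of Python and an in-memory SQLite database. This is only a minimal sketch, not Spinque's implementation: the table layout (graph(subj, rel, obj, p), ranked_orders(id, p)) is inferred from the SQL on the slide, and the function name and the sample rows are hypothetical. It multiplies the probabilities of joined tuples, which is what an independence assumption amounts to. It does not merge duplicate objects the way a full probabilistic PROJECT would.

```python
import sqlite3


def ranked_customers(graph_rows, order_rows):
    """Propagate order scores to customers through 'orderedBy' edges.

    graph_rows: iterable of (subj, rel, obj, p) tuples (assumed layout).
    order_rows: iterable of (id, p) tuples (assumed layout).
    Returns (obj, p) rows sorted by descending score.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE graph (subj TEXT, rel TEXT, obj TEXT, p REAL)")
    con.execute("CREATE TABLE ranked_orders (id TEXT, p REAL)")
    con.executemany("INSERT INTO graph VALUES (?, ?, ?, ?)", graph_rows)
    con.executemany("INSERT INTO ranked_orders VALUES (?, ?)", order_rows)
    rows = con.execute(
        """
        SELECT g.obj, (o.p * g.p) AS p      -- independent join: multiply scores
        FROM graph g, ranked_orders o
        WHERE g.subj = o.id AND g.rel = 'orderedBy'
        ORDER BY 2 DESC                     -- sort on the computed score
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    # Hypothetical sample data: two ranked orders and the edges around them.
    graph = [
        ("o1", "orderedBy", "alice", 1.0),
        ("o2", "orderedBy", "bob", 0.5),
        ("o1", "name", "order one", 1.0),   # filtered out by the rel predicate
    ]
    orders = [("o1", 0.8), ("o2", 0.6)]
    for obj, p in ranked_customers(graph, orders):
        print(obj, p)
```

Because the score is just another column, the result can be fed into the next relational step (for example, ranking products bought by those customers) without leaving the database.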