SlideShare une entreprise Scribd logo
1  sur  35
Elasticsearch Meetup | Amsterdam | April 7, 2016
Maarten Roosendaal & Anne Veling
introduction
• Anne Veling
– Elasticsearch consultancy and custom
training
– Performance and Stability Troubleshooting
– Software Architect, Team Lead
• Hierarchical data model, multiple levels
• High volume
– searches
– data changes
• Complex query requirements
– Both Product and Offer fields in query
– Facet on both levels
bol.com challenge
Products and Offers
faster indexing
faster searching
Test Data Creation
• Node.js Script creating random data
– Product
• Title: two random nouns from noun list
• Category: pick one out 26 nouns
• Half have no offer, half between 1-4
– Offer
• Random price between 1-20
• Seller: pick one out of 10k
• Stream in memory, flush out to disk in 3 flavors
– Each flavor keeping its own bulk size of 100k
– For 1M, 10M and 100M products
Document
{
"seller": "seller1203",
"price": 7,
"stock": 2,
"deliveryCode": 1,
"product": {
"id": "product95826",
"familyId": "family56744",
"title": "lunchroom representative",
"category": "crime"
}
}
Nested
Nested
{
"_id": "product95826",
"familyId": "family56744",
"title": "lunchroom representative",
"category": "crime",
"offers": [
{
"seller": "seller1203",
"price": 7,
"stock": 2,
"deliveryCode": 1
}
]
}
Parent/Child
{
"_id": "product95826",
"familyId": "family56744",
"title": "lunchroom representative",
"category": "crime”
}
{
"_parent": "product95826"
"seller": "seller1203",
"price": 7,
"stock": 2,
"deliveryCode": 1
}
• Zipped data files
– 1M: 86Mb
– 10M: 860Mb
– 100M: 8.6Gg
Getting it there
Indexing?
Indexing
• 1M product set, local naive
– 80s Document
– 41s Nested
– 64s Parent/Child
• ES index bottleneck:
– Your source system and latency
it can slurp it up faster than you can serve it
Let’s take a break
Use Cases
Use Case A Use Case B Use Case C
Product Search Word in Title Word in Title
∃ DeliveryC = 0
Word in Title
∃ Price < P
Order By Relevance Relevance (Lowest) Price
Display for top N
products
Product Fields
Cheapest Offer
fields
Product Fields
Correct Cheapest
Offer fields
Product Fields
Cheapest Offer
fields
Aggregate On Category Category Category
∀ Offer SellerId ∀ Correct Offers
SellerId
∀ Correct Offers
SellerId
∀ Offer Price ∀ Correct Offers
Price
∀ Correct Offers
Price
∀ Offer
DeliveryCode
∀ Correct Offers
DeliveryCode
∀ Correct Offers
DeliveryCode
• Product
• Offer
Use Cases
D: query B, roll up by family
• Families (with products with
offers)
– with product.title:lunchroom
– filter by
product.offer.deliveryCode:tom
orrow
Searching for a lunchroom
How hard can it be?
Let’s search
POST /boltest1m_doc/_search -> 3046
{
"query": {
"term": {
"product.title": {
"value": "lunchroom"
}
}
}
}
POST /boltest1m_nested/_search -> 2026
{
"query": {
"term": {
"title": {
"value": "lunchroom"
}
}
}
}
POST /boltest1m_parentchild/_search -> 2022
{
"query": {
"has_parent": {
"parent_type": "product",
"query": {
"term": {
"title": {
"value": "lunchroom"
}
}
}
}
}
}
ElasticSearch docs (and Lucene docs)
Product with Doc Nested
Parent/
Child
no offer 1 1 (1) 1
1 offer 1 1 (2) 2
2 offers 2 1 (3) 3
Real Queries
• Add Details, Sorting
• Product Facets
– Category
• Offer Facets
– Seller ID
– Price Buckets
– Delivery Code
Compare the numbers…
Explain the differences...
A: Doc
A: Nested
A: Parent/Child
Query Tips
• Use aggregations
– Cardinality
– top_hits ♥️ (with top_score)
• Smart Grouping & Field Collapsing
• Slooooow 😢
– inner_hits
• Don’t forget post-filtering or result page
lookup
Ice Cream Bounty
for making top_hits aggregation fast
Testing
Results
0
20
40
60
80
100
120
140
160
180
200
a b c d
1m tun 30102015 32 GB new queries
doc
nested
parentchild
0
500
1000
1500
2000
2500
3000
3500
a b c d
10m tun 30102015 32 GB new queries
doc
nested
parentchild
Conclusions
• Parent/Child has limitations
– Combining cross-level queries with
aggregations in one go
• Doc not as fast as we’d expected
– Because we needed top_hits aggregation
• Elasticsearch scales predictably
Conclusions
• For us, nested was the best solution
• What is yours?
• What are you searching for?
– What are the rows?
– What are the facets about?
Lessons Learned
• Testing the scalability of your data model
– Fast iterations early on
– Valuable insight in indexing and search
requirements
• Data Modeling is hard
– Do it early
– Make it fun
Tech Lessons Learned
• Don’t forget to tune the ES cluster
– Configure memory ;)
• If bulk file last line has no n, gets ignored!
– count the differences
• 100k bulk files with .000 suffixes ought to
be enough for everyone, right?
• Do not underestimate Sneakernet
Thank You
@anneveling anne@beyondtrees.com

Contenu connexe

Tendances

Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...DataWorks Summit
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Lucidworks
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to ElasticsearchClifford James
 
Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Neo4j
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jArangoDB Database
 
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AGOLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AGLucidworks
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Edureka!
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutesDavid Pilato
 
Real-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
Real-Time Analytics with Solr: Presented by Yonik Seeley, ClouderaReal-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
Real-Time Analytics with Solr: Presented by Yonik Seeley, ClouderaLucidworks
 
Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...
Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...
Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...Fwdays
 
SAS integration with NoSQL data
SAS integration with NoSQL dataSAS integration with NoSQL data
SAS integration with NoSQL dataKevin Lee
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Burak TUNGUT
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseRobert Lujo
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...NoSQLmatters
 
Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Upfoundsearch
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneRahul Jain
 

Tendances (19)

Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
Realtime Analytics and Anomalities Detection using Elasticsearch, Hadoop and ...
 
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
Search Analytics Component: Presented by Steven Bower, Bloomberg L.P.
 
Intro to Elasticsearch
Intro to ElasticsearchIntro to Elasticsearch
Intro to Elasticsearch
 
Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...Designing and Building a Graph Database Application – Architectural Choices, ...
Designing and Building a Graph Database Application – Architectural Choices, ...
 
Performance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4jPerformance comparison: Multi-Model vs. MongoDB and Neo4j
Performance comparison: Multi-Model vs. MongoDB and Neo4j
 
Elasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetupElasticsearch Introduction at BigData meetup
Elasticsearch Introduction at BigData meetup
 
Elasticsearch
ElasticsearchElasticsearch
Elasticsearch
 
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AGOLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
OLAP Battle - SolrCloud vs. HBase: Presented by Dragan Milosevic, Zanox AG
 
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
Elasticsearch Tutorial | Getting Started with Elasticsearch | ELK Stack Train...
 
elasticsearch
elasticsearchelasticsearch
elasticsearch
 
Elasticsearch in 15 minutes
Elasticsearch in 15 minutesElasticsearch in 15 minutes
Elasticsearch in 15 minutes
 
Real-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
Real-Time Analytics with Solr: Presented by Yonik Seeley, ClouderaReal-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
Real-Time Analytics with Solr: Presented by Yonik Seeley, Cloudera
 
Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...
Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...
Евгений Бобров "Powered by OSS. Масштабируемая потоковая обработка и анализ б...
 
SAS integration with NoSQL data
SAS integration with NoSQL dataSAS integration with NoSQL data
SAS integration with NoSQL data
 
Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5Elasticsearch Arcihtecture & What's New in Version 5
Elasticsearch Arcihtecture & What's New in Version 5
 
ElasticSearch - index server used as a document database
ElasticSearch - index server used as a document databaseElasticSearch - index server used as a document database
ElasticSearch - index server used as a document database
 
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
Dan Sullivan - Data Analytics and Text Mining with MongoDB - NoSQL matters Du...
 
Elasticsearch From the Bottom Up
Elasticsearch From the Bottom UpElasticsearch From the Bottom Up
Elasticsearch From the Bottom Up
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 

En vedette

Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseKristijan Duvnjak
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedBeyondTrees
 
Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for ElasticsearchFlorian Hopf
 
Getting the Most Out of Your NoSQL DB
Getting the Most Out of Your NoSQL DBGetting the Most Out of Your NoSQL DB
Getting the Most Out of Your NoSQL DBBigstep
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Sematext Group, Inc.
 
ElasticSearch for data mining
ElasticSearch for data mining ElasticSearch for data mining
ElasticSearch for data mining William Simms
 
Nested and Parent/Child Docs in ElasticSearch
Nested and Parent/Child Docs in ElasticSearchNested and Parent/Child Docs in ElasticSearch
Nested and Parent/Child Docs in ElasticSearchBeyondTrees
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in NetflixDanny Yuan
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life琛琳 饶
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data AnalyticsAmazon Web Services
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaPrajal Kulkarni
 

En vedette (11)

Elasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational databaseElasticsearch as a search alternative to a relational database
Elasticsearch as a search alternative to a relational database
 
ElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learnedElasticSearch in Production: lessons learned
ElasticSearch in Production: lessons learned
 
Data modeling for Elasticsearch
Data modeling for ElasticsearchData modeling for Elasticsearch
Data modeling for Elasticsearch
 
Getting the Most Out of Your NoSQL DB
Getting the Most Out of Your NoSQL DBGetting the Most Out of Your NoSQL DB
Getting the Most Out of Your NoSQL DB
 
Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2Side by Side with Elasticsearch & Solr, Part 2
Side by Side with Elasticsearch & Solr, Part 2
 
ElasticSearch for data mining
ElasticSearch for data mining ElasticSearch for data mining
ElasticSearch for data mining
 
Nested and Parent/Child Docs in ElasticSearch
Nested and Parent/Child Docs in ElasticSearchNested and Parent/Child Docs in ElasticSearch
Nested and Parent/Child Docs in ElasticSearch
 
Elasticsearch in Netflix
Elasticsearch in NetflixElasticsearch in Netflix
Elasticsearch in Netflix
 
How ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps lifeHow ElasticSearch lives in my DevOps life
How ElasticSearch lives in my DevOps life
 
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
(BDT209) Launch: Amazon Elasticsearch For Real-Time Data Analytics
 
Attack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and KibanaAttack monitoring using ElasticSearch Logstash and Kibana
Attack monitoring using ElasticSearch Logstash and Kibana
 

Similaire à Scalable Data Models with Elasticsearch

Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Lucidworks
 
Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Creating AnswerBot with Keras and TensorFlow (TensorBeat)Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Creating AnswerBot with Keras and TensorFlow (TensorBeat)Avkash Chauhan
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemPierre Gutierrez
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation SystemsSalil Navgire
 
Behemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesBehemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesPhilipp Klöckner
 
TopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstTopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstSpark Summit
 
Prepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDBPrepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDBMongoDB
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrTrey Grainger
 
Introduction to Recommendation System
Introduction to Recommendation SystemIntroduction to Recommendation System
Introduction to Recommendation SystemMinha Hwang
 
Query Processing Innovations for data intensive, modern applications
Query Processing Innovations for data intensive, modern applicationsQuery Processing Innovations for data intensive, modern applications
Query Processing Innovations for data intensive, modern applicationsMicrosoft Tech Community
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseLucidworks
 
Building a Recommendation Engine - A Balancing act
Building a Recommendation Engine - A Balancing actBuilding a Recommendation Engine - A Balancing act
Building a Recommendation Engine - A Balancing actElad Rosenheim
 
DataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartDataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartHakka Labs
 
How did it go? The first large enterprise search project in Europe using Shar...
How did it go? The first large enterprise search project in Europe using Shar...How did it go? The first large enterprise search project in Europe using Shar...
How did it go? The first large enterprise search project in Europe using Shar...Petter Skodvin-Hvammen
 
WrangleConf 2017 - Lessons from Integrating ML models into Data Products
WrangleConf 2017 - Lessons from Integrating ML models into Data ProductsWrangleConf 2017 - Lessons from Integrating ML models into Data Products
WrangleConf 2017 - Lessons from Integrating ML models into Data ProductsSharath Rao
 
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...MongoDB
 
Retail Reference Architecture
Retail Reference ArchitectureRetail Reference Architecture
Retail Reference ArchitectureMongoDB
 
Introduction to Enterprise Search
Introduction to Enterprise SearchIntroduction to Enterprise Search
Introduction to Enterprise SearchFindwise
 

Similaire à Scalable Data Models with Elasticsearch (20)

Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...Using Deep Learning and Customized Solr Components to Improve search Relevanc...
Using Deep Learning and Customized Solr Components to Improve search Relevanc...
 
Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Creating AnswerBot with Keras and TensorFlow (TensorBeat)Creating AnswerBot with Keras and TensorFlow (TensorBeat)
Creating AnswerBot with Keras and TensorFlow (TensorBeat)
 
From Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender systemFrom Labelling Open data images to building a private recommender system
From Labelling Open data images to building a private recommender system
 
Data Mining and Recommendation Systems
Data Mining and Recommendation SystemsData Mining and Recommendation Systems
Data Mining and Recommendation Systems
 
Behemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge WebsitesBehemoth SEO: Search Strategy for Huge Websites
Behemoth SEO: Search Strategy for Huge Websites
 
TopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David DurstTopNotch: Systematically Quality Controlling Big Data by David Durst
TopNotch: Systematically Quality Controlling Big Data by David Durst
 
Prepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDBPrepare for Peak Holiday Season with MongoDB
Prepare for Peak Holiday Season with MongoDB
 
Scaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solrScaling Recommendations, Semantic Search, & Data Analytics with solr
Scaling Recommendations, Semantic Search, & Data Analytics with solr
 
Introduction to Recommendation System
Introduction to Recommendation SystemIntroduction to Recommendation System
Introduction to Recommendation System
 
Query Processing Innovations for data intensive, modern applications
Query Processing Innovations for data intensive, modern applicationsQuery Processing Innovations for data intensive, modern applications
Query Processing Innovations for data intensive, modern applications
 
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, FindwiseRelevance in the Wild - Daniel Gomez Vilanueva, Findwise
Relevance in the Wild - Daniel Gomez Vilanueva, Findwise
 
Building a Recommendation Engine - A Balancing act
Building a Recommendation Engine - A Balancing actBuilding a Recommendation Engine - A Balancing act
Building a Recommendation Engine - A Balancing act
 
DataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at InstacartDataEngConf SF16 - Recommendations at Instacart
DataEngConf SF16 - Recommendations at Instacart
 
How did it go? The first large enterprise search project in Europe using Shar...
How did it go? The first large enterprise search project in Europe using Shar...How did it go? The first large enterprise search project in Europe using Shar...
How did it go? The first large enterprise search project in Europe using Shar...
 
Search Analytics - Comperio
Search Analytics - ComperioSearch Analytics - Comperio
Search Analytics - Comperio
 
WrangleConf 2017 - Lessons from Integrating ML models into Data Products
WrangleConf 2017 - Lessons from Integrating ML models into Data ProductsWrangleConf 2017 - Lessons from Integrating ML models into Data Products
WrangleConf 2017 - Lessons from Integrating ML models into Data Products
 
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
Retail Reference Architecture Part 1: Flexible, Searchable, Low-Latency Produ...
 
Retail Reference Architecture
Retail Reference ArchitectureRetail Reference Architecture
Retail Reference Architecture
 
Everything You Wish You Knew About Search
Everything You Wish You Knew About SearchEverything You Wish You Knew About Search
Everything You Wish You Knew About Search
 
Introduction to Enterprise Search
Introduction to Enterprise SearchIntroduction to Enterprise Search
Introduction to Enterprise Search
 

Dernier

Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park masabamasaba
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...Shane Coughlan
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension AidPhilip Schwarz
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is insideshinachiaurasa2
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisamasabamasaba
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfonteinmasabamasaba
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxalwaysnagaraju26
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfayushiqss
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 

Dernier (20)

Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park %in kempton park+277-882-255-28 abortion pills for sale in kempton park
%in kempton park+277-882-255-28 abortion pills for sale in kempton park
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
OpenChain - The Ramifications of ISO/IEC 5230 and ISO/IEC 18974 for Legal Pro...
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
Direct Style Effect Systems -The Print[A] Example- A Comprehension AidDirect Style Effect Systems -The Print[A] Example- A Comprehension Aid
Direct Style Effect Systems - The Print[A] Example - A Comprehension Aid
 
The title is not connected to what is inside
The title is not connected to what is insideThe title is not connected to what is inside
The title is not connected to what is inside
 
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa%in tembisa+277-882-255-28 abortion pills for sale in tembisa
%in tembisa+277-882-255-28 abortion pills for sale in tembisa
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
%in Stilfontein+277-882-255-28 abortion pills for sale in Stilfontein
 
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptxBUS PASS MANGEMENT SYSTEM USING PHP.pptx
BUS PASS MANGEMENT SYSTEM USING PHP.pptx
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdfThe Top App Development Trends Shaping the Industry in 2024-25 .pdf
The Top App Development Trends Shaping the Industry in 2024-25 .pdf
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 

Scalable Data Models with Elasticsearch

  • 1. Elasticsearch Meetup | Amsterdam | April 7, 2016 Maarten Roosendaal & Anne Veling
  • 2. introduction • Anne Veling – Elasticsearch consultancy and custom training – Performance and Stability Troubleshooting – Software Architect, Team Lead
  • 3. • Hierarchical data model, multiple levels • High volume – searches – data changes • Complex query requirements – Both Product and Offer fields in query – Facet on both levels bol.com challenge
  • 4. Products and Offers faster indexing faster searching
  • 5.
  • 6. Test Data Creation • Node.js Script creating random data – Product • Title: two random nouns from noun list • Category: pick one out 26 nouns • Half have no offer, half between 1-4 – Offer • Random price between 1-20 • Seller: pick one out of 10k • Stream in memory, flush out to disk in 3 flavors – Each flavor keeping its own bulk size of 100k – For 1M, 10M and 100M products
  • 7. Document { "seller": "seller1203", "price": 7, "stock": 2, "deliveryCode": 1, "product": { "id": "product95826", "familyId": "family56744", "title": "lunchroom representative", "category": "crime" } }
  • 9. Nested { "_id": "product95826", "familyId": "family56744", "title": "lunchroom representative", "category": "crime", "offers": [ { "seller": "seller1203", "price": 7, "stock": 2, "deliveryCode": 1 } ] }
  • 10. Parent/Child { "_id": "product95826", "familyId": "family56744", "title": "lunchroom representative", "category": "crime” } { "_parent": "product95826" "seller": "seller1203", "price": 7, "stock": 2, "deliveryCode": 1 }
  • 11. • Zipped data files – 1M: 86Mb – 10M: 860Mb – 100M: 8.6Gg Getting it there
  • 13.
  • 14. Indexing • 1M product set, local naive – 80s Document – 41s Nested – 64s Parent/Child • ES index bottleneck: – Your source system and latency it can slurp it up faster than you can serve it
  • 15. Let’s take a break
  • 16. Use Cases Use Case A Use Case B Use Case C Product Search Word in Title Word in Title ∃ DeliveryC = 0 Word in Title ∃ Price < P Order By Relevance Relevance (Lowest) Price Display for top N products Product Fields Cheapest Offer fields Product Fields Correct Cheapest Offer fields Product Fields Cheapest Offer fields Aggregate On Category Category Category ∀ Offer SellerId ∀ Correct Offers SellerId ∀ Correct Offers SellerId ∀ Offer Price ∀ Correct Offers Price ∀ Correct Offers Price ∀ Offer DeliveryCode ∀ Correct Offers DeliveryCode ∀ Correct Offers DeliveryCode • Product • Offer
  • 17. Use Cases D: query B, roll up by family • Families (with products with offers) – with product.title:lunchroom – filter by product.offer.deliveryCode:tom orrow
  • 18. Searching for a lunchroom How hard can it be?
  • 19. Let’s search POST /boltest1m_doc/_search -> 3046 { "query": { "term": { "product.title": { "value": "lunchroom" } } } } POST /boltest1m_nested/_search -> 2026 { "query": { "term": { "title": { "value": "lunchroom" } } } } POST /boltest1m_parentchild/_search -> 2022 { "query": { "has_parent": { "parent_type": "product", "query": { "term": { "title": { "value": "lunchroom" } } } } } }
  • 20.
  • 21. ElasticSearch docs (and Lucene docs) Product with Doc Nested Parent/ Child no offer 1 1 (1) 1 1 offer 1 1 (2) 2 2 offers 2 1 (3) 3
  • 22. Real Queries • Add Details, Sorting • Product Facets – Category • Offer Facets – Seller ID – Price Buckets – Delivery Code Compare the numbers… Explain the differences...
  • 26.
  • 27. Query Tips • Use aggregations – Cardinality – top_hits ♥️ (with top_score) • Smart Grouping & Field Collapsing • Slooooow 😢 – inner_hits • Don’t forget post-filtering or result page lookup
  • 28. Ice Cream Bounty for making top_hits aggregation fast
  • 30. Results 0 20 40 60 80 100 120 140 160 180 200 a b c d 1m tun 30102015 32 GB new queries doc nested parentchild 0 500 1000 1500 2000 2500 3000 3500 a b c d 10m tun 30102015 32 GB new queries doc nested parentchild
  • 31. Conclusions • Parent/Child has limitations – Combining cross-level queries with aggregations in one go • Doc not as fast as we’d expected – Because we needed top_hits aggregation • Elasticsearch scales predictably
  • 32. Conclusions • For us, nested was the best solution • What is yours? • What are you searching for? – What are the rows? – What are the facets about?
  • 33. Lessons Learned • Testing the scalability of your data model – Fast iterations early on – Valuable insight in indexing and search requirements • Data Modeling is hard – Do it early – Make it fun
  • 34. Tech Lessons Learned • Don’t forget to tune the ES cluster – Configure memory ;) • If bulk file last line has no n, gets ignored! – count the differences • 100k bulk files with .000 suffixes ought to be enough for everyone, right? • Do not underestimate Sneakernet