SlideShare a Scribd company logo
1 of 24
Download to read offline
USE CASE DIAGNOSIS: WHEN IS SOLR
REALLY THE BEST TOOL?

Michael Hausenblas
Twitter: @mhausenblas

Chief Data Engineer EMEA, MapR Technologies
Agenda
• 
• 
• 
• 
• 

Solr in the Big Data ecosystem
Polyglot Persistence
Common (Big Data) use cases
A checklist
When not to use Solr …
processing
storage

Apache Pig

Apache Zookeeper
Polyglot Persistence
$ ls -al

$ tail –f some.log
$ nc localhost 80

awk 'BEGIN { FS = "," }
/2013-[[:digit:]]+-[[:digit:]]+/ { print $3 }’
sample.csv

tool box

one-size-fits-all
Polyglot Persistence—Backdrop
• 

Michael Stonebraker and Ugur Çetintemel—2005
"One Size Fits All": An Idea Whose Time Has Come and Gone

• 

Martin Fowler—2011
Polyglot Persistence1

• 

Eric Brewer—2012
Ricon Keynote—Advancing Distributed Systems2

1) http://martinfowler.com/bliki/PolyglotPersistence.html
2) https://speakerdeck.com/eric_brewer/ricon-2012-keynote
Polyglot Persistence—Key Points
• 

Use different datastores for different needs

• 

Can apply within an application or cross-enterprise

• 

Encapsulating data access yields loosely coupled components

• 

Find sweet spot between dev/op complexity and flexibility
Common (Big Data)
use cases
Where are we coming from?
• 
• 
• 
• 
• 

Keyword search
Spellcheck & autosuggest
Ranking
Faceted search
Spatial search
Use case:
search-based
recommendation
Search-based recommendation (credit card issuer)
• 

Given
–  customer purchase history
–  merchant designations
–  merchant special offers

• 

Goal
–  Improve existing recommender system
–  Throughput important
Analyze with MapReduce

complete	
  
history	
  

Co-­‐occurrence	
  
(Mahout)	
  

Item	
  meta-­‐data	
  

SolR	
  
SolR	
  
Solr	
  
Indexer	
  
Indexer	
  
indexing	
  

Index	
  
shards	
  
Deploy with search system

user	
  
history	
  

Web	
  >er	
  

Item	
  meta-­‐data	
  

SolR	
  
SolR	
  
Solr	
  
Indexer	
  
Indexer	
  
search	
  

Index	
  
shards	
  
Use case:
log analysis
Log analysis
• 

Given
–  Receive 200,000+ log lines per second

• 

Goal
–  Want to do multi-field search
–  Want to search on log lines with <30 second delay before search
Data Ingestion and Indexing

incoming	
  data	
  

Ka@a	
  

SolR	
  
SolR	
  
Text	
  
Indexer	
  
Indexer	
  
analysis	
  

Solr	
  
indexer	
  

Real-­‐>me	
  
Raw	
  
documents	
  

Older	
  index	
  
shards	
  

Live	
  index	
  
shard	
  

>me-­‐sharded	
  Solr	
  indexes	
  
Search

Query	
  

Solr	
  
search	
  

Web	
  >er	
  
SolR	
  
SolR	
  
Solr	
  
Indexer	
  
Indexer	
  
search	
  

Raw	
  
documents	
  

Older	
  index	
  
shards	
  

Live	
  index	
  
shard	
  
A checklist
Question you may want to ask …
• 

What is the volume of your data* (few GB? up to PB?)

• 

How are your query characteristics?
–  full scans
–  look-ups
–  multiple passes over large parts
–  continuous queries

• 

What’s (more) important: throughput or latency?

*)	
  Note:	
  as	
  long	
  as	
  Moore's	
  law	
  s>ll	
  holds,	
  these	
  figures	
  obviously	
  change	
  on	
  a	
  yearly	
  if	
  not	
  monthly	
  basis.	
  
Key qualifiers
• 

Want exploratory interface rather than aggregates in a dashboard

• 

Data are sparse symbol sets like words or recommendation indicators

• 

Small-ish return sets are OK, especially if facets are good enough

• 

Near-real-time is good enough
When not to use Solr …
Red Flags
• 

You need strong consistency?

• 

JOINS, anyone?

• 
• 
• 

reme
mber
:	
  one
fit	
  all
	
  size	
  d
—too
Want (complex) transactions?
l	
  belt
oes	
  n
	
  appr
ot	
  
oach!
OLTP, streaming (but: near-real-time)
	
  
Graphs?
Let’s stay in touch …

• 

Twitter:
@mhausenblas
@MapR

MapR	
  Nordics	
  
MapR	
  UK	
  
MapR	
  HQ	
  
San	
  Jose,	
  US	
  

MapR	
  DACH	
  

MapR	
  Japan	
  

MapR	
  SE	
  &	
  Benelux	
  
MapR	
  Hyderbad	
  

• 

We’re hiring!

MapR	
  Korea	
  

More Related Content

What's hot

Use cases for cassandra in federal and state government
Use cases for cassandra in federal and state governmentUse cases for cassandra in federal and state government
Use cases for cassandra in federal and state governmentOpenSource Connections
 
Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineeringnathanmarz
 
Dogfooding data at Lyft
Dogfooding data at LyftDogfooding data at Lyft
Dogfooding data at Lyftmarkgrover
 
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Databricks
 
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...
Enterprise Search Europe 2015:  Fishing the big data streams - the future of ...Enterprise Search Europe 2015:  Fishing the big data streams - the future of ...
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...Charlie Hull
 
Data Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah GuidoData Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah GuidoBitly
 
Intro to Python for C# Developers
Intro to Python for C# DevelopersIntro to Python for C# Developers
Intro to Python for C# DevelopersSarah Dutkiewicz
 
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...lucenerevolution
 
Building a lightweight discovery interface for Chinese patents
Building a lightweight discovery interface for Chinese patentsBuilding a lightweight discovery interface for Chinese patents
Building a lightweight discovery interface for Chinese patentsOpenSource Connections
 

What's hot (15)

Use cases for cassandra in federal and state government
Use cases for cassandra in federal and state governmentUse cases for cassandra in federal and state government
Use cases for cassandra in federal and state government
 
Big Search 4 Big Data War Stories
Big Search 4 Big Data War StoriesBig Search 4 Big Data War Stories
Big Search 4 Big Data War Stories
 
Demystifying Data Engineering
Demystifying Data EngineeringDemystifying Data Engineering
Demystifying Data Engineering
 
Dogfooding data at Lyft
Dogfooding data at LyftDogfooding data at Lyft
Dogfooding data at Lyft
 
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
 
Splunk
SplunkSplunk
Splunk
 
Indexing big data in the cloud
Indexing big data in the cloudIndexing big data in the cloud
Indexing big data in the cloud
 
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...
Enterprise Search Europe 2015:  Fishing the big data streams - the future of ...Enterprise Search Europe 2015:  Fishing the big data streams - the future of ...
Enterprise Search Europe 2015: Fishing the big data streams - the future of ...
 
Data Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah GuidoData Day Seattle 2015: Sarah Guido
Data Day Seattle 2015: Sarah Guido
 
Intro to Python for C# Developers
Intro to Python for C# DevelopersIntro to Python for C# Developers
Intro to Python for C# Developers
 
R reproducibility
R reproducibilityR reproducibility
R reproducibility
 
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
Extending Solr: Behind CareerBuilder’s Cloud-like Knowledge Discovery Platfor...
 
Reproducible Data Science with R
Reproducible Data Science with RReproducible Data Science with R
Reproducible Data Science with R
 
Spark at Zillow
Spark at ZillowSpark at Zillow
Spark at Zillow
 
Building a lightweight discovery interface for Chinese patents
Building a lightweight discovery interface for Chinese patentsBuilding a lightweight discovery interface for Chinese patents
Building a lightweight discovery interface for Chinese patents
 

Viewers also liked

Presentatie Wearable health monitoring devices
Presentatie Wearable health monitoring devicesPresentatie Wearable health monitoring devices
Presentatie Wearable health monitoring devices#devdate
 
Lehman Hot Springs DEQ fine response
Lehman Hot Springs DEQ fine responseLehman Hot Springs DEQ fine response
Lehman Hot Springs DEQ fine responseJ. Patrick Lucas
 
Solr for highly customized sitewide navigation - By Shantanu Deo
 Solr for highly customized sitewide navigation -  By Shantanu Deo Solr for highly customized sitewide navigation -  By Shantanu Deo
Solr for highly customized sitewide navigation - By Shantanu Deolucenerevolution
 
2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine
2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine
2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation enginelucenerevolution
 

Viewers also liked (6)

Presentatie Wearable health monitoring devices
Presentatie Wearable health monitoring devicesPresentatie Wearable health monitoring devices
Presentatie Wearable health monitoring devices
 
Lehman Hot Springs DEQ fine response
Lehman Hot Springs DEQ fine responseLehman Hot Springs DEQ fine response
Lehman Hot Springs DEQ fine response
 
Solr for highly customized sitewide navigation - By Shantanu Deo
 Solr for highly customized sitewide navigation -  By Shantanu Deo Solr for highly customized sitewide navigation -  By Shantanu Deo
Solr for highly customized sitewide navigation - By Shantanu Deo
 
Nutrition basics-dossier
Nutrition basics-dossierNutrition basics-dossier
Nutrition basics-dossier
 
2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine
2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine
2013 11-06 lsr-dublin_m_hausenblas_solr as recommendation engine
 
The Typed Index
The Typed IndexThe Typed Index
The Typed Index
 

Similar to 2013 11-07 lsr-dublin_m_hausenblas_when solr is best

Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Lucidworks
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation EnginesTrey Grainger
 
DOXLON November 2016 - Data Democratization Using Splunk
DOXLON November 2016 - Data Democratization Using SplunkDOXLON November 2016 - Data Democratization Using Splunk
DOXLON November 2016 - Data Democratization Using SplunkOutlyer
 
50 Shades of Data - how, when and why Big, Fast, Relational, NoSQL, Elastic, ...
50 Shades of Data - how, when and why Big, Fast, Relational, NoSQL, Elastic, ...50 Shades of Data - how, when and why Big, Fast, Relational, NoSQL, Elastic, ...
50 Shades of Data - how, when and why Big, Fast, Relational, NoSQL, Elastic, ...Lucas Jellema
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemTrey Grainger
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentationTao Feng
 
Intro to Solr in Drupal
Intro to Solr in Drupal Intro to Solr in Drupal
Intro to Solr in Drupal Mediacurrent
 
Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016Looker
 
Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016Looker
 
Elasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyElasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyBozhidar Bozhanov
 
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSession #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSaaS Is Beautiful
 
Improve Performance in Fast Search for SharePoint - Comperio
Improve Performance in Fast Search for SharePoint - ComperioImprove Performance in Fast Search for SharePoint - Comperio
Improve Performance in Fast Search for SharePoint - ComperioComperio - Search Matters.
 
UnderstandingHowSolrCanHelpYourBusinessScale-ECG07.31.2013
UnderstandingHowSolrCanHelpYourBusinessScale-ECG07.31.2013UnderstandingHowSolrCanHelpYourBusinessScale-ECG07.31.2013
UnderstandingHowSolrCanHelpYourBusinessScale-ECG07.31.2013Kirill Morozov
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Open Analytics
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenChristopher Whitaker
 
Building Scalable Aggregation Systems
Building Scalable Aggregation SystemsBuilding Scalable Aggregation Systems
Building Scalable Aggregation SystemsJared Winick
 

Similar to 2013 11-07 lsr-dublin_m_hausenblas_when solr is best (20)

Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
Solr Under the Hood at S&P Global- Sumit Vadhera, S&P Global
 
Building Search & Recommendation Engines
Building Search & Recommendation EnginesBuilding Search & Recommendation Engines
Building Search & Recommendation Engines
 
Solr for Data Science
Solr for Data ScienceSolr for Data Science
Solr for Data Science
 
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
Continuous Analytics & Optimisation using Apache Spark (Big Data Analytics, L...
 
How Solr Search Works
How Solr Search WorksHow Solr Search Works
How Solr Search Works
 
DOXLON November 2016 - Data Democratization Using Splunk
DOXLON November 2016 - Data Democratization Using SplunkDOXLON November 2016 - Data Democratization Using Splunk
DOXLON November 2016 - Data Democratization Using Splunk
 
50 Shades of Data - how, when and why Big, Fast, Relational, NoSQL, Elastic, ...
50 Shades of Data - how, when and why Big, Fast, Relational, NoSQL, Elastic, ...50 Shades of Data - how, when and why Big, Fast, Relational, NoSQL, Elastic, ...
50 Shades of Data - how, when and why Big, Fast, Relational, NoSQL, Elastic, ...
 
The Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data EcosystemThe Apache Solr Smart Data Ecosystem
The Apache Solr Smart Data Ecosystem
 
Strata sf - Amundsen presentation
Strata sf - Amundsen presentationStrata sf - Amundsen presentation
Strata sf - Amundsen presentation
 
Dw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhanDw 07032018-dr pl pradhan
Dw 07032018-dr pl pradhan
 
Intro to Solr in Drupal
Intro to Solr in Drupal Intro to Solr in Drupal
Intro to Solr in Drupal
 
Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016
 
Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016Frank Bien Opening Keynote - Join 2016
Frank Bien Opening Keynote - Join 2016
 
Elasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and MultitenancyElasticsearch - Scalability and Multitenancy
Elasticsearch - Scalability and Multitenancy
 
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from AlgoliaSession #2, tech session: Build realtime search by Sylvain Utard from Algolia
Session #2, tech session: Build realtime search by Sylvain Utard from Algolia
 
Improve Performance in Fast Search for SharePoint - Comperio
Improve Performance in Fast Search for SharePoint - ComperioImprove Performance in Fast Search for SharePoint - Comperio
Improve Performance in Fast Search for SharePoint - Comperio
 
UnderstandingHowSolrCanHelpYourBusinessScale-ECG07.31.2013
UnderstandingHowSolrCanHelpYourBusinessScale-ECG07.31.2013UnderstandingHowSolrCanHelpYourBusinessScale-ECG07.31.2013
UnderstandingHowSolrCanHelpYourBusinessScale-ECG07.31.2013
 
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
Social Media, Cloud Computing, Machine Learning, Open Source, and Big Data An...
 
Open Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe OlsenOpen Data Summit Presentation by Joe Olsen
Open Data Summit Presentation by Joe Olsen
 
Building Scalable Aggregation Systems
Building Scalable Aggregation SystemsBuilding Scalable Aggregation Systems
Building Scalable Aggregation Systems
 

More from lucenerevolution

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucenelucenerevolution
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! lucenerevolution
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solrlucenerevolution
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationslucenerevolution
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloudlucenerevolution
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusterslucenerevolution
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiledlucenerevolution
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs lucenerevolution
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchlucenerevolution
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Stormlucenerevolution
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?lucenerevolution
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APIlucenerevolution
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucenelucenerevolution
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMlucenerevolution
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucenelucenerevolution
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenallucenerevolution
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside downlucenerevolution
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...lucenerevolution
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - finallucenerevolution
 

More from lucenerevolution (20)

Text Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and LuceneText Classification Powered by Apache Mahout and Lucene
Text Classification Powered by Apache Mahout and Lucene
 
State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here! State of the Art Logging. Kibana4Solr is Here!
State of the Art Logging. Kibana4Solr is Here!
 
Search at Twitter
Search at TwitterSearch at Twitter
Search at Twitter
 
Building Client-side Search Applications with Solr
Building Client-side Search Applications with SolrBuilding Client-side Search Applications with Solr
Building Client-side Search Applications with Solr
 
Integrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applicationsIntegrate Solr with real-time stream processing applications
Integrate Solr with real-time stream processing applications
 
Scaling Solr with SolrCloud
Scaling Solr with SolrCloudScaling Solr with SolrCloud
Scaling Solr with SolrCloud
 
Administering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud ClustersAdministering and Monitoring SolrCloud Clusters
Administering and Monitoring SolrCloud Clusters
 
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and ParboiledImplementing a Custom Search Syntax using Solr, Lucene, and Parboiled
Implementing a Custom Search Syntax using Solr, Lucene, and Parboiled
 
Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs Using Solr to Search and Analyze Logs
Using Solr to Search and Analyze Logs
 
Enhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic searchEnhancing relevancy through personalization & semantic search
Enhancing relevancy through personalization & semantic search
 
Real-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and StormReal-time Inverted Search in the Cloud Using Lucene and Storm
Real-time Inverted Search in the Cloud Using Lucene and Storm
 
Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?Solr's Admin UI - Where does the data come from?
Solr's Admin UI - Where does the data come from?
 
Schemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST APISchemaless Solr and the Solr Schema REST API
Schemaless Solr and the Solr Schema REST API
 
High Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with LuceneHigh Performance JSON Search and Relational Faceted Browsing with Lucene
High Performance JSON Search and Relational Faceted Browsing with Lucene
 
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVMText Classification with Lucene/Solr, Apache Hadoop and LibSVM
Text Classification with Lucene/Solr, Apache Hadoop and LibSVM
 
Faceted Search with Lucene
Faceted Search with LuceneFaceted Search with Lucene
Faceted Search with Lucene
 
Recent Additions to Lucene Arsenal
Recent Additions to Lucene ArsenalRecent Additions to Lucene Arsenal
Recent Additions to Lucene Arsenal
 
Turning search upside down
Turning search upside downTurning search upside down
Turning search upside down
 
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
Spellchecking in Trovit: Implementing a Contextual Multi-language Spellchecke...
 
Shrinking the haystack wes caldwell - final
Shrinking the haystack   wes caldwell - finalShrinking the haystack   wes caldwell - final
Shrinking the haystack wes caldwell - final
 

Recently uploaded

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityPrincipled Technologies
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilV3cube
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science
 

Recently uploaded (20)

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
Developing An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of BrazilDeveloping An App To Navigate The Roads of Brazil
Developing An App To Navigate The Roads of Brazil
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 

2013 11-07 lsr-dublin_m_hausenblas_when solr is best

  • 1.
  • 2. USE CASE DIAGNOSIS: WHEN IS SOLR REALLY THE BEST TOOL? Michael Hausenblas Twitter: @mhausenblas Chief Data Engineer EMEA, MapR Technologies
  • 3. Agenda •  •  •  •  •  Solr in the Big Data ecosystem Polyglot Persistence Common (Big Data) use cases A checklist When not to use Solr …
  • 6. $ ls -al $ tail –f some.log $ nc localhost 80 awk 'BEGIN { FS = "," } /2013-[[:digit:]]+-[[:digit:]]+/ { print $3 }’ sample.csv tool box one-size-fits-all
  • 7. Polyglot Persistence—Backdrop •  Michael Stonebraker and Ugur Çetintemel—2005 "One Size Fits All": An Idea Whose Time Has Come and Gone •  Martin Fowler—2011 Polyglot Persistence1 •  Eric Brewer—2012 Ricon Keynote—Advancing Distributed Systems2 1) http://martinfowler.com/bliki/PolyglotPersistence.html 2) https://speakerdeck.com/eric_brewer/ricon-2012-keynote
  • 8. Polyglot Persistence—Key Points •  Use different datastores for different needs •  Can apply within an application or cross-enterprise •  Encapsulating data access yields loosely coupled components •  Find sweet spot between dev/op complexity and flexibility
  • 10. Where are we coming from? •  •  •  •  •  Keyword search Spellcheck & autosuggest Ranking Faceted search Spatial search
  • 12. Search-based recommendation (credit card issuer) •  Given –  customer purchase history –  merchant designations –  merchant special offers •  Goal –  Improve existing recommender system –  Throughput important
  • 13. Analyze with MapReduce complete   history   Co-­‐occurrence   (Mahout)   Item  meta-­‐data   SolR   SolR   Solr   Indexer   Indexer   indexing   Index   shards  
  • 14. Deploy with search system user   history   Web  >er   Item  meta-­‐data   SolR   SolR   Solr   Indexer   Indexer   search   Index   shards  
  • 16. Log analysis •  Given –  Receive 200,000+ log lines per second •  Goal –  Want to do multi-field search –  Want to search on log lines with <30 second delay before search
  • 17. Data Ingestion and Indexing incoming  data   Ka@a   SolR   SolR   Text   Indexer   Indexer   analysis   Solr   indexer   Real-­‐>me   Raw   documents   Older  index   shards   Live  index   shard   >me-­‐sharded  Solr  indexes  
  • 18. Search Query   Solr   search   Web  >er   SolR   SolR   Solr   Indexer   Indexer   search   Raw   documents   Older  index   shards   Live  index   shard  
  • 20. Question you may want to ask … •  What is the volume of your data* (few GB? up to PB?) •  How are your query characteristics? –  full scans –  look-ups –  multiple passes over large parts –  continuous queries •  What’s (more) important: throughput or latency? *)  Note:  as  long  as  Moore's  law  s>ll  holds,  these  figures  obviously  change  on  a  yearly  if  not  monthly  basis.  
  • 21. Key qualifiers •  Want exploratory interface rather than aggregates in a dashboard •  Data are sparse symbol sets like words or recommendation indicators •  Small-ish return sets are OK, especially if facets are good enough •  Near-real-time is good enough
  • 22. When not to use Solr …
  • 23. Red Flags •  You need strong consistency? •  JOINS, anyone? •  •  •  reme mber :  one fit  all  size  d —too Want (complex) transactions? l  belt oes  n  appr ot   oach! OLTP, streaming (but: near-real-time)   Graphs?
  • 24. Let’s stay in touch … •  Twitter: @mhausenblas @MapR MapR  Nordics   MapR  UK   MapR  HQ   San  Jose,  US   MapR  DACH   MapR  Japan   MapR  SE  &  Benelux   MapR  Hyderbad   •  We’re hiring! MapR  Korea