This document discusses when Solr is a good tool to use and when other options fit better. It gives an overview of Solr in the big data ecosystem and of polyglot persistence, where different data stores serve different needs. It describes common Solr use cases such as search-based recommendations and log analysis, then presents a checklist for deciding whether Solr is a good fit based on data volume, query characteristics, throughput needs, and data type. It concludes with red flags that suggest Solr may not be suitable, such as a need for strong consistency, transactions, or graph queries.
7. Polyglot Persistence—Backdrop
•
Michael Stonebraker and Ugur Çetintemel (2005): "One Size Fits All": An Idea Whose Time Has Come and Gone
•
Martin Fowler (2011): Polyglot Persistence [1]
•
Eric Brewer (2012): Ricon Keynote, Advancing Distributed Systems [2]
[1] http://martinfowler.com/bliki/PolyglotPersistence.html
[2] https://speakerdeck.com/eric_brewer/ricon-2012-keynote
8. Polyglot Persistence—Key Points
•
Use different datastores for different needs
•
Can apply within an application or cross-enterprise
•
Encapsulating data access yields loosely coupled components
•
Find sweet spot between dev/op complexity and flexibility
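As a minimal sketch of the third point, the application below talks to two small interfaces rather than to concrete datastores. All names here (`RecordStore`, `SearchIndex`, `ProductCatalog`, and the in-memory stand-ins) are hypothetical and chosen for illustration; in a real system a search engine such as Solr and a separate system of record would sit behind them.

```python
from typing import Protocol


class RecordStore(Protocol):
    """System of record: exact look-ups by key (hypothetical interface)."""
    def put(self, key: str, record: dict) -> None: ...
    def get(self, key: str) -> dict: ...


class SearchIndex(Protocol):
    """Search engine: term queries returning matching keys (hypothetical interface)."""
    def index(self, key: str, text: str) -> None: ...
    def search(self, term: str) -> list[str]: ...


class InMemoryRecordStore:
    """Stand-in for a key-value or relational store."""
    def __init__(self) -> None:
        self._rows = {}

    def put(self, key, record):
        self._rows[key] = record

    def get(self, key):
        return self._rows[key]


class InMemorySearchIndex:
    """Stand-in for a search engine: a tiny inverted index."""
    def __init__(self) -> None:
        self._postings = {}

    def index(self, key, text):
        for token in text.lower().split():
            self._postings.setdefault(token, set()).add(key)

    def search(self, term):
        return sorted(self._postings.get(term.lower(), set()))


class ProductCatalog:
    """Application code depends only on the two interfaces above."""
    def __init__(self, records: RecordStore, index: SearchIndex) -> None:
        self._records = records
        self._index = index

    def add(self, key: str, record: dict) -> None:
        self._records.put(key, record)             # store the full record
        self._index.index(key, record["title"])    # make the title searchable

    def find(self, term: str) -> list[dict]:
        return [self._records.get(k) for k in self._index.search(term)]


catalog = ProductCatalog(InMemoryRecordStore(), InMemorySearchIndex())
catalog.add("p1", {"title": "Solr in Action"})
catalog.add("p2", {"title": "Graph Databases"})
print(catalog.find("solr"))
```

Because `ProductCatalog` never names a concrete store, either backend can be swapped without touching the application logic, which is the loose coupling the slide refers to.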
16. Log analysis
•
Given
– Receive 200,000+ log lines per second
•
Goal
– Want to do multi-field search
– Want log lines to be searchable less than 30 seconds after they arrive
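A quick back-of-the-envelope calculation shows what these two numbers imply. The rate and the delay come from the slide; the average line size is an assumption made only for illustration, and `ingest_budget` is a hypothetical helper.

```python
LINES_PER_SECOND = 200_000     # from the slide ("200,000+ log lines per second")
MAX_DELAY_SECONDS = 30         # from the slide ("<30 second delay")
ASSUMED_BYTES_PER_LINE = 200   # assumption, not stated in the slides

SECONDS_PER_DAY = 86_400


def ingest_budget(lines_per_second, max_delay_seconds, bytes_per_line):
    """Derive rough sizing figures from the ingest rate and allowed delay."""
    lines_per_day = lines_per_second * SECONDS_PER_DAY
    return {
        # Lines that may have arrived but are not yet searchable.
        "lines_in_flight": lines_per_second * max_delay_seconds,
        "lines_per_day": lines_per_day,
        "raw_bytes_per_day": lines_per_day * bytes_per_line,
    }


budget = ingest_budget(LINES_PER_SECOND, MAX_DELAY_SECONDS, ASSUMED_BYTES_PER_LINE)
for name, value in budget.items():
    print(f"{name}: {value:,}")
```

At the stated minimum rate, up to 6 million lines can be waiting to become searchable at any moment, and one day holds roughly 17.3 billion lines. The byte figure scales directly with the assumed line size.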
17. Data Ingestion and Indexing
[Diagram. Labels: incoming data; Kafka; text analysis; Solr indexers; real-time; raw documents; live index shard; older index shards; time-sharded Solr indexes]
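One way to picture time-sharded indexes is a function that maps each log line's timestamp to the shard covering that time window. This is a sketch under stated assumptions, not the setup from the slides: the six-hour window, the `logs_YYYYMMDD_HH` naming scheme, and the function `shard_for` are all hypothetical.

```python
from datetime import datetime, timezone

SHARD_HOURS = 6  # assumed shard width; the slides do not state one


def shard_for(timestamp: datetime, shard_hours: int = SHARD_HOURS) -> str:
    """Return the (hypothetical) name of the shard covering `timestamp`."""
    if timestamp.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    if 24 % shard_hours != 0:
        raise ValueError("shard_hours must divide 24 evenly")
    ts = timestamp.astimezone(timezone.utc)
    # Round the hour down to the start of its shard window.
    window_start_hour = (ts.hour // shard_hours) * shard_hours
    return f"logs_{ts:%Y%m%d}_{window_start_hour:02d}"


now = datetime.now(timezone.utc)
line_time = datetime(2014, 3, 5, 14, 30, tzinfo=timezone.utc)
print(shard_for(line_time))                    # shard that receives this log line
print(shard_for(line_time) == shard_for(now))  # True only while that shard is live
```

Under this scheme only the shard for the current window receives writes; every earlier shard is read-only, which matches the live/older split in the diagram labels.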
18. Search
[Diagram. Labels: query; web tier; Solr search; Solr indexers; raw documents; live index shard; older index shards]
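On the search side, a query that carries a time range only needs to touch the shards overlapping that range. The sketch below assumes six-hour shards named `logs_YYYYMMDD_HH` by the UTC start of their window; that naming scheme and the function `shards_for_range` are hypothetical and not taken from the slides.

```python
from datetime import datetime, timedelta, timezone

SHARD_HOURS = 6  # assumed shard width; must divide 24 evenly


def shards_for_range(start: datetime, end: datetime,
                     shard_hours: int = SHARD_HOURS) -> list[str]:
    """List the (hypothetical) shard names overlapping [start, end], oldest first.

    Both timestamps must be timezone-aware.
    """
    start = start.astimezone(timezone.utc)
    end = end.astimezone(timezone.utc)
    # Snap the cursor back to the beginning of the shard window containing `start`.
    cursor = start.replace(hour=(start.hour // shard_hours) * shard_hours,
                           minute=0, second=0, microsecond=0)
    names = []
    while cursor <= end:
        names.append(f"logs_{cursor:%Y%m%d}_{cursor.hour:02d}")
        cursor += timedelta(hours=shard_hours)
    return names


query_start = datetime(2014, 3, 5, 22, 15, tzinfo=timezone.utc)
query_end = datetime(2014, 3, 6, 1, 0, tzinfo=timezone.utc)
print(shards_for_range(query_start, query_end))
```

A query for the last few minutes would resolve to the live shard alone, while a historical query fans out over older shards only.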
20. Questions you may want to ask …
•
What is the volume of your data* (few GB? up to PB?)
•
What are your query characteristics?
– full scans
– look-ups
– multiple passes over large parts
– continuous queries
•
What’s (more) important: throughput or latency?
*) Note: as long as Moore's law still holds, these figures obviously change on a yearly if not monthly basis.
21. Key qualifiers
•
Want exploratory interface rather than aggregates in a dashboard
•
Data are sparse symbol sets like words or recommendation indicators
•
Small-ish return sets are OK, especially if facets are good enough
•
Near-real-time is good enough
23. Red Flags
•
You need strong consistency?
•
JOINS, anyone?
•
Want (complex) transactions?
•
OLTP, streaming (but: near-real-time)
•
Graphs?
Remember: one size does not fit all; take a tool-belt approach!
24. Let’s stay in touch …
•
Twitter: @mhausenblas, @MapR
•
MapR offices: HQ San Jose (US), UK, Nordics, DACH, SE & Benelux, Hyderabad, Japan, Korea
•
We’re hiring!