The Seven Deadly Sins of Solr - By Jay Hill
- 2. Introductions…!
Who the hell am I?
  Jay Hill, Lucid Imagination
  7 years Lucene experience
  4 years Solr experience
  Author of Lucid Training
  SME for Lucid Certification
Who the hell are you?
  New to search?
  New to Lucene/Solr?
  Battle-tested veterans?
© Lucid Imagination, Inc.
- 3. We'll Leave Time For Q&A!
Who's doing what?
  Solr 3.1?
  Solr 1.4.1?
  Nightly build?
  Solr 1.3 or older?
Are there any specific problems you're having?
Meanwhile, interrupt, ask questions as we go, etc.
- 4. A Brief Word About Lucid Imagination!
Lucid Imagination: The commercial company supporting Lucene/Solr open source search.
Founded by:
  Yonik Seeley – Creator of Solr
  Erik Hatcher – Co-author, Lucene in Action
  Grant Ingersoll – Apache PMC Chair
  Marc Krellenstein – Lucid CTO
Staff includes 9 Lucene/Solr committers
Training, certification, support, LucidWorks Enterprise
- 7. Sins As Anti-Patterns?!
"Sorta kinda"
  Specify Nothing (Sloth)
  Creeping Featuritis (Greed)
  Blowhard Jamboree (Pride)
  Boat Anchor (Lust)
  Not Invented Here (Envy)
  Phatware (Gluttony)
  Emperor's New Clothes (Wrath)
- 8. Sins Can Contradict One Another!!
You'll notice that many of the "sins" we see will be the exact opposite of others
Just as some of us tend towards laziness, others towards excess
  Sometimes you - "Look before you leap."
  Other times, "He who hesitates is lost."
In Solr (or any search app), one size never fits all
- 10. Sloth!
"We aren't really into open source."
Lack of commitment to Solr and/or the search application itself
Not developing in-house Solr expertise
Not paying enough attention to JVM settings, garbage collection, and RAM allocation.
- 11. Sloth!
Neglecting to get familiar with the source code
  It is open source after all!
Not taking the time to understand the main parts of Solr:
  Request handlers
  Search components
  Query parsers
    Extend QParserPlugin class
  ValueSource & ValueSourceParser – custom functions
    New pseudo-fields in 4.x
  Response writers
- 12. Sloth!
Not keeping up with new features and developments in Lucene and Solr
CHANGES.txt – use "diff" to keep up on changes
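One way to act on the "diff" tip, sketched in Python rather than the command-line tool. It assumes you have the text of two versions of CHANGES.txt in hand; the helper name and the two sample strings are made up for illustration and are not real release notes.

```python
import difflib
import itertools


def new_change_lines(old_text: str, new_text: str) -> list[str]:
    """Return the lines that were added between two versions of a file."""
    diff = difflib.unified_diff(
        old_text.splitlines(), new_text.splitlines(), lineterm="", n=0
    )
    # Skip the two file-header lines ("---" and "+++") that open the diff.
    body = itertools.islice(diff, 2, None)
    # With n=0 there are no context lines, so "+" marks added lines only.
    return [line[1:] for line in body if line.startswith("+")]


# Hypothetical excerpts standing in for two copies of CHANGES.txt.
old = "Release 1.0\n* Feature A\n"
new = "Release 1.1\n* Feature B\nRelease 1.0\n* Feature A\n"

added = new_change_lines(old, new)
for line in added:
    print(line)
```

In practice you would read the two files from the releases you are comparing and skim only the added lines.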
- 13. Sloth!
New features in Solr 3.1:
  Solr spatial
  Edismax query parser
    NOT experimental!
  Dynamic metadata extraction via UIMA
  Numeric range faceting (like date faceting)
  Lucene RAMDirectoryFactory available
  Faceting performance improvements
  Spellcheck and Terms components now work for distributed search
  Suggester component – better autosuggest!
    Can add custom dict., phrases, etc.
- 14. Sloth!
New features coming in Solr 4.x:
  Lucene DocumentsWriterPerThread (DWPT)
    Moving towards "real time"
    UpdateHandler upgrade to work with real-time
  Field collapsing/grouping
  Pivot facets
  SolrCloud (ZooKeeper)
  Fuzzy queries 100 times faster
  Pseudo fields via functions
  Relevancy function queries: tf, idf, docFreq, norm, …
- 15. Sloth: The Path To Salvation!
Commit to the project and to learning Solr
Stay up to date on Solr changes
Stay current with ongoing releases
Get familiar with the source code
Spend some time to understand the main configuration files:
  solrconfig.xml
  schema.xml
Read through the entire Solr Wiki once every so often
Develop in-house Solr expertise
- 17. Greed!
Skimping on resources such as:
  RAM
    "Here's a quarter buddy, go buy some RAM!"
  Storage space
You will get what you pay for!
…on the other hand, not every company has "deep pockets"
- 18. Greed!
Trying to "squeeze by", indexing to, and searching on, the same server
[Diagram labels: Indexing, Shards (Indexers), Slave/Searchers, Load Balancer, Searches]
- 19. Greed!
Not making the effort to find the right balance between precision and recall
Recall: What fraction of the relevant documents in the collection were returned by the system?
Precision: What fraction of the returned results are relevant to the information need?
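The two definitions above can be turned into a small calculation. This is only a sketch: the function name and the document IDs are invented for illustration, and it assumes someone (for example a domain expert) has already judged which documents are relevant to the query.

```python
def precision_recall(returned: list[str], relevant: set[str]) -> tuple[float, float]:
    """Compute (precision, recall) for one query.

    returned: document IDs the search system returned
    relevant: document IDs judged relevant to the information need
    """
    returned_set = set(returned)
    hits = len(returned_set & relevant)
    # Precision: fraction of returned results that are relevant.
    precision = hits / len(returned_set) if returned_set else 0.0
    # Recall: fraction of relevant documents that were returned.
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical judgments: 4 results returned, 3 relevant docs in the collection.
returned = ["doc1", "doc2", "doc3", "doc4"]
relevant = {"doc1", "doc3", "doc7"}

p, r = precision_recall(returned, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.67
```

Returning more results tends to raise recall and lower precision, which is the balance the slide is asking you to look for.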
- 20. Greed!
A few thoughts about relevance:
  Get feedback from domain experts
  Is it better to have lots of results with less precision, or fewer, more targeted results?
  Different sites will have very different requirements
- 21. Greed: The Path To Salvation!
Pry open your wallet – don't be cheap
You don't have to push the envelope
Find the right balance between recall and precision
Don't push for more results over precision – unless that is a clear requirement (sometimes it is)
- 23. Pride!
Reinventing the wheel
  "Why don't we just write our own search libraries?"
  Nobody has a use case like us – right?
  "We need to change the scoring algorithms."
- 24. Pride!
Thinking you can "do it all" in Solr
Solr is rarely a good choice as a SOR (system of record)
Consider other tools to work with Solr:
  Nutch
  Mahout
  OpenNLP
  Google Connector Framework
  Your own code
- 25. Pride!
Stubbornly refusing to use resources such as the mailing lists:
  Solr user list: solr-user@lucene.apache.org
  Solr developer list: dev@lucene.apache.org
  Lucene user list: java-user@lucene.apache.org
  LucidFind: http://www.lucidimagination.com/search/
- 26. Pride!
"I will not yield!"
Trying to "win battles" on the mailing lists
Good Karma – be a good citizen in the community
- 27. Pride: The Path To Salvation!
Ask for help when needed
Let the business needs define the project – don't let the tail wag the dog
Get a feel for the Solr community and respect the experience of others
Your situation, while possibly unique, is probably not completely dissimilar to others.
Learn from the pioneers and Solr veterans
- 29. Lust!
Obsessing over unimportant details too early in the project
  Agile approach is well suited to Solr development – iterate!
Trying to "push the envelope"
  Necessary sometimes, but it's not called the "bleeding edge" without reason
  "Ease in" to major changes
Too much attention to JVM settings
  Solr experts are not usually JVM/GC experts
- 30. Lust!
"Anti-greed" – Committing too many resources to Solr
  Make sure the OS has plenty of RAM to cache files, etc.
"If one is good, a dozen must be better!"
  As much as possible, try to get a sense of what your query volume will be, and don't just throw money at building a monstrous farm of searchers
  Solr has proven to be much more efficient than some large, commercial search solutions
- 31. Lust!
Blood from a turnip: Trying some absurd new technique, "just because"
  RAMDirectoryFactory – not a secret way to faster indexing/searching
    No disk-backed persistence
    Usually not worth it
    …but you never know…
Research first before going "extreme"
- 32. Lust!
No need to index millions of docs for development
  Better to work with small sets of data while getting started.
Don't worry too much about field types as you get started.
  Get data in the index, then analyze and refine.
- 33. Lust: The Path To Salvation!
Use an agile approach – start simply, build your application slowly, iterate
Deal with the low-hanging fruit first
Measure twice, cut once
Don't miss the forest for the trees – no need to obsess over details in the early stages
Do some due diligence before trying unorthodox approaches
Get a small sample of data indexed w/o worrying about type, then iterations of refinement
- 34. "If we had some bacon we could have some bacon and eggs – if we had some eggs."
- 35. Envy!
Adding "cool" features you see on other sites, but don't really need
  Keep it "lean and mean", especially to start
  Resist the urge to include the "kitchen sink"
- 36. Envy!
You too can master dismax!
Don't be afraid of dismax/edismax
  Lots of controls to learn, but also lots of power
  Flexibility to search multiple fields
  Boost different fields
  Boost phrase fields (pf) higher than query fields (qf)
  Use boost queries (bq) and function queries (bf)
  Most intimidating params: tie, mm
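To make the parameters above less intimidating, here is a sketch of one edismax request built in Python. The parameter names (defType, qf, pf, bq, bf, mm, tie) are the ones named on the slide; the host, the field names (title, description, in_stock, popularity) and all boost values are assumptions for illustration only.

```python
from urllib.parse import urlencode

# Assumed local Solr endpoint; adjust to your own install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

params = {
    "defType": "edismax",
    "q": "red shoes",
    # qf: fields to search, with per-field boosts (hypothetical fields).
    "qf": "title^2 description",
    # pf: phrase fields, boosted higher than the matching qf entries.
    "pf": "title^4 description^2",
    # bq: an extra boost query for documents matching a clause.
    "bq": "in_stock:true^1.5",
    # bf: a function whose value is added to the score.
    "bf": "sqrt(popularity)",
    # mm: with more than 2 optional clauses, require 75% of them to match.
    "mm": "2<75%",
    # tie: how much non-best-field matches contribute (0 = pure max).
    "tie": "0.1",
}

url = f"{SOLR_SELECT_URL}?{urlencode(params)}"
print(url)
```

Starting with only q and qf, then adding pf, mm and tie one at a time while watching the results, is an easy way to learn what each control does.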
- 37. Envy!
Spatial search – seems complicated, but major sites make it look easy
Now, in Solr 3.1 – it is easy! You can:
  Store spatial data in your index
  Filter by distance
  Sort by distance
  Boost/bias by distance
  Facet by distance
Also consider: Search-based navigation such as "Show me in-stock items only"
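A rough sketch of what filter, sort and facet by distance can look like as request parameters, built in Python. The names sfield, pt, d, {!geofilt}, geodist() and {!frange} are how I recall the Solr 3.1 spatial support; verify them against the documentation for your version. The field name "store", the coordinates and the distance buckets are made up.

```python
from urllib.parse import urlencode


def spatial_params(lat: float, lon: float, radius_km: float, field: str = "store"):
    """Build request parameters for a distance-filtered, distance-sorted search."""
    return [
        ("q", "*:*"),
        ("sfield", field),            # spatial field holding lat,lon (assumed name)
        ("pt", f"{lat},{lon}"),       # the point to measure from
        ("d", str(radius_km)),        # radius in kilometers
        ("fq", "{!geofilt}"),         # filter by distance
        ("sort", "geodist() asc"),    # sort by distance, nearest first
        ("facet", "true"),
        # Facet by distance: two illustrative buckets.
        ("facet.query", "{!frange l=0 u=5}geodist()"),
        ("facet.query", "{!frange l=5.001 u=20}geodist()"),
    ]


params = spatial_params(45.15, -93.85, 20)
query_string = urlencode(params)
print(query_string)
```

A list of pairs is used instead of a dict because facet.query appears more than once.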
- 38. Envy: The Path To Salvation!
Focus on your requirements, don't try to add "bells and whistles" you don't need
Don't be hesitant to dive into the power of dismax/edismax
Take advantage of new features such as Solr spatial, if those features will add value to the end user experience
- 39. "A fat stomach never breeds fine thoughts."
- 40. Gluttony!
“Staying fit and trim” is usually good practice when designing and running Solr applications
  Once again – keep it "lean and mean"
A lot of these issues cross over into the “Sloth” category
  The effort needed to keep your configuration and data efficiently managed is not considered important
Don't lose control of your configuration files
  Remove unnecessary elements
  Version control all configuration files
- 41. Gluttony!
Slim down those "bloated" queries:
q="red shoes"&
accountId=(12343 OR 338899 OR 554443 OR 243445 OR 55442 OR 3330899 OR 59927 OR
3888999 OR 549 OR 440293579 34201 OR 339917 OR 300191 OR 339338 OR 109823 OR
679176 OR 31407815 OR 3001756 OR 134322 OR 311123 OR 987888 OR 997181 OR
771819 OR 100292 OR 3389474 OR 5505759 OR 2459577 OR 4499957 OR 1996571 OR
559590 OR 220299 OR 4404872 OR 151510 OR 66017 OR 666 OR 113459 OR 890575 OR
505725 OR 330393 OR 349940 OR 4094994 OR 1245995 OR 2459959 OR 4255909 OR
899955 OR 7878899 OR 100999 … ∞ )
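The slide does not prescribe a fix, but spotting bloated queries is a reasonable first step. Below is a small sketch that counts OR-joined clauses in a query string and flags long ones. The helper names and the review threshold of 50 are arbitrary choices for illustration; 1024 is the default maxBooleanClauses value in solrconfig.xml, past which such a query fails outright.

```python
import re

MAX_BOOLEAN_CLAUSES = 1024  # default maxBooleanClauses in solrconfig.xml
REVIEW_THRESHOLD = 50       # hypothetical "worth a second look" limit


def count_or_clauses(query: str) -> int:
    """Count terms joined by the OR operator (one more than the OR count)."""
    return len(re.findall(r"\bOR\b", query)) + 1


def review(query: str) -> str:
    """Classify a query as ok, bloated, or over the boolean clause limit."""
    clauses = count_or_clauses(query)
    if clauses > MAX_BOOLEAN_CLAUSES:
        return "over limit"
    if clauses > REVIEW_THRESHOLD:
        return "bloated"
    return "ok"


# Made-up queries for illustration.
small = "accountId:(12343 OR 338899 OR 554443)"
large = "accountId:(" + " OR ".join(str(n) for n in range(200)) + ")"

print(count_or_clauses(small), review(small))
print(count_or_clauses(large), review(large))
```

Running a check like this over logged queries gives a short list of candidates to rework.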
- 42. Gluttony!
Stay in shape – Flex Your Solr Muscles!
  Keep up on new features
  Training, when appropriate
  Certification
  Contribute!
  Follow the user lists
  Refactor when new features can help
  Keep up to date on new releases
- 43. Gluttony: The Path To Salvation!
Keep configuration files clean and trim. Remove unused elements
Periodically review queries to make sure they are efficient
Refactor when necessary – keep your application fit and trim
- 44. "Hope is the denial of reality."
- 45. Wrath!
Wrath - usually synonymous with anger, but…
Let’s use an older definition here:
  “A vehement denial of the truth, both to others and in the form of self-denial and impatience.”
Step back every now and then and look objectively at your application
- 47. Wrath!
Ignoring new Solr releases
  OK to wait until a release is proven
  But getting too far behind makes upgrading more painful with each release
We don't have time to do it right, but we always have time to fix it
- 48. Wrath!
Ignoring complaints about results relevance
Disregarding feedback from stakeholders
  Remember – the point of your search application is to support the business, not to "build cool stuff"
Not taking advantage of log files
  Consider mining log files, storing data in relational DB for generating reports
  Capturing user queries and query counts can be extremely useful
  Can also be used for query-based autosuggest (not just indexed terms)
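A minimal sketch of the log mining idea: pull the q parameter out of each request line and count how often each query appears. It assumes log lines carry a params={...} section like the sample below; the sample lines are invented, and your log format may differ.

```python
import re
from collections import Counter
from urllib.parse import parse_qs

# Grabs the text between "params={" and the first "}".
# Limitation: a query that itself contains "}" would be cut short.
PARAMS_RE = re.compile(r"params=\{(.*?)\}")


def count_queries(log_lines):
    """Count user queries (the q parameter), ignoring case and outer spaces."""
    counts = Counter()
    for line in log_lines:
        match = PARAMS_RE.search(line)
        if not match:
            continue
        params = parse_qs(match.group(1))
        for q in params.get("q", []):
            counts[q.strip().lower()] += 1
    return counts


# Invented sample lines in an assumed Solr request-log shape.
sample = [
    "INFO: [] webapp=/solr path=/select params={q=red+shoes&rows=10} hits=42 status=0 QTime=3",
    "INFO: [] webapp=/solr path=/select params={q=Red+Shoes&rows=10} hits=42 status=0 QTime=2",
    "INFO: [] webapp=/solr path=/select params={q=blue+hat} hits=7 status=0 QTime=1",
    "INFO: [] webapp=/solr path=/update params={commit=true} status=0 QTime=15",
]

counts = count_queries(sample)
print(counts.most_common(2))  # [('red shoes', 2), ('blue hat', 1)]
```

The resulting counts could be loaded into a relational table for reports, or used as the source list for query-based autosuggest.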
- 49. Wrath: The Path To Salvation!
Keep your version of Solr up to date
  OK to wait "awhile", but don't skip versions
Seek and embrace feedback from business and domain experts
Constantly gauge and improve relevance as an ongoing task
Avoid the push to release too soon (as best you can)
Take advantage of log files to understand what users are doing, and what is not working well