5. Why solr?
• Hey, solr is already part of my stack
• I love solr
• It’s fast, scalable and there are some great python interfaces out there
6. When would you consider it?
• You have a DB that’s frequently read and infrequently written
• You want robust search & filtering on your data
• You want to leverage the faceting feature
• You want a decently scalable data layer
7. What’s not so cool?
• Doesn’t support transactions
• Not all SQL queries can be translated into solr queries
• Generating indices can take a long time
• Searching and indexing at the same time brings down performance
8. But..
• You don’t have to give up your relational data layer
• Create a non-relational layer on top of your relational data layer
• Get the best of both worlds
9. So what’s the use case?
• We deal with medical survey data
• Say:
  – About 300 multiple choice questions
  – Responses can be multi-dimensional
  – 7000+ different answer choices per question
  – 2000+ respondents per survey
  – 15+ surveys and growing
10. What a survey question looks like
When were you diagnosed with the following types of Arthritis?

                      Rheumatoid  Traumatic  Psoriatic  Osteoarthritis  Other
                      Arthritis   Arthritis  Arthritis
Less than a year ago  ☑           ☐          ☐          ☐               ☐
More than a year ago  ☐           ☐          ☑          ☐               ☐
11. Storing a single response
When were you diagnosed with the following types of Arthritis?

                      Rheumatoid  Traumatic  Psoriatic  Osteoarthritis  Other
                      Arthritis   Arthritis  Arthritis
Less than a year ago  1           0          0          0               0
More than a year ago  0           0          1          0               0
12. Aggregating over 2000 responses
When were you diagnosed with the following types of Arthritis?

                      Rheumatoid  Traumatic  Psoriatic  Osteoarthritis  Other
                      Arthritis   Arthritis  Arthritis
Less than a year ago  63          155        19         27              268
More than a year ago  190         46         8          213             325
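The aggregation on this slide amounts to summing the 0/1 grids of individual responses cell by cell. A minimal sketch in plain Python, assuming each response is held as a list of rows in the order shown on the slide (the in-memory layout, the `aggregate` helper, and the second respondent are hypothetical, added only for illustration):

```python
# Row and column labels come from the slide; the list-of-lists layout is an assumption.
CONDITIONS = ["Rheumatoid Arthritis", "Traumatic Arthritis",
              "Psoriatic Arthritis", "Osteoarthritis", "Other"]
PERIODS = ["Less than a year ago", "More than a year ago"]


def aggregate(responses):
    """Sum 0/1 response grids element-wise into a grid of counts."""
    totals = [[0] * len(CONDITIONS) for _ in PERIODS]
    for grid in responses:
        for r, row in enumerate(grid):
            for c, checked in enumerate(row):
                totals[r][c] += checked
    return totals


# Two respondents: the first is the response from the previous slide,
# the second is made up for this example.
responses = [
    [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]],
    [[0, 0, 0, 0, 1], [0, 0, 1, 0, 0]],
]
totals = aggregate(responses)
```

Doing this in application code for every request is the kind of work the talk hands over to faceting.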
13. The Document Structure
• Each survey response = solr document
• Up to 3000 boolean variables per document indicating chosen answers
• Added meta information: age, profession, interests
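One way to picture such a document is a flat dictionary: one boolean field per answer cell plus the meta fields. The sketch below is an assumption about the layout, not the schema used in the talk. The `to_solr_doc` helper and the `q<question>_r<row>_c<col>_b` naming are hypothetical; the `_b` suffix follows the boolean dynamic-field convention found in Solr's example schemas.

```python
def to_solr_doc(response_id, question_id, grid, meta):
    """Flatten one 0/1 answer grid into boolean fields on a single document."""
    doc = {"id": response_id}
    doc.update(meta)  # age, profession, interests
    for r, row in enumerate(grid):
        for c, checked in enumerate(row):
            # Hypothetical field name: question, row and column index, boolean suffix.
            doc["q%d_r%d_c%d_b" % (question_id, r, c)] = bool(checked)
    return doc


# The single response from slide 11, with made-up meta information.
doc = to_solr_doc(
    "resp-1",
    10,
    [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]],
    {"age": 54, "profession": "teacher", "interests": ["gardening"]},
)
```

With 300 questions, repeating this per question and merging the fields into one dictionary gives the wide, mostly-false document the slide describes.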
14. Querying
• Filter by age, interest, profession
• Facet across boolean fields
• Result: which group of people chose which group of answers
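In terms of Solr's standard request parameters, the filter-then-facet pattern could look roughly like the sketch below. It only builds the query string and sends nothing. The parameters `q`, `fq`, `facet`, `facet.field`, `rows`, and `wt` are standard Solr; the `build_facet_query` helper, the field names, and the filter values are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode


def build_facet_query(filters, facet_fields):
    """Build a Solr select query string that filters, then facets."""
    params = [
        ("q", "*:*"),       # match everything; narrowing happens in fq
        ("rows", "0"),      # only facet counts are needed, not the documents
        ("wt", "json"),
        ("facet", "true"),
    ]
    params += [("fq", fq) for fq in filters]
    params += [("facet.field", field) for field in facet_fields]
    return urlencode(params)


# Hypothetical filter values and boolean field names.
query_string = build_facet_query(
    filters=["age:[40 TO 60]", "profession:teacher"],
    facet_fields=["q10_r0_c0_b", "q10_r1_c2_b"],
)
decoded = parse_qs(query_string)
```

The resulting string would be appended to a core's select URL; each `fq` narrows the group of people, and each `facet.field` reports how many of them chose that answer.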
15. Why solr is awesome..
• Faceting across boolean fields uses very little memory
• Combining 3000 fields for 2000 documents takes 1 to 2 ms
• Allowed us to reduce API response time from a variable 2 to 15 seconds (sucked!) to an almost constant ~50 ms
16. Good to know..
• sunburnt: Awesome python solr interface, github.com/tow/sunburnt
• Programmatic querying as well as raw queries
• Supports most advanced solr options
• If you only require facets, specify rows=0
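With rows=0 the JSON response carries an empty document list and only the facet counts matter. A minimal sketch of reading them, assuming Solr's default JSON layout in which each facet field is a flat list alternating value and count. The `facet_counts` helper, the field names, and the sample response are hand-written for illustration; the "true" counts echo two cells from slide 12, while the "false" counts and `numFound` are made up.

```python
import json


def facet_counts(response_text, value="true"):
    """Return {field: count of documents where the field equals `value`}."""
    body = json.loads(response_text)
    fields = body["facet_counts"]["facet_fields"]
    counts = {}
    for name, flat in fields.items():
        # Flat list looks like ["true", 63, "false", 1937]: pair values with counts.
        pairs = dict(zip(flat[0::2], flat[1::2]))
        counts[name] = pairs.get(value, 0)
    return counts


# Hand-written sample response; no documents are returned because rows=0.
sample = json.dumps({
    "response": {"numFound": 2000, "start": 0, "docs": []},
    "facet_counts": {"facet_fields": {
        "q10_r0_c0_b": ["true", 63, "false", 1937],
        "q10_r1_c2_b": ["false", 1992, "true", 8],
    }},
})
chosen = facet_counts(sample)
```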
Good afternoon everyone! Welcome to my lightning talk: The Solr Power. My name is Tareque and I work for a small health industry startup named wisertogether. As you may have noticed from the corny title, my talk is about Solr.
This could be turned into a most interesting man joke.
As you might have already guessed, I’m talking about using Solr as a NoSQL backend. This approach is not novel in any way, but I wanted to discuss the use case that brought it about. First of all… NoSQL.
We got to a point where retrieving data from a SQL layer just wasn’t an option. The arrow came in the form of a performance hit from querying a complex relational model.
Well why not? Now on to more specific reasons for using solr as a NoSQL backend.
I emphasize the word “infrequently”.
So there are a lot of answer options.
What you were diagnosed with previously and what you were diagnosed with recently.
When you start combining all the survey responses, you start getting some really useful information, because it exposes common trends, idiosyncrasies, etc. We use these numbers to generate pretty graphs.
Solr stores everything in the form of a document.
We used sunburnt to interface with Solr. If you only need the facets, there is no reason to retrieve the documents, and skipping them saves a lot of memory.