The solr power

Tareque Hossain

Motivation for using solr as a NoSQL backend

The solr Power
Tareque Hossain
Sr. Software Engineer

What about it?
• We always associate solr with searching
• solr can also serve as your non-relational data layer

NoSQL? solr?

Why solr?
• Hey, solr is already part of my stack
• I love solr
• It's fast, scalable, and there are some great Python interfaces out there

When would you consider it?
• You have a DB that's frequently read and infrequently written
• You want robust search and filtering on your data
• You want to leverage the faceting feature
• You want a decently scalable data layer

What's not so cool?
• Doesn't support transactions
• Not all SQL queries can be translated into solr queries
• Generating indices can take a long time
• Searching and indexing at the same time brings down performance

But...
• You don't have to give up your relational data layer
• Create a non-relational layer on top of your relational data layer
• Get the best of both worlds
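
The layering idea can be sketched in a few lines. This is a minimal illustration, not the author's code: it assumes a hypothetical relational schema (`respondent` and `answer` tables) and a hypothetical `<choice>_b` naming scheme for boolean fields, reads the rows with Python's built-in `sqlite3`, and flattens them into one dict per respondent that could then be indexed into solr.

```python
import sqlite3


def build_documents(conn):
    """Flatten relational survey rows into one solr-ready dict per respondent.

    Hypothetical schema:
      respondent(id, age, profession)
      answer(respondent_id, choice_key)  -- one row per chosen answer
    """
    docs = {}
    for rid, age, profession in conn.execute(
        "SELECT id, age, profession FROM respondent ORDER BY id"
    ):
        docs[rid] = {"id": str(rid), "age": age, "profession": profession}

    for rid, choice_key in conn.execute(
        "SELECT respondent_id, choice_key FROM answer ORDER BY respondent_id"
    ):
        # Each chosen answer becomes a boolean field; unchosen answers are
        # simply absent here (they could also be written explicitly as False).
        docs[rid][f"{choice_key}_b"] = True
    return list(docs.values())


if __name__ == "__main__":
    # Made-up sample rows, for illustration only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE respondent (id INTEGER PRIMARY KEY, age INTEGER, profession TEXT);
        CREATE TABLE answer (respondent_id INTEGER, choice_key TEXT);
        INSERT INTO respondent VALUES (1, 54, 'teacher'), (2, 61, 'nurse');
        INSERT INTO answer VALUES (1, 'q10_lt_1_year_rheumatoid'), (2, 'q10_gt_1_year_psoriatic');
        """
    )
    for doc in build_documents(conn):
        print(doc)
```

The relational tables stay the source of truth; because the solr layer is derived data, it can be regenerated from them whenever needed.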

So what's the use case?
• We deal with medical survey data
• Say:
  – About 300 multiple-choice questions
  – Responses can be multi-dimensional
  – 7000+ different answer choices per question
  – 2000+ respondents per survey
  – 15+ surveys and growing

What a survey question looks like

When were you diagnosed with the following types of arthritis?

| | Rheumatoid Arthritis | Traumatic Arthritis | Psoriatic Arthritis | Osteoarthritis | Other |
| Less than a year ago | ☑ | ☐ | ☐ | ☐ | ☐ |
| More than a year ago | ☐ | ☐ | ☑ | ☐ | ☐ |

Storing a single response

When were you diagnosed with the following types of arthritis?

| | Rheumatoid Arthritis | Traumatic Arthritis | Psoriatic Arthritis | Osteoarthritis | Other |
| Less than a year ago | 1 | 0 | 0 | 0 | 0 |
| More than a year ago | 0 | 0 | 1 | 0 | 0 |
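
The 1/0 grid maps naturally onto one boolean field per cell. A minimal sketch, assuming a hypothetical field-naming scheme (`<question>_<row>_<column>_b`) and hypothetical row and column keys; the slides do not show the actual field names used.

```python
# Hypothetical keys for the rows and columns of the grid shown above.
ROWS = ["lt_1_year", "gt_1_year"]
COLUMNS = ["rheumatoid", "traumatic", "psoriatic", "osteoarthritis", "other"]


def encode_response(question_id, checked):
    """Turn one respondent's ticked cells into a flat dict of boolean fields.

    `checked` is a set of (row, column) pairs the respondent ticked.
    Every cell gets its own field, so a 2 x 5 grid yields 10 fields.
    """
    fields = {}
    for row in ROWS:
        for col in COLUMNS:
            fields[f"{question_id}_{row}_{col}_b"] = (row, col) in checked
    return fields


# The single response from the slide: rheumatoid < 1 year, psoriatic > 1 year.
response = encode_response(
    "q10", {("lt_1_year", "rheumatoid"), ("gt_1_year", "psoriatic")}
)
```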

Aggregating over 2000 responses

When were you diagnosed with the following types of arthritis?

| | Rheumatoid Arthritis | Traumatic Arthritis | Psoriatic Arthritis | Osteoarthritis | Other |
| Less than a year ago | 63 | 155 | 19 | 27 | 268 |
| More than a year ago | 190 | 46 | 8 | 213 | 325 |
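
Each number in the aggregated grid is the count of responses whose boolean field for that cell is true. The sketch below computes those counts in plain Python to make the arithmetic explicit; in the setup described here, solr's faceting produces the same counts from its index instead. Field names are hypothetical.

```python
from collections import Counter


def aggregate(responses):
    """Count, per boolean field, how many responses set it to True.

    `responses` is an iterable of {field_name: bool} dicts, one per response.
    """
    totals = Counter()
    for fields in responses:
        # Counter.update with an iterable adds 1 for each element.
        totals.update(name for name, chosen in fields.items() if chosen)
    return totals


# Three made-up responses over two fields.
totals = aggregate([
    {"a_b": True, "b_b": False},
    {"a_b": True, "b_b": True},
    {"a_b": False, "b_b": False},
])
```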

The Document Structure
• Each survey response = one solr document
• Up to 3000 boolean variables per document, indicating the chosen answers
• Added meta information: age, profession, interests
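
One way to assemble such a document, as a sketch: metadata fields plus the boolean answer fields, serialized as a JSON array of documents (a form solr's JSON update handler accepts). The field names (`age`, `profession`, `interests`, `*_b`) are assumptions that would need matching definitions in your solr schema; the `*_b` suffix follows the boolean dynamic field in solr's example schema, but check your own.

```python
import json


def make_document(response_id, age, profession, interests, answers):
    """Assemble one survey response as a solr document (hypothetical field names).

    `answers` maps boolean field names, e.g. "q10_lt_1_year_rheumatoid_b",
    to True/False.
    """
    doc = {
        "id": response_id,
        "age": age,
        "profession": profession,
        "interests": list(interests),  # multi-valued field
    }
    doc.update(answers)
    return doc


def update_body(docs):
    """Serialize documents as a JSON array for a solr JSON update request."""
    return json.dumps(docs)


# Made-up example values.
doc = make_document(
    "survey3-resp42", 54, "teacher", ["gardening"],
    {"q10_lt_1_year_rheumatoid_b": True},
)
body = update_body([doc])
```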

Querying
• Filter by age, interests, profession
• Facet across the boolean fields
• Result: which group of people chose which group of answers
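
The query pattern above translates into standard solr request parameters: `fq` for the metadata filters, `facet=true` with one `facet.field` per boolean answer field, and `rows=0` because only the counts are needed. A minimal standard-library sketch; the host, core name (`surveys`) and field names are hypothetical.

```python
from urllib.parse import urlencode


def facet_query_params(filters, boolean_fields):
    """Build solr /select parameters: filter respondents, facet on answer fields.

    `filters` maps a metadata field to a solr query value, e.g.
    {"age": "[40 TO 60]", "profession": "teacher"}. Values containing
    spaces or special characters would need quoting or escaping.
    """
    params = [
        ("q", "*:*"),       # match all survey responses...
        ("rows", "0"),      # ...but return no documents, only facet counts
        ("wt", "json"),
        ("facet", "true"),
    ]
    for field, value in filters.items():
        # One fq per filter; solr caches each filter query separately.
        params.append(("fq", f"{field}:{value}"))
    for field in boolean_fields:
        # Repeated facet.field: solr returns per-value counts for each field.
        params.append(("facet.field", field))
    return params


params = facet_query_params(
    {"age": "[40 TO 60]", "profession": "teacher"},
    ["q10_lt_1_year_rheumatoid_b", "q10_gt_1_year_psoriatic_b"],
)
# Hypothetical host and core name.
url = "http://localhost:8983/solr/surveys/select?" + urlencode(params)
```

A list of tuples is used instead of a dict so that `fq` and `facet.field` can repeat.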

Why solr is awesome
• Faceting across boolean fields uses very little memory
• Combining 3000 fields for 2000 documents takes 1–2 ms
• Allowed us to reduce API response time from a variable 2–15 seconds (sucked!) to an almost constant ~50 ms

Good to know...
• sunburnt: awesome Python solr interface (github.com/tow/sunburnt)
• Programmatic querying as well as raw queries
• Supports most advanced solr options
• If you only need facets, specify rows=0
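
With `rows=0`, the useful part of the response is `facet_counts`. A minimal sketch of pulling the "true" count per field out of solr's JSON response, assuming the default flat layout for `facet_fields` (alternating value and count). It uses only the standard library rather than sunburnt, and the sample response is hand-written for illustration, not real output.

```python
import json


def true_counts(solr_json):
    """Return {field: number of documents where the field is true}.

    Assumes solr's default flat JSON layout for facet_fields, e.g.
    {"some_field_b": ["false", 1937, "true", 63]}.
    """
    facet_fields = json.loads(solr_json)["facet_counts"]["facet_fields"]
    counts = {}
    for field, flat in facet_fields.items():
        # Pair up the alternating value/count entries.
        pairs = dict(zip(flat[0::2], flat[1::2]))
        counts[field] = pairs.get("true", 0)
    return counts


# Hand-written sample response; with rows=0 the docs list is empty.
sample = json.dumps({
    "response": {"numFound": 2000, "start": 0, "docs": []},
    "facet_counts": {
        "facet_queries": {},
        "facet_fields": {"q10_lt_1_year_rheumatoid_b": ["false", 1937, "true", 63]},
    },
})
print(true_counts(sample))
```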

Questions?
• wisertogether.com
• slideshare.net/tarequeh/the-solr-power
• @tarequeh

Speaker notes

Good afternoon, everyone! Welcome to my lightning talk: The Solr Power. My name is Tareque, and I work for a small health industry startup named wisertogether. As you may have noticed from this corny title, my talk is about solr.
This could be turned into a "most interesting man" joke.
As you might have already guessed, I'm talking about using solr as a NoSQL backend. This approach is not novel in any way, but I wanted to discuss the use case that brought it about. First of all… NoSQL.
We got to a point where retrieving data from a SQL layer just wasn't an option. The arrow came in the form of a performance hit from querying a complex relational model.
Well, why not? Now on to more specific reasons for using solr as a NoSQL backend.
I want to emphasize the word "infrequently."
So there are a lot of answer options.
What you were diagnosed with previously, and what you were diagnosed with recently.
When you start combining all the survey responses, you get some really useful information, because it exposes common trends, idiosyncrasies, etc. We use these numbers to generate pretty graphs.
Solr stores everything in the form of a document.
We used sunburnt to interface with solr. If you only need the facets, there is no reason to retrieve the documents, and you can save a lot of memory.
