Elasticsearch
Scalable Full-Text Search Engine

Thursday, February 27, 14
Goals for this talk

Outline
• What is full-text search, and why do we use it?

• What can you do with Elasticsearch?
• Why is Elasticsearch different?
• DEMO TIME!
Text Search
do I really need to explain it?

%LIKE%
• In the beginning there was:
SELECT * FROM tweets WHERE content LIKE '%zuckerberg%';
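To see why this does not scale, here is a minimal sketch using Python's built-in sqlite3 module, with a hypothetical three-row `tweets` table standing in for a real database. SQLite's LIKE is case-insensitive for ASCII by default, so the query finds the tweet. The query plan shows a full table scan: no index exists here, and an ordinary B-tree index on `content` could not serve a pattern that starts with `%` anyway, so every row is read on every search.

```python
import sqlite3

# In-memory table standing in for the slide's hypothetical `tweets` table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, content TEXT)")
conn.executemany(
    "INSERT INTO tweets (content) VALUES (?)",
    [
        ("Mark Zuckerberg sells Facebook",),
        ("Facebook buys WhatsApp",),
        ("Mark's Facebook buys Instagram",),
    ],
)

query = "SELECT * FROM tweets WHERE content LIKE '%zuckerberg%'"
rows = conn.execute(query).fetchall()
print(rows)  # [(1, 'Mark Zuckerberg sells Facebook')]

# Ask SQLite how it runs the query; the last column of each row is a
# human-readable step, which here describes a scan of the whole table.
plan = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()
print(plan)
```

The exact wording of the plan varies between SQLite versions, but it names a SCAN of `tweets` rather than an index lookup.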

But that’s not what you usually search for!

• You want:
  • Search by author
  • Search by time
  • Search by sentiment
  • Search by location
  • Search by everything!

That’s a lot of metadata!

• You can’t scan through all of that on the fly if you want real-time results

• You need to index it first!

Inverted Index
• Some documents:
1: ‘Mark Zuckerberg sells Facebook’ [Monday]
2: ‘Facebook buys WhatsApp’ [Tuesday]
3: ‘Mark’s Facebook buys Instagram’ [Monday]

• Inverted index for them:
Facebook: {1, 2, 3}
Mark: {1, 3}
Instagram: {3}
WhatsApp: {2}
[Monday]: {1, 3}
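A minimal sketch of building such an index in Python. The tokenizer (drop a possessive 's, lowercase, split on anything that is not a letter or digit) is an assumption standing in for Elasticsearch's configurable analyzers, and indexing the day under a `day:monday` key is just one illustrative way to make metadata searchable through the same structure.

```python
import re
from collections import defaultdict

# The three example documents from the slide; the day is kept as a metadata field.
DOCS = {
    1: {"text": "Mark Zuckerberg sells Facebook", "day": "Monday"},
    2: {"text": "Facebook buys WhatsApp", "day": "Tuesday"},
    3: {"text": "Mark's Facebook buys Instagram", "day": "Monday"},
}

def tokenize(text: str) -> list[str]:
    """Toy analyzer: drop a possessive 's, lowercase, split on non-alphanumerics."""
    text = re.sub(r"['’]s\b", "", text)
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each term (and each metadata value) to the sorted ids of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, doc in docs.items():
        for term in tokenize(doc["text"]):
            postings[term].add(doc_id)
        # Metadata goes into the same index under a field-prefixed key.
        postings["day:" + doc["day"].lower()].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

index = build_inverted_index(DOCS)
print(index["facebook"])    # [1, 2, 3]
print(index["instagram"])   # [3]
print(index["day:monday"])  # [1, 3]
```

Looking up a term is now a single dictionary access instead of a pass over every document.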

OK, now that we have the data, we also want some numbers behind it!

• In our previous example:
  • Facebook is mentioned 3 times
  • There are 2 posts on [Monday]
  • The most frequent word is Facebook (3 documents), followed by Mark and buys (2 each)
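These numbers can be computed straight from the tokenized documents. This sketch counts each term at most once per document (document frequency), which is also what an Elasticsearch terms aggregation reports as a bucket's document count; the toy tokenizer is an assumption, not Elasticsearch's analyzer.

```python
import re
from collections import Counter

# The slide's three example documents, with the day as metadata.
DOCS = {
    1: {"text": "Mark Zuckerberg sells Facebook", "day": "Monday"},
    2: {"text": "Facebook buys WhatsApp", "day": "Tuesday"},
    3: {"text": "Mark's Facebook buys Instagram", "day": "Monday"},
}

def tokenize(text):
    """Toy analyzer: drop a possessive 's, lowercase, split on non-alphanumerics."""
    text = re.sub(r"['’]s\b", "", text)
    return re.findall(r"[a-z0-9]+", text.lower())

# Number of documents each term appears in (a set removes repeats within a document).
term_doc_counts = Counter()
for doc in DOCS.values():
    term_doc_counts.update(set(tokenize(doc["text"])))

# Number of posts per day, read from the metadata field.
posts_per_day = Counter(doc["day"] for doc in DOCS.values())

print(term_doc_counts["facebook"])  # 3
print(posts_per_day["Monday"])      # 2
print(term_doc_counts.most_common(3))
```

In the last line Mark and buys tie at 2, so their relative order is not meaningful.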

All 3 put together
Elasticsearch = Search(Content & Metadata) + Analytics
(oversimplified)

Let’s look at some search features of Elasticsearch

Features: Complex Queries

• Boolean Operators:

(apple OR pumpkin) AND pie

• Wildcards:
app*: apple, apples, appliance
appl?: apple, apply

• Fuzzy:
back~: back, pack, black, bank

• Ranged:
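The operators above map onto simple operations over an inverted index, sketched here on a hypothetical toy index: Boolean operators become set union and intersection, wildcards are expanded against the term dictionary, and fuzzy terms are expanded by edit distance. Plain Levenshtein distance with at most one edit is an assumption chosen to reproduce the slide's `back~` examples; Lucene, and therefore Elasticsearch, applies its own default edit budget and can count a transposition as a single edit.

```python
import fnmatch

# Hypothetical toy index: lowercase term -> ids of the documents containing it.
INDEX = {
    "apple": {1, 2, 4},
    "apples": {5},
    "appliance": {6},
    "apply": {7},
    "pumpkin": {3},
    "pie": {1, 3, 8},
    "back": {10},
    "pack": {11},
    "black": {12},
    "bank": {13},
}

def docs_for(term):
    """Posting-list lookup; unknown terms match nothing."""
    return INDEX.get(term, set())

# (apple OR pumpkin) AND pie: OR is set union, AND is set intersection.
apple_or_pumpkin_and_pie = (docs_for("apple") | docs_for("pumpkin")) & docs_for("pie")

def expand_wildcard(pattern):
    """Terms matching a pattern where * is any run of characters and ? is exactly one.
    Caveat: fnmatch also treats [...] as a character class, unlike the slide's syntax."""
    return sorted(t for t in INDEX if fnmatch.fnmatchcase(t, pattern))

def levenshtein(a, b):
    """Fewest single-character insertions, deletions or substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]

def expand_fuzzy(term, max_edits=1):
    """Terms within max_edits edits of term; max_edits=1 is an assumption."""
    return sorted(t for t in INDEX if levenshtein(term, t) <= max_edits)

print(sorted(apple_or_pumpkin_and_pie))  # [1, 3]
print(expand_wildcard("app*"))           # ['apple', 'apples', 'appliance', 'apply']
print(expand_wildcard("appl?"))          # ['apple', 'apply']
print(expand_fuzzy("back"))              # ['back', 'bank', 'black', 'pack']
```

Once a wildcard or fuzzy term has been expanded, its matching terms are OR-ed together, so the rest of the query is the same set arithmetic as the Boolean case.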
Features: Complex Queries

• Attribute filtering:
apple AND pie AND location:california

• Range filtering:
apple AND published:[1393100055 TO 1393427055]
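A minimal in-memory sketch of what these two queries ask for, over hypothetical documents with `location` and `published` fields. It assumes the range bounds are Unix timestamps in seconds, which is how they read: decoded as UTC they fall between 22 and 26 February 2014. Elasticsearch accepts this Lucene-style syntax through its `query_string` query; the sketch only mirrors the semantics, including that `[a TO b]` includes both endpoints.

```python
from datetime import datetime, timezone

# Hypothetical documents: a text body plus two metadata fields.
DOCS = [
    {"id": 1, "text": "apple pie recipe", "location": "california", "published": 1393200000},
    {"id": 2, "text": "apple pie contest", "location": "oregon", "published": 1393200000},
    {"id": 3, "text": "apple pie at the fair", "location": "california", "published": 1390000000},
    {"id": 4, "text": "pumpkin pie", "location": "california", "published": 1393300000},
]

def matches(doc, terms, location, start, end):
    """apple AND pie AND location:california AND published:[start TO end]."""
    words = set(doc["text"].lower().split())
    return (
        all(term in words for term in terms)   # full-text part: every term must appear
        and doc["location"] == location        # attribute filter: exact field match
        and start <= doc["published"] <= end   # [a TO b] includes both endpoints
    )

START, END = 1393100055, 1393427055
hits = [d["id"] for d in DOCS if matches(d, ["apple", "pie"], "california", START, END)]
print(hits)  # [1]

# Read as Unix timestamps in seconds, the bounds decode to:
print(datetime.fromtimestamp(START, tz=timezone.utc).isoformat())  # 2014-02-22T20:14:15+00:00
print(datetime.fromtimestamp(END, tz=timezone.utc).isoformat())    # 2014-02-26T15:04:15+00:00
```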

Features: Geo Queries
• Bounding Box Queries
• Distance Range Queries
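Both query types come down to simple geometry. A minimal sketch, assuming a spherical Earth (mean radius 6371 km) and the haversine formula for distance; Elasticsearch has its own distance-calculation options, and this box check ignores boxes that cross the 180° meridian. The city coordinates are approximate values used only for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; assumes a spherical Earth

def in_bounding_box(lat, lon, top_left, bottom_right):
    """True if (lat, lon) lies inside the box given by its top-left and bottom-right corners.
    Does not handle boxes that cross the 180° meridian."""
    top, left = top_left
    bottom, right = bottom_right
    return bottom <= lat <= top and left <= lon <= right

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def in_distance_range(lat, lon, center, min_km, max_km):
    """True if the point is at least min_km and at most max_km away from center."""
    return min_km <= haversine_km(center[0], center[1], lat, lon) <= max_km

# Approximate city coordinates (latitude, longitude), for illustration only.
CITIES = {"Bucharest": (44.43, 26.10), "Constanța": (44.18, 28.63), "Cluj-Napoca": (46.77, 23.60)}
BOX = ((44.6, 25.9), (44.3, 26.3))  # hypothetical box around Bucharest

for name, (lat, lon) in CITIES.items():
    print(name,
          "in box:", in_bounding_box(lat, lon, *BOX),
          "distance from Bucharest: %.0f km" % haversine_km(*CITIES["Bucharest"], lat, lon))
```

A bounding box is a cheap pair of comparisons, which is why it is often used as a coarse pre-filter before an exact distance check.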
Feature: Built-in analytics

Feature: Built-in tag cloud

What’s special about Elasticsearch?

Distributed

• Spreading data across multiple servers is easy, and the details are abstracted away from the developer
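One piece of what is abstracted away is deciding which shard a document lives on. Elasticsearch's documentation describes it as `shard = hash(routing) % number_of_primary_shards`, with the routing value defaulting to the document's `_id`. The sketch follows that formula but swaps in `zlib.crc32` as the hash, an assumption for illustration and not the function Elasticsearch uses; the document ids are hypothetical.

```python
import zlib
from collections import Counter

def pick_shard(routing_value: str, number_of_primary_shards: int) -> int:
    """shard = hash(routing) % number_of_primary_shards.
    zlib.crc32 is a stand-in hash for this sketch, not Elasticsearch's hash."""
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_primary_shards

NUMBER_OF_PRIMARY_SHARDS = 3
doc_ids = [f"tweet-{n}" for n in range(1, 10001)]

# How many of the 10,000 hypothetical ids land on each shard.
placement = Counter(pick_shard(doc_id, NUMBER_OF_PRIMARY_SHARDS) for doc_id in doc_ids)
print(placement)

# The same id always routes to the same shard, so a lookup by id touches one shard only.
print(pick_shard("tweet-42", 3) == pick_shard("tweet-42", 3))  # True
```

Because the modulus is the number of primary shards, changing it would send existing ids to different shards; this is why, in the versions of that era, the primary shard count was fixed when the index was created.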

Performance/Scalability

• Add and remove nodes on the fly without ever stopping the search service

Performance/Scalability

• Indexing and searching can be scaled independently
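A rough way to read this: indexing work is spread across primary shards, whose number is set when the index is created, while search throughput can be raised by adding replicas to a live index (each replica also has to apply every write, so replicas are not free on the indexing side). The sketch only builds the JSON bodies for the create-index and update-index-settings calls and counts shard copies; the values 5 and 1 are illustrative and match Elasticsearch's defaults at the time.

```python
import json

def create_index_body(number_of_shards: int, number_of_replicas: int) -> dict:
    """Body for creating an index with a given shard layout."""
    return {"settings": {"number_of_shards": number_of_shards,
                         "number_of_replicas": number_of_replicas}}

def update_replicas_body(number_of_replicas: int) -> dict:
    """Body for updating index settings; the replica count can change on a live index."""
    return {"index": {"number_of_replicas": number_of_replicas}}

def shard_copies(number_of_shards: int, number_of_replicas: int) -> int:
    """Every primary shard plus all of its replicas."""
    return number_of_shards * (1 + number_of_replicas)

print(json.dumps(create_index_body(5, 1)))
print(json.dumps(update_replicas_body(2)))
print(shard_copies(5, 1), shard_copies(5, 2))  # 10 15
```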

Performance/Scalability

• With only a few nodes you can run complex queries on billions of documents

• Example: 3 nodes holding 20 million documents, with 2 replicas of each shard
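The 3-node example works out neatly: with 2 replicas every shard exists as 3 copies, and since Elasticsearch never puts two copies of the same shard on one node, each of the 3 nodes ends up holding a full copy of the data. A minimal sketch of that placement rule, assuming 5 primary shards (the slide does not say how many) and a simple round-robin layout rather than Elasticsearch's real allocator.

```python
def allocate(number_of_shards, number_of_replicas, nodes):
    """Place every copy of every shard so that no node holds two copies of the same shard.
    Round-robin placement is a simplification of Elasticsearch's allocator."""
    copies = 1 + number_of_replicas
    if copies > len(nodes):
        # Elasticsearch would leave the extra replicas unassigned; this sketch just refuses.
        raise ValueError("need at least one node per copy of a shard")
    layout = {node: [] for node in nodes}
    for shard in range(number_of_shards):
        for copy in range(copies):
            # Copies of shard s go to nodes s, s+1, ... (mod n), which are all distinct.
            node = nodes[(shard + copy) % len(nodes)]
            role = "primary" if copy == 0 else "replica"
            layout[node].append((shard, role))
    return layout

TOTAL_DOCS = 20_000_000
NUMBER_OF_SHARDS = 5  # hypothetical; the slide does not give a shard count
DOCS_PER_SHARD = TOTAL_DOCS // NUMBER_OF_SHARDS

layout = allocate(NUMBER_OF_SHARDS, 2, ["node-1", "node-2", "node-3"])
for node, shards in layout.items():
    docs_on_node = DOCS_PER_SHARD * len(shards)
    print(node, shards, f"{docs_on_node:,} docs")  # each node holds all 20,000,000
```

The upside of every node having every shard is that any node can answer any query; the cost is three times the storage and three times the indexing work.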

Easy to back up
• Elasticsearch has a built-in backup solution, so you don’t have to implement one yourself
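The built-in solution is the snapshot and restore API. The sketch below only assembles the three HTTP calls involved (register a shared-filesystem repository, take a snapshot, restore it) without sending them; the repository name, directory and snapshot name are hypothetical. Newer Elasticsearch versions also require the directory to be listed in the `path.repo` setting, and a restore will not overwrite an index that is currently open.

```python
import json

def snapshot_requests(repo: str, location: str, snapshot: str):
    """(method, path, body) triples for the snapshot/restore API: register a
    shared-filesystem repository, take a snapshot, then restore it."""
    return [
        ("PUT", f"/_snapshot/{repo}", {"type": "fs", "settings": {"location": location}}),
        ("PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true", None),
        ("POST", f"/_snapshot/{repo}/{snapshot}/_restore", None),
    ]

# Hypothetical names: repository "my_backup", directory "/mnt/backups/my_backup", snapshot "snapshot_1".
for method, path, body in snapshot_requests("my_backup", "/mnt/backups/my_backup", "snapshot_1"):
    print(method, path, json.dumps(body) if body is not None else "")
```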

Demo time!

Intro to Elasticsearch - Elasticsearch Bucharest Group @ Softbinator
