LUCENE
• Tagline: “Proven Search Capabilities”
• Free & Open Source
• Created in 1999
• Features:
• Indexes & Analyzes Data
• Tokenizing, Stemming, Filtering

• Search Queries
• Phrases, wildcards, proximity searches, ranges, fielded searches

• Relevance Scoring, Field Sorting
ELASTICSEARCH
• Tagline: “You know, for Search”
• Free & Open Source
• Created by Shay Banon (@kimchy)
• Versions
• First public release, v0.4, in February 2010
• A rewrite of the earlier “Compass” project, with scalability built in from the very core

• Latest release: 0.90.5
• Written in Java, so inherently cross-platform
DISTRIBUTED & HIGHLY AVAILABLE
• Multiple servers (nodes) running in a cluster
• Acts as single service (internal routing)
• Data is split into shards (# shards is configurable)
• Zero or more replicas
• Replicas on different servers (server pools) for failover
• Node in cluster goes down? Replica takes over.
• Self-managing cluster
• Automatic master detection + failover
• Master is responsible for distributing/relocating shards
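The sharding and replica bullets above can be illustrated with a small Python sketch. It is a toy model only: the hash function (`zlib.crc32`), the round-robin placement and the names `shard_for` and `place_copies` are assumptions made for illustration, not ElasticSearch's actual routing or allocation code.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    """Map a document id to a shard number in [0, num_primary_shards)."""
    if num_primary_shards < 1:
        raise ValueError("need at least one primary shard")
    # crc32 is a stand-in hash chosen for this sketch, not the engine's own.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


def place_copies(num_shards, num_replicas, nodes):
    """Assign each shard's primary and replicas to distinct nodes, round-robin."""
    copies = num_replicas + 1
    if copies > len(nodes):
        raise ValueError("not enough nodes to keep every copy on its own node")
    layout = {}
    for shard in range(num_shards):
        # First entry is the primary, the rest are its replicas.
        layout[shard] = [nodes[(shard + k) % len(nodes)] for k in range(copies)]
    return layout


if __name__ == "__main__":
    nodes = ["node-a", "node-b", "node-c"]
    print(shard_for("1", 5), shard_for("2", 5))
    for shard, holders in place_copies(5, 1, nodes).items():
        print(shard, holders)
```

In this model the shard depends on both the id and the shard count, so changing the number of shards later would send existing ids to different shards. Because every copy of a shard sits on a different node, losing one node still leaves a copy elsewhere.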
$ cd ~/Downloads
$ wget https://download […] /elasticsearch-0.90.5.tar.gz
$ tar -xzf elasticsearch-0.90.5.tar.gz
$ cd elasticsearch-0.90.5/
$ ./bin/elasticsearch
$ curl -XPUT http://localhost:9200/reddevils/matches/1 -d \
'{"date": "2013-10-15T19:00:00Z", "opponent": "Wales", "result": "1-1"}'
{"ok":true,"_index":"reddevils","_type":"matches","_id":"1","_version":1}
$ curl -XPUT http://localhost:9200/reddevils/matches/2 -d \
'{"date": "2013-10-11T15:00:00Z", "opponent": "Croatia", "result": "1-2"}'
{"ok":true,"_index":"reddevils","_type":"matches","_id":"2","_version":1}
$ curl -XPUT http://localhost:9200/reddevils/matches/2 -d \
'{"date": "2013-10-11T15:00:00Z", "opponent": "Croatia", "result": "1-2", "girlfriend_attention_span": 30}'
{"ok":true,"_index":"reddevils","_type":"matches","_id":"2","_version":2}
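The version bump in the last response can be mimicked with a tiny in-memory model in Python. `TinyDocStore` is a hypothetical class written for this sketch; it only imitates the index/type/id addressing and the "a second PUT replaces the document and increments the version" behaviour seen above, and says nothing about how ElasticSearch stores data internally.

```python
class TinyDocStore:
    """In-memory stand-in: documents addressed by (index, type, id), with versions."""

    def __init__(self):
        self._docs = {}  # (index, doc_type, doc_id) -> (version, source)

    def put(self, index, doc_type, doc_id, source):
        key = (index, doc_type, doc_id)
        previous_version = self._docs.get(key, (0, None))[0]
        version = previous_version + 1
        # The stored document is replaced as a whole, not merged field by field.
        self._docs[key] = (version, dict(source))
        return {"ok": True, "_index": index, "_type": doc_type,
                "_id": doc_id, "_version": version}

    def get(self, index, doc_type, doc_id):
        version, source = self._docs[(index, doc_type, doc_id)]
        return {"_index": index, "_type": doc_type, "_id": doc_id,
                "_version": version, "_source": dict(source)}


if __name__ == "__main__":
    store = TinyDocStore()
    match = {"date": "2013-10-11T15:00:00Z", "opponent": "Croatia", "result": "1-2"}
    print(store.put("reddevils", "matches", "2", match))
    match["girlfriend_attention_span"] = 30
    print(store.put("reddevils", "matches", "2", match))
    print(store.get("reddevils", "matches", "2"))
```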
“Aha! A NoSQL store?!”
QUERY DSL
• Full Text Search
• Search for “Croatia”
• Structured Search
• Search for “All matches where outcome was ‘1-1’”
• Analytics
• Search for “Average attention span of my girlfriend”
• Incl. custom functions (scripts)

• … or a combination of those!
QUERY DSL (CONT’D)
• Searching in your data set …
• queries: full text search & relevance scoring
• filters: exact matches
• Aggregating information from your data set …
• facets:
• Averages
• Sums
• Date histograms
• …
curl -XGET 'http://localhost:9200/reddevils/matches/_search?pretty=true' -d '{
  "query": {
    "query_string": {
      "query": "croatia"
    }
  }
}'
{
  "took" : 18,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.40240064,
    "hits" : [ {
      "_index" : "reddevils",
      "_type" : "matches",
      "_id" : "2",
      "_score" : 0.40240064, "_source" : {"date": "2013-10-11T15:00:00Z", "opponent": "Croatia", "result": "1-2"}
    }, {
      "_index" : "reddevils",
      "_type" : "matches",
      "_id" : "4",
      "_score" : 0.3125, "_source" : {"date": "2012-09-11T15:00:00Z", "opponent": "Croatia", "result": "1-1"}
    } ]
  }
}
curl -XGET 'http://localhost:9200/reddevils/matches/_search?pretty=true' -d '{
  "query": {
    "constant_score": {
      "filter": {
        "term": {
          "result": "1-1"
        }
      }
    }
  }
}'
curl -XGET 'http://localhost:9200/reddevils/matches/_search?pretty=true' -d '{
  "size": 0,
  "facets": {
    "opponent": {
      "terms": {
        "field": "opponent"
      }
    }
  }
}'
{
  …
  "facets" : {
    "opponent" : {
      "_type" : "terms",
      "missing" : 0,
      "total" : 10,
      "other" : 0,
      "terms" : [ {
        "term" : "wales", "count" : 2
      }, {
        "term" : "serbia", "count" : 2
      }, … {
        "term" : "croatia", "count" : 2
      } ]
  …
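What a terms facet computes can be approximated in a few lines of Python. This is a sketch of the idea only: `analyze` is a crude stand-in for a real analyzer (it lowercases and splits on non-alphanumerics, which is why the terms come back in lowercase), and `terms_facet` is a hypothetical helper, not an ElasticSearch API. The sample documents reuse the three matches that appear in the slides.

```python
import re
from collections import Counter


def analyze(text):
    """Rough analyzer stand-in: lowercase, then split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def terms_facet(docs, field):
    """Count in how many documents each term of `field` occurs."""
    counts = Counter()
    missing = 0
    for doc in docs:
        value = doc.get(field)
        if value is None:
            missing += 1
            continue
        # A term is counted once per document, even if it repeats in the field.
        counts.update(set(analyze(value)))
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return {
        "_type": "terms",
        "missing": missing,
        "total": sum(counts.values()),
        "other": 0,  # every term is returned in this sketch, so nothing is left over
        "terms": [{"term": term, "count": count} for term, count in ordered],
    }


if __name__ == "__main__":
    matches = [
        {"date": "2013-10-15T19:00:00Z", "opponent": "Wales", "result": "1-1"},
        {"date": "2013-10-11T15:00:00Z", "opponent": "Croatia", "result": "1-2"},
        {"date": "2012-09-11T15:00:00Z", "opponent": "Croatia", "result": "1-1"},
    ]
    print(terms_facet(matches, "opponent"))
```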
DOCUMENT RELATIONS
• ElasticSearch provides 2 mechanisms
• Parent/Child Documents
• add links between documents by defining parent/child ids.
• query example: “return children where parent matches x”
• use case: linking “product” and “offer” documents.
• query-time join
• Nested Documents
• use case: “actions” on a “mention” (Engagor)
• denormalized in Lucene index
• in Lucene index data is stored nearby
• thus local join, thus very fast.
• index-time join
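The difference between the two mechanisms can be sketched with plain Python data structures. Everything here is hypothetical illustration data (product, offer, mention and action fields, and both helper functions); it models where the join happens, not how ElasticSearch or Lucene implement it.

```python
# Parent/child: documents are stored separately and linked by id,
# so the link has to be resolved when the query runs (query-time join).
products = {"p1": {"name": "phone"}, "p2": {"name": "laptop"}}
offers = [
    {"id": "o1", "parent": "p1", "price": 199},
    {"id": "o2", "parent": "p2", "price": 899},
    {"id": "o3", "parent": "p1", "price": 179},
]


def children_where_parent(children, parents, parent_matches):
    """Return the children whose parent document satisfies `parent_matches`."""
    return [child for child in children if parent_matches(parents[child["parent"]])]


# Nested: sub-documents are written inside their parent document,
# so the relation is fixed when the document is indexed (index-time join).
mentions = [
    {"id": "m1", "actions": [{"type": "reply", "delay": 120}]},
    {"id": "m2", "actions": []},
    {"id": "m3", "actions": [{"type": "tag", "delay": 5}, {"type": "reply", "delay": 30}]},
]


def mentions_with_action(docs, action_type):
    """Return mentions that carry at least one nested action of the given type."""
    return [doc for doc in docs
            if any(action["type"] == action_type for action in doc["actions"])]


if __name__ == "__main__":
    phone_offers = children_where_parent(offers, products, lambda p: p["name"] == "phone")
    print([offer["id"] for offer in phone_offers])
    print([mention["id"] for mention in mentions_with_action(mentions, "reply")])
```

The nested lookup never leaves the document it is inspecting, which is the "local join" the slide refers to; the parent/child lookup needs a second structure at query time.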
EXAMPLE EXPLAINED
• range filter on publish_date
• query_string with (internal version of) user-defined query string
• date_histogram facet on the mention document's publish_date field
• terms_stats facet per action type on the “delay” field of the nested document “action” of the mention document
• result contains:
• number of mentions with action
• number of actions
• total delay of actions
• facet_filter per defined facet.
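The per-action-type statistics described on this slide can be reproduced on a small scale in Python. This is a sketch of the computation only, not the actual Engagor request: the field names besides `publish_date` and `delay`, the sample mentions, and the function `delay_stats_per_action_type` are assumptions made for illustration.

```python
from collections import defaultdict


def delay_stats_per_action_type(mentions, date_from, date_to):
    """Per action type: mentions with that action, number of actions, total delay.

    Only mentions with date_from <= publish_date < date_to are considered.
    Dates are ISO 8601 UTC strings of equal format, so string comparison works.
    """
    stats = defaultdict(lambda: {"mentions": 0, "actions": 0, "total_delay": 0})
    for mention in mentions:
        if not (date_from <= mention["publish_date"] < date_to):
            continue  # the range filter on publish_date
        seen_types = set()
        for action in mention.get("actions", []):
            entry = stats[action["type"]]
            entry["actions"] += 1
            entry["total_delay"] += action["delay"]
            seen_types.add(action["type"])
        # A mention counts once per action type, however many actions it has.
        for action_type in seen_types:
            stats[action_type]["mentions"] += 1
    return dict(stats)


if __name__ == "__main__":
    sample = [
        {"publish_date": "2013-10-01T10:00:00Z",
         "actions": [{"type": "reply", "delay": 120},
                     {"type": "reply", "delay": 60},
                     {"type": "tag", "delay": 5}]},
        {"publish_date": "2013-10-02T09:30:00Z",
         "actions": [{"type": "reply", "delay": 30}]},
        {"publish_date": "2013-08-01T08:00:00Z",
         "actions": [{"type": "reply", "delay": 999}]},
    ]
    result = delay_stats_per_action_type(
        sample, "2013-10-01T00:00:00Z", "2013-11-01T00:00:00Z")
    for action_type, numbers in sorted(result.items()):
        print(action_type, numbers)
```

Dividing `total_delay` by `actions` gives the average response delay per action type, the kind of figure the response-time graphs are built on.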
THE ENGAGOR SETUP
• Running ES for 2 years
• 1 billion social messages, sharded by client
• 20-node cluster
• 24 GB RAM per node, 12-18 GB reserved for ES
• Main data source
• Other storage systems in place mainly for backup

• Usage:
• write heavy (indexing new data all the time, in real time)
• fewer reads (no need for micro-optimizing read caches, yet)
• # updates on data depends on client use case
• social care and/or pure analytics
3 lessons learned …
1/3: INDEXING SPEED
• Bulk Indexing is faster, obviously
• Less network overhead
• With RabbitMQ
• Handles peaks in data
• Allows us to slow down throughput to ES while still consuming firehoses from our 3rd-party services
• Bulk w/ Timeouts
• (so Engagor users get their messages near-realtime)
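"Bulk with timeouts" can be sketched as a buffer that flushes either when it is full or when its oldest document has waited too long. This is a minimal Python illustration under assumptions: `BulkBuffer`, its parameters and the default thresholds are hypothetical, and the queue consumer that would call it is left out. `to_bulk_body` produces the newline-delimited action/source pairs that the bulk endpoint accepts.

```python
import json
import time


class BulkBuffer:
    """Collect documents and hand them to `flush` in batches (size or age based)."""

    def __init__(self, flush, max_docs=500, max_wait_seconds=1.0, clock=time.monotonic):
        self._flush = flush
        self._max_docs = max_docs
        self._max_wait = max_wait_seconds
        self._clock = clock
        self._pending = []
        self._oldest = None  # time at which the first pending document arrived

    def add(self, doc):
        if not self._pending:
            self._oldest = self._clock()
        self._pending.append(doc)
        self.flush_if_due()

    def flush_if_due(self):
        """Flush when the batch is full or its oldest document is too old."""
        if not self._pending:
            return False
        full = len(self._pending) >= self._max_docs
        stale = self._clock() - self._oldest >= self._max_wait
        if full or stale:
            batch, self._pending = self._pending, []
            self._flush(batch)
            return True
        return False


def to_bulk_body(index, doc_type, docs):
    """Serialize (id, source) pairs as one action line plus one source line each."""
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_type": doc_type, "_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"  # the body must end with a newline


if __name__ == "__main__":
    buffer = BulkBuffer(lambda batch: print(to_bulk_body("reddevils", "matches", batch)),
                        max_docs=2)
    buffer.add(("1", {"opponent": "Wales", "result": "1-1"}))
    buffer.add(("2", {"opponent": "Croatia", "result": "1-2"}))
```

A caller would also invoke `flush_if_due()` periodically (for example from the consumer loop) so a half-filled batch still goes out once the timeout has passed, which keeps messages near real time during quiet periods.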
2/3: CHOOSE SHARDING STRATEGY WISELY
• Plan # shards on expected growth, not on current set-up
• But, take care …
• We have several shards per monitored topic (related to # customers and volume of data)
• Biggest problem in our cluster right now is the large # of shards
• Bugfixes in latest versions
• You can use “aliases” to create “virtual shards”/“windows on shards”
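One way to read "windows on shards" is a filtered alias: a name that points at a shared index but only shows the documents matching a filter, optionally pinned to a routing value. The Python sketch below only builds the JSON body one would POST to the `_aliases` endpoint. The index name, alias name, field and helper function are all hypothetical, and whether Engagor set its aliases up this way is not stated in the slides.

```python
import json


def filtered_alias_request(index, alias, field, value, routing=None):
    """Build an alias 'add' action that exposes only documents where field == value."""
    action = {
        "index": index,
        "alias": alias,
        "filter": {"term": {field: value}},
    }
    if routing is not None:
        # With routing set, requests through the alias target a single shard.
        action["routing"] = routing
    return {"actions": [{"add": action}]}


if __name__ == "__main__":
    body = filtered_alias_request("mentions", "client_42", "client_id", 42, routing="42")
    print(json.dumps(body, indent=2))
```

The appeal of this pattern is that many clients can share a few physical shards while each still queries what looks like its own index, which keeps the total shard count down.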
3/3: TRY TO KEEP UP WITH RELEASES
• ElasticSearch is a young product
• 0.90 releases
• September 2013
• August 2013
• June 2013
• May 2013
• April 2013

• The 1.0 release is planned for early 2014.
• Updates help you
• Great improvements in every release
• Much-needed bugfixes in every release
• Bonus tip: keep your JVM up to date
“filtering, free text search & analytics all in the same box”
“power of search and data-digging in the hands of your users”
flexible and powerful open source, distributed real-time search and analytics engine for the cloud
$ sudo bin/service/elasticsearch stop
Introduction To ElasticSearch (DamnData)
Editor's notes

  1. Good afternoon. My name is Jurriaan, and I want to thank Thijs for inviting me to speak at this conference.
  2. I want to introduce ElasticSearch to you. I have worked with ElasticSearch for the last 2 years at a company called Engagor, a social media monitoring and management tool. We're based in Gent, have an office in SFO and are a team of 25 people now, 10 of them technical. Instead of diving into the technical details first, I want to start by showing you one of the coolest things we've built with it so far.
  3. Engagor is basically a huge database of social messages (profiles, keyword searches). Our clients use Engagor to address those messages, for example by replying to them, assigning them to a team member or adding metadata. This is a page in Engagor where you see the number of incoming messages per day and how often and how fast they are being replied to. The dataset is about 40k social messages, data from the last 28 days. The purple bars are the response times per day. And there are graphs with details on response times during and outside business hours. Our clients use this to evaluate the performance of the customer support they deliver, e.g. through Twitter and Facebook.
  4. This is the same page, but now with data from the last 3 months (140k messages), grouped by week. And they use it to improve their response times, as you can see here.
  5. Not only this, but our users can also drill down, search and filter in the dataset. Here is a filter for messages with a certain tag, negative sentiment and from a certain region. They might want a better response time for certain types of messages, to set priorities. This filter can then be applied on the previous page, and you get statistics about that subset of data in real time.
  6. This shows the page in our debug mode, showing a bunch of statistics about the page that's rendered. This particular request took 32 ms. But we see that on average, also for bigger accounts, when searching in millions of messages, we get great performance. And it's real time: as soon as a message comes in or an action is done, it's in these statistics. No pre-processing.
  7. From the what and who to the how … The main component of Engagor is ElasticSearch. What I want to do for the next 20 minutes is … quickly go over a few very basic ES things … explain how the example I started with is implemented … and finish with some of the lessons we learned.
  8. If we talk ES, we have to talk Lucene. That's the search engine ElasticSearch is built on top of. Notice the flat design of that Lucene logo. This was made in 1999.
  9. Apache Foundation project. Lucene is a proven technology for search indices.
  10. ES joined in 2010. It was built to be a full-featured search product, with scalability features built in from the ground up.
  11. What does that mean?
  12. So, how do we get started … Pretty simple: download, unzip, start.
  13. When it's running, you can check that in your browser. The easiest way to interact with ElasticSearch is through its REST API. It's JSON in and JSON out.
  14. Example of adding something to ElasticSearch from your command shell via an HTTP PUT request done by curl. You can do this right after installation. No need to create an index or configure anything, just add data right away. ElasticSearch is smart about what type of data you're giving it.
  15. Adding a second record (document).
  16. Now we're adding some more interesting data to the mix … Updating an existing record. (Mind the new version number.) (An update is an atomic DELETE & PUT.)
  17. Want to view a document? HTTP GET. The URL consists of index, type & id.
  18. So, what we have right now is a NoSQL store.
  19. Yes, ES is a NoSQL store (and you could use it to replace your current type of data source, but I'm not saying you should). But that's not the field where ES shines.
  20. ES is for search. So let's do a search. Here we do a GET request (in the browser) that searches our newly created index for the word “croatia”. It returns the match from last Friday.
  21. The language for searching in your ES cluster is called the Query DSL. There are 3 different basic types of searches.
  22. In ES terms this maps to the following words …
  23. By now I've added all matches from the qualification campaign, and I will do a full-text search for croatia. The query string supports everything you can do with Lucene, so that includes wildcards, near searches and field restrictions.
  24. And these are the results. We played Croatia twice, and won once.
  25. This is a filtered search, where we will only get back exact matches. If you can, it's better to use ES filters, since they can then be cached by ES.
  26. This ES HTTP request will facet our data on the opponent field, thus returning how often we played each opponent.
  27. Which is obviously, now that we're qualified, 2 times against each of the other 5 teams in our group. That covers it for the 3 basic types of searches.
  28. I need to explain one more thing before we can go back to the example we started with, and that's document relations: the equivalent of a MySQL join. There are 2 types. We are using nested documents for mentions & actions.
  29. So, if you remember this screenshot … This is a single ES call … How is that call set up?
  30. What setup and hardware is needed to make this work for Engagor?
  31. With all of this, on a typical day this is what our ES dashboard looks like … Lots of green. Blue indicates the current master.
  32. And if by now you're thinking … “I don't believe you.” Well, you're right.
  33. There have been days where it looked like this.
  34. ANIMATION. And where Server Density, our monitoring platform, looked like this. We've seen servers with load averages up to 1800. I wonder if that's a record-setting value? Getting the full set-up and configuration right is a bit more work than unzipping the software and starting up 20 nodes.
  35. So I want to move on to some lessons learned.
  36. Our set-up (firehose): RabbitMQ in front. Sometimes (when relocating, initializing, recovering from …) indexing slows down.
  37. I want to end with 2 quotes from a presentation from this summer on why ElasticSearch was built and what its goals were …
  38. Use the right tool for the job. ES does filtering, free text & analytics in a single box. That's definitely easier than having to move big piles of data between different systems.
  39. When I look at the features we can build for our clients, it definitely does that for us. Thanks to ElasticSearch we've been able to offer our clients features like this, and it's interesting to see which queries they're using for their day-to-day business.
  40. And I'm not sure when exactly this happened, probably around the time of their huge funding, but the tagline is no longer “You know, for search” … It is now fully buzzword compliant. And on that bombshell …
  41. Ready to dive in?