Elasticsearch: first steps with an Aggregate-oriented database
Jug Roma, 28/11/2013
Matteo Moci
Me
Matteo Moci
@matteomoci
http://mox.fm
Software Engineer
R&D, new product development
Agenda
• 2 Use cases
• Elasticsearch Basics
• Data Design for scaling
Social Media Analytics Platform
for Marketing Agencies
Scenario

• Using Elasticsearch as:
  • Analytics engine
  • Aggregate repository
Use case 1

• count the distribution of values over time
Before

• ~10M documents
• Heaviest query: ~10 minutes
• Our staff had a problem
After

• ~10M documents
• Heaviest query: ~1 second (also with a larger dataset)
Use case 2
• Aggregate-oriented repository
• ...as in DDD

http://ptgmedia.pearsoncmg.com/images/chap10_9780321834577/elementLinks/10fig05.jpg
Elasticsearch
Distributed RESTful search and analytics
real time data and analytics
distributed
high availability
multi tenancy
full-text search
schema free
RESTful, JSON API
Elasticsearch basics
• Install
• API
• Types mapping
• Facets
• Relations
Install
$ wget https://download.elasticsearch.org/...
$ tar -xf elasticsearch-0.90.7.tar.gz
Run!
$ ./elasticsearch-0.90.7/bin/elasticsearch -f
$ ./elasticsearch-0.90.7/bin/elasticsearch -f
[Diagram: the first process starts node Hulk in "es"; the second process adds node Thor]
Index a document
$ curl -X PUT localhost:9200/products/product/1 -d '{
"name" : "Camera"
}'
Search
$ curl -X GET 'localhost:9200/products/product/_search?q=Camera'
Shards and Replicas
[Diagram sequence: the Products index has two shards (1, 2), each shown twice. With only node Hulk in "es", the index lives on Hulk; after node Thor joins, Products appears on both nodes and each node holds a copy of shards 1 and 2.]
Integration
[Diagram: a TransportClient connects to nodes Hulk and Thor, each on port 9300]
Async Java API
this.client.prepareGet("documents", "document", id)
    // async, non-blocking API: a listener handles the result
    .execute(new ActionListener<GetResponse>() {
        @Override
        public void onResponse(GetResponse response) {
            // handle the retrieved document
        }
        @Override
        public void onFailure(Throwable e) {
            // handle the failure
        }
    });
Mapping
Mappings define how primitive
types are stored and analyzed
Mapping
• JSON data is parsed on indexing
• Mapping is done on first field indexing
• Inferred if not configured (!)
• Types: float, long, boolean, date (+formatting), object, nested
• String type can have arbitrary analyzers
• Fields can be split up into more fields
"text": {
  "type": "multi_field",
  "fields": {
    "text": {
      "type": "string",
      "index": "analyzed",
      "index_analyzer": "whitespace",
      "analyzer": "whitespace"
    },
    "text_bigram": {
      "type": "string",
      "index": "analyzed",
      "index_analyzer": "bigram_analyzer",
      "search_analyzer": "bigram_analyzer"
    },
    "text_trigram": {
      "type": "string",
      "index": "analyzed",
      "index_analyzer": "trigram_analyzer",
      "search_analyzer": "trigram_analyzer"
    }
  }
}
Mapping - lessons
• schema can evolve (e.g. add fields)
• inferred if not specified (!)
• worst case: reindex
• use aliases to enable zero downtime
Search with Facets
final TermsFacetBuilder userFacet =
    FacetBuilders.termsFacet(MENTION_FACET_NAME)
        .field(USER_ID).size(maxUsersAmount);
SearchResponse response = client.prepareSearch(Indices.USERS)
    .setTypes(USER_TYPE)
    .setQuery(someQuery).setSize(0)
    .setSearchType(SearchType.COUNT)
    .addFacet(userFacet).execute().actionGet();
final TermsFacet facets = (TermsFacet)
    response.getFacets().facetsAsMap()
        .get(MENTION_FACET_NAME);
Date Histogram Facet
The histogram facet works with numeric data by
building a histogram across intervals of the field values.
Each value is placed in a “bucket”
{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "histo1" : {
            "histogram" : {
                "field" : "followers",
                "interval" : 10
            }
        }
    }
}
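The bucketing idea behind the histogram facet can be illustrated in plain Java. This is only a sketch of the concept, not how Elasticsearch implements it: the class `HistogramSketch`, its methods, and the sample follower counts are made up for illustration. Each value is rounded down to a multiple of `interval`, and values that share a key are counted together.

```java
import java.util.Map;
import java.util.TreeMap;

class HistogramSketch {
    // Key of the bucket a value falls into: the value rounded down to a multiple of interval.
    static long bucketKey(long value, long interval) {
        return Math.floorDiv(value, interval) * interval;
    }

    // Count how many values land in each bucket, ordered by bucket key.
    static Map<Long, Integer> histogram(long[] values, long interval) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long v : values) {
            counts.merge(bucketKey(v, interval), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] followers = {3, 7, 12, 18, 19, 42}; // hypothetical "followers" values
        System.out.println(histogram(followers, 10)); // prints {0=2, 10=3, 40=1}
    }
}
```

With `interval` set to 10, as in the request above, users with 12, 18 and 19 followers all end up in the bucket keyed 10.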
Facets - lessons
• Bug in 0.90.x: https://github.com/elasticsearch/elasticsearch/issues/1305*
• Solutions:
  • use 1 shard
  • ask for top 100 instead of 10
* will be solved in 1.0 with the aggregation module
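Why both workarounds help can be shown with a small simulation. It assumes the bug is the inaccuracy that appears when every shard returns only its local top terms and the results are then merged; the slide does not spell this out, so treat that as an assumption. The class `TopTermsSketch`, its methods, and the term counts are invented for illustration and are not Elasticsearch code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class TopTermsSketch {
    // Top `size` terms of one count map, highest count first (ties broken by term).
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int size) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return entries.subList(0, Math.min(size, entries.size()));
    }

    // Each shard reports only its local top `shardSize` terms;
    // the reported counts are summed and the overall top `size` is kept.
    static List<Map.Entry<String, Integer>> mergedTop(
            List<Map<String, Integer>> shards, int shardSize, int size) {
        Map<String, Integer> merged = new HashMap<>();
        for (Map<String, Integer> shard : shards) {
            for (Map.Entry<String, Integer> e : top(shard, shardSize)) {
                merged.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return top(merged, size);
    }

    public static void main(String[] args) {
        Map<String, Integer> shard0 = new HashMap<>();
        shard0.put("x", 5);
        shard0.put("y", 4);
        Map<String, Integer> shard1 = new HashMap<>();
        shard1.put("z", 5);
        shard1.put("y", 4);
        List<Map<String, Integer>> shards = Arrays.asList(shard0, shard1);

        System.out.println(mergedTop(shards, 1, 1)); // prints [x=5]: "y" (8 overall) is missed
        System.out.println(mergedTop(shards, 2, 1)); // prints [y=8]: asking each shard for more fixes it
    }
}
```

With a single shard there is nothing to merge, so the counts are exact; asking for a larger top list makes it less likely that a globally frequent term is cut off on some shard.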
Analyzers
A Lucene analyzer consists of a tokenizer and
an arbitrary amount of filters (+ char filters)
{
  "index":{
    "analysis":{
      "filter":{
        "bigram_shingle_filter":{
          "type":"shingle",
          "max_shingle_size":2,
          "min_shingle_size":2,
          "output_unigrams":"false",
          "output_unigrams_if_no_shingles":"false"
        },
        "trigram_shingle_filter":{
          "type":"shingle",
          "max_shingle_size":3,
          "min_shingle_size":3,
          "output_unigrams":"false",
          "output_unigrams_if_no_shingles":"false"
        }
      },
      ...
      "analyzer":{
        "bigram_analyzer":{
          "tokenizer":"whitespace",
          "filter":[
            "standard",
            "bigram_shingle_filter"
          ]
        },
        "trigram_analyzer":{
          "tokenizer":"whitespace",
          "filter":[
            "standard",
            "trigram_shingle_filter"
          ]
        }
      } ...
    }
  }
}
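What a whitespace tokenizer followed by a shingle filter produces can be approximated in a few lines of Java. This is a rough sketch, not the Lucene implementation: `ShingleSketch` and `shingles` are hypothetical names, the "standard" token filter from the settings above is ignored, and only fixed-size shingles without unigrams are emitted (mirroring `output_unigrams: false` and equal min/max shingle sizes).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShingleSketch {
    // Split on whitespace, then join every run of `size` adjacent tokens with a space.
    static List<String> shingles(String text, int size) {
        String[] tokens = text.trim().split("\\s+");
        List<String> out = new ArrayList<>();
        for (int i = 0; i + size <= tokens.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + size)));
        }
        return out; // empty when there are fewer than `size` tokens
    }

    public static void main(String[] args) {
        String text = "please divide this sentence";
        System.out.println(shingles(text, 2)); // prints [please divide, divide this, this sentence]
        System.out.println(shingles(text, 3)); // prints [please divide this, divide this sentence]
    }
}
```

Size 2 corresponds to the bigram analyzer and size 3 to the trigram analyzer configured above.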
Relations between
Documents
Author 1 : N Book
• nested: faster reads, update needs reindex, cross object match
• parent/child: same shard, no reindex on update, difficult sorting
Nested Documents
Specify that the Book type is “nested” in the Author mapping
We can query Authors with a query on properties of nested Books
“Authors who published at least one book with Penguin in the scifi genre”
curl -XGET 'localhost:9200/authors/nested_author/_search' -d '
{
  "query": {
    "filtered": {
      "query": {"match_all": {}},
      "filter": {
        "nested": {
          "path": "books",
          "query": {
            "filtered": {
              "query": {"match_all": {}},
              "filter": {
                "and": [
                  {"term": {"books.publisher": "penguin"}},
                  {"term": {"books.genre": "scifi"}}
                ]
              }
            }
          }
        }
      }
    }
  }
}'
Parent and Child
Indexing happens separately
Specify _parent type in Child mapping (Book)
When indexing Books, specify id of Author
curl -XPOST 'localhost:9200/authors/book/_mapping' -d '{
  "book": {
    "_parent": {"type": "bare_author"}
  }
}'

curl -XPOST 'localhost:9200/authors/book/1?parent=2' -d '{
  "name": "Revelation Space",
  "genre": "scifi",
  "publisher": "penguin"
}'
Parent and Child query
curl -XPOST 'localhost:9200/authors/bare_author/_search' -d '{
  "query": {
    "has_child": {
      "type": "book",
      "query": {
        "filtered": {
          "query": {"match_all": {}},
          "filter": {
            "and": [
              {"term": {"publisher": "penguin"}},
              {"term": {"genre": "scifi"}}
            ]
          }
        }
      }
    }
  }
}'
Data Design
Index Configurations
• One index “per user”
• Single index
• SI + Routing: 1 index + custom doc routing to shards
• Time: 1 index per time window *
* we can search across indices
One Index per user
[Diagram: nodes Hulk and Thor hosting shards User1 s0, User1 s1, User2 s0]

+ different sharding per user
- small users own (and cost) at least 1 shard
Single Index
[Diagram: nodes Hulk and Thor hosting shards Users s0, Users s3, Users s2]

+ filter by user id, support growth
- search hits all shards
Single Index + routing
[Diagram: nodes Hulk and Thor hosting shards Users s0, Users s3, Users s2]

+ a user’s data is all in one shard,
allows large overallocation
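The effect of custom routing can be sketched as follows. This is a simplified illustration, not Elasticsearch's actual routing code: the class `RoutingSketch` and the user ids are hypothetical, and `String.hashCode()` stands in for whatever hash function the server really uses. The point is only that the shard is derived from the routing value, so every document routed by the same user id lands on the same shard.

```java
class RoutingSketch {
    // Derive a shard number from the routing value: same routing value, same shard.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        int numberOfShards = 4; // assumed shard count for the single index
        for (String userId : new String[] {"user1", "user2", "user1"}) {
            System.out.println(userId + " -> shard " + shardFor(userId, numberOfShards));
        }
    }
}
```

Because a search routed by user id only needs to visit one shard, the index can be created with many more shards than nodes without every query paying for all of them.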
Index per time range
[Diagram: nodes Hulk and Thor hosting shards 2013_01 s1, 2013_01 s2, 2013_02 s1]

+ allows change in future indices
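One way to work with time-based indices is to derive the index name from the document date and to compute the list of indices a time-range search has to touch. The sketch below assumes monthly indices named like `2013_01`, as in the diagram; the class `TimeIndexSketch` and its methods are hypothetical helpers, not part of any Elasticsearch client.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

class TimeIndexSketch {
    private static final DateTimeFormatter MONTH = DateTimeFormatter.ofPattern("yyyy_MM");

    // Name of the monthly index a document with this date belongs to.
    static String indexFor(LocalDate date) {
        return MONTH.format(date);
    }

    // Names of all monthly indices that overlap the range [from, to].
    static List<String> indicesBetween(LocalDate from, LocalDate to) {
        List<String> names = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            names.add(m.format(MONTH));
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(indexFor(LocalDate.of(2013, 1, 15))); // prints 2013_01
        List<String> indices = indicesBetween(LocalDate.of(2013, 1, 20), LocalDate.of(2013, 2, 3));
        // A comma-separated list of index names can be used to search several indices at once.
        System.out.println(String.join(",", indices)); // prints 2013_01,2013_02
    }
}
```

Since each month gets a fresh index, settings such as the number of shards can be changed for future indices without touching the old ones.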
Data Design - lessons
Test, test, test your use case!
Take a single node with one shard and
throw load at it, checking the shard capacity
The shard is the scaling unit:
overallocate to enable future scaling
#shards > #nodes
...ES has lots of other features!
• Bulk operations
• Percolator (alerts, classification, …)
• Suggesters (“Did you mean …?”)
• Index templates (Automatic index configuration)
• Monitoring API (Amount of memory used, number of operations, …)
• Plugins
• ...
Thanks!
@matteomoci
http://mox.fm
