Elasticsearch
“war stories”
Well Hello There! I am Arno Broekhof
Data Engineer (full stack) @Dataworkz

Working with Elasticsearch since 2011

Dutch National Police
History of Elasticsearch
• Created by Shay Banon

• Compass

• Elasticsearch == Compass 3.0

• First release in February 2010

• Abstraction layer on top of Lucene
Present Day
24 Elasticsearch Clusters
441 Nodes
5477 GB RAM
343 TB Used Data
3798 Indices
Zen Discovery
discovery.zen.ping.multicast.enabled: true
• Elasticsearch nodes use multicast traffic to discover each other

• Default setting in ES < 2.0; multicast moved to a plugin in 2.0 and was removed in 5.0
Not a database
• Persistency
• Consistency
• Security
• SELECT * FROM pet WHERE name LIKE 'b%';
• Total amount of data < 512GB
Shard Sizing
“Too Many Shards or the Gazillion Shards Problem”
• A shard is a Lucene index under the covers, which uses file handles, memory, and CPU cycles.

• Every search request needs to hit a copy of every shard in the index. That’s fine if every shard is
sitting on a different node, but not if many shards have to compete for the same resources.

• Term statistics, used to calculate relevance, are per shard. Having a small amount of data in many
shards leads to poor relevance.
How many shards?
• 1.000.000 documents

• Index of 256GB

• 6 nodes

• 1 node has 8 cores and 30GB Heap
256 GB / (80% of one node's 30 GB heap = 24 GB) ≈ 10.7, so roughly 10 to 11 shards
curl -XGET http://localhost:9200/_cat/indices
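The rule of thumb on this slide can be written down as a small calculation. The sketch below is only an illustration of that arithmetic; the class and method names are made up for this example, and the 80% heap fraction is taken from the slide rather than from any official sizing guide.

```java
class ShardEstimate {
    /**
     * Slide rule of thumb: cap a shard at a fraction of one node's heap,
     * then divide the index size by that cap and round up.
     */
    public static int estimateShards(double indexSizeGb, double heapPerNodeGb, double heapFraction) {
        if (indexSizeGb <= 0 || heapPerNodeGb <= 0 || heapFraction <= 0) {
            throw new IllegalArgumentException("all inputs must be positive");
        }
        double maxShardSizeGb = heapPerNodeGb * heapFraction; // 30 GB * 0.8 = 24 GB
        return (int) Math.ceil(indexSizeGb / maxShardSizeGb);
    }

    public static void main(String[] args) {
        // Numbers from the slide: 256 GB index, 30 GB heap per node.
        System.out.println(estimateShards(256, 30, 0.8)); // prints 11
    }
}
```

Rounding up keeps every shard under the 24 GB budget, which is why this prints 11 where the slide says "about 10".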
Disable _source field
Without the _source field, these features are no longer available:

• The update, update_by_query, and reindex APIs.

• On the fly highlighting.

• The ability to reindex from one Elasticsearch index to another, either to change mappings or analysis, or to upgrade an index to a new major version.

• The ability to debug queries or aggregations by viewing the original document used at index time.

• Potentially in the future, the ability to repair index corruption automatically.
How many indices?
“remember that there is no rule that limits
your application to using only a single index.”
Dynamic Mappings
• Not everything needs to be searchable
"avatarLink": {
  "type": "string",
  "index": "not_analyzed",
  "doc_values": true
},
• Use Explicit Mapping when possible
{
  "job": "Some job description",
  "date": "1-10-2017"
}
{
  "job": "Some job description",
  "date": "NO_DATE"
}
Where is my memory?
{
  "aggs": {
    "players": {
      "terms": {
        "field": "players",
        "size": 10
      },
      "aggs": {
        "other": {
          "terms": {
            "field": "players",
            "size": 5
          }
        }
      }
    }
  }
}
• The aggregation returns a list of the top 10 players and, for each top player, a list of the top five supporting players

• 50 results

• Minimal effort, maximum memory
Where is my memory?
{
  "aggs": {
    "players": {
      "terms": {
        "field": "players",
        "size": 10,
        "collect_mode": "breadth_first"
      },
      "aggs": {
        "other": {
          "terms": {
            "field": "players",
            "size": 5
          }
        }
      }
    }
  }
}
• Use "collect_mode": "breadth_first" where possible

• Trims one level at a time

• Minimal change, maximum performance
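To get a feel for why trimming one level at a time matters, here is a back-of-the-envelope sketch. It assumes the worst case in which every player appears together with every other player, and it only counts candidate buckets; it is not how Elasticsearch accounts for memory internally, and the class and method names are invented for this illustration.

```java
class CollectModeBuckets {
    /** Depth-first: both levels are built in full before anything is trimmed. */
    public static long depthFirstBuckets(long distinctValues) {
        return distinctValues + distinctValues * distinctValues;
    }

    /** Breadth-first: the first level is trimmed to topN before the second level is collected. */
    public static long breadthFirstBuckets(long distinctValues, long topN) {
        long kept = Math.min(topN, distinctValues);
        return distinctValues + kept * distinctValues;
    }

    public static void main(String[] args) {
        long players = 10_000; // assumed number of distinct players
        System.out.println("depth-first:   " + depthFirstBuckets(players));
        System.out.println("breadth-first: " + breadthFirstBuckets(players, 10));
    }
}
```

Under these assumptions the depth-first count grows with the square of the number of distinct players, while the breadth-first count grows only linearly, even though both queries end up returning the same 50 results.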
Where is my data?
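public void insert(final JsonArray jsonArray) {
    if (jsonArray.size() == 0) {
        return;
    }
    BulkRequestBuilder bulkRequestBuilder = transportClient.prepareBulk();
    // Refresh is switched off here and never switched back on in this method,
    // so newly indexed documents stay invisible to search until a refresh happens.
    this.setEsRefreshInterval("-1");
    jsonArray.forEach(e -> {
        // The "name" field is used as document id: two items with the same name
        // overwrite each other. If this is Gson, toString() also keeps the
        // surrounding JSON quotes; getAsString() would return the raw value.
        String id = e.getAsJsonObject().get("name").toString();
        bulkRequestBuilder.add(transportClient.prepareIndex(configuration.getEsIndex(),
                configuration.getEsTypeName()).setSource(e.toString(), XContentType.JSON).setId(id));
    });
    BulkResponse bulkResponse = bulkRequestBuilder.get();
    // Failures are only reported at debug level; individual item failures are not inspected.
    LOGGER.debug("bulk inserted {} items took: {} with failures: {}",
            bulkResponse.getItems().length, bulkResponse.getTook(), bulkResponse.hasFailures());
}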
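One thing the insert method above does is disable refresh without restoring it. A common way to guard against that is a try/finally wrapper. The sketch below shows only that pattern; `RefreshSettings` is a hypothetical stand-in for whatever `setEsRefreshInterval` does in the original code, and "1s" is used as the value to restore because it is the Elasticsearch default refresh interval.

```java
import java.util.ArrayList;
import java.util.List;

class RefreshGuard {
    /** Hypothetical stand-in for the code that changes index.refresh_interval. */
    interface RefreshSettings {
        void setRefreshInterval(String interval);
    }

    /** Runs the bulk work with refresh disabled and always restores the given interval. */
    public static void withRefreshDisabled(RefreshSettings settings, String restoreTo, Runnable bulkWork) {
        settings.setRefreshInterval("-1");
        try {
            bulkWork.run();
        } finally {
            // Runs even if the bulk work throws, so refresh is never left disabled.
            settings.setRefreshInterval(restoreTo);
        }
    }

    public static void main(String[] args) {
        // Record the order of calls instead of talking to a real cluster.
        List<String> calls = new ArrayList<>();
        withRefreshDisabled(calls::add, "1s", () -> calls.add("bulk"));
        System.out.println(calls); // prints [-1, bulk, 1s]
    }
}
```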
Query or Filter?
Queries: use them for full-text search, when results need to be scored (think search results ranked by relevance).

Filters: much faster than queries, mainly because they do not score the results. If you just want to return all of the products that are blue, or that cost more than €50, use filters!
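As an illustration of the filter case, the sketch below builds a search body that puts both example conditions into the filter clause of a bool query, so no scores are computed. It assumes Elasticsearch 2.0 or later (where bool/filter is available); the field names `color` and `price` and the class name are hypothetical, and the color value is inserted without JSON escaping to keep the example short.

```java
class BlueProductsFilter {
    /** Builds a bool query whose clauses all run in filter context (no scoring). */
    public static String filterBody(String color, double minPrice) {
        return "{\n"
            + "  \"query\": {\n"
            + "    \"bool\": {\n"
            + "      \"filter\": [\n"
            + "        { \"term\":  { \"color\": \"" + color + "\" } },\n"
            + "        { \"range\": { \"price\": { \"gt\": " + minPrice + " } } }\n"
            + "      ]\n"
            + "    }\n"
            + "  }\n"
            + "}";
    }

    public static void main(String[] args) {
        // Products that are blue and cost more than 50.
        System.out.println(filterBody("blue", 50));
    }
}
```

Clauses listed under filter must all match, so this body returns products that are both blue and more expensive than 50; for an either/or match the two clauses would go into a should list instead.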
_type == _type
• Use unique types

• Why WordPress post_type == _type is a bad idea

• When deleting a post, a document is identified by both its _id and its _type
Search limits
• Default result size is 10

• from + size is capped at 10,000 by default (index.max_result_window)

• If you want everything, use the scroll API
We have a distributed search engine, nodes can fail!
• We have shard replicas

• Single master

• Use dedicated masters
Slow recovery
curl -XPUT http://localhost:9200/_cluster/settings -d '{
  "transient": {
    "cluster.routing.allocation.cluster_concurrent_rebalance": "5",
    "cluster.routing.allocation.node_concurrent_recoveries": "5",
    "cluster.routing.allocation.node_initial_primaries_recoveries": "4",
    "indices.recovery.concurrent_streams": "4",
    "indices.recovery.max_bytes_per_sec": "200mb",
    "indices.store.throttle.max_bytes_per_sec": "100mb"
  }
}'
What does the future bring?
• Java Transport Client is deprecated, REST is the way to go

• Cross Cluster Searches

• Index sorting during indexing

• Only one type per index

• Better use of transaction logs

• Sparse Doc Values
Questions?
