2. 2
Intro
• Miguel Bosin
– Support engineer
– Joined in 2015
– Interested in technology
– Passionate about support
• Elastic
– Founded in 2012
– Distributed company
– Elasticsearch: what is it?
– Open-source:
ES, LS, Kibana, and Beats
– Commercial:
X-Pack
4. 4
What is it?
Open source
Distributed and scalable
Highly available
Document-oriented (JSON)
RESTful
Full-text search engine with real-time search and analytics
capabilities
8. 8
Elasticsearch terminology
A node is a single Elasticsearch instance running in a single JVM
Multiple nodes can form a cluster
A cluster or a node can manage multiple indices
An index is a container for data
A shard is a single piece of an Elasticsearch index
A shard is either a primary or a replica
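The terminology above maps directly onto the _cat APIs, which expose cluster, node, index, and shard information in a human-readable form. A sketch, assuming a cluster listening on localhost:9200 and a hypothetical index named my_index:

```
$ curl -XGET 'http://localhost:9200/_cat/nodes?v'
$ curl -XGET 'http://localhost:9200/_cat/indices?v'
$ curl -XGET 'http://localhost:9200/_cat/shards/my_index?v'
```

The last call lists each shard of my_index, including whether it is a primary (p) or a replica (r) and which node it is allocated to.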
11. 11
Elasticsearch Architecture: Node roles
Master node:
coordinates the cluster
the only node able to apply changes to the cluster state
publishes the updated cluster state to all nodes
Data node:
performs indexing
can allocate shards locally
knows cluster state
12. 12
Elasticsearch Architecture: Node roles II
Client node:
does NOT perform indexing or allocate shards locally
does NOT perform cluster management operations
knows cluster state
smart load balancer (e.g., load balancing Kibana searches)
redirects operations to the nodes that hold the relevant data
calculates aggregation results
13. 13
Elasticsearch Architecture: Node roles III
Node roles are set in elasticsearch.yml
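A sketch of the three role combinations in elasticsearch.yml, using the pre-5.x style settings (the exact keys depend on your Elasticsearch version):

```yaml
# Dedicated master node: manages the cluster, holds no data
node.master: true
node.data: false

# Data node: holds shards and performs indexing
node.master: false
node.data: true

# Client node: no master duties, no data; routes requests
node.master: false
node.data: false
```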
17. 17
Dedicated master nodes – Why / minimum_master_nodes
Indexing and searching data is CPU-, memory-, and I/O-intensive work which can
put pressure on a node’s resources
Avoiding split brain: two concurrent master nodes in the same cluster can lead to DATA LOSS
Set this setting discovery.zen.minimum_master_nodes to the quorum:
(master_eligible_nodes / 2) + 1
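The quorum formula above can be checked with a short sketch (the function name is illustrative, not an Elasticsearch API):

```python
def minimum_master_nodes(master_eligible_nodes: int) -> int:
    """Quorum of master-eligible nodes: (n / 2) + 1, using integer division."""
    return master_eligible_nodes // 2 + 1

# With 3 master-eligible nodes the quorum is 2: losing one node still
# leaves a majority, and the two sides of a network partition can never
# both reach quorum, which is what prevents split brain.
print(minimum_master_nodes(3))  # → 2
print(minimum_master_nodes(5))  # → 3
```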
19. 19
Sizing: general factors (server capacity)
• Disks (SSD vs. HDD)
• RAM
- 1/2 of total RAM for ES
- ES max heap size: 30.5 GB
• # CPU cores
- ES thread pools concept
1 shard → 1 thread → 1 Java process → 1 core
20. 20
Sizing: Elasticsearch factors (logging case)
Size of shards
Number of shards on each node
Retention period of data
Mapping configuration
- Which fields are searchable, _source enabled or not, etc.
Size (average) of the documents
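Mapping choices like disabling _source or making a field non-searchable are made at index creation time. A hedged sketch in the 2.x-era syntax (the index name, type name, and field are hypothetical, and the exact mapping syntax varies by version):

```
$ curl -XPUT 'http://localhost:9200/my_index' -d '{
  "mappings": {
    "log": {
      "_source": { "enabled": false },
      "properties": {
        "message": { "type": "string", "index": "no" }
      }
    }
  }
}'
```

Disabling _source saves disk but means the original document can no longer be retrieved or reindexed, so it trades sizing headroom for flexibility.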
21. 21
Sizing: Capacity planning test I
FIRST: test on a single node with a single index with one shard
and no replica
THEN: insert as many documents as you can and run some typical
queries
At some point, queries will slow down past a threshold where they
no longer meet your requirements
This is the ideal number of documents a single shard is able to
hold
NEXT: find the ideal number of primary shards (by dividing your
dataset size by the ideal shard size)
FINALLY: Add replicas for HA and improve the read throughput
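The NEXT step above amounts to a simple division rounded up; a sketch with illustrative names and example sizes:

```python
import math

def primary_shard_count(dataset_size_gb: float, ideal_shard_size_gb: float) -> int:
    """Number of primary shards = dataset size / ideal shard size, rounded up."""
    return math.ceil(dataset_size_gb / ideal_shard_size_gb)

# e.g. a 500 GB dataset with a 30 GB ideal shard size needs 17 primaries
print(primary_shard_count(500, 30))  # → 17
```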
22. 22
Sizing: Capacity planning test II
Each experiment tries to accomplish a discrete goal and builds upon the previous one:
1. Determine various disk utilization
2. Determine the breaking point of a shard
3. Determine the saturation point of a node
4. Test the desired configuration on a two-node cluster
24. 24
Hot / Warm architecture
When to use it?
Elasticsearch for larger time-based data analytics use cases
Using time-based indices
Able to run an architecture with 3 different types of nodes
25. 25
Hot / Warm architecture: Type of nodes
Master, Hot and Warm nodes:
Master nodes: 3 dedicated master nodes
Hot data nodes: perform all indexing and also hold the most recent daily
indices (the data queried most frequently). Powerful machines with SSD storage
Warm data nodes: handle a large amount of read-only indices that are not
queried frequently. Very large attached spinning disks
26. 26
Hot / Warm architecture: tagging
Which node is doing what?
ES needs to know which servers contain the hot nodes and which servers
contain the warm nodes
This can be achieved by assigning arbitrary tags to each server (Hot/Warm)
Tag the node with node.box_type: xxx in elasticsearch.yml
OR start a node using ./bin/elasticsearch --node.box_type xxx
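Once nodes are tagged, indices can be steered to them with shard allocation filtering; for example, moving an aging index to the warm tier by requiring the warm tag. A sketch (the index name is hypothetical):

```
$ curl -XPUT 'http://localhost:9200/logs-2017.01.01/_settings' -d '{
  "index.routing.allocation.require.box_type": "warm"
}'
```

After this setting is applied, Elasticsearch relocates the index's shards to nodes whose node.box_type is warm.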
27. 27
Hot / Warm architecture: Force Merge API
Optimizing your indices in the Warm Node
The force merge API allows forcing the merge of one or more indices
through an API call. It optimizes the index for faster search operations
The merge relates to the number of segments a Lucene index holds within
each shard
The force merge operation reduces the number of segments by
merging them:
$ curl -XPOST 'http://localhost:9200/my_index/_forcemerge'
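For read-only indices on warm nodes, the max_num_segments parameter of the same API is commonly used to merge each shard down to a single segment:

```
$ curl -XPOST 'http://localhost:9200/my_index/_forcemerge?max_num_segments=1'
```

Force merge should only be run against indices that are no longer being written to; merging an actively indexed index produces very large segments that are expensive to manage.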