Stop Exhausting Yourself in Operating Multiple Elasticsearch Clusters
Hokuto Kagaya
Development Center 2
Game Platform Service Development Office
PION C Team
WHAT'S PION?
In-game Community / Marketing Platform
WHAT’S Elasticsearch?
• As a time series DB
• As a search engine
• As a log store
WE HAVE MULTIPLE CLUSTERS FOR..
Purpose: Logging, Event Processing, Service Development
Environment: Real, Sandbox
MULTIPLE CLUSTERS WILL CAUSE..
For example..
“Which plugins did I install on which clusters?”
Basically, our clusters are provisioned by Ansible
BUT…
Someone: “Hey, let’s try the XXX plugin on node YYY of cluster ZZZ in the DEV environment!”
They forget to record XXX, YYY, ZZZ…
Things easily descend into chaos!
WE NEED A MANAGEMENT TOOL!
EXISTING TOOLS
• ElasticHQ (OSS)
• Kibana (by Elastic)
• cerebro (OSS)
For a single cluster
One of its strengths is that it can support multiple clusters
OK, let’s use this!
However, its main purpose is also deep management of a single cluster
Not for browsing a cluster list
ANOTHER PROBLEM WITH Elasticsearch
It is not easy to:
• monitor an Elasticsearch cluster
• alert properly on abnormal status based on the monitoring results
Kibana and many OSS tools are very nice, but:
• Some detailed metrics (like 95th-percentile latency) cannot be retrieved directly
• We cannot see them when Elasticsearch is under heavy load
COMPARISON
Tool | Multiple clusters? | Monitoring? | Alerting?
Kibana | partial support (cross cluster search, dedicated separate cluster for monitoring) | partial support (server side metrics) | ✔ (with Watcher)
ElasticHQ | ✔ (not for browsing) | partial support (server side metrics) | ✘
What we need | ✔ (w/ high browsability) | ✔ | ✔
OK, let’s make it ourselves!
RUBBER BAND - TOOLKIT FOR ES MANAGEMENT
Screenshots
Rubber Band UI, Health Watcher, Client - architecture
TWO OPTIONS FOR MONITORING
• Monitor clusters’ states directly
  /_cat/***
  /_cluster/health
  /_nodes/***
• Monitor client-side metrics
  can compute detailed metrics
  can be accessed even when a cluster is under heavy load (via our tool)
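The client-side option can be illustrated with a minimal sketch, assuming the application records the latency of each Elasticsearch call itself. The class name `LatencyWindow` and the nearest-rank percentile method are illustrative assumptions, not part of the Rubber Band toolkit; in practice a metrics library would usually do this work.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: stores client-side request latencies and reports percentiles. */
class LatencyWindow {
    private final List<Long> latenciesMillis = new ArrayList<>();

    /** Record the latency of one request, as measured by the client. */
    public synchronized void record(long millis) {
        latenciesMillis.add(millis);
    }

    /** Nearest-rank percentile (0 < p <= 100) over all latencies recorded so far. */
    public synchronized long percentile(double p) {
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        if (latenciesMillis.isEmpty()) {
            throw new IllegalStateException("no latencies recorded");
        }
        List<Long> sorted = new ArrayList<>(latenciesMillis);
        Collections.sort(sorted);
        // Nearest rank: the smallest value with at least p% of samples at or below it.
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(rank - 1);
    }

    public static void main(String[] args) {
        LatencyWindow window = new LatencyWindow();
        for (long ms = 1; ms <= 100; ms++) {
            window.record(ms);
        }
        System.out.println("p95 = " + window.percentile(95) + " ms");
    }
}
```

Because the numbers live in the client process, they remain available even when the cluster itself is too busy to answer monitoring requests.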
HOW TO ALERT ON A CLUSTER STATUS?
The X-Pack GOLD license supports Watcher, which can also be used to check cluster health out of the box!
{
  "trigger" : {
    "schedule" : { "interval" : "10s" }
  },
  "input" : {
    "http" : {
      "request" : {
        "host" : "localhost",
        "port" : 9200,
        "path" : "/_cluster/health"
      }
    }
  }
}
It uses the cluster health API!
We can also use that API ourselves :)
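A self-made check along these lines might look like the sketch below. It assumes Java 11+ (`java.net.http.HttpClient`) and a cluster reachable at `http://localhost:9200`. The class `ClusterHealthCheck`, the regex-based extraction of the `status` field, and the rule "alert on anything but green" are assumptions for illustration, not the actual Rubber Band Health Watcher.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical health checker built on the cluster health API. */
class ClusterHealthCheck {
    // The cluster health API reports "status" as green, yellow, or red.
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"(green|yellow|red)\"");

    /** Extracts the status field from a /_cluster/health response body. */
    static Optional<String> parseStatus(String body) {
        Matcher m = STATUS.matcher(body);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Illustrative rule: alert unless the status is green; an unparseable body also alerts. */
    static boolean shouldAlert(String body) {
        return !parseStatus(body).map("green"::equals).orElse(false);
    }

    /** Calls /_cluster/health once; timeouts and connection errors count as alerts. */
    static boolean checkOnce(HttpClient client, String baseUrl) {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/_cluster/health"))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() != 200 || shouldAlert(response.body());
        } catch (IOException e) {
            return true; // unreachable or too slow to answer
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return true;
        }
    }

    public static void main(String[] args) {
        boolean alert = checkOnce(HttpClient.newHttpClient(), "http://localhost:9200");
        System.out.println(alert ? "ALERT: cluster is not green" : "OK");
    }
}
```

Treating a timeout as an alert is a deliberate choice in this sketch: a cluster that cannot answer its own health endpoint is likely one that needs attention.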
EXAMPLES OF ALERT FROM HEALTH WATCHER
MILESTONE
PHASE 1
• Rubber Band UI
• Rubber Band Health Watcher
• Rubber Band Client (Simple REST client wrapper)
PHASE 2
• Rubber Band Curator (Centralized wrapper of curator)
• Open to the other internal teams
PHASE 3
• Publish it as OSS
KEY TAKEAWAYS
How can we manage multiple clusters without any chaos?
• Our toolkit: Rubber Band
• A simple UI with information aggregation and appropriate delegation
How can we do proper monitoring and alerting?
• Use both direct server states and client-side metrics
• Implement a simple health-check server ourselves
And..
WE ARE HIRING!
THANK YOU
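Wrap the official HighLevelRESTClient
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Timer;
import org.elasticsearch.action.ActionListener;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.stereotype.Component;

@Component
public class ElasticsearchClientWrapper {
    private final RestHighLevelClient elasticsearchClient;
    private final MeterRegistry meterRegistry;

    public ElasticsearchClientWrapper(RestHighLevelClient elasticsearchClient,
                                      MeterRegistry meterRegistry) {
        this.elasticsearchClient = elasticsearchClient;
        this.meterRegistry = meterRegistry;
    }

    public void searchAndGetAggregationAsync(SearchRequest searchRequest) {
        // Start timing on the client side, before the request is sent
        Timer.Sample sample = Timer.start(meterRegistry);
        // Later client versions expect a RequestOptions argument before the listener
        elasticsearchClient.searchAsync(searchRequest, new ActionListener<SearchResponse>() {
            @Override
            public void onResponse(SearchResponse searchResponse) {
                // Micrometer tags are key/value pairs; the "result" key is an illustrative choice
                sample.stop(meterRegistry.timer("metrics.timer", "result", "success"));
                // do stuff..
            }

            @Override
            public void onFailure(Exception e) {
                sample.stop(meterRegistry.timer("metrics.timer", "result", "failure"));
                // do fallback..
            }
        });
    }
}
See also: Key Points When Using Elasticsearch as a Search Engine (Elasticsearch を検索エンジンとして利用する際のポイント)
https://engineering.linecorp.com/ja/blog/detail/99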
