Copyright© 2015,NTT Software Corporation. All rights reserved. 1
2015.10.27
NTT Software Corporation
Cloud and Security Business Department

Agenda
• Introduction
• Server monitoring system
– Monitoring of the physical machine
– Monitoring of the virtual machine
– Integrating multiple Zabbix screens
• Log monitoring system

Introduction
• NTT Software has used OpenStack since the Essex release.
• Monitoring and log analysis have been important issues during development, testing, and operation.
• This talk introduces the system we built to address them by combining OSS products.

What do you need in order to monitor OpenStack?
1. Common
• Develop as little as possible.
• One screen.
• Automation
2. Physical
• Resource monitoring
• Middleware monitoring
• Service monitoring
3. Virtual
• Automatic registration of monitoring targets
• Automatic release of monitoring targets
• Monitoring of virtual resources
4. Log monitoring
• Collect
• Visualize
• Efficient error analysis
• Automatic analysis and notification

Key points
Treat OpenStack as a single application.
Do not treat physical and virtual machine monitoring as the same layer.
Failures on the physical side are detected on the physical side.
Failures on the virtual side are detected on the virtual side.
Log monitoring uses the EFK stack + Norikra + Zabbix.

Separate the physical and virtual
[Figure: a stack of Physical Server, Middleware, OpenStack, and VMs, divided into a physical side and a virtual side]
Server Monitoring System

Server Monitoring (Physical)
• The physical side needs everything that general application monitoring needs.
• The following are particularly necessary for monitoring OpenStack:
– Middleware monitoring
– Service monitoring
– Resource monitoring

Physical Server Monitoring (UserParameter)
How to monitor middleware and services
• UserParameter
• The output of a script becomes a monitoring item.
• You can reuse plugins written for Sensu, Nagios, and so on.

Physical Server Monitoring (UserParameter)
• UserParameter for the nova service
– It collects the results of nova hypervisor-show (※)
(※) Source: https://github.com/sensu-plugins/sensu-plugins-openstack

UserParameter=nova.hypervisor-state.running_vms,python /etc/zabbix/bin/nova-hypervisor-metrics.py -u admin -p admin -t admin -a http://192.168.0.10:35357/v2.0 | awk -F '[. \t]' '$5=="running_vms" { x=x+$6 } END{print x}'

$ python /etc/zabbix/bin/nova-hypervisor-metrics.py -u admin -p devstack -t admin -a http://192.168.0.5:35357/v2.0 | awk -F '[. \t]' '$5=="running_vms" { x=x+$6 } END{print x}'
1
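
The awk filter in the UserParameter can also be written in Python, which may be easier to extend. This is only a minimal sketch. It assumes the plugin prints Graphite-style lines of the form `scheme.hostname.metric value timestamp` with a three-part scheme, so that the metric name is the fifth field when splitting on dots, spaces, and tabs (which is what `$5` and `$6` in the awk command rely on). The sample lines and the function name `sum_metric` are hypothetical.

```python
import re

def sum_metric(lines, metric="running_vms"):
    """Sum the values of one metric across all hypervisors."""
    total = 0
    for line in lines:
        # Same separators as the awk command: dot, space, tab.
        fields = re.split(r"[. \t]", line.strip())
        if len(fields) >= 6 and fields[4] == metric:
            total += int(fields[5])
    return total

# Hypothetical plugin output (assumed format, not captured from a real run).
sample = [
    "stats.nova.hypervisors.compute1.running_vms 1 1445900000",
    "stats.nova.hypervisors.compute1.vcpus_used 2 1445900000",
    "stats.nova.hypervisors.compute2.running_vms 3 1445900000",
]
print(sum_metric(sample))  # 4
```

A hostname containing dots would shift the field positions in both the awk and the Python version, so short hostnames are assumed here.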

Where is the Zabbix server?
[Figure: two placement patterns. Pattern A: Zabbix is placed alongside the hypervisor, outside Project A and its VMs. Pattern B: each project (Project A, Project B) contains its own Zabbix instance next to its VMs.]

Virtual Machine Monitoring (auto-registration)
• Monitored hosts can be registered with zabbix-server automatically.
• When metadata is set in the zabbix-agent configuration, a zabbix-server function applies the monitoring settings automatically.
• Example:
– zabbix_agentd.conf
HostMetadata=<foo>
– zabbix-server setting
IF metadata=controller THEN template=controller

Virtual Machine Monitoring (auto-registration)
[Figure: three groups of VMs register with the Zabbix server: WEB servers (metadata=web), DB servers (metadata=db), and APP servers (metadata=app).]
Rules on the Zabbix server:
IF metadata=web THEN template=webTemplate
IF metadata=db THEN template=dbTemplate
IF metadata=app THEN template=appTemplate
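
The three rules can be read as a simple lookup from host metadata to a template. The sketch below models that decision in Python. It is not a Zabbix API and does not talk to a Zabbix server; the names `RULES` and `select_template` are hypothetical, and exact string equality is assumed as the match condition.

```python
# Hypothetical model of the auto-registration rules on the slide.
# It only mirrors "IF metadata=X THEN template=Y".
RULES = [
    ("web", "webTemplate"),
    ("db", "dbTemplate"),
    ("app", "appTemplate"),
]

def select_template(host_metadata, rules=RULES):
    """Return the template of the first rule whose key equals the metadata."""
    for key, template in rules:
        if host_metadata == key:
            return template
    return None  # no rule matched: the host gets no template

for meta in ["web", "db", "app", "unknown"]:
    print(meta, "->", select_template(meta))
```

Because the rule lives on the server and the metadata lives in the agent configuration, a new VM image only needs the right metadata value baked in to be monitored with the right template.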

Virtual Machine Monitoring (network discovery)
• Network discovery, the function normally used to add monitoring settings automatically, is used here to release them.
• It does not distinguish between power-off, deletion, and failure of a VM.
• A VM that is not restored would otherwise keep producing abnormality notifications.
→ There is no recovery plan for it.
→ It is safe to delete it from the monitoring targets.
→ It is auto-registered again after restoration.
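
The release decision can be sketched as follows. Since discovery cannot tell power-off, deletion, and failure apart, the only input in this model is how long a host has gone unseen. This is a hypothetical illustration, not Zabbix code; the function name, the grace period, and the sample timestamps are all assumptions.

```python
def hosts_to_release(last_seen, now, grace_seconds):
    """Return hosts that have not answered discovery for longer than the grace period."""
    return sorted(
        host for host, seen in last_seen.items()
        if now - seen > grace_seconds
    )

# Hypothetical data: time (in seconds) at which each host last answered.
last_seen = {"Node1": 1000, "Node2": 400, "Node3": 990}
print(hosts_to_release(last_seen, now=1000, grace_seconds=300))  # ['Node2']
```

A host released this way loses nothing permanently: if it comes back, auto-registration adds it again.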

Virtual Machine Monitoring (network discovery)
[Figure: Node1 to Node5 and Zabbix on the monitoring network 192.168.100.0/24, with an alert shown]

Faults in virtual resources
[Figure: a compute node (Nova), a storage node (Cinder), VMs, and the network 192.168.0.0/24]

What do you monitor?
Alive
• Ping
Process
• All processes
Resource
• CPU
• Memory
• Disk
SNMP
• HW
• NW

Faults in virtual resources
[Figure: the same monitoring items (Alive: Ping; Process: all processes; Resource: CPU, Memory, Disk; SNMP: HW, NW) shown with the label "Detectable"]

Integrating multiple Zabbix screens

Too many tabs
[Figure: separate browser tabs for Tenant A, Tenant B, Tenant C, and Tenant D, shown alongside Hatohol]

Hatohol
[Figure: a Hatohol screen showing Zabbix 1 and Zabbix 2 together]

Summary of server monitoring
• Failures on the physical side are detected on the physical side.
• Failures on the virtual side are detected on the virtual side.
• Plugins for Sensu, Nagios, etc. can be reused for middleware and OpenStack service monitoring.
• Hosts are added by auto-registration.
• Hosts are removed by network discovery.
• Multiple Zabbix screens are integrated using Hatohol.
Log Monitoring System

EFK+NZ Log Monitoring
Use                  Name
Search engine        Elasticsearch
Log collection       Fluentd
Log visualization    Kibana
Log analysis         Norikra
Notification         Zabbix

EFK+NZ Log Monitoring
• What is Norikra?
Schema-less Stream Processing with SQL
"Norikra is a open source server software provides "Stream Processing" with SQL, written in JRuby, runs on JVM, licensed under GPLv2."
Source: http://norikra.github.io/

EFK+NZ Log Monitoring
• Streaming log processing
Cut out a time window → analyze it → repeat
Source:
Esper: Event Processing for Java
http://www.espertech.com/products/esper
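
The "cut out a window, analyze, repeat" loop can be illustrated with fixed, non-overlapping (tumbling) windows. Norikra expresses such windows in SQL; the Python below is only a simplified sketch of the idea, with hypothetical event data and a hypothetical function name.

```python
from collections import defaultdict

def tumbling_windows(events, width):
    """Group (timestamp, payload) events into fixed, non-overlapping windows."""
    windows = defaultdict(list)
    for timestamp, payload in events:
        start = (timestamp // width) * width  # start time of the window
        windows[start].append(payload)
    return dict(sorted(windows.items()))

# Hypothetical events: (seconds, payload).
events = [(0, "a"), (30, "b"), (61, "c"), (119, "d"), (120, "e")]
for start, batch in tumbling_windows(events, width=60).items():
    print(start, batch)
```

A real stream processor emits each window as it closes instead of collecting all events first; this batch version trades that behavior for brevity.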

EFK+NZ Structure
[Figure: Fluentd on the OpenStack side sends logs to the monitoring server; the components shown there are Fluentd, Norikra, Elasticsearch, File, Zabbix, and Kibana.]

EFK+NZ Log Monitoring
• The most important thing is how to write the Norikra rules.
• Our experience and know-how become the Norikra rules.
– Development, testing, and troubleshooting
– Log lists
• For every version
• Once built, the system does nothing useful unless rules are defined.

Detecting changes in the Kibana graph
• Example 1: Detect changes in the Kibana graph.
– Suspicious activity
• Example 2: Detect a failure as a pattern.
– Error analysis

Detecting changes in the Kibana graph
• Keystone has no mechanism for detecting DoS attacks, so we want to detect them ourselves.
• Under attack, the number of 401 errors in the log increases.
• The increase is visible on a Kibana graph, but only if someone watches the graph constantly.

Detecting changes in the Kibana graph
The Norikra rule is written as: "If 401 errors from the same IP address occur ○ times or more per unit time."
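
The rule can be sketched in Python as a per-IP count of 401 responses inside one time window. The slide leaves the threshold open (○), so the threshold, the window length, the event tuple layout, and the function name below are all assumptions made for illustration. The actual system expresses this as a Norikra query, which is not shown on the slide.

```python
from collections import Counter

def suspicious_ips(events, window_start, window_seconds, threshold):
    """Return IPs with at least `threshold` 401 responses inside one time window."""
    window_end = window_start + window_seconds
    counts = Counter(
        ip for timestamp, ip, status in events
        if status == 401 and window_start <= timestamp < window_end
    )
    return {ip: n for ip, n in counts.items() if n >= threshold}

# Hypothetical parsed Keystone log events: (timestamp, client IP, HTTP status).
events = [
    (1, "10.0.0.5", 401), (2, "10.0.0.5", 401), (3, "10.0.0.5", 401),
    (4, "10.0.0.9", 401), (5, "10.0.0.9", 200), (70, "10.0.0.5", 401),
]
print(suspicious_ips(events, window_start=0, window_seconds=60, threshold=3))
```

Counting per IP rather than overall keeps a burst of ordinary mistyped passwords from many users from looking like an attack.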

Detecting changes in the Kibana graph
Keystone was attacked!!

Error Analysis
• Detection of patterned errors
• Log AAA + Log BBB = Error Type 002
• When you analyze manually:
– Look at the log files
nova-api.log, nova-conductor.log, nova-compute.log, and so on
– Use the grep command.
– It requires log analysis skills.
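
The "Log AAA + Log BBB = Error Type 002" idea can be sketched as a rule table in which an error type is reported only when all of its patterns appear. "AAA" and "BBB" are the placeholders from the slide; the substring matching, the rule table layout, and the sample log lines are assumptions made for illustration.

```python
# Hypothetical rule table: an error type is reported when every listed
# substring appears somewhere in the collected log lines.
RULES = {
    "Error Type 002": ["AAA", "BBB"],
}

def classify(log_lines, rules=RULES):
    """Return the error types whose patterns all occur in the log lines."""
    matched = []
    for error_type, patterns in rules.items():
        if all(any(p in line for line in log_lines) for p in patterns):
            matched.append(error_type)
    return matched

# Placeholder log lines spread across two files, as in the manual grep workflow.
logs = [
    "nova-api.log: AAA",
    "nova-compute.log: BBB",
]
print(classify(logs))      # ['Error Type 002']
print(classify(logs[:1]))  # []
```

Encoding the combination once as a rule removes the need for each operator to know which files to grep and which lines belong together.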

Error Analysis
• Example: an instance is launched with a flavor that requests too much memory, and it fails to start.
• What happened?
• The cause of the error is not written.
• Zabbix is notified of the failure as error type 002.
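
The slides do not say how the detected error type reaches Zabbix. One possible route, shown here purely as an assumption, is to push the value to a Zabbix trapper item with the `zabbix_sender` command-line tool. The sketch only builds the command line and does not run it; the server address, host name, and item key are hypothetical.

```python
def zabbix_sender_command(server, host, key, value):
    """Build (but do not run) a zabbix_sender command line for a trapper item."""
    return ["zabbix_sender", "-z", server, "-s", host, "-k", key, "-o", str(value)]

# Hypothetical server address, monitored host name, and item key.
cmd = zabbix_sender_command("192.0.2.10", "controller", "openstack.error.type", "002")
print(" ".join(cmd))
```

To actually send the value, the list could be passed to `subprocess.run(cmd, check=True)` on a machine where `zabbix_sender` is installed and the trapper item exists.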

Future Action
• Building on our experience and know-how since the Essex release, we will continue to expand the Norikra rules.

Announcement
• I will demonstrate the log monitoring system.
– Place: NTT Group booth S14
– Date: 29 (Thu), from 10:30

Thank you for listening.

References
• Monasca
https://wiki.openstack.org/wiki/Monasca
• Monasca/Monitoring Of Monasca
https://wiki.openstack.org/wiki/Monasca/Monitoring_Of_Monasca
• Monasca/Logging
https://wiki.openstack.org/wiki/Monasca/Logging
• Zabbix Documentation 2.2
https://www.zabbix.com/documentation/2.2/
• Elastic
https://www.elastic.co/jp/
• Norikra
http://norikra.github.io/
• EsperTech
http://www.espertech.com/products/esper.php
• Treasure Data Inc
http://www.treasuredata.com/

Trademarks
OpenStack is a registered trademark of OpenStack, LLC in the United States.
Zabbix is a trademark of Zabbix LLC, located in the Republic of Latvia.
Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.
logstash is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.
Kibana is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.
Other product names, service names, and company names mentioned in this document are trademarks or registered trademarks of their respective companies.

Monitoring system for OpenStack,using a OSS products

  • 1. Copyright© 2015,NTT Software Corporation. All rights reserved. 1 2015.10.27 NTT Software Corporation Cloud and Security Business Department
  • 2. Copyright© 2015,NTT Software Corporation. All rights reserved. Agenda • Introduction • Server monitoring system – Monitoring of the physical machine – Monitoring of the virtual machine – Integrate multiple of Zabbix screen • Log monitoring system 2
  • 3. Copyright© 2015,NTT Software Corporation. All rights reserved. Introduction • NTT Software uses OpenStack from Essex version. • Monitoring and Log analysis has been an important issue during the development ,test and operation • I introduce our system that was resolved with a combination of OSS products. 3
  • 4. Copyright© 2015,NTT Software Corporation. All rights reserved. what do you need to monitoring for the OpenStack. Common • Develop only a little. • One screen. • Automation physical • Resource Monitoring • Middleware Monitoring • Service Monitoring Log Monitoring • Collect • Visualize • Efficiency of error analysis. • Automatic analysis and notification Virtual • Automatic registration of monitoring • Automatic release of monitoring • Monitoring of virtual resources 4 1 2 4 3
  • 5. Copyright© 2015,NTT Software Corporation. All rights reserved. what do you need to monitoring for the OpenStack. Common • Develop only a little. • One screen. • Automation physical • Resource Monitoring • Middleware Monitoring • Service Monitoring Log Monitoring • Collect • Visualize • Efficiency of error analysis. • Automatic analysis and notification Virtual • Automatic registration of monitoring • Automatic release of monitoring • Monitoring of virtual resources 5 1 2 4 3
  • 6. Copyright© 2015,NTT Software Corporation. All rights reserved. Keyword I think the OpenStack as one application. Don’t think as a same layer , physical and virtual machine monitoring method . Failure of the physical side , detect by the physical side. Failure of the virtual side , detect by the virtual side. Log monitoring is EFK stack + Norikra + Zabbix 6
  • 7. Copyright© 2015,NTT Software Corporation. All rights reserved. Separate the physical and virtual 7 Physical Server Middle ware OpenStack VM VM Physical side Virtual side
  • 8. Copyright© 2015,NTT Software Corporation. All rights reserved. Server Monitoring System 8
  • 9. Copyright© 2015,NTT Software Corporation. All rights reserved. what do you need to monitoring for the OpenStack. Common • Develop only a little. • One screen. • Automation physical • Resource Monitoring • Middleware Monitoring • Service Monitoring Log Monitoring • Collect • Visualize • Efficiency of error analysis. • Automatic analysis and notification Virtual • Automatic registration of monitoring • Automatic release of monitoring • Monitoring of virtual resources 9 1 2 4 3
  • 10. Server monitoring (physical). Everything that general application monitoring requires is required here too. The following are particularly necessary for monitoring OpenStack: middleware monitoring; service monitoring; resource monitoring.
  • 11. Physical server monitoring (UserParameter). How to monitor middleware and services: use UserParameter, which collects the output of a script as a monitoring item. Plugins published for Sensu, Nagios, and similar tools can be reused.
  • 12. Physical server monitoring (UserParameter). UserParameter for the nova service: it collects the results of nova hypervisor-show. (*) Source: https://github.com/sensu-plugins/sensu-plugins-openstack
    Agent setting: UserParameter=nova.hypervisor-state.running_vms,python /etc/zabbix/bin/nova-hypervisor-metrics.py -u admin -p admin -t admin -a http://192.168.0.10:35357/v2.0 | awk -F '[. \t]' '$5=="running_vms" { x=x+$6 } END{print x}'
    Manual check: $ python /etc/zabbix/bin/nova-hypervisor-metrics.py -u admin -p devstack -t admin -a http://192.168.0.5:35357/v2.0 | awk -F '[. \t]' '$5=="running_vms" { x=x+$6 } END{print x}'
    Output: 1
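The awk filter on this slide can be sketched in Python as follows. This is a minimal illustration, not the plugin's own code: it assumes, as the awk field positions imply, that the script prints one metric per line with four dot-separated name parts before the metric name, followed by the value and a timestamp. The sample lines, the metric prefix, and the function name `sum_metric` are hypothetical.

```python
import re

def sum_metric(output: str, metric: str = "running_vms") -> int:
    """Sum the values of one metric across all hypervisors."""
    total = 0
    for line in output.splitlines():
        # Split on dots, spaces, and tabs, which appears to be the
        # intent of the awk -F option on the slide.
        fields = re.split(r"[. \t]", line.strip())
        # awk's $5 is fields[4] and $6 is fields[5].
        if len(fields) >= 6 and fields[4] == metric:
            total += int(fields[5])
    return total

# Hypothetical sample output; the real prefix and host names will differ.
sample = (
    "stats.openstack.hypervisors.node1.running_vms\t1\t1445900000\n"
    "stats.openstack.hypervisors.node1.vcpus_used\t2\t1445900000\n"
    "stats.openstack.hypervisors.node2.running_vms\t3\t1445900000\n"
)
print(sum_metric(sample))
```

Like the awk version, this relies on fixed field positions, so a hypervisor host name that itself contains dots would shift the fields and the metric would be missed.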
  • 14. Where is Zabbix-Server? Pattern A: a single Zabbix server on the hypervisor (physical) side monitors the VMs of every project. Pattern B: each project (Project A, Project B) runs its own Zabbix server alongside its VMs.
  • 15. Virtual machine monitoring (auto-registration). Monitored hosts can be registered with zabbix-server automatically. When metadata is set in the zabbix-agent configuration, zabbix-server's auto-registration function applies the matching monitoring settings automatically. Example: zabbix-agent.conf sets metadata="<foo>"; the zabbix-server setting is IF metadata=controller THEN template=controller.
  • 16. Virtual machine monitoring (auto-registration). Diagram: web servers report metadata=web, DB servers report metadata=db, and app servers report metadata=app to the Zabbix server, which applies these rules: IF metadata=web THEN template=webTemplate; IF metadata=db THEN template=dbTemplate; IF metadata=app THEN template=appTemplate.
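The metadata-to-template rules on this slide can be sketched as a small lookup. This only illustrates the decision the server makes, not how Zabbix implements it; the rule table and the function name `template_for` are hypothetical. In a stock agent configuration file the parameter that carries this value is `HostMetadata`, and auto-registration action conditions compare it by substring, which is why the sketch uses `in` rather than equality.

```python
from typing import Optional

# Hypothetical rule table mirroring the slide: metadata value -> template.
AUTO_REGISTRATION_RULES = {
    "web": "webTemplate",
    "db": "dbTemplate",
    "app": "appTemplate",
}

def template_for(host_metadata: str) -> Optional[str]:
    """Return the template for the first rule whose key occurs in the metadata."""
    for key, template in AUTO_REGISTRATION_RULES.items():
        if key in host_metadata:
            return template
    return None  # no rule matched, so no template is linked

for metadata in ("web", "db", "app", "batch"):
    print(metadata, "->", template_for(metadata))
```

Because matching is by substring, metadata values should be chosen so that one does not contain another.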
  • 17. Virtual machine monitoring (network discovery). The network discovery function, normally used to add monitoring settings automatically, is used here to remove them. It does not distinguish between a VM that was powered off, deleted, or has failed. If an abnormality is reported and nobody restores the VM: → there is no recovery plan → it is safe to delete the VM from the monitoring targets → it is auto-registered again after restoration.
  • 18. Virtual machine monitoring (network discovery). Diagram: Node1 through Node5 and Zabbix on the monitoring network 192.168.100.0/24, with an alert raised to Zabbix.
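The removal rule described on these two slides (drop a host once it has stayed unreachable for a set period) can be sketched as follows. In Zabbix this is configured as a discovery rule and action rather than written as code; the `last_seen` table, the addresses, and the one-day grace period below are hypothetical values chosen only for illustration.

```python
from datetime import datetime, timedelta

def hosts_to_remove(last_seen, now, grace):
    """Return hosts that have been unreachable for longer than `grace`."""
    return sorted(host for host, seen in last_seen.items() if now - seen > grace)

now = datetime(2015, 10, 27, 12, 0)
last_seen = {
    "192.168.100.11": now - timedelta(minutes=1),  # still answering
    "192.168.100.12": now - timedelta(hours=30),   # silent for over a day
}
print(hosts_to_remove(last_seen, now, grace=timedelta(days=1)))
```

The grace period is what separates a failure that someone intends to repair from a VM that was deliberately powered off or deleted, since the discovery check itself cannot tell them apart.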
  • 19. Fault of virtual resources. Diagram: a compute node (Nova) hosting VMs and a storage node (Cinder), connected over network 192.168.0.0/24.
  • 20. What do you monitor? Alive: ping. Process: all processes. Resource: CPU, memory, disk. SNMP: hardware (HW), network (NW).
  • 21. Fault of virtual resources. The same monitoring items (alive: ping; process: all processes; resource: CPU, memory, disk; SNMP: hardware, network), annotated "Detectable".
  • 22. Integrating multiple Zabbix screens
  • 23. Where is Zabbix-Server? Pattern A: a single Zabbix server on the hypervisor (physical) side monitors the VMs of every project. Pattern B: each project (Project A, Project B) runs its own Zabbix server alongside its VMs.
  • 24. Too many tabs. Diagram labels: Tenant A, Tenant B, Tenant C, Tenant D, Tenant; Hatohol; L3.
  • 25. Hatohol. Screenshot: information from Zabbix 1 and Zabbix 2 shown on one screen.
  • 26. Summary of server monitoring. Detect physical-side failures on the physical side. Detect virtual-side failures on the virtual side. Plugins for Sensu, Nagios, and similar tools can be used for middleware and OpenStack service monitoring. Add hosts with auto-registration. Remove hosts with network discovery. Integrate multiple Zabbix screens with Hatohol.
  • 27. Log Monitoring System
  • 29. EFK+NZ log monitoring. Use and product name: search engine, Elasticsearch; log collection, Fluentd; log visualization, Kibana; log analysis, Norikra; notification, Zabbix.
  • 30. EFK+NZ Log Monitoring
  • 31. EFK+NZ log monitoring. What is Norikra? "Schema-less Stream Processing with SQL. Norikra is a open source server software provides "Stream Processing" with SQL, written in JRuby, runs on JVM, licensed under GPLv2." Source: http://norikra.github.io/
  • 32. EFK+NZ log monitoring. Streaming logs: cut out a fixed period of the stream → analyze it → repeat. Source: Esper: Event Processing for Java, http://www.espertech.com/products/esper
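The "cut out a period, analyze, repeat" loop on this slide can be sketched as a fixed-width (tumbling) window over a stream. This is a plain-Python illustration of the idea, not Norikra or Esper code; it assumes events arrive in timestamp order as `(timestamp, payload)` pairs, and the function name and the 60-second width are hypothetical.

```python
def tumbling_windows(events, width):
    """Yield (window_start, payloads) for consecutive fixed-width windows.

    Events must arrive in timestamp order. Each batch is handed over once
    its window has passed and is then dropped, so memory stays bounded.
    """
    current_start = None
    batch = []
    for ts, payload in events:
        start = ts - ts % width
        if current_start is None:
            current_start = start
        if start != current_start:
            yield current_start, batch      # window closed: analyze it
            current_start, batch = start, []
        batch.append(payload)
    if batch:
        yield current_start, batch          # flush the final window

stream = [(0, "a"), (30, "b"), (65, "c"), (130, "d")]
for start, payloads in tumbling_windows(stream, width=60):
    print(start, payloads)
```

Windows that contain no events are simply skipped in this sketch.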
  • 33. EFK+NZ structure. Fluentd on the OpenStack servers forwards logs to Fluentd on the monitoring server, which sends them to Norikra, Elasticsearch, and a file. Norikra notifies Zabbix; Kibana visualizes the data in Elasticsearch.
  • 34. EFK+NZ log monitoring. The most important thing is how to write the Norikra rules. Our experience and know-how become the Norikra rules: development, testing, and troubleshooting; a log list for every version. The system does not work if no rules exist once it has been built.
  • 35. Detect the variation of the Kibana graph. Example 1: detect variation in a Kibana graph (suspicious activity). Example 2: detect a failure as a pattern (error analysis).
  • 36. Detect the variation of the Kibana graph. Keystone has no mechanism for detecting a DoS attack, so I want to detect one. Under attack, 401 error log entries increase. The increase can be spotted on a Kibana graph, but only if someone watches it constantly.
  • 38. Detect the variation of the Kibana graph. Write a Norikra rule such as: "if 401 errors from the same IP address occur ○ or more times per unit of time".
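The rule quoted on this slide can be sketched in Python for one window of log events. In the talk the rule is a Norikra query written in SQL; the version below only illustrates the same condition. The field names `ip` and `status`, the sample events, and the threshold of 3 are hypothetical, since the slide leaves the count unspecified.

```python
from collections import Counter

def suspicious_ips(window_events, threshold):
    """Return {ip: count} for IPs with at least `threshold` 401 responses."""
    counts = Counter(e["ip"] for e in window_events if e["status"] == 401)
    return {ip: n for ip, n in counts.items() if n >= threshold}

# One unit of time's worth of hypothetical Keystone access events.
window = [
    {"ip": "10.0.0.5", "status": 401},
    {"ip": "10.0.0.5", "status": 401},
    {"ip": "10.0.0.5", "status": 401},
    {"ip": "10.0.0.9", "status": 401},
    {"ip": "10.0.0.9", "status": 200},
]
print(suspicious_ips(window, threshold=3))
```

Grouping by source address keeps a few mistyped passwords from many users from triggering the rule, while repeated failures from one address do.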
  • 39. Detect the variation of the Kibana graph. Keystone was attacked!
  • 40. Error analysis. Detection of patterned errors: Log AAA + Log BBB = Error Type 002. When you analyze manually: look at log files such as nova-api.log, nova-conductor.log, nova-compute.log, and so on; use the grep command; log analysis skills are required.
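The "Log AAA + Log BBB = Error Type 002" idea can be sketched as a check that all marker messages of a pattern appear close together in time. This is an illustrative stand-in for a Norikra rule, not the presenter's implementation; the pattern table, the 60-second window, and the sample messages are hypothetical.

```python
# Hypothetical pattern table: error type -> messages that must co-occur.
PATTERNS = {
    "Error Type 002": ("Log AAA", "Log BBB"),
}

def detect_patterns(events, window=60):
    """Return error types whose markers all occur within `window` seconds.

    `events` is a list of (timestamp, message) pairs from any log file.
    """
    found = []
    for name, markers in PATTERNS.items():
        times = {m: [ts for ts, msg in events if m in msg] for m in markers}
        first, rest = markers[0], markers[1:]
        # Anchor on each occurrence of the first marker and require every
        # other marker to appear within the window around it.
        if any(
            all(any(abs(t - t0) <= window for t in times[m]) for m in rest)
            for t0 in times[first]
        ):
            found.append(name)
    return found

events = [
    (100, "nova-api.log: Log AAA"),
    (130, "nova-compute.log: Log BBB"),
]
print(detect_patterns(events))
```

Once a match has a name, the notification can carry the error type instead of raw log lines, so the reader does not need to grep through several files.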
  • 41. Error analysis. Example: launching a VM with a flavor whose memory is too large fails to start.
  • 42. Error analysis. What happened?
  • 43. Error analysis. The cause of the error is not shown.
  • 44. Error analysis. Zabbix was notified of error type 002.
  • 45. Future action. We will keep expanding the Norikra rules, drawing on our experience and know-how since the Essex version.
  • 46. Announcement. I will demonstrate the log monitoring system. Place: NTT Group booth S14. Date: 29 (Thu), from 10:30.
  • 47. Thank you for listening.
  • 48. References. Monasca: https://wiki.openstack.org/wiki/Monasca. Monasca/Monitoring Of Monasca: https://wiki.openstack.org/wiki/Monasca/Monitoring_Of_Monasca. Monasca/Logging: https://wiki.openstack.org/wiki/Monasca/Logging. Zabbix Documentation 2.2: https://www.zabbix.com/documentation/2.2/. Elastic: https://www.elastic.co/jp/. Norikra: http://norikra.github.io/. EsperTech: http://www.espertech.com/products/esper.php. Treasure Data Inc: http://www.treasuredata.com/.
  • 49. Trademarks. OpenStack is a registered trademark of OpenStack, LLC in the United States. Zabbix is a trademark of Zabbix LLC, located in the Republic of Latvia. Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries. logstash is a trademark of Elasticsearch BV, registered in the U.S. and in other countries. Kibana is a trademark of Elasticsearch BV, registered in the U.S. and in other countries. Other product, service, and company names mentioned in this document are trademarks or registered trademarks of their respective owners.

Editor's notes

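  1. Today's agenda. Background. Community status. Some tips for server monitoring, both physical and virtual. As Zabbix instances multiply for virtual monitoring, so do the screens, so we want to unify them. The log monitoring system.
  2. At NTT Software we have been involved in OpenStack development since the Essex release, working behind the scenes with the research laboratories. Throughout developing and operating OpenStack, monitoring, and log analysis in particular, has been a constant challenge. Group companies also told us that log analysis was hard. We solved this with a combination of OSS products, which I will introduce here.
  3. As background, I listed what an OpenStack monitoring system needs in the first place. As common items, ...; for physical-side monitoring, ...; for virtual-side monitoring, ...; and for log monitoring, ... are what I think is needed.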
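  5. Think of OpenStack as one application. It becomes simple once you treat it as a single application. Rather than assuming it must be hard because it is IaaS, there should be a list of "it is just an application, so these are the things we have to watch". How do you normally monitor an application you have developed? Doing almost exactly that is fine. Do not think about physical monitoring and virtual monitoring together. If you think "for physical we watch that, for virtual we watch that... wait, this item belongs to the physical side, not the virtual side, right?", you get confused. Separate them cleanly. Once separated, detect physical-side failures on the physical side and virtual-side failures on the virtual side. Log monitoring is the EFK stack + Norikra + Zabbix. Please take this away with you today.
  6. Before going into details, first separate the physical and virtual layers cleanly. The physical side is this part, and the virtual side is this part.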
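  8. What general application monitoring needs is of course needed here too. Beyond that, what OpenStack specifically needs is middleware monitoring, service monitoring, and resource monitoring. OpenStack is a collection of various middleware. If any piece of middleware stops working, OpenStack is adversely affected, so watch it carefully. Even when the middleware and each process appear to be running normally, a silent failure may have occurred and things may not actually be working. Check that OpenStack itself is operating correctly. When virtual resources are exhausted, the VMs running on top can be affected. Watching resources is also an opportunity to review capacity planning.
  9. Since custom development should be kept to a minimum, make good use of what is already published. Monitoring scripts for both middleware and services have been published for Sensu and Nagios, so we use those. The feature used for this is UserParameter.
  10. This feature collects the output of a script as a monitoring item. Here it obtains the number of running VMs from the result of nova hypervisor-show. With this mechanism you can collect the information you need for both middleware and services.
  11. That is about all for the physical side. For everything else, conventional monitoring is fine; there is no need to overthink it. Next is virtual-side monitoring. Virtual machines are created and deleted over and over, so configuring monitoring by hand each time is inefficient. Automate it. Then there is the question of how to find faults in virtual resources.
  12. First, as a premise, where do you place the monitoring server? One approach is centralized management with a zabbix-server on the physical side; the other is to deploy Zabbix per project. We separate zabbix-server per project on the assumption that different people use each project. This way each project administrator can view the Zabbix for their own project, and collection backlogs in Zabbix are reduced.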
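  13. Even if monitoring is possible, applying monitoring settings to tens or hundreds of machines is hard work. Applying settings by hand one at a time is tedious, so have them applied automatically, using Zabbix auto-registration. With this configured, zabbix-agent sends a notification to zabbix-server. On receiving it, zabbix-server applies the monitoring settings according to conditions defined in advance. Metadata: set metadata="<foo>" in zabbix-agent.conf, and in Zabbix auto-registration specify that template "bar" is applied when the metadata is "foo". It is a good idea to install zabbix-agent during server build with a configuration management tool such as Ansible, Chef, or Puppet, or to bake it into the Glance image beforehand. Then just change the metadata setting for each purpose.
  14. If zabbix-server is configured to apply template ** when the metadata is **, the same monitoring settings are applied automatically as monitored hosts increase. This also works for monitoring virtual environments. However many monitored nodes are added, the setup is done only once: create a template for that kind of node and configure auto-registration. For virtual machines, it is best to embed zabbix-agent and its settings in the Glance image in advance.
  15. Monitoring for newly added VMs uses the auto-registration described above. The problem is removing monitored hosts. Some people apparently build tools that periodically reconcile the VM list against Zabbix's monitored hosts and act on mismatches, but here we take the simple approach: delete a host once it has been unreachable for a set period.
  16. Powering off a machine raises an alert in Zabbix. If it went down because of a fault, you do not ignore the alert; you recover the machine. If you powered it off or deleted it yourself, you ignore the alert. If the alert is left alone for a set period, we judge that the machine is no longer in use and remove it from monitoring. Not fixing it means not using it. Even if it is used again, auto-registration runs and it becomes a monitoring target again. The network discovery function checks IP reachability on the specified segment. If a host that was registered has disappeared, it is removed from the monitoring settings.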
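  17. Suppose the only thing that fails is a volume created by Cinder. Suppose VMs mount Cinder volumes as shown by the arrows. One day, just this one Cinder volume suddenly breaks. The volumes mounted by the other VMs are fine. So how do you detect this?
  18. Suppose the monitoring items look like this,
  19. If you deploy zabbix-server using the method on the right...
  20. Because a Zabbix is added per project, before you know it the number of tabs has grown alarmingly. So I recommend using Hatohol to unify the Zabbix screens. With it,
  21. information from Zabbix 1, information from Zabbix 2, and so on can be pulled into one screen, which helps unify the screens.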
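  22. Finally, log monitoring. We want it to handle collection, visualization, more efficient analysis, automatic analysis, and notification.
  23. First, the tools we use: Elasticsearch as the search engine, Fluentd for log collection, and Kibana for log visualization, the so-called EFK stack; Norikra for log analysis; and Zabbix for anomaly notification.
  24. "We can visualize with Kibana. Hooray!" Is that where you stop? Do you assume that looking at a graph makes any difference obvious at a glance? This is where Norikra comes in.
  25. In short, it is an OSS product that provides schema-less streaming analysis with SQL. Incidentally, the name Norikra apparently comes from Mount Norikura in Japan's Hida Mountains (the Northern Alps).
  26. ▲ Internally, Norikra uses Esper, which EsperTech Inc. publishes as OSS, so it embeds a CEP (Complex Event Processing) engine. It extracts only a fixed period of the incoming logs → analyzes them → discards them → loops. Picture a shishi-odoshi (a bamboo water fountain that fills, tips, and empties). ▲ SQL streaming analysis: conditions are written as SQL statements, so anyone strong in SQL can do almost anything.
  27. Fluentd is installed on the servers where OpenStack runs and collects the logs. The collected logs are forwarded to Fluentd on the monitoring server, which forwards them to Elasticsearch, Norikra, and a file. Elasticsearch is for log visualization. Norikra is for log analysis; after analysis it notifies Zabbix. The file is kept just in case.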
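  28. Anyone can set up Norikra itself, but whether it is useful is proportional to how well the Norikra rules are developed. Creating these rules requires a large amount of data on which to base them, such as "log A + log B = event C", "this log appearing this many times is abnormal", or "a log that is harmless alone indicates an anomaly in combination with others". Simply building the system does not make it immediately practical. This is important, so I will say it twice: how you create the rules is what matters. From our experience since the Essex release we hold an error log list, noting which errors can be ignored and which must never be ignored. We build the rules on that basis.
  29. Let me introduce EFK+NZ with concrete examples: detecting variation in a graph displayed in Kibana, and detecting a patterned failure.
  30. Keystone has no means of detecting a DoS attack, so we make this detectable. The graph in Kibana changes under attack, so let's implement "notify when the graph changes".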
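  32. Write a Norikra rule saying "if accesses resulting in a 401 error occur ○ or more times from the same IP address per unit of time". Next, we request tokens from Keystone with fake user names. Imagine a dictionary attack or a brute-force attack. That matches the Norikra rule, so a notification is sent to Zabbix.
  33. The notification goes to Zabbix. Server monitoring is also viewed in Zabbix, so log monitoring can be watched on the same screen. Operators mainly need to watch this screen.
  34. When log A and log B appear around the same time, the system notifies you that it is this failure pattern. When analyzing manually, you probably look through many log files, making heavy use of less and grep. That requires a certain level of skill, and assigning a highly skilled person to analyze simple errors makes no sense. So we automate the analysis and notification.
  35. Launch a VM with a flavor that uses a large amount of memory.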
  36. "No valid host was found. There are not enough hosts available." Eh? What went wrong?
  37. Even the details do not state the cause. You could tell from the logs, but users cannot see the logs, so they are stuck.
  38. Nova boot error type 2. If type 2 is defined as "no host satisfied the flavor's memory requirement", the analysis effort is reduced. If people who know the OpenStack logs well, as we do, create this rule, users and less-skilled operators can learn the cause without reading log files.
  39. We will demo the log monitoring system at the NTT booth on the 29th at ○ o'clock. If you are interested, please come to the NTT booth in the Marketplace.
  40. License terms. Elasticsearch, logstash, Kibana: https://www.elastic.co/legal/trademarks OpenStack: http://www.openstack.org/brand/openstack-trademark-policy/ Zabbix: http://www.zabbix.com/jp/policy.php