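2. What is monit?
monit is a tool that monitors processes and can automatically perform maintenance and recovery when an error occurs (http://mmonit.com/monit/). For example, it can watch a process and restart it when a problem is detected.
Besides processes, it can also monitor files, directories, and filesystems for changes.
Monit uses its own DSL (Domain Specific Language). The DSL is very easy to read, which is a plus.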
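To give a feel for the DSL, here is a minimal sketch of a process check and a filesystem check. The service names, pidfile path, init script paths, and thresholds are assumptions chosen for illustration and are not taken from the slides.

```
# Hypothetical example: watch an Apache process and restart it when HTTP stops answering
check process apache with pidfile /var/run/httpd.pid
  start program = "/etc/init.d/httpd start"
  stop program = "/etc/init.d/httpd stop"
  if failed host 127.0.0.1 port 80 protocol http then restart
  if cpu > 80% for 5 cycles then alert

# Hypothetical example: monitoring is not limited to processes
check filesystem rootfs with path /
  if space usage > 90% then alert
```

Each rule reads almost like a sentence: a condition, then the action Monit should take.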
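4. Configuration
set daemon <seconds>: monitoring interval
set logfile <Path>: path of the log file
set logfile syslog facility <facility>: send logs to syslog with the given facility
set mailserver <host>: mail server used for notifications
set alert <address>: notify by mail when any event occurs
set alert <address> (with an event list): notify only on the listed events
set alert <address> (with an excluded event list): notify on everything except the listed events
include <Path>: include another configuration file
The configuration file is /etc/monit.conf.
It seems best to put global settings in /etc/monit.conf and to write the monitoring and control rules for each process or other service in separate files under /etc/monit.d/.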
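As a sketch of that split, a per-service file under /etc/monit.d/ might look like the following. The file name, pidfile, and init script paths are hypothetical and depend on the distribution.

```
# /etc/monit.d/sshd.conf (hypothetical file name)
check process sshd with pidfile /var/run/sshd.pid
  start program = "/etc/init.d/sshd start"
  stop program = "/etc/init.d/sshd stop"
  # Restart sshd if it stops answering on port 22
  if failed port 22 protocol ssh then restart
  # Give up after repeated restarts so Monit does not loop forever
  if 5 restarts within 5 cycles then timeout
```

Keeping one service per file means a check can be added or removed without touching the global settings.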
5. Example) Global config
Put the settings common to all checks in /etc/monit.conf.
## Set the monitoring interval to 1 minute
set daemon 60
## Send logs to syslog
set logfile syslog facility log_daemon
## Specify the primary and secondary mail servers
set mailserver mail1.ex.com, mail2.ex.com
## Format of the alert mail
set mail-format {
from: monit@ex.com
subject: $HOST : $SERVICE - $EVENT
message: Monit
ACTION : $ACTION
SERVICE : $SERVICE
at $DATE on $HOST.
DESCRIPTION : $DESCRIPTION
}
set alert server_alert@ex.com
include /etc/monit.d/*.conf
6. Appendix) Alert Messages
Monit raises an alert in the following situations:
A service does not exist (e.g. process is not running)
Cannot read service data (e.g. cannot get filesystem usage)
Execution of service related scripts failed (e.g. start failed)
Invalid service type (e.g. if path points to directory instead of file)
Custom test script returned error
Ping test failed
TCP/UDP connection and/or port test failed
Resource usage test failed (e.g. cpu usage too high)
Checksum mismatch or change (e.g. file changed)
File size test failed (e.g. file too large)
Timestamp test failed (e.g. file is older than expected)
Permission test failed (e.g. file mode doesn't match)
A UID test failed (e.g. file owned by different user)
A GID test failed (e.g. file owned by different group)
A process' PID changed out of Monit control
A process' PPID changed out of Monit control
Too many service recovery attempts failed
A file content matched the pattern
Filesystem flags changed
A service action was performed by administrator
Monit was started, stopped or reloaded
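Several of the situations above correspond directly to tests written in a check. The sketch below shows a few of them; the file path, host name, address, and thresholds are hypothetical values for illustration only.

```
# Hypothetical file check: checksum, permission, UID, GID, size and timestamp tests
check file httpd_bin with path /usr/sbin/httpd
  if failed checksum then alert
  if failed permission 755 then alert
  if failed uid root then alert
  if failed gid root then alert
  if size > 10 MB then alert
  if timestamp > 15 minutes then alert

# Hypothetical host check: ping test and TCP port test
check host web1 with address 192.0.2.10
  if failed icmp type echo count 3 with timeout 3 seconds then alert
  if failed port 80 protocol http then alert
```

When one of these tests fails, Monit sends the alert to the addresses configured with set alert.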
7. Appendix) List of possible event types
Event      | Failure State            | Success State
ACTION     | Action done              | Action done
CHECKSUM   | Checksum failed          | Checksum succeeded
CONNECTION | Connection failed        | Connection succeeded
DATA       | Data access error        | Data access succeeded
EXEC       | Execution failed         | Execution succeeded
FSFLAGS    | Filesystem flags failed  | Filesystem flags succeeded
UID/GID    | UID/GID failed           | UID/GID succeeded
ICMP       | ICMP failed              | ICMP succeeded
INSTANCE   | Monit instance changed   | Monit instance changed not
INVALID    | Invalid type             | Type succeeded
NONEXIST   | Does not exist           | Exists
PERMISSION | Permission failed        | Permission succeeded
PID/PPID   | PID/PPID failed          | PID/PPID succeeded
RESOURCE   | Resource limit matched   | Resource limit succeeded
SIZE       | Size failed              | Size succeeded
STATUS     | Status failed            | Status succeeded
TIMEOUT    | Timeout                  | Timeout recovery
TIMESTAMP  | Timestamp failed         | Timestamp succeeded
UPTIME     | Uptime failed            | Uptime succeeded
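These event names are what an alert filter refers to. The following is a sketch of restricting notifications by event type, assuming the lower-case event names and the "only on" / "but not on" wording described in the Monit manual. The addresses are hypothetical, and the exact filter syntax should be confirmed against the manual for the installed version.

```
# Hypothetical: notify the on-call address only for missing services, timeouts and resource limits
set alert oncall@ex.com only on { nonexist, timeout, resource }

# Hypothetical: notify the admin address for everything except Monit restarts and manual actions
set alert admin@ex.com but not on { instance, action }
```

Filtering out INSTANCE and ACTION is a common way to avoid mail every time Monit is reloaded or an administrator runs a command by hand.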