4. About Operation
• Now:
– SRE(Google)
Keep the site up
Work at a Large Scale
Balance competing demands
– PE (Taobao)
– DevOps(Facebook)
Move Fast, Monitor Close
– SDN
41. monit
set httpd port 2812 and
    use address localhost
    allow localhost
set daemon 60
set alert 'noreply@admin.com'
check system localhost
    if cpu usage (wait) > 5% for 2 cycles then alert
check process squid with pidfile '/var/run/squid.pid'
    start program = '/etc/init.d/squid start'
    stop program = '/etc/init.d/squid stop'
    if totalmem > 8192 MB then restart
check file cache_log with path /var/log/squid/cache.log
    if match "COSS: /data/stripe: Rebuild Completed"
        then exec "/usr/libexec/squid/online" every 10 cycles
57. notification_options
r = Recovery (state returns to normal)
f = Flapping (state changes back and forth)
s = Scheduled downtime (planned downtime starts and ends)
n = None (send no notifications)
d = Down (host state)
u = Unreachable (host) or Unknown (service)
w = Warning (service)
c = Critical (service)
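
To show where these letters are used, here is a minimal sketch of Nagios object definitions that set notification_options. The host name squid01, the address, the ops-team contact group, the generic-host and generic-service templates, and the check_tcp command with port 3128 are assumptions for illustration; they are not from the slides and must match objects defined in your own configuration.

```
# Host: notify on Down, Unreachable and Recovery.
define host {
    use                     generic-host        ; assumed template name
    host_name               squid01             ; hypothetical host
    address                 192.0.2.10          ; documentation-range address
    notification_options    d,u,r
    contact_groups          ops-team            ; hypothetical contact group
}

# Service: notify on Warning, Critical and Recovery; no Unknown or Flapping.
define service {
    use                     generic-service     ; assumed template name
    host_name               squid01
    service_description     Squid proxy port
    check_command           check_tcp!3128      ; assumes a check_tcp command taking the port as $ARG1$
    notification_options    w,c,r
    contact_groups          ops-team
}
```

Setting notification_options to n on either object suppresses its notifications entirely, so n is not normally combined with other letters.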