9. Before Datadog
• We used:
• Munin
• GrowthForecast
• CloudWatch
• We wanted centralized management!
10. After Datadog - Phase1
• OK, we can manage centrally
• But...?
• We respect our engineers' freedom to develop as they like!
• Problem: monitoring settings get left out
11. Phase2
• Introduce Interferon
• Datadog DSL
• Well, we can monitor all resources automatically
• But...?
• Not actively maintained!
• Can't freely mute monitors from the Web UI
• Lack of flexibility
12. Phase3
• Integrated itamae
• Our engineers were used to writing Chef recipes
• Easy to override default settings
• It's asynchronous, so we can freely mute monitors from the Web UI
• Integrated dogaws @takus
• Yet another Datadog CloudWatch integration
• We use it in combination with itamae
14. Datadog tips
• Easy anomaly detection
• Until quite recently, comparisons could not go back more than 24 hours
• We requested the ability to compare against longer periods.
Thanks to Datadog for implementing it!
• This is a closed (non-public) feature. If you want to use it, ask
Datadog support
15. For example
• Compare Kinesis records count EWMA
pct_change(median(last_1h),1w_ago):ewma_20(avg:aws.kinesis.incoming_records{env:production,cost:smartnews} by {name}) > 50
• Compare application warn log
change(median(last_1h),1w_ago):sum:app.log.warn{env:production} by {autoscaling_group} > 25
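
To make the arithmetic behind the two queries above concrete, here is a minimal Python sketch of a week-over-week comparison. It is an illustration under stated assumptions, not Datadog's implementation: the smoothing factor 2 / (span + 1) for a span of 20 is a common EWMA convention assumed here, the percentage formula (now - then) / then * 100 is assumed, and the sample record counts are made up.

```python
from statistics import median


def ewma(values, span=20):
    """Exponentially weighted moving average.

    Assumption: smoothing factor alpha = 2 / (span + 1), a common
    convention; Datadog's ewma_20 may differ in detail.
    """
    alpha = 2.0 / (span + 1)
    smoothed = []
    for value in values:
        if not smoothed:
            # Seed the average with the first observation.
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def change(current_window, past_window):
    """Absolute difference between the medians of two windows."""
    return median(current_window) - median(past_window)


def pct_change(current_window, past_window):
    """Percentage difference between the medians of two windows."""
    past = median(past_window)
    if past == 0:
        raise ValueError("past median is zero; percentage change is undefined")
    return (median(current_window) - past) / past * 100.0


# Hypothetical per-minute record counts: the same hour one week ago,
# and the most recent hour.
week_ago = [1000.0] * 60
last_hour = [1600.0] * 60

# Mirrors the shape of the first query: smooth, take the median of each
# one-hour window, then alert when the percentage change exceeds 50.
alert = pct_change(ewma(last_hour), ewma(week_ago)) > 50
```

The second query has the same shape but uses the absolute difference (`change`) with a threshold of 25 and no smoothing.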
17. We're hiring!
Only two people on the Site Reliability Engineering team!
• SmartNews is hiring Site Reliability Engineers!
• http://about.smartnews.com/en/careers/