This is my presentation from LISA 2014 in Seattle on November 14, 2014.
Most IT Ops teams only keep an eye on a small fraction of the metrics they collect because analyzing this haystack of data and extracting signal from the noise is not easy and generates too many false positives.
In this talk I will show some of the types of anomalies commonly found in dynamic data center environments and discuss the top 5 things I learned while building algorithms to find them. You will see how various Gaussian based techniques work (and why they don’t!), and we will go into some non-parametric methods that you can use to great advantage.