1. Anomaly Detection for Information Systems
CSML-001: Machine Learning
Olga Tatarintseva
otatarintseva@satelliz.com, olitatarintseva@gmail.com
Support Team Manager at Satelliz
2. Introduction
➢ The main idea of the company:
○ providing good quality of service for the client
➢ Satelliz provides monitoring as a service for servers, applications, and websites, in real time and in
great detail
➢ An incident occurs when a sensor metric checked by the Satelliz agent falls outside the defined
range:
○ the default value, or
○ a value defined by the engineer, relying on their knowledge and experience
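The incident rule above can be sketched as a simple range check. This is a minimal illustration, not the Satelliz agent's code: `SensorRange`, `DEFAULT_RANGES`, `is_incident`, and the limit values are all hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorRange:
    """Allowed range for one sensor metric (hypothetical structure)."""
    low: float
    high: float


# Hypothetical default limits; the real defaults are not given in the slides.
DEFAULT_RANGES = {"load1": SensorRange(low=0.0, high=5.0)}


def is_incident(metric, value, overrides=None):
    """Return True when the value lies outside the range in effect.

    `overrides` holds ranges set manually by an engineer; they take
    precedence over the defaults for the same metric.
    """
    ranges = {**DEFAULT_RANGES, **(overrides or {})}
    limits = ranges[metric]
    return not (limits.low <= value <= limits.high)
```

With these assumed limits, `is_incident("load1", 7.5)` reports an incident, while the same value passes once an engineer widens the range with `{"load1": SensorRange(0.0, 10.0)}`.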
3. How to Choose the Sensor Settings?
➢ Are the default limits or manual settings for all
metrics the best approach?
➢ Is there another way to find out that the
behaviour of the system needs the attention of
the engineer?
➢ When a real issue is only about to arise:
○ can we detect it?
○ can we notify the engineer about it?
4. Project Idea
The engineers don’t receive notifications for:
─ false-positive issues
─ services impacted by routine operations (backup, maintenance, etc.)
The engineers receive a notification when:
⊹ the behaviour of the system doesn’t look normal
⊹ an issue is only about to appear
⊹ a metric changes to an unusually high or low value
⊹ the dynamics of several interdependent metrics changing together are not normal
5. Project Goal
To increase the productivity of the monitoring process by teaching the monitoring system
the usual behaviour of the system:
➢ which values of sensor metrics are normal for the given machine
➢ how several metrics interact with each other
The monitoring system should be taught:
➢ to understand when the system works normally
➢ to understand whether an issue exists or is about to arise
➢ to determine the best Warning or Critical threshold for the metric
➢ to notify the engineer about the detected anomaly
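One possible way to derive Warning and Critical thresholds from observed behaviour is to place them a few standard deviations above the historical mean. This is only a sketch under that assumption: the slides do not say how thresholds are chosen, and `suggest_thresholds` and the multipliers 2 and 3 are hypothetical.

```python
import statistics


def suggest_thresholds(history, warning_sigmas=2.0, critical_sigmas=3.0):
    """Suggest upper thresholds from past values of a single metric.

    ASSUMPTION: the metric is roughly normally distributed, so
    mean + k * sigma is a reasonable "unusually high" boundary.
    """
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)  # population standard deviation
    return {
        "warning": mean + warning_sigmas * spread,
        "critical": mean + critical_sigmas * spread,
    }
```

The multipliers trade sensitivity against false positives: smaller values notify earlier but more often.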
6. Project Result
Problem type: unsupervised learning
Approach:
➢ anomaly detection
Model:
➢ Gaussian mixture model
Method:
➢ expectation-maximization algorithm
Input data: perf data for one of the application servers
○ Sensor: CPU load, Perf metric: load1
○ Sensor: Memory used, Perf metric: ram_used
○ Distribution: Normal
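The approach above (a Gaussian mixture model fitted with expectation-maximization on two metrics) can be sketched in plain Python. This is an illustration, not the project's implementation: it assumes diagonal covariances for brevity, fixes the number of components at two, and runs on synthetic `(load1, ram_used)` samples. The names `fit_gmm`, `diag_gaussian_pdf`, and `make_synthetic_perf_data` are hypothetical.

```python
import math
import random


def diag_gaussian_pdf(x, mean, var):
    """Density of a Gaussian with diagonal covariance at point x."""
    density = 1.0
    for xi, mi, vi in zip(x, mean, var):
        density *= math.exp(-0.5 * (xi - mi) ** 2 / vi) / math.sqrt(2 * math.pi * vi)
    return density


def fit_gmm(points, tol=1e-6, max_iter=200):
    """Fit a 2-component mixture with EM; returns (weights, means, variances)."""
    n, dim = len(points), len(points[0])
    # Start from the points with the smallest and largest first coordinate.
    means = [list(min(points)), list(max(points))]
    centre = [sum(p[d] for p in points) / n for d in range(dim)]
    spread = [sum((p[d] - centre[d]) ** 2 for p in points) / n for d in range(dim)]
    variances = [spread[:], spread[:]]
    weights = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(max_iter):
        # E-step: responsibility of each component for each point.
        resp, ll = [], 0.0
        for p in points:
            dens = [w * diag_gaussian_pdf(p, m, v)
                    for w, m, v in zip(weights, means, variances)]
            total = max(sum(dens), 1e-300)  # guard against underflow
            ll += math.log(total)
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means and variances.
        for j in range(2):
            nj = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nj / n
            means[j] = [sum(r[j] * p[d] for r, p in zip(resp, points)) / nj
                        for d in range(dim)]
            variances[j] = [
                max(sum(r[j] * (p[d] - means[j][d]) ** 2
                        for r, p in zip(resp, points)) / nj, 1e-6)
                for d in range(dim)]
        # Stop when the log-likelihood no longer improves noticeably.
        if abs(ll - prev_ll) < tol:
            break
        prev_ll = ll
    return weights, means, variances


def make_synthetic_perf_data(seed=42):
    """Hypothetical (load1, ram_used) samples; NOT the data from the slides."""
    rng = random.Random(seed)
    usual = [(rng.gauss(2.0, 0.3), rng.gauss(390.0, 15.0)) for _ in range(465)]
    busy = [(rng.gauss(3.8, 0.4), rng.gauss(530.0, 20.0)) for _ in range(35)]
    return usual + busy
```

Calling `fit_gmm(make_synthetic_perf_data())` returns one heavy component for the usual load and one light component for the busier regime. A full-covariance model would also capture how the two metrics vary together, at the cost of more code.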
7. Project Result
Number of Gaussians: 2
Convergence achieved for: 7.44570943755e-07
Gaussian 1 center: (2.02114639, 388.55796395)
Weight of Gaussian 1: 0.93117592
Gaussian 2 center: (3.80960681, 528.0761025)
Weight of Gaussian 2: 0.06882408
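A fitted mixture like the one above could be used to flag a new observation whose density under the model is very low. The sketch below uses the reported centers and weights, but the slides do not report the spread of each Gaussian, so `STD_DEVS` and the cut-off `log_density_floor` are placeholder assumptions that exist only to make the example runnable.

```python
import math
import sys

# Centers (load1, ram_used) and weights as reported on the slide.
CENTERS = [(2.02114639, 388.55796395), (3.80960681, 528.0761025)]
WEIGHTS = [0.93117592, 0.06882408]
# ASSUMPTION: per-axis standard deviations are NOT given in the slides;
# these are placeholder values, not results of the project.
STD_DEVS = [(0.5, 30.0), (0.5, 30.0)]


def mixture_log_density(point):
    """Log of the weighted sum of axis-aligned Gaussian densities."""
    density = 0.0
    for weight, center, stds in zip(WEIGHTS, CENTERS, STD_DEVS):
        component = weight
        for x, mu, sigma in zip(point, center, stds):
            z = (x - mu) / sigma
            component *= math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))
        density += component
    # Clamp to the smallest positive float so log() never sees zero.
    return math.log(max(density, sys.float_info.min))


def is_anomalous(point, log_density_floor=-15.0):
    """Flag a point whose log-density falls below a hypothetical floor."""
    return mixture_log_density(point) < log_density_floor
```

Under these assumed spreads, observations near either center are treated as normal, while a point such as `(10.0, 900.0)` falls far below the floor and would trigger a notification.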