A DevOps Tutorial to Set-up Intelligent Machine Learning Driven Alerts

DevOps tutorial:
How to set up intelligent
machine learning alerts
Sébastien Léger, founder, Loud ML,
loudml.io
16 January 2019

Sébastien Léger
founder and CEO, Loud ML
loudml.io
In this session, you will learn about:
• the TICK stack;
• Loud ML, a popular extension for live
anomaly detection in time-series data;
• how to set up alerting via Kapacitor and
integrate with PagerDuty and other
alerting tools;
• Donut: the neural net architecture in Loud ML;
• applicable use-cases.

10K+ cumulative downloads in 2018,
lots of feedback, and more!
Joined the Rockstart AI accelerator program
in November 2018.
Thank You!

Agenda
• User stories, DevOps and
IoT
• Typical requirements for
alerting and ML
• Data pipeline
• Live streaming data demo
• Low false positives
(ie, low noise)
• Neural nets, deep-learning,
and Donut
• Loud ML 1.5 – Join the beta
• FAQ

User story 1: uptime
• My e-commerce site is global.
• Different users from different countries connect every day and
make purchases.
• Planned updates, or DevOps (CI/CD).
• Will it break anything, or cause downtime?
• How to spot whether the number of transactions is correct, or how much
time is spent in the conversion funnel?

User story 2: security
• My e-commerce site is global.
• Different users from different countries connect every day and
make purchases.
• Different volumes from different sources.

User story 3: utilization
• The load changes during the day, during the night, and
during the weekend.
• Cloud or private DC resource utilization versus costs.

User story 4: Internet transit
• Running data center operations across multiple regions.
• Dynamic changes in traffic volume at the network edges.
• Get the right capacity, for the right cost (price per Mb/s).

User story 5: IoT
• PV (photovoltaic): voltages, internal temperature, charge cycles, quantity
of electricity produced.
• Remote maintenance: spot damaged batteries.
• Physical infrastructure.
• Patterns in structural frequencies.
• Remote maintenance: spot when significant changes occur.
• Digital clone, industrial IoT (IIoT).
• Normal versus abnormal.

Typical requirements for
alerting and ML
Outliers, and then...

Typical requirements for alerting and ML
Performance
• Near real time, low alert delay.
• Running at scale, 24/7,
10,000+ users, hosts,
applications, devices.
• Low false positives; ie, low noise.
• Developer friendly: fast to
validate and deploy.
Functionality
• Can understand seasonality in the
data; eg, weekend vs daily
patterns, or across regions.
• Can learn and reinforce
continuously using live data.
• Can understand business rules.
• Works with third-party integrations.
Applicable to logs, metrics, events: page views, clicks, online users, orders,
response times, active IPs, syslogs, temperature, acceleration data & more.

Live TICK-L demo
Telegraf + InfluxDB + Chronograf + Kapacitor + Loud ML (AutoML)

Useful resources
• Website: loudml.io
• Blog: medium.com/loud-ml
• GitHub: github.com/regel/loudml

Data pipeline
• Metrics and logs collection (T)
• Pre-processing, feature engineering (K)
• Database storage (I)
• Machine Learning (L)
• Automation and alerting (K)
• DataViz (C)
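
To make the hand-off from metrics collection to database storage concrete, here is a minimal sketch of how one metric point can be serialized in InfluxDB line protocol, the text format InfluxDB accepts for writes. The function name `to_line_protocol`, the measurement, tag, and field names, and the values are all hypothetical, and the sketch is simplified: it handles float fields only.

```python
def to_line_protocol(measurement, tags, fields, timestamp_ns):
    """Serialize one point as: measurement,tag=value field=value timestamp.

    Simplified sketch: every field is written as a float.
    """
    def esc(text, chars):
        # Line protocol escapes special characters with a backslash.
        for ch in chars:
            text = text.replace(ch, "\\" + ch)
        return text

    head = esc(measurement, ", ")
    tag_part = "".join(
        "," + esc(key, ",= ") + "=" + esc(value, ",= ")
        for key, value in sorted(tags.items())
    )
    field_part = ",".join(
        esc(key, ",= ") + "=" + str(float(value))
        for key, value in fields.items()
    )
    return f"{head}{tag_part} {field_part} {timestamp_ns}"


# Hypothetical point: CPU idle time reported by one host.
line = to_line_protocol(
    "cpu", {"host": "web01"}, {"usage_idle": 93.5}, 1547596800000000000
)
print(line)
```

In the stack shown in this deck, Telegraf produces such points itself; the sketch only illustrates what travels between the collection and storage stages.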

Low false positives ie, low noise
How to evaluate ML fitness in a given application
Credits: dataschool.io
Precision P=TP/(TP+FP)
Recall R=TP/(TP+FN)
F1-score F1=2/(1/P+1/R)
(TP: true positives; FP: false positives; FN: false negatives)
[Figure labels: Recall, Precision]
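
The three formulas above can be checked on a small example. This is a minimal sketch; the function name `precision_recall_f1` and the alert counts are hypothetical, and returning 0.0 when a denominator is zero is an assumption made here, not something stated in the slides.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) computed from alert counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 or recall == 0.0:
        # Harmonic mean is undefined here; report 0.0 by convention (assumption).
        return precision, recall, 0.0
    f1 = 2.0 / (1.0 / precision + 1.0 / recall)
    return precision, recall, f1


# Hypothetical week of alerts: 8 real incidents caught (TP),
# 2 false alarms (FP), 2 incidents missed (FN).
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Low noise corresponds to high precision: each false alarm adds to FP and lowers P, while each missed incident adds to FN and lowers R.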

Donut
arXiv:1802.03903
April 23-27, 2018, Lyon

Donut neural nets
[Diagram labels: Baseline; Reconstruction probability; Encode, Sampling, Decode]
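
The Encode, Sampling, Decode steps and the reconstruction probability can be sketched in a few lines of NumPy. This is an illustration under loud assumptions, not the Donut or Loud ML implementation: the weights are random and untrained, the encoder and decoder are single linear maps with fixed standard deviations, and the window and latent sizes are arbitrary. It only shows how a Monte Carlo estimate of the reconstruction log-probability is formed and why a spike scores low.

```python
import numpy as np

rng = np.random.default_rng(0)
WINDOW, LATENT = 8, 2  # assumed sizes, for illustration only

# Hypothetical untrained weights standing in for trained networks.
W_enc = rng.normal(scale=0.1, size=(WINDOW, LATENT))
W_dec = rng.normal(scale=0.1, size=(LATENT, WINDOW))


def encode(x):
    """Encode: map a window x to the mean and std of q(z|x)."""
    return x @ W_enc, np.full(LATENT, 0.5)


def decode(z):
    """Decode: map latent samples z to the mean and std of p(x|z)."""
    return z @ W_dec, np.full(WINDOW, 1.0)


def gaussian_log_pdf(x, mu, sigma):
    return -0.5 * np.log(2 * np.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)


def reconstruction_log_prob(x, n_samples=64):
    """Monte Carlo estimate of E_q(z|x)[log p(x|z)] for each point of the window."""
    z_mu, z_sigma = encode(x)
    # Sampling: draw latent samples z ~ q(z|x).
    z = z_mu + z_sigma * rng.standard_normal((n_samples, LATENT))
    x_mu, x_sigma = decode(z)
    return gaussian_log_pdf(x, x_mu, x_sigma).mean(axis=0)


normal_window = np.zeros(WINDOW)
spike_window = normal_window.copy()
spike_window[-1] = 10.0  # inject an outlier at the latest point

score_normal = reconstruction_log_prob(normal_window)[-1]
score_spike = reconstruction_log_prob(spike_window)[-1]
print(score_normal, score_spike)
```

The lower the score of the latest point, the less likely that value is under the model, which is the quantity an alerting rule would threshold.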

Donut is cool
• Donut has interesting properties.
• Low false positives (ie, low noise).
• It is as good as it gets.
• F1-score = 0.7 to 0.9, from arXiv:1802.03903.
• It can understand seasonality in the data.
• It can learn from labels.

Loud ML 1.5 beta
• Donut, plus more:
• Near real time, low alert delay.
• Running at scale, 24/7, 10,000+ users, hosts, applications.
• Can learn and reinforce continuously using live data.
• Developer friendly: fast to deploy, runs on CPUs or GPUs.
• Why Loud ML
• Fast ML deployment for time series data:
• the goal is to remove all the hurdles in AI.
• Explainable: gives the probability (%) of observing specific values, easy to interpret!
• Agnostic of the underlying database.
• Accessible: the best ML, at a fraction of the cost.

Thank You
Interested in joining the beta?
loudml.io/contact
@loud_ml

Will the model continue to learn?

What are the options for
feature engineering?
