In this webinar, Sébastien Léger, founder and CEO of Loud ML, shows you how to use the Loud ML machine learning API with your InfluxDB instance to quickly detect anomalies in your time series data. Detected anomalies can trigger notifications in Slack or in incident management tools such as PagerDuty, OpsGenie, VictorOps, or Alerta.
This webinar is organized in four parts: basic setup running Docker; training your first time series model (no programming needed!); building intelligent triggers and notifications; and putting it all into practice as your solution detects abnormal data.
A DevOps Tutorial to Set Up Intelligent Machine Learning-Driven Alerts
1. DevOps tutorial:
How to set up intelligent
machine learning alerts
Sébastien Léger, founder, Loud ML,
loudml.io
16 January 2019
2. DevOps tutorial:
How to set up intelligent
machine learning alerts
Sébastien Léger
founder and CEO, Loud ML
loudml.io
In this session, you will learn about:
• the TICK stack;
• Loud ML, a popular extension for live
anomaly detection in time-series data;
• how to set up alerting via Kapacitor and
integrate with PagerDuty and other
alerting tools;
• Donut: the neural net architecture in Loud ML;
• applicable use-cases.
3. 10K+ cumulative downloads in 2018,
lots of feedback, and more!
Joined the Rockstart AI accelerator program
in November 2018.
Thank You!
4. Agenda
• User stories, DevOps and
IoT
• Typical requirements for
alerting and ML
• Data pipeline
• Live streaming data demo
• Low false positives
(ie, low noise)
• Neural nets, deep-learning,
and Donut
• Loud ML 1.5 – Join the beta
• FAQ
6. User story 1: uptime
• My e-commerce site is global.
• Different users from different countries connect every day and
make purchases.
• Planned updates, or DevOps (CI/CD).
• Will it break anything, or cause downtime?
• How to spot whether the number of transactions is correct, or how
much time is spent in the conversion funnel?
7. User story 2: security
• My e-commerce site is global.
• Different users from different countries connect every day and
make purchases.
• Different volumes from different sources.
8. User story 3: utilization
• The load changes during the day, during the night, and
during the weekend.
• Cloud or private DC resource utilization versus costs.
9. User story 4: Internet transit
• Running data center operations across multiple regions.
• Dynamic changes in traffic volume at the network edges.
• Get the right capacity, for the right cost (price per Mb/s).
10. User story 5: IoT
• Photovoltaics (PV): voltages, internal temperature, charge cycles,
quantity of electricity produced.
• Remote maintenance: spot damaged batteries.
• Physical infrastructure.
• Patterns in structural frequencies.
• Remote maintenance: spot when significant changes occur.
• Digital clone, industrial IoT (IIoT).
• Normal versus abnormal.
12. Typical requirements for alerting and ML
Performance
• Near real time, low alert delay.
• Running at scale, 24/7,
10,000+ users, hosts,
applications, devices.
• Low false positives; ie, low noise.
• Developer friendly: fast to
validate and deploy.
Functionality
• Can understand seasonality in the
data; eg, weekend vs daily
patterns, or across regions.
• Can learn and reinforce
continuously using live data.
• Can understand business rules.
• Works with third-party integrations.
Applicable to logs, metrics, events: page views, clicks, online users, orders,
response times, active IPs, syslogs, temperature, acceleration data & more.
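To make the seasonality requirement concrete, here is a minimal sketch of a seasonality-aware check. It is not Donut and not how Loud ML works: it is a plain hour-of-week baseline with a z-score threshold, shown only to illustrate why a weekend value should be judged against weekend history. The function names, the synthetic data, and the threshold of 3.0 are all assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean, pstdev


def hour_of_week(ts):
    """Map a timestamp to one of 168 weekly slots (Monday 00:00 is slot 0)."""
    return ts.weekday() * 24 + ts.hour


def fit_baseline(history):
    """Compute (mean, standard deviation) per hour-of-week slot.

    history is an iterable of (datetime, float) pairs.
    """
    buckets = defaultdict(list)
    for ts, value in history:
        buckets[hour_of_week(ts)].append(value)
    return {slot: (mean(vals), pstdev(vals)) for slot, vals in buckets.items()}


def is_anomaly(baseline, ts, value, threshold=3.0):
    """Flag a value that deviates strongly from its own weekly slot."""
    slot = hour_of_week(ts)
    if slot not in baseline:
        return False  # no history for this slot, so no verdict
    mu, sigma = baseline[slot]
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > threshold


# Synthetic history (hypothetical): four weeks of hourly values,
# about 100 on weekdays and about 20 on weekends.
start = datetime(2019, 1, 7)  # a Monday
history = []
for h in range(4 * 7 * 24):
    ts = start + timedelta(hours=h)
    base = 20.0 if ts.weekday() >= 5 else 100.0
    jitter = (h // 168) % 3 - 1  # small week-to-week variation
    history.append((ts, base + jitter))

baseline = fit_baseline(history)

# The same value, 100, is normal on a Monday but abnormal on a Saturday.
print(is_anomaly(baseline, datetime(2019, 2, 4, 12), 100.0))  # Monday
print(is_anomaly(baseline, datetime(2019, 2, 9, 12), 100.0))  # Saturday
```

A single global threshold would either miss the Saturday spike or fire on every weekday; judging each value against its own weekly slot avoids both.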
17. Low false positives (ie, low noise)
How to evaluate ML fitness in a given application
18. Low false positives (ie, low noise)
How to evaluate ML fitness in a given application
Credits: dataschool.io
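Precision: P = TP / (TP + FP)
Recall: R = TP / (TP + FN)
F1-score: F1 = 2 / (1/P + 1/R)
where TP, FP, and FN are the numbers of true positives, false positives, and false negatives.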
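The formulas on this slide can be checked with a few lines of Python. This is a minimal sketch: the function name and the example counts are hypothetical, and returning 0.0 when a denominator is zero is a convention chosen for this example, not something the slide specifies.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) from alert counts.

    tp: anomalies that were correctly alerted on
    fp: alerts raised on normal data (noise)
    fn: anomalies that raised no alert
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision == 0.0 or recall == 0.0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall, as on the slide.
    f1 = 2 / (1 / precision + 1 / recall)
    return precision, recall, f1


# Hypothetical evaluation: 80 real incidents caught, 20 noisy alerts,
# 20 incidents missed.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=20)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Low noise corresponds to high precision (few false positives), while catching every incident corresponds to high recall; F1 is high only when both are.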
22. Donut is cool
• Donut has interesting properties.
• Low false positives (ie, low noise).
• It is as good as it gets.
• F1-score = 0.7 to 0.9, from arXiv:1802.03903.
• It can understand seasonality in the data.
• It can learn from labels.
24. Loud ML 1.5 beta
• Donut, plus more:
• Near real time, low alert delay.
• Running at scale, 24/7, 10,000+ users, hosts, applications.
• Can learn and reinforce continuously using live data.
• Developer friendly: fast to deploy, runs on CPUs or GPUs.
• Why Loud ML
• Fast ML deployment for time series data:
• the goal is to remove all the hurdles in AI.
• Explainable: gives the probability (%) of observing specific values, easy to interpret!
• Agnostic of the underlying database.
• Accessible: the best ML, at a fraction of the cost.