How to Improve Data Labels and Feedback Loops Through High-Frequency Sensor Anomaly Detection by Using InfluxDB

How to Improve Data Labels and Feedback Loops
in time-series using InfluxDB

Julien Muller
AI expert
Ex-IBM
Big Data architect
https://www.linkedin.com/in/mullerjulien/
CTO at Ezako
Creator of Upalgo

We are Ezako
Based in Paris and in Sophia-Antipolis on the
French Riviera.
Startup specialized in AI and time-series data.
Expertise in Machine Learning.
Creator of Upalgo.
Aerospace, Automotive, Telecom.
Sensor, telemetric and IoT data.
Ezako offices in Sophia-Antipolis

Why Upalgo?
Upalgo is a time series management suite for anomaly detection and labeling.
Time series & Machine Learning:
- Large datasets
- Temporality matters
- We don’t know the ground truth

InfluxDB and Ezako
Using InfluxDB since 2016.
InfluxDB is the 4th system we have used to store TS data (relational database, NoSQL, Hadoop ...).
Our issues were:
- Big data (sampling) & high frequency
- Slow access
- Need for specific elements in the engine: windows & features
- Need for a community to get answers (this is a very specific field)
Why did we choose InfluxDB?
- Storage adapted to TS data
- Better performance
- Native nanosecond handling
- No schema
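
To illustrate the nanosecond handling and schemaless storage listed above, here is a minimal sketch that formats high-frequency samples as InfluxDB line protocol. The measurement name `vibration`, the tag key `sensor`, the field key `value`, and the helper `to_line_protocol` are assumptions made for this example, not names taken from Upalgo. It also assumes names contain no spaces or commas (which would need escaping) and that the sample rate divides one second evenly.

```python
from typing import Iterable, List

NS_PER_SECOND = 1_000_000_000


def to_line_protocol(
    measurement: str,
    sensor_id: str,
    start_ns: int,
    sample_rate_hz: int,
    values: Iterable[float],
) -> List[str]:
    """Format evenly sampled readings as InfluxDB line protocol.

    Timestamps are integer nanoseconds since the Unix epoch, so a 10 kHz
    sensor (one sample every 100,000 ns) keeps its full resolution.
    """
    step_ns = NS_PER_SECOND // sample_rate_hz
    lines = []
    for i, value in enumerate(values):
        timestamp_ns = start_ns + i * step_ns
        # Layout: <measurement>,<tag_key>=<tag_value> <field_key>=<field_value> <timestamp>
        lines.append(
            f"{measurement},sensor={sensor_id} value={float(value)!r} {timestamp_ns}"
        )
    return lines


# Hypothetical 10 kHz sensor: three consecutive samples.
lines = to_line_protocol(
    "vibration", "s1", 1_600_000_000_000_000_000, 10_000, [0.5, 0.75, 1.0]
)
for line in lines:
    print(line)
```

No schema has to be declared beforehand: the measurement, tag, and field are created by the first write that mentions them.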

Upalgo architecture
Our data challenges:
- Continuous writes
- Intensive reads during learning phases
The architectural solution:
- InfluxDB

Machine Learning with InfluxDB
Machine Learning is challenging because:
- Continuous data inserts (sensors often sample between 1 kHz and 50 kHz)
- Intensive metadata / feature calculations
- Learning on huge datasets
- Fast detection on small datasets
- You don’t know the ground truth
InfluxDB helps address these challenges.

An Anomaly Detection workflow
Anomaly detection in time series is hard because two users won’t share the same definition of an anomaly.
A solid workflow is essential for good anomaly detection:
➔ insert data
➔ calculate features
➔ understand your data
➔ learn a model
➔ detect
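
The "calculate features, learn a model, detect" steps above can be sketched as follows. This is a deliberately simple stand-in (per-window mean and standard deviation, with a z-score threshold), not the algorithm used in Upalgo. The function names, the window size, and the threshold of 3 sigmas are all assumptions chosen for illustration.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple

Features = Tuple[float, float]  # (window mean, window standard deviation)


def window_features(values: Sequence[float], size: int) -> List[Features]:
    """Split the series into non-overlapping windows; return (mean, std) per window."""
    return [
        (mean(values[i:i + size]), pstdev(values[i:i + size]))
        for i in range(0, len(values) - size + 1, size)
    ]


def learn_baseline(features: List[Features]) -> List[Tuple[float, float]]:
    """Learn the mean and spread of each feature from windows assumed to be normal."""
    columns = list(zip(*features))
    # Guard against a zero spread so the z-score below never divides by zero.
    return [(mean(col), pstdev(col) or 1e-9) for col in columns]


def detect(features: List[Features], baseline, threshold: float = 3.0) -> List[int]:
    """Return indices of windows where any feature deviates by more than `threshold` sigmas."""
    return [
        idx
        for idx, feats in enumerate(features)
        if any(abs(f - mu) / sigma > threshold for f, (mu, sigma) in zip(feats, baseline))
    ]


# Hypothetical data: the training series is assumed to contain only normal behavior.
training = [0, 1, 0, 1, 0, 2, 0, 2, 0, 1, 0, 1, 0, 2, 0, 2]
incoming = [0, 1, 0, 1, 0, 9, 0, 9, 0, 2, 0, 2]

baseline = learn_baseline(window_features(training, size=4))
flagged = detect(window_features(incoming, size=4), baseline)
print(flagged)  # indices of the windows flagged as anomalous
```

Learning reads the whole training history once, while detection only needs the latest windows, which matches the "intensive reads at learning, fast detection on small datasets" pattern described earlier.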

InfluxDB as intermediary storage
Raw data must be stored (reference).
Adjusted data is useful.
➔ We store several calculated time series for each raw time series.
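
As a rough illustration of keeping calculated series next to the raw one, the sketch below derives a rolling mean and a first difference from a raw series. The series names (`raw`, `rolling_mean`, `diff`) and the helper `derive_series` are hypothetical; the talk does not say which calculated series Upalgo stores.

```python
from typing import Dict, List, Tuple

Point = Tuple[int, float]  # (timestamp in nanoseconds, value)


def derive_series(raw: List[Point], window: int = 3) -> Dict[str, List[Point]]:
    """Return the untouched raw series plus calculated series, keyed by name."""
    rolling_mean = []
    for i, (ts, _) in enumerate(raw):
        # Trailing window: the current point and up to `window - 1` points before it.
        chunk = [v for _, v in raw[max(0, i - window + 1): i + 1]]
        rolling_mean.append((ts, sum(chunk) / len(chunk)))
    # First difference: change from the previous sample, stamped at the later time.
    diff = [(raw[i][0], raw[i][1] - raw[i - 1][1]) for i in range(1, len(raw))]
    return {"raw": raw, "rolling_mean": rolling_mean, "diff": diff}


raw = [(0, 1.0), (100_000, 3.0), (200_000, 5.0), (300_000, 4.0)]
series = derive_series(raw, window=2)
for name, points in series.items():
    print(name, points)
```

Each derived series keeps the raw timestamps, so it can be written back under its own measurement or field and queried alongside the reference data.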

An Anomaly Detection workflow
[Workflow diagram. Stages: Raw Data, Data processing, Meta-data extraction, Feature calculation, Labeling, Label spreading, Learning, Validated model, Anomaly detection. InfluxDB sits in the workflow as storage, with three "Visualize" outputs.]

What is Labeling?
Labeling is the activity of attaching one or more labels to data to identify certain of its properties or characteristics.
Labeled data produces a considerable improvement in learning accuracy.
Labeling is a time-consuming process and a crucial part of training machine learning algorithms. Data scientists and experts spend most of their time on this repetitive task.

Challenge 1
How do you put 20 000 labels on 20 million data points in a few minutes?
1. User-friendly UI
2. Automatic label spreading with Machine Learning

Labeling is interesting because
➔ Experts want more information on their data
➔ Supervised Machine Learning needs labels
➔ Manual labeling is exhausting

Ergonomics can increase labeling speed 15-fold

AI-based label conflict management
All labels are checked for conflicts.
Benefit: fewer labeling errors.
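
One simple kind of conflict check can be sketched as follows: flag any two labels that overlap in time but carry different names. This is a plain rule-based illustration, not the AI-based mechanism the slide refers to. The `Label` type and `find_conflicts` function are hypothetical names introduced for this example.

```python
from typing import List, NamedTuple, Tuple


class Label(NamedTuple):
    start_ns: int  # inclusive
    end_ns: int    # exclusive
    name: str


def find_conflicts(labels: List[Label]) -> List[Tuple[Label, Label]]:
    """Return pairs of labels that overlap in time but carry different names."""
    ordered = sorted(labels, key=lambda label: label.start_ns)
    conflicts = []
    for i, first in enumerate(ordered):
        for second in ordered[i + 1:]:
            if second.start_ns >= first.end_ns:
                break  # sorted by start: nothing later can overlap `first`
            if second.name != first.name:
                conflicts.append((first, second))
    return conflicts


labels = [
    Label(0, 100, "normal"),
    Label(50, 150, "anomaly"),   # overlaps the "normal" label above
    Label(120, 140, "anomaly"),  # overlaps only a label with the same name
    Label(200, 300, "anomaly"),
]
conflicts = find_conflicts(labels)
for first, second in conflicts:
    print(f"{first.name} [{first.start_ns}, {first.end_ns}) vs "
          f"{second.name} [{second.start_ns}, {second.end_ns})")
```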

UI-based labeling and tag management
Always visible and accessible one-click labeling.
Confirming and discarding the label proposals.
Tag management: tags and labels.

Label propagation can increase labeling speed 15-fold
The idea is to label the entire dataset with AI-based automatic label propagation.
Benefit: much faster labeling.

Label propagation
Propagated labels ready
to be confirmed.
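
A minimal sketch of the idea, assuming each window of the series is already described by a feature vector: every unlabeled window is offered the label of its nearest labeled window, provided the two are close enough. Nearest-neighbor matching is a stand-in chosen for brevity; the talk does not specify the propagation algorithm. The function name and the `max_distance` parameter are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def propagate_labels(
    features: List[Sequence[float]],
    labeled: Dict[int, str],
    max_distance: float,
) -> Dict[int, str]:
    """Propose a label for each unlabeled window from its nearest labeled window.

    `labeled` maps window index -> label set by the expert. The result holds
    proposals only; they still have to be confirmed or discarded by a user.
    """
    proposals: Dict[int, str] = {}
    if not labeled:
        return proposals
    for idx, feats in enumerate(features):
        if idx in labeled:
            continue
        nearest = min(labeled, key=lambda j: math.dist(feats, features[j]))
        # Windows too far from every labeled example get no proposal.
        if math.dist(feats, features[nearest]) <= max_distance:
            proposals[idx] = labeled[nearest]
    return proposals


# Hypothetical (mean, std) features for five windows; two were labeled by hand.
features = [(0.5, 0.5), (0.6, 0.4), (4.5, 4.5), (4.4, 4.6), (2.5, 2.5)]
labeled = {0: "normal", 2: "anomaly"}

proposals = propagate_labels(features, labeled, max_distance=0.5)
print(proposals)
```

Keeping the output as proposals rather than final labels mirrors the confirm-or-discard step shown in the UI.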

Challenge 2
Create an Anomaly Detection workflow based on a user feedback loop.
Optimize algorithm performance through user feedback.

Feedback loops are interesting because
➔ Continuous relearn
➔ Read challenges on big data sets
➔ UI complexity

Importance of UI in feedback loops
an anomaly

A scoring system to optimize the model configuration
Use a scoring system in order to
optimize the algorithm and feature
choices.
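
One possible scoring scheme, sketched under assumptions: each candidate configuration is scored by the F1 of its detections against the anomalies users have confirmed, and the best-scoring configuration is kept. The talk does not say which score Upalgo uses; F1, the configuration names, and both function names are illustrative choices.

```python
from typing import Dict, Set


def f1_score(detected: Set[int], confirmed: Set[int]) -> float:
    """F1 of detected window indices against user-confirmed anomalies."""
    true_positives = len(detected & confirmed)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(detected)
    recall = true_positives / len(confirmed)
    return 2 * precision * recall / (precision + recall)


def best_configuration(
    detections_by_config: Dict[str, Set[int]], confirmed: Set[int]
) -> str:
    """Return the name of the configuration with the highest F1."""
    return max(
        detections_by_config,
        key=lambda name: f1_score(detections_by_config[name], confirmed),
    )


# Hypothetical feedback: users confirmed anomalies in windows 3, 7 and 9.
confirmed = {3, 7, 9}
detections_by_config = {
    "threshold=2.0": {1, 3, 5, 7, 9, 11},  # finds everything, many false alarms
    "threshold=3.0": {3, 7, 9, 11},        # finds everything, one false alarm
    "threshold=4.0": {3},                  # no false alarms, misses two
}

best = best_configuration(detections_by_config, confirmed)
print(best)
```

Re-running this selection whenever new feedback arrives is one way to close the loop between users and the model configuration.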

To sum up
Time-series labeling and feedback management are complex and difficult.
The solution is to:
- adopt a TS database such as InfluxDB
- create a user-friendly UI
- apply propagation tools to speed things up
- implement an efficient workflow
Our experience with InfluxDB:
- pretty smooth
- plug and forget mentality

Migrating to InfluxDB 2.0?
➔ InfluxDB IOx
➔ InfluxDB query language: Flux -> new functions ...

Julien Muller
julien.muller@ezako.com
+33 6 65 06 64 66
www.ezako.com
