Detecting time series anomalies on-the-fly with WarpScript™
Jean-Charles Vialatte, Machine Learning Engineer
Agenda
I. Presentation
A. Time Series data
B. Warp 10 and WarpScript
C. Anomaly Detection
II. Detecting Anomalies
A. Using simple threshold techniques
B. Using statistical methods
C. Using forecast models
III. Seasonality Analysis
A. Detecting seasonality
B. Seasonal anomaly detection
C. Multiple seasonalities
IV. Conclusion
Presentation
What are Time Series?
Domains and use cases
Why it’s not easy to build a TSDB
Storage
● Scalability
● Ingestion / Fetch performance
● Security, GDPR compliance
● Deployment (e.g. standalone vs edge vs distributed)
Analytics
● Simple and complex queries
● Concurrent access
● Interoperable with other programs / languages / libraries
● Parallelizable when storage is distributed
The Geo Time Series™ data model
Metadata
● classname: identifies a GTS
● labels: immutable (key1: value1, key2: value2, . . .)
● attributes: mutable (key1: value1, key2: value2, . . .)
Datapoints
● timestamps
● geostamps (optional)
● values: Long, Double, String, Bytes, Multi-values, nested GTS, . . .
Warp 10 Storage Engine
Geo Time Series™
Performance
Secured, GDPR
Scalable
Standard protocols and formats
Interoperability
Warp 10 Analytics Engine
● A library of 1000+ functions, from basic statistics to advanced signal processing and anomaly detection
● Executes the same code on a single server or on a distributed cluster
● Executable via HTTP, via Java, or via Python
● Independent of the storage engine: can be connected to any data source
● Concise syntax designed for data flows:
$data FUNC1 FUNC2 FUNC3 ...
Scope of Warp 10™
Things / Sensors
Data transmission
Data cleansing
Data synchronization
Analytics, ML Feature Engineering & Extraction
Data filtering
Data access control
Data storage
Business Applications and services
Business analytics
Data science
80% of effort
Advantages of Warp 10™
● Broader scope: from storage to analytics
● More complex queries and analytics
● Optional support for Geo
● Both storage and analytics are distributable
● Strongly interoperable with other tools
Get your hands on Warp 10™ in no time
https://sandbox.senx.io
WarpScript functions
WarpScript has many built-in anomaly detection functions:
● THRESHOLDTEST
● ZSCORETEST
● GRUBBSTEST
● ESDTEST
● STLESDTEST
● HYBRIDTEST
● HYBRIDTEST2
● DISCORDS
● ZDISCORDS
● . . .
Why so many?
To address different types of anomalies
What is an anomaly?
A A A A B A A A . . .
A A A A B A A A A B A A A A B A C A A B A A A A B A . . .
What is to be considered as an anomaly?
This is the real question to ask.
An anomaly can be:
➢ Particular values, new values . . .
➢ Values above or below a certain threshold
➢ Outliers of a statistical distribution
➢ Forecast errors
➢ Seasonality dependent
➢ Use case dependent
Detecting Anomalies
WarpScript basics
Syntax:
args... FUNCTION
Assign a value:
1 'a' STORE
Use a variable:
$a
Define a macro (i.e. a custom function):
<% 'some operations' %> 'macro' STORE
Evaluate a macro:
args... @macro
Evaluate a macro from a trusted repository:
args... @trusted/repo/macro
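
The basics above follow postfix order: arguments are pushed first, then the function consumes them. The toy interpreter below is a minimal Python sketch of that evaluation model only. It is not WarpScript; the `run` helper and its tiny token set (integers, quoted names, `$variables`, `STORE`, `+`, `*`) are hypothetical and chosen for illustration.

```python
def run(program):
    """Evaluate a whitespace-separated postfix program and return the final stack.

    Hypothetical toy, not the WarpScript engine: it only knows integers,
    'quoted' names, $variables, STORE, + and *.
    """
    stack, variables = [], {}
    for token in program.split():
        if token == "STORE":
            # Stack holds: value 'name'. Pop the name first, then the value.
            name = stack.pop()
            variables[name] = stack.pop()
        elif token == "+":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif token == "*":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        elif token.startswith("$"):
            # Push the value of a previously stored variable.
            stack.append(variables[token[1:]])
        elif len(token) >= 2 and token.startswith("'") and token.endswith("'"):
            # String literal, used here as a variable name for STORE.
            stack.append(token[1:-1])
        else:
            stack.append(int(token))
    return stack


# Store 1 in 'a', read it back, add 2, multiply by 10.
result = run("1 'a' STORE $a 2 + 10 *")
```

Each token either pushes a value or consumes what is already on the stack, which is why a chain such as `$data FUNC1 FUNC2 FUNC3` reads left to right as a data flow.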
Threshold techniques
How to define the threshold?
● Above (or below) a simple value
$data $threshold THRESHOLDTEST
● Compare with the mean (or median)
$data $useMedian $nb_std ZSCORETEST
● Compare with the moving mean (or median)
$data $window_args $nb_std @moving_ZSCORETEST
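
To make the first two options concrete, here is a minimal Python sketch of a fixed-threshold test and a z-score test. It illustrates the general techniques, not the internals of THRESHOLDTEST or ZSCORETEST. The function names, the sample data, and the choice of the median absolute deviation (scaled by 1.4826) for the median variant are assumptions.

```python
from statistics import mean, median, pstdev


def threshold_test(values, threshold):
    """Return the indices of values strictly above the threshold."""
    return [i for i, v in enumerate(values) if v > threshold]


def zscore_test(values, use_median=False, nb_std=3.0):
    """Return the indices whose distance to the center exceeds nb_std spreads.

    use_median=False: center is the mean, spread is the standard deviation.
    use_median=True (assumed robust variant): center is the median, spread is
    the median absolute deviation scaled by 1.4826.
    """
    if use_median:
        center = median(values)
        spread = 1.4826 * median(abs(v - center) for v in values)
    else:
        center = mean(values)
        spread = pstdev(values)
    if spread == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - center) / spread > nb_std]


# Synthetic response times (illustrative data) with one slow request at index 10.
response_times = [20.0, 22.0, 21.0, 19.0, 23.0, 20.0, 21.0, 22.0, 20.0, 21.0, 180.0, 22.0]
above = threshold_test(response_times, 100.0)
outliers = zscore_test(response_times, use_median=False, nb_std=3.0)
robust_outliers = zscore_test(response_times, use_median=True, nb_std=3.0)
```

With the mean, the outlier inflates the standard deviation it is compared against, so it only just clears 3 standard deviations here; the median variant is far less sensitive to that effect.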
Above a threshold
// Fetch data
[ $token 'response_time' {} NOW -500 ] FETCH
// Detect anomaly
100.0 THRESHOLDTEST
Compare with the mean
$data false 3.0 ZSCORETEST
Compare with the moving mean
$data $window_args 3.0 @moving_ZSCORETEST
Changing moving window parameters
From 5 before, 5 after to 5 before, 0 after
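
The effect of the window parameters can be sketched in Python. This is an illustration of a moving z-score, not the `@moving_ZSCORETEST` macro itself. It assumes that "before" and "after" count neighbouring points and that the tested point is excluded from its own window; `moving_zscore_test` and the sample series are hypothetical.

```python
from statistics import mean, pstdev


def moving_zscore_test(values, before=5, after=5, nb_std=3.0):
    """Flag index i when values[i] is more than nb_std standard deviations away
    from the mean of its neighbours (up to `before` earlier points and `after`
    later points, the point itself excluded: an assumption of this sketch)."""
    flagged = []
    for i, v in enumerate(values):
        neighbours = values[max(0, i - before):i] + values[i + 1:i + 1 + after]
        if len(neighbours) < 2:
            continue  # not enough context to estimate a spread
        spread = pstdev(neighbours)
        if spread > 0 and abs(v - mean(neighbours)) / spread > nb_std:
            flagged.append(i)
    return flagged


# Synthetic series (illustrative data): small jitter around 10 with one spike.
series = [10.0 + (0.3 if i % 2 else -0.3) for i in range(30)]
series[15] = 35.0

centered = moving_zscore_test(series, before=5, after=5)  # looks both ways
trailing = moving_zscore_test(series, before=5, after=0)  # past points only
```

A window with 0 points after uses only past data, so a point can be tested as soon as it arrives, which suits on-the-fly detection.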
Statistical tests
Under normality assumption:
● Grubbs test: detects whether the maximum (or minimum) value is an outlier
$data $useMedian GRUBBSTEST
● Extreme studentized deviate (ESD) test: detects up to k outliers
$data $k $useMedian ESDTEST
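
As a rough illustration of how an ESD-style test proceeds, the Python sketch below computes the sequence of test statistics: at each step the most extreme remaining point is measured in sample standard deviations, recorded, then removed. A full generalized ESD test then compares each statistic with a critical value derived from Student's t distribution, which this sketch leaves out. The `esd_statistics` helper and the data are hypothetical and do not describe how ESDTEST is implemented.

```python
from statistics import mean, stdev


def esd_statistics(values, k):
    """Return up to k pairs (index, R): at each step, the point farthest from
    the mean of the remaining points, and R = |x - mean| / s for that point.
    The comparison of R with critical values is intentionally omitted."""
    remaining = list(enumerate(values))
    steps = []
    for _ in range(k):
        if len(remaining) < 3:
            break
        xs = [v for _, v in remaining]
        m, s = mean(xs), stdev(xs)
        if s == 0:
            break
        pos = max(range(len(remaining)), key=lambda j: abs(remaining[j][1] - m))
        idx, val = remaining.pop(pos)
        steps.append((idx, abs(val - m) / s))
    return steps


# Synthetic data (illustrative): two outliers, at indices 6 and 9.
data = [10.1, 9.9, 10.0, 10.2, 9.8, 10.0, 25.0, 10.1, 9.9, -4.0, 10.0, 10.1]
steps = esd_statistics(data, 3)
candidate_indices = [idx for idx, _ in steps]
```

Removing the most extreme point before testing the next one is what lets the procedure find several outliers that would otherwise mask each other.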
ESD test
$data 100 false ESDTEST
Forecast anomalies
With the Warp10-ext-Forecast extension, you can create forecast models.
● Specific forecast models:
LSTM, NNETAR, SES, HOLT, HOLTWINTERS, ARMA, ARIMA, SARMA, SARIMA
● Let an algorithm choose for you:
AUTO, SEARCH.NNET, SEARCH.ETS, SEARCH.ARIMA
● Anomalies can be detected using:
$forecastModel FORECAST.ANOMALIES
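
The idea of a forecast anomaly can be sketched in Python with the simplest model in the list, simple exponential smoothing (SES): a point is flagged when its one-step-ahead forecast error is unusually large. This is an assumption-laden illustration, not the behaviour of FORECAST.ANOMALIES; the smoothing factor, the 3-standard-deviation rule, the function names and the data are all hypothetical.

```python
from statistics import pstdev


def ses_forecast_errors(values, alpha=0.3):
    """One-step-ahead errors of simple exponential smoothing.

    The forecast for values[t] is the smoothed level computed from values[:t];
    the first point has no forecast, so its error is set to 0.0."""
    level = values[0]
    errors = [0.0]
    for v in values[1:]:
        errors.append(v - level)
        level = alpha * v + (1 - alpha) * level
    return errors


def forecast_anomalies(values, alpha=0.3, nb_std=3.0):
    """Flag indices whose forecast error exceeds nb_std standard deviations
    of all forecast errors (an assumed decision rule)."""
    errors = ses_forecast_errors(values, alpha)
    spread = pstdev(errors)
    if spread == 0:
        return []
    return [i for i, e in enumerate(errors) if abs(e) > nb_std * spread]


# Synthetic load (illustrative data): alternates around 50, one spike at index 25.
load = [50.0 + (1.0 if i % 2 else -1.0) for i in range(40)]
load[25] = 90.0
anomalies = forecast_anomalies(load)
```

The same pattern applies to any forecast model: fit, predict, then treat large prediction errors as anomaly candidates.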
Automatic forecast model
$data AUTO FORECAST.ANOMALIES
Seasonality Analysis
Seasonal data
A A A A B A A A A B A A A A B A A A A B A A A A B A . . .
A B C D A B C D A B C D A B C D A B C D A B C D A . . .
Seasonal data
Hourly temperature measurements (in kelvins) in San Francisco
How to detect seasonality?
● Auto-Correlation function (ACF)
$data [ $data ] [ $domain ] CORRELATE
● Power spectral density (using FFT and IFFT functions)
$data @FAST_CORRELATE
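
For intuition, the auto-correlation approach can be sketched in plain Python: correlate the series with shifted copies of itself and look for the lag with the strongest peak. The helper names and the synthetic hourly signal are assumptions for illustration; this is a direct computation, not the FFT-based one, and not the CORRELATE function itself.

```python
import math
from statistics import mean


def autocorrelation(values, max_lag):
    """Sample autocorrelation for lags 0..max_lag (biased estimator)."""
    m = mean(values)
    centered = [v - m for v in values]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(len(values) - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def dominant_period(values, max_lag):
    """Return the lag (>= 2) of the highest local autocorrelation peak, or None."""
    acf = autocorrelation(values, max_lag)
    peaks = [
        lag for lag in range(2, max_lag)
        if acf[lag - 1] < acf[lag] >= acf[lag + 1]
    ]
    return max(peaks, key=lambda lag: acf[lag]) if peaks else None


# Synthetic hourly signal (illustrative data): 10 days with a 24-hour cycle.
hourly = [math.sin(2 * math.pi * h / 24) for h in range(240)]
period = dominant_period(hourly, max_lag=60)
```

On real data, peaks are usually less clean, and several seasonalities (for example daily and yearly) show up as peaks at different scales of lag.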
First seasonality
ACF plot shows 1-day seasonality
Second seasonality
ACF plot shows 1-year seasonality
Seasonal Trend Extraction
● With the Seasonal-Trend decomposition using Loess (STL) procedure
$data $params STL
Residual
● Data minus extracted seasonal and trend components
It is easier to detect anomalies on the residual!
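
The Python sketch below shows why, using a deliberately simpler stand-in for STL: the seasonal component is taken as the per-phase median and no trend is modelled. It is not the STL procedure. The helper names and the synthetic temperatures are assumptions for illustration.

```python
import math
from statistics import mean, median, pstdev


def seasonal_residual(values, period):
    """Subtract from each point the median of all points sharing its phase.
    Simplified stand-in for STL: no trend component, no Loess smoothing."""
    profile = [median(values[phase::period]) for phase in range(period)]
    return [v - profile[i % period] for i, v in enumerate(values)]


def zscore_outliers(values, nb_std=3.0):
    """Indices more than nb_std standard deviations away from the mean."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - m) / s > nb_std]


# Synthetic hourly temperatures in kelvins (illustrative data): 10 days.
period = 24
temps = [285.0 + 5.0 * math.sin(2 * math.pi * (i % period) / period)
         for i in range(10 * period)]
temps[114] += 4.0  # unusually warm for that hour, yet inside the overall range

on_raw = zscore_outliers(temps)
on_residual = zscore_outliers(seasonal_residual(temps, period))
```

The raw series hides the anomaly because 284 K is an ordinary value overall; once the daily pattern is removed, the same point stands alone in the residual.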
Seasonal anomaly detection
WarpScript functions:
● Seasonal statistical outliers
STLESDTEST, HYBRIDTEST, HYBRIDTEST2
● Seasonal forecast anomalies
SARIMA, SEARCH.SARIMA, HOLTWINTERS, SEARCH.ETS
Without Seasonality
$data $k $useMedian ESDTEST
With Seasonality
$data $seasonality $piece $k HYBRIDTEST
How to handle multiple seasonalities?
Possible strategies
● Iterate anomaly detection for each seasonality
● Use difference series and integrate (available with the forecast extension):
[ $seasonality_1 $seasonality_2 ... ] DIFF
@ANOMALY_DETECTION
[ $seasonality_1 $seasonality_2 ... ] INVERTDIFF
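
The difference-then-integrate strategy can be sketched in Python. Differencing at a lag equal to a seasonal period cancels that seasonality; doing it once per period cancels several. The sketch is an illustration under assumptions: the helper names are hypothetical, and the sign and alignment conventions of DIFF and INVERTDIFF in the forecast extension may differ.

```python
def seasonal_diff(values, lag):
    """y[t] = x[t + lag] - x[t]; the result is `lag` points shorter."""
    return [values[t + lag] - values[t] for t in range(len(values) - lag)]


def invert_seasonal_diff(diffed, head, lag):
    """Rebuild x from its lag-difference and its first `lag` points (head)."""
    restored = list(head)
    for d in diffed:
        restored.append(restored[len(restored) - lag] + d)
    return restored


def multi_diff(values, lags):
    """Difference once per lag, keeping the heads needed to integrate back."""
    heads = []
    for lag in lags:
        heads.append(values[:lag])
        values = seasonal_diff(values, lag)
    return values, heads


def multi_invert(diffed, heads, lags):
    """Undo multi_diff by integrating in reverse order."""
    for lag, head in zip(reversed(lags), reversed(heads)):
        diffed = invert_seasonal_diff(diffed, head, lag)
    return diffed


# Synthetic series (illustrative data) with two seasonalities, periods 3 and 4.
short_cycle, long_cycle = [0, 5, 2], [10, 30, 20, 40]
series = [short_cycle[t % 3] + long_cycle[t % 4] for t in range(36)]

diffed, heads = multi_diff(series, [3, 4])
restored = multi_invert(diffed, heads, [3, 4])

# The same series with one spike: the differenced series is no longer flat.
spiked = list(series)
spiked[20] += 50
spike_marks = [t for t, d in enumerate(multi_diff(spiked, [3, 4])[0]) if d != 0]
```

A single spike leaves several marks in the differenced series, which is one reason to integrate back before reporting where the anomaly actually occurred.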
Single seasonality difference
$data [ $1d ] DIFF
Double seasonality difference
$data [ $1d $5m ] DIFF
With double seasonality
$data [ $1d $5m ] DIFF 100 false ESDTEST
Conclusion
Takeaways
● Warp 10 and WarpScript!
● Anomaly detection: multiple techniques
● Threshold techniques
● Statistical tests
● Forecast anomalies
● How to handle seasonalities
● Check out our blog! blog.senx.io
Thank you!
Supplementary slides
Rationales for using Geo Time Series
Some features
● Store raw data
● Inner relations: time (and optionally geo)
● Outer relations: group by classname, group by key/value
Some benefits
● Chunkable / Parallelizable
● Easy manipulation
● Easier implementation of analytics
WarpScript has over 900 functions
● String Function (32)
● Maths (74)
● Geo Time Series® (145)
● Stack (66)
● Composite Types (52)
● Processing (94)
● Platform (39)
● Logic (10)
● Time Related (26)
● Cryptographic (16)
● Logic Structure & Flow Control (21)
● Constants (9)
● Quaternions (8)
● Mappers (93)
● Reducers (37)
● Bucketizers (23)
● Operations (18)
● Filters (12)
● Conversions (24)
● Geo (19)
Mode and ecosystem interoperability
APIs
Interacting with the storage engine:
Ingress, Fetch, Find, Meta, Delete, Stream update, Plasma, . . .
Interacting with the analytics engine:
Egress, REPL, Py4J gateway, Mobius, . . .
Example of WarpScript
[ args ] FETCH
[ args ] BUCKETIZE
[ args ] REDUCE
Shareability / Extensibility
● Easily share macros (no installation required)
Configuration file:
warpfleet.macros.repos = http://MY/MACRO/REPOSITORY
WarpScript:
@my/macro
● Retrieve and publish plugins, extensions, macros
Command line:
$ wf get --conf my/conf/file group artifact
Challenges
• Monitoring large infrastructures (servers, networks, devices, applications, middlewares)
• Willingness to rationalize monitoring tools
• Enable advanced analytics and Machine Learning
Data
• Monitoring metrics and events
• Over 500 million Time Series from containers (evanescent series) and physical devices
• Peaks over 50 million datapoints per second
Tools
• Distributed version of Warp 10
• In-Memory Warp 10 instances for caching
• WarpScript for analytics
Results
• Reduced number of technologies used for monitoring
• Ability to perform analytics on millions of series in real time
• Access to large historical datasets (hundreds of trillions of datapoints) for trend analysis and pattern detection
• Dashboarding tools (Grafana) connected to Warp 10™ datasource used by all teams
• Identical analytics skills acquired by all teams
Challenges
• Aircraft are equipped with a growing number of sensors
• Need to analyse aircraft data for safety, maintenance and diagnostic purposes for individual aircraft and fleets
• Multiple teams want access to the data
• Growth opportunities in new services based on data analysis
Data
• 1 hour of flight produces 8 Mb to 1 Gb of data depending on aircraft (from 10 M to 3 B datapoints per flight hour)
• Historical dataset for over 300 aircraft for multiple years, with projected volumes in the petabyte scale for upcoming fleets
Tools
• Time Series analytics using WarpScript on Spark for batch processing
• Interactive manipulation of intermediate results in Warp 10 standalone
• Data science using the Warp 10 Zeppelin plugin
Results
• Ability to analyze all existing flight data
• Efficient and flexible incident analysis
• Fast data ingestion and processing pipeline, enabling maintenance KPIs to be computed between landing and parking of aircraft
Challenges
• Industrial IoT
• 10,000 hours of system validation in Haiti, 2 devices
• Engineers must record 200+ CAN and temperature data
• 900,712 points per hour (raw data stored on the embedded SSD)
• High-cost, unreliable M2M 3G connection
Data
• 90,158 points to upload per hour per device after custom resampling
• CAN and modbus networks
Tools
• Warp 10™ Edge on an iMX-6 with 500 GB industrial SSD
• Distributed version of Warp 10™ for the historical data
• VertX application to manage CAN and modbus
• Local WarpScript code for resampling and remote/local database synchronization
Results
• 130 kB per hour, 100 MB per month data plan only
• SDMO engineers can:
- Do usage statistics
- Compute thermal stress
- Refine their validation plan in real time

#OSSPARIS19: Detecting time series anomalies on the fly with WarpScript - Jean-Charles Vialatte, SenX

  • 1. Detecting time series anomalies on-the-fly with WarpScript™ Jean-Charles Vialatte Machine Learning Engineer
  • 2. Agenda I. Presentation II. Detecting Anomalies III. Seasonality Analysis IV. Conclusion
  • 4. Agenda I. Presentation A. Time Series data B. Warp 10 and WarpScript C. Anomaly Detection II. Detecting Anomalies A. Using simple threshold techniques B. Using statistical methods C. Using forecast models III. Seasonality Analysis A. Detecting seasonality B. Seasonal anomaly detection C. Multiple seasonalities IV. Conclusion
  • 5. What are Time Series?
  • 7. Why it’s not easy to build a TSDB Storage ● Scalability ● Ingestion / Fetch performance ● Security, GDPR compliant ● Deployment (e.g. standalone vs edge vs distributed) Analytics ● Simple and complex queries ● Concurrent access ● Interoperable with other programs / languages / libraries ● Parallelizable when storage is distributed
  • 10. The Geo Time Series™ data model ● Metadata (identifies a GTS): classname; labels (immutable) and attributes (mutable), given as key1: value1, key2: value2, . . . ● Datapoints: timestamps; values (Long, Double, String, Bytes, multi-values, nested GTS, . . .); geostamps (optional)
  • 11. Warp 10 Storage Engine Geo Time Series™ Performance Secured, GDPR Scalable Standard protocols and formats Interoperability
  • 12. Warp 10 Analytics Engine ● A library of 1000+ functions: from basic statistics to advanced signal processing and anomaly detection ● Execute the same code on a single server or on a distributed cluster ● Executable via HTTP, via Java, or via Python
  • 13. Warp 10 Analytics Engine ● A library of 1000+ functions ● Executable via HTTP, via Java, or via Python ● Independent of the storage engine: can be connected to any data source ● Concise syntax designed for data flows: $data FUNC1 FUNC2 FUNC3 ...
  • 14. Things / Sensors Data transmission Data cleansing Data synchronization Analytics, ML Feature Engineering & Extraction Data filtering Data access control Data storage Business Applications and services Business analytics Data science 80% of effort Scope of Warp 10™
  • 15. Advantages of Warp 10™ ● Broader scope: from storage to analytics ● More complex queries and analytics ● Optional support for Geo ● Both storage and analytics are distributable ● Strongly interoperable with other tools
  • 16. Get your hands on Warp 10™ in no time https://sandbox.senx.io
  • 17. WarpScript functions WarpScript has many built-in anomaly detection functions: ● THRESHOLDTEST ● ZSCORETEST ● GRUBBSTEST ● ESDTEST ● STLESDTEST ● HYBRIDTEST ● HYBRIDTEST2 ● DISCORDS ● ZDISCORDS ● . . . Why so many? Because different types of anomalies call for different tests.
  • 19. What is an anomaly? A A A A B A A A . . . A A A A B A A A A B A A A A B A C A A B A A A A B A . . .
  • 23. What is to be considered as an anomaly? This is the real question to ask. An anomaly can be: ➢ Particular values, new values . . . ➢ Values above or below a certain threshold ➢ Outliers of a statistical distribution ➢ Forecast errors ➢ Seasonality dependent ➢ Use case dependent
  • 26. WarpScript basics ● Function syntax: args... FUNCTION ● Assign a value: 1 'a' STORE ● Use a variable: $a ● Define a macro (i.e. a custom function): <% 'some operations' %> 'macro' STORE ● Evaluate a macro: args... @macro ● Evaluate a macro from a trusted repository: args... @trusted/repo/macro
  • 27. Threshold techniques. How to define the threshold? ● Above (or below) a simple value: $data $threshold THRESHOLDTEST ● Compare with the mean (or median): $data $useMedian $nb_std ZSCORETEST ● Compare with the moving mean (or median): $data $window_args $nb_std @moving_ZSCORETEST
  • 28. Above a threshold. Fetch data: [ $token 'response_time' {} NOW -500 ] FETCH, then detect anomalies: 100.0 THRESHOLDTEST
  • 29. Compare with the mean: $data false 3.0 ZSCORETEST
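
To make the z-score comparison above concrete, here is a minimal Python sketch of the statistic itself. It is not WarpScript and not the ZSCORETEST implementation: the function name `zscore_anomalies` and its parameters are hypothetical, and the median variant assumes a modified z-score built on the scaled median absolute deviation.

```python
from statistics import mean, median, pstdev


def zscore_anomalies(values, use_median=False, threshold=3.0):
    """Return the indices whose z-score exceeds the threshold.

    Hypothetical helper for illustration only.
    """
    if use_median:
        center = median(values)
        # Median absolute deviation, scaled by 1.4826 so that it estimates
        # the standard deviation of normally distributed data (assumption).
        spread = 1.4826 * median(abs(v - center) for v in values)
    else:
        center = mean(values)
        spread = pstdev(values)
    if spread == 0:
        return []  # constant data: no meaningful z-score
    return [i for i, v in enumerate(values)
            if abs(v - center) / spread > threshold]


response_times = [10, 11, 9, 10, 12, 10, 9, 11, 10, 50, 10, 11, 9]
print(zscore_anomalies(response_times))                   # mean-based
print(zscore_anomalies(response_times, use_median=True))  # median-based
```

The median variant is less sensitive to the outlier itself, since a single extreme value inflates the mean and standard deviation but barely moves the median.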
  • 30. Compare with the moving mean: $data $window_args 3.0 @moving_ZSCORETEST
  • 31. Changing the moving window parameters from 5 before / 5 after to 5 before / 0 after
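
The effect of the window parameters can be sketched in Python as follows. This is an illustration under assumptions, not the @moving_ZSCORETEST macro: the name `moving_zscore_anomalies` is hypothetical, the point under test is excluded from its own window, and windows with fewer than three neighbours are skipped.

```python
from statistics import mean, pstdev


def moving_zscore_anomalies(values, before=5, after=5, threshold=3.0):
    """Flag points that deviate from the mean of their neighbours.

    The window holds up to `before` points preceding and `after` points
    following the tested point (hypothetical parameter names).
    """
    flagged = []
    for i, v in enumerate(values):
        lo = max(0, i - before)
        hi = min(len(values), i + after + 1)
        window = values[lo:i] + values[i + 1:hi]  # neighbours only
        if len(window) < 3:
            continue  # too few neighbours for a stable estimate
        spread = pstdev(window)
        if spread == 0:
            continue  # flat neighbourhood: z-score undefined
        if abs(v - mean(window)) / spread > threshold:
            flagged.append(i)
    return flagged


# A drifting, oscillating signal with one spike at index 15.
series = [float(i % 3) + 0.1 * i for i in range(30)]
series[15] += 10.0

print(moving_zscore_anomalies(series, before=5, after=5))
print(moving_zscore_anomalies(series, before=5, after=0))
```

With `after=0` the test only looks at past values, which is what an on-the-fly detector needs since future points are not yet available.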
  • 32. Statistical tests. Under the normality assumption: ● Grubbs test: detects whether the maximum (or minimum) value is an outlier: $data $useMedian GRUBBSTEST ● Extreme studentized deviate (ESD) test: detects up to k outliers: $data $k $useMedian ESDTEST
  • 33. ESD test: $data 100 false ESDTEST
  • 34. Forecast anomalies. With the extension Warp10-ext-Forecast, you can create forecast models. ● Specific forecast models: LSTM, NNETAR, SES, HOLT, HOLTWINTERS, ARMA, ARIMA, SARMA, SARIMA ● Let an algorithm choose for you: AUTO, SEARCH.NNET, SEARCH.ETS, SEARCH.ARIMA ● Anomalies can be detected using: $forecastModel FORECAST.ANOMALIES
  • 35. Automatic forecast model: $data AUTO FORECAST.ANOMALIES
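
The idea behind forecast anomalies (a point is anomalous when the model's forecast error is unusually large) can be sketched in Python with simple exponential smoothing, one of the model families listed above. This is an assumption-laden illustration, not the FORECAST.ANOMALIES implementation: the function name, the smoothing factor `alpha` and the 3-sigma rule on the errors are all hypothetical choices.

```python
from statistics import pstdev


def ses_forecast_anomalies(values, alpha=0.3, threshold=3.0):
    """Flag points whose one-step-ahead forecast error is unusually large."""
    level = values[0]  # initial level: first observation
    errors = []
    for v in values[1:]:
        errors.append(v - level)  # the SES forecast is the current level
        level = alpha * v + (1 - alpha) * level  # update after observing v
    spread = pstdev(errors)
    if spread == 0:
        return []
    # errors[i] belongs to values[i + 1]
    return [i + 1 for i, e in enumerate(errors)
            if abs(e) > threshold * spread]


data = [20.0 + (i % 2) for i in range(40)]
data[25] = 35.0  # injected anomaly
print(ses_forecast_anomalies(data))
```

A richer model (seasonal or autoregressive) shrinks the errors on normal data, which makes genuine anomalies stand out more clearly.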
  • 39. Seasonal data A A A A B A A A A B A A A A B A A A A B A A A A B A . . . A B C D A B C D A B C D A B C D A B C D A B C D A . . .
  • 40. Seasonal data Hourly temperature measurements (in Kelvins) in San Francisco
  • 41. How to detect seasonality? ● Auto-correlation function (ACF): $data [ $data ] [ $domain ] CORRELATE ● Power spectral density (using FFT and IFFT functions): $data @FAST_CORRELATE
  • 42. First seasonality ACF plot shows 1-day seasonality
  • 43. Second seasonality ACF plot shows 1-year seasonality
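
Reading a seasonality off an ACF plot amounts to finding the lag at which the auto-correlation peaks. A minimal Python sketch, assuming a direct (non-FFT) computation and a hypothetical peak-picking rule (highest local maximum); it does not reproduce CORRELATE or @FAST_CORRELATE.

```python
import math
from statistics import mean


def autocorrelation(values, max_lag):
    """Sample auto-correlation for lags 0..max_lag (biased estimator)."""
    mu = mean(values)
    centered = [v - mu for v in values]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag]
            for t in range(len(values) - lag)) / denom
        for lag in range(max_lag + 1)
    ]


def dominant_period(values, max_lag):
    """Return the lag of the highest local maximum of the ACF, or None."""
    acf = autocorrelation(values, max_lag)
    peaks = [lag for lag in range(1, max_lag)
             if acf[lag] > acf[lag - 1] and acf[lag] >= acf[lag + 1]]
    return max(peaks, key=lambda lag: acf[lag]) if peaks else None


# Ten days of synthetic hourly values with a daily cycle.
hourly = [285.0 + 5.0 * math.sin(2 * math.pi * t / 24) for t in range(240)]
print(dominant_period(hourly, max_lag=60))
```

Detecting a yearly cycle the same way needs several years of data and a `max_lag` beyond one year, at which point an FFT-based computation becomes the practical choice.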
  • 44. Seasonal Trend Extraction ● With the Seasonal Trend Loess (STL) procedure: $data $params STL
  • 45. Residual ● Data minus the extracted seasonal and trend components. It is easier to detect anomalies on the residual!
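
Why the residual is easier to work with can be shown with a small Python sketch. Note the swap: instead of STL, the seasonal component here is estimated with plain per-phase means and no trend is removed, which is a much cruder technique chosen only to keep the example short. All function names are hypothetical.

```python
from statistics import mean, pstdev


def seasonal_residual(values, period):
    """Subtract the mean of each seasonal phase (a crude stand-in for STL)."""
    phase_means = [mean(values[p::period]) for p in range(period)]
    return [v - phase_means[i % period] for i, v in enumerate(values)]


def zscore_outliers(values, threshold=3.0):
    """Indices more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


# A period-4 pattern repeated 12 times, with a small bump at index 22.
series = [0.0, 5.0, 10.0, 5.0] * 12
series[22] += 4.0

print(zscore_outliers(series))                        # on the raw data
print(zscore_outliers(seasonal_residual(series, 4)))  # on the residual
```

On the raw data the bump hides inside the seasonal swing, so the z-score test misses it; once the seasonal component is removed, the same test isolates it.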
  • 46. Seasonal anomaly detection. WarpScript functions: ● Seasonal statistical outliers: STLESDTEST, HYBRIDTEST, HYBRIDTEST2 ● Seasonal forecast anomalies: SARIMA, SEARCH.SARIMA, HOLTWINTERS, SEARCH.ETS
  • 47. Without seasonality: $data $k $useMedian ESDTEST
  • 48. With seasonality: $data $seasonality $piece $k HYBRIDTEST
  • 49. How to handle multiple seasonalities? Possible strategies ● Iterate anomaly detection for each seasonality ● Use difference series and integrate (available with the forecast extension): [ $seasonality_1 $seasonality_2 ... ] DIFF @ANOMALY_DETECTION [ $seasonality_1 $seasonality_2 ... ] INVERTDIFF
  • 50. Single seasonality difference: $data [ $1d ] DIFF
  • 51. Double seasonality difference: $data [ $1d $5m ] DIFF
  • 52. With double seasonality: $data [ $1d $5m ] DIFF 100 false ESDTEST
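
The differencing strategy can be illustrated in Python: subtracting the value one season earlier cancels that seasonality, and repeating the step with a second lag cancels a second one. The helpers below are hypothetical analogues written for this sketch; they make no claim about the exact semantics of DIFF and INVERTDIFF in the forecast extension.

```python
def seasonal_diff(values, lag):
    """Difference at a seasonal lag: y[t] = x[t] - x[t - lag]."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


def invert_seasonal_diff(diffed, head, lag):
    """Rebuild the series from its difference and its first `lag` values."""
    restored = list(head)
    for d in diffed:
        restored.append(d + restored[len(restored) - lag])
    return restored


# Two superimposed integer seasonalities, with periods 4 and 6.
series = [(t % 4) * 10 + (t % 6) for t in range(48)]

once = seasonal_diff(series, 4)   # removes the period-4 component
twice = seasonal_diff(once, 6)    # removes the period-6 component
print(set(twice))                 # only zeros remain

# Integrating back in reverse order recovers the original series.
back = invert_seasonal_diff(invert_seasonal_diff(twice, once[:6], 6),
                            series[:4], 4)
print(back == series)
```

After differencing, a plain outlier test such as ESD can run on a series that no longer carries either seasonal pattern; note that a single anomaly shows up at several positions in the differenced series, once per lag combination.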
  • 54. Takeaways ● Warp 10 and WarpScript! ● Anomaly detection: multiple techniques ● Threshold techniques ● Statistical tests ● Forecast anomalies ● How to handle seasonalities ● Check out our blog! blog.senx.io
  • 57. Rationales for using Geo Time Series Some features ● Store raw data ● Inner relations: time (and optionally geo) ● Outer relations: group by classname, group by key/value Some benefits ● Chunkable / Parallelizable ● Easy manipulation ● Easier implementation of analytics
  • 58. WarpScript has over 900 functions: String Function (32) Maths (74) Geo Time Series® (145) Stack (66) Composite Types (52) Processing (94) Platform (39) Logic (10) Time Related (26) Cryptographic (16) Logic Structure & Flow Control (21) Constants (9) Quaternions (8) Mappers (93) Reducers (37) Bucketizers (23) Operations (18) Filters (12) Conversions (24) Geo (19)
  • 59. Mode and ecosystem interoperability
  • 60. APIs ● Interacting with the storage engine: Ingress, Fetch, Find, Meta, Delete, Stream update, Plasma, . . . ● Interacting with the analytics engine: Egress, REPL, Py4J gateway, Mobius, . . .
  • 61. Example of WarpScript [ args ] FETCH
  • 62. Example of WarpScript [ args ] FETCH [ args ] BUCKETIZE [ args ] REDUCE
  • 63. Shareability / Extensibility ● Easily share macros (no installation required) ● Retrieve and publish plugins, extensions, macros ● Configuration file: warpfleet.macros.repos = http://MY/MACRO/REPOSITORY ● WarpScript: @my/macro ● Command line: $ wf get --conf my/conf/file group artifact
  • 64. Challenges: • Monitoring large infrastructures (servers, networks, devices, applications, middleware) • Willingness to rationalize monitoring tools • Enable advanced analytics and Machine Learning. Data: • Monitoring metrics and events • Over 500 million Time Series from containers (evanescent series) and physical devices • Peaks over 50 million datapoints per second. Tools: • Distributed version of Warp 10 • In-memory Warp 10 instances for caching • WarpScript for analytics. Results: • Reduced number of technologies used for monitoring • Ability to perform analytics on millions of series in real time • Access to large historical datasets (hundreds of trillions of datapoints) for trend analysis and pattern detection • Dashboarding tools (Grafana) connected to the Warp 10™ datasource and used by all teams • Identical analytics skills acquired by all teams.
  • 65. Challenges: • Aircraft are equipped with a growing number of sensors • Need to analyze aircraft data for safety, maintenance and diagnostic purposes, for individual aircraft and fleets • Multiple teams want access to the data • Growth opportunities in new services based on data analysis. Data: • 1 hour of flight produces 8 Mb to 1 Gb of data depending on the aircraft (from 10 M to 3 B datapoints per flight hour) • Historical dataset for over 300 aircraft over multiple years, with projected volumes at the petabyte scale for upcoming fleets. Tools: • Time Series analytics using WarpScript on Spark for batch processing • Interactive manipulation of intermediate results in Warp 10 standalone • Data science using the Warp 10 Zeppelin plugin. Results: • Ability to analyze all existing flight data • Efficient and flexible incident analysis • Fast data ingestion and processing pipeline, enabling maintenance KPIs to be computed between landing and parking of the aircraft
  • 66. Challenges, Data, Tools, Results: • Industrial IoT • 10,000 hours of system validation in Haiti, 2 devices • Engineers must record 200+ CAN and temperature signals • 900,712 points per hour (raw data stored on the embedded SSD) • High-cost, unreliable M2M 3G connection • 90,158 points to upload per hour per device after custom resampling • CAN and Modbus networks • Warp 10™ Edge on an iMX-6 with a 500 GB industrial SSD • Distributed version of Warp 10™ for the historical data • Vert.x application to manage CAN and Modbus • Local WarpScript code for resampling and remote/local database synchronization • 130 kB per hour, on a 100 MB per month data plan only • SDMO engineers can: - Do usage statistics - Compute thermal stress - Refine their validation plan in real time