The document discusses anomaly detection in time series data using WarpScript functions. It begins with an introduction to time series data and WarpScript. Key techniques for detecting anomalies discussed include threshold-based methods, statistical tests, and forecast models. The document also covers analyzing seasonality in time series and methods for handling multiple seasonal patterns.
4. Agenda
I. Presentation
A. Time Series data
B. Warp 10 and WarpScript
C. Anomaly Detection
II. Detecting Anomalies
A. Using simple threshold techniques
B. Using statistical methods
C. Using forecast models
III. Seasonality Analysis
A. Detecting seasonality
B. Seasonal anomaly detection
C. Multiple seasonalities
IV. Conclusion
7. Why it’s not easy to build a TSDB
Storage
● Scalability
● Ingestion / Fetch performance
● Security, GDPR compliant
● Deployment (e.g. standalone vs edge vs distributed)
Analytics
● Simple and complex queries
● Concurrent access
● Interoperable with other programs / languages / libraries
● Parallelizable when storage is distributed
10. The Geo Time Series™ data model
Metadata
● classname: identifies a GTS (together with its labels)
● labels (key1: value1, key2: value2, . . .): immutable
● attributes (key1: value1, key2: value2, . . .): mutable
Datapoints
● timestamps
● geostamps (optional)
● values: Long, Double, String, Bytes, Multi-values, nested GTS, . . .
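The data model above can be pictured as a small record type. The following is a minimal Python sketch, not the Warp 10 implementation: the class name `GeoTimeSeries`, its field names, and the `add` and `identity` helpers are all hypothetical and chosen only to mirror the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple, Union

# Value types listed on the slide (multi-values and nested GTS are left out).
Value = Union[int, float, str, bytes]


@dataclass
class GeoTimeSeries:
    """Hypothetical in-memory picture of one Geo Time Series."""
    classname: str
    labels: Dict[str, str] = field(default_factory=dict)      # immutable by convention
    attributes: Dict[str, str] = field(default_factory=dict)  # may change over time
    timestamps: List[int] = field(default_factory=list)
    values: List[Value] = field(default_factory=list)
    # One optional (latitude, longitude) pair per datapoint.
    geostamps: List[Optional[Tuple[float, float]]] = field(default_factory=list)

    def add(self, timestamp: int, value: Value,
            location: Optional[Tuple[float, float]] = None) -> None:
        """Append one datapoint; the location is optional."""
        self.timestamps.append(timestamp)
        self.values.append(value)
        self.geostamps.append(location)

    def identity(self) -> Tuple[str, Tuple[Tuple[str, str], ...]]:
        """Classname plus sorted labels, used here as the series identifier."""
        return (self.classname, tuple(sorted(self.labels.items())))
```

Keeping attributes out of `identity()` reflects the slide's split between immutable labels and mutable attributes: changing an attribute should not create a new series.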
11. Warp 10 Storage Engine
Geo Time Series™
Performance
Secured, GDPR
Scalable
Standard protocols and formats
Interoperability
12. A library of 1000+ functions
From basic statistics to advanced signal processing and anomaly detection
Warp 10 Analytics Engine
Execute the same code on a single server or on a distributed cluster
Executable via HTTP, via Java, or via Python
13. A library of 1000+ functions
Warp 10 Analytics Engine
Independent of the Storage engine
Can be connected to any data source
Concise Syntax designed for data flows
$data FUNC1 FUNC2 FUNC3 ...
14. Things / Sensors
Data transmission
Data cleansing
Data synchronization
Analytics, ML Feature Engineering & Extraction
Data filtering
Data access control
Data storage
Business Applications and services
Business analytics
Data science
80% of effort
Scope of Warp 10™
15. Advantages of Warp 10™
● Broader scope: from storage to analytics
● More complex queries and analytics
● Optional support for Geo
● Both storage and analytics are distributable
● Strongly interoperable with other tools
16. Get your hands on Warp 10™ in no time
https://sandbox.senx.io
17. WarpScript functions
WarpScript has many built-in anomaly detection functions:
● THRESHOLDTEST
● ZSCORETEST
● GRUBBSTEST
● ESDTEST
● STLESDTEST
● HYBRIDTEST
● HYBRIDTEST2
● DISCORDS
● ZDISCORDS
● . . .
Why so many?
To address different types of anomalies
19. What is an anomaly?
A A A A B A A A . . .
A A A A B A A A A B A A A A B A C A A B A A A A B A . . .
23. What is to be considered as an anomaly?
This is the real question to ask.
An anomaly can be:
➢ Particular values, new values . . .
➢ Values above or below a certain threshold
➢ Outliers of a statistical distribution
➢ Forecast errors
➢ Seasonality dependent
➢ Use case dependent
26. WarpScript basics
Syntax:
args... FUNCTION
Assign a value:
1 'a' STORE
Use a variable:
$a
Define a macro (i.e. a custom function):
<% 'some operations' %> 'macro' STORE
Evaluate a macro:
args... @macro
Evaluate a macro from a trusted repository:
args... @trusted/repo/macro
27. Threshold techniques
How to define the threshold?
● Above (or below) a simple value
$data $threshold THRESHOLDTEST
● Compare with the mean (or median)
$data $useMedian $nb_std ZSCORETEST
● Compare with the moving mean (or median)
$data $window_args $nb_std @moving_ZSCORETEST
Syntax reminder: $args FUNCTION for built-in functions, $args @macro for macros
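To make the "compare with the mean (or median)" idea concrete, here is a minimal Python sketch of a z-score test. It is an illustration of the principle, not the code behind ZSCORETEST: the function name `zscore_test`, its parameters, and the 1.4826 scaling of the median absolute deviation are assumptions.

```python
from statistics import mean, median, pstdev


def zscore_test(values, use_median=False, nb_std=3.0):
    """Return the indices whose distance to the center exceeds nb_std spreads."""
    if use_median:
        center = median(values)
        # Median absolute deviation, scaled (assumption) so that it is
        # comparable to a standard deviation for normally distributed data.
        spread = 1.4826 * median(abs(v - center) for v in values)
    else:
        center = mean(values)
        spread = pstdev(values)
    if spread == 0:
        # All points are identical (or the MAD collapsed): nothing to flag.
        return []
    return [i for i, v in enumerate(values) if abs(v - center) / spread > nb_std]
```

A single large outlier inflates the mean and the standard deviation, which can hide the outlier itself; the median-based variant is less sensitive to this, which is one reason a `useMedian` style switch is useful.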
28. Above a threshold
// Fetch data
[ $token 'response_time' {} NOW -500 ] FETCH
// Detect anomaly
100.0 THRESHOLDTEST
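The threshold check itself is a one-line filter. The Python sketch below shows the idea on plain lists; `threshold_test` and its `below` switch are hypothetical names, and whether the built-in function compares raw or absolute values is not stated on the slide, so raw values are compared here.

```python
def threshold_test(ticks, values, threshold, below=False):
    """Return the ticks whose value is strictly above (or below) the threshold."""
    if below:
        return [t for t, v in zip(ticks, values) if v < threshold]
    return [t for t, v in zip(ticks, values) if v > threshold]
```

Returning ticks rather than values keeps the result usable as a list of anomaly timestamps.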
32. Statistical tests
Under normality assumption:
● Grubbs' test: detects whether the maximum (or minimum) value is an outlier
$data $useMedian GRUBBSTEST
● Extreme studentized deviate (ESD) test: detects up to k outliers
$data $k $useMedian ESDTEST
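Both tests rest on the same statistic: the largest absolute deviation from the mean, divided by the sample standard deviation. The Python sketch below computes that statistic iteratively, as in the generalized ESD procedure; its first value is the Grubbs statistic. The name `esd_statistics` is hypothetical, and the sketch deliberately stops short of the decision step, which compares each statistic with a critical value derived from the Student t distribution.

```python
from statistics import mean, stdev


def esd_statistics(values, k):
    """Return [(index, R_i)] for up to k successive most-extreme points."""
    remaining = list(enumerate(values))
    results = []
    for _ in range(k):
        if len(remaining) < 3:
            break  # too few points left for a meaningful statistic
        data = [v for _, v in remaining]
        m, s = mean(data), stdev(data)
        if s == 0:
            break  # all remaining points are identical
        # Position of the point farthest from the current mean.
        pos = max(range(len(remaining)), key=lambda j: abs(remaining[j][1] - m))
        idx, val = remaining.pop(pos)
        results.append((idx, abs(val - m) / s))
    return results
```

Removing the most extreme point before recomputing the mean and standard deviation is what lets the ESD procedure find several outliers that would otherwise mask each other.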
34. Forecast anomalies
With the extension Warp10-ext-Forecast, you can create forecast models.
● Specific forecast models:
LSTM, NNETAR, SES, HOLT, HOLTWINTERS, ARMA, ARIMA, SARMA, SARIMA
● Let an algorithm choose for you:
AUTO, SEARCH.NNET, SEARCH.ETS, SEARCH.ARIMA
● Anomalies can be detected using:
$forecastModel FORECAST.ANOMALIES
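The principle of forecast-based detection is to flag the points where the model's prediction error is unusually large. The Python sketch below illustrates it with simple exponential smoothing (SES), the simplest model in the list above. It is not how FORECAST.ANOMALIES is implemented: the function name, the smoothing factor `alpha`, and the "error larger than `nb_std` standard deviations" rule are assumptions made for the example.

```python
from statistics import pstdev


def ses_forecast_anomalies(values, alpha=0.3, nb_std=3.0):
    """Return indices whose one-step-ahead SES forecast error is unusually large."""
    level = values[0]
    errors = [0.0]  # no forecast is available for the first point
    for v in values[1:]:
        # The forecast for this point is the level computed from past points.
        errors.append(v - level)
        # Update the level with the newly observed value.
        level = alpha * v + (1 - alpha) * level
    spread = pstdev(errors)
    if spread == 0:
        return []
    return [i for i, e in enumerate(errors) if abs(e) > nb_std * spread]
```

A richer model (Holt-Winters, ARIMA, and so on) changes how the forecast is produced, but the anomaly rule on the residuals can stay the same.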
39. Seasonal data
A A A A B A A A A B A A A A B A A A A B A A A A B A . . .
A B C D A B C D A B C D A B C D A B C D A B C D A . . .
41. How to detect seasonality?
● Auto-Correlation function (ACF)
$data [ $data ] [ $domain ] CORRELATE
● Power spectral density (using FFT and IFFT functions)
$data @FAST_CORRELATE
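As an illustration of the auto-correlation approach, the Python sketch below computes the ACF by direct summation and picks the lag with the highest value as a candidate period. It does not use FFT and is not the CORRELATE implementation; `autocorrelation`, `dominant_period`, and the choice to start the search at lag 2 are assumptions, and taking the single highest lag is a crude heuristic.

```python
from statistics import mean


def autocorrelation(values, lag):
    """Sample autocorrelation of the series at the given lag."""
    m = mean(values)
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0:
        return 0.0
    num = sum((values[i] - m) * (values[i + lag] - m)
              for i in range(len(values) - lag))
    return num / denom


def dominant_period(values, max_lag, min_lag=2):
    """Return the lag in [min_lag, max_lag] with the highest autocorrelation."""
    return max(range(min_lag, max_lag + 1),
               key=lambda lag: autocorrelation(values, lag))
```

On the "A A A A B" pattern shown earlier, encoded as 0 0 0 0 1, the highest autocorrelation is found at lag 5, the length of the repeating motif.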
49. How to handle multiple seasonalities?
Possible strategies
● Iterate anomaly detection for each seasonality
● Use difference series and integrate (available with forecast extension):
[ $seasonality_1 $seasonality_2 ... ] DIFF
@ANOMALY_DETECTION
[ $seasonality_1 $seasonality_2 ... ] INVERTDIFF
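The differencing strategy can be sketched in a few lines of Python: subtract the value one period earlier for each seasonality in turn, work on the residual, then integrate back in reverse order. This shows the general technique only; the helper names are hypothetical and the exact behavior of DIFF and INVERTDIFF in the forecast extension is not described on the slide.

```python
def seasonal_diff(values, period):
    """Return (head, diffed) where diffed[t] = x[t + period] - x[t]."""
    head = values[:period]  # kept so that the operation can be inverted
    diffed = [values[t] - values[t - period] for t in range(period, len(values))]
    return head, diffed


def seasonal_undiff(head, diffed):
    """Invert seasonal_diff using the saved head."""
    period = len(head)
    out = list(head)
    for d in diffed:
        out.append(d + out[-period])
    return out


def multi_diff(values, periods):
    """Apply seasonal differencing once per period, in order."""
    heads = []
    for p in periods:
        head, values = seasonal_diff(values, p)
        heads.append(head)
    return heads, values


def multi_undiff(heads, diffed):
    """Undo multi_diff by integrating in reverse order."""
    for head in reversed(heads):
        diffed = seasonal_undiff(head, diffed)
    return diffed
```

For a series that is exactly periodic in each seasonality, the residual is flat, so any remaining spike points at an anomaly; note that one anomalous point shows up several times in the residual because each differencing step reuses it.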
57. Rationales for using Geo Time Series
Some features
● Store raw data
● Inner relations: time (and optionally geo)
● Outer relations: group by classname, group by key/value
Some benefits
● Chunkable / Parallelizable
● Easy manipulation
● Easier implementation of analytics
58. WarpScript has over 900 functions
● String Function (32)
● Maths (74)
● Geo Time Series® (145)
● Stack (66)
● Composite Types (52)
● Processing (94)
● Platform (39)
● Logic (10)
● Time Related (26)
● Cryptographic (16)
● Logic Structure & Flow Control (21)
● Constants (9)
● Quaternions (8)
● Mappers (93)
● Reducers (37)
● Bucketizers (23)
● Operations (18)
● Filters (12)
● Conversions (24)
● Geo (19)
63. Shareability / Extensibility
Easily share macros (no installation required)
Retrieve and publish plugins, extensions, macros
Configuration file:
warpfleet.macros.repos = http://MY/MACRO/REPOSITORY
WarpScript:
@my/macro
Command line:
$ wf get --conf my/conf/file group artifact
64. Challenges Data Tools Results
Challenges
• Monitoring large infrastructures (servers, networks, devices, applications, middleware)
• Willingness to rationalize monitoring tools
• Enable advanced analytics and Machine Learning
Data
• Monitoring metrics and events
• Over 500 million Time Series from containers (evanescent series) and physical devices
• Peaks over 50 million datapoints per second
Tools
• Distributed version of Warp 10
• In-Memory Warp 10 instances for caching
• WarpScript for analytics
Results
• Reduced number of technologies used for monitoring
• Ability to perform analytics on millions of series in real time
• Access to large historical datasets (hundreds of trillions of datapoints) for trend analysis and pattern detection
• Dashboarding tools (Grafana) connected to Warp 10™ datasource used by all teams
• Identical analytics skills acquired by all teams
65. Challenges Data Tools Results
Challenges
• Aircraft are equipped with a growing number of sensors
• Need to analyse aircraft data for safety, maintenance and diagnostic purposes for individual aircraft and fleets
• Multiple teams want access to the data
• Growth opportunities in new services based on data analysis
Data
• 1 hour of flight produces 8 Mb to 1 Gb of data depending on aircraft (from 10 M to 3 B datapoints per flight hour)
• Historical dataset for over 300 aircraft for multiple years with projected volumes in the petabyte scale for upcoming fleets
Tools
• Time Series analytics using WarpScript on Spark for batch processing
• Interactive manipulation of intermediate results in Warp 10 standalone
• Data science using the Warp 10 Zeppelin plugin
Results
• Ability to analyze all existing flight data
• Efficient and flexible incident analysis
• Fast data ingestion and processing pipeline, enabling maintenance KPIs to be computed between landing and parking of aircraft
66. Challenges Data Tools Results
Challenges
• Industrial IoT
• 10,000 hours of system validation in Haiti, 2 devices
• Engineers must record 200+ CAN and temperature data
• High-cost, unreliable M2M 3G connection
Data
• 900,712 points per hour (raw data stored on the embedded SSD)
• 90,158 points to upload per hour per device after custom resampling
• CAN and modbus networks
Tools
• Warp 10™ Edge on an iMX-6 with 500 GB industrial SSD
• Distributed version of Warp 10™ for the historical data
• VertX application to manage CAN and modbus
• Local WarpScript code for resampling and remote/local database synchronization
Results
• 130 kB per hour, 100 MB per month data plan only
• SDMO engineers can:
- Do usage statistics
- Compute thermal stress
- Refine their validation plan in real time