f you have any device or source that generates values over time (also a log from a service), you want to determine if in a time frame, the time serie is correct or you can detect some anomalies. What can you do as a developer (not a Data Scientist) with .NET o Azure? Let's see how in this session.
WSO2Con2024 - WSO2's IAM Vision: Identity-Led Digital Transformation
Time Series Anomaly Detection with Azure and .NETT
1. Time Series Anomaly Detection
with Azure and .NET (part 1)
Marco Parenzan // @marco_parenzan
2. Marco Parenzan
• Senion Solution Architect @ beanTech
• 1nn0va Community Lead (Pordenone)
• Microsoft Azure MVP
• Profiles
• Linkedin: https://www.linkedin.com/in/marcoparenzan/
• Slideshare: https://www.slideshare.net/marco.parenzan
• GitHub: https://github.com/marcoparenzan
3. This is the journey of…
• …a .NET developer…
• …or an IoT developer…
• …a one-man band (sometimes )…
• …facing typical data science world topics…
• …that wants to use .NET everywhere!
5. Scenario
• In an industrial fridge, you monitor temperatures to check not the
temperature «per se», but to check the healthy of the plant
From real industrial fridges
7. With no any specific request...
what is IoT all about?
Efficiency Anomalies
Batch Streaming
8. Time Series
• Definition
• Time series is a sequence of data points recorded in time order, often taken at successive
equally paced points in time.
• Examples
• Stock prices, Sales demand, website traffic, daily temperatures, quarterly sales
• Time series is different from regression analysis because of its time-dependent
nature.
• Auto-correlation: Regression analysis requires that there is little or no autocorrelation in the
data. It occurs when the observations are not independent of each other. For example, in
stock prices, the current price is not independent of the previous price. [The observations
have to be dependent on time]
• Seasonality, a characteristic which we will discuss below.
9. Anomaly Detection in Time Series
• In time series data, an anomaly or outlier can be termed as a data
point which is not following the common collective trend or seasonal
or cyclic pattern of the entire data and is significantly distinct from
rest of the data. By significant, most data scientists mean statistical
significance, which in order words, signify that the statistical
properties of the data point is not in alignment with the rest of the
series.
• Anomaly detection has two basic assumptions:
• Anomalies only occur very rarely in the data.
• Their features differ from the normal instances significantly.
10. Threshold anomalies?
• Threshold Anomalies for a time window
• Slow changing damages
• Fridge is no more efficient
• Threshold alarms are not enough
• Anomalies cannot be just «over a threshold
for some time»...
• Condenser or Evaporator with difficulties
starting
• Distinguish from Opening a door (that is
also an anomaly)
• Or also counting the number of times that
there are peaks (too many times)
• You can considering each of these
events as anomalies that alter the
temperature you measure in different
part of the fridge
11. How can we implement processing?
Ingest Process
Storage
Account
Azure
IoT Hub-Related
Services
Devices
Events
?
We explore some of them,
probably the most Microsoft and Azure oriented
17. Spark Unifies:
Batch Processing
Interactive SQL
Real-time processing
Machine Learning
Deep Learning
Graph Processing
An unified, open source, parallel, data processing framework for Big Data Analytics
Spark Core Engine
Spark SQL
Batch processing
Spark Structured
Streaming
Stream processing
Spark MLlib
Machine
Learning
Yarn
Spark MLlib
Machine
Learning
Spark
Streaming
Stream processing
GraphX
Graph
Computation
http://spark.apache.org
Apache Spark
18. Batch vs. Notebooks
• Batch
• Work on slow data stored into a
Datalake
• Submit a complete app in one
single deploy
• Receive the entire output
• Notebook
• «sketching» the code
• Write/delete/rewrite continuously
• Run cell by cell (but also all at
once) interactive
• In a world of Mathematica
19. Jupyter
• Evolution and generalization of the seminal role of Mathematica
• In web standards way
• Web (HTTP+Markdown)
• Python adoption (ipynb)
• Written in Java
• Python has an interop bridge...not native (if ever
important)Python is a kernel for Jupyter
20. Python!
• Simple to start (that why C# is pythonizing…)
• “Open Source”
• TensorFlow, Scikit-learn, Keras, Pandas, PyTorch
• Remember one thing:
• Often behind a Data Science framework there is a native library and Python
binds that library
• Spark is written in Java and there is a bridge for Python to Spark
• Jupyter is written in Java and there is a bridge (kernel) for Python
22. Helping no-data scientits developers (all! )
• Unsupervised Machine LearningNo labelling
• Auto(mated) MLfind the best tuning for you with parameters and algorithms
• Automated Training Set for Anomaly Detection Algorithms
• the algorithms automatically generates a simulated training set based non your input
data https://docs.microsoft.com/en-us/azure/machine-learning/algorithm-cheat-sheet
23. Spectrum Residual Cnn (SrCnn)
• to monitor the time-series continuously and alert for potential incidents on time
• The algorithm first computes the Fourier Transform of the original data. Then it
computes the spectral residual of the log amplitude of the transformed signal
before applying the Inverse Fourier Transform to map the sequence back from
the frequency to the time domain. This sequence is called the saliency map. The
anomaly score is then computed as the relative difference between the saliency
map values and their moving averages. If the score is above a threshold, the value
at a specific timestep is flagged as an outlier.
• There are several parameters for SR algorithm. To obtain a model with good
performance, we suggest to tune windowSize and threshold at first, these are the
most important parameters to SR. Then you could search for an appropriate
judgementWindowSize which is no larger than windowSize. And for the
remaining parameters, you could use the default value directly.
• Time-Series Anomaly Detection Service at Microsof
[https://arxiv.org/pdf/1906.03821.pdf]
25. Data Science and AI for the .NET developer
• ML.NET is first and foremost a framework that you can use
to create your own custom ML models. This custom
approach contrasts with “pre-built AI,” where you use pre-
designed general AI services from the cloud (like many of
the offerings from Azure Cognitive Services). This can work
great for many scenarios, but it might not always fit your
specific business needs due to the nature of the machine
learning problem or to the deployment context (cloud vs.
on-premises).
• ML.NET enables developers to use their existing .NET skills
to easily integrate machine learning into almost any .NET
application. This means that if C# (or F# or VB) is your
programming language of choice, you no longer have to
learn a new programming language, like Python or R, in
order to develop your own ML models and infuse custom
machine learning into your .NET apps.
27. .NET Interactive and Jupyter
and Visual Studio Code
• .NET Interactive gives C# and F# kernels to Jupyter
• .NET Interactive gives all tools to create your hosting application
independently from Jupyter
• In Visual Studio Code, you have two different notebooks (looking similar but
developed in parallel by different teams)
• .NET Interactive Notebook (by the .NET Interactive Team) that can run also Python
• Jupyter Notebook (by the Azure Data Studio Team – probably) that can run also C# and
F#
• There is a little confusion on that
• .NET Interactive has a strong C#/F# Kernel...
• ...a less mature infrastructure (compared to Jupiter)
28. .NET for Apache Spark 1.1.1
• .NET bindings (C# e F#) to Spark
• Written on the Spark interop layer,
designed to provide high
performance bindings to multiple
languages
• Re-use knowledge, skills, code you
have as a .NET developer
• Compliant with .NET Standard
• You can use .NET for Apache Spark
anywhere you write .NET code
• Original project Moebius
• https://github.com/microsoft/Mobius
31. Functions everywhere
Platform
App delivery
OS
On-premises
Code
App Service on Azure Stack
Windows
●●●
Non-Azure hosts
●●●
●●●
+
Azure Functions
host runtime
Azure Functions
Core Tools
Azure Functions
base Docker image
Azure Functions
.NET Docker image
Azure Functions
Node Docker image
●●●
32. Azure Cognitive Services
• Cognitive Services brings AI within reach of every developer—without requiring
machine-learning expertise. All it takes is an API call to embed the ability to see,
hear, speak, search, understand, and accelerate decision-making into your apps.
Enable developers of all skill levels to easily add AI capabilities to their apps.
• Five areas:
• Decision
• Language
• Speech
• Vision
• Web search
Anomaly Detector
Identify potential problems early on.
Content Moderator
Detect potentially offensive or unwanted
content.
Metrics Advisor PREVIEW
Monitor metrics and diagnose issues.
Personalizer
Create rich, personalized experiences for every
user.
33. Azure Synapse Analytics for the Big Data
Limitless analytics service with unmatched time to insight
Platform
Azure
Data Lake Storage
Common Data Model
Enterprise Security
Optimized for Analytics
METASTORE
SECURITY
MANAGEMENT
MONITORING
DATA INTEGRATION
Analytics Runtimes
DEDICATED SERVERLESS
Form Factors
SQL
Languages
Python .NET Java Scala
Experience Synapse Analytics Studio
Artificial Intelligence / Machine Learning / Internet of Things
Intelligent Apps / Business Intelligence
METASTORE
SECURITY
MANAGEMENT
MONITORING
34. What is ADX?
34
• A Telemetry data Search engine => ELK replacement
• A TSDB => OSS LAMBDA (MinIO + Kafka) replacement
• A Tool to Materialize data into ADLS & SQL
• A Tool for monitoring, summarizing information and send notifications
36. First conclusions
• .NET ecosystem in Data Science World is completing
• C# is pythonizing since C# 7.x
• .NET for Spark that runs in Synapse and DataBricks
• .Net Interactive notebooks in Visual Studio Code, Synapse, Cosmos...
• Azure has lot of support for Data Science in .NET and adopt
everything described
37. See you for second part with the complete
journey with more demoes
Part 2: Sept 23rd, 7.20 AM EDT
Time Series Anomaly Detection
with Azure and .NET (part 1)
38. Thank you!
Marco Parenzan
Senior Solution Architect @ beanTech
Microsoft Azure MVP
1nn0va Community Lead
• https://docs.microsoft.com/en-us/azure/cognitive-services/anomaly-detector/
• https://docs.microsoft.com/en-us/dotnet/machine-learning/tutorials/sales-anomaly-detection
• https://github.com/dotnet/interactive
• https://docs.microsoft.com/en-us/dotnet/machine-learning/how-to-guides/serve-model-serverless-azure-functions-ml-net
• https://azure.microsoft.com/en-us/services/cognitive-services/metrics-advisor/
Anomaly detection is the process of identifying unexpected items or events in data sets, which differ from the norm. And anomaly detection is often applied on unlabeled data which is known as unsupervised anomaly detection.
https://towardsdatascience.com/effective-approaches-for-time-series-anomaly-detection-9485b40077f1
The Spectral Residual outlier detector is based on the paper Time-Series Anomaly Detection Service at Microsoft and is suitable for unsupervised online anomaly detection in univariate time series data. The algorithm first computes the Fourier Transform of the original data. Then it computes the spectral residual of the log amplitude of the transformed signal before applying the Inverse Fourier Transform to map the sequence back from the frequency to the time domain. This sequence is called the saliency map. The anomaly score is then computed as the relative difference between the saliency map values and their moving averages. If the score is above a threshold, the value at a specific timestep is flagged as an outlier. For more details, please check out the paper.