Time series anomaly detection using CNN coupled with data augmentation using GANs
1. Time Series Anomaly Detection on Structured Data from an IoT Network Using a CNN, with Synthetic Labelled Data Generated by GANs
2. PROBLEM DESCRIPTION
Standardized data sets have been a crucial factor in the success of ML; however, nothing comparable is available for IIoT domains.
Many, if not most, available data sets are toy-sized, noisy, or unnormalized, and some are proprietary.
Manual analysis of massive amounts of data and the associated metrics is inefficient and practically infeasible.
Manual analysis is also not sustainable: it depends on the knowledge and experience of individual personnel, which leads to inconsistent results and makes the process non-scalable.
3. BRIEF SUMMARY OF THE PROPOSED SOLUTION
• The proposed methodology consists of several components that classify the various types of anomalies that can occur in an IIoT network. The emphasis is on generating synthetic data with a deep generative model (DCGAN) so that the anomaly classification model generalizes well.
4. BRIEF SUMMARY OF THE PROPOSED SOLUTION
Stage 1
- The input dataset is pre-processed.
- Performance metrics are first encoded and then transformed into a novel multi-dimensional (image) representation of the metrics over time, which captures the multi-spatial relationships between them.
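The slides do not spell out the encoding, so the following is only a minimal sketch of one way a window of metrics could be turned into an image-like grid. The function name `metrics_to_image`, the metric names, and the per-metric min-max scaling are all assumptions made for illustration, not the authors' method.

```python
from typing import Dict, List


def metrics_to_image(window: Dict[str, List[float]]) -> List[List[float]]:
    """Stack per-metric series into a 2-D grid: rows are metrics, columns are time steps.

    Hypothetical encoding: each metric is min-max scaled to [0, 1] on its own.
    """
    names = sorted(window)  # fixed row order so that images are comparable
    lengths = {len(window[name]) for name in names}
    if len(lengths) != 1:
        raise ValueError("all metrics must cover the same time steps")

    image = []
    for name in names:
        series = window[name]
        lo, hi = min(series), max(series)
        span = hi - lo
        # A constant series carries no variation, so it maps to all zeros.
        image.append([(v - lo) / span if span else 0.0 for v in series])
    return image


# Toy window with two hypothetical metrics sampled at three time steps.
window = {"cpu": [10.0, 20.0, 30.0], "latency_ms": [5.0, 5.0, 5.0]}
image = metrics_to_image(window)
```

A grid of this shape can be fed to convolutional models, which is what lets both the DCGAN and the CNN classifier in the later stages operate on the same representation.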
Stage 2
- A Deep Convolutional Generative Adversarial Network (DCGAN) model is used to generate synthetic data for each anomaly class, enabled by the image representation proposed in Stage 1.
- We also propose multiple evaluation metrics that assess deep generative models on the diversity of the generated data and its closeness to the target distribution.
- Additionally, we performed sampling saturation checks on the trained generative model without compromising on the evaluation metrics.
Stage 3
- We merge the synthetic data with the real data. The merged data is used to train a classifier model.
- We propose multiple evaluation metrics for the CNN model that quantify the improvement contributed by the synthetic images generated by the deep generative model.
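As a rough illustration of the merge step, the sketch below combines real and synthetic samples into one shuffled training set. The helper name `merge_real_and_synthetic`, the sample layout (flattened image plus class label), and the provenance tag are assumptions added for illustration.

```python
import random
from typing import List, Tuple

Sample = Tuple[List[float], int]  # (flattened image, anomaly class label)


def merge_real_and_synthetic(
    real: List[Sample], synthetic: List[Sample], seed: int = 0
) -> List[Tuple[List[float], int, str]]:
    """Concatenate both sources, tag each sample with its origin, and shuffle."""
    tagged = [(x, y, "real") for x, y in real]
    tagged += [(x, y, "synthetic") for x, y in synthetic]
    # A seeded shuffle keeps the training order reproducible.
    random.Random(seed).shuffle(tagged)
    return tagged


# Toy data: two real and three synthetic samples.
real = [([0.0, 0.1], 0), ([1.0, 0.9], 1)]
synthetic = [([0.1, 0.2], 0), ([0.9, 0.8], 1), ([0.8, 0.9], 1)]
merged = merge_real_and_synthetic(real, synthetic)
```

Keeping the origin tag makes it possible to report classifier performance separately on real and synthetic samples later on.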
5. Model Evaluation Framework
EvalDiversity trains a classifier on DCGAN-generated synthetic data and measures its performance on real data (image representation). This evaluates the diversity and realism of the generated synthetic images.
EvalDistributionAccuracy trains a classifier on real data (image representation) and evaluates it on generated synthetic images. This measures how close the generated data distribution is to the actual data distribution.
EvalMergedModelTestMergedData trains a classifier on a merged data set (real + synthetic data) and evaluates it on merged data. This further confirms the diversity of the images generated by the deep generative model.
EvalMergedModelTestRealData trains a classifier on a merged data set comprising 50% real data and 50% generated data. Evaluation is done only on real data not used for training. This evaluates whether adding generated data improves on a classifier trained on the original data alone.
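The four evaluations differ only in which data the classifier is trained on and which data it is tested on, which a short sketch can make concrete. This is an illustration under stated assumptions: a nearest-centroid classifier stands in for the CNN, the toy samples are invented, and the held-out split of the merged data is an assumption, since the slides do not say how the merged test set is formed.

```python
from typing import Dict, List, Tuple

Sample = Tuple[List[float], int]  # (flattened image, anomaly class label)


def train_centroids(data: List[Sample]) -> Dict[int, List[float]]:
    """Stand-in for CNN training: compute the mean vector of each class."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for x, y in data:
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(centroids: Dict[int, List[float]], x: List[float]) -> int:
    """Return the class whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(x, centroids[y])),
    )


def accuracy(train: List[Sample], test: List[Sample]) -> float:
    model = train_centroids(train)
    return sum(predict(model, x) == y for x, y in test) / len(test)


def run_evaluations(real_train, real_test, syn_train, syn_test) -> Dict[str, float]:
    # 50/50 merge: take equally many real and synthetic training samples.
    n = min(len(real_train), len(syn_train))
    merged_train = real_train[:n] + syn_train[:n]
    merged_test = real_test + syn_test  # assumed held-out merged split
    return {
        "EvalDiversity": accuracy(syn_train, real_test),
        "EvalDistributionAccuracy": accuracy(real_train, syn_test),
        "EvalMergedModelTestMergedData": accuracy(merged_train, merged_test),
        "EvalMergedModelTestRealData": accuracy(merged_train, real_test),
    }


# Invented toy data: class 0 clusters near (0, 0), class 1 near (1, 1).
real_train = [([0.0, 0.1], 0), ([0.1, 0.0], 0), ([1.0, 0.9], 1), ([0.9, 1.0], 1)]
real_test = [([0.05, 0.05], 0), ([0.95, 0.95], 1)]
syn_train = [([0.2, 0.1], 0), ([0.1, 0.2], 0), ([0.8, 0.9], 1), ([0.9, 0.8], 1)]
syn_test = [([0.15, 0.15], 0), ([0.85, 0.85], 1)]

scores = run_evaluations(real_train, real_test, syn_train, syn_test)
```

To judge whether the synthetic data helps, the EvalMergedModelTestRealData score would be compared against `accuracy(real_train, real_test)`, a real-only baseline evaluated on the same held-out real data.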