Anomaly Detection
Advanced Intelligence Technology Research Society (고등 지능 기술 연구회)
김철(ki4420@gmail.com)
2016-07-09
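What is anomaly detection?
Samples that deviate from the mainstream of the data
In data mining, anomaly detection means identifying items, events, or observations that do not conform to an expected pattern or normal range.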
outlier
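What is anomaly detection? (cont.)
The minimum and maximum are not necessarily outliers (Min:Max ≠ Outlier)
1.5×IQR rule: values below Q1 − 1.5×IQR or above Q3 + 1.5×IQR are treated as outliers
IQR (Interquartile Range) = Q3 − Q1
(Figure labels: Max, Min)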
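The 1.5×IQR rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `iqr_outliers`, the sample values, and the choice of the "inclusive" quartile method are all assumptions (different quartile conventions give slightly different fences).

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # quantiles() sorts the data itself; n=4 yields Q1, the median, and Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical samples: the maximum of the first list is an outlier,
# while the maximum of the second list is not.
with_outlier = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
without_outlier = iqr_outliers(list(range(1, 11)))
```

The second call illustrates the point of the slide: a data set always has a minimum and a maximum, but neither has to be an outlier.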
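What is anomaly detection? (cont.)
Anomalous values are typically interpreted as a symptom of a problem
Rare events that do not fit the usual statistical definition
What is anomaly detection? (cont.)
A clustering algorithm can detect the micro-clusters formed by anomalous patterns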
History
Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986.
Early approaches relied on thresholds for normal behavior, statistical preprocessing, soft computing, and inductive learning.
History (cont.)
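Applications
Cyber intrusion detection, credit card fraud detection, fault detection, system health monitoring, IoT, etc.
Detecting ecosystem disturbances
Often used to remove anomalous values from data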
Three categories
1. Unsupervised anomaly detection
- Detects anomalies in unlabeled data
- Example: anomaly detection with the K-means clustering algorithm
2. Supervised anomaly detection
- Both normal and abnormal labels exist
- Uses classification models (SVM, Random forests, Logistic, Robust, KNN, etc.)
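For the unsupervised case, one common way to use K-means is to score each point by its distance to the nearest cluster centroid. The sketch below assumes that scoring rule; the function names, the fixed iteration count, the hand-picked initial centroids, and the sample points are all hypothetical and not taken from the slides.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain K-means with caller-supplied initial centroids (an assumption)."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda c: math.dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Move each centroid to the mean of its cluster; keep it if the cluster is empty.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids

def anomaly_scores(points, centroids):
    """Score = distance to the nearest centroid; larger means more anomalous."""
    return [min(math.dist(p, c) for c in centroids) for p in points]

# Hypothetical data: two tight groups plus one stray point at index 8.
points = [(1, 1), (1, 2), (2, 1), (2, 2),
          (8, 8), (8, 9), (9, 8), (9, 9),
          (5, 15)]
centroids = kmeans(points, centroids=[(1, 1), (8, 8)])
scores = anomaly_scores(points, centroids)
most_anomalous = scores.index(max(scores))
```

Because no labels are used, the score still needs a threshold or a ranking cut-off before it can be turned into a normal/anomaly decision.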
Three categories (cont.)
3. Semi-supervised anomaly detection
- Only normal labels exist; anomalies are extracted by comparing the likelihood produced by a model of normal behavior
- NKIA’s LRSTSD based Anomaly Detection
- Twitter’s Seasonal Hybrid ESD (S-H-ESD) based Anomaly Detection
(Figures: NKIA’s Anomaly Detection; Twitter’s Anomaly Detection)
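The semi-supervised idea (fit a model on normal data only, then compare likelihoods) can be illustrated with a single Gaussian. This is a deliberately simple stand-in, not the NKIA or Twitter method: the training values, the density threshold of 0.01, and the function name are assumptions made for the example.

```python
from statistics import NormalDist

# Hypothetical training data that is known to be normal.
normal_train = [9.8, 10.1, 10.0, 9.9, 10.2, 10.0]

# Model of normal behavior: a Gaussian fitted to the normal-only data.
normal_model = NormalDist.from_samples(normal_train)

def is_anomaly(x, model, min_density=0.01):
    """Flag x when its likelihood under the normal model is below the threshold."""
    return model.pdf(x) < min_density

# Hypothetical new observations: one typical value, one far from the training data.
flags = [is_anomaly(x, normal_model) for x in (10.05, 14.0)]
```

In practice the threshold would be tuned on held-out data rather than fixed by hand.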
Input data
Univariate / Multivariate
Input data (cont.)
Data types
- Binary
- Categorical
- Continuous
- Hybrid
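Types of anomalies
Point Anomalies
- Values that fall outside the bulk of the data set
Types of anomalies (cont.)
Contextual Anomalies
- Values that are out of place in their context
- Requires a notion of context
- Also referred to as conditional anomalies (Rules)
Types of anomalies (cont.)
Collective Anomalies
- A collection of related values that is anomalous as a group, even if each value alone looks normal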
Output of Anomaly Detection
Label
- Label of normal or anomaly
- In the classification approach: true|false or a class
Score
- Rank
- Range 0 to 1
- Requires a threshold parameter
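The two output styles are related: a score becomes a label once a threshold is chosen, and it can also be used directly for ranking. A minimal sketch, with hypothetical scores and a hypothetical threshold of 0.6:

```python
def label_scores(scores, threshold):
    """Turn anomaly scores in [0, 1] into labels using a threshold."""
    return ["anomaly" if s >= threshold else "normal" for s in scores]

def rank_by_score(scores):
    """Return sample indices ordered from most to least anomalous."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

scores = [0.10, 0.95, 0.40, 0.70]   # hypothetical detector output
labels = label_scores(scores, threshold=0.6)
ranking = rank_by_score(scores)
```

Lowering the threshold catches more anomalies at the cost of more false alarms, which is the trade-off the evaluation measures in the next slides quantify.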
Evaluating anomaly detection
F-Measure
- Used for supervised learning, to evaluate classification problems
- Formula:
Recall(R) = TP / (TP + FN)
Precision(P) = TP / (TP + FP)
F-measure = 2*R*P/(R+P)
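The formulas above translate directly into code. The counts used here (8 true positives, 2 false positives, 4 false negatives) are made up for illustration.

```python
def f_measure(tp, fp, fn):
    """Compute recall, precision, and their harmonic mean (F-measure)."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return 2 * recall * precision / (recall + precision)

# Hypothetical counts from a detector's confusion matrix.
score = f_measure(tp=8, fp=2, fn=4)
```

Note that true negatives do not appear anywhere in the F-measure, which is why it remains informative when normal samples vastly outnumber anomalies.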
The Area Under an ROC Curve
- AUC (Area Under the Curve)
- Plots the Detection Rate (TP rate) against the False Alarm Rate (FP rate)
- Range 0 to 1
- Equation:
Confusion matrix (two-way cross table), with anomaly as the positive class:
                     Actual anomaly   Actual normal
Predicted anomaly    TP               FP
Predicted normal     FN               TN
Evaluation table (grading of AUC values):
Score        Label
.90 ~ 1      Excellent (A)
.80 ~ .90    Good (B)
.70 ~ .80    Fair (C)
.60 ~ .70    Poor (D)
.50 ~ .60    Fail (F)
(Figure: ROC (Receiver Operating Characteristic) curves)
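AUC = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} \mathbf{1}[p_i > p_j]
m = number of actual anomalies (positives), n = number of actual normal samples (negatives), p_i = score assigned to the i-th anomaly, p_j = score assigned to the j-th normal sample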
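The AUC can be read as the probability that a randomly chosen anomaly receives a higher score than a randomly chosen normal sample. The sketch below computes it by comparing every anomaly/normal pair; counting ties as one half is a common convention assumed here, and the scores are hypothetical.

```python
def auc(anomaly_scores, normal_scores):
    """Fraction of (anomaly, normal) pairs in which the anomaly scores higher."""
    wins = 0.0
    for a in anomaly_scores:
        for b in normal_scores:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5   # ties count as half (assumed convention)
    return wins / (len(anomaly_scores) * len(normal_scores))

# Hypothetical detector scores for labeled samples.
value = auc(anomaly_scores=[0.9, 0.8, 0.4], normal_scores=[0.5, 0.3, 0.2, 0.1])
```

Here 11 of the 12 pairs are ordered correctly, so the result falls in the "Excellent" band of the grading table.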
Taxonomy*
Well-known anomaly detection techniques
Twitter’s Anomaly Detection R pack.
Twitter open-sourced their R package for anomaly
detection.
They call their algorithm Seasonal Hybrid ESD (S-H-ESD), which is built on Generalized ESD.
Sometimes anomalies can mess up your modeling.
Twitter’s Anomaly Detection R pack.(cont.)
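# Install devtools and the plotting dependencies, then the package from GitHub
install.packages(c("devtools", "gtable", "scales"))
devtools::install_github("twitter/AnomalyDetection")
library(AnomalyDetection)
# raw_data is the example data set bundled with the package
data(raw_data)
# Flag at most 2% of the points, in both directions, and return a plot
res = AnomalyDetectionTs(raw_data, max_anoms=0.02,
direction='both', plot=TRUE)
res$plot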
Twitter’s Anomaly Detection R pack.(cont.)
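# Load the CPU series from a CSV file (local path from the original slides)
v <- read.csv("D:/r/tsd_paper/cpu_5m_02.csv")
# AnomalyDetectionVec takes observations without timestamps, so the seasonal period is given explicitly
res2 = AnomalyDetectionVec(v, max_anoms=0.02, period=72,
direction='both', plot=TRUE)
res2$plot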
Twitter’s Anomaly Detection R pack.(cont.)
Usage
AnomalyDetectionTs(x, max_anoms = 0.1, direction = "pos", alpha = 0.05, only_last = NULL, threshold = "None", e_value = FALSE, longterm = FALSE, piecewise_median_period_weeks = 2, plot = FALSE, y_log = FALSE, xlabel = "", ylabel = "count", title = NULL, verbose = FALSE)
Arguments
x : Time series as a two-column data frame where the first column consists of the timestamps and the second column consists of the observations.
max_anoms : Maximum number of anomalies that S-H-ESD will detect as a percentage of the data.
direction : Directionality of the anomalies to be detected. Options are: 'pos' | 'neg' | 'both'.
alpha : The level of statistical significance with which to accept or reject anomalies.
only_last : Find and report anomalies only within the last day or hr in the time series. NULL | 'day' | 'hr'.
threshold : Only report positive going anoms above the threshold specified. Options are: 'None' | 'med_max' | 'p95' | 'p99'.
e_value : Add an additional column to the anoms output containing the expected value.
longterm : Increase anom detection efficacy for time series that are greater than a month. See Details below.
piecewise_median_period_weeks : The piecewise median time window as described in Vallis, Hochenbaum, and Kejariwal (2014).
Defaults to 2.
Twitter’s Anomaly Detection R pack.(cont.)
Arguments(cont.)
plot : A flag indicating if a plot with both the time series and the estimated anoms, indicated by circles, should also be returned.
y_log : Apply log scaling to the y-axis. This helps with viewing plots that have extremely large positive anomalies relative to the
rest of the data.
xlabel : X-axis label to be added to the output plot.
ylabel : Y-axis label to be added to the output plot.
title : Title for the output plot.
verbose : Enable debug messages
Twitter’s Anomaly Detection R pack.(cont.)
To understand how Twitter’s algorithm works, you need to know:
- Student t-distribution
- Extreme Studentized Deviate (ESD) test
- Generalized ESD
- Linear regression
- LOESS
- STL(Seasonal Trend LOESS)
Twitter’s Anomaly Detection R pack.(cont.)
Student t-distribution
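A distribution mainly used when estimating the mean of a normally distributed population
(Figure: probability density function (PDF) of the t-distribution)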
Twitter’s Anomaly Detection R pack.(cont.)
Extreme Studentized Deviate (ESD) test
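As a rough illustration of the ESD (Grubbs) test, the sketch below computes only the test statistic: the largest absolute deviation from the sample mean, divided by the sample standard deviation. Deciding whether the point is an outlier also requires a critical value derived from the Student t-distribution, which is left out here because the Python standard library has no t quantile function. The function name and data are hypothetical.

```python
from statistics import mean, stdev

def esd_statistic(values):
    """Return (index, G) where G = max |x_i - mean| / s for the most extreme point."""
    m = mean(values)
    s = stdev(values)   # sample standard deviation (n - 1 in the denominator)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - m))
    return idx, abs(values[idx] - m) / s

# Hypothetical measurements with one suspicious value at the end.
idx, g = esd_statistic([2.0, 2.1, 1.9, 2.0, 2.2, 1.8, 5.0])
```

The statistic would then be compared with a tabulated or computed critical value for the chosen significance level.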
Twitter’s Anomaly Detection R pack.(cont.)
Generalized ESD
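For reference, the generalized ESD test as it is usually stated (for example in the NIST/SEMATECH e-Handbook) checks for up to k outliers among n observations. The notation below is the textbook one and is not taken from the slides.

```latex
% Test statistic for step i, computed after removing the i-1 most extreme points
R_i = \frac{\max_j \lvert x_j - \bar{x} \rvert}{s}, \qquad i = 1, \dots, k

% Critical value for step i, using the t-distribution with n-i-1 degrees of freedom
\lambda_i = \frac{(n-i)\, t_{p,\,n-i-1}}
                 {\sqrt{\left(n-i-1+t_{p,\,n-i-1}^{2}\right)(n-i+1)}},
\qquad p = 1 - \frac{\alpha}{2(n-i+1)}
```

The number of outliers is the largest i for which R_i > \lambda_i, so the test does not need the exact outlier count in advance, only an upper bound k.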
Twitter’s Anomaly Detection R pack.(cont.)
Seasonality(linear regression, LOESS, STL)
The generalized ESD works when you have a set of points from a normal distribution,
but real data has some seasonality. This is where STL comes in. It decomposes the data
into a season part, a trend and whatever’s left over using local regression (LOESS), which
fits a low order polynomial to a subset of the data and stitches them together by
weighting them. Since you can remove the trend and seasonal part with loess, you
should be left with something that is more or less normally distributed. You can apply
generalized ESD on what’s left over to detect anomalies.
#STL: “Seasonal and Trend decomposition using Loess”
(Figures: seasonality; local regression (LOESS); polynomial regression)
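The decompose-then-test idea can be sketched without any external library. This is a crude stand-in and not STL or the package's algorithm: the seasonal component is estimated as the median of each phase of the period, and the residuals are screened with a robust z-score (median and MAD) instead of a generalized ESD test. The function names, the threshold of 3.5, and the synthetic series are all assumptions.

```python
from statistics import median

def seasonal_residuals(series, period):
    """Subtract a per-phase median as a crude estimate of level plus seasonality."""
    phase_medians = [median(series[p::period]) for p in range(period)]
    return [x - phase_medians[i % period] for i, x in enumerate(series)]

def robust_anomalies(residuals, threshold=3.5):
    """Flag residuals whose distance from the median exceeds threshold * MAD."""
    center = median(residuals)
    mad = median([abs(r - center) for r in residuals])
    if mad == 0:
        return []   # no spread at all: nothing can be judged anomalous
    return [i for i, r in enumerate(residuals) if abs(r - center) / mad > threshold]

# Synthetic series: a repeating 4-step pattern with small wiggles and one spike.
pattern = [10, 20, 30, 20]
series = [pattern[i % 4] + (i % 3) for i in range(24)]
series[9] += 40   # injected anomaly

flagged = robust_anomalies(seasonal_residuals(series, period=4))
```

Without removing the seasonal pattern first, ordinary peaks of the pattern would dominate the spread and make the spike harder to single out.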
Twitter: Introducing practical and robust
anomaly detection in a time series
Global/Local
At Twitter, we observe distinct seasonal patterns in most of the time series.
Global: global anomalies typically extend above or below expected seasonality and are
therefore not subject to seasonality and underlying trend
Local: anomalies which occur inside seasonal patterns, are masked and thus are much
more difficult to detect in a robust fashion.
Positive/Negative
Positive: e.g. a surge of tweets during the Super Bowl (used for capacity planning for events)
Negative: e.g. a drop in queries per second (QPS), which can reveal potential hardware or data collection issues
Subspace- and correlation-based outlier
detection for high-dimensional data.
Reduce dimensionality using principal component analysis (PCA) or factor analysis
Detect anomalies by computing the contrast of subspaces
Subspace- and correlation-based outlier
detection for high-dimensional data.(cont.)
HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
RNN(Replicator neural networks)
Reproduces the input pattern at the output while minimizing the error
Builds a model of normal behavior and uses it to extract anomalous values
A schematic view of a fully connected
Replicator Neural Network.
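OF_i = \frac{1}{n} \sum_{j=1}^{n} (x_{ij} - o_{ij})^2
OF_i = Anomaly Factor score of the i-th record
n = number of features
x_{ij} = observed value of column j for the i-th record
o_{ij} = value of column j for the i-th record as reproduced by the RNN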
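Given a record and its reconstruction by the replicator network, the anomaly factor is the mean squared reconstruction error over the n features. The sketch below assumes the reconstruction is already available (training the network is out of scope), and the numbers are invented.

```python
def outlier_factor(observed, reconstructed):
    """Mean squared reconstruction error over the features of one record."""
    n = len(observed)
    return sum((x - o) ** 2 for x, o in zip(observed, reconstructed)) / n

# Hypothetical record with 3 features and its reconstruction by the network.
of_i = outlier_factor(observed=[1.0, 0.0, 0.5], reconstructed=[0.9, 0.1, 0.5])
```

Records that the network reproduces poorly get a large score, on the assumption that the network has mostly learned the patterns of normal data.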
LOF(Local Outlier Factor)
Density-based anomaly detection by KNN
Provides a score, which makes results easy to interpret, but comes with some delay
Unsupervised anomaly detection
Basic idea of LOF: comparing the local density of a point with the densities of its neighbors. A has a much lower
density than its neighbors
LOF(Local Outlier Factor)(cont.)
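Formula:
reachability-distance_k(A, B) = \max\{ k\text{-distance}(B),\ d(A, B) \}
lrd_k(A) = 1 \Big/ \left( \frac{\sum_{B \in N_k(A)} \text{reachability-distance}_k(A, B)}{|N_k(A)|} \right)
LOF_k(A) = \frac{\sum_{B \in N_k(A)} lrd_k(B)}{|N_k(A)| \cdot lrd_k(A)}
(Figure: Illustration of the reachability distance. Objects B and C have the same reachability distance (k=3), while D is not a k nearest neighbor.)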
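A compact, brute-force version of LOF can be written with the standard library. It is a sketch under simplifying assumptions: distance ties at the k-th neighbor are ignored (exactly k neighbors are used), duplicate points are not handled (they would cause a division by zero), and the sample points are hypothetical.

```python
import math

def lof_scores(points, k):
    """Local Outlier Factor for each point, using exactly k nearest neighbors."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    # Indices of the k nearest neighbors of each point, excluding the point itself.
    neighbors = [
        sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])[:k]
        for i in range(n)
    ]
    k_distance = [dist[i][neighbors[i][-1]] for i in range(n)]

    def reach_dist(a, b):
        # Reachability distance of a from b: never smaller than b's k-distance.
        return max(k_distance[b], dist[a][b])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [k / sum(reach_dist(i, j) for j in neighbors[i]) for i in range(n)]
    # LOF: average density of the neighbors relative to the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (k * lrd[i]) for i in range(n)]

# Hypothetical data: a unit square of four points and one distant point.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
scores = lof_scores(points, k=2)
```

Scores close to 1 mean a point is about as dense as its neighbors, while values well above 1 indicate a point in a sparser region than its neighbors.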
LOF(Local Outlier Factor)(cont.)
LOF scores as visualized by ELKI. While the upper right cluster has a
comparable density to the outliers close to the bottom left cluster, they
are detected correctly.
LOF(Local Outlier Factor)(cont.)
LOF scores of CPU utilization vs. time, computed with Rlof
LRSTSD(Log regression seasonality based
approach of time series decomposition)
Anomaly score formula:
Anomaly score
(Figures: 1-day network traffic (Tx); 7-day network traffic (Tx))
E_i = i-th error
A_i = i-th observed value
U_i = i-th predicted upper bound
L_i = i-th predicted lower bound
P = overall value (Parameter)
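Conclusion
Anomaly detection can remove noise when building predictive models
→ Expected to improve prediction accuracy
Detects false detections and collection failures in the data
→ Enables appropriate responses such as resampling and correction
Analyzes the relationship between observed anomalies and problems
→ Can be used for early detection of problems
→ Failure prediction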
Generative AI for Social Good at Open Data Science East 2024
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
Statistics, Data Analysis, and Decision Modeling, 5th edition by James R. Eva...
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGILLMs, LMMs, their Improvement Suggestions and the Path towards AGI
LLMs, LMMs, their Improvement Suggestions and the Path towards AGI
 
Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...Student Profile Sample report on improving academic performance by uniting gr...
Student Profile Sample report on improving academic performance by uniting gr...
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
科罗拉多大学波尔得分校毕业证学位证成绩单-可办理
 
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 

Anomaly detection

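  • 1. Anomaly Detection (이상 감지). Advanced Intelligence Technology Research Society (고등 지능 기술 연구회). 김철 (ki4420@gmail.com), 2016-07-09
  • 2. What is anomaly detection? An anomaly (outlier) is a sample that deviates from the mainstream of the data. In data mining, anomaly detection means identifying items, events, or observations that do not conform to an expected pattern or to the normal range.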
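  • 3. What is anomaly detection? (cont.) Min:Max ≠ Outlier: the minimum and maximum of a data set are not necessarily outliers. 1.5×IQR rule: with IQR (interquartile range) = Q3 − Q1, values below Q1 − 1.5×IQR or above Q3 + 1.5×IQR are treated as outliers. [Figure: box plot with Min and Max marked]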
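The 1.5×IQR rule on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `iqr_outliers`, the sample data, and the choice of the "inclusive" quartile method are assumptions, and other quartile conventions can move the fences slightly.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical sample: the minimum (10) is not an outlier, the maximum (102) is.
data = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
print(iqr_outliers(data))  # [102]
```

In this sample the minimum stays inside the fences while the maximum falls outside, which is the point of "Min:Max ≠ Outlier".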
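  • 4. What is anomaly detection? (cont.) An anomalous value is typically interpreted as a symptom of a problem. It is a rare phenomenon that does not conform to the usual statistical definition.
  • 5. What is anomaly detection? (cont.) A clustering algorithm can detect the micro-clusters formed by anomalous patterns.
  • 6. History. Anomaly detection was proposed for intrusion detection systems (IDS) by Dorothy Denning in 1986. Early approaches relied on normal thresholds, preprocessing of statistics, soft computing, and inductive learning.
  • 8. Applications. Cyber intrusion detection, credit card fraud, fault detection, system health monitoring, IoT, etc. Detecting ecosystem disturbances. Often used to remove anomalous values from data.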
  • 9. Three categories
      1. Unsupervised anomaly detection
         - Detects anomalies in unlabeled data
         - Example: anomaly detection with the K-means clustering algorithm
      2. Supervised anomaly detection
         - Both normal and abnormal labels exist
         - Uses classification models (SVM, Random forests, Logistic, Robust, KNN, etc.)
  • 10. Three categories (cont.)
      3. Semi-supervised anomaly detection
         - Only normal labels exist; anomalies are extracted by comparing observations against the likelihood produced by a model of normal behavior
         - NKIA’s LRSTSD based Anomaly Detection
         - Twitter’s Seasonal Hybrid ESD (S-H-ESD) based Anomaly Detection
      [Figures: NKIA’s Anomaly Detection; Twitter’s Anomaly Detection]
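  • 12. Input data (cont.) Data structure: Binary, Categorical, Continuous, Hybrid
  • 13. Types of anomalies. Point anomalies: values that fall outside the bulk of the data set.
  • 14. Types of anomalies (cont.) Contextual anomalies: values that are out of place within their context. They require a notion of context. Also referred to as conditional anomalies (rules).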
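  • 15. Types of anomalies (cont.) Collective anomalies: a group of related data instances that is anomalous as a whole with respect to the rest of the data set, even when the individual values are not.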
  • 16. Output of anomaly detection
      Label
        - A label of normal or anomaly
        - In a classification approach: true|false or a class
      Score
        - Rank
        - Range 0 to 1
        - Requires a threshold parameter
  • 17. Evaluating anomaly detection
      F-measure
        - Evaluates supervised learning and classification problems
        - Formula:
            Recall (R) = TP / (TP + FN)
            Precision (P) = TP / (TP + FP)
            F-measure = 2*R*P / (R + P)
      Area under the ROC curve
        - AUC (Area Under the Curve)
        - Based on the detection rate (TP rate) and the false alarm rate (FP rate)
        - Range 0 to 1
        - Equation: [shown as an image on the slide], where m = # of TP, n = # of TN, p_i = TP rate (detection rate), p_j = FP rate (false alarm rate)
      Crosstable (confusion matrix)
                                Actual class
                                Normal    Anomaly
        Predicted    Normal     TP        FP
        class        Anomaly    FN        TN
      Grading table
        Score          Label
        .90 ~ 1        Excellent (A)
        .80 ~ .90      Good (B)
        .70 ~ .80      Fair (C)
        .60 ~ .70      Poor (D)
        .50 ~ .60      Fail (F)
      [Figure: ROC (Receiver Operating Characteristic) curves]
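As a rough illustration of the evaluation measures on this slide, the sketch below computes recall, precision, and the F-measure from confusion-matrix counts. It also estimates AUC with the standard pairwise-ranking interpretation: the fraction of (positive, negative) score pairs in which the positive item scores higher. The slide's own AUC equation is not reproduced here, so the ranking form, the function names, and the sample numbers are all assumptions. Denominators are assumed to be non-zero.

```python
def precision_recall_f(tp, fp, fn):
    """Recall, precision and F-measure from confusion-matrix counts."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


def auc_by_ranking(positive_scores, negative_scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    wins = 0.0
    for p in positive_scores:
        for q in negative_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Hypothetical counts and scores, for illustration only.
r, p, f = precision_recall_f(tp=8, fp=2, fn=2)
print(round(r, 3), round(p, 3), round(f, 3))
print(round(auc_by_ranking([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]), 3))
```

The pairwise loop costs O(m·n) comparisons, which is fine for small evaluation sets. Sorting-based implementations are the usual choice for larger ones.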
  • 20. Twitter’s Anomaly Detection R package. Twitter open-sourced their R package for anomaly detection. They call their algorithm Seasonal Hybrid ESD (S-H-ESD), which is built on Generalized ESD. Anomalies can distort the models you fit to the data.
  • 21. Twitter’s Anomaly Detection R package (cont.)
      # Install the package from GitHub, plus plotting dependencies
      install.packages("devtools")
      devtools::install_github("twitter/AnomalyDetection")
      library(AnomalyDetection)
      install.packages("gtable")
      install.packages("scales")
      # Load the example time series bundled with the package
      data(raw_data)
      res = AnomalyDetectionTs(raw_data, max_anoms=0.02, direction='both', plot=TRUE)
      res$plot
  • 22. Twitter’s Anomaly Detection R package (cont.)
      # Read a series of observations without timestamps
      v <- read.csv("D:/r/tsd_paper/cpu_5m_02.csv")
      # period: number of observations in one seasonal cycle
      res2 = AnomalyDetectionVec(v, max_anoms=0.02, period=72, direction='both', plot=TRUE)
      res2$plot
  • 23. Twitter’s Anomaly Detection R package (cont.)
      Usage
        AnomalyDetectionTs(x, max_anoms = 0.1, direction = "pos", alpha = 0.05,
                           only_last = NULL, threshold = "None", e_value = FALSE,
                           longterm = FALSE, piecewise_median_period_weeks = 2,
                           plot = FALSE, y_log = FALSE, xlabel = "", ylabel = "count",
                           title = NULL, verbose = FALSE)
      Arguments
        - x : Time series as a two column data frame where the first column consists of the timestamps and the second column consists of the observations.
        - max_anoms : Maximum number of anomalies that S-H-ESD will detect as a percentage of the data.
        - direction : Directionality of the anomalies to be detected. Options are: 'pos' | 'neg' | 'both'.
        - alpha : The level of statistical significance with which to accept or reject anomalies.
        - only_last : Find and report anomalies only within the last day or hr in the time series. NULL | 'day' | 'hr'.
        - threshold : Only report positive going anoms above the threshold specified. Options are: 'None' | 'med_max' | 'p95' | 'p99'.
        - e_value : Add an additional column to the anoms output containing the expected value.
        - longterm : Increase anom detection efficacy for time series that are greater than a month. See Details below.
        - piecewise_median_period_weeks : The piecewise median time window as described in Vallis, Hochenbaum, and Kejariwal (2014). Defaults to 2.
  • 24. Twitter’s Anomaly Detection R package (cont.)
      Arguments (cont.)
        - plot : A flag indicating if a plot with both the time series and the estimated anoms, indicated by circles, should also be returned.
        - y_log : Apply log scaling to the y-axis. This helps with viewing plots that have extremely large positive anomalies relative to the rest of the data.
        - xlabel : X-axis label to be added to the output plot.
        - ylabel : Y-axis label to be added to the output plot.
        - title : Title for the output plot.
        - verbose : Enable debug messages.
  • 25. Twitter’s Anomaly Detection R package (cont.) To understand how Twitter’s algorithm works, you need to know:
        - Student t-distribution
        - Extreme Studentized Deviate (ESD) test
        - Generalized ESD
        - Linear regression
        - LOESS
        - STL (Seasonal Trend LOESS)
  • 26. Twitter’s Anomaly Detection R package (cont.) Student t-distribution: a distribution mainly used when estimating the mean of a normal distribution. [Figure: PDF of the t-distribution]
  • 27. Twitter’s Anomaly Detection R package (cont.) Extreme Studentized Deviate (ESD) test
  • 28. Twitter’s Anomaly Detection R package (cont.) Generalized ESD
  • 29. Twitter’s Anomaly Detection R package (cont.) Seasonality (linear regression, LOESS, STL). The generalized ESD works when you have a set of points from a normal distribution, but real data has some seasonality. This is where STL ("Seasonal and Trend decomposition using Loess") comes in. It decomposes the data into a seasonal part, a trend, and whatever is left over, using local regression (LOESS), which fits a low-order polynomial to subsets of the data and stitches the fits together by weighting them. Once the trend and seasonal parts are removed, what remains should be more or less normally distributed, and you can apply generalized ESD to that remainder to detect anomalies. [Figures: Seasonality; Local regression (LOESS); Polynomial regression]
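The idea of "remove the seasonal part, then test what is left" can be sketched with the standard library alone. This is a deliberately simplified stand-in, not STL and not Twitter's S-H-ESD. The seasonal component is estimated as the per-phase median, no trend is removed, and the leftover is screened with a robust z-score based on the median absolute deviation (MAD) instead of the generalized ESD test. The function name, threshold, and sample series are assumptions.

```python
import statistics


def seasonal_residual_anomalies(series, period, threshold=3.0):
    """Indices whose de-seasonalized residual has a robust z-score above threshold."""
    # Seasonal component: median of all observations that share the same phase.
    seasonal = [statistics.median(series[phase::period]) for phase in range(period)]
    residuals = [x - seasonal[i % period] for i, x in enumerate(series)]

    # Robust location and scale of the residuals (median and MAD).
    center = statistics.median(residuals)
    mad = statistics.median([abs(r - center) for r in residuals])
    if mad == 0:
        # Degenerate case: over half the residuals are identical, so the
        # robust scale is undefined; this sketch reports nothing.
        return []
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    return [i for i, r in enumerate(residuals) if abs(r - center) / scale > threshold]


# Hypothetical series: a repeating 4-step pattern with small jitter and one spike.
base = [10, 20, 30, 20]
series = [base[i % 4] + (i % 3) - 1 for i in range(24)]
series[13] += 25
print(seasonal_residual_anomalies(series, period=4))  # [13]
```

Medians are used for both the seasonal estimate and the scale so that the spike itself does not inflate the threshold, in the same spirit as the robust statistics S-H-ESD relies on.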
  • 30. Twitter: Introducing practical and robust anomaly detection in a time series
      Global/Local: At Twitter, we observe distinct seasonal patterns in most of the time series.
        - Global: global anomalies typically extend above or below expected seasonality and are therefore not subject to seasonality and underlying trend.
        - Local: anomalies that occur inside seasonal patterns are masked and thus are much more difficult to detect in a robust fashion.
      Positive/Negative
        - Positive: e.g. a surge in Tweets during the Super Bowl (used for capacity planning for events).
        - Negative: e.g. a drop in queries per second (QPS); helps discover potential hardware or data collection issues.
  • 31. Subspace- and correlation-based outlier detection for high-dimensional data. Reduce dimensionality using principal component analysis (PCA) or factor analysis. Detect anomalies by computing the contrast of subspaces.
  • 32. Subspace- and correlation-based outlier detection for high-dimensional data (cont.) HiCS: High Contrast Subspaces for Density-Based Outlier Ranking
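  • 33. RNN (Replicator neural networks). A method that reproduces the input pattern while minimizing the reconstruction error. A model of normal behavior is built and used to extract anomalous values. [Figure: a schematic view of a fully connected Replicator Neural Network]
      Outlier factor: OF_i = (1/n) * sum over j = 1..n of (x_ij - o_ij)^2
        OF_i = anomaly factor score of the i-th record
        n = # of features
        x_ij = observed value of column j for the i-th record
        o_ij = value of column j for the i-th record as reproduced by the RNN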
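The anomaly factor score on this slide can be computed as below, assuming the usual definition from the replicator-network paper by Hawkins et al.: the mean squared difference between a record and its reconstruction. The network itself is not implemented here. The reconstructed values are hypothetical inputs, and the function name is an assumption.

```python
def outlier_factor(observed, reconstructed):
    """Mean squared reconstruction error of one record (OF_i)."""
    if len(observed) != len(reconstructed):
        raise ValueError("observed and reconstructed must have the same length")
    n = len(observed)
    return sum((x - o) ** 2 for x, o in zip(observed, reconstructed)) / n


# Hypothetical records: the second is reproduced poorly, so it scores higher.
records = [[1.0, 0.0, 0.5], [0.9, 0.8, 0.1]]
reconstructions = [[0.9, 0.1, 0.5], [0.2, 0.1, 0.6]]
scores = [outlier_factor(x, o) for x, o in zip(records, reconstructions)]
print(scores)
```

Records that the trained network reproduces poorly receive a high score, and ranking by this score gives the anomaly ordering.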
  • 34. LOF (Local Outlier Factor). Density-based anomaly detection using k-nearest neighbors (KNN). It provides a score, which makes results easy to interpret, but it incurs some delay. Unsupervised anomaly detection. Basic idea of LOF: comparing the local density of a point with the densities of its neighbors. [Figure: A has a much lower density than its neighbors]
  • 35. LOF (Local Outlier Factor) (cont.) Formula: [shown as an image on the slide]. [Figure: illustration of the reachability distance. Objects B and C have the same reachability distance (k=3), while D is not a k-nearest neighbor]
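Since the LOF formula itself is only an image on the slide, here is a small pure-Python sketch of the standard definition: reachability distance, local reachability density (lrd), and the ratio of the neighbors' densities to the point's own. It is a brute-force O(n²) illustration with hypothetical names and data, not the Rlof or ELKI implementation. It assumes no point has k or more exact duplicates, which would make a density infinite.

```python
import math


def lof_scores(points, k):
    """Local Outlier Factor of every point, using brute-force neighbor search."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    k_distance = []  # distance from each point to its k-th nearest neighbor
    neighbors = []   # indices within that distance (ties included)
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        kd = dist[i][order[k - 1]]
        k_distance.append(kd)
        neighbors.append([j for j in order if dist[i][j] <= kd])

    def reach_dist(a, b):
        # Reachability distance of a from b: never smaller than b's k-distance.
        return max(k_distance[b], dist[a][b])

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [len(neighbors[i]) / sum(reach_dist(i, j) for j in neighbors[i])
           for i in range(n)]

    # LOF: average neighbor density divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i])
            for i in range(n)]


# Hypothetical data: four points on a unit square and one far away.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
for point, score in zip(points, lof_scores(points, k=2)):
    print(point, round(score, 2))
```

Scores near 1 mean a point is about as dense as its neighbors. The isolated point gets a score well above 1, which is how LOF marks it as an outlier.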
  • 36. LOF(Local Outlier Factor)(cont.) LOF scores as visualized by ELKI. While the upper right cluster has a comparable density to the outliers close to the bottom left cluster, they are detected correctly.
  • 37. LOF (Local Outlier Factor) (cont.) [Figure: LOF scores of CPU utilization vs. time, computed with Rlof]
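  • 38. LRSTSD (Log regression seasonality based approach of time series decomposition)
      Anomaly score formula: [shown as an image on the slide]
        E_i = i-th error
        A_i = i-th observed value
        U_i = i-th predicted upper bound
        L_i = i-th predicted lower bound
        P = overall value (parameter)
      [Figures: anomaly score; 1-day network traffic (Tx); 7-day network traffic (Tx)]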
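  • 39. Conclusion
        - Anomaly detection can remove noise when building a predictive model → prediction accuracy is expected to improve
        - It detects false readings and collection failures in the data → appropriate handling such as resampling or correction becomes possible
        - Analyzing the association between observed anomalies and problems → usable as an early-detection technique for problems → failure prediction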
  • 40. References
        - https://en.wikipedia.org/wiki/Anomaly_detection
        - http://datascience.stackexchange.com/questions/2313/machine-learning-where-is-the-difference-between-one-class-binary-class-and-m
        - https://en.wikipedia.org/wiki/Outlier#Detection
        - https://www.semanticscholar.org/paper/Outlier-Detection-Using-Replicator-Neural-Networks-Hawkins-He/87a09c777dcecab4883e328669ef2af1ba8dd7be
        - http://neuro.bstu.by/ai/To-dom/My_research/Papers-0/For-research/D-mining/Anomaly-D/KDD-cup-99/NN/dawak02.pdf
        - http://slideplayer.com/slide/4194183/
        - http://link.springer.com/chapter/10.1007%2F978-981-10-0281-6_118#page-1
        - https://cran.r-project.org/web/packages/Rlof/index.html
        - https://warrenmar.wordpress.com/tag/seasonal-hybrid-esd/
        - https://ko.wikipedia.org/wiki/%EC%8A%A4%ED%8A%9C%EB%8D%98%ED%8A%B8_t_%EB%B6%84%ED%8F%AC
        - https://en.wikipedia.org/wiki/Soft_computing
        - https://www.google.com/trends/explore#q=anomaly%2C%20%2Fm%2F02vnd10%2C%20%2Fm%2F0bs2j8q&cmpt=q&tz=Etc%2FGMT-9
        - http://www.slideserve.com/sidonie/data-mining-for-anomaly-detection
        - http://www.physics.csbsju.edu/stats/box2.html
        - http://study.com/academy/lesson/maximums-minimums-outliers-in-a-data-set-lesson-quiz.html
        - http://www.sfu.ca/~jackd/Stat203/Wk02_1_Full.pdf
        - http://slideplayer.com/slide/6321088/
        - http://gim.unmc.edu/dxtests/roc3.htm
        - http://www.cs.ru.nl/~tomh/onderwijs/dm/dm_files/roc_auc.pdf
        - http://togaware.com/papers/dawak02.pdf
        - https://en.wikipedia.org/wiki/Grubbs%27_test_for_outliers
        - https://github.com/twitter/AnomalyDetection
        - https://blog.twitter.com/2015/introducing-practical-and-robust-anomaly-detection-in-a-time-series
