2. Who am I?
Development Experience
◆ Image Recognition using Neural Network
◆ Bio-Medical Data Processing
◆ Human Brain Mapping on High Performance Computing
◆ Medical Image Reconstruction (Computed Tomography)
◆ Enterprise System
◆ Open Source Software Developer
Open Source Software Developer
◆ Linux Kernel & LLVM
◆ OPNFV (NFV&SDN) & OpenStack
◆ Machine Learning (TensorFlow)
Book
◆ Unix V6 Kernel
Korea Open Source Software Lab.
Mario Cho
hephaex@gmail.com
3. Problem Motivation
• As in other learning problems, we start from a dataset of examples.
• Goal: decide whether a given example is abnormal (anomalous) or not.
• Define a "model" p(x)
- that tells us the probability that the example is not anomalous.
- Also use a threshold (epsilon) as a dividing line
- so we can say which examples are anomalous and which are not.
5. Example of Anomaly detection
• Aircraft engine features:
• Features
- x1 = heat generated
- x2 = vibration intensity
6. Example of Anomaly detection
• Density estimation
• Dataset: { x(1), x(2), x(3), ..., x(m) }
• Is “New engine: xtest” anomalous?
For the model p(x):
p(xtest) < ε → flag as anomaly
p(xtest) >= ε → not an anomaly (normal)
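The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slides do not say how p(x) is built, so the sketch assumes an independent Gaussian per feature; the engine readings, the threshold value EPSILON, and all function names are made up for the example.

```python
import math

def fit_gaussian(values):
    """Maximum-likelihood mean and variance of one feature."""
    m = len(values)
    mu = sum(values) / m
    var = sum((v - mu) ** 2 for v in values) / m
    return mu, var

def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def fit_model(dataset):
    """Estimate (mu, var) for each feature of the dataset {x(1), ..., x(m)}."""
    n_features = len(dataset[0])
    return [fit_gaussian([x[j] for x in dataset]) for j in range(n_features)]

def p(x, params):
    """Assumed model: product of independent per-feature Gaussian densities."""
    prob = 1.0
    for xj, (mu, var) in zip(x, params):
        prob *= gaussian_pdf(xj, mu, var)
    return prob

def is_anomalous(x_test, params, epsilon):
    """Flag an anomaly when p(x_test) falls below the threshold epsilon."""
    return p(x_test, params) < epsilon

# Hypothetical engine readings: (x1 = heat generated, x2 = vibration intensity)
engines = [(10.0, 5.0), (11.0, 5.5), (9.5, 4.8),
           (10.5, 5.2), (10.2, 4.9), (9.8, 5.1)]
params = fit_model(engines)
EPSILON = 1e-3  # assumed threshold, chosen only for this example

print(is_anomalous((10.1, 5.0), params, EPSILON))  # engine close to the others
print(is_anomalous((14.0, 9.0), params, EPSILON))  # engine far from the others
```

In practice the threshold would be tuned on data rather than fixed by hand; the fixed value here only keeps the sketch short.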
9. Anomaly detection example
• Fraud detection
• x(i) = features of user i’s activities
• Model p(x) from data
• Identify unusual users by checking which have p(x) < ε
• Manufacturing
• x(i) = features of process i
• Model p(x) from measured data
• Identify unusual products by checking which have p(x) < ε
• Monitoring computers in a data center
• x(i) = features of machine i
• x1 = memory use
• x2 = number of disk accesses / sec
• x3 = CPU load
• Identify unusual status by checking which machines have p(x) < ε
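For the data center case, the same check can be run over a whole fleet at once. The sketch below is an assumption-laden illustration, not the method from the slides: it assumes independent Gaussian features, works in log space so that multiplying many small densities does not underflow, and uses invented machine names, parameters (MUS, VARIANCES), and threshold (EPSILON).

```python
import math

def log_p(x, mus, variances):
    """Log of the product of per-feature Gaussian densities (assumed model)."""
    total = 0.0
    for xj, mu, var in zip(x, mus, variances):
        total += -0.5 * math.log(2 * math.pi * var) - (xj - mu) ** 2 / (2 * var)
    return total

def unusual_machines(readings, mus, variances, epsilon):
    """Return the names of machines whose p(x) is below epsilon."""
    log_eps = math.log(epsilon)  # compare in log space: log p(x) < log epsilon
    return [name for name, x in readings.items()
            if log_p(x, mus, variances) < log_eps]

# Hypothetical parameters, imagined as already estimated from normal machines.
# Feature order: x1 = memory use (GB), x2 = disk accesses / sec, x3 = CPU load
MUS = (8.0, 200.0, 0.5)
VARIANCES = (1.0, 400.0, 0.01)
EPSILON = 1e-6  # assumed threshold

readings = {
    "node-a": (8.2, 210.0, 0.55),
    "node-b": (7.5, 190.0, 0.45),
    "node-c": (8.1, 205.0, 0.99),  # CPU load far from the usual 0.5
}

print(unusual_machines(readings, MUS, VARIANCES, EPSILON))
```

Only the machine with the out-of-range CPU load is reported; its memory and disk readings alone would not have triggered the flag.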
19. Anomaly detection vs. Supervised learning
• Anomaly detection
- Very small number of positive examples
- Positive (y = 1): 0~20
- Negative (y = 0): large
- Many different “types” of anomalies.
- Hard to learn from similar examples
- Future anomalies may look nothing like any of the anomalous examples we’ve seen so far.
• Supervised learning
- Positive and negative examples are both numerous
- Positive (y = 1): large
- Negative (y = 0): large
- Enough positive examples for the algorithm to get a sense of what positive examples are like
- Many different “types” of anomalies.
- Easy to learn from similar examples
- Future positive examples are likely to be similar to the ones in the training set
22. Error analysis for anomaly detection
• Want
• p(x) large for normal examples x.
• p(x) small for anomalous examples x.
• Most common problem:
• p(x) is comparable (say, both large) for normal and anomalous examples.
23. Monitoring computers in a data center
• Choose features that might take on unusually large or small values in the event of an anomaly
• x(i) = features of machine i
• x1 = memory use
• x2 = number of disk accesses / sec
• x3 = CPU load
• x4 = network traffic
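The effect of feature choice on the error-analysis problem can be illustrated with a small sketch. Everything here is assumed for illustration: independent Gaussian features, made-up (mean, variance) pairs, and two invented machines. With memory use alone, p(x) is nearly the same for both machines; once network traffic is added, the machine with an unusually small value gets a much smaller p(x).

```python
import math

def gaussian_pdf(x, mu, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def p(x, params):
    """Assumed model: product of independent per-feature Gaussian densities."""
    prob = 1.0
    for xj, (mu, var) in zip(x, params):
        prob *= gaussian_pdf(xj, mu, var)
    return prob

# Hypothetical (mean, variance) per feature
MEMORY = (8.0, 1.0)        # x1 = memory use
NETWORK = (100.0, 100.0)   # x4 = network traffic

# Hypothetical machines: the second one has almost no network traffic
normal = {"memory": 8.1, "network": 105.0}
anomalous = {"memory": 7.8, "network": 20.0}

# Model using memory use only: the two densities are comparable
p1_normal = p([normal["memory"]], [MEMORY])
p1_anomalous = p([anomalous["memory"]], [MEMORY])

# Model using memory use and network traffic: the densities separate
p2_normal = p([normal["memory"], normal["network"]], [MEMORY, NETWORK])
p2_anomalous = p([anomalous["memory"], anomalous["network"]], [MEMORY, NETWORK])

print("memory only:      ", p1_normal, p1_anomalous)
print("memory + network: ", p2_normal, p2_anomalous)
```

The point of the sketch is the comparison between the two printed lines, not the specific numbers, which depend entirely on the assumed parameters.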