Andrew B. Gardner
agardner@momentics.com
http://linkd.in/1byADxC
Anomalies
Data Science Fairy Tale
Topics in Anomaly Detection
Seizure Detection Example
Summary

anomaly something that deviates from what is standard, normal, or expected
data cleansing
3-5% mislabeled ground truth in MNIST database
[Figure: sample handwritten digits from the MNIST database]

stock price
Volkswagen (VOW.DE) short squeeze, 10/28/2008
transactions

video surveillance

email
Date: Sat, 12 Aug 2012 14:39:59 UTC
From: "Iglobal" <tryme@yourdomain.com>
To: "Mr. Foo1" <foo1@freemail.com>
Subject: Foo1, Please Confirm Your Position!
Hi Foo1,
Welcome To The $7 Plan. I Bring in 3 to 5
New Members In Every Day, I can show you
how easily. Its to much Fun.
Solution #1 It costs too much every month.
Not with the $7 Plan! The TOTAL cost is $7
per month. The $7.00 Plan is still holding
your position and we have people that are
waiting to place under you. That's right only

Credit Card Fraud
 Campaign Response





Traffic
Persons of Interest




Spam
Intrusion / Malware
counterfeit
healthcare
condition
seizures


Many names



One key (counter-intuitive) idea:
focus on the hay…

… not the needle




Machine learning (ooh)
Unsupervised*
Classification*
User
Device
Sensors

Signals
(Data)




Alerts | intervention
Online | batch

Features

Outputs

Detector
(Classifier)
Nothing is more expensive than a missed opportunity.
– H. Jackson Brown, Jr.







Advantages
Data haystacks .01%
Unusual = interesting
Models $$$
Labels $$$
…
Disadvantages?
We sell healthy, green apples!



Bob ... knows apples

common (n=13)

rare (n=1)


Bob “The 8th Dwarf”
8 Dwarf Orchards, Inc.

… sells healthy apples



… studies data science



… does “Big Apple Data”
Goal: label instances
(green vs. red)

watercore

greens


green = +1

red = -1

Feature Space

Labels



mass density (g/cm3)





reds

Training

zi

Inputs
xi

zi

yi
f : X → Y
Test Examples

watercore

Test Examples – Results

Confusion Matrix (columns are true classes)

            Green (G)    not-green (NG)
Label G     13 (TP)      4 (FP)
Label NG    1 (FN)       1 (TN)

mass density (g/cm3)
Key idea: trade-off mislabeling each class (P vs. N)

Confusion matrix (columns are true classes)

            Green (G)    not-green (NG)
Label G     13 (TP)      4 (FP)
Label NG    1 (FN)       1 (TN)
            P            N

Sensitivity: TPR = TP / (TP+FN) = 13/14
errors on the “positive” class, Green.

Specificity: SPC = TN / (FP+TN) = 1/5
errors on the “negative” class, not-green.

False Positive Rate: FPR = FP / (FP+TN) = 4/5 = 1 - SPC
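
A minimal sketch of how the two statistics fall out of the four confusion-matrix counts in the apple example. The function name `rates` is illustrative only; the false positive rate is computed over the true negatives' column, FP / (FP+TN), so it equals 1 - specificity.

```python
from fractions import Fraction

# Counts taken from the apple example's confusion matrix.
TP, FP, FN, TN = 13, 4, 1, 1

def rates(tp, fp, fn, tn):
    """Return (sensitivity, specificity, false positive rate) as exact fractions."""
    sensitivity = Fraction(tp, tp + fn)  # share of true greens labeled green
    specificity = Fraction(tn, fp + tn)  # share of true not-greens labeled not-green
    fpr = Fraction(fp, fp + tn)          # share of true not-greens labeled green
    return sensitivity, specificity, fpr

sens, spec, fpr = rates(TP, FP, FN, TN)
print(sens, spec, fpr)  # 13/14 1/5 4/5
```

Labeling every apple green would push sensitivity to 1 and specificity to 0, which is why neither number is meaningful alone.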
Idea: distance to “average” example
centroid based anomaly detection

examples
 centroid
 threshold
 anomaly

watercore




mass density (g/cm3)

false positive
anomaly score
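
A minimal sketch of the centroid idea, under stated assumptions: the feature values below are made up, the anomaly score is plain Euclidean distance to the mean of the training examples, and the threshold rule (1.5 times the largest training distance) is an arbitrary illustrative choice, not something the slides specify.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def anomaly_score(point, center):
    """Euclidean distance from a point to the centroid."""
    return math.dist(point, center)

# Hypothetical healthy-apple examples: (mass density in g/cm3, watercore score).
normal = [(0.80, 0.10), (0.82, 0.12), (0.79, 0.09), (0.81, 0.11), (0.83, 0.10)]
center = centroid(normal)

# Assumed threshold: a margin beyond the farthest training example.
threshold = 1.5 * max(anomaly_score(p, center) for p in normal)

def is_anomaly(point):
    return anomaly_score(point, center) > threshold

print(is_anomaly((0.81, 0.10)))  # close to the centroid
print(is_anomaly((0.60, 0.50)))  # far from the centroid
```

The threshold is exactly the kind of "magic number" the slide warns about, and unscaled Euclidean distance makes the result depend on the units of each feature.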
Trait          classic    anomaly
Sensitivity    .928       1.00
Specificity    .200       .833

Feature dependent?
Require labels?
Magic numbers?

Performance
Goal: find densest regions in feature space

Standard deviation



mass density (g/cm3)

Tukey statistic (IQR)



watercore



Mahalanobis distance
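
The three region-of-support rules named here can be sketched in plain Python. This is an illustration under assumptions, not the talk's code: the cutoffs k=3 (standard deviation) and k=1.5 (Tukey fences) are conventional defaults, the sample data is invented, and the Mahalanobis helper handles only the 2-D case so the covariance inverse can be written out by hand.

```python
import math
import statistics

def zscore_outliers(values, k=3.0):
    """Flag values more than k sample standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [v for v in values if abs(v - mu) > k * sigma]

def tukey_outliers(values, k=1.5):
    """Flag values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return [v for v in values if v < q1 - k * iqr or v > q3 + k * iqr]

def mahalanobis_2d(point, data):
    """Mahalanobis distance of a 2-D point from the mean of 2-D data."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in data) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in data) / (n - 1)
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mx, point[1] - my
    # d^2 = [dx dy] S^-1 [dx dy]^T with the 2x2 inverse expanded.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.sqrt(d2)

# Hypothetical mass densities (g/cm3) with one unusual apple.
densities = [0.80, 0.81, 0.79, 0.82, 0.80, 0.81, 0.78,
             0.83, 0.80, 0.81, 0.79, 0.82, 0.80, 0.95]
print(tukey_outliers(densities))   # [0.95]
print(zscore_outliers(densities))

# Hypothetical correlated 2-D features.
pairs = [(1, 1), (2, 2.1), (3, 2.9), (4, 4.2), (5, 4.8)]
along = mahalanobis_2d((4, 4), pairs)   # follows the correlation
across = mahalanobis_2d((4, 2), pairs)  # cuts against it
print(along < across)
```

Both test points sit at the same Euclidean distance from the mean, yet the Mahalanobis distance separates them because it accounts for the elliptical spread of the data.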
Goal: find densest regions in feature space

Flexible



Density based



Robust



watercore



Tunable

mass density (g/cm3)

How? the one-class support vector machine
Goal: find densest regions in feature space







“Flood” graph

Pick fraction, e.g. 0.5

Mark waterlines

Note support


The One-class Support Vector Machine Does This
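
A hedged sketch of fitting a one-class SVM, assuming scikit-learn and NumPy are available (the talk itself does not name a library). The training data is synthetic and standardized, and the settings `nu=0.1` and `gamma="scale"` are illustrative; `nu` plays roughly the role of the "fraction" on the slide, bounding the share of training points left outside the learned region.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)

# Synthetic "normal" examples: one dense cluster in a 2-D feature space.
X_train = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Train on normal data only; no labels are needed.
model = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(X_train)

X_new = np.array([[0.0, 0.0],   # inside the dense region
                  [6.0, 6.0]])  # far away from it
labels = model.predict(X_new)            # +1 = inlier, -1 = outlier
scores = model.decision_function(X_new)  # negative = outside the boundary
print(labels, scores)
```

The training points that end up with nonzero weight are the support vectors, which correspond to the "note support" step of the flood picture.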



Outlier impact
Rich data
 Graphs

 Spatio-temporal
 Text

Use labels
 Online / latency
 Features
 Clustering & alternatives


You Are Here
APPROACHES

SAMPLE METHODS

Statistical methods
 Distance based methods
 Rule systems
 Profiling Methods
 Model based approaches












Kernel methods
PCA & subspace methods
OCNM & OCSVM
CUSUM
Nearest neighbors
Decision trees
Replicator Neural Networks
Clustering

V. Chandola, A. Banerjee and V. Kumar, “Anomaly Detection: A Survey.” (2009)


Problem: Detect seizures in patients from IEEG

Solution: Use one-class SVM to train on 15 minutes of baseline

Performance: Improve state-of-the-art latency (5 secs) to -13 secs, auto channel selection, unsupervised technique, …

Reference: “One-Class Novelty Detection for Seizure Analysis from Intracranial EEG,” Journal of Machine Learning Research ’06








Neurological disorder
Electrographic seizures
1% of population
30% non-controllable
EEG, IEEG, MRI, fMRI, PET, etc.
Cyberonics, Neuropace, NeuroVista,…
an “obvious” electrographic seizure

9 minutes
Traditional Model
Brain Electrical Activity

Novelty Model
Brain Electrical Activity

baseline

baseline

pre-seizure

seizure

other
(e.g., seizures, artifacts,
etc.)
Idea: Capture Spectral Changes

Sliding Windows

Spectrum
frequency

EEG

time



Teager Energy



Curve Length



Short-Term Energy
slide & compute
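
A sketch of the "slide & compute" step, under assumptions: the slides name the three features but not their formulas, so the definitions below are common textbook forms (sum of absolute differences, mean squared amplitude, and the mean of the discrete Teager operator x[n]^2 - x[n-1]*x[n+1]). The sampling rate, window width, step, and test signal are all hypothetical.

```python
import math

def curve_length(w):
    """Sum of absolute sample-to-sample differences in the window."""
    return sum(abs(b - a) for a, b in zip(w, w[1:]))

def short_term_energy(w):
    """Mean squared amplitude in the window."""
    return sum(v * v for v in w) / len(w)

def teager_energy(w):
    """Mean of x[n]^2 - x[n-1]*x[n+1] over the window's interior samples."""
    vals = [w[i] ** 2 - w[i - 1] * w[i + 1] for i in range(1, len(w) - 1)]
    return sum(vals) / len(vals)

def sliding_features(signal, width, step):
    """Return one (curve length, energy, Teager energy) tuple per window."""
    feats = []
    for start in range(0, len(signal) - width + 1, step):
        w = signal[start:start + width]
        feats.append((curve_length(w), short_term_energy(w), teager_energy(w)))
    return feats

fs = 200  # hypothetical sampling rate in Hz
quiet = [math.sin(2 * math.pi * 5 * t / fs) for t in range(fs)]       # 1 s, low amplitude
burst = [4 * math.sin(2 * math.pi * 20 * t / fs) for t in range(fs)]  # 1 s, faster and larger
features = sliding_features(quiet + burst, width=100, step=50)

print(len(features))
print(features[0])
print(features[-1])
```

All three features rise when the signal becomes larger or faster, which is the kind of distribution shift a novelty detector trained on baseline windows can pick up.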
[Figure: Baseline IEEG (top) and P(seizure) (bottom) vs. time (minutes)]

[Figure: Ictal IEEG (top) and P(seizure) (bottom) vs. time (minutes)]

Nothing is more expensive than a missed opportunity.
– H. Jackson Brown, Jr.

Advantages






Data haystacks .01%
Unusual = interesting
Models $$$
Labels $$$
…

Challenges







Features FTW
Normal = ?
Deviation = ?
False positives
Adaptation
…



Questions?
Connect!

Andrew B. Gardner
agardner@momentics.com
http://linkd.in/1byADxC


Introduction to Anomaly Detection - Data Science ATL Meetup Presentation, 07-31-2013

Editor's Notes

  1. (1:00) Thank organizers & attendees. My background thesis. Invitation to connect.
  2. (1:00) Anomaly detection is intuitive. Requires a context. Requires a measure.
  3. (0:45) MNIST database of handwritten digits. Longstanding story about accuracy of the data set. Volkswagen share price from 210EUR -> 1005EUR. Porsche disclosed holdings, including options that intended to acquire the underlying in. This was going to deplete the float, which caused a run by short sellers. (http://www.risk.net/risk-magazine/feature/1498381/the-volkswagen-squeeze) Anomalies focus our attention.
  4. (0:45) Anomalies have intrinsic value: business, social and scientific value. Transactions, like insurance, purchases, returns, etc., looking for unusual good and bad behavior. Canonical example is credit card fraud, for instance my recent “purchase” of wine in Spain. Video surveillance, directly examining people, vehicles, and scenes for gait, position, counts, etc. to determine unusual traffic, intent, direction. Email – canonical example is the spam scam. Anomalous to me individually by content, sender, etc. Anomalous to recipients of an ISP because of the number of spread. Malware – anomalous mailings by me.
  5. (0:45) Often overlooked. Two axes. Expensive to acquire examples. Expensive to miss anomalies. Currency – secret service tv episode. Conditions – life safety, services, etc. Seizures.
  6. Anomalies everywhere. Changing perspective.
  7. Machine learning makes it happen. Ideal vs. real system. Alerts bc of intervention cost. Online is rare. Workflow is similar.
  8. Data growth. Unusual events. Expensive to model. Labeled examples are rare, expensive. Prioritized focus.
  9. Meet Bob. Red apples are “poison” so build a healthy (green) apple detector.
  10. RFA request for apples. Count all combinations of “what I said it was” x “what it actually was” -> confusion matrix. Note the unforeseen apple examples: rotten, yellow, etc. These unanticipated counter-examples are one reason why traditional classification “breaks”.
  11. Confusion matrices are … confusing. Reduce to two statistics (sens, spec). FPR is related to spec. Sens: how well do we do on green apples. Spec: how well do we do on the others. Example: can build a perfect green apple detector by labeling all apples green. That’s highly sensitive, but not specific.
  12. Watercore is a real produce feature! This works pretty well for some problems, but there are issues as we will see…
  13. Tukey = nonparametric, spherical region of support. Stddev = parametric, spherical region of support. Mahalanobis = elliptical, generalization of stddev, tighter bounds but more expensive to compute. In practice, Mahalanobis performs nicely.
  14. Ideal case: find statistically significant “islands”. Curiously, outliers distort this task. The one-class SVM is the canonical, golden algorithm to achieve this. Oracle Data Mining implements one-class SVM. There are better variants, now, like OCNM.
  15. Outlier pruning before modeling can help. Rich data has representation challenges. How do you encode feature vectors? What is an anomaly? How do you define normal? Semisupervised technique: do anomaly detection + use labels for classifying. If online system, concerned with latency. Features matter, even more so for anomaly detection. Clustering is an alternative and related problem. Many other related problems. Maybe worth considering.
  16. Good survey paper. They create a taxonomy of techniques. Examples of AD techniques listed. Note familiar methods: lots of ML algorithms can be reworked as anomaly detection. Strategies: Find a technique that works for your data. Map your data so it works with your favorite technique. Invent your own technique.
  17. When non-controllable, looking at: surgical brain resection (gold standard); implantable device (experimental); alternative.
  18. Real 20-min ictal EEG. Seizures not so obvious in raw time series form.
  19. We pick simple but robust features from the speech and signal processing literature. Time series almost never useful in raw form. Use sliding window approaches. How to pick window width? What about multiscale phenomena?
  20. Interictal (baseline) features vs. ictal (seizure). Notice that feature distributions shift during seizure = anomaly.
  21. Data growth. Unusual events. Expensive to model. Labeled examples are rare, expensive. Prioritized focus.