SlideShare une entreprise Scribd logo
1  sur  34
Télécharger pour lire hors ligne
DETECTING ANOMALIES IN STREAMING DATA
Data By The Bay
May 19, 2016
Subutai Ahmad
@SubutaiAhmad
sahmad@numenta.com
OUTLINE
• Real-time streaming analytics
• Anomaly detection with Hierarchical Temporal Memory
• Benchmarking real-time anomaly detection
• Summary
Monitoring
IT infrastructure
Uncovering
fraudulent
transactions
Tracking
vehicles
Real-time
health
monitoring
Monitoring
energy
consumption
Detection is necessary, but prevention is often the goal
REAL-TIME ANOMALY DETECTION
•  Exponential growth in IoT, sensors and real-time data collection is driving an
explosion of streaming data
•  The biggest application for machine learning is anomaly detection
EXAMPLE: PREVENTIVE MAINTENANCE
EXAMPLE: PREVENTIVE MAINTENANCE
Planned
shutdown
Behavioral change
preceding failure
Catastrophic
failure
THE STREAMING ANALYTICS PROBLEM
Given all past input and current
input, decide whether the system
behavior is anomalous right now.
Must report decision, perform
any retraining, bookkeeping,
etc. before next input arrives.
No look-ahead
No training/test set split – everything must be done online
System must be automated, and customized to each stream
HIERARCHICAL TEMPORAL MEMORY (HTM)
• Powerful sequence memory derived
from recent findings in experimental
neuroscience
• High capacity memory based system
• Models temporal sequences in data
• Inherently streaming
• Continuously learning and predicting
• No need to tune hyper-parameters
• Open source: github.com/numenta
HTM PREDICTS FUTURE INPUT
• Input to the system is a stream of data
• Encoded into a sparse high dimensional vector
• Learns temporal sequences in input stream and makes a prediction
in the form of a sparse vector
•  represents a prediction for upcoming input
HTM
ANOMALY DETECTION WITH HTM
HTM
Raw anomaly
score
Anomaly
likelihood
is an instantaneous measure of
prediction error
•  0 if input was perfectly prediction
•  1 if it was completely unpredicted
•  Could threshold it directly to report
anomalies, but in very noisy
environments we can do better
ANOMALY LIKELIHOOD
• Second order measure: did the predictability of the metric change?
1.  Estimate historical distribution of anomaly scores
2.  Check if recent scores are very different
ANOMALY LIKELIHOOD
• Second order measure: did the predictability of the metric change?
1.  Estimate historical distribution of anomaly scores
2.  Check if recent scores are very different
ANOMALY DETECTION WITH HTM
HTM
Raw anomaly
score
Anomaly
likelihood
Learns temporal sequences
Continuously makes predictions
Continuously learning
Was current input
predicted?
Has level of
predictability changed
significantly?
ANOMALIES IN IT INFRASTRUCTURE
• Grok
•  Commercial server based product detects anomalies in IT infrastructure
•  Runs thousands of HTM anomaly detectors in real time
•  10 milliseconds per input per metric, including continuous learning
•  No parameter tuning required
•  http://grokstream.com
ANOMALIES IN FINANCIAL DATA
• HTM for Stocks
•  Real-time free demo application
•  Continuously monitors top 200 stocks
•  Available on iOS App Store or Google Play Store
•  Open source application: github.com/numenta/numenta-apps
OUTLINE
• Real-time streaming analytics
• Anomaly detection with Hierarchical Temporal Memory
• Benchmarking real-time anomaly detection
• Summary
EVALUATING STREAMING ANOMALY DETECTION
•  Most existing benchmarks are designed for batch data, not
streaming data
•  Hard to find benchmarks containing real world data labeled with
anomalies
•  There is a need for an open benchmark designed to test real-time
anomaly detection
•  A standard community benchmark could spur innovation in
streaming anomaly detection algorithms
NUMENTA ANOMALY BENCHMARK (NAB)
•  NAB: a rigorous benchmark for anomaly
detection in streaming applications
NUMENTA ANOMALY BENCHMARK (NAB)
•  NAB: a rigorous benchmark for anomaly
detection in streaming applications
•  Real-world benchmark data set
•  58 labeled data streams
(47 real-world, 11 artificial streams)
•  Total of 365,551 data points
NUMENTA ANOMALY BENCHMARK (NAB)
•  NAB: a rigorous benchmark for anomaly
detection in streaming applications
•  Real-world benchmark data set
•  58 labeled data streams
(47 real-world, 11 artificial streams)
•  Total of 365,551 data points
•  Scoring mechanism
•  Rewards early detection
•  Different “application profiles”
NUMENTA ANOMALY BENCHMARK (NAB)
•  NAB: a rigorous benchmark for anomaly
detection in streaming applications
•  Real-world benchmark data set
•  58 labeled data streams
(47 real-world, 11 artificial streams)
•  Total of 365,551 data points
•  Scoring mechanism
•  Rewards early detection
•  Different “application profiles”
•  Open resource
•  AGPL repository contains data, source code,
and documentation
•  github.com/numenta/NAB
•  Ongoing competition to expand NAB
EXAMPLE: HOURLY SERVICE DEMAND
Spike in demand
Unusually low demand
EXAMPLE: PRODUCTION SERVER CPU
Spiking behavior becomes the new norm
Spike anomaly
HOW SHOULD WE SCORE ANOMALIES?
•  The perfect detector
•  Detects anomalies as soon as possible
•  Provides detections in real time
•  Triggers no false alarms
•  Requires no parameter tuning
•  Automatically adapts to changing statistics
•  Scoring methods in traditional benchmarks are insufficient
•  Precision/recall does not incorporate importance of early detection
•  Artificial separation into training and test sets does not handle continuous learning
•  Batch data files allow look ahead and multiple passes through the data
WHERE IS THE ANOMALY?
NAB DEFINES ANOMALY WINDOWS
NAB scoring function gives higher score to earlier detections in window
OTHER DETAILS
•  Application profiles
•  Three application profiles assign different weightings based on the tradeoff between
false positives and false negatives.
•  EKG data on a cardiac patient favors False Positives.
•  IT / DevOps professionals hate False Positives.
•  Three application profiles: standard, favor low false positives, favor low false negatives.
•  NAB emulates practical real-time scenarios
•  Look ahead not allowed for algorithms. Detections must be made on the fly.
•  No separation between training and test files. Invoke model, start streaming, and go.
•  No batch parameter tuning. Must be fully automated with single set of parameters
across data streams. Any further parameter tuning must be done on the fly.
TESTING ALGORITHMS WITH NAB
•  NAB is designed to easily plug in and test new algorithms
•  Results with several algorithms:
•  Hierarchical Temporal Memory
•  Etsy Skyline
•  Popular open source anomaly detection technique
•  Mixture of statistical experts, continuously learning
•  Twitter ADVec
•  Open source anomaly detection released last year
•  Robust outlier statistics + piecewise approximation
•  Bayesian Online Change Point Detection
•  Formal Bayesian method for detecting anomalies in time series
NAB V1.0 RESULTS (58 FILES)
DETECTION RESULTS: CPU USAGE ON
PRODUCTION SERVER
Simple spike, all 3
algorithms detect
Shift in usage
Etsy
Skyline
Numenta
HTM
Twitter
ADVec
Red denotes
False Positive
Key
DETECTION RESULTS: MACHINE
TEMPERATURE READINGS
HTM detects purely
temporal anomaly
Etsy
Skyline
Numenta
HTM
Twitter
ADVec
Red denotes
False Positive
Key
All 3 detect
catastrophic failure
DETECTION RESULTS: TEMPORAL CHANGES IN
BEHAVIOR OFTEN PRECEDE A LARGER SHIFT
HTM detects anomaly 3
hours earlier
Etsy
Skyline
Numenta
HTM
Twitter
ADVec
Red denotes
False Positive
Key
NAB COMPETITION!!
•  NAB is a resource for the streaming analytics community
•  Need additional real-world data files and more algorithms tested
•  NAB Competition offers cash prizes for:
•  Additional anomaly detection algorithms tested on NAB
•  Submission of real-world data files with labeled real anomalies
•  Cash prizes of $2,500 each for algorithms and data
•  Easy to enter, high likelihood of winning!
•  Go to http://numenta.org/nab for details
SUMMARY
•  Anomaly detection for streaming data imposes unique challenges
•  Stringent real-time constraints and automation requirements
•  Typical batch methodologies do not work well
•  HTM learning algorithms
•  Can be used to create a streaming anomaly detection system
•  Performs very well across a wide range of datasets
•  Open source, commercially deployable
•  NAB is an open source benchmark for streaming anomaly detection
•  Includes a labeled dataset with real world data
•  Scoring methodology designed for practical real-time applications
•  NAB competition!
RESOURCES
Grok (anomalies in IT infrastructure): http://grokstream.com
HTM Studio (desktop app for easy experimentation): contact me
Open Source Repositories:
Algorithm code: https://github.com/numenta/nupic
HTM Stocks demo: https://github.com/numenta/numenta-apps
NAB code + paper: https://github.com/numenta/nab
Apache Flink: https://github.com/nupic-community/flink-htm
Contact info:
Subutai Ahmad sahmad@numenta.com, @SubutaiAhmad
Alex Lavin alavin@numenta.com, @theAlexLavin

Contenu connexe

En vedette

Big Data and Stream Data Analysis at Politecnico di Milano
Big Data and Stream Data Analysis at Politecnico di MilanoBig Data and Stream Data Analysis at Politecnico di Milano
Big Data and Stream Data Analysis at Politecnico di MilanoMarco Brambilla
 
Data streaming algorithms
Data streaming algorithmsData streaming algorithms
Data streaming algorithmsSandeep Joshi
 
[RakutenTechConf2013] [D-3_2] Counting Big Data by Streaming Algorithms
[RakutenTechConf2013] [D-3_2] Counting Big Databy Streaming Algorithms[RakutenTechConf2013] [D-3_2] Counting Big Databy Streaming Algorithms
[RakutenTechConf2013] [D-3_2] Counting Big Data by Streaming AlgorithmsRakuten Group, Inc.
 
Streaming Algorithms
Streaming AlgorithmsStreaming Algorithms
Streaming AlgorithmsJoe Kelley
 
Data Stream Outlier Detection Algorithm
Data Stream Outlier Detection Algorithm Data Stream Outlier Detection Algorithm
Data Stream Outlier Detection Algorithm Hamza Aslam
 
Stream processing using Apache Storm - Big Data Meetup Athens 2016
Stream processing using Apache Storm - Big Data Meetup Athens 2016Stream processing using Apache Storm - Big Data Meetup Athens 2016
Stream processing using Apache Storm - Big Data Meetup Athens 2016Adrianos Dadis
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RRadek Maciaszek
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalHortonworks
 
Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Flink Forward
 
Data Stream Analytics - Why they are important
Data Stream Analytics - Why they are importantData Stream Analytics - Why they are important
Data Stream Analytics - Why they are importantParis Carbone
 
Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...Big Data Spain
 

En vedette (12)

Chapter 2.1 : Data Stream
Chapter 2.1 : Data StreamChapter 2.1 : Data Stream
Chapter 2.1 : Data Stream
 
Big Data and Stream Data Analysis at Politecnico di Milano
Big Data and Stream Data Analysis at Politecnico di MilanoBig Data and Stream Data Analysis at Politecnico di Milano
Big Data and Stream Data Analysis at Politecnico di Milano
 
Data streaming algorithms
Data streaming algorithmsData streaming algorithms
Data streaming algorithms
 
[RakutenTechConf2013] [D-3_2] Counting Big Data by Streaming Algorithms
[RakutenTechConf2013] [D-3_2] Counting Big Databy Streaming Algorithms[RakutenTechConf2013] [D-3_2] Counting Big Databy Streaming Algorithms
[RakutenTechConf2013] [D-3_2] Counting Big Data by Streaming Algorithms
 
Streaming Algorithms
Streaming AlgorithmsStreaming Algorithms
Streaming Algorithms
 
Data Stream Outlier Detection Algorithm
Data Stream Outlier Detection Algorithm Data Stream Outlier Detection Algorithm
Data Stream Outlier Detection Algorithm
 
Stream processing using Apache Storm - Big Data Meetup Athens 2016
Stream processing using Apache Storm - Big Data Meetup Athens 2016Stream processing using Apache Storm - Big Data Meetup Athens 2016
Stream processing using Apache Storm - Big Data Meetup Athens 2016
 
Data Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and RData Stream Algorithms in Storm and R
Data Stream Algorithms in Storm and R
 
Discover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.finalDiscover.hdp2.2.storm and kafka.final
Discover.hdp2.2.storm and kafka.final
 
Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink- Márton Balassi Streaming ML with Flink-
Márton Balassi Streaming ML with Flink-
 
Data Stream Analytics - Why they are important
Data Stream Analytics - Why they are importantData Stream Analytics - Why they are important
Data Stream Analytics - Why they are important
 
Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...Advanced data science algorithms applied to scalable stream processing by Dav...
Advanced data science algorithms applied to scalable stream processing by Dav...
 

Similaire à Detecting Anomalies in Streaming Data

Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark
Evaluating Real-Time Anomaly Detection: The Numenta Anomaly BenchmarkEvaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark
Evaluating Real-Time Anomaly Detection: The Numenta Anomaly BenchmarkNumenta
 
Anomaly Detection Using the CLA
Anomaly Detection Using the CLAAnomaly Detection Using the CLA
Anomaly Detection Using the CLANumenta
 
SmartData Webinar: Applying Neocortical Research to Streaming Analytics
SmartData Webinar: Applying Neocortical Research to Streaming AnalyticsSmartData Webinar: Applying Neocortical Research to Streaming Analytics
SmartData Webinar: Applying Neocortical Research to Streaming AnalyticsDATAVERSITY
 
How the Big Data of APM can Supercharge DevOps
How the Big Data of APM can Supercharge DevOpsHow the Big Data of APM can Supercharge DevOps
How the Big Data of APM can Supercharge DevOpsCA Technologies
 
Chris Irwin - Business Development Director, Tridium
Chris Irwin - Business Development Director, TridiumChris Irwin - Business Development Director, Tridium
Chris Irwin - Business Development Director, TridiumGlobal Business Intelligence
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalVMware Tanzu Korea
 
Apeman masta midih-oc2_demo_day
Apeman masta midih-oc2_demo_dayApeman masta midih-oc2_demo_day
Apeman masta midih-oc2_demo_dayMIDIH_EU
 
Machine Intelligence in Manufacturing Industry - Igor Mihajlovic
Machine Intelligence in Manufacturing Industry - Igor MihajlovicMachine Intelligence in Manufacturing Industry - Igor Mihajlovic
Machine Intelligence in Manufacturing Industry - Igor MihajlovicInstitute of Contemporary Sciences
 
Fraud Analytics with Machine Learning and Big Data Engineering for Telecom
Fraud Analytics with Machine Learning and Big Data Engineering for TelecomFraud Analytics with Machine Learning and Big Data Engineering for Telecom
Fraud Analytics with Machine Learning and Big Data Engineering for TelecomSudarson Roy Pratihar
 
Anomaly detection - TIBCO Data Science Central
Anomaly detection - TIBCO Data Science CentralAnomaly detection - TIBCO Data Science Central
Anomaly detection - TIBCO Data Science CentralMichael O'Connell
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant confluent
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applicationsAmit Kejriwal
 
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine Aleksandr Tavgen
 
Performance tuning Grails applications
 Performance tuning Grails applications Performance tuning Grails applications
Performance tuning Grails applicationsGR8Conf
 
NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...
NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...
NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...ManageEngine, Zoho Corporation
 
Predictive Analytics with Numenta Machine Intelligence
Predictive Analytics with Numenta Machine IntelligencePredictive Analytics with Numenta Machine Intelligence
Predictive Analytics with Numenta Machine IntelligenceNumenta
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexApache Apex
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationImpetus Technologies
 
CNIT 121: 6 Discovering the Scope of the Incident & 7 Live Data Collection
CNIT 121: 6 Discovering the Scope of the Incident & 7 Live Data CollectionCNIT 121: 6 Discovering the Scope of the Incident & 7 Live Data Collection
CNIT 121: 6 Discovering the Scope of the Incident & 7 Live Data CollectionSam Bowne
 

Similaire à Detecting Anomalies in Streaming Data (20)

Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark
Evaluating Real-Time Anomaly Detection: The Numenta Anomaly BenchmarkEvaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark
Evaluating Real-Time Anomaly Detection: The Numenta Anomaly Benchmark
 
Anomaly Detection Using the CLA
Anomaly Detection Using the CLAAnomaly Detection Using the CLA
Anomaly Detection Using the CLA
 
SmartData Webinar: Applying Neocortical Research to Streaming Analytics
SmartData Webinar: Applying Neocortical Research to Streaming AnalyticsSmartData Webinar: Applying Neocortical Research to Streaming Analytics
SmartData Webinar: Applying Neocortical Research to Streaming Analytics
 
How the Big Data of APM can Supercharge DevOps
How the Big Data of APM can Supercharge DevOpsHow the Big Data of APM can Supercharge DevOps
How the Big Data of APM can Supercharge DevOps
 
Chris Irwin - Business Development Director, Tridium
Chris Irwin - Business Development Director, TridiumChris Irwin - Business Development Director, Tridium
Chris Irwin - Business Development Director, Tridium
 
Real Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from PivotalReal Time Business Platform by Ivan Novick from Pivotal
Real Time Business Platform by Ivan Novick from Pivotal
 
Apeman masta midih-oc2_demo_day
Apeman masta midih-oc2_demo_dayApeman masta midih-oc2_demo_day
Apeman masta midih-oc2_demo_day
 
Machine Intelligence in Manufacturing Industry - Igor Mihajlovic
Machine Intelligence in Manufacturing Industry - Igor MihajlovicMachine Intelligence in Manufacturing Industry - Igor Mihajlovic
Machine Intelligence in Manufacturing Industry - Igor Mihajlovic
 
Fraud Analytics with Machine Learning and Big Data Engineering for Telecom
Fraud Analytics with Machine Learning and Big Data Engineering for TelecomFraud Analytics with Machine Learning and Big Data Engineering for Telecom
Fraud Analytics with Machine Learning and Big Data Engineering for Telecom
 
Anomaly detection - TIBCO Data Science Central
Anomaly detection - TIBCO Data Science CentralAnomaly detection - TIBCO Data Science Central
Anomaly detection - TIBCO Data Science Central
 
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant Inside Kafka Streams—Monitoring Comcast’s Outside Plant
Inside Kafka Streams—Monitoring Comcast’s Outside Plant
 
Building data intensive applications
Building data intensive applicationsBuilding data intensive applications
Building data intensive applications
 
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine Observability -  The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
Observability - The good, the bad and the ugly Xp Days 2019 Kiev Ukraine
 
Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex Next Gen Big Data Analytics with Apache Apex
Next Gen Big Data Analytics with Apache Apex
 
Performance tuning Grails applications
 Performance tuning Grails applications Performance tuning Grails applications
Performance tuning Grails applications
 
NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...
NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...
NetFlow Analyzer Training Part II : Diagnosing and troubleshooting traffic is...
 
Predictive Analytics with Numenta Machine Intelligence
Predictive Analytics with Numenta Machine IntelligencePredictive Analytics with Numenta Machine Intelligence
Predictive Analytics with Numenta Machine Intelligence
 
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache ApexHadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
Hadoop Summit SJ 2016: Next Gen Big Data Analytics with Apache Apex
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
 
CNIT 121: 6 Discovering the Scope of the Incident & 7 Live Data Collection
CNIT 121: 6 Discovering the Scope of the Incident & 7 Live Data CollectionCNIT 121: 6 Discovering the Scope of the Incident & 7 Live Data Collection
CNIT 121: 6 Discovering the Scope of the Incident & 7 Live Data Collection
 

Dernier

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machinePadma Pradeep
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Patryk Bandurski
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptxLBM Solutions
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Paola De la Torre
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?XfilesPro
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 

Dernier (20)

AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Key Features Of Token Development (1).pptx
Key  Features Of Token  Development (1).pptxKey  Features Of Token  Development (1).pptx
Key Features Of Token Development (1).pptx
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101Salesforce Community Group Quito, Salesforce 101
Salesforce Community Group Quito, Salesforce 101
 
How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?How to Remove Document Management Hurdles with X-Docs?
How to Remove Document Management Hurdles with X-Docs?
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 

Detecting Anomalies in Streaming Data

  • 1. DETECTING ANOMALIES IN STREAMING DATA Data By The Bay May 19, 2016 Subutai Ahmad @SubutaiAhmad sahmad@numenta.com
  • 2. OUTLINE • Real-time streaming analytics • Anomaly detection with Hierarchical Temporal Memory • Benchmarking real-time anomaly detection • Summary
  • 3. Monitoring IT infrastructure Uncovering fraudulent transactions Tracking vehicles Real-time health monitoring Monitoring energy consumption Detection is necessary, but prevention is often the goal REAL-TIME ANOMALY DETECTION •  Exponential growth in IoT, sensors and real-time data collection is driving an explosion of streaming data •  The biggest application for machine learning is anomaly detection
  • 5. EXAMPLE: PREVENTIVE MAINTENANCE Planned shutdown Behavioral change preceding failure Catastrophic failure
  • 6. THE STREAMING ANALYTICS PROBLEM Given all past input and current input, decide whether the system behavior is anomalous right now. Must report decision, perform any retraining, bookkeeping, etc. before next input arrives. No look-ahead No training/test set split – everything must be done online System must be automated, and customized to each stream
  • 7. HIERARCHICAL TEMPORAL MEMORY (HTM) • Powerful sequence memory derived from recent findings in experimental neuroscience • High capacity memory based system • Models temporal sequences in data • Inherently streaming • Continuously learning and predicting • No need to tune hyper-parameters • Open source: github.com/numenta
  • 8. HTM PREDICTS FUTURE INPUT • Input to the system is a stream of data • Encoded into a sparse high dimensional vector • Learns temporal sequences in input stream and makes a prediction in the form of a sparse vector •  represents a prediction for upcoming input HTM
  • 9. ANOMALY DETECTION WITH HTM HTM Raw anomaly score Anomaly likelihood is an instantaneous measure of prediction error •  0 if input was perfectly prediction •  1 if it was completely unpredicted •  Could threshold it directly to report anomalies, but in very noisy environments we can do better
  • 10. ANOMALY LIKELIHOOD • Second order measure: did the predictability of the metric change? 1.  Estimate historical distribution of anomaly scores 2.  Check if recent scores are very different
  • 11. ANOMALY LIKELIHOOD • Second order measure: did the predictability of the metric change? 1.  Estimate historical distribution of anomaly scores 2.  Check if recent scores are very different
  • 12. ANOMALY DETECTION WITH HTM HTM Raw anomaly score Anomaly likelihood Learns temporal sequences Continuously makes predictions Continuously learning Was current input predicted? Has level of predictability changed significantly?
  • 13. ANOMALIES IN IT INFRASTRUCTURE • Grok •  Commercial server based product detects anomalies in IT infrastructure •  Runs thousands of HTM anomaly detectors in real time •  10 milliseconds per input per metric, including continuous learning •  No parameter tuning required •  http://grokstream.com
  • 14. ANOMALIES IN FINANCIAL DATA • HTM for Stocks •  Real-time free demo application •  Continuously monitors top 200 stocks •  Available on iOS App Store or Google Play Store •  Open source application: github.com/numenta/numenta-apps
  • 15. OUTLINE • Real-time streaming analytics • Anomaly detection with Hierarchical Temporal Memory • Benchmarking real-time anomaly detection • Summary
  • 16. EVALUATING STREAMING ANOMALY DETECTION •  Most existing benchmarks are designed for batch data, not streaming data •  Hard to find benchmarks containing real world data labeled with anomalies •  There is a need for an open benchmark designed to test real-time anomaly detection •  A standard community benchmark could spur innovation in streaming anomaly detection algorithms
  • 17. NUMENTA ANOMALY BENCHMARK (NAB) •  NAB: a rigorous benchmark for anomaly detection in streaming applications
  • 18. NUMENTA ANOMALY BENCHMARK (NAB) •  NAB: a rigorous benchmark for anomaly detection in streaming applications •  Real-world benchmark data set •  58 labeled data streams (47 real-world, 11 artificial streams) •  Total of 365,551 data points
  • 19. NUMENTA ANOMALY BENCHMARK (NAB) •  NAB: a rigorous benchmark for anomaly detection in streaming applications •  Real-world benchmark data set •  58 labeled data streams (47 real-world, 11 artificial streams) •  Total of 365,551 data points •  Scoring mechanism •  Rewards early detection •  Different “application profiles”
  • 20. NUMENTA ANOMALY BENCHMARK (NAB) •  NAB: a rigorous benchmark for anomaly detection in streaming applications •  Real-world benchmark data set •  58 labeled data streams (47 real-world, 11 artificial streams) •  Total of 365,551 data points •  Scoring mechanism •  Rewards early detection •  Different “application profiles” •  Open resource •  AGPL repository contains data, source code, and documentation •  github.com/numenta/NAB •  Ongoing competition to expand NAB
  • 21. EXAMPLE: HOURLY SERVICE DEMAND Spike in demand Unusually low demand
  • 22. EXAMPLE: PRODUCTION SERVER CPU Spiking behavior becomes the new norm Spike anomaly
  • 23. HOW SHOULD WE SCORE ANOMALIES? •  The perfect detector •  Detects anomalies as soon as possible •  Provides detections in real time •  Triggers no false alarms •  Requires no parameter tuning •  Automatically adapts to changing statistics •  Scoring methods in traditional benchmarks are insufficient •  Precision/recall does not incorporate importance of early detection •  Artificial separation into training and test sets does not handle continuous learning •  Batch data files allow look ahead and multiple passes through the data
  • 24. WHERE IS THE ANOMALY?
  • 25. NAB DEFINES ANOMALY WINDOWS NAB scoring function gives higher score to earlier detections in window
  • 26. OTHER DETAILS •  Application profiles •  Three application profiles assign different weightings based on the tradeoff between false positives and false negatives. •  EKG data on a cardiac patient favors False Positives. •  IT / DevOps professionals hate False Positives. •  Three application profiles: standard, favor low false positives, favor low false negatives. •  NAB emulates practical real-time scenarios •  Look ahead not allowed for algorithms. Detections must be made on the fly. •  No separation between training and test files. Invoke model, start streaming, and go. •  No batch parameter tuning. Must be fully automated with single set of parameters across data streams. Any further parameter tuning must be done on the fly.
  • 27. TESTING ALGORITHMS WITH NAB •  NAB is designed to easily plug in and test new algorithms •  Results with several algorithms: •  Hierarchical Temporal Memory •  Etsy Skyline •  Popular open source anomaly detection technique •  Mixture of statistical experts, continuously learning •  Twitter ADVec •  Open source anomaly detection released last year •  Robust outlier statistics + piecewise approximation •  Bayesian Online Change Point Detection •  Formal Bayesian method for detecting anomalies in time series
  • 28. NAB V1.0 RESULTS (58 FILES)
  • 29. DETECTION RESULTS: CPU USAGE ON PRODUCTION SERVER Simple spike, all 3 algorithms detect Shift in usage Etsy Skyline Numenta HTM Twitter ADVec Red denotes False Positive Key
  • 30. DETECTION RESULTS: MACHINE TEMPERATURE READINGS HTM detects purely temporal anomaly Etsy Skyline Numenta HTM Twitter ADVec Red denotes False Positive Key All 3 detect catastrophic failure
  • 31. DETECTION RESULTS: TEMPORAL CHANGES IN BEHAVIOR OFTEN PRECEDE A LARGER SHIFT HTM detects anomaly 3 hours earlier Etsy Skyline Numenta HTM Twitter ADVec Red denotes False Positive Key
  • 32. NAB COMPETITION!! •  NAB is a resource for the streaming analytics community •  Need additional real-world data files and more algorithms tested •  NAB Competition offers cash prizes for: •  Additional anomaly detection algorithms tested on NAB •  Submission of real-world data files with labeled real anomalies •  Cash prizes of $2,500 each for algorithms and data •  Easy to enter, high likelihood of winning! •  Go to http://numenta.org/nab for details
  • 33. SUMMARY •  Anomaly detection for streaming data imposes unique challenges •  Stringent real-time constraints and automation requirements •  Typical batch methodologies do not work well •  HTM learning algorithms •  Can be used to create a streaming anomaly detection system •  Performs very well across a wide range of datasets •  Open source, commercially deployable •  NAB is an open source benchmark for streaming anomaly detection •  Includes a labeled dataset with real world data •  Scoring methodology designed for practical real-time applications •  NAB competition!
  • 34. RESOURCES Grok (anomalies in IT infrastructure): http://grokstream.com HTM Studio (desktop app for easy experimentation): contact me Open Source Repositories: Algorithm code: https://github.com/numenta/nupic HTM Stocks demo: https://github.com/numenta/numenta-apps NAB code + paper: https://github.com/numenta/nab Apache Flink: https://github.com/nupic-community/flink-htm Contact info: Subutai Ahmad sahmad@numenta.com, @SubutaiAhmad Alex Lavin alavin@numenta.com, @theAlexLavin