Ce diaporama a bien été signalé.
Nous utilisons votre profil LinkedIn et vos données d’activité pour vous proposer des publicités personnalisées et pertinentes. Vous pouvez changer vos préférences de publicités à tout moment.

MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams

95 vues

Publié le

Given a stream of graph edges from a dynamic graph, how can we assign anomaly scores to edges in an online manner, for the purpose of detecting unusual behavior, using constant time and memory? Existing approaches aim to detect individually surprising edges. In this work, we propose MIDAS, which focuses on detecting microcluster anomalies, or suddenly arriving groups of suspiciously similar edges, such as lockstep behavior, including denial of service attacks in network traffic data. MIDAS has the following properties:
(a) it detects microcluster anomalies while providing theoretical guarantees about its false positive probability;
(b) it is online, thus processing each edge in constant time and constant memory, and also processes the data 162 – 644 times faster than state-of-the-art approaches;
(c) it provides 42%-48% higher accuracy (in terms of AUC) than state-of-the-art approaches.

Publié dans : Données & analyses
  • Soyez le premier à commenter

  • Soyez le premier à aimer ceci

MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams

  1. 1. 1 MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, Christos Faloutsos siddharth@comp.nus.edu.sg @siddharthb_
  2. 2. 2Introduction Method Experiments ConclusionProblem Related Work Future Work
  3. 3. Intrusion Detection T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 3Introduction Method Experiments ConclusionProblem Related Work Future Work
  4. 4. Time Evolving Graphs T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs ComputerNetworks Twitter/IMNetworks 4Introduction Method Experiments ConclusionProblem Related Work Future Work
  5. 5. Edge Streams T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs u1 u2 u3 u4 u5 a1 a2 a3 u6 5Introduction Method Experiments ConclusionProblem Related Work 𝑢! 𝑢" 𝑢# 𝑢$ 𝑢$ 𝑢% 𝑢% 𝑎! 𝑎! 𝑎! 𝑎" 𝑎" 𝑎" 𝑎# 𝑎# 𝑎# 1 2 2 3 4 4 5 6 6 6 6 6 6 6 6 6 𝑢" 𝑢! 𝑢% 𝑢# 𝑢& 𝑢& 𝑢& 𝑢" 𝑢# 𝑢% 𝑢" 𝑢% 𝑢& 𝑢# 𝑢% 𝑢& Time Future Work
  6. 6. Roadmap T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 6Introduction Method Experiments ConclusionProblem Related Work • Problem • Algorithm • MIDAS • MIDAS-R • Related Work • Experiments • Future Work Future Work
  7. 7. Problem T3.1: Detecting Anomalous Dynamic Subgraphs Input: • Edge stream 𝑬 from time evolving graph 𝑮 • Directed, multigraph, discrete time Output: • Anomaly Score for each edge Our Contributions: • Microcluster Detection • Guarantees on False Positive Probability • Constant Memory • Constant Update Time 7Introduction Method Experiments ConclusionProblem Related Work Future Work
  8. 8. Roadmap T3.1: Detecting Anomalous Dynamic Subgraphs 8Introduction Method Experiments ConclusionProblem Related Work • Problem • Algorithm • MIDAS • MIDAS-R • Related Work • Experiments • Future Work Future Work
  9. 9. T3.1: Detecting Anomalous Dynamic Subgraphs Background T3.1: Detecting Anomalous Dynamic Subgraphs Offline: • Time series methods Online: • Bounded memory • Set of nodes is not fixed 9Introduction Method Experiments ConclusionProblem Related Work Detecting Microcluster Future Work
  10. 10. T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs Gaussian distribution: • Find mean & standard deviation • Compute Gaussian likelihood (edge occurrences at 𝑡) • if 𝑙𝑖𝑘𝑒𝑙𝑖ℎ𝑜𝑜𝑑 < 𝑡ℎ𝑟𝑒𝑠ℎ𝑜𝑙𝑑, declare anomaly 10Introduction Method Experiments ConclusionProblem Related Work Future Work Naive Solution
  11. 11. T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs Past (t<10) Current (t=10) Expected 900 100 Observed 0 1000 11Introduction Method Experiments ConclusionProblem Related Work Mean Level at 𝑡 = 10 equals Mean Level for 𝑡 < 10 Our assumption Future Work
  12. 12. T3.1: Detecting Anomalous Dynamic Subgraphs 𝑠 𝑢𝑣 : 𝑢 − 𝑣 edges up to time 𝑡 𝑎 𝑢𝑣 : 𝑢 − 𝑣 edges at current time 𝑡 ̂𝑠 𝑢𝑣 : Approximate total count #𝑎 𝑢𝑣 : Approximate current count 12Introduction Method Experiments ConclusionProblem Related Work Streaming Data Structure: CMS Future Work Details 4𝑎 𝑢𝑣 ≤ 𝑎 𝑢𝑣 + 𝜐𝑁𝑡 with probability at least 1 − 𝜀 𝜐 is the amount of error we can tolerate. 1 − 𝜀 is the probability. e.g. with 99% probability only up to 0.5% error
  13. 13. T3.1: Detecting Anomalous Dynamic Subgraphs expectedobserved Details T3.1: Detecting Anomalous Dynamic Subgraphs 13Introduction Method Experiments ConclusionProblem Related Work Future Work Anomaly Score: Chi-Squared Test
  14. 14. Roadmap T3.1: Detecting Anomalous Dynamic Subgraphs 14Introduction Method Experiments ConclusionProblem Related Work • Problem • Algorithm • MIDAS • MIDAS-R • Related Work • Experiments • Future Work Future Work
  15. 15. MIDAS-R: Incorporating Relations T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs Details Temporal Relations: • Allows past edges to count toward the current time tick • Diminishing weight • At end of every time tick 𝑡, reduce counts by 𝛼 𝜖 [0,1] 15Introduction Method Experiments ConclusionProblem Related Work 1 2 3 4 5 6 7 8 9 444 888 Future Work Spatial Relations: • Catch large groups of spatially nearby edges • 𝑠 𝑢 : edges adjacent to node 𝑢 up to time 𝑡 • 𝑎 𝑢 : edges adjacent to node 𝑢 at current time 𝑡 • max(𝑠𝑐𝑜𝑟𝑒 𝑢, 𝑣, 𝑡 , 𝑠𝑐𝑜𝑟𝑒 𝑢, 𝑡 , 𝑠𝑐𝑜𝑟𝑒 𝑣, 𝑡 )
  16. 16. Time and Memory Complexity T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 𝑑: number of hash functions 𝑤: number of buckets Space complexity: • 𝑂(𝑤𝑑) Time complexity: • 𝑂(𝑑) 16Introduction Method Experiments ConclusionProblem Related Work Future Work
  17. 17. Roadmap • Problem • Algorithm • MIDAS • MIDAS-R • Related Work • Experiments • Future Work T3.1: Detecting Anomalous Dynamic Subgraphs 17Introduction Method Experiments ConclusionProblem Related Work Future Work
  18. 18. Related Work T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs Anomaly Detection in Static Graphs: • Node detection: CATCHSYNC [2016], ODDBALL [2010] • Subgraph detection: [SHIN ET AL., 2018], FRAUDER [2017] • Edge detection: NRMF [2011], AUTOPART [2004] Anomaly Detection in Graph Streams: • Node detection: DTA [2006] • Subgraph detection: CATCHSYNC [2016], COPYCATCH [2013] • Edge detection: ANOMRANK [2019], SPOTLIGHT [2018] Anomaly Detection in Edge Streams: • Node detection: [YU ET AL., 2013] • Subgraph detection: DENSEALERT [2017] • Edge detection: SEDANSPOT [2018], RHSS [2016] 18Introduction Method Experiments ConclusionProblem Related Work Future Work
  19. 19. Comparison T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 19Introduction Method Experiments ConclusionProblem Related Work Future Work
  20. 20. Roadmap 20Introduction Method Experiments ConclusionProblem Related Work • Problem • Algorithm • MIDAS • MIDAS-R • Related Work • Experiments • Future Work Future Work
  21. 21. Datasets T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 1. DARPA: 4.5M IP-IP communications, 87.7K minutes 2. TwitterSecurity: 2.6M tweets (May-August, 2014) 3. TwitterWorldCup: 1.7M tweets (June-July, 2014) 21Introduction Method Experiments ConclusionProblem Related Work Future Work
  22. 22. Accuracy T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 22Introduction Method Experiments ConclusionProblem Related Work Future Work
  23. 23. Running Times T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 23Introduction Method Experiments ConclusionProblem Related Work Future Work
  24. 24. Accuracy vs Time T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 24Introduction Method Experiments ConclusionProblem Related Work Accuracy(AUC) 0 0.333 0.667 1 Wall-clock time (s) 0.1 1 10 100 0.64 0.91 0.95 Midas-R Midas SedanSpot Future Work
  25. 25. Scalability T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 25Introduction Method Experiments ConclusionProblem Related Work Future Work
  26. 26. Real-World Effectiveness 26 T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 1. 13-05-2014. Turkey Mine Accident 2. 24-05-2014. Raid 3. 30-05-2014. Attack/Ambush 03-06-14. Suicide bombing 4. 09-06-14. Suicide/Truck bombings 5. 10-06-2014. Iraqi Militants Seize 11-06-2014. Kidnapping 6. 15-06-14. Mpeketoni attack 7. 26-06-14. Suicide Bombing 8. 03-07-14. Gaza conflicts 9. 18-07-14. Airplane shot 10. 30-07-14. Ebola Virus Outbreak
  27. 27. Real-World Effectiveness 27 T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 1. 13-05-2014. Turkey Mine Accident 2. 24-05-2014. Raid 3. 30-05-2014. Attack/Ambush 03-06-14. Suicide bombing 4. 09-06-14. Suicide/Truck bombings 5. 10-06-2014. Iraqi Militants Seize 11-06-2014. Kidnapping 6. 15-06-14. Mpeketoni attack 7. 26-06-14. Suicide Bombing 8. 03-07-14. Gaza conflicts 9. 18-07-14. Airplane shot 10. 30-07-14. Ebola Virus Outbreak
  28. 28. Real-World Effectiveness 28 T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 1. 13-05-2014. Turkey Mine Accident 2. 24-05-2014. Raid 3. 30-05-2014. Attack/Ambush 03-06-14. Suicide bombing 4. 09-06-14. Suicide/Truck bombings 5. 10-06-2014. Iraqi Militants Seize 11-06-2014. Kidnapping 6. 15-06-14. Mpeketoni attack 7. 26-06-14. Suicide Bombing 8. 03-07-14. Gaza conflicts 9. 18-07-14. Airplane shot 10. 30-07-14. Ebola Virus Outbreak
  29. 29. Real-World Effectiveness 29 T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 1. 13-05-2014. Turkey Mine Accident 2. 24-05-2014. Raid 3. 30-05-2014. Attack/Ambush 03-06-14. Suicide bombing 4. 09-06-14. Suicide/Truck bombings 5. 10-06-2014. Iraqi Militants Seize 11-06-2014. Kidnapping 6. 15-06-14. Mpeketoni attack 7. 26-06-14. Suicide Bombing 8. 03-07-14. Gaza conflicts 9. 18-07-14. Airplane shot 10. 30-07-14. Ebola Virus Outbreak
  30. 30. Real-World Effectiveness 30 T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 1 2 3 4 5 6 7 8 9 444 888
  31. 31. Roadmap 31Introduction Method Experiments ConclusionProblem Related Work Future Work • Problem • Algorithm • MIDAS • MIDAS-R • Related Work • Experiments • Future Work
  32. 32. MSTREAM T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 32Introduction Method Experiments ConclusionProblem Related Work Challenges: • High number of dimensions • Real-valued features • Correlation between features • Constant memory & time both w.r.t. stream length and dimensionality Future Work
  33. 33. 33 How to Contribute? 1. MIDAS-F: Filtering MIDAS 2. Parallel/Distributed/GPU version 3. Hardware implementation over FPGA 4. Knowledge Graphs (NLP) 5. Periodic setting 6. Erlang, F# implementations Introduction Method Experiments ConclusionProblem Related Work Future Work
  34. 34. Conclusion T3.1: Detecting Anomalous Dynamic SubgraphsT3.1: Detecting Anomalous Dynamic Subgraphs 34Introduction Method Experiments ConclusionProblem Related Work 1. Streaming Microcluster Detection: • Constant time and memory 2. Theoretical Guarantees: • False Positive Probability 3. Effectiveness: • MIDAS is 42%-48% more accurate • MIDAS processes the data 162×-644× faster Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin and Christos Faloutsos. “MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams.” AAAI Conference on Artificial Intelligence (AAAI), 2020. https://arxiv.org/abs/1911.04464 Future Work https://github.com/Stream-AD/MIDAS/

×