CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache Spark
Roy Levin, Microsoft
#UnifiedDataAnalytics #SparkAISummit
Session goals
• Present an easy-to-use framework that surfaces cybersecurity anomalies
• Explain how recommendation systems are used to find anomalous resource access
• Show how we evaluated the framework to demonstrate its usefulness
Agenda
• Motivation
• Formulation & Models
• Scalability for Large Datasets
• Evaluation
• Summary
Azure Sentinel: a centralized, cloud-native Security Information & Event Management (SIEM) system
Build Your Own ML (BYOML)
1. Log data from cloud resources
2. Process logs from an Azure Databricks cluster
3. Author custom security analytics
General Anomaly Detector
[Diagram: a dataset feeds a general anomaly detector, which flags fault detection, system health monitoring, security incidents, …]
We would like to capture only security-related anomalies
Anomalous access
• Train and apply on a simple-to-construct dataset
– Avoid writing and maintaining complex rules and logic
– Avoid the need to analyze multiple complex datasets such as:
§ Org-charts
§ RBAC tables
§ Cloud architectures
Formulation & Models
• Given a user-resource pair (u, r)
• Provide an anomaly score for user u accessing resource r
• If the anomaly score is above some threshold, surface the event
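The thresholding step can be sketched as follows. This is a minimal illustration only: the `AccessEvent` type, the `surface_anomalies` helper, and the threshold value are hypothetical names chosen for the example and are not part of the framework.

```python
from typing import List, NamedTuple


class AccessEvent(NamedTuple):
    """One scored access event (hypothetical structure for illustration)."""
    user: str
    resource: str
    score: float  # anomaly score assigned by some model


def surface_anomalies(events: List[AccessEvent], threshold: float) -> List[AccessEvent]:
    """Return the events whose score is strictly above the threshold, highest first."""
    flagged = [e for e in events if e.score > threshold]
    return sorted(flagged, key=lambda e: e.score, reverse=True)


events = [
    AccessEvent("u1", "r1", 0.2),
    AccessEvent("u2", "r7", 7.8),
    AccessEvent("u3", "r2", -0.4),
]
flagged = surface_anomalies(events, threshold=3.0)
```

Only the (u2, r7) event exceeds the assumed threshold of 3.0, so it is the only one surfaced.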
The straightforward approach
But users access new resources quite often, so this is just not good enough
Create a profile per user and resource and see whether access deviates from that profile
Intuition:
• Take a recommendation system and use it for anti-recommendations
Recommendation Engines
Model Training Phase
Movie Recommendations
Ratings by user (- = not rated):
Movie                 Roy  Inbal  Hasan  Lior  Anat  Arnon
The Godfather          -     4      -     5     -      -
The Dark Knight        3     -      -     -     2      5
Pulp Fiction           5     3      5     4     4      5
40 Year Old Virgin     2     4      -     -     3      3
Analyze That           3     5      4     -     4      -
Anger Management       3     5      -     -     -      5
Black Hawk Down        5     -      -     4     -      -
Model Training Phase
Movie Recommendations
Ratings by user (? = unknown rating):
Movie                 Roy  Inbal  Hasan  Lior  Anat  Arnon
The Godfather          ?     4      ?     5     ?      ?
The Dark Knight        3     ?      ?     ?     2      5
Pulp Fiction           5     3      5     4     4      5
40 Year Old Virgin     2     4      ?     ?     3      3
Analyze That           3     5      4     ?     4      ?
Anger Management       3     5      ?     ?     ?      5
Black Hawk Down        5     ?      ?     4     ?      ?
[Diagram: the ratings matrix is factorized into a movie-feature matrix (rows x1, x2, …, xm; columns f1, f2, f3, labeled Romance, Action, Comedy) and a user-feature matrix (rows f1, f2, f3; one column θ per user). All factor entries are unknown (?) before training.]
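The training phase on the movie matrix can be illustrated with a small matrix-factorization sketch. This is plain stochastic gradient descent written for the example, not the algorithm used by Surprise or Spark; the function names and the hyperparameters (`k`, `epochs`, `lr`, `reg`) are assumptions.

```python
import random

# Ratings from the slide: rows are movies, columns are users, None = unknown.
GRID = [
    [None, 4, None, 5, None, None],
    [3, None, None, None, 2, 5],
    [5, 3, 5, 4, 4, 5],
    [2, 4, None, None, 3, 3],
    [3, 5, 4, None, 4, None],
    [3, 5, None, None, None, 5],
    [5, None, None, 4, None, None],
]
ratings = {
    (i, j): float(v)
    for i, row in enumerate(GRID)
    for j, v in enumerate(row)
    if v is not None
}


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def factorize(ratings, n_rows, n_cols, k=3, epochs=2000, lr=0.01, reg=0.02, seed=0):
    """Fit ratings[(i, j)] ~ dot(X[i], Theta[j]) using observed cells only."""
    rng = random.Random(seed)
    X = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_rows)]
    Theta = [[rng.uniform(0.1, 1.0) for _ in range(k)] for _ in range(n_cols)]
    cells = list(ratings.items())
    for _ in range(epochs):
        rng.shuffle(cells)
        for (i, j), r in cells:
            err = r - dot(X[i], Theta[j])
            for f in range(k):
                xi, tj = X[i][f], Theta[j][f]
                # Gradient step on squared error with L2 regularization
                X[i][f] += lr * (err * tj - reg * xi)
                Theta[j][f] += lr * (err * xi - reg * tj)
    return X, Theta


X, Theta = factorize(ratings, n_rows=len(GRID), n_cols=len(GRID[0]))

# Training error on the observed cells
rmse = (sum((r - dot(X[i], Theta[j])) ** 2 for (i, j), r in ratings.items())
        / len(ratings)) ** 0.5

# Apply phase: predict an unknown cell (The Godfather, Roy)
predicted = dot(X[0], Theta[0])
```

Each row of `X` plays the role of a movie vector x and each row of `Theta` the role of a user vector θ; an unknown rating is predicted as their dot product.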
Model Apply Phase
Movie Recommendations
[Diagram: the ratings matrix with unknown ratings (?) alongside the movie-feature matrix (rows x1, x2, …, xm; columns f1, f2, f3, labeled Romance, Action, Comedy) and the user-feature matrix (rows f1, f2, f3; one column θ per user)]
Back to Anomalous Resource Access
• Let us re-examine our data:
– User-resource pairs with the number of times accessed
• A standard collaborative filtering (CF) model assumes explicit item ratings, which raises some problems:
– A rating is not really what we have in the input
§ Although more accesses by a user to a resource likely mean the user should be allowed access
– We do not really have negative rating indications either, i.e., there is no explicit indicator saying that a user should not have access to some resource
§ What we do have is missing access
Linear Scaling
Access counts per resource (columns user1 to user6; empty cells are not shown):
resource1: 1200, 1500
resource2: 900, 301, 1
resource3: 1500, 599, 1, 902, 1205, 1500
resource4: 299, 1200, 895, 901
resource5: 601, 1500, 1200, 1203
resource6: 603, 1499, 1495
resource7: 1499, 1200
After linear scaling:
resource1: 9, 10
resource2: 8, 6, 5
resource3: 10, 7, 5, 8, 9, 10
resource4: 6, 9, 8, 8
resource5: 7, 10, 9, 9
resource6: 7, 10, 10
resource7: 10, 9
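The linear scaling step can be sketched as a min-max mapping. The target range of 5 to 10 is an assumption read off the slide values (a count of 1 maps to 5 and a count of 1500 maps to 10), and `linear_scale` is a hypothetical helper, not the framework's API.

```python
def linear_scale(counts, low=5.0, high=10.0):
    """Map raw access counts linearly onto [low, high] (assumed range)."""
    lo_c, hi_c = min(counts.values()), max(counts.values())
    if hi_c == lo_c:
        # Degenerate case: all counts equal, so give every entry the top value
        return {key: high for key in counts}
    span = hi_c - lo_c
    return {key: low + (high - low) * (c - lo_c) / span for key, c in counts.items()}


# resource3 row from the slide (it has a value for every user)
resource3 = {"user1": 1500, "user2": 599, "user3": 1,
             "user4": 902, "user5": 1205, "user6": 1500}
scaled = linear_scale(resource3)
rounded = [round(scaled[u]) for u in sorted(scaled)]
```

Rounding the scaled values for resource3 reproduces the slide's row of 10, 7, 5, 8, 9, 10.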
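Random Negative Samples
Scaled access values with random negative samples added as 1 (columns user1 to user6; empty cells are not shown):
resource1: 1, 9, 10
resource2: 8, 1, 6, 5
resource3: 10, 7, 5, 8, 9, 10
resource4: 6, 9, 1, 8, 8
resource5: 7, 10, 9, 9
resource6: 7, 10, 10
resource7: 10, 9, 1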
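Random negative sampling can be sketched as follows: pick some (user, resource) pairs with no observed access and give them a low rating. The helper name, the rating value of 1, and the number of samples are assumptions for illustration; the slides do not state how many negatives are drawn.

```python
import random


def add_negative_samples(ratings, users, resources, n_samples, neg_rating=1.0, seed=0):
    """Return a copy of ratings with up to n_samples unobserved pairs rated neg_rating."""
    unobserved = [(u, r) for u in users for r in resources if (u, r) not in ratings]
    rng = random.Random(seed)
    picked = rng.sample(unobserved, min(n_samples, len(unobserved)))
    augmented = dict(ratings)  # leave the input untouched
    for pair in picked:
        augmented[pair] = neg_rating
    return augmented


observed = {("user2", "resource1"): 9.0, ("user4", "resource1"): 10.0}
users = ["user1", "user2", "user3", "user4"]
resources = ["resource1", "resource2"]
augmented = add_negative_samples(observed, users, resources, n_samples=2)
negatives = [pair for pair, value in augmented.items() if pair not in observed]
```

Observed pairs keep their scaled values; only pairs that were missing from the input can become negatives.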
Adjusting for user & resource bias to create an anomaly score
Scalability for Large Datasets
• Actually, we are given a (tenant-id, user, resource) triplet (tid, u, r)
• Provide an anomaly score for user u accessing resource r, per tenant
• Note: access within each tenant is isolated
• Goals:
– Process tenants in parallel
– Cope with data from large tenants
• Create a Pandas UDF (PUDF) that uses the Surprise* Python library to run the CF algorithm locally on each worker node
• The provided PUDF works on Pandas DataFrames that are created per group when apply is called
• The method is applied as follows:
– df.groupBy(tid_colname).apply(my_pudf)
* SurPRISE: Simple Python RecommendatIon System Engine http://surpriselib.com/
• Problem: the data from some tenants may be too large to fit into the memory of a single worker node
• Solution: before applying, count the number of entries per tenant
– If the entries fit in memory, apply the PUDF method
– If not, apply Spark CF, per tenant, one by one
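The size-based routing can be sketched in plain Python (without Spark, so it runs anywhere). The helper name and the `max_rows_in_memory` limit are hypothetical; how the real framework decides what fits in memory is not stated in the slides.

```python
from collections import Counter


def plan_tenant_training(rows, max_rows_in_memory):
    """Split tenant ids by entry count into (small, large).

    small: tenants whose data is assumed to fit on one worker (PUDF path)
    large: tenants to be trained with distributed CF, one by one
    """
    counts = Counter(tid for tid, _user, _resource in rows)
    small = sorted(t for t, n in counts.items() if n <= max_rows_in_memory)
    large = sorted(t for t, n in counts.items() if n > max_rows_in_memory)
    return small, large


rows = [
    ("tenant_a", "u1", "r1"),
    ("tenant_a", "u2", "r1"),
    ("tenant_b", "u1", "r9"),
    ("tenant_c", "u1", "r1"),
    ("tenant_c", "u1", "r2"),
    ("tenant_c", "u2", "r3"),
]
small, large = plan_tenant_training(rows, max_rows_in_memory=2)
```

With the assumed limit of 2 rows, tenant_a and tenant_b take the per-group path and tenant_c is routed to the distributed path.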
• Training produces a model, which is basically:
– A dataframe mapping (tenant-id, user) and (tenant-id, resource) pairs to their corresponding latent feature vectors
• Applying the model requires:
– Joining with the respective user/resource to retrieve the vectors
– Applying a dot product
* Note: the model can be applied with Structured Streaming
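Applying the model can be sketched with dictionaries standing in for the two vector dataframes. The function name is hypothetical, and dropping events with unseen users or resources (inner-join behavior) is an assumption; the slides do not say how such events are handled.

```python
def score_access(events, user_vecs, res_vecs):
    """Look up latent vectors per (tenant, user) and (tenant, resource), then take the dot product."""
    scored = []
    for tid, user, res in events:
        u = user_vecs.get((tid, user))
        r = res_vecs.get((tid, res))
        if u is None or r is None:
            continue  # assumed inner-join semantics: unseen keys are dropped
        scored.append((tid, user, res, sum(a * b for a, b in zip(u, r))))
    return scored


user_vecs = {("t1", "alice"): [1.0, 2.0]}
res_vecs = {("t1", "share1"): [3.0, 0.5]}
events = [("t1", "alice", "share1"), ("t1", "bob", "share1")]
scored = score_access(events, user_vecs, res_vecs)
```

Because the keys include the tenant id, vectors from one tenant are never combined with those of another.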
Evaluation
Experiments for Azure Sentinel AI
1. Synthetic dataset
2. Actual file share data from a large customer
• Users accessing shared network files
[Diagram: synthetic dataset; labels: For training, Add cross group access, For testing]
Results
100%, i.e., all 100 cross-group accesses receive the top-100 anomaly scores!
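A metric of this kind can be computed as sketched below: the fraction of injected cross-group accesses that land among the k highest anomaly scores. The function name and the toy scores are hypothetical and are not the experiment's data.

```python
def top_k_hit_rate(scores, injected, k):
    """Fraction of injected pairs found among the k highest-scoring pairs."""
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return len(set(ranked) & set(injected)) / len(injected)


# Toy example: two injected cross-group accesses among five scored pairs
scores = {
    ("u1", "r1"): 0.1,
    ("u1", "r9"): 6.5,   # injected
    ("u2", "r2"): -0.3,
    ("u3", "r8"): 7.2,   # injected
    ("u3", "r3"): 0.4,
}
injected = {("u1", "r9"), ("u3", "r8")}
hit_rate = top_k_hit_rate(scores, injected, k=2)
```

A hit rate of 1.0 with k equal to the number of injected accesses corresponds to the 100% result reported on the slide.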
Actual Attack Description
[Diagram: File Share SMB server; shares on Machine 1, Machine 2, …, Machine n]
58% of companies have over 100,000 folders open to everyone within the network (source: Varonis cybersecurity data security and analytics)
Algorithm Training
[Diagram: shares on Machine 1, Machine 2, …, Machine n]
Test set (2 days after training)
[Diagram: shares on Machine 1, Machine 2, …, Machine n]
Results
Anomaly scores by dataset:
Dataset               Mean   Stddev     Min    Max   Count
Entire test set       0.05     1.16  -19.21   8.07    3.8M
Unseen valid access  -0.28     0.38    -1.2   1.18     410
Restricted access     7.81     0.11    7.44   8.07     400
Summary
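from sentinel_ai.peer_anomaly.spark_collaborative_filtering import AccessAnomaly

# AccessAnomaly is just an estimator: fit() returns a model, transform() adds scores.
# The *_colname variables hold the names of the tenant, user, resource and score columns.
access_anomaly = AccessAnomaly(
    tenant_colname,
    user_colname,
    res_colname,
    score_colname
)

# Train on (tenant, user, resource) triplets that carry a score column
anom_model = access_anomaly.fit(training_dataset_scored_triplets)

# Score the test triplets and display the result
scored_test_dataset_triplets = anom_model.transform(test_dataset_triplets)
scored_test_dataset_triplets.show()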
https://github.com/Azure/Azure-Sentinel-BYOML
• Introduced an Access Anomaly Detection framework for cybersecurity and showed how it fits into the BYOML pillar of Azure Sentinel
– An anti-recommendation is an access anomaly
– The code has been open sourced
• The framework provides a simple-to-use API that allows security analysts to surface access anomalies
• Call to action: experiment with the framework, continue this line of research, and suggest and add more algorithms
DON’T FORGET TO RATE
AND REVIEW THE SESSIONS
SEARCH SPARK + AI SUMMIT

Contenu connexe

Tendances

Automated Production Ready ML at Scale
Automated Production Ready ML at ScaleAutomated Production Ready ML at Scale
Automated Production Ready ML at ScaleDatabricks
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowDatabricks
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingDatabricks
 
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinUnifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinDatabricks
 
Tactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherTactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherDatabricks
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkDatabricks
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsDatabricks
 
Splice Machine's use of Apache Spark and MLflow
Splice Machine's use of Apache Spark and MLflowSplice Machine's use of Apache Spark and MLflow
Splice Machine's use of Apache Spark and MLflowDatabricks
 
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...Databricks
 
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Databricks
 
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...Spark Summit
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...Jose Quesada (hiring)
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Databricks
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...Databricks
 
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...Databricks
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterDatabricks
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageDatabricks
 
How Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaHow Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaSpark Summit
 
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Databricks
 

Tendances (20)

Automated Production Ready ML at Scale
Automated Production Ready ML at ScaleAutomated Production Ready ML at Scale
Automated Production Ready ML at Scale
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Managing the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflowManaging the Complete Machine Learning Lifecycle with MLflow
Managing the Complete Machine Learning Lifecycle with MLflow
 
Automated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and TrackingAutomated Hyperparameter Tuning, Scaling and Tracking
Automated Hyperparameter Tuning, Scaling and Tracking
 
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold XinUnifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
Unifying State-of-the-Art AI and Big Data in Apache Spark with Reynold Xin
 
Tactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark TogetherTactical Data Science Tips: Python and Spark Together
Tactical Data Science Tips: Python and Spark Together
 
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache SparkData-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
Data-Driven Transformation: Leveraging Big Data at Showtime with Apache Spark
 
Apache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new DirectionsApache Spark's MLlib's Past Trajectory and new Directions
Apache Spark's MLlib's Past Trajectory and new Directions
 
Splice Machine's use of Apache Spark and MLflow
Splice Machine's use of Apache Spark and MLflowSplice Machine's use of Apache Spark and MLflow
Splice Machine's use of Apache Spark and MLflow
 
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
Lessons Learned from Using Spark for Evaluating Road Detection at BMW Autonom...
 
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
 
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
ModelDB: A System to Manage Machine Learning Models: Spark Summit East talk b...
 
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
A full Machine learning pipeline in Scikit-learn vs in scala-Spark: pros and ...
 
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
Lessons Learned Replatforming A Large Machine Learning Application To Apache ...
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
 
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
Apache Spark-Based Stratification Library for Machine Learning Use Cases at N...
 
From Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim HunterFrom Pipelines to Refineries: scaling big data applications with Tim Hunter
From Pipelines to Refineries: scaling big data applications with Tim Hunter
 
Observability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineageObservability for Data Pipelines With OpenLineage
Observability for Data Pipelines With OpenLineage
 
How Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-ShmaHow Spark Enables the Internet of Things- Paula Ta-Shma
How Spark Enables the Internet of Things- Paula Ta-Shma
 
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
Operationalizing Machine Learning—Managing Provenance from Raw Data to Predic...
 

Similaire à CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache Spark

2019 FRSecure CISSP Mentor Program: Class Nine
2019 FRSecure CISSP Mentor Program: Class Nine2019 FRSecure CISSP Mentor Program: Class Nine
2019 FRSecure CISSP Mentor Program: Class NineFRSecure
 
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial EmulationScott Sutherland
 
Big Data LDN 2017: Serving Predictive Models with Redis
Big Data LDN 2017: Serving Predictive Models with RedisBig Data LDN 2017: Serving Predictive Models with Redis
Big Data LDN 2017: Serving Predictive Models with RedisMatt Stubbs
 
Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...
Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...
Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...Precisely
 
2018 FRSecure CISSP Mentor Program Session 9
2018 FRSecure CISSP Mentor Program Session 92018 FRSecure CISSP Mentor Program Session 9
2018 FRSecure CISSP Mentor Program Session 9FRSecure
 
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapDEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapFelipe Prado
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for CybersecurityVMware Tanzu
 
How i'm going to own your organization v2
How i'm going to own your organization v2How i'm going to own your organization v2
How i'm going to own your organization v2RazorEQX
 
Mining software vulns in SCCM / NIST's NVD
Mining software vulns in SCCM / NIST's NVDMining software vulns in SCCM / NIST's NVD
Mining software vulns in SCCM / NIST's NVDLoren Gordon
 
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...Precisely
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionSri Ambati
 
EmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
EmPOW: Integrating Attack Behavior Intelligence into Logstash PluginsEmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
EmPOW: Integrating Attack Behavior Intelligence into Logstash PluginsFaithWestdorp
 
BSIDES-PR Keynote Hunting for Bad Guys
BSIDES-PR Keynote Hunting for Bad GuysBSIDES-PR Keynote Hunting for Bad Guys
BSIDES-PR Keynote Hunting for Bad GuysJoff Thyer
 
RIoT (Raiding Internet of Things) by Jacob Holcomb
RIoT  (Raiding Internet of Things)  by Jacob HolcombRIoT  (Raiding Internet of Things)  by Jacob Holcomb
RIoT (Raiding Internet of Things) by Jacob HolcombPriyanka Aash
 
[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies台灣資料科學年會
 
2020 FRSecure CISSP Mentor Program - Class 9
2020 FRSecure CISSP Mentor Program - Class 92020 FRSecure CISSP Mentor Program - Class 9
2020 FRSecure CISSP Mentor Program - Class 9FRSecure
 
Cyber Threat Ranking using READ
Cyber Threat Ranking using READCyber Threat Ranking using READ
Cyber Threat Ranking using READZachary S. Brown
 
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...Mike Spaulding
 
Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You Neo4j
 

Similaire à CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache Spark (20)

2019 FRSecure CISSP Mentor Program: Class Nine
2019 FRSecure CISSP Mentor Program: Class Nine2019 FRSecure CISSP Mentor Program: Class Nine
2019 FRSecure CISSP Mentor Program: Class Nine
 
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
2017 Q1 Arcticcon - Meet Up - Adventures in Adversarial Emulation
 
Big Data LDN 2017: Serving Predictive Models with Redis
Big Data LDN 2017: Serving Predictive Models with RedisBig Data LDN 2017: Serving Predictive Models with Redis
Big Data LDN 2017: Serving Predictive Models with Redis
 
Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...
Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...
Machine Learning & IT Service Intelligence for the Enterprise: The Future is ...
 
2018 FRSecure CISSP Mentor Program Session 9
2018 FRSecure CISSP Mentor Program Session 92018 FRSecure CISSP Mentor Program Session 9
2018 FRSecure CISSP Mentor Program Session 9
 
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slapDEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
DEF CON 27 - CHRISTOPHER ROBERTS - firmware slap
 
Using Data Science for Cybersecurity
Using Data Science for CybersecurityUsing Data Science for Cybersecurity
Using Data Science for Cybersecurity
 
How i'm going to own your organization v2
How i'm going to own your organization v2How i'm going to own your organization v2
How i'm going to own your organization v2
 
Mining software vulns in SCCM / NIST's NVD
Mining software vulns in SCCM / NIST's NVDMining software vulns in SCCM / NIST's NVD
Mining software vulns in SCCM / NIST's NVD
 
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
Machine Learning for Your Enterprise: Operations and Security for Mainframe E...
 
Become a Cloud Security Ninja
Become a Cloud Security NinjaBecome a Cloud Security Ninja
Become a Cloud Security Ninja
 
Building, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for ProductionBuilding, Evaluating, and Optimizing your RAG App for Production
Building, Evaluating, and Optimizing your RAG App for Production
 
EmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
EmPOW: Integrating Attack Behavior Intelligence into Logstash PluginsEmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
EmPOW: Integrating Attack Behavior Intelligence into Logstash Plugins
 
BSIDES-PR Keynote Hunting for Bad Guys
BSIDES-PR Keynote Hunting for Bad GuysBSIDES-PR Keynote Hunting for Bad Guys
BSIDES-PR Keynote Hunting for Bad Guys
 
RIoT (Raiding Internet of Things) by Jacob Holcomb
RIoT  (Raiding Internet of Things)  by Jacob HolcombRIoT  (Raiding Internet of Things)  by Jacob Holcomb
RIoT (Raiding Internet of Things) by Jacob Holcomb
 
[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies[系列活動] 資料探勘速遊 - Session4 case-studies
[系列活動] 資料探勘速遊 - Session4 case-studies
 
2020 FRSecure CISSP Mentor Program - Class 9
2020 FRSecure CISSP Mentor Program - Class 92020 FRSecure CISSP Mentor Program - Class 9
2020 FRSecure CISSP Mentor Program - Class 9
 
Cyber Threat Ranking using READ
Cyber Threat Ranking using READCyber Threat Ranking using READ
Cyber Threat Ranking using READ
 
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
BlackHat Presentation - Lies and Damn Lies: Getting past the Hype of Endpoint...
 
Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You Neo4j: What's Under the Hood & How Knowing This Can Help You
Neo4j: What's Under the Hood & How Knowing This Can Help You
 

Plus de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache Spark

  • 2. Roy Levin, Microsoft CyberMLToolkit: Anomaly Detection as a Scalable Generic Service Over Apache Spark #UnifiedDataAnalytics #SparkAISummit
  • 3. Session goals • Present an easy-to-use framework that surfaces cyber-security anomalies • Explain how recommendation systems are used to find anomalous resource access • Show how we evaluated the framework to demonstrate its usefulness
  • 4. Agenda: Motivation, Formulation & Models, Scalability for Large Datasets, Evaluation, Summary
  • 5. A centralized, cloud-native Security Information & Event Management (SIEM) system. Build Your Own ML (BYOML): 1. Log data from cloud resources 2. Process logs from Azure Databricks cluster 3. Author custom security analytics
  • 6. General anomaly detector over a dataset: fault detection, system health monitoring, security incidents, … We would like to capture only security-related anomalies
  • 8. Anomalous access • Train and apply on a simple-to-construct dataset – Avoid writing and maintaining complex rules and logic – Avoid the need to analyze multiple complex datasets such as: § Org charts § RBAC tables § Cloud architectures
  • 11. • Given a user and resource pair (u, r) • Provide an anomaly score for user u accessing resource r • If the anomaly score is above some threshold, surface the event
  • 12. The straightforward approach. But users access new resources quite often, so this is just not good enough
  • 13. Create a profile per user and resource and see if access deviates from that profile
  • 14. Intuition: • Take a recommendation system and use it for anti-recommendations
  • 16. Movie Recommendations, Model Training Phase. Observed ratings (rows are movies, columns are users; "?" marks a rating that has not been observed):

                            Roy  Inbal  Hasan  Lior  Anat  Arnon
      The Godfather          ?     4      ?     5     ?     ?
      The Dark Knight        3     ?      ?     ?     2     5
      Pulp Fiction           5     3      5     4     4     5
      40 Year Old Virgin     2     4      ?     ?     3     3
      Analyze That           3     5      4     ?     4     ?
      Anger Management       3     5      ?     ?     ?     5
      Black Hawk Down        5     ?      ?     4     ?     ?

  • 17. Movie Recommendations, Model Training Phase. Each movie x1 … xm is described by a vector of latent features f1, f2, f3 (illustrated as Romance, Action, Comedy) and each user by a vector θ over the same features. All of these entries, like the missing ratings, are unknown ("?") before training
  • 19. Movie Recommendations, Model Apply Phase. The same ratings matrix and latent feature vectors as in the training phase, now used to fill in the "?" ratings
  • 21. • Let us re-examine our data: – User-resource pairs with the number of times accessed • The standard collaborative filtering (CF) model assumes explicit item ratings, which raises some problems: – A rating is not really what we have in the input • Although more accesses by a user to a resource likely mean the user should be allowed access – We do not really have negative rating indications either, i.e., there is no explicit indicator saying that a user should not have access to some resource • What we do have is missing access
  • 22. Linear Scaling. Raw access counts per resource and their linearly scaled values (columns are user1 to user6; values are listed in user order, empty cells omitted):

      resource1: 1200, 1500                      → 9, 10
      resource2: 900, 301, 1                     → 8, 6, 5
      resource3: 1500, 599, 1, 902, 1205, 1500   → 10, 7, 5, 8, 9, 10
      resource4: 299, 1200, 895, 901             → 6, 9, 8, 8
      resource5: 601, 1500, 1200, 1203           → 7, 10, 9, 9
      resource6: 603, 1499, 1495                 → 7, 10, 10
      resource7: 1499, 1200                      → 10, 9

  • 23. Random Negative Samples. Starting point: the scaled user-resource matrix (values 5 to 10, with empty cells where no access was observed)
  • 24. Random Negative Samples. A rating of 1 is added to some of the empty cells (columns are user1 to user6; values are listed in user order, empty cells omitted):

      resource1: 1, 9, 10
      resource2: 8, 1, 6, 5
      resource3: 10, 7, 5, 8, 9, 10
      resource4: 6, 9, 1, 8, 8
      resource5: 7, 10, 9, 9
      resource6: 7, 10, 10
      resource7: 10, 9, 1

  • 25. Adjusting for user and resource bias and creating an anomaly score (applied to the same matrix as slide 24)
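The two preprocessing steps on the slides above, linear scaling of access counts and random negative sampling, can be sketched in a few lines of Python. This is a minimal illustration, not the toolkit's code: the function names, the use of a global min/max, the [5, 10] target range (inferred from the example values on the Linear Scaling slide), and the negative rating of 1 are all assumptions.

```python
import random


def scale_counts(counts, low=5.0, high=10.0):
    """Linearly map raw access counts onto [low, high] (assumed range)."""
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {k: low + (high - low) * (v - lo) / span for k, v in counts.items()}


def add_negative_samples(ratings, users, resources, n, neg_rating=1.0, seed=0):
    """Add up to n random unobserved (user, resource) pairs with a low rating."""
    rng = random.Random(seed)
    unseen = [(u, r) for u in users for r in resources if (u, r) not in ratings]
    out = dict(ratings)
    for pair in rng.sample(unseen, min(n, len(unseen))):
        out[pair] = neg_rating
    return out


# Hypothetical (user, resource) -> access count entries.
counts = {("u1", "r2"): 900, ("u3", "r2"): 301, ("u5", "r2"): 1, ("u1", "r3"): 1500}
scaled = scale_counts(counts)
with_neg = add_negative_samples(scaled, ["u1", "u3", "u5"], ["r2", "r3"], n=1)
```

Rounded, the scaled values for 900, 301, 1 and 1500 come out as 8, 6, 5 and 10, which matches the slide's example row. The bias adjustment from slide 25 is not sketched here because the slides do not give its formula.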
  • 27. • Actually, we are given a (tenant-id, user, resource) triplet (tid, u, r) • Provide an anomaly score for user u accessing resource r, per tenant • Note: access within each tenant is isolated • Goals: – Process tenants in parallel – Cope with data from large tenants
  • 28. • Create a Pandas UDF (PUDF) that uses the Surprise Python library to run the CF algorithm locally on each worker node • The PUDF works on Pandas DataFrames that are created per group when apply is called • The method is applied as follows: – df.groupBy(tid_colname).apply(my_pudf) * SurPRISE: Simple Python RecommendatIon System Engine http://surpriselib.com/
  • 29. • Problem: the data from some tenants may be too large to fit into the memory of a single worker node • Solution: before applying, count the number of entries per tenant – If the entries fit in memory, apply the PUDF method – If not, apply Spark CF, per tenant, one by one
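The size-based dispatch rule on slide 29 can be illustrated with plain Python. This is only a sketch of the decision step: the function name and the `max_in_memory_rows` threshold are hypothetical, and the actual training paths (the grouped PUDF and the per-tenant Spark CF) are not shown.

```python
from collections import Counter


def split_tenants_by_size(triplets, max_in_memory_rows):
    """Split tenant ids into those small enough for the in-memory (PUDF)
    path and those that need the distributed (Spark CF) path.

    triplets: iterable of (tenant_id, user, resource) tuples.
    max_in_memory_rows: hypothetical per-tenant row limit for one worker.
    """
    counts = Counter(tid for tid, _user, _res in triplets)
    small = sorted(t for t, n in counts.items() if n <= max_in_memory_rows)
    large = sorted(t for t, n in counts.items() if n > max_in_memory_rows)
    return small, large


# Hypothetical data: tenant "a" has three rows, tenant "b" has one.
triplets = [("a", "u1", "r1"), ("a", "u2", "r1"), ("a", "u2", "r2"), ("b", "u9", "r7")]
small_tenants, large_tenants = split_tenants_by_size(triplets, max_in_memory_rows=2)
```

In this sketch the small tenants would go through the grouped PUDF in parallel, while each large tenant would be trained with Spark CF in turn.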
  • 30. • Training produces a model which is basically: – A DataFrame mapping (tenant-id, user) and (tenant-id, resource) pairs to their corresponding latent feature vectors • Applying the model requires: – Joining with the respective user/resource to retrieve the vectors – Applying a dot product * Note: the model can be applied with Structured Streaming
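The apply step on slide 30 (look up the two latent vectors, then take a dot product) can be sketched without Spark by using dictionaries in place of the joins. The names below are hypothetical, and two choices are assumptions rather than something the slides state: negating the dot product so that a low predicted affinity gives a high anomaly score, and returning no score for users or resources unseen at training time. The bias adjustment is omitted.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def score_accesses(events, user_vecs, res_vecs):
    """Score (tenant, user, resource) events against a trained model.

    user_vecs maps (tenant, user) -> latent vector and res_vecs maps
    (tenant, resource) -> latent vector; the dictionary lookups stand in
    for the two joins described on the slide.
    """
    scored = []
    for tid, user, res in events:
        uv = user_vecs.get((tid, user))
        rv = res_vecs.get((tid, res))
        if uv is None or rv is None:
            # Assumption: no score when either side was unseen in training.
            scored.append((tid, user, res, None))
        else:
            # Assumption: anomaly score is the negated predicted affinity.
            scored.append((tid, user, res, -dot(uv, rv)))
    return scored


# Hypothetical model with two latent features.
user_vecs = {("t1", "alice"): [1.0, 0.0]}
res_vecs = {("t1", "db"): [2.0, 3.0], ("t1", "share"): [-1.0, 4.0]}
events = [("t1", "alice", "db"), ("t1", "alice", "share"), ("t1", "bob", "db")]
scores = score_accesses(events, user_vecs, res_vecs)
```

Keying both lookups by tenant id mirrors the per-tenant isolation described on slide 27, so vectors from one tenant are never combined with another's.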
  • 32. Experiments for Azure Sentinel AI: 1. Synthetic dataset 2. Actual file share data from a large customer • Users accessing shared network files
  • 34. Add cross-group access for testing
  • 35. Results: 100%, i.e., all 100 cross-group accesses receive the top-100 anomaly scores
  • 36. Actual attack description: file shares on an SMB server (shares on Machine 1, Machine 2, …, Machine n). 58% of companies have over 100,000 folders open to everyone within the network (source: Varonis, cybersecurity data security and analytics)
  • 38. Test set (2 days after training): shares on Machine 1, Machine 2, …, Machine n
  • 39. Results: anomaly scores per dataset

      Dataset                mean   stddev   min      max    count
      Entire test set        0.05   1.16     -19.21   8.07   3.8M
      Unseen valid access    -0.28  0.38     -1.2     1.18   410
      Restricted access      7.81   0.11     7.44     8.07   400

  • 41. Using the framework (code at https://github.com/Azure/Azure-Sentinel-BYOML):

      from sentinel_ai.peer_anomaly.spark_collaborative_filtering import AccessAnomaly

      # it is just an estimator
      access_anomaly = AccessAnomaly(
          tenant_colname, user_colname, res_colname, score_colname
      )
      anom_model = access_anomaly.fit(training_dataset_scored_triplets)
      scored_test_dataset_triplets = anom_model.transform(test_dataset_triplets)
      scored_test_dataset_triplets.show()

  • 42. • Introduced an Access Anomaly Detection framework for cyber security and showed how it fits into the BYOML pillar of Azure Sentinel – An anti-recommendation is an access anomaly – The code has been open sourced • The framework provides a simple-to-use API that allows security analysts to surface access anomalies • Call to action: experiment with the framework, continue this line of research, suggest and add more algorithms