Graph The Planet 2019 - Intrusion Detection with Graphs

Intrusion Detection with Graphs
Faster, smarter, and with more context

The challenge
Windows server intrusion detection in Office 365
Security event logs from hundreds of thousands of servers
Logs contain system activity such as deployment, upgrade, and engineer troubleshooting
Analysis and response performed by security engineering team
Graphs help us succeed at scale and in detail
Review alerts in context, not in isolation
Prioritize investigation according to risk
Incorporate low-fidelity signals without overwhelming analysts

Detection pipeline
Detection inputs
Process and user behavior from built-in Windows audit events
Per-process network activity, DNS lookups
Windows internal subsystem activity via ETW monitoring
Detection results
Stored in a flexible-schema columnar database (Azure Data Explorer)
Column values are normalized to enforce common semantics across results
Classified according to the fidelity of the detection

Building the graph
Three steps
Extract entities that represent “pivots” between detection results
Link each result to the entities it contains and insert these into the graph
If an entity already exists from a prior step, use it
Forms a hypergraph that links related results together
Resulting graph is sparsely connected and easy to visualize
Algorithm is O(n) and trivial to implement in JavaScript, C#, etc.
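The three steps above can be sketched in a few lines. This is a minimal illustration, not the pipeline's actual code: the field names (`id`, `detection_type`, `process`, `user`, `hostname`, `remote_ip`) and the choice of which columns count as pivots are assumptions, and the sample rows paraphrase the example detection results from the slides.

```python
from collections import defaultdict

# Hypothetical normalized detection results; field names are assumptions.
RESULTS = [
    {"id": "r1", "detection_type": "anomalousdll", "process": "rundll32.exe",
     "user": "svc_sql11", "hostname": "CFE110095"},
    {"id": "r2", "detection_type": "procupload", "process": "rundll32.exe",
     "remote_ip": "40.114.40.133", "hostname": "CFE110095"},
    {"id": "r3", "detection_type": "largetransfer", "process": "sqlagent.exe",
     "remote_ip": "40.114.40.133", "hostname": "SQL11006"},
]

# Columns treated as pivots between results (an assumed list).
PIVOT_FIELDS = ("process", "user", "hostname", "remote_ip")


def build_graph(results):
    """Link each result to its entities in a single pass over the results."""
    result_to_entities = {}
    entity_to_results = defaultdict(set)
    for r in results:
        # Step 1: extract the pivot entities present in this result.
        entities = {(f, r[f]) for f in PIVOT_FIELDS if r.get(f)}
        # Step 2: link the result to each entity.
        result_to_entities[r["id"]] = entities
        for e in entities:
            # Step 3: an entity seen earlier is reused, not duplicated.
            entity_to_results[e].add(r["id"])
    return result_to_entities, dict(entity_to_results)


result_to_entities, entity_to_results = build_graph(RESULTS)
```

Each entity behaves like a hyperedge: `entity_to_results[("remote_ip", "40.114.40.133")]` lists every result that mentions that address, which is what ties the upload on CFE110095 to the transfer from SQL11006.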

Building the graph
Anomalous DLL rundll32.exe launched as svc_sql11 on CFE110095
New process uploading rundll32.exe to 40.114.40.133 on CFE110095
Large transfer 50MB to 40.114.40.133 from sqlagent.exe on SQL11006

Building the graph
[Diagram: the three detection results above drawn as a graph]
Labels: detection type, process, user, hostname
Nodes: anomalousdll, procupload, largetransfer, svc_sql11, CFE110095, rundll32.exe, 40.114.40.133, sqlagent.exe, SQL11006

Graph clustering
Each cluster represents an “incident”
Detection results with entities in common that tell a story
Analysts view and triage all results in the cluster together
View cluster results in tabular form for increased density and detail
Identical clusters are merged together
Define similarity by the types of detection results each cluster contains
Collapses the long tail of small clusters caused by environment-wide changes
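One way to read the clustering described above is as connected components over shared entities, followed by a merge keyed on the set of detection types in each cluster. The sketch below assumes that reading; the input shape, the `newservice` detection type, and the WEB001/WEB002 hosts are hypothetical.

```python
from collections import defaultdict

# Hypothetical input: result id -> (detection type, set of entities).
RESULTS = {
    "r1": ("anomalousdll", {"CFE110095", "svc_sql11", "rundll32.exe"}),
    "r2": ("procupload", {"CFE110095", "40.114.40.133", "rundll32.exe"}),
    "r3": ("largetransfer", {"SQL11006", "40.114.40.133", "sqlagent.exe"}),
    "r4": ("newservice", {"WEB001"}),
    "r5": ("newservice", {"WEB002"}),
}


def find_clusters(results):
    """Group results that share entities (connected components)."""
    by_entity = defaultdict(set)
    for rid, (_, entities) in results.items():
        for e in entities:
            by_entity[e].add(rid)
    seen, clusters = set(), []
    for start in results:
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            rid = stack.pop()
            members.add(rid)
            for e in results[rid][1]:
                # Visit every unseen result that shares this entity.
                for other in by_entity[e] - seen:
                    seen.add(other)
                    stack.append(other)
        clusters.append(members)
    return clusters


def merge_identical(results, clusters):
    """Merge clusters whose sets of detection types are identical."""
    merged = defaultdict(set)
    for members in clusters:
        signature = frozenset(results[rid][0] for rid in members)
        merged[signature] |= members
    return dict(merged)


clusters = find_clusters(RESULTS)
merged = merge_identical(RESULTS, clusters)
```

Here r4 and r5 share no entity, so they start as two single-result clusters; because both contain only `newservice`, the merge step folds them into one.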

Cluster scoring
Clusters must meet criteria to be eligible for triage
One result classified as alert or atomic
Two unique detection types classified as behavioral
Score based on detection and entity uniqueness
Points assigned to each distinct detection type in the cluster
Divided by number of distinct machines emitting that detection type
Multiplied together to generate an overall cluster score
Down-votes systemic behavior and up-votes clusters with many unique detections
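The eligibility check and scoring rule above can be sketched as follows. Several things here are assumptions, since the slides give no numbers: the point values per fidelity class, the reading that either criterion alone makes a cluster eligible, and the reading that machines are counted within the cluster. The `newservice` rows are hypothetical.

```python
from math import prod

# Assumed point values per fidelity class; the slides do not give numbers.
POINTS = {"alert": 10.0, "atomic": 5.0, "behavioral": 2.0}

# Each result is (detection type, fidelity class, hostname).
CLUSTER = [
    ("anomalousdll", "behavioral", "CFE110095"),
    ("procupload", "behavioral", "CFE110095"),
    ("largetransfer", "atomic", "SQL11006"),
]

# Hypothetical environment-wide behavior: one detection type on four hosts.
SYSTEMIC = [("newservice", "behavioral", f"WEB00{i}") for i in range(1, 5)]


def is_eligible(cluster):
    """Assumed rule: either criterion is enough to qualify for triage."""
    has_strong = any(fid in ("alert", "atomic") for _, fid, _ in cluster)
    behavioral_types = {t for t, fid, _ in cluster if fid == "behavioral"}
    return has_strong or len(behavioral_types) >= 2


def score(cluster):
    """Multiply (points / distinct machines) over distinct detection types."""
    machines, fidelity = {}, {}
    for dtype, fid, host in cluster:
        machines.setdefault(dtype, set()).add(host)
        fidelity[dtype] = fid
    return prod(POINTS[fidelity[t]] / len(machines[t]) for t in machines)
```

With these assumed points, `score(CLUSTER)` is 2 × 2 × 5 = 20.0, while `score(SYSTEMIC)` is 2 / 4 = 0.5: the same detection firing on many machines is pushed down, and a cluster with several distinct detections is pushed up.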

Cluster-based actions
Alerting for high-scoring clusters
In-memory graph ingests new detection results and triage decisions
Scores each cluster, persists cluster snapshot as JSON, exposes REST API
Emits a high-fidelity alert when cluster score reaches a threshold
Automated triage for environment-wide behavior
“Time-travel triage” identifies activity that occurs across many servers
Adds a rule to suppress future alerts and a detection result to inform analysts
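The two actions above might look roughly like this in code. It is a sketch under stated assumptions only: the threshold values, the JSON layout of a snapshot, the `autotriage` result type, and the function names are all hypothetical, and the REST API and in-memory graph are left out.

```python
import json

ALERT_THRESHOLD = 15.0  # assumed score at which an alert is emitted
SYSTEMIC_HOSTS = 3      # assumed host count that marks environment-wide behavior


def snapshot(cluster_id, cluster_score):
    """Serialize a cluster snapshot, flagging it when the score reaches the threshold."""
    return json.dumps(
        {"cluster": cluster_id, "score": cluster_score,
         "alert": cluster_score >= ALERT_THRESHOLD},
        sort_keys=True,
    )


def auto_triage(results, suppressed):
    """Suppress detection types seen on many hosts and report each suppression."""
    hosts = {}
    for dtype, host in results:
        hosts.setdefault(dtype, set()).add(host)
    notes = []
    for dtype, seen in sorted(hosts.items()):
        if len(seen) >= SYSTEMIC_HOSTS and dtype not in suppressed:
            suppressed.add(dtype)  # rule that silences future alerts
            notes.append({"detection_type": "autotriage",
                          "suppressed": dtype, "host_count": len(seen)})
    return notes


# Hypothetical results as (detection type, hostname) pairs.
RESULTS = [
    ("newservice", "WEB001"), ("newservice", "WEB002"),
    ("newservice", "WEB003"), ("anomalousdll", "CFE110095"),
]
suppressed = set()
notes = auto_triage(RESULTS, suppressed)
```

The informational entry returned by `auto_triage` stands in for the detection result that tells analysts a suppression rule was added.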

Opportunities
Time-series analysis
Updated cluster snapshots are written every 5 minutes
Can we visualize progression over time or score based on rate of change?
Improved cluster scoring
Can we use statistics to boost influence of detections that rarely fire?
Can we categorize detections by kill chain stage and look for in-time-order traversal?
Can we use ML to identify detection types that typically fire together?
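As one possible answer to the question about boosting rarely firing detections, the sketch below weights each detection type by an inverse-frequency term, similar in spirit to IDF in text retrieval. This is an illustration of the open question, not something the talk describes; the formula, the function name, and the sample history are assumptions.

```python
import math
from collections import Counter


def rarity_weights(fired):
    """Weight each detection type by log(total / count) + 1, so rare types weigh more."""
    counts = Counter(fired)
    total = sum(counts.values())
    return {t: math.log(total / c) + 1.0 for t, c in counts.items()}


# Hypothetical firing history: one common detection type, one rare one.
HISTORY = ["newservice"] * 8 + ["anomalousdll"] * 2
weights = rarity_weights(HISTORY)
```

A weight like this could multiply the per-type points during cluster scoring, so a detection that fires twice in the history counts for more than one that fires eight times.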

Bonus
Same technique can be applied to customer audit logs
Are privileged operations being performed across many resources?
Are specific IP addresses responsible for a high number of access attempts?
Are sensitive documents being accessed in bulk by a single user?
Example using O365 audit logs and PowerBI: aka.ms/auditgraph
Graph-based exploratory data analysis on user behavior
Great opportunity to help customers get more value out of their audit logs
Would love to see someone make this a point-and-click integration with O365

Thank you!
mswann@microsoft.com
@MSwannMSFT
linkedin.com/in/swannman
