Extending Splunk with Machine-learning Predictive Analytics

Rich Collier
Solutions Architect
rich@prelert.com

Why Machine Learning?
• Overcome limitations of human analysis
• Auto-learn baseline behavior using proper
modeling
• Detect anomalous behavior

Overcoming limitations of
Human Analysis
• Judging what’s “normal” is not always easy

• Humans don’t always choose the right
techniques

IPTables (firewall)
• How to find most anomalous users (aggressive
brute force attackers)?
• Here is a typical (manual) process

Step 1) Search

Questions:
What’s normal?
What about that
spike?
Probably should try to visualize counts by SRC over time…

Step 2) stats
command, sort by
count

Question: How to
show as a function
of time, not just
overall?

Step 3) add
bucketing for
breakdown by time

Question: What is
an anomalous
count per bucket?
100? 1000?
10,000?
Maybe we should try to use some more stats?

Step 4) add some
“basic” statistical
analysis:
avg +/- 2σ

Question: How to
show the individual
“outliers” (and not
lose the concept of
time)?

Step 5) use
eventstats to repair
time problem and
add “where” clause
to only show those
outside of avg +/- 2σ

Question: Are these
161 results accurate?
(I hope you didn’t build an
alert and get 161 of them!)
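
The manual steps above (count by source, bucket by time, then keep only the buckets outside avg +/- 2σ) can be sketched in Python. This is an illustration only, not the Splunk search shown on the slides: the event list, the 5-minute bucket size, and the function names are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Hypothetical firewall events as (epoch_seconds, source_ip). Not real data.
events = [(t, "10.0.0.1") for t in range(0, 3600, 60)]    # steady source
events += [(t, "10.0.0.2") for t in range(0, 3600, 600)]  # quiet source
events += [(1800 + i, "10.0.0.2") for i in range(200)]    # burst in one bucket

BUCKET = 300  # 5-minute buckets, an arbitrary choice


def counts_per_bucket(events, bucket=BUCKET):
    """Steps 2-3: count events per (bucket start, source).

    Buckets with no events are simply absent from the result.
    """
    counts = Counter()
    for ts, src in events:
        counts[(ts - ts % bucket, src)] += 1
    return counts


def outliers(counts, k=2.0):
    """Steps 4-5: per source, keep buckets outside avg +/- k std deviations."""
    by_src = defaultdict(list)
    for (bucket_start, src), n in counts.items():
        by_src[src].append((bucket_start, n))
    flagged = []
    for src, rows in by_src.items():
        values = [n for _, n in rows]
        avg, sd = mean(values), pstdev(values)
        lower, upper = avg - k * sd, avg + k * sd
        flagged += [(b, src, n) for b, n in rows if n < lower or n > upper]
    return sorted(flagged)


result = outliers(counts_per_bucket(events))
```

With this made-up data only the burst bucket for 10.0.0.2 is flagged, and the lower bound computed for that source is negative, which is an impossible event count.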

Problem: Statistical modeling is
INCORRECT for this data
– (-75) events doesn’t make
sense for avg - 2σ
– how much confidence do
you have in avg + 2σ?
Result:
• Wrong model = false
positives/negatives

The Problem: +/- 2σ
assumes data is
Gaussian (Bell Curve)
Clearly, this data is
better fit by a
Poisson curve
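
A small Python sketch of why the bell-curve rule misleads on count data. It assumes a Poisson rate of 2 events per bucket (a made-up number) and compares the tail a Gaussian promises above +2σ with the tail the Poisson distribution actually has; the helper names are hypothetical.

```python
import math
from statistics import NormalDist


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def poisson_upper_tail(k, lam):
    """P(X >= k): chance of seeing k or more events in one bucket."""
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))


lam = 2.0               # hypothetical average events per bucket
sigma = math.sqrt(lam)  # for a Poisson distribution, variance equals the mean
lower = lam - 2 * sigma  # below zero: not a possible event count
upper = lam + 2 * sigma

# Fraction of buckets a bell curve says should exceed avg + 2 sigma.
gaussian_tail = 1 - NormalDist().cdf(2)
# Fraction of buckets that exceed it when the counts are Poisson.
poisson_tail = poisson_upper_tail(math.ceil(upper), lam)
```

Under these assumptions the lower bound is negative and the upper threshold is crossed more than twice as often as the Gaussian model predicts, so an alert built on it fires on ordinary buckets.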

Examples of Non-Gaussian Data
status=503
Memory Utilization

CPU load

status=404
Revenue Transactions

One More Problem…
• Even if the demonstrated technique was
accurate:
– Still need to persist what you’ve learned “so far”
so that you don’t have to keep re-inspecting
historical data as new data comes in
– This requires you to manually write/read
information into a summary index
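
To illustrate what "persisting what you've learned so far" involves, here is a minimal Python sketch that keeps running statistics and saves them between runs, so new data can be folded in without rescanning history. A JSON file stands in for the summary index, and the `RunningStats` class is a hypothetical helper, not anything provided by Splunk.

```python
import json
import os
import tempfile


class RunningStats:
    """Running count, mean and variance (Welford's algorithm)."""

    def __init__(self, n=0, mean=0.0, m2=0.0):
        self.n, self.mean, self.m2 = n, mean, m2

    def update(self, x):
        """Fold one new observation into the running totals."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of everything seen so far."""
        return self.m2 / self.n if self.n else 0.0

    def save(self, path):
        with open(path, "w") as fh:
            json.dump({"n": self.n, "mean": self.mean, "m2": self.m2}, fh)

    @classmethod
    def load(cls, path):
        with open(path) as fh:
            return cls(**json.load(fh))


# First run: learn from historical buckets, then persist the state.
path = os.path.join(tempfile.gettempdir(), "running_stats_demo.json")
stats = RunningStats()
for x in [3, 5, 4]:
    stats.update(x)
stats.save(path)

# Later run: reload the state and continue with new data only.
resumed = RunningStats.load(path)
for x in [6, 2]:
    resumed.update(x)
```

The resumed object ends up with the same mean and variance as if all five values had been processed in one pass.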

First, an Analogy
• How could I accurately predict how much
postal mail is likely to be delivered to
your home tomorrow?

I Would…
• Watch your mail delivery for a while
– 1 day?
– 1 week?
– 1 month?
– 1 year?

• Use my observations to create a…

Average?
Std. Deviation?
Probability Distribution Function?

A Probability Distribution Function!
[Figure: probability distribution functions, % likelihood (probability) vs. pieces of mail per day, with curves labeled "Best for my house", "College Student?" and "My Mom"]


Using Machine Learning
to build a Probability Distribution Function
• PDF must be built specifically for each
“instance”
• PDF should be constructed automatically
merely by watching the data
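
As a rough sketch of "one PDF per instance, built merely by watching the data", the Python below keeps a separate empirical distribution for each instance. The `EmpiricalPDF` class and the sample observations are assumptions for illustration; raw frequencies like these give zero probability to any value not yet seen, so a production model would need something more robust.

```python
from collections import Counter, defaultdict


class EmpiricalPDF:
    """Per-instance empirical distribution of counts, learned by observation."""

    def __init__(self):
        self._counts = defaultdict(Counter)

    def observe(self, instance, value):
        """Record one observed count (e.g. pieces of mail today) for an instance."""
        self._counts[instance][value] += 1

    def probability(self, instance, value):
        """Fraction of past observations for this instance equal to value."""
        seen = self._counts.get(instance, Counter())
        total = sum(seen.values())
        return seen[value] / total if total else 0.0


pdf = EmpiricalPDF()
# Hypothetical daily mail counts for two households.
for pieces in [3, 4, 3, 5, 3, 4, 2, 3]:
    pdf.observe("my house", pieces)
for pieces in [0, 0, 1, 0, 0, 0, 1, 0]:
    pdf.observe("college student", pieces)

p_house_three = pdf.probability("my house", 3)
p_student_three = pdf.probability("college student", 3)
```

The same value (three pieces of mail) is routine for one instance and unseen for the other, which is why a single shared model would not work.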

Finding “what’s unexpected”…
Your job is often looking for unexpected change in your
environment, either proactively through monitoring or
reactively through diagnostics/troubleshooting


Using the PDF to Find
What is Unexpected
• Zero pieces of mail?
• Fifteen pieces of mail?
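
Once a distribution is available, "unexpected" can be expressed as a probability. The sketch below assumes, purely for illustration, that daily mail follows a Poisson distribution with a rate of four pieces per day; the `unlikelihood` helper is hypothetical and is not the scoring used by any particular product.

```python
import math


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with rate lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def unlikelihood(k, lam):
    """Probability of a count at least as extreme as k, on k's side of the rate."""
    if k <= lam:
        return sum(poisson_pmf(i, lam) for i in range(k + 1))  # P(X <= k)
    return 1.0 - sum(poisson_pmf(i, lam) for i in range(k))    # P(X >= k)


RATE = 4.0  # hypothetical: about four pieces of mail on a typical day

p_zero = unlikelihood(0, RATE)      # an empty mailbox
p_fifteen = unlikelihood(15, RATE)  # an overflowing one
p_five = unlikelihood(5, RATE)      # an ordinary day
```

Under these assumptions an empty mailbox is uncommon (a little under 2% of days), fifteen pieces is far rarer still, and five pieces is unremarkable.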


Relate back to data in Splunk
• # Pieces of mail = # events of a certain type
– number of failed logins
– number of errors of different types
– number of events with certain status codes
– etc.

• Or, performance metrics
– response time
– utilization %

• Prelert Anomaly Detective
– Automatically, and correctly
models data via self-learning
– Applies sophisticated
Bayesian techniques
– Persists “on-going” analysis
to allow real-time alerting
– Makes it easy to use

3 significant alerts, not 161!

• Results are:
– Accurate outliers
– Automatically clustered
and scored by their
probabilistic “unlikelihood”
– Relevant in time, easy to
make alerts
– Clickable for drill-down

• Drill-downs:
– Automatically constructs
useful search syntax and
time selection
– Shows anomalies in
context of the original data
– Serve as a possible
jumping-off point for
subsequent manual mining

Automated Anomaly Detection

• Less time searching & troubleshooting
• Proactive, trustworthy alerts without
thresholds
• Auto-discovers the previously unknown

Automated Anomaly Detection for
splunk>

Additional
Use Cases

Use Case
• Data sources:
– App logs
– Network performance
– SQL-Server metrics

• Prelert identifies
network discards that
cause app to
disconnect from DB

Correlating Anomalies
Across Data Types

Use Case
• Data source: Netstat
• Prelert finds a rare FTP
connection from a
server that doesn’t
normally use FTP

Servers making
unusual TCP connections

Use Case
• Data source: Custom
logs
• Prelert identifies unusual
$0.60 transaction –
traced to bug in currency
conversion

Revenue
Transactions

Use Case
• Data source:
BlueCoat proxy
• Prelert identifies
users abusing
Internet privileges
(gambling sites, porn sites)

Clients pervasively
visiting rare URLs

Use Case
• Response time of
online bank website
• Prelert alerts on
spikes without the
need to create a
single threshold

Monitoring Performance
w/o Thresholds

Use Case
• Data source: BlueCoat
proxy
• Prelert identifies client
attempting to exploit an
outside IIS webserver

Unusual outbound
traffic rates

Automated Anomaly Detection
for splunk>
