How to reduce false positives in security systems through feedback and rules.
You will learn about:
1) Implicit Feedback
2) Applying Rules above ML systems
3) Applying Rules as Features
4) Combining them using Markov Logic Networks (MLNs)
3. Powerball Predictor
Photo Credit: Sean McGrath
The overwhelming majority of tickets are not winners.
Failing to recognize this is falling victim to the base rate fallacy.
4. Security Crystal Ball
Photo Credit: Sean McGrath
The overwhelming majority of log entries and data points do not represent fraud and intrusions.
Failing to recognize this is falling victim to the base rate fallacy.
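To make the base rate fallacy concrete, here is a minimal sketch of Bayes' rule applied to an alerting system. The base rate and detector rates are hypothetical numbers chosen for illustration; they do not come from the slides.

```python
def posterior_attack_given_alert(base_rate, true_positive_rate, false_positive_rate):
    """Bayes' rule: probability that an alert corresponds to a real attack."""
    p_alert = (true_positive_rate * base_rate
               + false_positive_rate * (1.0 - base_rate))
    return true_positive_rate * base_rate / p_alert


# Hypothetical numbers (assumptions, not measurements):
# 1 in 100,000 events is malicious, the detector catches 99% of attacks
# and raises a false alarm on 1% of benign events.
p = posterior_attack_given_alert(
    base_rate=1e-5, true_positive_rate=0.99, false_positive_rate=0.01
)
print(f"P(attack | alert) = {p:.4%}")
```

Under these assumed numbers, only about one alert in a thousand points at a real attack, even though the detector looks accurate on paper.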
30. How to encode Domain Knowledge: Embrace Rules
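• Business heuristics to pick out the “security-interesting” anomalies
• Rules can take many forms:
• Threat intelligence (TI) feeds
• Indicators of compromise (IOCs) and indicators of attack (IOAs)
• Tactics, techniques, and procedures (TTPs)
• Rules are awesome
• Credible, Interpretable, Adaptable (to some extent), Actionable!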
• Highest Precision
• Highest Recall
32. Three Ways to combine Rules and ML
1. Above Machine Learning Systems
a. Business heuristics to filter alerts
i. “For account _foo_, only raise sev 2 alerts until March 28th, 2016”
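A rule that sits above the ML system can be sketched as a post-filter on the alerts the model emits. The `Alert` and `SeverityRule` types below are hypothetical, and two readings are assumptions: "only raise sev 2 alerts" is taken to mean that only alerts with severity exactly 2 pass, and "until March 28th" is treated as inclusive.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Alert:
    account: str
    severity: int
    score: float      # anomaly score produced by the ML system
    raised_on: date


@dataclass(frozen=True)
class SeverityRule:
    """Hypothetical business rule: for one account, pass only one severity."""
    account: str
    allowed_severity: int
    expires_on: date  # assumption: the rule applies up to and including this day

    def allows(self, alert):
        applies = (alert.account == self.account
                   and alert.raised_on <= self.expires_on)
        return (not applies) or alert.severity == self.allowed_severity


def filter_alerts(alerts, rules):
    """Keep only the alerts that every rule allows."""
    return [a for a in alerts if all(r.allows(a) for r in rules)]


rules = [SeverityRule("foo", allowed_severity=2, expires_on=date(2016, 3, 28))]
alerts = [
    Alert("foo", 3, 0.91, date(2016, 3, 1)),  # suppressed by the rule
    Alert("foo", 2, 0.88, date(2016, 3, 1)),  # kept: severity matches
    Alert("foo", 3, 0.75, date(2016, 4, 2)),  # kept: rule has expired
    Alert("bar", 4, 0.64, date(2016, 3, 1)),  # kept: different account
]
kept = filter_alerts(alerts, rules)
print(len(kept), "of", len(alerts), "alerts raised")
```

The model itself is untouched; the rule only decides which of its alerts reach an analyst, which keeps the heuristic easy to audit and to expire.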
34. 2. Below Machine Learning Systems
a. Featurizations: “If the IP address is present in a list of known-malicious IPs, set the flag to 1”
b. Utilizes threat intel feeds (Team Cymru, VirusTotal, FireEye)
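Using a rule as a feature can be sketched as a lookup that adds a 0/1 column to each event before it reaches the model. The feed format (one IP per line, `#` for comments), the event fields, and the function names are assumptions made for illustration; real feeds from the vendors above have their own formats and APIs.

```python
import ipaddress


def load_malicious_ips(lines):
    """Parse a hypothetical one-IP-per-line feed, skipping blanks and comments."""
    ips = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            ips.add(ipaddress.ip_address(line))
    return ips


def featurize(event, malicious_ips):
    """Turn a raw event into model features, including the rule-based flag."""
    src = ipaddress.ip_address(event["src_ip"])
    return {
        "bytes_out": float(event["bytes_out"]),
        "ip_in_malicious_list": 1 if src in malicious_ips else 0,
    }


# Documentation-range addresses stand in for a real threat intel feed.
feed = ["# example feed", "203.0.113.7", "198.51.100.23", ""]
malicious_ips = load_malicious_ips(feed)

flagged = featurize({"src_ip": "203.0.113.7", "bytes_out": 5120}, malicious_ips)
clean = featurize({"src_ip": "192.0.2.10", "bytes_out": 300}, malicious_ips)
print(flagged, clean)
```

Here the model decides how much the flag matters relative to the other features, instead of the rule overriding the model.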
35. 3. Combining Rules and Machine Learning together using Markov Logic Networks
Initial ideas from Vinod Nair, MSR
36. Intuition
• Rules alone place a set of hard constraints on the set of possible worlds
• Let’s make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
• Give each formula a weight (higher weight ⇒ stronger constraint)
Source: Lectures by Pedro Domingos
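The intuition above corresponds to the probability distribution that a Markov logic network defines over possible worlds, as given in Richardson and Domingos (2006). Here $w_i$ is the weight of formula $i$ and $n_i(x)$ is the number of true groundings of that formula in world $x$:

```latex
P(X = x) = \frac{1}{Z} \exp\!\Big( \sum_{i} w_i \, n_i(x) \Big),
\qquad
Z = \sum_{x'} \exp\!\Big( \sum_{i} w_i \, n_i(x') \Big)
```

A world that violates a grounding of formula $i$ loses a factor of $e^{w_i}$ relative to an otherwise identical world that satisfies it, so it becomes less probable but not impossible. Letting $w_i \to \infty$ recovers a hard constraint.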
37. Example: Service Accounts
Domain Knowledge:
• Interactive logons from service accounts cause attacks
• Similar service accounts tend to have similar logon behavior
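The two statements in this example can be treated as weighted formulas over a tiny domain and evaluated by brute-force enumeration of possible worlds. This is a toy sketch, not how Alchemy or any production system performs inference: the two accounts `A` and `B`, the weights, and the choice to count the similarity formula once per unordered pair are all assumptions made for illustration.

```python
import itertools
import math

# Ground atoms for two hypothetical service accounts, A and B,
# which are assumed to be similar to each other.
ATOMS = ["Interactive(A)", "Interactive(B)", "Attack(A)", "Attack(B)"]

# Illustrative weights (assumptions): higher weight means a stronger constraint.
W_LOGON_IMPLIES_ATTACK = 1.5   # Interactive(x) => Attack(x)
W_SIMILAR_BEHAVIOR = 1.0       # Similar(A, B) => (Interactive(A) <=> Interactive(B))


def weighted_count(world):
    """Sum over formulas of weight times number of satisfied groundings."""
    n_implies = sum(
        (not world[f"Interactive({a})"]) or world[f"Attack({a})"] for a in "AB"
    )
    n_similar = int(world["Interactive(A)"] == world["Interactive(B)"])
    return W_LOGON_IMPLIES_ATTACK * n_implies + W_SIMILAR_BEHAVIOR * n_similar


def worlds():
    """Enumerate every truth assignment to the ground atoms."""
    for values in itertools.product([False, True], repeat=len(ATOMS)):
        yield dict(zip(ATOMS, values))


def probability(query, evidence):
    """P(query | evidence) by summing unnormalized world weights."""
    numerator = denominator = 0.0
    for world in worlds():
        if all(world[k] == v for k, v in evidence.items()):
            weight = math.exp(weighted_count(world))
            denominator += weight
            if all(world[k] == v for k, v in query.items()):
                numerator += weight
    return numerator / denominator


p_attack_a = probability({"Attack(A)": True}, {"Interactive(A)": True})
p_interactive_b = probability({"Interactive(B)": True}, {"Interactive(A)": True})
print(f"P(Attack(A) | Interactive(A))      = {p_attack_a:.3f}")
print(f"P(Interactive(B) | Interactive(A)) = {p_interactive_b:.3f}")
```

In this toy model the first query equals the logistic function of the rule weight, so an interactive logon makes an attack likely without making it certain. Enumeration scales exponentially with the number of ground atoms, which is why real MLN systems rely on approximate inference.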
44. • How to learn the structure?
• Begin with hand-coded rules
• Use Inductive Logic Programming (ILP), but it needs to infer arbitrary clauses, not just Horn clauses
• How to learn the weights?
• For generative learning, rely on pseudo-likelihood
• Check out Alchemy: http://alchemy.cs.washington.edu/
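For reference, the pseudo-likelihood used for generative weight learning in Richardson and Domingos (2006) replaces the joint likelihood with a product of per-atom conditionals. Here $X_l$ is the $l$-th ground atom, $x_l$ its truth value in the training data, and $MB_x(X_l)$ the state of its Markov blanket in the data:

```latex
P^{*}_{w}(X = x) = \prod_{l=1}^{n} P_{w}\big( X_l = x_l \mid MB_x(X_l) \big)
```

Each factor only needs the groundings that contain $X_l$, so the partition function $Z$ never has to be computed during learning.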
45. Call to Action - After the conference
• One Week
• Review
•@CodyRioux - IPython Notebook
•@Ram_ssk - Follow Up material
• Think comprehensively about Rules
• One Month
•Ask your data scientists to review the Literature section
•Implement the rules on TOP of ML systems
• One Quarter
•Implement a feedback system to capture training data
•Implement all TI feeds within an ML System
•Play with Alchemy
46. Literature
● The Base-Rate Fallacy and its Implications for the Difficulty of Intrusion Detection (Axelsson, 1999)
● Enhancing Performance Prediction Robustness by Combining Analytical Modeling and Machine Learning (Didona et al., 2015)
● Markov Logic Networks (Richardson and Domingos, 2006), Machine Learning 62(1-2): 107-136