2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
A SOBER LOOK AT MACHINE LEARNING
DR. SVEN KRASSER CHIEF SCIENTIST
@SVENKRASSER
Distinguishing Science…
Source: CERN, http://home.cern/sites/home.web.cern.ch/files/image/experiment/2013/01/cms_0.jpeg
…from Fiction
Source: “Chain Reaction,” 20th Century Fox
MACHINE LEARNING 101
EXAMPLES OF MACHINE LEARNING
• SPAM FILTERING
• MOVIE RECOMMENDATIONS
• SIRI (iPHONE)
TODAY’S FOCUS: SUPERVISED LEARNING
TODAY’S FOCUS: GEOMETRIC MODELS
EVERYTHING YOU WILL SEE TODAY IS REAL WORLD DATA
Some Data to Get Started:
1988 ANTHROPOMETRIC SURVEY OF ARMY PERSONNEL
Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro
• Over 4000 soldiers surveyed
• Over 100 measurements
• Reported by gender
Selection bias in the data: test subjects are in better shape than the rest of us...
FIRST LOOK
[Plot: Density vs. Height [mm]]
• Difference in distribution
• Significant overlap
SECOND DIMENSION
[Plot: Weight [10^-1 kg] vs. Height [mm]]
• Correlation
• Overlap
FEATURE SELECTION
[Plot: Weight [10^-1 kg] vs. “Buttock Circumference” [mm]]
• Correlation
• Gender-specific slope
• Reduced overlap
• Selection of features matters
• How to make a prediction?
K-NEAREST NEIGHBOR
[Plot: Weight [10^-1 kg] vs. “Buttock Circumference” [mm]; classes m and f]
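The k-nearest-neighbor idea on this slide can be sketched in a few lines. This is only an illustration, not the code behind the plot: the sample points, the choice k=3, and the name `knn_predict` are assumptions made up for this example. Only the feature pair (buttock circumference in mm, weight in tenths of a kg) and the labels m/f come from the slide.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    # train: list of ((circumference_mm, weight_tenth_kg), label) pairs
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, hand-made points (NOT taken from the survey data).
train = [
    ((950, 700), "m"), ((980, 780), "m"), ((1000, 850), "m"),
    ((960, 600), "f"), ((1000, 650), "f"), ((1040, 720), "f"),
]
print(knn_predict(train, (990, 800)))
```

Plain Euclidean distance is used here without scaling the two axes; with features on very different scales, one would usually normalize first.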
SUPPORT VECTOR MACHINE
[Plot: Weight [10^-1 kg] vs. “Buttock Circumference” [mm]]
SUPPORT VECTOR MACHINE
[Plot: Weight [10^-1 kg] vs. “Buttock Circumference” [mm]]
• Overfitting
• Classifier does not generalize
• Let’s take a closer look…
CROSS VALIDATION
TRAIN TRAIN TRAIN TEST
TRAIN TRAIN TEST TRAIN
TRAIN TEST TRAIN TRAIN
TEST TRAIN TRAIN TRAIN
• Divide data into k folds
• Train on k-1 folds, test on the remaining one
• Repeat k times for all folds
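The three steps above can be sketched as a small index generator. This is a minimal illustration under stated assumptions: the name `k_fold_indices` is hypothetical, the folds are contiguous, and shuffling (usually done first in practice) is left out for brevity.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size

# Four folds, as in the TRAIN/TEST diagram (the order of the folds does not matter).
for train_idx, test_idx in k_fold_indices(8, 4):
    print("train", train_idx, "test", test_idx)
```

Averaging the k test scores gives an estimate of how well the classifier generalizes to data it was not trained on.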
LET’S CLASSIFY
[Plot: Weight [10^-1 kg] vs. “Buttock Circumference” [mm]]
• Classifier generalizes
• Note some misclassifications
• Let’s assume we want to detect males (blue)
  § I.e. “blue” is our positive class
LET’S CLASSIFY
[Plot: Weight [10^-1 kg] vs. “Buttock Circumference” [mm]]
• Get more “blue” right (true positives)
• Get more “red” wrong (false positives)
RECEIVER OPERATING CHARACTERISTICS CURVE
[Plot: True Positive Rate vs. False Positive Rate]
Detect more by accepting more false positives
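The trade-off on this slide can be reproduced by sweeping a decision threshold over classifier scores and recording one (false positive rate, true positive rate) pair per threshold. The sketch below is an illustration only: the scores and labels are invented, and `roc_points` is a hypothetical helper name.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per decision threshold, from high to low."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is flagged
    for threshold in sorted(set(scores), reverse=True):
        flagged = [label for score, label in zip(scores, labels) if score >= threshold]
        true_pos = sum(flagged)
        false_pos = len(flagged) - true_pos
        points.append((false_pos / negatives, true_pos / positives))
    return points

# Hypothetical decision values; label 1 is the positive ("blue") class.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold never decreases either rate, which is why the curve only moves up and to the right.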
THREE DIMENSIONS
MORE DIMENSIONS
[Plot: Density vs. Decision Value]
• Linear model in ~160 dimensions
• Linearly separable
Source: http://playground.tensorflow.org/
TREES AND TREE ENSEMBLES
SPARSE FEATURES
area codes
400 401 402 403 404 405 406 407 408 409 410 411 412 413 414
  0   0   0   1   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
  0   1   0   0   0   0   0   0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0   0   0   0   1   0   0   0
  0   0   0   0   0   0   0   1   0   0   0   0   0   0   0
  0   0   0   0   1   0   0   0   0   0   0   0   0   0   0
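The matrix above is a one-hot encoding: each row has a single 1 in the column of the area code it represents. A minimal sketch of how such rows could be produced follows. The helper name `one_hot` is hypothetical, and the list of observed codes is simply read off the six rows of the slide.

```python
def one_hot(value, vocabulary):
    """Return a 0/1 list with a single 1 at the position of `value`."""
    return [1 if candidate == value else 0 for candidate in vocabulary]

area_codes = list(range(400, 415))          # the 15 columns shown on the slide
observed = [403, 404, 401, 411, 407, 404]   # decoded from the six rows above

rows = [one_hot(code, area_codes) for code in observed]
for row in rows:
    print(" ".join(map(str, row)))

# A sparse representation stores only the position of the non-zero entry.
sparse_rows = [area_codes.index(code) for code in observed]
print(sparse_rows)
```

With thousands of possible values, almost every entry is 0, which is why such features are called sparse and are usually stored by index rather than as full rows.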
N-GRAMS
43 72 6F 77 64 53 74 72 69 6B 65
43726F 726F77 6F7764
776453 645374 537472
747269 72696B 696B65
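The byte sequence on this slide is the ASCII string "CrowdStrike", and the nine values below it are its overlapping 3-byte windows. A short sketch of that sliding-window extraction, where the function name `byte_ngrams` is an assumption of this example:

```python
def byte_ngrams(data, n=3):
    """Return every run of n consecutive bytes as an uppercase hex string."""
    return [data[i:i + n].hex().upper() for i in range(len(data) - n + 1)]

grams = byte_ngrams(b"CrowdStrike", n=3)
print(grams)
```

Each distinct n-gram can then become one column of a sparse feature vector, which is one way the number of dimensions grows very quickly.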
MISSION ACCOMPLISHED: WE JUST ADD MORE DIMENSIONS… RIGHT?
CURSE OF DIMENSIONALITY
• REDUCED predictive performance
• INCREASED training time
• SLOWER classification
• LARGER memory footprint
Source: https://commons.wikimedia.org/w/index.php?curid=2257082
DIMENSIONALITY AND SPARSENESS
[Plot: Weight [10^-1 kg] vs. Height (mm)]
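One way to see why more dimensions mean sparser data: if every axis is cut into a fixed number of bins, the number of grid cells grows exponentially with the number of dimensions, while the number of samples stays the same. The sketch below is a back-of-the-envelope illustration, not the computation behind the slide. It assumes 4000 samples (the deck says "over 4000") and 10 bins per axis, and `max_occupied_fraction` is a hypothetical name.

```python
def max_occupied_fraction(n_samples, bins_per_axis, n_dims):
    """Upper bound on the share of grid cells that can contain a sample."""
    n_cells = bins_per_axis ** n_dims
    return min(1.0, n_samples / n_cells)

# Assumptions: 4000 samples and 10 bins per axis.
for n_dims in (1, 2, 3, 4, 6):
    share = max_occupied_fraction(4000, 10, n_dims)
    print(f"{n_dims} dims: {10 ** n_dims} cells, at most {share:.4%} occupied")
```

This is only an upper bound: real data clusters, so the occupied share is typically lower still.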
MANAGING DIMENSIONALITY
• FEATURE ELIMINATION
– Feature ranking
– Stop words
• FEATURE REDUCTION
– Principal Component Analysis
– Autoencoders
– Points on lower-dimensional manifold
– Stemming
• ENSEMBLE METHODS
– Classifier of classifiers, e.g. stacking
– Bagging and subspace sampling, e.g. Random Forests
• And much, much more…
SECURITY APPLICATIONS
FILE ANALYSIS
AKA Static Analysis
• THE GOOD
– Relatively fast
– Scalable
– No need to detonate
– Platform independent, can be done at gateway
– Can support file similarity analysis
• THE BAD
– Limited insight due to narrow view
– Different file types require different techniques
– Different subtypes need special consideration
  – Packed files
  – .NET
  – Installers
  – EXEs vs DLLs
– Obfuscations (yet good if detectable)
– Ineffective against exploitation and malware-less attacks
– Asymmetry: a fraction of a second to decide for the defender, months to craft for the attacker
EXAMPLE FEATURES
• 32/64-BIT EXECUTABLE
• GUI SUBSYSTEM
• COMMAND LINE SUBSYSTEM
• FILE SIZE
• TIMESTAMP
• DEBUG INFORMATION PRESENT
• PACKER TYPE
• FILE ENTROPY
• NUMBER OF SECTIONS
• NUMBER WRITABLE
• NUMBER READABLE
• NUMBER EXECUTABLE
• DISTRIBUTION OF SECTION ENTROPY
• IMPORTED DLL NAMES
• IMPORTED FUNCTION NAMES
• COMPILER ARTIFACTS
• LINKER ARTIFACTS
• RESOURCE DATA
• EMBEDDED PROTOCOL STRINGS
• EMBEDDED IPS/DOMAINS
• EMBEDDED PATHS
• EMBEDDED PRODUCT METADATA
• DIGITAL SIGNATURE
• ICON CONTENT
• …
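As an illustration of one feature from this list, file entropy is commonly computed as the Shannon entropy of the byte histogram. The deck does not say how its entropy feature is defined, so the sketch below is an assumption rather than the author's implementation, and `shannon_entropy` is a hypothetical name.

```python
import math
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy of a byte string in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    counts = Counter(data)
    total = len(data)
    # Sum of p * log2(1/p) over all byte values that occur.
    return sum((c / total) * math.log2(total / c) for c in counts.values())

print(shannon_entropy(b"\x00" * 1024))         # constant padding
print(shannon_entropy(bytes(range(256)) * 4))  # every byte value equally often
```

Values near 8 bits per byte tend to indicate compressed or encrypted content, which is why entropy is often used as a hint that a file is packed. Applying the same function to each section separately would give a per-section variant.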
COMBINING FEATURES
• Projection to show clusters
• For illustration; not the space in which we classify
EXECUTION ANALYSIS
AKA Dynamic Analysis
• THE GOOD
– Captures actual behavior of file
– Obfuscating behavior is hard
– Effective against exploitation
– Effective against malware-less attacks
– Not dependent on awareness of specific file types
• THE BAD
– File needs to be executed
– Takes additional time to observe execution
– Execution depends on environment (e.g. sandbox vs real world)
EXAMPLE: GLOBAL BEHAVIOR
§ Behavior across many executions of a file
§ Conducted on event data centrally located in the cloud
Krasser, S., Meyer, B., & Crenshaw, P. (2015). Valkyrie: Behavioral Malware Detection using Global Kernel-level Telemetry Data. In Proceedings of the 2015 IEEE International Workshop on Machine Learning for Signal Processing.
ML VS OTHER TECHNIQUES
§ ML output is probabilistic
§ Use other techniques where appropriate
§ Most ML-based engines use standard hashes or fuzzy hashes on top of a model
§ Example: credentials theft IoA
EVALUATING ML SOLUTIONS
PRELIMINARIES
§ ML is not a feature, it is an implementation detail
§ Every solution must make trade-offs of conflicting objectives
  § FP vs TP
  § Speed vs accuracy
  § Memory footprint vs accuracy
  § Expressiveness vs explainability
§ Benchmarks under different assumptions are very hard to compare, even internally
§ Marchitecture
§ Looking at the right data: 60% of intrusions do not involve malware
SCOPE: SCALE
How much data is there to train on?
§ Volume of data generated by sources used
§ Aperture: footprint of deployment
§ Data collection
§ Point of analysis (endpoint, on-prem, cloud)
SCOPE: BREADTH
How many data sources are used?
§ Varied sources and techniques
§ Static analysis
§ Behavioral analysis
§ Proliferation
§ Indicators from other techniques
§ Access to historical data
§ Baseline
§ Process lineage
§ “Number of characteristics” is not a useful metric
DETECTION RATE
§ Detection rate w/o false positive rate is meaningless
§ Considering the base rate is important
  § System
    § 100k clean files, 1 malware file
    § 99% TPR at 0.1% FPR → 100 FPs, 1 TP
  § Downloads
    § 1k clean files, 1 malware file
    § 99% TPR at 0.1% FPR → 1 FP, 1 TP
§ Sourcing of test files skews results
§ Number of samples used to measure (often too small)
[Plot: True Positive Rate vs. False Positive Rate]
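The base-rate arithmetic on this slide can be checked directly. The sketch below uses the slide's own numbers (99% TPR, 0.1% FPR, one malware file among 100k or 1k clean files). The function name `expected_alerts` and the added precision figure are assumptions of this example. The slide rounds the expected 0.99 true positives to 1.

```python
def expected_alerts(n_clean, n_malware, tpr, fpr):
    """Expected true positives, false positives, and precision of the alerts."""
    true_pos = n_malware * tpr
    false_pos = n_clean * fpr
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision

for name, n_clean in (("System", 100_000), ("Downloads", 1_000)):
    tp, fp, precision = expected_alerts(n_clean, 1, tpr=0.99, fpr=0.001)
    print(f"{name}: {tp:.2f} TP, {fp:.0f} FP, precision {precision:.1%}")
```

The same classifier therefore yields roughly one real detection per hundred alerts in the first setting and about one in two in the second, which is why a detection rate quoted without the false positive rate and the base rate says little.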
APTS & 99% OF MALWARE DETECTED…
APTS (CONT.)
§ Combine techniques to offset tradeoffs
§ Static and behavioral
§ ML and non-ML
§ Lean local techniques and heavy-weight cloud techniques
§ Avoid silent failure: what happens when the adversary made it onto the system?
§ Avoid brittle techniques: does the solution depend on the attacker not having access to detection results?
KEY POINTS
• Machine Learning is an important part of the security tool chest
• Hidden untapped structure in your data
• Various trade-offs, most importantly between true and false positives
• Dimensionality is good…until it’s not
• Not all dimensions are created equal
• Comprehensive coverage by combining techniques
A Sober Look at Machine Learning

Contenu connexe

Tendances

Federated Storage Resources GCC2018 https://vimeo.com/291738189
Federated Storage Resources GCC2018 https://vimeo.com/291738189Federated Storage Resources GCC2018 https://vimeo.com/291738189
Federated Storage Resources GCC2018 https://vimeo.com/291738189
Vahid Jalili
 

Tendances (20)

Federated Storage Resources GCC2018 https://vimeo.com/291738189
Federated Storage Resources GCC2018 https://vimeo.com/291738189Federated Storage Resources GCC2018 https://vimeo.com/291738189
Federated Storage Resources GCC2018 https://vimeo.com/291738189
 
Elastic Stack Roadmap
Elastic Stack RoadmapElastic Stack Roadmap
Elastic Stack Roadmap
 
The Art and Science of Alert Triage
The Art and Science of Alert TriageThe Art and Science of Alert Triage
The Art and Science of Alert Triage
 
Au cœur de la roadmap de la Suite Elastic
Au cœur de la roadmap de la Suite ElasticAu cœur de la roadmap de la Suite Elastic
Au cœur de la roadmap de la Suite Elastic
 
Modernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led TrainingModernizing Your SOC: A CISO-led Training
Modernizing Your SOC: A CISO-led Training
 
Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)Threat Hunting Platforms (Collaboration with SANS Institute)
Threat Hunting Platforms (Collaboration with SANS Institute)
 
Machine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting StartedMachine Learning for Incident Detection: Getting Started
Machine Learning for Incident Detection: Getting Started
 
SQRRL threat hunting platform
SQRRL threat hunting platformSQRRL threat hunting platform
SQRRL threat hunting platform
 
Art into Science 2017 - Investigation Theory: A Cognitive Approach
Art into Science 2017 - Investigation Theory: A Cognitive ApproachArt into Science 2017 - Investigation Theory: A Cognitive Approach
Art into Science 2017 - Investigation Theory: A Cognitive Approach
 
Abstract Tools for Effective Threat Hunting
Abstract Tools for Effective Threat HuntingAbstract Tools for Effective Threat Hunting
Abstract Tools for Effective Threat Hunting
 
Troubleshooting your elasticsearch cluster like a support engineer
Troubleshooting your elasticsearch cluster like a support engineerTroubleshooting your elasticsearch cluster like a support engineer
Troubleshooting your elasticsearch cluster like a support engineer
 
Optimizing Elastic for Search at McQueen Solutions
Optimizing Elastic for Search at McQueen SolutionsOptimizing Elastic for Search at McQueen Solutions
Optimizing Elastic for Search at McQueen Solutions
 
Threat Hunting 102: Beyond the Basics
Threat Hunting 102: Beyond the BasicsThreat Hunting 102: Beyond the Basics
Threat Hunting 102: Beyond the Basics
 
Scaling and Managing Big Data Apps in the Cloud
Scaling and Managing Big Data Apps in the CloudScaling and Managing Big Data Apps in the Cloud
Scaling and Managing Big Data Apps in the Cloud
 
SOC2016 - The Investigation Labyrinth
SOC2016 - The Investigation LabyrinthSOC2016 - The Investigation Labyrinth
SOC2016 - The Investigation Labyrinth
 
VariantSpark - a Spark library for genomics
VariantSpark - a Spark library for genomicsVariantSpark - a Spark library for genomics
VariantSpark - a Spark library for genomics
 
University of Oxford: building a next generation SIEM
University of Oxford: building a next generation SIEMUniversity of Oxford: building a next generation SIEM
University of Oxford: building a next generation SIEM
 
Leveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your HuntsLeveraging Threat Intelligence to Guide Your Hunts
Leveraging Threat Intelligence to Guide Your Hunts
 
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior GraphUser and Entity Behavior Analytics using the Sqrrl Behavior Graph
User and Entity Behavior Analytics using the Sqrrl Behavior Graph
 
In that case, we have an OWASP Top 10 opportunity...
In that case, we have an OWASP Top 10 opportunity...In that case, we have an OWASP Top 10 opportunity...
In that case, we have an OWASP Top 10 opportunity...
 

Similaire à A Sober Look at Machine Learning

How to Replace Your Legacy Antivirus Solution with CrowdStrike
How to Replace Your Legacy Antivirus Solution with CrowdStrikeHow to Replace Your Legacy Antivirus Solution with CrowdStrike
How to Replace Your Legacy Antivirus Solution with CrowdStrike
Adam Barrera
 
CrowdStrike CrowdCast: Is Ransomware Morphing Beyond The Ability Of Standard ...
CrowdStrike CrowdCast: Is Ransomware Morphing Beyond The Ability Of Standard ...CrowdStrike CrowdCast: Is Ransomware Morphing Beyond The Ability Of Standard ...
CrowdStrike CrowdCast: Is Ransomware Morphing Beyond The Ability Of Standard ...
CrowdStrike
 
BSides San Diego 2017 - Sophisticuffs: The rumble over adversary sophistication
BSides San Diego 2017 - Sophisticuffs: The rumble over adversary sophisticationBSides San Diego 2017 - Sophisticuffs: The rumble over adversary sophistication
BSides San Diego 2017 - Sophisticuffs: The rumble over adversary sophistication
Paül Jaramillo
 
Qconny2014dmarsh 140613080328-phpapp02
Qconny2014dmarsh 140613080328-phpapp02Qconny2014dmarsh 140613080328-phpapp02
Qconny2014dmarsh 140613080328-phpapp02
재구 김
 
Building private-clouds-qconsf
Building private-clouds-qconsfBuilding private-clouds-qconsf
Building private-clouds-qconsf
Andrew Shafer
 

Similaire à A Sober Look at Machine Learning (20)

Battling Unknown Malware with Machine Learning
Battling Unknown Malware with Machine Learning Battling Unknown Malware with Machine Learning
Battling Unknown Malware with Machine Learning
 
Startupfest 2012 - Coefficients of friction
Startupfest 2012 - Coefficients of frictionStartupfest 2012 - Coefficients of friction
Startupfest 2012 - Coefficients of friction
 
MITRE ATTACKcon Power Hour - October
MITRE ATTACKcon Power Hour - OctoberMITRE ATTACKcon Power Hour - October
MITRE ATTACKcon Power Hour - October
 
How to Replace Your Legacy Antivirus Solution with CrowdStrike
How to Replace Your Legacy Antivirus Solution with CrowdStrikeHow to Replace Your Legacy Antivirus Solution with CrowdStrike
How to Replace Your Legacy Antivirus Solution with CrowdStrike
 
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
Runbook Automation: Old News or a Key to Unlock Performance? [DOES2020]
 
CrowdStrike CrowdCast: Is Ransomware Morphing Beyond The Ability Of Standard ...
CrowdStrike CrowdCast: Is Ransomware Morphing Beyond The Ability Of Standard ...CrowdStrike CrowdCast: Is Ransomware Morphing Beyond The Ability Of Standard ...
CrowdStrike CrowdCast: Is Ransomware Morphing Beyond The Ability Of Standard ...
 
BSides San Diego 2017 - Sophisticuffs: The rumble over adversary sophistication
BSides San Diego 2017 - Sophisticuffs: The rumble over adversary sophisticationBSides San Diego 2017 - Sophisticuffs: The rumble over adversary sophistication
BSides San Diego 2017 - Sophisticuffs: The rumble over adversary sophistication
 
Bsides Chicago2017
Bsides Chicago2017Bsides Chicago2017
Bsides Chicago2017
 
Worldwide Public Sector Breakfast Hosted by Teresa Carlson (WPS01) - AWS re:I...
Worldwide Public Sector Breakfast Hosted by Teresa Carlson (WPS01) - AWS re:I...Worldwide Public Sector Breakfast Hosted by Teresa Carlson (WPS01) - AWS re:I...
Worldwide Public Sector Breakfast Hosted by Teresa Carlson (WPS01) - AWS re:I...
 
Qconny2014dmarsh 140613080328-phpapp02
Qconny2014dmarsh 140613080328-phpapp02Qconny2014dmarsh 140613080328-phpapp02
Qconny2014dmarsh 140613080328-phpapp02
 
My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015
My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015
My Futuristic Vision of the Future of Cassandra's Future - NGCC 2015
 
Building private-clouds-qconsf
Building private-clouds-qconsfBuilding private-clouds-qconsf
Building private-clouds-qconsf
 
Fast Delivery DevOps Israel
Fast Delivery DevOps IsraelFast Delivery DevOps Israel
Fast Delivery DevOps Israel
 
DevOps: From Industry Buzzword to Real Implementation / Real Benefits
DevOps: From Industry Buzzword to Real Implementation / Real BenefitsDevOps: From Industry Buzzword to Real Implementation / Real Benefits
DevOps: From Industry Buzzword to Real Implementation / Real Benefits
 
Continuous Testing
Continuous TestingContinuous Testing
Continuous Testing
 
The Art of Visibility: Enabling Multi-Platform Management
The Art of Visibility: Enabling Multi-Platform ManagementThe Art of Visibility: Enabling Multi-Platform Management
The Art of Visibility: Enabling Multi-Platform Management
 
Genetic Malware
Genetic MalwareGenetic Malware
Genetic Malware
 
Genetic Malware
Genetic MalwareGenetic Malware
Genetic Malware
 
The New Normal: Managing the constant stream of new vulnerabilities
The New Normal: Managing the constant stream of new vulnerabilitiesThe New Normal: Managing the constant stream of new vulnerabilities
The New Normal: Managing the constant stream of new vulnerabilities
 
Microservices Manchester: Keynote. Microservices are so 2015, What's Next? By...
Microservices Manchester: Keynote. Microservices are so 2015, What's Next? By...Microservices Manchester: Keynote. Microservices are so 2015, What's Next? By...
Microservices Manchester: Keynote. Microservices are so 2015, What's Next? By...
 

Dernier

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 

Dernier (20)

➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Detecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning ApproachDetecting Credit Card Fraud: A Machine Learning Approach
Detecting Credit Card Fraud: A Machine Learning Approach
 
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men  🔝mahisagar🔝   Esc...
➥🔝 7737669865 🔝▻ mahisagar Call-girls in Women Seeking Men 🔝mahisagar🔝 Esc...
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
Digital Advertising Lecture for Advanced Digital & Social Media Strategy at U...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men  🔝Thrissur🔝   Escor...
➥🔝 7737669865 🔝▻ Thrissur Call-girls in Women Seeking Men 🔝Thrissur🔝 Escor...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
5CL-ADBA,5cladba, Chinese supplier, safety is guaranteed
 

A Sober Look at Machine Learning

  • 1. 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED. A SOBER LOOK AT MACHINE LEARNING DR. SVEN KRASSER CHIEF SCIENTIST @SVENKRASSER
  • 2. 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED. Distinguishing Science… Source: CERN, http://home.cern/sites/home.web.cern.ch/files/image/experiment/2013/01/cms_0.jpeg
  • 3. …from FictionSource: “Chain Reaction,” 20th Century Fox
  • 4. MACHINE LEARNING 101 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  • 5. EXAMPLES OF MACHINE LEARNING 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED. SPAM FILTERING MOVIE RECOMMENDATIONS SIRI (iPHONE)
  • 6. TODAY’S FOCUS: SUPERVISED LEARNING 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  • 7. TODAY’S FOCUS: GEOMETRIC MODELS 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  • 8. EVERYTHING YOU WILL SEE TODAY IS REAL WORLD DATA 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  • 9. Some Data to Get Started: 1988 ANTHROPOMETRIC SURVEY OF ARMY PERSONNEL Source: http://mreed.umtri.umich.edu/mreed/downloads.html#anthro 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  • 10. • Over 4000 soldiers surveyed • Over 100 measurements • Reported by gender Test subjects are in better shape than the rest of us... Data Selection Bias 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  • 11. FIRST LOOK Height [mm] Density • Difference in distribution • Significant overlap 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  • 12. SECOND DIMENSION Height [mm] Weight[10-1 kg] • Correlation • Overlap 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  • 13. FEATURE SELECTION “Buttock Circumference” [mm] Weight[10-1 kg] • Correlation • Gender-specific slope • Reduced overlap • Selection of features matters • How to make a prediction? 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  • 14. K-NEAREST NEIGHBOR “Buttock Circumference” [mm] Weight[10-1 kg] m f 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  • 15. SUPPORT VECTOR MACHINE “Buttock Circumference” [mm] Weight[10-1 kg] 2016 CROWDSTRIKE, INC. ALL RIGHTS RESERVED.
  • 16. SUPPORT VECTOR MACHINE 2016 CrowdStrike, Inc. All rights reserved. “Buttock Circumference” [mm] Weight[10-1 kg] • Overfitting • Classifier does not generalize • Let’s take a closer look…
  • 17. CROSS VALIDATION (figure: four rounds over four folds; each round holds out a different fold as TEST and uses the other three as TRAIN) • Divide data into k folds • Train on k-1 folds, test on the remaining one • Repeat k times for all folds
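The three steps on the cross-validation slide can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name `k_fold_indices`, the shuffling seed, and the sample count are assumptions made for the example, not part of the talk.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train, test) index lists for k-fold cross validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Divide the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        # Test on fold i, train on the remaining k-1 folds.
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical example: 12 samples, 4 folds (as in the slide's figure).
splits = list(k_fold_indices(n_samples=12, k=4))
for round_no, (train, test) in enumerate(splits, start=1):
    print(f"round {round_no}: train on {len(train)} samples, test on {len(test)}")
```

Every sample lands in the test fold exactly once, so the averaged test score reflects performance on data the model did not see during training.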
  • 18. LET’S CLASSIFY (axes: “Buttock Circumference” [mm] vs. Weight [10⁻¹ kg]) • Classifier generalizes • Note some misclassifications • Let’s assume we want to detect males (blue) – I.e. “blue” is our positive class
  • 19. LET’S CLASSIFY (axes: “Buttock Circumference” [mm] vs. Weight [10⁻¹ kg])
  • 20. LET’S CLASSIFY (axes: “Buttock Circumference” [mm] vs. Weight [10⁻¹ kg])
  • 21. LET’S CLASSIFY (axes: “Buttock Circumference” [mm] vs. Weight [10⁻¹ kg])
  • 22. LET’S CLASSIFY (axes: “Buttock Circumference” [mm] vs. Weight [10⁻¹ kg])
  • 23. LET’S CLASSIFY (axes: “Buttock Circumference” [mm] vs. Weight [10⁻¹ kg]) • Get more “blue” right (true positives) • Get more “red” wrong (false positives)
  • 24. RECEIVER OPERATING CHARACTERISTICS CURVE (axes: False Positive Rate vs. True Positive Rate) • Detect more by accepting more false positives
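The trade-off on the ROC slide can be made concrete by sweeping a decision threshold over classifier scores and recording the false and true positive rates at each step. The sketch below is one simple way to do this in plain Python; the scores and labels are made up for illustration and are not the survey data from the talk.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per decision threshold.

    labels: 1 = positive class, 0 = negative class.
    A sample is predicted positive when its score >= threshold.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing is detected
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Hypothetical decision values and true labels, invented for this example.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
curve = roc_points(scores, labels)
for fpr, tpr in curve:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Lowering the threshold never decreases either rate, which is why detecting more positives comes at the cost of more false positives.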
  • 25. THREE DIMENSIONS
  • 26. MORE DIMENSIONS (axes: Decision Value vs. Density) • Linear model in ~160 dimensions • Linearly separable
  • 28. TREES AND TREE ENSEMBLES
  • 29. SPARSE FEATURES (figure: 0/1 matrix with one column per area code, 400 to 414; rows are almost entirely 0, with a 1 marking the row’s area code)
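The matrix on the sparse-features slide looks like a one-hot encoding of a categorical value. As a rough sketch, assuming that reading, the encoding can be produced as follows; the `one_hot` helper and the sample area codes are hypothetical and chosen only for illustration.

```python
def one_hot(values, categories):
    """Encode each value as a 0/1 row with a single 1 in its category's column."""
    column = {category: i for i, category in enumerate(categories)}
    rows = []
    for value in values:
        row = [0] * len(categories)
        row[column[value]] = 1
        rows.append(row)
    return rows


area_codes = list(range(400, 415))  # columns 400..414, as on the slide
samples = [403, 404, 401]           # hypothetical observations
matrix = one_hot(samples, area_codes)
for code, row in zip(samples, matrix):
    print(code, row)
```

Each categorical feature adds one dimension per distinct value, so most entries are zero; that is the sparseness the slide title refers to.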
  • 30. N-GRAMS • Bytes: 43 72 6F 77 64 53 74 72 69 6B 65 • 3-grams: 43726F, 726F77, 6F7764, 776453, 645374, 537472, 747269, 72696B, 696B65
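The bytes on the n-grams slide are the ASCII encoding of "CrowdStrike", and the nine values below them are its overlapping 3-byte windows. A minimal sketch of that sliding-window extraction follows; the function name `byte_ngrams` is an assumption for the example.

```python
def byte_ngrams(data: bytes, n: int = 3):
    """Return every overlapping n-byte window of data as an uppercase hex string."""
    return [data[i:i + n].hex().upper() for i in range(len(data) - n + 1)]


grams = byte_ngrams(b"CrowdStrike")
print(grams)
# ['43726F', '726F77', '6F7764', '776453', '645374',
#  '537472', '747269', '72696B', '696B65']
```

Treating each distinct n-gram as its own feature is one way a file's raw bytes can turn into a very high-dimensional, sparse feature space.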
  • 31. MISSION ACCOMPLISHED: WE JUST ADD MORE DIMENSIONS… RIGHT?
  • 32. CURSE OF DIMENSIONALITY • REDUCED predictive performance • INCREASED training time • SLOWER classification • LARGER memory footprint
  • 36. DIMENSIONALITY AND SPARSENESS (axes: Height [mm] vs. Weight [10⁻¹ kg])
  • 37. DIMENSIONALITY AND SPARSENESS (axes: Height [mm] vs. Weight [10⁻¹ kg])
  • 38. MANAGING DIMENSIONALITY • FEATURE ELIMINATION – Feature ranking – Stop words • FEATURE REDUCTION – Principal Component Analysis – Autoencoders – Points on lower-dimensional manifold – Stemming • ENSEMBLE METHODS – Classifier of classifiers, e.g. stacking – Bagging and subspace sampling, e.g. Random Forests • And much, much more…
  • 39. SECURITY APPLICATIONS
  • 40. FILE ANALYSIS (AKA Static Analysis) • THE GOOD – Relatively fast – Scalable – No need to detonate – Platform independent, can be done at gateway – Can support file similarity analysis • THE BAD – Limited insight due to narrow view – Different file types require different techniques – Different subtypes need special consideration (packed files, .NET, installers, EXEs vs DLLs) – Obfuscations (yet good if detectable) – Ineffective against exploitation and malware-less attacks – Asymmetry: a fraction of a second to decide for the defender, months to craft for the attacker
  • 41. EXAMPLE FEATURES: 32/64-BIT EXECUTABLE, GUI SUBSYSTEM, COMMAND LINE SUBSYSTEM, FILE SIZE, TIMESTAMP, DEBUG INFORMATION PRESENT, PACKER TYPE, FILE ENTROPY, NUMBER OF SECTIONS, NUMBER WRITABLE, NUMBER READABLE, NUMBER EXECUTABLE, DISTRIBUTION OF SECTION ENTROPY, IMPORTED DLL NAMES, IMPORTED FUNCTION NAMES, COMPILER ARTIFACTS, LINKER ARTIFACTS, RESOURCE DATA, EMBEDDED PROTOCOL STRINGS, EMBEDDED IPS/DOMAINS, EMBEDDED PATHS, EMBEDDED PRODUCT METADATA, DIGITAL SIGNATURE, ICON CONTENT, …
  • 42. COMBINING FEATURES • Projection to show clusters • For illustration only, not the space in which we classify
  • 43. EXECUTION ANALYSIS (AKA Dynamic Analysis) • THE GOOD – Captures actual behavior of file – Obfuscating behavior is hard – Effective against exploitation – Effective against malware-less attacks – Not dependent on awareness of specific file types • THE BAD – File needs to be executed – Takes additional time to observe execution – Execution depends on environment (e.g. sandbox vs real world)
  • 44. EXAMPLE: GLOBAL BEHAVIOR • Behavior across many executions of a file • Conducted on event data centrally located in the cloud • Reference: Krasser, S., Meyer, B., & Crenshaw, P. (2015). Valkyrie: Behavioral Malware Detection using Global Kernel-level Telemetry Data. In Proceedings of the 2015 IEEE International Workshop on Machine Learning for Signal Processing.
  • 45. ML VS OTHER TECHNIQUES • ML output is probabilistic • Use other techniques where appropriate • Most ML-based engines use standard hashes or fuzzy hashes on top of a model • Example: credentials theft IoA
  • 46. EVALUATING ML SOLUTIONS
  • 47. PRELIMINARIES • ML is not a feature, it is an implementation detail • Every solution must make trade-offs between conflicting objectives – FP vs TP – Speed vs accuracy – Memory footprint vs accuracy – Expressiveness vs explainability • Benchmarks under different assumptions are very hard to compare, even internally • Marchitecture • Looking at the right data: 60% of intrusions do not involve malware
  • 48. SCOPE: SCALE. How much data is there to train on? • Volume of data generated by sources used • Aperture: footprint of deployment • Data collection • Point of analysis (endpoint, on-prem, cloud)
  • 49. SCOPE: BREADTH. How many data sources are used? • Varied sources and techniques • Static analysis • Behavioral analysis • Proliferation • Indicators from other techniques • Access to historical data • Baseline • Process lineage • “Number of characteristics” is not a useful metric
  • 50. DETECTION RATE (figure axes: False Positive Rate vs. True Positive Rate) • Detection rate w/o false positive rate is meaningless • Considering the base rate is important – System: 100k clean files, 1 malware file; 99% TPR at 0.1% FPR → 100 FPs, 1 TP – Downloads: 1k clean files, 1 malware file; 99% TPR at 0.1% FPR → 1 FP, 1 TP • Sourcing of test files skews results • Number of samples used to measure (often too small)
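The base-rate arithmetic on the detection-rate slide can be reproduced directly. The sketch below computes expected false and true positive counts for the two scenarios on the slide, plus the resulting precision (the share of alerts that are real malware); the function name `expected_alerts` and the precision figure are additions for illustration, not numbers quoted in the talk.

```python
def expected_alerts(n_clean, n_malware, tpr, fpr):
    """Return expected false positives, true positives, and precision."""
    false_positives = n_clean * fpr
    true_positives = n_malware * tpr
    precision = true_positives / (true_positives + false_positives)
    return false_positives, true_positives, precision


# Scenario names and counts follow the slide: same classifier, different base rates.
scenarios = {
    "System": (100_000, 1),   # 100k clean files, 1 malware file
    "Downloads": (1_000, 1),  # 1k clean files, 1 malware file
}
results = {}
for name, (n_clean, n_malware) in scenarios.items():
    results[name] = expected_alerts(n_clean, n_malware, tpr=0.99, fpr=0.001)
    fp, tp, precision = results[name]
    print(f"{name}: ~{fp:.0f} FPs, ~{tp:.0f} TP, precision {precision:.1%}")
```

With identical TPR and FPR, roughly one alert in a hundred is real in the first scenario and roughly one in two in the second, which is why a detection rate quoted without the false positive rate and base rate says little.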
  • 51. APTS & 99% OF MALWARE DETECTED…
  • 52. APTS (CONT.) • Combine techniques to offset tradeoffs – Static and behavioral – ML and non-ML – Lean local techniques and heavy-weight cloud techniques • Avoid silent failure: what happens when the adversary made it onto the system? • Avoid brittle techniques: does the solution depend on the attacker not having access to detection results?
  • 53. KEY POINTS • Machine Learning is an important part of the security tool chest • Hidden untapped structure in your data • Various trade-offs, most importantly between true and false positives • Dimensionality is good…until it’s not • Not all dimensions are created equal • Comprehensive coverage by combining techniques