"You can't just turn the crank"
Machine learning for fighting abuse on the consumer web
David Freeman
Research Scientist/Engineer, Facebook Inc.
ScAINet 2018
Atlanta, GA USA, 11 May 2018
The consumer web
What do they try to do?
Malware, Payment Fraud, Scraping, Click Fraud, Phishing, Spam, Social Engineering, Fake Products, Scams, "Like" Fraud, Promotion Fraud, Identity Theft
What do we see?
Fake Reviews, Misinformation, Financial Theft, Account Resale
Fundamental question: Which requests are bad?
• Perfect for machine learning!
What could possibly go wrong?
Machine learning workflow: Label → Train → Validate → Launch → Measure → Profit! Lots!
How do we obtain labeled data?
(hint: not from your users)
Machine learning workflow: Label
Labeling: Gold standard
Objective measurement
• Human labeling of random samples.
• Labelers don't always know what they're looking for.
• Labelers are inconsistent (with themselves and each other).
• Labelers get tired (especially if most samples are good).
• Apply crowdsourcing best practices:
  • Precise definitions, multiple labeling, ML-assisted sampling.
• But will it scale?
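The "multiple labeling" practice above can be sketched as majority voting plus a per-item agreement score, which also makes labeler inconsistency visible. This is a minimal sketch, assuming each item receives a list of string labels from independent labelers; `aggregate_labels` and the rule that a tie returns `None` (so the item goes back for another vote) are hypothetical choices, not a procedure from the talk.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def aggregate_labels(votes: Dict[str, List[str]]) -> Dict[str, Tuple[Optional[str], float]]:
    """Majority-vote each item's labels and report the share of labelers who agree.

    Hypothetical policy: a tie yields None so the item can be re-queued for
    another labeler instead of being forced into one class.
    """
    results: Dict[str, Tuple[Optional[str], float]] = {}
    for item_id, labels in votes.items():
        if not labels:
            continue  # nothing to aggregate yet
        ranked = Counter(labels).most_common()
        top_label, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        agreement = top_count / len(labels)
        results[item_id] = (None if tied else top_label, agreement)
    return results


votes = {
    "req-1": ["bad", "bad", "good"],    # two of three labelers say bad
    "req-2": ["good", "good", "good"],  # unanimous
    "req-3": ["bad", "good"],           # tie: send back for another vote
}
for item, (label, agreement) in aggregate_labels(votes).items():
    print(item, label, round(agreement, 2))
```

Tracking the agreement score over time gives a rough, cheap check on how consistent the labeling pool is.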
Labeling: Silver standard
Automatic labeling
• Find high-precision signals of badness
  • Examples: unusual user-agent, malformed header
• DO NOT BLOCK ON THESE SIGNALS
  • They are controlled by the adversary
  • When the adversary adapts you will lose visibility
• Automatically generate signals using anomaly detection.
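One way to read "automatically generate signals using anomaly detection" is a per-key volume check: flag user-agent strings (or header fingerprints) whose traffic today sits far above their own history. A minimal sketch, assuming daily request counts per user-agent are available; the z-score rule, both thresholds, and `anomalous_keys` are illustrative assumptions. In keeping with the slide, the output is meant to feed labels, not blocking.

```python
import statistics
from typing import Dict, List


def anomalous_keys(history: Dict[str, List[int]], today: Dict[str, int],
                   z_threshold: float = 3.0, min_count: int = 50) -> List[str]:
    """Return keys (e.g., user-agent strings) whose count today is an outlier.

    history maps each key to its daily request counts over a baseline window.
    The result is meant as a labeling signal, not a blocking rule.
    """
    flagged = []
    for key, count in today.items():
        if count < min_count:
            continue  # ignore low-volume keys to limit noise
        past = history.get(key, [])
        if len(past) < 2:
            # Not enough history for a standard deviation: a new,
            # high-volume key is itself treated as anomalous.
            flagged.append(key)
            continue
        mean = statistics.mean(past)
        # Floor the spread so perfectly flat histories don't divide by zero.
        spread = max(statistics.stdev(past), 1.0)
        if (count - mean) / spread >= z_threshold:
            flagged.append(key)
    return sorted(flagged)


history = {
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)": [1000, 1040, 980, 1010],
    "curl/7.58.0": [3, 5, 4, 2],
}
today = {
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64)": 1030,
    "curl/7.58.0": 400,
    "python-requests/2.18.4": 120,
}
print(anomalous_keys(history, today))
```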
Labeling: Bronze standard
Be scrappy
• Use whatever you have!
  • CS data, rules, other models
• Mitigate risks of blindness and feedback loops:
  • Oversample manually labeled examples.
  • Oversample false positives and false negatives when retraining.
  • Undersample positive examples from previous iterations of this model.
  • Sample and label examples near the decision boundary.
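The sampling mitigations above can be expressed as per-example weights when assembling a retraining set. A minimal sketch under stated assumptions: the `Example` fields, the multipliers (3x, 2x, 0.5x), and the 0.1 window around a 0.5 threshold are all hypothetical, and the slide's "sample and label examples near the decision boundary" is simplified here into extra weight for examples whose score falls in that window.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    example_id: str
    manually_labeled: bool      # label came from a human reviewer
    prior_error: bool           # was a false positive or false negative last time
    prior_model_positive: bool  # label came from this model's previous iteration
    score: float                # current model score in [0, 1]


def sampling_weight(ex: Example, boundary: float = 0.5, margin: float = 0.1) -> float:
    """Hypothetical multipliers mirroring the four mitigations on the slide."""
    weight = 1.0
    if ex.manually_labeled:
        weight *= 3.0  # oversample manually labeled examples
    if ex.prior_error:
        weight *= 2.0  # oversample previous false positives / false negatives
    if ex.prior_model_positive:
        weight *= 0.5  # undersample this model's own past positives
    if abs(ex.score - boundary) <= margin:
        weight *= 2.0  # favor examples near the decision boundary
    return weight


def sample_training_set(pool: List[Example], k: int, seed: int = 0) -> List[Example]:
    """Draw k examples with replacement, in proportion to sampling_weight."""
    rng = random.Random(seed)
    weights = [sampling_weight(ex) for ex in pool]
    return rng.choices(pool, weights=weights, k=k)


pool = [
    Example("human-reviewed", True, False, False, 0.93),
    Example("old-false-positive", False, True, False, 0.81),
    Example("self-labeled", False, False, True, 0.97),
    Example("borderline", False, False, False, 0.52),
]
for ex in pool:
    print(ex.example_id, sampling_weight(ex))
```

Multiplicative weights keep the mitigations independent, so each one can be tuned or switched off without touching the others.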
Labeling: Iron standard
Have your users do the work
• Users are terrible at reporting.
• Product flows bias reporting.
• Reports can be gamed.
• Reports can serve as a directional measure.
Assembling a training set
Labeling is just the beginning
• Segment the problem
  • e.g., statuses with a link from country X
• Downsample intelligently
  • If your distribution is lumpy, sample from all the lumps
• Learning the prior vs. focusing on the bad stuff
  • No golden rule here: you have to experiment
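"Segment the problem" and "sample from all the lumps" amount to stratified sampling. A minimal sketch, assuming each item can be mapped to a segment key such as (content type, country); `stratified_downsample`, the cap of 100 per segment, and the toy data (including country "Y") are hypothetical.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict, Hashable, List, TypeVar

T = TypeVar("T")


def stratified_downsample(items: List[T], segment_of: Callable[[T], Hashable],
                          per_segment: int, seed: int = 0) -> List[T]:
    """Keep at most per_segment items from each segment ("lump").

    Small segments are kept whole, so rare lumps survive instead of being
    swamped by the largest one, as can happen with uniform downsampling.
    """
    rng = random.Random(seed)
    buckets: Dict[Hashable, List[T]] = defaultdict(list)
    for item in items:
        buckets[segment_of(item)].append(item)
    sample: List[T] = []
    for segment in sorted(buckets, key=repr):  # deterministic order
        bucket = buckets[segment]
        if len(bucket) <= per_segment:
            sample.extend(bucket)
        else:
            sample.extend(rng.sample(bucket, per_segment))
    return sample


posts = (
    [{"type": "status_with_link", "country": "X"}] * 1000
    + [{"type": "status_with_link", "country": "Y"}] * 40
    + [{"type": "photo", "country": "X"}] * 5
)
sample = stratified_downsample(posts, lambda p: (p["type"], p["country"]), per_segment=100)
print(Counter((p["type"], p["country"]) for p in sample))
```

Because each segment is capped, the sample no longer reflects the true prior; that is exactly the "learning the prior vs. focusing on the bad stuff" tradeoff the last bullet says to settle by experiment.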
Refreshing your data
Don't forget the past!
[Diagram: Training set 1 → Model v1, Training set 2 → Model v2, ..., Training set N → Model vN]
Mitigation:
• Keep old attacks around (exponential decay?)
• Keep old models around (raise thresholds?)
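Both mitigation bullets end in question marks, so the following is only one possible reading: old attack examples stay in the training set with a weight that halves every fixed number of days, and older models keep scoring traffic alongside the current one but with stricter thresholds. The 30-day half-life, the 0.01 cutoff, the model names `v1`/`v3`, and every threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def decay_weight(age_days: float, half_life_days: float = 30.0) -> float:
    """Weight for an example age_days old; it halves every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def weight_old_attacks(examples: List[Tuple[str, float]], half_life_days: float = 30.0,
                       min_weight: float = 0.01) -> List[Tuple[str, float]]:
    """Attach decayed weights to (example_id, age_days) pairs, dropping negligible ones."""
    weighted = []
    for example_id, age_days in examples:
        w = decay_weight(age_days, half_life_days)
        if w >= min_weight:
            weighted.append((example_id, w))
    return weighted


def flag(scores: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Flag if the current model or any retained older model clears its own threshold.

    Older models keep running with raised thresholds so that a returning
    attack pattern is still caught while their false positives stay low.
    """
    return any(scores[name] >= thresholds[name] for name in scores)


print(weight_old_attacks([("attack-120d", 120.0), ("attack-10d", 10.0), ("attack-400d", 400.0)]))
print(flag({"v3": 0.40, "v1": 0.97}, {"v3": 0.80, "v1": 0.95}))
```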
How do you know your model is ready to go?
Machine learning workflow: Train → Validate
Validating Performance
Don't trust offline replay
• Labels aren't perfect: they often miss on recall.
• Models interact with each other.
• Use offline P-R (precision-recall) and ROC curves to stack-rank model candidates.
[Diagram: Model A, Model B, FP]
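The last bullet can be sketched in pure Python: compute ROC AUC and average precision (a one-number summary of the P-R curve) for each candidate on the same labeled replay set, then sort. The function names and the tie-break order (average precision first, then AUC) are hypothetical. Given the first two bullets, this ranking only narrows the candidate list; it does not show that a model is ready to launch.

```python
from typing import Dict, List, Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 1/2.

    Quadratic in the number of examples, which is fine for a sketch.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """P-R curve summary: mean of the precision at the rank of each positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precision_sum = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precision_sum += hits / rank
    total_pos = sum(labels)
    return precision_sum / total_pos if total_pos else 0.0


def stack_rank(candidates: Dict[str, List[float]],
               labels: Sequence[int]) -> List[Tuple[str, float, float]]:
    """Order candidate models by (average precision, ROC AUC), best first."""
    rows = [(name, average_precision(labels, s), roc_auc(labels, s))
            for name, s in candidates.items()]
    return sorted(rows, key=lambda row: (row[1], row[2]), reverse=True)


labels = [1, 1, 0, 0, 0, 1]
candidates = {
    "model_a": [0.9, 0.8, 0.3, 0.2, 0.1, 0.7],
    "model_b": [0.6, 0.4, 0.5, 0.2, 0.1, 0.9],
}
for name, ap, auc in stack_rank(candidates, labels):
    print(f"{name}: AP={ap:.3f} AUC={auc:.3f}")
```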
The Perils of A/B Testing
• Fundamental A/B testing assumption: experiment effects are independent of the cohorts chosen.
The Perils of A/B Testing
Start with a small experiment
[Diagram: cohorts A and B, X]
• Looks good so far...
The Perils of A/B Testing
Roll it out to (almost) everyone: Option 1
[Diagram: cohorts A and B, X]
• Did the adversary give up or iterate?
The Perils of A/B Testing
Roll it out to (almost) everyone: Option 2
[Diagram: cohorts A and B]
• Now your experiment is a vulnerability
Using Shadow Mode
• Run the new model online in "log-only" mode.
• Evaluate performance where the new model disagrees with the old one,
  ideally via sampling and labeling.
• Push based on the FP/FN tradeoff.
[Diagram: Prod model vs. New model, FP]
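The shadow-mode steps above can be sketched as: collect both models' decisions, split out the requests where they disagree, label a sample of each bucket, and scale the labeled rates back up. A minimal sketch, assuming decisions are logged as request-ID-to-boolean maps; `disagreements`, `sample_for_labeling`, `extrapolate`, and the toy data are hypothetical.

```python
import random
from typing import Dict, List, Tuple


def disagreements(prod: Dict[str, bool], shadow: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Split request IDs into (flagged only by the new model, flagged only by prod).

    The shadow model runs in log-only mode, so its decisions are recorded but
    never enforced.
    """
    ids = prod.keys() | shadow.keys()
    only_new = sorted(r for r in ids if shadow.get(r, False) and not prod.get(r, False))
    only_prod = sorted(r for r in ids if prod.get(r, False) and not shadow.get(r, False))
    return only_new, only_prod


def sample_for_labeling(request_ids: List[str], n: int, seed: int = 0) -> List[str]:
    """Pick up to n disagreements to send to human labelers."""
    rng = random.Random(seed)
    return rng.sample(request_ids, min(n, len(request_ids)))


def extrapolate(bucket_size: int, sampled_is_bad: List[bool]) -> Tuple[float, float]:
    """Scale labeled-sample rates to the whole bucket: (estimated bad, estimated good)."""
    if not sampled_is_bad:
        return 0.0, 0.0
    bad_rate = sum(sampled_is_bad) / len(sampled_is_bad)
    return bucket_size * bad_rate, bucket_size * (1 - bad_rate)


prod = {"r1": True, "r2": False, "r3": True, "r4": False}
shadow = {"r1": True, "r2": True, "r3": False, "r4": False}
only_new, only_prod = disagreements(prod, shadow)
print(only_new, only_prod)
print(extrapolate(200, [True, True, False, True]))
```

In the "only new" bucket, estimated bad requests are extra catches (fewer FNs) and estimated good ones are new FPs; in the "only prod" bucket it is the reverse. Comparing the two buckets gives the FP/FN tradeoff the last bullet asks you to push on.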
How do you figure out if it worked?
Machine learning workflow: Launch → Measure
True Positives Don't Matter
What's happening here?
[Chart: precision over time]
True Positives Don't Matter
What's happening here?
[Charts over time: TP, FP]
• What we really want is the number of good users affected.
• Solution: use one minus specificity (a.k.a. the false positive rate, FPR):
  $\mathrm{FPR} = 1 - \frac{TN}{FP + TN} = \frac{FP}{FP + TN}$
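A worked example of why precision can mislead here, with made-up counts: if the adversary sends less traffic while the model keeps hitting the same number of good users, precision falls even though nothing changed for good users, whereas the false positive rate stays flat. This is one plausible reading of the "What's happening here?" charts, not data from the talk.

```python
def precision(tp: int, fp: int) -> float:
    return tp / (tp + fp)


def specificity(fp: int, tn: int) -> float:
    return tn / (fp + tn)


def fpr(fp: int, tn: int) -> float:
    """False positive rate, i.e. 1 - specificity: the share of good users hit."""
    return fp / (fp + tn)


# Hypothetical weekly counts: the classifier hurts the same 1,000 good users
# each week, but the adversary sends less traffic, so TP shrinks.
weeks = [
    ("week 1", 9000, 1000, 999000),  # (label, TP, FP, TN)
    ("week 2", 3000, 1000, 999000),
    ("week 3", 1000, 1000, 999000),
]
for label, tp, fp, tn in weeks:
    print(f"{label}: precision={precision(tp, fp):.2f}  FPR={fpr(fp, tn):.4f}")
```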
Not so fast!
Machine learning workflow: Profit! Adapt!
What Not to Do (I)
Show the adversary what your limits are
• Message 500 people 🛑
• Message 400 people 🛑
• Message 300 people ✅
What to Do (I)
Don't give immediate feedback
• Introduce delay in the blocking response, and/or
• Undo the damage without telling the user.
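Introducing delay in blocking can be sketched as a queue of pending enforcement actions, each released after a random delay, so an adversary probing limits (as in the 500/400/300 example) cannot tell which attempt tripped the block. A minimal sketch, assuming a caller that polls `due()`; `DelayedEnforcer` and the 10-minute-to-2-hour delay range are hypothetical.

```python
import heapq
import random
from typing import List, Tuple


class DelayedEnforcer:
    """Queue block decisions and apply each one after a randomized delay."""

    def __init__(self, min_delay_s: float = 600.0, max_delay_s: float = 7200.0,
                 seed: int = 0) -> None:
        self._rng = random.Random(seed)
        self._min_delay_s = min_delay_s
        self._max_delay_s = max_delay_s
        self._pending: List[Tuple[float, str]] = []  # min-heap of (apply_at, account_id)

    def schedule(self, account_id: str, now: float) -> float:
        """Record a block decision made at `now`; return when it takes effect."""
        apply_at = now + self._rng.uniform(self._min_delay_s, self._max_delay_s)
        heapq.heappush(self._pending, (apply_at, account_id))
        return apply_at

    def due(self, now: float) -> List[str]:
        """Pop every account whose block should be applied by `now`."""
        ready = []
        while self._pending and self._pending[0][0] <= now:
            ready.append(heapq.heappop(self._pending)[1])
        return ready


enforcer = DelayedEnforcer(seed=42)
apply_at = enforcer.schedule("acct-123", now=0.0)
print(enforcer.due(now=0.0))       # [] : the sender gets no immediate signal
print(enforcer.due(now=apply_at))  # ['acct-123']
```

Silently undoing the damage (the second bullet), for example hiding messages an account already sent, would be a separate step and is not shown here.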
"We don't want to be the ones solving the CAPTCHAs"
What Not to Do (II)
Look for specific content to block
What to Do (II)
Focus on bad behavior, not only bad content
What to Do (III)
Use data the adversary doesn't know/control
What to Do (IV)
Defense in depth
• Scoring at entry points: prevent access to accounts
• Clustering, anomaly detection: prevent accounts from doing damage
• User reporting: find false negatives
• Behavioral analysis: detect bad activity
[Diagram axes: increasing speed; more information available]
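Defense in depth can be sketched as several independent checks, so an account that slips past one layer can still be caught by another. A minimal sketch, assuming all four signals are available on a single snapshot; in practice the layers run at different times (entry-point scoring before an account exists, user reports long after). `AccountSnapshot`, its fields, and every threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class AccountSnapshot:
    account_id: str
    signup_risk: float      # hypothetical entry-point score in [0, 1]
    cluster_size: int       # accounts sharing this account's signup fingerprint
    actions_last_hour: int  # behavioral volume signal
    user_reports: int       # reports received from other users


# Hypothetical layers and thresholds, in the order the slide lists them.
LAYERS: List[Tuple[str, Callable[[AccountSnapshot], bool]]] = [
    ("entry_point_scoring", lambda a: a.signup_risk >= 0.9),
    ("clustering_anomaly_detection", lambda a: a.cluster_size >= 50),
    ("user_reporting", lambda a: a.user_reports >= 5),
    ("behavioral_analysis", lambda a: a.actions_last_hour >= 300),
]


def first_catching_layer(account: AccountSnapshot) -> Optional[str]:
    """Return the name of the first layer that flags the account, if any."""
    for name, check in LAYERS:
        if check(account):
            return name
    return None


# Passes entry scoring, clustering, and reporting, but behavior gives it away.
print(first_catching_layer(AccountSnapshot("a1", 0.2, 3, 450, 0)))
```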
Takeaways
• Think about each step of the ML process.
• It's hard to build a good training set.
• Adversarial adaptation breaks many assumptions.
• Control the data and the response.
Thanks to: Hervé Robert, Isaac Fullinwider, Henry Lu, Sagar Patel, Hongyang Li, Nektarios Leontiadis
