a talk
Ryan Wang (@ryw90)
If it weighs the same as a duck
Detecting fraud with Python and machine learning
Outline
• Why do we use machine learning?
• Overview of our pipeline
• What does it take to update a model?
What is Stripe?
• Collect payments via API
• Most users charge credit cards
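import stripe

stripe.api_key = 'sk_test_...'  # placeholder: your secret test key

# Card details passed inline as the charge source
stripe.Charge.create(
    amount=100,  # integer in the smallest currency unit: 100 cents = $1.00
    currency='usd',
    source={
        'object': 'card',
        'number': '4242424242424242',
        # ...
    },
)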
Things fraudsters do
• Typical fraudster buys stolen credit cards then:
• Creates fake Stripe accounts
• Buys goods from legitimate Stripe users
• Others test / brute force credentials
Witches easier to spot than fraud
Stopping fraud v1
• Manual rules and aggressive blacklisting
• Scaling issues
• Hard to control precision
• Complexity grows quickly
• Little generalization
• But important infrastructure built
• Tools for manual investigation
• Graph search
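The "graph search" tool is not described further. A minimal sketch of the idea, assuming accounts are linked whenever they share an attribute value (card, IP, email): a breadth-first search from one suspicious account returns everything connected to it. The function name and data layout are hypothetical.

```python
from collections import deque


def linked_accounts(start, edges):
    """Breadth-first search from one account over shared attributes.

    `edges` maps each account ID to the attribute values it uses
    (e.g. card fingerprints, emails, IPs). Two accounts are linked
    when they share any attribute value.
    """
    # Invert the mapping: attribute value -> accounts that use it
    by_attr = {}
    for account, attrs in edges.items():
        for attr in attrs:
            by_attr.setdefault(attr, set()).add(account)

    seen = {start}
    queue = deque([start])
    while queue:
        account = queue.popleft()
        for attr in edges.get(account, ()):
            for neighbour in by_attr[attr]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
    return seen


# Hypothetical accounts and attributes
accounts = {
    "acct_1": {"card_A", "ip_1"},
    "acct_2": {"card_A"},
    "acct_3": {"ip_1", "email_x"},
    "acct_4": {"card_B"},
}
print(sorted(linked_accounts("acct_1", accounts)))
```

Inverting the mapping once keeps each hop a dictionary lookup rather than a scan over all accounts.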
Stopping fraud v2
• Tree-based models to estimate p(fraud | features)
• Target composite outcome
• Disputes
• Manual tags
• Information from card networks
• Python as glue
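The slide does not say how the three outcome sources are combined into one target. A minimal sketch, assuming a simple OR: a charge is labelled fraud if any source flags it. The field names are hypothetical; the resulting 0/1 label is the kind of target a tree-based classifier estimating p(fraud | features) would be trained on.

```python
from dataclasses import dataclass


@dataclass
class ChargeOutcome:
    # Hypothetical outcome fields; names are illustrative only
    disputed_as_fraud: bool = False
    manually_tagged_fraud: bool = False
    card_network_fraud_report: bool = False


def composite_label(outcome: ChargeOutcome) -> int:
    """Return 1 if any outcome source marks the charge as fraud, else 0 (assumed OR rule)."""
    return int(
        outcome.disputed_as_fraud
        or outcome.manually_tagged_fraud
        or outcome.card_network_fraud_report
    )


labels = [
    composite_label(ChargeOutcome()),
    composite_label(ChargeOutcome(manually_tagged_fraud=True)),
    composite_label(ChargeOutcome(card_network_fraud_report=True)),
]
print(labels)  # [0, 1, 1]
```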
[Diagram: model development stages: qualitative feedback, feature engineering, model training, model evaluation, model deployment]
In order of work required
• Model evaluation
• Feature engineering
• Model training
• Qualitative feedback
• Monitoring / deployment
What does it take to update a model?
Feature engineering aka counting stuff
Types of features
• Static features useful on the margin
• Card from risky country?
• Billing details consistent?
• Dynamic features really useful
• Velocity of charges from email recently?
• Utilize network information
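A dynamic feature such as "velocity of charges from this email recently" is, at heart, a sliding-window count. A minimal single-machine sketch, assuming charges arrive sorted by timestamp and a one-hour window (both assumptions; the production features are computed by Hadoop jobs, as the next slide explains):

```python
from collections import defaultdict, deque


def email_velocity(charges, window_seconds=3600):
    """For each charge, count earlier charges from the same email within the window.

    `charges` is an iterable of (timestamp_seconds, email) pairs sorted by time.
    Returns one count per charge, in input order.
    """
    recent = defaultdict(deque)  # email -> timestamps still inside the window
    counts = []
    for ts, email in charges:
        q = recent[email]
        # Drop timestamps that have fallen out of the look-back window
        while q and q[0] <= ts - window_seconds:
            q.popleft()
        counts.append(len(q))
        q.append(ts)
    return counts


charges = [(0, "a@x.com"), (100, "a@x.com"), (200, "b@x.com"),
           (3650, "a@x.com"), (3800, "a@x.com")]
print(email_velocity(charges))  # [0, 1, 0, 1, 1]
```

Each charge only counts charges strictly before it, so the feature never leaks information from the charge being scored.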
Feature pipeline
• Slow Hadoop jobs compute features
• Sampling doesn’t really help
• Luigi manages dependencies
• Only re-run jobs with changes
• Load results to database
• http://www.github.com/spotify/luigi
[Diagram: feature pipeline DAG with nodes Raw charges, Static features, Card features, Email features, Joined features, Training, Outcomes]
Feature pipeline (cont.)
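import luigi

# `redshift`, `FeatureTask` and `ScaldingJob` are Stripe-internal helpers (not shown);
# `self.output()` is assumed to be supplied by the @redshift decorator.
@redshift('transactionfraud.features')
class JoinFeatures(luigi.WrapperTask):
    def requires(self):
        components = [
            'static_features',
            'dynamic_card_features',
            'dynamic_email_features',
            'outcomes',
        ]
        # A dict keyed by component name, so it can be unpacked as keyword arguments
        return {c: FeatureTask(c) for c in components}

    def job(self):
        # Run the Scalding join, passing each upstream component by name
        return ScaldingJob(
            job='JoinFeatures',
            output=self.output().path,
            **self.requires()
        )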
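Luigi decides what to run by walking the dependency graph and checking whether each task's output already exists; only incomplete tasks are executed. A minimal stdlib sketch of that scheduling idea (not Luigi's implementation), using `graphlib` from Python 3.9+. The dependency map reuses the JoinFeatures components; `raw_charges` and `join_features` are hypothetical task names.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical dependency map: task -> set of upstream tasks it requires
DEPENDENCIES = {
    "raw_charges": set(),
    "static_features": {"raw_charges"},
    "dynamic_card_features": {"raw_charges"},
    "dynamic_email_features": {"raw_charges"},
    "outcomes": {"raw_charges"},
    "join_features": {"static_features", "dynamic_card_features",
                      "dynamic_email_features", "outcomes"},
}


def run_pipeline(deps, is_complete, run):
    """Run tasks upstream-first, skipping any task whose output already exists."""
    executed = []
    for task in TopologicalSorter(deps).static_order():
        if is_complete(task):
            continue  # output present: nothing to do for this task
        run(task)
        executed.append(task)
    return executed


# Pretend these outputs already exist from an earlier run
existing_outputs = {"raw_charges", "static_features", "outcomes"}
ran = run_pipeline(DEPENDENCIES, existing_outputs.__contains__, existing_outputs.add)
print(ran)
```

Running the pipeline a second time executes nothing, since every output now exists. Like Luigi's default behaviour, this check looks only at output existence, so a task is not re-run merely because an upstream task was.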
Feature pipeline (cont.)
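import com.twitter.scalding._
import com.stripe.thrift.Charge

// load, getHistoricalCounts, IpFeatures and .save are Stripe-internal helpers (not shown)
class DynamicIpFeatures(args: Args) extends Job(args) {
  // Read raw charges from the path passed as --charges
  val charges = load[Charge](args("charges"))
  // Historical counts per charge (feature definitions not shown)
  val historicalCounts = getHistoricalCounts(charges)

  historicalCounts
    .map { case (chargeId, counts) =>
      IpFeatures(
        chargeId = chargeId,
        feature1 = counts.feature1,
        feature2 = counts.feature2,
        ...
      )
    }
    .save
}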
The curious case of email
Model debugging
• Added dynamic email features to model
• Velocity of charges from email recently?
• Quantitative measures good
• High feature importance
• Overall model performance improved
• Weird issues in staging
• Systematic false positives
• High velocity did not yield higher p(fraud)
Model debugging (cont.)
• Old-fashioned data analysis reveals…
• Likelihood of fraud much higher when email is undefined than when it is defined
• p(fraud | email undefined) ≈ 14%
• p(fraud | email defined) ≈ 5%
• In other words, a missing email is “predictive” of fraud
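The "old-fashioned data analysis" here amounts to splitting charges by whether a field is missing and comparing fraud rates. A minimal sketch, assuming records are dicts with a boolean `fraud` key; the toy data below is invented for illustration and does not reproduce the 14% / 5% figures from the talk.

```python
def fraud_rate_by_missing(records, field):
    """Estimate p(fraud | field missing) and p(fraud | field present).

    A field counts as missing when it is absent or None.
    """
    counts = {True: [0, 0], False: [0, 0]}  # missing? -> [fraud count, total]
    for r in records:
        missing = r.get(field) is None
        counts[missing][0] += int(r["fraud"])
        counts[missing][1] += 1

    def rate(fraud, total):
        return fraud / total if total else float("nan")

    return {"missing": rate(*counts[True]), "present": rate(*counts[False])}


# Toy data only, not the figures from the talk
toy = [
    {"email": None, "fraud": True},
    {"email": None, "fraud": False},
    {"email": "a@x.com", "fraud": False},
    {"email": "b@x.com", "fraud": False},
    {"email": "c@x.com", "fraud": True},
    {"email": "d@x.com", "fraud": False},
]
print(fraud_rate_by_missing(toy, "email"))  # {'missing': 0.5, 'present': 0.25}
```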
Model debugging (cont.)
• Email is an attribute of the Customer object
• If the credit card is declined during customer creation*, the call fails with `CardError`
• Fraud correlates with declines, and a decline means no customer, thus no email
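import stripe

try:
    customer = stripe.Customer.create(
        email='customer@example.com',  # illustrative address
        source={
            'object': 'card',
            # Test card for declines
            'number': '4000000000000002',
            'exp_year': 2016,
            'exp_month': 1,
        },
    )
except stripe.error.CardError:
    # Declined: no Customer is created, so its email is never recorded
    customer = None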
* Not exactly accurate, as most users tokenize cards rather than creating customers with cards directly
Model debugging (cont.)
• Data is generated according to:
[Diagram: stripe.Customer.create → card declined (correlated with fraud) → no customer, so no customer.email]
• Apply this model on live traffic:
[Diagram: attempt charge without email → P(fraud | no email) >> P(fraud | email) → model blocks charge]
Is the model any good?
Model evaluation
• Topmodel
• Flask app that charts and organizes output from binary classifiers
• Cross between a lab notebook and Kaggle
• Feedback / PRs appreciated!
• https://github.com/stripe/topmodel
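The charts a tool like topmodel produces for a binary classifier rest on one basic computation: precision and recall as the blocking threshold moves. A plain-Python sketch of that computation (not topmodel's code), assuming charges scoring at or above the threshold are blocked; the scores and labels are made up.

```python
def precision_recall_at(scores, labels, threshold):
    """Precision and recall when charges with score >= threshold are blocked."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    # Conventions: precision is 1.0 when nothing is blocked, recall 0.0 with no fraud
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


scores = [0.9, 0.8, 0.7, 0.4, 0.2, 0.1]  # hypothetical p(fraud) outputs
labels = [1, 1, 0, 1, 0, 0]              # hypothetical ground truth
for t in (0.75, 0.5, 0.3):
    p, r = precision_recall_at(scores, labels, t)
    print(f"threshold={t:.2f} precision={p:.2f} recall={r:.2f}")
```

Lowering the threshold catches more fraud (recall rises) at the cost of blocking more legitimate charges (precision falls), which is the trade-off such charts make visible.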
Model evaluation (cont.)
• Regularly generate ground truth and benchmark existing models
• Newly trained models are automatically compared
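# topmodel_integration, query_to_df and model are Stripe-internal helpers (not shown)
test_y, test_start, test_end = topmodel_integration.retrieve_actuals(path)
# Pull features for the same time window as the ground-truth labels
test_X = query_to_df(model.spec.sql_query(), test_start, test_end)
metadata = model.metadata()
results = model.score_and_format(test_y, test_X)
# Upload the scored results so topmodel can chart them
topmodel_integration.send_dataframe_to_s3(results, metadata)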
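Comparing a newly trained model with an existing one only makes sense when both are scored on the same ground truth. A minimal sketch, assuming ROC AUC as the comparison metric (the talk does not name one), computed as the probability that a random fraudulent charge outscores a random legitimate one. The quadratic pairwise loop is fine for illustration, not for production volumes; all data is hypothetical.

```python
def roc_auc(scores, labels):
    """ROC AUC as P(random fraud outscores random non-fraud); ties count half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def compare(candidate_scores, incumbent_scores, labels):
    """Score both models against the same labels and report which ranks fraud better."""
    cand = roc_auc(candidate_scores, labels)
    inc = roc_auc(incumbent_scores, labels)
    return {"candidate": cand, "incumbent": inc, "candidate_better": cand > inc}


labels = [1, 1, 0, 1, 0, 0]
result = compare([0.9, 0.8, 0.7, 0.4, 0.2, 0.1],
                 [0.6, 0.3, 0.7, 0.5, 0.2, 0.4], labels)
print(result)
```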
Model evaluation (cont.)
• Maintaining reproducibility is annoying
• Originally stored pickled models on S3
• But wrapper code sometimes changes
• But sklearn sometimes changes
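One way to make those changes visible is to store each pickled model next to a manifest recording a checksum and the versions of the code it depends on, and to refuse to load when they differ. A minimal local-disk sketch, assuming the caller supplies the version dict (the keys and values below are hypothetical, and S3 is replaced by a temporary directory). This only detects drift; it does not make an old pickle load under a new sklearn. As with any pickle, only load files you trust.

```python
import hashlib
import json
import pickle
import tempfile
from pathlib import Path


def save_model(model, directory, versions):
    """Pickle `model` next to a JSON manifest of its checksum and dependency versions."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    blob = pickle.dumps(model)
    (directory / "model.pkl").write_bytes(blob)
    manifest = {"sha256": hashlib.sha256(blob).hexdigest(), "versions": versions}
    (directory / "manifest.json").write_text(json.dumps(manifest, sort_keys=True))


def load_model(directory, current_versions):
    """Unpickle only if the bytes and the recorded versions still match."""
    directory = Path(directory)
    blob = (directory / "model.pkl").read_bytes()
    manifest = json.loads((directory / "manifest.json").read_text())
    if hashlib.sha256(blob).hexdigest() != manifest["sha256"]:
        raise ValueError("model.pkl does not match its manifest")
    if manifest["versions"] != current_versions:
        raise RuntimeError(f"built with {manifest['versions']}, running {current_versions}")
    return pickle.loads(blob)


versions = {"sklearn": "1.0", "wrapper_code": "rev-a"}  # hypothetical values
with tempfile.TemporaryDirectory() as tmp:
    save_model({"threshold": 0.5}, tmp, versions)
    restored = load_model(tmp, versions)
    try:
        load_model(tmp, {**versions, "sklearn": "1.1"})
        mismatch_detected = False
    except RuntimeError:
        mismatch_detected = True
print(restored, mismatch_detected)
```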
Summary
• Python glues together whole pipeline
• Adding a simple feature can be hard
• Spend a lot of time on feature engineering and model evaluation
Questions?

Halmar dropshipping via API with DroFx
 
Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...Determinants of health, dimensions of health, positive health and spectrum of...
Determinants of health, dimensions of health, positive health and spectrum of...
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 

Detecting fraud with Python and machine learning

  • 1. If it weighs the same as a duck: Detecting fraud with Python and machine learning. A talk by Ryan Wang (@ryw90)
  • 2. Outline • Why do we use machine learning? • Overview of our pipeline • What does it take to update a model?
  • 3. What is Stripe? • Collect payments via API • Most users charge credit cards
      import stripe

      stripe.Charge.create(
          amount=100,  # integer in the smallest currency unit: 100 cents = $1.00
          currency='usd',
          source={
              'object': 'card',
              'number': '4242 4242 4242 4242',  # Stripe's test Visa number
              # ...
          },
      )
  • 4. Things fraudsters do • Typical fraudster buys stolen credit cards then: • Creates fake Stripe accounts • Buys goods from legitimate Stripe users • Others test / brute force credentials
  • 5. Witches easier to spot than fraud
  • 6. Stopping fraud v1 • Manual rules and aggressive blacklisting • Scaling issues • Hard to control precision • Complexity grows quickly • Little generalization • But important infrastructure built • Tools for manual investigation • Graph search
  • 7. Stopping fraud v2 • Tree-based models to estimate p(fraud | features) • Target composite outcome • Disputes • Manual tags • Information from card networks • Python as glue
  • 8. [Diagram: qualitative feedback, feature engineering, model training, model evaluation, model deployment] In order of work required • Model evaluation • Feature engineering • Model training • Qualitative feedback • Monitoring / deployment
  • 9. What does it take to update a model?
  • 10. Feature engineering aka counting stuff
  • 11. Types of features • Static features useful on the margin • Card from risky country? • Billing details consistent? • Dynamic features really useful • Velocity of charges from email recently? • Utilize network information
  • 12. Feature pipeline • Slow Hadoop jobs compute features • Sampling doesn’t really help • Luigi manages dependencies • Only re-run jobs with changes • Load results to database • http://www.github.com/spotify/luigi [Diagram: raw charges → static, card, and email features → joined features (plus outcomes) → training]
  • 13. Feature pipeline (cont.)
      import luigi

      # redshift, FeatureTask and ScaldingJob are internal helpers not shown on the slide
      @redshift('transactionfraud.features')
      class JoinFeatures(luigi.WrapperTask):
          def requires(self):
              components = [
                  'static_features',
                  'dynamic_card_features',
                  'dynamic_email_features',
                  'outcomes',
              ]
              # A dict (not a list), so **self.requires() below can unpack it as keyword arguments
              return {c: FeatureTask(c) for c in components}

          def job(self):
              return ScaldingJob(
                  job='JoinFeatures',
                  output=self.output().path,
                  **self.requires()
              )
  • 14. Feature pipeline (cont.)
      import com.twitter.scalding._
      import com.stripe.thrift.Charge

      class DynamicIpFeatures(args: Args) extends Job(args) {
        val charges = load[Charge](args("charges"))
        val historicalCounts = getHistoricalCounts(charges)

        historicalCounts
          .map { case (chargeId, counts) =>
            IpFeatures(
              chargeId = chargeId,
              feature1 = counts.feature1,
              feature2 = counts.feature2,
              ...
            )
          }
          .save
      }
  • 15. The curious case of email
  • 16. Model debugging • Added dynamic email features to model • Velocity of charges from email recently? • Quantitative measures good • High feature importance • Overall model performance improved • Weird issues in staging • Systematic false positives • High velocity did not yield higher p(fraud)
  • 17. Model debugging (cont.) • Old-fashioned data analysis reveals… • Likelihood of fraud much higher when email undefined than when defined • p(fraud | email undefined) = ~14% • p(fraud | email defined) = ~5% • In other words, missing email is “predictive” of fraud
  • 18. Model debugging (cont.) • Email is an attribute of Customer • If the credit card is declined during customer creation*, the call fails with `CardError` • Fraud is correlated with declines, hence with missing email
      import stripe

      stripe.Customer.create(
          source={
              'object': 'card',
              # Test card for declines
              'number': '4000000000000002',
              'exp_year': '2016',
              'exp_month': 1,
          }
      )
      * Not exactly accurate, as most users tokenize cards rather than creating customers with cards directly
  • 19. Model debugging (cont.) • Data is generated according to: stripe.Customer.create → card declined (correlated with fraud) → no customer (so no customer.email) • Apply this model on live traffic: attempt charge without email → P(fraud | no email) >> P(fraud | email) → model blocks charge
  • 20. Is the model any good?
  • 21. Model evaluation • Topmodel • Flask app that charts and organizes output from binary classifiers • Cross between a lab notebook and Kaggle • Feedback / PRs appreciated! • https://github.com/stripe/topmodel
  • 22.
  • 23. Model evaluation (cont.) • Regularly generate ground truth and benchmark existing models • Newly trained models compared automatically
      # topmodel_integration, query_to_df, path and model come from code not shown on the slide
      test_y, test_start, test_end = topmodel_integration.retrieve_actuals(path)
      test_X = query_to_df(model.spec.sql_query(), test_start, test_end)
      metadata = model.metadata()
      results = model.score_and_format(test_y, test_X)
      topmodel_integration.send_dataframe_to_s3(results, metadata)
  • 24. Model evaluation (cont.) • Maintaining reproducibility is annoying • Originally stored pickled models on S3 • But wrapper code sometimes changes • But sklearn sometimes changes
  • 25. Summary • Python glues together whole pipeline • Adding a simple feature can be hard • Spend a lot of time on feature engineering, model evaluation
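Slide 6 lists hard-to-control precision as a drawback of manual rules. With a model that outputs p(fraud | features), precision can instead be tuned through a single score threshold. A minimal sketch of choosing that threshold on labeled data; the function name and the toy labels and scores are illustrative assumptions, not taken from the talk:

```python
def threshold_for_precision(labels, scores, target_precision):
    """Return the lowest score threshold whose flagged charges reach `target_precision`.

    A charge is flagged when its score is >= the threshold. Lower thresholds
    flag more charges (higher recall), so the lowest qualifying threshold gives
    the best recall at the target precision. Returns None if none qualifies.
    """
    best = None
    for t in sorted(set(scores), reverse=True):
        flagged = [y for y, s in zip(labels, scores) if s >= t]
        if sum(flagged) / len(flagged) >= target_precision:
            best = t  # remember the lowest qualifying threshold seen so far
    return best


labels = [1, 1, 0, 1, 0, 0]              # 1 = fraud (toy data)
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]  # model scores (toy data)
print(threshold_for_precision(labels, scores, 0.75))  # 0.6
print(threshold_for_precision(labels, scores, 0.9))   # 0.8
```

Precision is not monotonic in the threshold, which is why every candidate is checked rather than stopping at the first failure. In practice the threshold would be chosen on held-out data, since precision measured on training data tends to be optimistic.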
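Slide 7 says the v2 models target a composite outcome built from disputes, manual tags, and card-network information. The talk does not say how those signals are combined; this sketch assumes a simple OR rule and uses hypothetical field names (`disputed_as_fraud`, `manually_tagged_fraud`, `network_fraud_report`).

```python
from dataclasses import dataclass


@dataclass
class ChargeOutcomes:
    # Hypothetical per-charge fraud signals; field names are illustrative only.
    charge_id: str
    disputed_as_fraud: bool = False
    manually_tagged_fraud: bool = False
    network_fraud_report: bool = False


def composite_label(o: ChargeOutcomes) -> int:
    """Label a charge 1 if any signal marks it as fraud, else 0 (an assumed OR rule)."""
    return int(o.disputed_as_fraud or o.manually_tagged_fraud or o.network_fraud_report)


outcomes = [
    ChargeOutcomes("ch_1"),
    ChargeOutcomes("ch_2", disputed_as_fraud=True),
    ChargeOutcomes("ch_3", network_fraud_report=True),
]
labels = {o.charge_id: composite_label(o) for o in outcomes}
print(labels)  # {'ch_1': 0, 'ch_2': 1, 'ch_3': 1}
```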
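Slide 11 singles out dynamic features such as the recent velocity of charges from an email. The talk computes features in Hadoop/Scalding jobs; this is only a single-machine sketch of the same count, assuming charges arrive as `(charge_id, email, created_at)` tuples sorted by time and a hypothetical 24-hour window.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def email_velocity(charges, window=timedelta(hours=24)):
    """For each charge, count earlier charges from the same email inside `window`.

    Only strictly earlier charges are counted, so the feature uses no
    information from the charge being scored.
    """
    recent = defaultdict(deque)  # email -> timestamps of charges still in the window
    features = {}
    for charge_id, email, created_at in charges:
        if email is None:
            features[charge_id] = None  # keep "no email" distinct from "zero prior charges"
            continue
        times = recent[email]
        while times and created_at - times[0] > window:
            times.popleft()  # drop charges older than the window
        features[charge_id] = len(times)
        times.append(created_at)
    return features


t0 = datetime(2016, 1, 1, 12, 0)
charges = [
    ("ch_1", "a@example.com", t0),
    ("ch_2", "a@example.com", t0 + timedelta(hours=1)),
    ("ch_3", "b@example.com", t0 + timedelta(hours=2)),
    ("ch_4", "a@example.com", t0 + timedelta(hours=30)),
    ("ch_5", None, t0 + timedelta(hours=31)),
]
print(email_velocity(charges))
# {'ch_1': 0, 'ch_2': 1, 'ch_3': 0, 'ch_4': 0, 'ch_5': None}
```

One deque per email keeps each update amortized O(1); a charge exactly one window old is still counted.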
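Slide 12 relies on Luigi to manage dependencies and re-run only jobs with changes. The toy scheduler below deliberately does not use the luigi library; it sketches the idea under one assumption: each job carries a version that is bumped when its code changes, and that version (plus upstream keys) is folded into the output name. By default Luigi treats a task as complete when its output exists, so encoding versions in output paths is one common way to make code changes trigger reruns, not necessarily what Stripe did. The `Task` class and `build` function are hypothetical.

```python
import hashlib


class Task:
    """A toy stand-in for a Luigi task (not the luigi API itself)."""

    def __init__(self, name, version, requires=()):
        self.name = name
        self.version = version  # bump when the job's code changes
        self.requires = list(requires)

    def output_key(self):
        # Fold in upstream keys so a change anywhere upstream yields a new key here too.
        upstream = ",".join(t.output_key() for t in self.requires)
        raw = f"{self.name}:{self.version}:{upstream}"
        return f"{self.name}-{hashlib.sha1(raw.encode()).hexdigest()[:8]}"


def build(task, store, ran):
    """Run `task` after its dependencies, skipping any whose output already exists."""
    for dep in task.requires:
        build(dep, store, ran)
    key = task.output_key()
    if key not in store:  # completeness check: does the output already exist?
        store.add(key)    # stand-in for actually running the job
        ran.append(task.name)


raw = Task("raw_charges", 1)
static = Task("static_features", 1, [raw])
card = Task("card_features", 1, [raw])
email = Task("email_features", 1, [raw])
joined = Task("joined_features", 1, [static, card, email])

store, ran = set(), []
build(joined, store, ran)
print(ran)   # every job runs the first time

email.version = 2  # simulate a change to the email feature job
ran2 = []
build(joined, store, ran2)
print(ran2)  # ['email_features', 'joined_features']
```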
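Slide 17 finds the problem with old-fashioned data analysis: comparing the fraud rate when email is undefined with the rate when it is defined. A minimal sketch of that comparison, assuming rows are dicts with a hypothetical 0/1 `is_fraud` label; the toy rows are illustrative and are not the talk's data (which showed roughly 14% versus 5%).

```python
def fraud_rate_by_missingness(rows, field):
    """Return the fraud rate for rows where `field` is missing (None) vs. present."""
    counts = {"missing": [0, 0], "present": [0, 0]}  # group -> [fraud count, total]
    for row in rows:
        group = "missing" if row.get(field) is None else "present"
        counts[group][0] += row["is_fraud"]
        counts[group][1] += 1
    return {g: (f / n if n else float("nan")) for g, (f, n) in counts.items()}


rows = (
    [{"email": None, "is_fraud": 1}]
    + [{"email": None, "is_fraud": 0}] * 3
    + [{"email": "a@example.com", "is_fraud": 1}]
    + [{"email": "b@example.com", "is_fraud": 0}] * 9
)
print(fraud_rate_by_missingness(rows, "email"))  # {'missing': 0.25, 'present': 0.1}
```

A large gap between the two rates is a cue to ask, as slides 18 and 19 do, how the field comes to be missing before trusting it as a feature.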
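Slide 23 describes scoring existing and newly trained models against the same, regularly regenerated ground truth. Topmodel's own metrics are not reproduced here; as an assumption, this sketch uses ROC AUC, computed directly from its pairwise definition, as the single comparison metric. The model names and scores are toy values.

```python
def roc_auc(labels, scores):
    """ROC AUC: probability a random fraud outscores a random non-fraud (ties count 1/2)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def compare_models(labels, scores_by_model):
    """Score every model on the same ground truth and rank them, best first."""
    results = {name: roc_auc(labels, s) for name, s in scores_by_model.items()}
    return sorted(results.items(), key=lambda kv: kv[1], reverse=True)


labels = [1, 0, 1, 0, 0]
print(compare_models(labels, {
    "current": [0.9, 0.4, 0.35, 0.5, 0.1],
    "candidate": [0.8, 0.3, 0.7, 0.2, 0.1],
}))
# [('candidate', 1.0), ('current', 0.6666666666666666)]
```

The pairwise loop is O(n²) and only suits small samples; a rank-based formula gives the same value in O(n log n).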
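Slide 24 notes that pickled models on S3 became hard to reproduce when wrapper code or sklearn changed. One common mitigation, sketched here as an assumption rather than Stripe's actual fix, is to store a manifest next to each pickle recording a checksum plus the code and library versions, and to refuse to load when they differ. It writes to a local directory instead of S3 to stay self-contained; `save_model`, `load_model`, `code_version` and `library_versions` are hypothetical names.

```python
import hashlib
import json
import pickle
import sys
from pathlib import Path


def save_model(model, directory, code_version, library_versions):
    """Pickle `model` next to a JSON manifest describing how it was built."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    blob = pickle.dumps(model)
    manifest = {
        "sha256": hashlib.sha256(blob).hexdigest(),  # detects corrupted or swapped files
        "python": sys.version.split()[0],            # recorded for reference only
        "code_version": code_version,                # e.g. a git commit of the wrapper code
        "libraries": library_versions,               # e.g. {"scikit-learn": "<version>"}
    }
    (directory / "model.pkl").write_bytes(blob)
    (directory / "manifest.json").write_text(json.dumps(manifest, indent=2, sort_keys=True))
    return manifest


def load_model(directory, code_version, library_versions):
    """Unpickle only if the file is intact and code/library versions match the manifest."""
    directory = Path(directory)
    manifest = json.loads((directory / "manifest.json").read_text())
    blob = (directory / "model.pkl").read_bytes()
    if hashlib.sha256(blob).hexdigest() != manifest["sha256"]:
        raise ValueError("model.pkl does not match its recorded checksum")
    mismatches = {
        key: (manifest[key], current)
        for key, current in [("code_version", code_version), ("libraries", library_versions)]
        if manifest[key] != current
    }
    if mismatches:
        raise RuntimeError(f"environment differs from training time: {mismatches}")
    return pickle.loads(blob)
```

Failing loudly on a mismatch trades convenience for reproducibility: an old model must be retrained or explicitly re-validated rather than silently loaded under new code.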