6. Stopping fraud v1
• Manual rules and aggressive blacklisting
• Scaling issues
• Hard to control precision
• Complexity grows quickly
• Little generalization
• But important infrastructure built
• Tools for manual investigation
• Graph search
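The graph-search tooling can be illustrated with a small sketch. The entities and adjacency structure here are hypothetical, not Stripe's actual schema: a BFS expands a charge into everything linked to it through shared attributes such as cards or emails.

```python
# Sketch of graph search for manual fraud investigation. The entity
# names and links are illustrative, not Stripe's schema: charges are
# connected to the cards and emails they share.
from collections import deque

edges = {
    'charge_1': ['card_A', 'email_X'],
    'card_A': ['charge_1', 'charge_2'],
    'email_X': ['charge_1', 'charge_3'],
    'charge_2': ['card_A'],
    'charge_3': ['email_X'],
}

def connected(start):
    """BFS: all entities reachable from `start` via shared attributes."""
    seen, queue = {start}, deque([start])
    while queue:
        for nbr in edges.get(queue.popleft(), []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen

cluster = connected('charge_1')  # the whole linked component
```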
7. Stopping fraud v2
• Tree-based models to estimate p(fraud | features)
• Target composite outcome
• Disputes
• Manual tags
• Information from card networks
• Python as glue
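A minimal sketch of this kind of model, assuming scikit-learn and synthetic data (the talk does not name the exact library or feature set):

```python
# Sketch of a tree-based model estimating p(fraud | features).
# Assumed: scikit-learn; the feature matrix and outcome are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 4))            # stand-in feature matrix
y = (X[:, 0] > 0.9).astype(int)      # stand-in composite fraud outcome

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Per-charge fraud probability, p(fraud | features)
p_fraud = model.predict_proba(X[:5])[:, 1]
```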
11. Types of features
• Static features useful on the margin
• Card from risky country?
• Billing details consistent?
• Dynamic features really useful
• Velocity of charges from email recently?
• Utilize network information
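A dynamic velocity feature of the kind described might be computed like this (field names and the one-hour window are illustrative):

```python
# Sketch of a "velocity" feature: number of charges seen from an email
# address in a trailing one-hour window. Names are illustrative.
WINDOW = 3600  # seconds

def email_velocity(charges, email, now):
    """Count charges from `email` in the trailing WINDOW seconds."""
    return sum(1 for c in charges
               if c['email'] == email and now - c['ts'] <= WINDOW)

charges = [
    {'email': 'a@example.com', 'ts': 100},
    {'email': 'a@example.com', 'ts': 3500},
    {'email': 'b@example.com', 'ts': 3550},
]
v = email_velocity(charges, 'a@example.com', now=3600)  # → 2
```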
12. Feature pipeline
• Slow Hadoop jobs compute features
• Sampling doesn’t really help
• Luigi manages dependencies
• Only re-run jobs with changes
• Load results to database
• http://www.github.com/spotify/luigi
[Pipeline diagram: raw charges → static features, card features, email features → joined features → training outcomes]
13. Feature pipeline (cont.)
@redshift('transactionfraud.features')
class JoinFeatures(luigi.WrapperTask):
    def requires(self):
        components = [
            'static_features',
            'dynamic_card_features',
            'dynamic_email_features',
            'outcomes',
        ]
        return [FeatureTask(c) for c in components]

    def job(self):
        return ScaldingJob(
            job='JoinFeatures',
            output=self.output().path,
            **self.requires()
        )
16. Model debugging
• Added dynamic email features to model
• Velocity of charges from email recently?
• Quantitative measures good
• High feature importance
• Overall model performance improved
• Weird issues in staging
• Systematic false positives
• High velocity did not yield higher p(fraud)
17. Model debugging (cont.)
• Old fashioned data analysis reveals…
• Likelihood of fraud much higher when email undefined than when defined
• p(fraud | email undefined) = ~14%
• p(fraud | email defined) = ~5%
• In other words, email missing “predictive” of fraud
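That analysis amounts to grouping charges by whether email is defined and comparing fraud rates. With toy counts chosen to reproduce the quoted numbers (not real data):

```python
# The grouping behind the analysis above, with toy counts chosen to
# reproduce the quoted rates (not Stripe's data).
charges = (
    [{'email': None, 'fraud': True}] * 14
    + [{'email': None, 'fraud': False}] * 86
    + [{'email': 'x@example.com', 'fraud': True}] * 5
    + [{'email': 'x@example.com', 'fraud': False}] * 95
)

def fraud_rate(rows):
    return sum(r['fraud'] for r in rows) / len(rows)

p_missing = fraud_rate([c for c in charges if c['email'] is None])      # 0.14
p_present = fraud_rate([c for c in charges if c['email'] is not None])  # 0.05
```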
18. Model debugging (cont.)
• Email attribute of Customer
• If credit card declined during customer creation*, fails with `CardError`
• Fraud correlated with decline, thus missing email
stripe.Customer.create(
    source={
        'object': 'card',
        # Test card for declines
        'number': '4000000000000002',
        'exp_year': '2016',
        'exp_month': 1,
    }
)
* Not exactly accurate, as most users tokenize cards rather than creating customers with cards directly
19. Model debugging (cont.)
• Apply this model on live traffic:
• Data is generated according to:
[Diagram: stripe.Customer.create → card declined (correlated with fraud) → no customer, so no customer.email → attempt charge without email → P(fraud | no email) >> P(fraud | email) → model blocks charge]
21. Model evaluation
• Topmodel
• Flask app that charts and organizes output from binary classifiers
• Cross between a lab notebook and Kaggle
• Feedback / PRs appreciated!
• https://github.com/stripe/topmodel
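The basic quantity a tool like this charts is precision and recall at a score threshold. A hand-rolled sketch (not topmodel's internals):

```python
# Precision and recall at a score threshold for a binary classifier
# (hand-rolled sketch; topmodel's own internals are not shown here).
def precision_recall(scores, labels, threshold):
    preds = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum(not p and l for p, l in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.9, 0.8, 0.3, 0.2]
labels = [True, False, True, False]
p, r = precision_recall(scores, labels, threshold=0.5)  # p = 0.5, r = 0.5
```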
24. Model evaluation (cont.)
• Maintaining reproducibility annoying
• Originally stored pickled models on S3
• But wrapper code sometimes changes
• But sklearn sometimes changes
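One common mitigation (a sketch, not necessarily Stripe's fix): persist the library version next to the pickled model and refuse to load under a mismatch.

```python
# Mitigation sketch: store the training-time library version alongside
# the pickled model, and check it at load time.
import os
import pickle
import tempfile

def save_model(model, path, version):
    with open(path, 'wb') as f:
        pickle.dump({'version': version, 'model': model}, f)

def load_model(path, expected_version):
    with open(path, 'rb') as f:
        blob = pickle.load(f)
    if blob['version'] != expected_version:
        raise RuntimeError(
            f"model pickled under {blob['version']}, "
            f"runtime expects {expected_version}")
    return blob['model']

path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
save_model({'weights': [1, 2]}, path, version='0.16.1')  # toy "model"
loaded = load_model(path, expected_version='0.16.1')
```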
25. Summary
• Python glues together whole pipeline
• Adding a simple feature can be hard
• Spend a lot of time on feature engineering, model evaluation