6. Stopping fraud v1
• Manual rules and aggressive blacklisting
• Scaling issues
• Hard to control precision
• Complexity grows quickly
• Little generalization
• But important infrastructure built
• Tools for manual investigation
• Graph search
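The graph-search tooling can be illustrated with a small sketch. The entities and adjacency structure here are hypothetical, not Stripe's actual schema: a BFS expands a charge into everything linked to it through shared attributes such as cards or emails.

```python
# Sketch of graph search for manual fraud investigation. The entity
# names and links are illustrative, not Stripe's schema: charges are
# connected to the cards and emails they share.
from collections import deque

edges = {
    'charge_1': ['card_A', 'email_X'],
    'card_A': ['charge_1', 'charge_2'],
    'email_X': ['charge_1', 'charge_3'],
    'charge_2': ['card_A'],
    'charge_3': ['email_X'],
}

def connected(start):
    """BFS: all entities reachable from `start` via shared attributes."""
    seen, queue = {start}, deque([start])
    while queue:
        for nbr in edges.get(queue.popleft(), []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    return seen

cluster = connected('charge_1')  # the whole linked component
```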
7. Stopping fraud v2
• Tree-based models to estimate p(fraud | features)
• Target composite outcome
• Disputes
• Manual tags
• Information from card networks
• Python as glue
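A minimal sketch of this kind of model, assuming scikit-learn and synthetic data (the talk does not name the exact library or feature set):

```python
# Sketch of a tree-based model estimating p(fraud | features).
# Assumed: scikit-learn; the feature matrix and outcome are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((1000, 4))            # stand-in feature matrix
y = (X[:, 0] > 0.9).astype(int)      # stand-in composite fraud outcome

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X, y)

# Per-charge fraud probability, p(fraud | features)
p_fraud = model.predict_proba(X[:5])[:, 1]
```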
11. Types of features
• Static features useful on the margin
• Card from risky country?
• Billing details consistent?
• Dynamic features really useful
• Velocity of charges from email recently?
• Utilize network information
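A dynamic velocity feature of the kind described might be computed like this (field names and the one-hour window are illustrative):

```python
# Sketch of a "velocity" feature: number of charges seen from an email
# address in a trailing one-hour window. Names are illustrative.
WINDOW = 3600  # seconds

def email_velocity(charges, email, now):
    """Count charges from `email` in the trailing WINDOW seconds."""
    return sum(1 for c in charges
               if c['email'] == email and now - c['ts'] <= WINDOW)

charges = [
    {'email': 'a@example.com', 'ts': 100},
    {'email': 'a@example.com', 'ts': 3500},
    {'email': 'b@example.com', 'ts': 3550},
]
v = email_velocity(charges, 'a@example.com', now=3600)  # → 2
```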
12. Feature pipeline
• Slow Hadoop jobs compute features
• Sampling doesn’t really help
• Luigi manages dependencies
• Only re-run jobs with changes
• Load results to database
• http://www.github.com/spotify/luigi
[Pipeline diagram: raw charges → static features, card features, email features → joined features → training outcomes]
13. Feature pipeline (cont.)
@redshift('transactionfraud.features')
class JoinFeatures(luigi.WrapperTask):
    def requires(self):
        components = [
            'static_features',
            'dynamic_card_features',
            'dynamic_email_features',
            'outcomes',
        ]
        return [FeatureTask(c) for c in components]

    def job(self):
        return ScaldingJob(
            job='JoinFeatures',
            output=self.output().path,
            **self.requires()
        )
16. Model debugging
• Added dynamic email features to model
• Velocity of charges from email recently?
• Quantitative measures good
• High feature importance
• Overall model performance improved
• Weird issues in staging
• Systematic false positives
• High velocity did not yield higher p(fraud)
17. Model debugging (cont.)
• Old fashioned data analysis reveals…
• Likelihood of fraud much higher when email undefined than when defined
• p(fraud | email undefined) = ~14%
• p(fraud | email defined) = ~5%
• In other words, email missing “predictive” of fraud
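That analysis amounts to grouping charges by whether email is defined and comparing fraud rates. With toy counts chosen to reproduce the quoted numbers (not real data):

```python
# The grouping behind the analysis above, with toy counts chosen to
# reproduce the quoted rates (not Stripe's data).
charges = (
    [{'email': None, 'fraud': True}] * 14
    + [{'email': None, 'fraud': False}] * 86
    + [{'email': 'x@example.com', 'fraud': True}] * 5
    + [{'email': 'x@example.com', 'fraud': False}] * 95
)

def fraud_rate(rows):
    return sum(r['fraud'] for r in rows) / len(rows)

p_missing = fraud_rate([c for c in charges if c['email'] is None])      # 0.14
p_present = fraud_rate([c for c in charges if c['email'] is not None])  # 0.05
```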
18. Model debugging (cont.)
• Email attribute of Customer
• If credit card declined during customer creation*, fails with `CardError`
• Fraud correlated with decline, thus missing email
stripe.Customer.create(
    source={
        'object': 'card',
        # Test card for declines
        'number': '4000000000000002',
        'exp_year': '2016',
        'exp_month': 1,
    }
)
* Not exactly accurate, as most users tokenize cards rather than creating customers with cards directly
19. Model debugging (cont.)
• Apply this model on live traffic:
• Data is generated according to:
[Diagram: stripe.Customer.create → card declined (correlated with fraud) → no customer, so no customer.email → attempt charge without email → P(fraud | no email) >> P(fraud | email) → model blocks charge]
21. Model evaluation
• Topmodel
• Flask app that charts and organizes output from binary classifiers
• Cross between a lab notebook and Kaggle
• Feedback / PRs appreciated!
• https://github.com/stripe/topmodel
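The basic quantity a tool like this charts is precision and recall at a score threshold. A hand-rolled sketch (not topmodel's internals):

```python
# Precision and recall at a score threshold for a binary classifier
# (hand-rolled sketch; topmodel's own internals are not shown here).
def precision_recall(scores, labels, threshold):
    preds = [s >= threshold for s in scores]
    tp = sum(p and l for p, l in zip(preds, labels))
    fp = sum(p and not l for p, l in zip(preds, labels))
    fn = sum(not p and l for p, l in zip(preds, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

scores = [0.9, 0.8, 0.3, 0.2]
labels = [True, False, True, False]
p, r = precision_recall(scores, labels, threshold=0.5)  # p = 0.5, r = 0.5
```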
24. Model evaluation (cont.)
• Maintaining reproducibility annoying
• Originally stored pickled models on S3
• But wrapper code sometimes changes
• But sklearn sometimes changes
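One common mitigation (a sketch, not necessarily Stripe's fix): persist the library version next to the pickled model and refuse to load under a mismatch.

```python
# Mitigation sketch: store the training-time library version alongside
# the pickled model, and check it at load time.
import os
import pickle
import tempfile

def save_model(model, path, version):
    with open(path, 'wb') as f:
        pickle.dump({'version': version, 'model': model}, f)

def load_model(path, expected_version):
    with open(path, 'rb') as f:
        blob = pickle.load(f)
    if blob['version'] != expected_version:
        raise RuntimeError(
            f"model pickled under {blob['version']}, "
            f"runtime expects {expected_version}")
    return blob['model']

path = os.path.join(tempfile.mkdtemp(), 'model.pkl')
save_model({'weights': [1, 2]}, path, version='0.16.1')  # toy "model"
loaded = load_model(path, expected_version='0.16.1')
```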
25. Summary
• Python glues together whole pipeline
• Adding a simple feature can be hard
• Spend a lot of time on feature engineering, model evaluation