SlideShare a Scribd company logo
1 of 36
Download to read offline
Example-Dependent Cost-Sensitive
Credit Card Fraud Detection
March 21st, 2014
Alejandro Correa Bahnsen
with
Djamila Aouada, SnT
Björn Ottersten, SnT
Introduction
€ 500
€ 600
€ 700
€ 800
2007 2008 2009 2010 2011E 2012E
Europe fraud evolution
Internettransactions(millions of euros)
2
Introduction
$-
$1.0
$2.0
$3.0
$4.0
$5.0
2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012
US fraud evolution
Online revenue lost due to fraud (Billions of dollars)
3
• Increasing fraud levels around the world
• Different technologies and legal requirements makes it
harder to control
• Lack of collaboration between academia and
practitioners, leading to solutions that fail to
incorporate practical issues of credit card fraud
detection:
• Financial comparison measures
• Huge class imbalance
• Response time measure in milliseconds
Introduction
4
• Introduction
• Database
• Evaluation
• Bayes Minimum Risk
• Experiments
• Probability Calibration
• Other applications
• Conclusions & Future Work
Agenda
5
Simplify transaction flow
Fraud??
Network
6
Data
• Larger European card
processing company
• Jan2012 – Jun2013
card present transactions
• 1,638,772 Transactions
• 3,444 Frauds
• 0.21%Fraud rate
• 205,542 EUR lost due to fraud
on test dataset
Jun13
May13
Apr13
Mar13
Feb13
Jan13
…
…
…
Mar12
Feb12
Jan12
Test
Train
7
• Raw attributes
• Other attributes:
Age, country of residence, postal code, type of card
Data
TRXID Client ID Date Amount Location Type
Merchant
Group
Fraud
1 1 2/1/12 6:00 580 Ger Internet Airlines No
2 1 2/1/12 6:15 120 Eng Present Car Rent No
3 2 2/1/12 8:20 12 Bel Present Hotel Yes
4 1 3/1/12 4:15 60 Esp ATM ATM No
5 2 3/1/12 9:18 8 Fra Present Retail No
6 1 3/1/12 9:55 1210 Ita Internet Airlines Yes
8
• Derived attributes
Data
Trx
ID
Client
ID
Date Amount Location Type
Merchant
Group
Fraud
No. of Trx – same
client– last 6 hour
Sum – same client–
last 7 days
1 1 2/1/12 6:00 580 Ger Internet Airlines No 0 0
2 1 2/1/12 6:15 120 Eng Present Car Renting No 1 580
3 2 2/1/12 8:20 12 Bel Present Hotel Yes 0 0
4 1 3/1/12 4:15 60 Esp ATM ATM No 0 700
5 2 3/1/12 9:18 8 Fra Present Retail No 0 12
6 1 3/1/12 9:55 1210 Ita Internet Airlines Yes 1 760
By Group Last Function
Client None hour Count
Credit Card Transaction Type day Sum(Amount)
Merchant week Avg(Amount)
Merchant Category month
Merchant Country 3 months
– Combination of following criteria:
9
Date of transaction
04/03/2012 - 03:14
07/03/2012 - 00:47
07/03/2012 - 02:57
08/03/2012 - 02:08
14/03/2012 - 22:15
25/03/2012 - 05:03
26/03/2012 - 21:51
28/03/2012 - 03:41
𝐴𝑟𝑖𝑡ℎ𝑚𝑒𝑡𝑖𝑐 𝑀𝑒𝑎𝑛 =
1
𝑛
𝑡
𝑃𝑒𝑟𝑖𝑜𝑑𝑖𝑐 𝑀𝑒𝑎𝑛 = tan _2−1
sin(𝑡) , cos(𝑡)
𝑃𝑒𝑟𝑖𝑜𝑑𝑖𝑐 𝑆𝑡𝑑 = 𝑙𝑛 1
1
𝑛
sin 𝑡
2
+
1
𝑛
cos 𝑡
2
𝑡 ~ 𝑣𝑜𝑛𝑚𝑖𝑠𝑒𝑠 𝑘 ≈ 1
𝑠𝑡𝑑
𝑃 −𝑧𝑡 < 𝑡 < 𝑧𝑡 = 0.95
-1
-1
24h
6h
12h
18h
Data
10
Date of transaction
04/03/2012 - 03:14
07/03/2012 - 00:47
07/03/2012 - 02:57
08/03/2012 - 02:08
14/03/2012 - 22:15
25/03/2012 - 05:03
26/03/2012 - 21:51
28/03/2012 - 03:41
-1
-1
24h
6h
12h
18h
02/04/2012 - 02:02
03/04/2012 - 12:10
new features
Inside CI(0.95) last 30 days
Inside CI(0.95) last 7 days
Inside CI(0.5) last 30 days
Inside CI(0.5) last 7 days
Data
11
• Misclassification = 1 −
𝑇𝑃+𝑇𝑁
𝑇𝑃+𝑇𝑁+𝐹𝑃+𝐹𝑁
• Recall =
𝑇𝑃
𝑇𝑃+𝐹𝑁
• Precision =
𝑇𝑃
𝑇𝑃+𝐹𝑃
• F-Score = 2
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 ∗ 𝑅𝑒𝑐𝑎𝑙𝑙
𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛+𝑅𝑒𝑐𝑎𝑙𝑙
Evaluation
True Class (𝑦𝑖)
Fraud (𝑦𝑖=1)
Legitimate
(𝑦𝑖=0)
Predicted
class (𝑝𝑖)
Fraud (𝑝𝑖=1) TP FP
Legitimate (𝑝𝑖=0) FN TN
• Confusion matrix
12
• Motivation:
• Equal misclassification results
• Frauds carry different cost
Evaluation - Financialmeasure
TRX
ID
Amount Fraud
1 580 No
2 120 No
3 12 Yes
4 60 No
5 8 No
6 1210 Yes
Miss-Class 2 / 6
Cost 1222
Prediction
(Fraud?)
No
No
Yes
No
Yes
No
2 / 6
1212
Prediction
(Fraud?)
No
No
No
No
Yes
Yes
2 / 6
14
Prediction
(Fraud?)
No
No
No
No
No
No
Algorithm 1 Algorithm 3Algorithm 2
13
• Cost matrix
where:
Evaluation - Financialmeasure
True Class (𝑦𝑖)
Fraud (𝑦𝑖=1) Legitimate (𝑦𝑖=0)
Predicted
class (𝑝𝑖)
Fraud (𝑐𝑖=1) 𝐶 𝑇𝑃𝑖
= Ca 𝐶 𝐹𝑃𝑖
= Ca
Legitimate (𝑐𝑖=0) 𝐶 𝐹𝑁𝑖
= Amt(i) 𝐶 𝑇𝑁𝑖
= 0
Ca Administrative costs
Amt Amount of transaction i
𝐶𝑜𝑠𝑡 = 𝑦𝑖 𝑐𝑖 𝐶 𝑎 + 1 − 𝑐𝑖 𝐴𝑚𝑡𝑖 + (1 − 𝑦𝑖)𝑐𝑖 𝐶 𝑎
𝑚
𝑖=1
14
• Evaluation measure
• Introduction
• Database
• Evaluation
• Bayes Minimum Risk
• Experiments
• Probability Calibration
• Other applications
• Conclusions & Future Work
Agenda
15
• Decision model based on quantifying tradeoffs between
various decisions using probabilities and the costs that
accompany such decisions
• Risk of classification
Bayes Minimum Risk
16
• If then
𝑡 𝐵𝑀𝑅 𝑖
=
𝐶 𝑎
𝐴𝑚𝑡𝑖
Bayes Minimum Risk
17
• Example-dependent threshold
• Estimation of the fraud probabilities using
• Decision Trees
• Logistic Regression
• Random Forest
• Datasets
Experiments
Database Transactions Frauds Losses
Total 1,638,772 0.21% 860,448
Train 815,368 0.21% 416,369
Validation 412,137 0.22% 238,537
Test 411,267 0.21% 205,542
18
Experiments
0.00
0.05
0.10
0.15
0.20
0.25
0
50,000
100,000
150,000
200,000
250,000
No Model RF DT LR
Cost(Euros)
Cost F1-Score
19
Experiments
0.00
0.05
0.10
0.15
0.20
0.25
-
50,000
100,000
150,000
200,000
250,000
No Model RF BMR DT BMR LR BMR
RF DT LR
Cost(Euros)
Cost F1-Score
20
• Cost its reduced when using BRM
• The F1-Score is also reduced
Probability Calibration
• When using the output of a binary classier as a basis for
decision making, there is a need for a probability that not
only separates well between positive and negative examples,
but that also assesses the real probability of the event
21
Probability Calibration
• Reliability Diagram
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
P(pf)
P(pf|x)
base LR RF DT
22
Probability Calibration
• ROC Convex Hull calibration
ROC CurveClass (y) Prob (p)
0 0.0
1 0.1
0 0.2
0 0.3
1 0.4
0 0.5
1 0.6
1 0.7
0 0.8
1 0.9
1 1.0
23
Probability Calibration
• ROC Convex Hull calibration
ROC ConvexHull Curve
Class (y) Prob (p) Cal Prob
0.0 0 0
0.1 1 0.333
0.2 0 0.333
0.3 0 0.333
0.4 1 0.5
0.5 0 0.5
0.6 1 0.666
0.7 1 0.666
0.8 0 0.666
0.9 1 1
1.0 1 1
the calibratedprobabilitiesare extracted by first group the probabilitiesaccording
to the pointsin the ROCCH curve, and then make the calibratedprobabilitiesbe
the slope(T) for each group T. 24
Probability Calibration
• Reliability Diagram
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
P(pf)
P(pf|x)
base RF DT LR
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(pf)
P(pf|x)
base Cal RF Cal DT Cal LR
25
• Extra 1.5% decrease in cost by using calibrated probabilities
Experiments
0.00
0.05
0.10
0.15
0.20
0.25
-
50,000
100,000
150,000
200,000
250,000
No
Model
RF BMR CAL
BMR
DT BMR CAL
BMR
LR BMR CAL
BMR
RF DT LR
Cost(Euros)
Cost F1-Score
26
• Introduction
• Database
• Evaluation
• Bayes Minimum Risk
• Experiments
• Probability Calibration
• Other applications
• Conclusions & Future Work
Agenda
27
Other Applications
• Direct Marketing: Banking LTD offers
http://archive.ics.uci.edu/ml/datasets/Bank+Marketing
• Credit Scoring: 2011 Kaggle competition Give Me Some
Credit
http://www.kaggle.com/c/GiveMeSomeCredit/
• Credit Scoring: 2009 Pacific-Asia Knowledge Discovery and
Data Mining conference (PAKDD)competition
http://sede.neurotech.com.br:443/PAKDD2009/
28
Other Applications
• Direct Marketing: Banking LTD offers
where int(i) is the expected interest gains of customer i
• Datasets
Cost Matrix
True Class (𝑦𝑖)
Accept (𝑦𝑖=1) Decline (𝑦𝑖=0)
Predicted
class (𝑝𝑖)
Accept (𝑐𝑖=1) 𝐶 𝑇𝑃𝑖
= Ca 𝐶 𝐹𝑃𝑖
= Ca
Decline (𝑐𝑖=0) 𝐶 𝐹𝑁𝑖
= Int(i) 𝐶 𝑇𝑁𝑖
= 0
29
Database Examples Acceptance Int
Total 47,562 12.56% 394,211
Train 19,119 12.64% 156,676
Validation 11,809 12.78% 97,498
Test 11,815 12.23% 97,594
Other Applications
• Direct Marketing: Banking LTD offers
0.00
0.10
0.20
0.30
0.40
-
4,000
8,000
12,000
16,000
No
Model
RF BMR CAL
BMR
DT BMR CAL
BMR
LR BMR CAL
BMR
RF DT LR
Cost(Euros)
Cost F1-Score
95,594
30
• Extra 13.4%decrease in cost by using calibrated probabilities
Other Applications
• Credit Scoring
where 𝑙𝑔𝑑 is the loss given default, 𝐶𝑙𝑖 is the credit line of client i, 𝑟𝑖 is the expected
profit of client i, and 𝐶 𝐹𝑃 𝑎
is the expected cost of lending the money to an
alternativeborrower.
• Datasets
Kaggle Credit Dataset PAKDD Credit Dataset
Cost Matrix
True Class (𝑦𝑖)
Accept (𝑦𝑖=1) Decline (𝑦𝑖=0)
Predicted
class (𝑝𝑖)
Accept (𝑐𝑖=1) 𝐶 𝑇𝑃𝑖
= 0 𝐶 𝐹𝑃𝑖
= 𝑟𝑖 + 𝐶 𝐹𝑃𝑎
Decline (𝑐𝑖=0) 𝐶 𝐹𝑁𝑖
= 𝐶𝑙 𝑖 ∗ 𝑙𝑔𝑑 𝐶 𝑇𝑁𝑖
= 0
31
Database Examples Default
Total 112,915 6.74%
Train 45,358 6.83%
Validation 33,850 6.67%
Test 33,707 6.71%
Database Examples Default
Total 38,969 19.88%
Train 15,614 19.98%
Validation 11,711 20.02%
Test 11,644 19.63%
Other Applications
• Credit Scoring
Kaggle Credit Dataset PAKDD Credit Dataset
0.00
0.10
0.20
0.30
0.40
-
5.00
10.00
15.00
20.00
25.00
RF BMR CAL BMR
Cost(MillionsEuros)
Cost F1-Score
0.00
0.10
0.20
0.30
0.40
-
0.20
0.40
0.60
0.80
1.00
RF BMR CAL BMRCost(MillionsEuros)
Cost F1-Score
32
• Extra 0.9% decrease in cost by using calibrated probabilities
Conclusion
• Selecting models based on traditional statisticsdoes
not give the best results in terms of cost
• Models should be evaluated taking into account real
financial costsof the application
• Algorithms should be developed to incorporate
those real financial costs
• Calibration of probabilities yields to further decrease
in cost
33
Future work
• Example Dependent Cost Sensitive Decision Trees
• Example-Dependent Cost-Sensitive Calibration
Method
• Applications:
• Corporate credit risk
• Involuntary & Voluntary Churn in TV subscription
market
34
References
• Correa Bahnsen, A., Stojanovic, A., Aouada, D., & Ottersten, B. (2013). Cost Sensitive
Credit Card Fraud Detection using Bayes Minimum Risk. In International Conference
on Machine Learning and Applications. Miami, USA: IEEE.
• Correa Bahnsen, A., Stojanovic, A., Aouada, D., & Ottersten, B. (2014). Improving
Credit Card Fraud Detection with Calibrated Probabilities. In SIAM International
Conference on Data Mining. Philadelphia, USA: SIAM.
• Correa Bahnsen, A., Aouada, D., & Ottersten, B. (2014). Example-Dependent Cost-
Sensitive Credit Scoring using Bayes Minimum Risk. Submitted to ECAI 2014.
35
Contact information
Alejandro Correa Bahnsen
University of Luxembourg
Luxembourg
al.bahnsen@gmail.com
http://www.linkedin.com/in/albahnsen
http://www.slideshare.net/albahnsen
36

More Related Content

What's hot

Credit card payment_fraud_detection
Credit card payment_fraud_detectionCredit card payment_fraud_detection
Credit card payment_fraud_detectionPEIPEI HAN
 
Face Chk - Face Recognition
Face Chk - Face RecognitionFace Chk - Face Recognition
Face Chk - Face RecognitionPrime Infoserv
 
Credit Card Fraud Detection
Credit Card Fraud DetectionCredit Card Fraud Detection
Credit Card Fraud DetectionBinayakreddy
 
Fraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive AnalyticsFraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive AnalyticsAlejandro Correa Bahnsen, PhD
 
CREDIT CARD FRAUD DETECTION
CREDIT CARD FRAUD DETECTION CREDIT CARD FRAUD DETECTION
CREDIT CARD FRAUD DETECTION K Srinivas Rao
 
Credit card fraud detection methods using Data-mining.pptx (2)
Credit card fraud detection methods using Data-mining.pptx (2)Credit card fraud detection methods using Data-mining.pptx (2)
Credit card fraud detection methods using Data-mining.pptx (2)k.surya kumar
 
Fraud Detection presentation
Fraud Detection presentationFraud Detection presentation
Fraud Detection presentationHernan Huwyler
 
Credit card fraud detection
Credit card fraud detectionCredit card fraud detection
Credit card fraud detectionvineeta vineeta
 
Credit card fraud detection using python machine learning
Credit card fraud detection using python machine learningCredit card fraud detection using python machine learning
Credit card fraud detection using python machine learningSandeep Garg
 
Analysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detectionAnalysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detectionJustluk Luk
 
Detecting fraud with Python and machine learning
Detecting fraud with Python and machine learningDetecting fraud with Python and machine learning
Detecting fraud with Python and machine learningwgyn
 
Chp8 electronic payment system
Chp8 electronic payment systemChp8 electronic payment system
Chp8 electronic payment systemEngr Razaque
 
Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Andrea Dal Pozzolo
 
Credit card fraud detection through machine learning
Credit card fraud detection through machine learningCredit card fraud detection through machine learning
Credit card fraud detection through machine learningdataalcott
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud DetectionNitesh Kumar
 
Payments Made Easy with Stripe
Payments Made Easy with StripePayments Made Easy with Stripe
Payments Made Easy with StripeShawn Hooper
 

What's hot (20)

Fraud detection
Fraud detectionFraud detection
Fraud detection
 
Payment fraud
Payment fraudPayment fraud
Payment fraud
 
Credit card payment_fraud_detection
Credit card payment_fraud_detectionCredit card payment_fraud_detection
Credit card payment_fraud_detection
 
Face Chk - Face Recognition
Face Chk - Face RecognitionFace Chk - Face Recognition
Face Chk - Face Recognition
 
Credit Card Fraud Detection
Credit Card Fraud DetectionCredit Card Fraud Detection
Credit Card Fraud Detection
 
Fraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive AnalyticsFraud Detection with Cost-Sensitive Predictive Analytics
Fraud Detection with Cost-Sensitive Predictive Analytics
 
CREDIT CARD FRAUD DETECTION
CREDIT CARD FRAUD DETECTION CREDIT CARD FRAUD DETECTION
CREDIT CARD FRAUD DETECTION
 
Credit card fraud detection methods using Data-mining.pptx (2)
Credit card fraud detection methods using Data-mining.pptx (2)Credit card fraud detection methods using Data-mining.pptx (2)
Credit card fraud detection methods using Data-mining.pptx (2)
 
Fraud Detection presentation
Fraud Detection presentationFraud Detection presentation
Fraud Detection presentation
 
Credit card fraud detection
Credit card fraud detectionCredit card fraud detection
Credit card fraud detection
 
Credit card fraud detection using python machine learning
Credit card fraud detection using python machine learningCredit card fraud detection using python machine learning
Credit card fraud detection using python machine learning
 
Analysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detectionAnalysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detection
 
Detecting fraud with Python and machine learning
Detecting fraud with Python and machine learningDetecting fraud with Python and machine learning
Detecting fraud with Python and machine learning
 
Chp8 electronic payment system
Chp8 electronic payment systemChp8 electronic payment system
Chp8 electronic payment system
 
Fraud analytics
Fraud analyticsFraud analytics
Fraud analytics
 
Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?
 
Credit card fraud detection through machine learning
Credit card fraud detection through machine learningCredit card fraud detection through machine learning
Credit card fraud detection through machine learning
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud Detection
 
Credit card fraud dection
Credit card fraud dectionCredit card fraud dection
Credit card fraud dection
 
Payments Made Easy with Stripe
Payments Made Easy with StripePayments Made Easy with Stripe
Payments Made Easy with Stripe
 

Viewers also liked

2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practiceAlejandro Correa Bahnsen, PhD
 
Fraud analytics detección y prevención de fraudes en la era del big data sl...
Fraud analytics detección y prevención de fraudes en la era del big data   sl...Fraud analytics detección y prevención de fraudes en la era del big data   sl...
Fraud analytics detección y prevención de fraudes en la era del big data sl...Alejandro Correa Bahnsen, PhD
 
Classifying Phishing URLs Using Recurrent Neural Networks
Classifying Phishing URLs Using Recurrent Neural NetworksClassifying Phishing URLs Using Recurrent Neural Networks
Classifying Phishing URLs Using Recurrent Neural NetworksAlejandro Correa Bahnsen, PhD
 
Maximizing a churn campaigns profitability with cost sensitive machine learning
Maximizing a churn campaigns profitability with cost sensitive machine learningMaximizing a churn campaigns profitability with cost sensitive machine learning
Maximizing a churn campaigns profitability with cost sensitive machine learningAlejandro Correa Bahnsen, PhD
 
PhD Defense - Example-Dependent Cost-Sensitive Classification
PhD Defense - Example-Dependent Cost-Sensitive ClassificationPhD Defense - Example-Dependent Cost-Sensitive Classification
PhD Defense - Example-Dependent Cost-Sensitive ClassificationAlejandro Correa Bahnsen, PhD
 
Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...Alejandro Correa Bahnsen, PhD
 
Ensembles of example dependent cost-sensitive decision trees slides
Ensembles of example dependent cost-sensitive decision trees slidesEnsembles of example dependent cost-sensitive decision trees slides
Ensembles of example dependent cost-sensitive decision trees slidesAlejandro Correa Bahnsen, PhD
 

Viewers also liked (12)

2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice2013 credit card fraud detection why theory dosent adjust to practice
2013 credit card fraud detection why theory dosent adjust to practice
 
Fraud analytics detección y prevención de fraudes en la era del big data sl...
Fraud analytics detección y prevención de fraudes en la era del big data   sl...Fraud analytics detección y prevención de fraudes en la era del big data   sl...
Fraud analytics detección y prevención de fraudes en la era del big data sl...
 
Analytics - compitiendo en la era de la informacion
Analytics - compitiendo en la era de la informacionAnalytics - compitiendo en la era de la informacion
Analytics - compitiendo en la era de la informacion
 
1609 Fraud Data Science
1609 Fraud Data Science1609 Fraud Data Science
1609 Fraud Data Science
 
Classifying Phishing URLs Using Recurrent Neural Networks
Classifying Phishing URLs Using Recurrent Neural NetworksClassifying Phishing URLs Using Recurrent Neural Networks
Classifying Phishing URLs Using Recurrent Neural Networks
 
Maximizing a churn campaigns profitability with cost sensitive machine learning
Maximizing a churn campaigns profitability with cost sensitive machine learningMaximizing a churn campaigns profitability with cost sensitive machine learning
Maximizing a churn campaigns profitability with cost sensitive machine learning
 
Modern Data Science
Modern Data ScienceModern Data Science
Modern Data Science
 
2011 advanced analytics through the credit cycle
2011 advanced analytics through the credit cycle2011 advanced analytics through the credit cycle
2011 advanced analytics through the credit cycle
 
PhD Defense - Example-Dependent Cost-Sensitive Classification
PhD Defense - Example-Dependent Cost-Sensitive ClassificationPhD Defense - Example-Dependent Cost-Sensitive Classification
PhD Defense - Example-Dependent Cost-Sensitive Classification
 
Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...Maximizing a churn campaign’s profitability with cost sensitive predictive an...
Maximizing a churn campaign’s profitability with cost sensitive predictive an...
 
Demystifying machine learning using lime
Demystifying machine learning using limeDemystifying machine learning using lime
Demystifying machine learning using lime
 
Ensembles of example dependent cost-sensitive decision trees slides
Ensembles of example dependent cost-sensitive decision trees slidesEnsembles of example dependent cost-sensitive decision trees slides
Ensembles of example dependent cost-sensitive decision trees slides
 

Similar to Example-Dependent Cost-Sensitive Credit Card Fraud Detection

Data analysis for credit card fraud detection.pptx
Data analysis for credit card fraud detection.pptxData analysis for credit card fraud detection.pptx
Data analysis for credit card fraud detection.pptxKRNL1
 
Towards Evaluating Size Reduction Techniques for Software Model Checking
Towards Evaluating Size Reduction Techniques for Software Model CheckingTowards Evaluating Size Reduction Techniques for Software Model Checking
Towards Evaluating Size Reduction Techniques for Software Model CheckingAkos Hajdu
 
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...Lili Zhang
 
RS in the context of Big Data-v4
RS in the context of Big Data-v4RS in the context of Big Data-v4
RS in the context of Big Data-v4Khadija Atiya
 
Autonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based SoftwareAutonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based SoftwarePooyan Jamshidi
 
Improving the safety of ride hailing services using iot analytics
Improving the safety of ride hailing services using iot analyticsImproving the safety of ride hailing services using iot analytics
Improving the safety of ride hailing services using iot analyticsMohan Manivannan
 
EvaluationMetrics.pptx
EvaluationMetrics.pptxEvaluationMetrics.pptx
EvaluationMetrics.pptxshuchismitjha2
 
Decision making tool (ahp)
Decision making tool (ahp)Decision making tool (ahp)
Decision making tool (ahp)Amit Jain
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering odsc
 
Six Sgima (DMAIC & PDCA)
Six Sgima (DMAIC & PDCA)Six Sgima (DMAIC & PDCA)
Six Sgima (DMAIC & PDCA)Mayank Agarwal
 
Loan portfolio manufacturing sme's statistical analysis
Loan portfolio manufacturing sme's   statistical analysisLoan portfolio manufacturing sme's   statistical analysis
Loan portfolio manufacturing sme's statistical analysisManzar Ahmed
 
Group Project - Final - Linked in
Group Project - Final - Linked inGroup Project - Final - Linked in
Group Project - Final - Linked inSanket Butoliya
 
Fraud Detection by Stacking Cost-Sensitive Decision Trees
Fraud Detection by Stacking Cost-Sensitive Decision TreesFraud Detection by Stacking Cost-Sensitive Decision Trees
Fraud Detection by Stacking Cost-Sensitive Decision TreesAlejandro Correa Bahnsen, PhD
 
Credit Default Swap (CDS) Rate Construction by Machine Learning Techniques
Credit Default Swap (CDS) Rate Construction by Machine Learning TechniquesCredit Default Swap (CDS) Rate Construction by Machine Learning Techniques
Credit Default Swap (CDS) Rate Construction by Machine Learning TechniquesZhongmin Luo
 

Similar to Example-Dependent Cost-Sensitive Credit Card Fraud Detection (20)

Data analysis for credit card fraud detection.pptx
Data analysis for credit card fraud detection.pptxData analysis for credit card fraud detection.pptx
Data analysis for credit card fraud detection.pptx
 
Fraud Analytics
Fraud AnalyticsFraud Analytics
Fraud Analytics
 
Towards Evaluating Size Reduction Techniques for Software Model Checking
Towards Evaluating Size Reduction Techniques for Software Model CheckingTowards Evaluating Size Reduction Techniques for Software Model Checking
Towards Evaluating Size Reduction Techniques for Software Model Checking
 
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...
Influence of the Event Rate on Discrimination Abilities of Bankruptcy Predict...
 
RS in the context of Big Data-v4
RS in the context of Big Data-v4RS in the context of Big Data-v4
RS in the context of Big Data-v4
 
Autonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based SoftwareAutonomic Resource Provisioning for Cloud-Based Software
Autonomic Resource Provisioning for Cloud-Based Software
 
Improving the safety of ride hailing services using iot analytics
Improving the safety of ride hailing services using iot analyticsImproving the safety of ride hailing services using iot analytics
Improving the safety of ride hailing services using iot analytics
 
Graphical tools
Graphical toolsGraphical tools
Graphical tools
 
EvaluationMetrics.pptx
EvaluationMetrics.pptxEvaluationMetrics.pptx
EvaluationMetrics.pptx
 
ch15.ppt
ch15.pptch15.ppt
ch15.ppt
 
Decision making tool (ahp)
Decision making tool (ahp)Decision making tool (ahp)
Decision making tool (ahp)
 
Travel_Time_Reliability
Travel_Time_ReliabilityTravel_Time_Reliability
Travel_Time_Reliability
 
Feature Engineering
Feature Engineering Feature Engineering
Feature Engineering
 
Internship Poster
Internship PosterInternship Poster
Internship Poster
 
Six Sgima (DMAIC & PDCA)
Six Sgima (DMAIC & PDCA)Six Sgima (DMAIC & PDCA)
Six Sgima (DMAIC & PDCA)
 
Energy saving policies final
Energy saving policies finalEnergy saving policies final
Energy saving policies final
 
Loan portfolio manufacturing sme's statistical analysis
Loan portfolio manufacturing sme's   statistical analysisLoan portfolio manufacturing sme's   statistical analysis
Loan portfolio manufacturing sme's statistical analysis
 
Group Project - Final - Linked in
Group Project - Final - Linked inGroup Project - Final - Linked in
Group Project - Final - Linked in
 
Fraud Detection by Stacking Cost-Sensitive Decision Trees
Fraud Detection by Stacking Cost-Sensitive Decision TreesFraud Detection by Stacking Cost-Sensitive Decision Trees
Fraud Detection by Stacking Cost-Sensitive Decision Trees
 
Credit Default Swap (CDS) Rate Construction by Machine Learning Techniques
Credit Default Swap (CDS) Rate Construction by Machine Learning TechniquesCredit Default Swap (CDS) Rate Construction by Machine Learning Techniques
Credit Default Swap (CDS) Rate Construction by Machine Learning Techniques
 

Recently uploaded

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionDilum Bandara
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii SoldatenkoFwdays
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyAlfredo García Lavilla
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 

Recently uploaded (20)

Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Advanced Computer Architecture – An Introduction
Advanced Computer Architecture – An IntroductionAdvanced Computer Architecture – An Introduction
Advanced Computer Architecture – An Introduction
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Commit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easyCommit 2024 - Secret Management made easy
Commit 2024 - Secret Management made easy
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 

Example-Dependent Cost-Sensitive Credit Card Fraud Detection

  • 1. Example-Dependent Cost-Sensitive Credit Card Fraud Detection March 21st, 2014 Alejandro Correa Bahnsen with Djamila Aouada, SnT Björn Ottersten, SnT
  • 2. Introduction € 500 € 600 € 700 € 800 2007 2008 2009 2010 2011E 2012E Europe fraud evolution Internettransactions(millions of euros) 2
  • 3. Introduction $- $1.0 $2.0 $3.0 $4.0 $5.0 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 US fraud evolution Online revenue lost due to fraud (Billions of dollars) 3
  • 4. • Increasing fraud levels around the world • Different technologies and legal requirements makes it harder to control • Lack of collaboration between academia and practitioners, leading to solutions that fail to incorporate practical issues of credit card fraud detection: • Financial comparison measures • Huge class imbalance • Response time measure in milliseconds Introduction 4
  • 5. • Introduction • Database • Evaluation • Bayes Minimum Risk • Experiments • Probability Calibration • Other applications • Conclusions & Future Work Agenda 5
  • 7. Data • Larger European card processing company • Jan2012 – Jun2013 card present transactions • 1,638,772 Transactions • 3,444 Frauds • 0.21%Fraud rate • 205,542 EUR lost due to fraud on test dataset Jun13 May13 Apr13 Mar13 Feb13 Jan13 … … … Mar12 Feb12 Jan12 Test Train 7
  • 8. • Raw attributes • Other attributes: Age, country of residence, postal code, type of card Data TRXID Client ID Date Amount Location Type Merchant Group Fraud 1 1 2/1/12 6:00 580 Ger Internet Airlines No 2 1 2/1/12 6:15 120 Eng Present Car Rent No 3 2 2/1/12 8:20 12 Bel Present Hotel Yes 4 1 3/1/12 4:15 60 Esp ATM ATM No 5 2 3/1/12 9:18 8 Fra Present Retail No 6 1 3/1/12 9:55 1210 Ita Internet Airlines Yes 8
  • 9. • Derived attributes Data Trx ID Client ID Date Amount Location Type Merchant Group Fraud No. of Trx – same client– last 6 hour Sum – same client– last 7 days 1 1 2/1/12 6:00 580 Ger Internet Airlines No 0 0 2 1 2/1/12 6:15 120 Eng Present Car Renting No 1 580 3 2 2/1/12 8:20 12 Bel Present Hotel Yes 0 0 4 1 3/1/12 4:15 60 Esp ATM ATM No 0 700 5 2 3/1/12 9:18 8 Fra Present Retail No 0 12 6 1 3/1/12 9:55 1210 Ita Internet Airlines Yes 1 760 By Group Last Function Client None hour Count Credit Card Transaction Type day Sum(Amount) Merchant week Avg(Amount) Merchant Category month Merchant Country 3 months – Combination of following criteria: 9
  • 10. Date of transaction 04/03/2012 - 03:14 07/03/2012 - 00:47 07/03/2012 - 02:57 08/03/2012 - 02:08 14/03/2012 - 22:15 25/03/2012 - 05:03 26/03/2012 - 21:51 28/03/2012 - 03:41 𝐴𝑟𝑖𝑡ℎ𝑚𝑒𝑡𝑖𝑐 𝑀𝑒𝑎𝑛 = 1 𝑛 𝑡 𝑃𝑒𝑟𝑖𝑜𝑑𝑖𝑐 𝑀𝑒𝑎𝑛 = tan _2−1 sin(𝑡) , cos(𝑡) 𝑃𝑒𝑟𝑖𝑜𝑑𝑖𝑐 𝑆𝑡𝑑 = 𝑙𝑛 1 1 𝑛 sin 𝑡 2 + 1 𝑛 cos 𝑡 2 𝑡 ~ 𝑣𝑜𝑛𝑚𝑖𝑠𝑒𝑠 𝑘 ≈ 1 𝑠𝑡𝑑 𝑃 −𝑧𝑡 < 𝑡 < 𝑧𝑡 = 0.95 -1 -1 24h 6h 12h 18h Data 10
  • 11. Date of transaction 04/03/2012 - 03:14 07/03/2012 - 00:47 07/03/2012 - 02:57 08/03/2012 - 02:08 14/03/2012 - 22:15 25/03/2012 - 05:03 26/03/2012 - 21:51 28/03/2012 - 03:41 -1 -1 24h 6h 12h 18h 02/04/2012 - 02:02 03/04/2012 - 12:10 new features Inside CI(0.95) last 30 days Inside CI(0.95) last 7 days Inside CI(0.5) last 30 days Inside CI(0.5) last 7 days Data 11
  • 12. • Misclassification = 1 − 𝑇𝑃+𝑇𝑁 𝑇𝑃+𝑇𝑁+𝐹𝑃+𝐹𝑁 • Recall = 𝑇𝑃 𝑇𝑃+𝐹𝑁 • Precision = 𝑇𝑃 𝑇𝑃+𝐹𝑃 • F-Score = 2 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛 ∗ 𝑅𝑒𝑐𝑎𝑙𝑙 𝑃𝑟𝑒𝑐𝑖𝑠𝑖𝑜𝑛+𝑅𝑒𝑐𝑎𝑙𝑙 Evaluation True Class (𝑦𝑖) Fraud (𝑦𝑖=1) Legitimate (𝑦𝑖=0) Predicted class (𝑝𝑖) Fraud (𝑝𝑖=1) TP FP Legitimate (𝑝𝑖=0) FN TN • Confusion matrix 12
  • 13. • Motivation: • Equal misclassification results • Frauds carry different cost Evaluation - Financialmeasure TRX ID Amount Fraud 1 580 No 2 120 No 3 12 Yes 4 60 No 5 8 No 6 1210 Yes Miss-Class 2 / 6 Cost 1222 Prediction (Fraud?) No No Yes No Yes No 2 / 6 1212 Prediction (Fraud?) No No No No Yes Yes 2 / 6 14 Prediction (Fraud?) No No No No No No Algorithm 1 Algorithm 3Algorithm 2 13
  • 14. • Cost matrix where: Evaluation - Financialmeasure True Class (𝑦𝑖) Fraud (𝑦𝑖=1) Legitimate (𝑦𝑖=0) Predicted class (𝑝𝑖) Fraud (𝑐𝑖=1) 𝐶 𝑇𝑃𝑖 = Ca 𝐶 𝐹𝑃𝑖 = Ca Legitimate (𝑐𝑖=0) 𝐶 𝐹𝑁𝑖 = Amt(i) 𝐶 𝑇𝑁𝑖 = 0 Ca Administrative costs Amt Amount of transaction i 𝐶𝑜𝑠𝑡 = 𝑦𝑖 𝑐𝑖 𝐶 𝑎 + 1 − 𝑐𝑖 𝐴𝑚𝑡𝑖 + (1 − 𝑦𝑖)𝑐𝑖 𝐶 𝑎 𝑚 𝑖=1 14 • Evaluation measure
  • 15. • Introduction • Database • Evaluation • Bayes Minimum Risk • Experiments • Probability Calibration • Other applications • Conclusions & Future Work Agenda 15
  • 16. • Decision model based on quantifying tradeoffs between various decisions using probabilities and the costs that accompany such decisions • Risk of classification Bayes Minimum Risk 16
  • 17. • If then 𝑡 𝐵𝑀𝑅 𝑖 = 𝐶 𝑎 𝐴𝑚𝑡𝑖 Bayes Minimum Risk 17 • Example-dependent threshold
  • 18. • Estimation of the fraud probabilities using • Decision Trees • Logistic Regression • Random Forest • Datasets Experiments Database Transactions Frauds Losses Total 1,638,772 0.21% 860,448 Train 815,368 0.21% 416,369 Validation 412,137 0.22% 238,537 Test 411,267 0.21% 205,542 18
  • 20. Experiments 0.00 0.05 0.10 0.15 0.20 0.25 - 50,000 100,000 150,000 200,000 250,000 No Model RF BMR DT BMR LR BMR RF DT LR Cost(Euros) Cost F1-Score 20 • Cost its reduced when using BRM • The F1-Score is also reduced
  • 21. Probability Calibration • When using the output of a binary classier as a basis for decision making, there is a need for a probability that not only separates well between positive and negative examples, but that also assesses the real probability of the event 21
  • 22. Probability Calibration • Reliability Diagram 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 P(pf) P(pf|x) base LR RF DT 22
  • 23. Probability Calibration • ROC Convex Hull calibration ROC CurveClass (y) Prob (p) 0 0.0 1 0.1 0 0.2 0 0.3 1 0.4 0 0.5 1 0.6 1 0.7 0 0.8 1 0.9 1 1.0 23
  • 24. Probability Calibration • ROC Convex Hull calibration ROC ConvexHull Curve Class (y) Prob (p) Cal Prob 0.0 0 0 0.1 1 0.333 0.2 0 0.333 0.3 0 0.333 0.4 1 0.5 0.5 0 0.5 0.6 1 0.666 0.7 1 0.666 0.8 0 0.666 0.9 1 1 1.0 1 1 the calibratedprobabilitiesare extracted by first group the probabilitiesaccording to the pointsin the ROCCH curve, and then make the calibratedprobabilitiesbe the slope(T) for each group T. 24
  • 25. Probability Calibration • Reliability Diagram 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 P(pf) P(pf|x) base RF DT LR 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1P(pf) P(pf|x) base Cal RF Cal DT Cal LR 25
  • 26. • Extra 1.5% decrease in cost by using calibrated probabilities Experiments 0.00 0.05 0.10 0.15 0.20 0.25 - 50,000 100,000 150,000 200,000 250,000 No Model RF BMR CAL BMR DT BMR CAL BMR LR BMR CAL BMR RF DT LR Cost(Euros) Cost F1-Score 26
  • 27. • Introduction • Database • Evaluation • Bayes Minimum Risk • Experiments • Probability Calibration • Other applications • Conclusions & Future Work Agenda 27
  • 28. Other Applications • Direct Marketing: Banking LTD offers http://archive.ics.uci.edu/ml/datasets/Bank+Marketing • Credit Scoring: 2011 Kaggle competition Give Me Some Credit http://www.kaggle.com/c/GiveMeSomeCredit/ • Credit Scoring: 2009 Pacific-Asia Knowledge Discovery and Data Mining conference (PAKDD)competition http://sede.neurotech.com.br:443/PAKDD2009/ 28
  • 29. Other Applications • Direct Marketing: Banking LTD offers where int(i) is the expected interest gains of customer i • Datasets Cost Matrix True Class (𝑦𝑖) Accept (𝑦𝑖=1) Decline (𝑦𝑖=0) Predicted class (𝑝𝑖) Accept (𝑐𝑖=1) 𝐶 𝑇𝑃𝑖 = Ca 𝐶 𝐹𝑃𝑖 = Ca Decline (𝑐𝑖=0) 𝐶 𝐹𝑁𝑖 = Int(i) 𝐶 𝑇𝑁𝑖 = 0 29 Database Examples Acceptance Int Total 47,562 12.56% 394,211 Train 19,119 12.64% 156,676 Validation 11,809 12.78% 97,498 Test 11,815 12.23% 97,594
  • 30. Other Applications • Direct Marketing: Banking LTD offers 0.00 0.10 0.20 0.30 0.40 - 4,000 8,000 12,000 16,000 No Model RF BMR CAL BMR DT BMR CAL BMR LR BMR CAL BMR RF DT LR Cost(Euros) Cost F1-Score 95,594 30 • Extra 13.4%decrease in cost by using calibrated probabilities
  • 31. Other Applications • Credit Scoring where 𝑙𝑔𝑑 is the loss given default, 𝐶𝑙𝑖 is the credit line of client i, 𝑟𝑖 is the expected profit of client i, and 𝐶 𝐹𝑃 𝑎 is the expected cost of lending the money to an alternativeborrower. • Datasets Kaggle Credit Dataset PAKDD Credit Dataset Cost Matrix True Class (𝑦𝑖) Accept (𝑦𝑖=1) Decline (𝑦𝑖=0) Predicted class (𝑝𝑖) Accept (𝑐𝑖=1) 𝐶 𝑇𝑃𝑖 = 0 𝐶 𝐹𝑃𝑖 = 𝑟𝑖 + 𝐶 𝐹𝑃𝑎 Decline (𝑐𝑖=0) 𝐶 𝐹𝑁𝑖 = 𝐶𝑙 𝑖 ∗ 𝑙𝑔𝑑 𝐶 𝑇𝑁𝑖 = 0 31 Database Examples Default Total 112,915 6.74% Train 45,358 6.83% Validation 33,850 6.67% Test 33,707 6.71% Database Examples Default Total 38,969 19.88% Train 15,614 19.98% Validation 11,711 20.02% Test 11,644 19.63%
  • 32. Other Applications • Credit Scoring Kaggle Credit Dataset PAKDD Credit Dataset 0.00 0.10 0.20 0.30 0.40 - 5.00 10.00 15.00 20.00 25.00 RF BMR CAL BMR Cost(MillionsEuros) Cost F1-Score 0.00 0.10 0.20 0.30 0.40 - 0.20 0.40 0.60 0.80 1.00 RF BMR CAL BMRCost(MillionsEuros) Cost F1-Score 32 • Extra 0.9% decrease in cost by using calibrated probabilities
  • 33. Conclusion • Selecting models based on traditional statisticsdoes not give the best results in terms of cost • Models should be evaluated taking into account real financial costsof the application • Algorithms should be developed to incorporate those real financial costs • Calibration of probabilities yields to further decrease in cost 33
  • 34. Future work • Example Dependent Cost Sensitive Decision Trees • Example-Dependent Cost-Sensitive Calibration Method • Applications: • Corporate credit risk • Involuntary & Voluntary Churn in TV subscription market 34
  • 35. References • Correa Bahnsen, A., Stojanovic, A., Aouada, D., & Ottersten, B. (2013). Cost Sensitive Credit Card Fraud Detection using Bayes Minimum Risk. In International Conference on Machine Learning and Applications. Miami, USA: IEEE. • Correa Bahnsen, A., Stojanovic, A., Aouada, D., & Ottersten, B. (2014). Improving Credit Card Fraud Detection with Calibrated Probabilities. In SIAM International Conference on Data Mining. Philadelphia, USA: SIAM. • Correa Bahnsen, A., Aouada, D., & Ottersten, B. (2014). Example-Dependent Cost- Sensitive Credit Scoring using Bayes Minimum Risk. Submitted to ECAI 2014. 35
  • 36. Contact information Alejandro Correa Bahnsen University of Luxembourg Luxembourg al.bahnsen@gmail.com http://www.linkedin.com/in/albahnsen http://www.slideshare.net/albahnsen 36