Introduction Sampling Concept Drift Alert-Feedback Interaction Conclusions
Adaptive Machine Learning for
Credit Card Fraud Detection
Andrea Dal Pozzolo
4/12/2015
PhD public defence
Supervisor: Prof. Gianluca Bontempi
THE Doctiris PROJECT
4-year PhD project funded by InnovIRIS (Brussels Region)
Industrial project done in collaboration with Worldline
S.A. (Brussels, Belgium)
Requirement: 50% time at Worldline (WL) and 50% at
MLG-ULB
Goal: Improve the Data-Driven part of the Fraud Detection
System (FDS) by means of Machine Learning algorithms.
INTRODUCTION
This thesis is about Fraud Detection (FD) in credit card
transactions.
FD is the process of identifying fraudulent transactions
after the payment is authorized.
This process is typically automated by means of a FDS.
Technical sections will be denoted by a dedicated symbol.
THE FRAUD DETECTION PROCESS
Figure : The fraud detection process. A terminal check (correct PIN? sufficient balance? blocked card?) can reject a transaction; authorized transactions are assigned a fraud score by the predictive model, and investigators checking the alerts provide feedback.
WHAT’S MACHINE LEARNING?
The design of algorithms that discover patterns in a
collection of data instances in an automated manner.
The goal is to use the discovered patterns to make
predictions on new data.
  
  
  
Figure : A Machine Learning algorithm builds a predictive model from training samples; the model is then used to make predictions on testing samples.
We let computers learn these patterns.
CHALLENGES FOR A DATA-DRIVEN FDS
1. Unbalanced distribution (few frauds).
2. Concept Drift (CD) due to fraud evolution and change in
customer behaviour.
3. Few recent supervised transactions (true class of only
transactions reviewed by investigators).
4. For confidentiality reasons, the exchange of datasets and
information is often difficult.
PCA PROJECTION
Figure : Transactions distribution over the first two Principal
Components in different days.
RESAMPLING METHODS
Figure : Resampling methods for unbalanced classification (undersampling, oversampling, SMOTE). The positive and negative symbols denote fraudulent and genuine transactions; in red, the new observations created by oversampling methods.
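The SMOTE idea in the figure (synthetic minority points obtained by interpolating between a minority sample and one of its nearest minority neighbours) can be sketched in a few lines. This is an illustrative Python sketch, not the implementation used in the thesis:

```python
import numpy as np

def smote_sample(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by interpolating between
    each chosen point and one of its k nearest minority neighbours."""
    rng = np.random.default_rng(seed)
    new_points = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        # distances to every minority point; index 0 of the sort is x itself
        d = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(d)[1:k + 1]
        x_nn = minority[rng.choice(neighbours)]
        gap = rng.random()                 # position on the segment [x, x_nn]
        new_points.append(x + gap * (x_nn - x))
    return np.array(new_points)
```

By construction every synthetic point lies on a segment between two real minority samples, so SMOTE densifies the minority region instead of duplicating points as plain oversampling does.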
CONCEPT DRIFT
In data streams the concept to learn can change due to
non-stationary distributions.
Figure : Illustrative examples of different types of Concept Drift between time t and t + 1: class priors change, within-class change, and class boundary change.
THESIS CONTRIBUTIONS
1. Learning with unbalanced distribution:
impact of undersampling on the accuracy of the final
classifier
racing algorithm to adaptively select the best unbalanced
strategy
2. Learning from evolving and unbalanced data streams
multiple strategies for unbalanced data stream.
effective solution to avoid propagation of old transactions.
3. Prototype of a real-world FDS
realistic framework that uses all supervised information
and provides accurate alerts.
4. Software and Credit Card Fraud Detection Dataset
release of a software package and a real-world dataset.
PUBLICATIONS
A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi.
Credit Card Fraud Detection with Alert-Feedback Interaction.
Submitted to IEEE TNNLS, 2015.
A. Dal Pozzolo, O. Caelen, and G. Bontempi.
When is undersampling effective in unbalanced classification tasks?
In ECML-KDD, 2015.
A. Dal Pozzolo, O. Caelen, R. Johnson and G. Bontempi.
Calibrating Probability with Undersampling for Unbalanced Classification.
In IEEE CIDM, 2015.
A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi.
Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information.
In IEEE IJCNN, 2015.
A. Dal Pozzolo, R. Johnson, O. Caelen, S. Waterschoot, N. Chawla, and G. Bontempi.
Using HDDT to avoid instances propagation in unbalanced and evolving data streams.
In IEEE IJCNN, 2014.
A. Dal Pozzolo, O. Caelen, Y. Le Borgne, S. Waterschoot, and G. Bontempi.
Learned lessons in credit card fraud detection from a practitioner perspective.
Elsevier ESWA, 2014.
A. Dal Pozzolo, O. Caelen, S. Waterschoot, G. Bontempi,
Racing for unbalanced methods selection.
In IEEE IDEAL, 2013
THE FRAUD DETECTION PROBLEM
We formalize FD as a classification task f : Rn → {+, −}.
X ∈ Rn is the input and Y ∈ {+, −} the output domain.
+ is the fraud (minority) and − the genuine (majority)
class.
Given a classifier K and a training set TN, we are interested
in estimating for a new sample (x, y) the posterior
probability P(y = +|x).
Figure : Fraudulent and genuine transactions in the feature space.
THE FRAUD DETECTION DATA
WL processes millions of transactions per day.
When a transaction arrives it contains about 20 raw
features (e.g. CARD ID, AMOUNT, DATETIME,
SHOP ID, CURRENCY, etc.).
Aggregated features are computed to capture the cardholder's
historical behaviour (e.g. AVG AMOUNT WEEK,
TIME FROM LAST TRX, NB TRX SAME SHOP, etc.).
Final feature vector contains about 50 variables.
ACCURACY MEASURE OF A FDS
In a detection problem it is more important to
provide accurate ranking than correct
classification.
In a realistic FDS, investigators check only a
limited number of transactions.
Hence, Pk is the most relevant measure, with k
being the number of transactions checked.
Pk is the proportion of frauds among the k most risky
transactions.
 k   Predicted Score   True Class
 1        0.99             +
 2        0.98             +
 3        0.98             −
 4        0.96             −
 5        0.95             +
 6        0.92             −
 7        0.91             +
 8        0.84             −
 9        0.82             −
10        0.79             −
11        0.76             −
12        0.75             +
13        0.70             −
14        0.67             −
15        0.64             +
16        0.61             −
17        0.58             −
18        0.55             +
19        0.52             −
20        0.49             −
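As a concrete illustration, Pk can be computed directly from the scores and true labels in the table above. A minimal Python sketch, assuming labels are encoded as 1 for fraud and 0 for genuine:

```python
def precision_at_k(scores, labels, k):
    """Pk: proportion of true frauds among the k highest-scored transactions."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sum(labels[i] for i in ranked[:k]) / k

# Scores and labels from the table above (1 = fraud, 0 = genuine)
scores = [0.99, 0.98, 0.98, 0.96, 0.95, 0.92, 0.91, 0.84, 0.82, 0.79,
          0.76, 0.75, 0.70, 0.67, 0.64, 0.61, 0.58, 0.55, 0.52, 0.49]
labels = [1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
```

For these values P5 = 3/5 = 0.6 and P10 = 4/10 = 0.4: the measure only looks at the top of the ranking, which is exactly the part investigators see.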
Theoretical contribution
CHAPTER 4
-
UNDERSTANDING SAMPLING METHODS
Publications:
A. Dal Pozzolo, O. Caelen, and G. Bontempi.
When is undersampling effective in unbalanced classification tasks?
In ECML-KDD, 2015.
A. Dal Pozzolo, O. Caelen, R. Johnson and G. Bontempi.
Calibrating Probability with Undersampling for Unbalanced Classification.
In IEEE CIDM, 2015.
A. Dal Pozzolo, O. Caelen, S. Waterschoot, G. Bontempi,
Racing for unbalanced methods selection.
In IEEE IDEAL, 2013
OBJECTIVE OF CHAPTER 4
In Chapter 4 we aim to
analyse the shift in the posterior distribution due to
undersampling and to study its impact on the final
ranking.
show under which conditions undersampling is expected to
improve classification accuracy.
propose a method to calibrate posterior probabilities and
to select the classification threshold.
show how to efficiently select the best unbalanced strategy
for a given dataset.
EFFECT OF UNDERSAMPLING
Suppose that a classifier K is trained on set TN which is
unbalanced.
Let s be a random variable associated to each sample
(x, y) ∈ TN, s = 1 if the point is sampled and s = 0
otherwise.
Assume that s is independent of the input x given the class
y (class-dependent selection):
P(s|y, x) = P(s|y)  ⇔  P(x|y, s) = P(x|y)
Figure : Undersampling: majority-class examples are removed at random, turning the unbalanced dataset into a balanced one. In red, the samples removed from the unbalanced dataset (s = 0).
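Class-dependent selection is easy to sketch: every minority sample is kept, and each majority sample is kept independently with probability β = P(s = 1 | −). An illustrative Python sketch (not the thesis code):

```python
import numpy as np

def undersample(X, y, beta, seed=0):
    """Keep every minority (y = 1) sample; keep each majority (y = 0)
    sample independently with probability beta = P(s = 1 | -)."""
    rng = np.random.default_rng(seed)
    keep = (y == 1) | (rng.random(len(y)) < beta)
    return X[keep], y[keep]
```

Because the selection depends on y only, the class-conditional densities P(x|y) are preserved; only the class priors change.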
POSTERIOR PROBABILITIES
Let ps = P(+|x, s = 1) and p = P(+|x). We can write ps as a
function of p:

    ps = p / (p + β(1 − p))                    (1)

where β = P(s = 1 | −). Inverting (1) gives an expression of
p as a function of ps:

    p = β·ps / (β·ps − ps + 1)                 (2)
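Equations (1) and (2) translate directly into a probability-calibration step: a sketch (not the thesis code) of the forward map and its inverse, which can be used to correct the probabilities returned by a classifier trained on undersampled data:

```python
def p_undersampled(p, beta):
    """Eq. (1): posterior ps after undersampling, as a function of p."""
    return p / (p + beta * (1.0 - p))

def p_calibrated(ps, beta):
    """Eq. (2): recover the posterior p of the original unbalanced
    distribution from the biased estimate ps."""
    return beta * ps / (beta * ps - ps + 1.0)
```

The two maps are exact inverses of each other and both are monotone, so calibration undoes the probability shift without changing the ranking of the transactions.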
SHIFT AND CLASS SEPARABILITY
(a) ps as a function of β. (b) Class distribution.

Figure : Class distribution and posterior probability as a function of β for two univariate binary classification tasks with normal class-conditional densities X− ∼ N(0, σ) and X+ ∼ N(µ, σ) (on the left µ = 3, on the right µ = 15; in both examples σ = 3). Note that p corresponds to β = 1 and ps to β < 1.
CONDITION FOR A BETTER RANKING
Assuming νs > ν and a hypothesis of normality in the errors,
K achieves a better ranking with undersampling when

    dps/dp > νs/ν                              (3)

where dps/dp is the derivative of ps w.r.t. p:

    dps/dp = β / (p + β(1 − p))²
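The left-hand side of (3) is easy to inspect numerically. A small sketch showing that dps/dp is largest where p is small, so undersampling stretches the ranking most in the low-probability region and compresses it as p grows:

```python
def dps_dp(p, beta):
    """Derivative of ps w.r.t. p for a sampling rate beta = P(s = 1 | -)."""
    return beta / (p + beta * (1.0 - p)) ** 2
```

For β = 0.1 the derivative equals 1/β = 10 at p = 0 and β = 0.1 at p = 1, so whether (3) holds depends on where the samples lie and on the variance ratio νs/ν.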
UNIVARIATE SYNTHETIC DATASET
(a) Class conditional distributions (thin lines) and the posterior distribution of the minority class (thicker line). (b) dps/dp (solid lines) and νs/ν (dotted lines).

Figure : Non-separable case. On the right we plot both terms of inequality (3) (solid: left-hand term, dotted: right-hand term) for β = 0.1 and β = 0.4.
REAL DATASETS
Figure : Difference between the Kendall rank correlations of p̂s and p̂ with p, namely Ks and K, for points satisfying condition (3) and points not satisfying it. Ks and K are calculated as the mean of the correlations over all values of β.
REAL DATASETS II
Figure : Ratio between the number of samples satisfying condition (3) and all the instances available in each dataset, averaged over all values of β.
DISCUSSION
When (3) is satisfied, the posterior probability obtained
after sampling returns a more accurate ordering.
Several factors influence (3) (e.g. β, the variance of the
classifier, class separability).
Practical use of (3) is not straightforward, since it requires
knowledge of p and νs/ν (not easy to estimate).
The sampling rate should be tuned, using a balanced
distribution is not always the best option.
SELECTING THE BEST STRATEGY
With no prior information about the data distribution, it is
difficult to decide which unbalanced strategy to use.
No-free-lunch theorem [23]: no single strategy is coherently
superior to all others in all conditions (i.e. algorithm,
dataset and performance metric)
Testing all unbalanced techniques is not an option because
of the associated computational cost.
We proposed to use the Racing approach [17] to perform
strategy selection.
RACING FOR STRATEGY SELECTION
Racing consists in testing in parallel a set of alternatives
and using a statistical test to remove an alternative if it is
significantly worse than the others.
We adopted F-Race version [4] to search efficiently for the
best strategy for unbalanced data.
The F-race combines the Friedman test with Hoeffding
Races [17].
RACING FOR UNBALANCED TECHNIQUE SELECTION
Automatically select the most adequate technique for a given
dataset.
1. Test in parallel a set of alternative balancing strategies on a
subset of the dataset
2. Remove progressively the alternatives which are
significantly worse.
3. Iterate the testing and removal steps until only one
candidate is left or no more data is available.
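A heavily simplified sketch of this race (hypothetical code, not the unbalanced package): candidates are scored subset by subset, the Friedman statistic is computed from their per-subset ranks, and when it exceeds the chi-square critical value the worst candidate is dropped. The real F-Race [4] eliminates every candidate that loses the post-hoc pairwise comparison, not just the single worst one:

```python
import numpy as np

# Chi-square critical values at alpha = 0.05 for df = 1..5
CHI2_CRIT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

def friedman_stat(scores):
    """Friedman statistic for an (n_subsets x k_candidates) score matrix
    (assumes no ties within a subset)."""
    n, k = scores.shape
    ranks = scores.argsort(axis=1).argsort(axis=1) + 1  # rank within each subset
    rank_sums = ranks.sum(axis=0)
    return 12.0 / (n * k * (k + 1)) * (rank_sums ** 2).sum() - 3 * n * (k + 1)

def f_race(candidates, evaluate, subsets):
    """Race the candidates over the subsets, dropping the worst candidate
    whenever the Friedman test signals a significant difference."""
    alive = list(candidates)
    history = []                                  # one row of scores per subset
    for subset in subsets:
        history.append([evaluate(c, subset) for c in alive])
        scores = np.array(history)
        k = len(alive)
        if (len(history) >= 3 and k >= 3 and k - 1 in CHI2_CRIT
                and friedman_stat(scores) > CHI2_CRIT[k - 1]):
            worst = int(np.argmin(scores.mean(axis=0)))  # simplified elimination
            alive.pop(worst)
            history = [row[:worst] + row[worst + 1:] for row in history]
        if len(alive) == 1:
            break
    return alive
```

The computational saving comes from never evaluating an eliminated candidate on the remaining subsets.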
           Candidate 1   Candidate 2   Candidate 3
subset 1      0.50          0.47          0.48
subset 2      0.51          0.48          0.30
subset 3      0.51          0.47
subset 4      0.60          0.45
subset 5      0.55
F-RACE METHOD
Use 10-fold Cross Validation (CV) to provide the data
during the race.
Every time new data is added to the race, the Friedman
test is used to remove significantly bad candidates.
We made a comparison of CV and F-race using different
classifiers.
Figure : Illustration of F-Race over the 10 CV subsets: as subsets of the original dataset are added over time, significantly worse candidates are discarded and only the surviving candidates are evaluated on the remaining subsets.
F-RACE VS CROSS VALIDATION
Dataset        Exploration  Method  Ntest  % Gain  Mean   Sd
ecoli          Race         Under    46     49     0.836  0.04
               CV           SMOTE    90      -     0.754  0.112
letter-a       Race         Under    34     62     0.952  0.008
               CV           SMOTE    90      -     0.949  0.01
letter-vowel   Race         Under    34     62     0.884  0.011
               CV           Under    90      -     0.887  0.009
letter         Race         SMOTE    37     59     0.951  0.009
               CV           Under    90      -     0.951  0.01
oil            Race         Under    41     54     0.629  0.074
               CV           SMOTE    90      -     0.597  0.076
page           Race         SMOTE    45     50     0.919  0.01
               CV           SMOTE    90      -     0.92   0.008
pendigits      Race         Under    39     57     0.978  0.011
               CV           Under    90      -     0.981  0.006
PhosS          Race         Under    19     79     0.598  0.01
               CV           Under    90      -     0.608  0.016
satimage       Race         Under    34     62     0.843  0.008
               CV           Under    90      -     0.841  0.011
segment        Race         SMOTE    90      0     0.978  0.01
               CV           SMOTE    90      -     0.978  0.01
estate         Race         Under    27     70     0.553  0.023
               CV           Under    90      -     0.563  0.021
covtype        Race         Under    42     53     0.924  0.007
               CV           SMOTE    90      -     0.921  0.008
cam            Race         Under    34     62     0.68   0.007
               CV           Under    90      -     0.674  0.015
compustat      Race         Under    37     59     0.738  0.021
               CV           Under    90      -     0.745  0.017
creditcard     Race         Under    43     52     0.927  0.008
               CV           SMOTE    90      -     0.924  0.006

Table : Results in terms of G-mean for the RF classifier.
DISCUSSION
The best strategy is extremely dependent on the data
nature, algorithm adopted and performance measure.
F-race is able to automatise the selection of the best
unbalanced strategy for a given unbalanced problem
without exploring the whole dataset.
For the fraud dataset the unbalanced strategy chosen had a
big impact on the accuracy of the results.
CONCLUSIONS OF CHAPTER 4
Undersampling is not always the best strategy for all
unbalanced datasets.
This result warns against a naive use of undersampling
in unbalanced tasks.
With racing [9] we can rapidly select the best strategy for a
given unbalanced task.
However, we see that undersampling and SMOTE are
often the best strategy to adopt for FD.
Methodological
contribution
CHAPTER 5
-
LEARNING WITH CONCEPT DRIFT
Publications:
A. Dal Pozzolo, O. Caelen, Y. Le Borgne, S. Waterschoot, and G. Bontempi.
Learned lessons in credit card fraud detection from a practitioner perspective.
Elsevier ESWA, 2014.
A. Dal Pozzolo, R. Johnson, O. Caelen, S. Waterschoot, N. Chawla, and G. Bontempi.
Using HDDT to avoid instances propagation in unbalanced and evolving data streams.
In IEEE IJCNN, 2014.
OBJECTIVE OF CHAPTER 5
The objectives of this chapter are:
investigating the benefit of updating the FDS.
assessing the impact of Concept Drift (CD).
comparing alternative learning strategies for CD.
studying an effective way to deal with unbalanced
distribution in a streaming environment.
LEARNING APPROACHES
We compare three standard learning approaches on
transactions from February 2012 to May 2013.
Approach               Strengths                       Weaknesses
Static                 • Speed                         • No CD adaptation
Update                 • No instance propagation       • Needs several batches
                       • CD adaptation                   for the minority class
Propagate and Forget   • Accumulates minority          • Propagation leads to
                         instances faster                larger training time
                       • CD adaptation

# Days   # Features   # Transactions   Period
422      45           2'202'228        1 Feb 12 - 20 May 13
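The three policies differ only in which daily batches enter the training set. An illustrative sketch (hypothetical batch layout, each batch an (X, y) pair with y = 1 for frauds):

```python
import numpy as np

def build_training_set(batches, t, approach, window=3):
    """Sketch of the three update policies; batches[i] is the (X, y) pair
    of day i, with y = 1 for frauds, and day t is the latest labelled day."""
    if approach == "static":
        chosen = batches[:window]                        # train once, never refresh
    elif approach == "update":
        chosen = batches[max(0, t - window + 1): t + 1]  # sliding window
    elif approach == "propagate":
        # recent window plus every past fraudulent transaction
        recent = batches[max(0, t - window + 1): t + 1]
        old_frauds = [(X[y == 1], y[y == 1])
                      for X, y in batches[: max(0, t - window + 1)]]
        chosen = recent + old_frauds
    else:
        raise ValueError(approach)
    X = np.concatenate([b[0] for b in chosen])
    y = np.concatenate([b[1] for b in chosen])
    return X, y
```

Propagation rebalances the stream (old frauds accumulate) at the price of a growing training set, which is the trade-off summarised in the table above.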
THE STATIC APPROACH
Figure : The Static approach: a single model Mt is trained on the initial batches and never updated as new batches Bt+1, Bt+2, ... arrive.
THE UPDATE APPROACH
Figure : The Update approach: at each step a model is trained on a sliding window of recent batches and the models Mt−3, ..., Mt are combined into an ensemble Et.
THE PROPAGATE AND FORGET APPROACH
Figure : The Propagate and Forget approach: past fraudulent transactions are propagated into the current training window (genuine ones are forgotten) before training the models combined into the ensemble Et.
BEST OVERALL STRATEGY
The Propagate and Forget approach is the best while the Static
approach is the worst.
Figure : Comparison of all strategies (classifier × balancing technique × update policy) using the sum of ranks over all batches, with Pk as the accuracy measure (the significantly best strategies are highlighted). Static configurations of NNET, SVM and RF obtain the worst ranks; daily-updated RF models, under both the Update and the Propagate and Forget policies, obtain the best.
DISCUSSION
Without CD we would expect the three approaches to
perform similarly.
The static approach is consistently the worst.
Ensembles are often a better alternative than single models.
Higher accuracy can be achieved by using SMOTE or
undersampling the data.
Instance propagation can be an alternative way to
rebalance the data stream.
CONCLUSION OF CHAPTER 5
Updating the models daily and combining them into
ensemble is an effective strategy to adapt to CD.
Retaining positive samples / undersampling the data
stream improves the accuracy of the FDS.
However, it is not always possible to either revisit or store
old instances of a data stream.
In that case we recommend using Hellinger Distance
Decision Trees [7] (HDDT) to avoid instance propagation.
Proposed FDS
CHAPTER 6
-
LEARNING WITH ALERT-FEEDBACK INTERACTION
Publications:
A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi.
Credit Card Fraud Detection with Alert-Feedback Interaction.
Submitted to IEEE TNNLS, 2015.
A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi.
Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information.
In IEEE IJCNN, 2015.
OBJECTIVE OF CHAPTER 6
Propose a framework meeting realistic working conditions.
Model the interaction between the FDS generating alerts
and investigators.
Integrate investigators’ feedback in the FDS.
REALISTIC WORKING CONDITIONS
Alert-Feedback Interaction: the supervised samples
depend on the alerts reported to investigators.
Only a small set of alerts can be checked by human
investigators.
The labels of the majority of transactions are available only
several days later (after customers have reported
unauthorized transactions).
Investigators trust a FDS only if it provides accurate alerts,
i.e. alert precision (Pk) is the accuracy measure that
matters.
SUPERVISED SAMPLES
The k transactions with the largest PKt−1(+|x) define the alerts
At reported to the investigators.
Investigators provide a feedback about the alerts in At,
defining a set of k supervised couples (x, y):

    Ft = {(x, y), x ∈ At}                      (4)

Ft contains the only immediately supervised samples.
Dt−δ contains the transactions that have not been checked by
investigators, but whose labels are assumed to be correct after
δ days have elapsed.
SUPERVISED SAMPLES II
Ft is a small set of risky transactions according to the FDS.
Dt−δ contains all the transactions that occurred in a day (≈ 99%
genuine transactions).
Figure : The supervised samples available at day t include: i) the feedbacks of the first δ days and ii) the delayed couples that occurred before the δth day.
FEEDBACKS VS DELAYED SAMPLES
Learning from feedbacks Ft is a different problem than learning
from delayed samples in Dt−δ:
Ft provides recent, up-to-date information, while Dt−δ
might already be obsolete once it arrives.
Percentage of frauds in Ft and Dt−δ is different.
Samples in Ft are selected as the most risky by Kt−1.
A classifier trained on Ft learns how to label transactions
that are most likely to be fraudulent.
Feedbacks and delayed samples have to be treated separately.
LEARNING STRATEGY
A conventional solution for CD adaptation is the sliding-window classifier Wt [20]. To learn separately from feedbacks and delayed transactions we propose AWt, an aggregation of Ft and WDt.

Figure : Learning strategies for feedbacks and delayed transactions occurring in the two days (M = 2) before the feedbacks (δ = 7).
CLASSIFIER AGGREGATIONS
WDt has to be aggregated with Ft to exploit the information
provided by feedbacks. We combine these classifiers by
averaging the posterior probabilities:

    P_AWt(+|x) = αt · P_Ft(+|x) + (1 − αt) · P_WDt(+|x)        (5)

AWt with αt ≥ 0.5 gives a larger influence to feedbacks on the
probability estimates w.r.t. Wt.
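Equation (5) is a plain convex combination of the two posteriors. A minimal sketch, with the alert list then taken as the k transactions with the largest aggregated score:

```python
import numpy as np

def aggregate_posterior(p_feedback, p_delayed, alpha=0.5):
    """Eq. (5): convex combination of P_F(+|x) and P_WD(+|x)."""
    return alpha * np.asarray(p_feedback) + (1.0 - alpha) * np.asarray(p_delayed)

def alerts(p_feedback, p_delayed, k, alpha=0.5):
    """Indices of the k transactions with the largest aggregated score."""
    p = aggregate_posterior(p_feedback, p_delayed, alpha)
    return np.argsort(p)[::-1][:k]
```

Keeping the two classifiers separate means the feedback model can specialise on the alerted region while the delayed model retains the global picture; αt only weights their votes.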
DATASETS
We used three datasets of e-commerce credit card transactions
(fraud << 1%):
Table : Datasets
Id Start day End day # Days # Transactions # Features
2013 2013-09-05 2014-01-18 136 21’830’330 51
2014 2014-08-05 2014-12-31 148 27’113’187 51
2015 2015-01-05 2015-05-31 146 27’651’197 51
Figure : Number of frauds per day in the 2013, 2014 and 2015 datasets.
SEPARATING FEEDBACKS FROM DELAYED SAMPLES
Aggregating two distinct classifiers (Ft with WDt) using AWt is
better than pooling together feedbacks and delayed samples
with Wt.
dataset classifier mean sd
2013 F 0.62 0.25
2013 WD 0.54 0.22
2013 W 0.57 0.23
2013 AW 0.70 0.21
2014 F 0.59 0.29
2014 WD 0.58 0.26
2014 W 0.60 0.26
2014 AW 0.69 0.24
2015 F 0.67 0.25
2015 WD 0.66 0.21
2015 W 0.68 0.21
2015 AW 0.75 0.20
Table : Average Pk with δ = 7, M = 16 and αt = 0.5.
EXPERIMENTS WITH ARTIFICIAL CD
Figure : Average Pk per day of the sliding window strategies W and AW on the datasets with artificial CD (CD1, CD2 and CD3), using a moving average of 15 days. The vertical bar denotes the date of the CD.
CLASSIFIERS IGNORING AFI
Experiments with a classifier Rt trained on all recent
transactions between t and t − δ. Rt makes the unrealistic
assumption that all transactions are labeled by investigators.
Figure : Rt is trained on all transactions between t and t − δ, under the unrealistic assumption that they have all been checked by investigators.
CLASSIFIERS IGNORING AFI II
AUC is a global ranking measure, while Pk measures the
ranking quality only on the top k.
(a) dataset 2013
metric classifier mean sd
AUC F 0.83 0.06
AUC WD 0.94 0.03
AUC R 0.96 0.01
AUC AW 0.94 0.03
Pk F 0.67 0.24
Pk WD 0.54 0.22
Pk R 0.60 0.22
Pk AW 0.72 0.20
(b) dataset 2014
metric classifier mean sd
AUC F 0.81 0.08
AUC WD 0.94 0.03
AUC R 0.96 0.02
AUC AW 0.93 0.03
Pk F 0.67 0.27
Pk WD 0.56 0.27
Pk R 0.63 0.25
Pk AW 0.71 0.25
(c) dataset 2015
metric classifier mean sd
AUC F 0.81 0.07
AUC WD 0.95 0.02
AUC R 0.97 0.01
AUC AW 0.94 0.02
Pk F 0.71 0.21
Pk WD 0.62 0.21
Pk R 0.68 0.20
Pk AW 0.76 0.19
Table : Average AUC and Pk for the sliding approach (δ = 15, M = 16
and αt = 0.5).
ADAPTIVE WEIGHTING
Experiments with different weighting strategies in AWt.
(a) dataset 2013
α adaptation mean sd
probDif 0.70 0.22
acc 0.72 0.21
accMiss 0.72 0.21
precision 0.72 0.21
recall 0.71 0.21
auc 0.72 0.21
none(αt = 0.5) 0.72 0.20
(b) dataset 2014
α adaptation mean sd
probDif 0.68 0.26
acc 0.71 0.24
accMiss 0.70 0.24
precision 0.71 0.24
recall 0.70 0.25
auc 0.71 0.24
none(αt = 0.5) 0.71 0.24
(c) dataset 2015
α adaptation mean sd
probDif 0.73 0.21
acc 0.76 0.19
accMiss 0.75 0.19
precision 0.75 0.19
recall 0.74 0.20
auc 0.75 0.19
none(αt = 0.5) 0.76 0.19
Table : Average Pk of AWt with adaptive αt (δ = 15).
ADAPTIVE WEIGHTING II
Figure : PFt(+|x) vs. PWDt(+|x) for the alerted and non-alerted transactions of both classes in different days (for the day shown, Pk = 0.64 and α = 0.464).
CONCLUSION OF CHAPTER 6
The goal of the FDS is to return precise alerts (high Pk).
In a realistic setting, feedbacks and delayed samples are
the only supervised information available.
Feedbacks and delayed samples address two different
classification tasks.
It is not convenient to pool the two types of supervised
samples together.
CONCLUSIONS AND
FUTURE PERSPECTIVES
SUMMARY OF CONTRIBUTIONS
In this thesis we:
formalize realistic working conditions of a FDS.
study how undersampling can improve the ranking of a
classifier K.
release an R package called unbalanced [8] with racing.
investigate strategies for data streams with both CD and
unbalanced distribution.
prototype a real-world FDS that, for the first time, learns
from investigators’ feedback and allows us to have more
precise alerts.
LEARNED LESSONS
Rebalancing a training set with undersampling is not
guaranteed to improve performance.
F-Race [4] can be effective to rapidly select the unbalanced
technique to use.
The data stream defined by credit card transactions is
severely affected by CD, updating the FDS is often a good
idea.
For a real-world FDS it is mandatory to produce precise
alerts.
Feedbacks and delayed samples should be separately
handled when training a FDS.
Aggregating two distinct classifiers is an effective strategy
that enables quicker adaptation to CD.
TAKE-HOME MESSAGES
Class imbalance and CD seriously affect the performance
of a classifier.
In a real-world scenario, there is a strong Alert-Feedback
interaction that has to be explicitly considered.
The best learning algorithm for a FDS changes with the
accuracy measure.
A standard classifier that is trained on all recent
transactions is not the best when we want to produce
accurate alerts.
60/ 68
FUTURE WORK: TOWARDS BIG DATA SOLUTIONS
This work will be continued in the BruFence project.
Big Data Mining for Fraud Detection and Security.
Three research groups (ULB-QualSec, ULB-MLG and
UCL-MLG) and three companies (Worldline, Steria and
Nviso).
2015 - 2018 funded by InnovIRIS (Brussels Region).
Figure: the BruFence consortium: ULB-QualSec, ULB-MLG, UCL-MLG, Worldline, Steria and Nviso.
61/ 68
THE BruFence PROJECT
Goal: implement the FDS of Chapter 6 using Big Data
technologies (e.g. Spark, NoSQL DB).
Real-time framework scalable w.r.t. the amount of data
and resources available.
Determine to which extent external network data can
improve prediction accuracy.
Compute aggregated features in real time and using a
larger set of historical transactions.
Figure: Apache Big Data software.
62/ 68
ADDED VALUE FOR WL
RF has emerged as the best algorithm (now used by WL).
New ways to create aggregated features.
Undersampling can be used to improve the accuracy of
FDS, but probabilities need to be calibrated.
Our experimental analysis on CD gave WL some
guidelines on how to improve the FDS.
Including feedbacks in the learning process allows WL to
obtain more accurate alerts.
63/ 68
CONCLUDING REMARKS
Doctiris was a unique opportunity to work on real-world
fraud detection data.
Real working conditions define new challenges (often
ignored in the literature).
Companies are more interested in practical results, while
academics focus on theoretical ones.
Industry and academia should look at each other and
exchange ideas in order to have a much larger impact on
society.
64/ 68
Website: www.ulb.ac.be/di/map/adalpozz
Email: adalpozz@ulb.ac.be
Thank you for your attention
65/ 68
BIBLIOGRAPHY I
[1] A. Asuncion and D. Newman.
UCI machine learning repository, 2007.
[2] C. Alippi, G. Boracchi, and M. Roveri.
A just-in-time adaptive classification system based on the intersection of confidence intervals rule.
Neural Networks, 24(8):791–800, 2011.
[3] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer.
Moa: Massive online analysis.
The Journal of Machine Learning Research, 99:1601–1604, 2010.
[4] M. Birattari, T. Stützle, L. Paquete, and K. Varrentrapp.
A racing algorithm for configuring metaheuristics.
In Proceedings of the genetic and evolutionary computation conference, pages 11–18, 2002.
[5] D. A. Cieslak and N. V. Chawla.
Detecting fractures in classifier performance.
In Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on, pages 123–132. IEEE, 2007.
[6] D. A. Cieslak and N. V. Chawla.
Learning decision trees for unbalanced data.
In Machine Learning and Knowledge Discovery in Databases, pages 241–256. Springer, 2008.
[7] D. A. Cieslak, T. R. Hoens, N. V. Chawla, and W. P. Kegelmeyer.
Hellinger distance decision trees are robust and skew-insensitive.
Data Mining and Knowledge Discovery, 24(1):136–158, 2012.
[8] A. Dal Pozzolo, O. Caelen, and G. Bontempi.
unbalanced: Racing For Unbalanced Methods Selection., 2015.
R package version 2.0.
65/ 68
BIBLIOGRAPHY II
[9] A. Dal Pozzolo, O. Caelen, S. Waterschoot, and G. Bontempi.
Racing for unbalanced methods selection.
In Proceedings of the 14th International Conference on Intelligent Data Engineering and Automated Learning. IDEAL,
2013.
[10] C. Elkan.
The foundations of cost-sensitive learning.
In International Joint Conference on Artificial Intelligence, volume 17, pages 973–978. Citeseer, 2001.
[11] R. Elwell and R. Polikar.
Incremental learning of concept drift in nonstationary environments.
Neural Networks, IEEE Transactions on, 22(10):1517–1531, 2011.
[12] J. Gao, B. Ding, W. Fan, J. Han, and P. S. Yu.
Classifying data streams with skewed class distributions and concept drifts.
Internet Computing, 12(6):37–49, 2008.
[13] T. R. Hoens, R. Polikar, and N. V. Chawla.
Learning from streaming data with concept drift and imbalance: an overview.
Progress in Artificial Intelligence, 1(1):89–101, 2012.
[14] M. G. Kelly, D. J. Hand, and N. M. Adams.
The impact of changing populations on classifier performance.
In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pages
367–371. ACM, 1999.
[15] A. Liaw and M. Wiener.
Classification and regression by randomForest.
R News, 2(3):18–22, 2002.
66/ 68
BIBLIOGRAPHY III
[16] R. N. Lichtenwalter and N. V. Chawla.
Adaptive methods for classification in arbitrarily imbalanced and drifting data streams.
In New Frontiers in Applied Data Mining, pages 53–75. Springer, 2010.
[17] O. Maron and A. Moore.
Hoeffding races: Accelerating model selection search for classification and function approximation.
Robotics Institute, page 263, 1993.
[18] C. R. Rao.
A review of canonical coordinates and an alternative to correspondence analysis using hellinger distance.
Qüestiió: Quaderns d'Estadística, Sistemes, Informàtica i Investigació Operativa, 19(1):23–63, 1995.
[19] M. Saerens, P. Latinne, and C. Decaestecker.
Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure.
Neural computation, 14(1):21–41, 2002.
[20] D. K. Tasoulis, N. M. Adams, and D. J. Hand.
Unsupervised clustering in streaming data.
In ICDM Workshops, pages 638–642, 2006.
[21] V. N. Vapnik.
Statistical learning theory, volume 1.
Wiley New York, 1998.
[22] S. Wang, K. Tang, and X. Yao.
Diversity exploration and negative correlation learning on imbalanced data sets.
In Neural Networks, 2009. IJCNN 2009. International Joint Conference on, pages 3259–3266. IEEE, 2009.
[23] D. H. Wolpert.
The lack of a priori distinctions between learning algorithms.
Neural computation, 8(7):1341–1390, 1996.
67/ 68
OPEN ISSUES
Agreeing on a good performance measure (cost-based vs.
detection-based).
Modelling Alert-Feedback Interaction.
Using both the supervised and unsupervised information
available.
68/ 68
AVERAGE PRECISION
Let N+ be the number of positive (fraud) cases in the original
dataset and TPk the number of true positives in the first k
ranked transactions (TPk ≤ k). Let us denote precision and
recall at k as Pk = TPk / k and Rk = TPk / N+. We can then define AP as:

AP = Σ_{k=1}^{N} Pk (Rk − Rk−1)   (6)

where N is the total number of observations in the dataset.
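Eq. (6) can be sketched in a few lines of Python; the function name and the toy data below are illustrative, not part of the thesis code:

```python
def average_precision(y_true, scores):
    """AP as in Eq. (6): the sum over k of P_k * (R_k - R_{k-1})."""
    ranked = [y for _, y in sorted(zip(scores, y_true), key=lambda t: -t[0])]
    n_pos = sum(ranked)                  # N+
    ap, tp, r_prev = 0.0, 0, 0.0
    for k, y in enumerate(ranked, start=1):
        tp += y                          # TP_k
        p_k = tp / k                     # precision at k
        r_k = tp / n_pos                 # recall at k
        ap += p_k * (r_k - r_prev)
        r_prev = r_k
    return ap

# frauds ranked 1st and 3rd out of 4 transactions
print(round(average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1]), 3))  # 0.833
```

Only the ranks where recall increases (a fraud is found) contribute, so AP rewards rankings that place frauds near the top.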
68/ 68
RANKING ERROR
We have a wrong ranking if p̂1 > p̂2 (while p1 < p2); its probability is:

P(p̂2 < p̂1) = P(p2 + ε2 < p1 + ε1) = P(ε1 − ε2 > Δp)

where ε1 − ε2 ∼ N(0, 2ν). Under the hypothesis of
normality we have:

P(ε1 − ε2 > Δp) = 1 − Φ( Δp / √(2ν) )   (7)

where Φ is the cumulative distribution function of the standard
normal distribution.
The probability of a ranking error with undersampling is:

P(p̂s,2 < p̂s,1) = P(η1 − η2 > Δps) = 1 − Φ( Δps / √(2νs) )   (8)
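A quick Monte Carlo check of Eq. (7); a sketch under the stated normality assumption, with arbitrary values of p1, p2 and ν:

```python
import math
import random

def ranking_error_prob(delta_p, nu):
    """Eq. (7): P(eps1 - eps2 > delta_p) with eps_i ~ N(0, nu),
    so eps1 - eps2 ~ N(0, 2*nu)."""
    z = delta_p / math.sqrt(2 * nu)
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2)))  # 1 - Phi(z)

random.seed(0)
p1, p2, nu = 0.3, 0.5, 0.01
trials = 200_000
errors = sum(
    p1 + random.gauss(0, math.sqrt(nu)) > p2 + random.gauss(0, math.sqrt(nu))
    for _ in range(trials)
)
# the analytic value is about 0.0786; the simulation should agree closely
print(ranking_error_prob(p2 - p1, nu), errors / trials)
```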
68/ 68
CLASSIFICATION THRESHOLD
Let r+ and r− be the risk of predicting an instance as positive
and negative:
r+ = (1 − p) · l1,0 + p · l1,1
r− = (1 − p) · l0,0 + p · l0,1

where li,j is the cost of predicting class i when the true class is j, and
p = P(y = +|x). A sample is predicted as positive if
r+ ≤ r− [21].
Alternatively, predict as positive when p > τ, with:

τ = (l1,0 − l0,0) / (l1,0 − l0,0 + l0,1 − l1,1)   (9)
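Eq. (9) in code; a minimal sketch, with cost values made up for illustration:

```python
def decision_threshold(l10, l00, l01, l11):
    """Eq. (9): predict positive when p = P(y=+|x) exceeds tau."""
    return (l10 - l00) / (l10 - l00 + l01 - l11)

# correct decisions cost nothing, a FP costs 1 and a FN costs 9:
# alert whenever p > 0.1
print(decision_threshold(l10=1, l00=0, l01=9, l11=0))  # 0.1
```

With symmetric costs the threshold falls back to the familiar 0.5.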
68/ 68
CONDITION FOR A BETTER RANKING
K has a better ranking with undersampling w.r.t. a classifier
learned with the unbalanced distribution when

P(p̂2 < p̂1) > P(p̂s,2 < p̂s,1)
P(ε1 − ε2 > Δp) > P(η1 − η2 > Δps)   (10)

or equivalently, from (7) and (8), when

1 − Φ( Δp / √(2ν) ) > 1 − Φ( Δps / √(2νs) ) ⇔ Φ( Δp / √(2ν) ) < Φ( Δps / √(2νs) )

Since Φ is monotone non-decreasing and νs > ν, it follows
that undersampling is useful (better ranking) when

Δps / Δp > √( νs / ν )
68/ 68
CORRECTING POSTERIOR PROBABILITIES
We propose to use p′, the bias-corrected probability obtained
from ps:

p′ = β ps / (β ps − ps + 1)   (11)

Eq. (11) is a special case of the framework proposed by Saerens
et al. [19] and Elkan [10] for correcting the posterior probability
in the case of testing and training sets sharing the same priors.
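Eq. (11) as a one-line transformation; a sketch where β is the fraction of negatives kept by undersampling and the values are illustrative:

```python
def correct_posterior(p_s, beta):
    """Eq. (11): bias-corrected probability p' from the undersampled
    estimate p_s."""
    return beta * p_s / (beta * p_s - p_s + 1.0)

# keeping 10% of the negatives (beta = 0.1) inflates p_s; the correction
# shrinks it back while preserving the ranking
print([round(correct_posterior(p, 0.1), 3) for p in (0.2, 0.5, 0.9)])
# [0.024, 0.091, 0.474]
```

The correction is monotone in p_s, which is why ranking metrics such as AUC are unaffected.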
68/ 68
CORRECTING THE CLASSIFICATION THRESHOLD
When the costs of a FN (l0,1) and a FP (l1,0) are unknown, we can
use the priors. Let l1,0 = π+ and l0,1 = π−; from (9) we get:

τ = l1,0 / (l1,0 + l0,1) = π+ / (π+ + π−) = π+   (12)

since π+ + π− = 1. Then we should use π+ as threshold with p:

p −→ τ = π+

Similarly, with the undersampled probability:

ps −→ τs = πs+

From Elkan [10]:

( τ / (1 − τ) ) · ( (1 − τs) / τs ) = β   (13)

Therefore, we obtain:

p′ −→ τ′ = π+
68/ 68
CREDIT CARDS DATASET
Real-world credit card dataset with transactions from Sep 2013,
frauds account for 0.172% of all transactions.
Figure: AUC (top panel, higher is better) and Brier Score (BS, bottom panel) as a function of beta for LB, RF and SVM on the Credit-card dataset, comparing the probabilities p, p′ and ps.
68/ 68
DISCUSSION
As a result of undersampling, p̂s is shifted away from p̂.
Using (11), we can remove the drift in p̂s and obtain p̂′,
which has better calibration.
p̂′ provides the same ranking quality as p̂s.
Using p̂′ with τ′ we are able to improve calibration without
losing predictive accuracy.
Unfortunately, there is no way yet to select the best β
automatically.
68/ 68
HELLINGER DISTANCE
Quantifies the similarity between two probability
distributions [18].
Recently proposed as a splitting criterion in decision trees
for unbalanced problems [6, 7].
In data streams, it has been used to detect classifier
performance degradation due to concept drift [5, 16].
68/ 68
HELLINGER DISTANCE II
Consider two probability measures P1, P2 of discrete
distributions over Φ. The Hellinger distance is defined as:

HD(P1, P2) = √( Σ_{φ∈Φ} ( √P1(φ) − √P2(φ) )² )   (14)

The Hellinger distance has several properties:
HD(P1, P2) = HD(P2, P1) (symmetric)
HD(P1, P2) ≥ 0 (non-negative)
HD(P1, P2) ∈ [0, √2]

HD is close to zero when the distributions are similar and close
to √2 for distinct distributions.
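Eq. (14) in code; a sketch for two discrete distributions given over a common support:

```python
import math

def hellinger(p1, p2):
    """Eq. (14): Hellinger distance; lies in [0, sqrt(2)]."""
    return math.sqrt(sum((math.sqrt(a) - math.sqrt(b)) ** 2
                         for a, b in zip(p1, p2)))

print(hellinger([0.5, 0.5], [0.5, 0.5]))            # 0.0, identical
print(round(hellinger([1.0, 0.0], [0.0, 1.0]), 3))  # 1.414, disjoint (sqrt(2))
```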
68/ 68
HELLINGER DISTANCE DECISION TREES
HDDT [6] uses the Hellinger distance as splitting criterion in
decision trees for unbalanced problems.
All numerical features are partitioned into q bins (HDDT
handles only categorical features).
The Hellinger distance between the positive and negative
class is computed for each feature, and the feature
with the maximum distance is used to split the tree.

HD(f+, f−) = √( Σ_{j=1}^{q} ( √( |f+_j| / |f+| ) − √( |f−_j| / |f−| ) )² )   (15)

where j defines the jth bin of feature f and |f+_j| is the number of
positive instances of feature f in the jth bin.
Note that the class priors do not appear explicitly in (15).
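The per-feature score of Eq. (15) can be sketched as follows (toy categorical data; a full HDDT would apply this recursively at every split):

```python
import math
from collections import Counter

def hd_feature(values, labels):
    """Eq. (15): Hellinger distance between the positive- and
    negative-class distributions of one categorical (binned) feature."""
    pos = Counter(v for v, y in zip(values, labels) if y == 1)
    neg = Counter(v for v, y in zip(values, labels) if y == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    return math.sqrt(sum(
        (math.sqrt(pos[b] / n_pos) - math.sqrt(neg[b] / n_neg)) ** 2
        for b in set(pos) | set(neg)))

labels = [1, 1, 0, 0, 0, 0]
feat_a = ["x", "x", "y", "y", "y", "y"]  # perfectly aligned with the class
feat_b = ["x", "y", "x", "y", "x", "y"]  # uninformative
print(hd_feature(feat_a, labels) > hd_feature(feat_b, labels))  # True
```

Unlike information gain, the score never multiplies by the class priors, which is what makes the criterion skew-insensitive.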
68/ 68
AVOIDING INSTANCES PROPAGATION
HDDT allows us to avoid instance propagation between
batches, with several benefits:
improved predictive accuracy
improved speed
single-pass through the data.
An ensemble of HDDTs is used to adapt to CD and
increase the accuracy of single classifiers.
We test our framework on several streaming datasets with
unbalanced classes and concept drift.
68/ 68
DATASETS
We used different types of datasets.
UCI datasets [1] (unbalanced, no CD).
Synthetic datasets from MOA [3] (artificial CD).
A credit card dataset (online payment transactions
between the 5th and the 25th of Sept. 2013).
Name Source Instances Features Imbalance Ratio
Adult UCI 48,842 14 3.2:1
can UCI 443,872 9 52.1:1
compustat UCI 13,657 20 27.5:1
covtype UCI 38,500 10 13.0:1
football UCI 4,288 13 1.7:1
ozone-8h UCI 2,534 72 14.8:1
wrds UCI 99,200 41 1.0:1
text UCI 11,162 11465 14.7:1
DriftedLED MOA 1,000,000 25 9.0:1
DriftedRBF MOA 1,000,000 11 1.02:1
DriftedWave MOA 1,000,000 41 2.02:1
Creditcard FRAUD 3,143,423 36 658.8:1
68/ 68
RESULTS CREDIT CARD DATASET
Figure 18 shows the sum of the ranks for each strategy over all
the batches. For each batch, we assign the highest rank to the
most accurate strategy and then sum the ranks over all batches.
Figure: Batch average results in terms of AUROC (higher is better) using different sampling strategies (BD, BL, SE, UNDER) and batch-ensemble weighting methods with C4.5 and HDDT over the Credit card dataset.

Figure: Sum of ranks in all chunks for the Credit card dataset in terms of AUROC. In gray are the strategies that are not significantly worse than the best, i.e. the one with the highest sum of ranks.
68/ 68
HELLINGER DISTANCE BETWEEN BATCHES
Let Bt be the batch at time t used for training Kt and Bt+1 the
testing batch. We compute the distance between Bt and Bt+1 for
a given feature f using Hellinger distance as [16]:
HD(B^f_t, B^f_{t+1}) = √( Σ_{v∈f} ( √( |B^{f=v}_t| / |Bt| ) − √( |B^{f=v}_{t+1}| / |Bt+1| ) )² )   (16)

where |B^{f=v}_t| is the number of instances with feature f taking value
v in the batch at time t, while |Bt| is the total number of
instances in the same batch.
68/ 68
HELLINGER DISTANCE AND INFORMATION GAIN
We can make the assumption that feature relevance remains
stable over time and define a new distance function that
combines IG and HD as:
HDIG(Bt, Bt+1, f) = HD(B^f_t, B^f_{t+1}) · (1 + IG(Bt, f))   (17)

The final distance between two batches is then computed
taking the average over all the features:

AHDIG(Bt, Bt+1) = ( Σ_{f∈Bt} HDIG(Bt, Bt+1, f) ) / |f ∈ Bt|   (18)

Then Kt receives as weight wt = AHDIG(Bt, Bt+1)^{−M}, where M
is the ensemble size.
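Eqs. (16)-(18) sketched on toy batches of categorical features; the batches and the precomputed IG values below are illustrative:

```python
import math
from collections import Counter

def hd(counts_a, counts_b):
    """Eq. (16): Hellinger distance between one feature's value
    distributions in two batches (counts_* map value -> count)."""
    na, nb = sum(counts_a.values()), sum(counts_b.values())
    return math.sqrt(sum(
        (math.sqrt(counts_a[v] / na) - math.sqrt(counts_b[v] / nb)) ** 2
        for v in set(counts_a) | set(counts_b)))

def ahdig(batch_t, batch_t1, info_gain):
    """Eqs. (17)-(18): average over features of HD * (1 + IG)."""
    feats = list(batch_t)
    return sum(hd(Counter(batch_t[f]), Counter(batch_t1[f])) * (1 + info_gain[f])
               for f in feats) / len(feats)

batch_t = {"country": ["BE", "BE", "FR"], "night": [0, 0, 1]}
batch_t1 = {"country": ["BE", "FR", "FR"], "night": [0, 0, 1]}
ig = {"country": 0.3, "night": 0.0}  # assumed precomputed IG(B_t, f)
print(round(ahdig(batch_t, batch_t1, ig), 3))  # small drift on "country" only
```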
68/ 68
ADAPTIVE WEIGHTING II
Let F^{AW}_t be the feedbacks requested by A^W_t and F^{F}_t the
feedbacks requested by Ft. Let Y+_t (resp. Y−_t) be the fraudulent
(resp. genuine) transactions at day t (Bt). We define the
following sets:

CORR_Ft = F^{AW}_t ∩ F^{F}_t ∩ Y+_t
ERR_Ft = F^{AW}_t ∩ F^{F}_t ∩ Y−_t
MISS_Ft = F^{AW}_t ∩ (Bt \ F^{F}_t) ∩ Y+_t
68/ 68
ADAPTIVE WEIGHTING II
In the example below we have |CORR_Ft| = 4, |ERR_Ft| = 2 and
|MISS_Ft| = 3.

Figure: Feedbacks requested by A^W_t (F^{AW}_t) are a subset of all the
transactions of day t (Bt). F^{F}_t (resp. F^{WD}_t) denotes the feedbacks
requested by Ft (resp. W^D_t). The symbol + is used for frauds and − for
genuine transactions.
68/ 68
ADAPTIVE WEIGHTING III
We can now compute some accuracy measures of Ft on the
feedbacks requested by A^W_t:

acc_Ft = 1 − |ERR_Ft| / k
accMiss_Ft = 1 − (|ERR_Ft| + |MISS_Ft|) / k
precision_Ft = |CORR_Ft| / k
recall_Ft = |CORR_Ft| / |F^{AW}_t ∩ Y+_t|
auc_Ft = probability that fraudulent feedbacks rank higher
than genuine feedbacks in F^{AW}_t according to P_Ft.
probDif_Ft = difference between the mean of P_Ft calculated
on the frauds and the mean on the genuine transactions in F^{AW}_t.

If we choose auc_Ft as the metric for Ft and similarly auc_{WD,t} for W^D_t,
then αt+1 is:

αt+1 = auc_Ft / ( auc_Ft + auc_{WD,t} )   (19)
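Eq. (19) as code; the AUC values below are illustrative:

```python
def adaptive_alpha(auc_feedback, auc_delayed):
    """Eq. (19): weight given to the feedback classifier F_t for day t+1."""
    return auc_feedback / (auc_feedback + auc_delayed)

# the classifier that currently ranks feedbacks better gets the larger weight
print(adaptive_alpha(0.9, 0.6))
```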
68/ 68
SELECTION BIAS AND AFI
At day t, AFI provides Ft, which is not representative of Bt.
∀(x, y) ∈ Bt let s ∈ {0, 1}, where s = 1 if (x, y) ∈ Ft and
s = 0 otherwise (Ft ⊂ Bt).
With Ft we estimate P_Kt(+|x, s = 1) rather than P_Kt(+|x).
A standard solution to SSB consists in training a
weight-sensitive algorithm, where (x, y) has weight w:

w = P(s = 1) / P(s = 1|x, y)   (20)

Feedbacks Ft are selected according to P(+|x), hence P(s = 1|x) and
P(+|x) are highly correlated.
We expect P(s = 1|x, y) to be larger for feedbacks, leading
to smaller weights in (20).
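Eq. (20) in code; a sketch where the probabilities would come from a model of the selection mechanism, and the numbers here are made up:

```python
def ssb_weight(p_selected, p_selected_given_xy):
    """Eq. (20): importance weight correcting sample selection bias."""
    return p_selected / p_selected_given_xy

# feedbacks are alerted transactions, hence far more likely to be selected
# than a random transaction: they get down-weighted
print(ssb_weight(0.01, 0.8))   # a requested feedback counts much less
print(ssb_weight(0.01, 0.01))  # 1.0: selected at the base rate
```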
68/ 68
INCREASING δ AND SSB CORRECTION
Table: Average Pk when δ = 15.
(a) sliding
dataset classifier mean sd
2013 F 0.67 0.24
2013 WD 0.54 0.22
2013 AW 0.72 0.20
2014 F 0.67 0.27
2014 WD 0.56 0.27
2014 AW 0.71 0.25
2015 F 0.71 0.21
2015 WD 0.62 0.21
2015 AW 0.76 0.19
(b) ensemble
dataset classifier mean sd
2013 F 0.67 0.23
2013 ED 0.48 0.23
2013 AE 0.72 0.21
2014 F 0.67 0.28
2014 ED 0.49 0.27
2014 AE 0.70 0.25
2015 F 0.71 0.22
2015 ED 0.56 0.18
2015 AE 0.75 0.20
Table: Average Pk of Ft with methods for SSB correction (δ = 15).
dataset SSB correction mean sd
2013 SJA 0.66 0.24
2013 weighting 0.64 0.25
2014 SJA 0.66 0.27
2014 weighting 0.65 0.28
2015 SJA 0.71 0.22
2015 weighting 0.68 0.23
68/ 68
TRAINING SET SIZE AND TIME
(a) Training set size (b) Training time
Figure: Training set size and time (in seconds) to train a RF for the
feedback (Ft) and delayed classifiers (W^D_t and E^D_t) in the 2013 dataset.
68/ 68
FEATURE IMPORTANCE
Figure x-axis: feature importance (0 to 60). Features shown include TX_ACCEPTED, AMOUNT, AGE, TX_HOUR, SUM_AMT_HIS, RISK_TERM_COUNTRY, RISK_LAST_MIDUID_HIS and RISK_TERM_MIDUID, among others.
RF model day 20130906
Figure: Average feature importance, measured by the mean decrease
in accuracy calculated with the Gini index, in the RF models of W^D_t
on the 2013 dataset. The randomForest package [15] available in R uses
the Gini index as splitting criterion.
68/ 68

Contenu connexe

Tendances

Credit card fraud detection methods using Data-mining.pptx (2)
Credit card fraud detection methods using Data-mining.pptx (2)Credit card fraud detection methods using Data-mining.pptx (2)
Credit card fraud detection methods using Data-mining.pptx (2)k.surya kumar
 
Credit Card Fraud Detection
Credit Card Fraud DetectionCredit Card Fraud Detection
Credit Card Fraud DetectionBinayakreddy
 
Fraud detection ML
Fraud detection MLFraud detection ML
Fraud detection MLMaatougSelim
 
Fraud detection with Machine Learning
Fraud detection with Machine LearningFraud detection with Machine Learning
Fraud detection with Machine LearningScaleway
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning Mohammad Junaid Khan
 
Credit Card Fraud Detection Using ML In Databricks
Credit Card Fraud Detection Using ML In DatabricksCredit Card Fraud Detection Using ML In Databricks
Credit Card Fraud Detection Using ML In DatabricksDatabricks
 
Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ...
Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ...Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ...
Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ...Melissa Moody
 
Credit card fraud detection
Credit card fraud detectionCredit card fraud detection
Credit card fraud detectionkalpesh1908
 
Comparative study of various approaches for transaction Fraud Detection using...
Comparative study of various approaches for transaction Fraud Detection using...Comparative study of various approaches for transaction Fraud Detection using...
Comparative study of various approaches for transaction Fraud Detection using...Pratibha Singh
 
Detecting fraud with Python and machine learning
Detecting fraud with Python and machine learningDetecting fraud with Python and machine learning
Detecting fraud with Python and machine learningwgyn
 
Credit card fraud detection pptx (1) (1)
Credit card fraud detection pptx (1) (1)Credit card fraud detection pptx (1) (1)
Credit card fraud detection pptx (1) (1)ajmal anbu
 
Credit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
Credit Card Fraud Detection Using Unsupervised Machine Learning AlgorithmsCredit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
Credit Card Fraud Detection Using Unsupervised Machine Learning AlgorithmsHariteja Bodepudi
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud DetectionNitesh Kumar
 
How to identify credit card fraud
How to identify credit card fraudHow to identify credit card fraud
How to identify credit card fraudHenley Walls
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learningleopauly
 
Credit Card Fraud Detection
Credit Card Fraud DetectionCredit Card Fraud Detection
Credit Card Fraud Detectionijtsrd
 
Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods Marina Santini
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Marina Santini
 

Tendances (20)

Credit card fraud detection methods using Data-mining.pptx (2)
Credit card fraud detection methods using Data-mining.pptx (2)Credit card fraud detection methods using Data-mining.pptx (2)
Credit card fraud detection methods using Data-mining.pptx (2)
 
Credit Card Fraud Detection
Credit Card Fraud DetectionCredit Card Fraud Detection
Credit Card Fraud Detection
 
Fraud detection
Fraud detectionFraud detection
Fraud detection
 
Fraud detection ML
Fraud detection MLFraud detection ML
Fraud detection ML
 
Fraud detection with Machine Learning
Fraud detection with Machine LearningFraud detection with Machine Learning
Fraud detection with Machine Learning
 
CREDIT_CARD.ppt
CREDIT_CARD.pptCREDIT_CARD.ppt
CREDIT_CARD.ppt
 
Decision trees in Machine Learning
Decision trees in Machine Learning Decision trees in Machine Learning
Decision trees in Machine Learning
 
Credit Card Fraud Detection Using ML In Databricks
Credit Card Fraud Detection Using ML In DatabricksCredit Card Fraud Detection Using ML In Databricks
Credit Card Fraud Detection Using ML In Databricks
 
Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ...
Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ...Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ...
Improving Credit Card Fraud Detection: Using Machine Learning to Profile and ...
 
Credit card fraud detection
Credit card fraud detectionCredit card fraud detection
Credit card fraud detection
 
Comparative study of various approaches for transaction Fraud Detection using...
Comparative study of various approaches for transaction Fraud Detection using...Comparative study of various approaches for transaction Fraud Detection using...
Comparative study of various approaches for transaction Fraud Detection using...
 
Detecting fraud with Python and machine learning
Detecting fraud with Python and machine learningDetecting fraud with Python and machine learning
Detecting fraud with Python and machine learning
 
Credit card fraud detection pptx (1) (1)
Credit card fraud detection pptx (1) (1)Credit card fraud detection pptx (1) (1)
Credit card fraud detection pptx (1) (1)
 
Credit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
Credit Card Fraud Detection Using Unsupervised Machine Learning AlgorithmsCredit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
Credit Card Fraud Detection Using Unsupervised Machine Learning Algorithms
 
Machine Learning for Fraud Detection
Machine Learning for Fraud DetectionMachine Learning for Fraud Detection
Machine Learning for Fraud Detection
 
How to identify credit card fraud
How to identify credit card fraudHow to identify credit card fraud
How to identify credit card fraud
 
Introduction to Deep learning
Introduction to Deep learningIntroduction to Deep learning
Introduction to Deep learning
 
Credit Card Fraud Detection
Credit Card Fraud DetectionCredit Card Fraud Detection
Credit Card Fraud Detection
 
Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods Lecture 6: Ensemble Methods
Lecture 6: Ensemble Methods
 
Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?Lecture 1: What is Machine Learning?
Lecture 1: What is Machine Learning?
 

En vedette

Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Andrea Dal Pozzolo
 
PayPal's Fraud Detection with Deep Learning in H2O World 2014
PayPal's Fraud Detection with Deep Learning in H2O World 2014PayPal's Fraud Detection with Deep Learning in H2O World 2014
PayPal's Fraud Detection with Deep Learning in H2O World 2014Sri Ambati
 
Credit Card Fraud Detection Client Presentation
Credit Card Fraud Detection Client PresentationCredit Card Fraud Detection Client Presentation
Credit Card Fraud Detection Client PresentationAyapparaj SKS
 
Example-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionExample-Dependent Cost-Sensitive Credit Card Fraud Detection
Example-Dependent Cost-Sensitive Credit Card Fraud DetectionAlejandro Correa Bahnsen, PhD
 
Credit card fraud detection
Credit card fraud detectionCredit card fraud detection
Credit card fraud detectionanthonytaylor01
 
Analysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detectionAnalysis of-credit-card-fault-detection
Analysis of-credit-card-fault-detectionJustluk Luk
 
Fraud Detection presentation
Fraud Detection presentationFraud Detection presentation
Fraud Detection presentationHernan Huwyler
 
Presentation on fraud prevention, detection & control
Presentation on fraud prevention, detection & controlPresentation on fraud prevention, detection & control
Presentation on fraud prevention, detection & controlDominic Sroda Korkoryi
 
Graph Database in Graph Intelligence
Graph Database in Graph IntelligenceGraph Database in Graph Intelligence
Graph Database in Graph IntelligenceChen Zhang
 
Aq workshop l01 - way of a muslim
Aq workshop   l01 - way of a muslimAq workshop   l01 - way of a muslim
Aq workshop l01 - way of a muslimexploringislam
 
Taqleed following blindly
Taqleed   following blindlyTaqleed   following blindly
Taqleed following blindlyexploringislam
 
Valve Handbook For New Employees
Valve Handbook For New EmployeesValve Handbook For New Employees
Valve Handbook For New EmployeesGauin
 
互邀新平台产品宣介
互邀新平台产品宣介互邀新平台产品宣介
互邀新平台产品宣介Gauin
 
Data Innovation Summit - Made in Belgium 2015
Data Innovation Summit - Made in Belgium 2015Data Innovation Summit - Made in Belgium 2015
Data Innovation Summit - Made in Belgium 2015Andrea Dal Pozzolo
 

En vedette (20)

Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?Is Machine learning useful for Fraud Prevention?
Is Machine learning useful for Fraud Prevention?
 
Deep Learning for Fraud Detection
Deep Learning for Fraud DetectionDeep Learning for Fraud Detection
Deep Learning for Fraud Detection
 
PayPal's Fraud Detection with Deep Learning in H2O World 2014
transactions reviewed by investigators).
4. For confidentiality reasons, the exchange of datasets and information is often difficult.
6/ 68
PCA PROJECTION
Figure: Distribution of the transactions over the first two Principal Components on different days.
7/ 68
RESAMPLING METHODS
[Figure panels: Undersampling, Oversampling, SMOTE]
Figure: Resampling methods for unbalanced classification. The positive and negative symbols denote fraudulent and genuine transactions. In red, the new observations created with oversampling methods.
8/ 68
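The undersampling panel above can be sketched in a few lines. This is an illustrative implementation (not Worldline's): every fraud is kept, and each genuine transaction is kept independently with probability β, which matches the class-dependent selection P(s = 1|−) = β used later in the chapter.

```python
import numpy as np

def random_undersample(X, y, beta, seed=0):
    """Keep every minority sample (y == 1, frauds) and each majority
    sample (y == 0, genuine) independently with probability beta."""
    rng = np.random.default_rng(seed)
    keep = (y == 1) | (rng.random(len(y)) < beta)
    return X[keep], y[keep]
```

With β = 0.1, a dataset with 1% frauds becomes roughly 10% frauds; oversampling and SMOTE instead grow the minority class.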
CONCEPT DRIFT
In data streams the concept to learn can change due to non-stationary distributions.
[Figure panels, from t to t + 1: class priors change, within-class change, class boundary change]
Figure: Illustrative example of different types of Concept Drift.
9/ 68
THESIS CONTRIBUTIONS
1. Learning with unbalanced distribution:
impact of undersampling on the accuracy of the final classifier
racing algorithm to adaptively select the best unbalanced strategy
2. Learning from evolving and unbalanced data streams:
multiple strategies for unbalanced data streams
effective solution to avoid propagation of old transactions
3. Prototype of a real-world FDS:
realistic framework that uses all supervised information and provides accurate alerts
4. Software and Credit Card Fraud Detection Dataset:
release of a software package and a real-world dataset
10/ 68
PUBLICATIONS
A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi. Credit Card Fraud Detection with Alert-Feedback Interaction. Submitted to IEEE TNNLS, 2015.
A. Dal Pozzolo, O. Caelen, and G. Bontempi. When is undersampling effective in unbalanced classification tasks? In ECML-KDD, 2015.
A. Dal Pozzolo, O. Caelen, R. Johnson, and G. Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In IEEE CIDM, 2015.
A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi. Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information. In IEEE IJCNN, 2015.
A. Dal Pozzolo, R. Johnson, O. Caelen, S. Waterschoot, N. Chawla, and G. Bontempi. Using HDDT to avoid instances propagation in unbalanced and evolving data streams. In IEEE IJCNN, 2014.
A. Dal Pozzolo, O. Caelen, Y. Le Borgne, S. Waterschoot, and G. Bontempi. Learned lessons in credit card fraud detection from a practitioner perspective. Elsevier ESWA, 2014.
A. Dal Pozzolo, O. Caelen, S. Waterschoot, and G. Bontempi. Racing for unbalanced methods selection. In IEEE IDEAL, 2013.
11/ 68
THE FRAUD DETECTION PROBLEM
We formalize FD as a classification task f : R^n → {+, −}.
X ∈ R^n is the input and Y ∈ {+, −} the output domain.
+ is the fraud (minority) class and − the genuine (majority) class.
Given a classifier K and a training set TN, we are interested in estimating, for a new sample (x, y), the posterior probability P(y = +|x).
12/ 68
THE FRAUD DETECTION DATA
WL processes millions of transactions per day.
When a transaction arrives, it contains about 20 raw features (e.g. CARD ID, AMOUNT, DATETIME, SHOP ID, CURRENCY, etc.).
Aggregate features are computed to include the cardholder's historical behaviour (e.g. AVG AMOUNT WEEK, TIME FROM LAST TRX, NB TRX SAME SHOP, etc.).
The final feature vector contains about 50 variables.
13/ 68
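A minimal sketch of how such aggregates can be derived from a raw transaction log with pandas. The column names (card_id, datetime, amount) and the toy data are illustrative, not Worldline's actual schema.

```python
import pandas as pd

# Hypothetical raw transaction log (column names are illustrative).
trx = pd.DataFrame({
    "card_id":  [1, 1, 1, 2, 2],
    "datetime": pd.to_datetime(["2015-01-01 00:00", "2015-01-01 01:00",
                                "2015-01-01 03:00", "2015-01-01 00:30",
                                "2015-01-01 02:30"]),
    "amount":   [10.0, 20.0, 30.0, 5.0, 50.0],
}).sort_values(["card_id", "datetime"])

g = trx.groupby("card_id")
# average amount of the cardholder's previous transactions
trx["avg_amount_prev"] = g["amount"].transform(lambda s: s.shift().expanding().mean())
# seconds elapsed since the card's previous transaction
trx["time_from_last_trx"] = g["datetime"].diff().dt.total_seconds()
```

Aggregates of this kind are computed per card, using only transactions that precede the current one, so they are available at authorization time.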
ACCURACY MEASURE OF A FDS
In a detection problem it is more important to provide an accurate ranking than a correct classification.
In a realistic FDS, investigators check only a limited number of transactions. Hence, Pk is the most relevant measure, with k being the number of transactions checked.
Pk is the proportion of frauds among the k most risky transactions.

k   Predicted Score   True Class
1   0.99              +
2   0.98              +
3   0.98              −
4   0.96              −
5   0.95              +
6   0.92              −
7   0.91              +
8   0.84              −
9   0.82              −
10  0.79              −
11  0.76              −
12  0.75              +
13  0.70              −
14  0.67              −
15  0.64              +
16  0.61              −
17  0.58              −
18  0.55              +
19  0.52              −
20  0.49              −
14/ 68
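Pk is precision at the top k of the ranking; a small illustrative sketch, assuming label 1 encodes a fraud:

```python
import numpy as np

def precision_at_k(scores, labels, k):
    """Fraction of frauds (label 1) among the k highest-scored transactions."""
    order = np.argsort(scores)[::-1]            # indices by descending score
    return np.asarray(labels)[order[:k]].mean()
```

On the ranking shown above, frauds sit at ranks 1, 2, 5, 7, 12, 15 and 18, so P5 = 3/5 = 0.6 and P20 = 7/20 = 0.35.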
Theoretical contribution
CHAPTER 4 - UNDERSTANDING SAMPLING METHODS
Publications:
A. Dal Pozzolo, O. Caelen, and G. Bontempi. When is undersampling effective in unbalanced classification tasks? In ECML-KDD, 2015.
A. Dal Pozzolo, O. Caelen, R. Johnson, and G. Bontempi. Calibrating Probability with Undersampling for Unbalanced Classification. In IEEE CIDM, 2015.
A. Dal Pozzolo, O. Caelen, S. Waterschoot, and G. Bontempi. Racing for unbalanced methods selection. In IEEE IDEAL, 2013.
15/ 68
OBJECTIVE OF CHAPTER 4
In Chapter 4 we aim to:
analyse the shift in the posterior distribution due to undersampling and study its impact on the final ranking;
show under which conditions undersampling is expected to improve classification accuracy;
propose a method to calibrate posterior probabilities and to select the classification threshold;
show how to efficiently select the best unbalanced strategy for a given dataset.
16/ 68
EFFECT OF UNDERSAMPLING
Suppose that a classifier K is trained on an unbalanced set TN.
Let s be a random variable associated to each sample (x, y) ∈ TN, with s = 1 if the point is sampled and s = 0 otherwise.
Assume that s is independent of the input x given the class y (class-dependent selection):
P(s|y, x) = P(s|y) ⇔ P(x|y, s) = P(x|y)
Figure: Undersampling: remove majority-class examples at random. In red, the samples that are removed from the unbalanced dataset (s = 0).
17/ 68
POSTERIOR PROBABILITIES
Let ps = p(+|x, s = 1) and p = p(+|x). We can write ps as a function of p:

ps = p / (p + β(1 − p))    (1)

where β = p(s = 1|−). Using (1) we can obtain an expression of p as a function of ps:

p = β ps / (β ps − ps + 1)    (2)

18/ 68
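Equations (1) and (2) translate directly into code; a minimal sketch, assuming β ∈ (0, 1] and probabilities in [0, 1]:

```python
def undersampled_posterior(p, beta):
    """Eq. (1): the posterior the classifier sees after keeping each
    negative sample with probability beta."""
    return p / (p + beta * (1.0 - p))

def calibrated_posterior(ps, beta):
    """Eq. (2): recover the posterior w.r.t. the original, unbalanced
    distribution from the one estimated on the undersampled data."""
    return beta * ps / (beta * ps - ps + 1.0)
```

With β = 0.1, an apparently risky score ps = 0.9 maps back to p ≈ 0.47: undersampling inflates the posterior of the minority class, which is why calibration matters before thresholding.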
SHIFT AND CLASS SEPARABILITY
[Figure panels: (a) ps as a function of β; (b) class distribution]
Figure: Class distribution and posterior probability as a function of β for two univariate binary classification tasks with normal class-conditional densities X− ∼ N(0, σ) and X+ ∼ N(µ, σ) (on the left µ = 3, on the right µ = 15; in both examples σ = 3). Note that p corresponds to β = 1 and ps to β < 1.
19/ 68
CONDITION FOR A BETTER RANKING
Using νs > ν and by making a hypothesis of normality of the errors, K has a better ranking with undersampling when

dps/dp > νs/ν    (3)

where dps/dp is the derivative of ps w.r.t. p:

dps/dp = β / (p + β(1 − p))^2

20/ 68
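The closed-form derivative can be sanity-checked numerically against equation (1); a small illustrative verification:

```python
def ps_of_p(p, beta):
    return p / (p + beta * (1.0 - p))           # eq. (1)

def dps_dp(p, beta):
    return beta / (p + beta * (1.0 - p)) ** 2   # derivative of eq. (1)

# central finite-difference check of the closed form
p, beta, eps = 0.3, 0.2, 1e-6
numeric = (ps_of_p(p + eps, beta) - ps_of_p(p - eps, beta)) / (2 * eps)
```

Note that at p = 0 the derivative equals 1/β > 1: the shift stretches exactly the low-probability region, which is where condition (3) is easiest to satisfy.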
UNIVARIATE SYNTHETIC DATASET
[Figure panels: (a) class-conditional distributions (thin lines) and the posterior distribution of the minority class (thicker line); (b) dps/dp (solid lines), νs/ν (dotted lines)]
Figure: Non-separable case. On the right we plot both terms of inequality (3) (solid: left-hand term, dotted: right-hand term) for β = 0.1 and β = 0.4.
21/ 68
  • 22. REAL DATASETS Figure: Difference between the Kendall rank correlations of ˆps and ˆp with p, namely Ks and K, for points where condition (3) is satisfied and where it is not. Ks and K are calculated as the mean of the correlations over all values of β.
  • 23. REAL DATASETS II Figure: Ratio between the number of samples satisfying condition (3) and the total number of instances in each dataset, averaged over all values of β.
  • 24. DISCUSSION When (3) is satisfied, the posterior probability obtained after sampling returns a more accurate ordering. Several factors influence (3) (e.g. β, the variance of the classifier, class separability). Practical use of (3) is not straightforward since it requires knowledge of p and νs/ν (not easy to estimate). The sampling rate should be tuned: a balanced distribution is not always the best option.
  • 25. SELECTING THE BEST STRATEGY With no prior information about the data distribution, it is difficult to decide which unbalanced strategy to use. No-free-lunch theorem [23]: no single strategy is consistently superior to all others in all conditions (i.e. algorithm, dataset and performance metric). Testing all unbalanced techniques is not an option because of the associated computational cost. We proposed to use the Racing approach [17] to perform strategy selection.
  • 26. RACING FOR STRATEGY SELECTION Racing consists in testing a set of alternatives in parallel and using a statistical test to remove an alternative when it is significantly worse than the others. We adopted the F-Race algorithm [4] to search efficiently for the best strategy for unbalanced data. F-Race combines the Friedman test with Hoeffding Races [17].
  • 27. RACING FOR UNBALANCED TECHNIQUE SELECTION Automatically select the most adequate technique for a given dataset. 1. Test in parallel a set of alternative balancing strategies on a subset of the dataset. 2. Progressively remove the alternatives that are significantly worse. 3. Iterate the testing and removal steps until only one candidate is left or no more data is available. Illustration (score of each candidate per subset; eliminated candidates left blank):
Candidate 1 | Candidate 2 | Candidate 3
subset 1: 0.50 | 0.47 | 0.48
subset 2: 0.51 | 0.48 | 0.30
subset 3: 0.51 | 0.47 | -
subset 4: 0.60 | 0.45 | -
subset 5: 0.55 | - | -
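The elimination loop above can be sketched in a heavily simplified form. The real F-Race [4] removes only the candidates flagged by post-hoc pairwise comparisons after a significant Friedman test; this toy version instead stops at the first significant test and keeps the best-ranked candidate. Candidate names and scores are invented:

```python
def friedman_stat(results):
    """Friedman statistic for results[i][j] = score of candidate j on
    subset i (higher is better); ties are ignored in this sketch."""
    n, k = len(results), len(results[0])
    rank_sums = [0.0] * k
    for row in results:
        for rank, j in enumerate(sorted(range(k), key=lambda j: -row[j]), 1):
            rank_sums[j] += rank
    chi2 = 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) \
        - 3.0 * n * (k + 1)
    return chi2, rank_sums

def race(evaluate, candidates, subsets, critical=5.991):  # df=2, alpha=0.05
    """Feed subsets one at a time; once the Friedman test is significant,
    keep only the best-ranked candidate (a crude form of elimination)."""
    results = []
    for s in subsets:
        results.append([evaluate(c, s) for c in candidates])
        if len(results) >= 2:
            chi2, rank_sums = friedman_stat(results)
            if chi2 > critical:
                return candidates[min(range(len(candidates)),
                                      key=lambda j: rank_sums[j])]
    return None  # race inconclusive on the available data
```

With one clearly dominant candidate the race ends after a handful of subsets, which is the whole point: most of the dataset is never evaluated.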
  • 28. F-RACE METHOD Use 10-fold Cross Validation (CV) to provide the data during the race. Every time new data is added to the race, the Friedman test is used to remove candidates that are significantly worse. We compared CV and F-Race using different classifiers.
  • 29. F-RACE VS CROSS VALIDATION
Dataset | Exploration | Method | Ntest | %Gain | Mean | Sd
ecoli | Race | Under | 46 | 49 | 0.836 | 0.04
ecoli | CV | SMOTE | 90 | - | 0.754 | 0.112
letter-a | Race | Under | 34 | 62 | 0.952 | 0.008
letter-a | CV | SMOTE | 90 | - | 0.949 | 0.01
letter-vowel | Race | Under | 34 | 62 | 0.884 | 0.011
letter-vowel | CV | Under | 90 | - | 0.887 | 0.009
letter | Race | SMOTE | 37 | 59 | 0.951 | 0.009
letter | CV | Under | 90 | - | 0.951 | 0.01
oil | Race | Under | 41 | 54 | 0.629 | 0.074
oil | CV | SMOTE | 90 | - | 0.597 | 0.076
page | Race | SMOTE | 45 | 50 | 0.919 | 0.01
page | CV | SMOTE | 90 | - | 0.92 | 0.008
pendigits | Race | Under | 39 | 57 | 0.978 | 0.011
pendigits | CV | Under | 90 | - | 0.981 | 0.006
PhosS | Race | Under | 19 | 79 | 0.598 | 0.01
PhosS | CV | Under | 90 | - | 0.608 | 0.016
satimage | Race | Under | 34 | 62 | 0.843 | 0.008
satimage | CV | Under | 90 | - | 0.841 | 0.011
segment | Race | SMOTE | 90 | 0 | 0.978 | 0.01
segment | CV | SMOTE | 90 | - | 0.978 | 0.01
estate | Race | Under | 27 | 70 | 0.553 | 0.023
estate | CV | Under | 90 | - | 0.563 | 0.021
covtype | Race | Under | 42 | 53 | 0.924 | 0.007
covtype | CV | SMOTE | 90 | - | 0.921 | 0.008
cam | Race | Under | 34 | 62 | 0.68 | 0.007
cam | CV | Under | 90 | - | 0.674 | 0.015
compustat | Race | Under | 37 | 59 | 0.738 | 0.021
compustat | CV | Under | 90 | - | 0.745 | 0.017
creditcard | Race | Under | 43 | 52 | 0.927 | 0.008
creditcard | CV | SMOTE | 90 | - | 0.924 | 0.006
Table: Results in terms of G-mean for the RF classifier.
  • 30. DISCUSSION The best strategy is extremely dependent on the nature of the data, the algorithm adopted and the performance measure. F-Race is able to automate the selection of the best unbalanced strategy for a given unbalanced problem without exploring the whole dataset. For the fraud dataset, the unbalanced strategy chosen had a big impact on the accuracy of the results.
  • 31. CONCLUSIONS OF CHAPTER 4 Undersampling is not always the best strategy for all unbalanced datasets. This result warns against a naive use of undersampling in unbalanced tasks. With racing [9] we can rapidly select the best strategy for a given unbalanced task. However, we observe that undersampling and SMOTE are often the best strategies to adopt for FD.
  • 32. Methodological contribution CHAPTER 5 - LEARNING WITH CONCEPT DRIFT Publications: A. Dal Pozzolo, O. Caelen, Y. Le Borgne, S. Waterschoot, and G. Bontempi. Learned lessons in credit card fraud detection from a practitioner perspective. Elsevier ESWA, 2014. A. Dal Pozzolo, R. Johnson, O. Caelen, S. Waterschoot, N. Chawla, and G. Bontempi. Using HDDT to avoid instances propagation in unbalanced and evolving data streams. In IEEE IJCNN, 2014.
  • 33. OBJECTIVE OF CHAPTER 5 The objectives of this chapter are: investigating the benefit of updating the FDS; assessing the impact of Concept Drift (CD); comparing alternative learning strategies for CD; studying an effective way to deal with unbalanced distributions in a streaming environment.
  • 34. LEARNING APPROACHES We compare three standard learning approaches on transactions from February 2012 to May 2013.
Static: fast, but no CD adaptation.
Update: no instance propagation and CD adaptation, but needs several batches for the minority class.
Propagate and Forget: accumulates minority instances faster and adapts to CD, but propagation leads to larger training times.
Data: 422 days, 45 features, 2'202'228 transactions (1 Feb 2012 - 20 May 2013).
  • 35. THE STATIC APPROACH Figure: the Static approach over time (batches Bt−2, …, Bt+4; model Mt; fraudulent and genuine transactions).
  • 36. THE UPDATE APPROACH Figure: the Update approach over time (batches Bt−5, …, Bt+1; models Mt−3, …, Mt; ensemble Et; fraudulent and genuine transactions).
  • 37. THE PROPAGATE AND FORGET APPROACH Figure: the Propagate and Forget approach over time (batches Bt−3, …, Bt+1; models Mt−3, …, Mt; ensemble Et; fraudulent and genuine transactions).
  • 38. BEST OVERALL STRATEGY The Propagate and Forget approach is the best, while the Static approach is the worst. Figure: Comparison of all strategies (combinations of classifier, balancing method, update frequency and window size) by the sum of ranks over all batches, using Pk as accuracy measure.
  • 39. DISCUSSION Without CD we would expect the three approaches to perform similarly. The static approach is consistently the worst. Ensembles are often a better alternative than single models. Higher accuracy can be achieved by using SMOTE or by undersampling the data. Instance propagation can be an alternative way to rebalance the data stream.
  • 40. CONCLUSION OF CHAPTER 5 Updating the models daily and combining them into an ensemble is an effective strategy to adapt to CD. Retaining positive samples/undersampling the data stream improves the accuracy of the FDS. However, it is not always possible to either revisit or store old instances of a data stream. In that case we recommend using Hellinger Distance Decision Trees [7] (HDDT) to avoid instance propagation.
  • 41. Proposed FDS CHAPTER 6 - LEARNING WITH ALERT-FEEDBACK INTERACTION Publications: A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi. Credit Card Fraud Detection with Alert-Feedback Interaction. Submitted to IEEE TNNLS, 2015. A. Dal Pozzolo, G. Boracchi, O. Caelen, C. Alippi, and G. Bontempi. Credit Card Fraud Detection and Concept-Drift Adaptation with Delayed Supervised Information. In IEEE IJCNN, 2015.
  • 42. OBJECTIVE OF CHAPTER 6 Propose a framework meeting realistic working conditions. Model the interaction between the FDS generating alerts and investigators. Integrate investigators' feedback in the FDS.
  • 43. REALISTIC WORKING CONDITIONS Alert-Feedback Interaction: the supervised samples depend on the alerts reported to investigators. Only a small set of alerts can be checked by human investigators. The labels of the majority of transactions are available only several days later (after customers have reported unauthorized transactions). Investigators trust a FDS only if it provides accurate alerts, i.e. alert precision (Pk) is the accuracy measure that matters.
  • 44. SUPERVISED SAMPLES The k transactions with the largest PKt−1(+|x) define the alerts At reported to the investigators. Investigators provide a feedback for the alerts in At, defining a set of k supervised couples (x, y): Ft = {(x, y), x ∈ At} (4) Ft are the only immediate supervised samples. Dt−δ are transactions that have not been checked by investigators, but whose labels are assumed to be correct after δ days have elapsed.
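The alert/feedback mechanism can be sketched as follows. The transaction records and scores are hypothetical; in the real FDS the score would be the estimated posterior PKt−1(+|x):

```python
def generate_alerts(transactions, score, k):
    """The k transactions with the largest estimated P(+|x) form A_t."""
    return sorted(transactions, key=score, reverse=True)[:k]

def collect_feedback(alerts, true_label):
    """Investigators check each alert, yielding the supervised set F_t."""
    return [(x, true_label(x)) for x in alerts]

# Toy day of five transactions with made-up scores and true classes.
txs = [{"score": 0.9, "fraud": 1}, {"score": 0.8, "fraud": 0},
       {"score": 0.7, "fraud": 1}, {"score": 0.2, "fraud": 0},
       {"score": 0.1, "fraud": 0}]
feedback = collect_feedback(generate_alerts(txs, lambda t: t["score"], 3),
                            lambda t: t["fraud"])
p_k = sum(y for _, y in feedback) / len(feedback)  # alert precision P_k
```

Note how the supervised set Ft is entirely determined by the ranking of the previous classifier: this is the Alert-Feedback Interaction.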
  • 45. SUPERVISED SAMPLES II Ft is a small set of risky transactions according to the FDS. Dt−δ contains all the transactions that occurred in a day (≈ 99% genuine transactions). Figure: The supervised samples available at day t include: i) feedbacks of the first δ days and ii) delayed couples occurred before the δth day.
  • 46. FEEDBACKS VS DELAYED SAMPLES Learning from the feedbacks Ft is a different problem than learning from the delayed samples in Dt−δ: Ft provides recent, up-to-date information, while Dt−δ may already be obsolete by the time it arrives. The percentage of frauds in Ft and Dt−δ is different. Samples in Ft are selected as the most risky by Kt−1: a classifier trained on Ft learns how to label the transactions that are most likely to be fraudulent. Feedbacks and delayed samples have to be treated separately.
  • 47. LEARNING STRATEGY A conventional solution for CD adaptation is Wt [20]. To learn separately from feedbacks and delayed transactions we propose AWt (aggregation of Ft and WDt). Figure: Learning strategies for feedbacks and delayed transactions occurring in the two days (M = 2) before the feedbacks (δ = 7).
  • 48. CLASSIFIER AGGREGATIONS WDt has to be aggregated with Ft to exploit the information provided by feedbacks. We combine these classifiers by averaging their posterior probabilities: PAWt(+|x) = αt PFt(+|x) + (1 − αt) PWDt(+|x) (5) AWt with αt ≥ 0.5 gives a larger influence to feedbacks on the probability estimates w.r.t. Wt.
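Equation (5) is a one-liner; the posterior values below are invented:

```python
def aggregate_posterior(p_f, p_wd, alpha=0.5):
    """Eq. (5): convex combination of the feedback classifier (F_t) and
    delayed-sample classifier (W^D_t) posteriors for one transaction."""
    return alpha * p_f + (1 - alpha) * p_wd
```

With alpha = 0.5 both classifiers weigh equally; raising alpha above 0.5 lets the fresher (but smaller) feedback set dominate the final fraud score.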
  • 49. DATASETS We used three datasets of e-commerce credit card transactions (frauds << 1%):
Id | Start day | End day | # Days | # Transactions | # Features
2013 | 2013-09-05 | 2014-01-18 | 136 | 21'830'330 | 51
2014 | 2014-08-05 | 2014-12-31 | 148 | 27'113'187 | 51
2015 | 2015-01-05 | 2015-05-31 | 146 | 27'651'197 | 51
Figure: number of frauds per day in each dataset.
  • 50. SEPARATING FEEDBACKS FROM DELAYED SAMPLES Aggregating two distinct classifiers (Ft with WDt) using AWt is better than pooling feedbacks and delayed samples together with Wt.
dataset | classifier | mean | sd
2013 | F | 0.62 | 0.25
2013 | WD | 0.54 | 0.22
2013 | W | 0.57 | 0.23
2013 | AW | 0.70 | 0.21
2014 | F | 0.59 | 0.29
2014 | WD | 0.58 | 0.26
2014 | W | 0.60 | 0.26
2014 | AW | 0.69 | 0.24
2015 | F | 0.67 | 0.25
2015 | WD | 0.66 | 0.21
2015 | W | 0.68 | 0.21
2015 | AW | 0.75 | 0.20
Table: Average Pk with δ = 7, M = 16 and αt = 0.5.
  • 51. EXPERIMENTS WITH ARTIFICIAL CD Figure: Average Pk per day of the sliding-window strategies (AW, W) on datasets CD1, CD2 and CD3 with artificial CD, smoothed with a 15-day moving average. The vertical bar denotes the date of CD.
  • 52. CLASSIFIERS IGNORING AFI Experiments with a classifier Rt trained on all recent transactions between t and t − δ. Rt makes the unrealistic assumption that all transactions are labeled by investigators. Figure: Rt makes the unrealistic assumption that all transactions between t and t − δ have been checked by investigators.
  • 53. CLASSIFIERS IGNORING AFI II AUC is a global ranking measure, while Pk measures the ranking quality only on the top k.
(a) dataset 2013, AUC: F 0.83 (0.06), WD 0.94 (0.03), R 0.96 (0.01), AW 0.94 (0.03); Pk: F 0.67 (0.24), WD 0.54 (0.22), R 0.60 (0.22), AW 0.72 (0.20).
(b) dataset 2014, AUC: F 0.81 (0.08), WD 0.94 (0.03), R 0.96 (0.02), AW 0.93 (0.03); Pk: F 0.67 (0.27), WD 0.56 (0.27), R 0.63 (0.25), AW 0.71 (0.25).
(c) dataset 2015, AUC: F 0.81 (0.07), WD 0.95 (0.02), R 0.97 (0.01), AW 0.94 (0.02); Pk: F 0.71 (0.21), WD 0.62 (0.21), R 0.68 (0.20), AW 0.76 (0.19).
Table: Average AUC and Pk (standard deviation in parentheses) for the sliding approach (δ = 15, M = 16 and αt = 0.5).
  • 54. ADAPTIVE WEIGHTING Experiments with different weighting strategies for αt in AWt.
(a) dataset 2013: probDif 0.70 (0.22), acc 0.72 (0.21), accMiss 0.72 (0.21), precision 0.72 (0.21), recall 0.71 (0.21), auc 0.72 (0.21), none (αt = 0.5) 0.72 (0.20).
(b) dataset 2014: probDif 0.68 (0.26), acc 0.71 (0.24), accMiss 0.70 (0.24), precision 0.71 (0.24), recall 0.70 (0.25), auc 0.71 (0.24), none (αt = 0.5) 0.71 (0.24).
(c) dataset 2015: probDif 0.73 (0.21), acc 0.76 (0.19), accMiss 0.75 (0.19), precision 0.75 (0.19), recall 0.74 (0.20), auc 0.75 (0.19), none (αt = 0.5) 0.76 (0.19).
Table: Average Pk (standard deviation in parentheses) of AWt with adaptive αt (δ = 15).
  • 55. ADAPTIVE WEIGHTING II Figure: scatter plot of PFt(+|x) versus PWDt(+|x) for the alerts of different days (classes + and −, alerted vs non-alerted transactions; example day with Pk: 0.64 and alpha: 0.464).
  • 56. CONCLUSION OF CHAPTER 6 The goal of the FDS is to return precise alerts (high Pk). In a realistic setting, feedbacks and delayed samples are the only supervised information available. Feedbacks and delayed samples address two different classification tasks. It is not convenient to pool the two types of supervised samples together.
  • 57. CONCLUSIONS AND FUTURE PERSPECTIVES
  • 58. SUMMARY OF CONTRIBUTIONS In this thesis we: formalize realistic working conditions of a FDS; study how undersampling can improve the ranking of a classifier K; release an R package called unbalanced [8] with racing; investigate strategies for data streams with both CD and unbalanced distributions; prototype a real-world FDS that, for the first time, learns from investigators' feedback and produces more precise alerts.
  • 59. LEARNED LESSONS Rebalancing a training set with undersampling is not guaranteed to improve performance. F-Race [4] can be effective to rapidly select the unbalanced technique to use. The data stream defined by credit card transactions is severely affected by CD; updating the FDS is often a good idea. For a real-world FDS it is mandatory to produce precise alerts. Feedbacks and delayed samples should be handled separately when training a FDS. Aggregating two distinct classifiers is an effective strategy that enables a quicker adaptation to CD.
  • 60. TAKE-HOME MESSAGES Class imbalance and CD seriously affect the performance of a classifier. In a real-world scenario, there is a strong Alert-Feedback Interaction that has to be explicitly considered. The best learning algorithm for a FDS changes with the accuracy measure. A standard classifier trained on all recent transactions is not the best when we want to produce accurate alerts.
  • 61. FUTURE WORK: TOWARDS BIG DATA SOLUTIONS This work will be continued in the BruFence project: Big Data Mining for Fraud Detection and Security. Three research groups (ULB-QualSec, ULB-MLG and UCL-MLG) and three companies (Worldline, Steria and Nviso). 2015 - 2018, funded by InnovIRIS (Brussels Region).
  • 62. THE BruFence PROJECT Goal: implement the FDS of Chapter 6 using Big Data technologies (e.g. Spark, NoSQL DB). Real-time framework scalable w.r.t. the amount of data and resources available. Determine to which extent external network data can improve prediction accuracy. Compute aggregated features in real time and using a larger set of historical transactions. Figure: Apache Big Data software.
  • 63. ADDED VALUE FOR WL RF has emerged as the best algorithm (now used by WL). New ways to create aggregated features. Undersampling can be used to improve the accuracy of the FDS, but probabilities need to be calibrated. Our experimental analysis on CD gave WL some guidelines on how to improve the FDS. Including feedbacks in the learning process allows WL to obtain more accurate alerts.
  • 64. CONCLUDING REMARKS Doctiris was a unique opportunity to work on real-world fraud detection data. Real working conditions define new challenges (often ignored in the literature). Companies are more interested in practical results, while academics focus on theoretical ones. Industry and academia should look at each other and exchange ideas in order to have a much larger impact on society.
  • 65. Website: www.ulb.ac.be/di/map/adalpozz Email: adalpozz@ulb.ac.be Thank you for your attention
  • 66. BIBLIOGRAPHY I [1] A. Asuncion and D. Newman. UCI machine learning repository, 2007. [2] C. Alippi, G. Boracchi, and M. Roveri. A just-in-time adaptive classification system based on the intersection of confidence intervals rule. Neural Networks, 24(8):791–800, 2011. [3] A. Bifet, G. Holmes, R. Kirkby, and B. Pfahringer. MOA: Massive online analysis. The Journal of Machine Learning Research, 99:1601–1604, 2010. [4] M. Birattari, T. Stützle, L. Paquete, and K. Varrentrapp. A racing algorithm for configuring metaheuristics. In Proceedings of the genetic and evolutionary computation conference, pages 11–18, 2002. [5] D. A. Cieslak and N. V. Chawla. Detecting fractures in classifier performance. In Data Mining, 2007. ICDM 2007. Seventh IEEE International Conference on, pages 123–132. IEEE, 2007. [6] D. A. Cieslak and N. V. Chawla. Learning decision trees for unbalanced data. In Machine Learning and Knowledge Discovery in Databases, pages 241–256. Springer, 2008. [7] D. A. Cieslak, T. R. Hoens, N. V. Chawla, and W. P. Kegelmeyer. Hellinger distance decision trees are robust and skew-insensitive. Data Mining and Knowledge Discovery, 24(1):136–158, 2012. [8] A. Dal Pozzolo, O. Caelen, and G. Bontempi. unbalanced: Racing For Unbalanced Methods Selection, 2015. R package version 2.0.
  • 67. BIBLIOGRAPHY II [9] A. Dal Pozzolo, O. Caelen, S. Waterschoot, and G. Bontempi. Racing for unbalanced methods selection. In Proceedings of the 14th International Conference on Intelligent Data Engineering and Automated Learning. IDEAL, 2013. [10] C. Elkan. The foundations of cost-sensitive learning. In International Joint Conference on Artificial Intelligence, volume 17, pages 973–978. Citeseer, 2001. [11] R. Elwell and R. Polikar. Incremental learning of concept drift in nonstationary environments. Neural Networks, IEEE Transactions on, 22(10):1517–1531, 2011. [12] J. Gao, B. Ding, W. Fan, J. Han, and P. S. Yu. Classifying data streams with skewed class distributions and concept drifts. Internet Computing, 12(6):37–49, 2008. [13] T. R. Hoens, R. Polikar, and N. V. Chawla. Learning from streaming data with concept drift and imbalance: an overview. Progress in Artificial Intelligence, 1(1):89–101, 2012. [14] M. G. Kelly, D. J. Hand, and N. M. Adams. The impact of changing populations on classifier performance. In Proceedings of the fifth ACM SIGKDD international conference on Knowledge discovery and data mining, pages 367–371. ACM, 1999. [15] A. Liaw and M. Wiener. Classification and regression by randomForest. R News, 2(3):18–22, 2002.
  • 68. BIBLIOGRAPHY III [16] R. N. Lichtenwalter and N. V. Chawla. Adaptive methods for classification in arbitrarily imbalanced and drifting data streams. In New Frontiers in Applied Data Mining, pages 53–75. Springer, 2010. [17] O. Maron and A. Moore. Hoeffding races: Accelerating model selection search for classification and function approximation. Robotics Institute, page 263, 1993. [18] C. R. Rao. A review of canonical coordinates and an alternative to correspondence analysis using Hellinger distance. Qüestiió: Quaderns d'Estadística, Sistemes, Informàtica i Investigació Operativa, 19(1):23–63, 1995. [19] M. Saerens, P. Latinne, and C. Decaestecker. Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure. Neural computation, 14(1):21–41, 2002. [20] D. K. Tasoulis, N. M. Adams, and D. J. Hand. Unsupervised clustering in streaming data. In ICDM Workshops, pages 638–642, 2006. [21] V. N. Vapnik and V. Vapnik. Statistical learning theory, volume 1. Wiley New York, 1998. [22] S. Wang, K. Tang, and X. Yao. Diversity exploration and negative correlation learning on imbalanced data sets. In Neural Networks, 2009. IJCNN 2009. International Joint Conference on, pages 3259–3266. IEEE, 2009. [23] D. H. Wolpert. The lack of a priori distinctions between learning algorithms. Neural computation, 8(7):1341–1390, 1996.
  • 70. OPEN ISSUES Agreeing on a good performance measure (cost-based vs. detection). Modelling the Alert-Feedback Interaction. Using both the supervised and unsupervised information available.
  • 71. AVERAGE PRECISION Let N+ be the number of positive (fraud) cases in the original dataset and TPk the number of true positives in the first k ranked transactions (TPk ≤ k). Let us denote Precision and Recall at k as Pk = TPk/k and Rk = TPk/N+. We can then define AP as: AP = Σk=1..N Pk (Rk − Rk−1) (6) where N is the total number of observations in the dataset.
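A sketch of Eq. (6) on a ranked list of labels. Since Rk − Rk−1 equals 1/N+ at each true positive and 0 elsewhere, the sum only needs to visit the positives:

```python
def average_precision(labels):
    """AP of a ranking, Eq. (6). `labels` lists the true classes of the
    transactions sorted by decreasing fraud score (1 = fraud)."""
    n_pos = sum(labels)
    tp, ap = 0, 0.0
    for k, y in enumerate(labels, start=1):
        tp += y
        if y:                       # recall increases by 1/n_pos here
            ap += (tp / k) / n_pos
    return ap
```

For example, the ranking [1, 0, 1] yields AP = (1/1)·(1/2) + (2/3)·(1/2) = 5/6, while a perfect ranking of the same labels gives 1.0.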
  • 72. RANKING ERROR We have a wrong ranking if ˆp1 > ˆp2, and its probability is: P(ˆp2 < ˆp1) = P(p2 + ε2 < p1 + ε1) = P(ε1 − ε2 > ∆p) where ε2 − ε1 ∼ N(0, 2ν). By making a hypothesis of normality we have P(ε1 − ε2 > ∆p) = 1 − Φ(∆p / √(2ν)) (7) where Φ is the cumulative distribution function of the standard normal distribution. The probability of a ranking error with undersampling is: P(ˆps,2 < ˆps,1) = P(η1 − η2 > ∆ps) = 1 − Φ(∆ps / √(2νs)) (8)
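Equations (7) and (8) need only the standard normal CDF, which is available through the error function; the ∆p and ν values below are invented:

```python
import math

def phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def ranking_error(delta, nu):
    """Probability of swapping two instances whose true posteriors differ
    by `delta`, when the estimation noise has variance nu (Eqs. (7)-(8))."""
    return 1.0 - phi(delta / math.sqrt(2.0 * nu))
```

As expected, the error probability is 1/2 when the two posteriors coincide, shrinks as the gap ∆p widens, and grows with the estimator variance ν.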
  • 73. CLASSIFICATION THRESHOLD Let r+ and r− be the risks of predicting an instance as positive and negative: r+ = (1 − p) · l1,0 + p · l1,1 and r− = (1 − p) · l0,0 + p · l0,1, where li,j is the cost of predicting i when the true class is j and p = P(y = +|x). A sample is predicted as positive if r+ ≤ r− [21]. Equivalently, predict as positive when p > τ with: τ = (l1,0 − l0,0) / (l1,0 − l0,0 + l0,1 − l1,1) (9)
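Eq. (9) in code (the cost values below are invented):

```python
def bayes_threshold(l10, l01, l00=0.0, l11=0.0):
    """Eq. (9): predict positive when P(+|x) exceeds this threshold.
    l10 = cost of a false positive, l01 = cost of a false negative."""
    return (l10 - l00) / (l10 - l00 + l01 - l11)
```

With symmetric costs the usual 0.5 threshold is recovered; making a missed fraud nine times more costly than a false alarm pushes the threshold down to 0.1, i.e. the FDS alerts much earlier.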
  • 74. CONDITION FOR A BETTER RANKING K has a better ranking with undersampling w.r.t. a classifier learned on the unbalanced distribution when P(ˆp2 < ˆp1) > P(ˆps,2 < ˆps,1), i.e. P(ε1 − ε2 > ∆p) > P(η1 − η2 > ∆ps) (10) or equivalently, from (7) and (8), when 1 − Φ(∆p / √(2ν)) > 1 − Φ(∆ps / √(2νs)) ⇔ Φ(∆p / √(2ν)) < Φ(∆ps / √(2νs)). Since Φ is monotone non-decreasing and νs > ν, it follows that undersampling is useful (better ranking) when dps/dp > νs/ν
  • 75. CORRECTING POSTERIOR PROBABILITIES We propose to use p′, the bias-corrected probability obtained from ps: p′ = βps / (βps − ps + 1) (11) Eq. (11) is a special case of the framework proposed by Saerens et al. [19] and Elkan [10] for correcting the posterior probability in the case of testing and training sets sharing the same priors.
  • 76. CORRECTING THE CLASSIFICATION THRESHOLD
When the costs of a FN (l_{0,1}) and a FP (l_{1,0}) are unknown, we can use the priors. Let l_{1,0} = \pi^+ and l_{0,1} = \pi^−; from (9) we get:
\tau = l_{1,0} / (l_{1,0} + l_{0,1}) = \pi^+ / (\pi^+ + \pi^−) = \pi^+   (12)
since \pi^+ + \pi^− = 1. Then we should use \pi^+ as threshold with p:
p  →  \tau = \pi^+
Similarly, p_s  →  \tau_s = \pi_s^+
From Elkan [10]:
(\tau / (1 − \tau)) · ((1 − \tau_s) / \tau_s) = \beta   (13)
Therefore, we obtain: p'  →  \tau' = \pi^+
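Eq. (13) can be checked numerically. The sketch below assumes \beta is the fraction of negatives kept by undersampling, so that \pi_s^+ = \pi^+ / (\pi^+ + \beta \pi^−); the prior value 0.002 is illustrative:

```python
def undersampled_prior(pi_pos, beta):
    """Positive-class prior after keeping a fraction beta of the
    negatives (assumption: beta = fraction of negatives retained)."""
    pi_neg = 1.0 - pi_pos
    return pi_pos / (pi_pos + beta * pi_neg)

# Check Eq. (13): with tau = pi+ and tau_s = pi_s+,
# (tau / (1 - tau)) * ((1 - tau_s) / tau_s) equals beta.
pi_pos, beta = 0.002, 0.1        # e.g. 0.2% frauds, keep 10% of genuine
tau = pi_pos
tau_s = undersampled_prior(pi_pos, beta)
lhs = (tau / (1 - tau)) * ((1 - tau_s) / tau_s)
print(round(lhs, 10))  # -> 0.1 (= beta)
```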
  • 77. CREDIT CARDS DATASET
Real-world credit card dataset with transactions from Sep 2013; frauds account for 0.172% of all transactions.
[Figure: AUC (top) and Brier Score (bottom) as a function of beta for the LB, RF and SVM classifiers, comparing the probabilities p, p' and p_s on the credit-card dataset.]
  • 78. DISCUSSION
As a result of undersampling, \hat{p}_s is shifted away from \hat{p}.
Using (11), we can remove the drift in \hat{p}_s and obtain \hat{p}', which is better calibrated.
\hat{p}' provides the same ranking quality as \hat{p}_s.
Using \hat{p}' with \tau' we are able to improve calibration without losing predictive accuracy.
Unfortunately, there is not yet a way to select the best \beta automatically.
  • 79. HELLINGER DISTANCE
Quantifies the similarity between two probability distributions [18].
Recently proposed as a splitting criterion in decision trees for unbalanced problems [6, 7].
In data streams, it has been used to detect classifier performance degradation due to concept drift [5, 16].
  • 80. HELLINGER DISTANCE II
Consider two probability measures P_1, P_2 of discrete distributions over \Phi. The Hellinger distance is defined as:
HD(P_1, P_2) = \sqrt{ \sum_{\phi \in \Phi} ( \sqrt{P_1(\phi)} − \sqrt{P_2(\phi)} )^2 }   (14)
The Hellinger distance has several properties:
HD(P_1, P_2) = HD(P_2, P_1) (symmetric)
HD(P_1, P_2) ≥ 0 (non-negative)
HD(P_1, P_2) ∈ [0, \sqrt{2}]
HD is close to zero when the distributions are similar and close to \sqrt{2} for distinct distributions.
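A direct sketch of Eq. (14) for discrete distributions; representing each distribution as a dict mapping outcome to probability is an illustrative choice:

```python
from math import sqrt

def hellinger(p1, p2):
    """Eq. (14): Hellinger distance between two discrete
    distributions given as dicts outcome -> probability."""
    support = set(p1) | set(p2)
    return sqrt(sum((sqrt(p1.get(s, 0.0)) - sqrt(p2.get(s, 0.0))) ** 2
                    for s in support))

# Identical distributions -> 0; disjoint supports -> sqrt(2).
print(hellinger({'a': 0.5, 'b': 0.5}, {'a': 0.5, 'b': 0.5}))  # -> 0.0
print(hellinger({'a': 1.0}, {'b': 1.0}))  # -> sqrt(2) ~ 1.4142
```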
  • 81. HELLINGER DISTANCE DECISION TREES
HDDT [6] uses the Hellinger distance as splitting criterion in decision trees for unbalanced problems.
All numerical features are partitioned into q bins, so that only categorical variables remain.
The Hellinger distance between the positive and negative class is computed for each feature, and the feature with the maximum distance is used to split the tree.
HD(f^+, f^−) = \sqrt{ \sum_{j=1}^{q} ( \sqrt{|f_j^+| / |f^+|} − \sqrt{|f_j^−| / |f^−|} )^2 }   (15)
where j defines the j-th bin of feature f and |f_j^+| is the number of positive instances of feature f in the j-th bin.
Note that the class priors do not appear explicitly in (15).
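A sketch of the per-feature score of Eq. (15) on binned class counts; the two toy features are invented for illustration:

```python
from math import sqrt

def hellinger_split_score(pos_counts, neg_counts):
    """Eq. (15): Hellinger distance between the binned positive and
    negative class distributions of one feature.
    pos_counts[j] / neg_counts[j]: class counts in bin j."""
    n_pos, n_neg = sum(pos_counts), sum(neg_counts)
    return sqrt(sum((sqrt(p / n_pos) - sqrt(n / n_neg)) ** 2
                    for p, n in zip(pos_counts, neg_counts)))

# Hypothetical 3-bin features: the one that best separates the
# classes (highest score) would be chosen as the split.
f1 = hellinger_split_score([90, 5, 5], [5, 5, 90])      # well separated
f2 = hellinger_split_score([30, 40, 30], [35, 30, 35])  # mixed
print(f1 > f2)  # -> True
```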
  • 82. AVOIDING INSTANCES PROPAGATION
HDDT allows us to remove instance propagation between batches, with several benefits:
improved predictive accuracy
improved speed
a single pass through the data.
An ensemble of HDDTs is used to adapt to CD and to increase the accuracy of single classifiers.
We test our framework on several streaming datasets with unbalanced classes and concept drift.
  • 83. DATASETS
We used different types of datasets:
UCI datasets [1] (unbalanced, no CD).
Synthetic datasets from MOA [3] (artificial CD).
A credit card dataset (online payment transactions between the 5th and the 25th of Sept. 2013).

Name        | Source | Instances | Features | Imbalance Ratio
Adult       | UCI    | 48,842    | 14       | 3.2:1
can         | UCI    | 443,872   | 9        | 52.1:1
compustat   | UCI    | 13,657    | 20       | 27.5:1
covtype     | UCI    | 38,500    | 10       | 13.0:1
football    | UCI    | 4,288     | 13       | 1.7:1
ozone-8h    | UCI    | 2,534     | 72       | 14.8:1
wrds        | UCI    | 99,200    | 41       | 1.0:1
text        | UCI    | 11,162    | 11,465   | 14.7:1
DriftedLED  | MOA    | 1,000,000 | 25       | 9.0:1
DriftedRBF  | MOA    | 1,000,000 | 11       | 1.02:1
DriftedWave | MOA    | 1,000,000 | 41       | 2.02:1
Creditcard  | FRAUD  | 3,143,423 | 36       | 658.8:1
  • 84. RESULTS CREDIT CARD DATASET
Figure 18 shows the sum of the ranks for each strategy over all the batches. For each batch, we assign the highest rank to the most accurate strategy and then sum the ranks over all batches.
[Figure: Batch average results in terms of AUROC (higher is better) using different sampling strategies (BD, BL, SE, UNDER) and batch-ensemble weighting methods with C4.5 and HDDT over the Credit card dataset.]
[Figure: Sum of ranks in all chunks for the Credit card dataset in terms of AUROC. In gray are the strategies that are not significantly worse than the best, i.e. the one with the highest sum of ranks.]
  • 85. HELLINGER DISTANCE BETWEEN BATCHES
Let B_t be the batch at time t used for training K_t and B_{t+1} the testing batch. We compute the distance between B_t and B_{t+1} for a given feature f using the Hellinger distance as [16]:
HD(B_t^f, B_{t+1}^f) = \sqrt{ \sum_{v \in f} ( \sqrt{|B_t^{f=v}| / |B_t|} − \sqrt{|B_{t+1}^{f=v}| / |B_{t+1}|} )^2 }   (16)
where |B_t^{f=v}| is the number of instances of feature f taking value v in the batch at time t, while |B_t| is the total number of instances in the same batch.
  • 86. HELLINGER DISTANCE AND INFORMATION GAIN
We can make the assumption that feature relevance remains stable over time and define a new distance function that combines IG and HD as:
HDIG(B_t, B_{t+1}, f) = HD(B_t^f, B_{t+1}^f) · (1 + IG(B_t, f))   (17)
The final distance between two batches is then computed by taking the average over all the features:
AHDIG(B_t, B_{t+1}) = ( \sum_{f \in B_t} HDIG(B_t, B_{t+1}, f) ) / |f \in B_t|   (18)
Then K_t receives the weight w_t = AHDIG(B_t, B_{t+1})^{−M}, where M is the ensemble size.
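A sketch of Eqs. (16)-(18), assuming the ensemble weight is w_t = AHDIG^{−M} and representing batches as per-feature value counts; all names and toy numbers are illustrative:

```python
from math import sqrt

def feature_hd(counts_t, counts_t1):
    """Eq. (16): Hellinger distance of one feature's value
    distribution between two batches (dicts value -> count)."""
    n_t, n_t1 = sum(counts_t.values()), sum(counts_t1.values())
    values = set(counts_t) | set(counts_t1)
    return sqrt(sum((sqrt(counts_t.get(v, 0) / n_t)
                     - sqrt(counts_t1.get(v, 0) / n_t1)) ** 2
                    for v in values))

def ahdig(batch_t, batch_t1, info_gain, ensemble_size):
    """Eqs. (17)-(18): average HDIG over features, then the ensemble
    weight w_t = AHDIG^(-M). batch_* map feature -> value counts,
    info_gain maps feature -> IG(B_t, f)."""
    hdig = [feature_hd(batch_t[f], batch_t1[f]) * (1 + info_gain[f])
            for f in batch_t]
    avg = sum(hdig) / len(hdig)
    return avg ** (-ensemble_size)

# Hypothetical two-feature batches: the more the feature
# distributions drift, the smaller the classifier's weight.
bt  = {'country': {'BE': 80, 'FR': 20}, 'night': {0: 90, 1: 10}}
bt1 = {'country': {'BE': 40, 'FR': 60}, 'night': {0: 85, 1: 15}}
ig  = {'country': 0.3, 'night': 0.1}
print(ahdig(bt, bt1, ig, ensemble_size=3))
```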
  • 87. ADAPTIVE WEIGHTING I
Let F_{A_t^W} be the feedbacks requested by A_t^W and F_{F_t} the feedbacks requested by F_t.
Let Y_t^+ (resp. Y_t^−) be the fraudulent (resp. genuine) transactions at day t (B_t). We define the following sets:
CORR_{F_t} = F_{A_t^W} ∩ F_{F_t} ∩ Y_t^+
ERR_{F_t}  = F_{A_t^W} ∩ F_{F_t} ∩ Y_t^−
MISS_{F_t} = F_{A_t^W} ∩ (B_t \ F_{F_t}) ∩ Y_t^+
  • 88. ADAPTIVE WEIGHTING II
In the example below we have |CORR_{F_t}| = 4, |ERR_{F_t}| = 2 and |MISS_{F_t}| = 3.
[Figure: Feedbacks requested by A_t^W (F_{A_t^W}) are a subset of all the transactions of day t (B_t). F_{F_t} (F_{W_t^D}) denotes the feedbacks requested by F_t (W_t^D). The symbol + is used for frauds and − for genuine transactions.]
  • 89. ADAPTIVE WEIGHTING III
We can now compute some accuracy measures of F_t on the feedbacks requested by A_t^W:
acc_{F_t} = 1 − |ERR_{F_t}| / k
accMiss_{F_t} = 1 − (|ERR_{F_t}| + |MISS_{F_t}|) / k
precision_{F_t} = |CORR_{F_t}| / k
recall_{F_t} = |CORR_{F_t}| / |F_{A_t^W} ∩ Y_t^+|
auc_{F_t} = probability that fraudulent feedbacks rank higher than genuine feedbacks in F_{A_t^W} according to P_{F_t}.
probDif_{F_t} = difference between the mean of P_{F_t} calculated on the frauds and the mean on the genuine transactions in F_{A_t^W}.
If we choose auc_{F_t} as metric for F_t and similarly auc_{W_t^D} for W_t^D, then:
\alpha_{t+1} = auc_{F_t} / (auc_{F_t} + auc_{W_t^D})   (19)
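The auc measure and Eq. (19) can be sketched as follows; the pairwise formulation of AUC and the toy scores are illustrative, not from the thesis:

```python
def pairwise_auc(scores_fraud, scores_genuine):
    """Probability that a fraudulent feedback is ranked above a
    genuine one (ties count 1/2) -- the auc measure above."""
    wins = sum((f > g) + 0.5 * (f == g)
               for f in scores_fraud for g in scores_genuine)
    return wins / (len(scores_fraud) * len(scores_genuine))

def alpha_next(auc_feedback, auc_delayed):
    """Eq. (19): weight alpha_{t+1} given to the feedback
    classifier F_t relative to the delayed classifier W_t^D."""
    return auc_feedback / (auc_feedback + auc_delayed)

# Hypothetical scores on the requested feedbacks:
auc_f = pairwise_auc([0.9, 0.8, 0.7], [0.4, 0.6])  # F_t separates well
auc_w = pairwise_auc([0.6, 0.5, 0.7], [0.5, 0.6])  # W_t^D less so
print(alpha_next(auc_f, auc_w))  # > 0.5: trust F_t more
```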
  • 90. SELECTION BIAS AND AFI
At day t, AFI provides F_t, which is not representative of B_t.
For all (x, y) ∈ B_t, let s ∈ {0, 1}, where s = 1 if (x, y) ∈ F_t and s = 0 otherwise (F_t ⊂ B_t).
With F_t we get P_{K_t}(+|x, s = 1) rather than P_{K_t}(+|x).
A standard solution to SSB consists in training a weight-sensitive algorithm, where (x, y) has weight w:
w = P(s = 1) / P(s = 1 | x, y)   (20)
Feedbacks F_t are selected according to P(+|x), hence P(s = 1|x) and P(+|x) are highly correlated.
We expect P(s = 1|x, y) to be larger for feedbacks, leading to smaller weights in (20).
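A sketch of the weighting of Eq. (20); approximating P(s = 1) by k/n (feedbacks per day over daily transactions) is an assumption made for illustration:

```python
def ssb_weight(p_s, p_s_given_xy):
    """Eq. (20): importance weight w = P(s=1) / P(s=1|x,y)
    for a weight-sensitive learning algorithm."""
    return p_s / p_s_given_xy

# With k feedbacks out of n daily transactions, P(s=1) ~ k/n.
k, n = 100, 10_000
p_s = k / n
# A transaction the alert system was almost certain to select gets a
# small weight; a transaction selected with baseline probability
# keeps weight 1.
print(ssb_weight(p_s, 0.9))   # -> ~0.011 (down-weighted)
print(ssb_weight(p_s, 0.01))  # -> 1.0
```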
  • 91. INCREASING δ AND SSB CORRECTION
Table: Average P_k when δ = 15.
(a) sliding
dataset | classifier | mean | sd
2013 | F  | 0.67 | 0.24
2013 | WD | 0.54 | 0.22
2013 | AW | 0.72 | 0.20
2014 | F  | 0.67 | 0.27
2014 | WD | 0.56 | 0.27
2014 | AW | 0.71 | 0.25
2015 | F  | 0.71 | 0.21
2015 | WD | 0.62 | 0.21
2015 | AW | 0.76 | 0.19
(b) ensemble
dataset | classifier | mean | sd
2013 | F  | 0.67 | 0.23
2013 | ED | 0.48 | 0.23
2013 | AE | 0.72 | 0.21
2014 | F  | 0.67 | 0.28
2014 | ED | 0.49 | 0.27
2014 | AE | 0.70 | 0.25
2015 | F  | 0.71 | 0.22
2015 | ED | 0.56 | 0.18
2015 | AE | 0.75 | 0.20
Table: Average P_k of F_t with methods for SSB correction (δ = 15).
dataset | SSB correction | mean | sd
2013 | SJA       | 0.66 | 0.24
2013 | weighting | 0.64 | 0.25
2014 | SJA       | 0.66 | 0.27
2014 | weighting | 0.65 | 0.28
2015 | SJA       | 0.71 | 0.22
2015 | weighting | 0.68 | 0.23
  • 92. TRAINING SET SIZE AND TIME
[Figure: (a) Training set size and (b) training time (in seconds) to train a RF for the feedback (F_t) and delayed classifiers (W_t^D and E_t^D) in the 2013 dataset, for the sliding and ensemble adaptation strategies.]
  • 93. FEATURE IMPORTANCE
[Figure: Average feature importance measured by the mean decrease in accuracy calculated with the Gini index in the RF models of W_t^D in the 2013 dataset. Features shown include AMOUNT, AGE, TX_HOUR, and various RISK_* and *_HIS aggregates such as RISK_TERM_MIDUID, RISK_LAST_MIDUID_HIS, SUM_AMT_HIS and MIN_AMT_HIS.]
The randomForest package [15], available in R, uses the Gini index as splitting criterion.