Credit card fraud detection using
Machine Learning Techniques:
A Comparative Analysis
Flow of Talk
 Introduction
 Experimental Set Up and Methods
 Performance Evaluation and Results
 Conclusion
 References
Introduction
 Credit Card Fraud
Inner Card Fraud: Done by using false identity
External Card Fraud: Done by using stolen credit card
 How Frauds are Recognized
Location: A purchase is made from an unusual location
Items you buy: Purchases deviate from your regular buying
pattern or time
Frequency: A large number of transactions are made in a short
period of time
Amount: Costly items are suddenly purchased
Fraud can be defined as criminal deception with intent of
acquiring financial gain.
Contd…
Challenges:
The data is highly skewed.
Standard machine learning algorithms would report 99%+ accuracy.
But accuracy is misleading here: we get about 99.8% accuracy even if
we classify every fraud as legitimate.
Fig 1. Highly unbalanced data (class labels: Authentic, Fraud)
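The accuracy trap described on this slide can be checked with a few lines of arithmetic. This is a minimal sketch that assumes the dataset figures quoted in this talk (284,807 transactions, 0.172% fraud) and a hypothetical classifier that labels every transaction as legitimate; the variable names are illustrative only.

```python
# Dataset figures quoted in this talk: 284,807 transactions, 0.172% fraud.
n_total = 284_807
fraud_rate = 0.00172
n_fraud = round(n_total * fraud_rate)  # number of fraud cases implied by the rate
n_legit = n_total - n_fraud

# Hypothetical "classifier" that labels every transaction as legitimate.
true_negatives = n_legit  # every legitimate transaction is labelled correctly
true_positives = 0        # no fraud is ever caught

accuracy = (true_positives + true_negatives) / n_total
sensitivity = true_positives / n_fraud  # fraction of frauds detected

print(f"accuracy    = {accuracy:.4%}")
print(f"sensitivity = {sensitivity:.0%}")
```

Accuracy stays above 99.8% while sensitivity is zero, which is why the talk relies on metrics other than accuracy.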
Contd…
 Feature (Variables) Selection
 Transaction statistics
 Regional statistics
 Merchant type statistics
 Time based amount statistics
 Time based number of transaction statistics
 Credit Card Fraud Detection
 Supervised: Models are estimated based on samples of
fraudulent and legitimate transactions to classify new transactions
as fraud or legitimate.
 Unsupervised: The outliers’ transactions are considered as
potential instances of fraud.
 Comparative Study
Experimental Set Up and
Methods
 Dataset
Sourced from the ULB Machine Learning Group
Consists of 284,807 transactions, of which 0.172% are fraud cases
Highly unbalanced and skewed towards the legitimate (non-fraud) class
 Resampling Methods
Resampling methods are used to adjust the class
distribution of the data because the minority class is
under-represented.
There are three methods to perform resampling:
Oversampling
Undersampling
Hybrid Sampling
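The three options above can be sketched with the standard library alone. This is an illustrative example, not the authors' implementation: the toy rows, the helper names `random_oversample` and `random_undersample`, and the target sizes are all assumptions. Hybrid sampling here simply means applying both steps.

```python
import random

def random_undersample(majority, n_keep, rng):
    """Keep a random subset of the majority class (sampling without replacement)."""
    return rng.sample(majority, n_keep)

def random_oversample(minority, n_target, rng):
    """Grow the minority class by duplicating randomly chosen rows."""
    extra = [rng.choice(minority) for _ in range(n_target - len(minority))]
    return minority + extra

rng = random.Random(0)                       # fixed seed so the run is repeatable
legit = [("legit", i) for i in range(1000)]  # toy majority class
fraud = [("fraud", i) for i in range(10)]    # toy minority class

# Hybrid: oversample fraud to 50 rows and undersample legit to 450 rows.
fraud_resampled = random_oversample(fraud, 50, rng)
legit_resampled = random_undersample(legit, 450, rng)
balanced = fraud_resampled + legit_resampled
rng.shuffle(balanced)

fraud_share = len(fraud_resampled) / len(balanced)
print(f"fraud share after resampling: {fraud_share:.0%}")
```

The chosen sizes give a 10:90 fraud-to-legitimate split; any other ratio follows by changing the two target counts.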
Contd…
 SMOTE :
 SMOTE stands for Synthetic Minority Oversampling Technique.
 This is a statistical technique that increases the number of samples in
the minority class of the dataset to make it balanced.
 It generates new synthetic instances by interpolating, in feature
space, between an existing minority-class sample and its nearest
neighbours.
 But it is reported to be effective only for up to 6 to 7 parameters.
 Hybrid Sampling of data:
 It is done by stepwise addition and subtraction of data points
interpolated between existing data points till the overfitting
threshold is reached.
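The interpolation step that SMOTE relies on can be sketched as follows. This is a simplified illustration under stated assumptions, not a full SMOTE implementation: the function name `smote_sample`, the toy points, and the choice of k are hypothetical.

```python
import math
import random

def smote_sample(minority, k, rng):
    """Create one synthetic point between a minority sample and one of its k nearest neighbours."""
    base = rng.choice(minority)
    # k nearest minority-class neighbours of `base` (excluding itself), by Euclidean distance
    others = [p for p in minority if p is not base]
    neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
    neighbour = rng.choice(neighbours)
    gap = rng.random()  # interpolation factor in [0, 1)
    # The new point lies on the line segment from `base` towards `neighbour`.
    return tuple(b + gap * (n - b) for b, n in zip(base, neighbour))

rng = random.Random(42)
fraud_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]  # toy minority class
synthetic = [smote_sample(fraud_points, k=2, rng=rng) for _ in range(6)]

for point in synthetic:
    print(point)
```

Because every synthetic point lies between two real minority samples, the new data stays inside the region already occupied by the minority class.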
Contd…
 Naïve Bayes Classifier
 A simple classifier model that is:
 Based on the Bayes theorem and Conditional Probability
 Chooses decision based on highest probability
 Allows prior knowledge and logic
 Uses Supervised Learning
 Conditional Probability:
 Conditional probability is a measure of the probability of an
event given that another event has already occurred.
 Bayes Theorem:
 P(H|E) = P(E|H) P(H) / P(E)
 P(H) is the probability of hypothesis H being true.
 P(E) is the probability of the evidence (regardless of the
hypothesis).
 P(E|H) is the probability of the evidence given that the hypothesis
is true.
 P(H|E) is the probability of the hypothesis given the
evidence.
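A short worked example may make the theorem concrete. The prior below is the fraud rate quoted for the dataset; the two likelihoods are hypothetical values chosen only to illustrate the arithmetic, and P(E) is obtained from the law of total probability.

```python
# Prior from the dataset description; likelihoods are assumed for illustration.
p_fraud = 0.00172          # P(H): prior probability that a transaction is fraud
p_flag_given_fraud = 0.90  # P(E|H): assumed chance that a fraud shows the evidence
p_flag_given_legit = 0.01  # P(E|not H): assumed chance that a legitimate one does

# P(E) by the law of total probability
p_flag = p_flag_given_fraud * p_fraud + p_flag_given_legit * (1 - p_fraud)

# Bayes theorem: P(H|E) = P(E|H) P(H) / P(E)
p_fraud_given_flag = p_flag_given_fraud * p_fraud / p_flag
print(f"P(fraud | evidence) = {p_fraud_given_flag:.3f}")
```

Even with fairly strong evidence, the posterior stays near 13% under these assumed numbers, because the prior probability of fraud is so small.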
Contd..
 Naïve Bayes Classifier
Bayesian classification rule
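For reference, the usual form of the naive Bayes decision rule is given below. It is an assumption that this is the rule the slide shows; the symbols are illustrative, with c ranging over the two classes and x_1, ..., x_n the features, which are treated as conditionally independent given the class.

```latex
\hat{y} = \arg\max_{c \,\in\, \{\text{fraud},\ \text{legitimate}\}} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The denominator P(E) from Bayes theorem is dropped because it is the same for every class and does not change which class has the highest probability.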
Contd…
 k-Nearest Neighbour Classifier
k-nearest neighbours (k-NN) is a lazy, instance-based learning
algorithm for classification (and regression) which is widely
used in both supervised and unsupervised learning
techniques.
It is a lazy learner because it does not learn a discriminative
function from the training data but memorizes the training dataset.
It classifies an unlabeled data point by a majority vote
among the “k” points closest to it.
It uses three types of functions for distance calculation
 Euclidean
 Manhattan
 Minkowski
Contd…
 k-Nearest Neighbour Classifier
Formula to calculate Euclidean distance
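For reference, the standard Euclidean distance between two points x and y with n features, together with the Minkowski distance that generalizes it, can be written as follows (the symbols are illustrative):

```latex
d_{\text{Euclidean}}(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},
\qquad
d_{\text{Minkowski}}(\mathbf{x}, \mathbf{y}) = \left( \sum_{i=1}^{n} |x_i - y_i|^{p} \right)^{1/p}
```

Setting p = 1 gives the Manhattan distance and p = 2 gives the Euclidean distance.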
Fig 2. Example of k-NN classification
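The majority-vote idea can be sketched in a few lines. This is a minimal illustration, not the classifier evaluated in the talk: the helper names `minkowski` and `knn_predict` and the toy training points are assumptions.

```python
from collections import Counter

def minkowski(a, b, p=2):
    """Minkowski distance; p=1 is Manhattan, p=2 is Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_predict(train, query, k=3, p=2):
    """Label `query` by majority vote among its k closest training points."""
    nearest = sorted(train, key=lambda row: minkowski(row[0], query, p))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set of (features, label) pairs; the values are made up.
train = [
    ((1.0, 1.0), "legit"), ((1.2, 0.8), "legit"), ((0.9, 1.1), "legit"),
    ((5.0, 5.0), "fraud"), ((5.2, 4.9), "fraud"),
]

print(knn_predict(train, (1.1, 1.0), k=3))
print(knn_predict(train, (4.8, 5.1), k=3))
```

Nothing is fitted in advance: the whole training set is scanned for every query, which is what makes the method a lazy learner.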
Contd…
 Logistic Regression Classifier:
Logistic Regression is a statistical method for analyzing a
dataset in which there are one or more independent
variables that determine an outcome. The outcome is
measured with a dichotomous variable, where there are
only two possible outcomes.
The goal of logistic regression is to find the best fitting
model to describe the relationship between the
dichotomous characteristic of interest, and a set of
independent variables.
Logistic Regression generates the coefficients of a formula
to predict a Logit Transformation of the probability of
presence of the characteristic of interest.
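Written out, the logit transformation mentioned above has the following standard form, where p is the probability that the characteristic of interest is present and the coefficients (named \beta here as an assumption) are what the model estimates:

```latex
\operatorname{logit}(p) = \ln\frac{p}{1 - p} = \beta_0 + \beta_1 x_1 + \dots + \beta_n x_n
```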
Contd…
 Logistic Regression Classifier
Fig.3 Working of Logistic Regression Model
Contd…
Sigmoid Function:
A sigmoid function is a mathematical function having a
characteristic "S"-shaped curve or sigmoid curve.
A sigmoid function is a bounded, differentiable, real function that
is defined for all real input values and has a non-negative derivative
at each point.
Fig.4 Sigmoid Function
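A small sketch of the sigmoid and its role in logistic regression follows. The coefficient values, the feature vector, the 0.5 decision threshold, and the helper name `predict_proba` are hypothetical and serve only as an illustration.

```python
import math

def sigmoid(z):
    """Logistic sigmoid 1 / (1 + e^(-z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, bias, features):
    """Probability of the positive class for one sample under a logistic model."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients and features, for illustration only.
p = predict_proba(weights=[0.8, -0.5], bias=-1.0, features=[2.0, 1.0])
label = "fraud" if p >= 0.5 else "legit"
print(f"p = {p:.3f}, label = {label}")
```

The sigmoid is the inverse of the logit, so it turns the linear score back into a probability between 0 and 1.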
Performance Evaluation and
Results
 Four metrics used in evaluation
◦ True Positive Rate(TPR)
◦ True Negative Rate (TNR)
◦ False Positive Rate (FPR)
◦ False Negative Rate (FNR)
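These four rates follow directly from the confusion-matrix counts. The sketch below uses the standard definitions; the function name `rates` and the counts are made up for illustration.

```python
def rates(tp, tn, fp, fn):
    """Return TPR, TNR, FPR and FNR from confusion-matrix counts."""
    return {
        "TPR": tp / (tp + fn),  # frauds correctly flagged
        "TNR": tn / (tn + fp),  # legitimate transactions correctly passed
        "FPR": fp / (fp + tn),  # legitimate transactions wrongly flagged
        "FNR": fn / (fn + tp),  # frauds missed
    }

# Made-up counts, for illustration only.
r = rates(tp=80, tn=9_900, fp=100, fn=20)
for name, value in r.items():
    print(f"{name} = {value:.2f}")
```

TPR and FNR always sum to 1, as do TNR and FPR, so two of the four rates determine the other two.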
Contd …
 Performance of naïve bayes, k-nearest
neighbour and logistic regression
classifiers is evaluated based on:
Accuracy
Sensitivity
Specificity
Precision
Matthews Correlation Coefficient (MCC)
Balanced Classification Rate(BCR)
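The six metrics can be computed from the same four confusion-matrix counts. This sketch uses the commonly cited definitions, with BCR taken as the mean of sensitivity and specificity; that the talk uses exactly these formulas is an assumption. The helper names and the convention of returning 0.0 for an undefined ratio are also illustrative choices.

```python
import math

def _ratio(num, den):
    """num / den, or 0.0 when the denominator is zero (metric undefined)."""
    return num / den if den else 0.0

def evaluate(tp, tn, fp, fn):
    """Accuracy, sensitivity, specificity, precision, MCC and BCR from counts."""
    sensitivity = _ratio(tp, tp + fn)
    specificity = _ratio(tn, tn + fp)
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": _ratio(tp, tp + fp),
        "mcc": _ratio(tp * tn - fp * fn, mcc_den),
        "bcr": (sensitivity + specificity) / 2,
    }

# A classifier that labels all 284,807 transactions as legitimate (490 frauds missed).
trivial = evaluate(tp=0, tn=284_317, fp=0, fn=490)
# Made-up counts for a classifier that does detect most frauds.
useful = evaluate(tp=80, tn=9_900, fp=100, fn=20)

for name in ("accuracy", "mcc", "bcr"):
    print(f"{name}: trivial={trivial[name]:.3f}  useful={useful[name]:.3f}")
```

The trivial classifier scores higher on accuracy but only 0 on MCC and 0.5 on BCR, which illustrates why these two metrics are informative on skewed data.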
Contd…
Contd..
 Comparative Performance
Figure 4
Performance evaluation chart for naïve
Bayes, k-NN and logistic regression
*MCC = Matthews correlation coefficient
*BCR = balanced classification rate
Figure 5
TPR and FPR evaluation of naïve Bayes
classifiers
*TPR = true positive rate
*FPR = false positive rate
*proposed NB = proposed naïve Bayes
classifier
Contd…
Figure 6
TPR and FPR evaluation of k-nearest
neighbour classifiers
*TPR = true positive rate
*FPR = false positive rate
*proposed k-NN = proposed k-nearest
neighbour classifier
Figure 7
TPR and FPR evaluation of logistic regression
classifiers
*TPR = true positive rate
*FPR = false positive rate
*proposed LR = proposed logistic regression
classifier
Conclusion
 Three classifiers based on different machine learning
techniques (naïve Bayes, k-nearest neighbours and logistic
regression) are trained on real-life credit card transaction
data, and their performance on credit card fraud detection is
evaluated and compared based on several relevant metrics.
 The highly imbalanced dataset is sampled in a hybrid approach
where the positive class is oversampled and the negative class
under-sampled, achieving two sets of data distributions.
 The performances of the three classifiers are examined on the
two sets of data distributions using accuracy, sensitivity,
specificity, precision, balanced classification rate and Matthews
Correlation coefficient metrics.
 Results from the experiment show that k-NN achieves
significant performance on all metrics evaluated except for
accuracy in the 10:90 data distribution.
References
1. S. Maes, K. Tuyls, B. Vanschoenwinkel, B. Manderick, "Credit card fraud detection
using Bayesian and neural networks", Proceeding International NAISO Congress
on Neuro Fuzzy Technologies, 2002.
2. F. N. Ogwueleka, "Data Mining Application in Credit Card Fraud Detection
System", Journal of Engineering Science and Technology, vol. 6, no. 3, pp. 311-
322, 2011.
3. K. RamaKalyani, D. UmaDevi, "Fraud Detection of Credit Card Payment System by
Genetic Algorithm", International Journal of Scientific & Engineering Research,
vol. 3, no. 7, pp. 1-6, 2012, ISSN 2229-5518.
4. P. L. Meshram, P. Bhanarkar, "Credit and ATM Card Fraud Detection Using Genetic
Approach", International Journal of Engineering Research & Technology (IJERT),
vol. 1, no. 10, pp. 1-5, 2012, ISSN 2278-0181.
5. G. Singh, R. Gupta, A. Rastogi, M. D. S. Chandel, A. Riyaz, "A Machine Learning
Approach for Detection of Fraud based on SVM", International Journal of Scientific
Engineering and Technology, vol. 1, no. 3, pp. 194-198, 2012, ISSN 2277-1581.
6. K. R. Seeja, M. Zareapoor, "FraudMiner: A Novel Credit Card Fraud Detection
Model Based on Frequent Itemset Mining" in The Scientific World Journal,
Hindawi Publishing Corporation, vol. 2014, pp. 1-10, 2014.
7. S. Patil, H. Somavanshi, J. Gaikwad, A. Deshmane, R. Badgujar, "Credit Card
Fraud Detection Using Decision Tree Induction Algorithm", International Journal of
Computer Science and Mobile Computing (IJCSMC), vol. 4, no. 4, pp. 92-95,
2015, ISSN 2320-088X.
8. E. Duman, A. Buyukkaya, I. Elikucuk, "A novel and successful credit card fraud
detection system implemented in a Turkish bank", Data Mining Workshops
(ICDMW). 2013 IEEE 13th International Conference on, pp. 162-171, 2013.
9. A. C. Bahnsen, A. Stojanovic, D. Aouada, B. Ottersten, "Improving credit card fraud
detection with calibrated probabilities", Proceedings of the 2014 SIAM International
Conference on Data Mining, pp. 677-685, 2014.
10. A. Y. Ng, M. I. Jordan, "On discriminative vs. generative classifiers: A comparison of
logistic regression and naive bayes", Advances in neural information processing
systems, vol. 2, pp. 841-848, 2002.
11. S. Maes, K. Tuyls, B. Vanschoenwinkel, B. Manderick, "Credit card fraud detection
using Bayesian and neural networks", Proceedings of the 1st international naiso
congress on neuro fuzzy technologies, pp. 261-270, 2002.
12. A. Shen, R. Tong, Y. Deng, "Application of classification models on credit card fraud
detection", Service Systems and Service Management 2007 International
Conference on, pp. 1-4, 2007.
13. S. Bhattacharyya, S. Jha, K. Tharakunnel, J. C. Westland, "Data mining for credit
card fraud: A comparative study", Decision Support Systems, vol. 50, no. 3, pp.
602-613, 2011.
14. Y. Sahin, E. Duman, "Detecting credit card fraud by ANN and logistic
regression", Innovations in Intelligent Systems and Applications (INISTA) 2011
International Symposium on, pp. 315-319, 2011.
15. K. Chaudhary, B. Mallick, "Credit Card Fraud: The study of its impact and detection
techniques", International Journal of Computer Science and Network (IJCSN), vol.
1, no. 4, pp. 31-35, 2012, ISSN 2277-5420.
16. T. P. Bhatla, V. Prabhu, A. Dua, Understanding credit card frauds. Cards Business
Review# 2003-1, Tata Consultancy Services, 2003.
17. U. S. Credit & Debit Cards 2015, David Robertson, 2015.
18. S. Stolfo, D. W. Fan, W. Lee, A. Prodromidis, P. Chan, "Credit card fraud detection
using meta-learning: Issues and initial results", AAAI-97 Workshop on Fraud
Detection and Risk Management, 1997.
Thank you

Axa Assurance Maroc - Insurer Innovation Award 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 

Credit card fraud detection using machine learning Algorithms

  • 1. Credit card fraud detection using Machine Learning Techniques: A Comparative Analysis
  • 2. Flow of Talk: Introduction; Experimental Set Up and Methods; Performance Evaluation and Results; Conclusion; References
  • 3. Introduction. Credit card fraud: inner card fraud is committed using a false identity; external card fraud is committed using a stolen credit card. How fraud is recognized: Location (a purchase made from an unusual location); Items you buy (a deviation from your regular buying pattern or time); Frequency (a large number of transactions in a short period of time); Amount (a sudden purchase of costly items). Fraud can be defined as criminal deception with the intent of acquiring financial gain.
  • 4. Contd… Challenges: The data is highly skewed. Standard machine learning algorithms would report 99%+ accuracy, but that figure is misleading: we get about 99.8% accuracy even if we classify every fraud as legitimate. Fig 1. Highly unbalanced data (authentic vs. fraud)
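The accuracy pitfall on this slide can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the class counts of the dataset used in the talk (284,807 transactions with about 0.172% fraud, which works out to roughly 492 fraud cases); the "classifier" here is a hypothetical one that labels everything as legitimate.

```python
# Assumed counts: 284,807 transactions, about 0.172% of them fraud (roughly 492).
total = 284_807
fraud = 492
legit = total - fraud

# A trivial "classifier" that labels every transaction as legitimate.
true_positives = 0        # no fraud is ever flagged
false_negatives = fraud   # every fraud is missed
true_negatives = legit    # every legitimate transaction is labelled correctly

accuracy = (true_positives + true_negatives) / total
fraud_recall = true_positives / (true_positives + false_negatives)

print(f"accuracy     = {accuracy:.2%}")
print(f"fraud recall = {fraud_recall:.0%}")
```

Accuracy stays above 99.8% while not a single fraud is caught, which is why the talk relies on sensitivity, specificity, MCC and BCR rather than accuracy alone.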
  • 5. Contd… Feature (variable) selection: transaction statistics; regional statistics; merchant type statistics; time-based amount statistics; time-based number-of-transactions statistics. Credit card fraud detection: Supervised: models are estimated from samples of fraudulent and legitimate transactions and then used to classify new transactions as fraud or legitimate. Unsupervised: outlier transactions are treated as potential instances of fraud. Comparative Study
  • 6. Experimental Set Up and Methods. Dataset: sourced from the ULB Machine Learning Group; consists of 284,807 transactions, of which 0.172% are fraud cases; highly unbalanced and skewed towards the legitimate (non-fraud) class. Resampling methods: resampling is used to adjust the class distribution of the data because the minority class is under-represented. There are three ways to resample: oversampling, undersampling and hybrid sampling.
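The first two resampling strategies can be sketched with the standard library alone. This is an illustrative sketch, not the procedure used in the talk: the helper names `random_undersample` and `random_oversample`, the `minority_share` parameter and the toy data are all assumptions.

```python
import random

def random_undersample(majority, minority, minority_share, seed=0):
    """Drop majority rows at random until the minority makes up `minority_share`."""
    rng = random.Random(seed)
    # Number of majority rows that leaves the minority at the requested share.
    keep = round(len(minority) * (1 - minority_share) / minority_share)
    keep = min(keep, len(majority))
    return rng.sample(majority, keep), list(minority)

def random_oversample(majority, minority, minority_share, seed=0):
    """Duplicate minority rows at random until they make up `minority_share`."""
    rng = random.Random(seed)
    target = round(len(majority) * minority_share / (1 - minority_share))
    extra = max(target - len(minority), 0)
    return list(majority), list(minority) + rng.choices(minority, k=extra)

# Toy data: integers stand in for transaction rows (1% fraud).
legit = list(range(1000))
fraud = list(range(1000, 1010))

under_legit, under_fraud = random_undersample(legit, fraud, minority_share=0.10)
over_legit, over_fraud = random_oversample(legit, fraud, minority_share=0.10)

print(len(under_legit), len(under_fraud))  # majority shrinks, minority unchanged
print(len(over_legit), len(over_fraud))    # majority unchanged, minority grows
```

A hybrid scheme would apply both steps, oversampling the minority somewhat and undersampling the majority somewhat, so that neither class is distorted as heavily.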
  • 7. Contd… SMOTE: SMOTE stands for Synthetic Minority Oversampling Technique. It is a statistical technique that increases the number of samples in the minority class to make the dataset balanced. It generates new instances from existing data by working in the feature space of the target class and interpolating between each sample and its nearest neighbours. However, it is effective only up to about 6 to 7 parameters. Hybrid sampling of data: done by stepwise addition and subtraction of data points interpolated between existing data points until the overfitting threshold is reached.
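The interpolation idea behind SMOTE can be illustrated in pure Python. This is a simplified sketch under stated assumptions (the function name `smote`, its parameters and the toy points are hypothetical, and production code would normally use a tested library implementation): each synthetic point lies on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random

def smote(minority, n_new, k=5, seed=0):
    """Create `n_new` synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding the point itself).
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic

# Toy minority class with two features per point.
fraud_points = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
new_points = smote(fraud_points, n_new=6, k=2)
for point in new_points:
    print(point)
```

Because every new point is a convex combination of two real minority samples, the synthetic data stays inside the region already occupied by the minority class.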
  • 8. Contd… Naïve Bayes Classifier: a simple classifier model that is based on Bayes' theorem and conditional probability; chooses the class with the highest probability; allows prior knowledge and logic to be incorporated; uses supervised learning. Conditional probability: a measure of the probability of an event given that another event has already occurred. Bayes' theorem: P(H|E) = P(E|H) P(H) / P(E), where P(H) is the probability of hypothesis H being true; P(E) is the probability of the evidence (regardless of the hypothesis); P(E|H) is the probability of the evidence given that the hypothesis is true; P(H|E) is the probability of the hypothesis given the evidence.
  • 9. Contd… Naïve Bayes Classifier: Bayesian classification rule
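A short worked example may help make Bayes' theorem concrete. This is only a sketch: the prior is taken from the 0.172% fraud rate quoted for the dataset, while the likelihood and false-alarm rate are made-up numbers chosen for illustration, and `posterior` is a hypothetical helper name.

```python
def posterior(prior, likelihood, false_alarm_rate):
    """P(H|E) = P(E|H) P(H) / P(E), with P(E) from the law of total probability."""
    evidence = likelihood * prior + false_alarm_rate * (1 - prior)
    return likelihood * prior / evidence

# H = "transaction is fraud", E = "transaction triggers some alert".
p_fraud_given_alert = posterior(
    prior=0.00172,          # P(H): assumed base rate of fraud
    likelihood=0.90,        # P(E|H): assumed, alert fires on 90% of frauds
    false_alarm_rate=0.05,  # P(E|not H): assumed, alert fires on 5% of legitimate ones
)
print(f"P(fraud | alert) = {p_fraud_given_alert:.3f}")
```

Even with a fairly accurate alert, the posterior stays small because fraud is so rare, which is another view of the class-imbalance problem described earlier.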
  • 10. Contd… k-Nearest Neighbour Classifier: k-nearest neighbours is a lazy, instance-based algorithm for classification (and regression) that is widely used in supervised learning; nearest-neighbour search also appears in unsupervised techniques. It is a lazy learner because it does not learn a discriminative function from the training data but memorizes the training dataset instead. It classifies an unlabeled data point by majority vote among the "k" closest training points. Three distance functions are commonly used: Euclidean, Manhattan and Minkowski.
  • 11. Contd… k-Nearest Neighbour Classifier: Formula to calculate Euclidean distance: d(x, y) = sqrt(Σ_i (x_i − y_i)^2). Fig 2. Example of k-NN classification
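The majority-vote rule and the three distance functions can be sketched as follows. The function names and the toy training set are assumptions made for illustration; Manhattan and Euclidean distance are the special cases p=1 and p=2 of the Minkowski distance.

```python
from collections import Counter

def minkowski(a, b, p=2):
    """Minkowski distance; p=1 gives Manhattan and p=2 gives Euclidean distance."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

def knn_predict(train, query, k=3, p=2):
    """Majority vote among the k training points closest to `query`."""
    nearest = sorted(train, key=lambda item: minkowski(item[0], query, p))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy training set: (features, label) pairs.
train = [
    ((1.0, 1.0), "legit"), ((1.2, 0.8), "legit"), ((0.9, 1.1), "legit"),
    ((5.0, 5.0), "fraud"), ((5.2, 4.9), "fraud"),
]
prediction = knn_predict(train, (4.8, 5.1), k=3)
print(prediction)
```

Note that nothing is "trained" here: the whole training set is kept and scanned at prediction time, which is exactly what makes k-NN a lazy learner.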
  • 12. Contd… Logistic Regression Classifier: Logistic regression is a statistical method for analyzing a dataset in which one or more independent variables determine an outcome. The outcome is measured with a dichotomous variable, that is, one with only two possible outcomes. The goal of logistic regression is to find the best-fitting model to describe the relationship between the dichotomous characteristic of interest and a set of independent variables. Logistic regression generates the coefficients of a formula that predicts a logit transformation of the probability that the characteristic of interest is present.
  • 13. Contd… Logistic Regression Classifier: Fig. 3 Working of Logistic Regression Model
  • 14. Contd… Sigmoid Function: A sigmoid function is a mathematical function having a characteristic "S"-shaped curve or sigmoid curve. A sigmoid function is a bounded, differentiable, real function that is defined for all real input values and has a non-negative derivative at each point. Fig.4 Sigmoid Function
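The relationship between the logit, the sigmoid function and the predicted probability can be sketched in a few lines. The weights, bias and feature values below are made up for illustration and are not fitted to any data; the function names are likewise assumptions.

```python
import math

def sigmoid(z):
    """Map any real z into (0, 1); the two branches avoid overflow in exp()."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def logit(p):
    """Inverse of the sigmoid: the log-odds of probability p."""
    return math.log(p / (1.0 - p))

def predict_proba(weights, bias, features):
    """Logistic regression: the linear combination is the logit of the probability."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

# Hypothetical coefficients for two features.
p = predict_proba(weights=[0.8, -1.5], bias=-2.0, features=[3.0, 0.2])
print(f"P(fraud) = {p:.3f}, predicted class = {'fraud' if p >= 0.5 else 'legit'}")
```

The 0.5 cut-off used in the last line is only the conventional default; on heavily imbalanced data the threshold is often tuned instead.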
  • 15. Performance Evaluation and Results. Four metrics used in evaluation: ◦ True Positive Rate (TPR) ◦ True Negative Rate (TNR) ◦ False Positive Rate (FPR) ◦ False Negative Rate (FNR)
  • 16. Contd… The performance of the naïve Bayes, k-nearest neighbour and logistic regression classifiers is evaluated based on: Accuracy; Sensitivity; Specificity; Precision; Matthews Correlation Coefficient (MCC); Balanced Classification Rate (BCR)
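All of the metrics listed on these two slides can be derived from the four cells of a confusion matrix. The sketch below uses the standard definitions, taking BCR as the mean of sensitivity and specificity; the function name `evaluate` and the example counts are made up for illustration and are not results from the talk.

```python
import math

def evaluate(tp, tn, fp, fn):
    """Compute the evaluation metrics from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0)
    and that at least one positive prediction was made (tp + fp > 0).
    """
    sensitivity = tp / (tp + fn)  # true positive rate
    specificity = tn / (tn + fp)  # true negative rate
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "TPR": sensitivity,
        "TNR": specificity,
        "FPR": fp / (fp + tn),
        "FNR": fn / (fn + tp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": tp / (tp + fp),
        "MCC": (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0,
        "BCR": (sensitivity + specificity) / 2,
    }

# Hypothetical counts: 100 frauds and 9,900 legitimate transactions.
metrics = evaluate(tp=80, tn=9850, fp=50, fn=20)
for name, value in metrics.items():
    print(f"{name:9s} {value:.4f}")
```

In this made-up case accuracy is above 99% while precision and MCC are much lower, which shows why the imbalance-aware metrics are reported alongside accuracy.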
  • 18. Contd… Comparative Performance. Figure 4: Performance evaluation chart for naïve Bayes, k-NN and logistic regression (MCC = Matthews correlation coefficient; BCR = balanced classification rate). Figure 5: TPR and FPR evaluation of naïve Bayes classifiers (TPR = true positive rate; FPR = false positive rate; proposed NB = proposed naïve Bayes classifier).
  • 19. Contd… Figure 6: TPR and FPR evaluation of k-nearest neighbour classifiers (TPR = true positive rate; FPR = false positive rate; proposed kNN = proposed k-nearest neighbour classifier). Figure 7: TPR and FPR evaluation of logistic regression classifiers (TPR = true positive rate; FPR = false positive rate; proposed LR = proposed logistic regression classifier).
  • 20. Conclusion: Three classifiers based on different machine learning techniques (naïve Bayes, k-nearest neighbours and logistic regression) are trained on real-life credit card transaction data, and their performance on credit card fraud detection is evaluated and compared using several relevant metrics. The highly imbalanced dataset is sampled with a hybrid approach in which the positive class is oversampled and the negative class is undersampled, yielding two sets of data distributions. The performance of the three classifiers is examined on the two data distributions using accuracy, sensitivity, specificity, precision, balanced classification rate and Matthews correlation coefficient. The results show that k-NN performs strongly on all metrics evaluated, except for accuracy in the 10:90 data distribution.
  • 21. References 1. S. Maes, K. Tuyls, B. Vanschoenwinkel, B. Manderick, "Credit card fraud detection using Bayesian and neural networks", Proceedings of the International NAISO Congress on Neuro Fuzzy Technologies, 2002. 2. F. N. Ogwueleka, "Data Mining Application in Credit Card Fraud Detection System", Journal of Engineering Science and Technology, vol. 6, no. 3, pp. 311-322, 2011. 3. K. RamaKalyani, D. UmaDevi, "Fraud Detection of Credit Card Payment System by Genetic Algorithm", International Journal of Scientific & Engineering Research, vol. 3, no. 7, pp. 1-6, 2012, ISSN 2229-5518. 4. P. L. Meshram, P. Bhanarkar, "Credit and ATM Card Fraud Detection Using Genetic Approach", International Journal of Engineering Research & Technology (IJERT), vol. 1, no. 10, pp. 1-5, 2012, ISSN 2278-0181. 5. G. Singh, R. Gupta, A. Rastogi, M. D. S. Chandel, A. Riyaz, "A Machine Learning Approach for Detection of Fraud based on SVM", International Journal of Scientific Engineering and Technology, vol. 1, no. 3, pp. 194-198, 2012, ISSN 2277-1581. 6. K. R. Seeja, M. Zareapoor, "FraudMiner: A Novel Credit Card Fraud Detection Model Based on Frequent Itemset Mining", The Scientific World Journal, Hindawi Publishing Corporation, vol. 2014, pp. 1-10, 2014. 7. S. Patil, H. Somavanshi, J. Gaikwad, A. Deshmane, R. Badgujar, "Credit Card Fraud Detection Using Decision Tree Induction Algorithm", International Journal of Computer Science and Mobile Computing (IJCSMC), vol. 4, no. 4, pp. 92-95, 2015, ISSN 2320-088X. 8. E. Duman, A. Buyukkaya, I. Elikucuk, "A novel and successful credit card fraud detection system implemented in a Turkish bank", Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on, pp. 162-171, 2013.
  • 22. 9. A. C. Bahnsen, A. Stojanovic, D. Aouada, B. Ottersten, "Improving credit card fraud detection with calibrated probabilities", Proceedings of the 2014 SIAM International Conference on Data Mining, pp. 677-685, 2014. 10. A. Y. Ng, M. I. Jordan, "On discriminative vs. generative classifiers: A comparison of logistic regression and naive Bayes", Advances in Neural Information Processing Systems, vol. 2, pp. 841-848, 2002. 11. S. Maes, K. Tuyls, B. Vanschoenwinkel, B. Manderick, "Credit card fraud detection using Bayesian and neural networks", Proceedings of the 1st International NAISO Congress on Neuro Fuzzy Technologies, pp. 261-270, 2002. 12. A. Shen, R. Tong, Y. Deng, "Application of classification models on credit card fraud detection", Service Systems and Service Management, 2007 International Conference on, pp. 1-4, 2007. 13. S. Bhattacharyya, S. Jha, K. Tharakunnel, J. C. Westland, "Data mining for credit card fraud: A comparative study", Decision Support Systems, vol. 50, no. 3, pp. 602-613, 2011. 14. Y. Sahin, E. Duman, "Detecting credit card fraud by ANN and logistic regression", Innovations in Intelligent Systems and Applications (INISTA), 2011 International Symposium on, pp. 315-319, 2011. 15. K. Chaudhary, B. Mallick, "Credit Card Fraud: The study of its impact and detection techniques", International Journal of Computer Science and Network (IJCSN), vol. 1, no. 4, pp. 31-35, 2012, ISSN 2277-5420. 16. T. P. Bhatla, V. Prabhu, A. Dua, Understanding credit card frauds. Cards Business Review# 2003-1, Tata Consultancy Services, 2003. 17. U. S. Credit & Debit Cards 2015, David Robertson, 2015. 18. S. Stolfo, D. W. Fan, W. Lee, A. Prodromidis, P. Chan, "Credit card fraud detection using meta-learning: Issues and initial results", AAAI-97 Workshop on Fraud Detection and Risk Management, 1997.