NASSCOM: How Can You Identify Fraud in Fintech Lending Using Deep Learning

NASSCOM Presentation

Published in: Education

  1. How Can You Identify Fraud in Fintech Lending Using Deep Learning?
     Ratnakar Pandey, Head of India Analytics & Data Science, Kabbage
     16th October 2018
     Disclaimer: The views expressed here are solely those of the presenter in his private capacity.
  2. Legal Disclaimer
     “This series is solely for educational purposes only. This series does not intend to be complete or universal in nature and cannot be considered as an alternative to an expert opinion on any specific issue. The series is based on views of the speaker/facilitator and NASSCOM does not recommend/endorse the view-points per se and is primarily a medium to disseminate knowledge for the greater good of the Products ecosystem. Any attendee who opens or otherwise accesses the content of the series at any point of time, does so at their own risk and acknowledges and agrees that neither NASSCOM and nor its members and affiliates will not be responsible for any loss or damage suffered by any person. The content of this webinar series is solely for the purpose of NASSCOM members and NASSCOM digital channels and any copying/distribution is liable for legal action.”
  4. Outline
     • Introduction: About Fintech, About Kabbage
     • Frauds in Fintech Lending: Drivers, Modus Operandi
     • Need for Deep Learning: Existing Methods, Why Deep Learning?
     • Suggested Deep Learning Application Areas: Supervised, Unsupervised
     • Demo of Multilayer Perceptron (MLP) Classification: Case, Approach and Performance
  5. Fintech is an Integral Part of Our Life Now
     • $24.7 B invested in global fintech companies in 2016
     • 1,076 deals in global fintech companies in 2016
     • 50.2% of global customers have done business with a fintech
     • 20% expected ROI on fintech projects
     • 20+ global fintech unicorns
     • 10K+ global fintech companies
     Types of Fintech:
     • Alternative Lending: Kabbage, LendingClub, Prosper, Zopa
     • Payment / Billing Tech: Stripe, Paytm, Adyen, Ant Financial, Square
     • Personal Finance / Asset Management: Credit Karma, Bankrate, NerdWallet
     • Robo Advisory: Wealthfront, Betterment, NerdWallet
     • Blockchain: Abra, 21, Coinbase, Ethereum
     Sources: KPMG, The Pulse of Fintech Q4 2016 | Capgemini World Fintech Report 2017 | PwC Global Fintech Report 2017 | www.forbes.com
  6. Kabbage is Blazing a Trail in Big Data & Fintech
     Kabbage is more than a lender for small businesses; our data and technology platform is now being used as a fully branded product by other lenders, and our products are expanding.
     We’ve received numerous awards and recognition, including:
     • CNBC Disruptors 50 list
     • Inc. 500 list for three consecutive years
     • The Forbes Most Promising Companies list, twice
     • Glassdoor’s 2017 Best Places to Work list
  7. Fraud Drivers: Superfast Decision Making and Faceless Channels
     • Decisioning within a few minutes
     • Applications on web and mobile
     • May have higher exposure to thin-file and new-to-credit applicants
     • More prone to invisible-window applications
     • Unconventional and evolving data sources
     Note: Even with these challenges, the fraud rate in the industry is typically less than 20 bps for more data-savvy lenders.
  8. How Can a Lending Fraud Be Classified?
     • First party. Who commits: the borrower. How: First Payment Default, Bust Out, Synthetic Identity, Stacking etc. Victim: lender.
     • Second party. Who commits: someone known to the borrower (lead generator, friends, family, employees etc.). How: Friendly Fraud, where someone misuses the trust. Victim: borrower, lender.
     • Third party. Who commits: someone unknown to the borrower. How: fraud rings, Identity Theft, Account Takeover. Victim: borrower, lender.
  9. Sample Modus Operandi
     • Stolen identity
     • Synthetic identity
     • May replicate best customer (prime and super prime)
     • Falsified info
     • No willingness to pay
     • Acquire multiple loans in a short window (invisible window)
     • May provide all info correctly
     • More likely to be on the higher side of the risk spectrum
     • No or low willingness to pay
     • Mimic good payment behavior for a significant time
     • Bust out when gains are highest
     Common fraud-related terms: http://www.cpp.co.uk/helpful-info/fraud-glossary-of-terms
  10. Current Situation: Heuristics and Regression Driven Approaches
      Intuitive:
      • Manual reviews
      • Expert driven
      • Gut feeling
      Heuristics:
      • Thumb rules
      • Driven by past experience
      • Quick decision making
      Statistical:
      • Control / confidence limits
      • Outlier detection / deviation from norm
      • Decision tree, regression, time series
  11. Why Go Deep? Explosion of Features and Data Sources
      10,000+ features from: Unstructured, Transactional, Social, Device & IP, Third Parties, Bureau
      • Uncover patterns that are hard to detect with traditional techniques when the incidence rate is low
      • Find latent features (super variables) without significant manual feature engineering
      • Real-time fraud detection and self-learning models using streaming data (Kafka, MapR)
      • Ensure consistent customer experience and regulatory compliance
      • Higher operational efficiency
      • Big data and data exhaust handling capabilities
  13. Find Anomalies: Autoencoder
      • Traditional techniques based on density or distance work better with linearly separable data
      • Stacked Autoencoders (SAE) and Deep Belief Networks (DBN) make no assumptions about the distribution of the data and work better on non-linearly separable data
      • They are unsupervised learning algorithms for feature learning, feature reduction and outlier detection
      • The input vectors are used as the target output vectors, and the reconstruction error is computed
      • Data points with a higher reconstruction error (MSE) are more likely to be outliers
      • Helps in detecting different modus operandi of fraudsters
      Use Case: Deployment of Autoencoder for Credit Card Fraud Detection
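
The reconstruction-error idea on this slide can be sketched in a few lines. This is only an illustration, assuming NumPy is available: a linear PCA projection stands in for a trained stacked autoencoder (it is not the deck's model), and the data, the function names and the 1-unit bottleneck are all made up for the example.

```python
import numpy as np

def fit_linear_encoder(X_train, n_hidden):
    """Fit a linear bottleneck (PCA) on data assumed to be mostly normal."""
    mean = X_train.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(X_train - mean, full_matrices=False)
    return mean, vt[:n_hidden]

def reconstruction_mse(X, mean, components):
    """Encode, decode, and return the per-row mean squared error."""
    codes = (X - mean) @ components.T      # encode to n_hidden dimensions
    X_hat = codes @ components + mean      # decode back to input space
    return ((X - X_hat) ** 2).mean(axis=1)

# Synthetic data: normal rows lie close to a line, one row sits far off it.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
normal = t * np.array([[1.0, 2.0, 0.0]]) + 0.01 * rng.normal(size=(200, 3))
odd = np.array([[0.0, 0.0, 5.0]])

mean, comps = fit_linear_encoder(normal, n_hidden=1)
scores = reconstruction_mse(np.vstack([normal, odd]), mean, comps)
most_suspicious = int(np.argmax(scores))   # index of the worst-reconstructed row
```

Rows are then ranked by `scores`; where to draw the cut-off is a business decision that the sketch does not address.
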
  14. Sequence Analysis: Recurrent Neural Network (LSTM)
      • Recurrent Neural Networks (RNNs) are a type of neural network used for sequential data analysis, where inputs are not independent and need not be of fixed length. Unlike feed-forward networks, they carry a hidden state from one step to the next.
      • Inputs depend on each other along the time dimension. In other words, what happens at time ‘t’ may depend on what happened at time ‘t-1’, ‘t-2’ and so on.
      • These are also called ‘memory’ networks, because previous inputs and states persist in the model and inform the analysis of later steps. They can capture both short-term and long-term time dependence.
      • Long Short Term Memory (LSTM) is one of the most popular deep networks for sequential data analysis.
      • More on LSTM here: https://datafai.com/2018/03/08/recurrent-neural-network-rnn-in-python/
      Use Case: Use RNN (LSTM) to analyse web behaviour and logs to detect fraudulent behavior
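
To make the dependence on ‘t-1’ concrete, here is a minimal sketch of a plain (Elman-style) recurrence with a single scalar state. It is not an LSTM, which adds gates on top of this idea, and the weights and input values are hypothetical.

```python
import math

def rnn_final_state(sequence, w_x=0.8, w_h=0.5, b=0.0):
    """Run a scalar recurrence over the sequence and return the last state."""
    h = 0.0  # initial memory
    for x_t in sequence:
        # The new state mixes the current input with the previous state.
        h = math.tanh(w_x * x_t + w_h * h + b)
    return h

early_spike = [3.0, 0.1, 0.1, 0.1]   # hypothetical per-step activity values
late_spike = [0.1, 0.1, 0.1, 3.0]    # same values, different order
short_session = [0.1, 3.0]           # sequences may differ in length

state_early = rnn_final_state(early_spike)
state_late = rnn_final_state(late_spike)
state_short = rnn_final_state(short_session)
```

The two four-step sequences contain the same values, yet they end in different states, because the order of events matters. The early spike has largely faded from memory by the last step, which is the kind of long-range forgetting that LSTM gates are designed to reduce.
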
  15. Find Networks: Clique and Link Graphs
      Form Network → Find Commonalities → Detect Fraudulent Cases
      • Use a variety of attributes (on-us / off-us) to build linkages between known bad customers and other customers with unknown status
      • The larger the network, the easier the detection, and vice versa
      • Overlap networks using enumerative approaches and find commonalities
      • Use graph transduction (t-SNE) to detect potential fraudulent cases by doing peer group (archetype) analysis to separate routine behavior from suspicious behavior: “birds of a feather flock together”
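
The "form network" step can be sketched as finding connected components among applicants that share an attribute value. The union-find approach, the attribute names (`device`, `phone`) and the applicant IDs are assumptions made for illustration; the deck does not say how its networks are built.

```python
from collections import defaultdict

def link_applicants(applications):
    """Group applicants that share any attribute value (connected components)."""
    parent = {app_id: app_id for app_id in applications}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving keeps trees shallow
            a = parent[a]
        return a

    first_seen = {}  # (attribute, value) -> first applicant seen with it
    for app_id, attrs in applications.items():
        for key in attrs.items():
            if key in first_seen:
                parent[find(app_id)] = find(first_seen[key])
            else:
                first_seen[key] = app_id

    groups = defaultdict(set)
    for app_id in applications:
        groups[find(app_id)].add(app_id)
    return list(groups.values())

apps = {
    "A1": {"device": "d-17", "phone": "555-0101"},
    "A2": {"device": "d-17", "phone": "555-0102"},
    "A3": {"device": "d-42", "phone": "555-0102"},
    "A4": {"device": "d-99", "phone": "555-0199"},
}
known_bad = {"A1"}
networks = link_applicants(apps)
# Applicants of unknown status that are linked to a known bad customer.
suspects = set().union(*(g for g in networks if g & known_bad)) - known_bad
```

Here A3 never shares anything directly with the known bad A1, but is reached through A2, which is why the network is formed before looking for commonalities.
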
  17. Real Time Detection: Convolutional Neural Network (CNN)
      • Convolutional Neural Networks (CNNs) are particularly useful for spatial data analysis, image recognition, computer vision, natural language processing, signal processing and a variety of other purposes. They are biologically motivated by how neurons in the visual cortex respond to visual stimuli.
      • What makes CNNs much more powerful than other feed-forward networks for image recognition is that they do not require as much human intervention or as many parameters as networks such as MLPs. This comes mainly from local connectivity and shared weights, with neurons arranged in three dimensions.
      • More on CNN here: https://datafai.com/2018/02/25/deep-learning-convolution-neural-network-cnn-in-python/
      Use Case: CNN for real time classification
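
The parameter saving can be illustrated with a one-dimensional convolution, in which a single small kernel is reused at every position. This is a toy sketch with hypothetical numbers, not the deck's real-time classifier; as in most deep learning libraries, the kernel is applied without flipping.

```python
def conv1d_valid(signal, kernel):
    """Slide one shared kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(w * x for w, x in zip(kernel, signal[i:i + k]))
        for i in range(len(signal) - k + 1)
    ]

amounts = [10, 10, 10, 90, 10, 10]   # hypothetical transaction amounts
edge_kernel = [-1, 0, 1]             # responds to a jump between neighbors
feature_map = conv1d_valid(amounts, edge_kernel)

shared_weights = len(edge_kernel)                  # weights in the conv layer
dense_weights = len(amounts) * len(feature_map)    # fully connected equivalent
```

The convolution needs 3 weights to produce the four outputs, where a fully connected layer mapping the same six inputs to four outputs would need 24.
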
  18. Labeled Data: Multilayer Perceptron (MLP)
      • These are the most basic networks and feed the inputs forward to create the output. They consist of an input layer, an output layer, and many interconnected hidden layers and neurons between the two.
      • They can be used for any supervised regression or classification problem.
      • Because they generally use a non-linear activation function such as ReLU or tanh, they are well suited to non-linear problems. Training minimizes a loss (the difference between the true output and the computed output) such as Mean Squared Error (MSE) or log loss.
      • We will do an MLP demo on credit card fraud data.
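
A forward pass through a very small MLP shows where the activation and the loss each come in. The weights, the two input features and the layer sizes are hypothetical and chosen only so the arithmetic is easy to follow.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(w . x + b) for each neuron."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def log_loss(y_true, p):
    """Binary cross-entropy for a single example."""
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1 - p))

x = [0.5, -1.0]   # two standardized input features
hidden = dense(x, [[1.0, -1.0], [-2.0, 0.5]], [0.0, 0.0], relu)
prob = dense(hidden, [[1.0, 1.0]], [-1.0], sigmoid)[0]   # P(fraud)

loss_if_fraud = log_loss(1, prob)      # log loss when the true label is fraud
squared_error = (1 - prob) ** 2        # squared error for the same example
```

The activations (`relu`, `sigmoid`) shape the prediction, while the loss is computed afterwards from the prediction and the true label.
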
  19. MLP Demo: Case Details
      • Anonymized credit card transaction data from European customers
      • 30 features (28 anonymized, time elapsed, transaction amount)
      • Label: fraud or normal transaction
      • 17 bps incidence rate for fraudulent transactions
      • 284,807 total transactions in the data
      Sources: http://mlg.ulb.ac.be | https://www.kaggle.com/dalpozz/creditcardfraud
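
As a quick check of scale, the incidence rate and the transaction count on this slide imply only a few hundred fraud cases. The figures below are derived from the rounded 17 bps value, so they are approximations rather than the dataset's exact counts.

```python
total_transactions = 284_807      # from the slide
incidence_bps = 17                # from the slide; 1 bps = 0.01% = 1/10,000

incidence_rate = incidence_bps / 10_000
approx_fraud_count = round(total_transactions * incidence_rate)
transactions_per_fraud = round(1 / incidence_rate)
```

That is roughly 484 fraudulent transactions, or about one in every 588.
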
  20. MLP Demo: Tools and Techniques Used
      • Python 2.7 or 3.6
      • Keras 2.0.2
      • TensorFlow 1.0.1
  21. MLP Demo: Traditional Modeling Techniques
      Process: manual feature engineering
      • After variable treatments, drop variables with little or no explanatory power, judged by Weight of Evidence (WOE), Information Value (IV) and distribution
      • Look at WOE to create bins etc.
      [Figures: WOE; density distribution]
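
For readers unfamiliar with WOE and IV, here is a minimal sketch of how they are commonly computed for one binned variable. It assumes the convention WOE = ln(share of goods / share of bads); some texts use the opposite sign. The counts are hypothetical, and bins with zero goods or zero bads would need smoothing that is not shown.

```python
import math

def woe_iv(bins):
    """bins: {label: (n_good, n_bad)} -> ({label: woe}, information_value)."""
    total_good = sum(g for g, _ in bins.values())
    total_bad = sum(b for _, b in bins.values())
    woe, iv = {}, 0.0
    for label, (good, bad) in bins.items():
        pct_good = good / total_good   # share of all goods in this bin
        pct_bad = bad / total_bad      # share of all bads in this bin
        woe[label] = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe[label]
    return woe, iv

# Hypothetical counts for one binned variable (not from the demo data).
amount_bins = {"low": (500, 10), "mid": (400, 20), "high": (100, 70)}
woe, iv = woe_iv(amount_bins)
```

Under this convention a negative WOE marks a bin where bads are over-represented, and a variable whose IV is close to zero is a candidate for dropping.
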
  22. MLP Demo: Network Training
      Little or no manual feature engineering:
      • No over- or under-sampling
      • No variables dropped
      • Only standardization of features done
      • 75% training / 25% validation
      • No manual binning
      Fitted network:
      • Multilayer Perceptron with three hidden layers
        o Activation function = sigmoid
        o Number of neurons = 512 in the input layer
        o Each subsequent layer has half the neurons
        o Cost function = log loss
        o Optimizer = Adam
        o Epochs = 5
        o Dropout rate = 30%
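
The size of this network can be worked out from the slide's description. The sketch assumes that the 512 neurons form the first of the three hidden layers, that the 30 features are the inputs, and that there is a single output neuron; none of these readings is confirmed by the deck.

```python
n_features = 30                  # from the case details slide
first_width, n_hidden = 512, 3   # from this slide (512 read as first hidden layer)

# Each hidden layer has half the neurons of the one before it.
hidden_widths = [first_width // 2 ** i for i in range(n_hidden)]
layer_sizes = [n_features] + hidden_widths + [1]   # single output assumed

# A dense layer has (inputs x outputs) weights plus one bias per output.
params_per_layer = [
    n_in * n_out + n_out
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
]
total_params = sum(params_per_layer)

n_rows = 284_807
n_train = int(n_rows * 0.75)     # 75% training
n_valid = n_rows - n_train       # 25% validation
```

Under these assumptions the network has about 180 thousand trainable parameters, fitted on roughly 214 thousand training rows. Dropout adds no parameters.
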
  23. MLP Demo: Performance Summary
      Metric                        Value
      Accuracy Score                99.9%
      Log loss                      0.003
      Precision Score               77%
      Recall Score                  75%
      Area Under the Curve (AUC)    87.4%
      F-Score                       76.5%
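
Two small calculations help in reading this table. They use only the rounded figures from the slides, so the results are approximate; the all-normal baseline is a standard sanity check and not something the deck reports.

```python
precision, recall = 0.77, 0.75     # from the slide (rounded)
incidence_rate = 17 / 10_000       # 17 bps, from the case details

# F-score (F1) is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

# A model that labels every transaction "normal" is wrong only on frauds.
baseline_accuracy = 1 - incidence_rate
```

The F1 from the rounded inputs comes to about 76%, close to the reported 76.5%; the small gap is plausibly due to rounding of precision and recall. The baseline accuracy is about 99.83%, so with classes this imbalanced, precision and recall say much more than the 99.9% accuracy.
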
  24. MLP Demo: Hyperparameter Optimization
      • Epochs = [5, 10, 15, 20, 25, …]
      • Batch size = [5, 10, 20, 30, 40, …]
      • Optimizer = [‘SGD’, ‘Adam’, ‘RMSprop’, …]
      • Learning rate = [0.01, 0.05, 0.1, 0.2, …]
      • Momentum = [0.2, 0.4, 0.6, …]
      • Weight initialization = [‘Uniform’, ‘Normal’, …]
      • Activation function = [‘relu’, ‘sigmoid’, ‘tanh’, ‘softmax’, …]
      • Dropout rate = [0.0, 0.2, 0.4, 0.5, …]
      • Neurons = [5, 10, 20, 30, 40, …]
      Python scikit-learn’s grid search and design of experiments (screening designs, fractional designs) need to be combined with intuition and expertise to arrive at the best network!
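
Counting the combinations shows why an exhaustive grid alone is impractical. The sketch uses only the values printed on the slide and leaves out whatever the "…" entries stand for. The random sampling at the end is a simple stand-in for a screening step, not the fractional designs the slide mentions, and the dictionary keys are illustrative names.

```python
import itertools
import random

# Only the values printed on the slide; its trailing "..." entries are left out.
grid = {
    "epochs": [5, 10, 15, 20, 25],
    "batch_size": [5, 10, 20, 30, 40],
    "optimizer": ["SGD", "Adam", "RMSprop"],
    "learning_rate": [0.01, 0.05, 0.1, 0.2],
    "momentum": [0.2, 0.4, 0.6],
    "init": ["Uniform", "Normal"],
    "activation": ["relu", "sigmoid", "tanh", "softmax"],
    "dropout": [0.0, 0.2, 0.4, 0.5],
    "neurons": [5, 10, 20, 30, 40],
}

# Size of the full grid: the product of the number of values per parameter.
n_combinations = 1
for values in grid.values():
    n_combinations *= len(values)

def sample_configs(grid, n, seed=0):
    """Draw n random configurations instead of enumerating the full grid."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in grid.items()} for _ in range(n)]

screening_runs = sample_configs(grid, n=20)
# First point of the exhaustive grid, for comparison.
first_full_grid_point = dict(zip(grid, next(itertools.product(*grid.values()))))
```

Even this truncated grid has 144,000 combinations, each requiring a full training run, which is why the slide pairs grid search with experimental design and judgment.
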
  25. Thank You!
      Christopher McDougall: “Every morning in Africa, a gazelle wakes up. It knows it must outrun the fastest lion or it will be killed. Every morning in Africa, a lion wakes up. It knows it must run faster than the slowest gazelle, or it will starve. It doesn't matter whether you're the lion or a gazelle: when the sun comes up, you'd better be running.”
      Working in fraud analytics is the same way.
  26. Next Webinar: Go-to-market strategy / Planning
      Date: 2nd Nov 2018
      Speaker: Ashok Munirathinam, Sr. Director, SAP Cloud Platform, SAP Asia Pacific & Japan
      Queries: Ankita@nasscom.in