SlideShare a Scribd company logo
1 of 28
Active Learning for Fraud
Prevention
Agenda Introduction
Fraud Prevention
Algorithm
Experiments
Conclusion
©2016 PayPal Inc. Confidential and proprietary.
INTRODUCTION
© 2016 PayPal Inc. Confidential and proprietary.
About Me
• Software Engineer/Data Scientist/ML Researcher
• Ph. D Computer Science
• Research in Face Recognition, Phishing/Spam, Fraud Prevention
4
developers
+2.5
MILLIONpayments/year
4.9
BILLION
payments/
second at
peak
~30
0
active customer
accounts
184
M
petabytes of
data
42
database
calls/ quarter
4.5
T
PayPal operates
one of the largest
PRIVATE
CLOUDS
in the world
We have transformed
core business
processes into robust
SERVICE-BASED
PLATFORMS
The power of
our platform
Our technology transformation enables us to:
• Process payments at tremendous scale
• Accelerate the innovation of new products
• Engage world-class developers & technologists
About PayPal
FRAUD PREVENTION
Fraud Prevention @ PayPal
Robust feature engineering, machine
learning and statistical models
Highly scalable and multi-layered
infrastructure software
Superior team of data scientists,
researchers, financial and intelligence
analysts
Images source:
Fraud Prevention @ PayPal
• Employs advanced machine learning and statistical models to flag
fraudulent behavior up-front
• More sophisticated algorithms after transaction is complete
Transaction Level
• Monitor account level activity to identify abusive behavior
• Abusive pattern include frequent payments, suspicious profile
changes
Account Level
• Monitor account-to-account interaction
• Frequent transfer of money from several accounts to one central
account
Network Level
Fraud Prevention – What are we up against?
Fraudsters are becoming increasingly smarter and adaptive
Need cost-effective solutions that can model complex attack
patterns not previously observed
Need scalable and computationally efficient prediction models
© 2016 PayPal Inc. Confidential and proprietary.
Fraud Prevention – What are we up against?
• Much harder to get performance lift on
our flagship models
• Need to re-look at all aspects of
traditional model building
• Need out-of-the-box thinking
10
Area we are missing (AUC 0.96)
© 2016 PayPal Inc. Confidential and proprietary.
Fraud Prevention – What can we do to build better models?
11
feature1 …. featureN ……… Target
(Label)
d1
d2
…
dM
…..
Better
feature
Better
labeling
Advanced ML
Algorithms
Bigger
better data
ALGORITHM – ACTIVE LEARNING
© 2016 PayPal Inc. Confidential and proprietary.
Active Learning – What is it?
• Supervised learning algorithms require
data to be labeled
• Labelling is difficult, time-consuming
and expensive : Active Learning to the
rescue
• Idea – ML Algorithm can achieve better
accuracy if it is allowed to “choose the
data” from which it learns*
• Overcome labelling bottleneck by asking
queries (unlabeled data) to be labeled by
human
13
Unlabeled
Data
Labeled Data
Human Annotator
Machine Learning Model
(Re)Build Model
Select Queries
Source*: Burr Settles
© 2016 PayPal Inc. Confidential and proprietary.
Active Learning – What is it?
• Scenarios
• Membership Query Synthesis – request labels for ‘any’
unlabeled instance in input space
• Stream-based Selective Sampling – unlabeled instance is drawn
one at a time & learner decides whether to discard or query
• Pool-based Sampling – instances are queried from a pool
according to informative-ness measure
14
© 2016 PayPal Inc. Confidential and proprietary.
Active Learning – What is it?
• Query Strategy Frameworks
• Uncertainty Sampling
• Query-By-Committee
• Expected Model Change
• Expected Error Reduction
• Variance Reduction
• Density Weighted Methods
15
© 2016 PayPal Inc. Confidential and proprietary.
Active Learning –Toy Example
16
Toy data – 400 instances Model using random sampling
70% accuracy
Model using active learning
Uncertainty sampling – 90% accuracy
© 2016 PayPal Inc. Confidential and proprietary.
Active Learning For Fraud Prevention – Why is it unique?
17
• Data is unbalanced
• Fraud labelling require trained experts. Can’t be outsourced
• Fraud labelling is time consuming
• Fraud labelling require more than just individual instances. Require before
& after transactions
• Fraud labelling require data from other entities (ex: IP address)
• Fraud labelling require aggregate data
• Fraud tag mature at different times (ex: chargeback) & not instantaneous
© 2016 PayPal Inc. Confidential and proprietary.
Active Learning For Fraud Prevention – High Level Framework
18
Labeled
Data
Create Bags
Deep Learning
Model
GBT Model
(Re)Build Models
Unlabeled
Data
Predict
Query By Committee
Human Expert
Create
Statistics
Active
Feature
Engineering
Simulate
Features
© 2016 PayPal Inc. Confidential and proprietary.
Modeling Algorithm – Deep Learning
19
Input Layer
Hidden Layers
Output Layer
• If a network has many layers of non-linearity, it is “deep”
• Need scalable platform
• Need lots of training data
© 2016 PayPal Inc. Confidential and proprietary.
Modeling Algorithm – Deep Learning
20
•NetworkTopology – Feed forward
•Key Parameters
• # of hidden layers
• # of neurons @ each hidden layer
• Regularization
• Activation function
© 2016 PayPal Inc. Confidential and proprietary.
Modeling Algorithm – Gradient BoostingTrees
21
• GBT = Gradient Descent + Boosting
• Fit an additive (ensemble) model in forward stage wise manner
• In each stage introduce a new model to compensate the shortcomings
of existing models
© 2016 PayPal Inc. Confidential and proprietary.
Modeling Algorithm – Gradient BoostingTrees
22
• Strengths
• No pre-processing required
• Robust
• Scalable
• Weaknesses
• Overfits (Need to find proper stopping point)
• Sensitive to noise
• Key Parameters
• # of trees
• Max depth
• Max observations
• Learning rate
EXPERIMENTS
© 2016 PayPal Inc. Confidential and proprietary.
Datasets
24
• Training Data
• 1 year
• 11 million transactions (1 million for active labelling)
• Test Data
• 4 months
• 4 million transactions
• # of features
• 500 - 600
© 2016 PayPal Inc. Confidential and proprietary.
Tools
25
• H2O
• Open source
• Scalable
• Robust
• Deep Learning & GBM implementations
• R
• Open source
• Active learning package
© 2016 PayPal Inc. Confidential and proprietary. 26
# of instances queried AUC (*weighted)
0 0.960
1000 0.961
10000 0.963
50000 0.971
100000 0.975
500000 0.977
1000000 0.979
Early Results – Active Learning Shows Promise…
CONCLUSIONS
© 2016 PayPal Inc. Confidential and proprietary.
Conclusions
28
• Deep learning & GBT has shown tremendous performance for fraud
detection.
• Active learning shows promise in improving performance of these
champion models
• Active learning also significantly reduce our labelling cost

More Related Content

What's hot

Deep Learning: Application & Opportunity
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & OpportunityiTrain
 
Autoencoders Tutorial | Autoencoders In Deep Learning | Tensorflow Training |...
Autoencoders Tutorial | Autoencoders In Deep Learning | Tensorflow Training |...Autoencoders Tutorial | Autoencoders In Deep Learning | Tensorflow Training |...
Autoencoders Tutorial | Autoencoders In Deep Learning | Tensorflow Training |...Edureka!
 
AI Chip Trends and Forecast
AI Chip Trends and ForecastAI Chip Trends and Forecast
AI Chip Trends and ForecastCastLabKAIST
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Appsilon Data Science
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsKrishnaram Kenthapadi
 
byteLAKE and Lenovo presenting Federated Learning at MWC 2019
byteLAKE and Lenovo presenting Federated Learning at MWC 2019byteLAKE and Lenovo presenting Federated Learning at MWC 2019
byteLAKE and Lenovo presenting Federated Learning at MWC 2019byteLAKE
 
Pruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inferencePruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inferenceKaushalya Madhawa
 
FACE RECOGNITION USING NEURAL NETWORK
FACE RECOGNITION USING NEURAL NETWORKFACE RECOGNITION USING NEURAL NETWORK
FACE RECOGNITION USING NEURAL NETWORKcodebangla
 
Simple Introduction to AutoEncoder
Simple Introduction to AutoEncoderSimple Introduction to AutoEncoder
Simple Introduction to AutoEncoderJun Lang
 
Big Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network ApproachBig Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network ApproachAndry Alamsyah
 
Anomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleAnomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleImpetus Technologies
 
Artificial neural network model & hidden layers in multilayer artificial neur...
Artificial neural network model & hidden layers in multilayer artificial neur...Artificial neural network model & hidden layers in multilayer artificial neur...
Artificial neural network model & hidden layers in multilayer artificial neur...Muhammad Ishaq
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationImpetus Technologies
 
Analytics for large-scale time series and event data
Analytics for large-scale time series and event dataAnalytics for large-scale time series and event data
Analytics for large-scale time series and event dataAnodot
 
Deep Neural Networks for Machine Learning
Deep Neural Networks for Machine LearningDeep Neural Networks for Machine Learning
Deep Neural Networks for Machine LearningJustin Beirold
 
Paillier-ElGamal cryptosystem presentation
Paillier-ElGamal cryptosystem presentationPaillier-ElGamal cryptosystem presentation
Paillier-ElGamal cryptosystem presentationGauthamSK4
 
Neo4J : Introduction to Graph Database
Neo4J : Introduction to Graph DatabaseNeo4J : Introduction to Graph Database
Neo4J : Introduction to Graph DatabaseMindfire Solutions
 
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Simplilearn
 
Master's Thesis Presentation
Master's Thesis PresentationMaster's Thesis Presentation
Master's Thesis PresentationWajdi Khattel
 

What's hot (20)

Deep Learning: Application & Opportunity
Deep Learning: Application & OpportunityDeep Learning: Application & Opportunity
Deep Learning: Application & Opportunity
 
Autoencoders Tutorial | Autoencoders In Deep Learning | Tensorflow Training |...
Autoencoders Tutorial | Autoencoders In Deep Learning | Tensorflow Training |...Autoencoders Tutorial | Autoencoders In Deep Learning | Tensorflow Training |...
Autoencoders Tutorial | Autoencoders In Deep Learning | Tensorflow Training |...
 
Cnn
CnnCnn
Cnn
 
AI Chip Trends and Forecast
AI Chip Trends and ForecastAI Chip Trends and Forecast
AI Chip Trends and Forecast
 
Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)Introduction to Generative Adversarial Networks (GANs)
Introduction to Generative Adversarial Networks (GANs)
 
Fairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML SystemsFairness and Privacy in AI/ML Systems
Fairness and Privacy in AI/ML Systems
 
byteLAKE and Lenovo presenting Federated Learning at MWC 2019
byteLAKE and Lenovo presenting Federated Learning at MWC 2019byteLAKE and Lenovo presenting Federated Learning at MWC 2019
byteLAKE and Lenovo presenting Federated Learning at MWC 2019
 
Pruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inferencePruning convolutional neural networks for resource efficient inference
Pruning convolutional neural networks for resource efficient inference
 
FACE RECOGNITION USING NEURAL NETWORK
FACE RECOGNITION USING NEURAL NETWORKFACE RECOGNITION USING NEURAL NETWORK
FACE RECOGNITION USING NEURAL NETWORK
 
Simple Introduction to AutoEncoder
Simple Introduction to AutoEncoderSimple Introduction to AutoEncoder
Simple Introduction to AutoEncoder
 
Big Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network ApproachBig Data Analytics : A Social Network Approach
Big Data Analytics : A Social Network Approach
 
Anomaly detection with machine learning at scale
Anomaly detection with machine learning at scaleAnomaly detection with machine learning at scale
Anomaly detection with machine learning at scale
 
Artificial neural network model & hidden layers in multilayer artificial neur...
Artificial neural network model & hidden layers in multilayer artificial neur...Artificial neural network model & hidden layers in multilayer artificial neur...
Artificial neural network model & hidden layers in multilayer artificial neur...
 
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live ImplementationAnomaly Detection - Real World Scenarios, Approaches and Live Implementation
Anomaly Detection - Real World Scenarios, Approaches and Live Implementation
 
Analytics for large-scale time series and event data
Analytics for large-scale time series and event dataAnalytics for large-scale time series and event data
Analytics for large-scale time series and event data
 
Deep Neural Networks for Machine Learning
Deep Neural Networks for Machine LearningDeep Neural Networks for Machine Learning
Deep Neural Networks for Machine Learning
 
Paillier-ElGamal cryptosystem presentation
Paillier-ElGamal cryptosystem presentationPaillier-ElGamal cryptosystem presentation
Paillier-ElGamal cryptosystem presentation
 
Neo4J : Introduction to Graph Database
Neo4J : Introduction to Graph DatabaseNeo4J : Introduction to Graph Database
Neo4J : Introduction to Graph Database
 
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
Backpropagation And Gradient Descent In Neural Networks | Neural Network Tuto...
 
Master's Thesis Presentation
Master's Thesis PresentationMaster's Thesis Presentation
Master's Thesis Presentation
 

Viewers also liked

Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on HadoopTyler Mitchell
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseDataWorks Summit/Hadoop Summit
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachDataWorks Summit
 
Presentation on fraud prevention, detection & control
Presentation on fraud prevention, detection & controlPresentation on fraud prevention, detection & control
Presentation on fraud prevention, detection & controlDominic Sroda Korkoryi
 
Automated data analysis with Python
Automated data analysis with PythonAutomated data analysis with Python
Automated data analysis with PythonGramener
 
SAS Fraud Framework for Insurance
SAS Fraud Framework for InsuranceSAS Fraud Framework for Insurance
SAS Fraud Framework for Insurancestuartdrose
 
Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...
Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...
Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...DataWorks Summit/Hadoop Summit
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleDataWorks Summit/Hadoop Summit
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Precisely
 

Viewers also liked (20)

Solving Performance Problems on Hadoop
Solving Performance Problems on HadoopSolving Performance Problems on Hadoop
Solving Performance Problems on Hadoop
 
A New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouseA New "Sparkitecture" for modernizing your data warehouse
A New "Sparkitecture" for modernizing your data warehouse
 
Analysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data AnalyticsAnalysis of Major Trends in Big Data Analytics
Analysis of Major Trends in Big Data Analytics
 
Effective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant ClustersEffective Spark on Multi-Tenant Clusters
Effective Spark on Multi-Tenant Clusters
 
SQL and Search with Spark in your browser
SQL and Search with Spark in your browserSQL and Search with Spark in your browser
SQL and Search with Spark in your browser
 
Producing Spark on YARN for ETL
Producing Spark on YARN for ETLProducing Spark on YARN for ETL
Producing Spark on YARN for ETL
 
Big Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud DetectionBig Data Application Architectures - Fraud Detection
Big Data Application Architectures - Fraud Detection
 
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run ApproachEvolution of Big Data at Intel - Crawl, Walk and Run Approach
Evolution of Big Data at Intel - Crawl, Walk and Run Approach
 
Comparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBaseComparison of Transactional Libraries for HBase
Comparison of Transactional Libraries for HBase
 
Presentation on fraud prevention, detection & control
Presentation on fraud prevention, detection & controlPresentation on fraud prevention, detection & control
Presentation on fraud prevention, detection & control
 
Automated data analysis with Python
Automated data analysis with PythonAutomated data analysis with Python
Automated data analysis with Python
 
PowerShell Slides
PowerShell SlidesPowerShell Slides
PowerShell Slides
 
Enterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on HadoopEnterprise Grade Streaming under 2ms on Hadoop
Enterprise Grade Streaming under 2ms on Hadoop
 
SAS Fraud Framework for Insurance
SAS Fraud Framework for InsuranceSAS Fraud Framework for Insurance
SAS Fraud Framework for Insurance
 
Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...
Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...
Large Scale Health Telemetry and Analytics with MQTT, Hadoop and Machine Lear...
 
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo ScaleManaging Hadoop, HBase and Storm Clusters at Yahoo Scale
Managing Hadoop, HBase and Storm Clusters at Yahoo Scale
 
YARN Federation
YARN Federation YARN Federation
YARN Federation
 
Data Preparation of Data Science
Data Preparation of Data ScienceData Preparation of Data Science
Data Preparation of Data Science
 
Big Data Ready Enterprise
Big Data Ready Enterprise Big Data Ready Enterprise
Big Data Ready Enterprise
 
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
Use Cases from Batch to Streaming, MapReduce to Spark, Mainframe to Cloud: To...
 

Similar to Active Learning for Fraud Prevention

Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...
Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...
Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...DataWorks Summit
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceData Science Milan
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...DATAVERSITY
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning ModelsTash Bickley
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and MLQuantUniversity
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Sri Ambati
 
Lessons learned
Lessons learnedLessons learned
Lessons learnedhexgnu
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningYogesh Sharma
 
predictive analysis and usage in procurement ppt 2017
predictive analysis and usage in procurement  ppt 2017predictive analysis and usage in procurement  ppt 2017
predictive analysis and usage in procurement ppt 2017Prashant Bhatmule
 
Big data session five ( a )f
Big data session five ( a )fBig data session five ( a )f
Big data session five ( a )fmarukanda
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesDATAVERSITY
 
Customer value analysis of big data products
Customer value analysis of big data productsCustomer value analysis of big data products
Customer value analysis of big data productsVikas Sardana
 
Intro to Data warehousing lecture 16
Intro to Data warehousing   lecture 16Intro to Data warehousing   lecture 16
Intro to Data warehousing lecture 16AnwarrChaudary
 
An Agile Approach to Machine Learning
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine LearningRandy Shoup
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data assetBala Iyer
 
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum ShachamH2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum ShachamSri Ambati
 

Similar to Active Learning for Fraud Prevention (20)

Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...
Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...
Large Scale Graph Processing & Machine Learning Algorithms for Payment Fraud ...
 
Think Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial IntelligenceThink Big | Enterprise Artificial Intelligence
Think Big | Enterprise Artificial Intelligence
 
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
ADV Slides: What the Aspiring or New Data Scientist Needs to Know About the E...
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
Practical model management in the age of Data science and ML
Practical model management in the age of Data science and MLPractical model management in the age of Data science and ML
Practical model management in the age of Data science and ML
 
Knowledge Discovery
Knowledge DiscoveryKnowledge Discovery
Knowledge Discovery
 
Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...Building a Real-Time Security Application Using Log Data and Machine Learning...
Building a Real-Time Security Application Using Log Data and Machine Learning...
 
Lessons learned
Lessons learnedLessons learned
Lessons learned
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Hadoop and SAP BI
Hadoop and SAP BI   Hadoop and SAP BI
Hadoop and SAP BI
 
predictive analysis and usage in procurement ppt 2017
predictive analysis and usage in procurement  ppt 2017predictive analysis and usage in procurement  ppt 2017
predictive analysis and usage in procurement ppt 2017
 
Data mining
Data miningData mining
Data mining
 
Big data session five ( a )f
Big data session five ( a )fBig data session five ( a )f
Big data session five ( a )f
 
Assessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use CasesAssessing New Databases– Translytical Use Cases
Assessing New Databases– Translytical Use Cases
 
Customer value analysis of big data products
Customer value analysis of big data productsCustomer value analysis of big data products
Customer value analysis of big data products
 
Deep learning
Deep learningDeep learning
Deep learning
 
Intro to Data warehousing lecture 16
Intro to Data warehousing   lecture 16Intro to Data warehousing   lecture 16
Intro to Data warehousing lecture 16
 
An Agile Approach to Machine Learning
An Agile Approach to Machine LearningAn Agile Approach to Machine Learning
An Agile Approach to Machine Learning
 
Valuing the data asset
Valuing the data assetValuing the data asset
Valuing the data asset
 
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum ShachamH2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
H2O World - Data Science w/ Big Data in a Corporate Environment - Nachum Shacham
 

More from DataWorks Summit/Hadoop Summit

Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerDataWorks Summit/Hadoop Summit
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformDataWorks Summit/Hadoop Summit
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDataWorks Summit/Hadoop Summit
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...DataWorks Summit/Hadoop Summit
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...DataWorks Summit/Hadoop Summit
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLDataWorks Summit/Hadoop Summit
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)DataWorks Summit/Hadoop Summit
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...DataWorks Summit/Hadoop Summit
 

More from DataWorks Summit/Hadoop Summit (20)

Running Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in ProductionRunning Apache Spark & Apache Zeppelin in Production
Running Apache Spark & Apache Zeppelin in Production
 
State of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache ZeppelinState of Security: Apache Spark & Apache Zeppelin
State of Security: Apache Spark & Apache Zeppelin
 
Unleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache RangerUnleashing the Power of Apache Atlas with Apache Ranger
Unleashing the Power of Apache Atlas with Apache Ranger
 
Enabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science PlatformEnabling Digital Diagnostics with a Data Science Platform
Enabling Digital Diagnostics with a Data Science Platform
 
Revolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and ZeppelinRevolutionize Text Mining with Spark and Zeppelin
Revolutionize Text Mining with Spark and Zeppelin
 
Double Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSenseDouble Your Hadoop Performance with Hortonworks SmartSense
Double Your Hadoop Performance with Hortonworks SmartSense
 
Hadoop Crash Course
Hadoop Crash CourseHadoop Crash Course
Hadoop Crash Course
 
Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Apache Spark Crash Course
Apache Spark Crash CourseApache Spark Crash Course
Apache Spark Crash Course
 
Dataflow with Apache NiFi
Dataflow with Apache NiFiDataflow with Apache NiFi
Dataflow with Apache NiFi
 
Schema Registry - Set you Data Free
Schema Registry - Set you Data FreeSchema Registry - Set you Data Free
Schema Registry - Set you Data Free
 
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
Building a Large-Scale, Adaptive Recommendation Engine with Apache Flink and ...
 
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
Real-Time Anomaly Detection using LSTM Auto-Encoders with Deep Learning4J on ...
 
Mool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and MLMool - Automated Log Analysis using Data Science and ML
Mool - Automated Log Analysis using Data Science and ML
 
How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient How Hadoop Makes the Natixis Pack More Efficient
How Hadoop Makes the Natixis Pack More Efficient
 
HBase in Practice
HBase in Practice HBase in Practice
HBase in Practice
 
The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)The Challenge of Driving Business Value from the Analytics of Things (AOT)
The Challenge of Driving Business Value from the Analytics of Things (AOT)
 
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS HadoopBreaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
Breaking the 1 Million OPS/SEC Barrier in HOPS Hadoop
 
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
From Regulatory Process Verification to Predictive Maintenance and Beyond wit...
 
Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop Backup and Disaster Recovery in Hadoop
Backup and Disaster Recovery in Hadoop
 

Recently uploaded

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfhans926745
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘RTylerCroy
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 

Recently uploaded (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 

Active Learning for Fraud Prevention

  • 1. Active Learning for Fraud Prevention
  • 4. © 2016 PayPal Inc. Confidential and proprietary. About Me • Software Engineer/Data Scientist/ML Researcher • Ph. D Computer Science • Research in Face Recognition, Phishing/Spam, Fraud Prevention 4
  • 5. developers +2.5 MILLIONpayments/year 4.9 BILLION payments/ second at peak ~30 0 active customer accounts 184 M petabytes of data 42 database calls/ quarter 4.5 T PayPal operates one of the largest PRIVATE CLOUDS in the world We have transformed core business processes into robust SERVICE-BASED PLATFORMS The power of our platform Our technology transformation enables us to: • Process payments at tremendous scale • Accelerate the innovation of new products • Engage world-class developers & technologists About PayPal
  • 7. Fraud Prevention @ PayPal Robust feature engineering, machine learning and statistical models Highly scalable and multi-layered infrastructure software Superior team of data scientists, researchers, financial and intelligence analysts Images source:
  • 8. Fraud Prevention @ PayPal • Employs advanced machine learning and statistical models to flag fraudulent behavior up-front • More sophisticated algorithms after transaction is complete Transaction Level • Monitor account level activity to identify abusive behavior • Abusive pattern include frequent payments, suspicious profile changes Account Level • Monitor account-to-account interaction • Frequent transfer of money from several accounts to one central account Network Level
  • 9. Fraud Prevention – What are we up against? Fraudsters are becoming increasingly smarter and adaptive Need cost-effective solutions that can model complex attack patterns not previously observed Need scalable and computationally efficient prediction models
  • 10. © 2016 PayPal Inc. Confidential and proprietary. Fraud Prevention – What are we up against? • Much harder to get performance lift on our flagship models • Need to re-look at all aspects of traditional model building • Need out-of-the-box thinking 10 Area we are missing (AUC 0.96)
  • 11. © 2016 PayPal Inc. Confidential and proprietary. Fraud Prevention – What can we do to build better models? 11 feature1 …. featureN ……… Target (Label) d1 d2 … dM ….. Better feature Better labeling Advanced ML Algorithms Bigger better data
  • 13. © 2016 PayPal Inc. Confidential and proprietary. Active Learning – What is it? • Supervised learning algorithms require data to be labeled • Labelling is difficult, time-consuming and expensive : Active Learning to the rescue • Idea – ML Algorithm can achieve better accuracy if it is allowed to “choose the data” from which it learns* • Overcome labelling bottleneck by asking queries (unlabeled data) to be labeled by human 13 Unlabeled Data Labeled Data Human Annotator Machine Learning Model (Re)Build Model Select Queries Source*: Burr Settles
  • 14. © 2016 PayPal Inc. Confidential and proprietary. Active Learning – What is it? • Scenarios • Membership Query Synthesis – request labels for ‘any’ unlabeled instance in input space • Stream-based Selective Sampling – unlabeled instance is drawn one at a time & learner decides whether to discard or query • Pool-based Sampling – instances are queried from a pool according to informative-ness measure 14
  • 15. © 2016 PayPal Inc. Confidential and proprietary. Active Learning – What is it? • Query Strategy Frameworks • Uncertainty Sampling • Query-By-Committee • Expected Model Change • Expected Error Reduction • Variance Reduction • Density Weighted Methods 15
  • 16. © 2016 PayPal Inc. Confidential and proprietary. Active Learning –Toy Example 16 Toy data – 400 instances Model using random sampling 70% accuracy Model using active learning Uncertainty sampling – 90% accuracy
  • 17. © 2016 PayPal Inc. Confidential and proprietary. Active Learning For Fraud Prevention – Why is it unique? 17 • Data is unbalanced • Fraud labelling require trained experts. Can’t be outsourced • Fraud labelling is time consuming • Fraud labelling require more than just individual instances. Require before & after transactions • Fraud labelling require data from other entities (ex: IP address) • Fraud labelling require aggregate data • Fraud tag mature at different times (ex: chargeback) & not instantaneous
  • 18. © 2016 PayPal Inc. Confidential and proprietary. Active Learning For Fraud Prevention – High Level Framework 18 Labeled Data Create Bags Deep Learning Model GBT Model (Re)Build Models Unlabeled Data Predict Query By Committee Human Expert Create Statistics Active Feature Engineering Simulate Features
  • 19. © 2016 PayPal Inc. Confidential and proprietary. Modeling Algorithm – Deep Learning 19 Input Layer Hidden Layers Output Layer • If a network has many layers of non-linearity, it is “deep” • Need scalable platform • Need lots of training data
  • 20. © 2016 PayPal Inc. Confidential and proprietary. Modeling Algorithm – Deep Learning 20 •NetworkTopology – Feed forward •Key Parameters • # of hidden layers • # of neurons @ each hidden layer • Regularization • Activation function
  • 21. © 2016 PayPal Inc. Confidential and proprietary. Modeling Algorithm – Gradient BoostingTrees 21 • GBT = Gradient Descent + Boosting • Fit an additive (ensemble) model in forward stage wise manner • In each stage introduce a new model to compensate the shortcomings of existing models
  • 22. © 2016 PayPal Inc. Confidential and proprietary. Modeling Algorithm – Gradient BoostingTrees 22 • Strengths • No pre-processing required • Robust • Scalable • Weaknesses • Overfits (Need to find proper stopping point) • Sensitive to noise • Key Parameters • # of trees • Max depth • Max observations • Learning rate
  • 24. © 2016 PayPal Inc. Confidential and proprietary. Datasets 24 • Training Data • 1 year • 11 million transactions (1 million for active labelling) • Test Data • 4 months • 4 million transactions • # of features • 500 - 600
  • 25. © 2016 PayPal Inc. Confidential and proprietary. Tools 25 • H2O • Open source • Scalable • Robust • Deep Learning & GBM implementations • R • Open source • Active learning package
  • 26. © 2016 PayPal Inc. Confidential and proprietary. 26 # of instances queried AUC (*weighted) 0 0.960 1000 0.961 10000 0.963 50000 0.971 100000 0.975 500000 0.977 1000000 0.979 Early Results – Active Learning Shows Promise…
  • 28. © 2016 PayPal Inc. Confidential and proprietary. Conclusions 28 • Deep learning & GBT has shown tremendous performance for fraud detection. • Active learning shows promise in improving performance of these champion models • Active learning also significantly reduce our labelling cost

Editor's Notes

  1. *All Q1 except for active customer accounts
  2. .