Predicting Propensity to Default using PAI
Pradeep Menon,
Director of Big Data and AI Solutions,
Alibaba Cloud
@rpradeepmenon
pradeep.menon@alibaba-inc.com
Overview
01 Quick introduction to MaxCompute and PAI
02 End-to-end data science: predict propensity to default
MaxCompute
Large Scale Data Processing
Key Features
• Petabyte-level scaling
• Multiple computational models (graph, SQL, MapReduce)
• Safe and reliable
• Rich developer ecosystem (Python, Java, R)
• Cost-effective PaaS offering
Machine Learning PAI
Machine Learning at Scale
Key Features
• 100+ data science components
• Sophisticated machine learning algorithms
• Drag & drop interface + SDK
• Support for deep learning frameworks (TensorFlow, Caffe, MXNet)^
• GPU^ + CPU support
^ Coming soon
PAI Algorithm Catalog
Time Series: ARIMA, Auto ARIMA
Regression: Gradient Boosted Regression, Linear Regression, PS-SMART Regression, PS-Linear Regression
Network Analytics: Tree Depth, K-Core, Single Source Shortest Path, Page Rank, Label Propagation Clustering, Modularity, Point Clustering Coefficient, Edge Clustering Coefficient
Multiclass Classification: PS-SMART Regression, K-NN, Logistic Regression, Random Forest, Naive Bayes
Binary Classification: Gradient Boosted Decision Tree, PS-SMART Binary Classification, Linear SVM, Logistic Regression
Model Evaluation: Binary Classification Evaluation, Regression Model Evaluation, Clustering Evaluation, Confusion Matrix, Multi Class Evaluation
Clustering: K-Means Clustering
Recommendation: Collaborative Filtering
Text Analytics: Split Word, String Similarity, ngram Count, Text Summarization, Keyword Extraction, Sentence Splitting, Semantic Vector, Doc2Vec, CRF, Article Similarity, Word2Vec, TF-IDF, PLDA, SVD
Business Problem
Predict who will default based on input parameters
Data Science Process
Define Business Problem
• Clearly define the business problem
• Set success criteria
• Define clear data science objectives
Map to Machine Learning Problem
• Break business problems down into data science problems
• Identify machine learning problem categories
Data Preparation
• Understand data points and constraints
• Formulate data analytics strategy
• Perform required transformations
Exploratory Data Analysis
• Perform statistical and visual analysis
• Discover and handle outliers/errors
• Shortlist predictive modelling techniques
Modeling / Evaluation
• Experiment with multiple models
• Choose the optimal model
• Create a feedback loop
80% of work, 20% of work
Demo Architecture
Data
Loan Status
Fully Paid
Charged off i.e. Default
Categorical
Loan Status
Annual Income
Credit Score
Years in Current Job
Home Ownership
Purpose
Loan Amount Term
Unique
Loan Id
Customer Id
Loan Data
Tax Liens
Bankruptcies
Binary
Numeric
Credit Score
Loan Amount Term
Years in Current Job
Home Ownership
Annual Income
Years of Credit History
Months Since Last Delinquent
Number of Open Accounts
Number of Credit Problems
Current Credit Balance
Maximum Open Credit
Purpose Monthly Debt
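The target on this slide, Loan Status, takes two values, so it maps naturally to a binary label. The following is a minimal sketch in plain Python, not PAI code. It assumes rows are dictionaries keyed by the column names above; the helper name `encode_loan_status`, the sample rows, and the convention that 1 means default are all illustrative assumptions.

```python
# Hypothetical helper: map the Loan Status column to a binary target.
# Assumption: 1 marks a default ("Charged Off"), 0 marks "Fully Paid".
LABELS = {"fully paid": 0, "charged off": 1}


def encode_loan_status(status: str) -> int:
    """Return 1 for a charged-off (defaulted) loan, 0 for a fully paid one."""
    key = status.strip().lower()
    if key not in LABELS:
        raise ValueError(f"unexpected Loan Status: {status!r}")
    return LABELS[key]


# Made-up rows for illustration only.
rows = [
    {"Loan Id": "a1", "Loan Status": "Fully Paid"},
    {"Loan Id": "b2", "Loan Status": "Charged Off"},
]
targets = [encode_loan_status(r["Loan Status"]) for r in rows]
print(targets)  # [0, 1]
```

Raising on an unknown status, rather than silently defaulting to 0, keeps data errors visible before modeling starts.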
Machine Learning Tasks
Supervised
• Has target
• Specific purpose
• Types: Classification, Regression
• Classification: categorical target, often binary (e.g. "Will churn or not?")
• Regression: numeric target, continuous variable (e.g. predicting car price for next month)
Unsupervised
• No target
• Exploratory
• Types: Clustering, Link Prediction, Data Reduction
• Clustering: creating unknown segments
• Link Prediction: recommendation engines
• Data Reduction: dimensionality reduction
Exploration and Data Pre-processing
• Get Data
• SQL Script Transformation
• Fill Missing Values
• Normalize Data
• Exploratory Data Analysis
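The Fill Missing Values step can be illustrated with a small sketch. The slides do not say which fill strategy the experiment uses, so mean imputation here is an assumption, and `fill_missing_with_mean` is a hypothetical helper rather than a PAI component.

```python
from statistics import mean
from typing import Optional


def fill_missing_with_mean(values: list[Optional[float]]) -> list[float]:
    """Replace None entries with the mean of the observed entries."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    fill = mean(observed)
    return [fill if v is None else v for v in values]


# Made-up example column, e.g. Months Since Last Delinquent.
months_since_delinquent = [10.0, None, 30.0, None, 20.0]
filled = fill_missing_with_mean(months_since_delinquent)
print(filled)  # [10.0, 20.0, 30.0, 20.0, 20.0]
```

For skewed columns the median may be a more robust fill value; the structure of the function stays the same.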
Normalization
1 What
• Re-scales numeric values
• Brings them to the same scale
2 Why?
• Eliminates skew
• Improves model performance
• Buffers from unseen variability
• Alleviates outlier impacts
3 Frequent Types
• Z-score: mean, standard deviation
• Min-max: transform to a range (-1..1 or 0..1)
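The two normalization types named on this slide can be sketched in a few lines of plain Python. This is an illustration, not the PAI component: the function names, the use of the population standard deviation, and the sample credit scores are assumptions.

```python
from statistics import mean, pstdev


def z_score(values):
    """Center on the mean and scale by the population standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


def min_max(values, low=0.0, high=1.0):
    """Linearly map values onto [low, high]; pass low=-1.0 for the -1..1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [low for _ in values]  # constant column: map everything to low
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


credit_scores = [600.0, 650.0, 700.0, 750.0, 800.0]  # made-up values
print(min_max(credit_scores))             # [0.0, 0.25, 0.5, 0.75, 1.0]
print(min_max(credit_scores, -1.0, 1.0))  # [-1.0, -0.5, 0.0, 0.5, 1.0]
print([round(z, 3) for z in z_score(credit_scores)])
# [-1.414, -0.707, 0.0, 0.707, 1.414]
```

Both transforms are linear, so they change the scale of a feature but not the shape of its distribution.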
Split Data
• Data
• Training Set
• Testing Set
• Derive Model
• Test Model
• Estimate Accuracy / Reduce Error
• Refine Model
• Unseen Data
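A train/test split like the one on this slide can be sketched as follows. The slides do not state the split ratio, so the 70/30 default, the fixed seed, and the helper name `train_test_split` are assumptions made for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and split it into (training, testing) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


loans = list(range(10))  # stand-ins for loan records
train, test = train_test_split(loans, test_fraction=0.3)
print(len(train), len(test))  # 7 3
```

The model is derived from `train` only; `test` is held back so that the accuracy estimate reflects data the model has not seen.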
Binary Classification Evaluation
ROC curve: TPR (y-axis) against FPR (x-axis), each from 0 to 1
AUC > 0.99: "May be overfitting"
AUC = 0.9: "A better model"
AUC >= 0.7: "A good model"
AUC > 0.5: "Better use a coin"
Target AUC: 0.85
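AUC can be read as the probability that a randomly chosen defaulter receives a higher score than a randomly chosen non-defaulter, which is why a value near 0.5 is no better than a coin flip. The sketch below computes it directly from that definition; it is not the PAI evaluation component, and the labels and scores are made up.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 0, 1, 0, 1, 0]              # 1 = default (made-up data)
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]  # predicted probability of default
print(roc_auc(labels, scores))           # 7 of 9 pairs ranked correctly
```

The pairwise loop is quadratic in the number of rows, which is fine for an illustration; rank-based formulas give the same value faster on large tables.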
Let's choose 2 algorithms: Logistic Regression and Random Forest
End to End Experiment Flow
Choose the Best Model
Target AUC: 0.80
Logistic Regression: AUC = 0.6456
Random Forest: AUC = 0.8151
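The comparison on this slide amounts to picking the highest AUC and checking it against the target. A minimal sketch, using the AUC values and target reported above; the selection rule and the helper name `choose_best_model` are assumptions, not something the slides spell out.

```python
# AUC values and target as reported on the slide.
TARGET_AUC = 0.80
results = {"Logistic Regression": 0.6456, "Random Forest": 0.8151}


def choose_best_model(auc_by_model, target):
    """Return (name, auc) of the highest-AUC model, or None if it misses target."""
    name, auc = max(auc_by_model.items(), key=lambda item: item[1])
    return (name, auc) if auc >= target else None


print(choose_best_model(results, TARGET_AUC))  # ('Random Forest', 0.8151)
```

Returning `None` when no model reaches the target makes the "refine and retry" branch explicit instead of silently deploying a weak model.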
Model Deployment and Schedule
Deployment Model Creation
• Choose best model
• Create deployment experiment
• Write predictions to MaxCompute
Schedule the model in DataWorks
• Deploy the model to DataWorks for periodic execution
Top 3 Takeaways
• PAI enables machine learning at scale
• Easy-to-use platform for AI
• Enables rapid deployment of models for faster insight
Loan Default Prediction with Machine Learning
