Loan Default Prediction with Machine Learning

Predicting Propensity to
Default using PAI
Pradeep Menon,
Director of Big Data and AI Solutions,
Alibaba Cloud
@rpradeepmenon
pradeep.menon@alibaba-inc.com

Overview
01 Quick introduction to MaxCompute and PAI
02 End-to-End Data Science: Predict propensity to default

MaxCompute
Large Scale Data
Processing
Key Features
Petabyte-level scaling
Multiple computational
models (graph, SQL, MapReduce)
Safe and reliable
Rich developer ecosystem
(Python, Java, R)
Cost-effective PaaS
offering

Machine Learning PAI
100+ data science
components
Sophisticated Machine
Learning Algorithms
Drag & Drop interface + SDK
Support for deep learning
frameworks (TensorFlow,
Caffe, MXNet)^
GPU^ + CPU Support
Machine Learning
at Scale
Key Features
^ Coming Soon

PAI Algorithm Catalog
Time Series: ARIMA, Auto ARIMA
Regression: Gradient Boosted Regression, Linear Regression, PS-SMART Regression, PS-Linear Regression
Network Analytics: Tree Depth, K-Core, Single Source Shortest Path, Page Rank, Label Propagation Clustering, Modularity, Point Clustering Coefficient, Edge Clustering Coefficient
Multiclass Classification: PS-SMART Regression, K-NN, Logistic Regression, Random Forest, Naive Bayes
Binary Classification: Gradient Boosted Decision Tree, PS-SMART Binary Classification, Linear SVM, Logistic Regression
Model Evaluation: Binary Classification Evaluation, Regression Model Evaluation, Clustering Evaluation, Confusion Matrix, Multi Class Evaluation
Clustering: K-Means Clustering
Recommendation: Collaborative Filtering
Text Analytics: Split Word, String Similarity, ngram Count, Text Summarization, Keyword Extraction, Sentence Splitting, Semantic Vector, Doc2Vec, CRF, Article Similarity, Word2Vec, TF-IDF, PLDA, SVD

Predict who will default based on input
parameters
Business Problem

Define Business
Problem
Map to Machine
Learning Problem
Data Preparation
Exploratory Data
Analysis
Modeling Evaluation
• Clearly define the business problem
• Set success criteria
• Define clear data science objectives
• Understand data points and constraints
• Formulate data analytics strategy
• Perform required transformations
• Experiment with multiple models
• Choose the best model
• Create a feedback loop
• Break business problems down into data
science problems
• Identify Machine Learning
Problem categories
• Perform statistical and visual analysis
• Discover and handle outliers/errors
• Shortlist predictive modelling techniques
80% of work 20% of work
Data Science Process

Data
Loan Status
Fully Paid
Charged Off (i.e., Default)
Categorical
Loan Status
Annual Income
Credit Score
Years in Current Job
Home Ownership
Purpose
Loan Amount Term
Unique
Loan Id
Customer Id
Loan Data
Tax Liens
Bankruptcies
Binary
Numeric
Credit Score
Loan Amount Term
Years in Current Job
Home Ownership
Annual Income
Years of Credit History
Months Since Last Delinquent
Number of Open Accounts
Number of Credit Problems
Current Credit Balance
Maximum Open Credit
Purpose Monthly Debt
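
The target on this slide, Loan Status, takes two values (Fully Paid, Charged Off), so a binary classifier needs it as a 0/1 label. A minimal sketch of that encoding follows. The function name `encode_target` and the choice of 1 for the default class are assumptions for illustration, not something the slides specify.

```python
def encode_target(status):
    """Map a Loan Status string to a binary label (1 = default, 0 = repaid).

    Assumption: "Charged Off" is treated as the positive (default) class.
    """
    normalized = status.strip().lower()
    if normalized == "charged off":
        return 1
    if normalized == "fully paid":
        return 0
    # Fail loudly on anything unexpected instead of guessing a label.
    raise ValueError(f"unknown loan status: {status!r}")


statuses = ["Fully Paid", "Charged Off", "fully paid "]
labels = [encode_target(s) for s in statuses]
```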

Machine Learning Tasks
Machine Learning
1 Supervised
Has Target
Specific Purpose
Types
Classification: Categorical Target, Often Binary (e.g. Will Churn or Not?)
Regression: Numeric Target, Continuous Variable (e.g. Predicting car price for next month)
2 Unsupervised
No Target
Exploratory
Types
Clustering: Creating Unknown Segments
Link Prediction: Recommendation Engines
Data Reduction: Dimensionality Reduction

Get Data
SQL Script
Transformation
Fill Missing
Values
Normalize
Data
Exploratory
Data Analysis
Exploration and Data Pre-processing
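
The "Fill Missing Values" step can be sketched in plain Python. This is only an illustration of one common strategy (column mean for numeric fields, most frequent value for categorical fields). The slides do not say which strategy the PAI component was configured with, and the sample rows and the `fill_missing` helper are hypothetical.

```python
from collections import Counter
from statistics import mean


def fill_missing(rows, numeric_cols, categorical_cols):
    """Return copies of rows with None replaced by the column mean or mode."""
    fills = {}
    for col in numeric_cols:
        present = [r[col] for r in rows if r[col] is not None]
        fills[col] = mean(present)
    for col in categorical_cols:
        present = [r[col] for r in rows if r[col] is not None]
        fills[col] = Counter(present).most_common(1)[0][0]
    return [
        {k: (fills[k] if v is None and k in fills else v) for k, v in r.items()}
        for r in rows
    ]


# Hypothetical rows using two of the fields listed on the Data slide.
loans = [
    {"credit_score": 700, "home_ownership": "Rent"},
    {"credit_score": None, "home_ownership": "Rent"},
    {"credit_score": 740, "home_ownership": None},
]
filled = fill_missing(loans, ["credit_score"], ["home_ownership"])
```

The original rows are left untouched, which makes it easy to compare the data before and after imputation during exploratory analysis.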

Normalization
1 What
Re-scales Numeric Values
Brings Them to Same Scale
2 Why?
Eliminates Skew
Improves Model Performance
Buffers from Unseen Variability
Alleviates Outlier Impacts
3 Frequent Types
Z-score: Mean, Standard Deviation
Min-max: Transform to a Range (-1..1 or 0..1)
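
The two normalization types named on this slide can be written out as a short sketch. It uses only the Python standard library and the population standard deviation; the function names and the sample income values are assumptions for illustration, and the PAI component may differ in details such as which standard deviation it uses.

```python
from statistics import mean, pstdev


def z_score(values):
    """Center on the mean and scale by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map everything to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values, low=0.0, high=1.0):
    """Linearly rescale values into the range [low, high]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [low for _ in values]
    return [low + (v - lo) * (high - low) / (hi - lo) for v in values]


# Hypothetical annual incomes.
incomes = [30000, 50000, 70000]
standardized = z_score(incomes)
unit_range = min_max(incomes)             # 0..1
symmetric_range = min_max(incomes, -1, 1)  # -1..1
```

Whichever variant is used, the scaling parameters (mean and standard deviation, or minimum and maximum) would normally be computed on the training data and then reused on unseen data.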

Data
Training
Set
Testing
Set
Derive
Model
Test
Model
Estimate
Accuracy/
Reduce Error
Refine Model
Unseen Data
Split Data
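
A minimal sketch of the split step: shuffle the records, hold out a fraction as the testing set, and train on the rest. The 30% test fraction, the fixed seed, and the helper name `train_test_split` are assumptions; the slides do not state the ratio used in the experiment.

```python
import random


def train_test_split(rows, test_fraction=0.3, seed=42):
    """Shuffle a copy of rows and return (training_set, testing_set)."""
    shuffled = list(rows)
    # A seeded generator keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


records = list(range(10))  # stand-in for loan records
train, test = train_test_split(records, test_fraction=0.3)
```

The testing set plays the role of the "Unseen Data" on the slide: it is used only to estimate accuracy, never to fit the model.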

ROC curve: TPR (y-axis, 0 to 1) against FPR (x-axis, 0 to 1)
AUC > 0.99: "May be overfitting"
AUC = 0.9: "A better model"
AUC >= 0.7: "A good model"
AUC ≈ 0.5: "Better use a coin"
Target AUC: 0.85
Binary Classification Evaluation
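
To make the AUC figure concrete, here is a small sketch that computes the area under the ROC curve from labels and predicted scores. It uses the pairwise-ranking interpretation (the probability that a randomly chosen defaulter is scored above a randomly chosen non-defaulter, with ties counted as half), which equals the area under the TPR/FPR curve. The function name and sample data are hypothetical, and this is not the PAI evaluation component itself.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Hypothetical predictions: 1 = default, 0 = fully paid.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
auc = roc_auc(labels, scores)
```

A model that assigns every loan the same score gets 0.5 under this definition, which is the coin-flip baseline on the slide. The quadratic pair loop is fine for illustration but would be replaced by a sort-based computation on large data.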

Let's Choose 2 Algorithms
Logistic Regression
Random Forest

Logistic Regression: AUC = 0.6456
Random Forest: AUC = 0.8151
Target AUC: 0.80
Choose the Best Model
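
The selection rule on this slide can be sketched as: take the model with the highest AUC and check it against the target. The AUC values and the 0.80 target are the ones reported on the slide; the helper name `choose_best_model` is hypothetical.

```python
def choose_best_model(auc_by_model, target_auc):
    """Return (name, auc, meets_target) for the model with the highest AUC."""
    name, auc = max(auc_by_model.items(), key=lambda kv: kv[1])
    return name, auc, auc >= target_auc


# AUC values as reported on the slide.
results = {"Logistic Regression": 0.6456, "Random Forest": 0.8151}
best, best_auc, meets_target = choose_best_model(results, target_auc=0.80)
```

Here Random Forest is selected and clears the 0.80 target, while Logistic Regression falls short of it.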

Deployment Model
Creation
Schedule the model in
DataWorks
Choose Best Model
Create Deployment
Experiment
Write Predictions to
MaxCompute
Deploy the model to
DataWorks for
periodic execution
Model Deployment and Schedule

PAI enables
Machine Learning
at scale
Easy to use platform
for AI
Enables rapid
deployment of model
for faster insight
Top 3 Takeaways
