See webinar recording of this presentation at: https://resource.alibabacloud.com/webinar/detail.htm?webinarId=50
This webinar helps users understand the end-to-end data science process of building a propensity model on Alibaba Cloud's Machine Learning Platform for AI: defining the business problem, exploratory data analysis, data processing, model training, testing, and deployment. It includes an end-to-end case study (with a live demo) on how to use Alibaba Cloud products to predict the propensity of loan defaults.
Learn more about Machine Learning Platform for AI:
https://www.alibabacloud.com/product/machine-learning
Loan Default Prediction with Machine Learning
1. Predicting Propensity to
Default using PAI
Pradeep Menon,
Director of Big Data and AI Solutions,
Alibaba Cloud
@rpradeepmenon
pradeep.menon@alibaba-inc.com
3. MaxCompute
Large Scale Data
Processing
Key Features
Petabyte-level scaling
Multiple computational
models (graph, SQL, MapReduce)
Safe and reliable
Rich developer ecosystem
(Python, Java, R)
Cost-effective PaaS
offering
4. Machine Learning PAI
100+ data science
components
Sophisticated Machine
Learning Algorithms
Drag & Drop interface + SDK
Support for deep learning
frameworks (TensorFlow,
Caffe, MXNet)^
GPU^ + CPU Support
Machine Learning
at Scale
Key Features
^ Coming Soon
5. PAI Algorithm Catalog
Time Series
ARIMA Auto ARIMA
Regression
Gradient Boosted
Regression
Linear
Regression
PS-SMART
Regression
PS-Linear
Regression
Network Analytics
Tree Depth K-Core
Single Source
Shortest Path
Page Rank
Label Propagation
Clustering
Modularity
Point Clustering
Coefficient
Edge Clustering
Coefficient
Multiclass Classification
PS-SMART
Regression
K-NN
Logistic
Regression
Random Forest
Naive Bayes
Binary Classification
Gradient Boosted
Decision Tree
PS-SMART Binary
Classification
Linear SVM
Logistic
Regression
Model Evaluation
Binary
Classification
Evaluation
Regression Model
Evaluation
Clustering
Evaluation
Confusion Matrix
Multi Class
Evaluation
Clustering
K-Means
Clustering
Recommendation
Collaborative
Filtering
Text Analytics
Split Word String Similarity
ngram Count
Text
Summarization
Keyword
Extraction
Sentence
Splitting
Semantic
Vector
Doc2Vec
CRF Article Similarity
Word2Vec TF-IDF
PLDA SVD
6. Predict who will default based on input
parameters
Business Problem
7. Define Business Problem
Map to Machine Learning Problem
Data Preparation
Exploratory Data Analysis
Modeling
Evaluation
• Clearly defined business problem
• Set success criteria
• Define clear data science objectives
• Understand data points and constraints
• Formulate data analytics strategy
• Perform required transformation
• Experiment with multiple models
• Choose the best-performing model
• Create a feedback loop
• Break business problems down into
data science problems
• Identify Machine Learning
Problem categories
• Perform statistical and visual analysis
• Discover and handle outliers/errors
• Shortlist predictive modelling techniques
80% of work 20% of work
Data Science Process
9. Data
Loan Status
Fully Paid
Charged Off (i.e. Default)
Categorical
Loan Status
Annual Income
Credit Score
Years in Current Job
Home Ownership
Purpose
Loan Amount Term
Unique
Loan Id
Customer Id
Loan Data
Tax Liens
Bankruptcies
Binary
Numeric
Credit Score
Loan Amount Term
Years in Current Job
Home Ownership
Annual Income
Years of Credit History
Months Since Last Delinquent
Number of Open Accounts
Number of Credit Problems
Current Credit Balance
Maximum Open Credit
Purpose Monthly Debt
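As a rough illustration of how the fields above might be prepared for modeling, the sketch below maps Loan Status to a binary label and leaves the unique identifiers out of the feature set. The sample rows, their values, and the helper names (`encode_target`, `split_features_and_label`) are hypothetical and not taken from the webinar dataset.

```python
# Hypothetical sample rows: field names follow the slide, values are invented.
loans = [
    {"Loan Id": "L-001", "Loan Status": "Fully Paid", "Annual Income": 52000},
    {"Loan Id": "L-002", "Loan Status": "Charged Off", "Annual Income": 31000},
    {"Loan Id": "L-003", "Loan Status": "Fully Paid", "Annual Income": 78000},
]

# Unique identifiers carry no predictive signal, so they stay out of the features.
ID_COLUMNS = {"Loan Id", "Customer Id"}
TARGET = "Loan Status"


def encode_target(status: str) -> int:
    """Map the loan status to a binary label: 1 = default (charged off), 0 = fully paid."""
    normalized = status.strip().lower()
    if normalized == "charged off":
        return 1
    if normalized == "fully paid":
        return 0
    raise ValueError(f"Unexpected loan status: {status!r}")


def split_features_and_label(row: dict) -> tuple:
    """Return (features, label) for one loan record."""
    features = {k: v for k, v in row.items() if k not in ID_COLUMNS and k != TARGET}
    return features, encode_target(row[TARGET])


dataset = [split_features_and_label(row) for row in loans]
labels = [label for _, label in dataset]
```

Raising on an unknown status, rather than silently defaulting to 0, makes data errors visible during preparation.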
10. Machine Learning Tasks
Supervised
• Has a target
• Specific purpose
• Types: classification, regression
Classification
• Categorical target, often binary
• E.g. will a customer churn or not?
Regression
• Numeric target (continuous variable)
• E.g. predicting car price for next month
Unsupervised
• No target
• Exploratory
• Types: clustering, link prediction, data reduction
Clustering: creating unknown segments
Link prediction: recommendation engines
Data reduction: dimensionality reduction
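The taxonomy on this slide can be sketched as a small decision rule: no target means an unsupervised task, a categorical target means classification, and a continuous numeric target means regression. The function `ml_task` below is a hypothetical helper written for illustration. Treating "numeric with more than two distinct values" as regression is a simplifying assumption, not a rule from the presentation.

```python
from typing import Optional, Sequence


def ml_task(target: Optional[Sequence]) -> str:
    """Suggest a machine learning task category from the target column (heuristic)."""
    if target is None:
        # No target: exploratory work such as clustering or dimensionality reduction.
        return "unsupervised"
    distinct = set(target)
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    if is_numeric and len(distinct) > 2:
        # Numeric target with many values: treat it as continuous.
        return "regression"
    if len(distinct) == 2:
        return "binary classification"
    return "multiclass classification"


# The loan status target has two categories, so it maps to binary classification.
loan_task = ml_task(["Fully Paid", "Charged Off", "Fully Paid"])
```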
12. Normalization
What
• Re-scales numeric values
• Brings them to the same scale
• Eliminates skew
Why?
• Improves model performance
• Buffers from unseen variability
• Alleviates outlier impacts
Frequent Types
• Z-score: uses the mean and standard deviation
• Min-max: transforms to a range (-1..1 or 0..1)
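The two normalization types named on this slide can be sketched in plain Python as follows. This is a minimal illustration rather than the PAI component itself. The function names and the sample credit scores are assumptions, and the z-score uses the population standard deviation.

```python
from statistics import mean, pstdev


def z_score(values):
    """Center on the mean and divide by the (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column has no spread; map every value to 0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def min_max(values, low=0.0, high=1.0):
    """Linearly rescale values into the range [low, high]."""
    smallest, largest = min(values), max(values)
    if largest == smallest:
        # A constant column cannot be rescaled; pin it to the lower bound.
        return [low for _ in values]
    return [low + (v - smallest) / (largest - smallest) * (high - low) for v in values]


# Hypothetical credit scores, invented for illustration.
credit_scores = [600, 650, 700, 750, 800]

standardized = z_score(credit_scores)            # mean 0, standard deviation 1
unit_range = min_max(credit_scores)              # range 0..1
symmetric_range = min_max(credit_scores, -1, 1)  # range -1..1
```

Min-max scaling depends on the observed minimum and maximum, so a single extreme value compresses the rest of the column. The z-score is less sensitive to this, though it is still affected by outliers.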
17. Target AUC: 0.80
Logistic Regression: AUC = 0.6456
Random Forest: AUC = 0.8151
Choose the Best Model
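The selection step can be sketched as follows, assuming the two AUC values on the slide map in order to logistic regression and random forest. The `auc` function uses the pairwise-ranking definition (the probability that a randomly chosen defaulter is scored above a randomly chosen non-defaulter). It is an illustrative implementation, not the PAI evaluation component, and the helper names are hypothetical.

```python
def auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


TARGET_AUC = 0.80

# AUC values as reported on the slide (model assignment assumed from their order).
reported_auc = {"logistic regression": 0.6456, "random forest": 0.8151}


def choose_model(auc_by_model, target):
    """Return the model with the highest AUC and whether it meets the target."""
    best_name = max(auc_by_model, key=auc_by_model.get)
    return best_name, auc_by_model[best_name] >= target


best_model, meets_target = choose_model(reported_auc, TARGET_AUC)
```

The pairwise loop is quadratic in the number of examples, which is fine for a small illustration. A rank-based computation would be preferable on large evaluation sets.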
18. Deployment Model
Creation
Schedule the model in
DataWorks
Choose Best Model
Create Deployment
Experiment
Write Predictions to
MaxCompute
Deploy the model to
DataWorks for
periodic execution
Model Deployment and Schedule
19. PAI enables
Machine Learning
at scale
Easy to use platform
for AI
Enables rapid
deployment of model
for faster insight
Top 3 Takeaways