Predicting delinquency on debt
What is the problem?
• X Store has a retail credit card available to customers
• There can be a number of sources of loss from this product, but one is customers defaulting on their debt
• This prevents the store from collecting payment for products and services rendered
Is this problem big enough to matter?
• Examining a slice of the customer database (150,000 customers), we find that 6.6% of customers were seriously delinquent in payment over the last two years
• If only 5% of their carried debt were on the store credit card, this potentially means:
• An average loss of $8.12 per customer
• A potential overall loss of $1.2 million ($8.12 × 150,000 ≈ $1.22 million)
What can be done?
• There are numerous models that can be used to predict which customers will default
• These predictions could be used to decrease credit limits or cancel credit lines for current risky customers to minimize potential loss
• Or to better screen which customers are approved for the card
How will I do this?
• This is a basic classification problem with important business implications
• We’ll examine a few simplistic models to get an idea of baseline performance
• Then explore decision tree methods to achieve better performance
What will the models use to predict delinquency?
Each customer has a number of attributes:

John Smith
Delinquent: Yes
Age: 23
Income: $1600
Number of Lines: 4

Mary Rasmussen
Delinquent: No
Age: 73
Income: $2200
Number of Lines: 2

...

We will use the customer attributes to predict whether they were delinquent
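A minimal sketch of loading this data for modeling in Python, assuming the Kaggle "Give me some credit" training file and its column names (cs-training.csv, SeriousDlqin2yrs, MonthlyIncome, and NumberOfDependents are assumptions about the export, not shown on the slides):

import pandas as pd

# Hypothetical export of the 150,000-customer slice
df = pd.read_csv("cs-training.csv")

# Target: seriously delinquent in the last two years (~6.6% of customers)
y = df["SeriousDlqin2yrs"]

# Features: every other attribute (age, income, number of lines, ...)
X = df.drop(columns=["SeriousDlqin2yrs"])

# Impute missing values; mapping the later "Assume $2,500" / "Assume 0"
# annotations to income and dependents is an assumption
X = X.fillna({"MonthlyIncome": 2500, "NumberOfDependents": 0})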
How do we make sure that our solution actually has predictive power?
We have two slices of the customer dataset:

Train: 150,000 customers, delinquency in dataset
Test: 101,000 customers, delinquency not in dataset

None of the customers in the test dataset are used to train the model
Internally we validate our model performance with k-fold cross-validation
Using only the train dataset we can get a sense of how well our model performs without externally validating it:
the train set is split into folds (Train 1, Train 2, Train 3); the algorithm is trained on Train 1 and Train 2, tested on the held-out Train 3, and the folds are rotated.
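A minimal sketch of this validation step with scikit-learn, assuming the X and y from the loading sketch; the three folds mirror the Train 1/2/3 diagram, and any classifier could stand in for the algorithm:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

model = RandomForestClassifier(n_estimators=10, random_state=0)

# Train on two folds, test on the held-out third, and rotate
scores = cross_val_score(model, X, y, cv=3, scoring="accuracy")
print(scores, scores.mean())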
What matters is how well we can predict the test dataset
We judge this using accuracy: the number of predictions correct out of the total number of predictions made
So with 100,000 customers and an 80% accuracy we will have correctly predicted whether 80,000 customers will default or not in the next two years
Putting accuracy in context
We could save $600,000 over two years if we correctly predicted 50% of the customers that would default and changed their account to prevent it (50% of the $1.2 million potential loss)
The potential loss is reduced by ~$8,000 for every 100,000 customers with each percentage-point increase in accuracy
Looking at the actual data
[Distribution plots of the customer attributes; the slide annotations read “Assume $2,500” and “Assume 0” for missing values]
There is a continuum of algorithmic choices to tackle the problem
Simpler, Quicker ←→ Complex, Slower
Random Chance: 50%
Simple Classification
For simple classification we pick a single attribute and find the best split in the customers
[Histogram of Number of Customers vs. Times Past Due, with candidate split points 1, 2, ... dividing customers into True Positive, True Negative, False Positive, and False Negative regions]
We evaluate possible splits using accuracy, precision, and sensitivity

Accuracy = number correct / total number of predictions
Precision = true positives / number of people predicted delinquent
Sensitivity = true positives / number of people actually delinquent

[Plot of accuracy, precision, and sensitivity (0–0.8) against the split threshold on Number of Times 30-59 Days Past Due (0–100)]

Best split: 0.61 KGI on the test set
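A minimal sketch of scanning single-attribute splits and computing the three metrics, assuming the Kaggle column name for times 30-59 days past due (NumberOfTime30-59DaysPastDueNotWorse); an illustration, not the deck’s exact code:

import numpy as np

def split_metrics(values, y, threshold):
    # Predict delinquent when the attribute exceeds the threshold
    pred = values > threshold
    tp = np.sum(pred & (y == 1))
    acc = np.mean(pred == y)                # correct / total
    prec = tp / max(pred.sum(), 1)          # TP / predicted delinquent
    sens = tp / max((y == 1).sum(), 1)      # TP / actually delinquent
    return acc, prec, sens

values = X["NumberOfTime30-59DaysPastDueNotWorse"].to_numpy()
for t in range(5):
    print(t, split_metrics(values, y.to_numpy(), t))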
However, not all fields are as informative
Using the number of times past due 60-89 days we achieve a KGI of only 0.5
This approach is naive and could be improved, but our time is better spent on different algorithms
Exploring algorithmic choices further
Simpler, Quicker ←→ Complex, Slower
Random Chance: 0.50
Simple Classification: 0.50-0.61
Random Forests
A random forest starts from a decision tree
Customer Data: find the best split in a set of randomly chosen attributes, e.g. “Is age <30?”
Yes: 25,000 customers <30
No: 75,000 customers >30
Each resulting subset is then split again in the same way, and so on
...
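A minimal sketch of one such randomized tree in scikit-learn; max_features="sqrt" restricts each split to a random subset of the attributes, which is the randomization a random forest builds on (a sketch, not the deck’s exact setup):

from sklearn.tree import DecisionTreeClassifier

# Each split searches only a random subset (sqrt) of the attributes
tree = DecisionTreeClassifier(max_features="sqrt", random_state=0)
tree.fit(X, y)
print(tree.get_depth(), tree.tree_.node_count)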
A random forest is composed of many decision trees
[Diagram: many independent trees, each splitting the customer data on its own best split]
Class assignment of a customer is based on how many of the decision trees “vote” for each class
We use a large number of trees to avoid over-fitting to the training data
The Random Forest algorithm is easily implemented in Python or R for initial testing and validation
It can also be parallelized with Mahout and Hadoop, since there is no dependence from one tree to the next
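A minimal scikit-learn sketch, holding out part of the labeled train slice since the real test set’s delinquency labels are withheld (the split and parameters are illustrative):

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Trees are independent, so they can be grown in parallel (n_jobs=-1 uses all cores)
forest = RandomForestClassifier(n_estimators=150, n_jobs=-1, random_state=0)
forest.fit(X_train, y_train)

# Probability of delinquency for each held-out customer
proba = forest.predict_proba(X_test)[:, 1]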
A random forest performs well on the test set
10 trees: 0.779 KGI
150 trees: 0.843 KGI
1000 trees: 0.850 KGI
[Bar chart of accuracy (0.4-0.9): Random, Classification, Random Forests]
Exploring algorithmic choices further
Simpler, Quicker ←→ Complex, Slower
Random Chance: 0.50
Simple Classification: 0.50-0.61
Random Forests: 0.78-0.85
Gradient Tree Boosting
Boosting Trees is similar to a Random Forest
Customer Data is again split by a decision tree (e.g. “Is age <30?” → Yes: customers <30 data; No: customers >30 data; ...)
But instead of a random subset of attributes, each split is found by an exhaustive search
How Gradient Boosting Trees differs from Random Forest
[Diagram: a sequence of trees, each fit to what the previous trees left unexplained]
The first tree is optimized to minimize a loss function describing the data
The next tree is then optimized to fit whatever variability the first tree didn’t fit
This is a sequential process, in comparison to the random forest
We also run the risk of over-fitting to the data, hence the learning rate
Implementing Gradient Boosted Trees
Initial testing and validation is easy in Python or R
There are implementations that use Hadoop, but it’s more complicated to achieve the best performance
Gradient Boosting Trees performs well on the dataset
100 trees, 0.1 learning rate: 0.865022 KGI
1000 trees, 0.1 learning rate: 0.865248 KGI
[Plot of KGI (0.75-0.85+) against learning rate (0-0.8)]
[Bar chart of accuracy (0.4-0.9): Random, Classification, Random Forests, Boosting Trees]
Moving one step further in complexity
Simpler, Quicker ←→ Complex, Slower
Random Chance: 0.50
Simple Classification: 0.50-0.61
Random Forests: 0.78-0.85
Gradient Tree Boosting: 0.71-0.8659
Blended Method
Or more accurately, an ensemble of ensemble methods
Algorithm progression: Random Forest → Extremely Random Forest → Gradient Tree Boosting
Each algorithm produces a column of train-data probabilities, one per customer:
0.1, 0.5, 0.01, 0.8, 0.7, ...
0.15, 0.6, 0.0, 0.75, 0.68, ...
Combine all of the model information
Train data probabilities (one column per model):
0.1, 0.5, 0.01, 0.8, 0.7, ...
0.15, 0.6, 0.0, 0.75, 0.68, ...
Optimize the set of train probabilities to the known delinquencies
Apply the same weighting scheme to the set of test data probabilities
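A minimal sketch of one way to learn this weighting, reusing the forest and gbt models from the earlier sketches; using ExtraTreesClassifier for the “Extremely Random Forest” and a logistic regression as the optimizer are assumptions, since the slides don’t name either:

import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.linear_model import LogisticRegression

extra = ExtraTreesClassifier(n_estimators=150, n_jobs=-1, random_state=0)
extra.fit(X_train, y_train)

models = (forest, extra, gbt)

# One probability column per model on the train data
# (in practice, out-of-fold probabilities avoid over-fitting the blender)
train_probs = np.column_stack([m.predict_proba(X_train)[:, 1] for m in models])

# Optimize the weighting against the known delinquencies
blender = LogisticRegression()
blender.fit(train_probs, y_train)

# Apply the same weighting scheme to the test-data probabilities
test_probs = np.column_stack([m.predict_proba(X_test)[:, 1] for m in models])
blend = blender.predict_proba(test_probs)[:, 1]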
Implementation can be done in a number of ways
Testing in Python or R is slower, due to the sequential nature of applying the algorithms
It could be made faster by parallelizing: running each algorithm separately and combining the results
Assessing model performance
Blending performance, 100 trees: 0.864394 KGI
But this performance, and the possibility of additional gains, comes at a distinct time cost
[Bar chart of accuracy (0.4-0.9): Random, Classification, Random Forests, Boosting Trees, Blended]
Examining the continuum of choices
Simpler, Quicker ←→ Complex, Slower
Random Chance: 0.50
Simple Classification: 0.50-0.61
Random Forests: 0.78-0.85
Gradient Tree Boosting: 0.71-0.8659
Blended Method: 0.864
What would be best to implement?
There is a large amount of optimization of the blended method that could still be done
However, this algorithm takes the longest to run, and this constraint will apply in testing and validation also
Random Forests returns a reasonably good result; it is quick and easily parallelized
Gradient Tree Boosting returns the best result and runs reasonably fast, though it is not as easily parallelized
Increases in predictive performance have real business value
Using any of the more complex algorithms we achieve an increase of ~35 percentage points over random chance
Potential decrease of ~$420k in losses by identifying customers likely to default in the training set alone
Thank you for your time

Contenu connexe

Tendances

Predicting Credit Card Defaults using Machine Learning Algorithms
Predicting Credit Card Defaults using Machine Learning AlgorithmsPredicting Credit Card Defaults using Machine Learning Algorithms
Predicting Credit Card Defaults using Machine Learning AlgorithmsSagar Tupkar
 
Creditscore
CreditscoreCreditscore
Creditscorekevinlan
 
Data Science Use cases in Banking
Data Science Use cases in BankingData Science Use cases in Banking
Data Science Use cases in BankingArul Bharathi
 
Consumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestConsumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestHirak Sen Roy
 
Credit Scoring
Credit ScoringCredit Scoring
Credit ScoringMABSIV
 
Delopment and testing of a credit scoring model
Delopment and testing of a credit scoring modelDelopment and testing of a credit scoring model
Delopment and testing of a credit scoring modelMattia Ciprian
 
Loan default prediction with machine language
Loan  default  prediction with  machine  language Loan  default  prediction with  machine  language
Loan default prediction with machine language Aayush Kumar
 
Case Study: Loan default prediction
Case Study: Loan default predictionCase Study: Loan default prediction
Case Study: Loan default predictionALTEN Calsoft Labs
 
Exploratory Data Analysis Bank Fraud Case Study
Exploratory  Data Analysis Bank Fraud Case StudyExploratory  Data Analysis Bank Fraud Case Study
Exploratory Data Analysis Bank Fraud Case StudyLumbiniSardare
 
Taiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detectionTaiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detectionRavi Gupta
 
EDA_Case_Study_PPT.pptx
EDA_Case_Study_PPT.pptxEDA_Case_Study_PPT.pptx
EDA_Case_Study_PPT.pptxAmitDas125851
 
Final capstone powerpoint
Final capstone powerpointFinal capstone powerpoint
Final capstone powerpointCaroline Nguma
 
Default of Credit Card Payments
Default of Credit Card PaymentsDefault of Credit Card Payments
Default of Credit Card PaymentsVikas Virani
 
Project Report - Acquisition Credit Scoring Model
Project Report - Acquisition Credit Scoring ModelProject Report - Acquisition Credit Scoring Model
Project Report - Acquisition Credit Scoring ModelSubhasis Mishra
 
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?Smarten Augmented Analytics
 
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...accenture
 

Tendances (20)

Predicting Credit Card Defaults using Machine Learning Algorithms
Predicting Credit Card Defaults using Machine Learning AlgorithmsPredicting Credit Card Defaults using Machine Learning Algorithms
Predicting Credit Card Defaults using Machine Learning Algorithms
 
Creditscore
CreditscoreCreditscore
Creditscore
 
Data Science Use cases in Banking
Data Science Use cases in BankingData Science Use cases in Banking
Data Science Use cases in Banking
 
Consumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random ForestConsumer Credit Scoring Using Logistic Regression and Random Forest
Consumer Credit Scoring Using Logistic Regression and Random Forest
 
Credit Scoring
Credit ScoringCredit Scoring
Credit Scoring
 
Delopment and testing of a credit scoring model
Delopment and testing of a credit scoring modelDelopment and testing of a credit scoring model
Delopment and testing of a credit scoring model
 
Credit scoring
Credit scoringCredit scoring
Credit scoring
 
Loan default prediction with machine language
Loan  default  prediction with  machine  language Loan  default  prediction with  machine  language
Loan default prediction with machine language
 
NLP in Finance
NLP in FinanceNLP in Finance
NLP in Finance
 
Case Study: Loan default prediction
Case Study: Loan default predictionCase Study: Loan default prediction
Case Study: Loan default prediction
 
Exploratory Data Analysis Bank Fraud Case Study
Exploratory  Data Analysis Bank Fraud Case StudyExploratory  Data Analysis Bank Fraud Case Study
Exploratory Data Analysis Bank Fraud Case Study
 
Taiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detectionTaiwanese Credit Card Client Fraud detection
Taiwanese Credit Card Client Fraud detection
 
EDA_Case_Study_PPT.pptx
EDA_Case_Study_PPT.pptxEDA_Case_Study_PPT.pptx
EDA_Case_Study_PPT.pptx
 
Final capstone powerpoint
Final capstone powerpointFinal capstone powerpoint
Final capstone powerpoint
 
Credit Risk Model Building Steps
Credit Risk Model Building StepsCredit Risk Model Building Steps
Credit Risk Model Building Steps
 
Default of Credit Card Payments
Default of Credit Card PaymentsDefault of Credit Card Payments
Default of Credit Card Payments
 
Project Report - Acquisition Credit Scoring Model
Project Report - Acquisition Credit Scoring ModelProject Report - Acquisition Credit Scoring Model
Project Report - Acquisition Credit Scoring Model
 
Credit eda case study
Credit eda case studyCredit eda case study
Credit eda case study
 
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
What is Naïve Bayes Classification and How is it Used for Enterprise Analysis?
 
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...
Measuring and Managing Credit Risk With Machine Learning and Artificial Intel...
 

En vedette

Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentationHJ van Veen
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature EngineeringHJ van Veen
 
Reading Birnbaum's (1962) paper, by Li Chenlu
Reading Birnbaum's (1962) paper, by Li ChenluReading Birnbaum's (1962) paper, by Li Chenlu
Reading Birnbaum's (1962) paper, by Li ChenluChristian Robert
 
Q trade presentation
Q trade presentationQ trade presentation
Q trade presentationewig123
 
MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3arogozhnikov
 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionSeonho Park
 
Introduction to Some Tree based Learning Method
Introduction to Some Tree based Learning MethodIntroduction to Some Tree based Learning Method
Introduction to Some Tree based Learning MethodHonglin Yu
 
Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsGilles Louppe
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Zihui Li
 
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesPier Luca Lanzi
 
Tda presentation
Tda presentationTda presentation
Tda presentationHJ van Veen
 
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)Sri Ambati
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingTed Xiao
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsGilles Louppe
 
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting MachinesDecision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting MachinesDeepak George
 
Credit Risk Evaluation Model
Credit Risk Evaluation ModelCredit Risk Evaluation Model
Credit Risk Evaluation ModelMihai Enescu
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Parth Khare
 
Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)HJ van Veen
 

En vedette (20)

Kaggle presentation
Kaggle presentationKaggle presentation
Kaggle presentation
 
Feature Engineering
Feature EngineeringFeature Engineering
Feature Engineering
 
Reading Birnbaum's (1962) paper, by Li Chenlu
Reading Birnbaum's (1962) paper, by Li ChenluReading Birnbaum's (1962) paper, by Li Chenlu
Reading Birnbaum's (1962) paper, by Li Chenlu
 
Q trade presentation
Q trade presentationQ trade presentation
Q trade presentation
 
Tree advanced
Tree advancedTree advanced
Tree advanced
 
MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3MLHEP 2015: Introductory Lecture #3
MLHEP 2015: Introductory Lecture #3
 
Comparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for RegressionComparison Study of Decision Tree Ensembles for Regression
Comparison Study of Decision Tree Ensembles for Regression
 
Introduction to Some Tree based Learning Method
Introduction to Some Tree based Learning MethodIntroduction to Some Tree based Learning Method
Introduction to Some Tree based Learning Method
 
L4. Ensembles of Decision Trees
L4. Ensembles of Decision TreesL4. Ensembles of Decision Trees
L4. Ensembles of Decision Trees
 
Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random Forests
 
Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)Machine Learning Algorithms (Part 1)
Machine Learning Algorithms (Part 1)
 
Machine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers EnsemblesMachine Learning and Data Mining: 16 Classifiers Ensembles
Machine Learning and Data Mining: 16 Classifiers Ensembles
 
Tda presentation
Tda presentationTda presentation
Tda presentation
 
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
Dr. Trevor Hastie: Data Science of GBM (October 10, 2013: Presented With H2O)
 
Winning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to StackingWinning Kaggle 101: Introduction to Stacking
Winning Kaggle 101: Introduction to Stacking
 
Tree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptionsTree models with Scikit-Learn: Great models with little assumptions
Tree models with Scikit-Learn: Great models with little assumptions
 
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting MachinesDecision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
Decision Tree Ensembles - Bagging, Random Forest & Gradient Boosting Machines
 
Credit Risk Evaluation Model
Credit Risk Evaluation ModelCredit Risk Evaluation Model
Credit Risk Evaluation Model
 
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
Machine learning basics using trees algorithm (Random forest, Gradient Boosting)
 
Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)Matrix Factorisation (and Dimensionality Reduction)
Matrix Factorisation (and Dimensionality Reduction)
 

Similaire à Kaggle "Give me some credit" challenge overview

Project crm submission sonali
Project crm submission sonaliProject crm submission sonali
Project crm submission sonaliSonali Gupta
 
Important Terminologies In Statistical Inference
Important  Terminologies In  Statistical  InferenceImportant  Terminologies In  Statistical  Inference
Important Terminologies In Statistical InferenceZoha Qureshi
 
Tpmg Manage Cust Prof Final
Tpmg Manage Cust Prof FinalTpmg Manage Cust Prof Final
Tpmg Manage Cust Prof FinalJohn Tyler
 
Bank churn with Data Science
Bank churn with Data ScienceBank churn with Data Science
Bank churn with Data ScienceCarolyn Knight
 
Customer Lifetime Value for Insurance Agents
Customer Lifetime Value for Insurance AgentsCustomer Lifetime Value for Insurance Agents
Customer Lifetime Value for Insurance AgentsScott Boren
 
How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.customersforever
 
Real Estate Executive Summary (MKT460 Lab #5)
Real Estate Executive Summary (MKT460 Lab #5)Real Estate Executive Summary (MKT460 Lab #5)
Real Estate Executive Summary (MKT460 Lab #5)Mira McKee
 
Being Right Starts By Knowing You're Wrong
Being Right Starts By Knowing You're WrongBeing Right Starts By Knowing You're Wrong
Being Right Starts By Knowing You're WrongData Con LA
 
David apple typeform retention story - saa stock (1)
David apple   typeform retention story - saa stock (1)David apple   typeform retention story - saa stock (1)
David apple typeform retention story - saa stock (1)SaaStock
 
Predictive Analytics for Customer Targeting: A Telemarketing Banking Example
Predictive Analytics for Customer Targeting: A Telemarketing Banking ExamplePredictive Analytics for Customer Targeting: A Telemarketing Banking Example
Predictive Analytics for Customer Targeting: A Telemarketing Banking ExamplePedro Ecija Serrano
 
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...3 Birds Marketing LLC
 
Stage Presentation
Stage PresentationStage Presentation
Stage PresentationSCI INFO
 
Transform Your Credit and Collections with Predictive Analytics
Transform Your Credit and Collections with Predictive AnalyticsTransform Your Credit and Collections with Predictive Analytics
Transform Your Credit and Collections with Predictive AnalyticsFIS
 
7 Sat Essay Score
7 Sat Essay Score7 Sat Essay Score
7 Sat Essay ScoreBeth Hall
 
Valiance Portfolio
Valiance PortfolioValiance Portfolio
Valiance PortfolioRohit Pandey
 
Speech To Omega Scorebaord 2009 Conference 041509
Speech To Omega Scorebaord 2009 Conference 041509Speech To Omega Scorebaord 2009 Conference 041509
Speech To Omega Scorebaord 2009 Conference 041509gnorth
 
Introduction to predictive modeling v1
Introduction to predictive modeling v1Introduction to predictive modeling v1
Introduction to predictive modeling v1Venkata Reddy Konasani
 

Similaire à Kaggle "Give me some credit" challenge overview (20)

Insurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis ProjectInsurance Churn Prediction Data Analysis Project
Insurance Churn Prediction Data Analysis Project
 
Project crm submission sonali
Project crm submission sonaliProject crm submission sonali
Project crm submission sonali
 
Important Terminologies In Statistical Inference
Important  Terminologies In  Statistical  InferenceImportant  Terminologies In  Statistical  Inference
Important Terminologies In Statistical Inference
 
Tpmg Manage Cust Prof Final
Tpmg Manage Cust Prof FinalTpmg Manage Cust Prof Final
Tpmg Manage Cust Prof Final
 
Bank churn with Data Science
Bank churn with Data ScienceBank churn with Data Science
Bank churn with Data Science
 
Customer Lifetime Value for Insurance Agents
Customer Lifetime Value for Insurance AgentsCustomer Lifetime Value for Insurance Agents
Customer Lifetime Value for Insurance Agents
 
How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.How to apply CRM using data mining techniques.
How to apply CRM using data mining techniques.
 
Real Estate Executive Summary (MKT460 Lab #5)
Real Estate Executive Summary (MKT460 Lab #5)Real Estate Executive Summary (MKT460 Lab #5)
Real Estate Executive Summary (MKT460 Lab #5)
 
Being Right Starts By Knowing You're Wrong
Being Right Starts By Knowing You're WrongBeing Right Starts By Knowing You're Wrong
Being Right Starts By Knowing You're Wrong
 
David apple typeform retention story - saa stock (1)
David apple   typeform retention story - saa stock (1)David apple   typeform retention story - saa stock (1)
David apple typeform retention story - saa stock (1)
 
Predictive Analytics for Customer Targeting: A Telemarketing Banking Example
Predictive Analytics for Customer Targeting: A Telemarketing Banking ExamplePredictive Analytics for Customer Targeting: A Telemarketing Banking Example
Predictive Analytics for Customer Targeting: A Telemarketing Banking Example
 
PATH | WD
PATH | WDPATH | WD
PATH | WD
 
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
Playing the Long Game: Revitalize Your Digital Marketing by Spending on the R...
 
Stage Presentation
Stage PresentationStage Presentation
Stage Presentation
 
Transform Your Credit and Collections with Predictive Analytics
Transform Your Credit and Collections with Predictive AnalyticsTransform Your Credit and Collections with Predictive Analytics
Transform Your Credit and Collections with Predictive Analytics
 
7 Sat Essay Score
7 Sat Essay Score7 Sat Essay Score
7 Sat Essay Score
 
7 Sat Essay Score
7 Sat Essay Score7 Sat Essay Score
7 Sat Essay Score
 
Valiance Portfolio
Valiance PortfolioValiance Portfolio
Valiance Portfolio
 
Speech To Omega Scorebaord 2009 Conference 041509
Speech To Omega Scorebaord 2009 Conference 041509Speech To Omega Scorebaord 2009 Conference 041509
Speech To Omega Scorebaord 2009 Conference 041509
 
Introduction to predictive modeling v1
Introduction to predictive modeling v1Introduction to predictive modeling v1
Introduction to predictive modeling v1
 

Plus de Adam Pah

Why Python?
Why Python?Why Python?
Why Python?Adam Pah
 
Quest overview
Quest overviewQuest overview
Quest overviewAdam Pah
 
A quick overview of why to use and how to set up iPython notebooks for research
A quick overview of why to use and how to set up iPython notebooks for researchA quick overview of why to use and how to set up iPython notebooks for research
A quick overview of why to use and how to set up iPython notebooks for researchAdam Pah
 
Pah res-potentia-netsci emailable-stagebuild
Pah res-potentia-netsci emailable-stagebuildPah res-potentia-netsci emailable-stagebuild
Pah res-potentia-netsci emailable-stagebuildAdam Pah
 
D3 interactivity Linegraph basic example
D3 interactivity Linegraph basic exampleD3 interactivity Linegraph basic example
D3 interactivity Linegraph basic exampleAdam Pah
 
Mercurial Tutorial
Mercurial TutorialMercurial Tutorial
Mercurial TutorialAdam Pah
 
Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Introduction to Mercurial, or "Why we're switching from SVN no matter what"Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Introduction to Mercurial, or "Why we're switching from SVN no matter what"Adam Pah
 

Plus de Adam Pah (7)

Why Python?
Why Python?Why Python?
Why Python?
 
Quest overview
Quest overviewQuest overview
Quest overview
 
A quick overview of why to use and how to set up iPython notebooks for research
A quick overview of why to use and how to set up iPython notebooks for researchA quick overview of why to use and how to set up iPython notebooks for research
A quick overview of why to use and how to set up iPython notebooks for research
 
Pah res-potentia-netsci emailable-stagebuild
Pah res-potentia-netsci emailable-stagebuildPah res-potentia-netsci emailable-stagebuild
Pah res-potentia-netsci emailable-stagebuild
 
D3 interactivity Linegraph basic example
D3 interactivity Linegraph basic exampleD3 interactivity Linegraph basic example
D3 interactivity Linegraph basic example
 
Mercurial Tutorial
Mercurial TutorialMercurial Tutorial
Mercurial Tutorial
 
Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Introduction to Mercurial, or "Why we're switching from SVN no matter what"Introduction to Mercurial, or "Why we're switching from SVN no matter what"
Introduction to Mercurial, or "Why we're switching from SVN no matter what"
 

Dernier

It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayNZSG
 
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Tina Ji
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Delhi Call girls
 
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...noida100girls
 
M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.Aaiza Hassan
 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableDipal Arora
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxAndy Lambert
 
Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Roland Driesen
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Lviv Startup Club
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communicationskarancommunications
 
Best Basmati Rice Manufacturers in India
Best Basmati Rice Manufacturers in IndiaBest Basmati Rice Manufacturers in India
Best Basmati Rice Manufacturers in IndiaShree Krishna Exports
 
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...Any kyc Account
 
GD Birla and his contribution in management
GD Birla and his contribution in managementGD Birla and his contribution in management
GD Birla and his contribution in managementchhavia330
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...lizamodels9
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear RegressionRavindra Nath Shukla
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst SummitHolger Mueller
 
Event mailer assignment progress report .pdf
Event mailer assignment progress report .pdfEvent mailer assignment progress report .pdf
Event mailer assignment progress report .pdftbatkhuu1
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfPaul Menig
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...Paul Menig
 

Dernier (20)

It will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 MayIt will be International Nurses' Day on 12 May
It will be International Nurses' Day on 12 May
 
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
Russian Faridabad Call Girls(Badarpur) : ☎ 8168257667, @4999
 
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
Best VIP Call Girls Noida Sector 40 Call Me: 8448380779
 
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...BEST ✨ Call Girls In  Indirapuram Ghaziabad  ✔️ 9871031762 ✔️ Escorts Service...
BEST ✨ Call Girls In Indirapuram Ghaziabad ✔️ 9871031762 ✔️ Escorts Service...
 
M.C Lodges -- Guest House in Jhang.
M.C Lodges --  Guest House in Jhang.M.C Lodges --  Guest House in Jhang.
M.C Lodges -- Guest House in Jhang.
 
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service AvailableCall Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
Call Girls Pune Just Call 9907093804 Top Class Call Girl Service Available
 
Monthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptxMonthly Social Media Update April 2024 pptx.pptx
Monthly Social Media Update April 2024 pptx.pptx
 
Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...Ensure the security of your HCL environment by applying the Zero Trust princi...
Ensure the security of your HCL environment by applying the Zero Trust princi...
 
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
Yaroslav Rozhankivskyy: Три складові і три передумови максимальної продуктивн...
 
Pharma Works Profile of Karan Communications
Pharma Works Profile of Karan CommunicationsPharma Works Profile of Karan Communications
Pharma Works Profile of Karan Communications
 
Best Basmati Rice Manufacturers in India
Best Basmati Rice Manufacturers in IndiaBest Basmati Rice Manufacturers in India
Best Basmati Rice Manufacturers in India
 
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
KYC-Verified Accounts: Helping Companies Handle Challenging Regulatory Enviro...
 
GD Birla and his contribution in management
GD Birla and his contribution in managementGD Birla and his contribution in management
GD Birla and his contribution in management
 
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
Call Girls In DLf Gurgaon ➥99902@11544 ( Best price)100% Genuine Escort In 24...
 
Regression analysis: Simple Linear Regression Multiple Linear Regression
Regression analysis:  Simple Linear Regression Multiple Linear RegressionRegression analysis:  Simple Linear Regression Multiple Linear Regression
Regression analysis: Simple Linear Regression Multiple Linear Regression
 
Progress Report - Oracle Database Analyst Summit
Progress  Report - Oracle Database Analyst SummitProgress  Report - Oracle Database Analyst Summit
Progress Report - Oracle Database Analyst Summit
 
Event mailer assignment progress report .pdf
Event mailer assignment progress report .pdfEvent mailer assignment progress report .pdf
Event mailer assignment progress report .pdf
 
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
VVVIP Call Girls In Greater Kailash ➡️ Delhi ➡️ 9999965857 🚀 No Advance 24HRS...
 
Grateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdfGrateful 7 speech thanking everyone that has helped.pdf
Grateful 7 speech thanking everyone that has helped.pdf
 
7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...7.pdf This presentation captures many uses and the significance of the number...
7.pdf This presentation captures many uses and the significance of the number...
 

Kaggle "Give me some credit" challenge overview

  • 2. What is the problem?
  • 3. What is the problem? • X Store has a retail credit card available to customers
  • 4. What is the problem? • X Store has a retail credit card available to customers • There can be a number of sources of loss from this product, but one is customer’s defaulting on their debt
  • 5. What is the problem? • X Store has a retail credit card available to customers • There can be a number of sources of loss from this product, but one is customer’s defaulting on their debt • This prevents the store from collecting payment for products and services rendered
  • 6. Is this problem big enough to matter?
  • 7. Is this problem big enough to matter? • Examining a slice of the customer database (150,000 customers) we find that 6.6% of customers were seriously delinquent in payment the last two years
  • 8. Is this problem big enough to matter? • Examining a slice of the customer database (150,000 customers) we find that 6.6% of customers were seriously delinquent in payment the last two years • If only 5% of their carried debt was the store credit card this is potentially an:
  • 9. Is this problem big enough to matter? • Examining a slice of the customer database (150,000 customers) we find that 6.6% of customers were seriously delinquent in payment the last two years • If only 5% of their carried debt was the store credit card this is potentially an: • Average loss of $8.12 per customer
  • 10. Is this problem big enough to matter? • Examining a slice of the customer database (150,000 customers) we find that 6.6% of customers were seriously delinquent in payment the last two years • If only 5% of their carried debt was the store credit card this is potentially an: • Average loss of $8.12 per customer • Potential overall loss of $1.2 million
  • 11. What can be done?
  • 12. What can be done? • There are numerous models that can be used to predict which customers will default
  • 13. What can be done? • There are numerous models that can be used to predict which customers will default • This could be used to decrease credit limits or cancel credit lines for current risky customers to minimize potential loss
  • 14. What can be done? • There are numerous models that can be used to predict which customers will default • This could be used to decrease credit limits or cancel credit lines for current risky customers to minimize potential loss • Or better screen which customers are approved for the card
  • 15. How will I do this?
  • 16. How will I do this? • This is a basic classification problem with important business implications
  • 17. How will I do this? • This is a basic classification problem with important business implications • We’ll examine a few simplistic models to get an idea of performance
  • 18. How will I do this? • This is a basic classification problem with important business implications • We’ll examine a few simplistic models to get an idea of performance • Explore decision tree methods to achieve better performance
  • 19. What will the models use to predict delinquency? Each customer has a number of attributes
  • 20. What will the models use to predict delinquency? Each customer has a number of attributes John Smith Delinquent: Yes Age: 23 Income: $1600 Number of Lines: 4
  • 21. What will the models use to predict delinquency? Each customer has a number of attributes John Smith Delinquent: Yes Age: 23 Income: $1600 Number of Lines: 4 Mary Rasmussen Delinquent: No Age: 73 Income: $2200 Number of Lines: 2
  • 22. What will the models use to predict delinquency? Each customer has a number of attributes John Smith Delinquent: Yes Age: 23 Income: $1600 Number of Lines: 4 Mary Rasmussen Delinquent: No Age: 73 Income: $2200 Number of Lines: 2 ...
  • 23. What will the models use to predict delinquency? Each customer has a number of attributes John Smith Delinquent: Yes Age: 23 Income: $1600 Number of Lines: 4 Mary Rasmussen Delinquent: No Age: 73 Income: $2200 Number of Lines: 2 ... We will use the customer attributes to predict whether they were delinquent
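To make the setup concrete, records like these can be represented as rows of attributes plus a label. A minimal illustrative sketch; the field names here are hypothetical stand-ins, not the dataset's actual columns:

```python
# Illustration only: one row of attributes per customer, plus the
# delinquency label the models will learn to predict.
customers = [
    {"name": "John Smith", "age": 23, "income": 1600,
     "number_of_lines": 4, "delinquent": True},
    {"name": "Mary Rasmussen", "age": 73, "income": 2200,
     "number_of_lines": 2, "delinquent": False},
]

# Attribute matrix (features) and labels, ready for a classifier
X = [[c["age"], c["income"], c["number_of_lines"]] for c in customers]
y = [c["delinquent"] for c in customers]
```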
  • 24. How do we make sure that our solution actually has predictive power?
  • 25. How do we make sure that our solution actually has predictive power? We have two slices of the customer dataset
  • 26. How do we make sure that our solution actually has predictive power? We have two slices of the customer dataset Train 150,000 customers Delinquency in dataset
  • 27. How do we make sure that our solution actually has predictive power? We have two slices of the customer dataset Train Test 150,000 customers Delinquency in dataset 101,000 customers Delinquency not in dataset
  • 28. How do we make sure that our solution actually has predictive power? We have two slices of the customer dataset Train Test 150,000 customers Delinquency in dataset 101,000 customers Delinquency not in dataset None of the customers in the test dataset are used to train the model
  • 29. Internally we validate our model performance with k-fold cross-validation Using only the train dataset we can get a sense of how well our model performs without externally validating it Train
  • 30. Internally we validate our model performance with k-fold cross-validation Using only the train dataset we can get a sense of how well our model performs without externally validating it Train Train 1 Train 2 Train 3
  • 31. Internally we validate our model performance with k-fold cross-validation Using only the train dataset we can get a sense of how well our model performs without externally validating it Train Train 1 Train 2 Train 3 Train 1 Train 2 Algorithm Training
  • 32. Internally we validate our model performance with k-fold cross-validation Using only the train dataset we can get a sense of how well our model performs without externally validating it Train Train 1 Train 2 Train 3 Train 1 Train 2 Algorithm Training Algorithm Testing Train 3
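As a concrete sketch of the rotation pictured above, assuming scikit-learn and NumPy arrays X (attributes) and y (labels):

```python
# Minimal 3-fold cross-validation sketch matching the
# Train 1 / Train 2 / Train 3 rotation (scikit-learn assumed).
import numpy as np
from sklearn.model_selection import KFold

def cross_validate(model, X, y, n_splits=3):
    """Train on two folds, score on the held-out third, rotate."""
    scores = []
    for train_idx, test_idx in KFold(n_splits=n_splits, shuffle=True).split(X):
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))  # accuracy
    return np.mean(scores)
```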
  • 33. What matters is how well we can predict the test dataset We judge this using accuracy: the number of predictions we got right out of the total number made So with 100,000 customers and 80% accuracy we will have correctly predicted whether 80,000 customers will default or not in the next two years
  • 35. Putting accuracy in context We could save $600,000 over two years if we correctly predicted 50% of the customers that would default and changed their account to prevent it
  • 36. Putting accuracy in context We could save $600,000 over two years if we correctly predicted 50% of the customers that would default and changed their account to prevent it The potential loss shrinks by ~$8,000 for every 100,000 customers with each percentage-point increase in accuracy (1% of 100,000 customers × $8.12 average potential loss ≈ $8,000)
  • 37. Looking at the actual data
  • 38. Looking at the actual data
  • 39. Looking at the actual data
  • 40. Looking at the actual data Assume $2,500
  • 41. Looking at the actual data Assume $2,500 Assume 0
  • 42. There is a continuum of algorithmic choices to tackle the problem Simpler, Quicker Complex, Slower
  • 43. There is a continuum of algorithmic choices to tackle the problem Simpler, Quicker Complex, Slower Random Chance
  • 44. There is a continuum of algorithmic choices to tackle the problem Simpler, Quicker Complex, Slower Random Chance 50%
  • 45. There is a continuum of algorithmic choices to tackle the problem Simpler, Quicker Complex, Slower Random Chance 50%
  • 46. There is a continuum of algorithmic choices to tackle the problem Simpler, Quicker Complex, Slower Random Chance 50% Simple Classification
  • 47. For simple classification we pick a single attribute and find the best split in the customers
  • 48-54. For simple classification we pick a single attribute and find the best split in the customers [histogram: number of customers vs. times past due, with candidate splits 1, 2, ... marked and the resulting true positive, true negative, false positive, and false negative regions labeled]
  • 55. We evaluate possible splits using accuracy, precision, and sensitivity Acc = Number correct / Total Number
  • 56. We evaluate possible splits using accuracy, precision, and sensitivity Acc = Number correct / Total Number Prec = True Positives / Number of People Predicted Delinquent
  • 57. We evaluate possible splits using accuracy, precision, and sensitivity Acc = Number correct / Total Number Prec = True Positives / Number of People Predicted Delinquent Sens = True Positives / Number of People Actually Delinquent
  • 58-60. We evaluate possible splits using accuracy, precision, and sensitivity [plot: all three metrics as a function of the split threshold on number of times 30-59 days past due] The best split achieves 0.61 KGI on the test set
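A hedged sketch of how a single-attribute split could be scored with these three metrics; times_past_due and delinquent are hypothetical NumPy arrays, one entry per customer, with delinquent boolean:

```python
import numpy as np

def split_scores(times_past_due, delinquent, threshold):
    """Score one threshold split on a single attribute."""
    predicted = times_past_due >= threshold       # predict "will default"
    tp = np.sum(predicted & delinquent)           # true positives
    tn = np.sum(~predicted & ~delinquent)         # true negatives
    accuracy = (tp + tn) / delinquent.size
    precision = tp / max(predicted.sum(), 1)      # of those predicted delinquent
    sensitivity = tp / max(delinquent.sum(), 1)   # of those actually delinquent
    return accuracy, precision, sensitivity
```

Scanning the threshold over the attribute's range and keeping the best split produces curves like those described above.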
  • 61. However, not all fields are as informative Using the number of times past due 60-89 days we achieve a KGI of 0.5
  • 62. However, not all fields are as informative Using the number of times past due 60-89 days we achieve a KGI of 0.5 The approach is naive and could be improved but our time is better spent on different algorithms
  • 63. Exploring algorithmic choices further Simpler, Quicker Complex, Slower Random Chance 0.50 Simple Classification 0.50-0.61
  • 64. Exploring algorithmic choices further Simpler, Quicker Complex, Slower Random Chance 0.50 Simple Classification 0.50-0.61 Random Forests
  • 65. A random forest starts from a decision tree Customer Data
  • 66. A random forest starts from a decision tree Customer Data Find the best split in a set of randomly chosen attributes
  • 67. A random forest starts from a decision tree Customer Data Find the best split in a set of randomly chosen attributes Is age <30?
  • 68. A random forest starts from a decision tree Customer Data Find the best split in a set of randomly chosen attributes Is age <30? No 75,000 Customers >30
  • 69. A random forest starts from a decision tree Customer Data Find the best split in a set of randomly chosen attributes Is age <30? No 75,000 Customers >30 Yes 25,000 Customers <30
  • 70. A random forest starts from a decision tree Customer Data Find the best split in a set of randomly chosen attributes Is age <30? No 75,000 Customers >30 Yes 25,000 Customers <30 ...
  • 71-74. A random forest is composed of many decision trees [diagram: many independent trees, each splitting the customer data at its best split into Data Set 1 (yes) and Data Set 2 (no)] Class assignment of a customer is based on how many of the decision trees “vote” for each class We use a large number of trees so as not to over-fit to the training data
  • 75. The Random Forest algorithm is easily implemented in Python or R for initial testing and validation
  • 76. The Random Forest algorithm is easily implemented in Python or R for initial testing and validation
  • 77. The Random Forest algorithm is easily implemented in Python or R for initial testing and validation It can also be parallelized with Mahout and Hadoop, since no tree depends on any other
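For instance, a minimal scikit-learn sketch; the file and column names follow the Kaggle "Give me some credit" data, the imputations mirror the assumptions above ($2,500 income, 0 dependents), and scoring="roc_auc" is used here as a stand-in for the KGI reported on these slides:

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

train = pd.read_csv("cs-training.csv", index_col=0)
train["MonthlyIncome"] = train["MonthlyIncome"].fillna(2500)         # assume $2,500
train["NumberOfDependents"] = train["NumberOfDependents"].fillna(0)  # assume 0

y = train["SeriousDlqin2yrs"]
X = train.drop(columns="SeriousDlqin2yrs")

# Each split considers a random subset of attributes (max_features)
forest = RandomForestClassifier(n_estimators=150, max_features="sqrt",
                                n_jobs=-1, random_state=0)
print(cross_val_score(forest, X, y, cv=3, scoring="roc_auc").mean())
```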
  • 78. A random forest performs well on the test set Random Forest 10 trees: 0.779 KGI
  • 79. A random forest performs well on the test set Random Forest 10 trees: 0.779 KGI 150 trees: 0.843 KGI
  • 80. A random forest performs well on the test set Random Forest 10 trees: 0.779 KGI 150 trees: 0.843 KGI 1000 trees: 0.850 KGI
  • 81. A random forest performs well on the test set Random Forest 10 trees: 0.779 KGI 150 trees: 0.843 KGI 1000 trees: 0.850 KGI
  • 82. A random forest performs well on the test set Random Forest 10 trees: 0.779 KGI 150 trees: 0.843 KGI 1000 trees: 0.850 KGI [bar chart: accuracy of random chance, simple classification, and random forests]
  • 83. Exploring algorithmic choices further Simpler, Quicker Complex, Slower Random Chance 0.50 Simple Classification 0.50-0.61 Random Forests 0.78-0.85
  • 84. Exploring algorithmic choices further Simpler, Quicker Complex, Slower Random Chance 0.50 Simple Classification 0.50-0.61 Random Forests 0.78-0.85 Gradient Tree Boosting
  • 85. Boosting Trees is similar to a Random Forest Customer Data Find the best split in a set of randomly chosen attributes Is age <30? No Customers >30 Data Yes Customers <30 Data ...
  • 86. Boosting Trees is similar to a Random Forest Customer Data Is age <30? No Customers >30 Data Yes Customers <30 Data ... Do an exhaustive search for best split
  • 87. How Gradient Boosting Trees differs from Random Forest ... Customer Data Best Split No Customers Data Set 2 Yes Customers Data Set 1 The first tree is optimized to minimize a loss function describing the data
  • 88. How Gradient Boosting Trees differs from Random Forest ... Customer Data Best Split No Customers Data Set 2 Yes Customers Data Set 1 The first tree is optimized to minimize a loss function describing the data The next tree is then optimized to fit whatever variability the first tree didn’t fit
  • 89. How Gradient Boosting Trees differs from Random Forest ... Customer Data Best Split No Customers Data Set 2 Yes Customers Data Set 1 The first tree is optimized to minimize a loss function describing the data The next tree is then optimized to fit whatever variability the first tree didn’t fit This is a sequential process in comparison to the random forest
  • 90. How Gradient Boosting Trees differs from Random Forest ... Customer Data Best Split No Customers Data Set 2 Yes Customers Data Set 1 The first tree is optimized to minimize a loss function describing the data The next tree is then optimized to fit whatever variability the first tree didn’t fit This is a sequential process in comparison to the random forest We also run the risk of over-fitting to the data, hence the learning rate, which shrinks each successive tree’s contribution
  • 91. Implementing Gradient Boosted Trees is easy in Python or R for initial testing and validation
  • 92. Implementing Gradient Boosted Trees is easy in Python or R for initial testing and validation There are implementations that use Hadoop, but achieving the best performance there is more complicated
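A minimal sketch under the same assumptions as the random-forest example (scikit-learn, with X and y prepared as before); the 100-tree, 0.1 learning-rate setting mirrors the results quoted next:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

# Trees are fit sequentially; learning_rate shrinks each tree's contribution
gbt = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)
print(cross_val_score(gbt, X, y, cv=3, scoring="roc_auc").mean())
```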
  • 93. Gradient Boosting Trees performs well on the dataset 100 trees, 0.1 Learning: 0.865022 KGI
  • 94. Gradient Boosting Trees performs well on the dataset 100 trees, 0.1 Learning: 0.865022 KGI 1000 trees, 0.1 Learning: 0.865248 KGI
  • 95. Gradient Boosting Trees performs well on the dataset 100 trees, 0.1 Learning: 0.865022 KGI 1000 trees, 0.1 Learning: 0.865248 KGI [plot: KGI as a function of learning rate]
  • 96. Gradient Boosting Trees performs well on the dataset 100 trees, 0.1 Learning: 0.865022 KGI 1000 trees, 0.1 Learning: 0.865248 KGI [plot: KGI as a function of learning rate] [bar chart: accuracy of random chance, simple classification, random forests, and boosting trees]
  • 97. Moving one step further in complexity Simpler, Quicker Complex, Slower Random Chance 0.50 Simple Classification 0.50-0.61 Random Forests 0.78-0.85 Gradient Tree Boosting 0.71-0.8659 Blended Method
  • 98. Or more accurately an ensemble of ensemble methods Algorithm Progression
  • 99. Or more accurately an ensemble of ensemble methods Algorithm Progression Random Forest
  • 100. Or more accurately an ensemble of ensemble methods Algorithm Progression Random Forest Extremely Random Forest
  • 101. Or more accurately an ensemble of ensemble methods Algorithm Progression Random Forest Extremely Random Forest Gradient Tree Boosting
  • 102. Or more accurately an ensemble of ensemble methods Algorithm Progression Train Data Probabilities Random Forest Extremely Random Forest Gradient Tree Boosting 0.1 0.5 0.01 0.8 0.7 . . .
  • 103. Or more accurately an ensemble of ensemble methods Algorithm Progression Train Data Probabilities Random Forest Extremely Random Forest Gradient Tree Boosting 0.1 0.5 0.01 0.8 0.7 . . . 0.15 0.6 0.0 0.75 0.68 . . .
  • 104. Or more accurately an ensemble of ensemble methods Algorithm Progression Train Data Probabilities Random Forest Extremely Random Forest Gradient Tree Boosting 0.1 0.5 0.01 0.8 0.7 . . . 0.15 0.6 0.0 0.75 0.68 . . .
  • 105. Combine all of the model information Train Data Probabilities 0.1 0.5 0.01 0.8 0.7 . . . 0.15 0.6 0.0 0.75 0.68 . . .
  • 106. Combine all of the model information Train Data Probabilities 0.1 0.5 0.01 0.8 0.7 . . . 0.15 0.6 0.0 0.75 0.68 . . . Optimize the set of train probabilities to the known delinquencies
  • 107. Combine all of the model information Train Data Probabilities 0.1 0.5 0.01 0.8 0.7 . . . 0.15 0.6 0.0 0.75 0.68 . . . Optimize the set of train probabilities to the known delinquencies Apply the same weighting scheme to the set of test data probabilities
  • 108. Implementation can be done in a number of ways Testing in Python or R is slower, due to the sequential nature of applying the algorithms It could be made faster in parallel, running each algorithm separately and combining the results
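One hedged sketch of the blend, assuming scikit-learn, the X and y frames from the earlier sketches, and a test frame X_test prepared the same way; a logistic regression stands in here for the weight optimization:

```python
import numpy as np
from sklearn.ensemble import (ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

models = [RandomForestClassifier(n_estimators=150),
          ExtraTreesClassifier(n_estimators=150),   # "extremely random forest"
          GradientBoostingClassifier(n_estimators=100, learning_rate=0.1)]

# Out-of-fold train probabilities, one column per model
train_probs = np.column_stack([
    cross_val_predict(m, X, y, cv=3, method="predict_proba")[:, 1]
    for m in models])

# Optimize a weighting of the train probabilities against known delinquencies
blender = LogisticRegression().fit(train_probs, y)

# Apply the same weighting scheme to the test-set probabilities
test_probs = np.column_stack([m.fit(X, y).predict_proba(X_test)[:, 1]
                              for m in models])
blended = blender.predict_proba(test_probs)[:, 1]
```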
  • 109. Assessing model performance Blending Performance, 100 trees: 0.864394 KGI
  • 110. Assessing model performance Blending Performance, 100 trees: 0.864394 KGI [bar chart: accuracy of random chance, simple classification, random forests, boosting trees, and the blend]
  • 111. Assessing model performance Blending Performance, 100 trees: 0.864394 KGI But this performance and the possibility of additional gains come at a distinct time cost [bar chart: accuracy of random chance, simple classification, random forests, boosting trees, and the blend]
  • 112. Examining the continuum of choices Simpler, Quicker Complex, Slower Random Chance 0.50 Simple Classification 0.50-0.61 Random Forests 0.78-0.85 Gradient Tree Boosting 0.71-0.8659 Blended Method 0.864
  • 113. What would be best to implement?
  • 114. What would be best to implement? There is a large amount of optimization in the blended method that could be done
  • 115. What would be best to implement? There is a large amount of optimization in the blended method that could be done However, this algorithm takes the longest to run. This constraint will apply in testing and validation also
  • 116. What would be best to implement? There is a large amount of optimization in the blended method that could be done However, this algorithm takes the longest to run. This constraint will apply in testing and validation also Random Forests returns a reasonably good result. It is quick and easily parallelized
  • 117. What would be best to implement? There is a large amount of optimization in the blended method that could be done However, this algorithm takes the longest to run. This constraint will apply in testing and validation also Random Forests returns a reasonably good result. It is quick and easily parallelized Gradient Tree Boosting returns the best result and runs reasonably fast. It is not as easily parallelized though
  • 118. What would be best to implement? Random Forests returns a reasonably good result. It is quick and easily parallelized Gradient Tree Boosting returns the best result and runs reasonably fast. It is not as easily parallelized though
  • 119. Increases in predictive performance have real business value Using any of the more complex algorithms we achieve an increase of ~35 percentage points over random chance
  • 120. Increases in predictive performance have real business value Using any of the more complex algorithms we achieve an increase of ~35 percentage points over random chance Potential decrease of ~$420k in losses by identifying customers likely to default in the training set alone
  • 121. Thank you for your time