Comparison of Credit Scoring Models for Probability of Default Estimation
White Paper by Rahul Dutta
Fierce competition in banking and other financial sectors, together with the recent global
financial crisis and the regulatory changes that followed, has brought credit scoring models
for business and personal loans to prominence. An accurate estimate of the credit risk
associated with customers has become paramount, as this information helps financial
institutions decide whether to grant credit to their customers. As the demand for credit
products rapidly increases and lenders consistently face potential financial losses from
customers who are likely to default, it is important for lenders to identify the main risk
factors contributing to the probability of default, as well as to predict the Probability of
Default (PD) as accurately as possible. Several modelling techniques are available for this
analysis. This paper compares different credit scoring models for Probability of Default and
Loss Given Default (LGD) estimation.
Executive Summary
Probability of Default and Loss Given Default analysis:
Probability of Default/Loss Given Default analysis is a method generally used by larger financial institutions
to calculate expected loss. A probability of default is assigned to a specific risk measure, per guidance, and
represents the expected percentage of defaults, most frequently measured by assessing past dues. Loss Given
Default measures the expected loss, net of any recoveries, expressed as a percentage, and will be unique to
the industry or segment.
When combined with the variable Exposure at Default (EAD), or current balance at default, the expected loss
calculation is deceptively simple:
$$\text{Expected Loss} = PD \times LGD \times EAD$$
While the equation itself may be simple, deriving the variables requires in-depth analysis. PD and LGD
represent the past experience of a financial institution but also what the institution expects to experience
in the future. Expected loss, being a function of EAD, PD and LGD, depends on the estimates of these
quantities, which can be obtained using various techniques and at different levels. EAD can be estimated
using simple OLS techniques or survival analysis, while LGD can be predicted using either a regression model
or decision trees. The most important part of this calculation is the estimation of the Probability of
Default. The common methodology for estimating PD is logistic modelling, which predicts whether a customer
will default on a particular debt. For example, if a bank provides auto loans to its customers, this method
predicts the probability that a particular customer defaults on that loan; given that the customer defaults,
LGD models then help to identify the loss amount for the bank.
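The expected loss calculation described above can be sketched in a few lines of Python. The figures (a 4% PD, a 45% LGD and a $12,000 exposure on a single auto loan) and the function name `expected_loss` are illustrative assumptions, not values from this paper.

```python
def expected_loss(pd_, lgd, ead):
    """Expected loss = PD x LGD x EAD, with PD and LGD given as fractions."""
    if not (0.0 <= pd_ <= 1.0 and 0.0 <= lgd <= 1.0):
        raise ValueError("PD and LGD must lie in [0, 1]")
    if ead < 0:
        raise ValueError("EAD must be non-negative")
    return pd_ * lgd * ead


# Hypothetical auto loan: 4% chance of default, 45% loss on default,
# $12,000 outstanding at the time of default.
el = expected_loss(0.04, 0.45, 12_000)
print(f"Expected loss: ${el:.2f}")
```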
The estimation of all these components can be done at different levels. While Probability of Default can be
calculated for every customer, LGD and EAD can be calculated at an aggregated level. For example, after
calculating the probability of default of individuals, EAD and LGD can be calculated for different time
periods or for groups of individuals with the same loan amount, the same FICO score, etc. Expected loss is
then calculated at an aggregated level.
Instead of calculating expected loss at an aggregated level, we can also calculate it for individuals in the
following way.
Let us assume we want to calculate the total expected loss for the next one year from an auto loan given by a
bank to its customers. Assume that for the i-th individual PD_it is the probability of default within t months,
EAD_it is the exposure at default and R_it is the recovery rate. So, for individual i at time t, the expected
loss is PD_it * EAD_it * (1 - R_it). The expected loss within the next one year for an individual is as follows:
$$EL_i = \sum_{t=1}^{12} PD_{it}\, EAD_{it}\, (1 - R_{it})$$
Clearly, this method gives us the expected loss values for each individual. Estimation of EAD_it and R_it can
be done in the following way.
• EAD_it can be expressed as a function of i and t. For example, suppose an individual has a total loan amount
of $100 at an interest rate of 10%, a tenure of one year and a monthly instalment of $11. If the individual
defaults after 3 months, having paid 3 × $11 = $33, the Exposure at Default (EAD) is $100 − $33 = $67.
• R_it can be calculated in a similar fashion from available financial values that are functions of i and t.
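The worked example above, together with the per-individual expected loss, can be sketched as follows. This is an illustration under simplifying assumptions that are not from the paper: EAD is taken as the loan amount minus the instalments already paid (interest accrual is ignored, as in the $67 figure), PD_it is read as the probability of defaulting in month t, and the monthly PD and recovery rate values are hypothetical.

```python
def exposure_at_default(loan_amount, instalment, months_paid):
    """Outstanding amount after `months_paid` instalments (interest ignored)."""
    return max(loan_amount - instalment * months_paid, 0.0)


def one_year_expected_loss(loan_amount, instalment, monthly_pd, recovery_rate):
    """Sum PD_it * EAD_it * (1 - R_it) over the months t = 1..len(monthly_pd).

    monthly_pd[t-1] is the assumed probability of defaulting in month t;
    a default in month t is assumed to occur before that month's instalment.
    """
    total = 0.0
    for t, pd_t in enumerate(monthly_pd, start=1):
        ead_t = exposure_at_default(loan_amount, instalment, t - 1)
        total += pd_t * ead_t * (1.0 - recovery_rate)
    return total


# The example from the text: $100 loan, $11 instalment, default after 3 months.
print(exposure_at_default(100, 11, 3))  # 67

# Hypothetical inputs: 0.5% default probability in each month, 40% recovery.
el = one_year_expected_loss(100, 11, [0.005] * 12, 0.40)
print(round(el, 4))
```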
The Probability of Default for the individual, PD_it, can be predicted in several ways. The following
techniques can be used to calculate the Probability of Default (PD).
Logistic Regression
Model Structure:
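The Logistic Regression takes the following form:
$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K$$
where p is the probability of the event occurring and each of the K independent variables x_k is weighted by
a coefficient β_k.
The above equation can be written as:
$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_K x_K)}}$$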
Interpretation: In logistic regression, a unit change in one factor multiplies the odds of default by a
constant factor, so the resulting change in the probability of default depends on the levels of the other
factors.
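The relationship between the linear predictor, the probability and the odds can be illustrated with a short sketch. The intercept, coefficients and applicant values below are hypothetical and chosen only to show the mechanics; they are not estimates from any data set.

```python
import math


def logistic_pd(intercept, coefficients, features):
    """Probability from a logistic model: p = 1 / (1 + exp(-(b0 + sum(b_k * x_k))))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))


def odds(p):
    """Odds p / (1 - p) corresponding to a probability p."""
    return p / (1.0 - p)


# Hypothetical model with two predictors: loan-to-asset ratio and FICO score / 100.
b0, betas = 2.0, [1.5, -0.8]
applicant = [0.9, 6.9]

p_base = logistic_pd(b0, betas, applicant)
# Raise the first predictor by one unit and compare the odds.
p_shift = logistic_pd(b0, betas, [applicant[0] + 1.0, applicant[1]])

print(f"PD = {p_base:.4f}")
print(f"odds ratio = {odds(p_shift) / odds(p_base):.4f}, "
      f"exp(beta_1) = {math.exp(betas[0]):.4f}")
```

The odds ratio equals exp(β_1) whatever the applicant's other values are, while the change in the probability itself differs from applicant to applicant.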
Data used: To identify defaulters by this method, the data used is usually account-level (customer-level)
data. At any given point in time, the model predicts from past records the chance that a customer with given
values of the predictors will default.
Predictors are usually the following:
• Loan Amount Issued
• Asset Amount
• Loan to Asset ratio
• No. of months on books
• Down Payment made
• No. of months employed
• FICO score
• Total amount of loans taken to date, etc.
Disadvantage of this procedure:
Although this method takes into account a significant amount of information and yields interpretable
coefficients for the predictors, its downside is that it uses closed cases only, that is, accounts where the
loan has been paid in full, has not been paid, or has received no payment for a certain period (e.g. 90 days).
In other words, accounts that have not paid any amount for a certain period of time are deliberately treated
as defaulters, and the model is then built to predict the probability of default. From an overview of this
analysis we can say that the behaviour of accounts/customers of a particular loan product varies across two
dimensions. The first is the set of predictor variables mentioned above, which account for the variability
between different customers and are used to quantify the probability of default. The second dimension is
time, which accounts for changes in payment behaviour within a particular observation (customer). This time
factor brings in the concept of a failure rate for a customer at any point in time.
So, it is quite clear that the logistic modelling approach can help to identify or predict defaulters but
cannot identify when someone will default, for example 6 months after approval of the loan or after 2 years.
Knowing this would be of considerable value to banking institutions.
To resolve this issue the approach that we can follow is a survival analysis technique which is able to take
into account the time variant factors and will be able to answer the following questions:
i) Which borrower will default?
ii) When will that borrower default?
The advantages of the survival analysis method over Logistic Regression for credit scoring are as follows:
i) Survival models naturally match the loan default process;
ii) They give a clearer approach to assessing the likely profitability of an applicant; and
iii) Survival estimates provide a forecast as a function of time.
We can understand this with an example.
We expect that rises in interest rates may increase the risk of an individual failing to make payments. This
can be due to increased payment demands on loans and mortgages as well as outstanding credit card debt. The
same applies to economic indicators such as the unemployment index, property prices, etc. Since these
variables vary over time, it is quite complicated to include them in a Logistic Regression setup, whereas in
Survival Analysis they can be easily incorporated.
Survival analysis provides the predicted distribution of T (time to default), along with a number of other
advantages:
• First, it provides a consistent means of predicting the probability of default within many different
periods of time (e.g., 12-month default rate, 24-month default rate, etc.).
• Second, it possesses an inherent mechanism for taking the most recent data into consideration. By contrast,
if one wishes to predict the probability of default within 24 months using Logistic Regression, customers
who joined within the past 24 months cannot be included when fitting the model.
• Third, it provides comprehensive information on the predicted behaviour of T via its predicted
distribution.
Proportional Hazard Model:
The objective here is to model the time of default for a particular customer. Let us assume T denotes the
random variable for time to default.
Let f(t) and F(t) denote the Probability Density Function (PDF) and Cumulative Distribution Function (CDF) of
the time 'T' to default (T=0 corresponds to the time of approval of the loan).
The hazard function is then defined as
$$h(t) = \frac{f(t)}{1 - F(t)}$$
and is interpreted as the instantaneous likelihood of defaulting at time t, given that the customer has not
defaulted prior to time t. From the definition of the hazard function, it can be shown that
$$F(t) = 1 - \exp\left(-\int_0^t h(s)\,ds\right)$$
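The step from the hazard function to the CDF can be written out explicitly. The derivation below is the standard survival-analysis argument; it uses only f(t), F(t) and h(t) as defined above, assumes F(0) = 0, and introduces s as a dummy integration variable.

```latex
\begin{align*}
h(t) &= \frac{f(t)}{1 - F(t)} = -\frac{d}{dt}\ln\bigl(1 - F(t)\bigr), \\
\int_0^t h(s)\,ds &= -\ln\bigl(1 - F(t)\bigr) + \ln\bigl(1 - F(0)\bigr) = -\ln\bigl(1 - F(t)\bigr), \\
F(t) &= 1 - \exp\!\left(-\int_0^t h(s)\,ds\right).
\end{align*}
```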
Let x_1, x_2, ..., x_M denote a set of M predictor variables for an applicant, and define the predictor
vector x = [x_1, x_2, ..., x_M]'.
In survival analysis, perhaps the most popular way to allow the distribution of T to depend on a set of
predictor variables is through a proportional hazards (PH) survival model, defined below.
Denoting the hazard function for a customer with predictors x by h(t; x) to indicate its explicit dependence
on x, a PH survival model represents:
$$h(t; x) = h_0(t)\,\exp(x'\beta)$$
where β is a vector of coefficients and h_0(t) is the baseline hazard, i.e. the hazard corresponding to the
baseline density f_0(t). The baseline function f_0(t) can follow a parametric distribution, e.g. Exponential,
Weibull, Log-Normal, etc.
A quantity that can be extracted from the predicted distribution is the probability that an applicant will
default within a specific time period, which is what Logistic Regression produces. For example, the predicted
probability of default within 24 months is simply F(24; x), the predicted CDF evaluated at month t = 24.
Conceptually, this is the area under the predicted PDF f(t; x) between t = 0 and t = 24 in the chart.
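To make this concrete, the sketch below evaluates F(24; x) for a PH model with a Weibull baseline. The Weibull choice, the shape and scale values and the coefficients are assumptions made for illustration. Under a PH model the cumulative hazard is H_0(t) exp(x'β), so F(t; x) = 1 − exp(−H_0(t) exp(x'β)).

```python
import math


def weibull_cumulative_hazard(t, shape, scale):
    """Baseline cumulative hazard H_0(t) = (t / scale) ** shape for a Weibull baseline."""
    return (t / scale) ** shape


def ph_default_probability(t, x, beta, shape, scale):
    """F(t; x) = 1 - exp(-H_0(t) * exp(x'beta)) under a proportional hazards model."""
    risk_score = math.exp(sum(b * v for b, v in zip(beta, x)))
    return 1.0 - math.exp(-weibull_cumulative_hazard(t, shape, scale) * risk_score)


# Hypothetical baseline (time in months) and coefficients.
shape, scale = 1.2, 120.0
beta = [0.8, -0.5]            # e.g. loan-to-asset ratio, standardised FICO score
applicant = [0.9, 0.3]

pd_12 = ph_default_probability(12, applicant, beta, shape, scale)
pd_24 = ph_default_probability(24, applicant, beta, shape, scale)
print(f"PD within 12 months: {pd_12:.4f}")
print(f"PD within 24 months: {pd_24:.4f}")
```

The same fitted model yields both the 12-month and the 24-month default probability, which is the first advantage of survival analysis listed earlier.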
But the above approach does not include the impact of different economic scenarios, i.e. what will be the
change in default rate of the customers joining in different periods of time? To take into account that time
variability, the Time Dependent Proportional Hazard (TDPH) model is of great use.
Time Dependent Proportional Hazard Model:
To talk about the time factor in a default modelling scenario let's look at the following chart which shows the
percentage of customers who defaulted within the first 9 months of the loan tenure. Three vintages were
considered (i.e., customers joining in the three different quarters: Quarter 2 of 2004, Quarter 4 of 2005, and
Quarter 4 of 2007). The customers in all three vintages fell into the same FICO scoring band (between 675 and
705). Hence, if one ignored market trends and attempted to predict default probability based only on the
applicants' predictor variables, one could naively conclude that customers in the three vintages all have the
same default probability. In reality, the figure shows that the default probability is much higher for the Q4
2007 vintage, because of the severe economic downturn in 2008.
To account for such temporal effects, one potential approach is to incorporate macroeconomic variables into
the PH survival model.
For a customer who joins during month τ, denote their hazard function by h(t; x, τ) to indicate explicitly
its dependence not only on x, but also on the time at which the customer joins. In the TDPH survival model,
one represents,
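As a sketch of what such a model can look like, one possible form, assumed here for illustration rather than taken from this paper, multiplies the PH hazard by a factor g(·) that depends on the calendar time τ + t. Here h_0(t) denotes the baseline hazard and β the coefficient vector of the PH model; the function g, the macroeconomic vector z and its coefficients γ are hypothetical names.

```latex
h(t; x, \tau) = h_0(t)\,\exp(x'\beta)\,g(\tau + t),
\qquad \text{for example } g(\tau + t) = \exp\bigl(z(\tau + t)'\gamma\bigr)
```

In this reading, z(τ + t) would collect macroeconomic indicators such as the unemployment rate or interest rate observed at calendar time τ + t, so that two customers with the same x but different joining months can have different hazards.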
Parameter estimation:
The maximum likelihood estimation method can be used to estimate the parameters of the model.
Data Structure:
Unlike the logistic modelling procedure, the PH or TDPH model does not require data for a particular time
window. For example, in logistic modelling, if we want to model the probability of default within 12 months,
we cannot use data from the year immediately preceding the time of data collection. TDPH/PH models do not
have this problem. The basic difference between the data structure for a Logistic Model and that for a PH
model is the censor variable. Suppose the data collected cover the period from 2002 to 2008 and we want to
find the probability of default for a loan with a cut-off point of 24 months. We create a binary (0/1)
variable using the cut-off point: '0' for accounts where either the loan tenure is over or payment has been
made, and '1' for censored accounts, i.e. those that are still open. In addition, some macroeconomic factors
such as the unemployment rate, house price index, interest rate, etc. are included. The macroeconomic factors
capture the variability between customers whose loans were approved at different points in time but who have
the same kind of loan information (e.g. Loan Amount, Tenure, Credit History, etc.); these factors correspond
to the time-dynamic part of the data.
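A minimal sketch of how such a censor variable and the observed time could be derived from account records follows. The record layout (an opening date and an optional end date), the helper names and the observation end date are hypothetical; the 0/1 coding mirrors the description above, with 1 marking accounts still open at the end of the data window.

```python
from datetime import date

OBSERVATION_END = date(2008, 12, 31)  # assumed end of the data collection window


def months_between(start, end):
    """Whole calendar months elapsed from `start` to `end`."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def survival_record(open_date, end_date=None):
    """Return (duration_in_months, censored) for one account.

    end_date is the date the account was closed; None means it was
    still open at OBSERVATION_END, i.e. the observation is censored.
    """
    censored = 1 if end_date is None else 0
    last_seen = OBSERVATION_END if end_date is None else end_date
    return months_between(open_date, last_seen), censored


accounts = [
    (date(2004, 5, 10), date(2006, 5, 10)),   # closed after 24 months
    (date(2007, 11, 1), None),                # still open at the end of the window
]
for opened, ended in accounts:
    print(survival_record(opened, ended))
```

Recently opened accounts such as the second one stay in the training data with their partial history, which is what allows PH/TDPH models to use the most recent observations.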
Sampling Methods:
Biased sampling methods can be used to draw the training sample.
Generalized Additive Modelling Approach for Probability of Default and Loss Given Default Modelling:
There are several credit scoring methods for calculating the probability of default and loss given default,
such as the Logit Model, Divergence-Discriminant Method, Neural Networks, Proportional Hazard model, etc.
Most of the known methods are parametric and therefore involve a distributional assumption in building the
model, which might not always be a good choice given the dynamic nature of the economy and customer
behaviour.
To counter that effect, a semi-parametric approach can be taken, in which the parametric part takes care of
the conventional aspect of the predictors and the non-parametric part takes care of the remainder, which does
not follow any known distribution. A suitable modelling approach of this kind is Generalized Additive
Modelling.
In statistics, a Generalized Additive Model (GAM) is a generalized linear model in which the linear predictor
depends linearly on unknown smooth functions of some predictor variables, and interest focuses on
inference about these smooth functions.
The model relates a univariate response variable, Y, to some predictor variables, x_i. An exponential family
distribution is specified for Y (for example normal, binomial or Poisson distributions) along with a link
function g (for example the identity or log functions) relating the expected value of Y to the predictor
variables via a structure such as:
$$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \dots + f_m(x_m)$$
In the widely-used parametric models, the relationships between continuous predictor variables and the
response variable are assumed to be known functional forms, even though they are mostly unknown in many
empirical applications. By contrast, the semi-parametric GAM does not assume any functional forms for
these relationships, but the data is allowed to determine them. The data-driven cubic B-spline (non-
parametric) method is used to estimate the GAM. By doing so, the underlying true relationships between
continuous predictor variables and the binary response variable may be uncovered.
Performance Measures:
Performance measures for PD, LGD models are universal. Following performance measures can be used to
validate the model:
Model Validation Techniques:
Usual Model validation techniques for Logistic Regression model like KS Analysis or ROC curve can also be
applied in case of a Proportional Hazard model or a Time Dependent Proportional Hazard Model.
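Since these checks only need a score and an observed outcome per account, they apply to any of the models. As an illustration, the KS statistic can be computed as the largest gap between the cumulative score distributions of defaulters and non-defaulters. The function name and the sample scores below are hypothetical.

```python
def ks_statistic(scores, labels):
    """Maximum gap between the cumulative score distributions of
    defaulters (label 1) and non-defaulters (label 0)."""
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both classes are required")
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Process all observations sharing the same score together (ties).
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best


# Hypothetical predicted PDs and observed outcomes (1 = default).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(ks_statistic(scores, labels))
```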
Following is a snapshot comparison of ROC curves for the following four procedures:
• Time Dependent Proportional Hazard Model
• Proportional Hazard Model
• Logistic Regression
• Logistic Regression with a dynamic time component
Receiver Operating Characteristic Curve:
ROC is commonly used to determine the overall classification power as well as to provide information on the
performance of a model at any cut-off score point. A widely used simple way to measure the performance of a
binary classifier is a 2×2 contingency table of type I and type II errors. For a given cut-off score of 0.5,
if the estimated probability is over 0.5, the loan is classified as a default or bad loan.
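The 2×2 table at a given cut-off can be built as in the sketch below. The function name, the cell labels and the sample data are hypothetical; a "positive" here means a loan classified as a default, and the two off-diagonal cells (FP and FN) are the two kinds of misclassification.

```python
def contingency_table(probabilities, labels, cutoff=0.5):
    """Count the four cells of the 2x2 table at the given cut-off."""
    table = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p, actual in zip(probabilities, labels):
        predicted = 1 if p > cutoff else 0
        if predicted == 1 and actual == 1:
            table["TP"] += 1   # defaulter correctly flagged
        elif predicted == 1 and actual == 0:
            table["FP"] += 1   # good loan flagged as bad
        elif predicted == 0 and actual == 1:
            table["FN"] += 1   # defaulter missed
        else:
            table["TN"] += 1   # good loan correctly passed
    return table


# Hypothetical predicted PDs and observed outcomes (1 = default).
probabilities = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(contingency_table(probabilities, labels))
```

Moving the cut-off trades one kind of error for the other, which is the trade-off the ROC curve traces over all cut-off values.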
This is a snapshot of a ROC curve.
The other measures for validating the performance of a model are:
• Area Under Curve (AUC): B + C, reflecting the total accuracy of the model
• Gini Coefficient: B/(A + B)
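Both measures can be computed directly from scores and outcomes. The sketch below uses the rank-based (Mann-Whitney) form of the AUC and the identity Gini = 2 × AUC − 1. That identity matches B/(A + B) on the assumption that B is the area between the ROC curve and the diagonal and A is the area above the curve, so that A + B = 0.5. The sample scores are hypothetical.

```python
def auc(scores, labels):
    """Probability that a random defaulter scores higher than a random
    non-defaulter; ties count as one half.

    Runs in O(n_bad * n_good), which is fine for small illustrations.
    """
    bad = [s for s, y in zip(scores, labels) if y == 1]
    good = [s for s, y in zip(scores, labels) if y == 0]
    if not bad or not good:
        raise ValueError("both classes are required")
    wins = 0.0
    for b in bad:
        for g in good:
            if b > g:
                wins += 1.0
            elif b == g:
                wins += 0.5
    return wins / (len(bad) * len(good))


def gini(scores, labels):
    """Gini coefficient derived from the AUC."""
    return 2.0 * auc(scores, labels) - 1.0


# Hypothetical predicted PDs and observed outcomes (1 = default).
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1, 1]
print(auc(scores, labels), gini(scores, labels))
```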
Data Structure:
Similar to the logistic modelling technique, this method also deals with closed accounts only; if one wants
to predict the probability of default within 12 months, the data used should be from at least one year
before the time of scoring. The following traits should be captured at a customer/account level: monthly
income (INC), debt-to-equity ratio (DE), the amount of the loan (FND), monthly payment (MPM), revolving
credit line utilization (UTIL), year(s) of employment experience (EMP), housing ownership (HOM) and
delinquency (DEL) reports within the recent history.
Application of GAM: GAM, a semi-parametric method proposed by Hastie and Tibshirani (1990), has been
applied to modelling bankruptcies (Berg 2007). It has also been applied to a comprehensive survey of the loan
recovery process of Italian banks.
References:
• A time-dependent proportional hazards survival model for credit risk analysis, May 2011 - J-K Im, DW Apley,
C Qi and X Shan, Department of Industrial Engineering & Management Sciences, Northwestern University
• Credit Scoring With Macroeconomic Variables Using Survival Analysis, May 2007 - Tony Bellotti and Jonathan
Crook, Credit Research Centre, Management School and Economics, University of Edinburgh
• Survival Analysis Methods For Personal Loan Data, April 2002 - Maria Stepanova, UBS AG, Financial Services
Group, and Lyn Thomas, Department of Management, University of Southampton
• A case study on using generalized additive models to fit credit rating scores, 2011 - Marlene Müller, Beuth
Hochschule für Technik Berlin
• Nonlinear and Semi-parametric Modelling of Personal Loan Credit Scoring, August 2013 - Nithi Sopitpongstorn,
Jean-Pierre Fenech and Param Silvapulle, Department of Econometrics and Business Statistics and Department of
Accounting and Finance, Monash University, Australia
Office
Bangalore: 389, 2nd Floor, 9th Main, HSR Layout, Sector – 7, Bangalore – 560 102
Phone: +91-80-42102154
US: 1013 Centre Road, ST # 403S, Wilmington, New Castle, DE 19805
Phone: +1 858 312 1075
www.bridgei2i.com | enquiries@bridgei2i.com
Facebook | Twitter | Google+ | LinkedIn: BRIDGEi2i
About BRIDGEi2i
BRIDGEi2i provides Business Analytics Solutions to enterprises globally, enabling them to achieve
accelerated business impact harnessing the power of data. These analytics services and technology
solutions enable business managers to consume more meaningful information from big data, generate
actionable insights from complex business problems and make data driven decisions across pan-
enterprise processes to create sustainable business impact. BRIDGEi2i has featured among the top 10
analytics and big data start-ups in several coveted publications.

 
Employee Engagement Analytics Suite – EmPOWER Demo
Employee Engagement Analytics Suite – EmPOWER DemoEmployee Engagement Analytics Suite – EmPOWER Demo
Employee Engagement Analytics Suite – EmPOWER Demo
 
A quick Introduction to Employee Engagement Analytics Suite – EmPOWER
A quick Introduction to Employee Engagement Analytics Suite – EmPOWERA quick Introduction to Employee Engagement Analytics Suite – EmPOWER
A quick Introduction to Employee Engagement Analytics Suite – EmPOWER
 
Sourcing & Procurement Analytics for the modern enterprise
Sourcing & Procurement Analytics for the modern enterpriseSourcing & Procurement Analytics for the modern enterprise
Sourcing & Procurement Analytics for the modern enterprise
 
Supply Chain Analytics for the modern enterprise
Supply Chain Analytics for the modern enterpriseSupply Chain Analytics for the modern enterprise
Supply Chain Analytics for the modern enterprise
 
Contextual Market Intelligence
Contextual Market Intelligence Contextual Market Intelligence
Contextual Market Intelligence
 

Whitepaper - Ready Reckoner: Probability of Default Modeling

  • 1. Comparison of Credit Scoring Models for Probability of Default Estimation White Paper by Rahul Dutta
  • 2. Fierce competition in banking and other financial sectors, together with the recent global financial crisis and the regulatory changes that followed, has brought credit scoring models for business and personal loans to prominence. Accurate estimation of the credit risk associated with customers has become paramount, because this information helps financial institutions decide whether to grant credit. As demand for credit products increases rapidly and lenders consistently face potential losses from customers who are likely to default, lenders need to identify the main risk factors contributing to default and to predict the Probability of Default (PD) as accurately as possible. Several modelling techniques are available for this analysis. This paper compares different credit scoring models for Probability of Default and Loss Given Default (LGD) estimation.

Executive Summary

Probability of Default and Loss Given Default analysis: PD/LGD analysis is a method used, generally by larger financial institutions, to calculate expected loss. A probability of default is assigned to a specific risk measure, per guidance, and represents the percentage expectation of default, measured most frequently by assessing past dues. Loss Given Default measures the expected loss, net of any recoveries, expressed as a percentage, and is specific to the industry or segment.

When combined with Exposure at Default (EAD), or the current balance at default, the expected loss (EL) calculation is deceptively simple:

$$EL = PD \times LGD \times EAD$$

While the equation itself may be simple, deriving the variables requires in-depth analysis. PD and LGD represent the past experience of a financial institution, but also what the institution expects to experience in the future. Because expected loss is a function of EAD, PD and LGD, it is only as good as the estimates of these three quantities.

EAD, PD and LGD can be estimated using various techniques and at different levels. EAD can be estimated using simple OLS techniques or survival analysis, while LGD can be predicted using either a regression model or decision trees. The most important part of the calculation is the estimation of the Probability of Default. The common methodology for estimating PD is logistic modelling, which predicts whether a customer will default on a particular debt. For example, if a bank provides auto loans to its customers, this method predicts the probability that a particular customer defaults on that loan, and, given that the customer defaults, LGD models help to identify the loss amount for the bank.

These components can be estimated at different levels. While Probability of Default can be calculated for every customer, LGD and EAD can be calculated at an aggregated level. For example, after calculating the probability of default of individuals, EAD and LGD can be calculated for different time periods, or for groups of individuals with the same loan amount, the same FICO score and so on. Expected loss is then calculated at an aggregated level.
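The aggregated expected-loss calculation described above can be sketched in a few lines of Python. This is a minimal illustration only: the segment names and the PD, LGD and EAD figures are invented for the example and do not come from the paper.

```python
def expected_loss(pd_, lgd, ead):
    """Expected loss for one exposure or segment: EL = PD * LGD * EAD."""
    if not (0.0 <= pd_ <= 1.0 and 0.0 <= lgd <= 1.0):
        raise ValueError("PD and LGD must be fractions between 0 and 1")
    return pd_ * lgd * ead


# Hypothetical segments (illustrative numbers only, not from the paper).
segments = [
    {"name": "FICO 675-705", "pd": 0.04, "lgd": 0.45, "ead": 1_000_000.0},
    {"name": "FICO 706-740", "pd": 0.02, "lgd": 0.40, "ead": 2_500_000.0},
]

# Portfolio expected loss is the sum of the segment-level expected losses.
portfolio_el = sum(expected_loss(s["pd"], s["lgd"], s["ead"]) for s in segments)
print(round(portfolio_el, 2))
```

Each segment contributes PD × LGD × EAD, so a segment with a low PD can still dominate the total if its exposure is large.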
  • 3. Instead of calculating expected loss at an aggregated level, we can also calculate it for each individual, as follows. Suppose we want to calculate the total expected loss over the next year on auto loans given by a bank to its customers. For the i-th individual, let $PD_{it}$ be the probability of default within t months, $EAD_{it}$ the exposure at default and $R_{it}$ the recovery rate. For individual i at time t, the expected loss is $PD_{it} \times EAD_{it} \times (1 - R_{it})$. The expected loss within the next year for an individual is therefore:

$$EL_i = \sum_{t=1}^{12} PD_{it} \times EAD_{it} \times (1 - R_{it})$$

This method gives the expected loss for each individual. $EAD_{it}$ and $R_{it}$ can be estimated in the following way.
- $EAD_{it}$ can be expressed as a function of 'i' and 't'. For example, suppose an individual's total loan amount is $100, with interest of 10%, a tenure of one year and a monthly payment of $11. If the individual defaults after 3 months, the Exposure at Default is $100 − 3 × $11 = $67.
- $R_{it}$ can be calculated in a similar fashion from available financial values that are functions of 'i' and 't'.

The Probability of Default for the individual, $PD_{it}$, can be predicted in several ways. The following techniques can be used to calculate PD.

Logistic Regression

Model Structure: Logistic regression takes the following form:

$$\ln\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_K x_K$$

where 'p' is the probability of the event occurring and each of the 'K' independent variables 'x' is weighted by a coefficient 'β'. The above equation can be written as:

$$p = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_K x_K)}}$$

Interpretation: In logistic regression, a change in one factor multiplies the odds of default by a fixed factor, so the resulting change in risk depends on the level of the other factors.

Data used: To identify defaulters with this method, account-level (customer-level) data is normally used. Based on past records, the model predicts the chance that a customer with given predictor values will default.

Predictors are usually the following:
- Loan amount issued
- Asset amount
- Loan-to-asset ratio
- Number of months on books
- Down payment made
- Number of months employed
- FICO score
- Total amount of loan taken to date, etc.
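The per-individual calculation above can be illustrated with a short Python sketch. It rests on several assumptions that are not in the paper: the coefficients are made up, the exposure follows the simple "loan amount minus payments made" rule implied by the $67 example, the recovery rate is held constant, and the monthly default probabilities are treated as the probability of defaulting in each month.

```python
import math


def logistic_pd(x, intercept, coefs):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    z = intercept + sum(b * v for b, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))


def ead_after(principal, monthly_payment, months_paid):
    """Exposure remaining after `months_paid` payments.

    Assumption: balance = principal - payments made, floored at zero,
    which reproduces the $100 - 3 * $11 = $67 example.
    """
    return max(principal - monthly_payment * months_paid, 0.0)


def expected_loss_one_year(monthly_pd, principal, monthly_payment, recovery_rate):
    """Sum over months t of PD_t * EAD_t * (1 - R), with a constant R (assumed)."""
    return sum(
        pd_t * ead_after(principal, monthly_payment, t) * (1.0 - recovery_rate)
        for t, pd_t in enumerate(monthly_pd, start=1)
    )


# Hypothetical coefficients for two standardised predictors (illustrative only).
p = logistic_pd([0.8, -1.2], intercept=-3.0, coefs=[0.9, 0.5])
print(round(p, 4))
print(ead_after(100, 11, 3))
print(round(expected_loss_one_year([0.01] * 12, 100, 11, 0.4), 2))
```

In practice the coefficients would be estimated from account-level data, and the exposure and recovery profiles would come from the loan's actual amortisation schedule.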
  • 4. Disadvantage of this procedure: Although this method takes a significant amount of information into account and its coefficients are easy to interpret, its downside is that it uses closed cases only, that is, accounts where the loan has been fully repaid, has not been repaid, or has received no payment for a certain period (e.g. 90 days). In other words, accounts that have made no payment for a set period are deliberately labelled as defaulters, and the model is then built to predict the probability of default.

The behaviour of accounts/customers of a particular loan product varies along two dimensions. The first is the set of predictor variables listed above, which accounts for the variability between customers and is used to quantify the probability of default. The second is time, which accounts for changes in payment behaviour within a single observation (customer). The time factor brings in the concept of a failure rate for a customer at any point in time.

So logistic modelling can help to identify or predict defaulters, but it cannot identify when someone will default, whether 6 months after loan approval or after 2 years; knowing this would be of considerable value to banking institutions. To resolve this issue we can use survival analysis, which can take time-varying factors into account and answer the following questions:
i) Which borrower will default?
ii) When will that borrower default?

The advantages of survival analysis over logistic regression for credit scoring are as follows:
i) Survival models naturally match the loan default process,
ii) They give a clearer approach to assessing the likely profitability of an applicant, and
iii) Survival estimates provide a forecast as a function of time.

An example makes this concrete. We expect that rises in interest rates may increase the risk of an individual failing to make payments, because of increased payment demands on loans and mortgages as well as on outstanding credit card debt. The same holds for economic indicators such as the unemployment index, property prices and so on. Because these variables vary over time, they are complicated to include in a logistic regression setup, whereas they can be incorporated easily in survival analysis.

Survival analysis provides the predicted distribution of 'T' (time to default), along with a number of other advantages:
- First, it provides a consistent means of predicting the probability of default within many different periods of time (e.g. 12-month default rate, 24-month default rate, etc.).
- Second, it has an inherent mechanism for taking the most recent data into consideration. By contrast, if one uses logistic regression to predict the probability of default within 24 months, customers who joined within the past 24 months cannot be included when fitting the model.
- Third, it provides comprehensive information on the predicted behaviour of 'T' via its predicted distribution.
  • 5. Proportional Hazard Model: The objective here is to model the time of default for a particular customer. Let T denote the random variable for time to default, and let f(t) and F(t) denote the Probability Density Function (PDF) and Cumulative Distribution Function (CDF) of T (T = 0 corresponds to the time of approval of the loan). The hazard function is then defined as

$$h(t) = \frac{f(t)}{1 - F(t)}$$

and is interpreted as the instantaneous likelihood of defaulting at time 't', given that the customer has not defaulted prior to time 't'. From the definition of the hazard function, it can be shown that

$$F(t) = 1 - \exp\left(-\int_0^t h(u)\,du\right)$$

Let $x_1, x_2, \ldots, x_M$ denote a set of 'M' predictor variables for an applicant, and define the predictor vector $x = [x_1, x_2, \ldots, x_M]'$.

In survival analysis, perhaps the most popular way to allow the distribution of 'T' to depend on a set of predictor variables is through a proportional hazards (PH) survival model. Denoting the hazard function for a customer with predictors 'x' by h(t; x), to indicate its explicit dependence on x, a PH survival model represents:

$$h(t; x) = h_0(t)\,\exp(x'\beta)$$

where $h_0(t)$ is the baseline hazard. The baseline function $f_0(t)$ can follow a parametric distribution such as the exponential, Weibull or log-normal.

A quantity that can be extracted from the predicted distribution is the probability that an applicant will default within a specific time period, which is what logistic regression produces. For example, the predicted probability of default within 24 months is simply F(24; x), the predicted CDF evaluated at month t = 24. Conceptually, this is the area under the predicted PDF f(t; x) between t = 0 and t = 24 in the chart.

The above approach does not, however, include the impact of different economic scenarios, i.e. how the default rate changes for customers joining in different periods of time. To take that time variability into account, the Time Dependent Proportional Hazard (TDPH) model is of great use.
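As an illustration of how a probability of default within a given horizon follows from a PH model, the sketch below assumes a Weibull baseline, for which the baseline cumulative hazard is $H_0(t) = (t/\text{scale})^{\text{shape}}$, so that $F(t; x) = 1 - \exp(-H_0(t)\,e^{x'\beta})$. The shape, scale and coefficient values are made up for the example; in practice they would be estimated from data.

```python
import math


def weibull_cum_hazard(t, shape, scale):
    """Baseline cumulative hazard for a Weibull baseline: H0(t) = (t / scale) ** shape."""
    return (t / scale) ** shape


def ph_default_prob(t, x, beta, shape, scale):
    """Probability of default within t months under a PH model.

    h(t; x) = h0(t) * exp(x'beta)  implies  F(t; x) = 1 - exp(-H0(t) * exp(x'beta)).
    """
    relative_risk = math.exp(sum(b * v for b, v in zip(beta, x)))
    return 1.0 - math.exp(-weibull_cum_hazard(t, shape, scale) * relative_risk)


# Hypothetical parameter values (illustrative only).
beta = [0.6, -0.4]
applicant = [1.0, 0.5]
for horizon in (12, 24):
    pd_h = ph_default_prob(horizon, applicant, beta, shape=1.3, scale=120.0)
    print(horizon, round(pd_h, 4))
```

The same fitted model yields the 12-month and 24-month default probabilities simply by evaluating the predicted CDF at different horizons.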
  • 6. Time Dependent Proportional Hazard Model: To see the role of the time factor in default modelling, consider the following chart, which shows the percentage of customers who defaulted within the first 9 months of the loan tenure. Three vintages were considered (i.e. customers joining in three different quarters: Quarter 2 of 2004, Quarter 4 of 2005 and Quarter 4 of 2007). The customers in all three vintages fell into the same FICO scoring band (between 675 and 705). Hence, if one ignored market trends and attempted to predict default probability based only on the applicants' predictor variables, one could naively conclude that customers in the three vintages all have the same default probability. In reality, the figure shows that the default probability is much higher for the Q4 2007 vintage, because of the severe economic downturn in 2008.

To account for such temporal effects, one potential approach is to incorporate macroeconomic variables into the PH survival model. For a customer who joins during month 'τ', denote the hazard function by h(t; x, τ) to indicate explicitly its dependence not only on x, but also on the time at which the customer joins. The TDPH survival model then specifies h(t; x, τ) as a function of t, x and τ.

Parameter estimation: The Maximum Likelihood estimation method can be used to estimate the parameters of the equation.

Data Structure: Unlike logistic modelling, the PH and TDPH models do not require data for a particular time window. For example, in logistic modelling, if we want to model the probability of default within 12 months, we cannot use any data from the year preceding the time of data collection. TDPH/PH models do not have this problem. The basic difference between the data structures of the logistic model and the PH model is the censor variable. Suppose the data was collected from 2002 to 2008 and we want to find the probability of default for a loan with a cut-off point of 24 months. We create a binary (0/1) variable using the cut-off point: '0' marks accounts where either the loan tenure is over or payment has been made, and '1' marks censored accounts, i.e. those that are still open. In addition, some macroeconomic factors such as the unemployment rate, house price index, interest rate etc. are included. The macroeconomic factors capture the variability between customers whose loans were approved at different points in time but who have the same kind of loan information (e.g. loan amount, tenure, credit history etc.); they form the time-dynamic part of the data.

Sampling Methods: Biased sampling methods can be used to draw the training sample.
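The censor variable described above can be sketched as follows. This is an assumption-laden illustration: the function names, the use of calendar dates and the month-counting rule are hypothetical, and the 0/1 coding follows the text (1 for accounts still open at the end of the observation window, 0 for closed accounts).

```python
from datetime import date


def months_between(start, end):
    """Whole calendar months from `start` to `end` (day of month ignored)."""
    return (end.year - start.year) * 12 + (end.month - start.month)


def to_survival_row(opened, closed, observation_end):
    """Return (duration_in_months, censor) for one account.

    censor = 1: account still open at the end of the observation window (censored).
    censor = 0: account closed within the window.
    """
    if closed is None or closed > observation_end:
        return months_between(opened, observation_end), 1
    return months_between(opened, closed), 0


# Hypothetical accounts observed up to the end of 2008.
window_end = date(2008, 12, 31)
print(to_survival_row(date(2004, 4, 15), date(2005, 4, 20), window_end))
print(to_survival_row(date(2007, 10, 1), None, window_end))
```

Because open accounts are kept with a censor flag rather than dropped, recently opened loans still contribute information to the fit, which is the data advantage over the logistic setup noted above.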
  • 7. Generalized Additive Modelling Approach for Probability of Default and Loss Given Default modelling: There are several credit scoring methods for calculating the probability of default and loss given default, such as the Logit Model, the Divergence-Discriminant Method, Neural Networks, the Proportional Hazard model etc. Most of the known methods are parametric and therefore involve a distributional assumption, which might not always be a good choice given the dynamic nature of the economy and of customer behaviour. To counter that effect, a semi-parametric approach can be taken, in which the parametric part handles the conventional aspect of the predictors and the non-parametric part handles the remainder, which does not follow any known distribution. A suitable modelling approach of this kind is Generalized Additive Modelling.

In statistics, a Generalized Additive Model (GAM) is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions. The model relates a univariate response variable, 'Y', to some predictor variables, $x_i$. An exponential family distribution is specified for 'Y' (for example the normal, binomial or Poisson distribution), along with a link function g (for example the identity or log function) relating the expected value of Y to the predictor variables via a structure such as:

$$g(E(Y)) = \beta_0 + f_1(x_1) + f_2(x_2) + \cdots + f_m(x_m)$$

In the widely used parametric models, the relationships between continuous predictor variables and the response variable are assumed to have known functional forms, even though these forms are mostly unknown in empirical applications. By contrast, the semi-parametric GAM does not assume any functional form for these relationships, but lets the data determine them. The data-driven cubic B-spline (non-parametric) method is used to estimate the GAM. By doing so, the true underlying relationships between the continuous predictor variables and the binary response variable may be uncovered.

Performance Measures: Performance measures for PD and LGD models are universal, and the same measures can be used to validate each of these models.

Model Validation Techniques: The usual model validation techniques for a logistic regression model, such as KS analysis or the ROC curve, can also be applied to a Proportional Hazard model or a Time Dependent Proportional Hazard model. The ROC curves of the following four procedures are compared:
- Time Dependent Proportional Hazard Model
- Proportional Hazard Model
- Logistic Regression
- Logistic Regression with a dynamic time component
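The KS and ROC-based validation measures mentioned above can be computed directly from model scores and observed outcomes. The sketch below is a plain-Python illustration on made-up scores; it uses the standard rank-based definition of the area under the ROC curve and the standard relation Gini = 2 × AUC − 1, which are not necessarily the exact conventions used in the paper's charts.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a defaulter (label 1) receives a higher score
    than a non-defaulter (label 0), counting ties as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini coefficient from the standard relation Gini = 2 * AUC - 1."""
    return 2.0 * auc(scores, labels) - 1.0


def ks_statistic(scores, labels):
    """Largest gap between the score CDFs of defaulters and non-defaulters."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    best = 0.0
    for cut in sorted(set(scores)):
        cdf_pos = sum(s <= cut for s in pos) / len(pos)
        cdf_neg = sum(s <= cut for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Made-up predicted default probabilities and observed outcomes.
scores = [0.9, 0.4, 0.6, 0.2]
labels = [1, 1, 0, 0]
print(auc(scores, labels), gini(scores, labels), ks_statistic(scores, labels))
```

Because these measures depend only on scores and outcomes, the same functions can be applied to a logistic model's probabilities or to the F(t; x) values produced by a PH or TDPH model at a fixed horizon.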
  • 8. Receiver Operating Characteristic Curve: The ROC curve is commonly used to determine overall classification power, as well as to provide information on the performance of a model at any cut-off score. A simple and widely used way to measure the performance of a binary classifier is a 2×2 contingency table of type I and type II errors. For a given cut-off score of 0.5, if the estimated probability is over 0.5, the loan is classified as a default, or bad loan. This is a snapshot of a ROC curve. The other measures for validating the performance of a model, expressed in terms of the regions A, B and C of the chart, are:
- Area Under Curve (AUC): B+C, reflecting the total accuracy of the model
- Gini Coefficient: B/(A+B)

Data Structure: Like the logistic modelling technique, this method deals with closed accounts only; if one wants to predict the probability of default within 12 months, the data used should be at least one year older than the time of scoring. The following traits should be captured at customer/account level: monthly income (INC), debt-to-equity ratio (DE), amount of loan (FND), monthly payment (MPM), revolving credit line utilization (UTIL), years of employment experience (EMP), housing ownership (HOM) and delinquency (DEL) reports within the recent history.

Application of GAM: GAM, a semi-parametric method proposed by Hastie and Tibshirani (1990), has been applied to modelling bankruptcies (Berg 2007). It has also been applied to a comprehensive survey on the loan recovery process of Italian banks.
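To make the additive structure of a GAM concrete, the sketch below scores a customer with a logit link and two smooth component functions. It only illustrates the form g(E(Y)) = β0 + f1(x1) + f2(x2): the component functions and the intercept are hand-written, hypothetical stand-ins, whereas in a fitted GAM they would be estimated from data (for example with the spline method mentioned earlier).

```python
import math


def gam_logit_pd(x, intercept, smooths):
    """PD from an additive predictor with a logit link.

    eta = b0 + sum_j f_j(x_j);  PD = 1 / (1 + exp(-eta)).
    """
    eta = intercept + sum(f(v) for f, v in zip(smooths, x))
    return 1.0 / (1.0 + math.exp(-eta))


def f_util(util):
    """Hypothetical smooth effect of credit line utilization (UTIL): U-shaped."""
    return 2.0 * (util - 0.5) ** 2


def f_income(inc):
    """Hypothetical smooth effect of monthly income (INC): decreasing, flattening out."""
    return -0.3 * math.log1p(inc / 1000.0)


# Illustrative customers: (UTIL, INC).
for customer in ([0.5, 3000.0], [0.9, 3000.0], [0.9, 8000.0]):
    print(customer, round(gam_logit_pd(customer, -2.0, [f_util, f_income]), 4))
```

The point of the additive form is that each predictor's effect can take whatever shape the data supports, such as the U-shape assumed here for utilization, instead of being forced into a single linear coefficient.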
References:
- A time-dependent proportional hazards survival model for credit risk analysis, May 2011 - J-K Im, DW Apley, C Qi and X Shan, Department of Industrial Engineering & Management Sciences, Northwestern University
- Credit Scoring With Macroeconomic Variables Using Survival Analysis, May 2007 - Tony Bellotti and Jonathan Crook, Credit Research Centre, Management School and Economics, University of Edinburgh
- Survival Analysis Methods For Personal Loan Data, April 2002 - Maria Stepanova, UBS AG, Financial Services Group; Lyn Thomas, Department of Management, University of Southampton
- A case study on using generalized additive models to fit credit rating scores, 2011 - Marlene Müller, Beuth Hochschule für Technik Berlin
- Nonlinear and Semi-parametric Modelling of Personal Loan Credit Scoring, August 2013 - Nithi Sopitpongstorn, Jean-Pierre Fenech and Param Silvapulle, Department of Econometrics and Business Statistics and Department of Accounting and Finance, Monash University, Australia
  • 9. Office
Bangalore: 389, 2nd Floor, 9th Main, HSR Layout, Sector – 7, Bangalore – 560 102. Phone: +91-80-42102154
US: 1013 Centre Road, ST # 403S, Wilmington, New Castle, DE 19805. Phone: +1 858 312 1075
www.bridgei2i.com | enquiries@bridgei2i.com

About BRIDGEi2i: BRIDGEi2i provides Business Analytics Solutions to enterprises globally, enabling them to achieve accelerated business impact by harnessing the power of data. These analytics services and technology solutions enable business managers to consume more meaningful information from big data, generate actionable insights from complex business problems and make data-driven decisions across pan-enterprise processes to create sustainable business impact. BRIDGEi2i has featured among the top 10 analytics and big data start-ups in several coveted publications.