Reject Inference Methodologies on Underwriting Model
Summary
 Problem Statement:
 Unlike “known good/bad sample” models such as behavior models or loss forecasting models, application scorecards are developed to predict the behavior of all applicants, and a model based only on previously approved applicants can be inaccurate (“sample bias”).
 Previous accept/decline decisions were made systematically, not at random, so the accepted population is a biased sample that is not representative of the rejects.
 Reject Inference:
 A process in which the performance of previously rejected applications is analyzed to estimate how they would have behaved. It is equivalent to saying “if these applicants had been accepted, this is how they would have performed.” This gives relevance to the scorecard development process by recreating population performance at a 100% approval rate.
Reasons for Reject Inference
 Ignoring rejects would produce a scorecard that is not applicable to the total applicant population: the sample bias issue described above.
 Reject inference also incorporates the influence of past decision making, such as low-side overrides, into the scorecard development process. If a scorecard were developed using only the known goods and bads, it could learn misleading relationships, for example that applicants with serious delinquency are very good credit risks.
 From a decision-making perspective, such as cut-off adjustment, reject inference enables accurate and realistic expected performance forecasts for all applicants. Coupled with swap set analysis, it allows us either to approve the same number of applicants but obtain better performance through better selection, or to approve more good customers while keeping the existing bad rate.
 In environments with low or medium approval rates and low bad rates, reject inference helps identify opportunities to increase market share with risk-adjusted strategies.
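The swap set idea above can be illustrated with a small simulation. Everything here is synthetic (made-up scores and outcomes); it only shows the mechanics of holding the approval count fixed and comparing the swap-in and swap-out sets.

```python
import numpy as np

# Hypothetical swap set analysis: compare an old and a new score on the
# same applicant pool while holding the number of approvals fixed.
rng = np.random.default_rng(0)
n = 1000
old_score = rng.normal(600, 50, n)
new_score = old_score + rng.normal(0, 30, n)         # new score, correlated with old
# Synthetic outcomes: lower scores carry higher bad probability.
bad = rng.random(n) < 1.0 / (1.0 + np.exp((new_score - 550.0) / 40.0))

approvals = 600                                      # approve the same count under both scores
old_approved = np.argsort(-old_score)[:approvals]
new_approved = np.argsort(-new_score)[:approvals]

swap_in = np.setdiff1d(new_approved, old_approved)   # approved only under the new score
swap_out = np.setdiff1d(old_approved, new_approved)  # approved only under the old score

old_bad_rate = bad[old_approved].mean()
new_bad_rate = bad[new_approved].mean()
print(f"swap-in: {swap_in.size}, swap-out: {swap_out.size}")
print(f"bad rate old: {old_bad_rate:.3f}, new: {new_bad_rate:.3f}")
```

Because both scores approve the same number of applicants, the swap-in and swap-out sets are always the same size; comparing bad rates on these sets is what justifies (or rejects) a score or cut-off change.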
Reject Inference Techniques – Method 1 (Requires High Performance Match Rate)
 Use the credit bureau performance of applicants declined by one creditor but approved for a similar product elsewhere. Their performance on other products or with other companies is taken as a proxy for how the declined applicants would have performed had they been originally accepted.
 This method approximates actual performance, but it has a few drawbacks.
 The applicants chosen must have obtained similar credit during a similar time frame (i.e., soon after being declined). Applicants declined at one institution or for one product are also likely to be declined elsewhere, which reduces the sample size.
 The “bad” definition chosen through analysis of known goods and known bads must be applied to accounts drawn from different data sources, and the bureau’s bad definition will not be 100% consistent with the line-of-business definition.
 Different portfolios exhibit very different performance match rates.
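Method 1’s matching step can be sketched with a toy pandas join. The frames, column names, the 90-day matching window, and the 90+ days-past-due bad definition are all assumptions chosen for illustration, not a prescribed implementation.

```python
import pandas as pd

# Hypothetical: match declined applicants to bureau records for a similar
# product opened soon after decline, then apply an assumed on-us "bad"
# proxy definition (ever 90+ DPD within 12 months).
declines = pd.DataFrame({
    "app_id": [1, 2, 3, 4],
    "decline_date": pd.to_datetime(["2023-01-05", "2023-01-09", "2023-02-01", "2023-02-14"]),
})
bureau = pd.DataFrame({
    "app_id": [1, 3],
    "open_date": pd.to_datetime(["2023-02-10", "2023-02-20"]),
    "worst_dpd_12m": [0, 120],
})

matched = declines.merge(bureau, on="app_id", how="left")
# Only count matches where similar credit was obtained within 90 days of decline.
ok = (matched["open_date"] - matched["decline_date"]).dt.days <= 90
matched["inferred_bad"] = (matched["worst_dpd_12m"] >= 90).where(ok)

match_rate = ok.mean()
print(f"performance match rate: {match_rate:.0%}")
```

In practice the match would run on bureau archive pulls keyed by consumer, and the bad definition would mirror the known good/bad definition as closely as the bureau data allows.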
Reject Inference Techniques – Method 2
2. Approve All Applications
 Methods of data collection:
1) Approve all applicants for a specific period, long enough to generate a model development sample with enough bads.
2) Approve all applications above the cut-off, plus a random 1/N of those below the cut-off.
3) Approve all applications down to 10 or 20 points below the cut-off, and randomly sample the rest, in order to get a better sample of applications in the decision-making region (i.e., where cut-off decisions are likely to be made).
 Advantage:
1) This is the only method that reveals the actual performance of rejected accounts, and it is the most scientific and simplest approach.
 Disadvantages:
1) Approving applicants known to be very high-risk can carry a high cost in losses. One mitigation is to grant them lower loan amounts or credit lines.
2) In certain jurisdictions there may be legal hurdles: approving some applicants and declining others with similar characteristics, or approving applicants at random, may present problems. One mitigation is to exclude those jurisdictions.
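Sampling scheme 2) implies a weighting step at model-development time: below-cut-off approvals sampled at 1/N must carry weight N so the development sample represents the full through-the-door population. A minimal sketch, with made-up scores, cut-off, and N:

```python
import numpy as np

# Hypothetical: approve everything at or above cut-off, plus a random 1/N
# below it; sampled below-cut-off accounts get weight N.
rng = np.random.default_rng(1)
N = 10                        # sample 1 in N below cut-off
cutoff = 620
score = rng.normal(600, 60, 5000)

above = score >= cutoff
sampled_below = (~above) & (rng.random(score.size) < 1.0 / N)
keep = above | sampled_below

weight = np.where(above, 1.0, float(N))[keep]
print(f"kept {keep.sum()} of {score.size}; weighted total = {weight.sum():.0f}")
```

The weighted total approximates the full applicant count, which is what lets cut-off and forecast analyses on the sample generalize to the whole population.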
Reject Inference Techniques – Method 3 (Requires High Performance Match Rate)
3. Supplemental Bureau Data for Reject Inference
 Key Assumptions:
1) Obtain bureau data on accepts and rejects at the end of the observation period. Use performance with other creditors over the observation period to infer how the rejects would have performed had they been accepted.
2) P(bad | X, Z, rejected) = P(bad | X, Z): the bureau data at time of application (X) and the downstream bureau data (Z) contain all the relevant information about P(bad).
 Method:
I. Step 1: fit a model for P(on-us bad | off-us performance) over the booked population.
II. Step 2: apply the model from step 1 to the reject population to get a predicted probability of bad for each reject applicant, p = P(bad | off-us performance).
III. Step 3: replicate each reject account into two records, one good and one bad. Assign weight p to the record with outcome ’bad’ and weight 1 - p to the record with outcome ’good’.
IV. Step 4: fit a statistical model on the whole dataset (booked population with weights = 1, reject population with the weights from step 3).
 Advantages:
1) Weak assumption.
2) Incorporates additional information.
 Weaknesses:
1) Requires a quality bureau match.
2) Costly.
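Steps 1–4 (often called fuzzy augmentation) can be sketched end to end. To avoid assuming any particular modeling library, this uses a hand-rolled weighted logistic regression; all data, feature counts, and sample sizes are synthetic.

```python
import numpy as np

def fit_logistic(X, y, w, iters=500, lr=0.1):
    """Weighted logistic regression via gradient descent (illustrative)."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # add intercept column
    beta = np.zeros(Xb.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-Xb @ beta))
        grad = Xb.T @ (w * (p - y)) / w.sum()
        beta -= lr * grad
    return beta

def predict(beta, X):
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ beta))

rng = np.random.default_rng(2)
X_booked = rng.normal(size=(800, 3))   # off-us performance attributes (booked)
y_booked = (rng.random(800) < 1.0 / (1.0 + np.exp(X_booked @ np.array([1.0, -0.5, 0.3])))).astype(float)
X_reject = rng.normal(size=(200, 3))   # same attributes for the rejects

# Step 1: P(on-us bad | off-us performance) on the booked population.
beta1 = fit_logistic(X_booked, y_booked, np.ones(800))
# Step 2: inferred bad probability for each reject.
p_reject = predict(beta1, X_reject)
# Step 3: each reject becomes two records, weighted p (bad) and 1-p (good).
X_aug = np.vstack([X_booked, X_reject, X_reject])
y_aug = np.concatenate([y_booked, np.ones(200), np.zeros(200)])
w_aug = np.concatenate([np.ones(800), p_reject, 1.0 - p_reject])
# Step 4: fit the final scorecard model on the weighted, combined data.
beta_final = fit_logistic(X_aug, y_aug, w_aug)
```

The same skeleton applies to Method 4 below, with the step-1 predictors swapped from off-us performance to all data available at time of application.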
Reject Inference Techniques – Method 4 (Low/No Performance Match Rate)
4. Reject Inference When the Match Rate Is Low or None
 Reject Inference with Supplemental Bureau Attribute Data
1) Obtain bureau data on accepts at the end of the observation period. Use performance with other creditors over the observation period to infer how the rejects would have performed had they been accepted.
2) Method:
1) Step 1: fit a model for P(on-us bad | all data at time of application) over the booked population.
2) Step 2: apply the model from step 1 to the reject population to get a predicted probability of bad for each reject applicant, p = P(bad | all data at time of application).
3) Step 3: replicate each reject account into two records, one good and one bad. Assign weight p to the record with outcome ’bad’ and weight 1 - p to the record with outcome ’good’.
4) Step 4: fit a statistical model on the whole dataset (booked population with weights = 1, reject population with the weights from step 3).
Continue Method 4
This method uses rejects with weight values that correspond to the probability of a given loan application being approved or rejected.
1. First, develop a scorecard using the information on the approved loan applications.
2. Then, using the resulting scorecard, evaluate the set of rejects.
3. For each rejected application, create two records carrying the weight values that correspond to these probabilities.
4. The joint dataset, extended by the “doubled” rejects, is used to adjust the parameters of the scorecard.
Using both the probability of rejection and the probability of approval for each reject allows the score parameters to be adjusted to cover both possible types of behavior (“good” or “bad”).
For any comment or question, please contact:
Kaitlyn.S.Hu@Gmail.Com
Thank You!
