1. Reject Inference Methodologies for Underwriting Models
2. Summary
Problem Statement:
Unlike "known good/bad sample" models such as behavior or loss-forecasting models, application
scorecards are developed to predict the behavior of all applicants, and a model based only on
previously approved applicants can be inaccurate ("sample bias").
Previous accept/decline decisions were made systematically, not randomly, so the accepted population is a
biased sample that is not representative of the rejects.
Reject Inference:
A process in which the performance of previously rejected applications is analyzed to estimate their behavior,
equivalent to saying "if these applicants had been accepted, this is how they would have performed." This
process gives relevance to the scorecard development process by recreating the population performance at a
100% approval rate.
3. Reasons for Reject Inference
Ignoring rejects would produce a scorecard that is not applicable to the total applicant population (the
sample bias issue noted above).
Reject inference also incorporates the influence of past decision making, such as low-side overrides, into the
scorecard development process. If a scorecard is developed using only the known goods and bads, it may
conclude that applicants with serious delinquency are very good credit risks.
From a decision-making perspective, such as cut-off adjustment, reject inference enables accurate and realistic
expected-performance forecasts for all applicants. Coupled with swap set analysis, it allows us to approve
the same number of applicants but obtain better performance through better selection, or to approve
more customers while keeping the existing bad rate.
In environments with low-to-medium approval rates and low bad rates, reject inference helps in identifying
opportunities to increase market share with risk-adjusted strategies.
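The swap set idea above can be sketched as follows. All applicants, decisions, and outcomes here are hypothetical, and the outcome recorded for a swap-in would in practice be an inferred one (a previously declined applicant has no observed on-us performance):

```python
# Hypothetical swap set comparison between an old and a new scorecard.
# Each applicant carries a decision under both cut-offs plus a bad/good
# outcome; for swap-ins this would be an inferred outcome, since they
# were never booked under the old scorecard.

applicants = [
    # (id, old_decision, new_decision, bad_outcome)
    ("a1", "approve", "approve", 0),
    ("a2", "approve", "decline", 1),   # swap-out: risky account removed
    ("a3", "decline", "approve", 0),   # swap-in: good account gained
    ("a4", "decline", "decline", 1),
]

swap_out = [a for a in applicants if a[1] == "approve" and a[2] == "decline"]
swap_in = [a for a in applicants if a[1] == "decline" and a[2] == "approve"]

# Same approval count under both scorecards, but the new one trades
# one bad account out for one good account in.
print(len(swap_in), len(swap_out))                               # 1 1
print(sum(a[3] for a in swap_out) - sum(a[3] for a in swap_in))  # 1
```

With equal swap-in and swap-out counts the approval rate is unchanged, while the net bads removed (here 1) is the performance gain from better selection.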
4. Reject Inference Techniques – Method 1 (Requires High Performance Match Rate)
Use the credit bureau performance of applicants declined by one creditor but approved for a similar product
elsewhere. The performance with other products or companies is taken as a proxy for how the declined applicants
would have performed had they been originally accepted.
This method approximates actual performance, but has a few drawbacks:
1) The applicants chosen must also obtain similar credit during a similar time frame (i.e., soon after being
declined). Applicants declined at one institution or for one product are also likely to be declined elsewhere,
thus reducing the sample size.
2) The "bad" definition chosen through analysis of known goods and known bads must be applied to these
accounts using different data sources, and the bad definition from the bureau will not be 100% consistent with
the definition from the line of business.
3) Different portfolios exhibit very different performance match rates.
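As a sketch of the matching step, the snippet below pairs declined applicants with hypothetical bureau tradelines opened within a 90-day window after the decline and computes the resulting performance match rate. All identifiers, dates, and the window length are assumptions for illustration:

```python
from datetime import date, timedelta

# Declined applicants and their decline dates (hypothetical).
declines = {"d1": date(2023, 1, 10), "d2": date(2023, 2, 5), "d3": date(2023, 3, 1)}

# Bureau tradelines for a similar product: applicant id -> (open_date, status).
# d2 has no tradeline at all; d3 opened one far too late to qualify.
bureau = {
    "d1": (date(2023, 2, 1), "current"),
    "d3": (date(2023, 9, 15), "90dpd"),
}

window = timedelta(days=90)  # the "similar time frame" assumption
matched = {aid for aid, declined_on in declines.items()
           if aid in bureau and bureau[aid][0] - declined_on <= window}

match_rate = len(matched) / len(declines)
print(matched, round(match_rate, 2))  # {'d1'} 0.33
```

A low match rate like this one is exactly the weakness cited above: only the matched subset contributes inferred performance, and it may not represent all rejects.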
5. Reject Inference Techniques – Method 2
2. Approve All Applications
Methods of data collection are:
1) Approving all applicants for a specific period, long enough to generate a model development sample
with enough bads.
2) Approving all applications above the cut-off, but only a randomly selected 1/N of those below the cut-off.
3) Approving all applications up to 10 or 20 points below the cut-off, and randomly sampling the rest, in order to
get a better sample of applications near the decision region (i.e., where cut-off decisions are likely to be made).
Advantage:
1) The only method that reveals the actual performance of rejected accounts; the most scientific and simple
way.
Disadvantage:
1) Approving applicants known to be very high-risk can result in high losses. A mitigating strategy is
granting smaller loans / lower credit lines.
2) In certain jurisdictions, there may be legal hurdles to this method: approving some applicants and declining
others with similar characteristics, or randomly approving applicants, may present problems. A mitigating
strategy is avoiding those jurisdictions.
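The third data-collection design can be sketched as a simple decision rule: approve everything above the cut-off, approve everything in a band just below it, and randomly approve 1 in N of the rest. The cut-off, band width, N, and the score list below are all illustrative choices:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible
CUTOFF, BAND, N = 600, 20, 5

def decision(score):
    """Sampling design 3: full approval down to CUTOFF - BAND, 1-in-N below."""
    if score >= CUTOFF - BAND:          # above cut-off or within the band
        return "approve"
    return "approve" if random.random() < 1 / N else "decline"

scores = [650, 610, 595, 585, 540, 500, 450]
decisions = [decision(s) for s in scores]
print(list(zip(scores, decisions)))
```

Everyone at 580 or above is approved deterministically, so the sample is densest exactly where future cut-off decisions will be made, while the 1-in-N tail keeps the cost of approving very low scores bounded.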
6. Reject Inference Techniques – Method 3 (Requires High Performance Match Rate)
3. Supplemental Bureau Data for Reject Inference
Key Assumption:
1) Obtain bureau data on accepts and rejects at the end of the observation period. Use the performance
with other creditors over the observation period to infer how the rejects would have performed had they
been accepted.
2) P(bad | X, Z, rejected) = P(bad | X, Z): the bureau data at the time of application (X) and the downstream
bureau data (Z) contain all the relevant information about P(bad).
Method:
I. Step 1: fit a model for P(on-us bad | off-us performance) over the booked population.
II. Step 2: apply the model from Step 1 to the reject population to get a predicted probability of bad for
each reject applicant, p = P(bad | off-us performance).
III. Step 3: replicate each reject account into two records, one good and one bad. Assign weight p to the
record with outcome 'bad' and weight 1 - p to the record with outcome 'good'.
IV. Step 4: fit a statistical model on the whole dataset (booked population with weights = 1, reject
population weighted as in Step 3).
Advantage:
1) Relies on a relatively weak assumption.
2) Incorporates additional (off-us) information.
Weakness:
1) Requires a high-quality bureau match.
2) Costly.
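Steps 1 and 2 above can be sketched as follows. The synthetic data and the bare-bones gradient-descent logistic fit are stand-ins only; in practice a dedicated scorecard or modeling package would be used:

```python
import numpy as np

# Step 1: fit P(on-us bad | off-us performance) on the booked population.
# Features and outcomes are synthetic, generated for illustration.
rng = np.random.default_rng(0)
X_booked = rng.normal(size=(200, 2))                       # off-us bureau features
y_booked = (X_booked[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(float)

def fit_logistic(X, y, lr=0.1, steps=500):
    """Plain gradient-descent logistic regression (illustrative only)."""
    Xb = np.c_[np.ones(len(X)), X]                         # add intercept column
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

w = fit_logistic(X_booked, y_booked)

# Step 2: score the rejects to get p = P(bad | off-us performance) each.
X_reject = rng.normal(size=(50, 2))
p_bad = 1.0 / (1.0 + np.exp(-np.c_[np.ones(len(X_reject)), X_reject] @ w))
```

Each entry of `p_bad` is a probability in (0, 1) that then drives the record duplication and weighting of Steps 3 and 4.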
7. Reject Inference Techniques – Method 4 (Low/No Performance Match Rate)
4. Reject Inference when the Match Rate is Low or Zero
Reject Inference with Supplemental Bureau Attributes Data
1) When rejects cannot be matched to downstream bureau performance, use all data available at the
time of application to infer how the rejects would have performed had they been accepted.
2) Method:
1) Step 1: fit a model for P(on-us bad | all data at time of application) over the booked population.
2) Step 2: apply the model from Step 1 to the reject population to get a predicted probability of
bad for each reject applicant, p = P(bad | all data at time of application).
3) Step 3: replicate each reject account into two records, one good and one bad. Assign weight p
to the record with outcome 'bad' and weight 1 - p to the record with outcome 'good'.
4) Step 4: fit a statistical model on the whole dataset (booked population with weights = 1, reject
population weighted as in Step 3).
8. Method 4 (Continued)
This method uses rejects with weight values that correspond to the probabilities of a given loan
applicant being good or bad.
1. The first step is developing a scorecard using the information on approved loan applications.
2. Then, using the resulting scorecard, we score the set of rejects.
3. For each rejected application, two records are created, containing the weight values that correspond to
these probabilities.
9. Method 4 (Continued)
4. The joint dataset, extended by the doubled number of reject records, is used to adjust the parameters of
the scorecard.
Using both the probability of a reject being good and the probability of it being bad allows the score
parameters to be adjusted to cover either of the two possible types of behavior ("good" or "bad").
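The record duplication described above (step 3, feeding the weighted refit of step 4) can be sketched concretely; the reject ids and probabilities below are made up:

```python
# Each reject becomes two weighted records: one with outcome 'bad'
# (weight p) and one with outcome 'good' (weight 1 - p). Booked records
# would keep weight 1 in the final weighted fit.

rejects = [("r1", 0.30), ("r2", 0.75)]   # (id, inferred P(bad)); illustrative

augmented = []
for rid, p in rejects:
    augmented.append((rid, 1, p))        # outcome bad, weight p
    augmented.append((rid, 0, 1 - p))    # outcome good, weight 1 - p

# Each reject's two records sum to weight 1, so the reject population
# keeps its original effective size in the weighted refit.
total_weight = sum(wt for _, _, wt in augmented)
print(len(augmented), round(total_weight, 6))
```

Because the pair of weights always sums to 1, the doubled record count does not inflate the reject population's influence; it only spreads each reject across both possible outcomes.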
10. For any comments or questions, please contact:
Kaitlyn.S.Hu@Gmail.Com
Thank You!