In conversation with Prisila, Correspondent, Asia Business Outlook Magazine, Kirill explores the factors and scenarios justifying manual scoring model adjustments in risk assessment and management.
Kirill Odintsov oversees teams that have built over 400 models for departments such as Risk, CRM, Sales, OPS and Custex, across various Asian and European markets. Throughout his career, Kirill has built upon what he learned during his Master's studies at the Faculty of Mathematics and Physics, Charles University, Prague (Probability, Statistics and Econometrics).
What factors or situations might warrant manual additions to scoring models in risk assessment and management?
There are two types of use cases in which we need manual adjustments to the model. The first is manual adjustment during model creation, and the second is manual adjustment that we make when using the model to take automated decisions.
"More focusing on the second type of the manual adjustment (manual adjustment when using the model) mainly because creating extra cutoffs is very dangerous and common."
During modeling, the need for manual adjustments comes from the imperfection of real data. First, there are many biases in real data coming from internal process changes and from external factors in the market. Second, the data are time-limited, so a model built on them might carry no information about crises and similar events. The modeler's job is to find those biases or data issues and make sure they do not negatively influence the model. If there were no such biases or issues, we could build models by pushing a single button, with no need for data scientists.
In credit risk, the best-known bias is called 'reject inference': we model only on the data of approved clients, so the model does not 'learn' the behavior of those who were rejected. Another bias could be that the training data reflect a business process or market condition that is no longer valid. For example, in the training data we might have had a weaker collection process on a segment of clients, which made them riskier, but we have since changed it (for example, by starting to send field collectors to that segment), so the segment is less risky now. In such a situation, if we modeled on the training data as-is, the model would 'learn' that the segment is riskier, which, thanks to the collection process change, is no longer true, and it would unjustly reject more people from that segment unless we manually adjusted it. Lastly, I would mention that some relationships between risk and predictors change during crisis-like events. For example, in a good economic situation the fact that a client has a lot of loans with competitors signifies a good client, but when a crisis starts these people are the first to default. The model does not 'learn' this, as it might not have a crisis in its training sample.
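As a minimal illustration of handling the collection-process example during model creation, one possible approach is to down-weight observations from the affected segment so the model does not fully 'learn' a risk level that the process change has since removed. The column names, the weight value and the scikit-learn setup below are hypothetical, purely for illustration, not a description of an actual production pipeline.

```python
# Minimal sketch: down-weight a segment whose historical risk is no longer
# representative (e.g. the collection process has since improved for it).
# Column names (segment, x1, x2, default_flag) and the 0.6 weight are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.DataFrame({
    "segment":      ["A", "A", "B", "B", "B", "A", "B", "A"],
    "x1":           [0.2, 0.5, 0.9, 0.4, 0.7, 0.1, 0.8, 0.3],
    "x2":           [1.0, 0.3, 0.6, 0.2, 0.9, 0.4, 0.5, 0.7],
    "default_flag": [0, 1, 1, 0, 1, 0, 1, 0],
})

# Observations from the biased segment get a smaller weight, so the model learns
# less from a risk level that the new collection process has already removed.
weights = df["segment"].map({"B": 0.6}).fillna(1.0)

model = LogisticRegression()
model.fit(df[["x1", "x2"]], df["default_flag"], sample_weight=weights)
```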
In an ideal world we would use a simple cutoff for the credit score: if the score predicts risk higher than X we reject, otherwise we approve the client. Sadly, we cannot do it so simply. The credit score predicts only the risk of the client, but when we decide whether to approve or reject the client we need to take other business aspects into account, such as profitability. For example, if we have a product with a higher interest rate or a subsidy from our partner, it might be profitable to accept higher risk on that product. Another reason for manually adjusting how we use the scorecard might be predictors that are predictive but are either unstable or too correlated with our internal processes to put into the scorecard. All of this manually adjusts the model by creating separate cutoffs for different segments of clients, as in the sketch below.
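A minimal sketch of such segment-specific cutoffs layered on top of a single risk score might look like the following; the segment names and threshold values are hypothetical, purely for illustration.

```python
# Hypothetical segment-specific cutoffs on top of one predicted-risk score.
RISK_CUTOFFS = {
    "high_interest_product": 0.12,   # higher margin can absorb higher risk
    "standard_product":      0.08,
    "partner_subsidized":    0.10,
}

def decide(predicted_risk: float, segment: str) -> str:
    """Approve if the predicted risk is below the cutoff for the client's segment."""
    cutoff = RISK_CUTOFFS.get(segment, 0.08)  # fall back to the standard cutoff
    return "approve" if predicted_risk < cutoff else "reject"

print(decide(0.10, "high_interest_product"))  # approve
print(decide(0.10, "standard_product"))       # reject
```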
In a good economic situation, the fact that a client has a lot of loans with competitors signifies a good client, but when a crisis starts these people are the first to default.
What methods or techniques do you use to ensure that manual additions to scoring models are unbiased and data-driven?
For the first type (manual adjustments to the model itself), it means using some extra data in which we think the specific bias is not present (though the new data will have biases of their own). For the second type (manual adjustment when using the model), we simply make sure that if we added it to the model, it would not skew the model's performance metric (AUC/Gini) too much.
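A minimal sketch of such a check is shown below, comparing the ranking power of the score before and after an adjustment via Gini = 2 * AUC - 1; the arrays are hypothetical stand-ins for real scores and outcomes.

```python
# Compare ranking power before and after an adjustment using Gini = 2*AUC - 1.
# The labels and scores below are hypothetical, purely for illustration.
from sklearn.metrics import roc_auc_score

y_true = [0, 0, 1, 0, 1, 1, 0, 1]
score_baseline = [0.10, 0.20, 0.70, 0.30, 0.60, 0.80, 0.40, 0.55]
score_adjusted = [0.12, 0.18, 0.65, 0.35, 0.62, 0.78, 0.38, 0.50]

gini_baseline = 2 * roc_auc_score(y_true, score_baseline) - 1
gini_adjusted = 2 * roc_auc_score(y_true, score_adjusted) - 1
print(f"Gini baseline: {gini_baseline:.3f}, Gini adjusted: {gini_adjusted:.3f}")
```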
How do you balance the need for manual adjustments with maintaining the integrity and fairness of the scoring model, especially in areas like lending or credit risk assessment?
This requires a lot of business understanding and discussion between many departments. There are no simple rules; the one recommendation I have is to keep manual adjustments to a minimum and to use them only when there is a truly big benefit to doing so.
What are the potential risks or drawbacks associated with manual additions to scoring models, and how do you mitigate them?
By making too many manual adjustments, you could be harming the predictive power of your overall process.
I will focus more on the second type of manual adjustment (adjustment when using the model), mainly because creating extra cutoffs is very dangerous and common. It is tempting to create manual rules, because you can always find a segment of clients with more or less than average risk. People are then tempted to say: let's have a stricter cutoff on those with higher risk. It seems logical, right? These clients have above-average risk, so rejecting more of them will help the risk. But a more efficient way of doing that is using the scorecard. What do I mean by more efficient? Less volume loss for the same risk decrease. Think about it this way: you can always find segments of clients with higher-than-average risk. So you make one manual adjustment, and then you will still be able to find segments with above-average risk, so you do it again, and again, and again. Ultimately you will have created your own 'trivial scorecard'. Unlike a normal scorecard, it was not created using statistical methods, you looked at predictors univariately without combining them much, and updating and managing your manual rules becomes quite challenging. Imagine you have 20 manual rules and I ask you to increase the approval rate by 7%. It will take you much longer than a simple cutoff adjustment on a single score.
To avoid unnecessary manual rules, we simulate whether we can achieve the same or a better effect by adjusting the existing cutoffs on our score rather than adding new ones.
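A minimal sketch of such a simulation is shown below, on hypothetical synthetic data: a proposed manual rule is compared against simply tightening the existing score cutoff to the same approval volume, and the resulting bad rates are compared. The column names, cutoff and rule are illustrative assumptions, not actual rules.

```python
# Compare a proposed manual rule against tightening the existing score cutoff
# to the same approval volume. Columns (score, n_loans_competitor, default_flag)
# and the rule itself are hypothetical; in practice this runs on real portfolio data.
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "score": rng.uniform(0, 1, 5000),              # predicted risk
    "n_loans_competitor": rng.poisson(2, 5000),    # candidate predictor for a manual rule
})
# Synthetic outcome: higher score means higher chance of default.
df["default_flag"] = rng.binomial(1, df["score"])

base_cutoff = 0.60

# Option 1: keep the cutoff and add a manual rule on one extra predictor.
rule_approved = (df["score"] < base_cutoff) & (df["n_loans_competitor"] < 5)

# Option 2: no extra rule, just tighten the score cutoff to the same approval rate.
target_rate = rule_approved.mean()
tightened_cutoff = df["score"].quantile(target_rate)
cutoff_approved = df["score"] < tightened_cutoff

for name, approved in [("manual rule", rule_approved), ("tighter cutoff", cutoff_approved)]:
    print(name,
          "approval rate:", round(approved.mean(), 3),
          "bad rate:", round(df.loc[approved, "default_flag"].mean(), 3))
```

On this synthetic data the extra rule adds no risk benefit, so the cutoff adjustment wins at the same volume; the point of the real simulation is to make exactly that comparison on actual data before a new rule is accepted.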
How do you validate the effectiveness of manual additions in improving the performance of a scoring model, and which metrics do you use?
This highly depends on why we had to make the manual adjustment. If we did it to decrease a bias in the model, we look at how much the bias decreased and check that the illogical behavior of the model is gone. If we did it to increase profitability, we check the profitability, and so on.
What tools or programming languages do you use for implementing manual additions to scoring models?
We use a decision engine in which all our rules are defined. At Home Credit we use a commercial solution from an external company for that.