In conversation with Prisila, Correspondent, Asia Business Outlook Magazine, Kirill shares his views on the role alternative data can play in lowering financial risk for lenders. He also discusses the model interpretability value chain and its essential elements in comparison to black-box models for risk management.
Kirill Odintsov is Head of Underwriting and Data Analytics at Home Credit Indonesia. Previously, he spent three years as Head of Data Science at Home Credit Indonesia. He has spent five years in senior data science management roles and ten years in analytics/data science overall. Kirill organized the Home Credit Kaggle competition (at the time the largest by number of participating teams), built some of the first telco scorecards in the Indian and Indonesian markets, and started two data science teams.
Q: What advantages can we expect from integrating data analytics and other technology into the risk management process? What distinguishes data science from financial data analytics?
A: Using data analytics is a must for risk management. We need to predict which of our steps, and those of other departments or even external parties, will have a risk impact, and adjust for them in time so that we stay within our shareholders' risk appetite.
In the context of a financial institution, I would compare the data scientist and the data analyst to a chef and a detective: the data scientist is the chef, the analyst the detective. As a chef, the data scientist has a wide array of models and methods (recipes and cooking tools) to satisfy their client, and an expert chef, or data scientist, can customize these methods to suit the taste and needs of the current client. For an analyst there is much less of a prescribed 'recipe': you simply receive a problem statement such as "Risk in this segment is high; find out why." So, like a detective, you form a hypothesis (find a suspect), test it on the data, and reject or confirm it. Data science tasks are more structured: you start by understanding what the client wants to achieve and which data is available, then move on to understanding the data (EDA), cleaning the data, feature engineering, modeling, and so on. For most of these steps (other than understanding what the client wants), frameworks and clear guidelines exist. For the data analyst, tools exist but no clear framework, as each case and problem is unique, and you as a detective uncover the truth one theory at a time.
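To make that structured workflow concrete, here is a minimal sketch of how those steps often map onto a standard framework such as a scikit-learn pipeline. The file and column names are illustrative assumptions, not actual Home Credit data:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative application data (hypothetical file and column names).
df = pd.read_csv("applications.csv")
X, y = df.drop(columns="defaulted"), df["defaulted"]

# Cleaning + feature engineering, encoded as reusable pipeline steps.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]),
     ["age", "income"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["employment_type"]),
])

# Modeling as the final step; the whole chain trains and predicts as one unit.
model = Pipeline([("prep", preprocess), ("clf", LogisticRegression())])
model.fit(X, y)
```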
Q: Do you believe it is appropriate for firms to go from risk management to a risk-enabled performance management system in order to identify developing risk trends, and if so, why?
A: Automation is not even a competitive advantage anymore; it is a prerequisite for the existence of a modern financial company. A human cannot check thousands of potential dimensions at once in a timely way, and certainly not their combinations. A machine can, and if set up properly it can give us an early indication of rising risk by seeing distributional changes in incoming clients that humans would have no chance of catching.
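One common way to automate this kind of monitoring is the Population Stability Index (PSI), which compares the distribution of a variable among incoming applicants against a baseline. A minimal sketch follows; the bin count, threshold, and sample data are illustrative assumptions, not Home Credit's actual setup:

```python
import numpy as np

def psi(baseline, current, bins=10):
    """Population Stability Index between a baseline sample
    and the current incoming population for one variable."""
    # Bin edges taken from the baseline distribution (deciles here)
    edges = np.percentile(baseline, np.linspace(0, 100, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf  # cover out-of-range values

    base_pct = np.histogram(baseline, edges)[0] / len(baseline)
    curr_pct = np.histogram(current, edges)[0] / len(current)

    # Avoid log(0) / division by zero on empty bins
    base_pct = np.clip(base_pct, 1e-6, None)
    curr_pct = np.clip(curr_pct, 1e-6, None)

    return np.sum((curr_pct - base_pct) * np.log(curr_pct / base_pct))

# A rule of thumb often used in credit risk: PSI > 0.25 signals a
# major population shift worth investigating (illustrative threshold).
rng = np.random.default_rng(0)
baseline = rng.normal(35, 10, 50_000)   # e.g. applicant age last year
current = rng.normal(39, 12, 5_000)     # incoming applicants this week
print(f"PSI = {psi(baseline, current):.3f}")
```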
Q: What role can alternative data play in lowering financial risk for lenders? And what are the principal difficulties traditional lenders encounter when assessing risk?
A: Six years ago, standard application data was the main component of our scorecards in Indonesia (a scorecard is a model predicting the probability that a customer would not pay back a loan if we granted it). Nowadays we no longer rely only on the application form data, which has a limited effect on the model. We have access to various alternative data from our official partners. We do not obtain any raw or personal data about the customers; instead, we only use aggregated scores that we have built in collaboration with our partners. This approach ensures a seamless customer journey with a fast loan application process.
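To make the scorecard definition concrete, here is a minimal sketch of such a probability-of-default model built with logistic regression. The features, bad rate, and data are illustrative assumptions, not Home Credit's actual inputs:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Illustrative features: application form data plus an aggregated
# partner score (hypothetical variables, not Home Credit's real ones).
rng = np.random.default_rng(42)
X = np.column_stack([
    rng.integers(18, 65, 10_000),        # age
    rng.normal(5_000, 2_000, 10_000),    # declared monthly income
    rng.uniform(0, 1, 10_000),           # aggregated partner score
])
y = rng.binomial(1, 0.08, 10_000)        # 1 = did not pay back, ~8% bad rate

# Interpretable scorecard core: each coefficient's sign and size
# can be checked against business logic.
scorecard = make_pipeline(StandardScaler(), LogisticRegression())
scorecard.fit(X, y)

# Probability of non-repayment for a new applicant
new_applicant = np.array([[35, 4_500, 0.62]])
print(scorecard.predict_proba(new_applicant)[0, 1])
```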
Risk assessment is quite challenging because we are trying to predict the future from information about the past. But often the fundamental risk drivers in the market have already changed compared to the past, and our predictions might no longer be fully correct. So the trick is to understand the situation and predict which patterns are still valid and which are not.
"For building a strong data science team my suggestion is to hire lot of curious people with willingness to learn."
Q: How can financial organizations develop a competent data science team? How can one create a powerful team for risk underwriting in a financial institution?
A: For building a strong data science team, my suggestion is to hire a lot of curious people with a willingness to learn. Then, to retain them and see them develop, you need to create a lot of challenging and diverse tasks for them. It is important to make sure the team gets plenty of feedback and management support, so that their work does not end up hidden away in folders and presentations but actually gets implemented. The team learns the most from implemented projects, as they see how their theoretical model works in reality, which helps them make the next model even stronger.
Q: Describe the model interpretability value chain and its essential elements in comparison to black-box models for risk management. And why do we still use black-box models in AI when we don't need to?
A: In credit risk modelling (scoring), there are two big issues that require the modeler to be more careful when using black-box models.
The first is delayed feedback to the model. Once we put our model into production and it starts deciding which clients to approve or reject, it can take months for us to actually know whether those decisions were correct, because to know that we need to observe the client's payment behavior over a few installments and give them the chance to accumulate enough days past due. So during modelling we have to be careful not to overfit on the training data, as some patterns we see in this data now may not hold in the future. For example, there can be a flood in some region making it temporarily riskier, or we might have had collection issues with some type of clients making them look riskier (even though that issue may have been solved already). If the model learns these incorrect patterns and keeps using them, we would see it in production only after many months of incorrect decisioning. One common safeguard is out-of-time validation, as sketched below.
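Out-of-time validation trains on an older time window and evaluates on a later one, so patterns that do not persist over time hurt the validation score. A minimal sketch, where the file name, schema, and date cutoff are illustrative assumptions:

```python
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Assumed schema: one row per loan, with application_date,
# feature columns, and a 'default' label observed months later.
loans = pd.read_csv("loans.csv", parse_dates=["application_date"])

# Out-of-time split: train on older applications, validate on newer
# ones, mimicking how the model will actually be used in production.
cutoff = pd.Timestamp("2023-01-01")  # illustrative cutoff
train = loans[loans.application_date < cutoff]
valid = loans[loans.application_date >= cutoff]

features = [c for c in loans.columns if c not in ("default", "application_date")]
model = GradientBoostingClassifier().fit(train[features], train["default"])

# A model that overfits transient patterns (e.g. a one-off flood)
# shows a large gap between in-time and out-of-time AUC.
print("train AUC:", roc_auc_score(train["default"], model.predict_proba(train[features])[:, 1]))
print("OOT AUC:  ", roc_auc_score(valid["default"], model.predict_proba(valid[features])[:, 1]))
```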
The second is called reject inference. In short, the scoring model rejects clients, so when building a new model on data approved by the old model, we see behavior (the target) only for approved clients, not for rejected ones (for a rejected client we see no payment behavior, as we never gave them a loan). If the new model is 'too smart', it can over-rely on its data being pre-filtered by the old model, which is true during training but not in production. This can lead to very illogical behavior and patterns learnt by the model.
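A toy simulation makes the problem visible: a flexible model trained only on applicants the old policy approved never sees the riskiest segment, and so misjudges it in production. This is an illustrative sketch under simulated data, not Home Credit's methodology:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(7)

# Full applicant population with two illustrative risk drivers.
X = rng.normal(size=(50_000, 2))
p_default = 1 / (1 + np.exp(-(-2.0 + 1.5 * X[:, 0] + 0.5 * X[:, 1])))
y = rng.binomial(1, p_default)

# Old policy rejected risky applicants (high first driver), so the
# training data for the new model contains only approved clients.
approved = X[:, 0] < 0.5
new_model = GradientBoostingClassifier().fit(X[approved], y[approved])

# In production the model also scores the kind of applicants the old
# policy rejected -- a region it never saw during training.
rejected = ~approved
pred = new_model.predict_proba(X[rejected])[:, 1]
print("true default rate among previously rejected:", y[rejected].mean().round(3))
print("model's average predicted risk for them:    ", pred.mean().round(3))
```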
Both issues can occur in interpretable as well as black-box models, but in interpretable models they are easier to catch and mitigate. Black-box models are 'smarter': they can reconstruct a proxy for a predictor you have forbidden them to use out of a combination of other predictors, thus overcoming your restrictions and achieving stronger performance on training data but a weaker model in production. Having said that, I consider it best practice to build several models, always including at least one interpretable and one black-box model. As a final note, I would like to mention that there are also methods that help interpret black-box models.
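One widely used family of such methods is SHAP values, which attribute each individual prediction to the input features even for black-box models. A minimal sketch using the open-source shap package; the model and dataset are illustrative placeholders, not the interview's actual tooling:

```python
import shap
import xgboost
from sklearn.datasets import make_classification

# Illustrative stand-in for a credit dataset.
X, y = make_classification(n_samples=5_000, n_features=10, random_state=0)

# A black-box gradient boosting model.
model = xgboost.XGBClassifier().fit(X, y)

# TreeExplainer attributes each prediction to the input features,
# showing which ones push an applicant toward approval or rejection.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X[:100])

shap.summary_plot(shap_values, X[:100])  # global view of feature effects
```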