
At Business Science Solutions we have been collaborating with Harry Scheule and Daniel Rosch to develop mini applications and tools in support of DEEP CREDIT RISK, Machine Learning in Python and R. Deep Credit Risk is a textbook that trains new and experienced credit professionals in the latest machine learning methods, in the context of common credit risk problems.
Readers are introduced to models covering:
- the key credit measures: PD, LGD and EAD;
- for assessing applicants, assigning provisioning and capital, through to stress testing, scenario analysis and loan pricing;
- for data selection, methodology selection, model performance tuning;
- for implementation through classing and rating system design in dynamic and static environments.
The textbook and supporting lessons are available with R, Python or SAS code. The material introduces key programming concepts in each language and will help those programming for the first time as well as those seeking to deepen their knowledge and skills.
In the learning environment over 60 interactive tools are available which simply and efficiently illustrate key concepts demonstrated in the text. The tools range from simple illustrations of basic statistical theory through to applied ratings classing methods, economic forecasting models and loss simulation models. Two examples are provided below.
Under and Over-Fitting
A constant modelling challenge is deciding on the degree of necessary complexity. A modeler is generally inclined to add more features to a model as doing so often helps with fit.
In the following app a simple dataset has been generated from an underlying sine curve (the light blue line) plus a little randomisation (the middle blue dots). The data is fundamentally non-linear, which makes capturing the underlying relationship an immediate challenge. A polynomial regression model is fitted to the data, and through the controls the degree of the polynomial can be increased.
- at degree 1 (a straight line) the model does not capture the underlying non-linear relationship -- the model is under-fitted;
- at degree 20 (adjust the slider in the controls) the model captures the underlying non-linear relationship, but in the training data it also captures many of the random deviations from the underlying trend. Those deviations are not the same as the random deviations in the test data -- the model is over-fitted;
- at degree 4 the model captures the underlying non-linear relationship equally well in both the training and test data -- the model is appropriately fitted.
The fundamental approach to identifying the best degree of fit is to assess the performance of a model not on its in-sample data, but on a hold-out sample. In fact, as modelers increasingly optimise model selection with repeated reference to the test data, it makes sense to keep more than one hold-out sample.
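The comparison described above can be sketched in a few lines of Python. This is a minimal illustration and not the code behind the app: the sample sizes, noise level, random seed and the helper names make_data and rmse_for_degree are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the sketch is repeatable


def make_data(n, rng):
    """Draw n points from a sine curve plus Gaussian noise (assumed noise sd 0.3)."""
    x = rng.uniform(0.0, 2.0 * np.pi, n)
    y = np.sin(x) + rng.normal(0.0, 0.3, n)
    return x, y


# Assumed sizes: a small training sample and a larger hold-out sample.
x_train, y_train = make_data(30, rng)
x_test, y_test = make_data(200, rng)


def rmse_for_degree(degree):
    """Fit a polynomial on the training data; return (train RMSE, hold-out RMSE)."""
    # Polynomial.fit rescales x internally, which keeps high-degree fits stable.
    model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
    train_rmse = np.sqrt(np.mean((model(x_train) - y_train) ** 2))
    test_rmse = np.sqrt(np.mean((model(x_test) - y_test) ** 2))
    return train_rmse, test_rmse


results = {degree: rmse_for_degree(degree) for degree in (1, 4, 20)}
for degree, (train_rmse, test_rmse) in results.items():
    print(f"degree {degree:2d}: train RMSE {train_rmse:.3f}, hold-out RMSE {test_rmse:.3f}")
```

The training error can only fall as the degree rises, because each lower-degree model is a special case of the higher-degree one. The hold-out error is the figure to watch when choosing the degree.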
Rating Migration
In credit model implementations it is common to class risk into a rating system. The most common and familiar rating systems are the risk rankings provided by Standard & Poor's, Moody's and Fitch, in which risk grades are represented with letters and numbers such as AAA (the best rating from Standard & Poor's) and Ba3 (a junk bond rating from Moody's). When statistical models are used to predict default probabilities, especially complex machine learning models, ratings classes can be used to represent the highly precise probabilities as simple risk grades. In the following app a portfolio has been modelled and predicted probabilities are produced based on the variables selected by the user. The variables available for selection include:
- application characteristics, which are labeled ..._orig_time; FICO_orig_time is a bureau application score at origination;
- individual current performance characteristics, which are labeled ..._time; balance_time is the balance of the loan at the observation time;
- common macroeconomic performance characteristics, which are labeled ..._time and include hpi_time (the house price index), rate_time (the current policy interest rate), gdp_time (the current growth rate in gross domestic product) and uer_time (the current unemployment rate).
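Classing predicted default probabilities into rating grades can be sketched as follows. The grade labels and PD boundaries here are hypothetical, chosen only for illustration; they are not the classing used in the app or by any rating agency, and assign_grade is a name introduced for this example.

```python
import numpy as np

# Hypothetical upper PD bound for each grade except the last (worst) one.
upper_bounds = np.array([0.001, 0.005, 0.02, 0.10])
grades = np.array(["A", "B", "C", "D", "E"])  # hypothetical labels, best to worst


def assign_grade(pd_values):
    """Map each predicted PD to the first grade whose upper bound it does not exceed."""
    # right=True means a PD equal to a bound stays in the better grade.
    idx = np.digitize(pd_values, upper_bounds, right=True)
    return grades[idx]


pds = np.array([0.0004, 0.003, 0.02, 0.35])
print(assign_grade(pds))  # ['A' 'B' 'C' 'E']
```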
A key feature of rating system design is illustrated in the following app which observes the number of facilities in each ratings class each quarter over the course of an economic downturn.
- for a model with only application characteristics (the tool's default state) the ratings population is very stable; to the extent that migration occurs, it is exclusively to the absorbing default state;
- for a model with macroeconomic characteristics (in the controls, add gdp_time to the underlying statistical model), as GDP deteriorates with the economic downturn, the proportion of facilities in the best ratings class declines and the proportion of facilities in the worst ratings class increases. The number of defaults in both models is the same.
Neither model is better; both are crude simplifications. The first model has characteristics which might suit capital allocation, where stability of capital over the business cycle is the priority. The second model has characteristics which might suit loss provisioning, where allocating extra provisions ahead of a downturn is prudent.
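The contrast between the two models can be reproduced with a small synthetic sketch. Everything here is assumed for illustration: the simulated portfolio, the GDP path, the logistic coefficients and the grade boundaries are hypothetical and are not taken from the book's data or the app. Defaults are left out to keep the example short, so the application-only counts do not change at all.

```python
import numpy as np

rng = np.random.default_rng(1)

n_facilities = 1000
# Stand-in for a standardised origination score (fixed for the life of the loan).
score = rng.normal(0.0, 1.0, n_facilities)
# Hypothetical quarterly GDP growth path through a downturn and recovery (%).
gdp = np.array([2.0, 1.0, -1.0, -3.0, -1.0, 1.0])

upper_bounds = np.array([0.01, 0.03, 0.08])  # hypothetical PD bounds, best to worst
n_grades = len(upper_bounds) + 1


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def grade_counts(pd_values):
    """Count facilities in each ratings class for one quarter."""
    idx = np.digitize(pd_values, upper_bounds, right=True)
    return np.bincount(idx, minlength=n_grades)


# Model 1: application characteristics only, so the PD does not move with the cycle.
pd_app = logistic(-3.5 - 0.8 * score)
counts_app = np.array([grade_counts(pd_app) for _ in gdp])

# Model 2: the same score plus GDP growth, so the PD rises as GDP falls.
counts_macro = np.array(
    [grade_counts(logistic(-3.5 - 0.8 * score - 0.3 * g)) for g in gdp]
)

print("application-only model (rows are quarters, columns are grades):")
print(counts_app)
print("model with GDP growth:")
print(counts_macro)
```

Reading down the columns of the second table shows the best class shrinking and the worst class growing as GDP falls, while the first table stays constant from quarter to quarter.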
To get the book and the data, subscribe to the course and start learning, visit www.deepcreditrisk.com.
All the apps are modular and functional, and can be flexibly applied to a range of datasets. If you have questions regarding application development and deployment, or have a business use, please let us know.