Modelling Credit Defaults Using Support Vector Machine And Binary Logistic Models

Peter Gachoki, Lucas Macharia, Jeremiah Kinyanjui


Defaulting on a loan occurs when a borrower stops making payments on a loan or credit card according to the account's terms. Financial institutions construct default models to determine the probability of default on credit obligations by a corporation or sovereign entity. A probability of default model uses multivariate analysis to examine multiple characteristics of the borrower, and it usually accounts for credit or business cycles either by incorporating current financial data into the model or by including economic adjustments. Modelling loan default allows financial institutions to identify typical features and patterns of behavior that lead to a future inability to make debt repayments, which in turn helps to assess the probability of future default for each client. The focus of this study was to apply the support vector machine and binary logistic models to model credit defaults. The process involved identifying the predictors associated with credit defaults and comparing the predictive performance of the two models. The analysis was done using R statistical software. The results showed that credit amount, marital status, credit history and location of the property used as security were significant predictors of credit defaults. The results also showed that the binary logistic model performed better than the support vector machine model in terms of F1 score and accuracy of predicting credit defaults. The logistic model had an accuracy of 0.826087 and an F1 score of 0.8809524. The support vector machine had an accuracy of 0.7826087 and an F1 score of 0.8554913. From the study findings, it was concluded that the accuracy of the prediction models in modelling credit defaults depends on the variables considered: a different set of variables would yield different accuracies for the prediction models.
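For readers who want to see how the two reported metrics relate to a confusion matrix, the following is a minimal Python sketch (the study itself used R, and this is not the authors' code). The function name `accuracy_and_f1` and all counts are assumptions introduced for illustration: the abstract does not report a confusion matrix, a test-set size, or which class was treated as positive. The counts were chosen only because they reproduce the reported figures under an assumed test set of 115 observations. The split between false positives and false negatives is arbitrary, since accuracy and F1 depend only on their sum once the true positives are fixed.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Return (accuracy, F1) computed from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total          # share of all predictions that are correct
    precision = tp / (tp + fp)            # share of predicted positives that are correct
    recall = tp / (tp + fn)               # share of actual positives that are found
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
    return accuracy, f1


# Hypothetical counts (NOT taken from the paper), assuming 115 test observations.
logit_acc, logit_f1 = accuracy_and_f1(tp=74, fp=12, fn=8, tn=21)
svm_acc, svm_f1 = accuracy_and_f1(tp=74, fp=15, fn=10, tn=16)

print(f"logistic: accuracy={logit_acc:.6f}, F1={logit_f1:.7f}")
print(f"SVM:      accuracy={svm_acc:.7f}, F1={svm_f1:.7f}")
```

Under these assumed counts the logistic model makes 20 errors and the support vector machine 25, which is one way the reported gap in accuracy and F1 could arise. Other confusion matrices may be consistent with the same figures.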


Keywords: Credit defaults, support vector machine, binary logistic, accuracy, F1 score








Copyright (c) 2022 Peter Gachoki, Lucas Macharia, Jeremiah Kinyanjui

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.