Student: Mafalda Gomes Gaspar
Abstract
The world is constantly changing and improving, particularly in fields such as technology, where data and its applications have become increasingly important and are now a fundamental part of modern life. In this context, the present study combines machine learning techniques with econometrics to analyze credit risk. It focuses both on predicting defaults and on identifying the key personal characteristics that drive default risk.
To this end, nine machine learning and econometric models were applied and their performance compared: Decision Tree, Generalized Additive Model with Least Absolute Shrinkage and Selection Operator, Gradient Boosting, K-Nearest Neighbors, Least Absolute Shrinkage and Selection Operator, Logistic Regression, Naïve Bayes, Random Forest and Ridge. The study also discusses the advantages and limitations of using machine learning in credit scoring.
The results indicate that Gradient Boosting outperformed the other methods, in line with the literature, which highlights its effectiveness on imbalanced datasets and its high accuracy in credit scoring. Beyond the performance comparison, the study examines the key factors influencing default risk, mainly through the Generalized Additive Model with Least Absolute Shrinkage and Selection Operator. The three most relevant variables identified are late payments, frequent missed payments, and high credit utilization.
This study underscores the benefits of adopting new technologies in credit risk management and, by extension, in everyday life. It also points to directions for future research, such as integrating alternative data sources to increase predictive power, improving the explainability of model decisions, and developing techniques that strengthen data privacy so that sensitive information remains protected during model training and use.
Master's Final Work