ISEG

Student: Li Zhu

Abstract

Credit card fraud detection is a key application of machine learning, but real-world fraud datasets are often highly imbalanced: fraudulent transactions make up only a very small fraction of all records. This study investigates how class imbalance affects model performance in fraud detection and evaluates whether several resampling techniques improve prediction results. XGBoost is used as the baseline classifier and is trained both on the original imbalanced data and on data processed with various sampling strategies. Results show that XGBoost performs well even on the original imbalanced dataset once the class weight (scale_pos_weight) is adjusted. SMOTE and SMOTETomek slightly improve precision but reduce recall, while NearMiss achieves the highest recall at the cost of very low precision. This suggests a trade-off between identifying fraud and avoiding false positives. The study emphasizes that a sampling strategy should be selected with business objectives in mind, not only technical performance. In many real-world applications, algorithm-level tuning may be simpler and more efficient than data-level resampling. The paper also discusses the limitations of the algorithm and directions for future work, including interpretability, data leakage risks, and the potential of threshold tuning or online learning strategies.
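The two quantities the abstract relies on, the class weight and the precision/recall trade-off, can be illustrated with a minimal sketch. It assumes the commonly used heuristic of setting scale_pos_weight to the ratio of negative to positive examples; the abstract does not state which value the study used. The toy labels and the two sets of predictions are hypothetical and are not taken from the study's data. The helper names `scale_pos_weight` and `precision_recall` are introduced here for illustration only.

```python
from collections import Counter


def scale_pos_weight(labels):
    """Ratio of negative (0) to positive (1) examples.

    Assumption: this is the usual heuristic for the XGBoost parameter of the
    same name, not necessarily the value used in the study.
    """
    counts = Counter(labels)
    if counts[1] == 0:
        raise ValueError("no positive examples in labels")
    return counts[0] / counts[1]


def precision_recall(y_true, y_pred):
    """Precision and recall for the positive (fraud) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: 995 legitimate transactions (0), 5 fraudulent (1).
y_true = [0] * 995 + [1] * 5
weight = scale_pos_weight(y_true)

# Hypothetical conservative model: flags only 2 of the 5 frauds, no false alarms.
y_pred_conservative = [0] * 995 + [1, 1, 0, 0, 0]

# Hypothetical aggressive model: catches every fraud but also flags 45
# legitimate transactions.
y_pred_aggressive = [1] * 45 + [0] * 950 + [1] * 5

print("scale_pos_weight:", weight)
print("conservative (precision, recall):", precision_recall(y_true, y_pred_conservative))
print("aggressive (precision, recall):", precision_recall(y_true, y_pred_aggressive))
```

In practice the computed ratio would be passed as the `scale_pos_weight` argument of `xgboost.XGBClassifier`. The second pair of numbers mirrors the pattern the abstract reports for NearMiss: every fraud is caught, but most alerts are false positives.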

Master's Final Work

TFM_Li Zhu