Student: Joana Afonso Pinto
Abstract
Understanding how elite swimmers pace themselves during competition is essential for improving performance in long-distance events. This study explores the pacing strategies adopted by athletes in the 800m and 1500m freestyle races at the 2024 Olympic Games, with a particular focus on identifying the factors that explain variations in swimming velocity and evaluating which machine learning model best predicts it.
Initially, a classification-based approach was considered, aiming to predict pacing profiles from race features. However, due to the limited size and nature of the dataset, this approach was discarded. As an alternative, a two-step methodology was adopted: (i) pacing strategies were explored through agglomerative hierarchical clustering; and (ii) regression-based models were used to explain and predict swimmer velocity throughout the race.
The clustering analysis revealed three distinct pacing profiles in the 800m: two U-shaped patterns (one faster and one slower) and one positive-split strategy. In the 1500m, two U-shaped profiles were identified. Statistical tests showed that cluster membership was associated with sex, entry time, and pacing variability (coefficient of variation, CV%), but not with final race ranking.
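As an illustration of the first step, the following is a minimal sketch of how lap velocities could be summarised by CV% and grouped by agglomerative hierarchical clustering. It is not the study's implementation: the lap velocities are invented, and the Euclidean distance, average linkage, and population standard deviation are assumptions, since the abstract does not state which were used.

```python
from statistics import mean, pstdev


def cv_percent(velocities):
    """Coefficient of variation of lap velocities, in percent.

    Uses the population standard deviation (an assumption).
    """
    return 100.0 * pstdev(velocities) / mean(velocities)


def euclidean(a, b):
    """Euclidean distance between two equal-length velocity profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def agglomerate(profiles, n_clusters):
    """Average-linkage agglomerative clustering (linkage is an assumption).

    Starts with one cluster per swimmer and repeatedly merges the two
    closest clusters until n_clusters remain. Returns lists of indices.
    """
    clusters = [[i] for i in range(len(profiles))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average pairwise distance between members of the two clusters.
                d = mean(
                    euclidean(profiles[a], profiles[b])
                    for a in clusters[i]
                    for b in clusters[j]
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical lap velocities in m/s (illustrative values, not race data).
profiles = [
    [1.70, 1.62, 1.61, 1.69],  # U-shaped, faster
    [1.66, 1.58, 1.57, 1.65],  # U-shaped, slower
    [1.72, 1.66, 1.60, 1.55],  # positive split
]

groups = agglomerate(profiles, n_clusters=2)
variability = [cv_percent(p) for p in profiles]
```

With these illustrative values, the two U-shaped profiles merge first and the positive-split profile remains on its own, which mirrors the kind of grouping described above.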
To study the determinants of velocity, new variables were computed, including acceleration, distance to the finish line, and previous split. Feature importance analysis identified sex, acceleration, and entry time as the strongest predictors. Among the models tested, Gradient Boosting achieved the best predictive performance, outperforming Random Forests, Neural Networks, and traditional OLS regression. Residual analysis, including the Durbin-Watson test for residual autocorrelation, confirmed the statistical robustness of the models.
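The derived variables can be illustrated with a short sketch that builds them from a list of split times. The function name `lap_features`, the 50 m lap length, and the definition of acceleration as the change in lap velocity divided by the lap's split time are assumptions made for illustration; the abstract does not give the exact definitions used.

```python
def lap_features(split_times, lap_length=50.0):
    """Derive per-lap variables from split times (seconds per lap).

    Hypothetical helper: the definitions below are illustrative assumptions.
    """
    race_distance = lap_length * len(split_times)
    rows = []
    prev_velocity = None
    prev_split = None
    for lap, split in enumerate(split_times, start=1):
        velocity = lap_length / split  # mean velocity over the lap, m/s
        if prev_velocity is None:
            acceleration = None  # undefined for the first lap
        else:
            # Change in lap velocity per second of the current lap (assumed).
            acceleration = (velocity - prev_velocity) / split
        rows.append({
            "lap": lap,
            "velocity": velocity,
            "acceleration": acceleration,
            "distance_to_finish": race_distance - lap_length * lap,
            "previous_split": prev_split,
        })
        prev_velocity, prev_split = velocity, split
    return rows


# Illustrative split times in seconds (not race data).
rows = lap_features([25.0, 50.0])
```

Each resulting row could then serve as one observation for a regression model, with velocity as the target and the remaining variables as predictors.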
Master's Final Work