ISEG

Student: Anastasios Kondos

Abstract

This study investigates how accurately machine learning can predict apartment prices in Limassol, Cyprus, and whether adding textual data from listing descriptions improves performance beyond standard property features. Over 4,000 listings were scraped between November 2024 and February 2025, each containing structured numerical attributes (e.g., area, property age, coordinates) and a free-text description written by the seller. The text was preprocessed and vectorized using TF-IDF. Two input sets were tested: one with only structured features, and another combining those with textual data. Five regression algorithms were evaluated using grid search and cross-validation. All machine learning models outperformed the hedonic linear regression benchmark, highlighting their ability to capture more complex pricing patterns. Gradient Boosting performed best, achieving R² = 0.84 and MAPE = 16.5% without text. Adding descriptions led to a modest improvement (R² = 0.86, MAPE = 15.6%), suggesting that text captures some qualitative signals not fully reflected in the numeric data. However, the gain was limited, likely because the descriptions overlap with information already present in the structured features or because TF-IDF is a shallow representation of the text. Overall, while listing descriptions offer incremental value, most predictive power stems from the core property features. Future work could explore more advanced embedding techniques to better capture meaning and nuance.
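
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how R² and MAPE are commonly computed. It is not taken from the thesis: the function names `mape` and `r_squared` are hypothetical, and the prices are invented for illustration only.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, expressed in percent."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical apartment prices in EUR (assumed values, not thesis data).
actual = [200_000, 300_000, 400_000, 500_000]
predicted = [220_000, 270_000, 400_000, 550_000]

print(round(mape(actual, predicted), 2))       # 7.5
print(round(r_squared(actual, predicted), 3))  # 0.924
```

Read this way, a MAPE of 16.5% means predictions miss the listed price by about 16.5% on average, while R² = 0.84 means the model accounts for roughly 84% of the variance in prices.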

Master's Final Work

TFM_Anastasios Kondos