Student: Anastasios Kondos
Abstract
This study investigates how accurately machine learning can predict apartment prices in Limassol,
Cyprus, and whether adding textual data from listing descriptions improves performance beyond
standard property features. Over 4,000 listings were scraped between November 2024 and
February 2025, each containing structured numerical attributes (e.g., area, property age,
coordinates) and free-text descriptions written by sellers. The text was preprocessed and vectorized
using TF-IDF. Two input sets were tested: one with only structured features, and another
combining those with textual data. Five regression algorithms were evaluated using grid search
and cross-validation. All machine learning models outperformed the hedonic linear regression
benchmark, highlighting their ability to capture more complex pricing patterns. Gradient Boosting
performed best, achieving R² = 0.84 and MAPE = 16.5% without text. Adding descriptions led to
a modest improvement (R² = 0.86, MAPE = 15.6%), suggesting that text captures some qualitative
signals not fully reflected in the numeric data. However, the gain was limited, likely because the
descriptions overlap with information already present in the structured features or because TF-IDF
represents text only shallowly. Overall, while listing descriptions
offer incremental value, most predictive power stems from the core property features. Future work
could explore more advanced embedding techniques to better capture meaning and nuance.
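
For readers unfamiliar with the two metrics reported above, the following is a minimal sketch of how R² and MAPE are conventionally computed. It assumes the standard textbook definitions; the function names and the sample prices are hypothetical and are not taken from the study's code or data.

```python
from typing import Sequence


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, expressed in percent."""
    total = sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical apartment prices in EUR, for illustration only
# (these are NOT observations or predictions from the study).
actual_prices = [200_000.0, 300_000.0, 400_000.0]
predicted_prices = [220_000.0, 270_000.0, 400_000.0]

r2_value = r_squared(actual_prices, predicted_prices)
mape_value = mape(actual_prices, predicted_prices)
print(f"R^2 = {r2_value:.3f}, MAPE = {mape_value:.1f}%")
```

Under these definitions, a MAPE of 15.6% means that predicted prices deviate from actual prices by 15.6% on average, while R² = 0.86 means the model accounts for 86% of the variance in price.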
Master's Final Work