Examining the market sample size for machine learning-based mass appraisal: a case study in Pendik district of Istanbul
Arif Çağdaş Aydınoğlu et al.
Abstract
Determining the optimum market sample size for Machine Learning (ML)-based modelling is important for achieving accurate, reliable and economical results in the mass appraisal process. Too few market samples can cause the model to under-represent the diversity of the market, while too many may incur unnecessary computational costs. This study designed a methodological framework for ML-based mass appraisal and compared model performance across sample sizes. A case study was performed in the densely populated Pendik district of Istanbul. A total of 121 variables affecting the value of residential real estate were identified in structural, spatial and local categories, and the datasets were organised in a GIS environment. A total of 142 models were developed with the Random Forest algorithm, with the number of market samples increased in steps of 500, and the models were evaluated with performance metrics. The results showed that the models estimate values with high prediction accuracy and that increasing the number of market samples improves performance in all metrics. When breakpoints in model performance were identified as a function of market sample size, their general distribution was quite similar. It was concluded that high-accuracy results can be obtained with fewer market samples when a specific performance range is considered sufficient.
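The idea of growing the sample in fixed steps and then accepting the smallest size that falls within a chosen performance range can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names `sample_size_schedule` and `smallest_sufficient_size`, the `tolerance` parameter, and the score values are all hypothetical placeholders, and the scores are not results from the study. In practice each score would come from training and evaluating a model (for example a Random Forest) on the corresponding number of samples.

```python
from typing import List, Sequence


def sample_size_schedule(total: int, step: int = 500) -> List[int]:
    """Return cumulative training-set sizes: step, 2*step, ..., up to total."""
    if step <= 0 or total < step:
        raise ValueError("need 0 < step <= total")
    return list(range(step, total + 1, step))


def smallest_sufficient_size(
    sizes: Sequence[int], scores: Sequence[float], tolerance: float
) -> int:
    """Smallest sample size whose score lies within `tolerance` of the best.

    Assumes a higher-is-better metric (such as R^2). For an error metric,
    negate the scores before calling.
    """
    if not sizes or len(sizes) != len(scores):
        raise ValueError("sizes and scores must be non-empty and equal length")
    best = max(scores)
    # Walk the sizes in ascending order and stop at the first acceptable one.
    for n, score in sorted(zip(sizes, scores)):
        if best - score <= tolerance:
            return n
    raise AssertionError("unreachable: the best score always qualifies")


# Placeholder scores for illustration only (not values from the study).
sizes = sample_size_schedule(total=3000, step=500)
scores = [0.70, 0.78, 0.83, 0.85, 0.86, 0.865]

print(smallest_sufficient_size(sizes, scores, tolerance=0.02))  # 2000
```

With these placeholder values, a tolerance of 0.02 accepts 2,000 samples instead of the full 3,000, which mirrors the abstract's conclusion that fewer samples can suffice once an acceptable performance range is fixed.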