Integrating multi-source data and explainable AI for housing market analysis in Indonesia
Ivan Aldy Ganesen et al.
Abstract
Purpose
This paper aims to address limitations in real estate valuation in Indonesia that arise from regional heterogeneity, fragmented data sources and opaque automated models. Specifically, it seeks to develop a more accurate and transparent comparative market analysis framework by integrating multisource property data with explainable machine learning, thereby improving both the reliability and the interpretability of valuations across heterogeneous market segments.
Design/methodology/approach
The study integrates property listings from several major Indonesian online platforms into a unified dataset with more than 70 spatial, structural, facility and pricing attributes. After extensive preprocessing and normalization, properties are grouped into low, medium and high price classes. Several regression-based machine learning models are evaluated, with hyperparameter optimization applied to the tree-based models. Explainable analysis is then used to examine feature contributions across price classes.
Findings
Extreme Gradient Boosting (XGBoost) shows the strongest overall performance, achieving mean absolute percentage errors (MAPE) of 26.53% for medium-priced properties and 14.60% for high-priced properties. The explainable analysis indicates that spatial attributes consistently dominate price formation, whereas the influence of facility-related features varies by price segment, highlighting heterogeneous valuation drivers across the market.
Originality/value
This paper contributes a large-scale, multisource Indonesian real estate dataset and an explainable automated valuation framework that moves beyond single-platform, black-box approaches. The study offers both predictive accuracy and interpretable insights into price formation, which are of value to researchers, practitioners and policymakers seeking transparent, data-driven comparative market analysis in emerging real estate markets.
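The abstract reports accuracy separately for each price class. As a minimal sketch of how such per-class figures could be computed, the snippet below uses the standard MAPE definition, MAPE = (100/n) Σ |y_i − ŷ_i| / |y_i|, and assumes, hypothetically, that price classes are cut at the tercile boundaries of observed prices; the abstract does not state the study's actual thresholds. The function names are illustrative, not taken from the paper, and the numbers in the usage are toy values rather than the study's data.

```python
from statistics import quantiles


def assign_price_class(prices):
    """Label each price 'low', 'medium' or 'high'.

    Hypothetical assumption: class boundaries are the tercile cut points of
    the observed prices; the study's actual thresholds may differ.
    """
    low_cut, high_cut = quantiles(prices, n=3)  # two cut points split the data into thirds
    labels = []
    for price in prices:
        if price <= low_cut:
            labels.append("low")
        elif price <= high_cut:
            labels.append("medium")
        else:
            labels.append("high")
    return labels


def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equally long")
    total = 0.0
    for y, y_hat in zip(actual, predicted):
        if y == 0:
            raise ValueError("MAPE is undefined when an actual value is zero")
        total += abs((y - y_hat) / y)  # relative error for one property
    return 100.0 * total / len(actual)


def mape_by_class(actual, predicted, classes):
    """Compute MAPE separately for each price-class label."""
    groups = {}
    for y, y_hat, label in zip(actual, predicted, classes):
        ys, y_hats = groups.setdefault(label, ([], []))
        ys.append(y)
        y_hats.append(y_hat)
    return {label: mape(ys, y_hats) for label, (ys, y_hats) in groups.items()}


if __name__ == "__main__":
    # Toy values for illustration only, not data from the study.
    print(assign_price_class([1, 2, 3, 4, 5, 6, 7, 8, 9]))
    print(mape_by_class([100, 200, 1000, 2000], [110, 180, 1000, 1500],
                        ["low", "low", "high", "high"]))
```

For the toy call above, `mape_by_class` gives about 10.0 for "low" and 12.5 for "high". Because MAPE divides each error by the actual price, an identical absolute error counts more heavily for cheaper properties, so pooled and per-class MAPE values can differ noticeably.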