Random forests and mixed effects random forests for small area estimation of general parameters: A poverty mapping case study in Mozambique
Patrick Krennmair et al.
Abstract
Use of standard random forests may not guarantee reliable small area estimates unless a rich source of predictors explains the between-area heterogeneity. We propose mixed effects random forests with area random effects for small area estimation of general parameters. A new fitting algorithm with an embedded bootstrap-bias correction for the random forest residual variance is presented. Point estimators of small area parameters are constructed using a smearing estimator of the area-specific distribution function. Nonparametric block bootstrap is used for MSE estimation. The methodology is evaluated using household consumption data from Mozambique to derive district estimates of head count ratio and poverty gap. Comparisons to the empirical best predictor under a linear mixed model and to a synthetic estimator under the random forest are presented. Estimates are further contrasted to 2023 World Bank estimates and to design-unbiased direct estimates. The results show: (a) the advantages from including random effects in random forests, (b) the importance of data transformations for machine learning methods, (c) robustness properties of random forest-type methods, and (d) the importance of bias correcting the naive estimator of the random forest residual variance. Our conclusions demonstrate that a black-box approach to using machine learning methods should be avoided.
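The two target parameters named above, head count ratio and poverty gap, are the Foster-Greer-Thorbecke indicators of order 0 and 1. The sketch below shows how they could be evaluated on a smearing-type predicted distribution. It is a minimal illustration, not the authors' implementation. The function names `fgt` and `smearing_fgt`, the strict inequality `y < z` for the poverty line, and the assumption that `mu` already holds the unit-level forest prediction plus the predicted area random effect are all hypothetical choices made for this example.

```python
import numpy as np


def fgt(y, z, alpha):
    """Foster-Greer-Thorbecke indicator for welfare values y and poverty line z.

    alpha=0 gives the head count ratio, alpha=1 the poverty gap.
    """
    y = np.asarray(y, dtype=float)
    poor = y < z  # assumption: a unit is poor if strictly below the line
    gap = np.where(poor, (z - y) / z, 0.0)  # relative shortfall, 0 for non-poor
    # Multiplying by the indicator keeps non-poor units at 0 even when alpha=0.
    return float(np.mean(poor * gap**alpha))


def smearing_fgt(mu, residuals, z, alpha):
    """Smearing-style estimate: combine every prediction with every residual.

    mu        : predicted means for the units of one area (assumed to include
                the predicted area random effect)
    residuals : empirical unit-level residuals from the fitted model
    """
    mu = np.asarray(mu, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    # Broadcasting builds the full grid mu_i + e_j as a pseudo-population.
    y_star = mu[:, None] + residuals[None, :]
    return fgt(y_star.ravel(), z, alpha)


if __name__ == "__main__":
    consumption = [50, 100, 150, 200]
    print(fgt(consumption, z=120, alpha=0))  # head count ratio
    print(fgt(consumption, z=120, alpha=1))  # poverty gap
    print(smearing_fgt([100, 140], [-30, 0, 30], z=120, alpha=0))
```

Pairing every prediction with every residual replaces a parametric error distribution with the empirical one, which is the general idea behind a smearing estimator. If the model is fitted on a transformed scale, the back-transformation would have to be applied to `y_star` before calling `fgt`.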