Robust and reproducible evaluation framework for population synthesis models—Application to probabilistic and deep generative models
Vianey Darsel et al.
Abstract
• An evaluation framework for population synthesis based on three criteria
• Introduction of an open-source reference dataset for population synthesis
• Introduction to population synthesis of a diffusion model that handles tabular data
• A benchmark of probabilistic and deep generative models in two realistic settings

Extensive research exists on implementing new algorithms for population synthesis; however, there remains no consensus on how to evaluate a generated synthetic population. The generated population must be similar to the real population: its distribution must match at the macroscopic level, and each generated individual must be realistic at the microscopic level. Because population synthesis relies on real data, data privacy is a further concern and must also be evaluated. In this paper, leveraging insights from the population synthesis and tabular data generation literature, we propose three metrics that address the current limitations in evaluation. To support these metrics, we provide mathematical demonstrations and test their robustness whenever possible. In addition to these metrics, we propose an open-source dataset that we consider well suited to model evaluation. Together, these form a complete evaluation framework for population synthesis that promotes reproducible science. As an application, we present an extensive benchmark of probabilistic models and deep generative models, including a diffusion model, the current state-of-the-art generative model for tabular data synthesis. In this benchmark, we simulate two data scenarios corresponding to real-world use cases. The results indicate that, under these settings, Bayesian networks are the most attractive models, performing well on all criteria. Regarding deep generative models, the diffusion model gives promising results: it is the only one achieving results similar to those of the Bayesian network, and it could be a strong option in more complicated use cases with more attributes. Our code, including our evaluation methodology and model implementations, is available for future research at https://github.com/vdarsel/PopulationSynthesis.