When Better Prediction Reduces Overlap: The Predictability Paradox in Propensity Score Matching with Machine Learning

Foong Soon Cheong

Econometrics · 2026 · https://doi.org/10.3390/econometrics14020019 · article
AJG 1 · ABDC B
Weight
0.50

Abstract

Evidence from observational studies plays a central role in shaping public policy in health, education, and financial regulation, where randomized experiments are rarely feasible. Propensity score matching (PSM) is a widely used method to approximate fair comparisons between treatment and control groups. Incorporating machine learning into the estimation of propensity scores can strengthen prediction and enhance the credibility of findings. However, stronger predictive models create a “predictability paradox”. As predictive accuracy improves, estimated propensity scores for treated and control units become more distinct when treatment assignment is strongly predictable from observed covariates, revealing limited overlap between groups. In the limit, near-perfect prediction produces near-complete separation between groups, rendering traditional matching infeasible and confining inference to a narrow subset of units near the boundary of the propensity score distribution, a setting analogous to a regression discontinuity design (RDD). Researchers thus face perverse incentives to use weaker models for statistically significant but spurious results. These dynamics jeopardize the reliability of evidence for policy. To safeguard decision-making, we propose a simple reform: require that studies using PSM disclose model error rates, including false positive and false negative rates, along with information on overlap and effective sample size.
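The mechanism described in the abstract can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the paper's method: a single covariate, a logistic assignment rule whose strength is set by a hypothetical parameter `beta`, the true propensity score standing in for a well-fitted model, and 1:1 caliper matching without replacement. The function names `simulate` and `diagnostics`, the caliper of 0.05, and the 0.5 classification threshold are all illustrative choices. The sketch reports the quantities the abstract asks authors to disclose: false positive rate, false negative rate, and the size of the matched sample.

```python
import math
import random


def simulate(beta, n=2000, seed=0):
    """Return (score, treated) pairs for n simulated units.

    Assumptions (not from the paper): one covariate x ~ N(0, 1) and
    P(treated | x) = sigmoid(beta * x). The true propensity score stands in
    for a well-fitted model, so a larger beta means a more predictable
    assignment and hence a more accurate classifier.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        score = 1.0 / (1.0 + math.exp(-beta * x))
        units.append((score, rng.random() < score))
    return units


def diagnostics(units, caliper=0.05, threshold=0.5):
    """Error rates and 1:1 caliper-matching yield for scored units."""
    treated = sorted(s for s, t in units if t)
    control = sorted(s for s, t in units if not t)

    # Classify a unit as "treated" when its score reaches the threshold.
    false_pos = sum(1 for s in control if s >= threshold)
    false_neg = sum(1 for s in treated if s < threshold)

    # 1:1 matching without replacement: walk both sorted lists and pair each
    # treated unit with the lowest unused control score inside its caliper.
    pairs, j = 0, 0
    for s in treated:
        while j < len(control) and control[j] < s - caliper:
            j += 1
        if j < len(control) and control[j] <= s + caliper:
            pairs += 1
            j += 1

    return {
        "false_positive_rate": false_pos / len(control),
        "false_negative_rate": false_neg / len(treated),
        "matched_pairs": pairs,
        "matched_share_of_treated": pairs / len(treated),
    }


if __name__ == "__main__":
    for beta in (0.5, 2.0, 8.0):
        print(beta, diagnostics(simulate(beta)))
```

In this toy setup, raising `beta` should lower both error rates while shrinking the share of treated units that find a control within the caliper, which is the pattern the abstract calls the predictability paradox.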

Cite this paper

https://doi.org/10.3390/econometrics14020019

Or copy a formatted citation

@article{foong2026,
  title        = {{When Better Prediction Reduces Overlap: The Predictability Paradox in Propensity Score Matching with Machine Learning}},
  author       = {Foong Soon Cheong},
  journal      = {Econometrics},
  year         = {2026},
  doi          = {10.3390/econometrics14020019},
}

Paste directly into BibTeX, Zotero, or your reference manager.


Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.40 = 0.20
M · momentum: 0.50 × 0.15 = 0.07
V · venue signal: 0.50 × 0.05 = 0.03
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
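
The component rows above appear to combine into the overall evidence weight as a weighted sum. The sketch below reproduces that arithmetic from the values shown on this page. The weighted-sum rule is an assumption inferred from the rows rather than a documented formula, and `evidence_weight` is a hypothetical helper name.

```python
# Component scores and Balanced-mode weights as shown on this page.
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
weights = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores, weights):
    """Weighted sum of component scores (assumed aggregation rule)."""
    return sum(scores[k] * weights[k] for k in weights)


total = evidence_weight(scores, weights)
print(f"{total:.2f}")
```

Because the four weights sum to 1, equal component scores of 0.50 give an overall weight equal to that common score.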