Neglected Heterogeneity, Simpson’s Paradox, and the Anatomy of Least Squares

Rainer Winkelmann

Journal of Econometric Methods · 2023 · https://doi.org/10.1515/jem-2023-0028 · article
AJG 1 · ABDC B
Weight
0.54

Abstract

When a sample combines data from two or more groups, multivariate regression yields a matrix-weighted average of the group-specific coefficient vectors. However, it is possible that the weighted average of a specific coefficient falls outside the range of the group-specific coefficients, and it may even have a different sign compared to both group-level coefficients, a manifestation of Simpson’s paradox. The result of the combined regression is then prone to misinterpretation. The purpose of this paper is to raise awareness of this problem and to state conditions under which such non-convex weighting or sign reversal can arise, for a model with two regressors and two groups. Two illustrative examples, an investment equation estimated with panel data, and a cross-sectional earnings equation for men and women, highlight the relevance of these findings for applied work.
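The matrix-weighting identity behind the abstract can be checked numerically. The sketch below is a minimal illustration, not the paper's own example. It assumes a hypothetical noiseless design with two regressors, no intercept, and made-up coefficient vectors `beta1` and `beta2`. The pooled estimate equals W1 b1 + W2 b2 with W_g = (X1'X1 + X2'X2)^{-1} X_g'X_g. Because the weight matrices have off-diagonal entries, the pooled coefficient on the first regressor turns negative even though it is 1 in both groups.

```python
import numpy as np

# Hypothetical data (an assumption for illustration, not taken from the paper):
# the regressors are positively correlated in group 1, negatively in group 2.
X1 = np.array([[3.0, 3.0], [1.0, -1.0], [-3.0, -3.0], [-1.0, 1.0]])
X2 = np.array([[3.0, -3.0], [1.0, 1.0], [-3.0, 3.0], [-1.0, -1.0]])

# Assumed group-specific coefficients; the first coefficient is +1 in both groups.
beta1 = np.array([1.0, 0.0])
beta2 = np.array([1.0, 3.0])

# Noiseless outcomes, so group-wise OLS recovers beta1 and beta2 exactly.
y1 = X1 @ beta1
y2 = X2 @ beta2


def ols(X, y):
    """Least-squares coefficients for y = X b (no intercept)."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


b1 = ols(X1, y1)
b2 = ols(X2, y2)
b_pooled = ols(np.vstack([X1, X2]), np.concatenate([y1, y2]))

# Matrix weights: W_g = (X1'X1 + X2'X2)^{-1} X_g'X_g, which sum to the identity.
A1 = X1.T @ X1
A2 = X2.T @ X2
W1 = np.linalg.solve(A1 + A2, A1)
W2 = np.linalg.solve(A1 + A2, A2)
b_weighted = W1 @ b1 + W2 @ b2

print("group 1:", b1)
print("group 2:", b2)
print("pooled: ", b_pooled)      # approximately (-0.2, 1.5)
print("weighted:", b_weighted)   # matches the pooled estimate
```

In this constructed case the second pooled coefficient (about 1.5) lies between the group values 0 and 3, while the first (about -0.2) lies outside the range of the group values and has the opposite sign.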

4 citations

Cite this paper

https://doi.org/10.1515/jem-2023-0028

Or copy a formatted citation

@article{winkelmann2023,
  title        = {{Neglected Heterogeneity, Simpson’s Paradox, and the Anatomy of Least Squares}},
  author       = {Winkelmann, Rainer},
  journal      = {Journal of Econometric Methods},
  year         = {2023},
  doi          = {10.1515/jem-2023-0028},
}

Paste directly into BibTeX, Zotero, or your reference manager.


Evidence weight

0.54

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.40 = 0.20
M · momentum: 0.80 × 0.15 = 0.12
V · venue signal: 0.50 × 0.05 = 0.03
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.