A doubly robust framework for addressing outcome-dependent selection bias in multi-cohort EHR studies

Ritoban Kundu et al.

Biostatistics · 2026 · https://doi.org/10.1093/biostatistics/kxag001 · article
ABDC A
Weight: 0.50

Abstract

Selection bias can hinder accurate estimation of association parameters in binary disease risk models using non-probability samples like electronic health records (EHRs). The issue is compounded when participants are recruited from multiple clinics/centers with varying selection mechanisms that may depend on the disease/outcome of interest. Traditional inverse-probability-weighted (IPW) methods, based on constructed parametric selection models, often struggle with misspecifications when selection mechanisms vary across cohorts. This paper introduces a new Joint Augmented Inverse Probability Weighted (JAIPW) method, which integrates individual-level data from multiple cohorts collected under potentially outcome-dependent selection mechanisms, with data from an external probability sample. JAIPW offers double robustness by incorporating a flexible auxiliary score model to address potential misspecifications in the selection models. We outline the asymptotic properties of the JAIPW estimator, and our simulations reveal that JAIPW achieves up to 6 times lower relative bias and 5 times lower root mean square error (RMSE) compared to the best performing joint IPW methods under scenarios with misspecified selection models. Applying JAIPW to the Michigan Genomics Initiative (MGI), a multi-clinic EHR-linked biobank, combined with external national probability samples, resulted in cancer-sex association estimates closely aligned with national benchmark estimates. We also analyzed the association between cancer and polygenic risk scores (PRS) in MGI to illustrate a situation where the exposure variable is not measured in the external probability sample.
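
As a rough illustration of the inverse-probability-weighting idea the abstract builds on (this is not the JAIPW estimator itself), the sketch below uses a hypothetical 2×2 population of disease status D and binary exposure X. All counts and selection probabilities are made up, and the selection probabilities are assumed known, which is exactly the assumption that fails when a selection model is misspecified.

```python
# Hypothetical population counts, keyed by (disease D, exposure X).
population = {(1, 1): 200, (1, 0): 100, (0, 1): 300, (0, 0): 400}

# Assumed-known selection probabilities depending on both D and X
# (outcome-dependent selection). These values are illustrative only.
selection_prob = {(1, 1): 0.50, (1, 0): 0.25, (0, 1): 0.10, (0, 0): 0.20}


def odds_ratio(cells):
    """Disease-exposure odds ratio from a 2x2 table of counts."""
    return (cells[(1, 1)] * cells[(0, 0)]) / (cells[(1, 0)] * cells[(0, 1)])


# Expected cell counts in the selected (non-probability) sample.
sample = {k: population[k] * selection_prob[k] for k in population}

# IPW: weight each selected cell by the inverse of its selection probability.
weighted = {k: sample[k] / selection_prob[k] for k in sample}

true_or = odds_ratio(population)   # target association
naive_or = odds_ratio(sample)      # unweighted, biased by selection
ipw_or = odds_ratio(weighted)      # weighted, recovers the target here

print(f"true={true_or:.2f} naive={naive_or:.2f} ipw={ipw_or:.2f}")
```

In this toy setup the true odds ratio is 8/3 (about 2.67), the unweighted sample gives about 10.67, and weighting by the correct selection probabilities recovers the true value. With incorrect selection probabilities the weighted estimate would remain biased, which is the gap that a doubly robust approach is meant to address.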

Cite this paper

https://doi.org/10.1093/biostatistics/kxag001

Or copy a formatted citation

@article{ritoban2026,
  title        = {{A doubly robust framework for addressing outcome-dependent selection bias in multi-cohort EHR studies}},
  author       = {Kundu, Ritoban and others},
  journal      = {Biostatistics},
  year         = {2026},
  doi          = {10.1093/biostatistics/kxag001},
}

Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.40 = 0.20
M · momentum: 0.50 × 0.15 = 0.075
V · venue signal: 0.50 × 0.05 = 0.025
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
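
The breakdown above is consistent with a simple weighted sum of the four component scores. The sketch below reproduces that arithmetic, assuming the evidence weight is computed as the sum of score × weight over F, M, V, and R; the function name `evidence_weight` is hypothetical and the aggregation rule is inferred from the listed numbers, not confirmed by the page.

```python
# Component scores and Balanced-mode weights as listed in the breakdown.
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
weights = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores, weights):
    """Weighted sum of component scores (assumed aggregation rule)."""
    return sum(scores[k] * weights[k] for k in weights)


# Per-component contributions, before any rounding for display.
contributions = {k: scores[k] * weights[k] for k in weights}
total = evidence_weight(scores, weights)

for name, value in contributions.items():
    print(f"{name}: {value:.3f}")
print(f"evidence weight: {total:.2f}")
```

Under this assumption the M and V contributions are 0.075 and 0.025 before rounding to two decimals, and the four contributions sum to the reported evidence weight of 0.50.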