Random Integrated Subdata Ensemble Method for Key Variable Selection in Rare Event Setting

Ching‐Chi Yang et al.

Journal of Forecasting · 2026 · https://doi.org/10.1002/for.70120 · article
AJG 2 · ABDC A · Weight 0.50

Abstract

We propose a general variable selection procedure that identifies key input variables by applying elastic net regression to representative subdata instead of the full sample. The lists of variables selected in each subdata are combined through ensemble techniques, with the frequency at which a variable is selected across the different subdata serving as the final selection criterion. Using only the variables that are selected frequently (i.e., in 90% of the subdata), we build a parsimonious model that optimizes predictive accuracy. We adapt this method to the rare event setting and apply it to bankruptcy data from Taiwan. In addition, we show how variable selection is affected by subdata size, , and sampling procedure.
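The abstract describes the procedure only at a high level, so the following is a minimal sketch in Python (NumPy) under stated assumptions, not the authors' implementation. It assumes a logistic elastic net for the binary bankruptcy outcome, fitted by a simple proximal-gradient (ISTA) loop rather than any particular library, and a hypothetical rare-event sampling scheme in which each subdata keeps all events plus a fixed multiple of randomly drawn non-events. The function names, the penalty settings `lam` and `alpha`, the number of subdata, and the 3:1 non-event ratio are illustrative choices; only the 90% selection-frequency threshold comes from the abstract.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def elastic_net_logistic(X, y, lam=0.1, alpha=0.5, n_iter=1000):
    """Hypothetical helper: elastic-net logistic regression via ISTA.

    Penalty: lam * (alpha * ||b||_1 + (1 - alpha) / 2 * ||b||_2^2).
    The intercept is not penalized; only the slope coefficients are returned.
    """
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])          # prepend intercept column
    # Step size 1/L, where L bounds the curvature of the smooth part
    # (logistic loss plus the ridge term).
    L = np.linalg.norm(Xa, 2) ** 2 / (4 * n) + lam * (1 - alpha)
    step = 1.0 / L
    w = np.zeros(p + 1)
    for _ in range(n_iter):
        grad = Xa.T @ (sigmoid(Xa @ w) - y) / n
        grad[1:] += lam * (1 - alpha) * w[1:]      # ridge gradient, slopes only
        w = w - step * grad
        # Soft-thresholding: proximal step for the L1 part, slopes only.
        w[1:] = np.sign(w[1:]) * np.maximum(np.abs(w[1:]) - step * lam * alpha, 0.0)
    return w[1:]


def subdata_selection_frequency(X, y, n_subdata=50, ratio=3, rng=None, **enet_kwargs):
    """Fraction of subdata in which each variable receives a nonzero coefficient.

    Assumed rare-event sampling: every subdata keeps all events (y == 1)
    plus `ratio` times as many non-events drawn without replacement.
    """
    rng = np.random.default_rng(rng)
    events = np.flatnonzero(y == 1)
    non_events = np.flatnonzero(y == 0)
    counts = np.zeros(X.shape[1])
    for _ in range(n_subdata):
        drawn = rng.choice(non_events, size=ratio * len(events), replace=False)
        idx = np.concatenate([events, drawn])
        coef = elastic_net_logistic(X[idx], y[idx], **enet_kwargs)
        counts += coef != 0                         # tally selected variables
    return counts / n_subdata


def select_key_variables(freq, threshold=0.9):
    """Keep variables selected in at least `threshold` of the subdata."""
    return np.flatnonzero(freq >= threshold)


# Simulated example: only variables 0, 1, 2 affect a rare binary outcome.
rng = np.random.default_rng(0)
n, p = 3000, 8
X = rng.standard_normal((n, p))
eta = -6 + 2 * X[:, 0] - 2 * X[:, 1] + 1.5 * X[:, 2]
y = (rng.random(n) < sigmoid(eta)).astype(float)

freq = subdata_selection_frequency(X, y, n_subdata=50, rng=1)
print(np.round(freq, 2))
print(select_key_variables(freq))
```

Because every subdata in this sketch keeps all events, the fits differ only in which non-events they see. Variables with a strong, stable association with the event should keep nonzero coefficients in nearly every fit, while pure-noise variables are expected to be zeroed out in most fits and therefore fall below the 0.9 threshold.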


Cite this paper

https://doi.org/10.1002/for.70120


@article{yang2026,
  title        = {{Random Integrated Subdata Ensemble Method for Key Variable Selection in Rare Event Setting}},
  author       = {Yang, Ching-Chi and others},
  journal      = {Journal of Forecasting},
  year         = {2026},
  doi          = {10.1002/for.70120},
}




Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.40 = 0.20
M · momentum: 0.50 × 0.15 = 0.075
V · venue signal: 0.50 × 0.05 = 0.025
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page; for your query's actual relevance score, open this paper from a search result.
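The evidence-weight breakdown is a weighted sum: each component score is multiplied by its Balanced-mode weight and the products are added. The following minimal Python sketch reproduces that arithmetic. The function and constant names are hypothetical, and it assumes the overall weight is exactly this sum, which the displayed figures are consistent with. Because the four Balanced-mode weights add to 1, equal component scores of 0.50 give an overall weight of 0.50.

```python
# Hypothetical reconstruction: overall weight = sum(score_k * weight_k).
BALANCED_WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores, weights=BALANCED_WEIGHTS):
    """Weighted sum of component scores (F, M, V, R)."""
    return sum(scores[k] * weights[k] for k in weights)


scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
print(round(evidence_weight(scores), 3))  # prints 0.5
```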