Random Integrated Subdata Ensemble Method for Key Variable Selection in Rare Event Setting

Ching‐Chi Yang et al.

Journal of Forecasting · 2026 · https://doi.org/10.1002/for.70120 · article

AJG 2 · ABDC A

Weight

0.50

What the paper says

We propose a general variable selection procedure that identifies key input variables by applying elastic net regression to representative subdata in place of the full sample. We combine the lists of variables selected from each subdata through ensemble techniques, using the frequency with which a variable is selected across the subdata as the final variable selection criterion. Using only variables that are frequently chosen (i.e., 90%), we are able to build a parsimonious model that optimizes predictive accuracy. We adapt this method to the rare event setting and show its application to bankruptcy data in Taiwan. In addition, we show how variable selection is affected by subdata size, , and sampling procedure.
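The selection-frequency idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a continuous response, simple random subsampling in place of the paper's representative subdata, a small hand-written coordinate-descent elastic net, and synthetic data. The names make_data, elastic_net and selection_frequency, and the penalty settings lam and l1_ratio, are hypothetical. The rare-event adaptation is not reproduced here.

```python
import random


def soft_threshold(z, t):
    """Soft-thresholding operator used by the L1 part of the penalty."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net(X, y, lam=0.5, l1_ratio=0.5, n_iter=50):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam*(a*|b|_1 + (1-a)/2*|b|^2).

    Columns are standardized and y is centered, so no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]          # residual for beta = 0
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        m = sum(col) / n
        sd = (sum((v - m) ** 2 for v in col) / n) ** 0.5 or 1.0
        cols.append([(v - m) / sd for v in col])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            xj = cols[j]
            # Correlation of x_j with the partial residual (x_j's own effect added back).
            rho = sum(x * (r + x * beta[j]) for x, r in zip(xj, resid)) / n
            new = soft_threshold(rho, lam * l1_ratio) / (1.0 + lam * (1.0 - l1_ratio))
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - x * delta for x, r in zip(xj, resid)]
                beta[j] = new
    return beta


def selection_frequency(X, y, n_subdata=30, subdata_size=100, seed=0, **enet_kwargs):
    """Fraction of subdata fits in which each variable has a nonzero coefficient."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subdata):
        idx = rng.sample(range(n), subdata_size)   # assumption: simple random subdata
        beta = elastic_net([X[i] for i in idx], [y[i] for i in idx], **enet_kwargs)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_subdata for c in counts]


def make_data(n=400, p=6, seed=1):
    """Synthetic data (hypothetical): only variables 0 and 1 drive the response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 1) for row in X]
    return X, y


X, y = make_data()
freq = selection_frequency(X, y, lam=0.5, l1_ratio=0.5)
selected = [j for j, f in enumerate(freq) if f >= 0.9]   # keep variables chosen in >= 90% of fits
print("selection frequency:", freq)
print("selected variables:", selected)
```

In this sketch the 90% cutoff plays the role of the frequency criterion described in the abstract; changing subdata_size or the sampling step is where the sensitivity the authors mention would show up.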

Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact	0.50 × 0.40 = 0.200
M · momentum	0.50 × 0.15 = 0.075
V · venue signal	0.50 × 0.05 = 0.025
R · text relevance †	0.50 × 0.40 = 0.200

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
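As a quick check of how the Balanced mode breakdown combines into the overall evidence weight, the sketch below computes the weighted sum of the four component scores. The function name evidence_weight and the dictionary layout are hypothetical; only the mode weights (F 0.40, M 0.15, V 0.05, R 0.40) and the component scores of 0.50 come from this page.

```python
# Balanced mode weights as listed on this page.
BALANCED_WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores, weights=BALANCED_WEIGHTS):
    """Weighted sum of component scores; weights are assumed to sum to 1."""
    return sum(weights[k] * scores[k] for k in weights)


# All four components are 0.50 for this paper.
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
for key, w in BALANCED_WEIGHTS.items():
    print(f"{key}: {scores[key]:.2f} x {w:.2f} = {scores[key] * w:.3f}")
print(f"evidence weight = {evidence_weight(scores):.2f}")
```

Because the weights sum to 1 and every component equals 0.50, the overall weight is also 0.50, which matches the value shown under Evidence weight.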