Flexible Data Aggregation for Prediction and Decision Making with Contextual Information: Applications in Retailing

Zhenkang Peng et al.

Manufacturing and Service Operations Management · 2026 · https://doi.org/10.1287/msom.2025.0313 · article

FT50 · UTD24 · AJG 3 · ABDC A*

Weight

0.50

What the paper says

Problem definition: How should online retailers make demand predictions or operational decisions with limited relevant data? Motivated by conventional data aggregation approaches, we develop a flexible data aggregation (FlexDA) framework to adapt to different degrees of heterogeneity across products, thereby striking a better balance between the bias and variance of data-driven predictions and decisions.

Methodology/results: Under the FlexDA framework, we propose aggregating the individual data sets at different levels, potentially based on contextual information, and training submodels with each aggregated data set. A meta-model is trained to integrate the outputs of these submodels with a set of weights. For demand prediction tasks with linear models, we propose a consistent estimate of the optimal weight and theoretically demonstrate the advantage of the FlexDA approach over existing approaches under the small-data large-scale regime. For decision making with nonlinear feature-demand relationships, with a fixed sample size, we show that the optimality gap of the FlexDA approach decays near-linearly in the number of products with high probability. We further validate the FlexDA approach with synthetic data and real data from the Rossmann store. Building on theoretical development and empirical validation, we conducted an internal study with Meituan, focusing on ordering problems in their community group buying business. Our proposed approach achieves an average reduction of more than 10% in both lost sales ratio and inventory ratio for newly launched fresh products, as well as standard products compared with the algorithm implemented by Meituan.

Managerial implications: Simply aggregating data from all products and training a shared model reduces the high variance caused by data scarcity but compromises the ability to capture heterogeneity across products. Our study highlights the value of flexible data aggregation for data-driven prediction and decision making, especially for large-scale applications with limited data, with both theoretical and empirical support.

Funding: C. Li is supported by the National Natural Science Foundation of China [Grants 72472098, 72102142, 72131010, and 72192833/72192830]. Y. Rong is supported by the National Natural Science Foundation of China [Grants 72025201, 72331006, and 72221001].

Supplemental Material: The online appendices are available at https://doi.org/10.1287/msom.2025.0313.
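The aggregate-then-weight idea in the abstract can be illustrated with a deliberately small sketch. This is not the authors' estimator. It assumes only two aggregation levels (per-product data and fully pooled data), uses sample means as the submodels instead of the linear models with contextual features described in the paper, and picks a single weight by grid search on held-out data. The function names `fit_submodels`, `fit_weight`, and `predict`, and the toy demand numbers, are hypothetical.

```python
from statistics import mean


def fit_submodels(train):
    """Fit one submodel per aggregation level.

    train maps product -> list of observed demands.
    Returns (per-product means, pooled mean over all products).
    """
    individual = {p: mean(obs) for p, obs in train.items()}
    pooled = mean(x for obs in train.values() for x in obs)
    return individual, pooled


def fit_weight(individual, pooled, valid, grid_size=101):
    """Toy meta-model: choose w in [0, 1] minimising held-out squared error.

    w is the weight on the per-product submodel; 1 - w goes to the pooled one.
    """
    def loss(w):
        return sum(
            (w * individual[p] + (1 - w) * pooled - y) ** 2
            for p, ys in valid.items()
            for y in ys
        )

    candidates = [i / (grid_size - 1) for i in range(grid_size)]
    return min(candidates, key=loss)


def predict(product, individual, pooled, w):
    """Weighted combination of the two submodel outputs."""
    return w * individual[product] + (1 - w) * pooled


# Made-up data: product C is very different from A and B.
train = {"A": [10, 12], "B": [11, 9], "C": [30, 28]}
valid = {"A": [11], "B": [10], "C": [29]}

individual, pooled = fit_submodels(train)
w = fit_weight(individual, pooled, valid)
forecast_c = predict("C", individual, pooled, w)
```

With strongly heterogeneous products, as in this toy data, the search puts most of the weight on the per-product submodel; with near-identical products and few observations, the pooled submodel would tend to receive more weight. Intermediate aggregation levels (for example, clusters built from contextual information) would enter as additional submodels with their own weights.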

Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact	0.50 × 0.40 = 0.20
M · momentum	0.50 × 0.15 = 0.075
V · venue signal	0.50 × 0.05 = 0.025
R · text relevance †	0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page. For your query's actual relevance score, open this paper from a search result.
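The breakdown above reads as a weighted sum of four component scores. A minimal sketch of that arithmetic follows, assuming the evidence weight is simply the sum of the four products shown in the table (the page does not state this explicitly, but the total matches the displayed 0.50). The names `WEIGHTS` and `evidence_weight` are hypothetical.

```python
# Balanced-mode weights as shown on the page: F 0.40 / M 0.15 / V 0.05 / R 0.40.
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores, weights=WEIGHTS):
    """Weighted sum of component scores (assumed aggregation rule)."""
    return sum(weights[k] * scores[k] for k in weights)


# Every component is 0.50 for this paper, so the total is 0.50 as well.
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
total = evidence_weight(scores)
```

Because the four weights sum to 1, a paper whose components all equal the same value receives that value as its overall weight.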