It’s All in the Mix: Wasserstein Classification and Regression with Mixed Features

Reza Belbasi et al.

Manufacturing and Service Operations Management · 2026 · https://doi.org/10.1287/msom.2023.0738 · article
FT50 · UTD24 · AJG 3 · ABDC A*
Weight
0.50

Abstract

Problem definition: A key challenge in supervised learning is data scarcity, which can cause prediction models to overfit to the training data and perform poorly out of sample. A contemporary approach to combat overfitting is offered by distributionally robust problem formulations that consider all data-generating distributions close to the empirical distribution derived from historical samples, where “closeness” is determined by the Wasserstein distance. Although such formulations show significant promise in prediction tasks where all input features are continuous, they scale exponentially when discrete features are present.

Methodology/results: We demonstrate that distributionally robust mixed-feature classification and regression problems can indeed be solved in polynomial time. Our proof relies on classical ellipsoid method-based solution schemes that do not scale well in practice. To overcome this limitation, we develop a practically efficient (yet, in the worst case, exponential-time) cutting-plane-based algorithm that admits a polynomial-time separation oracle, despite the presence of exponentially many constraints. We compare our method against alternative techniques both theoretically and empirically on standard benchmark instances.

Managerial implications: Data-driven operations management problems often involve prediction models with discrete features. We develop and analyze distributionally robust prediction models that faithfully account for the presence of discrete features, and we demonstrate that our models can significantly outperform existing methods that are agnostic to the presence of discrete features both theoretically and on standard benchmark instances.

Funding: This work was supported by the Engineering and Physical Sciences Research Council (EPSRC) [Grant EP/W003317/1].

Supplemental Material: The online appendix is available at https://doi.org/10.1287/msom.2023.0738.
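
For orientation, the kind of Wasserstein distributionally robust learning problem the abstract describes is commonly written in the generic form below. The notation is an assumption made for illustration and is not taken from the paper: \beta denotes the model parameters, \ell_\beta the loss, \widehat{\mathbb{P}}_N the empirical distribution of the N historical samples, W the Wasserstein distance, and \varepsilon \ge 0 the radius of the ambiguity set.

```latex
\min_{\beta} \; \sup_{\mathbb{Q} \,:\, W(\mathbb{Q},\, \widehat{\mathbb{P}}_N) \le \varepsilon} \; \mathbb{E}_{(x, y) \sim \mathbb{Q}} \big[ \ell_{\beta}(x, y) \big]
```

The inner supremum picks the worst-case distribution within distance \varepsilon of the empirical one. With \varepsilon = 0 the problem reduces to ordinary empirical risk minimization.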

Cite this paper

https://doi.org/10.1287/msom.2023.0738

Or copy a formatted citation

@article{reza2026,
  title        = {{It’s All in the Mix: Wasserstein Classification and Regression with Mixed Features}},
  author       = {Belbasi, Reza and others},
  journal      = {Manufacturing and Service Operations Management},
  year         = {2026},
  doi          = {10.1287/msom.2023.0738},
}

Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.40 = 0.20
M · momentum: 0.50 × 0.15 = 0.075
V · venue signal: 0.50 × 0.05 = 0.025
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
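
The breakdown above can be checked with a few lines of Python. This is a minimal sketch that assumes the overall evidence weight is simply the weighted sum of the four component scores, which the page implies but does not state. The function name `evidence_weight` and the dictionary layout are hypothetical, chosen for illustration.

```python
import math

# Component scores and Balanced-mode weights as listed in the breakdown above.
SCORES = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores, weights):
    """Return the weighted sum of component scores.

    Assumption: the total is a plain weighted sum and the weights sum to 1.
    """
    if not math.isclose(sum(weights.values()), 1.0):
        raise ValueError("weights must sum to 1")
    return sum(scores[key] * weights[key] for key in weights)


total = evidence_weight(SCORES, WEIGHTS)
print(f"{total:.2f}")  # prints 0.50
```

The unrounded contributions are 0.20, 0.075, 0.025, and 0.20, which sum to the displayed total of 0.50.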