Deep Learning for High-Dimensional Continuous-Time Stochastic Optimal Control Without Explicit Solution

Jean-Loup Dupret & Donatien Hainaut

Operations Research · 2026 · article
https://doi.org/10.1287/opre.2024.1102
FT50 · UTD24 · AJG 4* · ABDC A*
Weight
0.50

Abstract

Multiasset Optimal Execution via Deep Learning for High-Dimensional Continuous-Time Stochastic Control. In “Deep Learning for High-Dimensional Continuous-Time Stochastic Optimal Control Without Explicit Solution,” Dupret and Hainaut introduce the generalized policy iteration physics-informed neural network (GPI-PINN), a novel deep learning algorithm for solving high-dimensional continuous-time stochastic optimal control problems even when the optimal control does not admit an explicit solution. The method combines physics-informed neural networks with an actor-critic structure based on generalized policy iteration, using separate networks to approximate the value function and the multidimensional optimal control. This approach provides a global approximation of the solution across time and space, enabling fast online evaluation. Theoretical guarantees on convergence and optimality are provided, and the method's accuracy and efficacy are empirically validated on two important numerical examples from operations research. The authors thereby generalize the Almgren–Chriss framework arising from optimal execution in finance by allowing both temporary and permanent price impacts to be fully nonlinear and by considering a multidimensional setting with multiple cointegrated assets.
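To make the generalized policy iteration backbone of the method concrete, here is a minimal toy sketch on a scalar discounted linear-quadratic problem, where both the policy-evaluation (critic) and policy-improvement (actor) steps have closed forms. This is an illustrative analogue under simplifying assumptions, not the authors' GPI-PINN algorithm; all parameter values (`a`, `b`, `r`, `gamma`) are made up for the example.

```python
# Toy generalized policy iteration on a scalar discounted LQ problem:
#   dynamics x' = a*x + b*u, stage cost x^2 + r*u^2, discount factor gamma.
# The value of a linear policy u = -k*x is quadratic, V(x) = p*x^2, so
# evaluation and improvement reduce to scalar formulas.
a, b, r, gamma = 1.0, 1.0, 1.0, 0.9

def evaluate_policy(k):
    # Critic step: solve p = (1 + r*k^2) + gamma*p*(a - b*k)^2 for p.
    return (1.0 + r * k * k) / (1.0 - gamma * (a - b * k) ** 2)

def improve_policy(p):
    # Actor step: minimize the one-step lookahead cost given V(x) = p*x^2.
    return gamma * a * b * p / (r + gamma * b * b * p)

k = 0.0  # start from the zero policy (stable here since gamma * a^2 < 1)
for _ in range(50):
    p = evaluate_policy(k)
    k = improve_policy(p)

# At the fixed point, p solves the discounted algebraic Riccati equation.
residual = abs(p - (1.0 + gamma * a * a * p
                    - (gamma * a * b * p) ** 2 / (r + gamma * b * b * p)))
print(round(k, 4), residual < 1e-8)
```

In the paper's setting the two closed-form steps above are replaced by two neural networks, with the critic trained against a physics-informed (HJB) residual, which is what allows the scheme to scale to high dimensions.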


Cite this paper

https://doi.org/10.1287/opre.2024.1102

Or copy a formatted citation

@article{dupret2026,
  title        = {{Deep Learning for High-Dimensional Continuous-Time Stochastic Optimal Control Without Explicit Solution}},
  author       = {Dupret, Jean-Loup and Hainaut, Donatien},
  journal      = {Operations Research},
  year         = {2026},
  doi          = {10.1287/opre.2024.1102},
}

Paste directly into BibTeX, Zotero, or your reference manager.



Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.40 = 0.20
M · momentum: 0.50 × 0.15 = 0.07
V · venue signal: 0.50 × 0.05 = 0.03
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
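The breakdown above suggests the evidence weight is a weighted sum of the four component scores under the balanced-mode weights. The following sketch reproduces that arithmetic; the scoring function and its inputs are inferred from the displayed breakdown, not taken from any published Arbiter specification.

```python
# Balanced-mode weights as shown on the page (they sum to 1.0).
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}

def evidence_weight(scores, weights=WEIGHTS):
    # Weighted sum of component scores: F (citation impact),
    # M (momentum), V (venue signal), R (text relevance).
    return sum(scores[key] * weights[key] for key in weights)

# Every component sits at 0.50 for this paper, so the total is
# 0.50 * (0.40 + 0.15 + 0.05 + 0.40) = 0.50.
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
print(round(evidence_weight(scores), 2))
```

Note that the per-component products shown on the page (0.20, 0.07, 0.03, 0.20) are rounded for display; the exact contributions sum to 0.50.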