Dynamic Pricing Strategy Optimization Based on a Reinforcement Learning PPO Algorithm

Zheng Zhang

Journal of Organizational and End User Computing · 2026 · https://doi.org/10.4018/joeuc.406688 · article
AJG 1 · ABDC B
Weight
0.50

Abstract

Dynamic pricing has become a cornerstone of ride-hailing platforms, yet designing strategies that simultaneously maximize revenue, ensure fairness, and maintain operational efficiency remains a formidable challenge. Traditional reinforcement learning approaches often optimize a single dimension—such as profitability or fairness—at the expense of others, limiting their applicability in real-world markets. To address this gap, the authors propose PRIME-PPO (Pricing with Repositioning Integration, Mechanism-awareness, and Equity via Proximal Policy Optimization), a unified reinforcement learning framework tailored for dynamic pricing. PRIME-PPO extends the PPO backbone with five innovations: a primal–dual mechanism for fairness and budget enforcement, mechanism-aware action masking to preserve incentive compatibility, auxiliary signals from dispatch and repositioning tasks to capture system-level dynamics, hierarchical grouping with parameter sharing for scalability, and dual-critic value distillation from TD3 and DQN for improved sample efficiency.
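The abstract names the building blocks but gives no equations, so the following is only a minimal sketch of three of them in their generic textbook form: action masking over discrete price levels, the standard PPO clipped surrogate term, and a projected dual-ascent step for a constraint multiplier. All function names, the discrete price grid, the clip range of 0.2, and the dual learning rate are assumptions made for illustration; none of them come from the paper, and this should not be read as the PRIME-PPO implementation.

```python
import math


def masked_softmax(logits, mask):
    """Softmax over price actions; disallowed actions (mask False) get probability 0."""
    if not any(mask):
        raise ValueError("at least one action must be allowed")
    # Subtract the largest allowed logit for numerical stability.
    peak = max(l for l, ok in zip(logits, mask) if ok)
    exps = [math.exp(l - peak) if ok else 0.0 for l, ok in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def ppo_clip_term(ratio, advantage, eps=0.2):
    """Standard PPO clipped surrogate for one sample (eps=0.2 is an assumed default)."""
    clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
    return min(ratio * advantage, clipped * advantage)


def penalized_advantage(reward_adv, cost_adv, lmbda):
    """Lagrangian advantage: reward advantage minus multiplier times cost advantage."""
    return reward_adv - lmbda * cost_adv


def dual_update(lmbda, constraint_cost, limit, lr=0.05):
    """Projected gradient ascent on the multiplier; it grows only while the limit is violated."""
    return max(0.0, lmbda + lr * (constraint_cost - limit))


# Hypothetical usage: three price levels, the middle one ruled out by the mask.
probs = masked_softmax([1.0, 2.0, 0.5], [True, False, True])
lmbda = dual_update(0.0, constraint_cost=0.3, limit=0.1)
objective = ppo_clip_term(1.5, penalized_advantage(1.0, 0.5, lmbda))
```

In this sketch the mask is applied before normalization, so a masked price level can never be sampled, and the multiplier is clamped at zero so a satisfied constraint never turns into a bonus.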

Cite this paper

https://doi.org/10.4018/joeuc.406688

Or copy a formatted citation

@article{zheng2026,
  title        = {{Dynamic Pricing Strategy Optimization Based on a Reinforcement Learning PPO Algorithm}},
  author       = {Zheng Zhang},
  journal      = {Journal of Organizational and End User Computing},
  year         = {2026},
  doi          = {10.4018/joeuc.406688},
}

Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.40 = 0.20
M · momentum: 0.50 × 0.15 = 0.075
V · venue signal: 0.50 × 0.05 = 0.025
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
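The breakdown above is consistent with a plain weighted sum of the four component scores. The sketch below reproduces that arithmetic under this assumption; the names `BALANCED_WEIGHTS` and `evidence_weight` are hypothetical and are not taken from the site.

```python
# Balanced-mode weights as listed on the page (F / M / V / R).
BALANCED_WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores, weights=BALANCED_WEIGHTS):
    """Weighted sum of component scores; assumes the weights sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[key] * scores[key] for key in weights)


# Every component is 0.50 on this page, so the total is also 0.50.
total = evidence_weight({"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50})
```

Because the weights sum to 1, a paper whose components are all equal receives that same value as its overall weight, which matches the 0.50 shown here.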