Multi-agent reinforcement learning with a hybrid sequential reward feedback strategy for dynamic multi-modal traffic assignment

Dongyue Cun et al.

Transportation Research Part C: Emerging Technologies · 2026 · https://doi.org/10.1016/j.trc.2026.105545 · article

ABDC A*

Weight

0.50

What the paper says

• Propose a multi-agent reinforcement learning (MARL) framework for multi-modal traffic assignment (MMTA).
• Avoid predefining choice sets of travel modes/routes when capturing combined multi-modal paths.
• Use multi-edges to represent multi-modal links for cars and buses, exclusive bus lanes, and railways.
• Develop a sequential reward feedback strategy to accelerate the convergence of the MARL-based MMTA algorithm.
• Use a hybrid reward function to reduce policy fluctuations and adjust the balance between system optimum and user equilibrium.

Urbanization and the expansion of transportation modes have made it harder to understand travelers’ decision-making processes regarding route choice across various transportation modes. This paper proposes a novel macroscopic hybrid sequential game method using multi-agent reinforcement learning (MARL) to address issues of computational efficiency and behavioral complexity in multi-modal transportation network simulations. Specifically, agents’ perception behaviors are modeled as a sequential decision-making process that considers road capacity constraints, which helps estimate travel time under congestion effects in the multi-modal traffic assignment. In addition, a hybrid reward framework is proposed that provides a system-level reward to guide the multi-agent system towards different Nash equilibria, thereby reducing policy fluctuations. To simulate interactions between agents of different transportation modes, a multi-edge representation and reward structures designed for car, bus, priority bus, and metro modes are adopted to handle the mixed traffic flow on the same road. Furthermore, the approach uses a mean-field multi-agent deep Q-learning method to consider both mode and route choice, simplifying agent interactions through mean-field theory and clustering agents with the same origin–destination (OD) demands.
Experimental results demonstrate that the hybrid sequential feedback strategy outperforms the simultaneous feedback strategy regarding convergence speed, agent reward distribution, and network flow distribution. Furthermore, the proposed method is tested on the Sioux-Falls network to verify its computational efficiency in three network change scenarios (disruption, road reconstruction, and new road construction). These findings highlight the potential of the proposed MARL method for large-scale multi-modal transportation network analysis, particularly under various incident scenarios, providing an effective tool for urban transportation planning and project evaluation.
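The abstract does not give the reward formulas, so the following is only a minimal sketch of how a multi-edge link set and a hybrid reward could fit together. Everything in it is an assumption made for illustration: the `LINKS` values, the BPR volume-delay function used for congestion, the rule that only car and bus flows share road capacity, and the `system_weight` blending parameter. None of these is taken from the paper.

```python
from collections import defaultdict

# Hypothetical multi-edge network: parallel links between the same node pair,
# one per mode. Values are illustrative, not from the paper.
# (tail, head, mode) -> (free-flow time in minutes, capacity in vehicles/hour)
LINKS = {
    ("A", "B", "car"): (10.0, 1000.0),
    ("A", "B", "bus"): (14.0, 1000.0),          # mixed traffic with cars
    ("A", "B", "priority_bus"): (12.0, 200.0),  # exclusive bus lane
    ("A", "B", "metro"): (8.0, 5000.0),         # railway
}
SHARED_ROAD = {"car", "bus"}  # assumed: these modes compete for road capacity


def bpr_time(free_flow, flow, capacity, alpha=0.15, beta=4.0):
    """BPR volume-delay function (an assumed congestion model)."""
    return free_flow * (1.0 + alpha * (flow / capacity) ** beta)


def link_times(flows):
    """Travel time on every multi-edge, given flows keyed like LINKS."""
    road_flow = defaultdict(float)
    for (tail, head, mode), flow in flows.items():
        if mode in SHARED_ROAD:
            road_flow[(tail, head)] += flow
    times = {}
    for (tail, head, mode), (free_flow, capacity) in LINKS.items():
        key = (tail, head, mode)
        # Shared-road modes see the combined load; others see only their own.
        load = road_flow[(tail, head)] if mode in SHARED_ROAD else flows.get(key, 0.0)
        times[key] = bpr_time(free_flow, load, capacity)
    return times


def hybrid_rewards(flows, system_weight):
    """Blend individual and system-level reward (total flow must be positive).

    system_weight = 0 rewards each agent by its own travel time only;
    system_weight = 1 rewards every agent by the flow-weighted mean travel time.
    """
    times = link_times(flows)
    total_flow = sum(flows.values())
    system_reward = -sum(times[k] * f for k, f in flows.items()) / total_flow
    return {
        k: (1.0 - system_weight) * -times[k] + system_weight * system_reward
        for k in flows
    }


flows = {("A", "B", "car"): 800.0, ("A", "B", "bus"): 200.0}
selfish = hybrid_rewards(flows, 0.0)
cooperative = hybrid_rewards(flows, 1.0)
```

Sweeping `system_weight` between 0 and 1 is one plausible way to move the learned policy between user-equilibrium-like and system-optimum-like behavior; the paper's own mechanism may differ.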


Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact	0.50 × 0.40 = 0.200
M · momentum	0.50 × 0.15 = 0.075
V · venue signal	0.50 × 0.05 = 0.025
R · text relevance †	0.50 × 0.40 = 0.200

† Text relevance is estimated at 0.50 on the detail page. For your query’s actual relevance score, open this paper from a search result.
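The breakdown above is consistent with a weighted sum of four component scores. The sketch below reproduces that arithmetic, assuming the evidence weight is exactly that sum. The function name `evidence_weight` and the dictionary layout are hypothetical, not part of the tool.

```python
# Balanced-mode weights as listed above; they should sum to 1.
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores, weights=WEIGHTS):
    """Weighted sum of component scores (assumed aggregation rule)."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[name] * scores[name] for name in weights)


# Every component is 0.50 on this page, so the total is also 0.50.
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
contributions = {name: WEIGHTS[name] * scores[name] for name in WEIGHTS}
total = evidence_weight(scores)
```

Because the weights sum to 1, equal component scores always give back that same score as the total, which is why every figure on this page is 0.50.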