Multi-Attribute Utility Deep Reinforcement Learning method for Sequential Multi-Criteria Decision problems: Application to human resource planning

Mohammadreza Nematollahi et al.

Computers and Operations Research · 2026 · https://doi.org/10.1016/j.cor.2026.107426 · article

AJG 3 · ABDC A

Weight

0.50

What the paper says

Problem-solving and decision-making can be complex. There are often conflicting criteria, and decisions must take into account both immediate and long-term impacts, which define Sequential Multi-Criteria Decision (SMCD). Deep Reinforcement Learning (DRL) has emerged by integrating traditional Reinforcement Learning with Deep Learning to tackle intricate sequential decision-making problems. Although DRL has seen significant progress recently, there has been limited focus on developing DRL algorithms specifically for SMCD problems, which usually involve conflicting and non-commensurable attributes. To bridge this gap, we introduce a novel algorithm called Multi-Attribute Utility DRL (MAUDRL), which combines DRL with Multi-Criteria Decision Analysis (MCDA). This innovative approach provides a clear and transparent DRL model that can address the intricacies of SMCD problems while integrating the risk attitudes and preferences of the decision-maker. We showcase the potential of MAUDRL in promoting sustainable decision-making for human resource planning for blueberry farming in British Columbia, Canada. We evaluate the performance of MAUDRL in comparison with two benchmark algorithms, Oracle Discrete Multi-Attribute Utility Theory (MAUT) and the Single Reward Aggregation Approach, using three metrics: policy quality, goal achievement, and run times. The numerical analysis and benchmarks validate that MAUDRL offers practical solutions for SMCD problems by assisting in exploring diverse solution spaces efficiently. The theoretical implications and practical applications of these results are discussed, underscoring the capability of MAUDRL in tackling complex SMCD problem domains and advancing sustainable and socially responsible decision-making while considering the risk preferences of decision-makers.

• Presents a MAUDRL method for sequential multi-criteria decision problems.
• Integrates deep reinforcement learning with multi-attribute utility theory.
• Considers decision-makers’ risk preferences and multiple conflicting criteria.
• Reduces runtime by training separate DQNs and aggregating learned utilities.
• Demonstrates MAUDRL in sustainable human resource planning for agriculture.
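The highlights mention training separate DQNs and aggregating learned utilities. As a rough illustration only, and not the authors' implementation, the sketch below assumes an additive multi-attribute utility built from exponential single-attribute utility functions. The attribute names, weights, risk coefficients, and per-action values are all hypothetical, and plain lists stand in for the outputs of per-attribute DQNs.

```python
import math


def exp_utility(x, risk_aversion):
    """Exponential utility for x in [0, 1], normalized so u(0) = 0 and u(1) = 1.

    Assumption for illustration: risk_aversion > 0 gives a concave (risk-averse)
    curve, < 0 a convex (risk-seeking) one, and 0 a risk-neutral straight line.
    """
    if abs(risk_aversion) < 1e-12:
        return x
    return (1.0 - math.exp(-risk_aversion * x)) / (1.0 - math.exp(-risk_aversion))


def aggregate_utilities(values, weights, risk_aversion):
    """Combine per-attribute action values into one utility per action.

    values: dict mapping attribute name -> list of per-action values in [0, 1]
            (hypothetical stand-ins for what separate DQNs might output).
    weights: dict mapping attribute name -> importance weight (assumed to sum to 1).
    risk_aversion: dict mapping attribute name -> risk coefficient.
    """
    n_actions = len(next(iter(values.values())))
    return [
        sum(
            weights[name] * exp_utility(values[name][a], risk_aversion[name])
            for name in values
        )
        for a in range(n_actions)
    ]


def greedy_action(utilities):
    """Index of the action with the highest aggregated utility."""
    return max(range(len(utilities)), key=utilities.__getitem__)


# Hypothetical two-attribute example with three candidate actions.
values = {"profit": [0.9, 0.5, 0.2], "worker_welfare": [0.1, 0.6, 0.9]}
weights = {"profit": 0.6, "worker_welfare": 0.4}
risk_neutral = {"profit": 0.0, "worker_welfare": 0.0}

utilities = aggregate_utilities(values, weights, risk_neutral)
best = greedy_action(utilities)
```

Changing the risk coefficients reshapes each attribute's utility curve before weighting, which is one way a decision-maker's risk attitude could influence the chosen action in a scheme like this.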


Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact	0.50 × 0.40 = 0.20
M · momentum	0.50 × 0.15 = 0.075
V · venue signal	0.50 × 0.05 = 0.025
R · text relevance †	0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
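
The breakdown above reads as a weighted sum of four component scores. A minimal sketch of that arithmetic follows, assuming the Balanced-mode coefficients and the 0.50 component scores shown on this page; the function and variable names are hypothetical and not taken from the site.

```python
# Balanced-mode coefficients as listed on this page.
BALANCED = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}

# Component scores as listed on this page (all 0.50 here).
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}


def contributions(scores, coefficients):
    """Per-component contribution: component score times its coefficient."""
    return {key: scores[key] * coefficients[key] for key in coefficients}


def evidence_weight(scores, coefficients):
    """Total evidence weight as the sum of all contributions."""
    return sum(contributions(scores, coefficients).values())


parts = contributions(scores, BALANCED)
total = evidence_weight(scores, BALANCED)
```

Because the coefficients sum to 1 and every component score is 0.50, the total equals 0.50, matching the evidence weight shown for this paper.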