Harnessing Human Uncertainty to Train More Accurate and Aligned AI Systems
Gunnar P. Epping et al.
Abstract
Artificial intelligence (AI)-augmented decision making (AIADM) aims to leverage the computational power of machine learning (ML) models to assist humans in their decision-making processes. In many such systems, especially for complex tasks like medical image classification, ML models are often trained on large data sets annotated by humans. Neglecting to account for human decision-making biases when constructing these labeled data sets can lead to biased data sets, and subsequently models trained on such data sets can inherit the biases. We propose a novel approach to developing AIADM systems that aims to overcome these challenges by harnessing human uncertainty. Our approach has three elements: We collect subjective judgments from human annotators, we calibrate those subjective judgments, and we use the recalibrated subjective judgments to create probabilistic (i.e., soft) labels, which the AI decision aid is then trained on. We evaluate our methods through two studies using data from DiagnosUs, a crowdsourcing platform for medical image annotation. Across multiple training data sets, we assess how our proposed methods impact three key properties of AI decision aids that could benefit from leveraging human uncertainty in data annotation: accuracy, calibration, and alignment with human uncertainty. Our results show that ML models trained on recalibrated soft labels are more accurate and better aligned with expert judgments. We also observe a tradeoff between ML calibration and alignment with human uncertainty. These findings highlight the value of capturing and correcting human uncertainty in ML training data when developing AI systems.
History: This paper has been accepted for the Decision Analysis Special Issue on the Implications of Advances in Artificial Intelligence for Decision Analysis.
Funding: This work was supported by the Alfred P. Sloan Foundation (Cognitive Economics at Work).
Supplemental Material: The online appendix is available at https://doi.org/10.1287/deca.2025.0395 .
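The three elements named in the abstract (collect subjective judgments, recalibrate them, build soft labels for training) can be illustrated with a minimal sketch. The abstract does not specify the calibration function or the aggregation rule, so everything below is an assumption made for illustration: the linear-in-log-odds transform, its hypothetical parameters `gamma` and `delta`, the simple mean across annotators, and the function names are not taken from the paper.

```python
import numpy as np

def recalibrate(p, gamma=1.5, delta=0.0):
    """Assumed linear-in-log-odds recalibration (not the paper's stated method).

    gamma > 1 pushes judgments away from 0.5; delta shifts them up or down.
    """
    p = np.clip(np.asarray(p, dtype=float), 1e-6, 1 - 1e-6)
    log_odds = np.log(p / (1 - p))
    return 1.0 / (1.0 + np.exp(-(gamma * log_odds + delta)))

def soft_label(judgments, gamma=1.5, delta=0.0):
    """Average the recalibrated judgments for one image into P(positive)."""
    return float(np.mean(recalibrate(judgments, gamma, delta)))

def soft_cross_entropy(pred, target):
    """Binary cross-entropy against probabilistic (soft) targets."""
    pred = np.clip(np.asarray(pred, dtype=float), 1e-6, 1 - 1e-6)
    target = np.asarray(target, dtype=float)
    return float(-np.mean(target * np.log(pred)
                          + (1 - target) * np.log(1 - pred)))

# Hypothetical subjective probabilities from three annotators for one image.
label = soft_label([0.6, 0.7, 0.9])
# A model prediction is then scored against the soft label, not a hard 0/1.
loss = soft_cross_entropy([0.75], [label])
```

In practice the recalibration parameters would be fit to data with known ground truth rather than fixed by hand; the fixed defaults here only keep the sketch short.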