When Algorithms Favor the Underrepresented – Race and Gender Biases in LLM Résumé Evaluations
S. Kim et al.
What the paper says
Abstract: The literature on name-based biases in hiring suggests pervasive discrimination, as White-sounding names receive more callbacks than Black-sounding ones. This study assesses whether AI systems, through open Large Language Models (LLMs), exhibit similar biases. The LLMs evaluated résumés on attributes such as competence and warmth, aggregating these dimensions into composite scores for each résumé. The different names attached to a résumé led to changes in evaluation, despite identical content. Statistically significant race and gender biases were found in most models for warmth and competence ratings. Unlike typical settings, Black applicants and female names were rated slightly higher through the LLMs’ evaluations. These findings highlight the importance of examining AI tools used in hiring as they may unintentionally reflect societal biases.
Evidence weight
Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40
| F · citation impact | 0.50 × 0.40 = 0.200 |
| M · momentum | 0.50 × 0.15 = 0.075 |
| V · venue signal | 0.50 × 0.05 = 0.025 |
| R · text relevance † | 0.50 × 0.40 = 0.200 |
† Text relevance is estimated at 0.50 on the detail page. For your query’s actual relevance score, open this paper from a search result.
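The breakdown above reads as a weighted sum: each signal score is multiplied by its Balanced-mode weight. The sketch below reproduces that arithmetic. The page does not state how the rows are combined into a final figure, so adding them up is an assumption, and the names `WEIGHTS`, `evidence_weight`, and `signals` are illustrative rather than taken from the tool.

```python
import math

# Balanced-mode weights as listed on the page (F / M / V / R).
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(signals, weights=WEIGHTS):
    """Return the weighted sum of per-signal scores.

    Assumption: the overall evidence weight is the sum of the
    per-row contributions; the page shows only the rows themselves.
    """
    if set(signals) != set(weights):
        raise ValueError("signals and weights must cover the same keys")
    return sum(weights[key] * signals[key] for key in weights)


# Signal scores shown on the detail page; R is the page's 0.50 estimate,
# not a query-specific relevance score.
signals = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}

# Per-row contributions, matching the table rows.
contributions = {key: WEIGHTS[key] * signals[key] for key in WEIGHTS}
total = evidence_weight(signals)

for key, value in contributions.items():
    print(f"{key}: {value:.3f}")
print(f"total: {total:.3f}")
```

Because the four weights sum to 1.00, the total stays on the same 0 to 1 scale as the individual signals. With every signal at 0.50, the total is also 0.50, and it would change only when a query-specific relevance score replaces the estimate for R.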