Unpacking the Validity of Open-Ended Personality Assessments Using Fine-Tuned Large Language Models

Andrew B. Speer et al.

Organizational Research Methods · 2026 · Article
https://doi.org/10.1177/10944281251413746
AJG 4 · ABDC A*

Weight: 0.50

Abstract

Alternative approaches to personality measurement, such as open-ended narrative-based assessments, have potential advantages for organizational research and practice. In this research, we investigate factors that affect the valid application of natural language processing (NLP) for scoring open-ended personality assessments and when, how, and why such assessments capture personality-related variance. Using a large sample of responses to open-ended assessments, convergence between NLP scores and self-report target scores increased as the degree of customization and the sophistication of the underlying model increased, with the worst psychometric performance occurring for zero-shot large language model (LLM) scores and the best for fine-tuned LLM scores. However, all scoring methods exhibited evidence of validity. Additionally, when models were trained to predict direct evaluations of the narrative responses, correlations with target scores were large (M = .83). NLP scores also exhibited discriminant and criterion-related validity evidence. However, validity was contingent upon the methodological rigor employed in developing writing prompts. Prompts designed to elicit trait-relevant information outperformed generic prompts, and this occurred because trait-specific prompts increased the amount of trait-relevant information (i.e., narrative units), which was associated with enhanced convergence with target scores.


Cite this paper

https://doi.org/10.1177/10944281251413746

Or copy a formatted citation

@article{speer2026unpacking,
  title   = {{Unpacking the Validity of Open-Ended Personality Assessments Using Fine-Tuned Large Language Models}},
  author  = {Speer, Andrew B. and others},
  journal = {Organizational Research Methods},
  year    = {2026},
  doi     = {10.1177/10944281251413746},
}

Paste directly into BibTeX, Zotero, or your reference manager.

Evidence weight: 0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact:    0.50 × 0.40 = 0.200
M · momentum:           0.50 × 0.15 = 0.075
V · venue signal:       0.50 × 0.05 = 0.025
R · text relevance †:   0.50 × 0.40 = 0.200

These four contributions sum to the 0.50 evidence weight shown above.

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
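To make the arithmetic above concrete, here is a minimal Python sketch of how a Balanced-mode evidence weight like this one can be computed: each component score is multiplied by its mode weight, and the contributions are summed. The function name and data layout are illustrative assumptions, not Arbiter's actual implementation; only the mode weights (F 0.40, M 0.15, V 0.05, R 0.40) and this paper's component scores (all 0.50) come from the page itself.

# Minimal sketch: weighted-sum evidence score under Balanced-mode weights.
# Names and structure are illustrative assumptions, not Arbiter's code;
# the weights and scores below are the values shown on this page.

BALANCED_WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}

def evidence_weight(scores, weights=BALANCED_WEIGHTS):
    """Multiply each component score by its mode weight and sum the
    contributions; the weights are assumed to sum to 1.0."""
    return sum(weights[k] * scores[k] for k in weights)

# This paper's detail page scores every component at 0.50:
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
print(round(evidence_weight(scores), 3))  # 0.5, matching the headline weight

The per-row products in the breakdown (0.200, 0.075, 0.025, 0.200) are simply the individual terms of this sum before they are added together.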