Text as data for evaluation: Natural language processing and large language models to generate novel insights from unstructured text data

Thomas Wencker et al.

Evaluation · 2025 · https://doi.org/10.1177/13563890251330911 · article

AJG 2 · ABDC B

Weight: 0.41

Abstract

Policy formulation and implementation generate large volumes of text. Since reading all relevant sources is often impossible, evaluators must navigate the complexity of selecting appropriate technologies to efficiently extract meaningful information from growing amounts of unstructured text. Text mining blends interpretative and statistical methods to generate novel insights, potentially contributing to evidence-based policy-making. At the same time, biases and potential shortfalls in accuracy, explainability, and transparency raise ethical concerns and make it necessary to combine natural language processing with human judgment to avoid over-reliance on the capabilities of these methods, in particular large language models. This article provides practical guidance on how evaluators can use natural language processing to convert unstructured text into structured data. It presents a decision framework that accounts for the characteristics of the data, the nature of the task, and the expected results, facilitating the selection of the appropriate technique.
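The abstract's core move is converting unstructured text into structured data. As a minimal illustration of that general idea (not the authors' framework; the field names, toy gazetteer, and keyword list below are all hypothetical assumptions), free-text evaluation notes can be mapped onto rows with a fixed schema:

```python
import re

def extract_record(note: str) -> dict:
    """Pull a year, a country, and a crude sentiment label from one free-text note.

    Illustrative sketch only: real pipelines would use trained NER models,
    classifiers, or LLM prompts rather than regexes and keyword lists.
    """
    year = re.search(r"\b(19|20)\d{2}\b", note)
    country = re.search(r"\b(Ghana|Kenya|Peru)\b", note)  # toy gazetteer
    positive = any(w in note.lower() for w in ("effective", "improved", "success"))
    return {
        "year": int(year.group()) if year else None,
        "country": country.group() if country else None,
        "sentiment": "positive" if positive else "neutral/negative",
    }

notes = [
    "The 2019 irrigation programme in Kenya proved highly effective.",
    "Implementation in Peru stalled during 2021 due to funding gaps.",
]
records = [extract_record(n) for n in notes]
```

The output is a list of uniform dictionaries that can be loaded into a table for statistical analysis, which is the structured endpoint the abstract describes.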

2 citations


Cite this paper

https://doi.org/10.1177/13563890251330911

Or copy a formatted citation

@article{wencker2025,
  title   = {{Text as data for evaluation: Natural language processing and large language models to generate novel insights from unstructured text data}},
  author  = {Wencker, Thomas and others},
  journal = {Evaluation},
  year    = {2025},
  doi     = {10.1177/13563890251330911},
}




Evidence weight

0.41

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.25 × 0.40 = 0.10
M · momentum: 0.55 × 0.15 = 0.08
V · venue signal: 0.50 × 0.05 = 0.03
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.