Goodbye human annotators? Content analysis of social policy debates using ChatGPT

Erwin Gielens et al.

Journal of Social Policy · 2025 · article
https://doi.org/10.1017/s0047279424000382
AJG 3 · ABDC A
Weight
0.53

Abstract

Content analysis is a valuable tool for analysing policy discourse, but annotation by humans is costly and time-consuming. ChatGPT is a potentially valuable tool to partially automate content analysis for policy debates, largely replacing human annotators. We evaluate ChatGPT’s ability to classify documents using pre-defined argument descriptions, comparing its performance with human annotators for two policy debates: the Universal Basic Income debate on Dutch Twitter (2014–2016) and the pension reforms debate in German newspapers (1993–2001). We use the API (GPT-4 Turbo) and user interface version (GPT-4) and evaluate multiple performance metrics (accuracy, precision and recall). ChatGPT is highly reliable and accurate in classifying pre-defined arguments across datasets. However, precision and recall are much lower, and vary strongly between arguments. These results hold for both datasets, despite differences in language and media type. Moreover, the cut-off method proposed in this paper may aid researchers in navigating the trade-off between detection and noise. Overall, we do not (yet) recommend a blind application of ChatGPT to classify arguments in policy debates. Those interested in adopting this tool should manually validate bot classifications before using them in further analyses. At least for now, human annotators are here to stay.

8 citations


Cite this paper

https://doi.org/10.1017/s0047279424000382

Or copy a formatted citation

@article{gielens2025,
  title        = {{Goodbye human annotators? Content analysis of social policy debates using ChatGPT}},
  author       = {Erwin Gielens et al.},
  journal      = {Journal of Social Policy},
  year         = {2025},
  doi          = {10.1017/s0047279424000382},
}

Paste directly into BibTeX, Zotero, or your reference manager.



Evidence weight

0.53

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.40 = 0.20
M · momentum: 0.70 × 0.15 = 0.10
V · venue signal: 0.50 × 0.05 = 0.03
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
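The breakdown above is a plain weighted sum of the four component scores under the balanced-mode weights. A minimal sketch, assuming the displayed per-component contributions are rounded from the exact products (the formula itself is an assumption based on the numbers shown, not documented behaviour):

```python
# Evidence-weight sketch: dot product of component scores and balanced-mode
# weights, rounded to two decimals. Scores/weights are copied from the
# breakdown above; the aggregation rule is an assumption.
scores = {"F": 0.50, "M": 0.70, "V": 0.50, "R": 0.50}   # impact, momentum, venue, relevance
weights = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}  # balanced mode, sums to 1.0

weight = round(sum(scores[k] * weights[k] for k in scores), 2)
print(weight)  # → 0.53
```

Note that the per-line contributions shown on the page (0.20, 0.10, 0.03, 0.20) are individually rounded and so need not sum exactly to the headline weight; the exact products (0.20 + 0.105 + 0.025 + 0.20) do give 0.53.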