The challenges of developing AI text detectors, and their moral implications
Rahman Sharifzadeh
Abstract
Purpose The advancement of more efficient Large Language Models (LLMs) has significantly impacted academic writing and publishing. These advancements have also raised ethical and professional concerns about the misuse of LLMs, such as machine ghostwriting. As a result, institutions, universities, publishers and others are increasingly seeking to distinguish human-written texts from machine-written ones using various artificial intelligence (AI) text detectors, often developed by the same companies that create LLMs. However, numerous studies have demonstrated that these technologies are not reliably effective in practice. This raises the question: can more efficient detectors be developed to reliably differentiate between human-written and machine-written texts? Analysis of the mechanisms behind LLMs and detectors reveals several challenges. Because both LLMs and detectors rely on machine learning and natural language processing, this study argues that this shared foundation cannot produce two efficient technologies with opposing functions. This result highlights the limited role of detectors in ethically significant decision-making within academic contexts, especially looking ahead. The purpose of this article is therefore to show why AI detectors are ineffective in detecting machine ghostwriting, and why this ineffectiveness is not a temporary issue. It also underscores the necessity of exploring alternative solutions to the problem of machine ghostwriting.
Design/methodology/approach This paper is a work of philosophical research. It reviews the literature on AI text detectors, then engages in argumentation and analysis. Finally, it discusses some moral considerations concerning the detectors.
Findings This study draws two conclusions: first, that detector tools are deemed unreliable and inefficient by most researchers who have empirically tested them; second, that this inefficiency is not a temporary issue that can be resolved by developing more efficient AI detectors.
Originality/value Most researchers view the unreliability of text detectors as a technical issue: better detectors need to be developed, using improved detection models, to more accurately distinguish human-written texts from machine-written ones. This study argues that this approach to developing more efficient detectors faces fundamental challenges.