Examination of ChatGPT’s Performance as a Data Analysis Tool

Duygu Koçak

Educational and Psychological Measurement · 2025 · article
https://doi.org/10.1177/00131644241302721
ABDC A
Weight: 0.52

Abstract

This study examines the performance of ChatGPT, the AI-based conversational tool developed by OpenAI, as a data analysis tool through exploratory factor analysis (EFA). To this end, simulated data were generated under various conditions: distribution shape, number of response categories, sample size, test length, factor loadings, and measurement model. The generated data were analyzed twice with ChatGPT-4o, 1 week apart, using the same prompt, and the results were compared with those obtained from R code. For each analysis, the Kaiser-Meyer-Olkin (KMO) value, the total variance explained, the number of factors estimated by the empirical Kaiser criterion, the Hull method, and the Kaiser-Guttman criterion, and the factor loadings were calculated. The findings obtained from ChatGPT at the two time points were consistent with those obtained using R. Overall, ChatGPT performed well on steps that require only computation, without researcher judgment or theoretical evaluation (such as the KMO value, total variance explained, and factor loadings). For multidimensional structures, however, although the estimated number of factors was consistent across analyses, biases were observed, suggesting that researchers should exercise caution in such decisions.

7 citations


Cite this paper

https://doi.org/10.1177/00131644241302721

Or copy a formatted citation

@article{kocak2025,
  title        = {{Examination of ChatGPT’s Performance as a Data Analysis Tool}},
  author       = {Koçak, Duygu},
  journal      = {Educational and Psychological Measurement},
  year         = {2025},
  doi          = {10.1177/00131644241302721},
}

Paste directly into BibTeX, Zotero, or your reference manager.



Evidence weight

0.52

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact    0.47 × 0.40 = 0.19
M · momentum           0.68 × 0.15 = 0.10
V · venue signal       0.50 × 0.05 = 0.03
R · text relevance †   0.50 × 0.40 = 0.20
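For illustration, the evidence weight above is a weighted sum of the four signal scores under the Balanced-mode mix. This is a minimal sketch: the scores and weights are taken from this page, but the aggregation function itself is an assumption about how Arbiter combines them.

```python
# Per-signal scores shown on this page.
scores = {"F": 0.47, "M": 0.68, "V": 0.50, "R": 0.50}
# Balanced-mode weights (F 0.40 / M 0.15 / V 0.05 / R 0.40).
weights = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}

# Assumed aggregation: evidence weight = weighted sum of the signals.
total = sum(scores[k] * weights[k] for k in scores)
print(f"evidence weight ≈ {total:.2f}")
```

The raw sum is 0.515, which rounds to the displayed 0.52 (the per-row contributions shown above are each rounded to two decimals).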

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.