ChatGPT-4: Can It Handle Real-World Accounting Cases?
Theresa F. Henry et al.
Abstract
In this study, we assess ChatGPT-4's accuracy, appropriateness of support, and consistency by applying it to a sizable number of case questions from the Deloitte Trueblood Case Study series. We contribute to the literature in three ways. First, we evaluate ChatGPT-4 on its ability to provide open-ended responses to realistic case study questions. Second, we ask ChatGPT-4 not only to answer the case questions but also to provide the appropriate support from the relevant FASB standard. Finally, we run the case questions through ChatGPT-4 three times, thereby assessing the variation in its responses. Our results show that ChatGPT-4's ability to accurately answer and support the case questions with consistency falls short of the levels accounting professionals would expect. ChatGPT-4's performance indicates that it may not be ready for more advanced accounting applications, or even to be relied upon for supplementary support by an accountant.