Can Textual Disclosures Explain Fraudulent Financial Statements? Evidence Based on the Performance Comparison of Machine Learning Models from Japan

Masumi Nakashima & Keisuke Yoshida

Journal of Forensic Accounting Research · 2025 · https://doi.org/10.2308/jfar-2024-002 · article

AJG 2 · ABDC B

Weight

0.50

What the paper says

This study applies machine learning to the detection of accounting fraud in textual disclosures, focusing on both the detection performance and the interpretability of the models. It considers a manager’s motivation to conceal fraudulent financial statements along three dimensions: content functions, examined through the obfuscation hypothesis; text functions, examined through information manipulation theory; and interpersonal functions, examined through interpersonal deception theory. The analysis shows that the rates of katakana and alphabetic characters are higher in fraudulent firms than in nonfraudulent firms, supporting the obfuscation hypothesis. In addition, the rates of numbers and proper nouns are lower in fraudulent firms than in nonfraudulent firms, consistent with information manipulation theory. Furthermore, when the performance of five fraud detection models (decision tree, random forest, XGBoost, LightGBM, CatBoost) is compared, CatBoost achieves the highest performance. Data Availability: Data are available from sources identified in the paper. JEL Classifications: M41; M42; C45; C55; K42.
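The character-type rates mentioned in the abstract can be illustrated with a minimal sketch. The paper's exact feature definitions are not given here, so everything below is an assumption: the function name `char_type_rates` is hypothetical, rates are taken over non-whitespace characters, and the text is NFKC-normalized first so that full-width letters and digits and half-width katakana are counted with their standard forms. Proper-noun rates are omitted because they would need a morphological analyzer.

```python
import unicodedata


def char_type_rates(text: str) -> dict[str, float]:
    """Share of katakana, Latin letters, and digits among non-space characters.

    Assumption: this is an illustrative definition, not the paper's own.
    """
    # NFKC folds full-width Latin letters/digits to ASCII and
    # half-width katakana to standard katakana.
    normalized = unicodedata.normalize("NFKC", text)
    chars = [c for c in normalized if not c.isspace()]
    total = len(chars)
    if total == 0:
        return {"katakana": 0.0, "alphabet": 0.0, "digit": 0.0}

    # Katakana block U+30A1..U+30FF, excluding the middle dot (U+30FB),
    # which is punctuation rather than a kana character.
    katakana = sum(1 for c in chars if "\u30a1" <= c <= "\u30ff" and c != "\u30fb")
    alphabet = sum(1 for c in chars if c.isascii() and c.isalpha())
    digit = sum(1 for c in chars if c.isascii() and c.isdigit())

    return {
        "katakana": katakana / total,
        "alphabet": alphabet / total,
        "digit": digit / total,
    }


sample = "ＡＢ１２アイ漢字"  # full-width letters and digits, katakana, kanji
rates = char_type_rates(sample)
print(rates)
```

Rates like these could then serve as input features to a tree-based classifier; which features and preprocessing the authors actually used should be checked against the paper itself.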


Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact	0.50 × 0.40 = 0.20
M · momentum	0.50 × 0.15 = 0.075
V · venue signal	0.50 × 0.05 = 0.025
R · text relevance †	0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
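The breakdown above reads as a weighted sum of four signals. A minimal sketch of that arithmetic follows, assuming the evidence weight is simply the sum of each signal multiplied by its mode weight; the names `BALANCED_WEIGHTS`, `contributions`, and `evidence_weight` are hypothetical and not taken from the tool.

```python
# Balanced-mode weights as listed on the page (F / M / V / R).
BALANCED_WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def contributions(signals: dict[str, float],
                  weights: dict[str, float] = BALANCED_WEIGHTS) -> dict[str, float]:
    """Per-signal contribution: signal value times its weight."""
    return {name: signals[name] * w for name, w in weights.items()}


def evidence_weight(signals: dict[str, float],
                    weights: dict[str, float] = BALANCED_WEIGHTS) -> float:
    """Assumed aggregation: the sum of the weighted contributions."""
    return sum(contributions(signals, weights).values())


# All four signals are 0.50 on this detail page.
signals = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
for name, value in contributions(signals).items():
    print(f"{name}: {value:.3f}")
print(f"total: {evidence_weight(signals):.2f}")
```

Because the weights sum to 1, identical signals of 0.50 give a total of 0.50. The momentum and venue terms are exactly 0.075 and 0.025, so two-decimal display of the individual rows depends on the rounding rule used.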