Can Textual Disclosures Explain Fraudulent Financial Statements? Evidence Based on the Performance Comparison of Machine Learning Models from Japan

Masumi Nakashima & Keisuke Yoshida

Journal of Forensic Accounting Research · 2025 · https://doi.org/10.2308/jfar-2024-002 · article
AJG 2 · ABDC B
Weight: 0.50

Abstract

This study applies machine learning to the detection of accounting fraud from textual information, focusing on the performance and interpretability of the models. It considers a manager’s motivation to conceal fraudulent financial statements along three dimensions: content functions, examined through the obfuscation hypothesis; text functions, examined through information manipulation theory; and interpersonal functions, examined through interpersonal deception theory. The analysis shows that the rates of katakana and alphabet characters are higher in fraudulent firms than in nonfraudulent firms, supporting the obfuscation hypothesis. In addition, the rates of numbers and proper nouns are lower in fraudulent firms than in nonfraudulent firms, consistent with information manipulation theory. Furthermore, a comparison of fraud detection models (decision tree, random forest, XGBoost, LightGBM, CatBoost) finds that CatBoost achieves the highest performance.

Data Availability: Data are available from sources identified in the paper.

JEL Classifications: M41; M42; C45; C55; K42.

Cite this paper

https://doi.org/10.2308/jfar-2024-002

Or copy a formatted citation

@article{masumi2025,
  title        = {{Can Textual Disclosures Explain Fraudulent Financial Statements? Evidence Based on the Performance Comparison of Machine Learning Models from Japan}},
  author       = {Masumi Nakashima and Keisuke Yoshida},
  journal      = {Journal of Forensic Accounting Research},
  year         = {2025},
  doi          = {10.2308/jfar-2024-002},
}

Paste directly into BibTeX, Zotero, or your reference manager.

Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.40 = 0.20
M · momentum: 0.50 × 0.15 = 0.075
V · venue signal: 0.50 × 0.05 = 0.025
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page; for your query’s actual relevance score, open this paper from a search result.
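
The breakdown above appears to be a plain weighted sum of four component scores. The sketch below reproduces that arithmetic under this assumption; the names `BALANCED_WEIGHTS` and `evidence_weight` are hypothetical and not taken from the page, and the component scores are the 0.50 values shown above.

```python
# Balanced-mode weights as listed on the page (F, M, V, R).
BALANCED_WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores: dict[str, float],
                    weights: dict[str, float] = BALANCED_WEIGHTS) -> float:
    """Weighted sum of component scores; assumes the weights sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(weights[key] * scores[key] for key in weights)


if __name__ == "__main__":
    # All four components at 0.50, as displayed for this paper.
    scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
    print(f"{evidence_weight(scores):.2f}")
```

Because the weights sum to 1, four equal component scores of 0.50 give an overall weight of 0.50, matching the headline figure.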