Machine Learning Meets Tax Fraud: Insights from Slovakia

Eduard Baumöhl et al.

Journal of Economics / Ekonomicky casopis · 2025 · https://doi.org/10.31577/ekoncas.2025.05-06.01 · article
ABDC B
Weight: 0.50

Abstract

Detecting tax fraud is one of the most intriguing topics in corporate finance. We use a unique dataset of outcomes from Slovak tax authority audits, which provides verified instances of tax manipulation and avoids the misclassification problem common in this stream of literature. We apply artificial neural networks, random forests, XGBoost, and support vector machines to assess how well tax manipulators can be classified on the basis of publicly available financial statement indicators. The XGBoost model was the most effective, achieving an F1 score of 0.75 in the full sample, slightly lower scores within the industry groups, and excellent results in sector A (Agriculture), with an F1 score of 0.85. Our results indicate that today's widely known machine learning methods, combined with standard financial variables, can provide a useful tool for tax fraud detection and can thus contribute to more efficient tax audits.
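For readers unfamiliar with the F1 score reported in the abstract, the following minimal Python sketch shows how it is computed from a classifier's confusion counts. The function name `f1_score` and the counts used below are hypothetical illustrations, not figures or code from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall for the positive class.

    tp: flagged firms that really were manipulators (true positives)
    fp: flagged firms that were not manipulators (false positives)
    fn: manipulators the model failed to flag (false negatives)
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts chosen only to illustrate what an F1 of 0.75 can look like;
# they are NOT taken from the paper.
example_f1 = f1_score(tp=75, fp=25, fn=25)
print(round(example_f1, 2))
```

Because F1 ignores true negatives, it suits settings such as fraud detection, where the positive class is the one of interest.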


Cite this paper

https://doi.org/10.31577/ekoncas.2025.05-06.01

Or copy a formatted citation

@article{eduard2025,
  title        = {{Machine Learning Meets Tax Fraud: Insights from Slovakia}},
  author       = {Baumöhl, Eduard and others},
  journal      = {Journal of Economics / Ekonomicky casopis},
  year         = {2025},
  doi          = {10.31577/ekoncas.2025.05-06.01},
}

Paste directly into BibTeX, Zotero, or your reference manager.


Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.40 = 0.20
M · momentum: 0.50 × 0.15 = 0.075
V · venue signal: 0.50 × 0.05 = 0.025
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
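
The breakdown above suggests that the evidence weight is a simple weighted sum of the four component scores. The sketch below reproduces that arithmetic in Python under that assumption; the page does not document the actual formula, and the names `WEIGHTS` and `evidence_weight` are hypothetical.

```python
# Balanced-mode weights as listed on the page: F 0.40 / M 0.15 / V 0.05 / R 0.40.
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores: dict, weights: dict = WEIGHTS) -> float:
    """Weighted sum of component scores (assumed formula, not confirmed by the source)."""
    return sum(weights[name] * scores[name] for name in weights)


# Every component is shown as 0.50 on this page.
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}

# Per-component contributions, e.g. M: 0.50 * 0.15 = 0.075.
contributions = {name: WEIGHTS[name] * scores[name] for name in WEIGHTS}
total = evidence_weight(scores)
print(contributions, round(total, 2))
```

Since the weights sum to 1, equal component scores of 0.50 give a total of 0.50, matching the evidence weight shown.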