Forensic investigation of suspected money laundering activities over the Ethereum blockchain: a machine learning approach
Henrique Yassuyuki Tsuboi et al.
Abstract
Purpose: This study aims to provide an alternative machine learning model for detecting, more quickly and efficiently, addresses on the Ethereum network suspected of involvement in fraudulent activities.
Design/methodology/approach: This study applies a machine learning technique known as LightGBM. The model is trained on a dataset that labels addresses on the Ethereum network as licit or illicit. The trained model is then used to predict the probability that a new transaction should be classified as suspicious for money laundering.
Findings: Through a set of performance metrics, we show that our model outperforms machine learning models from previous studies and better predicts suspected money laundering activity. The most relevant attributes in identifying an illicit transaction are: (i) the time difference between the first and last activity of the crypto wallet (a short “lifetime” of the address); (ii) the total number of transactions (accounts used only once or a few times) and (iii) the difference in the distribution of values between the crypto wallets (low values).
Research limitations/implications: Machine learning techniques have great potential to contribute to the activities of government agents, regulatory authorities and accounting professionals.
Practical implications: This study adds another tool to combat money laundering, which could lead to improvements in auditing and forensic accounting procedures. It may be of special interest to regulators and policymakers in their anti-money-laundering roles.
Originality/value: This study adopts a modern technique that can be considered a valuable tool in identifying and combating fraudulent activities on blockchain networks.
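To make the three attributes named in the Findings concrete, the sketch below derives them for a single address from a list of its transactions. It is an illustration only, not the authors' pipeline: the `Tx` record, the `address_features` helper, the feature names and the choice of mean and standard deviation as descriptors of the value distribution are all assumptions introduced here. Features of this kind could then be passed to a gradient-boosting classifier such as LightGBM.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class Tx:
    """One transfer touching the address (hypothetical record layout)."""
    timestamp: int    # Unix time in seconds
    value_eth: float  # amount transferred, in ETH


def address_features(txs: List[Tx]) -> Dict[str, float]:
    """Summarise one address by lifetime, activity count and value spread."""
    if not txs:
        raise ValueError("address has no transactions")
    times = [t.timestamp for t in txs]
    values = [t.value_eth for t in txs]
    return {
        # (i) time between first and last activity, in minutes
        "lifetime_min": (max(times) - min(times)) / 60.0,
        # (ii) total number of transactions seen for the address
        "total_tx": float(len(txs)),
        # (iii) simple descriptors of the value distribution (assumed here)
        "mean_value": mean(values),
        "std_value": pstdev(values),
    }


# A made-up address that was active for ten minutes with two small transfers.
short_lived = [Tx(1_600_000_000, 0.05), Tx(1_600_000_600, 0.07)]
features = address_features(short_lived)
print(features)
```

An address with a short lifetime, few transactions and low transferred values, as in this made-up example, matches the profile the abstract associates with illicit activity; whether it is flagged depends on the trained model.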