A Comparative Study of Convolutional Neural Networks and Vision Transformers for Agricultural Image Classification
Hieu Pham et al.
Abstract
Digital agriculture is a growing field in which complex systems apply computer vision to aid decision-making. This work provides a comprehensive comparison of Convolutional Neural Networks and Vision Transformers in the agricultural sector, focusing specifically on image classification. An extensive evaluation of both modeling frameworks is conducted across 17 different deep learning models. Using five distinct agricultural datasets spanning various image classification tasks, this work highlights the strengths and limitations of each architectural approach. Specifically, transformer variants such as the Swin Transformer Version 2 excel in accuracy and in capturing complex patterns thanks to their attention-based mechanisms. In contrast, models such as Residual Networks offer a balance between computational efficiency and performance, making them suitable for scenarios with limited computational resources. The findings underscore the importance of selecting an appropriate deep learning model for the specific agricultural task at hand.