A Comparative Study of Convolutional Neural Networks and Vision Transformers for Agricultural Image Classification

Hieu Pham et al.

Journal of Database Management · 2026 · https://doi.org/10.4018/jdm.396706 · article
AJG 1 · ABDC A
Weight: 0.50

Abstract

Digital agriculture is a growing field in which complex systems are used for computer vision applications that aid decision-making. This work provides a comprehensive comparison of Convolutional Neural Networks and Vision Transformers in the agricultural sector, focusing specifically on image classification. Both modeling frameworks are evaluated and compared across 17 different deep learning models. Using five distinct agricultural datasets comprising various image classification tasks, this work highlights the strengths and limitations of each architectural approach. Specifically, transformer variants such as the Swin Transformer Version 2 excel in accuracy and in the ability to capture complex patterns, owing to their attention-based mechanisms. In contrast, models such as Residual Networks offer a balance between computational efficiency and performance, making them suitable for scenarios with limited computational resources. The findings highlight the importance of selecting an appropriate deep learning model based on the specific agricultural task.

Cite this paper

https://doi.org/10.4018/jdm.396706

Or copy a formatted citation

@article{hieu2026,
  title        = {{A Comparative Study of Convolutional Neural Networks and Vision Transformers for Agricultural Image Classification}},
  author       = {Hieu Pham and others},
  journal      = {Journal of Database Management},
  year         = {2026},
  doi          = {10.4018/jdm.396706},
}

Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.50 × 0.40 = 0.20
M · momentum: 0.50 × 0.15 = 0.075
V · venue signal: 0.50 × 0.05 = 0.025
R · text relevance †: 0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
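
The breakdown above reads as a weighted sum of four component scores. The sketch below reproduces that arithmetic under the assumption that the evidence weight is exactly this sum. The function and variable names are hypothetical and are not taken from the Arbiter methodology.

```python
# Balanced-mode weights as listed on this page (F / M / V / R).
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def evidence_weight(scores, weights=WEIGHTS):
    """Return the weighted sum of component scores.

    Assumption: the page's evidence weight is a plain weighted sum;
    this helper is illustrative only.
    """
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same components")
    return sum(weights[key] * scores[key] for key in weights)


# Component scores shown on this detail page (all 0.50).
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}

# Per-component contributions, then the total.
contributions = {key: WEIGHTS[key] * scores[key] for key in WEIGHTS}
total = evidence_weight(scores)

for key, value in contributions.items():
    print(f"{key}: {value:.3f}")
print(f"evidence weight: {total:.2f}")
```

Because the four weights sum to 1.00, equal component scores of 0.50 give a total of 0.50, which matches the evidence weight shown above.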