GD-YOLOv8 for construction safety: dual-modality (RGB–IR) fusion with lightweight Ghost/DWConv enhancements for worker, machinery, and crane detection

Jie Chen & Yunzhong Cao

Engineering, Construction and Architectural Management · 2026 · article
https://doi.org/10.1108/ecam-07-2025-1095
AJG 1 · ABDC A

Abstract

Purpose
The construction industry faces high safety risks and lacks effective technical means for safety management. In complex low-light scenes, accurate and efficient object detection is a key issue for improving construction safety and management intelligence, and it is crucial for enhancing intelligent perception and information processing in construction environments. This paper proposes a lightweight model for efficient and accurate object detection in complex low-light construction environments and verifies its performance advantages under multi-source image fusion and on-site deployment conditions.

Design/methodology/approach
This paper proposes GD-YOLOv8, a lightweight object detection model that integrates multi-source image information. First, the ORB image registration algorithm aligns the visible and infrared images. The two modalities are combined by weighted-average fusion, with the weighting coefficient set to 0.43 based on experimental verification, yielding a dual-modality dataset of 5,928 images split into training, validation, and test sets in a 7:2:1 ratio (4,149 training, 1,186 validation, 593 test). The model architecture is built on YOLOv8s, incorporating GhostNet and Depthwise Convolution (DWConv) modules to reduce model complexity and meet the requirements of resource-constrained on-site deployment. Code and data are available upon request.

Findings
Compared with the YOLOv8s baseline, GD-YOLOv8 reduces GFLOPs by 30.6%, parameters by 32.2%, and model size by 31.6%, while improving detection precision by 3.1% and mAP@0.5 by 0.5%, with a slight 0.8% decrease in mAP@0.5:0.95. The Average Precision (AP) for workers, construction machinery, and tower cranes reaches 99.5%, 99.5%, and 99.3%, respectively, achieving efficient recognition in complex low-light environments.

Originality/value
The framework provides a new intelligent solution for smart management on construction sites. On one hand, multi-source image fusion enhances the perception of engineering entities and enables structured knowledge representation of detection results. On the other hand, the proposed lightweight object detection model offers improved real-time performance, precision, and deployment adaptability, meeting application requirements in complex construction environments.
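The weighted-average fusion step described in the abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract gives the coefficient 0.43 but does not say which modality it weights, so applying it to the infrared channel is an assumption here, and the function name is hypothetical.

```python
import numpy as np

def fuse_rgb_ir(rgb, ir, alpha=0.43):
    """Weighted-average fusion of a registered RGB/IR image pair.

    alpha is the weighting coefficient reported in the paper (0.43).
    Which modality it applies to is an assumption in this sketch:
    we weight the infrared image and give the RGB image (1 - alpha).
    Inputs are uint8 arrays; the IR frame may be single-channel.
    """
    rgb = rgb.astype(np.float32)
    ir = ir.astype(np.float32)
    if ir.ndim == 2:  # replicate single-channel IR across the 3 RGB channels
        ir = np.repeat(ir[..., None], 3, axis=2)
    fused = alpha * ir + (1.0 - alpha) * rgb
    # round before casting so e.g. 142.9999 does not truncate to 142
    return np.clip(np.rint(fused), 0, 255).astype(np.uint8)
```

In practice the two frames must first be aligned (the paper uses ORB-based registration) so that each fused pixel combines the same physical point in both modalities.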


Cite this paper

https://doi.org/10.1108/ecam-07-2025-1095

Or copy a formatted citation

@article{jie2026,
  title        = {{GD-YOLOv8 for construction safety: dual-modality (RGB–IR) fusion with lightweight Ghost/DWConv enhancements for worker, machinery, and crane detection}},
  author       = {Chen, Jie and Cao, Yunzhong},
  journal      = {Engineering, Construction and Architectural Management},
  year         = {2026},
  doi          = {10.1108/ecam-07-2025-1095},
}

Paste directly into BibTeX, Zotero, or your reference manager.


Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact    0.50 × 0.40 = 0.20
M · momentum           0.50 × 0.15 = 0.07
V · venue signal       0.50 × 0.05 = 0.03
R · text relevance †   0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
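The composite evidence weight above is a weighted sum of the four component scores, using the Balanced-mode weights shown (F 0.40, M 0.15, V 0.05, R 0.40). A minimal sketch of that arithmetic, with all values taken from the page:

```python
# Balanced-mode evidence weight: weighted sum of component scores.
# Weights and scores are those displayed on the page.
WEIGHTS = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}  # sum to 1.0
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}

weight = sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)
# 0.20 + 0.075 + 0.025 + 0.20 = 0.50
# (the page shows the M term as 0.07 and the V term as 0.03, i.e.
# rounded to two decimals; the exact contributions are 0.075 and 0.025)
```

Because every component score here is 0.50 and the weights sum to 1.0, the composite weight is exactly 0.50.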