Lightweight query-adaptive RAG framework for knowledge support in smart manufacturing

Tianyu Zhou et al.

Journal of Manufacturing Systems · 2026 · https://doi.org/10.1016/j.jmsy.2026.04.016 · article

AJG 1 · ABDC B

Weight

0.50

What the paper says

As smart manufacturing moves beyond automated execution towards knowledge-intensive decision support, efficiently integrating dispersed domain knowledge has become a key challenge. Retrieval-augmented generation (RAG)–based large language models (LLMs) provide a practical approach to incorporating external knowledge. They have been increasingly applied in on-site manufacturing scenarios, where decision processes usually require frequent human–AI interactions and rapid responses. In this context, response latency and the cost of knowledge utilisation impose constraints on deployment. However, conventional RAG methods typically adopt unified, static pipelines for retrieval and generation, which tend to introduce redundant retrieval and contextual overhead when handling varied queries. Therefore, this paper proposes LiteRAG, a lightweight query-adaptive framework that integrates semantic query classification with adaptive retrieval to optimise the necessity and scope of external knowledge. Experimental results show that LiteRAG reduces average token and latency consumption by nearly 50% compared with fixed retrieval RAG, while moderately improving response quality in flexible grinding tasks. The gains are mainly observed in queries requiring retrieval augmentation, reaching about 15–20%. Moreover, the cross-model evaluation indicates that medium-scale models Qwen3_8B and Qwen3_14B achieve around 90% of the performance of a 30B model under this framework, reflecting favourable resource efficiency and scalability. These results demonstrate that LiteRAG provides a deployable solution for balancing knowledge utilisation and deployment efficiency in LLM applications for smart manufacturing.

Highlights

• Proposed a lightweight RAG integrating query classification and adaptive retrieval.
• LiteRAG can halve token/latency and lower intrinsic-query cost by over 80% vs RAG.
• LiteRAG lifts augmented output 15–20% over RAG and keeps intrinsic responses stable.
• Query classification and adaptive retrieval can regulate activation and context depth.
• It shows model scaling affects intrinsic and augmented queries at different stages.
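The routing idea described above (classify each query, then decide whether to retrieve at all and how much context to add) can be sketched as follows. This is a minimal illustration under stated assumptions, not LiteRAG's implementation: only the "intrinsic" vs "augmented" labels come from the paper. The keyword test in `DOMAIN_CUES` is a hypothetical stand-in for the paper's semantic classifier, and the depth rule in `plan_retrieval`, the word-overlap retriever and the three-passage `CORPUS` are all hypothetical.

```python
from dataclasses import dataclass

# Query labels named after the paper's "intrinsic" vs "augmented" distinction.
INTRINSIC = "intrinsic"   # answered from the model's own knowledge, no retrieval
AUGMENTED = "augmented"   # needs external domain documents

# HYPOTHETICAL cue words standing in for a semantic query classifier.
DOMAIN_CUES = {"grinding", "wheel", "feed", "spindle", "roughness", "dressing"}

# HYPOTHETICAL example corpus of shop-floor knowledge snippets.
CORPUS = [
    "Reduce feed rate when surface roughness exceeds tolerance.",
    "Dress the grinding wheel after a fixed number of parts.",
    "Spindle speed limits depend on wheel diameter.",
]


@dataclass
class RetrievalPlan:
    label: str
    top_k: int  # passages to retrieve; 0 means retrieval is skipped entirely


def tokens(text: str) -> set[str]:
    """Lower-cased words with surrounding punctuation stripped."""
    return {w.strip("?.,!:;").lower() for w in text.split()} - {""}


def plan_retrieval(query: str, max_k: int = 3) -> RetrievalPlan:
    """Decide whether to retrieve (necessity) and how much (scope)."""
    hits = len(tokens(query) & DOMAIN_CUES)
    if hits == 0:
        return RetrievalPlan(INTRINSIC, 0)
    # HYPOTHETICAL depth rule: more domain cues -> deeper retrieval, capped.
    return RetrievalPlan(AUGMENTED, min(max_k, 1 + hits))


def retrieve(query: str, corpus: list[str], k: int) -> list[str]:
    """Toy lexical retriever: rank passages by word overlap with the query."""
    if k == 0:
        return []
    q = tokens(query)
    # sorted() is stable, so tied passages keep their corpus order.
    ranked = sorted(corpus, key=lambda p: len(q & tokens(p)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, corpus: list[str]) -> str:
    """Only augmented queries carry retrieved context into the prompt."""
    plan = plan_retrieval(query)
    context = retrieve(query, corpus, plan.top_k)
    if not context:
        return f"Question: {query}\nAnswer:"
    joined = "\n".join(f"- {p}" for p in context)
    return f"Context:\n{joined}\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    for q in ["What is an LLM?", "How often should the wheel be dressed?"]:
        print(plan_retrieval(q))
        print(build_prompt(q, CORPUS), end="\n\n")
```

In this toy, the first query is labelled intrinsic and gets a bare prompt, while the second is labelled augmented and receives two passages. Skipping retrieval for intrinsic queries and capping context depth for augmented ones is the kind of saving the abstract attributes to LiteRAG; the sketch does not reproduce the paper's measurements.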


Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact	0.50 × 0.4 = 0.20
M · momentum	0.50 × 0.15 = 0.075
V · venue signal	0.50 × 0.05 = 0.025
R · text relevance †	0.50 × 0.4 = 0.20

† Text relevance is estimated at 0.50 on the detail page; a query-specific relevance score is available only when the paper is opened from a search result.
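The breakdown above reads as a weighted sum: each row multiplies a component score by its Balanced-mode weight, and the evidence weight is the total of the rows. The sketch below assumes exactly that linear rule; the function names are hypothetical and not taken from the site's code. Because the four Balanced-mode weights sum to 1, a paper whose components all score 0.50 ends up with an evidence weight of 0.50.

```python
# Mode weights as listed for "Balanced mode" (F, M, V, R).
BALANCED = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}


def contributions(scores: dict[str, float],
                  weights: dict[str, float] = BALANCED) -> dict[str, float]:
    """Per-component terms, score x weight (one breakdown row each)."""
    return {k: scores[k] * w for k, w in weights.items()}


def evidence_weight(scores: dict[str, float],
                    weights: dict[str, float] = BALANCED) -> float:
    """Weighted sum of component scores; weights are expected to sum to 1."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("mode weights must sum to 1")
    return sum(contributions(scores, weights).values())


scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}
print({k: round(v, 3) for k, v in contributions(scores).items()})
# {'F': 0.2, 'M': 0.075, 'V': 0.025, 'R': 0.2}
print(round(evidence_weight(scores), 2))  # 0.5
```

Rounding each row to two decimals before display can make individual terms look inconsistent (0.075 and 0.025 sit exactly on a rounding boundary), so keeping full precision until the final total avoids that.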