Multimodal misinformation detection across diverse languages using RAG and LLMs

Sheetal Harris et al.

Journal of Intelligent Information Systems, 2026 · article
https://doi.org/10.1007/s10844-026-01042-x

ABDC rating: B

Weight: 0.50

Abstract

The rapid spread of multimodal fake news (FN) on Online Social Networks (OSNs) threatens digital information ecosystems, particularly in low-resource languages. Existing multimodal fake news detection (FND) methods are largely limited to high-resource settings, restricting their global applicability. We propose M&M-RAG, a Multilingual & Multimodal Retrieval-Augmented Generation framework that leverages Large Vision-Language Models (LVLMs) and Large Language Models (LLMs) to verify news claims across English, Chinese, and Urdu. M&M-RAG integrates real-time multilingual evidence retrieval, language-aware prompting, and cross-modal reasoning for fact verification. We further propose Multi-Ax-to-Grind Urdu, the first large-scale, multi-domain multimodal benchmark for FND in Urdu. Experiments on typologically diverse monolingual multimodal datasets demonstrate that M&M-RAG achieves state-of-the-art (SOTA) performance, with 94.6% accuracy and a 94.2% F1 score, surpassing models such as SpotFake, MPFN, MMCFND, and Semi-FND. The proposed framework remains robust in zero-shot and cross-lingual scenarios under frozen-model inference without task-specific fine-tuning. The results underscore the scalability and interpretability of LVLM-based approaches for combating multimodal misinformation, particularly in under-represented and typologically diverse languages.
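The abstract describes a retrieve-then-verify pipeline. The following is a minimal, generic sketch of that pipeline shape only; it is not the authors' M&M-RAG implementation. The corpus, the word-overlap retriever, and the threshold-based verdict rule are placeholder assumptions standing in for the real multilingual evidence retrieval and LLM-based reasoning steps.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercased word tokens; a toy stand-in for real multilingual processing."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(claim: str, corpus: list[str], k: int = 2) -> list[str]:
    """Rank corpus snippets by word overlap with the claim (toy retriever)."""
    claim_toks = tokens(claim)
    ranked = sorted(corpus, key=lambda s: -len(claim_toks & tokens(s)))
    return ranked[:k]

def verify(claim: str, evidence: list[str]) -> str:
    """Placeholder verdict; in a RAG framework this step would be an LLM prompt
    conditioned on the retrieved evidence."""
    support = sum(len(tokens(claim) & tokens(e)) for e in evidence)
    return "supported" if support >= 3 else "unverified"

corpus = [
    "The city council approved the new park budget on Monday.",
    "Officials denied reports of a park budget cut.",
]
claim = "The council approved the park budget."
evidence = retrieve(claim, corpus)
print(verify(claim, evidence))  # → supported
```

In the actual framework, retrieval operates over real-time multilingual sources and the verdict is produced by prompted LVLMs/LLMs; this sketch only shows where each stage sits in the loop.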


Cite this paper

https://doi.org/10.1007/s10844-026-01042-x

Or copy a formatted citation

@article{sheetal2026,
  title        = {{Multimodal misinformation detection across diverse languages using RAG and LLMs}},
  author       = {Harris, Sheetal and others},
  journal      = {Journal of Intelligent Information Systems},
  year         = {2026},
  doi          = {10.1007/s10844-026-01042-x},
}

Paste directly into BibTeX, Zotero, or your reference manager.



Evidence weight

0.50

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact    0.50 × 0.40 = 0.20
M · momentum           0.50 × 0.15 = 0.07
V · venue signal       0.50 × 0.05 = 0.03
R · text relevance †   0.50 × 0.40 = 0.20

† Text relevance is estimated at 0.50 on the detail page — for your query’s actual relevance score, open this paper from a search result.
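The evidence weight above appears to be a weighted sum of the four component scores under the balanced-mode weights. A minimal sketch of that arithmetic, using the component scores (all 0.50 on this detail page) and weights (F 0.40, M 0.15, V 0.05, R 0.40) shown in the breakdown; the per-row rounding is an assumption.

```python
# Balanced-mode weights and component scores, as shown in the breakdown above.
weights = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}
scores = {"F": 0.50, "M": 0.50, "V": 0.50, "R": 0.50}

# Composite evidence weight: sum of weight × score over all components.
total = sum(weights[k] * scores[k] for k in weights)
print(round(total, 2))  # → 0.5
```

Note that the per-component contributions displayed on the page (0.20 + 0.07 + 0.03 + 0.20) are rounded for display but still sum to the 0.50 total.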