PO-GUISE+: Pose and Object Guided Transformer Token Selection for Efficient Driver Action Recognition

Ricardo Pizarro et al.

IEEE Transactions on Intelligent Transportation Systems · 2026 · https://doi.org/10.1109/tits.2026.3665254 · preprint
ABDC rank: A
Evidence weight: 0.37

Abstract

We address the task of identifying distracted driving by analyzing in-car videos using efficient transformers. Although transformer models have achieved outstanding performance in human action recognition tasks, their high computational costs limit their application onboard a vehicle. We introduce PO-GUISE+, a multi-task video transformer that, given an input clip, predicts the distracted driving action, the driver’s pose, and the interacting object. Our enhanced features for token selection are specifically adapted to driver actions by leveraging information about object interaction and the driver’s pose. With PO-GUISE+, we significantly reduce the model’s computational demands while maintaining or improving baseline accuracy across various computational budgets. Additionally, to evaluate our model’s performance in real-world scenarios, we have developed benchmarks on a Jetson computing platform, demonstrating its effectiveness across different configurations and computational budgets. Our model outperforms the current state of the art on the Drive&Act, 100-Driver, and 3MDAD datasets while being more efficient than existing video transformer-based methods.
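The abstract does not say how the pose and object cues are turned into token-selection scores. As a rough, hypothetical illustration only (not the actual PO-GUISE+ mechanism), the sketch below keeps the top-k video tokens ranked by a blend of two assumed per-token scores, `pose_scores` and `object_scores`, with a made-up mixing weight `alpha`. It also shows why pruning saves compute: self-attention over N tokens builds an N × N matrix, so keeping half the tokens shrinks that term to about a quarter.

```python
from typing import List, Sequence, Tuple

Token = Sequence[float]


def select_tokens(
    tokens: Sequence[Token],
    pose_scores: Sequence[float],
    object_scores: Sequence[float],
    keep_ratio: float,
    alpha: float = 0.5,
) -> Tuple[List[int], List[Token]]:
    """Keep the top-k tokens by a blended guidance score.

    Hypothetical scheme for illustration: each token's score is
    alpha * pose_score + (1 - alpha) * object_score. PO-GUISE+ may
    compute its guidance features differently.
    """
    n = len(tokens)
    if len(pose_scores) != n or len(object_scores) != n:
        raise ValueError("need one pose score and one object score per token")
    if not 0.0 < keep_ratio <= 1.0:
        raise ValueError("keep_ratio must be in (0, 1]")
    k = max(1, round(keep_ratio * n))

    scores = [alpha * p + (1.0 - alpha) * o for p, o in zip(pose_scores, object_scores)]
    # Rank token indices by descending score and keep the k best.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    # Re-sort the survivors so they keep their original spatio-temporal order.
    kept = sorted(ranked[:k])
    return kept, [tokens[i] for i in kept]


def attention_cost_ratio(n_kept: int, n_total: int) -> float:
    """Relative size of the N x N self-attention matrix after pruning."""
    return (n_kept / n_total) ** 2


if __name__ == "__main__":
    tokens = [[float(i)] for i in range(8)]  # 8 toy one-dimensional tokens
    pose = [0.9, 0.1, 0.8, 0.0, 0.3, 0.7, 0.1, 0.3]  # hypothetical pose cues
    obj = [0.1, 0.0, 0.9, 0.1, 0.8, 0.6, 0.0, 0.2]  # hypothetical object cues
    kept, _ = select_tokens(tokens, pose, obj, keep_ratio=0.5)
    print(kept)  # [0, 2, 4, 5]
    print(attention_cost_ratio(len(kept), len(tokens)))  # 0.25
```

Re-sorting the surviving indices is a design choice: it preserves the tokens' original order, which matters if later layers rely on positional information.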

1 citation

Cite this paper

https://doi.org/10.1109/tits.2026.3665254

@article{ricardo2026,
  title        = {{PO-GUISE+: Pose and Object Guided Transformer Token Selection for Efficient Driver Action Recognition}},
  author       = {Pizarro, Ricardo and others},
  journal      = {IEEE Transactions on Intelligent Transportation Systems},
  year         = {2026},
  doi          = {10.1109/tits.2026.3665254},
}



Evidence weight

0.37

Balanced mode · F 0.40 / M 0.15 / V 0.05 / R 0.40

F · citation impact: 0.16 × 0.4 = 0.06
M · momentum: 0.53 × 0.15 = 0.08
V · venue signal: 0.50 × 0.05 = 0.03
R · text relevance†: 0.50 × 0.4 = 0.20

† Text relevance is a default estimate of 0.50 on the detail page. To see the relevance score for your own query, open this paper from a search result.
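The overall evidence weight appears to be the plain sum of the four weighted components, and the figures shown are consistent with that reading: 0.064 + 0.0795 + 0.025 + 0.20 = 0.3685, displayed as 0.37. The sketch below reproduces that arithmetic under this assumption. The function name `evidence_weight` is hypothetical, and the aggregator's actual scoring code is not described on this page.

```python
from typing import Dict

# Component values and "Balanced mode" weights copied from the breakdown above.
WEIGHTS: Dict[str, float] = {"F": 0.40, "M": 0.15, "V": 0.05, "R": 0.40}
VALUES: Dict[str, float] = {"F": 0.16, "M": 0.53, "V": 0.50, "R": 0.50}


def evidence_weight(values: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of component scores (assumed aggregation rule)."""
    # The Balanced-mode weights sum to 1, so the result stays in [0, 1]
    # whenever every component value does.
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("mode weights should sum to 1")
    return sum(values[key] * weights[key] for key in weights)


if __name__ == "__main__":
    for key in WEIGHTS:
        # Unrounded contribution of each component.
        print(key, round(VALUES[key] * WEIGHTS[key], 4))
    total = evidence_weight(VALUES, WEIGHTS)
    print(round(total, 4), round(total, 2))  # 0.3685 0.37
```

The per-component figures on the page are rounded to two decimals; the total is computed here from the unrounded contributions.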