Drawing on Gestalt theory and processing fluency theory, this study empirically tests and theoretically explains how visual and vocal human presence affect viewer engagement in nature destination marketing videos. Results from one field study and one controlled experiment show that single-modal human presence (a face or a voice alone) reduces viewer engagement with these videos. However, this negative effect is attenuated when human faces and voices are presented together, which enhances processing fluency. This study advances the tourism marketing literature by highlighting the theoretical importance of the interdependencies between visual and vocal human presence and by providing empirical evidence of their interaction effects. Practically, the findings offer destination marketers actionable insights for optimizing multimodal human presence in destination marketing videos.