The Role of Audiovisual Asynchrony in Driving Cross-Modal Predictions

Poster Presentation 26.474: Saturday, May 16, 2026, 2:45 – 6:45 pm, Pavilion
Session: Multisensory Processing: Recalibration, temporal

Merve Kınıklıoğlu1, Daniel Kaiser1,2; 1Neural Computation Group, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Giessen, Germany, 2Center for Mind, Brain and Behavior (CMBB), Universities of Marburg, Giessen, and Darmstadt, Germany

Perception is an active process, with the brain continuously anticipating sensory input. While predictive mechanisms have been well studied within single modalities, real-world perception relies on cross-modal predictions, where information from one modality shapes expectations in another. Here, we introduced controlled temporal asynchrony between auditory and visual streams to test whether leading auditory inputs predictively modulate the neural processing of upcoming visual inputs. In an fMRI study, participants (N = 31) viewed sixteen 10-second naturalistic videos of everyday cooking actions (e.g., chopping, peeling, blending, pouring, frying, grating). Audio tracks were shifted by three seconds to temporally precede (“leading” condition) or trail (“lagging” condition) the visual input, or were left synchronized (“synchronous” condition). Participants monitored audiovisual congruency by detecting occasional mismatched sound–video pairs. We performed pairwise decoding of the sixteen video stimuli within each region of interest (ROI) to assess the discriminability of stimulus-specific activity patterns across conditions. Leading sounds produced higher decoding accuracy than lagging sounds in the scene-selective parahippocampal place area (PPA) and the object-selective lateral occipital cortex (LO), indicating that predictive auditory cues enhance visual representations of upcoming events across visual regions. In contrast, decoding accuracy in multisensory regions (intraparietal sulcus, IPS, and inferior frontal gyrus, IFG) tended to be higher in the lagging condition than in the leading and synchronous conditions, consistent with greater demands for temporal conflict monitoring. A whole-brain comparison of the synchronous and asynchronous conditions revealed only partially overlapping regions, suggesting that predictive effects under asynchrony are spatially dissociable from integration effects. Together, these findings demonstrate how anticipatory signals from one modality predictively shape representations in another modality.
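For readers unfamiliar with pairwise decoding, the sketch below illustrates the general logic of the analysis. It is not the authors' pipeline: the data shapes, the linear SVM classifier, and the cross-validation scheme are illustrative assumptions, and the ROI patterns are simulated. A classifier is trained to discriminate every pair of the sixteen stimuli from within-ROI activity patterns, and the pairwise accuracies are averaged; this mean accuracy is the quantity compared across the leading, lagging, and synchronous conditions.

```python
# Minimal sketch of pairwise stimulus decoding within one ROI.
# All names, shapes, and parameters are hypothetical assumptions.
import numpy as np
from itertools import combinations
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_stimuli, n_repeats, n_voxels = 16, 8, 200     # hypothetical ROI data
# patterns[s, r] = activity pattern for stimulus s, repetition r
patterns = rng.standard_normal((n_stimuli, n_repeats, n_voxels))

accuracies = []
for i, j in combinations(range(n_stimuli), 2):
    X = np.vstack([patterns[i], patterns[j]])   # trials x voxels
    y = np.repeat([0, 1], n_repeats)            # binary stimulus labels
    # Cross-validated classification; chance level is 50%.
    scores = cross_val_score(LinearSVC(), X, y, cv=4)
    accuracies.append(scores.mean())

print(f"Mean pairwise decoding accuracy: {np.mean(accuracies):.3f}")
```

In practice this procedure would be run separately for each ROI and asynchrony condition, so that the resulting mean accuracies can be contrasted across conditions as described in the abstract.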

Acknowledgements: This work was supported by the Deutsche Forschungsgemeinschaft (German Research Foundation, DFG) under Germany’s Excellence Strategy (EXC 3066/1 “The Adaptive Mind”, Project No. 533717223).