Multisensory Processing
Talk Session: Sunday, May 17, 2026, 10:45 am – 12:30 pm, Talk Room 1
Moderator: Stephanie Badde, Tufts University
Talk 1, 10:45 am, 32.11
Disrupted Sensory Reweighting for Postural Control in Glaucoma
Rakie Cham1,2, Galen Holland2, Michelle Harter2, Ian Conner2, Mark Redfern1; 1Department of Bioengineering, University of Pittsburgh, 2Department of Ophthalmology, University of Pittsburgh
Glaucoma is associated with increased falls and loss of independence, but vision loss alone does not fully explain the elevated fall risk. Mobility impairments, particularly balance and gait deficits, likely reflect altered central sensory integration. A key candidate mechanism is abnormal sensory reweighting for postural control—how the nervous system adjusts its reliance on visual, vestibular, and somatosensory cues when some inputs become unreliable. Prior work shows that people with glaucoma sway more than controls during standing, especially when somatosensory input is degraded (e.g., standing on foam), but whether they appropriately upweight vision under these conditions is unknown. In this study, we directly tested a sensory-reweighting mechanism in glaucoma. Twenty participants were enrolled in three groups: adults with glaucoma, older controls, and young controls. Participants completed an adapted Sensory Organization Test (SOT) during quiet standing on firm and foam surfaces with eyes open and eyes closed. The primary outcome was a normalized visual reliance metric: the ratio of center-of-pressure path length in eyes-closed versus eyes-open conditions, computed separately for firm and foam surfaces. All participants successfully completed the balance assessments. Consistent with classic sensory-reweighting theory, young and old controls increased their reliance on vision when somatosensory information was unreliable (foam). In contrast, adults with glaucoma did not increasingly rely on visual information to maintain balance on foam. These findings provide mechanistic evidence that glaucoma disrupts sensory reweighting for postural control, rather than simply reducing the amount of visual input. Identifying a central sensory integration deficit has direct clinical implications: it motivates development of targeted sensory retraining in occupational/physical therapy interventions. 
Such interventions should explicitly train more effective use of remaining visual and nonvisual cues to reduce falls and preserve independence in glaucoma.
The Henry L. Hillman Foundation
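The visual reliance metric in the abstract above (eyes-closed path length divided by eyes-open path length, computed per surface) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function names and the toy center-of-pressure traces are hypothetical, and real recordings would typically be filtered first.

```python
import math

def path_length(cop_xy):
    """Sum of distances between consecutive center-of-pressure samples."""
    return sum(math.dist(a, b) for a, b in zip(cop_xy, cop_xy[1:]))

def visual_reliance(eyes_closed_xy, eyes_open_xy):
    """Eyes-closed path length divided by eyes-open path length (one surface)."""
    return path_length(eyes_closed_xy) / path_length(eyes_open_xy)

# Hypothetical toy traces in arbitrary units; NOT data from the study.
firm_open = [(0, 0), (1, 0), (1, 1)]     # path length 2
firm_closed = [(0, 0), (2, 0), (2, 1)]   # path length 3
foam_open = [(0, 0), (2, 0)]             # path length 2
foam_closed = [(0, 0), (3, 4), (3, 0)]   # path length 5 + 4 = 9

firm_ratio = visual_reliance(firm_closed, firm_open)
foam_ratio = visual_reliance(foam_closed, foam_open)
print(firm_ratio, foam_ratio)
```

In this toy case the ratio is larger on foam than on firm ground, which is the pattern the abstract reports for controls: closing the eyes costs more when somatosensory input is degraded.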
Talk 2, 11:00 am, 32.12
Multisensory audiovisual coding in the human hippocampus
Omri Raccah1 (omri.raccah@yale.edu), Aryan Agarwal1, Yannan Zhu1, Nicholas Turk-Browne1; 1Yale University
The human hippocampus is traditionally considered a brain region dedicated to long-term memory. An emerging literature suggests that it may also contribute to several aspects of vision, including eye movements, mental imagery, scene discrimination, and visual search. However, the hippocampus receives anatomical inputs from virtually all sensory systems, raising the question of how it contributes to the processing of sensory modalities beyond vision and to the integration of visual information with other modalities. Using high-resolution fMRI (N = 30), we examined unisensory and multisensory representations of audiovisual movies across subfields and the longitudinal axis of the human hippocampus. Participants were exposed to multiple repetitions of short naturalistic movie clips, each presented in four formats: audio only, visual only, congruent audiovisual, and incongruent audiovisual (audio and video from different movies). Although univariate analyses detected only visual responses across hippocampal subfields, with no activation for auditory stimuli and no benefit of congruent multisensory stimulation, multivariate analyses revealed representations of both auditory and visual scenes. The posterior hippocampus showed enhanced pattern similarity for congruent stimuli relative to unisensory stimuli, demonstrating multisensory facilitation. Incongruent stimuli similarly showed reduced pattern similarity relative to congruent stimuli, providing additional evidence of multisensory integration. The anterior hippocampus supported cross-modal decoding, suggesting a common representation of auditory and visual information. Finally, whole-brain searchlight analyses revealed parallel effects in cortical regions known to support multisensory integration. 
These findings advance our understanding of multisensory coding in the human hippocampus, including the discovery of a functional dissociation along its longitudinal axis, from distinct but interactive representations in posterior, to a shared amodal representation in anterior.
NIH F32 EY035941; NIH R01 MH069456
Talk 3, 11:15 am, 32.13
Two cortical mechanisms for natural audiovisual processing
Subha Nawer Pushpita1 (spushpit@andrew.cmu.edu), Leila Wehbe1; 1Carnegie Mellon University
Understanding how human brains process naturalistic audiovisual information remains a central challenge in cognitive neuroscience. Progress has been limited by the difficulty of modeling complex audiovisual features; most prior work has relied on short, controlled stimuli or a single modality, leaving real-world comprehension mechanisms poorly characterized. Although recent advances in AI have enabled the extraction of high-quality features, how cortical regions dynamically process auditory and visual information as time unfolds remains largely unexplored. Using large-scale fMRI data collected while participants watched movies, we developed two novel computational approaches relying on prediction performance to map moment-by-moment sensory dynamics across cortex: one detects sustained periods when one modality predicts a region substantially better than the other, identifying regions that switch the modality they encode for meaningful stretches of time; the other isolates periods when both modalities predict well, revealing regions maintaining balanced encoding of both modalities. These approaches reveal two mechanisms of audiovisual processing: a pair of “bows” that switch modality (one posterior bow encircling category-selective visual cortex and another anterior bow spanning dorso-lateral frontal areas) and an arrow-like axis of bimodally predicted regions extending from lateral occipital into temporal cortex. We validate these findings in two ways. First, human raters confirm that scenes identified as audio- or video-dominated by our method are ones in which they rely more heavily on the corresponding modality to understand the content. Second, we develop an audio-visual transformer that predicts fMRI responses, and its learned time-dependent attention distribution over audio and visual inputs across regions matches the patterns identified through our statistical frameworks.
The coexistence of these systems suggests a cortical architecture that adaptively reweights sensory inputs while maintaining balanced multimodal representations, supporting robust comprehension of complex natural events. More broadly, this work shows how naturalistic neuroimaging paired with modern machine learning can reveal new principles of dynamic audiovisual processing.
NSF CAREER grant 2237064
Talk 4, 11:30 am, 32.14
Improved time perception from the integration of visual and auditory cues
Anthony Bruno1 (anthony_bruno@brown.edu), Jovan Kemp2, Fulvio Domini1, Leslie Welch1; 1Brown University, 2NYU Abu Dhabi
Humans integrate auditory and visual information to form a coherent perception of objects and events. Improvement from integrating two sources of information is greatest when performance between cues is comparable. In time perception, discrete stimuli like flashes and tones are often used to define time intervals, but discrimination performance is dramatically better in audition compared to vision (e.g., Burr, Banks, & Morrone, 2009). Prior attempts to measure cue integration relied on a weakened auditory cue (Burr et al., 2009; Hartcher-O’Brien, Di Luca, & Ernst, 2014). Equating the strength of auditory and visual cues can be challenging, which raises the question of whether discrete stimuli are appropriate for evaluating cue integration. Previously, we showed that a continuous bouncing stimulus improved visual time perception relative to a discrete flashing stimulus (Bruno & Welch, 2022). Performance with a bouncing stimulus was comparable to performance with a continuous auditory stimulus, a tone that modulated in intensity to define a time interval. In the present experiment, participants completed a time interval discrimination task with continuous visual, auditory, and audiovisual stimuli. Performance with the audiovisual stimulus was significantly better than with either of the single-cue stimuli. To investigate the mechanism behind the integration, we compared audiovisual performance to two cue integration models. The models consider single-cue performance as the sum of sensory and memory noise. One model predicts that auditory and visual duration estimates are represented separately and then integrated, while the other predicts that auditory and visual event-markers are integrated, from which a single duration estimate is produced (Hartcher-O’Brien et al., 2014). Performance with the audiovisual stimulus more closely resembled the prediction from the model that integrates the single-cue duration estimates.
This suggests that the integration mechanism brings together separate visual and auditory duration estimates to result in improved discrimination performance.
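The claim above that integration helps most when the two cues perform comparably follows from the standard maximum-likelihood cue-combination rule, sketched here. This is a textbook illustration under the assumption of independent Gaussian noise, not either of the two models compared in the abstract, which additionally separate sensory from memory noise. The function names and the threshold values are hypothetical.

```python
import math

def integrated_sigma(sigma_a, sigma_v):
    """Predicted audiovisual noise under maximum-likelihood integration
    of two independent Gaussian estimates (auditory and visual)."""
    var_a, var_v = sigma_a ** 2, sigma_v ** 2
    return math.sqrt(var_a * var_v / (var_a + var_v))

def gain_over_best_cue(sigma_a, sigma_v):
    """How many times more precise the combined estimate is than the
    better of the two single cues (1.0 means no benefit)."""
    return min(sigma_a, sigma_v) / integrated_sigma(sigma_a, sigma_v)

# Hypothetical noise levels in arbitrary units; NOT values from the study.
matched = gain_over_best_cue(1.0, 1.0)      # cues equally reliable
mismatched = gain_over_best_cue(1.0, 3.0)   # vision three times noisier
print(round(matched, 3), round(mismatched, 3))
```

With matched cues the predicted gain reaches its maximum of the square root of 2, whereas a three-fold mismatch leaves only about a 5% gain. That is why a visual stimulus with auditory-level precision makes integration easier to detect.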
Talk 5, 11:45 am, 32.15
(Micro-)Saccadic Eye Movements Alter Auditory and Tactile Temporal Order Perception
Stephanie Badde1, Vanalata Bulusu1; 1Tufts University
Inferences about the causality of two events fundamentally rely on their temporal order. Causally linked events might be registered through different senses, suggesting that this distinct area of temporal perception is governed by a supramodal mechanism. To scrutinize this hypothesis, we tested whether oculomotor actions alter the perceived order of auditory and tactile events, as is the case for visual events. Participants judged the spatiotemporal order of pairs of auditory and tactile stimuli presented sequentially in different locations. Simultaneously, they either made saccades following a target stimulus that moved from one side of a distant screen to the other every four seconds (Expts. 1-3) or they fixated a stationary, central target that was presented alone (Expts. 4-5) or with the moving target, whose position they indicated verbally (Expts. 6-7). Participants tended to perceive the spatiotemporal order of auditory and tactile stimulus pairs presented around the onset of a saccade as reversed. This perceptual reversal of temporal order was clearly linked to the saccades: it did not occur for stimulus pairs presented at other times or for those combined with a verbal task. The saccadic suppression effect also emerged for tactile stimulus pairs that elicited apparent radial rather than horizontal motion. Moreover, when participants fixated a stationary target, auditory, tactile, and visual (Expt. 8) spatiotemporal perception was distorted around the onset of microsaccades (tiny, involuntary saccades that observers are typically not aware of), further ruling out cognitive variables linked to the guided saccades, such as perceptual and processing load, as a source of the effect. These novel effects of (micro-)saccades on the perceived order of auditory and tactile events demonstrate that action and perception are linked across modalities and reveal supramodal encoding of temporal order.
Talk 6, 12:00 pm, 32.16
Dissociating Predictive and Postdictive Audiovisual Inference
Manda Fischer1,2 (manda.fischer@utoronto.ca), Keisuke Fukuda1,2; 1University of Toronto, 2University of Toronto Mississauga
Our brains have the remarkable ability to use contextual information to resolve perceptual uncertainty. This reliance on context has been demonstrated when information is presented before a stimulus (supporting predictive inference) and after it (supporting postdictive inference). However, it remains unclear whether these two forms of inference rely on overlapping or distinct mechanisms. We addressed this question using an audiovisual working memory task designed to test how category-level auditory cues influence visual face perception in a predictive and postdictive manner. On each trial, participants (N = 84) briefly viewed a face morphed along a female–male continuum (100 ms) and later reconstructed it after a short retention interval (1900 ms) using a continuous morph slider. To manipulate auditory context, either a male or female voice was presented 1000 ms before (pre-cue) or after (post-cue) face onset. Mixed-effects modeling revealed that the voice cue biased face reconstructions toward the gender of the voice in both pre- and post-cue conditions, but in distinct ways. Pre-cue presentation produced a uniform shift toward the cued gender across the continuum (cue main effect), reflecting a predictive bias. Post-cue presentation produced a gender-dependent modulation, strongest for ambiguous/moderately gendered faces (cue × face interaction), reflecting a postdictive reinterpretation of the visual input. Significant cue effects in both pre- and post-cue conditions clustered near ambiguous faces, highlighting the range where perception is most malleable. Cue effects were uncorrelated across individuals, suggesting that predictive and postdictive mechanisms are dissociable. Moreover, the influence of the auditory cue was largest when the visual face signal contributed less to the final percept, reflecting a trade-off between auditory and visual sources of information in shaping perception.
Taken together, our results suggest that distinct mechanisms underlie predictive and postdictive inference, each dynamically leveraging auditory context to disambiguate visual input, especially when the fidelity of this input is low.
We thank Kayla Vasquez for their help with data collection.
Talk 7, 12:15 pm, 32.17
Biological Motion as a Multisensory Signal: Predictive Integration of Space and Time
Melissa Nur Robinson1, Ataol Burak Ozsu1,2, Andreas Treske1, Ufuk Onen1, Burcu A. Urgen1; 1Bilkent University, 2University College London
The brain receives input from sensory organs that is often incomplete and noisy, making prioritization and integration of key features across modalities a non-trivial process supporting adaptive behavior. Biological motion is a particularly salient signal, robustly processed across the lifespan due to its survival relevance and social significance. While visual processing of biological motion has been extensively studied, its multimodal nature has received comparatively little attention. Prior multisensory studies have largely focused on temporal coupling between visual and auditory cues, yet in naturalistic contexts, audio-visual signals are also spatially aligned. Here, we investigated how temporal and spatial coupling interact to shape biological motion perception using a novel stimulus set with spatially dynamic auditory footstep sequences. We conducted two experiments to test whether biological motion selectively enhances temporal binding and whether this effect is modulated by spatial congruency between auditory stimuli and visual motion. In Experiment 1, participants viewed upright or scrambled point-light walkers paired with temporally synchronous or asynchronous footsteps and judged whether the trials were synchronous. In Experiment 2, we extended this design by additionally manipulating spatial congruency between the walker’s direction and the auditory motion trajectory, where the auditory motion trajectory either matched the walker direction (congruent) or did not match (incongruent). Our results indicate that both temporal and spatial alignment influence synchrony judgments between modalities, and critically, that spatial coupling enhances synchrony detection when auditory spatial information is misaligned. These findings highlight the importance of considering multiple dimensions of audiovisual integration in biological motion perception and provide new insights into how the brain flexibly integrates multisensory information.