Dissociating Predictive and Postdictive Audiovisual Inference

Talk Presentation 32.16: Sunday, May 17, 2026, 10:45 am – 12:30 pm, Talk Room 1
Session: Multisensory Processing

Manda Fischer1,2 (manda.fischer@utoronto.ca), Keisuke Fukuda1,2; 1University of Toronto, 2University of Toronto Mississauga

Our brains have the remarkable ability to use contextual information to resolve perceptual uncertainty. This reliance on context has been demonstrated when information is presented before a stimulus (supporting predictive inference) and after it (supporting postdictive inference). However, it remains unclear whether these two forms of inference rely on overlapping or distinct mechanisms. We addressed this question using an audiovisual working memory task designed to test how category-level auditory cues influence visual face perception in predictive and postdictive manners. On each trial, participants (N=84) briefly viewed a face morphed along a female–male continuum (100 ms) and later reconstructed it after a short retention interval (1900 ms) using a continuous morph slider. To manipulate auditory context, either a male or a female voice was presented 1000 ms before (pre-cue) or after (post-cue) face onset. Mixed-effects modeling revealed that the voice cue biased face reconstructions toward the gender of the voice in both pre- and post-cue conditions, but in distinct ways. Pre-cue presentation produced a uniform shift toward the cued gender across the continuum (cue main effect), reflecting a predictive bias. Post-cue presentation produced a gender-dependent modulation, strongest for ambiguous or moderately gendered faces (cue × face interaction), reflecting a postdictive reinterpretation of the visual input. Significant cue effects in both pre- and post-cue conditions clustered near ambiguous faces, highlighting the range where perception is most malleable. Cue effects were uncorrelated across individuals, suggesting that predictive and postdictive mechanisms are dissociable. Moreover, the influence of the auditory cue was largest when the visual face signal contributed least to the final percept, reflecting a trade-off between auditory and visual sources of information in shaping perception.
Taken together, our results suggest that distinct mechanisms underlie predictive and postdictive inference, each dynamically leveraging auditory context to disambiguate visual input, especially when the fidelity of this input is low.

Acknowledgements: We thank Kayla Vasquez for their help with data collection.