Dynamics of Facial Emotion Perception Across Humans, Primate IT Cortex, and Artificial Neural Networks
Undergraduate Just-In-Time Abstract
Poster Presentation 23.356: Saturday, May 16, 2026, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Undergraduate Just-In-Time 1
Stephanie Habach1, Marcus Wong1,2, Sabine Muzellec1, Maren Wehrheim1,3, Kohitij Kar1; 1York University, Toronto, 2University of Toronto, 3Mila - Quebec AI institute, Montréal, Canada
Facial expressions unfold over time through rapidly changing visual cues. Humans can readily report when one emotion transitions into another, but how this ability relates to dynamic neural encoding and its underlying computations remains unclear. To operationalize this behavior, we developed a change-detection task that quantifies perceived transition timing in dynamic facial stimuli. We generated 1,260 videos (500 ms each) in which one expression gradually morphed into another (6 expressions, 12 identities) at varying transition times. A total of 344 human participants reported the perceived moment of change using a continuous slider. To assess accuracy, we compared behavioral reports to ground-truth transition times. Although performance was reliable, correlations with ground truth (Spearman's R = 0.56) fell below the noise ceiling (R = 0.72), indicating systematic deviations. We hypothesized that these deviations reflect stimulus-dependent biases. To test this, we focused on a subset of videos with identical midpoint transitions, holding ground truth constant. Despite this, observers showed reliable early–late biases across stimuli (reliability = 0.48), demonstrating that perceptual timing depends on stimulus-specific dynamics rather than on objective timing alone. We next asked whether current vision model computations capture these biases. Mid-transition videos (N = 360) were presented to artificial neural networks, and time-resolved features were decoded using cross-validated linear classifiers across early and late frame segments. Although the models accurately classified emotions, their inferred change times did not correlate with human behavior and showed no systematic early–late differences. We then tested neural representations by presenting the same stimuli to two passively fixating monkeys while recording IT activity (114 neurons). Unlike the ANNs, IT dynamics exhibited a systematic bias: relative emotional strength was smaller for early-biased videos and larger for late-biased videos (p = 0.031), indicating that perceptual timing reflects time-integrated differences in competing representations. Together, these findings reveal species-conserved temporal dynamics in high-level visual representations that predict human perception of expression transitions, a relationship that current ANNs do not yet capture.
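To make the behavioral analysis concrete, below is a minimal sketch of how mean reported transition times could be compared to ground truth against a split-half noise ceiling. The array layout, variable names, and the averaging-over-splits ceiling are illustrative assumptions, not the authors' exact procedure.

```python
# Minimal sketch (assumed data layout): behavioral accuracy vs. a split-half noise ceiling.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)

def behavioral_accuracy(reports, ground_truth, n_splits=100):
    """reports: (n_participants, n_videos) slider reports of perceived change time (ms).
    ground_truth: (n_videos,) true morph transition times (ms)."""
    mean_report = np.nanmean(reports, axis=0)
    r_truth, _ = spearmanr(mean_report, ground_truth)

    # Split-half noise ceiling: correlation between the mean reports of two random
    # halves of participants, averaged over splits (the published ceiling may be
    # computed differently, e.g. with a Spearman-Brown correction).
    ceilings = []
    n = reports.shape[0]
    for _ in range(n_splits):
        perm = rng.permutation(n)
        half1 = np.nanmean(reports[perm[: n // 2]], axis=0)
        half2 = np.nanmean(reports[perm[n // 2 :]], axis=0)
        ceilings.append(spearmanr(half1, half2)[0])
    return r_truth, float(np.mean(ceilings))
```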
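The reliability of the early–late biases on the midpoint-transition subset could likewise be estimated as a split-half correlation of per-video biases across participants; the midpoint value and data layout below are assumptions.

```python
# Minimal sketch (assumed layout): per-video early/late bias and its split-half reliability.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)

def bias_reliability(reports_mid, midpoint_ms=250.0, n_splits=100):
    """reports_mid: (n_participants, n_videos) reports for videos whose true
    transition lies at the video midpoint; bias = mean report - midpoint."""
    n = reports_mid.shape[0]
    rels = []
    for _ in range(n_splits):
        perm = rng.permutation(n)
        b1 = np.nanmean(reports_mid[perm[: n // 2]], axis=0) - midpoint_ms
        b2 = np.nanmean(reports_mid[perm[n // 2 :]], axis=0) - midpoint_ms
        rels.append(pearsonr(b1, b2)[0])
    return float(np.mean(rels))
```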
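The model analysis can be sketched as frame-wise linear decoding of ANN features, with a video's inferred change time taken as the first frame at which a cross-validated decoder reads out the final rather than the initial emotion. The feature layout, decoder choice, and change-point rule are assumptions and only approximate the early/late segment decoding described above.

```python
# Minimal sketch (assumed pipeline): inferring a model's change time from frame-wise features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def inferred_change_times(features, end_labels):
    """features: (n_videos, n_frames, n_units) frame-wise ANN activations.
    end_labels: (n_videos,) final emotion of each morph video.
    Returns, per video, the first frame at which the cross-validated decoder
    predicts the final emotion (early frames should still resemble the initial one)."""
    n_videos, n_frames, _ = features.shape
    frame_preds = np.empty((n_videos, n_frames), dtype=object)
    for t in range(n_frames):
        clf = LogisticRegression(max_iter=1000)
        frame_preds[:, t] = cross_val_predict(clf, features[:, t, :], end_labels, cv=5)
    change_frame = np.full(n_videos, n_frames - 1)
    for v in range(n_videos):
        hits = np.where(frame_preds[v] == end_labels[v])[0]
        if hits.size:
            change_frame[v] = hits[0]
    return change_frame
```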
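Finally, the IT comparison can be illustrated as a one-sided test of a time-integrated relative-emotional-strength index between early-biased and late-biased videos; how that index is computed from the 114-neuron population (here treated as a given per-video quantity) is an assumption.

```python
# Minimal sketch (assumed index): comparing relative emotional strength across bias groups.
from scipy.stats import mannwhitneyu

def compare_bias_groups(rel_strength, human_bias):
    """rel_strength: (n_videos,) time-integrated evidence for the final minus the
    initial emotion, derived from IT activity. human_bias: (n_videos,) per-video
    behavioral bias, negative = early, positive = late."""
    early = rel_strength[human_bias < 0]
    late = rel_strength[human_bias > 0]
    # One-sided test: early-biased videos are predicted to show smaller relative strength.
    stat, p = mannwhitneyu(early, late, alternative="less")
    return stat, p
```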
Acknowledgements: KK is supported by a Canada Research Chair (CRC-2021-00326), the Brain Canada Foundation (2023-0259), the Canada First Research Excellence Fund (VISTA Program), and NSERC (RGPIN-2024-06223). KK, SH, and MW are supported by SURFIN (Simons Foundation). SM and MW are funded by Connected Minds (supported by CFREF).