Images of facial expressions with harder-to-reconstruct representations are evaluated and remembered as more intensely emotional

Poster Presentation 56.306: Tuesday, May 21, 2024, 2:45 – 6:45 pm, Banyan Breezeway
Session: Face and Body Perception: Emotion

Nicholas Fagan1, Qi Lin2, James Gross3, Amit Goldenberg4, Steve Chang1, Ilker Yildirim1; 1Yale University, 2RIKEN Institute, 3Stanford University, 4Harvard University

If faces hold a special place in our minds, emotional faces stand out as extraordinary. Not only do faces expressing emotions engage rich inferences that deeply impact how we navigate the social world, but they also attract greater attention, are better remembered, and elicit faster reactions. Yet the computational basis of how we evaluate and remember the emotionality of faces remains unaddressed. Here, we hypothesize that processing the emotionality of faces arises not from domain-specific mechanisms optimized for emotional expressions, but from a general mechanism at the perception-to-memory interface. To test this hypothesis, we use a recent computational formulation of the classic levels-of-processing theory: the idea that memory strength is determined by the depth of perceptual analysis. This formulation used a sparse coding model (SPC) to compress feature embeddings of natural scene images and showed that images with harder-to-reconstruct representations are more memorable. Here, we train the SPC model to compress feature embeddings of face images, with the embeddings obtained from a pretrained face-recognition deepnet. We hypothesized that the remembered emotionality of faces would be positively correlated with depth of processing, operationalized as the magnitude of the reconstruction residual of each face image under this model. We tested this prediction by exploiting a recently reported phenomenon in emotionality judgments: humans overestimate the average emotionality of sequentially presented faces. We find that this “sequential amplification” effect falls out of our model: simply averaging the reconstruction errors of the face representations in a sequence reproduces the amplification effect, including finer-grained effects of valence (greater amplification for negative than for positive faces). Moreover, reconstruction error correlates highly with the emotionality of both face sequences and singletons. Crucially, these results do not hold for vision-only models (i.e., models without reconstruction error) or for non-face deepnets. Together, these results ground the evaluation and remembering of emotional expressions in a general perception-to-memory interface.
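A minimal sketch of this kind of computation, assuming scikit-learn's DictionaryLearning as the sparse coder; the data, dimensions, and hyperparameters below are illustrative placeholders, not the authors' actual model, dataset, or settings:

import numpy as np
from sklearn.decomposition import DictionaryLearning

# Placeholder for face embeddings from a pretrained face-recognition
# deepnet (rows: images, columns: features); shape is assumed here.
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((500, 128))

# Fit a sparse coding model (dictionary plus sparse codes) that
# compresses the embeddings, analogous in spirit to the SPC model.
spc = DictionaryLearning(n_components=64, alpha=1.0, random_state=0)
codes = spc.fit_transform(embeddings)   # sparse code per image
recon = codes @ spc.components_         # reconstructed embeddings

# Depth-of-processing proxy: magnitude of each image's
# reconstruction residual under the sparse coding model.
residuals = np.linalg.norm(embeddings - recon, axis=1)

# Sequential amplification: score a sequence of faces by simply
# averaging the residuals of its members.
def sequence_score(indices):
    return float(residuals[indices].mean())

print(sequence_score(np.arange(10)))    # score for a 10-face sequence

In this sketch, a higher mean residual corresponds to higher predicted emotionality; the valence effect reported above would then correspond to negative faces carrying larger residuals on average.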