Face and Body Perception: Emotion, Identity
Talk Session: Sunday, May 17, 2026, 2:30 – 4:15 pm, Talk Room 2
Moderator: Nancy Kanwisher, MIT
Talk 1, 2:30 pm, 34.21
Macaques in action: realistic individualized 3D monkey avatars with naturalistic motion
Lucas Martini1, Alexander Lappe1, Anna Bognár2, Rufin Vogels2, Martin A. Giese1; 1University of Tuebingen, 2KU Leuven
Realistic and well-controlled synthetic stimuli have been crucial for studying the mechanisms of face perception in monkeys. To investigate the neurocomputational processes underlying body perception, an equivalent level of stimulus control is desirable, especially for dynamic bodies. However, this goal is challenged by the complexity of recording realistic movements without markers and of modeling dynamic 3D shape deformations of these animals' bodies. METHODS: We recorded eight rhesus macaques using 16 high-resolution color cameras to reconstruct their 3D appearance via surface-based avatar models. These models include internal skeletal structures that drive surface deformations, allowing us to align synthetic, computer-generated macaques with real video recordings in an automated fashion using keypoint predictions and temporal masks. Using labeled data, we further generated subject-specific avatars that more accurately capture each individual's appearance and movement dynamics. RESULTS: Our dataset contains over 750 macaque actions and interactions, each annotated with ethogram-based action labels. The alignment of our surface models shows qualitative and quantitative improvements over established methods on several measures of 3D fidelity. This reconstruction pipeline is also designed to make large-scale 3D surface modeling tractable. To assess whether the reconstructions capture critical information about body shape, we evaluated action recognition performance across a wide range of visual feature encodings, with and without our 3D pose descriptions. Across our benchmark, adding 3D pose features substantially improves action classification, measured as mean average precision, for categories ranging from locomotion to social interaction. CONCLUSION: We present a large-scale 3D motion-capture dataset of eight rhesus macaques and corresponding subject-specific synthetic avatar models. These avatars can be posed arbitrarily, include color appearance, and pave the way for fully controlled experimental studies of body perception in nonhuman primates.
European Research Council (2019-SyG-RELEVANCE-856495). International Max Planck Research School for Intelligent Systems (IMPRS-IS).
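The benchmark result above (3D pose features improving mean average precision) can be illustrated with a minimal sketch. Nothing below comes from the authors' pipeline: the feature matrices, label matrix, classifier choice (one-vs-rest logistic regression), and sizes are all hypothetical stand-ins, shown only to make the with/without-pose comparison concrete.

```python
# Sketch: mean average precision (mAP) for multi-label action classification,
# with and without 3D pose features. All data here are random stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_clips, n_vis, n_pose, n_actions = 750, 512, 64, 10    # toy sizes
X_visual = rng.normal(size=(n_clips, n_vis))            # stand-in visual encoding
X_pose = rng.normal(size=(n_clips, n_pose))             # stand-in 3D pose descriptors
Y = rng.integers(0, 2, size=(n_clips, n_actions))       # ethogram-style multi-label targets

def mean_ap(X, Y):
    """Train one-vs-rest classifiers and average AP over action categories."""
    Xtr, Xte, Ytr, Yte = train_test_split(X, Y, test_size=0.3, random_state=0)
    aps = []
    for k in range(Y.shape[1]):
        clf = LogisticRegression(max_iter=1000).fit(Xtr, Ytr[:, k])
        scores = clf.predict_proba(Xte)[:, 1]
        aps.append(average_precision_score(Yte[:, k], scores))
    return float(np.mean(aps))

print("visual only      :", mean_ap(X_visual, Y))
print("visual + 3D pose :", mean_ap(np.hstack([X_visual, X_pose]), Y))
```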
Talk 2, 2:45 pm, 34.22
Familiarity and Trait Empathy Modulate Primary Somatosensory Responses to Vicarious Threat
Naama Zur1, Minwoo Lee1, Maria Czarnecka1, Jingrun Lin2, James Coan2, Casey Kenyon Brown1; 1Georgetown University, 2University of Virginia
Vicarious threat refers to experiencing another's threat as if it were one's own. Simulation theories propose that somatotopically matched activation in primary somatosensory cortex (SI BA3b) plays a crucial role in representing another's bodily threat experience. However, whether SI BA3b also contributes to perceivers' own affective responses to vicarious threat remains unclear. Addressing this gap, we used functional magnetic resonance imaging (fMRI) to test whether SI BA3b shows increased activation during vicarious threat and whether this response is associated with factors known to amplify vicarious emotional experience: self-reported trait affective empathy and familiarity with the target. We hypothesized that SI BA3b would show greater activation for threat than for safety, track affective ratings, be stronger for familiar partners than for strangers, and scale with trait affective empathy. Ninety-seven participants (ages 23–26) underwent fMRI while holding hands with either a familiar partner or a stranger in separate runs. In each run, safety or threat cues indicated whether the other individual would receive an electric shock. Contrast estimates for [threat > safe] in SI BA3b were extracted (small-volume-corrected p < .05) and analyzed in relation to trait empathy and self-reported affective ratings. Contrary to predictions, SI BA3b showed decreased activation during threat trials compared with safety trials, regardless of target familiarity. Intriguingly, during the partner condition, but not the stranger condition, this deactivation was less pronounced among individuals with higher trait affective empathy. Affective ratings did not correlate with SI BA3b activity. Together, these findings suggest that SI activation may not directly encode perceivers' affective responses to vicarious threat. Instead, SI activation appears to be modulated by familiarity and trait affective empathy, indicating a more nuanced role for SI in vicarious emotional experiences of threat.
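A minimal sketch of the ROI analysis described above, assuming per-subject [threat > safe] contrast maps and a binary SI BA3b mask already exist on disk. The file names, mask, and empathy scores are hypothetical stand-ins, and small-volume correction itself would be done in the fMRI analysis package, not in this snippet.

```python
# Sketch: mean [threat > safe] contrast per subject in an SI BA3b ROI,
# correlated with trait affective empathy. All file names are hypothetical.
import numpy as np
from scipy.stats import pearsonr
from nilearn.maskers import NiftiMasker

masker = NiftiMasker(mask_img="SI_BA3b_mask.nii.gz")      # hypothetical ROI mask
subjects = [f"sub-{i:02d}" for i in range(1, 98)]         # 97 participants

roi_means = []
for sub in subjects:
    # Hypothetical per-subject contrast map for the partner condition.
    contrast = masker.fit_transform(f"{sub}_threat_gt_safe_partner.nii.gz")
    roi_means.append(contrast.mean())                     # mean contrast in the ROI

empathy = np.loadtxt("trait_affective_empathy.txt")       # hypothetical scores, one per subject
r, p = pearsonr(roi_means, empathy)
print(f"partner condition: r = {r:.2f}, p = {p:.3f}")
```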
Talk 3, 3:00 pm, 34.23
The STS Responds Preferentially to Communicative Signals
Emalie McMahon1, David Rahabi1, Nancy Kanwisher1; 1Massachusetts Institute of Technology
Our capacity to perceive and interpret communicative signals, whether directed toward us or to another person, is essential for navigating the social world. A recent large-scale naturalistic study found that regions in the superior temporal sulcus may underlie this ability. Regions previously shown to respond to faces (fSTS) and to third-person social interactions (SI-STS) both responded more to videos depicting two people communicating with each other than interacting physically or not interacting at all. However, the communicative and noncommunicative videos differed in many ways, leaving open alternative accounts. We therefore conducted a controlled study in which participants were scanned with fMRI while they viewed videos in which the same people in the same setting interacted communicatively or noncommunicatively, or performed independent noncommunicative actions. We localized the face-selective (fSTS) and social-interaction-selective (SI-STS) regions in each participant individually. Both regions showed significantly stronger responses to communicative interactions than to either noncommunicative interactions or independent actions. Is this response specific to viewing third-party communication, or does it generalize to viewing one person communicating directly with the viewer? We answered this question using close-up videos of faces communicating with another person off-screen (third-person), communicating directly with the viewer (first-person), or performing a noncommunicative action such as eating or brushing teeth. Both the fSTS and SI-STS responded just as strongly when viewing one face communicating, whether with an unseen partner or directly with the viewer. Together, these findings demonstrate that these STS regions are tuned to detect both third-party and first-person communicative signals.
This work was funded by the MIT Simons Center for the Social Brain.
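The core contrast in this study reduces to a per-participant ROI comparison. Below is a minimal sketch with random stand-in values in place of the real GLM betas averaged over each participant's localizer-defined fSTS or SI-STS voxels; the sample size and effect sizes are hypothetical.

```python
# Sketch: paired comparison of mean ROI responses across conditions.
# Values are random stand-ins, not the study's data.
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(1)
n_subjects = 20                                     # hypothetical sample size
communicative = rng.normal(1.0, 0.4, n_subjects)    # mean ROI response per subject
noncommunicative = rng.normal(0.6, 0.4, n_subjects)

t, p = ttest_rel(communicative, noncommunicative)
print(f"communicative > noncommunicative: t({n_subjects - 1}) = {t:.2f}, p = {p:.4f}")
```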
Talk 4, 3:15 pm, 34.24
The Periphery is Sufficient for Natural, Dynamic Emotion Processing
Jefferson Ortega1, David Whitney1; 1University of California, Berkeley
Traditional models of emotion perception posit that high-resolution foveal vision is more informative than peripheral vision for understanding others' emotions. However, these findings rely largely on static, decontextualized images of facial expressions. Here, we revisited the role of the periphery in affective processing using dynamic, context-rich videos and a gaze-contingent paradigm. Using the Inferential Emotion Tracking (IET) paradigm (Chen & Whitney, PNAS, 2019), participants (N = 76) continuously rated the valence and arousal of target characters in 11 video clips under two gaze-contingent viewing conditions: Fovea-Only (8° central window visible; periphery masked) or Periphery-Only (8° central scotoma; periphery visible). Performance was compared with that of an independent Control group whose visual field was not masked at all. We found that participants in the Fovea-Only condition exhibited significantly lower emotion tracking accuracy in the IET task and lower inter-subject agreement in their emotional inferences compared with controls. Conversely, IET task accuracy and between-subject agreement did not differ significantly between the Periphery-Only condition and the full-field Control group. These results upend the conventional understanding of retinotopic contributions to affective processing, demonstrating that peripheral vision is sufficient, and perhaps indispensable, for inferring others' emotions in dynamic, naturalistic environments.
Supported in part by the National Institutes of Health (grant no. R01CA236793) to D.W. and (grant no. F99NS141343) to J.O.
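The gaze-contingent manipulation is straightforward to sketch: given the current gaze position, keep an 8° central region visible (Fovea-Only) or blank it out (Periphery-Only). The pixels-per-degree value and frame size below are hypothetical, and the 8° figure is treated here as the window diameter; a real experiment would take calibration values from the eye tracker and display setup.

```python
# Sketch: gaze-contingent window/scotoma over a video frame.
import numpy as np

PX_PER_DEG = 40.0                    # hypothetical display calibration
RADIUS_PX = (8.0 / 2) * PX_PER_DEG   # assumes 8 deg is the window *diameter*

def apply_gaze_mask(frame, gaze_xy, condition):
    """Mask a video frame (H x W x 3) around the current gaze point."""
    h, w = frame.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    inside = (xs - gaze_xy[0]) ** 2 + (ys - gaze_xy[1]) ** 2 <= RADIUS_PX ** 2
    out = frame.copy()
    if condition == "fovea_only":
        out[~inside] = 0             # periphery masked, central window visible
    elif condition == "periphery_only":
        out[inside] = 0              # central scotoma, periphery visible
    return out

frame = np.full((720, 1280, 3), 128, dtype=np.uint8)   # toy gray frame
masked = apply_gaze_mask(frame, gaze_xy=(640, 360), condition="periphery_only")
```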
Talk 5, 3:30 pm, 34.25
Dynamic Visual Reconstruction Reveals Dimensional Representations of Facial Expression Perception
Yong Zhong Liang1, Tyler Roberts1, Gerald Cupchik1, Jonathan Cant1, Adrian Nestor1; 1University of Toronto
Everyday interactions rely on the ability to parse a rich spectrum of dynamic facial expressions, ranging from canonical emotions to conversational signals, such as disbelief or “yeah-right” smiles. Yet, the representational structure of this broader set, and its temporal unfolding, remain incompletely understood. Here, we combine EEG decoding, behavioral similarity judgments, and frame-based visual reconstruction to characterize how diverse dynamic expressions are represented. Healthy adults (n = 14) viewed 24 one-second videos (14 emotional, 10 conversational expressions) that evolved from neutral to an apex configuration, and separately judged pairwise similarity among the same stimuli. Multivariate EEG patterns supported reliable decoding of all expressions, and representational similarity analysis revealed a low-dimensional neural space that closely tracks a behavior-based space organized primarily along affective valence and arousal. Time-resolved decoding indicated that expression-specific information emerges shortly after stimulus onset and peaks well before an expression configuration reaches its apex, suggesting that neural processing capitalizes on early, diagnostically informative cues. Sliding-window EEG reconstruction further recovered dynamic visual representations that preserve fine-grained distinctions between closely related expressions (e.g., happy satiated vs. schadenfreude), with reconstruction accuracy mirroring the anticipatory decoding profile. Behavioral data yielded convergent static reconstructions with a comparable dimensional organization. Together, these results move beyond proof-of-concept reconstruction to reveal how dynamic emotional and conversational expressions jointly populate a shared, anticipatory representational space.
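The representational similarity analysis at the heart of this abstract can be sketched as a rank correlation between the upper triangles of two representational dissimilarity matrices (RDMs). Both matrices below are random stand-ins; in the actual study the neural RDM would come from pairwise EEG decoding over the 24 expressions and the behavioral RDM from the pairwise similarity judgments.

```python
# Sketch: representational similarity analysis with stand-in RDMs.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_expressions = 24

def random_rdm(n):
    """Symmetric dissimilarity matrix with a zero diagonal (stand-in data)."""
    m = rng.random((n, n))
    m = (m + m.T) / 2
    np.fill_diagonal(m, 0.0)
    return m

neural_rdm = random_rdm(n_expressions)
behavioral_rdm = random_rdm(n_expressions)

iu = np.triu_indices(n_expressions, k=1)   # upper triangle, diagonal excluded
rho, p = spearmanr(neural_rdm[iu], behavioral_rdm[iu])
print(f"neural-behavioral RSA: rho = {rho:.2f}, p = {p:.3f}")
```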
Talk 6, 3:45 pm, 34.26
Intact motion processing but altered social interaction selectivity in autism
Hannah Small1, Haemy Lee Masson2, Ericka Wodka3,4, Stewart Mostofsky3,4, Leyla Isik1; 1Johns Hopkins University, 2Durham University, 3Kennedy Krieger Institute, 4Johns Hopkins School of Medicine
Autism is characterized by altered social interaction and communication, yet reliable neural correlates of these differences remain elusive. Prior work has also highlighted differences in visual perception in autism, including weaker motion processing and biological motion perception. In the neurotypical population, a region in the right posterior superior temporal sulcus (pSTS) responds selectively to observed social interactions in different dynamic visual displays. Because of its integrative role in social processing, altered selectivity in this region may link visual processing differences to downstream social communication behaviors in autism. For the first time, we collect fMRI responses from matched autistic (n=20, 13 F) and neurotypical (n=34, 17 F) participants while they view silent videos of point-light figures involved in simple social interactions or independent actions. We find significantly reduced selectivity for social motion in right pSTS in autistic participants at both the group and individual levels. This reduced social visual processing is specific: we find no differences between groups in general motion processing in the motion-selective middle temporal area (MT). Ongoing work will characterize fMRI responses in these regions to a naturalistic, audiovisual movie in both groups of participants. Together, these results suggest that social interaction perception in the pSTS may be specifically altered in autism, and they highlight the dissociation between social and general motion processing in the human brain.
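One standard way to quantify "reduced selectivity for social motion" is a per-participant selectivity index over mean pSTS responses, compared across groups; a minimal sketch follows. The index formula and Welch's t-test are common choices, not necessarily the authors', and all response values are random stand-ins.

```python
# Sketch: per-participant social selectivity index and group comparison.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(3)

def selectivity(social, independent):
    """(social - independent) / (social + independent) on mean ROI responses."""
    return (social - independent) / (social + independent)

# Random stand-ins for mean right-pSTS responses per participant.
asd = selectivity(rng.normal(1.0, 0.3, 20), rng.normal(0.9, 0.3, 20))   # n=20 autistic
nt = selectivity(rng.normal(1.2, 0.3, 34), rng.normal(0.7, 0.3, 34))    # n=34 neurotypical

t, p = ttest_ind(nt, asd, equal_var=False)   # Welch's t-test
print(f"pSTS social selectivity, NT > ASD: t = {t:.2f}, p = {p:.4f}")
```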
Talk 7, 4:00 pm, 34.27
Feature-Specific Effects of Occlusion and Feedback in Face Recognition
Maryam Karimi1, Jalaledin Noroozi1,2*, Mohammad-Reza A. Dehaqani1,3*; 1Institute for Research in Fundamental Sciences, Tehran, Iran, 2Institute for Convergence Science and Technology (ICST), Sharif University of Technology, Tehran, Iran, 3School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
Early face processing in inferior temporal cortex is driven by specific diagnostic features, particularly the eye region. However, when such features are missing due to occlusion, this local feature-based representation becomes insufficient. Recent evidence indicates that under these conditions, feedback from prefrontal regions supports the recovery of face representations by modulating activity in inferior temporal cortex. It remains unclear how feedback depends on the missing facial features, how it contributes to their recovery, and whether its role changes with the degree of occlusion. To examine this, we employed a two-alternative forced-choice (2-AFC) paradigm combined with backward masking to test how occlusion and stimulus-onset asynchrony (SOA) influence feedback processing in face recognition. On each trial, participants briefly viewed a face for 35 ms with parametrically occluded features, followed by a phase-scrambled mask at SOAs ranging from 35 to 250 ms. Sixty-six participants each completed one session comprising six blocks, alternating between masked and unmasked blocks. Each session included 11 FaceGen-generated identities, each presented at six occlusion levels. Our study demonstrates a double dissociation between feature type and the effects of feedback across different levels of occlusion. Recognition of the eyes was most sensitive to disruption, whereas recognition of the mouth and nose was less affected. Both occlusion and mask presence reduced recognition accuracy, with the strongest impairments observed under high-coverage conditions. In addition, we observed greater disruption of feedback processing at short SOAs, indicating that at these early time points, feedback signals have not yet fully propagated through the cortical hierarchy. At longer SOAs, by contrast, feedback information has already arrived in inferior temporal cortex, allowing incomplete information to be integrated and recognition performance to return toward normal levels. Occluding the eyes produced the largest decrease in performance, suggesting that feedback processing carries more information about the eyes than about other features, and that higher-order regions such as prefrontal cortex may preferentially encode the features that are most salient.
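A minimal sketch of the kind of behavioral summary this design yields: 2-AFC accuracy broken down by occluded feature, occlusion level, and SOA. The trial table and column names are hypothetical stand-ins; a real analysis would fit psychometric functions or a mixed-effects model on top of these cell means.

```python
# Sketch: accuracy per (feature, occlusion, SOA) cell from a 2-AFC trial table.
import numpy as np
import pandas as pd

rng = np.random.default_rng(4)
n_trials = 5000
trials = pd.DataFrame({
    "feature": rng.choice(["eyes", "nose", "mouth"], n_trials),
    "occlusion": rng.choice([0, 20, 40, 60, 80, 100], n_trials),   # six levels (%)
    "soa_ms": rng.choice([35, 70, 105, 150, 200, 250], n_trials),
    "correct": rng.integers(0, 2, n_trials),                       # stand-in responses
})

# Mean accuracy per condition cell, SOAs spread across columns.
acc = (trials.groupby(["feature", "occlusion", "soa_ms"])["correct"]
             .mean()
             .unstack("soa_ms"))
print(acc.round(2))
```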