Both visual and theory-of-mind computations explain unique variance in the social brain
Poster Presentation 16.338: Friday, May 15, 2026, 3:45 – 6:00 pm, Banyan Breezeway
Session: Face and Body Perception: Social cognition 1
Manasi Malik1, Minjae Kim1, Tianmin Shu1, Shari Liu1, Leyla Isik1; 1Johns Hopkins University
Making social evaluations from visual input is a core human ability, yet the computational mechanisms that enable it remain unknown. Visual social recognition consistently engages regions involved in social perception, including areas along the superior temporal sulcus (STS), as well as higher-level theory-of-mind (ToM) areas such as the temporoparietal junction (TPJ). One common hypothesis is that these regions operate hierarchically, with social-perceptual regions carrying out bottom-up visual computations that serve as input to higher-level inference in the ToM network. This hypothesis, however, has never been computationally tested, in large part because successful models of social processing have been lacking. We recently developed computational models of both social interaction perception and theory of mind: a graph-neural-network (GNN) model that relies on relational visual information, and a generative inverse-planning model that recognizes social interactions by inverting a model of agents’ goals and the physical world. In this preregistered study, we collected fMRI responses while participants (n = 25) watched 10-s videos depicting social interactions, and compared these responses with both computational models using representational similarity analysis (RSA) in functional regions of interest and whole-brain searchlights. We found that both the GNN and the inverse-planning model significantly explained variance in both social-perceptual (e.g., pSTS) and ToM (e.g., TPJ) regions, even after controlling for the variance explained by the other model. This suggests that both regions support a combination of relational visual processing and higher-level inferential reasoning. Exploratory analyses suggested that the two computations differ in their time courses, indicating a shift from early visual processing towards later inferential evaluation in both regions.
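The central analytic step, asking whether one model still explains a region's representational geometry once the other model is controlled for, can be illustrated with a partial-correlation RSA. This is a minimal sketch under stated assumptions: the abstract does not specify the dissimilarity metric, the correlation type, or exactly how the other model's variance was removed, so the choices here (1 - Pearson r dissimilarity matrices and a Spearman partial correlation) are common RSA defaults, not the authors' pipeline. All function names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their mean rank (as in Spearman's rho)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def rdm(patterns):
    """Condensed RDM (upper triangle): 1 - Pearson r for every pair of conditions.

    Assumption: each row of `patterns` is one video's response pattern
    (voxels for a brain region, features for a model).
    """
    return [1 - pearson(patterns[i], patterns[j])
            for i, j in combinations(range(len(patterns)), 2)]


def partial_spearman(target, model, control):
    """Spearman correlation of target and model RDMs with control partialled out.

    Uses the closed-form first-order partial correlation on ranked RDM entries.
    Undefined if the control RDM is perfectly correlated with either input.
    """
    t, m, c = ranks(target), ranks(model), ranks(control)
    r_tm, r_tc, r_mc = pearson(t, m), pearson(t, c), pearson(m, c)
    return (r_tm - r_tc * r_mc) / math.sqrt((1 - r_tc ** 2) * (1 - r_mc ** 2))
```

For a given region, one would compute, for example, `partial_spearman(rdm(roi_patterns), rdm(gnn_features), rdm(inverse_planning_features))` and the same call with the two models swapped; the three inputs are hypothetical placeholders. A regression-based variance-partitioning approach is a common alternative for estimating unique variance, and significance would typically be assessed with condition-label permutations or across participants, neither of which is shown here.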
Overall, these findings suggest that both social-perceptual and ToM regions carry out a combination of relational visual and higher-level inferential computations, perhaps on distinct timescales with early relational visual processing followed by later inverse-planning-based inference.
Acknowledgements: This work was funded by NIMH R01MH132826 awarded to L.I.