VSS, May 13-18

Face Perception: Neural mechanisms

Talk Session: Sunday, May 15, 2022, 8:15 – 9:45 am EDT, Talk Room 1
Moderator: Galit Yovel, Tel Aviv University



Talk 1, 8:15 am, 31.11

Common encoding axes for both face selectivity and non-face objects in macaque face cells

Kasper Vinken1, Talia Konkle2, Margaret Livingstone1; 1Harvard Medical School, 2Harvard University

Higher visual areas in the ventral stream contain regions that show category-selective responses. The most compelling examples are face patches, with mainly face-selective neurons. Still, such face cells also show weaker yet reliable responses to non-face objects, which is hard to explain with a semantic/categorical interpretation: if face cells care only about faces, then what explains the tuning for other objects in the low firing rate regime? Here, we tested the hypothesis that face selectivity is not categorical per se, but rather that neurons encode higher-order visuo-statistical features that are strongly activated by faces and to a lesser extent by other objects. We investigated firing rates of 452 neural sites in and around the middle lateral face patch of macaque inferotemporal cortex, a potential homologue of the human fusiform face area, in response to over a thousand images (448 faces, 960 non-faces). We found that neural responses to faces and non-face objects were deeply related: the structure of responses to non-face objects could predict the degree of face selectivity. This link was not well explained by tuning to semantically interpretable shape features such as roundness or color. Instead, domain-general features from an ImageNet-trained deep neural network were able to predict neural face selectivity exclusively from responses to non-face images. Additionally, encoding models trained only on responses to non-face objects (1) predicted the face inversion effect, (2) were sensitive to contextual relationships that indicate the presence of faces, and (3) when coupled with image synthesis using a generative adversarial network, revealed an increasing preference for faces with increasing neural face selectivity.
Together, these results show that face selectivity and responses to non-face objects are driven by tuning along common encoding axes, where these features are not categorical for faces, but instead reflect tuning to the more general visuo-statistical structure.
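The logic of the encoding-axes analysis can be sketched with toy data: fit a linear encoding model for a neural site using only its responses to non-face images, then ask whether the fitted axis predicts the site's face responses, and hence its face selectivity. Everything below is synthetic and purely illustrative — the two "features" stand in for DNN activations, and the weights, stimulus counts, and selectivity index are invented, not the study's actual data or methods.

```python
import random

random.seed(0)

def fit_weights(X, y):
    """Ordinary least squares for 2 features via the normal equations."""
    a = sum(x[0] * x[0] for x in X)
    b = sum(x[0] * x[1] for x in X)
    d = sum(x[1] * x[1] for x in X)
    g0 = sum(x[0] * yi for x, yi in zip(X, y))
    g1 = sum(x[1] * yi for x, yi in zip(X, y))
    det = a * d - b * b
    return ((d * g0 - b * g1) / det, (a * g1 - b * g0) / det)

def selectivity(face_r, nonface_r):
    """A simple face-selectivity index: (mean_face - mean_nonface) / (sum of means)."""
    mf = sum(face_r) / len(face_r)
    mn = sum(nonface_r) / len(nonface_r)
    return (mf - mn) / (mf + mn)

# Hypothetical stimulus features: feature 0 is strongly driven by faces,
# only weakly by objects; feature 1 varies freely for both.
nonface_feats = [(random.uniform(0, .4), random.uniform(0, 1)) for _ in range(200)]
face_feats = [(random.uniform(.8, 1), random.uniform(0, 1)) for _ in range(100)]

# A face-selective site weights the face-driven feature heavily (made-up weights).
true_w = (3.0, 0.5)
resp = lambda f: true_w[0] * f[0] + true_w[1] * f[1]

nonface_r = [resp(f) for f in nonface_feats]
face_r = [resp(f) for f in face_feats]

# Fit the encoding axis from NON-FACE responses only...
w_hat = fit_weights(nonface_feats, nonface_r)
# ...then predict responses to faces the model never saw.
pred_face = [w_hat[0] * f[0] + w_hat[1] * f[1] for f in face_feats]

print(round(selectivity(face_r, nonface_r), 2))    # measured selectivity
print(round(selectivity(pred_face, nonface_r), 2)) # predicted from the non-face fit
```

Because the toy responses are noiseless and linear, the axis fitted on non-face images alone reproduces the site's face selectivity exactly — the point the abstract makes with real data and DNN features.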

Talk 2, 8:30 am, 31.12

Testing the Expertise Hypothesis with Deep Convolutional Neural Networks Optimized for Subordinate-level Categorization

Galit Yovel1, Idan Grosbard1, Noam Avidor1, Amit Bardosh1, Koby Boyango1, Danielle Chason1, Naphtali Abudarham1; 1Tel Aviv University

Perceptual expertise involves discrimination of stimuli at the subordinate level of categorization. Whether perceptual expertise is mediated by general-expertise or domain-specific mechanisms has been hotly debated. To decide between these two hypotheses, previous studies have asked whether objects of expertise share the same computations that are used for face recognition, for which all humans are experts. A main limitation of these studies is that human experience with faces is more extensive than with any object of expertise. Current computational models of object recognition now enable us to re-evaluate this question with models that are matched for the amount of experience with different categories of expertise. We evaluated computational similarity by measuring the learning curve and performance level of a deep convolutional neural network (DCNN) pre-trained for subordinate-level categorization of one category (faces) as it re-learned subordinate-level categorization of a new category (birds), and compared it to a DCNN pre-trained for basic-level categorization (objects). A general-expertise hypothesis predicts that a face-trained DCNN will show faster learning and higher performance for birds than an object-trained DCNN. Contrary to this prediction, the object-trained network learned to discriminate birds much faster and reached a higher performance level than the face-trained network. Another measure used to indicate whether objects of expertise share similar computations is the inversion effect. We therefore examined the performance of a face-trained and a bird-trained DCNN on identity matching of upright and inverted objects, faces, and birds. The face-trained DCNN showed an inversion effect only for faces and the bird-trained DCNN only for birds, indicating that the inversion effect is a category-specific effect rather than one shared among different categories of expertise.
Taken together, these findings suggest that perceptual expertise is mediated by domain-specific rather than general expertise mechanisms.
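The inversion-effect comparison reduces to a simple difference score per network and category. The accuracies below are hypothetical numbers chosen only to illustrate the reported pattern — each network shows an inversion cost only for its own trained category — and are not the study's actual results.

```python
def inversion_effect(acc_upright, acc_inverted):
    """Inversion effect: drop in identity-matching accuracy for inverted stimuli."""
    return acc_upright - acc_inverted

# Hypothetical (proportion-correct) accuracies: (upright, inverted).
accuracies = {
    ("face_net", "faces"): (0.92, 0.70),  # large inversion cost for trained category
    ("face_net", "birds"): (0.65, 0.64),  # no cost for untrained category
    ("bird_net", "faces"): (0.66, 0.65),
    ("bird_net", "birds"): (0.90, 0.71),
}

for (net, cat), (up, inv) in accuracies.items():
    print(net, cat, round(inversion_effect(up, inv), 2))
```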

Talk 3, 8:45 am, 31.13

Fast Periodic Visual Stimulation Reveals Expedited Neural Face Processing in Super-Recognizers

Jeffrey D. Nador1, Meike Ramon1; 1Applied Face Cognition Lab, Switzerland

Previous research examining face identity processing (FIP) has shown that robust, face-selective neural responses occur within 170 ms in neurotypical adults (Rossion et al., 2020). However, whether their amplitude or speed varies as a function of FIP ability remains unclear. We therefore investigated face-selective neural responses in conservatively assessed Super-Recognizers (SRs; Russell et al., 2009; Ramon, 2021) and controls, in a pair of fast periodic visual stimulation EEG experiments. In Experiment 1, observers were shown random sequences of naturalistic “base” images of objects and animals, 167 ms at a time (6 Hz), interleaved with face or house “oddballs” once per second (1 Hz), while they performed an orthogonal task. In Experiment 2, we varied base presentation duration (50 ms, 100 ms) and rate (10 Hz, 20 Hz), with face oddballs still presented at 1 Hz. In Experiment 1, both groups exhibited greater neural responses to face than to house oddballs at early harmonics. In Experiment 2, face oddball responses were substantially reduced at 10 Hz (100 ms duration), but much less so for SRs. Even at shorter presentation durations (10 Hz; 50 ms duration), SRs’ face-selective neural responses at early harmonics remained unchanged, whereas controls’ were further reduced. At the fastest presentation rate, both groups’ face oddball responses were indistinguishable from house responses. Given sufficient time (i.e., 167 ms), oddball neural responses were face-selective in both groups. At faster presentation rates and shorter stimulus durations, however, SRs’ advantages became clearer: they showed sustained face oddball responses first at higher presentation rates and second at shorter durations. Potentially, 20 Hz presentation of base images was fast enough to backward-mask the oddballs, inhibiting face-selective responses in both groups. Taken together, these results imply that FIP proceeds faster in SRs than in controls.

Acknowledgements: MR is supported by a Swiss National Science Foundation PRIMA (Promoting Women in Academia) grant (PR00P1_179872).
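The frequency-tagging logic of the paradigm can be illustrated with a synthetic trace: responses locked to the base rate (6 Hz) and to the face-oddball rate (1 Hz) separate cleanly into distinct bins of a Fourier transform, so the oddball amplitude indexes face-selective processing independently of the general visual response. The sampling rate, duration, and amplitudes below are arbitrary choices for illustration, not recording parameters from the study.

```python
import math

def dft_amplitude(signal, k):
    """Single-sided amplitude of DFT bin k of a real signal."""
    n = len(signal)
    re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(signal))
    im = sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(signal))
    return 2 * math.hypot(re, im) / n

# Synthetic 10-s "EEG" trace sampled at 100 Hz: a 6 Hz base response
# plus a smaller 1 Hz face-oddball response (amplitudes are made up).
fs, dur = 100, 10
t = [i / fs for i in range(fs * dur)]
eeg = [1.0 * math.sin(2 * math.pi * 6 * ti) + 0.3 * math.sin(2 * math.pi * 1 * ti)
       for ti in t]

# Frequency resolution is 1/dur = 0.1 Hz, so frequency f lives in bin f * dur.
hz = lambda f: int(round(f * dur))
oddball_amp = dft_amplitude(eeg, hz(1))    # face-selective (oddball) response
base_amp = dft_amplitude(eeg, hz(6))       # general visual (base) response
noise_amp = dft_amplitude(eeg, hz(1.3))    # a neighbouring noise bin

print(round(oddball_amp, 2), round(base_amp, 2), round(noise_amp, 2))  # 0.3 1.0 0.0
```

In real analyses the oddball response is typically summed across its harmonics (excluding bins shared with the base rate), but the separation principle is the same.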

Talk 4, 9:00 am, 31.14

Identifying visual brain regions in the absence of task fMRI

David Osher1, Zeynep Saygin1; 1The Ohio State University

The ventral visual stream comprises numerous regions selective for specific high-level visual categories. While generating an areal map of the brain is a century-long endeavor, no approach is yet able to accurately identify functionally selective high-level visual regions on an individual-subject basis in the absence of a task-based fMRI localizer. Our and others’ previous work has demonstrated a tight link between brain circuitry and function, at the fine grain of single voxels from individual subjects, reflecting the individual variation therein. Can connectivity reliably identify high-level visual functional regions of interest (fROIs) in place of well-established functional localizers? If so, a single 10-minute resting-state scan could be used in lieu of myriad localizers, saving researchers an enormous amount of scanning time, effort, and funding. Further, such models illuminate the neural circuitry that best defines each brain region, and are strong candidates for the underlying mechanisms that govern visual selectivity. We scanned 40 participants with functional localizers for visually selective regions involved in the perception of faces (FFA, OFA, STS), scenes (PPA, RSC, TOS), bodies (EBA), and objects (LOC, PFS). We designed linear models to predict the location of each fROI from resting-state functional connectivity (FC). These models accurately identified face-, scene-, body-, and object-selective voxels in all cases, and could reliably localize each fROI for any given participant. These FC-defined ROIs were selective for the expected category of interest, similar to the fROIs identified with the functional localizer task. They also outperformed probabilistic parcels, as well as the closest-matching region from other areal maps/atlases purported to reflect functional subdivisions of the brain (e.g., the Glasser atlas).
Thus, a single resting-state scan can efficiently replace an entire set of functional localizers for high-level vision, offering practical and scientific advantages.

Acknowledgements: Alfred P. Sloan award (ZMS)
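Agreement between an FC-predicted fROI and the task-defined fROI is commonly quantified with an overlap measure such as the Dice coefficient. A minimal sketch over a hypothetical 12-voxel strip (the masks below are invented for illustration, not the study's data):

```python
def dice(mask_a, mask_b):
    """Dice coefficient between two binary voxel masks: 2|A∩B| / (|A| + |B|)."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    return 2 * inter / (sum(mask_a) + sum(mask_b))

# Hypothetical 12-voxel strip: 1 = voxel inside the fROI.
localizer_ffa = [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # task-defined FFA
fc_predicted  = [0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]  # predicted from resting-state FC
atlas_parcel  = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0]  # nearest group-atlas region

print(round(dice(localizer_ffa, fc_predicted), 2))  # 0.75: good overlap
print(round(dice(localizer_ffa, atlas_parcel), 2))  # 0.25: atlas misses individual variation
```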

Talk 5, 9:15 am, 31.15

Prosopagnosia does not abolish other-race effects

Pauline Schaller1, Anne-Raphaëlle Richoz1, Roberto Caldara1; 1University of Fribourg

Race refers to the socially constructed classification of human faces based on salient physical traits. This biologically relevant feature is extracted automatically, almost instantly, and markedly shapes face processing. Previous studies have shown that observers recognize same-race (SR) faces more accurately than other-race (OR) faces (the Other-Race Effect, ORE), but categorize them by race more slowly (the Other-Race Categorization Advantage, ORCA). While several fMRI studies reported stronger neural activations during the recognition and categorization of SR vs. OR faces in the Fusiform Face Area (FFA) and Occipital Face Area (OFA), others reported discrepant findings. Interestingly, stronger activations in these face-sensitive regions were also often associated with greater magnitudes of the ORE and ORCA. However, whether these face-sensitive regions play a genuine causal role in the other-race effects remains unknown. To clarify this issue, we tested PS, a pure case of acquired prosopagnosia with bilateral occipitotemporal lesions encompassing the left FFA and the right OFA. PS, healthy age-matched controls, and young adults performed two recognition tasks and three categorization-by-race tasks, using different databases of normalized Western Caucasian and East Asian faces with and without external features, and in naturalistic settings. As expected, our data show superior memory performance but slower categorization responses for SR than OR faces in controls, with PS giving slower and less accurate responses overall. Crucially, however, the magnitudes of PS’ ORE and ORCA were comparable to the controls’ in all the tasks. Our data show that an intact cortical face network – and more precisely an intact left FFA and/or right OFA – is not causally necessary to observe the other-race effects; these brain regions therefore only boost the accuracy and speed of those effects.
Race is a strong visual and social signal that is encoded in a large neural face-sensitive network, robustly tuned for processing same-race faces.

Acknowledgements: This project was supported by the Swiss National Science Foundation, grant number- 100019_189018
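The key dissociation — overall impairment with preserved effect magnitudes — can be expressed as simple difference scores. All numbers below are hypothetical, chosen only to mimic the reported pattern (PS is slower and less accurate across the board, yet her ORE and ORCA match the controls'), and are not the study's data.

```python
def ore(acc_same, acc_other):
    """Other-Race Effect: recognition accuracy advantage for same-race faces."""
    return acc_same - acc_other

def orca(rt_same, rt_other):
    """Other-Race Categorization Advantage: RT cost (s) for categorizing same-race faces."""
    return rt_same - rt_other

# Hypothetical accuracies and reaction times.
controls = {"ore": ore(0.85, 0.72), "orca": orca(0.62, 0.55)}
ps       = {"ore": ore(0.60, 0.47), "orca": orca(0.95, 0.88)}

# PS's baseline performance is worse, but the effect magnitudes are equal.
print(controls, ps)
```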

Talk 6, 9:30 am, 31.16

Beyond faces: Characterizing the response of the amygdala to visual stimuli.

Jessica Taubert1,2, Susan G. Wardle2, Amanda Patterson2, Chris I. Baker2; 1The University of Queensland, 2The National Institute of Mental Health

The primate amygdala is thought to play a critical role in social and emotional processing. Consequently, the response of the amygdala to visually presented faces has received a disproportionate level of attention. However, to better inform theories of amygdala function, our aim was to characterize the amygdala’s response to a broader range of visual stimuli varying in both social relevance and emotional valence. We used event-related fMRI to measure amygdala activity in four adult macaques, a design that permitted us to investigate both neural tuning and representational geometry. The univariate results revealed that, at the population level, amygdala activity is modulated by both social relevance and emotional valence. As expected, the average fMRI signal was greater for valent than for neutral stimuli. Surprisingly, however, non-social stimuli drove amygdala activity more than social stimuli. Despite the univariate results, multivariate analyses showed that neither social relevance nor emotional valence is an adequate model for characterizing the amygdala’s entire representational space. The amygdala was driven most by an assortment of different stimuli, including aggressive social interactions, the faces of conspecific infants, bird eggs, snakes, and familiar objects such as medical syringes. Meanwhile, cortical regions in the ventral visual pathway (two face-selective regions and one object-selective region) were sensitive to the distinction between social and non-social stimuli; activity in the face patches was greater for social than for non-social stimuli. In sum, our findings suggest that the visual responses of the macaque amygdala are not easily explained by concepts such as social relevance or emotional valence (as defined by human researchers), and that amygdala function extends beyond the recognition of facial expressions.

Acknowledgements: This research was supported by the Intramural Research Program of the National Institute of Mental Health (ZIAMH002909 to C.I.B). J.T. was supported by funding from the Australian Research Council (FT200100843).
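The representational-geometry analysis follows standard RSA logic: build a neural representational dissimilarity matrix (RDM) from response patterns, then correlate its off-diagonal entries with a model RDM — here a pure social vs. non-social account. The voxel patterns below are invented for illustration; the abstract's point is that for the amygdala this kind of model fit is poor, whereas for ventral-stream face patches it is good.

```python
def pearson(u, v):
    """Pearson correlation between two equal-length vectors."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = sum((a - mu) ** 2 for a in u) ** 0.5
    sv = sum((b - mv) ** 2 for b in v) ** 0.5
    return cov / (su * sv)

def rdm(patterns):
    """Dissimilarity (1 - r) between every pair of response patterns."""
    n = len(patterns)
    return [[1 - pearson(patterns[i], patterns[j]) for j in range(n)] for i in range(n)]

def upper(m):
    """Off-diagonal upper triangle, flattened."""
    return [m[i][j] for i in range(len(m)) for j in range(i + 1, len(m))]

# Hypothetical 4-voxel patterns for two social and two non-social stimuli,
# constructed so the social distinction dominates the geometry.
patterns = [
    [1.0, 2.0, 3.0, 4.0],  # social A
    [1.2, 2.1, 2.9, 4.2],  # social B
    [4.0, 3.0, 2.0, 1.0],  # non-social A
    [4.1, 2.8, 2.2, 0.9],  # non-social B
]

# Model RDM under a pure social/non-social account: 0 within category, 1 across.
model = [[0, 0, 1, 1], [0, 0, 1, 1], [1, 1, 0, 0], [1, 1, 0, 0]]

fit = pearson(upper(rdm(patterns)), upper(model))
print(round(fit, 2))  # high fit: this toy region IS organized by social relevance
```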