VSS, May 13-18
Talk 1, 10:45 am, 22.11
The face diet of adults with autism spectrum disorder
Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by difficulties in social communication and interaction, as well as restricted and repetitive behaviour and thoughts. Deficits in face processing have been argued to play a fundamental role in the social interaction challenges observed in autistic people; however, variability in face recognition ability in autism has yet to be explained. Experience-based accounts like the Social Motivation Hypothesis posit that reduced or atypical social attention may deprive individuals with autism of the experience necessary for the typical protracted development of face processes. Lab-based eye tracking studies and retrospective home video studies support the notion that autistic people have reduced and atypical attention to faces, but there is currently little empirical evidence of insufficient exposure to faces as part of their daily visual input. A recent study (Oruc et al., 2018) characterized the exposure statistics to faces for neurotypical adults using a head-mounted camera worn by participants for one day. We used the same methodology to examine the visual experience of adults with autism (n = 17). Our results reveal notable quantitative and qualitative differences between the “face diets” of those with autism and non-autistic controls (n = 43). Individuals with autism spend significantly less time exposed to faces (8.45 minutes of every waking hour) than their non-autistic counterparts (12.21 minutes per hour; p = 0.044). In addition, faces were seen from a farther distance (p << 0.001) and were more likely to appear in profile view (p = 0.01) for those with ASD. These exposure statistics differ from those characteristic of social interactions. Taken together, we provide the first evidence in an ecologically valid setting that visual experience with faces is reduced and atypical in ASD. This may be an important limiting factor in face recognition competence in ASD.
Acknowledgements: This work was supported by a Natural Sciences and Engineering Research Council of Canada Discovery Grant RGPIN-2019-05554 and an Accelerator Supplement RGPAS-2019-00026, and a Canada Foundation for Innovation, John R. Evans Leaders Fund.
Talk 2, 11:00 am, 22.12
She still seems angry: inflexibility in updating emotional priors in autism
Sarit Szpiro1,2, Renana Twito1, Bat-Sheva Hadad1,2; 1Special Education Department, University of Haifa, Israel, 2Edmond J. Safra Brain Research Center for the Study of Learning Disabilities, University of Haifa, Israel
Abnormalities in the use of prior information about basic perceptual stimuli have been observed in people with autism spectrum disorder (ASD), yet it remains unclear whether and how these abnormalities extend to complex stimuli, and to emotional stimuli in particular. Here, we compared people with ASD and neurotypicals in their ability to acquire and then update a prior of facial expressions. Participants performed a two-interval forced choice discrimination task between two facial expressions presented in succession. To examine the acquisition of the prior, we used the paradigm of “regression to the mean”, in which discrimination between faces is modified according to the mean statistics accumulated during the experiment. In the first part of the experiment, participants were exposed to facial expressions sampled from a normal distribution around one average prior (sad/angry). In the second part of the experiment, the average changed to a different prior (angry or sad, respectively). We found a significant difference between groups in the use of the priors. While both groups acquired the first prior, only the neurotypicals acquired the second prior. A trial-by-trial analysis further demonstrated effects of both the recent trials and the overall mean statistics on the performance of neurotypicals in both parts of the experiment. In the ASD group, in contrast, the overall mean statistics were used in the first part of the experiment; however, when the overall mean changed in the second part, participants with ASD relied only on recent trials. These findings suggest that people with ASD have difficulty updating an acquired prior of an emotional facial expression, and they shed light on the social behavior of people with autism.
Acknowledgements: Grants ISF 882/19 and BSF 2020234 to BH.
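The “regression to the mean” logic in the abstract above can be illustrated with a toy observer. This is a minimal sketch under stated assumptions, not the authors' analysis: the function names, the linear weighting of stimulus and prior, and the single learning-rate parameter are all hypothetical and chosen only for illustration.

```python
def perceived(stimulus, prior_mean, w_prior):
    """Pull a stimulus value toward the prior mean (w_prior between 0 and 1)."""
    return (1.0 - w_prior) * stimulus + w_prior * prior_mean


def update_prior(prior_mean, stimulus, rate):
    """Move the prior mean toward the latest stimulus by a fraction `rate`."""
    return prior_mean + rate * (stimulus - prior_mean)


def track_prior(stimuli, start, rate):
    """Run the prior update over a sequence of stimuli and return the final prior."""
    prior = start
    for s in stimuli:
        prior = update_prior(prior, s, rate)
    return prior


if __name__ == "__main__":
    # Part 1 prior centred on 0.2; part 2 presents expressions centred on 0.8.
    part_two = [0.8] * 50
    flexible = track_prior(part_two, start=0.2, rate=0.2)    # prior follows the new mean
    inflexible = track_prior(part_two, start=0.2, rate=0.0)  # prior stays at the old mean
    print(flexible, inflexible)
    # The same physical face is perceived differently under the two priors.
    print(perceived(0.7, flexible, 0.5), perceived(0.7, inflexible, 0.5))
```

In this sketch, an observer whose prior does not update keeps pulling percepts toward the old mean, which is one simple way to picture a failure to acquire the second prior.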
Talk 3, 11:15 am, 22.13
Investigating the origins of the face inversion effect with an extraordinary participant
Yiyuan Zhang1, Lucia Garrido2, Constantin Rezlescu3, Maira Braga4, Tirta Susilo5, Brad Duchaine1; 1Dartmouth College, 2City, University of London, 3UCL, 4University of Western Australia, 5Victoria University of Wellington
What factors produce the face inversion effect (FIE)? A purely experiential account suggests the FIE results solely from the greater experience people have with faces that match the orientation of their faces (upright). In contrast, a purely phylogenetic account proposes the FIE reflects the operation of evolved orientation-specific mechanisms. Deciding between these alternatives is challenging because almost everyone has more experience with faces that match their own face’s orientation and everyone’s ancestors were selected to process faces with matched orientations. Here we report results from an individual whose extraordinary perceptual experience allows investigation of the origin of the FIE. Claudio is a 42-year-old man with a congenital joint disorder that causes his head to rest upside-down between his shoulder blades. As a result, most faces Claudio has observed are mismatched (upright) to his face’s orientation. We assessed Claudio’s performance with matched and mismatched faces in three types of tasks and compared his results to 22 age-matched controls. On two Thatcher tasks, Claudio showed better performance with mismatched (mean accuracy = 91.5%) than matched faces (mean accuracy = 75.0%). These scores provide formal evidence that he has had more perceptually-relevant experience with mismatched than matched faces. On seven face detection tasks, Claudio’s matched and mismatched performance was comparable and he did better with mismatched faces than controls did. On seven identity matching tasks, Claudio’s accuracy was better with matched than mismatched faces and his FIE was similar to controls’ FIE. Because Claudio has had more experience with mismatched faces than matched faces, his scores in detection and identity matching indicate the adult visual system includes mechanisms that evolved to process matched faces. Additionally, Claudio’s reduced FIE in detection tasks is inconsistent with a purely phylogenetic account.
Together, our results suggest the FIE reflects the effects of both experiential and phylogenetic factors.
Acknowledgements: We acknowledge the support from the Rockefeller Foundation for our project.
Talk 4, 11:30 am, 22.14
Norm-referenced neural mechanism for the recognition of facial expressions across fundamentally different face shapes
Michael Stettler1,2, Nick Taubert1, Ramona Siebert3, Silvia Spadacenta3, Peter Dicke3, Peter Thier3, Martin Giese1; 1Section for Computational Sensomotorics, Centre for Integrative Neuroscience & Hertie Institute for Clinical Brain Research, University Clinic Tübingen, 72076 Tübingen, Germany, 2International Max Planck Research School for Intelligent Systems (IMPRS-IS), 72076 Tübingen, Germany, 3Department of Cognitive Neurology, Hertie Institute for Clinical Brain Research, University of Tübingen, 72076 Tübingen, Germany
Humans recognize facial expressions from highly different facial shapes, including comic figures that have never been seen before. This task is a challenge for popular deep neural network models of face recognition, even though they are trained with huge numbers of pictures of faces (REF), while humans accomplish such transfer effortlessly. For the encoding of facial identity, two different encoding mechanisms have been contrasted: norm-referenced and example-based encoding (Tsao et al., 2017; Leopold et al., 2006; Koyano et al., 2021). We demonstrate that norm-referenced encoding is suitable to account for facial expression recognition from fundamentally different head shapes without extensive training. METHODS: We propose a neural model consisting of two modules: 1) a standard deep neural network, which models the initial parts of the visual pathway and extracts a sparse set of facial landmarks that optimally support the classification of the learned facial expressions; 2) a simple neural network for norm-referenced encoding of expressions based on these landmark features. We compare this new model to a standard deep neural model with a stimulus set that presents the same 7 basic facial expressions on very different head shapes (human avatar, comic figure, etc.). RESULTS: Our model reliably recognizes the facial expressions across all tested head shapes, requiring training on the expressions of only one head shape and on the neutral expressions of all other head shapes. In contrast, a state-of-the-art CNN architecture (ResNet50) trained with 400k+ facial images fails to classify expressions robustly on this data set. CONCLUSIONS: We presented a physiologically-inspired neural model for the ‘vectorized encoding’ (Beymer & Poggio, 1995) of facial expressions that accounts for transfer across very different head shapes, which is difficult to obtain with standard models.
Acknowledgements: This work was supported by HFSP RGP0036/2016, ERC 2019-SyG-RELEVANCE-856495 and NVIDIA Corp. MG was also supported by BMBF FKZ 01GQ1704 and BW-Stiftung NEU007/1 KONSENS-NHE. RS, SS, PD, and PT were supported by a grant from the DFG (TH 425/12-2).
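The idea behind norm-referenced encoding in the abstract above can be sketched in a few lines. This is an illustrative toy and not the authors' model: the landmark vectors, the cosine-similarity readout, and all function names are assumptions made for this example. The expression is read from the direction of the landmark displacement relative to each head shape's own neutral face (the norm), so directions learned on one head shape can be reused on another once its neutral face is known.

```python
import math


def subtract(a, b):
    """Element-wise difference of two landmark vectors."""
    return [x - y for x, y in zip(a, b)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def classify(landmarks, neutral, directions):
    """Pick the expression whose reference direction best matches the
    displacement of `landmarks` from this head shape's neutral face."""
    diff = subtract(landmarks, neutral)
    return max(directions, key=lambda name: cosine(diff, directions[name]))


if __name__ == "__main__":
    # Hypothetical expression directions learned on one head shape.
    directions = {"happy": [1.0, 0.0, 1.0, 0.0], "angry": [0.0, -1.0, 0.0, -1.0]}
    # A very different head shape: only its neutral landmarks are known.
    comic_neutral = [5.0, 2.0, 5.0, 2.0]
    comic_face = [8.0, 2.0, 8.0, 2.0]  # neutral plus 3 x the "happy" direction
    print(classify(comic_face, comic_neutral, directions))
```

Because only the displacement from the norm is compared, the absolute landmark positions of the new head shape do not affect the decision in this sketch.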
Talk 5, 11:45 am, 22.15
Facial expressions of threatening emotions show greater communicative robustness
Facial expressions are effective at communicating emotions partly due to their high signal variability. In line with communication theory, such signal variance could reflect evolutionary design to optimize communication efficiency and success—e.g., in-built degeneracy (different signals communicate the same message) and redundancy (similar signals communicate the same message, e.g., Hebets et al., 2016). However, such knowledge remains limited because current facial expression models are largely restricted to few, static, Western-centric signals. We address this knowledge gap by modelling dynamic facial expressions of the six classic basic emotions—happy, surprise, fear, anger, disgust, sad—in sixty individual participants from two cultures (Western European, East Asian) and characterizing their signal variance. Using the data-driven method of reverse correlation (e.g., see Jack et al., 2012), we agnostically generated facial expressions (i.e., combinations of dynamic Action Units—AUs) and asked participants to categorize 2,400 such stimuli according to one of the six emotions, or select ‘other’. We then measured the statistical relationship between the AUs presented on each trial and each participant’s emotion responses using Mutual Information (e.g., Ince et al., 2017), thus producing 720 dynamic facial expression models (6 emotions x 60 participants x 2 cultures). Finally, we used information-theoretic analyses and Bayesian inference of population prevalence (Ince et al., 2021) to characterize signal variance within each emotion category and culture. Results showed that, in both cultures, high-threat emotions (e.g., anger, disgust) are associated with a broader set of AUs than low-threat emotions (e.g., happy, sad; see also Liu et al., 2021), suggesting that costly-to-miss signals have higher levels of in-built redundancy and degeneracy that increase communication efficiency and success in noisy real-world environments. 
Our results contribute to unravelling the complex system of facial expression communication with implications for current theoretical accounts and the design of socially interactive digital agents.
Acknowledgements: TTM: UK Research & Innovation [EP/S02266X/1]; REJ: European Research Council, Economic & Social Research Council [ES/K001973/1]; PGS: Multidisciplinary University Research Initiative/Engineering & Physical Sciences Research Council [172046-01]; RAAI/PGS: Wellcome Trust [214120/Z/18/Z;107802]
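The step in the abstract above that relates Action Units to emotion responses can be illustrated with a plain plug-in estimate of mutual information between two discrete variables. This is a sketch for intuition only: the estimator used in the study may differ, and the variable names and toy trial data below are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in mutual information (in bits) between two paired discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


if __name__ == "__main__":
    # Hypothetical trials: whether one AU was shown, and the response given.
    au_present = [1, 1, 0, 0]
    informative = ["anger", "anger", "other", "other"]
    uninformative = ["anger", "other", "anger", "other"]
    print(mutual_information(au_present, informative))
    print(mutual_information(au_present, uninformative))
```

An AU whose presence predicts the response carries high mutual information with that emotion; an AU unrelated to the response carries none.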
Talk 6, 12:00 pm, 22.16
Early automatic processes shape other-race effects for faces
Justin Duncan1,2, Chloé Galinier1, Caroline Blais1, Daniel Fiset1, Roberto Caldara2; 1Université du Québec en Outaouais, 2Université de Fribourg
The other-race effect (ORE) was initially coined to illustrate how familiarity with a given race modulates face recognition efficiency. In contemporary research, however, other-race effects consist in both a recognition disadvantage and a race categorization advantage for other- compared to own-race faces. While perceptual and social factors have been considered to explain OREs, attentional factors are considered secondary and remain poorly understood. Drawing from central attention theory, we addressed the issue using a psychological refractory period (PRP) dual task paradigm. Sixty White participants were recruited and assigned to one of two dual task experiments (30 per group). Task 1 (T1) always consisted of a difficult tone categorization. Task 2 (T2) consisted of either Eastern-Asian (EA) and White race categorization (Experiment 1) or delayed match-to-sample recognition of EA and White faces (Experiment 2). Stimulus onset asynchrony (150, 300, 600 or 1200 ms) was manipulated to modulate T1-T2 overlap and interference, inducing a PRP effect, i.e., a slowing of T2 response times as overlap increased. T2 difficulty was modulated by morphing stimuli into unambiguous (100% signal) and ambiguous (60% signal, 40% noise) versions along the relevant axis (Exp 1: race; Exp 2: identity). As expected, increasing task overlap and difficulty slowed responses across the board. However, there were task- and race-mediated interactions. For categorization of EA but not White faces, the effect of difficulty decreased as overlap increased. The pattern was reversed for recognition, as the effect of difficulty decreased with overlap for White but not EA faces. According to the locus of slack framework, these results reflect automatization for other-race categorization and own-race recognition, and a requirement of central (i.e., controlled) processing for own-race categorization and other-race recognition.
These findings highlight a critical, fast, and fine-grained attentional interplay modulating face processing as a function of race: Attention plays a primary role in shaping other-race effects.
Acknowledgements: Social Sciences and Humanities Research Council of Canada 7032262 (JD); Canada Research Chair in Cognitive and Social Vision 950-232282 (CB); Natural Sciences and Engineering Research Council of Canada (DF); Swiss National Science Foundation 100019_189018 (RC)
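The locus of slack reasoning in the abstract above can be made concrete with the standard central bottleneck account of the PRP paradigm. The sketch below is a generic textbook-style model, not the authors' analysis; the stage names and all durations (in ms) are hypothetical. If difficulty lengthens a stage before the bottleneck, its cost is absorbed into the waiting time ("slack") at short SOAs; if it lengthens the central stage, the cost is the same at every SOA.

```python
def rt2(soa, p1, c1, p2, c2, m2):
    """Task 2 response time, measured from Task 2 onset, under a central bottleneck.

    p1, c1: Task 1 perceptual and central stage durations.
    p2, c2, m2: Task 2 perceptual, central, and motor stage durations.
    Task 2's central stage cannot start before Task 1's central stage ends.
    """
    central_start = max(p2, p1 + c1 - soa)
    return central_start + c2 + m2


def difficulty_effect(soa, extra_perceptual=0, extra_central=0):
    """Hard-minus-easy Task 2 response time at a given SOA (hypothetical durations)."""
    easy = rt2(soa, p1=100, c1=300, p2=100, c2=150, m2=100)
    hard = rt2(soa, p1=100, c1=300, p2=100 + extra_perceptual,
               c2=150 + extra_central, m2=100)
    return hard - easy


if __name__ == "__main__":
    for soa in (150, 1200):
        # Pre-bottleneck difficulty shrinks at short SOA; central difficulty does not.
        print(soa, difficulty_effect(soa, extra_perceptual=100),
              difficulty_effect(soa, extra_central=100))
```

Under this model, a difficulty effect that shrinks as task overlap grows points to a stage that runs in parallel with Task 1, which is the pattern the abstract interprets as automatization.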
Talk 7, 12:15 pm, 22.17
Social Inference from Relational Visual Information
Humans easily make social evaluations from visual input, such as recognizing a social interaction or deciding who is a friend and who is a foe. Prior efforts to model this ability have suggested that visual features and motion cues alone cannot account for human performance in social tasks, like distinguishing between helping and hindering interactions. On the other hand, generative Bayesian inference models, which make predictions based on simulations of agents’ social or physical goals, accurately predict human judgments, but are computationally very expensive and often impossible to implement in natural visual stimuli. Inspired by developmental work, we hypothesize that introducing inductive biases in visual models would allow them to make more human-like social judgments. Specifically, in this study, we investigate whether relational representations of visual stimuli using graph neural networks can predict human social interaction judgments. We use the PHASE dataset, consisting of 2D animations of two agents and two objects resembling real-life social interactions, rated as friendly, neutral, or adversarial. We propose a graph neural network (GNN)-based architecture that takes in graph representations of a video and predicts the relationship between the agents. We collected human ratings for each of the 400 videos and found that our GNN model aligns with human judgments significantly better than a baseline visual model with the same visual/motion information but without the graph structure (79% vs 62% prediction of human judgments). Intriguingly, explicitly adding relational information to the baseline model does not improve its performance, suggesting that graphical representations in particular are important to modeling human judgments.
Taken together, these results suggest that relational graphical representations of visual information can help artificial vision systems make more human-like social judgments without incurring the computational cost of Bayesian models, and provide insights into the computations that humans employ while making visual social judgments.
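To make the notion of a relational graph representation concrete, the sketch below builds a small graph over the entities in one video frame and runs a single round of mean-aggregation message passing, the basic operation in many graph neural networks. It is an illustration under assumptions and not the architecture described in the abstract: the node names, feature vectors, edge list, and the concatenation-based update are all hypothetical, and there are no learned weights.

```python
def message_pass(features, edges):
    """One round of mean-aggregation message passing on an undirected graph.

    features: dict mapping node name -> list of floats (e.g. position, velocity).
    edges: list of (node_a, node_b) pairs.
    Returns each node's own features concatenated with the mean of its
    neighbours' features (zeros if the node has no neighbours).
    """
    neighbours = {node: [] for node in features}
    for a, b in edges:
        neighbours[a].append(b)
        neighbours[b].append(a)

    updated = {}
    for node, feat in features.items():
        linked = neighbours[node]
        if linked:
            agg = [sum(features[m][i] for m in linked) / len(linked)
                   for i in range(len(feat))]
        else:
            agg = [0.0] * len(feat)
        updated[node] = feat + agg  # concatenate own and aggregated features
    return updated


if __name__ == "__main__":
    # Hypothetical 2D positions of two agents and one object in a single frame.
    frame = {"agent1": [0.0, 0.0], "agent2": [2.0, 0.0], "ball": [4.0, 2.0]}
    links = [("agent1", "agent2"), ("agent1", "ball")]
    print(message_pass(frame, links))
```

After one round, each node's representation already depends on the entities it is linked to, which is the kind of relational structure a flat feature vector does not make explicit.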