Visuo-semantic clashes: What happens when objects do not look like they should?
Poster Presentation 56.401: Tuesday, May 23, 2023, 2:45 – 6:45 pm, Pavilion
Session: Object Recognition: Categories
Inga María Ólafsdóttir1,3, Marelle Maeekalle2,3, Heida Maria Sigurdardottir2,3; 1Reykjavik University, 2University of Iceland, 3Icelandic Vision Lab
Animacy may be a fundamental dimension in human object perception. We applied dimensionality reduction to deep layer activations in a convolutional neural network (CNN) trained for object classification to construct a two-dimensional visual object space. One of these two dimensions approximates an animate-inanimate distinction. Most inanimate objects map onto the inanimate part of object space, and most animate objects onto the animate part. However, some images are projected onto the “wrong” side of an animacy classification boundary. In these instances, there is an incongruency between object identity and its location in object space. In a series of studies, we explore what happens during these visuo-semantic clashes in visual object discrimination and classification. In studies 1 (N = 70), 2 (N = 43, preregistration: https://osf.io/g7kuy), and 3 (N = 511, preregistration: https://osf.io/tqgdk), we found that the degree to which distance in object space accounts for object discrimination speed depends on both real and predicted animacy, i.e., whether the object truly is animate and whether it looks animate according to a CNN. Furthermore, semantic relationships between to-be-discriminated objects greatly affect performance in cases of a visuo-semantic clash. In study 4 (N = 32, preregistration: https://osf.io/4sc5x), people classified images as animate or inanimate when presented very briefly (33 ms) or for longer (500 ms). In cases of a visuo-semantic clash, people were less accurate in their categorization specifically when presentation was short and masked, which may interrupt recurrent visual object processing. Overall, we suggest that studies 1-4 reflect the integration of top-down information with an initial feedforward sweep of visual information through the visual pathway. 
A dimension that closely, but not fully, maps onto the animate-inanimate distinction appears to be a principal visual dimension along which objects truly differ from one another, and CNNs and humans may extract this information directly from visual statistics.
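The construction described above — reducing deep-layer CNN activations to a two-dimensional object space in which one dimension approximates an animate-inanimate boundary, with some images projecting onto the "wrong" side — can be illustrated with a minimal sketch. This is not the authors' pipeline: the network, layer, and reduction method are unspecified here, so the sketch simulates activation vectors for two classes and uses PCA via SVD as a stand-in for the dimensionality reduction.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for deep-layer CNN activations. In the actual
# studies these would come from a network trained for object
# classification; here we simulate two loosely separated clusters,
# one "animate" and one "inanimate".
n_per_class, n_units = 50, 512
animate = rng.normal(loc=0.1, scale=1.0, size=(n_per_class, n_units))
inanimate = rng.normal(loc=-0.1, scale=1.0, size=(n_per_class, n_units))
acts = np.vstack([animate, inanimate])

# Dimensionality reduction (PCA via SVD) to a two-dimensional object space.
centered = acts - acts.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
object_space = centered @ vt[:2].T  # shape (100, 2)

# One dimension serves as an approximate animacy classification boundary:
# most animate items fall on one side, most inanimate on the other.
dim = object_space[:, 0]
predicted_animate = dim > 0
# Align the sign so that animate items are mostly on the positive side.
if predicted_animate[:n_per_class].mean() < 0.5:
    predicted_animate = ~predicted_animate

# Items whose true animacy disagrees with their predicted (visual)
# animacy are the visuo-semantic clashes the abstract refers to.
true_animate = np.arange(2 * n_per_class) < n_per_class
clashes = predicted_animate != true_animate
print(f"{clashes.sum()} of {2 * n_per_class} items project onto the 'wrong' side")
```

With weak cluster separation, a handful of items typically land on the wrong side of the boundary; these simulated "clashes" play the role of images whose object identity and location in object space are incongruent.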
Acknowledgements: This work was supported by The Icelandic Research Fund (Grants No. 228916 and 218092) and the University of Iceland Research Fund.