V-VSS, June 1-2

Perceptual Organization, Object Recognition

Talk Session: Wednesday, June 1, 2022, 8:30 – 9:45 am EDT, Zoom Session


Talk 1, 8:30 am, 71.71

Task-dependent contribution of higher-order statistics to natural texture processing

Daniel Herrera1, Ruben Coen-Cagli2; 1Faculty of Sciences, Universidad de la República, 2Department of Systems and Computational Biology, and Dominick Purpura Department of Neuroscience, Albert Einstein College of Medicine

During natural visual behavior, our visual system extracts multiple features from its inputs and uses them to solve different tasks. Each feature conveys information relevant to different tasks, and for a given task the visual system relies on the relevant features while ignoring others. However, this hypothesis remains largely untested for complex tasks with natural stimuli. Here we compare the roles of the spectral and higher-order statistics (HOS) of the Portilla-Simoncelli texture model across tasks with natural images, to explain their task-dependent use by humans. Portilla-Simoncelli HOS are important for human texture perception, peripheral vision, and physiology, but they play a much smaller role in texture segmentation. Modeling work suggests this could reflect the redundancy between HOS and spectral statistics (a strong segmentation cue in humans) for natural image segmentation. The importance of HOS for texture perception, however, suggests that these statistics may be informative for other texture-related tasks. In this work we test the hypothesis that, in contrast to segmentation, HOS are superior to spectral statistics for natural texture classification. To test this, we trained linear classifiers to solve four natural-image classification tasks (classification of physical texture instances, materials, perceptual descriptions, and scenes) across 11 datasets, and compared the performance afforded by HOS and by spectral statistics. We find that HOS improved task performance considerably over spectral statistics, unlike what was reported for segmentation. This is compatible with an account in which humans' task-dependent use of these features reflects their task-dependent relevance in natural images. Interestingly, we find that the contribution of HOS varies between classification tasks, with larger improvements for instance classification. Future work should test whether human use of HOS follows this finer pattern within classification, and explore the computational underpinnings of the varying HOS contributions.
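
A minimal sketch of the classifier comparison described above, assuming the Portilla-Simoncelli statistics have already been extracted with some implementation of the model; the array shapes, random placeholder data, and the linear_readout_accuracy helper are illustrative assumptions, not the authors' pipeline:

```python
# Sketch: compare the classification performance afforded by spectral vs.
# higher-order (HOS) texture statistics using regularized linear readouts.
# Feature matrices are placeholders for precomputed Portilla-Simoncelli stats.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n_images, n_spectral, n_hos, n_classes = 400, 50, 700, 4

# Placeholder features and labels; in practice these come from the datasets.
X_spectral = rng.normal(size=(n_images, n_spectral))
X_hos = rng.normal(size=(n_images, n_hos))
y = rng.integers(0, n_classes, size=n_images)

def linear_readout_accuracy(X, y):
    """Cross-validated accuracy of a regularized linear classifier."""
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
    return cross_val_score(clf, X, y, cv=5).mean()

acc_spectral = linear_readout_accuracy(X_spectral, y)
acc_hos = linear_readout_accuracy(X_hos, y)
print(f"spectral: {acc_spectral:.3f}  HOS: {acc_hos:.3f}")
```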

Acknowledgements: NIH grant EY031166

Talk 2, 8:45 am, 71.72

Bar graphs of mean values produce inflated and variable estimates of effect size

Ja Youn Lee1, Sarah Kerns1, Jeremy Wilmer1; 1Wellesley College

Bar graphs of mean values (BGoMs) are frequently criticized for abstracting beyond, and thus hiding, the individual values that are averaged to produce their plotted means. Yet does this abstraction produce miscommunication? BGoMs are often presumed, owing to their visual simplicity, to communicate well, especially to non-expert viewers. Here, we tested that presumption. In a study of 29 non-expert viewers, we first found that viewers overestimated the effect sizes conveyed by two real BGoMs taken directly from popular Introductory Psychology textbooks. We then asked whether manipulating the y-axis range would reduce this overestimation, and found that it did, but only partly, and only in some viewers. We measured estimated effect sizes in Cohen's d (SD) units via a drawing-based method developed by our lab that requires no prior statistical knowledge (Kerns & Wilmer, 2021): participants simply sketch a version of a viewed BGoM, adding hypothesized data points that, when averaged, would produce the plotted mean values. For two BGoMs whose real effect sizes were 1.0 and 0.7, the median drawn effect sizes were 4.4 and 9.5, respectively. Expanding the y-axis range reduced, but did not eliminate, the overestimation (median effect sizes were 3.2 and 2.1, respectively, for a 2x expansion, and 3.7 and 2.5, respectively, for a 4x expansion). Moreover, the largest and smallest drawn effect sizes in every condition (2 BGoMs × 3 y-axis ranges) differed by at least fivefold; thus, although overestimation was reduced on average, different viewers still came away with markedly different, often highly inaccurate, conceptions of the data. We conclude that BGoMs can produce distorted, highly varied interpretations of data in non-expert viewers, and that abstraction in BGoMs is not just a theoretical concern but an evidence-based one.
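
For reference, a minimal sketch of the effect-size computation underlying the drawing task: Cohen's d with a pooled standard deviation, applied to hypothetical digitized y-values standing in for a participant's drawn points (the values below are invented for illustration):

```python
# Sketch: estimate the effect size implied by a participant's drawn data
# points, in Cohen's d (pooled-SD) units.
import numpy as np

def cohens_d(group_a, group_b):
    """Cohen's d with the pooled standard deviation in the denominator."""
    a, b = np.asarray(group_a, float), np.asarray(group_b, float)
    pooled_var = (((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                  / (len(a) + len(b) - 2))
    return (a.mean() - b.mean()) / np.sqrt(pooled_var)

# Hypothetical points a viewer might sketch under the two bars of a BGoM.
drawn_bar1 = [52, 55, 58, 61, 54]
drawn_bar2 = [70, 73, 69, 75, 72]
print(f"drawn effect size d = {cohens_d(drawn_bar1, drawn_bar2):.2f}")
```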

Acknowledgements: This research was funded in part by NSF award #1624891 to JBW, a Brachman Hoffman grant to JBW, and a sub-award from NSF grant #1837731 to JBW.

Talk 3, 9:00 am, 71.73

Part structure predicts superordinate categorization of animals and plants

Henning Tiedemann1, Filipp Schmidt1, Roland W Fleming1; 1Justus Liebig University Giessen

Categorizing objects into superordinate classes (e.g., animals, plants) based on their visual appearance is challenging. Often, items of very different appearance must be grouped together (e.g., elephants and mice), whereas items of more similar appearance must be kept apart (e.g., twigs and insects). Plants and animals are a particularly salient example: as living things, both typically have rich shape structure consisting of multiple limbs, whose properties and configurations differ systematically on account of growth regularities peculiar to each biological category (e.g., animals tend to have symmetrical pairs of limbs, whereas plants do not). We propose that these growth regularities lead to different perceptual organizations of object parts, namely their spatial arrangement and relations, creating potent cues for differentiating the two. To test this, we used a generative algorithm based on shape skeletons to create many novel object pairs that differed in their part structure but were otherwise very similar. We found that participants reliably judged shapes with certain part organizations to be systematically more plant-like than animal-like (and vice versa). Based on these results, we generated another 110 sequences of shapes morphing from animal-like to plant-like appearance by manipulating part structure in terms of three features: sprouting parts, curvedness of parts, and symmetry of part pairs. Judgments from a different group of participants showed that all three parameters are highly predictive of human object classifications along the animal/plant continuum. This shows that the perceptual organization of parts, along with part-based features like curvedness (both of which can be visually quite subtle), provides powerful cues for superordinate categorization.
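
As a sketch of how the three part-structure features could be related to classification judgments, the snippet below fits a logistic model to simulated animal/plant responses; the feature scaling, weights, and responses are all invented placeholders, not the study's stimuli or analysis:

```python
# Sketch: relate three part-structure features (sprouting parts, curvedness
# of parts, symmetry of part pairs) to animal-vs-plant judgments.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n_shapes = 300
# Columns: [sprouting, curvedness, pair_symmetry], scaled 0..1 (hypothetical).
X = rng.uniform(size=(n_shapes, 3))
# Simulated judgments: True = "plant-like", False = "animal-like".
logits = 2.5 * X[:, 0] + 1.5 * X[:, 1] - 3.0 * X[:, 2]
y = (logits + rng.normal(scale=0.5, size=n_shapes)) > logits.mean()

model = LogisticRegression().fit(X, y)
print("feature weights (sprouting, curvedness, symmetry):", model.coef_[0])
```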

Acknowledgements: Research funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project number 222641018, SFB/TRR 135 TP C1; by the European Research Council (ERC) Consolidator Award “SHAPE”, project number ERC-2015-CoG-682859; and by “The Adaptive Mind”

Talk 4, 9:15 am, 71.74

Functional connectivity and representational content of tool category and elongation in tool-selective parietal cortex

Olivia S. Cheung1, Chenxi He2; 1New York University Abu Dhabi, 2University of Western Ontario

When we perceive an object (e.g., a hammer), much information about it is processed (e.g., its shape, kind, or function). How are these different aspects of information processed in distributed category-selective networks? Tool-selective regions in the occipitotemporal cortex primarily contain categorical rather than visual information about tools (e.g., shape or spatial frequency; He et al., 2020). Here we used fMRI to examine 1) how category and visual information may be represented in the parietal cortex, and 2) whether regions with more similar representational content are also more strongly connected functionally. Tool-selective regions in the left superior and inferior parietal lobules (SPL/IPL) were first defined with images of animals and tools with naturally varied image statistics. We then tested the nature of representations in these regions with images of animals and tools that were either round or elongated, and presented in either low or high spatial frequencies (LSF/HSF). Importantly, these images shared comparable gist statistics, minimizing low- and mid-level visual differences across categories. Using representational similarity analysis, we found that tool-category and HSF information were independently represented in SPL, whereas elongation and HSF information interacted in IPL. Interestingly, functional connectivity analysis suggested that the tool-category representation in SPL might be related to stronger connectivity between the tool-selective left medial fusiform gyrus and SPL than with IPL, and that the elongation representation in IPL might be related to stronger connectivity between the tool-selective left premotor region and IPL than with SPL. Together, these results show that the complementary approaches of functional connectivity and representational similarity analyses can provide useful insights into the respective roles of, and interactions among, regions in category-selective networks, such as how different aspects of tool information are represented across the tool-selective network to support recognition and potential action planning.
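
A minimal sketch of the representational similarity analysis logic, assuming condition-wise response patterns from a region of interest; the pattern data and the binary model RDMs below are hypothetical placeholders rather than the study's design:

```python
# Sketch: compare a region's neural RDM to model RDMs coding tool category,
# elongation, and spatial frequency, via Spearman correlation.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(2)
n_conditions, n_voxels = 8, 200
patterns = rng.normal(size=(n_conditions, n_voxels))  # e.g., ROI beta patterns

# Neural RDM: correlation distance between condition patterns (vectorized).
neural_rdm = pdist(patterns, metric="correlation")

# Model RDMs from binary condition labels (hypothetical coding):
# tools vs. animals, elongated vs. round, HSF vs. LSF.
labels = {
    "category":   np.array([0, 0, 0, 0, 1, 1, 1, 1]),
    "elongation": np.array([0, 0, 1, 1, 0, 0, 1, 1]),
    "freq":       np.array([0, 1, 0, 1, 0, 1, 0, 1]),
}
for name, lab in labels.items():
    model_rdm = pdist(lab[:, None], metric="cityblock")  # 1 where labels differ
    rho, _ = spearmanr(neural_rdm, model_rdm)
    print(f"{name}: Spearman rho = {rho:.3f}")
```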

Talk 5, 9:30 am, 71.75

Interpretable object dimensions in deep neural networks and their similarities to human representations

Lukas Muttenthaler1,2, Martin N. Hebart1; 1Max Planck Institute for Human Cognitive and Brain Sciences, 2Technical University of Berlin

Convolutional neural networks (CNNs) have recently received much attention in the vision sciences as candidate models of core visual object recognition. At the behavioral level, these models show near-human object classification performance, often allow excellent prediction of object-related choices, and explain significant proportions of variance in object similarity judgments. Despite these parallels, CNNs continue to exhibit a performance gap in explaining object-based representations and behavior. Here we aimed to identify the factors that determine the similarities and differences between CNN and human object representations. Paralleling object similarity judgments in humans, we generated 20 million in-silico triplet odd-one-out choices on 22,248 natural object images, using the penultimate-layer activations of a pretrained VGG-16 model. Next, we applied a gradient-based similarity embedding technique that yielded 57 sparse, non-negative dimensions that were highly predictive of the CNN’s odd-one-out choices. These dimensions were interpretable, reflecting properties of objects that are both visual (e.g., color, shape, texture) and conceptual (e.g., high-level category, value) in nature. While recent work indicated that CNNs respond to the texture of an object rather than its shape, our results reveal robust shape-related dimensions, indicating that texture bias may not be a general representational limitation. To probe the representational content of individual dimensions, we developed a dimension prediction approach, allowing us to (1) generate optimal stimuli for individual dimensions, (2) reveal the image regions driving these dimensions, and (3) causally manipulate individual image features to identify the dimensions’ representational nature. Despite strong parallels between CNNs and humans, a one-to-one mapping of CNN dimensions onto human representational dimensions revealed striking differences for a subset of images, exposing novel image biases that limit a CNN’s ability to generalize. Together, this interpretability technique offers a powerful new approach for understanding the similarities and differences between representations derived from behavior and from CNNs.
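
A minimal sketch of the in-silico triplet task, assuming dot-product similarity on penultimate-layer features; the random features and the odd_one_out helper are illustrative stand-ins for the actual VGG-16 activations and choice rule:

```python
# Sketch: generate odd-one-out choices for triplets of objects. The most
# similar pair in a triplet is kept, and the remaining item is the odd one out.
import numpy as np

rng = np.random.default_rng(3)
n_objects, n_features = 1000, 4096
feats = rng.normal(size=(n_objects, n_features))  # stand-in for VGG-16 features

def odd_one_out(i, j, k, feats):
    """Return the index of the odd one out in triplet (i, j, k)."""
    sims = {
        (i, j): feats[i] @ feats[j],
        (i, k): feats[i] @ feats[k],
        (j, k): feats[j] @ feats[k],
    }
    pair = max(sims, key=sims.get)        # most similar pair stays together
    return ({i, j, k} - set(pair)).pop()  # leftover item is the odd one out

triplet = rng.choice(n_objects, size=3, replace=False)
print("triplet:", triplet, "-> odd one out:", odd_one_out(*triplet, feats))
```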

Acknowledgements: This work was supported by a Max Planck Research Group grant of the Max Planck Society awarded to MNH