Unveiling core, interpretable image properties underlying model-brain similarity with generative models

Poster Presentation 23.307: Saturday, May 18, 2024, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Scene Perception: Miscellaneous

Yingqi Rong¹, Colin Conwell¹, Dianna Hidalgo², Michael Bonner¹; ¹Johns Hopkins University, ²Harvard Medical School

Deep neural networks (DNNs) can now predict the hierarchy of natural image representations in human visual cortex with substantial accuracy. However, a key challenge in using these networks to predict brain representations is discerning which of their specific properties underlie their predictive accuracy. In this work, we developed an approach that leverages high-throughput generative vision models to run targeted, hypothesis-driven experiments on the key image properties that drive DNN predictions of brain representations. Specifically, we used diffusion models to create diverse image variations while preserving targeted image information, including specific visual features (e.g., edges, backgrounds) as well as semantics from captions and categories. Using our synthesized image variations, we quantified the impact of each interpretable manipulation on the representational similarity between AlexNet activations and image-evoked fMRI responses in early visual cortex (EVC) and occipitotemporal cortex (OTC). We found that representational similarity to high-level OTC (but not EVC) remained stable as long as the synthesized images retained their semantic content, and this effect was robust to substantial structural variation in the synthesized images. To demonstrate the broad utility of this method, we quantified the influence of objects, backgrounds, shapes, and other visual details on model performance, and we performed analogous targeted experiments on aspects of higher-level scene semantics (e.g., object relations). Overall, these findings highlight the promise of generative models for probing brain-model similarity. Our work provides insight into how specific forms of image information shape the relationship between computational models and brain responses, and it paves the way toward a deeper understanding of how models approximate biological visual processing.
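The abstract does not specify the implementation, but the two-stage design it describes can be sketched roughly in Python. The sketch below is a hedged illustration, not the authors' code: stage 1 synthesizes an image variation with a diffusion model while a caption anchors the semantic content, and stage 2 scores representational similarity between DNN activations and fMRI responses via representational similarity analysis (RSA). The specific model choices (Stable Diffusion img2img, AlexNet fc6), the `strength` parameter, and the RSA scoring details are all assumptions for the sake of the example; stage 1 also assumes a CUDA GPU.

# Hedged sketch of the two-stage approach described above, not the
# authors' actual pipeline. All model choices and parameters are
# illustrative assumptions.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr
from torchvision import models, transforms

# Stage 1: caption-conditioned image variation. Higher `strength`
# discards more low-level image structure, while the prompt anchors
# the semantic content of the resynthesized image.
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")  # assumes a CUDA GPU is available

def make_variation(image: Image.Image, caption: str,
                   strength: float = 0.6) -> Image.Image:
    return pipe(prompt=caption, image=image, strength=strength).images[0]

# Stage 2: representational similarity analysis (RSA).
alexnet = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def fc6_features(images):
    # AlexNet fc6 activations (one illustrative layer choice): run the
    # convolutional stack, then the classifier up through fc6 + ReLU.
    batch = torch.stack([preprocess(im) for im in images])
    x = alexnet.avgpool(alexnet.features(batch)).flatten(1)
    return alexnet.classifier[:3](x).numpy()

def rsa_score(model_feats, brain_resps):
    # Second-order similarity: Spearman correlation between condensed
    # correlation-distance RDMs (rows are images; columns are model
    # features or voxel responses, respectively).
    rho, _ = spearmanr(pdist(model_feats, "correlation"),
                       pdist(brain_resps, "correlation"))
    return rho

In the design described in the abstract, a score like rsa_score would be computed separately for EVC and OTC voxel responses across each family of image variations (edge-preserving, background-preserving, caption-matched, and so on); the stability of the score under a given manipulation indicates how much the preserved image property contributes to model-brain similarity.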