Uncovering high-level visual cortex preferences by training convolutional neural networks on large neuroimaging data

Poster Presentation 43.311: Monday, May 22, 2023, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Object Recognition: Models

K. Seeliger1, R. Leipe1,2, J. Roth1, M. N. Hebart1,3; 1Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany, 2Leipzig University, Germany, 3Department of Medicine, Justus Liebig University Giessen, Germany

Pretrained task-optimized convolutional neural networks are commonly used to predict brain responses to visual stimuli. Yet they carry biases introduced by their training dataset and task objective (e.g., classification). Recent large-scale visual neuroimaging datasets have made it possible to train modern convolutional neural networks with the objective of directly predicting brain responses measured with human neuroimaging, thereby circumventing these biases. Here, we used the THINGS and Natural Scenes datasets – both massive functional MRI datasets acquired during the presentation of object photographs – to identify, from a set of candidate architectures established in the machine learning community (ResNet50, VGG-16, CORnet-S, and others), a network suitable for predicting responses of individual regions in high-level visual cortex. Careful optimization of these networks yielded voxel-wise encoding models with high prediction correlations, significantly surpassing the state-of-the-art encoding performance of task-optimized models based on the same architectures. Treating these brain-optimized networks as in-silico models of ROIs in visual cortex, a sensitivity analysis based on passing millions of images through the network, together with a GAN-based synthesis of preferred images, revealed the expected sensitivity of FFA, PPA, and EBA to faces, places, and body parts, respectively. Our results furthermore revealed novel selectivity, such as close-range pebble patterns in FFA and horizontal and perspectival lines in PPA. Together, these findings demonstrate the feasibility of training common neural network architectures on available massive neuroimaging datasets and provide novel insight into the representations underlying human vision.
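For illustration, the core idea of a voxel-wise encoding model can be sketched as a regularized linear readout mapping stimulus features to per-voxel responses, scored by the correlation between predicted and held-out measurements. This is a minimal sketch on synthetic data, not the brain-optimized end-to-end training described above; the feature dimensions, ridge penalty, and voxel counts are assumptions for the example.

```python
import numpy as np

# Hypothetical sketch: fit a ridge-regression readout from image
# features X (n_images x n_features) to voxel responses Y
# (n_images x n_voxels), then score each voxel by the Pearson
# correlation between predicted and held-out responses.
rng = np.random.default_rng(0)

n_train, n_test, n_feat, n_vox = 200, 50, 64, 10
W_true = rng.normal(size=(n_feat, n_vox))          # synthetic ground truth
X_train = rng.normal(size=(n_train, n_feat))
X_test = rng.normal(size=(n_test, n_feat))
Y_train = X_train @ W_true + 0.1 * rng.normal(size=(n_train, n_vox))
Y_test = X_test @ W_true + 0.1 * rng.normal(size=(n_test, n_vox))

lam = 1.0  # ridge penalty (assumed; would be tuned per voxel in practice)
W = np.linalg.solve(X_train.T @ X_train + lam * np.eye(n_feat),
                    X_train.T @ Y_train)

Y_hat = X_test @ W
# voxel-wise correlation between prediction and held-out measurement
r = np.array([np.corrcoef(Y_hat[:, v], Y_test[:, v])[0, 1]
              for v in range(n_vox)])
print(r.mean())
```

In the abstract's setting, the linear readout is replaced by a full convolutional network optimized end-to-end on the fMRI data, but the evaluation principle (per-voxel prediction correlation on held-out stimuli) is the same.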

Acknowledgements: This work was supported by a Max Planck Research Group Grant (M.TN.A.NEPF0009) awarded to MNH and the ERC Starting Grant COREDIM (ERC-StG-2021-101039712) awarded to MNH.