Comparing artificial neural network models with varied objectives to probe the role of sensory representation in primate visual memorability

Poster Presentation 43.320: Monday, May 20, 2024, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Visual Memory: Encoding, retrieval

Ram Ahuja1, Kohitij Kar1; 1Department of Biology, York University

Imagine a typical experience of scrolling through numerous photos on social media. While most images blend into obscurity, certain ones, like a playful kitten tangled in yarn, linger in our memories. This selective retention, known as image memorability, raises intriguing questions about its neural basis, particularly how sensory representations in the cortex support this behavior. In this study, we investigate this process by leveraging recent strides in computer vision through artificial neural network (ANN) models. These models, designed to mimic the primate ventral visual pathway, offer a vast hypothesis space of brain functions that may be critical for determining the memorability of an image. We compare two distinct types of ANNs, models geared toward basic object categorization (e.g., ResNet-50, AlexNet, GoogLeNet) and those tailored to predict image memorability (e.g., MemNet, ViTMem), on a set of 200 images (20 images from each of 10 distinct object categories) from the MS-COCO dataset. Consistent with previous results, we observed that both model classes can predict which images humans find most memorable. However, our results show that they produce significantly different internal representations, as assessed by representational similarity analysis (comparing representations across model classes in both architecture-matched and architecture-agnostic manners). In addition, we used neural data recorded across the macaque inferior temporal (IT) cortex in six monkeys to address two primary questions: 1) Can both model classes accurately predict variance in the neural data? 2) Do they predict distinct components of variance in the responses of individual neurons? Surprisingly, our results show that a memorability model (MemNet) predicts significantly more variance in the neural data than an object categorization model with a matched architecture (AlexNet), emphasizing a) the need to further probe these model classes as encoding models of the ventral stream and b) the value of a new normative framework for thinking about the evolution of sensory representations.
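For readers interested in the analysis logic, the sketch below illustrates, under loose assumptions, how the two quantitative comparisons described in the abstract can be set up: representational similarity analysis between model classes (correlating their representational dissimilarity matrices) and a cross-validated regression-based encoding model of IT responses. The feature matrices, IT response array, layer choices, and regression settings here are hypothetical placeholders, not the study's actual pipeline.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

# Hypothetical inputs (placeholders, not the study's data):
#   feats_categ: (200, D1) features from a categorization model (e.g., AlexNet)
#   feats_mem:   (200, D2) features from a memorability model (e.g., MemNet)
#   it_rates:    (200, n_neurons) image-averaged macaque IT responses
rng = np.random.default_rng(0)
feats_categ = rng.normal(size=(200, 4096))
feats_mem = rng.normal(size=(200, 4096))
it_rates = rng.normal(size=(200, 50))

def rdm(features):
    """Representational dissimilarity matrix: 1 - Pearson correlation between image pairs."""
    return 1.0 - np.corrcoef(features)

def rsa_similarity(rdm_a, rdm_b):
    """Spearman correlation between the upper triangles of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    return spearmanr(rdm_a[iu], rdm_b[iu]).correlation

# 1) Compare internal representations across model classes.
print("RSA (categorization vs. memorability):",
      rsa_similarity(rdm(feats_categ), rdm(feats_mem)))

# 2) Cross-validated encoding model: how much IT variance does each model explain?
def neural_predictivity(features, responses, folds=10):
    scores = []
    for n in range(responses.shape[1]):
        pred = cross_val_predict(RidgeCV(alphas=np.logspace(-3, 3, 7)),
                                 features, responses[:, n], cv=folds)
        scores.append(np.corrcoef(pred, responses[:, n])[0, 1])
    return np.array(scores)

print("Median IT predictivity (categorization):",
      np.median(neural_predictivity(feats_categ, it_rates)))
print("Median IT predictivity (memorability):",
      np.median(neural_predictivity(feats_mem, it_rates)))
```

A per-neuron cross-validated fit of this kind is one common way to compare model classes as encoding models; the abstract's architecture-matched comparison (AlexNet vs. MemNet) would correspond to holding the network architecture fixed while swapping the training objective.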

Acknowledgements: Google Research, CFREF, Brain Canada, SFARI