How do visual tasks alter the representational space of identical scenes? Insights from a brain-supervised convolutional neural network

Poster Presentation 26.462: Saturday, May 18, 2024, 2:45 – 6:45 pm, Pavilion
Session: Scene Perception: Neural mechanisms

Bruce C. Hansen1, Henry A.S. Lewinsohn1, Michelle R. Greene2; 1Colgate University, 2Barnard College

The neural representation of visual information is not a static pattern, but instead undergoes multiple transformations over time (Hansen et al., 2021) and supports the use of different features under differing task demands (Greene & Hansen, 2020). However, exactly how task-relevant information is built up and subsequently used by the observer remains only vaguely understood. To model that process, we constructed a novel convolutional neural network (CNN) in which the convolutional layers were independently supervised by EEG responses at different time points. The CNN's goal was to use image information, evaluated against neural responses, to differentiate between two different tasks performed on identical real-world scenes. Participants (n = 24) viewed repeated presentations of 80 scenes while making cued assessments about either the presence of an object in the scene or whether the scene afforded the ability to perform a function. Neural data were gathered via 128-channel EEG in a standard visual evoked potential (VEP) paradigm. Deconvolution was used to back-project activations across the layers of our brain-supervised CNN onto image space, revealing how the neural responses guided the differentiation of identical scenes at different image locations. The distribution of local activations was then compared to behavioral assessments of task-relevant information at each image location obtained through a crowd-sourced experiment. The behavioral data showed that the central region of image space was frequently informative for the object task, whereas the ground plane was most often informative for the function task. Crucially, our brain-supervised CNN relied more heavily on those task-relevant regions to differentiate between identical sets of stimuli at ~70 ms and ~250 ms. Interestingly, the brain-supervised CNN made differential use of the task-relevant information within the early and late time points, suggesting a two-stage analysis of behaviorally relevant scene locations. Our findings suggest that the observer's task-specific engagement with visual information substantially alters early neural representations.
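As a rough illustration of the layer-wise supervision described above, the sketch below pairs each convolutional block with a linear readout that predicts a 128-channel EEG topography from that block's assigned time window, and combines a task-classification loss with per-layer EEG alignment losses. This is a minimal sketch in PyTorch, not the authors' implementation; the block sizes, the number of supervised layers, the MSE alignment loss, and all hyperparameters are assumptions made purely for illustration.

```python
# Illustrative sketch (not the authors' code): one way to supervise each
# convolutional layer with EEG responses from a different time window.
import torch
import torch.nn as nn

class BrainSupervisedCNN(nn.Module):
    def __init__(self, n_channels=128):
        super().__init__()
        # Four conv blocks, each paired with a (hypothetical) EEG time window.
        self.blocks = nn.ModuleList([
            nn.Sequential(nn.Conv2d(c_in, c_out, 3, padding=1),
                          nn.ReLU(), nn.MaxPool2d(2))
            for c_in, c_out in [(3, 32), (32, 64), (64, 128), (128, 128)]
        ])
        # One linear readout per block maps pooled activations to the
        # 128-channel EEG pattern at that block's assigned time point.
        self.readouts = nn.ModuleList([
            nn.Linear(c_out, n_channels) for c_out in [32, 64, 128, 128]
        ])
        self.classifier = nn.Linear(128, 2)  # object task vs. function task

    def forward(self, x):
        eeg_preds = []
        for block, readout in zip(self.blocks, self.readouts):
            x = block(x)
            pooled = x.mean(dim=(2, 3))          # global average pool
            eeg_preds.append(readout(pooled))    # predicted EEG topography
        logits = self.classifier(x.mean(dim=(2, 3)))
        return logits, eeg_preds

def brain_supervised_loss(logits, eeg_preds, task_labels, eeg_targets, alpha=1.0):
    """Task loss plus per-layer EEG alignment losses.

    eeg_targets: list of (batch, 128) EEG patterns, one per layer, taken
    from that layer's assigned time window (e.g., ~70 ms, ~250 ms).
    """
    task_loss = nn.functional.cross_entropy(logits, task_labels)
    eeg_loss = sum(nn.functional.mse_loss(p, t)
                   for p, t in zip(eeg_preds, eeg_targets))
    return task_loss + alpha * eeg_loss
```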
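The back-projection onto image space could likewise be implemented in several ways. As a stand-in, the sketch below uses a simple gradient-based approximation, back-propagating a chosen block's summed activation to the input pixels of the hypothetical model above; this is an assumption for illustration and not the deconvolution procedure the authors used.

```python
# Illustrative sketch (an assumption, not the authors' pipeline):
# gradient-based approximation of back-projecting a block's activation
# onto image space for the hypothetical BrainSupervisedCNN above.
import torch

def backproject_to_image(model, image, layer_index=0):
    """Return a per-pixel map of how strongly each image location
    drives the chosen convolutional block's activation."""
    image = image.detach().clone().requires_grad_(True)
    x = image
    for i, block in enumerate(model.blocks):
        x = block(x)
        if i == layer_index:
            break
    # Use the summed activation as a scalar signal to propagate back to pixels.
    x.sum().backward()
    # Collapse color channels into one spatial relevance map per image.
    return image.grad.abs().sum(dim=1)
```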

Acknowledgements: James S. McDonnell Foundation grant (220020430) to BCH; National Science Foundation grant (1736394) to BCH and MRG.