How does the human brain select and combine features over time to support goal-oriented behavior?
Poster Presentation 33.328: Sunday, May 17, 2026, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Scene Perception: Neural mechanisms
Bruce Hansen1, Michelle Greene2, Audrey Kris1; 1Colgate University, 2Barnard College, Columbia University
Vision supports multiple behavioral goals, so the neural code must flexibly weight different features over time. The visual system selects a cascade of features for scene categorization (Greene & Hansen, 2020), and task demands alter the prioritization of those features (Hansen et al., 2025). However, when and how goal-relevant features are spatially selected remains an open question. To address this question, we built a novel brain-guided convolutional neural network (CNN) with dedicated channels guided by the neural variance explained by each of eight features. Neural data (128-channel EEG) were collected from participants who viewed 78 scenes while performing either a path navigation task or a seating location task. The output layer was designed to classify each input image according to the observer's task while viewing it. To predict an observer's task, the CNN combined image features with task-specific neural data: in each convolutional layer, eight independent sets of nodes provided dedicated processing of features ranging from low-level wavelet and texture models, to intermediate object and affordance models, to high-level task-specific sentence embeddings. The network achieved 89.7% accuracy in task prediction. Using deconvolution, we projected feature-specific activations onto image locations to estimate each feature's regional contribution by task. We compared these feature-specific activation maps with behavioral maps in which participants indicated the regions relevant for walking or sitting in each image. Later network layers (neural data > 250 ms post-stimulus onset) aligned best with behavior. Critically, task-specific features contributed most: the navigation task was best captured by object-based features, navigation-task sentence embeddings, and general-description sentence embeddings, whereas texture and affordance features, combined with sitting-task sentence embeddings, best disentangled the feature space when participants performed the sitting task. These findings demonstrate that goal-dependent, time-evolving neural representations spatially reweight features to support behavior.
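The abstract describes, but does not specify, how the brain-guided CNN is organized. The sketch below is a minimal illustration of one way such a network could be built, assuming a PyTorch implementation in which each convolutional layer contains eight parallel feature branches scaled by EEG-derived variance weights and feeding a task-classification head; all class names, layer sizes, and the weighting scheme are assumptions for illustration, not the authors' implementation.

```python
# Minimal sketch (not the authors' code): a conv layer split into eight
# parallel "feature channels", each scaled by the neural variance explained
# by its feature model, followed by a task-classification output layer.
import torch
import torch.nn as nn

N_FEATURES = 8   # wavelet, texture, object, affordance, sentence embeddings, ...
N_TASKS = 2      # path navigation vs. seating location

class FeatureChannelConv(nn.Module):
    """One conv block with eight independent sets of nodes, one per feature."""
    def __init__(self, in_ch, out_ch_per_feature, neural_variance):
        super().__init__()
        # One small convolutional branch per feature model
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(in_ch, out_ch_per_feature, kernel_size=3, padding=1),
                nn.ReLU(),
            )
            for _ in range(N_FEATURES)
        )
        # Fixed per-feature weights derived from EEG variance explained (assumed scheme)
        self.register_buffer("w", torch.as_tensor(neural_variance, dtype=torch.float32))

    def forward(self, x):
        # Scale each branch by its neural weight, then concatenate along channels
        outs = [self.w[i] * branch(x) for i, branch in enumerate(self.branches)]
        return torch.cat(outs, dim=1)

class BrainGuidedCNN(nn.Module):
    def __init__(self, neural_variance):
        super().__init__()
        self.layer1 = FeatureChannelConv(3, 8, neural_variance)
        self.layer2 = FeatureChannelConv(8 * N_FEATURES, 16, neural_variance)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.task_head = nn.Linear(16 * N_FEATURES, N_TASKS)

    def forward(self, img):
        h = self.layer2(self.layer1(img))
        return self.task_head(self.pool(h).flatten(1))

# Placeholder variance-explained values for the eight feature models
model = BrainGuidedCNN(neural_variance=[0.12] * N_FEATURES)
logits = model(torch.randn(1, 3, 224, 224))  # -> task logits (navigation vs. sitting)
```

In this sketch, keeping each feature's nodes in a separate branch is what would allow feature-specific activations to be projected back onto image locations (e.g., via deconvolution or guided backpropagation) and compared with the behavioral relevance maps.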
Acknowledgements: James S. McDonnell Foundation grant (220020430) to BCH; National Science Foundation grant (2522311/2) to MRG and BCH.