Attention: Neural mechanisms

Talk Session: Tuesday, May 21, 2024, 8:15 – 9:45 am, Talk Room 2
Moderator: Li Zhaoping, Max Planck Institute

Talk 1, 8:15 am, 51.21

Long-range modulatory feedback connections in deep neural networks support top-down category-based attention

Talia Konkle1, George A. Alvarez1; 1Harvard University

Many views of the world are cluttered, with multiple kinds of objects present, but at any given moment only a subset of this information may be task-relevant. Top-down attention can direct visual encoding based on internal goals: e.g., when looking for keys, attention mechanisms select and amplify key-like image statistics, aiding detection and modulating gain across the visual hierarchy. Motivated by findings from visual cognition and visual neuroscience, we designed long-range modulatory feedback pathways to outfit deep neural network models, with learnable channel-to-channel influences between source and destination layers that spatially broadcast feature-based gain signals. We trained a series of AlexNets with varying feedback pathways on 1000-way ImageNet classification to be accurate on both their feed-forward and modulated passes. First, we show that models equipped with these feedback pathways naturally show improved image recognition, adversarial robustness, and emergent brain-alignment relative to baseline models. Critically, the final layer of these models can serve as a flexible communication interface between visual and cognitive systems, where cognitive-level goals (e.g., “key?”) can be specified as vectors in the output space and naturally leverage feedback projections to modulate earlier hierarchical processing stages. We compare candidate steering signals and identify effective ways to ‘cognitively steer’ the model based on prototype representations, which dramatically improve recognition of categories in composite images containing multiple categories, succeeding where baseline feed-forward models fail. Further, these models recapitulate neural signatures of category-based attention, e.g., showing modulation of face- and scene-selective units inside the model when attending to either faces or scenes in a fixed face-scene composite image.
Broadly, these models offer a mechanistic account of top-down category-based attention, demonstrating how long-range modulatory feedback pathways can allow different goal states to make flexible use of fixed visual circuitry, supporting dynamic goal-based routing of incoming visual information.
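The core mechanism described above, a learnable channel-to-channel influence that spatially broadcasts feature-based gain from a late layer to an early one, can be caricatured in a few lines of NumPy. This is a toy sketch under assumed dimensions, not the authors' AlexNet implementation; `modulated_pass`, the matrix `M`, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumptions, not the paper's actual layer sizes).
n_src, n_dst = 8, 4          # channels in source (late) and destination (early) layers
H = W = 5                    # spatial size of the destination feature map

# Learnable channel-to-channel influence matrix: how each late-layer channel
# modulates each early-layer channel (random here, learned in the real model).
M = rng.normal(scale=0.1, size=(n_src, n_dst))

def modulated_pass(dst_features, src_activity):
    """Spatially broadcast feature-based gain from a late layer to an early one.

    dst_features : (n_dst, H, W) feed-forward activations at the destination layer
    src_activity : (n_src,)      pooled source activations, or a goal vector
                                 specified in the output space (e.g. "key?")
    """
    # One multiplicative gain per destination channel, broadcast over space.
    gain = 1.0 + src_activity @ M            # shape (n_dst,)
    return dst_features * gain[:, None, None]

feats = rng.random((n_dst, H, W))
goal = rng.random(n_src)                      # hypothetical category prototype
out = modulated_pass(feats, goal)
```

With a zero goal vector the gain is 1 everywhere, recovering the plain feed-forward pass, which mirrors the requirement that the models stay accurate on both passes.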

Acknowledgements: CRCNS 8431439-01 to TK and NSF PAC COMP-COG 1946308 to GAA

Talk 2, 8:30 am, 51.22

7T CBV fMRI reveals cortical microcircuits of bottom-up saliency in the human brain

Peng Zhang1, Chen Liu1, Chengwen Liu1, Li Zhaoping2; 1Institute of Biophysics, Chinese Academy of Sciences, 2Max Planck Institute for Biological Cybernetics

A visual item in sharp contrast with its neighbors automatically captures attention. Whether bottom-up saliency signals arise initially in the primary visual cortex (V1) or in the parietal cortex remains controversial. To distinguish these two hypotheses, we investigated the cortical microcircuits of bottom-up saliency with cortical layer-dependent CBV fMRI at 7 Tesla. Behavioral experiments measured contrast detection performance for orientation singletons presented at either low (15 degrees) or high (90 degrees) orientation contrast within uniformly oriented background bars. Contrast sensitivity was higher for singletons with high than with low orientation contrast. CBV-weighted fMRI results showed that the orientation-saliency signal was strongest in the superficial layers of V1, and peaked in the middle layers of V2/V3 and the intraparietal sulcus (IPS). Contrast sensitivities to the orientation singletons also correlated with CBV signals in the superficial layers of V1. These findings support the hypothesis that the bottom-up saliency map is initially created by iso-feature suppression through lateral inhibition in V1 superficial layers and is then projected to the parietal cortex through feedforward connections.

Talk 3, 8:45 am, 51.23

Role of Theta Oscillations in Top-Down Control of Feature-based Attention

Sreenivasan Meyyappan1, Mingzhou Ding2, George Ron Mangun1,3,4; 1Center for Mind and Brain, University of California Davis, 2J Crayton Pruitt Family Department of Biomedical Engineering, University of Florida, 3Department of Psychology, University of California Davis, 4Department of Neurology, University of California Davis

Applying top-down control to selectively process and distinguish visual stimuli based on attributes such as color or motion is known as feature-based attention. Attention-control signals from specialized regions of frontal and parietal cortex, known collectively as the dorsal attention network (DAN), are reported to bias activity in visual cortex in favor of the attended feature. Prior work has identified a role for alpha oscillations (8-12 Hz) in the modulation of sensory processing in visual cortex. However, it remains unknown whether, and which, oscillatory neural activity supports network communication and integration within and between the nodes of the attentional control network. We hypothesize that the nodes of the DAN dynamically interact via theta-band (3-7 Hz) activity, and that this coordination enables the DAN to send top-down control signals to visual cortex. We investigated this by recording EEG during a cued feature-attention experiment in which participants were cued on a trial-by-trial basis to attend either the direction of motion or the color of the forthcoming stimuli (moving dots). Using multivariate decoding to compare attend-color versus attend-motion in the post-cue/pre-target period, we observed that patterns of theta and alpha activity predicted the attended feature and, importantly, that the theta-band decoding time course temporally preceded decoding in the alpha band. Further, we estimated spectral coherence between an ensemble of frontal and parietal scalp electrodes, as an index of cortical synchronization between attention-control networks, in different frequency bands (theta, alpha, beta, and gamma); relative to decoding on surrogate (temporally shuffled) data, significant decoding was observed only in the theta band. These results highlight the distinct role of theta oscillations in enabling top-down control of selective sensory processing at the visual cortical level.
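The logic of the band-specific synchronization analysis can be illustrated with a simulation: two "electrodes" that share only a 5 Hz (theta) component should show high synchrony in the theta band but not in the alpha band. The crude FFT band-pass and the correlation-based `band_sync` index below are simplifying assumptions for illustration, not the authors' spectral-coherence pipeline.

```python
import numpy as np

fs = 250.0                          # sampling rate in Hz (assumption)
t = np.arange(0, 20, 1 / fs)        # 20 s of simulated EEG
rng = np.random.default_rng(1)

# Two electrodes share a 5 Hz theta component plus independent noise.
theta = np.sin(2 * np.pi * 5 * t)
x = theta + 0.5 * rng.standard_normal(t.size)
y = theta + 0.5 * rng.standard_normal(t.size)

def bandpass(sig, lo, hi):
    """Zero out Fourier coefficients outside [lo, hi] Hz (crude FFT filter)."""
    f = np.fft.rfftfreq(sig.size, 1 / fs)
    spec = np.fft.rfft(sig)
    spec[(f < lo) | (f > hi)] = 0
    return np.fft.irfft(spec, sig.size)

def band_sync(a, b, lo, hi):
    """Correlation of band-limited signals as a simple synchrony index."""
    xa, xb = bandpass(a, lo, hi), bandpass(b, lo, hi)
    return np.corrcoef(xa, xb)[0, 1]

theta_sync = band_sync(x, y, 3, 7)     # theta band (3-7 Hz)
alpha_sync = band_sync(x, y, 8, 12)    # alpha band (8-12 Hz)
```

The shared component drives synchrony only where it actually lives in frequency, which is the intuition behind testing each band against temporally shuffled surrogates.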

Talk 4, 9:00 am, 51.24

Object-based association fields for grouping and attention

Hossein Adeli1, Seoyoung Ahn2, Gregory Zelinsky2, Nikolaus Kriegeskorte1; 1Columbia University, 2Stony Brook University

What are the neural mechanisms that group visual features into coherent object percepts? Association fields, mediated by long-range horizontal connections, have been shown to dynamically configure neural responses in early visual areas to form objects from collinear line segments. We propose that such association fields also exist in higher visual areas and contribute to object-based grouping and attention. To test this hypothesis, we modeled the connection strengths of the association fields as the similarity between local image features from a transformer-based vision model. We then tested the effectiveness of these object-based associations using a well-established grouping task, the two-dot paradigm, in which the model must determine whether a central and a peripheral dot lie on the same or different objects in a natural scene. Our model performs this grouping task by gradually spreading attention, mediated by the association field, from the two dot locations to neighboring areas. Attention remained largely confined to the object as it spread, showing for the first time the plausibility of attention spread through horizontal connections as an object-grouping mechanism in scenes. The model reaches a 'same-object' decision when the two segments show a sufficient level of agreement in their feature representations, according to a predefined threshold. We observed a significant correlation between the time the model took to reach its decision and human reaction times in the same task (72 participants, 1020 trials; r = 0.32, p < 0.001), substantially closing the gap between baseline models and subject-subject agreement (r = 0.42). In this work, we hypothesize, and provide evidence for, how object-based association fields can mediate the spread of attention to group objects in natural scenes, providing novel hypotheses to be tested in neuroscience.
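The spreading procedure can be caricatured on a toy grid of local feature vectors: attention starts at a seed location and expands to neighbors whose features are sufficiently similar, a stand-in for association-field connection strengths. Everything here, including the grid size, the cosine threshold, and `spread_attention`, is an assumed simplification, not the study's transformer-based model.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "scene": a 6x6 grid of local feature vectors (stand-ins for patch
# embeddings from a transformer-based vision model).
d = 64
grid = rng.standard_normal((6, 6, d))

# One contiguous "object" whose patches share a common feature direction.
obj = rng.standard_normal(d)
for r, c in [(1, 1), (1, 2), (2, 1), (2, 2), (3, 2)]:
    grid[r, c] = obj + 0.1 * rng.standard_normal(d)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

def spread_attention(grid, seed_rc, thresh=0.7, n_iter=10):
    """Iteratively spread attention from a seed to feature-similar neighbors."""
    h, w, _ = grid.shape
    attended = {seed_rc}
    for _ in range(n_iter):
        frontier = set()
        for (r, c) in attended:
            for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                nr, nc = r + dr, c + dc
                if 0 <= nr < h and 0 <= nc < w and (nr, nc) not in attended:
                    # Association strength = feature similarity between patches.
                    if cosine(grid[r, c], grid[nr, nc]) > thresh:
                        frontier.add((nr, nc))
        if not frontier:
            break
        attended |= frontier
    return attended

same_object = spread_attention(grid, (1, 1))
```

A two-dot judgment then reduces to asking whether both dot locations end up inside the attended set; attention stays within the object because similarity links rarely cross object boundaries.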

Talk 5, 9:15 am, 51.25

The role of expectations in visual spatial coding across the visual hierarchy

Ningkai Wang1,2,3, Ralph Wientjens1,2,3, Jurjen Heij2, Gilles de Hollander4, Jan Theeuwes1,2, Tomas Knapen1,2,3; 1Vrije Universiteit Amsterdam, Amsterdam, Netherlands, 2Spinoza Centre for Neuroimaging, Royal Netherlands Academy of Sciences, Amsterdam, Netherlands, 3Institute Brain and Behavior Amsterdam (iBBA), Amsterdam, Netherlands, 4Zurich Center for Neuroeconomics, Department of Economics, University of Zurich, Zurich, Switzerland

Predictive processing theorizes that the brain predicts events based on prior experiences; mismatches between predictions and input lead to prediction errors (PEs). Despite the theory's popularity, our understanding of the role of PEs in visual spatial perception remains limited. Here, we investigated the coding of predicted and unpredicted visual locations across the visual hierarchy, exploiting the predictability of the standard population receptive field (pRF) mapping paradigm while sampling BOLD responses with ultra-high-field fMRI. Our experiment featured different conditions in which unpredictable stimulus omissions and/or violations (different bar location and orientation) were either embedded in the standard stimulus sequence or presented separately. These conditions produce prediction errors both for stimulus presence and for stimulus location. For all conditions, we first calculated the test-retest reliability of BOLD responses to identical stimulus sequences in different brain regions, reasoning that if PEs drive BOLD responses, test-retest reliability across runs should increase relative to a fully predictable stimulus design. We indeed found this pattern of results selectively in higher-level, but not lower-level, visual cortex. Next, we fit a spatial divisive normalization (DN-pRF) model to the BOLD time courses from the standard pRF stimulus sequence, and tested whether BOLD time courses in conditions with unexpected stimuli follow this model, which is linear in time. This analysis also indicates that PEs drive high-level visual cortex responses more than low-level visual cortex responses. These findings suggest that prediction-error responses in visual cortex follow the evolution of temporal scales of integration, from fast to slow, along the visual hierarchy, hinting at a tight relationship between temporal divisive normalization and predictive processing.
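The divisive-normalization pRF idea (a narrow activation pool divided by a broader suppressive pool) can be sketched in one dimension. The geometry, the parameter values, and `dn_prf_response` below are illustrative assumptions, not the fitted DN-pRF model from the study.

```python
import numpy as np

x = np.linspace(-10, 10, 201)   # 1-D visual-field positions in degrees (assumption)

def gauss(x, mu, sd):
    return np.exp(-0.5 * ((x - mu) / sd) ** 2)

def dn_prf_response(stim, mu=0.0, sd_act=1.0, sd_sup=3.0,
                    a=1.0, b=0.05, c=1.0, d=0.5):
    """Divisive-normalization pRF: activation pool / broader suppressive pool.

    stim : binary stimulus aperture sampled on x (1 where the bar is).
    All parameters are illustrative, not fitted values.
    """
    drive = a * np.sum(stim * gauss(x, mu, sd_act)) + b
    suppress = c * np.sum(stim * gauss(x, mu, sd_sup)) + d
    return drive / suppress - b / d

# Response to a 1-deg-wide bar sweeping across the visual field, as in a
# standard pRF mapping sequence.
positions = np.linspace(-8, 8, 33)
responses = [dn_prf_response((np.abs(x - pos) < 0.5).astype(float))
             for pos in positions]
```

The ratio form produces a sharp peak when the bar covers the pRF center and mild suppression when it falls only in the surround, which is the kind of nonlinearity the DN-pRF fit captures.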

Talk 6, 9:30 am, 51.26

Stimulus representations in neural priority maps are equally enhanced by attention independent of the number of attended locations

Amelia Harrison1, Daniel Thayer1, Thomas Sprague1; 1UC Santa Barbara

When spatial attention is distributed across multiple visual field locations, performance in visual tasks is often impaired. This bottleneck is evident in behavioral and neural studies, especially with complex stimuli, and is echoed in behavioral and neural measurements of visual working memory. Some studies have suggested that this is because distributed attention yields lower attentional enhancement in visual cortex than focused attention (e.g., McMains & Somers, 2005), while others support a bottleneck at a post-perceptual decision-making stage (e.g., White et al., 2017; Harrison et al., 2022; Chen & Seidemann, 2012). To characterize how stimulus representations in neural priority maps reflect these constraints, and to discriminate between these models, we scanned participants with fMRI while they performed a selective attention task in which they were cued on each trial to discriminate a target appearing at the fixation point, one cued location, or two cued locations. Using a spatial inverted encoding model, we reconstructed images of priority maps from retinotopic brain regions that contained representations of each stimulus. Comparing map activation between focal-attention and fixation conditions replicated the canonical finding that attention to one stimulus enhances map activation at the attended stimulus location. Next, we examined map activation when both stimuli were attended. Strikingly, both stimulus representations were enhanced when attended, with an increase in map activation equivalent to that observed when focused attention was directed to a single stimulus. This pattern was consistent across retinotopic cortex, with no evidence for graded attentional enhancement in any region. Thus, our results are consistent with a model in which fMRI signals are enhanced when a stimulus is attended, and the degree of enhancement does not wane as the number of attended stimuli increases.
Such a ‘relevance’ marker may be used to identify neural populations for selective readout from relevant locations during decision-making.
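The spatial inverted encoding model behind such reconstructions works in two steps: estimate channel-to-voxel weights from training data by least squares, then invert those weights to recover channel responses on held-out data. Below is a toy NumPy sketch with simulated voxels and an assumed cosine channel basis; the sizes, basis, and simulation are all illustrative, not the study's actual stimuli or analysis code.

```python
import numpy as np

rng = np.random.default_rng(2)
n_vox, n_chan, n_trials = 50, 8, 96

def channel_responses(loc):
    """Idealized tuning of 8 spatial channels to a stimulus at location `loc`
    (cosine basis raised to a power; a common IEM choice, assumed here)."""
    centers = np.arange(n_chan)
    d = np.minimum(np.abs(loc - centers), n_chan - np.abs(loc - centers))
    return np.maximum(0, np.cos(np.pi * d / n_chan)) ** 6

# Simulated training data: random stimulus locations, linear voxel mixing.
W = rng.standard_normal((n_vox, n_chan))               # "true" voxel weights
locs = rng.integers(0, n_chan, n_trials)
C_train = np.stack([channel_responses(l) for l in locs])    # trials x channels
B_train = C_train @ W.T + 0.1 * rng.standard_normal((n_trials, n_vox))

# Step 1: estimate weights by least squares (solves B = C @ W_hat.T).
W_hat = np.linalg.lstsq(C_train, B_train, rcond=None)[0].T  # voxels x channels

# Step 2: invert the model for a new trial to reconstruct channel responses,
# which can then be projected into visual-field coordinates as a priority map.
b_test = channel_responses(3) @ W.T                    # held-out voxel pattern
c_hat = np.linalg.lstsq(W_hat, b_test, rcond=None)[0]
```

In the study, comparing the reconstructed map activation at attended versus unattended locations across conditions is what reveals the equivalent enhancement for one and two attended stimuli.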

Acknowledgements: Research supported by Cooperative Agreement W911NF-19-2-0026 for the Institute for Collaborative Biotechnologies, a University of California, Santa Barbara Academic Senate Research Grant, and an Alfred P. Sloan Research Fellowship.