2025 Sponsor Application

Welcome to VSS 2025

The 25th Annual Meeting of the Vision Sciences Society will be held
May 16-21, 2025 at the TradeWinds Island Resort in St. Pete Beach, Florida.
We hope you can join us.

Here are some key dates to put on your calendar:

October 22, 2024 – Submissions Open
December 5, 2024 – Submissions Close
December 16, 2024 – Early Registration Opens

We look forward to seeing you in May!

Using deep networks to re-imagine object-based attention and perception

Symposium: Friday, May 17, 2024, 5:00 – 7:00 pm, Talk Room 2

Organizers: Hossein Adeli1, Seoyoung Ahn2, Gregory Zelinsky2; 1Columbia University, 2Stony Brook University
Presenters: Patrick Cavanagh, Frank Tong, Paolo Papale, Alekh Karkada Ashok, Hossein Adeli, Melissa Le-Hoa Võ

What can Deep Neural Network (DNN) methods tell us about the brain mechanisms that transform visual features into object percepts? Using state-of-the-art models, the speakers in this symposium will reexamine the cognitive and neural mechanisms of object-based attention (OBA) and perception and consider new computational mechanisms for how the visual system groups visual features into coherent object percepts. Our first speaker, Patrick Cavanagh, helped create the field of OBA and is therefore uniquely suited to give a perspective on how this question, essentially the feature-binding problem, has evolved over the years and has been shaped by paradigms and available methods. He will conclude by outlining his vision for how DNN architectures create new perspectives on understanding OBA. The next two speakers will review recent behavioral and neural findings on object-based attention and feature grouping. Frank Tong will discuss the neural and behavioral signatures of OBA using fMRI and eye-tracking methods. He will demonstrate how the human visual system represents objects across the hierarchy of visual areas. Paolo Papale will discuss neurophysiological evidence for the role of OBA and grouping in object perception. Using stimuli systematically increasing in complexity from lines to natural objects (against cluttered backgrounds), he will show that OBA and grouping are iterative processes. Both talks will also include discussions of current modeling efforts, and what additional measures may be needed to realize more human-like object perception. The following two talks will provide concrete examples of how DNNs can be used to predict human behavior during different tasks. Lore Goetschalckx will focus on the importance of considering the time-course of grouping in object perception and will discuss her recent work on developing a method to analyze the dynamics of different models. Using this method, she will show how a deep recurrent model trained on an object grouping task predicts human reaction time. Hossein Adeli will review modeling work on three theories of how OBA binds features into objects: one that implements object-files, another that uses generative processes to reconstruct an object percept, and a third model of spreading attention through association fields. In the context of these modeling studies, he will describe how each of these mechanisms was implemented as a DNN architecture. Lastly, Melissa Võ will drive home the importance of object representations and how they collectively create an object context that humans use to control their attention behavior in naturalistic settings. She will show how GANs can be used to study the hidden representations underlying our perception of objects. This symposium is timely because advances in computational methods have made it possible to put old theories to the test and to develop new theories of OBA mechanisms that engage the role played by attention in creating object-centric representations.

Talk 1

The Architecture of Object-Based Attention

Patrick Cavanagh1, Gideon P. Caplovitz2, Taissa K. Lytchenko2, Marvin R. Maechler3, Peter U. Tse3, David R. Sheinberg4; 1Glendon College, York University, 2University of Nevada, Reno, 3Dartmouth College, 4Brown University

Evidence for the existence of object-based attention raises several important questions: what are objects, how does attention access them, and what anatomical regions are involved? What are the “objects” that attention can access? Several studies have shown that items in visual search tasks are only loose collections of features prior to the arrival of attention. Nevertheless, findings from a wide variety of paradigms, including unconscious priming and cuing, have overturned this view. Instead, the targets of object-based attention appear to be fully developed object representations that have reached the level of identity prior to the arrival of attention. Where do the downward projections of object-based attention originate? Current research indicates that the control of object-based attention must come from ventral visual areas specialized in object analysis that project downward to early visual areas. If so, how can feedback from object areas accurately target the object’s early locations and features when the object areas have only crude location information? Critically, recent work on autoencoders has made this plausible, as they are capable of recovering the locations and features of the target objects from the high-level, low-dimensional codes in the object areas. I will outline the architecture of object-based attention, the novel predictions it brings, and discuss how it works in parallel with other attention pathways.
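To make the autoencoder point concrete, here is a toy sketch (our illustration, not the authors' model): a small autoencoder is trained on synthetic oriented bars, and reading its low-dimensional bottleneck code back out through the decoder recovers roughly where the bar was. The image size, code dimensionality, and training settings below are arbitrary assumptions chosen only for the demo.

```python
# Toy sketch (not the authors' model): an autoencoder whose low-dimensional
# bottleneck code still supports recovery of an object's location, illustrating
# why top-down feedback from compact object codes could re-target early
# retinotopic locations. All sizes and training settings are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

def render(x, y, ori, size=16):
    """Render a tiny 'object': an oriented bar centred at (x, y)."""
    img = torch.zeros(size, size)
    for t in torch.linspace(-3, 3, 25):
        xi = int(round(x + t.item() * torch.cos(ori).item()))
        yi = int(round(y + t.item() * torch.sin(ori).item()))
        if 0 <= xi < size and 0 <= yi < size:
            img[yi, xi] = 1.0
    return img.flatten()

# Synthetic training set: bars at random locations and orientations.
params = [(float(torch.randint(3, 13, (1,))), float(torch.randint(3, 13, (1,))),
           torch.rand(1) * 3.1416) for _ in range(2000)]
X = torch.stack([render(x, y, o) for x, y, o in params])

enc = nn.Sequential(nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 6))   # 6-D "object code"
dec = nn.Sequential(nn.Linear(6, 64), nn.ReLU(), nn.Linear(64, 256))
opt = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=1e-3)

for epoch in range(1000):                     # brief full-batch training, enough for a demo
    recon = dec(enc(X))
    loss = nn.functional.mse_loss(recon, X)
    opt.zero_grad(); loss.backward(); opt.step()

# Decoding the low-dimensional code recovers approximately where the object was.
test = render(4.0, 11.0, torch.tensor(0.7))
recon = dec(enc(test)).detach().reshape(16, 16)
peak = torch.nonzero(recon == recon.max())[0]
print("true location (row, col): (11, 4); reconstruction peak:", peak.tolist())
```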

Talk 2

Behavioral and neural signatures of object-based attention in the human visual system

Frank Tong1, Sonia Poltoratski1, David Coggan1, Lasyapriya Pidaparthi1, Elias Cohen1; 1Vanderbilt University

How might one demonstrate the existence of an object representation in the visual system? Does objecthood arise preattentively, attentively, or via a confluence of bottom-up and top-down processes? Our fMRI work reveals that orientation-defined figures are represented by enhanced neural activity in the early visual system. We observe enhanced fMRI responses in the lateral geniculate nucleus and V1, even for unattended figures, implying that core aspects of scene segmentation arise from automatic perceptual processes. In related work, we find compelling evidence of object completion in early visual areas. fMRI response patterns to partially occluded object images resemble those evoked by unoccluded objects, with comparable effects of pattern completion found for unattended and attended objects. However, in other instances, we find powerful effects of top-down attention. When participants must attend to one of two overlapping objects (e.g., face vs. house), activity patterns from V1 through inferotemporal cortex are biased in favor of the covertly attended object, with functional coupling of the strength of object-specific modulation found across brain areas. Finally, we have developed a novel eye-tracking paradigm to predict the focus of object-based attention while observers view two dynamically moving objects that mostly overlap. Estimates of the precision of gaze following suggest that observers can entirely filter out the complex motion signals arising from the task-irrelevant object. To conclude, I will discuss whether current AI models can adequately account for these behavioral and neural properties of object-based attention, and what additional measures may be needed to realize more human-like object processing.
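One common way to quantify such attentional biases in pattern responses is sketched below with simulated data (this is a generic pattern-correlation analysis, not the authors' pipeline; the gain and noise parameters are invented for illustration): the response to overlapping face and house stimuli is compared to face-alone and house-alone template patterns, and the difference in correlations tracks which object was attended.

```python
# Minimal sketch of a pattern-bias analysis (illustrative, not the authors'
# exact pipeline): does the multi-voxel response to overlapping face+house
# stimuli look more like the "face alone" or the "house alone" template,
# depending on which object is covertly attended? All data here are simulated.
import numpy as np

rng = np.random.default_rng(0)
n_vox = 500
face_tmpl = rng.normal(size=n_vox)            # template pattern: face presented alone
house_tmpl = rng.normal(size=n_vox)           # template pattern: house presented alone

def overlap_response(attend="face", gain=0.6, noise=1.0):
    """Simulated response to overlapping stimuli with an attentional gain (made-up values)."""
    w_face = gain if attend == "face" else 1 - gain
    return w_face * face_tmpl + (1 - w_face) * house_tmpl + rng.normal(scale=noise, size=n_vox)

for attend in ("face", "house"):
    resp = overlap_response(attend)
    bias = np.corrcoef(resp, face_tmpl)[0, 1] - np.corrcoef(resp, house_tmpl)[0, 1]
    print(f"attend {attend}: face-minus-house pattern correlation = {bias:+.2f}")
```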

Talk 3

The spread of object attention in artificial and cortical neurons

Paolo Papale1, Matthew Self1, Pieter Roelfsema1; 1Netherlands Institute for Neuroscience

A crucial function of our visual system is to group local image fragments into coherent perceptual objects. Behavioral evidence has shown that this process is iterative and time-consuming. A simple theory suggested that visual neurons can solve this challenging task by relying on recurrent processing: attending to an object could produce a gradual spread of enhancement across its representation in the visual cortex. Here, I will present results from a biologically plausible artificial neural network that can solve object segmentation by attention. This model was able to identify and segregate individual objects in cluttered scenes with high accuracy, using only modulatory top-down feedback as observed in visual cortical neurons. Then, I will present comparable results from large-scale electrophysiology recordings in the macaque visual cortex. We tested the effect of object attention with stimuli of increasing complexity, from lines to natural objects against cluttered backgrounds. Consistent with behavioral observations, the iterative model correctly predicted the spread of attentional modulation in visual neurons for simple stimuli. However, for more complex stimuli containing recognizable objects, we observed asynchronous but not iterative modulation. Thus, we produced a set of hybrid stimuli, combining local elements of two different objects, which we alternated with the presentation of intact objects. By doing so, we made local information unreliable, forcing the monkey to solve the task iteratively. Indeed, we observed that this set of stimuli induced iterative attentional modulations. These results provide the first systematic investigation of object attention in both artificial and cortical neurons.
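The core idea that grouping by attentional spread is iterative, and therefore slower for larger objects, can be illustrated with a toy simulation (ours, not the published network): enhancement starts at a cued location and, on each recurrent step, spreads only to neighbouring locations belonging to the same object.

```python
# Toy sketch of iterative attentional spreading (the core idea, not the
# published network): enhancement starts at a cued pixel and, on each
# recurrent step, spreads only to neighbouring pixels of the same object,
# so grouping a larger object takes more iterations.
import numpy as np

scene = np.zeros((10, 10), dtype=int)
scene[2, 1:9] = 1          # object 1: a horizontal line
scene[5:9, 3:5] = 2        # object 2: a rectangle

def spread_attention(scene, seed, max_steps=50):
    label = scene[seed]
    enhanced = np.zeros(scene.shape, dtype=bool)
    enhanced[seed] = True
    for step in range(1, max_steps + 1):
        grow = np.zeros_like(enhanced)
        ys, xs = np.nonzero(enhanced)
        for y, x in zip(ys, xs):                       # spread to 4-neighbours
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < scene.shape[0] and 0 <= nx < scene.shape[1] and scene[ny, nx] == label:
                    grow[ny, nx] = True
        new = grow & ~enhanced
        if not new.any():                              # nothing new: object fully grouped
            return step, enhanced
        enhanced |= new
    return max_steps, enhanced

steps, mask = spread_attention(scene, seed=(2, 1))
print(f"object 1 grouped in {steps} recurrent steps, {int(mask.sum())} pixels enhanced")
```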

Talk 4

Time to consider time: Comparing human reaction times to dynamical signatures from recurrent vision models on a perceptual grouping task

Alekh Karkada Ashok1, Lore Goetschalckx1, Lakshmi Narasimhan Govindarajan1, Aarit Ahuja1, David Sheinberg1, Thomas Serre1; 1Brown University

To make sense of its retinal inputs, our visual system organizes perceptual elements into coherent figural objects. This perceptual grouping process, like many aspects of visual cognition, is believed to be dynamic and at least partially reliant on feedback. Indeed, cognitive scientists have studied its time course through reaction time (RT) measurements and have associated it with a serial spread of object-based attention. Recent progress in biologically inspired machine learning has put forward convolutional recurrent neural networks (cRNNs) capable of exhibiting and mimicking visual cortical dynamics. To understand how the visual routines learned by cRNNs compare to those of humans, we need ways to extract meaningful dynamical signatures from a cRNN and study temporal human-model alignment. We introduce a framework to train, analyze, and interpret cRNN dynamics. Our framework triangulates insights from attractor-based dynamics and evidential learning theory. We derive a stimulus-dependent metric, ξ, and directly compare it to existing human RT data on the same task: a grouping task designed to study object-based attention. The results reveal a “filling-in” strategy learned by the cRNN, reminiscent of the serial spread of object-based attention in humans. We also observe a remarkable alignment between ξ and human RT patterns for diverse stimulus manipulations. This alignment emerged purely as a byproduct of the task constraints (no supervision on RT). Our framework paves the way for testing further hypotheses on the mechanisms supporting perceptual grouping and object-based attention, as well as for inter-model comparisons looking to improve the temporal alignment with humans on various other cognitive tasks.
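The abstract does not spell out how ξ is computed, so the sketch below is only a rough illustration of the general idea, under our own simplifying assumption that a model "reaction time" can be read off as the first time step at which an uncertainty signal falls below a criterion; harder groupings keep uncertainty high for longer and therefore yield larger values.

```python
# Rough sketch (our illustration, not the authors' definition of xi): one way
# to read a "model reaction time" off a recurrent network is to track an
# uncertainty signal across its time steps and record when it first drops
# below a criterion. Harder stimuli keep uncertainty high for longer.
import numpy as np

rng = np.random.default_rng(1)

def model_uncertainty(difficulty, n_steps=40):
    """Simulated per-step uncertainty of a recurrent model: decays more slowly
    for harder stimuli (decay constants here are arbitrary)."""
    t = np.arange(n_steps)
    return np.exp(-t / (5 + 15 * difficulty)) + 0.02 * rng.normal(size=n_steps)

def xi(uncertainty, criterion=0.2):
    """Stimulus-dependent 'time to commit': first step below the criterion."""
    below = np.nonzero(uncertainty < criterion)[0]
    return int(below[0]) if below.size else len(uncertainty)

for difficulty in (0.1, 0.5, 0.9):
    print(f"difficulty {difficulty:.1f}: xi = {xi(model_uncertainty(difficulty))} steps")
```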

Talk 5

Three theories of object-based attention implemented in deep neural network models

Hossein Adeli1, Seoyoung Ahn2, Gregory Zelinsky2, Nikolaus Kriegeskorte1; 1Columbia University, 2Stony Brook University

Understanding the computational mechanisms that transform visual features into coherent object percepts requires the implementation of theories in scalable models. Here we report on implementations, using recent deep neural networks, of three previously proposed theories in which the binding of features is achieved (1) through convergence in a hierarchy of representations resulting in object-files, (2) through a reconstruction or a generative process that can target different features of an object, or (3) through the elevation of activation by spreading attention within an object via association fields. First, we present a model of object-based attention that relies on capsule networks to integrate features of different objects in the scene. With this grouping mechanism, the model learns to sequentially attend to objects to perform multi-object recognition and visual reasoning. The second modeling study shows how top-down reconstructions of object-centric representations in a sequential autoencoder can target different parts of the object in order to build a more robust and human-like object recognition system. The last study demonstrates how object perception and attention could be mediated by flexible object-based association fields at multiple levels of the visual processing hierarchy. Transformers provide a key relational and associative computation that may also be present in the primate brain, albeit implemented by a different mechanism. We observed that representations in transformer-based vision models can predict the reaction time behavior of people on an object grouping task. We also show that the feature maps can model the spreading of attention within an object.
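As a rough illustration of the third mechanism (association fields), the sketch below treats a self-attention matrix over image patches as an affinity graph and lets activation spread from a cued patch by repeated multiplication with the row-normalised weights. The attention matrix here is simulated; in the modelling work it would come from a trained transformer-based vision model, and all sizes are arbitrary.

```python
# Illustrative sketch of the association-field mechanism: treat a self-attention
# matrix over image patches as an affinity graph and let activation spread from
# a cued patch by repeated multiplication with the (row-normalised) weights.
# The attention matrix is simulated here, not taken from a trained transformer.
import numpy as np

rng = np.random.default_rng(2)
n_patches = 12
object_of = np.array([0] * 6 + [1] * 6)            # patches 0-5 = object A, 6-11 = object B

# Simulated attention: same-object patches attend to each other more strongly.
A = rng.uniform(0.0, 0.2, size=(n_patches, n_patches))
A[object_of[:, None] == object_of[None, :]] += 0.8
A /= A.sum(axis=1, keepdims=True)                  # row-normalise, as a softmax would

activation = np.zeros(n_patches)
activation[2] = 1.0                                # cue one patch of object A
for step in range(5):                              # recurrent spreading steps
    activation = 0.5 * activation + 0.5 * activation @ A

print("mean activation, cued object:", float(activation[object_of == 0].mean().round(3)))
print("mean activation, other object:", float(activation[object_of == 1].mean().round(3)))
```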

Talk 6

Combining Generative Adversarial Networks (GANs) with behavior and brain recordings to study scene understanding

Melissa Le-Hoa Võ1, Aylin Kallmayer1; 1Goethe University Frankfurt

Our visual world is a complex conglomeration of objects that adhere to semantic and syntactic regularities, a.k.a. scene grammar, according to which scenes can be decomposed into phrases – i.e., smaller clusters of objects forming conceptual units – which in turn contain so-called anchor objects. These usually large and stationary objects further anchor predictions regarding the identity and location of most other smaller objects within the same phrase and play a key role in guiding attention and boosting perception during real-world search. They therefore provide an important organizing principle for structuring real-world scenes. Generative adversarial networks (GANs) trained on images of real-world scenes learn the scenes’ latent grammar and can then synthesize images that mimic real-world scenes increasingly well. Therefore, GANs can be used to study the hidden representations underlying object-based perception, serving as testbeds to investigate the role that anchor objects play in both the generation and understanding of scenes. We will present recent work in which we presented participants with real and generated images while recording both behavior and brain responses. Modelling behavioral responses with features from a range of computer vision models, we found that mostly high-level visual features and the strength of anchor information predicted human scene understanding of generated scenes. Using EEG to investigate the temporal dynamics of these processes revealed initial processing of anchor information, which generalized to subsequent processing of the scene’s authenticity. These new findings imply that anchors pave the way to scene understanding and that models predicting real-world attention and perception should become more object-centric.
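A schematic version of the model-to-behaviour analysis is sketched below with simulated data (not the study's pipeline; the feature spaces, "anchor strength" predictor, and sample sizes are invented): human scene-understanding responses are regressed onto model features with cross-validation, and adding an anchor-information predictor is compared against high-level features alone.

```python
# Minimal sketch of a model-to-behaviour analysis (simulated data, not the
# study's pipeline): regress human scene-understanding judgments of generated
# scenes onto high-level model features plus an "anchor strength" predictor,
# with cross-validation to compare predictor sets.
import numpy as np

rng = np.random.default_rng(3)
n_scenes = 200
high_level = rng.normal(size=(n_scenes, 20))       # stand-in for late-layer CNN features
anchor_strength = rng.normal(size=(n_scenes, 1))   # how much anchor information survived
behaviour = high_level[:, :3].sum(1) + 1.5 * anchor_strength[:, 0] + rng.normal(size=n_scenes)

def cv_r(X, y, n_folds=5):
    """Cross-validated correlation between predicted and observed responses."""
    idx = rng.permutation(len(y)); rs = []
    for fold in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, fold)
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        rs.append(np.corrcoef(X[fold] @ beta, y[fold])[0, 1])
    return float(np.mean(rs))

print("high-level features alone:", round(cv_r(high_level, behaviour), 2))
print("plus anchor strength:     ", round(cv_r(np.hstack([high_level, anchor_strength]), behaviour), 2))
```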

The Multifaceted effects of blindness and how sight might be restored

Symposium: Friday, May 17, 2024, 5:00 – 7:00 pm, Talk Room 1

Organizer: Ella Striem-Amit1; 1Georgetown University
Presenters: Lara Coelho, Santani Teng, Woon Ju Park, Elizabeth J. Saccone, Ella Striem-Amit, Michael Beyeler

Congenital blindness illustrates the developmental roots of visual cortex functions. Here, a group of early-career researchers will present various perspectives on the multifaceted effects of blindness on the brain and behavior. To start off the symposium, Coelho will describe the effect of sight loss on multisensory properties, and the reliance on vision to develop an intact multisensory body representation. This presentation will highlight the dependence across modalities, revealing rich interactions between vision and body representations. Turning to a unique manifestation of compensation in blindness, Teng will discuss how echolocation functions in naturalistic settings and its properties as a form of active sensing. Continuing the theme of integration across senses and diving into visual cortical reorganization, Park will argue for partial dependence on, and partial independence from, vision for the development of motion processing in hMT+. Saccone will show evidence for a functional takeover of language over typically face-selective FFA in blindness, showing plasticity beyond sensory representations. Together, these two talks will highlight different views of brain plasticity in blindness. Adding to our discussion of the multifaceted nature of plasticity, Striem-Amit will discuss whether plasticity in the visual cortex is consistent across different blind individuals, showing evidence for divergent visual plasticity and stability over time in adulthood. The last speaker will discuss the challenges and potential for sight restoration using visual prostheses. Beyeler will discuss how some of the challenges of sight restoration can be addressed through perceptual learning of implant inputs. This talk highlights how understanding plasticity in the visual system and across the brain has direct applications for successfully restoring sight. Together, the symposium will bring different theoretical perspectives to illustrate the effects of blindness, revealing the extent and diversity of neural plasticity, and clarify the state-of-the-art capacities for sight restoration.

Talk 1

Implications of visual impairment on body representation

Lara Coelho1, Monica Gori1; 1Unit for visually impaired people, Italian Institute of Technology, Genova, Italy

In humans, vision is the most accurate sensory modality for constructing our representation of space. It has been shown that visual impairment negatively influences daily living and quality of life. For example, spatial and locomotor skills are reduced in this population. One possibility is that these deficiencies arise from a distorted representation of the body. Body representation is fundamental for motor control, because we rely on our bodies as a metric guide for our actions. While body representation is a by-product of multisensory integration, it has been proposed that vision is necessary to construct an accurate representation of the body. In the MySpace project, we are investigating the role of visual experience on haptic body representations in sighted and visually impaired (VI) participants. To this end, we employ a variety of techniques to investigate two key aspects of body representation: 1) size perception, and 2) the plasticity of the proprioceptive system. These techniques include landmark localization, psychophysics, and the rubber hand illusion. Our results in sighted participants show distortions in haptic but not visual body representation. In the VI participants, there are distortions when estimating forearm, hand, and foot size in several different haptic tasks. Moreover, VI children fail to update their perceived body location in the rubber hand illusion task. Collectively, our findings support the hypothesis that vision is necessary to reduce distortions in haptic body representations. Moreover, we propose that VI children may develop with impaired representations of their own bodies. We discuss possible opportunities for reducing this impairment.

Talk 2

Acoustic glimpses: The accumulation of perceptual information in blind echolocators

Santani Teng1; 1Smith-Kettlewell Eye Research Institute

Blindness imposes constraints on the acquisition of sensory information from the environment. To mitigate those constraints, some blind people employ active echolocation, a technique in which self-generated sounds, like tongue “clicks,” produce informative reflections. Echolocating observers integrate over multiple clicks, or samples, to make perceptual decisions that guide behavior. What information is gained in the echoacoustic signal from each click? Here, I will draw from similar work in eye movements and ongoing studies in our lab to outline our approaches to this question. In a psychoacoustic and EEG experiment, blind expert echolocators and sighted control participants localized a virtual reflecting object after hearing simulated clicks and echoes. Left-right lateralization improved on trials with more click repetitions, suggesting a systematic precision benefit to multiple samples even when each sample delivered no new sensory information. In a related behavioral study, participants sat in a chair but otherwise moved freely while echoacoustically detecting, then orienting toward a reflecting target located at a random heading in the frontal hemifield. Clicking behavior and target size (therefore sonar strength) strongly influenced the rate and precision of orientation convergence toward the target, indicating a dynamic interaction between motor-driven head movements, click production, and the resulting echoacoustic feedback to the observer. Taken together, modeling these interactions in blind expert practitioners suggests similar properties, and potential shared mechanisms, between active sensing behavior in visual and echoacoustic domains.
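The claim that precision improves with more clicks even though each click carries no new information is, at heart, a statistical point, illustrated by the simulation below (ours, with made-up noise values): averaging N independent noisy angle estimates shrinks the localisation error roughly in proportion to the square root of N.

```python
# Illustration of the statistical point (not the experiment itself): if each
# click-echo pair yields an independent, noisy estimate of the object's lateral
# angle, averaging over N clicks improves precision roughly as sqrt(N), even
# though no single click carries new information about the target.
import numpy as np

rng = np.random.default_rng(4)
true_angle = 10.0                                  # degrees to the right (arbitrary)
single_click_sd = 8.0                              # noise of one echoacoustic sample (arbitrary)

for n_clicks in (1, 2, 4, 8):
    estimates = rng.normal(true_angle, single_click_sd, size=(20000, n_clicks)).mean(axis=1)
    print(f"{n_clicks} clicks: localisation SD = {estimates.std():.2f} deg")
```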

Talk 3

Constraints of cross-modal plasticity within hMT+ following early blindness

Woon Ju Park1, Kelly Chang, Ione Fine; 1Department of Psychology, University of Washington

Cross-modal plasticity following early blindness has been widely documented across numerous visual areas, highlighting our brain’s remarkable adaptability to changes in sensory environment. In many of these areas, functional homologies have been observed between the original and reorganized responses. However, the mechanisms driving these homologies remain largely unknown. Here, we will present findings that aim to answer this question within the area hMT+, which responds to visual motion in sighted individuals and to auditory motion in early blind individuals. Our goal was to examine how the known functional and anatomical properties of this area influence the development of cross-modal responses in early blind individuals. Using a multimodal approach that encompasses psychophysics, computational modeling, and functional and quantitative MRI, we simultaneously characterized perceptual, functional, and anatomical selectivity to auditory motion within early blind and sighted individuals. We find that some anatomical and functional properties of hMT+ are inherited, while others are altered in those who become blind early in life.

Talk 4

Visual experience is necessary for dissociating face- and language-processing in the ventral visual stream

Elizabeth J. Saccone1, Akshi1, Judy S. Kim2, Mengyu Tian3, Marina Bedny1; 1Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD, USA, 2Center for Human Values, Princeton University, Princeton, NJ, USA, 3Center for Educational Science and Technology, Beijing Normal University at Zhuhai, China

The contribution of innate predispositions versus experience to face-selectivity in vOTC is hotly debated. Recent studies with people born blind suggest face specialization emerges regardless of experience. In blindness, the FFA is said either to process face shape, accessed through touch or sound, or to maintain its behavioral role in person recognition by specializing for human voices. We hypothesized instead that in blind people the anatomical location of the FFA responds to language. While undergoing fMRI, congenitally blind English speakers (N=12) listened to spoken language (English), foreign speech (Russian, Korean, Mandarin), non-verbal vocalizations (e.g., laughter) and control non-human scene sounds (e.g., forest sounds) during a 1-back repetition task. Participants also performed a ‘face localizer’ task by touching 3D printed models of faces and control scenes and a language localizer (spoken words > backwards speech, Braille > tactile shapes). We identified individual-subject ROIs inside a FFA mask generated from sighted data. In people born blind, the anatomical location of the FFA showed a clear preference for language over all other sounds, whether human or not. Responses to spoken language were higher than to foreign speech or non-verbal vocalizations, which were not different from scene sounds. This pattern was observed even in parts of vOTC that responded more to touching faces. Specialization for faces in vOTC is influenced by experience. In the absence of vision, lateral vOTC becomes implicated in language. We speculate that shared circuits that evolved for communication specialize for either face recognition or language depending on experience.

Talk 5

Individual differences of brain plasticity in early visual deprivation

Ella Striem-Amit1; 1Department of Neuroscience, Georgetown University Medical Center, Washington, DC 20057, USA

Early-onset blindness leads to reorganization in visual cortex connectivity and function. However, this has mostly been studied at the group level, largely ignoring differences in brain reorganization across early blind individuals. To test whether plasticity manifests differently in different blind individuals, we studied resting-state functional connectivity (RSFC) from the primary visual cortex in a large cohort of blind individuals. We find increased individual differences in connectivity patterns, corresponding to areas that show reorganization in blindness. Further, using a longitudinal approach in repeatedly sampled blind individuals, we showed that such individual patterns of organization and plasticity are stable over time, to the degree that individual participants could be identified from their connectivity patterns across sessions two years apart. Together, these findings suggest that visual cortex reorganization is not ubiquitous, highlighting the potential diversity in brain plasticity and the importance of harnessing individual differences for tailoring rehabilitation approaches for vision loss.
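The longitudinal identification result can be pictured with a small simulation in the spirit of connectome fingerprinting (this is an illustration, not the study's exact analysis; the number of participants, connectivity targets, and noise level are invented): each person's V1 connectivity profile at one session is matched against all profiles from a later session, and identification succeeds when the best-correlated profile belongs to the same person.

```python
# Minimal sketch of identifying individuals from connectivity (simulated data,
# in the spirit of connectome fingerprinting; not the study's exact analysis):
# each participant's V1 seed-connectivity profile at session 1 is matched
# against all profiles at session 2, and identification succeeds when the
# best-correlated profile belongs to the same person.
import numpy as np

rng = np.random.default_rng(5)
n_subj, n_targets = 20, 300
stable = rng.normal(size=(n_subj, n_targets))                # stable individual pattern
sess1 = stable + 0.5 * rng.normal(size=(n_subj, n_targets))
sess2 = stable + 0.5 * rng.normal(size=(n_subj, n_targets))  # e.g., two years later

corr = np.corrcoef(sess1, sess2)[:n_subj, n_subj:]           # subject-by-subject similarity
identified = (corr.argmax(axis=1) == np.arange(n_subj)).mean()
print(f"identification accuracy across sessions: {identified:.0%}")
```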

Talk 6

Learning to see again: The role of perceptual learning and user engagement in sight restoration

Michael Beyeler1; 1University of California, Santa Barbara

Retinal and cortical implants show potential in restoring a rudimentary form of vision to people living with profound blindness, but the visual sensations (“phosphenes”) produced by current devices often seem unnatural or distorted. Consequently, the ability of implant users to learn to make use of this artificial vision plays a critical role in whether some functional vision is successfully regained. In this talk, I will discuss recent work detailing the potential and limitations of perceptual learning in helping implant users learn to see again. Although the abilities of visual implant users tend to improve with training, there is little evidence that this is due to distortions becoming less perceptually apparent, but instead may be due to better interpretation of distorted input. Unlike those with natural vision, implant recipients must accommodate various visual anomalies, such as inconsistent spatial distortions and phosphene fading. Furthermore, perceptual measures such as grating acuity and motion discrimination, which are often used with the intention of objectively assessing visual function, may be modulated via gamification, highlighting the importance of user engagement in basic psychophysical tasks. Gamification may be particularly effective at engaging reward systems in the brain, potentially fostering greater plasticity through more varied stimuli and active attentional engagement. However, the effectiveness of such gamified approaches varies, suggesting a need for personalized strategies in visual rehabilitation.

Attention: accept, reject, or major revisions?

Symposium: Friday, May 17, 2024, 2:30 – 4:30 pm, Talk Room 2

Organizers: Alon Zivony1; 1University of Sheffield
Presenters: Britt Anderson, Ruth Rosenholtz, Wayne Wu, Sarah Shomstein, Alon Zivony

Is attention research in crisis? After more than a century, we have come full circle from the intuition that “everybody knows what attention is” (James, 1890) to the conclusion that “nobody knows what attention is” (Hommel et al., 2019). It has been suggested that attention is an incoherent and sterile concept, or unsuitable for scientific research. And yet, attention research continues as strongly as ever with little response to these critiques. Is the field ignoring glaring theoretical problems, or does the current conception of attention merely require some revisions? In this symposium, our speakers bring different perspectives to examine this critical question. Rather than merely raising issues with the concept of attention, each also suggests practical and theoretical solutions, which can hopefully inform future research. Each speaker will present either a critical view or defence of the concept of attention, and suggest whether attention should be abandoned, kept as is, or redefined. Our first two speakers will argue that scientists may be better off without the concept of attention. Britt Anderson will criticize the use of attention as an explanation of observed phenomena. He will suggest that the common usage is non-scientific and results in circular logic. He offers in its place an attention-free account of so-called attention effects. Ruth Rosenholtz argues that recent work, for example on peripheral vision, calls into question many of the basic tenets of attention theory. She will talk about her year of banning ‘attention’ in order to rethink attention from the ground up. The second group of speakers will question the common understanding of attention but will argue in favour of it as a scientific concept. Wayne Wu will suggest that our shared methodology of studying attention commits us to the Jamesian functional conceptualization of attention. He will argue that attention can and should be retained if we locate it at the right level of analysis in cognitive explanation. Sarah Shomstein will discuss “attentional platypuses”, empirical abnormalities that do not fit into current attention research. These abnormalities reveal the need for a new way of thinking about attention. Alon Zivony will argue that many of the conceptual problems with attention stem from the standard view that equates attention with selection. Moving away from this definition will allow us to retain attention but will also require a change in our thinking. Each talk will conclude with a take-home message about what attention is and isn’t, a verdict of whether it should be abandoned or retained, and suggestions of how their understanding of attention can be applied in future research. We will conclude with a panel discussion.

Talk 1

Attention: Idol of the Tribe

Britt Anderson1; 1Dept of Psychology and Centre for Theoretical Neuroscience, University of Waterloo

The term ‘attention’ has been a drag on our science ever since the early days of experimental psychology. Our frequent offerings and sacrifices (articles and the debates they provoke), and our unwillingness to abandon our belief in this reified entity, indicate the aptness of the Jamesian phrase “idol of the tribe.” While causal accounts of attention are empty, attention might be, as suggested by Hebb, a useful label. It could be used to indicate that some experimental observable is not immediately explained by the excitation of receptor cells. However, labeling something as ‘attention’ means there is something to be explained, not that something has been explained. The common experimental manipulations used to provoke visual selective attention – instructions, cues, and reward – are in fact the guide to explaining away ‘attention’. The observations provoked by such manipulations are frequently behavioral performance differences not explainable in terms of differences in retinal stimulation. These manipulations are economically summarized as components of a process in which base rates, evidence, value, and plausibility combine to determine perceptual experience. After briefly reviewing the history of how attention has been confusing from the start, I will summarize the notion of conceptual fragmentation and show how it applies. I will then review how the traditional conditions of an attentional experiment provide the basis for a superior, attention-free account of the phenomena of interest, and I will present some of the opportunities for the use of more formal descriptions that should lead to better theoretically motivated experimental investigations.

Talk 2

Attention in Crisis

Ruth Rosenholtz1; 1NVIDIA Research

Recent research on peripheral vision has led to a paradigm-shifting conclusion: that vision science as a field must rethink the concept of visual attention. Research has uncovered significant anomalies not explained by existing theories, and some methods for studying attention may instead have uncovered mechanisms of peripheral vision. Nor can a summary statistic representation in peripheral vision solve these problems on its own. A year of banning “attention” in my lab allowed us to rethink attention from the ground up; this talk will conclude with some of the resulting insights.

Talk 3

Attention Unified

Wayne Wu1; 1Department of Philosophy and Neuroscience Institute, Carnegie Mellon University

For over a century, scientists have expressed deep misgivings about attention. A layperson would find this puzzling, for they know what attention is as well as those with sight know what seeing is. People visually attend all the time. Attention is real, we know what it is, and we can explain it. I shall argue that the problem of attention concerns the conceptual and logical structure of the scientific theory of attention. Because of shared methodology, we are committed to a single functional conception of attention, what William James articulated long ago. I show how this shared conception provides a principle of unification that links empirical work. To illustrate this, I show how two cueing paradigms tied to “external” and “internal” attention, spatial cueing and retro-cueing, are instances of the same kind of attention. Against common skepticism, I demonstrate that we are all committed to the existence of attention as a target of explanation. Yet in step with the skeptic, I show that attention is not an explainer in the sense that it is not a neural mechanism. Locating attention at the right level of analysis in cognitive explanation is key to understanding what it is and how science has made massive progress in understanding it.

Talk 4

What does a platypus have to do with attention?

Sarah Shomstein1; 1Department of Psychological and Brain Sciences, George Washington University

Decades of research on understanding the mechanisms of attentional selection have focused on identifying the units (representations) on which attention operates in order to guide prioritized sensory processing. These attentional units fit neatly to accommodate our understanding of how attention is allocated in a top-down, bottom-up, or historical fashion. In this talk, I will focus on attentional phenomena that are not easily accommodated within current theories of attentional selection. We call these phenomena attentional platypuses, as they allude to the observation that, within biological taxonomies, the platypus fits into neither the mammal nor the bird category. Similarly, attentional phenomena that do not fit neatly within current attentional models suggest that current models need to be revised. We list a few instances of the ‘attentional platypuses’ and then offer a new approach, which we term Dynamically Weighted Prioritization, stipulating that multiple factors impinge on the attentional priority map, each with a corresponding weight. The interaction between factors and their corresponding weights determines the current state of the priority map, which subsequently constrains/guides attention allocation. We propose that this new approach should be considered as a supplement to existing models of attention, especially those that emphasize categorical organizations.
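A toy formalisation of the Dynamically Weighted Prioritization idea is sketched below (our reading of the verbal proposal, not a published implementation; the factor names and weights are invented): the priority map is a weighted sum of factor maps, and the weights themselves shift with task context, so no single factor has a fixed privileged status.

```python
# Toy formalisation of Dynamically Weighted Prioritization (our illustration of
# the verbal proposal, not a published implementation): the priority map is a
# weighted sum of factor maps, and the weights change with task context.
import numpy as np

shape = (8, 8)
rng = np.random.default_rng(6)
factors = {
    "bottom_up": rng.random(shape),    # e.g., local salience
    "top_down":  rng.random(shape),    # e.g., match to the current goal
    "history":   rng.random(shape),    # e.g., selection and reward history
}

def priority_map(factors, weights):
    total = sum(weights[name] * fmap for name, fmap in factors.items())
    return total / sum(weights.values())

search_weights  = {"bottom_up": 0.2, "top_down": 0.7, "history": 0.1}
passive_weights = {"bottom_up": 0.7, "top_down": 0.1, "history": 0.2}

for label, w in (("goal-driven search", search_weights), ("passive viewing", passive_weights)):
    peak = np.unravel_index(priority_map(factors, w).argmax(), shape)
    print(f"{label}: attention is guided to location {peak}")
```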

Talk 5

It’s time to redefine attention

Alon Zivony1; 1Department of Psychology, University of Sheffield

Many models of attention assume that attentional selection takes place at a specific moment in time which demarcates the critical transition from pre-attentive to attentive processing of sensory inputs. In this talk, I will argue that this intuitively appealing assumption is not only incorrect, but it is also the reason behind the conceptual confusion about what attention is, and how it should be understood in psychological science. As an alternative, I will offer a “diachronic” framework that views attention as a modulatory process that unfolds over time, in tandem with perceptual processing. This framework breaks down the false dichotomy between pre-attentive and attentive processing, and as such, offers new solutions to old problems in attention research (the early vs. late selection debate). More importantly, by situating attention within a broader context of selectivity in the brain, the diachronic account can provide a unified and conceptually coherent account of attention. This will allow us to keep the concept of attention but will also require serious rethinking about how we use attention as a scientific concept.

The temporal evolution of visual perception

Symposium: Friday, May 17, 2024, 2:30 – 4:30 pm, Talk Room 1

Organizers: Lina Teichmann1, Chris Baker1; 1Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, USA
Presenters: Lina Teichmann, Iris I. A. Groen, Diana Dima, Tijl Grootswagers, Rachel Denison

The human visual system dynamically processes input over the course of a few hundred milliseconds to generate our perceptual experience. Capturing the dynamic aspects of the neural response is therefore imperative to understand visual perception. By bringing five speakers together who use a diverse set of methods and approaches, the symposium aims to elucidate the temporal evolution of visual perception from different angles. All five speakers (four female) are early-career researchers based in Europe, Australia, the US, and Canada. Speakers will be allotted 18 minutes of presentation time plus 5 minutes of questions after each talk. In contrast to a lot of the current neuroimaging work, the symposium talks will focus on temporal dynamics rather than localization. Collectively, the work presented will demonstrate that the complex and dynamic nature of visual perception requires data that matches its temporal granularity. In the first talk, Lina Teichmann will present data from a large-scale study focusing on how individual colour-space geometries unfold in the human brain. Linking densely-sampled MEG data with psychophysics, her work on colour provides a test case to study the subjective nature of visual perception. Iris Groen will discuss findings from intracranial EEG studies that characterize neural responses across the visual hierarchy. Applying computational models, her work provides fundamental insights into how the visual response unfolds over time across visual cortex. Diana Dima will speak about how responses evoked by observed social interactions are processed in the brain. Using temporally-resolved EEG data, her research shows how visual information is modulated from perception to cognition. Tijl Grootswagers will present on studies investigating visual object processing. Using rapid series of object stimuli and linking EEG and behavioural data, his work shows the speed and efficiency of the visual system to make sense of the things we see. To conclude, Rachel Denison will provide insights into how we employ attentional mechanisms to prioritize relevant visual input at the right time. Using MEG data, she will highlight how temporal attention affects the dynamics of evoked visual responses. Overall, the symposium aims to shed light on the dynamic nature of visual processing at all levels of the visual hierarchy. It will be a chance to discuss benefits and challenges of different methodologies that will allow us to gain a comprehensive insight into the temporal aspects of visual perception.

Talk 1

The temporal dynamics of individual colour-space geometries in the human brain

Lina Teichmann1, Ka Chun Lam2, Danny Garside3, Amaia Benitez-Andonegui4, Sebastian Montesinos1, Francisco Pereira2, Bevil Conway3,5, Chris Baker1,5; 1Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, USA, 2Machine Learning Team, National Institute of Mental Health, Bethesda, USA, 3Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, USA, 4MEG Core Facility, National Institute of Mental Health, Bethesda, USA, 5equal contribution

We often assume that people see the world in a similar way to us, as we can effectively communicate how things look. However, colour perception is one aspect of vision that varies widely among individuals, as shown by differences in colour discrimination, colour constancy, colour appearance and colour naming. Further, the neural response to colour is dynamic and varies over time. Many attempts have been made to construct formal, uniform colour spaces that aim to capture universally valid similarity relationships, but there are discrepancies between these models and individual perception. Combining magnetoencephalography (MEG) and psychophysical data, we examined the extent to which these discrepancies can be accounted for by the geometry of the neural representation of colour and its evolution over time. In particular, we used a dense sampling approach and collected neural responses to hundreds of colours to reconstruct individual fine-grained colour-space geometries from neural signals with millisecond accuracy. In addition, we collected large-scale behavioural data to assess perceived similarity relationships between different colours for every participant. Using a computational modelling approach, we extracted similarity embeddings from the behavioural data to model the neural signal directly. We find that colour information is present in the neural signal from approximately 70 ms onwards but that neural colour-space geometries unfold non-uniformly over time. These findings highlight the gap between theoretical colour spaces and colour perception and represent a novel avenue to gain insights into the subjective nature of perception.
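A schematic of this kind of time-resolved analysis is sketched below with simulated data (the actual study models the MEG signal with behavioural similarity embeddings; here the comparison is reduced to an RSA-style correlation, and all dimensions and onset times are invented): at each time point, the behavioural colour-space geometry is tested against the neural dissimilarities between colours.

```python
# Schematic of a time-resolved similarity analysis (simulated data; reduced to
# an RSA-style correlation for brevity): at each time point, test how well the
# behavioural colour-space geometry explains the neural dissimilarities
# between colours.
import numpy as np

rng = np.random.default_rng(7)
n_colours, n_sensors, n_times = 20, 60, 120

embed = rng.normal(size=(n_colours, 3))            # latent colour-space coordinates
diff = embed[:, None, :] - embed[None, :, :]
behav_dissim = np.sqrt((diff ** 2).sum(-1))        # behavioural judgments mirror this geometry

proj = rng.normal(size=(3, n_sensors))
meg = rng.normal(size=(n_times, n_colours, n_sensors))
meg[40:] += embed @ proj                           # colour information enters the signal late

iu = np.triu_indices(n_colours, k=1)
fit = [np.corrcoef((1 - np.corrcoef(meg[t]))[iu], behav_dissim[iu])[0, 1] for t in range(n_times)]
print("behaviour-neural fit, early vs. late time points:",
      round(float(np.mean(fit[:40])), 2), "vs.", round(float(np.mean(fit[40:])), 2))
```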

Talk 2

Delayed divisive normalisation accounts for a wide range of temporal dynamics of neural responses in human visual cortex

Iris I. A. Groen1, Amber Brands1, Giovanni Piantoni2, Stephanie Montenegro3, Adeen Flinker3, Sasha Devore3, Orrin Devinsky3, Werner Doyle3, Patricia Dugan3, Daniel Friedman3, Nick Ramsey2, Natalia Petridou2, Jonathan Winawer4; 1Informatics Institute, University of Amsterdam, Amsterdam, Netherlands, 2University Medical Center Utrecht, Utrecht, Netherlands, 3New York University Grossman School of Medicine, New York, NY, USA, 4Department of Psychology and Center for Neural Science, New York University, New York, NY, USA

Neural responses in visual cortex exhibit various complex, non-linear temporal dynamics. Even for simple static stimuli, responses decrease when a stimulus is prolonged in time (adaptation), are reduced for repeated stimuli (repetition suppression), and rise more slowly for low-contrast stimuli (slow dynamics). These dynamics also vary depending on the location in the visual hierarchy (e.g., lower vs. higher visual areas) and the type of stimulus (e.g., contrast pattern stimuli vs. real-world object, scene and face categories). In this talk, I will present two intracranial EEG (iEEG) datasets in which we quantified and modelled the temporal dynamics of neural responses across the visual cortex at millisecond resolution. Our work shows that many aspects of these dynamics are accurately captured by a delayed divisive normalisation model in which neural responses are normalised by recent activation history. I will highlight how fitting this model to the iEEG data unifies multiple disparate temporal phenomena in a single computational framework, thereby revealing systematic differences in temporal dynamics of neural population responses across the human visual hierarchy. Overall, these findings suggest a pervasive role of history-dependent delayed divisive normalisation in shaping neural response dynamics across the cortical visual hierarchy.
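A simplified, discrete-time sketch of the delayed divisive normalisation idea is given below (a toy version of the model family discussed in the talk, with arbitrary parameters): the output is the stimulus drive divided by a delayed, low-pass copy of the response history, which produces a transient at onset, adaptation to sustained input, and a slower rise for low-contrast stimuli.

```python
# Toy, discrete-time sketch of delayed divisive normalisation (illustrative
# parameters, not the fitted model): the response is the stimulus drive divided
# by a delayed, low-pass copy of its own history, yielding a transient onset,
# adaptation to sustained input, and a slower rise at low contrast.
import numpy as np

def dn_response(stimulus, tau_rise=10.0, tau_norm=60.0, sigma=0.15, n=2.0):
    drive, pool = 0.0, 0.0
    out = np.zeros(len(stimulus))
    for t, s in enumerate(stimulus):
        drive += (s - drive) / tau_rise            # low-pass filtered input (rise dynamics)
        out[t] = drive ** n / (sigma ** n + pool ** n)
        pool += (out[t] - pool) / tau_norm         # delayed normalisation pool tracks the response
    return out

t = np.arange(300)                                 # arbitrary time steps
sustained = ((t >= 50) & (t < 250)).astype(float)
for contrast in (1.0, 0.3):
    r = dn_response(contrast * sustained)
    print(f"contrast {contrast}: response peaks at t = {int(r.argmax())}, "
          f"adapts to {r[240] / r.max():.2f} of peak")
```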

Talk 3

How natural action perception unfolds in the brain

Diana Dima1, Yalda Mohsenzadeh1; 1Western University, London, ON, Canada

In a fraction of a second, humans can recognize a wide range of actions performed by others. Yet actions pose a unique complexity challenge, bridging visual domains and varying along multiple perceptual and semantic features. What features are extracted in the brain when we view others’ actions, and how are they processed over time? I will present electroencephalography work using natural videos of human actions and rich feature sets to determine the temporal sequence of action perception in the brain. Our work shows that action features, from visual to semantic, are extracted along a temporal gradient, and that different processing stages can be dissociated with artificial neural network models. Furthermore, using a multimodal approach with video and text stimuli, we show how conceptual action representations emerge in the brain. Overall, these data reveal the rapid computations underlying action perception in natural settings. The talk will highlight how a temporally resolved approach to natural vision can uncover the neural computations linking perception and cognition.

Talk 4

Decoding rapid object representations

Tijl Grootswagers1, Amanda K. Robinson2; 1The MARCS Institute for Brain, Behaviour and Development, School of Computer, Data and Mathematical Sciences, Western Sydney University, Sydney, NSW, Australia, 2Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia

Humans are extremely fast at recognising objects, and can do this very reliably. Information about objects and object categories emerges within 200 milliseconds in the human visual system, even under difficult conditions such as occlusion or low visibility. These neural representations can be highly complex and multidimensional, despite relying on limited visual information. Understanding emerging object representations necessitates time-resolved neuroimaging methods with millisecond precision, such as EEG and MEG. Recent time-resolved neuroimaging work has used decoding methods in rapid serial visual presentation designs to show that relevant information about multiple sequentially presented objects is robustly encoded by the brain. This talk will highlight recent research on the time course of object representations in rapid image sequences, focusing on three key findings: (1) Object representations are highly automatic, with robust representations emerging even with fast-changing visual input. (2) Emerging object representations are highly robust to changes in context and task, suggesting strong reliance on feedforward processes. (3) Object representational structures are highly consistent across individuals, to the extent that neural representations are predictive of independent behavioural judgments on a variety of tasks. Together, these findings suggest that the first sweep of information through the visual system contains highly robust information that is readily available for read-out in behavioural decisions.
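The kind of time-resolved decoding analysis behind these findings can be sketched as follows with simulated EEG (not the published analyses; trial counts, sensor counts, and the classifier are chosen only for illustration): at each time point a simple classifier is trained to discriminate two object categories from sensor patterns, and accuracy rises once category information is present in the signal.

```python
# Sketch of time-resolved decoding (simulated EEG, not the published analyses):
# at each time point a nearest-class-mean classifier is trained to discriminate
# two object categories from sensor patterns; decodability emerges once the
# simulated category signal appears in the data.
import numpy as np

rng = np.random.default_rng(8)
n_trials, n_sensors, n_times = 200, 32, 100
labels = np.repeat([0, 1], n_trials // 2)
category_pattern = rng.normal(size=(2, n_sensors))

eeg = rng.normal(size=(n_trials, n_times, n_sensors))
eeg[:, 30:70] += 0.4 * category_pattern[labels][:, None, :]   # category info from t=30 to t=70

def decode(X, y, n_folds=5):
    """Cross-validated accuracy of a nearest-class-mean classifier."""
    order = rng.permutation(len(y)); accs = []
    for fold in np.array_split(order, n_folds):
        train = np.setdiff1d(order, fold)
        means = np.stack([X[train][y[train] == c].mean(0) for c in (0, 1)])
        pred = np.argmin(((X[fold][:, None, :] - means[None]) ** 2).sum(-1), axis=1)
        accs.append((pred == y[fold]).mean())
    return float(np.mean(accs))

acc = [decode(eeg[:, t, :], labels) for t in range(n_times)]
print("decoding accuracy at t=10 vs. t=50:", round(acc[10], 2), "vs.", round(acc[50], 2))
```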

Talk 5

Isolating neural mechanisms of voluntary temporal attention

Rachel Denison1,2, Karen Tian1,2, Jiating Zhu1, David Heeger2, Marisa Carrasco2; 1Boston University, Department of Psychological and Brain Sciences, USA, 2New York University, Department of Psychology and Center for Neural Science, USA

To handle the continuous influx of visual information, temporal attention prioritizes visual information at task-relevant moments in time. We first introduce a probabilistic framework that clarifies the conceptual distinction and formal relation between temporal attention, linked to timing relevance, and temporal expectation, linked to timing predictability. Next, we present two MEG studies in which we manipulated temporal attention while keeping expectation constant, allowing us to isolate neural mechanisms specific to voluntary temporal attention. Participants were cued to attend to one of two sequential grating targets with predictable timing, separated by a 300 ms SOA. The first study used time-resolved steady-state visual evoked responses (SSVER) to investigate how temporal attention modulates anticipatory visual activity. In the pre-target period, visual activity (measured with a background SSVER probe) steadily ramped up as the targets approached, reflecting temporal expectation. Furthermore, we found a low-frequency modulation of visual activity, which shifted approximately 180 degrees in phase according to which target was attended. The second study used time-resolved decoding and source reconstruction to examine how temporal attention affects dynamic target representations. Temporal attention to the first target enhanced its orientation representation within a left fronto-cingulate region ~250 ms after stimulus onset, perhaps protecting it from interference from the second target within the visual cortex. Together these studies reveal how voluntary temporal attention flexibly shapes pre-target periodic dynamics and post-target routing of stimulus information to select a task-relevant stimulus within a sequence.

Large-scale visual neural datasets: where do we go from here?

Symposium: Friday, May 17, 2024, 12:00 – 2:00 pm, Talk Room 2

Organizers: Alessandro Gifford1, Kendrick Kay2; 1Freie Universität Berlin, 2University of Minnesota
Presenters: Eline R. Kupers, Won Mok Shim, Ian Charest, Tomas Knapen, Jacob Prince, Alessandro T. Gifford

Vision science has witnessed an increase in worldwide initiatives collecting and publicly releasing large-scale visual neural datasets (LSVNDs). These initiatives have allowed thousands of vision scientists to readily harness LSVNDs, enabling new investigations and resulting in novel discoveries. This suggests vision science is entering a new era of inquiry characterized by big open data. The rapid growth in the collection and use of LSVNDs spawns urgent questions, the answers to which will steer the direction of the field. How can different researchers across the vision sciences spectrum benefit from these datasets? What are the opportunities and pitfalls of LSVNDs for theory formation? Which kinds of LSVNDs are missing, and what characteristics should future LSVNDs have to maximize their impact and utility? How can LSVNDs support a virtuous cycle between neuroscience and artificial intelligence? This symposium invites the VSS community to engage these questions in an interactive and guided community process. We will start with a short introduction (5 minutes), followed by six brief, thought-provoking talks (each 9 minutes plus 3 minutes for Q&A). Enriched by these perspectives, the symposium will then move to a highly interactive 30-minute discussion where we will engage the audience to discuss the most salient open questions on LSVNDs, generate and share insights, and foster new collaborations. Speakers from diverse career stages will cover a broad range of perspectives on LSVNDs, including dataset creators (Kupers, Shim), dataset users (Prince), and researchers playing both roles (Gifford, Charest, Knapen). Eline Kupers will expose behind-the-scenes knowledge on a particular LSVND that has received substantial traction in the field, the Natural Scenes Dataset (NSD), and will introduce ongoing efforts for a new large-scale multi-task fMRI dataset called Visual Cognition Dataset. Won Mok Shim will introduce the Naturalistic Perception Action and Cognition (NatPAC) 7T fMRI dataset, and discuss how this dataset allows investigation of the impact of goal-directed actions on visual representations under naturalistic settings. Ian Charest will present recent results on semantic representations enabled by NSD, as well as ongoing large-scale data collection efforts inspired by NSD. Tomas Knapen will demonstrate how combining LSVNDs with other datasets incites exploration and discovery, and will present ongoing large-scale data collection efforts in his group. Jacob Prince will provide a first-hand perspective on how researchers external to the data collection process can apply LSVNDs for diverse research aims across cognitive neuroscience, neuroAI, and neuroimaging methods development. Finally, Ale Gifford will highlight broad opportunities that LSVNDs offer to the vision sciences community, and present a vision for the future of large-scale datasets. This symposium will interest any VSS member interested in neural data as it will expose opportunities and limitations of LSVNDs and how they relate to smaller, more narrowly focused datasets. Our goal is to align the VSS community with respect to open questions regarding LSVNDs, and help incentivize and coordinate new large-scale data collection efforts. We believe this symposium will strengthen the impact of LSVNDs on the field of vision science, and foster a new generation of big-data vision scientists.

Talk 1

The Natural Scenes Dataset: Lessons Learned and What’s Next?

Eline R. Kupers1,2, Celia Durkin2, Clayton E Curtis3, Harvey Huang4, Dora Hermes4, Thomas Naselaris2, Kendrick Kay2; 1Stanford University, 2University of Minnesota, 3New York University, 4Mayo Clinic

Release and reuse of rich neuroimaging datasets have rapidly grown in popularity, enabling researchers to ask new questions about visual processing and to benchmark computational models. One highly used dataset is the Natural Scenes Dataset (NSD), a 7T fMRI dataset where 8 subjects viewed more than 70,000 images over the course of a year. Since its recent release in September 2021, NSD has gained 1700+ users and resulted in 55+ papers and pre-prints. Here, we share behind-the-scenes considerations and inside knowledge from the NSD acquisition effort that helped ensure its quality and impact. This includes lessons learned regarding funding, designing, collecting, and releasing a large-scale fMRI dataset. Complementing the creator’s perspective, we also highlight the user’s viewpoint by revealing results from a large anonymous survey distributed amongst NSD users. These results will provide valuable (and often unspoken) insights into both positive and negative experiences interacting with NSD and other publicly available datasets. Finally, we discuss ongoing efforts towards two new large-scale datasets: (i) NSD-iEEG, an intracranial electroencephalography dataset with extensive electrode coverage in cortex and sub-cortex using a similar paradigm to NSD and (ii) Visual Cognition Dataset, a 7T fMRI dataset that samples a large diversity of tasks on a common set of visual stimuli (in contrast to NSD which samples a large diversity of stimuli during a single task). By sharing these lessons and ideas, we hope to facilitate new data collection efforts and enhance the ability of these datasets to support new discoveries in vision and cognition.

Talk 2

Exploring naturalistic vision in action with the 7T Naturalistic Perception, Action, and Cognition (NatPAC) Dataset

Won Mok Shim1,2, Royoung Kim1,2, Jiwoong Park1,2; 1Institute of Basic Science, Republic of Korea, 2Sungkyunkwan University

Large-scale human neuroimaging datasets have provided invaluable opportunities to examine brain and cognitive functions. Our recent endeavor, the 7T NatPAC project, is designed to provide high-resolution human MRI structural and functional datasets using moderately dense sampling (12–16 2-hr sessions per subject) across a broad range of tasks. While previous large-scale datasets have featured sparse sampling of cognitive functions, our goal is to encompass a more extensive spectrum of cognitive and affective processes through diverse tasks, spanning both structured and naturalistic paradigms. Notably, we incorporated naturalistic tasks to probe a variety of higher-order cognitive functions including watching movies, freely speaking, and interactive 3D video game playing within a Minecraft environment. Through a collection of innovative Minecraft-based games simulating real-world behaviors, we aim to investigate the neural mechanisms of perception, action, and cognition as an integrative process that unfolds in naturalistic contexts. In this talk, I will focus on a shepherding game, where participants engage in strategic planning with hierarchical subgoals and adaptively update their strategies while navigating a virtual world. In combination with high-precision eye tracking data corrected for head motion, we explore how visual responses, including population receptive field (pRF) mapping, are modulated in the visual cortex and frontoparietal regions during free viewing and complex goal-directed behaviors compared to passive viewing of game replays and conventional pRF experiments. I will discuss the broader implications of the impact of goal-directed actions on visual representations and how large-scale datasets enable us to examine such effects in naturalistic settings.
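For readers unfamiliar with pRF mapping, the standard forward model being fit in both the conventional and game-based conditions can be sketched compactly (a generic illustration, not the NatPAC analysis code; grid size, field extent, and pRF parameters are arbitrary): a voxel's predicted time course is the overlap of the stimulus aperture with a 2D Gaussian receptive field.

```python
# Compact sketch of the standard pRF forward model (a generic illustration, not
# the NatPAC analysis code): a voxel's predicted response at each time point is
# the overlap of the stimulus aperture with a 2D Gaussian receptive field.
import numpy as np

def prf_prediction(apertures, x0, y0, sigma, grid_size=40, extent=10.0):
    """Predicted time course for a Gaussian pRF centred at (x0, y0) deg with size sigma."""
    coords = np.linspace(-extent, extent, grid_size)
    X, Y = np.meshgrid(coords, coords)
    rf = np.exp(-((X - x0) ** 2 + (Y - y0) ** 2) / (2 * sigma ** 2))
    return apertures.reshape(len(apertures), -1) @ rf.ravel()

# Toy stimulus: a vertical bar sweeping left to right across the visual field.
grid = 40
apertures = np.zeros((grid, grid, grid))
for frame in range(grid):
    apertures[frame, :, frame] = 1.0

pred = prf_prediction(apertures, x0=2.5, y0=0.0, sigma=1.5)
print("this pRF responds most when the bar reaches column", int(pred.argmax()),
      "of", grid, "(i.e., near x = 2.5 deg)")
```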

Talk 3

Exploiting large-scale neuroimaging datasets to reveal novel insights in vision science

Ian Charest1,2, Peter Brotherwood1, Catherine Landry1, Jasper van den Bosch1, Shahab Bakhtiari1,2, Tim Kietzmann3, Frédéric Gosselin1, Adrien Doerig3; 1Université de Montréal, 2Mila – Québec AI Institute, 3University of Osnabrück

Building quantitative models of neural activity in the visual system is a long-standing goal in neuroscience. This research program has long been limited by the small scale and low signal-to-noise ratio of most existing datasets, but with the advent of large-scale datasets it has become possible to build, test, and discriminate among increasingly expressive competing models of neural representation. In this talk, I will describe how the scale of the 7T fMRI Natural Scenes Dataset (NSD) has made possible novel insights into the mechanisms underlying scene perception. We harnessed recent advancements in linguistic artificial intelligence to construct models that capture progressively richer semantic information, ranging from object categories to word embeddings to scene captions. Our findings reveal a positive correlation between a model’s capacity to capture semantic information and its ability to predict NSD data, a finding we then replicated with recurrent convolutional networks trained to predict sentence embeddings from visual inputs. This collective evidence suggests that the visual system, as a whole, is better characterized as aiming to extract rich semantic information than as merely cataloging object inventories from visual inputs. Considering the substantial power of NSD, collecting additional neuroimaging and behavioral data using the same image set becomes highly appealing. We are expanding NSD through the development of two innovative datasets: an electroencephalography dataset called NSD-EEG, and a mental imagery vividness ratings dataset called NSD-Vividness. Datasets like NSD not only provide fresh insights into the visual system but also inspire the development of new datasets in the field.
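
As a rough illustration of the kind of analysis described above (the abstract itself includes no code), a voxelwise semantic encoding model can be framed as regularized linear regression from caption embeddings to fMRI responses. The Python sketch below uses toy data; all array sizes and variable names are illustrative assumptions rather than the authors' pipeline.

```python
# Hypothetical sketch of a voxelwise semantic encoding model (illustrative only,
# not the authors' pipeline); toy data stand in for caption embeddings and NSD betas.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_images, n_features, n_voxels = 1000, 512, 200          # toy sizes
X = rng.standard_normal((n_images, n_features))          # e.g., caption embeddings per image
Y = rng.standard_normal((n_images, n_voxels))            # e.g., single-trial fMRI betas per voxel

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=0)

model = RidgeCV(alphas=np.logspace(-2, 4, 13))           # ridge penalty chosen by cross-validation
model.fit(X_tr, Y_tr)
Y_hat = model.predict(X_te)

# Prediction accuracy per voxel: Pearson r between predicted and held-out responses.
r = np.array([np.corrcoef(Y_te[:, v], Y_hat[:, v])[0, 1] for v in range(n_voxels)])
print("median voxel prediction r:", np.median(r))
```

In such a setup, progressively richer semantic feature spaces would be compared by swapping the feature matrix X (e.g., category labels vs. word embeddings vs. caption embeddings) and comparing the resulting per-voxel prediction accuracies.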

Talk 4

Farewell to the explore-exploit trade-off in large-scale datasets

Tomas Knapen1,2, Nick Hedger3, Thomas Naselaris4, Shufan Zhang1,2, Martin Hebart5,6; 1Vrije Universiteit, 2Royal Dutch Academy of Arts and Sciences, 3University of Reading, 4University of Minnesota, 5Justus Liebig University, 6Max Planck Institute for Human Cognitive and Brain Sciences

LSVNDs are a powerful tool for discovery science. Because they are well suited to exploration, they synergize with smaller, more exploitative datasets focused on hypothesis testing, which can confirm exploratory findings. Similar synergy can be attained when combining findings across datasets, where one LSVND can be used to confirm and extend discoveries from another LSVND. I will showcase how we have recently leveraged several large-scale datasets in unison to discover principles of topographic visual processing throughout the brain. These examples demonstrate how LSVNDs can be used to great effect, especially in combination across datasets. In our most recent example, we combined the HCP 7T fMRI dataset (a “wide” dataset with 180 participants, 2.5 hrs of whole-brain fMRI each) with NSD (a “deep” dataset with 8 participants, 40 hrs of whole-brain fMRI each) to investigate visual body-part selectivity. We discovered homuncular maps in high-level visual cortex through connectivity with primary somatosensory cortex in HCP, and validated the body-part tuning of these maps using NSD. This integration of wide and deep LSVNDs allows inference about computational mechanisms at both the individual and population levels. For this reason, we believe the field needs a variety of LSVNDs. I will briefly present ongoing work from my lab collecting new ‘deep’ LSVND contributions: a brief (2.5-s) video watching dataset and a retinotopic mapping dataset, each with up to 10 sessions of 7T fMRI in 8 subjects.

Talk 5

Large datasets: a Swiss Army knife for diverse research aims in neuroAI

Jacob Prince1, Colin Conwell2, Talia Konkle1; 1Harvard University, 2Johns Hopkins University

This talk provides a first-hand perspective on how users external to the data collection process can harness LSVNDs as foundation datasets for their research aims. We first highlight recent evidence that these datasets help address and move beyond longstanding debates in cognitive neuroscience, such as the nature of category-selective regions and the visual category code more broadly. We will show evidence that datasets like NSD have provided powerful new insights into how items from well-studied domains (faces, scenes) are represented in the context of broader representational spaces for objects. Second, we will highlight the potential of LSVNDs to answer urgent, emergent questions in neuroAI: for example, which inductive biases are critical for obtaining a good neural network model of the human visual system? We will describe a series of controlled experiments leveraging hundreds of open-source DNNs, systematically varying inductive biases to reveal the factors that most directly impact brain predictivity at scale. Finally, for users interested in neuroimaging methods development, we will highlight how the existence of these datasets has catalyzed rapid progress in methods for fMRI signal estimation and denoising, as well as for basic analysis routines like PCA and computing noise ceilings. We will conclude by reflecting on both the joys and pain points of working with LSVNDs, in order to help inform the next generation of these datasets.
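
To make the noise-ceiling idea concrete (this is an editorial sketch under simple assumptions, not the procedure of any specific NSD paper), one common approach estimates, per voxel, the split-half reliability of responses to repeated stimulus presentations and applies the Spearman-Brown correction. Everything below is toy data with illustrative names.

```python
# Hypothetical split-half noise-ceiling sketch (illustrative only; NSD's own
# noise-ceiling procedure differs in detail). Two noisy repeats of each stimulus.
import numpy as np

rng = np.random.default_rng(1)
n_stimuli, n_voxels = 500, 100
signal = rng.standard_normal((n_stimuli, n_voxels))
rep1 = signal + rng.standard_normal((n_stimuli, n_voxels))
rep2 = signal + rng.standard_normal((n_stimuli, n_voxels))

def split_half_ceiling(a, b):
    """Per-voxel split-half reliability with Spearman-Brown correction."""
    za = (a - a.mean(0)) / a.std(0)
    zb = (b - b.mean(0)) / b.std(0)
    r = (za * zb).mean(0)          # Pearson r across stimuli, per voxel
    return 2 * r / (1 + r)         # corrects for averaging over two repeats

ceiling = split_half_ceiling(rep1, rep2)
print("mean estimated noise ceiling (r):", ceiling.mean())
```

Model prediction accuracies are then interpreted relative to this ceiling rather than relative to a perfect correlation of 1.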

Talk 6

What opportunities do large-scale visual neural datasets offer to the vision sciences community?

Alessandro T. Gifford1, Benjamin Lahner2, Pablo Oyarzo1, Aude Oliva2, Gemma Roig3, Radoslaw M. Cichy1; 1Freie Universität Berlin, 2MIT, 3Goethe Universität Frankfurt

In this talk, I will provide three complementary examples of the opportunities that LSVNDs offer to the vision sciences community. First, LSVNDs of naturalistic (thus more ecologically valid) visual stimulation allow the investigation of novel mechanisms of high-level visual cognition. We are extensively recording human fMRI and EEG responses to short naturalistic movie clips; modeling results reveal that semantic information such as action understanding or movie captions is embedded in neural representations. Second, LSVNDs contribute to the emerging field of NeuroAI, advancing research in vision sciences through a symbiotic relationship between visual neuroscience and computer vision. We recently collected a large and rich EEG dataset of neural responses to naturalistic images. We are using it, on the one hand, to train deep-learning-based end-to-end encoding models directly on brain data, thus aligning visual representations in models and the brain, and, on the other hand, to increase the robustness of computer vision models by exploiting inductive biases from neural visual representations. Third, LSVNDs make possible critical initiatives such as challenges and benchmarks. In 2019 we founded the Algonauts Project, a platform where scientists from different disciplines can cooperate and compete in creating the best predictive models of the visual brain, thus advancing the state of the art in brain modeling as well as promoting cross-disciplinary interaction. I will end with some forward-looking thoughts on how LSVNDs might transform the vision sciences.
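
As a minimal sketch of what an end-to-end encoding model of this general kind can look like (the architecture, sizes, and data below are illustrative assumptions, not the authors' model), an image-computable network can be trained to output predicted EEG responses directly:

```python
# Hypothetical end-to-end EEG encoding model (architecture and sizes are
# illustrative assumptions, not the authors' model). Maps images to predicted EEG.
import torch
import torch.nn as nn

class EEGEncoder(nn.Module):
    def __init__(self, n_channels=64, n_timepoints=100):
        super().__init__()
        self.backbone = nn.Sequential(                    # small convolutional feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.head = nn.Linear(32 * 4 * 4, n_channels * n_timepoints)
        self.out_shape = (n_channels, n_timepoints)

    def forward(self, images):                            # images: (batch, 3, H, W)
        pred = self.head(self.backbone(images))
        return pred.view(-1, *self.out_shape)             # predicted EEG: (batch, channels, time)

model = EEGEncoder()
images = torch.randn(8, 3, 128, 128)                      # toy image batch
eeg = torch.randn(8, 64, 100)                             # toy measured EEG responses
loss = nn.functional.mse_loss(model(images), eeg)         # train by minimizing prediction error
loss.backward()
```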

< Back to 2024 Symposia

Neurodiversity in visual functioning: Moving beyond case-control studies

< Back to 2024 Symposia

Symposium: Friday, May 17, 2024, 12:00 – 2:00 pm, Talk Room 1

Organizers: Catherine Manning1, Michael-Paul Schallmo2; 1University of Reading, UK, 2University of Minnesota
Presenters: Catherine Manning, Michael-Paul Schallmo, Victor Pokorny, Brian Keane, Beier Yao, Alice Price

Although vision science has a rich history of investigating atypical functioning in developmental and psychiatric conditions, these studies have tended to compare a single diagnosis against a normative comparison group (the case-control approach). However, by studying diagnoses in isolation, we cannot determine whether case-control differences are condition-specific, or instead reflect neural changes that occur across multiple conditions. A related challenge to the case-control approach is the growing recognition that categorical diagnoses are not biologically or psychologically discrete entities: multiple diagnoses commonly co-occur within individuals, considerable heterogeneity is found among individuals with the same diagnosis, and similarities are often found between diagnosed individuals and those with subclinical traits. Moreover, categorical diagnoses do not clearly map onto the underlying biology (e.g., genes, neural function). Accordingly, there has been a recent conceptual shift away from the traditional case-control approach towards considering continuous, transdiagnostic dimensions of neurodiversity, which might better reflect the underlying biology (cf. NIH’s Research Domain Criteria framework). By studying dimensions of visual functioning across conditions, we will elucidate the mechanisms implicated in cases of atypical visual functioning, while also helping to understand individual differences in the non-clinical population. This symposium will bring together cutting-edge research that goes beyond the traditional case-control approach to demonstrate this recent conceptual shift. Speakers representing diverse career stages, scientific approaches and nationalities will present research encompassing a range of conditions (e.g., autism, dyslexia, schizophrenia, bipolar disorder, migraine) and methods (EEG, fMRI, psychophysics, computational modelling, questionnaires). Cathy Manning will first introduce the traditional case-control approach and its limitations, before presenting EEG and behavioural work identifying both convergence and divergence in autistic and dyslexic children’s visual motion processing and decision-making. Second, Michael-Paul Schallmo will show that weaker surround suppression is shared by adults with autism and adults with schizophrenia, and linked to continuous dimensions of psychiatric symptoms. Third, Victor Pokorny will describe a recent meta-analysis that found surprisingly weak evidence for generally weakened use of visuospatial context in schizophrenia, bipolar disorder, and related sub-clinical populations, but stronger evidence for specific alterations in contrast perception. Fourth, Brian Keane will describe how functional connectivity involving a higher-order visual network is aberrant in psychosis patients, regardless of diagnosis. Fifth, Beier Yao will present a visuomotor mechanism that is altered across psychosis diagnoses and relates to positive symptoms. Finally, Alice Price will describe how factors of the visual Cardiff Hypersensitivity Scale differ across conditions and in the general population. We will finish with a panel discussion drawing out overall themes and covering theoretical and practical considerations for advancing investigations into neurodiversity in visual functioning. The symposium will inform a richer understanding within the VSS community of visual function in psychiatric and neurodevelopmental conditions, and individual differences more broadly.
The presentations and discussion will benefit both junior and senior vision scientists by highlighting cutting-edge methods and emerging theories of neurodiversity. The symposium is timely not only because of the recent “transdiagnostic revolution” (Astle et al., 2022), but also due to the increasing prevalence of diagnoses (e.g., autism, mental health difficulties).

Talk 1

Visual processing and decision-making in children with autism and dyslexia: Insights from cross-syndrome approaches

Catherine Manning1,2; 1University of Reading, UK, 2University of Birmingham, UK

Atypical visual processing has been reported in a range of developmental conditions, including autism and dyslexia. One explanation for this is that certain neural processes are vulnerable to atypical development, leading to shared effects across developmental conditions. However, few studies make direct comparisons between developmental conditions, or use sufficiently sensitive methods, to conclude whether visual processing is affected differently in these conditions or whether it is affected similarly, thereby reflecting a more general marker of atypical development. After evaluating the current state of the science, I will present findings from two sets of studies that apply computational modelling approaches (equivalent noise modelling and diffusion modelling) and measure EEG data in matched groups of autistic, dyslexic and typically developing children aged 6 to 14 years (n = ~50 per group). These methods help pinpoint the component processes involved in processing visual information and making decisions about it, while linking brain and behaviour. The results identify both areas of convergence and divergence in autistic and dyslexic children’s visual processing and decision-making. For example, both autistic and dyslexic children show differences in late stimulus-locked EEG activity in response to coherent motion stimuli, which may reflect reduced segregation of signal from noise. However, only dyslexic children (and not autistic children) show reduced accumulation of sensory evidence, reflected in a shallower build-up of activity in a centro-parietal EEG component. Therefore, while there may be some shared effects across conditions, there are also condition-specific effects, which will require refined theories.

Talk 2

Weaker visual surround suppression in both autism spectrum and psychosis spectrum disorders

Michael-Paul Schallmo1; 1University of Minnesota

Issues with sensory functioning and attention are common in both autism spectrum and psychosis spectrum disorders. Despite important differences in symptoms and developmental time course, these conditions share a number of common features with regard to visual perception. One such phenomenon that we and others have observed in both populations is a reduced effect of surrounding spatial context during the perception of basic visual features such as contrast or motion. In this talk, we will consider whether these differences in visual function may have a common source. In a series of psychophysical and brain imaging experiments, we found that young adults with autism spectrum disorder (ASD) showed weaker visual surround suppression during motion perception compared to neurotypical individuals. This was reflected by differences in behavioral task performance and fMRI responses from area MT. Likewise, across multiple experiments in people with psychosis, we have found that individuals with schizophrenia show weaker behavioral and neural surround suppression during visual contrast perception. Recently, we used a divisive normalization model to show that narrower spatial attention may be sufficient to explain weaker surround suppression in ASD. This theory subsequently received support from another group, who showed weaker suppression for narrow versus broad attention conditions in healthy adults. Previous studies have also found narrower spatial attention in both people with ASD and people with schizophrenia. Thus, we suggest that narrower attention may be a common sensory difference sufficient to account for weaker surround suppression in both ASD and schizophrenia relative to neurotypical individuals.

Talk 3

Atypical use of visuospatial context in schizophrenia, bipolar disorder, and subclinical populations: A meta-analysis

Victor Pokorny1, Sam Klein1, Collin Teich2, Scott Sponheim1,2, Cheryl Olman1, Sylia Wilson1; 1University of Minnesota, 2Minneapolis Veterans Affairs Health Care System

Visual perception in people with psychotic disorders is thought to be minimally influenced by surrounding visual elements (i.e., visuospatial context). Visuospatial context paradigms have unique potential to clarify the neural bases of psychotic disorders because (a) the neural mechanisms are well studied in both animal and human models and (b) generalized cognitive deficits are unlikely to explain altered performance. However, the published literature on the subject is conflicting and heterogeneous, such that a systematic consolidation and evaluation of the published evidence is needed. We conducted a systematic review and meta-analysis of 46 articles spanning over fifty years of research. Articles included behavioral, fMRI and EEG reports in schizophrenia, bipolar disorder, and subclinical populations. When pooling across all paradigm types, we found little evidence of reduced use of visuospatial context in schizophrenia (Hedges’ g=0.20), and marginal evidence for bipolar disorder (g=0.25). The strongest evidence was observed for altered contrast perception paradigms in schizophrenia (g=0.73). With respect to subclinical populations, we observed immense heterogeneity in populations of interest, individual-difference measures, and study designs. Our meta-analysis provided surprisingly weak evidence for the prevailing view that psychotic disorders are associated with a general reduction in use of visuospatial context. Instead, we observed the strongest evidence for a specific alteration in the effect of visuospatial context during contrast perception. We propose altered feedback to primary visual cortex as a potential neural mechanism of this effect.

Talk 4

A novel somato-visual functional connectivity biomarker for affective and non-affective psychosis

Brian Keane1, Yonatan Abrham1, Michael Cole2, Brent Johnson1, Carrisa Cocuzza3; 1University of Rochester, 2The State University of New Jersey, 3Yale University

People with psychosis are known to exhibit thalamo-cortical hyperconnectivity and cortico-cortical hypoconnectivity with sensory networks; however, it remains unclear whether this applies to all sensory networks, whether it impacts affective and non-affective psychosis equally, or whether such differences could form the basis of a viable biomarker. To address these questions, we harnessed data from the Human Connectome Early Psychosis Project and computed resting-state functional connectivity (RSFC) matrices for healthy controls and affective/non-affective psychosis patients who were within 5 years of illness onset. Primary visual, secondary visual (“visual2”), auditory, and somatomotor networks were defined via a recent brain network partition. RSFC was determined for 718 regions (358 subcortical) via multiple regression. Both patient groups exhibited cortico-cortical hypoconnectivity and thalamo-cortical hyperconnectivity in somatomotor and visual2 networks. The patient groups were similar on every RSFC comparison. Across patients, a robust psychosis biomarker emerged when thalamo-cortical and cortico-cortical connectivity values were averaged across the somatomotor and visual2 networks, normalized, and subtracted. Four thalamic regions linked to the same two networks disproportionately drove the group difference (p=7e-10, Hedges’ g=1.10). This “somato-visual” biomarker was present in antipsychotic-naive patients and discoverable in a 5-minute scan; it could differentiate psychosis patients from healthy or ADHD controls in two independent data sets. The biomarker did not depend on comorbidities, had moderate test-retest reliability (ICC=.59), and could predict patient status in a held-out sample (sensitivity=.66, specificity=.82, AUC=.83). These results show that, across psychotic disorder diagnoses, an RSFC biomarker can differentiate patients from controls by the early illness stages.
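
The averaging, normalizing, and subtracting steps that define the somato-visual biomarker can be sketched as follows; the region indices, function names, and exact normalization here are illustrative assumptions rather than the study's code.

```python
# Hypothetical sketch of the averaging/normalizing/subtracting steps described above
# (region indices, names, and normalization details are illustrative assumptions).
import numpy as np

def network_connectivity(rsfc, thal_idx, cortex_idx, network_idx):
    """rsfc: (n_regions, n_regions) RSFC matrix for one subject.
    Returns mean thalamo-cortical and cortico-cortical connectivity with the
    somatomotor + secondary visual ('visual2') cortical regions (network_idx)."""
    thalamo = rsfc[np.ix_(thal_idx, network_idx)].mean()
    cortico = rsfc[np.ix_(cortex_idx, network_idx)].mean()
    return thalamo, cortico

def somato_visual_biomarker(thalamo_vals, cortico_vals):
    """Z-score each measure across subjects, then subtract to yield one value per subject."""
    z = lambda x: (np.asarray(x) - np.mean(x)) / np.std(x)
    return z(thalamo_vals) - z(cortico_vals)
```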

Talk 5

Abnormal oculomotor corollary discharge signaling as a trans-diagnostic mechanism of psychosis

Beier Yao1,2,3, Martin Rolfs4, Rachael Slate5, Dominic Roberts3, Jessica Fattal6, Eric Achtyes7,8, Ivy Tso9, Vaibhav Diwadkar10, Deborah Kashy3, Jacqueline Bao3, Katharine Thakkar3; 1McLean Hospital, 2Harvard Medical School, 3Michigan State University, 4Humboldt University, 5Brigham Young University, 6Northwestern University, 7Cherry Health, 8Western Michigan University Homer Stryker M.D. School of Medicine, 9The Ohio State University, 10Wayne State University

Corollary discharge signals (CDs) are “copies” of motor signals sent to sensory areas to predict the corresponding input. Because they are used to distinguish actions generated by oneself from those caused by external forces, altered CDs are a hypothesized mechanism for agency disturbances in psychosis (e.g., delusion of alien control). We focused on the visuomotor system because the CD-relaying circuit has been identified in primates, and the CD influence on visual perception can be quantified using psychophysical paradigms. Previous studies have shown a decreased influence of CDs on visual perception in (especially more symptomatic) individuals with schizophrenia. We therefore hypothesized that altered CDs may be a trans-diagnostic mechanism of psychosis. We examined oculomotor CDs (using the trans-saccadic localization task) in 49 participants with schizophrenia or schizoaffective disorder (SZ), 36 participants with psychotic bipolar disorder (BPP), and 40 healthy controls (HC). Participants made a saccade to a visual target. Upon saccade initiation, the target disappeared and reappeared at a horizontally displaced position. Participants indicated the direction of displacement. With intact CDs, participants can remap the pre-saccadic target and make accurate perceptual judgements. Otherwise, participants may use the saccade landing site as a proxy for the pre-saccadic target location. We found that both SZ and BPP were less sensitive to target displacement than HC. Regardless of diagnosis, patients with more severe positive symptoms were more likely to rely on the saccade landing site. These results suggest a reduced influence of CDs on visual perception in SZ and BPP and, thus, that altered CDs may be a trans-diagnostic mechanism of psychosis.

Talk 6

The four factors of visual hypersensitivity: definition and measurement across 16 clinical diagnoses and areas of neurodiversity

Alice Price1, Petroc Sumner1, Georgie Powell1; 1Cardiff University

Subjective sensitivity to visual stimuli, including repeating patterns and bright lights, is known to be associated with several clinical conditions (e.g., migraine, anxiety, autism), and also occurs in the general population. Anecdotal reports suggest that people might be sensitive to different types of visual stimuli (e.g., to motion vs. lights). The Cardiff Hypersensitivity Scale-Visual (CHYPS-V) was developed to define and measure the different factors of visual hypersensitivity, using questions which focus upon functional impact rather than affective changes. Across five samples (n > 3000), we found four highly replicable factors using bifactor modelling. These were brightness (e.g., sunlight), repeating patterns (e.g., stripes), strobing (e.g., light flashes), and intense visual environments (e.g., supermarkets). The CHYPS-V and its subscales show very good reliability (α > .80, ω > .80) and improved correlations with measures of visual discomfort. We also used the CHYPS-V to delineate how these factors may differentiate clinical diagnoses and areas of neurodiversity from each other, and from the general population. Differences from individuals reporting no clinical diagnoses were most pronounced for the intense visual environments subscale, with individuals reporting a diagnosis of autism, fibromyalgia, or persistent postural perceptual dizziness (PPPD) scoring highest. Whilst many conditions showed a similar pattern of visual sensitivity across factors, some conditions (e.g., migraine, PPPD) showed evidence of condition-specific sensitivities (e.g., to patterns or to strobing). Beyond identifying the factor structure of visual hypersensitivity, the CHYPS-V can be used to help investigate underlying mechanisms which give rise to these differences in visual experience.

< Back to 2024 Symposia
