Symposia

The quest for 'the average person' in vision science: Promise and pitfalls

Symposium: Friday, May 15, 2026, 1:45 – 3:45 pm, Talk Room 2

Organizers: Jeremy Wilmer1, Michael Herzog2; 1Wellesley College, 2Swiss Institute of Technology (EPFL)
Presenters: Michael Herzog, Jeremy Wilmer, Jonathan Winawer, Anna Kosovicheva, Michael Webster, Alasdair Clarke

In everyday language, average means typical, usual, and ordinary. In data analysis, the average is a mathematically defined summary statistic. A good summary statistic should match the ordinary meaning of average, capturing the main tendencies while “dispensing with needless details” (Oxford Languages). But does the scientific practice of describing the human mind and brain through the average reasonably capture what is important? Human variation is ubiquitous. It demands explanation, gives life color, and inspires scientific curiosity. Moreover, as soon as one considers multiple dimensions (even highly correlated ones), the “average person” quickly ceases to exist (Rose, 2015; Downey, 2024). In this sense, the average person may, in fact, represent no one at all. At the same time, data descriptions that retain all details can be uninterpretable; they may lead one to miss the forest for the trees. In this symposium, we critically examine the scientific reliance on averaging across individuals in vision research. When does this approach succeed? When does it fall short? And what complementary methods might it require? The talks in this symposium tackle these questions via diverse methodologies (neural, behavioral, computational) and across multiple content areas (faces, illusions, color, scenes, graphs, eye movements). Herzog and colleagues show that the classic N1 component of visual evoked potentials is an artifact of averaging, and that there is no common factor in visual illusions; in both cases, the average response is atypical and deviations from it constitute stable individual traits. Wilmer and colleagues show that the average viewer’s reading of plotted averages masks massive variability and misperception, concluding that such graphs are a suboptimal method of visual data communication. Winawer and Benson show that standard cortical atlases, derived from population averages, often have large errors when applied to individual brains, and explore parcellation methods that balance efficiency with individual-level accuracy. Kosovicheva and colleagues show that even processes as simple as localization vary considerably between individuals and are tied to other forms of spatial judgment, suggesting that results relying on the average observer do not reflect individuals’ perceptual reality. Webster and colleagues review the individual differences manifest at different levels of color vision and its neural coding, and how these differences can be compensated for to promote a more common color experience among observers. Finally, Clarke and colleagues consider what we can infer from correlational analyses of averaged data and propose new modeling solutions for studying individual differences. Taken together, these talks reveal a wide array of rich insights embedded in non-average-ness; that is, in deviations from the average. They show, for example, that patterns of covariance across individuals can reveal mechanisms; that proper stratification before averaging can reap the benefits of averaging, such as an increased signal-to-noise ratio, without compromising naturalism; and that the existence and magnitude of stable variation can be a phenomenon worthy of study in its own right. We hope that this symposium will stimulate ongoing discussion about the benefits and risks of averaging data across individuals, and about the complementarity between traditional average-based approaches and approaches that embrace individual differences.

Talk 1

Inter-individual variability is (often) not noise

Michael Herzog1, Melissa Faggella1, Simona Garrobio1; 1Swiss Institute of Technology (EPFL)

Classically, vision scientists treat inter-individual differences as a nuisance and eliminate them by aggregating across individuals (the grand group average). Behind this procedure is the assumption that there is one “true” waveform, hidden in noise: vision research is assumed to be ergodic, i.e., conclusions based on the mean are also valid for single participants and vice versa. Here, we tested this assumption. We analysed visual evoked potentials (VEPs) recorded during a backward masking task and found that different observers had different numbers of peaks in their waveforms. When averaging, we obtained a single peak, the classic N1 component. We tested the same participants 5 and 10 years later and obtained virtually the same waveforms, i.e., a participant with 3 peaks still had 3 peaks a decade later. Hence, differences in waveforms are traits, not noise, and the N1 is an artifact of averaging. In fact, many participants did not show an N1 at all. Similar results hold for behavioural tests. Participants performed a battery of illusion tasks. We found almost no correlations between the illusions: an observer may experience a strong Ebbinghaus and a weak Müller-Lyer illusion. Hence, there is no common factor for illusions. Still, within each participant, the respective illusion magnitudes were stable across a year. We will discuss the implications of heterogeneity for vision research, including how to measure such heterogeneity properly, and we will show how inter-individual differences may open new avenues in vision research.
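
As a rough, hedged illustration of the averaging artifact described above (not the authors' analysis code; all waveform parameters are invented), the following sketch simulates observers whose waveforms contain different numbers of peaks and shows how the grand average collapses them into a single bump:

```python
# Rough illustration: grand-averaging simulated VEP-like waveforms whose
# peak count differs across observers. Every parameter here is made up.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 0.4, 400)             # 0-400 ms post-stimulus

def peak(center, width=0.02):
    """A Gaussian bump standing in for one waveform component."""
    return np.exp(-0.5 * ((t - center) / width) ** 2)

waveforms = []
for _ in range(30):                        # 30 simulated observers
    n_peaks = rng.integers(1, 4)           # 1-3 peaks, a stable trait
    centers = rng.uniform(0.10, 0.25, n_peaks)
    waveforms.append(sum(peak(c) for c in centers)
                     + rng.normal(0, 0.05, t.size))

grand_average = np.mean(waveforms, axis=0)
# The grand average shows one broad bump even though many individual
# waveforms contain two or three distinct peaks.
print(f"grand-average peak at ~{1000 * t[np.argmax(grand_average)]:.0f} ms")
```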

Talk 2

Graphed averages sow disagreement: The average reading of an average is atypical

Jeremy Wilmer1, Sarah Kerns1, Ken Nakayama2; 1Wellesley College, 2University of California, Berkeley

A key aim when graphing data is that, as far as possible, different viewers should gain a comparable conception of the data from the same graph. The average (mean) has long been the most commonly computed and plotted value in vision research. Yet little is known about how plotted averages are perceived and understood, especially by viewers who vary in statistical training. Here, we first developed an accessible yet information-rich drawing-based measure of how viewers read graphs of averages. We then gathered readings from a broad array of participants. Surprisingly, roughly 1 in 5 viewers, across a wide range of education levels, misinterpreted the bar in a bar chart of averages as depicting the range of the data, with the average frequently placed near the middle of the bar. The average reading of the average, therefore, was inside the bar, a location with few to no actual responses. A second surprising finding was that viewers both wildly underestimated the amount of variation around individual averages and disagreed with one another about it. For example, for a graph taken directly from the most popular Introductory Psychology textbook, a gender difference was falsely perceived, by the average viewer, to have zero overlap, as if all men outperformed all women. Moreover, across many graphs and levels of statistical training, viewers’ conceptions of variation and overlap differed so much that, again, the average response reflected virtually no actual responses. We conclude that average-only graphs routinely fail at their basic task of consistent, correct data communication.
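
The misperceived "zero overlap" can be put in perspective with a short worked example. The sketch below (a minimal illustration assuming unit-variance normal distributions; the effect sizes are placeholders, not estimates from the textbook graph discussed in the talk) computes how much two distributions actually overlap for a given standardized mean difference:

```python
# Hedged illustration: overlapping area of two unit-variance normal
# distributions for a given standardized mean difference (Cohen's d).
from math import erf, sqrt

def overlap_coefficient(d):
    """Overlapping area of two unit-variance normals whose means differ by d."""
    phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))   # standard normal CDF
    return 2 * phi(-abs(d) / 2)

for d in (0.2, 0.5, 0.8):
    print(f"d = {d}: {100 * overlap_coefficient(d):.0f}% overlap")
# Even a conventionally 'large' effect (d = 0.8) leaves ~69% overlap,
# nothing like the zero overlap many viewers reported perceiving.
```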

Talk 3

Common patterns and individual differences in visual cortical maps

Jonathan Winawer1, Noah Benson2; 1New York University, 2University of Washington

About a quarter of the human cerebral cortex is visual, consisting of multiple retinotopic maps and other areas that are highly responsive to certain classes of stimuli such as faces. Identifying these areas in individual participants has many applications in neuroscience and medicine: interpreting visual disorders; tracking cortical changes over development and aging; localizing intracranial electrodes in patients; characterizing how different areas respond to stimuli or tasks; tracing EEG or MEG signals back to specific cortical maps. Functional MRI experiments can identify most of these areas in any healthy individual, but doing so is time-consuming and costly. An alternative is to identify visual areas with standardized atlases, which take advantage of the tendency for specific visual areas to align with particular sulci and gyri. The atlases rely on two levels of averaging: one for the cortical surface itself, and another for the location of visual maps on this surface. Such atlases are now used in many research studies. However, atlases without functional localization at the individual level raise a number of questions: How often do they mislabel cortical areas? When they err, how large is the error? In short, how much do we lose by treating each individual as if they were the average? We argue that for some applications, cortical atlases without individual functional localizer data are insufficiently accurate. We then consider new methods that have some of the efficiency advantages of atlases while still respecting individual differences.
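
One simple way to quantify atlas-versus-individual mismatch of the kind described above is an overlap score between labels. The sketch below (a toy example, not the authors' method; the vertex masks and deviation rate are fabricated) computes a Dice coefficient between an atlas-derived label and a functionally defined label:

```python
# Minimal sketch: Dice overlap between an atlas-derived label and an
# individually (functionally) defined label on the same vertices.
import numpy as np

def dice(a, b):
    """Dice overlap of two boolean vertex masks."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    denom = a.sum() + b.sum()
    return 2 * np.logical_and(a, b).sum() / denom if denom else 1.0

rng = np.random.default_rng(1)
atlas_v1 = rng.random(10_000) < 0.10        # ~10% of vertices labeled V1
individual_v1 = atlas_v1.copy()
flip = rng.random(10_000) < 0.03            # individual deviates on ~3%
individual_v1[flip] = ~individual_v1[flip]

print(f"Dice(atlas, individual) = {dice(atlas_v1, individual_v1):.2f}")
```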

Talk 4

Individual differences in localization reveal links between fundamental visual processes

Anna Kosovicheva1; 1University of Toronto Mississauga

Approaches to examining individual differences in visual perception often emphasize high-level processes like object recognition, but what variability can we observe in fundamental visual tasks, and are these between-observer differences simply random noise? In a striking example of variability, several studies demonstrate that individuals exhibit consistent, idiosyncratic patterns of directional error in position judgments. In a series of laboratory studies, participants reported the positions of briefly flashed targets in the periphery, and the directional errors in these tasks have been shown to be highly stable over several months and consistent across measurement methods (Kosovicheva & Whitney, 2017). These errors are only weakly correlated between observers, so they largely cancel in the aggregate: the “average observer” appears to respond accurately even though no individual observer does. Importantly, these directional biases have been linked to individual differences in other fundamental processes, including perceived size and acuity (Wang, Murai, & Whitney, 2020), as well as the strength of visual crowding (Haseeb, Wolfe, & Kosovicheva, 2023). More recently, we have shown that these idiosyncrasies exist not only in laboratory samples but also in much larger, less-controlled settings (over 9,400 observers and 4.5 million trials), demonstrating that stable individual variation is a meaningful, underacknowledged element in understanding localization. In this online sample, we found that directional localization errors were, on average, weakly correlated between all possible pairs of participants, yet were highly consistent and stable within each observer. Focusing on the average observer has revealed much about vision, but understanding visual function requires tackling the complexity and variability of individuals to reveal important links between different processes.
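
The within-observer stability versus between-observer idiosyncrasy described above can be made concrete with a toy simulation (all numbers invented; this is not the authors' analysis): each simulated observer carries a stable error map, split-half correlations are high within observers, near zero between them, and the "average observer's" error map is nearly flat:

```python
# Toy simulation: stable idiosyncratic error maps yield high within-observer
# reliability, near-zero between-observer correlation, and a flat average.
import numpy as np

rng = np.random.default_rng(2)
n_obs, n_loc = 50, 64
trait = rng.normal(0, 1.0, (n_obs, n_loc))          # stable per-observer bias
half1 = trait + rng.normal(0, 0.4, (n_obs, n_loc))  # measurement noise
half2 = trait + rng.normal(0, 0.4, (n_obs, n_loc))

within = np.mean([np.corrcoef(half1[i], half2[i])[0, 1] for i in range(n_obs)])
between = np.mean([np.corrcoef(half1[i], half1[j])[0, 1]
                   for i in range(n_obs) for j in range(i + 1, n_obs)])
print(f"within-observer r = {within:.2f}, between-observer r = {between:.2f}")
print(f"mean |bias| of the average observer: {np.abs(half1.mean(0)).mean():.2f}")
```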

Talk 5

Interpreting and accounting for individual differences in color vision

Michael Webster1, Kara Emery2, Camilla Simoncelli1; 1University of Nevada, Reno, 2New York University

The concept of a “standard” or average observer is central to color science and its applications, but it masks marked variability even among observers with “typical” trichromatic color vision. These differences arise independently at many levels and affect diverse aspects of color coding, from genetics and sensitivity to color perception and cognition. As a result, two observers viewing the same stimulus may have very different experiences. Measurements of these variations have provided important and often surprising clues about the mechanisms and representational structure of color vision. Differences can be amplified on modern wide-gamut displays, and there is increasing recognition of, and interest in, accounting for them to better display and communicate information about color. One approach is to build the individual differences into the stimulus itself, tailoring it to each observer so that two observers, each viewing their own stimulus, have a more common color experience.
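
As a toy sketch of the compensation idea (the 8-degree hue bias below is a made-up example, not a measured value, and real compensation would act on calibrated color coordinates rather than a bare hue angle), one could pre-rotate the displayed hue to cancel an observer's bias:

```python
# Toy sketch of per-observer compensation: pre-rotate the displayed hue
# so that an observer with a hypothetical +8 degree hue bias ends up
# with an experience closer to the standard observer's.
def compensate_hue(stimulus_hue_deg, observer_bias_deg):
    """Pre-rotate the displayed hue to cancel an observer's hue bias."""
    return (stimulus_hue_deg - observer_bias_deg) % 360

standard_green = 120.0   # nominal hue angle for the standard observer
observer_bias = 8.0      # invented: this observer sees hues rotated +8 deg
displayed = compensate_hue(standard_green, observer_bias)
perceived = (displayed + observer_bias) % 360
print(f"display {displayed:.0f} deg -> observer perceives ~{perceived:.0f} deg")
```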

Talk 6

CANCELLED - Generative models can explain (or at least help us understand) individual differences in cognition

Alasdair Clarke1, Anna Hughes1, Amelia Hunt2; 1University of Essex, 2University of Aberdeen

Nearly all experiments in cognition and visual perception involve taking multiple measurements from each participant. This includes taking multiple recordings of the same construct (e.g., recording response time on a number of trials) and taking multiple measurements per trial (e.g., response time and accuracy). The number of measurements expands as we add eye-tracking metrics, EEG statistics, and so on. Researchers interested in individual differences typically assume that these measurements are (i) mutually independent and (ii) free from confounds, before taking simple summary statistics and then applying correlation procedures. A general theme of research on individual differences in our field has been a lack of correlations. I argue that, on reflection, this shouldn’t be surprising, as the measurements we put into our correlations are imprecise composites of multiple causal factors. For example, it is well established that reaction times are influenced by learning and inter-trial serial dependencies. This means that participants in a hypothetical visual search experiment may all have identical “skill” in search, but could exhibit wildly different RTs due to differences in learning rates and serial-dependency effects. I present simulated data to demonstrate how such processes can both mask real correlations and produce spurious ones. A potential solution is the use of generative models with latent parameters. I provide an example in the form of FoMo, which decomposes performance in visual foraging into a small number of per-participant parameters.
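
A minimal simulation in the spirit described above (all parameters invented; this is not the FoMo model) shows how identical "skill" plus varying learning rates and serial dependence yields wildly different mean RTs, so mean RT fails as a measure of skill:

```python
# Sketch: all simulated participants share identical search "skill", but
# per-participant learning rates and serial dependence make mean RT vary.
import numpy as np

rng = np.random.default_rng(3)
n_sub, n_trials = 40, 200
skill = 0.6                                  # identical across participants
learning = rng.uniform(0.001, 0.01, n_sub)   # per-participant learning rate
serial = rng.uniform(0.1, 0.6, n_sub)        # per-participant serial dependence

mean_rt = np.empty(n_sub)
for s in range(n_sub):
    rt, prev = np.empty(n_trials), 1.0
    for trial in range(n_trials):
        base = skill * np.exp(-learning[s] * trial) + 0.4  # practice effect
        prev = serial[s] * prev + (1 - serial[s]) * (base + rng.normal(0, 0.1))
        rt[trial] = prev
    mean_rt[s] = rt.mean()

print(f"SD of mean RT across equally skilled participants: {mean_rt.std():.3f} s")
```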

Symposium Submission Guidelines and Policies

Policies

  • The symposium organizer must be a current 2026 member in good standing.
  • Invited speakers must register for the meeting but need not be members.
  • No speaker or organizer can participate in more than one symposium.
  • After submission, speaker substitutions are allowed only for valid, unforeseeable reasons.
  • Online presentations are not allowed.
  • If a symposium talk has more than one author, it must be presented by the first author.
  • Submitting a symposium proposal or speaking in a symposium does not prevent you from submitting an abstract for a talk or poster presentation at VSS.
  • Organizers must ensure that all speakers are committed to participating in the symposium and registering for the meeting before submitting a proposal.

For questions about symposium submission policies, please contact us at

Symposia are presented at the VSS 2026 Annual Meeting on the first day (Friday).

VSS 2026 will be a fully in-person meeting with restricted possibilities for remote presentations. Symposium presentations must be in-person.

Four to six symposia will be scheduled, each in a 2-hour time slot.

Symposia can be organized along the lines of content, methodology or application, but in every case, talks within a symposium should focus on broader conceptual themes than a typical VSS presentation.

Proposals are evaluated by the VSS Board of Directors based on the following criteria:

  • Scientific merit
  • Theoretical and/or methodological innovation
  • Timeliness
  • Breadth and appeal to a substantial number of VSS attendees
  • Lack of overlap with the regular program and recent symposia
  • A slate of speakers that provides appropriately broad representation of the VSS membership’s national and international provenance and range of approaches.

Symposium Format

The recommended format is four to six talks, followed by a panel discussion that involves both the speakers and the audience. The allocation of time for talks and Q&A within the 2-hour session is flexible and can be determined at the organizers’ discretion. This may include Q&A after each individual talk, Q&A concentrated at the end of the session in a panel discussion, or a combination of both approaches. Other formats will be considered, but proposals for other formats should include a clear rationale. Proposals from early career investigators are encouraged.

Symposium Information

The symposium proposal is submitted using a multi-page form that includes information describing the symposium and the talks in the symposium. A symposium may have a maximum of three organizers. A minimum of four talks is required; up to six talks are allowed. Talks must be entered in the order they will be presented. It is best to collect information about the individual talks from their authors before you start the submission process.

As the symposium organizer, you must have prior approval from your talk presenters confirming that they consent to participate in this symposium (and no other symposium), and that they will register for and attend VSS 2026 in person to present their talk. The organizer must also agree to the Symposium Policies and disclose any Conflicts of Interest.

Required symposium information:

  • Symposium title.
  • Brief description of the symposium (maximum 100 words). Appears on the symposium overview page.
  • Full description of the symposium (maximum 500 words). Appears on the symposium detail page.
  • Estimated attendance.
  • Name, affiliation, and contact information for the organizers (maximum 3 organizers).
  • Acknowledgements (optional).

Required information for each talk:

  • Talk title.
  • Talk abstract (maximum 250 words).
  • Author names and affiliations.
  • Email and country of citizenship of the talk presenter (first author).
  • The presenter must agree to an Ethics statement.
  • The presenter must disclose any Conflicts of Interest.

Organizer and Speaker Requirements

The symposium organizers (maximum of three) must be current VSS members (for 2026), but invited speakers are not required to be VSS members. All speakers are required to register for the meeting. Submitting a symposium proposal or speaking in a symposium does not prevent organizers or speakers from submitting an abstract for a talk or poster presentation at VSS. An individual may participate as organizer or speaker in only one symposium. The symposium organizer(s) may be a speaker in the symposium.

Organizers must ensure that all speakers are committed to participating in the symposium before submitting a proposal, and organizers must also ensure that speakers have not agreed to participate in more than one symposium. If a symposium talk has more than one author, it must be presented by the first author. For each speaker, please provide up to three references to published articles relevant to the proposed talk.

Disclosure of Conflicts of Interest

It is the responsibility of the symposium organizer to disclose any relevant commercial relationships or other conflicts of interest for all speakers and their co-authors in the submitted proposal. This information must include the name of any organization with which a commercial relationship exists for both the speaker and each co-author.

Each symposium speaker must verbally disclose all relevant commercial relationships or conflicts of interest at the beginning of their talk. This information should also be included on a slide during the presentation, outlining the names of the organization(s) involved with any commercial relationship(s) for the speaker and each co-author.

Compliance with this policy is a requirement.

Submission Schedule

Submissions Open: October 20, 2025
Submissions Close: November 20, 2025 (extended)
Notification of Accepted Symposia: November 26, 2025

Submitting a Symposium

To submit a symposium, Log in to your MyVSS Account or Create a New MyVSS Account, pay for your 2026 membership, and then click the Submit a Symposium button.

For questions about Symposium Submissions, please contact us at

2024 Symposia

Neurodiversity in visual functioning: Moving beyond case-control studies

Friday, May 17, 2024, 12:00 – 2:00 pm, Talk Room 1

Organizers: Catherine Manning1, Michael-Paul Schallmo2; 1University of Reading, UK, 2University of Minnesota

Visual functioning in psychiatric and developmental conditions is typically studied by comparing a single diagnosis against a control group. However, this approach cannot tell us whether atypical visual functioning is condition-specific or shared across conditions, and it neglects co-occurrence and heterogeneity. Accordingly, recent conceptualisations have moved away from traditional diagnostic boundaries towards considering transdiagnostic dimensions of neurodiversity. This symposium will bring these recent conceptual advances to the broader VSS community, through cutting-edge work spanning conditions and methods. We will first present studies that directly compare conditions to uncover convergence and divergence, before moving towards transdiagnostic studies of visual functioning.

Large-scale visual neural datasets: where do we go from here?

Friday, May 17, 2024, 12:00 – 2:00 pm, Talk Room 2

Organizers: Alessandro Gifford1, Kendrick Kay2; 1Freie Universität Berlin, 2University of Minnesota

Recently, there has been an increase in both collection and use of large-scale visual neural datasets (LSVNDs), suggesting that the field of vision science is entering a new era of big open data. This transformation raises new and exciting questions about LSVNDs: their potential strengths, their potential pitfalls, how they can promote theory formation, and what LSVNDs we need most. This symposium addresses these questions through six talks from both LSVND creators and users, along with a guided interactive discussion with the audience aimed at sharing knowledge among VSS members and setting community-centered goals.

The temporal evolution of visual perception

Friday, May 17, 2024, 2:30 – 4:30 pm, Talk Room 1

Organizers: Lina Teichmann1, Chris Baker1; 1Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, USA

Visual processing is highly dynamic with neural representations evolving over several hundred milliseconds. This symposium will present multiple perspectives on the time course of visual processing, giving insight into how light falling on the retina ultimately gives rise to rich visual percepts. The speakers will focus on different methods (including EEG, MEG, intracranial recordings, and behaviour) and different domains of vision (including colour, object recognition, social perception and attention). Collectively, the work presented in this symposium will provide novel insights into the dynamic nature of visual perception across the visual hierarchy.

Attention: accept, reject, or major revisions?

Friday, May 17, 2024, 2:30 – 4:30 pm, Talk Room 2

Organizers: Alon Zivony1; 1University of Sheffield

The concept of “attention” has been criticized as theoretically incoherent and even unsuitable for scientific research. How should we, as a field, respond to these criticisms? Should we avoid using the concept, change the way we conceptualize attention, or simply continue with our research as usual? Our speakers bring different perspectives from psychology and philosophy in an attempt to answer this question. We provide an overview of some of the difficulties with conceptualizing attention, as well as practical and theoretical solutions to these problems. In doing so, we hope to promote a better science of attention and related phenomena.

The Multifaceted effects of blindness and how sight might be restored

Friday, May 17, 2024, 5:00 – 7:00 pm, Talk Room 1

Organizer: Ella Striem-Amit1; 1Georgetown University

Congenital blindness illustrates the developmental roots of visual cortex functions. Our symposium brings together speakers from diverse academic career stages to present perspectives on the multifaceted effects of blindness on the brain and behavior. The speakers will describe the effect of sight loss on multisensory properties and on the visual cortex, highlighting differential effects in areas typically responding to motion and faces, and divergence of plasticity across individuals. We will also discuss the limitations of visual prostheses for restoring sight, and how they may be addressed. Altogether, our symposium will call attention to the substantial impact of plasticity and possibilities to overcome it.

Using deep networks to re-imagine object-based attention and perception

Friday, May 17, 2024, 5:00 – 7:00 pm, Talk Room 2

Organizers: Hossein Adeli1, Seoyoung Ahn2, Gregory Zelinsky2; 1Columbia University, 2Stony Brook University

What are the computational mechanisms that transform visual features into coherent object percepts, and what role does attention play in this process? The speakers in this symposium will use various state-of-the-art deep neural network models to reexamine the cognitive and neural mechanisms underlying object-based attention and perception. They will also explore new computational mechanisms for how the visual system groups visual features into coherent object percepts. This symposium will lay the foundation for the next generation of object-based attention models, ones that harness recent computational tools to advance our understanding of the object-centric nature of human perception.

Using deep networks to re-imagine object-based attention and perception

Symposium: Friday, May 17, 2024, 5:00 – 7:00 pm, Talk Room 2

Organizers: Hossein Adeli1, Seoyoung Ahn2, Gregory Zelinsky2; 1Columbia University, 2Stony Brook University
Presenters: Patrick Cavanagh, Frank Tong, Paolo Papale, Alekh Karkada Ashok, Hossein Adeli, Melissa Le-Hoa Võ

What can Deep Neural Network (DNN) methods tell us about the brain mechanisms that transform visual features into object percepts? Using different state-of-the-art models, the speakers in this symposium will reexamine different cognitive and neural mechanisms of object-based attention (OBA) and perception and consider new computational mechanisms for how the visual system groups visual features into coherent object percepts. Our first speaker, Patrick Cavanagh, helped create the field of OBA and is therefore uniquely suited to give a perspective on how this question, essentially the feature-binding problem, has evolved over the years and has been shaped by paradigms and available methods. He will conclude by outlining his vision for how DNN architectures create new perspectives on understanding OBA. The next two speakers will review recent behavioral and neural findings on object-based attention and feature grouping. Frank Tong will discuss the neural and behavioral signatures of OBA through the utilization of fMRI and eye-tracking methods. He will demonstrate how the human visual system represents objects across the hierarchy of visual areas. Paolo Papale will discuss neurophysiological evidence for the role of OBA and grouping in object perception. Using stimuli systematically increasing in complexity, from lines to natural objects against cluttered backgrounds, he shows that OBA and grouping are iterative processes. Both talks will also include discussions of current modeling efforts, and what additional measures may be needed to realize more human-like object perception. The following two talks will provide concrete examples of how DNNs can be used to predict human behavior during different tasks. Lore Goetschalckx will focus on the importance of considering the time course of grouping in object perception and will discuss her recent work on developing a method to analyze the dynamics of different models. Using this method, she shows how a deep recurrent model trained on an object grouping task predicts human reaction time. Hossein Adeli will review modeling work on three theories of how OBA binds features into objects: one that implements object-files, another that uses generative processes to reconstruct an object percept, and a third model of spreading attention through association fields. In the context of these modeling studies, he will describe how each of these mechanisms was implemented as a DNN architecture. Lastly, Melissa Võ will drive home the importance of object representations and how they collectively create an object context that humans use to control their attention behavior in naturalistic settings. She shows how GANs can be used to study the hidden representations underlying our perception of objects. This symposium is timely because advances in computational methods have made it possible to put old theories to the test and to develop new theories of OBA mechanisms that engage the role played by attention in creating object-centric representations.

Talk 1

The Architecture of Object-Based Attention

Patrick Cavanagh1, Gideon P. Caplovitz2, Taissa K. Lytchenko2, Marvin R. Maechler3, Peter U. Tse3, David R. Sheinberg4; 1Glendon College, York University, 2University of Nevada, Reno, 3Dartmouth College, 4Brown University

Evidence for the existence of object-based attention raises several important questions: what are objects, how does attention access them, and what anatomical regions are involved? What are the “objects” that attention can access? Several studies have shown that items in visual search tasks are only loose collections of features prior to the arrival of attention. Nevertheless, findings from a wide variety of paradigms, including unconscious priming and cuing, have overturned this view. Instead, the targets of object-based attention appear to be fully developed object representations that have reached the level of identity prior to the arrival of attention. Where do the downward projections of object-based attention originate? Current research indicates that the control of object-based attention must come from ventral visual areas specialized in object analysis that project downward to early visual areas. If so, how can feedback from object areas accurately target the object’s early locations and features when the object areas have only crude location information? Critically, recent work on autoencoders has made this plausible as they are capable of recovering the locations and features of the target objects from the high level, low dimensional codes in the object areas. I will outline the architecture of object-based attention, the novel predictions it brings, and discuss how it works in parallel with other attention pathways.

Talk 2

Behavioral and neural signatures of object-based attention in the human visual system

Frank Tong1, Sonia Poltoratski1, David Coggan1, Lasyapriya Pidaparthi1, Elias Cohen1; 1Vanderbilt University

How might one demonstrate the existence of an object representation in the visual system? Does objecthood arise preattentively, attentively, or via a confluence of bottom-up and top-down processes? Our fMRI work reveals that orientation-defined figures are represented by enhanced neural activity in the early visual system. We observe enhanced fMRI responses in the lateral geniculate nucleus and V1, even for unattended figures, implying that core aspects of scene segmentation arise from automatic perceptual processes. In related work, we find compelling evidence of object completion in early visual areas. fMRI response patterns to partially occluded object images resemble those evoked by unoccluded objects, with comparable effects of pattern completion found for unattended and attended objects. However, in other instances, we find powerful effects of top-down attention. When participants must attend to one of two overlapping objects (e.g., face vs. house), activity patterns from V1 through inferotemporal cortex are biased in favor of the covertly attended object, with functional coupling of the strength of object-specific modulation found across brain areas. Finally, we have developed a novel eye-tracking paradigm to predict the focus of object-based attention while observers view two dynamically moving objects that mostly overlap. Estimates of the precision of gaze following suggest that observers can entirely filter out the complex motion signals arising from the task-irrelevant object. To conclude, I will discuss whether current AI models can adequately account for these behavioral and neural properties of object-based attention, and what additional measures may be needed to realize more human-like object processing.

Talk 3

The spread of object attention in artificial and cortical neurons

Paolo Papale1, Matthew Self1, Pieter Roelfsema1; 1Netherlands Institute for Neuroscience

A crucial function of our visual system is to group local image fragments into coherent perceptual objects. Behavioral evidence has shown that this process is iterative and time-consuming. A simple theory suggests that visual neurons can solve this challenging task by relying on recurrent processing: attending to an object could produce a gradual spread of enhancement across its representation in the visual cortex. Here, I will present results from a biologically plausible artificial neural network that can solve object segmentation by attention. This model was able to identify and segregate individual objects in cluttered scenes with extreme accuracy, using only modulatory top-down feedback, as observed in visual cortical neurons. Then, I will present comparable results from large-scale electrophysiology recordings in the macaque visual cortex. We tested the effect of object attention with stimuli of increasing complexity, from lines to natural objects against cluttered backgrounds. Consistent with behavioral observations, the iterative model correctly predicted the spread of attentional modulation in visual neurons for simple stimuli. However, for more complex stimuli containing recognizable objects, we observed asynchronous but not iterative modulation. Thus, we produced a set of hybrid stimuli, combining local elements of two different objects, which we alternated with presentations of intact objects. By doing so, we made local information unreliable, forcing the monkey to solve the task iteratively. Indeed, we observed that this set of stimuli induced iterative attentional modulations. These results provide the first systematic investigation of object attention in both artificial and cortical neurons.
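
As a schematic of the incremental-grouping idea (a toy flood-fill, not the network presented in the talk; the grid and object shape are fabricated), enhancement seeded at one location spreads step by step, but only within the attended object:

```python
# Toy "spread of enhancement": attention seeded at one pixel grows to
# 4-connected neighbors on each recurrent step, confined to the object.
import numpy as np

grid = np.zeros((7, 7), bool)
grid[2, 1:6] = True            # a horizontal "line" object
grid[1:6, 3] = True            # crossed by a vertical line (same object)

enhanced = np.zeros_like(grid)
enhanced[2, 1] = True          # attention cued to one end of the object
for step in range(20):         # recurrent steps: spread to 4-neighbors
    grown = enhanced.copy()
    grown[1:, :] |= enhanced[:-1, :]
    grown[:-1, :] |= enhanced[1:, :]
    grown[:, 1:] |= enhanced[:, :-1]
    grown[:, :-1] |= enhanced[:, 1:]
    grown &= grid              # spread cannot leave the object
    if (grown == enhanced).all():
        print(f"object fully enhanced after {step} steps")
        break
    enhanced = grown
```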

Talk 4

Time to consider time: Comparing human reaction times to dynamical signatures from recurrent vision models on a perceptual grouping task

Alekh Karkada Ashok1, Lore Goetschalckx1, Lakshmi Narasimhan Govindarajan1, Aarit Ahuja1, David Sheinberg1, Thomas Serre1; 1Brown University

To make sense of its retinal inputs, our visual system organizes perceptual elements into coherent figural objects. This perceptual grouping process, like many aspects of visual cognition, is believed to be dynamic and at least partially reliant on feedback. Indeed, cognitive scientists have studied its time course through reaction time (RT) measurements and have associated it with a serial spread of object-based attention. Recent progress in biologically-inspired machine learning has put forward convolutional recurrent neural networks (cRNNs) capable of exhibiting and mimicking visual cortical dynamics. To understand how the visual routines learned by cRNNs compare to humans, we need ways to extract meaningful dynamical signatures from a cRNN and study temporal human-model alignment. We introduce a framework to train, analyze, and interpret cRNN dynamics. Our framework triangulates insights from attractor-based dynamics and evidential learning theory. We derive a stimulus-dependent metric, ξ, and directly compare it to existing human RT data on the same task: a grouping task designed to study object-based attention. The results reveal a “filling-in” strategy learned by the cRNN, reminiscent of the serial spread of object-based attention in humans. We also observe a remarkable alignment between ξ and human RT patterns for diverse stimulus manipulations. This alignment emerged purely as a byproduct of the task constraints (no supervision on RT). Our framework paves the way for testing further hypotheses on the mechanisms supporting perceptual grouping and object-based attention, as well as for inter-model comparisons looking to improve the temporal alignment with humans on various other cognitive tasks.
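
The general logic of a model-derived RT proxy can be sketched with a toy evidence accumulator (this stands in for, but is not, the authors' cRNN or their ξ metric; drift values are invented): the number of steps to reach a decision threshold serves as the dynamical signature compared against human RTs:

```python
# Toy accumulator: steps-to-threshold as a stand-in for a model-derived
# reaction-time signature. Harder stimuli (lower drift) take more steps.
import numpy as np

def steps_to_threshold(drift, thresh=1.0, max_steps=500, seed=4):
    """Accumulate noisy evidence; return the step at which it crosses threshold."""
    rng = np.random.default_rng(seed)
    evidence = 0.0
    for step in range(1, max_steps + 1):
        evidence += drift + rng.normal(0, 0.05)
        if evidence >= thresh:
            return step
    return max_steps

for label, drift in [("easy", 0.08), ("hard", 0.03)]:
    print(f"{label} stimulus -> {steps_to_threshold(drift)} steps")
```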

Talk 5

Three theories of object-based attention implemented in deep neural network models

Hossein Adeli1, Seoyoung Ahn2, Gregory Zelinsky2, Nikolaus Kriegeskorte1; 1Columbia University, 2Stony Brook University

Understanding the computational mechanisms that transform visual features into coherent object percepts requires the implementation of theories in scalable models. Here we report on implementations, using recent deep neural networks, of three previously proposed theories in which the binding of features is achieved (1) through convergence in a hierarchy of representations resulting in object-files, (2) through a reconstruction or a generative process that can target different features of an object, or (3) through the elevation of activation by spreading attention within an object via association fields. First, we present a model of object-based attention that relies on capsule networks to integrate features of different objects in the scene. With this grouping mechanism, the model learns to sequentially attend to objects to perform multi-object recognition and visual reasoning. The second modeling study shows how top-down reconstructions of object-centric representations in a sequential autoencoder can target different parts of the object in order to yield a more robust and human-like object recognition system. The last study demonstrates how object perception and attention could be mediated by flexible object-based association fields at multiple levels of the visual processing hierarchy. Transformers provide a key relational and associative computation that may also be present in the primate brain, albeit implemented by a different mechanism. We observed that representations in transformer-based vision models can predict the reaction time behavior of people on an object grouping task. We also show that the feature maps can model the spreading of attention within an object.
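
A toy sketch of the third mechanism, spreading attention through association fields, is given below (token features and affinities are fabricated; this is not the authors' transformer analysis): attention seeded on one token is diffused through a pairwise affinity matrix, so activation spreads mostly among tokens belonging to the same object:

```python
# Toy "association field" spreading: attention over 6 image tokens is
# diffused through affinities built from fabricated token features.
import numpy as np

rng = np.random.default_rng(5)
obj_id = np.array([0, 0, 0, 1, 1, 1])              # 6 tokens, 2 objects
feats = rng.normal(obj_id[:, None], 0.2, (6, 4))   # similar within an object

dist2 = ((feats[:, None] - feats[None]) ** 2).sum(-1)
affinity = np.exp(-dist2)
affinity /= affinity.sum(axis=1, keepdims=True)    # row-stochastic

attn = np.zeros(6)
attn[0] = 1.0                                      # seed attention on token 0
for _ in range(5):                                 # diffuse along affinities
    attn = affinity @ attn
attn /= attn.sum()
print("attention per token:", np.round(attn, 2))
# Most of the attention mass stays on the tokens of object 0.
```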

Talk 6

Combining Generative Adversarial Networks (GANs) with behavior and brain recordings to study scene understanding

Melissa Le-Hoa Võ1, Aylin Kallmayer1; 1Goethe University Frankfurt

Our visual world is a complex conglomeration of objects that adhere to semantic and syntactic regularities, a.k.a. scene grammar, according to which scenes can be decomposed into phrases, i.e., smaller clusters of objects forming conceptual units, which in turn contain so-called anchor objects. These usually large and stationary objects anchor predictions regarding the identity and location of most other, smaller objects within the same phrase and play a key role in guiding attention and boosting perception during real-world search. They therefore provide an important organizing principle for structuring real-world scenes. Generative adversarial networks (GANs) trained on images of real-world scenes learn the scenes’ latent grammar and can then synthesize images that mimic real-world scenes increasingly well. GANs can therefore be used to study the hidden representations underlying object-based perception, serving as testbeds to investigate the role that anchor objects play in both the generation and understanding of scenes. We will present recent work in which we showed participants real and generated images while recording both behavior and brain responses. Modelling behavioral responses with a range of computer vision models, we found that mostly high-level visual features and the strength of anchor information predicted human understanding of generated scenes. Using EEG to investigate the temporal dynamics of these processes revealed initial processing of anchor information which generalized to subsequent processing of the scene’s authenticity. These new findings imply that anchors pave the way to scene understanding and that models predicting real-world attention and perception should become more object-centric.

The Multifaceted effects of blindness and how sight might be restored

Symposium: Friday, May 17, 2024, 5:00 – 7:00 pm, Talk Room 1

Organizer: Ella Striem-Amit1; 1Georgetown University
Presenters: Lara Coelho, Santani Teng, Woon Ju Park, Elizabeth J. Saccone, Ella Striem-Amit, Michael Beyeler

Congenital blindness illustrates the developmental roots of visual cortex functions. Here, a group of early-career researchers will present various perspectives on the multifaceted effects of blindness on the brain and behavior. To start off the symposium, Coelho will describe the effect of sight loss on multisensory properties and the reliance on vision to develop an intact multisensory body representation. This presentation will highlight the dependence across modalities, revealing rich interactions between vision and body representations. Turning to a unique manifestation of compensation in blindness, Teng will discuss how echolocation functions in naturalistic settings and its properties as a form of active sensing. Continuing the theme of integration across senses and diving into visual cortical reorganization, Park will argue that the development of motion processing in hMT+ is partly dependent on and partly independent of vision. Saccone will show evidence for a functional takeover by language of the typically face-selective FFA in blindness, demonstrating plasticity beyond sensory representations. Together, these two talks will highlight different views of brain plasticity in blindness. Adding to our discussion of the multifaceted nature of plasticity, Striem-Amit will discuss whether plasticity in the visual cortex is consistent across different blind individuals, showing evidence for divergent visual plasticity and stability over time in adulthood. The last speaker will discuss the challenges and potential of sight restoration using visual prostheses. Beyeler will discuss how some of the challenges of sight restoration can be addressed through perceptual learning of implant inputs. This talk highlights how understanding plasticity in the visual system and across the brain has direct applications for successfully restoring sight. Together, the symposium will bring different theoretical perspectives to illustrate the effects of blindness, revealing the extent and diversity of neural plasticity, and clarify the state-of-the-art capacities for sight restoration.

Talk 1

Implications of visual impairment on body representation

Lara Coelho1, Monica Gori1; 1Unit for visually impaired people, Italian Institute of Technology, Genova, Italy

In humans, vision is the most accurate sensory modality for constructing our representation of space. It has been shown that visual impairment negatively influences daily living and quality of life. For example, spatial and locomotor skills are reduced in this population. One possibility is that these deficiencies arise from a distorted representation of the body. Body representation is fundamental for motor control, because we rely on our bodies as a metric guide for our actions. While body representation is a by-product of multisensory integration, it has been proposed that vision is necessary to construct an accurate representation of the body. In the MySpace project, we are investigating the role of visual experience in haptic body representations in sighted and visually impaired (VI) participants. To this end, we employ a variety of techniques to investigate two key aspects of body representation: 1) size perception, and 2) the plasticity of the proprioceptive system. These techniques include landmark localization, psychophysics, and the rubber hand illusion. Our results in sighted participants show distortions in haptic but not visual body representation. In VI participants, there are distortions when estimating forearm, hand, and foot size in several different haptic tasks. Moreover, VI children fail to update their perceived body location in the rubber hand illusion task. Collectively, our findings support the hypothesis that vision is necessary to reduce distortions in haptic body representations. Moreover, we propose that VI children may develop with impaired representations of their own bodies. We discuss possible opportunities for reducing this impairment.

Talk 2

Acoustic glimpses: The accumulation of perceptual information in blind echolocators

Santani Teng1; 1Smith-Kettlewell Eye Research Institute

Blindness imposes constraints on the acquisition of sensory information from the environment. To mitigate those constraints, some blind people employ active echolocation, a technique in which self-generated sounds, like tongue “clicks,” produce informative reflections. Echolocating observers integrate over multiple clicks, or samples, to make perceptual decisions that guide behavior. What information is gained in the echoacoustic signal from each click? Here, I will draw from similar work in eye movements and ongoing studies in our lab to outline our approaches to this question. In a psychoacoustic and EEG experiment, blind expert echolocators and sighted control participants localized a virtual reflecting object after hearing simulated clicks and echoes. Left-right lateralization improved on trials with more click repetitions, suggesting a systematic precision benefit to multiple samples even when each sample delivered no new sensory information. In a related behavioral study, participants sat in a chair but otherwise moved freely while echoacoustically detecting, then orienting toward a reflecting target located at a random heading in the frontal hemifield. Clicking behavior and target size (therefore sonar strength) strongly influenced the rate and precision of orientation convergence toward the target, indicating a dynamic interaction between motor-driven head movements, click production, and the resulting echoacoustic feedback to the observer. Taken together, modeling these interactions in blind expert practitioners suggests similar properties, and potential shared mechanisms, between active sensing behavior in visual and echoacoustic domains.
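
The precision benefit of repeated clicks has a simple statistical reading: averaging n internally noisy estimates of the same external signal shrinks error roughly as 1/sqrt(n). A toy demonstration with invented numbers (not the study's psychoacoustic model) is below:

```python
# Invented-numbers demonstration: even when every click carries the same
# external signal, averaging internally noisy samples sharpens the
# lateralization estimate roughly as 1/sqrt(n).
import numpy as np

rng = np.random.default_rng(7)
true_azimuth = 10.0                        # degrees; arbitrary
for n_clicks in (1, 2, 4, 8):
    # 10,000 simulated trials, each averaging n noisy internal samples
    estimates = rng.normal(true_azimuth, 5.0, (10_000, n_clicks)).mean(axis=1)
    print(f"{n_clicks} click(s): estimate SD = {estimates.std():.2f} deg")
```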

Talk 3

Constraints of cross-modal plasticity within hMT+ following early blindness

Woon Ju Park1, Kelly Chang, Ione Fine; 1Department of Psychology, University of Washington

Cross-modal plasticity following early blindness has been widely documented across numerous visual areas, highlighting our brain’s remarkable adaptability to changes in the sensory environment. In many of these areas, functional homologies have been observed between the original and reorganized responses. However, the mechanisms driving these homologies remain largely unknown. Here, we will present findings that aim to answer this question within area hMT+, which responds to visual motion in sighted individuals and to auditory motion in early blind individuals. Our goal was to examine how the known functional and anatomical properties of this area influence the development of cross-modal responses in early blind individuals. Using a multimodal approach that encompasses psychophysics, computational modeling, and functional and quantitative MRI, we simultaneously characterized perceptual, functional, and anatomical selectivity to auditory motion in early blind and sighted individuals. We find that some anatomical and functional properties of hMT+ are inherited, while others are altered in those who become blind early in life.

Talk 4

Visual experience is necessary for dissociating face- and language-processing in the ventral visual stream

Elizabeth J. Saccone1, Akshi1, Judy S. Kim2, Mengyu Tian3, Marina Bedny1; 1Department of Psychological and Brain Sciences, Johns Hopkins University, Baltimore, MD, USA, 2Center for Human Values, Princeton University, Princeton, NJ, USA, 3Center for Educational Science and Technology, Beijing Normal University at Zhuhai, China

The contributions of innate predispositions versus experience to face-selectivity in vOTC are hotly debated. Recent studies with people born blind suggest face specialization emerges regardless of experience. In blindness, the FFA is said to process face shape, accessed through touch or sound, or to maintain its behavioral role in person recognition by specializing for human voices. We hypothesized instead that in blind people the anatomical location of the FFA responds to language. While undergoing fMRI, congenitally blind English speakers (N=12) listened to spoken language (English), foreign speech (Russian, Korean, Mandarin), non-verbal vocalizations (e.g., laughter), and control non-human scene sounds (e.g., forest sounds) during a 1-back repetition task. Participants also performed a ‘face localizer’ task by touching 3D-printed models of faces and control scenes, and a language localizer (spoken words > backwards speech, Braille > tactile shapes). We identified individual-subject ROIs inside an FFA mask generated from sighted data. In people born blind, the anatomical location of the FFA showed a clear preference for language over all other sounds, whether human or not. Responses to spoken language were higher than to foreign speech or non-verbal vocalizations, which were not different from scene sounds. This pattern was observed even in parts of vOTC that responded more to touching faces. Specialization for faces in vOTC is influenced by experience. In the absence of vision, lateral vOTC becomes implicated in language. We speculate that shared circuits that evolved for communication specialize for either face recognition or language depending on experience.

Talk 5

Individual differences of brain plasticity in early visual deprivation

Ella Striem-Amit1; 1Department of Neuroscience, Georgetown University Medical Center, Washington, DC 20057, USA

Early-onset blindness leads to reorganization in visual cortex connectivity and function. However, this has mostly been studied at the group level, largely ignoring differences in brain reorganization across early blind individuals. To test whether plasticity manifests differently in different blind individuals, we studied resting-state functional connectivity (RSFC) from the primary visual cortex in a large cohort of blind individuals. We find increased individual differences in connectivity patterns, corresponding to areas that show reorganization in blindness. Further, using a longitudinal approach in repeatedly sampled blind individuals, we showed that such individual patterns of organization and plasticity are stable over time, to the degree that individual participants could be identified from their connectivity patterns 2 years later. Together, these findings suggest that visual cortex reorganization is not uniform across individuals, highlighting the potential diversity of brain plasticity and the importance of harnessing individual differences when tailoring rehabilitation approaches for vision loss.
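
The identity-decoding result can be illustrated with a connectome-fingerprinting-style sketch (data fabricated; not the authors' pipeline): correlate each participant's later connectivity vector against all baseline vectors and take the best match:

```python
# Fingerprinting sketch: identify a participant by correlating their
# year-2 connectivity vector against all year-0 vectors.
import numpy as np

rng = np.random.default_rng(6)
n_sub, n_edges = 20, 300
stable = rng.normal(0, 1, (n_sub, n_edges))          # stable individual pattern
year0 = stable + rng.normal(0, 0.5, (n_sub, n_edges))
year2 = stable + rng.normal(0, 0.5, (n_sub, n_edges))

corr = np.corrcoef(year2, year0)[:n_sub, n_sub:]     # year2 x year0 similarity
hits = (corr.argmax(axis=1) == np.arange(n_sub)).mean()
print(f"identification accuracy across 2 years: {hits:.0%}")
```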

Talk 6

Learning to see again: The role of perceptual learning and user engagement in sight restoration

Michael Beyeler1; 1University of California, Santa Barbara

Retinal and cortical implants show potential for restoring a rudimentary form of vision to people living with profound blindness, but the visual sensations (“phosphenes”) produced by current devices often seem unnatural or distorted. Consequently, whether some functional vision is successfully regained depends critically on the ability of implant users to learn to make use of this artificial vision. In this talk, I will discuss recent work detailing the potential and limitations of perceptual learning in helping implant users learn to see again. Although the abilities of visual implant users tend to improve with training, there is little evidence that this is because distortions become less perceptually apparent; instead, improvement may reflect better interpretation of the distorted input. Unlike those with natural vision, implant recipients must accommodate various visual anomalies, such as inconsistent spatial distortions and phosphene fading. Furthermore, perceptual measures such as grating acuity and motion discrimination, which are often used with the intention of objectively assessing visual function, may be modulated by gamification, highlighting the importance of user engagement in basic psychophysical tasks. Gamification may be particularly effective at engaging reward systems in the brain, potentially fostering greater plasticity through more varied stimuli and active attentional engagement. However, the effectiveness of such gamified approaches varies, suggesting a need for personalized strategies in visual rehabilitation.

Attention: accept, reject, or major revisions?

Symposium: Friday, May 17, 2024, 2:30 – 4:30 pm, Talk Room 2

Organizers: Alon Zivony1; 1University of Sheffield
Presenters: Britt Anderson, Ruth Rosenholtz, Wayne Wu, Sarah Shomstein, Alon Zivony

Is attention research in crisis? After more than a century, we have come full circle from the intuition that “everybody knows what attention is” (James, 1890) to the conclusion that “nobody knows what attention is” (Hommel et al., 2019). It has been suggested that attention is an incoherent and sterile concept, or one unsuitable for scientific research. And yet, attention research continues as strongly as ever, with little response to these critiques. Is the field ignoring glaring theoretical problems, or does the current conception of attention merely require some revisions? In this symposium, our speakers bring different perspectives to bear on this critical question. Rather than merely raising issues with the concept of attention, each also suggests practical and theoretical solutions, which can hopefully inform future research. Each speaker will present either a critique or a defence of the concept of attention, and suggest whether attention should be abandoned, kept as is, or redefined. Our first two speakers will argue that scientists may be better off without the concept of attention. Britt Anderson will criticize the use of attention as an explanation of observed phenomena. He will suggest that the common usage is non-scientific and results in circular logic. He offers in its place an attention-free account of so-called attention effects. Ruth Rosenholtz argues that recent work, for example on peripheral vision, calls into question many of the basic tenets of attention theory. She will talk about her year of banning ‘attention’ in her lab in order to rethink attention from the ground up. The second group of speakers will question the common understanding of attention but will argue in favour of it as a scientific concept. Wayne Wu will suggest that our shared methodology of studying attention commits us to the Jamesian functional conceptualization of attention. He will argue that attention can and should be retained if we locate it at the right level of analysis in cognitive explanation. Sarah Shomstein will discuss “attentional platypuses”, empirical anomalies that do not fit into current attention research. These anomalies reveal the need for a new way of thinking about attention. Alon Zivony will argue that many of the conceptual problems with attention stem from the standard view that equates attention with selection. Moving away from this definition will allow us to retain attention but will also require a change in our thinking. Each talk will conclude with a take-home message about what attention is and isn’t, a verdict on whether it should be abandoned or retained, and suggestions for how their understanding of attention can be applied in future research. We will conclude with a panel discussion.

Talk 1

Attention: Idol of the Tribe

Britt Anderson1; 1Dept of Psychology and Centre for Theoretical Neuroscience, University of Waterloo

The term ‘attention’ has been a drag on our science ever since the early days of experimental psychology. Our frequent offerings and sacrifices (articles and the debates they provoke), and our unwillingness to abandon our belief in this reified entity, indicate the aptness of the Jamesian phrase “idol of the tribe.” While causal accounts of attention are empty, attention might be, as Hebb suggested, a useful label. It could be used to indicate that some experimental observable is not immediately explained by the excitation of receptor cells. However, labeling something as ‘attention’ means there is something to be explained, not that something has been explained. The experimental manipulations commonly used to provoke visual selective attention (instructions, cues, and reward) are in fact the guide to explaining ‘attention’ away. Such manipulations frequently induce behavioral performance differences that are not explainable in terms of differences in retinal stimulation. They are economically summarized as components of a process in which base rates, evidence, value, and plausibility combine to determine perceptual experience. After briefly reviewing the history of how attention has been confusing from the start, I will summarize the notion of conceptual fragmentation and show how it applies. I will then review how the traditional conditions of an attentional experiment provide the basis for a superior, attention-free account of the phenomena of interest, and I will present some of the opportunities for using more formal descriptions that should lead to better theoretically motivated experimental investigations.

Talk 2

Attention in Crisis

Ruth Rosenholtz1; 1NVIDIA Research

Recent research on peripheral vision has led to a paradigm-shifting conclusion: that vision science as a field must rethink the concept of visual attention. Research has uncovered significant anomalies not explained by existing theories, and some methods for studying attention may instead have uncovered mechanisms of peripheral vision. Nor can a summary statistic representation in peripheral vision solve these problems on its own. A year of banning “attention” in my lab allowed us to rethink attention from the ground up; this talk will conclude with some of the resulting insights.

Talk 3

Attention Unified

Wayne Wu1; 1Department of Philosophy and Neuroscience Institute, Carnegie Mellon University

For over a century, scientists have expressed deep misgivings about attention. A layperson would find this puzzling, for they know what attention is as well as those with sight know what seeing is. People visually attend all the time. Attention is real, we know what it is, and we can explain it. I shall argue that the problem of attention concerns the conceptual and logical structure of the scientific theory of attention. Because of shared methodology, we are committed to a single functional conception of attention, what William James articulated long ago. I show how this shared conception provides a principle of unification that links empirical work. To illustrate this, I show how two cueing paradigms tied to “external” and “internal” attention, spatial cueing and retro-cueing, are instances of the same kind of attention. Against common skepticism, I demonstrate that we are all committed to the existence of attention as a target of explanation. Yet in step with the skeptic, I show that attention is not an explainer in the sense that it is not a neural mechanism. Locating attention at the right level of analysis in cognitive explanation is key to understanding what it is and how science has made massive progress in understanding it.

Talk 4

What does a platypus have to do with attention?

Sarah Shomstein1; 1Department of Psychological and Brain Sciences, George Washington University

Decades of research on the mechanisms of attentional selection have focused on identifying the units (representations) on which attention operates to guide prioritized sensory processing. These attentional units fit neatly within our understanding of how attention is allocated in a top-down, bottom-up, or history-driven fashion. In this talk, I will focus on attentional phenomena that are not easily accommodated within current theories of attentional selection. We call these phenomena attentional platypuses, by analogy with the observation that, within biological taxonomies, the platypus fits into neither the mammal nor the bird category. Similarly, attentional phenomena that do not fit neatly within current attentional models suggest that those models need revision. I will present a few instances of these ‘attentional platypuses’ and then offer a new approach, which we term Dynamically Weighted Prioritization, stipulating that multiple factors impinge on the attentional priority map, each with a corresponding weight. The interaction between factors and their corresponding weights determines the current state of the priority map, which subsequently constrains and guides attention allocation. We propose that this new approach be considered a supplement to existing models of attention, especially those that emphasize categorical organizations.
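
As a purely illustrative sketch of the Dynamically Weighted Prioritization idea (our own reading, not the authors’ implementation; all names are hypothetical), the priority map can be modelled as a weighted sum of factor maps whose weights change from moment to moment:

import numpy as np

def priority_map(factor_maps, weights):
    # factor_maps: list of (H, W) salience maps, one per factor
    # (top-down goals, bottom-up salience, selection history, reward, ...).
    # weights: one scalar per factor, reflecting its current influence.
    maps = np.stack(factor_maps)                # (n_factors, H, W)
    w = np.asarray(weights)[:, None, None]      # broadcast each weight over space
    return (w * maps).sum(axis=0)               # current state of the priority map

# Attention is then guided toward the map's current peak, e.g.:
# y, x = np.unravel_index(p.argmax(), p.shape)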

Talk 5

It’s time to redefine attention

Alon Zivony1; 1Department of Psychology, University of Sheffield

Many models of attention assume that attentional selection takes place at a specific moment in time which demarcates the critical transition from pre-attentive to attentive processing of sensory inputs. In this talk, I will argue that this intuitively appealing assumption is not only incorrect, but it is also the reason behind the conceptual confusion about what attention is, and how it should be understood in psychological science. As an alternative, I will offer a “diachronic” framework that views attention as a modulatory process that unfolds over time, in tandem with perceptual processing. This framework breaks down the false dichotomy between pre-attentive and attentive processing, and as such, offers new solutions to old problems in attention research (the early vs. late selection debate). More importantly, by situating attention within a broader context of selectivity in the brain, the diachronic account can provide a unified and conceptually coherent account of attention. This will allow us to keep the concept of attention but will also require serious rethinking about how we use attention as a scientific concept.

The temporal evolution of visual perception

Symposium: Friday, May 17, 2024, 2:30 – 4:30 pm, Talk Room 1

Organizers: Lina Teichmann1, Chris Baker1; 1Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, USA
Presenters: Lina Teichmann, Iris I. A. Groen, Diana Dima, Tijl Grootswagers, Rachel Denison

The human visual system dynamically processes input over the course of a few hundred milliseconds to generate our perceptual experience. Capturing the dynamic aspects of the neural response is therefore imperative for understanding visual perception. By bringing together five speakers who use a diverse set of methods and approaches, the symposium aims to elucidate the temporal evolution of visual perception from different angles. All five speakers (four female) are early-career researchers based in Europe, Australia, the US, and Canada. Speakers will be allotted 18 minutes of presentation time plus 5 minutes of questions after each talk. In contrast to much current neuroimaging work, the symposium talks will focus on temporal dynamics rather than localization. Collectively, the work presented will demonstrate that the complex and dynamic nature of visual perception requires data that matches its temporal granularity. In the first talk, Lina Teichmann will present data from a large-scale study focusing on how individual colour-space geometries unfold in the human brain. Linking densely-sampled MEG data with psychophysics, her work on colour provides a test case for studying the subjective nature of visual perception. Iris Groen will discuss findings from intracranial EEG studies that characterize neural responses across the visual hierarchy. Applying computational models, her work provides fundamental insights into how the visual response unfolds over time across visual cortex. Diana Dima will speak about how responses evoked by observed social interactions are processed in the brain. Using temporally-resolved EEG data, her research shows how visual information is modulated from perception to cognition. Tijl Grootswagers will present studies investigating visual object processing. Using rapid series of object stimuli and linking EEG and behavioural data, his work shows the speed and efficiency with which the visual system makes sense of the things we see. To conclude, Rachel Denison will provide insights into how we employ attentional mechanisms to prioritize relevant visual input at the right time. Using MEG data, she will highlight how temporal attention affects the dynamics of evoked visual responses. Overall, the symposium aims to shed light on the dynamic nature of visual processing at all levels of the visual hierarchy. It will be a chance to discuss the benefits and challenges of different methodologies that allow us to gain a comprehensive insight into the temporal aspects of visual perception.

Talk 1

The temporal dynamics of individual colour-space geometries in the human brain

Lina Teichmann1, Ka Chun Lam2, Danny Garside3, Amaia Benitez-Andonegui4, Sebastian Montesinos1, Francisco Pereira2, Bevil Conway3,5, Chris Baker1,5; 1Laboratory of Brain and Cognition, National Institute of Mental Health, Bethesda, USA, 2Machine Learning Team, National Institute of Mental Health, Bethesda, USA, 3Laboratory of Sensorimotor Research, National Eye Institute, Bethesda, USA, 4MEG Core Facility, National Institute of Mental Health, Bethesda, USA, 5equal contribution

We often assume that other people see the world the way we do, as we can effectively communicate how things look. However, colour perception is one aspect of vision that varies widely among individuals, as shown by differences in colour discrimination, colour constancy, colour appearance, and colour naming. Further, the neural response to colour is dynamic and varies over time. Many attempts have been made to construct formal, uniform colour spaces that aim to capture universally valid similarity relationships, but there are discrepancies between these models and individual perception. Combining magnetoencephalography (MEG) and psychophysical data, we examined the extent to which these discrepancies can be accounted for by the geometry of the neural representation of colour and its evolution over time. In particular, we used a dense-sampling approach and collected neural responses to hundreds of colours to reconstruct individual fine-grained colour-space geometries from neural signals with millisecond accuracy. In addition, we collected large-scale behavioural data to assess perceived similarity relationships between different colours for every participant. Using a computational modelling approach, we extracted similarity embeddings from the behavioural data to model the neural signal directly. We find that colour information is present in the neural signal from approximately 70 ms onwards, but that neural colour-space geometries unfold non-uniformly over time. These findings highlight the gap between theoretical colour spaces and colour perception, and they represent a novel avenue for gaining insights into the subjective nature of perception.
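
One standard way to recover a neural colour-space geometry of this kind (a minimal sketch under our own assumptions about data layout, not necessarily the authors’ pipeline) is to compute pairwise dissimilarities between colour-evoked MEG patterns at each time point and embed them with multidimensional scaling:

import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS

def neural_colour_geometry(patterns, seed=0):
    # patterns: (n_colours, n_sensors) MEG response patterns at one time
    # point; looping over time points traces the geometry's evolution.
    rdm = squareform(pdist(patterns, metric='correlation'))  # dissimilarity matrix
    mds = MDS(n_components=2, dissimilarity='precomputed', random_state=seed)
    return mds.fit_transform(rdm)  # 2-D embedding whose inter-point distances
                                   # approximate the neural colour geometry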

Talk 2

Delayed divisive normalisation accounts for a wide range of temporal dynamics of neural responses in human visual cortex

Iris I. A. Groen1, Amber Brands1, Giovanni Piantoni2, Stephanie Montenegro3, Adeen Flinker3, Sasha Devore3, Orrin Devinsky3, Werner Doyle3, Patricia Dugan3, Daniel Friedman3, Nick Ramsey2, Natalia Petridou2, Jonathan Winawer4; 1Informatics Institute, University of Amsterdam, Amsterdam, Netherlands, 2University Medical Center Utrecht, Utrecht, Netherlands, 3New York University Grossman School of Medicine, New York, NY, USA, 4Department of Psychology and Center for Neural Science, New York University, New York, NY, USA

Neural responses in visual cortex exhibit various complex, non-linear temporal dynamics. Even for simple static stimuli, responses decrease when a stimulus is prolonged in time (adaptation), are reduced for repeated stimuli (repetition suppression), and rise more slowly for low-contrast stimuli (slow dynamics). These dynamics also vary with the location in the visual hierarchy (e.g., lower vs. higher visual areas) and the type of stimulus (e.g., contrast patterns vs. real-world object, scene, and face categories). In this talk, I will present two intracranial EEG (iEEG) datasets in which we quantified and modelled the temporal dynamics of neural responses across the visual cortex at millisecond resolution. Our work shows that many aspects of these dynamics are accurately captured by a delayed divisive normalisation model in which neural responses are normalised by recent activation history. I will highlight how fitting this model to the iEEG data unifies multiple disparate temporal phenomena in a single computational framework, revealing systematic differences in the temporal dynamics of neural population responses across the human visual hierarchy. Overall, these findings suggest a pervasive role for history-dependent delayed divisive normalisation in shaping neural response dynamics across the cortical visual hierarchy.
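
Schematically, and with symbols of our own choosing (the exact parameterisation in the talk may differ), a delayed divisive normalisation model of this kind can be written as

\[
r(t) \;=\; \frac{L(t)^{\,n}}{\sigma^{n} + \left[(L \ast h)(t)\right]^{\,n}},
\qquad h(t) \;=\; \frac{1}{\tau}\, e^{-t/\tau} \quad (t \ge 0),
\]

where \(L(t)\) is the linear stimulus drive, \(h\) is a low-pass filter whose convolution with \(L\) makes the normalisation pool track recent activation history, \(\sigma\) is a semi-saturation constant, and \(n\) is an exponent. Because the denominator lags the numerator, such a model produces transient onset responses, adaptation to prolonged or repeated stimuli, and a slower rise at low contrast.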

Talk 3

How natural action perception unfolds in the brain

Diana Dima1, Yalda Mohsenzadeh1; 1Western University, London, ON, Canada

In a fraction of a second, humans can recognize a wide range of actions performed by others. Yet actions pose a unique complexity challenge, bridging visual domains and varying along multiple perceptual and semantic features. What features are extracted in the brain when we view others’ actions, and how are they processed over time? I will present electroencephalography work using natural videos of human actions and rich feature sets to determine the temporal sequence of action perception in the brain. Our work shows that action features, from visual to semantic, are extracted along a temporal gradient, and that different processing stages can be dissociated with artificial neural network models. Furthermore, using a multimodal approach with video and text stimuli, we show how conceptual action representations emerge in the brain. Overall, these data reveal the rapid computations underlying action perception in natural settings. The talk will highlight how a temporally resolved approach to natural vision can uncover the neural computations linking perception and cognition.

Talk 4

Decoding rapid object representations

Tijl Grootswagers1, Amanda K. Robinson2; 1The MARCS Institute for Brain, Behaviour and Development, School of Computer, Data and Mathematical Sciences, Western Sydney University, Sydney, NSW, Australia, 2Queensland Brain Institute, The University of Queensland, Brisbane, QLD, Australia

Humans recognise objects extremely quickly and reliably. Information about objects and object categories emerges within 200 milliseconds in the human visual system, even under difficult conditions such as occlusion or low visibility. These neural representations can be highly complex and multidimensional, despite relying on limited visual information. Understanding emerging object representations necessitates time-resolved neuroimaging methods with millisecond precision, such as EEG and MEG. Recent time-resolved neuroimaging work has used decoding methods in rapid serial visual presentation designs to show that information about multiple sequentially presented objects is robustly encoded by the brain. This talk will highlight recent research on the time course of object representations in rapid image sequences, focusing on three key findings: (1) Object representations are highly automatic, with robust representations emerging even with fast-changing visual input. (2) Emerging object representations are highly robust to changes in context and task, suggesting a strong reliance on feedforward processes. (3) Object representational structures are highly consistent across individuals, to the extent that neural representations are predictive of independent behavioural judgments on a variety of tasks. Together, these findings suggest that the first sweep of information through the visual system contains highly robust information that is readily available for read-out in behavioural decisions.
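
The time-resolved decoding approach behind such findings can be sketched as follows (a minimal illustration under our own assumptions about data layout; not the authors’ code):

import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def decode_timecourse(epochs, labels, n_folds=5):
    # epochs: (n_trials, n_channels, n_times) EEG/MEG data;
    # labels: (n_trials,) object or category label per trial.
    n_times = epochs.shape[2]
    accuracy = np.empty(n_times)
    for t in range(n_times):
        X = epochs[:, :, t]  # sensor pattern at one time point
        accuracy[t] = cross_val_score(LinearDiscriminantAnalysis(),
                                      X, labels, cv=n_folds).mean()
    return accuracy  # above-chance values trace when object information emerges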

Talk 5

Isolating neural mechanisms of voluntary temporal attention

Rachel Denison1,2, Karen Tian1,2, Jiating Zhu1, David Heeger2, Marisa Carrasco2; 1Boston University, Department of Psychological and Brain Sciences, USA, 2New York University, Department of Psychology and Center for Neural Science, USA

To handle the continuous influx of visual information, temporal attention prioritizes processing at task-relevant moments in time. We first introduce a probabilistic framework that clarifies the conceptual distinction, and formal relation, between temporal attention, linked to timing relevance, and temporal expectation, linked to timing predictability. Next, we present two MEG studies in which we manipulated temporal attention while keeping expectation constant, allowing us to isolate neural mechanisms specific to voluntary temporal attention. Participants were cued to attend to one of two sequential grating targets with predictable timing, separated by a 300 ms SOA. The first study used time-resolved steady-state visual evoked responses (SSVER) to investigate how temporal attention modulates anticipatory visual activity. In the pre-target period, visual activity (measured with a background SSVER probe) steadily ramped up as the targets approached, reflecting temporal expectation. Furthermore, we found a low-frequency modulation of visual activity which shifted approximately 180 degrees in phase according to which target was attended. The second study used time-resolved decoding and source reconstruction to examine how temporal attention affects dynamic target representations. Temporal attention to the first target enhanced its orientation representation within a left fronto-cingulate region ~250 ms after stimulus onset, perhaps protecting it from interference from the second target within visual cortex. Together these studies reveal how voluntary temporal attention flexibly shapes pre-target periodic dynamics and the post-target routing of stimulus information to select a task-relevant stimulus within a sequence.
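
A time-resolved SSVER amplitude of the kind described can be extracted, for instance, by correlating the sensor signal with a complex exponential at the probe frequency inside a sliding window (our illustration; names and window length are hypothetical, not taken from the study):

import numpy as np

def ssver_amplitude(signal, sfreq, f_probe, win_s=0.5):
    # signal: 1-D sensor time series; sfreq: sampling rate (Hz);
    # f_probe: SSVER probe frequency (Hz); win_s: window length (s).
    n_win = int(win_s * sfreq)
    t = np.arange(n_win) / sfreq
    carrier = np.exp(-2j * np.pi * f_probe * t)  # complex exponential probe
    amp = np.empty(len(signal) - n_win)
    for i in range(len(amp)):
        amp[i] = np.abs(signal[i:i + n_win] @ carrier) / n_win
    return amp  # slow modulations of this envelope reflect anticipatory dynamics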

Large-scale visual neural datasets: where do we go from here?

Symposium: Friday, May 17, 2024, 12:00 – 2:00 pm, Talk Room 2

Organizers: Alessandro Gifford1, Kendrick Kay2; 1Freie Universität Berlin, 2University of Minnesota
Presenters: Eline R. Kupers, Won Mok Shim, Ian Charest, Tomas Knapen, Jacob Prince, Alessandro T. Gifford

Vision science has witnessed an increase in worldwide initiatives collecting and publicly releasing large-scale visual neural datasets (LSVNDs). These initiatives have allowed thousands of vision scientists to readily harness LSVNDs, enabling new investigations and resulting in novel discoveries. This suggests vision science is entering a new era of inquiry characterized by big open data. The rapid growth in the collection and use of LSVNDs raises urgent questions, the answers to which will steer the direction of the field. How can different researchers across the vision sciences spectrum benefit from these datasets? What are the opportunities and pitfalls of LSVNDs for theory formation? Which kinds of LSVNDs are missing, and what characteristics should future LSVNDs have to maximize their impact and utility? How can LSVNDs support a virtuous cycle between neuroscience and artificial intelligence? This symposium invites the VSS community to engage these questions in an interactive and guided community process. We will start with a short introduction (5 minutes), followed by six brief, thought-provoking talks (each 9 minutes plus 3 minutes for Q&A). Enriched by these perspectives, the symposium will then move to a highly interactive 30-minute discussion in which we will engage the audience to discuss the most salient open questions on LSVNDs, generate and share insights, and foster new collaborations. Speakers from diverse career stages will cover a broad range of perspectives on LSVNDs, including dataset creators (Kupers, Shim), dataset users (Prince), and researchers playing both roles (Gifford, Charest, Knapen). Eline Kupers will share behind-the-scenes knowledge of a particular LSVND that has received substantial traction in the field, the Natural Scenes Dataset (NSD), and will introduce ongoing efforts toward a new large-scale multi-task fMRI dataset called the Visual Cognition Dataset. Won Mok Shim will introduce the Naturalistic Perception Action and Cognition (NatPAC) 7T fMRI dataset, and discuss how this dataset allows investigation of the impact of goal-directed actions on visual representations in naturalistic settings. Ian Charest will present recent results on semantic representations enabled by NSD, as well as ongoing large-scale data collection efforts inspired by NSD. Tomas Knapen will demonstrate how combining LSVNDs with other datasets incites exploration and discovery, and will present ongoing large-scale data collection efforts in his group. Jacob Prince will provide a first-hand perspective on how researchers external to the data collection process can apply LSVNDs to diverse research aims across cognitive neuroscience, neuroAI, and neuroimaging methods development. Finally, Ale Gifford will highlight the broad opportunities that LSVNDs offer to the vision sciences community, and present a vision for the future of large-scale datasets. This symposium will be of interest to any VSS member who works with neural data, as it will expose the opportunities and limitations of LSVNDs and how they relate to smaller, more narrowly focused datasets. Our goal is to align the VSS community with respect to open questions regarding LSVNDs, and to help incentivize and coordinate new large-scale data collection efforts. We believe this symposium will strengthen the impact of LSVNDs on the field of vision science, and foster a new generation of big-data vision scientists.

Talk 1

The Natural Scenes Dataset: Lessons Learned and What’s Next?

Eline R. Kupers1,2, Celia Durkin2, Clayton E Curtis3, Harvey Huang4, Dora Hermes4, Thomas Naselaris2, Kendrick Kay2; 1Stanford University, 2University of Minnesota, 3New York University, 4Mayo Clinic

Release and reuse of rich neuroimaging datasets have rapidly grown in popularity, enabling researchers to ask new questions about visual processing and to benchmark computational models. One heavily used dataset is the Natural Scenes Dataset (NSD), a 7T fMRI dataset in which 8 subjects viewed more than 70,000 images over the course of a year. Since its release in September 2021, NSD has gained 1700+ users and resulted in 55+ papers and preprints. Here, we share behind-the-scenes considerations and inside knowledge from the NSD acquisition effort that helped ensure its quality and impact. This includes lessons learned regarding funding, designing, collecting, and releasing a large-scale fMRI dataset. Complementing the creator’s perspective, we also highlight the user’s viewpoint by presenting results from a large anonymous survey distributed among NSD users. These results provide valuable (and often unspoken) insights into both positive and negative experiences of interacting with NSD and other publicly available datasets. Finally, we discuss ongoing efforts towards two new large-scale datasets: (i) NSD-iEEG, an intracranial electroencephalography dataset with extensive electrode coverage in cortex and subcortex using a paradigm similar to NSD’s, and (ii) the Visual Cognition Dataset, a 7T fMRI dataset that samples a large diversity of tasks on a common set of visual stimuli (in contrast to NSD, which samples a large diversity of stimuli during a single task). By sharing these lessons and ideas, we hope to facilitate new data collection efforts and enhance the ability of these datasets to support new discoveries in vision and cognition.

Talk 2

Exploring naturalistic vision in action with the 7T Naturalistic Perception, Action, and Cognition (NatPAC) Dataset

Won Mok Shim1,2, Royoung Kim1,2, Jiwoong Park1,2; 1Institute of Basic Science, Republic of Korea, 2Sungkyunkwan University

Large-scale human neuroimaging datasets have provided invaluable opportunities to examine brain and cognitive functions. Our recent endeavor, the 7T NatPAC project, is designed to provide high-resolution human MRI structural and functional datasets using moderately dense sampling (12–16 2-hr sessions per subject) across a broad range of tasks. While previous large-scale datasets have featured sparse sampling of cognitive functions, our goal is to encompass a more extensive spectrum of cognitive and affective processes through diverse tasks, spanning both structured and naturalistic paradigms. Notably, we incorporated naturalistic tasks to probe a variety of higher-order cognitive functions, including watching movies, speaking freely, and playing an interactive 3D video game within a Minecraft environment. Through a collection of innovative Minecraft-based games simulating real-world behaviors, we aim to investigate the neural mechanisms of perception, action, and cognition as an integrative process that unfolds in naturalistic contexts. In this talk, I will focus on a shepherding game, in which participants engage in strategic planning with hierarchical subgoals and adaptively update their strategies while navigating a virtual world. In combination with high-precision eye-tracking data corrected for head motion, we explore how visual responses, including population receptive field (pRF) properties, are modulated in the visual cortex and frontoparietal regions during free viewing and complex goal-directed behaviors, compared to passive viewing of game replays and conventional pRF experiments. I will discuss the broader implications of the impact of goal-directed actions on visual representations and how large-scale datasets enable us to examine such effects in naturalistic settings.

Talk 3

Exploiting large-scale neuroimaging datasets to reveal novel insights in vision science

Ian Charest1,2, Peter Brotherwood1, Catherine Landry1, Jasper van den Bosch1, Shahab Bakhtiari1,2, Tim Kietzmann3, Frédéric Gosselin1, Adrien Doerig3; 1Université de Montréal, 2Mila – Québec AI Institute, 3University of Osnabrück

Building quantitative models of neural activity in the visual system is a long-standing goal in neuroscience. This research program has been fundamentally limited by the small scale and low signal-to-noise ratio of most existing datasets; with the advent of large-scale datasets, it has become possible to build, test, and discriminate among increasingly expressive competing models of neural representation. In this talk I will describe how the scale of the 7T fMRI Natural Scenes Dataset (NSD) has made possible novel insights into the mechanisms underlying scene perception. We harnessed recent advancements in linguistic artificial intelligence to construct models that capture progressively richer semantic information, ranging from object categories to word embeddings to scene captions. Our findings reveal a positive correlation between a model’s capacity to capture semantic information and its ability to predict NSD data, a finding we then replicated with recurrent convolutional networks trained to predict sentence embeddings from visual inputs. This collective evidence suggests that the visual system, as a whole, is better characterized by an aim to extract rich semantic information than by the mere cataloguing of object inventories from visual inputs. Considering the substantial power of NSD, collecting additional neuroimaging and behavioral data using the same image set becomes highly appealing. We are expanding NSD through the development of two innovative datasets: an electroencephalography dataset called NSD-EEG, and a mental imagery vividness ratings dataset called NSD-Vividness. Datasets like NSD not only provide fresh insights into the visual system but also inspire the development of new datasets in the field.

Talk 4

Farewell to the explore-exploit trade-off in large-scale datasets

Tomas Knapen1,2, Nick Hedger3, Thomas Naselaris4, Shufan Zhang1,2, Martin Hebart5,6; 1Vrije Universiteit, 2Royal Dutch Academy of Arts and Sciences, 3University of Reading, 4University of Minnesota, 5Justus Liebig University, 6Max Planck Institute for Human Cognitive and Brain Sciences

LSVNDs are a powerful tool for discovery science. Because they are suited to exploration, large datasets synergize well with smaller, more exploitative datasets focused on hypothesis testing, which can confirm exploratory findings. Similar synergy can be attained by combining findings across datasets, where one LSVND is used to confirm and extend discoveries from another. I will showcase how we have recently leveraged several large-scale datasets in unison to discover principles of topographic visual processing throughout the brain. These examples demonstrate how LSVNDs can be used to great effect, especially in combination. In our most recent example, we combined the HCP 7T fMRI dataset (a “wide” dataset with 180 participants and 2.5 hours of whole-brain fMRI each) with NSD (a “deep” dataset with 8 participants and 40 hours of whole-brain fMRI each) to investigate visual body-part selectivity. We discovered homuncular maps in high-level visual cortex through connectivity with primary somatosensory cortex in HCP, and validated the body-part tuning of these maps using NSD. This integration of wide and deep LSVNDs allows inference about computational mechanisms at both the individual and population levels. For this reason, we believe the field needs a variety of LSVNDs. I will briefly present ongoing work in my lab collecting new ‘deep’ LSVND contributions: a dataset of brief (2.5-s) video watching and a retinotopic mapping dataset, each with up to 10 sessions of 7T fMRI in 8 subjects.

Talk 5

Large datasets: a Swiss Army knife for diverse research aims in neuroAI

Jacob Prince1, Colin Conwell2, Talia Konkle1; 1Harvard University, 2Johns Hopkins University

This talk provides a first-hand perspective on how users external to the data collection process can harness LSVNDs as foundation datasets for their research aims. We first highlight recent evidence that these datasets help address, and move beyond, longstanding debates in cognitive neuroscience, such as the nature of category-selective regions and the visual category code more broadly. We will show evidence that datasets like NSD have provided powerful new insights into how items from well-studied domains (faces, scenes) are represented in the context of broader representational spaces for objects. Second, we will highlight the potential of LSVNDs to answer urgent, emergent questions in neuroAI: for example, which inductive biases are critical for obtaining a good neural network model of the human visual system? We will describe a series of controlled experiments leveraging hundreds of open-source DNNs, systematically varying inductive biases to reveal the factors that most directly impact brain predictivity at scale. Finally, for users interested in neuroimaging methods development, we will highlight how the existence of these datasets has catalyzed rapid progress in methods for fMRI signal estimation and denoising, as well as in basic analysis routines such as PCA and the computation of noise ceilings. We will conclude by reflecting on both the joys and the pain points of working with LSVNDs, to help inform the next generation of these datasets.
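
As one concrete example of such a basic analysis routine, a split-half noise ceiling can be estimated from repeated presentations of the same stimuli (a minimal sketch under our own assumptions about data layout; not any particular package’s implementation):

import numpy as np

def split_half_noise_ceiling(rep1, rep2):
    # rep1, rep2: (n_stimuli, n_voxels) responses to the same stimuli on two
    # repetitions. Returns one reliability estimate per voxel.
    r = np.array([np.corrcoef(rep1[:, v], rep2[:, v])[0, 1]
                  for v in range(rep1.shape[1])])
    return 2 * r / (1 + r)  # Spearman-Brown: reliability of the two-rep average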

Talk 6

What opportunities do large-scale visual neural datasets offer to the vision sciences community?

Alessandro T. Gifford1, Benjamin Lahner2, Pablo Oyarzo1, Aude Oliva2, Gemma Roig3, Radoslaw M. Cichy1; 1Freie Universität Berlin, 2MIT, 3Goethe Universität Frankfurt

In this talk I will provide three complementary examples of the opportunities that LSVNDs offer to the vision sciences community. First, LSVNDs of naturalistic (thus more ecologically valid) visual stimulation allow the investigation of novel mechanisms of high-level visual cognition. We are extensively recording human fMRI and EEG responses for short naturalistic movie clips; modeling results reveal that semantic information such as action understanding or movie captions is embedded in neural representations. Second, LSVNDs contribute to the emerging field of NeuroAI, advancing research in vision sciences through a symbiotic relationship between visual neuroscience and computer vision. We recently collected a large and rich EEG dataset of neural responses to naturalistic images, using it on the one hand to train deep-learning-based end-to-end encoding models directly on brain data, thus aligning visual representations in models and the brain, and on the other hand to increase the robustness of computer vision models by exploiting inductive biases from neural visual representations. Third, LSVNDs make possible critical initiatives such as challenges and benchmarks. In 2019 we founded the Algonauts Project, a platform where scientists from different disciplines can cooperate and compete in creating the best predictive models of the visual brain, thus advancing the state-of-the-art in brain modeling as well as promoting cross-disciplinary interaction. I will end with some forward-looking thoughts on how LSVNDs might transform the vision sciences.
