Eye Movements: Mechanisms, naturalistic viewing

Talk Session

Talk 1, 7:00 pm

Saccade preparation concurrently degrades foveal performance and appearance

Nina M. Hanning1, Hyun Seo Lee2, Marisa Carrasco2; 1Humboldt-Universität zu Berlin, Department Psychologie, 2New York University, Department of Psychology & Center for Neural Science

When we prepare saccadic eye movements, attention shifts toward the upcoming fixation location, enhancing perceptual performance and appearance. Recent work shows that the saccade target benefit is accompanied by a cost at the current fixation, where overall perceptual performance is reduced shortly before we look away. It is unknown whether the foveal costs of saccade preparation are limited to performance or also extend to visual appearance. Here, we investigated the presaccadic time course of foveal performance and appearance using a two-by-two AFC task. Participants prepared horizontal saccades while viewing random dot kinematograms (RDKs) at the presaccadic center of gaze. During saccade preparation, a motion signal of varying coherence was briefly presented, and participants reported both its perceived direction (performance) and whether its coherence appeared weaker or stronger than a fixed reference signal presented earlier in the trial (appearance) with a single, combined response. Motion coherence was parametrically varied, and the timing of the motion pulse relative to the saccade cue was sampled across several presaccadic intervals. Psychometric functions were fitted separately for performance and appearance as a function of motion coherence and time. Consistent with previous results, performance at fixation deteriorated during saccade preparation: the coherence required to reach 80% correct accuracy increased progressively following the saccade cue, reflecting a growing performance cost. Appearance showed a closely matched pattern: the point of subjective equality (PSE) shifted systematically to higher coherence values over the course of saccade preparation, indicating a gradual reduction in perceived motion strength at fixation. Together, these findings show that saccade preparation produces parallel costs in both performance and appearance at the current center of gaze. Combined with established presaccadic benefits at the saccade target, the results support a redistribution account in which presaccadic attentional gains at the movement goal are mirrored by concurrent perceptual costs at fixation.
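
The abstract does not spell out the fitting procedure, so the following is a minimal sketch under assumed conventions (cumulative-Gaussian psychometric functions, hypothetical data values) of how the two readouts could be obtained: the coherence supporting 80% correct direction reports, and the PSE of the appearance judgments.

```python
# Sketch only: hypothetical data; the functional form is assumed, not taken from the paper.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def pf_performance(c, thresh, slope):
    # 2AFC direction accuracy rises from chance (0.5) toward 1 with coherence.
    return 0.5 + 0.5 * norm.cdf(c, loc=thresh, scale=slope)

def pf_appearance(c, pse, slope):
    # P("test appears stronger than reference"); the PSE is where p = 0.5.
    return norm.cdf(c, loc=pse, scale=slope)

coherence  = np.array([0.05, 0.10, 0.20, 0.40, 0.80])  # tested coherence levels
p_correct  = np.array([0.55, 0.62, 0.78, 0.91, 0.98])  # direction accuracy
p_stronger = np.array([0.08, 0.18, 0.46, 0.81, 0.97])  # comparison judgments

(thresh, slope_p), _ = curve_fit(pf_performance, coherence, p_correct, p0=[0.2, 0.2])
(pse, slope_a), _ = curve_fit(pf_appearance, coherence, p_stronger, p0=[0.2, 0.2])

# Coherence needed for 80% correct: 0.5 + 0.5 * Phi(z) = 0.8  =>  z = Phi^-1(0.6).
c80 = thresh + slope_p * norm.ppf(0.6)
print(f"80%-correct coherence: {c80:.3f}, PSE: {pse:.3f}")
```

Repeating such fits at each presaccadic interval would trace the reported time courses: a rising c80 (growing performance cost) alongside a rising PSE (reduced apparent coherence).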

This research was supported by a Marie Skłodowska-Curie individual fellowship from the European Commission (898520) to NMH.

Talk 2, 7:15 pm

Human Scene Viewing Fixation Patterns Emerge from Optimizing Scene Comprehension

Miguel P. Eckstein1, Shravan Murlidaran1; 1UC Santa Barbara

Introduction: Landmark findings of human eye movements during scene viewing include a first fixation at the center (center bias) and frequent fixations on faces, text, gazed-at objects, and objects critical to scene understanding. Here, we ask whether such signature human fixation patterns emerge from a strategy to optimize scene comprehension for an organism with foveated vision. We developed a foveated Visual-Language Model (VLM-FOV) whose fixation selection is optimized to comprehend scenes. Methods: Participants (N=20) viewed images of real-world scenes (M = 277, UCSB) and were instructed to describe the scenes while eye movements were recorded. VLM-FOV incorporates visual loss as a function of eccentricity from the fixation point, mimicking human foveation; we trained it with reinforcement learning on a 24K-image training set to maximize scene comprehension through optimal fixation selection. We compared VLM-FOV fixations to saliency-based model fixations (GBVS; Harel et al., 2006) and to a neural network trained on human eye movement data to predict test-image fixations (DeepGaze; Linardos et al., 2021). Results: Scene description accuracy for the UCSB image set was higher for VLM-FOV with RL-optimized eye movements than for fixation selection based on GBVS saliency or DeepGaze maps. Critically, VLM-FOV showed emergent fixation patterns resembling those of humans and DeepGaze: center bias for first fixations and frequent fixations on people, text, and objects critical to scene comprehension. Overall prediction of human fixations in the scene description task (UCSB images) was higher for VLM-FOV (AUC = 0.91) than for GBVS (AUC = 0.86; p < 0.001), and comparable to but somewhat below DeepGaze (AUC = 0.93; p < 0.001), as expected because DeepGaze (unlike VLM-FOV) is trained on human fixations. Conclusions: Our findings suggest that the landmark human fixation selection behaviors during scene viewing reflect an eye movement strategy to optimize scene comprehension.
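
The key architectural ingredient is the foveated front end. As a rough illustration (not the authors' implementation; the falloff function and the sigma_per_pixel and n_levels parameters below are assumptions), eccentricity-dependent visual loss can be approximated by blending progressively blurred copies of the input image according to distance from the fixation point:

```python
# Illustration only: all parameter values are assumptions, not the authors' settings.
import numpy as np
from scipy.ndimage import gaussian_filter

def foveate(img, fix_xy, sigma_per_pixel=0.01, n_levels=6):
    """Blend progressively blurred copies of img according to eccentricity."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    ecc = np.hypot(xs - fix_xy[0], ys - fix_xy[1])      # pixels from fixation
    target_sigma = sigma_per_pixel * ecc                # blur grows with eccentricity
    sigmas = np.linspace(0.0, target_sigma.max(), n_levels)
    levels = [gaussian_filter(img, s) if s > 0 else img for s in sigmas]
    # pick, per pixel, the blur level closest to the desired sigma
    idx = np.clip(np.searchsorted(sigmas, target_sigma), 0, n_levels - 1)
    out = np.zeros_like(img, dtype=float)
    for i, level in enumerate(levels):
        out[idx == i] = level[idx == i]
    return out

img = np.random.rand(256, 256)              # stand-in for a grayscale scene image
foveated = foveate(img, fix_xy=(128, 128))  # e.g., a first fixation at the center
```

The RL policy then selects the next fix_xy to maximize downstream description accuracy, which is where the emergent center bias and face/text preferences arise.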

Talk 3, 7:30 pm

Neural signatures of nonlinear information integration across fixations: evidence from recurrent neural network modelling of a large-scale MEG dataset

Carmen Amme1, Sushrut Thorat1, Philip Sulewski1,2, Malin Braatz1, Alexander Kroner1, Peter König1,3, Tim Kietzmann1; 1Institute of Cognitive Science, Osnabrück University, 2Vision and Computational Cognition Group, Max Planck Institute for Human Cognitive and Brain Sciences, 04103 Leipzig, Germany, 3Department of Neurophysiology and Pathophysiology, Center of Experimental Medicine, University Medical Center Hamburg-Eppendorf, 20251 Hamburg, Germany

The human visual system relies on rapid eye movements to sample different regions of the visual field. This fragmented information is likely integrated to facilitate successful interaction with the environment. Yet the search for neural signatures of information integration across eye movements has so far remained unsuccessful (Xiao et al., 2024). Here, we revisit this question by harnessing the power of a large-scale MEG-eyetracking dataset and advanced deep neural network (DNN) modelling. In our dataset, five participants freely viewed 4,080 natural scenes, yielding data for about 200,000 fixations. Using a linear encoding approach, which maps DNN embeddings of fixation patches onto fixation-locked neural activity, we compared models using only the current fixation's embedding with models incorporating embeddings from the preceding fixation. We tested whether local integration between successive fixations reflects (i) a linear remixing of past and present features or (ii) a nonlinear integration process, as modelled by the Glimpse Prediction Network (GPN; Thorat et al., 2025), a self-supervised recurrent model trained either (iia) to predict visual embeddings for the subsequent fixation or (iib) to predict semantic large language model (LLM) embeddings of the scene captions. Linear remixing of successive fixation embeddings did not enhance encoding performance after fixation onset. In contrast, the GPN's nonlinear integration did improve encoding performance when contextualised with the previous fixation: peak encoding performance, at around 120 ms after fixation onset, showed a 22.69% improvement (averaged across participants and five occipital magnetometers). This improvement was absent for the GPN variant trained to predict LLM embeddings of the scene captions, indicating that the training objective was critical for revealing integration in the brain. In contrast to prior attempts, our findings provide evidence for neural signatures of cross-fixation integration, suggesting that integration operates locally from fixation to fixation rather than converging toward a holistic scene representation.
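
The linear-remixing test can be pictured as a nested encoding-model comparison. Below is a minimal sketch with synthetic data; ridge regression is an assumption (the abstract specifies only a linear mapping), and all shapes and names are hypothetical.

```python
# Synthetic sketch: one sensor/timepoint; the real analysis repeats this per
# sensor and per time bin relative to fixation onset.
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_score

n_fix, n_feat = 5000, 512
emb = np.random.randn(n_fix, n_feat)   # DNN embeddings of fixation patches
meg = np.random.randn(n_fix)           # fixation-locked response at one sensor/time

X_current  = emb[1:]                           # current fixation only
X_combined = np.hstack([emb[1:], emb[:-1]])    # current + preceding fixation
y = meg[1:]                                    # first fixation has no predecessor

ridge = RidgeCV(alphas=np.logspace(-2, 4, 7))
r2_current  = cross_val_score(ridge, X_current,  y, cv=5, scoring="r2").mean()
r2_combined = cross_val_score(ridge, X_combined, y, cv=5, scoring="r2").mean()

# If r2_combined does not exceed r2_current, a purely linear remix of past and
# present features gains nothing (the abstract's null result). The nonlinear
# test instead passes both patches through the recurrent GPN before encoding.
print(r2_current, r2_combined)
```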

This work was supported by the European Research Council (ERC) Starting Grant #101039524 "TIME".

Talk 4, 7:45 pm

An orbitofrontal-anterior temporal network complements the attention network for visuospatial attention in naturalistic viewing

Stefan Pollmann1, Nico Marek1, Radha Meghanathan1; 1Otto-von-Guericke-University, Magdeburg, Germany

Human neuroimaging experiments have consistently found dorsal and ventral attention networks subserving spatial shifts of covert attention or eye movements. However, these experiments used abstract, artificial stimuli for better experimental control and gave clearly defined instructions about where to attend or what to search for. This is clearly not an adequate model of eye movement control in natural environments full of semantic and social information. Here, we investigated fMRI activation and simultaneously recorded eye movements during movie viewing, using the StudyForrest dataset. We correlated representational dissimilarity matrices of eye fixation patterns and of BOLD patterns. Searching for intraindividual changes in activation patterns between movie scenes that mirrored changes in fixation patterns, we identified the attention networks defined by previous experiments with artificial stimuli. In contrast, interindividual differences in fixation patterns, indicative of endogenous decisions about where to look, were mirrored by activation pattern differences in orbitofrontal and anterior temporal cortex. A follow-up analysis showed that this pattern was most pronounced for scenes containing faces. We were further able to replicate this pattern in two more datasets (Sherlock; 500 Days of Summer), estimating fixation patterns from the BOLD data. Thus, we could precisely replicate the role of the attention networks in gaze shifts in a more natural viewing environment. Crucially, we also showed that previous lab experiments, lacking the complexity of free viewing in a naturalistic environment, overlooked the importance of the orbitofrontal–anterior temporal network for individual differences in active social vision. Our findings show that the orbitofrontal–anterior temporal network needs to be considered in studies of active vision, particularly when looking at faces.
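
The core analysis logic, correlating fixation-pattern RDMs with BOLD-pattern RDMs across scenes, can be sketched as follows. Synthetic data; the correlation-distance and Spearman choices are common RSA defaults assumed here, not taken from the paper.

```python
# Hypothetical shapes: scenes x flattened fixation maps, scenes x voxel patterns.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

n_scenes = 60
fix_maps  = np.random.rand(n_scenes, 1024)  # flattened fixation density maps
bold_pats = np.random.rand(n_scenes, 2000)  # voxel patterns from one region

rdm_fix  = pdist(fix_maps,  metric="correlation")  # scene-by-scene dissimilarities
rdm_bold = pdist(bold_pats, metric="correlation")

rho, p = spearmanr(rdm_fix, rdm_bold)
print(f"fixation-BOLD RDM correlation: rho = {rho:.2f}, p = {p:.3g}")
# Run within a participant, this targets the intraindividual (attention network)
# effect; run on between-participant differences in fixation patterns, it
# targets the orbitofrontal / anterior temporal effect.
```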

Deutsche Forschungsgemeinschaft, PO 548/18-1

Talk 5, 8:00 pm

Local motion drives gaze stabilization in mice and humans

Felix Franke1, Federica Rosselli2; 1University of Basel, Basel, Switzerland, 2Institute of Molecular and Clinical Ophthalmology Basel (IOB), Basel, Switzerland

Reflexive gaze stabilization, mediated by the optokinetic reflex (OKR), is critical for maintaining visual acuity during self-motion. Traditionally, this behaviour is assumed to require the computation of global image motion (pattern motion) to accurately compensate for the visual environment's trajectory. However, the primary visual inputs to the nucleus of the optic tract (NOT), direction-selective retinal ganglion cells (DS-RGCs), encode only local edge motion and suffer from the aperture problem. Here, we combine in vivo electrophysiology in mouse NOT, ex vivo retinal recordings, computational modelling, and cross-species psychophysics in humans and mice to determine whether the gaze stabilization system solves this ambiguity or relies on a simpler heuristic. We show that mouse NOT neurons have horizontally elongated receptive fields that cluster along the visual horizon, acting as linear integrators of local retinal inputs. When challenged with ambiguous motion stimuli, in which the global motion direction diverges from the local edge vector, NOT neurons fail to encode the true global direction. Instead, their tuning shifts to align with the dominant local motion component, a result accurately predicted by a simple linear pooling model of DS-RGC inputs. Behaviourally, the mouse OKR mirrors this neural error, tracking the vector average of local edges rather than the object's true trajectory. Remarkably, human OKR exhibits a similar systematic bias, rotating away from the global pattern motion towards the dominant local edge motion. These findings demonstrate that mammalian gaze stabilization relies on a low-latency, subcortical heuristic (linear pooling of local edges) that is evolutionarily conserved from mice to humans.
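
The linear-pooling heuristic and its predicted bias can be captured in a few lines. In this toy version (illustrative orientations and velocity; the actual model pools DS-RGC inputs over elongated receptive fields), each edge contributes only the motion component normal to its orientation (the aperture problem), and the pooled estimate is their vector average:

```python
# Toy demonstration of the aperture problem and vector-average pooling.
import numpy as np

def normal_component(v, edge_orientation_deg):
    """Motion component perpendicular to an edge (all an aperture can see)."""
    theta = np.deg2rad(edge_orientation_deg)
    n = np.array([np.sin(theta), -np.cos(theta)])  # unit normal to the edge
    return np.dot(v, n) * n

def direction_deg(v):
    return np.degrees(np.arctan2(v[1], v[0]))

v_true = np.array([1.0, 0.0])                  # true (pattern) motion: rightward
edges = [20.0, 70.0]                           # local edge orientations (deg)
local_signals = [normal_component(v_true, o) for o in edges]

v_pooled = np.mean(local_signals, axis=0)      # linear pooling / vector average
print(f"true: {direction_deg(v_true):.1f} deg, pooled: {direction_deg(v_pooled):.1f} deg")
# With edge orientations asymmetric about the motion axis, the pooled estimate
# rotates away from the true direction, the bias seen in NOT tuning and OKR.
```

Recovering the true direction would require an intersection-of-constraints computation; the reported neural and behavioural errors are what linear pooling predicts instead.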

This work was financially supported by the Swiss National Science Foundation (SNSF) through multiple grants: Eccellenza grant PCEFP3_187001 (to F.F.), Spark grants CRSK-3_220987 (to F.F.) and CRSK-3_221257 (to F.B.R.), and Projects in Life Sciences grant 310030_220209 (to F.F.).

Talk 6, 8:15 pm

V4 neuronal activity in response to microsaccades

Shawn M. Willett1,2, J. Patrick Mayo1,2; 1University of Pittsburgh, 2Center for the Neural Basis of Cognition

Microsaccades are thought to improve the detection of visual stimuli. One way microsaccades might mediate stimulus visibility is by shifting the retinal image and thereby releasing photoreceptors from adaptation. This idea is supported by the modulation of ongoing activity in many visual areas by microsaccades. Yet the specific modulation seems to depend on the particulars of the study (e.g., the stimuli). Indeed, recent work indicates that microsaccade direction is a key parameter in determining the modulation across the visual hierarchy. However, how microsaccade-induced retinal changes affect neuronal activity remains unclear. In this study, we quantified the modulation of V4 activity in response to microsaccades while monkeys attended to visual targets. We recorded eye position while monkeys fixated during an orientation change detection task. Monkeys maintained fixation until the orientation of one of two simultaneously presented Gabor stimuli changed. Gabor stimuli were odd-symmetric, full-contrast, counterphased at 10 Hz, and presented in the lower left and right visual fields (4.4°–27° eccentricity). Eye position was measured using a scleral search coil sampled at 200 Hz. We detected microsaccades during the fixation epoch (500–5000 ms) using a 6°/s velocity threshold. We recorded the activity of over 3,500 V4 neurons from two bilaterally implanted Utah arrays across 54 sessions. We found that microsaccades exerted significant and complex modulations of ongoing V4 activity. On average, roughly 80 milliseconds after microsaccade onset there was a significant suppression of V4 activity, which lasted approximately 100 milliseconds. This suppression was followed by a large rebound enhancement in firing rates from 180 to 280 milliseconds after microsaccade onset. Importantly, we did not find an effect of microsaccade direction on the average V4 response. This work shows that intermediate visual areas like V4 are still affected by precise movements of the retina and may contribute to the improved visibility of stimuli after microsaccades.
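
The detection step (a 6°/s velocity criterion on 200 Hz coil data, as stated in the abstract) can be sketched as follows; the velocity computation, the synthetic eye trace, and the minimum-duration criterion are illustrative assumptions, not the authors' exact pipeline.

```python
# Velocity-threshold microsaccade detection on a synthetic 200 Hz eye trace.
import numpy as np

def detect_microsaccades(x_deg, y_deg, fs=200.0, vel_thresh=6.0, min_samples=2):
    """Return (onset, offset) sample indices where eye speed exceeds threshold."""
    vx = np.gradient(x_deg) * fs               # deg/s via central differences
    vy = np.gradient(y_deg) * fs
    speed = np.hypot(vx, vy)
    above = speed > vel_thresh
    edges = np.diff(above.astype(int))         # locate runs of supra-threshold samples
    onsets = np.where(edges == 1)[0] + 1
    offsets = np.where(edges == -1)[0] + 1
    if above[0]:
        onsets = np.r_[0, onsets]
    if above[-1]:
        offsets = np.r_[offsets, above.size]
    return [(a, b) for a, b in zip(onsets, offsets) if b - a >= min_samples]

t = np.arange(0, 1.0, 1 / 200.0)                       # 1 s of fixation at 200 Hz
ramp = np.clip((np.arange(t.size) - 100) / 4.0, 0, 1)  # 0.15 deg jump over 4 samples
x = 0.005 * np.random.randn(t.size).cumsum() + 0.15 * ramp
y = 0.005 * np.random.randn(t.size).cumsum()
print(detect_microsaccades(x, y))                      # finds the event near sample 100
```

Fixation-epoch firing rates aligned to the detected onsets then yield the suppression-rebound profile the abstract reports.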

Talk 7, 8:30 pm

A Unified Continuous Psychophysics Approach for Concurrent Assessment of Oculomotor Behavior and Perceptual Thresholds

Andrea Canessa1, Federica Ugoni1, Francesca Peveri1, Alice Sansalone1, Agostino Gibaldi2, Silvio P. Sabatini1; 1University of Genova, 2University of Modena and Reggio Emilia

Traditional clinical assessments of oculomotor and perceptual functions often rely on batteries of disjointed, repetitive tasks (e.g., fixed-amplitude saccades, simple motion patterns) that lack ecological validity and are prone to cognitive prediction strategies. We present a novel, unified perceptual and computational framework designed to simultaneously quantify saccadic, pursuit, and vergence performance alongside perceptual thresholds for contrast and stereo within a single, continuous tracking task. Methods: The paradigm uses a Gabor patch stimulus moving within a viewing frustum defined by Hering's binocular coordinates (version, vergence). Unlike standard deterministic trajectories, target motion is driven by a stochastic engine. This "blind" generation concurrently probes physiologically consistent smooth pursuit, fixation stability, and reactive saccades. The stochastic nature of the task suppresses anticipatory motor planning strategies, yielding robust assessment of vergence stability and fusional ranges, smooth pursuit gain and latency, and saccadic gain, latency, and main sequence. A multi-threaded adaptive engine (QUEST+) continuously modulates stimulus parameters (contrast, spatial frequency, binocular disparity) based on real-time gaze and vergence error, effectively closing the loop between behavioral performance and task difficulty. Results: The framework, tested on healthy participants, provides a reliable and accurate assessment of oculomotor and perceptual performance by analyzing the continuous stream of version and vergence signals. Furthermore, the adaptive engine allows rapid estimation of contrast sensitivity functions and stereoacuity thresholds alongside the task flow. Conclusion: This framework aims at a paradigm shift from discrete, static testing to dynamic, continuous assessment. It represents a first step toward the gamification of sequential psychophysical evaluations, offering an alternative to conventional paradigms that are often unsuitable for fragile or less collaborative populations. By doing so, we provide an ecologically valid and adaptable tool that can be readily tailored to individual needs and extended to different neurological and neurodegenerative conditions.
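
One way to realize such a "blind" stochastic engine (a sketch under assumptions; the abstract does not name the generative process, and all parameter values and the 120 Hz rate below are illustrative) is a mean-reverting Ornstein-Uhlenbeck walk in the version and vergence coordinates, smooth enough for pursuit yet unpredictable enough to defeat anticipation:

```python
# Sketch of a stochastic target trajectory in Hering coordinates.
import numpy as np

def ou_trajectory(n, dt=1 / 120, tau=0.5, sigma=8.0, x0=0.0, rng=None):
    """Mean-reverting random walk: smooth but unpredictable motion (deg)."""
    rng = rng or np.random.default_rng()
    x = np.empty(n)
    x[0] = x0
    for i in range(1, n):
        x[i] = x[i - 1] - (x[i - 1] / tau) * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

n = 120 * 10                                   # 10 s of motion at 120 Hz
version  = ou_trajectory(n, sigma=8.0)         # left-right angle (deg)
vergence = 2.0 + ou_trajectory(n, sigma=1.0)   # depth component, hovering near 2 deg
# Occasional larger jumps can be superimposed to probe reactive saccades; the
# gaze/vergence error against this target then drives the QUEST+ staircase that
# adapts contrast, spatial frequency, and disparity online.
```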

Research reported in this publication was partially supported by the National Eye Institute of the National Institutes of Health under Award Number R01EY032162. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.