Spatial Vision: Neural mechanisms, modeling

Talk Session: Saturday, May 16, 2026, 10:45 am – 12:30 pm, Talk Room 1
Moderator: David Brainard, University of Pennsylvania

Talk 1, 10:45 am, 22.11

A nonlinear, spatio-temporo-chromatic, image-computable model of the ON-center midget retinal ganglion cell mosaic across the retina

Nicolas Cottaris1, Brian Wandell2, David Brainard1; 1University of Pennsylvania, 2Stanford University

Our previous work introduced a framework for synthesizing biologically plausible mosaics of linear spatio-chromatic receptive fields (RFs) for ON-center midget retinal ganglion cells (mRGCs). We validated the spatial properties of these linear RF mosaics against in vivo and in vitro physiological data from macaques, and we further showed that a computational observer with access to the model output recapitulates aspects of human psychophysical spatio-chromatic contrast detection. To enhance realism and predictive power, we now extend the model by incorporating key nonlinear and temporal mechanisms: (a) the nonlinear cone phototransduction cascade, (b) mRGC temporal dynamics, and (c) static nonlinearities (e.g., contrast gain control) at the mRGC RF center and surround components and at the mRGC output stage. These additions preserve the model's agreement with published spatial tuning data in macaque mRGCs under the stimulus conditions in which the macaque data were collected (temporal frequency: 4 Hz, contrast: 25%, background luminance: 40 cd/m^2). Crucially, the nonlinear and linear models diverge significantly under conditions where adaptation effects are strong, e.g., low temporal frequency (< 2 Hz), high background luminance (> 100 cd/m^2), and high contrast (> 75%). This divergence underscores the importance of incorporating nonlinear mechanisms when predicting performance under diverse stimulation conditions. Furthermore, we propose a novel method for integrating mRGC temporal dynamics with the phototransduction cascade, showing that a triphasic temporal impulse response filter, placed after the cone outer-segment photocurrent generation stage, is sufficient to reproduce measured mRGC stimulus-referred dynamics. The updated model represents a significant step toward predicting human performance across a wider range of spatio-temporo-chromatic psychophysical tasks.
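
As an illustration of the proposed temporal stage, the sketch below implements a triphasic impulse response as a weighted combination of gamma-like lobes of alternating sign and applies it to a cone photocurrent trace. The parameterization and all numerical values are hypothetical; the abstract specifies only that a triphasic filter follows the outer-segment photocurrent stage.

```python
import numpy as np

def triphasic_impulse_response(t, tau1=0.01, tau2=0.03, tau3=0.09,
                               w2=0.8, w3=0.2):
    """Triphasic temporal filter built from three gamma-like lobes of
    alternating sign. Time constants and weights are hypothetical."""
    lobe = lambda tau: (t / tau) * np.exp(1.0 - t / tau)  # unit-peak lobe
    return lobe(tau1) - w2 * lobe(tau2) + w3 * lobe(tau3)

dt = 0.001                                  # 1 ms sampling
t_filter = np.arange(0.0, 0.3, dt)          # 300 ms of filter support
h = triphasic_impulse_response(t_filter)

# Hypothetical cone outer-segment photocurrent: a 4 Hz modulation, as in
# the macaque validation conditions cited in the abstract.
t_stim = np.arange(0.0, 2.0, dt)
photocurrent = np.sin(2.0 * np.pi * 4.0 * t_stim)

# Stimulus-referred mRGC drive: photocurrent convolved with the filter.
mrgc_drive = dt * np.convolve(photocurrent, h)[:t_stim.size]
```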

Talk 2, 11:00 am, 22.12

Contrast-Dependent Reorganization of Center–Surround Orientation Tuning

Lisa Schwetlick1,2, Peter Neri2,3; 1EPFL, 2ENS, 3IIT

Visual systems flexibly adjust local processing in response to a variety of contextual signals. For example, orientation tuning is known to depend strongly on surrounding orientations. Here, we asked (1) how orientations in the surround influence detection of a central orientation target, and (2) how this center–surround interaction reorganizes as a function of relative contrast. Such shifts may reveal transitions between linear and nonlinear computational regimes. Observers performed a 2-AFC orientation-discrimination task in which both the central target patch and a surrounding annulus were independently perturbed with orientation noise. Reverse-correlation analysis yielded tuning curves for the center and surround separately. By varying center and surround contrasts, we systematically examined not only the strength of contextual modulation but also how the shapes of the tuning curves change across contrast regimes. Tuning curves computed from the center noise show that target-aligned noise in the center biases decisions toward the target orientation, whereas orthogonal noise has the opposite effect. Introducing target-aligned or orthogonal surrounds suppresses the corresponding peak of the central tuning curve (i.e., same-orientation inhibition). Surround tuning curves at low contrasts similarly show a positive peak at target-aligned orientations. At high contrasts, however, this pattern reverses: tuning shifts toward a regime that emphasizes contrastive information in the surround. We also find considerable individual differences among observers. We compared several candidate models, including linear templates, energy-based nonlinear models, and divisive normalization models; no single strategy can account for behavior across all regimes. Instead, center–surround interactions appear to reorganize systematically with contrast, revealing dynamic transitions between linear and nonlinear processing strategies in early vision.
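
The reverse-correlation step can be sketched as follows: the tuning curve (classification kernel) for each region is the difference between the mean orientation-noise profiles preceding the two responses. The data shapes, variable names, and random stand-in data below are hypothetical, not the actual experiment.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical experiment: orientation noise summarized as energy in
# n_ori orientation bins, separately for center and surround.
n_trials, n_ori = 5000, 18
center_noise = rng.normal(size=(n_trials, n_ori))
surround_noise = rng.normal(size=(n_trials, n_ori))
choice = rng.integers(0, 2, n_trials)   # stand-in for the 2-AFC response

def tuning_curve(noise, choice):
    """Reverse-correlation kernel: mean noise preceding one response
    minus mean noise preceding the other."""
    return noise[choice == 1].mean(axis=0) - noise[choice == 0].mean(axis=0)

center_kernel = tuning_curve(center_noise, choice)
surround_kernel = tuning_curve(surround_noise, choice)
```

Computed separately at each combination of center and surround contrast, such kernels expose how the shapes of the center and surround tuning curves reorganize across contrast regimes.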

Talk 3, 11:15 am, 22.13

Adaptation reveals a gap between contrast discrimination and contrast response

Sanga Cho1, Sam Ling1; 1Boston University

Perceptual discrimination is linked to nonlinearities in neural coding, a relationship central to the Weber-Fechner law. For instance, contrast discrimination is thought to arise from the nonlinear relationship between stimulus contrast and the neural contrast response function (CRF), with the canonical dipper-like shape of the threshold-versus-contrast (TvC) function attributed to the inverse slope of the CRF. However, this standard model linking brain with behavior has been difficult to evaluate, as nonlinear CRFs have been challenging to measure with fMRI. Here, we leverage a recently developed adaptation paradigm that captures the nonlinearity of CRFs to test models linking population CRFs to the TvC. Each fMRI run started with an initial 60-s adaptation period to full-visual-field stimuli at 5%, 8%, or 32% contrast. Participants then viewed a sequence of stimuli at contrasts that straddled the adaptor contrast, interleaved with top-up adaptors, yielding voxel-wise CRFs for three adaptation conditions. We then compared these CRFs to TvC functions at the same adaptor levels in the same participants. TvC functions were estimated via a staircase procedure, with participants judging which of two sequential stimuli had the higher contrast, across different pedestal contrasts at the same three adaptation contrasts. We found that higher-contrast adaptors substantially shifted neural sensitivity toward higher contrasts, increasing the semi-saturation contrast across V1-V3. Psychophysical TvC dips also shifted toward higher contrasts with adaptation. While both neural and behavioral measures demonstrated ordinal alignment in how adaptation recenters sensitivity, the standard model still fell short: predictions based on ROI-averaged CRFs produced contrast sensitivity that was quantitatively misaligned with the psychophysical data across adaptation conditions. Instead, voxel-wise analyses of population CRFs suggest that incorporating optimal pooling mechanisms may provide a principled bridge linking neural contrast responses to contrast discrimination.
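
The standard model under test can be made concrete with a Naka-Rushton CRF, a common parameterization that the abstract does not specify: the predicted discrimination threshold at each pedestal is a criterion response change divided by the local CRF slope, so an adaptation-induced increase in the semi-saturation contrast shifts the TvC dip rightward. All parameter values below are illustrative.

```python
import numpy as np

def naka_rushton(c, rmax=1.0, c50=0.10, n=2.0):
    """Canonical contrast response function (CRF)."""
    return rmax * c**n / (c**n + c50**n)

def predicted_threshold(c, delta_r=0.01, dc=1e-4, **crf_params):
    """Standard-model TvC prediction: the contrast increment needed to
    change the response by a criterion amount delta_r, approximated as
    delta_r divided by the local CRF slope."""
    slope = (naka_rushton(c + dc, **crf_params)
             - naka_rushton(c, **crf_params)) / dc
    return delta_r / slope

pedestals = np.logspace(-2, 0, 20)   # pedestal contrasts from 1% to 100%
# Adaptation modeled as an increase in semi-saturation contrast c50:
tvc_low_adapt = predicted_threshold(pedestals, c50=0.05)
tvc_high_adapt = predicted_threshold(pedestals, c50=0.20)  # dip shifts right
```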

Talk 4, 11:30 am, 22.14

Orthogonal neural geometries of basic visual features in macaque V1

Cong Yu1, Xin Wang2, Shi-Ming Tang2; 1Zhejiang University, 2Peking University

The classical ice-cube model of Hubel and Wiesel proposes that V1 neurons are spatially organized into orthogonal maps of orientation and ocular dominance to optimize wiring efficiency. However, extending this framework to include additional features such as spatial frequency imposes geometrical constraints on how these features can be arranged on the two-dimensional cortical surface. A recent two-photon calcium imaging study of ours (Zhang et al., Cerebral Cortex, 2024) found that cellular-resolution maps of orientation, spatial frequency, and ocular dominance in macaque V1 lack consistent orthogonal or parallel spatial arrangements, raising the question of whether these features are instead represented orthogonally in population activity space. We applied principal component analysis (PCA) to these and additional datasets and found that population responses formed near-orthogonal geometries in representational space, supporting the idea that feature encoding relies more on population-level activity than on spatial layout. This orthogonal structure remained robust to increases in dimensionality and was absent in response-shuffled control data, in which feature axes collapsed to chance-level alignment. Furthermore, artificially disrupting orthogonality, either by aligning feature axes or by randomizing trial positions in PCA space, severely impaired the decodability of stimulus features, demonstrating that orthogonal representations are critical for maintaining feature separability. The same orthogonality was replicated in the representational geometry of color and orientation in macaque V1 interblob neurons, but not in blob neurons, which showed high color selectivity but poor orientation selectivity. These findings suggest that V1 population responses follow an orthogonal encoding geometry that makes simultaneous representation of multiple stimulus features possible, and that population codes, rather than spatial maps, better capture feature representations. This principle may also serve as an important benchmark for V1-inspired deep neural networks.
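
A minimal sketch of the orthogonality analysis, assuming (as a simplification of whatever axis definition the study used) that a feature-coding axis is the difference of condition centroids in PCA space; the random data and labels are hypothetical stand-ins for recorded population responses.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical population data: trials x neurons, each trial labeled
# with an orientation condition and an ocular-dominance condition.
n_trials, n_neurons = 400, 200
responses = rng.normal(size=(n_trials, n_neurons))
orientation = rng.integers(0, 4, n_trials)   # 4 orientation conditions
eye = rng.integers(0, 2, n_trials)           # 2 ocular conditions

# Project mean-centered responses onto the top 10 principal components.
X = responses - responses.mean(axis=0)
_, _, Vt = np.linalg.svd(X, full_matrices=False)
scores = X @ Vt[:10].T

def coding_axis(scores, labels):
    """Axis separating extreme-condition centroids in PCA space."""
    lo, hi = labels.min(), labels.max()
    axis = scores[labels == hi].mean(0) - scores[labels == lo].mean(0)
    return axis / np.linalg.norm(axis)

ori_axis = coding_axis(scores, orientation)
eye_axis = coding_axis(scores, eye)
angle = np.degrees(np.arccos(np.clip(ori_axis @ eye_axis, -1.0, 1.0)))
# In the study, such angles are compared against response-shuffled
# controls to establish that the orthogonality is not a chance artifact.
```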

Talk 5, 11:45 am, 22.15

Visual masking reduces spatial but increases temporal field asymmetries: A speed-accuracy tradeoff study

Nicholas Crotty1, Marisa Carrasco1,2; 1Department of Psychology, New York University, 2Center for Neural Science, New York University

[INTRODUCTION] Visual perception is better, and processing time faster, along the horizontal than the vertical meridian (horizontal-vertical anisotropy, HVA), and along the lower than the upper vertical meridian (vertical meridian asymmetry, VMA). Such asymmetries have been shown using speed-accuracy tradeoff (SAT) protocols, which conjointly measure discriminability and the rate of information accrual (processing time). According to “interruption theories” of visual masking, backward pattern masks disrupt the accrual of visual information shortly after stimulus onset. Here, we investigate for the first time whether and how visual masking alters differences in discriminability and processing time among the cardinal meridians. [METHODS] We introduced masking into the SAT protocol of Carrasco & McElree (2001) via a local pattern mask presented after stimulus offset on half of the blocks. Participants performed both feature and conjunction search with stimulus arrays (set sizes 1, 4, or 8) presented at the cardinal locations for 40 ms, followed by the mask on masked blocks. A tone presented at one of several intervals (40 ms to 2 s) signaled the opening of the response window. We fit the three-parameter SAT model (intercept, slope, asymptote) to the data using both a hierarchical nested-modeling approach and a novel Bayesian generalized linear mixed-effects (GLME) approach, quantifying the effects of masking and cardinal meridian on each SAT parameter. [RESULTS] The candidate HVA model from the nested approach indicated that masking reduced the asymmetry in discriminability but increased the asymmetry in processing time. Both findings were confirmed by a Bayesian GLME model mirroring the candidate model. The candidate VMA model and its Bayesian equivalent indicated that masking reduced the discriminability asymmetry. Moreover, a fully saturated Bayesian model indicated that masking also increased the VMA in processing time. [CONCLUSION] Both approaches revealed that backward pattern masking reduced polar angle asymmetries in discriminability but strengthened them in information accrual. These findings highlight the importance of considering both spatial and temporal performance field asymmetries.
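
The three-parameter SAT model is conventionally parameterized as an exponential approach to asymptote, d'(t) = lambda * (1 - exp(-beta * (t - delta))) for t > delta and 0 otherwise. The sketch below fits that form to hypothetical data with ordinary least squares; the abstract's hierarchical nested-modeling and Bayesian GLME procedures are not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def sat_curve(t, asymptote, rate, intercept):
    """Three-parameter SAT function: d' rises exponentially from an
    intercept (delta) toward an asymptote (lambda) at a rate (beta)."""
    return np.where(t > intercept,
                    asymptote * (1.0 - np.exp(-rate * (t - intercept))),
                    0.0)

# Hypothetical data: processing time (s) vs. d' at one cardinal location.
t_obs = np.array([0.04, 0.10, 0.20, 0.35, 0.50, 1.00, 2.00])
dprime = np.array([0.0, 0.3, 1.1, 1.8, 2.2, 2.6, 2.7])

params, _ = curve_fit(sat_curve, t_obs, dprime, p0=[2.5, 5.0, 0.05])
asymptote, rate, intercept = params
```

In this framing, the asymptote indexes discriminability while the rate and intercept index information accrual, which is how masking can affect the spatial (asymptote) and temporal (rate/intercept) asymmetries differently.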

Supported by NSF GRFP to NC and NIH R01 EY027401 to MC

Talk 6, 12:00 pm, 22.16

Why cortical implant patients may never see the stars

Eirini Schoinas1, Leili Soo2, Alfonso Rodil Doblado2, Geoffrey M. Boynton3, Ione Fine3, Eduardo Fernández2; 1University of California, Santa Barbara, 2University Miguel Hernández, Elche, Spain, 3University of Washington

The majority of visual cortical prosthetic devices under development rely on small penetrating electrodes, motivated by the assumption that increasing electrode density and reducing electrode size will produce high-resolution vision. However, neurons in the brain are not analogous to pixels on a screen: stimulation of a small number of cells by a tiny depth electrode will not consistently produce discrete, punctate phosphenes. Here, we compared data from CORTIVIS patients (Fernández et al., 2021) implanted with small depth electrodes to predicted phosphenes generated using a "virtual patient" simulation based on a simple model of V1 (Fine and Boynton, 2024). Although stimulation with the tiny electrodes of the CORTIVIS implant occasionally produced small punctate "stars", patients reported a variety of percepts, including larger spots and irregularly shaped forms. These phosphenes were stable within electrodes but varied substantially across stimulation sites. Our model successfully predicted the wide variety of percepts reported by patients and further suggests that, for a subset of electrodes, stimulation may have activated multiple cortical locations, possibly as a result of axonal activation. This virtual patient model can also be used to simulate patient perceptual experiences for future cortical implants (such as Neuralink's). Our simulations of 'implants of the future' show that high resolution cannot be obtained simply by implanting a very large number of very small electrodes; it is also necessary to develop an encoding model that stimulates the right electrode or combination of electrodes. Unfortunately, no one has yet suggested a sensible way to measure the receptive field of each individual electrode in a blind individual. Models like ours are an example of how virtual prototyping can offer more realistic expectations about the kind of vision bionic eyes might provide.
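
A much-simplified version of the virtual-patient idea can be sketched with a log-polar model of the V1 retinotopic map: an electrode's cortical position is mapped back into visual space to predict a phosphene's location, and a disc of current spread around the electrode is mapped back to predict its extent. The map parameters, electrode coordinates, and current spread below are illustrative and are not those of the Fine and Boynton (2024) model.

```python
import numpy as np

def cortex_to_visual_field(z_cortex, k=15.0, a=0.7):
    """Invert a simple log-polar V1 map, w = k * log(z + a), to recover
    the visual-field position (complex, in degrees) represented by a
    cortical site (complex, in mm). k and a are illustrative values."""
    return np.exp(z_cortex / k) - a

# Hypothetical electrode 10 mm from the foveal representation:
electrode = 10.0 + 1.0j
phosphene_center = cortex_to_visual_field(electrode)

# Current spread activates a cortical region; its image in the visual
# field approximates the phosphene's extent, growing with eccentricity.
spread_mm = 0.5
edge = cortex_to_visual_field(electrode + spread_mm)
phosphene_radius = abs(edge - phosphene_center)
```

Even this toy map shows why identical electrodes yield different percepts at different sites: the same cortical current spread maps to larger phosphenes at greater eccentricities, before any axonal activation is considered.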

PDC2022-133952-100 & PID2022-141606OB-I00, Ministerio de Ciencia, Innovación y Universidades; CIPROM/2023/25, Generalitat Valenciana; Programa Iberoamericano CYTED-NT4SM; EU Horizon 2020 Research and Innovation Programme No. 899287 (NeuraViPeR) [EF]; NIH R01EY014645 [IF & GMB].

Talk 7, 12:15 pm, 22.17

A generative diffusion model reveals V2’s representation of natural images

Neel Agarwal1, Gabriel Yancy1, Zahra Kadkhodaie2, Justin D. Lieber1, J. Anthony Movshon1, Eero P. Simoncelli1,2; 1Center for Neural Science, New York University, 2Center for Computational Neuroscience, Flatiron Institute

Neurons in visual cortex represent the complex features that occur in natural environments. Characterization of their responses has been limited by our inability to create stimuli that are both naturalistic and precisely controlled. Traditional stimuli, such as gratings, textures, or noise, capture only specialized elements, while photographic images are too unconstrained for systematic study. Here, we leveraged a generative diffusion model to create stimuli and used high-density electrode arrays to measure the responses of populations of neurons in anesthetized macaque V2. We trained a diffusion model, using a denoising objective, to capture a prior probability distribution over a set of photographs of natural scenes, from which new images can be sampled. In the vicinity of any given image, this distribution may be approximately described as a low-dimensional manifold. We generated images both on and off this manifold, with controlled pixel-level distances (MSE) relative to a set of base natural images: off-manifold images were created by adding Gaussian white noise to base images, and on-manifold images by adding noise and then forcing the noisy images back onto the learned manifold. On-manifold images drove population responses that were more diverse than those driven by distance-matched off-manifold images, without greatly changing the overall mean response. By analyzing responses to images that varied in distance from the base image, we found that the cosine similarity of the population response relative to the base image declined more steeply on-manifold than off-manifold. This occurred because some responses increased while others decreased along on-manifold trajectories, forming heterogeneous encoding axes. Our results suggest that V2 neurons provide an orderly representation of natural image structure. Diffusion models thus provide a powerful new stimulus engine for exploring population coding in visual cortex.
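
The probe construction and the cosine-similarity analysis can be sketched as follows. The denoiser and population readout are placeholders (in the study these are a trained diffusion denoiser and recorded V2 responses), and in the actual experiment noise levels are calibrated so that on- and off-manifold probes are matched in MSE distance from the base image.

```python
import numpy as np

rng = np.random.default_rng(2)

def make_probe_images(base, denoiser, sigma):
    """Distance-matched probes: off-manifold = base + white noise;
    on-manifold = the same noisy image projected back toward the
    learned image manifold by the denoiser."""
    noisy = base + sigma * rng.normal(size=base.shape)
    return denoiser(noisy), noisy   # (on-manifold, off-manifold)

def cosine_similarity(r, r_base):
    return r @ r_base / (np.linalg.norm(r) * np.linalg.norm(r_base))

# Placeholders for the trained model and the recorded population:
denoiser = lambda img: img                         # stand-in denoiser
population_response = lambda img: img.reshape(-1)[:100]  # stand-in readout

base = rng.normal(size=(64, 64))
on_img, off_img = make_probe_images(base, denoiser, sigma=0.5)
r_base = population_response(base)
sim_on = cosine_similarity(population_response(on_img), r_base)
sim_off = cosine_similarity(population_response(off_img), r_base)
```

Repeating this comparison across a range of sigma values yields the similarity-versus-distance curves whose steeper on-manifold decline is the abstract's central result.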

Simons Foundation, NIH EY022428