Theory
Talk Session: Tuesday, May 19, 2026, 2:45 – 4:30 pm, Talk Room 1
Moderator: Michael Bonner, Johns Hopkins University
Talk 1, 2:45 pm, 54.11
Attractive and Repulsive Perceptual Biases Naturally Emerge in Generative Adversarial Inference
Hyun-Jun Jeon1, Hansol Choi1, Oh-Sang Kwon1; 1UNIST
Perceptual estimates of orientation exhibit noise-dependent biases: estimates are attracted toward cardinal orientations under high external noise but repelled away from them under high internal noise. Bayesian observer models with efficient coding constraints explain these effects with explicitly specified prior distributions and likelihoods, yet it remains unclear whether such properties can emerge from learning with natural cost functions. We propose a Generative Adversarial Inference (GAI) model in which estimates for noisy stimuli are constructed through generative reconstruction. GAI is realized through an adversarially trained autoencoding architecture with a discriminator that distinguishes real and generated stimulus–latent (x, z) pairs to enforce distribution-level consistency, while a reconstruction loss is applied simultaneously to guide the autoencoder to learn consistent stimulus-level mappings. The model was trained in a toy visual environment consisting of 32×32 Gabor patches whose orientations followed a bimodal distribution favoring the cardinal axes, reflecting natural orientation statistics. Across independent training runs with different random seeds (n = 30), the trained model consistently reproduced the cardinal bias properties observed in human perception. Specifically, increasing external noise produced systematic attraction toward the cardinals, and internal noise in latent space induced reliable repulsion away from them. The learned latent representation exhibited efficient-coding signatures: Fisher information peaked at the cardinals and matched the implicit prior inferred from noisy reconstructions. Ablation experiments dissociated the contributions of the two training objectives. Reconstruction-only training produced attraction without consistent repulsion, whereas adversarial-only training yielded unreliable orientation estimates with large, unsystematic biases. Only the joint objective yielded a stable orientation manifold capable of converting internal variability into repulsive bias. These findings suggest that the attractive and repulsive cardinal biases traditionally attributed to efficient coding and Bayesian inference can emerge naturally within a learned generative system, providing an account of their origin in human vision.
This research was supported by the National Research Foundation of Korea (NRF-2023R1A2C1007917 to O.-S.K.).
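To make the joint objective concrete, here is a minimal sketch of an adversarially trained autoencoder whose discriminator judges stimulus–latent (x, z) pairs while a reconstruction loss is applied simultaneously, in the style described above; the network sizes, optimizer settings, and latent dimensionality are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

LATENT = 8  # illustrative latent dimensionality (an assumption)

enc  = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 256), nn.ReLU(), nn.Linear(256, LATENT))
dec  = nn.Sequential(nn.Linear(LATENT, 256), nn.ReLU(), nn.Linear(256, 32 * 32))
disc = nn.Sequential(nn.Linear(32 * 32 + LATENT, 256), nn.ReLU(), nn.Linear(256, 1))

opt_g = torch.optim.Adam(list(enc.parameters()) + list(dec.parameters()), lr=2e-4)
opt_d = torch.optim.Adam(disc.parameters(), lr=2e-4)
bce   = nn.BCEWithLogitsLoss()

def train_step(x):
    """One update on a batch of noisy Gabor patches, x: (B, 32*32)."""
    z = torch.randn(x.size(0), LATENT)
    ones, zeros = torch.ones(x.size(0), 1), torch.zeros(x.size(0), 1)

    # Discriminator: tell real (x, E(x)) pairs from generated (G(z), z) pairs.
    real_pair = torch.cat([x, enc(x).detach()], dim=1)
    fake_pair = torch.cat([dec(z).detach(), z], dim=1)
    loss_d = bce(disc(real_pair), ones) + bce(disc(fake_pair), zeros)
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Encoder/decoder: swap labels to fool the discriminator (distribution-level
    # consistency) and add a pixel reconstruction loss (stimulus-level mapping).
    real_pair = torch.cat([x, enc(x)], dim=1)
    fake_pair = torch.cat([dec(z), z], dim=1)
    loss_adv = bce(disc(real_pair), zeros) + bce(disc(fake_pair), ones)
    loss_rec = ((dec(enc(x)) - x) ** 2).mean()
    loss_g = loss_adv + loss_rec
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```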
Talk 2, 3:00 pm, 54.12
Hand-Engineered Image-Computable Models Can Still Outperform DNNs in V1 Similarity
Tanish Mendki1, Sudhanshu Srivastava2, Ansh Soni3; 1University of California, Santa Barbara, 2University of California, San Diego, 3University of Pennsylvania
Task-based Deep Neural Network models (DNNs) are widely used as models of inferotemporal visual cortex (IT), with early work showing a large jump over previous hand-made models [Yamins and DiCarlo, 2014]. However, recent work has suggested that over time the performance–alignment relationship has not only plateaued but reversed, with high-performing models becoming worse models of IT [Linsley et al., 2023]. Here we test whether this reversal extends to earlier cortical regions. We used a 515-image subset of the Natural Scenes Dataset (NSD) and extracted V1 voxel responses from eight subjects, constructing 515 × N matrices of neural data. For each model, we compared latent representations for the same 515 images across all layers and identified the best-fitting “V1 layer” based on subject-averaged alignment scores. We evaluated a broad set of models, including SOTA IT similarity models [Schrimpf et al., 2020] and high-task-performance CNNs and ViTs, along with models directly attempting to model V1 [Dapello et al., 2020]. Alongside DNNs, we also tested traditional, hand-crafted models such as HMAX [Riesenhuber and Poggio, 1999]. Three complementary similarity measures were used: (1) representational similarity analysis, (2) pairwise matching between model units and voxels, and (3) linear predictivity via ridge regression. Surprisingly, we find that the most modern models are equal to or worse than HMAX as models of V1, even as they are better models of IT. Furthermore, HMAX is the best model under representational similarity scores that are sensitive to representational geometry, and it is indistinguishable from the best under strict pairwise matching, which is invariant only to unit permutations. These findings highlight that progress in task performance has not translated into better mechanistic models of V1, and that classical image-computable models should not be treated as obsolete benchmarks. Re-evaluating hand-engineered approaches, rather than defaulting to DNNs, might be crucial for improving biologically-grounded alignment.
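For concreteness, a sketch of two of the three alignment measures (representational similarity analysis and ridge-regression linear predictivity) as they might be computed for a 515-image stimulus set; the train/test split and regularization grid are assumptions, not the authors' pipeline.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import train_test_split

def rsa_score(feats, voxels):
    """feats: (515, D) model-layer activations; voxels: (515, N) V1 responses."""
    def rdm(X):
        c = np.corrcoef(X)                           # image-by-image correlations
        return 1 - c[np.triu_indices_from(c, k=1)]   # upper-triangle dissimilarities
    rho, _ = spearmanr(rdm(feats), rdm(voxels))      # compare the two geometries
    return rho

def linear_predictivity(feats, voxels):
    """Ridge regression from model features to voxels; mean held-out Pearson r."""
    Xtr, Xte, Ytr, Yte = train_test_split(feats, voxels, test_size=0.2, random_state=0)
    pred = RidgeCV(alphas=np.logspace(-2, 5, 8)).fit(Xtr, Ytr).predict(Xte)
    rs = [np.corrcoef(pred[:, i], Yte[:, i])[0, 1] for i in range(Yte.shape[1])]
    return float(np.mean(rs))
```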
Talk 3, 3:15 pm, 54.13
Efficient Coding Enables Data-Efficient Learning
Ananya Passi1, Brian S. Robinson2, Michael F. Bonner1; 1Johns Hopkins University, 2Johns Hopkins University Applied Physics Laboratory
Deep learning is the leading approach for building computational models of visual cortex. However, conventional approaches to deep learning require immense training data and fail to explain how the human visual system learns from very little data. One key factor could be unsupervised learning, which is thought to play an important role in biological vision but is uncommon in leading deep neural networks. Here, we ask whether integrating unsupervised efficient coding principles into neural networks can make them trainable in low-data regimes. Previous work has proposed that efficient coding in higher-level vision is achieved when neurons are tuned to the maximally varying axes of natural stimuli. We created a network that learns such an efficient code in a deep hierarchy, with each layer learning from the activations of its inputs—a process that can be implemented through Hebbian learning without any task whatsoever (not even a self-supervised task). We next examined how this network performs when combined with subsequent supervised learning on image classification using limited training data. We additionally evaluated the ability of this network to predict human fMRI responses and behavioral similarity judgments using the NSD and THINGS datasets, which contain responses to thousands of natural images. We found that under data-limited conditions, with just tens to thousands of training images in total, a hybrid procedure that combines unsupervised efficient coding with supervised learning strongly outperforms conventional end-to-end supervised learning. Specifically, this hybrid approach yields superior classification performance and superior alignment with visual cortex representations and behavioral similarity judgments. Remarkably, under the most stringent data limitations, conventional supervised learning fails, while our hybrid approach still exhibits robust performance. Together, these results show that biologically inspired efficient coding enables rapid learning from limited data, and they establish a new fully unsupervised learning model of the visual hierarchy.
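A minimal sketch of how one such layer could learn the maximally varying axes of its inputs with a purely Hebbian rule (Sanger's rule, which converges to the leading principal components); the layer sizes, learning rate, and random stand-in inputs are illustrative assumptions rather than the authors' architecture.

```python
import numpy as np

class HebbianLayer:
    """Learns the leading principal axes of its input stream with Sanger's rule."""
    def __init__(self, n_in, n_out, lr=1e-3, seed=0):
        self.W = np.random.default_rng(seed).standard_normal((n_out, n_in)) * 0.01
        self.lr = lr

    def step(self, x):
        y = self.W @ x                               # layer activation
        # Sanger's rule: dW = lr * (y x^T - tril(y y^T) W); converges to the
        # top principal components, i.e. the maximally varying input axes.
        self.W += self.lr * (np.outer(y, x) - np.tril(np.outer(y, y)) @ self.W)
        return y

# A deep hierarchy: each layer learns from the activations of the layer below,
# with no task signal of any kind (not even a self-supervised objective).
rng = np.random.default_rng(1)
layers = [HebbianLayer(1024, 256), HebbianLayer(256, 64)]
for _ in range(1000):                                # unsupervised pass
    x = rng.standard_normal(1024)                    # stand-in for an image patch
    for layer in layers:
        x = layer.step(x)
```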
Talk 4, 3:30 pm, 54.14
Robustness aligns the local coding axes of artificial and biological vision
Nikolas McNeal1,2, N. Apurva Ratan Murty2,3; 1Machine Learning, School of Mathematics, Georgia Tech, 2Center of Excellence in Computational Cognition, Georgia Tech, 3Cognition and Brain Science, School of Psychology, Georgia Tech
The frontier of vision neuroscience is shifting toward closed-loop experiments that rely on computational models to actively synthesize stimuli and manipulate neural states. However, current model evaluation metrics (like prediction accuracy and representational similarity) prioritize global alignment across images and neglect the fine-grained “local-coding” representational structure required for closed-loop control. Here, we expose this evaluation gap and show that models with comparable global alignment can differ profoundly in their suitability for closed-loop model-guided experiments. To characterize local-coding axes, we trained encoding models (N=21 models) on human fMRI responses (3,514 voxels, from NSD) and used targeted adversarial probes to map the local representational geometry around stimuli. We found that even state-of-the-art brain models (CLIP ResNet-50, ViTs, etc.) were highly fragile and exhibited idiosyncratic local-coding axes that varied between architectures (subspace similarity=0.05, energy overlap=0.01, ceiling=1). To identify models better aligned with the local-coding axes of the brain, we introduced the Control-Consensus Score (CCS). This score quantified the extent to which local-coding axes generalized across models, identifying coding directions that we hypothesized were more brain-like. We found that CCS varied non-linearly with adversarial sensitivity, with marked improvements emerging only once models exceeded a critical threshold of local robustness (logistic regression R^2=0.98). Finally, we validated our approach by generating stimulus manipulations for closed-loop experiments. We found that stimulus perturbations based on models with high CCS produced semantically meaningful and interpretable changes in the image, whereas those with low CCS did not. These high-CCS models also demonstrated superior efficacy in closed-loop tests on both macaque and human fMRI data. Together, these results show that global alignment does not imply local-axis alignment. Our work establishes local-coding-axis alignment as a critical dimension of brain-model evaluation, particularly for next-generation studies that depend on precise and interpretable stimulus control.
This work is supported by the NIH Pathway to Independence Award (R00EY032603), NSF Nexus computation support (Allocation number: SOC250049), and a startup grant from Georgia Tech to NARM.
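One way to make “local-coding axes” concrete: treat per-voxel input gradients of an encoding model as local axes around a stimulus and score the overlap of two models' subspaces via principal angles. The encoder interface and number of axes below are assumptions; the abstract does not specify how its subspace-similarity metric was computed.

```python
import torch

def local_axes(encoder, x, n_axes=10):
    """encoder: image -> (1, V) predicted voxel responses; x: (1, 3, H, W).
    Returns an orthonormal basis for the local-coding subspace around x."""
    x = x.clone().requires_grad_(True)
    resp = encoder(x)
    grads = [torch.autograd.grad(resp[0, v], x, retain_graph=True)[0].flatten()
             for v in range(n_axes)]                 # one input-gradient per voxel
    Q, _ = torch.linalg.qr(torch.stack(grads, dim=1))
    return Q                                         # (pixels, n_axes), orthonormal

def subspace_similarity(Q1, Q2):
    """Mean squared cosine of the principal angles between two local subspaces
    (1 = identical local axes, ~0 = orthogonal)."""
    return float((torch.linalg.svdvals(Q1.T @ Q2) ** 2).mean())
```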
Talk 5, 3:45 pm, 54.15
Traveling waves along the cortical depth are modulated by visual stimulation and correlate with synchronous spiking activity
Lihao Yan1, Mitchell Morton1, Anirvan S. Nandy1, Monika P. Jadi1; 1Yale University
Neural oscillations, or rhythmic fluctuations of neural activity, have played an important role in neuroscience since their discovery. However, oscillations have primarily been studied by averaging across trials, a process that can obscure important spatiotemporal dynamics present in single trials. Recent studies have revealed that neural oscillations often manifest as traveling waves (TWs) that propagate across the cortex. TWs have been shown to play functional roles in sensory processing, working memory, and motor control. However, previous studies have primarily focused on TWs along the cortical surface, and whether this phenomenon also occurs along the cortical depth remains understudied. Moreover, the neural bases of TWs remain unclear. In this study, we investigated the nature of TWs along the cortical depth by analyzing translaminar recordings in the visual cortex of awake monkeys. We observed local field potential (LFP) TWs along the cortical depth, with their speed and probability modulated by visual stimulation. Further, we found that LFP TW speed correlated positively with local spiking probability. To investigate the underlying neuronal mechanisms, we analyzed concurrent spiking activity using the multi-unit activity envelope (MUAe) signal. Surprisingly, we did not observe MUAe TWs; rather, we observed robust MUAe synchrony across cortical depth, which co-occurred with LFP TWs. Furthermore, the phase of MUAe during synchrony correlated with LFP TW speed. Finally, we developed a minimal model of LFP TWs and MUAe synchrony as a potential linking mechanism that encapsulates our findings. Our results provide new insights into the spatiotemporal dynamics of neural oscillations along the cortical depth and highlight the importance of developing mechanistic insights into the neuronal basis of LFP TWs.
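A sketch of one standard way to detect a traveling wave across laminar channels, consistent with the kind of analysis described: band-pass the LFP, extract the Hilbert phase, and regress phase on cortical depth, so that a linear phase gradient yields a wave speed. The filter band, contact spacing, and frequency estimate below are assumptions, not the authors' parameters.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def depth_wave_speed(lfp, fs=1000.0, spacing_mm=0.1, band=(8.0, 12.0)):
    """lfp: (n_channels, n_samples), channels ordered by cortical depth.
    Returns the median wave speed (mm/s) implied by the laminar phase gradient."""
    b, a = butter(4, np.asarray(band) / (fs / 2), btype="band")
    phase = np.angle(hilbert(filtfilt(b, a, lfp, axis=1), axis=1))
    depth = np.arange(lfp.shape[0]) * spacing_mm
    f_mid = np.mean(band)                  # stand-in for instantaneous frequency
    speeds = []
    for t in range(phase.shape[1]):
        # Fit a linear phase gradient across depth at this time point.
        k = np.polyfit(depth, np.unwrap(phase[:, t]), 1)[0]   # rad/mm
        if abs(k) > 1e-6:
            speeds.append(2 * np.pi * f_mid / abs(k))         # v = omega / |k|
    return float(np.median(speeds))
```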
Talk 6, 4:00 pm, 54.16
Early Coarse Coding and Later High-Dimensional Refinement in IT Cortex Reveal Distinct Contributions of Excitatory and Inhibitory Neurons
Sabine Muzellec1, Sachi Sanghavi2, Kohitij Kar1; 1York University, Department of Biology and Centre for Vision Research, Centre for Integrative and Applied Neuroscience, Toronto, Canada, 2Department of Computer Sciences, University of Wisconsin-Madison, USA
How do excitatory and inhibitory neurons contribute to the emergence of object representations in primate inferior temporal (IT) cortex? We recorded large-scale population activity with Utah arrays from three macaques viewing 640 images (80 exemplars × 8 objects) and classified neurons as putative excitatory or inhibitory using spike waveform clustering, matching all analyses for cell count. We first asked when each population begins to carry object information. Inhibitory neurons began carrying reliable object information significantly earlier than excitatory neurons (~12.3 ms lead, p<0.001), indicating involvement in the earliest feedforward sweep. We next asked whether these early signals share a common representational format. They do not: inhibitory responses showed higher across-image variance and larger within- and between-category dispersion (all p<0.05), and population similarity (CKA) between the two groups dropped from ~0.7 to ~0.4 within the first 100 ms, revealing sharply diverging representational trajectories. Do these differences matter for categorization? Not at the level of overall accuracy: category decoding did not differ across early or mid-latency windows (p>0.22), suggesting that both populations can support coarse discrimination. However, the representational differences became crucial when examining which images were easy or difficult. Excitatory neurons aligned substantially more strongly with monkeys’ image-by-image accuracy (70–170 ms: Δ=7%, p<0.001). To understand why inhibitory neurons lead in time yet fail to predict behavior, we examined representational geometry. By 100–200 ms, inhibitory responses collapsed into a low-dimensional, highly correlated, small-radius manifold, providing coarse category separation but limited image-level structure. In contrast, excitatory neurons, the primary projection population, expanded into higher-dimensional, larger-radius manifolds that preserved rich object-specific structure that could support more complex primate behaviors. Consistent with this interpretation, feedforward ANNs (e.g., ResNet-18), which implement high-dimensional feature transformations but lack realistic inhibitory circuitry, predicted excitatory responses significantly better than inhibitory responses (p<0.001). Together, these results reveal a coarse-to-fine division of labor in the IT cortex.
KK is supported by the Canada Research Chair Program (CRC-2021-00326), SFARI (967073), Brain-Canada Foundation (2023-0259), the Canada First Research Excellence Funds (VISTA Program), and NSERC (RGPIN-2024-06223). SM is funded by Connected Minds Postdoctoral Fellowship (supported by CFREF).
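The population-similarity measure referenced above, linear centered kernel alignment (CKA), has a simple closed form; a minimal sketch, with matrix shapes and time-windowing left as assumptions:

```python
import numpy as np

def linear_cka(X, Y):
    """X: (n_images, n_exc_units), Y: (n_images, n_inh_units), same image order.
    Returns linear CKA in [0, 1]; 1 means identical representational geometry."""
    X = X - X.mean(axis=0)                            # center each unit
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2        # cross-covariance energy
    return hsic / (np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro"))
```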
Talk 7, 4:15 pm, 54.17
Partitioning Signal and Noise (PSN): A modality-general denoising technique for neural responses
Jacob S. Prince1, Heiko Schütt2, Dora Hermes3, Greta Tuckute1, Ian Charest4, Peter Brotherwood4, David G. C. Hildebrand5, Michael J. Tarr6, George A. Alvarez1, Talia Konkle1, Kendrick N. Kay7; 1Harvard University, 2Université du Luxembourg, 3Mayo Clinic, 4Université de Montréal, 5University of Houston, 6Carnegie Mellon University, 7University of Minnesota
Large-scale neural datasets, such as the Natural Scenes Dataset, BOLD5000, and THINGS, have rapidly inspired a growing ecosystem of open data in visual neuroscience. To maximize the number of included stimuli, such datasets typically contain only a few repeated trials per stimulus, which leads to noisy measurements of neural activity even after trial averaging. This persistent noise challenges our ability to make accurate inferences about neural tuning and representational geometry, and limits the utility of the data for computational modeling. We introduce Partitioning Signal and Noise (PSN), a novel low-rank denoising method that is applicable to any dataset with repeated trials. Standard PCA-based denoising retains dimensions with high total variance, but this approach fails to distinguish signal and noise and is therefore susceptible to signal loss and biased neural activity estimates. In contrast, PSN uses a simple generative model to explicitly estimate the signal and noise distributions that underlie a set of data. This enables improved denoising via low-rank reconstruction that focuses on signal-rich dimensions. We used ground-truth simulations to characterize the bias-variance properties of PSN. Compared to alternatives such as standard PCA, independent components analysis, and simple trial averaging, PSN achieves better recovery of ground-truth signal, while incurring minimal bias. We then applied PSN to diverse visual datasets taken from human fMRI, intracranial EEG, scalp EEG, macaque electrophysiology, and marmoset calcium imaging. In all cases, we observed substantial improvements in the stability of tuning profiles (e.g., for n=2,091 FFA-1 voxels in NSD: median tuning correlation between trial splits was r=0.367 for trial averaging, r=0.584 for PSN). As a consequence of improved signal fidelity, PSN consistently raised the predictive performance of encoding models. Broadly, by addressing the pervasive challenge of noise in large-scale data, PSN enables improved representational analyses, model evaluations, and theoretical insights from existing and future neural datasets.
This research was supported by NSF CAREER BCS-1942438 (TK), funding from the Kempner Institute (TK), and an NDSEG Research Fellowship (JSP).
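A schematic of the generative idea behind partitioning signal and noise with repeated trials: estimate the noise covariance from within-stimulus variability, subtract its contribution from the covariance of trial averages to estimate the signal covariance, then reconstruct responses in the signal-dominant subspace. The estimators and fixed-rank rule below are simplifying assumptions, not the released PSN implementation.

```python
import numpy as np

def denoise(data, rank):
    """data: (n_stimuli, n_reps, n_units). Returns denoised trial averages."""
    n_stim, n_rep, n_unit = data.shape
    mean = data.mean(axis=1)                          # trial-averaged responses
    # Noise covariance: pooled within-stimulus covariance of single trials.
    resid = (data - mean[:, None, :]).reshape(-1, n_unit)
    cov_noise = resid.T @ resid / (n_stim * (n_rep - 1))
    # Signal covariance: covariance of the averages minus residual noise / n_rep.
    mc = mean - mean.mean(axis=0)
    cov_signal = mc.T @ mc / (n_stim - 1) - cov_noise / n_rep
    # Low-rank reconstruction restricted to signal-dominant eigen-dimensions.
    evals, evecs = np.linalg.eigh(cov_signal)
    basis = evecs[:, np.argsort(evals)[::-1][:rank]]  # top `rank` signal axes
    return mc @ basis @ basis.T + mean.mean(axis=0)
```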