Linking neural representations to behavior with a unified model of human vision

Undergraduate Just-In-Time Abstract

Poster Presentation 23.349: Saturday, May 16, 2026, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Undergraduate Just-In-Time 1

Kushal Dudipala1, Mayukh Deb1, Ben Lahner2, Ratan Murty1; 1Georgia Institute of Technology, 2Massachusetts Institute of Technology

A complete account of human vision must explain how visual inputs are transformed into neural population activity and, in turn, into behavior. To do this, we need models that, like us, operate on images, build internal representations that correspond directly to measured brain responses (neurons or voxels), and link those representations to outputs that match human behavior. Progress has been limited by a prevailing one-level-at-a-time approach, in which models are optimized for neural alignment or for behavioral prediction, but rarely for both. Here, we introduce NeuroMap, a unified model of the human visual system trained to satisfy two constraints: its internal representations must align with activity in visual cortex, and its outputs must predict human perceptual decisions. NeuroMap uses a Vision Transformer backbone whose internal tokens are mapped to visual areas V1, V2, V4, and IT using MOSAIC fMRI data, and an output layer trained on two-alternative forced-choice (2AFC) judgments from DreamSim. This joint objective directly couples neural representation and perceptual behavior within a single model while adding minimally to the parameter count. Unlike standard encoding models, NeuroMap does not require a separate mapper to turn model features into brain predictions: its internal units are trained from the outset to align with fMRI responses (noise-corrected Pearson ρ on held-out NSD data: V1 = 0.59, V2 = 0.66, V4 = 0.68, IT = 0.57). As a result, the model not only recapitulates key aspects of visual cortical organization but also predicts human behavior, reaching 92.4% accuracy on human 2AFC choices. Because brain-like internal units and behavioral outputs are linked within the same system, NeuroMap makes it possible to lesion or micro-stimulate units corresponding to different visual areas and measure the downstream consequences for behavior. These results establish NeuroMap as a promising prototype for a unified model of vision that connects neural activity to perception.
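To make the two-constraint objective concrete, the sketch below shows one way such a joint loss could be written. This is a minimal illustration, not the authors' implementation: the backbone stand-in, the per-area readout heads, the cosine-similarity 2AFC rule, the shapes, and the weighting term `lam` are all assumptions introduced here for clarity.

```python
# Minimal sketch (not the authors' code) of a joint objective that couples
# neural alignment (internal units -> fMRI responses in V1/V2/V4/IT) with
# behavioral prediction (2AFC similarity judgments). All names and shapes
# are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class NeuroMapSketch(nn.Module):
    def __init__(self, d_model=768, n_voxels=None):
        super().__init__()
        # Assumed voxel counts per area, for illustration only.
        n_voxels = n_voxels or {"V1": 500, "V2": 500, "V4": 300, "IT": 400}
        # Stand-in for a Vision Transformer backbone; any image encoder that
        # returns a feature vector per image would fit this sketch.
        self.backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, d_model))
        # Per-area readouts trained jointly with the backbone, so no separate
        # encoding-model mapper has to be fit afterwards.
        self.area_heads = nn.ModuleDict(
            {area: nn.Linear(d_model, n) for area, n in n_voxels.items()}
        )
        # Behavioral output: an embedding used for 2AFC similarity judgments.
        self.behavior_head = nn.Linear(d_model, 128)

    def forward(self, images):
        h = self.backbone(images)
        brain_pred = {area: head(h) for area, head in self.area_heads.items()}
        return brain_pred, self.behavior_head(h)


def joint_loss(model, ref, img_a, img_b, fmri, choice, lam=1.0):
    """One possible instantiation of the joint objective.

    ref, img_a, img_b: image batches (B, 3, 224, 224)
    fmri: dict of measured responses per area, each (B, n_voxels)
    choice: human 2AFC choices, 0 = image A, 1 = image B
    """
    brain_pred, z_ref = model(ref)
    _, z_a = model(img_a)
    _, z_b = model(img_b)
    # Neural constraint: predicted voxel responses should match measured fMRI.
    neural = sum(F.mse_loss(brain_pred[a], fmri[a]) for a in fmri)
    # Behavioral constraint: pick the candidate more similar to the reference,
    # matching the human choice.
    sims = torch.stack(
        [F.cosine_similarity(z_ref, z_a), F.cosine_similarity(z_ref, z_b)], dim=1
    )
    behavior = F.cross_entropy(sims, choice)
    return behavior + lam * neural
```

In this reading, lesioning or micro-stimulating the model amounts to zeroing or perturbing the units feeding a given area head and measuring the change in the 2AFC output, which is the kind of in-silico experiment the abstract proposes.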

Acknowledgements: This work was funded by an NIH Pathway to Independence Award from the NEI (R00EY032603) and a startup grant from Georgia Tech (to NARM).