Using generative models to probe scene representations in visual sensory memory

Undergraduate Just-In-Time Abstract

Poster Presentation 33.342: Sunday, May 17, 2026, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Undergraduate Just-In-Time 2

David R. Song1, Ritik Raina1, Abe Leite1, Gregory J. Zelinsky1; 1Stony Brook University

In our lab’s prior work, we built Seen2Scene, a novel fixation-to-image latent diffusion model that generates metameric images from blurry peripheral input and high-resolution foveal information. That work validated the model in the context of working memory, where a same–different task across an 8-second delay revealed scene metamers: pairs of images that, though different, were judged to be the same. In this poster, we extend that paradigm to scene representations in sensory memory. In a guided-saccade change-detection task on 300 Visual Genome images, participants monitored for scene changes while following a moving red ring around the image. The ring followed a predetermined scanpath generated from an empirical fixation density map for the image, and each saccade’s amplitude was at least 10 degrees of visual angle to ensure saccadic suppression. In change trials, the scene was replaced by a generated scene between fixations 4–5, 7–8, or 10–11. Potential metamers were generated by Seen2Scene, conditioned on foveal information at the pre- and post-change locations as well as at the fixated locations of the followed scanpath. Catch trials contained no change. The primary outcome was the scene-change miss rate, which we treat as a proxy for sensory metamers: generated scenes whose sensory memory representations are equivalent to those of the original scene. Miss rates rose with fixation index, from 34% at 5 fixations to 45% at 11 fixations, indicating that generations conditioned on more fixations had higher metamerism rates. These findings support the utility of Seen2Scene generations as probes of sensory memory scene representations, and the effect of fixation count suggests that fixation history may contribute some information to sensory representations.
Future work will include feature analyses of metameric scene elements, as well as new scanpath conditions to assess whether and how sensory memory judgments are specifically influenced by fixation history.
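
As an illustration of the guided-scanpath constraint described above, the following is a minimal sketch of one way to sample fixation targets from a fixation density map while enforcing a minimum saccade amplitude. It is not the procedure used in the study: the function name `sample_scanpath`, the rejection-sampling approach, the `px_per_deg` conversion factor, and the toy uniform density map are all assumptions made for illustration.

```python
import itertools
import math
import random


def sample_scanpath(density, n_fixations, min_amp_deg, px_per_deg,
                    rng, max_tries=10_000):
    """Sample (row, col) fixation targets from a 2D density map.

    Hypothetical helper: a candidate is rejected if it lies closer than
    min_amp_deg (converted to pixels) to the previously accepted fixation.
    """
    n_cols = len(density[0])
    flat = [w for row in density for w in row]
    # Cumulative weights let random.choices draw by bisection.
    cum_weights = list(itertools.accumulate(flat))
    min_amp_px = min_amp_deg * px_per_deg

    path = []
    for _ in range(max_tries):
        if len(path) == n_fixations:
            break
        idx = rng.choices(range(len(flat)), cum_weights=cum_weights, k=1)[0]
        cand = divmod(idx, n_cols)  # (row, col) of the sampled pixel
        if not path or math.dist(cand, path[-1]) >= min_amp_px:
            path.append(cand)

    if len(path) < n_fixations:
        raise RuntimeError("could not satisfy the amplitude constraint")
    return path


# Toy example: uniform 60 x 80 density map, assumed 3 px per degree,
# 11 fixations with every saccade at least 10 degrees long.
toy_density = [[1.0] * 80 for _ in range(60)]
scanpath = sample_scanpath(toy_density, n_fixations=11, min_amp_deg=10,
                           px_per_deg=3, rng=random.Random(0))
```

Rejection sampling keeps the accepted targets distributed according to the density map wherever the amplitude constraint allows, at the cost of discarding candidates that fall too close to the previous fixation.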