Scene metamer judgments reveal attentional biases in visual working memory
Poster Presentation 23.319: Saturday, May 16, 2026, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Visual Working Memory: Interactions with long-term memory
Abe Leite¹, Ritik Raina¹, Seoyoung Ahn², Gregory J. Zelinsky¹; ¹Stony Brook University, ²University of California, Berkeley
Scene representations evolve across viewing, with each fixation revealing not only the directly viewed object but also constraining inferences about the rest of the scene. We propose a novel paradigm for probing representations in visual working memory (VWM). Participants view an image for a controlled number of fixations (1, 3, 5, or 10), then (following an 8 s delay) see a second image for 200 ms and must judge whether the second image was identical to the first. We use the term ‘scene metamers’ for physically different images judged to be identical, and propose that these images are confused because they have similar VWM representations. In this study, we directly test two hypotheses: first, that VWM is biased toward representing information at fixated locations (gaze-contingent bias); and second, that among non-fixated locations, VWM is biased toward representing information at semantically meaningful locations that could potentially have been fixated (meaning-contingent bias). To do so, we employ the highly controllable Seen2Scene model (presented at VSS 2025) to generate potential scene metamers that match the scene’s overall structure but are biased to retain high-resolution visual information from small patches (the size of the human parafovea) at condition-dependent locations. In the own-fixation condition (gaze-contingent), these are the points the viewer fixated. In the same-image condition (meaning-contingent), they are the fixations of a different viewer on the same image. And in the cross-image condition (control), they are the fixations of a different viewer on a different image. Data from over 50 participants, each viewing 300 images, clearly reflect both hypothesized biases: gaze-contingent generations are more often metameric than meaning-contingent generations, which are in turn more often metameric than control generations. These effects hold when controlling for physical and semantic similarity. We conclude that our Seen2Scene-based paradigm is a promising way to probe the information underpinning visual memory representations.
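To make the three-condition design concrete, the following is a minimal Python sketch of how patch locations for a generated probe might be assigned per trial. All names here (Trial, select_patch_locations, the fixation data structure) are hypothetical illustrations, not part of the Seen2Scene implementation described in the abstract.

```python
import random
from dataclasses import dataclass

# A fixation is an (x, y) image coordinate; each viewer contributes a
# scanpath (list of fixations) per image. Hypothetical structures for
# illustration only.
Fixation = tuple[float, float]


@dataclass
class Trial:
    image_id: str
    condition: str                    # "own-fixation", "same-image", or "cross-image"
    patch_locations: list[Fixation]   # where the probe keeps high-res detail
                                      # (patches roughly parafovea-sized)


def select_patch_locations(
    condition: str,
    scanpaths: dict[str, dict[str, list[Fixation]]],  # viewer -> image -> scanpath
    viewer: str,
    image_id: str,
    rng: random.Random,
) -> list[Fixation]:
    """Pick the locations where the generated probe retains high-resolution
    information, following the abstract's three conditions."""
    if condition == "own-fixation":
        # Gaze-contingent: the viewer's own fixations on this image.
        return scanpaths[viewer][image_id]
    if condition == "same-image":
        # Meaning-contingent: a different viewer's fixations on the same image.
        other = rng.choice([v for v in scanpaths if v != viewer])
        return scanpaths[other][image_id]
    if condition == "cross-image":
        # Control: a different viewer's fixations on a different image.
        other = rng.choice([v for v in scanpaths if v != viewer])
        other_image = rng.choice([i for i in scanpaths[other] if i != image_id])
        return scanpaths[other][other_image]
    raise ValueError(f"unknown condition: {condition}")
```

In the actual study, the selected locations would be passed to the Seen2Scene generator to produce the candidate metamer image; that generation step is not sketched here.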
Acknowledgements: AL is supported by NSF-GRFP #2234683 and NIH-NEI R01EY030669 to GJZ. RR and GJZ are supported by NSF-CompCog #2444540 to GJZ.