Hierarchical Representational Transformations of Working Memory in Humans and Machines
Poster Presentation: Tuesday, May 19, 2026, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Visual Working Memory: Models, neural
Qingqing Yang1, Hyewon Willow Han2, Bogeng Song3, Julie Golomb1, Dobromir Rahnev3, Yalda Mohsenzadeh2, Hsin-Hung Li1; 1The Ohio State University, 2Western University, 3Georgia Institute of Technology
Working memory (WM) allows humans to maintain past information even as new inputs arrive, yet how WM representations transform between encoding and retrieval remains unclear. Moreover, whether and how the format of WM representations, and their transformations across phases, are aligned between the human brain and artificial agents remains largely unexplored. To address these questions, we analyzed WM representations in the public human 7T fMRI Natural Scenes Dataset and compared them to representations in recurrent neural networks (RNNs) trained on the same naturalistic 1-back task. Using representational similarity, cross-decoding, and subspace geometry analyses, we compared rotational and non-rotational transformations between the WM encoding and retrieval phases across brain regions and model layers. Our analyses revealed convergent evidence for a gradient of WM coding schemes in both the human brain and neural networks. In humans, early visual regions (V1–hV4) underwent large representational changes from encoding to retrieval, including both rotational and non-rotational transformations. In contrast, representations were more stable in higher-order regions of the prefrontal cortex (FEF, dlPFC). Applying the same analyses to the models revealed a similar hierarchical pattern, but one that critically depended on the learning objective and the recurrent architecture. We examined two encoder architectures (ResNet and Vision Transformer, ViT) trained with two learning objectives (supervised and self-supervised), each followed by one of three recurrent architectures (vanilla RNN, GRU, and LSTM). Supervised encoders exhibited increasing stability across network layers, paralleling the cortical gradient in both rotational and non-rotational transformations, whereas self-supervised encoders lacked such a hierarchy. Among the recurrent architectures, gated variants (GRU, LSTM) better reproduced the brain-like hierarchical pattern of WM transformations. Together, these results reveal a hierarchical trade-off between flexibility and stability in WM representational transformations in both humans and machines, with supervised learning objectives combined with gated recurrent dynamics most closely resembling human WM mechanisms.
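To make the cross-decoding logic concrete, below is a minimal sketch (not the authors' analysis code) of how representational stability between phases can be assessed. All names are hypothetical: `encoding_patterns` and `retrieval_patterns` stand in for trial-by-voxel response matrices from one ROI (or layer activations from one model layer), and `labels` for the remembered item identity. Comparing within-phase and cross-phase decoding accuracy gives a simple index of how much the WM code changes between encoding and retrieval.

```python
# Minimal sketch of cross-phase decoding, under assumed data shapes.
# Synthetic data is used here purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 100          # hypothetical sizes
labels = rng.integers(0, 4, n_trials)  # hypothetical stimulus categories
encoding_patterns = rng.normal(size=(n_trials, n_voxels)) + labels[:, None] * 0.3
retrieval_patterns = rng.normal(size=(n_trials, n_voxels)) + labels[:, None] * 0.1

clf = LogisticRegression(max_iter=1000)

# Within-phase decoding: cross-validated accuracy inside the encoding phase.
within_acc = cross_val_score(clf, encoding_patterns, labels, cv=5).mean()

# Cross-phase decoding: train on encoding-phase patterns, test on
# retrieval-phase patterns from the same trials. A large drop relative to
# within-phase accuracy suggests the representational format changed
# (e.g., rotated) between phases.
clf.fit(encoding_patterns, labels)
cross_acc = clf.score(retrieval_patterns, labels)

print(f"within-phase accuracy: {within_acc:.2f}, cross-phase accuracy: {cross_acc:.2f}")
```

In this scheme, a region (or layer) where cross-phase accuracy approaches within-phase accuracy would be read as having a stable code across phases, whereas a large gap indicates a representational transformation; the subspace geometry analyses named in the abstract further distinguish rotational from non-rotational components of such changes.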