Hierarchical Representational Transformations of Working Memory in Humans and Machines
Poster Presentation: Tuesday, May 19, 2026, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Visual Working Memory: Models, neural
Qingqing Yang1, Hyewon Willow Han2, Bogeng Song3, Julie Golomb1, Dobromir Rahnev3, Yalda Mohsenzadeh2, Hsin-Hung Li1; 1The Ohio State University, 2Western University, 3Georgia Institute of Technology
Working memory (WM) allows humans to maintain past information even as new inputs arrive, yet how WM representations transform between encoding and retrieval remains unclear. Moreover, whether and how the format of WM representations, and their transformations across phases, are aligned between the human brain and artificial agents remains largely unexplored. To address these questions, we analyzed WM representations in the public human 7T fMRI Natural Scenes Dataset and compared them to representations in recurrent neural networks (RNNs) trained on the same naturalistic 1-back task. Using representational similarity, cross-decoding, and subspace geometry analyses, we compared rotational and non-rotational transformations between the WM encoding and retrieval phases across brain regions and model layers. Our analyses revealed convergent evidence for a gradient of WM coding schemes in both the human brain and neural networks. In humans, early visual regions (V1–hV4) underwent large representational changes from encoding to retrieval, including both rotational and non-rotational transformations. In contrast, representations were more stable in higher-order regions of the prefrontal cortex (FEF, dlPFC). Applying the same analyses to the models revealed a similar hierarchical pattern, but one that critically depended on the learning objective and the recurrent architecture. We examined two encoder architectures (ResNet and Vision Transformer, ViT) trained with two learning objectives (supervised and self-supervised), each followed by one of three recurrent architectures (vanilla RNN, GRU, and LSTM). Supervised encoders exhibited increasing stability across network layers, paralleling the cortical gradient in both rotational and non-rotational transformations, whereas self-supervised encoders lacked such a hierarchy. Among the recurrent architectures, gated variants (GRU, LSTM) better reproduced the brain-like hierarchical pattern of WM transformations. Together, these results reveal a hierarchical trade-off between flexibility and stability in WM representational transformations in both humans and machines, with supervised learning objectives combined with gated recurrent dynamics most closely resembling human WM mechanisms.
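To make the cross-decoding logic concrete, below is a minimal sketch (not the authors' analysis code) of how representational stability between phases can be assessed. All names are hypothetical: `encoding_patterns` and `retrieval_patterns` stand in for trial-by-voxel response matrices from one ROI (or layer activations from one model layer), and `labels` for the remembered item identity. Comparing within-phase and cross-phase decoding accuracy gives a simple index of how much the WM code changes between encoding and retrieval.

```python
# Minimal sketch of cross-phase decoding, under assumed data shapes.
# Synthetic data is used here purely for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 100          # hypothetical sizes
labels = rng.integers(0, 4, n_trials)  # hypothetical stimulus categories
encoding_patterns = rng.normal(size=(n_trials, n_voxels)) + labels[:, None] * 0.3
retrieval_patterns = rng.normal(size=(n_trials, n_voxels)) + labels[:, None] * 0.1

clf = LogisticRegression(max_iter=1000)

# Within-phase decoding: cross-validated accuracy inside the encoding phase.
within_acc = cross_val_score(clf, encoding_patterns, labels, cv=5).mean()

# Cross-phase decoding: train on encoding-phase patterns, test on
# retrieval-phase patterns from the same trials. A large drop relative to
# within-phase accuracy suggests the representational format changed
# (e.g., rotated) between phases.
clf.fit(encoding_patterns, labels)
cross_acc = clf.score(retrieval_patterns, labels)

print(f"within-phase accuracy: {within_acc:.2f}, cross-phase accuracy: {cross_acc:.2f}")
```

In this scheme, a region (or layer) where cross-phase accuracy approaches within-phase accuracy would be read as having a stable code across phases, whereas a large gap indicates a representational transformation; the subspace geometry analyses named in the abstract further distinguish rotational from non-rotational components of such changes.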