Visual Memory

Talk Session: Wednesday, May 22, 2024, 8:15 – 10:00 am, Talk Room 2
Moderator: Wilma Bainbridge, University of Chicago

Talk 1, 8:15 am, 61.21

The effects of visual encoding speed on ERP markers of subsequent retrieval

Igor Utochkin1, Chong Zhao1, Edward Vogel1; 1University of Chicago

Our memory for meaningful visual stimuli is remarkable: Even when we see thousands of images, each presented for a few seconds, we can later recognize them among new images with high accuracy and in detail (Standing et al., 1973; Brady et al., 2008). However, recognition suffers if the images are encoded at a rate of 2 images per second or faster (Intraub, 1980; Potter, 1976; Potter et al., 2002). Presumably, this happens because the encoding of each new rapidly presented image disrupts the relatively slow short-term memory consolidation that is essential for forming long-lasting episodic memories. Here, we studied how encoding speed affects EEG markers of subsequent recognition, namely ERP Old/New effects: differences between ERP responses to earlier presented (old) and never presented (new) stimuli. In each block, participants memorized sequences of 20 real-world object images at a slow or fast rate (one image every 1,750 ms or 250 ms, respectively). Their memory was then tested with an “old/new” recognition task combined with EEG recording. Our analysis focused on two ERP Old/New components typically distinguished in the literature (Curran, 2000; Paller et al., 2007; Rugg & Curran, 2007): the earlier, frontal FN400 and the later, parietal LPC. Although observers showed significantly worse recognition in the fast than in the slow encoding condition, we found an almost equally pronounced FN400 in both conditions. In contrast, the LPC was much larger in amplitude in the slow than in the fast encoding blocks. One interpretation of this dissociation is that fast encoding selectively impairs recollection-based memory (reflected in the reduced LPC) but not familiarity-based memory (little effect on the FN400). However, other interpretations (e.g., that slower encoding produces a stronger confidence signal reflected in the LPC) remain possible.
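
The core contrast described above is an old-minus-new difference wave averaged within component-specific channel groups and time windows. The following Python sketch illustrates that computation under assumed parameters; the sampling rate, baseline length, channel indices, and window bounds are illustrative conventions, not the authors' settings.

```python
import numpy as np

# Minimal sketch of an ERP Old/New analysis (illustrative only).
# Assumed inputs: epochs as (n_trials, n_channels, n_samples) arrays,
# sampled at 500 Hz with a 200 ms pre-stimulus baseline.
FS = 500                          # sampling rate in Hz (assumed)
BASELINE_SAMPLES = 100            # 200 ms pre-stimulus baseline (assumed)
FRONTAL_CHANNELS = [0, 1, 2]      # e.g., Fz and neighbours (hypothetical indices)
PARIETAL_CHANNELS = [28, 29, 30]  # e.g., Pz and neighbours (hypothetical indices)

def window_slice(t_start_ms, t_end_ms):
    """Convert a post-stimulus window in ms to sample indices."""
    start = BASELINE_SAMPLES + int(t_start_ms / 1000 * FS)
    end = BASELINE_SAMPLES + int(t_end_ms / 1000 * FS)
    return slice(start, end)

def old_new_effect(old_epochs, new_epochs, channels, window):
    """Mean amplitude of the old-minus-new difference wave."""
    old_erp = old_epochs.mean(axis=0)   # average over old trials
    new_erp = new_epochs.mean(axis=0)   # average over new trials
    diff = old_erp - new_erp            # Old/New difference wave
    return diff[channels][:, window].mean()

# FN400: frontal, ~300-500 ms; LPC: parietal, ~500-800 ms (typical windows).
# fn400 = old_new_effect(old, new, FRONTAL_CHANNELS, window_slice(300, 500))
# lpc   = old_new_effect(old, new, PARIETAL_CHANNELS, window_slice(500, 800))
```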

Acknowledgements: ONR N00014-22-1-2123

Talk 2, 8:30 am, 61.22

Color-shape concepts and their representation in macaque monkey

Spencer R. Loggia1,2, Helen E. Feibes1, Karthik Kasi1, Noah Lasky-Nielson1, Bevil R. Conway1; 1Laboratory of Sensorimotor Research, National Eye Institute, 2Department of Neuroscience, Brown University

Object concepts are important tools of cognition that often reflect the interaction of a color and a shape: a “banana,” for example, is a yellow crescent. The brain areas that store color-shape interactions are poorly understood. Testing hypotheses has been challenging because concepts differ between people, and the corresponding likelihood functions and priors about object shapes and colors are not precisely known. Moreover, functional brain patterns differ among individuals. To overcome these challenges, we raised two macaque monkeys to learn about the colors and shapes of a set of 14 objects. Shape was learned faster than color, as in humans. After the monkeys had spent four years interacting with the objects, we scanned their brains while they held in mind the color or shape of the objects. We developed a searchlight analysis inspired by convolutional networks that is more robust to noise and generalizes better to cross-cue decoding settings. Cross-cue decoding was significant throughout the cortical visual pathway, implying that color-shape concepts are stored in a distributed network. Overall, cross-cue decoding was best in the posterior parcel of inferior-temporal cortex (PIT) (Acc.=.36 +/-.04, chance=.17). Relative to within-cue decoding, cross-cue decoding increased progressively from posterior to anterior inferior-temporal cortex (AIT) and rhinal cortex (r=.86, p=1.2e-16), suggesting that the culmination of the ventral visual pathway in AIT/rhinal cortex is a key locus for generating color-shape concepts. Within PIT, color-decoded-from-shape was relatively greater than shape-decoded-from-color, while the opposite pattern was observed within AIT and rhinal cortex. These asymmetries suggest that PIT represents perceptual memory colors, while AIT and rhinal cortex, and their reciprocally connected targets, compute an abstract concept of colors associated with shapes that could be used to guide visual search.
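
The central logic of cross-cue decoding is to train a decoder on trials cued by one feature and test it on trials cued by the other; above-chance transfer implies a cue-invariant code. The Python sketch below illustrates this for a single searchlight or parcel; it is not the authors' pipeline, and the variable names and classifier choice are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Minimal sketch of cross-cue decoding for one searchlight/parcel.
# Assumed inputs:
#   X_shape, y_shape -- patterns and labels from trials where shape was held in mind
#   X_color, y_color -- patterns and labels from trials where color was held in mind
# y_* are the labels being decoded; chance level is 1 / number of classes.

def cross_cue_accuracy(X_train, y_train, X_test, y_test):
    """Train a decoder in one cue condition, test it in the other."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(X_train, y_train)
    return clf.score(X_test, y_test)

# Symmetric cross-decoding: average both transfer directions.
# acc = 0.5 * (cross_cue_accuracy(X_shape, y_shape, X_color, y_color)
#              + cross_cue_accuracy(X_color, y_color, X_shape, y_shape))
```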

Talk 3, 8:45 am, 61.23

Neural correlates of boundary extension during visual imagination

Timothy Vickery1, Banjit Singh1, Alyssa Levy1, Kallie Sweetman1, Zoe Cronin1, Helene Intraub1; 1University of Delaware

Participants typically remember seeing a greater expanse of a scene than was visible in a studied close-up (boundary extension, BE). Multivoxel pattern analysis (MVPA) was used to test the neural correlates of BE. For each participant, a classifier was trained using a whole-brain searchlight method to discriminate between close-up and wider-angle versions of 16 scenes during repeated perceptual exposures. Earlier in the experiment, each participant had studied either the close or the wide version of each scene and then, on cue, visually imagined it from memory. If a brain area reflects BE, then unlike classification during perception, visual images of close views should now be misclassified as wide (capturing false memory beyond the view), whereas visual images of wide views should be correctly classified. The classifier indeed revealed BE-consistent patterns during imagery in several high-level visual regions, especially in the posterior parietal cortex (TFCE cluster-corrected for multiple comparisons). Importantly, this BE-consistent pattern did not reflect a brain-wide bias toward better classification of wider-angle views: (1) the pattern was constrained to visually responsive regions (occipital, parietal, and inferior temporal); and (2) the pattern reversed (better classification of close views) in early visual cortex, suggesting a bias toward the object in these regions. Following the visual imagery task, participants were again shown their originally presented views and rated each one as closer or farther away than before (4-pt scale); the rating analysis revealed the typical pattern: robust BE for close views and no directional error for wide views, thus verifying BE with a common behavioral measure. We propose that our new method reflects active maintenance of boundary-extended scene representations in memory, and that it holds promise not only for further exploration of BE but also as a general-purpose tool for decoding false memory in the brain.
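
The decoding logic described above (train on perception, test on imagery, and score the asymmetric misclassification of close views as wide) can be sketched in Python as follows. This is an illustrative sketch for one searchlight sphere under assumed variable names, not the authors' whole-brain pipeline.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Minimal sketch of the boundary-extension (BE) decoding logic.
# Assumed inputs for one searchlight sphere:
#   X_percept, y_percept : patterns and close(0)/wide(1) labels from perception runs
#   X_imagery, y_studied : patterns from imagery trials and the studied view (0/1)

def be_consistency(X_percept, y_percept, X_imagery, y_studied):
    clf = LinearSVC()
    clf.fit(X_percept, y_percept)      # learn close vs. wide during perception
    pred = clf.predict(X_imagery)      # classify imagined views

    studied_close = (y_studied == 0)
    studied_wide = (y_studied == 1)
    # BE-consistent pattern: imagined close views are (mis)classified as wide,
    # while imagined wide views are classified correctly as wide.
    close_as_wide = (pred[studied_close] == 1).mean()
    wide_as_wide = (pred[studied_wide] == 1).mean()
    return close_as_wide, wide_as_wide
```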

Acknowledgements: This work is supported by NIH COBRE 5P30GM145765-02.

Talk 4, 9:00 am, 61.24

Individual differences in visual mental imagery assessed through standard and evolutionary classification images

Guido Maiello1, Ahmadreza Tajari2, Katherine R. Storrs3, Reuben Rideaux4, Thomas S. A. Wallis5,6, William J. Harrison7, Roland W. Fleming6,8; 1Justus Liebig University Giessen, 2Sharif University of Technology, 3University of Auckland, 4The University of Sydney, 5Technical University of Darmstadt, 6Centre for Mind, Brain and Behaviour (CMBB), Universities of Marburg, Giessen and Darmstadt, 7University of Queensland, 8Justus Liebig University Giessen

Mental imagery—the ability to visualize images in the mind’s eye—is associated with many perceptual and cognitive faculties that vary across individuals. Quantifying mental imagery abilities, however, is challenging and typically relies on subjective self-report methods, as the world of the imagination is not directly measurable. In contrast to such approaches, here we propose a method for directly visualizing mental images using classification images. Despite their potential, classification images have not been adopted for evaluating individual differences in mental imagery ability due to two primary challenges: the time-consuming nature of traditional reverse correlation, which requires many hours of testing, and uncertainty about how to interpret the reconstructed images. To address these challenges, we first optimized a traditional reverse correlation paradigm to yield recognizable classification images in under 20 minutes, and then developed an additional “evolutionary” paradigm based on genetic search. We used these methods in an experiment with 20 typical participants who underwent multiple sessions of “standard” and “evolutionary” reverse correlation tasks in which they detected the letter “S” in pure pixel-noise images. We fed the generated classification images into deep neural network image classifiers trained to recognize handwritten letters in noise. We took the networks’ cross-entropy loss as a measure of the quality of the generated classification images, and thus of the mental imagery abilities of each participant. This approach exhibited substantial test-retest reliability within both the standard (r=.58, p<.01) and evolutionary (r=.42, p<.05) reverse correlation sessions, as well as across paradigms (r=.55, p<.01), and the reliability of the estimated individual differences improved linearly with the number of trials (r=.73, p<.001). These results indicate that both “standard” and “evolutionary” reverse correlation methods consistently measured individual differences in mental imagery. This work thus paves the way for a more nuanced and objective understanding of this complex cognitive function.
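
The two computational steps described above are (1) estimating a classification image by reverse correlation and (2) scoring its quality with a letter classifier's cross-entropy loss. The Python sketch below illustrates both steps under stated assumptions; the classifier interface and variable names are hypothetical, and the estimator shown is the classic response-conditioned noise average rather than the authors' exact procedure.

```python
import numpy as np

# Minimal sketch of "standard" reverse correlation and the cross-entropy
# quality score (illustrative only).
# noise_fields : (n_trials, H, W) pixel-noise images shown to the observer
# responses    : (n_trials,) 1 if the observer reported seeing an "S", else 0

def classification_image(noise_fields, responses):
    """Classic reverse-correlation estimate: mean 'yes' noise minus mean 'no' noise."""
    yes = noise_fields[responses == 1].mean(axis=0)
    no = noise_fields[responses == 0].mean(axis=0)
    return yes - no

def imagery_score(class_image, letter_classifier, target_class):
    """Cross-entropy of a letter classifier for the target letter ("S"):
    lower loss = clearer classification image = stronger mental image.
    letter_classifier is assumed to return class probabilities for an image."""
    probs = letter_classifier(class_image)
    return -np.log(probs[target_class] + 1e-12)
```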

Talk 5, 9:15 am, 61.25

Beyond 'Gist': The Dynamic Interplay of Conceptual Information and Visual Detail in Long-Term Memory

Nurit Gronau1, Roy Shoval1, Rotem Avital-Cohen1; 1The Open University of Israel

Conceptual information plays an important role in visual long-term memory (LTM); however, the precise nature of such semantic-visual interactions remains unclear. Here, we tested the effects of object meaning on memory for an arbitrary visual property, specifically item location. Unlike the object-location binding examined in studies of visual working memory (VWM), LTM's longer timescale might involve unique processes above and beyond those engaged in VWM. According to 'resource-limited' accounts, highly familiar items demand fewer encoding resources, leaving spare capacity for encoding visual detail (Popov & Reder, 2020). 'Schema-based' accounts, in contrast, suggest that conceptual knowledge may prioritize gist-based representations at the expense of item-specific visual representations (e.g., Bellana et al., 2021; Koutstaal et al., 2003). That is, semantic information may hinder item-specific memory, particularly over long time lags. To test these opposing theories, participants encoded individual objects at arbitrary screen locations and were subsequently tested on their memory for these locations using a 4-AFC recognition test encompassing both old/new items and old/new locations. As expected, overall memory was higher for meaningful (real-world) than for meaningless (scrambled) objects. Critically, conditioned on correct item identification, correct location memory rates were significantly higher for the meaningful objects. A follow-up study employed only real-world objects that were independently rated for ‘meaningfulness’ and ‘visual complexity’. Once again, object meaning was positively associated with location accuracy, providing a more fine-grained measure of conceptual influence on visual memory. Finally, using objects with color-meaningful (e.g., red wine) versus color-meaningless (e.g., a red balloon) features, we found that, in contrast to feature-independent theories (Utochkin & Brady, 2020), location memory was more heavily reliant on color memory when color was meaningful. Collectively, our findings align with resource-limited theories, suggesting that meaningful stimuli or features support enhanced LTM for arbitrary visual details. Follow-up studies will test semantic-visual memory dynamics over longer time lags.
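
The key dependent measure above is location accuracy conditioned on correct item recognition, compared between meaningful and meaningless objects. A minimal Python sketch of that scoring, with assumed variable names and trial coding, is shown below.

```python
import numpy as np

# Minimal sketch of the conditional scoring (names and coding are assumptions).
# item_correct     : (n_trials,) bool -- old/new item judgment correct
# location_correct : (n_trials,) bool -- location response correct
# meaningful       : (n_trials,) bool -- real-world (True) vs. scrambled (False) object

def conditional_location_accuracy(item_correct, location_correct, meaningful):
    """P(location correct | item correctly recognized), per stimulus type."""
    out = {}
    for label, mask in [("meaningful", meaningful), ("meaningless", ~meaningful)]:
        recognized = item_correct & mask
        out[label] = location_correct[recognized].mean() if recognized.any() else np.nan
    return out
```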

Talk 6, 9:30 am, 61.26

The visual memorability of natural warning patterns: insights from humans and machines

Federico De Filippi1, Olivier Penacchio1,2, Akira R. O'Connor1, Julie M. Harris1; 1University of St Andrews, St Andrews, United Kingdom, 2Computer Vision Center, Universitat Autònoma de Barcelona, Barcelona, Spain

While some animals camouflage themselves, others advertise that they are toxic using bright colours and salient stripes and/or spots (‘warning patterns’). Their striking appearance is thought to warn off predators: a memorable pattern may help predators learn about toxicity and discourage future attacks on similar prey. However, how warning patterns influence visual memory has never been documented. Research suggests that when glancing at a picture, people do not intuitively know what makes it memorable or forgettable, yet they tend to remember and forget the same images (i.e., there is high inter-subject consistency). This means that the likelihood of remembering a picture (its ‘memorability’) can be computationally predicted from the visual information contained in the picture. Memorable images also lead to stronger neural firing when processed by real and artificial visual systems. We used a database of Lepidoptera (butterfly/moth) images, some of which carry warning signals (aposematic: AP) and some of which do not (non-aposematic: nAP). We measured human memorability for both AP and nAP Lepidoptera and examined the sources of memorability variation across subjects. Observers studied images while providing subjective ratings (1-10) of memorability, followed by a recognition test (‘Seen before?’). Memorability was computed as the proportion of subjects who remembered previously seeing each image. AP species appeared subjectively more memorable than nAP ones but, on average, they were not better remembered. Remarkably, AP species led to high inter-subject consistency in memorability (Spearman’s rho = .79), whereas consistency for nAP species was comparatively low (Spearman’s rho = .37). When we exposed our Lepidoptera patterns to deep neural networks trained for object classification, we found that AP species that were memorable to humans also evoked stronger responses in some hidden layers. Taken together, these findings suggest that warning patterns might exploit shared visual mechanisms that underlie successes and failures in picture recognition.
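
The two quantities reported above are image-level memorability (the proportion of subjects who recognized each image) and inter-subject consistency (the correlation of memorability scores across random halves of the subjects). The Python sketch below illustrates one common way to compute them; it assumes a binary subject-by-image recognition matrix and is not the authors' exact analysis.

```python
import numpy as np
from scipy.stats import spearmanr

# Minimal sketch of memorability scoring and split-half inter-subject consistency.
# hits : (n_subjects, n_images) binary matrix, 1 if a subject recognized an image
rng = np.random.default_rng(0)

def memorability(hits):
    """Memorability = proportion of subjects who remembered each image."""
    return hits.mean(axis=0)

def split_half_consistency(hits, n_splits=25):
    """Correlate image memorability across random halves of the subjects."""
    n_subjects = hits.shape[0]
    rhos = []
    for _ in range(n_splits):
        order = rng.permutation(n_subjects)
        half1, half2 = order[: n_subjects // 2], order[n_subjects // 2:]
        rho, _ = spearmanr(memorability(hits[half1]), memorability(hits[half2]))
        rhos.append(rho)
    return float(np.mean(rhos))

# Run separately on the AP and nAP image sets to compare consistency.
```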

Acknowledgements: This work is funded by the Biotechnology and Biological Sciences Research Council (United Kingdom Research and Innovation)

Talk 7, 9:45 am, 61.27

Semantic and Visual Features Drive the Intrinsic Memorability of Co-Speech Gestures

Xiaohan (Hannah) Guo1, Susan Goldin-Meadow1, Wilma A. Bainbridge1; 1The University of Chicago

Co-speech gestures that teachers spontaneously produce during explanations have been shown to benefit students’ learning. Further, prior work suggests that information conveyed through teachers’ gestures is less likely to deteriorate than information conveyed through speech (Church et al., 2007). However, how the intrinsic features of gestures affect students’ memory remains unclear. The memorability effect refers to the phenomenon whereby adults from different backgrounds consistently remember and forget the same visual stimuli (static images, dance moves, etc.), owing to the stimuli's intrinsic semantic and visual features. In this study, we investigated whether certain gestures are consistently remembered and, if so, which semantic and visual features are associated with these remembered gestures. We first created 360 10-second audiovisual stimuli by video-recording 20 actors producing unscripted natural speech and gestures as they explained Piagetian conservation problems. Two trained experimenters extracted high-level semantic and low-level visual/acoustic features of the speech and gesture in each audiovisual stimulus. We then tested online participants’ memory in three conditions using a between-subjects study-test paradigm: the audiovisual stimuli (gesture+speech condition), the visual-only version of the same stimuli (gesture condition), and the audio-only version of the stimuli (speech condition). Within each of the two experimental blocks, participants encoded nine random stimuli from an actor and made old/new judgments on all 18 stimuli from the same actor immediately afterward. We found that participants showed significant consistency in their memory for the gesture, gesture+speech, and speech stimuli. Focusing on the visual-only (gesture) condition, we found that (1) more meaningful gestures and speech predicted more memorable gestures, and (2) using both hands led to more memorable gestures than using one hand. Our results suggest that both semantic features (conveyed through speech and gestures) and visual features (conveyed through gesture) make co-speech gestures memorable.
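
The feature analysis described above relates stimulus-level memorability scores to coded gesture features such as meaningfulness and handedness. The Python sketch below illustrates one simple way to estimate such feature effects; the feature coding, variable names, and linear model are assumptions, not the authors' analysis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Minimal sketch relating coded gesture features to memorability scores.
# memorability : (n_stimuli,) proportion of participants who recognized each gesture clip
# features     : (n_stimuli, 2) columns = [meaningfulness rating,
#                                          1 if both hands were used else 0]

def feature_effects(features, memorability):
    """Fit a simple linear model and return the per-feature weights."""
    model = LinearRegression().fit(features, memorability)
    return dict(zip(["meaningfulness", "both_hands"], model.coef_))

# Positive weights would indicate that more meaningful and two-handed
# gestures tend to be more memorable, consistent with the findings above.
```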