The interplay of visual memory and high-level vision
Wednesday, June 1, 2022, 12:00 – 2:00 pm EDT, Zoom Session
Organizer: Sharon Gilaie-Dotan1,2; 1Bar Ilan University, 2UCL
Presenters: Timothy F Brady, Noa Ofen, Sharon Gilaie-Dotan, Yoni Pertzov, Galit Yovel, Meike Ramon
While different studies show phenomenal human long-term visual memory capacity, there are also indications that visual long-term memory is influenced by factors such as familiarity, depth of processing, and visual category. In addition, individual differences also play a role; this is especially evident in individuals who have exceptional visual long-term memory for certain visual categories (e.g. super-recognizers for faces), while others may have very weak visual memory for these categories (e.g. prosopagnosia for faces). Furthermore, visual perception has long been regarded as a rather lower-level process preceding the more cognitive high-level visual long-term memory processes, and not much attention has been given to the possible influences of memory on perception and the interplay between perception, visual representations, memory, and behavior. In this symposium we will examine, through a series of talks spanning different methods, populations and perspectives, the different influences on visual memory for different visual categories, culminating with the unique category of faces. Importantly, support for a bi-directional interplay between perception and memory will be presented, relating to mechanisms, behavior and development. Tim Brady will open, describing perceptual constraints on visual long-term memory and the role of interference in limiting long-term memory for visual objects. He will also propose an interplay between perception and concepts in long-term memory. Noa Ofen will follow, describing age-dependent changes in the neural correlates of visual memory for scenes, developing from childhood to adolescence, that involve not only MTL and PFC but also visual cortex. Sharon Gilaie-Dotan will describe how, during naturalistic encoding without task-related modulations, physical image properties and visual categories influence image memory. Yoni Pertzov will describe how face and object image familiarity (i.e. image memory during the perceptual process) influences visual exploration, based on eye movement investigations, and how these influences may allow detection of concealed memories. Galit Yovel will describe how conceptual and social information contributes to face memory such that faces are learnt from concepts to percepts, and how this relates to face representations. Lastly, Meike Ramon will describe how exceptional individual abilities influence face memory (as in so-called super-recognizers), based on her deep-data approach to these individuals, and propose that their superior face memory is associated with consistency of identity-based representation rather than viewpoint (image/perception-based) representation. These talks will be followed by a panel discussion, where we will consider present and future challenges and directions.
What limits visual long-term memory?
Timothy F Brady1; 1University of California San Diego
In visual working memory and visual attention, processing and/or remembering one item largely comes at the expense of all the other items (e.g., there is a ‘resource’ limit). By contrast, when we encode a new object in long-term memory, it does not seem to come directly at the expense of the other items in memory: Remembering the clothes your child is wearing today does not automatically squeeze out your memory of the food on your child’s breakfast plate. Yet at the same time, we do not perfectly or even partially remember the visual details of everything we encounter, or even everything we actively attend to and encode into long-term memory. So what determines which items survive in visual long-term memory and how precisely their visual features are remembered? In this talk, I’ll discuss several experiments detailing both perceptual constraints on visual long-term memory storage for visual features and the role of interference in limiting long-term memory for visual objects. I’ll suggest there is a rich interplay between perception and concepts in visual long-term memory, and that long-term memory for visual objects can therefore be a helpful case study for vision scientists in understanding the structure of visual representations in general.
The neural correlates of the development of visual memory for scenes
Noa Ofen1; 1Wayne State University
Episodic memory – the ability to encode, maintain and retrieve information – is critical for everyday functioning at all ages, yet little is known about the development of episodic memory systems and their brain substrates. In this talk, I will present data from a series of studies in which we investigate how functional brain development underlies the development of memory for visual scenes throughout childhood and adolescence. Using functional neuroimaging methods, including functional MRI and intracranial EEG, we identified age differences in information flow between the medial temporal lobes and the prefrontal cortex that support the formation of memory for scenes. Critically, we also identified activity in visual regions, including the occipital cortex, that plays a key role in memory formation and shows complex patterns of age differences. The investigation of the neural basis of memory development has been fueled by recent advances in neuroimaging methodologies. Progress towards a mechanistic understanding of memory development hinges on the specification of the representations and functional operations that underlie the behavioral phenomena we wish to explain. Leveraging the rich understanding of visual representations offers a unique opportunity to make significant progress to that end.
Influences of physical image properties on image memory during naturalistic encoding
Sharon Gilaie-Dotan1,2; 1Bar Ilan University, 2UCL
We are constantly exposed to multiple visual scenes, and while we freely view them without an intentional effort to memorize or encode them, only some are remembered. Visual memory is assumed to rely on high-level visual perception that shows a level of cue-invariance, and therefore is not assumed to be highly dependent on physical image cues such as size or contrast. However, this is typically investigated when people are instructed to perform a task (e.g. remember or make some judgement about the images), which may modulate processing at multiple levels and thus may not generalize to naturalistic visual behavior. Here I will describe a set of studies where participants (n>200) freely viewed images of different sizes or of different levels of contrast while unaware of any memory-related task that would follow. We reasoned that during naturalistic vision, free of task-related modulations, stronger physical image cues (e.g. bigger or higher contrast images) lead to a higher signal-to-noise ratio from retina to cortex, and that such images would therefore be better remembered. Indeed, we found that physical image cues such as size and contrast influence memory, such that bigger and higher contrast images are better remembered. While multiple factors affect image memory, our results suggest that low- to high-level processes may all contribute to image memory.
How visual memory influences visual exploration
Yoni Pertzov1; 1The Hebrew University of Jerusalem
Due to the inhomogeneity of the photoreceptor distribution on the retina, we move our gaze approximately 3 times a second to gather fine detailed information from the surroundings. I will present a series of studies that examined how this dynamic visual exploration process is affected by visual memories. Participants initially look more at familiar items and avoid them later on. These effects are robust across stimulus type (e.g. faces and other objects) and familiarity type (personally familiar and recently learned). The effects on visual exploration are evident even when participants are explicitly instructed to avoid them. Thus, eye tracking could be used for detection of concealed memories in forensic scenarios.
Percepts and Concepts in Face Recognition
Galit Yovel1; 1Tel Aviv University
Current models of face recognition are primarily concerned with the role of perceptual experience and the nature of the perceptual representation that enables face identification. These models overlook the main goal of the face recognition system, which is to recognize socially relevant faces. We therefore propose a new account of face recognition according to which faces are learned from concepts to percepts. This account highlights the critical contribution of the conceptual and social information that is associated with faces to face recognition. Our recent studies show that conceptual/social information contributes to face recognition in two ways: First, faces that are learned in social context are better recognized than faces that are learned based on their perceptual appearance. These findings indicate the importance of converting faces from a perceptual to a social representation for face recognition. Second, we found that conceptual information significantly accounts for the visual representation of faces in memory, but not in perception. This was the case both for human perceptual and conceptual similarity ratings and for the representations generated by unimodal deep neural networks that represent faces based on visual information alone, and by multimodal networks that represent visual and conceptual information about faces. Taken together, we propose that the representation that is generated for faces by the perceptual and memory systems is determined by social/conceptual factors, rather than our passive perceptual experience with faces per se.
Consistency – a novel account for individual differences in visual cognition
Meike Ramon1; 1University of Fribourg
Visual cognition refers to the processing of retinally available information, and its integration with prior knowledge to generate representations. Traditionally, perception and memory have been considered as isolated, albeit related cognitive processes. Much of vision research has investigated how input characteristics relate to overt behavior, and hence determine observed cognitive proficiency in either perception or memory. Currently, however, comparatively less focus has been devoted to understanding the contribution of observer-related aspects. Studies of acquired expertise have documented systematic changes in brain connectivity and exceptional memory feats through extensive training (Dresler et al., 2017). The mechanisms underlying naturally occurring cognitive superiority, however, are unfortunately much less understood. In this talk I will synthesize findings from studies of neurotypical observers with a specific type of cognitive superiority — that observed for face identity processing. Focussing on a growing group of these so-called “Super-Recognizers” (Russell et al., 2009) identified with the same diagnostic framework (Ramon, 2021), my lab has taken a deep-data approach to provide detailed case descriptions of these unique individuals using a range of paradigms and methods. Intriguingly, their superior abilities cannot be accounted for in terms of enhanced processing of certain stimulus dimensions, such as information content, or stimulus memorability (Nador et al., 2021a,b). Rather, our work convergingly points to behavioral consistency as their common attribute.
Understanding mesoscale visual processing through the lens of high-resolution functional MRI
Wednesday, June 1, 2022, 12:00 – 2:00 pm EDT, Zoom Session
Organizer: Shahin Nasr1,2,3; 1Massachusetts General Hospital, 2Athinoula A. Martinos Center for Biomedical Imaging, 3Harvard Medical School
Presenters: Shahin Nasr, Yulia Lazarova, Luca Vizioli, Roger Tootell
In the past twenty years, with the increase in popularity of high-resolution neuroimaging techniques, we have witnessed a surge in the number of studies focused on understanding the fine-scale functional organization of the visual system. Taking advantage of state-of-the-art technologies, these studies have narrowed the gap in our understanding of mesoscale neuronal processing in the visual cortex of humans vs. animals. In this symposium, four speakers will present their recent findings about the mesoscale functional organization of visual cortex in humans. Using various high-resolution fMRI techniques, they have successfully enhanced their access to the evoked activity within cortical columns across different visual areas. During the first talk, Dr. Shahin Nasr will describe his approach, based on using ultra-high field scanners combined with advanced processing pipelines to avoid spatial blurring, for visualizing the columnar organization of human extrastriate visual areas (Nasr et al., 2016). In his talk, he will also highlight the impact of amblyopia (lazy eye) on the development of cortical columns across these areas. Accessing laminar-specific brain activity is a major step toward differentiating processing streams in the visual system. In the second talk, Dr. Lazarova will present a study in which high-resolution fMRI was used to differentiate “factual vs. counterfactual” feedback streams across the cortical layers of the early visual areas V1 and V2. The findings suggest a mechanism for the coexistence of information streams and integration of feedforward and feedback processing streams (Larkum et al., 2018). Shifting our attention to the neuronal mechanism of cognitive control in higher-level visual areas, in the third talk Dr. Vizioli will present a study in which high-resolution fMRI was used to study the neuronal processing involved in active face perception. Using the sub-millimeter spatial resolution provided by this technique, the investigators compared the laminar profile of task-related activity modulation between V1, occipital face-selective and fusiform face-selective areas. These results provide evidence for the laminar signature of complex cognitive processes in the human visual system. The interaction between the external world and innate subject-specific criteria such as personal space plays an important role in defining the level of evoked brain activity. In the fourth talk, Dr. Roger Tootell will present a study in which high-resolution fMRI was used to reveal the columnar representation of personal and visual spaces in the human parietal cortex. Results of this study suggest that the variation in response, measured within the parietal cortical columns, is related to subjective discomfort levels during intrusion into personal space.
Visualizing cortical columns within the retinotopic visual areas of humans with normal and amblyopic vision
Shahin Nasr1, Bryan Kennedy; 1Massachusetts General Hospital, 2Athinoula A. Martinos Center for Biomedical Imaging, 3Harvard Medical School
In the past two decades, our knowledge of cortical columns in human visual cortex has expanded considerably, thanks to advances in high-resolution functional magnetic resonance imaging (fMRI) techniques and improvements in data processing methods. In the first part of this talk, I will present the recently developed fMRI data processing techniques which have improved our capabilities to visualize cortical columns by reducing the amount of unwanted spatial blurring (Wang et al., 2021). Then, in the second part, I will demonstrate the application of these techniques in revealing the selectivity, spatial distribution, and functional connectivity of cortical columns in human retinotopic visual areas (V1-V4), including those that are involved in color, motion, stereopsis, and shape encoding (Nasr et al., 2016; Tootell and Nasr, 2017; 2020). In the third part of this talk, I will demonstrate the application of these techniques in translational studies. Specifically, I will present striking evidence for the impacts of amblyopia, a disorder caused by the interruption of balanced binocular visual inputs in early stages of life, on the fine-scale organization and the response properties of cortical columns across the retinotopic visual areas. These findings will clarify the neuronal disorders that underlie multiple perceptual impairments in amblyopic individuals including stereoblindness and distorted spatial vision. All in all, the presented findings, most of them still unpublished, will highlight the importance of studying cortical columns in understanding visual perception in humans with normal and impaired vision.
Layer-specific profiles of factual and counterfactual feedback signals in human early visual cortex during navigation – 7T fMRI study
Yulia Lazarova1, Lucy Petro1,2, Angus Paton1,2, Lars Muckli1,2; 1Centre for Cognitive NeuroImaging, School of Psychology and Neuroscience, College of Medical, Veterinary and Life Sciences, University of Glasgow, 2Imaging Centre for Excellence (ICE), College of Medical, Veterinary and Life Sciences, University of Glasgow
We rely on pre-existing models of the world stored from past experiences in order to form internal representations of our present environment. In addition to facilitating perception in the present, these models enable us to engage in prospective thoughts and mental simulations unrelated to the immediate environment, known as counterfactual thoughts. Traces of both streams have been found to share some neuronal mechanisms at the earliest level of cortical processing (Monaco, 2020; Huang, 2021). It is still a challenge to understand the mechanisms that allow the parallel existence of these two streams of thought while at the same time keeping the perception of reality and imagination segregated. We used a VR headset to familiarise participants with a virtual environment prior to scanning. We recorded 7T fMRI while participants were presented with videos simulating navigation through the environment. Directional cues elicited expectations for an upcoming room that the participant was not presently viewing, but about which they could generate prospective thoughts. The lower right quadrant of the video was hidden behind an occluder, blocking feedforward input to the corresponding patch of the visual cortex. We applied multivariate pattern analysis (MVPA) to probe the contents of the activation in the non-stimulated areas of V1 and V2. The results revealed that different top-down inputs target different cortical layers depending on the type of information they carry. Our data suggest that the coexistence of information streams might depend on cortical layering, and on layer-spanning pyramidal neurons that integrate feedforward and feedback processing (Larkum, 2018).
Characterizing top-down related functional microcircuitry of face processing in visual cortex using ultra high field fMRI
Luca Vizioli1,2, Logan Dowdle1,2, Essa Yacoub1; 1Center for Magnetic Resonance Research, University of Minnesota, 2Department of Neurosurgery, University of Minnesota
At ultra-high field it is now possible to acquire functional images with unprecedented spatial precision, spanning the submillimeter range. These measurements allow investigations of some fundamental units of neural computations, such as cortical layers and columns, that had previously only been accessible in animals using invasive electrophysiology. With submillimeter fMRI it is therefore possible, in principle, to study fine scale organization of high-level cognitive processes that are unique to humans. Here we evaluated top-down effects of high-level, socially relevant task demands, such as face perception, across cortical depths in lower and higher-level visual areas using functional images recorded with 0.7mm isotropic voxels. To this end, we instructed participants to perform either a face detection or a stimulus-irrelevant fixation task with identical phase scrambled (ranging from 0 to 40% phase coherence) faces. Using an independent functional localizer, we identified 3 regions of interest (i.e. Fusiform and Occipital face areas and V1) and segmented each region into 3 cortical depths. To evaluate task-related top-down modulations, we calculated the ratio of the activation during the face relative to the fixation task at each cortical depth. Task-related top-down modulations were more pronounced in the inner than the outer layers of V1; and in the outer compared to the inner layers in the FFA (p<0.05). These findings are consistent with feedback exchange between deeper and superficial layers, and with apical dendritic amplification being a key mechanism of conscious perception. This work represents a promising step towards characterizing laminar functional profiles for complex human-specific cognitive processes.
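The depth-wise modulation measure described above (activation during the face task relative to the fixation task, at each cortical depth) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `task_modulation`, the depth labels, and the numbers in the usage note are hypothetical and are not taken from the study.

```python
# Minimal sketch: face/fixation activation ratio per cortical depth.
# DEPTHS and all input values are hypothetical placeholders, not study data.
DEPTHS = ("inner", "middle", "outer")


def task_modulation(face_task, fixation_task):
    """Return the face/fixation activation ratio at each cortical depth.

    Both arguments are sequences with one mean activation value per depth,
    ordered as in DEPTHS. Fixation values must be non-zero.
    """
    if len(face_task) != len(DEPTHS) or len(fixation_task) != len(DEPTHS):
        raise ValueError("expected one value per cortical depth")
    if any(value == 0 for value in fixation_task):
        raise ValueError("fixation activation must be non-zero")
    return {
        depth: face / fixation
        for depth, face, fixation in zip(DEPTHS, face_task, fixation_task)
    }
```

For example, `task_modulation([3.0, 2.0, 1.0], [2.0, 2.0, 2.0])` gives a ratio above 1 at the inner depth and below 1 at the outer depth; a ratio above 1 indicates stronger activation during the face task than during fixation at that depth.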
Columnar Encoding of Personal Space and Visual Space in Human Parietal Cortex
Roger Tootell1,2,3, Zahra Nasiriavanaki1,2, Baktash Babadi1,2, Douglas Greve1,2,3, Daphne Holt2,3,4; 1Department of Radiology, Massachusetts General Hospital, 2Athinoula A. Martinos Center for Biomedical Imaging, 3Harvard Medical School, 4Department of Psychiatry, Massachusetts General Hospital
Personal space (PS) is the distance that people prefer to maintain between themselves and unfamiliar others. Intrusion into the PS evokes discomfort, and an urge to move further apart. Behavioral aspects of PS regulation have been well studied, but the brain mechanisms underlying PS have not. Here we hypothesized that PS processing involves a known body-defensive circuit including inferior parietal cortex. We examined this hypothesis at high spatial resolution, demonstrating two categories of space-sensitive cortical columns in inferior parietal cortex, using 7T fMRI (1.1 mm isotropic). First, personal space was measured in each subject, both outside and inside the scanner. During subsequent scanning, one category of columns responded when faces were presented at virtual distances that were within (but not beyond) each subject’s personal space boundary. In the majority of columns in this category, BOLD response amplitudes increased with increasing face proximity; in remaining columns, the responses decreased. These fMRI response variations appeared related to previously-described variations in subjective discomfort levels, and physiologic arousal, during intrusion into (but not beyond) personal space. The second category of columns responded strongly to either ‘near’ or ‘far’ binocular disparity in visual space, in random dot stereograms. These disparity columns in parietal cortex were functionally similar to disparity columns described previously in occipital cortex. Topographically, the disparity-selective columns were found to be systematically interdigitated with the personal space-sensitive columns. Thus, the transformation of visual to higher-order information may be computed in multiple discrete sites, rather than in a graded fashion, across parietal cortex.
2022 V-VSS Symposia
Understanding mesoscale visual processing through the lens of high-resolution functional MRI
Wednesday, June 1, 2022, 12:00 – 2:00 pm EDT, Zoom Session
Organizer: Shahin Nasr1,2,3; 1Massachusetts General Hospital, 2Athinoula A. Martinos Center for Biomedical Imaging, 3Harvard Medical School
In the past, our understanding of neuronal processing at the mesoscale level (i.e. the spatial scale of cortical columns and/or layers) relied heavily on invasive techniques, mostly not applicable to humans (or behaving animals). However, with recent advances in high-resolution neuroimaging techniques, our knowledge of mesoscale neuronal processing is expanding rapidly. In this symposium, speakers will describe their recent discoveries about mesoscale neuronal processing in the human visual system. Studying different brain areas, from early retinotopic areas (V1-V4) to higher-level regions (e.g. fusiform and parietal cortex), their findings have opened new horizons to understanding the neuronal underpinning of visual perception.
The interplay of visual memory and high-level vision
Wednesday, June 1, 2022, 12:00 – 2:00 pm EDT, Zoom Session
Organizer: Sharon Gilaie-Dotan1,2; 1Bar Ilan University, 2UCL
Every day we come across many visual images, whether on electronic devices, printed matter, or billboards. Some of these are already familiar to us; some are new and are burnt into memory, while others are not. While some studies reveal extraordinary human image memory, the factors that influence which images are remembered are far from being understood. In this symposium, presenters studying diverse populations using a variety of techniques will examine different influences on image memory (such as physical, categorical, conceptual, developmental and individual differences), their underlying mechanisms and the interplay between perception, visual representations, memory, and behavior.
What does the world look like? How do we know?
Friday, May 13, 2022, 5:00 – 7:00 pm EDT, Talk Room 2
Organizers: Mark Lescroart1, Benjamin Balas2, Kamran Binaee1, Michelle Greene3, Paul MacNeilage1; 1University of Nevada, Reno, 2North Dakota State University, 3Bates College
Presenters: Mark Lescroart, Caitlin M. Fausey, Martin N. Hebart, Michelle R. Greene, Jeremy Wilmer, Wilma Bainbridge
A central tenet of vision science is that perception is shaped by visual experience. The statistical regularities of our visual input are reflected in patterns of brain activity, enabling efficient behavior. A growing body of work has sought to understand the natural statistical regularities in human visual experience, and to increase the ecological validity of vision science research by using naturalistic stimuli in experiments. However, the stimuli available for experiments and the conclusions that can be drawn about natural image statistics–especially higher-order statistics, such as the co-occurrence rates of specific object categories in different scenes–are constrained by the limits of extant datasets. Datasets may be limited by sampling choices, by practical constraints related to the robustness of hardware and software used to collect data, by environmental factors like movement, or by the characteristics of different observers who vary in their behavioral repertoire as a function of development or experience. Consequently many visual datasets that aspire to generality are nonetheless sampled from convenience and thus limited in size or scope. Many datasets are also reliant on the use of proxies for visual experiences, such as photos or movies sampled from the internet. The potential consequences of this gap between what we hope to do and what we can do may be substantial. This workshop will examine the issues involved in sampling from representative visual experience, and will specifically address the lacunae of visual experience — what are we not sampling and why? How might the blind spots in our sampling lead to blind spots in our inferences about human vision? The symposium will begin with a brief 5-minute introduction, followed by six 15-minute talks with 2 minutes each for clarifying questions. We will finish with a 10-15 minute discussion of overarching issues in sampling. 
We will feature work that grapples with sampling issues across several areas of vision sciences. Mark Lescroart will talk about practical limits on the activities, locations, and people that can be sampled with mobile eye tracking. Caitlin Fausey will then talk about sampling visual experience in infants. Martin Hebart will talk about the THINGS initiative, which aims to comprehensively sample the appearance of object categories in the world. Michelle Greene will talk about the causes and consequences of biases in extant datasets. Jeremy Wilmer will talk about sampling participants–specifically about sampling across vs within racial groups. Finally, Wilma Bainbridge will talk about richly sampling memory representations. Perfectly representative sampling of visual experience may be an unreachable goal. However, we argue that a focus on the limits of current sampling protocols–of objects, of participants, of dynamic visual experience at different stages of development, and of mental states–will advance the field, and in the long run improve the ecological validity of vision science.
Methodological limits on sampling visual experience with mobile eye tracking
Mark Lescroart1, Kamran Binaee1, Bharath Shankar1, Christian Sinnott1, Jennifer A. Hart2, Arnab Biswas1, Ilya Nudnou3, Benjamin Balas3, Michelle R. Greene2, Paul MacNeilage1; 1University of Nevada, Reno, 2Bates College, 3North Dakota State University
Humans explore the world with their eyes, so an ideal sampling of human visual experience requires accurate gaze estimates while participants perform a wide range of activities in diverse locations. In principle, mobile eye tracking can provide this information, but in practice, many technical barriers and human factors constrain the activities, locations, and participants that can be sampled accurately. In this talk we present our progress in addressing these barriers to build the Visual Experience Database. First, we describe how the hardware design of our mobile eye tracking system balances participant comfort and data quality. Ergonomics matter, because uncomfortable equipment affects behavior and reduces the reasonable duration of recordings. Second, we describe the challenges of sampling outdoors. Bright sunlight causes squinting, casts shadows, and reduces eye video contrast, all of which reduce estimated gaze accuracy and precision. We will show how appropriate image processing at acquisition improves eye video contrast, and how DNN-based pupil detection can improve estimated pupil position. Finally, we will show how physical shift of the equipment on the head affects estimated gaze quality. We quantify the reduction in gaze precision and accuracy over time due to slippage, in terms of drift of the eye in the image frame and instantaneous jitter of the camera with respect to the eye. Addressing these limitations takes us some way towards achieving a representative sample of visual experience, but recording over long durations, during highly dynamic activities, and in extreme lighting conditions remains challenging.
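Gaze accuracy and precision, as referred to above, are often operationalized as the mean offset from a known target and the root-mean-square of sample-to-sample differences. The sketch below follows that common convention; it is an assumption that the authors use these exact definitions, and the function names and the flat two-dimensional treatment of gaze angles (in degrees) are illustrative simplifications.

```python
import math


def gaze_accuracy(samples, target):
    """Mean Euclidean offset between gaze samples and a known target.

    samples: list of (x, y) gaze estimates in degrees.
    target:  (x, y) position of the fixation target in degrees.
    Lower values mean better accuracy.
    """
    if not samples:
        raise ValueError("need at least one gaze sample")
    return sum(math.dist(sample, target) for sample in samples) / len(samples)


def gaze_precision_rms(samples):
    """Root-mean-square of distances between successive gaze samples.

    Lower values mean less sample-to-sample jitter (better precision).
    """
    if len(samples) < 2:
        raise ValueError("need at least two gaze samples")
    steps = [math.dist(a, b) for a, b in zip(samples, samples[1:])]
    return math.sqrt(sum(step * step for step in steps) / len(steps))
```

Under these definitions, slow drift of the equipment mainly raises the accuracy error, because every sample is offset from the target, while instantaneous jitter mainly raises the RMS precision measure.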
Sampling Everyday Infancy: Lessons and Questions
Caitlin M. Fausey1; 1University of Oregon
Everyday sights, sounds, and actions are the experiences available to shape experience-dependent change. Recent efforts to quantify this everyday input – using wearable sensors in order to capture experiences that are neither scripted by theorists nor perturbed by the presence of an outsider recording – have revealed striking heterogeneity. There is no meaningfully “representative” hour of a day, instance of a category, interaction context, or infant. Such heterogeneity raises questions about how to optimize everyday sampling schemes in order to yield data that advance theories of experience-dependent change. Here, I review lessons from recent research sampling infants’ everyday language, music, action, and vision at multiple timescales, with specific attention to needed next steps. I suggest that most extant evidence about everyday ecologies arises from Opportunistic Sampling and that we must collectively focus our ambitions on a next wave of Distributionally Informed Sampling. In particular, we must center (1) activity distributions with their correlated opportunities to encounter particular inputs, (2) content distributions of commonly and rarely encountered instances, (3) temporal distributions of input that comes and goes, and (4) input trajectories that change over developmental time, as we model everyday experiences and their consequences. Throughout, I highlight practical constraints (e.g., sensor battery life and fussy infants) and payoffs (e.g., annotation protocols that yield multi-timescale dividends) in these efforts. Grappling with the fact that people do not re-live the same hour all life long is a necessary and exciting next step as we build theories of everyday experience-dependent change.
The THINGS initiative: a global initiative of researchers for representative sampling of objects in brains, behavior, and computational models
Martin N. Hebart1; 1Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany
As scientists, we carry out experiments to contribute to the knowledge of the world. Yet we have to make choices in our experimental design that abstract away from the real world, which can lead to selection bias, limiting our ability to translate our research findings into generalizable conclusions. For studies involving the presentation of objects, central choices are which objects are shown and – in case of visual stimuli – in what format object images should be presented (e.g. abstracted, cropped from background, or natural photographs). In this talk, I will discuss the THINGS initiative, which is a large-scale global initiative of researchers collecting behavioral and brain imaging datasets using the THINGS object concept and image dataset. I will highlight the motivation underlying the development of THINGS, the advantages and limitations in the object and image sampling strategy, and new insights enabled by this strategy about the behavioral and neural representation of objects. Further, I will discuss strategies that allow researchers to draw more generalizable conclusions from their small-scale laboratory experiments using THINGS images. Moving beyond THINGS, I will discuss ideas for future sampling approaches that may further narrow the gap between stimulus sampling and neural representations.
What we don’t see in image databases
Michelle R. Greene1, Jennifer A. Hart1, Amina Mohamed1; 1Bates College
The rise of large-scale image databases has accelerated productivity in both human and machine vision communities. Most extant databases were created in three phases: (1) Obtaining a comprehensive list of categories to sample; (2) Scraping images from the web; (3) Verifying category labels through crowdsourcing. Subtle biases can arise in each stage: offensive labels can get reified as categories; images represent what is typical of the internet, rather than what is typical of daily experience; and verification is dependent on the knowledge and cultural competence of the annotators that provide “ground truth” labels. Here, we describe two studies that examine the bias in extant visual databases and the deep neural networks trained on them. Sixty-six observers took part in an experience sampling experiment via text message. Each received 10 messages per day at random intervals for 30 days, and sent a picture of their surroundings if possible (N=6280 images). Category predictions were obtained from deep convolutional neural networks (dCNNs) pretrained on the Places database. The dCNNs showed poor classification performance for these images. A second study investigated cultural biases. We scraped images of private homes from Airbnb from 219 countries. Pre-trained deep neural networks were less accurate and less confident in recognizing images from the Global South. We observed significant correlations between dCNN confidence and GDP per capita (r=0.30) and literacy rate (r=0.29). These studies show a dissociation between lived visual content and web-based content, and suggest caution when using the internet as a proxy for visual experience.
Multiracial Reading the Mind in the Eyes Test (MRMET): validation of a stimulus-diverse and norm-referenced version of a classic measure
Jeremy Wilmer1, Heesu Kim1, Jasmine Kaduthodil1, Laura Germine1, Sarah Cohan1, Brian Spitzer1, Roger Strong1; 1Wellesley College
Do racially homogeneous stimuli facilitate scientific control, and thus validity of measurement? Here, as a case in point, we ask whether a multiracial cognitive assessment utilizing a diverse set of stimuli maintains psychometric qualities that are as good as, if not better than, an existing Eurocentric measure. The existing measure is the Reading the Mind in the Eyes Test (RMET) (Baron-Cohen et al., 2001), a clinically significant neuropsychiatric paradigm that has been used to assess face expression reading, theory of mind, and social cognition. The original measure, however, lacked racially inclusive stimuli, among other limitations. In an effort to rectify this and other limitations of the original RMET, we have created and digitally administered a Multiracial version of the RMET (MRMET) that is reliable, validated, stimulus-diverse, norm-referenced, and free for research use. We show, with a series of sizable datasets (Ns ranging from 1,000 to 12,000), that the MRMET is on par with or better than the RMET across a variety of psychometric indices. Moreover, the reliable signal captured by the two tests is statistically indistinguishable, evidence for full interchangeability. Given the diversity of the populations that neuropsychology aims to survey, we introduce the Multiracial RMET as a high-quality, inclusive alternative to the RMET that is conducive to unsupervised digital administration across a diverse array of populations. With the MRMET as a key example, we suggest that multiracial cognitive assessments utilizing diverse stimuli can be as good as, if not better than, Eurocentric measures.
An emerging landscape for the study of naturalistic visual memory
Wilma Bainbridge1; 1University of Chicago
Our memories are often rich and visually vivid, sometimes even resembling their original percepts when called to mind. Yet, until recently, our methods for quantifying the visual content in memories have been unable to capture this wealth of detail, relying on simple, static stimuli, and testing memory with low-information visual recognition or verbal recall tasks. Because of this, we have been largely unable to answer fundamental questions such as what aspects of a visual event drive memory, or how the neural representations of perceived and recalled visual content compare. However, in recent years, new methods in quantifying visual memories have emerged, following the growth of naturalistic vision research more broadly. Instead of verbal recall, drawings can directly depict the visual content in memory, at a level of detail allowing us to simultaneously explore questions about object memory, spatial memory, visual-semantic interactions, and false memories. Social media is presenting new memory stimulus sets on the order of hundreds or thousands, allowing us to examine neural representations for diverse memories across years. And, the internet has also allowed us to identify surprising new phenomena in memory—such as the existence of shared visual false memories learned across people (the “Visual Mandela Effect”), or the existence of a population of individuals who lack visual recall in spite of intact perception (“aphantasia”). In this talk, I will present exciting new directions in the naturalistic study of visual memory and provide resources for those interested in pursuing their own studies of naturalistic memory.
The probabilistic nature of vision: How should we evaluate the empirical evidence?
Friday, May 13, 2022, 5:00 – 7:00 pm EDT, Talk Room 1
Organizers: Ömer Dağlar Tanrıkulu1, Arni Kristjansson2; 1Williams College, 2University of Iceland
Presenters: Ömer Dağlar Tanrıkulu, Dobromir Rahnev, Andrey Chetverikov, Robbe Goris, Uta Noppeney, Cristina Savin
The presence of image noise and the absence of one-to-one inverse mapping from images back to scene properties has led to the idea that visual perception is inherently probabilistic. Our visual system is considered to deal with this uncertainty by representing sensory information in a probabilistic fashion. Despite the prevalence of this view in vision science, providing empirical evidence for such probabilistic representations in the visual system can be very challenging. First, probabilistic perception is difficult to operationalize, and has therefore been interpreted differently by various researchers. Second, experimental results can typically be accounted for, in principle, by both probabilistic and non-probabilistic representational schemes. Our goal in this symposium is to evaluate the empirical evidence in favor of (or against) the probabilistic description of visual processing by discussing the potential advantages (and disadvantages) of different methodologies used within vision science to address this question. This symposium will bring together speakers from diverse perspectives, which include computational modeling, neuroscience, psychophysics and philosophy. Our speakers include promising junior researchers, as well as established scientists. In the first talk, Ömer Dağlar Tanrıkulu will provide an introduction with a summary of the main challenges in providing evidence for probabilistic visual representations, as well as his proposal to sidestep these obstacles. Next, Dobromir Rahnev will focus on the difficulties in operationalizing the term “probabilistic perception” and suggest a tractable research direction with illustration of studies from his lab. In the third talk, Andrey Chetverikov will explain and illustrate empirical methodologies in distinguishing between representation of probabilities and probabilistic representations in vision.
In the fourth talk, Robbe Goris will present a recently developed methodology to discuss the implications of observers’ estimates of their own visual uncertainty. In the fifth talk, Uta Noppeney will approach the issue from a multisensory perspective and discuss the success of Bayesian Causal Inference models in explaining how our brain integrates visual and auditory information to create a representation of the world. Finally, Cristina Savin will consider probabilistic representations at a mechanistic level and present a novel neural network model implementing Bayes-optimal decisions to account for certain sequential effects in perceptual judgments. Each 15-min talk will be followed by 5-min Q&A and discussion. The speaker line-up highlights the multidisciplinary nature of this symposium which reflects that our target audience is composed of researchers from all areas of vision science. We are confident that researchers at all career stages, as well as the broad audience of VSS, will benefit from this symposium. Students and early-career researchers will have a better understanding of the evidence for, or against, probabilistic visual perception, which will equip them with a perspective to evaluate other research that they will encounter at VSS. More importantly, such discussion will help both junior and senior scientists to draw their implicit assumptions about this important topic to the surface. This, in turn, will allow the general vision community to determine research directions that are more likely to increase our understanding of the probabilistic nature of visual processing.
How can we provide stronger empirical evidence for probabilistic representations in visual processing?
Ömer Dağlar Tanrıkulu1; 1Cognitive Science Program, Williams College, MA, USA
Probabilistic approaches to cognition have had great empirical success, especially in building computational models of perceptual processes. This success has led researchers to propose that the visual system represents sensory information probabilistically, which resulted in high-profile studies exploring the role of probabilistic representations in visual perception. Yet, there is still substantial disagreement over the conclusions that can be drawn from this work. In the first part of this talk, I will outline the critical views on the probabilistic nature of visual perception. Some critics underline the inability of experimental methodologies to distinguish between perceptual processes and perceptual decisions, while others point to the successful utilization of non-probabilistic representational schemes in explaining these experimental results. In the second part of the talk, I will propose two criteria that must be satisfied to provide empirical evidence for probabilistic visual representations. The first criterion requires experiments to demonstrate that representations involving probability distributions are actually generated by the visual system, rather than being imposed on the task by the experimenter. The second criterion requires the utilization of structural correspondence (as opposed to correlation) between the internal states of the visual system and stimulus uncertainty. Finally, I will illustrate how these two criteria can be met through a psychophysical methodology using priming effects in visual search tasks.
The mystery of what probabilistic perception means and why we should focus on the complexity of the internal representations instead
Dobromir Rahnev1; 1School of Psychology, Georgia Institute of Technology, Atlanta, GA
Two years ago, I joined an adversarial collaboration on whether perception is probabilistic. The idea was to quickly agree on a precise definition of the term “probabilistic perception” and then focus on designing experiments that can reveal if it exists. Two years later, we are still debating the definition of the term, and I now believe that it cannot be defined. Why the pessimism? At the heart of probabilistic perception is the idea that the brain represents information as probability distributions. Probability distributions, however, are mathematical objects derived from set theory that do not easily apply to the brain. In practice, probabilistic perception is typically equated with “having a representation of uncertainty.” This phrase ultimately seems to mean “having a representation of any information beyond a point estimate.” Defined this way, the claim that perception is probabilistic borders on the trivial, and the connection to the notion of probability distributions appears remote. I no longer think that there is a way forward. Indeed, in empirical work, the term probabilistic perception seems to serve as a litmus test of how researchers feel about Bayesian theories of the brain rather than a precise hypothesis about the brain itself. What then? I argue that the question that is both well-posed and empirically tractable is “How complex is the perceptual representation?” I will briefly review what we know about this question and present recent work from my lab suggesting that perceptual representations available for decision-making are simple and impoverished.
Representations of probabilities and probabilistic representations
Andrey Chetverikov1, Arni Kristjansson2; 1Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, The Netherlands, 2Icelandic Vision Lab, School of Health Sciences, University of Iceland, Reykjavík, Iceland.
Both the proponents and the opponents of probabilistic perception draw a distinction between representations of probabilities (e.g., the object I see is more likely to have orange hues than green) and probabilistic representations (this object is probably an orange and not an apple). The former corresponds to the probability distribution of sensory observations given the stimulus, while the latter corresponds to the opposite, the probabilities of potential stimuli given the observations. This dichotomy is important as even plants can respond to probabilistic inputs presumably without making any inferences about the stimulus. It is also important for the computational models of perception as the Bayesian observer aims to infer the stimulus, not the observations. It is then essential to evaluate the empirical evidence for probabilistic representations and not the representation of probabilities to answer the question posed by this symposium. However, is it possible to empirically distinguish between the two? We will discuss this question using the data from our recent work on probabilistic perception as an illustration.
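The distinction drawn in this abstract, between the distribution of observations given the stimulus and the distribution of stimuli given the observations, can be illustrated with Bayes' rule on a toy example. The following is a minimal sketch only: the two stimuli, the two hue observations, and every probability value are hypothetical numbers chosen for illustration and are not taken from the authors' work.

```python
# Toy illustration (all numbers are assumptions, not data from the talk).
# likelihood[s][o] = p(observation o | stimulus s): "representation of probabilities"
likelihood = {
    "orange": {"orange_hue": 0.9, "green_hue": 0.1},
    "apple": {"orange_hue": 0.3, "green_hue": 0.7},
}

# Hypothetical prior over stimuli, p(stimulus).
prior = {"orange": 0.5, "apple": 0.5}


def posterior(likelihood, prior, observation):
    """Return p(stimulus | observation) via Bayes' rule.

    This is the "probabilistic representation" in the abstract's terms:
    a distribution over potential stimuli given what was observed.
    """
    unnormalized = {s: likelihood[s][observation] * prior[s] for s in prior}
    total = sum(unnormalized.values())
    return {s: value / total for s, value in unnormalized.items()}


post = posterior(likelihood, prior, "orange_hue")
print(post)
```

The same likelihood table supports both readings; only the posterior requires an inference about the stimulus, which is the step the abstract argues must be demonstrated empirically.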
Quantifying perceptual introspection
Robbe Goris1; 1Center for Perceptual Systems, University of Texas at Austin, Austin, TX, USA
Perception is fallible, and humans are aware of this. When we experience a high degree of confidence in a perceptual decision, it is more likely to be correct. I will argue that our sense of confidence arises from a computation that requires direct knowledge of the uncertainty of perception, and that it is possible to quantify the quality of this knowledge. I will introduce a new method to assess the reliability of a subject’s estimate of their perceptual uncertainty (i.e., uncertainty about uncertainty, which I term “meta-uncertainty”). Application of this method to a large set of previously published confidence studies reveals that a subject’s level of meta-uncertainty is stable over time and across at least some domains. Meta-uncertainty can be manipulated experimentally: it is higher in tasks that involve more levels of stimulus reliability across trials or more volatile stimuli within trials. Meta-uncertainty appears to be largely independent of task difficulty, task structure, response bias, and attentional state. Together, these results suggest that humans intuitively understand the probabilistic nature of perception and automatically evaluate the reliability of perceptual impressions.
Constructing a representation of the world across the senses
Uta Noppeney1; 1Donders Institute for Brain, Cognition and Behavior, Radboud University, Nijmegen, The Netherlands
Our senses are constantly bombarded with myriads of diverse signals. Transforming this sensory cacophony into a coherent percept of our environment relies on solving two computational challenges: First, we need to solve the causal inference problem – deciding whether signals come from a common cause and thus should be integrated, or come from different sources and be treated independently. Second, when there is a common cause, we should integrate signals across the senses weighted in proportion to their sensory precisions. I discuss recent research at the behavioural, computational and neural systems level investigating how the brain combines sensory signals in the face of uncertainty about the world’s causal structure. Our results show that the brain constructs a multisensory representation of the world approximately in line with Bayesian Causal Inference.
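The second computational step described above, integrating signals weighted in proportion to their sensory precisions, can be sketched as follows. This is a minimal sketch assuming independent Gaussian noise on each cue and a known common cause; it does not implement the causal inference step, and the function name `fuse_cues` and the example values are hypothetical.

```python
def fuse_cues(x_visual, sigma_visual, x_auditory, sigma_auditory):
    """Precision-weighted fusion of two cues (assumes a common cause).

    Each cue is weighted by its precision (1 / variance), under the
    assumption of independent Gaussian noise. Returns the fused estimate
    and its variance.
    """
    precision_v = 1.0 / sigma_visual ** 2
    precision_a = 1.0 / sigma_auditory ** 2
    weight_v = precision_v / (precision_v + precision_a)
    estimate = weight_v * x_visual + (1.0 - weight_v) * x_auditory
    variance = 1.0 / (precision_v + precision_a)
    return estimate, variance


# Hypothetical example: a reliable visual cue at 0 and a noisier auditory cue at 10.
estimate, variance = fuse_cues(0.0, 1.0, 10.0, 2.0)
print(estimate, variance)
```

Under these assumptions the fused estimate lies closer to the more reliable cue, and its variance is lower than that of either cue alone.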
Sampling-based decision making
Cristina Savin1; 1Center for Neural Science, Center for Data Science, New York University, New York, NY
There is substantial debate about the neural correlates of probabilistic computation (as evidenced in a Computational Cognitive Neuroscience – GAC 2020 workshop). Among competing theories, neural sampling provides a compact account of how variability in neuron responses can be used to flexibly represent probability distributions, which accounts for a range of V1 response properties. As samples encode uncertainty implicitly, distributed across time and neurons, it remains unclear how such representations can be used for decision making. Here we present a simple model for how a spiking neural network can integrate posterior samples to support Bayes-optimal decision making. We use this model to study behavioral and neural consequences of sampling-based decision making. As the integration of posterior samples in the decision circuit is continuous in time, it leads to systematic biases after abrupt changes in the stimulus. This is reflected in behavioral biases towards recent history, similar to documented sequential effects in human decision making, and stimulus-specific neural transients. Overall, our work provides a first mechanistic model for decision making using sampling-based codes. It is also a stepping stone towards unifying sampling and parametric perspectives of Bayesian inference.
Perceptual Organization – Lessons from Neurophysiology, Human Behavior, and Computational Modeling
Friday, May 13, 2022, 2:30 – 4:30 pm EDT, Talk Room 2
Organizers: Dirk B. Walther1, James Elder2; 1University of Toronto, 2York University
Presenters: James Elder, Thomas Serre, Anitha Pasupathy, Mary A. Peterson, Pieter Roelfsema, Dirk B. Walther
A principal challenge for both biological and machine vision systems is to integrate and organize the diversity of cues received from the environment into the coherent global representations we experience and require to make good decisions and take effective actions. Early psychological investigations date back more than 100 years to the seminal work of the Gestalt school. But in the last 50 years, neuroscientific and computational approaches to understanding perceptual organization have become equally important, and a full understanding requires integration of all three approaches. We understand perceptual organization as the process of establishing meaningful relational structures over raw visual data, where the extracted relations correspond to the physical structure and semantics of the scene. The relational structure may be simple, e.g., set membership for image segmentation, or more complex, for example, sequence representations of contours, hierarchical representations of surfaces, layered representations of scenes, etc. These representations support higher-level visual tasks such as object detection, object recognition, activity recognition and 3D scene understanding. This symposium will review the current state of perceptual organization research as well as open questions from a neuroscientific, psychophysical, and computational approach and highlight outstanding issues. Current feedforward computational models for object perception fail to account for the holistic nature of human object perception. A computational analysis of perceptual grouping problems leads to an alternative account that refines feedforward representations of local features with recurrent computations implementing global optimization objectives (James Elder). These principles can be seen in the recurrent computations leading to the formation of extra-classical receptive fields in early visual cortex. 
New neural network models of these recurrent circuits lead to emergent grouping principles of proximity and good continuation and demonstrate how recurrence leads to better contour detection and a more accurate account of human contour processing (Thomas Serre). These early contour representations are further integrated in mid-level stages of the ventral visual pathway to form object representations. A key challenge for perceptual organization is to accurately encode object shape despite occlusion and clutter. Behavioural and physiological results reveal that the visual system relies upon a competitive recurrent grouping-by-similarity computation to protect object encoding from the effects of crowding (Anitha Pasupathy). This kind of competitive computation also appears to be at the heart of figure/ground assignment, where convexity serves as a figural prior (Mary Peterson). While simple grouping operations may be achieved through a feedforward process, it will be argued that these more complex grouping operations are invoked through an incremental, attentive process that manifests as a more gradual spread of activation across visual cortex (Pieter Roelfsema). To close, we show that local parallelism of contours leads to improved scene categorization as well as clearer representations of natural scenes in the human visual cortex (Dirk B. Walther). Through these closely related talks, the symposium will illustrate how integration of physiological, psychophysical and computational research has led to a better understanding of perceptual organization, and will highlight key open research questions and suggest directions for integrative research that will answer these questions.
The role of local and holistic processes in the perceptual organization of object shape
James Elder1; 1York University
Perceptual grouping is the problem of determining what features go together and in what configuration. Since this is a computationally hard problem, it is important to ask whether object perception really depends on perceptual grouping. For example, under ideal conditions, a collection of local features may be sufficient to classify an object. These features could be computed via a feedforward process, obviating the need for perceptual grouping. Indeed, this fast feedforward ‘bag of features’ conception of object processing is prevalent in both human and computer vision research. Here I will review psychophysical and computational research that challenges the ability of this class of model to explain object perception. Psychophysical assessment shows that humans are largely unable to pool local shape features to make object judgements unless these features are configured holistically. Further, the formation of these perceptual groups is itself found to rely on holistic shape representations, pointing to a recurrent circuit that conditions local grouping computations on this holistic encoding. While feedforward deep learning models for object classification are more powerful than earlier bag-of-feature models, we find that these models also fail to capture human sensitivity to holistic shape and perceptual robustness to occlusion. This leads to the hypothesis that a computational model designed to solve perceptual grouping tasks as well as object classification will form a better account of human object perception, and I will highlight how optimal solutions to these grouping tasks are typically based on a fusion of feedforward local computations with holistic optimization and feedback.
Recurrent neural circuits for perceptual grouping
Thomas Serre1; 1Brown University
Neurons in the visual cortex are sensitive to context: Responses to stimuli presented within their classical receptive fields (CRFs) are modulated by stimuli in their surrounding extra-classical receptive fields (eCRFs). However, the circuits underlying these contextual effects are not well understood, and little is known about how these circuits drive perception during everyday vision. We tackle these questions by approximating circuit-level eCRF models with a differentiable discrete-time recurrent neural network that is trainable with gradient-descent. After optimizing model synaptic connectivity and dynamics for object contour detection in natural images, the neural-circuit model rivals human observers on the task with far better sample efficiency than state-of-the-art computer vision approaches. Notably, the model also exhibits CRF and eCRF phenomena typically associated with primate vision. The model’s ability to accurately detect object contours also critically depends on these effects, and these contextual effects are not found in ablated versions of the model. Finally, we derive testable predictions about the neural mechanisms responsible for contextual integration and illustrate their importance for accurate and efficient perceptual grouping.
Encoding occluded and crowded scenes in the monkey brain: object saliency trumps pooling
Anitha Pasupathy1; 1University of Washington
I will present results from a series of experiments investigating how simple scenes with crowding and partial occlusion are encoded in midlevel stages of the ventral visual pathway in the macaque monkey. Past studies have demonstrated that neurons in area V4 encode the shape of isolated visual stimuli. When these stimuli are surrounded by distractors that crowd and occlude, shape selectivity of V4 neurons degrades, consistent with the decline in the animal’s ability to discriminate target object shapes. To rigorously test whether this is due to the encoding of “pooled” summary statistics of the image within the RF, we characterized responses and selectivity for a variety of target-distractor relationships. We find that the pooling model is a reasonable approximation for neuronal responses when targets and distractors are either all similar or all different. But when the distractors are all similar and can be perceptually grouped, the target becomes salient by contrast. This saliency is reflected in the neuronal responses and animal behavior being more resistant to crowding and occlusion. Thus, target saliency in terms of featural contrasts trumps pooled encoding. These results are consistent with a normalization model where target saliency titrates the relative influence of different stimuli in the normalization pool.
Inhibitory Competition in Figure assignment: Insights from brain and behavior
Mary A. Peterson1; 1University of Arizona
Behavioral and neural evidence indicates that the organization of the visual field into figures (i.e., objects) and their local grounds is not a simple, early stage of processing, as traditional theories supposed. Instead, figure/object detection entails competition between different interpretations that might be seen. In the first part of my talk, I will discuss behavioral evidence that multiple interpretations compete in the classic demonstration that convexity is a figural prior. In the second part, I will present neural evidence of suppression in the BOLD response to the ground side of objects when a portion of a familiar configuration was suggested there but lost the competition for perception. These results begin to elucidate the complex interactions between local and global, high- and low-level factors involved in perceptually organizing the visual field into objects and backgrounds.
The neuronal mechanisms for object-based attention and how they solve the binding problem
Pieter Roelfsema1,2,3; 1Netherlands Institute for Neuroscience, 2Vrije Universiteit Amsterdam, 3Academic University Medical Center, Amsterdam
Our visual system groups image elements of objects and segregates them from other objects and the background. I will discuss the neuronal mechanisms for these grouping operations, proposing that there are two processes for perceptual grouping. The first is ‘base grouping’, which is a process that relies on neurons tuned to feature conjunctions and occurs in parallel across the visual scene. If there are no neurons tuned to the required feature conjunctions, a second process, called ‘incremental grouping’, comes into play. Incremental grouping is a time-consuming and capacity-limited process, which relies on the gradual spread of enhanced neuronal activity across the distributed representation of an object in the visual cortex, during a delayed phase of the neuronal responses. Incremental grouping can occur for only one object at any one time. The spread of enhanced activity corresponds to the spread of object-based attention at the psychological level of description. Hence, we found that the binding problem is solved by labelling the representation of image elements in the visual cortex with enhanced activity and we did not obtain any evidence for a role of neuronal synchronization. Inhibition of the late-phase activity in primary visual cortex completely blocked figure-ground perception, demonstrating a causal link between enhanced neuronal activity and perceptual organization. These neuronal mechanisms for perceptual grouping account for many of the perceptual demonstrations by the Gestalt psychologists.
Neural correlates of local parallelism during naturalistic vision
Dirk B. Walther1; 1University of Toronto
Human observers can rapidly perceive complex real-world scenes. Grouping visual elements into meaningful units is an integral part of this process. Yet, so far, the neural underpinnings of perceptual grouping have only been studied with simple lab stimuli. We here uncover the neural mechanisms of one important perceptual grouping cue, local parallelism. Using a new, image-computable algorithm for detecting local symmetry in line drawings and photographs, we manipulated the local parallelism content of real-world scenes. We decoded scene categories from patterns of brain activity obtained via functional magnetic resonance imaging (fMRI) in 38 human observers while they viewed the manipulated scenes. Decoding was significantly more accurate for scenes containing strong local parallelism compared to weak local parallelism in the parahippocampal place area (PPA), indicating a central role of parallelism in scene perception. To investigate the origin of the parallelism signal we performed a model-based fMRI analysis of the public BOLD5000 dataset, looking for voxels whose activation time course matches that of the locally parallel content of the 4916 photographs viewed by the participants in the experiment. We found a strong relationship with average local symmetry in visual areas V1-4, PPA, and retrosplenial cortex (RSC). Notably, the parallelism-related signal peaked first in V4, suggesting V4 as the site for extracting parallelism from the visual input. We conclude that local parallelism is a perceptual grouping cue that influences neuronal activity throughout the visual hierarchy, presumably starting at V4. Parallelism plays a key role in the representation of scene categories in PPA.
How we make saccades: selection, control, integration
Friday, May 13, 2022, 2:30 – 4:30 pm EDT, Talk Room 1
Organizers: Emma Stewart1, Bianca R. Baltaretu1; 1Justus-Liebig University Giessen, Germany
Presenters: Jacqueline Gottlieb, Michele Basso, J. Patrick Mayo, J. Douglas Crawford, Alexander C. Schütz
Everyday behaviour is facilitated by our ability to correctly locate and fixate pertinent objects and locations within our surroundings. This is accomplished through the 2-3 saccades per second that are made to gather visual information and guide actions. With each saccade, however, a complex series of processes occurs: a saccade target must be selected, motor commands must ensure accurate saccade execution, and the perceptual consequences of making a saccade need to be accounted for. Over the past few decades, a wealth of research has given us insight into the related, intricate neural and behavioural mechanisms underlying saccade production. However, recent research has uncovered more nuanced roles for key established neural regions associated with the selection, control, and integration of saccadic eye movements. These regions extend from subcortical superior colliculus (SC) to the frontal cortex (i.e., frontal eye field, FEF) and posterior parietal cortex (Sommer & Wurtz, 1998, 2004; Medendorp et al., 2003). New evidence has also been uncovered about the goals and strategies of saccade target selection, and about how saccades actively shape and change our predictions about, and perception of, the world. While the underlying circuitry may have been identified, there is a significant gap in our knowledge about the complex interactions between the discrete neural components of a saccade, the goals that drive saccades, and the perception of the world that precedes and follows these events. In this symposium, leading researchers will unveil a more sophisticated perspective on the sequence of processes that occur before, during, and after a saccade, in humans and non-human primates, with a focus on three key areas. 1) Selection: What complex neural and behavioral processes underlie target selection? What new evidence is there for where perceptual decisions that drive saccades occur in the brain? 
Jacqueline Gottlieb will outline the link between target selection and uncertainty reduction in belief states, linking theories of information sampling with neurophysiological evidence. Furthermore, Michele Basso will highlight a new role for the superior colliculus in perceptual decision-making, reforming our understanding of the function of this subcortical structure. 2) Control: Once a target is selected, how does the visuomotor system exert its control over eye movements? J. Patrick Mayo will discuss new neurophysiological findings on the crucial role that FEF neurons play in online oculomotor control and decisions. 3) Integration: How does the visual system reconcile behaviourally and cortically pre- and postsaccadic information to perceive a seamless world across saccades? Doug Crawford will reveal the nature and identity of the cortical mechanism(s) that underlie object feature integration across saccades, and for action (i.e., grasping). Finally, Alexander Schütz will discuss recent behavioural and computational insights into how humans reconcile the perceptual differences in peripheral and foveal input across saccades, which will outline how we ultimately perceive the world across saccades. By bringing together behavioural, neurophysiological, neuroimaging, and computational findings, this symposium will present groundbreaking new advances that will establish a contemporary understanding of how saccades are made.
Saccadic control for reducing uncertainty
Jacqueline Gottlieb1; 1Columbia University
Saccades gather visual information. Although few scientists would question this statement, the neural mechanisms of saccade target selection are typically described in terms of reward with no reference to information. I will describe evidence from my laboratory that attentional sampling is sensitive to expected information gains (EIG). Saccade selective neurons in the parietal cortex are modulated by the two quantities that determine EIG – uncertainty and predictive validity – independently of rewards. Moreover, the effects of uncertainty before the saccade modulate the efficiency with which monkeys use the information after the saccade. The findings suggest that saccade target selection is closely coordinated with our belief states and is geared toward reducing the future uncertainty of those states.
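As an illustration of how uncertainty and predictive validity jointly determine EIG, the following is a minimal sketch for the simplest possible case: a binary belief state and a binary cue. The function names, the binary setup, and the use of Shannon entropy as the uncertainty measure are assumptions made for this example; the abstract does not specify the model used in the laboratory's work.

```python
import math

def entropy(p):
    """Shannon entropy (bits) of a binary belief with P(state A) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def expected_information_gain(prior, validity):
    """Expected entropy reduction from observing a binary cue (assumed model).

    prior    -- P(state A) before the saccade; sets the prior uncertainty
    validity -- P(cue signals the true state); the cue's predictive validity
    """
    # Probability that the cue signals "A" and "B", respectively.
    p_cue_a = validity * prior + (1 - validity) * (1 - prior)
    p_cue_b = 1 - p_cue_a
    # Posterior belief in state A after each possible cue (Bayes' rule).
    post_a = validity * prior / p_cue_a if p_cue_a > 0 else 0.0
    post_b = (1 - validity) * prior / p_cue_b if p_cue_b > 0 else 0.0
    expected_posterior_entropy = (
        p_cue_a * entropy(post_a) + p_cue_b * entropy(post_b)
    )
    return entropy(prior) - expected_posterior_entropy
```

Under these assumptions, EIG is largest when prior uncertainty is maximal (prior = 0.5) and the cue is perfectly valid, and it falls to zero when the cue is uninformative (validity = 0.5) or when the state is already known.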
A causal role of the primate superior colliculus in perceptual decision-making
Michele Basso1; 1University of California Los Angeles
People with Parkinson’s disease show impairments in their ability to use memory information to guide choices of action when faced with perceptual uncertainty. Changes in the inhibitory output of the basal ganglia underlie motor symptoms in Parkinson’s disease. The superior colliculus, a brainstem target of the basal ganglia, is known to play a role in aspects of attention and decision-making. Therefore, we asked whether changes in the level of inhibition in the superior colliculus altered the ability of monkeys to make perceptual decisions. Trained monkeys performed a two-choice perceptual decision-making task in which they reported the perceived orientation of a dynamic Glass pattern, before and after unilateral, reversible, inactivation of the superior colliculus. We found that unilateral SC inactivation produced significant decision biases and changes in reaction times consistent with a causal role for the primate superior colliculus in perceptual decision-making. Fitting signal detection theory and sequential sampling models to the data showed that superior colliculus inactivation produced a decrease in the relative evidence for contralateral decisions, as if adding a constant offset to a time-varying evidence signal for the ipsilateral choice. The results provide causal evidence for an embodied cognition model of perceptual decision-making and provide compelling evidence that the superior colliculus of primates (a brainstem structure) plays a causal role in how evidence is computed for decisions, a process usually attributed to the forebrain.
The interaction of saccadic and smooth pursuit eye movement signals in macaque frontal eye fields
J. Patrick Mayo1, Ruitong “Larry” Jiang1; 1The University of Pittsburgh
Natural vision involves the constant coordination of multiple different types of eye movements. Prior research has tended to focus on behavioral and neuronal correlates of a single type of eye movement (e.g., only saccades or only smooth pursuit). These investigations have set the stage for our current work on the selection and control of different types of eye movements. We recorded neuronal activity in the macaque frontal eye fields, a region of prefrontal cortex with an established role in saccadic control and smooth pursuit, while monkeys made saccades and pursuit in one of eight directions. Although the interaction of saccade and pursuit signals is traditionally thought to be minimal in FEF, we set out to test this idea by recording from populations of neurons using multi-contact linear electrode arrays. Taking inspiration from the classic characterization of visual-saccadic activity in FEF (“VMI”; visual-motor index), we created a contrast ratio called the Saccade Pursuit Index (SPI) to measure the relative firing rates of individual neurons to saccadic and smooth pursuit eye movements. We found that a large proportion of neurons elicited roughly equal firing rates during saccades and pursuit, forming a relatively continuous and unimodal distribution of SPI values. We extended our analyses to pairs of simultaneously recorded neurons, where the independence of saccadic and pursuit signals was evaluated using spike count correlations (“noise” correlations). Our results suggest that FEF neurons interact across different types of eye movements more than previously assumed, implicating FEF in the online control of real-time oculomotor decisions.
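The abstract describes the SPI as a contrast ratio but does not give its formula. A minimal sketch, assuming the common (a − b) / (a + b) form and a sign convention in which positive values indicate saccade preference, might look as follows. The function name, the sign convention, and the handling of silent neurons are all assumptions made for this example.

```python
def saccade_pursuit_index(saccade_rate, pursuit_rate):
    """Contrast ratio of firing rates (spikes/s) in [-1, 1].

    Assumed convention: +1 = responds only during saccades,
    -1 = responds only during pursuit, 0 = equal firing rates.
    """
    total = saccade_rate + pursuit_rate
    if total == 0:
        # Silent neuron: no preference can be measured (assumed convention).
        return 0.0
    return (saccade_rate - pursuit_rate) / total
```

With this definition, a neuron firing at 30 spikes/s during saccades and 10 spikes/s during pursuit has an index of 0.5, and a population of neurons with roughly equal rates in both conditions would cluster around zero, as in the unimodal distribution reported above.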
Cortical networks for transsaccadic perception: fMRI and functional connectivity
J. Douglas Crawford1, Bianca R. Baltaretu2, Benjamin T. Dunkley3, George Tomou1; 1York University, Toronto, Canada, 2Justus-Liebig University Giessen, Germany, 3Hospital for Sick Children, Toronto, Canada
Transsaccadic perception (TSP) requires the retention, updating, and integration / comparison of visual information obtained before and after a saccade. Based on our earlier psychophysical and TMS studies, we hypothesized that TSP taps into frontoparietal mechanisms for spatial updating, and that low level location/feature integration might occur through feedback to occipital cortex, whereas higher level interactions might occur through lateral dorsoventral connectivity or prefrontal convergence (e.g., Prime et al., Philos. Trans. R. Soc. Lond., B, Biol. Sci. 2011). Here, we set about localizing these interactions using an fMRI adaptation paradigm. We found evidence for transsaccadic orientation perception in supramarginal gyrus (SMG) (Dunkley et al., Cortex 2016). We then extended SMG’s role to updating object orientation for grasp, engaging a functional network including the frontal eye fields and parietal grasp areas (Baltaretu et al., J. Neurosci. 2021). However, when we applied this approach to spatial frequency, we found saccade-feature interactions in dorsal occipital cortex (Baltaretu et al., Sci. Rep. 2021). Most recently, we employed a task involving transsaccadic discrimination of object orientation versus shape (Baltaretu et al., bioRxiv 2021). Graph theory analysis revealed a bilateral dorsal functional module extending across parietofrontal cortex, whereas saccade-feature interactions fell within two lateralized occipital modules that rejoined in the presence of saccades. Overall, our data are consistent with the notion that TSP is a cortical network phenomenon that includes interactions between saccade signals and spatial features (location, orientation) in parietal cortex versus identity-related features (spatial frequency, shape) in occipital cortex.
Interaction of peripheral and central visual information in transsaccadic perception
Alexander C. Schütz1, Emma E.M. Stewart2, Matteo Valsecchi3; 1Philipps-Universität Marburg, Germany, 2Justus-Liebig University Giessen, Germany, 3Università di Bologna, Italy
In active vision, relevant objects are selected in the peripheral visual field and then brought to the central visual field by saccadic eye movements. Hence, there are usually two sources of visual information about an object: information from peripheral vision before a saccade and information from central vision after a saccade. The well-known differences in processing and perception between the peripheral and the central visual field raise the question of whether and how the two pieces of information are matched and combined. This talk will provide an overview of different mechanisms that may alleviate differences between peripheral and central representations and allow for a seamless perception across saccades. Transsaccadic integration results in a weighted combination of peripheral and central information according to their relative reliability, such that uncertainty is minimized. It is a resource-limited process that does not apply to the whole visual field, but only to attended objects. Nevertheless, it is not strictly limited to the saccade target, but can be flexibly directed to other relevant locations. Transsaccadic prediction uses peripheral information to estimate the most likely appearance in the central visual field. This allows appearance to be calibrated in the peripheral and central visual field. Such a calibration is not only relevant to maintain perceptual stability across saccades, but also to match templates for visual search in peripheral and central vision.
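The reliability-weighted combination mentioned above can be sketched with the standard inverse-variance weighting rule for two independent Gaussian estimates. This is a generic textbook formulation offered for illustration, not necessarily the exact model used in the work described; the function and parameter names are hypothetical.

```python
def integrate_transsaccadic(peripheral, var_peripheral, central, var_central):
    """Combine pre- and postsaccadic estimates by inverse-variance weighting.

    Assumes both estimates are independent and Gaussian; reliability is
    defined as 1 / variance. Returns (combined estimate, combined variance).
    """
    rel_p = 1.0 / var_peripheral   # reliability of the peripheral estimate
    rel_c = 1.0 / var_central      # reliability of the central (foveal) estimate
    w_p = rel_p / (rel_p + rel_c)  # weight given to the peripheral estimate
    estimate = w_p * peripheral + (1.0 - w_p) * central
    variance = 1.0 / (rel_p + rel_c)  # never larger than either input variance
    return estimate, variance
```

For example, a peripheral estimate of 10 with variance 4 and a central estimate of 14 with variance 1 combine to about 13.2 with variance 0.8. The combined estimate lies closer to the more reliable central input, and its variance is lower than that of either source alone, which is the sense in which uncertainty is minimized.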
Beyond representation and attention: Cognitive modulations of activity in visual cortex
Friday, May 13, 2022, 12:00 – 2:00 pm EDT, Talk Room 2
Organizers: Alex White1, Kendrick Kay2; 1Barnard College, Columbia University, 2University of Minnesota
Presenters: Alex L. White, Clare Press, Charlie S. Burlingham, Clayton E. Curtis, Jesse Breedlove
The concept of sensory representation has been immensely productive for studying visual cortex, especially in the context of ‘image-computable models’ of visually evoked responses. At the same time, many experiments have demonstrated that various forms of attention modulate those evoked responses. Several computational models of attention explain how task-relevant stimuli are represented more faithfully than task-irrelevant stimuli. However, these models still paint an incomplete picture of processing in visual cortex. Activity in visual brain regions has been shown to depend on complex interactions between bottom-up sensory input and task demands. In many cases, that activity is affected by cognitive factors that are not clearly related to sensory representation or attention, such as memory, arousal, and expectation. This symposium will bring together complementary perspectives on cognitive effects on activity in visual cortex. Each speaker will present a recently studied interaction of vision and cognition and how it manifests in experimental data. In addition, the speakers will consider the underlying mechanisms for the effects they observe. Key questions include: Are visual representations simply enhanced for any behaviorally relevant stimulus, or do task-specific neural networks modulate visual cortex only in the presence of specific stimuli? How do we interpret activity observed in the absence of retinal stimulation? Are there distinct representational systems for visual working memory, imagery, and expectations? In a final panel discussion, we will broach additional fundamental issues: To what extent is it possible to study representation in the absence of manipulating cognition? How can we build formal models that account for the range of cognitive and sensory effects in visual cortex? Each of the 5 speakers will be allotted 15 minutes for presentation plus 3 minutes for audience questions, for a total of 18 minutes per speaker. 
The final panel discussion will last 30 minutes. The panel will be moderated by Kendrick Kay, who will deliver an initial brief summary that will attempt to integrate the disparate studies presented by the speakers and weave together a coherent bigger picture regarding overall challenges and goals for studying cognitive effects on activity in visual cortex. A panel discussion will follow with questions posed by the moderator, as well as questions solicited from the audience.
High specificity of top-down modulation in word-selective cortex
Alex L. White1, Kendrick Kay2, Jason D. Yeatman3; 1Barnard College, Columbia University, 2University of Minnesota, 3Stanford University
Visual cortex is capable of processing a wide variety of stimuli for any number of behavioral tasks. So how does the specific information required for a given task get selected and routed to other necessary brain regions? In general, stimuli that are relevant to the current task evoke stronger responses than stimuli that are irrelevant, due to attentional selection on the basis of visual field location or non-spatial features. We will first review evidence that such attentional effects happen in category-selective regions, such as the visual word form area, as well as early retinotopic regions. We will then demonstrate evidence for top-down effects that are not domain-general, but extremely specific to task demands, stimulus features, and brain region. We measured fMRI responses to written words and non-letter shapes in retinotopic areas as well as word- and face-selective regions of ventral occipitotemporal cortex. In word-selective regions, letter strings evoked much larger responses when they were task-relevant (during a lexical decision task) than when they were irrelevant (during a color change task on the fixation mark). However, non-letter shapes evoked smaller responses when they were task-relevant than when irrelevant. This surprising modulation pattern was specific to word-selective regions, where response variability was also highly correlated with a region in the pre-central sulcus that is involved in spoken language. Therefore, we suggest that top-down modulations in visual cortex do not just generally enhance task-relevant stimuli and filter irrelevant stimuli, but can reflect targeted communication with broader networks recruited for specific tasks.
The influence of expectation on visual cortical processing
Clare Press1,2, Emily Thomas1,3, Daniel Yon1; 1Birkbeck, University of London, 2University College London, 3New York University
It is widely assumed that we must use predictions to determine the nature of our perceptual experiences. Work from the last few years suggests that supporting mechanisms operate via top-down modulations of sensory processing. However, theories within the domain of action concerning the operation of these mechanisms are at odds with those from other perceptual disciplines. Specifically, action theories propose that we cancel predicted events from perceptual processing to render our experiences informative – telling us what we did not already know. In contrast, theories outside of action – typically couched within Bayesian frameworks – demonstrate that we combine our predictions (priors) with the evidence (likelihood) to determine perception (posterior). Such functions are achieved via predictions sharpening processing in early sensory regions. In this talk I will present three fMRI studies from our lab that ask how these predictions really shape early visual processing. They will ask whether action predictions in fact shape visual processing differently from other types of prediction and about differences in representation across different cortical laminae. The studies compare processing of observed avatar movements and simple grating events, and ask about the information content associated with the stimulus types as well as signal level across different types of voxels. We can conclude that action expectations exhibit a similar sharpening effect on visual processing to other expectations, rendering our perception more veridical on average. Future work must now establish how we also use our predictions – across domains – to yield informative experiences.
Task-related activity in human visual cortex
Charlie S. Burlingham1, Zvi Roth2, Saghar Mirbagheri3, David J. Heeger1, Elisha P. Merriam2; 1New York University, 2National Institute of Mental Health, National Institutes of Health, 3University of Washington
Early visual cortex exhibits widespread hemodynamic responses during task performance even in the absence of a visual stimulus. Unlike the effects of spatial attention, these “task-related responses” rise and fall around trial onsets, are spatially diffuse, and even occur in complete darkness. In visual cortex, task-related and stimulus-evoked responses are similar in amplitude and sum together. Therefore, to interpret BOLD fMRI signals, it is critical to characterize task-related responses and understand how they change with task parameters. We measured fMRI responses in early visual cortex (V1/2/3) while human observers judged the orientation of a small peripheral grating in the right visual field. We measured task-related responses by only analyzing voxels in the ipsilateral hemisphere, i.e., far from the stimulus representation. Task-related responses were present in all observers. Response amplitude and timing precision were modulated by task difficulty, reward, and behavioral performance, variables that are frequently manipulated in cognitive neuroscience experiments. Surprising events, e.g., responding incorrectly when the task was easy, produced the largest modulations. Response amplitude also covaried with peripheral signatures of arousal, including pupil dilation and changes in heart rate. Our findings demonstrate that activity in early visual cortex reflects internal state — to such a large extent that behavioral performance can have a greater impact on BOLD activity than a potent visual stimulus. We discuss the possible physiological origins of task-related responses, what information about internal state can be gleaned from them, and analytic approaches for modelling them.
Unveiling the abstract format of mnemonic representations
Clayton E. Curtis1, Yuna Kwak1; 1New York University
Working memory (WM) enables information storage for future use, bridging the gap between perception and behavior. We hypothesize that WM representations are abstractions of low-level perceptual features. Yet the neural nature of these putative abstract representations has thus far remained impenetrable. Here, we first demonstrate that distinct visual stimuli (orientated gratings and moving dots) are flexibly re-coded into the same WM format in visual and parietal cortex when that representation is useful for memory-guided behavior. Next, we aimed to reveal the latent nature of the abstract WM representation. We predicted that the spatial distribution of higher response amplitudes across a topographic map forms a line at a given angle, as if the retinal positions constituting a line were actually visually stimulated. To test this, we reconstructed the spatial profile of neural activity during WM by projecting the amplitudes of voxel activity during the delay period for each orientation and direction condition into visual field space using parameters obtained from models of each visual map’s population receptive field. Remarkably, the visualization technique unveiled a stripe encoded in the amplitudes of voxel activity at an angle matching the remembered feature in many of the visual maps. Finally, we used models of V1 that demonstrate the feasibility of such a working memory mechanism and ruled out potential confounds. We conclude that mnemonic representations in visual cortex are abstractions of percepts that are more efficient than and proximal to the behaviors they guide.
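The projection of voxel amplitudes into visual field space can be sketched as an amplitude-weighted sum of each voxel's population receptive field. The sketch below assumes isotropic Gaussian receptive fields and omits any normalization; the function name and the tuple layout are hypothetical and not taken from the study.

```python
import math

def project_to_visual_field(voxels, grid):
    """Amplitude-weighted sum of Gaussian pRFs, evaluated at each grid point.

    voxels -- list of (x0, y0, sigma, amplitude): pRF centre, pRF size,
              and delay-period response amplitude of each voxel
    grid   -- list of (x, y) visual field positions
    Returns one reconstructed value per grid point (unnormalized).
    """
    profile = []
    for x, y in grid:
        total = 0.0
        for x0, y0, sigma, amplitude in voxels:
            d2 = (x - x0) ** 2 + (y - y0) ** 2
            # Each voxel contributes its amplitude, spread over its pRF.
            total += amplitude * math.exp(-d2 / (2.0 * sigma ** 2))
        profile.append(total)
    return profile
```

If voxels whose receptive fields lie along a line at the remembered angle respond more strongly than the rest, the reconstructed map is higher along that line than away from it, which would appear as a stripe of the kind described above.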
With or without the retina: analyses of non-optic visual activity in the brain
Jesse Breedlove1, Ghislain St-Yves1, Logan Dowdle1, Tom Jhou2, Cheryl Olman1, Thomas Naselaris1; 1University of Minnesota, 2Medical University of South Carolina
One way to investigate the contribution of cognition to activity in the visual cortex is to fix or remove the retinal input altogether. There are many such non-optic visual experiences to draw from (e.g., mental imagery, synesthesia, hallucinations), all of which produce brain activity patterns consistent with the visual content of the experience. But how does the visual system manage to both accurately represent the external world and synthesize visual experiences? We approach this question by expanding on a theory that the human visual system embodies a probabilistic generative model of the visual world. We propose that retinal vision is just one form of inference that this internal model can support, and that activity in visual cortex observed in the absence of retinal stimulation can be interpreted as the most probable consequence unpacked from imagined, remembered, or otherwise assumed causes. When applied to mental imagery, this theory predicts that the encoding of imagined stimuli in low-level visual areas will resemble the encoding of seen stimuli in higher areas. We confirmed this prediction by estimating imagery encoding models from brain activity measured while subjects imagined complex visual stimuli accompanied by unchanging retinal input. In a different fMRI study, we investigated another far rarer form of non-optic vision: a case subject who, after losing their sight to retinal degeneration, now “sees” objects they touch or hear. The existence of this phenomenon further supports visual perception being a generative process that depends as much on top-down inference as on retinal input.
Beyond objects and features: High-level relations in visual perception
Friday, May 13, 2022, 12:00 – 2:00 pm EDT, Talk Room 1
Organizers: Chaz Firestone1, Alon Hafri1; 1Johns Hopkins University
Presenters: Alon Hafri, Melissa Le-Hoa Võ, Liuba Papeo, Daniel Kaiser, Hongjing Lu
A typical VSS program devotes sections to low-level properties such as motion, orientation, and location; higher-level properties such as faces, scenes, and materials; and core visual processes such as working memory and attention. Yet a notable absence among these are relational representations: properties holding *between* elements, beyond any properties each element has on its own. For example, beyond perceiving red apples and glass bowls, we may also see apples contained inside bowls; beyond perceiving an object and its motion, we may see it collide with another object; and beyond perceiving two agents, we may also see them socially interact. The aim of this symposium is to showcase work that investigates relational representation using the methods and tools of vision science, including classic paradigms from visual cognition, modern neuroimaging techniques, and state-of-the-art computational modeling. A central theme is that fully understanding the nature of visual perception — including core processes such as object and scene representation, visual attention, and working memory — requires a consideration of how visual elements relate to one another. First, Alon Hafri and Chaz Firestone will provide an overview of the “relational landscape”. They will delineate criteria for determining whether a relational property is perceived rather than merely judged or inferred, and they will discuss several case studies exemplifying this framework. Second, Melissa Võ will discuss her work on “scene grammar”, whereby the mind represents natural environments in terms of the typical composition of their objects (e.g., soap generally appears on sinks). Võ suggests that certain clusters of objects (especially “anchor objects”) guide visual search, object perception, and memory. Third, Liuba Papeo will present her work on social relations (e.g., when two agents approach, argue, or fight). 
Papeo shows that the visual system identifies social relations through a prototypical “social template”, and she explores the ways such representations generalize across visual contexts. Fourth, Daniel Kaiser will extend the discussion from objects to scene structure. Using neuroimaging evidence, he shows that natural scene processing is fundamentally relational: when configural relations between scene parts are disrupted, there are downstream consequences for scene and object processing. Finally, Hongjing Lu and Phil Kellman will discuss the computational machinery necessary to achieve relational representations. Although deep-learning models achieve remarkable success at many vision tasks, Lu and Kellman present modeling evidence arguing that abstract structure is necessary for representing visual relations in ways that go beyond mere pattern classification. Overall, this work explores how relational structure plays a crucial role in how we see the world around us, and raises important questions for future vision science research. David Marr famously defined vision as the capacity to “know what is where by looking” — to represent objects and their features, located somewhere in space. The work showcased here adds an exciting dimension to this capacity: not only what and where, but “how” visual elements are configured in their physical and social environment.
Perceiving relational structure
Alon Hafri1, Chaz Firestone1; 1Johns Hopkins University
When we open our eyes, we immediately see the colors, shapes, and sizes of the objects around us — round apples, wooden tables, small kittens, and so on — all without effort or intention. Now consider relations between these objects: An apple supported by a table, or two kittens chasing one another. Are these experiences just as immediate and perceptual, or do they require effort and reflection to arise? Which properties of relations are genuinely perceived, and how can we know? Here, we outline a framework for distinguishing perception of relations from mere judgments about them, centered on behavioral “signatures” that implicate rapid, automatic visual processing as distinct from high-level judgment. We then discuss several case studies demonstrating that visual relations fall within this framework. First, we show that physical relations such as containment and support are extracted in an abstract manner, such that instances of these relations involving very different objects are confused for one another in fast target-identification tasks. Second, we show that the mind “fills in” required elements of a relation that are inferred from physical interaction (e.g., a man running into an invisible “wall”), producing visual priming in object detection tasks. Third, we show that when objects look like they can physically fit together, this impression influences numerosity estimates of those objects. We argue that visual processing itself extracts sophisticated, structured relations, and we reflect on the consequences of this view for theorizing about visual perception more broadly.
Hierarchical relations of objects in real-world scenes
Melissa Le-Hoa Võ1; 1Goethe University – Frankfurt
The sources that guide attention in real-world scenes are manifold and interact in complex ways. We have been arguing for a while now that attention during scene viewing is mainly controlled by generic scene knowledge regarding the meaningful composition of objects that make up a scene (a.k.a. scene grammar). Unlike arbitrary target objects placed in random arrays of distractors, objects in naturalistic scenes are placed in a very rule-governed manner. In this talk, I will highlight some recent studies from my lab in which we have tried to shed more light on the hierarchical nature of scene grammar. In particular, we have found that scenes can be decomposed into smaller, meaningful clusters of objects, which we have started to call “phrases”. At the core of these phrases you will find so-called “anchor objects”, which are often larger, stationary objects that anchor strong relational predictions about where other objects within the phrase are expected to be. Thus, within a “phrase” the spatial relations of objects are strongly defined. Manipulating the presence of anchor objects, we were able to show that both eye movements and body locomotion are strongly guided by these anchor objects when carrying out actions in naturalistic 3D settings. Overall, the data I will present will provide further evidence for the crucial role that anchor objects play in structuring the composition of scenes and thereby critically affecting visual search, object perception and the forming of memory representations in naturalistic environments.
(In what sense) We see social relations
Liuba Papeo1,2; 1CNRS, 2Université Claude Bernard Lyon
The most basic social relation is realized when two social agents engage in a physical exchange, or interaction. How do representations of social interactions come about, from basic processing in visual perception? Behavioral and neuroimaging phenomena show that human vision (and selective areas of the visual cortex) discriminates between scenes involving the same bodies, based on whether the individuals appear to interact or not. What information in a multiple-body scene channels the representation of social interaction? And what exactly is represented of a social relation in the visual system? I will present behavioral results, based on a switch cost paradigm, showing that the visual system exploits mere spatial information (i.e., relative position of bodies in space and posture features) to “decide” not only whether there is an interaction or not, but also who the agent and the patient are. Another set of results, based on a backward masking paradigm, shows that the visual processing of socially-relevant spatial relations is agnostic to the content of the interaction, and indeed segregated from, and prior to, (inter)action identification. Thus, drawing a divide between perception and cognition, the current results suggest that the visual representation of social relations corresponds to a configuration of parts (bodies/agents) that respect the spatial relations of a prototypical social interaction – a sort of social-template, theoretically analogous to the face- or body-template in the visual system – before inference. How specific/general to different instances of social interaction this template is will be the main focus of my discussion.
The role of part-whole relations in scene processing
Daniel Kaiser1; 1Justus-Liebig-Universität Gießen
Natural scenes are not arbitrary arrangements of unrelated pieces of information. Their composition rather follows statistical regularities, with meaningful information appearing in predictable ways across different parts of the scene. Here, I will discuss how characteristic relations across different scene parts shape scene processing in the visual system. I will present recent research, in which I used variations of a straightforward “jumbling” paradigm, whereby scenes are dissected into multiple parts that are then either re-assembled into typical configurations (preserving part-whole relations) or shuffled to appear in atypical configurations (disrupting part-whole relations). In a series of fMRI and EEG studies, we showed that the presence of typical part-whole relations has a profound impact on visual processing. These studies yielded three key insights: First, responses in scene-selective cortex are highly sensitive to spatial part-whole relations, and more so for upright than for inverted scenes. Second, the presence of typical part-whole structure facilitates the rapid emergence of scene category information in neural signals. Third, the part-whole structure of natural scenes supports the perception and neural processing of task-relevant objects embedded in the scene. Together, these results suggest a configural code for scene representation. I will discuss potential origins of this configural code and its role in efficient scene parsing during natural vision.
Two Approaches to Visual Relations: Deep Learning versus Structural Models
Hongjing Lu1, Phil Kellman1; 1University of California, Los Angeles
Humans are remarkably adept at seeing in ways that go well beyond pattern classification. We represent bounded objects and their shapes from visual input, and also extract meaningful relations among object parts and among objects. It remains unclear what representations are deployed to achieve these feats of relation processing in vision. Can human perception of relations be best emulated by applying deep learning models to massive numbers of problems, or should learning instead focus on acquiring structural representations, coupled with the ability to compute similarities based on such representations? To address this question, we will present two modeling projects, one on abstract relations in shape perception, and one on visual analogy based on part-whole relations. In both projects we compare human performance to predictions derived from various deep learning models and from models based on structural representations. We argue that structural representations at an abstract level play an essential role in facilitating relation perception in vision.