How does the brain combine generative models and direct discriminative computations for visual inference?


Symposium: Friday, May 19, 2023, 12:00 – 2:00 pm, Talk Room 2

Organizers: Benjamin Peters1, Nikolaus Kriegeskorte1; 1Columbia University
Presenters: Benjamin Peters, Ralf Haefner, Divya Subramanian, Doris Tsao, Thomas Naselaris

How might the brain combine generative models and direct discriminative computations for visual inference? Using primate neurophysiology, human neuroimaging, psychophysics, and computational modeling, the speakers of this symposium will address this question from different angles. Benjamin Peters will introduce the main question and discuss the importance of scaling visual tasks to real-world complexity. Ralf Haefner will discuss the role of feedback in a hierarchical generative model of the visual system. Divya Subramanian will show how generative and discriminative models can support stability across saccades. Doris Tsao will report evidence on how the macaque face patch system processes ambiguous images, noisy images, and internally generated images evoked by electrical stimulation, drug-induced hallucinations, and dreams. Thomas Naselaris will conclude the panel by discussing how mental imagery might be integral, irrelevant, or even obstructive to visual inference. Each 15-minute talk will be followed by a 5-minute Q&A session. The talks will be followed by a 20-minute panel discussion about the kinds of experiments and data needed to discern whether and how vision combines generative models and discriminative computations. The topic is particularly timely: the recent surge of deep generative and discriminative models in machine learning has enriched the explanatory framework for understanding primate vision, integrating established vision theories with progress in statistical inference and engineering. The advent of high-density neural recording techniques and large-scale computational models creates unprecedented opportunities for substantial progress over the coming years. The symposium aims to develop a unified perspective on the two conceptions of vision, inspire novel hybrid models, and establish agreement on what evidence should be pursued experimentally. The symposium will benefit junior and senior members of the vision science community alike. Speakers will present cutting-edge research relevant to a broad audience while introducing the debate didactically and touching on deeper questions of models and explanations in vision science.

Presentations

Naturalistic primate vision combines generative and discriminative computations

Benjamin Peters1; 1Columbia University

Biological and machine vision need to be statistically and computationally efficient, enabling perception that is both robust and rapid given the sensory evidence. The normative ideal, embodied by the generative approach, interprets evidence optimally in the context of a statistical model of the world. This ideal is computationally expensive, or even intractable, for naturalistic and dynamic vision. The discriminative approach emphasizes rapid inference without the use of an explicit generative model. I will introduce generative and discriminative models as an explanatory framework for characterizing visual inference at different levels of analysis. Models of primate vision can be understood as points in a vast space of possible inference models, a space that contains classical algorithms such as predictive coding and the feedforward cascade of filters in a convolutional neural network, along with newer concepts from engineering such as amortized inference. I will clarify what kind of evidence is required to discern whether primate visual inference is generative. A key insight is that “more is different”: what seems viable as a visual inference algorithm for an abstracted toy task may not scale to naturalistic real-world vision. We should therefore scale our tasks to be more naturalistic and dynamic, exposing the brain’s unique combination of generative models and discriminative computations.
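To fix the terms in notation (a schematic gloss, not material from the talk): write x for the image and z for the latent scene variables that vision estimates.

Generative: p(z | x) ∝ p(x | z) p(z) — interpret the evidence by inverting a statistical model of how scenes cause images; in general this requires costly, often iterative inference.
Discriminative: ẑ = f(x) — a learned mapping (e.g., a feedforward network) from image to estimate; fast, but without an explicit world model.

Amortized inference sits in between: the fast mapping f is trained to approximate the posterior defined by the generative model, one way of combining the two approaches.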

Behavioral and neural evidence that the visual system performs approximate inference in a hierarchical generative model

Ralf Haefner1; 1University of Rochester

Whether visual processing in cortex is best modeled as Bayesian inference in a generative model or as a discriminative model is an important open question (DiCarlo et al. 2021, CCN/GAC). A critical clue to answering it lies in the functional role of the ubiquitous feedback connections in cortex (Felleman & Van Essen 1991). Inference in a hierarchical generative framework suggests that their role is to communicate top-down expectations (Lee & Mumford 2003). I will present recent behavioral and neurophysiological results that are compatible with this hypothesis and that are difficult to explain under alternative hypotheses about the role of feedback signals, attention, or learning. Our behavioral results in a classic discrimination task strongly suggest that expectations are communicated from decision-related areas to sensory areas on a timescale of tens to hundreds of milliseconds. In classic evidence-integration tasks, this feedback leads to a perceptual confirmation bias that is measurable as both a primacy effect (Lange et al. 2021) and overconfidence (Chattoraj et al. 2021). Importantly, the strength of this bias depends on the nature of the sensory inputs in a way that is predicted by approximate hierarchical inference and that can explain a range of seemingly contradictory findings about the nature of temporal biases. Finally, I will present empirical evidence for a surprising neural signature of feedback-related expectation signals: they induce information-limiting correlations between sensory neurons, again as predicted by approximate hierarchical inference (Lange et al. 2022).
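For orientation, a schematic two-level hierarchy in the spirit of Lee & Mumford (2003), not the specific model of the talk: let x be the image, z1 an intermediate sensory variable, and z2 a high-level or decision variable.

p(z1, z2 | x) ∝ p(x | z1) p(z1 | z2) p(z2)

Feedback carries the top-down term p(z1 | z2): the current belief about z2 acts as a prior on z1. If the high-level belief hardens early while evidence is still arriving, the top-down term biases the interpretation of later sensory input toward that belief, which is one way to read the confirmation bias described above.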

Bayesian and Discriminative Models for Visual Stability across Saccades

Divya Subramanian1,2, John M. Pearson1, Marc A. Sommer1; 1Duke University, 2National Institutes of Health (NIH)

The brain interprets sensory inputs to guide behavior, but behavior itself disrupts sensory inputs. Perceiving a coherent world while acting in it constitutes active perception. For example, saccades constantly displace the retinal image, and yet we perceive a stable world. The visual system must compare the predicted sensory consequence of each saccade with the incoming sensory input to judge whether a mismatch occurred. This process is vulnerable to sensory uncertainty from two potential sources: external noise in the world (“image noise”) and internal uncertainty due to one’s own movements (“motor-driven noise”). Since Bayesian models have been influential in explaining how priors can compensate for sensory uncertainty, we tested whether they are used for visual stability across saccades. We found that non-human primates (two rhesus macaques) used priors to compensate for internal, motor-driven noise in a Bayesian manner. For external image noise, however, they were anti-Bayesian; they likely used a discriminative strategy instead, suggesting that vision across saccades is governed by both Bayesian and discriminative strategies. Next, we tested whether single neurons in the Frontal Eye Field (FEF), which receives both internal saccade-related and external visual information, support the Bayesian or the anti-Bayesian (discriminative) strategy. We found that FEF neurons predict the anti-Bayesian, but not the Bayesian, behavior. Taken together, the results demonstrate that Bayesian and discriminative computations for visual stability are dissociable at the behavioral and neural levels and situate FEF along a pathway that selectively supports the discriminative contribution.
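For reference, the standard Gaussian form of Bayesian compensation (a textbook sketch, not the specific model fit in this study): given a noisy measurement m of a post-saccadic feature, a prior with mean μ and variance σ_prior², and measurement noise σ_m², the posterior mean is

ŝ = w·m + (1 − w)·μ,  where  w = σ_prior² / (σ_prior² + σ_m²).

As sensory noise σ_m² grows, w shrinks and the estimate relies more on the prior. “Bayesian” behavior here means that reliance on the prior increases with uncertainty; the “anti-Bayesian” pattern for image noise is the opposite, consistent with a discriminative readout that does not reweight by reliability.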

Probing for the existence of a generative model in the macaque face patch system

Doris Tsao1,2; 1UC Berkeley, 2Howard Hughes Medical Institute

The idea that the brain contains a generative model of reality is highly attractive, explaining both how a perceptual system can converge on the correct interpretation of a scene through an iterative generate-and-compare process, and how it can learn to represent the world in a self-supervised way. Moreover, if consciousness corresponds directly to top-down generated contents, this would elegantly explain the mystery of why our perception of ambiguous images is always consistent across all levels. However, experimental evidence for the existence of a generative model in the visual system remains lacking. I will discuss efforts by my lab to fill this gap through experiments in the macaque face patch system, a set of regions in inferotemporal cortex dedicated to processing faces. This system is strongly connected in both feedforward and feedback directions, providing an ideal testbed for probing the existence of a generative model. Our experiments leverage simultaneous recordings from multiple face patches with high channel-count Neuropixels probes to address representation in three realms: (1) representation of ambiguous images, (2) representation of noisy or degraded images, and (3) representation of internally generated images evoked by electrical stimulation, drug-induced hallucinations, and dreams. In each case, we ask: are the content and dynamics of representation across the face patch network consistent with a generative model of reality?
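One schematic form of the generate-and-compare idea (an illustration, not the lab’s model): from a current interpretation z_t, the system generates a predicted image, compares it with the input x, and nudges the interpretation to reduce the mismatch,

z_{t+1} = z_t + η ∇_z [ log p(x | z) + log p(z) ] evaluated at z = z_t,

iterating toward the interpretation that best explains the image. Such dynamics might appear as systematic, time-evolving changes in face patch responses to ambiguous or degraded images.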

Why is the human visual system generative?

Thomas Naselaris1; 1University of Minnesota

The human visual system is obviously generative: most humans can and do generate imagery in the absence of retinal stimulation, and the internal generation of imagery clearly engages the entire visual cortex. However, we know very little about what the brain does with its ability to generate images. We will consider the hypothesis that the ability of visual cortex to generate imagery is a consequence of housing a generative model of the world that is needed to see. We will present empirical evidence from mental imagery experiments that imagery and vision rely upon the same generative model to make inferences that are conditioned on unseen and seen data, respectively. We will then consider evidence for the alternative hypothesis that generativity is not for seeing, and may even obstruct seeing. According to this hypothesis, non-visual systems may route their process-specific variables through visual cortex to solve non-visual tasks. These extra-visual inputs may evoke visual imagery, and may even use visual imagery, while contributing nothing to, or even obstructing, seeing. We will review evidence from the animal literature that appears to support this view and propose several novel experiments to adjudicate between the two hypotheses.
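In notation (a schematic gloss on “conditioned on unseen and seen data”, not the authors’ formalism): if visual cortex holds a generative model p(x, z) over images x and latent causes z, then vision infers the causes of a seen image, computing p(z | x = x_seen), while imagery runs the same model forward from internally specified variables, generating from p(x | z = z_internal). On the alternative hypothesis, imagery-related activity in visual cortex would instead reflect variables routed in from non-visual systems, without any such conditioning on a shared model.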
