Understanding representation in visual cortex: why are there so many approaches and which is best?

Organizers: Thomas Naselaris & Kendrick Kay; Department of Neurosciences, Medical University of South Carolina & Department of Psychology, Washington University in St. Louis
Presenters: Thomas Naselaris, Marcel van Gerven, Kendrick Kay, Jeremy Freeman, Nikolaus Kriegeskorte, James J. DiCarlo, MD, PhD


Symposium Description

Central to visual neuroscience is the problem of representation: what features of the visual world drive activity in different areas of the visual system? Receptive fields and tuning functions have long served as the basic descriptive elements used to characterize visual representations. In recent years, the receptive field and the tuning function have been generalized and in some cases replaced by alternative methods for characterizing visual representation, including decoding and multivariate pattern analysis, representational similarity analysis, abstract semantic spaces, and models of stimulus statistics. Given this diversity, it is important to ask whether these approaches are simply pragmatic, driven by the nature of the data being collected, or whether they represent fundamentally new ways of characterizing visual representations. In this symposium, invitees will present recent discoveries in visual representation, explaining the generality of their approach and how it might apply to future studies. Invitees are encouraged to discuss the theoretical underpinnings of their approach and its criterion for “success”, and to provide practical pointers, e.g. regarding stimulus selection, experimental design, and data analysis. Through this forum we hope to move toward an integrative approach that can be shared across experimental paradigms.

Audience: This symposium will appeal to researchers interested in computational approaches to understanding the visual system and is expected to draw interest from a broad range of experimental backgrounds (e.g. fMRI, EEG, ECoG, electrophysiology).

Invitees: The invitees are investigators who have conducted pioneering work in computational approaches to studying visual representation.

Presentations

Visual representation in the absence of retinal input

Speaker: Thomas Naselaris; Department of Neurosciences, Medical University of South Carolina, Charleston, SC

An important discovery of the last two decades is that receptive fields in early visual cortex provide an efficient basis for generating images that have the statistical structure of natural scenes. This discovery has lent impetus to the theory that receptive fields in early visual cortex can function not only as passive filters of retinal input, but also as mechanisms for generating accurate representations of the visual environment that are independent of retinal input. A number of theoretical studies have argued that such internal visual representations could play an important functional role in vision by supporting probabilistic inference. In this talk, we will explore the idea of receptive fields as generators of internal representations by examining the role that receptive fields play in generating mental images. Mental images are the canonical form of internal visual representation: they are independent of retinal input and appear to be essential for many forms of inference. We present evidence from fMRI studies that voxel-wise receptive field models of tuning to retinotopic location, orientation, and spatial frequency can account for much of the BOLD response in early visual cortex to imagining previously memorized works of art. We will discuss the implications of this finding for the structure of functional feedback projections to early visual cortex, and for the development of brain-machine interfaces driven by mental imagery.
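To make the voxel-wise modeling concrete, here is a minimal Python sketch of an encoding model of this general kind: stimulus images are projected onto a small bank of Gabor-like filters spanning orientation and spatial frequency, and a regularized (ridge) regression maps the resulting feature amplitudes to a single voxel's responses. The filter bank, regularization strength, and simulated data are illustrative assumptions, not the models or data from the study.

import numpy as np
from numpy.linalg import solve

def gabor_bank(size=32, freqs=(2, 4, 8), n_orient=4):
    """Build a small bank of Gabor-like filters (single phase, for illustration only)."""
    ys, xs = np.mgrid[-size // 2:size // 2, -size // 2:size // 2] / size
    filters = []
    for f in freqs:
        for k in range(n_orient):
            theta = np.pi * k / n_orient
            u = xs * np.cos(theta) + ys * np.sin(theta)
            envelope = np.exp(-(xs**2 + ys**2) / (2 * (0.5 / f) ** 2))
            filters.append(envelope * np.cos(2 * np.pi * f * u))
    return np.stack(filters)                                           # (n_filters, size, size)

def fit_encoding_model(images, bold, lam=1.0):
    """Ridge regression from Gabor feature amplitudes to one voxel's responses."""
    bank = gabor_bank(size=images.shape[-1])
    feats = np.abs(np.tensordot(images, bank, axes=([1, 2], [1, 2])))  # (n_images, n_filters)
    X = np.column_stack([feats, np.ones(len(feats))])                  # add an intercept term
    w = solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ bold)          # closed-form ridge solution
    return w, X @ w                                                    # weights and fitted responses

# Hypothetical usage: 200 simulated 32x32 stimuli and one voxel's simulated responses.
rng = np.random.default_rng(0)
images = rng.standard_normal((200, 32, 32))
bold = rng.standard_normal(200)
w, fitted = fit_encoding_model(images, bold)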

Learning and comparison of visual feature representations

Speaker: Marcel van Gerven; Donders Institute for Brain, Cognition and Behaviour

Recent developments in the encoding and decoding of visual stimuli have relied on different feature representations, such as pixel-level, Gabor wavelet, or semantic representations. In previous work, we showed that high-quality reconstructions of images can be obtained via the analytical inversion of regularized linear models operating on individual pixels. However, such simple models do not account for the complex nonlinear transformations of sensory input that take place in the visual hierarchy. I will argue that these nonlinear transformations can be estimated independently of brain data using statistical approaches. Decoding based on the resulting feature space is shown to yield better results than those obtained using a hand-designed feature space based on Gabor wavelets. I will discuss how alternative feature spaces, whether learned or hand-designed, can be compared with one another, thereby providing insight into what visual information is represented where in the brain. Finally, I will present some recent encoding and decoding results obtained using ultra-high-field MRI.
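The analytical inversion mentioned above can be illustrated with a linear-Gaussian sketch: an encoding matrix is estimated by ridge regression from pixels to voxels, and a new image is then reconstructed in closed form as the maximum a posteriori estimate under a Gaussian image prior. The prior, noise level, and simulated data below are assumptions for illustration, not the models used in the work.

import numpy as np

def fit_encoding(X_pixels, Y_voxels, lam=1.0):
    """Ridge-regress each voxel onto the pixels: Y ≈ X @ B."""
    n_pix = X_pixels.shape[1]
    return np.linalg.solve(X_pixels.T @ X_pixels + lam * np.eye(n_pix),
                           X_pixels.T @ Y_voxels)                      # B: (n_pixels, n_voxels)

def reconstruct(B, y_new, noise_var=1.0, prior_var=1.0):
    """MAP reconstruction of the pixel vector under a linear-Gaussian encoding model."""
    n_pix = B.shape[0]
    A = B @ B.T / noise_var + np.eye(n_pix) / prior_var
    return np.linalg.solve(A, B @ y_new / noise_var)

# Hypothetical usage with simulated data: 500 training images of 16x16 pixels
# measured in 100 voxels, then one held-out image reconstructed from its responses.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 256))
B_true = 0.1 * rng.standard_normal((256, 100))
Y = X @ B_true + 0.5 * rng.standard_normal((500, 100))

B_hat = fit_encoding(X, Y)
x_test = rng.standard_normal(256)
y_test = x_test @ B_true + 0.5 * rng.standard_normal(100)
x_rec = reconstruct(B_hat, y_test)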

Identifying the nonlinearities used in extrastriate cortex

Speaker: Kendrick Kay; Department of Psychology, Washington University in St. Louis

In this talk, I will discuss recent work in which I used fMRI measurements to develop models of how images are represented in human visual cortex. These models consist of specific linear and nonlinear computations and predict BOLD responses to a wide range of stimuli. The results highlight the importance of certain nonlinearities (e.g. compressive spatial summation, second-order contrast) in explaining responses in extrastriate areas. I will describe important choices made in the development of the approach regarding stimulus design, experimental design, and analysis. Furthermore, I will emphasize (and show through examples) that understanding representation requires a dual focus on abstraction and specificity. To grasp complex systems, it is necessary to develop computational concepts, language, and intuition that can be applied independently of data (abstraction). On the other hand, a model risks irrelevance unless it is carefully quantified, implemented, and systematically validated on experimental data (specificity).
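As a concrete illustration of one such nonlinearity, the following sketch implements a simplified compressive spatial summation (CSS) computation: stimulus contrast is weighted by a Gaussian population receptive field, summed over space, and passed through a power-law nonlinearity with exponent less than one. The parameter values and contrast images below are illustrative assumptions, not fitted models.

import numpy as np

def css_response(contrast_image, x0, y0, sigma, n=0.5, gain=1.0):
    """Compressive spatial summation: gain * (Gaussian-weighted sum of contrast) ** n."""
    h, w = contrast_image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    prf = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))
    prf /= prf.sum()                                   # normalize the population receptive field
    summed = np.sum(prf * contrast_image)              # linear spatial summation
    return gain * summed ** n                          # compressive output nonlinearity (n < 1)

# With a compressive exponent, the predicted response grows sublinearly with the
# amount of contrast falling inside the pRF, the signature behavior reported for
# extrastriate areas. Small vs. large central patches illustrate the effect.
small = np.zeros((64, 64)); small[28:36, 28:36] = 1.0
large = np.zeros((64, 64)); large[16:48, 16:48] = 1.0
print(css_response(small, 32, 32, sigma=10), css_response(large, 32, 32, sigma=10))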

Carving up the ventral stream with controlled naturalistic stimuli

Speaker: Jeremy Freeman; HHMI Janelia Farm Research Campus
Authors: Corey M. Ziemba, J. Anthony Movshon, Eero P. Simoncelli, and David J. Heeger; Center for Neural Science, New York University, New York, NY

The visual areas of the primate cerebral cortex provide distinct representations of the visual world, each with its own function and topographic organization. Neurons in primary visual cortex respond selectively to orientation and spatial frequency, whereas neurons in inferotemporal and lateral occipital areas respond selectively to complex objects. But the areas in between, in particular V2 and V4, have been more difficult to differentiate on functional grounds. Bottom-up receptive field mapping is ineffective because these neurons respond poorly to artificial stimuli, and top-down approaches that rely on selecting “interesting” stimuli suffer from the curse of dimensionality and the arbitrariness of the stimulus ensemble. I will describe an alternative approach in which we use the statistics of natural texture images and computational principles of hierarchical coding to generate controlled but naturalistic stimuli, and then use these images as targeted experimental stimuli in electrophysiological and fMRI experiments. Responses to such “naturalistic” stimuli reliably differentiate neurons in area V2 from those in V1, both in single units recorded from macaque monkeys and in human fMRI measurements. In humans, responses to these stimuli, alongside responses to both simpler and more complex stimuli, suggest a simple functional account of the visual cortical cascade: whereas V1 encodes basic spectral properties, V2, V3, and to some extent V4 represent the higher-order statistics of textures. Downstream areas capture the kinds of global structures that are unique to images of natural scenes and objects.
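One simple way to summarize this kind of functional differentiation is sketched below, under assumed simulated response data: a per-unit modulation index contrasts responses to naturalistic textures against responses to spectrally matched noise, so that units sensitive to higher-order texture statistics yield positive indices. The index definition and the simulated firing rates are illustrative, not the analysis pipeline from the study.

import numpy as np

def modulation_index(resp_naturalistic, resp_noise):
    """(naturalistic - noise) / (naturalistic + noise), computed per unit."""
    resp_naturalistic = np.asarray(resp_naturalistic, dtype=float)
    resp_noise = np.asarray(resp_noise, dtype=float)
    return (resp_naturalistic - resp_noise) / (resp_naturalistic + resp_noise)

# Hypothetical example: V1-like units respond about equally to both stimulus classes
# (index near 0), while V2-like units respond more to naturalistic textures (index > 0).
rng = np.random.default_rng(3)
v1_nat, v1_noise = rng.gamma(20, 1, 50), rng.gamma(20, 1, 50)
v2_nat, v2_noise = rng.gamma(28, 1, 50), rng.gamma(20, 1, 50)
print("V1 median index:", np.median(modulation_index(v1_nat, v1_noise)))
print("V2 median index:", np.median(modulation_index(v2_nat, v2_noise)))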

Vision as transformation of representational geometry

Speaker: Nikolaus Kriegeskorte; Medical Research Council, Cognition and Brain Sciences Unit, Cambridge, UK

Vision can be understood as the transformation of representational geometry from one visual area to the next, and across time, as recurrent dynamics converge within a single area. The geometry of a representation can be usefully characterized by a representational distance matrix computed by comparing the patterns of brain activity elicited by a set of visual stimuli. This approach makes it possible to compare representations between brain areas, between different latencies after stimulus onset, between different individuals, and between brains and computational models. I will present results from human functional imaging of early and ventral-stream visual representations. Results from fMRI suggest that the early visual image representation is transformed into an object representation that emphasizes behaviorally important categorical divisions more strongly than is accounted for by visual-feature computational models that are not explicitly optimized to distinguish categories. The categorical clusters appear to be consistent across individual human brains. However, the continuous representational space is unique to each individual and predicts individual idiosyncrasies in object similarity judgements. The representation flexibly emphasizes task-relevant category divisions through subtle distortions of the representational geometry. MEG results further suggest that the categorical divisions emerge dynamically, with the latencies of the categoricality peaks pointing to a role for recurrent processing.
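A minimal sketch of the representational-distance computation, using simulated activity patterns as stand-ins for measured data, is given below: pairwise correlation distances between stimulus-evoked patterns form a representational distance matrix, and two such matrices (e.g. from two areas, or from a brain area and a model) can themselves be compared by rank correlation of their lower triangles. The region names and data shapes are illustrative assumptions.

import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.stats import spearmanr

def rdm(patterns):
    """Stimuli x channels matrix -> stimuli x stimuli correlation-distance matrix."""
    return squareform(pdist(patterns, metric="correlation"))

def compare_rdms(rdm_a, rdm_b):
    """Spearman correlation between the lower triangles of two distance matrices."""
    idx = np.tril_indices_from(rdm_a, k=-1)
    rho, _ = spearmanr(rdm_a[idx], rdm_b[idx])
    return rho

# Hypothetical example: 40 stimuli measured in two regions with 100 and 150 channels.
rng = np.random.default_rng(4)
patterns_early = rng.standard_normal((40, 100))
patterns_ventral = rng.standard_normal((40, 150))
print(compare_rdms(rdm(patterns_early), rdm(patterns_ventral)))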

Modern population approaches for discovering neural representations and for discriminating among algorithms that might produce those representations

Speaker: James J. DiCarlo, MD, PhD; Professor of Neuroscience; Head, Department of Brain and Cognitive Sciences; Investigator, McGovern Institute for Brain Research; Massachusetts Institute of Technology, Cambridge, USA
Authors: Ha Hong and Daniel Yamins; Department of Brain and Cognitive Sciences and McGovern Institute for Brain Research, Massachusetts Institute of Technology, Cambridge, USA

Visual object recognition (OR) is a central problem in systems neuroscience, human psychophysics, and computer vision. The primate ventral stream, which culminates in inferior temporal cortex (IT), is an instantiation of a powerful OR system. To understand this system, our approach is to first drive a wedge into the problem by finding the specific patterns of neuronal activity (a.k.a. neural “representations”) that quantitatively express the brain’s solution to OR. I will argue that, to claim discovery of a neural “representation” for OR, one must show that a proposed population of visual neurons can perfectly predict psychophysical phenomena pertaining to OR. Using simple decoder tools, we have achieved exactly this result, demonstrating that IT representations (as opposed to V4 representations) indeed predict OR phenomena. Moreover, we can “invert” the decoder approach, using large-scale psychophysical measurements to make new, testable predictions about the IT representation. While decoding methods are powerful for exploring the link between neural activity and behavior, they are less well suited for addressing how pixel representations (i.e. images) are transformed into neural representations that subserve OR. To address this issue, we have adopted the representational dissimilarity matrix (RDM) approach promoted by Niko Kriegeskorte. We have recently discovered novel models (i.e. image-computable visual features) that, using the RDM measure of success, explain IT representations dramatically better than all previous models.
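A minimal sketch of the decoder logic, with simulated population responses standing in for recorded data, is given below: a cross-validated linear classifier is trained to report object category from a population's responses, and its accuracy can then be compared across candidate populations (e.g. V4-like vs. IT-like) or against psychophysical performance. The category structure, noise level, and choice of classifier are illustrative assumptions, not the tools used in the study.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
n_images, n_units, n_categories = 400, 150, 8
labels = rng.integers(0, n_categories, n_images)

# Simulated "IT-like" population: responses carry a category signal plus noise.
category_templates = rng.standard_normal((n_categories, n_units))
responses = category_templates[labels] + 2.0 * rng.standard_normal((n_images, n_units))

# Cross-validated linear decoder of object category from the population response.
decoder = LogisticRegression(max_iter=1000)
accuracy = cross_val_score(decoder, responses, labels, cv=5).mean()
print(f"cross-validated decoding accuracy: {accuracy:.2f} (chance = {1 / n_categories:.2f})")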
