Integrative mechanisms for 3D vision: combining psychophysics, computation and neuroscience

Friday, May 7, 1:00 – 3:00 pm
Royal Ballroom 1-3

Organizers: Andrew Glennerster, University of Reading

Presenters: Roland W. Fleming (Max Planck Institute for Biological Cybernetics), James T Todd (Department of Psychology, Ohio State University), Andrew Glennerster (University of Reading), Andrew E Welchman (University of Birmingham), Guy A Orban (K.U. Leuven), Peter Janssen (K.U. Leuven)

Symposium Description

Estimating the three-dimensional (3D) structure of the world around us is a central component of our everyday behavior, supporting our decisions, actions and interactions. The problem faced by the brain is classically described in terms of the difficulty of inferring a 3D world from (“ambiguous”) 2D retinal images. Meeting the computational challenge of inferring depth from retinal samples requires sophisticated neural machinery that learns to exploit multiple sources of visual information that are diagnostic of depth structure. This sophistication at the input level is demonstrated by our flexibility in perceiving shape under radically different viewing situations: for instance, we can gain a vivid impression of depth from a sparse collection of seemingly random dots, as well as from flat paintings. Adding to the complexity, humans exploit depth signals for a range of different behaviors, so the complexity of the input is compounded by multiple functional outputs. Together, these factors make it difficult to investigate empirically the sequence of computations that enables 3D vision.

This symposium brings together speakers from different perspectives to outline progress in understanding 3D vision. Fleming will start, addressing the question of “What is the information?”, using computational analysis of 3D shape to highlight basic principles that produce depth signatures from a range of cues. Todd and Glennerster will both consider the question of “How is this information represented?”, discussing different types of representational schemes and data structures. Welchman, Orban and Janssen will focus on the question of “How is it implemented in cortex?”. Welchman will discuss human fMRI studies that integrate psychophysics with concurrent measures of brain activity. Orban will review fMRI evidence for spatial correspondence in the processing of different depth cues in the human and monkey brain. Janssen will summarize results from single cell electrophysiology, highlighting the similarities and differences between the processing of 3D shape at the extreme ends of the dorsal and ventral pathways. Finally, Glennerster, Orban and Janssen will all address the question of how depth processing is affected by task.

The symposium should attract a wide range of VSS participants, as the topic is a core area of vision science and is enjoying a wave of public enthusiasm with the revival of stereoscopic entertainment formats. Further, the session's goal of linking computational approaches, behavior and neural implementation is scientifically attractive.


From local image measurements to 3D shape

Roland W. Fleming, Max Planck Institute for Biological Cybernetics

There is an explanatory gap between the simple local image measurements of early vision and the complex perceptual inferences involved in estimating object properties such as surface reflectance and 3D shape. The main purpose of my presentation will be to discuss how populations of filters tuned to different orientations and spatial frequencies can be ‘put to good use’ in the estimation of 3D shape. I’ll show how shading, highlights and texture patterns on 3D surfaces lead to highly distinctive signatures in the local image statistics, which the visual system could use in 3D shape estimation. I will discuss how the spatial organization of these measurements provides additional information, and argue that a common front end can explain both similarities and differences between various monocular cues. I’ll also present a number of 3D shape illusions and show how these can be predicted by image statistics, suggesting that human vision does indeed make use of these measurements.
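To make the idea of a filter-population front end concrete, here is a minimal sketch (not Fleming's model) that assumes Gabor quadrature pairs as the oriented filters. The wavelength, envelope width, kernel size, the four-orientation bank, and the helper names `orientation_population` and `grating` are all hypothetical, illustrative choices. At a single image location it reads out which orientation channel responds most strongly.

```python
import math

def gabor_pair(theta, wavelength=8.0, sigma=3.0, half=7):
    """Even (cosine) and odd (sine) Gabor kernels; the carrier is modulated along angle theta."""
    even, odd = [], []
    for y in range(-half, half + 1):
        row_even, row_odd = [], []
        for x in range(-half, half + 1):
            u = x * math.cos(theta) + y * math.sin(theta)  # position along the carrier direction
            envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
            phase = 2 * math.pi * u / wavelength
            row_even.append(envelope * math.cos(phase))
            row_odd.append(envelope * math.sin(phase))
        even.append(row_even)
        odd.append(row_odd)
    return even, odd

def orientation_energy(image, cx, cy, theta):
    """Phase-invariant energy of one quadrature pair centred on pixel (cx, cy)."""
    even, odd = gabor_pair(theta)
    half = len(even) // 2
    r_even = r_odd = 0.0
    for j in range(-half, half + 1):
        for i in range(-half, half + 1):
            v = image[cy + j][cx + i]
            r_even += v * even[j + half][i + half]
            r_odd += v * odd[j + half][i + half]
    return r_even ** 2 + r_odd ** 2

def orientation_population(image, cx, cy, n_orientations=4):
    """Responses of a small bank of filters with evenly spaced orientations (hypothetical helper)."""
    thetas = [k * math.pi / n_orientations for k in range(n_orientations)]
    return {theta: orientation_energy(image, cx, cy, theta) for theta in thetas}

def grating(size, phi, wavelength=8.0):
    """Sinusoidal test image whose luminance varies along angle phi (hypothetical helper)."""
    return [[math.cos(2 * math.pi * (x * math.cos(phi) + y * math.sin(phi)) / wavelength)
             for x in range(size)] for y in range(size)]

# Luminance varying vertically (horizontal stripes): the pi/2 channel should win.
responses = orientation_population(grating(32, math.pi / 2), 16, 16)
preferred = max(responses, key=responses.get)
```

On real surface images, it is the spatial pattern of such responses across many locations (for example, how the dominant orientation flows across a shaded or textured surface) that carries shape information; the sketch covers only the local measurement step.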

The perceptual representation of 3D shape

James T Todd, Department of Psychology, Ohio State University

One of the fundamental issues in the study of 3D surface perception is to identify the specific aspects of an object’s structure that form the primitive components of an observer’s perceptual knowledge. After all, in order to understand shape perception, it is first necessary to define what “shape” is. In this presentation, I will assess several types of data structures that have been proposed for representing 3D surfaces. One of the most common data structures employed for this purpose involves a map of the geometric properties in each local neighborhood, such as depth, orientation or curvature. Numerous experiments have been performed in which observers have been required to make judgments of local surface properties, but the results reveal that these judgments are most often systematically distorted relative to the ground truth and surprisingly imprecise, thus suggesting that local property maps may not be the foundation of our perceptual knowledge about 3D shape. An alternative type of data structure for representing 3D shape involves a graph of the configural relationships among qualitatively distinct surface features, such as edges and vertices. The psychological validity of this type of representation has been supported by numerous psychophysical experiments, and by electrophysiological studies of macaque IT. A third type of data structure will also be considered in which surfaces are represented as a tiling of qualitatively distinct regions based on their patterns of curvature, and there is some neurophysiological evidence to suggest that this type of representation occurs in several areas of the primate cortex.
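The third data structure, a tiling of a surface into qualitatively distinct curvature regions, can be sketched as follows. This is an illustration under stated assumptions, not the representation discussed in the talk: it uses a shape index in the spirit of Koenderink and van Doorn (with the sign convention chosen so that a convex cap scores +1), approximates principal curvatures by the eigenvalues of the negated Hessian of a height map (accurate only where the surface roughly faces the viewer), and uses hypothetical bin boundaries and function names.

```python
import math
from collections import deque

def shape_class(k1, k2, flat_tol=1e-9):
    """Qualitative label from principal curvatures (convex toward the viewer = positive)."""
    k1, k2 = max(k1, k2), min(k1, k2)
    if math.hypot(k1, k2) < flat_tol:
        return "flat"
    s = (2 / math.pi) * math.atan2(k1 + k2, k1 - k2)  # shape index in [-1, 1]
    if s > 0.75:
        return "cap"
    if s > 0.25:
        return "ridge"
    if s >= -0.25:
        return "saddle"
    if s >= -0.75:
        return "rut"
    return "cup"

def principal_curvatures(z, x, y):
    """Small-slope approximation: eigenvalues of the negated finite-difference Hessian."""
    fxx = z[y][x + 1] - 2 * z[y][x] + z[y][x - 1]
    fyy = z[y + 1][x] - 2 * z[y][x] + z[y - 1][x]
    fxy = (z[y + 1][x + 1] - z[y + 1][x - 1] - z[y - 1][x + 1] + z[y - 1][x - 1]) / 4
    a, b, c = -fxx, -fxy, -fyy
    mean = (a + c) / 2
    radius = math.sqrt(((a - c) / 2) ** 2 + b * b)
    return mean + radius, mean - radius

def curvature_tiling(z):
    """Label interior samples, then merge 4-connected samples with the same label into regions."""
    h, w = len(z), len(z[0])
    labels = {(x, y): shape_class(*principal_curvatures(z, x, y))
              for y in range(1, h - 1) for x in range(1, w - 1)}
    seen, regions = set(), []
    for start, label in labels.items():
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:  # breadth-first flood fill of one region
            x, y = queue.popleft()
            size += 1
            for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                if nb in labels and nb not in seen and labels[nb] == label:
                    seen.add(nb)
                    queue.append(nb)
        regions.append((label, size))
    return regions

# A dome bulging toward the viewer forms a single "cap" region.
dome = [[-(x * x + y * y) for x in range(-3, 4)] for y in range(-3, 4)]
tiles = curvature_tiling(dome)
```

The output is a short list of labelled regions rather than a dense map of local values, which is the key contrast with the local property maps described earlier in the abstract.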

View-based representations and their relevance to human 3D vision

Andrew Glennerster, School of Psychology and CLS, University of Reading

In computer vision, applications that previously involved the generation of 3D models can now be achieved using view-based representations. In the movie industry this makes sense, since both the inputs and outputs of the algorithms are images, but the same could also be argued of human 3D vision. We explore the implications of view-based models in our experiments.

In an immersive virtual environment, observers fail to notice the expansion of a room around them and consequently make gross errors when comparing the size of objects. This result is difficult to explain if the visual system continuously generates a 3D model of the scene using known baseline information from interocular separation or proprioception. If, on the other hand, observers use a view-based representation to guide their actions, they may have an expectation of the images they will receive but be insensitive to the rate at which images arrive as they walk.
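The geometric logic behind this argument can be illustrated with simple viewing geometry. The sketch assumes an interocular separation of 6.5 cm and an arbitrary expansion factor of 4 (illustrative values, not parameters of the experiments) and covers only the interocular baseline, not proprioception. Scaling a room and its contents about the observer leaves every visual angle unchanged but alters vergence (and hence disparity), which a system building a metric 3D model from the interocular baseline should be able to detect.

```python
import math

IOD = 0.065  # assumed interocular separation in metres

def angular_size(size, distance):
    """Visual angle (radians) subtended by an object of a given size at a given distance."""
    return 2 * math.atan(size / (2 * distance))

def vergence_angle(distance, iod=IOD):
    """Angle between the two lines of sight when fixating a point straight ahead."""
    return 2 * math.atan(iod / (2 * distance))

def distance_from_vergence(angle, iod=IOD):
    """Inverse of vergence_angle: the metric distance a baseline-scaled model would recover."""
    return iod / (2 * math.tan(angle / 2))

# Hypothetical 0.5 m object at 2 m, before and after everything is scaled by 4 about the observer.
EXPANSION = 4.0
before = (0.5, 2.0)
after = (0.5 * EXPANSION, 2.0 * EXPANSION)
```

Because `angular_size` is identical before and after the expansion, a purely image-based comparison cannot register the change, whereas `distance_from_vergence` recovers the new, fourfold distance; the insensitivity reported in the abstract is therefore hard to reconcile with a model that relies on that baseline.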

In the same context, I will discuss psychophysical evidence on sensitivity to depth relief with respect to surfaces. The data are compatible with a hierarchical encoding of position and disparity similar to the affine model of Koenderink and van Doorn (1991).  Finally, I will discuss two experiments that show how changing the observer’s task changes their performance in a way that is incompatible with the visual system storing a 3D model of the shape or location of objects. Such task-dependency indicates that the visual system maintains information in a more ‘raw’ form than a 3D model.
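One way to read "hierarchical encoding" is sketched below, loosely following the reference-plane idea in Koenderink and van Doorn's affine construction. Three points define a reference plane in (x, y, disparity), and each further point is coded only by its disparity relief relative to that plane. This is an illustrative assumption rather than the model tested in the experiments, and the function names and values are hypothetical. A useful property is that adding any common planar component (a shift in depth or a global slant) to all points leaves the relief code unchanged.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def reference_plane(p1, p2, p3):
    """Solve d = a*x + b*y + c exactly through three (x, y, disparity) points (Cramer's rule)."""
    pts = (p1, p2, p3)
    A = [[x, y, 1.0] for x, y, _ in pts]
    rhs = [d for _, _, d in pts]
    D = det3(A)
    if abs(D) < 1e-12:
        raise ValueError("reference points are collinear")
    coeffs = []
    for col in range(3):
        M = [row[:] for row in A]
        for r in range(3):
            M[r][col] = rhs[r]  # replace one column with the disparities
        coeffs.append(det3(M) / D)
    return tuple(coeffs)  # (a, b, c)

def relief(point, plane):
    """Second level of the hierarchy: disparity of a point relative to the reference plane."""
    x, y, d = point
    a, b, c = plane
    return d - (a * x + b * y + c)

def add_global_plane(point, a, b, c):
    """Add a common slant (a, b) and depth offset c to a point's disparity."""
    x, y, d = point
    return (x, y, d + a * x + b * y + c)

refs = [(0.0, 0.0, 0.3), (1.0, 0.0, 0.4), (0.0, 1.0, 0.5)]  # hypothetical reference points
probe = (1.0, 1.0, 0.65)
r0 = relief(probe, reference_plane(*refs))
moved = [add_global_plane(p, 0.7, -0.2, 1.0) for p in refs + [probe]]
r1 = relief(moved[3], reference_plane(*moved[:3]))  # same relief after a global slant and shift
```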

The functional roles of visual cortex in representing 3D shape

Andrew E Welchman, School of Psychology, University of Birmingham

Estimating the depth structure of the environment is a principal function of the visual system, enabling many key computations, such as segmentation, object recognition, material perception and the guidance of movements. The brain exploits a range of cues to estimate depth, combining information from sources ranging from shading and shadows to linear perspective, motion and binocular disparity. Despite the importance of this process, we still know relatively little about the functional roles of different cortical areas in processing depth signals in the human brain. Here I will review recent human fMRI work that combines established psychophysical methods, high resolution imaging and advanced analysis methods to address this question. In particular, I will describe fMRI paradigms that integrate psychophysical tasks in order to look for a correspondence between changes in behavioural performance and fMRI activity. Further, I will review information-based fMRI analysis methods that seek to investigate different types of depth representation in parts of visual cortex. This work suggests a key role for a confined ensemble of dorsal visual areas in processing information relevant to judgments of 3D shape.
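Information-based analyses ask whether the multivoxel activity pattern in an area carries enough information to tell stimulus conditions apart. The general idea is sketched below under stated assumptions: a nearest-centroid classifier with leave-one-trial-out cross-validation (one common choice among many; the studies described may use different classifiers) applied to entirely synthetic voxel patterns. All function names and parameter values are hypothetical.

```python
import random

def simulate_patterns(n_trials=20, n_voxels=50, signal=1.0, noise=1.0, seed=1):
    """Synthetic voxel patterns for two depth conditions (purely illustrative, not real data)."""
    rng = random.Random(seed)
    # each hypothetical voxel weakly prefers one condition (+1) or the other (-1)
    preference = [rng.choice((-1.0, 1.0)) for _ in range(n_voxels)]
    data = []
    for label in (0, 1):
        sign = 1.0 if label == 1 else -1.0
        for _ in range(n_trials):
            pattern = [sign * signal / 2 * p + rng.gauss(0.0, noise) for p in preference]
            data.append((pattern, label))
    return data

def centroid(patterns):
    """Mean pattern, computed voxel by voxel."""
    n = len(patterns)
    return [sum(values) / n for values in zip(*patterns)]

def squared_distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def leave_one_out_accuracy(data):
    """Nearest-centroid decoding with leave-one-trial-out cross-validation."""
    correct = 0
    for i, (pattern, label) in enumerate(data):
        train = data[:i] + data[i + 1:]  # hold out the test trial
        centroids = {lab: centroid([p for p, l in train if l == lab]) for lab in (0, 1)}
        predicted = min(centroids, key=lambda lab: squared_distance(pattern, centroids[lab]))
        correct += int(predicted == label)
    return correct / len(data)

accuracy = leave_one_out_accuracy(simulate_patterns())
```

Whether an accuracy is reliably above chance (0.5 for two conditions) is often assessed with permutation tests rather than by eye.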

Extracting depth structure from multiple cues

Guy A Orban, K.U. Leuven

Multiple cues provide information about the depth structure of objects: disparity, motion, shading and texture. Functional imaging studies in humans have been performed to localize the regions involved in extracting depth structure from these four cues. In all these studies, extensive controls were used to obtain activation sites specific to depth structure. Depth structure from motion, stereo and texture activates regions in both parietal and ventral cortex, but shading activates only a ventral region. For stereo and motion, the balance between dorsal and ventral activation depends on the type of stimulus: boundaries versus surfaces. In monkeys, results are similar to those obtained in humans, except that motion is a weaker cue in monkey parietal cortex. At the single-cell level, neurons are selective for gradients of speed, disparity and texture. Neurons selective for first- and second-order gradients of disparity will be discussed by P Janssen. I will concentrate on neurons selective for speed gradients and review recent data indicating that a majority of FST neurons are selective for second-order speed gradients.
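A first-order speed gradient is a linear change in image speed across a surface; a second-order gradient is a change in that slope, giving a curved speed profile. Under simplifying assumptions (orthographic projection and a small rotation in depth about a vertical axis, so that horizontal image speed is proportional to depth), a planar surface yields a linear speed profile and a curved surface a quadratic one. The sketch below is an illustration rather than the analysis used in the studies: it separates the two orders with a least-squares quadratic fit, and its function name and sample values are hypothetical.

```python
def gradient_orders(positions, speeds):
    """Least-squares fit v(x) = c0 + c1*x + c2*x**2 on positions symmetric about 0.

    With symmetric sampling the odd moments vanish, so the linear term
    decouples from the constant and quadratic terms.
    """
    n = len(positions)
    if sorted(positions) != sorted(-x for x in positions):
        raise ValueError("positions must be symmetric about zero")
    s2 = sum(x ** 2 for x in positions)
    s4 = sum(x ** 4 for x in positions)
    sv = sum(speeds)
    sxv = sum(x * v for x, v in zip(positions, speeds))
    sx2v = sum(x * x * v for x, v in zip(positions, speeds))
    c1 = sxv / s2  # first-order (linear) speed gradient
    det = n * s4 - s2 * s2  # 2x2 normal equations for c0 and c2
    c0 = (sv * s4 - s2 * sx2v) / det
    c2 = (n * sx2v - s2 * sv) / det  # second-order term: curvature of the speed profile
    return c0, c1, c2

xs = list(range(-3, 4))  # hypothetical sample positions across the surface
curved = [2.0 + 0.5 * x + 0.25 * x * x for x in xs]  # curved surface: quadratic speed profile
planar = [1.0 + 0.3 * x for x in xs]  # slanted plane: linear speed profile
curved_fit = gradient_orders(xs, curved)
planar_fit = gradient_orders(xs, planar)
```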

Neurons selective to disparity defined shape in the temporal and parietal cortex

Peter Janssen, K.U. Leuven; Bram-Ernst Verhoef, K.U. Leuven

A large proportion of the neurons in the rostral lower bank of the Superior Temporal Sulcus, which is part of IT, respond selectively to disparity-defined 3D shape (Janssen et al., 1999; Janssen et al., 2000). These IT neurons preserve their selectivity for different positions-in-depth, which proves that they respond to the spatial variation of disparity along the vertical axis of the shape (higher-order disparity selectivity). We have studied the responses of neurons in parietal area AIP, the end stage of the dorsal visual stream and crucial for object grasping, to the same disparity-defined 3D shapes (Srivastava et al., 2009). In this presentation I will review the differences between IT and AIP in the neural representation of 3D shape. More recent studies have investigated the role of AIP and IT in the perceptual discrimination of 3D shape using simultaneous recordings of spikes and local field potentials in the two areas, psychophysics and reversible inactivations. AIP and IT show strong synchronized activity during 3D-shape discrimination, but only IT activity correlates with perceptual choice. Reversible inactivation of AIP produces a deficit in grasping but does not affect the perceptual discrimination of 3D shape. Hence the end stages of both the dorsal and the ventral visual stream process disparity-defined 3D shape in clearly distinct ways. In line with the proposed behavioral role of the two processing streams, the 3D-shape representation in AIP is action-oriented but not crucial for 3D-shape perception.
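A correlation between single-neuron activity and perceptual choice is commonly quantified as a choice probability: the area under the ROC curve comparing a neuron's responses on trials grouped by the animal's choice, where 0.5 means no relation. The abstract does not state which measure was used, so the sketch below is an assumption. It computes the ROC area directly as a Mann-Whitney probability, and the spike counts are hypothetical, not data from the study.

```python
def choice_probability(counts_choice_a, counts_choice_b):
    """ROC area: P(response on a choice-A trial > response on a choice-B trial); ties count half."""
    if not counts_choice_a or not counts_choice_b:
        raise ValueError("need trials for both choices")
    greater = ties = 0
    for a in counts_choice_a:
        for b in counts_choice_b:
            if a > b:
                greater += 1
            elif a == b:
                ties += 1
    return (greater + 0.5 * ties) / (len(counts_choice_a) * len(counts_choice_b))

# Hypothetical spike counts for one neuron within a single stimulus condition,
# split by whether the animal reported "convex" or "concave".
convex_choices = [12, 15, 9, 14, 11]
concave_choices = [8, 10, 9, 7, 12]
cp = choice_probability(convex_choices, concave_choices)
```

Grouping trials within one stimulus condition matters here: otherwise stimulus-driven response differences would masquerade as choice effects. The significance of a choice probability is often assessed by permutation.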