S5 – Prediction in Visual Processing

Friday, May 6, 2:30 – 4:30 pm, Royal Ballroom 4-5

Organizers: Jacqueline M. Fulvio, Paul R. Schrater; University of Minnesota

Presenters: Jacqueline M. Fulvio, University of Minnesota; Antonio Torralba, Massachusetts Institute of Technology; Lars Muckli, University of Glasgow, UK; Eileen Kowler, Rutgers University; Doug Crawford, York University; Robert A. Jacobs, University of Rochester

Symposium Description

In a world constantly in flux, we are faced with uncertainty about the future and must make predictions about what lies ahead. However, research on visual processing is dominated by understanding information processing rather than future prediction – it lives in the present (and sometimes the past) without considering what lies ahead.

Yet prediction is commonplace in natural vision. When crossing a busy street in New York City, for example, successful prediction determines both the survival of the pedestrian and the employment status of the cab driver.

In fact, prediction plays an important role in almost all aspects of vision with a dynamic component, including object interception, eye-movement planning, visually-guided reaching, visual search, and rapid decision-making under risk, and is implicit in “top-down” processing in the interpretation of static images (e.g., object recognition and shape from shading). Prediction entails combining current sensory information with an internal model (“beliefs”) of the world to fill informational gaps and derive estimates of the world’s future “hidden” state. Naturally, the success of the prediction is limited by the quality of the information and of the internal model. This has been demonstrated across the variety of behaviors listed above.

The symposium will focus on the importance of analyzing the predictive components of human behavior to understand visual processing in the brain. The prevalence of prediction suggests there may be a commonality in both computational and neural structures supporting it. We believe that many problems in vision can be profitably recast in terms of models of prediction, providing new theoretical insights and potential transfer of knowledge.

Speakers representing a variety of research areas will lead a discussion under the umbrella of prediction that (i) identifies characteristics and limitations of predictive behavior; (ii) re-frames outstanding questions in terms of predictive modeling; and (iii) outlines experimental manipulations of predictive task components for future work. The symposium is expected to spark interest among all areas represented at the conference, with the goal that a common set of predictive principles used by the brain will emerge as the discussion unfolds.


Predictive processing through occlusion

Jacqueline M. Fulvio, University of Minnesota; Paul R. Schrater, University of Minnesota

Missing information is a challenge for sensorimotor processing, and it is ubiquitous. Portions of the sensory data may be missing in space, occluded by scene clutter or camouflage, or missing in time, as when task demands require anticipating future states while we negotiate a busy intersection. Rather than leaving us immobilized by missing information, predictive processing fills in the gaps so that we may continue to act in the world. While much of perceptual-motor research implicitly studies predictive processing, a specific set of predictive principles used by the brain has not been adequately formalized. I will draw upon our recent work on visual extrapolation, which requires observers to predict an object’s location behind an occluder as well as its reemergence point. Through the results, I will demonstrate that these predictions are derived from model-based forward look-ahead: current sensory data is applied to an internal model of the world. I will also show that predictions are subject to performance trade-offs, such that the choice of internal model may be a flexible one that appropriately weights the quality (i.e., uncertainty) of the sensory measurements and the quality (i.e., complexity) of the internal model. Finally, having established the role of internal models in prediction, I will conclude with a discussion about how prediction may be used as a tool in the experimental context to encourage general model learning, with evidence from our recent work on perceptual learning.
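As a rough illustration of what forward look-ahead with an internal model can mean computationally, the sketch below extrapolates an object's position behind an occluder. It is not the authors' model. The constant-velocity assumption, the independence of position and velocity errors, and the names `extrapolate` and `drift` are all assumptions made for this example.

```python
def extrapolate(pos, vel, var_pos, var_vel, dt, drift=0.0):
    """Predict position after `dt` seconds with no sensory input.

    Hypothetical internal model: constant velocity, independent errors on
    the last seen position and velocity, plus optional random-walk drift.
    Returns the predicted mean position and its variance.
    """
    mean = pos + vel * dt
    # Velocity error accumulates quadratically with time spent occluded.
    var = var_pos + var_vel * dt ** 2 + drift * dt
    return mean, var


if __name__ == "__main__":
    # Object last seen at position 0.0 moving at 2.0 units/s.
    for occlusion_time in (0.5, 1.0, 1.5):
        mean, var = extrapolate(0.0, 2.0, 0.01, 0.04, occlusion_time)
        print(f"t={occlusion_time:.1f}s  predicted={mean:.2f}  variance={var:.3f}")
```

Under these assumptions the predicted position moves linearly while its variance grows with occlusion time, which is one simple way to express that a prediction is limited by the quality of both the measurements and the model.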

Predicting the future

Antonio Torralba, Massachusetts Institute of Technology; Jenny Yuen, Massachusetts Institute of Technology

In this talk I will draw a link to computer vision and its recent techniques for addressing the problem of predicting the future. Some of the representations used to address this problem in computer vision are reminiscent of current views on scene understanding in humans. When given a single static picture, humans can not only interpret the instantaneous content captured by the image but also infer the chain of dynamic events that are likely to happen in the near future. Similarly, when a human observes a short video, it is easy to decide whether the event taking place in the video is normal or unexpected, even if the video depicts a place unfamiliar to the viewer. This is in contrast with work in computer vision, where current systems rely on thousands of hours of video recorded at a single place in order to identify what constitutes an unusual event. In this talk I will discuss techniques for predicting the future based on a large collection of stored memories. We show how, relying on large collections of videos and using global image features such as the ones used to model fast scene recognition, we can index events stored in memory that are similar to the query, and how we can build a simple model of the distribution of expected motions. Consequently, the model can make predictions of what is likely to happen in the future, as well as evaluate how unusual a particular event is.

Predictive coding – contextual processing in primary visual cortex V1

Lars Muckli, University of Glasgow, UK; Petra Vetter, University of Glasgow, UK; Fraser Smith, University of Glasgow, UK

Primary visual cortex (V1) is often characterized by the receptive field properties of its feed-forward input. Direct thalamo-fugal input to any V1 cell, however, is less than 5% (Douglas and Martin 2007), and much of V1 response variance remains unexplained. We propose that one of the core functions of cortical processing is to predict upcoming events based on contextual processing. To gain a better understanding of contextual processing in the cortex, we focused our fMRI studies on non-stimulated retinotopic regions of early visual cortex (2). We investigated activation along the non-stimulated long-range apparent motion path (1), occluded a visual quarterfield of a natural visual scene (3), or blindfolded our subjects and presented environmental sounds (4). We were able to demonstrate predictive activity along the illusory apparent motion path (1), use decoding to classify natural scenes from non-stimulated regions in V1 (3), and decode environmental sounds from V2 and V3, but not from V1 (4). Is this contextual processing useful for predicting upcoming visual events? To investigate predictability, we used our contextual stimuli (apparent motion) as the prime stimuli and tested with a probe stimulus along the apparent motion path. We found that predicted stimuli are processed more efficiently, leading to less fMRI signal and better detectability (1). In summary, we have found brain imaging evidence that is consistent with the hypothesis of predictive coding in early visual areas.

Prediction in oculomotor control

Eileen Kowler, Rutgers University; Cordelia Aitkin, Rutgers University; Elio Santos, Rutgers University; John Wilder, Rutgers University

Eye movements are crucial for vision. Saccadic eye movements bring the line of sight to selected objects, and smooth pursuit maintains the line of sight on moving objects. A major potential obstacle to achieving accurate and precise saccadic or pursuit performance is the inevitable sensorimotor delay that accompanies the processing of the position or motion of visual signals. To overcome the deleterious effects of such delays, eye movements display a remarkable capacity to respond on the basis of predicted sensory signals. Behavioral and neurophysiological studies over the past several years have addressed the mechanisms responsible for predictive eye movements. This talk will review key developments, focusing on anticipatory smooth eye movements (smooth eye movements in the direction of the expected future motion of a target). Anticipatory smooth eye movements (a) can be triggered by high-level, symbolic cues that signal the future path of a target, and (b) are generated by neural pathways distinct from those responsible for maintained smooth pursuit. When the predictability of the target motion decreases, anticipatory smooth eye movements are not suppressed, but rather reflect expectations about the likely future path of the target estimated on the basis of the recent past history of motions. Comparable effects of expectations have been shown to apply to the temporal pattern of saccades. The pervasive influence of prediction on oculomotor control suggests that one of the more important benefits of the ability to generate predictions from either explicit cues or statistical estimates is to ensure accurate and timely oculomotor performance.

Calculation of accurate 3-D reach commands from initial retinal and extra-retinal conditions

Doug Crawford, York University; Gunnar Blohm, Queen’s University

Reach movements can be guided in ‘closed loop’ fashion, using visual feedback, but in biological systems such feedback is relatively slow. Thus rapid movements require ‘open loop’ transformations based on initial retinal and extra-retinal conditions. This is complicated, because the retina is attached to the interior surface of a sphere (the eye) that rotates three-dimensionally with respect to the world, the other eye, and effectors such as the reach system. Further, head movement causes the eyes to translate with respect to both the visual world and the shoulder. Optimism continues to abound that linear approximations will capture the main properties of this system (i.e., most visuomotor studies implicitly treat the retina as a flat, shifting plane), but unfortunately this ignores several fundamentals that the real brain must deal with. Amongst these is the need for eye and head orientation signals to solve the spatial relationships between patterns of stimulation on the two retinas (for depth vision) and between the external world and motor effectors. Here we will describe recent efforts to 1) understand the geometric problems that the brain encounters in planning reach, 2) determine if the brain actually solves these problems, and 3) model how the brain might solve these problems.

Are People Successful at Learning Sequences of Actions on a Perceptual Matching Task?

Robert A. Jacobs, University of Rochester; Reiko Yakushijin, Aoyama Gakuin University

Human subjects were trained to perform a perceptual matching task requiring them to manipulate comparison objects until they matched target objects using the fewest manipulations possible. Efficient performance of this task requires an understanding of the hidden or latent causal structure governing the relationships between actions and perceptual outcomes. We use two benchmarks to evaluate the quality of subjects’ learning. One benchmark is based on optimal performance as calculated by a dynamic programming procedure. The other is based on an adaptive computational agent that uses a reinforcement learning method known as Q-learning to learn to perform the task. Our analyses suggest that subjects were indeed successful learners. In particular, they learned to perform the perceptual matching task in a near-optimal manner (i.e., using a small number of manipulations) at the end of training. Subjects were able to achieve near-optimal performance because they learned, at least partially, the causal structure underlying the task. In addition, subjects’ performances were broadly consistent with those of model-based reinforcement learning agents that built and used internal models of how their actions influenced the external environment. On the basis of these results, we hypothesize that people will achieve near-optimal performances on tasks requiring sequences of actions, especially sensorimotor tasks with underlying latent causal structures, when they can detect the effect of their actions on the environment, and when they can represent and reason about these effects using an internal mental model.
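To make the two benchmarks concrete, the sketch below contrasts a dynamic programming solution with a tabular Q-learning agent on a toy stand-in for a matching task. The task itself is hypothetical and is not the one used in the study: the state is a mismatch level from 0 (matched) to 5, each manipulation moves the mismatch one step down or up, and every manipulation costs 1. The learning rate, discount factor, and exploration rate are likewise assumptions chosen for illustration.

```python
import random

N_STATES = 6         # hypothetical mismatch levels 0..5 (0 = matched)
ACTIONS = (-1, +1)   # hypothetical manipulations: reduce or increase mismatch
GAMMA = 0.9          # assumed discount factor


def step(state, action):
    """Deterministic toy dynamics; every manipulation costs 1."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    return next_state, -1.0


def value_iteration(sweeps=100):
    """Dynamic programming benchmark: optimal value of each state."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(1, N_STATES):  # state 0 is terminal and stays at 0
            values[s] = max(r + GAMMA * values[s2]
                            for s2, r in (step(s, a) for a in ACTIONS))
    return values


def q_learning(episodes=2000, alpha=0.5, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = rng.randrange(1, N_STATES)
        for _ in range(50):  # cap on manipulations per episode
            if rng.random() < epsilon:
                a = rng.randrange(len(ACTIONS))
            else:
                a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
            s2, r = step(s, ACTIONS[a])
            # Move the estimate toward reward plus discounted best next value.
            q[s][a] += alpha * (r + GAMMA * max(q[s2]) - q[s][a])
            s = s2
            if s == 0:
                break
    return q


def greedy_steps(q, start, limit=50):
    """Number of manipulations the learned greedy policy needs from `start`."""
    s, n = start, 0
    while s != 0 and n < limit:
        a = max(range(len(ACTIONS)), key=lambda i: q[s][i])
        s, _ = step(s, ACTIONS[a])
        n += 1
    return n


if __name__ == "__main__":
    print("DP values:", [round(v, 3) for v in value_iteration()])
    q = q_learning()
    print("Greedy steps per start:", [greedy_steps(q, s) for s in range(1, N_STATES)])
```

In this toy setting the optimal number of manipulations from mismatch level s is simply s, so a learner can be scored by comparing the length of its action sequence against that benchmark.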