Cortical organization and dynamics for visual perception and beyond


Friday, May 9, 2008, 1:00 – 3:00 pm Royal Palm 4

Organizer: Zoe Kourtzi (University of Birmingham)

Presenters: Martin I. Sereno (UCL and Birkbeck, London), Uri Hasson (New York University), Wim Vanduffel (Athinoula A. Martinos Center for Biomedical Imaging, Massachusetts General Hospital, and Laboratorium voor Neurofysiologie en Psychofysiologie, K.U. Leuven Medical School, Campus Gasthuisberg, Belgium), Charles E. Connor (Johns Hopkins University School of Medicine), Geoffrey M. Boynton (University of Washington), Pieter R. Roelfsema (Netherlands Institute for Neuroscience)

Symposium Description

The symposium aims to showcase state-of-the-art work and methods for studying the cortical dynamics that mediate complex and adaptive behaviours.

Extensive work in anatomy, neurophysiology and brain imaging has approached this challenge by studying the topography and neural function of discrete cortical structures in the human and non-human primate brain. This approach has been very successful in generating a roadmap of the primate brain: identifying a large number of different cortical areas associated with different functions and cognitive abilities. However, understanding how the brain generates complex and adaptive behaviours entails extending beyond isolated cortical centres and investigating the spatio-temporal dynamics that underlie information processing within and across cortical networks.

Recent developments in multi-site neurophysiological recordings and stimulation combined with advances in brain imaging have provided powerful methods for studying cortical circuits and novel insights into cortical dynamics.

The symposium will bring together pioneers in the study of cortical circuits in the human and the monkey brain and combine evidence from interdisciplinary approaches: physiology, imaging, computational modelling.

First we will present brain imaging work that characterizes the common principles of spatial and temporal organization across and beyond the human visual cortex (Sereno, Hasson). Second, we will discuss studies that delineate the causal interactions within these cortical circuits combining fMRI and microstimulation (Vanduffel). Third, we will discuss neurophysiological evidence for the functional role of these spatiotemporal interactions in the integration of sensory information to global percepts for visual recognition and actions (Connor). Fourth, we will present brain imaging work showing that cortical circuits adapt to the task demands and the attentional state of the observer (Boynton). Finally, we will present computational approaches investigating how attention and learning shape interactions within cortical circuits for adaptive behaviour (Roelfsema).

Thus, the symposium will serve as a forum for discussing novel evidence on cortical organization and dynamics emerging from current human and animal research, and as a tutorial on state-of-the-art interdisciplinary methods for research in this field. As such, the symposium will target a broad audience of researchers and students in the Vision Sciences Society interested in understanding the link between brain and behaviour.

Abstracts

Finding the parts of the cortex

Martin I. Sereno

Understanding brain dynamics requires knowing what its parts are. Human neuroimaging has attempted this by contrasting high-level cognitive tasks averaged across subjects in 3-D. Two problems arise: (1) higher-level tasks generate activity in multiple cortical areas, some of which adjoin each other, and (2) cross-subject 3-D averages must use blurring kernels close to the modal size of human cortical areas (1 cm) to overcome anatomical variation and variation in how subjects perform tasks. As a result, even liberal statistical thresholds underestimate the extent of cortex involved, and activation borders only accidentally coincide with cortical area borders.

Another way to subdivide cortex is to find receptotopic (retinotopic, tonotopic, somatotopic) maps. Topological retinal maps were expected in V1 and early secondary visual areas based on non-human primate data. However, recent work in parietal, temporal, cingulate, and frontal cortex shows that these maps are present at higher levels, extending to the boundaries between modalities (e.g., VIP). This was not expected on the basis of work in animals because higher areas have larger receptive fields with a substantial degree of scatter. Independent manipulation of stimulus and attention shows that higher level maps are largely maps of attention. Three possible reasons why spatial maps might persist at high levels are: (1) intracortical connections are overwhelmingly local, (2) sensory space (retinal, frequency, skin position) is the most important feature for distinguishing events, and (3) cortical space remains a convenient way to allocate processing, even if it is not explicitly spatial.

A hierarchy of temporal receptive windows in human cortex

Uri Hasson, Eunice Yang, Ignacio Vallines, David Heeger, and Nava Rubin

Real-world events unfold at different time scales, and therefore cognitive and neuronal processes must likewise occur at different time scales. We present a novel procedure that identifies brain regions responsive to sensory information accumulated over different time scales. We measured fMRI activity while observers viewed silent films presented forward, backward, or piecewise-scrambled in time. In a first experiment, responses to backward presentations were time-reversed and correlated with those to forward presentations. In visual cortex, this yielded high correlation values, indicating responses were driven by stimulation over short time scales. In contrast, responses depended strongly on time-reversal in the Superior Temporal Sulcus (STS), Precuneus, posterior Lateral Sulcus (LS), Temporal Parietal Junction (TPJ) and Frontal Eye Field (FEF). These regions showed highly reproducible responses for repeated forward, but not backward presentations. In a second experiment, stimulus time scale was parametrically varied by shuffling the order of segments from the same films. The results show clear differences in temporal characteristics, with LS, TPJ and FEF responses depending on information accumulated over longer durations (~ 36 s) than STS and Precuneus (~12 s). We conclude that, similar to the known cortical hierarchy of spatial receptive fields, there is a hierarchy of progressively longer temporal receptive windows in the human brain.
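
As a rough illustration of the analysis logic (not the authors' actual pipeline), the Python sketch below correlates a region's response to the forward film with its time-reversed response to the backward film; the traces and noise levels are made up, and a high correlation marks a region driven by stimulation over short time scales.

import numpy as np

def reversal_correlation(resp_forward, resp_backward):
    """Pearson correlation between the response to the forward film and the
    time-reversed response to the backward film. High values indicate a
    region driven by short-time-scale stimulation (insensitive to reversal)."""
    return np.corrcoef(resp_forward, resp_backward[::-1])[0, 1]

# Toy illustration with synthetic traces (not real data): a region that
# responds moment-to-moment to the stimulus gives a high reversal correlation.
rng = np.random.default_rng(0)
stim = rng.standard_normal(300)                          # frame-by-frame stimulus drive
resp_fwd = stim + 0.3 * rng.standard_normal(300)         # response to forward film
resp_bwd = stim[::-1] + 0.3 * rng.standard_normal(300)   # response to backward film
print(round(reversal_correlation(resp_fwd, resp_bwd), 2))  # close to 1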

Investigating causal functional interactions between brain regions by combining fMRI and intracortical electrical microstimulation in awake behaving monkeys

Wim Vanduffel

Areas of the frontal and parietal cortex are thought to exert control over information flow in the visual cortex through feedback signals (Kastner and Ungerleider, 2000; Moore, 2003). Although a plethora of studies provided correlation data to support this hypothesis, corroborating causal evidence is virtually absent (but see e.g. Moore and Armstrong, 2003). Also, several models suggest that the frontal signals modulating incoming sensory activity are gated by bottom-up stimulation (van der Velde and de Kamps, 2001; Roelfsema, 2006). To test these models and examine the spatial organization of any observed modulations, we developed a combination of fMRI (Vanduffel et al. 2001) and chronic electrical microstimulation (EM) in awake, behaving monkeys. This approach allowed us to investigate the impact of increased frontal eye field (FEF) output, using biologically relevant currents, on visually-driven responses throughout occipito-temporal cortex.

Activity in higher-order visual areas, monosynaptically connected to the FEF, was strongly modulated in the absence of visual stimulation, showing that the combination of fMRI with EM holds great potential as an in vivo tractography tool (see also Tolias et al. 2005). Activity in early visual areas, however, could only be modulated in the presence of bottom-up stimulation, resulting in a topographically specific pattern of enhancement and suppression. This result suggests that bottom-up activation of recurrent connections is needed to enable top-down modulation in visual cortex. We furthermore uncovered a potentially new subdivision in many areas of the visual cortex, as the regions with strong visual responses are largely separate from regions influenced by feedback.

Spatiotemporal integration of object structure information

Charles E. Connor

Image representation in early visual cortex is extremely local. Object perception depends on spatial integration of this local information by neurons at later cortical stages processing larger image regions. We have studied the spatial and temporal characteristics of this integration process at multiple cortical stages in the macaque monkey. We have found that neurons in area V4 integrate across local changes in boundary orientation (a first-order derivative) to derive curvature (a second-order derivative). V4 neurons also integrate across position and binocular disparity to derive 3D orientation. At the next processing stage in posterior inferotemporal cortex (PIT), neurons integrate across spatially disjoint object boundary regions to derive more complex, larger-scale shape configurations. At still higher processing stages in central and anterior IT, neurons derive more complete boundary configurations with potential ecological relevance. CIT/AIT neurons also integrate disparity and shading information to derive surface and volumetric elements of 3D object structure. These integration mechanisms are largely linear at early time points, producing ambiguous representations of object structure. Over the course of approximately 50 ms, presumably through recursive intracortical processing, nonlinear selectivity gradually emerges, producing more explicit signals for specific combinations of structural elements.

Feature-Based Attention in Human Visual Cortex

John Serences and Geoffrey M. Boynton

The spatial resolution of functional MRI makes it ideal for studying the effects of spatial attention on responses in the human visual cortex: with fMRI we can trace the enhancement of the BOLD signal in regions that are retinotopically associated with the spatial location of the attentional spotlight. Studying the effects of feature-based attention is more difficult because the columnar organization of visual features such as direction of motion and orientation is too fine to resolve with traditional fMRI experiments. However, recent developments in pattern classification algorithms by Kamitani and Tong (2006) have allowed researchers to investigate these feature-based attentional effects by studying how the pattern of fMRI responses within a visual area is affected by changes in the physical and attended feature. I will present the results of two studies in which we have applied these methods to show that (1) in all early visual areas, feature-based attention for direction of motion spreads across to unattended locations of the visual field, and (2) only area MT+ (and possibly V3A) represents the perceived, rather than the physical, direction of motion. These results provide evidence that the early stages of the visual system respond to more than just the bottom-up stimulus properties. Instead, the cortical circuitry adapts to the task demands and attentional state of the observer.

How attentional feedback guides learning of sensory representations

Aurel Wannig and Pieter R. Roelfsema

I will describe our new theory, AGREL (attention-gated reinforcement learning; Roelfsema & van Ooyen, 2005), which proposes a new role for feedback connections in learning. We aim to understand the neuronal plasticity that underlies learning in classification tasks and test the predictions of our theory using a multilayer neural network. Stimuli are presented to the lowest layer representing a sensory area of the cortex.

Activity is then propagated to the highest layer representing the motor cortex, which has to choose one out of a number of actions that correspond to the various stimulus categories. Neurons in the highest layer engage in a competition for action selection. A reward is delivered if this action is correct, and no reward is delivered in case of an error. On erroneous trials the correct action is not revealed to the network. The distinguishing feature of AGREL is that the neurons that win the competition in the motor cortex feed back to lower layers, just as is observed for attentional effects in neurophysiology. This attentional feedback signal gates synaptic plasticity at lower layers in the network so that only neurons receiving feedback change their synapses; i.e., the attentional feedback acts as a credit-assignment signal. We show that the feedback signal makes reinforcement learning as powerful as previous non-biological learning schemes, such as error backpropagation. Moreover, we demonstrate that AGREL changes the tuning of sensory neurons in just the same way as is observed in the visual cortex of monkeys that are trained in categorization tasks.
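
The gating principle can be illustrated with a toy simulation. The Python sketch below is not the published AGREL equations but a simplified, hypothetical two-layer version: one output unit wins a softmax competition, the reward prediction error scales all weight changes, and plasticity in the lower layer is gated by feedback from the winning unit only. Network sizes, the learning rate, and the four-stimulus task are invented for illustration.

import numpy as np

rng = np.random.default_rng(1)
n_in, n_hid, n_out = 4, 8, 2
V = 0.1 * rng.standard_normal((n_hid, n_in))    # input  -> hidden weights
W = 0.1 * rng.standard_normal((n_out, n_hid))   # hidden -> output weights
lr = 0.3

X = np.eye(4)                    # four stimuli (one-hot)
labels = np.array([0, 0, 1, 1])  # two stimulus categories / correct actions

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for trial in range(2000):
    i = rng.integers(4)
    x = X[i]
    h = sigmoid(V @ x)                        # sensory-layer activity
    q = sigmoid(W @ h)                        # action values in the "motor" layer
    p = np.exp(5 * q) / np.exp(5 * q).sum()   # competition: stochastic softmax selection
    a = rng.choice(n_out, p=p)                # selected action
    r = 1.0 if a == labels[i] else 0.0        # reward only when correct
    delta = r - q[a]                          # reward prediction error

    fb = W[a].copy()                          # attentional feedback from the winning unit
    W[a] += lr * delta * h                    # only the selected action's weights change
    gate = fb * h * (1 - h)                   # feedback gates plasticity in the lower layer
    V += lr * delta * np.outer(gate, x)

acc = np.mean([np.argmax(sigmoid(W @ sigmoid(V @ X[i]))) == labels[i] for i in range(4)])
print("training-set accuracy:", acc)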

 

 

Perceptual expectations and the neural processing of complex images


Friday, May 9, 2008, 1:00 – 3:00 pm Royal Palm 6-8

Organizer: Bharathi Jagadeesh (University of Washington)

Presenters: Moshe Bar (Harvard Medical School), Bharathi Jagadeesh (University of Washington), Nicholas Furl (University College London), Valentina Daelli (SISSA), Robert Shapley (New York University)

Symposium Description

The processing of complex images occurs within the context of prior expectations and of current knowledge about the world. A clue about an image, “think of an elephant”, for example, can cause an otherwise nonsensical image to transform into a meaningful percept. The informative clue presumably activates the neural substrate of an expectation about the scene that allows the visual stimulus representation to be more readily interpreted. In this symposium we aim to discuss the neural mechanisms that underlie the use of clues and context to assist in the interpretation of ambiguous stimuli. The work of five laboratories, using imaging, single-unit recording, MEG, psychophysics, and network models of visual processes all show evidence of the impact of prior knowledge on the processing of visual stimuli.

In the work of Bar, we see evidence that a short-latency neural response may be induced in higher-level cortical areas by complex signals traveling through a fast visual pathway. This pathway may provide the neural mechanism that modifies the processing of visual stimuli as they stream through the brain. In the work of Jagadeesh, we see a potential effect of that modified processing: neural selectivity in inferotemporal cortex is sufficient to explain performance in a classification task with difficult-to-classify complex images, but only when the images are evaluated in a particular framed context: Is the image A or B (where A and B are photographs, for example a horse and a giraffe)? In the work of Furl, human subjects were asked to classify individual exemplars of faces along a particular dimension (emotion), and had prior experience with the images in the form of an adapting stimulus. In this context, classification is shifted away from the adapting stimulus. Simultaneously recorded MEG activity shows evidence of a reentrant signal, induced by the prior experience of the prime, that could explain the shift in classification. In the work of Treves, we see examples of networks that reproduce the observed late convergence of neural activity onto the response to an image stored in memory, and that can simulate mechanisms possibly underlying predictive behavior. Finally, in the work of Shapley, we see that simple cells in layer 2/3 of V1 (a major input layer for intra-cortical connections) paradoxically show dynamic nonlinearities.

The presence of a dynamic nonlinearity in the responses of V1 simple cells indicates that first-order analyses often capture only a fraction of neuronal behavior, a consideration with wide-ranging implications for the analysis of visual responses in more advanced cortical areas. Signals provided by expectation might influence processing throughout the visual system to bias the perception and neural processing of the visual stimulus in the context of that expectation.

The work to be described is of significant scientific merit and reflects recent work in the field; it is original, forcing re-examination of the traditional view of vision as a method of extracting information from the visual scene in the absence of contextual knowledge, a topic of broad interest to those studying visual perception.

Abstracts

The proactive brain: using analogies and associations to generate predictions

Moshe Bar

Rather than passively ‘waiting’ to be activated by sensations, it is proposed that the human brain is continuously busy generating predictions that approximate the relevant future. Building on previous work, this proposal posits that rudimentary information is extracted rapidly from the input to derive analogies linking that input with representations in memory.

The linked stored representations then activate the associations that are relevant in the specific context, which provides focused predictions. These predictions facilitate perception and cognition by pre-sensitizing relevant representations. Predictions regarding complex information, such as those required in social interactions, integrate multiple analogies. This cognitive neuroscience framework can help explain a variety of phenomena, ranging from recognition to first impressions, and from the brain’s ‘default mode’ to a host of mental disorders.

Neural selectivity in inferotemporal cortex during active classification of photographic images

Bharathi Jagadeesh

Images in the real world are not classified or categorized in the absence of expectations about what we are likely to see. For example, giraffes are quite unlikely to appear in one’s environment except in Africa. Thus, when an image is viewed, it is viewed within the context of possibilities about what is likely to appear. Classification occurs within limited expectations about what has been asked about the images. We have trained monkeys to answer questions about ambiguous images in a constrained context (is the image A or B, where A and B are pictures from the visual world, like a giraffe or a horse), and we recorded responses in inferotemporal cortex while the task is performed and while the same images are merely viewed. When we record neural responses to these images, while the monkey is required to ask (and answer) a simple question, neural selectivity in IT is sufficient to explain behavior. When the monkey views the same stimuli, in the absence of this framing context, the neural responses are insufficiently selective to explain the separately collected behavior. These data suggest that when the monkey is asked a very specific and limited question about a complex image, IT cortex is selective in exactly the right way to perform the task well. We propose that this match between the needs of the task and the responses in IT results from predictions, generated in other brain areas, that enhance the relevant IT representations.

Experience-based coding in categorical face perception

Nicholas Furl

One fundamental question in vision science concerns how neural activity produces everyday perceptions. We explore the relationship between neural codes capturing deviations from experience and the perception of visual categories. An intriguing paradigm for studying the role of short-term experience in categorical perception is face adaptation aftereffects – where perception of ambiguous faces morphed between two category prototypes (e.g., two facial identities or expressions) depends on which category was experienced during a recent adaptation period. One might view this phenomenon as a perceptual bias towards novel categories – i.e., those mismatching recent experience. Using fMRI, we present evidence consistent with this viewpoint, where perception of nonadapted categories is associated with activity in the medial temporal lobe, a region known to subserve novelty processing. This raises a possibility, consistent with models of face perception, that face categories are coded with reference to a representation of experience, such as a norm or top-down prediction. We investigated this idea using MEG by manipulating the deviation in emotional expression between the adapted and morph stimuli. We found signals coding for these deviations arising in the right superior temporal sulcus – a region known to contribute to observation of actions and, notably, face expressions. Moreover, adaptation in the right superior temporal sulcus was also predictive of the magnitude of behavioral aftereffects. The relatively late onset of these effects is suggestive of a role for backwards connections or top-down signaling. Overall, these data are consistent with the idea that face perception depends on a neural representation of deviations from short-term experience.

Categorical perception may reveal cortical adaptive dynamics

Valentina Daelli, Athena Akrami, Nicola J van Rijsbergen and Alessandro Treves, SISSA

The perception of faces and of the social signals they display is an ecologically important process, which may shed light on generic mechanisms of cortically mediated plasticity. The possibility that facial expressions may be processed also along a sub-cortical pathway, leading to the amygdala, offers the potential to single out uniquely cortical contributions to adaptive perception. With this aim, we have studied adaptation aftereffects, psychophysically, using faces morphed between two expressions. These are perceptual changes induced by adaptation to a priming stimulus, which biases subjects to see the non-primed expression in the morphs. We find aftereffects even with primes presented for very short periods, or with faces low-pass filtered to favor sub-cortical processing, but full cortical aftereffects are much larger, suggesting a process involving conscious comparisons, perhaps mediated by cortical memory attractors, superimposed on a more automatic process, perhaps expressed also subcortically. In a modeling project, a simple network model storing discrete memories can in fact explain such short term plasticity effects in terms of neuronal firing rate adaptation, acting against the rigidity of the boundaries between long-term memory attractors. The very same model can be used, in the long-term memory domain, to account for the convergence of neuronal responses, observed by the Jagadeesh lab in monkey inferior temporal cortex.
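
A toy sketch of this kind of account is given below in Python: a Hopfield-style network storing two prototype patterns, with a firing-rate adaptation variable that accumulates while the network is held in one "expression" and then biases an ambiguous morph cue toward the other attractor. This is only an illustrative stand-in, not the SISSA model, and all parameter values are invented.

import numpy as np

rng = np.random.default_rng(2)
N = 400
xi1 = rng.choice([-1, 1], N)          # stored expression prototype 1
xi2 = rng.choice([-1, 1], N)          # stored expression prototype 2
J = (np.outer(xi1, xi1) + np.outer(xi2, xi2)) / N   # Hebbian weight matrix
np.fill_diagonal(J, 0)

def settle(state, theta, g, steps=30):
    """Iterate the network with adaptation theta subtracted from the input."""
    for _ in range(steps):
        state = np.sign(J @ state - g * theta)
        state[state == 0] = 1
    return state

def overlap(state, pattern):
    return float(state @ pattern) / N

# Adaptation phase: hold the network in attractor 1 so adaptation builds up there.
theta = np.zeros(N)
state = xi1.copy()
for _ in range(50):
    theta += 0.1 * (state - theta)     # low-pass copy of recent activity

# Test with an ambiguous "morph" cue, half from each prototype.
morph = np.where(rng.random(N) < 0.5, xi1, xi2)
no_adapt = settle(morph.copy(), np.zeros(N), g=0.0)
adapted = settle(morph.copy(), theta, g=0.5)
print("no adaptation, overlaps with (1, 2):", overlap(no_adapt, xi1), overlap(no_adapt, xi2))
print("after adapting to 1, overlaps with (1, 2):", overlap(adapted, xi1), overlap(adapted, xi2))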

Contrast-sign specificity built into the primary visual cortex, V1

Williams and Shapley

We (Wlliams & Shapley 2007) found that in different cell layers in the macaque primary visual cortex, V1, simple cells have qualitatively different responses to spatial patterns. In response to a stationary grating presented for 100ms at the optimal spatial phase (position), V1 neurons produce responses that rise quickly and then decay before stimulus offset. For many simple cells in layer 4, it was possible to use this decay and the assumption of linearity to predict the amplitude of the response to the offset of a stimulus of the opposite-to-optimal spatial phase. However, the linear prediction was not accurate for neurons in layer 2/3 of V1, the main cortico-cortical output from V1. Opposite-phase responses from simple cells in layer 2/3 were always near zero. Even when a layer 2/3 neuron’s optimal-phase response was very transient, which would predict a large response to the offset of the opposite spatial phase, opposite-phase responses were small or zero. The suppression of opposite-phase responses could be an important building block in the visual perception of surfaces.

Simple cells like those found in layer 4 respond to both contrast polarities of a given stimulus (both brighter and darker than background, or opposite spatial phases). But unlike layer 4 neurons, layer 2/3 simple cells code unambiguously for a single contrast polarity. With such polarity sensitivity, a neuron can represent “dark-left – bright-right” instead of just an unsigned boundary.

 

 

Modern Approaches to Modeling Visual Data


Friday, May 8, 3:30 – 5:30 pm
Royal Ballroom 6-8

Organizer: Kenneth Knoblauch (Inserm, U846, Stem Cell and Brain Research Institute, Bron, France)

Presenters: Kenneth Knoblauch (Inserm, U846, Bron, France), David H. Foster (University of Manchester, UK), Jakob H. Macke (Max-Planck-Institut für biologische Kybernetik, Tübingen), Felix A. Wichmann (Technische Universität Berlin & Bernstein Center for Computational Neuroscience Berlin, Germany), Laurence T. Maloney (NYU)

Symposium Description

A key step in vision research is comparison of experimental data to models intended to predict the data. Until recently, limitations on computer power and lack of availability of appropriate software meant that the researcher’s tool kit was limited to a few generic techniques such as fitting individual psychometric functions. Use of these models entails assumptions such as the exact form of the psychometric function that are rarely tested. It is not always obvious how to compare competing models, to show that one describes the data better than another or to estimate what percentage of ‘variability’ in the responses of the observers is really captured by the model. Limitations on the models that researchers are able to fit translate into limitations on the questions they can ask and, ultimately, the perceptual phenomena that can be understood. Because of recent advances in statistical algorithms and the increased computer power available to all researchers, it is now possible to make use of a wide range of computer-intensive parametric and nonparametric approaches based on modern statistical methods. These approaches allow the experimenter to make more efficient use of perceptual data, to fit a wider range of perceptual data, to avoid unwarranted assumptions, and potentially to consider more complex experimental designs with the assurance that the resulting data can be analyzed. Researchers are likely familiar with nonparametric resampling methods such as bootstrapping (Efron, 1979; Efron & Tibshirani, 1993). We review a wider range of recent developments in statistics in the past twenty years including results from the machine learning and model selection literatures. Knoblauch introduces the symposium and describes how a wide range of psychophysical procedures (including fitting psychophysical functions, estimating classification images, and estimating the parameters of signal detection theory) share a common mathematical structure that can be readily addressed by modern statistical approaches. He also shows how to extend these methods to model more complex experimental designs and also discusses modern approaches to smoothing data. Foster describes how to relax the typical assumptions made in fitting psychometric functions and instead use the data itself to guide fitting of psychometric functions. Macke describes a technique—decision-images— for extracting critical stimulus features based on logistic regression and how to use the extracted critical features to generate optimized stimuli for subsequent psychophysical experiments. Wichmann describes how to use “inverse” machine learning techniques to model visual saliency given eye movement data. Maloney discusses the measurement and modeling of super-threshold differences to model appearance and gives several examples of recent applications to surface material perception, surface lightness perception, and image quality. The presentations will outline how these approaches have been adapted to specific psychophysical tasks, including psychometric-function fitting, classification, visual saliency, difference scaling, and conjoint measurement. They show how these modern methods allow experimenters to make better use of data to gain insight into the operation of the visual system than hitherto possible.

Abstracts

Generalized linear and additive models for psychophysical data

Kenneth Knoblauch

What do such diverse paradigms as classification images, difference scaling and additive conjoint measurement have in common? We introduce a general framework that permits modeling and evaluating experiments covering a broad range of psychophysical tasks. Psychophysical data are considered within a signal detection model in which a decision variable, d, which is some function, f, of the stimulus conditions, S, is related to the expected probability of response, E[P], through a psychometric function, G: E[P] = G(d) = G(f(S)). In many cases, the function f is linear, in which case the model reduces to E[P] = G(Xb), where X is a design matrix describing the stimulus configuration and b a vector of weights indicating how the observer combines stimulus information in the decision variable. By inverting the psychometric function, we obtain a Generalized Linear Model (GLM). We demonstrate how this model, which has previously been applied to calculation of signal detection theory parameters and fitting the psychometric function, is extended to provide maximum likelihood solutions for three tasks: classification image estimation, difference scaling and additive conjoint measurement. Within the GLM framework, nested hypotheses are easily set up in a manner resembling classical analysis of variance. In addition, the GLM is easily extended to fitting and evaluating more flexible (nonparametric) models involving arbitrary smooth functions of the stimulus. In particular, this provides a principled approach to fitting smooth classification images.
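
As a minimal illustration of the GLM view, the sketch below fits a psychometric function to made-up yes/no data with an off-the-shelf GLM routine (binomial family, default logit link); the probit link and the extensions to classification images, difference scaling, and conjoint measurement discussed in the talk are not shown.

import numpy as np
import statsmodels.api as sm

# Hypothetical detection data: stimulus intensities, number of "yes"
# responses, and trials per intensity (values made up for illustration).
intensity = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0])
n_yes = np.array([2, 5, 11, 17, 19, 20])
n_trials = np.full(6, 20)

# Design matrix X (intercept + intensity); inverting the psychometric
# function G turns E[P] = G(Xb) into a generalized linear model.
X = sm.add_constant(intensity)
model = sm.GLM(np.column_stack([n_yes, n_trials - n_yes]), X,
               family=sm.families.Binomial())
fit = model.fit()
print(fit.params)                      # the weight vector b
print(fit.predict([[1.0, 1.75]]))      # predicted P("yes") at a new level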

Model-free estimation of the psychometric function

David H. Foster, K. Zychaluk

The psychometric function is central to the theory and practice of psychophysics. It describes the relationship between stimulus level and a subject’s response, usually represented by the probability of success in a certain number of trials at that stimulus level. The psychometric function itself is, of course, not directly accessible to the experimenter and must be estimated from observations. Traditionally, this function is estimated by fitting a parametric model to the experimental data, usually the proportion of successful trials at each stimulus level. Common models include the Gaussian and Weibull cumulative distribution functions. This approach works well if the model is correct, but it can mislead if not. In practice, the correct model is rarely known. Here, a nonparametric approach based on local linear fitting is advocated. No assumption is made about the true model underlying the data except that the function is smooth. The critical role of the bandwidth is explained, and a method described for estimating its optimum value by cross-validation. A  wide range of data sets were fitted by the local linear method and, for comparison, by several parametric models. The local linear method usually performed better and never worse than the parametric ones. As a matter of principle, a correct parametric model will always do better than a nonparametric model, simply because the parametric model assumes more about the data, but given an experimenter’s ignorance of the correct model, the local linear method provides an impartial and consistent way of addressing this uncertainty.
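
The following Python sketch illustrates the idea under simplifying assumptions: a local linear smoother with a Gaussian kernel, fitted by weighted least squares to made-up proportion-correct data, with the bandwidth chosen by leave-one-out cross-validation. The published method works with binomial likelihoods on the link scale, so this is only a schematic.

import numpy as np

# Made-up data: stimulus levels and proportion correct at each level.
x = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0])
p = np.array([0.05, 0.10, 0.30, 0.45, 0.70, 0.85, 0.95, 0.97])

def local_linear(x0, x, y, h):
    """Estimate y(x0) by a straight-line fit weighted by a Gaussian kernel."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)
    sw = np.sqrt(w)
    X = np.column_stack([np.ones_like(x), x - x0])
    beta = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)[0]
    return beta[0]                     # the intercept is the smoothed value at x0

def loo_score(h):
    """Leave-one-out cross-validation error for bandwidth h."""
    errs = [(p[i] - local_linear(x[i], np.delete(x, i), np.delete(p, i), h)) ** 2
            for i in range(len(x))]
    return np.mean(errs)

bandwidths = np.linspace(0.3, 2.0, 18)
best_h = bandwidths[np.argmin([loo_score(h) for h in bandwidths])]
print("cross-validated bandwidth:", round(float(best_h), 2))
print("estimate at x = 2.2:", round(local_linear(2.2, x, p, best_h), 3))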

Estimating Critical Stimulus Features from Psychophysical Data: The Decision-Image Technique Applied to Human Faces

Jakob H. Macke, Felix A. Wichmann

One of the main challenges in the sensory sciences is to identify the stimulus features on which the sensory systems base their computations: they are a prerequisite for computational models of perception. We describe a technique, decision-images, for extracting critical stimulus features based on logistic regression. Rather than embedding the stimuli in noise, as is done in classification image analysis, we want to infer the important features directly from physically heterogeneous stimuli. A decision-image not only defines the critical region-of-interest within a stimulus but is a quantitative template which defines a direction in stimulus space. Decision-images thus enable the development of predictive models, as well as the generation of optimized stimuli for subsequent psychophysical investigations. Here we describe our method and apply it to data from a human face discrimination experiment. We show that decision-images are able to predict human responses not only in terms of overall percent correct but are able to predict, for individual observers, the probabilities with which individual faces are (mis-)classified. We then test the predictions of the models using optimized stimuli. Finally, we discuss possible generalizations of the approach and its relationships with other models.
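
A minimal sketch of the decision-image idea is shown below with synthetic stimuli and responses standing in for the face data described above: fit a logistic regression from stimulus pixels to the observer's binary responses and reshape the weight vector into an image. The "true template" and all sizes are invented for illustration.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n_trials, height, width = 400, 16, 16
true_template = np.zeros((height, width))
true_template[4:12, 6:10] = 1.0                  # hypothetical critical region

# Synthetic heterogeneous stimuli and simulated observer responses.
stimuli = rng.standard_normal((n_trials, height, width))
drive = stimuli.reshape(n_trials, -1) @ true_template.ravel()
responses = (drive + rng.standard_normal(n_trials) > 0).astype(int)

clf = LogisticRegression(C=1.0, max_iter=1000)
clf.fit(stimuli.reshape(n_trials, -1), responses)
decision_image = clf.coef_.reshape(height, width)   # estimated critical features
print(decision_image[4:12, 6:10].mean(), decision_image.mean())  # inside region vs overall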

Non-linear System Identification: Visual Saliency Inferred from Eye-Movement Data

Felix A. Wichmann, Wolf Kienzle, Bernhard Schölkopf, Matthias Franz

For simple visual patterns under the experimenter’s control we impose which information, or features, an observer can use to solve a given perceptual task. For natural vision tasks, however, there are typically a multitude of potential features in a given visual scene which the visual system may be exploiting when analyzing it: edges, corners, contours, etc. Here we describe a novel non-linear system identification technique based on modern machine learning methods that allows the critical features an observer uses to be inferred directly from the observer’s data. The method neither requires stimuli to be embedded in noise nor is it limited to linear perceptive fields (classification images). We demonstrate our technique by deriving the critical image features observers fixate in natural scenes (bottom-up visual saliency). Unlike previous studies where the relevant structure is determined manually (e.g., by selecting Gabors as visual filters), we do not make any assumptions in this regard, but numerically infer their number and properties from the eye-movement data. We show that center-surround patterns emerge as the optimal solution for predicting saccade targets from local image structure. The resulting model, a one-layer feed-forward network with contrast gain-control, is surprisingly simple compared to previously suggested saliency models. Nevertheless, our model is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically as has been thought previously.
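
A rough sketch of this system-identification recipe, with entirely synthetic data standing in for image patches and fixation labels, might look as follows; the study's actual feature extraction, classifier, and validation are more elaborate, and the patch generator here is purely hypothetical.

import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(4)
patch_size = 13 * 13

def fake_patches(n, fixated):
    """Placeholder patch generator; in a real analysis these would be image
    patches cut out around measured saccade targets or control locations."""
    base = rng.standard_normal((n, patch_size))
    if fixated:
        base[:, patch_size // 2] += 1.0      # toy stand-in for local structure
    return base

# Patches at fixated locations are labelled 1, control locations 0.
X = np.vstack([fake_patches(300, True), fake_patches(300, False)])
y = np.concatenate([np.ones(300), np.zeros(300)])

# A nonlinear classifier learns to predict fixation from local image structure;
# its decision function then plays the role of a saliency measure that can be
# probed with test patterns or averaged into a saliency map for new images.
model = SVC(kernel="rbf", gamma="scale").fit(X, y)
print("training accuracy:", model.score(X, y))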

Measuring and modeling visual appearance of surfaces

Laurence T. Maloney

Researchers studying visual perception have developed numerous experimental methods for probing the perceptual system. The range of techniques available to study performance near visual threshold is impressive and rapidly growing, and we have a good understanding of what physical differences in visual stimuli are perceptually discriminable. A key remaining challenge for visual science is to develop models and psychophysical methods that allow us to evaluate how the visual system estimates visual appearance. Using traditional methods, for example, it is easy to determine how large a change in the parameters describing a surface is needed to produce a visually discriminable surface. It is less obvious how to evaluate the contributions of these same parameters to perception of visual qualities such as color, gloss or roughness. In this presentation, I’ll describe methods for measuring judgments of visual appearance that go beyond simple rating methods, how to model those judgments, and how to evaluate the resulting models experimentally. I’ll describe three applications. The first concerns how illumination and surface albedo contribute to the rated dissimilarity of illuminated surfaces in three-dimensional scenes. The second concerns modeling of super-threshold differences in image quality using difference scaling, and the third concerns application of additive conjoint measurement to evaluating how observers perceive gloss and meso-scale surface texture (‘bumpiness’) when both are varied.

 

Retinotopic and Non-retinotopic Information Representation and Processing in Human Vision


Friday, May 8, 3:30 – 5:30 pm
Royal Ballroom 1-3

Organizers: Haluk Ogmen (University of Houston) and Michael H. Herzog (Laboratory of Psychophysics, BMI, EPFL, Switzerland)

Presenters: Doug Crawford (Centre for Vision Research, York University, Toronto, Ontario, Canada), David Melcher (Center for Mind/Brain Sciences and Department of Cognitive Sciences University of Trento, Italy), Patrick Cavanagh (LPP, Université Paris Descartes, Paris, France), Shin’ya Nishida (NTT Communication Science Labs, Atsugi, Japan), Michael H. Herzog (Laboratory of Psychophysics, BMI, EPFL, Switzerland)

Symposium Description

Due to the movements of the eyes and those of the objects in the environment, natural vision is highly dynamic. An understanding of how the visual system can cope with such complex inputs requires an understanding of the reference frames used in the computation of various stimulus attributes. It is well known that the early visual system has a retinotopic organization. It is generally thought that the retinotopic organization of the early visual system is insufficient to support the fusion of visual images viewed at different eye positions. Moreover, metacontrast masking and anorthoscopic perception show that a retinotopic image is neither sufficient nor necessary for the perception of spatially extended form. How retinotopic representations are transformed into more complex non-retinotopic representations has been a long-standing and often controversial question. The classical paradigm to study this question has been the study of memory across eye movements. As we shift our gaze from one fixation to another, the retinotopic representation of the environment undergoes drastic shifts, yet phenomenally our environment appears stable. How is this phenomenal stability achieved? Does the visual system integrate information across eye movements and, if so, how? A variety of theories ranging from purely retinotopic representations without information integration to detailed spatiotopic representations with point-by-point information integration have been proposed. Talks in this symposium (Crawford, Melcher, Cavanagh) will address the nature of trans-saccadic memory, the role of extra-retinal signals, and retinotopic, spatiotopic, and objectopic representations for information processing and integration during and across eye movements. In addition to the challenge posed by eye movements to purely retinotopic representations, recent studies suggest that, even under steady fixation, computation of moving form requires non-retinotopic representations. This is because objects in the environment often move with complex trajectories and do not stimulate sufficiently retinotopically anchored receptive fields. Moreover, occlusions can “blank out” retinotopic information for a significant time period. These failures to activate sufficiently retinotopically anchored neurons, in turn, suggest that some form of non-retinotopic information analysis and integration should take place. Talks in this symposium (Nishida, Herzog) will present recent findings that show how shape and color information for moving objects can be integrated according to non-retinotopic reference frames. Taken together, the talks at the symposium aim to provide a recent perspective on the fundamental problem of reference frames utilized by the visual system and present techniques to study these representations during both eye movement and fixation periods. The recent convergence of a variety of techniques and stimulus paradigms in elucidating the roles of non-retinotopic representations makes the proposed symposium timely. Since non-retinotopic representations have implications for a broad range of visual functions, we expect our symposium to be of interest to the general VSS audience including students and faculty.

Abstracts

Cortical Mechanisms for Trans-Saccadic Memory of Multiple Objects

Doug Crawford, Steven Prime

Humans can retain the location and appearance of 3-4 objects in visual working memory, independent of whether a saccade occurs during the memory interval. Psychophysical experiments show that, in the absence of retinal cues, extra-retinal signals are sufficient to update trans-saccadic memory, but where and how do these signals enter the visual system? It is known that ‘dorsal stream’ areas like the parietal eye fields update motor plans by remapping them in gaze-centered coordinates, but the equivalent neural mechanisms for updating object features across saccades are less understood. We investigated the possible role of extra-retinal signals from the cortical gaze control system by applying trans-cranial magnetic stimulation (TMS) to either the human parietal eye fields or the frontal eye fields, during the interval between viewing several objects and testing their remembered orientation and location. Parietal TMS had a baseline effect on memory of one feature and reduced memory capacity from approximately three down to one feature, but only when applied to the right hemisphere near the time of a saccade. The effects of frontal cortex TMS on trans-saccadic memory capacity were similar, but were more symmetric, and did not affect baseline feature memory. In our task, the latter would occur if spatial memory were disrupted without affecting feature memory. These experiments show that cortical gaze control centers usually associated with the ‘dorsal’ stream of vision are also involved in visual processing and memory of object features during saccades, possibly influencing ‘ventral stream’ processing through re-entrant pathways.

Trans-Saccadic Perception: “Object-otopy” across Space and Time

David Melcher

Real-world perception is typically trans-saccadic: we see the same object across multiple fixations. Yet saccadic eye movements can dramatically change the location in which an object is projected onto the retina. In a series of experiments using eye tracking, psychophysics, neuroimaging and TMS, we have investigated how information from a previous fixation can influence perception in the subsequent fixation. Specifically, we have tested the idea that the “remapping” of receptive fields around the time of saccadic eye movements might play a role in trans-saccadic perception. Our results suggest that two mechanisms interact to produce “object-otopic” perception across saccades. First, a limited number of objects that are individuated in a scene (treated as unique objects potentially subject to action, as opposed to being part of the background gist) are represented and updated across saccades in a sensorimotor “saliency map” (possibly in posterior parietal cortex). Second, the updating of these “pointers” in the map leads to the remapping of receptive fields in intermediate visual areas. We have found that perception can be retinotopic, spatiotopic or even, in the case of moving objects, can involve the combination of information for the same object that is neither retinally nor spatially matched. At the same time, however, the visual system must give priority to the retinal information, which tends to be most reliable during fixation of stable objects.

Spatiotopic Apparent Motion

Patrick Cavanagh, Martin Szinte

When our eyes move, stationary objects move over our retina. Our visual system cleverly discounts this retinal motion so that we do not see the objects moving when they are not. What happens if the object does move at the time of the eye movement? There is a question of whether we will see the displacement at all, but if we do see it, is the motion determined by the displacement on the retina or the displacement in space? To address this, we asked subjects to make horizontal saccades of 10°. Two dots were presented, one before and one after the saccade, the second displaced vertically on the screen by 3° from the first. Each dot was presented for 400 msec; the first turned off about 100 msec before the saccade and the second turned on 100 msec after the saccade. In this basic condition, the retinal locations of the two dots were in opposite hemifields, separated horizontally by 10°. Nevertheless, subjects reported the dots appeared to be in motion vertically – the spatiotopic direction – although with a noticeable deviation from true vertical. This spatiotopic apparent motion was originally reported by Rock and Ebenholtz (1962), but for displacements along the direction of the saccade. In our experiments, we use the deviation from spatiotopic motion to estimate errors in the remapping of pre-saccadic locations that underlies this spatiotopic motion phenomenon.

Trajectory Integration of Shape and Color of Moving Object

Shin’ya Nishida, Masahiko Terao, Junji Watanabe

Integration of visual input signals along motion trajectory is widely recognized as a basic mechanism of motion detection. It is however not widely recognized that the same computation is potentially useful for shape and color perception of moving objects. This is because trajectory integration can improve signal-to-noise ratio of moving feature extraction without introducing motion blur. Indeed, trajectory integration of shape information is indicated by several phenomena including multiple-slit view (e.g., Nishida, 2004). Trajectory integration of color information is also indicated by a couple of phenomena, motion-induced color mixing (Nishida et al., 2007) and motion-induced color segregation (Watanabe & Nishida, 2007). In the motion-induced color segregation, for instance, temporal alternations of two colors on the retina are perceptually segregated more veridically when they are presented as moving patterns rather than as stationary alternations at the same rate. This improvement in temporal resolution can be explained by a difference in motion trajectory along which color signals are integrated. Furthermore, we recently found that the improvement in temporal resolution is enhanced when an observer views a stationary object while making a pursuit eye movement, in comparison with when an observer views a moving object without moving eyes (Terao et al, 2008, VSS). This finding further strengthens the connection of the motion-induced color segregation with subjective motion deblur.

A Litmus Test for Retino- vs. Non-retinotopic Processing

Michael Herzog, Marc Boi, Thomas Otto, Haluk Ogmen

Most visual cortical areas are retinotopically organized and accordingly most visual processing is assumed to be processed within a retinotopic coordinate frame. However, in a series of psychophysical experiments, we have shown that features of elements are often non-retinotopically integrated when the corresponding elements are motion grouped. When this grouping is blocked, however, feature integration occurs within retinotopic coordinates (even though the basic stimulus paradigm is identical in both conditions and grouping is modulated by spatial or temporal contextual cues only). Hence, there is strong evidence for both retino- and non-retinotopic processing. However, it is not always easy to determine which of these two coordinate systems prevails in a given stimulus paradigm. Here, we present a simple psychophysical test to answer this question. We presented three squares in a first frame, followed by an ISI, the same squares shifted one position to the right, the same ISI, and the squares shifted back to their original position. When this cycle is repeated with ISIs longer than 100 ms, three squares are perceived in apparent motion. With this specific set-up, features integrate between the central squares if and only if integration takes place non-retinotopically. With this litmus test we showed, for example, that motion processing is non-retinotopic whereas motion adaptation is retinotopic. In general, by adding the feature of interest to the central square, it can be easily tested whether a given stimulus paradigm is processed retino- or non-retinotopically.

 

Dynamic Processes in Vision


Friday, May 8, 3:30 – 5:30 pm Royal Ballroom 4-5

Organizer: Jonathan D. Victor (Weill Medical College of Cornell University)

Presenters: Sheila Nirenberg (Dept. of Physiology and Biophysics, Weill Medical College of Cornell University), Diego Contreras (Dept. of Neuroscience, University of Pennsylvania School of Medicine), Charles E. Connor (Dept. of Neuroscience, The Johns Hopkins University School of Medicine), Jeffrey D. Schall (Department of Psychology, Vanderbilt University)

Symposium Description

The theme of the symposium is the importance of analyzing the time course of neural activity for understanding behavior. Given the very obviously spatial nature of vision, it is often tempting to ignore dynamics, and to focus on spatial processing and maps. As the speakers in this symposium will show, dynamics are in fact crucial: even for processes that appear to be intrinsically spatial, the underlying mechanism often resides in the time course of neural activity. The symposium brings together prominent scientists who will present recent studies that exemplify this unifying theme. Their topics will cover the spectrum of VSS, both anatomically and functionally (retinal ganglion cell population coding, striate cortical mechanisms of contrast sensitivity regulation, extrastriate cortical analysis of shape, and frontal and collicular gaze control mechanisms). Their work utilizes sophisticated physiological techniques, ranging from large-scale multineuronal ex-vivo recording to intracellular in vivo recording, and employs a breadth of analytical approaches, ranging from information theory to dynamical systems.

Because of the mechanistic importance of dynamics and the broad range of the specific topics and approaches, it is anticipated that the symposium will be of interest to physiologists and non-physiologists alike, and that many VSS members will find specific relevance to their own research.

Abstracts

How neural systems adjust to different environments: an intriguing role for gap junction coupling

Sheila Nirenberg

The nervous system has an impressive ability to self-adjust – that is, as it moves from one environment to another, it can adjust itself to accommodate the new conditions. For example, as it moves into an environment with new stimuli, it can shift its attention; if the stimuli are low contrast, it can adjust its contrast sensitivity; if the signal-to-noise ratio is low, it can change its spatial and temporal integration properties. How the nervous system makes these shifts isn’t clear. Here we show a case where it was possible to obtain an answer. It’s a simple case, but one of the best-known examples of a behavioral shift – the shift in visual integration time that accompanies the switch from day to night vision. Our results show that the shift is produced by a mechanism in the retina – an increase in coupling among horizontal cells. Since coupling produces a shunt, the increase causes a substantial shunting of horizontal cell current, which effectively inactivates the cells. Since the cells play a critical role in shaping integration time (they provide feedback to photoreceptors that keeps integration time short), inactivating them causes integration time to become longer. Thus, a change in the coupling of horizontal cells serves as a mechanism to shift the visual system from short to long integration times.  The results raise a new, and possibly generalizable idea: that a neural system can be shifted from one state to another by changing the coupling of one of its cell classes.

Cortical network dynamics and response gain

Diego Contreras

The transformation of synaptic input into spike output by single neurons is a key process underlying the representation of information in sensory cortex. The slope, or gain, of this input-output function determines neuronal sensitivity to stimulus parameters and provides a measure of the contribution of single neurons to the local network. Neuronal gain is not constant and may be modulated by changes in multiple stimulus parameters. Gain modulation is a common neuronal phenomenon that modifies response amplitude without changing selectivity.  Computational and in vitro studies have proposed cellular mechanisms of gain modulation based on the postsynaptic effects of background synaptic activation, but these mechanisms have not been studied in vivo.  Here we used intracellular recordings from cat primary visual cortex to measure neuronal gain while changing background synaptic activity with visual stimulation.  We found that increases in the membrane fluctuations associated with increases in synaptic input do not obligatorily result in gain modulation in vivo.  However, visual stimuli that evoked sustained changes in resting membrane potential, input resistance, and membrane fluctuations robustly modulated neuronal gain.  The magnitude of gain modulation depended critically on the spatiotemporal properties of the visual stimulus.  Gain modulation in vivo may thus be determined on a moment-to-moment basis by sensory context and the consequent dynamics of synaptic activation.

Dynamic integration of object structure information in primate visual cortex

Charles E. Connor

Object perception depends on extensive processing of visual information through multiple stages in the ventral pathway of visual cortex.  We use neural recording to study how information about object structure is processed in intermediate and higher-level ventral pathway cortex of macaque monkeys.  We find that neurons in area V4 (an intermediate stage) represent object boundary fragments by means of basis function tuning for position, orientation, and curvature.  At subsequent stages in posterior, central, and anterior inferotemporal cortex (PIT/CIT/AIT), we find that neurons integrate information about multiple object fragments and their relative spatial configurations.  The dynamic nature of this integration process can be observed in the evolution of neural activity patterns across time following stimulus onset.  At early time points, neurons are responsive to individual object fragments, and their responses to combined fragments are linearly additive.  Over the course of approximately 60 ms, responses to individual object fragments decline and responses to specific fragment combinations increase.  This evolution toward nonlinear selectivity for multi-fragment configurations involves both shifts in response properties within neurons and shifts in population activity levels between primarily linear and primarily nonlinear neurons.  This pattern is consistent with a simple network model in which the strength of feedforward and recurrent inputs varies continuously across neurons.

Timing of selection for the guidance of gaze

Jeffrey D. Schall

Time is of the essence in the execution of visually guided behavior in dynamic environments.  We have been investigating how the visual system responds to unexpected changes of the image when a saccade is being planned.  Performance of stop signal or double-step tasks can be explained as the outcome of a race between a process that produces the saccade and a process that interrupts the preparation.  Neural correlates of dynamic target selection and these race processes have been identified in the frontal eye field and superior colliculus.  The timecourse of these processes can provide useful leverage for understanding how early visual processing occurs.

 

Is number visual? Is vision numerical? Investigating the relationship between visual representations and the property of magnitude


Friday, May 8, 1:00 – 3:00 pm
Royal Ballroom 6-8

Organizer: Michael C. Frank (Massachusetts Institute of Technology)

Presenters: David Burr (Dipartimento di Psicologia, Università Degli Studi di Firenze and Department of Psychology, University of Western Australia), Michael C. Frank (Massachusetts Institute of Technology), Steven Franconeri (Northwestern University), David Barner (University of California, San Diego), Justin Halberda (Johns Hopkins University)

Symposium Description

The ability to manipulate exact numbers is a signature human achievement, supporting activities like building bridges, designing computers, and conducting economic transactions. Underlying this ability and supporting its acquisition is an evolutionarily-conserved mechanism for the manipulation of approximate quantity: the analog magnitude system. The behavioral and neural signatures of magnitude representations have been extensively characterized but how these representations interact with other aspects of cognitive and visual processing is still largely unknown. Do magnitude features attach to objects, scenes, or surfaces? Is approximate magnitude representation maintained even for sets for which exact quantity is known? Is magnitude estimation ability altered by experience?

The goal of our symposium is to look for answers to these questions by asking both how number is integrated into visual processing and how visual processing in turn forms a basis for the acquisition and processing of exact number. We address these questions through talks on three issues: 1) the basic psychophysical properties of numerical representations (Halberda, Burr), 2) how visual mechanisms integrate representations of number (Franconeri & Alvarez), and 3) how these representations support exact computation, both in standard linguistic representations (Frank) and via alternative representations (Barner).

The issues addressed by our symposium have been a focus of intense recent interest. Within the last four years there have been a wide variety of high-profile reports from developmental, neuroscientific, comparative, and cross-linguistic/cross-cultural studies of number. Research on number is one of the fastest moving fields in cognitive science, due both to the well-defined questions that motivate research in this field and to the wide variety of methods that can be brought to bear on these questions.

The target audience of our symposium is a broad group of vision scientists, both students and faculty, who are interested in connecting serious vision science with cognitive issues of broad relevance to a wide range of communities in psychology, neuroscience, and education. In addition, the study of number provides an opportunity to link innovations in vision research methods—including psychophysical-style experimental designs, precise neuroimaging methods, and detailed computational data analysis—with deep cognitive questions about the nature of human knowledge. We anticipate that attendees of our symposium will come away with a good grasp of the current state of the art and the outstanding issues in the interface of visual and numerical processing.

Abstracts

A visual sense of number

David Burr

Evidence exists for a non-verbal capacity to apprehend number, in humans (including infants), and in other primates. We investigated numerosity perception in adult humans, by measuring Weber fractions with a series of techniques, and by adaptation. The Weber fraction measurements suggest that number estimation and “subitizing” share common mechanisms. Adapting to large numbers of dots decreased apparent numerosity (by a factor of 2-3), and adapting to small numbers increased it. The magnitude of adaptation depended primarily on the numerosity of the adapter, not on size, orientation or contrast of test or adapter, and occurred with very low adapter contrasts. Varying pixel density had no effect on adaptation, showing that it depended solely on numerosity, not related visual properties like texture density. We propose that just as we have a direct visual sense of the reddishness of half a dozen ripe cherries, so we do of their sixishness. In other words there are distinct qualia for numerosity, as there are for colour, brightness and contrast, not reducible to spatial frequency or density of texture.
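
For readers outside psychophysics, the Weber fraction referred to here is the standard one (a definition added for clarity; it is not spelled out in the abstract):

```latex
% Weber fraction for numerosity discrimination:
% \Delta N is the just-noticeable change in numerosity at baseline N.
\[
  W = \frac{\Delta N}{N}
\]
% A Weber fraction that stays constant as N grows is the classic
% signature of Weber-law (ratio-based) coding of magnitude.
```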

Language as a link between exact number and approximate magnitude

Michael C. Frank

Is exact number a human universal? Cross-cultural fieldwork has given strong evidence that language for exact number is an invention which is not present in all societies. This result suggests a range of questions about how learning an exact number system may interact with pre-existing analog magnitude representations. More generally, number presents a tractable case of the Whorfian question of whether speakers of different languages differ in their cognition. We addressed these questions by studying the performance of the Pirahã, an Amazonian group in Brazil, on a range of simple quantity matching tasks (first used by Gordon, 2004). We compared the performance of this group to the performance of English-speakers who were unable to use exact numerical representations due to a concurrent verbal interference task. We found that both groups were able to complete simple one-to-one matching tasks even without words for numbers and both groups relied on analog magnitude representations when faced with a more difficult task in which items in the set to be estimated were presented one at a time. However, performance between the two groups diverged on tasks in which other strategies could be used. We conclude that language for number is a “cognitive technology” which allows the manipulation of exact quantities across time, space, and changes in modality, but does not eliminate or substantially alter users’ underlying numerical abilities.

Rapid enumeration is based on a segmented visual scene

Steve Franconeri, George Alvarez

How do we estimate the number of objects in a set?  One primary question is whether our estimates are based on an unbroken visual image or a segmented collection of discrete objects.  We manipulated whether individual objects were isolated from each other, or grouped into pairs by irrelevant lines.  If number estimation operates over an unbroken image, then this manipulation should not affect estimates. But if number estimation relies on a segmented image, then grouping pairs of objects into single units should lead to lower estimates. In Experiment 1, participants underestimated the number of grouped squares, relative to when the connecting lines were ‘broken’. Experiment 2 presents evidence that this segmentation process occurred broadly across the entire set of objects.  In Experiment 3, a staircase procedure provides a quantitative measure of the underestimation effect.  Experiment 4 shows that the grouping effect was equally strong for a single thin line, and that it can be eliminated by a tiny break in the line.  These results provide the first direct evidence that number estimation relies on a segmented input.

Constructing exact number approximately: a case study of mental abacus representations

David Barner

Exact numerical representation is usually accomplished through linguistic representations. However, an alternative route for accomplishing this task is through the use of a “mental abacus”—a mental image of an abacus (a device used in some cultures for keeping track of exact quantities and doing arithmetic via the positions of beads on a rigid frame). We investigated the nature of mental abacus representations by studying children ages 7-15 who were trained in this technique. We compared their ability to read the cardinality of “abacus flashcards” (briefly presented images of abacuses in different configurations) with their ability to enumerate sets of dots after similarly brief, masked presentation. We conducted five studies comparing abacus flashcards to: (1) random dot enumeration, (2) spatially proximate dot enumeration, (3) enumeration of dots arranged in an abacus configuration without the abacus frame, (4) enumeration of dots on a rotated abacus, (5) enumeration of dots arranged on an abacus. In all conditions, participants were faster and more accurate in identifying the cardinality of an abacus than they were in enumerating the same number of beads, even when the display was physically identical. Analysis of errors suggested that children in our studies viewed the abacus as a set of objects with each separate row of beads being a single object, each with its own independent magnitude feature. Thus, the “mental abacus” draws on pre-existing approximate and exact visual abilities to construct a highly accurate system for representing large exact number.

An interface between vision and numerical cognition

Justin Halberda

While the similarity of numerical processing across different modalities (e.g., visual objects, auditory objects, extended visual events) suggests that number concepts are domain general even at the earliest ages (4-month-old infants), visual processing is constrained in ways that may have shaped the numerical concepts humans have developed.  In this talk I discuss how online processing of numerical content is shaped by the constraints of both object-based and ensemble-based visual processing and discuss how numerical content and vision engage one another.

 

Common mechanisms in Time and Space perception

Common mechanisms in Time and Space perception

Friday, May 8, 1:00 – 3:00 pm
Royal Ballroom 1-3

Organizer: David Eagleman (Baylor College of Medicine)

Presenters: Concetta Morrone (Università di Pisa, Pisa, Italy), Alex Holcombe (University of Sydney), Jonathan Kennedy (University of Cardiff), David Eagleman (Baylor College of Medicine)

Symposium Description

Most of the actions we carry out on a daily basis require timing on the scale of tens to hundreds of milliseconds. We must judge time to speak, to walk, to predict the interval between our actions and their effects, to determine causality and to decode information from our sensory receptors. However, the neural bases of time perception are largely unknown. Scattered confederacies of investigators have been interested in time for decades, but only in the past few years have new techniques been applied to old problems. Experimental psychology is discovering how animals perceive and encode temporal intervals, while physiology, fMRI and EEG unmask how neurons and brain regions underlie these computations in time. This symposium will capitalize on new breakthroughs, outlining the emerging picture and highlighting the remaining confusions about time in the brain. How do we encode and decode temporal information? How is information coming into different brain regions at different times synchronized? How plastic is time perception? How is it related to space perception?  The experimental work of the speakers in this symposium will be drawn together to understand how neural signals in different brain regions come together for a temporally unified picture of the world, and how this is related to the mechanisms of space perception.  The speakers in this symposium are engaged in experiments at complementary levels of exploring sub-second timing and its relation to space.

Abstracts

A neural model for temporal order judgments and their active recalibration: a common mechanism for space and time?

David M. Eagleman, Mingbo Cai, Chess Stetson

Human temporal order judgments (TOJs) dynamically recalibrate when participants are exposed to a delay between their motor actions and sensory effects.  We here present a novel neural model that captures TOJs and their recalibration.  This model employs two ubiquitous features of neural systems: synaptic scaling at the single neuron level and opponent processing at the population level.  Essentially, the model posits that different populations of neurons encode different delays between motor-sensory or sensory-sensory events, and that these populations feed into opponent processing neurons that employ synaptic scaling.  The system uses the difference in activity between populations encoding for ‘before’ or ‘after’ to obtain a decision.  As a consequence, if the network’s ‘motor acts’ are consistently followed by sensory feedback with a delay, the network will automatically recalibrate to change the perceived point of simultaneity between the action and sensation.  Our model suggests that temporal recalibration may be a temporal analogue to the motion aftereffect.  We hypothesize that the same neural mechanisms are used to make perceptual determinations about both space and time, depending on the information available in the neural neighborhood in which the mechanism operates.
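
The model's two ingredients — delay-tuned populations read out by an opponent stage, with synaptic scaling driving recalibration — can be illustrated with a toy simulation. Everything below (tuning curves, learning rule, parameter values) is our illustrative assumption, not the authors' implementation.

```python
# Toy opponent-process model of temporal order judgments (TOJs) with
# recalibration via synaptic scaling.  Illustrative assumptions only.
import numpy as np

delays = np.linspace(-200, 200, 81)        # preferred motor-sensory delays (ms)
tuning_width = 60.0                        # tuning width (ms)
weights = np.ones_like(delays)             # synaptic weights onto the opponent stage

def population_response(actual_delay):
    """Gaussian tuning of delay-selective units around the actual delay."""
    return np.exp(-0.5 * ((delays - actual_delay) / tuning_width) ** 2)

def judge(actual_delay):
    """Opponent read-out: activity of 'sensation after action' units minus
    'sensation before action' units; the sign gives the order judgment."""
    r = weights * population_response(actual_delay)
    return r[delays > 0].sum() - r[delays < 0].sum()

def adapt(exposure_delay, n_trials=200, rate=0.01):
    """Synaptic scaling: units driven hard during exposure are scaled down,
    shifting the point of subjective simultaneity toward the exposed delay."""
    global weights
    for _ in range(n_trials):
        r = population_response(exposure_delay)
        weights *= 1.0 - rate * (r - r.mean())
        weights /= weights.mean()            # homeostatic normalization

print("judgement of a 0 ms delay before adaptation:", round(judge(0.0), 3))
adapt(100.0)   # sensory feedback consistently lags the action by 100 ms
print("judgement of a 0 ms delay after adaptation: ", round(judge(0.0), 3))
# After adaptation, the physically simultaneous event yields a negative
# value, i.e. it is judged as occurring *before* the action -- the shift
# of subjective simultaneity that characterizes recalibration.
```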

Space-time in the brain

Concetta Morrone, David Burr

The perception of space and the perception of time are generally studied separately and thought of as independent dimensions. However, recent research suggests that these attributes are tightly interlinked: event timing may be modality-specific and tightly linked with space. During saccadic eye movements, time becomes severely compressed, and can even appear to run backwards. Adaptation experiments further suggest that visual events of sub-second duration are timed by neural visual mechanisms with spatially circumscribed receptive fields, anchored in real-world rather than retinal coordinates. All these results sit nicely with recent evidence implicating parietal cortex in the coding of both space and sub-second interval timing.

Adaptation to space and to time

Jonathan Kennedy, M.J. Buehner, S.K. Rushton

Human behavioural adaptation to delayed visual-motor feedback has been investigated by Miall and Jackson (2006: Exp Brain Res) in a closed-loop manual tracking task with a semi-predictably moving visual target. In intersensory, open-loop and predictable sensory-motor tasks, perceptual adaptation of the involved modalities has been demonstrated on several occasions in recent years, using temporal order judgments and perceptual illusions (e.g. Stetson, Cui, Montague, & Eagleman, 2006: Neuron; Fujisaki, Shimojo, Kashino, & Nishida, 2004: Nature Neuroscience).
Here we present results from two series of experiments: the first investigating perceptual adaptation in Miall and Jackson’s tracking task, by adding visual-motor temporal order judgments; and the second investigating the localization of perceptual adaptation across the involved modalities.
We will discuss these results in the light of recent developments in modeling adaptation to misalignment in spatial (Witten, Knudsen, & Sompolinsky, 2008: J Neurophysiol) and temporal  (Stetson et al, 2006) domains, and consider their implications for what, if any, common mechanisms and models may underlie all forms of adaptation to intersensory and sensory-motor misalignment.

A temporal limit on judgments of the position of a moving object

Alex Holcombe, Daniel Linares, Alex L. White

The mechanisms of time perception have consequences for perceived position when one attempts to determine the position of a moving object at a particular time. While viewing a luminance-defined blob orbiting fixation, our observers report the blob’s perceived position when the fixation point changes color. In addition to the error in the direction of motion (flash-lag effect), we find that the standard deviation of position judgments increases over a five-fold range of speeds such that it corresponds to a constant 70-80 ms of the blob’s trajectory (also see Murakami 2001). This result is in sharp contrast to acuity tasks with two objects moving together, for which thresholds vary very little with velocity. If the 70 ms of temporal variability is dependent on low-level factors, we would expect a different result when we triple the eccentricity, but this had little effect. If the variability is due to uncertainty about the time of the color change, then we should be able to reduce it by using a sound as the time marker (as the auditory system may have better temporal resolution) or by using a predictable event, such as the time a dot moving at a constant velocity arrives at fixation. Although average error differs substantially for these conditions, in both cases the reported positions still spanned about 70-80 ms of the blob’s trajectory. Finally, when observers attempt to press a button in time with the arrival of the blob at a landmark, the standard deviation of their errors is about 70 ms. We theorize that this temporal imprecision originates in the same mechanisms responsible for the poor temporal resolution of feature-binding (e.g. Holcombe & Cavanagh 2001; Fujisaki & Nishida 2005).
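
The "constant 70-80 ms of the blob's trajectory" finding can be restated as a one-line relation (our restatement, not an equation from the abstract):

```latex
% If position reports carry a roughly fixed temporal uncertainty
% \sigma_t, the spatial scatter of the reports should grow with speed v:
\[
  \sigma_{\mathrm{position}} \;\approx\; v \,\sigma_t,
  \qquad \sigma_t \approx 70\text{--}80~\mathrm{ms}
\]
% so a five-fold increase in speed predicts a five-fold increase in the
% standard deviation of judged positions, as observed.
```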

 

New Methods for Delineating the Brain and Cognitive Mechanisms of Attention

New Methods for Delineating the Brain and Cognitive Mechanisms of Attention

Friday, May 7, 1:00 – 3:00 pm
Royal Ballroom 4-5

Organizer: George Sperling, University of California, Irvine

Presenters: Edgar DeYoe (Medical College of Wisconsin), Jack L. Gallant (University of California, Berkeley), Albert J. Ahumada (NASA Ames Research Center, Moffett Field CA 94035), Wilson S. Geisler (The University of Texas at Austin), Barbara Anne Dosher (University of California, Irvine), George Sperling (University of California, Irvine)

Symposium Description

This symposium brings together the world’s leading specialists in six different subareas of visual attention. These distinguished scientists will expose the audience to an enormous range of methods, phenomena, and theories. It’s not a workshop; listeners won’t learn how to use the methods described, but they will become aware of the existence of diverse methods and what can be learned from them. The participants will aim their talks to target VSS attendees who are not necessarily familiar with the phenomena and theories of visual attention but who can be assumed to have some rudimentary understanding of visual information processing. The talks should be of interest to and understandable by all VSS attendees who have an interest in visual information processing: students, postdocs, academic faculty, research scientists, clinicians, and the symposium participants themselves. Attendees will see examples of the remarkable insights achieved by carefully controlled experiments combined with computational modeling. DeYoe reviews his extraordinary fMRI methods for localizing spatial visual attention in the visual cortex of alert human subjects to measure their “attention maps”. He shows in exquisite detail how top-down attention to local areas in visual space changes the BOLD response (an indicator of neural activity) in corresponding local areas of V1 visual cortex and in adjacent spatiotopic visual processing areas. This work is of fundamental significance in defining the topography of attention and it has important clinical applications. Gallant is the premier exploiter of natural images in the study of visual cortical processing. His work uses computational models to define the neural processes of attention in V4 and throughout the attention hierarchy. Gallant’s methods complement DeYoe’s in that they reveal functions and purposes of attentional processing that often are overlooked with the simple stimuli traditionally used. Ahumada, who introduced the reverse correlation paradigm in vision science, here presents a model for the eye movements in perhaps the simplest search task (which happens also to have practical importance): the search for a small target near the horizon between ocean and sky. This is an introduction to the talk by Geisler. Geisler continues the theme of attention as optimizing performance in complex tasks in studies of visual search. He presents a computational model for how attention and stimulus factors jointly control eye movements and search success in arbitrarily complex and difficult search tasks. Eye movements in visual search approach those of an ideal observer in making optimal choices given the available information, and observers adapt (learn) rapidly when the nature of the information changes. Dosher has developed analytic descriptions of attentional processes that enable dissection of attention into three components: filter sharpening, stimulus enhancement, and altered gain control. She applies these analyses to show how subjects learn to adjust the components of attention to easy and to difficult tasks. Sperling reviews the methods used to quantitatively describe spatial and temporal attention windows, and to measure the amplification of attended features. He shows that different forms of attention act independently.

Abstracts

I Know Where You Are Secretly Attending! The topography of human visual attention revealed with fMRI

Edgar DeYoe, Medical College of Wisconsin; Ritobrato Datta, Medical College of Wisconsin

Previous studies have described the topography of attention-related activation in retinotopic visual cortex for an attended target at one or a few locations within the subject’s field of view. However, a complete description for all locations in the visual field is lacking. In this human fMRI study, we describe the complete topography of attention-related cortical activation throughout the central 28° of visual field and compare it with previous models. We cataloged separate fMRI-based maps of attentional topography in medial occipital visual cortex when subjects covertly attended to each target location in an array of 3 concentric rings of 6 targets each. Attentional activation was universally highest at the attended target but spread to other segments in a manner depending on eccentricity and/or target size. We propose an “Attentional Landscape” model that is more complex than a ‘spotlight’ or simple ‘gradient’ model but includes aspects of both. Finally, we asked subjects to secretly attend to one of the 18 targets without informing the investigator. We then show that it is possible to determine the target of attentional scrutiny from the pattern of brain activation alone with 100% accuracy. Together, these results provide a comprehensive, quantitative and behaviorally relevant account of the macroscopic cortical topography of visuospatial attention. We also show the pattern of attentional enhancement as it would appear distributed within the observer’s field of view, thereby permitting direct observation of a neurophysiological correlate of a purely mental phenomenon, the “window of attention.”
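
As a rough illustration of how an attended target can be read out from an activation pattern, here is a generic template-matching (nearest-centroid) decoder run on synthetic data; it is a sketch of the general idea, not the authors' analysis pipeline.

```python
# Generic template-matching decoder: assign a test activation pattern to
# the attended location whose mean training pattern it correlates with
# best.  A sketch of the general idea, not the authors' method.
import numpy as np

def decode_attended_location(train_patterns, train_labels, test_pattern):
    """train_patterns: (n_trials, n_voxels); train_labels: (n_trials,);
    test_pattern: (n_voxels,).  Returns the predicted location label."""
    labels = np.unique(train_labels)
    templates = np.array([train_patterns[train_labels == lab].mean(axis=0)
                          for lab in labels])
    z = lambda v: (v - v.mean()) / v.std()          # z-score for correlation
    scores = [float(np.mean(z(t) * z(test_pattern))) for t in templates]
    return labels[int(np.argmax(scores))]

# Demo on synthetic data: 18 attended locations, 40 voxels, 5 runs each.
rng = np.random.default_rng(1)
true_maps = rng.normal(size=(18, 40))
train_X = np.vstack([m + 0.5 * rng.normal(size=40)
                     for m in true_maps for _ in range(5)])
train_y = np.repeat(np.arange(18), 5)
test = true_maps[7] + 0.5 * rng.normal(size=40)
print(decode_attended_location(train_X, train_y, test))   # expected: 7
```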

Attentional modulation in intermediate visual areas during natural vision

Jack L. Gallant, University of California, Berkeley

Area V4 has been the focus of much research on neural mechanisms of attention. However, most of this work has focused on reduced paradigms involving simple stimuli such as bars and gratings, and simple behaviors such as fixation. The picture that has emerged from such studies suggests that the main effect of attention is to change response rate, response gain or contrast gain. In this talk I will review the current evidence regarding how neurons are modulated by attention under more natural viewing conditions involving complex stimuli and behaviors. The view that emerges from these studies suggests that attention operates through a variety of mechanisms that modify the way information is represented throughout the visual hierarchy. These mechanisms act in concert to optimize task performance under the demanding conditions prevailing during natural vision.

A model for search and detection of small targets

Albert J. Ahumada, NASA Ames Research Center, Moffett Field CA 94035

Computational models predicting the distribution of the time to detection of small targets on a display are being developed to improve workstation designs. Search models usually contain bottom-up processes, like a saliency map, and top-down processes, like a priori distributions over the possible locations to be searched. A case that needs neither of these features is the search for a very small target near the horizon when the sky and the ocean are clear. Our models for this situation have incorporated a saccade-distance penalty and inhibition-of-return with a temporal decay. For very small but high-contrast targets, a simple detection model in which the target is detected once it is foveated is sufficient. For low-contrast signals, a standard observer detection model with masking by the horizon edge is required. Accurate models of the search and detection process without significant expectations or stimulus attractors should make it easier to estimate the way in which the expectations and attractors are combined when they are included.
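
The ingredients listed above (a saccade-distance penalty, inhibition of return with temporal decay, and a "detected when foveated" rule for small high-contrast targets) can be assembled into a minimal fixation-selection loop; the sketch below is our own toy implementation under those assumptions, with arbitrary parameter values.

```python
# Minimal fixation-selection loop with a saccade-distance penalty and
# decaying inhibition of return, in the spirit of the model described
# above.  Our toy implementation; parameter values are arbitrary.
import numpy as np

rng = np.random.default_rng(2)
n_locs = 60                          # candidate locations along the horizon
target = int(rng.integers(n_locs))   # true target location (unknown to the model)
prior = np.ones(n_locs)              # flat a-priori attractiveness (clear sky, clear ocean)
ior = np.zeros(n_locs)               # inhibition-of-return map
fixation = n_locs // 2               # start in the middle of the horizon

dist_penalty = 0.02                  # cost per unit of saccade distance
ior_strength = 1.0                   # inhibition added at each fixated location
ior_decay = 0.8                      # geometric decay of inhibition per fixation

for t in range(1, 201):
    if fixation == target:           # small, high-contrast target: detected if foveated
        print(f"target found on fixation {t}")
        break
    ior[fixation] += ior_strength
    priority = prior - ior - dist_penalty * np.abs(np.arange(n_locs) - fixation)
    priority[fixation] = -np.inf     # never immediately refixate the current location
    fixation = int(np.argmax(priority))
    ior *= ior_decay                 # temporal decay of inhibition of return
else:
    print("target not found within 200 fixations")
```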

Ideal Observer Analysis of Overt Attention

Wilson S. Geisler, The University of Texas at Austin

In most natural tasks humans use information detected in the periphery, together with context and other task-dependent constraints, to select their fixation locations (i.e., the locations where they apply the specialized processing associated with the fovea). A useful strategy for investigating the overt-attention mechanisms that drive fixation selection is to begin by deriving appropriate normative (ideal observer) models. Such ideal observer models can provide a deep understanding of the computational requirements of the task, a benchmark against which to compare human performance, and a rigorous basis for proposing and testing plausible hypotheses for the biological mechanisms. In recent years, we have been investigating the mechanisms of overt attention for tasks in which the observer is searching for a known target randomly located in a complex background texture (nominally a background of filtered noise having the average power spectrum of natural images). This talk will summarize some of our earlier and more recent findings (for our specific search tasks): (1) practiced humans approach ideal search speed and accuracy, ruling out many sub-ideal models; (2) human eye movement statistics are qualitatively similar to those of the ideal searcher; (3) humans select fixation locations that make near optimal use of context (the prior over possible target locations); (4) humans show relatively rapid adaptation of their fixation strategies to simulated changes in their visual fields (e.g., central scotomas); (5) there are biologically plausible heuristics that approach ideal performance.
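
The normative core of the ideal searcher can be summarized in a single fixation-selection rule (our paraphrase and notation, simplified from the published model):

```latex
% p_i(T): posterior probability that the target is at location i after
% T fixations, computed from the prior and the eccentricity-dependent
% reliability of the evidence collected at each fixation.
\[
  k_{T+1} \;=\; \arg\max_{k} \;\sum_{i} p_i(T)\,
     p\bigl(\text{correct decision} \mid \text{target at } i,\ \text{fixation at } k\bigr)
\]
% i.e. the next fixation is the one that maximizes the expected accuracy
% of the final target-localization decision, given the current posterior.
```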

Attention in High Precision Tasks and Perceptual Learning

Barbara Anne Dosher, University of California, Irvine; Zhong-Lin Lu, University of Southern California

At any moment, the world presents far more information than the brain can process. Visual attention allows the effective selection of information relevant for high priority processing, and is often more easily focused on one object than two. Both spatial selection and object attention have important consequences for the accuracy of task performance. Such effects have historically been assessed primarily for relatively “easy”, lower-precision tasks, yet the role of attention can depend critically on the demand for fine, high-precision judgments. High-precision task performance generally depends more upon attention, and attention affects performance across all contrasts, with or without noisy stimuli. Low-precision tasks with similar processing loads generally show effects of attention only at intermediate contrasts and may be restricted to noisy display conditions. Perceptual learning can reduce the costs of inattention. The different roles of attention and task precision are accounted for within the context of an elaborated perceptual template model of the observer that identifies distinct functions of attention and provides an integrated account of performance as a function of attention, task precision, external noise and stimulus contrast. Taken together, these provide a taxonomy of the functions and mechanisms of visual attention.
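
A compact way to see how the attention components enter this account is the perceptual template model; the expression below is a common simplified rendering of that model (our notation, which may differ in detail from the authors' parameterization).

```latex
% Simplified perceptual template model (PTM).  c: signal contrast,
% \beta: template gain, \gamma: nonlinearity, N_ext: external noise,
% N_mul, N_add: multiplicative and additive internal noise.
\[
  d' \;=\; \frac{(\beta c)^{\gamma}}
   {\sqrt{\,(A_f N_{\mathrm{ext}})^{2\gamma}
     + N_{\mathrm{mul}}^{2}\bigl[(\beta c)^{2\gamma} + (A_f N_{\mathrm{ext}})^{2\gamma}\bigr]
     + (A_a N_{\mathrm{add}})^{2}}}
\]
% Attention can act by external-noise exclusion / filter sharpening
% (A_f < 1), by stimulus enhancement, equivalent to reducing additive
% internal noise (A_a < 1), or by altering the multiplicative
% (gain-control) term; which mechanism matters most depends on task
% precision and on whether external noise is present.
```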

Modeling the Temporal, Spatial, and Featural Processes of Visual Attention

George Sperling, University of California, Irvine

A whirlwind review of the methods used to quantitatively define the temporal, spatial, and featural properties of attention, and some of their interactions. The temporal window of attention is measured by moving attention from one location to another in which a rapid sequence of different items (e.g., letters or numbers) is being presented. The probability of items from that sequence entering short-term memory defines the time course of attention: typically 100 msec to window opening, maximum at 300-400 msec, and 800 msec to closing. Spatial attention is defined like acuity, by the ability to alternately attend to and ignore strips of increasingly finer grids. The spatial frequency characteristic so measured then predicts achievable attention distributions to arbitrarily defined regions. Featural attention is defined by the increased salience of items that contain to-be-attended features. This can be measured in various ways; the quickest is an ambiguous motion task, which shows that attended features have 30% greater salience than neutral features. Spatio-temporal interaction is measured when attention moves as quickly as possible to a designated area. Attention moves in parallel to all the to-be-attended areas, i.e., temporal-spatial independence. Independence of attentional modes is widely observed; it allows the most efficient neural processing.

Integrative mechanisms for 3D vision: combining psychophysics, computation and neuroscience

Integrative mechanisms for 3D vision: combining psychophysics, computation and neuroscience

Friday, May 7, 1:00 – 3:00 pm
Royal Ballroom 1-3

Organizer: Andrew Glennerster, University of Reading

Presenters: Roland W. Fleming (Max Planck Institute for Biological Cybernetics), James T Todd (Department of Psychology, Ohio State University), Andrew Glennerster (University of Reading), Andrew E Welchman (University of Birmingham), Guy A Orban (K.U. Leuven), Peter Janssen (K.U. Leuven)

Symposium Description

Estimating the three-dimensional (3D) structure of the world around us is a central component of our everyday behavior, supporting our decisions, actions and interactions. The problem faced by the brain is classically described in terms of the difficulty of inferring a 3D world from (“ambiguous”) 2D retinal images. The computational challenge of inferring 3D depth from retinal samples requires sophisticated neural machinery that learns to exploit multiple sources of visual information that are diagnostic of depth structure. This sophistication at the input level is demonstrated by our flexibility in perceiving shape under radically different viewing situations. For instance, we can gain a vivid impression of depth from a sparse collection of seemingly random dots, as well as from flat paintings. Adding to the complexity, humans exploit depth signals for a range of different behaviors, meaning that the input complexity is compounded by multiple functional outputs. Together, this poses a significant challenge when seeking to investigate empirically the sequence of computations that enable 3D vision.

This symposium brings together speakers from different perspectives to outline progress in understanding 3D vision. Fleming will start, addressing the question of “What is the information?”, using computational analysis of 3D shape to highlight basic principles that produce depth signatures from a range of cues. Todd and Glennerster will both consider the question of “How is this information represented?”, discussing different types of representational schemes and data structures. Welchman, Orban and Janssen will focus on the question of “How is it implemented in cortex?”. Welchman will discuss human fMRI studies that integrate psychophysics with concurrent measures of brain activity. Orban will review fMRI evidence for spatial correspondence in the processing of different depth cues in the human and monkey brain. Janssen will summarize results from single cell electrophysiology, highlighting the similarities and differences between the processing of 3D shape at the extreme ends of the dorsal and ventral pathways. Finally, Glennerster, Orban and Janssen will all address the question of how depth processing is affected by task.

The symposium should attract a wide range of VSS participants, as the topic is a core area of vision science and is enjoying a wave of public enthusiasm with the revival of stereoscopic entertainment formats. Further, the goal of the session in linking computational approaches to behavior to neural implementation is one that is scientifically attractive.

Abstracts

From local image measurements to 3D shape

Roland W. Fleming, Max Planck Institute for Biological Cybernetics

There is an explanatory gap between the simple local image measurements of early vision, and the complex perceptual inferences involved in estimating object properties such as surface reflectance and 3D shape.  The main purpose of my presentation will be to discuss how populations of filters tuned to different orientations and spatial frequencies can be ‘put to good use’ in the estimation of 3D shape.  I’ll show how shading, highlights and texture patterns on 3D surfaces lead to highly distinctive signatures in the local image statistics, which the visual system could use in 3D shape estimation.  I will discuss how the spatial organization of these measurements provides additional information, and argue that a common front end can explain both similarities and differences between various monocular cues.  I’ll also present a number of 3D shape illusions and show how these can be predicted by image statistics, suggesting that human vision does indeed make use of these measurements.
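
As a toy illustration of "local image measurements" carrying a shape signature, the snippet below renders a shaded bump and summarizes its local gradient orientations; simple image gradients stand in for a full bank of oriented, frequency-tuned filters, and the whole example is our illustrative sketch rather than the author's analysis.

```python
# Toy example: local orientation statistics of a shaded 3D bump.
# Simple image gradients stand in for a bank of oriented filters.
# Our illustrative sketch, not the author's analysis.
import numpy as np

# Synthetic depth bump and Lambertian-style shading from a fixed light.
x, y = np.meshgrid(np.linspace(-1, 1, 128), np.linspace(-1, 1, 128))
z = np.exp(-(x**2 + y**2) / 0.18)                  # smooth bump
zy, zx = np.gradient(z)                            # surface slopes
light = np.array([0.5, 0.5, 1.0])
light = light / np.linalg.norm(light)
normals = np.dstack([-zx, -zy, np.ones_like(z)])
normals /= np.linalg.norm(normals, axis=2, keepdims=True)
image = np.clip(normals @ light, 0.0, None)        # shaded image

# Local measurements: gradient orientation weighted by contrast energy.
gy, gx = np.gradient(image)
orientation = np.arctan2(gy, gx)                   # local orientation (rad)
energy = np.hypot(gx, gy)                          # local contrast energy
hist, _ = np.histogram(orientation, bins=8, weights=energy)
print(np.round(hist / hist.sum(), 3))              # orientation "signature"
```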

The perceptual representation of 3D shape

James T Todd, Department of Psychology, Ohio State University

One of the fundamental issues in the study of 3D surface perception is to identify the specific aspects of an object’s structure that form the primitive components of an observer’s perceptual knowledge.  After all, in order to understand shape perception, it is first necessary to define what “shape” is.  In this presentation, I will assess several types of data structures that have been proposed for representing 3D surfaces.  One of the most common data structures employed for this purpose involves a map of the geometric properties in each local neighborhood, such as depth, orientation or curvature. Numerous experiments have been performed in which observers have been required to make judgments of local surface properties, but the results reveal that these judgments are most often systematically distorted relative to the ground truth and surprisingly imprecise, thus suggesting that local property maps may not be the foundation of our perceptual knowledge about 3D shape.  An alternative type of data structure for representing 3D shape involves a graph of the configural relationships among qualitatively distinct surface features, such as edges and vertices. The psychological validity of this type of representation has been supported by numerous psychophysical experiments, and by electrophysiological studies of macaque IT. A third type of data structure will also be considered in which surfaces are represented as a tiling of qualitatively distinct regions based on their patterns of curvature, and there is some neurophysiological evidence to suggest that this type of representation occurs in several areas of the primate cortex.

View-based representations and their relevance to human 3D vision

Andrew Glennerster, School of Psychology and CLS, University of Reading

In computer vision, applications that previously involved the generation of 3D models can now be achieved using view-based representations. In the movie industry this makes sense, since both the inputs and outputs of the algorithms are images, but the same could also be argued of human 3D vision. We explore the implications of view-based models in our experiments.

In an immersive virtual environment, observers fail to notice the expansion of a room around them and consequently make gross errors when comparing the size of objects. This result is difficult to explain if the visual system continuously generates a 3-D model of the scene using known baseline information from interocular separation or proprioception. If, on the other hand, observers use a view-based representation to guide their actions, they may have an expectation of the images they will receive but be insensitive to the rate at which images arrive as they walk.

In the same context, I will discuss psychophysical evidence on sensitivity to depth relief with respect to surfaces. The data are compatible with a hierarchical encoding of position and disparity similar to the affine model of Koenderink and van Doorn (1991).  Finally, I will discuss two experiments that show how changing the observer’s task changes their performance in a way that is incompatible with the visual system storing a 3D model of the shape or location of objects. Such task-dependency indicates that the visual system maintains information in a more ‘raw’ form than a 3D model.

The functional roles of visual cortex in representing 3D shape

Andrew E Welchman, School of Psychology, University of Birmingham

Estimating the depth structure of the environment is a principal function of the visual system, enabling many key computations, such as segmentation, object recognition, material perception and the guidance of movements. The brain estimates depth by exploiting and combining a range of cues, from shading and shadows to linear perspective, motion and binocular disparity. Despite the importance of this process, we still know relatively little about the functional roles of different cortical areas in processing depth signals in the human brain. Here I will review recent human fMRI work that combines established psychophysical methods, high resolution imaging and advanced analysis methods to address this question. In particular, I will describe fMRI paradigms that integrate psychophysical tasks in order to look for a correspondence between changes in behavioural performance and fMRI activity. Further, I will review information-based fMRI analysis methods that seek to investigate different types of depth representation in parts of visual cortex. This work suggests a key role for a confined ensemble of dorsal visual areas in processing information relevant to judgments of 3D shape.

Extracting depth structure from multiple cues

Guy A Orban, K.U. Leuven

Multiple cues provide information about the depth structure of objects: disparity, motion, shading and texture. Functional imaging studies in humans have been performed to localize the regions involved in extracting depth structure from these four cues. In all these studies extensive controls were used to obtain activation sites specific for depth structure. Depth structure from motion, stereo and texture activates regions in both parietal and ventral cortex, but shading only activates a ventral region. For stereo and motion the balance between dorsal and ventral activation depends on the type of stimulus: boundaries versus surfaces. In the monkey, results are similar to those obtained in humans, except that motion is a weaker cue in monkey parietal cortex. At the single-cell level, neurons are selective for gradients of speed, disparity and texture. Neurons selective for first- and second-order gradients of disparity will be discussed by P Janssen. I will concentrate on neurons selective for speed gradients and review recent data indicating that a majority of FST neurons is selective for second-order speed gradients.
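
For readers unfamiliar with the terminology, "first-order" and "second-order" gradients here refer to linear versus curved variation of the cue across the receptive field; a minimal notation (ours, added for clarity):

```latex
% Speed gradients across a receptive-field coordinate x (schematic):
\[
  v(x) \;=\; v_0 + g_1 x            \quad\text{(first-order: slanted surface)}
\]
\[
  v(x) \;=\; v_0 + g_1 x + g_2 x^2  \quad\text{(second-order: curved surface)}
\]
% The same distinction applies to the disparity gradients discussed in
% the following abstract.
```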

Neurons selective to disparity defined shape in the temporal and parietal cortex

Peter Janssen, K.U. Leuven; Bram-Ernst Verhoef, KU Leuven

A large proportion of the neurons in the rostral lower bank of the Superior Temporal Sulcus, which is part of IT, respond selectively to disparity-defined 3D shape (Janssen et al., 1999; Janssen et al., 2000). These IT neurons preserve their selectivity for different positions-in-depth, which proves that they respond to the spatial variation of disparity along the vertical axis of the shape (higher-order disparity selectivity). We have studied the responses of neurons in parietal area AIP, the end stage of the dorsal visual stream and crucial for object grasping, to the same disparity-defined 3D shapes (Srivastava et al., 2009). In this presentation I will review the differences between IT and AIP in the neural representation of 3D shape. More recent studies have investigated the role of AIP and IT in the perceptual discrimination of 3D shape using simultaneous recordings of spikes and local field potentials in the two areas, psychophysics and reversible inactivations. AIP and IT show strong synchronized activity during 3D-shape discrimination, but only IT activity correlates with perceptual choice. Reversible inactivation of AIP produces a deficit in grasping but does not affect the perceptual discrimination of 3D shape. Hence the end stages of both the dorsal and the ventral visual stream process disparity-defined 3D shape in clearly distinct ways. In line with the proposed behavioral role of the two processing streams, the 3D-shape representation in AIP is action-oriented but not crucial for 3D-shape perception.

 

Understanding the interplay between reward and attention, and its effects on visual perception and action

Understanding the interplay between reward and attention, and its effects on visual perception and action

Friday, May 7, 3:30 – 5:30 pm
Royal Ballroom 4-5

Organizers: Vidhya Navalpakkam, Caltech; Leonardo Chelazzi, University of Verona Medical School, Italy; and Jan Theeuwes, Vrije Universiteit, the Netherlands

Presenters: Leonardo Chelazzi (Department of Neurological and Visual Sciences, University of Verona – Medical School, Italy), Clayton Hickey (Department of Cognitive Psychology, Vrije Universiteit Amsterdam, The Netherlands), Vidhya Navalpakkam (Division of Biology, Caltech, Pasadena), Miguel Eckstein (Department of Psychology, University of California, Santa Barbara), Pieter R. Roelfsema (Dept. Vision & Cognition, Netherlands Institute for Neuroscience, Amsterdam), Jacqueline Gottlieb (Dept. of Neuroscience and Psychiatry, Columbia University, New York)

Symposium Description

Adaptive behavior requires that we deploy attention to behaviorally relevant objects in our  visual environment. The mechanisms of selective visual attention and how it affects visual perception have been a topic of extensive research in the last few decades. In comparison, little is known about the role of reward incentives and how they affect attention and visual perception. Generally, we choose actions that in prior experience have resulted in a rewarding outcome, a principle that has been formalized in reward learning theory. Recent developments in vision research suggest that selective attention may be guided by similar economic principles. This symposium will provide a forum for researchers examining the interplay between reward and attention, and their effects on visual perception and action, to present their work and discuss their developing ideas.

The goal of this symposium will be to help bridge the existing gap between vision research, which has focused on attention, and decision-making research, which has focused on reward, to better understand the combined roles of reward and attention in visual perception and action. Experts from different fields, including psychology, neuroscience and computational modeling, will present novel findings on reward and attention, and outline challenges and future directions that we hope will lead to a cohesive theory. The first three talks will focus on behavior and modeling. Leo Chelazzi will speak about how attentional deployment may be biased by the reward outcomes of past attentional episodes, such as the gains and losses associated with attending to objects in the past. Vidhya Navalpakkam will speak about how reward information may bias saliency computations to influence overt attention and choice in a visual search task. Miguel Eckstein will show how human eye movement strategies are influenced by reward, and how they compare with an ideal reward searcher. The last three talks will focus on neurophysiological evidence for interactions between reward and attention. Clayton Hickey will provide behavioral and EEG evidence for a direct, non-volitional role of reward-related reinforcement learning in human attentional control. Pieter Roelfsema will present neural evidence on the remarkable correspondence between effects of reward and attention on competition between multiple stimuli, as early as in V1, suggesting a unification of theories of reward expectancy and attention. Finally, Jacqueline Gottlieb will present neural evidence from LIP on how reward-expectation shapes attention, and compare it with studies on how reward-expectation shapes decision-making.

We expect the symposium to be relevant to a wide audience with interests in psychology, neuroscience, and modeling of attention, reward, perception or decision-making.

Abstracts

Gains and losses adaptively adjust attentional deployment towards specific objects

Leonardo Chelazzi, Department of Neurological and Visual Sciences, University of Verona – Medical School, Italy; Andrea Perlato, Department of Neurological and Visual Sciences, University of Verona – Medical School, Italy; Chiara Della Libera, Department of Neurological and Visual Sciences, University of Verona – Medical School, Italy

The ability to select and ignore specific objects improves considerably due to prior experience (attentional learning). However, such learning, in order to be adaptive, should depend on the more-or-less favourable outcomes of past attentional episodes. We have systematically explored this possibility by delivering monetary rewards to human observers performing attention-demanding tasks. In all experiments, participants were told that high and low rewards indexed optimal and sub-optimal performance, respectively, though reward amount was entirely pre-determined. Firstly, we demonstrated that rewards adjust the immediate consequences of actively ignoring a distracter, known as negative priming. Specifically, we found that negative priming is only obtained following high rewards, indicating that lingering inhibition is abolished by poor outcomes. Subsequently, we assessed whether rewards can also adjust attentional biases in the distant future. Here, observers were trained with a paradigm where, on each trial, they selected a target while ignoring a distracter, followed by differential reward. Importantly, the probability of a high vs. low reward varied for different objects. Participants were then tested days later in the absence of reward. We found that the observers’ ability to select and ignore specific objects now strongly depended on the probability of high vs. low reward associated with a given object during training, and also critically on whether the imbalance had been applied when the object was shown as target or distracter during training. These observations show that an observer’s attentional biases towards specific objects strongly reflect the more-or-less favourable outcomes of past attentional processing of the same objects.

Understanding how reward and saliency affect overt attention and decisions

Vidhya Navalpakkam, Division of Biology, Caltech; Christof Koch, Division of Engineering, Applied Science and Biology, Caltech; Antonio Rangel, Division of Humanities and Social Sciences, Caltech; Pietro Perona, Division of Engineering and Applied Science, Caltech

The ability to rapidly choose among multiple valuable targets embedded in a complex perceptual environment is key to survival in many animal species. Targets may differ both in their reward value and in their low-level perceptual properties (e.g., visual saliency). Previous studies investigated separately the impact of value on decisions or of saliency on attention; thus it is not known how the brain combines these two variables to influence attention and decision-making. In this talk, I will describe how we addressed this question with three experiments in which human subjects attempted to maximize their monetary earnings by rapidly choosing items from a brief display. Each display contained several worthless items (distractors) as well as two targets, whose value and saliency were varied systematically. The resulting behavioral data were compared to the predictions of three computational models, which assume that: (1) subjects seek the most valuable item in the display, (2) subjects seek the most easily detectable item (e.g., highest saliency), (3) subjects behave as an ideal Bayesian observer who combines both factors to maximize expected reward within each trial. We find that, regardless of the motor response used to express the choices, decisions are influenced by both value and feature-contrast in a way that is consistent with the ideal Bayesian observer. Thus, individuals are able to engage in optimal reward harvesting while seeking multiple relevant targets amidst clutter. I will describe ongoing studies on whether attention, like decisions, may also be influenced by value and saliency to optimize reward harvesting.
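
A toy version of the three decision rules being compared might look like the following; the display statistics, noise model and payoff scheme are all our illustrative assumptions, not the experiment's design.

```python
# Toy comparison of three choice rules for a display with two targets
# that differ in value and in saliency.  Illustrative assumptions only.
import numpy as np

rng = np.random.default_rng(3)
values   = np.array([2.0, 1.0])    # payoff for acquiring target 1 or 2
saliency = np.array([0.5, 2.0])    # feature contrast: mean of the noisy evidence

def average_reward(rule, n_trials=20_000):
    total = 0.0
    for _ in range(n_trials):
        evidence = saliency + rng.normal(0.0, 1.0, size=2)  # noisy percept
        if rule == "value":        # always go for the most valuable item
            pick = int(np.argmax(values))
        elif rule == "saliency":   # always go for the most detectable item
            pick = int(np.argmax(evidence))
        else:                      # "combined": weigh value by detectability
            pick = int(np.argmax(values * (evidence > 0)))
        # In this toy payoff scheme the item is only acquired (and paid)
        # if its evidence exceeded threshold on that trial.
        total += values[pick] * (evidence[pick] > 0)
    return total / n_trials

for rule in ("value", "saliency", "combined"):
    print(f"{rule:9s} rule: average reward = {average_reward(rule):.2f}")
# The combined rule, which trades value off against detectability, earns
# more on average than either pure strategy -- the qualitative point of
# the ideal-observer comparison described above.
```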

Optimizing eye movements in search for rewards

Miguel Eckstein, Department of Psychology, University of California, Santa Barbara; Wade Schoonveld, Department of Psychology, University of California, Santa Barbara; Sheng Zhang, Department of Psychology, University of California, Santa Barbara

There is a growing literature investigating how rewards influence the planning of saccadic eye movements and the activity of underlying neural mechanisms (for a review see Trommershauser et al., 2009). Most of these studies reward correct eye movements towards a target at a given location (e.g., Liston and Stone, 2008). Yet, in everyday life, rewards are not directly linked to eye movements but rather to a correct perceptual decision and follow-up action.  The role of eye movements is to explore the visual scene and maximize the gathering of information for a subsequent perceptual decision.  In this context, we investigate how varying the rewards across locations assigned to correct perceptual decisions in a search task influences the planning of human eye movements. We extend the ideal Bayesian searcher (Najemnik & Geisler, 2005) by explicitly including reward structure to: 1) determine the (optimal) fixation sequences that maximize total reward gains; 2) predict the theoretical increase in gains from taking into account reward structure in planning eye movements during search.  We show that humans strategize their eye movements to collect more reward.  The pattern of human fixations shares many properties with the fixations of the ideal reward searcher. Human increases in total gains from using information about the reward structure are also comparable to the benefits in gains of the ideal searcher.  Finally, we use theoretical simulations to show that the observed discrepancies between the fixations of humans and the ideal reward searcher do not have a major impact on the total collected rewards. Together, the results increase our understanding of how rewards influence optimal and human saccade planning in ecologically valid tasks such as visual search.
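
The reward extension of the ideal Bayesian searcher can be written as a reward-weighted fixation-selection rule (our paraphrase and notation, not the authors' equations):

```latex
% Reward-weighted ideal searcher.  p_i(T): posterior probability that the
% target is at location i after T fixations; r_i: reward assigned to a
% correct perceptual decision when the target is at location i.
\[
  k_{T+1} \;=\; \arg\max_{k} \;\sum_{i} r_i \, p_i(T)\,
     p\bigl(\text{correct decision} \mid \text{target at } i,\ \text{fixation at } k\bigr)
\]
% Setting all r_i equal recovers the accuracy-maximizing ideal searcher;
% unequal rewards shift fixations toward locations where a correct
% decision pays more.
```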

Incentive salience in human visual attention

Clayton Hickey, Department of Cognitive Psychology, Vrije Universiteit Amsterdam; Leonardo Chelazzi, Department of Neurological and Visual Sciences, University of Verona – Medical School; Jan Theeuwes, Department of Cognitive Psychology, Vrije Universiteit Amsterdam

Reward-related midbrain dopamine guides animal behavior, creating automatic approach towards objects associated with reward and avoidance of objects unlikely to be beneficial. Using measures of behavior and of brain electrical activity, we show that the dopamine system implements a similar principle in the deployment of covert attention in humans. Participants attend to an object associated with monetary reward and ignore an object associated with a sub-optimal outcome, and do so even when they know this will result in poor task performance. The strength of reward’s impact on attention is predicted by the neural response to reward feedback in anterior cingulate cortex, a brain area known to be a part of the dopamine reinforcement circuit. These results demonstrate a direct, non-volitional role for reinforcement learning in human attentional control.

Reward expectancy biases selective attention in the primary visual cortex

Pieter R. Roelfsema, Dept. Vision & Cognition, Netherlands Institute for Neuroscience, Amsterdam; Chris van der Togt, Dept. Vision & Cognition, Netherlands Institute for Neuroscience, Amsterdam; Cyriel  Pennartz, Dept. Vision & Cognition, Netherlands Institute for Neuroscience, Amsterdam; Liviu Stănişor, Dept. Vision & Cognition, Netherlands Institute for Neuroscience, Amsterdam

Rewards and reward expectations influence neuronal activity in many brain regions as stimuli associated with a higher reward tend to give rise to stronger neuronal responses than stimuli associated with lower rewards. It is difficult to dissociate these reward effects from the effects of attention, as attention also modulates neuronal activity in many of the same structures (Maunsell, 2004). Here we investigated the relation between rewards and attention by recording neuronal activity in the primary visual cortex (area V1), an area usually not believed to play a crucial role in reward processing, in a curve-tracing task with varying rewards. We report a new effect of reward magnitude in area V1 where highly rewarding stimuli cause more neuronal activity than unrewarding stimuli, but only if there are multiple stimuli in the display. Our results demonstrate a remarkable correspondence between reward and attention effects. First, rewards bias the competition between simultaneously presented stimuli as is also true for selective attention. Second, the latency of the reward effect is similar to the latency of attentional modulation (Roelfsema, 2006). Third, neurons modulated by rewards are also modulated by attention. These results inspire a unification of theories about reward expectation and selective attention.

How reward shapes attention and the search for information

Jacqueline Gottlieb, Dept. of Neuroscience and Psychiatry, Columbia University; Christopher Peck, Dept. of Neuroscience and Psychiatry, Columbia University; Dave Jangraw, Dept. of Neuroscience and Psychiatry, Columbia University

In the neurophysiological literature with non-human primates, much effort has been devoted to understanding how reward expectation shapes decision making, that is, the selection of a specific course of action. On the other hand, we know nearly nothing about how reward shapes attention, the selection of a source of information. And yet, understanding how organisms value information is critical for predicting how they will allocate attention in a particular task. In addition, it is critical for understanding active learning and exploration, behaviors that are fundamentally driven by the need to discover new information that may prove valuable for future tasks.
To begin addressing this question, we examined how neurons located in the parietal cortex, which encode the momentary locus of attention, are influenced by the reward valence of visual stimuli. We found that reward predictors bias attention in a valence-specific manner. Cues predicting reward produced a sustained excitatory bias and attracted attention toward their location. Cues predicting no reward produced a sustained inhibitory bias and repelled attention from their location. These biases persisted and even grew with training, even though they came into conflict with the operant requirements of the task, thus lowering the animal’s task performance.  This pattern diverges markedly from the assumption of reinforcement learning (that training improves performance and overcomes maladaptive biases), and suggests that the effects of reward on attention may differ markedly from the effects on decision making. I will discuss these findings and their implications for reward and reward-based learning in cortical systems of attention.

 
