## Modern Approaches to Modeling Visual Data

### Friday, May 8, 3:30 – 5:30 pm

Royal Ballroom 6-8

**Organizer:** Kenneth Knoblauch (Inserm, U846, Stem Cell and Brain Research Institute, Bron, France)

**Presenters:** Kenneth Knoblauch (Inserm, U846, Bron, France), David H. Foster (University of Manchester, UK), Jakob H. Macke (Max-Planck-Institut für biologische Kybernetik, Tübingen), Felix A. Wichmann (Technische Universität Berlin & Bernstein Center for Computational Neuroscience Berlin, Germany), Laurence T. Maloney (NYU)

### Symposium Description

A key step in vision research is comparison of experimental data to models intended to predict the data. Until recently, limitations on computer power and lack of availability of appropriate software meant that the researcher’s tool kit was limited to a few generic techniques such as fitting individual psychometric functions. Use of these models entails assumptions, such as the exact form of the psychometric function, that are rarely tested. It is not always obvious how to compare competing models, to show that one describes the data better than another, or to estimate what percentage of ‘variability’ in the responses of the observers is really captured by the model. Limitations on the models that researchers are able to fit translate into limitations on the questions they can ask and, ultimately, the perceptual phenomena that can be understood. Because of recent advances in statistical algorithms and the increased computer power available to all researchers, it is now possible to make use of a wide range of computer-intensive parametric and nonparametric approaches based on modern statistical methods. These approaches allow the experimenter to make more efficient use of perceptual data, to fit a wider range of perceptual data, to avoid unwarranted assumptions, and potentially to consider more complex experimental designs with the assurance that the resulting data can be analyzed. Researchers are likely familiar with nonparametric resampling methods such as bootstrapping (Efron, 1979; Efron & Tibshirani, 1993). We review a wider range of developments in statistics from the past twenty years, including results from the machine learning and model selection literatures.

Knoblauch introduces the symposium and describes how a wide range of psychophysical procedures (including fitting psychometric functions, estimating classification images, and estimating the parameters of signal detection theory) share a common mathematical structure that can be readily addressed by modern statistical approaches. He also shows how to extend these methods to model more complex experimental designs and discusses modern approaches to smoothing data. Foster describes how to relax the typical assumptions made in fitting psychometric functions and instead use the data itself to guide the fitting. Macke describes a technique, decision-images, for extracting critical stimulus features based on logistic regression, and shows how to use the extracted features to generate optimized stimuli for subsequent psychophysical experiments. Wichmann describes how to use “inverse” machine learning techniques to model visual saliency given eye movement data. Maloney discusses the measurement and modeling of super-threshold differences to model appearance and gives several examples of recent applications to surface material perception, surface lightness perception, and image quality. The presentations will outline how these approaches have been adapted to specific psychophysical tasks, including psychometric-function fitting, classification, visual saliency, difference scaling, and conjoint measurement. They show how these modern methods allow experimenters to make better use of data to gain insight into the operation of the visual system than was hitherto possible.

## Abstracts

### Generalized linear and additive models for psychophysical data

Kenneth Knoblauch

What do such diverse paradigms as classification images, difference scaling and additive conjoint measurement have in common? We introduce a general framework that permits modeling and evaluating experiments covering a broad range of psychophysical tasks. Psychophysical data are considered within a signal detection model in which a decision variable, d, which is some function, f, of the stimulus conditions, S, is related to the expected probability of response, E[P], through a psychometric function, G: E[P] = G(d) = G(f(S)). In many cases, the function f is linear, in which case the model reduces to E[P] = G(Xb), where X is a design matrix describing the stimulus configuration and b a vector of weights indicating how the observer combines stimulus information in the decision variable. By inverting the psychometric function, we obtain a Generalized Linear Model (GLM). We demonstrate how this model, which has previously been applied to calculation of signal detection theory parameters and fitting the psychometric function, is extended to provide maximum likelihood solutions for three tasks: classification image estimation, difference scaling and additive conjoint measurement. Within the GLM framework, nested hypotheses are easily set up in a manner resembling classical analysis of variance. In addition, the GLM is easily extended to fitting and evaluating more flexible (nonparametric) models involving arbitrary smooth functions of the stimulus. In particular, this permits a principled approach to fitting smooth classification images.
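As an illustration of the model E[P] = G(Xb), the following is a minimal sketch, not the presenter's implementation. It assumes a logistic G, binomial responses, and made-up data; the function names `fit_logistic_glm` and `log_likelihood` are hypothetical. It fits b by iteratively reweighted least squares and compares two nested models by their difference in deviance.

```python
import numpy as np

def fit_logistic_glm(X, k, n, n_iter=25):
    """Maximum likelihood fit of E[P] = G(X b), with logistic G (an assumption).

    X: (m, p) design matrix; k: successes per row; n: trials per row.
    Uses iteratively reweighted least squares (IRLS).
    """
    X, k, n = (np.asarray(a, dtype=float) for a in (X, k, n))
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ b                           # linear predictor X b
        p = 1.0 / (1.0 + np.exp(-eta))        # G(eta)
        w = np.maximum(n * p * (1.0 - p), 1e-12)  # IRLS weights
        z = eta + (k - n * p) / w             # working response
        XtW = X.T * w
        b = np.linalg.solve(XtW @ X, XtW @ z)
    return b

def log_likelihood(X, k, n, b):
    """Binomial log-likelihood (up to a constant) of the fitted model."""
    p = 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ b)))
    p = np.clip(p, 1e-12, 1.0 - 1e-12)
    return float(np.sum(k * np.log(p) + (n - k) * np.log(1.0 - p)))

# Made-up data: 9 stimulus levels, 40 trials each, generated from b = (0.5, 2.0).
x = np.linspace(-2.0, 2.0, 9)
n = np.full(x.shape, 40.0)
k = np.round(n / (1.0 + np.exp(-(0.5 + 2.0 * x))))

X_full = np.column_stack([np.ones_like(x), x])   # intercept + stimulus level
X_null = np.ones((len(x), 1))                    # nested model: intercept only

b_full = fit_logistic_glm(X_full, k, n)
b_null = fit_logistic_glm(X_null, k, n)

# Difference in deviance between the nested models (larger = stronger evidence
# that the stimulus term is needed).
deviance_diff = 2.0 * (log_likelihood(X_full, k, n, b_full)
                       - log_likelihood(X_null, k, n, b_null))
```

Swapping the columns of the design matrix is what changes the task being modeled; the fitting routine itself stays the same.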

### Model-free estimation of the psychometric function

David H. Foster, K. Zychaluk

The psychometric function is central to the theory and practice of psychophysics. It describes the relationship between stimulus level and a subject’s response, usually represented by the probability of success in a certain number of trials at that stimulus level. The psychometric function itself is, of course, not directly accessible to the experimenter and must be estimated from observations. Traditionally, this function is estimated by fitting a parametric model to the experimental data, usually the proportion of successful trials at each stimulus level. Common models include the Gaussian and Weibull cumulative distribution functions. This approach works well if the model is correct, but it can mislead if not. In practice, the correct model is rarely known. Here, a nonparametric approach based on local linear fitting is advocated. No assumption is made about the true model underlying the data except that the function is smooth. The critical role of the bandwidth is explained, and a method is described for estimating its optimum value by cross-validation. A wide range of data sets were fitted by the local linear method and, for comparison, by several parametric models. The local linear method usually performed better and never worse than the parametric ones. As a matter of principle, a correct parametric model will always do better than a nonparametric model, simply because the parametric model assumes more about the data, but given an experimenter’s ignorance of the correct model, the local linear method provides an impartial and consistent way of addressing this uncertainty.
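The idea of local linear fitting with a cross-validated bandwidth can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' procedure: it uses weighted least squares directly on the observed proportions with a Gaussian kernel, leave-one-out cross-validation over a small grid of candidate bandwidths, and made-up data. The function names are hypothetical.

```python
import numpy as np

def local_linear(x0, x, y, h):
    """Local linear estimate of the function at x0 (Gaussian kernel, bandwidth h)."""
    w = np.exp(-0.5 * ((x - x0) / h) ** 2)           # kernel weights
    A = np.column_stack([np.ones_like(x), x - x0])   # local intercept and slope
    AtW = A.T * w
    beta = np.linalg.solve(AtW @ A, AtW @ y)
    return beta[0]                                   # intercept = estimate at x0

def loo_cv_bandwidth(x, y, candidates):
    """Pick the candidate bandwidth with the smallest leave-one-out squared error."""
    scores = []
    for h in candidates:
        err = 0.0
        for i in range(len(x)):
            keep = np.arange(len(x)) != i            # drop observation i
            err += (y[i] - local_linear(x[i], x[keep], y[keep], h)) ** 2
        scores.append(err)
    return candidates[int(np.argmin(scores))]

# Made-up data: proportion correct at 11 stimulus levels, 50 trials per level.
x = np.linspace(0.0, 1.0, 11)
y = np.round(50.0 / (1.0 + np.exp(-10.0 * (x - 0.5)))) / 50.0

candidates = [0.1, 0.2, 0.4, 0.8]
h_best = loo_cv_bandwidth(x, y, candidates)
estimate_mid = local_linear(0.5, x, y, h_best)       # estimate at the midpoint
```

A small bandwidth follows the data closely but is noisy; a large one is smooth but can flatten the function, which is why the choice is left to cross-validation.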

### Estimating Critical Stimulus Features from Psychophysical Data: The Decision-Image Technique Applied to Human Faces

Jakob H. Macke, Felix A. Wichmann

One of the main challenges in the sensory sciences is to identify the stimulus features on which the sensory systems base their computations: they are a prerequisite for computational models of perception. We describe a technique, decision-images, for extracting critical stimulus features based on logistic regression. Rather than embedding the stimuli in noise, as is done in classification image analysis, we want to infer the important features directly from physically heterogeneous stimuli. A decision-image not only defines the critical region-of-interest within a stimulus but is a quantitative template which defines a direction in stimulus space. Decision-images thus enable the development of predictive models, as well as the generation of optimized stimuli for subsequent psychophysical investigations. Here we describe our method and apply it to data from a human face discrimination experiment. We show that decision-images are able to predict human responses not only in terms of overall percent correct but are able to predict, for individual observers, the probabilities with which individual faces are (mis)classified. We then test the predictions of the models using optimized stimuli. Finally, we discuss possible generalizations of the approach and its relationships with other models.
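The core step, logistic regression from stimuli to binary responses with the weight vector read back as an image, can be sketched as follows. This is an illustrative sketch only: the simulated 4×4 "stimuli", the simulated observer, the L2 penalty, and the function name `fit_decision_image` are all assumptions, not details of the authors' method or data.

```python
import numpy as np

def fit_decision_image(stimuli, responses, lam=1e-3, lr=0.5, n_iter=500):
    """L2-penalized logistic regression fitted by gradient descent.

    stimuli: (N, H, W) array; responses: (N,) array of 0/1 choices.
    Returns the weights reshaped to (H, W) and the bias term.
    """
    X = stimuli.reshape(len(stimuli), -1)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))       # predicted P(response = 1)
        w -= lr * (X.T @ (p - responses) / len(X) + lam * w)
        b -= lr * np.mean(p - responses)
    return w.reshape(stimuli.shape[1:]), b

# Simulated observer who relies on the central 2x2 pixels (hypothetical template).
rng = np.random.default_rng(0)
template = np.zeros((4, 4))
template[1:3, 1:3] = 1.0
stimuli = rng.normal(size=(2000, 4, 4))
p_true = 1.0 / (1.0 + np.exp(-2.0 * (stimuli * template).sum(axis=(1, 2))))
responses = (rng.random(2000) < p_true).astype(float)

image, bias = fit_decision_image(stimuli, responses)

# How well the recovered weights match the simulated observer's template.
corr = np.corrcoef(image.ravel(), template.ravel())[0, 1]
```

Because the fitted weights define a direction in stimulus space, moving a stimulus along that direction is one way to construct stimuli that the model predicts will be classified more or less reliably.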

### Non-linear System Identification: Visual Saliency Inferred from Eye-Movement Data

Felix A. Wichmann, Wolf Kienzle, Bernhard Schölkopf, Matthias Franz

For simple visual patterns under the experimenter’s control, we impose which information, or features, an observer can use to solve a given perceptual task. For natural vision tasks, however, there is typically a multitude of potential features in a given visual scene which the visual system may be exploiting when analyzing it: edges, corners, contours, etc. Here we describe a novel non-linear system identification technique based on modern machine learning methods that allows the critical features an observer uses to be inferred directly from the observer’s data. The method neither requires stimuli to be embedded in noise nor is it limited to linear perceptive fields (classification images). We demonstrate our technique by deriving the critical image features observers fixate in natural scenes (bottom-up visual saliency). Unlike previous studies where the relevant structure is determined manually (e.g., by selecting Gabors as visual filters), we do not make any assumptions in this regard, but numerically infer the number and properties of the features from the eye-movement data. We show that center-surround patterns emerge as the optimal solution for predicting saccade targets from local image structure. The resulting model, a one-layer feed-forward network with contrast gain-control, is surprisingly simple compared to previously suggested saliency models. Nevertheless, our model is equally predictive. Furthermore, our findings are consistent with neurophysiological hardware in the superior colliculus. Bottom-up visual saliency may thus not be computed cortically as has been thought previously.

### Measuring and modeling visual appearance of surfaces

Laurence T. Maloney

Researchers studying visual perception have developed numerous experimental methods for probing the perceptual system. The range of techniques available to study performance near visual threshold is impressive and rapidly growing, and we have a good understanding of what physical differences in visual stimuli are perceptually discriminable. A key remaining challenge for visual science is to develop models and psychophysical methods that allow us to evaluate how the visual system estimates visual appearance. Using traditional methods, for example, it is easy to determine how large a change in the parameters describing a surface is needed to produce a visually discriminable surface. It is less obvious how to evaluate the contributions of these same parameters to perception of visual qualities such as color, gloss or roughness. In this presentation, I’ll describe methods for modeling judgments of visual appearance that go beyond simple rating methods, and show how to evaluate the resulting models experimentally. I’ll describe three applications. The first concerns how illumination and surface albedo contribute to the rated dissimilarity of illuminated surfaces in three-dimensional scenes. The second concerns modeling of super-threshold differences in image quality using difference scaling, and the third concerns application of additive conjoint measurement to evaluating how observers perceive gloss and meso-scale surface texture (‘bumpiness’) when both are varied.