Measuring the salience of an object in a scene
22.25, Saturday, 17-May, 10:45 am - 12:30 pm, Talk Room 2
Alasdair Clarke1, Michal Dziemianko1, Frank Keller1; 1Institute for Language, Cognition and Computation, School of Informatics, University of Edinburgh
Over the past 15 years, work on visual salience has been restricted to models of low-level, bottom-up salience that give an estimate of the salience of every pixel in an image. This study concerns the question of how to measure the salience of objects. More precisely, given an image and a list of areas of interest (AOIs), can we assign salience scores to the AOIs that reflect their visual prominence? Treating salience as a per-object feature allows us to incorporate a notion of salience into higher-level, cognitive models. There is increasing evidence that fixation locations are best explained at an object level [Einhauser et al 2008, JoV; Nuthmann & Henderson 2010, JoV], and an object-level notion of visual salience can easily be combined with other object features representing semantics [Hwang et al 2011, VisRes; Greene 2013, FrontiersPsych] and task relevance. Extracting scores for AOIs from the saliency maps that are output by existing models is a non-trivial task. Using simple psychophysical (1/f-noise) stimuli, we demonstrate that simple methods for assigning salience scores to AOIs (such as taking the maximum, mean, or sum of the relevant pixels in the salience map) produce unintuitive results, such as predicting that larger objects are less salient. We also evaluate object salience models over a range of tasks and compare them to empirical data. Beyond predicting the number of fixations to different objects in a scene, we estimate the difficulty of visual search trials and incorporate visual salience into language production tasks. We present a simple object-based salience model (based on comparing the likelihood of an AOI given the rest of the image to the likelihood of a typical patch of the same area) that gives intuitive results for the 1/f-noise stimuli and performs as well as existing methods on empirical datasets.
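The pixel-pooling problem described above can be sketched in a few lines. The snippet below is illustrative only (the function and variable names are not from the authors' implementation): it scores an AOI by max, mean, and sum pooling over a pixel-level salience map, and the toy setup shows why each pooling rule behaves unintuitively with respect to object size.

```python
import numpy as np

def aoi_scores(salience_map, aoi_mask):
    """Score one AOI from a pixel-level salience map.

    salience_map: 2-D array of per-pixel salience values.
    aoi_mask: boolean 2-D array marking the AOI's pixels.
    Returns the (max, mean, sum) pooling scores discussed in the text.
    """
    pixels = salience_map[aoi_mask]
    return pixels.max(), pixels.mean(), pixels.sum()

# Toy example: a small AOI and a much larger AOI over the same noise map.
rng = np.random.default_rng(0)
salience = rng.random((100, 100))

small = np.zeros((100, 100), dtype=bool)
small[10:15, 10:15] = True        # 25-pixel AOI
large = np.zeros((100, 100), dtype=bool)
large[40:90, 40:90] = True        # 2500-pixel AOI

s_max, s_mean, s_sum = aoi_scores(salience, small)
l_max, l_mean, l_sum = aoi_scores(salience, large)
```

On noise-like input, sum pooling grows with AOI area regardless of visual prominence, while mean pooling tends toward the map's average as the AOI grows, so a larger object can score no higher (or lower) than a small one; max pooling ignores object extent entirely. None of these captures an intuitive per-object salience, which motivates the likelihood-based model the abstract proposes.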