A scene is more than the sum of its objects: The mechanisms of object-object and object-scene integration

Time/Room: Friday, May 19, 2017, 12:00 – 2:00 pm, Talk Room 1
Organizer(s): Liad Mudrik, Tel Aviv University and Melissa Võ, Goethe University Frankfurt
Presenters: Michelle Greene, Monica S. Castelhano, Melissa L.H. Võ, Nurit Gronau, Liad Mudrik


Symposium Description

In the lab, vision researchers typically try to create “clean”, controlled environments and stimuli in order to tease apart the different processes involved in seeing. Yet in real life, visual comprehension is never a sterile process: objects appear with other objects in cluttered, rich scenes, which have certain spatial and semantic properties. In recent years, more and more studies have focused on object-object and object-scene relations as possible guiding principles of vision. The proposed symposium presents current findings in this continuously developing field, focusing on two key questions that have attracted substantial scientific interest in recent years: how do scene-object and object-object relations influence object processing, and what are the necessary conditions for deciphering these relations? Greene, Castelhano, and Võ will each tackle the first question in different ways, using information-theoretic measures, visual search findings, and eye-movement and EEG measures. The second question will be discussed with respect to attention and consciousness: Võ’s findings suggest automatic processing of object-scene relations, but do not rule out the need for attention, a view corroborated and further stressed by Gronau’s results. With respect to consciousness, however, Mudrik will present behavioral and neural data suggesting that consciousness may not be an immediate condition for processing relations, but may rather serve as a necessary enabling factor. Taken together, these talks should lay the ground for an integrative discussion of both complementary and conflicting findings. Whether these are based on different theoretical assumptions, methodologies, or experimental approaches, the core of the symposium will speak to how best to tackle the investigation of the complexity of real-world scene perception.


Measuring the Efficiency of Contextual Knowledge

Speaker: Michelle Greene, Stanford University

The last few years have brought us both large-scale image databases and the ability to crowd-source human data collection, allowing us to measure contextual statistics in real-world scenes (Greene, 2013). How much contextual information is there, and how efficiently do people use it? We created a visual analog to a guessing game suggested by Claude Shannon (1951) to measure the information scenes and objects share. In our game, 555 participants on Amazon’s Mechanical Turk (AMT) viewed scenes in which a single object was covered by an opaque bounding box. Participants were instructed to guess the identity of the hidden object until correct. Participants were paid per trial, and each trial terminated upon correctly guessing the object, so participants were incentivized to guess as efficiently as possible. Using information-theoretic measures, we found that scene context can be encoded with less than 2 bits per object, a level of redundancy that is even greater than that of English text. To assess the information carried by scene category alone, we ran a second experiment in which the image was replaced by the scene category name. Participants still outperformed the entropy of the database, suggesting that the majority of contextual knowledge is carried by the category schema. Taken together, these results suggest not only that scene categories carry a great deal of information about objects, but also that this information is efficiently encoded by the human mind.
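The logic of Shannon's guessing game can be illustrated with a minimal sketch. Under a fixed guessing strategy, the guess index uniquely determines the object, so the entropy of the guess-index distribution upper-bounds the entropy of the hidden object given context (Shannon, 1951). The function name and the trial data below are hypothetical illustrations, not the authors' actual analysis.

```python
from collections import Counter
from math import log2

def entropy_upper_bound(guess_counts):
    """Upper bound (in bits) on the entropy of the hidden object,
    estimated from guessing-game data.

    guess_counts: one integer per trial, the number of guesses a
    participant needed before naming the hidden object correctly.
    With a deterministic guessing strategy, the guess index
    determines the object, so the entropy of the guess-index
    distribution bounds the object entropy from above.
    """
    n = len(guess_counts)
    freqs = Counter(guess_counts)
    return -sum((c / n) * log2(c / n) for c in freqs.values())

# Hypothetical data: most hidden objects are guessed on the first try,
# as expected when scene context is highly informative.
trials = [1, 1, 1, 1, 2, 1, 3, 1, 2, 1]
print(round(entropy_upper_bound(trials), 2))  # prints 1.16
```

When context predicts the object well, correct first guesses dominate and the bound falls well below the entropy of the full object vocabulary, which is the sense in which contextual knowledge compresses the object code.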

Where in the world?: Explaining Scene Context Effects during Visual Search through Object-Scene Spatial Associations

Speaker: Monica S. Castelhano, Queen’s University

The spatial relationships between objects and scenes, and their effects on visual search performance, have been well established. Here, we examine how object-scene spatial associations support scene context effects on eye movement guidance and search efficiency. We reframed two classic visual search paradigms (set size and sudden onset) according to the spatial association between the target object and the scene. Using the recently proposed Surface Guidance Framework, we operationalize target-relevant and target-irrelevant regions. Scenes are divided into three regions (upper, mid, lower) that correspond with possible relevant surfaces (wall, countertop, floor). Target-relevant regions are defined according to the surface on which the target is likely to appear (e.g., painting, toaster, rug). In the first experiment, we explored how spatial associations affect search by manipulating set size in either target-relevant or target-irrelevant regions. We found that only set size increases in target-relevant regions adversely affected search performance. In the second experiment, we manipulated whether a suddenly onsetting distractor object appeared in a target-relevant or target-irrelevant region. We found that fixations to the distractor were significantly more likely, and search performance was negatively affected, in the target-relevant condition. The Surface Guidance Framework allows for further exploration of how object-scene spatial associations can be used to quickly narrow processing to specific areas of the scene while largely ignoring information in other areas. Viewing scene context effects through the lens of target relevancy allows us to develop a new understanding of how the spatial associations between objects and scenes affect performance.

What drives semantic processing of objects in scenes?

Speaker: Melissa L.H. Võ, Goethe University Frankfurt

Objects hardly ever appear in isolation, but are usually embedded in a larger scene context. This context — determined, e.g., by the co-occurrence of other objects or the semantics of the scene as a whole — has a large impact on the processing of each and every object. Here I will present a series of eye tracking and EEG studies from our lab that 1) make use of the known time course and neuronal signature of scene-semantic processing to test whether seemingly meaningless textures of scenes are sufficient to modulate semantic object processing, and 2) raise the question of its automaticity. For instance, we have previously shown that semantically inconsistent objects trigger an N400 ERP response similar to the one known from language processing. Moreover, an additional but earlier N300 response signals perceptual processing difficulties, in line with classic findings of impeded object identification from the 1980s. We have since used this neuronal signature to investigate scene context effects on object processing, and recently found that a scene’s mere summary statistics — visualized as seemingly meaningless textures — elicit a very similar N400 response. Further, we have shown that observers looking for target letters superimposed on scenes fixated task-irrelevant, semantically inconsistent objects embedded in the scenes to a greater degree, without explicit memory for these objects. Increasing the number of superimposed letters reduced this effect, but not entirely. As part of this symposium, we will discuss the implications of these findings for the question of whether object-scene integration requires attention.

Vision at a glance: the necessity of attention to contextual integration processes

Speaker: Nurit Gronau, The Open University of Israel

Objects that are conceptually consistent with their environment are typically grasped more rapidly and efficiently than objects that are inconsistent with it. The extent to which such contextual integration processes depend on visual attention, however, is largely disputed. The present research examined the necessity of visual attention to object-object and object-scene contextual integration processes during a brief visual glimpse. Participants performed an object classification task on associated object pairs that were positioned either in expected relative locations (e.g., a desk lamp on a desk) or in unexpected, contextually inconsistent relative locations (e.g., a desk lamp under a desk). When both stimuli were relevant to task requirements, latencies to spatially consistent object pairs were significantly shorter than to spatially inconsistent pairs. These contextual effects disappeared, however, when spatial attention was drawn to one of the two object stimuli while its counterpart was positioned outside the focus of attention and was irrelevant to task demands. Subsequent research examined object-object and object-scene associations based on categorical relations, rather than on specific spatial and functional relations. Here too, processing of the semantic/categorical relations necessitated the allocation of spatial attention, unless an unattended object was explicitly defined as a to-be-detected target. Collectively, our research suggests that the associative and integrative contextual processes underlying scene understanding rely on the availability of spatial attentional resources. However, stimuli that comply with task requirements (e.g., a cat/dog in an animal, but not in a vehicle, detection task) may benefit from efficient processing even when appearing outside the main focus of visual attention.

Object-object and object-scene integration: the role of conscious processing

Speaker: Liad Mudrik, Tel Aviv University

On a typical day, we perform numerous integration processes: we repeatedly integrate objects with the scenes in which they appear, and decipher the relations between objects, resting both on their tendency to co-occur and on their semantic associations. Such integration seems effortless, almost automatic, yet computationally speaking it is highly complicated and challenging. This apparent contradiction raises the question of consciousness’ role in the process: is integration automatic enough to obviate the need for conscious processing, or does its complexity necessitate the involvement of conscious experience? In this talk, I will present EEG, fMRI, and behavioral experiments that tap into consciousness’ role in object-scene integration and object-object integration. The former revisits subjects’ ability to integrate the relations (congruency/incongruency) between an object and the scene in which it appears. The latter examines the processing of the relations between two objects, in an attempt to differentiate between associative relations (i.e., relations that rest on repeated co-occurrences of the two objects) and abstract ones (i.e., relations that are more conceptual, between two objects that do not tend to co-appear but are nevertheless related). I will claim that in both types of integration, consciousness may function as an enabling factor rather than an immediate necessary condition.
