Oddness at a glance: Unraveling the time course of typical and atypical scene perception
53.521, Tuesday, May 14, 8:30 am - 12:30 pm, Vista Ballroom
Abraham Botros, Michelle Greene, Li Fei-Fei; Computer Science Department, Stanford University
Our ability to quickly recognize the "gist" of a scene is remarkable. However, little is known about the content of the mental representations built during brief glances. To what extent does scene gist perception rely on prior experience and expectations? When the input is atypical, is additional processing needed for recognition? Here, we examined the perceptual time course of both typical and atypical scenes. We used a carefully selected set of 100 real-world scene images: 50 "odd" images and 50 "doppelganger" images. Odd images depicted improbable real-world situations, such as divers signing papers underwater or a wild animal on a couch. Each doppelganger was visually similar to its odd counterpart but lacked the element that made the scene odd. We assessed scene perception with a free-response task combined with variable presentation times. Ten participants viewed odd and doppelganger images at counterbalanced presentation times (20, 40, 80, 150, and 500 ms, each followed by a mask) and were instructed to type descriptions of what they saw in as much detail as possible. Responses were scored using a concept tree in an Amazon Mechanical Turk (AMT) interface. AMT workers rated each description's overall correctness and level of detail, the number and specificity of the objects and scene details mentioned, and whether it showed an understanding of what made the image odd. All of these measures increased steadily with presentation time. In addition, performance on every measure was poorer for odd images than for doppelgangers. In particular, participants required between 150 and 500 ms to describe odd images correctly. At shorter presentation times, participants showed a marked tendency to rationalize the impoverished visual input into sensible interpretations more consistent with everyday visual experience.
Overall, these results suggest that top-down constraints may be imposed on early sensory input in order to maximize hypothesis likelihood, especially for atypical real-world scenes.