Find the Orange: How rich and accurate is the visual percept that guides action?

Poster Presentation 36.328: Sunday, May 19, 2024, 2:45 – 6:45 pm, Banyan Breezeway
Session: Scene Perception: Virtual environments, intuitive physics

Aryan Zoroufi1, Nishad Gothoskar1, Josh Tenenbaum1, Nancy Kanwisher1; 1Massachusetts Institute of Technology

Are the visual representations that guide our online interactions with the world sparse and impoverished, or richly detailed, including the 3D shape of objects and their spatial and physical relationships to each other? We address this question using a naturalistic virtual reality environment in which participants (N=10) are asked to find an occluded target object (an orange) in a tabletop scene as quickly as possible, either by pressing a button to indicate which occluder should be moved first or by reaching directly for that occluder. The occluders differ in width, orientation, 3D shape, and the presence of holes (which let the participant see through parts of the occluder). As instructed, participants launch their actions quickly, within 500 ms of stimulus onset. We find that decisions about which of two occluder objects to move first are guided by fairly accurate estimates of the area behind each occluder, estimates that take into account 1) the 3D structure of the scene (not just the 2D pixel area of the occluders) and 2) the relative size of the hidden object. We also find that decisions are similarly fast and accurate whether participants explicitly report their choice or reach directly for the object. Overall, these results suggest that an accurate 3D representation of both the visible and occluded parts of a scene is rapidly available to guide rational search in naturalistic environments. Future work using this framework will investigate whether the information that is rapidly available during naturalistic viewing includes not only geometric but also physical properties of the scene.
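
To make the implied decision rule concrete, the Python sketch below is an illustrative toy model, not the authors' stimulus or analysis code. Everything in it is assumed for illustration: the helper names (feasible_area, choose_occluder), the per-occluder quantities (3D hidden area, area revealed by holes, narrowest hidden gap), the numeric values, and the 7.5 cm orange diameter. It shows a rational first move: pick the occluder concealing the largest 3D region that could still contain the target, discounting regions ruled out by holes or by being too narrow for the object.

from dataclasses import dataclass

@dataclass
class Occluder:
    name: str
    hidden_area: float  # tabletop area hidden behind the occluder, cm^2 (3D area, not 2D pixel area)
    hole_area: float    # portion of that area ruled out because holes reveal it is empty, cm^2
    min_gap: float      # narrowest dimension of the hidden region, cm

def feasible_area(occ: Occluder, target_diameter: float) -> float:
    """Area that could still conceal the target: hidden area minus what the
    holes reveal, and zero if the region is too narrow to fit the target."""
    if occ.min_gap < target_diameter:
        return 0.0
    return max(occ.hidden_area - occ.hole_area, 0.0)

def choose_occluder(occluders, target_diameter=7.5):
    """Rational first move: the occluder most likely to hide the target,
    i.e., the one concealing the largest feasible 3D area."""
    return max(occluders, key=lambda o: feasible_area(o, target_diameter))

if __name__ == "__main__":
    scene = [
        Occluder("wide slanted board", hidden_area=600.0, hole_area=150.0, min_gap=12.0),
        Occluder("narrow box", hidden_area=300.0, hole_area=0.0, min_gap=9.0),
    ]
    best = choose_occluder(scene)  # wide slanted board wins: 450 vs 300 cm^2 feasible
    print(f"Move first: {best.name}")

Note that a purely 2D heuristic (raw pixel area of the occluder) would ignore both the hole discount and the fit constraint; the abstract's finding is that participants' choices track the 3D, object-size-aware quantity instead.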

Acknowledgements: Thanks to NSF NCS Project 6945933 for funding this study.