Scene Structure Predicts Perceptual Decisions in Naturalistic Detection Tasks

Poster Presentation 33.474: Sunday, May 17, 2026, 8:30 am – 12:30 pm, Pavilion
Session: Decision Making: Perception 2

Jun Yang1,2 (), Tiziana Vercillo2, Teresa Emma Cutrona2, Simone Azeglio1,3, Giandomenico Iannetti2,4, Peter Neri1,2; 1Ecole Normale Supérieure-PSL, France, 2Italian Institute of Technology, Italy, 3Institut de la Vision, France, 4University College London, UK

We recognize objects effortlessly, even when embedded within complex natural scenes. It remains unclear how this feat is achieved---in particular, we do not know how scene structure shapes perceptual decision-making. We tackle this issue by integrating four complementary approaches: controlled psychophysics, augmented reality (AR)-based naturalistic paradigms, deep neural network (DNN) modeling, and electroencephalography (EEG) decoding. All paradigms involved the same detection protocol: participants were asked to report the absence/presence of a target probe (e.g. short segment) randomly placed within the scene. We trained neural networks to predict psychophysical responses (correct/incorrect) generated by humans under various manipulations (probe type, image category, task structure, and timing of spatial cues). DNNs achieved reliable prediction using only contextual information (scene background), without any information about target configuration. Network predictability was greater for conditions under which humans were presented with greater spatial uncertainty, highlighting the role of global scene context in determining the operation of local perceptual processes. AR experiments extended these findings to real-world, naturalistic behavioral settings. Under some conditions, elementary image features (edge density and texture entropy) supported reliable classification of correct/incorrect responses via traditional machine learning methods, providing an interpretable account of perceptual variability. EEG recordings revealed neural signatures of perceptual outcome prediction during context-only presentations, and fusing EEG information with image-based information significantly improved decoding accuracy, showing that neural activity captures aspects of scene structure that modulate probe detection outcomes. Together, these converging results establish that natural scene statistics systematically modulate local perceptual processes independently of target/task configuration, and provide a unified framework for linking environmental structure, cortical dynamics, and perceptual decision-making in real-world detection tasks.