Average Temperature from Visual Scene Ensembles Without Reliance on Color, Contrast or Low Spatial Frequencies

Poster Presentation: Tuesday, May 21, 2024, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Scene Perception: Ensembles, natural image statistics

Vignash Tharmaratnam¹, Dirk Bernhardt-Walther², Jonathan S. Cant¹; ¹University of Toronto Scarborough, ²University of Toronto

Summary statistics for groups (i.e., ensembles) of faces or objects can be rapidly extracted to optimize visual processing, without reliance on visual working memory (VWM). We have previously demonstrated that this ability extends to complex groups of scenes: participants were able to extract average scene content and spatial boundary from scene ensembles. In the present study, we tested whether this ability extends to scene features that are not solely attributable to visual processing and are instead computed cross-modally. Specifically, we examined ensemble processing of scene temperature. Given that observers accurately rate the apparent temperature (i.e., how hot or cold a scene would feel) of single scenes (Jung & Walther, 2021), we predicted that observers could extract average scene temperature without reliance on VWM. Crucially, across four experiments, we tested whether this ability depends on low-level visual features. Participants rated the average temperature of scene ensembles composed of colored stimuli (Exp. 1), gray-scaled stimuli (Exp. 2), gray-scaled stimuli with a 75% contrast reduction (Exp. 3), or gray-scaled high spatial frequency filtered stimuli (> 6 cycles/degree; Exp. 4). In all experiments, we varied set size by randomly presenting 1, 2, 4, or 6 scenes on each trial, and measured VWM capacity using a two-alternative forced-choice (2-AFC) task. Participants accurately extracted average temperature in all experiments, integrating all 6 scenes into their summary statistics. This occurred without reliance on VWM, as fewer than 1.2 scenes were remembered on average. These results show that computing cross-modal summary statistics (i.e., average temperature) does not rely on low-level visual features. Overall, they reveal that, even with minimal low-level visual information available, abstract multisensory information can be rapidly retrieved from long-term memory and combined to form statistical representations.
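
The abstract reports VWM capacity as a number of scenes remembered but does not state how that number was derived from 2-AFC accuracy. As a minimal sketch, assuming the common guessing-corrected estimate K = N(2p - 1), where N is set size and p is the proportion of correct 2-AFC responses, the calculation could look like the following. The function name and the accuracy values are hypothetical and serve only as illustration; they are not data from these experiments, and the authors may have used a different estimator.

```python
def vwm_capacity_2afc(proportion_correct: float, set_size: int) -> float:
    """Guessing-corrected estimate of the number of items remembered.

    Assumes an observer who remembers K of N items answers correctly when
    the probed item is remembered and otherwise guesses at 50%, so that
    p = K/N + (1 - K/N) * 0.5, which rearranges to K = N * (2p - 1).
    This estimator is an assumption, not a formula given in the abstract.
    """
    if not 0.0 <= proportion_correct <= 1.0:
        raise ValueError("proportion_correct must lie in [0, 1]")
    if set_size < 1:
        raise ValueError("set_size must be at least 1")
    return set_size * (2.0 * proportion_correct - 1.0)


# Hypothetical 2-AFC accuracies per set size (illustrative only).
hypothetical_accuracy = {1: 0.90, 2: 0.75, 4: 0.62, 6: 0.58}

for set_size, accuracy in hypothetical_accuracy.items():
    k = vwm_capacity_2afc(accuracy, set_size)
    print(f"set size {set_size}: estimated scenes remembered = {k:.2f}")
```

Under this assumed model, chance performance (p = 0.5) yields an estimate of zero items and perfect performance yields the full set size, so an estimate well below the largest set size is consistent with ensemble judgments that do not depend on remembering each scene individually.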