Understanding the time course and spatial biases of natural scene segmentation

Poster Presentation 56.350: Tuesday, May 21, 2024, 2:45 – 6:45 pm, Banyan Breezeway
Session: Perceptual Organization: Parts, wholes, groups

Ruben Coen-Cagli1, Jonathan Vacher2, Dennis Cregin1, Tringa Lecaj1, Sophie Molholm1, Pascal Mamassian3; 1Albert Einstein College of Medicine, 2Université Paris Cité, 3Ecole Normale Supérieure

Image segmentation is central to visual function, yet humans' ability to parse natural scenes into individual objects or segments remains largely unexplored because it is notoriously difficult to study experimentally. We present a new experimental paradigm that overcomes this barrier. We briefly flash two dots, before and during the presentation of a natural image, and observers report whether the image regions near the two dots appear to belong to the same segment or to different segments. By repeatedly sampling multiple locations on the image, we then reconstruct a perceptual probabilistic segmentation map, namely the probability that each pixel belongs to each segment. Leveraging this method, we addressed two fundamental questions. First, strong spatial biases (a preference to group together items that are close in visual space) have been demonstrated with synthetic stimuli, but do they also operate in natural vision? Our data provide the first direct demonstration, albeit an unsurprising one, of spatial biases in human perceptual segmentation of natural images. The probability that participants reported two regions as grouped decreased with the distance between the two dots, regardless of whether the regions belonged to the same segment or to different segments in the perceptual segmentation maps. Second, is perceptual segmentation of natural images fast and parallel across the visual field, or a serial, time-consuming process? A prominent theory proposes that judging whether two regions are grouped requires a gradual spread of attention between them, and therefore takes longer at larger distances (e.g., Jeurissen et al., 2016, eLife). Surprisingly, reaction times in our task increased with distance when the two regions were judged to be in the same segment, consistent with the theory, but decreased with distance when they were judged to be in different segments. We show that a dynamic Bayesian ideal-observer model unifies these findings through the interaction between spatial biases and evidence accumulation.
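To illustrate how a perceptual probabilistic segmentation map could be reconstructed from pairwise same/different judgments, the sketch below tallies simulated responses into an empirical grouping-probability matrix and extracts soft segment assignments from its leading eigenvectors. This is a minimal illustration of the general idea, not the authors' estimator: the probe layout, response reliability, number of segments, and the spectral heuristic are all our assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: n_probes locations sampled on the image, with repeated
# same/different judgments collected for every pair of probe locations.
n_probes, n_segments, reps = 30, 3, 50
true_labels = rng.integers(n_segments, size=n_probes)  # stand-in ground truth

# Simulate noisy binary judgments and tally the empirical 'same' rate per pair.
p_same = np.zeros((n_probes, n_probes))
for i in range(n_probes):
    for j in range(n_probes):
        match = true_labels[i] == true_labels[j]
        p_report_same = 0.85 if match else 0.15   # hypothetical reliability
        p_same[i, j] = rng.binomial(reps, p_report_same) / reps
p_same = (p_same + p_same.T) / 2                  # symmetrize the estimates

# Soft segment assignments from the top eigenvectors of the grouping matrix
# (a crude spectral heuristic; the abstract does not specify the estimator).
vals, vecs = np.linalg.eigh(p_same)
weights = np.abs(vecs[:, -n_segments:])           # top-k spectral embedding
prob_map = weights / weights.sum(axis=1, keepdims=True)  # rows sum to 1
print(prob_map.round(2))  # per-probe probability of belonging to each segment
```

Interpolating these per-probe probabilities across pixels would then yield a map over the whole image.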
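The reported reaction-time dissociation is consistent with a bounded evidence-accumulation process whose starting point is shifted toward "same" by a distance-dependent spatial prior. Below is a minimal drift-diffusion sketch of that interaction; the functional form of the bias and every parameter value are our assumptions for illustration, not the authors' fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trial(distance, same_segment, drift=1.0, noise=1.0, bound=1.0,
                   bias=0.8, length_scale=4.0, dt=0.01, max_t=5.0):
    """One diffusion-to-bound trial: +bound = 'same', -bound = 'different'.

    The spatial bias enters as a starting point shifted toward the 'same'
    bound for nearby dot pairs, decaying with distance (all hypothetical).
    """
    x = bias * bound * np.exp(-distance / length_scale)  # prior-biased start
    mu = drift if same_segment else -drift               # evidence direction
    t = 0.0
    while abs(x) < bound and t < max_t:
        x += mu * dt + noise * np.sqrt(dt) * rng.standard_normal()
        t += dt
    return ("same" if x >= 0 else "different"), t  # sign breaks ties at max_t

# Mean reaction time, conditioned on the response, as a function of distance:
# 'same' responses slow down with distance (the start drifts away from the
# 'same' bound), while 'different' responses speed up, mirroring the data.
for d in (2.0, 6.0, 10.0):
    trials = [simulate_trial(d, same_segment=rng.random() < 0.5)
              for _ in range(2000)]
    for resp in ("same", "different"):
        rts = [t for r, t in trials if r == resp]
        print(f"distance={d:5.1f}  {resp:9s}  mean RT = {np.mean(rts):.2f} s")
```

The key design choice is that the prior moves only the starting point, so the same accumulation process yields opposite distance effects for the two responses.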

Acknowledgements: This research was supported by an NIH-ANR CRCNS grant (NIH-EY031166 to R.C.C. and ANR-19-NEUC-0003 to P.M.) and NIH grant P50 HD105352 (Support for the Rose F. Kennedy IDD Research Center, S.M.).