Synthetic Scene Generation for Evaluating Visual Feature Contributions to Segmentation Decisions

Poster Presentation 56.470: Tuesday, May 19, 2026, 2:45 – 6:45 pm, Pavilion
Session: Multisensory Processing: Audiovisual

Joshua M. Martin1, Thomas S. A. Wallis; 1Centre for Cognitive Science and Institute for Psychology, Technical University of Darmstadt; 2Center for Mind, Brain and Behavior (CMBB), Universities of Marburg, Giessen and Darmstadt.

Neural network models have been widely applied to study the processes underlying high-level object recognition; however, mid-level visual processes, such as segmentation and grouping, remain relatively understudied. One reason is the lack of suitable datasets: natural images are difficult to annotate manually and offer limited experimental control over the availability of visual cues. This challenge is well suited to synthetic data generation using physically based rendering, which can produce fully customizable 3D scenes while preserving key aspects of visual realism through physically accurate light transport. Here, we present a Blender-based rendering pipeline for quantifying how different visual features contribute to segmentation decisions. The pipeline draws inspiration from two main paradigms: (1) digital embryos (complex 3D geometric shapes based on simulations of embryonic development), and (2) dead leaves (scenes created by sequentially layering shapes to reproduce key statistical properties of natural images). The resulting scenes are well suited to studying segmentation processes: they contain complex 3D geometric shapes while approximating natural scenes through the presence of depth cues and partial occlusions. A central advantage of our approach over existing rendering pipelines is precise control over the availability of visual cues. By independently manipulating features such as material, lighting, geometry, and motion, experimenters can generate scenes that are matched in their overall configuration but differ only in whether one or more cues are informative for a segmentation decision. Furthermore, because the pipeline is scalable and automated, it can generate large, varied datasets with pixel-accurate ground-truth segmentation masks, making it an ideal testbed for training and testing neural network models.
Overall, this approach combines the benefits of controlled psychophysics-style stimulus design and large-scale computer vision frameworks, offering a new method for studying and comparing mid-level visual processing in biological and artificial systems.
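To illustrate the core idea behind the dead-leaves paradigm and the pixel-accurate ground truth it affords, the sketch below is a minimal 2D toy version in pure Python. It is not the actual Blender pipeline (which operates on 3D geometry, materials, and light transport); the function name and parameters are hypothetical. Disks are layered back-to-front so that later disks occlude earlier ones, and the per-pixel label map doubles as a ground-truth segmentation mask.

```python
# Toy 2D "dead leaves" sketch (hypothetical, for illustration only):
# shapes are layered sequentially, later shapes occlude earlier ones,
# and every pixel's label provides a ground-truth segmentation mask.
import random

def dead_leaves_mask(width, height, n_disks, seed=0):
    """Return a label image: mask[y][x] = index of the topmost disk
    covering that pixel (0 = background)."""
    rng = random.Random(seed)
    mask = [[0] * width for _ in range(height)]
    for label in range(1, n_disks + 1):
        # Random disk position and size.
        cx, cy = rng.uniform(0, width), rng.uniform(0, height)
        r = rng.uniform(4, min(width, height) / 3)
        for y in range(height):
            for x in range(width):
                if (x - cx) ** 2 + (y - cy) ** 2 <= r ** 2:
                    # Later disks overwrite (occlude) earlier ones.
                    mask[y][x] = label
    return mask

mask = dead_leaves_mask(64, 64, n_disks=12, seed=42)
labels = {v for row in mask for v in row}
print(f"{len(labels)} distinct segment labels in the mask")
```

In the full 3D pipeline described above, the analogous ground truth comes for free from the renderer, since the scene graph records which object is frontmost at each pixel.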

Acknowledgements: Funded by the European Union (ERC, SEGMENT, 101086774). Views and opinions expressed are, however, those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.