Object and scene contributions to neural representations in natural images

Poster Presentation 33.332: Sunday, May 17, 2026, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Scene Perception: Neural mechanisms

Suhyun Kim1, Hojin Jang1; 1Korea University, Seoul, South Korea

Object and scene understanding are fundamental to human vision. Substantial evidence from cognitive neuroscience shows that objects and scenes are processed in partially distinct regions of visual cortex, suggesting that they rely on different underlying representational mechanisms. Yet real-world environments always contain objects embedded within scenes, and a growing body of work reveals systematic, bidirectional influences between scene context and object processing, hinting at compositional integration rather than strict modularity. In this exploratory study, we investigate how object-derived and scene-derived information jointly contribute to neural representations of complex natural images. Leveraging the large-scale Natural Scenes Dataset (NSD), we ask whether brain activity patterns are better accounted for by an object-based representational strategy, defined by constituent objects and their relational structure, or by a gist-based strategy that rapidly encodes global layout and scene-level structure. Applying representational similarity analysis with each theoretical model across high-level visual cortex, we find that object-based features more strongly capture the response structure of object-selective regions (LOC, FFA, EBA), whereas the two strategies perform comparably in scene-selective regions (PPA, OPA), suggesting complementary contributions to scene understanding. We further compare human neural representational structure to that of convolutional neural networks sharing an identical architecture but trained to perform either object or scene categorization. Interestingly, relative to object-trained models, scene-trained models exhibit lower similarity to representations derived from the gist-based strategy and higher similarity to those derived from the object-based strategy. Together, these preliminary findings suggest that scene-selective cortical regions encode natural scenes by integrating object structure and global layout. Furthermore, scene-trained convolutional networks, despite being optimized for scene categorization, appear to rely on object-based information for scene representation.
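Methods note: the sketch below illustrates the general logic of representational similarity analysis used here, in which a representational dissimilarity matrix (RDM) computed from an ROI's voxel patterns is compared against RDMs computed from each theoretical model's features. It is a minimal illustration with random placeholder data, not the authors' actual pipeline; the variable names (voxel_patterns, object_features, gist_features) and the choice of correlation distance and Spearman comparison are illustrative assumptions.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(features):
    """Condensed representational dissimilarity matrix:
    1 - Pearson correlation between patterns for every image pair."""
    return pdist(features, metric="correlation")

# Hypothetical inputs: rows are images, columns are voxels or model features.
rng = np.random.default_rng(0)
n_images = 100
voxel_patterns = rng.standard_normal((n_images, 500))   # e.g., responses in one ROI such as PPA
object_features = rng.standard_normal((n_images, 300))  # object-based model features
gist_features = rng.standard_normal((n_images, 300))    # gist/layout model features

neural_rdm = rdm(voxel_patterns)

# Compare each model RDM to the neural RDM with a rank-based (Spearman) correlation.
for name, feats in [("object-based", object_features), ("gist-based", gist_features)]:
    rho, _ = spearmanr(rdm(feats), neural_rdm)
    print(f"{name} model vs ROI: Spearman rho = {rho:.3f}")
```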

Acknowledgements: This work was supported by the National Research Foundation of Korea grant funded by the Korea government (RS-2024-00451866).