How to build a scene: Relational representations are constructed in a canonical order

Poster Presentation 43.319: Monday, May 22, 2023, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Scene Perception: Spatiotemporal factors

Zekun Sun¹, Chaz Firestone¹, Alon Hafri²; ¹Johns Hopkins University, ²University of Delaware

The world contains not only objects and features (e.g., glass vases, wooden tables), but also relations holding between them (e.g., glass vases *on* wooden tables). How does the mind combine discrete elements when constructing relational representations from visual scenes? Here, four experiments test the intriguing possibility that the mind builds relational representations according to the ‘roles’ of the participating objects. Taking inspiration from psycholinguistics, we hypothesize that ‘reference’ objects (those that are large, stable, and/or physically ‘control’ other objects; e.g., tables), rather than ‘figure’ objects (e.g., vases), serve as the scaffold for relational representations. Using a ‘drag-and-drop’ task, we found that when participants positioned items to compose scenes from linguistic descriptions (e.g., “the vase is on the table”), they consistently placed the reference object first (i.e., first fixing the table, then placing the vase on top of it). We next explored whether this pattern extends beyond such placement preferences and shapes visual processing itself. In a recognition task, participants quickly verified whether a description (e.g., “the vase is on the table”) matched a subsequently presented visual scene. Crucially, on some trials the reference object (e.g., the table) appeared just before the figure object (the vase), and on others the order was reversed. We found a ‘reference-object advantage’: participants were faster to respond correctly when the reference object appeared before the figure object than vice versa. Notably, this effect arose regardless of the order in which the objects were mentioned in the linguistic descriptions. It could not be explained by differences in size or shape, since the same effect arose in an experiment using identical objects differing only in color (e.g., a red book on a blue book).
We suggest that the mind employs a sequential routine for building relational representations from visual scenes, respecting the role that each element plays in the relation.

Acknowledgements: This work is supported by NSF BCS #2021053 awarded to C.F., and NSF SBE #2105228 awarded to A.H.