Which Visual Features Shape the Representational Geometry for Prediction?

Poster Presentation 53.427: Tuesday, May 19, 2026, 8:30 am – 12:30 pm, Pavilion
Session: Temporal Processing: Neural mechanisms, models

Jiaming Xu1, Sai Prashanth Raja Sundaram1, Youjin Oh1, Mary Hayhoe1, Robbe L. T. Goris1; 1University of Texas at Austin

Under natural circumstances, visual inputs have a rich and complex statistical structure. Our visual system must exploit this structure to support key tasks such as temporal prediction—the ability to anticipate upcoming states of the environment. Recent work has argued that this ability is rooted in an information processing strategy known as temporal straightening. From a geometrical perspective, visual prediction is difficult because the stream of images on the retina evolves along irregular, complex temporal trajectories. The temporal straightening hypothesis proposes that the visual system exploits natural statistical regularities to transform this input into representations that follow straighter temporal trajectories, thereby facilitating prediction through simple linear extrapolation. In natural vision, these statistical regularities are grounded in the spatial organization of the scene, with predictive structure arising from the distribution of luminance, chromatic, and binocular information. Here, we ask which of these information sources are essential for temporal prediction and how the geometry of the representational trajectory changes when specific cues are removed. We hypothesize that selectively removing individual information sources diminishes the statistical specification of the scene and progressively impairs predictive processing, as reflected by a reduction in temporal straightening. We generated stimuli from egocentric videos recorded during locomotion by reconstructing environments in 3D using Gaussian Splatting and replaying the original camera path in Virtual Reality. For each video, we created three conditions: fully naturalistic, color removed, and both color and stereo removed. We used an AXB discrimination task to measure perceptual distances across all frame pairs and computed perceptual curvature from these data. Our initial results (two environments, three conditions, one subject) suggest that the impact of removing chromatic or binocular information depends on scene structure, indicating that the neural computations underlying temporal straightening flexibly adapt to environmental statistics to support context-dependent temporal prediction.
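For readers unfamiliar with the curvature measure mentioned above, the sketch below illustrates one standard way it is computed in the temporal-straightening literature: given a trajectory of points (here assumed to be obtained by embedding the AXB-derived perceptual distances into coordinates, for example via multidimensional scaling, a step the abstract does not spell out), the curvature at each frame is the angle between successive displacement vectors, and the global value is their average. The function name and the embedding step are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np

def mean_curvature_deg(trajectory):
    """Average curvature (in degrees) of a temporal trajectory.

    trajectory : (n_frames, n_dims) array of points, e.g. an MDS embedding
    of the perceptual distances estimated from the AXB task (an assumed
    preprocessing step, not specified in the abstract).
    """
    diffs = np.diff(trajectory, axis=0)                 # displacement vectors v_t = x_{t+1} - x_t
    unit = diffs / np.linalg.norm(diffs, axis=1, keepdims=True)
    cosines = np.sum(unit[:-1] * unit[1:], axis=1)      # cos(angle) between v_t and v_{t+1}
    angles = np.degrees(np.arccos(np.clip(cosines, -1.0, 1.0)))
    return angles.mean()

# Sanity checks: a straight-line trajectory has ~0 deg mean curvature,
# while a random high-dimensional walk averages close to 90 deg.
line = np.outer(np.arange(10), np.ones(5))
walk = np.cumsum(np.random.randn(10, 50), axis=0)
print(mean_curvature_deg(line), mean_curvature_deg(walk))
```

Under this measure, straighter trajectories (lower mean curvature) are the geometric signature of predictive processing, so comparing the fully naturalistic, color-removed, and color-and-stereo-removed conditions amounts to comparing their curvature values.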