Exposure to Covarying Visual Depth Cues Across Gaze Distances Promotes Generalized Stereoscopic Processing in Deep Neural Networks

Poster Presentation 23.411: Saturday, May 16, 2026, 8:30 am – 12:30 pm, Pavilion
Session: 3D Shape and Space Perception: Cues, integration

Joonsik Moon1, Peter Bex1; 1Northeastern University

When shifting gaze between far and near distances, vergence eye movements rotate the eyes to fixate in 3D space, changing the low-level disparity statistics of the retinal images. Crucially, near versus far fixation alters the distributions of crossed and uncrossed binocular disparities and their covariation with other spatial information (e.g., spatial frequency). We hypothesized that exposure to these naturally covarying statistics may be essential for the visual system to learn generalized, robust stereoscopic processing, an aspect often overlooked in standard computational models. We trained Deep Neural Networks on stereoscopic image datasets simulating "Near" and "Far" fixations, designed to capture the distance-dependent covariance of disparity range and spatial information. We systematically manipulated training composition by varying the Far-to-Near fixation ratio and evaluated depth-discrimination performance on independent test sets. Networks trained on mixed fixation-distance datasets achieved higher depth-estimation accuracy than homogeneous models (trained on 100% Near or 100% Far fixation), even when tested on their own trained domain. Mixed training also induced a significantly higher standard deviation in the first convolutional layer's filter weights, suggesting that exposure to broader visual cue statistics prevents overfitting to the intrinsic statistics of a specific fixation-distance range. Instead, the system appears to develop efficient coding mechanisms that extract cues remaining consistent across variations in gaze geometry. Visual processing mechanisms are thus shaped by the statistical structure of the natural environment and by binocular viewing geometry. These results suggest that the joint statistics of disparity and spatial cues across varying gaze distances are critical for constructing an efficient and robust stereoscopic system.
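The two manipulations described above, composing a training set with a given Far-to-Near ratio and summarizing first-layer filter dispersion by its standard deviation, can be sketched as follows. This is a minimal illustrative sketch using synthetic numpy arrays; the dataset shapes, helper names, and sample counts are assumptions, not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for stereo image pairs: arrays of shape
# (n_samples, 2, H, W) holding (left, right) views. Shapes and counts
# are illustrative only.
near_pairs = rng.standard_normal((800, 2, 32, 32))
far_pairs = rng.standard_normal((800, 2, 32, 32))

def make_mixed_dataset(near, far, far_fraction, n_total, rng):
    """Draw a training set whose Far:Near composition is set by far_fraction."""
    n_far = int(round(far_fraction * n_total))
    n_near = n_total - n_far
    far_idx = rng.choice(len(far), size=n_far, replace=False)
    near_idx = rng.choice(len(near), size=n_near, replace=False)
    mixed = np.concatenate([far[far_idx], near[near_idx]], axis=0)
    return rng.permutation(mixed)  # shuffle samples along the first axis

def conv1_weight_std(weights):
    """Std of first-layer filter weights: the dispersion metric compared
    between mixed and homogeneous training regimes."""
    return float(np.std(weights))

# Example: a 50/50 Far/Near training set of 600 stereo pairs.
mixed = make_mixed_dataset(near_pairs, far_pairs, 0.5, 600, rng)
print(mixed.shape)  # (600, 2, 32, 32)

# Hypothetical first conv layer weights: (out_channels, in_channels, kH, kW).
conv1 = rng.standard_normal((16, 2, 5, 5))
print(round(conv1_weight_std(conv1), 3))
```

Sweeping `far_fraction` over, e.g., 0.0, 0.25, 0.5, 0.75, and 1.0 and retraining at each setting would reproduce the kind of composition manipulation the abstract describes.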

Acknowledgements: This research was supported by NIH R01EY032162.