Exposure to Covarying Visual Depth Cues Across Gaze Distances Promotes Generalized Stereoscopic Processing in Deep Neural Networks

Poster Presentation 23.411: Saturday, May 16, 2026, 8:30 am – 12:30 pm, Pavilion
Session: 3D Shape and Space Perception: Cues, integration

Joonsik Moon1, Peter Bex1; 1Northeastern University

When shifting gaze between far and near distances, vergence eye movements rotate the eyes to fixate in 3D space, changing the low-level disparity statistics of the retinal images. Crucially, near versus far fixation alters the distributions of crossed and uncrossed binocular disparities and their covariation with other spatial information (e.g., spatial frequency). We hypothesized that exposure to these naturally covarying statistics may be essential for the visual system to learn generalized, robust stereoscopic processing, an aspect often overlooked in standard computational models. We trained Deep Neural Networks on stereoscopic image datasets simulating "Near" and "Far" fixations, designed to capture the distance-dependent covariance of disparity range and spatial information. We systematically manipulated training composition by varying the Far-to-Near fixation ratio and evaluated depth-discrimination performance on independent test sets. Networks trained on mixed fixation-distance datasets achieved higher depth-estimation accuracy than homogeneous models (trained on 100% Near or 100% Far fixation), even when tested on their own trained domain. Mixed training also induced a significantly higher standard deviation in the first convolutional layer's filter weights, suggesting that exposure to broader visual cue statistics prevents overfitting to the intrinsic statistics of a specific fixation-distance range. Instead, the system appears to develop efficient coding mechanisms that extract cues remaining consistent across variations in gaze geometry. Visual processing mechanisms are thus shaped by the statistical structure of the natural environment and by binocular viewing geometry. These results suggest that the joint statistics of disparity and spatial cues across varying gaze distances are critical for constructing an efficient and robust stereoscopic system.
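The two manipulations described above, composing a training set with a given Far-to-Near ratio and summarizing first-layer filter dispersion by its standard deviation, can be sketched as follows. This is a minimal illustrative sketch using synthetic numpy arrays; the dataset shapes, helper names, and sample counts are assumptions, not the authors' actual pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for stereo image pairs: arrays of shape
# (n_samples, 2, H, W) holding (left, right) views. Shapes and counts
# are illustrative only.
near_pairs = rng.standard_normal((800, 2, 32, 32))
far_pairs = rng.standard_normal((800, 2, 32, 32))

def make_mixed_dataset(near, far, far_fraction, n_total, rng):
    """Draw a training set whose Far:Near composition is set by far_fraction."""
    n_far = int(round(far_fraction * n_total))
    n_near = n_total - n_far
    far_idx = rng.choice(len(far), size=n_far, replace=False)
    near_idx = rng.choice(len(near), size=n_near, replace=False)
    mixed = np.concatenate([far[far_idx], near[near_idx]], axis=0)
    return rng.permutation(mixed)  # shuffle samples along the first axis

def conv1_weight_std(weights):
    """Std of first-layer filter weights: the dispersion metric compared
    between mixed and homogeneous training regimes."""
    return float(np.std(weights))

# Example: a 50/50 Far/Near training set of 600 stereo pairs.
mixed = make_mixed_dataset(near_pairs, far_pairs, 0.5, 600, rng)
print(mixed.shape)  # (600, 2, 32, 32)

# Hypothetical first conv layer weights: (out_channels, in_channels, kH, kW).
conv1 = rng.standard_normal((16, 2, 5, 5))
print(round(conv1_weight_std(conv1), 3))
```

Sweeping `far_fraction` over, e.g., 0.0, 0.25, 0.5, 0.75, and 1.0 and retraining at each setting would reproduce the kind of composition manipulation the abstract describes.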

Acknowledgements: This research was supported by NIH R01EY032162.