Stereo Vision Without 3D Geometry: Keeping Stereo Vision in Retinal (vs World) Coordinates

Poster Presentation 33.424: Sunday, May 17, 2026, 8:30 am – 12:30 pm, Pavilion
Session: 3D Shape and Space Perception: Miscellaneous

Paul Linton1; 1Columbia University

How we infer 3D geometry from multiple viewpoints has received renewed interest (Tsao & Tsao, 2022; O’Connell et al., 2025; Lee & Watrous et al., 2025). We formalize an alternative (‘minimal’) model of stereo vision, on which stereo vision isn’t about inferring 3D geometry but about eradicating rivalry relative to fixation (Linton, 2023). On this model, the three common stages of stereo analysis are a mistake:

1. RELATIVE DISPARITIES: Stereo depth nearer vs. further than fixation is a fundamentally different visual experience. It is therefore a mistake to eradicate fixation (to move from ‘absolute’ to ‘relative’ disparities), as much of the literature does. Depth from diplopia (Ziegler & Hess, 1997) is also hard to explain otherwise.

2. CORRESPONDENCE PROBLEM: The world doesn’t change with fixation, but our stereo percept does, leading to manifestly illogical outcomes. Cross-fusing two coins side by side yields a single fused central coin with two monocular flankers. This is hard to justify as a ‘solution’ to the correspondence problem. Instead, stereo vision appears to fuse whatever is on the fovea, with stereo matching (and depth) radiating outwards from that fixed starting point.

3. DISPARITY SCALING: The Linton Stereo Illusion (VSS Demo Night 2024; ECVP 2024) demonstrates that perceived stereo depth is not inferred from the underlying 3D geometry. Instead, stereo depth appears to be a linear function of disparity on the retina. We aim for a model that is independent of physical size and distance. Pilot data suggest z = 5.4x, where x is disparity and z is the x-axis angular size match of the perceived depth.

IMPLICATIONS: First, this approach provides a more plausible evolutionary account of stereo vision. Second, it suggests a mechanism consistent with early (V1) processing. Third, it transforms our understanding of visual space: from a fully articulated 3D space to fixation plus nearer/further than fixation.
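The contrast drawn in stage 3 can be illustrated numerically. Below is a minimal sketch, not from the poster itself: it compares the conventional ‘disparity scaling’ account (a standard small-angle geometric approximation, in which perceived depth depends on viewing distance squared) with the linear model suggested by the pilot data (z = 5.4x, independent of distance). Function names, units, and the default interocular distance of 6.5 cm are illustrative assumptions.

```python
import math

def geometric_depth_m(disparity_rad, distance_m, iod_m=0.065):
    """Conventional account (assumed small-angle approximation):
    depth in metres scales with viewing distance squared, so the
    same retinal disparity yields different depths at different distances."""
    return (distance_m ** 2 / iod_m) * disparity_rad

def minimal_depth_deg(disparity_deg, slope=5.4):
    """'Minimal' model sketched from the pilot data: perceived depth
    (expressed as an x-axis angular size match) is a linear function
    of retinal disparity alone, independent of physical distance."""
    return slope * disparity_deg

# Same 0.001 rad disparity at 1 m vs 2 m: the geometric account
# quadruples the depth estimate, the linear model does not change.
d1 = geometric_depth_m(0.001, 1.0)
d2 = geometric_depth_m(0.001, 2.0)
print(d2 / d1)                  # distance-dependent scaling factor
print(minimal_depth_deg(0.5))   # depends only on disparity
```

The point of the contrast is the one made in the abstract: the geometric account requires an estimate of viewing distance, while the linear model needs only the retinal signal.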

Acknowledgements: This research project and related results were made possible by the support of the NOMIS Foundation, through a ‘New Theory of Visual Experience’ grant to PL at the Italian Academy for Advanced Studies, Columbia University. Research conducted in the Visual Inference Lab, Zuckerman Mind Brain Behavior Institute, Columbia University.