3D: Disparity and shape

Talk Session: Monday, May 22, 2023, 10:45 am – 12:15 pm, Talk Room 2
Moderator: Fulvio Domini, Brown University

Talk 1, 10:45 am, 42.21

Warping a disparity field: cooperation between shading and disparity for sparsely defined surfaces

Celine Aubuchon1, Jovan Kemp1, Fulvio Domini1; 1Brown University

Binocular disparities allow us to perceive the 3D layout and structure of objects even in the absence of other cues. Remarkably, although the disparity field is generally sparse, continuous surfaces appear smooth. This suggests that the visual system uses some interpolation process to fill in the gaps of the discrete disparity input signal. Previous work has shown that continuous shading gradients may be leveraged in this process by cooperatively “filling in” unspecified gaps where no disparity exists in sparse disparity fields. We systematically tested observers’ ability to estimate the depth of unspecified regions of stereoscopically presented surfaces using a probe adjustment task. In this task, observers adjusted the binocular disparity of a small dot until it appeared to rest on the surface of a 3D cosine bump. Our aim was to evaluate whether conflicting shading information, though providing no disparity information, would influence how observers interpolate the surface within the unspecified regions. Importantly, we controlled what disparity information was available across surface conditions: disparity-only, shading-only, and combined disparity and shading. The disparity field in both the disparity-only and combined-cue conditions specified a 3D bump with its peak centered on the image plane. Two conflicting shading patterns were either presented alone (shading-only) or combined with the disparity surface (combined-cue): one generated from a cosine bump with its peak shifted upwards on the image plane, and one with its peak shifted downwards. This created regions of predicted positive and negative depth bias. Strikingly, we found highly reliable depth biases in the combined-cue stimuli that were predicted by the conflicting shading information and could not be explained by an independent combination of disparity and shading. Instead, we find evidence of cooperation between disparity and shading, in which smooth shading information effectively warps the interpolation of the disparity field.
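The geometry behind the probe task can be sketched in a few lines. The following is a minimal illustration (not the authors’ stimulus code): a raised-cosine bump defines depth as a function of image position, and the standard small-angle approximation relates a point’s depth to its relative binocular disparity. All parameter values (bump amplitude and radius, interocular distance, viewing distance) are hypothetical.

```python
import math

def cosine_bump_depth(x, y, amplitude=0.02, radius=0.05):
    """Depth (m) of a raised-cosine bump centered at the origin;
    zero outside the bump's radius. Peak depth equals `amplitude`."""
    r = math.hypot(x, y)
    if r >= radius:
        return 0.0
    return amplitude * 0.5 * (1.0 + math.cos(math.pi * r / radius))

def relative_disparity(z, ipd=0.065, viewing_distance=0.57):
    """Small-angle approximation: relative disparity (rad) of a point
    at depth z in front of the fixation plane at viewing_distance."""
    return ipd * z / viewing_distance ** 2
```

Adjusting the probe dot’s disparity until it matches `relative_disparity(cosine_bump_depth(x, y))` would place it exactly on the disparity-specified surface; the reported biases correspond to settings warped away from this prediction in the direction given by the conflicting shading.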

Talk 2, 11:00 am, 42.22

Disparity modulations from both fixational vergence and version contribute to stereopsis

Yuanhao H. Li1, Janis Intoy1, Jonathan D. Victor2, Michele Rucci1; 1University of Rochester, 2Weill Cornell Medical College

Humans extract depth information from differences between the images in the two eyes (binocular disparities), a process known as stereopsis. The visual system is highly sensitive to these differences, a remarkable accomplishment given that disparity signals change continually on the retina. This happens because of fixational eye movements (FEM), the relatively large instability of fixation that causes the two retinal images to move largely independently of each other. Previous work has suggested that, rather than being detrimental, FEM may actually contribute to stereopsis. Those experiments focused on the disparity modulations introduced by fixational vergence, which are particularly well suited for use by the visual system since they do not depend on the stimulus. Here we show that disparity modulations due to fixational version are also beneficial. In a forced-choice task, observers reported the vertical slant of a planar surface relative to the frontoparallel plane (top closer than bottom, or vice versa). Random-dot stereograms were examined through a stereoscope while eye movements were recorded at high resolution via Dual Purkinje Image eye tracking. A custom system for gaze-contingent control updated the display in real time to either counteract all FEM (retinal stabilization) or selectively eliminate the disparity modulations caused by fixational vergence or version. Results confirm that stereoscopic discrimination is impaired under retinal stabilization (d’ was reduced from an average of 2.8 to 1.4) and that fixational vergence alone is sufficient to re-establish normal performance. Critically, however, version alone is also sufficient to recover performance, as long as it introduces disparity modulations on the retina. Furthermore, we show that performance in all conditions is proportional to the strength of the disparity modulations caused by retinal image motion (p < 0.001), with stronger modulations resulting in higher sensitivity. These findings support the proposal that stereopsis relies on transient disparity signals caused by eye movements.
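The d’ values quoted above follow the standard signal-detection definition, d’ = z(H) − z(F), the difference of the inverse-normal-transformed hit and false-alarm rates. A minimal sketch (not the authors’ analysis code; the example rates are hypothetical):

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index from signal detection theory:
    d' = z(hit rate) - z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse standard-normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# A hypothetical hit rate of 0.92 with a false-alarm rate of 0.08
# gives d' of roughly 2.8, comparable to the unstabilized condition.
sensitivity = d_prime(0.92, 0.08)
```

In practice, extreme rates of 0 or 1 must be nudged away from the boundary (e.g., a log-linear correction) before applying the inverse CDF, since z diverges there.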

Acknowledgements: Research supported by NIH grants EY018363 (MR) and EY07977 (JV)

Talk 3, 11:15 am, 42.23

Stereoscopic slant contrast revisited

Clara Wang1, Yoel Yakobi1, Frederick Kingdom1; 1McGill University

The perceived slant (or inclination) of a slanted stereoscopic surface is affected by a surrounding stereo-slanted surface. Previous studies have shown that the effect is generally contrasting, that is, the perceived slant of the test stereo-surface is shifted away from that of the surrounding stereo-surface. However, previous studies have not established whether a surrounding stereo surface affects test surfaces slanted in the opposite direction, and have not measured the mutual contrasting effect between test and surround in order to determine their perceived angular difference. Using an adjustable matching slanted surface in a two-interval forced-choice procedure, observers matched the perceived slant of both a central test surface and its surround for a range of combinations of test and surround slants. For each combination of test and surround slants, two measures were calculated: (1) the perceived slant difference between the test and surround when the two were presented in isolation, and (2) the perceived slant difference between the test and surround when the two were presented in combination, i.e., when each was influencing the other. The difference between these two measures is termed here the mutual contrasting effect, or MCE. MCEs were plotted as a function of surround slant; as the test-surround slant difference increased from zero, there was a sharp rise in MCEs followed by various degrees of decline at larger test-surround slant differences. Importantly, MCEs were consistently observed with opposite signs of test and surround slant. Our findings suggest that positive and negative stereoscopic slants are encoded by a single bipolar mechanism, one subject to mutual inhibitory interactions between neighboring stereo-slanted surfaces that rise rapidly and decline gradually with the angular difference between test and surround.
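The MCE defined above is simple arithmetic on the two measures: the change in the perceived test-surround slant difference when the surfaces are shown together rather than in isolation. A minimal sketch (all slant values in degrees are hypothetical):

```python
def mutual_contrast_effect(test_iso, surround_iso, test_comb, surround_comb):
    """MCE (deg): how much the perceived test-surround slant difference
    grows when the two surfaces are presented together (in combination)
    rather than each in isolation."""
    diff_isolated = test_iso - surround_iso
    diff_combined = test_comb - surround_comb
    return diff_combined - diff_isolated

# Slant contrast repels test and surround from each other, so the
# combined difference exceeds the isolated one and the MCE is positive:
# e.g., isolated settings +10 and -10 deg, combined settings +14 and -13 deg.
mce = mutual_contrast_effect(10, -10, 14, -13)  # 27 - 20 = 7 deg
```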

Acknowledgements: Supported by Natural Science and Engineering Research Council of Canada grant #RGPIN-2016-03915 to FK and research bursaries from the McGill Faculty of Medicine and Health Sciences to CW and YY.

Talk 4, 11:30 am, 42.24

Cardinal viewpoints of 3D objects predicted by 2D optical flow model

Emma E.M. Stewart1, Roland W. Fleming1,3, Alexander C. Schütz2,3; 1Justus-Liebig University Giessen, Germany, 2University of Marburg, Germany, 3Center for Mind, Brain and Behavior, Universities of Marburg and Giessen, Germany

Humans have the remarkable ability to determine which viewpoints of 3D objects are qualitatively distinct or special. Seeing an object from such viewpoints can benefit object recognition and recall. In particular, the front, back and side views—sometimes referred to as cardinal viewpoints—are often particularly informative, yet there is a lack of formal, quantitative models to predict which viewpoints are qualitatively distinct and explain what makes them special. Here we tested whether a 2D optical-flow model can predict the cardinal viewpoints of 3D objects by comparing its predictions to human discrimination judgements. We tested human discrimination performance at cardinal (front, back) and non-cardinal viewpoints of 35 familiar objects. Participants were shown a base (cardinal or non-cardinal) viewpoint alongside a rotated viewpoint (rotated by 0, 5, 10, or 15 degrees), and indicated whether the two views were the same or different. For n = 100 participants, we found a marked benefit in viewpoint discrimination when the base viewpoint was cardinal, with some variability between objects. We reasoned that a 2D optical-flow model that can predict human viewpoint dissimilarity judgements (Stewart et al., 2022) may also explain the variability in these data. We found that the model could explain human discrimination performance, and could differentiate cardinal from non-cardinal viewpoints. To verify that these findings generalize to non-familiar objects—for which recognizable indicators of the front and back (e.g., face, tail) are absent—we created 10 novel 3D objects, and participants (n = 50) indicated the “front” viewpoint. The model could predict which viewpoints were most likely to be chosen as the front, even with unfamiliar objects. This study shows that a 2D model can predict the cardinal viewpoints of a 3D object, and explain variance in human viewpoint discrimination performance at cardinal and non-cardinal viewpoints. This provides a quantitative method to define qualitatively special viewpoints of 3D objects.
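The core intuition—that the amount of 2D image motion produced by a small rotation carries information about a viewpoint—can be illustrated with a toy proxy. This is not the Stewart et al. (2022) model, just a hypothetical sketch: rotate a cloud of 3D surface points about the vertical axis, project orthographically, and average the per-point image displacement.

```python
import numpy as np

def projected_flow(points, angle_deg):
    """Mean 2D image displacement of (N, 3) points under a rotation of
    angle_deg about the vertical (y) axis, with orthographic projection
    onto the xy image plane. A toy proxy for the optical-flow magnitude
    between two nearby viewpoints of an object."""
    a = np.radians(angle_deg)
    rot = np.array([[ np.cos(a), 0.0, np.sin(a)],
                    [ 0.0,       1.0, 0.0      ],
                    [-np.sin(a), 0.0, np.cos(a)]])
    rotated = points @ rot.T
    flow = rotated[:, :2] - points[:, :2]   # per-point 2D displacement
    return np.linalg.norm(flow, axis=1).mean()

# Hypothetical object: a random point cloud standing in for sampled
# surface points; flow grows with the rotation between viewpoints.
pts = np.random.default_rng(1).normal(size=(100, 3))
flow_5 = projected_flow(pts, 5)
flow_10 = projected_flow(pts, 10)
```

A model of this family would compare flow fields (direction as well as magnitude) across viewpoints; cardinal viewpoints are the ones where the flow pattern changes most distinctively with small rotations.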

Acknowledgements: This project was supported by the Deutsche Forschungsgemeinschaft, through project numbers 460533638 and 222641018–SFB/TRR-135 TP C1, and by the Research Cluster “The Adaptive Mind”, funded by the Hessian Ministry for Higher Education, Research, Science, and the Arts.

Talk 5, 11:45 am, 42.25

Humans and 3D neural field models make similar 3D shape judgements

Thomas OConnell1, Tyler Bonnen2, Yoni Friedman1, Ayush Tewari1, Josh Tenenbaum1, Vincent Sitzmann1, Nancy Kanwisher1; 1MIT, 2Stanford University

Human visual perception captures the 3D shape of objects. While convolutional neural networks (CNNs) capture aspects of human visual processing, there is a well-documented gap in performance between CNNs and humans on shape processing tasks. A new deep learning approach, 3D neural fields (3D-NFs), has driven remarkable recent progress in 3D computer vision. 3D-NFs encode the geometry of objects in a coordinate-based representation (e.g., input: an xyz coordinate; output: volume density and RGB at that position). Here, we investigate whether humans and 3D-NFs display similar behavior on 3D match-to-sample tasks. In each trial, a participant sees a rendered sample image of a manmade object, then matches it to a target image of the same object from a different viewpoint versus a lure image of a different object. We trained 3D-NFs that take an image as input, then output a rendered image of the depicted object from a new viewpoint (multi-view loss). A trial is correct if the 3D-NFs computed from the sample and target images are more similar than the 3D-NFs computed from the sample and lure images. In Experiment 1 (n=120), 3D-NF behavior is more similar to human behavior than standard object-recognition CNNs, regardless of whether lure objects were from (a) a different category than the target, (b) the same category as the target, or (c) matched to have the 3D-NF most similar to the target’s. In Experiment 2 (n=200), we create 5 difficulty conditions using 25 CNNs. Again, we find remarkable agreement between the 3D-NFs and human behavior, with both largely unaffected by the CNN-defined conditions. In Experiment 3 (n=200), we replicate Experiment 2 using algorithmically generated shapes with no category structure. Overall, 3D-NFs and humans show similar patterns of behavior for 3D shape judgements, suggesting 3D-NFs as a promising framework for investigating human 3D shape perception.
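The coordinate-based representation at the heart of a 3D-NF can be made concrete with a toy sketch: a small multilayer perceptron mapping an xyz coordinate to a nonnegative volume density and an RGB color. This is a randomly initialized stand-in, not the authors’ trained, image-conditioned model; the layer sizes and activations are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_field(hidden=64):
    """Randomly initialized coordinate MLP: (x, y, z) -> (density, r, g, b).
    A minimal stand-in for a trained 3D neural field."""
    return {
        "w1": rng.normal(0.0, 0.5, (3, hidden)),
        "b1": np.zeros(hidden),
        "w2": rng.normal(0.0, 0.5, (hidden, 4)),
        "b2": np.zeros(4),
    }

def query_field(params, xyz):
    """Evaluate the field at an (N, 3) array of coordinates,
    returning (N, 1) densities and (N, 3) RGB values."""
    h = np.maximum(0.0, xyz @ params["w1"] + params["b1"])  # ReLU layer
    out = h @ params["w2"] + params["b2"]
    density = np.log1p(np.exp(out[:, :1]))    # softplus -> nonnegative
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))   # sigmoid -> [0, 1]
    return density, rgb
```

Rendering integrates such queries along camera rays (volume rendering); the match-to-sample comparison in the abstract instead compares the field representations inferred from different images.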

Talk 6, 12:00 pm, 42.26

SimpleXR: An open-source Unity toolbox for simplifying vision research using augmented and virtual reality

Justin Kasowski1, Michael Beyeler1; 1University of California, Santa Barbara

Extended reality (XR) is a powerful tool for human behavioral research. The ability to create 3D visual scenes and measure responses to arbitrary visual stimuli enables the behavioral researcher to test hypotheses in a well-controlled environment. However, software packages such as SteamVR, OpenXR, and ARKit have been developed for game designers rather than behavioral researchers. While Unity is considered the most beginner-friendly platform, barriers still exist for inexperienced programmers. Toolboxes such as VREX and USE have focused on simplifying experimental design and remote data collection, but no tools currently exist to help with all aspects of an experiment. To address this challenge, we have developed SimpleXR (sXR), an open-source Unity package that allows researchers to create complex experiments with relatively little code. The toolbox contains a plethora of tools that are particularly useful for the visual sciences, such as creating dynamic scenes, randomizing object locations, accessing eye-tracker data, and applying full-screen shader effects (e.g., blurring, gaze-contingent scotomas, edge detection) either in virtual reality (VR) or to the pass-through camera for augmented reality (AR) tasks. sXR also provides one-line commands for interacting with virtual objects, displaying stimuli and instructions, using timers, and much more. Additionally, it automatically switches between desktop and immersive VR modes. sXR creates separate user interfaces for the experimenter and participant, allowing the experimenter to track performance and monitor for anomalies. By using Unity’s Universal Render Pipeline, sXR allows researchers to develop across platforms, including VR headsets, AR glasses, and smartphones. sXR is freely available at github.com/unity-sXR/sXR.