VSS, May 13-18

3D Perception

Talk Session: Saturday, May 14, 2022, 2:30 – 4:15 pm EDT, Talk Room 1
Moderator: Fulvio Domini, Brown University

Talk 1, 2:30 pm, 24.11

Compositional and texture-independent 3D orientation coding in visual cortex

Judith Hoeller1, Michalis Michaelos1, Marius Pachitariu1, Sandro Romani1; 1HHMI Janelia

To facilitate reasoning about 3D visual scenes, our brain must convert complex visual inputs into simpler representations. While this conversion has been studied with psychophysics in humans, the underlying neural representations are poorly understood. Here we study how neural representations change as visual scenes are transformed by 3D rotations. Like images, representations at the level of the retina are “compositional”: the change in the representation due to one transformation plus the change due to another transformation equals the change due to the combined transformation. We hypothesize that representations in the visual cortex are also compositional and texture-independent. Based on this hypothesis, we construct a geometric model that takes as input neural responses to planar surfaces rotated in 3D and predicts the changes in neural activity for multiples of those rotations. To test our hypothesis, we imaged the activity of more than 60,000 neurons in the visual cortex while a mouse passively viewed planar textures at different 3D orientations projected onto a monitor. Even though our textures differ widely from each other (e.g., in spatial frequency, “natural-ness”, etc.), we find that 3D orientation accounts for a large fraction of the variance in neural activity. In a dimensionally reduced subspace of neural activity that contains most of the variance, 3D orientation can be decoded linearly within the range of values we tested. Finally, the geometric model fits our data well. These findings show that the mouse visual cortex encodes the 3D orientation of planar surfaces in a compositional and texture-independent manner. We conclude that mice are a promising animal model for studying 3D vision, and that our theory provides a step towards understanding how visual scenes are encoded by the brain.
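
As a minimal illustration of the compositionality property described above, the following sketch (our own, not the authors' model; the linear code W, neuron count, and rotation axis are assumptions) shows how activity changes that add across rotations support linear decoding of 3D orientation:

```python
# A minimal sketch (not the authors' model): a "compositional" code assigns to each
# 3D rotation a change in population activity d(R) such that, when two rotations are
# applied in sequence, d(R2 R1) = d(R2) + d(R1). One simple code with this property,
# for rotations about a common axis, is a linear readout of the rotation vector
# (axis * angle), since such rotation vectors add.
import numpy as np

rng = np.random.default_rng(0)
n_neurons = 50
W = rng.standard_normal((n_neurons, 3))        # hypothetical linear code

def activity_change(rotvec):
    """Population activity change produced by a rotation (rotation-vector form)."""
    return W @ rotvec

axis = np.array([0.0, 1.0, 0.0])               # rotations about a common vertical axis
d1 = activity_change(np.deg2rad(20.0) * axis)
d2 = activity_change(np.deg2rad(35.0) * axis)
d_combined = activity_change(np.deg2rad(55.0) * axis)

# Compositionality: the change for the combined rotation equals the sum of the changes.
assert np.allclose(d1 + d2, d_combined)

# 3D orientation can then be decoded linearly from the activity change (least squares).
decoded = np.linalg.lstsq(W, d_combined, rcond=None)[0]
print(np.rad2deg(decoded))                     # ~ [0, 55, 0] degrees
```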

Talk 2, 2:45 pm, 24.12

Searching for hidden objects in 3D environments

Erwan David1, Melissa L.-H. Võ1; 1Scene Grammar Lab, Goethe University Frankfurt

The vast majority of visual search paradigms use targets that are directly visible. In natural conditions, however, the target could be stored inside a piece of furniture. Does knowing that a target may be hidden inside another object (e.g., a drawer, box, or oven) affect search behavior? To answer this question, we created 28 virtual scenes with interactable furniture, so that, for instance, one could open a refrigerator or a cupboard to inspect its contents. Participants wore a virtual reality (VR) headset and searched for objects in 3 blocks of 27 trials (40 s max. per trial). In the first block, targets were always visible, while in blocks 2 and 3 targets could be either hidden inside objects or visible. Participants did not know they could use the VR controller to interact with the scene until the start of block 2. We hypothesized that knowing that objects could be hidden would alter participants’ search strategies, for instance slowing down search due to the increased search space. However, compared to trials in block 1, our analyses show that participants were no less efficient at searching for visible targets when those targets could potentially have been hidden. In blocks 1 and 2, we report a decrease in average saccade amplitudes for visible and hidden targets, which we ascribe to the exploration of containers. As expected, searching for targets that were actually hidden increased scanning times and scanpath lengths, but had no effect on search initiation, target verification times, or search success. In sum, knowledge about hidden objects did not significantly alter search behavior for visible objects in 3D environments. This study is the first in a series of hidden-object paradigms in VR. Future experiments will explicitly investigate the role of scene grammar in object search efficiency.

Acknowledgements: This work was supported by SFB/TRR 26 135 project C7 to Melissa L.-H. Võ and the Hessisches Ministerium für Wissenschaft und Kunst (HMWK; project ‘The Adaptive Mind’).

Talk 3, 3:00 pm, 24.13

Degraded disparity signal reduces magnitude but not precision of depth estimates

Ailin Deng1, Fulvio Domini1; 1Brown University

With a weakened binocular disparity signal for depth (e.g., when the viewing distance of a target object is increased), traditional probabilistic models of depth perception predict an increase in estimation noise. An alternative theory of deterministic depth perception, termed Intrinsic Constraint, instead predicts a decrease in the slope of the function relating distal shape to estimated depth (the gain), while estimation noise remains unchanged. Here, we investigated the relationship between the strength of the disparity signal and estimated depth by modulating the dot brightness of random-dot stereograms (RDSs). In the first experiment, participants viewed a cylindrical curved surface defined by an RDS and adjusted a 2-dimensional curved probe to match the surface profile. The stimuli varied along three within-subject factors: simulated depth (six levels), orientation (horizontal or vertical), and dot brightness (bright or dim). Results indicated that a surface defined by a dimmer RDS was perceived as shallower. More importantly, the gain for the dimmer RDS also decreased while response variability remained the same. In other words, participants were less sensitive to changes in depth while retaining the same level of precision. In a set of follow-up 2IFC tasks, we again found evidence that dimmer RDSs yielded decreased depth estimates without systematic increases in Just-Noticeable Differences. Taken together, the results from both experiments indicate that the strength of the disparity signal modulates the magnitude of estimated depth but not the estimation noise. These findings support deterministic depth perception, in which the strength of the depth signal maps directly onto the magnitude of perceived depth. More generally, our results indicate that care should be taken when selecting RDS properties such as dot brightness to prevent unintended biases in perceived shape.
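
The contrast between the two predictions can be made concrete with a toy simulation (made-up parameters, not the authors' data or analysis):

```python
# A toy simulation contrasting the two accounts of a weakened disparity signal:
# a probabilistic account keeps the gain but inflates estimation noise, whereas the
# Intrinsic Constraint account lowers the gain while leaving the noise unchanged.
import numpy as np

rng = np.random.default_rng(1)
simulated_depth = np.repeat(np.linspace(5, 30, 6), 200)   # six simulated depths (mm)

def depth_estimates(gain, sigma):
    return gain * simulated_depth + rng.normal(0, sigma, simulated_depth.size)

conditions = {
    "strong signal":                    depth_estimates(gain=1.0, sigma=2.0),
    "weak signal (probabilistic)":      depth_estimates(gain=1.0, sigma=4.0),  # more noise
    "weak signal (Intrinsic Constraint)": depth_estimates(gain=0.6, sigma=2.0),  # lower gain
}

for name, est in conditions.items():
    slope, intercept = np.polyfit(simulated_depth, est, 1)
    resid_sd = np.std(est - (slope * simulated_depth + intercept))
    print(f"{name:36s} gain = {slope:4.2f}   estimation noise = {resid_sd:4.2f}")
```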

Talk 4, 3:15 pm, 24.14

A collection of stationary objects flashed periodically produces depth perception under ordinary viewing conditions

Frédéric Gosselin1, Mégan Brien1, Justine Mathieu1, Ariane Tremblay1; 1Département de psychologie, Université de Montréal

We discovered that stationary objects flashed periodically produce depth perception under ordinary viewing conditions. We believe that small involuntary eye movements (Ko, Snodderly & Poletti, 2016) induce apparent motions with magnitudes proportional to the flash periods (for a similar proposal, see Gosselin & Faghel-Soubeyrand, 2017), and that these apparent motions are interpreted by the brain as a form of parallax that normally results from microscopic head movements (Aytekin & Rucci, 2012). Here, we tested a somewhat counterintuitive prediction of this hypothesis: perceived depth should increase linearly with viewing distance. Eight observers were shown, on an Asus VG278HR monitor at a refresh rate of 120 Hz, a stimulus made of 200 white discs distributed randomly on a black background spanning 10 x 10 cm. Each disc flashed with one of 13 periods evenly spaced between 8.33 ms and 108.33 ms; the period was chosen to be inversely proportional to the value of a depth map at the disc's location. The depth map represented the thick vertices of a cube. All observers reported clearly seeing this volumetric shape during the experiment. Participants viewed the stimulus binocularly, sitting comfortably in a chair at distances of 45, 70, 95, 120, 145 and 170 cm, three times each. On every trial, they were asked to move their chair to a randomly selected viewing distance indicated on the computer monitor. Viewing distances were marked on the floor with strips of photoluminescent tape. When ready, subjects pressed the computer mouse button to initiate the presentation of the stimulus for 2 s. Finally, they were instructed to estimate the perceived depth in the stimulus by adjusting, with the computer mouse, the length of a horizontal line drawn on the monitor. As expected, we found a strong positive linear relationship between viewing distance and mean depth estimates (r = 0.9585, p = 0.0025).
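
A minimal sketch of the stimulus logic follows (display parameters taken from the abstract; the depth-map values are placeholders for the study's actual depth map):

```python
# Sketch of the flashing-disc stimulus: 200 random discs, each flashed with one of 13
# periods between 8.33 ms and 108.33 ms, chosen inversely proportional to the value of
# a depth map at the disc's location. Depth values here are placeholders.
import numpy as np

rng = np.random.default_rng(2)
frame_ms = 1000.0 / 120.0                       # 120 Hz monitor -> one frame = 8.33 ms
periods_ms = frame_ms * np.arange(1, 14)        # 13 periods, 8.33 ms ... 108.33 ms

n_discs = 200
disc_xy_cm = rng.uniform(0.0, 10.0, size=(n_discs, 2))   # discs on a 10 x 10 cm field
depth = rng.uniform(0.1, 1.0, n_discs)          # placeholder depth-map value per disc

# Flash period inversely proportional to depth, quantized to the 13 available periods.
raw_period = np.clip(periods_ms.min() / depth, periods_ms.min(), periods_ms.max())
disc_period_ms = periods_ms[np.abs(periods_ms[None, :] - raw_period[:, None]).argmin(axis=1)]
period_in_frames = np.rint(disc_period_ms / frame_ms).astype(int)

def discs_on(frame_index):
    """Boolean mask of discs drawn on a given video frame (multiples of each period)."""
    return frame_index % period_in_frames == 0
```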

Acknowledgements: This work was supported by an NSERC Discovery Grant awarded to Frédéric Gosselin.

Talk 5, 3:30 pm, 24.15

Stereoscopic distortions when viewing geometry does not match inter-pupillary distance

Jonathan Tong1, Robert Allison1, Laurie Wilcox1; 1York University

The relationship between depth and binocular cues (disparity and convergence) is defined by the distance separating the two eyes, known as the inter-pupillary distance (IPD). This relationship is mapped in the visual system through experience and feedback, and is adaptively recalibrated as IPD gradually increases during development. However, with the advent of stereoscopic-3D displays, situations arise in which the visual system views content captured or rendered with a camera separation that differs from the viewer's own IPD; without feedback, this will likely result in a systematic and persistent misperception of depth. We tested this prediction using a VR headset in which the inter-axial separation of the virtual cameras and the separation between the optics are coupled. Observers (n=15) were asked to adjust the angle between two intersecting textured surfaces until it appeared to be 90°, at each of three viewing distances. In the baseline condition, the lens and camera separations matched each observer's IPD. In two ‘mismatch’ conditions (tested in separate blocks), the lens and camera separations were set to the maximum (71 mm) and minimum (59 mm) allowed by the headset. We found that when the lens and camera separation was smaller than the viewer's IPD, observers exhibited compression of space: the adjusted angle was smaller than their baseline setting. The reverse pattern was seen when the lens and camera separation was larger than the viewer's IPD. Linear regression analysis supported these conclusions, with a significant correlation between the magnitude of IPD mismatch and the deviation of the angle adjustment from the baseline setting. We show that these results are well explained by a geometric model that considers the scaling of disparity and convergence due to shifts in virtual-camera and optical inter-axial separations relative to an observer's IPD.
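
To first order, the direction of these adjustments follows from simple disparity scaling; the sketch below is a caricature of the full geometric model (no vergence rescaling, small-angle geometry). The 59 mm and 71 mm separations come from the abstract; the 65 mm example IPD is assumed:

```python
# First-order sketch: rendered disparity scales with camera separation but is read out
# with the viewer's IPD, so perceived depth ~ physical depth * (camera separation / IPD).
# Predict the physical dihedral angle an observer would set so that it *appears* 90 deg.
import numpy as np

def angle_set_to_appear_90(camera_sep_mm, viewer_ipd_mm, half_width_cm=5.0):
    # The fold of a 90-deg dihedral angle lies half_width behind its edges
    # (depth = half_width / tan(45 deg)). Find the physical angle whose perceived
    # depth equals that value once depth is rescaled by the IPD mismatch.
    scale = camera_sep_mm / viewer_ipd_mm
    target_perceived_depth = half_width_cm / np.tan(np.deg2rad(45.0))
    physical_depth = target_perceived_depth / scale
    return 2.0 * np.rad2deg(np.arctan(half_width_cm / physical_depth))

ipd = 65.0                                      # mm, assumed example observer
for cam in (59.0, 65.0, 71.0):                  # headset minimum, matched, maximum
    print(f"camera/lens separation {cam:.0f} mm -> "
          f"angle adjusted to appear 90 deg: {angle_set_to_appear_90(cam, ipd):.1f} deg")
```

With a separation smaller than the IPD the predicted setting falls below 90° (compression of space), and above 90° for a larger separation, matching the pattern reported above.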

Acknowledgements: This work was funded by an NSERC Collaborative Research and Development (CRD) grant in collaboration with Qualcomm Canada Inc

Talk 6, 3:45 pm, 24.16

Near space distance perception in cluttered scenes

Rebecca L Hornsey1, Paul B Hibbard1; 1University of Essex

An abundance of visual cues contributes to the perception of distance, in both physical and virtual environments. Here, two environments were created in virtual reality to test the impact of scene clutter on distance judgements in near space. The two environments were (1) sparse, in which only the stimuli were visible, and (2) cluttered, in which additional objects were added to the scene. It was predicted that there would be under-constancy of distance perception in both environments, but fewer errors in the cluttered environment, owing to the additional visual cues available compared with the sparse environment. Using an equidistance task, 21 participants were required to match the distance of a target stimulus to that of a reference stimulus. While keeping scene lighting consistent between environments, and using both ambient and directional lighting, performance was measured in terms of accuracy (centimetre error) and precision (standard deviation of responses). As expected, under-constancy of distance was found in both environments. However, there was no significant difference in the accuracy or precision of distance estimates between the environments. These results show that, in our environments, no additional improvement in performance was gained from the presence of scene clutter, over and above what was achieved on the basis of the geometrical and lighting cues associated with the target object and the table surface.

Talk 7, 4:00 pm, 24.17

Mug shots: Systematic biases in the perception of facial orientation

Nikolaus F. Troje1, Maxwell Esser2, Anne Thaler3; 1York University

The angular orientation of a face pictured in half-profile view is systematically overestimated by the human observer. For instance, a 35 deg view is estimated to be oriented at around 45 deg. What is the cause of this perceptual orientation bias? Here, we address three related questions. (1) Is the phenomenon specific to pictorial projections, or does it also occur in 3D space? (2) Can it be explained by the depth compression expected when the vantage point of the observer is closer to the picture than the point of projection? (3) Does the visual system use a shape prior that does not match the elliptical horizontal cross-section of a typical head? Exp. 1 was conducted in virtual reality. We used a method of adjustment (“orient this face into a 45° position”). We found that the orientation bias was smaller than expected and only marginally different between the picture and 3D conditions. In Exp. 2, we presented static pictures and systematically varied the vantage point of the observer relative to the point of projection of the picture. We observed a pronounced bias that did not depend on the vantage point. In Exp. 3, we replicated the orientation bias with a non-facial object: a coffee mug with a handle that defined its orientation. We systematically varied the shape of the mug between circular and elliptical horizontal cross-sections. Mugs were then presented either as static images or as short movies in which the mug rotated about its vertical axis. Participants estimated orientation almost veridically for circular shapes and displayed predictable errors for other shapes. The shape-dependent orientation biases were much smaller for the movies than for the pictures. We conclude that the visual system adopts the heuristic of a cylindrical head shape unless explicit information about the shape is provided, e.g., through structure-from-motion.
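
A toy geometric illustration of hypothesis (3), using assumed head proportions rather than values from the study: if the lateral displacement of a frontal landmark on an elliptical head is read out under a circular-head assumption, the rotation is overestimated.

```python
# Toy illustration: the head's horizontal cross-section is deeper than it is wide, but
# an observer assuming a circular (cylindrical) cross-section interprets the lateral
# displacement of the nose tip as if the head were round, overestimating the rotation.
# The semi-axes below are rough assumed values, not measurements from the study.
import numpy as np

half_width_cm = 7.5     # assumed lateral semi-axis of the head
half_depth_cm = 10.0    # assumed front-to-back semi-axis (toward the nose)

def inferred_orientation(true_deg):
    # Lateral displacement of the nose tip after rotating the elliptical head.
    lateral = half_depth_cm * np.sin(np.deg2rad(true_deg))
    # Read that displacement out as if the head were a cylinder of radius half_width_cm.
    return np.rad2deg(np.arcsin(np.clip(lateral / half_width_cm, -1.0, 1.0)))

print(inferred_orientation(35.0))   # ~50 deg: a 35-deg view is judged as rotated further
```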

Acknowledgements: CFREF VISTA