Binocular Vision

Talk Session: Wednesday, May 22, 2024, 11:00 am – 12:45 pm, Talk Room 2
Moderator: Jorge Otero-Millan, UC Berkeley

Talk 1, 11:00 am, 62.21

Addressing the Vergence-Accommodation Conflict in Virtual Reality: A Geometrical Approach

Xiaoye Michael Wang1, Colin Dolynski1, Michael Nitsche2, Gabby Resch3, Ali Mazalek4, Timothy N Welsh1; 1University of Toronto, 2Georgia Institute of Technology, 3Ontario Tech University, 4Toronto Metropolitan University

Technologies on the mixed reality continuum, such as virtual reality (VR), commonly yield distortions in perceived distance. One source of such distortions is the vergence-accommodation conflict, where the eyes’ accommodative state is coerced to the fixed locations of a headset’s screen, while the angles at which the two eyes converge in virtual space continuously update. The current study conceptualizes the effect of vergence-accommodation conflict as a constant outward offset to the vergence angle of approximately 0.2°. Based on this conceptualization, a novel model was developed to predict and account for the resulting distance distortions in VR using the stereoscopic viewing geometry. Leveraging this model, an inverse transformation algorithm along the observer’s line of sight was applied to the rendered virtual environment to counter the effect of vergence offset. To test the effects of the transformation, participants performed a series of manual pointing movements on a tabletop with or without the inverse transformation algorithm. Results showed that the participants increasingly undershot the targets when the inverse transformation was not available, but were consistently more accurate when the algorithm was applied to the virtual environment. The results indicate that systematically transforming the rendered virtual environment based on perceptual geometry could ameliorate distance distortions arising from the vergence-accommodation conflict. The findings of the present study could be applied to designing VR-based applications, such as for medical/surgery training, to improve the accuracy when interacting with virtual objects.
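The geometry described above can be sketched for the simplest case of a target straight ahead of the observer. This is an illustration under stated assumptions, not the authors' model: the interpupillary distance `IPD_M`, the function names, and the sign convention (a positive `offset_deg` is added to the vergence angle) are all hypothetical, and the abstract does not specify how the offset enters the viewing geometry.

```python
import math

IPD_M = 0.063  # assumed interpupillary distance in metres (hypothetical value)


def vergence_angle(distance_m, ipd_m=IPD_M):
    """Vergence angle (radians) for a midline target at distance_m."""
    return 2.0 * math.atan(ipd_m / (2.0 * distance_m))


def distance_from_vergence(angle_rad, ipd_m=IPD_M):
    """Distance (metres) of the midline point specified by a vergence angle."""
    return ipd_m / (2.0 * math.tan(angle_rad / 2.0))


def distorted_distance(rendered_m, offset_deg, ipd_m=IPD_M):
    """Distance specified by the vergence angle after a constant offset.

    Assumption: a positive offset_deg increases the vergence angle.
    """
    shifted = vergence_angle(rendered_m, ipd_m) + math.radians(offset_deg)
    return distance_from_vergence(shifted, ipd_m)


def compensated_distance(intended_m, offset_deg, ipd_m=IPD_M):
    """Distance at which to render a target so that, after the offset,
    the vergence angle specifies intended_m (inverse of distorted_distance)."""
    shifted = vergence_angle(intended_m, ipd_m) - math.radians(offset_deg)
    return distance_from_vergence(shifted, ipd_m)


# Error introduced by a 0.2 degree offset at two rendered distances.
error_near = 0.5 - distorted_distance(0.5, 0.2)
error_far = 1.0 - distorted_distance(1.0, 0.2)
```

Under these assumptions a constant angular offset produces a distance error that grows with target distance, and `compensated_distance` undoes it along the line of sight.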

Acknowledgements: This work was supported by the Social Sciences and Humanities Research Council of Canada (SSHRC), the Canada Research Chair Program, the Natural Sciences and Engineering Research Council of Canada (NSERC), the Canada Foundation for Innovation, and the Ontario Ministry for Research and Innovation.

Talk 2, 11:15 am, 62.22

The influence of simulated ocular counter roll on stereoacuity

Stephanie M Reeves, Jorge Otero-Millan1,2; 1Herbert Wertheim School of Optometry & Vision Science, University of California Berkeley, 2Department of Neurology, Johns Hopkins University

Stereopsis relies on precise binocular alignment to compute binocular disparity and infer 3D depth. When humans tilt their heads towards the shoulder, the two eyes rotate around the lines of sight in the opposite direction of head tilt. This ocular counter roll (OCR) only partially compensates for the head tilt. The torsion induced during OCR results in a misalignment of the horizontal meridians of the two eyes, which leads to vertical disparities between the retinas. The current work sought to investigate the effect of retinal image rotation due to OCR on stereoacuity while upright. We hypothesized that these vertical disparities would result in decreased stereoacuity. To investigate this research question, we recruited 8 participants to view stereoscopic random dot ring stimuli (spanning 2° to 3.5° peripherally, duration of 200 ms) through a haploscope. Subjects reported whether a stimulus with crossed and uncrossed disparities of 0.1, 0.3, 0.5, 0.7, and 0.9 arcmin appeared in front of or behind a fixation target with zero disparity. The stimulus rings were rotated by ±0°, 5°, 10°, and 30° to simulate OCR. Results revealed that stereoscopic thresholds during the 30° stimulus rotation were significantly worse than the 0° stimulus rotation thresholds (t(7) = 3.00, p=0.02). The reduction in stereoacuity at the 30° stimulus rotation was not worse than what is predicted by the reduction in horizontal disparity alone (p=0.31). Stimulus rotations of 0°, 5°, and 10° were not different from one another (p>0.66). Taken together, these results indicate that the limited amount of OCR (typically less than 10° in humans for any head tilt) may be optimized for stereopsis: with more torsion than is natural for the human body, stereoacuity gets worse, while modest amounts of torsion are tolerable for stereopsis.
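The prediction from "the reduction in horizontal disparity alone" can be illustrated with a small calculation. The sketch assumes that rotating the stimulus by an angle rotates its disparity vector by the same angle, so the horizontal component scales with the cosine of the rotation and the vertical component with the sine. The function name is hypothetical and the abstract does not state the exact computation used.

```python
import math


def disparity_components(disparity_arcmin, rotation_deg):
    """Split a disparity that was purely horizontal before rotation into
    horizontal and vertical components after rotating by rotation_deg."""
    theta = math.radians(rotation_deg)
    horizontal = disparity_arcmin * math.cos(theta)
    vertical = disparity_arcmin * math.sin(theta)
    return horizontal, vertical


# Components of the largest tested disparity at each tested rotation.
components_by_rotation = {
    rotation: disparity_components(0.9, rotation) for rotation in (0, 5, 10, 30)
}
```

Under this assumption, a 30° rotation leaves about 87% of the horizontal disparity, whereas 5° and 10° rotations leave more than 98%, which fits the pattern of thresholds reported for the smaller rotations.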

Talk 3, 11:30 am, 62.23

Reversed depth representation in human and artificial visual systems

Bayu Gautama Wundari1, Hiroshi Ban1,2; 1National Institute of Information and Communications Technology (NICT), Japan, 2Graduate School of Frontier Biosciences, Osaka University, Japan

Stereopsis facilitates the brains of animals with front-facing eyes in linking the left and right retinal light patterns to disentangle the complex depth information in the sensory input. Extracting depth structures has been thought to be sufficiently explained by the binocular energy model in the primary visual cortex. However, engaging in real-world 3D tasks requires more complex stereo computations in higher visual areas. The locations and mechanisms through which the brain transforms the physical stimulus representation (binocular disparity) into a perceptual representation (depth perception) remain unclear. To address the issue, we combined human psychophysics, neuroimaging, and deep neural network (DNN) simulations. We designed random-dot stereograms (RDSs) by varying the binocular dot contrast correlation to reverse the physical and the perceived depth: RDSs whose physical disparity specified near were perceived as far, and vice versa. Participants (N=22) reported perceiving depth in reverse when presented with the engineered RDSs in a two-alternative forced choice (2AFC) task discriminating near and far. Decoding analysis on their fMRI voxel response patterns (V1-3, V3A, V3B, hMT, hV4) revealed that only V3A represented reversed depth, suggesting that reversed depth is an extrastriate phenomenon involving more complex stereo computation beyond early areas. We tested two DNNs to gain insight into the network architecture underlying reversed depth in V3A. One network could learn the geometry of contextual background for regressing disparity from a rectified pair of stereo images. The second network did not include such contextual information in its disparity estimation. Both networks were trained on the SceneFlow datasets, which provide accurate depth maps. We demonstrate that the network incorporating contextual information exhibited behavioral performance similar to that of humans in the depth judgment tasks. We conclude that V3A may house the neuronal circuit that learns the spatial context to generate neuronal activity that gives rise to conscious reversed depth perception.
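The stimulus manipulation, varying the binocular dot contrast correlation, can be sketched as follows. This is a minimal illustration and not the authors' stimulus code: the function name, the coordinate range, the use of ±1 contrasts, and the way the correlation level is mapped to a fraction of polarity-flipped dots are all assumptions.

```python
import random


def make_rds(n_dots, disparity, correlation, seed=0):
    """Return (left, right) lists of (x, y, contrast) dots.

    correlation = +1 gives a correlated RDS, -1 an anti-correlated RDS,
    and values in between flip the polarity of a fraction of the dots
    in the right eye (assumed mapping: fraction = (1 - correlation) / 2).
    """
    rng = random.Random(seed)
    n_flipped = round(n_dots * (1.0 - correlation) / 2.0)
    flipped = set(rng.sample(range(n_dots), n_flipped))
    left, right = [], []
    for i in range(n_dots):
        x, y = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        contrast = rng.choice((-1, 1))
        right_contrast = -contrast if i in flipped else contrast
        # Shift the two half-images in opposite directions by half the disparity.
        left.append((x - disparity / 2.0, y, contrast))
        right.append((x + disparity / 2.0, y, right_contrast))
    return left, right


def contrast_correlation(left, right):
    """Mean product of left and right dot contrasts."""
    return sum(l[2] * r[2] for l, r in zip(left, right)) / len(left)
```

With ±1 contrasts, `contrast_correlation` returns the requested correlation level whenever the number of flipped dots is a whole number, so the same function covers correlated, half-matched, and anti-correlated stereograms.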

Acknowledgements: This research was supported by ERATO (JPMJER1801) and the Ministry of Education, Culture, Sports, Science and Technology (21H00968).

Talk 4, 11:45 am, 62.24

Surviving Continuous Flash Suppression: A Two-Photon Calcium Imaging Study in Macaque V1

Cai-Xia Chen1, Dan-Qing Jiang1, Xin Wang1, Sheng-Hui Zhang1, Shi-Ming Tang1, Cong Yu1; 1Peking University

Continuous flash suppression (CFS) has been widely used to study visual consciousness or awareness. Although the flashing Mondrian noise presented to one eye can suppress the perception of a stimulus presented to the other eye, some low-level visual information can survive the suppression and participate in downstream visual processing subconsciously. However, it remains elusive how the responses of V1 neurons, which receive stimulus inputs from two eyes, are affected by CFS. To address this issue, we used two-photon calcium imaging to record responses of superficial-layer V1 neurons to a target under CFS in two fields of view (FOVs) of an awake, fixating macaque. The target was a circular-windowed square-wave grating (d=1°, SF=3/6 cpd, contrast=0.45, drifting speed=4°/s). The flashing stimulus was a circular Mondrian noise pattern (d=1.89°, contrast=0.50, TF=10 Hz). The stimuli were presented for 1000 ms with 1500-ms intervals. The square-wave grating at various orientations was first presented alone to either eye to identify orientation-tuned V1 neurons (~700 per FOV) and calculate each neuron’s ocular dominance index (ODI). Then the grating target was presented to one eye and the flashing noise to the other eye to measure neuronal responses under CFS. In the presence of flashing noise, orientation responses of neurons preferring the noise eye (ODI>0.2), in the form of a population orientation tuning function, were completely suppressed (by 96.5%) without measurable bandwidth, and those preferring both eyes (-0.2<ODI<0.2) were also severely suppressed (by 89.5%) with unmeasurable or very wide bandwidth. However, although the responses of neurons preferring the grating eye were also significantly suppressed (by 75.5%), the tuning bandwidth was still measurable, and increased from 11-13° to 19° (half-height half-width). These results indicate that only a small portion of the orientation responses in V1 neurons preferring the target eye can survive continuous flash suppression, while orientation responses of other neurons are mostly wiped out.
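The grouping of neurons by ODI and the reported suppression percentages can be illustrated with a short sketch. The abstract does not give the ODI formula, so the normalized response difference used here is an assumption, as are the function names, the sign convention (positive values favor the noise eye), and the treatment of values exactly at ±0.2 as binocular.

```python
def ocular_dominance_index(resp_noise_eye, resp_grating_eye):
    """Assumed ODI: normalized difference of the two monocular responses.

    Ranges from -1 (responds only to the grating eye) to +1 (responds
    only to the noise eye). Returns 0.0 when both responses are zero.
    """
    total = resp_noise_eye + resp_grating_eye
    if total == 0:
        return 0.0
    return (resp_noise_eye - resp_grating_eye) / total


def eye_preference(odi, cutoff=0.2):
    """Group a neuron using the +/-0.2 ODI cutoffs given in the abstract."""
    if odi > cutoff:
        return "noise eye"
    if odi < -cutoff:
        return "grating eye"
    return "binocular"


def percent_suppression(baseline_response, response_under_cfs):
    """Percentage by which a response is reduced relative to its baseline."""
    return 100.0 * (1.0 - response_under_cfs / baseline_response)
```

For example, a neuron with monocular responses of 3.0 (noise eye) and 1.0 (grating eye) has an assumed ODI of 0.5 and falls in the noise-eye group, and a response that drops from 2.0 to 0.5 under CFS is suppressed by 75%.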

Acknowledgements: supported by the National Science and Technology Innovation 2030 Major Program (2022ZD0204600)

Talk 5, 12:00 pm, 62.25

Computational Mechanisms of Perceptual Traveling Waves

João Victor XAVIER CARDOSO1, Hsin-Hung LI2,3, David J. HEEGER2,3, Laura DUGUÉ1,4; 1Université Paris Cité, CNRS, Integrative Neuroscience and Cognition Center, F-75006 Paris, France, 2Department of Psychology, New York University, New York, NY, 3Center for Neural Science, New York University, New York, NY, 4Institut Universitaire de France (IUF), Paris, France

Binocular rivalry is a perceptual phenomenon in which perception alternates between rival images presented to each eye. Under the right conditions, the dynamics of these alternations form a wave-like pattern starting where one rival image locally becomes the dominant percept. Studies have shown a link between these perceptual traveling waves and waves of brain activity in primary visual cortex (Lee et al., 2005). Here, we replicate and extend previous psychophysics studies of perceptual waves observed in binocular rivalry (e.g., Wilson et al., 2001), and fit a computational model to the behavioral data. A pair of orthogonal gratings, each windowed by an annulus and projected to one eye, were presented to human participants (n=21). Replicating previous results, a local contrast increment in one eye induced perceptual dominance that emerged locally and progressively expanded as it rendered invisible the stimulus presented to the other eye. Participants pressed a key when a perceptual wave reached a target area, enabling us to measure propagation speed. We observed (1) slower speeds for more eccentric annuli, commensurate with differences in cortical magnification; (2) slower speed when crossing the vertical meridian, consistent with inter-hemispheric communication; (3) faster waves for morning participants than for afternoon participants, interpreted as circadian variations in cortical excitability; (4) that allocating attention to the annulus was necessary for perceptual waves to be perceived; and (5) that rhythmic, local contrast increments induced rhythmic perceptual waves. Finally, we adapted a previously proposed binocular rivalry model (Li et al., 2017) so that it can reproduce both temporal and spatial patterns of perceptual waves. The model could replicate our main findings, along with features reported by other studies, such as changes in propagation speed as a function of attention, input strength and recurrent excitation. Together, our research aims to develop a computational framework for understanding perceptual traveling waves in binocular rivalry.

Talk 6, 12:15 pm, 62.26

Dichoptic contrast integration across the human visual cortex hierarchy using functional MRI

Kelly Chang1, Xiyan Li1,2, Kimberly Meier1,3, Kristina Tarczy-Hornoch1, Geoffrey M. Boynton1, Ione Fine1; 1University of Washington, 2University of California, San Diego, 3University of Houston

Introduction: A recent behavioral study by Meier et al. (2023) showed that when the contrast of a non-rivalrous grating is modulated independently in the two eyes, the perceived contrast of the combined stimulus roughly follows the maximum contrast over the two eyes. Here, in a similar paradigm using fMRI, we investigated the neural locus of this behavioral result. Methods: We measured BOLD fMRI signals in early visual cortex (V1 – V3) while participants (n = 10) viewed non-rivalrous dichoptic gratings (2 cpd) that varied slowly in contrast over time in each eye independently at 1/6 and 1/8 Hz. Observers provided a continuous report of perceived contrast over time by positioning a joystick lever. We fit a Minkowski mean, [ (L(t)^m + R(t)^m) / 2 ] ^ (1/m), to the behavioral and fMRI time-courses, where L(t) and R(t) are the contrast time-courses in each eye. An exponent parameter of m = 1 is simple averaging, and as m → ∞ the model approaches a max response in which neural responses or perceived contrast are driven by the eye presented with the highest contrast. Results: The magnitude of m was smallest in V1 (m = 2.00) and increased across the visual hierarchy toward a max model in V2 (m = 5.19) and V3 (m = 8.12). Behavioral responses measured during scanning were consistent with a max model (m = 6.55) and the later stages of the visual hierarchy. Conclusion: Our fMRI results in V1 are similar to those of a previous fMRI study that used a normalization model (Moradi & Heeger, 2009) to predict V1 BOLD responses. However, the integration of contrast in V1 differs systematically from perceived contrast. BOLD signals in V2 and V3 were consistent with behavioral measurements, implicating these higher visual areas as the neural locus of perceived contrast.
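The Minkowski mean in the Methods can be written out directly. The sketch below implements the formula as stated and recovers the exponent from a simulated report by grid search. The simulated contrast time-courses, the grid of candidate exponents, and the least-squares criterion are illustrative assumptions, not the authors' fitting procedure.

```python
import math


def minkowski_mean(left, right, m):
    """[(L^m + R^m) / 2]^(1/m): m = 1 is the average, large m approaches the max."""
    return ((left ** m + right ** m) / 2.0) ** (1.0 / m)


def contrast_timecourses(duration_s=48.0, step_s=0.5):
    """Hypothetical contrast modulations at 1/6 Hz (left eye) and 1/8 Hz (right eye)."""
    times = [i * step_s for i in range(int(duration_s / step_s))]
    left = [0.5 + 0.4 * math.sin(2.0 * math.pi * t / 6.0) for t in times]
    right = [0.5 + 0.4 * math.sin(2.0 * math.pi * t / 8.0) for t in times]
    return left, right


def fit_exponent(left, right, observed, candidates):
    """Return the candidate m with the smallest sum of squared errors."""
    def sse(m):
        return sum((minkowski_mean(l, r, m) - o) ** 2
                   for l, r, o in zip(left, right, observed))
    return min(candidates, key=sse)


# Simulate a noise-free report generated with m = 6.5, then recover m.
left, right = contrast_timecourses()
simulated_report = [minkowski_mean(l, r, 6.5) for l, r in zip(left, right)]
grid = [1.0 + 0.25 * k for k in range(37)]  # candidate exponents from 1 to 10
best_m = fit_exponent(left, right, simulated_report, grid)
```

With contrasts of 0.2 and 0.8, `minkowski_mean` returns 0.5 for m = 1 and about 0.79 for m = 50, which shows how larger exponents move the prediction toward the higher-contrast eye.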

Acknowledgements: Knights Templar Eye Foundation, Research to Prevent Blindness, UW Center for Human Neuroscience, Unrestricted grant from Research to Prevent Blindness to UW Department of Ophthalmology

Talk 7, 12:30 pm, 62.27

Selectivity for binocular disparity in the primate superior colliculus may not be directly inherited from V1

Incheol Kang1, Gongchen Yu1, Leor Katz1, Richard Krauzlis1, Hendrikje Nienborg1; 1National Eye Institute, NIH

The primate superior colliculus (SC) gets prominent inputs from V1, where selectivity for horizontal binocular disparity is well-established. Such disparity-selective input could provide a direct route for depth information supporting orienting behaviors in 3D environments. Here, we used multichannel linear arrays to record from the superficial and intermediate layers of the SC of one rhesus macaque while presenting random-dot stereograms (RDSs) at the neurons’ receptive fields (mean = 13.1°, range = 0.5° to 41.1°). We examined disparity tuning for both correlated and anti-correlated RDSs, in which corresponding dots shown to the left and right eye had opposite luminance polarities. Of the 393 isolated units, 272 (69%) were significantly selective for binocular disparity (Disparity Discrimination Index, p < 0.05). Units recorded in the same session tended to prefer similar disparities, suggesting clustering for disparity. Disparity tuning properties were comparable between neurons in the superficial (more visual) and intermediate (more visuomotor) layers. Consistent with the idea of pooling inputs from V1, the disparity selectivity emerged quickly after stimulus onset (~43 ms), typically showed even-symmetric tuning (78%) and had a broad tuning width. As in V1, the disparity tuning for anti-correlated RDSs was inverse to that for the correlated RDSs, with a reduced amplitude. However, this amplitude reduction was substantially more pronounced in the SC (a median of 18%) compared to V1. Furthermore, the disparity selectivity was negatively correlated with the degree of monocularity (r = -0.35, p < 10^-8), unlike previous findings in V1. Together, we find that most SC neurons are selective for binocular disparity, providing a plausible neural substrate for how the SC supports visual orienting in 3D natural environments. Several properties of the disparity tuning appear incompatible with direct pooling of V1 inputs and suggest that it is shaped by additional mechanisms.