VSS, May 13-18

Spatial Vision

Talk Session: Saturday, May 14, 2022, 5:15 – 7:15 pm EDT, Talk Room 2
Moderator: Andrew Watson, Apple

Talk 1, 5:15 pm, 25.21

Topological Receptive Field Model: An enhancement to the pRF

Yanshuai Tu1, Zhong-Lin Lu2,3,4, Yalin Wang1; 1School of Computing and Augmented Intelligence, Arizona State University, Tempe, AZ, USA, 2Division of Arts and Sciences, NYU Shanghai, Shanghai, China, 3Center for Neural Science and Department of Psychology, New York University, New York, United States of America, 4NYU-ECNU Institute of Brain and Cognitive Science, NYU Shanghai, Shanghai, China

The population receptive field (pRF) model is the state-of-the-art retinotopic map analysis method. However, because of the relatively low signal-to-noise ratio and low spatial resolution in fMRI signals, large portions of the retinotopic maps from the voxel-wise decoding pRF solutions often violate the topological condition observed in neurophysiology, that is, nearby neurons have nearby receptive fields. It is advantageous but challenging to impose the topological condition when decoding fMRI time series. Here, we propose a topological receptive field (tRF) framework to impose both topological conditions by combining topology-preserving segmentation and topological fMRI decoding iteratively, using the Beltrami coefficient, a metric used in quasiconformal theory, to quantify topological conditions. We validated the proposed framework on both synthetic and real human retinotopy data. The synthetic data were generated using the double-sech model with two levels of fMRI noise and then decoded with both tRF and pRF. We found that tRF performed better than pRF, with a smaller average visual coordinate recovery error (2.485 vs 2.924 degrees) and no violation of the topological condition (0 vs 393 flipped triangles, out of a total of 2798). We also compared the performance of the two methods on the retinotopic maps of 12 visual areas in the first three observers of the Human Connectome Project 7T Retinotopic dataset. The results also showed that the tRF provided better fits to the fMRI time series than the pRF (average RMSE=0.273 vs 0.276) and generated no topological violations (0 vs 870 flipped triangles out of a total of 19640). To our knowledge, this is the first work that enforces the topological condition in decoding retinotopic fMRI signals, and the first automatic visual area segmentation method that preserves the topology graph. The general framework can be extended to other sensory maps.

Acknowledgements: R01EY032125
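
The "flipped triangles" count reported in Talk 1 can be illustrated with a small sketch. This is not the authors' tRF code: it assumes a flattened 2D cortical mesh, an expected orientation-preserving map, and hypothetical names (`signed_area2`, `count_flipped_triangles`). A triangle is counted as flipped when its orientation changes sign, or collapses to zero area, between cortical and visual-field coordinates.

```python
def signed_area2(p, q, r):
    """Twice the signed area of triangle (p, q, r); positive if counter-clockwise."""
    return (q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])


def count_flipped_triangles(cortex_xy, visual_xy, triangles):
    """Count mesh triangles whose orientation is not preserved by the map.

    cortex_xy: per-vertex (x, y) positions on a flattened cortical patch (assumed).
    visual_xy: per-vertex decoded visual-field positions.
    triangles: vertex-index triples defining the mesh faces.
    """
    flipped = 0
    for i, j, k in triangles:
        src = signed_area2(cortex_xy[i], cortex_xy[j], cortex_xy[k])
        dst = signed_area2(visual_xy[i], visual_xy[j], visual_xy[k])
        # A sign change, or a degenerate triangle, violates the topological condition.
        if src * dst <= 0:
            flipped += 1
    return flipped


# Toy mesh: a unit square split into two triangles.
cortex_xy = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]

# A smooth map (uniform scaling) preserves every triangle's orientation.
smooth_map = [(2.0 * x, 2.0 * y) for x, y in cortex_xy]

# A noisy decoding that throws one vertex across the mesh flips one triangle.
noisy_map = list(smooth_map)
noisy_map[3] = (4.0, -2.0)

print(count_flipped_triangles(cortex_xy, smooth_map, triangles))
print(count_flipped_triangles(cortex_xy, noisy_map, triangles))
```

For visual areas with a mirror-image representation the expected sign would be reversed; the sketch ignores that case for brevity.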

Talk 2, 5:30 pm, 25.22

Nonlinear spatiotemporal suppression by population receptive fields of human visual cortex

Eline R Kupers1, Insub Kim1, Kalanit Grill-Spector1,2; 1Department of Psychology, Stanford University, CA, USA, 2Wu Tsai Neurosciences Institute, Stanford University, CA, USA

When multiple visual stimuli are presented simultaneously in the receptive field, the neurophysiological response is surprisingly lower than when the identical stimuli are presented sequentially (Kastner et al. 1998, 2001, 2021; Reynolds et al. 1999). However, the underlying mechanism of this suppression effect is not well understood. Here we collected fMRI data in two separate experiments and computationally tested simultaneous suppression using population receptive field (pRF) models. First, we mapped each voxel’s spatial pRF and defined visual areas using cartoon stimuli (Toonotopy; Finzi et al. 2020). Second, we presented colorful square stimuli either sequentially, where four stimuli appeared one at a time, in random order, or simultaneously, where identical stimuli appeared for the same duration but all at once. To examine how temporal and spatial summation contribute to brain responses, both conditions used two durations (200 and 1000 ms) and two stimulus sizes (4 and 16 deg²). We found that V1-V2 voxel responses were similar for simultaneous and sequential conditions, and larger for longer and bigger stimuli. This response pattern was well predicted by a linear pRF model (Dumoulin & Wandell, 2008), as pRFs were small and typically covered one square. However, in V3 and higher visual areas, responses were lower for simultaneous than sequential presentations, higher for shorter than longer durations, and did not increase much with stimulus size. While pRFs in these regions covered multiple stimuli, the linear pRF model failed to predict our observations. A compressive spatial summation pRF model (Kay et al. 2013) predicted the modest increase with stimulus size, but overpredicted simultaneous suppression and did not predict larger responses for shorter durations. Our results indicate that sensory suppression cannot be explained by spatial pRFs only and that spatiotemporal pRFs with nonlinearities are necessary for predicting responses beyond early visual areas.

Acknowledgements: NEI R01 EY023915
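
A toy sketch may help show why a compressive spatial summation (CSS) pRF predicts simultaneous suppression in Talk 2 while a linear pRF does not. It is purely spatial and ignores stimulus timing and the hemodynamic response. The pRF position and size, the square layout, the exponent of 0.5, and all function names are illustrative assumptions, not values or code from the study.

```python
import math


def gaussian_prf(x, y, x0, y0, sigma):
    """Isotropic 2D Gaussian pRF weight at visual-field position (x, y)."""
    return math.exp(-((x - x0) ** 2 + (y - y0) ** 2) / (2.0 * sigma ** 2))


def prf_drive(square, x0, y0, sigma, step=0.1):
    """Midpoint Riemann sum of the pRF over a square given as (cx, cy, side) in deg."""
    cx, cy, side = square
    n = int(round(side / step))
    total = 0.0
    for i in range(n):
        for j in range(n):
            x = cx - side / 2.0 + (i + 0.5) * step
            y = cy - side / 2.0 + (j + 0.5) * step
            total += gaussian_prf(x, y, x0, y0, sigma) * step * step
    return total


def css_response(drive, exponent, gain=1.0):
    """Static power-law nonlinearity on the summed drive; exponent 1 is the linear pRF."""
    return gain * drive ** exponent


# Four 2 x 2 deg squares (4 deg^2 each) around a large pRF centred at the origin.
squares = [(-1.5, -1.5, 2.0), (1.5, -1.5, 2.0), (-1.5, 1.5, 2.0), (1.5, 1.5, 2.0)]
drives = [prf_drive(sq, 0.0, 0.0, 3.0) for sq in squares]

# Sequential: each square drives the pRF on its own and the responses are summed.
# Simultaneous: all squares drive the pRF together before the nonlinearity.
linear_seq = sum(css_response(d, 1.0) for d in drives)
linear_sim = css_response(sum(drives), 1.0)
css_seq = sum(css_response(d, 0.5) for d in drives)
css_sim = css_response(sum(drives), 0.5)

print(linear_seq, linear_sim)
print(css_seq, css_sim)
```

With an exponent of 1 the two conditions give the same summed response, whereas any exponent below 1 makes the simultaneous response smaller because the power law is applied after the drives are pooled.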

Talk 3, 5:45 pm, 25.23

Divisive normalization and the computational neuropharmacology of vision

Marco Aqil1, Tomas Knapen, Serge Dumoulin; 1Spinoza Centre for Neuroimaging

Neural processing is hypothesized to apply the same mathematical operations in a variety of contexts, implementing so-called canonical neural computations. Divisive normalization (DN) is considered a prime candidate for a canonical computation. Here, we use a combination of state-of-the-art experiments (ultra-high-field functional MRI, PET) and mathematical methods (population receptive field [pRF] modeling) to investigate the role of DN as the canonical neural computation underlying visuospatial responses throughout the human visual hierarchy. We found that 1) a DN-pRF model explains seemingly unrelated response signatures, unifying and outperforming existing pRF models throughout the human visual hierarchy, and 2) specific model parameters modulate the presence of distinct nonlinear response properties (surround suppression and compressive spatial summation). Furthermore, we investigated the biophysical implementation underlying DN. Activation of specific neurotransmitter receptors is known to modulate responses to visual stimuli; hence, we hypothesized that specific neuropharmacological mechanisms may also underlie the operations of DN throughout the visual system. To test this hypothesis, we compared maps of DN pRF model parameters to the distribution of serotonin and GABA receptors obtained from PET imaging. We found highly significant correlations between receptor densities, in particular GABA and 5-HT1B, and DN model parameters. Our findings 1) extend the role of DN as a canonical computation to neuronal populations throughout the human visual hierarchy and 2) provide novel evidence for the role of neurotransmitter systems as the biological mechanism underlying neuromodulation of DN computations. We propose that these findings provide new insights into the canonical principles of information encoding in the cortex, as well as their biophysical implementation.
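
The claim in Talk 3 that DN model parameters switch between surround suppression and compressive spatial summation can be sketched with a toy calculation. The abstract does not give the model equation, so the form below, an activation Gaussian divided by a normalization Gaussian with constants a, b, c, d, is an assumed parameterization. All names and parameter values are hypothetical.

```python
import math


def gaussian_disc_drive(radius, sigma):
    """Integral of an unnormalised 2D Gaussian (width sigma) over a centred disc.

    Closed form: 2 * pi * sigma^2 * (1 - exp(-radius^2 / (2 * sigma^2))).
    """
    return 2.0 * math.pi * sigma ** 2 * (1.0 - math.exp(-radius ** 2 / (2.0 * sigma ** 2)))


def dn_response(radius, sigma_act, sigma_norm, a=1.0, b=0.0, c=0.5, d=1.0):
    """Assumed DN form: (a * activation + b) / (c * normalization + d) - b / d."""
    activation = gaussian_disc_drive(radius, sigma_act)
    normalization = gaussian_disc_drive(radius, sigma_norm)
    return (a * activation + b) / (c * normalization + d) - b / d


radii = [0.5, 1.0, 2.0, 5.0]

# Normalization pool wider than the activation pool: the response peaks at an
# intermediate stimulus size and then falls (surround suppression).
surround = [dn_response(r, sigma_act=1.0, sigma_norm=3.0) for r in radii]

# Equal pool sizes: the response keeps rising but saturates (compressive summation).
compressive = [dn_response(r, sigma_act=1.0, sigma_norm=1.0) for r in radii]

print(surround)
print(compressive)
```

In this sketch only the relative size of the normalization pool changes between the two regimes, which is one way a single model could produce both response signatures.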

Talk 4, 6:00 pm, 25.24

Population receptive field size varies between thin vs. thick stripes in cortical areas V2/V3

Roger Tootell1,2,3, Louis Vinke1,2, Bryan Kennedy1,2, Shahin Nasr1,2,3; 1Massachusetts General Hospital, 2Martinos Center for Biomedical Imaging, 3Department of Radiology, Harvard Medical School

Introduction: Variations in population receptive field (pRF) size are crucially related to the sensitivity to spatial frequency (SF) across visual cortex, both across cortical areas and across the representation of retinotopic eccentricity within each area. For instance, cortical sites that represent peripheral (compared to central) visual fields have larger pRF sizes and respond more strongly to stimuli with lower SFs. However, within a given retinotopic representation, it remains unknown whether pRF size differs with variations in columnar sensitivity (e.g. color, disparity, motion, etc.). Here we tested this hypothesis, partly by leveraging a prior finding: we found a stronger response to low SF stimuli in V2/V3 thick stripes, compared to thin stripes (Tootell and Nasr, 2017). Methods: Using high-resolution fMRI (7T; voxel size: 1 mm isotropic), in 4 human subjects, we measured the pRF size by presenting moving bars at multiple orientations and motion directions. In two individuals, we also measured the reproducibility of the results by scanning them on two different days. In all subjects, we also independently localized V2/V3 thin vs. thick stripes (Tootell and Nasr, 2021). Results: Consistent with previous studies, in all individuals, we found a larger pRF in peripheral vs. central representations, and a progressively larger pRF size in V1<V2<V3<V3A. The pRF sizes were reproducible across sessions, in both deep and superficial cortical layers (r>0.39). Here, we found that average pRF size in V2/V3 was significantly larger in thick compared to thin stripes, even when the measurements were confined to iso-eccentric (3˚<r<10˚) representations. In all subjects, this effect was observed within both deep and superficial layers, consistent with a columnar organization. Conclusion: Our results show primary evidence for a heterogeneous, column-based distribution of pRF sizes at iso-eccentric sites. This supports the hypothesis that pRF size co-varies with column-scale variations in sensitivity to SF.

Acknowledgements: This work was supported by NIH NEI (grant R01EY030434), and by the MGH/HST Athinoula A. Martinos Center for Biomedical Imaging. Crucial resources were made available by a NIH Shared Instrumentation Grant S10-RR019371.

Talk 5, 6:15 pm, 25.25

A role for spatiotemporal dynamics in the function of the visual system.

Zachary Davis1, Lyle Muller2, John Reynolds1; 1The Salk Institute for Biological Studies, 2Western University

Recent advances in large-scale electrophysiological and optical recording techniques have revealed that intrinsic fluctuations in cortical activity exhibit organized spatiotemporal structure, often in a form well characterized as traveling waves. While intrinsic traveling waves (iTWs) had often been studied during states of anesthesia, sleep, or low arousal, we find that iTWs occur during normal activity in the visual cortex of the alert non-human primate. iTWs occur multiple times per second and modulate the magnitude of sensory evoked activity in a phase-dependent manner. Further, we have shown that the state of iTWs impacts visual perception in marmosets as they perform a challenging visual detection task. We have constructed a large-scale conductance-based topographic spiking network model that recapitulates the phenomenology of iTWs in vivo. The model shows that large-scale iTWs emerge from propagation delays in locally asynchronous spiking dynamics throughout cortical horizontal fiber networks. The model predicts that neuronal activity during iTWs is sparse, in the sense that only a small fraction of the neural population participates in any individual iTW. As a result, iTWs can occur without inducing correlated variability, which has been shown to impair sensory discrimination. The model also predicts that iTWs traverse feature domains through the horizontal fibers that connect similarly tuned cortical columns and are coordinated across visual cortical areas via the retinotopically ordered interareal projections. We find preliminary evidence supporting these predictions from electrophysiological recordings in vivo. Taken together, these findings lead to the conclusion that feature-selective and retinotopically ordered projection systems endow the visual system with the capacity to organize intrinsic spiking activity into iTWs to improve perception.

Acknowledgements: This work was funded by the NIH National Eye Institute.

Talk 6, 6:30 pm, 25.26

The temporal dynamics of visual crowding and segmentation

Michael Herzog1, Greg Francis1,2, Mauro Manassi3; 1EPFL, 2Purdue, 3University of Aberdeen

Perception of a target strongly deteriorates when flanking elements are presented (crowding). Classically, crowding is explained by pooling mechanisms where target and flanker features are combined, e.g., when neurons in higher visual areas with larger receptive fields pool information from neurons in lower visual areas with smaller receptive fields. Crowding is proposed to occur in the feedforward sweep of information processing. Here, we show that crowding occurs in a highly temporal fashion, requiring substantial processing time. We presented a vertical vernier for 20 ms flanked by either 2 vertical lines (one on each side of the vernier) or 2 cuboids (the cuboids contained the lines). For lines and cuboids, strong crowding occurred. When we increased the stimulus duration, crowding remained strong for the 2 lines but gradually decreased for the cuboids, reaching nearly unflanked performance from a duration of about 120 ms on (uncrowding). These results show that, first, uncrowding cannot be explained by simple pooling models. Second, as we proposed previously, uncrowding occurs when the flankers make up a good Gestalt (cuboids) that segregates from the target vernier. Third, the computation of good Gestalts of the cuboids takes substantial time. What matters for uncrowding is not stimulus duration per se, but processing time. When we presented the 2 cuboids alone for 20 ms, then an ISI of 120 ms, and then the cuboids plus the vernier, strong uncrowding occurred. Using the Laminart model, we show how processing evolves dynamically in a recurrent fashion.

Acknowledgements: GF: Human Brain Project SGA3 (945539), Visiting Scientist Grant from the Swiss National Science Foundation; MM: Carnegie Trust for the Universities of Scotland RIG009850; MHH: “Basics of visual processing: from elements to figures” of the SNF

Talk 7, 6:45 pm, 25.27

Eccentricity driven modulations of visual crowding across the central fovea

Ashley M. Clark1, Martina Poletti2; 1University of Rochester

It is well established that surrounding a stimulus with flankers decreases acuity, a phenomenon known as visual crowding. While it has been widely documented that the magnitude of crowding increases with eccentricity, it remains unknown if this increase starts already within the central 1-deg foveola, and if so, what is the rate of growth at this scale. Addressing this question is important as foveal vision is often confronted with crowded stimuli. We measured subjects’ (N=6) visual acuity in a 4AFC task. Stimuli were viewed monocularly in either isolation or with surrounding flankers and were presented at different foveal and extrafoveal eccentricities. Stimuli were presented in Pelli font, designed for testing foveal crowding, and their widths, ranging from 0.4' to 4.5', were adjusted using an adaptive procedure. Eye movements were measured with high precision using digital Dual Purkinje Image eye-tracking. To limit visual stimulation around the desired foveal eccentricity, we used a state-of-the-art custom-made gaze-contingent display system allowing for more accurate gaze localization and retinal stabilization. Our results show that the impact of crowding increases with eccentricity already within the foveola; acuity in the presence of flankers decreases as a function of foveal eccentricity, with a decrease of 32% at the center of gaze, and an additional 24% drop as early as 10' away. Further, crowding increases with eccentricity at a rate that is three times slower in the foveola (from 0-0.4 deg) than extrafoveally (from 1-6 deg). These findings reveal that visual crowding does not affect the whole foveola equally; its impact significantly increases even with minute changes in eccentricity. Therefore, under normal viewing conditions with crowded foveal stimuli, acuity likely drops considerably even a few arcminutes away from the preferred locus of fixation.

Acknowledgements: NIH R01 EY029788-01

Talk 8, 7:00 pm, 25.28

A contrast masking investigation of color induction

Chien-Chung Chen1, Cheng-Ying Yu2, Chih-Hsien Huang2; 1National Taiwan University, 2Taipei First Girls High School

The color appearance of a uniform region (test) can be altered by a surrounding periodic pattern (inducer) alternating in two colors (Monnier & Shevell, 2003, Nature Neuroscience). We investigated the properties of the mechanisms underlying the contributions of different inducer components by observing how various surrounding patterns affect color contrast discrimination in the test. The bull's-eye inducer (3.3 cyc/deg along the radius) had rings alternating in two out of three colors: L-M (“red”), M-L (“green”), and neutral gray. The thirteenth ring (2-degree eccentricity) of the inducer was replaced by the uniform test. The test regions contained either a red or green target superimposed on a pedestal of the same color or the pedestal alone. The pedestal contrast varied from -46 to -26 dB. We used a temporal 2AFC and PSI adaptive staircase to measure the target threshold at a 75% proportion-correct level. The observer’s task was to decide which interval contained the target. The target threshold vs. pedestal contrast (TvC) function for a red target surrounded by a nearby green ring and a distant red ring (full inducer) showed a dipper shape with threshold decrement (facilitation) at low and increment (masking) at high pedestal contrasts. The TvC function for inducers with a nearby green and a distant gray ring shifted downward on log-log coordinates from that of the full inducer and had more pronounced facilitation. The TvC function shifted further downward and produced little masking for inducers with a nearby gray and a distant red ring. The TvC function for the uniform gray background had the lowest threshold and showed only masking effects. The result for the green target mirrored the red target data. The data are best explained by a divisive inhibition model with an additive short-range and a multiplicative long-range lateral interaction.

Acknowledgements: Supported by MOST (Taiwan) 109-2410-H-002-086-MY4