Spatial Vision

Talk Session: Monday, May 22, 2023, 8:15 – 9:45 am, Talk Room 1
Moderator: Dennis Levi, UC Berkeley

Talk 1, 8:15 am, 41.11

How do visual abilities relate to each other?

Simona Garobbio1, Marina Kunchulia2, Michael H. Herzog1; 1EPFL, 2Free University of Tbilisi

In vision, there is surprisingly little evidence for common factors. Using large-scale test batteries, most studies have found no or only weak correlations between performance levels in different visual tests. Factor analysis confirmed these results. This means that a participant excelling in one test may rank lowest in another. In aging research, cross-sectional studies have repeatedly found that older adults show deteriorated performance in most visual tests compared to young adults. However, within the older population, there is no evidence for a common factor underlying visual abilities. To further investigate the decline of visual abilities with age, as well as the relationship between visual abilities, we performed a longitudinal study. Older adults performed a battery of 12 visual tests three times, with re-tests after about four and seven years. Performance in most visual tests was stable across the seven years, except for visual acuity as determined with the Freiburg visual acuity test, which showed a strong decline. Our results suggest that the age-related decline of most visual abilities is slower than usually thought, an impression biased by the decline in visual acuity. This paradoxical outcome raises the question of whether visual acuity is a misleading or a very good test.
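As an illustration of the style of analysis described above, the sketch below computes pairwise correlations across a participants-by-tests score matrix and fits a single common factor. The simulated data and variable names are placeholders, not the authors' actual pipeline.

```python
# Minimal sketch of inter-test correlation and one-factor analysis.
# The random data below is a stand-in for real test scores.
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)
n_participants, n_tests = 100, 12
scores = rng.standard_normal((n_participants, n_tests))  # placeholder scores

# Pairwise Pearson correlations between performance levels in the 12 tests.
corr = np.corrcoef(scores, rowvar=False)
off_diag = corr[~np.eye(n_tests, dtype=bool)]
print(f"mean inter-test correlation: {off_diag.mean():.3f}")

# Fit a single common factor; weak, inconsistent loadings would argue
# against one general factor underlying visual abilities.
fa = FactorAnalysis(n_components=1).fit(scores)
print("factor loadings:", np.round(fa.components_.ravel(), 2))
```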

Talk 2, 8:30 am, 41.12

Spatial Frequency Maps in Human Visual Cortex: A Replication and Extension

Jiyeong Ha1, William Broderick2, Kendrick Kay3, Jonathan Winawer1; 1New York University, 2Flatiron Institute, 3University of Minnesota

Neurons in primary visual cortex (V1) of non-human primates are tuned to spatial frequency, with preferred frequency declining with eccentricity. fMRI studies show that spatial frequency tuning can be measured at the mm scale in humans (single voxels), and confirm that preferred frequency declines with eccentricity. Recently, fMRI-based quantitative models of spatial frequency have been developed, both at the scale of voxels (Aghajari, Vinke, & Ling, 2020, J Neurophys) and maps (Broderick, Simoncelli, & Winawer, 2022, JoV). For the voxel-level approach, independent spatial frequency tuning curves were fit to each voxel. For the map-level approach, a low-dimensional parameterization (9 parameters) described spatial frequency tuning across all of V1 as a function of voxel eccentricity, voxel polar angle, and stimulus orientation. Here, we sought to replicate and extend Broderick et al.’s results using an independent dataset (Natural Scenes Dataset, NSD; Allen et al., 2022, Nat Neurosci). Despite many experimental differences between Broderick et al. and NSD, including field strength (3T vs 7T), number of stimulus presentations per observer (96 vs 32), and stimulus field of view (12° vs 4.2° maximal eccentricity), most, though not all, of the model parameters showed good agreement. Notably, parameters that capture the dependency of preferred spatial frequency on voxel eccentricity, cardinal vs oblique stimulus orientation, and radial vs tangential stimulus orientation were similar. We also extended Broderick et al.’s results by fitting the same parametric model to NSD data from V2 and V3. From V1 to V2 to V3, there was an increasingly sharp decline in preferred spatial frequency as a function of eccentricity, and an increasingly large bandwidth in the voxel spatial frequency tuning functions. Together, the results show robust reproducibility of visual fMRI experiments, and bring us closer to a systematic characterization of spatial encoding in the human visual system.
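For illustration, the sketch below implements a low-dimensional map-level parameterization in the spirit of the approach described: preferred period grows affinely with eccentricity, with small multiplicative modulations by stimulus orientation relative to the cardinal axes and to the radial direction. The functional form and all parameter values are simplified assumptions, not Broderick et al.'s exact 9-parameter model.

```python
# Simplified map-level spatial frequency model (illustrative, not the
# published parameterization). Angles in radians, eccentricity in degrees.
import numpy as np

def preferred_period(ecc_deg, polar_angle, stim_ori,
                     slope=0.12, intercept=0.35,
                     amp_cardinal=0.05, amp_radial=0.10):
    """Preferred period (deg/cycle); preferred SF is its reciprocal."""
    base = slope * ecc_deg + intercept  # affine growth with eccentricity
    cardinal = amp_cardinal * np.cos(4 * stim_ori)  # cardinal vs oblique
    radial = amp_radial * np.cos(2 * (stim_ori - polar_angle))  # radial vs tangential
    return base * (1 + cardinal + radial)

# Example: preferred SF (cycles/deg) for a voxel at 6 deg eccentricity on
# the horizontal meridian, probed with a horizontal grating.
print(1.0 / preferred_period(6.0, polar_angle=0.0, stim_ori=0.0))
```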

Talk 3, 8:45 am, 41.13

Crowding does not follow Gestalt principles in foveal and amblyopic vision

John A. Greenwood1, Alexandra Zmuda1, Annegret H. Dahlmann-Noor2,3, Alexandra V. Kalpadakis-Smith1; 1University College London, London, UK, 2Moorfields Eye Hospital, London, UK, 3NIHR Biomedical Research Centre, London, UK

Crowding is the disruption to object recognition that occurs in clutter, a process that strongly limits peripheral vision and becomes elevated in foveal/central vision with amblyopia (‘lazy eye’). Bottom-up ‘pooling’ models depict crowding as an unwanted integration of target and flanker elements, with the amount of disruption driven by the similarity between elements. In contrast, top-down ‘grouping’ approaches argue that crowding follows Gestalt principles of organisation. We reasoned that if crowding is driven by grouping, then wherever crowding occurs, grouping effects should follow. To test this, we compared crowding in peripheral vision with the typical and amblyopic fovea. Observers judged the orientation of a black Landolt-C target, with size-acuity thresholds measured via a QUEST procedure. Stimuli were shown in typical peripheral vision (15 deg), the typical fovea, or the amblyopic fovea (n=10 in each). In all groups, thresholds were low with an isolated target, and elevated when two black Landolt-C flankers (left/right of the target with matched sizes) or a surrounding box were added. This crowding effect was reduced with white flankers (due to decreased target-flanker similarity) for all groups. In peripheral vision, we replicated several grouping effects with the target amidst a row of six flankers: alternating black-and-white flankers gave strong elevations (‘target-flanker grouping’), while thresholds were reduced by all-white flankers (‘flanker-flanker grouping’) and a row of boxes (‘uncrowding’). Performance differed in both the typical and amblyopic fovea: although the alternating black-and-white flankers elevated thresholds, the all-white flankers and the row of ‘uncrowding’ boxes did not improve performance. We also replicated these patterns with narrower Vernier stimuli. Our results suggest that grouping effects derive from diverse processes, some of which are absent from foveal and amblyopic vision. The presence of crowding, and its modulation by target-flanker similarity, was nonetheless clear. We conclude that crowding and grouping are distinct processes.
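The following is a minimal Bayesian adaptive staircase in the spirit of the QUEST procedure mentioned above, tracking a size-acuity threshold for a 4AFC Landolt-C orientation judgment. The psychometric function, parameter values, and simulated observer are illustrative assumptions, not the authors' experimental code.

```python
# QUEST-style threshold tracking: maintain a posterior over candidate
# thresholds and place each trial at the current posterior mode.
import numpy as np

rng = np.random.default_rng(1)
log_sizes = np.linspace(-1.0, 1.0, 201)  # candidate log10 threshold sizes

def p_correct(x, T, beta=3.5, gamma=0.25, delta=0.01):
    # Weibull psychometric function (4AFC: guess rate 0.25, lapse 0.01).
    return delta * gamma + (1 - delta) * (
        1 - (1 - gamma) * np.exp(-10 ** (beta * (x - T))))

def simulate_observer(x, true_threshold=0.2):
    return rng.random() < p_correct(x, true_threshold)

posterior = np.ones_like(log_sizes) / log_sizes.size  # flat prior
for trial in range(40):
    test_level = log_sizes[np.argmax(posterior)]  # test at posterior mode
    correct = simulate_observer(test_level)
    likelihood = p_correct(test_level, log_sizes)  # P(correct | each threshold)
    posterior *= likelihood if correct else (1 - likelihood)
    posterior /= posterior.sum()

print(f"estimated log threshold: {(posterior * log_sizes).sum():.3f}")
```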

Acknowledgements: Funded by the UK Medical Research Council & Moorfields Eye Charity

Talk 4, 9:00 am, 41.14

A texture statistics encoding model reveals sensitivity to mid-level features across human visual cortex

Margaret Henderson1, Michael Tarr1, Leila Wehbe1; 1Carnegie Mellon University

Mid-level visual features, such as texture and contour, provide a computational link between low- and high-level visual representations. While the detailed nature of mid-level representations in the brain is not yet fully understood, past work has shown that a texture statistics model (P-S model; Portilla and Simoncelli, 2000) captures key aspects of neural responses in areas V1-V4 as well as human behavioral data. However, it is not currently known how well this model accounts for the responses of higher visual cortex regions to natural scene images. To examine this, we constructed single-voxel encoding models based on P-S statistics and fit the models to human fMRI data from the Natural Scenes Dataset (Allen et al., 2021). We demonstrate that our texture statistics encoding model can predict the held-out responses of individual voxels in early retinotopic areas as well as higher-level category-selective areas. The ability of the model to reliably predict signal in higher visual cortex voxels suggests that the representation of texture statistics features is widespread throughout visual cortex, potentially playing a role in higher-order visual processing. Furthermore, we use variance partitioning analyses to identify which features are most uniquely predictive of brain responses, and show that the contribution of higher-order texture features increases from early areas to higher areas on the ventral and lateral surfaces of the brain. We also show that patterns of sensitivity to individual texture model features can be used to identify key components of the overall representational space within visual cortex. These results provide a key step forward in characterizing how mid-level feature representations emerge across the visual system, and how they may contribute to higher-order processes like object and scene recognition.
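As a hedged sketch of the voxelwise encoding and variance-partitioning approach described above, the snippet below fits ridge-regression encoding models from two placeholder feature banks ("lower-order" and "higher-order" statistics) and computes the variance uniquely explained by the higher-order set. All features and responses are simulated; this is not the authors' P-S feature pipeline.

```python
# Voxelwise encoding with variance partitioning: unique variance of the
# higher-order features = R^2(full model) - R^2(lower-order model).
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_images, n_low, n_high, n_voxels = 800, 50, 120, 10

X_low = rng.standard_normal((n_images, n_low))    # stand-in lower-order statistics
X_high = rng.standard_normal((n_images, n_high))  # stand-in higher-order statistics
weights = rng.standard_normal((n_high, n_voxels))
Y = 0.5 * (X_high @ weights) + rng.standard_normal((n_images, n_voxels))

def held_out_r2(X, Y):
    """Per-voxel R^2 on a held-out quarter of the images."""
    Xtr, Xte, Ytr, Yte = train_test_split(X, Y, test_size=0.25, random_state=0)
    pred = Ridge(alpha=10.0).fit(Xtr, Ytr).predict(Xte)
    return 1 - ((Yte - pred) ** 2).sum(0) / ((Yte - Yte.mean(0)) ** 2).sum(0)

r2_low = held_out_r2(X_low, Y)
r2_full = held_out_r2(np.hstack([X_low, X_high]), Y)
unique_high = r2_full - r2_low  # variance uniquely explained by higher-order features
print(f"median unique variance of higher-order features: {np.median(unique_high):.3f}")
```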

Acknowledgements: This research was funded by a Distinguished Postdoctoral Fellowship from the Carnegie Mellon Neuroscience Institute to MMH. Collection of the NSD dataset was supported by NSF IIS-1822683 and NSF IIS-1822929.

Talk 5, 9:15 am, 41.15

Mapping triangles and breads in shape spaces: a big-data approach to estimating category distributions

Filipp Schmidt1,2, Roland W. Fleming1,2; 1Justus Liebig University Giessen, 2Center for Mind, Brain and Behavior (CMBB), Marburg and Giessen

To classify objects, we compare them to mental representations of things we've seen before. By extracting objects’ shape features, our visual system can map them into a mental shape space and compare their positions in that space. For example, if a stimulus falls within a reasonable distance of the distribution of previously seen triangles, we tend to classify it as a triangle. To understand these classification decisions, we must understand the distribution of class members (e.g., of triangles) in our mental shape space. However, because of the enormous variety in shapes across and within object classes, it is difficult if not impossible to map out these distributions by probing individual objects (e.g., by asking “Is this shape a triangle?”). Here, we analyzed drawings from Google’s “Quick, Draw!” database, contributed by volunteers from countries all over the world. Specifically, we looked at drawings where participants were instructed to “draw a triangle” or “draw a bread” (> 120,000 drawings each). By simplifying these drawings to their basic geometric form (best-fitting triangle, rectangle, or ellipse), we could express each drawing as a combination of just a few geometric parameters (e.g., angles, axis ratios, etc.). We compared the distributions as well as their modes (i.e., the “typical triangle” and “typical bread”) in the resulting shape parameter spaces across countries. For triangles, we obtained very similar distributions of shapes across countries, with the typical triangle being equilateral. For breads, however, we obtained markedly different distributions of shapes for different countries (e.g., more rectangular, square toast in the United States and United Kingdom versus more elliptic, elongated loaves in Germany and Poland). This illustrates how we can use drawings to map out mental shape spaces and test for the universality of object classes across different regions of the world.
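To illustrate the parameterization step, the sketch below reduces a triangle drawing to its three interior angles and an outline to an axis ratio via PCA. These particular fitting choices are assumptions for illustration, not the authors' exact procedure.

```python
# Reducing drawings to a few geometric parameters (angles, axis ratios).
import numpy as np

def triangle_angles(vertices):
    """Interior angles (degrees) of a triangle given as a (3, 2) array."""
    angles = []
    for i in range(3):
        a, b, c = vertices[i], vertices[(i + 1) % 3], vertices[(i + 2) % 3]
        u, v = b - a, c - a
        cos = u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
        angles.append(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))
    return np.array(angles)

def axis_ratio(points):
    """Minor/major axis ratio of an outline via PCA of its (N, 2) points."""
    centered = points - points.mean(0)
    eigvals = np.linalg.eigvalsh(np.cov(centered.T))  # ascending order
    return float(np.sqrt(eigvals[0] / eigvals[1]))    # 1 = circular, -> 0 = elongated

# The "typical" shape is the mode of these parameters over many drawings,
# e.g., near (60, 60, 60) degrees for triangles.
print(triangle_angles(np.array([[0, 0], [1, 0], [0.5, np.sqrt(3) / 2]])))
t = np.linspace(0, 2 * np.pi, 100)
loaf = np.c_[2.0 * np.cos(t), 0.8 * np.sin(t)]  # elongated "loaf" outline
print(f"loaf axis ratio: {axis_ratio(loaf):.2f}")
```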

Acknowledgements: Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)–project number 222641018–SFB/TRR 135 TP C1; European Research Council (ERC) Consolidator Award ‘SHAPE’–project number ERC-CoG-2015-682859; Hessian Ministry of Higher Education, Research and the Arts–cluster project “The Adaptive Mind”

Talk 6, 9:30 am, 41.16

V4 neurons are tuned for local and non-local features of natural planar shape

Timothy D. Oleskiw1,5, James H. Elder2, Ingo Fruend6, Gerick M. Lee1, Andrew Sutter1,3, Anitha Pasupathy4, Eero P. Simoncelli1,5, J. Anthony Movshon1, Lynne Kiorpes1, Najib Majaj1; 1New York University, 2York University, 3Drew University, 4University of Washington, 5Flatiron Institute Center for Computational Neuroscience, 6Verbally GmbH

Planar shape, i.e., the silhouette contour of a solid body, carries rich information important for object recognition, including both local (curvature) and global shape cues. While curvature-selective neurons have been identified in primate area V4, it remains unclear (a) whether curvature is the best way to characterize the shape selectivity of these neurons and (b) whether selectivity is limited to local shape. Here we employ a unique array of shape stimuli to dissociate tuning for local and global shape properties. These stimuli have been used previously to identify an intriguing congruence between the curvature statistics of natural shape and the population response of shape-selective V4 neurons. However, this evidence is indirect, as neural curvature selectivity was not analyzed at the single-neuron level. To address these limitations, we first assess how model neurons, trained on single-unit V4 responses, encode the curvatures of various shape stimuli. A mutual information analysis reveals that these neurons are tuned to extract information more efficiently from shapes with natural curvature distributions, indicating a tuning to the ecological statistics of curvature. Second, to more directly measure neuronal tuning for natural shape, we recorded activity from area V4 of a juvenile Macaca nemestrina observing natural and synthetic shapes. Consistent with our model neuron analysis, we found that synthetic shapes with natural curvature distributions elicited stronger responses than synthetic shapes with more random distributions, despite having much lower entropy. Remarkably, we also found that natural shapes elicited stronger V4 responses than synthetic shapes with matching curvature statistics, indicating selectivity for non-local shape features. Together, our findings demonstrate for the first time that V4 neurons are tuned to the ecological statistics of both local and non-local object shape, in a manner not explained by existing models of V4 shape selectivity.
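The following sketch shows one way a mutual information analysis like the one described could be set up: a plug-in (histogram) estimate of MI between stimulus curvature and a model neuron's response. The response model, data, and binning are illustrative assumptions, not the authors' analysis.

```python
# Histogram-based mutual information between curvature and response.
import numpy as np

def mutual_information(x, y, bins=10):
    """Plug-in MI estimate (bits) between two continuous variables."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px, py = pxy.sum(1, keepdims=True), pxy.sum(0, keepdims=True)
    nz = pxy > 0  # avoid log(0) on empty bins
    return (pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum()

rng = np.random.default_rng(0)
curvature = rng.standard_normal(5000)  # stimulus curvature values (placeholder)
response = np.tanh(curvature) + 0.3 * rng.standard_normal(5000)  # model neuron
print(f"MI(curvature; response) = {mutual_information(curvature, response):.2f} bits")
```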

Acknowledgements: This work is funded in part by NIH EY031446 & EY022428