Visual and auditory object recognition in relation to spatial abilities

Poster Presentation 63.415: Wednesday, May 22, 2024, 8:30 am – 12:30 pm, Pavilion
Session: Object Recognition: Models

Conor J. R. Smithson1, Jason K. Chow1, Isabel Gauthier1; 1Vanderbilt University

Domain-general object recognition (o) is the ability to individuate members of an object category. Visual o is typically measured using novel objects (e.g., Greebles). Stimuli used to measure auditory o include birdsong, mechanical keyboard presses, and laughter. Previous work suggests a nearly perfect correlation between the visual and auditory modalities for this ability (Chow et al., 2023). However, this relationship had not previously been tested in a large sample. We also assess whether o can be distinguished from spatial ability, which has historically dominated measures of visual ability in psychometric studies. Using structural equation modeling with a large sample (n = 283), we estimate the relationships between these abilities at the construct level. We find that visual and auditory o are very closely related (r = .80, 95% CI [.68, .92]), but that this relationship is smaller once the influence of fluid intelligence (Gf) is controlled for (r = .60, 95% CI [.36, .83]). This supports the idea that visual and auditory o rely substantially on a single cross-modal ability but are nevertheless distinct. Model comparison further supports their separability: a model with distinct visual and auditory abilities fit better than a model with a single cross-modal ability. Spatial ability was measured using three tests (3D rotation, 2D rotation, paper folding) and had sizable relationships with both visual (r = .74) and auditory (r = .69) o. However, these associations were no longer significant once Gf was controlled for. The partial independence of o from Gf and spatial ability suggests that it could offer incremental validity when used to predict performance in real-world domains requiring visual abilities.
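
To make the model-comparison logic concrete, below is a minimal sketch in Python using the semopy package, with hypothetical indicator names (v1, v2, v3 for visual o tests; a1, a2, a3 for auditory o tests) and a hypothetical data file. It illustrates the general approach of comparing a two-factor model against a single cross-modal factor; it is not the authors' actual analysis code.

    import pandas as pd
    import semopy

    # Hypothetical dataset: one column per observed test score
    data = pd.read_csv("test_scores.csv")

    # Two-factor model: separate visual and auditory o factors,
    # allowed to correlate freely (lavaan-style syntax)
    two_factor = """
    visual_o =~ v1 + v2 + v3
    auditory_o =~ a1 + a2 + a3
    visual_o ~~ auditory_o
    """

    # Single-factor model: one cross-modal o ability
    one_factor = """
    o =~ v1 + v2 + v3 + a1 + a2 + a3
    """

    m2 = semopy.Model(two_factor)
    m2.fit(data)
    m1 = semopy.Model(one_factor)
    m1.fit(data)

    # Fit statistics (chi-square, CFI, RMSEA, AIC, BIC) for comparison
    print(semopy.calc_stats(m2))
    print(semopy.calc_stats(m1))

Under this sketch, better fit indices for the two-factor model (e.g., lower AIC and BIC) would correspond to the separability result reported above.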

Acknowledgements: This work was supported by the David K. Wilson Chair Research Fund and NSF BCS Award 2316474