Sparse components distinguish visual pathways and their alignment to neural networks

Poster Presentation 63.411: Wednesday, May 22, 2024, 8:30 am – 12:30 pm, Pavilion
Session: Object Recognition: Models

Ammar Marvi¹, Nancy Kanwisher¹, Meenakshi Khosla²; ¹MIT, ²UCSD

What distinguishes the representations and computations of the ventral, dorsal, and lateral visual streams, and why do current computational models often fail to reflect these differences? Prevailing hypotheses suggest specialized functions for each stream: the ventral stream in object recognition, the dorsal stream in visually guided action, and the lateral stream in motion and social information processing. However, linear encoding models built on deep neural networks (DNNs) optimized for object categorization predict responses across the three visual streams similarly well. Such findings may indicate that model-brain comparison tools, especially those using linear mappings, fail to capture neural tuning. To address this question, we first employed data-driven factorization to identify dominant sparse components within each stream. This method revealed face-, place-, body-, text-, and food-selective components in the ventral stream; social interaction, implied motion, and hand-selective components in the lateral stream; and some less interpretable components in the dorsal stream. To systematically assess these stream differences and their relation to models, we propose a new technique, Sparse Components Alignment (SCA), which measures model-brain alignment while remaining sensitive to neural tuning. Using the same methodological framework as representational similarity analysis (RSA), we assessed stimulus-level representational dissimilarities. However, instead of relying on population geometry, SCA computes pairwise distances between stimuli based on the likelihood that they are processed by the same sparse component. We report three findings: (1) sparse representations differ strikingly across streams, (2) DNNs optimized for object categorization are more similar to the ventral visual stream than to the other streams in these sparse representations, and (3) these differences are markedly clearer with SCA than with linear encoding or RSA methods. Thus, SCA reveals a notably stronger fit between DNNs and the ventral visual pathway than between DNNs and other pathways, underscoring the importance of characterizing neural tuning, above and beyond representational geometry, in assessing model-brain alignment.
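
The distance computation described above can be illustrated with a minimal sketch. The abstract does not specify the exact formulation, so everything here is an assumption for illustration: a non-negative stimulus-by-component weight matrix is taken as already available from some sparse factorization, each row is normalized into soft component memberships, and the "likelihood of sharing a component" is taken to be the dot product of two stimuli's memberships. The function names (co_assignment_dissimilarity, simple_ranks, sca_alignment) are hypothetical, and the rank correlation is a simplified stand-in for Spearman correlation that does not average tied ranks.

```python
import numpy as np

def co_assignment_dissimilarity(weights):
    """Pairwise stimulus dissimilarity from sparse component weights.

    weights: (n_stimuli, n_components) non-negative loadings (assumed to
    come from some sparse factorization; the method is not specified here).
    """
    w = np.asarray(weights, dtype=float)
    if np.any(w < 0):
        raise ValueError("weights must be non-negative")
    totals = w.sum(axis=1, keepdims=True)
    # Normalize each row into soft memberships; all-zero rows stay zero.
    p = np.divide(w, totals, out=np.zeros_like(w), where=totals > 0)
    # Assumed likelihood that stimuli i and j fall in the same component.
    same_component = p @ p.T
    d = 1.0 - same_component
    np.fill_diagonal(d, 0.0)  # a stimulus is at distance 0 from itself
    return d

def simple_ranks(x):
    """Ordinal ranks; ties are broken by position, not averaged."""
    order = np.argsort(x, kind="stable")
    ranks = np.empty(len(x), dtype=float)
    ranks[order] = np.arange(len(x))
    return ranks

def sca_alignment(brain_weights, model_weights):
    """RSA-style comparison: rank-correlate the two dissimilarity matrices."""
    d_brain = co_assignment_dissimilarity(brain_weights)
    d_model = co_assignment_dissimilarity(model_weights)
    upper = np.triu_indices_from(d_brain, k=1)  # off-diagonal upper triangle
    r = np.corrcoef(simple_ranks(d_brain[upper]), simple_ranks(d_model[upper]))
    return r[0, 1]

# Toy example: four stimuli, two components, hard assignments.
brain = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
model = np.array([[2.0, 0.0], [3.0, 0.0], [0.0, 1.0], [0.0, 5.0]])
score = sca_alignment(brain, model)
```

In this toy case the model assigns each stimulus to the same component as the brain data, so the two dissimilarity matrices are identical after normalization, even though the raw weights differ in scale. Because only component membership matters in this sketch, it is insensitive to rotations of the feature space that would leave an RSA-style population geometry unchanged but alter which stimuli share a component.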

Acknowledgements: This work was funded by NIH grant R01-EY033843.