Computational Advantages of Pathway Specialization for Object Recognition and Visually-guided Grasping

Poster Presentation 23.434: Saturday, May 16, 2026, 8:30 am – 12:30 pm, Pavilion
Session: Functional Organization of Visual Pathways: Cortical visual processing 2

Aida Mirebrahimi¹, David C. Plaut¹, Marlene Behrmann¹,²; ¹Carnegie Mellon University, ²University of Pittsburgh

Decades of evidence from neuroscience and psychology support a division of the visual cortex into dorsal occipito-parietal and ventral occipito-temporal pathways. Ventral regions are commonly linked to object recognition, whereas dorsal regions are closely associated with spatial processing and visually-guided action, suggesting a functional dissociation between these pathways. Yet the computational motivations for such a dual-stream architecture remain unclear. We introduce a computational framework to test whether an early-shared, late-modular dual-pathway architecture provides a measurable advantage over a single unified pathway for jointly supporting object recognition and visually-guided grasping, behaviors traditionally linked to the ventral and dorsal streams, respectively. To do so, we trained capacity-matched single- and dual-pathway convolutional neural networks on a new dataset of ~450 naturalistic 3D objects spanning 15 subordinate categories across three superordinate categories. Objects were rendered from 250 viewpoints with transformations in position, scale, and rotation, yielding ~150,000 images. Each image is paired with a category label and geometry-based grasp endpoints for joint training on recognition and grasping. We found that late-stage pathway separation accelerated grasp learning without compromising categorization ability. Across three increasingly challenging out-of-distribution (OOD) tests, the dual-path multitask network (DP_MT) reached grasp thresholds in significantly fewer epochs than a capacity-matched single-path multitask model (SP_MT), while matching a categorization-only baseline in speed of learning and final accuracy. At convergence, post-threshold gain analyses showed that both multitask models achieved larger grasp improvements than a grasp-only baseline under milder OOD shifts (novel viewpoints and novel instances), but under the strongest OOD shift (novel categories) only DP_MT maintained this advantage.
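The epochs-to-threshold comparison described above can be illustrated with a minimal sketch. The abstract does not give the threshold value, the error metric, or any learning curves, so the function name `epochs_to_threshold`, the threshold of 0.3, and the per-epoch error values below are all hypothetical and serve only to show how such a measure might be computed.

```python
from typing import Optional, Sequence


def epochs_to_threshold(errors: Sequence[float], threshold: float) -> Optional[int]:
    """Return the 1-based epoch at which the error first reaches the threshold.

    Returns None if the error never drops to or below the threshold.
    """
    for epoch, err in enumerate(errors, start=1):
        if err <= threshold:
            return epoch
    return None


# Hypothetical per-epoch grasp errors (illustrative numbers, NOT results from the study).
dual_path_errors = [0.9, 0.6, 0.35, 0.2, 0.15]
single_path_errors = [0.9, 0.7, 0.5, 0.4, 0.28, 0.19]
THRESHOLD = 0.3  # assumed value; the abstract does not state the thresholds used

dp_epochs = epochs_to_threshold(dual_path_errors, THRESHOLD)
sp_epochs = epochs_to_threshold(single_path_errors, THRESHOLD)
print(f"dual-path: {dp_epochs} epochs, single-path: {sp_epochs} epochs")
```

A model that never reaches the threshold yields `None`, so it stays distinguishable from one that reaches it late.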
Task-gradient analyses provide a mechanistic account: DP_MT’s semi-modular organization mitigates late-layer cross-task interference while preserving early shared representations benefiting both tasks. These results quantify the computational benefits of late-stage segregation, offering a rationale for two-stream organization in primate vision and concrete design principles for artificial systems that must jointly support recognition and action.
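One common way to quantify cross-task interference is the cosine similarity between the gradients that two task losses induce on the same shared parameters: a negative value means the tasks pull those parameters in opposing directions. The abstract does not specify how its task-gradient analyses were computed, so the sketch below is an assumption, and the function names and gradient vectors are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(grad_a: Sequence[float], grad_b: Sequence[float]) -> float:
    """Cosine of the angle between two flattened gradient vectors.

    Returns 0.0 if either gradient is all zeros (no defined direction).
    """
    dot = sum(a * b for a, b in zip(grad_a, grad_b))
    norm_a = math.sqrt(sum(a * a for a in grad_a))
    norm_b = math.sqrt(sum(b * b for b in grad_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_conflicting(grad_a: Sequence[float], grad_b: Sequence[float]) -> bool:
    """Treat two task gradients as interfering when they point in opposing directions."""
    return cosine_similarity(grad_a, grad_b) < 0.0


# Hypothetical gradients of the recognition and grasp losses with respect to
# the same shared-layer weights (illustrative values only, not from the study).
grad_recognition = [1.0, 2.0, -0.5]
grad_grasp_aligned = [2.0, 4.0, -1.0]    # same direction: the tasks cooperate
grad_grasp_opposed = [-1.0, -2.0, 0.5]   # opposite direction: the tasks interfere

print(is_conflicting(grad_recognition, grad_grasp_aligned))
print(is_conflicting(grad_recognition, grad_grasp_opposed))
```

In a setup like the one described, such a measure would be computed per layer. Separate late-stage pathways remove shared late-layer parameters, so late-layer conflict of this kind cannot arise between the two tasks there.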