On the relationship between distributed association networks and category-preferring visual streams in the inferotemporal cortex

Poster Presentation 43.301: Monday, May 18, 2026, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Object recognition: Categories

Joseph Salvo1,3, Nathan Anderson2,3, Rodrigo Braga3; 1NIH, 2Brigham Young University, 3Northwestern University

The ventral visual stream in the inferotemporal cortex (ITC) comprises hierarchically organized visual regions involved in recognition. Within ITC, contrast-based approaches reveal regions responding preferentially to images containing scenes (Epstein et al., 1998), faces (Kanwisher et al., 1997), or text (Cohen et al., 2002). These regions are arranged as streams along the long axis of the ITC’s posterior half. Posterior ITC also includes a broad region of the canonical dorsal attention network (dATN), an association network that is implicated in top-down visual attention (Corbetta, 1998) and is definable using intrinsic (‘resting-state’) correlations. We asked whether the visual streams are confined to the canonical dATN, given the dATN’s role in higher-order visual processing, or whether the streams extend beyond the dATN, implying dATN involvement in earlier stages of visual processing. We collected 3T data from eight healthy adults across eight MRI sessions, allowing for individualized analyses, using a multi-echo sequence (Lynch et al., 2020). Participants performed tasks targeting language, theory of mind, mental imagery, and visual categories (scenes, faces, text). Large-scale networks were identified through functional connectivity using seed-based and data-driven clustering approaches (Braga & Buckner, 2017). The visual categories activated parallel ITC regions, arranged from lateral to medial as text, faces, and scenes. Face and scene streams clearly passed through the dATN, and the text stream did so in most participants, linking the dATN with mid-stage visual processing of multiple stimulus categories. Each stream also overlapped with a different large-scale network in more anterior ITC: text regions aligned with the core language network, scene regions aligned with a mental scene construction network (Default network A; DN-A), and face regions tentatively aligned with the social cognition network (DN-B).
Our results suggest a conserved arrangement of parallel visual streams passing through the dATN to terminate in distinct association networks that are functionally related to the content of each visual stream.