Large-scale visual neural datasets: where do we go from here?

Symposium: Friday, May 17, 2024, 12:00 – 2:00 pm, Talk Room 2

Organizers: Alessandro Gifford1, Kendrick Kay2; 1Freie Universität Berlin, 2University of Minnesota
Presenters: Eline R. Kupers, Won Mok Shim, Ian Charest, Tomas Knapen, Jacob Prince, Alessandro T. Gifford

Vision science has witnessed an increase in worldwide initiatives collecting and publicly releasing large-scale visual neural datasets (LSVNDs). These initiatives have allowed thousands of vision scientists to readily harness LSVNDs, enabling new investigations and resulting in novel discoveries. This suggests vision science is entering a new era of inquiry characterized by big open data. The rapid growth in the collection and use of LSVNDs raises urgent questions, the answers to which will steer the direction of the field. How can researchers across the vision sciences benefit from these datasets? What are the opportunities and pitfalls of LSVNDs for theory formation? Which kinds of LSVNDs are missing, and what characteristics should future LSVNDs have to maximize their impact and utility? How can LSVNDs support a virtuous cycle between neuroscience and artificial intelligence?

This symposium invites the VSS community to engage with these questions in an interactive and guided community process. We will start with a short introduction (5 minutes), followed by six brief, thought-provoking talks (each 9 minutes plus 3 minutes for Q&A). Enriched by these perspectives, the symposium will then move to a highly interactive 30-minute discussion in which we will engage the audience to discuss the most salient open questions on LSVNDs, generate and share insights, and foster new collaborations.

Speakers from diverse career stages will cover a broad range of perspectives on LSVNDs, including dataset creators (Kupers, Shim), dataset users (Prince), and researchers playing both roles (Gifford, Charest, Knapen). Eline Kupers will share behind-the-scenes knowledge of a particular LSVND that has gained substantial traction in the field, the Natural Scenes Dataset (NSD), and will introduce ongoing efforts toward a new large-scale multi-task fMRI dataset called the Visual Cognition Dataset. Won Mok Shim will introduce the Naturalistic Perception, Action, and Cognition (NatPAC) 7T fMRI dataset, and discuss how this dataset allows investigation of the impact of goal-directed actions on visual representations in naturalistic settings. Ian Charest will present recent results on semantic representations enabled by NSD, as well as ongoing large-scale data collection efforts inspired by NSD. Tomas Knapen will demonstrate how combining LSVNDs with other datasets incites exploration and discovery, and will present ongoing large-scale data collection efforts in his group. Jacob Prince will provide a first-hand perspective on how researchers external to the data collection process can apply LSVNDs to diverse research aims across cognitive neuroscience, neuroAI, and neuroimaging methods development. Finally, Alessandro Gifford will highlight broad opportunities that LSVNDs offer to the vision sciences community, and present a vision for the future of large-scale datasets.

This symposium will appeal to any VSS member interested in neural data, as it will expose the opportunities and limitations of LSVNDs and how they relate to smaller, more narrowly focused datasets. Our goal is to align the VSS community with respect to open questions regarding LSVNDs, and to help incentivize and coordinate new large-scale data collection efforts. We believe this symposium will strengthen the impact of LSVNDs on the field of vision science, and foster a new generation of big-data vision scientists.

Talk 1

The Natural Scenes Dataset: Lessons Learned and What's Next?

Eline R. Kupers1,2, Celia Durkin2, Clayton E. Curtis3, Harvey Huang4, Dora Hermes4, Thomas Naselaris2, Kendrick Kay2; 1Stanford University, 2University of Minnesota, 3New York University, 4Mayo Clinic

The release and reuse of rich neuroimaging datasets have rapidly grown in popularity, enabling researchers to ask new questions about visual processing and to benchmark computational models. One widely used dataset is the Natural Scenes Dataset (NSD), a 7T fMRI dataset in which 8 subjects viewed more than 70,000 images over the course of a year. Since its release in September 2021, NSD has gained 1,700+ users and resulted in 55+ papers and preprints. Here, we share behind-the-scenes considerations and inside knowledge from the NSD acquisition effort that helped ensure its quality and impact, including lessons learned regarding funding, designing, collecting, and releasing a large-scale fMRI dataset. Complementing the creators' perspective, we also highlight the users' viewpoint by revealing results from a large anonymous survey distributed among NSD users. These results will provide valuable (and often unspoken) insights into both positive and negative experiences of interacting with NSD and other publicly available datasets. Finally, we discuss ongoing efforts toward two new large-scale datasets: (i) NSD-iEEG, an intracranial electroencephalography dataset with extensive electrode coverage in cortex and subcortex, using a paradigm similar to NSD's, and (ii) the Visual Cognition Dataset, a 7T fMRI dataset that samples a large diversity of tasks on a common set of visual stimuli (in contrast to NSD, which samples a large diversity of stimuli during a single task). By sharing these lessons and ideas, we hope to facilitate new data collection efforts and enhance the ability of these datasets to support new discoveries in vision and cognition.

Talk 2

Exploring naturalistic vision in action with the 7T Naturalistic Perception, Action, and Cognition (NatPAC) Dataset

Won Mok Shim1,2, Royoung Kim1,2, Jiwoong Park1,2; 1Institute of Basic Science, Republic of Korea, 2Sungkyunkwan University

Large-scale human neuroimaging datasets have provided invaluable opportunities to examine brain and cognitive functions. Our recent endeavor, the 7T NatPAC project, is designed to provide high-resolution human structural and functional MRI datasets using moderately dense sampling (12–16 2-hr sessions per subject) across a broad range of tasks. While previous large-scale datasets have featured sparse sampling of cognitive functions, our goal is to encompass a more extensive spectrum of cognitive and affective processes through diverse tasks, spanning both structured and naturalistic paradigms. Notably, we incorporated naturalistic tasks probing a variety of higher-order cognitive functions, including movie watching, free speech, and interactive 3D video game play within a Minecraft environment. Through a collection of innovative Minecraft-based games simulating real-world behaviors, we aim to investigate the neural mechanisms of perception, action, and cognition as an integrative process that unfolds in naturalistic contexts. In this talk, I will focus on a shepherding game in which participants engage in strategic planning with hierarchical subgoals and adaptively update their strategies while navigating a virtual world. In combination with high-precision eye-tracking data corrected for head motion, we explore how visual responses and population receptive field (pRF) estimates are modulated in visual cortex and frontoparietal regions during free viewing and complex goal-directed behaviors, compared to passive viewing of game replays and conventional pRF experiments. I will discuss the broader implications of the impact of goal-directed actions on visual representations and how large-scale datasets enable us to examine such effects in naturalistic settings.
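For readers less familiar with pRF mapping, the sketch below illustrates the standard forward model the abstract refers to: a 2D Gaussian receptive field whose overlap with the stimulus aperture, convolved with a hemodynamic response function (HRF), predicts a voxel's timecourse. Grid sizes, parameter values, and the toy bar stimulus are illustrative assumptions, not the NatPAC pipeline.

```python
# Minimal sketch of a pRF forward model (2D Gaussian receptive field).
# All names and values are illustrative, not the NatPAC analysis code.
import numpy as np

def gaussian_prf(x0, y0, sigma, grid):
    """Isotropic 2D Gaussian receptive field on a visual-field grid."""
    gx, gy = grid
    return np.exp(-((gx - x0) ** 2 + (gy - y0) ** 2) / (2 * sigma ** 2))

def predicted_timecourse(aperture, prf, hrf):
    """Stimulus-pRF overlap at each timepoint, convolved with an HRF.

    aperture: (n_timepoints, n_y, n_x) binary stimulus masks
    prf:      (n_y, n_x) receptive-field weights
    hrf:      (n_taps,) hemodynamic response function
    """
    drive = aperture.reshape(len(aperture), -1) @ prf.ravel()
    return np.convolve(drive, hrf)[: len(drive)]

# Toy usage: a vertical bar sweeping across a 64x64 aperture grid
grid = np.meshgrid(np.linspace(-10, 10, 64), np.linspace(-10, 10, 64))
aperture = np.zeros((100, 64, 64))
for t in range(100):
    aperture[t, :, max(0, t % 64 - 2) : t % 64 + 2] = 1.0  # moving bar
hrf = np.array([0.0, 0.1, 0.5, 1.0, 0.8, 0.4, 0.1, -0.1, -0.05])
pred = predicted_timecourse(aperture, gaussian_prf(2.0, -1.0, 1.5, grid), hrf)
```

Fitting then amounts to searching over (x0, y0, sigma) for the parameters whose predicted timecourse best matches each voxel's measured response.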

Talk 3

Exploiting large-scale neuroimaging datasets to reveal novel insights in vision science

Ian Charest1,2, Peter Brotherwood1, Catherine Landry1, Jasper van den Bosch1, Shahab Bakhtiari1,2, Tim Kietzmann3, Frédéric Gosselin1, Adrien Doerig3; 1Université de Montréal, 2Mila - Québec AI Institute, 3University of Osnabrück

Building quantitative models of neural activity in the visual system is a long-standing goal in neuroscience. This research program has long been limited by the small scale and low signal-to-noise ratio of most existing datasets; with the advent of large-scale datasets, it has become possible to build, test, and discriminate among increasingly expressive competing models of neural representation. In this talk I will describe how the scale of the 7T fMRI Natural Scenes Dataset (NSD) has made possible novel insights into the mechanisms underlying scene perception. We harnessed recent advances in linguistic artificial intelligence to construct models that capture progressively richer semantic information, ranging from object categories to word embeddings to scene captions. Our findings reveal a positive correlation between a model's capacity to capture semantic information and its ability to predict NSD data, a finding we then replicated with recurrent convolutional networks trained to predict sentence embeddings from visual inputs. Collectively, this evidence suggests that the visual system, as a whole, is better characterized by an aim to extract rich semantic information than by a mere cataloging of object inventories from visual inputs. Considering the substantial power of NSD, collecting additional neuroimaging and behavioral data using the same image set becomes highly appealing. We are expanding NSD through the development of two innovative datasets: an electroencephalography dataset called NSD-EEG, and a mental imagery vividness ratings dataset called NSD-Vividness. Datasets like NSD not only provide fresh insights into the visual system but also inspire the development of new datasets in the field.
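To make the modeling approach concrete, here is a minimal sketch of a semantic encoding model of the kind described: ridge regression from caption embeddings to voxel responses, scored by voxelwise correlation on held-out stimuli. The data here are synthetic stand-ins, and the shapes and embedding source are assumptions for illustration, not the authors' pipeline.

```python
# Hedged sketch of a semantic encoding model: caption embeddings -> voxels.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 512))   # stand-in caption embeddings (e.g., a sentence encoder)
y = X @ rng.normal(size=(512, 200)) + rng.normal(size=(1000, 200))  # synthetic voxel betas

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = Ridge(alpha=100.0).fit(X_tr, y_tr)   # one linear map per voxel
pred = model.predict(X_te)

# Voxelwise prediction accuracy: Pearson r between predicted and held-out data
r = [np.corrcoef(pred[:, v], y_te[:, v])[0, 1] for v in range(y_te.shape[1])]
print(f"median voxelwise r = {np.median(r):.2f}")
```

Comparing such models built from object categories, word embeddings, and full captions (richer feature spaces in place of X) is what supports the reported link between semantic richness and predictivity.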

Talk 4

Farewell to the explore-exploit trade-off in large-scale datasets

Tomas Knapen1,2, Nick Hedger3, Thomas Naselaris4, Shufan Zhang1,2, Martin Hebart5,6; 1Vrije Universiteit Amsterdam, 2Royal Netherlands Academy of Arts and Sciences, 3University of Reading, 4University of Minnesota, 5Justus Liebig University Giessen, 6Max Planck Institute for Human Cognitive and Brain Sciences

LSVNDs are a powerful tool for discovery science. Because they are well suited to exploration, they synergize with more "exploitative" datasets focused on small-scale hypothesis testing that can confirm exploratory findings. A similar synergy can be attained when combining findings across datasets, where one LSVND can be used to confirm and extend discoveries from another. I will showcase how we have recently leveraged several large-scale datasets in unison to discover principles of topographic visual processing throughout the brain. These examples demonstrate how LSVNDs can be used to great effect, especially in combination. In our most recent example, we combined the HCP 7T fMRI dataset (a "wide" dataset with 180 participants and 2.5 hrs of whole-brain fMRI each) with NSD (a "deep" dataset with 8 participants and 40 hrs of whole-brain fMRI each) to investigate visual body-part selectivity. We discovered homuncular maps in high-level visual cortex through connectivity with primary somatosensory cortex in HCP, and validated the body-part tuning of these maps using NSD. This integration of wide and deep LSVNDs allows inference about computational mechanisms at both the individual and population levels. For this reason, we believe the field needs a variety of LSVNDs. I will briefly present ongoing work from my lab collecting new "deep" LSVNDs: a dataset of brief (2.5-s) video watching and a retinotopic mapping dataset, each with up to 10 sessions of 7T fMRI in 8 subjects.
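As a concrete illustration of the wide-dataset step, the sketch below shows seed-based connectivity mapping of the kind described: each visual-cortex voxel is correlated with the mean timecourses of somatosensory (S1) body-part seeds, then labeled with its best-connected body part (winner-take-all). Array names, shapes, and the winner-take-all rule are illustrative assumptions, not the published analysis code.

```python
# Illustrative sketch: connectivity between S1 body-part seeds and visual voxels.
import numpy as np

def connectivity_map(visual_ts, seed_ts):
    """Pearson correlation of every visual voxel with every S1 seed.

    visual_ts: (n_timepoints, n_voxels) fMRI timecourses in visual cortex
    seed_ts:   (n_timepoints, n_seeds) mean timecourses of S1 body-part ROIs
    returns:   (n_voxels, n_seeds) correlation matrix
    """
    v = (visual_ts - visual_ts.mean(0)) / visual_ts.std(0)
    s = (seed_ts - seed_ts.mean(0)) / seed_ts.std(0)
    return (v.T @ s) / len(v)

# Winner-take-all: label each visual voxel with its best-connected body part
rng = np.random.default_rng(1)
visual_ts, seed_ts = rng.normal(size=(500, 4000)), rng.normal(size=(500, 5))
labels = connectivity_map(visual_ts, seed_ts).argmax(axis=1)  # e.g., 0=face .. 4=foot
```

The deep-dataset step then tests whether voxels labeled this way in HCP actually respond preferentially to the corresponding body parts in NSD images.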

Talk 5

Large datasets: a Swiss Army knife for diverse research aims in neuroAI

Jacob Prince1, Colin Conwell2, Talia Konkle1; 1Harvard University, 2Johns Hopkins University

This talk provides a first-hand perspective on how users external to the data collection process can harness LSVNDs as foundation datasets for their research aims. We first highlight recent evidence that these datasets help address, and move beyond, longstanding debates in cognitive neuroscience, such as the nature of category-selective regions and of the visual category code more broadly. We will show evidence that datasets like NSD have provided powerful new insight into how items from well-studied domains (faces, scenes) are represented in the context of broader representational spaces for objects. Second, we will highlight the potential of LSVNDs to answer urgent, emergent questions in neuroAI: for example, which inductive biases are critical for obtaining a good neural network model of the human visual system? We will describe a series of controlled experiments leveraging hundreds of open-source DNNs, systematically varying inductive biases to reveal the factors that most directly impact brain predictivity at scale. Finally, for users interested in neuroimaging methods development, we will highlight how these datasets have catalyzed rapid progress in methods for fMRI signal estimation and denoising, as well as in basic analysis routines like PCA and the computation of noise ceilings. We will conclude by reflecting on both the joys and the pain points of working with LSVNDs, to help inform the next generation of these datasets.
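Since noise ceilings come up here, the snippet below sketches one common way to estimate them from repeated stimulus presentations: split-half reliability across repeats with a Spearman-Brown correction to the full number of trials. This is a generic formulation for illustration, not necessarily the exact procedure used for NSD or in the talk.

```python
# Hedged sketch of a split-half noise-ceiling estimate for one voxel.
import numpy as np

def noise_ceiling_split_half(betas):
    """betas: (n_repeats, n_stimuli) single-trial responses of one voxel."""
    n_rep = len(betas)
    half1 = betas[: n_rep // 2].mean(0)
    half2 = betas[n_rep // 2 :].mean(0)
    r_half = np.corrcoef(half1, half2)[0, 1]  # reliability of half the data
    # Spearman-Brown: predicted reliability when averaging all n_rep repeats
    return (2 * r_half) / (1 + r_half)

rng = np.random.default_rng(2)
signal = rng.normal(size=1000)                           # stimulus-driven component
betas = signal + rng.normal(scale=1.5, size=(4, 1000))   # 4 noisy repeats
print(f"estimated noise ceiling (r): {noise_ceiling_split_half(betas):.2f}")
```

The resulting ceiling bounds how well any model can be expected to predict the trial-averaged data, which is what makes it central to model benchmarking on LSVNDs.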

Talk 6

What opportunities do large-scale visual neural datasets offer to the vision sciences community?

Alessandro T. Gifford1, Benjamin Lahner2, Pablo Oyarzo1, Aude Oliva2, Gemma Roig3, Radoslaw M. Cichy1; 1Freie Universität Berlin, 2MIT, 3Goethe Universität Frankfurt

In this talk I will provide three complementary examples of the opportunities that LSVNDs offer to the vision sciences community. First, LSVNDs of naturalistic (and thus more ecologically valid) visual stimulation allow the investigation of novel mechanisms of high-level visual cognition. We are extensively recording human fMRI and EEG responses to short naturalistic movie clips; modeling results reveal that semantic information, such as action understanding or movie captions, is embedded in neural representations. Second, LSVNDs contribute to the emerging field of neuroAI, advancing vision science through a symbiotic relationship between visual neuroscience and computer vision. We recently collected a large and rich EEG dataset of neural responses to naturalistic images. We are using it, on the one hand, to train deep-learning-based end-to-end encoding models directly on brain data, thus aligning visual representations in models and brains, and, on the other hand, to increase the robustness of computer vision models by exploiting inductive biases from neural visual representations. Third, LSVNDs make possible critical initiatives such as challenges and benchmarks. In 2019 we founded the Algonauts Project, a platform where scientists from different disciplines can cooperate and compete in creating the best predictive models of the visual brain, advancing the state of the art in brain modeling and promoting cross-disciplinary interaction. I will end with some forward-looking thoughts on how LSVNDs might transform the vision sciences.
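As an illustration of what "end-to-end encoding model" means here, the sketch below maps images directly to EEG responses (channels x timepoints) with a small convolutional network trained under a mean-squared-error objective. The architecture, shapes, and training details are assumptions for illustration, not the authors' published model.

```python
# Minimal sketch of an end-to-end encoding model: images -> EEG responses.
import torch
import torch.nn as nn

class EndToEndEncoder(nn.Module):
    def __init__(self, n_channels=64, n_timepoints=100):
        super().__init__()
        self.backbone = nn.Sequential(                 # image -> feature vector
            nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Linear(64, n_channels * n_timepoints)
        self.shape = (n_channels, n_timepoints)

    def forward(self, images):                         # (batch, 3, H, W)
        out = self.head(self.backbone(images))
        return out.view(-1, *self.shape)               # (batch, channels, time)

model = EndToEndEncoder()
images = torch.randn(8, 3, 128, 128)                   # toy batch of stimuli
eeg = torch.randn(8, 64, 100)                          # matching EEG responses
loss = nn.MSELoss()(model(images), eeg)                # one training-step objective
loss.backward()
```

Because the network is optimized directly against measured brain responses rather than an image-classification objective, its internal representations are pulled toward the brain's, which is the alignment the abstract refers to.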