What makes a task naturalistic? Towards a theory-inspired approach to studying visually-guided behaviour

Symposium: Friday, May 15, 2026, 10:30 am – 12:30 pm, Talk Room 1

Organizers: Jolande Fooken1, Constantin A. Rothkopf1; 1Centre for Cognitive Science, Technical University Darmstadt, Germany
Presenters: Kendrick Kay, Jody Culham, Lukas T. Oesch, Caroline Robertson, Kathryn Bonnen, Constantin A. Rothkopf

As vision scientists, we aim to understand how behaving systems sense, process, and act upon visual information in service of their goals. Imagine making yourself a peanut butter and jelly sandwich. This seemingly simple action requires the integration of perceptual, sensorimotor, and cognitive processes. Traditional laboratory paradigms have advanced the field by isolating separate process-specific components and studying their impact on perception. Yet, a major challenge today is determining how these insights generalize to more naturalistic settings, where multiple processes operate simultaneously and where factors unknown to the experimenter can shape the visual computations that unfold moment to moment. An important step toward bridging the gap between the lab and the wild is to systematically measure natural behaviour in everyday tasks to understand the functional demands placed on the visual system. Moreover, empirical studies should be complemented by computational approaches that formalize the structure of natural visual tasks and identify the underlying neural processes that support them. The goal of our symposium is to encourage vision scientists to consider what it means for a task to be naturalistic. Specifically, we will pose three questions: Does the way a task is designed influence neural and behavioural responses? Can task goals be formalized by carefully studying diverse behaviour? Can we find theory-driven approaches to studying natural behaviour? The symposium will include six talks from speakers at different career stages. Speakers will present various experimental techniques, including neural imaging in humans, fluorescence microscopy in mice, virtual reality, motion capture, eye tracking, and computational modelling. To start things off, Jolande Fooken will give a brief introduction (3 minutes) on how we came to ask the question: “What makes a task naturalistic?”. The introduction will be followed by six symposium talks (15 minutes each, plus 2 minutes for questions).
First, Kendrick Kay will highlight the importance of considering visual task design when investigating how the brain transforms visual inputs into representations. When studying neural responses in natural viewing, we not only need to consider task demands but also how natural actions, such as reaching and grasping, can be studied in the constrained space of the brain scanner. Second, Jody Culham will show how 3D displays and active video games can improve naturalism in human fMRI. Third, leaving the scanner, Lukas Oesch will demonstrate the neural control of decision monitoring in freely behaving mice. From perceptual decisions in mice, we will switch to cognitive processing in humans. Fourth, Caroline Robertson will show that immersive virtual reality with in-headset eye-tracking offers a methodological bridge between traditional vision science paradigms and natural behavior. We will then leave the virtual world and dive into the real one. Fifth, Kathryn Bonnen will present how humans adapt visuomotor priorities and control strategies when walking in complex environments. Lastly, to tie things together, Constantin Rothkopf will describe how we can formalize the regularities observed in natural behaviour through theory-driven computational models. Each individual talk will conclude with a take-home message about key features of a naturalistic task. These take-home messages will set up the concluding panel discussion (15 minutes).

Talk 1

A broad sampling of how different tasks influence visually evoked responses in the human brain

Kendrick Kay1, Clay Curtis2,3, Adrian Wong1, Amy Poole1, Eline Kupers4; 1Center for Magnetic Resonance Research (CMRR), Department of Radiology, University of Minnesota, Minneapolis, Minnesota, USA, 2Department of Psychology, New York University, New York, NY, USA, 3Center for Neural Science, New York University, New York, NY, USA, 4Department of Psychology, York University, Toronto, Ontario, Canada

Visual neuroscientists seek to understand how the brain transforms visual inputs into representations that support behavior. In many visual neuroscience experiments, neural responses to visual stimuli are measured while the task performed by the observer is neglected or assumed to be of secondary importance to the scientific questions being tackled. However, since task demands influence visually evoked responses, a complete characterization of visual processing must refer to both the stimulus and the task. Here, we describe ongoing efforts to collect a large-scale 7T fMRI dataset, termed the Visual Cognition Dataset (VCD), in which extensive whole-brain fMRI responses (> 40 hours of data per participant) are measured from a small set of observers (n = 6) while they engage in a variety of visual tasks on a common set of stimuli. We sample both classic laboratory tasks (e.g., fixation, attentional, decision-making, and working memory tasks on controlled artificial stimuli such as Gabors and random-dot kinematograms) as well as more naturalistic tasks (e.g., making 'what', 'where', and 'how' judgments on complex natural scenes). Importantly, we impose shared constraints on the tasks with respect to timing and trial structure to facilitate interpretation and direct comparison of tasks. We anticipate that VCD will serve as a rich resource for gaining insights into task-dependent neural responses and, more generally, how the brain executes visual tasks.

Talk 2

Bringing the real world into the brain scanner using 3D displays and video games

Jody Culham1; 1University of Western Ontario, London, ON, Canada

Vision science often uses simple stimuli like images along with simple stimulus-response tasks, particularly in studies using functional magnetic resonance imaging (fMRI), where more complex stimuli and behaviors are limited by methodological constraints. In this talk, I will review two new approaches in brain imaging for investigating more realistic stimuli and tasks and show how they can provide novel findings. First, I will discuss how the contributions of real-world geometry (physical/virtual size and distance) can be explored using tangible objects and binocular simulations with a 3D projector. Results suggest that real-world distance is represented throughout the visual system, especially for physical (vs. simulated) displays. Second, I will discuss how video games can be used to study brain activation in “closed-loop” scenarios where continuous visual feedback is used to guide actions. Active video game play engages different brain regions and interactions compared to a control condition with matched visual input and motor output. Together, these new approaches show that a move toward naturalistic neuroimaging can reveal aspects of visual processing that may have been neglected with conventional approaches.

Talk 3

Anterior cingulate neurons combine outcome monitoring of past decisions with ongoing movement signals

Lukas T. Oesch1, Makenna C. Thomas1, Davis Sandberg1, João Couto1, Anne K. Churchland1; 1Department of Neurobiology, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA

In dynamic environments, animals must closely monitor the effects of their actions to inform switches in behavioral strategy. Anterior cingulate cortex (ACC) neurons track decision outcomes in these environments. Yet, it remains unclear whether ACC neurons similarly monitor behavioral history in static environments and, if so, whether these signals are distinct from movement representations. We recorded large-scale ACC activity in freely moving mice making visual evidence-accumulation decisions. Many ACC neurons exhibited nonlinear mixed selectivity for previous choices and outcomes (trial history) and were modulated by movements. Trial history could be stably decoded from population activity and accounted for a component of neural activity separable from that explained by posture and movements. Trial history encoding was conserved across subjects and was unaffected by fluctuating behavioral biases. These findings demonstrate that trial history monitoring in ACC is implemented in a conserved population code that is independent of the volatility of subjects’ task environment.
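To make the decoding idea concrete, the sketch below shows one common way a binary trial-history variable (e.g., the previous choice) can be read out from population activity with a cross-validated linear decoder. This is an illustrative toy on synthetic data, not the authors' analysis pipeline; the variable names and the ridge-regularized least-squares decoder are assumptions for the example.

```python
# Illustrative sketch (not the authors' analysis): decoding a binary
# trial-history variable from simulated population activity with a
# cross-validated linear decoder.
import numpy as np

rng = np.random.default_rng(0)

n_trials, n_neurons = 400, 50
history = rng.integers(0, 2, n_trials)            # previous-choice label per trial
coding_axis = rng.normal(size=n_neurons)          # hypothetical history-coding direction
signal = np.outer(history - 0.5, coding_axis)     # trial-history signal in each neuron
activity = signal + rng.normal(size=(n_trials, n_neurons))  # add trial-by-trial noise

def cv_decode(X, y, n_folds=5):
    """Cross-validated accuracy of a ridge-regularized linear decoder."""
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, n_folds)
    accs = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Least squares onto {-1, +1} labels with a small ridge penalty
        Xtr, ytr = X[train], 2 * y[train] - 1
        w = np.linalg.solve(Xtr.T @ Xtr + 1.0 * np.eye(X.shape[1]), Xtr.T @ ytr)
        pred = (X[test] @ w > 0).astype(int)
        accs.append(np.mean(pred == y[test]))
    return float(np.mean(accs))

print(f"decoding accuracy: {cv_decode(activity, history):.2f}")  # well above 0.5 chance
```

In practice, "separable from posture and movements" would additionally require regressing out movement covariates before decoding; the sketch omits that step for brevity.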

Talk 4

What makes vision naturalistic? Insights from immersive VR and active visual perception

Caroline Robertson1; 1Dartmouth College, NH, USA

Natural behavior unfolds as a continuous stream of self-directed perceptual decisions—what to look at, when, and why. A central challenge for naturalistic vision science is that these decisions are shaped not only by sensory input, but also by the ongoing process of constructing semantic understanding across the visual scene. Capturing this emerging process requires situating vision in active paradigms that engage perceptual-mnemonic interactions. In this talk, I will argue that immersive virtual reality with in-headset eye-tracking offers a way forward—a methodological bridge between traditional vision paradigms and natural behavior. This approach allows participants to explore rich environments with natural eye, head, and body movements while preserving full experimental control over stimulus content and timing. By enabling dense sampling of gaze across large, diverse sets of naturalistic stimuli within standardized conditions, immersive VR provides an exceptionally powerful substrate for computational modeling of natural visual behavior. I will highlight evidence from recent work revealing three aspects of naturalistic visual processing that emerge in active viewing. In immersive environments, memory anticipates upcoming input across head turns, generating predictions that speed perceptual judgments. Active viewing also heightens the influence of semantic meaning on attention, producing richer, more differentiated patterns of meaning-guided sampling than in matched passive conditions. When participants freely navigate the motor and semantic landscapes of real-world scenes, individual differences become more pronounced—exposing stable conceptual priorities that generalize across environments and sessions. Together, these findings illustrate how immersive VR can help us understand how memory, meaning, and individual differences shape natural visual behavior.

Talk 5

Leveraging spatiotemporal structure in natural tasks: Evidence from visually-guided walking

Kathryn Bonnen1; 1School of Optometry, Indiana University

In our everyday lives, perception and action unfold continuously in time, constrained by the dynamics of both the environment and the body. Walking is a critical naturalistic behavior that depends on precisely timed visual sampling to select footholds, navigate obstacles, and plan routes. These tasks rely on distinct spatiotemporal structures of gaze and movement, each reflecting different visuomotor priorities and control strategies, and walkers seamlessly transition between these tasks, adapting their gaze and actions as environmental demands change. This complexity presents a challenge: how do we identify the meaningful units and patterns of behavior that reveal how vision supports movement in the real world? We address this question by measuring gaze, head, and body motion during walking in complex natural terrains. We use information about the task, environmental structure, and motor behavior to characterize the spatiotemporal organization of gaze. This approach leads to the identification of recurring visuomotor motifs such as step-by-step foothold-fixation sequences, forward-looking navigation checks, exploratory gaze forks, and a rhythmic “gaze cycle” in low-demand conditions. Each appears to serve a distinct functional role, and walkers sequence these visuomotor behaviors to meet the evolving demands of their environment as they move through it. Together, these results demonstrate how information about task features, environmental structure, and motor behavior can be used to decompose visually-guided walking into meaningful visuomotor components, revealing how vision and movement are coordinated during walking.

Talk 6

Computational elements of natural vision

Constantin A. Rothkopf1; 1Centre for Cognitive Science, Technical University Darmstadt, Germany

Natural, everyday tasks such as navigation, food preparation, and sports give rise to complex, sequential behavior of the whole body, including the eyes, head, and limbs. During such goal-directed behavior, vision has been described as tightly integrated with cognitive processes, including attention, memory, decision-making, planning, action selection and generation, and learning. Observing behavior in such scarcely controlled tasks appears to yield highly variable behavioral measurements with broad interindividual differences, rendering analysis with statistical methods developed for trial-based experiments challenging. How can such behavior be understood quantitatively? In this talk, I will present several naturalistic tasks together with a computational model of sequential decisions that incorporates sensory uncertainty, internal model uncertainty, and motor variability. These so-called partially observable Markov decision process (POMDP) models can be understood as a generalization of signal detection theory to sequential tasks with additional sources of noise. The value of these models lies in their ability to provide parsimonious explanations of behavioral strategies, their associated errors and variability, and to quantify individual differences and explain their sources. For exemplary tasks including navigation, pouring a cup of coffee, and catching a fly ball, these models provide a subject-by-subject, trial-by-trial, and moment-by-moment explanation of how patterns of goal-directed sensorimotor behavior arise from the continuous and dynamic interactions of uncertainties in perception, cognition, and action, and how these uncertainties are actively shaped during natural behavior.
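For readers unfamiliar with the framework, the standard textbook formulation of a POMDP (general notation, not specific to the models in this talk) makes the connection to signal detection theory explicit:

```latex
% A POMDP is a tuple (S, A, O, T, Z, R, \gamma): hidden states S, actions A,
% observations O, transition model T(s' \mid s, a), observation model
% Z(o \mid s', a), reward R(s, a), and discount factor \gamma.
% Because the state is hidden, the agent maintains a belief b(s), updated
% after taking action a and observing o via Bayes' rule:
b'(s') = \frac{Z(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)}
              {\sum_{s'' \in S} Z(o \mid s'', a) \sum_{s \in S} T(s'' \mid s, a)\, b(s)}
% With a single binary hidden state, no state-changing actions, and one
% noisy observation, this belief update reduces to the posterior of
% signal detection theory -- the sense in which POMDPs generalize it
% to sequential tasks with additional sources of noise.
```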