Motion: Neural mechanisms, models and illusions

Talk Session: Tuesday, May 19, 2026, 5:15 – 7:15 pm, Talk Room 2
Moderator: Alex Huk, UCLA

Talk 1, 5:15 pm, 55.21

First-order, second-order and nonrigid motion tuning in marmoset MT/MTC/FST

Krischan Koerfer1, Alexia Nelms2, Markus Lappe1, Jude Mitchell2,3; 1Institute for Psychology and Otto Creutzfeldt Center for Cognitive and Behavioral Neuroscience, University of Münster, 2Brain and Cognitive Sciences, University of Rochester, 3Center for Visual Science, University of Rochester

Neurons in area MT show strong direction tuning for first-order (luminance-defined) motion. By contrast, reports of second-order motion tuning in MT are mixed, and the cortical basis of nonrigid motion processing remains poorly understood, even though behavioural evidence from pursuit and speed perception suggests partly distinct pathways (Koerfer & Lappe, 2022; Koerfer et al., 2024). In marmoset monkeys, the motion-selective areas MT, MTC and FST are fully accessible on the cortical surface. We recorded extracellular single-unit activity with linear electrode arrays across these areas while presenting first-order, second-order and nonrigid motion stimuli during free viewing. Eye position was corrected off-line to establish retinotopy and map receptive fields. Receptive fields were defined from responses to brief (50 ms) first- and second-order motion probes (16 directions of motion, 10 deg/s), yielding 147 units with significant receptive fields. For tuning measurements, we presented first-order, second-order and nonrigid motion for 150 ms in 16 directions at 10 deg/s. Firing rates were quantified with a direction selectivity index (DSI), and tuning significance was assessed with a bootstrap and shuffle-based procedure. Across the population, 47 units showed significant direction tuning at this fixed speed for first-order motion. Most recorded units were also significantly driven by second-order and nonrigid motion, albeit without direction tuning: DSIs for these conditions did not rise reliably above shuffled controls, even in units that were clearly tuned for first-order motion. These results suggest that, under our stimulus conditions, direction-selective tuning in the marmoset MT/MTC/FST complex is dominated by first-order motion, whereas second-order and nonrigid motion primarily modulate firing rate without strong or consistent direction selectivity. This dissociation constrains models of how different motion pathways contribute to the perception of complex and nonrigid motion, and suggests that direction-selective tuning for second-order and nonrigid motion may emerge in areas downstream of, or parallel to, MT/MTC/FST.
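
As a rough illustration of the tuning analysis above, the following minimal Python sketch computes a vector-sum direction selectivity index from trial firing rates across 16 directions and assesses significance with a label-shuffling test. The DSI definition, shuffle count, and variable names here are assumptions for illustration; the abstract does not specify the authors' exact procedure.

    import numpy as np

    def dsi(rates, directions_deg):
        # Vector-sum direction selectivity index: 1 = perfectly tuned, 0 = untuned.
        # (One common definition; the abstract does not state which DSI was used.)
        theta = np.deg2rad(directions_deg)
        vec = np.sum(rates * np.exp(1j * theta))
        return np.abs(vec) / np.sum(rates)

    def shuffle_test(trial_rates, trial_dirs, directions_deg, n_shuffles=1000, seed=None):
        # trial_rates: firing rate per trial; trial_dirs: direction shown on each trial.
        rng = np.random.default_rng(seed)
        def mean_by_dir(dirs):
            return np.array([trial_rates[dirs == d].mean() for d in directions_deg])
        observed = dsi(mean_by_dir(trial_dirs), directions_deg)
        null = np.array([
            dsi(mean_by_dir(rng.permutation(trial_dirs)), directions_deg)
            for _ in range(n_shuffles)
        ])
        p = (np.sum(null >= observed) + 1) / (n_shuffles + 1)
        return observed, p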

Funding: R01-EY030998, EU MSCA SE 101068206

Talk 2, 5:30 pm, 55.22

Representational geometry of optical flow patterns determines the response properties of MST neurons

Nathaniel Powell1, Mary Hayhoe1, Gregory DeAngelis2, Xuexin Wei1; 1University of Texas at Austin, 2University of Rochester

Neural tuning properties are shaped by stimulus statistics. Efficient coding predicts that neural preferences and Fisher information should be allocated according to stimulus statistics. While efficient coding can account for the encoding of certain stimulus variables, its predictions appear inconsistent with the encoding of heading direction in macaque MST (i.e., more neurons prefer lateral headings, yet sensitivity to those headings is lower). We developed computational models of how the visual system hierarchically processes heading direction. A given heading direction causes an optical flow pattern, which is encoded by motion-selective neurons in the model MT layer. Model MST neurons then linearly integrate MT responses. We constructed several model variants and investigated the predicted neuronal selectivity and representational geometry. The models took as input optical flow patterns generated either from real human movement or from uniformly sampled heading directions with analytically derived flow patterns, a distribution inconsistent with natural statistics. To test the model predictions, we re-analyzed responses of macaque MST neurons from published datasets. When the input statistics change, model MST responses adapt, as reflected in changes in Fisher information and in the distribution of neural preferences, consistent with efficient coding. However, when heading direction is uniformly distributed, model MST neurons still exhibit large heterogeneity in tuning properties, similar to the neural data. This observation cannot be explained by efficient coding. Crucially, we found that the tuning heterogeneity is due to the asymmetry of the stimulus manifold defined by the optical flow. Model MST neurons inherit this asymmetry, causing the heterogeneity in their tuning properties. Importantly, macaque MST neurons exhibit similar asymmetrical structures in their representational geometry. We found that the geometry of the input manifold influenced the tuning properties of downstream areas and could account for the puzzling tuning properties observed in MST.
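
For context, here is a minimal Python sketch of the kind of Fisher-information comparison described above, assuming a one-dimensional population of heading-tuned units with von Mises tuning and Poisson noise; the tuning form, parameter values, and preference distribution are illustrative placeholders, not the authors' MT-to-MST integration model.

    import numpy as np

    # Fisher information at heading theta under Poisson noise:
    # FI(theta) = sum_i f_i'(theta)^2 / f_i(theta).
    def fisher_information(theta, prefs, kappa=2.0, peak_rate=30.0):
        # von Mises tuning curves with preferred headings `prefs` (radians)
        f = peak_rate * np.exp(kappa * (np.cos(theta - prefs) - 1.0))
        df = -peak_rate * kappa * np.sin(theta - prefs) * np.exp(kappa * (np.cos(theta - prefs) - 1.0))
        return np.sum(df ** 2 / f)

    headings = np.linspace(-np.pi, np.pi, 181)
    # Hypothetical non-uniform allocation of preferences (more units near lateral headings)
    prefs = np.concatenate([np.random.vonmises(np.pi / 2, 1.0, 100),
                            np.random.vonmises(-np.pi / 2, 1.0, 100)])
    fi = np.array([fisher_information(t, prefs) for t in headings])
    # Under efficient coding, fi should track the stimulus distribution over headings.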

This work is supported by NIH EY05729 (to Hayhoe), NSF 2318065 (to Hayhoe and Wei) and a Sloan Research fellowship (to Wei).

Talk 3, 5:45 pm, 55.23

Consistency between visual motion and locomotion modulates responses in primate area MT

Penny-Shuyi Chen1,2, Declan P Rowley1,3, Alexander C Huk1,4; 1Fuster Laboratory for Cognitive Neuroscience, UCLA, 2Neuroscience Interdisciplinary Program, UCLA, 3Department of Ophthalmology, UCLA, 4Department of Psychiatry & Biobehavioral Sciences, UCLA

Primate motion area MT has been studied extensively, but typically in animals whose movements are highly constrained and that passively view synthetic stimuli or perform artificial tasks with them. It is therefore unknown whether MT exhibits computations that involve interactions between active behaviors and the resulting retinal motions. Here, we employ a virtual reality setup for neural recordings in marmosets to test whether responses in MT are affected by whether retinal motion results from the animal’s own movements. Two head-fixed marmosets ran on a treadmill. They viewed simulated realistic optic flow patterns (dot fields), with a densely structured flat ground plane and a less-regular upper zone. In the “Closed-Loop” condition, the stimulus was updated in real time to be consistent with the subject’s concurrent treadmill locomotion. In a “Replay” condition, the stimulus from the Closed-Loop condition was replayed, but the visual display was no longer yoked to the treadmill. We conducted multi-area Neuropixels recordings (MT, MST, V1 and V2) and high-precision digital DPI eye-tracking, and also performed extensive mapping of the receptive fields of individual neurons. As expected, MT activity was strongly driven by retinal motion (number of MT units = 3243). But, surprisingly, the population average firing rate was approximately 20% higher during Replay than Closed-Loop. This Closed-Loop versus Replay modulation was evident in highly responsive and well-tuned single neurons, and was not explained by initial analyses of running modulation, saccade modulation, or arousal states. Furthermore, MT showed higher activity and larger response variance during open-loop Replay compared to (visual-motion-matched) Closed-Loop when there was a mismatch between visual motion and ongoing locomotion. V1/V2 responses (n=1587) did not show this effect. These results suggest that MT responses to self-generated motion signals are suppressed, consistent with a gain-based mechanism that differentiates self-generated retinal motions from those derived from moving objects and scene elements.
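
A schematic Python sketch of the two stimulus conditions contrasted above, under the assumption that the displayed self-motion is driven by the measured treadmill speed in Closed-Loop and by a stored speed trace in Replay; this illustrates the paradigm's logic and is not the authors' VR implementation.

    # Minimal sketch of the Closed-Loop vs. Replay stimulus logic (illustration only).
    def simulate_session(treadmill_speeds, condition, stored_speeds=None, dt=1/60):
        positions = [0.0]
        speed_log = []
        for i, measured in enumerate(treadmill_speeds):
            if condition == "closed_loop":
                speed = measured                 # display yoked to current locomotion
            elif condition == "replay":
                speed = stored_speeds[i]         # display ignores current locomotion
            positions.append(positions[-1] + speed * dt)
            speed_log.append(speed)
        return positions, speed_log

    # Usage: record the speed trace during Closed-Loop, then replay it on another run.
    # closed_pos, closed_log = simulate_session(speeds_run1, "closed_loop")
    # replay_pos, _ = simulate_session(speeds_run2, "replay", stored_speeds=closed_log)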

Fuster endowment, UCLA

Talk 4, 6:00 pm, 55.24

Better, But Not Sufficient: Comparing Video Artificial Neural Networks and Macaque IT Dynamics

Matteo Dunnhofer1,2, Christian Micheloni2, Kohitij Kar1; 1York University, 2University of Udine

Feedforward ANNs trained on images remain the dominant models of visual cortex, despite being limited to static inference. In contrast, the primate visual system is a dynamical system operating in a dynamic world. Recent evidence shows that macaque inferior temporal (IT) cortex encodes both object identity and motion velocity. How well do static image–trained artificial neural networks (ANNs) capture IT’s dynamic computations? Do video-trained ANNs improve upon these static models? We recorded activity from 131 IT sites (two monkeys) while they passively viewed 920 short (300 ms) videos of moving objects. We tested 12 feedforward, 2 recurrent, and 13 video-trained ANNs. Feedforward models were unfolded in time by extracting frame-by-frame features to approximate instantaneous, history-independent computations. Video ANNs outperformed other models in decoding motion direction (400 clips; video ANN accuracy=0.69, proportion correct; feedforward ANN accuracy=0.58; Δaccuracy=0.109, p<0.05) and motion speed (320 clips; video ANN accuracy=0.68; feedforward ANN accuracy=0.63; Δaccuracy=0.049, p<0.05) on naturalistic videos, paralleling IT (0.58; 0.57), where late responses carried stronger motion information. In a stress test, we generated 100 appearance-free videos (AFV), in which appearance was replaced by random pixels while motion trajectories were preserved. IT decoders trained on appearance-based responses generalized to above-chance motion decoding for AFV (accuracy=0.59), whereas all ANNs collapsed to chance level. We then tested how well ANN features predict neural responses. Feedforward models explained a significant portion of IT variance during early responses (~90–180 ms; mean %EV=62.5%) but declined significantly later (~480–570 ms; %EV=23.1%). Video ANNs showed modest improvements in this late window (Δ%EV=6.9%; t(24)=7.55; p<0.01), suggesting that temporal training aids modeling of late-phase IT dynamics. Interestingly, recurrent models performed comparably to feedforward networks, indicating that shallow recurrence alone does not bridge the gap. Together, these findings show that temporal training promotes closer alignment between ANN and IT dynamics, yet video models remain bound to appearance-based temporal cues and fail to capture appearance-invariant motion encoding.
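
The "unfolding" procedure described above can be sketched roughly as follows in Python, assuming a generic frame-level feature extractor and a linear decoder; `extract_features` is a hypothetical placeholder standing in for any image-trained ANN layer, and this does not reflect the authors' actual pipeline.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    # Run a frame-level feature extractor on each frame independently (no temporal
    # history), concatenate per-frame features, then fit a linear decoder for motion
    # direction on the concatenated representation.
    def unfold_static_model(videos, extract_features):
        # videos: iterable of clips, each clip an iterable of frames
        feats = [np.concatenate([extract_features(frame) for frame in clip]) for clip in videos]
        return np.stack(feats)

    def direction_decoding_accuracy(features, direction_labels):
        decoder = LogisticRegression(max_iter=1000)
        return cross_val_score(decoder, features, direction_labels, cv=5).mean()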

KK is supported by the Canada Research Chair Program (CRC-2021-00326), SFARI (967073), Brain-Canada Foundation (2023-0259), and NSERC (RGPIN-2024-06223). MD was funded by the European Union (MSCA Project 101151834 - PRINNEVOT).

Talk 5, 6:15 pm, 55.25

Uncertainty-weighted integration of motion and position: psychophysical and fMRI evidence from the Double-Drift Illusion

Ke Yin1, Shijia Zhang2, Jiyan Zou3, Ce Mo1; 1Sun Yat-sen University, 2South China Normal University, 3Fudan University

Visual information from multiple physical dimensions (e.g., motion, position) is integrated to form coherent percepts, yet this process is complicated by sensory uncertainty that is ubiquitous in complex visual scenes. Normative Bayesian accounts propose that information with low uncertainty should be assigned greater weight during integration, but direct behavioral and neural evidence for such uncertainty-weighted integration in vision remains scarce. Here, we leveraged the double-drift illusion (DDI) to test how the visual system integrates motion and position signals with different levels of uncertainty. In this illusion, the perceived change of external stimulus position is biased in the direction of the internal stimulus motion. The magnitude of the DDI thus reflects the relative weighting of internal stimulus motion in the integration process. To independently manipulate the uncertainty of motion and position, we designed a novel random dot kinematogram (RDK) stimulus that robustly induced the DDI, in which the uncertainty of motion and position was modulated by varying the motion direction variance and the density of the RDK dots, respectively. In a series of psychophysical experiments, we found that increasing motion uncertainty reduced the DDI, whereas increasing position uncertainty enhanced it, indicating that the two signals were weighted according to their relative uncertainty. Moreover, similar behavioral patterns were found when the RDK stimulus changed only in its spatiotopic position while its retinotopic position remained constant, indicating that the same uncertainty-weighted integration process operates in both the spatiotopic and the retinotopic reference frames. Multivariate fMRI analyses further showed that the neural representation of the DDI in extrastriate visual areas was attenuated by increasing motion uncertainty. This effect was most pronounced in V2/3 and gradually diminished up the visual hierarchy. Our findings reveal an optimal uncertainty-weighted visual integration mechanism that is implemented in V2/3.
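
The normative prediction referenced above follows standard inverse-variance (reliability-weighted) cue combination; the short Python sketch below illustrates it with hypothetical numbers that are not the authors' stimulus parameters.

    # Standard reliability-weighted cue combination (illustration only).
    def weighted_position_estimate(x_position, x_motion_extrapolated, sigma_position, sigma_motion):
        w_motion = (1 / sigma_motion ** 2) / (1 / sigma_motion ** 2 + 1 / sigma_position ** 2)
        return w_motion * x_motion_extrapolated + (1 - w_motion) * x_position

    # Increasing motion uncertainty (sigma_motion) lowers w_motion and thus the illusion size,
    # whereas increasing position uncertainty (sigma_position) raises it.
    print(weighted_position_estimate(0.0, 1.0, sigma_position=0.5, sigma_motion=0.25))  # strong bias (0.8)
    print(weighted_position_estimate(0.0, 1.0, sigma_position=0.25, sigma_motion=0.5))  # weak bias (0.2)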

National Natural Science Foundation of China (32471104 to C.M.)

Talk 6, 6:30 pm, 55.26

When Form Meets Motion: Recurrent Connections Enhance Biological Motion Perception

Shuangpeng Han1, Ziyu Wang1, Thomas Serre2, Mengmi Zhang1; 1Nanyang Technological University, 2Brown University

Biological motion perception (BMP) refers to humans’ ability to perceive and recognize the actions of living beings purely from their motion patterns, even when those actions are presented only as sparse point-light displays. Neurophysiological studies indicate that BMP in the human brain arises from coordinated processing across the ventral and dorsal visual streams. The seminal two-stream model of Giese & Poggio (2003) formalized this division of labor, proposing parallel form and motion pathways whose dynamic integration supports robust action recognition. However, this model lacks recurrent connections and top-down mechanisms, which are supported by neurophysiological evidence but absent from both classical theories and modern AI systems. Building on this foundation, we introduce the Two-Stream Motion Perceiver (TSMP), which translates the core Giese & Poggio architecture into a modern deep learning framework while incorporating biologically motivated recurrent connections and top-down modulations from memories. TSMP preserves the separation of form and motion pathways, each equipped with memory mechanisms: the ventral pathway encodes and stores body-form structure, while the dorsal pathway learns and memorizes prototypical motion patterns. During inference, TSMP operates in a recurrent manner, wherein each iteration produces top-down signals that progressively select and refine both motion and form representations. We evaluate TSMP, along with humans and a broad set of AI models, using our curated large-scale BMP dataset, which contains 79,744 stimuli spanning 30 BMP conditions grounded in neuroscience. TSMP exhibits notable improvements in out-of-domain generalisation on BMP conditions, surpassing all existing AI baselines by up to 35% in top-1 accuracy and achieving a marked alignment with human behaviour. Our work offers a biologically grounded pathway toward models that more faithfully capture the computations underlying human biological motion perception.
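
As a conceptual sketch only (not the authors' TSMP code), the following PyTorch snippet shows one way recurrent top-down modulation can iteratively re-weight features from separate form and motion streams; all layer sizes, module choices, and the gating scheme are assumptions for illustration.

    import torch
    import torch.nn as nn

    class TwoStreamRecurrentSketch(nn.Module):
        # Two streams produce features that a top-down signal, derived from the current
        # class read-out, re-weights on every recurrent iteration.
        def __init__(self, in_dim=128, hidden=64, n_classes=10, n_iters=3):
            super().__init__()
            self.form = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.motion = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
            self.readout = nn.Linear(2 * hidden, n_classes)
            self.topdown = nn.Linear(n_classes, 2 * hidden)  # top-down gain per feature
            self.n_iters = n_iters

        def forward(self, form_input, motion_input):
            feats = torch.cat([self.form(form_input), self.motion(motion_input)], dim=-1)
            logits = self.readout(feats)
            for _ in range(self.n_iters):
                gain = torch.sigmoid(self.topdown(logits))   # top-down selection signal
                feats = feats * gain                          # refine both streams
                logits = self.readout(feats)
            return logits

    # model = TwoStreamRecurrentSketch()
    # logits = model(torch.randn(8, 128), torch.randn(8, 128))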

This research is supported by the National Research Foundation, Singapore under its NRFF award NRF-NRFF15-2023-0001 and Mengmi Zhang’s Startup Grant from Nanyang Technological University, Singapore.

Talk 7, 6:45 pm, 55.27

Causal inference in visual motion and position perception

Jeongbeen Yoo1, Oh-sang Kwon1; 1Ulsan National Institute of Science and Technology

Most theoretical frameworks of motion perception overlook the fact that motion typically co-occurs with changes in object position. To address this, we previously developed an object-tracking model (Kwon, Tadin, & Knill, 2015). The model accounts for several interactions between motion and position signals, including the Motion-Induced Position Shift (MIPS), in which an object’s perceived position is biased in the direction of its internal pattern motion. However, internal pattern motion has also been shown to bias boundary position in the opposite direction near the fovea (Zhang, Yeh & De Valois, 1993), a phenomenon that the original object-tracking model cannot explain. Here, we extend the model to improve its generalizability. The key idea is that the model now considers two ecologically valid forward models to interpret the pattern motion inside a boundary. The first case is a moving object with internal pattern motion. In this case, the boundary belongs to the object, and the boundary motion is coupled with the pattern motion (object model). The second case is pattern motion observed through an aperture. In this case, the boundary belongs to the aperture, and the boundary motion and internal pattern motion are independent (aperture model). We assumed that the visual system applies these two forward models to interpret sensory inputs and optimally estimate the state of the world. Results showed that under the object model, boundary estimates shift in the direction of the pattern motion, whereas under the aperture model, they shift in the opposite direction. Furthermore, if we assume that the forward model with the higher likelihood dominates perception, the direction of the bias changes systematically with stimulus properties, consistent with empirical data. Overall, the extended model provides a rational account of how visual motion and position signals interact, reconciling previously contradictory findings by incorporating ecologically valid forward models.
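
The "higher-likelihood forward model dominates perception" step can be illustrated with a toy Python sketch; the two forward-model predictions and their likelihoods would come from the full object-tracking model, which is not implemented here, and all numbers below are hypothetical.

    import numpy as np

    def select_interpretation(loglik_object, loglik_aperture, shift_object, shift_aperture):
        # Posterior weights on the two forward models (equal priors assumed).
        w_object = np.exp(loglik_object) / (np.exp(loglik_object) + np.exp(loglik_aperture))
        # Perceived boundary shift as a weighted mixture of the two interpretations;
        # when one model's likelihood dominates, its bias direction dominates perception.
        return w_object * shift_object + (1 - w_object) * shift_aperture

    # Example: object model much more likely -> perceived shift follows the pattern motion.
    print(select_interpretation(-1.0, -4.0, shift_object=+0.3, shift_aperture=-0.1))  # ~+0.28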

This research was supported by the National Research Foundation of Korea (NRF-2023R1A2C1007917 to O.-S.K.).

Talk 8, 7:00 pm, 55.28

Visibility of Ground Contact Point can Eliminate Depth Misperceptions caused by Interocular Delays

Anthony LoPrete1, Johannes Burge1; 1University of Pennsylvania

Blur and luminance differences between the eyes introduce millisecond-scale interocular delays that can cause dramatic depth misperceptions for moving objects. Monovision corrections for presbyopia, which intentionally induce interocular blur differences, are worn by millions of people worldwide. The attendant depth misperceptions could therefore have public safety implications. However, despite the extreme consistency of laboratory measurements (and one high-profile airplane crash), reports of motion illusions (and/or accidents) among monovision wearers are rare in daily life. One potential reason is that in real-world viewing, additional cues (e.g. ground contact points) indicate the true rather than the incorrect depths. We used a Magic Leap 2 augmented reality headset to measure how viewing context affects delay-induced depth misperceptions. A trial-randomized experiment presented virtual target objects, sometimes rendered with interocular luminance or blur differences, that were depth-consistent with the natural environment. On half the trials, an occluder blocked visibility of where the target contacted the ground. The target was an upright rectangular polygon that moved left or right across the scene (1.7–3.9 mph) at distances ranging from 1 m to 2 m behind the occluder. The moving target, occluder, and ground were textured with 1/f noise. To increase the strength of ground contact cues, the ground was also littered with randomly positioned virtual ‘grass blades’. Textures and grass blades were refreshed on each trial to prevent observers from using consistent visual landmarks. Observers reported the perceived target distance by placing a virtual marker on the ground. When the occluder obscured the target's contact point with the ground, target distances were substantially misperceived (e.g. >1.0 m); errors increased with target speed. When the occluder was absent, the illusion was nearly eliminated. The data help explain the reduced frequency of effects in real-world viewing, and show that, in certain real-world contexts, monovision can still cause pronounced misestimations of depth.
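
For context, the scale of delay-induced depth errors can be estimated with classic Pulfrich-style geometry; the Python sketch below uses small-angle approximations and hypothetical numbers, and is not taken from the experiment described above.

    # Back-of-the-envelope geometry for delay-induced depth errors (illustration only).
    def depth_error_from_delay(target_speed_mps, viewing_distance_m, delay_s, ipd_m=0.064):
        # A laterally moving target seen with an interocular delay dt behaves roughly like
        # a target with an effective horizontal disparity of v * dt / Z radians.
        disparity_rad = (target_speed_mps * delay_s) / viewing_distance_m
        # For small disparities, the corresponding depth offset is about Z^2 * disparity / IPD.
        return (viewing_distance_m ** 2) * disparity_rad / ipd_m

    # Example: ~1 m/s target, 2 m away, 5 ms interocular delay -> ~0.16 m depth error.
    print(depth_error_from_delay(1.0, 2.0, 0.005))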

This work was supported by the National Eye Institute and the Office of Behavioral and Social Sciences Research, National Institutes of Health Grant R01-EY028571 to J.B. and an Ashton Fellowship from the University of Pennsylvania to A.L.