VSS, May 13-18

Motion: Models, neural mechanisms

Talk Session: Tuesday, May 17, 2022, 5:15 – 7:15 pm EDT, Talk Room 1
Moderator: Joo-Hyun Song, Brown

Talk 1, 5:15 pm, 55.11

Predictive neural representations of sensory input revealed by a novel dynamic RSA approach

Ingmar Engbert Jacob de Vries1, Moritz Franz Wurm1; 1Center for Mind/Brain Sciences, University of Trento

Our capacity to interact with dynamic external stimuli in a timely manner (e.g., catching a ball) suggests that our brain generates predictions of unfolding external dynamics. While theories assume such an internal representation of future external states, the rich dynamics of predictive neural representations remain largely unexplored. One approach for investigating neural representations is representational similarity analysis (RSA), which typically uses models of static stimulus features at different hierarchical levels of complexity (e.g., colour, shape, category) to investigate how these features are represented in the brain. We present a novel dynamic extension to RSA that uses temporally variable models to capture the neural representation of dynamic stimuli. Here we tested this approach on source-reconstructed MEG data from 21 healthy human subjects who observed 14 unique 5-sec-long ballet videos, with ~35 repetitions per video. Dynamic RSA revealed unique insights into the representation of low-level visual, body posture and kinematic features: both low- and high-level information was represented ~40–300 msec after the actual visual input, with low-level information represented most prominently in visual areas and higher-level information also in slightly more anterior areas. Strikingly, the motion of the ballet dancer was not only represented in a lagged manner, but also in a second, distinct temporal window that preceded the actual input by ~100–300 msec, indicating that these neural representations predicted future motion. Taken together, dynamic RSA reveals delayed bottom-up and predictive top-down processing of naturalistic dynamic stimuli. As such, it opens the door to addressing important outstanding questions about how our brain represents and predicts the dynamics of the world. More generally, it can be used to study concepts such as predictive coding not only with naturalistic dynamic visual stimuli, but also across a wide range of domains such as naturalistic reading and sign language.
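
The core operation of dynamic RSA can be illustrated as a lagged model-brain comparison: correlate the neural representational dissimilarity matrix (RDM) at each time point with a time-varying model RDM at a range of temporal lags, so that correlations at negative lags index prediction of future stimulus states. The sketch below uses hypothetical array shapes, variable names, and random placeholder data; it is not the authors' implementation.

```python
import numpy as np
from scipy.stats import spearmanr

# Hypothetical inputs (shapes and names are assumptions, not the authors' code):
# neural_rdms: (n_times, n_pairs) lower-triangle neural RDM at each time point
# model_rdms:  (n_times, n_pairs) lower-triangle model RDM (e.g., kinematics) per frame
rng = np.random.default_rng(0)
n_times, n_pairs = 500, 14 * 13 // 2          # e.g., 14 videos -> 91 video pairs
neural_rdms = rng.random((n_times, n_pairs))
model_rdms = rng.random((n_times, n_pairs))

def dynamic_rsa(neural_rdms, model_rdms, lags):
    """Correlate the neural RDM at time t with the model RDM at time t - lag.

    Positive lags: the brain tracks past stimulus states (lagged processing).
    Negative lags: the brain correlates with future stimulus states (prediction).
    Returns one mean Spearman correlation per lag.
    """
    T = neural_rdms.shape[0]
    profile = []
    for lag in lags:
        rs = []
        for t in range(T):
            t_model = t - lag
            if 0 <= t_model < T:
                rho, _ = spearmanr(neural_rdms[t], model_rdms[t_model])
                rs.append(rho)
        profile.append(np.mean(rs))
    return np.array(profile)

lags = np.arange(-30, 31)                      # in samples; negative = predictive window
profile = dynamic_rsa(neural_rdms, model_rdms, lags)
print("lag with strongest model-brain correlation (samples):", lags[np.argmax(profile)])
```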

Talk 2, 5:30 pm, 55.12

Decoding of binocular motion extends the hierarchy of motion processing in the human brain

Puti Wen1, Michael Landy2, Bas Rokers3; 1Psychology, New York University Abu Dhabi, 2Psychology and Center for Neural Science, New York University, 3Psychology, New York University Abu Dhabi, Psychology and Center for Neural Science, New York University

We live in a three-dimensional world, and motion perception plays a critical role in organisms’ survival. Past neuroimaging studies have shown that motion direction can be reliably decoded from BOLD activity within the motion processing pathway (V1 to hMT). However, the majority of experimental paradigms limit the motion stimuli to the fronto-parallel plane (i.e., 2D motion). Here, we examine whether 3D motion can be decoded from additional areas in visual cortex. To this end, we presented random dots drifting in eight different motion directions (left/right, toward/away, and four intermediate directions). The stimuli produced distinct retinal motion velocities in the two eyes. For example, motion directly toward or away from the observer produces horizontally opposite retinal motion in the two eyes. In a control experiment, we instead presented vertical retinal motion, which contains virtually identical motion energy but produces transparent 2D, rather than 3D, motion percepts. We decoded the presented stimuli from BOLD activity using a probabilistic decoding algorithm (TAFKAP; van Bergen & Jehee, 2021). We found that 3D motion direction can be decoded throughout the canonical motion hierarchy. In V1, horizontal (3D) and vertical (transparent) motion directions are decoded equally well. In hMT, however, 3D motion decoding performance is consistently superior to decoding of transparent motion. Critically, when equating the number of voxels, 3D motion direction (but not the vertical control) could be decoded with equal or greater accuracy in IPS0 than in V1 and hMT. Decoding performance was much poorer in areas IPS2-5 and other regions in the ventral pathway. Our results suggest a role for IPS0 in 3D motion processing in addition to its sensitivity to 3D object structure and static depth. Our paradigm can be applied in the future to investigate the transformation from sensory input to perception in the visual pathway.

Acknowledgements: Funding: NIH EY08266 (MSL); Aspire VRI20-10 (BR)
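
The ROI-wise comparison can be sketched with a generic cross-validated classifier standing in for the probabilistic TAFKAP decoder. The voxel counts, placeholder data, and logistic-regression classifier below are illustrative assumptions; only the voxel-equating step mirrors the IPS0 vs. V1/hMT comparison described in the abstract.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)

# Hypothetical single-trial BOLD patterns per ROI (trials x voxels) and
# 8 motion-direction labels; real data would come from fMRI preprocessing.
n_trials, n_directions = 320, 8
labels = np.repeat(np.arange(n_directions), n_trials // n_directions)

def decode_roi(patterns, labels, n_voxels):
    """Cross-validated decoding accuracy after equating the number of voxels."""
    keep = rng.choice(patterns.shape[1], size=n_voxels, replace=False)
    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(clf, patterns[:, keep], labels, cv=5).mean()

rois = {"V1": 800, "hMT": 400, "IPS0": 300}    # assumed voxel counts per ROI
n_equated = min(rois.values())                 # equate voxel count across ROIs
for name, n_vox in rois.items():
    patterns = rng.standard_normal((n_trials, n_vox))  # placeholder data
    acc = decode_roi(patterns, labels, n_equated)
    print(f"{name}: decoding accuracy = {acc:.2f} (chance = {1/n_directions:.2f})")
```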

Talk 3, 5:45 pm, 55.13

Causal inference underlies hierarchical motion perception

Sabyasachi Shivkumar1,2, Boris Penaloza1,2, Gabor Lengyel1,2, Gregory C. DeAngelis1,2, Ralf M. Haefner1,2; 1Brain and Cognitive Sciences, University of Rochester, 2Center for Visual Science, University of Rochester

Perception of object motion is affected by the motion of other objects in the scene. Prior work (Gershman et al. 2016, Shivkumar et al. 2020, Bill et al. 2020, 2021) formalized this process as Bayesian causal inference (Kording et al. 2007). A key signature of this process is the transition from integrating to segmenting motion signals as their differences increase. To quantitatively test the causal inference predictions and constrain model parameters, we designed a new motion estimation task that overcomes two shortcomings of prior experiments. First, in our design, perceiving retinal motion vs. integrating motion signals is reflected in strikingly different behavioral reports. Second, it probes the perception of motion of an object relative to a larger object that is itself perceived to move relative to an even larger object. The stimulus consisted of moving dots (target) surrounded by two concentric rings of moving dots. Human observers (n=10) reported the perceived target direction using a dial. The motion direction of the inner ring was rotated clockwise relative to the target, and this difference was varied to probe the transition from integration to segmentation of target and inner-ring motion. Importantly, the motion of the outer ring was always counterclockwise relative to the target. Consequently, observers integrating the target and inner ring were predicted to show a clockwise bias in motion perception, while segmenting the target from the inner ring predicted a counterclockwise bias. Our results clearly support these predictions. In an additional experiment, we verified a critical prediction of our model: that the transition from integration to segmentation depends on the uncertainty about the target motion. The fitted model parameters may help characterize causal inference across different clinical populations (Noel et al. 2021). Our model can also make predictions for the influence of the surround on MT neuron responses (Born et al. 2005).

Acknowledgements: This work was supported by NIH U19NS118246
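
The Bayesian causal-inference computation at the heart of such models can be sketched as follows: the observer weighs a common-cause (integrate) interpretation of the target and inner-ring measurements against a separate-cause (segment) interpretation and averages the two conditional estimates by their posterior probabilities, which produces the signature transition from integration to segmentation as the direction difference grows. The priors, noise levels, and simplified difference-based likelihoods below are illustrative assumptions, not the fitted model.

```python
import numpy as np
from scipy.stats import norm

def causal_inference_estimate(m_target, m_ring, sigma_t, sigma_r,
                              p_common=0.5, sigma_prior=30.0):
    """Model-averaged estimate of target direction (deg) from noisy measurements
    of target (m_target) and inner-ring (m_ring) motion.

    C=1 (common cause): integrate -> reliability-weighted average.
    C=2 (separate causes): segment -> rely on the target measurement alone.
    """
    # Simplified difference-based likelihoods of the measurements under each
    # causal structure (illustrative Gaussian approximation).
    like_common = norm.pdf(m_target - m_ring, 0.0,
                           np.sqrt(sigma_t**2 + sigma_r**2))
    like_separate = norm.pdf(m_target - m_ring, 0.0,
                             np.sqrt(sigma_t**2 + sigma_r**2 + sigma_prior**2))
    post_common = (like_common * p_common /
                   (like_common * p_common + like_separate * (1 - p_common)))

    # Conditional estimates under each structure.
    w = sigma_r**2 / (sigma_t**2 + sigma_r**2)      # weight on the target measurement
    est_integrate = w * m_target + (1 - w) * m_ring
    est_segment = m_target

    # Model averaging: yields the transition from integration to segmentation.
    return post_common * est_integrate + (1 - post_common) * est_segment, post_common

# Small direction difference -> integration bias toward the ring;
# large difference -> segmentation, estimate stays near the target measurement.
for delta in (5.0, 20.0, 60.0):
    est, p_c = causal_inference_estimate(m_target=0.0, m_ring=delta,
                                         sigma_t=8.0, sigma_r=4.0)
    print(f"ring offset {delta:5.1f} deg: p(common) = {p_c:.2f}, estimate = {est:5.1f} deg")
```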

Talk 4, 6:00 pm, 55.14

Effects of optical material properties on detection of deformation of non-rigid rotating objects

Mitchell J.P. van Zuijlen1, Jan Jaap R. van Assen2, Shin'ya Nishida1,3; 1Cognitive Informatics Lab, Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University, 2Perceptual Intelligence Lab, Industrial Design Engineering, Delft University of Technology, 3NTT Communication Science Labs, Nippon Telegraph and Telephone Corp.

The pattern of image motion (optical flow) produced by dynamic changes in position, orientation and/or geometry of a 3D object can vary greatly depending on the object’s optical material. This is because different image components, such as surface textures, occluding contours, shading, highlights (for glossy materials), and internal reflections (for transparent materials), have different dependencies on the surface orientation, viewing direction, illumination, etc. This material dependency of image motion makes correct perception of dynamic realistic 3D objects a challenging task for the human visual system. We therefore tested the human ability to perceive non-rigid deformations across ten materials. Materials varied exclusively in optical properties (e.g., textured matte, glossy, mirror-like, and transparent), without changing mechanical properties. The target object was an infinite-knot stimulus rotating around a vertical axis at 3 degrees per frame for 120 frames. The object was deformed by an inward pulling force at six levels of intensity (including a zero-force, i.e., rigid, condition). The movie of each object was rendered in one of three illumination conditions using Maxwell Renderer. Observers performed a 2-IFC task to choose which of two stimuli deformed more, and a yes-no judgment as to whether the presented stimulus deformed or not. The results show that there was no effect of illumination on deformation detection, while the different optical materials had a moderate effect. We found the largest performance difference between the optically more complex transparent stimuli and the simpler textured matte stimuli. We did find individual differences: performance for some observers was clearly more stable across material conditions. These results suggest that the visual system can robustly interpret the deformation of a moving 3D object despite large variations in the optical flow caused by changing optical conditions.

Acknowledgements: This work has been supported by JSPS Kakenhi JP20H05957 and a Marie-Skłodowska-Curie Actions Individual Fellowship (H2020-MSCA-IF-2019-FLOW).
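
Detection performance of this kind is typically summarized by fitting a psychometric function to the 2-IFC choice proportions as a function of deformation intensity. The sketch below uses a cumulative-Gaussian function with a 0.5 lower asymptote and entirely hypothetical force levels and trial counts; it is not the study's analysis or data.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

# Hypothetical 2-IFC data: at each deformation-force level, the number of trials
# on which the deforming stimulus was chosen as "deformed more" than the rigid
# reference (levels and counts are placeholders).
force_levels = np.array([0.0, 0.2, 0.4, 0.6, 0.8, 1.0])   # arbitrary units
n_trials = 40
n_chosen = np.array([21, 24, 29, 33, 37, 39])              # per level

def psychometric(x, mu, sigma):
    """Cumulative-Gaussian psychometric function rising from the 2-IFC chance
    level of 0.5 to 1.0."""
    return 0.5 + 0.5 * norm.cdf(x, loc=mu, scale=sigma)

p_chosen = n_chosen / n_trials
params, _ = curve_fit(psychometric, force_levels, p_chosen,
                      p0=[0.4, 0.2], bounds=([0.0, 1e-3], [1.0, 5.0]))
mu, sigma = params
# With this parameterization, performance reaches 75% correct at x = mu.
print(f"75%-correct deformation threshold: {mu:.2f} (slope sigma = {sigma:.2f})")
# Repeating the fit per optical material would give the per-material thresholds
# whose comparison the abstract describes.
```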

Talk 5, 6:15 pm, 55.15

Properties of V1 and MT motion tuning emerge from unsupervised predictive learning

Katherine Storrs1, Onno Kampman2, Reuben Rideaux3, Guido Maiello1, Roland Fleming1; 1Department of Experimental Psychology, Justus Liebig University Giessen, Germany, 2Department of Psychology, University of Cambridge, UK, 3Queensland Brain Institute, University of Queensland, Australia

Our ability to perceive motion arises from a hierarchy of motion-tuned cells in visual cortex. Signatures of V1 and MT motion tuning emerge in artificial neural networks trained to report the speed and direction of sliding images (Rideaux & Welchman, 2020). However, the brain’s motion code must develop without access to such ground-truth information. Here we tested whether a more realistic learning objective (unsupervised learning by predicting future observations) also yields motion processing that resembles physiology. We trained a two-layer recurrent convolutional network based on predictive coding principles (PredNet; Lotter, Kreiman & Cox, 2016) to predict the next frame in videos. Training stimuli were 64,000 six-frame videos depicting natural image fragments sliding with uniformly sampled random velocity and direction. The network’s learning objective was to minimise the mean absolute pixel error between its prediction and the actual next frame. Although the network received no explicit information about direction or velocity, almost all units in both layers developed tuning to a specific motion direction and velocity when probed with sliding sinusoidal gratings. The network also recapitulated population-level properties of motion tuning in V1. In both layers, mean activation across the population of units showed a motion-direction anisotropy, peaking at 90 and 270 degrees (vertical motion), likely due to the static orientation statistics of natural images. Like MT neurons, units in the network appeared to solve the “aperture problem”. When probed with pairs of orthogonally drifting gratings superimposed to create plaid patterns, almost all units were tuned to the direction of the whole pattern rather than its individual components. Unsupervised predictive learning creates neural-like single-unit tuning, population tuning statistics, and integration of locally ambiguous motion signals, and provides an interrogable model of why motion computations take the form they do.

Acknowledgements: Supported by the HMWK cluster project “The Adaptive Mind”; the DFG (SFB-TRR-135 #222641018); the ERC (ERC-2015-CoG-682859: “SHAPE”); Marie Skłodowska-Curie Actions (H2020-MSCA-ITN-2017: #765121 and H2020-MSCA-IF-2017: #793660); the ARC (DE210100790); and an Alexander von Humboldt fellowship.
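
The single-unit probing step (measuring direction tuning with drifting sinusoidal gratings) can be sketched as follows. The grating generator is generic, and the toy Reichardt-like detector below merely stands in for a PredNet unit; the network architecture and its actual responses are not reproduced here.

```python
import numpy as np

def drifting_grating(direction_deg, n_frames=6, size=64, sf=0.08, tf=0.15):
    """Drifting sinusoidal grating movie (frames x H x W): spatial frequency sf
    in cycles/pixel, temporal frequency tf in cycles/frame, drifting in the
    given direction."""
    theta = np.deg2rad(direction_deg)
    y, x = np.mgrid[0:size, 0:size]
    phase_map = 2 * np.pi * sf * (x * np.cos(theta) + y * np.sin(theta))
    frames = [np.sin(phase_map - 2 * np.pi * tf * t) for t in range(n_frames)]
    return np.stack(frames)

def unit_response(movie):
    """Placeholder for a model unit's mean activation; in the study this would
    be a PredNet unit. Here: a toy Reichardt-like detector that correlates each
    frame, shifted one pixel rightward, with the next frame, so it prefers
    rightward motion."""
    shifted = np.roll(movie[:-1], shift=1, axis=2)   # shift columns rightward
    return float(np.mean(shifted * movie[1:]))

directions = np.arange(0, 360, 22.5)                 # 16 probe directions
responses = np.array([unit_response(drifting_grating(d)) for d in directions])
preferred = directions[np.argmax(responses)]
print(f"preferred direction of the toy unit: {preferred:.1f} deg")
```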

Talk 6, 6:30 pm, 55.16

The speed of a moving object is underestimated behind an occluder in action and perception tasks

Melisa Menceloglu1, Diyarhi Roy1, Joo-Hyun Song1; 1Brown University

Accurately extrapolating a moving object’s trajectory when it becomes occluded is useful in everyday situations such as crossing a busy street or passing another car. Interestingly, recent research has shown that people tend to underestimate a moving object’s speed behind an occluder. Here, we aimed to determine whether this occlusion bias depended on the type of action: discrete vs. continuous. We presented a bar that moved across the screen with constant velocity and went behind an occluder for the second half of its movement. Participants estimated the moment the bar reached the goal position either by pressing a button or by reaching and touching the goal position to stop the moving bar. The original occlusion bias was demonstrated with a button-press task; we reasoned that reaching might reduce the bias, based on prior findings that time perception improves with concurrent continuous movement. In line with the previous occlusion-bias findings, we observed that participants were more likely to stop the bar after it passed the goal position. However, contrary to our expectations, the occlusion bias was largely similar for button press and reach. Next, we examined whether the occlusion bias was present at the perceptual level. Participants judged whether a tone was presented before or after the bar reached the goal position. A perceptual occlusion bias was present but less pronounced than the action bias (~80 ms for perception vs. ~130 ms for action). Overall, the bias was roughly constant across various motion durations and directions and grew over trials, replicating the previous findings. These results may point to a limitation whereby the visual system “blinks” when a moving object goes behind an occluder, creating a lag in motion estimation for both perception and action.

Acknowledgements: NSF SBE 2104666 to M.M. and NSF BCS 1849169 to J.H.S.
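
The occlusion bias itself is a simple quantity: the signed error between the reported stopping time and the true time at which the occluded bar reaches the goal, with positive values indicating responses after the bar has already passed (consistent with speed underestimation). The sketch below uses placeholder numbers, not the study's measurements.

```python
import numpy as np

# Hypothetical single-trial data: the bar travels at constant speed, is occluded
# for the second half of its path, and the participant stops it when they judge
# it has reached the goal (speed, distance, and responses are placeholders).
rng = np.random.default_rng(2)
speed = 8.0                  # deg/s
travel_distance = 16.0       # deg from motion onset to goal
true_arrival = travel_distance / speed                              # s after onset
response_times = true_arrival + rng.normal(0.13, 0.08, size=200)    # toy responses

# Occlusion bias: mean signed error; positive = responses after the bar has
# already reached the goal, i.e., speed underestimated behind the occluder.
bias = response_times - true_arrival
sem = bias.std(ddof=1) / np.sqrt(bias.size)
print(f"mean occlusion bias: {bias.mean()*1000:.0f} ms (95% CI +/- {1.96*sem*1000:.0f} ms)")
```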

Talk 7, 6:45 pm, 55.17

Laminar Organization of Pre-Saccadic Attention in Marmoset Area MT

Shanna H Coop1, Gabriel H Sarch2, Amy Bucklaew1, Jacob L Yates3, Jude F Mitchell1; 1University of Rochester, 2Carnegie Mellon University, 3University of Maryland College Park

Attention leads eye movements, producing perceptual enhancements at the saccade target immediately before saccades (Deubel and Schneider, 1996; Kowler et al., 1995; Rolfs and Carrasco, 2012). Pre-saccadic attention has been related to enhanced neural responses before saccades made into a neuron’s receptive field in macaque visual area V4 (Moore and Chang, 2009). Here, we examined pre-saccadic attention in the middle temporal area (MT) of the marmoset monkey and took advantage of the marmoset’s smooth cortical surface to measure neural effects as a function of laminar position. First, we found that current source density (CSD) methods provide estimates of the input layer in area MT based on an early-latency sink. Next, we examined pre-saccadic attention in a saccade foraging paradigm. In each trial, the marmoset made a saccade from a central fixation point to one of three equally eccentric stimuli (full-coherence dot fields with motion sampled independently between apertures, each from 16 directions). We positioned stimuli such that one foraged location overlapped the receptive fields of the neurons under study and examined how tuning functions for motion direction changed (i.e., additive and gain changes). We found that saccades towards the receptive field were associated with increases in gain that were predominantly in superficial layers, while additive increases in rate were shared across all layers. We also examined extracellular waveform shapes and found a bimodal distribution of peak-to-trough durations (i.e., narrow- and broad-spiking categories). In particular, the gain increases among superficial-layer neurons were specific to the broad-spiking category. Broad-spiking neurons in those layers should include projection cells that relay information to other cortical areas. This suggests that increases in sensitivity could be specific to cell classes that output sensory information to later stages of processing involved in decision making.

Acknowledgements: SC, AB, and JFM from NIH EY030998; GS from NSF GRF 2020305476; JLY from 1K99EY032179
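
The laminar alignment step relies on the standard current source density (CSD) estimate: the negative second spatial derivative of the trial-averaged LFP across equally spaced contacts, with an early-latency sink marking the input layer. The sketch below uses synthetic LFPs and assumed contact spacing and conductivity, not the marmoset recordings.

```python
import numpy as np

def csd(lfp, spacing_mm=0.05, conductivity=0.4):
    """Second-spatial-derivative CSD estimate.

    lfp: (n_channels, n_times) trial-averaged LFP from equally spaced laminar
    contacts. Returns the CSD for the interior channels; by the usual
    convention, negative values indicate current sinks.
    """
    second_deriv = (lfp[:-2] - 2 * lfp[1:-1] + lfp[2:]) / spacing_mm**2
    return -conductivity * second_deriv

# Toy example: 24 contacts, 200 time samples of noise plus a brief negative LFP
# deflection with a Gaussian depth profile standing in for an early-latency
# input-layer sink (all values are placeholders).
rng = np.random.default_rng(3)
n_ch, n_t = 24, 200
lfp = rng.normal(0.0, 1e-4, size=(n_ch, n_t))
depth_profile = np.exp(-0.5 * ((np.arange(n_ch) - 12) / 1.5) ** 2)
lfp[:, 40:60] -= 5e-3 * depth_profile[:, None]

csd_map = csd(lfp)
sink_channel = np.unravel_index(np.argmin(csd_map), csd_map.shape)[0] + 1
print(f"putative input-layer contact (strongest early sink): channel {sink_channel}")
```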

Talk 8, 7:00 pm, 55.18

Effects of simulated and perceived motion on cognitive task performance

Onoise G. Kio1, Robert S. Allison1; 1York University

Compelling simulated motion in virtual environments can induce the sensation of self-motion (or vection) in stationary observers. While the usefulness and functional significance of vection are still debated, the literature has shown that the perceived magnitude of vection is lower when observers perform attentionally demanding cognitive tasks than when attentional demands are absent. Could simulated motion and the resulting vection experienced in virtual environments in turn affect how observers perform various attention-demanding tasks? In this study, therefore, we investigated how accurately and rapidly observers could perform attention-demanding aural and visual tasks while experiencing different levels of vection-inducing motion in a virtual environment. Seventeen adult observers were exposed to different levels of simulated motion at virtual camera speeds of 0 (stationary), 5, 10 and 15 m/s in a straight virtual corridor rendered through a Vive Pro virtual reality headset. During these simulations, they performed aural or visual discrimination tasks, or no task at all. We recorded response accuracy, the time observers took to respond to each task, and the intensity of vection they reported. A repeated-measures ANOVA showed that the level of simulated motion did not significantly affect accuracy on either task (F(3,48) = 1.469, p = .235 aural; F(3,48) = 1.504, p = .226 visual), but it significantly affected response times on the aural task (F(3,48) = 4.320, p = .009 aural; F(3,48) = 0.916, p = .440 visual). Observers generally perceived less vection at all levels of motion when they performed visual discrimination tasks than when they had no task to perform (F(2,32) = 13.784, p = .038). This suggests that perceived intensities of vection are significantly reduced when people perform attentionally demanding tasks related to visual processing. Conversely, vection intensity or simulated motion speed can affect performance on aural tasks.
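
The within-subject tests reported here (e.g., F(3,48) for the effect of simulated-motion level on response times) correspond to a one-way repeated-measures ANOVA. The sketch below runs that test with statsmodels' AnovaRM on hypothetical long-format data with 17 observers and four speed levels; the response-time values are placeholders, not the study's data.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data: 17 observers x 4 simulated-motion levels,
# one mean aural-task response time (s) per cell (values are placeholders).
rng = np.random.default_rng(4)
observers = np.repeat(np.arange(1, 18), 4)
speeds = np.tile([0, 5, 10, 15], 17)
rt = 0.9 + 0.01 * speeds + rng.normal(0, 0.05, size=speeds.size)

df = pd.DataFrame({"observer": observers, "speed": speeds, "rt": rt})

# Within-subject (repeated-measures) ANOVA: does simulated-motion level affect
# aural-task response time? With 17 subjects and 4 levels this yields the same
# F(3,48) degrees of freedom as the tests in the abstract.
anova = AnovaRM(df, depvar="rt", subject="observer", within=["speed"]).fit()
print(anova)
```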