Talk 1, 8:15 am
Object correspondence across movements at saccadic speed
Melis Ince1,2, Carolin Hübner1,3, Martin Rolfs1,2; 1Department of Psychology, Humboldt-Universität zu Berlin, Germany, 2Berlin School of Mind and Brain, Humboldt-Universität zu Berlin, Germany, 3Department of Psychology, Technische Universität Chemnitz, Germany
Saccadic eye movements impose rapid motion on the retinal image, raising the question of how object correspondence is established from one fixation to the next. Here, we investigated whether this rapid motion itself, by providing spatiotemporal continuity, plays a role in achieving object correspondence. To isolate the contribution of high-speed motion, we simulated saccadic motion using a high-temporal-resolution projector (updating the display every 0.69 ms) while observers maintained fixation throughout the experiments. We first investigated the contribution of motion at saccadic speed to object correspondence using a two-frame quartet-motion display. We positioned identical Gabor patches as objects at opposing corners of an imaginary rectangle. One object then moved continuously, along a curved (inward or outward) trajectory, to one of the neighboring corners, while the other jumped to the opposite side, completing the quartet. On each trial, participants first reported the quartet rotation (clockwise or counterclockwise), indicating perceived object correspondence, and then traced the perceived continuous motion trajectory with a mouse, indicating motion visibility (location and curvature). We found that motion visibility declined as speed increased, eventually reaching chance levels for location and curvature reports. At the same time, continuous motion biased the perceived quartet rotation even at the highest (saccade-like) speeds. These results suggest that high-speed motion informs object correspondence, even when that motion is effectively invisible. We are currently following up on this finding in a second study, in which we combine a version of our quartet-motion display with the go/no-go reviewing paradigm (Sasi et al., 2023) to investigate whether object files are maintained through motion at saccadic speed.
By combining objective measures of stimulus visibility, the perception of object correspondence, and the maintenance of object files over time, we aim to shed light on the fundamental mechanisms behind object continuity at saccadic speeds.
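The quartet geometry described in the abstract lends itself to a brief sketch. The snippet below is a minimal illustration of one continuously moving object in such a display; the function name, the corner coordinates, and the sine-bow parameterization of curvature are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def curved_trajectory(start, end, center, curvature=0.5, n_steps=20):
    """Sample points along a curved path from start to end.

    Positive curvature bows the path away from the quartet center
    (outward), negative curvature bows it inward. Illustrative
    parameterization only.
    """
    start, end, center = map(np.asarray, (start, end, center))
    t = np.linspace(0.0, 1.0, n_steps)[:, None]
    straight = (1 - t) * start + t * end            # linear interpolation
    midpoint = 0.5 * (start + end)
    outward = midpoint - center                     # direction away from center
    outward = outward / np.linalg.norm(outward)
    bow = curvature * np.sin(np.pi * t) * outward   # maximal bow mid-path
    return straight + bow

# Corners of an imaginary rectangle (hypothetical coordinates):
corners = {"TL": (-2, 1), "TR": (2, 1), "BL": (-2, -1), "BR": (2, -1)}
# One object sweeps TL -> TR along a curved path ...
path = curved_trajectory(corners["TL"], corners["TR"], center=(0, 0))
# ... while the other jumps BR -> BL in a single frame, completing the quartet.
```

Played at saccade-like speeds, the sampled path positions would be presented on successive sub-millisecond display updates.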
Acknowledgements: Funding from the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (grant agreement No. [865715 – VIS-A-VIS]) and the Heisenberg Programme of the Deutsche Forschungsgemeinschaft (grants RO3579/8-1 and RO3579/12-1) granted to MR.
Talk 2, 8:30 am
Detecting Moving Objects During Self-motion
Hope Lutwak1, Bas Rokers2, Eero Simoncelli1,3; 1Center for Neural Science, New York University, 2Psychology, Center for Brain and Health, Aspire Precision Medicine Research Institute, New York University Abu Dhabi, 3Center for Computational Neuroscience, Flatiron Institute
As we move through the world, the pattern of light projected onto our eyes is complex and dynamic, yet we are still able to distinguish moving from stationary objects. One might hypothesize that this is achieved by detecting discontinuities in the spatial pattern of velocities; however, this computation is also sensitive to velocity discontinuities at the boundaries of stationary objects. We instead propose that humans make use of the specific constraints that self-motion imposes on retinal velocities. When an eye translates and rotates within a rigid 3D world, the velocity at each location on the retina is constrained to a line segment in the 2D space of retinal velocities (Longuet-Higgins & Prazdny, 1980). The slope and intercept of this segment are determined by the eye’s translation and rotation, and the position along the segment is determined by the depth of the scene. Since all possible velocities arising from a rigid world must lie on this segment, velocities off the segment must correspond to moving objects. We hypothesize that humans exploit these constraints by partially inferring self-motion from the global pattern of retinal velocities and using deviations of local velocity from the resulting constraint lines to detect moving objects. Using a head-mounted virtual reality device, we simulated forward translation through different virtual environments: one consisting of textured cubes above a textured ground plane, and one of scattered depth-matched dots. Participants had to determine whether a cued cube/dot moved relative to the scene. Consistent with the hypothesis, we found that performance depended on the deviation of the object’s velocity from the constraint segment, not on the difference between the retinal velocities of the object and its surround. Our findings contrast with previous inconclusive results that relied on an impoverished stimulus with a limited field of view.
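The constraint-line computation at the heart of this hypothesis can be sketched compactly. Below is a minimal illustration based on the instantaneous flow equations of Longuet-Higgins & Prazdny (1980) for a pinhole eye; the function names and parameter choices are assumptions for illustration, not the authors' analysis code:

```python
import numpy as np

def flow_components(x, y, f=1.0):
    """Translational and rotational flow matrices at image point (x, y)
    for a pinhole eye with focal length f. The translational term is
    scaled by inverse depth 1/Z; the rotational term is depth-free."""
    A = np.array([[-f, 0.0, x],
                  [0.0, -f, y]])
    B = np.array([[x * y / f, -(f + x**2 / f), y],
                  [f + y**2 / f, -x * y / f, -x]])
    return A, B

def deviation_from_constraint(v_obs, x, y, T, Omega, f=1.0):
    """Distance of an observed retinal velocity from the constraint line
    traced out by sweeping inverse depth from 0 upward.

    A large deviation cannot arise from a stationary point in a rigid
    scene, so it flags an independently moving object."""
    A, B = flow_components(x, y, f)
    v_rot = B @ np.asarray(Omega)          # depth-independent rotational flow
    direction = A @ np.asarray(T)          # line direction (grows with 1/Z)
    d = direction / np.linalg.norm(direction)
    r = np.asarray(v_obs) - v_rot          # residual relative to line origin
    along = max(r @ d, 0.0)                # inverse depth must be nonnegative
    return np.linalg.norm(r - along * d)
```

For a stationary scene point at depth Z, the observed velocity B @ Omega + (A @ T) / Z lies exactly on the segment and the deviation is zero; a velocity pushed off the line yields a nonzero deviation regardless of its raw difference from neighboring velocities.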
Talk 3, 8:45 am
Object motion representation in the macaque inferior temporal cortex – a gateway to understanding the brain's intuitive physics engine
Over the past decade, there have been significant advances in understanding how primates recognize objects in the presence of identity-preserving variations. However, primate vision encompasses more than object recognition: in a dynamic world, effective interaction with moving objects and the ability to infer and predict their motion are essential for survival. In this study, we systematically investigated hierarchically connected brain areas in the ventral visual pathway of rhesus macaques (areas V4 and IT), implicated in object recognition, to first characterize their responses to object motion, speed, and direction. We then quantified the correlative links between these responses and two distinct object-motion-based behaviors, one reliant on information directly available in videos (e.g., velocity discrimination) and the other predicated on predictive motion estimates from videos (e.g., future-frame prediction). Further, employing causal microstimulation strategies, we tested the critical role of the macaque IT cortex in these behaviors. Interestingly, while current computational models of object and action recognition are accurate on stationary-object tasks, their predictions suffer significant deficits relative to primates on our dynamic tasks. These findings call into question the widely accepted demarcation of the primate ventral and dorsal cortices into the "what" and "where" pathways. These explorations highlight the imperative to examine the interplay between these cortical hierarchies for a more profound understanding of visual motion perception, which serves as a gateway to intuitive physics. The data also provide valuable empirical constraints to guide the next generation of dynamic brain models.
Acknowledgements: CIHR, Canada Research Chair Program, Google Research, CFREF, Brain Canada, SFARI
Talk 4, 9:00 am
Acquisition of second-order motion perception by learning to recognize the motion of objects made by non-diffusive materials
Zitang Sun1, Yen-Ju Chen1, Yung-Hao Yang1, Shin’ya Nishida1,2; 1Cognitive Informatics Lab, Graduate School of Informatics, Kyoto University, Japan, 2NTT Communication Science Laboratories, Nippon Telegraph and Telephone Corporation, Japan
Many animals, including flies, macaques, and humans, have the ability to visually recognize image motion not only from shifts of spatial patterns defined by luminance modulations (first-order motion) but also from shifts defined by higher-level image features such as temporal modulations and contrast modulations (second-order motion). Second-order motion perception has been extensively studied using carefully designed artificial stimuli (e.g., drift-balanced motion) that control for first-order motion components, but why and how the visual system acquired this perceptual ability in natural environments remains poorly understood. We hypothesized that the biological system might naturally learn second-order motion perception in order to estimate correct physical object motion amidst optical fluctuations produced, for example, by highlights on glossy materials and refractions through transparent materials. As a proof of concept, we developed a DNN-based model that processes both first- and second-order motion in natural scenes. The model was based on our two-stage model (Sun et al., NeurIPS 2023), consisting of a trainable motion-energy sensing stage and a recurrent self-attention network, inspired by biological computations in V1 and MT, respectively. To preprocess complex second-order features, we added a second input pathway with a vanilla multi-layer convolutional network. The model was trained on two distinct optical flow datasets generated by rendering random object motion: one with purely diffuse (PD) reflection and the other with non-diffuse (ND) material properties, the latter including ample optical turbulence produced by specular reflections and transparent refractions. The ND-trained model demonstrated significantly better recognition of various types of second-order motion, aligning closely with human performance measured in our psychophysical experiments. This performance was unachievable without the second input pathway.
The results suggest that second-order motion perception may have evolved, at least in part, to support robust estimation of object motion in the face of optical fluctuations in natural environments.
Acknowledgements: This work is supported by JST JPMJFS2123, MEXT/JSPS JP20H00603 and JP20H05605.
Talk 5, 9:15 am
Deep feature matching vs spatio-temporal energy filtering for robust moving object segmentation
Recent methods for optical flow estimation achieve remarkable precision and are successfully applied in downstream tasks such as segmenting moving objects. These methods are based on matching deep neural network features across successive video frames. For humans, in contrast, the dominant motion estimation mechanism is believed to rely on spatio-temporal energy filtering. Here, we compare both motion estimation approaches for segregating a moving object from a moving background. We render synthetic videos based on scanned 3D objects and backgrounds to obtain ground-truth motion for realistic scenes. We then transform the videos by replacing the textures with random dots that follow the motion of the original video, so that no individual frame contains any information about the object apart from the motion signal. Humans can use such random-dot motion to recognize objects in these stimuli (Robert et al., 2023). We compare segmentation methods based on the recent RAFT optical flow estimator (Teed & Deng, 2020) and on the spatio-temporal energy model of Simoncelli & Heeger (1998). Our results show that, when combined with an established segmentation architecture, the spatio-temporal energy approach works almost as well as RAFT for the original videos. Furthermore, we quantify the amount of segmentation information that can be decoded from both models using the optimal non-negative superposition of feature maps for each video. This analysis confirms that both optical flow representations can be used for motion segmentation, with RAFT performing slightly better on the original videos. For the random-dot stimuli, however, hardly any information about the object can be decoded from RAFT, while the brain-inspired spatio-temporal energy filtering approach is only mildly affected. Based on these results, we explore the use of spatio-temporal filtering for building a more robust model for moving object segmentation.
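The spatio-temporal energy computation compared here with deep feature matching can be sketched minimally. The function below is an illustrative opponent-energy detector on a space-time slice, in the spirit of Adelson & Bergen (1985); it is not the Simoncelli-Heeger model implementation used in the study, and all parameter choices are assumptions:

```python
import numpy as np

def opponent_energy(xt, sf, tf, sigma=6.0):
    """Net rightward-vs-leftward motion energy at the center of a
    space-time (x, t) luminance patch.

    Builds quadrature pairs of space-time Gabor filters tuned to
    opposite directions, squares and sums their responses for
    phase-invariant energy, and returns the opponent difference
    (> 0 means net rightward motion)."""
    n = xt.shape[0]
    x = np.arange(n) - n // 2
    X, T = np.meshgrid(x, x, indexing="ij")         # space x time grid
    env = np.exp(-(X**2 + T**2) / (2 * sigma**2))   # Gaussian envelope
    out = 0.0
    for direction in (+1, -1):                      # rightward, leftward
        phase = 2 * np.pi * (sf * X - direction * tf * T)
        even = (env * np.cos(phase) * xt).sum()     # quadrature pair of
        odd = (env * np.sin(phase) * xt).sum()      # oriented filters
        out += direction * (even**2 + odd**2)       # opponent energy
    return out
```

Unlike feature matching, this computation needs no frame-to-frame correspondence, which is one intuition for why it degrades gracefully on random-dot stimuli where per-frame features are uninformative.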
Acknowledgements: Deutsche Forschungsgemeinschaft (DFG, German Research Foundation): Germany’s Excellence Strategy – EXC 2064/1 – 390727645 and SFB 1233, TP4, project number: 276693517.
Talk 6, 9:30 am
Anisotropy in perceived nonrigidity
Shape-from-motion models generally assume that objects are rigid, which simplifies the computations but cannot handle the movements and locomotion of organisms, all of which require nonrigid shape deformations. We have demonstrated that rotating rigid objects can appear strikingly nonrigid, depending on speed and shape, and that nonrigid percepts arise from the outputs of direction-selective motion cells and are countered by feature tracking and shape-based priors. Here we present the surprising finding that perceived nonrigidity changes with the rigid object’s orientation, and we model it with documented cortical anisotropies. When two solid 3D circular rings, rigidly attached at an angle, are rotated horizontally around a vertical axis at medium speed, observers see either rigid rotation or nonrigid wobbling. A 90° image rotation markedly enhances the nonrigid percept. We observed that the elliptical projections of the rings in the rotated image appear narrower and longer than in the original image, analogous to the increased perceived height versus width when a square is rotated 45° to form a diamond. We successfully model the perceived changes in shape with optimal Bayesian decoding of V1 outputs, incorporating anisotropies in the number and tuning widths of orientation-selective cells and in the probability distribution of orientations in images of natural scenes. We show quantitatively that elongating the ellipses alone leads to more perceived nonrigidity even for horizontal rotation, and that vertical rotation enhances nonrigidity further. We then incorporated the cortical anisotropies into motion-flow computations. The estimated motion fields were decomposed into gradients of divergence, curl, and deformation and compared with the gradients for physical rotation and wobbling. The gradients for vertical rotation were a closer match to physical wobbling, while the gradients for horizontal rotation fell between physical wobbling and rotation.
This asymmetry indicates that hardwired cortical anisotropies can explain changes in perceived nonrigidity with the axis of motion.
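The decomposition of a flow field into divergence, curl, and deformation used in this analysis can be sketched as follows; the function name and the finite-difference scheme are illustrative assumptions, not the authors' analysis code:

```python
import numpy as np

def flow_invariants(u, v):
    """Divergence, curl, and the two deformation (shear) components of a
    dense flow field with horizontal component u and vertical component v,
    both indexed as [y, x]. These are the standard first-order differential
    invariants of a 2D vector field."""
    du_dy, du_dx = np.gradient(u)          # np.gradient: axis 0 (y), then axis 1 (x)
    dv_dy, dv_dx = np.gradient(v)
    div = du_dx + dv_dy                    # isotropic expansion/contraction
    curl = dv_dx - du_dy                   # local rotation
    def1 = du_dx - dv_dy                   # expansion along x vs y
    def2 = du_dy + dv_dx                   # oblique shear
    return div, curl, def1, def2

# Sanity check: a pure rotation field (u = -y, v = x) has uniform curl
# and zero divergence and deformation everywhere.
y, x = np.mgrid[-10:11, -10:11].astype(float)
div, curl, d1, d2 = flow_invariants(-y, x)
```

Comparing spatial gradients of these invariants between estimated flow and the flows of physical rotation versus wobbling is one way to quantify which physical interpretation an estimated motion field most resembles.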