Motion adaptation induced object position bias in macaque IT and SlowFast video recognition models

Poster Presentation 63.325: Wednesday, May 22, 2024, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Motion: Neural mechanisms

Elizaveta Yakubovskaya1, Hamidreza Ramezanpour1, Sara Djambazovska1, Kohitij Kar1; 1Department of Biology, York University

To interact efficiently with their environment, primates must not only recognize objects ('what') but also discern their spatial attributes ('where'). This dual capacity, traditionally attributed to the functional segregation of the ventral and dorsal visual processing pathways, is being reexamined in light of emerging evidence. Hong et al. (2016) revealed that the macaque inferior temporal (IT) cortex plays a role in encoding object positions. Our study extends this work with three objectives: first, to assess how scaling the number of neural recording sites in IT influences object-position estimates; second, to investigate the impact of motion adaptation (a phenomenon typically associated with the dorsal stream) on these estimates; and third, to evaluate whether existing ventral stream models align with our observations. We performed large-scale recordings across the IT cortex of 3 monkeys (~500 sites). Monkeys passively fixated Test images (640 images; each showing 1 of 8 objects with varying latent parameters, embedded in a naturalistic background). We observed highly accurate (Pearson R > 0.7) linear decodes of object position from the IT population. Next, to test whether motion-direction adaptation biases position estimates, we preceded each Test image with a prolonged (3000 ms) oriented grating moving in one of four directions. IT-based position decodes (192 sites) showed a significant bias (p < 0.0001; permutation test) in the direction opposite to the preceding motion. These biases align with perceptual reports, suggesting that the IT cortex represents perceived rather than ground-truth positions. Simulating the experiments in silico on SlowFast networks (a video recognition model with a ResNet-50 backbone) produced a similar bias, which was absent in a vanilla ResNet-50 with scaled activations mimicking neural fatigue.
Our findings introduce a framework for probing how dorsal-ventral interactions could generate adaptation aftereffects, and a model-based hypothesis space to guide the exploration of computational mechanisms critical for dynamic scene perception.
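
As an illustration of the kind of analysis described above, the following is a minimal sketch of a one-sided label-shuffling permutation test for a position bias opposite to the adapter motion. It is not the authors' pipeline: the abstract does not specify the test statistic or the permutation scheme, and the function names (`project_bias`, `permutation_test_mean_diff`), the use of a no-adaptation control condition, and the synthetic data are all assumptions made for illustration.

```python
import random


def project_bias(decoded_xy, true_xy, motion_dir):
    """Signed displacement of a decoded position along the adapter's motion axis.

    motion_dir is assumed to be a unit vector; a negative value means the
    decoded position is shifted opposite to the preceding motion.
    """
    dx = decoded_xy[0] - true_xy[0]
    dy = decoded_xy[1] - true_xy[1]
    return dx * motion_dir[0] + dy * motion_dir[1]


def _mean(values):
    return sum(values) / len(values)


def permutation_test_mean_diff(adapted, control, n_perm=2000, seed=0):
    """One-sided permutation test: is mean(adapted) - mean(control) negative?

    Condition labels are shuffled n_perm times (hypothetical scheme); the
    p-value is the fraction of shuffles whose difference is at least as
    negative as the observed one, with the usual +1 correction.
    """
    rng = random.Random(seed)
    observed = _mean(adapted) - _mean(control)
    pooled = list(adapted) + list(control)
    n_adapted = len(adapted)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = _mean(pooled[:n_adapted]) - _mean(pooled[n_adapted:])
        if diff <= observed:
            count += 1
    p_value = (count + 1) / (n_perm + 1)
    return observed, p_value


if __name__ == "__main__":
    # Synthetic projected biases (arbitrary units), NOT data from the study.
    rng = random.Random(1)
    adapted = [-0.5 + rng.gauss(0.0, 0.3) for _ in range(40)]
    control = [rng.gauss(0.0, 0.3) for _ in range(40)]
    observed, p_value = permutation_test_mean_diff(adapted, control)
    print(f"observed shift = {observed:.3f}, one-sided p = {p_value:.4f}")
```

In this sketch each trial's decoding error is first projected onto the motion axis with `project_bias`, so that a single signed number per trial captures whether the decode moved with or against the adapter. The smallest p-value the test can report is 1 / (n_perm + 1), so resolving p < 0.0001 would require more than 10,000 shuffles.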

Acknowledgements: Canada Research Chair Program, Google Research, CFREF, Brain Canada, SFARI