VSS, May 13-18

V-VSS, June 1-2

Distributed population activity in the macaque inferior temporal cortex, but not current deep neural networks, predicts the Ponzo illusion

Poster Presentation 36.335: Sunday, May 15, 2022, 2:45 – 6:45 pm EDT, Banyan Breezeway Poster 3
Session: Object Recognition: Neural mechanisms

Poster Presentation 86.664: Thursday, June 2, 2022, 10:00 am – 12:00 pm EDT, Video Chat
Session: Poster Session 5

Vivian C. Paulun1, Kristine Zheng1, Kohitij Kar1; 1Massachusetts Institute of Technology

Primates must accurately estimate the size of objects in their environment to interact with them efficiently. Hong, Yamins, Majaj, and DiCarlo (2016) reported that one can accurately approximate an object’s size within an image from the population activity across the macaque inferior temporal (IT) cortex upon brief (100 ms) image presentations. These neural predictions were consistent with human behavioral estimates of object size within the same images—suggesting a linear IT readout model as the leading neural decoding hypothesis for object size estimation in primates. However, perceived and image-based (i.e., retinal) object sizes were highly correlated in the Hong et al. (2016) study. Notably, two objects with identical retinal sizes may be perceived to differ in size when embedded at different locations along a linear perspective (Ponzo illusion). This size illusion therefore allows us to perform a stronger test and assess whether the IT-based linear readout model predicts the perceived or the retinal size. We created a set of image pairs by placing objects “near” or “far” with respect to a linear perspective background. We performed large-scale neural recordings (2 Utah arrays; n = 192 sites) across the macaque IT cortex while the monkey fixated the images for 100 ms. Extending the results of Hong et al. (2016), we observed that approximations of object sizes from the IT responses (~190–205 ms) showed a significant bias (“far” − “near”; Δ = 22%; p < 0.001) that was qualitatively similar to that measured behaviorally in humans. Interestingly, however, most deep convolutional neural network (DCNN) models that so far best approximate the primate IT responses failed to demonstrate such a bias. Together, our results provide further support for the linear IT readout model of object size perception while exposing a significant explanatory gap in current DCNNs as models of primate vision.

Acknowledgements: This work was supported by the German Research Foundation (PA 3723/1-1), the Center for Brains, Minds and Machines (CBMM), funded by NSF STC award CCF-1231216 and Simons Foundation grant SCGB-542965 (James J DiCarlo). We thank the DiCarlo and Kanwisher Labs for helpful discussions.