A biologically inspired framework for contrastive learning of visual representations: BioCLR

Poster Presentation 63.407: Wednesday, May 22, 2024, 8:30 am – 12:30 pm, Pavilion
Session: Object Recognition: Models

Zhixian Han1, Vlada Volyanskaya1, Anne B. Sereno1,2; 1Purdue University, 2Indiana University School of Medicine

Self-supervised learning is a machine learning paradigm in which a model is trained using supervisory signals that it generates itself. Self-supervised learning is of interest because it does not require human-labeled data and can train artificial neural networks to capture essential features that are useful for downstream tasks. SimCLR (a Simple framework for Contrastive Learning of visual Representations) is a contrastive self-supervised learning framework that has been shown to outperform many other models. However, the data augmentation procedure used by SimCLR (generating additional images by randomly modifying the original images) may not be biologically plausible. We therefore propose BioCLR (a Biologically inspired framework for Contrastive Learning of visual Representations) to better align with how a brain might implement contrastive self-supervised learning. Much research supports the idea that primate cortical visual processing is segregated into two streams that generate similar but distinct visual representations. Using the CIFAR-10 dataset, we trained one artificial pathway to recognize the average color of each image (the average of all pixel vectors in the image) and another pathway to recognize the orientation of the object in each image. We then used the internal neural representations generated by the two pathways to replace the data augmentation procedure in SimCLR and trained our BioCLR model with contrastive self-supervised learning. We used a supervised object recognition task as the downstream task to evaluate the models. On the downstream task, our BioCLR model achieved significantly higher test accuracy than a baseline model with the same neural network architecture that was not trained with self-supervised learning. Our results suggest that the different internal representations produced by segregated visual pathways may be used to implement contrastive self-supervised learning and to improve object recognition performance.
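
The core idea, replacing SimCLR's two augmented views of an image with the representations of that same image produced by two separate pathways, can be sketched roughly as follows. This is a minimal, hypothetical PyTorch sketch, not the authors' implementation; the names (BioCLRSketch, color_pathway, orientation_pathway, feat_dim, proj_dim) are assumptions, and the loss is the standard NT-Xent contrastive loss used by SimCLR.

```python
import torch
import torch.nn.functional as F


def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss (as in SimCLR).

    z1, z2: (batch, dim) embeddings of the same images from two sources.
    Here they come from the two pathways rather than from two augmentations.
    """
    batch = z1.shape[0]
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)  # (2B, dim), unit norm
    sim = z @ z.t() / temperature                       # scaled cosine similarities
    sim.fill_diagonal_(float("-inf"))                   # exclude self-similarity
    # For row i < B the positive sits at row i + B, and vice versa.
    targets = torch.cat([torch.arange(batch, 2 * batch),
                         torch.arange(0, batch)])
    return F.cross_entropy(sim, targets)


class BioCLRSketch(torch.nn.Module):
    """Hypothetical two-pathway contrastive model (illustrative only).

    Each pathway is a separate encoder; their representations of the same
    image form the positive pair, replacing SimCLR's data augmentation.
    """

    def __init__(self, color_pathway, orientation_pathway, feat_dim=512, proj_dim=128):
        super().__init__()
        self.color_pathway = color_pathway              # e.g. pretrained on average color
        self.orientation_pathway = orientation_pathway  # e.g. pretrained on orientation
        self.proj_color = torch.nn.Linear(feat_dim, proj_dim)
        self.proj_orient = torch.nn.Linear(feat_dim, proj_dim)

    def forward(self, images):
        # Two views of the same batch: one per pathway.
        z_color = self.proj_color(self.color_pathway(images))
        z_orient = self.proj_orient(self.orientation_pathway(images))
        return nt_xent_loss(z_color, z_orient)
```

In this reading, the two pathway encoders would first be trained on the average-color and orientation tasks described above, and the downstream object recognition test would compare this contrastively pretrained network against a baseline with the same architecture but no self-supervised pretraining.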

Acknowledgements: Funding to Anne B. Sereno: Purdue University and NIH CTSI (Indiana State Department of Health #20000703). We extend our gratitude to Pankaj Meghani and the Viasat Corporate Partner Mentors provided through The Data Mine at Purdue University for their comments on earlier versions of the model.