Contrasting learning dynamics: Immediate generalisation in humans and generalisation lag in deep neural networks

Poster Presentation 36.314: Sunday, May 19, 2024, 2:45 – 6:45 pm, Banyan Breezeway
Session: Object Recognition: Acquisition of categories


Lukas S. Huber1,2, Fred W. Mast1, Felix A. Wichmann2; 1Cognition, Perception and Research Methods, Department of Psychology, University of Bern, 2Neural Information Processing Group, Department of Computer Science, University of Tübingen

Behavioral comparisons of human and deep neural network (DNN) models of object recognition not only help to benchmark and improve DNN models but may also illuminate the intricacies of human visual perception. However, machine-to-human comparisons are often fraught with difficulty: Unlike DNNs, which typically learn from scratch using static, uni-modal data, humans process continuous, multi-modal information and leverage prior knowledge. Additionally, while DNNs are predominantly trained in a supervised manner, human learning relies heavily on interactions with unlabeled data. We address these disparities by aligning the learning processes and examining not only the outcomes but also the dynamics of representation learning in humans and DNNs. We engaged humans and DNNs in a task to learn representations of three novel 3D object classes. Participants completed six epochs of an image classification task—reflecting the train-test iteration process common in machine learning—with feedback provided only during training phases. To align the starting point of learning, we used pre-trained DNNs. This experimental design ensured that both humans and models learned new representations from the same static, uni-modal inputs in a supervised learning environment. We collected ~6,300 trials from human participants in the laboratory and compared the observed dynamics with those of various DNNs. While DNNs exhibit learning dynamics with fast training progress but lagging generalization, human learners often display a simultaneous increase in train and test performance, showcasing immediate generalization. However, when focusing solely on test performance, DNNs align well with the human generalization trajectory. By synchronizing the learning environment and examining the full scope of the learning process, the present study offers a refined comparison of representation learning.
The collected data reveal both similarities and differences between human and DNN learning dynamics. These differences suggest that global assessments of DNNs as models of human visual perception are problematic without reference to specific modeling objectives.
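The train-test protocol described above can be illustrated with a minimal sketch. This is not the authors' experimental code: it uses synthetic Gaussian clusters as a stand-in for features of a pre-trained backbone (an assumption), trains a new linear readout for three novel classes over six epochs, and records train and test accuracy after each training phase, mirroring feedback-only-during-training.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup (assumed, not from the study): three "object classes" as
# well-separated Gaussian clusters in a fixed, pre-trained feature space.
n_classes, n_feat, n_epochs, lr = 3, 20, 6, 0.1
means = rng.normal(0.0, 2.0, size=(n_classes, n_feat))  # frozen backbone features

def sample(n_per_class):
    """Draw labeled feature vectors for each class."""
    X = np.vstack([m + rng.normal(0.0, 1.0, size=(n_per_class, n_feat))
                   for m in means])
    y = np.repeat(np.arange(n_classes), n_per_class)
    return X, y

X_tr, y_tr = sample(20)   # training set (feedback provided)
X_te, y_te = sample(10)   # test set (no feedback, generalization only)

W = np.zeros((n_feat, n_classes))  # new linear readout, learned from scratch

def accuracy(X, y, W):
    return float(((X @ W).argmax(axis=1) == y).mean())

history = {"train": [], "test": []}
for epoch in range(n_epochs):
    # Training phase: one softmax-regression gradient step with feedback.
    logits = X_tr @ W
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(n_classes)[y_tr]
    W -= lr * X_tr.T @ (p - onehot) / len(y_tr)
    # Test phase: measure train and test accuracy without further feedback.
    history["train"].append(accuracy(X_tr, y_tr, W))
    history["test"].append(accuracy(X_te, y_te, W))

print(history)
```

Plotting the two curves side by side would expose the lag the abstract describes: a model's training accuracy can rise faster than its test accuracy, whereas human learners show both rising together.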

Acknowledgements: This research was funded by the Swiss National Science Foundation (214659 to LSH). FAW is a member of the Machine Learning Cluster of Excellence, funded by the Deutsche Forschungsgemeinschaft under Germany’s Excellence Strategy—EXC number 2064/1—Project number 390727645.