3D shape recognition in humans and deep neural networks

Poster Presentation 63.402: Wednesday, May 22, 2024, 8:30 am – 12:30 pm, Pavilion
Session: Object Recognition: Models

Shuhao Fu1, Daniel Tjan2, Philip Kellman1, Hongjing Lu1; 1University of California, Los Angeles, 2Cerritos High School

Both humans and deep neural networks can recognize objects from 3D shapes depicted with sparse visual information, such as a set of points randomly sampled on the surfaces of 3D objects (termed a point cloud). Although networks achieve human-like performance in recognizing objects from 3D shapes, it is unclear whether network models acquire 3D shape representations similar to those used by human vision for object recognition. We hypothesize that training enables a neural network to gain access to some local 3D shape features and distinctive parts associated with objects, which are adequate to support good object recognition performance, but that the network lacks representations of the global 3D shapes of objects. We conducted two experiments to test this hypothesis. In Experiment 1, we created Lego-style point clouds to mimic object shapes constructed from Lego pieces. Lego-style 3D objects disrupt local shape features but preserve the global 3D shape of objects. Point clouds of Lego-style objects were shown to both human participants and a dynamic graph convolutional neural network (DGCNN) trained to recognize 3D objects from point cloud displays. Humans maintained high recognition performance when the disruption of local shape was moderate, e.g., when the Lego pieces were small (intact 3D shapes: 90%; Lego-style shapes: 89%). In contrast, DGCNN performance dropped significantly, from 90% to 54%. In Experiment 2, we spatially scrambled object parts to disrupt the global 3D shape. We found the opposite result: human recognition performance worsened significantly for part-scrambled displays, whereas the neural network recognized part-scrambled objects about as well as intact 3D shapes. Hence, the two experiments provide a double dissociation, showing that human object recognition relies on global 3D shapes, whereas neural networks learn to recognize 3D objects from local shape features.
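
The two manipulations can be illustrated with a minimal sketch. This is not the authors' stimulus-generation code, which the abstract does not describe. It assumes a Lego-style cloud can be approximated by snapping each point to the nearest face of the axis-aligned cube that contains it, and that part scrambling can be approximated by translating each labeled part by a random offset. The function names (legoize, scramble_parts), the parameters (block_size, max_shift), and the two-part segmentation in the demo are all hypothetical.

```python
import numpy as np

def legoize(points, block_size):
    """Snap each point to the nearest face of the cube (voxel) containing it.

    Assumption: this approximates a Lego-style cloud. Local surface detail
    is replaced by flat block faces, while no point moves by more than
    block_size / 2, so the overall shape is largely kept.
    """
    points = np.asarray(points, dtype=float)
    scaled = points / block_size
    cell = np.floor(scaled)                   # index of the containing voxel
    local = scaled - cell                     # position inside voxel, in [0, 1)
    dist_to_face = np.minimum(local, 1.0 - local)
    axis = np.argmin(dist_to_face, axis=1)    # axis whose face is closest
    rows = np.arange(len(points))
    snapped = local.copy()
    snapped[rows, axis] = np.round(local[rows, axis])  # move onto face 0 or 1
    return (cell + snapped) * block_size

def scramble_parts(points, part_labels, max_shift, rng):
    """Translate each labeled part by its own random offset.

    Assumption: this approximates part scrambling. Each part keeps its
    local geometry, but the spatial arrangement of parts is disrupted.
    """
    points = np.asarray(points, dtype=float)
    part_labels = np.asarray(part_labels)
    out = points.copy()
    for label in np.unique(part_labels):
        mask = part_labels == label
        out[mask] += rng.uniform(-max_shift, max_shift, size=3)
    return out

# Demo on a toy cloud: points sampled on a unit sphere.
rng = np.random.default_rng(0)
cloud = rng.normal(size=(1024, 3))
cloud /= np.linalg.norm(cloud, axis=1, keepdims=True)

lego = legoize(cloud, block_size=0.25)
labels = (cloud[:, 2] > 0).astype(int)        # hypothetical two-part split
scrambled = scramble_parts(cloud, labels, max_shift=0.5, rng=rng)
```

In this sketch, a larger block_size corresponds to larger Lego pieces and therefore a stronger disruption of local shape, while max_shift controls how far parts are displaced from their original arrangement.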

Acknowledgements: We gratefully acknowledge the generous funding provided by NSF BCS-2142269.