Three-dimensional shape cues affect human and artificial recognition systems differently
Poster Presentation 56.413: Tuesday, May 19, 2026, 2:45 – 6:45 pm, Pavilion
Session: Object Recognition: Models
Nicholas Baker¹, Mikayla Cutler, William Friebel, Luke Baumel, Joseph Tocco, George Thiruvathukal; ¹Loyola University of Chicago
Humans and neural networks use shape and texture information differently. While shape is the primary cue in human object recognition, neural networks are more biased towards texture cues. Many tests of shape vs. texture bias have focused on shape recognition from an object’s external contour. However, shape information is also conveyed through internal contours, shading, and attached shadows, especially when an object is viewed from noncanonical perspectives. Using models from ShapeNet, we created datasets of 120,000 texture-substituted images of objects from many viewpoints, with and without shading and attached shadows. We tested the ability of humans and several neural networks to classify these objects by both their shape and their texture. Humans were much better than any network at classifying texture-substituted objects by their shape, although the gap was larger when shape was defined only by the external contour than when 3D cues were included. Our findings suggest that networks’ texture bias is reduced when 3D cues are included in images. We next tested whether the inclusion of 3D cues benefited humans and neural networks more for images of objects viewed from canonical or from noncanonical perspectives. Consistent with earlier research, we found that 3D cues primarily benefited humans for noncanonical images. For neural networks, by contrast, the greatest performance gains were for canonical images. These findings suggest fundamental differences in how humans and networks use shading and attached shadows for object recognition. We argue that humans use these cues to infer objects’ 3D structures, while neural networks use them as another surface-level cue like texture.