Investigating the impact of Gaussian noise on face recognition performance for humans and convolutional neural networks

Poster Presentation: Tuesday, May 21, 2024, 2:45 – 6:45 pm, Pavilion
Session: Object Recognition: Structure of categories

Ikhwan Jeon1, Connor Parde2, Frank Tong3; 1Vanderbilt University

Recent work has shown that convolutional neural networks (CNNs) perform worse than human observers at recognizing objects presented with superimposed Gaussian noise, indicating poor alignment between CNN models and humans (Jang et al., 2021). However, it is unknown whether Gaussian noise might also disproportionately impair CNNs at face recognition tasks, which rely more heavily on the processing of lower-spatial-frequency information. We evaluated humans and face-trained CNNs using face images with varying levels of Gaussian noise, where noise intensity was specified by the signal-to-signal-plus-noise ratio (SSNR). Human participants completed a face-recognition task in which they viewed face images of 10 different celebrities (80 images/celebrity) in an intermixed, randomized order and were instructed to select the identity shown in each image. Each face image was presented for 200 ms at a randomly assigned SSNR level (SSNR = 0.1, 0.15, 0.2, 0.25, 0.3, 0.4, 0.5, 0.6, 0.8, or 1). The same set of face images was then presented to a version of AlexNet trained for face classification. The CNN's identity response was taken as the identity node that responded most strongly to each input image. Performance for both human participants and the CNN was quantified as accuracy as a function of SSNR and was compared to the object recognition results of Jang et al. (2021). Overall, humans performed better than the CNNs. However, both humans and the CNNs performed better at face recognition than at object recognition (SSNR thresholds for humans: 0.17 for faces, 0.26 for objects; for CNNs: 0.25 for faces, 0.5 for objects). Moreover, the performance of the face-trained model appeared highly similar to that of humans, suggesting similar degrees of robustness to Gaussian visual noise. The fact that face-trained CNNs provide a reasonable account of human recognition of noisy face images may suggest that these CNNs are fairly well aligned with the human face recognition system.
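The stimulus and scoring steps described above can be illustrated with a minimal Python sketch. It is not the authors' code. The abstract does not state how SSNR maps onto pixel values, so the linear blend below (SSNR times the image plus one minus SSNR times the noise) is an assumption, as are the noise mean and standard deviation and the hypothetical function names blend_with_noise, predict_identity, and accuracy_by_ssnr.

```python
import random
from collections import defaultdict


def blend_with_noise(pixels, ssnr, rng, noise_mean=0.5, noise_sd=0.2):
    """Mix a flat list of pixel intensities in [0, 1] with Gaussian noise.

    Assumed convention (not given in the abstract):
    output = ssnr * signal + (1 - ssnr) * noise,
    so ssnr = 1 returns the clean image and ssnr = 0 would be pure noise.
    noise_mean and noise_sd are illustrative values only.
    """
    if not 0.0 <= ssnr <= 1.0:
        raise ValueError("ssnr must lie in [0, 1]")
    blended = []
    for p in pixels:
        noise = rng.gauss(noise_mean, noise_sd)
        value = ssnr * p + (1.0 - ssnr) * noise
        blended.append(min(1.0, max(0.0, value)))  # clip to the valid range
    return blended


def predict_identity(node_responses):
    """Return the index of the identity node with the highest response."""
    return max(range(len(node_responses)), key=lambda i: node_responses[i])


def accuracy_by_ssnr(trials):
    """Compute accuracy per SSNR level.

    trials: iterable of (ssnr, true_identity, chosen_identity) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for ssnr, truth, choice in trials:
        total[ssnr] += 1
        correct[ssnr] += int(truth == choice)
    return {level: correct[level] / total[level] for level in sorted(total)}
```

The same accuracy_by_ssnr tally applies to human and CNN responses alike, since both reduce to one chosen identity per image; only the source of the choice differs (a participant's selection versus predict_identity applied to the network's output layer).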

Acknowledgements: Supported by NIH grants R01EY029278 and R01EY035157 to FT.