Training deep learning algorithms for face recognition with large datasets improves performance but reduces similarity to human representations

Poster Presentation: Wednesday, May 22, 2024, 8:30 am – 12:30 pm, Pavilion
Session: Face and Body Perception: Models

Nitzan Guy1, Mandy Rosemblaum2, Adva Shoham3, Galit Yovel4; 1Tel Aviv University

The perceptual representation of facial identity is influenced by factors such as familiarity and experience variability. Yet the impact of overall experience with faces (specifically, the number of identities, the amount of exposure to each identity, and the variability of head pose) on the nature of face representations remains unknown. In this work, we used face-trained deep neural networks (DNNs) to address these questions by manipulating the number of identities, the exposure to each identity, and the variation in head pose during model training. Models were evaluated on accuracy on a standard face benchmark (the Labeled Faces in the Wild dataset), on human-like face effects (e.g., the inversion effect), and on the correlation between model and human similarity representations. Our findings reveal that the number of identities and the number of images per identity significantly influenced model performance. Intriguingly, while increased experience improved accuracy, correlations with human representations were higher for models trained with limited experience (e.g., models trained on only 500 identities and 300 images per identity) than for models with extensive experience (e.g., CLIP and VGG16 trained on the VGGFace2 dataset, which includes more than 8000 identities and hundreds of images per identity). With respect to head pose, we compared a model trained only on poses ranging from frontal to 20 degrees (frontal-only model) with a model trained only on poses of 25-45 degrees (three-quarter-only model). Whereas both models reached similar performance levels, the frontal-only model was more similar to human representations than the three-quarter-only model. Taken together, our findings show that similarity between face-trained DNNs and human representations does not correspond with model performance, and may not require extensive training with the large datasets of faces that are commonly used to train deep learning models.
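
The correlation between model and human similarity representations mentioned above can be illustrated with a minimal sketch. The abstract does not say which distance measure or correlation coefficient was used, so cosine distance between face embeddings and a Spearman rank correlation are assumptions here, and all function names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def cosine_distance(u, v):
    """Dissimilarity between two embedding vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pairwise_dissimilarities(embeddings):
    """Dissimilarity for every unordered pair of faces, in a fixed order."""
    pairs = combinations(range(len(embeddings)), 2)
    return [cosine_distance(embeddings[i], embeddings[j]) for i, j in pairs]


def average_ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def model_human_similarity(embeddings, human_dissimilarity):
    """Correlate model pairwise dissimilarities with human ratings.

    human_dissimilarity is a symmetric faces-by-faces matrix of
    (hypothetical) human dissimilarity ratings for the same faces.
    """
    model_d = pairwise_dissimilarities(embeddings)
    pairs = combinations(range(len(embeddings)), 2)
    human_d = [human_dissimilarity[i][j] for i, j in pairs]
    return spearman(model_d, human_d)


if __name__ == "__main__":
    # Toy data only: four faces with 2-D embeddings and made-up ratings.
    toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.5, 0.5]]
    toy_human = [
        [0.0, 0.1, 0.9, 0.4],
        [0.1, 0.0, 0.8, 0.3],
        [0.9, 0.8, 0.0, 0.5],
        [0.4, 0.3, 0.5, 0.0],
    ]
    print(model_human_similarity(toy_embeddings, toy_human))
```

Under this scheme, each trained model (for example, the limited-experience and extensive-experience models) would yield one correlation value, and those values could then be compared across models. A rank correlation is used in the sketch because it does not assume that model distances and human ratings share a scale.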

Acknowledgements: ISF 917/21