Modeling face-identity “likeness” with a convolutional neural network trained for face identification

Poster Presentation 63.421: Wednesday, May 22, 2024, 8:30 am – 12:30 pm, Pavilion
Session: Face and Body Perception: Models

Connor J. Parde (1, 2), Alice J. O'Toole (2); (1) Vanderbilt University, (2) The University of Texas at Dallas

Certain face images are considered a better “likeness” of an identity than others. However, it remains unclear whether perceived likeness is driven by (a) the proximity of a face image to an averaged prototype comprising all instances of an identity (i.e., a “central prototype”), or (b) how closely a face image approximates the most common appearances of the identity (i.e., the “density” of known instances). Convolutional neural networks (CNNs) trained for face recognition can be used to instantiate face spaces that model both within- and between-identity variability. These networks therefore provide a testbed for investigating human perceptions of face likeness. We compared human-assigned likeness ratings with likeness ratings derived from identity prototypes and from local image density in the output space of a face-identification CNN. For each of 72 identities, participants (n = 50) viewed 20 face images simultaneously (5 viewpoints × 4 illumination conditions) and adjusted a slider bar to indicate how good a “likeness” each image was of the identity shown. Responses were collapsed across participants to generate a single likeness rating per image. Next, we used CNN face-image descriptors to generate two types of likeness ratings: prototype-proximity likeness, based on the proximity of a descriptor to the “prototype” created by averaging all corresponding same-identity descriptors, and density likeness, based on the density of same-identity descriptors located around a given descriptor in the CNN's output space. For all three measures of perceived likeness (human-assigned ratings, prototype proximity, and density), likeness differed across viewpoint (p < 0.0001) and illumination (p < 0.001). However, only the CNN density-based likeness ratings mirrored the pattern of human likeness ratings. These results demonstrate that density within an identity-specific face space is a better model of human perceived-likeness ratings than distance to an identity prototype.
In addition, these results show that viewpoint and illumination influence perceived-likeness ratings for face images.
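The abstract does not state exactly how prototype proximity or local density were computed from the CNN descriptors. The sketch below shows one plausible way to instantiate both measures for a single identity, under stated assumptions: descriptors are L2-normalized, proximity is cosine similarity to the normalized mean descriptor (the prototype), and density is the mean cosine similarity to the k nearest same-identity neighbors. The function names and the parameter k are hypothetical, and the random descriptors in the usage lines are placeholders, not data from the study.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale descriptors to unit length (assumed preprocessing step)."""
    norms = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norms, eps)


def prototype_likeness(descriptors):
    """Hypothetical prototype-proximity likeness for ONE identity.

    descriptors: (n_images, dim) array of CNN descriptors.
    The prototype is the normalized mean of the normalized descriptors;
    each image's score is its cosine similarity to that prototype.
    """
    d = l2_normalize(np.asarray(descriptors, dtype=float))
    prototype = l2_normalize(d.mean(axis=0))
    return d @ prototype


def density_likeness(descriptors, k=5):
    """Hypothetical density likeness for ONE identity.

    Scores each image by the mean cosine similarity to its k nearest
    same-identity neighbors (a k-NN density proxy; k is an assumption).
    """
    d = l2_normalize(np.asarray(descriptors, dtype=float))
    n = d.shape[0]
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < n_images")
    sims = d @ d.T
    np.fill_diagonal(sims, -np.inf)  # ignore each image's similarity to itself
    top_k = np.sort(sims, axis=1)[:, -k:]  # k largest similarities per row
    return top_k.mean(axis=1)


# Placeholder: 20 random "descriptors" (5 viewpoints x 4 illuminations).
rng = np.random.default_rng(0)
fake_descriptors = rng.normal(size=(20, 512))
proto = prototype_likeness(fake_descriptors)
dens = density_likeness(fake_descriptors, k=5)
print(proto.shape, dens.shape)  # (20,) (20,)
```

A k-nearest-neighbor average is only one way to estimate local density; a Gaussian kernel density estimate over the same-identity descriptors would be an equally reasonable substitute. Either per-image score could then be compared with the collapsed human ratings (for example, with `np.corrcoef`).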

Acknowledgements: Funding provided by National Eye Institute Grant R01EY029692-04 to AOT