Individual differences in fusing the face identification decisions of humans and machines

Poster Presentation 63.429: Wednesday, May 22, 2024, 8:30 am – 12:30 pm, Pavilion
Session: Face and Body Perception: Disorders, individual differences

P. Jonathon Phillips1, Géraldine Jeckeln2, Carina Hahn1, Amy Yates1, Peter Fontana1, Alice O'Toole2; 1NIST, 2The University of Texas at Dallas

Combining the decisions of two individuals (fusion) can increase face identification performance. However, fusion can be beneficial or detrimental, depending on the particular individuals paired (cf. medical imaging, Kurvers et al., 2016). Here, we establish the circumstances under which fusing two particular individuals (human-human) or one individual and a machine (human-machine) reliably benefits face-identification accuracy. We modeled pairwise human-human and human-machine fusions, using human data (27 forensic facial examiners, 32 students, 14 controls) from White et al. (2015) and machine data from a deep convolutional neural network (DCNN; Parkhi et al., 2015). Both humans and the DCNN were tested with 84 face-identity matching trials and indicated whether images showed the same identity or different identities (humans: 5-point scale; machine: similarity score between embeddings). Human and DCNN accuracy was measured as the area under the curve (AUC). For each pair (human-human, human-machine), we averaged the independent responses and measured: the fused AUC (from averaged responses); the fusion benefit (the difference between the fused AUC and the AUC of the more accurate performer in the pair); and ΔAUC (the absolute difference in baseline accuracy between the two performers). Fusion benefit decreased as ΔAUC increased [human-human: r(2626) = -0.7385, p < 0.001; human-machine: r(71) = -0.8996, p < 0.001]. We used this result to implement “selective fusion” for each human-machine pair. Specifically, ΔAUC was used to determine whether to fuse responses or to retain the better performer’s responses. Selective fusion yielded greater accuracy than the fused AUC in 88% of trials. To achieve optimal human-human pairing, we used graph theory (Galil, 1986) and showed that optimal pairing (AUC = 0.97) significantly outperformed randomly generated pairs (AUC: M = 0.94, SD = 0.006).
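The three pairwise measures and the selective-fusion rule described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes both performers' responses are already on a common scale (the abstract does not say how the 5-point ratings and the machine similarity scores were aligned), it computes AUC by pairwise comparison of same-identity and different-identity trials, and the `threshold` parameter is a hypothetical cutoff on ΔAUC, since no value is reported.

```python
def auc(scores, labels):
    """Area under the ROC curve by pairwise comparison (ties count 0.5).

    labels: 1 = same identity, 0 = different identities.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def fuse(a, b):
    """Average two performers' trial-by-trial responses.

    Assumption: a and b are already on a comparable scale.
    """
    return [(x + y) / 2 for x, y in zip(a, b)]


def fusion_summary(a, b, labels):
    """Return the fused AUC, fusion benefit, and delta AUC for one pair."""
    auc_a, auc_b = auc(a, labels), auc(b, labels)
    fused_auc = auc(fuse(a, b), labels)
    return {
        "fused_auc": fused_auc,
        "fusion_benefit": fused_auc - max(auc_a, auc_b),
        "delta_auc": abs(auc_a - auc_b),
    }


def selective_fusion(a, b, labels, threshold):
    """Fuse only when the two baseline AUCs are close.

    threshold is a hypothetical cutoff on delta AUC; above it, the
    better performer's responses are kept unfused.
    """
    auc_a, auc_b = auc(a, labels), auc(b, labels)
    if abs(auc_a - auc_b) <= threshold:
        return auc(fuse(a, b), labels)
    return max(auc_a, auc_b)


# Toy data (made up for illustration): six trials, three same-identity.
labels = [1, 1, 1, 0, 0, 0]
rater_a = [5, 4, 2, 3, 1, 1]
rater_b = [4, 2, 5, 1, 3, 1]
print(fusion_summary(rater_a, rater_b, labels))
```

In this toy case the two raters are equally accurate but err on different trials, so ΔAUC is zero and averaging helps. Pairing `rater_a` with a much weaker partner would instead push ΔAUC above the cutoff, and `selective_fusion` would keep the better performer's AUC.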
Optimizing collaboration benefits at the level of individual performers (human or machine) is critical for reducing face-identification errors in applied settings (e.g., forensic face examination).
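The optimal-pairing step can be illustrated with a small sketch. The abstract cites a graph-theoretic method (Galil, 1986) but gives no implementation details, so the code below swaps in an exhaustive search that is exact only for small groups. It assumes the objective is the total (equivalently, the mean) fused AUC across disjoint pairs, and the fused-AUC table is made up for illustration. The names `best_pairing` and `fused_auc_table` are hypothetical.

```python
def best_pairing(members, score):
    """Exhaustively find the disjoint pairing with the highest total score.

    A brute-force stand-in for a graph matching algorithm: exact, but
    the number of pairings grows very quickly with group size.
    Returns (total_score, list_of_pairs).
    """
    if len(members) % 2 != 0:
        raise ValueError("an even number of members is required")
    if not members:
        return 0.0, []
    first, rest = members[0], members[1:]
    best_total, best_pairs = float("-inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_pairs = best_pairing(remaining, score)
        total = score(first, partner) + sub_total
        if total > best_total:
            best_total = total
            best_pairs = [(first, partner)] + sub_pairs
    return best_total, best_pairs


# Hypothetical fused AUCs for every pair among four performers.
fused_auc_table = {
    ("A", "B"): 0.90, ("A", "C"): 0.97, ("A", "D"): 0.92,
    ("B", "C"): 0.91, ("B", "D"): 0.96, ("C", "D"): 0.93,
}


def pair_score(x, y):
    """Look up the fused AUC for an unordered pair."""
    return fused_auc_table[tuple(sorted((x, y)))]


total, pairs = best_pairing(["A", "B", "C", "D"], pair_score)
print(pairs, total / len(pairs))
```

For realistic group sizes, a polynomial-time maximum-weight matching routine would be the practical choice; the exhaustive version is shown only because it makes the objective explicit.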