Evaluating the Alignment of Machine and Human Explanations in Visual Object Recognition through a Novel Behavioral Approach

Poster Presentation 63.408: Wednesday, May 22, 2024, 8:30 am – 12:30 pm, Pavilion
Session: Object Recognition: Models

Yousif Kashef Alghetaa1, Simon Kornblith2, Kohitij Kar1; 1Department of Biology, York University, 2Anthropic PBC

Understanding how computer vision models make decisions is paramount, particularly as these models face increasing institutional scrutiny. The field of Explainable Artificial Intelligence (XAI) provides tools to interpret model decisions, but the resulting explanations are often at odds with one another. Kar et al. (2022) proposed evaluating the goodness of machine explanations by their alignment with human cognitive processes. This study builds on that concept and addresses the challenge of reliably approximating human explanations, a task complicated by the limitations of existing psychophysical tools such as 'bubbles' and classification images. We introduce a novel method to assess the alignment between human and machine explanations in object discrimination tasks. We establish a two-model framework: a target model (ResNet-50, whose explanations are under scrutiny) and a reference model (AlexNet, a fully differentiable stand-in for humans). The eventual objective is to compare the target model's explanations with human explanations. We begin by analyzing feature attribution maps (heat maps showing how image features influence model outputs) from both models. We compare these maps using various metrics to create a baseline ranking of explanation similarity between ResNet-50 and AlexNet. We then create explanation-masked images (EMIs) by retaining only the most informative pixels according to ResNet-50's (the target's) feature attributions. We hypothesize that the impact of these EMIs on the behavior of both models could reflect the similarity of their underlying explanations. We therefore estimate the object discrimination accuracy of both ResNet-50 and AlexNet on these EMIs; the correlation between their performances provides a second ranking of explanation similarity. Our results showed a significant correlation (Spearman R = 0.65, p = 0.003), indicating strong alignment between the two models' explanations. This finding sets the stage for extending our method to human subjects, using their behavioral responses to EMIs to evaluate the accuracy of ResNet-50's explanations and offering a new direction for comparing machine and human explanations.
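
For concreteness, the Python sketch below illustrates one way the EMI pipeline described above could be implemented. It assumes PyTorch/torchvision implementations of ResNet-50 and AlexNet, a vanilla-gradient attribution (the abstract does not name the attribution method), a mid-gray fill for masked pixels, and a sweep over retained-pixel fractions as the conditions across which the two models' accuracies are correlated; all of these specifics are illustrative assumptions, not the authors' exact procedure.

# Illustrative sketch (not the authors' exact procedure): gradient-based attribution,
# EMI construction, and accuracy correlation between a target and a reference model.
import torch
from torchvision import models
from scipy.stats import spearmanr

target = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()    # target model
reference = models.alexnet(weights=models.AlexNet_Weights.DEFAULT).eval()   # reference model

def attribution_map(model, image, label):
    # Vanilla-gradient attribution: |d logit_label / d pixel|, summed over color channels.
    # (Assumed here; the abstract does not specify the attribution method.)
    image = image.detach().clone().requires_grad_(True)
    model(image.unsqueeze(0))[0, label].backward()
    return image.grad.abs().sum(dim=0)                      # (H, W) attribution map

def make_emi(image, attribution, keep_fraction):
    # Explanation-masked image: keep only the top keep_fraction most informative pixels
    # (per the target's attribution) and fill the rest with mid-gray (an assumed fill value).
    k = max(1, int(keep_fraction * attribution.numel()))
    threshold = attribution.flatten().topk(k).values.min()
    mask = (attribution >= threshold).float()
    return image * mask + 0.5 * (1.0 - mask)

def accuracy(model, images, labels):
    # Top-1 object discrimination accuracy on a batch of EMIs.
    with torch.no_grad():
        return (model(images).argmax(dim=1) == labels).float().mean().item()

def explanation_alignment(images, labels, keep_fractions=(0.05, 0.1, 0.2, 0.4, 0.8)):
    # Correlate target and reference accuracies across EMI conditions; sweeping the
    # retained-pixel fraction is one illustrative way to generate the rank-ordered data.
    acc_target, acc_reference = [], []
    for frac in keep_fractions:
        emis = torch.stack([make_emi(img, attribution_map(target, img, lab), frac)
                            for img, lab in zip(images, labels)])
        acc_target.append(accuracy(target, emis, labels))
        acc_reference.append(accuracy(reference, emis, labels))
    return spearmanr(acc_target, acc_reference)              # (rho, p-value)

In the human extension described in the abstract, the reference model's accuracy would presumably be replaced by human behavioral accuracy on the same EMIs.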

Acknowledgements: Google Research, CFREF, Brain Canada, SFARI