Feature Visualizations do not sufficiently explain hidden units of Artificial Neural Networks

Poster Presentation 43.305: Monday, May 22, 2023, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Object Recognition: Models

There is a Poster PDF for this presentation, but you must be a current member or registered to attend VSS 2023 to view it.

Thomas Klein1,2, Wieland Brendel2, Felix Wichmann1; 1Neural Information Processing Group, University of Tübingen, 2Max Planck Institute for Intelligent Systems, Tübingen

Artificial Neural Networks (ANNs) have been proposed as computational models of the primate ventral stream because their performance on tasks such as image classification rivals or exceeds human baselines. However, useful models should not only predict data well but also offer insight into the systems they represent, which remains a challenge for ANNs. Here we investigate a specific method that has been proposed to shed light on the representations learned by ANNs: Feature Visualizations (FVs), that is, synthetic images specifically designed to excite individual units ("neurons") of the target network. In theory, these images should visualize the features a unit is sensitive to, analogous to receptive fields in neurophysiology. We conduct a psychophysical experiment to establish an upper bound on the interpretability afforded by FVs, in which participants must match five sets of exemplars (natural images that strongly activate certain units) to five sets of FVs of the same units, a task that should be trivial if FVs were informative. Extending earlier work that has cast doubt on the utility of this method, we show that (1) even human experts perform barely better than chance when trying to match a unit's FVs to its exemplars, and (2) matching exemplars to each other is much easier, even if only a single exemplar is shown per set. Presumably, this difficulty is caused not by so-called polysemantic units (neurons that code for multiple unrelated features, possibly mixing them in their visualizations) but by the unnatural visual appearance of FVs themselves. We also investigate the effect of visualizing units from different layers and find that the interpretability of FVs declines in later layers. This is contrary to what one might expect, since later layers are expected to represent semantic concepts. These findings highlight the need for better interpretability techniques if ANNs are ever to become useful models of human vision.
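To make the idea of a synthetic image "designed to excite" a unit concrete, the Python sketch below performs activation maximization by gradient ascent on the input. It is a toy illustration under stated assumptions, not the procedure used in this study: the "unit" is a hypothetical softplus of a linear filter `w` applied to a flattened 8×8 image, its gradient is written out by hand, and the image is kept within a unit L2-norm budget. FVs of trained deep networks are computed with automatic differentiation and typically add regularization and image parameterizations on top of this basic loop.

```python
import numpy as np


def unit_activation(x, w):
    """Hypothetical toy 'neuron': softplus of a linear filter response w.x."""
    return np.logaddexp(0.0, w @ x)  # numerically stable softplus


def unit_gradient(x, w):
    """Analytic gradient of softplus(w.x) with respect to the input image x."""
    s = 1.0 / (1.0 + np.exp(-(w @ x)))  # sigmoid(w.x), always in (0, 1)
    return s * w


def feature_visualization(w, steps=200, lr=0.1, seed=0):
    """Gradient ascent on the input, projected back onto the unit L2 ball."""
    rng = np.random.default_rng(seed)
    x = 0.01 * rng.standard_normal(w.shape)  # start from faint random noise
    for _ in range(steps):
        x = x + lr * unit_gradient(x, w)  # step uphill on the unit's activation
        norm = np.linalg.norm(x)
        if norm > 1.0:
            x = x / norm  # enforce a fixed "pixel energy" budget
    return x


# Usage: a random, hypothetical 8x8 filter flattened to 64 values.
w = np.random.default_rng(42).standard_normal(64)
x_star = feature_visualization(w)

# Cosine similarity between the synthesized image and the unit's filter.
cos = x_star @ w / (np.linalg.norm(x_star) * np.linalg.norm(w))
print(round(cos, 3))  # close to 1.0
```

For a unit this simple, the constrained optimum is the filter itself (`x_star` is proportional to `w`), so the FV recovers the unit's "receptive field" exactly, which is the analogy the abstract draws. The results above suggest that for units of real networks, human observers find the resulting images much harder to relate to the units' natural exemplars.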

Acknowledgements: Funded by EXC number 2064/1 – project number 390727645 and by the German Research Foundation (DFG): SFB 1233 – project number 276693517. The authors would like to thank the International Max Planck Research School for Intelligent Systems (IMPRS-IS) for supporting Thomas Klein.