Norm-referenced Encoding Supports Transfer Learning of Expressions across Strongly Different Head Shapes

Poster Presentation: Wednesday, May 22, 2024, 8:30 am – 12:30 pm, Pavilion
Session: Face and Body Perception: Models

Martin A. Giese1, Michael Stettler1, Alexander Lappe1, Nick Taubert1, Ramona Siebert1, Peter Thier1; 1Hertie Institute / CIN, University Clinic Tuebingen

Humans spontaneously recognise human facial expressions even on non-human heads, cartoons, or animal faces, although they have never seen such expressions on these head shapes before. The computational basis of this generalization capability is unknown. We propose a novel deep neural network architecture that exploits norm-referenced encoding of facial expressions: expressions are represented in terms of the shifts of facial landmarks relative to the neutral face. RESULTS: We tested the model by training it with human expressions and only a single neutral face of each non-human head shape. The developed architecture accomplishes 100% classification accuracy on a matching expression dataset that we developed, which contains the same expressions retargeted to very different head shapes (humans, monkeys, and cartoon avatars). In contrast, established deep neural network architectures reach at most 63.1% accuracy on this transfer-learning task (chance level: 14.3%). At the same time, the proposed neural representation reproduces the gradual, analog encoding of expression strength that has been observed in face-selective neurons in the superior temporal sulcus of monkey cortex. We also demonstrate that the proposed architecture scales up in a data-efficient manner: it performs better on the FERG cartoon dataset (Aneja et al. 2016) than the original method proposed by these authors (92.15% vs. 89.02% accuracy). CONCLUSION: Norm-referenced encoding provides an interesting theoretical concept, not only for explaining the tuning properties of face-selective neurons, but also for optimized transfer learning of expressions across different facial shapes.
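The core idea of norm-referenced encoding, representing an expression by the displacement of landmarks relative to that head's own neutral face, can be sketched as follows. This is a minimal illustrative sketch in NumPy, not the authors' deep network: all function names, the cosine-similarity classifier, and the toy landmark data are our own assumptions.

```python
import numpy as np

def norm_referenced_code(landmarks, neutral):
    """Encode an expression as (direction, strength) of the landmark
    shifts relative to this head's own neutral face."""
    delta = (landmarks - neutral).ravel()          # landmark displacement field
    strength = np.linalg.norm(delta)               # expression strength
    direction = delta / strength if strength > 0 else delta
    return direction, strength

def classify(landmarks, neutral, reference_directions):
    """Assign the reference expression whose displacement direction
    best matches (cosine similarity); strength is read out separately."""
    direction, strength = norm_referenced_code(landmarks, neutral)
    sims = {name: float(ref @ direction) for name, ref in reference_directions.items()}
    return max(sims, key=sims.get), strength

# Toy example: two mouth-corner landmarks in 2-D; a 'smile' lifts them.
neutral = np.array([[0.0, 0.0], [1.0, 0.0]])
smile_ref, _ = norm_referenced_code(np.array([[0.0, 0.2], [1.0, 0.2]]), neutral)
frown_ref, _ = norm_referenced_code(np.array([[0.0, -0.2], [1.0, -0.2]]), neutral)
refs = {"smile": smile_ref, "frown": frown_ref}

# A weak smile on a differently shaped (shifted) head is still classified
# as 'smile', because only shifts from that head's neutral face matter.
neutral2 = np.array([[0.5, 1.0], [2.0, 1.0]])
weak_smile = neutral2 + np.array([[0.0, 0.1], [0.0, 0.1]])
label, strength = classify(weak_smile, neutral2, refs)
print(label, round(strength, 3))  # → smile 0.141
```

Because the code depends only on displacements from each head's own neutral face, a single neutral example of a new head shape suffices to transfer the expression classes, mirroring the training regime described in the abstract; the continuous `strength` value corresponds to the gradual encoding of expression strength.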

Acknowledgements: Supported by ERC 2019-SyG-RELEVANCE-856495, SSTeP-KiZ BMG: ZMWI1-2520DAT700, and NVIDIA Corp.