Linguistic and visual similarity judgements predict EEG representational dynamics in visual perception and sentence reading

Poster Presentation 26.463: Saturday, May 18, 2024, 2:45 – 6:45 pm, Pavilion
Session: Scene Perception: Neural mechanisms

Katerina Marie Simkova1, Jasper JF van den Bosch2, Clayton Hickey1, Ian Charest3; 1CHBH, School of Psychology, University of Birmingham, 2School of Psychology, University of Leeds, 3cerebrUM, Département de Psychologie, Université de Montréal

Emerging evidence in cognitive and computational neuroscience suggests that multi-modal computational models converge on representations that improve performance in each of the contributing modalities. This latent representational space also enables the prediction of brain response profiles across modalities, but it remains unclear how vision and language share meaningful representations in the human brain. Here, we collected ~7 hours of electroencephalography (EEG) data from each of six participants while they passively viewed 100 natural scene images or actively read 100 sentence captions describing the images. Activity pattern similarity was estimated using a cross-validated Mahalanobis distance computed on a spatiotemporal transformation of the modality-specific EEG data across all pairs of conditions. To establish the presence of shared representations in both modalities and to assess their behavioural relevance, we collected behavioural similarity judgements through multiple arrangement (MA) tasks on the sets of images and sentences from two independent groups of participants (n = 24 and n = 22). These judgements were used to construct visual and linguistic fixed model RDMs, each characterising the unique similarity structure of one modality. We then quantified the extent to which the behavioural model RDMs generalise to the visual and linguistic EEG RDMs using cosine similarity. We observed a significant relationship between the visually evoked EEG RDMs and both MA models (visual MA: 0.138 ± 0.024; linguistic MA: 0.136 ± 0.024). Interestingly, both MA models also showed significant overlap with the linguistic EEG RDMs (visual MA: 0.042 ± 0.007; linguistic MA: 0.043 ± 0.007; all p < 0.001). These results hold when controlling for the potential influence of prior exposure to the cross-modal stimuli. We demonstrate that a similar representation emerges regardless of whether participants viewed an image or read its sentence caption, providing further evidence for behaviourally relevant shared representations in vision and language.
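
For readers who want a concrete picture of the analysis steps, the sketch below shows one way to compute a cross-validated Mahalanobis (crossnobis) RDM from fold-wise EEG activity patterns and to compare it against a behavioural model RDM with cosine similarity. The function names, array shapes, and the identity noise covariance are illustrative assumptions, not the authors' implementation (which applies the distance to a spatiotemporal transformation of the EEG data).

```python
# Minimal sketch of crossnobis RDM estimation and RDM comparison.
# All names, shapes, and the identity noise covariance are assumptions for illustration.
import numpy as np
from itertools import combinations

def crossnobis_rdm(patterns, noise_cov):
    """patterns: (n_folds, n_conditions, n_features) fold-wise mean activity patterns.
    noise_cov: (n_features, n_features) noise covariance estimate.
    Returns a condition-by-condition cross-validated Mahalanobis distance matrix."""
    n_folds, n_cond, n_feat = patterns.shape
    prec = np.linalg.pinv(noise_cov)  # noise precision defines the Mahalanobis metric
    rdm = np.zeros((n_cond, n_cond))
    for i, j in combinations(range(n_cond), 2):
        diffs = patterns[:, i, :] - patterns[:, j, :]  # (n_folds, n_features)
        # Cross-validate by averaging products of difference patterns from independent folds.
        d, n_pairs = 0.0, 0
        for f1, f2 in combinations(range(n_folds), 2):
            d += diffs[f1] @ prec @ diffs[f2]
            n_pairs += 1
        rdm[i, j] = rdm[j, i] = d / (n_pairs * n_feat)
    return rdm

def cosine_rdm_similarity(rdm_a, rdm_b):
    """Cosine similarity between the upper-triangular parts of two RDMs."""
    iu = np.triu_indices_from(rdm_a, k=1)
    a, b = rdm_a[iu], rdm_b[iu]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy usage with random data standing in for EEG patterns and a behavioural model RDM.
rng = np.random.default_rng(0)
eeg_patterns = rng.standard_normal((4, 100, 64))  # 4 folds, 100 conditions, 64 channels
noise_cov = np.eye(64)                            # identity in lieu of an estimated covariance
eeg_rdm = crossnobis_rdm(eeg_patterns, noise_cov)
model_rdm = rng.random((100, 100))
model_rdm = (model_rdm + model_rdm.T) / 2
np.fill_diagonal(model_rdm, 0)
print(cosine_rdm_similarity(eeg_rdm, model_rdm))
```

Because the cross-validated distance is averaged over products of independent fold differences, its expected value is zero when two conditions evoke the same pattern, which is what makes the reported cosine similarities interpretable around a meaningful zero point.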