VSS 2022, May 13-18

Multisensory Processing

Talk Session: Tuesday, May 17, 2022, 8:15 – 9:45 am EDT, Talk Room 1
Moderator: Abigail Noyce, Carnegie Mellon

Talk 1, 8:15 am, 51.11

Early visual cortex represents human sounds more distinctly than non-human sounds

Giusi Pollicina1, Polly Dalton1, Petra Vetter2; 1Royal Holloway, University of London, 2University of Fribourg

Numerous feedback connections link early visual cortex to several other cortical areas. Among these, feedback from auditory cortex is sufficient to produce distinguishable neural activity in early visual cortex when participants listen to different natural sounds in the absence of visual stimulation or sight (Vetter, Smith & Muckli, 2014, Current Biology; Vetter, Bola et al., 2020, Current Biology). However, the content of this flow of information has not yet been fully explored. Our study asked with what degree of specificity auditory information is fed back to visual cortex. We presented a large sample of sounds to 18 blindfolded participants while acquiring functional MRI data. The 36 natural sounds were selected to span a hierarchy of semantic categories (e.g. animate sounds, divided into human and animal sounds and further into specific species or sound types, and likewise for inanimate sounds). The boundaries of V1, V2 and V3 were defined using individual retinotopic mapping. We analysed the fMRI activity patterns evoked by these sounds in each early visual region using multivoxel pattern analysis (MVPA). The MVPA classifier distinguished animate from inanimate sounds, as well as human, animal, vehicle and object sounds, significantly above chance in early visual cortex. Pairwise classification showed that sounds produced by humans were generally better distinguished than sounds from other semantic categories. Searchlight analyses showed that decoding also succeeded in higher-level visual and multisensory regions. These results suggest that auditory feedback relays categorical information about sounds, particularly human sounds, to areas once believed to be exclusively specialised for vision. We conclude that early visual cortex function is not restricted to the processing of low-level visual features, but includes the representation, and potentially the use, of semantic and categorical sound information, which might serve to predict visual stimuli.
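The abstract does not spell out the decoding implementation, so the following is only a rough sketch of how a cross-validated MVPA category classification is commonly set up (here with scikit-learn on simulated voxel patterns); the array shapes, labels, and classifier choice are illustrative assumptions, not the authors' pipeline.

```python
# Illustrative sketch only (simulated data, not the authors' pipeline): decoding
# sound category from early-visual-cortex voxel patterns with a linear classifier
# and leave-one-run-out cross-validation.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)

n_runs, n_sounds, n_voxels = 6, 36, 500                  # assumed dimensions
X = rng.standard_normal((n_runs * n_sounds, n_voxels))   # one pattern per sound and run
y = np.tile(np.repeat([0, 1], n_sounds // 2), n_runs)    # e.g. 0 = animate, 1 = inanimate
runs = np.repeat(np.arange(n_runs), n_sounds)            # run labels for cross-validation

clf = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))

# Train on all runs but one, test on the held-out run, and average across folds.
scores = cross_val_score(clf, X, y, groups=runs, cv=LeaveOneGroupOut())
print(f"Mean decoding accuracy: {scores.mean():.3f} (chance = 0.5)")
```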

Acknowledgements: Departmental PhD studentship to GP; research grant from the John Templeton Foundation (Prime Award No 48365) as part of the Summer Seminars in Neuroscience and Philosophy (SSNAP, subcontract No 283-2608) to PV; PRIMA grant by the Swiss National Science Foundation (PR00P1_185918/1) to PV.

Talk 2, 8:30 am, 51.12

Measuring EEG correlates of individual differences in visual and auditory top-down attention

Jasmine Kwasa1, Abigail Noyce1, Barbara Shinn-Cunningham1; 1Carnegie Mellon University Neuroscience Institute

Individual variability in performance on selective attention tasks that require top-down control can be large. Recent work also suggests that auditory spatial selective attention engages the same fronto-parietal network used by visual spatial selective attention; however, the brain regions that process visual and auditory sensory input are organized quite differently. We hypothesized that, despite these dissimilarities, individual differences in performance would correlate between visual and auditory selective attention tasks, and that electroencephalography (EEG) measures, namely event-related potential (ERP) N1 responses and inter-trial phase coherence (ITC) peaks, would correspond to attention performance. We collected EEG while subjects sustained selective attention to a cued spatial location (left or right) and reported the visual or auditory sequence presented there while ignoring a distractor sequence. Behavioral performance on the tasks was correlated across sensory modalities. At the group level, spatial attention modulated N1 and ITC peak responses in both the auditory and visual tasks. At the individual level, the degree of N1 modulation correlated with task performance for the auditory but not the visual task, while ITC peak modulation correlated with performance for the visual but not the auditory task. Our work supports the idea of a shared fronto-parietal attention network while highlighting differences in how best to measure the effects of top-down attention in these two sensory modalities.
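The abstract leaves the ERP and ITC computations implicit; as a loose illustration of the two quantities it refers to, the sketch below computes an N1-window amplitude and inter-trial phase coherence from simulated single-trial data at one channel. The time window, frequency band, and all array shapes are assumptions made for the example only.

```python
# Illustrative sketch (not the authors' pipeline): N1 amplitude and inter-trial
# phase coherence (ITC) from simulated single-trial EEG at one channel.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

rng = np.random.default_rng(1)
fs = 500                                   # assumed sampling rate (Hz)
times = np.arange(-0.2, 0.8, 1 / fs)       # epoch from -200 to 800 ms
n_trials = 100

# Simulated trials: noise plus a small negative deflection around 100 ms post-stimulus.
evoked = -2e-6 * np.exp(-((times - 0.1) ** 2) / (2 * 0.02 ** 2))
trials = rng.standard_normal((n_trials, times.size)) * 5e-6 + evoked

# --- ERP N1: mean amplitude of the trial average in an assumed 80-130 ms window ---
erp = trials.mean(axis=0)
n1_mask = (times >= 0.08) & (times <= 0.13)
n1_amplitude = erp[n1_mask].mean()

# --- ITC: band-pass (assumed 4-8 Hz), take the phase of the analytic signal, and
#     measure how consistently that phase lines up across trials at each time point ---
b, a = butter(4, [4, 8], btype="bandpass", fs=fs)
phase = np.angle(hilbert(filtfilt(b, a, trials, axis=1), axis=1))
itc = np.abs(np.exp(1j * phase).mean(axis=0))   # 0 = random phase, 1 = perfect locking

print(f"N1 mean amplitude: {n1_amplitude * 1e6:.2f} uV")
print(f"Peak ITC: {itc.max():.2f}")
```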

Acknowledgements: This work was supported by NIDCD R01 DC013825 (to BGSC), ONR project N00014-20-1-2709 (to BGSC), and NINDS F99 NS115331 (to JAK).

Talk 3, 8:45 am, 51.13

Thinking outside the box in the study of visual preferences: External elements determine ‘goodness’ judgments for a dot within a square frame

Jiangxue Valentina Ning1, Benjamin van Buren1; 1The New School

Imagine that you are at an art museum, making a judgment about how good a painting looks. What factors would contribute to your judgment? Whereas previous investigations of visual preferences have focused on the arrangement of elements within frames, here we asked about the influence of *extra-frame* factors. In Experiments 1a and 1b, we asked whether the orientation of a square frame with respect to its surroundings influences ‘goodness of fit’ ratings for single dots placed within it. In both experiments, in the unrotated condition, observers gave relatively high ratings to compositions in which the dot was positioned near the square’s vertical symmetry axis. However, in the 45°-rotated condition (which looked like a diamond), observers instead gave relatively high ratings to compositions in which the dot was positioned near the diamond’s vertical symmetry axis. These results suggest that the relationship between a frame and its surroundings dramatically changes which of its internal symmetry axes plays the greatest role in determining visual preference judgments. An alternative explanation, however, is that observers simply preferred to see dots placed along the egocentric midline. We ruled this possibility out in Experiments 2a and 2b, which investigated the effects of another contextual manipulation: an additional ‘accentuating’ dot placed just outside one of the square’s corners. In both experiments, in the unaccentuated condition, observers again gave relatively high ratings to compositions in which the dot was positioned near the square’s vertical symmetry axis. However, in conditions with an accentuating dot outside one of the square’s corners, there was a marked shift, such that observers now gave relatively high ratings to compositions in which the dot was positioned near the symmetry axis bisecting the accentuated corner. These results collectively demonstrate that, in order to understand visual preferences for dots within a frame, one needs to think outside the box.

Talk 4, 9:00 am, 51.14

Greater sensitivity to visual-vestibular conflict correlates with lower VR sickness

Savannah Halow1, Allie Hamilton2, Eelke Folmer3, Paul MacNeilage4; 1University of Nevada, Reno

Visual-vestibular conflict during virtual reality (VR) use is thought to cause VR sickness, but the relation between conflict sensitivity and sickness is poorly understood. We investigated this relationship by manipulating fixation behavior (head-fixed/scene-fixed) and retinal stimulus location (RSL: central/peripheral/full) in a 2x3 design during a conflict detection task. We measured sickness with Simulator Sickness Questionnaires completed before and after each condition and with discomfort scores collected every 3 minutes. During each trial, subjects made yaw head movements of 15-50° over ~1.5 seconds while fixating either an environmentally fixed target (scene-fixed) or a target fixed relative to their field of view (FOV; head-fixed). We manipulated RSL by reducing the FOV to ~40° with a peripheral mask (central), by applying a ~40° scotoma (peripheral), or by leaving the FOV unrestricted (full). The visual scene was an optokinetic drum displayed on the HTC Vive Pro Eye. Visual scene motion was manipulated to be slower or faster than the subject’s head movement, and subjects reported the direction of conflict on each trial (as ‘with’ or ‘against’ the head movement, respectively). We fit a psychometric function to the responses to estimate the visual gain (visual/head speed) perceived as stationary (PSE, accuracy) and the range of gains compatible with perception of a stationary visual environment (JND, precision). Results show correlations between JND, PSE, and sickness scores. Better precision is associated with better accuracy during conflict detection, and better accuracy and precision are both associated with lower reports of VR sickness. Additionally, sensitivity to conflict (lower JND) was greatest and sickness was lowest in the scene-fixed and central conditions, consistent with the known benefits of both natural fixation behavior and FOV restriction. These results are, to our knowledge, the first to demonstrate an association between conflict sensitivity and VR sickness.
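The psychometric fit itself is not detailed in the abstract; one common choice is a cumulative-Gaussian function of visual gain, from which the PSE is the fitted mean and the JND can be derived from the fitted slope. The sketch below shows that approach with scipy on fabricated response proportions; the gain levels, data, and JND convention are assumptions for illustration only.

```python
# Sketch of a cumulative-Gaussian psychometric fit yielding PSE and JND.
# Gains and response proportions are fabricated for illustration only.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(gain, pse, sigma):
    """Probability of reporting the scene as moving 'against' the head movement."""
    return norm.cdf(gain, loc=pse, scale=sigma)

# Visual gain = visual speed / head speed; 1.0 means the scene moves exactly with the head.
gains = np.array([0.70, 0.80, 0.90, 1.00, 1.10, 1.20, 1.30])
p_against = np.array([0.05, 0.12, 0.30, 0.55, 0.78, 0.92, 0.97])   # made-up proportions

params, _ = curve_fit(psychometric, gains, p_against, p0=[1.0, 0.1])
pse, sigma = params

# One common JND convention: half the gain difference between the 25% and 75% points,
# which for a cumulative Gaussian equals sigma * norm.ppf(0.75).
jnd = sigma * norm.ppf(0.75)

print(f"PSE (gain perceived as stationary): {pse:.3f}")
print(f"JND (precision): {jnd:.3f}")
```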

Acknowledgements: Research was supported by NIGMS of NIH under grant number P20 GM103650 and by NSF under grant number IIS-1911041

Talk 5, 9:15 am, 51.15

Multisensory processing supports deep encoding of visual objects

Shea E. Duarte1, Joy J. Geng1; 1University of California, Davis

Visual object recognition memory can be improved when an object is encoded alongside its characteristic sound (e.g., a dog and a bark). Recent research showed that this redundant auditory information specifically benefits recollection-based recognition memory, suggesting that the enhancement reflects memory for specific details of the perceptually encoded event rather than general object familiarity (Duarte et al., 2021). While previous work has focused exclusively on audiovisual memory effects for individual objects, recollection-specific effects may be affected by the presence of additional items in the visual field at encoding. In the present work, we investigated whether this recollection improvement is affected by the presence of a second visual object at encoding. Participants performed an audiovisual encoding task in which pairs of visual objects were presented with a sound that was congruent with one of the objects or with a control white-noise sound. Participants reported whether just one of the objects (indicated by a retroactive cue) would fit in a suitcase (Experiment 1) or whether the two items were related (Experiment 2). In both experiments, they then performed a remember/know recognition task on each individual visual item. Results from Experiment 1 (n=50) replicated the finding that recollection was improved for objects paired with congruent sounds at encoding relative to those paired with control sounds, even with an additional visual object present at encoding. Experiment 2 (n=50) showed that when participants were required to consider the relationship between the two visual items at encoding, the memory benefit of audiovisual processing was mitigated, such that recollection- and familiarity-based recognition did not differ between conditions. These results suggest that multisensory processing supports visual memory by facilitating elaboration on an object’s identity, and that this facilitation is reduced when task demands require elaboration on both an audiovisual and a visual stimulus.
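The abstract reports recollection and familiarity estimates from remember/know judgments without stating the estimator; one widely used option is the independence remember/know (IRK) correction, sketched below on fabricated counts. This is an assumption for illustration and may not match the authors' analysis.

```python
# Illustrative IRK-style estimates of recollection and familiarity from
# remember/know responses. Counts are fabricated; the authors' estimator may differ,
# and a fuller analysis would also correct for false alarms to new items.
def irk_estimates(n_remember, n_know, n_miss):
    """Return (recollection, familiarity) under the independence remember/know assumption."""
    n_old = n_remember + n_know + n_miss
    recollection = n_remember / n_old                  # P("remember" | old item)
    know_rate = n_know / n_old
    familiarity = know_rate / (1.0 - recollection)     # "know" responses can only occur
    return recollection, familiarity                   # when the item is not recollected

# Made-up response counts for congruent-sound vs. control-sound objects.
congruent = irk_estimates(n_remember=60, n_know=25, n_miss=15)
control = irk_estimates(n_remember=45, n_know=30, n_miss=25)
print(f"Congruent: recollection={congruent[0]:.2f}, familiarity={congruent[1]:.2f}")
print(f"Control:   recollection={control[0]:.2f}, familiarity={control[1]:.2f}")
```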

Talk 6, 9:30 am, 51.16

EEG evoked activity suggests amodal evidence integration in multisensory decision-making

Thomas Schaffhauser1, Alain De Cheveigné1, Yves Boubenec1, Pascal Mamassian1; 1CNRS & Ecole Normale Supérieure, Paris, France

Recent work in neuroimaging has revealed neural signatures of evidence integration (O’Connell et al., 2012, Nat Neuro; Philiastides et al., 2014, J Neuro) that reflect the ramping activity of neurons in the parietal cortex. While those experiments focused on unisensory visual and auditory perceptual decision-making, it is unclear to what extent the neural correlates of multisensory evidence integration are shared with their unisensory counterparts. To address this issue, we designed a change detection paradigm in which twenty-one participants monitored a continuous stream of visual random dot motion and auditory tone clouds. The random dot motion was displayed within a circular aperture and consisted of 200 small dots repositioned every 50 ms. The tone clouds consisted of 10 simultaneous 50 ms pure tones drawn from a range of 6 octaves (220 to 14,080 Hz) with a resolution of 12 semitones per octave. Within this continuous bimodal stream, participants had to detect unisensory changes (a change from incoherent noise to a coherent pattern of upward-moving dots or rising tone sequences) or bimodal changes (simultaneous auditory and visual changes in coherence) while EEG was continuously acquired from 64 scalp electrodes. EEG activity was denoised with spatial filtering techniques to isolate the components that capture the neural activity most reproducibly evoked by change onset (de Cheveigné & Simon, 2008, J Neuro Methods). Evoked EEG activity could be discriminated between visual and auditory target stimuli, highlighting separable encoding of visual and auditory coherence changes. Further analyses revealed a component that ramped up before participants’ responses, echoing evidence accumulation, and that appeared to be common to unisensory (visual, auditory) and redundant audio-visual changes. These results point to a single amodal accumulator that integrates evidence coming from each sensory modality in isolation or from a combined bimodal signal.
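The cited denoising approach (de Cheveigné & Simon, 2008) finds spatial filters that maximize the reproducibility of stimulus-evoked activity across trials; the sketch below illustrates that general idea as a generalized eigenvalue problem contrasting evoked and total covariance, applied to simulated epochs. The data shapes, the simulated signal, and the absence of any regularization are simplifying assumptions, not the published implementation.

```python
# Sketch of DSS-style spatial filtering: find linear combinations of channels whose
# trial-averaged (evoked) power is maximal relative to total power. Simulated data;
# shapes and preprocessing are illustrative assumptions only.
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
n_trials, n_channels, n_times = 200, 64, 300

# Simulated epochs: channel noise plus a weak evoked component shared across trials
# (a 5 Hz oscillation, assuming a 250 Hz sampling rate).
mixing = rng.standard_normal(n_channels)
evoked_course = np.sin(2 * np.pi * 5 * np.arange(n_times) / 250)
epochs = rng.standard_normal((n_trials, n_channels, n_times))
epochs += 0.3 * mixing[None, :, None] * evoked_course[None, None, :]

# Total covariance from single trials, "biased" covariance from the trial average.
X = epochs.transpose(1, 0, 2).reshape(n_channels, -1)       # channels x (trials*time)
c_total = X @ X.T / X.shape[1]
avg = epochs.mean(axis=0)                                    # channels x time
c_evoked = avg @ avg.T / n_times

# Generalized eigenvectors ordered by evoked-to-total power ratio (largest last).
eigvals, eigvecs = eigh(c_evoked, c_total)
dss_filter = eigvecs[:, -1]                                  # most reproducible component
component = dss_filter @ avg                                 # its evoked time course

print(f"Top component evoked/total power ratio: {eigvals[-1]:.3f}")
```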