Reconstructing Visual Inputs from Enhanced 3T fMRI Data via an Optimal Transport Guided Generative Adversarial Network

Poster Presentation: Tuesday, May 21, 2024, 2:45 – 6:45 pm, Pavilion
Session: Spatial Vision: Machine learning, neural networks

Yujian Xiong1, Wenhui Zhu1, Yalin Wang1, Zhong-Lin Lu2,3; 1Arizona State University, 2NYU Shanghai, 3New York University

Reconstructing visual inputs from functional Magnetic Resonance Imaging (fMRI) to unravel the intricacies of the human visual system has made significant strides with deep learning. However, these methods still demand high-quality, subject-specific 7-Tesla (7T) fMRI experiments; integrating smaller 3-Tesla (3T) datasets, or accommodating subjects who can only complete short, low-quality scans, remains a hurdle. Here we propose a novel framework employing an Optimal Transport Guided Generative Adversarial Network (GAN) to enhance 3T fMRI, overcoming both the scarcity of 7T data and the challenges of short, low-quality 3T scans, which place less burden on subjects. Our model, the OT Guided GAN, comprises a six-layer U-Net designed to enhance 3T fMRI scans to a quality comparable to the original 7T scans. Training is conducted across 17 subjects drawn from two datasets with distinct experimental conditions: the 7T Natural Scenes Dataset and the 3T Natural Object Dataset. The two datasets share a common set of input images viewed by both the 3T and 7T subjects, enabling an unsupervised training scenario. Two linear regression models then transform the combined set of original 7T and enhanced 3T fMRI responses into inputs for the pre-trained Stable Diffusion model, which reconstructs the visual input images. We test the framework's ability to reconstruct natural-scene images from a 3T subject held out of training. Evaluated by the Fréchet Inception Distance (FID) score and human judgment, the enhanced 3T fMRI data produce superior reconstructions compared to recent methods that demand extensive 7T data. Once adequately trained, the framework can enhance 3T fMRI from any new subject beyond the training set, and the enhanced data can then support demanding reconstruction tasks that would otherwise require 7T scanning.
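The abstract does not specify the exact optimal-transport formulation used to guide the GAN, but the core idea of OT guidance can be illustrated with an entropy-regularized (Sinkhorn) OT cost between a batch of enhanced-3T features and a batch of 7T features, which could serve as a distribution-alignment term in the generator loss. The sketch below is a generic, hypothetical illustration (the function name `sinkhorn_cost`, the squared-Euclidean ground cost, and the uniform sample weights are all assumptions, not details from the poster):

```python
import numpy as np

def _logsumexp(z, axis):
    # Numerically stable log-sum-exp along one axis.
    m = z.max(axis=axis, keepdims=True)
    return np.squeeze(m + np.log(np.exp(z - m).sum(axis=axis, keepdims=True)), axis=axis)

def sinkhorn_cost(x, y, eps=0.05, n_iters=300):
    """Entropy-regularized OT cost between two empirical batches
    (rows = feature vectors), via log-domain Sinkhorn iterations."""
    # Pairwise squared-Euclidean ground cost (an assumed choice).
    C = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    n, m = C.shape
    log_a = np.full(n, -np.log(n))  # uniform weights on batch x
    log_b = np.full(m, -np.log(m))  # uniform weights on batch y
    f, g = np.zeros(n), np.zeros(m)  # dual potentials
    for _ in range(n_iters):
        f = eps * (log_a - _logsumexp((g[None, :] - C) / eps, axis=1))
        g = eps * (log_b - _logsumexp((f[:, None] - C) / eps, axis=0))
    P = np.exp((f[:, None] + g[None, :] - C) / eps)  # transport plan
    return float((P * C).sum())  # transport cost under the plan
```

In a GAN training loop, a term like `sinkhorn_cost(enhanced_3t_batch, real_7t_batch)` could be added to the generator objective so that enhanced 3T responses are pushed toward the 7T response distribution; the log-domain updates keep the iterations stable for small regularization `eps`.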

Acknowledgements: The work was partially supported by NSF (DMS-1413417 & DMS-1412722) and NIH (R01EY032125 & R01DE030286).