Cue-invariant 3D shape reconstruction from human brain activity
Poster Presentation 23.414: Saturday, May 16, 2026, 8:30 am – 12:30 pm, Pavilion
Session: 3D Shape and Space Perception: Cues, integration
Shiyun Yang1,2, Shuntaro Aoki1,2, Reo Tsukasa1,2, Misato Tanaka1,2, Yukiyasu Kamitani1,2; 1Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan; 2ATR Computational Neuroscience Laboratories, Soraku, Kyoto 619-0288, Japan.
The human brain effortlessly constructs three-dimensional (3D) percepts from depth cues within 2D retinal images. While the neural processing of individual depth cues is well documented, how these signals are integrated to form a coherent, cue-invariant 3D percept remains elusive. Here, we present a method to reconstruct 3D object appearance directly from human brain activity. Adopting a translator-generator framework, we decoded fMRI responses elicited by rendered 2D images into the latent space of a deep autoencoder trained on 3D point clouds, and then transformed these latent features into explicit 3D shapes using a generative model. This approach accurately reconstructed the 3D geometry of single objects from fMRI responses to their rendered 2D images, capturing global shape and orientation for both familiar object categories and entirely novel objects unseen during training. Crucially, the same decoder, trained exclusively on 2D pictorial depth cues, generalized to reconstruct 3D shapes from random-dot stereograms (RDSs), stimuli defined solely by binocular disparity. Reconstruction performance was generally high across the visual cortex for both stimulus types. In the RDS condition, however, performance in early visual areas tended to decline, while the dorsal visual cortex and MT-neighboring regions remained robust. Control experiments using "contour-matched" stereograms of slanted bars confirmed that the dorsal stream could recover the slanted 3D structures even when 2D contour information was experimentally silenced. These findings provide a tangible externalization of the brain's internal 3D world, offering direct evidence for a robust, cue-invariant representation of 3D appearance.
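To make the translator-generator pipeline concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the translator, so we assume a ridge-regression map from voxel patterns to autoencoder latents (a common choice in fMRI feature decoding); the generative model that maps latents to 3D point clouds is replaced here by a hypothetical random-weight placeholder (generate_point_cloud), and all data are synthetic stand-ins with illustrative shapes.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# --- Step 1: "translator" -- decode fMRI responses into the latent space
# of a 3D-shape autoencoder. Shapes are illustrative, not from the paper:
#   X: (n_trials, n_voxels) fMRI response patterns to rendered 2D images
#   Z: (n_trials, latent_dim) latents of the corresponding 3D point clouds,
#      taken from the autoencoder's bottleneck during training.
rng = np.random.default_rng(0)
n_trials, n_voxels, latent_dim = 1200, 5000, 512
X = rng.standard_normal((n_trials, n_voxels))   # synthetic fMRI data
Z = rng.standard_normal((n_trials, latent_dim)) # synthetic target latents

scaler = StandardScaler().fit(X)
translator = Ridge(alpha=100.0)  # L2-regularized linear map, multi-output
translator.fit(scaler.transform(X), Z)

# --- Step 2: "generator" -- map decoded latents to an explicit 3D shape.
# Hypothetical placeholder for the trained point-cloud generative model;
# it only emits an (n_points, 3) array so the pipeline runs end to end.
def generate_point_cloud(z, n_points=2048):
    w = rng.standard_normal((z.shape[-1], n_points * 3))
    return (z @ w).reshape(n_points, 3)

# Reconstruct a 3D shape from a held-out response pattern (e.g., a trial
# with a pictorial-cue image or, per the abstract's key test, an RDS):
x_test = rng.standard_normal((1, n_voxels))
z_hat = translator.predict(scaler.transform(x_test))
cloud = generate_point_cloud(z_hat[0])
print(cloud.shape)  # (2048, 3): x, y, z coordinates of the point cloud
```

Because the translator is trained only on responses to pictorial-cue renderings, the abstract's cue-invariance claim amounts to this same fitted map transferring, without retraining, to disparity-defined (RDS) response patterns at test time.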
Acknowledgements: Supported by JSPS KAKENHI (JP25H00450, JP20H05705, JP20H05954), NEDO (JPNP20006), and JST CREST (JPMJCR22P3), all awarded to YK.