Cue-invariant 3D shape reconstruction from human brain activity
Poster Presentation 23.414: Saturday, May 16, 2026, 8:30 am – 12:30 pm, Pavilion
Session: 3D Shape and Space Perception: Cues, integration
Shiyun Yang1,2, Shuntaro Aoki1,2, Reo Tsukasa1,2, Misato Tanaka1,2, Yukiyasu Kamitani1,2; 1Graduate School of Informatics, Kyoto University, Sakyo-ku, Kyoto 606-8501, Japan; 2ATR Computational Neuroscience Laboratories, Soraku, Kyoto 619-0288, Japan.
The human brain effortlessly constructs three-dimensional (3D) percepts from depth cues within 2D retinal images. While the neural processing of individual depth cues is well documented, how these signals are integrated to form a coherent, cue-invariant 3D percept remains elusive. Here, we present a method to reconstruct 3D object appearance directly from human brain activity. Adopting a translator-generator framework, we decoded fMRI responses elicited by rendered 2D images into the latent space of a deep autoencoder trained on 3D point clouds, and then transformed these latent features into explicit 3D shapes using a generative model. This approach accurately reconstructed the 3D geometry of single objects from fMRI responses to their rendered 2D images, capturing global shape and orientation for both familiar object categories and entirely novel objects unseen during training. Crucially, the same decoder, trained exclusively on 2D pictorial depth cues, generalized to reconstruct 3D shapes from random-dot stereograms (RDSs), stimuli defined solely by binocular disparity. Reconstruction performance was generally high across the visual cortex for both stimulus types. In the RDS condition, however, performance in early visual areas tended to decline, while the dorsal visual cortex and MT-neighboring regions remained robust. Control experiments using "contour-matched" stereograms of slanted bars confirmed that the dorsal stream could recover the slanted 3D structures even when 2D contour information was experimentally silenced. These findings provide a tangible externalization of the brain's internal 3D world, offering direct evidence for a robust, cue-invariant representation of 3D appearance.
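To make the translator-generator pipeline concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the translator, so we assume a ridge-regression map from voxel patterns to autoencoder latents (a common choice in fMRI feature decoding); the generative model that maps latents to 3D point clouds is replaced here by a hypothetical random-weight placeholder (generate_point_cloud), and all data are synthetic stand-ins with illustrative shapes.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# --- Step 1: "translator" -- decode fMRI responses into the latent space
# of a 3D-shape autoencoder. Shapes are illustrative, not from the paper:
#   X: (n_trials, n_voxels) fMRI response patterns to rendered 2D images
#   Z: (n_trials, latent_dim) latents of the corresponding 3D point clouds,
#      taken from the autoencoder's bottleneck during training.
rng = np.random.default_rng(0)
n_trials, n_voxels, latent_dim = 1200, 5000, 512
X = rng.standard_normal((n_trials, n_voxels))   # synthetic fMRI data
Z = rng.standard_normal((n_trials, latent_dim)) # synthetic target latents

scaler = StandardScaler().fit(X)
translator = Ridge(alpha=100.0)  # L2-regularized linear map, multi-output
translator.fit(scaler.transform(X), Z)

# --- Step 2: "generator" -- map decoded latents to an explicit 3D shape.
# Hypothetical placeholder for the trained point-cloud generative model;
# it only emits an (n_points, 3) array so the pipeline runs end to end.
def generate_point_cloud(z, n_points=2048):
    w = rng.standard_normal((z.shape[-1], n_points * 3))
    return (z @ w).reshape(n_points, 3)

# Reconstruct a 3D shape from a held-out response pattern (e.g., a trial
# with a pictorial-cue image or, per the abstract's key test, an RDS):
x_test = rng.standard_normal((1, n_voxels))
z_hat = translator.predict(scaler.transform(x_test))
cloud = generate_point_cloud(z_hat[0])
print(cloud.shape)  # (2048, 3): x, y, z coordinates of the point cloud
```

Because the translator is trained only on responses to pictorial-cue renderings, the abstract's cue-invariance claim amounts to this same fitted map transferring, without retraining, to disparity-defined (RDS) response patterns at test time.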
Acknowledgements: Supported by JSPS KAKENHI (JP25H00450, JP20H05705, JP20H05954), NEDO (JPNP20006), and JST CREST (JPMJCR22P3), all awarded to YK.