Better Models Through Worse Images: Degradation Training Helps Align CNNs with Humans

Poster Presentation 56.415: Tuesday, May 19, 2026, 2:45 – 6:45 pm, Pavilion
Session: Object Recognition: Models

Connor Parde1, Zelin Zhao1, Frank Tong1; 1Vanderbilt University, Department of Psychology

Human object recognition is remarkably robust to blur, noise, and adverse environmental conditions; convolutional neural networks (CNNs), by contrast, often fail when presented with degraded input. A central question in vision science is how the visual system develops such robustness, whether artificial systems can acquire similar tolerance through targeted training, and which training regimes yield robustness that both improves performance and aligns CNNs with human perception. Here, we directly compared human blur and noise thresholds with the robustness profiles of CNNs trained on degraded images. Human participants (n=20 per condition) performed a progressive-revelation task using 800 images from 16 ImageNet categories, in which stimuli were initially presented with extreme blur or noise and gradually became clearer; recognition thresholds were defined as the degradation level at which the correct category was first identified. AlexNet and VGG-19 models were trained on the same categories under multiple regimes: standard images, blur, Gaussian noise, uniform noise, or adverse-weather corruption (50% or 75% snow/fog/rain). CNN thresholds were computed analogously, as the highest degradation level that still permitted accurate classification. Blur training produced the broadest robustness gains, generalizing across both blur and noise. Noise training strongly enhanced noise robustness and modestly improved blur tolerance. Adverse-weather training conferred noticeably weaker gains: models trained on Gaussian or uniform noise outperformed those trained on naturalistic weather corruptions. This disparity appears to reflect the severity of the degradation, with stronger distortions producing more robust, human-aligned representations. Consistent with this, networks trained on milder noise levels showed substantially reduced robustness, comparable to the weather-trained models. Critically, all degradation-trained networks showed stronger correlations with human thresholds than standard-trained models, with noise-trained networks achieving the highest correspondence and blur-trained networks also substantially improving alignment. These results suggest that access to strongly degraded input may play a key role in shaping robust, human-like visual systems.
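
As a rough illustration of the threshold procedure described above (a minimal sketch, not the authors' implementation), the Python code below applies parametric blur or noise at decreasing severity to a single image and records the most severe level at which a torchvision-pretrained AlexNet still predicts the correct ImageNet class. The specific degradation parameterizations, severity steps, and use of the off-the-shelf pretrained model are illustrative assumptions, not details taken from the study.

# Sketch of a CNN recognition-threshold measurement: the threshold is the
# most severe degradation level at which the model still classifies the
# image correctly. Degradation parameters and severity scale are assumed.
import torch
import torchvision.transforms.functional as TF
from torchvision import models
from torchvision.models import AlexNet_Weights

weights = AlexNet_Weights.DEFAULT
model = models.alexnet(weights=weights).eval()
preprocess = weights.transforms()  # resize, crop, normalize for ImageNet

def degrade(img, kind, level):
    """Return a degraded copy of a [0, 1] image tensor (C, H, W)."""
    if kind == "blur":
        sigma = 0.5 + 2.0 * level            # blur strength grows with level
        k = 2 * round(3 * sigma) + 1         # odd kernel size for gaussian_blur
        return TF.gaussian_blur(img, kernel_size=k, sigma=sigma)
    if kind == "gaussian_noise":
        return (img + 0.1 * level * torch.randn_like(img)).clamp(0, 1)
    if kind == "uniform_noise":
        return (img + 0.2 * level * (torch.rand_like(img) - 0.5)).clamp(0, 1)
    raise ValueError(kind)

@torch.no_grad()
def recognition_threshold(img, true_label, kind, levels=range(10, 0, -1)):
    """Highest severity level at which the model still predicts true_label
    (an ImageNet class index); returns 0 if every tested level defeats it."""
    for level in levels:                     # sweep from most to least severe
        x = preprocess(degrade(img, kind, level)).unsqueeze(0)
        if model(x).argmax(dim=1).item() == true_label:
            return level
    return 0

Human and model thresholds obtained this way can then be compared directly, for example by correlating per-category thresholds across the two observers, which is the kind of alignment analysis the abstract reports.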

Acknowledgements: This research was supported by NIH grants R01EY035157 and R01CA240274 to FT, and P30EY008126 to the Vanderbilt Vision Research Center.