Spatial scrambling in human vision: investigating efficiency for discriminating scrambled letters using convolutional neural networks and confusion matrices

Poster Presentation 56.452: Tuesday, May 21, 2024, 2:45 – 6:45 pm, Pavilion
Session: Spatial Vision: Machine learning, neural networks

Xingqi Raffles Zhu1, Robert F. Hess1, Alex S. Baldwin1; 1McGill University

One limitation on our ability to discriminate different letters may be spatial disorganization in the projections between visual areas. This “scrambling” could be a source of positional noise that limits human performance. In this study, we explored different forms this scrambling could take. Based on the idea that letter identification is supported by an optimal spatial frequency, we used spatially bandpass letters. We devised a physiologically inspired decomposition and resynthesis scheme to generate letters composed of log-Gabor wavelets. The form of these wavelets is similar to that of an oriented “simple cell” receptive field. We then introduced two forms of scrambling. The first was scrambling at the input to the “oriented receptive field” stage (subcortical scrambling of the receptive field). The second was scrambling at the output from that stage (scrambling of the connections to higher “cortical” stages). We also included a bandpass noise control condition. To compare against human performance, we simulated the responses of a template-matching observer (TMO) and three convolutional neural networks (CNNs). The three CNNs were trained on the letter stimuli, one for each of the three noise conditions. We computed human efficiency relative to CNN performance. We also characterized mistakes using confusion matrices and computed the population stability index (PSI) as a distance measure between the mistakes made by human and model observers. We found that the CNNs employed distinct strategies for each condition. Human relative efficiency was higher for subcortical than for cortical scrambling. In bandpass noise, the PSIs for the TMO and the CNNs were comparable. For our scrambling conditions, however, the PSI of the TMO was significantly higher than that of the CNNs in all but one comparison.
Our results suggest that the human strategy for identifying scrambled letters is better captured by CNNs than by a simple TMO, and that CNNs may rely on strategies more similar to those of human observers.
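
For readers unfamiliar with the PSI, the following is a minimal sketch of how such a distance between two confusion matrices could be computed, using the standard definition PSI = Σ (p − q) ln(p / q) over matching cells. The abstract does not state how the matrices were normalized or binned, so the function name `confusion_psi`, the `errors_only` option (which drops correct responses on the diagonal), and the `eps` floor for empty cells are all assumptions made for illustration, not the authors' procedure.

```python
import numpy as np


def confusion_psi(human, model, errors_only=True, eps=1e-6):
    """Population stability index between two confusion matrices.

    Hypothetical helper: each matrix holds response counts
    (rows = presented letter, columns = reported letter).
    """
    p = np.array(human, dtype=float)  # np.array copies, so inputs are not mutated
    q = np.array(model, dtype=float)
    if p.shape != q.shape:
        raise ValueError("confusion matrices must have the same shape")

    if errors_only:
        # Assumption: compare only the mistakes, so discard correct responses.
        np.fill_diagonal(p, 0.0)
        np.fill_diagonal(q, 0.0)

    if p.sum() == 0 or q.sum() == 0:
        raise ValueError("each matrix needs at least one counted response")

    # Convert counts to proportions; floor at eps so empty cells avoid log(0).
    p = np.maximum(p.ravel() / p.sum(), eps)
    q = np.maximum(q.ravel() / q.sum(), eps)

    # Standard PSI: symmetric, zero when the two distributions match.
    return float(np.sum((p - q) * np.log(p / q)))


# Toy two-letter example with made-up counts (not data from the study).
human_counts = [[8, 2], [1, 9]]
model_counts = [[9, 1], [3, 7]]
print(confusion_psi(human_counts, model_counts))
```

Under this definition a lower PSI means the model's pattern of mistakes is closer to the human pattern, which is the sense in which the TMO and CNN values are compared above.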

Acknowledgements: This work was supported by the Réseau de Recherche en Santé de la Vision (RRSV), the Fonds de Recherche du Québec - Santé (FRQ-S), and the Canadian Institutes of Health Research (CIHR).