A modular image-computable psychophysical spatial vision model
Poster Presentation 36.346: Sunday, May 19, 2024, 2:45 – 6:45 pm, Banyan Breezeway
Session: Spatial Vision: Models
Jannik Reichert¹, Felix A. Wichmann²; ¹Max Planck Institute for Software Systems, ²University of Tübingen
To explain the initial encoding of pattern information in the human visual system, the standard psychophysical spatial vision model is based on channels specific to spatial frequency and orientation, followed by divisive normalization (contrast gain-control). Schütt and Wichmann (2017, Journal of Vision) developed an image-computable implementation of the standard model and showed that it can explain data for contrast detection, contrast discrimination, and oblique and natural-image masking. The model also induces a sparse encoding of luminance information. Whilst the model’s MATLAB code is publicly available, it is non-trivial to extend or to integrate into larger pipelines because it does not provide a modular, pluggable programming framework. Building on that MATLAB implementation, we developed a modular image-computable implementation of this spatial vision model as a PyTorch framework. We also added a number of refinements, such as a contrast gain-control that depends jointly on space and spatial frequency. Because the model takes luminance images as input, it is easy to apply to real-world images. Using the same psychophysical data, we compare our model’s predictions of contrast detection, contrast discrimination, and oblique and natural-image masking with those of the previous implementation. The major advantage of our framework, however, derives from its modularity and the automatic differentiation offered by PyTorch, as these facilitate the implementation and evaluation of new components for the spatial vision model. In addition, our framework allows the integration of this psychophysically validated spatial vision model into larger image-processing pipelines, for example to take inputs from retina models instead of pre-computed luminance images, or to further process the model’s outputs with higher-level vision models.
Given its flexibility, the model could also be used as a plug-in for or replacement of parts of artificial neural networks, which would enable comparison of aspects of human and machine vision.
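To make the architecture described above more concrete, the following is a minimal sketch of a filter stage followed by divisive normalization, written as composable PyTorch modules. It is not the authors' framework: the class name DivisiveNormalization, the parameter values p, q and c, the pooling over all channels, and the randomly initialized convolution standing in for the spatial-frequency and orientation channels are all assumptions made for illustration.

```python
import torch
import torch.nn as nn


class DivisiveNormalization(nn.Module):
    """Generic contrast gain-control: r = |a|^p / (c^q + pooled |a|^q).

    Hypothetical illustration; exponents and constant are placeholders,
    not fitted values from any published model.
    """

    def __init__(self, p: float = 2.0, q: float = 2.0, c: float = 0.1):
        super().__init__()
        # Stored as parameters so they can be fitted by gradient descent.
        self.p = nn.Parameter(torch.tensor(p))
        self.q = nn.Parameter(torch.tensor(q))
        self.c = nn.Parameter(torch.tensor(c))

    def forward(self, a: torch.Tensor) -> torch.Tensor:
        # a: (batch, channels, height, width) linear channel responses.
        mag = a.abs() + 1e-8  # small offset keeps gradients finite at zero
        # Normalization pool: mean over all channels at each pixel (assumed).
        pool = mag.pow(self.q).mean(dim=1, keepdim=True)
        return mag.pow(self.p) / (self.c.abs().pow(self.q) + pool)


torch.manual_seed(0)

# A random 8-channel convolution stands in for the channel decomposition;
# a real model would use fixed spatial-frequency and orientation filters.
model = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=9, padding=4, bias=False),
    DivisiveNormalization(),
)

img = torch.rand(2, 1, 32, 32)  # two synthetic "luminance images"
out = model(img)                # shape: (2, 8, 32, 32)

# Automatic differentiation: gradients reach every stage's parameters.
out.sum().backward()
```

Because each stage is an ordinary nn.Module, a stage can be swapped for another, or the whole stack can be placed before or after other modules, without changes to the remaining code; this is the kind of modularity and differentiability the abstract refers to.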
Acknowledgements: JR was supported by CS@max planck – The Max Planck Graduate Center for Computer and Information Science. FAW is a member of the Machine Learning Cluster of Excellence, funded by the Deutsche Forschungsgemeinschaft under Germany’s Excellence Strategy – EXC number 2064/1 – Project number 390727645.