Are Search Templates Target-object Reconstructions?

Poster Presentation 56.319: Tuesday, May 21, 2024, 2:45 – 6:45 pm, Banyan Breezeway
Session: Visual Search: Memory, search templates

Seoyoung Ahn1, Hossein Adeli2, Gregory Zelinsky1; 1Stony Brook University, 2Columbia University

Search theory relies heavily on the concept of a template: an internal representation of the target that, by being matched against visual input, creates a top-down attentional bias. The search template was originally conceptualized as specific to a given object, but over the years the definition broadened to include the features of a target category. Exploiting recent generative methods, we suggest reconceptualizing the search template yet again, now as a fully generated target object residing in peripheral vision rather than just a collection of features. Our approach is to generate potential target-object appearances in degraded peripheral pixels. For example, when searching for a mosquito, attention may be drawn to any small, roundish object because it provides an ideal canvas for generating or attaching limbs. We used a Generative Adversarial Network (GAN)-based method to reconstruct peripheral objects so that they more closely resemble the typical appearance of the target category. We quantified the extent of pixel change this reconstruction requires and tested whether this "reconstruction cost" accounts for target guidance in both digit and natural object-array search tasks. Our model, even though not explicitly trained for target detection, located target objects in blurred peripheral input with high accuracy (~90%), outperforming an object-detector baseline. Moreover, the model's behavior aligned strongly with human eye movements collected during the same task. Our model explained attention guidance comparably to or significantly better than the detector in both target-present and target-absent conditions (target-present: our model, Pearson's r = 0.891, p = 0.013; detector, r = 0.911, p = 0.012; target-absent: our model, r = 0.332, p = 0.052; detector, r = 0.134, p = 0.056). Our work suggests that the target template may be an internal generation of a potential search target in peripheral vision.
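
The "reconstruction cost" idea can be illustrated with a minimal sketch. The abstract does not specify how pixel change is measured, so this example assumes a mean absolute per-pixel difference, and it assumes that objects needing less change to look like the target are prioritized for attention. The function names and the toy pixel values are hypothetical, and the hand-written "reconstructions" merely stand in for the output of a generative model.

```python
def reconstruction_cost(original, reconstructed):
    """Mean absolute per-pixel change between a patch and its reconstruction.

    Assumption: the abstract does not state the metric; mean absolute
    difference is used here purely for illustration.
    """
    if not original or len(original) != len(reconstructed):
        raise ValueError("patches must be non-empty and the same size")
    total = sum(abs(o - r) for o, r in zip(original, reconstructed))
    return total / len(original)


def rank_by_cost(costs):
    """Return object indices ordered from lowest to highest cost.

    Assumption: a lower cost means the object is already more target-like,
    so it would receive higher attentional priority.
    """
    return sorted(range(len(costs)), key=lambda i: costs[i])


# Toy data: three blurred peripheral patches as flat lists of grayscale
# values in [0, 1], with hypothetical target-like reconstructions.
peripheral = [
    [0.25, 0.5, 0.5, 0.75],
    [0.0, 0.0, 1.0, 1.0],
    [0.5, 0.5, 0.5, 0.5],
]
reconstructed = [
    [0.25, 0.75, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.5],
    [0.5, 0.5, 0.5, 0.75],
]

costs = [reconstruction_cost(p, r) for p, r in zip(peripheral, reconstructed)]
priority_order = rank_by_cost(costs)

print(costs)           # [0.125, 0.5, 0.0625]
print(priority_order)  # [2, 0, 1]
```

In this toy case the third patch needs the least change, so it would be the first candidate for fixation under the stated assumption.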

Acknowledgements: This work was supported in part by NSF IIS awards 1763981 and 2123920 to G.Z. and by a grant to S.A. from the American Psychological Association.