Visual Search: Real-world scenes, objects

Talk Session: Tuesday, May 19, 2026, 8:15 – 9:45 am, Talk Room 1
Moderator: Geoffrey Woodman, Vanderbilt University

Talk 1, 8:15 am, 51.11

Processing of scene summary statistics is rapid and affects feature-based attention

Jessica N. Goetz1, Mark B. Neider1; 1University of Central Florida

Numerous studies have suggested that the visual system relies on the rapid processing of summary scene statistics to dynamically tune selective attention mechanisms during visual search (e.g., Becker et al., 2025; Rosenholtz, 2024). We have demonstrated that primary color features guide attention more than secondary color features when observers search for real-world objects defined by more than one color (Goetz & Neider, 2025a), but we had not investigated whether these effects translate to scenes. In the current study, 27 participants were shown a 200 ms target preview on a white background, followed by a search array presented against a phase-scrambled real-world scene. We used the MATCH toolbox (Goetz & Neider, 2025b) to determine the primary and secondary colors of the scenes and the real-world object targets. We selected scenes close in color to the target’s primary color (primary scenes), close to its secondary color (secondary scenes), or far from its primary color (control scenes). The toolbox was also used to create distractors matching the target’s primary and secondary colors; additional distractors matched the target’s shape but differed from it in color. We predicted dynamic tuning of selective attention such that the dominant guiding feature would differ depending on the color properties of each scene. The first saccade served as a proxy for visual selection. The results indicated that in control scenes, both colors were used to guide attention (all ps < .001), whereas colors and shape were used to guide attention in secondary scenes (all ps < .001). Critically, in primary scenes, the secondary color was used most to guide attention (all ps ≤ .007). The data suggest that search templates are malleable and sensitive to the rapid processing of scene properties, specifically the color summary statistics of a scene.
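The abstract does not detail how the MATCH toolbox extracts a scene’s primary and secondary colors. As a rough illustration of one common approach, the Python sketch below estimates dominant colors by k-means clustering of pixels in RGB space; the function name, the two-cluster choice, and the clustering method are assumptions for illustration, not the toolbox’s actual algorithm.

```python
# Minimal sketch: estimate a scene's primary and secondary colors by
# clustering pixels in RGB space. This is NOT the MATCH toolbox's
# algorithm (not described in the abstract); it is an illustrative
# stand-in using k-means.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def dominant_colors(image_path, n_colors=2, seed=0):
    """Return cluster-center colors ordered by pixel count (primary first)."""
    img = Image.open(image_path).convert("RGB")
    pixels = np.asarray(img, dtype=float).reshape(-1, 3)
    km = KMeans(n_clusters=n_colors, n_init=10, random_state=seed).fit(pixels)
    counts = np.bincount(km.labels_, minlength=n_colors)
    order = np.argsort(counts)[::-1]      # most frequent cluster first
    return km.cluster_centers_[order]     # rows: primary, secondary, ...

# Usage (hypothetical file):
# primary, secondary = dominant_colors("scene.jpg")
```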

Talk 2, 8:30 am, 51.12

Visual search for real-world objects is shaped by the variability of object-category templates

Susan Ajith1, Daniel Kaiser1,2,3,4, Lu-Chun Yeh1; 1Neural Computation Group, Department of Mathematics and Computer Science, Physics, Geography, Justus Liebig University Giessen, 2Center for Mind, Brain and Behavior (CMBB), Universities of Giessen, Marburg, and Darmstadt, 3Center for Applied Computer Science and Data Science (ZAD), Justus Liebig University Giessen, 4Cluster of Excellence “The Adaptive Mind”, Universities of Giessen, Marburg, and Darmstadt

Object categories vary widely in their appearance. While some are defined by relatively narrow feature configurations across exemplars (e.g., doors), others yield vast feature variations (e.g., bags). When searching for an object, such differences in diagnostic variability could shape the precision of target templates, in turn affecting search efficiency: Categories with low variability should yield narrow templates and efficient search, whereas highly variable categories should produce coarser templates and less efficient search. Here, we used human drawings as a window into the variability of target templates, where more diverse drawings within a category indexed greater template variability. In two experiments, we then examined the effects of this variability on visual search performance. In Experiment 1, we selected 200 real-world object categories and quantified variability in their exemplar drawings from the THINGS drawings dataset using a deep neural network (DNN). Participants then performed a cued visual search task in which they indicated whether the target category was present or absent. Critically, search performance correlated with object variability: the greater the variability, the longer it took participants to find the target or report its absence. In Experiment 2, we selected 12 real-world object categories and quantified their variability at the individual level. Participants drew four exemplars per category before performing the cued search task. On the group level, object variability again affected search performance, with less efficient search for objects with variable drawings. By computing similarity between each participant’s drawings and individual search targets, we found evidence for prioritised templates based on drawing order: participants’ first drawing was the strongest predictor of search performance. Further, individuals’ own drawings predicted their search better than other participants’ drawings, revealing that individual differences in template variability shape search efficiency. Together, our findings provide robust evidence that inherent variability in object category templates shapes search behavior.
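The abstract quantifies within-category drawing variability with a DNN but does not specify the network or distance metric. One common operationalization, sketched below under those assumptions, is the mean pairwise cosine distance between feature embeddings of a category’s drawings; the embeddings here are simulated for illustration.

```python
# Sketch: within-category template variability as the mean pairwise
# cosine distance between DNN embeddings of that category's drawings.
# The specific network and metric are assumptions; the abstract only
# states that a DNN was used to quantify drawing variability.
import numpy as np
from scipy.spatial.distance import pdist

def category_variability(embeddings):
    """embeddings: (n_drawings, n_features) array of DNN features.
    Returns mean pairwise cosine distance (0 = identical drawings)."""
    return pdist(embeddings, metric="cosine").mean()

# Toy example: a tight category vs. a variable one
rng = np.random.default_rng(0)
tight = rng.normal(0, 0.1, (20, 512)) + 1.0   # clustered embeddings
loose = rng.normal(0, 1.0, (20, 512))         # dispersed embeddings
print(category_variability(tight))   # small value -> narrow template
print(category_variability(loose))   # larger value -> coarse template
```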

SA is funded by the JLU graduate scholarship. LCY is supported by the MSCA programme (101149060). DK is supported by the DFG (SFB/TRR 135, 222641018; KA4683/5-1, 518483074; KA4683/6-1, 536053998), “The Adaptive Mind”, and an ERC Starting Grant (PEP, ERC-2022-STG 101076057).

Talk 3, 8:45 am, 51.13

Salience bias during visual search in natural scenes

Hyunwoo Gu1, Justin L. Gardner1; 1Stanford University

Classical attentional capture studies using simplified search arrays have shown that goal-driven and salience-driven systems can be placed in conflict, making search less efficient. However, unlike artificial arrays, salience and target-relevant features in natural scenes show complex statistical relationships that vary between concordance and conflict. This raises a key question: which system exerts greater influence on gaze selection, and how do their influences evolve under conflict in natural scenes? We analyzed published natural visual search datasets using saliency models and vision-language model template matching, finding that human fixation choices are biased by salience features. Furthermore, search templates estimated directly from gaze patterns recapitulated this bias, tending to resemble salience templates derived from free viewing. Using deep stimulus synthesis, we further demonstrated that this salience bias can be used to control gaze selection patterns during visual search. Consistent with classical attentional capture, we found that mismatch between salience and target features lowers search performance. However, capture was not static: salience bias was strongest during initial fixations and attenuated as search progressed. To characterize these dynamics further, we fit an observer model that chose fixations from either salience or target maps at each step. The initial probability of the salience mode was higher, but the probability of remaining in the target mode increased, indicating a tendency to switch from salience-driven to target-driven selection as search progressed. Finally, analyses of natural stimuli revealed that salience features alone yield above-chance search performance. This stands in contrast to classical search arrays, which are often designed to make salient distractors task-irrelevant, thereby motivating observers to suppress them. Thus, in natural scenes, salience serves as an informative signal that helps constrain search, justifying the initial bias despite the potential for capture.
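As a minimal sketch of the kind of two-mode observer model the abstract describes, the simulation below draws each fixation from either a salience map or a target map, with Markov switching between modes. All parameter values and the toy maps are illustrative assumptions, not the fitted estimates.

```python
# Sketch of a two-mode fixation model in the spirit of the abstract's
# observer model: each fixation is sampled from a salience map or a
# target map, with Markov switching between modes. Parameter values
# below are illustrative assumptions, not fitted estimates.
import numpy as np

rng = np.random.default_rng(1)

def sample_from_map(prob_map):
    """Sample a (row, col) fixation from a probability map."""
    flat = prob_map.ravel() / prob_map.sum()
    idx = rng.choice(flat.size, p=flat)
    return np.unravel_index(idx, prob_map.shape)

def simulate_scanpath(salience_map, target_map, n_fix=8,
                      p_init_salience=0.8,   # salience-driven early fixations
                      p_stay_target=0.9,     # "sticky" target mode
                      p_stay_salience=0.5):
    mode = "salience" if rng.random() < p_init_salience else "target"
    path = []
    for _ in range(n_fix):
        m = salience_map if mode == "salience" else target_map
        path.append((mode, sample_from_map(m)))
        stay = p_stay_salience if mode == "salience" else p_stay_target
        if rng.random() > stay:
            mode = "target" if mode == "salience" else "salience"
    return path

# Toy maps: salience peaks top-left, target sits bottom-right
sal = np.zeros((32, 32)); sal[4, 4] = 1.0
tgt = np.zeros((32, 32)); tgt[28, 28] = 1.0
for step in simulate_scanpath(sal + 0.01, tgt + 0.01):
    print(step)
```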

Talk 4, 9:00 am, 51.14

Where is my cereal? Impacts of target saliency and background complexity on visual search at home

Qingying Gao1, Aleks Mihailovic1, Pradeep Ramulu1, Yingzi Xiong1; 1Johns Hopkins University

Efficient visual search in a scene relies on both bottom-up features, such as target saliency and the number of distractors, and top-down factors, such as prior knowledge of target locations. In familiar places such as homes, do bottom-up features still matter when we look for common objects? This question is especially relevant to individuals with vision impairment, who search more slowly than those with normal vision, even in familiar home environments. Thirty-two participants with vision impairment performed three everyday tasks in their homes: paying a bill, preparing a meal, and taking medicine. Each task required sequential search for predetermined objects (e.g., bowl, cereal, milk, and spoon for preparing a meal), and object-level search time was recorded. Photos of the home environments containing each object were captured, and target saliency and background complexity were quantified in two ways: 1) computer vision (CV)-based analysis, with target saliency quantified by root-mean-square contrast, color diversity, and Itti’s saliency, and background complexity quantified by clutter, scene congestion, and object density; and 2) human experts’ ratings based on a clinically developed rubric (1–5 for saliency and clutter, respectively). Across participants, search times ranged from 3.97 s for the medicine to 25.78 s for the stamp. For the CV-based analysis, target saliency, but not background complexity, showed small but significant associations with search time: higher contrast (r = -0.14), color diversity (r = -0.12), and Itti’s saliency (r = -0.17) were associated with shorter search times (p < 0.05). For the human expert-rated features, however, background complexity, but not target saliency, was significantly associated with search time (r = 0.29, p < 0.001). Our results provide evidence that scene features affect visual search by individuals with vision impairment, even in familiar home environments. Computer vision models and human experts each have strengths and limitations in estimating the environmental features that best predict visual search time.
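Of the CV-based measures, root-mean-square contrast is the most standard. Below is a minimal sketch, assuming RMS contrast is computed as the standard deviation of normalized luminance within a target bounding box; the region definition and preprocessing are assumptions, since the abstract does not specify them.

```python
# Sketch: RMS contrast of a target region, one of the abstract's
# CV-based saliency measures, correlated with search times. The
# bounding-box region definition and preprocessing are assumptions.
import numpy as np
from PIL import Image
from scipy.stats import pearsonr

def rms_contrast(image_path, box=None):
    """RMS contrast: std of normalized luminance, optionally within a
    (left, upper, right, lower) target bounding box."""
    img = Image.open(image_path).convert("L")   # grayscale luminance
    if box is not None:
        img = img.crop(box)
    lum = np.asarray(img, dtype=float) / 255.0
    return lum.std()

# Hypothetical per-object analysis (variable names are placeholders):
# contrasts = [rms_contrast(p, b) for p, b in zip(photo_paths, target_boxes)]
# r, p = pearsonr(contrasts, search_times)   # abstract reports r = -0.14
```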

Talk 5, 9:15 am, 51.15

An Investigation of the Effects of Target Long-Term Memory Strength on Hybrid Search

Raevan L. Hanan1, Melissa R. Beck1, David A. Tomshe1; 1Louisiana State University

Hybrid search requires observers to locate a target in a visual display, a working memory task, while retrieving multiple potential targets from long-term memory (LTM). Prior work has demonstrated that the availability and precision of LTM representations are shaped by exposure, encoding quality, and learning criteria, and that they can meaningfully influence search performance. The present study examined whether strengthening LTM representations of target items improves hybrid search performance. Participants (N = 144) were assigned to either a baseline (n = 72) or high (n = 72) LTM strength condition. The high-strength condition included more exposure to the items during the familiarization phase and required more detailed representations and a higher accuracy threshold to progress from familiarization to the search phase. These requirements encourage more precise and more readily accessible target templates for the high-strength group relative to the baseline group; the strengthened templates should facilitate retrieval into working memory, increasing search accuracy and potentially reducing the time needed for template-to-object comparison during search. The high-strength group (M = 0.90, SD = 0.04) demonstrated higher search accuracy than the baseline group (M = 0.85, SD = 0.09), t(105) = -4.57, p < .001. For search response time, the high memory strength group (M = 4661.68 ms, SD = 785.63) showed only a marginal advantage over the baseline group (M = 4941.46 ms, SD = 945.04), t(137) = 1.93, p = .055. This pattern indicates that increased LTM strength primarily affects search accuracy rather than search speed, suggesting that target LTM strength impacts target identification during search, but not search guidance. These results highlight the value of strengthening LTM representations for improving accuracy in applied search tasks where correct identification is critical (e.g., security screeners distinguishing contraband from benign items, pharmacists identifying medications, or criminalists recognizing objects during evidence recovery).
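The reported degrees of freedom (105 and 137 with n = 72 per group) are consistent with a Welch-style correction for unequal variances, though the abstract does not name the test. A minimal sketch under that assumption, with simulated scores standing in for the real accuracy data:

```python
# Sketch of the group comparison reported in the abstract. The reduced
# degrees of freedom implied by t(105) with n = 72 per group are
# consistent with Welch's t-test; the data below are simulated from
# the reported means and SDs, not the actual scores.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
baseline = rng.normal(0.85, 0.09, 72)   # M = 0.85, SD = 0.09 (reported)
high_ltm = rng.normal(0.90, 0.04, 72)   # M = 0.90, SD = 0.04 (reported)

t, p = ttest_ind(baseline, high_ltm, equal_var=False)  # Welch's t-test
print(f"t = {t:.2f}, p = {p:.4f}")
```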

Talk 6, 9:30 am, 51.16

Typo detection as a window into interactions between visual search and readability

Emily Heffernan1, Kiran Panicker1, Anna Kosovicheva1, Benjamin Wolfe1; 1University of Toronto

When scanning this VSS abstract for (hopefully nonexistent!) spelling errors, the reader must balance speed and accuracy. As a form of visual search, typo detection relies on both reading and attentional guidance, but the extent to which the visual properties of text and visual saliency affect this process has received limited attention. In three experiments, we explored how text appearance and attentional (mis)guidance influence search dynamics. In Experiments 1 and 2, participants (N = 17 each) scanned “pseudoparagraphs” of random words for typos in a single-target visual search paradigm. Words were presented in either hard- or easy-to-read fonts; the “hard” font used high stroke contrast in Experiment 1 and a condensed design in Experiment 2. In both experiments, accuracy remained high in each condition, but participants took longer to respond after fixating the typo in the hard-to-read condition. Drift diffusion modelling isolated this change to slower evidence accumulation for the hard-to-read fonts; nondecision time and decision threshold were unchanged across conditions. These findings demonstrate that manipulations of text appearance can disrupt processing efficiency. In Experiment 3, we studied how attentional guidance impacts search in a typo foraging task with 0–4 typos per pseudoparagraph. Pilot participants (N = 9) completed two conditions: a baseline task and a second task with imperfect “spellchecker” annotations. Participants were faster and more accurate with annotations. However, eye tracking data indicated a shift in strategy: with annotations, participants made shorter fixations and fixated fewer words. In other words, annotations generally improved search outcomes but lowered quitting thresholds. This novel paradigm highlights how stimulus properties can influence search, with poor readability hampering evidence accumulation and reliance on automated aids affecting the quitting threshold. These results point to shared processes across different forms of visual search, while also highlighting the importance of readability for text searches.
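A toy drift diffusion simulation can illustrate the modelling result: lowering only the drift rate lengthens response times while boundary and nondecision time stay fixed and accuracy remains high. All parameter values below are illustrative, not fitted to the reported data.

```python
# Toy drift diffusion simulation: lowering only the drift rate (as the
# abstract's modelling attributes to hard-to-read fonts) lengthens RTs
# while boundary and nondecision time stay fixed. Parameter values are
# illustrative assumptions, not fitted estimates.
import numpy as np

def simulate_ddm(drift, boundary=1.0, ndt=0.3, dt=0.001, noise=1.0,
                 n_trials=2000, seed=3):
    rng = np.random.default_rng(seed)
    rts, correct = [], []
    for _ in range(n_trials):
        x, t = 0.0, 0.0
        while abs(x) < boundary:                      # accumulate to a bound
            x += drift * dt + noise * np.sqrt(dt) * rng.normal()
            t += dt
        rts.append(t + ndt)                           # add nondecision time
        correct.append(x >= boundary)                 # upper bound = correct
    return np.mean(rts), np.mean(correct)

easy = simulate_ddm(drift=2.0)   # easy-to-read font: fast accumulation
hard = simulate_ddm(drift=1.2)   # hard-to-read font: slower accumulation
print(f"easy: RT = {easy[0]:.2f}s, acc = {easy[1]:.2f}")
print(f"hard: RT = {hard[0]:.2f}s, acc = {hard[1]:.2f}")
```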

Funding Acknowledgments: This work was supported by a SSHRC Insight Grant to BW and AK.