Discriminability varies across themes of visual scenes in a 2AFC task on theme-matched real-vs-AI image pairs

Poster Presentation 53.339: Tuesday, May 19, 2026, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Scene Perception: Virtual reality

Alan L. F. Lee1,2, Haonan Kong1, Elaine Jiaxi Fan1, Kevin Kaiwen Guo3, Siu Lung Jacky Tse1; 1Department of Psychology, Lingnan University, Hong Kong, China, 2Cognitive Science Research Centre, Lingnan University, Hong Kong, China, 3School of Engineering, Capital Normal University, Beijing, China

Images generated by artificial intelligence (AI) can be indistinguishable from photographs of real scenes. While many recent studies have focused on face images, few have examined non-face images. This gap raises an important question: how does our general ability to discriminate real photographs from AI-generated images vary across types of visual scenes? Here, we created pairs of theme-matched, real-vs-AI images to address this question using a two-alternative forced-choice (2AFC) task. From online photo databases, we obtained 60 real photographs across 12 scene categories, grouped into 4 themes of 3 categories each: humans, nature, interactables, and man-made structures. We then used commercially available multimodal large language models to transform each real photograph first into a detailed text description and then into an AI-generated image. These theme-matched image pairs were evenly divided into 5 batches, with 40 participants per batch (N = 200 total, recruited via Prolific.com). On each trial, participants viewed the two images side by side, chose the one that was more likely to be AI-generated, and then rated their confidence in that decision on a continuous scale. Overall, participants achieved above-chance discriminability, with an average 2AFC d’ of ~0.70. They were also metacognitively sensitive: confidence was significantly higher for correct than for incorrect responses. Critically, discriminability varied across scene categories, with food images highest (0.96), followed by images with humans (~0.81), and indoor scenes lowest (0.50). Furthermore, performance and confidence dissociated depending on scene category: for example, despite their high discriminability, food images elicited significantly lower confidence than images with humans.
These findings suggest that the theme of a visual scene can influence both perceptual discrimination and metacognition in realism judgments of images.
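As a side note on the reported measure: for a 2AFC design, proportion correct is conventionally converted to d’ via the independent-observation model, d’ = √2 · z(PC). The sketch below (an illustration, not the authors’ analysis code; the function name is ours) shows this conversion and why a d’ of ~0.70 corresponds to roughly 69% correct:

```python
from statistics import NormalDist

def dprime_2afc(prop_correct: float) -> float:
    """Convert 2AFC proportion correct to d' under the
    independent-observation model: d' = sqrt(2) * z(PC),
    where z is the inverse standard-normal CDF."""
    return (2 ** 0.5) * NormalDist().inv_cdf(prop_correct)

# Chance performance (PC = 0.5) gives d' = 0.
print(round(dprime_2afc(0.5), 2))   # 0.0

# PC ~ 0.69 corresponds to d' ~ 0.70, matching the reported average.
print(round(dprime_2afc(0.69), 2))  # 0.7
```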

Acknowledgements: This work was partially supported by the Research Matching Grant Scheme from the University Grants Committee of HKSAR, China.