An AI-Human Similarity Database for Studying Context-Dependent Representational Alignment in Medical Imaging

Poster Presentation 53.329: Tuesday, May 19, 2026, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Perceptual Organization: Grouping

Giovanna C. Del Sordo1, Eben W. Daggett1, Jon Art1, Michael C. Hout1; 1New Mexico State University

Developing artificial intelligence systems for medical image interpretation requires not only high performance but also alignment with human perceptual reasoning. Although AI can provide rapid and scalable diagnostic assistance, systems trained only on categorical labels often form internal representations that do not reflect how humans reason about images, reducing interpretability and introducing potential risks in clinical use. Understanding human perceptual reasoning, however, is not straightforward. Similarity perception is highly flexible: similarity ratings (e.g., between two images) change depending on task context, methodology, and observer expertise. To date, no large-scale resource systematically captures these contextual influences for use in training or evaluating representationally aligned medical AI. We constructed an open-access database of human and AI similarity judgments (using a triadic comparison task) for a large set of histological medical images. Participants viewed three images and selected the one most dissimilar to the other two. This method enables control over context, allowing us to model how framing shapes perceived similarity. Three context conditions were tested: Narrow (all images depict the same pathology), Intermediate (same tissue type but mixed pathologies), and Broad (different tissues and pathologies). These conditions induce shifts in perceived similarity, providing a structured testbed for studying similarity "flexing." In parallel, we collected similarity judgments and natural-language rationales from vision-enabled large language models (ChatGPT Vision, Claude Max, Gemini Vision) using the same triadic tasks. The resulting database therefore includes both human- and machine-generated similarity structures, enabling direct comparison of the psychological spaces where AI and human perception converge and diverge.
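The triadic odd-one-out procedure described above implies a simple way to recover pairwise similarity: each trial implicitly marks the two non-rejected images as the more similar pair in that context. A minimal sketch of this aggregation, assuming a hypothetical trial format (the function name, image IDs, and data layout are illustrative, not the database's actual schema):

```python
# Hypothetical sketch: deriving pairwise similarity from triadic
# odd-one-out judgments. Trial format and names are illustrative only.
from itertools import combinations
from collections import defaultdict

def similarity_from_triads(trials):
    """Each trial is (triad, odd_one_out). The two images that were
    not rejected count as a 'kept together' pair for that trial."""
    kept = defaultdict(int)    # times a pair survived together
    shown = defaultdict(int)   # times a pair co-occurred in a triad
    for triad, odd in trials:
        for a, b in combinations(sorted(triad), 2):
            shown[(a, b)] += 1
            if odd not in (a, b):
                kept[(a, b)] += 1
    # Similarity = proportion of co-occurrences where the pair survived.
    return {pair: kept[pair] / shown[pair] for pair in shown}

trials = [
    (("img1", "img2", "img3"), "img3"),
    (("img1", "img2", "img4"), "img4"),
    (("img1", "img3", "img4"), "img1"),
]
sim = similarity_from_triads(trials)
```

Because context conditions (Narrow, Intermediate, Broad) can simply partition the trial list, running this aggregation per condition yields one similarity structure per context, making "flexing" directly measurable as differences between the resulting matrices.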
The database will support researchers in computer vision, cognitive science, and medical AI by providing a “sandbox” for evaluating model robustness to context, benchmarking AI-human alignment, and exploring methods for incorporating human perceptual structure into model training. Ultimately, this resource aims to advance the development of safer, more interpretable, and human-aligned AI systems for medical diagnostics.

Acknowledgements: This research is supported by the Institute for Applied Practice in AI and Machine Learning at New Mexico State University.