Identifying the visual features of European Paleolithic cave paintings that are diagnostic of category, age, and location
Poster Presentation 56.412: Tuesday, May 19, 2026, 2:45 – 6:45 pm, Pavilion
Session: Object Recognition: Models
David Tomz1 (dtomz@stanford.edu), Kushin Mukherjee1, Isobel Wisher2,3,4, Frederieke Wullf3, Laerke Braedder3, Chuan Yan6, Barbara Tversky1,5, Riccardo Fusaroli3,4, Kristian Tylén3,4, Judith E Fan1,6,7; 1Department of Psychology, Stanford University, 2Department of Archaeology and Heritage Studies, Aarhus University, 3Department of Linguistics, Cognitive Science and Semiotics, Aarhus University, 4Interacting Minds Center, Aarhus University, 5Teachers College, Columbia University, 6Department of Computer Science, Stanford University, 7Graduate School of Education, Stanford University
Humans living in Europe during the Upper Paleolithic period (45,000 to 12,000 cal. BP) were among the earliest to produce figurative art. Analyzing how these early depictions carry meaning could advance understanding of human visual abstraction, the capacity to produce a set of visual marks to represent an abstract idea. In this study, we analyzed 140 Paleolithic cave paintings of animals from four cave sites in Monte Castillo, Spain. These depictions were photographed, then traced by archaeologists specializing in cave art from this period. Based on all available evidence, archaeologists have attributed each depiction to one of four animal species (Horse, Bison, Hind, Ibex) and to one of three periods (Gravettian: 31,500-24,000 cal. BP; Solutrean: 24,000-20,000 cal. BP; Magdalenian: 20,000-13,000 cal. BP). Because these experts had access to a wide array of information beyond the visual properties of each depiction, it was unknown what information directly available in the images themselves is associated with their age, their location, and the animal species they depict. To address this question, we used state-of-the-art computer vision models (ResNet, ViT, SigLIP) to measure how reliably age, animal species, and location could be decoded from the visual features of these cave depictions alone. We found that all three properties could be decoded with above-chance accuracy (age: 74.7%; location: 68.0%; species: 63.4%; ps < .01), and that some regions of the depictions were more diagnostic than others for predicting age (head > legs: t = 4.398, p < .001) and location (head > legs: t = 3.716, p < .001). Taken together, these findings highlight the promise of developing computational models that capture how observable variation in early cave depictions is linked to major cultural changes that unfolded throughout the Upper Paleolithic period.
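As an illustration of what decoding a label from visual features with above-chance accuracy can mean in practice, here is a minimal sketch. The abstract does not state which classifier, cross-validation scheme, or significance test was used, so everything below is an assumption: a nearest-centroid classifier, leave-one-out cross-validation, and a label-permutation test, run on synthetic feature vectors that merely stand in for embeddings from a model such as ResNet, ViT, or SigLIP. All function names are hypothetical.

```python
import random
from collections import defaultdict


def nearest_centroid_predict(train_X, train_y, x):
    """Predict the label whose mean feature vector is closest to x."""
    sums, counts = {}, defaultdict(int)
    for feats, label in zip(train_X, train_y):
        if label not in sums:
            sums[label] = [0.0] * len(feats)
        sums[label] = [s + f for s, f in zip(sums[label], feats)]
        counts[label] += 1
    best_label, best_dist = None, float("inf")
    for label, total in sums.items():
        centroid = [v / counts[label] for v in total]
        dist = sum((c - v) ** 2 for c, v in zip(centroid, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loo_accuracy(X, y):
    """Leave-one-out accuracy: each item is predicted from all the others."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


def permutation_p_value(X, y, n_perm=200, seed=0):
    """Compare observed accuracy with accuracies obtained on shuffled labels."""
    rng = random.Random(seed)
    observed = loo_accuracy(X, y)
    hits = 0
    for _ in range(n_perm):
        shuffled = list(y)
        rng.shuffle(shuffled)
        if loo_accuracy(X, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


def make_synthetic(n_per_class=10, n_dims=8, seed=1):
    """Synthetic stand-in for image embeddings (NOT the study's data).

    Three classes differ only in the mean of the first feature dimension.
    """
    rng = random.Random(seed)
    X, y = [], []
    for label in range(3):
        for _ in range(n_per_class):
            feats = [rng.gauss(0.0, 1.0) for _ in range(n_dims)]
            feats[0] += 5.0 * label
            X.append(feats)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_synthetic()
    accuracy, p_value = permutation_p_value(X, y)
    print(f"leave-one-out accuracy: {accuracy:.3f}, permutation p: {p_value:.3f}")
```

A permutation test is used here because it estimates chance performance from the data itself, which matters when classes are unbalanced and chance is not simply one over the number of classes. Whether the classes in this dataset are balanced is not stated in the abstract.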
Acknowledgements: EU ERC consolidator grant #101044626 to K.T.; U.S. NSF CAREER Award #2436199, U.S. NSF DRL #2400471, and a Hoffman-Yee grant from the Stanford Human-Centered AI Institute (HAI) to J.E.F.