What does the world look like? How do we know?

Friday, May 13, 2022, 5:00 – 7:00 pm EDT, Talk Room 2

Organizers: Mark Lescroart¹, Benjamin Balas², Kamran Binaee¹, Michelle Greene³, Paul MacNeilage¹; ¹University of Nevada, Reno, ²North Dakota State University, ³Bates College
Presenters: Mark Lescroart, Caitlin M. Fausey, Martin N. Hebart, Michelle R. Greene, Jeremy Wilmer, Wilma Bainbridge


A central tenet of vision science is that perception is shaped by visual experience. The statistical regularities of our visual input are reflected in patterns of brain activity, enabling efficient behavior. A growing body of work has sought to understand the natural statistical regularities in human visual experience, and to increase the ecological validity of vision science research by using naturalistic stimuli in experiments. However, the stimuli available for experiments and the conclusions that can be drawn about natural image statistics, especially higher-order statistics such as the co-occurrence rates of specific object categories in different scenes, are constrained by the limits of extant datasets. Datasets may be limited by sampling choices, by practical constraints related to the robustness of hardware and software used to collect data, by environmental factors like movement, or by the characteristics of different observers who vary in their behavioral repertoire as a function of development or experience. Consequently, many visual datasets that aspire to generality are nonetheless sampled by convenience and are thus limited in size or scope. Many datasets also rely on proxies for visual experience, such as photos or movies sampled from the internet. The potential consequences of this gap between what we hope to do and what we can do may be substantial. This symposium will examine the issues involved in sampling from representative visual experience, and will specifically address the lacunae of visual experience: what are we not sampling and why? How might the blind spots in our sampling lead to blind spots in our inferences about human vision? The symposium will begin with a brief 5-minute introduction, followed by six 15-minute talks with 2 minutes each for clarifying questions. We will finish with a 10-15 minute discussion of overarching issues in sampling.
We will feature work that grapples with sampling issues across several areas of vision science. Mark Lescroart will talk about practical limits on the activities, locations, and people that can be sampled with mobile eye tracking. Caitlin Fausey will then talk about sampling visual experience in infants. Martin Hebart will talk about the THINGS initiative, which aims to comprehensively sample the appearance of object categories in the world. Michelle Greene will talk about the causes and consequences of biases in extant datasets. Jeremy Wilmer will talk about sampling participants, specifically about sampling across versus within racial groups. Finally, Wilma Bainbridge will talk about richly sampling memory representations. Perfectly representative sampling of visual experience may be an unreachable goal. However, we argue that a focus on the limits of current sampling protocols (of objects, of participants, of dynamic visual experience at different stages of development, and of mental states) will advance the field and, in the long run, improve the ecological validity of vision science.


Methodological limits on sampling visual experience with mobile eye tracking

Mark Lescroart¹, Kamran Binaee¹, Bharath Shankar¹, Christian Sinnott¹, Jennifer A. Hart², Arnab Biswas¹, Ilya Nudnou³, Benjamin Balas³, Michelle R. Greene², Paul MacNeilage¹; ¹University of Nevada, Reno, ²Bates College, ³North Dakota State University

Humans explore the world with their eyes, so an ideal sampling of human visual experience requires accurate gaze estimates while participants perform a wide range of activities in diverse locations. In principle, mobile eye tracking can provide this information, but in practice, many technical barriers and human factors constrain the activities, locations, and participants that can be sampled accurately. In this talk we present our progress in addressing these barriers to build the Visual Experience Database. First, we describe how the hardware design of our mobile eye tracking system balances participant comfort and data quality. Ergonomics matter, because uncomfortable equipment affects behavior and reduces the reasonable duration of recordings. Second, we describe the challenges of sampling outdoors. Bright sunlight causes squinting, casts shadows, and reduces eye video contrast, all of which reduce the accuracy and precision of estimated gaze. We will show how appropriate image processing at acquisition improves eye video contrast, and how DNN-based pupil detection can improve estimated pupil position. Finally, we will show how physical shifts of the equipment on the head affect estimated gaze quality. We quantify the reduction in gaze precision and accuracy over time due to slippage, in terms of drift of the eye in the image frame and instantaneous jitter of the camera with respect to the eye. Addressing these limitations takes us some way towards achieving a representative sample of visual experience, but recording over long durations, during highly dynamic activities, and in extreme lighting conditions remains challenging.

Sampling Everyday Infancy: Lessons and Questions

Caitlin M. Fausey¹; ¹University of Oregon

Everyday sights, sounds, and actions are the experiences available to shape experience-dependent change. Recent efforts to quantify this everyday input – using wearable sensors in order to capture experiences that are neither scripted by theorists nor perturbed by the presence of an outsider recording – have revealed striking heterogeneity. There is no meaningfully “representative” hour of a day, instance of a category, interaction context, or infant. Such heterogeneity raises questions about how to optimize everyday sampling schemes in order to yield data that advance theories of experience-dependent change. Here, I review lessons from recent research sampling infants’ everyday language, music, action, and vision at multiple timescales, with specific attention to needed next steps. I suggest that most extant evidence about everyday ecologies arises from Opportunistic Sampling and that we must collectively focus our ambitions on a next wave of Distributionally Informed Sampling. In particular, we must center (1) activity distributions with their correlated opportunities to encounter particular inputs, (2) content distributions of commonly and rarely encountered instances, (3) temporal distributions of input that comes and goes, and (4) input trajectories that change over developmental time, as we model everyday experiences and their consequences. Throughout, I highlight practical constraints (e.g., sensor battery life and fussy infants) and payoffs (e.g., annotation protocols that yield multi-timescale dividends) in these efforts. Grappling with the fact that people do not re-live the same hour all life long is a necessary and exciting next step as we build theories of everyday experience-dependent change.

The THINGS initiative: a global initiative of researchers for representative sampling of objects in brains, behavior, and computational models

Martin N. Hebart¹; ¹Max Planck Institute for Human Cognitive and Brain Sciences, Leipzig, Germany

As scientists, we carry out experiments to contribute to the knowledge of the world. Yet we have to make choices in our experimental design that abstract away from the real world, which can lead to selection bias, limiting our ability to translate our research findings into generalizable conclusions. For studies involving the presentation of objects, central choices are which objects are shown and, in the case of visual stimuli, in what format object images should be presented (e.g. abstracted, cropped from background, or natural photographs). In this talk, I will discuss the THINGS initiative, which is a large-scale global initiative of researchers collecting behavioral and brain imaging datasets using the THINGS object concept and image dataset. I will highlight the motivation underlying the development of THINGS, the advantages and limitations in the object and image sampling strategy, and new insights enabled by this strategy about the behavioral and neural representation of objects. Further, I will discuss strategies that allow researchers to draw more generalizable conclusions from small-scale laboratory experiments using THINGS images. Moving beyond THINGS, I will discuss ideas for future sampling approaches that may further narrow the gap between stimulus sampling and neural representations.

What we don’t see in image databases

Michelle R. Greene¹, Jennifer A. Hart¹, Amina Mohamed¹; ¹Bates College

The rise of large-scale image databases has accelerated productivity in both human and machine vision communities. Most extant databases were created in three phases: (1) Obtaining a comprehensive list of categories to sample; (2) Scraping images from the web; (3) Verifying category labels through crowdsourcing. Subtle biases can arise in each phase: offensive labels can get reified as categories; images represent what is typical of the internet, rather than what is typical of daily experience; and verification depends on the knowledge and cultural competence of the annotators who provide “ground truth” labels. Here, we describe two studies that examine the bias in extant visual databases and the deep neural networks trained on them. In the first study, 66 observers took part in an experience sampling experiment via text message. Each received 10 messages per day at random intervals for 30 days, and sent a picture of their surroundings if possible (N=6280 images). Category predictions were obtained from deep convolutional neural networks (dCNNs) pretrained on the Places database. The dCNNs showed poor classification performance for these images. A second study investigated cultural biases. We scraped Airbnb images of private homes from 219 countries. Pretrained deep neural networks were less accurate and less confident in recognizing images from the Global South. We observed significant correlations between dCNN confidence and GDP per capita (r=0.30) and literacy rate (r=0.29). These studies show a dissociation between lived visual content and web-based content, and suggest caution when using the internet as a proxy for visual experience.
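The experience sampling protocol described above (10 messages per day at random intervals for 30 days, 66 observers) can be illustrated with a short sketch. This is not the authors' implementation: the function names, the 09:00 to 21:00 prompting window, and the fixed random seed are assumptions made purely for illustration. The sketch also computes the upper bound on the number of prompts implied by the stated design, against which the reported 6280 images can be compared.

```python
import random
from datetime import date, datetime, timedelta


def daily_prompt_times(day, n_prompts=10, start_hour=9, end_hour=21, rng=None):
    """Draw n_prompts distinct random times on `day`, sorted in time order.

    The prompting window (start_hour to end_hour) is a hypothetical choice;
    the abstract only says that messages arrived at random intervals.
    """
    rng = rng or random.Random()
    window_seconds = (end_hour - start_hour) * 3600
    # Sample distinct second offsets inside the window, then sort them.
    offsets = sorted(rng.sample(range(window_seconds), n_prompts))
    window_start = datetime(day.year, day.month, day.day, start_hour)
    return [window_start + timedelta(seconds=s) for s in offsets]


def build_schedule(first_day, n_days=30, n_prompts=10, seed=0):
    """Build one observer's schedule: a dict mapping each day to its prompt times."""
    rng = random.Random(seed)  # fixed seed (assumption) for reproducibility
    schedule = {}
    for d in range(n_days):
        day = first_day + timedelta(days=d)
        schedule[day] = daily_prompt_times(day, n_prompts, rng=rng)
    return schedule


def max_prompts(n_observers, n_days, n_prompts):
    """Upper bound on prompts sent if every observer completes every day."""
    return n_observers * n_days * n_prompts


if __name__ == "__main__":
    schedule = build_schedule(date(2022, 1, 1))  # start date is arbitrary
    first_day = min(schedule)
    print(first_day, [t.strftime("%H:%M") for t in schedule[first_day]])

    total = max_prompts(n_observers=66, n_days=30, n_prompts=10)
    # Fraction of possible prompts that yielded an image, assuming every
    # prompt was delivered and each image answers exactly one prompt.
    print(total, round(6280 / total, 3))
```

Under these assumptions the design allows at most 66 × 30 × 10 = 19,800 prompts, so the 6280 returned images would correspond to roughly a third of possible prompts. The abstract does not report a response rate, so this figure is only a back-of-the-envelope bound.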

Multiracial Reading the Mind in the Eyes Test (MRMET): validation of a stimulus-diverse and norm-referenced version of a classic measure

Jeremy Wilmer¹, Heesu Kim¹, Jasmine Kaduthodil¹, Laura Germine¹, Sarah Cohan¹, Brian Spitzer¹, Roger Strong¹; ¹Wellesley College

Do racially homogeneous stimuli facilitate scientific control, and thus validity of measurement? Here, as a case in point, we ask whether a multiracial cognitive assessment utilizing a diverse set of stimuli maintains psychometric qualities that are as good as, if not better than, an existing Eurocentric measure. The existing measure is the Reading the Mind in the Eyes Test (RMET) (Baron-Cohen et al., 2001), a clinically significant neuropsychiatric paradigm that has been used to assess face expression reading, theory of mind, and social cognition. The original measure, however, lacked racially inclusive stimuli, among other limitations. In an effort to rectify this and other limitations of the original RMET, we have created and digitally administered a Multiracial version of the RMET (MRMET) that is reliable, validated, stimulus-diverse, norm-referenced, and free for research use. We show, with a series of sizable datasets (Ns ranging from 1,000 to 12,000), that the MRMET is on par with or better than the RMET across a variety of psychometric indices. Moreover, the reliable signal captured by the two tests is statistically indistinguishable, which is evidence for full interchangeability. Given the diversity of the populations that neuropsychology aims to survey, we introduce the Multiracial RMET as a high-quality, inclusive alternative to the RMET that is conducive to unsupervised digital administration across a diverse array of populations. With the MRMET as a key example, we suggest that multiracial cognitive assessments utilizing diverse stimuli can be as good as, if not better than, Eurocentric measures.

An emerging landscape for the study of naturalistic visual memory

Wilma Bainbridge¹; ¹University of Chicago

Our memories are often rich and visually vivid, sometimes even resembling their original percepts when called to mind. Yet, until recently, our methods for quantifying the visual content in memories have been unable to capture this wealth of detail, relying on simple, static stimuli, and testing memory with low-information visual recognition or verbal recall tasks. Because of this, we have been largely unable to answer fundamental questions such as what aspects of a visual event drive memory, or how the neural representations of perceived and recalled visual content compare. However, in recent years, new methods for quantifying visual memories have emerged, following the growth of naturalistic vision research more broadly. Instead of verbal recall, drawings can directly depict the visual content in memory, at a level of detail allowing us to simultaneously explore questions about object memory, spatial memory, visual-semantic interactions, and false memories. Social media is presenting new memory stimulus sets on the order of hundreds or thousands, allowing us to examine neural representations for diverse memories across years. The internet has also allowed us to identify surprising new phenomena in memory, such as the existence of shared visual false memories learned across people (the “Visual Mandela Effect”), or the existence of a population of individuals who lack visual recall in spite of intact perception (“aphantasia”). In this talk, I will present exciting new directions in the naturalistic study of visual memory and provide resources for those interested in pursuing their own studies of naturalistic memory.
