Spatiotemporal Integration of Motion Information in Human Crowds

Poster Presentation 23.467: Saturday, May 16, 2026, 8:30 am – 12:30 pm, Pavilion
Session: Action: Navigation, locomotion

Jiayi Pang¹, William H. Warren¹; ¹Brown University

The ability to perceive and coordinate locomotion in flocks, schools, and crowds is fundamental for social animals, including humans. A common example of such collective motion is walking together with a crowd of pedestrians. To do so, individuals adjust their walking direction (heading) and speed to coordinate with multiple neighbors in the field of view. Most models of this ‘flocking’ behavior assume that visual information about the motion of all neighbors is averaged simultaneously, in a form of ensemble perception. However, there is evidence that some species process only one or two neighbors at a time. In this study, we investigate how visual information about multiple moving neighbors is spatially and temporally integrated by a participant walking with a virtual crowd. We evaluate three hypotheses: (1) Simultaneous sampling: the movements of all neighbors are integrated at once and averaged to produce a precise estimate of group motion; (2) Sequential sampling: neighbors are sampled one at a time, and successive samples are averaged over time; and (3) Subsampling: a subset of neighbors is sampled simultaneously, and successive samples are averaged over time. Participants followed a crowd of 9 neighbors viewed in a VR headset, while we recorded eye movements and changes in heading direction in response to brief perturbations of neighbors’ heading. In Experiment 1, we manipulated both perturbation duration and the number of perturbed neighbors, and found that a 0.5 s perturbation was sufficient to elicit reliable heading responses. In Experiment 2, we applied an equivalent noise analysis by manipulating the variability of neighbors’ heading directions within this brief 0.5 s perturbation. The analysis indicated that all 9 neighbors were integrated simultaneously within a single sample, consistent with Hypothesis 1. The results support ensemble perception in a crowd, at least up to 9 neighbors, consistent with simultaneous averaging models of human flocking.
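
The logic of an equivalent noise analysis can be illustrated with a minimal sketch. It assumes the textbook form of the model, in which the variability of the response is sqrt((sigma_int^2 + sigma_ext^2) / n_samp), where sigma_ext is the manipulated variability of neighbors' headings, sigma_int is internal noise, and n_samp is the effective number of neighbors pooled. The abstract does not state which model form or fitting procedure was used, so the function names, the grid-search fit, and all numerical values below are hypothetical and serve only as an illustration on synthetic data.

```python
import numpy as np

def predicted_sd(sigma_ext, sigma_int, n_samp):
    """Assumed equivalent noise model: response SD when n_samp items,
    each carrying internal plus external noise, are averaged."""
    sigma_ext = np.asarray(sigma_ext, dtype=float)
    return np.sqrt((sigma_int**2 + sigma_ext**2) / n_samp)

def fit_equivalent_noise(sigma_ext, observed_sd, n_candidates, sigma_int_grid):
    """Grid search for the (sigma_int, n_samp) pair that minimizes
    squared error between predicted and observed log SD."""
    log_obs = np.log(np.asarray(observed_sd, dtype=float))
    best = None
    for n in n_candidates:
        for s in sigma_int_grid:
            err = np.sum((np.log(predicted_sd(sigma_ext, s, n)) - log_obs) ** 2)
            if best is None or err < best[0]:
                best = (err, s, n)
    return best[1], best[2]

# Synthetic illustration only: these are made-up values, not data from the study.
ext_levels = np.array([0.0, 2.0, 4.0, 8.0, 16.0])        # external heading SD (deg)
synthetic_sd = predicted_sd(ext_levels, sigma_int=4.0, n_samp=9)

sigma_int_grid = np.linspace(0.5, 10.0, 20)               # candidate internal noise (deg)
fit_sigma_int, fit_n = fit_equivalent_noise(
    ext_levels, synthetic_sd, range(1, 10), sigma_int_grid
)
print(fit_sigma_int, fit_n)
```

Under these assumptions, a fitted n_samp close to the crowd size would be in line with simultaneous sampling, whereas a value near 1 or near a small subset would point toward sequential sampling or subsampling within a single sample.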