Determining the Optimum Frame Rate Required for Accurate Face-Diet Estimation for Wearable Eye-Tracking
Poster Presentation 53.468: Tuesday, May 19, 2026, 8:30 am – 12:30 pm, Pavilion
Session: Face and Body Perception: Development, clinical
Parsa Delavari (1,2), Ipek Oruc (1,2); (1) Graduate Program in Neuroscience, University of British Columbia, (2) Department of Ophthalmology and Visual Sciences, University of British Columbia
Understanding how often people see faces and attend to them is central to theories of face perception and experience-driven visual development. Wearable eye-tracking devices with egocentric cameras offer a powerful way to quantify this “face diet” in natural environments, but the temporal resolution needed to recover these measures reliably is not well established. The chosen frame rate determines both measurement fidelity and computational burden. Here we examine the minimum frame rate needed to obtain estimates of daily-life face exposure, including a new measure of gaze-confirmed viewing of attended faces, with acceptable accuracy. Five participants wore Tobii Pro Glasses 3 for two hours of everyday activity. Faces were detected using InsightFace, and gaze data were used to identify attended faces in the scene. Based on the recordings at their full temporal resolution of 25 fps, participants were exposed to 15.55 ± 8.12 minutes/hour of faces, consistent with prior work. Gaze-confirmed viewing of attended faces was 1.93 ± 1.76 minutes/hour, suggesting that direct attention to faces occupies only a small proportion of time. These 25 fps estimates were used as a benchmark against which to compare downsampled recordings at 12, 6, 3, 2, and 1 fps, and at 1 frame every 2, 4, 8, 15, 30, 60, and 120 seconds. For each rate, multiple subsamples were generated, and 95% confidence intervals were compared against a 5% error tolerance. For all faces, the confidence intervals for three of five participants remained within tolerance at 0.25 fps (1 frame every 4 seconds), and for one participant at approximately 0.07 fps (1 frame every 15 seconds); the most conservative case required 1 fps. In contrast, attended-face exposure required substantially higher temporal resolution, with the minimum rate yielding stable estimates ranging from 1 fps to 12 fps across participants. Identifying frame rates that balance practical feasibility and reliable estimation is essential, and these findings highlight that minimum frame-rate requirements may vary in a task-dependent manner.
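The downsampling comparison described above can be illustrated with a minimal sketch. The abstract does not state how the subsamples were drawn or how the confidence intervals were computed, so everything here is an assumption: subsamples are taken as every `stride`-th frame from different starting offsets, the interval is an empirical 2.5th to 97.5th percentile range across those subsamples, and the 5% tolerance is treated as relative to the 25 fps benchmark. All function names are hypothetical, and only rates that divide evenly into 25 fps (or whole-second intervals) are handled; a rate such as 12 fps would need a different resampling scheme.

```python
import random

BASE_FPS = 25  # benchmark frame rate reported in the abstract


def stride_for_interval(seconds_per_frame):
    """Frames to skip between samples, e.g. 4 s per frame -> stride 100."""
    return round(BASE_FPS * seconds_per_frame)


def minutes_per_hour(flags):
    """Proportion of frames flagged as containing a face, in minutes/hour."""
    return 60.0 * sum(flags) / len(flags)


def subsample_estimates(flags, stride, max_offsets=200, seed=0):
    """One exposure estimate per starting offset (assumed subsampling scheme)."""
    offsets = list(range(stride))
    if len(offsets) > max_offsets:
        # Cap the number of subsamples for long strides.
        offsets = random.Random(seed).sample(offsets, max_offsets)
    return [minutes_per_hour(flags[o::stride]) for o in offsets]


def percentile_interval(values, lo=2.5, hi=97.5):
    """Empirical percentile interval using nearest-rank lookup."""
    ordered = sorted(values)
    n = len(ordered)

    def pick(p):
        idx = min(n - 1, max(0, round(p / 100 * (n - 1))))
        return ordered[idx]

    return pick(lo), pick(hi)


def within_tolerance(flags, stride, tolerance=0.05):
    """True if the interval of subsampled estimates stays within
    +/- tolerance (relative) of the full-rate benchmark."""
    benchmark = minutes_per_hour(flags)
    low, high = percentile_interval(subsample_estimates(flags, stride))
    return benchmark * (1 - tolerance) <= low and high <= benchmark * (1 + tolerance)
```

Given a per-frame list of booleans for one participant (face present, or attended face present), calling `within_tolerance(flags, stride_for_interval(4))` would check the 1-frame-every-4-seconds condition under these assumptions. Sparse, brief events such as gaze-confirmed face viewing are more likely to fall between sampled frames, which is one way to see why that measure would need a higher rate.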
Acknowledgements: This work was supported by a Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery Grant RGPIN-2025-05239 (IO).