A Bayesian Network approach to examine how toddlers with and without autism spectrum disorder learn from watching naturalistic videos of same-aged peers
Poster Presentation 33.475: Sunday, May 17, 2026, 8:30 am – 12:30 pm, Pavilion
Session: Decision Making: Perception 2
Caius Gibeily1,2, Warren R. Jones1,2, Sarah R. Shultz1,2; 1Emory University, 2Marcus Autism Center
Background: Autism spectrum disorder (ASD) is characterized by altered social and communicative functioning. Eye-tracking can predict diagnostic status and developmental outcomes in children with and without autism, yet most approaches quantify what children look at rather than how past viewing behavior shapes future viewing. A method for quantifying this dynamic, moment-to-moment "learning" from visual experience could offer new insight into cascading developmental processes.

Methods: We developed a Bayesian network (BN) approach to model how toddlers' viewing history shapes subsequent viewing states. Typically developing (TD) toddlers (n=150) and toddlers with ASD (n=216) were eye-tracked while viewing movies of toddlers playing in a daycare. Visual targets were annotated using a data-driven procedure and semantically clustered using OpenAI's CLIP model. Dynamic viewing patterns were then hierarchically clustered to condense moment-by-moment variation in viewing patterns, and BNs were fit for each movie and group.

Results: TD and ASD toddlers differed in the distribution of targets viewed, with TD viewers showing greater attention to social-action/facial targets and ASD viewers to object-action/body-image targets (χ²(df=3, n=133)=31.5, p<0.05). BN structures, constructed from edges found in at least 70% of bootstrapped samples, revealed both immediate and latent dependencies in future viewing states. Models incorporating prior viewing history improved prediction of subsequent states (percent accuracy above chance, TD: 12.5±3.7; ASD: 15.7±3.5).

Conclusion: Toddlers with ASD exhibit distinct patterns of contingent visual engagement when viewing naturalistic social scenes. By quantifying the extent to which future viewing behavior is changed by past viewing behavior, this framework provides a tractable metric of learning from visual experience.
Such information may help identify what is and is not being learned, informing intervention targets such as appropriate levels of attentional or semantic complexity. More broadly, this approach offers a principled way to examine how moment-to-moment attentional biases exert cascading influence, not only within a viewing session but potentially across developmental time.
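The bootstrap-consensus rule described above (retain edges appearing in at least 70% of resamples) can be illustrated with a minimal sketch. The function name `bootstrap_edges` and the first-order transition model are illustrative assumptions: the poster's analysis fit full Bayesian networks per movie and group, whereas this toy scores only state-to-state transitions.

```python
import random
from collections import Counter

def bootstrap_edges(sequences, n_boot=200, threshold=0.70, seed=0):
    """Toy consensus structure selection: keep a directed edge
    (state -> modal successor) if it recurs in at least `threshold`
    of bootstrap resamples of the observed transitions.
    Illustrative stand-in, not the poster's full BN procedure."""
    rng = random.Random(seed)
    # Pool all observed one-step transitions across viewers
    transitions = [(s[i], s[i + 1]) for s in sequences for i in range(len(s) - 1)]
    edge_hits = Counter()
    for _ in range(n_boot):
        # Resample transitions with replacement
        sample = [transitions[rng.randrange(len(transitions))] for _ in transitions]
        counts = Counter(sample)
        for a in {x for x, _ in sample}:
            # Modal successor of state `a` in this resample
            best = max((b for x, b in sample if x == a),
                       key=lambda b: counts[(a, b)])
            edge_hits[(a, best)] += 1
    return {edge for edge, hits in edge_hits.items()
            if hits / n_boot >= threshold}

# Example: two viewing states that deterministically alternate,
# so both directed edges should survive the 70% consensus cut.
seqs = [["face", "object", "face", "object", "face"]] * 5
edges = bootstrap_edges(seqs)
```

With real data, `sequences` would hold each toddler's clustered viewing-state trajectory, and the surviving edge set defines the consensus network structure before parameter fitting.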