A signal detection framework for interpreting behavioral similarity

Poster Presentation 26.468: Saturday, May 16, 2026, 2:45 – 6:45 pm, Pavilion
Session: Theory

Yu (Eric) Qian1, Wilson S. Geisler1, Xue-Xin Wei1; 1University of Texas at Austin

Introduction: Analyzing behavioral similarity between observers is fundamental in the perceptual and behavioral sciences. One recent application involves assessing how well the behavior of computational models (e.g., deep neural networks) predicts that of animals or humans. While a number of methods have been proposed (e.g., Cohen's kappa, Gwet's AC1, and Krippendorff's alpha), how to properly interpret these metrics remains debated. Methods: We propose a principled framework based on signal detection theory (SDT) to study behavioral similarity. In particular, we leverage a recent generalization of SDT that models the decisions of two observers simultaneously in binary choice tasks. This framework models the joint distribution of the two observers' decision variables, and their similarity is captured by the decision variable correlation (DVC). Each observer's decision is determined by that observer's decision variable and criterion. By varying DVC and other key variables (such as each observer's d-prime and criterion), we can investigate how popular behavioral metrics depend on the SDT variables. Results: We find that DVC can be estimated reliably from behavior at experimentally relevant sample sizes. Investigating popular metrics (e.g., Cohen's kappa), we find that these measures are monotonic in DVC but have complex relationships with other variables in SDT. They are sensitive to d-prime, to the criterion of each observer, and to class imbalance. For example, Cohen's kappa decreases systematically as the difference in accuracy between the two observers increases. Together, these results show that DVC is a robust and principled measure of behavioral similarity when information about the ground-truth class is available. Conclusion: Under the SDT framework, the commonly reported behavioral agreement metrics are co-determined by DVC, d-prime, criterion, and class imbalance. Thus, DVC and the other SDT variables together constitute a more faithful and comprehensive representation of the underlying decision-making processes.
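
The dependence of an agreement metric on DVC can be illustrated with a small simulation. The following is a minimal sketch, not the authors' implementation: it assumes an equal-variance Gaussian model in which each observer's decision variable has mean +d'/2 or -d'/2 depending on the class, unit variance, and the same noise correlation (taken here as DVC) in both classes. The function names `simulate_choices` and `cohens_kappa` and all parameter values are hypothetical choices made for illustration.

```python
import numpy as np


def simulate_choices(dvc, d_primes, criteria, n_trials=20000, p_signal=0.5, seed=0):
    """Simulate binary choices of two observers under a bivariate SDT model.

    Assumed model (illustrative, not taken from the abstract): equal-variance
    Gaussian decision variables with class means of +/- d'/2 and a noise
    correlation `dvc` shared across both classes.
    """
    rng = np.random.default_rng(seed)
    # Ground-truth class on each trial; p_signal controls class imbalance.
    labels = rng.random(n_trials) < p_signal
    # Correlated unit-variance noise for the two observers.
    cov = np.array([[1.0, dvc], [dvc, 1.0]])
    noise = rng.multivariate_normal(np.zeros(2), cov, size=n_trials)
    # Class-dependent mean of each observer's decision variable.
    d = np.asarray(d_primes, dtype=float)
    means = np.where(labels[:, None], d / 2, -d / 2)
    decision_vars = means + noise
    # Each observer reports "signal" when the decision variable exceeds its criterion.
    choices = decision_vars > np.asarray(criteria, dtype=float)
    return labels, choices[:, 0], choices[:, 1]


def cohens_kappa(a, b):
    """Cohen's kappa for two sequences of binary responses."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    p_obs = np.mean(a == b)  # observed agreement
    p_a, p_b = a.mean(), b.mean()
    p_exp = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    return (p_obs - p_exp) / (1 - p_exp)


# Sweep DVC with d-prime and criterion held fixed for both observers.
for rho in (0.0, 0.4, 0.8):
    _, choice_a, choice_b = simulate_choices(rho, d_primes=(1.5, 1.5), criteria=(0.0, 0.0))
    print(f"DVC={rho:.1f}  kappa={cohens_kappa(choice_a, choice_b):.3f}")
```

Changing `d_primes`, `criteria`, or `p_signal` while holding `dvc` fixed gives a way to probe, under these assumptions, how kappa shifts with the other SDT variables.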

Acknowledgements: This research is funded by NIH R01NS133924 and a Sloan Research Fellowship (to X.-X. Wei).