Ten mice suffice: Validating small sample factor analysis in vision science using sampling distributions and photopigment factors from spectral sensitivity data

Poster Presentation: Sunday, May 17, 2026, 2:45 – 6:45 pm, Pavilion
Session: Color, Light and Materials: Neural mechanisms

David Peterzell1,2,3; 1Fielding Graduate University, 2University of Florence, 3Color and Vision Network (CVNet)

Previously, factor analysis of spectral sensitivities from 86 bioengineered mice expressing human M and L photopigments (Jacobs, Williams, Cahill & Nathans, 2007) revealed that individual variability reflected underlying biological mechanisms rather than measurement error (Peterzell, Bloxham, & Jacobs, 2017 ECVP). Two primary factors, explaining >92% of the variance, coincided with known M and L opsin absorption spectra, confirming that factor analysis can recover the number and tuning of known visual mechanisms from high-quality physiological (electroretinogram) data. A third factor (peak ~560 nm) explained a further 3.4% of the variance and reflected tiny differences (<0.2 nm) in the peak wavelengths of the measured spectra. The present study examined the minimum sample size required to detect this factor structure, using systematic sampling distributions. Random samples of 3-85 mice (100 iterations per sample size) were drawn, and each sample's correlation matrix was correlated with the primary correlation matrix from all 86 mice. In addition, principal component analysis with varimax rotation was performed on each sample, with components retained by a 95% variance criterion. Samples as small as 10 mice reliably reproduced the correlation structure (Pearson r = 0.96, SD = 0.03) and consistently yielded 3 components. Even 8-9 mice showed good reliability (r = 0.93-0.96), though with greater variability. Below 8 mice, estimates became unstable in some but not all instances. Factor loadings from standardized data showed the expected two-factor M+L structure, plus a third factor reflecting small variability in lambda max, across sample sizes ≥10. These findings challenge conventional recommendations that factor analysis requires large samples (N > 100-200).
When data exhibit high signal-to-noise ratios with substantial individual variability in known mechanisms, samples of 10-12 participants (or even fewer) may suffice for correlation and factor analyses. This has important implications for reanalyzing archival and small-sample experimental data, and for physiological and behavioral studies where large samples are impractical.
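
The resampling procedure described above can be sketched roughly as follows. This is a minimal illustration, not the analysis code used in the study: the data are synthetic (two made-up Gaussian templates standing in for M and L spectra, with hypothetical noise levels), the function names are invented for the example, and varimax rotation is omitted on the assumption that rotation does not change how many components a cumulative-variance criterion retains.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 86 "animals" x 12 wavelengths. Each spectrum is a
# weighted sum of two Gaussian templates (NOT real opsin spectra) plus noise.
wavelengths = np.linspace(450, 650, 12)
template_m = np.exp(-0.5 * ((wavelengths - 530) / 40) ** 2)
template_l = np.exp(-0.5 * ((wavelengths - 560) / 40) ** 2)
w_m = rng.uniform(0.2, 1.0, size=(86, 1))
w_l = rng.uniform(0.2, 1.0, size=(86, 1))
data = w_m * template_m + w_l * template_l
data += rng.normal(0.0, 0.01, size=data.shape)


def upper_triangle(corr):
    """Return the off-diagonal upper-triangle entries of a square matrix."""
    rows, cols = np.triu_indices_from(corr, k=1)
    return corr[rows, cols]


def n_components_for_variance(corr, threshold=0.95):
    """Smallest number of principal components reaching the variance threshold."""
    eigvals = np.linalg.eigvalsh(corr)[::-1]   # sort descending
    eigvals = np.clip(eigvals, 0.0, None)      # guard against tiny negatives
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cumulative, threshold) + 1)


# Reference correlation matrix (wavelength x wavelength) from the full sample.
full_corr = np.corrcoef(data, rowvar=False)
full_vec = upper_triangle(full_corr)


def sampling_distribution(n, iterations=100):
    """Draw random subsamples of n animals and compare each to the full sample."""
    rs, ks = [], []
    for _ in range(iterations):
        idx = rng.choice(data.shape[0], size=n, replace=False)
        corr = np.corrcoef(data[idx], rowvar=False)
        # Pearson r between subsample and full-sample correlation entries.
        rs.append(np.corrcoef(upper_triangle(corr), full_vec)[0, 1])
        ks.append(n_components_for_variance(corr))
    return np.array(rs), np.array(ks)


for n in (5, 10, 20, 40):
    rs, ks = sampling_distribution(n)
    print(f"n={n:2d}  mean r={rs.mean():.3f}  SD={rs.std(ddof=1):.3f}  "
          f"median components={np.median(ks):.0f}")
```

Values printed by this sketch depend entirely on the synthetic data and should not be compared with the results reported here; the point is only the structure of the procedure (subsample, recompute the correlation matrix, correlate it with the full-sample matrix, count retained components).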

Acknowledgements: This project was begun with the late Jerry Jacobs who provided the data.