Rating or comparing? Do pairwise methods offer advantages over Likert scales for measuring latent perceptual dimensions?

Poster Presentation 23.339: Saturday, May 16, 2026, 8:30 am – 12:30 pm, Banyan Breezeway
Session: Scene Perception: Intuitive physics

William Gaudreau¹ (williamj.gaudreau@gmail.com), Kimele Persaud², Omer Daglar Tanrikulu¹; ¹University of New Hampshire, ²Rutgers University - Newark

Likert scales are the most common format for rating data, but they have limitations: over many trials, participants may rescale their internal standard to the stimulus set, recall earlier responses to stay consistent, and cluster responses on a few integers. Two-alternative forced-choice (2AFC) tasks sidestep these issues because observers compare two stimuli at a time and need not maintain a global rating scale. Building on this idea, Clark et al. (2018) proposed using mElo scores to place stimuli on a latent scale derived from 2AFC data. We asked whether these advantages extend to a latent dimension without ground truth (perceptual surprise) and whether mElo scores are more sensitive than mean Likert ratings. Thirty-six participants viewed 58 videos of everyday objects falling; some outcomes violated material expectations, others behaved normally. In one block, participants rated how expected each outcome was on a Likert scale; in another, they chose the more expected outcome in 2AFC trials. We then computed mean Likert ratings and mElo scores for each video. Contrary to our predictions, mElo values were not more uniformly spread than mean Likert ratings, and split-sample correlations (25%, 50%, 75% of observers) indicated similar reliability. The two measures were strongly correlated across videos and both distinguished expected from unexpected outcomes, with mElo yielding only a modestly larger effect size for this contrast. Together, these results indicate that observers’ visual expectations for dynamic material events can be captured reliably with both Likert ratings and 2AFC-based mElo scores. For the kinds of naturalistic stimuli commonly used in vision science, pairwise methods offer only modest gains in sensitivity, but they provide a viable alternative for building continuous perceptual scales on latent dimensions such as surprise, animacy, or realism, which are central to contemporary theories of predictive vision but lack an objective ground truth.
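The abstract does not spell out how the mElo scores are computed, so the following is only a rough illustration of the general idea of turning 2AFC choices into a continuous scale. It is not the authors' pipeline, and the mElo procedure of Clark et al. (2018) may differ. The sketch makes four assumptions. First, it uses the standard logistic Elo update (K = 32, starting rating 1000). Second, it averages over many random trial orders, because a single sequential Elo pass depends on trial order. Third, it implements one plausible reading of the split-sample check: correlate scores from a random fraction of observers with scores from the remaining observers. Fourth, the data format (per-observer lists of `(winner, loser)` pairs, where the winner is the video judged more expected) and all function names and parameter values (`elo_scores`, `randomized_elo`, `split_sample_reliability`, `simulate_trials`) are hypothetical.

```python
import math
import random
from collections import defaultdict

def elo_scores(trials, k=32.0, start=1000.0):
    """Sequential Elo over 2AFC trials given as (winner, loser) pairs.

    "winner" = the video judged more expected on that trial
    (hypothetical data format; adapt to your own trial records).
    """
    ratings = defaultdict(lambda: start)
    for winner, loser in trials:
        # Logistic (base-10, scale 400) probability that `winner` is chosen over `loser`
        expected_win = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        delta = k * (1.0 - expected_win)  # larger update when the choice was less predicted
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)

def randomized_elo(trials, n_orders=100, k=32.0, start=1000.0, seed=0):
    """Average Elo over many random trial orders, since one pass depends on order."""
    rng = random.Random(seed)
    totals = defaultdict(float)
    for _ in range(n_orders):
        shuffled = list(trials)
        rng.shuffle(shuffled)
        for item, score in elo_scores(shuffled, k, start).items():
            totals[item] += score
    return {item: total / n_orders for item, total in totals.items()}

def pearson(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def split_sample_reliability(trials_by_observer, fraction, n_splits=20, seed=0):
    """Mean correlation between Elo scores from a random `fraction` of observers
    and Elo scores from the remaining observers (an assumed split-sample scheme)."""
    rng = random.Random(seed)
    observers = list(trials_by_observer)
    # Keep at least one observer on each side of the split
    n_sub = min(len(observers) - 1, max(1, round(fraction * len(observers))))
    rs = []
    for split in range(n_splits):
        rng.shuffle(observers)
        subset, rest = observers[:n_sub], observers[n_sub:]
        a = randomized_elo([t for o in subset for t in trials_by_observer[o]], n_orders=20, seed=split)
        b = randomized_elo([t for o in rest for t in trials_by_observer[o]], n_orders=20, seed=split)
        common = sorted(set(a) & set(b))  # videos that appear in both halves
        rs.append(pearson([a[v] for v in common], [b[v] for v in common]))
    return sum(rs) / len(rs)

def simulate_trials(latent, n_observers=36, trials_per_observer=120, noise=1.0, seed=0):
    """Hypothetical observers: P(choose i over j) = logistic((latent[i] - latent[j]) / noise)."""
    rng = random.Random(seed)
    videos = list(range(len(latent)))
    data = {}
    for obs in range(n_observers):
        trials = []
        for _ in range(trials_per_observer):
            i, j = rng.sample(videos, 2)
            p_i = 1.0 / (1.0 + math.exp(-(latent[i] - latent[j]) / noise))
            trials.append((i, j) if rng.random() < p_i else (j, i))
        data[obs] = trials
    return data
```

For example, `simulate_trials` can generate synthetic choices for 36 observers and 58 videos with known latent "expectedness." You can then check how closely `randomized_elo` recovers that latent ordering and call `split_sample_reliability(data, f)` for f in 0.25, 0.5, and 0.75. Any specific numbers depend on the hypothetical K, noise level, and trial counts. The same `pearson` helper could also relate Elo scores to per-video mean Likert ratings.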

Vision Sciences Society