Temporal Processing
Talk Session: Saturday, May 16, 2026, 2:30 – 4:15 pm, Talk Room 2
Moderator: Nihan Alp, Sabanci University
Talk 1, 2:30 pm, 24.21
Late, Non-overlapping Masks Disrupt Fast Target Processing
Martina Morea1, Roberta Cessa2, Michael H Herzog1, Marco Bertamini2,3; 1Laboratory of Psychophysics, Brain Mind Institute, École Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, 2Department of General Psychology, University of Padova, Padova, Italy, 3Department of Psychology, University of Liverpool, Liverpool, UK.
Visual perception arises at remarkable speed: humans can react to visual stimuli within ~190 ms and can distinguish animals from cars in around 300 ms. Perception is thought to arise from an initial feedforward sweep followed by reentrant processing that stabilizes and refines the percept. Backward masking has long served as a tool to disrupt this recurrent processing, typically using masks that spatially overlap with the target (pattern masking) or surround it (metacontrast masking). Recent findings show that even foveally presented masks—far from the peripheral target—can impair perception, suggesting the existence of “foveal feedback,” whereby peripheral representations project to foveal cortex to exploit its higher resolution. In this study, we examined whether similar effects occur for a low-level stimulus and task such as peripheral vernier offset discrimination. Participants performed a vernier discrimination task in which the target was followed by masks presented at various spatial locations and stimulus onset asynchronies (SOAs). We performed five experiments (with a total of 108 participants) in which the mask was either a dynamic noise patch or two vertical lines. For all mask types, locations, and SOAs (up to 250 ms), performance was impaired relative to the no-mask baseline. Thus, perception is far less stable than it might appear, even for simple low-level tasks such as vernier acuity. Notably, the strongest masking occurred when parafoveal masks were presented in the target’s direction roughly 100 ms after its onset. These findings expand traditional models that confine masking to strictly spatial, retinotopic interactions. We propose that perception depends on a dynamically modulated spatiotemporal buffer influenced by feedback, attention, and oculomotor mechanisms.
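As a concrete illustration of the comparison this design calls for, the sketch below contrasts accuracy at each mask SOA with a no-mask baseline using a two-proportion test. All data are simulated; the SOA values, trial counts, and effect shape are assumptions for illustration, not the authors' pipeline.

```python
# Minimal sketch (simulated data): compare masked accuracy per SOA
# against a no-mask baseline with a two-proportion z-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
soas = np.array([25, 50, 100, 150, 250])   # mask SOAs in ms (illustrative values)
n_trials = 200                             # trials per condition (assumed)

# Simulated per-SOA accuracy with strongest masking near ~100 ms, as reported.
true_p = 0.9 - 0.25 * np.exp(-((soas - 100) / 60.0) ** 2)
correct = rng.binomial(n_trials, true_p)   # correct-trial counts per SOA
baseline = rng.binomial(n_trials, 0.92)    # no-mask baseline count

for soa, k in zip(soas, correct):
    # Pooled-proportion z-test: masked accuracy vs. no-mask baseline
    p_pool = (k + baseline) / (2 * n_trials)
    se = np.sqrt(p_pool * (1 - p_pool) * 2 / n_trials)
    z = (k - baseline) / (n_trials * se)
    p = 2 * stats.norm.sf(abs(z))
    print(f"SOA {soa:3d} ms: acc={k / n_trials:.2f} vs baseline={baseline / n_trials:.2f}, p={p:.3f}")
```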
Talk 2, 2:45 pm, 24.22
Action preparation modulates temporal crowding
Lilas Haddad1, Tri Nguyen1, Yaffa Yeshurun2,3, Joo-Hyun Song1; 1Brown University, 2School of Psychological Sciences, University of Haifa, 3Institute of Information Processing and Decision Making, University of Haifa
Temporal crowding is the impairment of object recognition when distractors appear before and after a target, and it typically reduces the precision of target encoding. Recent work suggests that perception and action interact bi-directionally and that preparing an action can change how visual information is prioritized. Because prior studies of temporal crowding relied mainly on indirect mouse responses, it remains unclear whether preparing an action toward the target alters the impact of temporal crowding. We tested whether preparing an action, either relevant (Exp. 1) or irrelevant (Exp. 2) to the target orientation, changes temporal crowding. Participants judged the orientation (0 to 360°) of a target bar shown alone or with two temporally adjacent distractors, at short (200 ms) or long (400 ms) intervals. In Exp. 1 (N = 23), participants prepared an orientation-relevant action by grasping the circle in the target orientation. In Exp. 2 (N = 23), they prepared an orientation-irrelevant action by pointing to the tip of the bar. Grasping requires specifying the orientation of the object to form a stable hand posture, whereas pointing requires locating the end point of the bar and does not depend on orientation. In both experiments, keypress responses in non-action trials served as a baseline. Across both experiments, action preparation reduced guessing for crowded targets, suggesting increased access to target information. However, when the action required encoding orientation (grasping), temporal crowding decreased the precision of orientation reports compared with the non-action baseline. This pattern was not observed when the action was orientation-irrelevant (pointing). These results show that action preparation changes perception under temporal crowding. Preparing an action increases the availability of the target signal, but when the action involves the crowded target feature, it increases susceptibility to distractor noise. Thus, action influences temporal crowding in different ways depending on its relevance to the visual feature being judged.
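The abstract's separation of "guessing" from "precision" is characteristic of mixture-model fits to continuous-report errors. Below is a minimal sketch of such a fit (a von Mises + uniform mixture in the style of Zhang & Luck); the starting values, bounds, and simulated data are assumptions, and the authors' actual fitting procedure may differ.

```python
# Minimal sketch, not the authors' code: estimate guess rate and precision
# from continuous orientation-report errors (in radians, on the circle).
import numpy as np
from scipy.optimize import minimize
from scipy.special import i0  # modified Bessel function for the von Mises norm

def neg_log_lik(params, errors):
    g, kappa = params                       # g: guess rate, kappa: precision
    vm = np.exp(kappa * np.cos(errors)) / (2 * np.pi * i0(kappa))
    lik = (1 - g) * vm + g / (2 * np.pi)    # mixture of target report + guessing
    return -np.sum(np.log(lik + 1e-12))

def fit_mixture(errors):
    res = minimize(neg_log_lik, x0=[0.2, 5.0], args=(errors,),
                   bounds=[(0.0, 1.0), (0.01, 200.0)])
    return {"guess_rate": res.x[0], "kappa": res.x[1]}

# Example with simulated data: 15% guesses, moderate precision.
rng = np.random.default_rng(1)
n = 500
guess = rng.random(n) < 0.15
errors = np.where(guess, rng.uniform(-np.pi, np.pi, n), rng.vonmises(0, 8.0, n))
print(fit_mixture(errors))
```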
Talk 3, 3:00 pm, 24.23
Prediction in Naturalistic Movies Inverts the Temporal Order of the Visual Processing Hierarchy
Tiziano Causin1, Ingmar de Vries1,2, Christoph Huber-Huber1, Eva Berlot2, Floris de Lange2, Moritz Wurm1; 1CIMeC, University of Trento, 2Donders Institute for Brain, Cognition, and Behaviour, Radboud University
In our ever-changing visual environment, the brain continuously tries to predict what comes next at different temporal scales and levels of feature complexity. Until now, visual perception has mostly been studied in static and artificial setups, hindering our ability to understand how representations unfold over time in the real world. While classical paradigms reveal neural activation progressing from low- to high-level regions, we hypothesize that this order is flipped in realistic dynamic contexts because higher-level brain areas recurrently anticipate early processing stages. During MEG, 59 participants watched a movie (“1917”), which is characterized by a continuous stream of rich visual information without scene cuts. For each time point of the movie, we constructed gaze-dependent models ranging from low-level (e.g., pixelwise luminance, optical flow) to higher-level visual features (body posture, semantic saliency). We applied a dynamic extension of representational similarity analysis (dRSA) to measure the temporal alignment between timepoints in the movie and neural representations along the visual processing hierarchy. In occipito-parietal MEG sensors, we find that pixelwise luminance and optical flow (OF) magnitude are represented most strongly ~40 ms after stimulus onset, whereas OF direction was represented predictively, ~280 ms before onset. In occipital sensors, body posture was represented ~80 ms after stimulus onset. Notably, this latency decreased markedly in temporal sensors (~0 ms) and became predictive in frontal sensors (~−20 ms), indicating that high-level areas represent body posture earlier than occipital areas represent low-level visual information. Taken together, both regionwise and modelwise comparisons support the notion that an inverted hierarchy of predictive latencies governs the temporal order in which different representations appear over time. More generally, dRSA on naturalistic continuous stimuli paves the way for novel insights into the processing dynamics of the visual hierarchy.
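A minimal sketch of the core dRSA logic as we read it: correlate the vectorized model RDM at movie time t with the neural RDM at t + lag, and find the lag at which the correlation peaks (negative lags indicate prediction). The data shapes and toy signals below are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of lagged RDM correlation (dRSA-style), simulated data.
import numpy as np

def drsa(model_rdms, neural_rdms, lags):
    """model_rdms, neural_rdms: (time, n_pairs) vectorized RDM time series."""
    T = model_rdms.shape[0]
    out = []
    for lag in lags:
        rs = []
        for t in range(T):
            if 0 <= t + lag < T:
                rs.append(np.corrcoef(model_rdms[t], neural_rdms[t + lag])[0, 1])
        out.append(np.mean(rs))
    return np.array(out)

# Toy demo: a neural signal that follows the model by 3 samples.
rng = np.random.default_rng(2)
model = rng.standard_normal((300, 45))
neural = np.roll(model, 3, axis=0) + 0.5 * rng.standard_normal((300, 45))
lags = np.arange(-10, 11)
curve = drsa(model, neural, lags)
print("peak lag:", lags[np.argmax(curve)])  # ~ +3 samples (negative would mean prediction)
```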
Talk 4, 3:15 pm, 24.24
Gradual emergence of temporal-order judgments in late-sighted children
Marin Vogelsang1, Lukas Vogelsang1, Priti Gupta2, Naviya Lall2, Manvi Jain2, Chetan Ralekar1,3, Suma Ganesh4, Pawan Sinha1; 1Massachusetts Institute of Technology, 2Project Prakash, Dr Shroff's Charity Eye Hospital, 3IIT Roorkee, 4Pediatric Ophthalmology, Dr Shroff's Charity Eye Hospital
Determining the simultaneity or sequencing of visual events critically impacts perceptual inference. Cues signaling such temporal structure have been shown to significantly aid object discovery. This analysis constitutes a key developmental building block for transforming an infant's 'blooming, buzzing confusion' into an organized sensorium. However, several factors underlying its genesis remain unclear. In particular, is early experience with spatially distinct visual entities necessary to learn their temporal arrangement, given that temporal relationships between different entities may be meaningful only if they are spatially resolvable? To address this question, we studied individuals with prolonged visual deprivation due to dense congenital cataracts who received sight surgeries late in childhood. We longitudinally tracked 13 patients from immediately before surgery through the first post-operative month and also examined 15 late-sighted individuals several years after surgery. Our control group comprised 22 normally-sighted individuals. In each trial, participants observed two transiently flashed visual stimuli with temporal lags between 17 and 500 ms and indicated which stimulus appeared first. We found that patients with several years of post-operative visual experience performed fully on par with controls. However, this ability was not evident immediately after surgery but followed a protracted progression, mirroring normal development. To examine the potential functional consequence of such gradual progression, we conducted simulations with video-based deep networks. Our results reveal that incorporating temporal degradations into early training phases yielded more temporally-extended model representations and improved downstream generalization, thereby conferring important advantages for robust visual development. Taken together, these findings reveal neural plasticity late into childhood for acquiring temporal-order judgments, in contrast to several other visual functions. They further suggest that time-based binding mechanisms may facilitate visual learning in the late-sighted, and that gradual temporal maturation may confer adaptive advantages, with implications for typical, atypical, and computational visual development.
NIH R01EY020517
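Temporal-order judgments of the kind described in Talk 4 are conventionally summarized by fitting a psychometric function to the proportion of "first" responses across signed lags. The sketch below fits a cumulative Gaussian to recover a point of subjective simultaneity (PSS) and a just-noticeable difference (JND); the lag set matches the 17–500 ms range in the abstract, but the fitting choices and simulated data are assumptions, not the authors' analysis.

```python
# Minimal sketch: cumulative-Gaussian fit to temporal-order judgments.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def psychometric(lag, pss, sigma):
    # Probability of reporting the positive-lag stimulus as appearing first
    return norm.cdf(lag, loc=pss, scale=sigma)

# Signed lags in ms (sign codes which stimulus led), spanning 17-500 ms.
lags = np.array([-500, -200, -100, -50, -17, 17, 50, 100, 200, 500], dtype=float)
rng = np.random.default_rng(3)
p_first = psychometric(lags, 10, 80) + rng.normal(0, 0.02, lags.size)  # toy data

(pss, sigma), _ = curve_fit(psychometric, lags, p_first, p0=[0.0, 100.0])
print(f"PSS = {pss:.1f} ms, JND (sigma) = {sigma:.1f} ms")
```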
Talk 5, 3:30 pm, 24.25
Dissociating Behavioral and Neural Correlates of Percept Duration Across the Visual Field
Julia Papiernik-Kłodzińska1,2,3, Marek Binder1, Renate Rutiku1; 1C-lab, Institute of Psychology, Jagiellonian University, 2Doctoral School in the Social Sciences, Jagiellonian University, Krakow, Poland, 3Centre for Brain Research, Jagiellonian University, Krakow, Poland
Despite the impression of uniformity, visual perception varies systematically across the visual field. It is better along the horizontal than the vertical meridian (horizontal-vertical asymmetry, HVA) and in the lower than the upper part of the vertical meridian (vertical meridian anisotropy, VMA). These phenomena have been reported for various perceptual tasks, yet their impact on the temporal processing of visual percepts remains unclear. We therefore sought to determine whether the time needed to create new percepts is influenced by visual field inhomogeneities. A custom procedure using figure-ground modulation and steady-state visual evoked potential (SSVEP) analysis was implemented to assess disparities between objective and subjective temporal characteristics of stimuli. During trials, 6 Hz flickering pop-out figures emerged in different areas of the visual field from a randomly changing background consisting of line segments flickering and rotating at 6 and 15 Hz. Participants counted how many times each figure flickered, with flickers objectively consisting of 5, 6, or 7 cycles. The study involved 25 healthy participants and used a 64-electrode Biosemi EEG system. The results show that participants consistently underestimated the number of flickers, averaging 4.56 counts per trial. Spatial location significantly affected behavioral responses, revealing an HVA without evidence of a VMA. Neural measures (ERPs, SSVEPs) did not mirror these spatial effects. Significant ERP clusters varied across objective stimulus durations, despite the low accuracy of their subjective perception (0.16). The SSVEPs similarly remained unaffected by location yet showed an increase in 6 Hz activity in the figure versus baseline time window. These findings highlight a disconnect between the subjective perceptual asymmetries observed in behavior and the lack of similarly robust neural markers in EEG, which invites further refinement of the methodology used to study the spatial aspects of subjective visual processing.
This work was supported by the National Science Centre in Poland (Grant no. 2021/42/E/HS6/00425).
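A minimal sketch of one way to quantify the 6 Hz SSVEP response reported in Talk 5: compare spectral amplitude at the tagging frequency between a figure window and a baseline window. The sampling rate, window length, and toy data are assumptions, not the authors' pipeline.

```python
# Minimal sketch: 6 Hz spectral amplitude, figure window vs. baseline.
import numpy as np

fs = 512.0        # sampling rate in Hz (assumed)
f_target = 6.0    # figure tagging frequency from the abstract

def amplitude_at(freq, signal, fs):
    # Amplitude spectrum via FFT; pick the bin nearest the target frequency.
    spec = np.abs(np.fft.rfft(signal)) / len(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1 / fs)
    return spec[np.argmin(np.abs(freqs - freq))]

# Toy data: baseline noise vs. a window containing an added 6 Hz response.
rng = np.random.default_rng(4)
t = np.arange(0, 2.0, 1 / fs)
baseline = rng.standard_normal(t.size)
figure_win = baseline + 0.8 * np.sin(2 * np.pi * f_target * t)

print("baseline 6 Hz amplitude:", amplitude_at(f_target, baseline, fs))
print("figure   6 Hz amplitude:", amplitude_at(f_target, figure_win, fs))
```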
Talk 6, 3:45 pm, 24.26
Decoupling of Illusory Time Perception and Awareness: Electrophysiological Evidence for Illusion-Blind Metacognition
Tutku Öztel1, Martin Wiener1; 1George Mason University
Human time perception is subjective. This subjectivity renders timed intervals prone to illusory percepts induced by the psychophysical properties of stimuli. One robust example of stimulus-induced temporal illusions is the leftward shift in the psychometric functions associated with p(long) responses for timed durations as a function of stimulus velocity (Karsilar, Kisa & Balci, 2019). Recently, we have shown that subjects are unaware when illusory distortions in time occur, as evidenced by a lack of metacognitive insight (Öztel & Balci, 2020). What we still do not know about this metacognitive inability is (1) how stimulus-induced temporal illusions are represented and (2) whether metacognitive processes associated with these illusions are encoded at the neural level at all. To address these questions, we recorded 64-channel electroencephalogram (EEG) activity in human adults (n = 26) in a temporal bisection task in which participants classified the duration (1–3.5 s) of an animated walking stickman figure presented at different velocities (25, 50, or 100 fps) as short or long and reported their confidence. Behaviorally, we replicated our previous findings: faster velocities led to longer reported intervals, as well as lower precision yet faster reaction times; in contrast, confidence judgments did not vary with psychophysical shifts, tracking only reaction time. To our surprise, common EEG signatures of time perception, such as the frontocentral contingent negative variation (CNV) or the late positive component of timing (LPCt) (Wiener & Thompson, 2015; Ofir & Landau, 2022), did not vary with walking speed. Instead, onset-locked responses over occipital electrodes covaried with walking speed and correlated with subject-level shifts in psychometric functions. We further observed that these effects were driven by entrainment to walking speed. Altogether, our findings suggest that temporal illusions are driven by bottom-up sensory processes but fail to reach the metacognitive level; in contrast, metacognitive insight for time appears largely related to top-down sampling of motor output.
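The velocity-dependent leftward shift described here is conventionally quantified by fitting p("long") as a function of probe duration for each condition and comparing bisection points. A minimal sketch under that assumption follows; the PSE values and noise are simulated for illustration and are not the authors' data.

```python
# Minimal sketch: per-velocity bisection points from a temporal bisection task.
import numpy as np
from scipy.optimize import curve_fit
from scipy.stats import norm

def p_long(duration, pse, sigma):
    # Probability of classifying a probe duration as "long"
    return norm.cdf(duration, loc=pse, scale=sigma)

durations = np.linspace(1.0, 3.5, 7)   # probe durations in s, matching the 1-3.5 s range
rng = np.random.default_rng(5)

# Simulated leftward shift of the bisection point with faster walking speed.
for fps, true_pse in [(25, 2.35), (50, 2.25), (100, 2.10)]:
    y = p_long(durations, true_pse, 0.4) + rng.normal(0, 0.02, durations.size)
    (pse, sigma), _ = curve_fit(p_long, durations, y, p0=[2.25, 0.5])
    print(f"{fps:3d} fps: bisection point = {pse:.2f} s")
```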
Talk 7, 4:00 pm, 24.27
IT Duration Coding Reflects Stimulus History, and Video ANNs Qualitatively Approximate This Computation
Dominique Chuaqui1, Matteo Dunnhofer1,2, Kohitij Kar1; 1York University, 2University of Udine
The ventral stream is frequently modeled as a static object-recognition system, yet accumulating evidence points to meaningful history-dependent computation. Recent evidence shows that the inferior temporal (IT) cortex, long considered a shape-selective endpoint of the ventral stream, responds differently to static images depending on preceding stimulus history. Dynamic tasks provide a powerful tool for studying these temporal influences. We therefore used duration estimation as a tractable test case to ask: does IT compute how long an object persists by integrating temporal history, or would a purely frame-based system, such as a feedforward ANN, yield the same outcome? We further asked whether modern video ANNs provide a better mechanistic hypothesis for IT’s dynamics. To diagnose history-dependence, we manipulated temporal coherence without altering the underlying images. Objects were shown either in their natural temporal sequence (coherent) or as temporally scrambled versions containing the exact same frames (incoherent). If IT duration estimates rely only on instantaneous object-bearing frames, coherence should not matter. If IT integrates stimulus history, coherent and incoherent videos should yield different duration predictions. We recorded activity from 119 reliable IT neurons (split-half reliability > 0.5) while macaques viewed 200 videos. Linear models trained on IT activity showed significantly higher duration-prediction accuracy for coherent than for incoherent stimuli (r = 0.68 vs. 0.60, p < 0.001), demonstrating robust history-dependence in IT dynamics. We then evaluated two ANN hypothesis classes. Feedforward image models showed no coherence effect (Δ = 0.0067, p = 0.334) and performed below IT for both stimulus types. In contrast, video ANNs, which explicitly integrate temporal context, exhibited a significant coherence advantage (Δ = 0.0677, p = 0.0073; statistically indistinguishable from IT: Δ = −0.0087, p = 0.687), though their absolute correlations remained lower than IT’s. Together, these findings show that IT’s duration coding is shaped by stimulus history and that temporally integrated video ANNs partially capture this dependence, highlighting key constraints for future ANNs.
KK is supported by the Canada Research Chair Program (CRC-2021-00326), SFARI (967073), Brain-Canada Foundation (2023-0259), and NSERC (RGPIN-2024-06223).
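The "linear models trained on IT activity" in Talk 7 invite a simple reconstruction: cross-validated linear regression of duration from the neural population, run separately on coherent and scrambled videos. The sketch below uses ridge regression and simulated responses with the stated neuron and video counts; the regularization, cross-validation scheme, and data generation are assumptions, not the authors' method.

```python
# Minimal sketch: decode video duration from simulated IT population activity,
# separately for coherent vs. scrambled videos.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(6)
n_videos, n_neurons = 200, 119              # counts taken from the abstract
durations = rng.uniform(0.5, 3.0, n_videos)  # true durations in s (illustrative)

# Toy responses: coherent videos carry a cleaner duration signal.
w = rng.standard_normal(n_neurons)
X_coherent = durations[:, None] * w + 1.0 * rng.standard_normal((n_videos, n_neurons))
X_scrambled = durations[:, None] * w + 2.0 * rng.standard_normal((n_videos, n_neurons))

for name, X in [("coherent", X_coherent), ("scrambled", X_scrambled)]:
    # Cross-validated predictions, then correlate with true durations.
    pred = cross_val_predict(Ridge(alpha=1.0), X, durations, cv=10)
    r = np.corrcoef(pred, durations)[0, 1]
    print(f"{name}: r = {r:.2f}")
```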