Using V1-Based Models to Detect Changes in Natural Scenes
53.544, Tuesday, May 14, 8:30 am - 12:30 pm, Vista Ballroom
Pei Ying Chua¹, Kenneth Kwok¹; ¹DSO National Laboratories, Singapore
We studied the performance of V1-based models in detecting the presence or absence of targets in natural scenes. All models were based on features of the human visual system and incorporated mechanisms of visual processing such as colour opponency, receptive field tuning, linear and non-linear behaviour, and response pooling. Each model compared a baseline image with a second image of the same scene (which might contain a target) to decide whether a target was present in the second image. Performance was evaluated by the models' sensitivity and accuracy in detecting targets. For natural scenes, it is difficult to obtain two images under identical lighting and environmental conditions, and differences in conditions might lead to false detection of targets. The human visual system has to deal with diurnal changes in the ambient environment, and thus human vision models might perform well even with images taken under different conditions. Using models of the visual system, it is possible to investigate which features of human visual processing enable accurate target detection across a range of ambient conditions. We compared the models' performance in detecting targets using 500 pairs of natural scene images obtained from the publicly available Change Detection Benchmark Dataset (Bourdis, Marraud, & Sahbi, 2011). We found that models that contained response normalisation mechanisms (such as within-field suppression) were more sensitive to targets, while models that replicated eccentricity-dependent contrast sensitivity levels were more accurate in detecting targets. Response normalisation keeps cells within their dynamic ranges, which might have contributed towards enhanced sensitivity.
Further, models that were tuned to be more sensitive to high spatial frequencies were both more accurate and more sensitive, possibly because high spatial frequencies represent the fine details within the image and are thus more likely to contain information about the presence of targets.
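The abstract does not define its two performance measures. As an illustration only, the sketch below assumes the conventional definitions: sensitivity as the proportion of target-present pairs in which the model reported a target, and accuracy as the proportion of all pairs classified correctly. The function name, the data class, and the toy labels are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class DetectionScores:
    sensitivity: float  # assumed definition: hits / (hits + misses)
    accuracy: float     # assumed definition: correct decisions / all decisions


def score_detections(target_present, model_detected):
    """Score paired ground-truth labels and model decisions (lists of bools)."""
    if len(target_present) != len(model_detected):
        raise ValueError("label and decision lists must have the same length")
    if not target_present:
        raise ValueError("at least one image pair is required")

    pairs = list(zip(target_present, model_detected))
    hits = sum(1 for truth, pred in pairs if truth and pred)
    misses = sum(1 for truth, pred in pairs if truth and not pred)
    correct_rejections = sum(1 for truth, pred in pairs if not truth and not pred)

    n_targets = hits + misses
    # Sensitivity is undefined when no pair contains a target.
    sensitivity = hits / n_targets if n_targets else float("nan")
    accuracy = (hits + correct_rejections) / len(pairs)
    return DetectionScores(sensitivity, accuracy)


# Toy labels for five image pairs (made up, not data from the study).
truth = [True, True, True, False, False]
decisions = [True, True, False, True, False]
scores = score_detections(truth, decisions)
print(scores)
```

Under these assumed definitions the two measures can diverge: a model that reports a target in every pair reaches perfect sensitivity while its accuracy falls to the proportion of target-present pairs, which is one reason the two are reported separately.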