Motions of Parts and Wholes: An Exogenous Reference-Frame Model of Non-Retinotopic Processing
33.416, Sunday, 18-May, 8:30 am - 12:30 pm, Banyan Breezeway
Aaron Clarke1, Haluk Öğmen2, Michael Herzog1; 1BMI, SV, ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE (EPFL), 2Dept. of Electrical & Computer Engineering, University of Houston
Object parts are seen relative to the object they belong to. For example, the reflector on a moving bicycle wheel appears to follow a circular path orbiting the wheel’s center. It is almost impossible to perceive the reflector’s “true” retinotopic motion, which is a cycloid. The visual system discounts the bicycle’s motion from the reflector’s motion in much the same way as eye movements are discounted from saccadic shifts. With reflectors, however, no efference copy is available. In addition, the visual system needs to create an exogenous reference frame for each bicycle. Relativity of motion cannot easily be explained by classical motion models because they can only pick out retinotopic motion. Here, we show how a two-stage model, based on vector fields, can explain relativity of motion. The image is first segmented into objects and their parts (e.g., bicycles and reflectors) using association fields. Motion is computed for each object and part (e.g., bicycle and reflector motions) using standard motion detectors. Ambiguous correspondence matches are resolved using an autoassociative neural network (Dawson, 1991). Next, the motion vectors are grouped into local manifolds using grouping cues such as proximity and common fate, so that all the motion vectors from one bicycle and its parts end up in the same group. Within each group, the common motion vector is then subtracted from the individual motion vectors (e.g., the bicycle motion is subtracted from the motion of its reflectors). Thus, the model tracks the bicycle and its reflectors across time, discounting the bicycle’s overall motion. We test our model on several benchmarks, including non-retinotopic motion perception in the Ternus-Pikler display. Our model clearly outperforms all past models that either lack image segmentation or apply only a single spatio-temporal filtering stage and thus fail to put object parts into a motion-based exogenous reference frame.
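The subtraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the grouping has already been done (the `labels` array is supplied by hand), and it estimates each group's common motion as the mean of that group's motion vectors, which is a stand-in chosen for illustration. The function names `subtract_common_motion` and `rolling_wheel_velocities` and all parameter values are hypothetical.

```python
import numpy as np


def subtract_common_motion(vectors, labels):
    """Subtract each group's common motion from its members' motion vectors.

    Assumption: the common motion of a group is taken to be the mean of
    its vectors. The abstract does not specify how it is estimated.
    """
    vectors = np.asarray(vectors, dtype=float)
    labels = np.asarray(labels)
    relative = np.empty_like(vectors)
    for group in np.unique(labels):
        members = labels == group
        common = vectors[members].mean(axis=0)
        relative[members] = vectors[members] - common
    return relative


def rolling_wheel_velocities(n_points, radius, omega, phase=0.0):
    """Retinotopic velocities of n_points evenly spaced on the rim of a
    wheel that rolls without slipping (rightward for omega > 0)."""
    phi = phase + 2.0 * np.pi * np.arange(n_points) / n_points
    hub = np.array([radius * omega, 0.0])  # translation of the whole wheel
    rotation = radius * omega * np.stack([np.sin(phi), -np.cos(phi)], axis=1)
    return hub + rotation


# Two wheels (two groups) moving in opposite directions.
wheel_a = rolling_wheel_velocities(8, radius=1.0, omega=2.0)
wheel_b = rolling_wheel_velocities(8, radius=0.5, omega=-4.0)
vectors = np.vstack([wheel_a, wheel_b])
labels = np.array([0] * 8 + [1] * 8)

relative = subtract_common_motion(vectors, labels)
speeds = np.linalg.norm(relative, axis=1)
```

Because the sample points are evenly spaced around each rim, their rotational components cancel in the mean, so the mean equals the hub's translation. After subtraction every point in `relative` has the same speed, `|radius * omega|`, which is the purely circular motion around the hub that observers report seeing instead of the cycloid.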