Monocular and Binocular Visual Tracking


TR Number



Journal Title

Journal ISSN

Volume Title


Virginia Tech


Visual tracking is one of the most important applications of computer vision. Several tracking systems have been developed which either focus mainly on the tracking of targets moving on a plane, or attempt to reduce the 3-dimensional tracking problem to the tracking of a set of characteristic points of the target. These approaches are seriously handicapped in complex visual situations, particularly those involving significant perspective, textures, repeating patterns, or occlusion.

This dissertation describes a new approach to visual tracking for monocular and binocular image sequences, and for both passive and active cameras. The method combines Kalman-type prediction with steepest-descent search for correspondences, using 2-dimensional affine mappings between images. This approach differs significantly from many recent tracking systems, which emphasize the recovery of 3-dimensional motion and/or structure of objects in the scene. We argue that 2-dimensional area-based matching is sufficient in many situations of interest, and we present experimental results with real image sequences to illustrate the efficacy of this approach.

Image matching between two images is a simple one to one mapping, if there is no occlusion. In the presence of occlusion wrong matching is inevitable. Few approaches have been developed to address this issue. This dissertation considers the effect of occlusion on tracking a moving object for both monocular and binocular image sequences. The visual tracking system described here attempts to detect occlusion based on the residual error computed by the matching method. If the residual matching error exceeds a user-defined threshold, this means that the tracked object may be occluded by another object. When occlusion is detected, tracking continues with the predicted locations based on Kalman filtering. This serves as a predictor of the target position until it reemerges from the occlusion again. Although the method uses a constant image velocity Kalman filtering, it has been shown to function reasonably well in a non-constant velocity situation. Experimental results show that tracking can be maintained during periods of substantial occlusion.

The area-based approach to image matching often involves correlation-based comparisons between images, and this requires the specification of a size for the correlation windows. Accordingly, a new approach based on moment invariants was developed to select window size adaptively. This approach is based on the sudden increasing or decreasing in the first Maitra moment invariant. We applied a robust regression model to smooth the first Maitra moment invariant to make the method robust against noise.

This dissertation also considers the effect of spatial quantization on several moment invariants. Of particular interest are the affine moment invariants, which have emerged, in recent years as a useful tool for image reconstruction, image registration, and recognition of deformed objects. Traditional analysis assumes moments and moment invariants for images that are defined in the continuous domain. Quantization of the image plane is necessary, because otherwise the image cannot be processed digitally. Image acquisition by a digital system imposes spatial and intensity quantization that, in turn, introduce errors into moment and invariant computations. This dissertation also derives expressions for quantization-induced error in several important cases. Although it considers spatial quantization only, this represents an important extension of work by other researchers.

A mathematical theory for a visual tracking approach of a moving object is presented in this dissertation. This approach can track a moving object in an image sequence where the camera is passive, and when the camera is actively controlled. The algorithm used here is computationally cheap and suitable for real-time implementation. We implemented the proposed method on an active vision system, and carried out experiments of monocular and binocular tracking for various kinds of objects in different environments. These experiments demonstrated that very good performance using real images for fairly complicated situations.



Adaptive Window Selection, Binocular Tracking, Active Vision, Low-level Vision, Monocular Tracking, Moment Invariants, Image Matching