Kernel-based object tracking
- ISSN: 01628828
- DOI: 10.1109/TPAMI.2003.1195991
Abstract
A new approach toward target representation and localization, the central component in visual tracking of nonrigid objects, is proposed. The feature histogram-based target representations are regularized by spatial masking with an isotropic kernel. The masking induces spatially-smooth similarity functions suitable for gradient-based optimization; hence, the target localization problem can be formulated using the basin of attraction of the local maxima. We employ a metric derived from the Bhattacharyya coefficient as the similarity measure, and use the mean shift procedure to perform the optimization. In the presented tracking examples, the new method successfully coped with camera motion, partial occlusions, clutter, and target scale variations. Integration with motion filters and data association techniques is also discussed. We describe only a few of the potential applications: exploitation of background information, Kalman tracking using motion models, and face tracking.
Dorin Comaniciu, Senior Member, IEEE, Visvanathan Ramesh, Member, IEEE, and
Peter Meer, Senior Member, IEEE
Index Terms—Nonrigid object tracking, target localization and representation, spatially-smooth similarity function, Bhattacharyya
coefficient, face tracking.
1 INTRODUCTION
REAL-TIME object tracking is the critical task in many computer vision applications such as surveillance [44],
[16], [32], perceptual user interfaces [10], augmented reality
[26], smart rooms [39], [75], [47], object-based video compression
[11], and driver assistance [34], [4].
Two major components can be distinguished in a typical
visual tracker. Target Representation and Localization is mostly a
bottom-up process which also has to cope with the changes in
the appearance of the target. Filtering and Data Association is
mostly a top-down process dealing with the dynamics of the
tracked object, learning of scene priors, and evaluation of
different hypotheses. The way the two components are
combined and weighted is application dependent and plays
a decisive role in the robustness and efficiency of the tracker.
For example, face tracking in a crowded scene relies more on
target representation than on target dynamics [21], while in
aerial video surveillance, e.g., [74], the target motion and the
ego-motion of the camera are the more important components.
In real-time applications, only a small percentage of the
system resources can be allocated for tracking, the rest being
required for the preprocessing stages or for high-level tasks
such as recognition, trajectory interpretation, and reasoning.
Therefore, it is desirable to keep the computational complexity
of a tracker as low as possible.
The most abstract formulation of the filtering and data
association process is through the state space approach for
modeling discrete-time dynamic systems [5]. The information
characterizing the target is defined by the state
sequence $\{x_k\}_{k=0,1,\ldots}$, whose evolution in time is specified
by the dynamic equation $x_k = f_k(x_{k-1}, v_k)$. The available
measurements $\{z_k\}_{k=1,\ldots}$ are related to the corresponding
states through the measurement equation $z_k = h_k(x_k, n_k)$. In
general, both $f_k$ and $h_k$ are vector-valued, nonlinear, and
time-varying functions. Each of the noise sequences,
$\{v_k\}_{k=1,\ldots}$ and $\{n_k\}_{k=1,\ldots}$, is assumed to be independent and
identically distributed (i.i.d.).
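As a concrete illustration of the dynamic and measurement equations, the following minimal Python sketch simulates a scalar state observed in additive noise. The specific choices here are assumptions made for illustration only and are not taken from the paper: a random-walk dynamic equation, an identity measurement function, Gaussian noise, and the hypothetical function name `simulate`.

```python
import random


def simulate(num_steps, x0=0.0, process_std=1.0, meas_std=0.5, seed=0):
    """Simulate x_k = f_k(x_{k-1}, v_k) and z_k = h_k(x_k, n_k).

    Illustrative assumption: f_k(x, v) = x + v and h_k(x, n) = x + n,
    with v_k and n_k i.i.d. zero-mean Gaussian sequences.
    """
    rng = random.Random(seed)
    states = [x0]        # x_0, x_1, ..., x_K
    measurements = []    # z_1, ..., z_K
    for _ in range(num_steps):
        v_k = rng.gauss(0.0, process_std)  # process noise sample
        x_k = states[-1] + v_k             # dynamic equation
        n_k = rng.gauss(0.0, meas_std)     # measurement noise sample
        z_k = x_k + n_k                    # measurement equation
        states.append(x_k)
        measurements.append(z_k)
    return states, measurements


states, measurements = simulate(num_steps=5)
```

The state list has one more entry than the measurement list because the state sequence starts at k = 0 while the measurements start at k = 1.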
The objective of tracking is to estimate the state $x_k$ given all
the measurements $z_{1:k}$ up to that moment, or equivalently to
construct the probability density function (pdf) $p(x_k \mid z_{1:k})$. The
theoretically optimal solution is provided by the recursive
Bayesian filter which solves the problem in two steps. The
prediction step uses the dynamic equation and the already
computed pdf of the state at time $t = k-1$, $p(x_{k-1} \mid z_{1:k-1})$, to
derive the prior pdf of the current state, $p(x_k \mid z_{1:k-1})$. Then, the
update step employs the likelihood function $p(z_k \mid x_k)$ of the
current measurement to compute the posterior pdf $p(x_k \mid z_{1:k})$.
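For reference, the two steps of the recursive Bayesian filter can be written in their standard textbook form. This is a sketch that assumes a first-order Markov state (as implied by the dynamic equation with independent noise) and measurements that are conditionally independent given the state; the transition density $p(x_k \mid x_{k-1})$ is the one induced by the dynamic equation.

```latex
\begin{align}
% Prediction step (Chapman-Kolmogorov equation)
p(x_k \mid z_{1:k-1}) &= \int p(x_k \mid x_{k-1})\, p(x_{k-1} \mid z_{1:k-1})\, dx_{k-1}, \\
% Update step (Bayes rule)
p(x_k \mid z_{1:k}) &= \frac{p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})}{p(z_k \mid z_{1:k-1})}.
\end{align}
```

The denominator $p(z_k \mid z_{1:k-1}) = \int p(z_k \mid x_k)\, p(x_k \mid z_{1:k-1})\, dx_k$ is a normalizing constant that does not depend on $x_k$.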
When the noise sequences are Gaussian and $f_k$ and $h_k$
are linear functions, the optimal solution is provided by the
Kalman filter [5, p. 56], for which the posterior is also
Gaussian. (We will return to this topic in Section 6.2.) When
the functions $f_k$ and $h_k$ are nonlinear, linearization yields the
Extended Kalman Filter (EKF) [5, p. 106], the
posterior density being still modeled as Gaussian. A recent
alternative to the EKF is the Unscented Kalman Filter (UKF)
[42] which uses a set of discretely sampled points to
parameterize the mean and covariance of the posterior
density. When the state space is discrete and consists of a
finite number of states, Hidden Markov Models (HMM)
filters [60] can be applied for tracking. The most general
class of filters is represented by particle filters [45], also
called bootstrap filters [31], which are based on Monte Carlo
integration methods. The current density of the state is
represented by a set of random samples with associated
weights and the new density is computed based on these
samples and weights (see [23], [3] for reviews). The UKF can
be employed to generate proposal distributions for particle
filters, in which case the filter is called Unscented Particle
Filter (UPF) [54].
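The sample-and-weight representation used by particle filters can be sketched in a few lines of Python. The example below is a minimal bootstrap-style filter under assumed conditions that are not taken from the paper: a scalar random-walk state, an identity measurement function, Gaussian noise, a standard normal initial density, and resampling at every step. The function name `bootstrap_particle_filter` and all parameter values are hypothetical.

```python
import math
import random


def bootstrap_particle_filter(measurements, num_particles=1000,
                              process_std=1.0, meas_std=0.5, seed=0):
    """Bootstrap particle filter for an assumed scalar model.

    Illustrative assumption: x_k = x_{k-1} + v_k and z_k = x_k + n_k,
    with zero-mean Gaussian v_k, n_k and a standard normal initial density.
    Returns the posterior-mean estimate of the state after each measurement.
    """
    rng = random.Random(seed)
    particles = [rng.gauss(0.0, 1.0) for _ in range(num_particles)]
    estimates = []
    for z in measurements:
        # Prediction: push every sample through the dynamic equation.
        particles = [x + rng.gauss(0.0, process_std) for x in particles]
        # Update: weight each sample by the Gaussian likelihood p(z_k | x_k)
        # (up to a constant factor, which cancels in the normalization).
        weights = [math.exp(-0.5 * ((z - x) / meas_std) ** 2)
                   for x in particles]
        total = sum(weights)
        if total == 0.0:
            # All likelihoods underflowed; fall back to uniform weights.
            weights = [1.0 / num_particles] * num_particles
        else:
            weights = [w / total for w in weights]
        # Posterior mean computed from the weighted samples.
        estimates.append(sum(w * x for w, x in zip(weights, particles)))
        # Resampling: draw samples in proportion to their weights.
        particles = rng.choices(particles, weights=weights, k=num_particles)
    return estimates


estimates = bootstrap_particle_filter([1.0] * 10)
```

Sampling from the dynamic equation and weighting by the likelihood is the simplest proposal choice; the UKF-based proposals mentioned above replace the sampling step with a distribution that also takes the current measurement into account.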
When the tracking is performed in a cluttered environment
where multiple targets can be present [52], problems
564 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 25, NO. 5, MAY 2003
- D. Comaniciu and V. Ramesh are with the Real-Time Vision and Modeling
Department, Siemens Corporate Research, 755 College Road East,
Princeton, NJ 08540. E-mail: comanici@scr.siemens.com.
- P. Meer is with the Electrical and Computer Engineering Department,
Rutgers University, 94 Brett Road, Piscataway, NJ 08854-8058.
Manuscript received 21 May 2002; revised 13 Oct. 2002; accepted 16 Oct.
2002.
Recommended for acceptance by M. Irani.
For information on obtaining reprints of this article, please send e-mail to:
tpami@computer.org, and reference IEEECS Log Number 116595.
0162-8828/03/$10.00 © 2003 IEEE Published by the IEEE Computer Society