
Scale-invariant visual tracking by particle filtering

by Arie Nakhmani, Allen Tannenbaum
Proceedings of SPIE (2008)

Abstract

Visual tracking is an important task that has received considerable attention in recent years. Robust generic tracking tools are of major interest for applications ranging from surveillance and security to image-guided surgery. In these applications, the objects of interest may be translated and scaled. We present an algorithm that uses scaled normalized cross-correlation matching as the likelihood within the particle filtering framework. Our algorithm does not require color or contour cues. Experimental results with constant rectangular templates show that the method is reliable in noisy and cluttered scenarios and provides accurate and smooth trajectories when the target is translated and scaled.

Arie Nakhmani(a,*), Allen Tannenbaum(a,b)
(a) Dept. of Electrical Engineering, Technion - Israel Institute of Technology, Haifa 32000, Israel
(b) Schools of Electrical and Computer and Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia 30332-0250
Keywords: Tracking, cross-correlation, CONDENSATION algorithm, scale-invariant, surveillance
1. INTRODUCTION
In this note, we investigate the problem of tracking arbitrary targets in video sequences. Many of the available algorithms tend to be application-specific, are appropriate only for a very limited class of video sequences, and assume strong prior information about the tracked target (e.g., shape, texture, size, color, camera dynamics, or motion constraints). On the other hand, many of the more generic visual tracking algorithms search for distinctive features that can be followed from frame to frame. For these reasons, any progress on trackers for general, arbitrary targets (without distinctive features) will be of interest for active vision, recognition, and surveillance applications.
In the present work, we propose a video tracking framework for non-articulated (blob-like) targets that lack prominent features. The proposed algorithm works in a variety of scenarios and deals naturally with clutter and noise in the scene, target scaling, and low-contrast targets. The most important assumption is that the target motion and scaling are smooth, without abrupt changes.
We suppose that the target of interest is selected in the first frame of the video sequence (by a human operator or by an automatic detection algorithm). Tracking is performed by acquiring the trajectory of the target's centroid within a given bounding box.
We note that this problem formulation is not new, and a large literature is available on this topic. We mention here only a few of the works most relevant to the approach taken in this paper. A comprehensive survey of visual tracking methods can be found in the paper by Yilmaz et al. [1]. An in-depth analysis of particle filters is provided in [2], where rigorous theory and applications of particle filters are presented. A powerful application of particle filters to image sequences (the CONDENSATION algorithm) can be found in the paper by Isard and Blake [3]. Possible solutions to scale-invariant template matching are presented in [4-6]; see these works and the references therein. Although several attempts to combine area template matching with particle filtering have been made previously [7, 8], they used adaptive and learning schemes, which distinguishes them from the algorithm given in this paper.
The remainder of this paper is organized as follows. Section 2 explains the scale-invariant template-matching problem. We briefly discuss classical template matching with the normalized cross-correlation coefficient (NCC), and we define the scaled normalized cross-correlation (SNCC). In Section 3, we consider the general problem of tracking with particle filters and present an algorithm whose measurement step is based on SNCC. In Section 4, we test our algorithm on three video sequences that illustrate some of its key features. Finally, in Section 5, we summarize our research and present our conclusions. We also discuss several problems that remain to be solved and propose future research directions.
2. SCALE-INVARIANT TEMPLATE MATCHING
Let $I(m,n)$ denote the intensity value of the image (or the search region), and $P(i,j)$ denote the intensity value of the template patch. We assume that the size of $I$ is $M_x \times M_y$, the size of $P$ is $N_x \times N_y$, and that $I$ is larger than $P$. A noisy version of the patch is known to be located somewhere in the image $I$. Our goal is to determine the most probable position of the patch in $I$. The standard approach to this problem is to compute the coordinates of the maximum of the normalized cross-correlation coefficient (NCC) between the image and the template; these coordinates represent the location of the best match. The normalized cross-correlation coefficient is defined for any pixel $(m,n)$ by:
$$
\mathrm{NCC}(m,n)=\frac{\displaystyle\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\big(I(i+m-1,\,j+n-1)-\bar I(m,n)\big)\big(P(i,j)-\bar P\big)}{\sqrt{\displaystyle\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\big(I(i+m-1,\,j+n-1)-\bar I(m,n)\big)^2\,\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\big(P(i,j)-\bar P\big)^2}} \tag{1}
$$
where the mean intensity is defined by:
$$\bar P=\frac{1}{N_xN_y}\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}P(i,j), \tag{2}$$
$$\bar I(m,n)=\frac{1}{N_xN_y}\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}I(i+m-1,\,j+n-1), \tag{3}$$
$$m=1,2,\dots,M_x-N_x+1,\qquad n=1,2,\dots,M_y-N_y+1. \tag{4}$$
The values of $\mathrm{NCC}(m,n)$ lie between -1 and 1 (1 for a perfect match and 0 for no correlation).
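As an illustration only, a direct (unoptimized) Python sketch of Eqs. (1)-(4) follows. It assumes the image and template are nested lists of grey levels and uses 0-based offsets, so offset $(m,n)$ here corresponds to the paper's $(m+1,n+1)$. Setting the coefficient to 0 when a window or the template is constant is a convention of this sketch, not something the paper discusses, and `ncc_map` and `best_match` are hypothetical names. Its cost is proportional to the number of offsets times $N_xN_y$, which is why practical implementations use faster schemes such as [9].

```python
import math

def ncc_map(image, template):
    """NCC(m, n) of Eq. (1) for every valid 0-based top-left offset (m, n)."""
    Mx, My = len(image), len(image[0])
    Nx, Ny = len(template), len(template[0])
    count = Nx * Ny
    p_mean = sum(sum(row) for row in template) / count           # Eq. (2)
    p_dev = [[template[i][j] - p_mean for j in range(Ny)] for i in range(Nx)]
    p_energy = sum(d * d for row in p_dev for d in row)
    scores = []
    for m in range(Mx - Nx + 1):                                   # Eq. (4), 0-based
        row_scores = []
        for n in range(My - Ny + 1):
            window = [[image[m + i][n + j] for j in range(Ny)] for i in range(Nx)]
            i_mean = sum(sum(r) for r in window) / count           # Eq. (3)
            num = i_energy = 0.0
            for i in range(Nx):
                for j in range(Ny):
                    d = window[i][j] - i_mean
                    num += d * p_dev[i][j]
                    i_energy += d * d
            denom = math.sqrt(i_energy * p_energy)
            # Constant window or template: the coefficient is undefined; use 0 here.
            row_scores.append(num / denom if denom > 0 else 0.0)
        scores.append(row_scores)
    return scores

def best_match(image, template):
    """Offset (m, n) of the maximum NCC value, i.e. the best-match location."""
    scores = ncc_map(image, template)
    offsets = [(m, n) for m in range(len(scores)) for n in range(len(scores[0]))]
    return max(offsets, key=lambda mn: scores[mn[0]][mn[1]])

# Usage: a 2x2 pattern hidden at 0-based offset (2, 3) in a 6x6 zero image.
template = [[1, 2], [3, 4]]
image = [[0] * 6 for _ in range(6)]
for i in range(2):
    for j in range(2):
        image[2 + i][3 + j] = template[i][j]
print(best_match(image, template))  # (2, 3)
```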
This technique is used in many practical applications and has demonstrated robustness to noise and intensity variations [9]. Unfortunately, it fails when the desired target is scaled (zoomed) in the image $I$. The straightforward solution to this problem is to find the location of the maximum of the scaled normalized cross-correlation function (SNCC):
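$$
\mathrm{SNCC}(m,n,s)=\frac{\displaystyle\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\big(J-\bar J(m,n)\big)\big(P(i,j)-\bar P\big)}{\sqrt{\displaystyle\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\big(J-\bar J(m,n)\big)^2\,\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}\big(P(i,j)-\bar P\big)^2}} \tag{5}
$$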
where
$s>0$ is the scaling factor,
$J=I\big(m+s(i-1),\,n+s(j-1)\big)$ (if the indices are not integers, they should be rounded, or the value of $J$ should be interpolated from the closest neighbors),
$\bar P$ is defined in (2), and
$$\bar J(m,n)=\frac{1}{N_xN_y}\sum_{i=1}^{N_x}\sum_{j=1}^{N_y}I\big(m+s(i-1),\,n+s(j-1)\big). \tag{6}$$
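A minimal sketch of Eqs. (5)-(6) at a single candidate $(m,n,s)$ is shown next; evaluating one candidate at a time is what a sampling-based search over scale needs. It assumes nested-list images, 0-based indices, and nearest-neighbour rounding (the first of the two options above). Returning `None` when the scaled window leaves the image is a convention of this sketch, and `sncc` is a hypothetical name.

```python
import math

def sncc(image, template, m, n, s):
    """SNCC(m, n, s) of Eq. (5) at 0-based top-left corner (m, n) and scale s > 0.

    J is sampled at I(m + s*i, n + s*j), the 0-based form of I(m+s(i-1), n+s(j-1)),
    with nearest-neighbour rounding. Returns None if a sample falls outside the image.
    """
    if s <= 0:
        raise ValueError("scale s must be positive")
    Mx, My = len(image), len(image[0])
    Nx, Ny = len(template), len(template[0])
    J = []
    for i in range(Nx):
        row = []
        for j in range(Ny):
            r, c = round(m + s * i), round(n + s * j)   # round() rounds halves to even
            if not (0 <= r < Mx and 0 <= c < My):
                return None
            row.append(image[r][c])
        J.append(row)
    count = Nx * Ny
    j_mean = sum(map(sum, J)) / count                   # Eq. (6)
    p_mean = sum(map(sum, template)) / count            # Eq. (2)
    num = j_energy = p_energy = 0.0
    for i in range(Nx):
        for j in range(Ny):
            dj, dp = J[i][j] - j_mean, template[i][j] - p_mean
            num += dj * dp
            j_energy += dj * dj
            p_energy += dp * dp
    denom = math.sqrt(j_energy * p_energy)
    return num / denom if denom > 0 else 0.0            # 0 for constant patches

# Usage: the 2x2 template appears magnified by 2 at 0-based offset (1, 1).
template = [[1, 2], [3, 4]]
image = [[0] * 8 for _ in range(8)]
for i in range(4):
    for j in range(4):
        image[1 + i][1 + j] = template[i // 2][j // 2]
print(sncc(image, template, 1, 1, 2.0))  # 1.0
print(sncc(image, template, 1, 1, 1.0))  # 0.0
```

At $s=1$ the same corner scores 0 in this toy image, because the unscaled window covers a constant block; only the correct scale recovers the match.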
In other words, the template patch is compared to a scaled version of the image $I$, and the best match is found. Since the number of possible scales is infinite, even an approximate solution obtained by searching over a discrete grid of scales can be very computationally demanding and unsuitable for real-time applications. We propose to overcome this problem by assuming that the scale does not change abruptly, so that it can be modeled as a simple Markov process; e.g., for frame $k$:
$$s_k=s_{k-1}+v_k,\qquad v_k\sim\mathcal N(0,\sigma_s^2),\qquad s_0=1. \tag{7}$$
Remarks:
1) One should make sure that $s_k$ remains positive in every frame.
2) If prior knowledge about changes in scale is available, it can be incorporated into the model by modifying the distribution of $v_k$. For example, if we expect the scale to remain unchanged most of the time, we should choose a mixture of a truncated normal distribution and a delta distribution at $v_k=0$.
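The scale model (7) and both remarks can be sketched as a single sampling function. The parameters `sigma_s` (standard deviation of $v_k$) and `p_stay` (weight of the point mass at $v_k=0$) are hypothetical; the paper does not report values. Enforcing positivity by rejection sampling, which yields a truncated normal step, is one choice among several.

```python
import random

def propagate_scale(s_prev, sigma_s, p_stay=0.0, rng=random):
    """Draw s_k = s_{k-1} + v_k as in Eq. (7).

    With probability p_stay, v_k = 0 (point mass at "no scale change", Remark 2);
    otherwise v_k ~ N(0, sigma_s^2) truncated so that s_k > 0 (Remark 1), which is
    implemented by rejection sampling.
    """
    if s_prev <= 0:
        raise ValueError("previous scale must be positive")
    if rng.random() < p_stay:
        return s_prev
    while True:
        s_new = s_prev + rng.gauss(0.0, sigma_s)
        if s_new > 0:
            return s_new

# Usage: a 500-frame scale trajectory starting from s_0 = 1 (hypothetical parameters).
rng = random.Random(0)
scales = [1.0]
for _ in range(500):
    scales.append(propagate_scale(scales[-1], sigma_s=0.02, p_stay=0.7, rng=rng))
print(min(scales) > 0)  # True
```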
This model fits well into the particle filtering framework and makes the problem tractable. Furthermore, since we are interested only in non-negative values of SNCC, we use the half-wave rectified SNCC, in which negative values are replaced by zeros.
In the next section, we will combine the advantages of the SNCC and particle filtering techniques.
3. PARTICLE FILTERING
3.1 Particle filtering
Our tracker is based on the CONDENSATION algorithm proposed by Isard and Blake [3]. In this section, a short
overview of the algorithm is given, and the application to scale invariant tracking is presented. The algorithm uses the
SNCC as the likelihood for determining the target’s position. We refer the reader to reference [2] for the complete
background on particle filtering.
In general, the goal of particle filtering is to estimate the sequence of hidden state parameters $X_k$ based only on the observed data $Z_k$. These estimates follow from the posterior distribution $P(X_k \mid Z_0, Z_1, \dots, Z_k)$. It is assumed that the state evolves as a first-order Markov process and that each observation $Z_k$ depends only on $X_k$. The particle filter estimates the distribution $P(X_k \mid Z_0, Z_1, \dots, Z_k)$ without requiring any linearity or Gaussian assumptions on the model: it generates a set of $N$ samples that approximate this filtering distribution. For the $k$-th frame, we denote the state vector by $X_k=(x_1,x_2,\dots)$. For example, the state can consist of the top-left corner coordinates of the desired target in the frame ($x_1=x$, $x_2=y$) and its scale ($x_3=s$). The state can also include the velocity and acceleration of the target. The state estimate is obtained recursively as follows:
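$$P(X_k\mid Z_0,Z_1,\dots,Z_k)\propto P(Z_k\mid X_k)\int P(X_k\mid X_{k-1})\,P(X_{k-1}\mid Z_0,Z_1,\dots,Z_{k-1})\,dX_{k-1} \tag{8}$$
where
$$P(Z_k\mid X_k)\equiv P^{(\mathrm{SNCC})}(Z_k\mid X_k)\propto\mathrm{SNCC}_k, \tag{9}$$
and $\mathrm{SNCC}_k$ denotes the half-wave rectified SNCC (5) computed in frame $k$ at the position and scale given by $X_k$.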
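To make recursion (8) concrete, the sketch below evaluates it exactly on a small finite state grid, where the integral becomes a sum. This grid-based Bayes filter is used only for illustration: the particle filter approximates the same recursion with samples when the state space is continuous. The function name `bayes_step` and the toy numbers are hypothetical.

```python
def bayes_step(prior, transition, likelihood):
    """One step of Eq. (8) on a finite grid of states.

    prior[i]         = P(X_{k-1} = i | Z_0..Z_{k-1})
    transition[i][j] = P(X_k = j | X_{k-1} = i)
    likelihood[j]    = P(Z_k | X_k = j), e.g. a rectified SNCC score
    Returns posterior[j] = P(X_k = j | Z_0..Z_k).
    """
    n = len(prior)
    # Prediction: sum over X_{k-1} (the integral in Eq. (8)).
    predicted = [sum(prior[i] * transition[i][j] for i in range(n)) for j in range(n)]
    # Update: multiply by the likelihood and normalize (the proportionality in Eq. (8)).
    unnormalized = [p * lk for p, lk in zip(predicted, likelihood)]
    total = sum(unnormalized)
    if total == 0:
        raise ValueError("the likelihood vanishes on every predicted state")
    return [u / total for u in unnormalized]

# Usage: three positions; the target either stays or moves one cell to the right.
prior = [1.0, 0.0, 0.0]
transition = [[0.5, 0.5, 0.0],
              [0.0, 0.5, 0.5],
              [0.0, 0.0, 1.0]]
likelihood = [0.1, 0.9, 0.0]
posterior = bayes_step(prior, transition, likelihood)
print([round(p, 3) for p in posterior])  # [0.1, 0.9, 0.0]
```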
The prediction step, which corresponds to the distribution $P(X_k \mid X_{k-1})$, is governed by the state dynamics. For example, if the state is assumed to change smoothly over time and no additional information about the target dynamics is available, then the simplest model,
$$X_k=X_{k-1}+v_k,\qquad v_k\sim\mathcal N(0,\Sigma), \tag{10}$$
is often appropriate. The mean of $X_k$ over all particles approximates the actual value of $X_k$.
3.2 The algorithm
The state estimation is carried out by updating weighted particles according to (8). The following table summarizes the
algorithm steps.
INITIALIZATION
The $N$ particles $X_0^{(n)}$, $n=1,\dots,N$, are drawn from the uniform distribution or selected by the operator.
For each video frame (k-th frame), we perform the following steps:
STEP 1:
Using the particles from the previous frame, predict the new state by sampling from:
$$X_k^{(n)}\sim P\big(X_k\mid X_{k-1}^{(n)}\big). \tag{11}$$
STEP 2:
Measure and weight the new position in terms of the measured features $Z_k$:
$$w_k^{(n)}\propto P^{(\mathrm{SNCC})}\big(Z_k\mid X_k^{(n)}\big),\qquad \sum_{n=1}^{N}w_k^{(n)}=1. \tag{12}$$
STEP 3:
Resample the particles $X_k^{(n)}$, $n=1,\dots,N$, according to the weights $w_k^{(n)}$.
STEP 4:
Compute the state estimate from
$$\hat X_k=\frac{1}{N}\sum_{n=1}^{N}X_k^{(n)}, \tag{13}$$
and repeat Steps 1-4 for the next video frame.
The result of this algorithm is the estimated state $\hat X_k$, which contains the position and scale of the tracked target in every video frame.
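Putting the steps together, a minimal Python sketch of one frame of the algorithm follows. Several details are assumptions not fixed by the paper: $v_k$ in (10) is taken as a diagonal Gaussian with hypothetical standard deviations, resampling is multinomial, uniform weights are used when every score is zero, and the scale is redrawn until positive (Remark 1 of Section 2). The hypothetical `toy_likelihood` stands in for the SNCC so the example runs without video; in a real tracker, `likelihood` would evaluate the rectified SNCC (5) in frame $k$ at the particle's position and scale.

```python
import math
import random

def track_frame(particles, likelihood, sigmas, rng=random):
    """One iteration of Steps 1-4 for a single frame.

    particles  : list of (x, y, s) tuples from the previous frame
    likelihood : callable (x, y, s) -> score; in the paper, the SNCC of Eq. (5)
    sigmas     : (sigma_x, sigma_y, sigma_s), diagonal std devs of v_k in Eq. (10)
    Returns (resampled_particles, state_estimate).
    """
    sigma_x, sigma_y, sigma_s = sigmas
    # STEP 1, Eq. (11): random-walk prediction; scales are redrawn until positive.
    predicted = []
    for x, y, s in particles:
        s_new = s + rng.gauss(0.0, sigma_s)
        while s_new <= 0:
            s_new = s + rng.gauss(0.0, sigma_s)
        predicted.append((x + rng.gauss(0.0, sigma_x), y + rng.gauss(0.0, sigma_y), s_new))
    # STEP 2, Eq. (12): half-wave rectified scores, normalized to sum to one.
    scores = [max(0.0, likelihood(*p)) for p in predicted]
    total = sum(scores)
    n = len(predicted)
    weights = [sc / total for sc in scores] if total > 0 else [1.0 / n] * n
    # STEP 3: multinomial resampling (the paper does not specify the scheme).
    resampled = rng.choices(predicted, weights=weights, k=n)
    # STEP 4, Eq. (13): mean of the resampled particles.
    estimate = tuple(sum(p[d] for p in resampled) / n for d in range(3))
    return resampled, estimate

def toy_likelihood(x, y, s):
    """Hypothetical stand-in for SNCC: peaks at x=100, y=80, s=1.2; may go negative."""
    return 1.0 - math.hypot(x - 100.0, y - 80.0) / 20.0 - abs(s - 1.2) / 0.5

# INITIALIZATION: 60 particles (as in Section 4), uniform over a box and a scale range.
rng = random.Random(4)
particles = [(rng.uniform(85.0, 125.0), rng.uniform(65.0, 105.0), rng.uniform(0.9, 1.5))
             for _ in range(60)]
for frame in range(30):
    particles, estimate = track_frame(particles, toy_likelihood, (2.0, 2.0, 0.02), rng)
print([round(v, 2) for v in estimate])
```

Because the particles are resampled in Step 3, Step 4 uses an unweighted mean, which matches (13).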
4. EXPERIMENTAL RESULTS AND DISCUSSION
We tested the proposed algorithm in various situations, including highly cluttered outdoor scenes with shadows and partial occlusions, with a high rate of success. A single template was used for each video.
• We chose the simplest motion model (10).
• We selected the target manually in the first video frame.
• We tracked the targets with 60 particles.
• The video resolution is 240×320 pixels, and the frame rate is 25 frames per second.
Figure 1: Maneuvering vehicle sequence with the tracking results.
4.1 Sequence 1: Maneuvering Vehicle
In the first sequence, we track a vehicle. Despite significant zoom and a moving camera, our tracker manages to follow the target (see Figure 1). This video represents a challenging scenario for tracking in outdoor conditions.
4.2 Sequence 2: Boat
In the second sequence, a boat is tracked. The contrast between the boat and the background is so low that following the boat is hard even for a human observer (see Figure 2). Additionally, the scene is very noisy (water glare and the plume behind the boat). The tracker manages to overcome these problems. Although in frame 798 the tracker's scale estimate is wrong (because of bad measurements), the algorithm reestablishes the correct estimate after a few frames.
Figure 2: The boat sequence with the tracking results.
4.3 Sequence 3: A Crowded Party
In this sequence, we track a single person in a large crowd. The tracking results are shown in Figure 3. In frame 83, the person is tracked despite variations in shape and partial occlusion. In frames 115-125, a full occlusion occurs. At frame 123, our tracker temporarily loses track and the scale estimate is wrong. Nevertheless, the tracker finds the right position after the person reappears.
We note that for all sequences we used a simple target dynamics model and a constant template. We assumed that no information about the target is given besides the template. With learned higher-order models and a smoothly changing adaptive template, we expect to obtain even better results with the same algorithm.
Figure 3: Crowded party sequence with the tracking results.
5. CONCLUSION
In this paper, we presented an algorithm for tracking scaled and translated targets in video sequences without the need for adaptation and learning mechanisms. Using a rather low-dimensional state space, we achieve robust tracking results on many complicated and cluttered real-world video sequences, including sequences with a moving camera.
The combination of the particle filter with a correlation tracker makes it possible to obtain smooth target trajectories. The algorithm can cope with translations and with moderate deformations of the tracked target, as long as the deformations affect only a small portion of the pixels in the template. The algorithm is also appropriate for small, low-contrast targets. The algorithm is time-efficient and should be suitable for real-time applications.
The disadvantage of our approach is that it cannot track targets subjected to large rotations. The problems of partial and full occlusion should also be addressed. The next step in our research will be to add rotation states to the particle filter and to choose good dynamic models for rotation, in order to achieve rotation-invariant tracking. In addition, other types of correlation measures should be tested. Finally, the algorithm should be extended to multiple-target tracking.
REFERENCES
[1] Yilmaz, A., Javed, O., and Shah, M., "Object Tracking: A Survey," ACM Computing Surveys, Vol. 38(4), (2006).
[2] Doucet, A., de Freitas, N., and Gordon, N., Sequential Monte Carlo Methods in Practice, Springer, (2001).
[3] Isard, M., and Blake, A., "CONDENSATION—Conditional Density Propagation for Visual Tracking," International
Journal of Computer Vision, Vol. 29(1), pp. 5-28, (1998).
[4] Cahn von Seelen, U.M., and Bajcsy, R., "Adaptive Correlation Tracking of Targets with Changing Scale,"
Reconnaissance, Surveillance, and Target Acquisition for the Unmanned Ground Vehicle, Morgan Kaufmann
Publishers, San Francisco, CA, pp. 313-322, (1997).
[5] Zhao, F., Huang, Q., and Gao, W., "Image Matching by Normalized Cross-Correlation," ICASSP Proceedings,
(2006).
[6] Ooi, J., and Rao, K., "New Insights Into Correlation-Based Template Matching," Proceedings of SPIE, Vol. 1468,
pp. 740-751, (1991).
[7] Mei, X., Zhou, S.K., and Porikli, F., "Probabilistic Visual Tracking via Robust Template Matching and Incremental
Subspace Update," IEEE International Conference on Multimedia and Expo, pp. 1818-1821, (2007).
[8] Zhou, S., Chellappa, R., and Moghaddam, B., "Appearance Tracking Using Adaptive Models in a Particle Filter,"
Proc. of Asian Conf. on Computer Vision, (2004).
[9] Lewis, J.P., "Fast Normalized Cross-Correlation," Vision Interface, Quebec, Canada, pp. 120-123, (1995).
