Robust automated registration in the spatial domain
Stéfan van der Walt, Ben Herbst
Department of Electrical and Electronic Engineering,
University of Stellenbosch, South Africa
stefan@sun.ac.za
Abstract
Image registration algorithms have a great number of
applications. Recently, many such methods have been
developed, based on invariant localised interest points.
These methods are fast, accurate and generally robust:
ideal in almost every way. Unfortunately, they are
inadequate under certain circumstances. Zokai & Wolberg
suggested the log-polar transform (LPT) for use in cases
involving large changes in scale and rotation. We
demonstrate that, in addition to these properties, the LPT is
useful in cases where other local descriptors fail. Although
registration using the LPT is normally computationally
intensive, we suggest ways in which its computational
cost may be significantly reduced. Stacking of images,
often used in astronomy, and panoramic stitching
are used as examples.
1. Introduction
Registration algorithms can be divided into two broad
classes: those that operate in the spatial domain and those
that operate in the frequency (i.e. Fourier) domain. In the
spatial domain, there are sparse methods, including local
descriptors, that depend on some form of feature extraction,
and dense methods, such as optical flow and correlation, that
operate directly on image values. The two classes generally
differ in that the spatial methods are localised, whereas
the frequency domain methods [15, 6, 8, 7] operate globally.
Attempts have been made to bridge this gap by using
wavelet and other transforms to locate information-carrying
energy [4]. These have met with varying success.
Each registration method has its own particular ad-
vantages and disadvantages. Fourier methods, for exam-
ple, are fast but inaccurate, suffer from resampling and
occlusion effects [16, p. 1425], and only operate glob-
ally. Iterative registration, on the other hand, is highly
accurate but extremely slow, and prone to misregistration
due to local minima in the minimisation space.
These problems led to the development of methods
based on localised interest points [1, 2, 10, 17, 18], such
as the scale-invariant feature transform (SIFT) [13], the
fast Speeded Up Robust Features (SURF) [9] and oth-
ers [11]. All these methods depend on unique localised
features, which are available in many images. There are,
however, cases where it is very difficult to distinguish one
feature from another without examining its spatial
context.
As an example, we will use frames recorded by a
CCD mounted on a telescope pointing at a deep-space
object. It is very difficult to find features to track in these
images, because the stars (all potential features) are
virtually identical and rotationally invariant. Since local
features fail, and global methods are slow and unreliable, we
would like to find an algorithm that can bridge the gap.
We will proceed to show that the log-polar transform
(LPT) is an ideal candidate. While previously its use has
been limited due to its high computational cost, we de-
velop ways of reducing those costs and making the LPT
behave more like local features.
2. The log polar transform
The log-polar transform (LPT) spatially warps an image
onto new axes, angle ($\theta$) and log-distance ($L$). Using the
centre of the image, $(x_c, y_c)$, as reference, pixel
coordinates $(x, y)$ are written in terms of their offset from the
centre,
\[ \bar{x} = x - x_c, \qquad \bar{y} = y - y_c. \]
For each pixel, the angle is defined by
\[ \theta = \begin{cases} \arctan\left(\bar{y}/\bar{x}\right) & \bar{x} \neq 0 \\ 0 & \bar{x} = 0 \end{cases} \]
with a distance of
\[ L = \log_b\left(\sqrt{\bar{x}^2 + \bar{y}^2}\right). \]
The base, $b$, which determines the width of the transform
output, is chosen to be
\[ b = e^{\ln(d)/w} = d^{1/w}, \]
where $d$ is the distance from $(x_c, y_c)$ to the corner of the
image, and $w$ is the width or height of the input image,
whichever is larger.
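The forward mapping defined above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the centre is taken to be the middle of the image, and `math.atan2` (wrapped to the range 0 to 2π) is used in place of the piecewise arctangent so that all four quadrants are distinguished, which is an assumption that goes beyond the formula as written. The centre pixel itself has no log-distance and is rejected.

```python
import math


def log_polar_base(width, height):
    """Return (b, d, w) for an image whose centre is assumed to be (width/2, height/2)."""
    d = math.hypot(width / 2.0, height / 2.0)  # distance from the centre to a corner
    w = max(width, height)                     # the larger of width and height
    b = d ** (1.0 / w)                         # b = d^(1/w)
    return b, d, w


def to_log_polar(x, y, xc, yc, b):
    """Map pixel (x, y) to (theta, L) relative to the centre (xc, yc)."""
    x_bar, y_bar = x - xc, y - yc
    r = math.hypot(x_bar, y_bar)
    if r == 0:
        raise ValueError("the centre pixel has no log-distance")
    # Assumption: quadrant-aware angle in [0, 2*pi) instead of plain arctan.
    theta = math.atan2(y_bar, x_bar) % (2 * math.pi)
    L = math.log(r, b)                         # L = log_b(r)
    return theta, L
```

With this choice of base, a point at distance 1 from the centre maps to L = 0 and a corner of the image maps to L = w, so the log-distance axis spans the same number of samples as the larger image dimension.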
[Figure 1 panels: "Colour input" and "Log polar transform"; the transform is plotted against distance from centre (1.00 to 141.42) and angle θ (0 to 2π).]
Figure 1: Illustration of the log-polar transform.
When warping images, it is not possible to use the for-
ward transform. Since we use discrete coordinates (inte-
ger x and y values), more than one input coordinate may
map to the same output coordinate. Worse still, not every
output coordinate will be covered.
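The problem can be made concrete with a small experiment. The sketch below is illustrative only (the function name and the 16 by 16 grid size are assumptions): it pushes every integer pixel of a square image through the forward transform, rounds the result to the nearest output cell, and counts how many output cells are hit and how many are hit more than once.

```python
import math


def forward_coverage(size=16):
    """Forward-map every pixel of a size x size image onto a size x size
    log-polar grid; return (covered cells, total cells, cells hit twice or more)."""
    xc = yc = size / 2.0
    d = math.hypot(xc, yc)
    ln_b = math.log(d) / size          # ln(b), with b = d^(1/w) and w = size
    hits = {}
    for y in range(size):
        for x in range(size):
            r = math.hypot(x - xc, y - yc)
            if r < 1.0:
                continue               # the centre pixel has no log-distance
            theta = math.atan2(y - yc, x - xc) % (2 * math.pi)
            col = int(round(math.log(r) / ln_b))
            row = int(round(theta / (2 * math.pi) * size)) % size
            if col >= size:
                continue               # falls outside the output grid
            hits[(row, col)] = hits.get((row, col), 0) + 1
    covered = len(hits)
    collisions = sum(1 for n in hits.values() if n > 1)
    return covered, size * size, collisions
```

Near the centre only a handful of pixels exist for each ring of output cells, so gaps appear there, while far from the centre several pixels compete for the same cell.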
One solution is to calculate the irregular grid of co-
ordinates obtained by transforming each input coordi-
nate (without discretising). Then, the input is warped
and resampled (using interpolation) at the required output
positions. An easier and computationally less intensive
method is to reverse the process. For each output coor-
dinate, the transformation is applied in reverse, to obtain
a coordinate in the input image. Using interpolation, an
output value is determined from the input. This can be
done if, ignoring the effect of discretisation, the trans-
formation function is bijective (a one-to-one correspon-
dence, and all input and output coordinates are mapped).
Given $\theta$ and $L$, we would now like to find $x$ and $y$.
First, calculate the distance $r$ from the centre,
\[ r = e^{\ln(b) L} = e^{L \ln(d)/w}, \]
after which $x$ and $y$ can be recovered as
\[ x = r\cos(\theta) + x_c, \qquad y = r\sin(\theta) + y_c. \]
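A minimal sketch of this reverse mapping follows, using only the Python standard library and an image stored as a list of rows. The function names, the choice of bilinear interpolation, the number of angular samples and the convention that samples outside the image become 0 are all assumptions made for illustration; a practical implementation would normally vectorise the loops with an array library.

```python
import math


def bilinear(image, x, y):
    """Bilinearly interpolate image (list of rows) at (x, y); 0.0 outside the image."""
    h, w = len(image), len(image[0])
    if not (0 <= x <= w - 1 and 0 <= y <= h - 1):
        return 0.0
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * image[y0][x0] + fx * image[y0][x1]
    bottom = (1 - fx) * image[y1][x0] + fx * image[y1][x1]
    return (1 - fy) * top + fy * bottom


def log_polar_warp(image, n_angles=None):
    """Return the log-polar transform as rows of angle, columns of log-distance."""
    h, w_img = len(image), len(image[0])
    xc, yc = w_img / 2.0, h / 2.0      # assumed centre of the image
    d = math.hypot(xc, yc)             # distance from the centre to a corner
    w = max(w_img, h)
    n_angles = n_angles or w           # assumed angular resolution
    out = []
    for i in range(n_angles):
        theta = 2 * math.pi * i / n_angles
        row = []
        for L in range(w):
            r = math.exp(L * math.log(d) / w)   # r = e^(L ln(d) / w)
            x = r * math.cos(theta) + xc
            y = r * math.sin(theta) + yc
            row.append(bilinear(image, x, y))
        out.append(row)
    return out
```

Because every output cell is computed from exactly one input location, the output has no gaps and no collisions.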
Note the relationship of the input image to the axes of the
LPT: if the input is rotated it results in a shift in the θ axis,
whereas scaling the input is seen as a shift in the L axis.
It is this property of the LPT that is used in registration.
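The shift property follows directly from the definitions. As a short worked derivation (the symbols $s$, $\varphi$, $r'$, $\theta'$ and $L'$ are introduced here for illustration only), suppose the input is scaled by $s$ and rotated by $\varphi$ about the centre, so that a point at polar position $(r, \theta)$ moves to $(sr, \theta + \varphi)$. Its log-polar coordinates become

\[ \theta' = \theta + \varphi, \qquad L' = \log_b(sr) = L + \log_b(s) = L + \frac{w \ln(s)}{\ln(d)}. \]

Both changes are pure translations of the transform output. For instance, with $w = 200$ and $d \approx 141.42$, doubling the size of the input ($s = 2$) moves the pattern by $200 \ln(2) / \ln(141.42) \approx 28$ samples along the $L$ axis.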
3. Fast registration based on the log-polar
transform
In [16], affine registration based on the log-polar
transform is described. Given a reference frame, $R(x, y)$, and
a target frame, $B(x, y)$, we want to find a transformation
$T$, such that
\[ R(x, y) = B(T(x, y)). \]
Assuming that the frames are images taken of the same
object from a long distance, we know that the transformation
must be a similarity, i.e. it is limited to translation,
rotation and scale. If we express a coordinate $(x, y)$ as a
homogeneous coordinate $p = [x, y, 1]^T$, we can view the
transformation as a matrix multiplication,
\[ T(p) = Mp \]
where
\[ M = \begin{bmatrix} s\cos(\theta) & -s\sin(\theta) & t_x \\ s\sin(\theta) & s\cos(\theta) & t_y \\ 0 & 0 & 1 \end{bmatrix} = \begin{bmatrix} s\mathbf{R} & \mathbf{t} \\ \mathbf{0}^T & 1 \end{bmatrix} \]
and $\mathbf{R}$ represents rotation, $\mathbf{t}$ translation and $s$ scale.
Registration with the LPT proceeds as follows:
• From the centre of the reference frame, $p_r = [x_r, y_r]^T$,
cut a square roughly 20% the size of the
image (or at least the size of the objects we wish to
track), and obtain its LPT. The square represents a
feature.
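Cutting the central feature can be sketched as follows. The function name is hypothetical, and "20% the size of the image" is interpreted here as a side length equal to 20% of the smaller image dimension, which is an assumption; the text does not say whether the figure refers to side length or area.

```python
def centre_square(image, fraction=0.2):
    """Cut a square from the centre of image (a list of rows).

    Assumption: the side length is `fraction` of the smaller image dimension.
    """
    h, w = len(image), len(image[0])
    side = max(1, int(round(fraction * min(h, w))))
    top = (h - side) // 2
    left = (w - side) // 2
    return [row[left:left + side] for row in image[top:top + side]]
```

The resulting patch would then be passed to a log-polar transform to obtain the feature described above.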