Sign up & Download
Sign in

Superresolution imaging in live

by Julie S Biteen, Michael A Thompson, Nicole K Tselentis, Lucy Shapiro, W E Moerner
Proceedings of SPIE (2009)

Cite this document (BETA)

Available from link.aip.org
Page 1
hidden

Superresolution imaging in live

Super-resolution Imaging
by
Stéfan Johann van der Walt
Dissertation presented for the degree Doctor of Philosophy in
Engineering at Stellenbosch University
Promotor: Prof. Barend M. Herbst
Applied Mathematics, Mathematical Sciences
December 2010
Page 2
hidden
Declaration
By submitting this dissertation electronically, I declare that the entirety of the
work contained therein is my own, original work, that I am the owner of the
authorship thereof (unless to the extent explicitly otherwise stated) and that
I have not previously in its entirety or in part submitted it for obtaining any
qualification.
S.J. van der Walt 23 April 2010
i
Page 3
hidden
Copyright © 2010 Stellenbosch University
All rights reserved.
Page 4
hidden
Abstract
Super-resolution Imaging
S.J. van der Walt
Applied Mathematics
Stellenbosch University
Private Bag X1, Matieland 7602,
South Africa
Dissertation: PhDEng
December 2010
Super-resolution imaging is the process whereby several low-resolution pho-
tographs of an object are combined to form a single high-resolution estimation.
We investigate each component of this process: image acquisition, registra-
tion and reconstruction. A new feature detector, based on the discrete pulse
transform, is developed. We show how to implement and store the transform
efficiently, and how to match the features using a statistical comparison that
improves upon correlation under mild geometric transformation. To simplify
reconstruction, the imaging model is linearised, whereafter a polygon-based in-
terpolation operator is introduced to model the underlying camera sensor. Fi-
nally, a large, sparse, over-determined system of linear equations is solved, using
regularisation. The software developed to perform these computations is made
available under an open source license, and may be used to verify the results.
iii
Page 5
hidden
Samevatting
Super-resolusie Beeldvorming
S.J. van der Walt
Toegepaste Wiskunde
Universiteit van Stellenbosch
Privaatsak X1, 7602 Matieland, Suid-Afrika
Proefskrif: PhDIng
Desember 2010
In super-resolusie beeldvorming word verskeie lae-resolusie foto's van 'n onder-
werp gekombineer in 'n enkele, hoë-resolusie afskatting. Ons ondersoek elke
stap van hierdie proses: beeldvorming, -belyning en hoë-resolusie samestelling.
'n Nuwe metode wat staatmaak op die diskrete pulstransform word ontwikkel
om belangrike beeldkenmerke te vind. Ons wys hoe om die transform effektief
te bereken en hoe om resultate kompak te stoor. Die kenmerke word vergelyk
deur middel van 'n statistiese model wat bestand is teen klein lineêre beeld-
vervormings. Met die oog op 'n vereenvoudigde samestellingsberekening word
die beeldvormingsmodel gelineariseer. In die nuwe model word die kamerasensor
gemodelleer met behulp van veelhoek-interpolasie. Uiteindelik word 'n groot, yl,
oorbepaalde stelsel lineêre vergelykings opgelos met behulp van regularisering.
Die sagteware wat vir hierdie berekeninge ontwikkel is, is beskikbaar onderhewig
aan 'n oopbron-lisensie en kan gebruik word om die gegewe resultate te verifieer.
iv
Page 6
hidden
To my father, who opened so many doors for me,
and to my mother, who taught me to cross their thresholds with a smile.
Page 7
hidden
Acknowledgements
As with many a large undertaking, doctoral research affects the candidate's life
in ways unanticipated beforehand. I'd like to thank those friends, colleagues
and family members who supported me during this fascinating (and sometimes
daunting!) journey. At the risk of neglecting some, I'd like to mention a few
individuals and institutions by name:
ˆ My study advisor, Prof. Ben Herbst, who allowed me the freedom to
explore and almost certainly had more faith in me than I deserved.
ˆ DPSS (Defence Peace, Safety & Security, a branch of the CSIR), the
National Research Foundation and Stellenbosch University who supported
me financially. Additional funding for overseas travel was provided by
Enthought, Inc., the Python Software Foundation and the Helen Wills
Neuroscience Institute of the University of California at Berkeley.
ˆ Francois Malan, whose insights during discussions on the subtler aspects
of digital photography proved invaluable.
ˆ Michèlewithout your encouragement and love, this text may never have
been written.
Data Sources
ˆ Douglas Kerr, who gratiously allowed me to use a figure from his paper
entitled  The Proper Pivot Point for Panoramic Photography.
ˆ The library data sequence used in Chapter 7 is by Barbara Levienaise-
Obadia, University of Surrey. Tomas Pajdla and Daniel Martinec, CMP,
Prague provided the text dataset. Both sets were collected by David
Capel and are available for download from http://www.robots.ox.ac.
uk/~vgg/data.
ˆ Chelsea the Cat, who proved decidedly difficult to photograph, makes her
appearance in Chapters 4 and 7.
ˆ An illustration of the steps taken during gradient-based optimisation (see
Chapter 7) was made by Oleg Alexandrov, who kindly released his work
into the public domain.
vi
Page 8
hidden
Contents
Abstract iii
Samevatting iv
Contents vii
1 Introduction 1
2 Image acquisition 8
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.2 Digital photography . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.1 Lens distortions . . . . . . . . . . . . . . . . . . . . . . . . 9
2.2.2 Effects due to motion . . . . . . . . . . . . . . . . . . . . 9
2.2.3 Sensor layout . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.4 Diffraction limited photography . . . . . . . . . . . . . . . 14
2.2.5 Noise processes . . . . . . . . . . . . . . . . . . . . . . . . 14
3 The Discrete Pulse Transform 20
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
3.2 Background theory . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3.1 Pulse representation and manipulation . . . . . . . . . . . 23
3.3.2 Algorithm for computing the 2D Discrete Pulse Transform 26
3.3.3 Benchmark . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4 Feature detection and matching 34
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.2 Feature detection using the discrete pulse transform . . . . . . . 35
4.2.1 Scale space . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.2 Pulse strength . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Matching and correspondence . . . . . . . . . . . . . . . . . . . . 39
4.3.1 Statistical feature comparison . . . . . . . . . . . . . . . . 39
4.3.2 Other matching methods . . . . . . . . . . . . . . . . . . 42
4.4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
vii
Page 9
hidden
CONTENTS viii
5 Accurate image registration 47
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.2 The transformation matrix . . . . . . . . . . . . . . . . . . . . . 48
5.3 Sparse registration: Estimating a homography from correspon-
dences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.3.1 Least-squares estimation . . . . . . . . . . . . . . . . . . . 50
5.3.2 Estimation in the presence of outliers . . . . . . . . . . . 54
5.4 Dense registration methods . . . . . . . . . . . . . . . . . . . . . 58
5.4.1 Error minimisation . . . . . . . . . . . . . . . . . . . . . . 58
5.4.2 Pyramidal methods . . . . . . . . . . . . . . . . . . . . . 60
5.4.3 Mutual information . . . . . . . . . . . . . . . . . . . . . 61
5.4.4 Log-polar registration . . . . . . . . . . . . . . . . . . . . 62
5.5 Photometric registration . . . . . . . . . . . . . . . . . . . . . . . 64
6 Super-resolution image processing 73
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.2 The dawn of super-resolution . . . . . . . . . . . . . . . . . . . . 74
6.2.1 Why is super-resolution possible? . . . . . . . . . . . . . . 76
6.2.2 Maximum a posteriori estimate . . . . . . . . . . . . . . . 76
6.2.3 Pixel probabilities . . . . . . . . . . . . . . . . . . . . . . 77
6.2.4 Camera parameters . . . . . . . . . . . . . . . . . . . . . 78
6.2.5 Boundary effects . . . . . . . . . . . . . . . . . . . . . . . 79
6.2.6 The point-spread function . . . . . . . . . . . . . . . . . . 79
6.2.7 Solving the linear system . . . . . . . . . . . . . . . . . . 80
6.3 Other approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3.1 Averaging . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.3.2 Map and deblur . . . . . . . . . . . . . . . . . . . . . . . 81
6.3.3 Pan-sharpening . . . . . . . . . . . . . . . . . . . . . . . . 82
6.3.4 Compressive sampling / compressed sensing . . . . . . . . 82
7 Super-resolution as a sparse linear problem 85
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7.2 The camera matrix, A . . . . . . . . . . . . . . . . . . . . . . . . 86
7.3 Linear interpolation operators . . . . . . . . . . . . . . . . . . . . 89
7.3.1 Bilinear interpolation . . . . . . . . . . . . . . . . . . . . 89
7.3.2 Polygon-based interpolation . . . . . . . . . . . . . . . . . 92
7.4 Solving the large, sparse least-squares problem Ax = b . . . . . . 98
7.4.1 Iterative optimisation methods . . . . . . . . . . . . . . . 100
7.4.2 Iterative-interpolation super-resolution . . . . . . . . . . . 103
7.5 Structural metrics . . . . . . . . . . . . . . . . . . . . . . . . . . 103
Page 10
hidden
CONTENTS ix
7.6 Sensitivity to photometric registration . . . . . . . . . . . . . . . 105
7.7 Recursive implementation . . . . . . . . . . . . . . . . . . . . . . 105
7.8 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
8 Conclusion 113
A Data-set format A-1
B Software API B-1
Page 12
hidden
Chapter 1
Introduction
Unlike many other branches of science, students of digital image
warping benefit from the direct visual realization of mathematical
abstractions and concepts. As a result, readers are fortunate to have
images clarify what mathematical notation sometimes obscures. This
makes the study of digital image warping a truly fascinating and en-
joyable endeavor.
 George Wolberg [Wol90] in a quote equally valid for computer
vision and digital image processing.
Satellites cannot be re-launched and, as far as we are aware, time travel is not
possible
1
. These constraints indicate two situations in which super-resolution
imaging could be applied. For example, a satellite designed to monitor crop
growth may later be retargetted for military observation. For the latter purpose,
the resolution (or sampling interval) of recorded signals, such as photographs or
videos, need to be much higher. In most cases, it would not be cost-effective to
refurbish the satellite camera with a new lens (although this is sometimes done,
as with NASA's Hubble service missions). Another example is security footage,
typically taken with a low-cost camera. A criminal act is committed, but the
face of the perpetrator cannot be distinguished due to the low resolution of the
footage. In both these situations, a signal that would otherwise be of little use
may yield important information after applying image super-resolution.
The Reginald Denny case was the first time that super-resolution techniques
were used in a United States courtroom [Mor97]. Denny, a truck driver, was
nearly beaten to death during the 1992 Los Angeles riots, but his assailants
were captured on video by a news helicopter. To prove the identity of one of
the mobsters, super-resolution-like techniques were applied to a video segment
showing a rose tattoo on his arm [NYN93]. In the end, this evidence was not
enough for a conviction, but it paved the way for similar image processing
technology in the American courtroom.
1
This statement may be refuted by future time-travellers.
1
Page 13
hidden
CHAPTER 1. INTRODUCTION 2
No matter how high the resolution of a camera system, there often exists
the need to improve it. This was the driving force for research done at NASA
Ames that led to the first mathematical super-resolution formulation [CKK
+
93].
Making the best of the existing situation, they applied their techniques to pho-
tos taken during the Pathfinder mission, the result of which is shown at the
beginning of Chapter 6 in Figure 6.1.
We should point out that resolution is a nebulous concept. It is easy
to increase the resolution of an image by simply scaling (resizing) it, possibly
using an interpolation method such as Lanczos or cubic interpolation. While
this increases the number of pixels, it does not add any information to the
image. In this work, we think of improved resolution as an increase in high-
frequency informationthe detail that provides definition on a small scale. We
aim to make blurred text legible, or to recognise faces that were previously
unrecogniseable.
We can therefore describe image super-resolution as a class of algorithms
that combine several low-resolution (LR) images into a single high-resolution
(HR) image of improved detail. If all low-resolution images are identical, no
improvement is possible, but in practice even images taken of a static object from
the same camera position contain differing information due to signal noise. For
example, a common technique used by amateur astronomers is to photograph
a section of the sky with an inexpensive CCD-camera mounted on a tracking
telescope. Any one of the resulting images is of little value on its own, but by
adding them together the contribution of the signal is raised with respect to
zero-mean noise; stars and galaxies appear as if by magic.
To perform super-resolution imaging, we need to model the entire photo-
graphic process: from the time the light reflects off a surface until digital data
arrives on our computers. This process encompasses a large number of distor-
tions, noise processes and other non-linear transformationstoo many to model
realistically. We therefore make numerous assumptions and simplifications to
find a suitable linear model that leads to viable computation. Our model is
expressed simply as the (overdetermined) set of linear equations,
Ax = b;
where x is a representation of the scene in high-resolution (this is what we
would like to estimate), b are our low resolution images and A is a matrix that
encompasses the entire photographic process.
Almost all our effort is spent finding the matrix A that models the relation
of the available low-resolution photographs to the desired high-resolution scene
photograph all the way down to individual pixel level. Thereafter, finding x is
Page 14
hidden
CHAPTER 1. INTRODUCTION 3
Figure 1.1: An overview of the super-resolution process.
a matter of solving a large, sparse linear system.
The results shown in Chapter 7 provide convincing evidence that super-
resolution does workmaybe not as well as in the popular television series
Crime Scene Investigationbut well, nonetheless.
Structure of this document
Super-resolution imaging is not a single operation, but rather a combined se-
quence of algorithms, each of which is discussed in a separate chapter of this
dissertation (see Figure 1.1). Each chapter opens with a concise introduction to
the topic, accompanied by a list of references to existing literature. Due to the
generic nature of these subjects, a comprehensive list is not feasible; instead, we
attempt to collect a representative sub-set of influential papers in each field.
An important complement to this document is the accompanying software
package. Developed under a free, open source license, this Python and C based
library implements all the fundamental algorithms required to continue further
research on super-resolution. Snippets of code are given throughout the text to
Page 15
hidden
CHAPTER 1. INTRODUCTION 4
illustrate its use. Functions are documented more completely in Appendix B-1.
Chapter 2 discusses image acquisition, the process of capturing digital data
from photographic hardware such as cameras or scan-sensors. We introduce
the different degradations experienced during digital image formation, in an-
ticipation of an imaging model required for reconstruction. Suitable camera
configurations are discussed, and noise suppression techniques are suggested.
Chapter 3 gives an overview of the Discrete Pulse Transform (DPT), as
well as its efficient implementation in two dimensions, in preparation for the
identification of features in the next chapters.
Chapters 4 and 5 go hand in hand, and describe how to align two images
where one has undergone a geometrical transformation. Chapter 5 discusses
two popular methods: direct and feature based registration. For the latter
purpose, Chapter 4 develops a feature detector based on the Discrete Pulse
Transform of Chapter 3. Chapter 5 also treats photometric correction: adjusting
the histogram of two images to be more similar.
Once all images are accurately aligned, Chapter 6 constructs the super-
resolution problem, based on a model of the image acquisition pipeline from
Chapter 2. It also lists a number of fast heuristic super-resolution meth-
ods. Chapter 7 shows how to set up the super-resolution problem as an over-
determined set of linear equations, and how to solve the least squares problem
using iterative sparse methods.
The appendices, starting on page A-1, treat topics that are deemed ancillary
to the main discussion.
Contributions
The basic ideas in super-resolution are well established; for example, the frame-
work provided by Cheeseman et al. in 1993 [CKK
+
93] is still used today. Close
investigation of seemingly different approaches such as [Ban09] often reveals
similar strategies, and, unless a revolutionary new approach is discovered, im-
provement lies in the detail of existing algorithms. We therefore carefully in-
vestigate each component of the existing super-resolution framework, and make
improvents where possible. These contributions are listed below, and are further
detailed in corresponding chapters of the dissertation.
Acquisition
Pre-processing is avoided in the literature due to its tendency to remove or de-
stroy the information required to perform super-resolution reconstruction. We
show how certain noise-removal techniques, popularised in astronomical and
Page 20
hidden
CHAPTER 2. IMAGE ACQUISITION 9
2.2 Digital photography
2.2.1 Lens distortions
Radial distortion
The camera lens may introduce a number of distortions, the most common of
which is radial (or barrel) distortion. Radial distortion occurs when a lens is
not symmetrically designed with regard to the aperture stop. Figure 2.1 shows
radial distortion in a real photograph.
Radial distortion can be measured and removed before reconstruction, but
this destroys some super-resolution information. Instead, the parameters may
be included in the transformation model [CJAE05]. We choose to neglect radial
distortion, since super-resolution is almost always executed on a small region
inside a much larger image, where the distortion is negligible.
Vignetting
The effect of reduced peripheral image illumination is known as vignetting.
Vignetting occurs naturally (see [Ray02, p. 131], the cos4  law of illumination),
due to lens design or due to part of the camera extending into the field of view.
It is not hard to measure and correct for vignetting, but in our data-sets most
objects of interest are in the centre of the frame.
Chromatic aberration
The refractive indices of optical objects vary according to wavelength. In differ-
ent colour band objects may therefore appear at slightly differing positions, as
seen in Figure 2.2. In our experiments, we assume that chromatic aberration is
not severe.
2.2.2 Effects due to motion
Geometric transformation
Rotating or translating a camera causes similar changes in the resulting pho-
tographs. As part of our model, we need to estimate these changes so that pixels
in different input photos may be related to one another (the topic of Chapter 5).
This estimation process is much simplified if we assume that the transforma-
tion is linear, a valid assumption if the object lies on a plane or far away from
the camera. The need for super-resolution often arises in these circumstances,
when photographs of a far-away object is taken, yielding only low-resolution
representations.
Page 21
hidden
CHAPTER 2. IMAGE ACQUISITION 10
Figure 2.1: Barrel distortion. A test pattern (top left) was photographed; a small
section of the photo is shown on the right. On the photo, the black line is curved,
as seen when comparing to the superimposed dashed white line. Radial (or barrel)
distortion, shown on the diagram at the bottom left, is witnessed.
Page 26
hidden
CHAPTER 2. IMAGE ACQUISITION 15
Figure 2.5: Front page of B.E. Bayer's US patent.
Page 27
hidden
CHAPTER 2. IMAGE ACQUISITION 16
Figure 2.6: Pixel colouring in the Bayer pattern.
Figure 2.7: The Airy disc (2D and 3D representations) which shows the intensity
distribution of light that travelled through a circular aperture.
Readout noise is modelled as white noise, while photon and dark noise are
Poisson-distributedoften approximated as Gaussian in the literature. Fixed
pattern noise is a gain factor that differs for each pixel element.
Noise removal
Super-resolution estimation depends on the combination of input frames to pro-
vide missing high-frequency content, destroyed due to aliasing. Slight shifts in
camera position yield small, localised changes which we exploit to do the recon-
struction. Notably, any noise removal process that combines neighbouring pixels
destroys this information. The process combines pixels from different frames,
corresponding to the same scene position, a process in some ways similar to
averaging. As such, noise from zero-mean processes often do not disturb the
reconstruction significantly.
Of the above, the only process modelled as having a non-zero mean is fixed
pattern noise [HK94], which can be detected and removed, as described below.
Note that care has to be taken with noise removal: it is easy to distort other
zero-mean noise sources to have non-zero means. This might be the reason why
noise removal is not applied in super-resolution literature.
Page 33
hidden
CHAPTER 3. THE DISCRETE PULSE TRANSFORM 22
X becomes monotone (or constant), whereafter the LU operator has no effect.
The original sequence is reconstructed by summing the different levels of the
decomposition:
X =
X
L
DL:
Extension to two dimensions
The two-dimensional discrete pulse transform is obtained in much the same way,
although the definitions of Lm and Um have to be adapted. The extension to
the two-dimensional case is given by Anguelov and Fabris-Rotelli (we refer to
[AFR08] for an overview).
In two dimensions, we say that two pixels are connected if they are neighbours
typically in a 4- or 8-connected sense, i.e., North, South, East, West and possibly
the diagonal directions. A connected set on (x; y) is the set including (x; y) and
pixels connected to (x; y) via any other connected pixel. For example, f(x; y) is
connected to f(x+ 2; y) via f(x+ 1; y). We call such a connected set N (x; y).
Specifically, a connected set containing n+ 1 elements (that is, (x; y) itself plus
n connections) is denoted Nn(x; y).
Definition 2. The two-dimensional operators, Ln and Un, are defined as
Lnf(x; y) = max
V 2Nn(x;y)
min
(i;j)2V
f(i; j); (x; y) 2 Z2;
Unf(x; y) = min
V 2Nn(x;y)
max
(i;j)2V
f(i; j); (x; y) 2 Z2:
3.3 Implementation
The implementation given here was designed for the two-dimensional case and
was first presented as [FRV09]. In the meantime, D. Laurie (who, together
with C. Rohwer, first described a fast algorithm for calculating the discrete
pulse transform in one dimension [LR06]) proposed a more general graph-based
representation of the process that should allow an efficient implementation in
higher dimensions. At the time of writing, no optimised implementation is
available for comparison. In [SW09], a similar idea is discussed in the context
of connected operators.
When decomposing an image into pulses using the DPT, the number of
pulses may vary from approximately 30,000 for a typical 300  300 image to
over a hundred thousand for a 500  500 image. We need an efficient storage
scheme to represent such a large number of pulses in memory. Furthermore,
we need to be able to calculate certain attributes of the pulses (such as the
Page 35
hidden
CHAPTER 3. THE DISCRETE PULSE TRANSFORM 24
2
6
6
6
6
4
0 0 0 0 0
1 1 1 1 1
0 1 1 0 1
0 1 1 1 0
3
7
7
7
7
5
is written as
'
&
$
%
ˆ value = 1
ˆ columns =
h
0 5 1 3 4 5 1 4
i
ˆ startrow = 1
ˆ rowoffset =
h
0 2 6 8
i
Instead of specifying column values, columns now indicates the start and past-
end indices of the one-dimensional pulses that comprise the rows. The values
of rowoffset, as in the previous example, specify where in columns each new
row starts. The pulse may only cover a few rows of the entire image, therefore
we use startrow to indicate the first occurrence, saving us from storing every
single row.
As an example, consider the third row of the two-dimensional pulse above,
which consists of two one-dimensional pulses: the first stretching from column 1
up to (but excluding) 3, the other from 4 up to 5. Since we are interested in the
third row (row number 2), and we only start recording rows at startrow = 1,
we find the corresponding column indices in row_offset[21] = 2. At position
2, columns contains 1, 3 and 4, 5 as expected.
An advantage of this storage scheme is that it can also be used to store
connected regions, a capability we exploit later to initialise the algorithm.
Queries
Given a pulse in the above format, we would like to calculate the following
queries rapidly:
Area / number of non-zeros The area of the pulse is the sum of the lengths
of the one-dimensional pulses comprising its rows. Each such length is given as
the corresponding difference between the pulse start-end positions in columns.
In the example above, the area would be (5 0) + (3 1) + (5 4) + (4 1) =
5 + 2 + 1 + 3 = 11.
Adjacent Set / Boundary positions Each pulse has four or more boundary
positions  connected to the pulse in a 4- or 8-connected sense (see Fig. 3.2 for
an illustration of the 4-connected case)  that form the adjacent set. To find the
Page 37
hidden
CHAPTER 3. THE DISCRETE PULSE TRANSFORM 26
Another way to implement this algorithm would have been using interval trees,
but the structures and memory allocations involved are more complex, so it is
not obvious without a benchmark whether that would be beneficial.
Operations
Merging Two Pulses Later on, when performing the Discrete Pulse Trans-
form, we are required to merge two pulses that touch. This is done on a row-
by-row basis. In the trivial case where a row is contained in only one of the two
pulses, we simply include that row in the output. Otherwise, we need to sort and
join the one-dimensional pulses that comprise the row carefully. Note, however,
that these one-dimensional pulses cannot overlap in our problem description.
We therefore:
1. Extract the stop-start intervals that form the one-dimensional pulses in
row j.
2. Sort the intervals according to their starting position.
3. Step over the intervals and link (join) them if they touch.
4. Save the linked intervals as the representation of row j.
5. Repeat for row j + 1.
3.3.2 Algorithm for computing the 2D Discrete Pulse
Transform
Each step of the Discrete Pulse Transform is now described in more detail. We
use the following terms:
Input image The input image or data  an M  N matrix of integer values
between 0 and 255.
Label image An M N array of integer values that indicate the connectivity
of pixels in an image. If neighbouring pixels have the same value (i.e. are
4-connected), then they are assigned the same label value.
Intermediate reconstruction An M  N image can be decomposed into
pulses with areas ranging from 1 through MN . When added together,
these pulses reconstruct the input image. It is also possible to only add
pulses with area > k. We call this an intermediate reconstruction, as it
only approximates the image up to a certain level.
The algorithm consists of three steps:
Page 39
hidden
CHAPTER 3. THE DISCRETE PULSE TRANSFORM 28
where xn gives the parent of node n. For example, x3 = 1, which tells us that
the parent of node three is node one. Similarly, x2 = 2 implies that node two
has no parentit is the root of a tree.
Labelling connected regions as trees The goal of the connected compo-
nents algorithm is to assign unique labels to each connected region in anMN
image I. An array, L, of length MN is used to store trees as indicated in the
paragraph above.
The image is traversed in raster scan order (i.e. along rows). A region
counter, k, is initialised to zero. At each pixel position (r; c):
1. Calculate the offset into the tree array as t = rN + c.
2. If the pixel is not connected to (does not have the same value as) the pixel
above it or to the left, assign Lt = t, effectively creating a new tree.
3. If the pixel is connected to the pixel above, assign Lt = LtN , joining
node t to its parent in the previous row.
4. If, in addition, the pixel is connected to the left, assign Lt1 = LtN .
5. If the pixel is only connected to the left, assign Lt = Lt1.
Appropriate care needs to be taken in the first row and column to prevent
indexing errors on the image boundary.
The label vector, L, can also be seen as the flattened version of a label image
so that Lr;c = LrN+c. From this image, all connected regions are extracted as
pulses and stored in the format discussed in Section 3.3.1.
We then proceed to perform the Discrete Pulse Transform as discussed next.
Identifying Pulses to Merge The Discrete Pulse Transform is performed
by alternately executing the Lk (lower) and Uk (upper) operators that extract
pulses of area k. If you think of the image as a height-map, then the U1-operator
removes all valleys of area one. Here, a valley is defined as a connected area
that is surrounded only by higher values. Similarly, the L1-operator removes
peaks of area one, where peaks are connected areas surrounded only by lower
values.
After applying the L1- and U1-operators and storing the removed peaks and
valleys (those form the first level of the DPT), we need to merge pulses that
were joined in the process. Note that, at each decomposition level, we have the
intermediate reconstruction available. It is obtained by setting the image values
corresponding to the removed positive (negative) pulses equal to the maximum
(minimum) value on the adjacent set.
Page 40
hidden
CHAPTER 3. THE DISCRETE PULSE TRANSFORM 29
For each pulse, we calculate its boundary positions using the method de-
scribed in Section 3.3.1. We then examine the boundary values on the interme-
diate reconstruction, and if any of those values are equal to the pulse value, a
merge is required. After examining all boundary positions, a list is drawn up
of all coordinates that fall on merge boundaries. At each of those positions, a
merge is performed as described in Section 3.3.1, after which the label image is
updated.
The Lk+1 and Uk+1 operators are now applied, repeating the merging process
for higher values of k until the image has been entirely decomposed (in other
words, until the final MN -sized pulse has been removed). All the removed
pulses together form the Discrete Pulse Transform or decomposition.
3.3.3 Benchmark
The decomposition algorithm described was implemented an executed on a Intel
Core Duo 3.16GHz processor. Memory utilisation peaked at less that 150MB
during decomposition of Airplane and roughly 60MB while processing Chelsea
(including the memory required to store the decomposition itself). Computation
times were 3.73s (Airplane) and 1.51s (Chelsea). Reconstruction executed in a
few milliseconds.
Figure 3.6 shows benchmark times for the discrete pulse transform applied
to random matrices. Random matrices with a large number of discrete values
seem to be the worst caseexecution times are much lower for real photographs
and for signals limited to, say, 255 discrete values. Both these observations are
explained intuitively: a random matrix has many more pulses than a typical
photograph and limited discrete values cause merging of pulses that would oth-
erwise remain separated. It would be interesting to investigate whether a link
exists between image entropy and the number of pulses generated.
Page 41
hidden
CHAPTER 3. THE DISCRETE PULSE TRANSFORM 30
S
o
f
t
w
a
r
e
The Discrete Pulse Transform is implemented as supreme.lib.dpt.
The decomposition is obtained as follows:
import supreme.api as sr
image = sr.test_data ()
# Perform the decomposition -- this may take
# 5-10 seconds
pulses = sr.lib.dpt.decompose(image.astype(int))
# Reconstruct the original from the pulses
Z = sr.lib.dpt.reconstruct(pulses , image.shape)
# Display the two images side -by-side
sr.show(image , Z)
The returned `pulses` is a dictionary indexed by area, i.e., pulses[2]
contains all pulses of area 2 (those that form the second level of the de-
composition).
The reconstruction simply adds all the pulses together, and can
also be done manually, as illustrated below. Note the use of the
connected_region_handler to calculate pulse values and to add pulses
to the output array. This is necessary because the pulse itself is stored in a
sparse format, as discussed later in this chapter.
import numpy as np
from supreme.lib.dpt import \
connected_region_handler as crh
# Create an output image
out = np.zeros(image.shape , dtype=int)
# Reconstruct output by adding all the pulses
for area in pulses:
for p in pulses[area]:
crh.set_array(out , p, crh.get_value(p), 'add')
sr.show(out)
Page 42
hidden
CHAPTER 3. THE DISCRETE PULSE TRANSFORM 31
Figure 3.5: Two test images used for benchmarking the 2D DPT. On the left is
Chelsea the Cat ( 300 351) and on the right is Airplane (512 512).
0 20000 40000 60000 80000 100000 120000 140000 160000Number of pixels
0
20
40
60
80
100
Tim
e(s)
Execution timesQuadratic fitExecution times (recombination)
Figure 3.6: Benchmark of the discrete pulse transform on random images of varying
size. Values on the x-axis indicate the total number of pixels, i.e., N2 for an N  N
matrix. In the bottom curve, labeled recombination, the number of discrete input
values were limited to 255. For large images, this causes the algorithm to execute more
quickly than expected due to accidental merges.
Page 46
hidden
CHAPTER 4. FEATURE DETECTION AND MATCHING 35
4.2 Feature detection using the discrete pulse
transform
4.2.1 Scale space
The discrete pulse transform decomposes an image into a sum of differently
sized pulses (see Chapter 3). Pulses of a fixed size form a single level of the
DPT, and the different levels a scale space, similar in concept to the well known
Gaussian scale space, on which many feature detectors such as SIFT rely.
Given all pulses pk;i(x; y) 2 Dk belonging to the different levels k = 1 : : :K,
the original signal can be reconstructed using
f(x; y) =
X
k
Dk(x; y) =
X
k
X
i
pk;i(x; y):
The range of i varies according to the number of pulses at any specific level.
A single pixel, (x; y), belongs to zero or more pulses of different sizes (in other
words, it is included in zero or more levels of the decomposition). A partial
reconstruction of the signal that suppresses higher levels of detail is given by
f^L = f
L1X
k=1
Dk or equivalently f^L(x; y) = f(x; y)
L1X
k=1
X
i
pk;i(x; y):
This is the image reconstructions at scale L. Scale 1 represents the original
image, while level K represents only the largest pulses. At scale L, only size
L and larger pulses are used in the reconstruction. The difference between two
layers, L and L+ 1, is DL =
P
i pL.
4.2.2 Pulse strength
According to the Cambridge Advanced Learner's Dictionary, a feature is a
typical quality or an important part of something. Indeed, we hope to isolate
features that are robust, i.e. they remain recognisable under moderate geometric
and photometric transformations.
The DPT provides an easy way to identify prominent features: we simply
count the number of pulses in which a pixel occurs. The strength of pixel (x; y)
is
s(x; y) =
X
k
X
i
x;y(pk;i)
Page 47
hidden
CHAPTER 4. FEATURE DETECTION AND MATCHING 36
(a) L=1 (b) L=100
(c) L=1000 (d) L=10000
Figure 4.1: Reconstruction of Chelsea at different scales. The changes in brightness
are due to automated scaling, done to make pulses (especially in the higher levels) more
visible.
(a) Input Image (b) Pulse strength.
Figure 4.2: The pulse strength of an image, used to identify important segments. In
this case, the pulse strength was calculated for all pulses of size less than 100. Darker
values indicate stronger pulses.
Page 51
hidden
CHAPTER 4. FEATURE DETECTION AND MATCHING 40
Quantile-Quantile (QQ) plots
A QQ-plot is a graphical technique used to determine whether two data se-
quences are drawn from the same distribution. The quantiles (the fraction of
points below given probability thresholds) of the first data set are plotted against
those of the second. If the data sets come from the same distribution, the result
should be a straight line.
When the two data-sequences to be compared, let's say x and y, are equal
in length N , a simple way to form the QQ-plot is by sorting the sequences from
small to large and then connecting points (xi; yi) from i = 0 to N 1. The
deviation from the straight line can be measured as
 = x y:
Since both x and y contain only positive elements, we can normalise  using
the norms of the components,
1 =
kk2
max(kxk2 ; kyk2)
:
The higher the value of 1, the larger the mismatch in distribution between
the two sequences. The algorithm for calculating correspondences, based on this
principle, is given as Algorithm 4.1.
To understand why the 2-norm of Q (as referred to in the algorithm) is a
good indication of the patch deviation, consider using the 1-norm,
kQk1 = maxn
X
m

qm;n

;
the maximum absolute column sum. This is akin to measuring the maximum
error over all quantiles. The 1-norm,
kQk1 = maxm
X
n

qm;n

;
gives the maximum absolute row sumwhich measures the maximum deviation
across patch rows. The 2-norm,
kQk2 = maxx 6=0
kQxk2
kxk2
;
lies somewhere in between, in the sense that
1
p
N
kQk1  kQk2 
p
N kQk1 :
Page 52
hidden
CHAPTER 4. FEATURE DETECTION AND MATCHING 41
Algorithm 4.1 Calculating point-correspondences using Quantile-Quantile
comparisons.
Given the features in two images, A and B, calculate likely feature correspon-
dences (i.e., which feature in A is represented by which feature in B).
1. Cut out image patches centered around the features in both A and B. The
patches in A are called PA;i and those in B are PB;ithey form N  N
matrices. The size of patches can be determined using the discrete pulse
transform as described above, or alternatively set to the expected size of
the average image feature (roughly between 10 10 and 50 50).
2. For each patch, sort the component of its rows, i.e.
PA;i =
2
4
3 9 2
1 8 3
4 4 1
3
5 =) PA;i =
2
4
2 3 9
1 3 8
1 4 4
3
5 :
3. For each patch PA;i, calculate the distribution deviation against all patches
from B:
(a) Construct the matrix
Q =
2
6
6
6
4
e0
e1
.
.
.
eN1
3
7
7
7
5
where en = PA;i(n; )PB;j(n; ) with PA;i(n; ) being the n-th row
of PA;i.
(b) Calculate the patch deviation between patch PA;i and PB;j as
Di;j =
kQk2
max(

PA;i


2
;

PB;j


2
)
:
4. The best correspondence for patch PA;i in B then is
argmin
j
Di;j :
Page 53
hidden
CHAPTER 4. FEATURE DETECTION AND MATCHING 42
Figure 4.5: Correspondence matching on a picture of the Hubble Space Telescope
(taken from the NASA archives). The second frame is a rotated version of the first.
Note that the majority of, but not all, correspondences are correct.
Figure 4.6: Correspondence matching on real-world data. These photos were taken
by NASA during the Pathfinder missions.
Because the algorithm does not compute pixel intensity differences, but rather
compares intensity distributions, it is fairly robust to mild geometric trans-
formations. The output of the algorithm, applied to both an artificial and a
real-world data set, is shown in Figures 4.5 and 4.6.
4.3.2 Other matching methods
After image patches have been cut out around detected features, several algo-
rithms are available to find correspondences. We mention a few.
Page 59
hidden
CHAPTER 5. ACCURATE IMAGE REGISTRATION 48
problems as a linear coordinate transformation,
2
6
6
4
x0
y0
z0
3
7
7
5 = H
2
6
6
4
x1
y1
1
3
7
7
5 ;
where the 33matrixH represents a projection, encompassing rotation, scaling,
skew and perspective. Note the use of homogeneous coordinates, for which the
equivalence 2
6
6
4
x0
y0
z0
3
7
7
5 
2
6
6
4
x0=z0
y0=z0
1
3
7
7
5
holds. Homogeneous coordinate are returned to Euclidean form simply by di-
viding with the last element.
Estimating the transformation matrix H accurately is the main topic of
this chapter. We study its properties, and then explore registration algorithms.
These come in two flavours: those that operate in the spatial domain, and
those that use a transformed domain, such as the Fourier, wavelet or log-polar
domains. Within the spatial domain, methods can be divided further into cat-
egories of dense and sparse registration. Dense methods generally examine
images as a whole to determine geometric relationships between pixels, whereas
sparse methods rely on features, such as described in the previous chapter.
To conclude, we briefly discuss photometric registrationdetermining the
differences in exposure between framesused to equalise differences in input
frame exposures.
For further reading, we refer to two review articles, [Bro92] and [ZF03].
In [Bro92], one of the first thorough surveys of registration methods, Brown
characterises algorithms based on their choice of four components: a feature
space, a search space, a search strategy and a similarity metric. More recently,
[ZF03] gives a thorough overview of recent developments, including the use of
mutual information.
5.2 The transformation matrix
The ability of the transformation matrix, H, to represent a projective trans-
formation is made possible by extending the two-dimensional position vector,
[x; y]T , to three components. The value of the last dimension may vary, and
Page 62
hidden
CHAPTER 5. ACCURATE IMAGE REGISTRATION 51
coordinate, x, in Euclidean form (i.e., normalised so that z = 1) to a target
coordinate x0 = Hx yields
x0 = H00x+H01y +H02
y0 = H10x+H11y +H12
z0 = H20x+H21y +H22:
By also converting x0 to Euclidean form, we can compare the source and target
coordinates,
x0
z0

H00x+H01y +H02
H20x+H21y +H22
= 0
y0
z0

H10x+H11y +H12
H20x+H21y +H22
= 0:
Multiplying by the denominator yields
x0
z0
(H20x+H21y +H22)H00xH01y H02 = 0
y0
z0
(H20x+H21y +H22)H10xH11y H12 = 0
which can also be written as the system
Ah =
2
6
6
4
x y 1 0 0 0 x
0x
z0
x0y
z0
x0
z0
0 0 0 x y 1 y
0x
z0
y0y
z0
y0
z0
.
.
.
3
7
7
5
2
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
6
4
H00
H01
H02
H10
H11
H12
H20
H21
H22
3
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
7
5
= 0:
Each feature correspondence fills two rows of A, so that n point correspondences
yields a 2n 9 matrix.
We now have to solve the homogeneous set of linear equations,
Ah = 0 h 6= 0:
For 4 point correspondences, the solution is the one-dimensional null-space of
Page 65
hidden
CHAPTER 5. ACCURATE IMAGE REGISTRATION 54
S
o
f
t
w
a
r
e
Homography estimation is provided as
supreme.register.sparse. The open source package homest
(http://www.ics.forth.gr/~lourakis/homest/) provides similar
functionality.
# x0 is a list of x-coordinates in the first image
# y0 is a list of y-coordinates in the first image
# x1 is a list of x-coordinates in the second image
# y0 is a list of y-coordinates in the second image
# We would like to find the matrix , H, such that
# [x0] [x1]
# H [y0] = [y1]
# [1 ] [1 ]
import supreme.register as sr
# The mode parameter can be 'direct ' or 'iterative '
H = sr.sparse(y0, x0 , y1 , x1, mode='iterative ')
5.3.2 Estimation in the presence of outliers
Sadly, no feature correspondence algorithm is perfect, consequently we have to
deal with a number of incorrect correspondences amongs those provided. By far
the most common approach is Random Sample Consensus, or RANSAC [FB80].
RANdom SAmple Consensus (RANSAC)
In fact, RANSAC is so popular that, in 2006 for the 25th anniversary of the
algorithm, a special session was held at the International Conference on Com-
puter Vision and Pattern Recognition to discuss all the variations that had been
proposed.
Probably the best known extension is MLESAC [TZ00], but recently Chum
[CM05] published a number of improvements including LO-RANSAC and PROSAC.
A review of many extensions is available in [CKY97].
Given noisy data to fit to some model (e.g., a straight line, a linear trans-
formation, or anything else), the sample-verify steps that form the RANSAC
algorithm are outlined in Algorithm 5.1.
Page 66
hidden
CHAPTER 5. ACCURATE IMAGE REGISTRATION 55
Algorithm 5.1 The RANdom SAmple Consensus (RANSAC) algorithm.
RANSAC aims to find the best parameters, p, for a model M , such that the
number of outliers on a data-set X,
X
i
[M(xijp)  K]
is minimised (K is a user set threshold). For example, the straight line model
has the form
M(x; yjm; c) = y mx c:
Repeat, for a pre-determined number of times (see [FB80]), the following pro-
cedure:
1. Randomly draw, without replacement (i.e., without drawing any point
twice), N samples from the noisy data-set. Here, N is the minimum
number of points required to fit the model, M (e.g., 2 in the case of a
straight line).
2. Estimate the model parameters, p, that best fit the N randomly drawn
samples.
3. Based on the model parameters, determine the number of data-points
(from the entire data-set) that qualify as inliers (i.e., those data points
that fit the model well):
Q =
X
i
[M(xijp) < K]
If the number of inliers exceeds those found in previous runs, store the
current model parameters.
After termination, the model parameters associated with the largest set of inliers
are available.
MSAC: It is noted in [TZ00] that the RANSAC minimises an error of the
form
C =
X
i
(ei)
where
 (ei) =
8
<
:
0 ei < T
1 ei  T
:
Thus, the cost function counts the number of outliers, but does not take the
model error of each sample, ei, into account. We can modify step 3 of RANSAC
to read:
Based on the model parameters, evaluate the cost function
C =
X
i
 (ei)
Page 67
hidden
CHAPTER 5. ACCURATE IMAGE REGISTRATION 56
where
 (ei) =
8
<
:
ei ei < T
1 ei  T
and
ei = M(xijp):
If the value of the error function, C, is lower than in previous runs,
store the current model parameters.
This approach is known as m-estimator sample consensus, or MSAC.
LO-RANSAC: In [CM05] it is suggested to execute an inner loop of RANSAC
on the inliers every time a new minimum error is obtained. The last sentence
of the MSAC description above then becomes
If the value of the error function, C, is lower than in previous runs,
proceed to do another RANSAC run, this time only sampling from
the best set of inliers, but still verifying against the entire data-set.
In his dissertation, Chum showed that this leads to quicker convergence at a
small increase in computational cost.
Early termination Fischler & Bolles [FB80] estimate the number of itera-
tions required to draw an inlier-only set to be
k = M
where  is the expected probability of choosing an inlier, andM is the number of
samples drawn. Since the value of  is usually unknown, we set it conservatively
low, increasing the number of iterations and (hopefully) ensuring a good match.
The number of iterations is therefore unnecessarily high, but can be updated
during each run as we learn more about our data set [Cap05]. For example,
given that a run has encountered N inliers in a size L dataset, we set
k =

N
L
M
if the new k is smaller than the existing value but larger than some predefined
minimum.
Page 73
hidden
CHAPTER 5. ACCURATE IMAGE REGISTRATION 62
5.4.4 Log-polar registration
Log-polar registration [ZW00] is based on the observation that, under certain
coordinate transformations, rotations and scalings become translations. Specif-
ically, this property is observed in the transformed image
f(log(R); )
where (R; ) are polar coordinates. Scaling the image by a factor s yields
f(log(R) + log(s); )
whereas rotation by an angle  results in
f(log(R);  + ):
To construct the log-polar transform, we first convert cartesian coordinates to
polar coordinates,
R =
p
(x xc)2 + (y yc)2
 =
8
<
:
arctan

yyc
xxc

x xc 6= 0
=2 x xc = 0
where (xc; yc) are the centre coordinates. The log axis of the log-polar transform
becomes
L = logbR:
Denote the height or width of the input image, whichever is largest, by w. We
would like the log-axis of the transformed image to have the same dimension.
The maximum value of R occurs at the corner points, where R = Rmax =
p
x2c + y2c . Setting L = logbR = w, we calculate b as
b = eln(Rmax)=w = R1=wmax:
Warping in reverse When warping images, it is not practical to use this
forward transform. Some cartesian coordinates may, under integer roundoff,
map to the same log-polar position, and, worse, some log-polar positions may
have no integer cartesian counterparts, leaving empty patches in the transformed
image. Instead, we take each coordinate in log-polar space, calculate its (non-
integer, floating point) position in cartesian space, and interpolate the image to
Page 75
hidden
CHAPTER 5. ACCURATE IMAGE REGISTRATION 64
Chapter 4. The adaptive polar transform discussed next claims to address these
concerns, although we have not verified that it does so in practice.
The adaptive polar transform (APT) The log polar transform non-uniformly
samples the source image, causing oversampling near the fovea and undersam-
pling in the outer regions. The adaptive polar transform [MZE09] aims to ad-
dress this problem. Due to the uniform sampling employed, a straightforward
correlation can no longer be used to register images. Instead, the APT of the
each image is projected to two one-dimensional signalsone representing scale,
the other rotation angle. By correlating these image projections over differ-
ent frames, the registration parameters are recovered. The authors of [MZE09]
claim that this can be done with fewer operations than the traditional log polar
transform. They also describe a feature search strategy, similar in principle to
the one given above in Computing feature correspondence.
S
o
f
t
w
a
r
e
The log polar transform is implemented as
supreme.transform.logpolar. Registration parameters are calcu-
lated using supreme.register.lp_patch_match.
5.5 Photometric registration
Almost all digital cameras provide functionality to ensure properly exposed
photographs. This entails adjusting parameters such as the sensor gain, aperture
size and exposure time. Furthermore, scene radiance is modified by the non-
linear camera response function, whereby high intensity values are compressed
and low intensity values are expanded.
When performing super-resolution imaging, we relate each pixel to its coun-
terpart in other input frames. As such, we prefer a specific point in the scene
to have equal pixel intensity values over all frames.
Estimating the parameters of these processes is known as photometric regis-
tration. We can loosely distinguish between relative and absolute photometric
registration, as used in astronomy [PSF
+
08]. In absolute registration, the goal
of is to determine the true flux incident on the CCD. In relative registration, we
merely wish to establish this quantity relative to some known reference (such as
the exposure in a chosen frame).
For super-resolution, we aim to establish a crude photometric relationship
between input frames, and luckily the accuracies needed are not anything near
those required in astronomy. Given a reference and a source input frame, fR
Page 76
hidden
CHAPTER 5. ACCURATE IMAGE REGISTRATION 65
(a) Input image, Chelsea the Cat.
(b) Log-polar transform (LPT) of input image. (c) LPT after scaling by 2.
(d) LPT after rotating by 30. (e) LPT after scaling by 2 and rotating by 30.
Figure 5.3: The log-polar transform. Note how changes in scale and rotation cause
translation in the transform domain.
Page 78
hidden
CHAPTER 5. ACCURATE IMAGE REGISTRATION 67
The means and variances are calculated using those entries where both images
have intensity values in [10; 200] (for 255 level greyscale images). This is done to
avoid clipping at high intensities, noise at low intensities and zeros introduced
by out-of-boundary values during warping.
S
o
f
t
w
a
r
e
Photometric registration is implemented as
supreme.photometry.photometric_adjust. To modify an image,
source, to look like the image target, use:
from supreme.photometry import photometric_adjust
a, b = photometric_adjust(source , target)
source = source * a + b
Histogram Adjustment The above procedure relies on the camera response
function being r
. If we are unsure of this modelling, we may instead estimate
p using histogram matching. One of the first uses of histogram matching on
multi-sensor data is that of Horn and Woodham in their 1979 paper on removing
stripes from LANDSAT imagery [HW79].
We assume that the radiance distribution of two registered images is the
same, at least in the overlapping region. The function q is known to be monotone
increasing (i.e., q(x0) < q(x1) if x0 < x1), so that the standard method for
transformation of a random variable can be applied.
Given values x 2 fS and corresponding y 2 fR, we seek
y = q(x):
This introduces a relationship between the distribution functions (the cumula-
tive probability density functions),
PX(x) = PY (y):
The function q(x) is determined as
y = q(x) = P1Y (PX(x))
where P1Y is the inverse of PY . Since both the distribution functions, PX and
PY , can be measured from fS and fR, calculating q(x) should pose no problem.
The only hurdle is that, instead of continuous distributions, PX and PY are
Page 79
hidden
CHAPTER 5. ACCURATE IMAGE REGISTRATION 68
discrete, so that
PY (y) =
X
8t:ty
p(t)
where p is the histogram of y 2 fR. The inverse, P
1
Y (z), can then be formulated
as follows:
Calculate the value y for which the discrete distribution function
PY (y) is closest to z.
Since y can only be one of several discrete values, the function maps a continuous
z to a discrete y. For our specific application, the calculation of the histogram
neglects zero-values, since those are introduced during warping when out-of-
boundary coordinates occur. Results are shown in Figure 5.4.
S
o
f
t
w
a
r
e
Histogram matching is implemented as
supreme.photometry.histogram_adjust. To modify an image,
source, to look like the image target, use:
from supreme.photometry import histogram_adjust
q = histogram_adjust(source , target)
source = q(source)
Photometric registration for super-resolution
Super-resolution input frames are often small, cropped areas of a larger scene.
As such, they may contain few grey-levels, resulting in a poor density function
estimate when applying the histogram method. The discretisation of function
q(x) furthermore causes unacceptable quantisation errors. It may be possi-
ble to approximate the distribution functions locally and to calculate a more
exact inverse, but we have not investigated this possibility further. Until we
do, we recommend the affine model, ax + b, which is linear, robust, fast and
parametrised by only two variables. Other interesting research include estima-
tion of the camera response function from a single image [NCT07].
Page 81
hidden
CHAPTER 5. ACCURATE IMAGE REGISTRATION 70
Bibliography
[Ana89] P. Anandan. A computational framework and an algo-
rithm for the measurement of visual motion. Interna-
tional Journal of Computer Vision, 2(3):283310, Jan-
uary 1989.
[BAHH92] James R. Bergen, P. Anandan, Keith J. Hanna, and Ra-
jesh Hingorani. Hierarchical Model-Based Motion Esti-
mation. Lecture Notes In Computer Science; Vol. 588,
1992.
[Bro92] L.G. Brown. A Survey of Image Registration Tech-
niques. ACM computing surveys (CSUR), 24(4):376,
1992.
[Cap01] David Peter Capel. Image Mosaicing and Super-
resolution. Ph.D. dissertation, University of Oxford,
2001.
[Cap05] D. Capel. An effective bail-out test for RANSAC con-
sensus scoring. In Proc. BMVC, pages 629638, 2005.
[CB09] Guilherme Holsbach Costa and José Carlos M.
Bermudez. Registration Errors: Are They Always Bad
for Super-Resolution? IEEE Transactions on Signal
Processing, 57(10):38153826, October 2009.
[CKK
+
93] P. Cheeseman, B. Kanefsky, R. Kraft, J. Stutz, and
R. Hanson. Super-resolved surface reconstruction from
multiple images. In G. R. Heidbreder, editor, Proceed-
ings of the Thirteenth International Workshop on Maxi-
mum Entropy and Bayesian Methods. Kluwer Academic,
1993.
[CKY97] S. Choi, T. Kim, and W. Yu. Performance Evalua-
tion of RANSAC Family. Journal of Computer Vision,
24(3):271300, 1997.
[CM05] O. Chum and J. Matas. Matching with PROSAC 
Progressive Sample Consensus. 2005 IEEE Computer
Society Conference on Computer Vision and Pattern
Recognition (CVPR'05), (I):220226, 2005.
Page 83
hidden
CHAPTER 5. ACCURATE IMAGE REGISTRATION 72
[TU96] P. Thevenaz and M. Unser. A pyramid approach to
sub-pixel image fusion based on mutual information.
Proceedings of 3rd IEEE International Conference on
Image Processing, I:265268, 1996.
[TU00] P. Thevenaz and M. Unser. Optimization of mutual in-
formation for multiresolution image registration. IEEE
Transactions on Image Processing, 9(12):208399, 2000.
[TZ00] P.H.S. Torr and A. Zisserman. MLESAC: A New Ro-
bust Estimator with Application to Estimating Image
Geometry. Computer Vision and Image Understanding,
78(1):138156, April 2000.
[Van05] S.J. Van Der Walt. Automated stratigraphic classifi-
cation and feature detection from images of borehole
cores. Msc, Stellenbosch University, 2005.
[VH07] S.J. Van Der Walt and B.M. Herbst. Methods Used
in Increased Resolution Processing: Polygon based in-
terpolation and robust log-polar based registration. In
Proceedings of the International Conference on Com-
puter Vision Theory and Applications (VISAPP) 2007,
Barcelona, Spain, 2007.
[Vio95] P. Viola. Alignment by Maximization of Mutual Infor-
mation. Ph.D. dissertation, Massachusetts Institute of
Technology, 1995.
[vWNA06] M. ƒadík, M. Wimmer, L. Neumann, and A. Artusi.
Image attributes and quality for evaluation of tone map-
ping operators. National Taiwan University, pages 35
44, 2006.
[ZF03] B. Zitova and J. Flusser. Image registration methods: a
survey. Image and Vision Computing, 21(11):9771000,
2003.
[ZW00] Siavash Zokai and George Wolberg. Robust image reg-
istration using log-polar transform. In Proceedings of
the IEEE International Conference on Image Process-
ing, Canada, 2000.
Page 86
hidden
CHAPTER 6. SUPER-RESOLUTION IMAGE PROCESSING 75
(a) Enlarged input image, one of 25. (b) Super-resolved image.
Figure 6.1: A high-resolution image of a rock named Wedge, calculated using the
first super-resolution algorithm. The technique, developed by NASA Ames's Cheese-
man, Kanefsky, Hanson, and Stutz [CKK
+
93], here combines 25 images taken as part
of NASA's Pathfinder mission.
(a) An image, obtained by sampling on
a grid (the blue dots).
(b) An overlay of multiple, rotated images, each
sampled on a grid.
Figure 6.2: Sampling a signal multiple times may lead to an increase in sampling
rate. In this case, notice how a denser sampling is obtained in the overlapping image
regions.
the best algorithms still seek a maximum a posteriori estimate much like they
did. One of the first papers on MAP image restoration was [Bes86], while
another MAP approach to super-resolution was published in [HC90]. What
follows is an overview of [CKK
+
93].
Page 87
hidden
CHAPTER 6. SUPER-RESOLUTION IMAGE PROCESSING 76
6.2.1 Why is super-resolution possible?
Imagine having a device that samples a 1Hz signal once every second. Because
the sampling rate is slower than the Nyquist rate, a perfect reconstruction is not
possible. Making use of another two identical devices can be beneficial: each is
activated 0.3 seconds after the previous, and, by combining the resulting data,
a sampling rate of 3Hz is achieved.
We use a similar experimental setup for super-resolution imaging; the sam-
pling device (a camera), while moving slightly, samples the scene radiance
1
by
taking several photos. Unlike the first experiment, we do not have control over
the sampling delay (the relative movement of the camera), resulting in a dense
but irregular sampling (see Fig. 6.2).
During super-resolution post-processing, the photographs taken (called low-
resolution or LR frames) are combined to form an estimate of the scene radiance
(called the high-resolution or HR frame). The high-resolution frame itself is a
sampled version of the scene radiance, albeit at higher resolution.
6.2.2 Maximum a posteriori estimate
Given a number of low-resolution frames, concatenated to form the vector b,
and the accompanying camera parameters, c, the super-resolution problem is
the estimation of a high-resolution image, x, such that the posterior
P (xjb; c)
is maximised. Finding a high-resolution image from a number of low-resolution
image is no easy task; we prefer to rewrite the maximisation, making use of
Bayes's theorem, as
P (xjb; c) =
P (bjx; c)P (xjc)
P (bjc)
: (6.1)
The likelihood P (bjx; c) is easier to maximise, since it inverts the problem and
asks the question: Given a high-resolution image and camera parameters, how
would the low-resolution images look?. The denominator is not important in
the maximisation, since it is independent of x.
If the prior P (xjc) is set to a constant, the problem reduces to the maximum
likelihood estimator. We can think of the maximum a posteriori estimator as
the maximum-likelihood (ML) estimator, regularised by the term P (xjc).
1
In reality, we sample image irradiancethe visible-light energy incident on the image
sensor. Under some mild assumptions, and especially in narrow-field imaging systems, scene
radiance is proportional to image irradiance [AMK07].
Page 89
hidden
CHAPTER 6. SUPER-RESOLUTION IMAGE PROCESSING 78
yields
arg max
x
P (xjb; c) = arg max
x
[logP (bjx; c) + logP (xjc)]
= arg max
x
h
kbAxk2 +  logP (xjc)
i
;
where various constants are absorbed by .
When choosing the prior as Gaussian with zero mean and spherical covari-
ance we have
arg max
x
P (bjx; c) = arg max
x
h
kbAxk2 xTx
i
= arg min
x
h
kbAxk2 + xTx
i
: (6.3)
Knowing that our pixel values are not centred around zero, the zero-mean Gaus-
sian distribution does not seem appropriate. In the next chapter, we show how
to manipulate our input data to fit this model.
The form of (6.3) is well known as the regularised solution to the least-
squares problem Ax = b.
6.2.4 Camera parameters
Our camera parameters, c, are defined as all the known information about
the image formation process. This therefore includes registration information,
parameters of the point-spread function, and so forth. Usually, the number of
registration parameters is very small compared to the data-set; this forms the
basis for the following argument.
Is it satisfactory to pre-calculate the camera parameters during registration,
or should they be included in the MAP estimation? For example, we could
determine the parameters that maximise
arg max
x;c
P (x; cjb):
This approach, however, involves the difficult process of modelling the joint
distribution, as done in [PCRZ07b]. Cheeseman et al. motivates why this is
unnecessary [CKK
+
93]: the small number of registration parameters are ob-
tained from a large number of data-points (all image pixels); as such, c is highly
over-determined and can be pre-computed accurately without modelling its dis-
tribution.
Page 94
hidden
CHAPTER 6. SUPER-RESOLUTION IMAGE PROCESSING 83
[GNA
+
] A. Garzelli, F. Nencini, L. Alparone, B. Aiazzi, and
S. Baronti. Pan-sharpening of multispectral images:
a critical review and comparison. IEEE International
Geoscience and Remote Sensing Symposium, 2004.
IGARSS '04. Proceedings. 2004, pages 8184.
[GR99] F. Guichard and L. Rudin. Velocity estimation from
images sequence and application to super-resolution.
In Proceedings of the 1999 International Conference on
Image Processing, pages 527531. IEEE, 1999.
[HC90] Yi-Ping Hung and David B. Cooper. Maximum a-
posteriori probability 3-D surface reconstruction using
multiple intensity images directly. In Bernd Girod, ed-
itor, Sensing and Reconstruction of Three-Dimensional
Objects and Scenes, volume 1260, pages 3648, Santa
Clara, CA, USA, January 1990. SPIE.
[JS61] W. James and Charles Stein. Estimation with
Quadratic Loss. In Jerzy Neyman, editor, Proceedings of
the Fourth Berkeley Symposium on Mathematical Statis-
tics and Probability, pages 361379, Berkeley, Califor-
nia, 1961. University of California Press.
[LWL04] Anat Levin, Yair Weiss, and Dani Lischinski. Coloriza-
tion using optimization. ACM Transactions on Graph-
ics, 23(3):689, 2004.
[MW08] R.F. Marcia and R.M. Willett. Compressive coded aper-
ture superresolution image reconstruction. In Int. Conf.
on Acoustics, Speech and Sig. Proc., ICASSP, pages
833836, 2008.
[PCRZ07b] Lyndsey C. Pickup, David P. Capel, Stephen J. Roberts,
and Andrew Zisserman. Overcoming Registration Un-
certainty in Image Super-Resolution: Maximize or
Marginalize? EURASIP Journal on Advances in Signal
Processing, 2007:115, 2007.
[Ste81] C.M. Stein. Estimation of the Mean of a Multivari-
ate Normal Distribution. The Annals of Statistics,
9(6):11351151, 1981.
Page 95
hidden
CHAPTER 6. SUPER-RESOLUTION IMAGE PROCESSING 84
[WH08] John Wright and Thomas Huang. Image super-
resolution as sparse representation of raw image patches.
2008 IEEE Conference on Computer Vision and Pat-
tern Recognition, June 2008.
Page 97
hidden
CHAPTER 7. SUPER-RESOLUTION AS A SPARSE LINEAR PROBLEM 86
combine all k camera matrices and input images to form
A =
2
6
6
6
6
6
4
A(0)
A(1)
.
.
.
A(k)
3
7
7
7
7
7
5
and b =
2
6
6
6
6
6
4
b(0)
b(0)
.
.
.
b(k)
3
7
7
7
7
7
5
the resulting system Ax = b is overdetermined if k > z2. Note that, even when
combining a large number of frames, there is no guarantee that each additional
frame provides independent information (imagine, for example, the case where
k identical frames are combined). In practice, the values of x can only be
determined accurately in positions where multiple frames overlap.
The goal of this chapter is to explore the structure and formation of the ma-
trix A, and to study different methods of solving x in the least-squares problem
where Ax = b is overdetermined.
7.2 The camera matrix, A
The camera matrix A, which represents the image formation process, is the only
customisable parameter in the linear problem Ax = b, and has to be chosen
with care. In the previous chapter we discuss a simplified expression for image
formation,
b(i) = S # (h(T (i)(x))) + (i);
approximated as
b(i) = A(i)x + (i): (7.1)
Note that here we examine the formation of a single image, b(i), while the
camera matrix that produces all low-resolutions frames is simply
A =
2
6
6
6
6
6
4
A(0)
A(1)
.
.
.
A(i)
3
7
7
7
7
7
5
as mentioned above. From (7.1) we note that the camera matrix encapsulates
three processes: geometrical transformation, the effect of the point-spread func-
tion and down-sampling. How, then, should A(i) be calculated?
Each row of A(i) represents weights applied to values in x to form a single
Page 99
hidden
CHAPTER 7. SUPER-RESOLUTION AS A SPARSE LINEAR PROBLEM 88
Figure 7.2: The sparse matrix structure of a geometric transformation operator,
employing bilinear interpolation. In this example, the transformation is a clockwise
rotation by 5. Gaps appear when boundary pixels are met, and the interpolator returns
zero (by design).
some samples in the high-resolution image may not be taken into consideration
at all. In [Cap01, p. 126], this problem is addressed by designing the camera
matrix as follows:
1. Construct a convolution kernel (representing the camera point-spread func-
tion) that operates on the low-resolution image.
2. Use the known geometric transformation, H, to modify the kernel for op-
erating on high-dimensional images. For example, each kernel coordinate
will change from 2
6
6
4
x
y
1
3
7
7
5 to H
2
6
6
4
x
y
1
3
7
7
5 ;
thereby also altering the kernel shape and the convolution path. Con-
versely, we can think of it as fetching all high-resolution pixels that
contribute to a specific low-resolution pixel.
The approach works well, but introduces some challenges of its own:
ˆ What should the shape and size of the convolution kernel be? The camera
Page 100
hidden
CHAPTER 7. SUPER-RESOLUTION AS A SPARSE LINEAR PROBLEM 89
response function is well modelled as a Gaussian kernel, but even so the
optimal variance, 2, is unknown.
ˆ How should the transformed kernel be represented and applied? Capel
models the kernel as a piecewise bilinear surface, allowing easy transfor-
mation and integration.
As mentioned in the previous chapter, the kernel variance parameter, 2, can
be established experimentally. Reconstructions are made while varying 2 until
the result shows little oscillatory behaviour. Still, we prefer not to have the free
parameter at all.
In the next section, we examine the simplified operator, A(i) = T (i), which
no longer requires such a parameter.
7.3 Linear interpolation operators
Note that, from here onwards, we neglect the frame number to simplify notation,
i.e., A = A(i) and T = T (i).
The camera matrix can be approximated simply by bilinear interpolation,
A = T;
introducing the obvious flaw that only 4 high-resolution pixels are used to cal-
culate the value of any low-resolution pixel. In reality, a low-resolution pixel
may (and probably will) depend on more high-resolution pixels; the exact num-
ber being determined by the resolution increase, z, and the severity of the
transformation, H. If, however, the zoom ratio is chosen conservatively, the
approximation may be a good one, as illustrated in Figure 7.3.
Next, we examine the bilinear interpolation operator, whereafter polygon-
based interpolation is introduced allowing any appropriate number of high-
resolution pixelsirrespective of the zoom factor or the severity of the transformation
to contribute to the low-resolution pixel.
7.3.1 Bilinear interpolation
The bi-linear transformation/interpolation operator, A = T , is near-Toeplitz
with interpolation coefficients appearing on the diagonals, as shown in Fig-
ure 7.2. The coefficients in Equation (7.2) are derived from bilinear interpolation
as follows:
Suppose a function is known at four grid coordinates, namely f00 = f(0; 0),
f01 = f(0; 1), f10 = f(1; 0), and f11 = f(1; 1). We need to calculate f(x; y)

Sign up today - FREE

Mendeley saves you time finding and organizing research. Learn more

  • All your research in one place
  • Add and import papers easily
  • Access it anywhere, anytime

Start using Mendeley in seconds!

Already have an account? Sign in

Readership Statistics

8 Readers on Mendeley
by Discipline
 
 
 
by Academic Status
 
50% Ph.D. Student
 
25% Student (Postgraduate)
 
13% Post Doc
by Country
 
38% United States
 
25% United Kingdom
 
13% Germany