Detecting interpretable and accurate scale-invariant keypoints
2009 IEEE 12th International Conference on Computer Vision (2009)
- ISBN: 9781424444205
- DOI: 10.1109/ICCV.2009.5459458
Detecting Interpretable and Accurate Scale-Invariant Keypoints
Wolfgang Förstner, Timo Dickscheid, Falko Schindler
Department of Photogrammetry
Institute of Geodesy and Geoinformation, University of Bonn
Nussallee 15, 53115 Bonn, Germany
wf@ipb.uni-bonn.de,dickscheid|falko.schindler@uni-bonn.de
Abstract
This paper presents a novel method for detecting scale-invariant keypoints. It fills a gap in the set of available methods, as it proposes a scale-selection mechanism for junction-type features. The method is a scale-space extension of the detector proposed by Förstner (1994) and uses the general spiral feature model of Bigün (1990) to unify different types of features within the same framework. By locally optimising the consistency of image regions with respect to the spiral model, we are able to detect and classify image structures with complementary properties over scale-space, especially star and circular shapes as interpretable and identifiable subclasses. Our motivation comes from calibrating images of structured scenes with poor texture, where blob detectors alone cannot find sufficiently many keypoints, while existing corner detectors fail due to the lack of scale invariance. The procedure can be controlled by semantically clear parameters. One obtains a set of keypoints with position, scale, type and consistency measure. We characterise the detector and show results on common benchmarks. It competes in repeatability with the Lowe detector, but finds more stable keypoints in poorly textured areas, and shows comparable or higher accuracy than other recent detectors. This makes it useful for both object recognition and camera calibration.
1. Introduction
Local image features are an important aspect of computer vision research. The idea is to represent the image content by a set of small, possibly overlapping representative parts, which are invariant to distortions arising from the acquisition process, from illumination or viewpoint, and can reliably be found in other images of the same object. Corresponding features in different views may then be determined by nearest neighbour search in the space of descriptions of the surrounding image region, providing both sparseness and robustness compared to a search over the whole image domain.
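The nearest neighbour search over descriptors mentioned above can be illustrated with a minimal sketch. It assumes each keypoint already has a fixed-length descriptor vector stored as a row of a NumPy array; the function name `match_nearest_neighbour` and the toy two-dimensional descriptors are hypothetical and stand in for real descriptors such as SIFT. The mutual-consistency check is an added assumption, not something taken from the paper; it is one simple way to keep each feature in at most one correspondence.

```python
import numpy as np


def match_nearest_neighbour(desc_a, desc_b):
    """Match rows of desc_a to rows of desc_b by mutual nearest neighbour.

    desc_a, desc_b: arrays of shape (n_a, d) and (n_b, d).
    Returns a list of index pairs (i, j).
    """
    # Pairwise squared Euclidean distances, shape (n_a, n_b).
    diff = desc_a[:, None, :] - desc_b[None, :, :]
    dist = np.sum(diff ** 2, axis=2)

    # Nearest neighbour in B for every descriptor in A, and vice versa.
    nn_ab = np.argmin(dist, axis=1)
    nn_ba = np.argmin(dist, axis=0)

    # Keep a pair only if both descriptors choose each other
    # (assumption of this sketch, not a step described in the text).
    return [(i, int(j)) for i, j in enumerate(nn_ab) if nn_ba[j] == i]


# Toy descriptors for two views (illustrative values only).
view_a = np.array([[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]])
view_b = np.array([[5.1, 5.0], [0.1, 0.0], [1.0, 1.2]])
matches = match_nearest_neighbour(view_a, view_b)
```

The brute-force distance matrix is adequate for a few thousand keypoints; for larger sets an approximate nearest neighbour index would normally replace it.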
Keypoints play a central role as they are anchored to a specific position in the image, which is useful for both matching and recognition. We distinguish two types: point-like keypoints refer to a specific point in the image, such as a junction or the centre of a round spot, whereas blob-like keypoints refer to small regions, not necessarily round, where no specific point needs to be identifiable in the image within the region. Procedures for using keypoints consist of two parts: a keypoint detector and a keypoint descriptor. Here we are only concerned with the detection, and rely on the power of Lowe’s SIFT descriptor [10].
“There is no such thing as generic keypoints” [21]. The choice of a particular detector must reflect the task at hand. The motivation for the new detector proposed in this paper arose in the context of automatic image orientation in poorly textured, structured scenes, as shown in Figure 1. We found that in such cases, state-of-the-art keypoint detectors often yield too few features or poor geometric configurations. Sometimes multiple features are computed at nearly identical locations, and the surplus ones have to be eliminated during matching to fulfil the uniqueness constraint. The Harris affine detector does not extract the corner features as reliably as one would expect. However, a combined set of features from two or three complementary detectors may well give stable correspondences for camera calibration and orientation. Thus we require the following properties from a keypoint detector:
Completeness and complementarity: The detector should exploit, as far as possible, the structural elements visible in the image to yield a maximally complete set of keypoints. This implies that different types of complementary keypoints are extracted at the same time.
Invariance and repeatability: The detected keypoints should be scale and rotation invariant and provide high repeatability in order to support image matching.
Accuracy: The keypoints should have high localisation accuracy to support camera calibration.
Interpretability: Basic interpretable elements in the


