A Fuzzy Associative Approach for Recognition of 3D Objects in Arbitrary Pose
- ISBN: 9781424418183
- DOI: 10.1109/FUZZY.2008.4630447
Aaron Mavrinac, Ahmad Shawky, and Xiang Chen
Abstract— Once the human vision system has seen a 3D object from a few different viewpoints, depending on the nature of the object, it can generally recognize that object from new arbitrary viewpoints. This useful interpolative skill relies on the highly complex pattern matching systems in the human brain, but the general idea can be applied to a computer vision recognition system using comparatively simple machine learning techniques. An approach to the recognition of 3D objects in arbitrary pose relative to the vision equipment, given only a limited training set of views, is presented. This approach involves computing a disparity map using stereo cameras, extracting a set of features from the disparity map, and classifying these features via a fuzzy associative map to a trained object.
I. INTRODUCTION
Humans are generally able to recognize 2D shapes regardless of changes in orientation, scale, or skew after having seen the shape in only one such configuration. Shape recognition has a very wide range of applications, and accordingly much work has gone into automating it with computers. The basic approach is to extract shapes from otherwise cluttered images, compute for each shape a set of quantifiers that describe it efficiently, and classify the shape by comparing these quantifiers against known values with some algorithm. The choice of quantifiers and of the classification algorithm is the subject of much research; most methods use quantifiers invariant to the aforementioned transformations (rotation, scale, skew, etc.), such as Fourier descriptors, moment invariants, and Hough transforms, and most use machine learning methods such as fuzzy logic and neural networks for classification.
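As a concrete illustration of one invariant quantifier named above, the sketch below computes the first two of Hu's moment invariants for a binary shape given as a list of foreground pixel coordinates. This is a generic textbook example, not the descriptor set used in this paper; the function names are hypothetical. These two values are invariant to translation and rotation, and to scale only up to discretization error for pixelated shapes; they are not invariant to skew.

```python
from math import isclose


def raw_moment(pixels, p, q):
    """m_pq: sum of x**p * y**q over the foreground pixel coordinates."""
    return sum((x ** p) * (y ** q) for x, y in pixels)


def hu_first_two(pixels):
    """Hu's first two moment invariants of a binary shape given as (x, y) pixels."""
    m00 = raw_moment(pixels, 0, 0)          # area (pixel count)
    xc = raw_moment(pixels, 1, 0) / m00     # centroid column
    yc = raw_moment(pixels, 0, 1) / m00     # centroid row

    def eta(p, q):
        # Central moment (translation invariant), normalized by area
        # (scale invariant up to discretization error).
        mu = sum(((x - xc) ** p) * ((y - yc) ** q) for x, y in pixels)
        return mu / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    phi1 = n20 + n02
    phi2 = (n20 - n02) ** 2 + 4 * n11 ** 2  # both are also rotation invariant
    return phi1, phi2


# Usage: an L-shaped blob, a translated copy, and a copy rotated by 90 degrees.
shape = [(0, 0), (0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]
moved = [(x + 10, y + 7) for x, y in shape]
turned = [(-y, x) for x, y in shape]
ref = hu_first_two(shape)
for other in (moved, turned):
    assert all(isclose(a, b) for a, b in zip(ref, hu_first_two(other)))
```

Classification would then compare such vectors against values stored for known shapes; that step is where methods such as fuzzy logic or neural networks come in.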
Humans are also generally able to recognize 3D objects regardless of their orientation after having seen a sufficient number of different views (depending, of course, on the nature of the object itself). Generalizing from the 2D case, this process can be automated in a similar manner by obtaining quantifiers that describe the 3D surface rather than the 2D shape. Such quantifiers can be extracted from range images or, in the case of stereo vision, from disparity maps. However, a single such image gives information only from a certain perspective; this is commonly referred to as 2.5D. To approach full 3D information, range images must be taken from different perspectives around the object. For the classification approach to carry over from the 2D case, the sets of quantifiers from each perspective must be combined to fully describe the object, and the classification algorithm must be designed to operate on this type of information.
In this paper, we expand on previous work in object recognition using invariant values on 2D images [10], justifying the selection of proper invariant descriptors for 3D shapes based on disparity maps and modifying the classification scheme to reflect the new object description. The result is a system capable of recognizing a trained object based on a disparity map taken by a stereo camera rig from any view, where training requires only a few different such views.
II. PRELIMINARY THEORY
A. Disparity Map
We assume a stereo vision system capable of generating rectified stereo images, wherein the epipolar lines are parallel and horizontally aligned as if captured by parallel cameras. In the general case, this requires internal and external (stereo) calibration of the cameras, which is beyond the scope of this work; for a thorough geometrical treatment see [3], [29], and for some practical methods see [4], [5], [6].
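Because rectification places corresponding pixels on the same image row, the correspondence search reduces to a one-dimensional scan. The sketch below illustrates this with a naive sum-of-absolute-differences (SAD) block matcher on images stored as lists of rows of intensities. It is a generic technique chosen for illustration, not necessarily the stereo method used in this work. The function names, the window size, and the assumption that the horizontal offset x2 - x1 is non-positive (first image taken by the left camera of a standard rig) are all hypothetical.

```python
def sad_cost(left, right, row, x1, x2, half):
    """Sum of absolute differences between the (2*half+1)^2 windows centred
    on (x1, row) in the left image and (x2, row) in the right image."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(left[row + dy][x1 + dx] - right[row + dy][x2 + dx])
    return total


def match_row(left, right, row, max_disp, half=1):
    """Return, for every column x1 of `row` in the left image, the offset
    x2 - x1 of the best SAD match on the SAME row of the right image.

    Searching a single row is valid only because the pair is rectified.
    Offsets are searched in [-max_disp, 0] (assumed sign convention).
    Columns too close to the border for a full window get None.
    """
    height, width = len(left), len(left[0])
    if row - half < 0 or row + half >= height:
        raise ValueError("row too close to the image border for this window")
    offsets = []
    for x1 in range(width):
        if x1 - half < 0 or x1 + half >= width:
            offsets.append(None)
            continue
        best_offset, best_cost = None, None
        for offset in range(-max_disp, 1):
            x2 = x1 + offset
            if x2 - half < 0:
                continue  # window would fall off the left edge of the right image
            cost = sad_cost(left, right, row, x1, x2, half)
            if best_cost is None or cost < best_cost:
                best_offset, best_cost = offset, cost
        offsets.append(best_offset)
    return offsets


# Usage: a synthetic pair in which the whole scene shifts 2 px to the left.
pattern = [5, 1, 9, 2, 8, 3, 7, 4, 6, 0, 11, 13]
left = [pattern[:] for _ in range(3)]
right = [pattern[2:] + [0, 0] for _ in range(3)]
print(match_row(left, right, row=1, max_disp=4))
```

For this synthetic pair the interior columns report an offset of -2, matching the shift used to build the right image; columns near the borders either get None or come from a truncated search range and should not be trusted. Real systems add subpixel refinement, consistency checks, and smarter cost aggregation.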
Throughout this paper, the following convention is used for the world and image coordinate systems: lowercase $x$ and $y$ represent image coordinates starting at the upper left corner, and uppercase $X$, $Y$, and $Z$ represent world coordinates (which, unless otherwise specified, are mutually orthogonal, with $Z$ perpendicular to the rectified image planes, and have their origin at the optical center of the left camera). Figure 1 illustrates their relationship.
Fig. 1. Coordinate System Convention
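To make this convention concrete, the following sketch projects a world point into both cameras of an idealized parallel rig and recovers its depth from the horizontal offset between the two projections, using the standard relation Z = fB/|x2 - x1|. The class and its parameters (focal length f in pixels, principal point (cx, cy), baseline B) are hypothetical. It also assumes that X points right, Y points down (so image y grows with Y), and the right camera sits at (B, 0, 0); Figure 1 may orient the axes differently.

```python
from dataclasses import dataclass
from math import isclose


@dataclass
class RectifiedRig:
    """Idealized parallel stereo rig (hypothetical helper, not from the paper).

    World origin at the left optical centre; X to the right, Y downward
    (assumed), Z along the parallel optical axes.
    """
    f: float         # focal length in pixels, shared by both cameras
    cx: float        # principal point column
    cy: float        # principal point row
    baseline: float  # distance between optical centres along X

    def project_left(self, X, Y, Z):
        """Pinhole projection into the left image: returns (x, y)."""
        return self.f * X / Z + self.cx, self.f * Y / Z + self.cy

    def project_right(self, X, Y, Z):
        """Right optical centre assumed at (baseline, 0, 0) with parallel axes."""
        return self.f * (X - self.baseline) / Z + self.cx, self.f * Y / Z + self.cy

    def depth_from_offset(self, offset):
        """Triangulate Z from the horizontal offset x2 - x1 of a matched pair.
        Raises ZeroDivisionError for a zero offset (point at infinity)."""
        return self.f * self.baseline / abs(offset)


# Usage: project one point into both images and recover its depth.
rig = RectifiedRig(f=500.0, cx=320.0, cy=240.0, baseline=0.1)
x1, y1 = rig.project_left(0.2, -0.05, 2.0)
x2, y2 = rig.project_right(0.2, -0.05, 2.0)
assert isclose(y1, y2)                               # same row: rectified pair
assert isclose(rig.depth_from_offset(x2 - x1), 2.0)  # depth recovered
```

The equal row coordinates are what rectification buys; the horizontal offset shrinks as the point moves away, which is why distant surfaces yield small, noise-sensitive disparities.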
Given a pixel with coordinates $(x_1, y_1)$ in one image of an epipolar-rectified stereo pair, and a corresponding pixel $(x_2, y_2)$ in the other (where $y_1 = y_2$), their disparity $d$ is defined as $d = x_2 - x_1$ [29]. This can be used to triangulate the depth to the original 3D point in the environment (from
978-1-4244-1819-0/08/$25.00 ©2008 IEEE