Estimation of Pointing Poses on Monocular
Images with Neural Techniques - An
Experimental Comparison
Frank-Florian Steege, Christian Martin, and Horst-Michael Groß
Department of Neuroinformatics and Cognitive Robotics,
Ilmenau Technical University, Ilmenau, Germany
frank-florian.steege@stud.tu-ilmenau.de, christian.martin@tu-ilmenau.de
http://www.tu-ilmenau.de/neurob
Abstract. Poses and gestures are an important part of nonverbal
inter-human communication. In recent years, many different methods for
estimating poses and gestures have been developed in the field of
Human-Machine-Interfaces. In this paper we present, for the first time, an
experimental comparison of several re-implemented Neural Network based
approaches for a demanding visual instruction task on a mobile system.
For the comparison we used several Neural Networks (Neural Gas,
SOM, LLM, PSOM and MLP) and a k-Nearest-Neighbour classifier
on a common data set of images, which we recorded on our mobile
robot Horos under real-world conditions. For feature extraction we use
Gabor jets and the features of a special histogram on the image. We also
compare the results of the different approaches with the results of human
subjects who estimated the target point of a pointing pose. The results
obtained demonstrate that a cascade of MLPs is best suited to cope with
the task and achieves results equal to those of human subjects.
1 Introduction and Motivation
In recent years, Human-Machine Interaction has gained considerable importance.
Gestures and poses are among the most important and informative aspects of
nonverbal inter-human communication. In particular, pointing poses can simplify
communication by linking speech to objects or locations in the environment in a
well-defined way. Therefore, much recent work has focused
on integrating pointing pose estimation into Human-Machine-Interfaces.
Numerous approaches that can estimate the target of such a pointing pose
have been developed in recent years. Our goal is to provide an approach that
can be used to estimate a pointing pose on a mobile robot by means of low-cost
sensors. Therefore, in this paper we first restrict ourselves to approaches using
monocular images to capture the pose of the user. Second, approaches that do not use
Neural Networks to estimate the target of the pointing pose, like Haasch [1],
who used an object-attention system and a skin color map, or Nickel [2], who
estimated the target by means of a virtual line through the tracked hand and
head of the user, are also not considered in this paper.
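
The virtual-line idea mentioned for Nickel [2] can be illustrated with a short geometric sketch. The code below is only an illustration under stated assumptions, not the implementation of [2]: it assumes that 3-D head and hand positions are already available in a coordinate frame whose floor is the plane z = 0, and the function name `pointing_target_on_floor` is hypothetical.

```python
def pointing_target_on_floor(head, hand):
    """Intersect the line from head through hand with the floor plane z = 0.

    head, hand: (x, y, z) positions in metres (assumed inputs).
    Returns the (x, y) floor point, or None if the line does not
    descend towards the floor.
    """
    hx, hy, hz = head
    ax, ay, az = hand
    dz = az - hz
    if dz >= 0:
        # Hand is level with or above the head: the line never reaches the floor.
        return None
    # Points on the line are head + t * (hand - head); solve z = 0 for t.
    t = -hz / dz
    return (hx + t * (ax - hx), hy + t * (ay - hy))


# Example: head at 1.6 m height, hand 0.3 m in front of it at 1.2 m height.
target = pointing_target_on_floor((0.0, 0.0, 1.6), (0.3, 0.0, 1.2))
```

Such a purely geometric estimate depends directly on the accuracy of the tracked head and hand positions, whereas the neural approaches compared in this paper learn the mapping from image features to the target.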
J. Marques de Sá et al. (Eds.): ICANN 2007, Part II, LNCS 4669, pp. 593–602, 2007.
© Springer-Verlag Berlin Heidelberg 2007
Fig. 1. (left) Our robot Horos, used for the experimental investigation of pointing
pose estimation. The images for the estimation of the pointing target were
taken with the FireWire camera (located in the right eye). (right) The configuration
used for recording the ground truth training and test data. The subject stood in front
of the robot and pointed at one of the marked targets on the ground at a distance of
1 to 3 m from the subject. The distance of the robot to the subject varied between
1 m and 2 m.
However, there are several approaches that utilize different Neural Networks
to estimate the pointing pose. Nölker and Ritter [3] used Gabor filters in
combination with a Local Linear Map (LLM) and a Parametrized Self-Organizing
Map (PSOM) to estimate the target of a pointing pose on a screen the user is
pointing to. Richarz et al. [4] recently also used Gabor filters on monocular
images and a cascade of Multi-Layer Perceptrons (MLP) as function approximator
to determine the target point of a pointing pose on the ground. Takahashi [5]
suggested using a special kind of histogram features in combination with a SOM
to estimate the pose of a person in an image. Finally, since the head pose is
typically also important for a pointing pose, approaches estimating the head pose
are also considered in this paper: Krüger and Sommer [6] utilized Gabor filters
and an LLM to estimate the head pose, while Stiefelhagen [7] presented a system
that works on edge-filtered images and uses an MLP for head pose estimation.
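
To make the Gabor filter features used by several of these approaches more concrete, the following is a minimal pure-Python sketch of a Gabor jet, i.e. the magnitudes of several Gabor filter responses taken at one image position. It is an illustration only: the function names `gabor_kernel` and `gabor_jet`, the kernel size, the wavelengths, the number of orientations and the choice sigma = wavelength / 2 are assumptions made here and are not taken from the cited approaches.

```python
import cmath
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Return a size x size complex Gabor kernel (size must be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the wave direction theta.
            x_rot = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2.0 * sigma * sigma))
            carrier = cmath.exp(1j * 2.0 * math.pi * x_rot / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_jet(image, cx, cy, wavelengths=(4.0, 8.0), n_orientations=4, size=9):
    """Magnitudes of Gabor responses at pixel (cx, cy) of a 2-D grey image.

    image is a list of rows of floats. The jet is ordered by wavelength
    first and by orientation second.
    """
    half = size // 2
    jet = []
    for wavelength in wavelengths:
        for k in range(n_orientations):
            theta = k * math.pi / n_orientations
            kernel = gabor_kernel(size, wavelength, theta, sigma=wavelength / 2.0)
            response = 0j
            for dy in range(-half, half + 1):
                for dx in range(-half, half + 1):
                    y, x = cy + dy, cx + dx
                    # Pixels outside the image are treated as zero.
                    if 0 <= y < len(image) and 0 <= x < len(image[0]):
                        response += image[y][x] * kernel[dy + half][dx + half]
            jet.append(abs(response))
    return jet


# Example: vertical stripes with a period of 4 pixels along x.
stripes = [[math.cos(2.0 * math.pi * x / 4.0) for x in range(17)] for _ in range(17)]
jet = gabor_jet(stripes, cx=8, cy=8)
```

In this example the strongest entry of the jet is the filter whose wavelength and orientation match the stripe pattern. A feature vector built from such jets at several image positions could then serve as input to a function approximator such as an LLM or MLP.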
All these approaches achieved more or less good results for their particular
task, but cannot be compared with each other, because they use different images
captured in different environments and different combinations of methods
for feature extraction as well as different Neural Networks for approximating
the target point or the direction of the pose.
Therefore, for this paper we implemented and compared several selected neural
approaches, all trained and tested with the same set of training and test data.
In this way we give an overview of the suitability of the different approaches for
the task of estimating a pointing pose on a monocular image. The cited
approaches suggest different applications for the recognition of a pointing pose.
In our comparison we choose an application where a user points at a target on


