Human – robot interaction via voice-controllable intelligent user interface Harsha Medicherla † and Ali Sekmen
Robotica (2007)
- ISSN: 02635747
- DOI: 10.1017/S0263574707003414
Robotica (2007) volume 25, pp. 521–527. © 2007 Cambridge University Press
doi:10.1017/S0263574707003414 Printed in the United Kingdom
Human–robot interaction via voice-controllable
intelligent user interface
Harsha Medicherla† and Ali Sekmen‡∗
†Department of Electrical and Computer Engineering, Tennessee State University, 3500 John A. Merritt Blvd. Nashville,
TN 37209, USA
‡Department of Computer Science, Tennessee State University, 3500 John A. Merritt Blvd. Nashville, TN 37209, USA
(Received in Final Form: January 23, 2007. First published online: March 6, 2007)
SUMMARY
An understanding of how humans and robots can successfully
interact to accomplish specific tasks is crucial in creating
more sophisticated robots that may eventually become an
integral part of human societies. A social robot needs to be
able to learn the preferences and capabilities of the people
with whom it interacts so that it can adapt its behaviors for
more efficient and friendly interaction. Advances in human–
computer interaction technologies have been widely used in
improving human–robot interaction (HRI). It is now possible
to interact with robots via natural communication means
such as speech. This paper describes an innovative approach to
HRI via voice-controllable intelligent user interfaces, together
with the design and implementation of such interfaces.
The traditional approaches for human–robot
user interface design are explained and the advantages of the
proposed approach are presented. The designed intelligent
user interface, which learns user preferences and capabilities
over time, can be controlled with voice. The system was successfully
implemented and tested on a Pioneer 3-AT
mobile robot. Twenty participants, who were assessed on spatial
reasoning ability, directed the robot in spatial navigation tasks
to evaluate the effectiveness of the voice control in HRI.
Time to complete the task, number of steps, and errors were
collected. Results indicated that spatial reasoning ability and
voice-control were reliable predictors of efficiency of robot
teleoperation. 75% of the subjects with high spatial reasoning
ability preferred using voice-control over manual control.
The effect of spatial reasoning ability in teleoperation with
voice-control was lower compared to that of manual control.
KEYWORDS: Human–robot interaction; Mobile robots; Speech
recognition; Intelligent user interfaces.
1. Introduction
One of the overarching goals of robotics research is that
robots ultimately coexist with people in human societies as
an integral part of them. In order to achieve this goal, robots
need to be accepted by people as natural partners within the
society. It is therefore essential for robots to have human-like
perception and interaction capabilities that can be utilized for
effective human–robot interaction (HRI).
∗Corresponding author. E-mail: asekmen@tnstate.edu
A social robot is defined as “an autonomous or semi-
autonomous robot that interacts and communicates with
humans by following the behavioral norms expected by the
people with whom the robot is intended to interact.”1 A
social robot needs to be able to learn the preferences and
capabilities of the people with whom it interacts so that
it can adapt its behaviors for more efficient and friendly
interaction. Social Robotics focuses on the development of
robots that operate with people to meet or address some
social needs.2 One active area of research in Social Robotics
is investigating specifically how to socially equip robots to
respond to the needs of the people. These needs can include
social companionship or entertainment, which try to elicit
social responses from people, such as Honda humanoid,
Kismet,3 and Sony Aibo.4 The continuum extends toward
the development of systems that draw upon social attitudes
to address specific needs of people, such as care-giving
in healthcare;5 autonomous systems such as in response
to AAAI Robotics Challenge,6 and “human-like” personal
assistance systems such as ISAC and Cog.7,8 This area
utilizes studies in interpersonal interaction for application
to interactions between people and systems. Studies have
shown that people respond to artificial systems with an
unconscious similarity to similar interpersonal situations,
including a tendency to anthropomorphize or attribute human
qualities.9,10
In some critical (social or nonsocial) applications, a human
user interacts with a robot via Graphical User Interfaces
(GUIs) and controls the robot with joystick, mouse, or
similar devices. GUIs usually contain standard components
designed with a large number of users in mind. Some of these user
interface components may be redundant, and sometimes confusing,
for some users depending on their preferences,
capabilities, and the context in which robots are used.
In addition, users may sometimes need to control robots
without any physical effort. For example, it may be hard for a
disabled person to control a robot with a manual pointing device,
and vocal interaction might be more convenient.
Spatial reasoning ability might be important in mobile
robot teleoperation, especially if the robot is at a distant
location from its operator.11,12 GUIs sometimes may create
heavy information load depending on the nature of the task
and the user’s skills such as his/her spatial reasoning ability.
For example, sonar range information may be extremely
useful for people with high spatial-reasoning ability to
navigate a mobile robot, while it may not be beneficial for
people with low spatial-reasoning ability. People with low
spatial-reasoning abilities may make use of a detailed status report
while people with high spatial-reasoning abilities may not.13
Intelligent User Interface (IUI) design has been studied
in different areas including educational systems, intelligent
support systems, and information filtering.14–16 IUIs should
be able to employ intelligent techniques. User adaptivity
and user modeling are two such important techniques.17
In this research, we make use of user adaptivity and user
modeling techniques. We define an adaptive user interface
for robotics systems as: “A knowledge-based interface that
changes its contents to accommodate individual differences,
preferences, and to reflect the mission robots are used for.”
An IUI adapts itself and makes communication decisions
dynamically at run-time.18,19 An IUI differs from a direct
manipulation interface: the former makes decisions
on behalf of the user, whereas the latter presents
graphical objects to the user for direct
manipulation.20 The architecture of IUIs includes learning
the user model and inferring from the model to make
decisions. The user models are extracted from the knowledge
bases. Knowledge bases are structures that represent the
intelligence of these interfaces. In the work of Cook and
Kay the user model is displayed as a graph.21 Each node is
marked as known/not known or believed/not believed and
thereby the node probabilities are inferred from the model.
Run-time adaptation of information is the key in designing
IUIs. An algorithm for run-time adaptation is proposed by
Gorniak and Poole.22 This algorithm predicts future user
behavior from the length of the observed action sequences,
the actions themselves, and the frequency of those
actions. The Incremental Probabilistic Action
Modeling (IPAM)23 is another algorithm that predicts the
next element in a sequence based on detection of action
patterns. Gajos et al. implemented three GUIs and evaluated
them by comparing to a nonadaptive base. They employed
recency-based and frequency-based algorithms.24
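
The idea of predicting the next action from recently observed action patterns can be illustrated with a short sketch. The code below is a simplified predictor in the spirit of IPAM, not the published algorithm or the one used in this work: it keeps, for each action, an estimate of which action follows it, decays old estimates, and rewards the transition just observed. The class name, the action labels, and the decay weight `alpha = 0.8` are assumptions made for illustration only.

```python
from collections import defaultdict


class IPAMPredictor:
    """Incremental next-action predictor in the spirit of IPAM.

    alpha is an assumed decay weight, not a value taken from the paper.
    """

    def __init__(self, alpha=0.8):
        self.alpha = alpha
        # table[prev][nxt] approximates P(next action = nxt | previous = prev)
        self.table = defaultdict(dict)
        self.prev = None

    def observe(self, action):
        """Update the row of the previous action with the newly seen action."""
        if self.prev is not None:
            row = self.table[self.prev]
            if not row:
                # First transition seen from this action: give it all the mass.
                row[action] = 1.0
            else:
                # Decay old estimates, then reward the transition just observed.
                for nxt in row:
                    row[nxt] *= self.alpha
                row[action] = row.get(action, 0.0) + (1.0 - self.alpha)
        self.prev = action

    def predict(self, after=None):
        """Return the most probable next action, or None if nothing is known."""
        key = self.prev if after is None else after
        row = self.table.get(key)
        if not row:
            return None
        return max(row, key=row.get)
```

Because each update scales a row by `alpha` and adds `1 - alpha` to one entry, every row remains a probability distribution, and recent behavior outweighs older behavior. This combines the recency and frequency ideas mentioned above in a single table.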
Speech is the main communication means for human
beings. When people lack a common language, cooperation
is often greatly reduced. Given this, we believe users
would interact with voice-controllable GUIs more efficiently
than with the traditional ones. In addition, a user may
need to control robot(s) without any physical effort. For
example, a soldier may not be in a suitable position to
command soldier robots manually or a disabled person
might find vocal communication more convenient. Oviatt
et al. discuss adaptive conversational (social) interfaces
and compare them to command interfaces.25
This paper describes the design, implementation, and
testing of a voice-controllable adaptive user interface for
a mobile robot in navigational tasks. The interface offers
different GUI components for a group of users depending
on their capabilities, preferences, and the part of the task
that they are interested in. The interface learns the users’
capabilities and preferences in time as they interact more
with the robot.
This paper is organized as follows: Section 2 describes
the development platform and the GUI used for HRI.
The voice-controllable IUI design and implementation are
explained in Section 3. The experimental procedure to
assess the effectiveness of the IUI is given in Section 4.
The experimental results are presented in Section 5. Some
conclusions are given and the future work is motivated in
Section 6.
Fig. 1. Pioneer with a laptop computer attached.
2. System Architecture
The Pioneer 3-AT produced by ActivMedia is shown in
Fig. 1. It has 16 sonar sensor range finders, a laser
range finder, a pan-tilt-zoom camera, bumpers, and optical
encoders. Fuzzy logic-based behaviors have been developed
and converted into Microsoft’s Component Object Model
(COM) components so that they can be easily integrated.
Some of the behaviors are emulating, tracking, wall
following, center following, moving to a point, and shadowing.
Figure 2 displays a simple GUI that is developed for
voice-controllable or nonvoice-controlled (manual control)
interaction with the robot. It provides drive commands,
camera display with pan-tilt control, sonar and laser range
finding visual displays, robot behavior controls, and status
reports. Figure 3 illustrates the system architecture. The user
can interact with the interface by speaking. The speech is
converted to commands that are understood by the robot.
The interactions of the user with the interface are recorded
in a database. When the database collects sufficient metrics,
the learning algorithm (described in the next section) forms a
tree-structured user model. The interactions of the user with
the interface are queried against the model and the system
predicts the future actions based on the model.
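
The data flow described above (recognized speech, then a robot command, then a logged interaction) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implemented system: the phrase vocabulary, the table schema, and the function names are hypothetical, an in-memory SQLite database stands in for the interaction database, and the speech recognizer is assumed to have already produced text.

```python
import sqlite3
import time

# Hypothetical phrase-to-command table; the paper does not list its vocabulary.
COMMANDS = {
    "move forward": ("drive", "forward"),
    "turn left": ("drive", "left"),
    "turn right": ("drive", "right"),
    "stop": ("drive", "stop"),
    "show sonar": ("display", "sonar"),
}


def open_log(path=":memory:"):
    """Create the interaction table if it does not exist yet."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS interactions ("
        "user TEXT, modality TEXT, component TEXT, action TEXT, ts REAL)"
    )
    return db


def handle_utterance(db, user, text):
    """Map recognized text to a command and record it; None if unrecognized."""
    command = COMMANDS.get(text.strip().lower())
    if command is None:
        return None
    component, action = command
    db.execute(
        "INSERT INTO interactions VALUES (?, ?, ?, ?, ?)",
        (user, "voice", component, action, time.time()),
    )
    db.commit()
    return command


def action_counts(db, user):
    """Per-user usage counts, the kind of metric a learner could consume."""
    rows = db.execute(
        "SELECT component, action, COUNT(*) FROM interactions "
        "WHERE user = ? GROUP BY component, action",
        (user,),
    )
    return {(c, a): n for c, a, n in rows}
```

Keeping the log separate from command dispatch means the learning step can be run whenever enough rows have accumulated, without touching the control path.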
3. Intelligent User Interfaces
An interface is made intelligent by inferring from the user
model. One of the ways of developing the model is to
collect the metrics of users’ interaction with the interface.
The metrics are saved into a database and can be retrieved
when the application starts. After collecting the metrics, a
user model is developed using algorithms for learning
Bayesian networks from data. Heckerman et al. combined the
prior knowledge of the user with the incoming (statistical) data
to generate one or more Bayesian networks.26 Cheng et al.
employed an information-theoretic dependency analysis for
learning Bayesian network structure.27 A message-passing
algorithm for inference in Bayesian networks was developed
by Pearl.28,29
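
To make the notion of information-theoretic dependency analysis concrete, the sketch below estimates the mutual information between pairs of logged interaction variables and keeps the strongest dependencies as a tree. It uses the classical Chow-Liu construction (a maximum-weight spanning tree over mutual information) as a stand-in; it is not the algorithm of Cheng et al., nor necessarily the procedure used to build the tree-structured user model in this work. The variable names used in the example are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete columns."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def chow_liu_edges(data):
    """Maximum-weight spanning tree over variables, weighted by mutual information.

    data maps a variable name to its list of observed values.
    """
    names = list(data)
    weighted = sorted(
        ((mutual_information(data[a], data[b]), a, b)
         for a, b in combinations(names, 2)),
        reverse=True,
    )
    parent = {v: v for v in names}  # union-find forest for Kruskal's algorithm

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = []
    for _, a, b in weighted:
        ra, rb = find(a), find(b)
        if ra != rb:  # adding the edge keeps the structure a tree
            parent[ra] = rb
            edges.append((a, b))
    return edges
```

For example, given columns such as `spatial_ability`, `uses_sonar`, and `modality` collected from the interaction log, `chow_liu_edges` returns the pairs of variables to connect. Conditional probability tables for each edge would then be estimated from the same counts.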