The use of eye-gaze data in the evaluation of assistive technology software for older people.
Abstract
This paper reports on recent work undertaking a usability study of a software-based assistive technology. The software was developed to support increased opportunities and interactions for people in residential nursing homes and extra-care housing. The objective of the project was to allow older people and those with early onset of dementia to have access to some of the functionality of modern computers. The software could also have applications in other markets, such as schools and older people living at home. The intention is to provide opportunities for active participation, to facilitate more access to hobbies, interests and pastimes, and to develop and maintain social networks. The complex interface of modern computers otherwise often excludes people from access to digital media including video and internet telephony, games and activities, information and resources on the internet and other facilities that may be useful to them if presented in a different way. The study presented is being carried out in 3 residential homes with 20 participants. Eye-gaze recording was a key element of the usability testing. The study methodology was designed to provide feedback towards the design of the software and to better understand the use of computers by this target group. This paper presents the results of the first stage of the usability study; in particular, it concentrates on the use of the eye-gaze data. The design of the sessions allowed participants to explore the system independently and then to complete some pre-defined tasks. The users' interaction with the computer was recorded through video, audio, screen and eye-gaze recording as well as a data-log of the physical and eye interaction. The process of acquiring eye-gaze data with this fairly non-typical cohort is examined and the value of this data in contributing to the design of this software is explored.
COGAIN 2008
‘Communication, Environment
and Mobility Control by Gaze’
Edited by Howell Istance, Olga Štěpánková and Richard Bates
COGAIN NoE is funded by the EU IST 6th framework program
ISBN 978-80-01-04151-2
The 4th Conference on Communication by Gaze Interaction – COGAIN 2008:
Communication, Environment and Mobility Control by Gaze
September 2-3, 2008
Prague, Czech Republic
Welcome to COGAIN 2008!
This is the fourth international conference on communication by
gaze interaction organised by the European Framework 6
Network of Excellence, COGAIN. The first conference in
Copenhagen, held in 2005, was an opportunity for partners in the
Network to take stock of current research in Europe. The second
conference was an open event, held in Turin, and had ‘gazing
into the future’ as its theme. Its purpose was to chart a research
agenda for gaze-based communication over the coming 10 years. The third conference in Leicester
attracted approximately 100 delegates from 16 countries. Its theme was gaze-based creativity and
interacting with games and on-line communities, the first conference ever to have this as its main focus.
The theme of this year’s conference is communication, environmental control and mobility
control, particularly for people with motor impairments. We are expanding the areas in which gaze
communication can be used effectively to include control of mobility and of the user’s environment. This
will require that gaze tracking systems can be used on mobile platforms and in a range of different lighting
conditions. A person with motor neuron disease and an experienced user of gaze communication devices
said recently "One of the future challenges should be to make a computer so that you can drive the
wheelchair (safely!) using only an eye tracker". Enabling a suitable level of safety is critical if this
challenge is to be met and we need to understand fully the safety issues involved, and investigate
thoroughly solutions to these issues.
Gaze control has the potential to make really significant improvements to the quality of life of
people with severe motor impairments. There is a need for faster and more versatile interaction techniques
as well. It is important that the needs of users, both able-bodied and with disabilities, are studied and that
system solutions are evaluated against these. The conference reflects the importance of these aspects in its
three paper sessions.
Now that the Network is at the end of its fourth year, the COGAIN conferences have become the major
international events that focus particularly on gaze communication for people with disabilities. It has
become clear that there is a need for a permanent communication platform where all parties interested in
gaze communication can meet, discuss and collaborate. This is the aim of the COGAIN association
(http://www.cogain.org/association), which will be established during the second day of the conference
(3rd September 2008) in Prague.
So welcome to Prague and enjoy the conference!
Howell Istance and Olga Štěpánková
COGAIN 2008 Conference Co-chairs
A highlight of any visit to Prague is the Charles Bridge. The
stone Gothic bridge connects the Old Town and Malá Strana in
Prague and was commissioned by Czech King and Holy Roman
Emperor Charles IV in 1357. Charles Bridge is on the top of
every Prague visitor's must-see list with thirty 17th century
Baroque statues lining the bridge, and a beautiful view of the city.
It is also popular with artists, musicians and souvenir vendors.
COGAIN 2008 Programme
08:30-09:00 Registration
09:00-10:30 Keynote: The Human-Technical Challenge of Developing Gaze-Controlled Devices
Dr. Anthony Hornof, University of Oregon, USA
10:30-11:00 Refreshments
11:00-12:30 Session 1: Overcoming Technical Challenges in Mobile and Other Systems
Off-the-Shelf Mobile Gaze Interaction
J. San Agustin and J. P. Hansen, IT University of Copenhagen, Denmark
Fast and Easy Calibration for a Head-Mounted Eye Tracker
C. Cudel, S. Bernet, and M. Basset, University of Haute Alsace, France
Magic Environment
L. Figueiredo, T. Nunes, F. Caetano, and A. Gomes, ESTG/IPG, Portugal
AI Support for a Gaze-Controlled Wheelchair
P. Novák, T. Krajník, L. Přeučil, M. Fejtová, and O. Štěpánková, Czech
Technical University, Czech Republic
A Comparison of Pupil Centre Estimation Algorithms
D. Droege, C. Schmidt, and D. Paulus, University of Koblenz-Landau,
Germany
12:30-14:00 Lunch
14:00-15:30 Session 2: Broadening Gaze-Based Interaction Techniques
User Performance of Gaze-Based Interaction with On-line Virtual
Communities
H. Istance, De Montfort University, UK, A. Hyrskykari, University of
Tampere, Finland, S. Vickers, De Montfort University, UK and N. Ali,
University of Tampere, Finland
Multimodal Gaze Interaction in 3D Virtual Environments
E. Castellina and F. Corno, Politecnico di Torino, Italy
How Can Tiny Buttons Be Hit Using Gaze Only?
H. Skovsgaard, J. P. Hansen, IT University of Copenhagen, Denmark
J. Mateo, Wright State University, Ohio, US
Gesturing with Gaze
H. Heikkilä, University of Tampere, Finland
NeoVisus: Gaze Driven Interface Components
M. Tall, Sweden
15:30-16:00 Refreshments
16:00-17:30 Session 3: Focusing on the User: Evaluating Needs and Solutions
Evaluations of Interactive Guideboard with Gaze-Communicative
Stuffed-Toy Robot
T. Yonezawa, H. Yamazoe, A. Utsumi, and S. Abe, ATR Intelligent
Robotics and Communications Laboratories, Japan
Gaze-Contingent Passwords at the ATM
P. Dunphy, A. Fitch, and P. Oliver, Newcastle University, UK
Scrollable Keyboards for Eye Typing
O. Špakov and P. Majaranta, University of Tampere, Finland
The Use of Eye-Gaze Data in the Evaluation of Assistive
Technology Software for Older People
S. Judge, Barnsley District Hospital Foundation, UK and S. Blackburn,
Sheffield University, UK
A Case Study Describing Development of an Eye Gaze Setup for a
Patient with 'Locked-in Syndrome' to Facilitate Communication,
Environmental Control and Computer Access
Z. Robertson and M. Friday, Barnsley General Hospital, UK
17:30 Close
COGAIN 2008 Papers Index
Session 1: Overcoming Technical Challenges in Mobile and other Systems
  Off-the-Shelf Mobile Gaze Interaction
  Fast and Easy Calibration for a Head-Mounted Eye Tracker
  Magic Environment
  AI Support for a Gaze-Controlled Wheelchair
  A Comparison of Pupil Centre Estimation Algorithms
Session 2: Broadening Gaze-based Interaction Techniques
  User Performance of Gaze-Based Interaction with On-line Virtual Communities
  Multimodal Gaze Interaction in 3D Virtual Environments
  How Can Tiny Buttons Be Hit Using Gaze Only?
  Gesturing with Gaze
  NeoVisus: Gaze Driven Interface Components
Session 3: Focusing on the User: Evaluating Needs and Solutions
  Evaluations of Interactive Guideboard with Gaze-Communicative Stuffed-Toy Robot
  Gaze-Contingent Passwords at the ATM
  Scrollable Keyboards for Eye Typing
  The Use of Eye-Gaze Data in the Evaluation of Assistive Technology Software for Older People
  A Case Study Describing Development of an Eye Gaze Setup for a Patient with ‘Locked-In
  Syndrome’ to Facilitate Communication, Environmental Control and Computer Access
COGAIN 2008 Keynote by Dr. Anthony Hornof
  The Human-Technical Challenge of Developing Gaze-Controlled Devices
Session 1: Overcoming Technical Challenges in Mobile and
other Systems
Off-the-Shelf Mobile Gaze Interaction
Javier San Agustin
IT University of Copenhagen
Rued Langgaards Vej 7, 2300-DK
javier@itu.dk
John Paulin Hansen
IT University of Copenhagen
Rued Langgaards Vej 7, 2300-DK
paulin@itu.dk
Keywords
Gaze interaction, head-mounted display, head-mounted eye tracker, off-the-shelf
Introduction
In this paper we present a prototype of a mobile gaze interaction system based on a commercial head-
mounted display (HMD) and an inexpensive webcam for tracking the user's eye movements. The
components are off-the-shelf and our solution does not require any hardware modifications. The total cost
of the hardware components (not including a laptop PC) is less than 200€.
HMDs are becoming increasingly popular as a means to obtain information on-the-spot in applications
such as medicine, entertainment, augmented reality, maintenance or telerobotics (Liu et al. 1993,
Tanriverdi and Jacob 2000, Broll et al. 2006). Displaying the information right in front of the user's eye(s)
holds interesting potential. For instance, a technician repairing a defective wire in a building can benefit
from looking at maps and diagrams of the electrical installation on a head-mounted display, which gives
access to important information without having to move. During an operation, a doctor might
need to look at different images and information about the patient being operated on, and having them
at a glance on an HMD can be more efficient than turning towards a desktop computer.
Even relatively high-resolution HMDs (640x480 and higher) are comfortable to wear, weighing around
100 to 200 grams. A growing number of companies are producing HMDs at a relatively low price (200 to
400 US$) for mini-PCs, mobile phones or mp4 video players. Most of the systems are non-immersive,
allowing the user to maintain a view of the physical environment.
The new mobile displays create a demand for an efficient technique to interact with the information
displayed in the HMD. When the hands are needed for other tasks, hand-controlled devices such as
keyboard or mouse become awkward. Gaze interaction with the HMD can potentially provide a hands-free
pointing technique (Bleach et al., 1998).
People using augmentative and alternative communication tools may benefit from an HMD with gaze
control. Daily activities, like driving a wheelchair, would not be interrupted when communicating. People
without control of their hands could communicate on-the-move and in bed without requiring external
assistance to reposition the equipment.
There are several challenges in the development of gaze interactive HMD systems. First and foremost,
present head-mounted eye trackers are expensive. Secondly, adding a gaze tracker to a display may
increase the weight and make the complete system uncomfortable to wear. Thirdly, the user may feel
“odd” wearing bulky gear in front of the eyes. In addition, tracking the eyes in mobile conditions may be
more complicated than in a well-controlled environment, such as when sitting in front of a desktop computer.
Light conditions will change as the user moves between different settings, introducing the need for robust
eye tracking techniques (Hansen and Pece 2005). Movements of the head may also cause the camera to
slip. However, if the display and the tracker are mounted on the same frame, issues related to head
movements are eliminated from the gaze tracking process.
The purpose of the research project reported in this paper is to investigate the possibilities of building a
low-cost mobile HMD system that allows for gaze control with a standard PC. The availability of such a
system would make it possible for researchers on a limited budget to explore solutions to the challenges
listed above. There might also be some users who would like to test a mobile system in real-life, even with
the present shortcomings.
Gaze interaction on HMDs
Interaction on the HMD can be performed by means of gaze tracking. Although remote eye trackers are
less intrusive than head-mounted ones, they do not represent a viable solution for interacting with mobile
displays. Furthermore, recent advances in miniaturization of cameras, batteries and light sources have
reduced the weight and intrusiveness of head-mounted eye trackers.
A number of such systems have been described in the literature. Babcock and Pelz (2004) presented a
system to be used in off-line situations. It includes a camera that records the scene in front of the user.
After recording a sequence, gaze information can be obtained and combined with scene information. Li et
al. (2005) introduced a similar system that works in real-time. Although they use off-the-shelf
components, their approach involves ingenious hardware modifications that require advanced
knowledge of electronics, which may prevent potential users from building the system. Smith et al. (2005)
presented the ViewPointer, a head-mounted eye tracker that enhances context information when the user
looks at pre-tagged physical objects by detecting whether the user looks directly at the object. This
approach does not estimate gaze coordinates and thus is not suitable for interaction with a display.
Our prototype makes use of off-the-shelf components that do not require hardware modifications.
Hardware
Our system consists of a Sandberg Nightvision camera (Figure 1), which provides a resolution of 640x480
at 15 Hz or 320x240 at 30 Hz. It costs around €15 and weighs 100 grams. It has 6 built-in infrared LEDs.
Infrared light improves the illumination conditions of the image and eases the detection of the eye features.
We take advantage of the built-in infrared light to create a dark-pupil effect.
A commercial binocular head-mounted display (Vuzix DV920, Figure 1) is connected to a standard laptop
PC. It provides a resolution of 1024x768 pixels and weighs 100 grams. The binocular HMD prevents
ambient light from reaching the user's eye, eliminating most of the undesired reflections on the sclera and
iris. However, the user can still see parts of the surrounding environment by looking above or below the
2.5 cm thick display frame. The current price is about €185.
Figure 1. Nightvision web camera (left). Binocular head-mounted display (right)
The camera has not been fixed to the HMD. Instead, it is mounted on a lightweight strap helmet that can
be adjusted to the user's head. The camera may thus be conveniently positioned close to the eye to obtain a
good image of the pupil. Figure 2 shows a user wearing the HMD and the head-mounted camera looking
into the eye from below the HMD.
Figure 2. A user wearing the HMD and the head-mounted eye tracker
Algorithm for tracking the eye
The eye tracking algorithm uses the dark-pupil technique, and is based on fitting an ellipse to the contour
of the pupil. A point on the contour is taken to be the point of maximum gradient along a line extending
from the initial guess of the center of the pupil. A set of 80 points on the contour is located by calculating
the maximum gradient along such lines. The size of the pupil in the previous image is taken into account
to calculate the length of the lines along which the gradient is calculated. This avoids taking points far
away from the pupil as belonging to its contour.
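The radial search described above can be sketched as follows. This is a minimal illustration on a synthetic image, not the authors' implementation: the function name `find_contour_points`, the `reach` factor of 1.5 times the previous pupil radius, the one-pixel step and the nearest-neighbour sampling are all assumptions made for the example.

```python
import math


def find_contour_points(image, guess, prev_radius, n_rays=80, reach=1.5):
    """Return one contour point per ray cast from the initial centre guess."""
    gx, gy = guess
    # Assumption: ray length is a fixed multiple of the previous pupil radius.
    max_len = int(reach * prev_radius)
    points = []
    for k in range(n_rays):
        angle = 2.0 * math.pi * k / n_rays
        dx, dy = math.cos(angle), math.sin(angle)
        # Sample the image along the ray, one pixel per step (nearest neighbour).
        profile = [image[int(round(gy + s * dy))][int(round(gx + s * dx))]
                   for s in range(max_len + 1)]
        # Dark pupil: the contour is where the intensity rises most steeply.
        best = max(range(max_len), key=lambda s: profile[s + 1] - profile[s])
        points.append((gx + (best + 0.5) * dx, gy + (best + 0.5) * dy))
    return points


# Synthetic test image: a dark disc (the pupil) on a brighter background.
SIZE, CENTER, RADIUS = 200, (100.0, 90.0), 25.0
image = [[30 if math.hypot(x - CENTER[0], y - CENTER[1]) <= RADIUS else 200
          for x in range(SIZE)] for y in range(SIZE)]

contour = find_contour_points(image, guess=(95.0, 95.0), prev_radius=RADIUS)
worst = max(abs(math.hypot(x - CENTER[0], y - CENTER[1]) - RADIUS)
            for x, y in contour)
print(len(contour), "contour points, worst radial error", round(worst, 2), "px")
```

Limiting the ray length keeps the search near the pupil, so strong edges elsewhere in the image (eyelids, the display frame) cannot be picked up as contour points.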
Once the points on the pupil contour are located, an ellipse is fitted to these points. Since the number of
points on the contour is usually high and very few points are located far from the pupil, we use a 2-step
approach to estimate the ellipse. First, an initial ellipse is fitted to all the points. The shape of this ellipse
might be deformed due to the presence of outliers. A second ellipse is then fitted using only the points that
lie close to the first ellipse. Most outliers are eliminated by this technique. This approach avoids
iterative methods such as RANSAC, which require more processing time.
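A minimal sketch of the 2-step fit is given below. The paper does not state which ellipse fitting method or closeness criterion is used, so the algebraic least-squares conic fit, the residual threshold `tol` and all function names here are assumptions chosen to keep the example short and self-contained.

```python
import math


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_conic(points):
    """Least-squares fit of A u^2 + B uv + C v^2 + D u + E v = 1.

    (u, v) are the points shifted to their mean and divided by their mean
    distance from it, which keeps the normal equations well conditioned.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    scale = sum(math.hypot(x - mx, y - my) for x, y in points) / n
    rows = []
    for x, y in points:
        u, v = (x - mx) / scale, (y - my) / scale
        rows.append([u * u, u * v, v * v, u, v])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(5)] for i in range(5)]
    atb = [sum(r[i] for r in rows) for i in range(5)]
    return solve_linear(ata, atb), (mx, my, scale)


def algebraic_residual(model, point):
    """Absolute algebraic distance of a point from the fitted conic."""
    (a, b, c, d, e), (mx, my, scale) = model
    u, v = (point[0] - mx) / scale, (point[1] - my) / scale
    return abs(a * u * u + b * u * v + c * v * v + d * u + e * v - 1.0)


def conic_center(model):
    """Centre of the fitted conic, in the original pixel coordinates."""
    (a, b, c, d, e), (mx, my, scale) = model
    det = 4.0 * a * c - b * b
    u = (b * e - 2.0 * c * d) / det
    v = (b * d - 2.0 * a * e) / det
    return mx + scale * u, my + scale * v


def two_step_fit(points, tol=0.2):
    """Fit all points, then refit using only points close to the first fit."""
    first = fit_conic(points)
    close = [p for p in points if algebraic_residual(first, p) < tol]
    return fit_conic(close), close


# Synthetic contour: 80 points on an ellipse, the first three pushed outwards.
TRUE_CENTER = (320.0, 240.0)
contour = []
for k in range(80):
    t = 2.0 * math.pi * k / 80
    stretch = 1.5 if k < 3 else 1.0
    contour.append((TRUE_CENTER[0] + stretch * 40.0 * math.cos(t),
                    TRUE_CENTER[1] + stretch * 30.0 * math.sin(t)))


def center_error(model):
    cx, cy = conic_center(model)
    return math.hypot(cx - TRUE_CENTER[0], cy - TRUE_CENTER[1])


single_fit = fit_conic(contour)
refined_fit, kept = two_step_fit(contour)
print("one-step centre error (px):", center_error(single_fit))
print("two-step centre error (px):", center_error(refined_fit),
      "using", len(kept), "points")
```

Both steps are closed-form least-squares solves, so the cost is fixed per frame, in contrast to a sampling method whose number of iterations depends on the outlier ratio.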
Calibration
Gaze is estimated from the center of the pupil. A calibration process is required to map the pupil position
to the HMD screen. A set of points is shown on the display and the user has to look at them in sequential
order. A second-order polynomial regression is then applied to estimate gaze (Morimoto et al., 1999).
Calibration takes around 30 seconds.
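As an illustration of such a calibration, the sketch below fits one second-order polynomial per screen axis by least squares. The exact polynomial terms used in the prototype are not given, so the full quadratic with a cross term, the coordinate normalization, the function names and the simulated pupil-to-screen relation are all assumptions.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def quad_terms(x, y, norm):
    """Terms of a full second-order polynomial in normalized pupil coordinates."""
    mx, my, scale = norm
    u, v = (x - mx) / scale, (y - my) / scale
    return [1.0, u, v, u * v, u * u, v * v]


def fit_calibration(pupil_pts, screen_pts):
    """Least-squares fit of screen = poly2(pupil), one polynomial per axis."""
    n_pts = len(pupil_pts)
    mx = sum(x for x, _ in pupil_pts) / n_pts
    my = sum(y for _, y in pupil_pts) / n_pts
    scale = max(max(abs(x - mx), abs(y - my)) for x, y in pupil_pts)
    norm = (mx, my, scale)
    rows = [quad_terms(x, y, norm) for x, y in pupil_pts]
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    coeffs = []
    for axis in (0, 1):
        atb = [sum(r[i] * s[axis] for r, s in zip(rows, screen_pts))
               for i in range(n)]
        coeffs.append(solve_linear(ata, atb))
    return coeffs, norm


def estimate_gaze(model, pupil):
    """Map a pupil centre to screen coordinates with the fitted polynomials."""
    coeffs, norm = model
    terms = quad_terms(pupil[0], pupil[1], norm)
    return tuple(sum(c * t for c, t in zip(axis_coeffs, terms))
                 for axis_coeffs in coeffs)


def simulated_screen_point(px, py):
    """Made-up quadratic pupil-to-screen relation, for demonstration only."""
    sx = 512 + 30 * px + 1.5 * py + 0.05 * px * py + 0.10 * px * px - 0.08 * py * py
    sy = 384 - 2 * px + 28 * py - 0.04 * px * py + 0.06 * px * px + 0.12 * py * py
    return sx, sy


# 16 calibration samples: pupil offsets (pixels) and the matching screen points.
pupil_pts = [(px, py) for py in (-12.0, -4.0, 4.0, 12.0)
             for px in (-15.0, -5.0, 5.0, 15.0)]
screen_pts = [simulated_screen_point(px, py) for px, py in pupil_pts]

model = fit_calibration(pupil_pts, screen_pts)
gaze = estimate_gaze(model, (3.5, -7.25))
print("estimated gaze:", gaze)
```

With six coefficients per axis, at least six well-spread calibration targets are needed; using more targets than that averages out measurement noise in the pupil centre.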
Results
The accuracy of the system has been evaluated by conducting an experiment with three subjects. Each
subject calibrated the eye tracker by looking at 16 targets displayed on the HMD. Upon completion of
calibration, the user was instructed to look again at the 16 targets. Gaze location was estimated in real time
during the test phase. No smoothing was applied to the estimated gaze coordinates.
Accuracy was evaluated under two different conditions. In the first one, the subject stood still. This
situation is equivalent to using a remote eye tracker while maintaining the head still. In the second test the
subject was instructed to walk along a corridor while looking at the targets presented on the screen. Figure
3 shows the accuracy in degrees for each of the users in both conditions.
Figure 3. Accuracy for each user when standing still and when walking
When standing still, the average accuracy obtained is 0.766° ± 0.49°. Since the camera is located close to
the eye, a good quality image of the pupil is obtained, and thus the estimated center is very precise. In
contrast, in a mobile scenario where the user is walking, the accuracy drops to an average of 2.20° ± 1.38°.
The camera and the HMD are not fixed to each other, and therefore there are relative movements between the
two components. This introduces errors in the estimated gaze position. In addition, the system might slip as the
user walks. Integrating the camera and the HMD into one element would improve the accuracy in a mobile
scenario. Figure 4 shows the estimated gaze positions for one of the users when the user is standing still
and when the user is walking.
Figure 4. Estimated gaze positions for one user: standing still (left) and walking (right)
Discussion
We have developed and built a prototype of a head-mounted eye tracker that allows the user to interact
with a commercially available head-mounted display. The whole system costs about 200€ and weighs a total of
200 grams. The preliminary tests show an average accuracy under 1° when standing still and around 2.2°
when walking. Since the camera and the HMD are not fixed to each other, the accuracy is affected by
relative movements between the two elements when the user walks. However, even while walking the
accuracy is high enough to interact with noise-tolerant interfaces that have been specifically designed for
gaze typing (e.g. Hansen et al. 2001, Hansen et al. 2008). While standing it is possible to interact with a
normal windows environment through the use of standard gaze-clicking techniques (dwell, zooming or
two-step magnification).
The prototype can undergo a number of improvements. Integrating the camera completely with the HMD
is the most obvious next step. However, a complete integration of HMD and eye tracker will require some
hardware modifications or special manufacturing. We are considering alternative solutions for a flexible
mounting that holds the camera close to the display without covering the user’s face. The design of
a cool-looking face mount is probably the biggest challenge remaining. We expect that manufacturers
of, e.g., bike helmets, sunglasses, earphones or visors will have the competence to solve this
problem.
Gaze can be used for pointing, but it lacks the ability to perform selections, i.e. clicking. Facial muscle
activity through an EMG switch can provide a reliable solution to perform activations in combination with
gaze pointing (Mateo et al., 2008). Voice recognition could also be used to activate certain predefined
actions.
Acknowledgments
This work was supported by the European Network of Excellence COGAIN, Communication by Gaze
Interaction, funded under the FP6/IST programme of the European Commission.
References
Babcock, J. and Pelz, J. (2004) Building a lightweight eyetracking headgear. Proceedings of the 2004
symposium on Eye tracking research & applications.
Bleach, G., Cohen, C.J., Braun, J. and Moody, G. (1998) Eye tracker system for use with head mounted
displays. IEEE International Conference on Systems, Man, and Cybernetics.
Broll, W., Ohlenburg, J., Lindt, I., Herbst, I., Braun, A-K. (2006) Meeting technology challenges of
pervasive augmented reality games. Proceedings of 5th ACM SIGCOMM workshop on Network
and system support for games.
Hansen, D.W. and Pece, E.C. (2005) Eye tracking in the wild. Computer Vision and Image
Understanding.
Hansen, D.W., Skovsgaard, H.H.T., Hansen, J.P., Møllenbach, E. (2008) Noise tolerant selection by gaze-
controlled pan and zoom in 3D. Proceedings of the 2008 symposium on Eye tracking research &
applications.
Johansen, A. and Hansen, J.P. (2006) Augmentative and alternative communication: the future of text on
the move. Universal Access in the Information Society, Volume 5, Number 2.
Li, D., Winfield, D. and Parkhurst, D. (2005) Starburst: A hybrid algorithm for video-based eye tracking
combining feature-based and model-based approaches. IEEE Computer Society Conference on
Computer Vision and Pattern Recognition.
Liu, A., Tharp, G., French, L., Lai, S. and Stark, L. (1993) Some of what one needs to know about using
head-mounted displays to improve teleoperator performance. IEEE Transactions on Robotics and
Automation.
Morimoto, C.H., Koons, D., Amit, A., Flickner, M., Zhai, S. (1999) Keeping an eye for HCI. Proceedings
of the XII Brazilian Symposium on Computer Graphics and Image Processing.
Mateo, J.C., San Agustin, J. and Hansen, J.P. (2008) Gaze beats mouse: hands-free selection by combining
gaze and EMG. Proceedings of the SIGCHI conference on Human factors in computing systems.
Smith, J.D., Vertegaal, R. and Sohn, C. (2005) ViewPointer: Lightweight calibration-free eye tracking for
ubiquitous handsfree deixis. Proceedings of the 18th annual ACM symposium on User interface
software and technology.
Tanriverdi, V. and Jacob, R.J.K. (2000) Interacting with eye movements in virtual environments.
Proceedings of the SIGCHI conference on Human factors in computing systems.
Fast and Easy Calibration for a Head-Mounted Eye Tracker
C. Cudel, S. Bernet, and M. Basset, University of Haute Alsace, France
[...]
centre and calibration points are both automatically detected with basic image processing algorithms. This
offers an easy calibration for eye trackers.
Assuming that the head (or eye) remains in the same plane during the calibration, we now demonstrate
that these two calibration methods are identical. Here, we use homographies to formulate the relations
between the planes of the model, but polynomial expressions could also be suitable. Because calibration is
used to compute the mapping from the eye camera to the scene camera, we use the eye camera as the geometric
reference for the following equations. P is a calibration point (in meters), and the vectors pe and ps are
the projections (in pixels) of P on the eye and scene cameras respectively.
Assuming that the observed scene is planar, the relations between P and its projections pe and ps
on the eye and scene cameras are:
$$P = H_1 \, p_e \quad \text{and} \quad p_s = H_2 \, P$$
In the situation of Figure 1, H1 and H2 do not change during the calibration step, and the relation between
ps and pe can be written as:
$$p_s = H_2 \, H_1 \, p_e = H \, p_e$$
If we consider the scheme of Figure 2, where the head has moved, the projections of P become:
$$P = H_1' \, p_e' \quad \text{and} \quad p_s' = H_2' \, P$$
Because the eye and scene cameras are rigidly attached, H is unchanged and:
$$p_s' = H_2' \, H_1' \, p_e' = H \, p_e'$$
Thus ps' and pe' can be used to compute H.
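Since every head position contributes one pair (pe', ps') related by the same H, the homography can be estimated from the collected pairs. The sketch below shows one common way to do this, a linear least-squares solve with the last entry of H fixed to 1; it is an assumption for illustration rather than the authors' procedure, and the function names, the made-up matrix `H_TRUE` and the use of coordinates scaled to roughly [-1, 1] are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [a[i][:] + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def apply_homography(h, point):
    """Project a 2D point through the 3x3 homography h."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def estimate_homography(eye_pts, scene_pts):
    """Least-squares H (with H[2][2] = 1) such that scene ~ H * eye."""
    rows, rhs = [], []
    for (x, y), (u, v) in zip(eye_pts, scene_pts):
        # Each correspondence gives two linear equations in the 8 unknowns.
        rows.append([x, y, 1.0, 0.0, 0.0, 0.0, -u * x, -u * y])
        rhs.append(u)
        rows.append([0.0, 0.0, 0.0, x, y, 1.0, -v * x, -v * y])
        rhs.append(v)
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(8)] for i in range(8)]
    atb = [sum(r[i] * t for r, t in zip(rows, rhs)) for i in range(8)]
    h = solve_linear(ata, atb) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


# Made-up ground truth, used only to generate consistent demonstration data.
H_TRUE = [[1.20, 0.10, 0.05],
          [-0.08, 0.90, -0.02],
          [0.03, -0.02, 1.00]]

# One (pe', ps') pair per head position while the user looks at the single LED.
eye_pts = [(x, y) for y in (-0.5, 0.0, 0.5) for x in (-0.5, 0.0, 0.5)]
scene_pts = [apply_homography(H_TRUE, p) for p in eye_pts]

H_est = estimate_homography(eye_pts, scene_pts)
max_diff = max(abs(H_est[i][j] - H_TRUE[i][j]) for i in range(3) for j in range(3))
print("largest coefficient difference:", max_diff)
```

At least four pairs in general position are required; with more pairs the solve becomes a least-squares estimate, which is where a continuous acquisition is useful.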
Figure 1. Top view of the head-mounted eye tracker: projection of P on the scene camera and pupil
projection on the eye camera when the gaze direction is oriented towards P.
Figure 2. Example of a head movement on the left. H
doesn't change in the eye camera reference.
This method is correct only if the head movements are confined to a plane. If this hypothesis is not
respected, point coordinates will be affected by parallax. In Figure 3, we present the error induced by
parallax in the condition where the gaze is focused on a point P and the eye tracker user moves
along the axis of the gaze direction. In this case pe stays at the same location but ps changes position
in the image of the scene camera. With a calibration made at 1 meter (distance between P and the user's eye)
and with an eye tracker user moving from -0.5 to 10 meters, the parallax (dp) is characterized by
variations of the ps position from 40 to 25 pixels (Figure 3). When the user's head stays within a depth of
+/- 5 cm during the calibration process, the ps coordinates are affected by an error below 2 pixels. This
computation is obtained with the intrinsic parameters of our scene camera and by considering that the scene
camera and the user's eye are 5 cm apart (Figure 4).
Figure 3. Variations of ps when the eye camera moves. The right plot is a zoom around the calibration distance.
Calibration results
Compared to the classical calibration based on the use of a grid, our method is easier. In practice, the user
starts the calibration and just needs to look at the infrared LED from different positions. The acquisition of
eye and scene images is continuous and takes only a few seconds. Real-time image processing algorithms
are used to automatically segment both the pupil position in the eye image and the infrared LED position in
the scene image. This method makes it possible to use more than 9 points to compute the mapping between
the eye and scene cameras with the Levenberg-Marquardt algorithm. Because acquisition is continuous, the
mapping computation can be affected by false correspondences (outliers). We use a RANSAC algorithm
(Fischler and Bolles, 1981) to remove these outliers.
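The outlier removal step can be sketched as below. For brevity the example uses a first-order mapping between eye and scene coordinates as the RANSAC model; the threshold, iteration count, seed, function names and synthetic data are assumptions, and the mapping that is finally fitted to the retained pairs (for example with Levenberg-Marquardt) is not shown.

```python
import math
import random


def affine_from_three(src, dst):
    """Exact first-order map dst = a0 + a1*x + a2*y through three pairs, or None."""
    (x0, y0), (x1, y1), (x2, y2) = src
    det = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
    if abs(det) < 1e-9:  # collinear sample, no unique solution
        return None
    model = []
    for axis in (0, 1):
        s0, s1, s2 = dst[0][axis], dst[1][axis], dst[2][axis]
        a1 = ((s1 - s0) * (y2 - y0) - (s2 - s0) * (y1 - y0)) / det
        a2 = ((x1 - x0) * (s2 - s0) - (x2 - x0) * (s1 - s0)) / det
        model.append((s0 - a1 * x0 - a2 * y0, a1, a2))
    return model


def apply_model(model, point):
    x, y = point
    return tuple(a0 + a1 * x + a2 * y for a0, a1, a2 in model)


def ransac(eye_pts, scene_pts, threshold=2.0, iterations=200, seed=0):
    """Return the model with the most inliers and the indices of those inliers."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iterations):
        idx = rng.sample(range(len(eye_pts)), 3)
        model = affine_from_three([eye_pts[i] for i in idx],
                                  [scene_pts[i] for i in idx])
        if model is None:
            continue
        inliers = []
        for i, (e, s) in enumerate(zip(eye_pts, scene_pts)):
            px, py = apply_model(model, e)
            if math.hypot(px - s[0], py - s[1]) <= threshold:
                inliers.append(i)
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = model, inliers
    return best_model, best_inliers


def true_map(point):
    """Made-up eye-to-scene relation used to generate the demonstration data."""
    x, y = point
    return (50.0 + 3.0 * x + 0.2 * y, 20.0 - 0.1 * x + 2.5 * y)


# 40 correspondences; every fifth one is a false correspondence (outlier).
eye_pts = [(100.0 + 10.0 * i, 80.0 + 12.0 * j) for j in range(5) for i in range(8)]
scene_pts = []
for k, p in enumerate(eye_pts):
    sx, sy = true_map(p)
    if k % 5 == 0:
        sx, sy = sx + 60.0 + k, sy - 45.0
    scene_pts.append((sx, sy))

model, inliers = ransac(eye_pts, scene_pts)
print(len(inliers), "of", len(eye_pts), "correspondences kept")
```

Only the pairs listed in `inliers` would then be passed to the final mapping computation.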
We compare several kinds of relations found in the literature (Li et al., 2005), which are used for eye
tracking systems. Table 1 compares the precision of our method with that obtained using a 9-point grid.
The calibration points were located 1 meter in front of the eye tracker. The results presented here were
obtained with a mapping based on pupil center methods. The corneal reflection or glint methods have not
yet been tested, but the results should be close. For Table 1, the same points were used for the calibration
and the measurement.
                                    Results with a grid of 9 points      Results with a single calibration point
Expression between eye              Distance mean     Angle mean         Distance mean     Angle mean
and scene camera                    error (pixels)    error (degrees)    error (pixels)    error (degrees)
Homography                          9.1               0.6°               9.8               0.6°
Polynomial, N=1 (first order)       15.6              1°                 6.2               0.4°
Polynomial, N=2 (second order)      6.8               0.4°               3.9               0.3°
Polynomial, N=3 (third order)       3.9               0.2°               2.7               0.2°

Polynomial expressions tested:
$$x_s = a_0 + \sum_{n=1}^{N} \left( a_{xn} \, x^n + a_{yn} \, y^n \right)$$
$$y_s = b_0 + \sum_{n=1}^{N} \left( b_{xn} \, x^n + b_{yn} \, y^n \right)$$
Table 1. Comparison of our method with a calibration using a grid of 9 points. The working distance is 1 meter.
[Figure 3 axes. Horizontal: variation dz of the eye camera position along the gaze direction (meters). Vertical: dp, variations of the ps position (pixels).]
We give the mean error between the calculated and real gaze positions on the scene camera. Results are also
presented as the mean angular error of the gaze direction. The results obtained are comparable, which can be
explained by considering that the parallax error is in practice close to the error made by an eye tracker user
when the gaze position is manually validated during the calibration.
Figure 4 represents the latest version of our eye tracker. The software runs in real time and has been
developed in C++ using the CVB and OpenCV libraries.
Figure 4. Illustration of our eye tracking system
Conclusion
This paper has presented an easy eye tracker calibration procedure, which takes only a few seconds, uses
only a simple infrared LED and needs no external assistance. By using simple image processing methods,
each calibration point is automatically detected, and the eye tracker user doesn't need to validate it
manually. We showed that the results are comparable to those of a procedure with a grid of 9 points. We
also showed that the parallax is an important factor, which can blur the calibration but also the results.
This procedure is well adapted to our applications, in which we want to analyse the driver's behaviour in
various driving situations. The major advantages are that the driver's head doesn't need to be fixed and
that a single LED is used instead of a grid of points for the calibration.
References
Duchowski, T. (2007) Eye Tracking Methodology: Theory and Practice, 2nd edition, Springer.
Basset, M., Cudel, C., Georges, V., Gissinger, G.L. (2005) Visual characterization of road driver’s
behaviour. WISP 2005, International Symposium on Intelligent Signal Processing, Faro, Portugal,
1-3 Sept. 2005, 6 pages.
Villanueva, A., Cadeza, R., Porta, S. (2006) Eye Tracking: Pupil orientation geometrical modelling, Image
and Vision Computing, 24, Elsevier, pp. 663-679.
Li, D., Winfield, D., Parkhurst, D. J. (2005) Starburst: A hybrid algorithm for video-based eye tracking
combining feature-based and model-based approaches. IEEE Computer Society Conference on
Computer Vision and Pattern Recognition, Vol. 3, San Diego, USA, pp. 79.
Fischler, M. and Bolles, R. (1981) Random sample consensus: a paradigm for model fitting with
applications to image analysis and automated cartography. Communications of the ACM, vol. 24,
pp. 381-395.
Magic Environment
Luis Figueiredo
ESTG/IPG
Guarda
Portugal
Luis.figueiredo@ipg.pt
Tiago Nunes
ESTG/IPG
Guarda
Portugal
tiagomkey@gmail.com
Filipe Caetano
ESTG/IPG
Guarda
Portugal
fkaetano@gmail.com
Ana Gomes
Agrupamento Escolas Infante
D. Henrique, Viseu
Portugal
Ana.isabel.gomes@netvisao.pt
Keywords
Environment eye control
Introduction
In recent years, we have witnessed great development in eye gaze systems that allow handicapped
people to interact with the computer. Shi and Gale (2007) propose a remote eye tracking system with 3
cameras for environmental control, which demonstrates the success of this new technology for handicapped
people.
Several works have been presented that allow the control of wheelchairs through the movements of the eyes.
These systems usually use electromyography signals captured by electrodes placed around the user’s eyes.
After processing these signals, the systems can generate control signals for the wheelchair. Law et
al. (2002) and Barea et al. (1999) are examples of these systems.
The natural next step is to join an eye gaze system like the one described by Figueiredo and Gomes
(2007) with a conventional wheelchair, to verify whether controlling a wheelchair with eye gaze alone
is viable in the current state of the technology.
In this paper, two applications are presented that allow environment control and electric wheelchair
control using eye gaze only.
Applications description
Environment Control
The environment control application is intended to provide the user with a simple tool, configurable
according to his/her needs and based on low cost hardware, that enables the control of any
infrared device or any electric device connected to a radio frequency receiver. We developed two different
circuits, whose block diagrams are presented in Figure 1.
The emitter circuit communicates with the PC application by USB through a PIC microcontroller and
allows the recording of infrared signals, the emission of infrared signals and the emission of radio
frequency signals for the receiver module. The receiver module receives and processes the signals and
allows the control of electric devices through the PIC microcontroller firmware. As for software, we
developed an application that can create an unlimited set of communication pictures, each of which can
contain a large set of buttons. A function can be associated with each button in order to control an
infrared device, an electric device, or both. The only thing that an eye gaze user has to do is to select
the communication picture button whose function he/she intends to activate.
Figure 1: Block diagrams for the Emitter and Receiver
Wheelchair control
Since we used a low cost wheelchair for the tests, the first problem was the absence of a digital interface
that allowed the connection between the PC and the wheelchair. This problem was solved with the
development of an auxiliary circuit, whose block diagram is presented in Figure 2.
Figure 2: Software/hardware interface
For the wheelchair control, we tried two approaches. The first, conceptually the simplest one, consists of
placing a menu on the computer screen with 8 buttons indicating the different possible directions for the
wheelchair movements. The user has to activate each one of these buttons to move in the desired
direction. This solution has two problems. First, it is very complicated to control the wheelchair speed.
Second, the user must constantly switch eye gaze from the laptop to the physical environment and
vice-versa in order to steer the wheelchair safely. This way of managing eye control was
abandoned due to the poor results obtained.
In the second approach, we remove the laptop from the user’s sight, leaving only the digital cam to
capture the user’s face, which enables the eye gaze estimation. An eye gaze directed at the cam
corresponds to the stop position, i.e. the wheelchair does not move when the user looks at the cam.
The user can control not only the wheelchair direction but also its speed. To increase the speed, the user
must look up in relation to the cam. To diminish the speed, he/she must look at a zone closer to the cam.
When the user looks at a zone below the cam, the wheelchair runs backwards, with the speed also controlled
by the eye gaze. For direction control (left and right), we obtained exceptional results, since we can
control the wheelchair with high stability and precision. In fact, the system auto-adjusts the wheelchair
movement through a natural feedback control system. If the user fixes his/her eyes on one target direction,
and if the wheelchair doesn’t take this direction, then the displacement of the user’s eyes will tend to
increase in the
inverse direction, which immediately implies the correction of its route. As it is not easy to translate the
true control that we obtained with this system into words, graphs or tables, we will play a video that shows
the first experiment made with this system: http://www.youtube.com/watch?v=igIG-hMh3jU
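The second approach can be pictured as a mapping from the gaze offset relative to the cam to the two axes of
the original joystick. The sketch below is only an illustration in Python: the normalised units, the dead
zone around the cam, the linear scaling and the function name are assumptions, not the authors' firmware or
PC code.

```python
def gaze_to_joystick(dx, dy, dead_zone=0.1, max_offset=1.0):
    """Convert a gaze offset from the cam into joystick-like (x, y) values in [-1, 1].

    dx > 0 means the user looks to the right of the cam, dy > 0 above it.
    Offsets are in arbitrary normalised units (an assumption of this sketch).
    """
    def axis(value):
        if abs(value) <= dead_zone:
            return 0.0  # looking at the cam: no motion on this axis
        sign = 1.0 if value > 0 else -1.0
        scaled = (abs(value) - dead_zone) / (max_offset - dead_zone)
        return sign * min(scaled, 1.0)

    return axis(dx), axis(dy)


stop = gaze_to_joystick(0.0, 0.0)        # gaze on the cam: wheelchair stopped
forward = gaze_to_joystick(0.0, 2.0)     # far above the cam: full forward speed
backward = gaze_to_joystick(0.0, -0.55)  # below the cam: moderate reverse speed
```

With a mapping of this kind, looking further from the cam gives a larger command, and returning the gaze
towards the cam reduces it, which matches the behaviour described above.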
Some simple security mechanisms were introduced:
• Disconnect the wheelchair whenever the PC does not send a control message to the wheelchair for
a time period longer than 100ms.
• Activate the wheelchair when the user looks sequentially at the cam, to the right, at the cam, to
the left and again at the cam. Only after these eye movements can the user control the wheelchair.
• Deactivate the wheelchair when the user looks at the cam (stop the wheelchair) for a fixed time.
This allows the user to look wherever he/she wants to without moving the wheelchair while it is
deactivated.
• Use a wireless communication device that disconnects the wheelchair in an emergency.
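Two of the mechanisms listed above, the 100 ms message timeout and the activation gesture, can be sketched
as follows. This is a Python illustration of the logic only. The class names, the zone labels ("cam",
"right", "left") and the way time is passed in are assumptions, and deactivation by a prolonged look at the
cam is not covered.

```python
ACTIVATION_SEQUENCE = ["cam", "right", "cam", "left", "cam"]


class ActivationDetector:
    """Tracks gaze zones and reports when the activation gesture is completed."""

    def __init__(self):
        self.progress = 0  # number of sequence steps matched so far

    def update(self, zone):
        """Feed the latest gaze zone; return True once the full sequence is seen."""
        if self.progress and zone == ACTIVATION_SEQUENCE[self.progress - 1]:
            return False  # still looking at the same zone as before
        if zone == ACTIVATION_SEQUENCE[self.progress]:
            self.progress += 1
        else:
            # wrong zone: restart, counting this sample if it is the first step
            self.progress = 1 if zone == ACTIVATION_SEQUENCE[0] else 0
        if self.progress == len(ACTIVATION_SEQUENCE):
            self.progress = 0
            return True
        return False


class Watchdog:
    """Disables the motors when no control message arrived within timeout_s."""

    def __init__(self, timeout_s=0.1):
        self.timeout_s = timeout_s
        self.last_message = None

    def feed(self, now):
        """Record the arrival time (seconds) of a control message from the PC."""
        self.last_message = now

    def motors_enabled(self, now):
        if self.last_message is None:
            return False
        return now - self.last_message <= self.timeout_s
```

Passing the current time explicitly keeps the sketch testable. In a real controller the check would run on
the wheelchair side, so that a PC failure still stops the chair.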
Future work
Since the aim of this initial work is to show, in practical terms, the viability of the project, we have
already identified some future tasks to improve its functionality. We point out the following:
• Develop an Eye Gaze system using only a web cam instead of a high definition cam, considering
that we have an excess resolution in the eye gaze determination for wheelchair control.
• Develop an Eye Gaze system that is more tolerant to the surrounding light, and allows its use
outdoors.
• Provide the chair with a set of sensors that increase its security.
• Find alternatives to the computer to implement the Eye Gaze algorithms, which would increase
the wheelchair autonomy. The use of embedded systems is fundamental for the future of this
work.
• Test the wheelchair with real users.
Conclusion
The development of simple hardware devices that communicate with the PC can substantially increase the
interaction between an Eye Gaze user and the environment. Although this is nothing new, the solution is
simple, economical, functional and easy to install.
It is easy and economical to construct the hardware that establishes a digital connection between the PC
and a conventional wheelchair that does not have any digital interface.
It is possible to control the speed and direction of a wheelchair with great precision using an Eye Gaze
system.
The wheelchair direction control uses a natural feedback system that automatically adjusts the wheelchair
direction towards the point at which the user is looking.
The vibrations caused by the normal wheelchair movement do not interfere with the eye gaze detection.
Natural light interferes significantly with the Eye Gaze detection, which limits this system to indoor use
only.
With these experiments and future work, the expression “what you see is what you get” may be
transformed into “where you see is where you get”.
References
Barea, R., Boquete, L., Mazo, M., López, E. (1999) Guidance of a wheelchair using electrooculography.
Proceedings of the 3rd IMACS International Multiconference on Circuits, Systems,
Communications and Computers (CSCC'99). July 1999.
Figueiredo, L., Gomes, A. (2007) Magic Eye Control. COGAIN 2007, Leicester, UK.
Law, C.K.H., Leung, M.Y.Y., Xu, Y., Tso, S.K. (2002) A cap as interface for wheelchair control.
IEEE/RSJ International Conference on Intelligent Robots and Systems, 2002, vol. 2, pp. 1439-1444.
Shi, A., Gale, A. (2007) Environmental Control by Remote Eye Tracking. COGAIN 2007, Leicester, UK.
AI Support for a Gaze Controlled Wheelchair
Petr Novák, Tomáš Krajník, Libor Přeučil
Department of Cybernetics, Faculty of Electrical
Engineering, Czech Technical University
Karlovo nam. 13, 12135, Prague, Czech Republic
{novakpe, tkrajnik,preucil}@labe.felk.cvut.cz
Marcela Fejtová, Olga Štěpánková
Department of Cybernetics, Faculty of Electrical
Engineering, Czech Technical University
Karlovo nam. 13, 12135, Prague, Czech Republic
{fejtova, step}@labe.felk.cvut.cz
Other co-authors: Eduard Bakštejn, Zdeňka Lukešová, Tibor Strašribka, Jan Šourek, Pavel Štastný
(bakste1, lukesz1, strast1, sourej1, stastp3 @fel.cvut.cz)
Keywords
Smart wheelchair, hands-free control, environment sensing, robotics, safety
Introduction
Currently, one can distinguish two basic modes used for control of a wheelchair:
● In the direct mode, the wheelchair is driven in the same way as a car. The user indicates the direction
of the wheelchair movement using a steering wheel or a joystick, and a similar approach is applied for
changing the speed.
● In the indirect mode, the user communicates with the wheelchair using a control panel that lists the
available predefined commands, e.g. “go 0.5 m forward”, “turn 30 degrees right”, … The control
panel consists either of a set of different buttons suited to the particular user (HW solution) or it is
represented by a GUI on a computer screen (SW solution).
The I4Control® system is a wearable system for gaze-computer interaction that is able to simulate the
function of a joystick or to select from a grid-like structure using an appropriate GUI (Fejtová et al.,
2006). Consequently, it can serve as a single input device for control of a wheelchair in both of the
above-mentioned modes. This is certainly true from a purely technical point of view, but it is not enough,
because the safety of the resulting system has to be ensured. That is why special attention has to be given
to the reliability of the acquired signal and the various ways it can be influenced or
obscured. The fundamental danger related to gaze-based control comes from the physiological reactions to
certain stimuli that are “built in” to protect our eyes and ourselves: we close our eyes when a strong
light flashes, we look in the direction of a loud sound, etc. The question is how these immediate reactions
can be distinguished from the intentional control signals the user wants to pass to the controlled
system. Moreover, the quality of the gaze control is significantly influenced by any change of light
conditions to which the human eye has to adapt. Even if we use the best algorithms to evaluate the
point of gaze, we can lose control of the system for the time interval the human eye needs to adapt
to the changed conditions. This is not acceptable.
To make up for the limitations of gaze control, we have conducted several experiments with a wheelchair
complemented by several AI features that have been developed in the field of intelligent mobile robotics.
The resulting system is described in the section on the environment sensing system. Its functionalities
seem to be useful not only for gaze-based wheelchair control but also in a more general context. That is
why we reconsider the indirect mode of wheelchair control and suggest its further refinement in the section
after it. The conclusions mention some ideas for our future work towards the construction of a smart
wheelchair.
Environment Sensing System
To ensure the safety of the wheelchair user and to support autonomy, the wheelchair has been equipped with
a sensory system consisting of sonar and laser rangefinders, a color camera and a notebook that performs
all necessary sensor data processing. The forward-looking color camera acquires images at 15 frames per
second. The laser rangefinder is aimed to the front and provides a planar scan with a 230° field of view
and a range of 4 m. Sonars are located at the back of the chair and are used to detect obstacles during
backward movement. The wheelchair has also been equipped with a prototype odometric system previously
developed for another project.
Safety is enforced by limiting the maximal speed of the wheelchair whenever nearby obstacles are
detected by any of the aforementioned sensors. When moving forwards, the rangefinder scan is searched
for objects closer than 1 m. If such objects are detected, the maximal speed is decreased, and when the
distance reaches 0.2 m, the forward movement of the wheelchair is turned off. Similar rules are introduced
for backward movement and for the sonar sensors. We plan to implement an algorithm similar to insect-like
navigation, where obstacle detection is based on optical flow computed from the image sequence acquired by
the camera mounted on the wheelchair.
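The forward speed limiting rule can be sketched as follows. The 1 m and 0.2 m thresholds come from the
text. The linear reduction between them, the normalised full speed and the function names are assumptions
of this illustration, since the exact speed profile is not given.

```python
def max_forward_speed(nearest_obstacle_m, full_speed=1.0,
                      slow_distance_m=1.0, stop_distance_m=0.2):
    """Return the allowed forward speed for the nearest obstacle distance (m)."""
    if nearest_obstacle_m <= stop_distance_m:
        return 0.0  # obstacle too close: forward movement turned off
    if nearest_obstacle_m >= slow_distance_m:
        return full_speed  # nothing within 1 m: no restriction
    # assumed linear ramp between the stop distance and the slow-down distance
    fraction = (nearest_obstacle_m - stop_distance_m) / (slow_distance_m - stop_distance_m)
    return full_speed * fraction


def limit_speed(requested_speed, scan_ranges_m):
    """Clamp the requested forward speed using the closest range in a laser scan."""
    return min(requested_speed, max_forward_speed(min(scan_ranges_m)))


allowed = limit_speed(0.8, [3.0, 0.6, 2.0])  # obstacle at 0.6 m limits the speed
```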
The sensors are not used only for obstacle detection: their input is essential for the construction of
autonomous modes of navigation. So far, we have tested two algorithms based on data from the color camera
and one algorithm based on the laser rangefinder.
• The first vision-based algorithm (Kosnar et al., 2008) recognizes pathways in front of the wheelchair.
The user first specifies which parts of the current image represent obstacles and what color the path
has. The algorithm indicates what trajectory will be followed. After the user confirms the trajectory, the
wheelchair starts to move. While it is moving, the estimated future trajectory is shown, enabling the user
to redefine obstacle and path colors on demand. Moreover, this algorithm can be used to create a
graph-like map of the environment. With this map, the driver only needs to specify the required destination.
• The second vision-based algorithm (Krajnik and Preucil, 2008) detects significant objects in the image,
measures their positions and creates a simple description of the path the wheelchair follows. The
description of the recorded path can then be stored in a corresponding database and later used to
ensure autonomous traversal of the path by the wheelchair.
• The third algorithm incorporates laser rangefinder measurements into a two-dimensional map of the indoor
environment. After a reliable map is created, the path between any two reachable points on the map
can be planned through dedicated AI algorithms. The wheelchair can safely follow the designed path
provided the above-mentioned obstacle detection is applied.
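As an illustration of planning on a two-dimensional map, the sketch below runs a breadth-first search on a
small occupancy grid. The paper does not name the planning algorithm that is used, so the choice of
breadth-first search, the grid representation (0 = free, 1 = occupied) and the function name are assumptions
made for this example.

```python
from collections import deque


def plan_path(grid, start, goal):
    """Return a shortest list of (row, col) cells from start to goal, or None.

    grid is a list of rows with 0 for free cells and 1 for occupied cells.
    Movement is restricted to the four neighbouring cells.
    """
    rows, cols = len(grid), len(grid[0])
    parents = {start: None}
    queue = deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk back to the start via the parents
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None


grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]
path = plan_path(grid, (0, 0), (2, 0))
```

A real planner would also account for the size of the wheelchair and for the cost of turning, which this
sketch ignores.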
The refinements of indirect mode for wheelchair control
Let us consider the indirect mode provided by a computer GUI. In this case the input is not limited to
direct physical contact based on touch; it can be mediated by a number of alternative interfaces, including
e.g. those applying gaze, voice or blow (Felzer and Nordmann, 2007).
The simplest approach lets the user compose his/her journey from many elementary steps. The GUI
offers several buttons with corresponding labels, for example: 1 m forward, 2 m forward, left
10 degrees, right 20 degrees. The user selects the appropriate button and the wheelchair performs the
requested action. In a more sophisticated setting, the user can first design a sequence of elementary steps
and then give a command to perform them in one run. Here, the user must be able to interrupt the movement of
the wheelchair at any moment. This can be achieved either by using an independent input channel dedicated to
this purpose or by specifying a special combination of the main control signals. This combination has to be
such that it is highly improbable that the user executes it without specific intention. This type of
wheelchair control is rather demanding and the movement takes a lot of time.
The following options rely on the incorporation of various AI features (Mandel et al., 2005) based on
self-orientation and localization of the wheelchair in the world as well as on some methods of artificial
intelligence (image detection, map creation, smart localization, …). The features we are currently
applying have been briefly described in the former section.
• In the first option, the image from the camera of the control system is displayed on the user’s screen.
The control system detects (recognizes) some routes (footpath, road, …) and the user can select one of the
offered possibilities. When the selection is confirmed, the wheelchair starts to follow the requested path.
The movement of the wheelchair stops automatically whenever the control system detects an obstacle it
cannot cope with itself.
• A further improvement is represented by the second option, which incorporates learning. Suppose the
wheelchair has a built-in map of the environment it moves in and offers a list of pre-created or
learned paths. As soon as the wheelchair can identify its location, it is enough for the user to select
his/her target position (for example: kitchen, bathroom, bedroom, …) and the wheelchair can plan its
journey to the requested position itself by composing it from the parts listed among its ready-made paths.
The obstacle detection subsystem ensures that the user’s reactions to surprising stimuli do not negatively
affect the function and safety of the resulting system, because the wheelchair movement is automatically
halted whenever the control system detects any serious problem (for example a big obstacle).
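The second option, in which a journey is composed from ready-made paths, can be pictured as a shortest-route
search over named places. The sketch below uses Dijkstra's algorithm from the Python standard library. The
place names, the path lengths, the assumption that every stored path can be driven in both directions and
the function name are all hypothetical.

```python
import heapq


def compose_journey(segments, start, target):
    """Return the list of places along the shortest composed route, or None.

    segments maps (place_a, place_b) to the length in meters of a stored path.
    """
    graph = {}
    for (a, b), length in segments.items():
        graph.setdefault(a, []).append((b, length))
        graph.setdefault(b, []).append((a, length))  # assumed traversable both ways
    best = {start: 0.0}
    heap = [(0.0, start, [start])]
    while heap:
        dist, place, route = heapq.heappop(heap)
        if place == target:
            return route
        if dist > best.get(place, float("inf")):
            continue  # a shorter route to this place was already found
        for nxt, length in graph.get(place, []):
            new_dist = dist + length
            if new_dist < best.get(nxt, float("inf")):
                best[nxt] = new_dist
                heapq.heappush(heap, (new_dist, nxt, route + [nxt]))
    return None


learned_paths = {
    ("bedroom", "hall"): 3.0,
    ("hall", "kitchen"): 4.0,
    ("hall", "bathroom"): 2.0,
    ("bedroom", "kitchen"): 10.0,
}
route = compose_journey(learned_paths, "bedroom", "kitchen")
```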
Universal GUI for wheelchair control
Of course, the control system of the wheelchair does not have to be restricted to a single one of the
options just described. The user can choose among the appropriate options according to his/her current
location. In the home environment, it is possible to rely on pre-created paths and only select the target
position. In a structured outdoor environment (parks, pathways) it seems useful to use simple path
recognition methods, and in an unstructured or otherwise complicated environment it is possible to use
direct control of movement.
Figure 1. GUI interface of control system.
Moreover, the user may wish to switch among several input devices (buttons, eye movement recognition,
…). To support freedom of choice while ensuring safety, the control system is divided into two parts. The
first part mainly includes the interface for the input device and the GUI (dialog with buttons) used to
select appropriate options / actions. This part also handles high-level commands such as: go to target
position. The second part takes care of the autonomous movement of the wheelchair and ensures safety during
the journey. This part performs movement commands (go, turn left, stop) and continuously checks that the
movement in the intended direction is safe.
Figure 2. Block diagram of wheelchair system.
Conclusions
There are a number of AI algorithms that can improve wheelchair user’s comfort and safety
(Mandel et al., 2005). When considering them one has to take into account their time and memory
requirements so that they fit the needs of the requested tasks and can be conducted by the HW available on
the wheelchair (notebook in our case). As a next step, we are planning to implement a simple tracking
program, which will simplify creation of the pre-defined paths: the user will be able to specify an object
and the wheelchair will follow it, track its path and remember it. This approach will be used to support the
learning based option mentioned as a refinement of the indirect mode of wheelchair control.
Acknowledgements
The presented research and development has been partially supported by the EU grant IST-2003-511598
COGAIN (Communication by Gaze Interaction) and by the Czech MSMT grant C06005 SYROTEK.
References
Fejtová M., Novák P., Fejt J., Štěpánková O. (2006): When can eyes make up for hands? Proceedings of
COGAIN 2006 ‘Gazing into the Future’, pp.46-49.
Felzer, T., and Nordmann R. (2007) Consolidating Computer Operation and Wheelchair Control. ASSETS
2007 - Proceedings of the Ninth International ACM SIGACCESS Conference on Computers and
Accessibility, Tempe, AZ, USA, ACM Press, pp. 239-240.
Felzer, T., and Nordmann R. (2007) Alternative Wheelchair Control. RAT 2007 Proceedings of the
International IEEE-BAIS Symposium on Research on Assistive Technologies, Dayton OH, USA,
IEEE Computer Society, pp. 67-74.
Krajník, T., Přeučil, L. (2008) A Simple Navigation System with Convergence Property. In Proceedings of
European Robotics Symposium 2008, Springer-Verlag, pp. 282-292.
Košnar, K., Krajník, T., Přeučil, L. (2008) Visual Topological Mapping. In Proceedings of European
Robotics Symposium 2008, Springer-Verlag, pp. 333-342.
Mandel, Ch., Huebner, K., Vierhuff T. (2005) Towards an Autonomous Wheelchair: Cognitive Aspects in
Service Robotics. Proc. of Towards Autonomous Robotic Systems (TAROS 2005), pp. 165–172.
The other algorithms perform very similarly to the original algorithm, as can be seen in Figure 2, which
shows the results for an image sequence in which the eyes follow a slowly moving spot.
Figure 2: Results for the algorithms of Geier, Zhu et al. (top row), Poursaberi et al. and
Perez et al. (middle row), Ohno et al. and Daunys et al. (circle approximation) (bottom row)
A quantitative evaluation is difficult, as no ground truth data exists to determine the detection error of
the algorithms. A coarse estimate has been made by using an image sequence in which the user followed a
horizontally moving point. Although saccades also contribute to the real pupil positions, the mean
deviation from the mean y-value has been calculated. The values in Table 1 give some hints on the accuracy
of the algorithms, but the results are too close and vary too much to nominate a clear winner.
algorithm                                             left eye    right eye
(Geier, 2007)                                         1.4508      1.1996
(Zhu & Yang, 2002)                                    1.8148      1.7803
(Poursaberi & Araabi, 2005)                           1.3013      1.0860
(Pérez et al., 2003)                                  1.3842      1.2090
(Ohno et al., 2002)                                   1.2832      1.2341
(Daunys & Ramanauskas, 2004) Circle Approximation     1.2898      1.3303
Table 1: Mean vertical deviation for a sequence with a horizontally moving point
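The measure used in Table 1 can be sketched in a few lines. The text does not state whether the deviation
is an absolute or a root-mean-square value, so the sketch assumes the mean absolute deviation of the
detected pupil y-coordinates from their mean. The function name and the sample values are made up for the
example.

```python
def mean_vertical_deviation(y_values):
    """Mean absolute deviation of pupil y-coordinates from their mean value."""
    mean_y = sum(y_values) / len(y_values)
    return sum(abs(y - mean_y) for y in y_values) / len(y_values)


# For a purely horizontal target movement, an ideal detector would give 0.
deviation = mean_vertical_deviation([10.0, 12.0, 14.0, 12.0])
```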
• Glance down mode: Locomotion
    - no action on dwell
    - constant streaming of ‘W’ keystroke events when the user looks in the main part of the screen
    - streaming of ‘A’ and ‘D’ keystroke events when the user looks into small square regions in the
      left and right hand sides of the screen
    - streaming of ‘S’ keystroke events when the user looks inside a thin strip inside the bottom edge
      of the screen, causing the avatar to walk backwards
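The locomotion mode described above amounts to a mapping from the gaze position to the keystroke that is
streamed. The following Python sketch illustrates this idea. The region sizes, the vertical centring of the
side squares, the pixel coordinate convention and the function name are assumptions, as the actual layout
of the emulator is not given here.

```python
def locomotion_key(gx, gy, width, height, square=150, strip=40):
    """Return the key to stream for a gaze point (gx, gy) in screen pixels.

    The origin is the top-left corner of the screen. square is the side of the
    left and right regions and strip is the height of the bottom strip.
    """
    if gy >= height - strip:
        return "S"  # thin strip at the bottom edge: walk backwards
    top = (height - square) / 2  # side squares assumed vertically centred
    if top <= gy <= top + square:
        if gx <= square:
            return "A"
        if gx >= width - square:
            return "D"
    return "W"  # main part of the screen: walk forwards


key = locomotion_key(640, 500, 1280, 1024)
```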
An approach to user performance investigations
The initial pilot study (Istance et al, 2008) showed that using Second Life with our gaze-based technique
resulted in task completion times that were distinctly longer than when using conventional interaction
techniques. In order to achieve parity of gaze interaction with normal keyboard and mouse, it is important
to be able to identify usability issues with gaze control in terms of what influences the speed of
interaction (time of task completion) and the types of errors made.
Partitioning task time into ‘productive’ time and ‘error’ time has long been a feature of usability
engineering (Gilb, 1984). The time spent in a specific error condition represents the potential saving in
task completion time if the cause of that specific error can be designed out so that the user no longer
makes that error. The relative savings in task completion times obtained by addressing each of the types of
errors identified represent a kind of cost-benefit analysis of redesigning different features of the user
interface.
The Experiment
Subjects and apparatus. The study involved 12 participants. Ten of them were students and two were
university lecturers who were experienced users of gaze interaction. Ages varied from 20 to 56, the
average being 29. All subjects were able-bodied. The trials were carried out using a Tobii T60 screen
integrated eye tracker, and the window contents during all of the trials were recorded using screen capture
software.
Tasks. Two sets of three tasks were devised to be carried out within a purpose built world within Second
Life. The world represented the computer science building at the university.
• The locomotion task required the subject to walk
from the main entrance, up the main stairs (Figure
1), go into a room where there were display panels
about individual modules and then report the
module code from a particular panel. The difference
between the two sets of tasks was the actual panel
the participant was asked to report the code from.
• The object manipulation task required the subject to
change a slide or request a web page from the main
lecture theatre. In one task set, the participant
changed the slide. This involved a right click on a
panel button object to display a pie menu and then a
left click to select the ‘Touch’ option. In the other
set, the equivalent task was to request a web page to
be displayed. This involved a left click on a panel
Figure 1. A locomotion task – searching for the target from the upstairs.
For each subject and for each task, the time spent in each error condition was summed and this was
deducted from the total task time, leaving the non-error time for each trial.
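The partitioning of a trial into error time and non-error time is simple arithmetic, sketched below in
Python. The error category names and the times are invented for the example and are not data from the
study.

```python
def non_error_time(total_time_s, error_times_s):
    """Subtract the summed time spent in all error conditions from the task time."""
    return total_time_s - sum(error_times_s.values())


def potential_saving(total_time_s, error_times_s, error_type):
    """Fraction of the task time saved if one type of error were designed out."""
    return error_times_s[error_type] / total_time_s


# Hypothetical trial: 120 s in total, two kinds of errors observed
errors = {"accuracy": 20.0, "locomotion": 15.5}
productive = non_error_time(120.0, errors)
saving = potential_saving(120.0, errors, "accuracy")
```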
The outcomes of the trials for each of the three tasks are shown in Figure 2. Each task was an example of
one of the three main categories of tasks performed in virtual environments (Hand, 1997;
Bowman, 1999). Data from the locomotion task is at the bottom of the graph, the application control task
data is in the middle and the object manipulation task data is shown at the top. The total lengths of the
bars show the average total task completion times in seconds including errors.
There were significant problems with calibrating the eyetracker for one subject. She was able to complete
all 3 tasks in the gaze condition but there were far greater accuracy errors than for any other subject (the
error time was more than 3 standard deviations from the mean of all other subjects’ error times for the
application control and object manipulation tasks). Consequently all data from this subject was removed
from the analysis.
The results show that all subjects were able to complete the three tasks using eye gaze. The non-error part
at the bottom of the bars enables comparison of task times if the cause of the errors can be removed by
design changes. The gaze:keyboard-mouse ratio of non-error times for the locomotion task is 1:1.2. The
corresponding ratio for the application control task is approximately 1:2, and for the object manipulation
task is approximately 1:2.5.
The error-free times in the gaze condition are encouraging, particularly for the locomotion task. With only
a short training session, subjects would be able to complete the locomotion task using gaze nearly as
quickly as with key commands if the cause of the locomotion errors could be removed. The locomotion
errors are in part due to the speed of movement of the avatar in response to key
commands generated by the emulator. This causes overshoot or undershoot of movement, which then has
to be corrected. This is largely due to the processing pipeline on a single computer (eyetracker – emulator
– Second Life browser, and additionally, in the experiment condition, the video capture software). There
may also be optimisations to the emulator software that could improve performance here. Another source
of locomotion errors is the location of the backwards motion zone at the bottom of the screen. This meant
that the gaze position first had to travel through this zone after changing into locomotion mode and the
latency in the system caused an unwanted backwards movement as a result. These can be addressed by
modifications to the behaviour of the locomotion mode and examining in detail the causes for response
latency.
The biggest cause of errors in the application control and the object manipulation tasks was the difficulty
of hitting the small control objects in the dialog boxes to change appearance. This was exacerbated by
Figure 2. Average task completion times partitioned into error times (in four types of errors) and non-error times
How Can Tiny Buttons Be Hit Using Gaze Only?
Henrik H.T. Skovsgaard
John Paulin Hansen
Department of Innovative Communication
IT University of Copenhagen
Rued Langgaards Vej 7, DK-2300, Denmark
{hhje; paulin}@itu.dk
Julio C. Mateo
Department of Psychology
Wright State University
3640 Colonel Glenn Highway
Dayton, OH 45435, USA
mateo.2@wright.edu
Keywords
Gaze interaction, assistive technology, cursor control.
Introduction
The limited accuracy of gaze trackers requires alternative methods to the point-and-click selection used in
graphical user interfaces. Only a fraction of the interactive elements in Windows are actually critically
small for current gaze tracking systems (i.e. less than 12x12 pixels), but they become serious obstacles for
the workflow when using a “blunt” input. Increasing the size of icons or decreasing the resolution of the
screen may help in some cases, but the smallest elements on a computer interface may still create
problems because of noise or offsets in the gaze tracker. A two-step magnification method (Lankford,
2000) is provided by several commercial gaze communication systems to compensate for the inaccuracy.
First, the user looks at the region of the screen in which the target is located. After a certain dwell time an
enlarged version of the region pops up in a new window. The user can now make the selection in this
enlargement with another dwell time selection. The method works well for most people, but the two-step
process is likely to be slower than a single-step, direct selection.
Figure 3 The top-left Windows icons are taken from a text typing application (16x16 pixels) and the top-right icons are common
desktop icons (32x32 pixels). The three red targets (bottom) were used in the present experiment (sizes 6x6, 9x9 and 12x12 pixels).
Zoom selection is a new method examined in this paper. When engaged, it presents a zoom-window
around the user’s point of regard in which a smooth animation shows the content of the window gradually
increasing in size, as if approaching the user. The zoom-function allows for runtime course corrections
during the selection process, proportional to the current level of magnification. At the end of the
predetermined zooming time, the target in the centre of the zoom-window becomes selected. Zoom
selection has been successfully used for gaze typing (Hansen et al. 2008). This paper examines zoom
selection used for target selection in a windowed environment by comparing it to the two-step
magnification method and, more briefly, to the simple dwell-time method. Ashmore et al. (2005)
examined a gaze-contingent fisheye perspective for eye pointing and selection of magnified targets. The
fisheye perspective (a so-called distortion interface) is hidden during visual search, but appears as soon as
the user fixates a target. This technique provides an overview during search and an enlargement of targets
during selection. However, we prefer to use zoom translations instead of a fisheye distortion since zooming
preserves legibility better in motion.
Experiment
In our experiment, the movement time (Mt) starts with the onset of the target and ends when the user
presses the space bar to trigger the selection process. With this approach, we also include reaction time in
Mt to simplify the analysis.
All gaze-only interactive systems must discriminate between when users are navigating and when they are
fixating. Their methods are unfortunately not standardized, and the time it takes them to discriminate a
fixation varies from tracker to tracker (Kumar et al. 2008, Salvucci and Goldberg, 2000). In order to avoid
the uncertainties associated with the (unknown) device-dependent software fixation-detection techniques,
we decided to initiate the selection process only when the user pressed the space bar. Consequently,
selection time (St) is measured as the time from when the subject presses the space bar until the final
selection has been executed.
We performed three experiments to test three different gaze-only selection methods: simple dwell,
two-step magnification and zoom selection. Experiments 1 and 2 were conducted with a single target
appearing in a circular fashion according to the ISO 9241-9 standard (ISO/DIS 9241-9, 2000), cf. figure
2a. In experiment 3 we presented several thousand tiny Windows icons simultaneously with just one blue
target among them, cf. figure 2b. In all experiments the target would only appear when the user was
fixating a marked centre on the screen.
Figure 4a Screenshot from the two-step magnification
selection. The red square indicates the target.
Figure 2b Screenshot of 2000 randomly spread icons used in
experiment 3. The blue icon in the centre indicates the target.
Six participants (3 male, 3 female, mean age = 30 years) were recruited from the local university campus.
The input device was a Tobii-1750 gaze tracker. The application ran on an IBM 1.86 GHz Intel Dual Core
machine. The resolution was 1280x1024 pixels. The gaze tracker sampled at 50 Hz with a claimed
accuracy of 0.5 degrees of visual angle. Roughly, this corresponds to 20 pixels in our test
configuration.
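The conversion from visual angle to pixels can be illustrated with a short calculation. This is only a sketch: the viewing distance (600 mm) and the pixel pitch (0.264 mm, typical of a 17-inch 1280x1024 panel) are assumptions that are not reported in the paper, and the function name is hypothetical.

```python
import math


def visual_angle_to_pixels(angle_deg, viewing_distance_mm, pixel_pitch_mm):
    """Return the on-screen extent, in pixels, subtended by a visual angle.

    The angle is assumed to be centred on the line of sight, so the extent
    is 2 * d * tan(angle / 2).
    """
    size_mm = 2.0 * viewing_distance_mm * math.tan(math.radians(angle_deg) / 2.0)
    return size_mm / pixel_pitch_mm


# Assumed values (not from the paper): 600 mm viewing distance and a
# 0.264 mm pixel pitch.
accuracy_px = visual_angle_to_pixels(0.5, 600.0, 0.264)
print(round(accuracy_px))  # prints 20
```

Under these assumptions the claimed 0.5 degree accuracy is already larger than the biggest target used in the experiments (12x12 pixels).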
The primary independent variable was interaction technique with three levels:
• Dwell: Provides visual feedback on the remaining dwell time by a pointer gradually decreasing in
size from 68x68 pixels, but without enlargement of the target.
• Two-step Magnification: A window (size 200x200 pixels) pops up with a magnification (x5) of the
gaze area at the point of regard when the user presses the space bar. In this window, the user
can perform the second and final selection by another simple selection.
• Zoom selection: Pressing the space bar triggers a gradual enlargement of the objects within a window
(size 300x300 pixels) until a final x10 zoom level is reached.
We included additional variables to test a range of design options. Space constraints allow us only to
report on the following variables in this paper:
• Target size: 6x6 pixels, 9x9 pixels and 12x12 pixels
• Selection time: In experiment 2 we used 1000 ms, 1500 ms, 2000 ms for the zoom selection
interaction and 500 ms, 750 ms and 1000 ms for two‐step magnification. In experiment 3 we
used 1000 ms for the zoom selection and 500 ms for the two‐step dwell magnification.
In order to compare the methods fairly, we decided to set the dwell times to half the zoom time, since the
two-step magnification would take twice as much time as the single-step zoom selection. Run-time
corrections of pointer positioning were possible for both the dwell selection and the zoom selection.
To minimize asymmetric learning effects, the interaction method, target size, and selection time were
counterbalanced using a balanced Latin Square. Furthermore, for both experiments, the mouse pointer was
hidden to reduce visual distraction and prevent chasing. Audio feedback from the application informed the
users about the outcome of their activation, playing a pleasant sound if the activation was a success and a
warning sound if it failed.
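As an illustration of the counterbalancing scheme, the following sketch builds a balanced Latin square for an even number of conditions using the standard construction (first row 0, 1, n-1, 2, n-2, ..., each later row shifted by one). The paper does not say how its conditions were mapped onto the square, so the six-condition usage here is purely hypothetical.

```python
def balanced_latin_square(n):
    """Return an n x n balanced Latin square for an even number of conditions.

    Each condition appears once in every row and column, and every ordered
    pair of adjacent conditions occurs exactly once across the rows.
    """
    if n < 2 or n % 2 != 0:
        raise ValueError("this construction requires an even n >= 2")
    first = [0]
    low, high = 1, n - 1
    take_low = True
    while len(first) < n:
        if take_low:
            first.append(low)
            low += 1
        else:
            first.append(high)
            high -= 1
        take_low = not take_low
    # Every further row is the first row shifted by a constant modulo n.
    return [[(c + shift) % n for c in first] for shift in range(n)]


# Hypothetical usage: one row of condition indices per participant.
for participant, order in enumerate(balanced_latin_square(6), start=1):
    print(participant, order)
```

With six participants, each row can serve as one participant's presentation order, so that every condition precedes every other condition equally often.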
Experiment 1 consisted of 10 activations with the standard single-dwell activation technique using the
ISO format, with each of the 3 target sizes (the largest, 12x12 pixels, shown first and the smallest, 6x6
pixels, shown last). The dwell time was set to 1000 ms. Experiment 2 was conducted immediately after
experiment 1. Interaction method (two-step dwell magnification and zoom selection), target size (12x12,
9x9, and 6x6 pixels), and selection time (1000, 1500, and 2000 ms) were manipulated. The order of the
conditions was counterbalanced. Ten activations were made with each of the combinations, giving 180
data points per subject.
Finally, in experiment 3, conducted several days after experiments 1 and 2, we compared the two-step
dwell magnification and the zoom selection technique. In this experiment, we used a Windows-like layout
with 2000 icons shown at once. Again, the sizes were 6x6, 9x9, and 12x12 pixels, but they were tested with
only one selection time, namely 1000 ms for the zoom selection and 500 ms for the two-step dwell
magnification. A target would appear as the only blue icon among the 2000 small icons (cf. figure 2b).
Results
Outliers (Outlier > µ + σ · 3) were first removed. This excluded 5 of 180 data points in Experiment 1, 19
of 1080 in Experiment 2 and 16 of 1080 in Experiment 3. We then performed ANOVA and Bonferroni
post-hoc tests. Table 1 summarizes the results from all the experiments.
Method (n)               Exp.  Hit rate 6x6  Hit rate 9x9  Hit rate 12x12  Movement time (ms)  Selection time (ms)  Total time (ms)
Dwell (n=180)            1     0.07 (0.25)   0.18 (0.40)   0.27 (0.45)     3374 (3865)         1000 (0)             4785 (3878)
Zoom (n=1080)            2     0.30 (0.46)   0.44 (0.50)   0.57 (0.50)     1544 (1205)         1511 (411)           3055 (1263)
Zoom (n=1080)            3     0.45 (0.50)   0.50 (0.50)   0.50 (0.46)     2019 (1159)         1000 (0)             3058 (1195)
Two-Step Dwell (n=1080)  2     0.81 (0.40)   0.88 (0.32)   0.93 (0.24)     1422 (966)          2924 (936)           4346 (1558)
Two-Step Dwell (n=1080)  3     0.85 (0.36)   0.89 (0.31)   0.88 (0.33)     1678 (798)          2177 (800)           3865 (1169)
Table 1 Means (µ) and standard deviations (σ) of data, shown as µ (σ). The three target-size columns (in pixels) give hit rates.
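The outlier criterion stated above (values greater than µ + 3σ) can be sketched as follows. The paper does not say whether the criterion was applied per condition or over all data, nor whether a population or sample standard deviation was used; the population form is assumed here, and the function name is hypothetical.

```python
from statistics import mean, pstdev


def remove_upper_outliers(values, k=3.0):
    """Drop values greater than mean + k * standard deviation.

    Uses the population standard deviation (an assumption; the paper does
    not specify which form was used).
    """
    mu = mean(values)
    sigma = pstdev(values)
    threshold = mu + k * sigma
    return [v for v in values if v <= threshold]


# Illustrative data only: nineteen 1000 ms trials and one very slow trial.
times_ms = [1000] * 19 + [20000]
cleaned = remove_upper_outliers(times_ms)
print(len(times_ms) - len(cleaned))  # prints 1
```

Note that the rule only trims unusually long times, since only values above the mean are tested.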
Experiment 1: The grand mean hit rate for common dwell selection was 0.17 (σ = 0.38) and the
grand mean movement time was 4006 ms (σ = 4435). The mean hit rate was 0.27 (σ = 0.45) for 12x12
pixels, 0.18 (σ = 0.40) for 9x9 pixels and 0.07 (σ = 0.25) for 6x6 pixels.
The ANOVA showed a main effect of the target size on hit rate: F (2, 179) = 4.269, p < 0.015. This
(σ = 18.3) and the two-step magnification were 4.25 pixels off (σ = 8.8). Assuming that targets should be
at least twice the size of the average offset to be hit reliably, the results from the different selection
methods indicate that the minimum target diameter would have to be around 34 pixels for dwell, around 24
pixels for zoom and 9 pixels for two-step dwell magnification if they are to consistently provide successful
activations.
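The rule of thumb above (target diameter of at least twice the average offset) amounts to a one-line calculation. The sketch below applies it to the reported two-step magnification offset of 4.25 pixels; rounding up to whole pixels is an assumption, and the function name is hypothetical.

```python
import math


def min_target_diameter(mean_offset_px, factor=2.0):
    """Smallest whole-pixel target diameter that is `factor` times the mean offset."""
    return math.ceil(factor * mean_offset_px)


# Two-step magnification: 2 * 4.25 = 8.5, rounded up to a whole pixel.
print(min_target_diameter(4.25))  # prints 9
```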
In conclusion, the findings suggest that both two-step magnification and zoom selection should be included
in gaze interactive systems. Because zoom selection is the fastest method and because it will handle the
majority of Windows icons well (namely all of those larger than approximately 32x32 pixels), we suggest
that zoom selection be the default method and that two-step dwell be a second option that the user can
engage when needed, i.e. when targets are really small or when the tracker becomes inaccurate.
Acknowledgements
This work was supported by the European Network of Excellence COGAIN, Communication by Gaze
Interaction, funded under the FP6/IST programme of the European Commission.
References
Ashmore, M., Duchowski, A. T., and Shoemaker, G. 2005. Efficient eye pointing with a fisheye lens. In
Proceedings of Graphics interface 2005 (Victoria, British Columbia, May 09 - 11, 2005). ACM
International Conference Proceeding Series, vol. 112. Canadian Human-Computer
Communications Society, School of Computer Science, University of Waterloo, Waterloo,
Ontario, 203-210.
ISO. 2000. Ergonomic requirements for office work with visual display terminals (VDTs) - Part 9:
Requirements for non-keyboard input devices. International Standard ISO/DIS 9241-9, International
Organization for Standardization.
Hansen, D. W., Skovsgaard, H. H., Hansen, J. P., and Møllenbach, E. 2008. Noise tolerant selection by
gaze-controlled pan and zoom in 3D. In Proceedings of the 2008 Symposium on Eye Tracking
Research & Applications (Savannah, Georgia, March 26 - 28, 2008). ETRA '08. ACM, New
York, NY, 205-212. DOI= http://doi.acm.org/10.1145/1344471.1344521
Kumar, M., Klingner, J., Puranik, R., Winograd, T., and Paepcke, A. 2008. Improving the accuracy of
gaze input for interaction. In Proceedings of the 2008 Symposium on Eye Tracking Research &
Applications (Savannah, Georgia, March 26 - 28, 2008). ETRA '08. ACM, New York, NY, 65-68.
DOI= http://doi.acm.org/10.1145/1344471.1344488
Lankford, C. 2000. Effective eye-gaze input into Windows. In Proceedings of the 2000 Symposium on
Eye Tracking Research & Applications (Palm Beach Gardens, Florida, United States, November
06 - 08, 2000). ETRA '00. ACM, New York, NY, 23-27. DOI=
http://doi.acm.org/10.1145/355017.355021
Salvucci, D. D. and Goldberg, J. H. 2000. Identifying fixations and saccades in eye-tracking protocols. In
Proceedings of the 2000 Symposium on Eye Tracking Research & Applications (Palm Beach
Gardens, Florida, United States, November 06 - 08, 2000). ETRA '00. ACM, New York, NY, 71-
78. DOI= http://doi.acm.org/10.1145/355017.355028
Conclusion
The experiment described aims to shed light on which kind of gestures would be easy to do with gaze and
which kind of algorithm is needed to recognise the performed gaze gestures. Results will help when
designing a set of gaze gestures that are suitable for a drawing application and for its tools and functions.
When the set of gaze gestures is ready, they will be tested with users and compared against other available
methods.
NeoVisus: Gaze Driven Interface Components
Martin Tall
Varmdogatan 72, 257 33 Rydeback, Sweden
m@martintall.com
+46730611677
Keywords
Gaze Interaction, Midas touch, Target areas, Saccade selection
Introduction
The goal of this work has been to explore novel interaction methods and implement them in reusable
Graphical User Interface (GUI) components. My intention has been to create an interaction style that relies
more on the specific properties of the human visual system, in which movement comes at a more constant
and lower cost compared to moving a physical modality. Due to its proximity to natural human behavior,
this type of interaction should be very easy to learn. There is no new physical modality that the user has to
map his or her intentions onto. Gaze interaction offers room for novel interaction techniques where objects
appear or change when the user looks at them, without necessarily leading to a command execution. The
reusable and configurable GUI components developed here support rapid development of future gaze driven
applications.
Gaze Interaction Interface Components
The use of gaze data for interaction with computers is fundamentally different from more traditional
computer interaction since there is no input modality (such as the mouse) to be acted upon. This requires
specific interaction methods. Due to the physiological properties of the eyes, a fixation covers an area of the
screen that is larger than a traditional mouse pointer. Eye trackers will never be able to discriminate a gaze
position for some of the smaller User Interface (UI) components used in current interfaces. Hence, most
of the existing applications for mainstream operating systems such as Microsoft Windows are ill suited
for gaze interaction. Additionally, the gaze data provided by the eye tracker is noisy and jittery. This factor
has to be accounted for when designing gaze driven interfaces.
The commonly used dwell times create an interaction style that is stressful to use, since a command
seems to be activated everywhere the user looks. This issue, known as the “Midas touch problem” (Jacob,
1991), enforces a constant roaming of the eyes. For example, the variance in text length displayed on
buttons leads to involuntary activation of items that contain longer and thus more time-consuming text
strings. By further developing the use of target areas (Ohno, 1998) and displaying these dynamically,
the Midas touch problem can hopefully be alleviated. This results in components that display options
only when the user is looking at them, providing a direct interaction style based on the contextual position
of the user's gaze. To handle the noisy and jittery gaze data I intend to use target areas that are larger than
the buttons and icons used; this enables the gaze to remain on the target.
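The idea of a target area that is larger than the visible icon can be sketched as a simple hit test. This is an illustrative sketch only, not the NeoVisus implementation; the `Rect` type, the 20-pixel margin and the sample coordinates are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    x: float
    y: float
    width: float
    height: float

    def contains(self, px, py):
        """True if the point lies inside the rectangle (edges included)."""
        return (self.x <= px <= self.x + self.width
                and self.y <= py <= self.y + self.height)

    def expanded(self, margin):
        """Return a rectangle grown by `margin` pixels on every side."""
        return Rect(self.x - margin, self.y - margin,
                    self.width + 2 * margin, self.height + 2 * margin)


icon = Rect(100, 100, 32, 32)      # visible icon (hypothetical geometry)
target_area = icon.expanded(20)    # invisible, larger gaze target

# Jittery gaze samples around the icon (illustrative values).
samples = [(98, 110), (135, 128), (120, 95)]
print(all(icon.contains(x, y) for x, y in samples))         # prints False
print(all(target_area.contains(x, y) for x, y in samples))  # prints True
```

The margin trades tolerance for jitter against the risk of neighbouring target areas overlapping, so in practice it would have to be chosen with the layout in mind.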
Implementation
When working with gaze as the only input, the Midas touch problem described earlier becomes a major
issue. The behavior of the components has been shaped to reduce this as much as possible. In this case that
includes a dynamically expanding target area which is activated by a fixation and creates a layer on top of
the other components when activated and “rolled out”. Erroneous activations are reduced since the
selection icons are not displayed on the interface in its original state. Additionally, a fixation on a button or
menu does not cause a command to be issued (since a second saccade is required for performing the
activation). When the user looks away from the component, the activation icons are dynamically hidden
from the interface, which could reduce the error rate.
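The two-stage behavior described above can be summarised as a small state machine. The sketch below is one possible reading of the description, not the actual component code: the class, the region names and the default timings are hypothetical, and the caller is assumed to classify each gaze sample into a region.

```python
from enum import Enum


class Region(Enum):
    OUTSIDE = 0  # gaze is off the component
    BUTTON = 1   # gaze is on the button or its expanded target area
    ICON = 2     # gaze is on the rolled-out selection icon


class BinaryChoice:
    """Two-stage activation: fixate to roll out, then dwell on the icon."""

    def __init__(self, rollout_ms=300, dwell_ms=300):
        self.rollout_ms = rollout_ms
        self.dwell_ms = dwell_ms
        self.selected = False
        self.icon_visible = False
        self._on_component_ms = 0
        self._on_icon_ms = 0
        self._fired = False  # prevents repeated toggling during one visit

    def update(self, region, dt_ms):
        if region is Region.OUTSIDE:
            # Looking away hides the selection icon and resets all timers.
            self.icon_visible = False
            self._on_component_ms = 0
            self._on_icon_ms = 0
            self._fired = False
            return
        self._on_component_ms += dt_ms
        if self._on_component_ms >= self.rollout_ms:
            self.icon_visible = True
        if region is Region.ICON and self.icon_visible:
            self._on_icon_ms += dt_ms
            if not self._fired and self._on_icon_ms >= self.dwell_ms:
                self.selected = not self.selected
                self._fired = True
        else:
            self._on_icon_ms = 0
            self._fired = False
```

In this sketch a long fixation on the button alone never changes the state; only a subsequent dwell on the rolled-out icon toggles it.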
The Binary Choice Selection Button
This component resembles the traditional On/Off button where an option can either be selected or
deselected, hence the name binary choice of either one or zero (true or false). The component was
developed since the placement of text on dwell time activated icons causes involuntary activation (Midas
touch). The variance in the length of the text on different buttons makes dwell time activation highly
unstable. In other words, a button containing three words will be accidentally activated more often
than a button with only one word.
Fig. 2. Binary Choice. The target area (right box) is larger than
the actual saccade/selection icon. It raises the tolerance for jitter
by reducing the effect of noise from the eye tracker.
The Radial Saccade Pie Menu
The idea behind the component is to make use of dynamic allocation of the display area as well as
providing a novel interaction method for activation. Upon fixating the button, a set of icons is displayed
at the top, left, right and bottom of the ellipse. An activation can then be performed by making a short
saccade to any of the selection icons. Since the second stage icons are displayed within the parafoveal field
of view and always positioned at the same locations (top, bottom, left and right), the user can effortlessly
make a saccade to the desired icon. This reduces the chances of accidentally activating a command
compared to one-step dwell activation.
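Because the four selection icons always sit at fixed positions, a saccade can in principle be mapped to an icon from its direction alone. The following sketch illustrates this simplification; the actual component may instead test the gaze position against each icon's target area. The function name and the minimum saccade length are hypothetical.

```python
import math


def saccade_target(start, end, min_length_px=30.0):
    """Classify a saccade from `start` to `end` as top, bottom, left or right.

    Returns None if the movement is shorter than `min_length_px`, so that
    small jitter around the fixation point does not select anything.
    Screen coordinates are assumed, with y growing downwards.
    """
    dx = end[0] - start[0]
    dy = end[1] - start[1]
    if math.hypot(dx, dy) < min_length_px:
        return None
    if abs(dx) >= abs(dy):
        return "right" if dx > 0 else "left"
    return "bottom" if dy > 0 else "top"


print(saccade_target((100, 100), (160, 105)))  # prints right
print(saccade_target((100, 100), (95, 40)))    # prints top
print(saccade_target((100, 100), (105, 102)))  # prints None
```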
1. Initial state
2. On fixation, opaque icon appears (speaker)
3. Fixation on the icon (opacity removed, glowing border)
4. Selected state
Figure 1. The Binary Choice component. Upon gaze entering the component an opaque layer expands to the
right, revealing the saccade icon (2, shown as a speaker). A growing white border indicates the activation
process (3). The changed/selected state is then indicated by the background of the component (4). The speed
of the rollout and the activation threshold are configurable.
configuration of the components in terms of both interaction speed (feedback) and activation threshold
(dwell) was configured in three modes. The three configurations had animation times of slow (500 ms),
medium (300 ms) and fast (10 ms), the last of which means virtually no delay and causes the selection area
to appear as soon as the gaze enters the component. In the same way the selection time (dwell) for each
choice was configured with the same values, hence the configurations are named long 500+500,
medium 300+300 and short 10+10.
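A lower bound on the time needed for a task set follows directly from these settings. The sketch below assumes a task set of nine components (as reported in the results) and that the bound is simply the number of components times animation plus dwell time, ignoring eye travel between components; whether this matches the theoretical time plotted in the paper's figure is an assumption.

```python
# Animation and dwell times in milliseconds for the three configurations.
CONFIGS_MS = {
    "short": (10, 10),
    "medium": (300, 300),
    "long": (500, 500),
}


def theoretical_minimum_s(animation_ms, dwell_ms, n_components=9):
    """Lower bound in seconds: every component needs animation + dwell time."""
    return n_components * (animation_ms + dwell_ms) / 1000.0


for name, (animation_ms, dwell_ms) in CONFIGS_MS.items():
    print(name, theoretical_minimum_s(animation_ms, dwell_ms))
```

Under these assumptions the bounds are 0.18 s, 5.4 s and 9.0 s for the short, medium and long configurations respectively.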
The second part of the evaluation concerns the prototype applications, which the participants were free to
explore. User satisfaction was measured by handing out two forms at the end requesting subjective
opinions on the interface concerning navigation, design, feedback, ease of use and stability. The
QUIS (Chin et al., 1987) and the IBM Computer Usability Satisfaction Questionnaires (Lewis, 1995)
were used.
Results
Binary Choice Component
The short temporal configuration (10+10) had a mean completion
time per task set (selecting nine components) of 12 seconds with a
standard deviation of 6 seconds, compared to the medium configuration
(300+300), which had a mean time of 16 seconds with a standard
deviation of 12 seconds. Finally, the long activation time (500+500)
produced a mean task completion time of 18 seconds with a
standard deviation of 13 seconds.
Per individual component the short configuration had a mean
activation time of 1 second, while the medium configuration had a mean
of 1.2 seconds. The long configuration displayed activation times well
above the 500 ms (animation) + 500 ms (dwell) required to
perform a selection, with a mean individual activation
time of 1.5 seconds.
Error rates are defined as the number of selections that exceed the
nine needed to complete each task set. The highest error rate was
found for the short configuration, which also had the highest
variance. The mean error rates were 4.03 (SD = 3.7) for short, 1.71
(SD = 1.6) for medium and 3.9 (SD = 2.6) for long. The bars in figure 6
show the mean error rate over all sets in the three configurations.
Fig. 5. Task completion times across the different configurations.
The horizontal line indicates the theoretical time needed to
accomplish the task.
Fig. 6. Error rate for the different configurations. The short bar
represents errors for the 10+10 ms configuration, medium
equals 300+300 ms and long 500+500 ms.
Fig. 7. Individual selection time.
Radial Saccade Pie Menu
Selection time was measured from the gaze entering the component until a
selection had been performed. Looking at the different
configurations, the short configuration (10+10) had a mean of 0.51
seconds with a standard deviation of 0.32 seconds. The medium
configuration (300+300) delivered a mean of 0.8 seconds (SD = 0.24),
while the long (500+500) configuration of the component
produced a mean of 1.2 seconds (SD = 0.3).
Response to the prototype applications
The majority of the participants found the interface to be stimulating and fun to use. All participants who
were successfully calibrated and completed the first two steps in the evaluation were able to use the
prototype application with no or very few instructions. The interface was perceived as clear and well
structured, and a majority were satisfied with how easy it was to use the system. The most prominent source
of dislike for the interface came from offsets in the calibration, which consistently led to higher error rates,
longer task completion times and lower ratings in the questionnaires. The accuracy of the gaze position is
essential for a positive experience. Using gaze interaction with a constant offset is cumbersome; this factor
is reflected in the high variance in frustration levels. These indicators correlate with the physical load
participants experienced and further with the overall satisfaction with the interface. An offset creates a
situation where no activations occur even if the participants reported staring at the components.
Future work
As the core technology of eye tracking becomes more accessible, a rich set of interface components is one
important area in making gaze interaction more widespread. Future versions of the NeoVisus component
library are likely to concern range selection, markers, text entry, communication and media functions, etc.
The wide range of computer usage today requires flexible building blocks for rapid application development.
References
Chin, J. P., Diehl, V. A, Norman, K. (Sept. 1987) Development of an instrument measuring user
satisfaction of the human-computer interface, Proc. ACM CHI '88 (Washington, DC) 213-218. CS-
TR-1926, CAR-TR-328
Jacob, R.J.K. “The Use of Eye Movements in Human-Computer Interaction Techniques: What You Look
at is What You Get,” ACM Transactions on Information Systems, pp. 152-169, 1991
Lewis, J. R. (1995) IBM Computer Usability Satisfaction Questionnaires: Psychometric Evaluation and
Instructions for Use. International Journal of Human-Computer Interaction, 7:1, 57-78.
Ohno, T. (1998). “Features of Eye Gaze Interface for Selections Tasks”. Proceedings of The Third Asia
Pacific Computer Human Interaction – APCHI ’98. IEEE Computer Society. 1– 6
Session 3: Focusing on the User: Evaluating Needs and
Solutions
Evaluations of Interactive Guideboard with
Gaze-Communicative Stuffed-Toy Robot
Gaze-Contingent Passwords at the ATM
Results
The typing rate was measured in words per minute (wpm). In the last session, the average typing speed
was 15.06 wpm for the full keyboard, 11.12 wpm for the 2-row keyboard, and 7.29 wpm for the 1-row
keyboard. The average error rates varied between 1% and 5%, with large variance between participants
throughout the experiment. In the last session, the average error rates were below 2% for all conditions (see
Figure 2).
Figure 2. Typing speed (left) and error rate (right).
The selection time for the scroll buttons, letter keys and space was measured. Monitoring the
usage of the scroll buttons is especially interesting, because it shows how the participants learned to use
the scrollable keyboards with only a partially visible layout. Figure 3 below shows the selection times for
the 1-row (on the left) and 2-row (on the right) keyboards. The average selection times of the scroll buttons
were 1107 and 1268 milliseconds for the 1-row and 2-row keyboard, respectively. If the constant dwell time
of 500 ms is removed from the full selection time, the search time for each button is approximately 500 ms.
Figure 3. Selection time for the 1-row (left) and 2-row (right) scrollable keyboards.
Analysis of the scroll button usage shows that it decreased slightly over time, and the average percentage of
scroll button clicks among all clicks was 39% (1.64 KSPC) and 16.5% (1.2 KSPC) for the 1-row and
2-row keyboards, respectively. Participants used different strategies with the scrolling keyboards. Half of
them memorized the location of letters and rows so that they could choose the shortest route to the invisible
row. For example, after ‘e’ (located on the top row) the user can reach ‘n’ (on the bottom row) by one
scroll up instead of two scrolls down in the 1-row keyboard. Thus, the number of scrolls was
minimized. Some participants never scrolled the layout from the top line up (to the bottom) or vice versa,
because they did not want to lose orientation in scrolling. In this case, more scrolling was required but the
participants still did not spend time searching for the target letter. Finally, one participant did not
memorize the distribution of letters across rows but always visually scanned each row to find the desired
letter, and used only one direction of scrolling (up). This strategy resulted in the slowest typing speed. The
difference between the fastest and slowest participant was approximately 3 wpm within each condition.
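The relation between the share of scroll clicks and the keystrokes per character (KSPC), and the shortest-route strategy, can be checked with a short calculation. The sketch assumes that every typed character costs exactly one non-scroll click (no corrections) and that the three rows wrap around cyclically; both function names are hypothetical.

```python
def kspc_from_scroll_share(scroll_share):
    """KSPC when `scroll_share` of all clicks are scroll clicks.

    Assumes one non-scroll click per character, so
    KSPC = total clicks / characters = 1 / (1 - scroll_share).
    """
    if not 0.0 <= scroll_share < 1.0:
        raise ValueError("scroll_share must be in [0, 1)")
    return 1.0 / (1.0 - scroll_share)


def min_scrolls(current_row, target_row, n_rows=3):
    """Fewest scroll clicks between two rows of a cyclic layout."""
    forward = (target_row - current_row) % n_rows
    backward = (current_row - target_row) % n_rows
    return min(forward, backward)


print(round(kspc_from_scroll_share(0.39), 2))   # prints 1.64
print(round(kspc_from_scroll_share(0.165), 2))  # prints 1.2
print(min_scrolls(0, 2))                        # prints 1 (top row to bottom row)
```

Both reported KSPC values are reproduced by the scroll shares, which suggests that few clicks other than scrolls and character selections were made.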
[Figure 2 panels: Typing speed (Speed, wpm) and Error rate (Errors, %), each plotted against Characters typed for the 1-row, 2-row and 3-row keyboards.]
[Figure 3 panels: Selection time (Time, ms) plotted against Characters typed for the 1-row and 2-row keyboards, with separate series for Scrolls, Letters and Space.]
A few other sub-themes related directly to design considerations:
• Missing buttons: one specific part of the system, video playing, did not have a ‘stop button’ – a
quarter of participants specifically noticed this design anomaly. In other comparable screens with a
‘stop button’ participants were easily able to identify and use the button, confirmed through
observation of the eye-data.
• Content: in some parts of the system, participants can clearly be seen to understand the difference
between the interface buttons and the content and also to demonstrate an expected cause and effect
by looking to the content area after pressing a button. Analysis of the data also highlighted some
areas of the system where participants did not seem to find it clear which areas of the screen were
active buttons and which were displaying content.
Understanding and Cognitive Load
A number of themes emerged around participants’ understanding of the operation of the system. These
themes can be broadly grouped into issues to do with navigation and the intuitiveness of interaction:
• Navigation:
Participants showed varying levels of understanding of the navigation. For example, there was evidence
of confusion between the use of the ‘Do something else’ and ‘Quit’ buttons. However, some participants
also showed good understanding of the concept of the ‘Do something else’ button – frequently using it
intentionally to choose another type of activity after having scanned and rejected the other options.
Participants also showed varying abilities to understand the concept of the navigation between the levels
in the system – most participants managed to show understanding of moving between the top level and
second level to choose a specific activity. Eye gaze data showed that many participants actively scanned
the available options on each level and then subsequently actively chose their preferred choice.
• Competence/intuitiveness:
Participants displayed varying levels of competence and intuitive understanding of the human-computer
interaction. Many participants, during the first period of use of the system, showed an intuitive
understanding of the touch-screen and how to use it. Some other participants needed some instruction on
the touch-screen but then learnt its operation. Some participants were also able to explore the
system independently without prompting, including some of the more complex tasks in some cases, for
example navigating through multiple levels to select preferred music tracks. For some participants,
memory of the system sometimes appeared to affect their competence at using the system.
Discussion
Initial analysis of the data from the usability studies of this SIMWIN system software has shown it to
provide a useful source of information for the design and study of this Assistive Technology software for
older people. The recording of eye-gaze data has been demonstrated to be successful with this cohort
which might have otherwise been considered challenging. Although it would have been possible to run
the study without the eye data, the analysis has shown that combined with the other data streams (screen,
audio, video) it provides a very rich source of data. A number of the themes that developed from the
qualitative analysis of the data were reinforced through observation of the eye traces – for example, noting
the eye track path across choices before a selection helped confirm that users were intentionally choosing
options. Another example of the usefulness of the eye-data is shown in one of the themes where
participants appear to find it difficult to see one of the buttons – without the eye data, the reason for their
difficulty in selecting this button would be difficult to deduce.
The use of an eye-gaze system did have, however, some practical difficulties: for example, it was difficult
or impossible to calibrate the system for some older people who wore quite thick spectacles. Also, the use
of wheelchairs by some participants proved problematic when it came to locating the eye-gaze screen in
front of them in the ‘real environment’ of a residential home.
Other forms of analysis of the data will take place to try to establish further information from the data;
hand coding will help establish task success rates and times taken to complete tasks; and the data-log
from the software will be analysed to see if this provides a useful information source. In parallel with this
aspect of the project, a further study is being carried out to establish the cohort’s opinions on the concept
and use of this system.
Possible future research topics have already been identified – these include investigating specific aspects
of interface use by this cohort – for example the relationship between content and buttons and optimal
layout and positioning of these elements. The project is also likely to generate future research work
regarding the use and accessibility of computers for older people.
A Case Study Describing Development of an Eye
Gaze Setup for a Patient with ‘Locked-In
Syndrome’ to Facilitate Communication,
Environmental Control and Computer Access
Zoë Robertson
Barnsley Assistive Technology Team
Medical Physics, Block 14, Barnsley General
Hospital, Gawber Road, Barnsley, S75 2EP
zoe.robertson@nhs.net
Marcus Friday
Barnsley Assistive Technology Team
Medical Physics, Block 14, Barnsley General
Hospital, Gawber Road, Barnsley, S75 2EP
marcus.friday@nhs.net
Keywords
Eye gaze, MyTobii, environmental control, email, ‘locked-in syndrome’
Introduction
For people with severe physical disabilities eye gaze technology offers a way of accessing direct
communication and in addition a way of controlling their immediate environment and enabling access to
functions such as email and the internet. These additional functions have the capacity to enhance a
person’s independence and also their social inclusion hence improving quality of life.
This paper presents a case study of a person who has very limited physical movement including limited
and involuntary eye movement and looks at the process of assessing for, and setting up his eye gaze
system. It also highlights the challenges faced during the development process.
In order to maintain confidentiality the patient has been referred to as Mr X.
Process
Background
Mr X had a brainstem stroke in January 2005 and was diagnosed at that stage as having ‘locked in
syndrome’. The Barnsley Assistive Technology Team became involved towards the end of 2005 when
other health professionals, such as speech and language therapists and occupational therapists, were trying
to find methods that Mr X could use to access equipment such as a communication aid.
Mr X has a minimal amount of jaw and eyebrow movement and communicates ‘yes’ and ‘no’ by looking
down and up respectively. His eye movement is limited to vertical movement with little or no lateral
movement. Mr X and his wife have a very effective communication method in which his wife provides an
auditory scan of the alphabet. Although Mr X and his wife communicate very efficiently using this
method, other people such as family members and carers, rarely use it as effectively.
Following initial assessment the Barnsley AT Team investigated several different access methods to
attempt to identify a reliable and comfortable way for Mr X to obtain a switch action. Early on in the
assessment process the possibility of trying eye gaze was considered but soon dismissed as at that time a
system was not available which could accommodate the user having vertical eye movement only. The
COGAIN report from 2005 highlights some of the physical difficulties that can make using eye gaze
difficult for some people (Donegan et al, 2005).
During the period that Mr X was trying various switch alternatives Barnsley AT Team had built links with
the COGAIN project, part of which is to consider using eye gaze for people who may have more
challenging requirements and, due to developments in eye gaze systems, it was suggested to Mr X that eye
gaze could be tried.
Assessing for eye gaze access
The initial aim was to set up a system for Mr X to give access to the alphabet, numbers, some basic editing
and punctuation tools and a rest page. Mr X was assessed using the MyTobii system. The initial challenge
was positioning the MyTobii appropriately for Mr X. The first assessments were done with the MyTobii
attached to the kitchen surface with Mr X in his chair in the doorway. To solve this, a height-adjustable table
was purchased. This provided better positioning of the MyTobii and also enabled Mr X to try the system
whilst in his chair and whilst in bed.
The second major challenge was that, due to Mr X having only vertical eye movement and a degree of
nystagmus (involuntary eye movement), it was difficult for Mr X to achieve a good calibration. In the
early assessments a calibration performed by another person was used. Although not ideal, this did enable
Mr X to be able to operate the system and indicated that with better calibration this could be a successful
access method.
Initial assessment sessions focussed on identifying ways of maximising Mr X’s success with eye gaze. To
facilitate assessment a grid which plays musical notes was used. This musical grid provided a rewarding
and relatively stress free way to practise as there was, for example, no pressure to spell out words. Having
identified the potential of eye gaze for Mr X a long process of refinement and development began.
Initial grid development
The initial trials had suggested that Mr X would only be able to access a single column of cells and that
these would be best positioned in the centre of the display with the workspace (the area where typed letters
appear) to the left. In addition, previous assessment had suggested Mr X would probably manage five
rows. Using these grid constraints an initial communication grid was set up for Mr X based on the
scanning system which he uses with his wife.
Despite still having difficulty achieving a good calibration Mr X was able to use this initial grid set up to
type a sentence. In addition, having tried this initial grid Mr X requested that each row be made a different
colour, with the same colour scheme used throughout the grid set to aid his ability to distinguish between
cells more easily.
Due to advances in technology at this stage, the second of the initial challenges, calibration, was solved
as the MyTobii software had advanced to enable one eye to be used for control. This enabled a good
calibration to be achieved and so improved the precision of his selections.
The colours were added to the grid set and at the following assessment Mr X was asked questions about
the set up, which he was able to answer using the MyTobii. Mr X was asked what he liked about the
system and he responded that it was great because most people won’t use the method he uses with his
wife. He was also asked what he didn’t like and his comments were about things he wanted changing
within the grid set (i.e. workspace to be positioned at the bottom with rows filling the width of the screen,
some comments on position of some of the letters and when he wanted it to speak). Finally, he was asked
about other things he would like to be able to do, and he was keen to have some basic environmental
control, document production and email.
Development of grid set
At this stage the grid set went through major changes and, the earlier challenges having been solved, further
challenges arose. The first of these was that there were concerns regarding having the workspace at the
bottom of the screen as this reduced the height of the cells and it was possible that this could cause
selection problems for Mr X. The second challenge was and is accessing some of the additional functions
such as document production and email whilst only having a very limited number of cells available and a
small workspace. The third challenge was identifying ways of enhancing the speed at which Mr X can
communicate.
The first stage of this process was that the workspace was moved, the rows were extended to be the full
width of the screen and the labels on the cells were repeated across the cell (this was requested by Mr X
again to enhance the distinction between the cells). In addition a page of environmental control functions
was added. As mentioned above the second challenge at this stage was that Mr X can only access five
cells per page and one of those cells is always taken up with a way of getting to a different page. This
means considering the navigational process and number of selections to get to certain functions is
essential. A grid was added to give Mr X access to the additional functions required (e.g. Environmental
Control). Adding basic environmental control was relatively easy as the functions Mr X required were
single functions (e.g. lights) and so only took up one space on a grid. When these changes had been made
a further assessment was performed to test the changes. Mr X proved that he was able to manage with the
six rows and reported that he preferred the set up.
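The effect of the five-cell limit on navigation can be made concrete with a short sketch. The limit of five cells per page and the one cell reserved for navigation come from the description above; the assumption that pages are chained one after another (each navigation cell leading to the next page) is ours, made only for illustration, and the actual grid set may be organised differently.

```python
CELLS_PER_PAGE = 5  # from the text: Mr X can access five cells per page
NAV_CELLS = 1       # from the text: one cell always leads to a different page
USABLE_CELLS = CELLS_PER_PAGE - NAV_CELLS


def pages_needed(n_functions: int) -> int:
    """Number of pages needed to hold n_functions, one function per usable cell."""
    return -(-n_functions // USABLE_CELLS)  # ceiling division


def selections_to_reach(index: int) -> int:
    """Selections needed to activate the function at position index (0-based).

    Assumes a linear chain of pages: one selection per page hop, plus one
    selection for the function itself.
    """
    if index < 0:
        raise ValueError("index must be non-negative")
    return index // USABLE_CELLS + 1


print(pages_needed(10))        # 3
print(selections_to_reach(4))  # 2
```

Under these assumptions, every additional page adds one selection to each function placed on it, which is why the ordering of functions across pages matters.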
The grid underwent further development to enable Mr X to switch between communication, document
production, environmental control and email. The document and email production utilise the same
alphabet and editing grids; however, the workspace is changed accordingly when entering these modes.
Additional grids have also been introduced to give the specific commands required for these functions, for
example a contacts list. This is where the limit of five cells has been a major challenge, as giving all the
required functions for efficient and independent email use takes up multiple five cell grids and trying to
set these up to limit cognitive load and number of selections has been complex.
Work has also been carried out regarding the third challenge of increasing the communicative speed. For
each letter Mr X types he makes three selections and so giving Mr X methods to avoid typing every word
in full would enhance communicative speed. Three options were considered to approach this challenge.
The first of these was word prediction. This posed additional issues as word prediction requires a display
of the possible words and due to the five cell limit this resulted in either Mr X being taken to a separate
prediction page or having a single cell on the front page. These options were demonstrated to Mr X;
however, due to his visual impairment he found it difficult to see the predicted words and got frustrated
and agitated. The second option tried was abbreviation expansion, for example defining that if ‘hh’ is
typed it could be expanded to ‘hello, how are you?’ However, with abbreviation expansion within the Grid
2 software a cell is needed to display the possible expansion and then that cell is selected to choose the
expansion. Again Mr X did not like this due to it requiring a cell on the home page and it being difficult to
see. The third option explored was auto replace. This is similar to abbreviation expansion; however, when a
unique letter combination is entered it is automatically expanded to a phrase (e.g. typing ‘hh’ results in
‘hello, how are you?’). Mr X felt that this was the best option at this stage.
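The idea behind auto replace, and the saving it offers, can be sketched as follows. This is a minimal illustration and not the Grid 2 implementation: the lookup table, the function names and the rule that only the last typed word is checked are assumptions. The ‘hh’ example and the figure of three selections per letter come from the text; applying the same cost to spaces and punctuation is a further assumption.

```python
SELECTIONS_PER_LETTER = 3  # from the text: each letter takes three selections

# Hypothetical replacement table; the 'hh' entry is the example given in the text.
AUTO_REPLACE = {"hh": "hello, how are you?"}


def auto_replace(typed: str, table: dict[str, str] = AUTO_REPLACE) -> str:
    """Expand the last word of the typed text if it matches a known abbreviation."""
    words = typed.split(" ")
    words[-1] = table.get(words[-1], words[-1])
    return " ".join(words)


def selections_saved(abbreviation: str, phrase: str) -> int:
    """Selections saved by typing the abbreviation instead of the full phrase.

    Assumes every character, including spaces and punctuation, costs the same
    number of selections as a letter.
    """
    return (len(phrase) - len(abbreviation)) * SELECTIONS_PER_LETTER


print(auto_replace("hh"))                             # hello, how are you?
print(selections_saved("hh", "hello, how are you?"))  # 51
```

Because the expansion happens without a confirmation step, no cell is needed to display a candidate, which fits the constraint that led Mr X to prefer this option.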
Conclusion
The process has highlighted important challenges and possible solutions when using eye gaze for a person
with severe physical disabilities who wants to access a range of functions. This is not finished as Mr X
identifies further requirements as he uses the system more. He currently would like access to music and
has had a photograph album added. He would also like to access the web and initial work has looked at
this; however, the constraints of workspace size and number of cells do present particular issues when
considering web access.
Further work is required to look at expanding the functions available using eye gaze when a person is
limited in the number of targets they can manage due to physical limitations.
Acknowledgements
The authors would like to thank the patient, his family, therapists and carers. We would also like to thank
Mick Donegan for his help and support.
References
Donegan M, Oosthuizen L, Bates R, Daunys G, Hansen JP, Joos M, Majaranta P & Signorile I (2005)
D3.1 User Requirements Report with Observations of Difficulties Users Are Experiencing.
Communication by Gaze Interaction (COGAIN), IST-2003-511598. Available from:
http://www.cogain.org/results/reports/COGAIN-D3.1.pdf (accessed 14 May 2008).
COGAIN 2008 Keynote by Dr. Anthony Hornof
The Human-Technical Challenge of Developing
Gaze-Controlled Devices
Abstract
Since the summer of 2003, students, collaborators, and I have been working together to develop eye-
controlled interfaces. We have met with some success, such as with the development of EyeDraw and
EyeMusic. EyeDraw is software that is specifically designed to enable children with motor impairments
to draw pictures using their eye movements. EyeDraw has been extensively tested and validated, and is
now distributed with a commercial eye tracker. EyeMusic is a system developed for computer musicians
(without motor impairments) that enables a performer to control a new media art performance with just his
or her eye movements. EyeMusic compositions have been performed at major computer music
conferences.
Working on these projects, my students, collaborators, and I have encountered many challenges, both
technical and human-centered, which are probably consistent with the difficult challenges faced by the
COGAIN community in general. Some of these challenges include: understanding and decomposing a
human task to the point that it can be dictated by a series of eye movements, developing eye-controlled
software within the constraints of existing frameworks for programming graphical user interfaces,
connecting software across platforms, working with children and adults with severe motor impairments as
software testers and collaborators, getting comfortable and integrated with a unique physical and social
environment, providing roles for caregivers and siblings in the software, and building teams that span
incredibly disparate disciplines and practices.
My current research efforts have for the moment put eye tracking software development on hold, and
instead focus on spending time with children with severe motor impairments and their caregivers. The
goal is to figure out how to move the eye tracking software development process out of the isolated lab so
that it can better mesh with actual usage and practice. Along the way to designing new gaze-controlled
technology, developers can perhaps benefit by learning and using other “lower tech” methods for
communicating with a person with impairments. It is my hope and expectation that, by facing these
challenges head-on, COGAIN and like-minded researchers can better solve the incredibly difficult
problem of delivering complex, thoughtful, and easy-to-use communication by gaze interaction.
Biography
Dr. Anthony J. Hornof is an Associate Professor in the Department of Computer and Information Science
at the University of Oregon. He joined the faculty in 1999 and was promoted with tenure in 2005. Dr.
Hornof earned his Ph.D. in 1999 and his Master's degree in 1996, both from the University of Michigan,
and both in Computer Science and Engineering. He received a B.A. in Computer Science from Columbia
University in 1988. After college, he remained in New York City for five years (1988-1993) where he
worked as an information technology specialist for Deloitte and Touche, and also part-time as a deejay at
nightclubs such as Save the Robots and M.K. He also pursued mixed-media painting during these years,
and his work was featured in group shows in New York City. In 1993, he redirected his creative and
intellectual energies towards a career in academia, where he now integrates his interests in computing,
human factors, and creative expression. Dr. Hornof is published in the leading human-computer
interaction conferences and journals, and has been awarded over $1.75 million in single-investigator
research grants, including multiple awards from the National Science Foundation and the Office of Naval
Research.
Dr. Anthony J. Hornof
Department of Computer and Information Science
University of Oregon
Eugene, Oregon 97403-1202
USA
email: hornof@cs.uoregon.edu