Flying a Manta with Gesture and Controller: An Exploration of Certain Interfaces in Human-Robot Interaction
Proceedings of GW2007 - 7th International Workshop
on Gesture in Human-Computer Interaction and
Simulation 2007
Editors
Miguel Sales Dias - Microsoft & ADETTI/ISCTE, Portugal
Ricardo Jota - IST/Technical University of Lisbon, Portugal
Editorial Production
Ana Rita Leitão, ADETTI, Portugal
Online Proceedings:
http://www.adetti.pt/
Logo and Cover Credits
Ana Rita Leitão – ADETTI
Local Organiser
ADETTI - Associação para o Desenvolvimento das Telecomunicações e Técnicas de
Informática
Avenida das Forças Armadas, Edifício ISCTE
1600-082 Lisboa, PORTUGAL
Tel: +351 21 7826480 Fax: +351 21 7826488
www.adetti.pt
(Printed copies can be ordered from ADETTI)
Sponsors
ADETTI - Associação para o Desenvolvimento das Telecomunicações e Técnicas de
Informática, Grupo Português de Computação Gráfica, ISCTE – Instituto Superior de Ciências
do Trabalho e da Empresa, Fundação para a Ciência e Tecnologia, Associação de Turismo de
Lisboa, Springer
ISBN: 978-972-8862-05-3
May 2007
Foreword
The International Gesture Workshop is an interdisciplinary event where researchers
working on human gesture-based communication present and exchange ideas and
advanced research in progress on gesture across scientific disciplines. The workshop
covers all fundamental aspects of gestural studies in the field of Human-Computer
Interaction and Simulation, including the multifaceted issues of modelling, analysis
and synthesis of human gesture, from hand and body gestures to facial expressions.
One focus of these events is a shared interest in using gesture in the context of sign
language analysis, understanding and synthesis. Another stream of interest is the
user-centric approach of considering gesture in multimodal human-computer
interaction, within the framework of integrating such interaction into the natural
environment of users. In addition to welcoming submissions by established
researchers, the GW series of workshops traditionally encourages submission of
student work at various stages of completion, enabling a broader dissemination of
finished or ongoing novel work and the exchange of experiences in a
multi-disciplinary environment.
Submissions include papers, posters and demonstrations.
GW2007 is the 7th European Gesture Workshop in the GW series initiated in 1996.
Since then, the Gesture Workshops have been held roughly every second year,
with fully reviewed post-proceedings typically published by Springer-Verlag.
GW2007 received 53 contributions, of which 15 were accepted as full papers, 16 as
short papers and 10 as posters and demos.
Two brilliant keynote speakers honoured the event with their presentations: Dr.
Andrew Wilson, member of the Adaptive Systems and Interaction group at Microsoft
Research, and Prof. Joaquim Jorge, Associate Professor of Computer Science at
Instituto Superior Técnico (IST/UTL), the School of Engineering of the Technical
University of Lisboa, Portugal.
Miguel Sales Dias
Workshop Chair
Workshop Chair
Miguel Sales Dias - Microsoft & ADETTI/ISCTE, Portugal
Poster and Demo Chair
Ricardo Jota - IST/Technical University of Lisbon, Portugal
Local Organising Committee
Ana Rita Leitão - ADETTI, Portugal
Ricardo Jota - IST/Technical University of Lisbon, Portugal
Rafael Bastos - ADETTI, Portugal
International Programme Committee
Annelies Braffort - LIMSI, Orsay, France
António Augusto Sousa - INESC Porto/FEUP
Antonio Camurri - InfoMus Lab, DIST, University of Genova, Italy
Bencie Woll - Sign Language and Deaf Studies / UCL DCAL Research Centre, UK
Christian Vogler - Washington University, USA
Daniel Arfib - CNRS-LMA, Marseille, France
David McNeill - Center for Gesture and Speech Research, University of Chicago,
USA
Gildas Ménier - VALORIA, University of Bretagne Sud, France
Gualtiero Volpe - InfoMus Lab, DIST, University of Genova, Italy
Alexis Héloir - VALORIA, University of Bretagne Sud, France
Hermann Ney - Aachen University, Germany
Ipke Wachsmuth - Bielefeld University, Germany
Isabel Hub Faria - Faculdade de Letras da Uni. de Lisboa, Portugal
Jean-François Kamp - VALORIA, University of Bretagne Sud, France
Joaquim Jorge - Instituto Superior Técnico, Portugal
Joaquim Madeira - DETI-IEETA/UA, Portugal
Jorge Salvador Marques - IST/ISR, Portugal
José Manuel Rebordão - INETI & FCUL, Portugal
Leonel Valbom - UM, Portugal
Manuel João Fonseca - IMMI, INESC-ID, Portugal
Manuel Próspero dos Santos - FCT/UNL, Portugal
Marcelo Wanderley - McGill University, Canada
Marianne Gullberg - Max Planck Institute for Psycholinguistics, Nijmegen, The
Netherlands
Miguel Sales Dias - Microsoft & ADETTI/ISCTE, Portugal
Nicolas Courty - VALORIA, University of Bretagne Sud, France
Nuno Correia - DI/FCT/UNL, Portugal
Peter Wittenburg - Max-Planck Institute of Psycholinguistics, Nijmegen, The
Netherlands
Philippe Gorce - LESP, Université de Toulon et du Var, France
Pierre-François Marteau - VALORIA, University of Bretagne Sud, France
Richard Kennaway - Norwich, United Kingdom
Ronan Boulic - Virtual Reality Lab, EPFL, Switzerland
Seong-Whan Lee - Korea University, Korea
Sylvie Gibet - Valoria, Université de Bretagne Sud, France
Teresa Chambel - FC/UL, Portugal
Timo Sowa - Bielefeld University, Germany
Winand Dittrich - University of Hertfordshire, United Kingdom
Ying Wu - Northwestern University, Evanston, USA
Workshop Programme
Session 1
Analysis and Synthesis of Gesture 9
• Gesture Recognition Based On Elastic Deformation Energies
Radu Daniel Vatavu, Laurent Grisoni, Stefan Gheorghe Pentiuc
10
• Approximation of curvature and velocity using adaptive
sampling representations - Application to hand gesture
analysis
Sylvie Gibet, Pierre-François Marteau
12
• Motion Primitives for Action Recognition
Thomas Moeslund
14
• Modeling Human Behaviors Using Bayesian Network with
Conditional Hidden Nodes
Myung-Cheol Roh, Seong-Whan Lee
16
Session 2
Theoretical Aspects of Gestural Communication and Interaction 19
• On the parametrization of clapping
Herwin van Welbergen, Zsofi Ruttkay
20
• Improving the believability of virtual characters using
qualitative gesture analysis
Barbara Mazzarino, Manuel Peinado, Ronan Boulic, Gualtiero
Volpe
22
• The application of a framework for the micro-analysis of
speech and body movements in face-to-face interaction
Isabel Galhano-Rodrigues
24
Session 3
Vision-based Gesture Recognition 27
• Representation of human postures for vision-based gesture
recognition in real-time
Antoni Jaume-i-Capó, Javier Varona, Francisco J. Perales
28
• Person-Independent 3D Sign Language Recognition
Gineke ten Holt, Jeroen Lichtenauer, Marcel Reinders, Emile
Hendriks
30
• Skin Color Profile Capture for Scale and Rotation Invariant
Hand Gesture Recognition
Rafael Bastos, Miguel Dias
32
• Robust tracking for processing of videos of communication's
gestures
Frédérick Gianni, Christophe Collet, Patrice Dalle
34
Session 4
Sign Language Processing 37
• Generating Data for Signer Adaptation
Chunli Wang, Xilin Chen, Wen Gao
38
• A Qualitative and Quantitative Characterisation of Style in
Sign Language Gestures
Alexis Heloir, Sylvie Gibet
40
• Gesture Modelling for Linguistic Purposes
Guillaume Jean-Louis Olivrin
42
• Signing Avatar: Say hello to Elsi!
Michael Filhol, Annelies Braffort, Laurence Bolot
44
• Sequential Belief-Based Fusion of Manual and Non-Manual
Signs
Oya Aran, Thomas Burger, Alice Caplier, Lale Akarun
46
Session 5
Gesturing with Tangible Interfaces and in Virtual Augmented
Reality 49
• Flying a Manta with Gesture and Controller: An Exploration
of Certain Interfaces in Human-Robot Interaction
Brian Ballsun-Stanton, Jon Schull
50
• Automatic Classification of Expressive Hand Gestures on
Tangible Acoustic Interfaces According to Laban's Theory of
Effort
Antonio Camurri, Corrado Canepa, Simone Ghisio, Gualtiero
Volpe
52
• Using Hand Gesture and Speech in a Multimodal Augmented
Reality Environment
Miguel Dias, Rafael Bastos, João Fernandes, João Tavares, Pedro
Santos
54
• Implementing distinctive behavior for conversational agents
Maurizio Mancini, Catherine Pelachaud
56
Session 6
Gesture for Music and Performing Arts 59
• Geometry and effort in gestural renderings of musical
sound
Rolf Inge Godoy
60
• String Bowing Gestures at Varying Bow Stroke Frequencies:
A Case Study
Nicolas Rasamimanana, Delphine Bernardin, Marcelo
Wanderley, Frederic Bevilacqua
62
• The Premio Paganini experiment: a multimodal [gesture-based]
approach for explaining emotional processes in music
performance.
Donald Glowinski, Antonio Camurri, Carol L. Krumhansl, Ben
Knapp, Roddy Cowie
64
• A perceptual-based algorithm for segmentation of human
full-body movement: a pilot experiment
Donald Glowinski, Antonio Camurri, Gualtiero Volpe, Barbara
Mazzarino
66
Session 7
Gesture for Therapy and Rehabilitation 69
• Assistive Technologies for Spinal Cord Injured Individuals:
Electromyographic Mobile Accessibility
Tiago Guerreiro, Joaquim Jorge
70
• Signs Workshop: the importance of natural gestures in the
Promotion of Early Communication Skills of Children with
Developmental Disabilities
Ana Margarida P. Almeida, Teresa Ferreira, Fernando
Ramos, Álvaro Sousa, Luisa Cotrim
72
• Interactive ergonomic analysis of a physically disabled
person's workplace
Matthieu Aubry, Frédéric Julliard, Sylvie Gibet
74
Session 8
Gesture In Mobile Computing and Usability Studies 77
• Mnemonical Body Shortcuts
Ricardo Gamboa, Tiago Guerreiro, Joaquim Jorge
78
• The effects of the gesture viewpoint on the students' memory
of words and stories
Giorgio Merola
80
• Gesture Control of Sound Spatialization
Mark Marshall, Joseph Malloch, Marcelo Wanderley
82
Author Index 85
Analysis and Synthesis of Gesture
Approximation of curvature and velocity using adaptive sampling representations - application to hand gesture analysis
Sylvie Gibet, Pierre-François Marteau
VALORIA, Université de Bretagne Sud, Campus de Tohannic, rue Yves Mainguy,
F-56000 Vannes, France
{Sylvie.Gibet, Pierre-Francois.Marteau}@univ-ubs.fr
The representation and accurate understanding of human gesture is a crucial and
challenging problem that arises in several research fields, including animation
of embodied agents, sport sciences, medicine and vision-based recognition. In recent
years, the rapid development of motion capture technologies has made the
analysis of human motion feasible and has led to data-driven methods for gesture
classification, retrieval and computer-generated animation.
This paper describes a new approach to analyzing hand gestures, based on an
experimental approximation of the shape and kinematics of compressed arm trajectories.
The motivation for such a model is twofold: to reduce the gesture data, and to
segment gestures into meaningful units, yielding an analysis tool for gesture coding
and synthesis.
In this work, as we are mainly interested in visual gestures, that is, gestures that
trace shapes in 3D space, we express them as 3D Cartesian trajectories. These gestures
usually convey meaningful information, as in sign language, or are expressive, as in
dance or musical gestures. They can be characterized by their shape (change of
curvature) as well as by their kinematic properties. In sign language, for example,
the signer can draw the shape of the symbol as an icon of some aspect of the object
or activity being symbolized. Expressive gestures may also implicitly contain
particular velocity or acceleration profiles. In particular, variations in velocity
account for the aggregation of samples in some areas of the trajectories.
In this paper, we study both these spatial and kinematic characteristics, computed
on arm end-point trajectories in a reduced representation space. We propose a new
method for approximating the curvature and velocity of arm trajectories. The method
is applied to compressed data obtained from an adaptive sampling algorithm, which
extracts discrete target patterns from raw motion data for a given compression rate.
We have previously shown that, for a given trajectory, these target patterns
represent the original trajectory optimally.
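The adaptive sampling algorithm itself is not specified in this abstract. As a hedged stand-in, the sketch below uses a greedy top-down variant of Ramer-Douglas-Peucker simplification which, like the method described, keeps a fixed number of samples (a given compression rate) and places them preferentially where the trajectory bends. It is not the authors' algorithm, and all function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence

Point = Sequence[float]  # a sample in any dimension, e.g. (x, y, z)


def _distance_to_segment(p: Point, a: Point, b: Point) -> float:
    """Euclidean distance from p to the segment [a, b]."""
    ab = [bj - aj for aj, bj in zip(a, b)]
    ap = [pj - aj for aj, pj in zip(a, p)]
    denom = sum(c * c for c in ab)
    t = 0.0 if denom == 0.0 else max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / denom))
    closest = [aj + t * c for aj, c in zip(a, ab)]
    return math.dist(p, closest)


def adaptive_sample(points: Sequence[Point], k: int) -> List[int]:
    """Greedily keep k samples (both endpoints included).

    At each step the retained sample is the point that deviates most from
    the chord of its current interval, so samples concentrate where the
    trajectory bends. Returns retained indices in temporal order.
    """
    n = len(points)
    if k < 2:
        raise ValueError("k must be at least 2")
    if k >= n:
        return list(range(n))
    selected = {0, n - 1}
    heap: list = []  # entries: (-deviation, lo, hi, index); heapq is a min-heap

    def push_worst(lo: int, hi: int) -> None:
        best, best_dev = None, -1.0
        for i in range(lo + 1, hi):
            d = _distance_to_segment(points[i], points[lo], points[hi])
            if d > best_dev:
                best, best_dev = i, d
        if best is not None:
            heapq.heappush(heap, (-best_dev, lo, hi, best))

    push_worst(0, n - 1)
    while len(selected) < k:
        _, lo, hi, idx = heapq.heappop(heap)
        selected.add(idx)
        push_worst(lo, idx)
        push_worst(idx, hi)
    return sorted(selected)


def inter_sample_distances(points: Sequence[Point], indices: Sequence[int]) -> List[float]:
    """Distance between consecutive retained samples."""
    return [math.dist(points[i], points[j]) for i, j in zip(indices, indices[1:])]
```

With this stand-in sampler, retained samples tend to lie closer together around sharp bends, which is the kind of relation between inter-sample distance and curvature the abstract investigates.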
We show that the distance between adaptive samples and the velocity estimated at
these samples are correlated, respectively, with the instantaneous curvature and the
tangential velocity computed directly on motion capture data. These approximations
can therefore serve as an alternative representation of both the shape and the
kinematics of end-point trajectories. Based on these correlation results, we propose
a way to identify kinematic segments on arm end-point trajectories. We also show that
this analysis tool can be applied to multidimensional data.
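The reference curvature and tangential velocity mentioned above can be computed directly from uniformly sampled end-point positions. A minimal sketch, assuming a constant sampling period `dt` and using central finite differences with the standard formula kappa = |v x a| / |v|^3; the function names are illustrative, and the original processing pipeline may differ.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _norm(a: Sequence[float]) -> float:
    return math.sqrt(sum(c * c for c in a))


def curvature_and_speed(points: Sequence[Vec3], dt: float) -> Tuple[List[float], List[float]]:
    """Estimate curvature and tangential speed at every interior sample.

    Velocity and acceleration come from central differences; the two end
    samples are skipped because they lack a neighbour on one side.
    """
    curvatures: List[float] = []
    speeds: List[float] = []
    for prev, cur, nxt in zip(points, points[1:], points[2:]):
        v = tuple((n - p) / (2.0 * dt) for n, p in zip(nxt, prev))
        a = tuple((n - 2.0 * c + p) / (dt * dt) for n, c, p in zip(nxt, cur, prev))
        speed = _norm(v)
        speeds.append(speed)
        # Curvature is undefined when the end point is at rest; report 0 there.
        curvatures.append(_norm(_cross(v, a)) / speed ** 3 if speed > 1e-12 else 0.0)
    return curvatures, speeds


# Usage: a circle of radius 2 traversed at 1 rad/s, sampled every 10 ms.
# The curvature estimates are close to 1/R = 0.5 and the speeds close to 2.
circle = [(2 * math.cos(0.01 * i), 2 * math.sin(0.01 * i), 0.0) for i in range(100)]
kappa, speed = curvature_and_speed(circle, dt=0.01)
```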
These measures provide an efficient way to segment gestures automatically. An
interpretation is given for the segmentation of sign language gestures. Based on this
method, the identified segments can also be used as input to our generation
system.
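The abstract does not detail how kinematic segments are identified. One common heuristic, assumed here purely for illustration, places segment boundaries at local minima of tangential speed that fall below a threshold; the speeds could come from a finite-difference estimate or from velocities estimated at adaptive samples. The function names and the `threshold` parameter are hypothetical.

```python
from typing import List, Sequence, Tuple


def speed_minima(speeds: Sequence[float], threshold: float) -> List[int]:
    """Indices of interior local minima of speed lying below `threshold`."""
    return [i for i in range(1, len(speeds) - 1)
            if speeds[i] < speeds[i - 1] and speeds[i] <= speeds[i + 1]
            and speeds[i] < threshold]


def kinematic_segments(n_samples: int, boundaries: Sequence[int]) -> List[Tuple[int, int]]:
    """Split [0, n_samples - 1] into (start, end) index pairs at the boundaries."""
    cuts = [0, *boundaries, n_samples - 1]
    return list(zip(cuts, cuts[1:]))
```

The threshold keeps shallow dips in speed from being treated as boundaries; only near-stops split the gesture.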
where G represents the 'Gesture' node and N_p(k) represents a feature node at
time k. The score for the k-th gesture, G_k, is defined as follows:

$$\mathrm{Score}(G_k, \mathit{Obs}) = \sum_{t=1}^{T} P\big(G_k, N_1(t-1), \ldots, N_p(t)\big)$$

where T is the length of the sequence and Obs is the input observation sequence.
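The score is a sum over time of two-slice joint probabilities, and recognition picks the gesture with the highest score. A minimal sketch, assuming discretised feature values and a caller-supplied joint-probability function. The network's actual factorization is defined elsewhere in the paper and is not reproduced here, so `toy_joint` is a hypothetical model used only to exercise the code. The sketch also assumes 0-based time indices, with the first slice contributing no term because it has no predecessor.

```python
from typing import Callable, Dict, Hashable, Sequence, Tuple

Slice = Tuple[int, ...]                              # discretised N_1..N_p at one time step
JointFn = Callable[[Hashable, Slice, Slice], float]  # P(G_k, N(t-1), N(t))


def score(gesture: Hashable, obs: Sequence[Slice], joint: JointFn) -> float:
    """Score(G_k, Obs): sum of the two-slice joint probability over time."""
    return sum(joint(gesture, obs[t - 1], obs[t]) for t in range(1, len(obs)))


def classify(gestures: Sequence[Hashable], obs: Sequence[Slice],
             joint: JointFn) -> Tuple[Hashable, Dict[Hashable, float]]:
    """Return the highest-scoring gesture together with all scores."""
    scores = {g: score(g, obs, joint) for g in gestures}
    return max(scores, key=scores.__getitem__), scores


# Hypothetical toy model: uniform prior over two gestures and one binary
# feature that takes the gesture's preferred value with probability 0.9.
PREFERRED = {"wave": 1, "raise": 0}


def toy_joint(g: Hashable, prev: Slice, cur: Slice) -> float:
    def p(value: int) -> float:
        return 0.9 if value == PREFERRED[g] else 0.1
    return 0.5 * p(prev[0]) * p(cur[0])
```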
However, complex or long gestures may exhibit many variations and cannot be
modeled efficiently with this model. We therefore also propose an HBN (Hidden
Bayesian Network) model that includes hidden nodes, allowing it to model more
complex gestures. The HBN has one hidden node per time slice, conditioned on the
gesture node; these hidden nodes increase the capacity of the model to describe
temporal states. The feature nodes of each time slice are conditioned on the hidden
node of that slice and on the gesture node. The joint probability of all nodes in the
network is therefore:

$$P\big(G, H(t-1), H(t), N_1(t-1), N_2(t-1), \ldots, N_p(t)\big) = P(G)\,P\big(H(t-1) \mid G\big)\,P\big(H(t) \mid G\big)\,P\big(N_1(t-1) \mid H(t-1), G\big) \cdots$$

where H(k) is a hidden node at time k.
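The factorization can be evaluated directly for discrete nodes. Because H(t-1) and H(t) are never observed, the likelihood of a two-slice observation window is obtained by summing the joint over all hidden-state pairs. A minimal sketch with hypothetical conditional probability tables; the factors elided by "⋯" in the equation are assumed to follow the same pattern, with every feature node N_i(t) conditioned on H(t) and G.

```python
from itertools import product
from typing import Dict, Hashable, List, Sequence

# Hypothetical CPT layouts (all names are illustrative):
#   prior[g]            = P(G = g)
#   hidden[g][h]        = P(H(t) = h | G = g), shared by every time slice
#   emit[(h, g)][i][v]  = P(N_{i+1}(t) = v | H(t) = h, G = g)
Prior = Dict[Hashable, float]
Hidden = Dict[Hashable, Dict[Hashable, float]]
Emit = Dict[tuple, List[Dict[int, float]]]


def hbn_joint(g: Hashable, h_prev: Hashable, h_cur: Hashable,
              n_prev: Sequence[int], n_cur: Sequence[int],
              prior: Prior, hidden: Hidden, emit: Emit) -> float:
    """P(G, H(t-1), H(t), N_1(t-1), ..., N_p(t)) for one two-slice window."""
    p = prior[g] * hidden[g][h_prev] * hidden[g][h_cur]
    for i, v in enumerate(n_prev):
        p *= emit[(h_prev, g)][i][v]  # P(N_i(t-1) | H(t-1), G)
    for i, v in enumerate(n_cur):
        p *= emit[(h_cur, g)][i][v]   # P(N_i(t) | H(t), G), assumed pattern
    return p


def hbn_window_likelihood(g: Hashable, n_prev: Sequence[int], n_cur: Sequence[int],
                          prior: Prior, hidden: Hidden, emit: Emit) -> float:
    """Marginalise the unobserved hidden nodes H(t-1) and H(t) out of the joint."""
    states = list(hidden[g])
    return sum(hbn_joint(g, hp, hc, n_prev, n_cur, prior, hidden, emit)
               for hp, hc in product(states, states))
```

Summing over hidden states is what lets the HBN explain one observed window through several internal configurations, which is the extra descriptive capacity the text attributes to the hidden nodes.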
3 Experimental Results and Discussion
To validate the proposed methods for human gesture recognition, we ran experiments
on the KUGDB [5] using six gestures performed by 10 subjects: 'bending a waist',
'walking at a place', 'raising a hand', 'sitting on a chair', 'standing up from a
chair' and 'waving a hand'. As features, we simply used the mean optical flow
vectors estimated in four sub-regions of the foreground: top-right, top-left,
bottom-right and bottom-left. In these experiments, the CRF, BNTW and HBN achieved
86%, 92% and 94% recognition accuracy, respectively.
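The feature extraction described above reduces each frame's optical flow field to four mean vectors. A sketch, assuming the dense flow field has already been computed and cropped to the foreground bounding box, and that quadrants are split at the integer midpoints; the function name and the (dx, dy) layout are assumptions.

```python
from typing import List, Sequence, Tuple

FlowVec = Tuple[float, float]  # (dx, dy) optical-flow vector at one pixel


def quadrant_flow_features(flow: Sequence[Sequence[FlowVec]]) -> List[float]:
    """Mean flow vector in each quadrant of a foreground crop.

    Returns an 8-dimensional feature ordered as top-right, top-left,
    bottom-right, bottom-left, each contributing (mean dx, mean dy).
    """
    h, w = len(flow), len(flow[0])
    if h < 2 or w < 2:
        raise ValueError("foreground crop must be at least 2x2 pixels")
    mr, mc = h // 2, w // 2
    quadrants = [
        (0, mr, mc, w),  # top-right
        (0, mr, 0, mc),  # top-left
        (mr, h, mc, w),  # bottom-right
        (mr, h, 0, mc),  # bottom-left
    ]
    features: List[float] = []
    for r0, r1, c0, c1 in quadrants:
        vectors = [flow[r][c] for r in range(r0, r1) for c in range(c0, c1)]
        features.append(sum(dx for dx, _ in vectors) / len(vectors))
        features.append(sum(dy for _, dy in vectors) / len(vectors))
    return features
```

Applied frame by frame, this yields the per-time-slice feature values that a Bayesian network model would then discretise or model directly.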
Human gestures have complex characteristics, and the BN is a natural and
appropriate tool for modeling this complexity. In this paper, we proposed two
methods for modeling human gestures using BNs. Extensions of these models can be
applied to the analysis of more complex and varied human gestures.
References
1. Wang, Sy., Quattoni, A., Morency, L. P., Demirdjian, D. and Darrell, T.: Hidden
Conditional Random Fields for Gesture Recognition. Proc. of IEEE Conf. on Com-
puter Vision and Pattern Recognition (2006) 1521-1527
2. Murphy, K., http://www.cs.ubc.ca/~murphyk/Bayes/bnintro.html
3. Heckerman, D., A Tutorial on Learning with Bayesian Networks. Technical Report
MSR-TR-95-06. Microsoft Research. March (1995)
4. Fenton, N., http://www.dcs.qmul.ac.uk/~norman/BBNs/BBNs.htm
5. Korea University Gesture Database, http://gesturedb.korea.ac.kr
Theoretical Aspects of Gestural Communication and Interaction
Analysis of the first set of movies shows a higher segmentation rate in the motion
of the virtual humanoid, which indicates a generally lower fluidity of the motion.
The table below summarizes the segmentation analysis.
Movie set                                      Movie                      Number of motion phases
Real subject movie                             Real                       10
First set of movies                            Off Line Algorithm         17
                                               Real Time Algorithm        16
Second set of movies                           1 Iteration                11
                                               2-3-5 Iterations           13
                                               10 Iterations, Off line    13
Last set of movies, with posture correction    Off Line Algorithm         12
                                               Real Time Algorithm        16
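The segmentation counts above come from the analysis pipeline used in the study, whose details lie outside this excerpt. As a hypothetical illustration of how such counts can be obtained, the sketch below treats a motion phase as a maximal run of frames whose motion-energy value (for example a quantity-of-motion signal) exceeds a threshold. Under this reading, more phases for the same performance indicate a more fragmented, less fluid movement. The threshold, the minimum length and the function name are assumptions, and the study's segmentation may differ.

```python
from typing import List, Optional, Sequence, Tuple


def motion_phases(energy: Sequence[float], threshold: float,
                  min_length: int = 1) -> List[Tuple[int, int]]:
    """Half-open (start, end) frame ranges where energy stays above threshold.

    Runs shorter than `min_length` frames are discarded as noise.
    """
    phases: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, e in enumerate(energy):
        if e > threshold and start is None:
            start = i  # a motion phase begins
        elif e <= threshold and start is not None:
            if i - start >= min_length:
                phases.append((start, i))  # the phase ends at a pause frame
            start = None
    if start is not None and len(energy) - start >= min_length:
        phases.append((start, len(energy)))  # phase still open at the end
    return phases
```

The number of motion phases is then simply `len(motion_phases(...))`; raising `min_length` suppresses brief flickers above the threshold that would otherwise inflate the count.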
Using the second definition of fluency, we could also identify the reason for this
higher segmentation rate. In the virtual humanoid, the lower body generally performs
no relevant motion: motion is concentrated in the arms and torso, while the legs only
compensate to maintain equilibrium. In the real subject, by contrast, the lower body
follows the upper body at all times.
The results of the space occupation analysis identify the motion of the virtual legs
as the main cause of the unbelievable motion. Because of the centre-of-gravity
constraint, the motion of the reconstructed human is reduced to vertical
oscillations, whereas the leg motion of the musician is generally closed, with small
local variations that follow the upper body motion.
The methodology presented here is an innovative and valuable complementary tool
for believability studies, which are currently often based solely on analysing viewer
feedback collected through questionnaires.
Acknowledgments.
This research has been supported by the E.U. Network of Excellence on Enactive
Interfaces. We would like to thank Marie-Julie Chagnon for the data from her
clarinet performance.
References
1. Camurri A., Mazzarino B., Volpe G. Analysis of Expressive Gesture: The EyesWeb Ex-
pressive Gesture Processing Library, Gesture-based Communication in Human-Computer
Interaction, Springer-Verlag, LNAI, Vol. 2915 / 2004, ISBN 3-540-21072-5, pp. 20-39.
2. Boulic R., Peinado M., Raunhardt D., “Challenges in Exploiting Prioritized Inverse Kine-
matics for Motion Capture and Postural Control”, Springer Verlag, LNAI, Vol. 3881 / 2006,
ISBN: 3-540-32624-3, Chapter: pp. 176 – 187.
The application of a framework for the micro-analysis
of speech and body movements in face-to-face interaction
Isabel Galhano Rodrigues 1
1 Faculdade de Letras da Universidade do Porto, Centro de Linguística da Universidade do Porto
Via Panorâmica, s/n,
4150 – 465 Porto, Portugal
{Isabel Galhano Rodrigues, irodrig}@letras.up.pt
Extended Abstract: This paper describes a functional framework for the micro-analysis of speech and body movements in face-to-face interaction. The framework consists of a set of functional categories (the conversational signals) and strategies (cf. Rodrigues, 1998, 2003) resulting from a synthesis of categories and principles developed within Ethnomethodological Conversation Analysis, Discourse Analysis and Contextualization Theory. Recent studies on multimodality were also taken into account (cf. Kendon, 1994; McNeill, 1992; Poggi, 2006; Poggi, Cirela, Zollo, Agustini, 2003).
Based on this theoretical background, different aspects of the relations between, on the one hand, communicative and expressive body movements (head and trunk movements, gestures, gaze, and facial expressions such as smiling and eyebrow raising) and, on the other hand, the verbal modalities they accompany (words, parts of words, clauses, hesitations and prosody) were described. For this purpose, speech and movement units were isolated, identified and interpreted within their context, always considering a) the relations between the functions, forms and meanings of the different nonverbal modalities, b) the relations between the functions, forms and meanings of these nonverbal modalities and the parts of speech they refer to, and c) the relations between the functions, forms and meanings of the verbal and nonverbal modalities of the different interaction partners.
As has already been shown for face-to-face interactions (Rodrigues, 2006, in press), every movement (and non-movement) of body parts accompanying speech, whether performed unconsciously or in order to communicate, can assume different functions at the same time at different interactional levels. The levels considered were the level of the social relations between interaction partners, the level of the logical-argumentative development of the theme, the level of the articulations between parts of speech, and the level of modality, concerning emotions, attitudes and expectations. Several nonverbal modalities performed by the speaker at the same moment of interaction do not necessarily have the same functions: for instance, a gesture can be made in order to close a theme; an upward gaze performed simultaneously with this gesture can signal the speaker's intention to continue (maintain) the turn, while head-nods, also performed at the same time, can show the speaker's agreement with the hearer or reinforce the idea expressed before. The same happens with the verbal modalities: a
linguistic element (or set of elements) can be used, for instance, to close a preceding theme and to maintain the turn; the prosodic features of this same element can simultaneously convey the speaker's attitude (mental state) towards what has been said and done or towards what he is going to say. In this way, both verbal and nonverbal modalities are polysemic and polyfunctional.
To illustrate this, a micro-analysis of a segment of a face-to-face interaction between three participants will be described. This micro-analysis provides information on how speech and body movements function with regard to their coordination and formal tendencies, that is, which kinds of functions are more likely to be performed by which modality, and whether simultaneously performed movements of different modalities have different conversational functions or collaborate in the same direction, contributing to redundancy.
This proposal for a multimodal micro-analysis of speech and body movements also offers the possibility of creating a notation scheme with abbreviations for forms and the corresponding functions attributed to them in context. This scheme is useful because it facilitates the systematization of analysis results. Further analyses can help improve or confirm the reliability of the results obtained.
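The notation scheme described above pairs abbreviated forms with the functions attributed to them in context, and a single unit can carry functions at several interactional levels at once. As an illustration only, the sketch below shows one way such annotations could be stored and queried for co-occurring units. The field names, level keys, form codes and the level assigned to each function in the example are hypothetical and are not the author's notation.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Short keys for the four interactional levels named in the text (keys are hypothetical).
LEVELS = ("social", "argumentative", "articulation", "modality")


@dataclass
class Unit:
    """One annotated speech or movement unit."""
    modality: str   # e.g. "gesture", "gaze", "head", "speech"
    form: str       # abbreviated form code (codes are illustrative)
    start: float    # onset in seconds
    end: float      # offset in seconds
    functions: Dict[str, str] = field(default_factory=dict)  # level -> function

    def __post_init__(self) -> None:
        unknown = set(self.functions) - set(LEVELS)
        if unknown:
            raise ValueError(f"unknown levels: {sorted(unknown)}")


def overlapping(units: List[Unit], t0: float, t1: float) -> List[Unit]:
    """Units whose time span intersects the window [t0, t1]."""
    return [u for u in units if u.start < t1 and u.end > t0]


def functions_in_window(units: List[Unit], t0: float, t1: float) -> List[Tuple[str, str, str]]:
    """Sorted (modality, level, function) triples active in the window."""
    return sorted(
        (u.modality, level, fn)
        for u in overlapping(units, t0, t1)
        for level, fn in u.functions.items()
    )


# The example from the text: three co-occurring units with different functions.
# The level chosen for each function is illustrative, not taken from the paper.
units = [
    Unit("gesture", "G-cl", 1.0, 2.0, {"argumentative": "close theme"}),
    Unit("gaze", "GZ-up", 1.2, 1.8, {"articulation": "maintain turn"}),
    Unit("head", "HN", 1.5, 2.5, {"social": "agreement"}),
]
for triple in functions_in_window(units, 1.0, 2.0):
    print(triple)
```

Keeping functions as a per-level mapping, rather than a single label, is what lets the same unit be polyfunctional without duplicating it.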
This micro-analytic framework can be applied not only in all disciplines that deal with different aspects of face-to-face interaction or human communication, such as Linguistics, Psycholinguistics, Social Psychology, Communication Sciences, Anthropology, Ethology and Artificial Intelligence, but also in areas where body movements play an important role in the enactment of emotions, such as Dance, Theater and Music.
References:
1. Kendon, A.: Gesture: Visible Action as Utterance. Cambridge University Press, Cambridge (2004)
2. McNeill, D.: Hand and Mind. University of Chicago Press, Chicago, IL (1992)
3. Poggi, I.: Le parole del corpo. Introduzione alla comunicazione multimodale. Carocci Editore, Roma (2006)
4. Poggi, I., Cirela, F., Zollo, A., Agustini, A.: The communicative system of touch. Lexicon, alphabet and norms of use. In: Camurri, A., Volpe, G., Mazzarino, B. (eds.) Proceedings of the Gesture Workshop, April 15-17, Genova (2003)
5. Rodrigues, I.: Os sinais conversacionais de alternância de vez. Granito Editores e Livreiros, Porto (1998)
6. Rodrigues, I.: Fala e movimentos do corpo na interação face a face. Estratégias de reparação e de (des)focalização e co-funções conversacionais na manutenção de vez. Gulbenkian/Fundação para a Ciência e Tecnologia, Lisboa (to be published in 2007)
7. Rodrigues, I.: Contar pelos dedos: um sinal de manutenção de vez na interacção face a face. Lusorama, Zeitschrift für Lusitanistik. Revista de Estudos sobre os Países de Língua Portuguesa (2006), 65-66
2 Antoni Jaume-i-Capó, Javier Varona, Francisco J. Perales
composed of all the unit vectors of the user's limbs. Formally, the representation of the orientation of a limb $l$ is
$$q_l = \left(u_x^{+}, u_x^{-}, u_y^{+}, u_y^{-}, u_z^{+}, u_z^{-}\right), \qquad (1)$$
where $u_x^{+}$ and $u_x^{-}$ are, respectively, the positive and negative magnitudes of the x-component $u_x$ of the limb's unit vector; note that $u_x = u_x^{+} - u_x^{-}$ with $u_x^{+}, u_x^{-} \geq 0$. The same applies to the components $u_y$ and $u_z$. We therefore build a histogram of limb orientations that represents the orientations of all the user's limbs. We propose two ways to build this histogram: the first accumulates limb orientations, and the second links limb poses. Which representation is preferable depends on the gesture set considered. The cumulative representation is more robust to tracking errors, but the set of recognizable gestures is much smaller. The linked representation, on the other hand, allows more gestures to be defined, although it is more sensitive to errors in the estimated limb orientations.
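Eq. (1) and the two histogram constructions can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Limbs are assumed to be given as pairs of 3D joint positions. The cumulative histogram is taken to be the bin-wise sum of all limb descriptors, and the linked histogram is taken to be their concatenation in a fixed limb order. Both are normalised to sum to 1 so they can be compared as distributions. All names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]
Limb = Tuple[Vec3, Vec3]  # (proximal joint, distal joint) positions


def split_signs(u: Sequence[float]) -> List[float]:
    """Eq. (1): each component c becomes (c+, c-) = (max(c, 0), max(-c, 0)),
    so that c = c+ - c- and both parts are non-negative."""
    out: List[float] = []
    for c in u:
        out.extend((max(c, 0.0), max(-c, 0.0)))
    return out


def limb_descriptor(limb: Limb) -> List[float]:
    """6-bin descriptor q_l of the limb's unit direction vector."""
    start, end = limb
    d = [e - s for s, e in zip(start, end)]
    norm = math.sqrt(sum(c * c for c in d))
    if norm == 0.0:
        raise ValueError("limb has zero length")
    return split_signs([c / norm for c in d])


def _normalise(h: List[float]) -> List[float]:
    total = sum(h)  # > 0, since the |components| of a unit vector sum to at least 1
    return [v / total for v in h]


def cumulative_histogram(limbs: List[Limb]) -> List[float]:
    """Assumed cumulative form: add all limb descriptors bin by bin (6 bins).
    Which limb contributed is lost: robust, but less discriminative."""
    h = [0.0] * 6
    for limb in limbs:
        for i, v in enumerate(limb_descriptor(limb)):
            h[i] += v
    return _normalise(h)


def linked_histogram(limbs: List[Limb]) -> List[float]:
    """Assumed linked form: concatenate descriptors in a fixed limb order
    (6 bins per limb), so every bin stays tied to one limb."""
    h: List[float] = []
    for limb in limbs:
        h.extend(limb_descriptor(limb))
    return _normalise(h)


arm = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))   # pointing along +x
leg = ((0.0, 0.0, 0.0), (0.0, -2.0, 0.0))  # pointing along -y
print(cumulative_histogram([arm, leg]))    # 6 bins
print(linked_histogram([arm, leg]))        # 12 bins
```

The trade-off in the text shows up directly: an arm along +x with a leg along -y gives the same cumulative histogram as an arm along -y with a leg along +x, whereas the linked histogram keeps the two poses apart.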
Temporal variation is handled with a temporal gesture representation. A gesture is composed of several body postures, so the feature vector representing a gesture is built by accumulating the postures involved in the gesture.
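A minimal sketch of this temporal representation follows. It assumes each posture is already encoded as a fixed-length histogram (for example, a limb-orientation histogram as described above) and that "accumulating the postures" means a bin-wise sum followed by normalisation. The function name is hypothetical.

```python
from typing import List, Sequence


def gesture_feature(postures: Sequence[Sequence[float]]) -> List[float]:
    """Accumulate per-frame posture histograms into one gesture feature vector.

    Summing discards posture order and, after normalisation, the number of
    frames, so the same gesture performed faster or slower gives a similar vector.
    """
    if not postures:
        raise ValueError("a gesture needs at least one posture")
    acc = [0.0] * len(postures[0])
    for h in postures:
        if len(h) != len(acc):
            raise ValueError("all posture histograms must have the same length")
        for i, v in enumerate(h):
            acc[i] += v
    total = sum(acc)
    return [v / total for v in acc] if total > 0 else acc


quick = gesture_feature([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]])
slow = gesture_feature([[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 2)
print(quick, slow)  # both about [0.667, 0.333]
```

Because order is discarded, a gesture and its time-reversed version produce the same vector; telling them apart would require keeping some temporal structure.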
An important goal of this work is that human-computer interaction should be performed using natural gestures. Whether a gesture feels natural depends on the user's experience. To cope with style variations, before the recognition process starts, the system asks the user to perform several of the allowable gestures in order to build user-specific gesture models in real time.
Finally, for the recognition phase, we use the Bhattacharyya coefficient as the measure for comparing the current gesture with a gesture model.
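The matching step can be sketched as below. The Bhattacharyya coefficient itself is a similarity (1 for identical normalised histograms, 0 for histograms with no bins in common); a common way to turn it into a distance, used here, is sqrt(1 - BC). Building each user-specific model as the mean of that user's training performances, and rejecting matches above a distance threshold, are assumptions for illustration rather than details given in the text. All names and the threshold value are hypothetical.

```python
import math
from typing import Dict, List, Optional, Sequence


def bhattacharyya(p: Sequence[float], q: Sequence[float]) -> float:
    """Bhattacharyya coefficient sum(sqrt(p_i * q_i)) of two normalised histograms."""
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Distance derived from the coefficient: sqrt(1 - BC), clamped at 0."""
    return math.sqrt(max(0.0, 1.0 - bhattacharyya(p, q)))


def build_model(samples: List[Sequence[float]]) -> List[float]:
    """Hypothetical user-specific model: bin-wise mean of the normalised feature
    vectors recorded while the user performs one gesture several times."""
    n = len(samples)
    return [sum(col) / n for col in zip(*samples)]


def recognise(feature: Sequence[float], models: Dict[str, Sequence[float]],
              max_distance: float = 0.5) -> Optional[str]:
    """Name of the closest model, or None if no model is within max_distance."""
    best_name, best_dist = None, max_distance
    for name, model in models.items():
        d = bhattacharyya_distance(feature, model)
        if d <= best_dist:
            best_name, best_dist = name, d
    return best_name


models = {"wave": [1.0, 0.0], "push": [0.0, 1.0]}
print(recognise([0.9, 0.1], models))  # wave
```

Returning None instead of always choosing the nearest model lets the application ignore movements that resemble none of the trained gestures.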
2 Conclusions
The most important contribution of this work is the definition of two gesture representations that can cope with variations in gestures between different users and performances, while also making real-time recognition possible. The complete system has been tested in a real-time application, gesture-based videogame control, and the results show that the presented approach to gesture recognition performs well (84.95% for the cumulative representation and 87.69% for the linked representation).
References
1. Boulic, R., Varona, J., Unzueta, L., Peinado, M., Suescun, A., Perales, F.: Evaluation of on-line analytic and numeric inverse kinematics approaches driven by partial vision input. Virtual Reality 10(1) (2006) 48-61
Person-Independent 3D Sign Language Recognition
G.A. ten Holt, J.F. Lichtenauer, M.J.T. Reinders, E.A. Hendriks
Information and Communication Theory Group, Delft University of Technology,
Netherlands
{g.a.tenholt, j.f.lichtenauer, m.j.t.reinders, e.a.hendriks}@tudelft.nl
Introduction
Vision-based automatic sign language recognition has many applications. We are developing an interactive electronic tutor with which young deaf children can practise sign language vocabulary. This requires real-time, vision-based, robust, person-independent recognition of isolated signs. Varying results have been obtained in the past with HMMs and Markov chain models. In our project, we do not model signs; instead, we use automatic feature selection to find the best representation of a sign. Other distinctive features are the use of 3D information, an adaptive skin model to find the hands, and Dynamic Time Warping for synchronising signs.
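Dynamic Time Warping aligns two sequences that unfold at different speeds, such as a child's sign and a reference sign. The abstract does not give the authors' DTW variant, so the sketch below shows only the classic formulation. It assumes each frame is a feature vector, uses the Euclidean distance between frames, and omits step constraints and windowing. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def euclidean(x: Frame, y: Frame) -> float:
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))


def dtw(a: List[Frame], b: List[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Classic DTW: minimal cumulative cost of aligning a with b, plus the
    warping path as (index in a, index in b) pairs."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = euclidean(a[i - 1], b[j - 1])
            # Each cell extends a diagonal match, a repeat in b, or a repeat in a.
            cost[i][j] = d + min(cost[i - 1][j - 1], cost[i - 1][j], cost[i][j - 1])
    # Backtrack from the end, always moving to the cheapest predecessor.
    path: List[Tuple[int, int]] = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        candidates = [(i - 1, j - 1), (i - 1, j), (i, j - 1)]
        i, j = min(candidates, key=lambda c: cost[c[0]][c[1]])
    path.reverse()
    return cost[n][m], path


reference = [(0.0,), (1.0,), (2.0,)]
slower = [(0.0,), (0.0,), (1.0,), (2.0,)]  # same movement, first frame held longer
print(dtw(reference, slower))  # (0.0, [(0, 0), (0, 1), (1, 2), (2, 3)])
```

The path is what "synchronising" means in practice: it tells which frame of the recorded sign corresponds to which frame of the reference, so features can be compared frame by frame afterwards.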
The task of our system is to give a child feedback on whether the sign he or she made was correct. This means that, rather than distinguishing a set of signs from each other, our recognition system must distinguish each sign from everything else: other signs, but also incorrect versions of the same sign. This amounts to building a set of one-class classifiers, one for each sign in our vocabulary.
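Because each sign must be separated from everything else, including incorrect versions of itself, each classifier can only be trained on examples of the correct sign. The sketch below is one simple, hypothetical way to realise such a one-class classifier per sign: it thresholds a dissimilarity score (for example, a DTW cost against the reference sign) using statistics of correct performances. The authors instead calculate the probability that a sign was correct. The class name, the choice of score and the parameter k are assumptions.

```python
import statistics
from typing import Dict, List


class OneClassSign:
    """Hypothetical one-class model for a single sign.

    Fitted only on dissimilarity scores of correct performances; a new
    performance is accepted if its score is at most k standard deviations
    above the training mean.
    """

    def __init__(self, k: float = 2.0) -> None:
        self.k = k
        self.mean = 0.0
        self.std = 0.0

    def fit(self, positive_scores: List[float]) -> "OneClassSign":
        if len(positive_scores) < 2:
            raise ValueError("need at least two correct examples")
        self.mean = statistics.fmean(positive_scores)
        self.std = statistics.stdev(positive_scores)
        return self

    def threshold(self) -> float:
        return self.mean + self.k * self.std

    def is_correct(self, score: float) -> bool:
        return score <= self.threshold()


def build_vocabulary(scores_per_sign: Dict[str, List[float]]) -> Dict[str, OneClassSign]:
    """One independent classifier per sign in the vocabulary."""
    return {sign: OneClassSign().fit(scores) for sign, scores in scores_per_sign.items()}


vocab = build_vocabulary({"house": [1.0, 2.0, 3.0], "tree": [0.5, 0.7]})
print(vocab["house"].threshold())  # 4.0
```

A single k trades false rejections of correct but unusual performances against false acceptances of near-miss signs; it would typically be tuned on held-out data.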
The system consists of several components that record the signs, extract features, match corresponding parts of different signs, and calculate the probability that a sign was correct. Figure 1 shows an overview.
System Components
Signs are recorded with two calibrated digital cameras. We use an adaptive skin colour model that can cope with different conditions. With this model, each frame of a sign video is divided into skin and non-skin pixels. The skin blobs are tracked through subsequent frames, and several properties are extracted from them. Position information from both cameras is used to calculate the 3D position. All properties are extracted for
Figure 1: Flow diagram of the sign recognition system. Input from the 2 cameras is combined into 3D features in the feature extraction step. [Diagram blocks: Camera 1, Camera 2 (320 x 240, 25 fps), RGB values, skin model, skin detection, tracking (hands & head), feature extraction, start/end detection, Dyn. Time Warping with reference sign, synchronised sign, classification (correct/incorrect).]
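The text states that position information from both calibrated cameras is combined into a 3D position, but not how. One standard option is midpoint triangulation, sketched below. It assumes each camera's calibration has already been used to convert a blob's pixel position into a viewing line (camera centre plus direction) in a shared world frame. The function name is hypothetical, and the line parameters are not constrained to lie in front of the cameras.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def _dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(c1: Sequence[float], d1: Sequence[float],
                         c2: Sequence[float], d2: Sequence[float]) -> Tuple[Vec3, float]:
    """Midpoint triangulation of two viewing lines c1 + s*d1 and c2 + t*d2.

    Returns the point halfway between the closest points of the two lines and
    the gap between those points (0 when the lines intersect exactly).
    """
    w0 = [p - q for p, q in zip(c1, c2)]
    a, b, c = _dot(d1, d1), _dot(d1, d2), _dot(d2, d2)
    d, e = _dot(d1, w0), _dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("viewing lines are (nearly) parallel")
    s = (b * e - c * d) / denom  # parameter of the closest point on line 1
    t = (a * e - b * d) / denom  # parameter of the closest point on line 2
    p1 = [p + s * v for p, v in zip(c1, d1)]
    p2 = [p + t * v for p, v in zip(c2, d2)]
    mid = ((p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0, (p1[2] + p2[2]) / 2.0)
    gap = math.sqrt(sum((x - y) ** 2 for x, y in zip(p1, p2)))
    return mid, gap


# Two cameras 2 units apart, both looking at the point (1, 1, 0).
print(triangulate_midpoint((0.0, 0.0, 0.0), (1.0, 1.0, 0.0),
                           (2.0, 0.0, 0.0), (-1.0, 1.0, 0.0)))
```

The returned gap is a useful sanity check: with noisy tracking the two lines rarely meet, and a large gap can flag a wrong blob correspondence between the cameras.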
Skin Color Profile Capture for Scale and Rotation Invariant Hand Gesture Recognition
R. Bastos 1, M.S. Dias 1,2
1 ADETTI Av. das Forças Armadas, Edifício ISCTE 1600-082 Lisboa, Portugal, rafael.bastos@sapo.pt
2 MLDC - Microsoft Language Development Center, Edifício Qualidade C1-C2, Av. Prof. Doutor Aníbal Cavaco Silva, Tagus Park, 2744-010 Porto Salvo, Portugal, Miguel.Dias@microsoft.com
Extended Abstract
In this paper we present a new approach to real-time, rotation-invariant hand pose detection, based on a novel technique for computing the best hand skin profile. This skin profile is used to classify each pixel in the current video frame as skin or background. It consists of a group of 3D line segments (vectors) whose control points are representative HSV (Hue-Saturation-Value) 3D coordinates extracted during the skin capture stage. At runtime, each pixel is classified by measuring the distance from its HSV 3D cylindrical coordinates to each vector of the current skin profile. The space transformation from the HSV cone to HSV 3D cylindrical coordinates is needed because of the Hue discontinuity at 360° in the HSV model, which would prevent any direct arithmetic comparison between Hue values. After skin/background segmentation, we construct efficient and reliable scale- and rotation-invariant hand pose descriptors by introducing a new technique, referred to as "oriented gesture descriptors". These descriptors are grayscale image representations of the hand gesture captured during gesture acquisition. Finally, hand pose recognition is computed with a light-invariant template matching technique [BD05] between the acquired gestures/descriptors and the currently tracked gesture.
The system takes into account that a moving hand in a dynamic lighting environment can present several variations of the predominant skin tone, unlike approaches such as [MOC06] that use a single color tone as a reference. To acquire the Skin Profile, the user selects a region of interest in the captured image. Assuming a general conversion from the RGB to the HSV color space, we convert these components to cylindrical coordinates (HSV 3D coordinates) for every pixel in the captured region of interest, to avoid Hue discontinuities during comparison. The initial vector of our Skin Profile is the segment between the HSV 3D coordinates of the pixels with the minimum and maximum z values. This model is refined iteratively with additional points by computing the distance from the 3D coordinates of every other pixel in the calibration region to each vector of the growing profile. For each pixel and each profile vector, the resulting distance is compared with a predetermined threshold ε, which sets the level of discrimination for the compared coordinates. If the distance d is above ε, the newly evaluated point is added to the profile, placed between the points of the compared profile vector, forming a new vector. As a final step of skin profile creation, we apply the Douglas-Peucker [DP73] simplification procedure with the same threshold ε to reduce the number of 3D points in the profile for efficiency. During
runtime, for each pixel in the current video frame, we compute the minimum distance between the pixel's 3D coordinates and the vectors of the current skin profile, obtaining a binary mask of the frame in which 1 marks skin areas (distances below ε) and 0 marks background areas (distances above ε). By applying a recursive filter based on connected-component evaluation, we track connected areas; we assume that the object with the largest area is the hand we want to track. A Gesture Descriptor is a grayscale version of a gesture, scaled to a maximum size of $n \times n$ pixels without compromising the initial aspect ratio of the acquired gesture pose. The descriptor's data is the $n \times n$ grayscale image patch ($g_i$) centered at $(x_c, y_c)$. We introduce the novel concept of oriented gesture descriptors, which uses the gesture's orientation histogram to compute the patch's main orientation. This orientation histogram is computed from the horizontal and vertical derivatives of the image patch. The final rotation-invariant descriptor ($g_r$) is obtained by applying a rigid body transformation (a rotation by the pre-computed main orientation) to the grayscale patch $g_i$. Irrespective of the orientation of the patch descriptor $g_i$, the invariant gesture descriptor $g_r$ is always oriented towards the patch's main direction. Gesture matching is performed with a light-invariant template matching technique applied to the invariant grayscale templates. This technique uses the image mean and standard deviation to obtain a normalized correlation value between the currently tracked gesture and those in the gesture database.
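The runtime skin test described above can be sketched as follows. This is an illustration under assumptions, not the authors' code. The exact cone-to-cylinder mapping is not given in the text, so the sketch uses (S*cos H, S*sin H, V) with H treated as an angle. The profile is taken to be an ordered list of control points whose consecutive pairs form its vectors. The iterative profile construction and the Douglas-Peucker simplification are not shown, and all names are hypothetical.

```python
import colorsys
import math
from typing import List, Sequence, Tuple

Point3 = Tuple[float, float, float]


def hsv_point(r: float, g: float, b: float) -> Point3:
    """RGB in [0, 1] -> (S*cos H, S*sin H, V).

    Putting hue on a circle removes the jump at 360 degrees, so hues just
    below 360 and just above 0 map to nearby points.
    """
    h, s, v = colorsys.rgb_to_hsv(r, g, b)  # h, s, v all in [0, 1]
    angle = 2.0 * math.pi * h
    return (s * math.cos(angle), s * math.sin(angle), v)


def segment_distance(p: Sequence[float], a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance from point p to the segment from a to b."""
    ab = [bi - ai for ai, bi in zip(a, b)]
    ap = [pi - ai for ai, pi in zip(a, p)]
    length_sq = sum(c * c for c in ab)
    t = 0.0
    if length_sq > 0.0:
        # Project p onto the segment's line and clamp to the segment.
        t = max(0.0, min(1.0, sum(x * y for x, y in zip(ap, ab)) / length_sq))
    closest = [ai + t * c for ai, c in zip(a, ab)]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, closest)))


def is_skin(rgb: Tuple[float, float, float], profile: List[Point3], eps: float) -> bool:
    """profile: ordered control points (at least two); consecutive points form
    the profile's vectors. Skin if the nearest vector is closer than eps."""
    p = hsv_point(*rgb)
    return min(segment_distance(p, a, b) for a, b in zip(profile, profile[1:])) < eps


profile = [hsv_point(0.9, 0.6, 0.5), hsv_point(0.8, 0.5, 0.4)]  # hypothetical skin samples
print(is_skin((0.9, 0.6, 0.5), profile, 0.05))  # True
print(is_skin((0.0, 0.0, 1.0), profile, 0.05))  # False: pure blue
```

Each pixel is tested against every profile vector, so the cost grows with the number of vectors; this is why simplifying the profile matters for real-time use.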
During runtime, gestures are matched using an effective yet simple descriptor classification algorithm based on a binary identification value, obtained by evaluating certain regions of the image relative to its average. This classification scheme speeds up the template matching procedure, since it reduces the number of template candidates in the database. To obtain a final result for the tracked gesture, we use an occurrence histogram approach based on a fixed time window. For each frame of this time window, we construct a statistical vector in which each position corresponds to one of the gestures loaded in the database. This vector holds the accumulated sum of the correlation values obtained for each descriptor, as well as the number of occurrences of that descriptor. The current matched gesture is the descriptor with the highest correlation value in this histogram, also taking into account the number of occurrences. The paper presents results of efficient hand pose gesture recognition, with examples taken from Portuguese Sign Language signs in spelled-language recognition use cases. The gestures are identified even when they are rotated (to a degree specified by the user) relative to those in the gesture database. In these examples, we also explore the scale-invariant property of our gesture recognition algorithm. The work also discusses the development of multimodal human-computer interaction based on hand pose gesture recognition, to be applied in other interaction scenarios such as industrial augmented reality frameworks.
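A sketch of the runtime stages: binary pre-classification, correlation against same-ID templates only, and the fixed-window occurrence histogram. The regions used for the binary identification value and the way correlation sums and occurrence counts are combined are not specified above, so the 2 × 2 grid, the window length, the tie-breaking rule and all names are assumptions.

```python
from collections import deque
import numpy as np

def binary_id(patch, grid=2):
    """Hypothetical binary signature: one bit per grid cell, set when the cell is brighter than the patch mean."""
    p = patch.astype(float)
    h, w = p.shape[0] // grid, p.shape[1] // grid
    bits = 0
    for i in range(grid):
        for j in range(grid):
            cell = p[i * h:(i + 1) * h, j * w:(j + 1) * w]
            bits = (bits << 1) | int(cell.mean() > p.mean())
    return bits

def ncc(a, b):
    """Zero-mean normalised cross-correlation of two equally sized patches."""
    a = a.astype(float).ravel()
    b = b.astype(float).ravel()
    a = (a - a.mean()) / (a.std() + 1e-12)
    b = (b - b.mean()) / (b.std() + 1e-12)
    return float(np.mean(a * b))

def build_index(templates):
    """Group (label, template) pairs by binary ID so only same-ID templates are correlated at runtime."""
    index = {}
    for label, tpl in templates:
        index.setdefault(binary_id(tpl), []).append((label, tpl))
    return index

def match_frame(patch, index):
    """Best (label, correlation) among templates sharing the patch's binary ID, or None."""
    candidates = index.get(binary_id(patch), [])
    scored = [(label, ncc(patch, tpl)) for label, tpl in candidates]
    return max(scored, key=lambda lc: lc[1]) if scored else None

class OccurrenceHistogram:
    """Per-gesture correlation sums and occurrence counts over a fixed window of frames."""
    def __init__(self, labels, window=15):
        self.labels = list(labels)
        self.frames = deque(maxlen=window)   # (label, correlation) for each recent frame

    def update(self, label, corr):
        self.frames.append((label, corr))
        sums = dict.fromkeys(self.labels, 0.0)
        counts = dict.fromkeys(self.labels, 0)
        for lab, c in self.frames:
            sums[lab] += c
            counts[lab] += 1
        # Assumed rule: largest accumulated correlation, ties broken by occurrence count.
        return max(self.labels, key=lambda lab: (sums[lab], counts[lab]))
```

Per frame, `match_frame` yields a (label, correlation) pair that is fed to `OccurrenceHistogram.update`; the returned label is the smoothed decision for the window.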
[BD05] Bastos, R., Dias, J.M.S., "Fully Automated Texture Tracking Based on Natural Features Extraction and Template Matching", in ACM SIGCHI International Conference on Advances in Computer Entertainment Technology, Valencia, Spain, 2005.
[MOC06] Malima, A., Ozgur, E., Cetin, M., T. K., "A Fast Algorithm for Vision-Based Hand Gesture Recognition for Robot Control", in IEEE 14th Signal Processing and Communications Applications, April 2006.
Robust tracking for processing of videos of communication's gestures
Frédérick Gianni, Christophe Collet, and Patrice Dalle
Institut de Recherche en Informatique de Toulouse,
Université Paul Sabatier, Toulouse, France
{gianni, collet, dalle}@irit.fr
In the context of research on gestural man-machine communication and on sign languages (SL), we study image processing tools able to automate part of the video annotation process and then to build gesture recognition systems. Here, the gestures should be performed naturally, without any constraints. Hand movements can therefore be very fast, particularly in SL, and one of the major problems is to find a robust tracking method. In this paper, we present an enhanced tracking method using particle filtering.
Tracking of body parts: Human motion tracking requires accurate feature detection and feature correspondence between frames using position, velocity and intensity information. In our approach, feature correspondence is achieved using statistical estimators via a particle filter for the head and the two hands. Because the particle filter models uncertainty, it provides a robust framework for tracking the hands of a person communicating in French Sign Language.
Particle Filter (PF): a particle filter aims at estimating a sequence of hidden parameters $x_t$ from the observed data $z_t$ alone. The idea is to approximate the probability distribution by a weighted sample set
$$\{(s_t^{(n)}, \pi_t^{(n)})\}_{n=1}^{N},$$
where $N$ is the number of samples used. Each sample $s$ represents one state of the tracked object, with a corresponding discrete sampling probability $\pi$.
The state is modelled as $s_t = [x, y, \dot{x}, \dot{y}, \ddot{x}, \ddot{y}]^\top$, the position, velocity and acceleration of sample $s$ in the observation at time $t$. Three states are maintained during tracking, one for each tracked body part: the head and the two hands are tracked separately, each represented by its own sample set. In the prediction phase, the samples are propagated through a dynamic model, a first-order auto-regressive process
$$x_t = S\,x_{t-1} + \eta,$$
where $\eta$ is a multivariate Gaussian random variable and $S$ a transition matrix.
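The prediction step can be sketched with NumPy. The text only says that S is a transition matrix; the constant-acceleration form below, the time step, the noise levels and the function names are assumptions.

```python
import numpy as np

def transition_matrix(dt=1.0):
    """Assumed constant-acceleration transition matrix S for the state [x, y, vx, vy, ax, ay]."""
    S = np.eye(6)
    S[0, 2] = S[1, 3] = dt              # position += velocity * dt
    S[0, 4] = S[1, 5] = 0.5 * dt ** 2   # position += 0.5 * acceleration * dt^2
    S[2, 4] = S[3, 5] = dt              # velocity += acceleration * dt
    return S

def predict(particles, S, noise_std, rng):
    """Propagate an (N, 6) particle array through x_t = S x_{t-1} + eta, eta ~ N(0, diag(noise_std^2))."""
    eta = rng.normal(0.0, noise_std, size=particles.shape)
    return particles @ S.T + eta

# Usage: one sample set per tracked body part (head, left hand, right hand).
rng = np.random.default_rng(0)
S = transition_matrix(dt=1.0)
hand = np.tile([100.0, 80.0, 2.0, 0.0, 0.0, 0.0], (200, 1))
hand = predict(hand, S, noise_std=np.array([2.0, 2.0, 1.0, 1.0, 0.5, 0.5]), rng=rng)
```

Keeping velocity and acceleration in the state lets the prediction follow fast hand movements between frames, while the Gaussian noise spreads the particles to cover unmodelled changes of direction.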
We apply the particle filter in a colour-based context to achieve robustness against non-rigidity and rotation. The observation density $p(z_t \mid x_t)$ is modelled as a skin colour distribution using the histogram back-projection method.
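One way to turn histogram back-projection into particle weights is sketched below. The hue/saturation colour space, the bin count, the square window around each particle and the function names are assumptions made for illustration, not details given by the authors.

```python
import numpy as np

def skin_histogram(skin_hs, bins=32):
    """Normalised 2-D hue/saturation histogram of training skin pixels (values in [0, 1])."""
    hist, _, _ = np.histogram2d(skin_hs[:, 0], skin_hs[:, 1], bins=bins, range=[[0, 1], [0, 1]])
    return hist / hist.max()

def back_project(image_hs, hist):
    """Histogram back-projection: replace each pixel by the skin-histogram value of its colour bin."""
    bins = hist.shape[0]
    idx = np.clip((image_hs * bins).astype(int), 0, bins - 1)
    return hist[idx[..., 0], idx[..., 1]]

def particle_weights(particles, prob_map, half_size=5):
    """Normalised weights: mean back-projected skin probability in a window around each particle's (x, y)."""
    height, width = prob_map.shape
    w = np.zeros(len(particles))
    for i, (x, y) in enumerate(particles[:, :2]):
        xi, yi = int(round(x)), int(round(y))
        x0, x1 = max(xi - half_size, 0), min(xi + half_size + 1, width)
        y0, y1 = max(yi - half_size, 0), min(yi + half_size + 1, height)
        if x0 < x1 and y0 < y1:              # window overlaps the image
            w[i] = prob_map[y0:y1, x0:x1].mean()
    w += 1e-12                               # keep weights strictly positive
    return w / w.sum()
```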
In the particle filter, a resampling step is used to avoid degeneracy of the algorithm, that is, the situation in which all but one of the importance weights are close to zero. Stratified resampling is used because it is optimal in terms of variance.
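Stratified resampling draws one uniform sample in each of N equal strata of [0, 1) and keeps the particle whose cumulative-weight interval contains it. The sketch below also resamples only when the effective sample size drops below half the particle count; that trigger is a common convention assumed here, not something stated above, and the function names are hypothetical.

```python
import numpy as np

def stratified_resample(weights, rng):
    """Indices of resampled particles: one uniform draw in each of N equal strata of [0, 1)."""
    n = len(weights)
    positions = (np.arange(n) + rng.random(n)) / n
    cumulative = np.cumsum(weights)
    cumulative[-1] = 1.0                                   # guard against round-off
    return np.searchsorted(cumulative, positions, side="right")

def effective_sample_size(weights):
    """1 / sum(w^2): N for uniform weights, 1 when a single weight dominates."""
    return 1.0 / np.sum(np.square(weights))

def resample_if_degenerate(particles, weights, rng, threshold=0.5):
    """Resample when the effective sample size falls below threshold * N (assumed trigger)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    if effective_sample_size(weights) >= threshold * len(weights):
        return particles, weights
    idx = stratified_resample(weights, rng)
    return particles[idx], np.full(len(weights), 1.0 / len(weights))
```

With `side="right"`, a particle of zero weight has an empty interval and can never be selected, so degenerate particles are discarded.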
Sign Language Processing
If the lengths of the three samples differ, we select the median length as the length of the new sample, and the other two samples are warped to this length. Because the gestures are static, a simple linear warping is adequate for this job. The generated new samples are used as adaptation data to modify the HMMs.
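The warping step can be sketched as linear resampling of every feature dimension to the median length. Representing a sample as a (frames × features) array and the function names are assumptions; the authors' exact warping may differ.

```python
import numpy as np

def warp_to_length(seq, length):
    """Linearly resample a (T, D) feature sequence to `length` frames."""
    seq = np.asarray(seq, dtype=float)
    t_src = np.arange(seq.shape[0])
    t_new = np.linspace(0.0, seq.shape[0] - 1, num=length)
    return np.column_stack([np.interp(t_new, t_src, seq[:, d]) for d in range(seq.shape[1])])

def align_to_median_length(samples):
    """Warp a list of samples (three in the text) to the median of their lengths."""
    target = int(np.median([len(s) for s in samples]))
    return [warp_to_length(s, target) for s in samples]
```

The sample that already has the median length is returned unchanged, since its resampling grid coincides with its original frames.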
The MLLR (maximum likelihood linear regression) approach to signer adaptation requires an initial signer-independent continuous-density HMM system. MLLR takes some adaptation data from a new signer and updates the model mean parameters to maximize the likelihood of the adaptation data. The other HMM parameters are not adapted, since the main differences between signers are assumed to be characterized by the means [1]. The transformation matrix W is obtained by solving a maximization problem using the Expectation-Maximization (EM) technique [2].
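In Leggetter and Woodland's formulation [1], each adapted mean is $\hat{\mu} = W\xi$, with the extended mean vector $\xi = [1, \mu^\top]^\top$. The sketch below estimates a single global W for the simplified case in which all Gaussians share an identity covariance; the M-step then reduces to weighted least squares, $W = Z G^{-1}$ with $Z = \sum_t \sum_m \gamma_m(t)\, o_t \xi_m^\top$ and $G = \sum_m \sum_t \gamma_m(t)\, \xi_m \xi_m^\top$. The occupation probabilities $\gamma_m(t)$ are assumed to come from an E-step such as forward-backward, and the function names are hypothetical; the general case with diagonal covariances solves for W row by row.

```python
import numpy as np

def extend(means):
    """Extended mean vectors xi = [1, mu] for an (M, D) array of Gaussian means."""
    return np.hstack([np.ones((means.shape[0], 1)), means])

def estimate_mllr_transform(means, frames, gammas):
    """One M-step for a global MLLR mean transform W (D x (D+1)), identity covariances assumed.
    means: (M, D); frames: (T, D) adaptation observations; gammas: (T, M) occupation probabilities."""
    xi = extend(means)
    occ = gammas.sum(axis=0)                  # total occupation of each Gaussian
    Z = frames.T @ gammas @ xi                # sum_t sum_m gamma_m(t) o_t xi_m^T, shape (D, D+1)
    G = (xi * occ[:, None]).T @ xi            # sum_m occ_m xi_m xi_m^T, shape (D+1, D+1)
    return np.linalg.solve(G.T, Z.T).T        # W = Z G^{-1}

def adapt_means(means, W):
    """Apply mu_hat = W xi to every Gaussian mean."""
    return extend(means) @ W.T
```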
3 Experiments
To verify the generalization capability of the proposed method, experiments were performed on a vocabulary of 350 static sign words. Experimental data were collected from 6 signers, labelled A-F. Each signer performed each of the 350 isolated words 4 times. Using cross-validation, the 20 groups of data from five signers (5 signers × 4 repetitions) are used as training samples, and the remaining signer is treated as the new signer.
With the clustering method, we obtain 107 basic units of P&O, 69 basic units of LH and 95 basic units of RH. All these units are covered by 136 of the sign words. From one group of data of these 136 signs performed by the remaining signer, we generate adaptation data for all 350 sign words (Generate).
For comparison, one group of data of all 350 sign words from the remaining signer is used directly as adaptation data (Ungenerate). Another group of data from the same signer is used as the unregistered test set. The recognition results are shown in Table 1.
Table 1. The recognition results of 350 sign words
New signer   Without MLLR   MLLR (Ungenerate)   MLLR (Generate)
A            61.7%          81.8%               78.1%
B            64.0%          85.2%               83.3%
C            59.4%          80.6%               75.7%
D            68.0%          84.6%               80.5%
E            59.7%          79.4%               74.6%
F            59.1%          78.5%               74.4%
Average      62.0%          81.7%               77.8%
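As a quick check, the Average row of Table 1 is the arithmetic mean over the six new signers. The snippet below recomputes it from the table values and derives the average gain of adaptation with generated data over no adaptation, and the gap to adaptation with real data; the variable names are only illustrative.

```python
# Recognition rates (%) per new signer, copied from Table 1:
# (without MLLR, MLLR Ungenerate, MLLR Generate)
TABLE_1 = {
    "A": (61.7, 81.8, 78.1),
    "B": (64.0, 85.2, 83.3),
    "C": (59.4, 80.6, 75.7),
    "D": (68.0, 84.6, 80.5),
    "E": (59.7, 79.4, 74.6),
    "F": (59.1, 78.5, 74.4),
}

def column_averages(table):
    """Mean of each column over all signers, rounded to one decimal as in the Average row."""
    rows = list(table.values())
    return tuple(round(sum(col) / len(rows), 1) for col in zip(*rows))

without, ungenerate, generate = column_averages(TABLE_1)
print(without, ungenerate, generate)                                  # 62.0 81.7 77.8
print(round(generate - without, 1), round(ungenerate - generate, 1))  # 15.8 3.9
```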
References
1. C. Leggetter and P. Woodland, “Maximum likelihood linear regression for speaker
adaptation of HMMs”, Computer Speech and Language, 1995, vol. 9, pp. 171–186.
2. Microsoft Corporation, Cambridge University Engineering Department: The HTK Book, Version 3.2, December 2002, pp. 135-143.
Signing Avatar: Say hello to Elsi!
Michael Filhol, Annelies Braffort and Laurence Bolot
LIMSI/CNRS
Campus d’Orsay, Bat 508
BP 133
91403 Orsay cedex
France
first.lastname@limsi.fr
Abstract
Limsi recently initiated a signing avatar project called Elsi (ELSI: Elsi is Limsi's SIgner), whose purpose is to generate French Sign Language (LSF) for the French deaf community. It is being built on enhanced linguistic foundations in order to make the discourse level more consistent and acceptable.
LSF is the French deaf community's first language. Linguistic studies of LSF show heavy and consistent use of the "signing space", i.e. the portion of space in which the signs are performed. Consider the following example: "Limsi has two buildings: no. 502 and no. 508." The first part of this sentence would be signed with two occurrences of the same sign [BUILDING] performed at different locations, thereby giving the two locations a special relevance in the signing space. Each of them would then in turn be pointed at as a reference to the corresponding building, immediately followed by its number.
Therefore, when modelling the signs of the language, i.e. at the lexical level, all possible context-driven variations must be considered, as they are fully part of the language. At the discourse level, sentence construction rules must also account for the extensive use of space in LSF, and indeed in all sign languages. Current computer models such as SigML for the lexicon (a computer-friendly representation based on HamNoSys) and HPSG for syntax seem to lack some of these crucial features, hence our wish to suggest a new approach to sign language representation.
We decided that the evaluation process would be incremental, so that each step forward rests on a reliable basis. To this end, an avatar animation platform is presently under development at Limsi. Any automatic sign production should abide by the SL linguistic constraints. Fig. 1 sketches out the structure of the Elsi sign production software.
Sentence generation (M1) is based on a model of the signing space (K2) and uses a knowledge base containing useful spatio-temporal structures (K1), a common feature of LSF that will not be discussed in this paper. We use a simple representation of the
signing space for now. It allows the production of isolated gap clauses with a predefined format. The signing space contains the relevant signed elements and their respective locations, orientations, sizes, etc. The gap clauses we use include both manual and non-manual signs and depend on variables. Production is then carried out by signing the units back to back.
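A toy sketch of a signing-space model and a gap clause, using the two-buildings example from the abstract. The data structures, the clause format and the textual encoding of signing units are all hypothetical; Elsi's actual representation is not described here.

```python
from dataclasses import dataclass, field

@dataclass
class Locus:
    """A signed element placed in the signing space (hypothetical representation)."""
    name: str
    position: tuple          # (x, y, z), signer-centred; units are an assumption
    orientation: float = 0.0
    size: float = 1.0

@dataclass
class SigningSpace:
    loci: dict = field(default_factory=dict)

    def place(self, name, position):
        self.loci[name] = Locus(name, position)
        return self.loci[name]

def fill_two_entity_clause(space, sign, numbers, positions):
    """Hypothetical gap clause "[SIGN]@P1 [SIGN]@P2 POINT@P1 NUMBER(n1) POINT@P2 NUMBER(n2)".
    The variables fill the gaps; the result lists the units to sign back to back."""
    units = []
    for number, pos in zip(numbers, positions):
        space.place(number, pos)          # the sign's location becomes a reusable locus
        units.append(f"{sign}@{pos}")
    for number in numbers:
        units.append(f"POINT@{space.loci[number].position}")
        units.append(f"NUMBER({number})")
    return units

space = SigningSpace()
units = fill_two_entity_clause(space, "BUILDING", ["502", "508"],
                               [(-0.3, 0.2, 0.4), (0.3, 0.2, 0.4)])
```

Storing each placed sign as a locus is what lets the later pointing units refer back to the same location, which is the kind of spatial reference the abstract says lexical and sentence models must capture.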
The sign generation module (M2) uses a knowledge base (K3) containing sign descriptions that account for context influences, spatial relevance, discourse genre or "intonation". These descriptions use geometric constraints. Signs are no longer regarded as sets (tuples) of universal parameters, as in Stokoe-based approaches, but rather as dynamic spatial geometric figures. A description may build any useful set of geometric objects, such as planes or points in space, and then constrain body segments so that they move according to the sign being described.
Elsi, the avatar, is made of a bone skeleton onto which a skin is mapped. The skeleton and the skin are built in dedicated animation software (3dsMax) and then exported (K4) so that they can be fed into the animation module. The predefined and generated parts of the signed production are also built in dedicated animation software and then exported in an XML format (K5). This output simply lists the successive orientations of each bone over time. In the animation engine (M3 module), the skin automatically follows the animated skeleton, with the bending and stretching at each joint handled in real time. Between sequences, bone orientations are interpolated. To avoid bones moving through one another, a software collision-avoidance planner is under study.
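The interpolation method between sequences is not specified. A common choice for bone orientations stored as unit quaternions is spherical linear interpolation (slerp), sketched below under that assumption; the (w, x, y, z) quaternion layout, the per-bone dictionary and the function names are also assumptions.

```python
import math

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z), t in [0, 1]."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:                      # q and -q are the same rotation: take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:                   # nearly identical: normalised linear interpolation
        q = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in q))
        return tuple(c / norm for c in q)
    theta = math.acos(dot)
    s0 = math.sin((1.0 - t) * theta) / math.sin(theta)
    s1 = math.sin(t * theta) / math.sin(theta)
    return tuple(s0 * a + s1 * b for a, b in zip(q0, q1))

def transition(end_pose, start_pose, n_frames):
    """In-between frames from the last pose of one sequence to the first pose of the next.
    Poses map bone names to quaternions."""
    frames = []
    for k in range(1, n_frames + 1):
        t = k / (n_frames + 1)
        frames.append({bone: slerp(end_pose[bone], start_pose[bone], t) for bone in end_pose})
    return frames
```

Slerp rotates each bone at constant angular speed along the shortest arc, which avoids the uneven motion of interpolating Euler angles; it does not prevent bones from passing through one another, which is why a separate collision-avoidance step would still be needed.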
Fig. 1. Structure of the ELSI animation platform.
The lexical and sentence generation modules are under development. The animation module is close to completion. A qualitative evaluation on gap clauses is planned for the coming months. Once the three modules have been tested enough to be considered functional, we will start integrating the knowledge base on spatio-temporal structures and generating the corresponding sign sequences. The aim is to keep increasing the automation and capabilities of the software.
Sequential Belief-Based Fusion of Manual and Non-Manual Signs
Oya Aran¹, Thomas Burger², Alice Caplier³, Lale Akarun¹
¹Dep. of Computer Engineering, Bogazici University, 34342 Istanbul, Turkey
aranoya@boun.edu.tr, akarun@boun.edu.tr
²France Telecom R&D, 28 ch. Vieux Chêne, Meylan, France
thomas.burger@orange-ftgroup.com
³GIPSA-lab, 46 av. Felix Viallet, Grenoble, France
alice.caplier@lis.inpg.fr
Abstract. This work aims to recognize signs which have both manual and non-
manual components by providing a sequential belief-based fusion mechanism.
We propose a methodology based on belief functions for fusing extracted
manual and non-manual information in a sequential two-step approach.
Keywords. Sign language recognition, manual and non-manual signs, hidden
Markov models, belief functions
1 Introduction
In sign languages, the message is contained not only in hand motion and shapes (manual signs, MS) but also in facial expressions, head/shoulder motion and body posture (non-manual signs, NMS). Most Sign Language Recognition (SLR) systems concentrate on hand gesture analysis only; only a few studies integrate MS and NMS for SLR (see [1] for a review). We propose a methodology for integrating manual and non-manual information in a sequential approach. The methodology is based on (1) identifying the level of uncertainty of a classification decision, (2) identifying sign clusters, and (3) identifying the correct sign based on MS and NMS.
2 Sequential Belief Based Fusion
The sequential belief-based fusion technique consists of two classification phases, the second of which is applied only when necessary. Whether the second phase is needed is determined by belief functions defined on the likelihoods of the first bank of HMMs. The uncertainty calculated from these beliefs is evaluated and resolved via the second bank of HMMs. These uncertainties between classes are used to identify the sign clusters within which the second bank of HMMs is capable of discriminating. A sign cluster is defined as a group of signs that are
similar and differ either in NMS or in variations of the MS. Our automatic cluster identification method is based on the hesitation matrix.
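The two-step decision and the hesitation-matrix clustering can be sketched as follows. The authors derive uncertainty from belief functions; here a much simpler posterior-ratio criterion stands in for it, so the ratio threshold, the connected-component clustering rule and all function names are assumptions, not the published method.

```python
import numpy as np

def posteriors(loglik):
    """Softmax of HMM log-likelihoods, one value per sign class."""
    z = np.exp(loglik - np.max(loglik))
    return z / z.sum()

def hesitation_set(loglik, ratio=0.5):
    """Stand-in for belief-based uncertainty: classes whose posterior is >= ratio * best."""
    p = posteriors(np.asarray(loglik, dtype=float))
    return frozenset(np.flatnonzero(p >= ratio * p.max()).tolist())

def hesitation_matrix(train_logliks, n_classes, ratio=0.5):
    """H[i, j] counts training samples whose first-step hesitation set contains both i and j."""
    H = np.zeros((n_classes, n_classes), dtype=int)
    for ll in train_logliks:
        members = sorted(hesitation_set(ll, ratio))
        for i in members:
            for j in members:
                H[i, j] += 1
    return H

def clusters_from_hesitation(H, min_count=1):
    """Cluster label per class: connected components of classes linked by H[i, j] >= min_count."""
    n = H.shape[0]
    label = [-1] * n
    next_label = 0
    for start in range(n):
        if label[start] >= 0:
            continue
        label[start] = next_label
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if label[j] < 0 and j != i and max(H[i, j], H[j, i]) >= min_count:
                    label[j] = next_label
                    stack.append(j)
        next_label += 1
    return label

def sequential_decision(loglik_mn, loglik_n, label, ratio=0.5):
    """Step 1: HMM_M&N decides when there is no hesitation. Step 2: otherwise HMM_N
    chooses among the signs in the cluster of the step-1 winner."""
    best = int(np.argmax(loglik_mn))
    if len(hesitation_set(loglik_mn, ratio)) == 1:
        return best
    cluster = [k for k in range(len(label)) if label[k] == label[best]]
    return max(cluster, key=lambda k: loglik_n[k])
```

Because clusters come only from first-step confusions on training data, adding a new sign requires only new training data: the matrix and the clusters are recomputed automatically.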
3 Experiments
The experiments are conducted on the eNTERFACE'06 sign language database [2], which includes both manual and non-manual signs. There are eight base signs that represent words and 19 variants, which are variations of the base signs in the form of NMS. Since we concentrate on the fusion step in this paper, we directly use the processed data from [2], in which sign features are extracted for both MS and NMS.
Table 1. Classification performance
Models used        Fusion method                    Cluster identification   Test accuracy
HMM_M              No fusion                        -                        67.1%
HMM_M&N            Feature fusion                   -                        75.9%
HMM_M&N → HMM_N    Sequential belief-based fusion   Automatic                81.6%
To model the MS and NMS and perform classification, we trained three different HMMs. The first is trained for comparison purposes, and the last two are used in the first and second steps of our fusion method: (1) HMM_M uses manual features; (2) HMM_M&N uses manual and non-manual features; and (3) HMM_N uses non-manual features. The accuracies of the different fusion techniques are summarized in Table 1. We obtain the highest accuracy, 81.6%, with sequential belief-based fusion.
4 Conclusions
We have proposed a technique for integrating manual and non-manual signs in a sign language recognition system. The first novelty of this approach is the decision mechanism, which ensures that if there is no hesitation at the first step, the decision is made immediately. The second novelty is the clustering mechanism: the sign clusters are identified automatically during the training phase, which makes it easy to add new signs to the database simply by providing new training data.
References
1. Ong, S.C.W. and Ranganath, S., "Automatic Sign Language Analysis: A Survey and the Future beyond Lexical Meaning", IEEE Transactions on PAMI, vol. 27, no. 6, pp. 873-891, June 2005.
2. Aran, O., Ari, I., Benoit, F., Campr, A., Carrillo, A.H., Fanard, P., Akarun, L., Caplier, A., Rombaut, M. & Sankur, B., "Sign Language Tutoring Tool", eNTERFACE 2006, The Summer Workshop on Multimodal Interfaces, Croatia, 2006.
Gesturing with Tangible Interfaces and in Virtual Augmented
Reality
the device, to interface-at-operator, where the operator's physical location is relatively
unconstrained. This axis captures the effect of operator freedom on the interface.
Our first prototype used an Essential Reality P5 Data Glove to scroll a web page around a virtual mosaic of web pages. The natural mapping of hand position to page position made great sense in theory [3], but in practice it proved unworkable because of the P5 Glove's multiple sources of error (sensor jitter and position skew). In terms of our design space, the Mapping was natural (albeit error-prone), in that the display echoed the movement of the glove. The Frame of Reference was teleobserved, since all of the action happened in an electronic space. Finally, the Location was (unfortunately) operator-at-interface: the system failed if we moved more than a few feet from the sensor tower and controlling computer.
In the second interface scheme, we chose to forgo natural mapping in favor of a
relatively easy-to-implement, machine-oriented linkage based on bend sensors in the
fingers of the P5 glove. As the operator curled and extended his or her fingers, the
interface software sent commands to the motors to reel out, stop, or reel in. Our
finger-sensing interface was machine-mapped, which allowed a quicker and less
error-prone implementation. However, our physical package frequently failed because
of tangled fishing lines, and its lack of any remote-control functionality made it fully
operator-observed and operator-at-interface.
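A machine-mapped linkage of this kind can be sketched as a simple threshold function from a normalised bend reading to one of three discrete motor commands. The normalisation, the two thresholds, and the assignment of curled fingers to reeling in are hypothetical choices made for illustration; the text does not specify them.

```python
from enum import Enum


class Reel(Enum):
    OUT = "reel out"
    STOP = "stop"
    IN = "reel in"


def bend_to_reel(bend: float, low: float = 0.33, high: float = 0.66) -> Reel:
    """Machine-mapped linkage: a bend reading selects a discrete motor command.

    bend: 0.0 = finger fully extended, 1.0 = fully curled (assumed normalisation).
    low/high: hypothetical thresholds splitting the range into three bands.
    """
    if bend < low:
        return Reel.OUT   # extended finger -> pay out line (assumed assignment)
    if bend > high:
        return Reel.IN    # curled finger -> wind line in (assumed assignment)
    return Reel.STOP      # middle band -> hold position
```

The mapping is "machine-oriented" in the sense of the design space above: the operator must learn which finger posture corresponds to which motor action, rather than pointing where the Manta should go.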
Our final interface scheme allowed us to move from machine-mapped thumb
movements to naturally mapped thumb movements that translated intuitively into
Manta movements and corresponded well with the operator's natural perception of
how the Manta moves. By mapping the three axes of Xbox 360 joystick movement to
the three spatial axes of the rectangular atrium, and by mapping the degree of joystick
deflection to speed, we made it possible to steer with intuitive thumb movements. This
scheme was a major advance over the prior interface because it allowed the
operator to gesture "I want it to go that way" with arbitrary vigor. Thus, this scheme
has a relatively natural mapping, and is moderately operator-observed and moderately
interface-at-operator.
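The deflection-to-speed mapping can be sketched as follows, assuming each of the three joystick axes reports a value in [-1, 1] and that a small dead zone suppresses drift around the centre. The dead-zone width, the maximum speed, and its units are illustrative assumptions, not parameters of the Manta system.

```python
from dataclasses import dataclass


@dataclass
class Velocity:
    """Commanded velocity along the atrium's three axes (units assumed, e.g. m/s)."""
    x: float
    y: float
    z: float


def axis_to_speed(value: float, max_speed: float, dead_zone: float = 0.15) -> float:
    """Map one joystick axis in [-1, 1] to a signed speed proportional to deflection."""
    value = max(-1.0, min(1.0, value))  # clamp out-of-range readings
    if abs(value) < dead_zone:
        return 0.0  # ignore small drift around the centre position
    # Rescale so speed rises from 0 at the dead-zone edge to max_speed at full deflection.
    magnitude = (abs(value) - dead_zone) / (1.0 - dead_zone)
    return max_speed * magnitude if value > 0 else -max_speed * magnitude


def sticks_to_velocity(ax: float, ay: float, az: float, max_speed: float = 0.5) -> Velocity:
    """Map three joystick axes one-to-one onto the three spatial axes."""
    return Velocity(
        axis_to_speed(ax, max_speed),
        axis_to_speed(ay, max_speed),
        axis_to_speed(az, max_speed),
    )
```

Mapping each axis independently keeps the relation between thumb direction and Manta direction one-to-one, while the magnitude of deflection carries the "vigor" of the gesture.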
Our own experience as builders and performers leads us to speculate that, while
engineers often begin by conforming to machine-imposed constraints, it is at the
natural and interface-at-operator poles that the operator feels more like a performer
performing than an operator controlling, and that the robotic system feels more like an
extension of the human body. Our experience demonstrating this system at (re)Actor, the
First International Conference on Digital Live Art, presented as part of HCI2006, also leads us to
observe that a Manta Ray dancing at a disco provides a remarkably good opportunity
for exploring human-computer gestural interaction and for discovering novel
engineering challenges.
References
1. Benko, H., Ishak, E. W., Feiner, S.: Cross-Dimensional Gestural Interaction Techniques for
Hybrid Immersive Environments. In: Proceedings of IEEE Virtual Reality (VR 2005) 209
2. Chalhoub, N. G.: Control of a Leadscrew Driven Flexible Robot Arm. (1986)
3. Norman, D.: The Design of Everyday Things. Doubleday (1990)
Automatic Classification of Expressive Hand Gestures
on Tangible Acoustic Interfaces According to Laban’s
Theory of Effort
Antonio Camurri, Corrado Canepa, Simone Ghisio, and Gualtiero Volpe
InfoMus Lab, DIST – University of Genova
Viale Causa 13, I-16145 Genova, Italy
http://www.infomus.org
{toni, corrado, ghisio, volpe}@infomus.org
Extended abstract
The EU-IST Project TAI-CHI (Tangible Acoustic Interfaces for Computer Human
Interaction) investigates a new generation of tangible interfaces, Tangible Acoustic
Interfaces (TAIs). TAIs exploit the propagation of sound in physical objects in order
to localize touch positions and to analyse users’ gestures on the object, both from a
low-level, quantitative point of view and from a high-level, qualitative one. Designing
and developing TAIs consists of exploring how physical objects, augmented surfaces,
and spaces can be transformed into tangible-acoustic embodiments of natural,
seamless, unrestricted interfaces. The ultimate goal of the TAI-CHI project is therefore
to design TAIs that employ physical objects (including complex-shaped everyday
objects) and space as media to bridge the gap between the virtual and physical worlds,
and to make information accessible through large touchable objects as well as
through ambient media.
In this framework, a key factor in the success of TAI-based interactive
systems is their ability to process expressive information, i.e., information related
to the affective, emotional sphere conveyed by users through non-verbal channels
[1][2]. Such information is what Cowie and colleagues call “implicit
messages” [3] or what Hashimoto calls KANSEI [4], and it is often conveyed through
expressive gesture [5]. We use the term Expressive TAIs for the subset of TAIs endowed
with the special ability to extract, analyse, and process such emotional, affective,
expressive content. Expressive TAIs are based on high-level multimodal analysis of
users’ gestures on, or approaching, the TAI. They are thus a novel generation of
human-computer interfaces combining the naturalness of vision and touch gestures with
the power and impact of human non-verbal, expressive, emotional communication in
experience-centric tasks and collaborative applications.
This paper presents and discusses a concrete example of high-level analysis of
expressive gesture on Expressive TAIs: hand gestures on a TAI surface are analysed
and classified according to two major dimensions of Rudolf Laban’s Theory of Effort
[6][7], the Space and Time dimensions. Hand gestures have been tracked and
segmented by means of multimodal integration of visual and acoustic tracking
techniques. A collection of expressive features including eccentricity of a gesture,
Using Hand Gesture and Speech in a Multimodal
Augmented Reality Environment
M.S. Dias1,2, R. Bastos1, J. Fernandes1, J. Tavares1, P. Santos2
1 ADETTI Av. das Forças Armadas, Edifício ISCTE 1600-082 Lisboa (+351) 21 782 64 80, Portugal
2 MLDC - Microsoft Language Development Center, Edifício Qualidade C1-C2, Av. Prof. Doutor
Aníbal Cavaco Silva, Tagus Park, 2744-010 Porto Salvo (+351) 96 2093324, Portugal
Extended Abstract
Information Technology (IT) professionals require new, more natural ways of interacting
with computers. A natural paradigm of interaction is one that does not need any intrusive
devices, which may confuse users and distract them from their main goal. Computer
Vision, for example, has enabled these professionals to explore new ways for humans to
interact with machines and computers. The adoption of multimodal interfaces in the
framework of augmented reality is one way to address these requirements. The main
benefit of such a system is that it provides more transparent, flexible, efficient and
expressive human-computer interaction. Since multimodal interfaces offer different
ways of interacting with the system, errors and task completion time can be reduced,
improving efficiency and effectiveness while executing a given task. Our work
envisages the creation of a tool for architects and interior designers which allows
designers or clients, via multimodal interaction (gesture and speech), to visualize the
placement of real-size furniture using augmented reality. The tool is capable of
importing, placing, moving and rotating virtual furniture objects in a real scenario.
Users can control all actions with gestures and speech, and can walk
into the augmented scene, seeing it from a variety of angles and distances. This work
builds on previously obtained results, namely the MX Toolkit library
[DBS*03]. This library provides a platform that allows the programmer to combine
multimodal interfaces with 3D object interaction and visualization, applied to
augmented reality scenarios. Since the final goal of this work was to create an
augmented reality application, we integrated Plaza [S05], a previously developed
Augmented Reality authoring tool based on MX Toolkit. Plaza is a
3D AR authoring module that allows the user to manipulate and modify 3D objects
loaded from a predefined database in a VR environment, in an AR scenario, or
in both. The proposed logical architecture of the system is depicted in Fig. 1
and can be divided into two modules: Plaza, responsible for Augmented Reality
authoring and speech recognition, and the Gesture Recognition Server, responsible
for hand gesture recognition. Both modules use the MX Toolkit library and
communicate through the TCP/IP COM module. The Gesture Recognition Server also
maintains a Gesture Database, which is used at runtime for gesture matching.
Fig. 1. – System Architecture Diagram.
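Runtime matching against the Gesture Database can be illustrated with a nearest-template classifier over per-user profiles. This sketch assumes that each hand pose has already been reduced to a fixed-length feature vector and uses Euclidean distance with a hypothetical rejection threshold; the `GestureDB` layout, the `match_gesture` function and the gesture names are hypothetical, and the actual features and matching method of the Gesture Recognition Server are not described here.

```python
import math
from typing import Dict, List, Optional, Sequence

# Hypothetical database layout: profile name -> gesture name -> template feature vectors.
GestureDB = Dict[str, Dict[str, List[Sequence[float]]]]


def match_gesture(
    db: GestureDB,
    profile: str,
    features: Sequence[float],
    max_distance: float = 1.0,
) -> Optional[str]:
    """Return the gesture whose nearest template is closest to `features`.

    Only the active user's templates are searched, since segmentation
    conditions (skin tone, hand size) differ between users. Returns None
    when no template lies within the hypothetical `max_distance`.
    """
    best_name, best_dist = None, math.inf
    for name, templates in db[profile].items():
        for template in templates:
            d = math.dist(features, template)  # Euclidean distance (Python 3.8+)
            if d < best_dist:
                best_name, best_dist = name, d
    return best_name if best_dist <= max_distance else None
```

Keeping templates per profile means that recalibrating one user does not disturb matching for the others.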
The system supports several user profiles that can be created using the Gesture
Recognition Server, either online or offline. Each profile has its own set of
gesture templates, since segmentation conditions may vary from user to user (skin
tone, hand size, etc.). Each command has an associated hand-pose gesture and
a voice command, so the user can choose to issue any command either by gesture or
by speech. The application works in real time and is able to
detect and track static hand poses and hand movements, using them to control and
manipulate all objects in the scene. To evaluate the usability of the system, we
developed a task-based test in which subjects performed simple geometric
transformations on virtual models using gesture and voice interaction. Our
goal was to determine which interface the users preferred and why. We applied the
same test three times using different interface configurations (voice, gestures, and
voice plus gestures). The results showed that using the two interfaces together was
clearly the best way to interact with the system, reducing the time needed to
complete the predetermined task by 20% on average. Most users preferred the speech
interface to activate simple commands like “Move” and “Rotate” or to modify the
speed of the interaction, whereas the hand gesture interface was mostly
used to transform the objects (rotating and translating). In some cases we had to
calibrate the skin profile of the subjects because of their different skin tones.
Nevertheless, all users concluded that using the two interfaces together is the most
efficient way to use the system.
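The pairing of every command with both a hand pose and a voice command can be sketched as two lookup tables that resolve to a shared command set, so that either modality triggers the same action. Only “Move” and “Rotate” come from the text; the `Command` enum, the gesture names and the `resolve` function are hypothetical.

```python
from enum import Enum, auto
from typing import Optional


class Command(Enum):
    MOVE = auto()
    ROTATE = auto()


# Hypothetical bindings: every command has both a hand-pose gesture and a voice phrase.
GESTURE_BINDINGS = {"open_palm": Command.MOVE, "two_fingers": Command.ROTATE}
VOICE_BINDINGS = {"move": Command.MOVE, "rotate": Command.ROTATE}


def resolve(modality: str, token: str) -> Optional[Command]:
    """Map a recognised gesture or utterance onto the shared command set.

    Returns None for unrecognised tokens so the caller can ignore them.
    """
    if modality == "gesture":
        return GESTURE_BINDINGS.get(token)
    if modality == "speech":
        # Normalise the recogniser's text output before lookup.
        return VOICE_BINDINGS.get(token.strip().lower())
    raise ValueError(f"unknown modality: {modality}")
```

Because both tables target the same enum, a check that each table covers every command is enough to guarantee that the user can always choose either modality.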
REFERENCES
[DB06] Dias, J.M.S., Bastos, R., “An Optimized Marker Tracking System”, in Eurographics Symposium on
Virtual Environments, pp. 1-4, 2006.
[DBS*03] Dias, J.M.S., Bastos, R., Santos, P., Monteiro, L., Silvestre, R., “Developing and Authoring
Mixed Reality with MX Toolkit”, in The Second IEEE International Augmented Reality Toolkit
Workshop (ART 03), Tokyo, Japan, 2003.
[S05] Santos, P., “AR Authoring for 3D Design”, Master of Science Thesis in Computers
and Telecommunications Engineering, ADETTI, Portugal, 2005.
Gesture for Music and Performing Arts
Geometry and effort in gestural renderings of
musical sound
Rolf Inge Godøy
Department of Musicology, University of Oslo, P.B. 1017 Blindern, N-0315 Oslo, Norway
r.i.godoy@imv.uio.no
Extended abstract
In our current research on music-related gestures (http://musicalgestures.uio.no), we have
had a particular focus on the spontaneous gestures that listeners make to musical sound.
This has been motivated by the belief that perception and cognition of musical sound are
intimately linked with mental images of movement, and that a process of incessant motor
imagery is running in parallel with listening to, or even just imagining, musical sound. We
have called this motormimetic cognition, and see evidence for this in a number of research
findings as well as in our own observation studies. Furthermore, we believe hand
movements have a privileged role in motormimetic cognition of musical sound, and that
these hand movements may trace the geometry (i.e. elements such as pitch contours, pitch
spread, rhythmical patterns, textures, and even timbral elements as shapes) as well as
convey sensations of effort of musical sound, hence the focus in this paper on geometry and
effort in the gestural renderings of musical sound.
There are many different gestures that may be associated with music. Using the
Gibsonian concept of affordance, we can thus speak of rich gestural affordances of musical
sound. For practical purposes we can in this paper think of two main categories,
sound-producing gestures (such as hitting, stroking, bowing) and sound-accompanying gestures
(such as dancing, marching, making various movements to the music), as well as several
sub-categories of these. The distinction between these two main categories as well as their
sub-categories may not always be so clear (e.g. musicians make gestures in performance
that are probably not strictly necessary for producing sound, but may be useful for reasons
of motor control or physiological comfort, or have communicative functions towards other
musicians or the audience).
In order to carry out more systematic observation studies of gestural renderings, we
have proceeded from giving subjects rather well-defined tasks with limited gestural
affordances to progressively more open tasks with quite rich gestural affordances. That is,
we proceeded from studies of air-instrument performances, where subjects were asked to
make sound-producing movements, to what we have called sound-tracing studies, where the
musical excerpts were quite restricted in their number of salient features, and on to what we
called free dance gestures, with more complex, multi-feature excerpts and rather general
instructions to subjects to make spontaneous gestural renderings based on what they
perceived as the most salient features.
The idea of gestural rendering of musical sound is based on a large body of research
ranging from classical motor theory of perception to more recent theories of motor in-
volvement in perception in general, and more specifically in audio perception, as well as in
music-related tasks in particular.
Obviously, auditory-motor couplings, as well as the capacity to render and/or imitate
sound, are not restricted to hand movements, as is evident from vocal imitation of both
non-musical and musical sound (e.g. so-called beat-boxing in hip-hop and other music, and scat
singing in jazz). But our focus on hand movements is based not only on innumerable
informal observations of listeners making hand movements to musical sound, but
also on the belief that hand movements have a privileged role from an evolutionary point of
view and from a general gesture-cognitive point of view. Furthermore, we believe that a
listener, through a process of translation by the principle of motor equivalence, may switch
from one set of effectors to another, revealing more amodal gestural images of musical
sound.
It seems quite clear that even novices can make gestures that reflect reasonably well
what is going on in the music when asked to mime sound-producing actions, although
experts tend to make more detailed renderings as reported in. Also, when listeners were asked
to draw gestures they felt reflected the musical excerpt they heard, there was reasonable
agreement as long as the excerpts did not have more than one or two prominent features,
e.g. an ascending pitch contour, or an ascending pitch contour combined with various
ornamental ripples, and greater disagreement when the number of concurrent prominent
features was increased, e.g. excerpts with several concurrent textural elements. A
subsequent similarity rating of the resulting sound-tracings seems to have confirmed these
patterns of agreement and disagreement. In the case of 3-dimensional bi-manual movements to
sounds, we got even more varied results, which we would expect given the greater
choice in effector use and feature focus.
Rather than despair because of these increasingly divergent, and also often rather
approximate, gestural renderings of musical sound, we shall in the following see how the
elements of geometry and effort may be understood as intrinsic to the perception-action
cycle spontaneously at work in musical experience, and furthermore try to see how gestural
renderings of musical sound may be understood as a means for intentional focus in
listening, and may even be put to active use in the exploration of musical sound.
It is generally accepted that music is a multidimensional phenomenon in the sense that
music has elements such as rhythm, tempo, intensity (often referred to as dynamics), pitch,
melody, accompaniment, harmony, timbre, texture, etc., and that these elements in turn may
be differentiated into a number of sub-elements. This is one of the reasons for the
above-mentioned rich gestural affordances of musical sound, as listeners may attend to, and
gesturally render, any single element or any selection of such musical elements. Also, elements
that from a music theory perspective may be thought of as separate may be fused in actual
perception, as in the well-known interdependence of perceived intensity and timbre. This
dimensional fusion may even extend to dimensions that 'are not really there', i.e. we may
see what the authors of have termed a 'spill-over' effect, e.g. a crescendo may also be
perceived as an accelerando by some listeners although the tempo was in fact constant.
Although the elements of geometry and effort are inseparable in the sense that we cannot
have images of geometry (e.g. pitch change, timbre change, etc.) without some image of
effort, and conversely can hardly have images of effort in music without images of
movement in space, it is strategically convenient to separate these elements here in order to get a
clearer impression of what features listeners focus on in various gestural renderings of
musical sound.
music related tasks in particular.
Obviously, auditory-motor couplings as well as the capacity to render and/or imitate
sound is not restricted to hand movements, as is evident from vocal imitation of both non-
musical and musical sound (e.g. so-called beat-boxing in hip-hop and other music and scat
singing in jazz). But the focus on hand movements in our case is based not only on innu-
merable informal observations of listeners making hand movements to musical sound, but
also on the belief that hand movements have a privileged role from an evolutionary point of
view and from a general gesture-cognitive point of view. Furthermore, we believe that a
listener through a process of translation by the principle of motor equivalence, may switch
from one set of effectors to another, revealing more amodal gestural images of musical
sound.
It seems quite clear that even novices can make gestures that reflect reasonably well
what is going on in the music when asked to mime sound-producing actions, although
experts tend to make more detailed renderings, as reported in. Also, when listeners were asked
to draw gestures they felt reflected the musical excerpt they heard, there was reasonable
agreement as long as the excerpts did not have more than one or two prominent features,
e.g. an ascending pitch contour or an ascending pitch contour combined with various
ornamental ripples, and greater disagreement when the number of concurrent prominent
features increased, e.g. excerpts with several concurrent textural elements. A subsequent
similarity rating of the resulting sound-tracings seems to have confirmed this
agreement and disagreement. In the case of three-dimensional bi-manual movements to
sounds, we got even more varied results, which is to be expected given the greater
choice in effector use and feature focus.
Rather than despair over these increasingly divergent, and often rather approximate,
gestural renderings of musical sound, we shall in the following see how the
elements of geometry and effort may be understood as intrinsic to the perception-action
cycle spontaneously at work in musical experience, and furthermore try to see how gestural
renderings of musical sound may be understood as a means for intentional focus in
listening, and may even be put to active use in the exploration of musical sound.
It is generally accepted that music is a multidimensional phenomenon in the sense that
music has elements such as rhythm, tempo, intensity (often referred to as dynamics), pitch,
melody, accompaniment, harmony, timbre, texture, etc., and that these elements in turn may
be differentiated into a number of sub-elements. This is one of the reasons for the
above-mentioned rich gestural affordances of musical sound, as listeners may attend to, and
gesturally render, any single element or any selection of such musical elements. Also, elements that
from a music theory perspective may be thought of as separate may be fused in actual
perception, as in the well-known interdependence of perceived intensity and timbre. This
dimensional fusion may even extend to dimensions that 'are not really there', i.e. we may
see what the authors of have termed a 'spill over' effect, e.g. a crescendo may also be
perceived as an accelerando by some listeners although the tempo was in fact constant.
Although the elements of geometry and effort are inseparable in the sense that we
cannot have images of geometry (e.g. pitch change, timbre change, etc.) without some image of
effort, and conversely can hardly have images of effort in music without images of
movement in space, it is strategically convenient to separate these elements here in order to get a
clearer impression of which features listeners focus on in various gestural renderings of
musical sound.
Page 63
fast as possible” tied with a decelerando back to a medium-paced Détaché on one
fixed note. Movements were recorded with a Vicon System 460 optical tracker,
and bow pressure with a custom sensor designed at Ircam.
Results and Discussion: The analysis of bow motion showed the existence
of two types of bow velocity and acceleration profiles, therefore defining two
bowing gestures: for lower bow stroke frequencies, bow velocity is close to a
square shape; for higher frequencies it shifts to a smooth, almost sinusoidal shape
(Figure 1 left).
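The square-versus-sinusoid distinction can be quantified with a simple shape statistic. The sketch below is not the authors' analysis, which is not detailed here; it assumes one stroke cycle of sampled bow velocity and uses the ratio of RMS to peak amplitude, which is 1 for an ideal square wave and 1/√2 ≈ 0.707 for an ideal sinusoid. The function names and the 0.85 threshold are hypothetical choices for illustration.

```python
import math


def rms_to_peak(signal):
    """Ratio of the RMS value to the peak absolute value of a sampled signal.

    An ideal square wave gives 1.0; an ideal sinusoid gives 1/sqrt(2) (about 0.707).
    """
    peak = max(abs(x) for x in signal)
    if peak == 0:
        raise ValueError("signal is identically zero")
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return rms / peak


def classify_profile(signal, threshold=0.85):
    """Label one bow-velocity cycle as square-like or sinusoid-like.

    The threshold is an arbitrary value between the two ideal ratios
    (hypothetical, chosen only for this illustration).
    """
    return "square-like" if rms_to_peak(signal) >= threshold else "sinusoid-like"


# Synthetic single cycles (not measured data): 1000 samples each.
n = 1000
square = [1.0 if i < n // 2 else -1.0 for i in range(n)]
sine = [math.sin(2 * math.pi * i / n) for i in range(n)]
print(classify_profile(square))  # square-like
print(classify_profile(sine))    # sinusoid-like
```

With real recordings the statistic would be computed stroke by stroke, so that a drift from values near 1 toward values near 0.7 across the accelerando would mark the change of bowing gesture.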
The arm joint angle profiles follow a similar change. Moreover, the arm joint
angles also indicate the possible existence of a later, within-limb change of
coordination. In the case of the violin player, the elbow and the wrist first
move out of phase and then shift to in phase to achieve the fastest part of the
accelerando/decelerando (Figure 1 right).
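A shift from out-of-phase to in-phase wrist and elbow motion can be detected, under simplifying assumptions, by correlating the two joints' angular velocities over successive windows: a correlation near +1 indicates in-phase rotation, near -1 anti-phase rotation. This is a hypothetical sketch, not necessarily the method used for Figure 1; it only separates the 0° and 180° cases, and the window length, the 0.5 threshold and all function names are arbitrary assumptions.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length, non-constant sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coordination_mode(wrist_vel, elbow_vel, threshold=0.5):
    """Classify wrist/elbow coordination in one window from angular velocities."""
    r = pearson(wrist_vel, elbow_vel)
    if r >= threshold:
        return "in-phase"
    if r <= -threshold:
        return "out-of-phase"
    return "mixed"


def modes_over_time(wrist_vel, elbow_vel, window):
    """Apply coordination_mode to consecutive, non-overlapping windows."""
    return [
        coordination_mode(wrist_vel[i:i + window], elbow_vel[i:i + window])
        for i in range(0, len(elbow_vel) - window + 1, window)
    ]


# Synthetic angular velocities (not measured): the wrist moves against the
# elbow for the first 200 samples and with it for the last 200.
elbow = [math.sin(2 * math.pi * i / 25) for i in range(400)]
wrist = [-v if i < 200 else v for i, v in enumerate(elbow)]
print(modes_over_time(wrist, elbow, window=100))
# ['out-of-phase', 'out-of-phase', 'in-phase', 'in-phase']
```

For intermediate phase relations, a continuous relative-phase estimate (for example one derived from the Hilbert transform of each signal) would be more informative than this two-way split.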
Interestingly, the effect on sound is more subtle. A spectrum analysis does
not reveal any clear change concomitant with the dramatic change in the bow
velocity profile. We can argue that this similarity in sound is intended by
instrumentalists, who unconsciously adjust other parameters so that the abrupt change
in bowing gesture is not audible. However, we may also hypothesize that this
change in bowing gesture still has an effect on finer timbre aspects (e.g.
transitions between notes). This exploratory study is currently being extended with the
measurement and analysis of more players to validate the hypothesis of a change
in arm coordination.
Figure 1 panels: left, violin bow acceleration [mm/s²] against time [s]; right, derivative of the violinist's wrist and elbow joint angles against time [s].
Fig. 1. Left: Change in bow acceleration. Right: Possible change in limb coordination.
References
1. Rasamimanana, N.H., Flety, E., Bevilacqua, F.: Gesture analysis of violin bow
strokes. In Gibet, S., Courty, N., Kamp, J.F., eds.: Lecture Notes in Artificial
Intelligence, LNAI 3881, Springer Verlag (2006) 145–155
2. Haken, H., Kelso, J.A.S., Bunz, H.: A theoretical model of phase transitions in
human hand movements. Biological Cybernetics 51(5) (1985) 347–356
3. Winold, H., Thelen, E., Ulrich, B.D.: Coordination and control in the bow arm
movements of highly skilled cellists. Ecological Psychology 6(1) (1994) 1–31
Page 65
acquisition of synchronized multimodal data from two violin performers. Toward this
aim, we decided to use different types of video cameras, microphones, and special
physiological sensors (the BioMuse system designed by Ben Knapp). To generate
the multimodal archive, we developed a distributed network of computers running the
EyesWeb XMI open software platform, each dedicated to a subset of the recording
streams. The result was a synchronized recording of all the streams of multimodal data.
We obtained an archive of 1500 GB of high-quality multimodal data, which is the starting
point for subsequent investigation.
Fig. 1 Setup for the Premio Paganini project
3 Rating studies with human participants
The part of the experiment involving the audience was a real concert, in which a subset of
the measurements was shown and explained to the audience in real time during the event.
The violinist, selected from the semifinalists of the International Violin Competition
Premio Paganini, was asked to perform the selected Bach canon four times, each time with a
different emotion: anger, sadness, joy, and serenity (verbal instructions were used in this case).
Among the audience, 31 spectators took a rating test in which they were asked to
rate the intensity of each emotion (on a scale from 0 to 10). The analysis shows that all the
expressed emotions were recognized by the audience: all intended emotions received
higher mean ratings than the non-intended ones. To complement the analysis with
between-subject measurements, we computed the number of emotion classes successfully
recognized by each spectator. We found that only 20% of the spectators succeeded in
recognizing all the emotions, but that 90% recognized at least one emotion.
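The per-spectator count can be sketched as follows. The text does not state the criterion for a "successfully recognized" emotion, so the sketch assumes one: for the performance with a given intended emotion, the spectator's rating on that emotion's scale must be strictly the highest of the four. The data layout, function names and the two example spectators are invented for illustration and are not study data.

```python
EMOTIONS = ("anger", "sadness", "joy", "serenity")


def recognized_count(ratings):
    """Count the intended emotions one spectator recognized.

    `ratings` maps each intended emotion (one performance each) to a dict of
    0-10 intensity ratings on all four emotion scales. Assumed criterion: an
    emotion is recognized when its own scale is strictly the highest-rated one
    for that performance.
    """
    count = 0
    for intended, scores in ratings.items():
        best = max(scores.values())
        top = [emotion for emotion, score in scores.items() if score == best]
        if top == [intended]:
            count += 1
    return count


def summarize(audience):
    """Return (share recognizing all emotions, share recognizing at least one)."""
    counts = [recognized_count(r) for r in audience]
    n = len(counts)
    all_share = sum(c == len(EMOTIONS) for c in counts) / n
    any_share = sum(c >= 1 for c in counts) / n
    return all_share, any_share


# Two invented spectators: A rates every intended emotion highest,
# B rates "anger" highest for every performance.
spectator_a = {e: {s: (8 if s == e else 2) for s in EMOTIONS} for e in EMOTIONS}
spectator_b = {e: {s: (8 if s == "anger" else 2) for s in EMOTIONS} for e in EMOTIONS}
print(summarize([spectator_a, spectator_b]))  # (0.5, 1.0)
```

The two shares returned by `summarize` are the quantities reported above as 20% and 90%; whether those figures would be reproduced depends on the assumed criterion matching the one actually used.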
In sum, the results showed that several spectators recognized some emotional content, but
sometimes the emotion did not match the one intended by the performer. The
musician, directed with verbal instructions, thus partially succeeded in conveying
emotional content during her performance. These results should be confirmed and
extended by subsequent analysis of the multimodal data from the larger
multimodal archive.
Page 66
A perceptual-based algorithm for segmentation of human
full-body movement: a pilot experiment
Antonio Camurri, Donald Glowinski, Barbara Mazzarino, Gualtiero Volpe
InfoMus Lab, DIST - University of Genova
Viale Causa 13, I-16145 Genova, Italy
{toni, bunny, volpe, donald}@infomus.dist.unige.it
www.infomus.dist.unige.it
1 Introduction
This paper presents a pilot experiment for the perceptual validation, by human subjects, of
a motion segmentation algorithm, i.e. an algorithm that automatically segments a
motion sequence or a dance fragment into a series of pause and motion phases. Such
validation is a key issue in the development of multimodal interactive
systems that take human action, and in particular human movement and expressive gesture,
as their main input channel. The experiment discussed here is part of broader research at the
DIST-InfoMus Lab aimed at investigating the non-verbal mechanisms of communication
involving human movement and gesture as primary conveyors of expressive emotional
content. The research is carried out in the framework of the EU IST Network of Excellence
HUMAINE (Human-Machine Interaction Network on Emotion).
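The segmentation task defined above can be illustrated by thresholding a motion signal. The text does not say which motion feature or threshold the algorithm uses, so this sketch assumes a per-frame motion-activity value (for example, an estimate of how much the body moves between consecutive frames) and a fixed threshold; the function names and numbers are hypothetical.

```python
from itertools import groupby


def segment(activity, threshold, fps):
    """Split a per-frame motion-activity signal into motion and pause phases.

    Frames whose activity exceeds `threshold` count as motion, all others as
    pause. Returns (label, start_s, duration_s) tuples in temporal order.
    """
    phases = []
    frame = 0
    for moving, run in groupby(activity, key=lambda a: a > threshold):
        length = sum(1 for _ in run)
        label = "motion" if moving else "pause"
        phases.append((label, frame / fps, length / fps))
        frame += length
    return phases


def pause_ratio(phases):
    """Fraction of the total duration spent in pause phases."""
    total = sum(d for _, _, d in phases)
    return sum(d for label, _, d in phases if label == "pause") / total


# Synthetic 2-second example at 10 frames per second (not recorded data).
activity = [0.0] * 5 + [1.0] * 10 + [0.0] * 5
phases = segment(activity, threshold=0.5, fps=10)
print(phases)  # [('pause', 0.0, 0.5), ('motion', 0.5, 1.0), ('pause', 1.5, 0.5)]
print(pause_ratio(phases))  # 0.5
```

Phase counts and durations of this kind are the raw material for duration-based expressive cues. A practical implementation would probably also smooth the signal or impose a minimum phase duration, so that noise around the threshold does not create spurious short phases.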
2 Motion Segmentation
Our expressive gesture analysis is carried out in a multilayered conceptual framework,
moving from low-level physical measures (e.g. position, speed, acceleration of body
parts) toward descriptors of overall motion features (e.g. motion fluency, directness,
impulsiveness) [2]. In such a framework, motion segmentation occupies a central place,
since expressive qualities are extracted from segmented motion phases and, at a higher
level, from the relationships between segments, including both motion and
pause segments. Some expressive features, e.g. fluidity and impulsiveness, which are taken
as exemplary expressive cues in the present study, rely on the duration of motion and
pause phases relative to the total duration of the movement. A
movement (e.g. a dance fragment) performed with frequent stops and restarts (i.e.
characterized by a high number of short pauses and motion phases) will appear less fluent
Page 72
Signs Workshop: the Importance of Natural Gestures
in the Promotion of Early Communication Skills of
Children with Developmental Disabilities
Ana Margarida P. Almeida1, Teresa Condeço2, Fernando Ramos1, Álvaro Sousa1,
Luísa Cotrim2, Sofia Macedo2, Miguel Palha2
1 Department of Communication and Arts, University of Aveiro, Campus de Santiago,
3810-193 Aveiro
2 Differences, Child Developmental Centre, Centro Comercial da Bela Vista, Av. Santo
Condestável, Loja 32, Via Central de Chelas, 1950-094 Lisboa
This article emphasises the importance of natural gestures and describes the
framework and development process of the “Signs Workshop” CD-ROM, a
multimedia application for the promotion of early communication skills of
children with developmental disabilities. The Signs Workshop CD-ROM was created within
the scope of the Down’s Comm Project, which was financed by the Calouste Gulbenkian
Foundation, and is the result of a partnership between UNICA (Communication and
Arts Research Unit of the University of Aveiro) and the Portuguese Down Syndrome
Association (APPT21/Differences).
This project’s main objective was to research examples of natural gestures from
Portuguese culture (and translate them into an interactive multimedia application), in
order to extend and make more flexible their use by parents, educators and
therapists who care for children with developmental disabilities, particularly children
with difficulties in speech development.
Children with developmental difficulties, especially those with Down Syndrome,
show alterations in the development and use of language, particularly
in speech development [1] [2] [3] [4]. As a result, communication difficulties
surface as early as the pre-verbal stage, leading to a general tendency toward
passivity in communication and a low ability to take the initiative in
interacting with other individuals.
Sign Communication Systems or Sign Language Systems, when organized in
symbolic or coded signs, are frequently used examples of Augmentative Communication
Systems [5]. In fact, in the specific case of children with Trisomy 21, the
Augmentative Communication System known as Total Communication
(simultaneous use of signs and spoken language) is widely used as a temporary
transitional system during the early stages of speech development. This temporary
transitional system is particularly appropriate for children who have not started to speak
by around 12-18 months of age and who, as a consequence, show signs of frustration
at their inability to be understood by parents, siblings or other individuals [5].
The research carried out by the APPT21/Diferenças team, aimed at
standardizing the signs to be included in the CD-ROM, was conducted among the Portuguese
population (mainland and islands) diagnosed with Down Syndrome, who, in an early
Page 75
successive poses of the human during a specific task. The dynamical level deals
with the forces and torques implied by the motion. Physiological analysis can
then compute the impact of the motion on the human.
To perform ergonomic analysis of a disabled person’s workplace, we propose
an analysis-synthesis framework that synthesizes motor tasks from
physical information while providing ergonomically relevant information at each of
the three analysis levels.
As proposed, the virtual human’s physical information is represented by a
set of motion constraints, divided into three categories: global task constraints,
kinematic constraints and physical constraints. The framework is divided into two
parts. The first concerns synthesis and transforms the selected interaction into
a motion respecting the specified constraints. The second part exploits the
information provided by the synthesis part to perform different levels of ergonomic
analysis.
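The three constraint categories lend themselves to a simple data structure on which the analysis part could operate. The sketch below is hypothetical: the text names the categories but not their contents, so representing task constraints as reach targets, kinematic constraints as joint-angle ranges and physical constraints as torque limits, as well as all class names, function names and numbers, are illustrative assumptions. It reports which constraints a single synthesized posture violates.

```python
import math
from dataclasses import dataclass, field


@dataclass
class TaskConstraint:
    """Global task constraint (assumed form): reach a target point."""
    name: str
    target: tuple
    tolerance_m: float


@dataclass
class JointLimit:
    """Kinematic constraint (assumed form): allowed joint-angle range."""
    joint: str
    min_deg: float
    max_deg: float


@dataclass
class TorqueLimit:
    """Physical constraint (assumed form): maximum joint torque."""
    joint: str
    max_nm: float


@dataclass
class ConstraintSet:
    task: list = field(default_factory=list)
    kinematic: list = field(default_factory=list)
    physical: list = field(default_factory=list)


def violations(cs, angles_deg, torques_nm, end_effector):
    """Return readable descriptions of the constraints a posture violates."""
    found = []
    for c in cs.task:
        miss = math.dist(end_effector, c.target)
        if miss > c.tolerance_m:
            found.append(f"task: {c.name} missed by {miss:.3f} m")
    for c in cs.kinematic:
        angle = angles_deg[c.joint]
        if not c.min_deg <= angle <= c.max_deg:
            found.append(f"kinematic: {c.joint} angle {angle} outside [{c.min_deg}, {c.max_deg}]")
    for c in cs.physical:
        torque = abs(torques_nm[c.joint])
        if torque > c.max_nm:
            found.append(f"physical: {c.joint} torque {torque} exceeds {c.max_nm}")
    return found


# Illustrative values only, not ergonomic data.
cs = ConstraintSet(
    task=[TaskConstraint("reach shelf", (0.5, 0.0, 1.2), 0.02)],
    kinematic=[JointLimit("elbow", 0, 145)],
    physical=[TorqueLimit("shoulder", 40.0)],
)
report = violations(cs, {"elbow": 150}, {"shoulder": 35.0}, (0.5, 0.0, 1.2))
print(report)  # ['kinematic: elbow angle 150 outside [0, 145]']
```

Keeping the three categories in separate lists mirrors the framework's split between task, kinematic and physical information, so each analysis level can inspect only the constraints relevant to it.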
We have developed a physical environment, supported by a virtual reality platform
called ARéVI, by integrating an existing dynamic simulator. Simple
environments that respond to physics can already be defined using simple
geometrical primitives. We are currently working to enhance the object descriptions
using the Smart Object framework.
A virtual human prototype capable of pointing at objects has also been
implemented using the sensori-motor model described in [2]. These simple motion
primitives will be combined to simulate more complex tasks.
This paper presents our preliminary work on a framework for ergonomic
analysis of a disabled person’s workplace.
Our next steps are to implement this framework and test it against a use case
provided by ergonomists. We plan to apply the framework to the sensori-motor
model with respect to natural control laws. Once we have a constraint-compliant
model, we will experiment with our gesture-based approach to modelling disabilities.
Finally, we wish to evaluate our model using a real case study provided by
experts in ergonomics.
References
1. N. Badler. Virtual humans for animation, ergonomics, and simulation. Nonrigid
and Articulated Motion Workshop, pages 28–36, 1997.
2. S. Gibet and P.F. Marteau. A self-organized model for the control, planning and
learning of nonlinear multi-dimensional systems using a sensory feedback. Applied
Intelligence, pages 337–349, 1994.
3. J.M. Porter, K. Case, R. Marshall, D. Gyi, and R. Sims née Oliver. Beyond Jack and
Jill: designing for individuals using HADRIAN. International Journal of Industrial
Ergonomics, pages 249–264, 2004.
Page 77
Gesture In Mobile Computing and Usability Studies
Page 79
consisted of a first part with questions about current habits in mobile phone
interaction and a second part in which users were asked to reach their most used
applications and contacts. It was found that 75% of the interviewees used key
shortcuts, while none used voice shortcuts, owing to their social constraints and low
recognition rates. On average, 5 key shortcuts were used, and 93% of the users
executed them daily. Users with more programmed shortcuts reported
difficulty memorizing them. In the user observation, people
needed an average of 4 keystrokes to access the 3 most used applications and 5
keystrokes to call the 3 most used contacts. Key shortcuts are thus used, but the
observation results reveal a large number of keystrokes: users often make mistakes or
simply forget to use the shortcuts and fall back on menu selection.
System Requirements. The system should be able to trigger a shortcut after
recognizing a gesture. Some of these gestures can be predefined, but the user must be
able to build personalized ones. These gestures are meant to be associated with the
body space and to store a meaningful body mnemonic that aids their later recall.
Feedback is important and should also be personalized, giving the user the option
to choose visual, speech, audio or vibration feedback.
RFID Body Shortcuts. To validate our approach, we developed an RFID-based
prototype able to associate body parts (through sticker tags) with any given mobile
device shortcut (i.e. an application or a call to a certain contact). We chose RFID
technology because it provides a direct mapping, easing the
creation of body shortcuts.
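The direct mapping at the heart of the prototype can be sketched as a lookup from tag identifiers to shortcuts. The sketch is hypothetical: it does not model the ACG reader or any real RFID API, and the tag IDs, class names and the `feedback` field (reflecting the personalized-feedback requirement) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BodyShortcut:
    body_part: str               # where the sticker tag is placed, e.g. "ear"
    action: str                  # e.g. "call" or "open_app"
    target: str                  # contact or application name
    feedback: str = "vibration"  # user-chosen feedback modality


class ShortcutRegistry:
    """Maps RFID tag IDs (from body-worn sticker tags) to device shortcuts."""

    def __init__(self):
        self._by_tag = {}

    def bind(self, tag_id, shortcut):
        """Associate a tag with a shortcut, replacing any previous binding."""
        self._by_tag[tag_id] = shortcut

    def on_tag_read(self, tag_id):
        """Return the shortcut bound to tag_id, or None if the tag is unknown."""
        return self._by_tag.get(tag_id)


registry = ShortcutRegistry()
registry.bind("TAG-0001", BodyShortcut("ear", "call", "Home"))
registry.bind("TAG-0002", BodyShortcut("hand", "open_app", "Messages", feedback="audio"))
hit = registry.on_tag_read("TAG-0001")
print(hit.action, hit.target)            # call Home
print(registry.on_tag_read("TAG-9999"))  # None
```

Because each tag maps to exactly one shortcut, recognition reduces to a dictionary lookup; personalization amounts to rebinding a tag or changing its feedback field.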
Evaluation. The prototype was evaluated with 20 users in a controlled
environment using a Pocket LOOX 720 with a CompactFlash ACG RF PC Handheld
Reader. In the first stage of the evaluation, the users were asked to select the five tasks
they most frequently performed with their mobile phones and to associate each of them with
both a body part and a mobile device key (on their own mobile device). Considering body
shortcuts, it is interesting to note that 89% (out of 18 users) associated message writing
with the hand, 88% (out of 17 users) associated making a call with their ear or mouth, and
91% (out of 11 users) associated their contacts with their chest, among other meaningful
relations. An hour later, the users were asked to access the previously selected
applications using both approaches (body and key shortcuts). For each
approach, the users were prompted randomly 20 times (5 for each application).
Although several users selected key/application associations they already used, 50% (10 users)
made at least one error with key shortcuts, with an average of 9% errors per user. With body
shortcuts, only 15% (3 users) made a mistake, with an average of 0.8% errors per user.
3 Conclusions and Future Work
We presented a work in progress to improve shortcut execution in a mobile context,
focusing on the body space as a meaningful target for interaction. An RFID-based
prototype was developed and evaluated. The user studies showed that body
mnemonics, besides being meaningful and sometimes universal, are easily recalled and
outperform traditional key shortcuts. The work will continue with a focus on feedback,
shortcut personalization and user acceptance issues.
interaction and in a second part where users were asked to reach the most used
applications and contacts. It was found that 75% of the interviewed used key
shortcuts, while none used voice shortcuts due to its social constraints and low
recognition rates. An average of 5 key shortcuts is used, where 93% of the users
execute them on a daily basis. Users with more programmed shortcuts reported
difficulties in their memorization. In user observation, results show that people
needed an average of 4 keystrokes to access the 3 most used applications and 5
keystrokes to call the 3 most used contacts. Key shortcuts seem to be used but
observation results reflect a large number of keystrokes. Users often make mistakes or
simply forget to use them and apply menu selection.
System Requirements. The system should be able to produce a shortcut after a
recognized gesture. Some of these gestures can be predefined but the user has to be
able to build personalized ones. Those gestures are intended to be associated with
body space and store a meaningful body mnemonic to help in its future recall.
Feedback is important and should be also personalized, giving the user the possibility
to choose visual, speech, audio or vibrational feedback.
RFID Body Shortcuts. To validate our approach we developed a RFID-based
prototype able to associate body parts (through sticker tags) with any given mobile
device shortcut (i.e. an application or a call to a certain contact). We selected RFID
technology to apply our approach because it provides direct mapping, easing the
creation of body shortcuts.
Evaluation. The prototype was evaluated with 20 users in a controlled
environment, using a Pocket LOOX 720 with a CompactFlash ACG RF PC Handheld
Reader. In the first stage of the evaluation, users were asked to select the five tasks
they most frequently performed with their mobile phones and to associate each with both
a body part and a key on their own mobile device. For the body shortcuts, it is
interesting to note that 89% (of 18 users) related message writing to the hand, 88%
(of 17 users) related making a call to the ear or mouth and 91% (of 11 users) related
their contacts to the chest, among other meaningful relations. An hour later, users
were asked to access the previously selected applications using both approaches (body
and key shortcuts). For each approach, users were prompted 20 times in random order
(5 for each application). Although several users selected key/application relations
they already used, 50% (10 users) made at least one error with key shortcuts, with an
average of 9% errors per user. With body shortcuts, only 15% (3 users) made a mistake,
with an average of 0.8% errors per user.
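The two summary figures reported for each approach (share of users with at least one error, and mean errors per user) can be computed from per-user error counts. The sketch below assumes that "errors per user" means each user’s errors divided by their 20 prompts, averaged over all users. The text does not define this precisely. The counts in the usage line are invented for illustration and are not the study’s data.

```python
from statistics import mean
from typing import List, Tuple

PROMPTS_PER_USER = 20  # prompts per user for each approach, as in the text


def summarize(errors_per_user: List[int], prompts: int = PROMPTS_PER_USER) -> Tuple[float, float]:
    """Return (% of users with at least one error, mean per-user error rate in %)."""
    users_with_error = sum(1 for e in errors_per_user if e > 0)
    pct_users = 100 * users_with_error / len(errors_per_user)
    # Assumed definition: each user's error rate, averaged over all users.
    mean_rate = mean(100 * e / prompts for e in errors_per_user)
    return pct_users, mean_rate


# Invented counts for four users, for illustration only (not the study's data).
print(summarize([0, 2, 0, 1]))  # (50.0, 3.75)
```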
3 Conclusions and Future Work
We presented work in progress on improving shortcut execution in a mobile context,
focusing on the body space as a meaningful target for interaction. An RFID-based
prototype was developed and evaluated. The user studies showed that body
mnemonics, besides being meaningful and sometimes universal, are easily recalled and
outperform traditional key shortcuts. Future work will focus on feedback, shortcut
personalization and user acceptance issues.
Moreover, since C-VPT gestures are more accessible to children [2], observing
C-VPT gestures while listening to words enhanced the students’ performance in
memory tasks: students more frequently retrieved both words accompanied by C-VPT
signals. The effects were limited to younger students: the results suggest that C-VPT
gestures have greater communicative power than O-VPT gestures for younger
children, and that multimodality is less necessary for older students.
In the second study we aimed to estimate the effects of teachers’ iconic gestures and
their viewpoints on students’ comprehension and memory of short stories.
Data analysis of viewpoint effects on story recall seems to confirm that the positive
effects of C-VPT decrease during cognitive development.
In the story memory task, too, C-VPT gestures help younger students more than
older ones. Moreover, as in the previous task, the teacher’s gestures seem to be more
helpful for students with lower class grades, who performed significantly worse in the
N condition.
Both studies show that the communicative value of iconic gestures, already found for
adults [4], [5], [6], and specifically the value of gesture viewpoint, is even higher for
children. Moreover, younger children take advantage of the additional information
provided by iconic gestures more than older ones, and the advantage provided by
character viewpoint in younger children holds not only on the production side, as
found by McNeill [2], but also on the comprehension side.
Finally, the results of our studies and the analysis of the videotapes confirm that the
best way to assess the communicative effectiveness of iconic gestures is a semantic
analysis that considers the specific type of information they convey. On this basis, it
will also be possible to reassess the communicative effectiveness of gestures in adults,
for whom viewpoint does not seem as important.
Acknowledgments. Participation in GW2007 was supported by HUMAINE
(European Project IST-507422).
References
1. Rosenthal, R., Jacobson, L. (1968). Pygmalion in the Classroom. New York: Holt, Rinehart &
Winston.
2. McNeill, D. (1992). Hand and Mind: What Gestures Reveal about Thought. Chicago and
London: University of Chicago Press.
3. Goldin-Meadow, S. (2003). Hearing Gesture. Cambridge, MA: The Belknap Press of Harvard
University Press.
4. Holler, J., Beattie, G. (2002). A microanalytic investigation of how iconic gestures and
speech represent core semantic features in talk. Semiotica, 142(1/4), 31–69.
5. Beattie, G., Shovelton, H. (1999). Do iconic hand gestures really contribute anything to the
semantic information conveyed by speech? An experimental investigation. Semiotica, 123,
1–30.
6. Beattie, G., Shovelton, H. (2001). An experimental investigation of the role of different
types of iconic gesture in communication: a semantic feature approach. Gesture, 1, 129–149.
from sound source panning systems to complex positioning and room model systems
such as ViMiC [1]. These parameters form the basis for our examination of the control
of spatialization. Some examples of parameters include sound source position and
orientation, source presence, room size and reverb time [2].
A number of issues arise in the control of sound spatialization and must be addressed
to arrive at an effective system. These issues can stem from the spatialization system
itself or from the gesture control system. In particular, we are concerned with the
continuous or discrete nature of variables (both control and spatialization variables),
the resolution of control, the frequency of parameter updates, the integrality and
separability [3] of control and spatialization variables, and the cognitive load placed
on the performer by the system. Beyond these general issues, several issues relate
specifically to spatialization controllers and must be examined when designing new
gesture control systems. These include the use of absolute or relative positioning,
the use of current or ballistic control, whether the sensing method has a return-to-zero
feature, the choice of isometric or isotonic sensing, and the provision of feedback to
the performer.
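The absolute/relative and return-to-zero distinctions can be made concrete with a small Python sketch. It maps a bipolar sensor reading in [-1, 1] to a source azimuth in degrees. The function names, the gain and the ±180° range are illustrative assumptions and are not part of the authors’ system.

```python
from dataclasses import dataclass


def clamp(x: float, lo: float, hi: float) -> float:
    return max(lo, min(hi, x))


def absolute_map(sensor: float, max_angle: float = 180.0) -> float:
    """Absolute positioning: a reading in [-1, 1] sets the azimuth directly."""
    return clamp(sensor, -1.0, 1.0) * max_angle


@dataclass
class RelativeMap:
    """Relative positioning: each reading nudges the current azimuth."""
    value: float = 0.0
    gain: float = 10.0        # degrees per unit of sensor reading per update (assumed)
    max_angle: float = 180.0

    def update(self, sensor: float) -> float:
        self.value = clamp(self.value + self.gain * sensor, -self.max_angle, self.max_angle)
        return self.value


# A return-to-zero sensor (e.g. a spring-loaded joystick) reads 0.0 when released.
readings = [0.5, 0.5, 0.0]  # pushed, still pushed, released
azimuth_abs = [absolute_map(r) for r in readings]
rel = RelativeMap()
azimuth_rel = [rel.update(r) for r in readings]
print(azimuth_abs)  # [90.0, 90.0, 0.0]
print(azimuth_rel)  # [5.0, 10.0, 10.0]
```

With a return-to-zero sensor, the absolute mapping snaps the source back to 0° as soon as the performer lets go. The relative mapping instead holds the last position. Which behavior is preferable depends on the piece.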
One system we have implemented for controlling spatialization is a pressure-sensitive
floor, which controls spatialization parameters using the position of the performer’s
center of mass. Motion capture of cello performance has shown that this is a continuous,
slowly varying parameter, primarily in one dimension. This makes it suitable for
controlling single spatialization parameters that do not need to change quickly over
the course of the performance. Although this control parameter could be used to change
the position of a sound source, the result would only be a sound that moves slowly and
repetitively from side to side over the course of the piece, which may not be an
interesting or useful effect. Moreover, if the performer tried to use this interface to
deliberately steer a sound, the extra cognitive load might distract from their
performance. On the other hand, using this control to manipulate a sound source
parameter such as brightness allowed control over the system without requiring as much
attention from the performer. This presentation will include examples of other control
systems we have developed for gesture-controlled sound spatialization.
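The Python sketch below illustrates a floor-based mapping of this kind. It assumes a one-dimensional row of pressure sensors with known positions. Because the floor measures pressure, the pressure-weighted mean of sensor positions (the center of pressure) stands in for the projected center of mass. It is then normalized to [0, 1] and smoothed with a one-pole low-pass filter before driving brightness. The sensor layout, smoothing coefficient and class names are hypothetical, and the authors’ implementation may differ.

```python
from typing import Optional, Sequence


def center_of_pressure(positions: Sequence[float], pressures: Sequence[float]) -> float:
    """Pressure-weighted mean of 1-D sensor positions (metres)."""
    total = sum(pressures)
    if total <= 0:
        raise ValueError("no load detected on the floor")
    return sum(x * p for x, p in zip(positions, pressures)) / total


class BrightnessMapper:
    """Maps a slowly varying lateral position to a brightness value in [0, 1]."""

    def __init__(self, left: float, right: float, smoothing: float = 0.9) -> None:
        self.left, self.right = left, right
        self.smoothing = smoothing          # one-pole low-pass coefficient in [0, 1)
        self.state: Optional[float] = None

    def update(self, x: float) -> float:
        # Normalize the position to [0, 1] across the assumed floor width.
        norm = min(1.0, max(0.0, (x - self.left) / (self.right - self.left)))
        if self.state is None:
            self.state = norm               # initialize on the first reading
        else:
            self.state = self.smoothing * self.state + (1 - self.smoothing) * norm
        return self.state


# Hypothetical layout: three sensors across the floor at -0.3, 0.0 and 0.3 m.
cop = center_of_pressure([-0.3, 0.0, 0.3], [10.0, 20.0, 30.0])
mapper = BrightnessMapper(left=-0.3, right=0.3)
print(round(cop, 3), round(mapper.update(cop), 3))  # 0.1 0.667
```

The low-pass filter suits a control signal that is already slow. It suppresses small postural jitter so that brightness follows the performer’s weight shifts rather than sensor noise.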
References
1. Braasch, J.: A loudspeaker-based 3D sound projection using Virtual Microphone
Control (ViMiC). In: Convention of the Audio Eng. Soc. 118 (2005)
2. Marshall, M., Malloch, J., Wanderley, M.: Comparison of controls in available
spatialization systems. Technical Report MUMT-IDMIL-07-02, McGill University (2007)
3. Jacob, R., Sibert, L., McFarlane, D., Mullen, M.: Integrality and separability of
input devices. ACM Transactions on Computer-Human Interaction 1(1) (1994) 3–26
Author Index
Akarun, Lale 42
Aran, Oya 46
Aubry, Matthieu 74
Ballsun-Stanton, Brian 50
Bastos, Rafael 32, 54
Bernardin, Delphine 62
Bevilacqua, Frederic 62
Bolot, Laurence 44
Boulic, Ronan 22
Braffort, Annelies 44
Burger, Thomas 46
Camurri, Antonio 52, 64, 66
Canepa, Corrado 52
Caplier, Alice 46
Chen, Xilin 38
Collet, Christophe 34
Cotrim, Luisa 72
Cowie, Roddy 64
Dalle, Patrice 34
Dias, Miguel 32, 54
Fernandes, João 54
Ferreira, Teresa 72
Filhol, Michael 44
Galhano-Rodrigues, Isabel 24
Gamboa, Ricardo 78
Gao, Wen 38
Perales, Francisco J. 28
Ramos, Fernando 72
Rasamimanana, Nicolas 62
Reinders, Marcel 30
Roh, Myung-Cheol 16
Ruttkay, Zsofi 20
Santos, Pedro 54
Schull, Jon 50
Sousa, Álvaro 72
Tavares, João 54
ten Holt, Gineke 30
van Welbergen, Herwin 20
Varona, Javier 28
Vatavu, Radu Daniel 10
Volpe, Gualtiero 22, 52, 66
Wanderley, Marcelo 62, 82
Wang, Chunli 38