Affective Computing and Intelligent Interaction (ACII 2011)
1. Expressive Gesture Model for a Humanoid Robot
Le Quoc Anh - Catherine Pelachaud
CNRS, LTCI
Telecom-ParisTech, France
Doctoral Consortium, ACII 2011, Memphis, USA
2. Objectives
Generate communicative gestures for the Nao robot
• Integrated within an existing platform (GRETA)
• Scripts with a symbolic language
• Synchronization (gestures and speech)
• Expressivity of gestures
GVLEX project (Gesture & Voice for Expressive Reading)
• Robot tells a story expressively.
• Partners: LIMSI (linguistic aspects), Aldebaran (robotics),
Acapela (speech synthesis), Telecom ParisTech
(expressive gestures)
page 2 ACII 2011 Le Quoc Anh & Catherine Pelachaud
3. State of the art
Several recent initiatives: Salem et al. (2010), Holroyd et al.
(2011), Ng-Thow-Hing et al. (2010), Shi et al. (2010), Nozawa et
al. (2006).
• Motion scripts with MURML, BML, MPML-HR, etc
• Adapt gestures to speech (for synchronization)
• Mechanism for receiving and processing feedback from the
robot
• Gesture animation: no expressivity
Our system: Focus on expressivity and synchronization of
gestures with speech
4. Our methodology
Gestures are described with a symbolic language (BML)
Gestural expressivity (amplitude, fluidity, power,
repetition, speed, stiffness,…)
Elaboration of gestures from a storytelling video corpus
(Martin et al., 2009)
Execution of the animation by translating into joint values
5. Problem and Solution
Using a virtual agent framework to control a physical
robot raises several problems:
• Different degrees of freedom
• Limited movement space and speed
Solution:
• Use the same representation language
- same algorithm for selecting and planning gestures
- different algorithm for creating the animation
• Elaborate one gesture repository for the robot and
another one for the Greta agent
• Gesture movement space and velocity specification
6. System Overview
[Architecture diagram: FML → Behavior Planning → BML → Behavior Realizer
(Animation Computation: symbolic description of gesture phases, temporal
information → Animation Production: joint-value instantiation, interpolation
module) → WAV file and animation; inputs: lexicons for Nao and for Greta]
Behavior Planning: selects and plans gestures.
Behavior Realizer: schedules and creates gesture
animations.
7. Gesture Elaboration
• Annotation of gestures from a storytelling video
corpus from Martin et al. (2009)
These annotations form the basis of the gesture entries in the lexicons
• From gesture annotation to entries in Nao lexicon
• BML description of each gesture:
Gesture->Phases->Hands (wrist position, palm orientation, shape,...)
Only stroke phases are specified. Other phases will be generated
automatically by the system
8. Synchronization of gestures with speech
The stroke phase coincides with or precedes the
emphasized words of the speech (McNeill, 1992)
Gesture stroke phase timing is specified by
synchronization points
9. Synchronization of gestures with speech
Algorithm
• Compute the preparation phase
• Delete a gesture if there is not enough time (strokeEnd(i-1) > strokeStart(i) + duration)
• Add a hold phase to fit the gesture's planned duration
• Coarticulation between consecutive gestures:
- If there is enough time, insert a retraction phase (i.e., return to the rest position)
- Otherwise, go from the end of the stroke to the preparation phase of the next
gesture
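The steps above can be sketched in Python. This is only an illustration: the fixed phase durations and the data layout are assumptions, not the actual GRETA Behavior Realizer, which derives timings from the robot's movement-speed limits.

```python
PREP_DUR = 0.4  # assumed preparation-phase duration (s)
RETR_DUR = 0.6  # assumed retraction-phase duration (s)

def schedule(strokes):
    """strokes: time-ordered (stroke_start, stroke_end) pairs in seconds.
    Returns the strokes that can be kept, plus the transition chosen
    after each kept gesture (retraction vs. coarticulation)."""
    kept = []
    for start, end in strokes:
        # Drop the gesture if the previous stroke leaves no time to prepare.
        if kept and kept[-1][1] + PREP_DUR > start:
            continue
        kept.append((start, end))
    transitions = []
    for (_, e1), (s2, _) in zip(kept, kept[1:]):
        if s2 - e1 >= RETR_DUR + PREP_DUR:
            transitions.append("retract")       # enough time: return to rest
        else:
            transitions.append("coarticulate")  # go straight to next preparation
    return kept, transitions
```

For instance, with strokes at 1.0-1.5 s, 1.7-2.2 s, and 4.0-4.5 s, the second gesture is dropped (no time to prepare it) and a retraction is inserted before the third.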
10. Gesture expressivity
Spatial Extent (SPC): Amplitude of movement
Temporal Extent (TMP): Speed of movement
Power (PWR): Acceleration of movement
Fluidity (FLD): Smoothness and Continuity
Repetition (REP): Number of stroke repetitions
Stiffness (STF): Tension/Flexibility
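As an illustration only, two of these parameters could modulate a stroke's amplitude and duration as below; the linear mapping is an assumption, not the exact GRETA formula.

```python
def apply_expressivity(amplitude, duration, spc=0.0, tmp=0.0):
    """Modulate a stroke by SPC and TMP, both in [-1, 1].
    SPC widens or narrows the movement; TMP speeds it up or slows it down."""
    amplitude *= 1.0 + 0.5 * spc   # SPC > 0: larger movement
    duration *= 1.0 - 0.5 * tmp    # TMP > 0: faster, i.e. shorter duration
    return amplitude, duration
```

With negative SPC and TMP (as in the sad beat gesture of the example slide), the stroke becomes smaller and slower.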
12. Animation Computation & Execution
Schedule and plan gestures phases
Compute expressivity parameters
Translate symbolic descriptions into joint values
Execute animation
• Send timed key-positions to the robot using available
APIs
• Animation is obtained by interpolating between joint
values with the robot's built-in proprietary procedures.
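On the robot itself the interpolation is done by built-in proprietary procedures, so the pure-Python linear interpolation below is only a sketch of the idea: intermediate joint values are derived from timed key positions.

```python
def interpolate(keyframes, t):
    """Linear interpolation between timed key positions.
    keyframes: time-ordered (time_s, joint_value) pairs."""
    if t <= keyframes[0][0]:
        return keyframes[0][1]
    if t >= keyframes[-1][0]:
        return keyframes[-1][1]
    for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
        if t0 <= t <= t1:
            return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

# A hypothetical joint trajectory through three timed key positions:
path = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5)]
# interpolate(path, 1.5) -> 0.75
```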
13. Example
BML script (speech and gesture instantiation with expressivity parameters):

<bml>
  <speech id="s1" start="0.0">
    vce=speaker=Antoine spd=180
    Et le troisième dit tristement:
    vce=speaker=AntoineSad spd=90 pau=200
    <tm id="tm1"/>J'ai très faim!
  </speech>
  <gesture id="beat_hungry" start="s1:tm1" end="start+1.5" stroke="0.5">
    <FLD.value>0</FLD.value>
    <OAC.value>0</OAC.value>
    <PWR.value>-1.0</PWR.value>
    <REP.value>0</REP.value>
    <SPC.value>-0.3</SPC.value>
    <TMP.value>-0.2</TMP.value>
  </gesture>
</bml>

Lexicon entry for the gesture:

<gesture id="beat_hungry" min_time="1.0">
  <phase type="STROKE-START">
    <hand side="BOTH">
      <verticalLocation>YCC</verticalLocation>
      <horizontalLocation>XCenter</horizontalLocation>
      <distanceLocation>Zmiddle</distanceLocation>
      <handShape>OPENHAND</handShape>
      <palmOrientation>INWARD</palmOrientation>
    </hand>
  </phase>
  <phase type="STROKE-END">
    <hand side="BOTH">
      <verticalLocation>YLowerEP</verticalLocation>
      <horizontalLocation>XCenter</horizontalLocation>
      <distanceLocation>ZNear</distanceLocation>
      <handShape>OPEN</handShape>
      <palmOrientation>INWARD</palmOrientation>
    </hand>
  </phase>
</gesture>

Resulting animation plan:
animation[1]: phase="preparation", start-time="Start", end-time="Ready", description of the stroke-start position
animation[2]: phase="stroke", start-time="Stroke-start", end-time="Stroke-end", description of the stroke-end position
animation[3]: phase="retraction", start-time="Relax", end-time="End", description of the rest position
15. Conclusion and future work
Conclusion
• A gesture model has been designed and implemented for Nao, taking into
account the physical constraints of the robot.
• Common platform for both virtual agent and robot
• Expressivity model
• Allows us to create gestures with different affective states and
personal style
Future work
• Build two repositories of gestures, one for Greta and another one
for NAO
• Improve expressivity and synchronization of gestures with speech
• Receive and process feedback from the robot
• Validate the model through perceptive evaluations
16. Acknowledgment
This work has been funded by the ANR GVLEX project
It is supported by members of the TSI laboratory,
Telecom-ParisTech
17. Gesture Specification
Gesture->Phases->Hands (wrist position, palm orientation, shape,...)
Only stroke phases are specified. Other phases will be generated
automatically by the system
<gesture id="greeting" category="ICONIC" min_time="1.0" hand="RIGHT">
  <phase type="STROKE-START" twohand="ASYMMETRIC">
    <hand side="RIGHT">
      <vertical_location>YUpperPeriphery</vertical_location>
      <horizontal_location>XPeriphery</horizontal_location>
      <location_distance>ZNear</location_distance>
      <hand_shape>OPEN</hand_shape>
      <palm_orientation>AWAY</palm_orientation>
    </hand>
  </phase>
  <phase type="STROKE-END" twohand="ASYMMETRIC">
    <hand side="RIGHT">
      <vertical_location>YUpperPeriphery</vertical_location>
      <horizontal_location>XExtremePeriphery</horizontal_location>
      <location_distance>ZNear</location_distance>
      <hand_shape>OPEN</hand_shape>
      <palm_orientation>AWAY</palm_orientation>
    </hand>
  </phase>
</gesture>
An example of a greeting gesture
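Since lexicon entries are plain XML, they can be inspected with a standard parser. A minimal sketch using Python's xml.etree.ElementTree, embedding a shortened copy of the greeting entry above:

```python
import xml.etree.ElementTree as ET

LEXICON_ENTRY = """
<gesture id="greeting" category="ICONIC" min_time="1.0" hand="RIGHT">
  <phase type="STROKE-START">
    <hand side="RIGHT">
      <vertical_location>YUpperPeriphery</vertical_location>
      <horizontal_location>XPeriphery</horizontal_location>
      <hand_shape>OPEN</hand_shape>
    </hand>
  </phase>
  <phase type="STROKE-END">
    <hand side="RIGHT">
      <horizontal_location>XExtremePeriphery</horizontal_location>
    </hand>
  </phase>
</gesture>
"""

root = ET.fromstring(LEXICON_ENTRY)
# Collect each phase's key-pose properties into a (type, {tag: value}) pair.
phases = [(p.get("type"),
           {prop.tag: prop.text for hand in p.findall("hand") for prop in hand})
          for p in root.findall("phase")]
# phases[0] describes the stroke-start key pose, phases[1] the stroke-end one.
```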
Editor's notes
Total: 18 minutes including questions => 15 minutes for the presentation. Thank you very much, chairperson, for your kind introduction. My name is Le Quoc Anh; I am a PhD student from Paris, where I work on an expressive gesture model for humanoid robots under the direction of Professor Catherine Pelachaud.
As Mancini (ACII 2011) notes, expressivity can contribute to conveying emotional content from the agent to users. The main objective of my work is to generate communicative gestures for the humanoid robot Nao while it is reading a story. For many years, we have developed a virtual agent, Greta, that can communicate with humans through voice, facial expressions, and hand gestures. We want to extend this framework to control the nonverbal behaviors of the Nao robot. From given communicative intentions, the GRETA system selects and plans corresponding gestures. The animation scripts are encoded with a symbolic language. My work then focuses on creating gesture animations for the robot. In detail, the question is how to synchronize gestures with speech and render gestures with expressivity. The work takes place within the French national project GVLEX (Gesture and Voice for Expressive Reading). Its objective is to use the robot Nao to tell a story with expressive gestures to children. The project has four partners: LIMSI works on linguistic aspects; Aldebaran works on robotics (the mechanics and operating system of the Nao robot); Acapela works on speech synthesis (text to speech); and we at Telecom ParisTech work on nonverbal behaviors in general and expressive gestures accompanying speech in particular.
Recently, several systems have been developed to create gestures for humanoid robots. For example, Salem et al. build on the gesture engine of the virtual agent system MAX to control gestures of the robot ASIMO. These systems have some common characteristics. For example, they use symbolic languages to specify gestural scripts, and the synchronization of gestures and speech is done by adapting the gestures to the speech. Our system has some differences compared to the others. It follows the SAIBA framework, a standard architecture. In our system, the gesture lexicon is an external parameter that can be modified to be adapted to a specific agent. The system also focuses on gestural expressivity.
In our system, we use BML (a symbolic behavior representation language) to specify an animation script. The expressivity of gestures is translated into a set of expressivity parameters such as the speed of movement, the power of movement, etc. We predefine a repository of gestures, called the gesture lexicon. The elaboration of gestures is based on gesture annotations extracted from a storytelling video corpus. The system selects and plans gestures from the lexicon and then realizes them. The animation is obtained by translating the symbolic description of gestures into joint values of the robot; what is feasible depends on the robot's physical constraints. Animation is specified by scripts described in BML (gesture specification + descriptions).
Using a virtual agent framework to control a physical robot raises several problems because the robot has certain physical constraints, such as limits on movement space and speed. This is a really important point. Our solution is to use the same representation language to control both agent systems (virtual and physical), so that we can use the same algorithm for selecting and planning gestures but a different algorithm for creating the animation. Additionally, we plan to build a proper gesture database for the robot in which the gesture movement space and velocity specification are predefined.
Now I would like to turn to the implementation section. As you can see, the system consists of two separate modules. The first module, Behavior Planning, selects and plans gestures corresponding to the given intentions encoded in the FML message. The second module, Behavior Realizer, schedules the phases of gestures and creates gesture animations. In a bit more detail, the same method for selecting and planning gestures is applied to both agents, but the method for producing the animation is different for each agent. The next slide will talk about gesture lexicons.
Gestures are elaborated from gesture annotations made on a storytelling video corpus. We use BML syntax to encode gestures. Following the observations of Kendon, a gesture can be divided into several phases (preparation, stroke, and retraction), of which the stroke phase is the most important, as it conveys the meaning of the gesture. In the lexicon, only stroke phases are specified; the other phases will be generated automatically by the system.
In order to synchronize gestures with speech, the stroke phase of a gesture must coincide with the emphasized words. In our system, the timing of the gesture stroke phase is specified by synchronization points.
In detail, the system has to predict the duration of the preparation phase so that the agent knows exactly when to start the gesture in order to be synchronized with the speech. In this step, the system verifies whether the agent has enough time to perform the gesture; if not, the gesture has to be deleted. Conversely, if the gesture's planned duration is too long, a hold phase is added to make the gesture more natural. The coarticulation between two consecutive gestures is handled by checking the available time between them. If there is enough time, the retraction phase of the first gesture is executed. Otherwise, the retraction phase is canceled and the hand moves from the end of the stroke of the first gesture to the preparation of the second gesture.
Agents can perform the same gestures in different ways, depending on their affective states and personality. For example, if the agent is angry, it makes hand movements faster and stronger. To make gestures more expressive, we define several parameters such as spatial extent, temporal extent, power, etc.
From the selected gestures, the system plans the gesture phases while taking into account the expressivity parameters. After that, the symbolic gesture descriptions are translated into joint values and sent to the robot.
Let's look at a concrete example. On the left is a description of a beat gesture with sadness. The expressivity parameters are set so that the gesture is performed weakly and slowly.
Now you can see the results of the system. http://www.youtube.com/watch?v=MSNHqmIMnpk As you can see,…
I have just presented an expressive gesture model designed and implemented for the humanoid robot Nao. This platform is used not only for the Nao robot but also for the virtual agent Greta. The model allows us to create gestures with different emotional characteristics. In the near future, we will complete and validate the model through perceptive evaluations. (Each pair may differ in form but convey a similar meaning.) Why expressivity is needed: the same gesture can be presented with different expressions, depending on the conversational context, the current emotion, and the character.
Thank you for your attention. I would be happy to try to answer any questions you might have. Remarks: not clear which part of the Greta system is adapted for Nao; focus on that.
Gestures in the lexicon are specified symbolically. Following the observations of Kendon, each gesture can be divided into several phases: preparation, stroke, and retraction, of which the stroke phase is the most important, as it conveys the meaning of the gesture. In the lexicon, only the description of the stroke phase is specified with BML; the other phases will be generated automatically by the system. The stroke phase is formed by some key poses. Each key pose is described by the wrist position, palm orientation, and hand shape. Some temporal constraints may also be included in this description, such as the minimum time for executing a gesture.