A Common Gesture and Speech Production Framework for
              Virtual and Physical Agents
     Quoc Anh Le - Jing Huang - Catherine Pelachaud
                        CNRS, LTCI
               Telecom-ParisTech, France



    Workshop on Speech and Gesture Production, ICMI 2012, Santa Monica, CA, USA
Introduction
      Motivations

       • Similar approaches between virtual agents and
         humanoid robots
       • Limits of existing systems: agent dependent
      Objectives

       • Common co-verbal gesture generation framework for
         both virtual and physical agents
      Methodologies

       • Based on GRETA system
       • Use
          - same representation languages
          - same algorithm for selecting and planning gestures
          - different algorithms for creating the animation
page 2
Architecture Overview
      Common modules (shared by all agents)
       • Intent Planner: input data (text, audio, video, etc.) -> FML-APML
       • Behavior Planner: FML-APML -> BML
       • Behavior Realizer: BML -> keyframes
      Specific modules (one per agent)
       • Animation Realizer for Greta: keyframes -> FAP-BAP values -> FAP-BAP Player
       • Animation Realizer for Nao: keyframes -> joint values -> Nao built-in
         proprietary procedures
      Lexicons
       • Intent Lexicon: Baselines for Nao, Baselines for Greta
       • Behavior Lexicon: Gestuary for Nao, Gestuary for Greta
       • Animation Lexicon: one for Greta, one for Nao
      Communication
       • ActiveMQ central messaging system
page 3
Behavior Realizer
      [Figure: the architecture diagram of the Architecture Overview slide, repeated
       to locate the Behavior Realizer (common module, BML -> keyframes) and its
       Behavior Lexicon (Gestuary for Nao, Gestuary for Greta)]
page 4
Behavior Realizer: Outline

      Processes common to all agents
         1.   Create the gesture from the agent's gestuary
         2.   Schedule the timing of the gesture phases
         3.   Generate keyframes: pairs (absolute time, symbolic
              description of the hand configuration at this time)
      Agent-specific databases
             For Nao
                 Gestuary (for instance, pointing with a fully stretched arm)
                 Velocity profile (empirically determined from Nao)
             For Greta
                 Gestuary (for instance, pointing with one finger)
                 Velocity profile (empirically determined from real humans)
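
The split between common processes and agent-specific databases can be sketched as follows. This is a minimal illustration, not the GRETA implementation: the names `HandConfig`, `Keyframe`, `GESTUARIES` and `stroke_keyframe` are hypothetical, and the symbolic values are the stroke descriptions of the pointing gesture shown in the example slide of this deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HandConfig:
    """Symbolic description of a hand configuration (hypothetical structure)."""
    vertical: str
    horizontal: str
    distance: str
    hand_shape: str


@dataclass(frozen=True)
class Keyframe:
    """A keyframe as described on the slide: (absolute time, symbolic description)."""
    time: float          # absolute time, assumed to be in seconds
    config: HandConfig


# One gestuary per agent; only the stroke of the "pointing" entry is modelled here.
GESTUARIES = {
    "Nao": {"pointing": HandConfig("YUpperP", "XEP", "XFar", "OPEN")},
    "Greta": {"pointing": HandConfig("YP", "XP", "XMiddle", "INDEX")},
}


def stroke_keyframe(agent: str, lexeme: str, stroke_time: float) -> Keyframe:
    """Common code path: only the gestuary that is looked up depends on the agent."""
    config = GESTUARIES[agent][lexeme]
    return Keyframe(stroke_time, config)
```

The same call, for instance `stroke_keyframe("Nao", "pointing", 1.2)` and `stroke_keyframe("Greta", "pointing", 1.2)`, yields keyframes at the same time with different symbolic descriptions, which is the point of keeping the algorithm common and the data agent-specific.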


page 5
Example: Different pointing gestures
      BML input (common to both agents)
          <bml id="bml1">
            <speech xmlns="" id="s1" start="0">
              <text>It is <sync id="tm1"/> over there! <sync id="tm2"/></text>
            </speech>
            <gesture id="g1" lexeme="pointing" start="s1:tm1" end="s2:tm2">
              <description priority="1" type="GRETA">
                <GRETA:SPC>0.80</GRETA:SPC>
                <GRETA:TMP>0.50</GRETA:TMP>
                <GRETA:FLD>-0.62</GRETA:FLD>
                <GRETA:PWR>0.30</GRETA:PWR>
                <GRETA:REP>0.00</GRETA:REP>
                <GRETA:OPE>1.00</GRETA:OPE>
                <GRETA:TEN>0.20</GRETA:TEN>
              </description>
            </gesture>
          </bml>

      Step 1: select the gesture template from the agent's gestuary
          Nao Gestuary
            ..
            <gesture id="pointing">
              <phase type="stroke">
                <vertical>YUpperP</vertical>
                <horizontal>XEP</horizontal>
                <distance>XFar</distance>
                <hShape>OPEN</hShape>
              </phase>
            </gesture>
            …
          Greta Gestuary
            ..
            <gesture id="pointing">
              <phase type="stroke">
                <vertical>YP</vertical>
                <horizontal>XP</horizontal>
                <distance>XMiddle</distance>
                <hShape>INDEX</hShape>
              </phase>
            </gesture>
            …

      Steps 2, 3: schedule the phases and generate keyframes (for each agent)
            <keyframe 1 (time, description)>
            <keyframe 2 (time, description)>
            …
            <keyframe N (time, description)>

      Step 4: create the animation parameters
            Nao: JOINT VALUES
            Greta: BAP

       page 6
BR: Synchronization with speech

          Algorithm
          • Compute the preparation phase
          • Do not perform the gesture if there is not enough time
            (strokeEnd(i-1) > strokeStart(i) + duration)
          • Add a hold phase to fit the planned gesture duration
          • Co-articulation between several gestures
            - If there is enough time, add a retraction phase (i.e. go back to
              the rest position)
              [Timeline: two separate gestures, each with its own start and end]
            - Otherwise, go from the end of the stroke to the preparation phase
              of the next gesture
              [Timeline: two strokes (S-start, S-end) within a single start and end]
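
The scheduling rules above can be sketched as a small planner. This is an illustration under stated assumptions, not the authors' algorithm: `Gesture`, `Phase`, `schedule`, `prep_duration` and `retract_duration` are hypothetical names, and "not enough time" is read here as "the previous stroke ends after the latest moment at which the preparation could start", that is, stroke start minus preparation duration. When there is no time to retract, the sketch holds the previous stroke-end pose until the next preparation begins.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Gesture:
    name: str
    stroke_start: float      # given by the speech time markers
    stroke_end: float
    prep_duration: float     # predicted time needed to reach the stroke start pose
    retract_duration: float  # predicted time needed to go back to the rest pose


@dataclass
class Phase:
    gesture: str
    kind: str                # "preparation", "stroke", "hold" or "retraction"
    start: float
    end: float


def schedule(gestures: List[Gesture]) -> List[Phase]:
    phases: List[Phase] = []
    prev: Optional[Gesture] = None  # last gesture that was actually kept
    for g in sorted(gestures, key=lambda x: x.stroke_start):
        prep_start = g.stroke_start - g.prep_duration
        if prev is not None:
            if prev.stroke_end > prep_start:
                continue  # not enough time to prepare: drop this gesture
            rest_reached = prev.stroke_end + prev.retract_duration
            if rest_reached <= prep_start:
                # Enough time: the previous gesture goes back to the rest position.
                phases.append(Phase(prev.name, "retraction", prev.stroke_end, rest_reached))
            elif prev.stroke_end < prep_start:
                # Co-articulation: no retraction, hold until the next preparation.
                phases.append(Phase(prev.name, "hold", prev.stroke_end, prep_start))
        phases.append(Phase(g.name, "preparation", prep_start, g.stroke_start))
        phases.append(Phase(g.name, "stroke", g.stroke_start, g.stroke_end))
        prev = g
    if prev is not None:
        # The last kept gesture always retracts.
        phases.append(Phase(prev.name, "retraction", prev.stroke_end,
                            prev.stroke_end + prev.retract_duration))
    return phases
```

Keeping the stroke times fixed and deriving every other phase from them reflects the slide's approach of adapting the gesture to the speech rather than the reverse.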
page 7
BR: Velocity profiles

          Gesture velocity
          • Predict the movement duration using Fitts’ law:
             • Movement Time = a + b * log2(Distance + 1)
          • Maximal speed thresholds (empirically determined)
          • The stroke phase differs from the other phases in velocity and
            acceleration (Quek, 1995)
          Add expressivity
              • Temporal extent (TMP): modulates the duration of the whole gesture
                => changes the coefficients of Fitts’ law
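
A minimal sketch of the duration prediction follows. The formula is the one on the slide; everything about TMP is an assumption made for illustration: the slide only says that TMP changes the coefficients, so the linear scaling, the `gain` parameter and the value range [-1, 1] are hypothetical choices, as are the function names.

```python
import math


def movement_time(distance: float, a: float, b: float) -> float:
    """Fitts' law as written on the slide: MT = a + b * log2(distance + 1)."""
    return a + b * math.log2(distance + 1.0)


def movement_time_with_tmp(distance: float, a: float, b: float,
                           tmp: float, gain: float = 0.5) -> float:
    """Scale both coefficients by (1 - gain * tmp).

    Assumption: tmp lies in [-1, 1], positive values make the gesture faster
    (shorter duration) and negative values make it slower.
    """
    if not -1.0 <= tmp <= 1.0:
        raise ValueError("tmp is assumed to lie in [-1, 1]")
    scale = 1.0 - gain * tmp
    return movement_time(distance, a * scale, b * scale)
```

With `tmp = 0` the prediction is unchanged, so a neutral gesture falls back to the agent's empirically determined velocity profile.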




page 8
BR: Build coefficients of Fitts’ law
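
The fitting procedure is not spelled out in the text of this slide. One plausible approach, given here only as an assumption, is an ordinary least-squares fit of measured movement times against log2(Distance + 1). The function name `fit_fitts` and the sample format are hypothetical.

```python
import math
from typing import List, Tuple


def fit_fitts(samples: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Least-squares estimate of (a, b) in MT = a + b * log2(distance + 1).

    samples: (distance, measured movement time) pairs recorded on the agent.
    """
    if len(samples) < 2:
        raise ValueError("at least two samples are needed")
    xs = [math.log2(d + 1.0) for d, _ in samples]
    ys = [t for _, t in samples]
    n = len(samples)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0.0:
        raise ValueError("samples must cover at least two different distances")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx             # slope
    a = mean_y - b * mean_x   # intercept
    return a, b
```

Running the fit separately on recordings of Nao and on recordings of human movement would give one (a, b) pair per agent, matching the agent-specific velocity profiles described earlier.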




page 9
Animation Realizer
      [Figure: the architecture diagram of the Architecture Overview slide, repeated
       to locate the two Animation Realizers (specific modules): keyframes ->
       FAP-BAP values for Greta, keyframes -> joint values for Nao, each with its
       own Animation Lexicon]
page 10
Implemented expressivity parameters
EXP   Definition                        Nao                                    Greta
TMP   Velocity of movement              Change coefficient of Fitts’ law       Change coefficient of Fitts’ law
SPC   Amplitude of movement             Limited to predefined key positions    Change gesture space scales
PWR   Acceleration of movement          Modulate stroke duration               Modulate stroke acceleration
REP   Number of stroke repetitions      Yes                                    Yes
FLD   Smoothness and continuity         No                                     No
OPN   Relative spatial extent to body   No                                     Elbow swivel angle
TEN   Muscular tension                  No                                     No

   Create animation parameters
         Joint values for Nao
         BAP values for Greta
    page 11
Create animation parameters
         Discretization of the gestural space of McNeill (1992)
         Each symbolic position is translated into concrete values of the agent's joints (for
          instance, 6 joints of Nao, as in the table below)
            Code   ArmX   ArmY       ArmZ      Joint values (LShoulderPitch, LShoulderRoll, LElbowYaw, LElbowRoll, LWristYaw, Hand)

            000    XEP    YUpperEP   ZNear     (-54.4953, 22.4979, -79.0171, -5.53477, -0.00240423, 1.0)
            001    XEP    YUpperEP   ZMiddle   (-65.5696, 22.0584, -78.7534, -8.52309, -0.178188, 1.0)
            002    XEP    YUpperEP   ZFar      (-79.2807, 22.0584, -78.6655, -8.4352, -0.178188, 1.0)
            010    XEP    YUpperP    ZNear     (-21.0964, 24.2557, -79.4565, -26.8046, 0.261271, 1.0)
            ...    ...    ...        ...       ...



         Translate symbolic keyframes into joint values
         Animation is obtained by interpolating between joint values
             Nao: with the robot's built-in proprietary procedures
             Greta: with Slerp (spherical linear interpolation) and time warping
              (ease-in/ease-out functions)
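
The translation step can be sketched as a table lookup followed by interpolation. The four table rows are copied from the slide. The rest is an assumption for illustration: `NAO_ARM_TABLE`, `joint_values`, `ease_in_out` and `interpolate` are hypothetical names, and the per-joint linear blend with a smoothstep easing curve is a stand-in; it is neither Nao's proprietary interpolation nor the Slerp used for Greta.

```python
from typing import Dict, Tuple

Joints = Tuple[float, float, float, float, float, float]

# (ArmX, ArmY, ArmZ) -> (LShoulderPitch, LShoulderRoll, LElbowYaw,
#                        LElbowRoll, LWristYaw, Hand), rows taken from the slide.
NAO_ARM_TABLE: Dict[Tuple[str, str, str], Joints] = {
    ("XEP", "YUpperEP", "ZNear"): (-54.4953, 22.4979, -79.0171, -5.53477, -0.00240423, 1.0),
    ("XEP", "YUpperEP", "ZMiddle"): (-65.5696, 22.0584, -78.7534, -8.52309, -0.178188, 1.0),
    ("XEP", "YUpperEP", "ZFar"): (-79.2807, 22.0584, -78.6655, -8.4352, -0.178188, 1.0),
    ("XEP", "YUpperP", "ZNear"): (-21.0964, 24.2557, -79.4565, -26.8046, 0.261271, 1.0),
}


def joint_values(arm_x: str, arm_y: str, arm_z: str) -> Joints:
    """Translate one symbolic position into concrete joint values."""
    return NAO_ARM_TABLE[(arm_x, arm_y, arm_z)]


def ease_in_out(u: float) -> float:
    """Smoothstep time warping: slow start, fast middle, slow end."""
    u = min(1.0, max(0.0, u))
    return u * u * (3.0 - 2.0 * u)


def interpolate(start: Joints, end: Joints, u: float) -> Joints:
    """Blend two joint configurations at normalized time u in [0, 1]."""
    w = ease_in_out(u)
    return tuple(s + (e - s) * w for s, e in zip(start, end))
```

Because the symbolic keyframes are shared, supporting another agent would only require a different lookup table and interpolation routine, which is where the slide places the agent-specific work.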



page 12
Greta: Full Body IK
     • Torso IK
     • Analytic method: arm to torso
     • Torso target depends on the hand position

page 13
Demo: Greta




page 14
Demo: Nao




page 15
Perceptive Evaluation
         Objective
          • Evaluate how the robot’s gestures are perceived by human users
         Procedure
          • Participants (63 French speakers) rate videos of Nao as a
            storyteller
          • Versions displayed at random to the participants:
          - Gestures with expressivity vs. gestures without expressivity
          - Gestures synchronized with speech vs. gestures not synchronized with speech
         Results (using the ANOVA method)
          • Synchronization:
          - F(1, 124) = 4.94, p < .05
          - 76% agreed that gestures were synchronized with speech in the synchronized version
          • Expressivity:
          - F(1, 124) = 4.43, p < .05
          - 70% agreed that gestures were expressive in the expressive version
page 16
State of the art
         Most similar work: Salem et al. (2012)
          • Same idea (based on the existing Max virtual agent system)
         Main differences:
          • Our system: GRETA re-designed as a common framework
          • Salem et al.’s system: Max’s ACE adjusted to the ASIMO robot

 Features             Our model                                            Salem et al.’s system
 Gesture production   Online from templates, regardless of domain          Automatically generated from a trained, domain-specific data corpus
 Gesture shapes       Agent-specific parameter                             Original for Max and mapped to ASIMO configurations
 Gesture timing       Agent-specific parameter                             Original for Max and adapted to ASIMO by feedback
 Expressivity         Yes                                                  No
 Synchronization      Adapt gesture to speech                              Cross-modal adjustment



page 17
Future work

       Short-term plan
        • Human-like gestures: enhance the velocity profiles
        • Expressivity: implement fluidity and tension
       Long-term plan

        • Feedback mechanism
        • Study of the coherence between consecutive
          gestures in a G-Unit (Kendon, 2004)




page 18

More Related Content

Viewers also liked

فيتامين واو
فيتامين واوفيتامين واو
فيتامين واوkininaful
 
EventRegist(イベントレジスト)概要
EventRegist(イベントレジスト)概要EventRegist(イベントレジスト)概要
EventRegist(イベントレジスト)概要EventRegist Co., Ltd.
 
ACM ICMI Workshop 2012
ACM ICMI Workshop 2012ACM ICMI Workshop 2012
ACM ICMI Workshop 2012Lê Anh
 
Người Ảo
Người ẢoNgười Ảo
Người ẢoLê Anh
 
Cahier de charges
Cahier de chargesCahier de charges
Cahier de chargesLê Anh
 
Automatic vs. human question answering over multimedia meeting recordings
Automatic vs. human question answering over multimedia meeting recordingsAutomatic vs. human question answering over multimedia meeting recordings
Automatic vs. human question answering over multimedia meeting recordingsLê Anh
 
Lap trinh java hieu qua
Lap trinh java hieu quaLap trinh java hieu qua
Lap trinh java hieu quaLê Anh
 

Viewers also liked (8)

فيتامين واو
فيتامين واوفيتامين واو
فيتامين واو
 
EventRegist(イベントレジスト)概要
EventRegist(イベントレジスト)概要EventRegist(イベントレジスト)概要
EventRegist(イベントレジスト)概要
 
ACM ICMI Workshop 2012
ACM ICMI Workshop 2012ACM ICMI Workshop 2012
ACM ICMI Workshop 2012
 
Diftong
DiftongDiftong
Diftong
 
Người Ảo
Người ẢoNgười Ảo
Người Ảo
 
Cahier de charges
Cahier de chargesCahier de charges
Cahier de charges
 
Automatic vs. human question answering over multimedia meeting recordings
Automatic vs. human question answering over multimedia meeting recordingsAutomatic vs. human question answering over multimedia meeting recordings
Automatic vs. human question answering over multimedia meeting recordings
 
Lap trinh java hieu qua
Lap trinh java hieu quaLap trinh java hieu qua
Lap trinh java hieu qua
 

Similar to ICMI 2012 Workshop on gesture and speech production

SiriusCon 2015 - Breathe Life into Your Designer!
SiriusCon 2015 - Breathe Life into Your Designer!SiriusCon 2015 - Breathe Life into Your Designer!
SiriusCon 2015 - Breathe Life into Your Designer!melbats
 
RSJ2011 OSS Robotics and Tools OpenHRI Intro
RSJ2011 OSS Robotics and Tools OpenHRI IntroRSJ2011 OSS Robotics and Tools OpenHRI Intro
RSJ2011 OSS Robotics and Tools OpenHRI IntroYosuke Matsusaka
 
Affective Computing and Intelligent Interaction (ACII 2011)
Affective Computing and Intelligent Interaction (ACII 2011)Affective Computing and Intelligent Interaction (ACII 2011)
Affective Computing and Intelligent Interaction (ACII 2011)Lê Anh
 
Casing3d opengl
Casing3d openglCasing3d opengl
Casing3d openglgowell
 
Florian adler minute project
Florian adler   minute projectFlorian adler   minute project
Florian adler minute projectDmitry Buzdin
 

Similar to ICMI 2012 Workshop on gesture and speech production (7)

SiriusCon 2015 - Breathe Life into Your Designer!
SiriusCon 2015 - Breathe Life into Your Designer!SiriusCon 2015 - Breathe Life into Your Designer!
SiriusCon 2015 - Breathe Life into Your Designer!
 
RSJ2011 OSS Robotics and Tools OpenHRI Intro
RSJ2011 OSS Robotics and Tools OpenHRI IntroRSJ2011 OSS Robotics and Tools OpenHRI Intro
RSJ2011 OSS Robotics and Tools OpenHRI Intro
 
Affective Computing and Intelligent Interaction (ACII 2011)
Affective Computing and Intelligent Interaction (ACII 2011)Affective Computing and Intelligent Interaction (ACII 2011)
Affective Computing and Intelligent Interaction (ACII 2011)
 
Casing3d opengl
Casing3d openglCasing3d opengl
Casing3d opengl
 
Cascon2011_5_rules+owl
Cascon2011_5_rules+owlCascon2011_5_rules+owl
Cascon2011_5_rules+owl
 
Florian adler minute project
Florian adler   minute projectFlorian adler   minute project
Florian adler minute project
 
2
22
2
 

More from Lê Anh

Spark docker
Spark dockerSpark docker
Spark dockerLê Anh
 
Presentation des outils traitements distribues
Presentation des outils traitements distribuesPresentation des outils traitements distribues
Presentation des outils traitements distribuesLê Anh
 
Final report. nguyen ngoc anh.01.07.2013
Final report. nguyen ngoc anh.01.07.2013Final report. nguyen ngoc anh.01.07.2013
Final report. nguyen ngoc anh.01.07.2013Lê Anh
 
Lequocanh
LequocanhLequocanh
LequocanhLê Anh
 
These lequocanh v7
These lequocanh v7These lequocanh v7
These lequocanh v7Lê Anh
 
Applying Computer Vision to Traffic Monitoring System in Vietnam
Applying Computer Vision to Traffic Monitoring System in Vietnam Applying Computer Vision to Traffic Monitoring System in Vietnam
Applying Computer Vision to Traffic Monitoring System in Vietnam Lê Anh
 
Poster WACAI 2012
Poster WACAI 2012Poster WACAI 2012
Poster WACAI 2012Lê Anh
 
Lecture Notes in Computer Science (LNCS)
Lecture Notes in Computer Science (LNCS)Lecture Notes in Computer Science (LNCS)
Lecture Notes in Computer Science (LNCS)Lê Anh
 
IEEE Humanoids 2011
IEEE Humanoids 2011IEEE Humanoids 2011
IEEE Humanoids 2011Lê Anh
 
ACII 2011, USA
ACII 2011, USAACII 2011, USA
ACII 2011, USALê Anh
 
Mid-term thesis report
Mid-term thesis reportMid-term thesis report
Mid-term thesis reportLê Anh
 
Journée Inter-GDR ISIS et Robotique: Interaction Homme-Robot
Journée Inter-GDR ISIS et Robotique: Interaction Homme-RobotJournée Inter-GDR ISIS et Robotique: Interaction Homme-Robot
Journée Inter-GDR ISIS et Robotique: Interaction Homme-RobotLê Anh
 

More from Lê Anh (12)

Spark docker
Spark dockerSpark docker
Spark docker
 
Presentation des outils traitements distribues
Presentation des outils traitements distribuesPresentation des outils traitements distribues
Presentation des outils traitements distribues
 
Final report. nguyen ngoc anh.01.07.2013
Final report. nguyen ngoc anh.01.07.2013Final report. nguyen ngoc anh.01.07.2013
Final report. nguyen ngoc anh.01.07.2013
 
Lequocanh
LequocanhLequocanh
Lequocanh
 
These lequocanh v7
These lequocanh v7These lequocanh v7
These lequocanh v7
 
Applying Computer Vision to Traffic Monitoring System in Vietnam
Applying Computer Vision to Traffic Monitoring System in Vietnam Applying Computer Vision to Traffic Monitoring System in Vietnam
Applying Computer Vision to Traffic Monitoring System in Vietnam
 
Poster WACAI 2012
Poster WACAI 2012Poster WACAI 2012
Poster WACAI 2012
 
Lecture Notes in Computer Science (LNCS)
Lecture Notes in Computer Science (LNCS)Lecture Notes in Computer Science (LNCS)
Lecture Notes in Computer Science (LNCS)
 
IEEE Humanoids 2011
IEEE Humanoids 2011IEEE Humanoids 2011
IEEE Humanoids 2011
 
ACII 2011, USA
ACII 2011, USAACII 2011, USA
ACII 2011, USA
 
Mid-term thesis report
Mid-term thesis reportMid-term thesis report
Mid-term thesis report
 
Journée Inter-GDR ISIS et Robotique: Interaction Homme-Robot
Journée Inter-GDR ISIS et Robotique: Interaction Homme-RobotJournée Inter-GDR ISIS et Robotique: Interaction Homme-Robot
Journée Inter-GDR ISIS et Robotique: Interaction Homme-Robot
 

Recently uploaded

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusZilliz
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdfSandro Moreira
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoffsammart93
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxRemote DBA Services
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfOrbitshub
 
WSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2's API Vision: Unifying Control, Empowering Developers
WSO2's API Vision: Unifying Control, Empowering DevelopersWSO2
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingRepurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost Saving
Repurposing LNG terminals for Hydrogen Ammonia: Feasibility and Cost SavingEdi Saputra
 
Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Elevate Developer Efficiency & build GenAI Application with Amazon Q​
Elevate Developer Efficiency & build GenAI Application with Amazon Q​Bhuvaneswari Subramani
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamUiPathCommunity
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 

Recently uploaded (20)

ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Exploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with MilvusExploring Multimodal Embeddings with Milvus
Exploring Multimodal Embeddings with Milvus
 
[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf[BuildWithAI] Introduction to Gemini.pdf
[BuildWithAI] Introduction to Gemini.pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf

ICMI 2012 Workshop on gesture and speech production

  • 1. A Common Gesture and Speech Production Framework for Virtual and Physical Agents Quoc Anh Le - Jing Huang - Catherine Pelachaud CNRS, LTCI Telecom-ParisTech, France Workshop on Speech and Gesture Production, ICMI 2012, Santa Monica, CA, USA
  • 2. Introduction
      Motivations
        • Similar approaches between virtual agents and humanoid robots
        • Limits of existing systems: agent dependent
      Objectives
        • Common co-verbal gesture generation framework for both virtual and physical agents
      Methodologies
        • Based on the GRETA system
        • Use
           - the same representation languages
           - the same algorithm for selecting and planning gestures
           - different algorithms for creating the animation
  • 3. Architecture Overview
      [Diagram: input data (text, audio, video, etc.) → Intent Planner (common module) → FML-APML → Behavior Planner (common module) → BML → Behavior Realizer (common module) → keyframes → Animation Realizers (specific modules). The Greta Animation Realizer sends FAP-BAP values to the FAP-BAP player; the Nao Animation Realizer sends joint values to Nao's built-in proprietary procedures. Resources shown: Intent Lexicon; Behavior Lexicon; baselines for Nao and for Greta; gestuary for Nao and for Greta; Animation Lexicon for Greta and for Nao. Messages pass through an ActiveMQ central messaging system.]
  • 4. Behavior Realizer
      [Diagram: the architecture overview of slide 3, shown again for the Behavior Realizer (common module), which receives BML and outputs keyframes.]
  • 5. Behavior Realizer: Outline
      Processes common to all agents
        1. Create the gesture from the agent's gestuary
        2. Schedule the timing of the gesture phases
        3. Generate keyframes: pairs (absolute time, symbolic description of the hand configuration at that time)
      Agent-specific databases
        For Nao
          • Gestuary (for instance, pointing with a fully stretched arm)
          • Velocity profile (empirically determined from Nao)
        For Greta
          • Gestuary (for instance, pointing with one finger)
          • Velocity profile (empirically determined from real humans)
  • 6. Example: Different pointing gestures
      BML
        <bml id="bml1">
          <speech xmlns="" id="s1" start="0">
            <text>It is <sync id="tm1"/> over there! <sync id="tm2"/></text>
          </speech>
          <gesture id="g1" lexeme="pointing" start="s1:tm1" end="s1:tm2">
            <description priority="1" type="GRETA">
              <GRETA:SPC>0.80</GRETA:SPC>
              <GRETA:TMP>0.50</GRETA:TMP>
              <GRETA:FLD>-0.62</GRETA:FLD>
              <GRETA:PWR>0.30</GRETA:PWR>
              <GRETA:REP>0.00</GRETA:REP>
              <GRETA:OPE>1.00</GRETA:OPE>
              <GRETA:TEN>0.20</GRETA:TEN>
            </description>
          </gesture>
          …
        </bml>
      1: Nao Gestuary
        ..
        <gesture id="pointing">
          <phase type="stroke">
            <vertical>YUpperP</vertical>
            <horizontal>XEP</horizontal>
            <distance>XFar</distance>
            <hShape>OPEN</hShape>
          </phase>
        </gesture>
        …
      1: Greta Gestuary
        ..
        <gesture id="pointing">
          <phase type="stroke">
            <vertical>YP</vertical>
            <horizontal>XP</horizontal>
            <distance>XMiddle</distance>
            <hShape>INDEX</hShape>
          </phase>
        </gesture>
        …
      2, 3: for each agent, a sequence of keyframes
        <keyframe 1 (time, description)>
        <keyframe 2 (time, description)>
        …
        <keyframe N (time, description)>
      4: animation parameters
        Nao: JOINT VALUES
        Greta: BAP
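The example on this slide can be read as a per-agent lookup: the same BML lexeme, `pointing`, resolves to a different stroke description depending on which gestuary is loaded. The following is a minimal Python sketch of that idea, not GRETA code. The names `StrokeSpec`, `GESTUARIES` and `select_gesture` are hypothetical, and the assignment of symbolic values to Nao and Greta follows our reading of the slide (open hand at far distance for Nao, index finger at middle distance for Greta).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrokeSpec:
    """Symbolic description of a stroke phase, as listed in a gestuary entry."""
    vertical: str
    horizontal: str
    distance: str
    hand_shape: str


# Hypothetical in-memory gestuaries, keyed by agent and then by lexeme.
# The symbolic values are the ones read from the slide.
GESTUARIES = {
    "nao": {"pointing": StrokeSpec("YUpperP", "XEP", "XFar", "OPEN")},
    "greta": {"pointing": StrokeSpec("YP", "XP", "XMiddle", "INDEX")},
}


def select_gesture(agent: str, lexeme: str) -> StrokeSpec:
    """Return the agent-specific stroke description for a BML lexeme."""
    try:
        return GESTUARIES[agent][lexeme]
    except KeyError as exc:
        raise KeyError(f"no gesture {lexeme!r} in the gestuary of {agent!r}") from exc
```

Calling `select_gesture("nao", "pointing")` and `select_gesture("greta", "pointing")` returns two different symbolic strokes for the same BML request, which is the point the slide makes.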
  • 7. BR: Synchronization with speech
      Algorithm
        • Compute the preparation phase
        • Do not perform the gesture if there is not enough time (strokeEnd(i-1) > strokeStart(i) + duration)
        • Add a hold phase to fit the gesture's planned duration
        • Co-articulation between several gestures
           - If there is enough time, add a retraction phase (i.e. go back to the rest position)
           - Otherwise, go from the end of the stroke to the preparation phase of the next gesture
      [Timeline diagram: gesture start and end with stroke start (S-start) and stroke end (S-end), for the two co-articulation cases]
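The scheduling rules on this slide can be sketched as a single pass over the gestures, ordered by stroke start. This is an illustrative Python sketch under stated assumptions, not the GRETA implementation: the `Gesture` fields and the `schedule` function are hypothetical, time starts at 0, and "not enough time" is interpreted as the preparation having to begin before the previous stroke has ended.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gesture:
    name: str
    stroke_start: float      # seconds, taken from the speech sync point
    stroke_end: float
    planned_end: float       # end time requested for the gesture
    prep_duration: float     # e.g. predicted from a velocity profile
    retract_duration: float


Phase = Tuple[str, str, float, float]  # (gesture name, phase name, start, end)


def schedule(gestures: List[Gesture]) -> List[Phase]:
    """Lay out preparation, stroke, hold and retraction phases."""
    kept: List[Gesture] = []
    free_from = 0.0  # end of the last kept stroke
    for g in sorted(gestures, key=lambda g: g.stroke_start):
        # Assumption: skip the gesture if its preparation cannot fit
        # after the previous stroke.
        if g.stroke_start - g.prep_duration < free_from:
            continue
        kept.append(g)
        free_from = g.stroke_end

    phases: List[Phase] = []
    for i, g in enumerate(kept):
        nxt = kept[i + 1] if i + 1 < len(kept) else None
        next_prep = nxt.stroke_start - nxt.prep_duration if nxt else None
        phases.append((g.name, "preparation", g.stroke_start - g.prep_duration, g.stroke_start))
        phases.append((g.name, "stroke", g.stroke_start, g.stroke_end))
        # Hold after the stroke to reach the planned end, without
        # overlapping the next gesture's preparation.
        hold_end = g.planned_end if next_prep is None else min(g.planned_end, next_prep)
        hold_end = max(hold_end, g.stroke_end)
        if hold_end > g.stroke_end:
            phases.append((g.name, "hold", g.stroke_end, hold_end))
        # Retract only if there is room; otherwise co-articulate, i.e. the
        # next preparation starts directly from the current pose.
        if next_prep is None or hold_end + g.retract_duration <= next_prep:
            phases.append((g.name, "retraction", hold_end, hold_end + g.retract_duration))
    return phases
```

With three closely spaced gestures, the sketch co-articulates the first into the second (no retraction in between) and drops the third when its preparation would overlap the second stroke.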
  • 8. BR: Velocity profiles
      Gesture velocity
        • Predict the movement duration using Fitts' law: Movement Time = a + b * log2(Distance + 1)
        • Threshold of maximal speeds (empirically determined)
        • The stroke phase differs from the other phases in velocity and acceleration (Quek, 1995)
      Add expressivity
        • Temporal extent (TMP): modulates the duration of the whole gesture => changes the coefficient of Fitts' law
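The duration prediction above can be sketched in a few lines of Python. The formula is the one on the slide; everything else is an assumption made for illustration: the function name `movement_time`, the TMP range of [-1, 1], the way TMP scales the coefficient `b`, and the way the maximal-speed threshold is applied as a lower bound on the duration.

```python
import math
from typing import Optional


def movement_time(distance: float, a: float, b: float,
                  tmp: float = 0.0, max_speed: Optional[float] = None) -> float:
    """Predict a movement duration with Fitts' law: a + b * log2(distance + 1)."""
    if distance < 0:
        raise ValueError("distance must be non-negative")
    # Hypothetical TMP mapping: tmp in [-1, 1]; a higher value gives a
    # smaller coefficient and therefore a shorter, faster movement.
    b_eff = b * (1.0 - 0.5 * tmp)
    duration = a + b_eff * math.log2(distance + 1.0)
    # Assumed use of the speed threshold: never move faster than max_speed.
    if max_speed is not None and max_speed > 0:
        duration = max(duration, distance / max_speed)
    return duration
```

For example, with `a = 0.1` and `b = 0.2`, a movement over a distance of 3 units gets a predicted duration of 0.1 + 0.2 * log2(4) = 0.5, and raising `tmp` shortens it.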
  • 9. BR: Build coefficients of Fitts' law
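The slide content itself is not recoverable from the transcript. One common way to obtain the coefficients a and b from measured movements is an ordinary least-squares fit of the duration against log2(Distance + 1). The Python sketch below shows that technique as an assumption, not as the authors' procedure; `fit_fitts` is a hypothetical name and the inputs stand for measurements taken on the agent or on humans.

```python
import math
from typing import Sequence, Tuple


def fit_fitts(distances: Sequence[float], times: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of time = a + b * log2(distance + 1); returns (a, b)."""
    if len(distances) != len(times) or len(distances) < 2:
        raise ValueError("need at least two (distance, time) pairs")
    xs = [math.log2(d + 1.0) for d in distances]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("distances must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, times))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b
```

Fitting separate coefficient pairs per agent (and, if needed, per gesture phase) would give each agent its own velocity profile while the scheduling code stays shared.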
  • 10. Animation Realizer
      [Diagram: the architecture overview of slide 3, shown again for the Animation Realizers (specific modules), which receive keyframes and output FAP-BAP values for Greta and joint values for Nao, each with its own Animation Lexicon.]
  • 11. Implemented expressivity parameters
      EXP | Definition | Nao | Greta
      TMP | Velocity of movement | Change coefficient of Fitts' law | Change coefficient of Fitts' law
      SPC | Amplitude of movement | Limited in predefined key positions | Change gesture space scales
      PWR | Acceleration of movement | Modulate stroke duration | Modulate stroke acceleration
      REP | Number of stroke repetition times | Yes | Yes
      FLD | Smoothness and continuity | No | No
      OPN | Relative spatial extent to body | No | Elbow swivel angle
      TEN | Muscular tension | No | No
      Create animation parameters
        • Joint values for Nao
        • BAP values for Greta
  • 12. Create animation parameters
      • Discretization of the gestural space of McNeill (1992)
      • One symbolic position is translated into concrete values of the agent's joints (for instance, 6 joints of Nao, as in the table below)
      Code | ArmX | ArmY | ArmZ | Joint values (LShoulderPitch, LShoulderRoll, LElbowYaw, LElbowRoll, LWristYaw, Hand)
      000 | XEP | YUpperEP | ZNear | (-54.4953, 22.4979, -79.0171, -5.53477, -0.00240423, 1.0)
      001 | XEP | YUpperEP | ZMiddle | (-65.5696, 22.0584, -78.7534, -8.52309, -0.178188, 1.0)
      002 | XEP | YUpperEP | ZFar | (-79.2807, 22.0584, -78.6655, -8.4352, -0.178188, 1.0)
      010 | XEP | YUpperP | ZNear | (-21.0964, 24.2557, -79.4565, -26.8046, 0.261271, 1.0)
      ... | ... | ... | ... | ...
      • Translate symbolic keyframes into joint values
      • Animation is obtained by interpolating between joint values
         - for Nao: with the robot's built-in proprietary procedures
         - for Greta: using Slerp (spherical linear interpolation) with time warping (easing in-out functions)
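The translation from a symbolic position to joint values amounts to a table lookup. A minimal Python sketch follows, using the four rows shown on the slide. The names `POSITION_TABLE` and `to_joint_values` are hypothetical, the rest of the table is not reproduced here, and the units of the values are not stated on the slide.

```python
from typing import Dict, Tuple

JOINT_NAMES = ("LShoulderPitch", "LShoulderRoll", "LElbowYaw",
               "LElbowRoll", "LWristYaw", "Hand")

# (ArmX, ArmY, ArmZ) -> joint values, copied from the rows on the slide.
POSITION_TABLE: Dict[Tuple[str, str, str], Tuple[float, ...]] = {
    ("XEP", "YUpperEP", "ZNear"): (-54.4953, 22.4979, -79.0171, -5.53477, -0.00240423, 1.0),
    ("XEP", "YUpperEP", "ZMiddle"): (-65.5696, 22.0584, -78.7534, -8.52309, -0.178188, 1.0),
    ("XEP", "YUpperEP", "ZFar"): (-79.2807, 22.0584, -78.6655, -8.4352, -0.178188, 1.0),
    ("XEP", "YUpperP", "ZNear"): (-21.0964, 24.2557, -79.4565, -26.8046, 0.261271, 1.0),
}


def to_joint_values(arm_x: str, arm_y: str, arm_z: str) -> Dict[str, float]:
    """Map a symbolic arm position to named joint values for the left arm."""
    key = (arm_x, arm_y, arm_z)
    if key not in POSITION_TABLE:
        raise KeyError(f"no joint values recorded for symbolic position {key}")
    return dict(zip(JOINT_NAMES, POSITION_TABLE[key]))
```

A symbolic keyframe such as (XEP, YUpperP, ZNear) then becomes a dictionary of six joint targets that can be handed to the robot's own interpolation procedures.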
  • 13. Greta: Full Body IK
      • Torso IK, analytic method: arm to torso
      • Torso target depending on hand position
  • 16. Perceptive Evaluation
      Objective
        • Evaluate how the robot's gestures are perceived by human users
      Procedure
        • Participants (63 French speakers) rate videos of the Nao storyteller
        • Versions displayed to the participants in random order:
           - Gestures with expressivity vs. gestures without expressivity
           - Gesture-speech synchronization vs. gesture-speech asynchronization
      Results (using the ANOVA method)
        • Synchronization:
           - F(1, 124) = 4.94, p < .05
           - 76% agreed that gestures were synchronized with speech for the synchronized version
        • Expressivity:
           - F(1, 124) = 4.43, p < .05
           - 70% agreed that gestures were expressive for the expressivity version
  • 17. State of the art
      Most similar work: Salem et al. (2012)
        • Same idea (based on the existing Max virtual agent system)
      Main differences:
        • Our system: re-designed GRETA as a common framework
        • Salem et al.'s system: adjusted Max's ACE to the ASIMO robot
      Features | Our model | Salem et al.'s system
      Gesture production | Online from templates regardless of specific domain | Automatically generated from trained specified domain data corpus
      Gesture shapes | Agent-specific parameter | Original for Max and mapped to ASIMO configurations
      Gesture timing | Agent-specific parameter | Original for Max and adapted to ASIMO by feedback
      Expressivity | Yes | No
      Synchronization | Adapt gesture to speech | Cross-modal adjustment
  • 18. Future work
      Short-term plan
        • Human-like gestures: enhance velocity profiles
        • Expressivity: implement fluidity and tension
      Long-term plan
        • Feedback mechanism
        • Study of the coherence between consecutive gestures in a G-Unit (Kendon, 2004)

Editor's Notes

  1. Schedule Mechanisme Such as Account Realize Obtain /ob chen/ Architecture /ar ki tec tro/ Exchange /ex s change z/ Twice / wi so/ Table /ta ble/ Creating /cre et ting/ Message /me se/ Virtual /vir tu al/
  2. Give a description of the keyframes: what information do they contain?
  3. Add the missing definitions. "Power": acceleration simulation through Slerp (frame interpolation) or trajectory interpolation, using time variation functions (easing in-out functions). Expressive posture: volume editing. Power parameter: torso relative rotation varies with time and gesture target positions due to inertia. Expressive animated sequence: sequential editing. "Fluidity" and "tension": using TCB splines and noise functions (for trajectory).
  4. Joint rotation interpolation: use Slerp (spherical linear interpolation) with time warping (easing in-out functions). Definition of trajectory parameters: various trajectory paths (line, circle, spiral, etc.). Expressivity: Kochanek-Bartels splines (TCB splines).
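Slerp with time warping can be sketched as follows: the normalized time is first passed through an easing function, and the warped value is then used as the Slerp parameter between two joint rotations. This is an illustrative Python sketch. The quaternion layout (w, x, y, z), the function names, and the choice of smoothstep as the easing in-out function are assumptions; the note does not say which easing function Greta uses.

```python
import math
from typing import Tuple

Quat = Tuple[float, float, float, float]  # unit quaternion (w, x, y, z)


def ease_in_out(t: float) -> float:
    """Smoothstep easing: slow start, fast middle, slow end."""
    return t * t * (3.0 - 2.0 * t)


def slerp(q0: Quat, q1: Quat, t: float) -> Quat:
    """Spherical linear interpolation between two unit quaternions."""
    dot = sum(a * b for a, b in zip(q0, q1))
    if dot < 0.0:  # take the shorter arc
        q1 = tuple(-c for c in q1)
        dot = -dot
    if dot > 0.9995:  # nearly identical rotations: lerp, then renormalize
        out = tuple(a + t * (b - a) for a, b in zip(q0, q1))
        norm = math.sqrt(sum(c * c for c in out))
        return tuple(c / norm for c in out)
    theta = math.acos(dot)
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s
    w1 = math.sin(t * theta) / s
    return tuple(w0 * a + w1 * b for a, b in zip(q0, q1))


def interpolate(q0: Quat, q1: Quat, t: float) -> Quat:
    """Slerp with time warping: clamp t to [0, 1], ease it, then interpolate."""
    t = min(1.0, max(0.0, t))
    return slerp(q0, q1, ease_in_out(t))
```

Because the easing function only reshapes time, the rotation still follows the same arc between the two keyframe poses; what changes is how quickly it moves along that arc near the start and the end.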
  5. For posture generation, we use forward kinematics (FK). FK defines the initial states; IK retargets the postures. Relative torso movement is first generated using a potential torso target that depends on the positions of both hand gestures (vt1, vl5). We decompose the torso movement into horizontal and vertical movements; it depends on the center of both hand targets, and we solve it directly with an analytical method. Head direction is generated by FK, with a trigonometric function for gaze. For arm gestures we use a mass-spring solver, which can apply lightweight shoulder movements by defining the arm chain from the sternoclavicular joint to the wrist. This allows us to model passive shoulder movement.
  6. The system of Salem et al. produces gesture parameters => potentially results in mistimed synchronization with the speech affiliate due to physical joint velocity limits. Max: gesture shapes are designed for a virtual agent => mapping solution.
  7. Long-term plan: Mutual synchronization: Adapting phoneme duration to gestures