3. Goals
• Dialogue Manager
– Back-end for HMI
– Control all other modules
• Applications: Games, Reading service, …
• Physiological Monitoring
04.06.2013 WP 3 Presentation 3
4. Tasks
T3.1 User identification via speech or face recognition
T3.2 Knowledge representation
T3.3 Development of a dialogue system
T3.4 Development and Integration of a game collection
T3.5 Web 2.0 wrapper for web services
T3.6 Integration of further software modules
T3.7 Adaptable behaviour of the robot platform
T3.8 Integration of natural language understanding
T3.9 Physiological monitoring
T3.10 Integration of the physiological monitoring into the dialogue manager
5. Deliverables
Name Due
D3.1 Report on the dialogue manager concept 09/2010
D3.2 Knowledge databases 04/2011
D3.3 Identification System (face & voice) 01/2011
D3.4 Prototype of dialogue manager 04/2011
D3.5 Physiological Monitoring (PM) 02/2011
D3.6 Dialogue system with integrated PM 06/2011
D3.7 Dialogue system updated to user's needs 05/2012
D3.8 Final dialogue system with integrated PM 02/2013
6. Achievements
• Dialogue Manager
– Control of all other modules
– Natural language understanding
• Software modules
– Physiological Monitoring
– User Identification
• Adaptable behaviour
– Emotions
• Physiological Monitoring
04.06.2013 WP 3 Presentation 6
7. Dialogue Manager: Overview (T3.3, D3.1, D3.4, D3.7, D3.8)
• Central component of the ALIAS robot (the "brain")
– Reproduces the basic mechanisms of human thinking
– Decides on the behavior of the robot
– Communicates with all other modules
[Diagram: the DM Core, holding the Situation Model, links input modules (ASR, CES Understanding, Face Detect, Touch Screen/GUI, Physio Monitor) to action modules (TTS, GUI, Robot Control); a user says "Hello Robot!"]
8. Dialogue Manager: Overview
• Components
– DM-Core (the "Brain")
• NLU-Engine understands human
verbal messages
• Decision-Engine decides on the
behavior
• Based on conceptual event
representations (human thinking)
– DM-Communicator
• Communicates with sensing and
acting modules
• Translates between modules and
DM-Core
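The DM-Core/DM-Communicator split above can be sketched as a small message router. All names here (DMCore, DMCommunicator, the greeting rule) are hypothetical illustrations, not the actual ALIAS code:

```python
# Illustrative sketch of the DM-Core / DM-Communicator split described above.
# All class, method, and module names are assumptions, not the ALIAS implementation.

class DMCore:
    """Decides on robot behavior from conceptual event representations."""
    def decide(self, event):
        # Hypothetical rule: greet back when the user greets the robot.
        if event == ("user_said", "hello"):
            return ("tts", "Hello! How can I help you?")
        return ("noop", None)

class DMCommunicator:
    """Translates between module-specific messages and DM-Core events."""
    def __init__(self, core):
        self.core = core
        self.outputs = {}          # module name -> callable accepting a payload

    def register(self, module, handler):
        self.outputs[module] = handler

    def on_module_message(self, module, payload):
        # Translate a raw module message into a conceptual event for the core.
        if module == "asr":
            event = ("user_said", payload.strip().lower())
        else:
            event = (module, payload)
        target, action = self.core.decide(event)
        if target in self.outputs:
            self.outputs[target](action)

spoken = []
comm = DMCommunicator(DMCore())
comm.register("tts", spoken.append)
comm.on_module_message("asr", "Hello")
print(spoken)  # -> ['Hello! How can I help you?']
```

The point of the split: sensing/acting modules never talk to the core directly, so a module can be swapped by changing only its adapter registration.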
9. Natural Language Understanding
• NLU-Engine (T3.8, D3.2)
– Based on Cognesys CES technology
– Extracts and processes the conceptual meaning of verbal messages
– Robust to syntactically or grammatically degraded input
– Uses knowledge and the current situation to identify statements and check their practicability
• NLU-Knowledge Database (T3.2, D3.2)
– World knowledge: understanding the world in general, simulating human memory
– Expert knowledge: understanding the world of elderly people, dependent on the robot's functionality
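The practicability check can be illustrated with a toy lookup against the two knowledge bases. The entries and the function below are invented placeholders, not the actual CES databases:

```python
# Hypothetical sketch: checking an identified statement against knowledge bases.
# The entries stand in for the world/expert knowledge described on this slide.

WORLD_KNOWLEDGE = {"coffee": {"drinkable": True}, "table": {"drinkable": False}}
EXPERT_KNOWLEDGE = {"robot_can": {"read_book", "start_game", "make_call"}}

def practicable(action, obj=None):
    """Return True if the robot could actually carry out the requested action."""
    if action not in EXPERT_KNOWLEDGE["robot_can"]:
        return False                  # outside the robot's functionality
    if obj is not None and obj not in WORLD_KNOWLEDGE:
        return False                  # object unknown to world knowledge
    return True

print(practicable("read_book"))   # -> True
print(practicable("fly"))         # -> False
```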
10. Acting and Behavior (T3.8, D3.2)
• Decision-Engine
– Based on Cognesys CES technology
– Processes conceptual event representations, as humans do
– Uses a situation model analogous to human memory
• Situation model
– Represents the currently relevant objects with their states and modalities
– Represents the history of events that constitutes the current situation
• Proactive behavior
– Example: inform the user about new mails, invite the user to stay in contact with their relatives
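The situation model above — relevant objects plus an event history feeding proactive behavior — can be sketched as follows. The data layout and the "new mail" trigger are illustrative assumptions only:

```python
# Sketch of a situation model: currently relevant objects plus an event history.
# Structure and the "new mail" trigger are assumptions, not the ALIAS internals.

class SituationModel:
    def __init__(self):
        self.objects = {}      # object name -> dict of states/modalities
        self.history = []      # ordered events constituting the current situation

    def update(self, event, obj=None, state=None):
        self.history.append(event)
        if obj is not None:
            self.objects.setdefault(obj, {}).update(state or {})

    def proactive_actions(self):
        # Example from the slide: inform the user about new mails.
        actions = []
        if self.objects.get("mailbox", {}).get("unread", 0) > 0:
            actions.append("tell_user_about_new_mail")
        return actions

model = SituationModel()
model.update("mail_arrived", obj="mailbox", state={"unread": 2})
print(model.proactive_actions())  # -> ['tell_user_about_new_mail']
```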
11. Dialogue Management (T3.6)
• ASR Adapter
– Receives spoken user input as text
– The NLU-Engine processes the text
• GUI Adapter
– Controls the GUI, processes user input
• Menus
• Games, TV, audio books, email …
• Skype call and alarm call control flow
– Synchronizes the GUI menus with BCI masks
• BCI Adapter
– Controls the Brain Computer Interface masks
– Processes user inputs
12. Dialogue Management
• TTS Adapter
– Sends text to be spoken to the
Text-To-Speech module
• RD Adapter
– Interface to the robot's low-level controllers
– Controls navigation and movement behavior
– Controls the robot's head emotion display
– Receives speaker identification information
13. User identification: speech (T3.1, D3.3)
• Research aspects
– Speaker diarization
– Overlap detection
– Speech activity detection
• Implementation for the robot
14. Research aspects
• Speaker diarization
– "Who speaks when?"
– Utilise the output of a speech transcription system to suppress
linguistic variation
• Overlap detection
– Overlapping speech degrades performance
– Detect & handle overlap
• Voice activity detection
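A classic baseline for the voice-activity-detection step is frame-energy thresholding, sketched below. The frame length and threshold factor are illustrative choices, not values from the ALIAS system:

```python
# Energy-based voice activity detection, sketched in plain Python.
# Frame size (160 samples) and threshold factor are illustrative assumptions.

def frame_energies(samples, frame_len=160):
    """Mean squared energy per non-overlapping frame."""
    return [sum(x * x for x in samples[i:i + frame_len]) / frame_len
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def detect_speech(samples, frame_len=160, factor=2.0):
    """Mark frames whose energy exceeds factor * median frame energy as speech."""
    energies = frame_energies(samples, frame_len)
    threshold = factor * sorted(energies)[len(energies) // 2]
    return [e > threshold for e in energies]

# Toy signal: two quiet frames, one loud frame, one quiet frame.
signal = [0.01] * 320 + [0.5] * 160 + [0.01] * 160
print(detect_speech(signal))  # -> [False, False, True, False]
```

Real systems add smoothing and noise-floor tracking on top of this; the median-based threshold here merely adapts to the overall signal level.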
15. Speaker Recognition: Implementation
• Integrated with DM
• Running permanently
• DM receives name of
speaker
• Used during TTS output
– To address the user by name
16. User Identification: Face (T3.1, D3.3)
• Omnidirectional camera
• Viola & Jones algorithm for face detection
• Fusion with laser-based leg pair detection
• Face identification using Eigenfaces
• Keep eye contact with user
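Eigenface identification ultimately reduces to nearest-neighbor matching in a PCA-projected coefficient space. The sketch below assumes the PCA projection has already been applied and shows only the matching step; the gallery vectors and threshold are made up for illustration:

```python
# Matching step of Eigenface-based identification: nearest neighbor in the
# (already PCA-projected) eigenface coefficient space.
# Gallery vectors and the distance threshold are hypothetical values.

import math

GALLERY = {                    # user name -> eigenface coefficients (made up)
    "alice": [0.9, 0.1, -0.3],
    "bob":   [-0.5, 0.7, 0.2],
}

def identify(coeffs, threshold=1.0):
    """Return the closest enrolled user, or None if no one is close enough."""
    best_name, best_dist = None, float("inf")
    for name, ref in GALLERY.items():
        dist = math.dist(coeffs, ref)      # Euclidean distance in eigenspace
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

print(identify([0.8, 0.0, -0.2]))   # -> 'alice'
print(identify([9.0, 9.0, 9.0]))    # -> None
```

The rejection threshold is what turns recognition into identification: an unknown face should map to "no user" rather than the nearest enrolled one.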
17. Gaming with Speech Control (T3.4, D3.8)
• Control game via ASR
• Noughts and crosses
• AI to control computer player
• Touchscreen control also
possible
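The computer player for noughts and crosses can be realized with a plain minimax search; the slide does not specify the actual AI used, so this is one standard approach, not the ALIAS implementation:

```python
# Minimax computer player for noughts and crosses (tic-tac-toe).
# One standard approach; the slide does not specify the ALIAS game's actual AI.

def winner(board):
    """Return 'X' or 'O' if someone has three in a row, else None."""
    lines = [(0, 1, 2), (3, 4, 5), (6, 7, 8),      # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),      # columns
             (0, 4, 8), (2, 4, 6)]                 # diagonals
    for a, b, c in lines:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None

def minimax(board, player):
    """Return (score, move) where score is from X's perspective (X maximizes)."""
    w = winner(board)
    if w:
        return (1 if w == "X" else -1), None
    moves = [i for i, cell in enumerate(board) if cell == " "]
    if not moves:
        return 0, None                              # draw
    best = None
    for m in moves:
        board[m] = player
        score, _ = minimax(board, "O" if player == "X" else "X")
        board[m] = " "                              # undo the trial move
        if best is None or (player == "X") == (score > best[0]):
            best = (score, m)
    return best

# X to move can win immediately by completing the top row at cell 2.
board = list("XX OO    ")
print(minimax(board, "X"))  # -> (1, 2)
```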
18. Reading Service (T3.5, D3.8)
• Customised GUI
• Based on open-source software
• Functionality:
– Read out e-books
– Recognition from camera
19. Display of Emotions (T3.7, D3.8)
• Can ALIAS display emotions?
• 5 basic emotions (Disgust, Fear, Joy, Sadness, Surprise)
• Integrated into Dialogue System
[Images: ALIAS head displaying Disgust, Neutral, and Sadness]
20. Physiological Monitoring (T3.9, T3.10, D3.5, D3.6)
• Vital function monitoring system
• Recording, saving, display of vital function data
– Manual data input
– Data input directly by sensors
• Alarm function for suspicious data values
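The alarm function for suspicious values amounts to range checks per vital sign. The limits below are illustrative placeholders, not clinical thresholds or values from the ALIAS system:

```python
# Sketch of the alarm function for suspicious vital-sign values.
# The limits are illustrative placeholders, not clinical thresholds.

LIMITS = {                      # vital sign -> (low, high) plausible range
    "heart_rate": (40, 140),    # beats per minute
    "spo2": (90, 100),          # oxygen saturation, percent
}

def check_vitals(readings):
    """Return the list of signs whose values fall outside their range."""
    alarms = []
    for sign, value in readings.items():
        low, high = LIMITS.get(sign, (float("-inf"), float("inf")))
        if not (low <= value <= high):
            alarms.append(sign)
    return alarms

print(check_vitals({"heart_rate": 72, "spo2": 97}))   # -> []
print(check_vitals({"heart_rate": 180, "spo2": 85}))  # -> ['heart_rate', 'spo2']
```

The same check applies to manual input and direct sensor input, since both arrive as (sign, value) readings.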
21. Open questions
• Personal data: storage and
usage
– Person ID, physiological monitoring
– Who gets access?
• Learning how to use the robot
– Self-explanatory system
– System adapts to the user
• Tablet PC?
22. Selected Publications
• J. Geiger, M. Hofmann, B. Schuller and G. Rigoll: "Gait-based Person Identification by Spectral, Cepstral and Energy-related Audio Features," ICASSP 2013
• J. Geiger, T. Leykauf, T. Rehrl, F. Wallhoff, G. Rigoll: "The Robot ALIAS as a Gaming Platform for Elderly Persons," AAL-Kongress 2013
• J. Geiger, I. Yenin, T. Rehrl, F. Wallhoff, G. Rigoll: "Display of Emotions with the Robotic Platform ALIAS," AAL-Kongress 2013
• T. Rehrl, J. Geiger, M. Golcar, S. Gentsch, J. Knobloch, G. Rigoll: "The Robot ALIAS as a Database for Health Monitoring for Elderly People," AAL-Kongress 2013
• T. Rehrl, R. Troncy, A. Bley, S. Ihsen, K. Scheibl, W. Schneider, S. Glende, S. Goetze, J. Kessler, C. Hintermueller, and F. Wallhoff: "The Ambient Adaptable Living Assistant is Meeting its Users," AAL-Forum 2012
• T. Rehrl, J. Blume, A. Bannat, G. Rigoll, and F. Wallhoff: "On-line Learning of Dynamic Gestures for Human-Robot Interaction," KI 2012
• J. Geiger, R. Vipperla, S. Bozonnet, N. Evans, B. Schuller, G. Rigoll: "Convolutive Non-Negative Sparse Coding and New Features for Speech Overlap Handling in Speaker Diarization," INTERSPEECH 2012
• R. Vipperla, J. Geiger, S. Bozonnet, D. Wang, N. Evans, B. Schuller, G. Rigoll: "Speech Overlap Detection and Attribution Using Convolutive Non-Negative Sparse Coding," ICASSP 2012
• J. Geiger, M. Lakhal, B. Schuller, and G. Rigoll: "Learning New Acoustic Events in an HMM-based System Using MAP Adaptation," INTERSPEECH 2011
• T. Rehrl, J. Blume, J. Geiger, A. Bannat, F. Wallhoff, S. Ihsen, Y. Jeanrenaud, M. Merten, B. Schönebeck, S. Glende, and C. Nedopil: "ALIAS: Der anpassungsfähige Ambient Living Assistent" ("ALIAS: The Adaptable Ambient Living Assistant"), AAL-Kongress 2011