This research proposes a multimodal fusion framework for high-level data fusion between two or more modalities. It takes as input low-level features extracted from different system devices, then analyses and identifies intrinsic meanings in these data. Extracted meanings are mutually compared to identify complementarities, ambiguities, and inconsistencies, so as to better understand the user's intention when interacting with the system. The whole fusion life cycle is described
and evaluated in an OCE environment scenario, where two co-workers interact by voice and movement, which may reveal their intentions. Fusion in this case focuses on combining modalities to capture context and enhance the user experience.
A Fusion Framework for Multimodal Interactive Applications
1. A Fusion Framework for Multimodal Interactive Applications
Presented by: Hildeberto Mendonça
Jean-Yves Lionel Lawson
Olga Vybornova
Benoit Macq
Jean Vanderdonckt
ICMI-MLMI 2009 – Cambridge MA, USA, November 2-6, 2009
Special Session: Fusion Engines for Multimodal Interfaces
November 3, 2009
2. Motivations
How to support multimodal fusion so as to maximize reuse and minimize complexity?
If there is complexity in multimodal fusion, it should be about the fusion itself.
What already exists should be reused with minimal adaptation.
A general life cycle can guarantee a standard treatment for each modality.
3. Research Goal
To define and develop a multipurpose framework for high-level data fusion in multimodal interactive applications.
4. Fusion Principles
Type: parallel + combined = synergistic
Each modality is endowed with meanings
Level: feature (i.e., pattern extraction) + decision (i.e., recognized task)
Input devices: multiple
Notation: defined by the developer
Ambiguity resolution: defined by the developer
Time representation (quantitative and qualitative): both, as illustrated in the sketch after this list
Application type: the domain is defined using ontologies
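To make these principles concrete, here is a minimal sketch of a modality event carrying quantitative time, from which a qualitative temporal relation can be derived. The names (ModalityEvent, qualitative_relation) and the data shape are illustrative assumptions, not the framework's actual API.

```python
# Illustrative sketch only: class and function names are assumptions,
# not the framework's actual API.
from dataclasses import dataclass, field

@dataclass
class ModalityEvent:
    """One meaningful unit extracted from a single modality."""
    modality: str                 # e.g. "speech" or "movement"
    start: float                  # quantitative time (seconds)
    end: float
    concept: str                  # domain concept from the ontology
    features: dict = field(default_factory=dict)

def qualitative_relation(a: ModalityEvent, b: ModalityEvent) -> str:
    """Derive a qualitative temporal relation from quantitative times."""
    if a.end < b.start:
        return "before"
    if b.end < a.start:
        return "after"
    return "overlaps"

# Two events that do not coincide exactly in time can still be related:
speech = ModalityEvent("speech", 1.0, 2.5, "FindBook")
move = ModalityEvent("movement", 2.0, 6.0, "ApproachShelves")
print(qualitative_relation(speech, move))  # overlaps
```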
5. Process
Recognition: identification of patterns in input signals.
Segmentation: delimitation of the identified areas.
Meaning extraction: deeper analysis to identify meanings and correlations between segments according to specific domains.
Annotation: formal description of segments through domain concepts.
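A minimal sketch of this four-stage life cycle, with toy stand-in implementations for each stage; the function names and data shapes below are assumptions for illustration, not the framework's real interfaces.

```python
# Toy stand-ins for the four stages of the fusion life cycle.
def recognition(signal):
    """Identify patterns in the raw input signal."""
    return {"patterns": ["voice-activity"], "signal": signal}

def segmentation(recognized):
    """Delimit the areas identified by recognition."""
    return [{"span": (0.0, 2.5), "pattern": p} for p in recognized["patterns"]]

def meaning_extraction(segments, domain):
    """Analyse segments for domain-specific meanings."""
    return [{**s, "meaning": f"{domain}:utterance"} for s in segments]

def annotation(meanings):
    """Describe each segment formally through domain concepts."""
    return [{**m, "concept": m["meaning"].split(":")[-1]} for m in meanings]

result = annotation(meaning_extraction(segmentation(recognition("mic-frame")),
                                       "library"))
print(result)
```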
6. Process
The flow is fixed, but it can start at any point, respecting the sequence.
Not tied to any particular method; the method is "plugged" in.
Focus on a good level of analysis, not on real-time processing.
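One way to read the "plugged method" idea, sketched under the assumption of a simple stage registry: the stage order is fixed, but execution may enter at any stage. The Pipeline class is hypothetical, not the framework's actual mechanism.

```python
# Hypothetical sketch of a pluggable pipeline with a fixed stage order.
STAGES = ["recognition", "segmentation", "meaning_extraction", "annotation"]

class Pipeline:
    def __init__(self):
        self.methods = {}           # stage name -> plugged callable

    def plug(self, stage, method):
        self.methods[stage] = method

    def run(self, data, start="recognition"):
        # Respect the fixed sequence, starting at the requested stage;
        # unplugged stages simply pass the data through.
        for stage in STAGES[STAGES.index(start):]:
            data = self.methods.get(stage, lambda d: d)(data)
        return data

pipe = Pipeline()
pipe.plug("segmentation", lambda d: d + ["segmented"])
pipe.plug("meaning_extraction", lambda d: d + ["meanings"])
pipe.plug("annotation", lambda d: d + ["annotated"])

# Already-recognized input can enter the flow directly at segmentation:
print(pipe.run(["recognized"], start="segmentation"))
```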
10. Fusion Mechanism
Define a process for each modality and run them in parallel.
Data from each stage is buffered and processed together for the purpose of fusion.
Agent-oriented: the problem is solved in a distributed fashion.
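A toy sketch of the buffering idea: one thread stands in for each modality's process, and the fusion step consumes the buffered outputs together. The queue-based design is an assumption for illustration; the actual framework is agent-oriented and distributed.

```python
# Minimal sketch: one buffer per modality, filled in parallel, fused together.
import queue
import threading

def modality_process(name, samples, buffer):
    """Run one modality's pipeline and buffer its annotated events."""
    for s in samples:
        buffer.put((name, s))     # stand-in for the four-stage life cycle

speech_buf, move_buf = queue.Queue(), queue.Queue()

threads = [
    threading.Thread(target=modality_process,
                     args=("speech", ["find book"], speech_buf)),
    threading.Thread(target=modality_process,
                     args=("movement", ["towards shelves"], move_buf)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# The fusion step consumes both buffers together:
while not (speech_buf.empty() or move_buf.empty()):
    print("fuse:", speech_buf.get(), "+", move_buf.get())
```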
15. Scenario
"Maybe I can find a book about it in the library."
Ronald is moving towards the bookshelves.
16. Results
Managed spatial relationships based on the fixed objects in the room.
Performed semantic fusion of events not coinciding in time.
Achieved good results in speaker identification, with synchronization between image and speech identification.
Created an open framework to manage fusion between two (in our case) or more modalities (in future work).
Designed the system so that each component can run on a separate machine, thanks to a distribution mechanism interchanging data through a TCP/IP network (sketched below).
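For the last point, here is a minimal, hypothetical sketch of two components exchanging an annotated event over TCP/IP on one machine; the JSON message format and the use of raw sockets are assumptions, not the framework's actual distribution protocol.

```python
# Hypothetical sketch: two components exchange an event over TCP/IP.
import json
import socket
import threading

# The receiving component's socket is set up first, so the sending
# component cannot connect before it is listening.
srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(("127.0.0.1", 0))        # any free port
srv.listen(1)
port = srv.getsockname()[1]

def component():
    """One framework component receiving annotated events."""
    conn, _ = srv.accept()
    print("received:", json.loads(conn.recv(4096).decode()))
    conn.close()

t = threading.Thread(target=component)
t.start()

# Another component (possibly on a different machine) sends an event:
cli = socket.create_connection(("127.0.0.1", port))
cli.sendall(json.dumps({"modality": "speech", "concept": "FindBook"}).encode())
cli.close()
t.join()
srv.close()
```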
17. Next Steps
Implement the segmentation and annotation of 3D content.
Migrate the framework to a real-time implementation.
Evaluate other methods under the rules of the framework.
Continuously extend the framework to support other fusion concepts and implementation methods.