Multimodal interaction provides the user with multiple modes of interacting with a system. A multimodal interface provides several distinct tools for input and output of data.
2. Multimodal HCI System
› A system that combines multiple modalities, i.e. uses more
than one independent channel of signals for interaction
between a user and a machine, is termed a multimodal
human-computer interaction (MMHCI) system.
› A multimodal interface acts as a facilitator of human-
computer interaction via two or more modes of input.
– Input: voice, pen, gesture, face expression, etc.
– Output: voice, graphical output, etc.
3. A classic example of a multimodal system is the “Put That
There” system (Bolt, 1980).
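The idea behind “Put That There” can be illustrated with a toy sketch (not Bolt's actual implementation): each pointing gesture is matched, in order of occurrence, to the deictic word (“that”, “there”) it accompanies, so speech and gesture jointly specify the command.

```python
# Toy deictic resolution in the spirit of "Put That There" (illustrative
# only): replace each deictic word in the spoken command with the object
# or location the user was pointing at when it was uttered.

def resolve_deictics(utterance, pointed_targets):
    """Resolve deictic words against an ordered list of pointed targets."""
    deictics = {"that", "there", "this", "here"}
    targets = iter(pointed_targets)
    resolved = []
    for word in utterance.split():
        # Consume the next pointing target whenever a deictic word occurs.
        resolved.append(next(targets) if word in deictics else word)
    return " ".join(resolved)

# The user says "put that there" while pointing first at an object,
# then at a map location.
command = resolve_deictics("put that there", ["blue_circle", "(130, 245)"])
print(command)  # put blue_circle (130, 245)
```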
5. Multimodal Systems – Why?
› Provide transparent, flexible, and powerfully expressive means
of HCI.
› Easier to learn and use.
› Robustness and Stability.
› Used as front ends to sophisticated application systems, they
conduct HCI in modes all users are already familiar with,
reducing the cost of training users.
› Potentially user, task and environment adaptive.
7. • Inputs: An interactive multimodal implementation will use multiple input
modes, such as audio, speech, handwriting, and keyboarding.
• Outputs: An interactive multimodal implementation will use one or more
modes of output, such as speech, text, graphics, audio files, and
animation.
• Interaction manager: The interaction manager is the logical component
that coordinates data and manages execution flow from various input
and output modality component interface objects. The interaction
manager maintains the interaction state and context of the application
and responds to inputs from component interface objects and changes
in the system and environment. In some architectures the interaction
manager may be implemented as one single component. In other
architectures the interaction manager may be treated as a composition
of lesser components. Composition may be distributed across process
and device boundaries.
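The interaction-manager role described above can be sketched minimally: a single component that keeps the interaction state, receives events from input modality components, and fans responses out to registered output components. The class and method names here are illustrative assumptions, not part of any standard API.

```python
# Minimal sketch of an interaction manager: it maintains interaction
# state/context, accepts events from input modality components, and
# routes responses to all registered output modality components.

class InteractionManager:
    def __init__(self):
        self.state = {}       # interaction state and context of the application
        self.outputs = []     # registered output modality components

    def register_output(self, component):
        """Register an output component (a callable, e.g. a renderer)."""
        self.outputs.append(component)

    def on_input(self, modality, event):
        """Called by an input modality component (speech, pen, ...)."""
        self.state[modality] = event               # update the context
        response = f"ack {modality}: {event}"      # application logic goes here
        for out in self.outputs:                   # fan out to every output
            out(response)

manager = InteractionManager()
log = []
manager.register_output(log.append)   # stand-in for a text/graphics output
manager.on_input("speech", "zoom in")
print(log)  # ['ack speech: zoom in']
```

In a distributed architecture the same coordination logic would be split across processes or devices, as the slide notes, but the manager's responsibilities stay the same.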
8. Multimodal interfaces process two or more combined user
input modes— such as speech, pen, touch, manual gestures,
gaze, and head and body movements— in a coordinated
manner with multimedia system output. They are a new class
of interfaces that aim to recognize naturally occurring forms of
human language and behavior, and which incorporate one or
more recognition-based technologies (e.g., speech, pen, vision).
9. Challenges for multimodal interface design
› More than two modes – e.g. spoken, gestural, facial
expression, gaze; various sensors
› Inputs are uncertain – vs. keyboard/mouse
– Corrupted by noise
– Multiple people
› Recognition is probabilistic
› Meaning is ambiguous
10. Approach
Gain robustness via
– Fusion of inputs from multiple modalities
– Using strengths of one mode to compensate for
weaknesses of others (at design time and run time)
– Avoiding/correcting errors
– Statistical architecture
– Confirmation
– Dialogue context
– Simplification of language in a multimodal context
– Output affecting/channeling input
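The first point above, fusion of inputs from multiple modalities, can be sketched as decision-level (“late”) fusion: each recognizer emits an N-best list of (interpretation, probability) pairs, and jointly compatible pairs are scored by the product of their probabilities, so a confident gesture can compensate for an uncertain speech hypothesis. The compatibility predicate here is a stand-in for a real semantic-integration step.

```python
# Decision-level fusion sketch: combine N-best lists from two recognizers
# by multiplying the probabilities of jointly compatible hypotheses and
# keeping the best-scoring joint interpretation.

def fuse(speech_nbest, gesture_nbest, compatible):
    joint = [
        ((s, g), ps * pg)
        for s, ps in speech_nbest
        for g, pg in gesture_nbest
        if compatible(s, g)          # semantic-integration stand-in
    ]
    return max(joint, key=lambda pair: pair[1])

speech = [("delete object", 0.4), ("select object", 0.35)]
gesture = [("point_at:obj7", 0.7), ("circle:obj3", 0.2)]
best, score = fuse(speech, gesture, lambda s, g: True)
print(best, round(score, 2))  # ('delete object', 'point_at:obj7') 0.28
```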
11. Differences Between Multimodal Interfaces
and GUIs
GUI
1. Assume that there is a single event
stream that controls event loop with
processing being sequential.
2. Assume interface actions (e.g. selection
of items) are atomic and unambiguous.
3. Built to be separable from application
software and reside centrally on one
machine.
4. Do not require temporal constraints; the
architecture is not time-sensitive.
MULTI-MODAL
1. Typically process continuous and
simultaneous input from parallel incoming
streams.
2. Process input modes using recognition-
based technology, which is good at handling
uncertainty.
3. Have large computational and memory
requirements and are typically distributed
over a network.
4. Require time stamping of input and
development of temporal constraints on
mode fusion operations.
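The time-stamping requirement in point 4 can be sketched as a simple temporal constraint: every input event carries a timestamp, and two events are candidates for fusion only if they fall within an integration window. The 2-second window below is an illustrative assumption; real systems tune this empirically.

```python
# Temporal-constraint sketch: fuse a speech event and a gesture event only
# if their timestamps fall within an integration window (assumed 2 s here).

WINDOW_S = 2.0  # maximum speech-gesture lag allowed for fusion (assumption)

def can_fuse(speech_event, gesture_event):
    """Events are dicts carrying a 't' timestamp in seconds."""
    return abs(speech_event["t"] - gesture_event["t"]) <= WINDOW_S

speech = {"t": 10.1, "content": "move this"}
near_gesture = {"t": 11.4, "content": "point"}   # 1.3 s apart: fuse
late_gesture = {"t": 14.9, "content": "point"}   # 4.8 s apart: reject
print(can_fuse(speech, near_gesture))  # True
print(can_fuse(speech, late_gesture))  # False
```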
12. Application Areas
› Driver Monitoring
› Architecture and Design
› Geographical Information Systems
› Emergency Operations
› Field-based Operations
› Mobile Computing and Telecommunications
› Virtual Reality
› Pervasive/Ubiquitous Computing
› Computer-Supported Collaborative Work
› Education and Entertainment
› Intelligent Homes/Offices
› Intelligent Games
› Helping People with Disabilities
Editor's notes
Multimodal interfaces are easy to use for disabled or illiterate people.
Interaction manager — The interaction manager is the logical component that coordinates data and manages execution flow from various input and output modality component interface objects. The interaction manager maintains the interaction state and context of the application and responds to inputs from component interface objects and changes in the system and environment. The interaction manager then manages these changes and coordinates input and output across component interface objects. The interaction manager is discussed in section 6.
CSCW [is] a generic term, which combines the understanding of the way people work in groups with the enabling technologies of computer networking, and associated hardware, software, services and techniques.
Pervasive computing (also called ubiquitous computing) is the growing trend towards embedding microprocessors in everyday objects so they can communicate information. The words pervasive and ubiquitous mean "existing everywhere." Pervasive computing devices are completely connected and constantly available.