UNITN & Graphitech
Human Interaction Library
Final Report
Magliocchetti Daniele < 125162 >
Raffaele De Amicis, Giuseppe Conti
Version 1.0
Revisions
Date Version Description Author
11/09/2008 0.1 Software Requirements Definition Magliocchetti Daniele
06/07/2008 0.2 First Draft Magliocchetti Daniele
08/07/2008 0.3 Manuals Magliocchetti Daniele
07/08/2008 0.4 Review Magliocchetti Daniele
03/09/2008 0.5 Review Magliocchetti Daniele
10/09/2008 1.0 Final Review Magliocchetti Daniele
Index
1. Introduction
1.1 Acronyms and Abbreviations
1.2 References
1.3 Report Overview
2. Global Capability Description
2.1 Finger Tracking Capability
2.1.1 Navigation Mode (up to 2 fingers)
2.1.2 Interaction / Edit Mode (3 or more fingers)
2.2 Head Tracking Capability (Optional)
2.3 Speech Recognition (Optional)
3. Requirements
3.1 Functional Requirements
3.2 User Use Case
3.3 Additional Requirements
4. Requirements Classification
5. System Architecture
5.1 General Description
5.2 Finger Tracking Engine
5.3 Head Tracking Engine
5.4 Speech Recognition Engine
5.5 WorldWind Integration
6. Developer Manual
6.1 System Requirements
6.2 System Configuration
6.3 WorldWind
7. User Manual
7.1 Gestures
7.2 Voice Commands
7.3 Head Tracking
8. Appendix
8.1 Known Bugs and Limitations
8.2 Directory Structure
8.3 Adopted Tools
Software Requirements Specification
1. Introduction
This report presents the design and architecture of a Java library that allows the final user to interact with applications through non-conventional devices such as the Nintendo Wii® controller (WiiMote), a microphone and a webcam. It has been developed as a Java API that can easily be integrated into any kind of software suitable for being controlled by devices other than a keyboard or a mouse. In particular, this report shows how the library has been used inside the WorldWind SDK, NASA's open source world exploration tool.
1.1 Acronyms and Abbreviations
WiiMote: The Nintendo Wii® Controller
WWJ: WorldWind Java SDK
OpenCV: The Intel® Open Source Computer Vision Library
MoteJ: The WiiMote java api
BlueCove: The java bluetooth stack
Sphinx4j: The speech recognition library
BT: Bluetooth
IR: Infrared
JNI: The Java Native Interface
BNF: Backus‐Naur Form
1.2 References
- http://motej.sourceforge.net : the MoteJ (WiiMote Java API) official site
- http://bluecove.sourceforge.net/ : the BlueCove (Java Bluetooth stack) official site
- http://www.wiili.org/index.php/Wiimote : a WiiMote hardware description
- http://www.youtube.com/watch?v=0awjPUkBXOU : "Tracking Your Fingers with the Wiimote" by Johnny Chung Lee
- http://sourceforge.net/projects/opencv/ : the OpenCV official site
- http://www.youtube.com/watch?v=Jd3-eiid-Uw : "Head Tracking with the Wiimote" by Johnny Chung Lee
- http://cmusphinx.sourceforge.net/sphinx4/ : the Sphinx-4 official site: "a speech recognizer written entirely in the Java™ programming language"
1.3 Report Overview
In the following chapters we will go through all the aspects of the implemented library. The remainder of the report is organized as follows. Section 2 describes in detail the problem and the functionalities that the library needs to implement; section 3 captures the functional and non-functional requirements for the project, while section 4 classifies their priority for the development. Finally, section 5 describes the architecture of the system, while sections 6 and 7 provide usage manuals for developers and users respectively.
2. Global Capability Description
This chapter describes in detail what the final user should be able to do with the library once it has been integrated into an application such as WorldWind.
2.1 Finger Tracking Capability
The system should be able to recognize the finger movements of the user through the WiiMote controller and a set of reflective markers located on both forefingers and thumbs of the user (the WiiMote can recognize up to 4 points at a resolution of 1024x768 with a frequency of 100 Hz).
The system should be able to distinguish between two modes, with the following sets of related operations:
2.1.1 Navigation Mode (up to 2 fingers)
The navigation mode includes all the common navigation operations and is performed with at most two fingers:
- Pan (1 finger): this is the default operation; if the application detects just one finger, a pan (drag) action over the map is assumed.
- Zoom (2 fingers): when the system detects two points moving in opposite directions, a zoom action is assumed. In particular, an increasing distance is equivalent to a zoom-in and a decreasing distance is equivalent to a zoom-out. Zoom-in and zoom-out increase and decrease the speed of the camera, respectively.
- Rotation (2 fingers): when the system detects two points rotating one around the other, a steer action is assumed.
- Look up and look down (2 fingers): when the system detects two points at a constant distance moving in the same direction, a camera move action is assumed. In particular, moving two fingers up is interpreted as a look up and moving two fingers down as a look down.
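To make the geometry of these rules concrete, the following sketch classifies a two-finger movement from two consecutive frames. This is an illustration only: the class, method and threshold names are invented and do not belong to the library (the actual detection code is described in section 5.2).

// Illustrative sketch only: classifies a two-finger movement as zoom,
// rotation or look using the geometric rules described above.
final class GestureClassifier {

    enum Gesture { ZOOM_IN, ZOOM_OUT, ROTATION, LOOK }

    // Tolerances in pixels/radians, analogous to the zoomSensitivity and
    // rotationSensitivity parameters of settings.cfg (values are examples).
    private static final double DISTANCE_TOLERANCE = 15.0;
    private static final double ANGLE_TOLERANCE = 0.1;

    static Gesture classify(double[] p1Old, double[] p2Old,
                            double[] p1New, double[] p2New) {
        double oldDist = dist(p1Old, p2Old);
        double newDist = dist(p1New, p2New);
        double oldAngle = Math.atan2(p2Old[1] - p1Old[1], p2Old[0] - p1Old[0]);
        double newAngle = Math.atan2(p2New[1] - p1New[1], p2New[0] - p1New[0]);

        if (newDist - oldDist > DISTANCE_TOLERANCE) return Gesture.ZOOM_IN;
        if (oldDist - newDist > DISTANCE_TOLERANCE) return Gesture.ZOOM_OUT;
        if (Math.abs(newAngle - oldAngle) > ANGLE_TOLERANCE) return Gesture.ROTATION;
        // Constant distance and angle: both points are translating together.
        return Gesture.LOOK;
    }

    private static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]);
    }
}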
2.1.2 Interaction / Edit Mode (3 or more fingers)
The interaction mode includes all the common edit operations, like selection, drag, double click etc.
- Move cursor (3 or 1 finger): to use one of the fingers as a simple cursor in the application, and to disable the default pan action of the navigation mode, the user can vocally switch to the edit mode, or simply keep the forefinger and the thumb of one hand fixed while the forefinger of the other hand acts as a cursor. This mode is more immediate for occasional selections.
- Click (1 finger): if the finger appears, disappears and reappears in the same location, the system should recognize it as a single click. To avoid a pan, refer to the move cursor operation.
- Double click (1 finger): if the finger appears, disappears and reappears twice, the system should recognize it as a double click. To avoid a pan, refer to the move cursor operation.
- Right click (1 finger): if one finger moves forward and backward, returning to the same position, the system should recognize it as a right click. To avoid a pan, refer to the move cursor operation.
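As an illustration of the timing rule behind these click gestures, the following sketch tests whether a disappear/reappear pair counts as a click. The names are hypothetical; the real tuning parameters (minClickInterval, maxClickInterval, clickSensitivity) come from settings.cfg and are described in section 6.2.

// Illustrative only: a point that vanishes and reappears near the same
// spot within a configured time window is treated as a click.
final class ClickTest {
    static boolean isClick(long disappearedAtMs, long reappearedAtMs,
                           double dxPixels, double dyPixels,
                           long minClickIntervalMs, long maxClickIntervalMs,
                           double clickSensitivityPixels) {
        long gap = reappearedAtMs - disappearedAtMs;
        boolean inTime = gap >= minClickIntervalMs && gap <= maxClickIntervalMs;
        boolean samePlace = Math.hypot(dxPixels, dyPixels) <= clickSensitivityPixels;
        return inTime && samePlace;
    }
}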
2.2 Head Tracking Capability (Optional)
The system should be able to recognize the position of the user's head through a WiiMote controller and a pair of infrared emitters placed on a pair of glasses or on a hat, and change the perspective according to its movements. Alternatively, the system can use a face recognition algorithm to recognize the user's face and its movements.
2.3 Speech Recognition (Optional)
The system should be able to recognize vocal commands of the user. These commands could be:
‐ Switch between navigation and edit mode;
‐ Navigation commands;
‐ Edit commands;
3. Requirements
On the basis of the description provided in the previous chapter, we can identify the following functional and additional requirements.
3.1 Functional Requirements
WiiMote Detection and Connection
The system should be able to discover Nintendo WiiMote® devices and connect to them.
Gestures Recognition
The system should be able to process the coordinate stream provided by the WiiMote when the user
turns on the infrared emitters, detect the gestures defined in section 2 and notify them to the third party
application (e.g. WorldWind).
WebCam Detection and Connection
The system should be able to detect and connect to a webcam in order to run the face recognition
algorithm.
Head Tracking
The system should be able to process the images provided by the webcam, detect the position of the user's face and notify its movements to the third party application.
Microphone Detection and Connection
The system should be able to detect and use the microphone of the machine in order to run the speech
recognition algorithm.
Speech Recognition
The system should be able to process the sound stream of the microphone and detect the user vocal
commands chosen from a small command set defined by a grammar.
Simultaneous Execution
The system should be able to run gesture recognition, speech recognition and head tracking
simultaneously allowing each capability to interact with the others.
Capability Selection
The user should be able to enable or disable each capability separately.
WorldWind Integration
The system should be integrated into the WorldWind SDK to extend the user interaction.
3.2 User Use Case
The functionalities provided to the final user can be summarized with the following use case:
[Use case diagram: the User performs Gestures Interaction (which «extends» into Navigation Mode and Edit Mode), Voice Interaction, Head Movements Interaction and Capability Selection.]
3.3 Additional Requirements
Configuration file
All the configuration parameters should be defined inside a settings file.
Java
The adoption of Java as the main programming language is preferable.
Speed
Since the WiiMote sends data at a frequency of 100 frames per second, the system should be optimized for fast gesture detection.
Simple Integration
Integrating the library into a third party application has to be as simple as possible for the developer. In particular, the use of the Singleton and Event patterns should be preferred.
4. Requirements Classification
The following table shows the previous requirements divided between functional and non-functional.
In accordance with the IBM Rational Unified Process©, we have classified each of them with one of the following tags: Essential, Desirable and Optional. Please note that although some of them are tagged as Optional or Desirable, the library satisfies all the requirements.
Requirement                          Classification   Type
WiiMote Detection and Connection     Functional       Essential
Gestures Recognition                 Functional       Essential
WebCam Detection and Connection      Functional       Optional
Head Tracking                        Functional       Optional
Microphone Detection and Connection  Functional       Optional
Speech Recognition                   Functional       Optional
Simultaneous Execution               Functional       Desirable
Capability Selection                 Functional       Desirable
WorldWind Integration                Functional       Essential
Configuration file                   Non Functional   Essential
Java                                 Non Functional   Desirable
Speed                                Non Functional   Essential
Simple Integration                   Non Functional   Essential
5. System Architecture
5.1 General Description
We chose to develop a Java library that wraps and extends a set of existing APIs, allowing the final user to interact with applications in different ways, through gestures and voice. To this end, we built a system that will be explained with the help of the package organization and the schema provided in the Architecture.docx (or Architecture.pdf) file.
If we look at the source code, we can recognize the following set of packages:
‐ humanInteraction: that includes the main class of the system and a demo/test class;
‐ humanInteraction.connection: that includes the classes required to handle the
connection with the WiiMote and with the network;
‐ humanInteraction.core.events: that includes all the event wrapper classes that
are notified by the system to the third party application when an event occurs;
‐ humanInteraction.core.listeners.ext: that includes all the listener
interfaces that should be implemented by the third party application in order to handle the
incoming events from the system;
‐ humanInteraction.core.listeners.mote: that includes all the listener classes
that have been implemented by the system to handle the events produced by the MoteJ
library;
‐ humanInteraction.core.queue: that includes a set of utility classes to handle the
flow of information between two threads;
‐ humanInteraction.core.fingertracking: that includes the classes for the
gesture recognition process and the continuous event notification;
‐ humanInteraction.core.headTracking: that includes the classes for the head
tracking process;
‐ humanInteraction.core.speechRecognition: that includes the classes for the
speech recognition process and the continuous event notification;
As seen from the package description, we have chosen an event-listener approach in which the system calls all the registered external listeners each time an event occurs. We adopted this solution because it is the most suitable for an API: it allows a complete separation from third party applications and reuse in different contexts.
The starting point of the library is the HumanInteractionManager class, which implements the Singleton pattern to ensure that the class has only one instance and to provide a global point of access to it among multiple applications. This class is responsible for the creation and management of the API, since it can add the defined listeners and start/stop the tracking system; most of the options, however, must be defined in the configuration file settings.cfg.
When the manager creates its instance, it checks the configuration file to ensure that all the parameters are correct, loads all the settings and creates an instance of the ConnectionManager class. This class handles the discovery of and connection with the WiiMotes and with the head tracking client through a socket connection. After that, when the startTracking() method is invoked, the manager initializes and starts the desired engines. Each of them runs on a different thread and can thus be executed independently of the others.
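A minimal sketch of this Singleton structure follows; only the class name, getInstance() and startTracking()/stopTracking() are taken from the report, while the bodies are illustrative placeholders.

// Simplified sketch of the Singleton pattern used by the manager.
public final class HumanInteractionManager {

    private static HumanInteractionManager instance;

    private HumanInteractionManager() {
        // Load and validate settings.cfg, then create the
        // ConnectionManager (as described in the text above).
    }

    // Lazy, synchronized global access point: at most one instance exists.
    public static synchronized HumanInteractionManager getInstance() {
        if (instance == null) {
            instance = new HumanInteractionManager();
        }
        return instance;
    }

    public void startTracking() {
        // Initialize and start the enabled engines, each on its own thread.
    }

    public void stopTracking() {
        // Stop the running engines.
    }
}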
Following the schema presented in the Architecture.docx (or Architecture.pdf) file, the next
sections will explain all the details related to these three engines, with references to the
additional libraries that have been used.
5.2 Finger Tracking Engine
The finger tracking engine is responsible for the gesture recognition and relies, among other things, on two external libraries: BlueCove, the Java Bluetooth stack, and MoteJ, the WiiMote Java API. The former is responsible for the communication with the Bluetooth dongle of the system; the latter is an event-listener layer that simplifies the communication with the WiiMotes, translating commands (methods) to byte streams and vice versa.
When the tracking starts, the ConnectionManager launches the WiiDiscoveryHandler thread, which adds a listener (WiiMoteListener) to the discoverer; each time a new WiiMote is found, the listener initializes it by activating the IR camera and adding an IrEventListener to the just-created Mote instance. Since the MoteJ library generates an event for each of the four points that the WiiMote can detect in each frame, the IrEventListener executes a preprocessing step in which it reconstructs the original frame, keeps track of the position of each finger and generates a stream of IrEvent objects for the finger tracking engine (FTEngine) through a fixed-size synchronized message queue.
The FTEngine is a thread listening on the message queue that processes all the incoming IrEvents and detects gestures with the help of a CoordinatesBuffer class, which acts as a history of the coordinates of the previous n seconds, where n can be defined in the configuration file.
The gesture recognition for all the navigation commands (see chapter 2) is executed by analyzing the coordinate history and checking the movements of the two points against some tolerance values, while the editing operations are detected by a state machine implemented with a set of switch-case statements and a status flag. For additional details, please refer to the source code.
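The producer/consumer structure between the IrEventListener and the FTEngine can be sketched as follows. This is not the library code: the event type and field names are stand-ins, and only the blocking queue and the time-bounded history (the CoordinatesBuffer analogue) are taken from the description above.

import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative sketch of the FTEngine consumer loop described above.
final class FTEngineSketch implements Runnable {

    static final class IrEvent {
        final long timestampMs;
        final double x, y;
        IrEvent(long timestampMs, double x, double y) {
            this.timestampMs = timestampMs; this.x = x; this.y = y;
        }
    }

    // Fixed-size synchronized message queue fed by the IR listener.
    private final BlockingQueue<IrEvent> queue = new ArrayBlockingQueue<IrEvent>(64);
    // History of the coordinates of the previous n seconds (CoordinatesBuffer analogue).
    private final Deque<IrEvent> history = new ArrayDeque<IrEvent>();
    private final long historyWindowMs = 2000; // "n seconds" from the configuration file

    void offer(IrEvent e) { queue.offer(e); } // producer side (IrEventListener)

    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                IrEvent e = queue.take();      // block until a preprocessed frame arrives
                history.addLast(e);
                long cutoff = e.timestampMs - historyWindowMs;
                while (!history.isEmpty() && history.peekFirst().timestampMs < cutoff) {
                    history.removeFirst();     // drop coordinates older than the window
                }
                // ...run the tolerance checks / state machine over 'history'
                // and notify the registered listeners when a gesture is found...
            }
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt();
        }
    }
}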
Once a gesture has been detected, the FTEngine sends an event to all the registered listeners.
Events can be of the type: WiiButtonEvent, WiiInteractionEvent and
WiiNavigationEvent.
Finally, since some applications (like WorldWind) need to be continuously notified to execute an animation, an additional thread (FTNotifier) can be activated from the configuration file. In this case, the FTEngine will only change some FTNotifier flags and let it perform the event generation step.
5.3 Head Tracking Engine
The head tracking engine (HTEngine) is responsible for the face tracking step and relies on an external library, OpenCV, a very powerful computer vision library. Since OpenCV is written entirely in C, to avoid the use of JNI we chose to write the face recognition code in its native language and send the detected coordinate stream through a socket connection.
When the tracking starts, the main manager starts the HTEngine listening on the message queue, while the ConnectionManager launches a SocketHandler listening on the port defined in the configuration file. Once started, the face recognition client loads its configuration file htSettings.cfg, checks the correctness of the settings, connects to the SocketHandler, activates the webcam of the system and starts the detection.
When the SocketHandler receives the connection request from the client, it launches a new thread (ConnectionHandler) to handle the incoming stream of data and returns to listening on the port for new connections. The new thread is responsible for processing the coordinate stream and sending it to the message queue on which the HTEngine is listening.
When a new message appears on the queue, the HTEngine filters the incoming coordinate set and notifies the event (HeadEvent) to all the registered listeners.
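The socket side of this design can be sketched as follows. The one-line-per-frame "x;y" wire format is an assumption made for illustration; the real protocol between the C client and the SocketHandler is not documented here.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative sketch of the SocketHandler / ConnectionHandler pair.
final class SocketHandlerSketch implements Runnable {

    private final int port; // htPort from settings.cfg

    SocketHandlerSketch(int port) { this.port = port; }

    public void run() {
        try {
            ServerSocket server = new ServerSocket(port);
            while (true) {
                final Socket client = server.accept(); // face recognition client
                new Thread(new Runnable() {            // ConnectionHandler analogue
                    public void run() { handle(client); }
                }).start();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private void handle(Socket client) {
        try {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                String[] parts = line.split(";");
                double x = Double.parseDouble(parts[0]);
                double y = Double.parseDouble(parts[1]);
                // ...push (x, y) onto the message queue read by the HTEngine...
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}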
5.4 Speech Recognition Engine
The speech recognition engine is responsible for processing the user's voice in order to detect commands and notify them to the third party application listeners. It is based on the Sphinx speech recognition library, which recognizes words according to a BNF grammar (the hi.gram file) and an XML configuration file (hi.config.xml).
When the recognition starts, the HumanInteractionManager starts the SREngine thread, which initializes the Sphinx library and enters a recognition loop. Each time Sphinx recognizes a word that has been defined inside the grammar, it returns the word and the engine notifies the corresponding events or sets the corresponding execution flags.
Here, as in the finger tracking engine, we have defined an additional thread for the continuous notification (SRNotifier) that can be activated in case the third party application executes only event-based animations.
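A typical Sphinx4 recognition loop looks like the following sketch. The component names "recognizer" and "microphone" follow the usual Sphinx4 examples and are assumed, not verified, to match those defined in hi.config.xml.

import java.io.File;
import java.net.URL;
import edu.cmu.sphinx.frontend.util.Microphone;
import edu.cmu.sphinx.recognizer.Recognizer;
import edu.cmu.sphinx.result.Result;
import edu.cmu.sphinx.util.props.ConfigurationManager;

// Sketch of a Sphinx4 recognition loop similar to the SREngine's.
public class SREngineSketch {
    public static void main(String[] args) throws Exception {
        URL configUrl = new File("hi.config.xml").toURI().toURL();
        ConfigurationManager cm = new ConfigurationManager(configUrl);

        Recognizer recognizer = (Recognizer) cm.lookup("recognizer");
        recognizer.allocate();

        Microphone microphone = (Microphone) cm.lookup("microphone");
        if (!microphone.startRecording()) {
            System.err.println("Cannot start the microphone.");
            return;
        }

        while (true) {
            Result result = recognizer.recognize(); // blocks until an utterance ends
            if (result != null) {
                String command = result.getBestFinalResultNoFiller();
                // ...map 'command' to an event or set an execution flag,
                // as the SREngine does (see text above)...
                System.out.println("Recognized: " + command);
            }
        }
    }
}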
5.5 WorldWind Integration
After the development of the library, we proceeded with the integration inside the WorldWind SDK. The input handling inside this SDK is basically done by two classes: AWTInputHandler, which implements all the allowed listeners and handles all the input events, and OrbitViewInputBroker, which computes the position of the globe and of the objects when an event occurs. For these reasons, starting from their code, we decided to modify and extend them in order to produce two new classes, WiiAWTInputHandler and WiiOrbitViewInputBroker, located inside the it.unitn.cg2008 package.
The former has been extended to be the listener not only for key and mouse events but also for all the events generated by the human interaction library; the latter has been tuned with some concurrency workarounds to avoid unpredictable movements of the globe.
To avoid the entire rewriting of the two classes, we chose to convert each incoming library event into a sequence of key and mouse events. In this way, a WiiMote drag event is mapped to a sequence of MouseEvents (mousePressed(), mouseDragged(), mouseReleased()) as if the user were dragging the globe with the mouse. Navigation events, like zoom and rotation, are mapped to KeyEvents (keyPressed(), keyReleased()).
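The conversion idea can be sketched with plain AWT calls. The coordinates and modifier mask below are illustrative; the actual mapping code lives in the WiiAWTInputHandler class.

import java.awt.Component;
import java.awt.event.InputEvent;
import java.awt.event.MouseEvent;

// Illustrative sketch of the event conversion described above: a library
// drag event is replayed as a pressed/dragged/released sequence on the
// WorldWind canvas.
final class DragReplay {

    static void replayDrag(Component canvas, int fromX, int fromY, int toX, int toY) {
        long now = System.currentTimeMillis();
        dispatch(canvas, MouseEvent.MOUSE_PRESSED, now, fromX, fromY);
        dispatch(canvas, MouseEvent.MOUSE_DRAGGED, now, toX, toY);
        dispatch(canvas, MouseEvent.MOUSE_RELEASED, now, toX, toY);
    }

    private static void dispatch(Component c, int id, long when, int x, int y) {
        c.dispatchEvent(new MouseEvent(c, id, when,
                InputEvent.BUTTON1_DOWN_MASK, x, y,
                1 /* clickCount */, false /* popupTrigger */));
    }
}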
Inside the WiiOrbitViewInputBroker we have simply added some checks to ensure the consistency of the new camera and object coordinates after an event occurs, because the WorldWind methods are not completely thread safe.
Finally, inside the it.unitn.cg2008 package, we have added three additional classes: WiiMonitorLayer, SRMessageListener and WiiPointListener. The first class is a WorldWind layer used to show the position of the user's infrared emitters on the screen (as little triangles) and to show the voice command that has been recognized. The other two classes are listeners used to change the layer parameters when a WiiMote movement or a voice command is detected.
6. Developer Manual
This chapter provides the developer with detailed information about how to configure, compile and run the library, both for a general application and for the WorldWind SDK.
6.1 System Requirements
- A machine able to smoothly run the WorldWind SDK for Java version 0.4;
- Windows OS to run the head tracking client;
- 128 MB of additional RAM to run the speech recognition;
- 1 Nintendo WiiMote® for finger tracking;
- 1 Bluetooth dongle with Widcomm Bluetooth drivers version 1.4.2 or above;
- Sound card and microphone for speech recognition;
- 1 webcam with a resolution of at least 320x240 pixels for the head tracking;
- Java JDK 1.6;
- JOGL version 1.1.1;
6.2 System Configuration
In order to use all the features provided by the library, you must first ensure that all the requirements of the previous section are met. In particular, we assume that the latest JDK, version 1.1.1 of JOGL and the external device drivers have been installed and properly configured on the system.
To create a new application, open your favorite IDE, create an empty project and add to it all the
libraries (*.jar) that are stored inside the libraries directory. In particular:
‐ bluecove‐2.0.2.jar : the bluetooth stack;
‐ motej‐library‐0.8.jar : the WiiMote api;
‐ commons‐logging‐1.1.1.jar; commons‐logging‐adapters‐1.1.1.jar; commons‐logging‐api‐
1.1.1.jar; commons‐logging‐tests.jar : common libraries;
‐ js.jar; jsapi.jar; sphinx4.jar; tags.jar; WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar : the
sphinx libraries;
After that, you have to place a copy of the three configuration files (settings.cfg, hi.gram, hi.config.xml) inside the working directory of your project. If you plan to use the head tracking capability, locate the ExecFR directory and remember to launch runme.cmd after starting your application.
To use the library in your application, you first have to implement one or more listeners, starting from the interfaces located in the humanInteraction.core.listeners.ext package:
‐ WiiPointsEventListener: to know the position of the IR emitters on the screen;
‐ WiiButtonsEventListener: to know if a WiiButton has been pressed;
‐ WiiNavigationEventListener: to get navigation events;
‐ WiiInteractionEventListener: to get editing events;
‐ HeadTrackingEventListener: to know the position of the user head;
‐ SpeechEventListener: to know the user voice command;
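For example, a navigation listener might be implemented as follows. The interface and event class come from the packages listed above, but the method name is an assumption made for illustration; check the library javadoc for the real signature.

import humanInteraction.core.events.WiiNavigationEvent;
import humanInteraction.core.listeners.ext.WiiNavigationEventListener;

// Hypothetical listener implementation: the method name below is an
// assumption, not the documented interface signature.
public class YourWiiNavEvtListener implements WiiNavigationEventListener {
    public void navigationEventReceived(WiiNavigationEvent event) {
        // React to the detected gesture, e.g. move the application camera.
        System.out.println("Navigation event: " + event);
    }
}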
The final step consists of instantiating the main class and adding the just-defined listeners. You can get an instance of the manager by typing:
HumanInteractionManager manager = HumanInteractionManager.getInstance();
To add your listeners, invoke the corresponding manager methods:
manager.setWiiButtonsEventListener(new YourWiiButtonsEventListener());
manager.setWiiIntEvtListener(new YourWiiIntEvtListener());
manager.setWiiNavEvtListener(new YourWiiNavEvtListener());
manager.setWiiPointsEventListener(new YourWiiPointsEventListener());
manager.setHeadTrackingListener(new YourHeadTrkListener());
Please note that an arbitrary number of them can be added.
Finally, you can start and stop the tracking by typing:
manager.startTracking();
manager.stopTracking();
If you want to calibrate the Nintendo WiiMote, call manager.startCalibration() at any time. The calibration is useful since it allows the system to configure most of its sensitivity settings automatically, in relation to the distance between the user and the WiiMote. If you call this method, the library outputs all the necessary instructions on the console.
In addition, from the manager you can pause the speech recognition (manager.muteMicroPhone()), pause the head tracking (manager.pauseHeadTracking()) and switch between navigation and edit mode (manager.switchMode()).
For a better understanding, refer to the TestManager class and to the code documentation.
To achieve the best performance from the application, an accurate tuning of the configuration files is essential. The main configuration file for the library is settings.cfg, whose parameters are described in the following table:
Parameter                     Values                  Description

Finger tracking (FT)
fingerTrackingEnabled         0,1                     Enables/disables finger tracking
ftBufferSize                  0..n slots              The message buffer size
irCameraMode                  BASIC, EXTENDED, FULL   The camera configuration mode: EXTENDED recommended
irCameraSensitivity           CLIFF, MARCAN           Sensitivity preset: MARCAN recommended
recognizeDrag                 0,1                     Enables/disables drag gesture recognition
recognizeZoom                 0,1                     Enables/disables zoom gesture recognition
recognizeLook                 0,1                     Enables/disables look gesture recognition
recognizeRotation             0,1                     Enables/disables rotation gesture recognition
recognizeClicks               0,1                     Enables/disables click recognition
continuousZoom                0,1                     Enables/disables continuous zoom notification
continuousLook                0,1                     Enables/disables continuous look notification
continuousRotation            0,1                     Enables/disables continuous rotation notification
bufferTrackingSize            0..n ms                 Movement history size
allowCalibration              0,1                     Enables/disables calibration
zoomStopDistance              0..n pixels             Distance below which two IR emitters are recognized as a stop gesture
zoomSensitivity               0..n pixels             Tolerance to detect a zoom gesture
lookSensitivity               0..n pixels             Tolerance to detect a look gesture
lookFingerSensitivity         0..n pixels             Tolerance between two IR emitters
rotationSensitivity           0..n pixels             Tolerance to detect a rotation gesture
rotationRadiusSensitivity     0..n pixels             Radius tolerance to detect a rotation gesture
minClickInterval              0..n ms                 Minimum interval to detect a click
maxClickInterval              0..n ms                 Maximum interval to detect a click
clickSensitivity              0..n pixels             Tolerance to detect a click
rightClickDetectionInterval   0..n ms                 History time to analyze for right click detection
ftNotificationInterval        0..n ms                 The continuous notification sleep time

Speech recognition (SR)
speechRecognitionEnabled      0,1                     Enables/disables speech recognition
continuousSRnotification      0,1                     Enables/disables continuous voice command notification
srNotificationInterval        0..n ms                 The continuous notification sleep time
xmlCfg                        String                  The XML configuration file path

Head tracking (HT)
headTrackingEnabled           0,1                     Enables/disables head tracking
htPort                        1024..65535             The listening port for the head tracking client
cameraWidth                   0..n pixels             The pixel width of the camera
cameraHeight                  0..n pixels             The pixel height of the camera
camTimeOut                    0..n ms                 The timeout after which the face recognizer disengages the user's head
camXSensitivity               0..n pixels             The x sensitivity
camYSensitivity               0..n pixels             The y sensitivity
camRadiusThreshold            0..n pixels             The radius sensitivity
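As an orientation, a fragment of settings.cfg might look like the following. The key=value syntax and the concrete values are assumptions made for illustration; refer to the table above and to the file shipped with the library for the authoritative format.

# Finger tracking (FT)
fingerTrackingEnabled = 1
ftBufferSize = 32
irCameraMode = EXTENDED
irCameraSensitivity = MARCAN
recognizeDrag = 1
recognizeZoom = 1
continuousZoom = 1
bufferTrackingSize = 2000

# Speech recognition (SR)
speechRecognitionEnabled = 1
xmlCfg = hi.config.xml

# Head tracking (HT)
headTrackingEnabled = 1
htPort = 4444
cameraWidth = 320
cameraHeight = 240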
In addition, there are two files for the speech recognition. The first is hi.gram, which contains the following grammar in BNF:
public <command> = Double Click | Edit Mode | Left | Look Down | Look Up
| Mouse Drag | Navigation Mode | Right | Right Click
| Single Click | Stop | Turn Left | Turn Right
| Zoom In | Zoom Out;
This is the set of recognizable voice commands. Please note that most of them are multi-word commands. This choice has been made to improve the speech recognition accuracy, since many single words sound similar, like "look" and "zoom". If you plan to extend the speech recognition capability with additional commands, you first have to define them inside this grammar.
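For instance, adding a hypothetical "Reset View" command would only require extending the alternation (the Java side must then also handle the new string):

public <command> = Double Click | Edit Mode | Left | Look Down | Look Up
| Mouse Drag | Navigation Mode | Right | Right Click
| Single Click | Stop | Turn Left | Turn Right
| Zoom In | Zoom Out | Reset View;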
The second file is hi.config.xml, which includes all the speech recognizer parameters. Although there are a lot of them, the most important are those placed at the beginning of the file:

Parameter                     Value        Description
absoluteBeamWidth             -1..n        The maximum number of hypotheses to consider for each frame (-1 = all)
relativeBeamWidth             0..n         The more negative the exponent of the relative beam width, the fewer hypotheses you discard and the more accurate the recognition
wordInsertionProbability      0..n         Controls word break recognition. Near 1 is more aggressive
languageWeight                0..n         Trade-off between grammar and acoustic scores
addSilenceWords               true, false  Enables/disables silence word insertion
silenceInsertionProbability   0..1         How aggressively Sphinx inserts silences
silenceThreshold              0..n         Sensitivity level with respect to the environment
The actual values are not definitive and depend strictly on the environment and on the distance between the microphone and the user. Usually some empirical tests are required to achieve a good speech recognition. For additional information on the other parameters, please refer to the Sphinx reference manual.
The last configuration file is the one for the head tracking (htSettings.cfg), which includes the following parameters:
Parameter               Value             Description
hostIp                  IPv4 address      The address of the machine running the library
hostPort                1024..65535       The port of the machine running the library
cascadeMethod           String            The name of the detection rule file
cutoff                  1..1.4            The scale factor. Increase to improve speed at the cost of less precise detection
cvSizeHorizontal        1..cameraWidth    Window sample pixel width. Increase to improve speed at the cost of less accuracy
cvSizeVertical          1..cameraHeight   Window sample pixel height. Increase to improve speed at the cost of less accuracy
horizontalScaleFactor   1..n              The WiiMote horizontal resolution
verticalScaleFactor     1..n              The WiiMote vertical resolution
showCameraWindow        0,1               Enables/disables the camera window
waitTime                1..n              The waiting time between each detection
6.3 WorldWind
If your system has been correctly configured as explained in the previous section, the execution of WorldWind is relatively simple. Inside the Project directory there is an already configured copy of WorldWind 0.4 with the additional classes defined in section 5.5. To run the code, simply add all the files of the WiiWorldWind directory to your favorite IDE, add the libraries of the previous section and configure the settings files.
If you want to use the library inside a clean copy of the WorldWind SDK, you have to follow these steps:
- copy the directory .\project\WiiWorldWind\src\it inside the src directory of WWJ;
- copy the settings.cfg, hi.gram and hi.config.xml files from .\WorldWind to your WWJ working directory and configure them;
- copy the directory .\project\WiiWorldWind\src\worldwinddemo inside the src directory of WWJ;
- locate the WWJ configuration file worldwind.properties inside the .\project\WiiWorldWind\src\config directory and change the line
gov.nasa.worldwind.avkey.InputHandlerClassName=gov.nasa.worldwind.awt.AWTInputHandler
to
gov.nasa.worldwind.avkey.InputHandlerClassName=it.unitn.cg2008.WiiAWTInputHandler;
- change the visibility of gov.nasa.worldwind.awt.KeyPollTimer and gov.nasa.worldwind.awt.OrbitViewInputStateIterator from protected to public;
- add all the libraries of section 6.2 to the project;
Finally, run the application. If you choose to enable the speech recognition, you will need to pass the additional parameter -Xmx256M to the Java virtual machine.
You will not see any difference with respect to WWJ, except that pressing the C button starts/stops the human interaction library.
7. User Manual
This chapter provides instructions for the final user on how to use WorldWind extended with our human interaction library. To activate the library, follow these steps:
- Once the WWJ window appears, click with the mouse inside the window in order to give the focus to the canvas and press the C button;
- Put the WiiMote into connection mode by pressing buttons 1 and 2 together and wait until it connects with the computer. The WiiMote will rumble and its first LED will turn on;
- Position the WiiMote above or below the screen, with the IR camera looking at you;
- To enable the head tracking, double click on the runme.cmd file located inside the ExecFR directory and wait for the connection and initialization of the face recognizer (a window with the webcam stream will open);
Now that the library is running, you have to put the IR emitters/reflectors on your fingers (forefingers and thumbs) and start interacting with WorldWind.
7.1 Gestures
The allowed gestures are those defined in section 2.1, with the addition of the stop command, which is performed by putting your two forefingers close together in front of the WiiMote. If the distance falls below a certain value, all the active movements (zoom, look and turn) are interrupted.
For the best user experience it is strongly recommended to execute a calibration just after the application starts, by pressing the R button. After that, place your two forefingers in front of the WiiMote at a short distance. This distance will be used as the stop distance and to configure all the sensitivity parameters for the other gestures. So if the system appears too sensitive, just recalibrate the WiiMote with a stop distance greater than the previous one.
If you want to manually switch between navigation and edit mode, press the E button.
7.2 Voice Commands
The allowed voice commands are shown in the table below. If you want, you can enable/disable the microphone (stop the speech recognition) by pressing the M button.

Command           Description
Left              Moves towards the West
Right             Moves towards the East
Look Up           Moves towards the North
Look Down         Moves towards the South
Turn Right        Rotates the globe clockwise
Turn Left         Rotates the globe counterclockwise
Zoom In           Zooms in
Zoom Out          Zooms out
Stop              Stops all the movements
Edit Mode         Switches to the edit mode
Navigation Mode   Switches to the navigation mode
Single Click      Notifies a single left click to WWJ
Double Click      Notifies a double left click to WWJ
Right Click       Notifies a single right click to WWJ
Mouse Drag        Notifies a drag to WWJ
7.3 Head Tracking
Once the client has been started, you can activate the head tracking on WorldWind by looking at the center of the camera, so that the library can engage your head and move the WorldWind globe accordingly. If you stay out of the camera's view for longer than the timeout defined inside the htSettings.cfg file, you will be disengaged. If you want, you can enable/disable the head tracking by pressing the H button. The behavior of WWJ is summarized in the following table:
Command                    Description
Head Right                 Moves the globe towards the East
Head Left                  Moves the globe towards the West
Head Up                    Moves the globe towards the North
Head Down                  Moves the globe towards the South
Head Near the Screen       Zooms in
Head Far from the Screen   Zooms out
8. Appendix
8.1 Known Bugs and Limitations
- The library can discover all the WiiMote devices that are near the Bluetooth dongle, but only if they are in connection mode (buttons 1 and 2 pressed together) within a short time after the start of the application (method manager.startTracking());
- Although the library can connect with up to 4 WiiMotes, it can handle just one of them at a time;
- Although the drag movement detected by the library is mapped exactly to the corresponding WorldWind events, the SDK sometimes generates inconsistencies that lead to an incorrect positioning of the globe (flipping and turning). We were not able to solve this problem, nor even to detect it, even after a discussion on the WorldWind forum with the developers of the library.
- The speech recognition accuracy depends strictly on the quality of the microphone and the noise of the environment. This means that each time the application is executed on a different machine, microphone or environment, the configuration file hi.config.xml has to be reconfigured;
- As a general rule for a good speech recognition, it is preferable to speak slowly, especially with multi-word commands (like "turn left"), and to pronounce the command "stop" with a long "o".
- The library does not recognize combinations of buttons. This means that when the user presses two WiiMote buttons together, two different WiiButtonEvents will be generated.
- To allow the definition of the WiiAWTInputHandler and WiiOrbitViewInputBroker classes in a package different from gov.nasa.worldwind.awt, the visibility of the KeyPollTimer and OrbitViewInputStateIterator classes has been changed from protected to public.
- The head tracking client always sets the camera to the maximum resolution. With webcams with a resolution greater than 640x480 pixels and slow CPUs, the detection can be really slow. In this case, it is preferable to set higher values for cvSizeHorizontal and cvSizeVertical inside the htSettings.cfg file.
- If you have multiple webcams, only the first installed one will be used.
8.2 Directory Structure
- .\HumanInteractionLibrary\Docs : directory including this report, the UML diagrams, the specification, the slide show and the javadoc of the library;
- .\HumanInteractionLibrary\FaceRecognition : directory including the Visual Studio project for the head tracking client;
- .\HumanInteractionLibrary\ExecFR : directory including only the compiled executable for the head tracking client;
- .\HumanInteractionLibrary\Project : directory including the Eclipse project of the library;
- .\HumanInteractionLibrary\libraries : directory including all the libraries required for the correct execution and compilation of the project. It also includes HumanInteraction.jar, a compiled version of the project that can be easily integrated with third party projects;
8.3 Adopted Tools
Coding: Borland JBuilder 2007 Enterprise, Eclipse 3.3, Microsoft Visual Studio 2008
Modeling: Microsoft Visio 2007, Microsoft Word 2007, Microsoft PowerPoint 2007