Gesture recognition document

GESTURE RECOGNITION
Abstract:
Gesture recognition is the process of understanding and
interpreting meaningful movements of the hands, arms, face,
or head. It is needed for designing efficient human-computer
interfaces, and it has been studied intensively in recent
years because of its potential for application in user
interfaces. Gesture recognition is one of the main areas of
research for engineers and scientists, because gestures are
a natural means of human-machine interaction. Today the
industry is working on different applications that make
interaction easier, more natural and more convenient without
the user wearing any extra device. Many vendors have
brought gesture recognition solutions to market. The key
vendors in the gesture recognition space include
Cognitec-Systems, eyeSight Mobile Ltd., GestureTek
Technologies Inc., Omek Interactive Ltd., PointGrab Ltd.,
PrimeSense Ltd., and SoftKinetic Inc. Others include Intel
Corp., Qualcomm Inc., and Thalmic Labs Inc.
In this paper we present what gesture recognition technology
is and conceptualize an ideal approach to gesture
recognition.
NIKHIL RAJ

CONTENTS
S.N TITLE
 1 Introduction
 2 Gesture types
 3 Input devices
 4 Algorithms
o 3D model-based algorithms
o Skeletal-based algorithms
o Appearance-based models
 5 Architecture
 6 Application Development
 7 Challenges
o "Gorilla arm"
 8 Market Trend
 9 Factors driving the need for a solution
 10 Upcoming New Technologies
 11 Construction and Working
 12 Advantages and Disadvantages
 13 Conclusion
 14 References

INTRODUCTION
Gesture recognition is a topic in computer science and
language technology
with the goal of interpreting human gestures via
mathematical algorithms. Gestures can originate from any
bodily motion or state but commonly originate from the face
or hand. Current focuses in the field include emotion
recognition from face and hand gesture recognition. Many
approaches have been made using cameras and computer
vision algorithms to interpret sign language. However, the
identification and recognition of posture, gait, proxemics, and
human behaviors are also the subject of gesture recognition
techniques. Gesture recognition can be seen as a way for
computers to begin to understand human body language,
thus building a richer bridge between machines and humans
than primitive text user interfaces or even GUIs (graphical
user interfaces), which still limit the majority of input to
keyboard and mouse.
Gesture recognition enables humans to communicate with
the machine (human-machine interaction, HMI) and interact
naturally without any mechanical devices. Using the concept
of gesture recognition, it is possible to point a finger at
the computer screen so that the cursor will move accordingly.
This could potentially make conventional input devices such
as mice, keyboards and even touch-screens redundant. Gesture
recognition can be conducted with techniques from computer
vision and image processing.
The literature includes ongoing work in the computer vision
field on capturing gestures or more general human pose and
movements by cameras connected to a computer.

Gesture recognition and pen computing:
Pen computing reduces the hardware impact of a system
and also increases the range of physical world objects
usable for control beyond traditional digital objects like
keyboards and mice. Such implements can enable a new
range of hardware not requiring monitors. This idea may lead
to the creation of holographic displays. The term gesture
recognition has been used to refer more narrowly to
non-text-input handwriting symbols, such as inking on a
graphics tablet, multi-touch gestures, and mouse gesture
recognition.
This is computer interaction through the drawing of symbols
with a pointing device cursor.
Gesture types
In computer interfaces, two types of gestures are
distinguished. Online gestures can be regarded as direct
manipulations such as scaling and rotating, and are
processed while the interaction is in progress. In contrast,
offline gestures are usually processed after the interaction
is finished; e.g. a circle is drawn to activate a context
menu.
• Offline gestures: gestures that are processed after the
user's interaction with the object. An example is the
gesture to activate a menu.
• Online gestures: direct manipulation gestures. They
are used to scale or rotate a tangible object.
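The difference between the two types can be sketched in code.
The following minimal Python sketch is illustrative only: the
class names OnlineScaler and OfflineStrokeRecognizer, and the
crude "stroke ends near its start" rule standing in for circle
recognition, are assumptions and do not come from any
particular toolkit. The online gesture updates its result on
every input sample, while the offline gesture only buffers
samples and interprets them once the interaction has ended.

```python
import math


class OnlineScaler:
    """Online gesture: every new sample immediately updates the scale factor."""

    def __init__(self, start_distance):
        self.start_distance = start_distance
        self.scale = 1.0

    def on_sample(self, finger_distance):
        # Direct manipulation: the object is rescaled while the fingers move.
        self.scale = finger_distance / self.start_distance
        return self.scale


class OfflineStrokeRecognizer:
    """Offline gesture: points are only buffered until the stroke is finished."""

    def __init__(self, close_tolerance=0.2):
        self.points = []
        self.close_tolerance = close_tolerance  # assumed threshold

    def on_sample(self, x, y):
        self.points.append((x, y))  # no interpretation yet

    def on_finish(self):
        # Hypothetical rule: a stroke that ends near where it started is
        # treated as a "circle" and opens the context menu.
        if len(self.points) < 3:
            return None
        path = sum(math.dist(a, b) for a, b in zip(self.points, self.points[1:]))
        gap = math.dist(self.points[0], self.points[-1])
        if path > 0 and gap / path < self.close_tolerance:
            return "context_menu"
        return None
```

In this sketch the application would call on_sample for every
input event in both cases, but only the offline recognizer
needs an explicit on_finish call before it reports anything.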
Input devices
The ability to track a person's movements and determine
what gestures they may be performing can be achieved
through various tools. Although a large amount of research
has been done on image- and video-based gesture recognition,
the tools and environments used vary between
implementations.
• Wired gloves. These can provide input to the computer
about the position and rotation of the hands using
magnetic or inertial tracking devices. Furthermore,
some gloves can detect finger bending with a high
degree of accuracy (5-10 degrees), or even provide
haptic feedback to the user, which is a simulation of the
sense of touch. The first commercially available
hand-tracking glove was the DataGlove, which could
detect hand position, movement and finger bending. It
uses fiber optic cables running down the back of the
hand. Light pulses are created and, when the fingers are
bent, light leaks through small cracks and the loss is
registered, giving an approximation of the hand pose.
• Depth-aware cameras. Using specialized cameras
such as structured light or time-of-flight cameras, one
can generate a depth map of what is being seen
through the camera at short range, and use this data
to approximate a 3D representation of the scene.
These can be effective for detection of hand
gestures due to their short-range capabilities.
• Stereo cameras. Using two cameras whose relations
to one another are known, a 3D representation can be
approximated from the output of the cameras. To get the
cameras' relations, one can use a positioning reference
such as a lexian-stripe or infrared emitters. In
combination with direct motion measurement
(6D-Vision), gestures can be detected directly.
• Controller-based gestures. These controllers act as
an extension of the body so that when gestures are
performed, some of their motion can be conveniently
captured by software. Mouse gestures are one such
example, where the motion of the mouse is correlated
to a symbol being drawn by a person's hand, as are the
Wii Remote and the Myo, which can study changes in
acceleration over time to represent gestures. Devices
such as the LG Electronics Magic Wand, the Loop and
the Scoop use Hillcrest Labs' Freespace technology,
which uses MEMS accelerometers, gyroscopes and
other sensors to translate gestures into cursor
movement. The software also compensates for human
tremor and inadvertent movement. AudioCubes are
another example. The sensors of these smart
light-emitting cubes can be used to sense hands and
fingers as well as other objects nearby, and can be used
to process data. Most applications are in music and sound
synthesis, but they can be applied to other fields.
• Single camera. A standard 2D camera can be used for
gesture recognition where the resources or environment
would not be convenient for other forms of image-based
recognition. It was earlier thought that a single camera
may not be as effective as stereo or depth-aware
cameras, but some companies are challenging this
view. Software-based gesture recognition technology
that uses a standard 2D camera to robustly detect hand
gestures and hand signs, and to track hands or fingertips
at high accuracy, has already been embedded in
Lenovo’s Yoga ultrabooks, Pantech’s Vega LTE
smartphones and Hisense’s Smart TV models, among other
devices.
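For the stereo camera case listed above, the way a 3D position
follows from two views can be illustrated with the standard
pinhole triangulation relation Z = f * B / d for a rectified
camera pair, where f is the focal length in pixels, B the
distance between the cameras and d the disparity of the point
between the two images. This is a minimal sketch, not taken
from any of the products named above; the function name and
the numeric values are assumptions chosen for illustration.

```python
def depth_from_disparity(x_left, x_right, focal_px, baseline_m):
    """Depth (metres) of a point seen by two rectified, side-by-side cameras.

    x_left / x_right: horizontal pixel coordinate of the same point
    (for example a fingertip) in the left and the right image.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("point must have positive disparity")
    return focal_px * baseline_m / disparity


# Assumed example values: 700 px focal length, 6 cm between the cameras.
z = depth_from_disparity(x_left=420.0, x_right=350.0, focal_px=700.0, baseline_m=0.06)
print(f"estimated depth: {z:.2f} m")
```

Because the disparity grows as the point comes closer, the
same one-pixel matching error costs far less depth accuracy
at short range than at long range.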

Algorithms
Different ways of tracking and analyzing gestures exist, and
a basic layout is given in the diagram above. For
example, volumetric models convey the necessary
information required for an elaborate analysis; however, they
are very intensive in terms of computational power
and require further technological developments in order to be
implemented for real-time analysis. On the other hand,
appearance-based models are easier to process but usually
lack the generality required for Human-Computer Interaction.
Depending on the type of the input data, a gesture can be
interpreted in different ways. However, most of the
techniques rely on key points represented in a 3D
coordinate system. Based on the relative motion of these,
the gesture can be detected with high accuracy, depending
on the quality of the input and the algorithm’s approach.
In order to interpret movements of the
body, one has to classify them according to common
properties and the message the movements may express.
For example, in sign language each gesture represents a
word or phrase. A taxonomy that seems very appropriate
for Human-Computer Interaction has been proposed by
Quek in "Toward a Vision-Based Hand Gesture Interface". He
presents several interactive gesture systems in order to
capture the whole space of the gestures:
1. Manipulative;
2. Semaphoric;
3. Conversational.
Some literature differentiates two approaches to
gesture recognition: 3D model-based and
appearance-based. The former makes use of 3D information
about key elements of the body parts in order to obtain
several important parameters, like palm position or joint
angles. On the other hand, appearance-based systems use
images or videos for direct interpretation.
A real hand (left) is interpreted as a collection of vertices and
lines in the 3D mesh version (right), and the software uses
their relative position and interaction in order to infer the
gesture.

3D model-based algorithms: The 3D model approach can
use volumetric or skeletal models, or even a combination of
the two. Volumetric approaches have been heavily used in
the computer animation industry and for computer vision
purposes. The models are generally created from complicated
3D surfaces, like NURBS or polygon meshes. The drawback
of this method is that it is very computationally intensive,
and systems for live analysis are still to be developed. For
the moment, a more interesting approach would be to map
simple primitive objects to the person’s most important body
parts (for example cylinders for the arms and neck, a sphere
for the head) and analyse the way these interact with each
other. Furthermore, some abstract structures like
super-quadrics and generalised cylinders may be even more
suitable for approximating the body parts. The attraction of
this approach is that the parameters for these objects
are quite simple. In order to better model the relation
between these, we make use of constraints and hierarchies
between our objects.
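A minimal sketch of such a primitive-based body model is
given below. It is only an illustration under stated
assumptions: the Primitive class, its fields and all
dimensions are hypothetical, rotations are left out, and the
only constraint modelled is that each part is attached to its
parent at a fixed offset.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Primitive:
    name: str
    shape: str                       # "sphere" or "cylinder"
    radius: float                    # metres
    length: float = 0.0              # metres, unused for spheres
    offset: Vec3 = (0.0, 0.0, 0.0)   # attachment point relative to the parent
    parent: Optional["Primitive"] = None

    def world_position(self) -> Vec3:
        """Walk up the hierarchy and add the offsets of all ancestors."""
        x, y, z = self.offset
        node = self.parent
        while node is not None:
            x, y, z = x + node.offset[0], y + node.offset[1], z + node.offset[2]
            node = node.parent
        return (x, y, z)


# Hypothetical upper-body model: a torso cylinder with neck, head and one arm.
torso = Primitive("torso", "cylinder", radius=0.15, length=0.6, offset=(0.0, 1.0, 0.0))
neck = Primitive("neck", "cylinder", radius=0.05, length=0.1,
                 offset=(0.0, 0.6, 0.0), parent=torso)
head = Primitive("head", "sphere", radius=0.1, offset=(0.0, 0.2, 0.0), parent=neck)
arm = Primitive("right_arm", "cylinder", radius=0.04, length=0.6,
                offset=(0.2, 0.55, 0.0), parent=torso)
```

Each part is described by two or three numbers, and moving
the torso moves every attached part with it, which is the
kind of simplicity and hierarchy the paragraph above refers
to.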
The skeletal version (right) is effectively modelling the hand
(left). This has fewer parameters than the volumetric version
and it's easier to compute, making it suitable for real-time
gesture analysis systems.
Skeletal-based algorithms: Instead of using intensive
processing of the 3D models and dealing with a lot of
parameters, one can just use a simplified version of joint
angle parameters along with segment lengths. This is known
as a skeletal representation of the body, where a virtual
skeleton of the person is computed and parts of the body are
mapped to certain segments. The analysis here is done
using the position and orientation of these segments and the
relation between each one of them (for example the angle
between the joints and the relative position or orientation).
Advantages of using skeletal models:
• Algorithms are faster because only key parameters are
analyzed.
• Pattern matching against a template database is
possible.
• Using key points allows the detection program to focus
on the significant parts of the body.
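As an illustration of how few parameters a skeletal model
needs, the sketch below computes the angle at one joint from
three 3D key points and uses it to label a finger as bent or
extended. The function names and the 120-degree threshold are
assumptions made for this example, not values from the
literature cited here.

```python
import math


def joint_angle(a, b, c):
    """Angle in degrees at joint b, formed by segments b->a and b->c (3D points)."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(p * q for p, q in zip(v1, v2))
    n1 = math.sqrt(sum(p * p for p in v1))
    n2 = math.sqrt(sum(q * q for q in v2))
    if n1 == 0 or n2 == 0:
        raise ValueError("segment of zero length")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.degrees(math.acos(cos_angle))


def finger_state(knuckle, middle_joint, tip, bent_below_deg=120.0):
    """Label a finger as 'bent' or 'extended' from a single joint angle."""
    if joint_angle(knuckle, middle_joint, tip) < bent_below_deg:
        return "bent"
    return "extended"
```

A set of such labels, one per finger, could then be matched
against a template database of known hand poses.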
These binary silhouette (left) or contour (right) images
represent typical input for appearance-based algorithms.
They are compared with different hand templates and, if they
match, the corresponding gesture is inferred.

Appearance-based models
These models no longer use a spatial representation of the
body, because they derive the parameters directly from
the images or videos using a template database. Some are
based on deformable 2D templates of parts of the human
body, particularly hands. Deformable templates are sets
of points on the outline of an object, used as interpolation
nodes for the object’s outline approximation. One of the
simplest interpolation functions is linear, which computes an
average shape from point sets, point variability parameters
and external deformators. These template-based models are
mostly used for hand-tracking, but could also be of use for
simple gesture classification.
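A rough illustration of template-based classification is
sketched below. It is a deliberately simplified stand-in:
rigid binary silhouettes are compared by
intersection-over-union rather than by deformable templates,
and the function names, the toy 4x4 templates and the 0.6
acceptance threshold are all assumptions made for this
example.

```python
def overlap_score(mask, template):
    """Intersection-over-union of two equally sized binary silhouettes."""
    inter = union = 0
    for row_m, row_t in zip(mask, template):
        for m, t in zip(row_m, row_t):
            inter += 1 if (m and t) else 0
            union += 1 if (m or t) else 0
    return inter / union if union else 0.0


def classify(mask, templates, min_score=0.6):
    """Return the label of the best-matching template, or None if none is close."""
    best_label, best_score = None, 0.0
    for label, template in templates.items():
        score = overlap_score(mask, template)
        if score > best_score:
            best_label, best_score = label, score
    return best_label if best_score >= min_score else None


# Toy 4x4 silhouettes (1 = hand pixel); real templates would be far larger.
TEMPLATES = {
    "fist": [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 0], [0, 0, 0, 0]],
    "flat_hand": [[1, 1, 1, 1], [1, 1, 1, 1], [0, 0, 0, 0], [0, 0, 0, 0]],
}
observed = [[0, 0, 0, 0], [0, 1, 1, 0], [0, 1, 1, 1], [0, 0, 0, 0]]
print(classify(observed, TEMPLATES))  # prints "fist"
```

The observed silhouette differs from the "fist" template by
one pixel, so it still matches; an input that resembles no
template closely enough is rejected instead of being forced
into a class.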
A second approach to gesture detection using
appearance-based models uses image sequences as gesture
templates. Parameters for this method are either the images
themselves, or certain features derived from these. Most of
the time, only one (monoscopic) or two (stereoscopic)
views are used.

Architecture
The solution should act as middleware that forms the glue
between platform and application. A sensor collects raw
data and sends it to the solution. The solution receives the
raw data from the sensor, processes it and sends high-level
data to the application. Various vision-based gesture
applications for markets such as TV, automobile, healthcare
and many others can be built on top of the solution.
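The layering described above can be sketched as follows. This
is a minimal sketch under stated assumptions, not the design
of any existing product: the names Sensor, FakeCamera and
GestureMiddleware, the swipe rule and the 50-pixel threshold
are all hypothetical. The point it illustrates is that the
application only sees high-level events, so the sensor behind
the middleware can be exchanged without touching the
application.

```python
from abc import ABC, abstractmethod


class Sensor(ABC):
    """Any device that can deliver raw data (camera, depth sensor, ...)."""

    @abstractmethod
    def read(self):
        """Return one raw sample; here a list of (x, y) hand positions."""


class FakeCamera(Sensor):
    """Stand-in sensor that replays a recorded hand track."""

    def __init__(self, track):
        self._track = track

    def read(self):
        return self._track


class GestureMiddleware:
    """Sits between sensor and application and emits high-level events."""

    def __init__(self, sensor, min_travel=50):
        self.sensor = sensor
        self.min_travel = min_travel  # assumed threshold in pixels
        self.listeners = []

    def subscribe(self, callback):
        self.listeners.append(callback)

    def process(self):
        track = self.sensor.read()
        if len(track) < 2:
            return None
        dx = track[-1][0] - track[0][0]
        if abs(dx) < self.min_travel:
            return None
        event = "swipe_right" if dx > 0 else "swipe_left"
        for callback in self.listeners:
            callback(event)
        return event


# Application side: a hypothetical infotainment app collecting events.
received = []
middleware = GestureMiddleware(FakeCamera([(10, 100), (60, 102), (140, 99)]))
middleware.subscribe(received.append)
middleware.process()
```

Supporting a different camera in this sketch means writing
another Sensor subclass; the application code that subscribes
to events stays unchanged.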

Application Development
The effectiveness of the solution is judged by its usage.
The aim of the solution will be to allow developers to easily
create a variety of natural gesture control applications
for areas such as games, consumer electronics, and car
infotainment.
Let us consider car infotainment as an example application
to illustrate how this ideal solution would be used.
Automobile manufacturers benefit from gesture recognition
because the technology adds value to their offerings.
Intuitive car infotainment solutions enable the user to
explore maps, toggle menus and change radio stations using
simple gesture control.

Challenges
There are many challenges associated with the accuracy
and usefulness of gesture recognition software. For image-
based gesture recognition there are limitations on the
equipment used and image noise. Images or video may not
be under consistent lighting, or in the same location. Items in
the background or distinct features of the users may make
recognition more difficult.
The variety of implementations for image-based gesture
recognition may also cause issues for the viability of the
technology in general usage. For example, an algorithm
calibrated for one camera may not work for a different
camera. The amount of background noise also causes
tracking and recognition difficulties, especially when
occlusions (partial and full) occur. Furthermore, the distance
from the camera, and the camera's resolution and quality,
also cause variations in recognition accuracy.
In order to capture human gestures by visual sensors, robust
computer vision methods are also required, for example for
hand tracking and hand posture recognition or for capturing
movements of the head, facial expressions or gaze direction.
"Gorilla arm": "Gorilla arm" was a side-effect of vertically
oriented touch-screen or light-pen use. In periods of
prolonged use, users' arms began to feel fatigue and/or
discomfort. This effect contributed to the decline of
touch-screen input despite initial popularity in the 1980s.
In order to measure arm fatigue and the gorilla arm side
effect, researchers developed a technique called Consumed
Endurance.

Market Trend
The market is changing rapidly due to evolving technology,
and more and more OEMs are moving towards adoption of
gesture recognition technology. As per a report published
by MarketsandMarkets, the gesture recognition market is
estimated to grow at a healthy CAGR from 2013 to 2018 and
is expected to cross $15.02 billion by the end of these five
years. Analysts forecast the global gesture recognition
market to grow at a CAGR of 29.2 percent over the period
2013-2018.
In terms of industry, consumer electronics applications
currently contribute more than 99% of the global gesture
recognition market. As per the report, healthcare
applications are expected to emerge as a significant market
for gesture recognition technologies over the next five
years. Automotive applications of gesture recognition are
expected to be commercialized in 2015.
Factors driving the need for a Solution
Top vendors provide many solutions that cater to the
business need for gesture recognition, but each solution
comes with one or more shortcomings. The solutions available
are closely coupled with the underlying hardware and allow
almost no flexibility for use with any other hardware. This
is one of the major factors that make it impossible to port
a gesture recognition application developed on one platform
to another.
Solutions available in the market today come with a list of
pre-defined cameras. The camera interface is either
integrated into the solution or is supported by the solution.
Gesture recognition applications built using these solutions
are bound to use the specified set of cameras. The
applications are highly dependent on camera capability, and
application development is limited by the set of features
the camera offers.
There could be a situation where the hardware (camera) is
already in place but the solution required to build the
gesture recognition application does not support that
camera, which results in extra hardware cost. Today’s
solutions are highly dependent on the camera and do not
allow the flexibility to use any other camera.
Gesture recognition is spreading its wings across various
industries such as automobile, healthcare and media.
Solutions available today target specific market areas. The
leading vendors offer solutions which cater to the needs of
specific OEMs. If a solution is designed for developing
gesture recognition applications for automobiles, it will
have an SDK designed specifically for automobile gesture
recognition applications. If a solution targets healthcare,
it will have gesture detectors specifically for developing
healthcare gesture recognition applications.

Upcoming New Technologies
The SixthSense Device:
SixthSense is a wearable gestural interface device
developed by Pranav Mistry, a PhD student in the Fluid
Interfaces Group at the MIT Media Lab. It is similar to
Telepointer, a neckworn projector/camera system developed
by Media Lab student Steve Mann (which Mann originally
referred to as "Synthetic Synesthesia of the Sixth Sense").
The SixthSense prototype comprises a pocket projector,
a mirror and a camera. The hardware components are
coupled in a pendant-like mobile wearable device. Both the
projector and the camera are connected to the mobile
computing device in the user's pocket. The projector projects
visual information, enabling surfaces, walls and physical
objects around us to be used as interfaces, while the camera
recognizes and tracks the user's hand gestures and physical
objects using computer-vision based techniques. The
software program processes the video stream data captured
by the camera and tracks the locations of the colored
markers (visual tracking fiducials) at the tips of the user's
fingers using simple computer-vision techniques. The
movements and arrangements of these fiducials are
interpreted into gestures that act as interaction instructions
for the projected application interfaces. The maximum
number of tracked fingers is only constrained by the number
of unique fiducials, thus SixthSense also supports
multi-touch and multi-user interaction. The SixthSense
prototype implements several applications that demonstrate
the usefulness, viability and flexibility of the system. The
map application lets the user navigate a map displayed on a
nearby surface using hand gestures, similar to gestures
supported by multi-touch based systems, letting the user
zoom in, zoom out or pan using intuitive hand movements.
The drawing application lets the user draw on any surface by
tracking the fingertip movements of the user's index finger.
SixthSense also recognizes the user's freehand gestures
(postures). For example, the SixthSense system implements
a gestural camera that takes photos of the scene the user is
looking at by detecting the 'framing' gesture. The user can
stop by any surface or wall and flick through the photos
he/she has taken. SixthSense also lets the user draw icons
or symbols in the air using the movement of the index finger
and recognizes those symbols as interaction instructions. For
example, drawing a magnifying glass symbol takes the user
to the map application, and drawing an '@' symbol lets the
user check his or her mail. The SixthSense system also
augments physical objects the user is interacting with by
projecting more information about these objects onto them.
Construction and Working: -
The SixthSense prototype comprises a pocket projector, a
mirror and a camera contained in a pendant-like wearable
device. Both the projector and the camera are connected to
a mobile computing device in the user's pocket. The projector
projects visual information, enabling surfaces, walls and
physical objects around us to be used as interfaces, while
the camera recognizes and tracks the user's hand gestures and
physical objects using computer-vision based techniques.
The software program processes the video stream data
captured by the camera and tracks the locations of the
colored markers (visual tracking fiducials) at the tips of the
user's fingers. The movements and arrangements of these
fiducials are interpreted into gestures that act as interaction
instructions for the projected application interfaces.
SixthSense supports multi-touch and multi-user interaction.
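The marker-tracking step can be illustrated with a small
sketch. It does not reproduce the actual SixthSense software,
which is not described here in that detail; the function
names, the color threshold for a red marker and the toy frame
are assumptions. The sketch locates one colored fiducial as
the centroid of all matching pixels and derives a zoom factor
from the changing distance between two fiducials.

```python
def marker_centroid(frame, is_marker_color):
    """Centroid (x, y) of all pixels matching a marker color, or None if absent.

    frame: 2D list of (r, g, b) tuples; is_marker_color: predicate on one pixel.
    """
    xs = ys = count = 0
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            if is_marker_color(pixel):
                xs, ys, count = xs + x, ys + y, count + 1
    return (xs / count, ys / count) if count else None


def is_red(pixel):
    # Assumed threshold for a red fingertip marker.
    r, g, b = pixel
    return r > 200 and g < 80 and b < 80


def pinch_zoom(prev_a, prev_b, cur_a, cur_b):
    """Zoom factor implied by two fiducials moving apart (>1) or together (<1)."""
    def dist(p, q):
        return ((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2) ** 0.5

    before = dist(prev_a, prev_b)
    if before == 0:
        raise ValueError("markers coincide")
    return dist(cur_a, cur_b) / before


BLACK, RED = (0, 0, 0), (255, 0, 0)
frame = [
    [BLACK, BLACK, BLACK, BLACK],
    [BLACK, RED, RED, BLACK],
    [BLACK, RED, RED, BLACK],
]
print(marker_centroid(frame, is_red))  # prints (1.5, 1.5)
```

Giving every finger a differently colored marker, each with
its own predicate, is what would let several fingers or users
be tracked at once in a scheme like this.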

Advantages
• Replaces large input devices.
• Also useful for people with physical disabilities.
• It provides a simple, usable and interesting user
interface and satisfies the need for more freedom in a
human-computer interaction environment.
• It is considered a powerful tool for computers to
begin to understand human body language.
• It is widely used in various application areas since it
gives the user a new kind of experience.
Disadvantages
• Very costly.

Conclusion
According to the MarketsandMarkets analysis, the gesture
recognition market is set to grow strongly, which presents a
significant opportunity in this technology. End users are
moving towards a whole new mode of human-machine
interaction, and this is creating demand for gesture
recognition in every facet of the market.
The solution we are proposing aims to take a significant
place in the gesture recognition market. Using the solution,
gesture recognition applications for various industries can
be developed easily and quickly. The solution will also aim
to develop business tie-ups with major OEMs.

References
• http://www.marketsandmarkets.com/Market-Reports/touchless-sensing-gesturing-market-369.html
• http://www.marketsandmarkets.com/Market-Reports/europe-gesture-recognition-and-touchless-sensing-market-1149.html
• http://commons.wikimedia.org/wiki/File:Thumbs_up.jpg
• http://www.nextpowerup.com/news/499/gm-announces-sdk-for-in-car-infotainment-system.html
• http://sixrevisions.com/user-interface/the-future-of-user-interfaces/
• http://www.biometricupdate.com/201303/eye-tracking-and-gesture-will-control-future-mobile-devices