2. Gesture
• What is a gesture?
• An action intended to communicate feelings or intentions
• What is “Gesture Detection” or “Gesture Recognition”?
• Computer’s ability to understand human gestures as input
• First used in 1963 with pen-based input device
• What is it used for?
• Mouse movements, Handwriting recognition, Sign language recognition,
Touch screen input, Kinect
KINECT Programming
3. Interaction metaphors
• Depend on the task
• Important aspect in design of UI
Cursors (hand tracking): Target an object
Avatars (body tracking): Interaction with virtual space
4. The shadow/mirror effect
Shadow Effect:
• I see the back of my avatar
• Problems with Z movements
Mirror Effect:
• I see the front of my avatar
• Problem with mapping left/right movements
23. Heuristics
• Experience-based techniques for problem solving, learning, and
discovery
• Cost effective
• Helps reconstruct missing information
• Helps compute outcome of a gesture
[Figure: Cost versus Gesture Complexity, with curves for Heuristics and Machine Learning]
24. Define What Constitutes a Gesture
• Some players have more energy (or enthusiasm) than
others
• Some players will “optimize” their gestures
• Most players will not perform the gesture precisely as
intended
25. Select the Right Triggers
• Use skeleton view to analyze whole skeleton behavior
• Use joint view to isolate and analyze specific joints and
axis behavior
• Use data sheet view to get the real numbers
• Not all joints are needed
• Player location in the play area can cause some joints to
become occluded
26. Define Key Stages of a Gesture
• Determine
• When the gesture begins
• When the gesture ends
• Determine other key stages
• Changes in motion direction
• Pauses
• …
• You could simply signal that the gesture has been completed, or
• You could track progress, or
• You could use distinct states
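The "distinct states" and "progress" options can be combined in a small state machine. The following is a minimal sketch, not a definitive implementation: the "raise hand, then lower it" gesture, the class name `RaiseHandGesture`, and the joint inputs `hand_y` and `head_y` are all hypothetical and chosen only for illustration.

```python
from enum import Enum


class Stage(Enum):
    IDLE = 0       # waiting for the gesture to begin
    RAISED = 1     # key stage: hand has moved above the head
    COMPLETED = 2  # hand has come back down, gesture finished


class RaiseHandGesture:
    """Tracks a hypothetical 'raise hand, then lower it' gesture."""

    def __init__(self):
        self.stage = Stage.IDLE

    def update(self, hand_y, head_y):
        """Feed one frame of joint heights (Y up) and return the current stage."""
        if self.stage == Stage.IDLE and hand_y > head_y:
            self.stage = Stage.RAISED      # the gesture begins
        elif self.stage == Stage.RAISED and hand_y < head_y:
            self.stage = Stage.COMPLETED   # the gesture ends
        return self.stage

    @property
    def progress(self):
        """Fraction of key stages passed, from 0.0 to 1.0."""
        return self.stage.value / Stage.COMPLETED.value
```

A caller that only needs a completion signal can test for `Stage.COMPLETED`, while one that drives an animation can read `progress` every frame.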
27. Determine the Type of Outcome
Definite gesture:
• Contact or release point
• Direction
• Initial velocity
Continuous gesture:
• Frequency
• Amplitude
28. Run a Detection Filter Only When Necessary
• Define clear context for when a gesture is expected
• Provide clear feedback to the player
• Run the gesture filter when the context warrants it
• Cancel the gesture if context changes
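One way to read these four points is as a thin wrapper around the detector. The sketch below is an illustration under assumptions: `GatedGestureFilter` and the `detect` callable (which receives the frames buffered so far and returns a bool) are hypothetical names, not part of any Kinect API.

```python
class GatedGestureFilter:
    """Runs a detector only while the game context expects the gesture."""

    def __init__(self, detect):
        self.detect = detect    # hypothetical callable: list of frames -> bool
        self.expected = False   # does the current context expect the gesture?
        self.frames = []        # frames buffered for the gesture in progress

    def set_context(self, expected):
        """Enable or disable the filter; leaving the context cancels the gesture."""
        if not expected:
            self.frames.clear()  # cancel any partially observed gesture
        self.expected = expected

    def update(self, frame):
        """Return True if the gesture is detected; do no work out of context."""
        if not self.expected:
            return False
        self.frames.append(frame)
        return self.detect(self.frames)
```

The game would call `set_context(True)` when it prompts the player (the feedback step) and `set_context(False)` as soon as the prompt no longer applies.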
29. Causes of Missing Information
• Self Occlusion
• Side poses
• Player’s position in play space
• Obstacles
• Other players
• Furniture
• Outside the camera’s field of view
• Left or right (easy to fix)
• Top or bottom (hard to avoid)
40. Pros & Cons
PROs
• Easy to understand
• Easy to implement (for simple gestures)
• Easy to debug
CONs
• Challenging to choose best values for parameters
• Doesn’t scale well for variants of same gesture
• Gets challenging for complex gestures
• Challenging to compensate for latency
Recommendation
Use for simple gestures
• Hand wave
• Head movement
47. Network Definition for Detector
• Similar to perceptron
• Normalize using weights
• Use probabilities, not Booleans
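[Diagram: inputs $P_1, P_2, \dots, P_n$ with weights $w_1, w_2, \dots, w_n$ feed a single node]
$$P = \frac{\sum_{i=1}^{n} w_i P_i}{\sum_{i=1}^{n} w_i}$$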
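As a rough sketch of "normalize using weights", a node can compute the weighted sum of its input probabilities divided by the sum of the weights, so the output stays between 0 and 1 when the inputs do and the weights are non-negative. The function name `weighted_node` is an assumption made for this example.

```python
def weighted_node(probabilities, weights):
    """Return the weight-normalized sum of the input probabilities."""
    if len(probabilities) != len(weights):
        raise ValueError("need exactly one weight per input")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(w * p for w, p in zip(weights, probabilities)) / total
```

For example, inputs `[1.0, 0.0]` with weights `[0.3, 0.1]` give 0.3 / 0.4, that is 0.75.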
48. Surely This Will Suffice?
[Diagram: weighted inputs HeadAboveBaseLine (0.3), LeftKneeAboveBaseLine (0.1), RightKneeAboveBaseLine (0.1) and LegsStraightPreviouslyBent (0.5) feed a node labeled 0.8 that outputs Jump?]
• But due to noise, still many false positives
• How can we reduce false positives?
49. And We’re Done!
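[Diagram: weighted inputs HeadAboveBaseLine (0.3), LeftKneeAboveBaseLine (0.1), RightKneeAboveBaseLine (0.1) and LegsStraightPreviouslyBent (0.5) feed a node labeled 0.8. A second branch takes the OR of HeadBelowBaseLine, LeftKneeBelowBaseLine, RightKneeBelowBaseLine, LeftAnkleBelowBaseLine, RightAnkleBelowBaseLine and BodyFaceUpwards (each with weight 1) and passes it through a NOT node (weight -1, labeled 0). An AND node (weights 1 and 1, labeled 2) combines both branches and outputs Jump?]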
50. But Wait, If We Know For Sure…
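[Diagram: weighted inputs HeadAboveBaseLine (0.3), LeftKneeAboveBaseLine (0.1), RightKneeAboveBaseLine (0.1) and LegsStraightPreviouslyBent (0.5) feed a node labeled 0.8. A second branch takes the OR of HeadBelowBaseLine, LeftKneeBelowBaseLine, RightKneeBelowBaseLine, LeftAnkleBelowBaseLine, RightAnkleBelowBaseLine and BodyFaceUpwards (each with weight 1) and passes it through a NOT node (weight -1, labeled 0). An AND node (weights 1 and 1, labeled 2) combines both branches. A final OR node (weights 1 and 1) combines the AND output with a new input, HeadFarAboveBaseLine, and outputs Jump?]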
51. Implementation Overview
• Update height baseline values
• Update input nodes, i.e. algorithmic gestures
• Evaluate each node in network
• Calculate probability of gesture
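The last two steps can be sketched for the jump detector as follows. This is a simplified illustration, not the presenters' code: the node names and the weights 0.3, 0.1, 0.1 and 0.5 come from the jump slides, but modelling OR as `max`, AND as `min` and NOT as `1 - p`, and treating 0.8 as the final decision threshold, are assumptions made here. The HeadFarAboveBaseLine shortcut is left out for brevity.

```python
# Weights taken from the jump detector slides.
RISING_WEIGHTS = {
    "HeadAboveBaseLine": 0.3,
    "LeftKneeAboveBaseLine": 0.1,
    "RightKneeAboveBaseLine": 0.1,
    "LegsStraightPreviouslyBent": 0.5,
}

# Any of these inputs argues against a jump (the OR -> NOT branch).
VETO_NODES = [
    "HeadBelowBaseLine",
    "LeftKneeBelowBaseLine",
    "RightKneeBelowBaseLine",
    "LeftAnkleBelowBaseLine",
    "RightAnkleBelowBaseLine",
    "BodyFaceUpwards",
]

JUMP_THRESHOLD = 0.8  # assumed role of the 0.8 label in the diagram


def jump_probability(inputs):
    """Evaluate the network on a dict of input-node probabilities in [0, 1]."""
    total = sum(RISING_WEIGHTS.values())
    rising = sum(w * inputs[name] for name, w in RISING_WEIGHTS.items()) / total
    vetoed = max(inputs[name] for name in VETO_NODES)  # soft OR (assumption)
    return min(rising, 1.0 - vetoed)                   # soft AND with NOT


def is_jump(inputs):
    """Turn the gesture probability into a yes/no decision."""
    return jump_probability(inputs) >= JUMP_THRESHOLD
```

The input dict would be refreshed every frame by the algorithmic gestures, after the height baselines have been updated.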
52. Pros
• Neural networks are well understood
• Introduced in the 1940s
• A learning algorithm can be used to find optimal parameters, weights, and thresholds
• Complex gestures can be detected
• Scale well for variants of same gesture
• Nodes can be reused in different gestures
• Easy to visualize as node graph
• Good CPU performance
• 0.095 ms to execute Jump Detector
53. Cons
• Lots of parameters, weights, and thresholds
• Small changes can dramatically change the results
• Very time consuming to choose manually
• Not easy to debug
• Is the code wrong, or are the parameters not optimal?
• Challenging to compensate for latency
54. Recommendation
• Use for more complex gestures
• Jump, duck, punch
• Break complex gestures into collection of simple
gestures
• Use learning algorithm
• Debug visualization is essential
56. Gesture Definition
• Define gesture as pre-recorded animations
• Motion capture animations
• Record different people doing same gesture
• Each person doing same gesture multiple times
57. Exemplar
• Definition: ideal example to compare against
• Pre-recorded animations are exemplars
58. Exemplar Matching
• Need to compare skeleton frames
• Define error metric for skeleton
• Angular difference for each joint in local space
• Peak Signal to Noise Ratio for whole skeleton
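$$\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \mathrm{Distance}_i^2$$
$$\mathrm{PSNR} = 10 \cdot \log_{10}\left(\frac{\mathrm{MAX}^2}{\mathrm{MSE}}\right)$$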
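The error metric can be sketched in a few lines. In this example `distances` is assumed to hold one per-joint distance (for instance an angular difference) between the live skeleton frame and an exemplar frame, and `max_value` is the largest distance considered possible; both names are illustrative.

```python
import math


def psnr(distances, max_value):
    """Peak signal-to-noise ratio of a skeleton frame against an exemplar frame."""
    if not distances:
        raise ValueError("need at least one joint distance")
    mse = sum(d * d for d in distances) / len(distances)
    if mse == 0:
        return math.inf  # identical frames: infinitely strong signal
    return 10 * math.log10(max_value ** 2 / mse)
```

A higher PSNR means a closer match, which is why the best matching frame is the one with the strongest signal.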
59. Exemplar Matching
• Search for best matching frames
• Best matching frame has strongest signal
• Different classifiers can be used
• K-Nearest
• Dynamic Time Warping (DTW)
• Hidden Markov Models (HMM)
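Of the classifiers listed, Dynamic Time Warping is compact enough to sketch. The version below is the textbook dynamic-programming formulation on 1-D sequences, shown only for illustration; a real detector would compare skeleton frames with a skeleton error metric passed in as `dist`. The helper `classify` and the `(label, sequence)` exemplar layout are assumptions made for this example.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Minimum cumulative distance between two sequences under time warping."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b holds
                                 cost[i][j - 1],      # b advances, a holds
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def classify(sequence, exemplars):
    """Return the label of the exemplar with the smallest DTW distance."""
    label, _ = min(exemplars, key=lambda e: dtw_distance(sequence, e[1]))
    return label
```

Because the alignment may hold one sequence while the other advances, the same gesture performed at half speed still matches its exemplar with zero extra cost.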
61. Pros
• Works well for context-sensitive gesture detection
• Works well for animation blending
• Very complex gestures can be detected
• DTW allows for different speeds
• Can compensate for latency
• Can scale for variants of same gesture
• Just need more resources
• Easy to visualize exemplar matching
62. Cons
• Requires lots of resources to be robust
• Multiple recordings of multiple people for one
gesture
• i.e. requires lots of CPU and memory
• K-Nearest
• 1.5 ms for 16 exemplar matches
• DTW
• 5 ms for 16 exemplar matches
63. Example
• 10 Gestures, 10 People, 5 times = 500 Exemplars
• K-Nearest
• 46 ms
• DTW
• 156 ms
• Weighted network
• 1 ms
[Bar chart: K-Nearest, DTW, and Weighted Network compared on an axis from 0 to 180]
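The K-Nearest and DTW figures appear consistent with scaling the earlier timings for 16 exemplar matches linearly to 500 exemplars. The short check below assumes that linear scaling; the function name is illustrative.

```python
def scaled_cost_ms(ms_per_batch, batch_size, exemplars):
    """Linearly scale a measured batch timing to a larger exemplar count."""
    return ms_per_batch / batch_size * exemplars


exemplars = 10 * 10 * 5                        # gestures * people * repetitions
knn_ms = scaled_cost_ms(1.5, 16, exemplars)    # 46.875 ms
dtw_ms = scaled_cost_ms(5.0, 16, exemplars)    # 156.25 ms
```

At 30 frames per second a frame lasts about 33 ms, so under this assumption both exemplar methods would exceed a single frame's budget without the optimizations recommended for them.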
64. Recommendation
• Use for context-sensitive gesture detection
• Use for complex gestures
• Dancing, fitness exercises
• Use when reducing latency is critical
• Optimize by reducing exemplar matches
• Preprocess exemplar data with key frames
• Use context of game
• Use another fast method first
• Implement debug visualization
67. Data Collection
Identify Gestures
• Jump
• Punch
Record Gestures
• At least depth & skeleton
• 1. Exemplar
• 2. Sequence of same gesture
• 3. General (actual game play)
• Old, young, male, female, overweight, handedness
Tag Gesture Recordings
• Meta data per recording, tag start/stop events for each gesture
• Use custom tool, or export to Excel
Verify Gesture Tagging
• Someone other than tagger should verify correctness
Backup & Share
68. Development
• Phase 1 – Exemplar Data
• Phase 2 – Sequence Data
• Phase 3 – General Data
[Diagram boxes: Tagged Gesture Recordings, Filter Joints, Normalize Skeleton, Gesture Detector (Parameters, Weights, Thresholds), Debug Visualization, Result Verification, Error, Machine Learning Algorithm]
69. Testing
[Diagram boxes: Live Camera Stream, Tagged Gesture Recordings, Filter Joints, Normalize Skeleton, Gesture Detector (Parameters, Weights, Thresholds), Human Verification, Result Verification, Feels Robust?, No, Data Collection, Error]
70. Takeaways
• A system, not just a detector
• Detector is small component
• Invest equally in other components
• Manage data
• You’ll have lots of it!
• Most valuable component
• Tagging correctly is essential
• Collect real user data
71. References
• “A Brief History of Human Computer Interaction Technology” – Brad A. Myers
• “Neural Networks – A Systematic Introduction” – Raúl Rojas
• “A Gesture Processing Framework for Multimodal Interaction in Virtual Reality” – Marc E. Latoschik
• Gamefest 2010 – “Gesture Recognition” – Lewey Geselowitz & J. McBride
• Kinect Developer Summit 2011 – “Inside Kinect Skeletal Tracking Deep Dive” – Zsolt Mathe