Writing applications using the Microsoft Kinect Sensor
1. Writing applications using the Microsoft Kinect sensor
Phil Denoncourt
phil@denoncourtassociates.com
Philknows.net
2. About me
Consultant based in Concord, NH
Writing software for over 20 years
Writing .NET applications for 10 years
MCPD, MCITP, MCSD, MCDBA, MCSE
3. Kinect Features
Motion-sensing device for Xbox 360 and Windows
Contains
RGB camera
Depth sensor
Multi-array microphone
4. Kinect SDK
Hardware
Dual core, 2.66 GHz or faster
2 GB RAM (4 GB recommended)
Kinect for Windows
Can use an Xbox Kinect with a power adapter for development
Software
Windows 7 or Windows 7 Embedded
DirectX 9.0c
Visual Studio 2010
Microsoft Speech Platform 11
5. SDK Features
Kinect Drivers
Supports up to 4 connected devices
Each device needs a dedicated USB bus
Managed and native libraries
Access to the various streams
Video
Depth
Skeleton
Control camera elevation (tilt)
Access to the multi-array microphone
6. What it doesn’t do
Doesn’t work with XNA for Xbox
Need the XDK to develop Kinect games for Xbox
Does work with XNA for Windows
Skeleton limitations
Doesn't track individual fingers
Doesn't track facial features (eyes, jaw, nose…)
Only works on humanoid figures
No person/face recognition
Speech recognition doesn’t support dictation
7. Depth Stream
Depth “image” captured 30 times/sec
Returned as a byte array
Pixels ordered left to right, top to bottom
Each value is the pixel's distance in millimeters
Valid range: 850–4000 mm
-1 = unknown (shadows, reflective surfaces)
Near mode allows 400–3000 mm
Also indicates which player, if any, occupies each pixel
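A minimal sketch of reading one depth pixel. It assumes the packed 16-bit layout commonly described for the v1 SDK, where the low 3 bits hold the player index (0 = no player) and the remaining bits hold the depth in millimeters. The frame, the function names, and the pixel values are all hypothetical, and the range limits are the ones quoted on the slide.

```python
# ASSUMPTION: each depth pixel is a 16-bit value. The low 3 bits are the
# player index and the upper bits are the distance in millimeters.
PLAYER_INDEX_BITS = 3
PLAYER_INDEX_MASK = (1 << PLAYER_INDEX_BITS) - 1  # 0b111

DEFAULT_RANGE_MM = (850, 4000)  # default range quoted on the slide
NEAR_RANGE_MM = (400, 3000)     # near-mode range quoted on the slide


def unpack_pixel(raw):
    """Split one packed pixel into (depth_mm, player_index)."""
    depth_mm = raw >> PLAYER_INDEX_BITS
    player_index = raw & PLAYER_INDEX_MASK  # 0 means no player here
    return depth_mm, player_index


def pixel_at(frame, width, x, y):
    """Pixels are stored left to right, top to bottom (row-major)."""
    return frame[y * width + x]


def in_range(depth_mm, near_mode=False):
    """True if the depth lies inside the sensor's usable range."""
    lo, hi = NEAR_RANGE_MM if near_mode else DEFAULT_RANGE_MM
    return lo <= depth_mm <= hi


if __name__ == "__main__":
    # Hypothetical 2x2 frame: pixel (x=1, y=0) is 1200 mm away and belongs to player 2.
    width = 2
    frame = [0, (1200 << 3) | 2, 5000 << 3, (450 << 3) | 1]
    depth, player = unpack_pixel(pixel_at(frame, width, 1, 0))
    print(depth, player, in_range(depth))
```

Keeping the bit layout in named constants means only two lines change if the real frame format turns out to differ from this assumption.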
8. Skeleton Streams
Can actively track up to 2 skeletons
Can monitor up to 6
Captures data 30 times/sec
Captures a collection of 20 joints
X, Y, Z position in meters from the sensor
Some joints are inferred
Recognizes “partial” skeletons
No indication of a joint’s orientation
Where is the person looking?
No built-in gesture support
Choose which skeletons to track, or let the sensor determine them automatically
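Because there is no built-in gesture support, gestures have to be written by hand from joint positions. Below is a minimal sketch under stated assumptions. The Joint class and the joint names ("head", "hand_right", "hip_center") are hypothetical stand-ins, not SDK types. Positions are in meters, with Y taken to point up and Z to be the distance from the sensor. It shows one hand-rolled gesture check and one way an app might choose which skeleton to track, namely picking the one closest to the sensor.

```python
import math
from dataclasses import dataclass


@dataclass
class Joint:
    """Hypothetical joint: position in meters relative to the sensor."""
    x: float
    y: float  # ASSUMPTION: +Y points up
    z: float  # distance from the sensor
    inferred: bool = False  # True when the position was guessed, not seen


def distance(a, b):
    """Euclidean distance between two joints, in meters."""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)


def hand_raised(skeleton):
    """Hand-written 'gesture': right hand above the head.

    skeleton is a dict mapping hypothetical joint names to Joint objects.
    """
    hand, head = skeleton["hand_right"], skeleton["head"]
    if hand.inferred or head.inferred:
        return False  # don't trigger a gesture on guessed positions
    return hand.y > head.y


def closest_skeleton(skeletons):
    """Choose the skeleton whose hip center is nearest the sensor."""
    return min(skeletons, key=lambda s: s["hip_center"].z, default=None)


if __name__ == "__main__":
    near = {"head": Joint(0.0, 0.6, 2.0), "hand_right": Joint(0.2, 0.8, 1.9),
            "hip_center": Joint(0.0, 0.0, 2.0)}
    far = {"head": Joint(0.5, 0.6, 3.0), "hand_right": Joint(0.7, 0.1, 3.0),
           "hip_center": Joint(0.5, 0.0, 3.0)}
    chosen = closest_skeleton([near, far])
    print(chosen is near, hand_raised(chosen))
```

Ignoring inferred joints is a design choice here: a guessed hand position is more likely to produce false triggers than a measured one.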
9. Basic Models of Interaction
Event-based
An event is raised for every frame
You must copy the data out of the frame before the next frame arrives
Handlers should read data quickly
Interrogation (polling) based
You ask the sensor for the latest frame
You decide when to ask
Might miss frames
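The two models can be compared with a simulated sensor. SimulatedSensor and its methods are hypothetical and are not the SDK's API. The simulation assumes the sensor overwrites a single buffer on every frame, which is why event handlers must copy the data. When polling, the app only sees whatever frame is current at the moment it asks, so frames in between are missed.

```python
class SimulatedSensor:
    """Hypothetical sensor that produces one numbered frame per tick()."""

    def __init__(self):
        self._handlers = []
        self._buffer = [None]  # one buffer, overwritten on every frame
        self._frame_no = 0

    def on_frame_ready(self, handler):
        """Event model: register a callback invoked for every frame."""
        self._handlers.append(handler)

    def tick(self):
        """Produce the next frame into the shared buffer and raise the event."""
        self._buffer[0] = self._frame_no
        self._frame_no += 1
        for handler in self._handlers:
            handler(self._buffer)

    def latest_frame(self):
        """Polling model: return a copy of the current frame, or None."""
        return None if self._buffer[0] is None else list(self._buffer)


def event_based(sensor, ticks):
    received = []
    # Copy the buffer right away; it is overwritten on the next tick.
    sensor.on_frame_ready(lambda buf: received.append(list(buf)))
    for _ in range(ticks):
        sensor.tick()
    return received


def polling(sensor, ticks, poll_every):
    received = []
    for t in range(1, ticks + 1):
        sensor.tick()
        if t % poll_every == 0:  # the app decides when to ask
            received.append(sensor.latest_frame())
    return received


if __name__ == "__main__":
    print(event_based(SimulatedSensor(), 5))   # every frame arrives
    print(polling(SimulatedSensor(), 6, 3))    # frames 0, 1, 3, 4 are missed
```

A handler that stores the buffer without copying it ends up holding several references to the same list, which by then contains only the newest frame.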
10. Audio Processing
4-microphone array
Processing occurs on the Kinect hardware
Echo cancellation
Position tracking
Other
Noise suppression/reduction
Recording is done on a separate thread
Make sure apps are MTA, not STA
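Here is a generic sketch of the "record on a separate thread" pattern. A worker thread captures audio chunks and hands them to the application through a thread-safe queue. The chunk source is a hypothetical list of byte strings standing in for real audio. MTA/STA is a Windows COM apartment setting and is not modeled in Python, so this sketch covers only the threading hand-off.

```python
import queue
import threading


def record(source_chunks, out_queue):
    """Worker thread: push each captured chunk, then a None sentinel."""
    for chunk in source_chunks:
        out_queue.put(chunk)
    out_queue.put(None)  # tells the consumer that recording has stopped


def capture_in_background(source_chunks):
    """Run the recorder on its own thread and collect chunks on this one."""
    q = queue.Queue()
    worker = threading.Thread(target=record, args=(source_chunks, q), daemon=True)
    worker.start()
    collected = []
    while True:
        chunk = q.get()  # blocks until the recorder hands over data
        if chunk is None:
            break
        collected.append(chunk)
    worker.join()
    return collected


if __name__ == "__main__":
    print(capture_in_background([b"\x00\x01", b"\x02\x03"]))
```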
11. Speech Recognition
Command-based recognition only
Kinect uses the Microsoft.Speech libraries
Not System.Speech
Needs Speech Platform Runtime (v11)
App needs to be MTA, not STA
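"Command-based only" means the app listens for a fixed set of phrases rather than free-form dictation. The sketch below shows the app-side handling under stated assumptions. The command phrases, the action names, and the 0.7 confidence threshold are all hypothetical, and the recognizer is assumed to report the matched text plus a confidence score. The function maps a recognized phrase to an action and ignores low-confidence results and anything outside the command list.

```python
# Hypothetical command list: spoken phrase -> application action.
COMMANDS = {
    "lift camera": "tilt_up",
    "lower camera": "tilt_down",
    "take picture": "capture",
}
MIN_CONFIDENCE = 0.7  # hypothetical threshold; tune for the environment


def handle_recognition(text, confidence):
    """Return the action for a recognized command, or None.

    Only phrases in the fixed command list are accepted. Free-form
    speech (dictation) is not part of the command set.
    """
    if confidence < MIN_CONFIDENCE:
        return None  # too uncertain; better to ignore than misfire
    return COMMANDS.get(text.strip().lower())


if __name__ == "__main__":
    print(handle_recognition("Lift camera", 0.92))
    print(handle_recognition("what's the weather", 0.95))
```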
13. Upcoming
New SDK released late May
Should be compatible with v1
Gesture Recording
Stronger support for “seated” skeleton
ASUS is rumored to be releasing a laptop with an embedded Kinect