3AMIGAS - Paper5: Feifei Huo
1. Detection, Tracking and Recognition of Human Poses
for a Real-Time Spatial Game
Feifei Huo, Emile Hendriks, A.H.J. Oomes, Pascal van Beek, Remco Veltkamp
Presenter: Feifei Huo
Information and Communication Theory (ICT) Group
Delft University of Technology
June 16, 2009
2. Outline:
• Introduction to visual analysis system
• People detection, tracking and pose recognition system
– Human body detection and body parts segmentation
– Feature points representation and tracking
– Pose recognition
• Experimental results and conclusion
• Spatial game application and future works
3. Introduction to Visual Analysis System
Video-based applications:
1. virtual reality
2. smart environment systems
3. sports video indexing
4. advanced user interfaces
→ Pose-Driven Spatial Game
5. The state of the art:
• combining bottom-up and top-down approaches.
• incorporating appearance, kinematic, temporal constraints, etc.
The proposed system:
• real time system
• a variety of poses
• spatial game control
Fig.1. The flowchart of the proposed system
6. People Detection, Tracking and Pose Recognition System
Video Sequence → People Detection → People Tracking → Pose Recognition → Spatial Game

Initial Frame → Whole Human Blob Detection → Different Body Parts Segmentation
7. Methodology
• Background subtraction
– Mixture of Gaussians
• Head and torso detection and tracking
– 2D upper-body model
Area(F) = Area(B)
Fig.2. (a) Foreground binary image of the initial frame, (b) 2D upper-body model for human torso detection and tracking.
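The background-subtraction step above can be sketched as follows. For brevity this uses a single running Gaussian per pixel rather than the full Mixture-of-Gaussians model of the slides, and all names and parameters (`alpha`, `k`) are illustrative:

```python
import numpy as np

def update_background(mean, var, frame, alpha=0.05):
    """Running single-Gaussian background model (a simplified stand-in
    for the Mixture-of-Gaussians model used in the system)."""
    mean = (1 - alpha) * mean + alpha * frame
    var = (1 - alpha) * var + alpha * (frame - mean) ** 2
    return mean, var

def foreground_mask(mean, var, frame, k=2.5):
    """A pixel is foreground if it deviates more than k sigma
    from the background model."""
    return np.abs(frame - mean) > k * np.sqrt(var)
```

The resulting binary mask corresponds to the foreground image of Fig.2(a), on which the 2D upper-body model is fitted.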
8. Particle Filtering
$\{ s^{(n)},\ n = 1, 2, 3, \dots, N \} \;\rightarrow\; P(B \mid A = s^{(n)}) \;\rightarrow\; \{ \pi^{(n)},\ n = 1, 2, 3, \dots, N \}$
9. People detection and tracking
• A sample set $\{ s^{(n)}, \pi^{(n)},\ n = 1, 2, \dots, N \}$ is generated with an initial distribution $s^{(n)} = p^{(n)} = ( x^{(n)}, y^{(n)}, \mathrm{scale}^{(n)} )$.
• Then the observation step takes place:
$P(B \mid A = s^{(n)}) = \omega^{(n)} = \frac{1}{\mathrm{Area}(F)} \times \begin{cases} \sum F^{(n)} - \sum B^{(n)}, & \text{if } \sum F^{(n)} > \sum B^{(n)} \\ 0, & \text{otherwise} \end{cases}$
10. People detection and tracking
• This observation is updated by taking the prior weight into account:
$\omega_t^{(n)} = \omega^{(n)} \times \frac{\pi_{t-1}^{(n)}}{\sum_{n=1}^{N} \pi_{t-1}^{(n)}}$
• The normalized observation forms a new set of particle weights:
$\pi_t^{(n)} = \frac{\omega_t^{(n)}}{\sum_{n=1}^{N} \omega_t^{(n)}}$
Fig.3. 2D upper-body model for human torso detection and tracking.
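The three weighting steps on the last two slides (observation, prior update, normalization) can be sketched in a few lines; the array names are illustrative, not the paper's API:

```python
import numpy as np

def particle_weights(fg_counts, bg_counts, area_f, prior):
    """One weighting step of the particle filter from the slides.

    fg_counts[n], bg_counts[n]: foreground/background pixel sums inside
    particle n's upper-body model; area_f: model area Area(F);
    prior: the previous weights pi_{t-1}.
    """
    # Observation omega^(n): positive foreground excess, else zero.
    omega = np.where(fg_counts > bg_counts,
                     (fg_counts - bg_counts) / area_f, 0.0)
    # Incorporate the normalized prior weights pi_{t-1}.
    omega_t = omega * prior / prior.sum()
    # Normalize to obtain the new particle weights pi_t.
    return omega_t / omega_t.sum()
```

For example, a particle whose model window covers far more foreground than background ends up dominating the weight set.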
11. Methodology
• Hand detection and tracking
– Foreground pixels are segmented into skin-color and non-skin-color regions:
$\left| \arctan\frac{B}{R} - \frac{\pi}{4} \right| < \frac{\pi}{8}, \quad \left| \arctan\frac{G}{R} - \frac{\pi}{6} \right| < \frac{\pi}{18}, \quad \left| \arctan\frac{B}{G} - \frac{\pi}{5} \right| < \frac{\pi}{15}$
– The face is excluded from the candidate hand regions by using the size of the connected skin-color area.
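The three angular skin-colour conditions can be checked directly per pixel. `is_skin` is an illustrative name, and the inputs are assumed to be positive RGB channel values:

```python
import math

def is_skin(r, g, b):
    """Skin-colour test from the slides: the arctangents of the
    B/R, G/R and B/G channel ratios must each lie inside a fixed
    angular band around pi/4, pi/6 and pi/5 respectively."""
    return (abs(math.atan(b / r) - math.pi / 4) < math.pi / 8 and
            abs(math.atan(g / r) - math.pi / 6) < math.pi / 18 and
            abs(math.atan(b / g) - math.pi / 5) < math.pi / 15)
```

Connected components of skin pixels then form the candidate hand regions, with the (largest) face component excluded by size.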
12. People Detection, Tracking and Pose Recognition System
Video Sequence → People Detection → People Tracking → Pose Recognition → Spatial Game

Multiple Views → Feature Points Location
Subsequent Video Frames → Feature Points Tracking
13. Torso and Hand Segmentation
Fig.4. Results of torso and hand segmentation
14. 3D Reconstruction
• Three synchronized cameras are used.
– One front view
– Two side views
• The 3D positions of torso and hands can be obtained.
Fig.5. Multiple camera settings
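A standard way to obtain 3D positions from synchronized, calibrated views is linear (DLT) triangulation; the slide does not name the exact method used, so this is a generic sketch with two of the three cameras:

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point seen in two calibrated
    cameras. P1, P2: 3x4 projection matrices; x1, x2: (u, v) image
    coordinates of the same torso/hand point in each view."""
    # Each view contributes two linear constraints on the homogeneous 3D point.
    A = np.vstack([x1[0] * P1[2] - P1[0],
                   x1[1] * P1[2] - P1[1],
                   x2[0] * P2[2] - P2[0],
                   x2[1] * P2[2] - P2[1]])
    # The solution is the right singular vector for the smallest singular value.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]
```

With the front and side views of Fig.5, this recovers the 3D torso and hand positions used by the pose recognizer.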
15. People Detection, Tracking and Pose Recognition System
Video Sequence → People Detection → People Tracking → Pose Recognition → Spatial Game

Predefined Key Poses → Classifier Construction → Pose Recognition
16. Pose Recognition
• Feature space construction
2D and 3D positions of the torso center and the hands
→ relative positions between hands and torso center
→ normalized feature space
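A minimal sketch of this normalized feature construction, assuming the torso scale is available from the tracker (all names are illustrative):

```python
import numpy as np

def pose_features(torso, left_hand, right_hand, torso_scale):
    """Feature vector: hand positions relative to the torso centre,
    divided by the torso scale so the features are invariant to the
    user's position in the frame and body size."""
    t = np.asarray(torso, dtype=float)
    return np.concatenate([(np.asarray(left_hand) - t) / torso_scale,
                           (np.asarray(right_hand) - t) / torso_scale])
```

The same construction applies to the 3D positions; the relative, scale-normalized coordinates are what the pose classifiers consume.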
17. Predefined Key Poses
Pose Classification
• 9 key poses, one class per pose
• 15 persons
• 1515 samples in total
18. Results and Discussion
Cross-validation results of pose classifiers (mean errors with standard deviation)
method       LOPO mean err.  LOPO max err.  FORO mean err.  FORO max err.
NMC          0.06 (0.09)     0.18 (0.35)    0.04 (0.02)     0.09 (0.10)
LDC          0.06 (0.07)     0.14 (0.35)    0.01 (0.01)     0.04 (0.05)
QDC          0.10 (0.11)     0.23 (0.34)    0.01 (0.01)     0.04 (0.06)
LDA+QDC      0.07 (0.09)     0.16 (0.35)    0.02 (0.01)     0.04 (0.06)
Parzen       0.07 (0.09)     0.16 (0.35)    0.01 (0.01)     0.02 (0.04)
LDA+Parzen   0.06 (0.07)     0.14 (0.35)    0.00 (0.00)     0.01 (0.03)
Conclusion: the simplest method (NMC) provides comparable
performance to more complex classifiers.
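The nearest-mean classifier (NMC) singled out in the conclusion is indeed very simple; a minimal sketch:

```python
import numpy as np

class NearestMeanClassifier:
    """Minimal nearest-mean classifier (NMC): each pose class is
    summarized by the mean of its training feature vectors, and a
    sample is assigned to the closest class mean."""

    def fit(self, X, y):
        self.labels = np.unique(y)
        self.means = np.array([X[y == c].mean(axis=0) for c in self.labels])
        return self

    def predict(self, X):
        # Euclidean distance from every sample to every class mean.
        d = np.linalg.norm(X[:, None, :] - self.means[None, :, :], axis=2)
        return self.labels[d.argmin(axis=1)]
```

Its low cost is what makes it attractive for a real-time system when its accuracy is close to that of the heavier classifiers in the table.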
19. Results and Discussion
Confusion matrix of the nine poses (rows: true labels, columns: estimated labels)
True \ Est.   P1   P2   P3   P4   P5   P6   P7   P8   P9
P1           198    0    0    0    0    0    0    0    0
P2             0  193    0    0    0    0    0    0    0
P3             2    0  157    0    0    0    0    0    0
P4             0    0    0  159    0   20    0    0    0
P5             1    0    1    0  164    0    2    0    0
P6             2    3    6    0    0  129    0    0    0
P7             0    0    1    0    3    0  164    0    0
P8             0    0    9    0    6    0    1  162    0
P9             0    0    5    3    0    0    0    0  133
Conclusion: most of the poses are recognized very well. However, there is considerable confusion between Pose 4 and Pose 6 (20 Pose 4 samples are classified as Pose 6).
20. People Detection, Tracking and Pose Recognition System
Video Sequence → People Detection → People Tracking → Pose Recognition → Spatial Game

Pose → Color Control
Location → Position Control
22. Application: Spatial Game
• Real-time application: 20 frames/second (PRSD Studio, http://prsysdesign.net/)
• Robust to different environments: tested in different indoor settings
• Adapts to different users: tested with various users
23. Future Work
• Improve the robustness of the system
– better skin-colour detection, more robust feature detection
• Develop multiple-user applications
– solve the occlusion problem