Articulated Human Pose Estimation by Deep Learning
Wei Yang
Supervisors: Xiaogang Wang, Wanli Ouyang
wyang@ee.cuhk.edu.hk
Outline
• Introduction
• Regression by Convolutional Neural Network
• Deformable Convolutional Neural Networks
• Discussion and Future work
2016/8/11 2
Introduction
Articulated body pose estimation
“recovers the pose of an articulated body, which consists of
joints and rigid parts using image-based observations.”
Applications
Action recognition
Human tracking
Clothing Parsing
Gaming
Challenges
Classic Approaches
Pictorial Structure (Fischler & Elschlager 1973; Felzenszwalb & Huttenlocher 2005)
• Unary templates
• Pairwise springs
Mixtures of “mini-parts” (Yang & Ramanan 2011)
• Mixture of part 𝑖
• Unary template for part 𝑖 with mixture 𝑚𝑖
• Pairwise springs between part 𝑖 with mixture 𝑚𝑖 and part 𝑗 with mixture 𝑚𝑗
Deep Learning Methods
Multi-source Deep Learning (Ouyang et al. 2014)
• Candidate estimations
• The deep model uses multiple information sources: appearance score, mixture type, and deformation.
DeepPose (Toshev & Szegedy 2014)
• Reasons about pose in a holistic fashion
• Refines the joint predictions using higher-resolution sub-images
We propose to study pose estimation in two ways
• Holistic View
– Regression of joint locations by convolutional neural networks (CNNs)
• Local information
– Deformable Convolutional Neural Networks
Regression by Convolutional Neural Network
Formulation
• Image: $I$
• Part locations: $\mathbf{p} = \{p_i\}_{i=1}^{P} = \{(x_i, y_i)\}_{i=1}^{P}$
• Location of part $i$: $p_i = (x_i, y_i)$
• Regressor learned by a deep CNN: $\psi(I; \theta) = \mathbf{p}$
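To make the output of $\psi(I; \theta)$ concrete, the sketch below shows one plausible way to represent the $P$ part locations as the flat vector of length $2P$ that a regression layer would emit. The helper names `pose_to_vector` and `vector_to_pose` are hypothetical, and the interleaved `[x_1, y_1, ..., x_P, y_P]` ordering is an assumption; the slides do not specify the encoding.

```python
from typing import List, Tuple

Joint = Tuple[float, float]


def pose_to_vector(pose: List[Joint]) -> List[float]:
    """Flatten [(x_1, y_1), ..., (x_P, y_P)] into [x_1, y_1, ..., x_P, y_P]."""
    return [coord for joint in pose for coord in joint]


def vector_to_pose(vec: List[float]) -> List[Joint]:
    """Inverse of pose_to_vector: regroup a flat vector into (x, y) pairs."""
    if len(vec) % 2 != 0:
        raise ValueError("expected an even number of values (x, y pairs)")
    return [(vec[k], vec[k + 1]) for k in range(0, len(vec), 2)]
```

With this encoding, a regressor for $P$ parts needs $2P$ output units, and its prediction can be turned back into part locations with `vector_to_pose`.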
Basic Architecture of the CNN Regressor
• AlexNet
– Krizhevsky, Sutskever, and Hinton, NIPS 2012
– The first deep model shown to be effective on a large-scale computer vision task.
Normalize Scale of Human Body
• Size of the CNN input is fixed
• Simple warping changes the aspect ratio of people
• People appear at different scales of an image
1. Original image
2. Human detection [Ouyang et al. CVPR 2014]
3. Crop by bbox
4. Padding with mean RGB value
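Steps 3 and 4 can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the image is a list of rows of RGB tuples, that the bounding box is `(x0, y0, x1, y1)` with exclusive upper bounds, and that the padding is split evenly on both sides of the shorter dimension. The function name `crop_and_pad_square` is hypothetical. Resizing the square crop to the fixed CNN input size would follow and is not shown.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def crop_and_pad_square(
    image: Image, bbox: Tuple[int, int, int, int], mean_rgb: Pixel
) -> Tuple[Image, Tuple[int, int]]:
    """Crop to bbox, then pad the shorter side with mean_rgb to get a square.

    Returns the padded crop and the (dx, dy) offset that maps original
    image coordinates (e.g. joint annotations) into the padded crop.
    """
    x0, y0, x1, y1 = bbox
    crop = [row[x0:x1] for row in image[y0:y1]]
    h, w = len(crop), len(crop[0])
    side = max(h, w)
    pad_left = (side - w) // 2  # assumption: symmetric padding
    pad_top = (side - h) // 2
    padded = [[mean_rgb] * side for _ in range(side)]
    for r in range(h):
        for c in range(w):
            padded[pad_top + r][pad_left + c] = crop[r][c]
    return padded, (pad_left - x0, pad_top - y0)
```

Because the crop is padded rather than warped, the person's aspect ratio is preserved, which is the issue the bullets above raise with simple warping.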
Architecture 1
• Loss function:
• Evaluation metric: PCP
Method         Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
Yang&Ramanan    84.1  77.1   69.5   65.6   52.5   35.9      60.8
Conv5           58.8  24.1   49.6   36.6   25.8    2.8      31.3
Architecture 2
Method         Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
Yang&Ramanan    84.1  77.1   69.5   65.6   52.5   35.9      60.8
Conv5           58.8  24.1   49.6   36.6   25.8    2.8      31.3
Fc8 (AlexNet)   81.1  63.7   72.8   66.6   50.6   21.9      56.9
• Loss function:
• Evaluation metric: PCP
Architecture 3
Method         Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
Yang&Ramanan    84.1  77.1   69.5   65.6   52.5   35.9      60.8
Conv5           58.8  24.1   49.6   36.6   25.8    2.8      31.3
Fc8 (AlexNet)   81.1  63.7   72.8   66.6   50.6   21.9      56.9
Fc10            84.1  68.8   76.8   69.4   54.9   26.8      60.9
• Loss function:
• Evaluation metric: PCP
PCP and PDJ on LSP
#  Method            Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
Ours
1  Conv5              58.8  24.1   49.6   36.6   25.8    2.8      31.3
2  Fc8 (AlexNet)      81.1  63.7   72.8   66.6   50.6   21.9      56.9
3  Fc8 (LSP-extend)   83.1  67.2   75.0   68.7   53.4   25.6      59.6
4  Fc10               84.1  68.8   76.8   69.4   54.9   26.8      60.9
5  Fc10 (Fusion)      84.8  71.8   77.6   71.2   55.9   29.2      62.5
State-of-the-art methods
6  Yang&Ramanan       84.1  77.1   69.5   65.6   52.5   35.9      60.8
7  Ouyang et al.      85.8  83.1   76.5   72.2   63.3   46.6      68.6
Results on LSP dataset
Failure Cases
• Articulation
• Foreshortening
• Occlusions and distractions
• Cluttered background or overlapping people
Deformable Convolutional Neural Networks
Motivation
• Local image patches are able to capture:
– Part presence
– Pairwise part spatial relationships
Number of mixture types for each pair: 6
• 1 neighbor: number of relationships $6^{1} = 6$
• 2 neighbors: number of relationships $6^{2} = 36$
[Figure: example patches for the lower arm and upper arm, from Chen & Yuille NIPS 2014]
Tree-structured Relational Graph
• $T = (V, E)$
– $V$: positions of body parts
– $E$: pairwise relationships between parts
• $\mathbf{p} = \{p_i\} = \{(x_i, y_i)\}$
– $p_i$: pixel location of part $i$
• $\mathbf{t} = \{t_{ij}, t_{ji} \mid (i, j) \in E\}$
– Pairwise relationship types
– Defined by relative position
– $t_{ij} \in \{1, \dots, T_{ij}\}$
– In experiments: 13 types for each pair $(i, j) \in E$
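Formulation
$$F(\mathbf{p}, \mathbf{t} \mid I; \boldsymbol{\omega}, \theta) = \sum_{i \in V} \omega_i \cdot \underbrace{A_i(p_i \mid I; \theta)}_{\text{part presence}} + \sum_{(i,j) \in E} \omega_{ij} \cdot \underbrace{R(p_i, p_j, t_{ij}, t_{ji} \mid I; \theta)}_{\text{pairwise relationship}} + \sum_{(i,j) \in E} \boldsymbol{\omega}_{ij}^{t_{ij}} \cdot \langle \text{pairwise deformation} \rangle$$
Inference: $(\mathbf{p}^{*}, \mathbf{t}^{*}) = \arg\max_{\mathbf{p}, \mathbf{t}} F(\mathbf{p}, \mathbf{t} \mid I; \boldsymbol{\omega}, \theta)$
• Tree structure
• Can be solved efficiently by dynamic programming
$\omega_i$, $\omega_{ij}$, and $\boldsymbol{\omega}_{ij}^{t_{ij}}$ are currently learned by a latent structural SVM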
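The dynamic-programming inference mentioned above can be sketched as max-sum message passing on a tree. This is a generic illustration under stated assumptions, not the authors' implementation: each part has a small list of candidate states (a state would stand for a location together with its relationship types), the weighted part-presence scores are assumed to be precomputed in `unary`, and the weighted pairwise terms are assumed to be folded into a user-supplied `pairwise` function. All names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def tree_max_sum(
    children: Dict[int, List[int]],
    root: int,
    unary: Dict[int, List[float]],
    pairwise: Callable[[int, int, int, int], float],
) -> Tuple[float, Dict[int, int]]:
    """Maximise sum_i unary[i][s_i] + sum_(i,j) pairwise(i, s_i, j, s_j) on a tree."""
    best: Dict[int, List[float]] = {}  # best[i][s]: best subtree score given s_i = s
    argbest: Dict[Tuple[int, int], List[int]] = {}  # best child state per parent state

    def upward(i: int) -> None:
        # Leaves-to-root pass: absorb each child's best message into the parent.
        scores = list(unary[i])
        for j in children.get(i, []):
            upward(j)
            choice = []
            for s in range(len(scores)):
                cand = [best[j][sj] + pairwise(i, s, j, sj) for sj in range(len(best[j]))]
                k = max(range(len(cand)), key=cand.__getitem__)
                scores[s] += cand[k]
                choice.append(k)
            argbest[(i, j)] = choice
        best[i] = scores

    upward(root)
    # Root-to-leaves pass: read off the maximising state of every part.
    states = {root: max(range(len(best[root])), key=best[root].__getitem__)}
    stack = [root]
    while stack:
        i = stack.pop()
        for j in children.get(i, []):
            states[j] = argbest[(i, j)][states[i]]
            stack.append(j)
    return best[root][states[root]], states
```

Each edge is processed once with cost proportional to the product of the two parts' state counts, which is why the tree structure keeps exact inference tractable.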
Learning parameters $\theta$
Derive the type label for each patch
• Use the relative position $d_{ij}$ to represent the pairwise relations
• Cluster the relative positions over the whole training set, $\{d_{ij}^{n}\}_{n=1}^{N}$
• Type label $t_{ij}^{n}$: cluster index
• Mean relative position $r_{ij}^{t_{ij}}$: cluster center
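The type-label derivation above can be sketched with a small k-means routine. The slides do not name the clustering algorithm, so k-means is an assumption here, as are the deterministic initialisation and the function name `kmeans_types`. The input is the list of relative positions for one part pair across the training set; the output is a type label per sample and the cluster centers (the mean relative positions).

```python
from typing import List, Tuple

Vec = Tuple[float, float]


def kmeans_types(offsets: List[Vec], k: int, iters: int = 20) -> Tuple[List[int], List[Vec]]:
    """Cluster relative positions; return (type label per sample, cluster centres)."""
    if not 0 < k <= len(offsets):
        raise ValueError("k must be between 1 and the number of samples")
    # Assumption: deterministic initialisation from evenly spaced samples.
    centres = [offsets[(n * len(offsets)) // k] for n in range(k)]
    labels = [0] * len(offsets)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: (d[0] - centres[c][0]) ** 2 + (d[1] - centres[c][1]) ** 2)
            for d in offsets
        ]
        # Update step: each centre becomes the mean of its members.
        new_centres = []
        for c in range(k):
            members = [d for d, lab in zip(offsets, labels) if lab == c]
            if members:
                new_centres.append(
                    (sum(m[0] for m in members) / len(members),
                     sum(m[1] for m in members) / len(members))
                )
            else:
                new_centres.append(centres[c])  # keep an empty cluster's centre
        if new_centres == centres:
            break
        centres = new_centres
    return labels, centres
```

Running this once per connected part pair with `k` set to the number of types gives every training patch a discrete pairwise-relationship label that a classifier can be trained to predict.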
Casting Full Connections into Convolutions
[Figure: part presence map and pairwise relationship map, illustrated for the elbow]
PCP and PDJ on LSP dataset and FLIC dataset
Dataset  Method         Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
LSP      DCNN            92.5  85.1   82.7   76.3   70.2   55.9      74.8
LSP      Ouyang et al.   85.8  83.1   76.5   72.2   63.3   46.6      68.6
FLIC     DCNN            87.0  98.8      -      -   96.5   84.0      91.1
[Figure: result plots with panels "LSP" and "FLIC"]
Future Work
• Build an end-to-end system to estimate human pose
• Consider combining local information with the holistic view
• Go beyond tree structures
Thank you
Articulated Human Pose Estimation by Deep Learning
Appendix
Data Augmentation
Evaluation Metrics
Data Augmentation
• The amount of training data in existing datasets is insufficient to train deep CNNs
– Statistics of existing datasets
– Number of parameters of AlexNet: 60 million
• Data augmentation is an effective way to reduce overfitting
Dataset     # Training images  # Testing images  Type
PARSE                     100               205  Full body
LSP                     1,000             1,000  Full body
LSP extend             10,000                 -  Full body
FLIC                    3,987             1,016  Upper body
MPII                   28,821            11,701  Full body
Data Augmentation (cont.)
• Random padding
• Rotating
– ±[2.5°, 5°, 7.5°, 10°, 15°, 20°]
• Flipping
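When an image is flipped or rotated, the joint annotations have to be transformed in the same way. The sketch below illustrates this for the flipping and rotation steps listed above. It makes several assumptions that the slides do not state: joints are stored in a dict keyed by names with hypothetical `left_`/`right_` prefixes, flipping swaps those labels, and rotation is about a caller-supplied centre using the standard 2-D rotation formula (the image itself must be rotated with the same sign convention).

```python
import math
from typing import Dict, Tuple

Joint = Tuple[float, float]

# The twelve rotation angles from the slide: +/- each listed magnitude.
ROTATION_ANGLES = [s * a for a in (2.5, 5.0, 7.5, 10.0, 15.0, 20.0) for s in (1, -1)]


def flip_pose(pose: Dict[str, Joint], width: int) -> Dict[str, Joint]:
    """Mirror joints horizontally and swap left/right joint names."""

    def swapped(name: str) -> str:
        if name.startswith("left_"):
            return "right_" + name[len("left_"):]
        if name.startswith("right_"):
            return "left_" + name[len("right_"):]
        return name

    return {swapped(name): (width - 1 - x, y) for name, (x, y) in pose.items()}


def rotate_pose(pose: Dict[str, Joint], angle_deg: float, centre: Joint) -> Dict[str, Joint]:
    """Rotate every joint by angle_deg about centre."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cx, cy = centre
    return {
        name: (
            cx + cos_a * (x - cx) - sin_a * (y - cy),
            cy + sin_a * (x - cx) + cos_a * (y - cy),
        )
        for name, (x, y) in pose.items()
    }
```

Applying the flip plus each of the twelve angles to every training image multiplies the number of training samples, which is the point of the augmentation given the dataset sizes listed earlier.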
Evaluation Metrics
• Percentage of Correct Parts (PCP)
– Measures the percentage of correctly localized body parts.
– A candidate body part is treated as correct if its segment endpoints lie within 50% of the ground-truth segment length of the annotated endpoints.
• Percentage of Detected Joints (PDJ)
– Measures performance as a curve: the percentage of correctly localized joints as the localization precision threshold varies. The threshold is normalized by a scale defined as the distance between the left shoulder and the right hip.
– Invariant to scale.
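Both metrics can be sketched in a few lines. This is an illustration of the definitions above, not a reference evaluation script: the PCP variant here requires both endpoints of a part to fall within the threshold (other variants exist), and the function names and argument layout are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def pcp(pred: List[Segment], gt: List[Segment], alpha: float = 0.5) -> float:
    """Fraction of parts whose two endpoints both lie within alpha * GT part length."""
    correct = 0
    for (p0, p1), (g0, g1) in zip(pred, gt):
        limit = alpha * dist(g0, g1)
        if dist(p0, g0) <= limit and dist(p1, g1) <= limit:
            correct += 1
    return correct / len(gt)


def pdj(
    pred: List[Point],
    gt: List[Point],
    left_shoulder: Point,
    right_hip: Point,
    fraction: float,
) -> float:
    """Fraction of joints within fraction * (left shoulder to right hip distance)."""
    scale = dist(left_shoulder, right_hip)
    detected = sum(1 for p, g in zip(pred, gt) if dist(p, g) <= fraction * scale)
    return detected / len(gt)
```

Evaluating `pdj` over a range of `fraction` values and plotting the results produces the PDJ curve; because the threshold scales with the torso size of each person, the metric does not depend on how large the person appears in the image.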

 

Dernier (20)

Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Biological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdfBiological Classification BioHack (3).pdf
Biological Classification BioHack (3).pdf
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
All-domain Anomaly Resolution Office U.S. Department of Defense (U) Case: “Eg...
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Engler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomyEngler and Prantl system of classification in plant taxonomy
Engler and Prantl system of classification in plant taxonomy
 
Presentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptxPresentation Vikram Lander by Vedansh Gupta.pptx
Presentation Vikram Lander by Vedansh Gupta.pptx
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptxPhysiochemical properties of nanomaterials and its nanotoxicity.pptx
Physiochemical properties of nanomaterials and its nanotoxicity.pptx
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Artificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C PArtificial Intelligence In Microbiology by Dr. Prince C P
Artificial Intelligence In Microbiology by Dr. Prince C P
 

Articulated human pose estimation by deep learning

  • 1. Articulated Human Pose Estimation by Deep Learning Wei Yang Supervisor: Xiaogang Wang, Wanli Ouyang wyang@ee.cuhk.edu.hk
  • 2. Outline • Introduction • Regression by Convolutional Neural Network • Deformable Convolutional Neural Networks • Discussion and Future work 2016/8/11 2
  • 3. Introduction Articulated body pose estimation “recovers the pose of an articulated body, which consists of joints and rigid parts using image-based observations.”
  • 6. Classic Approaches Fischler & Elschlager 1973 Felzenszwalb & Huttenlocher 2005 Pictorial Structure • Unary Templates • Pairwise Springs Yang & Ramanan 2011 Mixtures of “mini-parts” • Mixture of part $i$ • Unary template for part $i$ with mixture $m_i$ • Pairwise springs between part $i$ with mixture $m_i$ and part $j$ with mixture $m_j$
  • 7. Deep Learning Methods Multi-source Deep Learning (Ouyang et al. 2014) • Candidate estimations • Deep model uses multiple sources, including appearance score, mixture type, and deformation. DeepPose (Toshev & Szegedy 2014) • Reasons about pose in a holistic fashion • Refines the joint predictions by using higher-resolution sub-images
  • 8. We propose to study pose estimation in two ways • Holistic view – Regression of joint locations by convolutional neural networks (CNNs) • Local information – Deformable Convolutional Neural Networks
  • 9. Regression by Convolutional Neural Network
  • 10. Formulation • Image: $I$ • Part locations: $\mathbf{p} = \{p_i\}_{i=1}^{P} = \{(x_i, y_i)\}_{i=1}^{P}$, where $p_i = (x_i, y_i)$ is the location of part $i$ • Regressor $\psi(I; \theta) = \mathbf{p}$, learned by a deep CNN
  • 11. Basic Architecture of the CNN Regressor • AlexNet – Krizhevsky, Sutskever, and Hinton, NIPS 2012 – The first time a deep model was shown to be effective on a large-scale computer vision task.
  • 12. Normalize Scale of Human Body • Size of the CNN input is fixed • Simple warping changes the aspect ratio of people • People appear at different scales in an image • Steps: 1. Original image 2. Human detection [Ouyang et al. CVPR 2014] 3. Crop by bbox 4. Padding with mean RGB value
  • 13. Architecture 1 • Loss function: (formula not preserved) • Evaluation metric: PCP
      Method         Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
      Yang&Ramanan    84.1  77.1   69.5   65.6   52.5   35.9      60.8
      Conv5           58.8  24.1   49.6   36.6   25.8    2.8      31.3
  • 14. Architecture 2 • Loss function: (formula not preserved) • Evaluation metric: PCP
      Method         Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
      Yang&Ramanan    84.1  77.1   69.5   65.6   52.5   35.9      60.8
      Conv5           58.8  24.1   49.6   36.6   25.8    2.8      31.3
      Fc8 (AlexNet)   81.1  63.7   72.8   66.6   50.6   21.9      56.9
  • 15. Architecture 3 • Loss function: (formula not preserved) • Evaluation metric: PCP
      Method         Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
      Yang&Ramanan    84.1  77.1   69.5   65.6   52.5   35.9      60.8
      Conv5           58.8  24.1   49.6   36.6   25.8    2.8      31.3
      Fc8 (AlexNet)   81.1  63.7   72.8   66.6   50.6   21.9      56.9
      Fc10            84.1  68.8   76.8   69.4   54.9   26.8      60.9
  • 16. PCP and PDJ on LSP
      #  Method             Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
      Ours
      1  Conv5               58.8  24.1   49.6   36.6   25.8    2.8      31.3
      2  Fc8 (AlexNet)       81.1  63.7   72.8   66.6   50.6   21.9      56.9
      3  Fc8 (LSP-extend)    83.1  67.2   75.0   68.7   53.4   25.6      59.6
      4  Fc10                84.1  68.8   76.8   69.4   54.9   26.8      60.9
      5  Fc10 (Fusion)       84.8  71.8   77.6   71.2   55.9   29.2      62.5
      State-of-the-art methods
      6  Yang&Ramanan        84.1  77.1   69.5   65.6   52.5   35.9      60.8
      7  Ouyang et al.       85.8  83.1   76.5   72.2   63.3   46.6      68.6
  • 17. Results on LSP dataset
  • 18. Failure Cases • articulation • fore-shortening • occlusions and distractions • cluttered background or overlapping people
  • 19. Deformable Convolutional Neural Networks
  • 20. Motivation • Local image patches are able to capture: – Part presence – Pairwise part spatial relationships • Number of mixture types for each pair: 6 • 1 neighbor: # of relationships: 6^1 = 6 • 2 neighbors: # of relationships: 6^2 = 36 • Figure labels: lower arm, upper arm [Chen & Yuille NIPS 2014]
  • 21. Tree-structured Relational Graph • $T = (V, E)$ – $V$: positions of body parts – $E$: pairwise relationships between parts • $\mathbf{p} = \{p_i\} = \{(x_i, y_i)\}$ – $p_i$: pixel location of part $i$ • $\mathbf{t} = \{t_{ij}, t_{ji} \mid (i, j) \in E\}$ – Pairwise relationship – Defined by relative position – $t_{ij} \in \{1, \dots, T_{ij}\}$ – In experiments: 13 types for each pair $(i, j) \in E$
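  • 22. Formulation: $F(\mathbf{p}, \mathbf{t} \mid I; \boldsymbol{\omega}, \theta) = \sum_{i \in V} \omega_i \cdot A_i(p_i \mid I; \theta) + \sum_{(i,j) \in E} R(p_i, p_j, t_{ij}, t_{ji} \mid I; \theta)$, where $A_i$ is the part presence term and the pairwise term $R$ combines a pairwise relationship term (weight $\omega_{ij}$) and a pairwise deformation term (weight $\boldsymbol{\omega}_{ij}^{t_{ij}}$). Inference: $(\mathbf{p}^*, \mathbf{t}^*) = \arg\max_{\mathbf{p}, \mathbf{t}} F(\mathbf{p}, \mathbf{t} \mid I; \boldsymbol{\omega}, \theta)$ • Tree structure • Can be solved efficiently by dynamic programming • $\omega_i$, $\omega_{ij}$, $\boldsymbol{\omega}_{ij}^{t_{ij}}$ are currently learned by latent structural SVM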
  • 23. Learning parameters $\theta$: derive the type label for each patch • Use relative position $d_{ij}$ to represent the pairwise relations • Cluster the relative positions over the whole training set $\{d_{ij}^{n}\}_{n=1}^{N}$ • Type label $t_{ij}^{n}$: cluster index • Mean relative position $r_{ij}^{t_{ij}}$: cluster center
  • 24. Casting Full Connections into Convolutions (figure labels: elbow, part presence map, pairwise relationship map)
  • 25. PCP and PDJ on LSP dataset and FLIC dataset (figure panels: LSP, FLIC)
      Dataset  Method         Torso  Head  U.Leg  L.Leg  U.Arm  L.Arm  Mean PCP
      LSP      DCNN            92.5  85.1   82.7   76.3   70.2   55.9      74.8
      LSP      Ouyang et al.   85.8  83.1   76.5   72.2   63.3   46.6      68.6
      FLIC     DCNN            87.0  98.8      -      -   96.5   84.0      91.1
  • 27. Future Work • Build end-to-end system to estimate human pose • Consider combining local information and holistic view • Beyond tree structure
  • 28. Thank you Articulated Human Pose Estimation by Deep Learning
  • 30. Data Augmentation • Existing datasets contain too few training images to train deep CNNs – Statistics of existing datasets (table below) – Number of parameters of AlexNet: 60 million • Data augmentation is effective at preventing overfitting
      Dataset     # Training images  # Testing images  Type
      PARSE                     100               205  Full body
      LSP                     1,000             1,000  Full body
      LSP extend             10,000                 -  Full
      FLIC                    3,987             1,016  Upper body
      MPII                   28,821            11,701  Full body
  • 31. Data Augmentation (cont.) • Random padding • Rotating – ±[2.5°, 5°, 7.5°, 10°, 15°, 20°] • Flipping
  • 32. Evaluation Metrics • Percentage of Correct Parts (PCP) – measures the percentage of correctly localized body parts. – A candidate body part is treated as correct if its segment endpoints lie within 50% of the ground-truth segment length from the annotated endpoints. • Percentage of Detected Joints (PDJ) – measures performance as a curve of the percentage of correctly localized joints while varying the localization precision threshold, which is normalized by a scale defined as the distance between the left shoulder and the right hip – invariant to scale
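
The PCP definition on slide 32 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the evaluation code used for the tables above: the function name `pcp` and the limb layout (each limb as a pair of `(x, y)` endpoints) are hypothetical, and the sketch uses the strict reading in which both endpoints must fall within the threshold.

```python
import math


def pcp(pred_limbs, gt_limbs, thresh=0.5):
    """Percentage of Correct Parts (strict variant, assumed here).

    Each limb is a pair of endpoints ((x0, y0), (x1, y1)).
    A predicted limb counts as correct when BOTH of its endpoints lie
    within thresh * (ground-truth limb length) of the annotated endpoints.
    """
    correct = 0
    for (p0, p1), (g0, g1) in zip(pred_limbs, gt_limbs):
        limb_len = math.dist(g0, g1)   # length of the ground-truth segment
        tol = thresh * limb_len        # 50% of that length by default
        if math.dist(p0, g0) <= tol and math.dist(p1, g1) <= tol:
            correct += 1
    return 100.0 * correct / len(gt_limbs)


# Two limbs: the first is localized well, the second misses one endpoint.
gt = [((0, 0), (10, 0)), ((0, 0), (0, 4))]
pred = [((1, 1), (9, 0)), ((0, 3), (0, 4))]
print(pcp(pred, gt))
```

Because the tolerance scales with the limb length, short parts such as lower arms get a tighter absolute tolerance than the torso, which is one reason their scores are typically lower.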

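The scale normalization from slide 12 (crop by the detected bounding box, pad with the mean RGB value, then warp to the fixed CNN input size) can be sketched as follows. The function names, the centered placement of the crop, the nearest-neighbor resize, and the example mean color are all assumptions made for illustration; the slides do not specify them.

```python
import numpy as np


def crop_and_pad(image, bbox, mean_rgb):
    """Crop image (H, W, 3) to bbox and pad it to a square with mean_rgb.

    bbox is (x0, y0, x1, y1) in pixels, with x1 and y1 exclusive.
    Centering the crop on the square canvas is an assumption.
    """
    x0, y0, x1, y1 = bbox
    crop = image[y0:y1, x0:x1]
    h, w = crop.shape[:2]
    side = max(h, w)
    canvas = np.empty((side, side, 3), dtype=image.dtype)
    canvas[:] = mean_rgb               # fill with the mean color, not zeros
    top, left = (side - h) // 2, (side - w) // 2
    canvas[top:top + h, left:left + w] = crop
    return canvas


def resize_nearest(image, size):
    """Nearest-neighbor resize to (size, size); a stand-in for real warping."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return image[rows[:, None], cols]


image = np.zeros((10, 20, 3), dtype=np.uint8)   # dummy black image
padded = crop_and_pad(image, (5, 0, 9, 10), (120, 110, 100))
net_input = resize_nearest(padded, 227)          # AlexNet input size
print(padded.shape, net_input.shape)
```

Padding to a square before resizing keeps the aspect ratio of the person unchanged, which is the problem that plain warping introduces.
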
Editor's notes

  1. Good afternoon everyone. It’s my honor to take my screening test here. My name is Wei Yang. I’m from the IVP lab. My talk is about articulated pose estimation with deep learning methods.
  2. We will first give a brief introduction to the task we address. Then we discuss two methods based on deep learning. Finally we conclude this talk and discuss future work.
  3. According to Wikipedia, the goal of articulated pose estimation is to recover the joint positions of articulated limbs, as we show here for a man playing baseball.
  4. There are lots of applications where being able to estimate human pose is useful. For example, pose estimation is helpful for recognizing actions. It also helps to parse clothing in fashion photographs. Recently, pose estimation has been successfully applied in human tracking and gaming systems.
  5. However, in unconstrained images, human pose estimation can be a very hard problem because people can appear with a variety of poses, clothing, and body shapes. In the slides, you can see some very interesting and unusual examples that demonstrate how flexible the human pose is.
  6. A classic approach for human pose estimation is to model the human as a set of parts, such as a head, torso, arm, and leg part. In 3D, these parts can be modeled as cylinders. Pictorial structures use 2D part models, where geometric relations between parts are encoded by springs. However, capturing the whole range of appearances using pictorial structures is still quite difficult. A big problem is that even the projection of a simple cylinder into 2D yields many different appearances. So one usually has to explicitly evaluate many different possible in-plane orientations and foreshortenings in order to find a good match for a part template. Yang and Ramanan propose mini-parts to approximate these transformations; in this case the mini-parts are tuned to represent near-vertical and near-horizontal limbs.
  7. Recently, state-of-the-art performance on pose estimation has been achieved by deep learning methods. First, Ouyang et al. [19] propose a multi-source deep model that constructs a non-linear representation from multiple information sources, i.e., mixture type, appearance score, and deformation. Second, DeepPose [26] estimates body part locations with a regressor in a holistic manner, then refines the joint predictions by using higher-resolution sub-images with a convolutional neural network. However, this method suffers from inaccuracy in the high-precision region.
  8. We propose to study pose estimation in two ways. First, we study regression of joint locations by CNNs. We want to know how accurate this method is and what the limitations are of using only a single CNN regressor. We also study methods based on local image patches, and in future work we plan to incorporate deformable graphical models into the network.
  9. Let I denote an image and p denote the part locations in the image. We want to learn a regression model that, given an image, outputs all the part locations. This is a non-trivial problem, and we use a CNN as our regression model because of its strong representation ability.
  10. We adopt AlexNet as the basic network structure. This structure was proposed in 2012. It won the ImageNet competition by a large margin, and it was the first time that a deep model was shown to be effective on a large-scale computer vision task.
  11. The original input size of AlexNet is 227×227. We first simply warped the images to this size. However, we found that the performance was very poor, for two reasons: first, this simple warping changes the aspect ratio of people; second, people appear at different scales in an image. To keep the aspect ratio while making people in different images have the same scale, we first detect the rough location of the human, then crop the image with the detected bounding box. Finally, we do padding and warping. Note that we pad with mean RGB values instead of zeros.
  12. Since existing pose datasets are relatively small, we start by removing the last two fully connected layers. The evaluation metric here is the Percentage of Correct Parts; the higher the better. However, the performance is far from that of the baseline method. This shows that the model complexity is not high enough to model this complex problem.
  13. Then we increase the complexity by adding two fully connected layers. We achieve 56.9 mean PCP, which is still not better than the baseline method.
  14. We observe that the location of one part may help to locate another part. For example, the location of the elbows may help to locate the wrists. Hence we add another fully connected layer after the original output layer. We also add two layers as a second branch to increase the variation of the model. Finally, we sum the outputs from the two branches to get the final prediction. This time we achieve 60.9 mean PCP, which is comparable with Yang and Ramanan’s method (60.8).
  15. Here is the summary of the experimental results. By further doing data fusion on the test set, we finally get 62.5 mean PCP.
  16. These are the visualized results on the LSP dataset. We can see that this method has limitations in high-precision regions, such as lower arms and lower legs. It is worth mentioning that this method is very fast, since predictions can be obtained by batch forward propagation.
  17. Here we provide some failure cases. The failures are mainly caused by articulation, fore-shortening, self-occlusion or occlusion caused by clothing, and cluttered background or overlapping people.
  18. As mentioned before, although this method is very fast, it still has limitations; for example, it gives only one prediction per image. Hence we turn to another kind of method based on local image information.
  19. We observe that local image patches are not only able to capture part presence, but also able to reason about pairwise spatial relationships. For example, the patch centered at the wrist can predict the relative position of the elbow; the patch centered at the elbow can reliably predict the positions of the shoulder and the wrist. We use a mixture model to define different types of spatial relationships. The right panel shows typical spatial relationships the wrist can have with its neighbor, the elbow. The left panel shows the typical spatial relationships the elbow can have with its two neighbors, the shoulder and the wrist.
  20. Based on this observation, we can define human pose as a tree-structured graph, where each node denotes the position of a part and the edges denote the pairwise spatial relationships.
  21. We define the score function of part locations p and pairwise relation types t. It is computed by summing the unary appearance term and the pairwise relationship term. The unary term is the part presence map, indicating the probability that part i appears at each location of the image. The pairwise term consists of two parts: the first is the pairwise relationship map, and the second is the deformation cost. Theta denotes the parameters, which are learned by the CNN. Inference finds the positions and mixture types that maximize this score. As the relational graph is tree-structured, this can be solved efficiently by dynamic programming.
  22. Here we talk about how to learn theta. Given an image, we want to produce a score map that indicates the probability of each specific type. This is done by learning a multi-class classifier on local image patches. First we need to derive the type label for each patch.
  23. Then we use two convolutional layers with 1×1 kernels to replace the original fully connected layers. The network then becomes a fully convolutional network that can perform convolutions on an input image of arbitrary size, and the output is the score map for each type, as we want. Then we can easily compute the part presence map and the pairwise relationship maps, as this figure illustrates. For example, to compute the part presence map of the elbow, we just add together all the score maps associated with elbow-to-shoulder and elbow-to-wrist. To compute the pairwise relationship maps, we need to perform marginalization.
  24. Here are