2018 07 02_dense_pose

DensePose: Dense Human Pose
Estimation In The Wild
Monday, July 2, 2018
細川皓平

Overview
• Title
• DensePose: Dense Human Pose Estimation In The Wild
• Authors
• Rıza Alp Güler, Natalia Neverova, Iasonas Kokkinos
• Affiliation
• INRIA-CentraleSupélec, Facebook AI Research
• Venue
• CVPR 2018 (https://arxiv.org/abs/1802.00434)
• Source code released on June 18
(https://github.com/facebookresearch/DensePose)
• Goals
• Map every human region in an RGB image onto a 3D body surface in real time
• Regress per-part UV coordinates with the DensePose-RCNN system
• Build the COCO-DensePose dataset for training by manually annotating the COCO dataset
DensePose 2

Original image
Body surface partitioning
and UV unwrapping
DensePose-COCO
(annotations)
DensePose-RCNN output

COCO-DensePose Dataset
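• Images containing people are collected from the COCO dataset
• Out of 330K images, annotations for 50.3K people are used (the number of images is not stated)
• The body is split into 24 parts in total
• Head (left, right)
• Neck
• Torso
• Arms (left, right / upper arm, forearm / front, back)
• Legs (left, right / thigh, calf / front, back)
• Hands (left, right)
• Feet (left, right)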

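• Sample points are placed on each part
• The number of points depends on the size of the part
• At most 14 points per part
• The SMPL model [1] is used as the human body model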
[1] Loper, Matthew, et al. "SMPL: A skinned multi-person linear model." ACM Transactions on Graphics (TOG) 34.6 (2015): 248.
SMPL model

• Screenshot of the annotation tool
• When a point is placed (red X), the corresponding point is drawn on CG renderings from six viewpoints

• Annotation results
• Left: original image
• Center: U values
• Right: V values

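Evaluation Metrics: Pointwise Evaluation
• AUC: Area Under the Curve
• A point counts as correct when the distance between the predicted point and the ground-truth point is below a threshold; AUCα is the area under the ratio-of-correct-points curve for thresholds from 0 to α, normalized by α
• Two settings are used here: α = 10 cm and α = 30 cm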
• GPS: geodesic point similarity
• $GPS_j = \frac{1}{|P_j|} \sum_{p \in P_j} \exp\left( \frac{-g(i_p, \hat{i}_p)^2}{2k^2} \right)$
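• $P_j$: the set of points annotated on person instance j (a person instance is one annotated person, not one of the 24 parts)
• $i_p$: the vertex estimated by the model at point p
• $\hat{i}_p$: the ground-truth vertex
• $g(\cdot, \cdot)$: geodesic distance between two vertices on the body surface
• $k = 0.255$: normalization parameter
• A perfect prediction gives 1; k is chosen so that a single point scores 0.5 when its geodesic distance from the ground truth is about 30 cm
• Evaluation uses Average Precision (AP) and Average Recall (AR)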
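The two pointwise metrics on this slide can be sketched in a few lines of Python. This is a minimal sketch, not the official evaluation code: the function names `gps` and `auc` are hypothetical, distances are assumed to be geodesic distances in meters, and the AUC uses the closed form of the normalized area under the ratio-of-correct-points step curve.

```python
import math


def gps(geodesic_distances, k=0.255):
    """Geodesic point similarity for one person instance.

    geodesic_distances holds g(i_p, i_hat_p) for every annotated point p
    in P_j (assumed to be in meters).
    """
    if not geodesic_distances:
        raise ValueError("P_j must contain at least one annotated point")
    total = sum(math.exp(-(d ** 2) / (2 * k ** 2)) for d in geodesic_distances)
    return total / len(geodesic_distances)


def auc(distances, a):
    """AUC_a = (1 / a) * integral from 0 to a of f(t) dt.

    f(t) is the fraction of points whose error is at most t. For this step
    function the integral has the closed form mean(max(0, a - d)).
    """
    if not distances:
        raise ValueError("need at least one point")
    return sum(max(0.0, a - d) for d in distances) / (a * len(distances))


if __name__ == "__main__":
    errors = [0.0, 0.05, 0.30]      # hypothetical per-point errors in meters
    print(gps(errors))              # similarity in (0, 1]
    print(auc(errors, 0.10))        # AUC at a = 10 cm
    print(auc(errors, 0.30))        # AUC at a = 30 cm
```

With k = 0.255, a point that is 0.30 m away from its ground truth contributes roughly 0.5 to the GPS average, and a point with zero error contributes exactly 1.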

DensePose-RCNN
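• Essentially a combination of DenseReg [2] (prior work) and Mask R-CNN [3]
1. A Region Proposal Network (RPN) [4] extracts bounding-box regions
2. RoI Align pooling [5] converts each region to a fixed size
3. Cross-cascading architecture
• Mask R-CNN
• Body joints and the connections between them (keypoints)
• Person region (mask)
• DensePose network
• UV coordinates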
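To make the "UV coordinates per part" output concrete, here is a minimal sketch of how a single pixel's prediction could be decoded. The layout is an assumption for illustration: 25 part scores with index 0 as background and indices 1 to 24 as body parts, plus one (u, v) pair per part. The function name `decode_pixel` is hypothetical and is not taken from the released code.

```python
def decode_pixel(part_scores, uv_per_part):
    """Pick the most likely part, then read that part's (u, v) regression.

    part_scores: 25 scores, index 0 = background (assumed layout).
    uv_per_part: 24 (u, v) pairs, one per body part.
    Returns (part_index, u, v); background yields (0, None, None).
    """
    best = max(range(len(part_scores)), key=lambda i: part_scores[i])
    if best == 0:
        return (0, None, None)
    u, v = uv_per_part[best - 1]
    return (best, u, v)


if __name__ == "__main__":
    scores = [0.1] * 25
    scores[3] = 0.9                              # part 3 wins
    uv = [(i / 24.0, 1.0 - i / 24.0) for i in range(24)]
    print(decode_pixel(scores, uv))
```

Only the regressor of the winning part is read, which is why the output can be stored as one part index plus one (U, V) pair per pixel.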
[2] Güler, Rıza Alp, et al. "DenseReg: Fully convolutional dense shape regression in-the-wild." CVPR, 2017.
[3] He, Kaiming, et al. "Mask R-CNN." ICCV, 2017.
[4] Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in Neural Information Processing Systems, 2015.

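DensePose-RCNN: Bounding-Box Region Extraction
• ResNet-50 [6] + Feature Pyramid Networks (FPN) [7]
• ResNet-50
• The paper gives no specific reason for this choice
• Presumably chosen for its high representational power
• FPN
• A network for handling objects at various scales
• The upsampled output map of the final layer is added to the output maps of the intermediate layers, and predictions are made at each scale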
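The top-down merge described for FPN can be sketched with plain nested lists. This is a simplified sketch under stated assumptions: each level is a single-channel 2D map exactly half the size of the previous one, upsampling is nearest-neighbor, and the 1x1 lateral and 3x3 smoothing convolutions of the real FPN are omitted. The names `upsample2x`, `add_maps`, and `top_down` are hypothetical.

```python
def upsample2x(fmap):
    """Nearest-neighbor 2x upsampling of a 2D map given as a list of rows."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def add_maps(a, b):
    """Element-wise sum of two maps of equal size."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def top_down(levels):
    """Merge maps ordered fine -> coarse; returns merged maps in that order."""
    merged = [levels[-1]]                          # coarsest map is used as is
    for lateral in reversed(levels[:-1]):
        merged.append(add_maps(lateral, upsample2x(merged[-1])))
    return merged[::-1]


if __name__ == "__main__":
    fine = [[1] * 4 for _ in range(4)]
    mid = [[10] * 2 for _ in range(2)]
    coarse = [[100]]
    for level in top_down([fine, mid, coarse]):
        print(level)
```

Each merged level carries information from all coarser levels, so a prediction head can be applied at every scale.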
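Comparison of FPN with other structures
(a) Ideal but computationally heavy
(b) A standard CNN
(c) Shallow layers have weak representational power
(d) High representational power at low computational cost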
[6] He, Kaiming, et al. "Deep residual learning for image recognition." CVPR, 2016.
[7] Lin, Tsung-Yi, et al. "Feature pyramid networks for object detection." CVPR, 2017.

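DensePose-RCNN: Bounding-Box Region Extraction
• Region Proposal Network
• A network that proposes candidate bounding boxes from the feature map
1. Slide an n x n window over the feature map
2. Generate k anchors at each position
3. For each anchor, perform
1. a 2-class classification (object or not)
2. a regression of the 4 bounding-box coordinates
• n = 3
• k = 9
• Anchors: 9 types
• Scales: (0.5, 1.0, 2.0)
• Aspect ratios: (1:1, 1:2, 2:1)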
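The k = 9 anchors per position come from crossing the 3 scales with the 3 aspect ratios. The sketch below generates them for one window position. The base size of 256 pixels is a hypothetical value chosen for illustration, the ratios are interpreted as width:height, and each anchor keeps the area implied by its scale; none of these choices is confirmed by the slides.

```python
import math


def make_anchors(cx, cy, base_size=256.0,
                 scales=(0.5, 1.0, 2.0), ratios=(1.0, 0.5, 2.0)):
    """Return 9 anchors (x0, y0, x1, y1) centered at (cx, cy).

    ratios are width / height, so 1:1, 1:2, 2:1 become 1.0, 0.5, 2.0.
    """
    anchors = []
    for s in scales:
        area = (base_size * s) ** 2
        for r in ratios:
            w = math.sqrt(area * r)   # width grows with the ratio
            h = math.sqrt(area / r)   # height shrinks so that w * h == area
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


if __name__ == "__main__":
    for box in make_anchors(0.0, 0.0):
        print(box)
```

For every anchor the RPN then outputs two class scores (object or not) and four box offsets, so one window position yields 2k scores and 4k regression values.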

DensePose-RCNN: ROI Align Pooling
• The inputs to Mask R-CNN and the DensePose network should have a fixed size (3x3 in the figure below)
• The PyTorch implementation (also used in this paper) applies bilinear interpolation at the sub-pixel center of each bin
• The TensorFlow implementation simply resizes
Reference) Qiita: "Correctly understanding the difference between RoI Align in Mask R-CNN, the latest object detection method, and RoI Pooling in Fast(er) R-CNN"
https://qiita.com/yu4u/items/5cbe9db166a5d72f9eb8
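The bin-center sampling described above can be sketched as follows. This is a minimal single-channel sketch, assuming one sample per bin, feature values located at integer (y, x) coordinates, and an RoI given in feature-map coordinates; the names `bilinear` and `roi_align` are hypothetical and do not correspond to any particular library's API.

```python
import math


def bilinear(fmap, y, x):
    """Bilinearly interpolate a 2D map (list of rows) at real-valued (y, x)."""
    h, w = len(fmap), len(fmap[0])
    y = min(max(y, 0.0), h - 1.0)          # clamp to the valid range
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    top = fmap[y0][x0] * (1 - dx) + fmap[y0][x1] * dx
    bottom = fmap[y1][x0] * (1 - dx) + fmap[y1][x1] * dx
    return top * (1 - dy) + bottom * dy


def roi_align(fmap, roi, out_size=3):
    """Pool roi = (y_start, x_start, y_end, x_end) to out_size x out_size.

    The RoI is split into equal bins without rounding, and each bin is
    represented by the interpolated value at its center.
    """
    y_start, x_start, y_end, x_end = roi
    bin_h = (y_end - y_start) / out_size
    bin_w = (x_end - x_start) / out_size
    return [[bilinear(fmap,
                      y_start + (i + 0.5) * bin_h,
                      x_start + (j + 0.5) * bin_w)
             for j in range(out_size)]
            for i in range(out_size)]


if __name__ == "__main__":
    fmap = [[x + 10 * y for x in range(4)] for y in range(4)]
    for row in roi_align(fmap, (0.0, 0.0, 3.0, 3.0)):
        print(row)
```

Because the bin boundaries are never rounded to integer pixels, the pooled values stay aligned with the original RoI, which is the difference from RoI Pooling that the referenced article discusses.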

DensePose-RCNN: Mask-RCNN
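• Takes the feature map after RoI Align pooling as input
• Outputs keypoints and masks
• Compared with the original Mask R-CNN, the number of feature maps is doubled to 512
• (Keypoint estimation needs 512 maps, so the mask branch is matched to it)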
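[Figure: head architecture. Layer labels: filter 3x3, stride 1 (x512); filter 2x2, stride 2; filter 1x1, stride 1; feature maps x512]
Object classification branch. Not relevant here because the only class is person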

DensePose-RCNN: DensePose Network
• What does the 75 mean?
• Where does "Patch" come from?
• This part is not entirely clear

DensePose-RCNN: Cross-cascading architecture
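• The features from the Mask R-CNN branch and the features from the DensePose network branch are added to each other
• The two branches reinforce each other, yielding more effective features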

DensePose-RCNN: Teacher Network
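• Dense pose estimation ideally requires dense annotation over the whole person region, but only about 100 to 150 points per person are actually annotated
• A "teacher" network is trained to fill in the gaps from these sparse points, and it generates dense supervision signals (details unclear)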

Experiments: Dataset
• COCO-DensePose
• Test data
• 1,500 images (2,300 people)
• Training data
• 48,000 people

Experiments: Dataset Comparison

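Experiments: Comparison with 3D Model Estimation Methods
• Single-person images
• DensePose achieves the highest accuracy
• The gap is especially large for images that do not show the whole body
• It is also much faster: other methods take 60-200 [sec / image] (?), whereas DensePose takes 0.04 to 0.25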

Experiments: Evaluation of Each Module
• DP-FCN
• Only the DensePose network part
• Insufficiently robust to scale
• DP-RCNN (points only)
• Adds the RPN
• Performance rises sharply
• DP-RCNN (distillations)
• Adds the "teacher" network
• DP-RCNN (cascade)
• Adds Mask R-CNN
• DP*
• Background removal and a multi-scale ensemble

Experiments: Effect of the Base Network
• Larger networks score better, but the difference is marginal

Experiments: Multi-task and Cascading
• Of mask and keypoints, keypoints have the larger effect on the overall scores
• Depending on the metric, using keypoints alone can give better results

Qualitative Evaluation
• The model successfully ignores skirts, clothing, and surrounding objects

Qualitative Evaluation
• Estimation works regardless of scale and pose
