V2V-PoseNet:
Voxel-to-Voxel Prediction Network for
Accurate 3D Hand and Human Pose
Estimation from a Single Depth Map
Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee
Computer Vision Lab.
Dept. of ECE, ASRI,
Seoul National University
http://cv.snu.ac.kr Aug 29, 2018
Invited Talk @ NAVER
 Winner of the 2017 Hands in the Million Challenge on 3D Hand Pose Estimation
Intelligent and Invisible Computing 2
The 2017 Hands in the Million Challenge on
3D Hand Pose Estimation
HANDS 2017 3D Hand Pose Estimation Challenge
 We won the challenge! (ranked 1st among 15 entries)
 Frame-based 3D Hand Pose Estimation
V2V-PoseNet
3D Hand Pose Estimation
 Goal: Localize hand keypoints (joints) from a single depth
map
Fig. 3D hand model: 21 keypoints (joints)
3D Hand Pose Estimation
 Still a hot topic:
 More than 16,000 publications over the last 5 years
Applications
Oculus Rift
Microsoft HoloLens
 Crucial Technique for HCI and AR
What are the Challenges?
 Diverse geometric (shape) variations
 Weak appearance features
 Heavy self occlusions
 Self similarity
 Noise
Previous works for 3D Hand Pose Estimation
 Generative approaches
• Assume pre-defined hand model and fit it to the input depth image
• PSO, ICP to minimize hand-crafted cost function
[1] C. Qian, et al. "Realtime and robust hand tracking from depth." CVPR 2014.
[2] Tang, Danhang, et al. "Opening the black box: Hierarchical sampling optimization for estimating human hand pose." ICCV 2015.
Fig. Finger detection and hand pose initialization [1]
Fig. Hierarchical sampling optimization using silver and gold energy [2]
Previous works for 3D Hand Pose Estimation
 Discriminative approaches
• Directly localize keypoints from the input depth image without hand model
• Most of the random forest- and recent deep learning-based methods (including V2V-PoseNet)
Fig. Pose-REN [1]
[1] Chen, Xinghao, et al. "Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation." Neurocomputing 2018.
[2] Ge, Liuhao, et al. "3D convolutional neural networks for efficient and robust hand pose estimation from single depth images." CVPR 2017.
[3] Ge, Liuhao, et al. "Robust 3D hand pose estimation in single depth images: from single-view CNN to multi-view CNNs." CVPR 2016.
Fig. 3D CNN for hand pose estimation [2]
Fig. Multi-view CNN for hand pose estimation [3]
Previous works for 3D Hand Pose Estimation
 Hybrid approaches
• Try to combine generative and discriminative approaches
• Learn latent space of pose (generative) and localize keypoints from the space (discriminative)
• Recent methods learned latent space successfully using adversarial loss
Fig. Learned latent space of CrossingNets [2]
[1] Zhou, Xingyi, et al. "Model-based deep hand pose estimation." IJCV 2016.
[2] Wan, Chengde, et al. "Crossing nets: Combining gans and vaes with a shared latent space for hand pose estimation." CVPR 2017.
Fig. Model-based hand pose estimation (DeepModel) [1]
Our Contributions
 First to cast 3D hand and human pose estimation from a single depth map as voxel-to-voxel prediction
 Empirically validated the usefulness of the volumetric input and output representations
 Significantly outperformed existing methods on almost all 3D hand and human pose estimation datasets
 Won first place in the HANDS 2017 frame-based 3D hand pose estimation challenge
Analysis of the Previous Works
 Most of the previous works take a 2D depth image and directly regress 3D coordinates
 P2C: [Chen et al. arXiv 2017], [CrossingNets. CVPR 2017], [DeepPrior++. ICCVW 2017], [Oberweger et al. ICCV 2015]
 P2V: [Pavlakos et al. CVPR 2017]
 V2C: [Ge et al. CVPR 2017], [Deng et al. arXiv 2017]
 V2V: Ours
 We argue that voxel-to-voxel prediction achieves more accurate results
Why Voxel-to-Voxel (V2V) is better ?
 Perspective distortion matters: what is perspective distortion?
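 $\begin{pmatrix} x_{pixel} \\ y_{pixel} \end{pmatrix} = \begin{pmatrix} x_{world} \\ y_{world} \end{pmatrix} \cdot \frac{FL}{z_{world}} + R_0$
 $R_0$: constant, $FL$: focal length (camera parameter), $z_{world}$: distance from the camera
 Different distances from the camera cause distortion
$(u, v) = (x_{pixel}, y_{pixel})$
$(X, Y, Z) = (x_{world}, y_{world}, z_{world})$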
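The projection formula above can be sketched in code to show why distance matters. This is only an illustration, not the authors' code: the function names, the camera values `FL = 500` and `R0 = (160, 120)`, and the reading of R0 as a per-axis pixel offset are assumptions made for the example.

```python
import numpy as np

def world_to_pixel(xyz, focal_length, offset):
    """Project world points (X, Y, Z) to pixels: (x, y) * FL / z + R0."""
    xyz = np.asarray(xyz, dtype=float)
    return xyz[:, :2] * focal_length / xyz[:, 2:3] + offset

def pixel_to_world(uv, depth, focal_length, offset):
    """Invert the projection using the depth value measured at each pixel."""
    uv = np.asarray(uv, dtype=float)
    depth = np.asarray(depth, dtype=float).reshape(-1, 1)
    xy = (uv - offset) * depth / focal_length
    return np.hstack([xy, depth])

# Hypothetical camera parameters, chosen only for illustration.
FL = 500.0
R0 = np.array([160.0, 120.0])

# The same 100 mm wide segment placed at two distances from the camera.
near = np.array([[-50.0, 0.0, 400.0], [50.0, 0.0, 400.0]])
far = np.array([[-50.0, 0.0, 800.0], [50.0, 0.0, 800.0]])

near_px = world_to_pixel(near, FL, R0)
far_px = world_to_pixel(far, FL, R0)
near_width = near_px[1, 0] - near_px[0, 0]  # width in pixels at z = 400
far_width = far_px[1, 0] - far_px[0, 0]     # width in pixels at z = 800
print(near_width, far_width)
```

With these assumed values the segment spans twice as many pixels at the nearer distance, which is the distortion the slide refers to. Converting back with `pixel_to_world` restores the same metric size at both distances.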
Why Voxel-to-Voxel (V2V) is better ?
 Perspective distortion matters:
Fig. 3D-to-2D projection of the 3D point cloud (world coordinates) onto the 2D depth map (pixel coordinates) through the camera; panel labels: 1-to-1 relation, N-to-1 relation, ±ΔX, ±ΔY
Why Voxel-to-Voxel (V2V) is better ?
 We discretize the 3D point cloud into voxels
 The voxelized 3D point cloud is free from perspective distortion
 Voxelized input can be adopted more easily by advanced CNN architectures (ResNet, U-Net) than point cloud input
Fig. Voxelization of the 3D point cloud (world coordinates) into voxels (world coordinates); both panels are labeled 1-to-1 relation
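A minimal sketch of the voxelization step, under stated assumptions: the grid holds binary occupancy values, the cube is axis-aligned around a reference point, and the 250 mm cube size is a hypothetical value. Only the 88-voxel grid resolution comes from the slides. The function name `voxelize` is introduced here for illustration.

```python
import numpy as np

def voxelize(points, center, cube_size=250.0, grid=88):
    """Binary occupancy grid of the points inside a cube around `center`.

    cube_size is a hypothetical edge length in the same unit as the points.
    """
    points = np.asarray(points, dtype=float)
    center = np.asarray(center, dtype=float)
    # Shift so the cube spans [0, cube_size), then convert to voxel indices.
    idx = np.floor((points - center + cube_size / 2.0) * grid / cube_size).astype(int)
    inside = np.all((idx >= 0) & (idx < grid), axis=1)
    volume = np.zeros((grid, grid, grid), dtype=np.float32)
    i, j, k = idx[inside].T
    volume[i, j, k] = 1.0  # mark every voxel that contains at least one point
    return volume

cloud = np.array([[1.0, 1.0, 1.0], [1000.0, 0.0, 0.0]])
occupancy = voxelize(cloud, center=[0.0, 0.0, 0.0])
print(occupancy.shape, occupancy.sum())
```

Points that fall outside the cube are dropped, so the choice of reference point decides which parts of the hand survive.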
Why Voxel-to-Voxel (V2V) is better ?
 Tompson et al. [1] argued that the mapping between an image and keypoint coordinates is highly non-linear
 Supervising the network with per-pixel likelihood (2D heatmap) gave more accurate results
 Most 2D human pose estimation methods learn to estimate 2D heatmaps (called detection-based)
 Our model estimates per-voxel likelihood (3D heatmap) instead of 3D coordinates
Fig. Overall architecture of Tompson et al. [1]
[1] Tompson, Jonathan J., et al. "Joint training of a convolutional network and a graphical model for human pose estimation." Advances in neural information processing systems. 2014.
Generating Input of the V2V-PoseNet
Some problems are here…
 Simple depth thresholding can exclude some parts of the hand or human body
 In contrast to regression-based methods (coordinate estimation), detection-based methods (heatmap estimation) cannot recover excluded parts
 Conventional strategy for input generation
• Take a depth map from a dataset
• Apply depth thresholding and calculate the center-of-mass (CoM)
• Draw a fixed-size cubic box around the CoM
• Project the cubic box onto the 2D image and crop the hand region
Fig. Cropped hand with the CoM marked; annotation: "Thumb is contained in the bounding box"
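The first two steps of the conventional strategy can be sketched as follows. This is an illustration under assumptions: the function name `center_of_mass` and the threshold values are hypothetical, and the CoM is returned in pixel coordinates plus mean depth rather than in world coordinates.

```python
import numpy as np

def center_of_mass(depth_map, near, far):
    """Return the (u, v, z) CoM of pixels with depth in [near, far], or None."""
    depth_map = np.asarray(depth_map, dtype=float)
    mask = (depth_map >= near) & (depth_map <= far)
    if not mask.any():
        return None  # thresholding removed every pixel
    v, u = np.nonzero(mask)  # row index is v, column index is u
    z = depth_map[mask]
    return np.array([u.mean(), v.mean(), z.mean()])

# Toy 4x4 depth map: background at 1000, a 2x2 "hand" at 300.
depth = np.full((4, 4), 1000.0)
depth[1:3, 1:3] = 300.0
print(center_of_mass(depth, near=200.0, far=400.0))
```

A threshold that is too tight silently drops pixels, which is the failure mode the slide warns about.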
Generating Input of the V2V-PoseNet
 We refine the estimated CoM using a simple network [1]
 The network takes the depth image cropped by the conventional method and outputs offsets to the correct CoM
 The depth image is converted to a 3D point cloud, and the hand is cropped in the voxelized 3D space by placing a fixed-size cubic box around the refined CoM
[1] Oberweger, Markus, and Vincent Lepetit. "DeepPrior++: Improving fast and accurate 3D hand pose estimation." ICCV workshop. Vol. 840. 2017.
Fig. Effect of the CoM refinement
• Crop hand following conventional protocol: CoM = (x, y, z)
• Forward the cropped hand image (CoM refinement network): Refined CoM = (x-0.8, y+0.1, z+0.3)
• Crop hand in the 3D space and voxelize it: Input of the V2V-PoseNet
Network Design
 Fully convolutional 3D CNN
 Takes voxelized depth map and estimates per-voxel likelihood (3D heatmap) of each keypoint
 Encoder and decoder enable the model to exploit multi-scale information
Network Design
 Volumetric BasicBlock: 3D Conv + 3D BN + ReLU
 Volumetric ResBlock: extended 2D Resblock [1] to 3D
 Volumetric DownSamplingBlock: 3D Max-pooling
 Volumetric UpSamplingBlock: 3D Deconv + 3D BN + ReLU
[1] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
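As a rough illustration of what the Volumetric DownSamplingBlock does to a feature volume, here is a NumPy sketch of non-overlapping 3D max-pooling. The pooling factor of 2 and the (channels, depth, height, width) layout are assumptions; the slides only state that the block is 3D max-pooling.

```python
import numpy as np

def max_pool3d(volume, factor=2):
    """Non-overlapping 3D max-pooling over a (C, D, H, W) feature volume."""
    c, d, h, w = volume.shape
    if d % factor or h % factor or w % factor:
        raise ValueError("spatial size must be divisible by the pooling factor")
    # Split every spatial axis into (blocks, factor) and reduce over the factors.
    blocks = volume.reshape(c, d // factor, factor,
                            h // factor, factor,
                            w // factor, factor)
    return blocks.max(axis=(2, 4, 6))

features = np.arange(64, dtype=float).reshape(1, 4, 4, 4)
pooled = max_pool3d(features)
print(pooled.shape)
```

Each pooling step halves every spatial dimension, so the voxel count drops by a factor of eight while the channel count is unchanged.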
Network Design
 The encoder decreases the resolution while increasing the number of channels
 The decoder increases the resolution while decreasing the number of channels
 Effectively extracts multi-scale information (downsampling-upsampling structure)
 Efficiently enlarges the receptive field (Volumetric DownSamplingBlock)
Network Design
 3D CNNs consume a lot of GPU memory -> careful architecture design is required
 Increasing the number of feature maps in every layer consumes too much memory
 We increased the number of feature maps only for the downsampled feature maps -> trade-off between the memory limit and performance
 Error decreases by 1.53 mm on the NYU dataset
Network Design
 Hourglass network [2] uses simple nearest-neighbor (NN) upsampling
 We use the Volumetric UpSamplingBlock (3D Deconv + 3D BN + ReLU) instead of NN -> error decreases
 Skip connections help to upsample the feature map more stably -> error decreases
[2] A. Newell, K. Yang, and J. Deng. Stacked hourglass networks for human pose estimation. In ECCV, 2016.
Implementation Details
 Ground-truth 3D heatmap is generated, wherein the mean of Gaussian peak is positioned at the
ground-truth joint location
• $H_n^*(i, j, k) = \exp\left(-\frac{(i - i_n)^2 + (j - j_n)^2 + (k - k_n)^2}{2\sigma^2}\right)$
• $H_n^*$ is the ground-truth 3D heatmap of the $n$th keypoint, and $(i_n, j_n, k_n)$ is the ground-truth voxel coordinate of the $n$th keypoint.
 Mean square error is adopted as a loss function
• $L = \sum_{n=1}^{N} \sum_{i,j,k} \left\| H_n^*(i, j, k) - H_n(i, j, k) \right\|^2$
• $H_n^*$ and $H_n$ are the ground-truth and estimated heatmaps for the $n$th keypoint, respectively, and $N$ denotes the number of keypoints
 88×88×88 voxel grid is fed to the network with data augmentation
• Rotation: [-40, 40] degrees in XY space
• Scaling: [0.8, 1.2] in XYZ space
• Translation: [-8, 8] voxels in 3D space
 Implemented in the Torch7 framework (will be reimplemented in PyTorch)
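The training targets, loss, and augmentation ranges listed above can be sketched in NumPy. This is a sketch under assumptions, not the released Torch7 code: the heatmap grid size of 44, `sigma=1.7`, the order of the augmentation transforms, rotation about the grid center, and all function names are chosen here for illustration. The augmentation ranges and the 88-voxel input grid come from the slides.

```python
import numpy as np

def gaussian_heatmap(keypoint_voxel, grid=44, sigma=1.7):
    """Ground-truth heatmap: a Gaussian peak centred on the keypoint voxel."""
    i_n, j_n, k_n = keypoint_voxel
    axis = np.arange(grid)
    i, j, k = np.meshgrid(axis, axis, axis, indexing="ij")
    sq_dist = (i - i_n) ** 2 + (j - j_n) ** 2 + (k - k_n) ** 2
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def heatmap_loss(gt_heatmaps, pred_heatmaps):
    """Sum of squared differences over all keypoints and voxels."""
    gt = np.asarray(gt_heatmaps, dtype=float)
    pred = np.asarray(pred_heatmaps, dtype=float)
    if gt.shape != pred.shape:
        raise ValueError("heatmap shapes must match")
    return float(np.sum((gt - pred) ** 2))

def sample_augmentation(rng):
    """Draw one set of augmentation parameters from the ranges on the slide."""
    return {
        "angle_deg": rng.uniform(-40.0, 40.0),           # rotation in XY
        "scale": rng.uniform(0.8, 1.2),                  # scaling in XYZ
        "translation": rng.uniform(-8.0, 8.0, size=3),   # voxels
    }

def augment_voxel_coords(coords, angle_deg, scale, translation, grid=88):
    """Rotate about the grid center in XY, scale in XYZ, then translate."""
    coords = np.asarray(coords, dtype=float)
    center = (grid - 1) / 2.0
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                    [np.sin(theta), np.cos(theta), 0.0],
                    [0.0, 0.0, 1.0]])
    return ((coords - center) @ rot.T) * scale + center + np.asarray(translation)

rng = np.random.default_rng(0)
params = sample_augmentation(rng)
target = gaussian_heatmap((10, 20, 30))
print(params["scale"], heatmap_loss(target[None], target[None]))
```

The same sampled parameters would have to be applied to both the input voxels and the keypoint coordinates so that the heatmap targets stay aligned with the augmented input.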
Datasets
 ICVL Hand Posture Dataset
• 330K training and 1.6K testing depth images
• 10 different subjects
 NYU Hand Pose Dataset
• 72K training and 8.2K testing depth images
 MSRA Hand Pose Dataset
• 76K depth images from 9 subjects with 17 gestures
• Leave-one-subject-out cross-validation
 HANDS 2017 Frame-based 3D Hand Pose Estimation Challenge Dataset
• 957K training and 295K testing depth images
• Five subjects in the training set and ten subjects in testing set
 ITOP Human Pose Dataset
• 40K training and 10K testing depth images of 20 subjects
• Front-view, top-view
Fig. Examples from the NYU Hand Pose, ICVL Hand Posture, MSRA Hand Pose, and ITOP Human Pose datasets
Evaluation Metrics
 3D distance error
• Euclidean distance between estimated keypoint and ground-truth coordinates in 3D space
 Percentage of success frame
• Success frame: the 3D distance errors of all keypoints are less than a threshold
• Ratio of success frames to all test frames
 mAP based on 10 cm rule
• An estimated keypoint is considered correct if its 3D distance to the ground truth is less than 10 cm
• Used for 3D human pose estimation
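The three metrics can be written compactly in NumPy. This is a sketch with assumed conventions: predictions and ground truth are arrays of shape (frames, keypoints, 3) in millimetres, so the 10 cm rule becomes a 100 mm threshold, and mAP is taken as the mean over keypoints of the per-keypoint fraction of correct frames. The function names are illustrative.

```python
import numpy as np

def keypoint_errors(pred, gt):
    """Euclidean distance per frame and keypoint; inputs are (F, N, 3)."""
    diff = np.asarray(pred, dtype=float) - np.asarray(gt, dtype=float)
    return np.linalg.norm(diff, axis=-1)

def mean_error(pred, gt):
    """Average 3D distance error over all frames and keypoints."""
    return float(keypoint_errors(pred, gt).mean())

def success_frame_ratio(pred, gt, threshold):
    """Fraction of frames in which every keypoint error is below threshold."""
    ok = keypoint_errors(pred, gt) < threshold
    return float(ok.all(axis=1).mean())

def map_10cm(pred, gt, threshold=100.0):
    """Mean over keypoints of the fraction of frames within the threshold."""
    correct = keypoint_errors(pred, gt) < threshold
    return float(correct.mean(axis=0).mean())

gt = np.zeros((2, 2, 3))
pred = gt.copy()
pred[0, 0, 0] = 3.0     # 3 mm error on frame 0, keypoint 0
pred[1, 1, 0] = 150.0   # 150 mm error on frame 1, keypoint 1
print(mean_error(pred, gt), success_frame_ratio(pred, gt, 10.0), map_10cm(pred, gt))
```

The success-frame metric is the strictest of the three, since a single bad keypoint fails the whole frame.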
Computational Complexity
 Training time
• 2 days for ICVL dataset (330K training images)
• 12 hours each for the NYU and MSRA datasets (about 70K training images)
• 6 days for HANDS 2017 challenge dataset (957K training images)
• 3 hours for ITOP dataset (40K training images)
 Testing time
• 35 fps on a single-GPU machine (NVIDIA TITAN X, without ensemble)
• Fast enough for real-time use in real-world applications
• Input generation (reference point refinement + voxelizing): 23 ms (most of the time is spent on voxelizing)
• Network forwarding: 5 ms
• Extracting 3D coordinates from the 3D heatmaps: 0.5 ms
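The last step, extracting 3D coordinates from the heatmaps, might look like the sketch below. It assumes the keypoint is taken at the voxel with the highest likelihood and mapped to the voxel center in world space. The function name, the cube size, and the voxel-center convention are assumptions for illustration.

```python
import numpy as np

def heatmaps_to_world(heatmaps, center, cube_size=250.0):
    """Map the argmax voxel of each (N, G, G, G) heatmap to world coordinates."""
    heatmaps = np.asarray(heatmaps)
    n, grid = heatmaps.shape[0], heatmaps.shape[1]
    flat_idx = heatmaps.reshape(n, -1).argmax(axis=1)
    voxels = np.stack(np.unravel_index(flat_idx, heatmaps.shape[1:]), axis=1)
    voxel_size = cube_size / grid
    # Use voxel centers, so index 0 lies half a voxel inside the cube.
    offset = (voxels + 0.5) * voxel_size - cube_size / 2.0
    return offset + np.asarray(center, dtype=float)

maps = np.zeros((2, 4, 4, 4))
maps[0, 0, 0, 0] = 1.0
maps[1, 3, 2, 1] = 1.0
print(heatmaps_to_world(maps, center=[0.0, 0.0, 0.0], cube_size=4.0))
```

Taking the argmax limits precision to one voxel, so the cube size and grid resolution together bound the achievable accuracy of this step.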
Ablation Study
 Effect of input-output representation
Table. Performance and number of parameters according to the input and output types
 Converting the 2D depth map to a 3D voxelized grid improves performance
 Estimating the per-voxel likelihood (3D heatmap) gives more accurate estimation than directly regressing 3D coordinates
 The table shows the benefit of the volumetric input and output representations
Ablation Study
Fig. Reference point (ref. pt) refinement network (localization refinement)
 The epoch ensemble averages the estimations of models from several epochs
 In contrast to other ensemble techniques, it ensembles models from a single training run
 We used the models from all epochs (10 epochs) for the ensemble
 In a multi-GPU environment, it does not increase the running time
 More accurate and robust estimation
Fig. Effect of the localization refinement
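The epoch ensemble can be sketched as a plain average over the outputs of the per-epoch checkpoints. The assumption here is that the averaged quantity is the estimated 3D coordinate of each keypoint. The function name and array layout are illustrative.

```python
import numpy as np

def epoch_ensemble(per_epoch_predictions):
    """Average (E, N, 3) keypoint estimates from E epoch checkpoints to (N, 3)."""
    preds = np.asarray(per_epoch_predictions, dtype=float)
    if preds.ndim != 3 or preds.shape[-1] != 3:
        raise ValueError("expected an array of shape (epochs, keypoints, 3)")
    return preds.mean(axis=0)

# Two checkpoints, one keypoint each.
estimates = np.array([[[0.0, 0.0, 0.0]],
                      [[2.0, 4.0, 6.0]]])
print(epoch_ensemble(estimates))
```

Because each checkpoint runs independently, the forward passes can be spread across GPUs, which is consistent with the slide's note about running time.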
Quantitative Results: Hand Pose
ICVL NYU MSRA
Quantitative Results: Hand Pose
Quantitative Results: Hand Pose
HANDS 2017 Challenge Results
HANDS 2017 Challenge Results
HANDS 2017 Challenge Results
Qualitative Results
 ICVL dataset
Qualitative Results
 NYU dataset
Qualitative Results
 MSRA dataset
Qualitative Results
 HANDS 2017 Challenge dataset
Quantitative Results: Human Pose
 ITOP dataset
Qualitative Results
 ITOP dataset: Front View
Qualitative Results
 ITOP dataset: Top View
Qualitative Results
 ICVL dataset: Frame-based
Video
Qualitative Results
 NYU dataset: Frame-based
Video
Qualitative Results
 MSRA dataset (grouped by gesture): Frame-based
Video
Conclusion
 We proposed a novel and powerful network, V2V-PoseNet, for 3D hand and human pose estimation from a single depth map
 Converted the 2D depth map into a 3D voxel representation and estimated the per-voxel likelihood (3D heatmap) for each keypoint instead of directly regressing 3D coordinates
 Significantly outperformed existing methods on almost all 3D hand and human pose estimation datasets
 Achieved 1st place in the HANDS 2017 frame-based 3D hand pose estimation challenge
 Learning physical constraints via a generative approach and improving the encoder-decoder for multi-scale information are future work
 Code is available: https://github.com/mks0601/V2V-PoseNet_RELEASE
Thank you
http://cv.snu.ac.kr

Contenu connexe

Tendances

PRML輪読#9
PRML輪読#9PRML輪読#9
PRML輪読#9matsuolab
 
人工知能、意識、ベルクソン(応用哲学会ー〈意識の遅延テーゼ〉の行為論的射程̶講演資料)
人工知能、意識、ベルクソン(応用哲学会ー〈意識の遅延テーゼ〉の行為論的射程̶講演資料)人工知能、意識、ベルクソン(応用哲学会ー〈意識の遅延テーゼ〉の行為論的射程̶講演資料)
人工知能、意識、ベルクソン(応用哲学会ー〈意識の遅延テーゼ〉の行為論的射程̶講演資料)Youichiro Miyake
 
ベキ零行列とジョルダン標準形
ベキ零行列とジョルダン標準形ベキ零行列とジョルダン標準形
ベキ零行列とジョルダン標準形HanpenRobot
 
ベイズ深層学習5章 ニューラルネットワークのベイズ推論 Bayesian deep learning
ベイズ深層学習5章 ニューラルネットワークのベイズ推論 Bayesian deep learningベイズ深層学習5章 ニューラルネットワークのベイズ推論 Bayesian deep learning
ベイズ深層学習5章 ニューラルネットワークのベイズ推論 Bayesian deep learningssuserca2822
 
深層学習に基づく音響特徴量からの振幅スペクトログラム予測
深層学習に基づく音響特徴量からの振幅スペクトログラム予測深層学習に基づく音響特徴量からの振幅スペクトログラム予測
深層学習に基づく音響特徴量からの振幅スペクトログラム予測Kitamura Laboratory
 
環境音の特徴を活用した音響イベント検出・シーン分類
環境音の特徴を活用した音響イベント検出・シーン分類環境音の特徴を活用した音響イベント検出・シーン分類
環境音の特徴を活用した音響イベント検出・シーン分類Keisuke Imoto
 
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)Daichi Kitamura
 
異常検知と変化検知 9章 部分空間法による変化点検知
異常検知と変化検知 9章 部分空間法による変化点検知異常検知と変化検知 9章 部分空間法による変化点検知
異常検知と変化検知 9章 部分空間法による変化点検知hagino 3000
 
双曲平面のモデルと初等幾何
双曲平面のモデルと初等幾何双曲平面のモデルと初等幾何
双曲平面のモデルと初等幾何matsumoring
 
RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)
RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)
RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)Takuma Yagi
 
マルコフ連鎖モンテカルロ法
マルコフ連鎖モンテカルロ法マルコフ連鎖モンテカルロ法
マルコフ連鎖モンテカルロ法Masafumi Enomoto
 
ウィナーフィルタと適応フィルタ
ウィナーフィルタと適応フィルタウィナーフィルタと適応フィルタ
ウィナーフィルタと適応フィルタToshihisa Tanaka
 
円形アレイを用いた水平面3次元音場の収録と再現
円形アレイを用いた水平面3次元音場の収録と再現円形アレイを用いた水平面3次元音場の収録と再現
円形アレイを用いた水平面3次元音場の収録と再現Takuma_OKAMOTO
 
条件付き確率場の推論と学習
条件付き確率場の推論と学習条件付き確率場の推論と学習
条件付き確率場の推論と学習Masaki Saito
 
魅せる・際立つ・役立つグラフ Hands on!! ggplot2!! ~導入編~
魅せる・際立つ・役立つグラフ Hands on!! ggplot2!! ~導入編~魅せる・際立つ・役立つグラフ Hands on!! ggplot2!! ~導入編~
魅せる・際立つ・役立つグラフ Hands on!! ggplot2!! ~導入編~MrUnadon
 
[DL輪読会] MoCoGAN: Decomposing Motion and Content for Video Generation
[DL輪読会] MoCoGAN: Decomposing Motion and Content for Video Generation[DL輪読会] MoCoGAN: Decomposing Motion and Content for Video Generation
[DL輪読会] MoCoGAN: Decomposing Motion and Content for Video GenerationDeep Learning JP
 
Recent Progress on Single-Image Super-Resolution
Recent Progress on Single-Image Super-ResolutionRecent Progress on Single-Image Super-Resolution
Recent Progress on Single-Image Super-ResolutionHiroto Honda
 

Tendances (20)

PRML輪読#9
PRML輪読#9PRML輪読#9
PRML輪読#9
 
人工知能、意識、ベルクソン(応用哲学会ー〈意識の遅延テーゼ〉の行為論的射程̶講演資料)
人工知能、意識、ベルクソン(応用哲学会ー〈意識の遅延テーゼ〉の行為論的射程̶講演資料)人工知能、意識、ベルクソン(応用哲学会ー〈意識の遅延テーゼ〉の行為論的射程̶講演資料)
人工知能、意識、ベルクソン(応用哲学会ー〈意識の遅延テーゼ〉の行為論的射程̶講演資料)
 
ベキ零行列とジョルダン標準形
ベキ零行列とジョルダン標準形ベキ零行列とジョルダン標準形
ベキ零行列とジョルダン標準形
 
ベイズ深層学習5章 ニューラルネットワークのベイズ推論 Bayesian deep learning
ベイズ深層学習5章 ニューラルネットワークのベイズ推論 Bayesian deep learningベイズ深層学習5章 ニューラルネットワークのベイズ推論 Bayesian deep learning
ベイズ深層学習5章 ニューラルネットワークのベイズ推論 Bayesian deep learning
 
深層学習に基づく音響特徴量からの振幅スペクトログラム予測
深層学習に基づく音響特徴量からの振幅スペクトログラム予測深層学習に基づく音響特徴量からの振幅スペクトログラム予測
深層学習に基づく音響特徴量からの振幅スペクトログラム予測
 
Chapter11.2
Chapter11.2Chapter11.2
Chapter11.2
 
環境音の特徴を活用した音響イベント検出・シーン分類
環境音の特徴を活用した音響イベント検出・シーン分類環境音の特徴を活用した音響イベント検出・シーン分類
環境音の特徴を活用した音響イベント検出・シーン分類
 
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
近接分離最適化によるブラインド⾳源分離(Blind source separation via proximal splitting algorithm)
 
PRML11章
PRML11章PRML11章
PRML11章
 
異常検知と変化検知 9章 部分空間法による変化点検知
異常検知と変化検知 9章 部分空間法による変化点検知異常検知と変化検知 9章 部分空間法による変化点検知
異常検知と変化検知 9章 部分空間法による変化点検知
 
Maxout networks
Maxout networksMaxout networks
Maxout networks
 
双曲平面のモデルと初等幾何
双曲平面のモデルと初等幾何双曲平面のモデルと初等幾何
双曲平面のモデルと初等幾何
 
RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)
RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)
RBM、Deep Learningと学習(全脳アーキテクチャ若手の会 第3回DL勉強会発表資料)
 
マルコフ連鎖モンテカルロ法
マルコフ連鎖モンテカルロ法マルコフ連鎖モンテカルロ法
マルコフ連鎖モンテカルロ法
 
ウィナーフィルタと適応フィルタ
ウィナーフィルタと適応フィルタウィナーフィルタと適応フィルタ
ウィナーフィルタと適応フィルタ
 
円形アレイを用いた水平面3次元音場の収録と再現
円形アレイを用いた水平面3次元音場の収録と再現円形アレイを用いた水平面3次元音場の収録と再現
円形アレイを用いた水平面3次元音場の収録と再現
 
条件付き確率場の推論と学習
条件付き確率場の推論と学習条件付き確率場の推論と学習
条件付き確率場の推論と学習
 
魅せる・際立つ・役立つグラフ Hands on!! ggplot2!! ~導入編~
魅せる・際立つ・役立つグラフ Hands on!! ggplot2!! ~導入編~魅せる・際立つ・役立つグラフ Hands on!! ggplot2!! ~導入編~
魅せる・際立つ・役立つグラフ Hands on!! ggplot2!! ~導入編~
 
[DL輪読会] MoCoGAN: Decomposing Motion and Content for Video Generation
[DL輪読会] MoCoGAN: Decomposing Motion and Content for Video Generation[DL輪読会] MoCoGAN: Decomposing Motion and Content for Video Generation
[DL輪読会] MoCoGAN: Decomposing Motion and Content for Video Generation
 
Recent Progress on Single-Image Super-Resolution
Recent Progress on Single-Image Super-ResolutionRecent Progress on Single-Image Super-Resolution
Recent Progress on Single-Image Super-Resolution
 

Similaire à V2 v posenet

FastV2C-HandNet - ICICC 2020
FastV2C-HandNet - ICICC 2020FastV2C-HandNet - ICICC 2020
FastV2C-HandNet - ICICC 2020RohanLekhwani
 
A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUES
A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUESA STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUES
A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUEScscpconf
 
Ijarcet vol-2-issue-3-938-941
Ijarcet vol-2-issue-3-938-941Ijarcet vol-2-issue-3-938-941
Ijarcet vol-2-issue-3-938-941Editor IJARCET
 
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...Seiya Ito
 
A Wireless Network Infrastructure Architecture for Rural Communities
A Wireless Network Infrastructure Architecture for Rural CommunitiesA Wireless Network Infrastructure Architecture for Rural Communities
A Wireless Network Infrastructure Architecture for Rural CommunitiesAIRCC Publishing Corporation
 
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate...
 Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate... Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate...
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate...AIRCC Publishing Corporation
 
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...AIRCC Publishing Corporation
 
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...ijcsit
 
A Review Paper on Real-Time Hand Motion Capture
A Review Paper on Real-Time Hand Motion CaptureA Review Paper on Real-Time Hand Motion Capture
A Review Paper on Real-Time Hand Motion CaptureIRJET Journal
 
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing DevicesFrom Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devicestoukaigi
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - Hiroshi Fukui
 
Обучение нейросетей компьютерного зрения в видеоиграх
Обучение нейросетей компьютерного зрения в видеоиграхОбучение нейросетей компьютерного зрения в видеоиграх
Обучение нейросетей компьютерного зрения в видеоиграхAnatol Alizar
 
Implementation of a modified counterpropagation neural network model in onlin...
Implementation of a modified counterpropagation neural network model in onlin...Implementation of a modified counterpropagation neural network model in onlin...
Implementation of a modified counterpropagation neural network model in onlin...Alexander Decker
 
Performance Improvement of Vector Quantization with Bit-parallelism Hardware
Performance Improvement of Vector Quantization with Bit-parallelism HardwarePerformance Improvement of Vector Quantization with Bit-parallelism Hardware
Performance Improvement of Vector Quantization with Bit-parallelism HardwareCSCJournals
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdfmokamojah
 
A systematic image compression in the combination of linear vector quantisati...
A systematic image compression in the combination of linear vector quantisati...A systematic image compression in the combination of linear vector quantisati...
A systematic image compression in the combination of linear vector quantisati...eSAT Publishing House
 

Similaire à V2 v posenet (20)

FastV2C-HandNet - ICICC 2020
FastV2C-HandNet - ICICC 2020FastV2C-HandNet - ICICC 2020
FastV2C-HandNet - ICICC 2020
 
A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUES
A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUESA STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUES
A STUDY AND ANALYSIS OF DIFFERENT EDGE DETECTION TECHNIQUES
 
Ijarcet vol-2-issue-3-938-941
Ijarcet vol-2-issue-3-938-941Ijarcet vol-2-issue-3-938-941
Ijarcet vol-2-issue-3-938-941
 
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
[論文紹介] BlendedMVS: A Large-scale Dataset for Generalized Multi-view Stereo Ne...
 
A Wireless Network Infrastructure Architecture for Rural Communities
A Wireless Network Infrastructure Architecture for Rural CommunitiesA Wireless Network Infrastructure Architecture for Rural Communities
A Wireless Network Infrastructure Architecture for Rural Communities
 
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate...
 Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate... Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate...
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrate...
 
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...
Complete End-to-End Low Cost Solution to a 3D Scanning System with Integrated...
 
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...
COMPLETE END-TO-END LOW COST SOLUTION TO A 3D SCANNING SYSTEM WITH INTEGRATED...
 
A Review Paper on Real-Time Hand Motion Capture
A Review Paper on Real-Time Hand Motion CaptureA Review Paper on Real-Time Hand Motion Capture
A Review Paper on Real-Time Hand Motion Capture
 
crowd counting.pptx
crowd counting.pptxcrowd counting.pptx
crowd counting.pptx
 
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing DevicesFrom Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices
From Sense to Print: Towards Automatic 3D Printing from 3D Sensing Devices
 
最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に - 最近の研究情勢についていくために - Deep Learningを中心に -
最近の研究情勢についていくために - Deep Learningを中心に -
 
Обучение нейросетей компьютерного зрения в видеоиграх
Обучение нейросетей компьютерного зрения в видеоиграхОбучение нейросетей компьютерного зрения в видеоиграх
Обучение нейросетей компьютерного зрения в видеоиграх
 
Implementation of a modified counterpropagation neural network model in onlin...
Implementation of a modified counterpropagation neural network model in onlin...Implementation of a modified counterpropagation neural network model in onlin...
Implementation of a modified counterpropagation neural network model in onlin...
 
538 207-219
538 207-219538 207-219
538 207-219
 
Performance Improvement of Vector Quantization with Bit-parallelism Hardware
Performance Improvement of Vector Quantization with Bit-parallelism HardwarePerformance Improvement of Vector Quantization with Bit-parallelism Hardware
Performance Improvement of Vector Quantization with Bit-parallelism Hardware
 
Computer graphics
Computer graphicsComputer graphics
Computer graphics
 
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf10.1109@ICCMC48092.2020.ICCMC-000167.pdf
10.1109@ICCMC48092.2020.ICCMC-000167.pdf
 
mini prjt
mini prjtmini prjt
mini prjt
 
A systematic image compression in the combination of linear vector quantisati...
A systematic image compression in the combination of linear vector quantisati...A systematic image compression in the combination of linear vector quantisati...
A systematic image compression in the combination of linear vector quantisati...
 

Plus de NAVER Engineering

디자인 시스템에 직방 ZUIX
디자인 시스템에 직방 ZUIX디자인 시스템에 직방 ZUIX
디자인 시스템에 직방 ZUIXNAVER Engineering
 
진화하는 디자인 시스템(걸음마 편)
진화하는 디자인 시스템(걸음마 편)진화하는 디자인 시스템(걸음마 편)
진화하는 디자인 시스템(걸음마 편)NAVER Engineering
 
서비스 운영을 위한 디자인시스템 프로젝트
서비스 운영을 위한 디자인시스템 프로젝트서비스 운영을 위한 디자인시스템 프로젝트
서비스 운영을 위한 디자인시스템 프로젝트NAVER Engineering
 
BPL(Banksalad Product Language) 무야호
BPL(Banksalad Product Language) 무야호BPL(Banksalad Product Language) 무야호
BPL(Banksalad Product Language) 무야호NAVER Engineering
 
이번 생에 디자인 시스템은 처음이라
이번 생에 디자인 시스템은 처음이라이번 생에 디자인 시스템은 처음이라
이번 생에 디자인 시스템은 처음이라NAVER Engineering
 
날고 있는 여러 비행기 넘나 들며 정비하기
날고 있는 여러 비행기 넘나 들며 정비하기날고 있는 여러 비행기 넘나 들며 정비하기
날고 있는 여러 비행기 넘나 들며 정비하기NAVER Engineering
 
쏘카프레임 구축 배경과 과정
 쏘카프레임 구축 배경과 과정 쏘카프레임 구축 배경과 과정
쏘카프레임 구축 배경과 과정NAVER Engineering
 
플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기
플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기
플랫폼 디자이너 없이 디자인 시스템을 구축하는 프로덕트 디자이너의 우당탕탕 고통 연대기NAVER Engineering
 
200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)
200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)
200820 NAVER TECH CONCERT 15_Code Review is Horse(코드리뷰는 말이야)(feat.Latte)NAVER Engineering
 
200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드
200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드
200819 NAVER TECH CONCERT 03_화려한 코루틴이 내 앱을 감싸네! 코루틴으로 작성해보는 깔끔한 비동기 코드NAVER Engineering
 
200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기
200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기
200819 NAVER TECH CONCERT 10_맥북에서도 아이맥프로에서 빌드하는 것처럼 빌드 속도 빠르게 하기NAVER Engineering
 
200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활
200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활
200819 NAVER TECH CONCERT 08_성능을 고민하는 슬기로운 개발자 생활NAVER Engineering
 
200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출
200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출
200819 NAVER TECH CONCERT 05_모르면 손해보는 Android 디버깅/분석 꿀팁 대방출NAVER Engineering
 
200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우
200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우
200819 NAVER TECH CONCERT 09_Case.xcodeproj - 좋은 동료로 거듭나기 위한 노하우NAVER Engineering
 
200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...
200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...
200820 NAVER TECH CONCERT 14_야 너두 할 수 있어. 비전공자, COBOL 개발자를 거쳐 네이버에서 FE 개발하게 된...NAVER Engineering
 
200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법
200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법
200820 NAVER TECH CONCERT 13_네이버에서 오픈 소스 개발을 통해 성장하는 방법NAVER Engineering
 
200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며
200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며
200820 NAVER TECH CONCERT 12_상반기 네이버 인턴을 돌아보며NAVER Engineering
 
200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기
200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기
200820 NAVER TECH CONCERT 11_빠르게 성장하는 슈퍼루키로 거듭나기NAVER Engineering
 
200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기
200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기
200819 NAVER TECH CONCERT 07_신입 iOS 개발자 개발업무 적응기NAVER Engineering
 

V2V-PoseNet

  • 1. V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
      Gyeongsik Moon, Ju Yong Chang, Kyoung Mu Lee
      Computer Vision Lab., Dept. of ECE, ASRI, Seoul National University, http://cv.snu.ac.kr
      Invited Talk @ NAVER, Aug 29, 2018
      • Winner of the 2017 Hands in the Million Challenge on 3D Hand Pose Estimation
  • 2. The 2017 Hands in the Million Challenge on 3D Hand Pose Estimation
  • 3. HANDS 2017 3D Hand Pose Estimation Challenge
      • We won the challenge (ranked 1st among 15 entries)
      • Frame-based 3D Hand Pose Estimation: V2V-PoseNet
  • 4. 3D Hand Pose Estimation
      • Goal: localize hand keypoints (joints) from a single depth map
      • Fig. 3D hand model: 21 keypoints (joints)
  • 5. 3D Hand Pose Estimation
      • Still a hot topic: more than 16,000 publications over the last 5 years
  • 6. Applications
      • Oculus Rift, Microsoft HoloLens
      • Crucial technique for HCI and AR
  • 7. What are the Challenges?
      • Diverse geometric (shape) variations
      • Weak appearance features
      • Heavy self-occlusions
      • Self-similarity
      • Noise
  • 8. Previous works for 3D Hand Pose Estimation
      • Generative approaches
          • Assume a pre-defined hand model and fit it to the input depth image
          • Use PSO or ICP to minimize a hand-crafted cost function
      • Fig. Finger detection and hand pose initialization [1]
      • Fig. Hierarchical sampling optimization using silver and gold energy [2]
      [1] C. Qian, et al. "Realtime and robust hand tracking from depth." CVPR 2014.
      [2] Tang, Danhang, et al. "Opening the black box: Hierarchical sampling optimization for estimating human hand pose." ICCV 2015.
  • 9. Previous works for 3D Hand Pose Estimation
      • Discriminative approaches
          • Directly localize keypoints from the input depth image without a hand model
          • Cover most random-forest-based and recent deep-learning-based methods (including V2V-PoseNet)
      • Fig. Pose-REN [1]
      • Fig. 3D CNN for hand pose estimation [2]
      • Fig. Multi-view CNN for hand pose estimation [3]
      [1] Chen, Xinghao, et al. "Pose Guided Structured Region Ensemble Network for Cascaded Hand Pose Estimation." Neurocomputing 2018.
      [2] Ge, Liuhao, et al. "3D convolutional neural networks for efficient and robust hand pose estimation from single depth images." CVPR 2017.
      [3] Ge, Liuhao, et al. "Robust 3D hand pose estimation in single depth images: from single-view CNN to multi-view CNNs." CVPR 2016.
  • 10. Previous works for 3D Hand Pose Estimation
      • Hybrid approaches
          • Try to combine generative and discriminative approaches
          • Learn a latent space of poses (generative) and localize keypoints from that space (discriminative)
          • Recent methods learned the latent space successfully using an adversarial loss
      • Fig. Model-based hand pose estimation (DeepModel) [1]
      • Fig. Learned latent space of CrossingNets [2]
      [1] Zhou, Xingyi, et al. "Model-based deep hand pose estimation." IJCAI 2016.
      [2] Wan, Chengde, et al. "Crossing nets: Combining GANs and VAEs with a shared latent space for hand pose estimation." CVPR 2017.
  • 11. Our Contributions
      • First to cast 3D hand and human pose estimation from a single depth map as voxel-to-voxel prediction
      • Empirically validated the usefulness of the volumetric input and output representations
      • Significantly outperformed existing methods on almost all 3D hand and human pose estimation datasets
      • Won first place in the HANDS 2017 frame-based 3D hand pose estimation challenge
  • 12. Analysis of the Previous Works
      • Most previous works take a 2D depth image and directly regress 3D coordinates
          • P2C: [Chen et al. arXiv 2017], [CrossingNets. CVPR 2017], [DeepPrior++. ICCVW 2017], [Oberweger et al. ICCV 2015]
          • P2V: [Pavlakos et al. CVPR 2017]
          • V2C: [Ge et al. CVPR 2017], [Deng et al. arXiv 2017]
          • V2V: Ours
      • We argue that voxel-to-voxel prediction achieves more accurate results
  • 13. Why Voxel-to-Voxel (V2V) is better?
      • Perspective distortion matters: what is perspective distortion?
  • 14. Why Voxel-to-Voxel (V2V) is better?
      • Perspective distortion matters: what is perspective distortion?
      • $\begin{pmatrix} x_{pixel} \\ y_{pixel} \end{pmatrix} = \begin{pmatrix} x_{world} \\ y_{world} \end{pmatrix} \cdot \frac{FL}{z_{world}} + R_0$
      • $R_0$: constant, $FL$: focal length (camera parameter), $z_{world}$: distance from the camera
      • Different distances from the camera cause distortion
      • $(u, v) = (x_{pixel}, y_{pixel})$, $(X, Y, Z) = (x_{world}, y_{world}, z_{world})$
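To make the relation on this slide concrete, here is a minimal sketch of the projection and its inverse. The focal length `FL`, the offset `R0`, and the helper names `project` and `back_project` are illustrative assumptions, not values or code from the talk. It shows that the same world-space offset covers a different number of pixels depending on the distance from the camera.

```python
# Illustrative values only: FL and R0 are assumptions, not the camera
# parameters of any dataset used in the talk.
FL = 500.0            # focal length in pixels
R0 = (160.0, 120.0)   # constant offset in pixels


def project(x_world, y_world, z_world):
    """World (X, Y, Z) -> pixel (u, v), following the slide's relation."""
    u = x_world * FL / z_world + R0[0]
    v = y_world * FL / z_world + R0[1]
    return u, v


def back_project(u, v, z_world):
    """Pixel (u, v) plus its depth value -> world (X, Y, Z)."""
    x_world = (u - R0[0]) * z_world / FL
    y_world = (v - R0[1]) * z_world / FL
    return x_world, y_world, z_world


# The same 50 mm world-space offset seen at two distances from the camera.
u_near, _ = project(50.0, 0.0, 400.0)   # 400 mm away
u_far, _ = project(50.0, 0.0, 800.0)    # 800 mm away
shift_near = u_near - R0[0]             # 62.5 pixels
shift_far = u_far - R0[0]               # 31.25 pixels
```

Doubling the distance halves the pixel offset, which is the distortion a 2D depth image carries and a world-space representation does not.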
  • 15. Why Voxel-to-Voxel (V2V) is better?
      • Perspective distortion matters
      • Fig. 3D-to-2D projection through the camera: 3D point cloud ($coord_{world}$, 1-to-1 relation) and 2D depth maps ($coord_{pixel}$, N-to-1 relation), with offsets ±ΔX and ±ΔY
  • 16. Why Voxel-to-Voxel (V2V) is better?
      • We discretize the 3D point cloud into voxels
      • The voxelized 3D point cloud is free from perspective distortion
      • Voxelized input is easier to feed to advanced CNN architectures (ResNet, U-Net) than point-cloud input
      • Fig. Voxelization: 3D point cloud ($coord_{world}$) to voxels ($coord_{world}$), 1-to-1 relation in both
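The discretization step can be sketched as follows. This is a simplified illustration under stated assumptions: the function name `voxelize`, the binary occupancy encoding, and the cube and grid sizes in the example are all hypothetical, and the sketch uses plain Python lists to stay self-contained.

```python
def voxelize(points, center, cube_size, grid_size):
    """Return a grid_size^3 occupancy grid (nested lists of 0/1).

    points    : iterable of (X, Y, Z) world coordinates
    center    : (X, Y, Z) center of the cubic box
    cube_size : edge length of the cubic box, same unit as the points
    grid_size : number of voxels along each axis
    """
    grid = [[[0] * grid_size for _ in range(grid_size)]
            for _ in range(grid_size)]
    voxel_len = cube_size / grid_size
    for p in points:
        idx = []
        for axis in range(3):
            # Shift so the cube's corner is the origin, then quantize.
            offset = p[axis] - (center[axis] - cube_size / 2.0)
            idx.append(int(offset // voxel_len))
        if all(0 <= i < grid_size for i in idx):   # drop points outside the cube
            grid[idx[0]][idx[1]][idx[2]] = 1
    return grid


# Three world-space points; the last one lies outside the 200 mm cube.
points = [(0.0, 0.0, 400.0), (30.0, 0.0, 400.0), (500.0, 0.0, 400.0)]
grid = voxelize(points, center=(0.0, 0.0, 400.0), cube_size=200.0, grid_size=8)
occupied = sum(v for plane in grid for row in plane for v in row)
```

Because the grid is indexed in world coordinates, a voxel's size is the same regardless of how far the hand is from the camera.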
  • 17. Why Voxel-to-Voxel (V2V) is better?
      • Tompson et al. [1] argued that the mapping between an image and keypoint coordinates is highly non-linear
      • Supervising the network with per-pixel likelihoods (2D heatmaps) gave more accurate results
      • Most 2D human pose estimation methods learn to estimate 2D heatmaps (called detection-based)
      • Our model estimates per-voxel likelihoods (3D heatmaps) instead of 3D coordinates
      • Fig. Overall architecture of Tompson et al. [1]
      [1] Tompson, Jonathan J., et al. "Joint training of a convolutional network and a graphical model for human pose estimation." Advances in Neural Information Processing Systems. 2014.
  • 18. Generating Input of the V2V-PoseNet
      • Conventional strategy for input generation
          • Take a depth map from a dataset
          • Apply depth thresholding and calculate the center-of-mass (CoM)
          • Draw a fixed-size cubic box around the CoM
          • Project the cubic box onto the 2D image and crop the hand region
      • Some problems are here…
          • Simple depth thresholding can exclude some parts of the hand or human body
          • In contrast to regression-based methods (coordinate estimation), detection-based methods (heatmap estimation) cannot recover excluded parts
      • Fig. labels: CoM; Thumb is contained in the bounding box
  • 19. Generating Input of the V2V-PoseNet
      • We refine the estimated CoM using a simple network [1]
      • The network takes the depth image cropped by the conventional method and outputs offsets to the correct CoM
      • The depth image is converted to a 3D point cloud, and the hand is cropped in the voxelized 3D space by placing a fixed-size cubic box around the refined CoM
      • Fig. Effect of the CoM refinement: crop hand following the conventional protocol, CoM = (x, y, z); forward the cropped hand image (CoM refinement network), refined CoM = (x-0.8, y+0.1, z+0.3); crop hand in the 3D space and voxelize it to obtain the input of the V2V-PoseNet
      [1] Oberweger, Markus, and Vincent Lepetit. "DeepPrior++: Improving fast and accurate 3D hand pose estimation." ICCV workshop. Vol. 840. 2017.
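The cropping pipeline on the two slides above can be sketched in a few lines. This is only an illustration: the refinement network is replaced by a fixed offset (the example offset from the figure), and the function names, the depth threshold, and the cube size are hypothetical.

```python
def center_of_mass(points, max_depth):
    """CoM of the world-space points that survive depth thresholding."""
    kept = [p for p in points if p[2] < max_depth]
    if not kept:
        raise ValueError("no point passed the depth threshold")
    n = len(kept)
    return tuple(sum(p[axis] for p in kept) / n for axis in range(3))


def crop_cube(points, center, cube_size):
    """Keep the points inside a fixed-size cubic box around the center."""
    half = cube_size / 2.0
    return [p for p in points
            if all(abs(p[axis] - center[axis]) <= half for axis in range(3))]


# Toy point cloud in millimeters: three hand points and one background point.
points = [(0.0, 0.0, 400.0), (20.0, 0.0, 400.0),
          (130.0, 0.0, 400.0), (0.0, 0.0, 900.0)]

com = center_of_mass(points, max_depth=600.0)      # (50.0, 0.0, 400.0)

# Stand-in for the refinement network: in the talk the offsets are predicted
# from the cropped depth image; here they are the figure's example values.
offset = (-0.8, 0.1, 0.3)
refined_com = tuple(c + o for c, o in zip(com, offset))

hand = crop_cube(points, refined_com, cube_size=200.0)   # 3 hand points
```

The cropped points would then be voxelized around `refined_com` to form the network input.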
  • 20. Network Design
      • Fully convolutional 3D CNN
      • Takes a voxelized depth map and estimates the per-voxel likelihood (3D heatmap) of each keypoint
      • Encoder and decoder enable the model to exploit multi-scale information
  • 21. Network Design
      • Volumetric BasicBlock: 3D Conv + 3D BN + ReLU
      • Volumetric ResBlock: the 2D ResBlock [1] extended to 3D
      • Volumetric DownSamplingBlock: 3D max-pooling
      • Volumetric UpSamplingBlock: 3D Deconv + 3D BN + ReLU
      [1] He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2016.
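A rough PyTorch sketch of the four building blocks listed above is given here. The class names, kernel sizes, channel counts, and the toy network at the end are assumptions made for illustration; the slides only state which layers each block contains, and the released Torch7 code may differ.

```python
import torch
import torch.nn as nn


class BasicBlock3D(nn.Module):
    """Volumetric BasicBlock: 3D Conv + 3D BN + ReLU."""

    def __init__(self, in_ch, out_ch, kernel_size=3):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size, padding=kernel_size // 2),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


class ResBlock3D(nn.Module):
    """Volumetric ResBlock: a residual block built from 3D layers."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, 3, padding=1),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
            nn.Conv3d(out_ch, out_ch, 3, padding=1),
            nn.BatchNorm3d(out_ch),
        )
        # 1x1x1 projection on the shortcut when the channel count changes.
        if in_ch == out_ch:
            self.skip = nn.Identity()
        else:
            self.skip = nn.Sequential(nn.Conv3d(in_ch, out_ch, 1),
                                      nn.BatchNorm3d(out_ch))

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))


class DownSamplingBlock3D(nn.Module):
    """Volumetric DownSamplingBlock: 3D max-pooling (halves each axis)."""

    def __init__(self):
        super().__init__()
        self.pool = nn.MaxPool3d(kernel_size=2, stride=2)

    def forward(self, x):
        return self.pool(x)


class UpSamplingBlock3D(nn.Module):
    """Volumetric UpSamplingBlock: 3D Deconv + 3D BN + ReLU (doubles each axis)."""

    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.ConvTranspose3d(in_ch, out_ch, kernel_size=2, stride=2),
            nn.BatchNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)


# Toy encoder-decoder (not the talk's architecture): 8^3 input, 21 heatmaps.
net = nn.Sequential(
    BasicBlock3D(1, 4), DownSamplingBlock3D(), ResBlock3D(4, 8),
    UpSamplingBlock3D(8, 4), nn.Conv3d(4, 21, kernel_size=1),
)
net.eval()
with torch.no_grad():
    heatmaps = net(torch.zeros(1, 1, 8, 8, 8))
```

The output keeps the input's spatial size, so each of the 21 channels can be read as a per-voxel likelihood map for one keypoint.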
  • 22. Network Design
      • The encoder decreases the resolution while increasing the number of channels
      • The decoder increases the resolution while decreasing the number of channels
      • Effectively extracts multi-scale information (downsampling-upsampling structure)
      • Efficiently enlarges the receptive field (Volumetric DownSamplingBlock)
  • 23. Network Design
      • A 3D CNN consumes a lot of GPU memory, so careful architecture design is required
      • Increasing the number of feature maps everywhere consumes too much memory
      • We increased the number of feature maps only at the downsampled resolutions, as a trade-off between memory limitation and performance
      • Error decreases by 1.53 mm on the NYU dataset
  • 24. Network Design
      • The Hourglass network [2] uses simple nearest-neighbor (NN) upsampling
      • We use the Volumetric UpSamplingBlock (3D Deconv + 3D BN + ReLU) instead of NN upsampling, and the error decreases
      • Skip connections make upsampling of the feature maps more stable, and the error decreases
      [2] A. Newell, K. Yang, and J. Deng. "Stacked hourglass networks for human pose estimation." ECCV 2016.
  • 25. Implementation Details
      • The ground-truth 3D heatmap is generated with the mean of the Gaussian peak positioned at the ground-truth joint location
          • $H_n^*(i, j, k) = \exp\left(-\frac{(i - i_n)^2 + (j - j_n)^2 + (k - k_n)^2}{2\sigma^2}\right)$
          • $H_n^*$ is the ground-truth 3D heatmap of the $n$th keypoint, and $(i_n, j_n, k_n)$ is the ground-truth voxel coordinate of the $n$th keypoint
      • Mean square error is adopted as the loss function
          • $L = \sum_{n=1}^{N} \sum_{i,j,k} \left(H_n^*(i, j, k) - H_n(i, j, k)\right)^2$
          • $H_n^*$ and $H_n$ are the ground-truth and estimated heatmaps for the $n$th keypoint, respectively, and $N$ denotes the number of keypoints
      • An 88×88×88 voxel grid is fed to the network with data augmentation
          • Rotation: [-40, 40] degrees in XY space
          • Scaling: [0.8, 1.2] in XYZ space
          • Translation: [-8, 8] voxels in 3D space
      • Implemented in the Torch7 framework (to be reimplemented in PyTorch)
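The two formulas on this slide translate directly into code. The sketch below uses plain Python lists and a small 8×8×8 grid to stay readable; the function names, the grid size, and the value of sigma in the example are illustrative assumptions, not the talk's settings.

```python
import math


def gaussian_heatmap(grid_size, joint, sigma):
    """Ground-truth heatmap H*_n: a Gaussian peak centered on the joint voxel."""
    i_n, j_n, k_n = joint
    return [[[math.exp(-((i - i_n) ** 2 + (j - j_n) ** 2 + (k - k_n) ** 2)
                       / (2.0 * sigma ** 2))
              for k in range(grid_size)]
             for j in range(grid_size)]
            for i in range(grid_size)]


def heatmap_loss(gt_heatmaps, pred_heatmaps):
    """L: squared difference summed over keypoints n and voxels (i, j, k)."""
    total = 0.0
    for h_gt, h_pred in zip(gt_heatmaps, pred_heatmaps):
        for plane_gt, plane_pred in zip(h_gt, h_pred):
            for row_gt, row_pred in zip(plane_gt, plane_pred):
                for a, b in zip(row_gt, row_pred):
                    total += (a - b) ** 2
    return total


# One keypoint at voxel (2, 3, 4) in an 8x8x8 grid; sigma chosen arbitrarily.
gt = [gaussian_heatmap(8, (2, 3, 4), sigma=1.7)]
zeros = [[[[0.0] * 8 for _ in range(8)] for _ in range(8)]]

peak = gt[0][2][3][4]                 # value at the joint voxel
loss_perfect = heatmap_loss(gt, gt)   # identical heatmaps
loss_zero = heatmap_loss(gt, zeros)   # all-zero prediction
```

A perfect prediction gives zero loss, while an all-zero prediction is penalized by the squared mass of the Gaussian peak.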
  • 26. Datasets
      • ICVL Hand Posture Dataset
          • 330K training and 1.6K testing depth images
          • 10 different subjects
      • NYU Hand Pose Dataset
          • 72K training and 8.2K testing depth images
      • MSRA Hand Pose Dataset
          • 76K depth images from 9 subjects with 17 gestures
          • Leave-one-subject-out cross-validation
      • HANDS 2017 Frame-based 3D Hand Pose Estimation Challenge Dataset
          • 957K training and 295K testing depth images
          • Five subjects in the training set and ten subjects in the testing set
      • ITOP Human Pose Dataset
          • 40K training and 10K testing depth images of 20 subjects
          • Front-view, top-view
      • Fig. Example images: NYU Hand Pose dataset, ICVL Hand Posture dataset, MSRA Hand Pose dataset, ITOP Human Pose dataset
  • 27. Evaluation Metrics
      • 3D distance error
          • Euclidean distance between the estimated and ground-truth keypoint coordinates in 3D space
      • Percentage of success frames
          • Success frame: the 3D distance errors of all keypoints are less than a threshold
          • Ratio of success frames to all test frames
      • mAP based on the 10 cm rule
          • An estimated keypoint is considered correct if its 3D distance to the ground truth is less than 10 cm
          • Used for 3D human pose estimation
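The three metrics can be written down compactly. The sketch below is an illustration with hypothetical function names and toy data in millimeters; in particular, `map_10cm` computes the per-keypoint fraction of correct estimates and averages it over keypoints, which is one plausible reading of the 10 cm rule rather than a confirmed evaluation script.

```python
import math


def distance_3d(a, b):
    """Euclidean distance between two 3D points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def frame_errors(pred, gt):
    """Per-keypoint 3D distance errors for one frame."""
    return [distance_3d(p, g) for p, g in zip(pred, gt)]


def mean_error(pred_frames, gt_frames):
    """3D distance error averaged over all frames and keypoints."""
    errors = [e for p, g in zip(pred_frames, gt_frames)
              for e in frame_errors(p, g)]
    return sum(errors) / len(errors)


def success_rate(pred_frames, gt_frames, threshold):
    """Ratio of frames in which every keypoint error is below the threshold."""
    ok = sum(1 for p, g in zip(pred_frames, gt_frames)
             if max(frame_errors(p, g)) < threshold)
    return ok / len(pred_frames)


def map_10cm(pred_frames, gt_frames, threshold=100.0):
    """Mean over keypoints of the fraction of frames within the threshold."""
    num_joints = len(gt_frames[0])
    per_joint = []
    for n in range(num_joints):
        hits = sum(1 for p, g in zip(pred_frames, gt_frames)
                   if distance_3d(p[n], g[n]) < threshold)
        per_joint.append(hits / len(pred_frames))
    return sum(per_joint) / num_joints


# Two frames with two keypoints each; per-keypoint errors are 5, 0, 30, 0 mm.
gt_frames = [[(0, 0, 0), (10, 0, 0)], [(0, 0, 0), (10, 0, 0)]]
pred_frames = [[(3, 4, 0), (10, 0, 0)], [(0, 0, 30), (10, 0, 0)]]

avg = mean_error(pred_frames, gt_frames)                  # 8.75
rate = success_rate(pred_frames, gt_frames, threshold=20.0)   # 0.5
score = map_10cm(pred_frames, gt_frames)                  # 1.0
```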
  • 28. Computational Complexity
      • Training time
          • 2 days for the ICVL dataset (330K training images)
          • 12 hours for the NYU and MSRA datasets (70K training images)
          • 6 days for the HANDS 2017 challenge dataset (957K training images)
          • 3 hours for the ITOP dataset (40K training images)
      • Testing time
          • 35 fps on a single-GPU machine (NVIDIA TITAN X, without ensemble)
          • Can be used in real-world applications in real time
          • Input generation (reference point refinement + voxelizing): 23 ms (most of the time is spent on voxelizing)
          • Network forwarding: 5 ms
          • Extracting 3D coordinates from the 3D heatmaps: 0.5 ms
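The last step in the timing list, extracting 3D coordinates from a 3D heatmap, can be sketched as an argmax followed by a voxel-to-world conversion. The function names, the voxel-center convention, and the cube and grid sizes below are assumptions for illustration. The final line checks that the three per-frame timings on the slide are consistent with the quoted frame rate.

```python
def argmax_voxel(heatmap):
    """Index (i, j, k) of the voxel with the highest likelihood."""
    best_value, best_index = float("-inf"), None
    for i, plane in enumerate(heatmap):
        for j, row in enumerate(plane):
            for k, value in enumerate(row):
                if value > best_value:
                    best_value, best_index = value, (i, j, k)
    return best_index


def voxel_to_world(index, center, cube_size, grid_size):
    """World coordinate of a voxel's center inside the cubic box."""
    voxel_len = cube_size / grid_size
    return tuple(center[axis] - cube_size / 2.0 + (index[axis] + 0.5) * voxel_len
                 for axis in range(3))


# Toy 4x4x4 heatmap with a single peak at voxel (1, 2, 3).
heatmap = [[[0.0] * 4 for _ in range(4)] for _ in range(4)]
heatmap[1][2][3] = 0.9

index = argmax_voxel(heatmap)
joint = voxel_to_world(index, center=(0.0, 0.0, 400.0),
                       cube_size=200.0, grid_size=4)   # (-25.0, 25.0, 475.0)

# Per-frame time from the slide: 23 ms + 5 ms + 0.5 ms = 28.5 ms.
fps = 1000.0 / (23.0 + 5.0 + 0.5)                      # about 35.1
```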
  • 29. Ablation Study
      • Effect of input-output representation
          • Converting the 2D depth map to a 3D voxelized grid improves performance
          • Estimating the per-voxel likelihood (3D heatmap) gives more accurate estimation than directly regressing 3D coordinates
          • The table shows the benefit of the volumetric input and output representation
      • Table. Performance and number of parameters according to the input and output type
  • 30. Ablation Study
      • Fig. Reference point refinement network (localization refinement)
      • The epoch ensemble averages estimations from several epochs
      • In contrast to other ensemble techniques, it ensembles models from a single training run
      • We used models from all epochs (10 epochs) for the ensemble
      • In a multi-GPU environment, it does not increase the running time
      • More accurate and robust estimation
      • Fig. Effect of the localization refinement
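The epoch ensemble amounts to averaging the outputs of checkpoints saved during one training run. The sketch below assumes the averaging is done on the estimated 3D keypoint coordinates; the function name and the toy numbers are hypothetical.

```python
def epoch_ensemble(per_epoch_predictions):
    """Average keypoint coordinates over models saved at different epochs.

    per_epoch_predictions[e][n] is the (x, y, z) estimate of keypoint n
    produced by the model saved at epoch e, for the same input frame.
    """
    num_epochs = len(per_epoch_predictions)
    num_joints = len(per_epoch_predictions[0])
    return [
        tuple(sum(per_epoch_predictions[e][n][axis] for e in range(num_epochs))
              / num_epochs
              for axis in range(3))
        for n in range(num_joints)
    ]


# Estimates of two keypoints from three epochs of the same training run.
predictions = [
    [(0.0, 0.0, 0.0), (10.0, 10.0, 10.0)],
    [(2.0, 4.0, 6.0), (10.0, 10.0, 10.0)],
    [(4.0, 2.0, 0.0), (13.0, 10.0, 7.0)],
]
ensembled = epoch_ensemble(predictions)
# [(2.0, 2.0, 2.0), (11.0, 10.0, 9.0)]
```

Since each checkpoint can run on its own GPU, the per-epoch forward passes are independent, which matches the slide's remark about running time in a multi-GPU environment.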
  • 31. Quantitative Results: Hand Pose (ICVL, NYU, MSRA)
  • 32. Quantitative Results: Hand Pose
  • 33. Quantitative Results: Hand Pose
  • 34. HANDS 2017 Challenge Results
  • 35. HANDS 2017 Challenge Results
  • 36. HANDS 2017 Challenge Results
  • 38. Qualitative Results: NYU dataset
  • 39. Qualitative Results: MSRA dataset
  • 40. Qualitative Results: HANDS 2017 Challenge dataset
  • 41. Quantitative Results: Human Pose, ITOP dataset
  • 42. Qualitative Results: ITOP dataset, Front View
  • 43. Qualitative Results: ITOP dataset, Top View
  • 44. Qualitative Results: ICVL dataset, Frame-based Video
  • 45. Qualitative Results: NYU dataset, Frame-based Video
  • 46. Qualitative Results: MSRA dataset (grouped by gesture), Frame-based Video
  • 47. Conclusion
      • We proposed a novel and powerful network, V2V-PoseNet, for 3D hand and human pose estimation from a single depth map
      • Converted the 2D depth map into a 3D voxel representation and estimated the per-voxel likelihood (3D heatmap) for each keypoint instead of directly regressing 3D coordinates
      • Significantly outperformed almost all existing methods on almost all 3D hand and human pose estimation datasets
      • Achieved 1st place in the HANDS 2017 frame-based 3D hand pose estimation challenge
      • Future work: learning physical constraints via a generative approach and improving the encoder-decoder for multi-scale information
      • Code is available: https://github.com/mks0601/V2V-PoseNet_RELEASE
  • 48. Thank you
      http://cv.snu.ac.kr