SlideShare une entreprise Scribd logo
1  sur  29
Télécharger pour lire hors ligne
Objects as Points
Choi Dongmin
Yonsei University Severance Hospital CCIDS
Contents
• Abstract

• Introduction

• Preliminary

• Objects as Points

• Implementation details

• Experiments

• Conclusion

• Appendix : Model Architecture
Abstract
Abstract
• Most successful object detectors enumerate a nearly exhaustive
list of potential object locations and classify each

- wasteful, inefficient, requiring additional pre-processing

• An object as a single point, the center point of its bounding box

- find center points

- regress other properties (size, 3D location, orientation, pose)

• CenterNet

- end-to-end differentiable

- simpler, faster, and more accurate

- runs in real-time
Introduction
Introduction
Object Detection
Represent each object through an axis-aligned
bounding box that tightly encompasses the the object
https://hoya012.github.io/blog/Tutorials-of-Object-Detection-Using-Deep-Learning-the-application-of-object-detection/
Introduction
One-Stage Detector
- Anchor (possible bounding box) based method
Wu et al. Recent Advances in Deep Learning for Object Detection. arXiv:1908.03673
Introduction
Two-Stage Detector
Wu et al. Recent Advances in Deep Learning for Object Detection. arXiv:1908.03673
Introduction
Post-processing : NMS (Non Maximum Suppression)
https://heiwais25.github.io/machine%20learning/cnn/2018/05/10/non-maximum-suppression/
- Remove duplicated detections for the same instance by computing IoU

- Hard to differentiate and train

- Most current detectors are not end-to-end because of NMS
Introduction
Objects as Points
- Center point

- Regression of other properties (size, dimension, 3D extent, orientation, and pose)

- Standard keypoint estimation problem 

- Single network forward-pass without nms post-processing
Introduction
Extension to Other Tasks
- Easy to extend method to other tasks

- 3D object detection,

multi-person human pose estimation

- Additional outputs at each center
point (right figure)
Introduction
Official Pytorch Code : https://github.com/xingyizhou/CenterNet
Preliminary
N Ukita et al. Semi- and Weakly-supervised Human Pose Estimation. arXiv:1906.0139
I ∈ RW×H×3 ̂Y ∈ [0,1]
W
R ×H
R ×C
- : Output Stride

- : the number of keypoints
R
C
Preliminary
Preliminary
Penalty-reduced pixel-wise logistic regression with focal loss
- : the number of keypoints in image

- , (hyper-parameters for focal-loss)
N
α = 2 β = 4
Preliminary
Local Offset : L1 Loss
- To recover the discretization error caused by the output stride

- predict a local offset for each center point̂O ∈ R
W
R × H
R ×2
Objects as Points
: the bounding box of object with category(x(k)
1
, y(k)
1
, x(k)
2
, y(k)
2
) k ck
- center point 

- object size 

- a single size prediction 

- L1 loss :
pk =
(
x(k)
1 + x(k)
2
2
,
y(k)
1 + y(k)
2
2 )
sk = (x(k)
2
− x(k)
1
, y(k)
2
− y(k)
1 )
̂S ∈ R
W
R × H
R ×2
Lsize =
1
N
N
∑
k=1
| ̂Spk
− sk |
(x(k)
1
, y(k)
1
)
(x(k)
2
, y(k)
2
)
Objects as Points
Training objective
Ldet = Lk + λsizeLsize + λoff Loff
- 

- total outputs at each location
λsize = 0.1, λoff = 1
C + 4
Objects as Points
From points to bounding boxes
: the set of detected center points of clasŝPc n ̂P = {( ̂xi, ̂yi)}n
i=1 c
Bounding box location : ( ̂xi + δ ̂xi − ̂wi/2, ̂yi + δ ̂yi − ̂hi/2,
̂xi + δ ̂xi + ̂wi/2, ̂yi + δ ̂yi + ̂hi/2)
- : the offset prediction

- : the size prediction
(δ ̂xi, δ ̂yi) = ̂O ̂xi, ̂yi
( ̂wi, ̂hi) = ̂S ̂xi, ̂yi
Implementation details
A Newell et al. Stacked Hourglass Networks for Human Pose Estimation. arXiv:1603.06937
Experiment with 4 architectures
Implementation details
A Newell et al. Stacked Hourglass Networks for Human Pose Estimation. arXiv:1603.06937
Experiment with 4 architectures1. Hourglass
Implementation details
Experiment with 4 architectures1. Hourglass
A Newell et al. Stacked Hourglass Networks for Human Pose Estimation. arXiv:1603.06937
- Downsamples the input by 4x, followed by sequential hourglass module (Used 2-stacks)

- Each hourglass module is a symmetric 5-layer-down and up-convolutional network with skip
connections

- Quite large but yields the best performance
Implementation details
Experiment with 4 architectures2. ResNet
B Xiao et al. Simple baselines for human pose estimation and tracking. arXiv:1804.0620

J Dai et al. Deformable Convolutional Networks. arXiv:1703.06211
- A standard residual network with three up-convolutional networks

- Add one 3x3 deformable convolutional layer before each up-conv

- Up-conv kernels initialization : Bilinear interpolation
Implementation details
Experiment with 4 architectures3. DLA
F Yu et al. Deep Layer Aggregation. arXiv:1707.06484
- An image classification network with hierarchical skip connections

- utilize the fully convolutional upsampling for dense prediction

- Replace the original conv with 3x3 deformable conv at every upsampling layer
Implementation details
Experiment with 4 architecturesCOCO validation result
- N.A : without test augmentation

- F : flip testing

- MS : multi-scale augmentation (0.5, 0.75, 1, 1.25, 1.5) with NMS to merge results
Experiments
Conclusion
• A new representations for objects: as points

• Simple, fast, accurate, and end-to-end differentiable without NMS
post-processing

• Estimation of a range of additional object properties, such as

pose, 3D orientation, depth and extent, in one single forward pass

• A new direction for real-time object recognition
Appendix
Model Architecture
Thank you
Yonsei University Severance Hospital CCIDS

Contenu connexe

Tendances

Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Universitat Politècnica de Catalunya
 
[DL輪読会]Objects as Points
[DL輪読会]Objects as Points[DL輪読会]Objects as Points
[DL輪読会]Objects as PointsDeep Learning JP
 
Advanced deep learning based object detection methods
Advanced deep learning based object detection methodsAdvanced deep learning based object detection methods
Advanced deep learning based object detection methodsBrodmann17
 
[DL輪読会]A System for General In-Hand Object Re-Orientation
[DL輪読会]A System for General In-Hand Object Re-Orientation[DL輪読会]A System for General In-Hand Object Re-Orientation
[DL輪読会]A System for General In-Hand Object Re-OrientationDeep Learning JP
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersSungchul Kim
 
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attentio...
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attentio...SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attentio...
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attentio...Hideki Tsunashima
 
You Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object DetectionYou Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object DetectionDADAJONJURAKUZIEV
 
[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals
[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals
[DL輪読会]ViNG: Learning Open-World Navigation with Visual GoalsDeep Learning JP
 
[DL Hacks] Objects as Points
[DL Hacks] Objects as Points[DL Hacks] Objects as Points
[DL Hacks] Objects as PointsDeep Learning JP
 
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object DetectionDeep Learning JP
 
論文紹介:End-to-End Object Detection with Transformers
論文紹介:End-to-End Object Detection with Transformers論文紹介:End-to-End Object Detection with Transformers
論文紹介:End-to-End Object Detection with TransformersToru Tamaki
 
A Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationA Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationRyo Takahashi
 
[DL輪読会]FOTS: Fast Oriented Text Spotting with a Unified Network
[DL輪読会]FOTS: Fast Oriented Text Spotting with a Unified Network[DL輪読会]FOTS: Fast Oriented Text Spotting with a Unified Network
[DL輪読会]FOTS: Fast Oriented Text Spotting with a Unified NetworkDeep Learning JP
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee
 
Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detectionWenjing Chen
 
[DL輪読会]GENESIS: Generative Scene Inference and Sampling with Object-Centric L...
[DL輪読会]GENESIS: Generative Scene Inference and Sampling with Object-Centric L...[DL輪読会]GENESIS: Generative Scene Inference and Sampling with Object-Centric L...
[DL輪読会]GENESIS: Generative Scene Inference and Sampling with Object-Centric L...Deep Learning JP
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basicsBrodmann17
 
Deep sort and sort paper introduce presentation
Deep sort and sort paper introduce presentationDeep sort and sort paper introduce presentation
Deep sort and sort paper introduce presentation경훈 김
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksUsman Qayyum
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationDat Nguyen
 

Tendances (20)

Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...Faster R-CNN: Towards real-time object detection with region proposal network...
Faster R-CNN: Towards real-time object detection with region proposal network...
 
[DL輪読会]Objects as Points
[DL輪読会]Objects as Points[DL輪読会]Objects as Points
[DL輪読会]Objects as Points
 
Advanced deep learning based object detection methods
Advanced deep learning based object detection methodsAdvanced deep learning based object detection methods
Advanced deep learning based object detection methods
 
[DL輪読会]A System for General In-Hand Object Re-Orientation
[DL輪読会]A System for General In-Hand Object Re-Orientation[DL輪読会]A System for General In-Hand Object Re-Orientation
[DL輪読会]A System for General In-Hand Object Re-Orientation
 
Emerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision TransformersEmerging Properties in Self-Supervised Vision Transformers
Emerging Properties in Self-Supervised Vision Transformers
 
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attentio...
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attentio...SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attentio...
SPACE: Unsupervised Object-Oriented Scene Representation via Spatial Attentio...
 
You Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object DetectionYou Only Look Once: Unified, Real-Time Object Detection
You Only Look Once: Unified, Real-Time Object Detection
 
[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals
[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals
[DL輪読会]ViNG: Learning Open-World Navigation with Visual Goals
 
[DL Hacks] Objects as Points
[DL Hacks] Objects as Points[DL Hacks] Objects as Points
[DL Hacks] Objects as Points
 
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
[DL輪読会]YOLOv4: Optimal Speed and Accuracy of Object Detection
 
論文紹介:End-to-End Object Detection with Transformers
論文紹介:End-to-End Object Detection with Transformers論文紹介:End-to-End Object Detection with Transformers
論文紹介:End-to-End Object Detection with Transformers
 
A Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth EstimationA Beginner's Guide to Monocular Depth Estimation
A Beginner's Guide to Monocular Depth Estimation
 
[DL輪読会]FOTS: Fast Oriented Text Spotting with a Unified Network
[DL輪読会]FOTS: Fast Oriented Text Spotting with a Unified Network[DL輪読会]FOTS: Fast Oriented Text Spotting with a Unified Network
[DL輪読会]FOTS: Fast Oriented Text Spotting with a Unified Network
 
PR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox DetectorPR-132: SSD: Single Shot MultiBox Detector
PR-132: SSD: Single Shot MultiBox Detector
 
Deep learning for object detection
Deep learning for object detectionDeep learning for object detection
Deep learning for object detection
 
[DL輪読会]GENESIS: Generative Scene Inference and Sampling with Object-Centric L...
[DL輪読会]GENESIS: Generative Scene Inference and Sampling with Object-Centric L...[DL輪読会]GENESIS: Generative Scene Inference and Sampling with Object-Centric L...
[DL輪読会]GENESIS: Generative Scene Inference and Sampling with Object-Centric L...
 
Deep learning based object detection basics
Deep learning based object detection basicsDeep learning based object detection basics
Deep learning based object detection basics
 
Deep sort and sort paper introduce presentation
Deep sort and sort paper introduce presentationDeep sort and sort paper introduce presentation
Deep sort and sort paper introduce presentation
 
Object Detection using Deep Neural Networks
Object Detection using Deep Neural NetworksObject Detection using Deep Neural Networks
Object Detection using Deep Neural Networks
 
Mask-RCNN for Instance Segmentation
Mask-RCNN for Instance SegmentationMask-RCNN for Instance Segmentation
Mask-RCNN for Instance Segmentation
 

Similaire à Objects as points (CenterNet) review [CDM]

object detection paper review
object detection paper reviewobject detection paper review
object detection paper reviewYoonho Na
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineSoma Boubou
 
Scrdet++ analysis
Scrdet++ analysisScrdet++ analysis
Scrdet++ analysisNEHA Kapoor
 
Deformable DETR Review [CDM]
Deformable DETR Review [CDM]Deformable DETR Review [CDM]
Deformable DETR Review [CDM]Dongmin Choi
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...Edge AI and Vision Alliance
 
[DL輪読会]PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
[DL輪読会]PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection[DL輪読会]PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
[DL輪読会]PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object DetectionDeep Learning JP
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Sergey Karayev
 
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...inside-BigData.com
 
ImageNet classification with deep convolutional neural networks(2012)
ImageNet classification with deep convolutional neural networks(2012)ImageNet classification with deep convolutional neural networks(2012)
ImageNet classification with deep convolutional neural networks(2012)WoochulShin10
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Sean Moran
 
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
SimCLR: A Simple Framework for Contrastive Learning of Visual RepresentationsSimCLR: A Simple Framework for Contrastive Learning of Visual Representations
SimCLR: A Simple Framework for Contrastive Learning of Visual Representationsynxm25hpxp
 
Towards Accurate Multi-person Pose Estimation in the Wild (My summery)
Towards Accurate Multi-person Pose Estimation in the Wild (My summery)Towards Accurate Multi-person Pose Estimation in the Wild (My summery)
Towards Accurate Multi-person Pose Estimation in the Wild (My summery)Abdulrahman Kerim
 
IOEfficientParalleMatrixMultiplication_present
IOEfficientParalleMatrixMultiplication_presentIOEfficientParalleMatrixMultiplication_present
IOEfficientParalleMatrixMultiplication_presentShubham Joshi
 
Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Mostafa G. M. Mostafa
 
ADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORS
ADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORSADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORS
ADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORSSoma Boubou
 
Introducción a las redes convolucionales
Introducción a las redes convolucionalesIntroducción a las redes convolucionales
Introducción a las redes convolucionalesJoseAlGarcaGutierrez
 

Similaire à Objects as points (CenterNet) review [CDM] (20)

object detection paper review
object detection paper reviewobject detection paper review
object detection paper review
 
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning MachineFast Object Recognition from 3D Depth Data with Extreme Learning Machine
Fast Object Recognition from 3D Depth Data with Extreme Learning Machine
 
Scrdet++ analysis
Scrdet++ analysisScrdet++ analysis
Scrdet++ analysis
 
Deformable DETR Review [CDM]
Deformable DETR Review [CDM]Deformable DETR Review [CDM]
Deformable DETR Review [CDM]
 
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
“Understanding DNN-Based Object Detectors,” a Presentation from Au-Zone Techn...
 
[DL輪読会]PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
[DL輪読会]PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection[DL輪読会]PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
[DL輪読会]PV-RCNN: Point-Voxel Feature Set Abstraction for 3D Object Detection
 
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
Lecture 2.B: Computer Vision Applications - Full Stack Deep Learning - Spring...
 
Object Detection - Míriam Bellver - UPC Barcelona 2018
Object Detection - Míriam Bellver - UPC Barcelona 2018Object Detection - Míriam Bellver - UPC Barcelona 2018
Object Detection - Míriam Bellver - UPC Barcelona 2018
 
object-detection.pptx
object-detection.pptxobject-detection.pptx
object-detection.pptx
 
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
Scratch to Supercomputers: Bottoms-up Build of Large-scale Computational Lens...
 
D3L4-objects.pdf
D3L4-objects.pdfD3L4-objects.pdf
D3L4-objects.pdf
 
ImageNet classification with deep convolutional neural networks(2012)
ImageNet classification with deep convolutional neural networks(2012)ImageNet classification with deep convolutional neural networks(2012)
ImageNet classification with deep convolutional neural networks(2012)
 
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
Learning to Project and Binarise for Hashing-based Approximate Nearest Neighb...
 
Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)Deep Learning for Computer Vision: Object Detection (UPC 2016)
Deep Learning for Computer Vision: Object Detection (UPC 2016)
 
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
SimCLR: A Simple Framework for Contrastive Learning of Visual RepresentationsSimCLR: A Simple Framework for Contrastive Learning of Visual Representations
SimCLR: A Simple Framework for Contrastive Learning of Visual Representations
 
Towards Accurate Multi-person Pose Estimation in the Wild (My summery)
Towards Accurate Multi-person Pose Estimation in the Wild (My summery)Towards Accurate Multi-person Pose Estimation in the Wild (My summery)
Towards Accurate Multi-person Pose Estimation in the Wild (My summery)
 
IOEfficientParalleMatrixMultiplication_present
IOEfficientParalleMatrixMultiplication_presentIOEfficientParalleMatrixMultiplication_present
IOEfficientParalleMatrixMultiplication_present
 
Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)Neural Networks: Principal Component Analysis (PCA)
Neural Networks: Principal Component Analysis (PCA)
 
ADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORS
ADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORSADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORS
ADAPTIVE FILTER FOR DENOISING 3D DATA CAPTURED BY DEPTH SENSORS
 
Introducción a las redes convolucionales
Introducción a las redes convolucionalesIntroducción a las redes convolucionales
Introducción a las redes convolucionales
 

Plus de Dongmin Choi

[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...
[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...
[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...Dongmin Choi
 
Review: Incremental Few-shot Instance Segmentation [CDM]
Review: Incremental Few-shot Instance Segmentation [CDM]Review: Incremental Few-shot Instance Segmentation [CDM]
Review: Incremental Few-shot Instance Segmentation [CDM]Dongmin Choi
 
Review: You Only Look One-level Feature
Review: You Only Look One-level FeatureReview: You Only Look One-level Feature
Review: You Only Look One-level FeatureDongmin Choi
 
Transformer in Computer Vision
Transformer in Computer VisionTransformer in Computer Vision
Transformer in Computer VisionDongmin Choi
 
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...Dongmin Choi
 
YolactEdge Review [cdm]
YolactEdge Review [cdm]YolactEdge Review [cdm]
YolactEdge Review [cdm]Dongmin Choi
 
Review : Inter-slice Context Residual Learning for 3D Medical Image Segmentation
Review : Inter-slice Context Residual Learning for 3D Medical Image SegmentationReview : Inter-slice Context Residual Learning for 3D Medical Image Segmentation
Review : Inter-slice Context Residual Learning for 3D Medical Image SegmentationDongmin Choi
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]Dongmin Choi
 
Review : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic SegmentationReview : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic SegmentationDongmin Choi
 
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Dongmin Choi
 
Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]
Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]
Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]Dongmin Choi
 
Review : Rethinking Pre-training and Self-training
Review : Rethinking Pre-training and Self-trainingReview : Rethinking Pre-training and Self-training
Review : Rethinking Pre-training and Self-trainingDongmin Choi
 
Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...
Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...
Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...Dongmin Choi
 
Pyradiomics Customization [CDM]
Pyradiomics Customization [CDM]Pyradiomics Customization [CDM]
Pyradiomics Customization [CDM]Dongmin Choi
 
Seeing What a GAN Cannot Generate [cdm]
Seeing What a GAN Cannot Generate [cdm]Seeing What a GAN Cannot Generate [cdm]
Seeing What a GAN Cannot Generate [cdm]Dongmin Choi
 
Neural network pruning with residual connections and limited-data review [cdm]
Neural network pruning with residual connections and limited-data review [cdm]Neural network pruning with residual connections and limited-data review [cdm]
Neural network pruning with residual connections and limited-data review [cdm]Dongmin Choi
 
Network Deconvolution review [cdm]
Network Deconvolution review [cdm]Network Deconvolution review [cdm]
Network Deconvolution review [cdm]Dongmin Choi
 
How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...Dongmin Choi
 
Augmix review [cdm]
Augmix review [cdm]Augmix review [cdm]
Augmix review [cdm]Dongmin Choi
 
Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...Dongmin Choi
 

Plus de Dongmin Choi (20)

[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...
[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...
[Review] BoxInst: High-Performance Instance Segmentation with Box Annotations...
 
Review: Incremental Few-shot Instance Segmentation [CDM]
Review: Incremental Few-shot Instance Segmentation [CDM]Review: Incremental Few-shot Instance Segmentation [CDM]
Review: Incremental Few-shot Instance Segmentation [CDM]
 
Review: You Only Look One-level Feature
Review: You Only Look One-level FeatureReview: You Only Look One-level Feature
Review: You Only Look One-level Feature
 
Transformer in Computer Vision
Transformer in Computer VisionTransformer in Computer Vision
Transformer in Computer Vision
 
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
Review : Adaptive Consistency Regularization for Semi-Supervised Transfer Lea...
 
YolactEdge Review [cdm]
YolactEdge Review [cdm]YolactEdge Review [cdm]
YolactEdge Review [cdm]
 
Review : Inter-slice Context Residual Learning for 3D Medical Image Segmentation
Review : Inter-slice Context Residual Learning for 3D Medical Image SegmentationReview : Inter-slice Context Residual Learning for 3D Medical Image Segmentation
Review : Inter-slice Context Residual Learning for 3D Medical Image Segmentation
 
ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]ViT (Vision Transformer) Review [CDM]
ViT (Vision Transformer) Review [CDM]
 
Review : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic SegmentationReview : Prototype Mixture Models for Few-shot Semantic Segmentation
Review : Prototype Mixture Models for Few-shot Semantic Segmentation
 
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
Review : PolarMask: Single Shot Instance Segmentation with Polar Representati...
 
Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]
Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]
Review : Multi-Domain Image Completion for Random Missing Input Data [cdm]
 
Review : Rethinking Pre-training and Self-training
Review : Rethinking Pre-training and Self-trainingReview : Rethinking Pre-training and Self-training
Review : Rethinking Pre-training and Self-training
 
Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...
Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...
Review : Structure Boundary Preserving Segmentation
for Medical Image with Am...
 
Pyradiomics Customization [CDM]
Pyradiomics Customization [CDM]Pyradiomics Customization [CDM]
Pyradiomics Customization [CDM]
 
Seeing What a GAN Cannot Generate [cdm]
Seeing What a GAN Cannot Generate [cdm]Seeing What a GAN Cannot Generate [cdm]
Seeing What a GAN Cannot Generate [cdm]
 
Neural network pruning with residual connections and limited-data review [cdm]
Neural network pruning with residual connections and limited-data review [cdm]Neural network pruning with residual connections and limited-data review [cdm]
Neural network pruning with residual connections and limited-data review [cdm]
 
Network Deconvolution review [cdm]
Network Deconvolution review [cdm]Network Deconvolution review [cdm]
Network Deconvolution review [cdm]
 
How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...How much position information do convolutional neural networks encode? review...
How much position information do convolutional neural networks encode? review...
 
Augmix review [cdm]
Augmix review [cdm]Augmix review [cdm]
Augmix review [cdm]
 
Bag of tricks for image classification with convolutional neural networks r...
Bag of tricks for image classification with convolutional neural networks   r...Bag of tricks for image classification with convolutional neural networks   r...
Bag of tricks for image classification with convolutional neural networks r...
 

Dernier

Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Thomas Poetter
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024Susanna-Assunta Sansone
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Seán Kennedy
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our WorldEduminds Learning
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesTimothy Spann
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Boston Institute of Analytics
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]📊 Markus Baersch
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一F sss
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxMike Bennett
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPTBoston Institute of Analytics
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 

Dernier (20)

Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
Minimizing AI Hallucinations/Confabulations and the Path towards AGI with Exa...
 
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
FAIR, FAIRsharing, FAIR Cookbook and ELIXIR - Sansone SA - Boston 2024
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...Student profile product demonstration on grades, ability, well-being and mind...
Student profile product demonstration on grades, ability, well-being and mind...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Learn How Data Science Changes Our World
Learn How Data Science Changes Our WorldLearn How Data Science Changes Our World
Learn How Data Science Changes Our World
 
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming PipelinesConf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
Conf42-LLM_Adding Generative AI to Real-Time Streaming Pipelines
 
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
Data Analysis Project : Targeting the Right Customers, Presentation on Bank M...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]GA4 Without Cookies [Measure Camp AMS]
GA4 Without Cookies [Measure Camp AMS]
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
办理学位证加利福尼亚大学洛杉矶分校毕业证,UCLA成绩单原版一比一
 
Semantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptxSemantic Shed - Squashing and Squeezing.pptx
Semantic Shed - Squashing and Squeezing.pptx
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default  Presentation : Data Analysis Project PPTPredictive Analysis for Loan Default  Presentation : Data Analysis Project PPT
Predictive Analysis for Loan Default Presentation : Data Analysis Project PPT
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 

Objects as points (CenterNet) review [CDM]

  • 1. Objects as Points Choi Dongmin Yonsei University Severance Hospital CCIDS
  • 2. Contents • Abstract • Introduction • Preliminary • Objects as Points • Implementation details • Experiments • Conclusion • Appendix : Model Architecture
  • 4. Abstract • Most successful object detectors enumerate a nearly exhaustive list of potential object locations and classify each
 - wasteful, inefficient, requiring additional pre-processing • An object as a single point, the center point of its bounding box
 - find center points
 - regress other properties (size, 3D location, orientation, pose) • CenterNet
 - end-to-end differentiable
 - simpler, faster, and more accurate
 - runs in real-time
  • 6. Introduction Object Detection Represent each object through an axis-aligned bounding box that tightly encompasses the the object https://hoya012.github.io/blog/Tutorials-of-Object-Detection-Using-Deep-Learning-the-application-of-object-detection/
  • 7. Introduction One-Stage Detector - Anchor (possible bounding box) based method Wu et al. Recent Advances in Deep Learning for Object Detection. arXiv:1908.03673
  • 8. Introduction Two-Stage Detector Wu et al. Recent Advances in Deep Learning for Object Detection. arXiv:1908.03673
  • 9. Introduction Post-processing : NMS (Non Maximum Suppression) https://heiwais25.github.io/machine%20learning/cnn/2018/05/10/non-maximum-suppression/ - Remove duplicated detections for the same instance by computing IoU - Hard to differentiate and train - Most current detectors are not end-to-end because of NMS
  • 10. Introduction Objects as Points - Center point - Regression of other properties (size, dimension, 3D extent, orientation, and pose) - Standard keypoint estimation problem - Single network forward-pass without nms post-processing
  • 11. Introduction Extension to Other Tasks - Easy to extend method to other tasks - 3D object detection,
 multi-person human pose estimation - Additional outputs at each center point (right figure)
  • 12. Introduction Official Pytorch Code : https://github.com/xingyizhou/CenterNet
  • 13. Preliminary N Ukita et al. Semi- and Weakly-supervised Human Pose Estimation. arXiv:1906.0139 I ∈ RW×H×3 ̂Y ∈ [0,1] W R ×H R ×C - : Output Stride - : the number of keypoints R C
  • 15. Preliminary Penalty-reduced pixel-wise logistic regression with focal loss - : the number of keypoints in image - , (hyper-parameters for focal-loss) N α = 2 β = 4
  • 16. Preliminary Local Offset : L1 Loss - To recover the discretization error caused by the output stride - predict a local offset for each center point̂O ∈ R W R × H R ×2
  • 17. Objects as Points : the bounding box of object with category(x(k) 1 , y(k) 1 , x(k) 2 , y(k) 2 ) k ck - center point - object size - a single size prediction - L1 loss : pk = ( x(k) 1 + x(k) 2 2 , y(k) 1 + y(k) 2 2 ) sk = (x(k) 2 − x(k) 1 , y(k) 2 − y(k) 1 ) ̂S ∈ R W R × H R ×2 Lsize = 1 N N ∑ k=1 | ̂Spk − sk | (x(k) 1 , y(k) 1 ) (x(k) 2 , y(k) 2 )
  • 18. Objects as Points Training objective Ldet = Lk + λsizeLsize + λoff Loff - - total outputs at each location λsize = 0.1, λoff = 1 C + 4
  • 19. Objects as Points From points to bounding boxes : the set of detected center points of clasŝPc n ̂P = {( ̂xi, ̂yi)}n i=1 c Bounding box location : ( ̂xi + δ ̂xi − ̂wi/2, ̂yi + δ ̂yi − ̂hi/2, ̂xi + δ ̂xi + ̂wi/2, ̂yi + δ ̂yi + ̂hi/2) - : the offset prediction - : the size prediction (δ ̂xi, δ ̂yi) = ̂O ̂xi, ̂yi ( ̂wi, ̂hi) = ̂S ̂xi, ̂yi
  • 20. Implementation details A Newell et al. Stacked Hourglass Networks for Human Pose Estimation. arXiv:1603.06937 Experiment with 4 architectures
  • 21. Implementation details A Newell et al. Stacked Hourglass Networks for Human Pose Estimation. arXiv:1603.06937 Experiment with 4 architectures1. Hourglass
  • 22. Implementation details Experiment with 4 architectures1. Hourglass A Newell et al. Stacked Hourglass Networks for Human Pose Estimation. arXiv:1603.06937 - Downsamples the input by 4x, followed by sequential hourglass module (Used 2-stacks) - Each hourglass module is a symmetric 5-layer-down and up-convolutional network with skip connections - Quite large but yields the best performance
  • 23. Implementation details Experiment with 4 architectures2. ResNet B Xiao et al. Simple baselines for human pose estimation and tracking. arXiv:1804.0620
 J Dai et al. Deformable Convolutional Networks. arXiv:1703.06211 - A standard residual network with three up-convolutional networks - Add one 3x3 deformable convolutional layer before each up-conv - Up-conv kernels initialization : Bilinear interpolation
  • 24. Implementation details Experiment with 4 architectures3. DLA F Yu et al. Deep Layer Aggregation. arXiv:1707.06484 - An image classification network with hierarchical skip connections - utilize the fully convolutional upsampling for dense prediction - Replace the original conv with 3x3 deformable conv at every upsampling layer
  • 25. Implementation details Experiment with 4 architecturesCOCO validation result - N.A : without test augmentation - F : flip testing - MS : multi-scale augmentation (0.5, 0.75, 1, 1.25, 1.5) with NMS to merge results
  • 27. Conclusion • A new representations for objects: as points • Simple, fast, accurate, and end-to-end differentiable without NMS post-processing • Estimation of a range of additional object properties, such as
 pose, 3D orientation, depth and extent, in one single forward pass • A new direction for real-time object recognition
  • 29. Thank you Yonsei University Severance Hospital CCIDS