Centernet

•

3 j'aime•1,356 vues

Slide for study session given by Christian Saravia at Arithmer inc. It is a summary of recent method for object detection, centernet. Arithmer株式会社は東京大学大学院数理科学研究科発の数学の会社です。私達は現代数学を応用して、様々な分野のソリューションに、新しい高度AIシステムを導入しています。AIをいかに上手に使って仕事を効率化するか、そして人々の役に立つ結果を生み出すのか、それを考えるのが私たちの仕事です。 Arithmer began at the University of Tokyo Graduate School of Mathematical Sciences. Today, our research of modern mathematics and AI systems has the capability of providing solutions when dealing with tough complex issues. At Arithmer we believe it is our job to realize the functions of AI through improving work efficiency and producing more useful results for society.

Technologie

2020/08/28
Centernet: Object as Points
Arithmer Dynamics Div. Christian Saravia

Self-Introduction
Christian Martin Saravia Hernandez
● Graduate School
○ Universidad Peruana de Ciencias Aplicadas
○ Bs. Software Engineering
● Former Job
○ Ficha Inc
■ Algorithm Development Engineer
● Research & application of deep learning in computer vision problems
● Current Job
○ Application of machine learning / deep learning to computer vision problems
■ Object detection
■ Object classification

Purpose
Purpose of this material
● Understand an anchor free approach object detection algorithm

Agenda
Agenda
● Current object detection approaches
● Centernet approach
● Object as Points
● Training
○ Keypoint heatmap
○ Local offset
○ Size prediction
○ Loss function
● Network Architecture
○ DLA
○ Modified DLA
● Inference
● Results

Background
Current approaches
● Object detections model (such as Yolo, SSD, etc.) rely on the usage of anchor boxes
● Anchor boxes are not completely optimal:
○ Wasteful: SSD300 does 8732 detections per class, and yolo448 does 98 detections per class,
which means that most of the box are discarded
○ Inefficient: We have to process all the boxes (even we will discard them later), which comes with
more processing time
○ Require post processing: like non-max suppression algorithm
○ Fixed: SSD requires fixed scale and steps of boxes, while yolov3 fixes the size of the anchors per
detection level

Centernet
Centernet approach
● End-to-end differentiable solution
● Relies on keypoint estimation to find the center points and regress all other object properties(such as
size)
● As a result, the model is simpler, faster and more accurate than bounding-box based detectors
Speed-accuracy trade-off on COCO dataset

Keypoint Heatmap
* If two gaussians of the same class overlap, take the element-wise maximum

Network Architecture
● Authors experiment with different backbone architectures, obtaining different results:
● The backbone that produces best speed/accuracy tradeoff is DLA-34 (modified by authors)
Results without test augmentation(N.A.), flip testing(F), and multi-scale augmentation(MS). HW: Intel
Core i7-8086k CPU, Titan Xp GPU

Network Architecture
● Deep Layer Aggregation:
Iterative deep aggregation
Upsample 2x
Aggregation node:
Conv2d(1x1)
Batch Normalization
Relu
Stage node:
Conv2d(3x3)
Batch Normalization
Relu
Conv2d(3x3)
Batch Normalization
Relu

Network Architecture
● Modified Deep Layer Aggregation:
Iterative deep aggregation
Upsample 2x
Deformable convolution
Aggregation node:
Conv2d(1x1)
Batch Normalization
Relu
Stage node:
Conv2d(3x3)
Batch Normalization
Relu
Conv2d(3x3)
Batch Normalization
Relu

Results
● Object detection
● Human pose estimation

Results
Experimental results on Pascal VOC 2007 test. The
results are shown in mAP@0.5. Flip test is used for
Centernet. The FPSs for other methods are
copied from the original publications
COCO test-dev. Frame-per-second (FPS) were measured
on the same machine whenever possible. Italic FPS
highlight the cases, where the performance measure was
copied from the original publication

Recommandé

Objects as points (CenterNet) review [CDM]Dongmin Choi

Objects as pointsDADAJONJURAKUZIEV

A Beginner's Guide to Monocular Depth EstimationRyo Takahashi

PR-146: CornerNet detecting objects as paired keypointsjaewon lee

Depth Fusion from RGB and Depth Sensors IIYu Huang

Improved Trainings of Wasserstein GANs (WGAN-GP)Sangwoo Mo

support vector regressionAkhilesh Joshi

Object detectionJksuryawanshi

Recommandé

Objects as points (CenterNet) review [CDM]Dongmin Choi

Objects as pointsDADAJONJURAKUZIEV

A Beginner's Guide to Monocular Depth EstimationRyo Takahashi

PR-146: CornerNet detecting objects as paired keypointsjaewon lee

Depth Fusion from RGB and Depth Sensors IIYu Huang

Improved Trainings of Wasserstein GANs (WGAN-GP)Sangwoo Mo

support vector regressionAkhilesh Joshi

Object detectionJksuryawanshi

Gradient Boosted treesNihar Ranjan

Introduction to object detectionBrodmann17

EMNLP 2014: Opinion Mining with Deep Recurrent Neural NetworkPeinan ZHANG

Generative Adversarial Network (GAN)Prakhar Rastogi

Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Preferred Networks

End-to-End Object Detection with TransformersSeunghyun Hwang

3-d interpretation from single 2-d image for autonomous driving IIYu Huang

Lecture 1 for Digital Image Processing (2nd Edition)Moe Moe Myint

DBSCAN : A Clustering AlgorithmPınar Yahşi

Image classification using cnnDebarko De

PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee

Final thesis presentationPawan Singh

Density based clusteringYaswanthHariKumarVud

GAN - Theory and ApplicationsEmanuele Ghelfi

Saliency Detection via Divergence Analysis: A Unified Perspective ICPR 2012Jia-Bin Huang

Content Based Image Retrieval Swati Chauhan

Image segmentationGayan Sampath

03 digital image fundamentals DIPbabak danyal

Pruning convolutional neural networks for resource efficient inferenceKaushalya Madhawa

Image Denoising using Spatial Domain Filters: A Quantitative StudyAnmol Sharma

A Kaggle TalkLex Toumbourou

Volodymyr Lyubinets. One startup's journey of building ML pipelines for text ...Lviv Startup Club

Contenu connexe

Tendances

Gradient Boosted treesNihar Ranjan

Introduction to object detectionBrodmann17

EMNLP 2014: Opinion Mining with Deep Recurrent Neural NetworkPeinan ZHANG

Generative Adversarial Network (GAN)Prakhar Rastogi

Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...Preferred Networks

End-to-End Object Detection with TransformersSeunghyun Hwang

3-d interpretation from single 2-d image for autonomous driving IIYu Huang

Lecture 1 for Digital Image Processing (2nd Edition)Moe Moe Myint

DBSCAN : A Clustering AlgorithmPınar Yahşi

Image classification using cnnDebarko De

PR-132: SSD: Single Shot MultiBox DetectorJinwon Lee

Final thesis presentationPawan Singh

Density based clusteringYaswanthHariKumarVud

GAN - Theory and ApplicationsEmanuele Ghelfi

Saliency Detection via Divergence Analysis: A Unified Perspective ICPR 2012Jia-Bin Huang

Content Based Image Retrieval Swati Chauhan

Image segmentationGayan Sampath

03 digital image fundamentals DIPbabak danyal

Pruning convolutional neural networks for resource efficient inferenceKaushalya Madhawa

Image Denoising using Spatial Domain Filters: A Quantitative StudyAnmol Sharma

Tendances (20)

Gradient Boosted trees

Introduction to object detection

EMNLP 2014: Opinion Mining with Deep Recurrent Neural Network

Generative Adversarial Network (GAN)

Introduction to Graph Neural Networks: Basics and Applications - Katsuhiko Is...

End-to-End Object Detection with Transformers

3-d interpretation from single 2-d image for autonomous driving II

Lecture 1 for Digital Image Processing (2nd Edition)

DBSCAN : A Clustering Algorithm

Image classification using cnn

PR-132: SSD: Single Shot MultiBox Detector

Final thesis presentation

Density based clustering

GAN - Theory and Applications

Saliency Detection via Divergence Analysis: A Unified Perspective ICPR 2012

Content Based Image Retrieval

Image segmentation

03 digital image fundamentals DIP

Pruning convolutional neural networks for resource efficient inference

Image Denoising using Spatial Domain Filters: A Quantitative Study

Similaire à Centernet

A Kaggle TalkLex Toumbourou

Volodymyr Lyubinets. One startup's journey of building ML pipelines for text ...Lviv Startup Club

Software Design Practices for Large-Scale AutomationHao Xu

Lessons learned from designing a QA Automation for analytics databases (big d...Omid Vahdaty

HP - Jerome Rolia - Hadoop World 2010Cloudera, Inc.

Data pipelines from zero to solidLars Albertsson

Semantic Segmentation on Satellite ImageryRAHUL BHOJWANI

Caching in (DevoxxUK 2013)RichardWarburton

Set Transfomer: A Framework for Attention-based Permutaion-Invariant Neural N...Thien Q. Tran

Multicore architecturesMuhammet SOYTÜRK

[ppt]butest

Netflix machine learningAmer Ather

Exploiting GPU's for Columnar DataFrrames by Kiran LonikarSpark Summit

DruidDori Waldman

Kubernetes Workload RebalancingOlaf Reitmaier Veracierta

Big data 2.0, deep learning and financial UsecasesArvind Rapaka

대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화NAVER Engineering

Automatic Machine Learning, AutoMLHimadri Mishra

Architectural Optimizations for High Performance and Energy Efficient Smith-W...NECST Lab @ Politecnico di Milano

From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...confluent

Similaire à Centernet (20)

A Kaggle Talk

Volodymyr Lyubinets. One startup's journey of building ML pipelines for text ...

Software Design Practices for Large-Scale Automation

Lessons learned from designing a QA Automation for analytics databases (big d...

HP - Jerome Rolia - Hadoop World 2010

Data pipelines from zero to solid

Semantic Segmentation on Satellite Imagery

Caching in (DevoxxUK 2013)

Set Transfomer: A Framework for Attention-based Permutaion-Invariant Neural N...

Multicore architectures

[ppt]

Netflix machine learning

Exploiting GPU's for Columnar DataFrrames by Kiran Lonikar

Druid

Kubernetes Workload Rebalancing

Big data 2.0, deep learning and financial Usecases

대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화

Automatic Machine Learning, AutoML

Architectural Optimizations for High Performance and Energy Efficient Smith-W...

From Zero to Streaming Healthcare in Production (Alexander Kouznetsov, Invita...

Plus de Arithmer Inc.

コーディネートレコメンドArithmer Inc.

Test for AI modelArithmer Inc.

最適化Arithmer Inc.

Arithmerソリューション紹介流体予測システムArithmer Inc.

Weakly supervised semantic segmentation of 3D point cloudArithmer Inc.

Arithmer NLP 自然言語処理ソリューション紹介Arithmer Inc.

Arithmer Robo IntroductionArithmer Inc.

Arithmer AIチャットボットArithmer Inc.

Arithmer R3 Introduction Arithmer Inc.

VIBE: Video Inference for Human Body Pose and Shape EstimationArithmer Inc.

Arithmer Inspection IntroductionArithmer Inc.

全力解説！TransformerArithmer Inc.

Arithmer NLP IntroductionArithmer Inc.

Introduction of Quantum Annealing and D-Wave MachinesArithmer Inc.

Arithmer OCR IntroductionArithmer Inc.

Arithmer Dynamics Introduction Arithmer Inc.

ArithmerDB IntroductionArithmer Inc.

Summarizing videos with AttentionArithmer Inc.

3D human body modeling from RGB imagesArithmer Inc.

YOLACTArithmer Inc.

Plus de Arithmer Inc. (20)

コーディネートレコメンド

Test for AI model

最適化

Arithmerソリューション紹介流体予測システム

Weakly supervised semantic segmentation of 3D point cloud

Arithmer NLP 自然言語処理ソリューション紹介

Arithmer Robo Introduction

Arithmer AIチャットボット

Arithmer R3 Introduction

VIBE: Video Inference for Human Body Pose and Shape Estimation

Arithmer Inspection Introduction

全力解説！Transformer

Arithmer NLP Introduction

Introduction of Quantum Annealing and D-Wave Machines

Arithmer OCR Introduction

Arithmer Dynamics Introduction

ArithmerDB Introduction

Summarizing videos with Attention

3D human body modeling from RGB images

YOLACT

Dernier

How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes

A Domino Admins Adventures (Engage 2024)Gabriella Davis

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo

Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer

The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los

🐬 The future of MySQL is Postgres 🐘RTylerCroy

08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls

TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc

How to convert PDF to text with Nanonetsnaman860154

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j

08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls

Unblocking The Main Thread Solving ANRs and Frozen FramesSinan KOZAK

Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko

Finology Group – Insurtech Innovation Award 2024The Digital Insurer

Automating Google Workspace (GWS) & more with Apps Scriptwesley chun

Salesforce Community Group Quito, Salesforce 101Paola De la Torre

From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEarley Information Science

Dernier (20)

How to Troubleshoot Apps for the Modern Connected Worker

A Domino Admins Adventures (Engage 2024)

Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...

Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...

Axa Assurance Maroc - Insurer Innovation Award 2024

The 7 Things I Know About Cyber Security After 25 Years | April 2024

🐬 The future of MySQL is Postgres 🐘

08448380779 Call Girls In Greater Kailash - I Women Seeking Men

TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments

How to convert PDF to text with Nanonets

Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...

08448380779 Call Girls In Diplomatic Enclave Women Seeking Men

Unblocking The Main Thread Solving ANRs and Frozen Frames

Handwritten Text Recognition for manuscripts and early printed texts

Finology Group – Insurtech Innovation Award 2024

Automating Google Workspace (GWS) & more with Apps Script

Salesforce Community Group Quito, Salesforce 101

From Event to Action: Accelerate Your Decision Making with Real-Time Automation

Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx

Centernet

1. 2020/08/28 Centernet: Object as Points Arithmer Dynamics Div. Christian Saravia

2. Self-Introduction Christian Martin Saravia Hernandez ● Graduate School ○ Universidad Peruana de Ciencias Aplicadas ○ Bs. Software Engineering ● Former Job ○ Ficha Inc ■ Algorithm Development Engineer ● Research & application of deep learning in computer vision problems ● Current Job ○ Application of machine learning / deep learning to computer vision problems ■ Object detection ■ Object classification

3. Purpose Purpose of this material ● Understand an anchor free approach object detection algorithm

4. Agenda Agenda ● Current object detection approaches ● Centernet approach ● Object as Points ● Training ○ Keypoint heatmap ○ Local offset ○ Size prediction ○ Loss function ● Network Architecture ○ DLA ○ Modified DLA ● Inference ● Results

5. Background Current approaches ● Object detections model (such as Yolo, SSD, etc.) rely on the usage of anchor boxes ● Anchor boxes are not completely optimal: ○ Wasteful: SSD300 does 8732 detections per class, and yolo448 does 98 detections per class, which means that most of the box are discarded ○ Inefficient: We have to process all the boxes (even we will discard them later), which comes with more processing time ○ Require post processing: like non-max suppression algorithm ○ Fixed: SSD requires fixed scale and steps of boxes, while yolov3 fixes the size of the anchors per detection level

6. Centernet Centernet approach ● End-to-end differentiable solution ● Relies on keypoint estimation to find the center points and regress all other object properties(such as size) ● As a result, the model is simpler, faster and more accurate than bounding-box based detectors Speed-accuracy trade-off on COCO dataset

7. Object as Points

8. Training

9. Keypoint Heatmap

10. Keypoint Heatmap * If two gaussians of the same class overlap, take the element-wise maximum

11. Keypoint Heatmap Loss calculation

12. Local Offset Motivation

13. Local Offset Loss calculation

14. Size Prediction Motivation

15. Size Prediction Loss calculation

16. Loss Function

17. Network Architecture ● Authors experiment with different backbone architectures, obtaining different results: ● The backbone that produces best speed/accuracy tradeoff is DLA-34 (modified by authors) Results without test augmentation(N.A.), flip testing(F), and multi-scale augmentation(MS). HW: Intel Core i7-8086k CPU, Titan Xp GPU

18. Network Architecture ● Deep Layer Aggregation: Iterative deep aggregation Upsample 2x Aggregation node: Conv2d(1x1) Batch Normalization Relu Stage node: Conv2d(3x3) Batch Normalization Relu Conv2d(3x3) Batch Normalization Relu

19. Network Architecture ● Modified Deep Layer Aggregation: Iterative deep aggregation Upsample 2x Deformable convolution Aggregation node: Conv2d(1x1) Batch Normalization Relu Stage node: Conv2d(3x3) Batch Normalization Relu Conv2d(3x3) Batch Normalization Relu

20. Inference Points to boxes

21. Results ● Object detection ● Human pose estimation

22. Results Experimental results on Pascal VOC 2007 test. The results are shown in mAP@0.5. Flip test is used for Centernet. The FPSs for other methods are copied from the original publications COCO test-dev. Frame-per-second (FPS) were measured on the same machine whenever possible. Italic FPS highlight the cases, where the performance measure was copied from the original publication

23. 23