211121 detection in crowded scenes one proposal, multiple predictions

Hello, this is the Deep Learning Paper Reading Group. The paper review video uploaded today covers 'Detection in Crowded Scenes: One Proposal, Multiple Predictions', presented at CVPR 2020. As the title suggests, the paper targets crowded object detection, where multiple objects overlap one another. Even a strong object detection model cannot perform well within the existing framework when the overlap is severe. The paper proposes a new approach to crowded object detection, introduces a loss and an NMS algorithm tailored to it, and achieves SOTA on the CrowdHuman dataset with an AP of 90.7. 홍은기 of the image processing team kindly prepared the detailed review. Thank you in advance for your interest! https://youtu.be/LPC4m66YZfg

Detection in Crowded Scenes:
One Proposal, Multiple Predictions
Xuangeng Chu, Anlin Zheng, Xiangyu Zhang, Jian Sun
Peking University, MEGVII Technology
CVPR 2020
2021.11.21
Deep Learning Paper Reading Group, Image Processing Team
홍은기, 김병현, 김선옥, 안종식, 이찬혁

Contents
1. Introduction
2. Proposed Approach: Multiple Instance Prediction
3. Experiment
4. Conclusion & Discussion

Introduction – Crowded Object Detection
Chu et al., 2020, Detection in Crowded Scenes: One Proposal, Multiple Predictions

Contribution
1. Proposed a novel approach: Multiple Instance Prediction
2. Proposed a novel loss: EMD loss
3. Proposed a novel NMS: Set NMS
4. Achieved SOTA on CrowdHuman Dataset

CrowdHuman Dataset
- Shao et al., 2018, CrowdHuman: A Benchmark for Detecting Human in a Crowd
- https://www.crowdhuman.org/
• train/val/test: 15,000 / 4,370 / 5,000 images
• 470K human instances

Fundamental difficulties in crowded object detection
• State-of-the-art models on COCO or VOC perform poorly on the CrowdHuman dataset
1) Highly overlapped instances are likely to have very similar features
2) Heavily overlapped instances are likely to be mistakenly suppressed by NMS
Chu et al., 2020, Detection in Crowded Scenes: One Proposal, Multiple Predictions

NMS (Non-Maximum Suppression)
https://towardsdatascience.com/non-maximum-suppression-nms-93ce178e177c
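To make the failure mode concrete, here is a minimal sketch of standard greedy NMS, not code from the paper or the slides. The box format `(x1, y1, x2, y2)`, the helper names `iou` and `nms`, and the 0.5 threshold are all assumptions chosen for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: Sequence[Box], scores: Sequence[float], iou_thresh: float = 0.5) -> List[int]:
    """Greedy NMS: visit boxes by descending score, keep a box only if it
    does not overlap an already kept box by more than iou_thresh."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep


# Two heavily overlapped people (boxes 0 and 1) and one isolated person (box 2).
boxes = [(0, 0, 10, 20), (2, 0, 12, 20), (30, 0, 40, 20)]
scores = [0.9, 0.8, 0.7]
kept = nms(boxes, scores)
```

In this toy input, boxes 0 and 1 have an IoU of about 0.67, so box 1 is suppressed even though it stands for a second person. This is the second difficulty listed on the previous slide.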

Solution – multiple instance prediction
• For each proposal box, rather than predicting a single instance, predict a set of instances
(a) Single prediction paradigm: each proposal box predicts a single instance (intrinsically difficult!). After NMS, only one prediction survives.
(b) Multiple instance prediction: Set NMS removes duplicates from different proposals while keeping the multiple predictions made by the same proposal.

Solution – multiple instance prediction
• Step 1: assign a proposal box to ground-truths
Figure labels: proposal b1; ground-truths g1, g2, g3
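Step 1 can be sketched as follows, assuming a proposal is associated with every ground-truth box whose IoU with it reaches a threshold. The threshold value 0.5, the box format `(x1, y1, x2, y2)`, and the function name `assign_ground_truths` are assumptions for illustration and are not taken from the slides.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_ground_truths(proposal: Box, gts: Sequence[Box], iou_thresh: float = 0.5) -> List[int]:
    """Return indices of all ground-truth boxes that overlap the proposal
    by at least iou_thresh (a set, rather than the single best match)."""
    return [j for j, g in enumerate(gts) if iou(proposal, g) >= iou_thresh]


proposal_b1 = (0, 0, 10, 20)
gts = [(0, 0, 10, 20), (2, 0, 12, 20), (30, 0, 40, 20)]
matched = assign_ground_truths(proposal_b1, gts)
```

Here the proposal is matched to the first two ground-truths, which overlap each other heavily, and not to the distant third one.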

Solution – multiple instance prediction
• Step 2: make K predictions from one proposal box
Figure labels: proposal b1; ground-truths g1, g2, g3; predictions p1, p2, p3 (K = 3)

EMD Loss
• Step 3: assign predictions to ground-truths using Earth Mover’s Distance (EMD)
Figure labels: predictions p1, p2, p3; ground-truths g1, g2, g3; background (K = 3)
EMD loss: [equation image]
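The matching in Step 3 can be sketched as a search over permutations: each of the K predictions is paired with one target (a ground-truth or background), and the loss is the smallest total over all pairings. The sketch below rests on that reading. The function `toy_pair_loss` is a hypothetical stand-in for the real classification plus regression loss, and the `(score, box)` prediction format is also an assumption.

```python
from itertools import permutations
from typing import Callable, List, Optional, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format
Pred = Tuple[float, Box]                 # (confidence score, predicted box)


def toy_pair_loss(pred: Pred, target: Optional[Box]) -> float:
    """Hypothetical per-pair loss: penalize confidence on background,
    otherwise penalize low confidence plus L1 box error."""
    score, box = pred
    if target is None:  # background target
        return score
    return (1.0 - score) + sum(abs(p - t) for p, t in zip(box, target))


def emd_loss(
    preds: Sequence[Pred],
    targets: Sequence[Box],
    pair_loss: Callable[[Pred, Optional[Box]], float] = toy_pair_loss,
) -> Tuple[float, Tuple[int, ...]]:
    """Minimum total loss over all one-to-one pairings of K predictions
    with targets; missing targets are padded with background (None)."""
    k = len(preds)
    if len(targets) > k:
        raise ValueError("this sketch expects at most K ground-truths")
    padded: List[Optional[Box]] = list(targets) + [None] * (k - len(targets))
    best_loss, best_perm = float("inf"), tuple(range(k))
    for perm in permutations(range(k)):
        total = sum(pair_loss(preds[i], padded[perm[i]]) for i in range(k))
        if total < best_loss:
            best_loss, best_perm = total, perm
    return best_loss, best_perm


# K = 2: the predictions come out in the "wrong" order relative to the targets.
preds = [(0.9, (2, 0, 12, 20)), (0.8, (0, 0, 10, 20))]
targets = [(0, 0, 10, 20), (2, 0, 12, 20)]
loss, perm = emd_loss(preds, targets)
```

Because the minimum is taken over pairings, the order in which the K heads emit their predictions does not matter. Enumerating permutations is only practical for small K, such as the K = 2 or K = 3 used in the slides.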

Set NMS
• Step 4: apply Set NMS
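A minimal sketch of Set NMS, assuming it differs from greedy NMS in only one respect: a box is never suppressed by another box predicted from the same proposal. The `proposal_ids` argument, the helper names, and the 0.5 threshold are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def set_nms(
    boxes: Sequence[Box],
    scores: Sequence[float],
    proposal_ids: Sequence[int],
    iou_thresh: float = 0.5,
) -> List[int]:
    """Greedy NMS that skips suppression between boxes from the same proposal."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        suppressed = any(
            proposal_ids[i] != proposal_ids[j] and iou(boxes[i], boxes[j]) > iou_thresh
            for j in keep
        )
        if not suppressed:
            keep.append(i)
    return keep


# Proposal 0 predicts two overlapping people (boxes 0 and 1);
# proposal 1 predicts a near-duplicate of box 0 (box 2).
boxes = [(0, 0, 10, 20), (2, 0, 12, 20), (1, 0, 11, 20)]
scores = [0.9, 0.8, 0.7]
proposal_ids = [0, 0, 1]
kept = set_nms(boxes, scores, proposal_ids)
```

In this toy input both predictions of proposal 0 survive, while the near-duplicate from proposal 1 is removed. Plain greedy NMS at the same threshold would keep only box 0.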

Experiments
• Evaluation Metrics
1) Average Precision (AP)
2) MR^-2: log-average miss rate over False Positives Per Image (FPPI) in [10^-2, 10^0]
3) Jaccard Index (JI)
• Datasets
1) CrowdHuman
2) CityPersons
3) COCO
• Network Architecture
1) Backbone: ResNet-50 pre-trained on ImageNet
2) Head: FPN with RoIAlign
3) K = 2
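As an illustration of the MR^-2 metric, the sketch below averages the miss rate in log space at reference FPPI values spaced evenly (in log scale) over [10^-2, 10^0]. The choice of nine reference points, the fallback miss rate of 1.0 when no operating point lies at or below a reference FPPI, and the function name are assumptions following one common pedestrian-detection convention. The slides do not specify these details.

```python
import math
from typing import Sequence


def log_average_miss_rate(
    fppi: Sequence[float], miss_rate: Sequence[float], num_points: int = 9
) -> float:
    """Geometric mean of the miss rate sampled at num_points reference FPPI
    values in [1e-2, 1e0]. Expects fppi sorted in ascending order, with
    miss_rate[i] the miss rate at fppi[i]."""
    refs = [10 ** (-2 + 2 * i / (num_points - 1)) for i in range(num_points)]
    samples = []
    for ref in refs:
        mr = 1.0  # assumed fallback: nothing detected at this FPPI budget
        for f, m in zip(fppi, miss_rate):
            if f > ref:
                break
            mr = m  # last operating point with FPPI <= ref
        samples.append(max(mr, 1e-10))  # floor avoids log(0)
    return math.exp(sum(math.log(s) for s in samples) / len(samples))


# A detector whose miss rate is 0.5 at every operating point.
flat = log_average_miss_rate([0.001, 0.1, 1.0], [0.5, 0.5, 0.5])
```

Lower is better for this metric, in contrast to AP and JI.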

Performance on CrowdHuman Dataset
• Main results and ablation study
• Comparison with various NMS strategies
• Ablation on Number of Heads

Conclusion & Discussion
1. The proposed approach is not only effective on crowded scenes, but also generalizes well to normal data.
2. The proposed approach is compatible with other one-stage & two-stage architectures.
3. A local version of DETR (Carion et al., 2020)?
