EfficientDet:
Scalable and Efficient Object Detection
Mingxing Tan, et al., “EfficientDet: Scalable and Efficient Object Detection”
5th January, 2020
PR12 Paper Review
JinWon Lee
Samsung Electronics
References
• Hoya012’s Research Blog
 https://hoya012.github.io/blog/EfficientDet-Review/
• PR12 – EfficientNet (PR-169)
 https://youtu.be/Vhz0quyvR7I
• PR12 – NAS-FPN (PR-166)
 https://youtu.be/FAAt0jejWOA
EfficientDet
Intro.
• State-of-the-art object detectors have become increasingly expensive.
 The AmoebaNet-based NAS-FPN detector requires 167M parameters and 3045B FLOPs (30x more than RetinaNet).
• Given real-world resource constraints in applications such as robotics and self-driving cars, model efficiency becomes increasingly important for object detection.
• Although previous works tend to achieve better efficiency, they usually sacrifice accuracy and focus only on a specific or small range of resource requirements.
A Natural Question
Is it possible to build a scalable detection architecture with
both higher accuracy and better efficiency across a wide
spectrum of resource constraints?
Two Challenges
1. Efficient multi-scale feature fusion
 FPN has been widely used for multi-scale feature fusion.
 PANet, NAS-FPN and other studies have developed more network structures
for cross-scale feature fusion.
 Most previous works simply sum the multi-scale features without distinction.
 However, these input features usually contribute to the fused output feature unequally.
2. Model Scaling
 Inspired by EfficientNet, the authors propose a compound scaling method for object detectors, which jointly scales up the resolution, depth, and width of the backbone, feature network, and box/class prediction networks.
• Combining EfficientNet backbones with BiFPN and compound scaling → EfficientDet
Contributions
• Propose BiFPN, a weighted bidirectional feature network for easy
and fast multi-scale feature fusion.
• Propose a new compound scaling method, which jointly scales up
backbone, feature network, box/class network, and resolution, in a
principled way.
• Develop EfficientDet, a new family of one-stage detectors with
significantly better accuracy and efficiency across a wide spectrum of
resource constraints.
BiFPN – Problem Formulation
• Formally, given a list of multi-scale features $P^{in} = (P^{in}_{l_1}, P^{in}_{l_2}, \dots)$, where $P^{in}_{l_i}$ represents the feature at level $l_i$, the goal is to find a transformation $f$ that can effectively aggregate different features and output a list of new features: $P^{out} = f(P^{in})$.
BiFPN – Problem Formulation
• FPN takes level 3-7 input features $P^{in} = (P^{in}_3, \dots, P^{in}_7)$, where $P^{in}_i$ represents a feature level with resolution $1/2^i$ of the input image.
• For instance, if the input resolution is 640x640, then $P^{in}_3$ represents feature level 3 with resolution 80x80 ($640/2^3 = 80$), while $P^{in}_7$ represents feature level 7 with resolution 5x5 ($640/2^7 = 5$).
• The conventional FPN aggregates multi-scale features in a top-down manner:
$P^{out}_7 = Conv(P^{in}_7)$
$P^{out}_6 = Conv(P^{in}_6 + Resize(P^{out}_7))$
$\dots$
$P^{out}_3 = Conv(P^{in}_3 + Resize(P^{out}_4))$
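To make the top-down data flow concrete, below is a minimal NumPy sketch of this conventional FPN aggregation. The `conv` and `resize` functions are illustrative placeholders (identity and nearest-neighbor resampling), not the actual layers used in the paper.

```python
import numpy as np

def resize(feat, target_hw):
    """Nearest-neighbor resize of an (H, W, C) feature map (placeholder for the real upsampling)."""
    h, w, _ = feat.shape
    th, tw = target_hw
    rows = np.arange(th) * h // th
    cols = np.arange(tw) * w // tw
    return feat[rows][:, cols]

def conv(feat):
    """Placeholder for a 3x3 convolution; identity here, just to show the data flow."""
    return feat

def fpn_top_down(p_in):
    """p_in: dict {level: (H, W, C) array} for levels 3..7, where the highest level is coarsest."""
    levels = sorted(p_in.keys(), reverse=True)          # [7, 6, 5, 4, 3]
    p_out = {levels[0]: conv(p_in[levels[0]])}          # P7_out = Conv(P7_in)
    for lvl in levels[1:]:                              # 6, 5, 4, 3
        upsampled = resize(p_out[lvl + 1], p_in[lvl].shape[:2])
        p_out[lvl] = conv(p_in[lvl] + upsampled)        # Pl_out = Conv(Pl_in + Resize(P(l+1)_out))
    return p_out

# Example with a 640x640 input: P3 is 80x80, ..., P7 is 5x5; 64 channels chosen arbitrarily.
p_in = {l: np.random.rand(640 // 2**l, 640 // 2**l, 64) for l in range(3, 8)}
p_out = fpn_top_down(p_in)
```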
BiFPN – Cross-Scale Connections
• PANet adds an extra bottom-up path aggregation network.
• NAS-FPN employs neural architecture search to find a better cross-scale feature network topology, but it requires thousands of GPU hours for the search, and the discovered network is irregular and difficult to interpret or modify.
PANet
• In FPN, augmenting a top-down path propagates semantically strong features and enhances all features with reasonable classification capability.
• PANet further augments a bottom-up path of low-level patterns, based on the fact that a high response to edges or instance parts is a strong indicator for accurately localizing instances.
NAS-FPN
• Adopt Neural Architecture Search and discover a
new feature pyramid architecture in a novel
scalable search space covering all cross-scale
connections.
• The discovered architecture, named NAS-FPN,
consists of a combination of top-down and
bottom-up connections to fuse features across
scales.
BiFPN – Cross-Scale Connections
• PANet achieves better accuracy than FPN and NAS-FPN, but at the cost of more parameters and computation.
• First, remove the nodes that have only one input edge: a node with a single input edge and no feature fusion contributes less to a feature network that aims at fusing different features.
BiFPN – Cross-Scale Connections
• Second, add an extra edge from the original input to the output node if they are at the same level, in order to fuse more features without adding much cost.
• Third, unlike PANet, which has only one top-down and one bottom-up path, each bidirectional (top-down & bottom-up) path is treated as one feature network layer, and the same layer is repeated multiple times to enable more high-level feature fusion.
BiFPN – Weighted Feature Fusion
• When fusing multiple input features with
different resolutions, a common way is to first
resize them to the same resolution and then
sum them up.
• Pyramid attention network introduces global
self-attention upsampling to recover pixel
localization.
• Since different input features are at different resolutions, they usually contribute to the output feature unequally. So, an additional weight is added for each input during feature fusion, letting the network learn the importance of each input feature.
Unbounded Fusion
$O = \sum_i w_i \cdot I_i$
• where $w_i$ is a learnable weight that can be a scalar (per-feature), a vector (per-channel), or a multi-dimensional tensor (per-pixel).
• The authors find that a scalar can achieve comparable accuracy to the other approaches with minimal computational cost. However, since the scalar weight is unbounded, it could potentially cause training instability.
Softmax-Based Fusion
$O = \sum_i \frac{e^{w_i}}{\sum_j e^{w_j}} \cdot I_i$
• An intuitive idea is to apply softmax to each weight, so that all weights are normalized to a probability in the range 0 to 1, representing the importance of each input.
• However, the extra softmax leads to a significant slowdown on GPU hardware.
Fast Normalized Fusion
$O = \sum_i \frac{w_i}{\epsilon + \sum_j w_j} \cdot I_i$
• where $w_i \geq 0$ is ensured by applying a ReLU after each $w_i$, and $\epsilon = 0.0001$ is a small value to avoid numerical instability.
• The ablation study shows this fast fusion approach has very similar learning behavior and accuracy to the softmax-based fusion, but runs up to 30% faster on GPUs.
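The three fusion rules are easy to compare side by side. The following NumPy sketch implements each formula on inputs that have already been resized to a common resolution; the toy inputs and weight values are illustrative, only the formulas themselves come from the slides.

```python
import numpy as np

def unbounded_fusion(inputs, w):
    # O = sum_i w_i * I_i ; w_i is unbounded, which can make training unstable.
    return sum(wi * x for wi, x in zip(w, inputs))

def softmax_fusion(inputs, w):
    # O = sum_i (exp(w_i) / sum_j exp(w_j)) * I_i ; bounded in [0, 1] but slower on GPUs.
    e = np.exp(w - np.max(w))            # subtract the max for numerical stability
    return sum(wi * x for wi, x in zip(e / e.sum(), inputs))

def fast_normalized_fusion(inputs, w, eps=1e-4):
    # O = sum_i (w_i / (eps + sum_j w_j)) * I_i ; w_i >= 0 is enforced with a ReLU.
    w = np.maximum(np.asarray(w, dtype=float), 0.0)
    return sum(wi * x for wi, x in zip(w / (eps + w.sum()), inputs))

inputs = [np.random.rand(16, 16, 8) for _ in range(2)]   # two same-resolution feature maps
w = np.array([0.7, 1.4])                                 # learnable scalars (toy values here)
fused = fast_normalized_fusion(inputs, w)
print(fused.shape)                                       # (16, 16, 8)
```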
BiFPN
• BiFPN integrates both the bidirectional cross-scale connections and the fast normalized fusion. For example, the two fused features at level 6 are computed as:
$P^{td}_6 = Conv\left(\frac{w_1 \cdot P^{in}_6 + w_2 \cdot Resize(P^{in}_7)}{w_1 + w_2 + \epsilon}\right)$
$P^{out}_6 = Conv\left(\frac{w'_1 \cdot P^{in}_6 + w'_2 \cdot P^{td}_6 + w'_3 \cdot Resize(P^{out}_5)}{w'_1 + w'_2 + w'_3 + \epsilon}\right)$
where $P^{td}_6$ is the intermediate feature at level 6 on the top-down pathway, and $P^{out}_6$ is the output feature at level 6 on the bottom-up pathway.
• Depthwise separable convolution is used for feature fusion.
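Below is a small sketch of these two fused nodes at level 6, reusing the fast normalized fusion rule. The `conv` and `resize` functions are placeholders (the paper's fusion convolution is a depthwise separable convolution), and the weight values are arbitrary defaults rather than learned scalars.

```python
import numpy as np

def fuse(inputs, w, eps=1e-4):
    """Fast normalized fusion over inputs that already share one resolution."""
    w = np.maximum(np.asarray(w, dtype=float), 0.0)      # ReLU on the learnable scalars
    return sum((wi / (eps + w.sum())) * x for wi, x in zip(w, inputs))

def resize(feat, target_hw):
    """Nearest-neighbor resize placeholder (handles both up- and down-sampling)."""
    h, w, _ = feat.shape
    th, tw = target_hw
    return feat[np.arange(th) * h // th][:, np.arange(tw) * w // tw]

conv = lambda x: x   # stands in for the depthwise separable convolution used in the paper

def bifpn_level6(p6_in, p7_in, p5_out, w_td=(1.0, 1.0), w_out=(1.0, 1.0, 1.0)):
    """Intermediate (top-down) and output (bottom-up) features at level 6."""
    p6_td = conv(fuse([p6_in, resize(p7_in, p6_in.shape[:2])], w_td))
    p6_out = conv(fuse([p6_in, p6_td, resize(p5_out, p6_in.shape[:2])], w_out))
    return p6_td, p6_out

p5, p6, p7 = (np.random.rand(640 // 2**l, 640 // 2**l, 64) for l in (5, 6, 7))
p6_td, p6_out = bifpn_level6(p6, p7, p5)
```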
EfficientDet Architecture
• ImageNet-pretrained EfficientNet is employed as the backbone.
• The class and box network weights are shared across all levels of features.
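A minimal sketch of what "shared across all levels" means for the prediction heads: one set of weights is applied to every BiFPN output level. The layer count, channel width, anchor count, and class count below are illustrative assumptions, not values taken from these slides, and the 1x1 matrix multiply stands in for the real convolutions.

```python
import numpy as np

class SharedHead:
    """One set of weights applied to every feature level (the class net and box net each do this)."""
    def __init__(self, channels, depth, outputs):
        rng = np.random.default_rng(0)
        self.hidden = [rng.standard_normal((channels, channels)) * 0.01 for _ in range(depth)]
        self.final = rng.standard_normal((channels, outputs)) * 0.01

    def __call__(self, feat):                  # feat: (H, W, C); 1x1 "conv" via matmul for brevity
        for w in self.hidden:
            feat = np.maximum(feat @ w, 0.0)   # conv + activation placeholder
        return feat @ self.final

num_anchors, num_classes = 9, 90               # assumed values for illustration
class_net = SharedHead(channels=64, depth=3, outputs=num_anchors * num_classes)
box_net = SharedHead(channels=64, depth=3, outputs=num_anchors * 4)

bifpn_feats = {l: np.random.rand(640 // 2**l, 640 // 2**l, 64) for l in range(3, 8)}
class_outs = {l: class_net(f) for l, f in bifpn_feats.items()}   # same weights at every level
box_outs = {l: box_net(f) for l, f in bifpn_feats.items()}
```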
EfficientNet – Compound Scaling
EfficientNet – Compound Scaling Method
• $\alpha$, $\beta$, $\gamma$ are constants that can be determined by a small grid search.
• Intuitively, 𝜙 is a user-specified coefficient that controls how many
more resources are available for model scaling.
EfficientNet – Compound Scaling Method
• Notably, the FLOPS of a regular convolution op is proportional to $d$, $w^2$, $r^2$.
 Doubling network depth will double FLOPS, but doubling network width or resolution will increase FLOPS by four times. Since convolution ops usually dominate the computation cost in ConvNets, scaling a ConvNet with the above equation will approximately increase total FLOPS by $(\alpha \cdot \beta^2 \cdot \gamma^2)^{\phi}$.
• In the EfficientNet paper, $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 2$ is enforced, so total FLOPS approximately increase by $2^{\phi}$.
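A quick worked example of this FLOPS rule. The specific α, β, γ values below are the grid-searched constants reported for EfficientNet (chosen so that α·β²·γ² ≈ 2); they are assumed here rather than taken from these slides.

```python
# FLOPS of a regular convolution scale as d * w^2 * r^2, so scaling depth, width, and
# resolution by alpha^phi, beta^phi, gamma^phi multiplies FLOPS by (alpha * beta^2 * gamma^2)^phi.
alpha, beta, gamma = 1.2, 1.1, 1.15     # EfficientNet's reported constants (assumption)

for phi in range(4):
    flops_x = (alpha * beta ** 2 * gamma ** 2) ** phi
    print(f"phi={phi}: depth x{alpha**phi:.2f}, width x{beta**phi:.2f}, "
          f"resolution x{gamma**phi:.2f}, FLOPS x{flops_x:.2f} (~2^{phi})")
```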
Compound Scaling
• Inspired by EfficientNet, a new compound scaling method is used which employs a simple compound coefficient $\phi$ to jointly scale up all dimensions of the backbone network, BiFPN network, class/box network, and resolution.
• Unfortunately, object detectors have many more scaling dimensions than image classification models, so a grid search over all of them would be prohibitively expensive; a heuristic-based scaling approach is used instead.
Compound Scaling
• Backbone network
 Same width/depth scaling coefficients as EfficientNet-B0 to B6.
• BiFPN network
 Exponentially grow the BiFPN width $W_{bifpn}$ (#channels), but linearly increase the depth $D_{bifpn}$ (#layers), since depth needs to be rounded to small integers.
 $W_{bifpn} = 64 \cdot (1.35)^{\phi}$, $D_{bifpn} = 2 + \phi$
• Box/class prediction network
 Fix their width to always be the same as the BiFPN (i.e., $W_{pred} = W_{bifpn}$).
 Linearly increase the depth (#layers) using the equation $D_{box} = D_{class} = 3 + \lfloor \phi / 3 \rfloor$.
• Input image resolution
 Since feature levels 3-7 are used in the BiFPN, the input resolution must be divisible by $2^7 = 128$: $R_{input} = 512 + \phi \cdot 128$
Compound Scaling
• Input resolution: $R_{input} = 512 + \phi \cdot 128$
• Backbone network: EfficientNet-B0 to B6
• BiFPN: #layers $D_{bifpn} = 2 + \phi$, #channels $W_{bifpn} = 64 \cdot (1.35)^{\phi}$
• Box/class prediction networks: #layers $D_{box} = D_{class} = 3 + \lfloor \phi / 3 \rfloor$, #channels $W_{pred} = W_{bifpn}$
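The summary above can be evaluated directly. The sketch below computes the raw values from the slide formulas for a range of φ; any rounding of the channel count to a hardware-friendly integer is omitted, and the mapping of φ values to D0-D6 model names is assumed.

```python
import math

def efficientdet_config(phi):
    """Raw scaling values from the slide formulas (no rounding of channel counts)."""
    return {
        "input_resolution": 512 + phi * 128,        # R_input
        "bifpn_channels": 64 * (1.35 ** phi),       # W_bifpn
        "bifpn_layers": 2 + phi,                    # D_bifpn
        "head_layers": 3 + math.floor(phi / 3),     # D_box = D_class
    }

for phi in range(7):                                # roughly EfficientDet-D0 .. D6
    print(f"phi={phi}: {efficientdet_config(phi)}")
```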
Experiments
• Latency comparison on a Titan V GPU and a single-thread Xeon CPU (figures omitted).
Experiments
Ablation Study
• Disentangling Backbone and BiFPN
• BiFPN Cross-Scale Connections
FPN and PANet have only one top-down or bottom-up flow, so they were repeated 5 times for a fair comparison.
Ablation Study
• Softmax vs Fast Normalized Fusion
Ablation Study
• Compound Scaling
Conclusion
• A weighted bidirectional feature network (BiFPN) and a customized compound scaling method are proposed to improve both accuracy and efficiency.
• EfficientDet-D7 achieves a state-of-the-art 51.0 mAP on the COCO dataset with 52M parameters and 326B FLOPs, being 4x smaller and using 9.3x fewer FLOPs yet still more accurate (+0.3% mAP) than the best previous detector.
• EfficientDet is also up to 3.2x faster on GPUs and 8.1x faster on CPUs.