8. • (a) Conventional top-down FPN
– Limited by the one-way information flow
BiFPN: Bi-directional Feature Pyramid Network
9. • (b) PANet
– Adds an extra bottom-up path-aggregation network
• (c) NAS-FPN
– Neural architecture search
– Requires thousands of GPU hours for search
– Irregular network, difficult to interpret or modify
10. • (e) Simplified PANet
– PANet: accurate, but needs more parameters and computation
– Remove the nodes with only one input edge
• (f) BiFPN
– Extra edges from input to output at the same level
– Repeat the feature network layer (each repetition is one bidirectional path)
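Abstracting the feature maps to scalars (so resizing and convolutions drop out), the wiring of one BiFPN layer can be sketched as below; `fuse` stands in for weighted fusion plus conv, the level list runs finest-first (P3 … P7), and all names are illustrative:

```python
def bifpn_layer(p_in, fuse):
    """One BiFPN layer: a top-down pass, then a bottom-up pass with the
    extra same-level input->output skip edges. Resize/conv ops omitted."""
    n = len(p_in)
    td = [None] * n
    td[-1] = p_in[-1]                      # coarsest level passes straight through
    for i in range(n - 2, -1, -1):         # top-down (one-way FPN) path
        td[i] = fuse(p_in[i], td[i + 1])
    out = [td[0]]                          # finest td node is already an output
    for i in range(1, n):                  # added bottom-up path
        if i == n - 1:
            out.append(fuse(p_in[i], out[-1]))
        else:                              # extra skip edge straight from p_in[i]
            out.append(fuse(p_in[i], td[i], out[-1]))
    return out

avg = lambda *xs: sum(xs) / len(xs)        # stand-in for weighted fusion
stacked = bifpn_layer(bifpn_layer([1.0, 2.0, 3.0], avg), avg)  # repeating = stacking
```

Because the layer maps a list of levels to a list of levels, "repeat feature network layer" is just composing this function with itself.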
11. • Weighted feature fusion: how to fuse multi-scale features?
– Sum them equally? → no: inputs at different resolutions contribute unequally to the output
– Introduce additional weights, letting the network learn the importance of each input feature
– Unbounded fusion:
• 𝑤_i: scalar (per-feature), vector (per-channel), or tensor (per-pixel)
• A scalar is accurate enough, but it needs bounding for stable training
– Softmax-based fusion:
• Bounds the weights, but the exp() causes a slowdown on GPU
– Fast normalized fusion:
• Bounds the weights with a plain ratio instead of exp(); efficient on GPU
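A minimal NumPy sketch of the two bounded fusion schemes, using per-feature scalar weights as the slide recommends; the ε = 1e-4 default follows the paper's fast-normalized formula, while the function names are illustrative:

```python
import numpy as np

def fast_normalized_fusion(inputs, weights, eps=1e-4):
    # O = sum_i (w_i / (eps + sum_j w_j)) * I_i, with w_i kept >= 0 by a ReLU
    w = np.maximum(np.asarray(weights, dtype=float), 0.0)
    norm = w / (eps + w.sum())           # plain ratio: no exp(), cheap on GPU
    return sum(wi * x for wi, x in zip(norm, inputs))

def softmax_fusion(inputs, weights):
    # Same bounding effect, but the exp() makes it slower on GPU
    e = np.exp(np.asarray(weights, dtype=float) - np.max(weights))
    norm = e / e.sum()
    return sum(wi * x for wi, x in zip(norm, inputs))

a, b = np.ones((4, 4)), 3 * np.ones((4, 4))
fused = fast_normalized_fusion([a, b], [1.0, 1.0])   # ~= elementwise mean of a, b
```

With equal weights both schemes reduce to (nearly) an average; the learned weights are what let one input dominate when it matters more.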
12. • Backbone: ImageNet-pretrained EfficientNet
• Repeated BiFPN layers
• Class & box prediction networks share weights across all levels of features
EfficientDet Architecture
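Weight sharing across levels means calling one head, with one set of parameters, on every pyramid level. A NumPy sketch with made-up shapes (64 channels, 4 box coordinates):

```python
import numpy as np

rng = np.random.default_rng(0)
W_box = rng.standard_normal((64, 4))   # single parameter set for the box head

def box_head(feat):
    # The same W_box is used regardless of which pyramid level `feat` came from
    return feat @ W_box

# Fake P3..P5 features: more spatial positions at finer levels
levels = [rng.standard_normal((n, 64)) for n in (64, 16, 4)]
preds = [box_head(f) for f in levels]  # one shared head, three levels
```

Sharing keeps the head's parameter count independent of the number of pyramid levels.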
13. • Use a compound coefficient 𝜙 to jointly scale up all dimensions
– Object detection models have many more scaling dimensions than image classification models
Compound Scaling
– Backbone network: EfficientNet 𝐵0, …, 𝐵6
– Input size: R_input = 512 + 𝜙 ∙ 128
– BiFPN #channels: W_bifpn = 64 ∙ (1.35^𝜙)
– BiFPN #layers: D_bifpn = 2 + 𝜙
– Class/box net #layers: D_class = 3 + ⌊𝜙/3⌋
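Putting the scaling formulas together as a small helper (note these are the slide's constants: the published paper rounds W_bifpn further to hardware-friendly channel counts, and later revisions use 3 + 𝜙 for D_bifpn):

```python
def efficientdet_config(phi):
    # Compound scaling with one coefficient phi (EfficientDet-D0..D6 -> phi = 0..6)
    return {
        "backbone": f"EfficientNet-B{phi}",   # B0, ..., B6
        "R_input": 512 + phi * 128,           # input resolution
        "W_bifpn": round(64 * 1.35 ** phi),   # BiFPN channels (paper rounds further)
        "D_bifpn": 2 + phi,                   # BiFPN layers, per this slide
        "D_class": 3 + phi // 3,              # class/box prediction-net layers
    }
```

For example, 𝜙 = 0 gives the D0 baseline: a 512×512 input, 64 BiFPN channels, 2 BiFPN layers, and 3 prediction-net layers.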