3. The problem
• Fully supervised
  • Annotation is costly
  • Hard to generalize to unseen categories
• Self-supervised
  • Learn part segmentations that are semantically consistent across different object instances, given only an image collection of the same object category.
• Class agnostic
4. Contributions
• Geometric concentration
  • Geometric Concentration Loss
• Robustness to variations
  • Equivariance Loss
• Semantic consistency
  • Semantic Consistency Loss
• Objects as union of parts
  • Saliency Constraint
5. Overall Framework
• Backbone: DeepLab-V2 (ResNet50)
• Output: $R = F(I; \theta_f) \in [0,1]^{(K+1) \times H \times W}$
• K is the number of parts; the extra channel is the background
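The output shape above can be illustrated with a minimal NumPy sketch. It assumes the network's final layer is a channel-wise softmax over K+1 channels, which is one way to get per-pixel values in [0, 1]. The function name `part_response` and the random logits are hypothetical stand-ins for the real backbone output.

```python
import numpy as np

def part_response(logits):
    """Channel-wise softmax: (K+1, H, W) logits -> per-pixel probabilities."""
    # Subtract the per-pixel max for numerical stability.
    shifted = logits - logits.max(axis=0, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=0, keepdims=True)

# Hypothetical sizes: K = 4 parts on an 8 x 8 feature map.
K, H, W = 4, 8, 8
rng = np.random.default_rng(0)
logits = rng.normal(size=(K + 1, H, W))  # stand-in for the backbone output
R = part_response(logits)                # shape (K+1, H, W), values in [0, 1]
```

Under this assumption, the K+1 values at each pixel sum to 1, so each pixel is softly assigned to one of the K parts or to the background.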
6. Geometric Concentration Loss
• Observation:
• Pixels belonging to the same object part are spatially concentrated or form a connected component.
• Minimize the variance of each part's spatial probability distribution
• The part center for part k along axis u: $c_u^k = \sum_{u,v} u \cdot R(k,u,v) / z_k$, where $z_k = \sum_{u,v} R(k,u,v)$
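The idea of minimizing each part's spatial variance can be sketched in NumPy as follows. This is a sketch under assumptions, not the authors' implementation: the part center is taken to be the response-weighted mean pixel coordinate, the loss is the response-weighted squared distance to that center, and the function names, the `1e-8` stabilizer, and the toy inputs are hypothetical.

```python
import numpy as np

def part_centers(R_parts):
    """Response-weighted mean coordinates for (K, H, W) part responses."""
    K, H, W = R_parts.shape
    u = np.arange(H, dtype=float)[:, None]    # row coordinates, shape (H, 1)
    v = np.arange(W, dtype=float)[None, :]    # column coordinates, shape (1, W)
    z = R_parts.sum(axis=(1, 2)) + 1e-8       # per-part normalizer
    c_u = (R_parts * u).sum(axis=(1, 2)) / z
    c_v = (R_parts * v).sum(axis=(1, 2)) / z
    return c_u, c_v, z

def concentration_loss(R_parts):
    """Sum over parts of the spatial variance around each part center."""
    K, H, W = R_parts.shape
    c_u, c_v, z = part_centers(R_parts)
    u = np.arange(H, dtype=float)[None, :, None]
    v = np.arange(W, dtype=float)[None, None, :]
    sq_dist = (u - c_u[:, None, None]) ** 2 + (v - c_v[:, None, None]) ** 2
    return float((sq_dist * R_parts / z[:, None, None]).sum())

# Toy comparison: a part concentrated on one pixel vs. one spread uniformly.
point = np.zeros((1, 5, 5))
point[0, 2, 3] = 1.0
spread = np.ones((1, 5, 5))
print(concentration_loss(point), concentration_loss(spread))
```

The concentrated part gives a loss near zero, while the uniformly spread part gives a much larger value, which is the behavior the loss is meant to penalize.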
7. Equivariance Loss
• Random spatial transform $T_s$
• Random appearance perturbation $T_a$
[Figure: transformed part center]
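The equivariance idea can be sketched as follows: predict on the original image, predict again on a spatially transformed and appearance-perturbed copy, then compare the second prediction with the transformed first one. Everything here is an illustrative assumption: a horizontal flip stands in for the random spatial transform, a brightness gain stands in for the appearance perturbation, the KL direction and the squared center distance are one plausible choice of comparison, and `toy_predictor` is a hypothetical stand-in for the network.

```python
import numpy as np

def softmax0(x):
    e = np.exp(x - x.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def flip(x):
    """Stand-in spatial transform: horizontal flip of the last axis."""
    return x[..., ::-1]

def centers(R):
    """Response-weighted (row, col) centers for a (C, H, W) response map."""
    C, H, W = R.shape
    u = np.arange(H, dtype=float)[:, None]
    v = np.arange(W, dtype=float)[None, :]
    z = R.sum(axis=(1, 2)) + 1e-8
    return np.stack([(R * u).sum(axis=(1, 2)) / z,
                     (R * v).sum(axis=(1, 2)) / z], axis=1)

def equivariance_terms(predict, image, gain=1.2, eps=1e-8):
    """Return (mean per-pixel KL, squared distance between part centers)."""
    R = predict(image)                        # response on the original image
    target = flip(R)                          # transformed response
    pred = predict(flip(image * gain))        # response on the transformed image
    kl = (pred * (np.log(pred + eps) - np.log(target + eps))).sum(axis=0).mean()
    c = centers(R)
    c[:, 1] = R.shape[2] - 1 - c[:, 1]        # transformed part centers under a flip
    center_term = ((centers(pred) - c) ** 2).sum()
    return float(kl), float(center_term)

def toy_predictor(image):
    """Hypothetical pixel-wise predictor with 3 channels, brightness-normalized."""
    x = image / (image.mean() + 1e-8)
    return softmax0(np.stack([x, 1.0 - x, np.zeros_like(x)]))

rng = np.random.default_rng(0)
image = rng.uniform(size=(8, 8))
kl, center_term = equivariance_terms(toy_predictor, image)
```

Because the toy predictor works pixel by pixel and normalizes brightness, both terms come out close to zero. A predictor whose output depends on absolute position or brightness would give larger values, which is what such a loss would penalize during training.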