GROUP NORMALIZATION &
RETHINKING IMAGENET PRE-TRAINING
Ruijie Quan 2018/11/25
I. GROUP NORMALIZATION
• METHODOLOGY
• EXPERIMENTS
II. RETHINKING IMAGENET PRE-TRAINING
• METHODOLOGY
• EXPERIMENTS
I. GROUP NORMALIZATION
BN’s error increases rapidly when the batch
size becomes smaller, caused by inaccurate
batch statistics estimation.
ImageNet classification error vs. batch sizes.
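As a rough illustration of why small batches give inaccurate statistics, the sketch below estimates a mean from random batches of different sizes and measures how much the estimate fluctuates. It is a toy model, not the paper's experiment: the activations are assumed to be unit-variance Gaussian scalars, and `batch_mean_spread` is a hypothetical helper name.

```python
import numpy as np

def batch_mean_spread(batch_size, trials=2000, seed=0):
    """Standard deviation of the per-batch mean over many random batches."""
    rng = np.random.default_rng(seed)
    # Assumed toy activations: Gaussian with true mean 0 and variance 1.
    batches = rng.standard_normal((trials, batch_size))
    return float(batches.mean(axis=1).std())

# The spread of the estimated mean grows roughly as 1/sqrt(batch_size).
for n in (32, 8, 2):
    print(n, round(batch_mean_spread(n), 3))
```

In this toy setting the estimate at batch size 2 is about four times noisier than at batch size 32, which is consistent with BN degrading as the batch shrinks.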
A general formulation of
feature normalization:
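The equation on this slide was an image and does not appear in the transcript. A sketch of the formulation, assuming the slide followed the notation of the Group Normalization paper (i indexes a feature, S_i is the set of positions over which the statistics are computed, m is the size of S_i, and ε is a small constant):

```latex
\hat{x}_i = \frac{1}{\sigma_i}\,(x_i - \mu_i), \qquad
\mu_i = \frac{1}{m} \sum_{k \in S_i} x_k, \qquad
\sigma_i = \sqrt{\frac{1}{m} \sum_{k \in S_i} (x_k - \mu_i)^2 + \epsilon}

% Group Normalization: same sample, same group of C/G channels
S_i = \left\{ k \;\middle|\; k_N = i_N,\;
      \left\lfloor \frac{k_C}{C/G} \right\rfloor =
      \left\lfloor \frac{i_C}{C/G} \right\rfloor \right\}

% Learned per-channel scale and shift
y_i = \gamma\,\hat{x}_i + \beta
```

Under this formulation, BN, LN, IN and GN differ only in the choice of the set S_i.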
GN computes the mean and variance over (C//G, H, W): the spatial axes and a group of C//G channels.
IMPLEMENTATION
An implementation only needs to specify how the mean and variance (“moments”) are computed,
along the appropriate axes as defined by the normalization method.
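The code shown on this slide was an image and is not in the transcript. A minimal NumPy sketch of the idea, assuming NCHW layout and per-channel `gamma` / `beta` of shape (1, C, 1, 1); the function name `group_norm` and the default `eps` are choices made here for illustration, not taken from the slide:

```python
import numpy as np

def group_norm(x, gamma, beta, G, eps=1e-5):
    """Group Normalization sketch. x: (N, C, H, W); gamma, beta: (1, C, 1, 1)."""
    N, C, H, W = x.shape
    if C % G != 0:
        raise ValueError("number of channels C must be divisible by G")
    # Split channels into G groups of C // G channels each.
    x = x.reshape(N, G, C // G, H, W)
    # Moments over (C // G, H, W), separately for every sample and group.
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    x = (x - mean) / np.sqrt(var + eps)
    x = x.reshape(N, C, H, W)
    # Learned per-channel scale and shift.
    return x * gamma + beta

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 5, 5))
gamma = np.ones((1, 8, 1, 1))
beta = np.zeros((1, 8, 1, 1))
y = group_norm(x, gamma, beta, G=2)
# The result for a sample does not depend on the other samples in the batch.
print(np.allclose(y[:1], group_norm(x[:1], gamma, beta, G=2)))
```

Because no axis of the reduction runs over N, the output for one image is unchanged when the batch size changes, which is the property the batch-size experiments rely on.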
EXPERIMENTS: IMAGE CLASSIFICATION IN IMAGENET
Comparison of error curves with a batch size of 32 images/GPU. (Model: ResNet-50)
Evolution of feature distributions of conv5-3’s output (before normalization and ReLU) from
VGG-16, shown as the {1, 20, 80, 99} percentile of responses. The table on the right shows
the ImageNet validation error (%). Models are trained with 32 images/GPU.
VGG models: For VGG-16, GN is better than BN by 0.4%. This possibly implies that VGG-16
benefits less from BN’s regularization effect.
With a given fixed group number, GN performs
reasonably well for all values of G we studied.
When the number of channels per group is fixed instead,
the group number G can change across layers in this setting,
because the layers can have different channel numbers.
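To make the fixed-channels-per-group setting concrete, the sketch below derives G for each layer from its channel count. The layer widths and the value of 16 channels per group are illustrative assumptions, and `groups_for_layer` is a hypothetical helper.

```python
def groups_for_layer(channels, channels_per_group=16):
    """Number of groups G when the channels per group are held fixed."""
    if channels % channels_per_group != 0:
        raise ValueError("channels must be divisible by channels_per_group")
    return channels // channels_per_group

# Assumed example layer widths; G grows with the channel count.
for c in (64, 128, 256, 512):
    print(c, groups_for_layer(c))
```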
Deeper models: ResNet-101
Batch size = 32: BN baseline error 22.0%, GN error 22.4%
Batch size = 2: BN baseline error 31.9%, GN error 23.0%
OBJECT DETECTION AND SEGMENTATION IN COCO
GN is not fully trained with the default schedule,
so we also tried increasing the iterations from
180k to 270k (BN* does not benefit from longer
training).
VIDEO CLASSIFICATION IN KINETICS
Error curves in Kinetics with an input length of 32 frames. We show ResNet-50 I3D’s
validation error of BN (left) and GN (right) using a batch size of 8 and 4 clips/GPU.
Video classification results in Kinetics:
ResNet-50 I3D baseline’s top-1 / top-5
accuracy (%).
Detection and segmentation results trained
from scratch in COCO using Mask R-CNN and
FPN. Here the BN is synced across GPUs and is
not frozen.
II. RETHINKING IMAGENET PRE-TRAINING
Competitive results on object detection and instance segmentation on
the COCO dataset are obtained using standard models trained from random initialization.
The results are NO worse than their ImageNet pre-training counterparts, EVEN WHEN
(i) using only 10% of the training data,
(ii) for deeper and wider models,
and (iii) for multiple tasks and metrics.
The ONLY change is to increase the number of training iterations so the
randomly initialized models may converge.
We train Mask R-CNN with a ResNet-50
FPN and GroupNorm backbone on the
COCO train2017 set and evaluate
bounding box AP on the val2017 set.
Observations:
(i) ImageNet pre-training speeds up
convergence.
(ii) ImageNet pre-training does not
automatically give better regularization.
(iii) ImageNet pre-training shows no
benefit when the target tasks/metrics are
more sensitive to spatially well-localized
predictions.
METHODOLOGY
1. Normalization
(i) Group Normalization (GN)
(ii) Synchronized Batch Normalization (SyncBN)
Small batch sizes severely degrade the accuracy of BN. This issue can be circumvented if
pre-training is used, because fine-tuning can adopt the pre-training batch statistics as fixed
parameters; however, freezing BN is invalid when training from scratch.
2. Convergence
Models trained from scratch need to be trained for longer than typical fine-tuning schedules.
This suggests that a sufficiently large
number of total samples (arguably in
terms of pixels) is required for the
models trained from random
initialization to converge well.
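A back-of-the-envelope sketch of how a schedule translates into samples seen. The 180k and 270k iteration counts appear earlier in these slides; the batch size of 16 images and the 118,287-image size of COCO train2017 are assumptions added here, and the helper names are hypothetical. The sketch counts images only, not instances or pixels.

```python
def images_seen(iterations, batch_size):
    """Total number of training images processed over a schedule."""
    return iterations * batch_size

def epochs(iterations, batch_size, dataset_size):
    """Number of passes over the dataset implied by a schedule."""
    return images_seen(iterations, batch_size) / dataset_size

COCO_TRAIN2017_IMAGES = 118_287  # assumed dataset size
BATCH_SIZE = 16                  # assumed images per iteration

for its in (180_000, 270_000):
    n = images_seen(its, BATCH_SIZE)
    e = epochs(its, BATCH_SIZE, COCO_TRAIN2017_IMAGES)
    print(f"{its} iterations: {n} images, about {e:.1f} epochs")
```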
TRAINING FROM SCRATCH TO MATCH ACCURACY
Our first surprising discovery is that when only using the COCO data, models trained from
scratch can catch up in accuracy with ones that are fine-tuned.
(i) Typical fine-tuning schedules (2×) work well
for the models with pre-training to converge to
near optimum, but these schedules are not
enough for models trained from scratch.
(ii) Models trained from scratch can catch up
with their fine-tuning counterparts: their
detection AP is no worse. They catch up across
multiple metrics, not just by chance for a single metric.
X152: Large models trained from scratch
ImageNet pre-training, which has little
explicit localization information, does
not help keypoint detection.
TRAINING FROM SCRATCH WITH LESS DATA
BREAKDOWN REGIME
I. 1k COCO training images.
Training with 1k COCO images (shown as the loss in the training set). The randomly initialized
model can catch up for the training loss, but has lower validation accuracy (3.4 AP) than the
pre-training counterpart (9.9 AP).
This is a sign of strong overfitting due to the
severe lack of data. The breakdown
point in the COCO dataset is somewhere
between 3.5k and 10k training images.
II. PASCAL VOC
There are 15k VOC images used for training. But these images have on
average 2.3 instances per image (vs. COCO’s ∼7) and 20 categories (vs.
COCO’s 80).
We suspect that the fewer instances (and categories) have a similar
negative impact as insufficient training data, which can explain why
training from scratch on VOC is not able to catch up as observed on
COCO.
Using ImageNet pre-training: 82.7 mAP at 18k iterations
Trained from scratch: 77.6 mAP at 144k iterations
MAIN OBSERVATIONS
• Training from scratch on target tasks is possible without architectural changes.
• Training from scratch requires more iterations to sufficiently converge.
• Training from scratch can be no worse than its ImageNet pre-training counterparts
under many circumstances, down to as few as 10k COCO images.
• ImageNet pre-training speeds up convergence on the target task.
• ImageNet pre-training does not necessarily help reduce overfitting unless we enter
a very small data regime.
• ImageNet pre-training helps less if the target task is more sensitive to localization
than classification.
A FEW IMPORTANT QUESTIONS
Is ImageNet pre-training necessary? -No
Is ImageNet helpful? -Yes
Do we need big data? -Yes
Shall we pursue universal representations? -Yes
Thank you for your attention.

 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxRustici Software
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusZilliz
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024The Digital Insurer
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 

Dernier (20)

Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 

GNorm and Rethinking pre training-ruijie

  • 1. GROUP NORMALIZATION & RETHINKING IMAGENET PRE-TRAINING Ruijie Quan 2018/11/25
  • 2. GROUP NORMALIZATION & RETHINKING IMAGENET PRE-TRAINING I. GROUP NORMALIZATION • METHODOLOGY • EXPERIMENTS II. RETHINKING IMAGENET PRE-TRAINING • METHODOLOGY • EXPERIMENTS
  • 3. I. GROUP NORMALIZATION BN’s error increases rapidly as the batch size becomes smaller, because the batch statistics are estimated inaccurately. Figure: ImageNet classification error vs. batch size.
  • 4. GROUP NORMALIZATION A general formulation of feature normalization:
  • 7. IMPLEMENTATION One only needs to specify how the mean and variance (“moments”) are computed, along the appropriate axes as defined by the normalization method.
  • 8. EXPERIMENTS: IMAGE CLASSIFICATION IN IMAGENET Comparison of error curves with a batch size of 32 images/GPU (model: ResNet-50).
  • 10. EXPERIMENTS: IMAGE CLASSIFICATION IN IMAGENET Evolution of feature distributions of conv5-3’s output (before normalization and ReLU) from VGG-16, shown as the {1, 20, 80, 99} percentiles of responses. The table on the right shows the ImageNet validation error (%). Models are trained with 32 images/GPU. VGG models: for VGG-16, GN is better than BN by 0.4%. This possibly implies that VGG-16 benefits less from BN’s regularization effect.
  • 11. EXPERIMENTS: IMAGE CLASSIFICATION IN IMAGENET Fixed group number: GN performs reasonably well for all values of G we studied. Fixed number of channels per group: because the layers can have different channel numbers, the group number G can change across layers in this setting. Deeper models (ResNet-101): with batch size = 32, BN baseline error 22.0% vs. GN error 22.4%; with batch size = 2, BN baseline error 31.9% vs. GN error 23.0%.
  • 12. OBJECT DETECTION AND SEGMENTATION IN COCO GN is not fully trained with the default schedule, so we also tried increasing the iterations from 180k to 270k (BN* does not benefit from longer training).
  • 13. OBJECT DETECTION AND SEGMENTATION IN COCO Error curves in Kinetics with an input length of 32 frames. We show ResNet-50 I3D’s validation error of BN (left) and GN (right) using a batch size of 8 and 4 clips/GPU.
  • 14. VIDEO CLASSIFICATION IN KINETICS Video classification results in Kinetics: ResNet-50 I3D baseline’s top-1 / top-5 accuracy (%). Detection and segmentation results trained from scratch in COCO using Mask R-CNN and FPN. Here the BN is synced across GPUs and is not frozen.
  • 15. II. RETHINKING IMAGENET PRE-TRAINING
  • 16. RETHINKING IMAGENET PRE-TRAINING Competitive results on object detection and instance segmentation on the COCO dataset are obtained using standard models trained from random initialization. The results are no worse than their ImageNet pre-training counterparts. The only change is to increase the number of training iterations so the randomly initialized models may converge. This holds even when (i) using only 10% of the training data, (ii) for deeper and wider models, and (iii) for multiple tasks and metrics.
  • 17. RETHINKING IMAGENET PRE-TRAINING We train Mask R-CNN with a ResNet-50 FPN and GroupNorm backbone on the COCO train2017 set and evaluate bounding box AP on the val2017 set. Observations: (i) ImageNet pre-training speeds up convergence. (ii) ImageNet pre-training does not automatically give better regularization. (iii) ImageNet pre-training shows no benefit when the target tasks/metrics are more sensitive to spatially well-localized predictions.
  • 18. METHODOLOGY 1. Normalization: (i) Group Normalization (GN); (ii) Synchronized Batch Normalization (SyncBN). Small batch sizes severely degrade the accuracy of BN. This issue can be circumvented if pre-training is used, because fine-tuning can adopt the pre-training batch statistics as fixed parameters; however, freezing BN is invalid when training from scratch. 2. Convergence: trained for longer than typical fine-tuning...
  • 19. METHODOLOGY 2. Convergence: trained for longer than typical fine-tuning. This suggests that a sufficiently large number of total samples (arguably in terms of pixels) is required for the models trained from random initialization to converge well.
  • 20. TRAINING FROM SCRATCH TO MATCH ACCURACY Our first surprising discovery is that when only using the COCO data, models trained from scratch can catch up in accuracy with ones that are fine-tuned.
  • 21. TRAINING FROM SCRATCH TO MATCH ACCURACY (i) Typical fine-tuning schedules (2×) work well for the models with pre-training to converge to near optimum, but these schedules are not enough for models trained from scratch. (ii) Models trained from scratch can catch up with their fine-tuning counterparts: their detection AP is no worse. Models trained from scratch do not catch up merely by chance on a single metric.
  • 22. TRAINING FROM SCRATCH TO MATCH ACCURACY X152: Large models trained from scratch
  • 23. TRAINING FROM SCRATCH TO MATCH ACCURACY ImageNet pre-training, which has little explicit localization information, does not help keypoint detection.
  • 25. BREAKDOWN REGIME I. 1k COCO training images. Training with 1k COCO images (shown as the loss in the training set). The randomly initialized model can catch up for the training loss, but has lower validation accuracy (3.4 AP) than the pre-training counterpart (9.9 AP). This is a sign of strong overfitting due to the severe lack of data. The breakdown point in the COCO dataset is somewhere between 3.5k and 10k training images.
  • 26. BREAKDOWN REGIME II. PASCAL VOC There are 15k VOC images used for training. But these images have on average 2.3 instances per image (vs. COCO’s ∼7) and 20 categories (vs. COCO’s 80). We suspect that the fewer instances (and categories) have a similar negative impact as insufficient training data, which can explain why training from scratch on VOC is not able to catch up as observed on COCO. Using ImageNet pre-training: 82.7 mAP at 18k iterations. Trained from scratch: 77.6 mAP at 144k iterations.
  • 27. MAIN OBSERVATIONS • Training from scratch on target tasks is possible without architectural changes. • Training from scratch requires more iterations to sufficiently converge. • Training from scratch can be no worse than its ImageNet pre-training counterparts under many circumstances, down to as few as 10k COCO images. • ImageNet pre-training speeds up convergence on the target task. • ImageNet pre-training does not necessarily help reduce overfitting unless we enter a very small data regime. • ImageNet pre-training helps less if the target task is more sensitive to localization than classification.
  • 28. A FEW IMPORTANT QUESTIONS Is ImageNet pre-training necessary? No. Is ImageNet helpful? Yes. Do we need big data? Yes. Shall we pursue universal representations? Yes.
  • 29. Thank you for your attention.
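
The implementation note on slide 7 (GN only needs the mean and variance computed along the right axes) can be illustrated with a short sketch. The NumPy code below is an illustration under stated assumptions, not the presenter's code: the function name `group_norm`, the NCHW layout, the `eps` value and the per-channel `gamma`/`beta` parameters are choices made here. It splits the C channels into G groups and normalizes each (C // G, H, W) block separately for every sample, so the batch axis is never involved.

```python
import numpy as np

def group_norm(x, gamma, beta, num_groups, eps=1e-5):
    """Hypothetical helper: group normalization for an (N, C, H, W) array.

    gamma and beta are per-channel scale and shift vectors of length C
    (assumed here; not given in the slides).
    """
    n, c, h, w = x.shape
    if c % num_groups != 0:
        raise ValueError("C must be divisible by the number of groups")

    # Split channels into groups: (N, G, C // G, H, W).
    xg = x.reshape(n, num_groups, c // num_groups, h, w)

    # Moments over each (C // G, H, W) block, separately per sample and group.
    mean = xg.mean(axis=(2, 3, 4), keepdims=True)
    var = xg.var(axis=(2, 3, 4), keepdims=True)
    xg = (xg - mean) / np.sqrt(var + eps)

    # Restore the original layout, then apply the per-channel scale and shift.
    out = xg.reshape(n, c, h, w)
    return out * gamma.reshape(1, c, 1, 1) + beta.reshape(1, c, 1, 1)

# Usage: a batch of 2 samples, 8 channels, 4 groups of 2 channels each.
x = np.random.default_rng(0).normal(size=(2, 8, 4, 4))
y = group_norm(x, gamma=np.ones(8), beta=np.zeros(8), num_groups=4)
```

Because no statistic is taken over the batch axis, normalizing one sample alone gives the same output as normalizing it inside a larger batch. This matches the small change in GN error between batch sizes 32 and 2 reported on slide 11.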