January 31, 2021
Deep Learning Paper Reading Group
Image Processing Team: 김병현, 박동훈, 안종식, 홍은기, 허다운
Training Data-Efficient Image Transformers &
Distillation through Attention (DeiT)
Contents
01 Summary
02 Prerequisites
03 Architecture
04 Experiments
05 Discussion
Summary
01
Summary of DeiT
01. Summary
1. Published December 2020 by Facebook AI
2. Builds on ViT and introduces a distillation scheme
3. Contribution
- Image classification without using CNNs
- Trained on ImageNet only
- Trained in only 2–3 days on a single 8-GPU node
- Performance comparable to SOTA CNN-based models
- Introduces a distillation concept for Transformers
4. Conclusion
- CNN-based architectures have improved through many years of research
- Transformers for image tasks have only just begun to be studied
> Matching CNN performance already demonstrates the potential of Transformers
Prerequisites
02
Vision Transformer & Knowledge Distillation
02. Prerequisites
1. Vision Transformer
- "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale", Google
> Reference: "Deformable DETR: Deformable Transformers for End-to-End Object Detection" paper review - 홍은기
02. Prerequisites
1. Vision Transformer
- Training dataset: JFT-300M
- Pre-training at low resolution, fine-tuning at high resolution
> Position embeddings: resized with bicubic interpolation (see the sketch below)
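A minimal sketch of resizing learned position embeddings with bicubic interpolation when fine-tuning at a higher resolution. It assumes a ViT-style embedding with a prepended class token; the tensor names and grid sizes are illustrative, not taken from the slides.

```python
import torch
import torch.nn.functional as F

def resize_pos_embed(pos_embed: torch.Tensor, old_grid: int, new_grid: int) -> torch.Tensor:
    """Bicubically resize a ViT position embedding of shape (1, 1 + old_grid**2, dim)
    to a new patch grid, keeping the class-token embedding unchanged."""
    cls_tok, patch_pos = pos_embed[:, :1], pos_embed[:, 1:]   # split off class token
    dim = patch_pos.shape[-1]
    patch_pos = patch_pos.reshape(1, old_grid, old_grid, dim).permute(0, 3, 1, 2)
    patch_pos = F.interpolate(patch_pos, size=(new_grid, new_grid),
                              mode="bicubic", align_corners=False)
    patch_pos = patch_pos.permute(0, 2, 3, 1).reshape(1, new_grid * new_grid, dim)
    return torch.cat([cls_tok, patch_pos], dim=1)

# Example: pre-trained at 224px (14x14 patches), fine-tuned at 384px (24x24 patches)
pos_embed_224 = torch.randn(1, 1 + 14 * 14, 768)
pos_embed_384 = resize_pos_embed(pos_embed_224, old_grid=14, new_grid=24)
```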
02. Prerequisites
2. Knowledge Distillation
- The idea of transferring knowledge from a well-trained teacher model to a smaller student model (a loss sketch follows below)
> Reference: "Explaining knowledge distillation by quantifying the knowledge" review - 김동희
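A minimal sketch of the classic soft-distillation objective: a KL divergence between temperature-softened teacher and student outputs, mixed with the usual cross-entropy. The temperature and weighting values are illustrative assumptions, not numbers from the slides.

```python
import torch
import torch.nn.functional as F

def soft_distillation_loss(student_logits, teacher_logits, labels,
                           temperature: float = 3.0, alpha: float = 0.5):
    """Weighted sum of hard-label cross-entropy and soft teacher/student KL."""
    ce = F.cross_entropy(student_logits, labels)
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)  # rescale gradients softened by the temperature
    return (1 - alpha) * ce + alpha * kl
```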
Q & A
Architecture
03
Implementation of DeiT
03. Architecture
1. Knowledge Distillation
- Adds a distillation token with the same structure as the class token
- Soft distillation
- Hard distillation (see the loss sketch below)
- Helps prevent misleading supervision caused by random cropping
(Figure: two random crops of the same image — GT: Cat / Prediction: Cat vs. GT: Cat / Prediction: ???)
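A minimal sketch of hard distillation with a distillation token: the student's class-token head is trained against the ground-truth label, while its distillation-token head is trained against the teacher's argmax prediction. The head names and the equal 1/2–1/2 weighting are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def hard_distillation_loss(cls_logits, dist_logits, teacher_logits, labels):
    """DeiT-style hard distillation: the class-token head follows the true label,
    the distillation-token head follows the teacher's hard prediction."""
    teacher_labels = teacher_logits.argmax(dim=-1)            # hard teacher targets
    loss_cls = F.cross_entropy(cls_logits, labels)            # class token vs. ground truth
    loss_dist = F.cross_entropy(dist_logits, teacher_labels)  # distillation token vs. teacher
    return 0.5 * loss_cls + 0.5 * loss_dist
```

At inference time the two heads can be fused, for example by averaging their softmax outputs; this is a common choice and consistent with the DeiT setup, but treat it as an assumption here.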
03. Architecture
2. Bag of Tricks
- The ViT architecture is used essentially as-is (ViT-B = DeiT-B)
> The basic training procedure is the same
> Performance is improved through hyperparameter tuning (an illustrative recipe is sketched below)
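A hedged sketch of the kind of training recipe the "bag of tricks" refers to. The values below are approximately those reported for DeiT but are illustrative; check the paper or the official repository before relying on them.

```python
# Illustrative DeiT-style training configuration (approximate values, not authoritative).
train_config = {
    "epochs": 300,
    "optimizer": "AdamW",
    "base_lr": 5e-4,               # scaled linearly: lr = base_lr * batch_size / 512
    "weight_decay": 0.05,
    "warmup_epochs": 5,
    "lr_schedule": "cosine",
    "label_smoothing": 0.1,
    "stochastic_depth": 0.1,
    # Strong data augmentation / regularization:
    "rand_augment": True,
    "mixup": 0.8,
    "cutmix": 1.0,
    "random_erasing": 0.25,
    "repeated_augmentation": True,
}
```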
Q & A
Experiments
04
Experiment Results of DeiT
04. Experiments
1. Distillation
- Teacher model: RegNetY-16GF
> A ConvNet teacher works better than a Transformer teacher
> "Probably" because of inductive bias!
- Distillation comparison: hard distillation performs better
* Inductive bias
- The distillation method lets the student learn the ConvNet's inductive bias more effectively (a sketch of plugging in a ConvNet teacher follows below)
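A minimal sketch of using a frozen, pre-trained ConvNet teacher to produce the distillation targets for the loss above. It assumes the `timm` library is available and that `regnety_160` corresponds to RegNetY-16GF; verify the model name against the library before use.

```python
import timm
import torch

# Assumed model name; verify against the timm model zoo.
teacher = timm.create_model("regnety_160", pretrained=True).eval()

@torch.no_grad()
def teacher_logits(images: torch.Tensor) -> torch.Tensor:
    """Frozen teacher forward pass used to produce hard distillation targets."""
    return teacher(images)
```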
04. Experiments
2. Efficiency vs Accuracy
- Compares parameter count, throughput, and accuracy
> Measured by throughput vs. accuracy, DeiT performs on par with ConvNets
- Base model: DeiT-B (= ViT-B)
3. Transfer Learning
- A model pre-trained on ImageNet is evaluated on other datasets (see the fine-tuning sketch below)
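A hedged sketch of loading an ImageNet-pretrained DeiT model and replacing its head to fine-tune on a downstream dataset. The `torch.hub` entry point and model name follow the public facebookresearch/deit repository and are assumptions here; the dataset and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

# Assumed hub entry point; check the facebookresearch/deit repository.
model = torch.hub.load("facebookresearch/deit:main",
                       "deit_base_patch16_224", pretrained=True)

# Replace the classification head for a downstream dataset (e.g. CIFAR-100).
num_classes = 100
model.head = nn.Linear(model.head.in_features, num_classes)

# Fine-tune as usual, typically with a lower learning rate than pre-training.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=0.05)
```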
Discussion
05
Conclusion & Discussion
05. Discussion
1. Contribution
1) Improves the performance of the Transformer-based ViT model (no ConvNet used)
2) Trains with less data and faster than ViT
3) Achieves performance comparable to SOTA ConvNets
4) Proposes a simple knowledge distillation method
2. Opinion
1) Still requires many epochs (300–500)
2) Exposes the weaknesses of Transformers
> Sensitive to hyperparameters
> Requires more data and training time than ConvNets
> Plenty of room for research, but hard to apply in production
3) The research style resembles the early days of deep learning
> Quantitative research (experiment → theory)
> Experimental results are not yet fully explained
3. Conclusion
1) An area that still needs much more research
2) Matching CNN performance at such an early research stage suggests that, as happened in NLP, Transformers may eventually replace CNNs
Q & A
THANK YOU
for Watching