UNDERSTANDING
DEEP LEARNING
REQUIRES
RETHINKING
GENERALIZATION
Ahmet KUZUBAŞLI
16.03.2018
Quick Facts
• Google Brain
• ICLR 2017 - Best Paper Award
• Love it / Hate it
• Interesting Experiments
• Questioning the Traditional Explanations
• A “This is also not useful!” paper
Why do large networks
generalize well in practice?
Answers would lead to…
more interpretable networks.
more principled and reliable designs.
What distinguishes neural networks
that generalize well
from those that don’t?
Answers would lead to…
more interpretable networks.
more principled and reliable designs.
Are the traditional explanations enough to
explain the results we are observing?
Answers would lead to…
more interpretable networks.
more principled and reliable designs.
Conventional Wisdom
[Figure: training and test error versus model complexity]
Where should we put
Deep Learning?
Conventional Wisdom
Counting the number of parameters is NOT USEFUL!
How are we going to measure the complexity?
[Figure labeled “Deep Learning” (from poster)]
Randomization Tests
1. Random labeling of true data
2. Partially corrupted labels
3. Shuffled pixels
4. Completely random pixels
5. Interpolations between no-noise / complete-noise pixels
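The five randomization tests listed above can be sketched as simple data transformations. This is a minimal NumPy sketch, not the authors' code: the function names, the use of one fixed permutation for "shuffled pixels", and the Gaussian noise matched to the dataset mean and standard deviation are all assumptions made for illustration.

```python
import numpy as np

def random_labels(y, num_classes, rng):
    """Test 1: replace every label with a uniformly random class."""
    return rng.integers(0, num_classes, size=y.shape)

def corrupt_labels(y, num_classes, p, rng):
    """Test 2: independently replace each label with a random class with probability p."""
    mask = rng.random(y.shape) < p
    noisy = y.copy()
    noisy[mask] = rng.integers(0, num_classes, size=int(mask.sum()))
    return noisy

def shuffle_pixels(X, rng):
    """Test 3 (assumed form): one fixed permutation of pixel positions for all images."""
    n = X.shape[0]
    flat = X.reshape(n, -1)
    perm = rng.permutation(flat.shape[1])
    return flat[:, perm].reshape(X.shape)

def random_pixels(X, rng):
    """Test 4 (assumed form): Gaussian noise with the dataset's mean and std."""
    return rng.normal(X.mean(), X.std(), size=X.shape)

def interpolate_noise(X, alpha, rng):
    """Test 5: blend clean images (alpha=0) with pure noise (alpha=1)."""
    return (1.0 - alpha) * X + alpha * random_pixels(X, rng)
```

In every case the training set keeps its size and format; only the relationship between inputs and labels (or the structure inside the inputs) is destroyed.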
Random Labeling of True Data
(from poster)
But it still fits perfectly!
It memorizes all random labels!
(from paper)
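The memorization effect can be reproduced on a toy scale. The sketch below is a stand-in, not the experiment from the paper: it uses a fixed random ReLU hidden layer with more units than training points and fits only the output layer by least squares, on synthetic Gaussian inputs. All sizes and names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, width, num_classes = 100, 20, 1000, 10

# Synthetic inputs; labels are drawn independently of the inputs.
X_train = rng.standard_normal((n, d))
X_test = rng.standard_normal((n, d))
y_train = rng.integers(0, num_classes, size=n)
y_test = rng.integers(0, num_classes, size=n)

# Fixed random hidden layer (hypothetical toy model, width > n).
W = rng.standard_normal((d, width)) / np.sqrt(d)
b = rng.standard_normal(width)

def features(X):
    """Hidden-layer ReLU activations."""
    return np.maximum(X @ W + b, 0.0)

# Fit the output layer to one-hot random labels by least squares.
Y_onehot = np.eye(num_classes)[y_train]
V, *_ = np.linalg.lstsq(features(X_train), Y_onehot, rcond=None)

train_acc = np.mean(np.argmax(features(X_train) @ V, axis=1) == y_train)
test_acc = np.mean(np.argmax(features(X_test) @ V, axis=1) == y_test)
print(f"train accuracy: {train_acc:.2f}, test accuracy: {test_acc:.2f}")
```

The model interpolates the random training labels, while its accuracy on fresh random labels can only be at chance level, since those labels carry no information about the inputs.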
Implications
• Rademacher complexity and VC-dimension: The networks fit the training set with random labels perfectly.
• Uniform stability: How sensitive the algorithm is to replacement of a single example.
(from paper)
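The Rademacher point can be made precise with the standard definition of empirical Rademacher complexity. The symbols below (hypothesis class $\mathcal{H}$, sample points $x_i$, random signs $\sigma_i$) are notation introduced here for illustration and do not appear on the slides.

```latex
\hat{\mathfrak{R}}_n(\mathcal{H})
  = \mathbb{E}_{\sigma}\!\left[\, \sup_{h \in \mathcal{H}} \frac{1}{n} \sum_{i=1}^{n} \sigma_i \, h(x_i) \right],
\qquad \sigma_1, \dots, \sigma_n \in \{\pm 1\} \ \text{i.i.d. uniform.}
```

If the networks can fit essentially any assignment of $\pm 1$ labels to the training inputs, the supremum is close to 1 for every draw of $\sigma$, so $\hat{\mathfrak{R}}_n(\mathcal{H}) \approx 1$ and generalization bounds built on it become trivial. Uniform stability has a related limitation: it is a property of the training algorithm alone, so it cannot tell true labels from random ones.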
Explicit Regularizations
• “The original hypothesis space is too large to generalize,
so confine learning to a subset of the hypothesis space
with manageable complexity.”
• Data augmentation
• Weight decay
• Dropout
Can this be the reason why the networks generalize well?
Explicit Regularizations
(see Table-2 at the appendix for details)
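For readers who want the three explicit regularizers in concrete form, here is a minimal NumPy sketch. The function names, default hyperparameters, and the flip-plus-crop augmentation are assumptions made for illustration, not the settings used in the paper.

```python
import numpy as np

def sgd_step_weight_decay(w, grad, lr=0.1, weight_decay=5e-4):
    """One SGD step on loss + (weight_decay / 2) * ||w||^2."""
    return w - lr * (grad + weight_decay * w)

def dropout(h, p, rng, training=True):
    """Inverted dropout: zero each activation with probability p, rescale the rest."""
    if not training or p == 0.0:
        return h
    keep = rng.random(h.shape) >= p
    return h * keep / (1.0 - p)

def augment(image, rng, pad=2):
    """Random horizontal flip, then a random crop back to the original size.

    `image` is assumed to have shape (height, width, channels).
    """
    if rng.random() < 0.5:
        image = image[:, ::-1]
    padded = np.pad(image, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    top, left = rng.integers(0, 2 * pad + 1, size=2)
    h, w = image.shape[:2]
    return padded[top:top + h, left:left + w]
```

Each function restricts or perturbs training in a different place: weight decay acts on the parameters, dropout on the activations, and augmentation on the inputs.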
Implicit Regularizations
• Early stopping (helps on ImageNet but not on CIFAR10)
• Batch normalization: improves generalization by about 3 to 4%
(see Table-2 at the appendix for details)
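Early stopping can be sketched as a rule for choosing which epoch's weights to keep. The patience-based rule below is a common convention and an assumption here; the slides do not say how the stopping point was chosen, and `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, patience=3):
    """Return the index of the epoch whose weights would be kept.

    Training is cut off once the validation loss has not improved
    for `patience` consecutive epochs.
    """
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break
    return best_epoch
```

The regularizing effect is implicit: nothing is added to the loss, but training ends before the model has fully fit the training set.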
So what is the source of generalization?
Finite Sample Expressivity
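A two-layer ReLU network with 2n + d weights can represent any function on a sample of size n in d dimensions;
a depth-k network with O(n/k) parameters per layer also does the job.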
(see Proof at the appendix for details)
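The two-layer case can be built explicitly. The sketch below is one way to realize such a construction and is not copied from the paper's appendix: project the inputs onto a random direction, place one ReLU threshold just below each projected point, and solve the resulting triangular system for the output weights. The function names are hypothetical, and the projected points are assumed to be distinct.

```python
import numpy as np

def fit_two_layer_relu(X, y, rng):
    """Fit n points exactly with n ReLU units: d + n + n = 2n + d weights."""
    n, d = X.shape
    a = rng.standard_normal(d)            # shared projection direction (d weights)
    z = X @ a                             # assumed distinct for distinct inputs
    order = np.argsort(z)
    z_sorted = z[order]
    b = np.empty(n)                       # thresholds: b_1 < z_1 < b_2 < z_2 < ...
    b[0] = z_sorted[0] - 1.0
    b[1:] = (z_sorted[:-1] + z_sorted[1:]) / 2.0
    # A[i, j] = relu(z_i - b_j) is lower triangular with a positive diagonal.
    A = np.maximum(z_sorted[:, None] - b[None, :], 0.0)
    w = np.linalg.solve(A, y[order])      # output weights (n weights)
    return a, b, w

def predict(X, a, b, w):
    """Evaluate the network: sum_j w_j * relu(a . x - b_j)."""
    return np.maximum((X @ a)[:, None] - b[None, :], 0.0) @ w

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))
y = rng.choice([-1.0, 1.0], size=20)      # arbitrary labels
a, b, w = fit_two_layer_relu(X, y, rng)
print(np.max(np.abs(predict(X, a, b, w) - y)))
```

Because the system is triangular with a nonzero diagonal, it has a solution for any target vector, which is why the labels can be arbitrary.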
An Appeal to Linear Models
• “Is there a way to determine when one global minimum
will generalize whereas another will not?”
• Common way: check the curvature of the loss at the minimum.
• In the linear case, the Hessian is degenerate at all minima. NOT USEFUL!
• A good old friend: SGD!
• The SGD solution lies in the span of the training points, which leads to the “kernel trick” and acts as an implicit regularizer.
• SGD often converges to the minimum-norm solution, providing guidance.
• But minimum norm is NOT fully predictive of generalization.
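The minimum-norm claim can be checked numerically on an underdetermined least-squares problem. In this sketch, full-batch gradient descent from a zero initialization stands in for SGD (a simplification, chosen so the run is deterministic), and the problem sizes are arbitrary assumptions. Every update is a combination of the rows of X, so the iterate stays in the span of the data and can be written as w = Xᵀα, where α solves the kernel system XXᵀα = y.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10, 50                               # fewer samples than parameters
X = rng.standard_normal((n, d))
y = rng.standard_normal(n)

# Gradient descent on 0.5 * ||Xw - y||^2, starting from w = 0.
w = np.zeros(d)
lr = 1.0 / np.linalg.norm(X, ord=2) ** 2    # 1 / (largest singular value)^2
for _ in range(5000):
    w -= lr * X.T @ (X @ w - y)

# Kernel form of the minimum-norm interpolating solution.
alpha = np.linalg.solve(X @ X.T, y)
w_min_norm = X.T @ alpha

print(np.linalg.norm(w - w_min_norm))
```

Among the infinitely many weight vectors that fit the data exactly, gradient descent from zero picks the one with the smallest Euclidean norm, without any explicit penalty.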
Conclusion
• Effective capacity of successful NNs is large enough to
shatter the training data.
“Rich enough to memorize the training data”.
• A conceptual challenge to statistical learning theory.
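• Traditional measures of model complexity struggle to explain the generalization ability of large
ANNs.
• Optimization remains empirically easy for large neural networks, even when the model does not generalize.
• The reasons optimization is easy are different from the source of generalization.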
• We have yet to discover a precise formal measure
under which these enormous models are simple.
References
• https://arxiv.org/pdf/1611.03530.pdf (paper)
• https://danieltakeshi.github.io/2017/05/19/understanding-deep-learning-requires-rethinking-generalization-my-thoughts-and-notes (blog)
• https://www.slideshare.net/JungHoonSeo2/understanding-deep-learning-requires-rethinking-generalization-2017-12 (slideshare)
• https://www.youtube.com/watch?v=kCj51pTQPKI (presentation, YouTube)
• http://pluskid.org/slides/ICLR2017-Poster.pdf (poster)
• https://github.com/pluskid/fitting-random-labels (code)
• https://openreview.net/forum?id=Sy8gdB9xx (open review comments)
• Google Images
Feel free to ask questions…
