Designing Your Neural Networks:
A Step by Step Walkthrough
Lavanya Shukla
What's on the menu today?
How do Neural Networks
learn?
Take a whirlwind tour of
Neural Network architectures
Train Neural Networks
Optimize Neural Networks to
achieve SOTA performance
Weights & Biases
How You Can Train Your Own
Neural Nets
The Code: bit.ly/keras-neural-nets
The Goal For Today + Code
Basic Neural Network
Architecture
The Perceptron
Neurons output the weighted sum of
their inputs, passed through an activation function.
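A minimal NumPy sketch of a perceptron (the weights, bias, and step activation below are illustrative, not taken from the deck):

import numpy as np

def perceptron(x, w, b):
    # weighted sum of the inputs plus a bias term
    z = np.dot(w, x) + b
    # step activation: the neuron fires (outputs 1) only if the sum is positive
    return 1 if z > 0 else 0

x = np.array([0.5, 1.0, -0.3])  # input features
w = np.array([0.4, -0.2, 0.1])  # weights
b = 0.05                        # bias
print(perceptron(x, w, b))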
How A Neural Network Learns
This is the number of features your neural
network uses to make its predictions.
The input vector needs one input neuron per
feature.
You want to carefully select these features
and remove any that may contain patterns
that won’t generalize beyond the training set
(and cause overfitting).
For images, this is the dimensions of your
image (28*28=784 in case of MNIST).
Input neurons
How A Neural Network
Sees An Image
For images, the number of input neurons is
the dimensions of your image (28*28=784 in
case of MNIST).
How A Neural Network Sees
An Image
The convolution layer is made up of a set of
independent filters.
Each filter slides over the image and creates
feature maps that learn different aspects of
an image.
Convolutions
The pooling layer reduces the size of the
image representation, and with it the number of
parameters and computation in the
network.
Pooling
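A minimal Keras sketch of a convolution + pooling block (assuming 28x28 grayscale MNIST-style images; the filter count and sizes are illustrative):

from tensorflow import keras

cnn = keras.Sequential([
    # 32 independent 3x3 filters slide over the image and produce 32 feature maps
    keras.layers.Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    # max pooling halves each spatial dimension, shrinking the representation
    keras.layers.MaxPooling2D((2, 2)),
])
cnn.summary()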
How The Network Sees An Image
Basic Neural Network Architecture
Let's get back to talking about the basic neural network architecture.
This is the number of predictions you want to
make.
Regression
one neuron per predicted value
Classification
For binary classification (spam vs. not spam),
use a single output neuron with a sigmoid activation.
For multi-class classification (car, dog,
house), one output neuron per class, and
use the softmax activation function on the
output layer to ensure the final
probabilities sum to 1.
Output neurons
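Putting input, hidden, and output neurons together in a minimal Keras sketch (assuming 784 input features and 10 classes as with MNIST; the hidden-layer sizes are illustrative):

from tensorflow import keras

model = keras.Sequential([
    # one input neuron per feature (28*28 = 784 for MNIST)
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dense(64, activation='relu'),
    # multi-class classification: one output neuron per class, softmax so probabilities sum to 1
    keras.layers.Dense(10, activation='softmax'),
])
# regression would instead end with:            keras.layers.Dense(1)
# binary classification would instead end with: keras.layers.Dense(1, activation='sigmoid')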
The number of hidden layers is highly
dependent on the problem and the
architecture of your neural network.
You’re essentially trying to Goldilocks your
way into the perfect neural network
architecture – not too big, not too small, just
right.
Generally, 1-5 hidden layers will serve you
well for most problems.
Hidden Layers
In general using the same number of
neurons for all hidden layers will suffice.
For some datasets, having a large first layer
followed by smaller layers will lead to
better performance as the first layer can learn
a lot of lower-level features that can feed
into a few higher order features in the
subsequent layers.
Hidden Layers - Tips
Usually you will get more of a performance
boost from adding more layers than adding
more neurons in each layer.
I’d recommend starting with 1-5 layers and 1-
100 neurons and slowly adding more layers
and neurons until you start overfitting.
Hidden Layers - Tips
If the number of layers/neurons is too small,
your network will not be able to learn the
underlying patterns in your data and will be
useless.
An approach to counteract this is to start
with a huge number of hidden layers +
hidden neurons and then use dropout and
early stopping to let the neural network size
itself down for you.
Hidden Layers - Overfit First
When working with image or speech data,
you'd want your network to have dozens to
hundreds of layers, not all of which might be
fully connected.
For these use cases, there are pre-trained
models (YOLO, ResNet, VGG) that allow you
to use large parts of their networks, and train
your model on top of these networks to learn
only the higher order features.
In this case, your model will still have only a
few layers to train.
Hidden Layers - Images
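A hedged Keras sketch of that transfer-learning setup, using VGG16 as the frozen pre-trained base (the input size and head layers are illustrative):

from tensorflow import keras

# pre-trained convolutional base; include_top=False drops VGG16's own classifier
base = keras.applications.VGG16(weights='imagenet', include_top=False,
                                input_shape=(224, 224, 3))
base.trainable = False  # freeze the pre-trained layers

model = keras.Sequential([
    base,
    keras.layers.Flatten(),
    # only these few layers on top are trained on your data
    keras.layers.Dense(256, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])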
Batch Size: Total number of training
examples present in a single batch.
Large batch sizes can be great because they
can harness the power of GPUs to process
more training instances per unit of time.
Small batch sizes tend to generalize
better and are less memory intensive.
If you're not operating at massive scale, I
would recommend starting with a lower
batch size and slowly increasing it while
monitoring performance.
A good starting point is between 32 and 64.
Batch Size
Epochs: number of times your network sees
your data.
One epoch is when an entire dataset is
passed both forward and backward through
the neural network only once.
I’d recommend starting with a large number
of epochs and use Early Stopping to halt
training when performance stops improving.
Number of Epochs
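Both batch size and number of epochs are set in the fit() call; a sketch assuming the model defined earlier and training arrays x_train/y_train (the exact values are illustrative):

model.fit(x_train, y_train,
          batch_size=64,         # between 32 and 64 is a reasonable starting point
          epochs=200,            # deliberately generous; Early Stopping (later) will cut it short
          validation_split=0.2)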
Make sure all your features have similar scale
before using them as inputs to your neural
network.
This ensures faster convergence.
When your features have different scales (e.g.
salaries in thousands and years of experience
in tens), the cost function will look like an
elongated bowl.
This means your optimization algorithm will
take a long time to traverse the valley,
compared to the rounder bowl you get with
normalized features.
Scaling your features
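A minimal scikit-learn sketch of feature scaling before training (assumes raw x_train/x_test arrays):

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
# fit the scaler on the training set only, then apply the same transform to the test set
x_train_scaled = scaler.fit_transform(x_train)
x_test_scaled = scaler.transform(x_test)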
Activation Functions
Decides if a neuron fires or not.
In general, the performance from using
different activation functions improves in this
order (from lowest→highest performing):
logistic → tanh → ReLU → Leaky ReLU → ELU
→ SELU.
to combat neural network overfitting: RReLU
reduce latency at runtime: leaky ReLU
for massive training sets: PReLU
for fast inference times: leaky ReLU
if your network doesn’t self-normalize: ELU
for an overall robust activation function: SELU
Activation Functions
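In Keras the activation is chosen per layer; a sketch showing a few of the options above (layer widths are illustrative):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(64, activation='relu', input_shape=(784,)),
    # leaky ReLU is added as its own layer after a linear Dense layer
    keras.layers.Dense(64),
    keras.layers.LeakyReLU(),
    # ELU / SELU can be passed by name
    keras.layers.Dense(64, activation='selu'),
    keras.layers.Dense(10, activation='softmax'),
])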
Loss Functions
Regression
Mean squared error is the most common
loss function to optimize for, unless there
are a significant number of outliers.
When you have a lot of outliers, use mean
absolute error.
Classification
Cross-entropy will serve you well in most
cases. It maximizes the likelihood of
classifying the input data correctly.
Loss Functions
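The loss function is set when compiling the model; a sketch of the choices above (the optimizer here is just a placeholder):

# regression
model.compile(optimizer='adam', loss='mse')                          # mean squared error
# model.compile(optimizer='adam', loss='mae')                        # mean absolute error, if many outliers

# classification
# model.compile(optimizer='adam', loss='binary_crossentropy')        # binary (sigmoid output)
# model.compile(optimizer='adam', loss='categorical_crossentropy')   # multi-class (softmax output)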
Learning Rate
The amount by which the weights are updated
during training.
Picking the learning rate is very important,
and you want to make sure you get this right!
To find the best learning rate:
start with a very low value (e.g. 10^-6)
and slowly multiply it by a constant until
it reaches a very high value (e.g. 10).
Measure your model performance vs each
learning rate and use W&B to pick the best
one.
Learning Rate
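One way to run this sweep is a LearningRateScheduler callback that grows the rate by a constant factor each epoch while you log the loss; a rough sketch assuming the model and data from earlier (the starting rate and growth factor are illustrative):

from tensorflow import keras

def lr_sweep(epoch, lr):
    # start at 1e-6 and multiply by ~1.3 each epoch, reaching ~10 after ~60 epochs
    return 1e-6 * (1.3 ** epoch)

model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-6), loss='mse')
history = model.fit(x_train, y_train, epochs=60,
                    callbacks=[keras.callbacks.LearningRateScheduler(lr_sweep)])
# plot history.history['loss'] against the learning rate to pick the sweet spot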
Weight Initialization
The right weight initialization method can
speed up time-to-convergence considerably.
The choice of your initialization method
depends on your activation function.
When using ReLU or leaky ReLU,
use He initialization
When using SELU or ELU,
use LeCun initialization
When using softmax, logistic, or tanh,
use Glorot initialization
Weight Initialization
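In Keras the initializer is set per layer with kernel_initializer; a sketch of the pairings above:

from tensorflow import keras

model = keras.Sequential([
    # ReLU / leaky ReLU -> He initialization
    keras.layers.Dense(64, activation='relu', kernel_initializer='he_normal', input_shape=(784,)),
    # SELU / ELU -> LeCun initialization
    keras.layers.Dense(64, activation='selu', kernel_initializer='lecun_normal'),
    # softmax / logistic / tanh -> Glorot initialization
    keras.layers.Dense(10, activation='softmax', kernel_initializer='glorot_uniform'),
])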
Early Stopping
Early Stopping lets you live it up by training
a model with more hidden layers, more hidden
neurons, and for more epochs than you need.
You then simply stop training when
performance stops improving
for n consecutive epochs.
It saves the best performing model for you.
In Keras, you can enable Early Stopping by
passing an EarlyStopping callback when you fit
your model; pair it with a ModelCheckpoint
callback with save_best_only=True to keep the
best-performing model on disk.
Early Stopping
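A sketch of Early Stopping in Keras, pairing the EarlyStopping callback with a ModelCheckpoint that keeps only the best model (the patience value and filename are illustrative):

from tensorflow import keras

callbacks = [
    # stop once validation loss hasn't improved for 10 consecutive epochs
    keras.callbacks.EarlyStopping(monitor='val_loss', patience=10,
                                  restore_best_weights=True),
    # keep only the best-performing model on disk
    keras.callbacks.ModelCheckpoint('best_model.h5', save_best_only=True),
]
model.fit(x_train, y_train, epochs=200, validation_split=0.2, callbacks=callbacks)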
Dropout
gives you a massive performance boost
(~2% for SOTA models)
Very simple: randomly turns off a
percentage of neurons at each layer, at
each training step.
More robust network because it can’t rely
on any particular set of input neurons for
making predictions.
Around 2^n (where n is the number of
neurons in the architecture) slightly-
unique neural networks are generated
during the training process, and ensembled
together to make predictions.
Dropout
A good dropout rate is
between 0.1 and 0.5: around 0.3 for RNNs
and 0.5 for CNNs.
Use larger rates for bigger layers.
Increasing the dropout rate decreases
overfitting, and decreasing the rate is
helpful to combat under-fitting.
You definitely don’t want to use dropout in
the output layers.
AlphaDropout works well with SELU
activation functions by preserving the
input's mean and standard deviation.
Dropout
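Dropout is just another layer in Keras; a sketch using the rates above (the architecture is illustrative, with AlphaDropout paired with a SELU layer):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.Dropout(0.5),       # randomly turns off 50% of these neurons at each step
    keras.layers.Dense(64, activation='selu', kernel_initializer='lecun_normal'),
    keras.layers.AlphaDropout(0.1),  # pairs with SELU; preserves the input's mean and std
    keras.layers.Dense(10, activation='softmax'),  # no dropout on the output layer
])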
Optimizers
Use Stochastic Gradient Descent if you
care deeply about quality of convergence
and if time is not of the essence.
If you care about time-to-convergence
and a point close to optimal
convergence will suffice, experiment with
Adam, Nadam, RMSProp, and Adamax
optimizers.
Adam/Nadam are good starting points, and
tend to be quite forgiving of a bad
learning rate and other non-optimal
hyperparameters.
Optimizers
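Optimizers are also passed at compile time; a sketch (learning rates are illustrative):

from tensorflow import keras

# quality of convergence over speed: plain SGD (optionally with momentum)
model.compile(optimizer=keras.optimizers.SGD(learning_rate=0.01, momentum=0.9), loss='mse')

# speed to a near-optimal point: Adam / Nadam / RMSprop / Adamax
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss='mse')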
Vanishing & Exploding
Gradients
Just like people, not all neural network
layers learn at the same speed.
When the backprop algorithm
propagates the error gradient from the
output layer to the first layers, the
gradients get smaller and smaller until
they’re almost negligible when they reach
the first layers.
This means the weights of the first layers
aren’t updated significantly at each
step.
This is the problem of vanishing gradients.
Vanishing & Exploding
Gradients
Experiment with:
Weight Initialization Method
Activation Function
Gradient Clipping:
clip them when they exceed a certain
value
Early Stopping
Dropout
Optimizer - Adam, Nadam
BatchNorm
Learning Rate Scheduling
Vanishing & Exploding
Gradients
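Two of these remedies, BatchNorm and gradient clipping, in a hedged Keras sketch (the clip value and architecture are illustrative):

from tensorflow import keras

model = keras.Sequential([
    keras.layers.Dense(128, activation='relu', input_shape=(784,)),
    keras.layers.BatchNormalization(),   # re-normalizes the layer's activations
    keras.layers.Dense(64, activation='relu'),
    keras.layers.Dense(10, activation='softmax'),
])
# clip gradient values that exceed 1.0 to combat exploding gradients
model.compile(optimizer=keras.optimizers.Adam(clipvalue=1.0),
              loss='categorical_crossentropy')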
Now it's your turn! bit.ly/keras-neural-nets
Thank you!
Twitter.com/lavanyaai
http://lavanya.ai
