Summary of Bay Area
Deep Learning School
Niketan Pansare
• Summary
• Why is Deep Learning gaining popularity?
• Introduction to Deep Learning
• Case-study of the state-of-the-art networks
• How to train them
• Tricks of the trade
• Overview of existing deep learning stack
Agenda
Summary
• 1300 applicants for 500 spots (industry + academia)
• Videos are online:
• Day 1: https://www.youtube.com/watch?v=eyovmAtoUx0
• Day 2: https://www.youtube.com/watch?v=9dXiAecyJrY
• Mostly high-quality talks from different areas
• Computer Vision (Karpathy – OpenAI), Speech (Coates - Baidu), NLP (Socher –
Salesforce, Quoc Le - Google), Unsupervised Learning (Salakhutdinov - CMU),
Reinforcement Learning (Schulman - OpenAI)
• Tools (TensorFlow/Theano/Torch)
• Overview/Vision talks (Ng, Bengio and Larochelle)
• Networking:
• Keras contributor (working at a startup) – CNTK integration, potential for SystemML integration
• TensorFlow users in Google
• Discussion on “dynamic operator placement” described in the whitepaper
Why is Deep Learning gaining popularity?
• Efficacy of larger networks
Why is Deep Learning gaining popularity?
Reference: Andrew Ng (Spark Summit 2016).
• Efficacy of larger networks
Why is Deep Learning gaining popularity?
Reference: Andrew Ng (Spark Summit 2016).
Train large networks on large amounts of data.
Relative ordering not defined for small data.
• Efficacy of larger networks
• Large amount of data
Why is Deep Learning gaining popularity?
Caltech101 dataset (by Fei-Fei Li)
Google Street View House Numbers (SVHN) dataset
CIFAR-10 dataset
Flickr 30K Images
• Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
Why is Deep Learning gaining popularity?
VGG: ~2-3 weeks of training with 4 GPUs
ResNet-101: 2-3 weeks with 4 GPUs
Rocket Fuel*
• Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
• Techniques/Algorithms/Networks to deal with training issues
• Non-linearities, Batch normalization, Dropout, Ensembles
• Will discuss these in detail later
Why is Deep Learning gaining popularity?
• Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
• Techniques/Algorithms/Networks to deal with training issues
• Success stories in vision, speech and text
Why is Deep Learning gaining popularity?
• Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
• Techniques/Algorithms/Networks to deal with training issues
• Success stories in vision, speech and text
• No feature engineering
Why is Deep Learning gaining popularity?
• Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
• Techniques/Algorithms/Networks to deal with training issues
• Success stories in vision, speech and text
• No feature engineering
• Transfer Learning + Open-source (network, learned weights, dataset
as well as codebase)
• https://github.com/BVLC/caffe/wiki/Model-Zoo
• https://github.com/KaimingHe/deep-residual-networks
• https://github.com/facebook/fb.resnet.torch
• https://github.com/baidu-research/warp-ctc
• https://github.com/NervanaSystems/ModelZoo
Why is Deep Learning gaining popularity?
• Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
• Techniques/Algorithms/Networks to deal with training issues
• Success stories in vision, speech and text
• No feature engineering
• Transfer Learning + Open-source (network, learned weights, dataset
as well as codebase)
• Tooling support for rapid iterations/experimentation
• Auto-differentiation, general purpose optimizer (SGD variants)
• Layered architecture
• Tensorboard
Why is Deep Learning gaining popularity?
• Efficacy of larger networks
• Large amount of data
• Compute power necessary to train larger networks
• Techniques/Algorithms/Networks to deal with training issues
• Success stories in vision, speech and text
• No feature engineering
• Transfer Learning + Open-source (network, learned weights, dataset
as well as codebase)
• Tooling support for rapid iterations/experimentation
• Auto-differentiation, general purpose optimizer (SGD variants)
• Layered architecture
• Tensorboard
Why is Deep Learning gaining popularity?
Will skip RNN, LSTM, CTC, parameter server, unsupervised and reinforcement deep learning
• DL for Speech (covers CTC + Speech pipeline):
• https://youtu.be/9dXiAecyJrY?t=3h49m40s
• https://github.com/baidu-research/ba-dls-deepspeech
• DL for NLP (covers word embeddings, RNN, LSTM, seq2seq)
• https://youtu.be/eyovmAtoUx0?t=3h51m45s (Richard Socher)
• https://youtu.be/9dXiAecyJrY?t=7h4m12s (Quoc Le)
• Deep Unsupervised Learning (covers RBM, Autoencoders, …):
• https://youtu.be/eyovmAtoUx0?t=7h7m54s
• Deep Reinforcement Learning (covers Q-learning, policy gradients):
• https://youtu.be/9dXiAecyJrY?t=7m43s
• Tutorial (TensorFlow, Torch, Theano)
• https://github.com/wolffg/tf-tutorial/
• https://github.com/alexbw/bayarea-dl-summerschool
• https://github.com/lamblin/bayareadlschool
Not covered in this talk
Introduction to Deep Learning
Different abstractions for Deep Learning
• Deep Learning pipeline
• Deep Learning task (Eg: CNN + classifier => Image captioning, Localization, …)
• Deep Neural Network (Eg: CNN, AlexNet, GoogLeNet, …)
• Layer (Eg: Convolution, Pooling, …)
Common layers
• Fully connected layer
Reference: Convolutional Neural Networks for Visual Recognition. http://cs231n.github.io/
Common layers
• Fully connected layer
• Convolution layer
• Fewer parameters compared to FC
• Useful to capture local features (spatially)
• Output #channels = #filters (see the sketch below)
Reference: Convolutional Neural Networks for Visual Recognition. http://cs231n.github.io/
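To make the "fewer parameters than FC" and "#output channels = #filters" points concrete, here is a small back-of-the-envelope sketch (my own numbers and variable names, not from the slides):

```python
# Parameter count of a 3x3 convolution layer vs. a fully connected layer
# producing an output of the same size, for a 32x32x3 input (assumed example).
H, W, C_in = 32, 32, 3                   # input height, width, channels
K, C_out, stride, pad = 3, 64, 1, 1      # filter size, #filters, stride, padding

conv_params = K * K * C_in * C_out + C_out           # weights + biases = 1,792
H_out = (H + 2 * pad - K) // stride + 1
W_out = (W + 2 * pad - K) // stride + 1
print("conv output shape:", (H_out, W_out, C_out))   # #output channels = #filters

fc_params = (H * W * C_in) * (H_out * W_out * C_out) + H_out * W_out * C_out
print("conv params:", conv_params)                    # ~1.8e3
print("fc params for the same output size:", fc_params)  # ~2.0e8
```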
Common layers
• Fully connected layer
• Convolution layer
• Pooling layer
• Useful to tolerate feature deformation such as local shifts
• Output #channels = Input #channels
Reference: Convolutional Neural Networks for Visual Recognition. http://cs231n.github.io/
Common layers
• Fully connected layer
• Convolution layer
• Pooling layer
• Activations
• Sigmoid
• Tanh
• ReLU
Reference: Introduction to Feedforward Neural Networks - Larochelle.​ https://dl.dropboxusercontent.com/u/19557502/hugo_dlss.pdf
http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
• Squashes the neuron’s pre-activations between [0, 1]
• Historically popular
• Disadvantages:
• Gradients vanish as activations saturate (i.e. saturated neurons)
• Sigmoid outputs are not zero-centered
• exp() is a bit expensive to compute
Sigmoid
Reference: Introduction to Feedforward Neural Networks - Larochelle.​
http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
• Squashes the neuron’s pre-activations between [-1, 1]
• Advantage:
• Zero-centered
• Disadvantages:
• Gradients vanish as activations saturate
• exp() is expensive to compute
Tanh
Reference: Introduction to Feedforward Neural Networks - Larochelle.​
http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
• Bounded below by 0 (always non-negative)
• Advantages:
• Does not saturate (in +region)
• Very computationally efficient
• Converges much faster than sigmoid/tanh in practice (e.g. 6x)
• Disadvantages:
• Can blow up the activations (unbounded above)
• Alternatives (see the sketch below):
• Leaky ReLU: max(0.001*a, a)
• Parametric ReLU: max(alpha*a, a)
• Exponential Linear Unit (ELU): a if a>0; else alpha*(exp(a)-1)
ReLU (Rectified Linear Units)
Reference: Introduction to Feedforward Neural Networks - Larochelle.​
http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
max(0, a)
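A minimal NumPy sketch of the activations listed above (the Leaky ReLU slope follows the slide's 0.001; the ELU alpha is a common default, not prescribed here):

```python
import numpy as np

def sigmoid(a):                     # squashes pre-activations into [0, 1]
    return 1.0 / (1.0 + np.exp(-a))

def tanh(a):                        # squashes pre-activations into [-1, 1]
    return np.tanh(a)

def relu(a):                        # max(0, a); does not saturate for a > 0
    return np.maximum(0.0, a)

def leaky_relu(a, slope=0.001):     # max(slope*a, a)
    return np.maximum(slope * a, a)

def elu(a, alpha=1.0):              # a if a > 0 else alpha*(exp(a) - 1)
    return np.where(a > 0, a, alpha * np.expm1(a))

a = np.linspace(-3, 3, 7)
print(relu(a), leaky_relu(a), elu(a))
```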
• According to Hinton, why did deep learning not catch on earlier?
• Our labeled datasets were thousands of times too small.
• Our computers were millions of times too slow.
• We initialized the weights in a stupid way.
• We used the wrong type of non-linearity (i.e. sigmoid/tanh).
• Which non-linearity to use => ReLU according to
• LeCun: http://yann.lecun.com/exdb/publis/pdf/jarrett-iccv-09.pdf
• Hinton: http://www.cs.toronto.edu/~fritz/absps/reluICML.pdf
• Bengio: https://www.utc.fr/~bordesan/dokuwiki/_media/en/glorot10nipsworkshop.pdf
• If not satisfied with ReLU,
• Double-check the learning rates
• Then, try out Leaky ReLU / ELU
• Then, try out tanh but don’t expect much
• Don’t use sigmoid
Reference: Introduction to Feedforward Neural Networks - Larochelle.​
http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
Common layers
• Fully connected layer
• Convolution layer
• Pooling layer
• Activations
• SoftMax
• Strictly positive
• Sums to 1
• Used for multi-class classification
• Other losses: Hinge, Euclidean, Sigmoid cross-entropy, …
Reference: Introduction to Feedforward Neural Networks - Larochelle. https://dl.dropboxusercontent.com/u/19557502/hugo_dlss.pdf​
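A minimal sketch of a numerically stable softmax, illustrating the "strictly positive, sums to 1" properties above (the max-subtraction trick is a standard convention, not something the slides prescribe):

```python
import numpy as np

def softmax(scores):
    # subtract the max for numerical stability; output is strictly positive and sums to 1
    shifted = scores - np.max(scores, axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=-1, keepdims=True)

probs = softmax(np.array([2.0, 1.0, 0.1]))
print(probs, probs.sum())   # e.g. [0.659 0.242 0.099], sums to 1.0
```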
Common layers
• Fully connected layer
• Convolution layer
• Pooling layer
• Activations
• SoftMax
• Dropout
• Idea: «cripple» neural network by removing hidden units stochastically
• Use random mask: Could use a different dropout probability, but 0.5 usually
works well
• Beats regular backpropagation on many datasets, but is slower (~2x)
• Helps to prevent overfitting
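A minimal sketch of (inverted) dropout at training time; the 0.5 probability follows the slide, while the inverted-scaling convention is my assumption:

```python
import numpy as np

def dropout(h, p_drop=0.5, training=True):
    if not training:
        return h                       # no masking at test time with inverted dropout
    # random mask: drop each hidden unit with probability p_drop, rescale the survivors
    mask = (np.random.rand(*h.shape) >= p_drop) / (1.0 - p_drop)
    return h * mask

h = np.ones((2, 4))
print(dropout(h))
```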
Common layers
• Normalization layers
• Batch Normalization (BN)
• Networks converge faster if inputs are whitened, i.e. linearly transformed to have zero mean and unit variance, and decorrelated
• Ioffe and Szegedy, 2014 suggested also applying normalization at the level of the hidden layers
• BN: normalizing each layer, for each mini-batch => addresses “internal covariate shift”
• Greatly accelerates training + less sensitive to initialization + improves regularization
Reference: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Two popular approaches: subtract the mean image (e.g. AlexNet) or subtract the per-channel mean (e.g. VGGNet)
Common layers
• Normalization layers
• Batch Normalization (BN)
• Networks converge faster if inputs are whitened, i.e. linearly transformed to have zero mean and unit variance, and decorrelated
• Ioffe and Szegedy, 2014 suggested also applying normalization at the level of the hidden layers
• BN: normalizing each layer, for each mini-batch => addresses “internal covariate shift”
• Greatly accelerates training + less sensitive to initialization + improves regularization (a sketch of the forward pass follows below)
Reference: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
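A minimal sketch of the BN forward pass at training time (variable names and epsilon are my own; the running-average statistics used at test time are omitted):

```python
import numpy as np

def batch_norm_forward(x, gamma, beta, eps=1e-5):
    mu = x.mean(axis=0)                      # per-feature mean over the mini-batch
    var = x.var(axis=0)                      # per-feature variance over the mini-batch
    x_hat = (x - mu) / np.sqrt(var + eps)    # normalize to zero mean, unit variance
    return gamma * x_hat + beta              # learned scale/shift restore expressiveness

x = np.random.randn(32, 100) * 5 + 3         # mini-batch of 32, 100 features
y = batch_norm_forward(x, gamma=np.ones(100), beta=np.zeros(100))
print(y.mean(), y.std())                     # ~0 and ~1
```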
Common layers
• Normalization layers
• Batch Normalization (BN)
• BN: normalizing each layer, for each mini-batch
• Greatly accelerates training + less sensitive to initialization + improves regularization
Reference: Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Figure legend: Inception trained with initial learning rate 0.0015; BN-Baseline: same as Inception with BN before each nonlinearity; BN-x5 / BN-x30: initial learning rate increased by 5x (0.0075) and 30x (0.045); BN-x5-Sigmoid: same as BN-x5, but with sigmoid instead of ReLU.
Common layers
• Normalization layers
• Batch Normalization (BN)
• Local Response Normalization (LRN)
• Used in the AlexNet paper with k=2, alpha=10^-4, beta=0.75, n=5
• Not common anymore
(The LRN formula sums squared activations over n adjacent channels at the same spatial position.)
Different abstractions for Deep Learning
• Deep Learning pipeline
• Deep Learning task (Eg: CNN + classifier => Image captioning, Localization, …)
• Deep Neural Network (Eg: CNN, AlexNet, GoogLeNet, …)
• Layer (Eg: Convolution, Pooling, …)
Convolutional Neural networks
Convolutional Neural networks
LeNet for OCR (90s)
AlexNet
Compared to LeCun 1998, AlexNet used:
•More data: 10^6 vs. 10^3
•GPU (~20x speedup) => Almost 1B FLOPs for single image
•Deeper: More layers (8 weight layers)
•Fancy regularization (dropout 0.5)
•Fancy non-linearity (first use of ReLU according to Karpathy)
•Top-5 error on ImageNet (ILSVRC 2012 winner): 16.4%
•Using ensembles (7 CNNs), error 15.4%
Convolutional Neural networks
ZFNet [Zeiler and Fergus, 2013]
•It was an improvement on AlexNet by tweaking the architecture
hyperparameters,
• In particular by expanding the size of the middle convolutional layers
• CONV 3,4,5: instead of 384, 384, 256 filters use 512, 1024, 512
• And making the stride and filter size on the first layer smaller.
• CONV 1: change from (11x11 stride 4) to (7x7 stride 2)
•Top-5 error on ImageNet (ILSVRC 2013 winner): 16.4% -> 14.8%
Reference: http://cs231n.github.io/convolutional-networks/
Convolutional Neural networks
• Homogeneous architecture
• All convolution layers use small 3x3 filters (compared to AlexNet, which uses 11x11, 5x5 and 3x3 filters) with stride 1 (compared to AlexNet, which uses strides of 4 and 1)
• Depth of the network is the critical component (19 layers)
• Other details:
• 5 maxpool layers (x2 reduction)
• No normalization
• 3 FC layers (instead of 2) => most of the parameters (102,760,448; 16,777,216; 409,600)
• ImageNet top-5 error (ILSVRC 2014 runner-up): 14.8% -> 7.3%
Reference: https://arxiv.org/pdf/1509.07627.pdf, https://arxiv.org/pdf/1409.1556v6.pdf, https://www.youtube.com/watch?v=j1jIoHN3m0s
(Number of filters per stage: 64, 128, 256, 512, 512)
• Why 3x3 layers?
• Stacked convolution layers have a large receptive field
• two 3x3 layers => 5x5 receptive field
• three 3x3 layers => 7x7 receptive field
• More non-linearity
• Fewer parameters to learn (quick check below)
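A quick back-of-the-envelope check of the receptive-field and parameter-count claims (my own arithmetic, assuming C input and C output channels):

```python
# Three stacked 3x3 layers see a 7x7 receptive field (each stride-1 3x3 layer
# adds 2 pixels: 3 -> 5 -> 7) but use fewer weights than a single 7x7 layer.
C = 256
one_7x7   = 7 * 7 * C * C          # a single 7x7 convolution layer
three_3x3 = 3 * (3 * 3 * C * C)    # three stacked 3x3 layers, same receptive field
print(one_7x7, three_3x3)          # 3,211,264 vs 1,769,472 weights
```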
New Lego brick or mini-network
(Inception module)
For Inception v4, see https://arxiv.org/abs/1602.07261
Convolutional Neural networks
GoogLeNet [Szegedy et al., 2014]
- 9 inception modules
- ILSVRC 2014 winner
(6.7% top 5 error )
- Only 5 million params!
(Uses Avg pooling instead of FC layers)
Convolutional Neural networks
                    GoogLeNet   VGG_model_A   AlexNet
updateOutput           130.76        162.74     27.65
updateGradInput        197.86        167.05     24.32
accGradParameters      142.15        199.49     28.99
Forward                130.76        162.74     27.65
Backward               340.01        366.54     53.31
TOTAL                  470.77        529.29     80.96
Speed with Torch7 (using a GeForce GTX TITAN X and cuDNN); all times in milliseconds.
Compared to AlexNet, GoogLeNet has
- 12x fewer params
- 2x more compute
- 6.67% top-5 error (vs. 16.4%)
Compared to VGGNet, GoogLeNet has
- 36x fewer params
- 22 layers (vs. 19)
- 6.67% top-5 error (vs. 7.3%)
Reference: https://arxiv.org/pdf/1512.00567.pdf, https://github.com/soumith/convnet-benchmarks/blob/master/torch7/imagenet_winners/output.log
Analysis of errors on GoogLeNet vs
human on ImageNet dataset
• Types of error that both GoogLeNet and humans are susceptible to:
• Multiple objects (24% of GoogLeNet errors and 16% of human errors)
• Incorrect annotations
• Types of error that GoogLeNet is more susceptible to than humans:
• Small or thin objects (21% of GoogLeNet errors)
• Image filters, e.g. distorted contrast/color distribution (13% of GoogLeNet errors and only 1 human error)
• Abstract representations, e.g. the shadow on the ground of a child on a swing (6% of GoogLeNet errors)
• Types of error that humans are more susceptible to than GoogLeNet:
• Fine-grained recognition, e.g. species of dogs (7% of GoogLeNet errors and 37% of human errors)
• Insufficient training data
Reference: http://arxiv.org/abs/1409.0575
Convolutional Neural networks
New Lego brick (Residual block)
Reference: http://torch.ch/blog/2016/02/04/resnets.html
Shortcut connections to address underfitting due to vanishing gradients (occurs even with batch normalization); a minimal sketch of a residual block follows below.
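A minimal fully connected sketch of the identity-shortcut idea (the real ResNet block uses two 3x3 convolutions with batch norm; this only shows the F(x) + x pattern, with names and sizes of my choosing):

```python
import numpy as np

def relu(a):
    return np.maximum(0.0, a)

def residual_block(x, W1, W2):
    f = relu(x @ W1) @ W2      # the residual function F(x)
    return relu(f + x)         # identity shortcut: output = F(x) + x

d = 64
x  = np.random.randn(8, d)
W1 = np.random.randn(d, d) * np.sqrt(2.0 / d)   # He-style init (see the later slides)
W2 = np.random.randn(d, d) * np.sqrt(2.0 / d)
print(residual_block(x, W1, W2).shape)          # (8, 64)
```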
Convolutional Neural networks
• ResNet Architecture
• VGG-style design => just deeper
• All 3x3 convolutions
• #Filter x2
• Other remarks:
• no max pooling (almost)
• no FC
• no dropout
• See https://github.com/facebook/fb.resnet.torch
Reference: http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf
Different abstractions for Deep Learning
• Deep Learning pipeline
• Deep Learning task (Eg: CNN + classifier => Image captioning, Localization, …)
• Deep Neural Network (Eg: CNN, AlexNet, GoogLeNet, …)
• Layer (Eg: Convolution, Pooling, …)
Addressing other tasks …
Reference: https://docs.google.com/presentation/d/1Q1CmVVnjVJM_9CDk3B8Y6MWCavZOtiKmOLQ0XB7s9Vg/edit#slide=id.g17e6880c10_0_926
SKIP THIS !!
Addressing other tasks … SKIP THIS !!
…
How to train a Deep Neural Network?
Training a Deep Neural Network
Training a Deep Neural Network
“Forward propagation”
Compute a function via composition of linear
transformations followed by element-wise non-linearities
“Backward propagation”
Propagates errors backwards and updates weights according to how much they contributed to the output
Reference: “You Should Be Using Automatic Differentiation” by Ryan Adams (Twitter)
Special case of “automatic
differentiation” discussed
in next slides
Training a Deep Neural Network
Training features:
Training label:
Goal: learn the weights
Define a loss function:
For numerical stability and mathematical simplicity, we use negative log-likelihood
(often referred to as cross-entropy):
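The loss formulas on this slide were embedded as images; assuming the standard multi-class setup with a softmax output f(x) and true class c, the negative log-likelihood is

    L(f(x), y) = -log f(x)_c

i.e. the log-probability the network assigns to the correct class, negated.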
• Using the loss function, we learn the weights by:
Training a Deep Neural Network
• Learning is cast as optimization
• Popular algorithm: Stochastic Gradient Descent
• Needs to compute the gradients:
• And initialization of weights (covered later):
• Evaluate derivative of f(x) = sin(x – 3/x) at x = 0.01
• Symbolic differentiation
• Symbolically differentiate the function as an expression, and evaluate it at
the required point
• Low speed + difficult to convert DNN into expressions
• Symbolically, f'(x) = cos(x - 3/x) * (1 + 3/x^2); at x = 0.01 => -962.8192798 (checked numerically in the sketch below)
• Numerical differentiation
• Use finite differences:
• Generally bad numerical stability
Methods for differentiating functions
Reference: http://homes.cs.washington.edu/~naveenks/files/2009_Cranfield_PPT.pdf
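A small sketch reproducing the slide's example with both the hand-derived formula and a central finite difference (the step size h is my choice and illustrates the stability concern):

```python
import math

f = lambda x: math.sin(x - 3.0 / x)
df_symbolic = lambda x: math.cos(x - 3.0 / x) * (1.0 + 3.0 / x**2)

x, h = 0.01, 1e-7
df_numeric = (f(x + h) - f(x - h)) / (2 * h)   # central finite difference
print(df_symbolic(x))   # about -962.82 (matches the slide)
print(df_numeric)       # close, but sensitive to the choice of h
```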
• Automatic/Algorithmic Differentiation (AD)
• Mechanically calculates derivatives of functions expressed as computer programs, at machine precision, and with complexity guarantees - Barak Pearlmutter
• Reverse-mode automatic differentiation used in practice
Examples of AD in practice
• For Python and NumPy: https://github.com/HIPS/autograd
• For Torch (developed by Twitter Cortex): https://github.com/twitter/torch-autograd/
• See http://www.autodiff.org/ for more details
• Convert the algorithm into a sequence of assignments of basic operations:
Reverse-mode AD (how it works)
https://justindomke.wordpress.com/2009/03/24/a-simple-explanation-of-reverse-mode-automatic-differentiation/
• Apply the chain rule (summing contributions over the parents of each intermediate variable):
• Differentiate each basic operation in reverse order (see the worked sketch below):
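A hand-rolled sketch of reverse-mode AD on the earlier example f(x) = sin(x - 3/x): the forward pass records intermediate values, the reverse pass applies the chain rule to each basic operation in reverse order (variable names are mine):

```python
import math

def f_and_grad(x):
    # forward pass: decompose f into basic operations
    v1 = 3.0 / x          # v1 = 3/x
    v2 = x - v1           # v2 = x - v1
    v3 = math.sin(v2)     # v3 = sin(v2), the output

    # reverse pass: propagate adjoints back to the input
    v3_bar = 1.0                      # d f / d v3
    v2_bar = v3_bar * math.cos(v2)    # d f / d v2
    v1_bar = v2_bar * (-1.0)          # v2 = x - v1  =>  d v2 / d v1 = -1
    x_bar  = v2_bar * 1.0 + v1_bar * (-3.0 / x**2)   # sum over both paths into x
    return v3, x_bar

print(f_and_grad(0.01))   # gradient about -962.82, matching the symbolic result
```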
Reverse-mode AD (how it works – NN)
From Neural Network with Torch - Alex Wiltschko
Reverse-mode AD (how it works – NN)
From Neural Network with Torch - Alex Wiltschko
• Normalize your data
• Mini-batch SGD instead of single-example SGD (leverages matrix-matrix operations)
• Use momentum
• Use adaptive learning rates:
• Adagrad: learning rates are scaled by the square root of the cumulative sum
of squared gradients
• RMSProp: instead of cumulative sum, use exponential moving average
• Adam: essentially combines RMSProp with momentum (see the sketch after this slide)
• Debug your gradients using the finite-difference method
Tricks of the Trade
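A minimal sketch of the update rules named above, using common default hyperparameters (the exact values are not prescribed by the slides):

```python
import numpy as np

def sgd_momentum(w, g, v, lr=0.01, mu=0.9):
    v = mu * v - lr * g                 # velocity accumulates past gradients
    return w + v, v

def rmsprop(w, g, s, lr=0.001, decay=0.9, eps=1e-8):
    s = decay * s + (1 - decay) * g**2  # exponential moving average of squared grads
    return w - lr * g / (np.sqrt(s) + eps), s

def adam(w, g, m, v, t, lr=0.001, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * g           # momentum-like first moment
    v = b2 * v + (1 - b2) * g**2        # RMSProp-like second moment
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)   # bias correction
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

w, g = np.zeros(3), np.array([0.1, -0.2, 0.3])
w, m, v = adam(w, g, np.zeros(3), np.zeros(3), t=1)
print(w)
```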
• Use momentum
• Use adaptive learning rates:
• Adagrad: learning rates are scaled by the square root of the cumulative sum
of squared gradients
• RMSProp: instead of cumulative sum, use exponential moving average
• Adam: essentially combines RMSProp with momentum
Tricks of the Trade
• Initialization matters
• Assume 10-layer FC network with tanh non-linearity
Tricks of the Trade
- Initialize with zero mean & 0.01 std dev: does not work for deep networks
- Initialize with zero mean & unit std dev: almost all neurons completely saturated at -1 or 1; gradients will be all zero
(Plots show per-layer mean and std dev of activations vs. layer number.)
• Initialization matters
• Assume 10-layer FC network with tanh non-linearity
Tricks of the Trade
Xavier initialization [Glorot et al., 2010]:
- Use zero mean and 1/fan_in variance
- Works well for tanh
- But not for ReLU
He et al. proposed replacing 1/fan_in with 2/fan_in (note the additional factor of 2); a sketch follows after this slide.
(Plots show per-layer mean and std dev of activations vs. layer number.)
• Initialization matters
• Assume 10-layer FC network with tanh non-linearity
• Batch normalization reduces the strong dependence on initialization
Tricks of the Trade
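A minimal sketch of Xavier and He initialization for a fully connected layer (the fan_in convention follows the slides; function names are mine):

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # Glorot et al., 2010: zero mean, variance 1/fan_in (works well for tanh)
    return np.random.randn(fan_in, fan_out) * np.sqrt(1.0 / fan_in)

def he_init(fan_in, fan_out):
    # He et al., 2015: extra factor of 2 to compensate for ReLU zeroing half the units
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)

W = he_init(512, 512)
print(W.std())   # about sqrt(2/512) ~ 0.0625
```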
Overview of existing deep
learning stack
Existing Deep Learning Stack
Caffe, Theano, Torch7, TensorFlow, DeepLearning4J, SystemML*
cuDNN Aparapi (converts bytecode to OpenCL)
GPU analogues of the CPU’s BLAS/LAPACK: cuBLAS, MAGMA, CULA, cuSPARSE, cuSOLVER, cuRAND, etc.
CUDA (preferred if Nvidia GPUs) OpenCL (portable)
Framework:
Library with
commonly used
building blocks:
Driver/Toolkit:
Hardware
Multicore, task parallelism, minimize latency (eg: Unsafe/DirectBuf/GC pauses/NIO)
Data parallelism (single task), cost of moving data from CPU to GPU (kernel fusion?), maximize throughput
Rule of Thumb: Always use libraries!! Caffe (GPU) gives an 11x speedup but Caffe (cuDNN) 14x on AlexNet training (5 convolution + 3 fully connected layers).
*Conditions apply: unified memory model since CUDA 6
Comparison of existing frameworks
• Caffe: core language C++; bindings: Python, MatLab; CPU: yes; single GPU: yes; multi-GPU: yes; distributed: see com.yahoo.ml.CaffeOnSpark. Comments: mostly for image classification; models/layers expressed in proto format.
• Theano / PyLearn2: core language Python; CPU: yes; single GPU: yes; multi-GPU: in progress; distributed: no. Comments: transparent use of GPU, auto-diff, general purpose, computation as DAG.
• Torch7: core language Lua; CPU: yes; single GPU: yes; multi-GPU: yes; distributed: see Twitter's torch-distlearn. Comments: CTC implementation of Baidu's Deep Speech on Torch7 open-sourced; very efficient.
• TensorFlow: core language C++; bindings: Python; CPU: yes; single GPU: yes; multi-GPU: up to 4 GPUs; distributed: not open-sourced. Comments: slower than Theano/Torch, TensorBoard useful, computation as DAG.
• DL4J: core language Java; CPU: yes; single GPU: yes; multi-GPU: most likely; distributed: yes. Comments: supports GPUs via CUDA, support for Hadoop/Spark.
• SystemML: core language Java; bindings: Python, Scala; CPU: yes; single GPU: in progress; multi-GPU: not yet; distributed: yes.
• Minerva/CXXNet (Smola): core language C++; bindings: Python; CPU: yes; single GPU: yes; multi-GPU: yes; distributed: yes. Comments: https://github.com/dmlc; Minerva ~ Theano and CXXNet ~ Caffe.
Thank You !!
Contenu connexe

Tendances

Chainer v2 alpha
Chainer v2 alphaChainer v2 alpha
Chainer v2 alphaSeiya Tokui
 
Introduction to Chainer: A Flexible Framework for Deep Learning
Introduction to Chainer: A Flexible Framework for Deep LearningIntroduction to Chainer: A Flexible Framework for Deep Learning
Introduction to Chainer: A Flexible Framework for Deep LearningSeiya Tokui
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Preferred Networks
 
Spring sim 2010-riley
Spring sim 2010-rileySpring sim 2010-riley
Spring sim 2010-rileySopna Sumāto
 
Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Seiya Tokui
 
Ground to ns3 - Basic wireless topology implementation
Ground to ns3 - Basic wireless topology implementationGround to ns3 - Basic wireless topology implementation
Ground to ns3 - Basic wireless topology implementationJawad Khan
 
WiMAX implementation in ns3
WiMAX implementation in ns3WiMAX implementation in ns3
WiMAX implementation in ns3Mustafa Khaleel
 
CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUShohei Hido
 
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular LabsIntro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular LabsKendall
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16MLconf
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowEmanuel Di Nardo
 
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...MLconf
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Tyrone Systems
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Tyrone Systems
 
DIY Deep Learning with Caffe Workshop
DIY Deep Learning with Caffe WorkshopDIY Deep Learning with Caffe Workshop
DIY Deep Learning with Caffe Workshopodsc
 
Ns3 implementation wifi
Ns3 implementation wifiNs3 implementation wifi
Ns3 implementation wifiSalah Amean
 

Tendances (20)

Chainer v2 alpha
Chainer v2 alphaChainer v2 alpha
Chainer v2 alpha
 
Introduction to Chainer: A Flexible Framework for Deep Learning
Introduction to Chainer: A Flexible Framework for Deep LearningIntroduction to Chainer: A Flexible Framework for Deep Learning
Introduction to Chainer: A Flexible Framework for Deep Learning
 
Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018Introduction to Chainer 11 may,2018
Introduction to Chainer 11 may,2018
 
NS-3
NS-3 NS-3
NS-3
 
Deep parking
Deep parkingDeep parking
Deep parking
 
Spring sim 2010-riley
Spring sim 2010-rileySpring sim 2010-riley
Spring sim 2010-riley
 
Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+Chainer Update v1.8.0 -> v1.10.0+
Chainer Update v1.8.0 -> v1.10.0+
 
Ground to ns3 - Basic wireless topology implementation
Ground to ns3 - Basic wireless topology implementationGround to ns3 - Basic wireless topology implementation
Ground to ns3 - Basic wireless topology implementation
 
WiMAX implementation in ns3
WiMAX implementation in ns3WiMAX implementation in ns3
WiMAX implementation in ns3
 
CuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPUCuPy: A NumPy-compatible Library for GPU
CuPy: A NumPy-compatible Library for GPU
 
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular LabsIntro to TensorFlow and PyTorch Workshop at Tubular Labs
Intro to TensorFlow and PyTorch Workshop at Tubular Labs
 
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
Braxton McKee, CEO & Founder, Ufora at MLconf NYC - 4/15/16
 
Distributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflowDistributed implementation of a lstm on spark and tensorflow
Distributed implementation of a lstm on spark and tensorflow
 
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
Avi Pfeffer, Principal Scientist, Charles River Analytics at MLconf SEA - 5/2...
 
Deep Learning in theano
Deep Learning in theanoDeep Learning in theano
Deep Learning in theano
 
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
Explore Deep Learning Architecture using Tensorflow 2.0 now! Part 2
 
Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1Learn about Tensorflow for Deep Learning now! Part 1
Learn about Tensorflow for Deep Learning now! Part 1
 
DIY Deep Learning with Caffe Workshop
DIY Deep Learning with Caffe WorkshopDIY Deep Learning with Caffe Workshop
DIY Deep Learning with Caffe Workshop
 
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
 
Ns3 implementation wifi
Ns3 implementation wifiNs3 implementation wifi
Ns3 implementation wifi
 

Similaire à Notes from 2016 bay area deep learning school

prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptxssuserf583ac
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptxRohanBorgalli
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptxSreeVani74
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformShivaji Dutta
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)Amazon Web Services
 
Urs Köster - Convolutional and Recurrent Neural Networks
Urs Köster - Convolutional and Recurrent Neural NetworksUrs Köster - Convolutional and Recurrent Neural Networks
Urs Köster - Convolutional and Recurrent Neural NetworksIntel Nervana
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Larry Smarr
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Larry Smarr
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Larry Smarr
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesJen Aman
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDatabricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesJen Aman
 
H2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelH2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelSri Ambati
 
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebula Project
 
2_Image Classification.pdf
2_Image Classification.pdf2_Image Classification.pdf
2_Image Classification.pdfFEG
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerPoo Kuan Hoong
 
ELMSLN: Rethinking System Architecture
ELMSLN: Rethinking System ArchitectureELMSLN: Rethinking System Architecture
ELMSLN: Rethinking System ArchitectureBryan Ollendyke
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep LearningPoo Kuan Hoong
 

Similaire à Notes from 2016 bay area deep learning school (20)

prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
 
prace_days_ml_2019.pptx
prace_days_ml_2019.pptxprace_days_ml_2019.pptx
prace_days_ml_2019.pptx
 
Deep Learning on Qubole Data Platform
Deep Learning on Qubole Data PlatformDeep Learning on Qubole Data Platform
Deep Learning on Qubole Data Platform
 
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
AWS re:Invent 2016: Bringing Deep Learning to the Cloud with Amazon EC2 (CMP314)
 
Urs Köster - Convolutional and Recurrent Neural Networks
Urs Köster - Convolutional and Recurrent Neural NetworksUrs Köster - Convolutional and Recurrent Neural Networks
Urs Köster - Convolutional and Recurrent Neural Networks
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
 
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
Panel: Open Infrastructure for an Open Society: OSG, Commercial Clouds, and B...
 
Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
H2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno CandelH2O World - H2O Deep Learning with Arno Candel
H2O World - H2O Deep Learning with Arno Candel
 
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander DibboOpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
OpenNebulaConf2015 1.07 Cloud for Scientific Computing @ STFC - Alexander Dibbo
 
Introduction to deep learning
Introduction to deep learningIntroduction to deep learning
Introduction to deep learning
 
2_Image Classification.pdf
2_Image Classification.pdf2_Image Classification.pdf
2_Image Classification.pdf
 
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A PrimerMDEC Data Matters Series: machine learning and Deep Learning, A Primer
MDEC Data Matters Series: machine learning and Deep Learning, A Primer
 
ELMSLN: Rethinking System Architecture
ELMSLN: Rethinking System ArchitectureELMSLN: Rethinking System Architecture
ELMSLN: Rethinking System Architecture
 
Stackato v6
Stackato v6Stackato v6
Stackato v6
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 

Dernier

4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designMIPLM
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operationalssuser3e220a
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmStan Meyer
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxRosabel UA
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptxmary850239
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Projectjordimapav
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)lakshayb543
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsRommel Regala
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 

Dernier (20)

4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Keynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-designKeynote by Prof. Wurzer at Nordex about IP-design
Keynote by Prof. Wurzer at Nordex about IP-design
 
Expanded definition: technical and operational
Expanded definition: technical and operationalExpanded definition: technical and operational
Expanded definition: technical and operational
 
Oppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and FilmOppenheimer Film Discussion for Philosophy and Film
Oppenheimer Film Discussion for Philosophy and Film
 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
Presentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptxPresentation Activity 2. Unit 3 transv.pptx
Presentation Activity 2. Unit 3 transv.pptx
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
ClimART Action | eTwinning Project
ClimART Action    |    eTwinning ProjectClimART Action    |    eTwinning Project
ClimART Action | eTwinning Project
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
The Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World PoliticsThe Contemporary World: The Globalization of World Politics
The Contemporary World: The Globalization of World Politics
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 

Notes from 2016 bay area deep learning school

  • 1. Summary of Bay Area Deep Learning School Niketan Pansare
  • 2. • Summary • Why Deep Learning is gaining popularity ? • Introduction to Deep Learning • Case-study of the state-of-the-art networks • How to train them • Tricks of the trade • Overview of existing deep learning stack Agenda
  • 3. Summary • 1300 applicants for 500 spots (industry + academia) • Videos are online: • Day 1: https://www.youtube.com/watch?v=eyovmAtoUx0 • Day 2: https://www.youtube.com/watch?v=9dXiAecyJrY • Mostly high-quality talks from different areas • Computer Vision (Karpathy – OpenAI), Speech (Coates - Baidu), NLP (Socher – Salesforce, Quoc Le - Google), Unsupervised Learning (Salakhutdinov - CMU), Reinforcement Learning (Schulman - OpenAI) • Tools (TensorFlow/Theano/Torch) • Overview/Vision talks (Ng, Bengio and Larochelle) • Networking: • Keras contributor (working in startup) – CNTK integration, potential for SystemML integration • TensorFlow users in Google • Discussion on “dynamic operator placement” described in the whitepaper
  • 4. Why Deep Learning is gaining popularity ?
  • 5. • Efficacy of larger networks Why Deep Learning is gaining popularity ? Reference: Andrew Ng (Spark summit 2016).
  • 6. • Efficacy of larger networks Why Deep Learning is gaining popularity ? Reference: Andrew Ng (Spark summit 2016). Train large network on large amount of data Relative ordering not defined for small data
  • 7. • Efficacy of larger networks • Large amount of data Why Deep Learning is gaining popularity ? Caltech101 dataset (by FeiFei Li) Google Street View House Numbers (SVHN) Dataset CIFAR-10 dataset Flickr 30K Images
  • 8. • Efficacy of larger networks • Large amount of data • Compute power necessary to train larger networks Why Deep Learning is gaining popularity ? VGG: ~2-3 weeks training with 4 GPUs ResNet 101: 2-3 weeks with 4 GPUs Rocket Fuel*
  • 9. • Efficacy of larger networks • Large amount of data • Compute power necessary to train larger networks • Techniques/Algorithms/Networks to deal with training issues • Non-linearities, Batch normalization, Dropout, Ensembles • Will discuss these in detail later Why Deep Learning is gaining popularity ?
  • 10. • Efficacy of larger networks • Large amount of data • Compute power necessary to train larger networks • Techniques/Algorithms/Networks to deal with training issues • Success stories in vision, speech and text Why Deep Learning is gaining popularity ?
  • 11. • Efficacy of larger networks • Large amount of data • Compute power necessary to train larger networks • Techniques/Algorithms/Networks to deal with training issues • Success stories in vision, speech and text • No feature engineering Why Deep Learning is gaining popularity ?
  • 12. • Efficacy of larger networks • Large amount of data • Compute power necessary to train larger networks • Techniques/Algorithms/Networks to deal with training issues • Success stories in vision, speech and text • No feature engineering • Transfer Learning + Open-source (network, learned weights, dataset as well as codebase) • https://github.com/BVLC/caffe/wiki/Model-Zoo • https://github.com/KaimingHe/deep-residual-networks • https://github.com/facebook/fb.resnet.torch • https://github.com/baidu-research/warp-ctc • https://github.com/NervanaSystems/ModelZoo Why Deep Learning is gaining popularity ?
  • 13. • Efficacy of larger networks • Large amount of data • Compute power necessary to train larger networks • Techniques/Algorithms/Networks to deal with training issues • Success stories in vision, speech and text • No feature engineering • Transfer Learning + Open-source (network, learned weights, dataset as well as codebase) • Tooling support for rapid iterations/experimentation • Auto-differentiation, general purpose optimizer (SGD variants) • Layered architecture • Tensorboard Why Deep Learning is gaining popularity ?
  • 14. • Efficacy of larger networks • Large amount of data • Compute power necessary to train larger networks • Techniques/Algorithms/Networks to deal with training issues • Success stories in vision, speech and text • No feature engineering • Transfer Learning + Open-source (network, learned weights, dataset as well as codebase) • Tooling support for rapid iterations/experimentation • Auto-differentiation, general purpose optimizer (SGD variants) • Layered architecture • Tensorboard Why Deep Learning is gaining popularity ? Will skip RNN, LSTM, CTC, Parameter server, Unsupervised and Reinforcement Deep Learning
  • 15. • DL for Speech (covers CTC + Speech pipeline): • https://youtu.be/9dXiAecyJrY?t=3h49m40s • https://github.com/baidu-research/ba-dls-deepspeech • DL for NLP (covers word embeddings, RNN, LSTM, seq2seq) • https://youtu.be/eyovmAtoUx0?t=3h51m45s (Richard Socher) • https://youtu.be/9dXiAecyJrY?t=7h4m12s (Quoc Le) • Deep Unsupervised Learning (covers RBM, Autoencoders, …): • https://youtu.be/eyovmAtoUx0?t=7h7m54s • Deep Reinforcement Learning (covers Q-learning, policy gradients): • https://youtu.be/9dXiAecyJrY?t=7m43s • Tutorial (TensorFlow, Torch, Theano) • https://github.com/wolffg/tf-tutorial/ • https://github.com/alexbw/bayarea-dl-summerschool • https://github.com/lamblin/bayareadlschool Not covered in this talk
  • 17. Different abstractions for Deep Learning Deep Learning pipeline Deep Learning task Eg: CNN + classifier => Image captioning, Localization, … Deep Neural Network Eg: CNN, AlexNet, GoogLeNet, … Layer Eg: Convolution, Pooling, …
  • 18. Common layers • Fully connected layer Reference: Convolutional Neural Networks for Visual Recognition. http://cs231n.github.io/
  • 19. Common layers • Fully connected layer • Convolution layer • Less number of parameters as compared to FC • Useful to capture local features (spatially) • Output #channels = #filters Reference: Convolutional Neural Networks for Visual Recognition. http://cs231n.github.io/
  • 20. Common layers • Fully connected layer • Convolution layer • Pooling layer • Useful to tolerate feature deformation such as local shifts • Output #channels = Input #channels Reference: Convolutional Neural Networks for Visual Recognition. http://cs231n.github.io/
  • 21. Common layers • Fully connected layer • Convolution layer • Pooling layer • Activations • Sigmoid • Tanh • ReLU Reference: Introduction to Feedforward Neural Networks - Larochelle.​ https://dl.dropboxusercontent.com/u/19557502/hugo_dlss.pdf http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
  • 22. • Squashes the neuron’s pre-activations between [0, 1] • Historically popular • Disadvantages: • Tends to vanish the gradient as activation increase (i.e. saturated neurons) • Sigmoid outputs are not zero-centered • exp() is a bit compute expensive Sigmoid Reference: Introduction to Feedforward Neural Networks - Larochelle.​ http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
  • 23. • Squashes the neuron’s pre-activations between [-1, 1] • Advantage: • Zero-centered • Disadvantages: • Tends to vanish the gradient as activation increase • exp() is compute expensive Tanh Reference: Introduction to Feedforward Neural Networks - Larochelle.​ http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
  • 24. • Bounded below by 0 (always non-negative) • Advantages: • Does not saturate (in +region) • Very computationally efficient • Converges much faster than sigmoid/tanh in practice (e.g. 6x) • Disadvantages: • Tends to blowup the activations • Alternatives: • Leaky ReLU: max(0.001*a, a) • Parameteric ReLU: max(alpha*a, a) • Exponential ReLU: a if a>0; else alpha*(exp(a)-1) ReLU (Rectified Linear Units) Reference: Introduction to Feedforward Neural Networks - Larochelle.​ http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf max(0, a)
  • 25. • According to Hinton, why did deep learning not catch on earlier ? • Our labeled datasets were thousands of times too small. • Our computers were millions of times too slow. • We initialized the weights in a stupid way. • We used the wrong type of non-linearity (i.e. sigmoid/tanh). • Which non-linearity to use => ReLU according to • LeCun: http://yann.lecun.com/exdb/publis/pdf/jarrett-iccv-09.pdf • Hinton: http://www.cs.toronto.edu/~fritz/absps/reluICML.pdf • Bengio: https://www.utc.fr/~bordesan/dokuwiki/_media/en/glorot10nipsworkshop.pdf • If not satisfied with ReLU, • Double-check the learning rates • Then, try out Leaky ReLU / ELU • Then, try out tanh but don’t expect much • Don’t use sigmoid Reference: Introduction to Feedforward Neural Networks - Larochelle.​ http://cs231n.stanford.edu/slides/winter1516_lecture5.pdf
  • 26. Common layers • Fully connected layer • Convolution layer • Pooling layer • Activations • SoftMax • Strictly positive • Sums to 1 • Used for multi-class classification • Other losses: Hinge, Euclidean, Sigmoid cross-entropy, … Reference: Introduction to Feedforward Neural Networks - Larochelle. https://dl.dropboxusercontent.com/u/19557502/hugo_dlss.pdf​
  • 27. Common layers • Fully connected layer • Convolution layer • Pooling layer • Activations • SoftMax • Dropout • Idea: «cripple» neural network by removing hidden units stochastically • Use random mask: Could use a different dropout probability, but 0.5 usually works well • Beats regular backpropagation on many datasets, but is slower (~2x) • Helps to prevent overfitting
  • 28. Common layers • Normalization layers • Batch Normalization (BN) • Network converge faster if inputs are whitened, i.e. linearly transformed to have zero mean and unit variance, and decorrelated • Ioffe and Szegedy, 2014 suggested to also use normalization at the level of hidden level • BN: normalizing each layer, for each mini-batch => addresses “internal covariate shift” • Greatly accelerate training + Less sensitive to initialization + Improve regularization Reference: Batch Normalization: Accelerating Deep Network Training b y Reducing Internal Covariate Shift Two popular approaches: - Subtract the mean image (e.g. AlexNet) - Subtract per-channel mean (e.g. VGGNet)
  • 29. Common layers • Normalization layers • Batch Normalization (BN) • Network converge faster if inputs are whitened, i.e. linearly transformed to have zero mean and unit variance, and decorrelated • Ioffe and Szegedy, 2014 suggested to also use normalization at the level of hidden level • BN: normalizing each layer, for each mini-batch => addresses “internal covariate shift” • Greatly accelerate training + Less sensitive to initialization + Improve regularization Reference: Batch Normalization: Accelerating Deep Network Training b y Reducing Internal Covariate Shift
  • 30. Common layers • Normalization layers • Batch Normalization (BN) • BN: normalizing each layer, for each mini-batch • Greatly accelerate training + Less sensitive to initialization + Improve regularization Reference: Batch Normalization: Accelerating Deep Network Training b y Reducing Internal Covariate Shift Trained with initial learning rate 0.0015 Same as Inception with BN before each nonlinearity Initial learning rate increased by 5x (0.0075) and 30x (0.045) Same as N-x5, but with Sigmoid instead of ReLU
  • 31. Common layers • Normalization layers • Batch Normalization (BN) • Local Response Normalization (LRN) • Used in AlexNet paper with k=2, alpha=10e-4, beta=0.75, n=5 • Not common anymore channel Number of channels
  • 32. Different abstractions for Deep Learning Deep Learning pipeline Deep Learning task Eg: CNN + classifier => Image captioning, Localization, … Deep Neural Network Eg: CNN, AlexNet, GoogLeNet, … Layer Eg: Convolution, Pooling, …
  • 34. Convolutional Neural networks LeNet for OCR (90s) AlexNet Compared to LeCun 1998, AlexNet used: •More data: 10^6 vs. 10^3 •GPU (~20x speedup) => Almost 1B FLOPs for single image •Deeper: More layers (8 weight layers) •Fancy regularization (dropout 0.5) •Fancy non-linearity (first use of ReLU according to Karpathy) •Accuracy on ImageNet (ILSVRC 2012 winner): 16.4% •Using ensembles (7 CNN), accuracy 15.4%
  • 35. Convolutional Neural networks ZFNet [Zeiler and Fergus, 2013] •It was an improvement on AlexNet by tweaking the architecture hyperparameters, • In particular by expanding the size of the middle convolutional layers • CONV 3,4,5: instead of 384, 384, 256 filters use 512, 1024, 512 • And making the stride and filter size on the first layer smaller. • CONV 1: change from (11x11 stride 4) to (7x7 stride 2) •Accuracy on ImageNet (ILSVRC 2013 winner): 16.4% -> 14.8% Reference: http://cs231n.github.io/convolutional-networks/
  • 36. Convolutional Neural networks • Homogenous architecture • All convolution layers use small 3x3 filters (compared to AlexNet that uses 11x11, 5x5 and 3x3 filters) with stride 1 (compared to AlexNet that uses 4 and 1 strides) • Depth of network critical component (19 layers) • Other details: • 5 maxpool layers (x2 reduction) • No normalization • 3 FC layers (instead of 2) => Most number of parameters (102760448, 16777216, 409600) • ImageNet top 5 error (ILSVRC 2014 runner-up): • 14.8% -> 7.3% (top 5 error) Reference: https://arxiv.org/pdf/1509.07627.pdf, https://arxiv.org/pdf/1409.1556v6.pdf, https://www.youtube.com/watch?v=j1jIoHN3m0s 64 128 256 512 512 Number of filters • Why 3x3 layers ? • Stacked convolution layers have large receptive field • two 3x3 => 5x5 receptive field • three 3x3 layers => 7x7 receptive field • More non-linearity • Less parameters to learn
  • 37. New Lego brick or mini-network (Inception module) For Inception v4, see https://arxiv.org/abs/1602.07261
  • 38. Convolutional Neural networks GoogLeNet [Szegedy et al., 2014] - 9 inception modules - ILSVRC 2014 winner (6.7% top 5 error ) - Only 5 million params! (Uses Avg pooling instead of FC layers)
  • 39. Convolutional Neural networks Speed with Torch7 (using GeForce GTX TITAN X and cuDNN), all times in milliseconds:
                       GoogLeNet   VGG_model_A   AlexNet
    updateOutput          130.76        162.74     27.65
    updateGradInput       197.86        167.05     24.32
    accGradParameters     142.15        199.49     28.99
    Forward               130.76        162.74     27.65
    Backward              340.01        366.54     53.31
    TOTAL                 470.77        529.29     80.96
  Compared to AlexNet, GoogLeNet has: 12x fewer params, 2x more compute, 6.67% top-5 error (vs. 16.4%)
  Compared to VGGNet, GoogLeNet has: 36x fewer params, 22 layers (vs. 19), 6.67% top-5 error (vs. 7.3%)
  Reference: https://arxiv.org/pdf/1512.00567.pdf, https://github.com/soumith/convnet-benchmarks/blob/master/torch7/imagenet_winners/output.log
  • 40. Analysis of errors made by GoogLeNet vs. humans on the ImageNet dataset • Types of error that both GoogLeNet and humans are susceptible to: • Multiple objects (24% of GoogLeNet errors and 16% of human errors) • Incorrect annotations • Types of error that GoogLeNet is more susceptible to than humans: • Object small or thin (21% of GoogLeNet errors) • Image filters, e.g. distorted contrast/color distribution (13% of GoogLeNet errors and only 1 human error) • Abstract representations, e.g. the shadow on the ground of a child on a swing (6% of GoogLeNet errors) • Types of error that humans are more susceptible to than GoogLeNet: • Fine-grained recognition, e.g. species of dogs (7% of GoogLeNet errors and 37% of human errors) • Insufficient training data Reference: http://arxiv.org/abs/1409.0575
  • 42. New Lego brick (Residual block) • Shortcut connections address the underfitting (degradation) observed in very deep plain networks, attributed to vanishing gradients • Occurs even with batch normalization Reference: http://torch.ch/blog/2016/02/04/resnets.html
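The residual block can be summarized by the formula from the ResNet paper: the stacked layers learn only a residual correction that is added back onto the identity shortcut.

```latex
y = \mathcal{F}(x, \{W_i\}) + x
```

Instead of fitting a desired mapping H(x) directly, the layers fit F(x) = H(x) - x, which is easier to push toward zero when the identity mapping is already close to optimal.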
  • 43. Convolutional Neural networks • ResNet Architecture • VGG-style design => just deep • All 3x3 convolutions • Number of filters doubles (x2) as spatial size halves • Other remarks: • no max pooling (almost) • no FC layers • no dropout • See https://github.com/facebook/fb.resnet.torch Reference: http://image-net.org/challenges/talks/ilsvrc2015_deep_residual_learning_kaiminghe.pdf
  • 44. Different abstractions for Deep Learning Deep Learning pipeline Deep Learning task Eg: CNN + classifier => Image captioning, Localization, … Deep Neural Network Eg: CNN, AlexNet, GoogLeNet, … Layer Eg: Convolution, Pooling, …
  • 45. Addressing other tasks … Reference: https://docs.google.com/presentation/d/1Q1CmVVnjVJM_9CDk3B8Y6MWCavZOtiKmOLQ0XB7s9Vg/edit#slide=id.g17e6880c10_0_926 SKIP THIS !!
  • 46. Addressing other tasks … SKIP THIS !! …
  • 47. How to train a Deep Neural Network ?
  • 48. Training a Deep Neural Network
  • 49. Training a Deep Neural Network • "Forward propagation": compute a function via composition of linear transformations followed by element-wise non-linearities • "Backward propagation": propagate errors backwards and update weights according to how much they contributed to the output; a special case of "automatic differentiation", discussed in the next slides Reference: "You Should Be Using Automatic Differentiation" by Ryan Adams (Twitter)
  • 50. Training a Deep Neural Network Training features: Training label: Goal: learn the weights Define a loss function: For numerical stability and mathematical simplicity, we use negative log-likelihood (often referred to as cross-entropy):
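The negative log-likelihood mentioned on the slide takes the following standard form for a softmax classifier over K classes (generic notation as a sketch, not necessarily the slide's exact symbols):

```latex
L(W) = -\frac{1}{N} \sum_{i=1}^{N} \log p\left(y^{(i)} \mid x^{(i)}; W\right)
     = -\frac{1}{N} \sum_{i=1}^{N} \log \frac{\exp\!\left(f_{y^{(i)}}\!\left(x^{(i)}; W\right)\right)}{\sum_{k=1}^{K} \exp\!\left(f_{k}\!\left(x^{(i)}; W\right)\right)}
```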
  • 51. Training a Deep Neural Network • Using the loss function above, we learn the weights • Learning is cast as optimization • Popular algorithm: Stochastic Gradient Descent (SGD) • Needs to compute the gradients of the loss with respect to the weights • And an initialization of the weights (covered later)
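For reference, a sketch of the mini-batch SGD update alluded to above, with learning rate \eta and mini-batch B (generic notation, not the slide's exact symbols):

```latex
W \leftarrow W - \eta \cdot \frac{1}{|B|} \sum_{(x,\, y) \in B} \nabla_{W}\, \ell\!\left(f(x; W),\, y\right)
```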
  • 52. Methods for differentiating functions • Evaluate the derivative of f(x) = sin(x - 3/x) at x = 0.01 • Symbolic differentiation • Symbolically differentiate the function as an expression, and evaluate it at the required point • Low speed + difficult to convert a DNN into an expression • Symbolically, f'(x) = cos(x - 3/x)(1 + 3/x²) … at x = 0.01 => -962.8192798 • Numerical differentiation • Use finite differences • Generally bad numerical stability • Automatic/Algorithmic Differentiation (AD) • "Mechanically calculates derivatives of functions expressed as computer programs, at machine precision, and with complexity guarantees" - Barak Pearlmutter • Reverse-mode automatic differentiation is what is used in practice Reference: http://homes.cs.washington.edu/~naveenks/files/2009_Cranfield_PPT.pdf
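The symbolic result on the slide can be cross-checked against a numerical estimate; the short Python sketch below uses a central finite difference with an arbitrarily chosen step size h:

```python
import math

f = lambda x: math.sin(x - 3.0 / x)

def f_prime(x):
    # Symbolic derivative from the slide: f'(x) = cos(x - 3/x) * (1 + 3/x^2)
    return math.cos(x - 3.0 / x) * (1.0 + 3.0 / x ** 2)

x, h = 0.01, 1e-7
numeric = (f(x + h) - f(x - h)) / (2 * h)   # central finite difference
print(f_prime(x))   # ~ -962.8192798 (matches the slide)
print(numeric)      # close, but accuracy is limited by floating-point cancellation
```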
  • 53. Examples of AD in practice • For Python and NumPy: https://github.com/HIPS/autograd • For Torch (developed by Twitter Cortex): https://github.com/twitter/torch-autograd/ • See http://www.autodiff.org/ for more details
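A minimal usage sketch of the HIPS autograd package listed above (assuming the autograd package is installed):

```python
import autograd.numpy as np   # thinly wrapped NumPy
from autograd import grad     # reverse-mode AD for scalar-valued functions

def f(x):
    return np.sin(x - 3.0 / x)

df = grad(f)     # df is a new function that evaluates f'(x)
print(df(0.01))  # ~ -962.82, matching the symbolic result above
```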
  • 54. Reverse-mode AD (how it works) • Convert the algorithm into a sequence of assignments of basic operations • Apply the chain rule: each variable's adjoint sums the contributions from the operations it is a parent of • Differentiate each basic operation f in the reverse order Reference: https://justindomke.wordpress.com/2009/03/24/a-simple-explanation-of-reverse-mode-automatic-differentiation/
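To make the mechanics concrete, here is a tiny hand-worked Python sketch for the running example f(x) = sin(x - 3/x); the forward pass is written as basic assignments and the reverse pass differentiates them in reverse order (variable names are illustrative):

```python
import math

def f_and_grad(x):
    # Forward pass: decompose f(x) = sin(x - 3/x) into basic assignments
    a = 3.0 / x          # a = 3 / x
    b = x - a            # b = x - a
    c = math.sin(b)      # c = sin(b)  -> the output

    # Reverse pass: walk the assignments backwards, applying the chain rule
    dc_db = math.cos(b)              # d sin(b) / db
    dc_da = dc_db * (-1.0)           # b = x - a  => db/da = -1
    dc_dx = dc_db * 1.0              # b = x - a  => db/dx = +1
    dc_dx += dc_da * (-3.0 / x**2)   # a = 3/x    => da/dx = -3/x^2
    return c, dc_dx

value, deriv = f_and_grad(0.01)
print(deriv)   # ~ -962.82, same result as symbolic differentiation
```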
  • 55. Reverse-mode AD (how it works – NN) From Neural Network with Torch - Alex Wiltschko
  • 56. Reverse-mode AD (how it works – NN) From Neural Network with Torch - Alex Wiltschko
  • 57. Tricks of the Trade • Normalize your data • Use mini-batch SGD instead of single-example SGD (leverages matrix-matrix operations) • Use momentum • Use adaptive learning rates: • Adagrad: learning rates are scaled by the square root of the cumulative sum of squared gradients • RMSProp: instead of the cumulative sum, use an exponential moving average • Adam: essentially combines RMSProp with momentum • Debug your gradient using the finite-difference method (see the sketch below)
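A sketch of the finite-difference gradient check from the last bullet; the step size h, the tolerance, and the quadratic example loss are arbitrary illustrative choices:

```python
import numpy as np

def grad_check(loss_fn, grad_fn, w, h=1e-5, tol=1e-6):
    """Compare an analytic gradient against central finite differences."""
    analytic = grad_fn(w)
    numeric = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus.flat[i] += h
        w_minus.flat[i] -= h
        numeric.flat[i] = (loss_fn(w_plus) - loss_fn(w_minus)) / (2 * h)
    rel_err = np.abs(analytic - numeric) / np.maximum(1e-8, np.abs(analytic) + np.abs(numeric))
    return rel_err.max() < tol, rel_err.max()

# Example: the analytic gradient of 0.5 * ||w||^2 is w itself
loss = lambda w: 0.5 * np.sum(w ** 2)
grad = lambda w: w
print(grad_check(loss, grad, np.random.randn(5)))   # (True, very small relative error)
```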
  • 58. • Use momentum • Use adaptive learning rates: • Adagrad: learning rates are scaled by the square root of the cumulative sum of squared gradients • RMSProp: instead of cumulative sum, use exponential moving average • Adam: essentially combines RMSProp with momentum Tricks of the Trade
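The three adaptive update rules can be written compactly per parameter; the NumPy sketch below follows the standard formulations, with hyperparameter defaults (lr, decay_rate, beta1, beta2, eps) chosen for illustration rather than taken from the slides:

```python
import numpy as np

def adagrad_step(w, g, cache, lr=1e-2, eps=1e-8):
    cache = cache + g ** 2                    # cumulative sum of squared gradients
    return w - lr * g / (np.sqrt(cache) + eps), cache

def rmsprop_step(w, g, cache, lr=1e-3, decay_rate=0.9, eps=1e-8):
    cache = decay_rate * cache + (1 - decay_rate) * g ** 2   # exponential moving average
    return w - lr * g / (np.sqrt(cache) + eps), cache

def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    m = beta1 * m + (1 - beta1) * g           # momentum-like first moment
    v = beta2 * v + (1 - beta2) * g ** 2      # RMSProp-like second moment
    m_hat = m / (1 - beta1 ** t)              # bias correction for early steps (t starts at 1)
    v_hat = v / (1 - beta2 ** t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```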
  • 59. Tricks of the Trade • Initialization matters • Assume a 10-layer FC network with tanh non-linearity • Initialize with zero mean & 0.01 std dev: does not work for deep networks (plots: per-layer activation mean and std dev) • Initialize with zero mean & unit std dev: almost all neurons are completely saturated at either -1 or 1, so the gradients will be all zero (plots: per-layer activation mean and std dev)
  • 60. Tricks of the Trade • Initialization matters • Assume a 10-layer FC network with tanh non-linearity • Xavier initialization [Glorot et al., 2010]: use zero mean and 1/fan_in variance (plots: per-layer activation mean and std dev) • Works well for tanh, but not for ReLU • He et al. proposed replacing the 1/fan_in variance with 2/fan_in for ReLU (note the additional factor of 2)
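A minimal NumPy sketch of the two initialization schemes for a fully connected layer with fan_in inputs and fan_out outputs, following the variance rules quoted on the slide:

```python
import numpy as np

def xavier_init(fan_in, fan_out):
    # Glorot et al., 2010: zero mean, variance 1/fan_in (works well with tanh)
    return np.random.randn(fan_in, fan_out) / np.sqrt(fan_in)

def he_init(fan_in, fan_out):
    # He et al., 2015: variance 2/fan_in (the extra factor of 2 compensates for ReLU)
    return np.random.randn(fan_in, fan_out) * np.sqrt(2.0 / fan_in)
```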
  • 61. • Initialization matters • Assume 10-layer FC network with tanh non-linearity • Batch normalization reduces the strong dependence on initialization Tricks of the Trade
  • 62. Overview of existing deep learning stack
  • 63. Existing Deep Learning Stack • Framework: Caffe, Theano, Torch7, TensorFlow, DeepLearning4J, SystemML* • Library with commonly used building blocks: cuDNN, Aparapi (converts bytecode to OpenCL), GPU counterparts of the CPU's BLAS/LAPACK: cuBLAS, MAGMA, CULA, cuSPARSE, cuSOLVER, cuRAND, etc. • Driver/Toolkit: CUDA (preferred if Nvidia GPUs), OpenCL (portable) • Hardware: CPU: multicore, task parallelism, minimize latency (e.g. Unsafe/DirectBuf/GC pauses/NIO); GPU: data parallelism (single task), cost of moving data from CPU to GPU (kernel fusion?), maximize throughput • Rule of Thumb: Always use libraries!! Caffe (GPU) gives an 11x speedup but Caffe (cuDNN) 14x on AlexNet training (5 convolution + 3 fully connected layers) • *Conditions apply: unified memory model since CUDA 6
  • 64. Comparison of existing frameworks
  Framework                | Core Lang | Bindings       | CPU | Single GPU  | Multi GPU    | Distributed                   | Comments
  Caffe                    | C++       | Python, MatLab | Yes | Yes         | Yes          | See com.yahoo.ml.CaffeOnSpark | Mostly for image classification; models/layers expressed in proto format
  Theano / PyLearn2        | Python    | -              | Yes | Yes         | In Progress  | No                            | Transparent use of GPU; auto-diff; general purpose; computation as DAG
  Torch7                   | Lua       | -              | Yes | Yes         | Yes          | See Twitter's torch-distlearn | CTC impl on Torch7 of Baidu's Deep Speech open-sourced; very efficient
  TensorFlow               | C++       | Python         | Yes | Yes         | Up to 4 GPUs | Not open-sourced              | Slower than Theano/Torch; TensorBoard useful; computation as DAG
  DL4J                     | Java      | -              | Yes | Yes         | Most likely  | Yes                           | Supports GPUs via CUDA; support for Hadoop/Spark
  SystemML                 | Java      | Python, Scala  | Yes | In Progress | Not yet      | Yes                           |
  Minerva / CXXNet (Smola) | C++       | Python         | Yes | Yes         | Yes          | Yes                           | https://github.com/dmlc; Minerva ~ Theano and CXXNet ~ Caffe

Editor's notes

  1. Computer Vision => ImageNet winners and extensions to related problems (localization, captioning, detection, segmentation, etc.) Speech => Deep Speech 2: Conv -> RNN -> FC -> CTC layer NLP => word vectors, Dynamic Memory Network, LSTM Unsupervised Learning => Autoencoder, Sparse Coding Reinforcement Learning => Q-learning