Nick McClure gave an introduction to neural networks using TensorFlow. He explained the basic unit of neural networks, the operational gate, and how multiple gates can be combined. He discussed loss functions, learning rates, and activation functions. McClure also covered convolutional neural networks, recurrent neural networks, and applications such as image captioning and style transfer. He concluded by discussing resources for staying up to date with advances in machine learning.
4. @nfmcclure
Why Neural Networks?
• The human brain can be considered one of the best processors. (Estimated to contain ~10^11 neurons.)
• Studies show that our brain can process the equivalent of ~20 Mb/sec just through the optic nerves.
• If we can copy this design, maybe
we can solve the “hard for a
computer – easy for humans”
problems.
• Speech recognition
• Facial identification
• Reading emotions
• Recognizing images
• Sentiment analysis
• Driving a vehicle
• Disease diagnosis
5. @nfmcclure
The Basic Unit: Operational Gate
• F(x,y) = x * y
• How can we change the inputs to decrease the output?
• The intuitive answer is to decrease 3 and/or decrease 4.
• Mathematically, the answer is to use the derivative…
• We know that if we increase the (3) input by 1, the output goes up by a total of 4. (4*4 = 16 versus 3*4 = 12.)
[Diagram: a multiply (*) gate with inputs 3 and 4 producing the output 12.]
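As a rough sketch (not from the slides), the multiply gate and its partial derivatives can be written directly in Python; the function name is just for illustration:

def multiply_gate(x, y):
    # F(x, y) = x * y, so dF/dx = y and dF/dy = x
    output = x * y
    grad_x = y   # increase x by 1 and the output rises by y
    grad_y = x   # increase y by 1 and the output rises by x
    return output, grad_x, grad_y

out, dx, dy = multiply_gate(3.0, 4.0)
print(out, dx, dy)  # 12.0 4.0 3.0 -> raising the 3 by 1 raises the output by 4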
6. @nfmcclure
Multiple Gates
• F(x, y, z) = (x + y) * z
• If G(a, b) = a + b, then F(x,y,z) = G(x, y) * z
• Or (change notation) F = G*z (we have solved this problem
before)
• Mathematically, we will use the “chain rule” to get the
derivatives:
[Diagram: an add (+) gate with inputs 2 and 3 feeds a multiply (*) gate along with the input 4, producing the output 20.]
dF/dx = (dF/dG) * (dG/dx)
dF/dx = z * 1
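A minimal sketch (illustrative only) of the chain rule applied to these two gates:

x, y, z = 2.0, 3.0, 4.0
G = x + y              # add gate: 5
F = G * z              # multiply gate: 20
dF_dG = z              # local derivative of the multiply gate
dG_dx = 1.0            # local derivative of the add gate
dF_dx = dF_dG * dG_dx  # chain rule: dF/dx = z * 1 = 4
print(F, dF_dx)        # 20.0 4.0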
7. @nfmcclure
Loss Functions
• You can’t learn unless you know how bad/good your past results have been.
• This is measured by a ‘loss function’.
• The idea is we have a labeled data set and we look at each
data point. We run the data point forward through the
network and then see if we need to increase or decrease
the output.
• We then change the parameters according to the derivatives
in the network.
• We loop through this many times until we reach the lowest
error in our training.
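A minimal sketch of such a training loop in graph-style TensorFlow (1.x-era API, matching the rest of the talk); the toy data, variable names, and learning rate here are made up for illustration:

import numpy as np
import tensorflow as tf

x_data = tf.placeholder(tf.float32, shape=[None, 1])
y_target = tf.placeholder(tf.float32, shape=[None, 1])
A = tf.Variable(tf.random_normal([1, 1]))

model_output = tf.matmul(x_data, A)
loss = tf.reduce_mean(tf.square(y_target - model_output))            # L2 loss function
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)  # learning rate 0.01

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())   # older releases: tf.initialize_all_variables()
    for i in range(100):                          # loop many times to lower the training error
        x_batch = np.random.rand(25, 1)
        y_batch = 3.0 * x_batch                   # toy labels (true slope = 3)
        sess.run(train_step, feed_dict={x_data: x_batch, y_target: y_batch})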
9. @nfmcclure
Logistic Regression as a Neural
Network
• A neural network is very similar to what we have been
doing, except that we bound the outputs between 0 and 1…
with a sigmoid function. (Called an activation function)
• The sigmoid function is nice, because it turns out that the
derivative is related to itself:
• We can use this fact in our ‘chain rule’ for derivatives.
F(x, a, b) = σ(ax + b)
σ(x) = 1 / (1 + e^(-x))
dσ/dx = σ(x) · (1 − σ(x))
[Diagram: two equivalent graphs of this unit — x multiplied by a, added to b, then the output; OR inputs x and 1 feeding a single node with weights a and b.]
https://github.com/nfmcclure/tensorflow_cookbook/tree/master/03_Linear_Regression/08_Implementing_Logistic_Regression
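A small sketch (plain NumPy, not taken from the cookbook link above) of the sigmoid and its self-referential derivative:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))   # 1 / (1 + e^-x)

def sigmoid_grad(x):
    s = sigmoid(x)
    return s * (1.0 - s)              # d(sigma)/dx = sigma(x) * (1 - sigma(x))

# Logistic-regression-style unit: F(x, a, b) = sigmoid(a*x + b)
a, b, x = 0.5, -1.0, 2.0
print(sigmoid(a * x + b), sigmoid_grad(a * x + b))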
10. @nfmcclure
Why Activation Functions?
• Turns out, no matter how many additions and multiplications we pile
on, we will never be able to model non-linear outputs without having
non-linear transformations.
• The sigmoid function is just one example of a gate function: σ(x) = 1 / (1 + e^(-x))
• There is also the Rectified Linear Unit (ReLU): σ(x) = max(x, 0)
• Softplus: σ(x) = ln(1 + e^x)
• Leaky ReLU: σ(x) = max(x, ax)
• Where 0 < a << 1
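A quick sketch of these activations as TensorFlow ops (graph-style API; the 0.01 slope for the leaky ReLU is an assumed value):

import tensorflow as tf

x = tf.placeholder(tf.float32)
a = 0.01                                   # assumed small slope, 0 < a << 1

sigmoid_out = tf.nn.sigmoid(x)             # 1 / (1 + e^-x)
relu_out = tf.nn.relu(x)                   # max(x, 0)
softplus_out = tf.nn.softplus(x)           # ln(1 + e^x)
leaky_relu_out = tf.maximum(x, a * x)      # max(x, a*x)

with tf.Session() as sess:
    print(sess.run([sigmoid_out, relu_out, softplus_out, leaky_relu_out],
                   feed_dict={x: -2.0}))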
11. @nfmcclure
Many Layers
• Neural networks can have many ‘hidden layers’.
• We can make them as deep as we want.
• Neural networks can also have as many inputs or outputs as we would like.
[Diagram: fully connected layers; the depth of the neural network grows with the number of hidden layers.]
12. @nfmcclure
What is Tensorflow?
• An open source framework for creating
computational graphs.
• Computational graphs are actually extremely
flexible.
• Linear Regression (regular, multiple, logistic, lasso, ridge,
elasticnet,…)
• Support Vector Machines (any kernel)
• Nearest Neighbor methods
• Neural Networks
• ODE/PDE Solvers
15. @nfmcclure
How is the Tensorflow Community?
• Very active github community
• In approximately 6 months: >5k commits, >250 contributors
• Over 2,000 questions tagged on stackoverflow (>50%
answered within 24 hours)
17. @nfmcclure
A One Hidden Layer NN in Tensorflow
• Iris dataset: predict petal width from sepal length, sepal width, and petal length.
[Diagram: a computational graph with placeholders S.L., S.W., and P.L. plus a bias feeding hidden nodes h1–h5, which with a second bias feed the output. 15 weights + 5 biases, then 5 weights + 1 bias = 26 variables.]
https://github.com/nfmcclure/tensorflow_cookbook/tree/master/06_Neural_Networks/04_Single_Hidden_Layer_Network
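A sketch of the graph above (graph-style API); the cookbook link has the complete, runnable version with data loading and training:

import tensorflow as tf

x_data = tf.placeholder(tf.float32, shape=[None, 3])    # S.L., S.W., P.L.
y_target = tf.placeholder(tf.float32, shape=[None, 1])  # petal width

# Hidden layer: 3*5 = 15 weights + 5 biases
W1 = tf.Variable(tf.random_normal([3, 5]))
b1 = tf.Variable(tf.random_normal([5]))
hidden = tf.nn.relu(tf.add(tf.matmul(x_data, W1), b1))

# Output layer: 5 weights + 1 bias -> 26 variables in total
W2 = tf.Variable(tf.random_normal([5, 1]))
b2 = tf.Variable(tf.random_normal([1]))
output = tf.add(tf.matmul(hidden, W2), b2)

loss = tf.reduce_mean(tf.square(y_target - output))
train_step = tf.train.GradientDescentOptimizer(0.005).minimize(loss)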
18. @nfmcclure
More in Tensorflow: Distributed
• Computational graphs can take a very long time to train.
• Quicker training across multiple machines.
• Just need a dictionary mapping jobs to network addresses:
job_spec = {"my_workers": ["worker1.location.com:22", "worker2.location.com:22"],
            "other_workers": ["gpu_worker.location.com:22"]}
cluster = tf.train.ClusterSpec(job_spec)
with tf.device("/job:my_workers/task:0"):
    pass  # do task
with tf.device("/job:other_workers/task:0"):
    pass  # do another task
• Scheduling and communication is done for you (in an
efficient and greedy manner)
19. @nfmcclure
More in Tensorflow: GPU Capabilities
• Sometimes neural networks can have hundreds of millions
of parameters to train.
• Graphic cards (GPUs) are built to handle matrix calculations
very fast and efficiently.
• It’s quite common to get 10x to 50x speedup when switching the
same code to a GPU
(1) Install GPU version of T.F.
(2) Done.
As a side note, you can also tell T.F. to use specific devices:
/cpu:0   - The CPU of the machine.
/gpu:0   - The first GPU of the machine.
/gpu:100 - The 101st GPU of the machine (device numbering starts at 0).
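For example, a minimal sketch of pinning ops to devices (assumes a GPU build; allow_soft_placement falls back to an available device if the requested one is missing):

import tensorflow as tf

with tf.device('/cpu:0'):
    a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
with tf.device('/gpu:0'):
    b = tf.matmul(a, a)   # this op runs on the first GPU

config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(b))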
20. @nfmcclure
More in Tensorflow: Tensorboard
• A built-in way to visualize the computational graph, accuracies, loss functions, gradients…
• (1) Specify how to write and store summaries to a log file.
• (2) Point TensorBoard at the logging directory and navigate to 0.0.0.0:6006. (Can view during training.)
tf.histogram_summary('variable_histogram', variable_value)
tf.scalar_summary('loss_value', loss)
merged = tf.merge_all_summaries()
my_writer = tf.train.SummaryWriter('my_logging_directory', sess.graph)
$ tensorboard --logdir=my_logging_directory
21. @nfmcclure
More in Tensorflow: Tensorboard
Collapsible/Expandable Named Scopes!!
with tf.name_scope('hidden1') as hidden_scope:
    pass  # Perform calculations.
22. @nfmcclure
More in Tensorflow: skflow
• An easier, scikit-learn-style implementation of common algorithms.
• 3 layer NN:
• RNN:
• Easy to add custom layers by adding in layer functions from
vanilla Tensorflow.
import tensorflow.contrib.learn as skflow
my_model = skflow.TensorFlowDNNClassifier(hidden_units=[N1, N2, N3], n_classes=n_out)
my_model.fit(x_data, y_target)
my_rnn = skflow.TensorFlowRNNClassifier(…)
23. @nfmcclure
What is a Convolutional Neural
Network?
• A CNN is a large pattern-recognition neural network that uses various parameter-reducing techniques.
• E.g., AlexNet (2012 ImageNet winner)
24. @nfmcclure
Reduction of Parameters
• CNNs use tricks to reduce the number of parameters to fit.
• Convolution layers: weights are shared across a window that we move across the data (arrows in the same direction of the window use the same weight).
[Diagram: a sliding window of three shared weights; there are only three weights in this layer: w1, w2, and w3.]
Note: Usually multiple convolutions are performed, creating ‘Layers’.
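A sketch of such a convolution layer in TensorFlow (shapes here are made up: a 1-D window of three shared weights slid across ten values):

import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[1, 1, 10, 1])   # batch, height, width, channels
window = tf.Variable(tf.random_normal([1, 3, 1, 1]))  # the three shared weights w1, w2, w3
conv_layer = tf.nn.conv2d(x, window, strides=[1, 1, 1, 1], padding='SAME')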
25. @nfmcclure
Other Tricks- Pooling
• A layer that aggregates across a window of the prior layer.
• Types of pooling:
• Max, min, mean, median, …
[Diagram: aggregation over three nodes in the prior layer, e.g. max().]
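A matching sketch of a pooling layer (max pooling over windows of three values; shapes are illustrative):

import tensorflow as tf

prior_layer = tf.placeholder(tf.float32, shape=[1, 1, 9, 1])
pooled = tf.nn.max_pool(prior_layer, ksize=[1, 1, 3, 1],
                        strides=[1, 1, 3, 1], padding='VALID')  # max over each window of 3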
26. @nfmcclure
Other Tricks- Dropout
• Dropout is a way to increase the training strength and regularize
the parameters.
• During training, we randomly select activations and set their outputs to zero.
• Compare this to ‘zoneout’, which randomly replaces activations with their values from the prior iteration.
• This helps the network find multiple pathways for prediction.
Because of this, the network will not rely on just one connection
or edge for prediction.
• This also helps the training of other parameters in the network.
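A sketch of dropout in the 1.x-era API; keep_prob is typically a placeholder so it can be, say, 0.5 during training and 1.0 at evaluation time (the layer size here is arbitrary):

import tensorflow as tf

keep_prob = tf.placeholder(tf.float32)
hidden = tf.placeholder(tf.float32, shape=[None, 128])
hidden_dropped = tf.nn.dropout(hidden, keep_prob)  # randomly zeros activations, rescales the rest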
27. @nfmcclure
Larger Network Shorthand
• Since neural networks can get quite large, we do
not want to write out all the connections nor
nodes.
• Here’s how to convey the meaning in a shorter, less complicated way:
[Diagram: a shorthand node labeled “Max of a set of features”; here, it appears to be a max of 2 features.]
28. @nfmcclure
What is a Recurrent Neural Network?
• Recurrent Neural Networks (RNN) are networks that can
‘see’ outputs from prior layers. Very helpful for sequences.
• The most common usage is to predict the next letter from
the prior letter. We train this network on large text samples.
[Diagram legend: U, V, W = weight parameters; S = hidden layer/state; X = input; O = output.]
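A plain-NumPy sketch of the recurrence in the diagram above (layer sizes are made up); the state S at each step "sees" the prior state:

import numpy as np

n_input, n_hidden, n_output = 4, 8, 4
U = np.random.randn(n_hidden, n_input)   # input -> state weights
W = np.random.randn(n_hidden, n_hidden)  # state -> state weights
V = np.random.randn(n_output, n_hidden)  # state -> output weights

s = np.zeros(n_hidden)                   # initial state
for x_t in np.random.randn(5, n_input):  # a toy sequence of 5 inputs
    s = np.tanh(U.dot(x_t) + W.dot(s))   # new state depends on the prior state
    o_t = V.dot(s)                       # output at this time step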
29. @nfmcclure
Regional Convolutional Networks
• We can convolve a whole network over patches of the
image and see what regions score highly for a category.
(Object Detection)
https://www.youtube.com/watch?v=lr1qVxhGUDw
30. @nfmcclure
Regional CNN + Recurrent NN =
Image Captioning
• We take the object detection output and feed it through a
recurrent neural network to generate text describing the
image.
Microsoft Research 2015
31. @nfmcclure
Deep Dream (2015)
• We can select specific nodes in an image
recognition pipeline and up weight the output…
https://www.youtube.com/watch?v=DgPaCWJL7XI
http://www.fastcodesign.com/3057368/inside-googles-first-deepdream-art-show
32. @nfmcclure
Stylenet (2015)
• Train network on a “style image”. Try to reconstruct
another image from that same pretrained network.
https://vimeo.com/139123754
Recoloring of black and white photos
33. @nfmcclure
CNN for Text (2015)
• Zhang, X., et al. released a paper (http://arxiv.org/abs/1509.01626) that showed great results for treating strings as 1-D numerical vectors.
• Ways to get strings into numerical vectors:
• Word level one-hot encoding.
• Word2Vec encoding or similar (GloVe or Doc2Vec or …)
• Character level one-hot encoding.
• Can’t resize vectors like pictures, so instead we pick a max length and
pad many entries with zeros.
• Like many NN results, these depend heavily on the dataset and architecture.
• Paper shows good results for low # of classes (ratings or sentiment).
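A sketch (not from the paper) of character-level one-hot encoding with a fixed maximum length; the alphabet and length here are assumed:

import numpy as np

alphabet = "abcdefghijklmnopqrstuvwxyz "
max_len = 20

def encode(text):
    mat = np.zeros((max_len, len(alphabet)), dtype=np.float32)
    for i, ch in enumerate(text.lower()[:max_len]):  # truncate strings past max_len
        if ch in alphabet:
            mat[i, alphabet.index(ch)] = 1.0         # one-hot at this position
    return mat                                       # shorter strings stay zero-padded

print(encode("great talk").shape)                    # (20, 27)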
34. @nfmcclure
Parsey McParseface
• Google released a model called ‘Syntaxnet’ in May 2016,
based on Tensorflow.
• Syntaxnet is arguably one of the best part-of-speech parsers
available.
• They trained an English version, called ‘Parsey McParseface’.
• This is a free and open source model to use.
35. @nfmcclure
The Possibilities are Endless
• Predict age from facial photo - https://how-old.net
• Predict dog breed from photo - https://www.what-dog.net
• Inside a self driving car brain :
https://www.youtube.com/watch?v=ZJMtDRbqH40
• Memory networks: https://arxiv.org/pdf/1410.3916.pdf
• Frodo journeyed to Mount-Doom. Frodo dropped the ring there. Sauron died.
• Frodo went back to the Shire. Bilbo travelled to the Grey-havens.
• Where is the ring? A: Mount-Doom
• Where is Bilbo now? A: Grey-havens
• “Synthia”- Virtual driving school for self driving cars:
• http://www.gizmag.com/synthia-dataset-self-driving-cars/43895/
• Solving hand-written chalk board problems in real time:
• http://www.willforfang.com/computer-vision/2016/4/9/artificial-intelligence-for-handwritten-mathematical-expression-evaluation
36. @nfmcclure
<Insert Name Here>-Neural Net
• Last Month (June-July 2016):
• YodaNN: http://arxiv.org/abs/1606.05487
• CMS-RCNN: http://arxiv.org/abs/1606.05413
• Zoneout RNN: https://arxiv.org/abs/1606.01305
• Pixel CNN: http://arxiv.org/abs/1606.05328
• Memory CNN: http://arxiv.org/abs/1606.05262
• DISCO Nets: http://arxiv.org/abs/1606.02556
• This Year (2016):
• Stochastic Depth Net: https://arxiv.org/abs/1603.09382
• Squeeze Net: https://arxiv.org/abs/1602.07360
• Binary Nets: http://arxiv.org/abs/1602.02830
• Last Year:
• PoseNet: http://arxiv.org/abs/1505.07427
• YOLO Nets: http://arxiv.org/abs/1506.02640
• Style Net: http://arxiv.org/abs/1508.06576
• Residual Net: https://arxiv.org/abs/1512.03385
• And so many more…
• What’s Your Architecture?
37. @nfmcclure
How to Keep ML Current
• Machine learning advances happen more and more
frequently.
• Can’t rely on journals that take multiple years to publish.
• Twitter
• http://arxiv.org/
• Others?
38. @nfmcclure
More Resources
• More on CNNs:
• https://www.reddit.com/r/MachineLearning/
• https://www.reddit.com/r/deepdream/
• http://karpathy.github.io/
• RNNs:
• http://colah.github.io/posts/2015-08-Understanding-LSTMs/
• Recent Research: (Jan 16-June 16)
• Residual Conv. Nets: https://www.youtube.com/watch?v=1PGLj-uKT1w
• Determine location of a random image: https://www.technologyreview.com/s/600889/google-unveils-neural-network-with-superhuman-ability-to-determine-the-location-of-almost/
• Stochastic Depth Networks: http://arxiv.org/abs/1603.09382
Pre-trained models: http://www.vlfeat.org/matconvnet/pretrained/