2. – Nikhil Krishna
Hi!
My name is Nikhil and I am not a data scientist or a computer
scientist or any kind of scientist!
3. Agenda
What is TensorFlow?
TensorFlow computational model
TensorFlow capabilities
What is not supported
Visualizing your model with TensorBoard
Building a simple image classifier with TensorFlow
Installation and setup
Running the Image Classifier example
TensorFlow Performance and Parallelism
4. What is TensorFlow?
TensorFlow is a powerful library for
doing large-scale numerical
computation using data flow graphs
Nodes in the graph represent
mathematical operations, while the
edges represent multidimensional data
arrays (tensors)
Built by the Google Brain team for ML
and deep neural network research
5. The computational model used in
TensorFlow
Computations are represented as graphs
The nodes are called ops (operations); each op
takes 0…n tensors as input and produces 0…n
tensors as output
A tensor is a typed multi-dimensional array
The TensorFlow graph is only a description of
computations.
To compute anything, the graph must be launched
in a session, which places the graph's ops on devices
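As a rough illustration of the model above, here is a toy data flow graph in plain Python. It is not TensorFlow's API: `Op`, `constant` and `run` are hypothetical names made up for this sketch. The point it tries to show is that building the graph computes nothing, and only the separate `run` step (standing in for a session) evaluates it.

```python
# Toy data flow graph in plain Python (NOT the TensorFlow API).
# Op, constant and run are hypothetical names used only for illustration.


class Op:
    """A graph node: a function plus the ops whose outputs feed it."""

    def __init__(self, name, fn, inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = tuple(inputs)


def constant(name, value):
    # An op with zero inputs that always outputs the same value.
    return Op(name, lambda: value)


def run(op, cache=None):
    """Stand-in for a session: evaluate `op` and everything it depends on."""
    if cache is None:
        cache = {}
    if op not in cache:
        args = [run(dep, cache) for dep in op.inputs]
        cache[op] = op.fn(*args)
    return cache[op]


# Building the graph only describes the computation; nothing runs yet.
a = constant("a", [1.0, 2.0, 3.0])
b = constant("b", [10.0, 20.0, 30.0])
total = Op("add", lambda x, y: [p + q for p, q in zip(x, y)], [a, b])
scaled = Op("scale", lambda x: [2.0 * v for v in x], [total])

# Launching the computation is a separate, explicit step.
result = run(scaled)
print(result)
```

Here the lists play the role of one-dimensional tensors flowing along the edges between ops.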
6. TensorFlow Capabilities
The core is written in C++ but it has strong Python support. It integrates well with
IPython, so it’s easy to use interactively
TensorFlow can run on CPU or GPU, and on desktop, server and mobile
(Android, iOS and even Raspberry Pi)
Flexibly assign compute elements of your graph to devices (CPU, GPU) and
let TensorFlow handle copying data between them
Easily set up a distributed cluster and distribute your graph across it.
7. Installing TensorFlow
There are multiple ways to do it: Docker, Anaconda, pip and building from source.
Anaconda seems to be a consistent way to do it and has the added
advantage of bundling other data manipulation and machine learning
packages like scikit-learn, NumPy, etc.
If your machine has an NVIDIA GPU then you can run TensorFlow on the
GPU by installing the CUDA toolkit.
I recommend installing IPython as well. It’s a great way to explore the
TensorFlow library.
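A quick way to confirm that an install worked is to check whether the packages can be found by the Python interpreter you intend to use. The sketch below uses only the standard library; `is_installed` is a helper name made up for this example, and the package list is just the ones mentioned on this slide.

```python
import importlib.util


def is_installed(package_name):
    """Return True if `package_name` can be found by this interpreter."""
    return importlib.util.find_spec(package_name) is not None


# Import names for the packages mentioned above.
for name in ("tensorflow", "numpy", "sklearn", "IPython"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```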
9. What’s not supported?
No Windows support :( - This is because TensorFlow uses the Bazel build
system, which does not support Windows. You can try with Docker images -
YMMV
Only Python and C++ APIs are available, with Python being the primary one
Creating a TensorFlow cluster involves a lot of manual steps at this point
10. Visualising your model with
TensorBoard
TensorBoard is a visualization tool that can be used to visualize your
TensorFlow graph, plot metrics of the graph execution and show additional
data like images flowing through the graph
TensorBoard can be run either while the TensorFlow graph is being
executed or after completion
TensorBoard picks up the log data that is generated by the
summary writer module when the TensorFlow graph is executed
12. How are we going to do image
classification?
Image classification is the process of sorting a group of images into
categories using only some basic features that describe them.
Logistic regression, Support Vector Machines, Naive Bayes and Neural
Networks are common classification algorithms
We are going to use the Inception Convolutional Neural Network from
Google in our image classifier
13. Convolutional Neural Networks
At its most basic, a convolutional neural network (CNN) can be thought of as a kind
of neural network that uses many identical copies of the same neuron.
Just as we reuse code in programming, a CNN learns a neuron once and uses it
in many places, making it easier to learn large models with smaller error.
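The "identical copies of the same neuron" idea can be sketched with a one-dimensional convolution in plain Python. This is a toy, not how Inception or TensorFlow implement it: the function names, the hand-picked weights and the input signal are all made up for illustration, and a real CNN would learn the weights rather than have them written in by hand.

```python
def neuron(window, weights, bias):
    """One neuron: weighted sum of its inputs plus a bias, then ReLU."""
    activation = sum(w * x for w, x in zip(weights, window)) + bias
    return max(0.0, activation)


def conv1d(signal, weights, bias=0.0):
    """Apply the SAME neuron (same weights) to every window of the signal."""
    width = len(weights)
    return [
        neuron(signal[i:i + width], weights, bias)
        for i in range(len(signal) - width + 1)
    ]


signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0]
edge_detector = [-1.0, 1.0]  # responds where the signal steps upward

response = conv1d(signal, edge_detector)
print(response)
```

Only two weights are involved no matter how long the signal is, which is the sense in which sharing a neuron keeps the model small.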
14. Inception V3 Model
This is a CNN model that has been trained by Google on
1000 categories supplied by the ImageNet competition to
near human accuracy.
We will retrain the model (transfer learning) so that it can classify
images into arbitrary categories of our own
We are only going to retrain the final layer, which does the classification.
This is possible because the CNN uses multiple layers to progressively
refine the classification.
15. Re-training Inception
Download the Creative Commons licensed images of flowers and create
a directory structure with class names as sub-directories.
Run the retraining script. We can tweak its parameters to
reduce the time taken or increase the accuracy of the classifier
This script loads the pre-trained Inception v3 model, removes
the old final layer, and trains a new one on the flower photos.
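The directory convention above (one sub-directory per class, with the directory name used as the label) can be sketched as follows. This is not the retraining script itself; `list_classes` is a hypothetical helper, and the class and file names in the throwaway tree are invented for the example.

```python
import os
import tempfile


def list_classes(image_dir):
    """Map each sub-directory name (the class label) to its image files."""
    classes = {}
    for entry in sorted(os.listdir(image_dir)):
        class_dir = os.path.join(image_dir, entry)
        if os.path.isdir(class_dir):
            classes[entry] = sorted(
                f for f in os.listdir(class_dir)
                if f.lower().endswith((".jpg", ".jpeg"))
            )
    return classes


# Build a tiny stand-in tree; the class and file names are made up.
layout = {"daisy": ["a.jpg", "b.jpg"], "roses": ["c.jpg"]}
with tempfile.TemporaryDirectory() as root:
    for label, files in layout.items():
        os.makedirs(os.path.join(root, label))
        for name in files:
            open(os.path.join(root, label, name), "w").close()

    classes = list_classes(root)

for label, files in classes.items():
    print(label, len(files))
```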
17. Distributed TensorFlow
A TensorFlow ‘cluster’ is a set of ‘tasks’ that participate in the
distributed execution of a TensorFlow graph.
Each task is associated with a TensorFlow ‘server’, which contains a
‘master’ that can be used to create sessions and a ‘worker’ that
executes operations in the graph.
Each task typically runs on a separate machine, but you can run
multiple tasks on the same machine.
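One way to picture a cluster is as a mapping from job names to the network addresses of their tasks, which is the general shape a TensorFlow cluster specification takes. The sketch below is plain Python and does not start any servers; the job names, host names and ports are all invented, and `list_tasks` is a hypothetical helper.

```python
# Hypothetical cluster description: job name -> list of task addresses.
# Two of the tasks share a machine and differ only by port.
cluster = {
    "ps": ["host-a.example.com:2222"],
    "worker": ["host-b.example.com:2222", "host-b.example.com:2223"],
}


def list_tasks(cluster):
    """Enumerate every task as (job name, task index, address)."""
    return [
        (job, index, address)
        for job, addresses in sorted(cluster.items())
        for index, address in enumerate(addresses)
    ]


tasks = list_tasks(cluster)
for job, index, address in tasks:
    print(f"/job:{job}/task:{index} -> {address}")
```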
19. Training accuracy: The training accuracy is the percentage of the
images used in the current training batch that were labelled with the
correct class.
Validation accuracy: The validation accuracy is the
accuracy (percentage of correctly labelled images) on a
randomly selected group of images from a different set.
Cross entropy: Cross entropy is a loss function that gives a glimpse into
how well the learning process is progressing. (Lower
numbers are better here.)
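For a concrete feel for these numbers, here is a minimal sketch of both metrics in plain Python. The batch of predicted probabilities and the labels are made up, and the cross entropy shown is the usual mean negative log-probability of the correct class, which is an assumption about the exact variant the retraining script reports.

```python
import math


def accuracy(predicted_labels, true_labels):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(p == t for p, t in zip(predicted_labels, true_labels))
    return correct / len(true_labels)


def cross_entropy(predicted_probs, true_labels):
    """Mean negative log-probability assigned to the correct class."""
    losses = [
        -math.log(probs[label])
        for probs, label in zip(predicted_probs, true_labels)
    ]
    return sum(losses) / len(losses)


# Made-up batch: 4 images, 3 classes, one row of probabilities per image.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.5, 0.4, 0.1],
]
labels = [0, 1, 2, 1]

# The predicted label is the class with the highest probability.
predictions = [max(range(len(p)), key=p.__getitem__) for p in probs]

batch_accuracy = accuracy(predictions, labels)
batch_loss = cross_entropy(probs, labels)
print(batch_accuracy, batch_loss)
```

Accuracy only counts whether the top guess was right, while cross entropy also penalises being right with low confidence, which is why it tends to keep moving after accuracy has levelled off.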
20. 'Bottleneck' is an informal term for the layer just before the final
output layer that actually does the classification. This penultimate
layer has been trained to output a set of values that's good enough
for the classifier to use to distinguish between all the classes it's
been asked to recognize.
Because every image is reused multiple times during training and
calculating each bottleneck takes a significant amount of time, it
speeds things up to cache these bottleneck values on disk so they
don't have to be repeatedly recalculated. By default they're stored in
the /tmp/bottleneck directory, and if you rerun the script they'll be
reused so you don't have to wait for this part again.
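The caching idea can be sketched in a few lines of plain Python. This is not the retraining script's code: `expensive_bottleneck` is a stand-in that returns made-up values instead of running a network, the JSON file format is an assumption chosen for simplicity, and a temporary directory is used in place of the real cache location.

```python
import json
import os
import tempfile


def expensive_bottleneck(image_name):
    """Stand-in for running an image through the network (made-up values)."""
    return [float(ord(c) % 7) for c in image_name]


def get_bottleneck(image_name, cache_dir, stats):
    """Return cached bottleneck values, computing and saving them if absent."""
    cache_path = os.path.join(cache_dir, image_name + ".json")
    if os.path.exists(cache_path):
        stats["hits"] += 1
        with open(cache_path) as f:
            return json.load(f)
    stats["misses"] += 1
    values = expensive_bottleneck(image_name)
    with open(cache_path, "w") as f:
        json.dump(values, f)
    return values


stats = {"hits": 0, "misses": 0}
with tempfile.TemporaryDirectory() as cache_dir:
    # Every image is revisited on each pass over the training data.
    for training_pass in range(3):
        for image in ["daisy1.jpg", "rose1.jpg"]:
            get_bottleneck(image, cache_dir, stats)

print(stats)
```

Each image is computed once on the first pass and read back from disk on every later pass.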
21. So what’s a classifier?
A classifier is a function that takes some data as input and assigns a label to
it as output
Supervised learning lets you create a classifier automatically from labelled examples
It depends on getting good data and identifying useful features
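To make "a function from data to label, learned from examples" concrete, here is a toy nearest-centroid classifier in plain Python. It is only one of many possible learning methods and is not what TF Learn does; the `train` helper, the two features (petal length and width) and every measurement below are invented for illustration.

```python
def train(examples):
    """Learn one centroid (mean feature vector) per label and return a classifier."""
    sums, counts = {}, {}
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], features)]
        counts[label] += 1
    centroids = {
        label: [s / counts[label] for s in sums[label]] for label in sums
    }

    def classify(features):
        # Assign the label whose centroid is closest (squared distance).
        def distance(label):
            return sum((x - c) ** 2 for x, c in zip(features, centroids[label]))

        return min(centroids, key=distance)

    return classify


# Made-up labelled examples: ([petal length, petal width], label).
training_data = [
    ([1.4, 0.2], "setosa"),
    ([1.5, 0.3], "setosa"),
    ([5.5, 2.0], "virginica"),
    ([6.0, 2.2], "virginica"),
]

classify = train(training_data)
print(classify([1.6, 0.2]))
print(classify([5.8, 2.1]))
```

Nobody wrote the decision rule by hand here: `classify` falls out of the labelled data, so its quality depends entirely on the examples and the features chosen.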
22. It’s all about the data
There are several publicly available datasets that are commonly used for learning
The TF Learn module has an API to download the MNIST, Iris and Boston Housing
datasets
These are very useful for learning and understanding the concepts and for quickly
bootstrapping yourself.
We are going to look at the MNIST and Iris datasets