2. Presenter: Josh Patterson
Past
Research in Swarm Algorithms
Real-time optimization techniques in mesh sensor networks
TVA / NERC
Smartgrid, Sensor Collection, and Big Data
Cloudera
Today
Patterson Consulting
josh@pattersonconsultingtn.com
Skymind (Advisor)
josh@skymind.io / @jpatanooga
Architecture, Committer on DL4J
3. Topics
• What is Deep Learning?
• What is DL4J?
• Enterprise Grade Deep Learning Workflows
5. We Want to be able to recognize
Handwriting
This is a Hard Problem
6. Automated Feature Engineering
• Deep Learning can be thought of as workflows for automated
feature construction
– Where previously we’d consider each stage in the workflow as unique
technique
• Many of the techniques have been around for years
– But now are being chained together in a way that automates exotic
feature engineering
• As LeCun says:
– “machines that learn to represent the world”
7.
8.
9. These are the features learned at each neuron in a Restricted Boltzmann Machine
(RBMS)
These features are passed to higher levels of RBMs to learn more complicated things.
Part of the
“7” digit
11. Deep Learning Architectures
• Deep Belief Networks
– Most common architecture
• Convolutional Neural Networks
– State of the art in image classification
• Recurrent Networks
– Timeseries
• Recursive Networks
– Text / image
– Can break down scenes
13. DL4J
• “The Hadoop of Deep Learning”
– Command line driven
– Java, Scala, and Python APIs
– ASF 2.0 Licensed
• Java implementation
– Parallelization (Yarn, Spark)
– GPU support
• Also Supports multi-GPU per host
• Runtime Neutral
– Local
– Hadoop / YARN
– Spark
– AWS
• https://github.com/deeplearning4j/deeplearning4j
14. Issues in Machine Learning
• Data Gravity
– We need to process the data in workflows where the data lives
• If you move data you don’t have big data
– Even if the data is not “big” we still want simpler workflows
• Integration Issues
– Ingest, ETL, Vectorization, Modeling, Evaluation, and Deployment issues
– Most ML tools are built with previous generation architectures in mind
• Legacy Architectures
– Parallel iterative algorithm architectures are not common
15. DL4J Suite of Tools
• DL4J
– Main library for deep learning
• Canova
– Vectorization library
• ND4J
– Linear Algebra framework
– Swappable backends (JBLAS, GPUs):
• http://www.slideshare.net/agibsonccc/future-of-ai-on-the-jvm
• Arbiter
– Model evaluation and testing platform
17. DL4J and Parallelization
17
Model
Training Data
Worker 1
Master
Partial
Model
Global Model
Worker 2
Partial Model
Worker N
Partial
Model
Split 1 Split 2 Split 3
…
Traditional Serial Training Modern Parallel Engine
(Hadoop / Spark)
18. DL4J Spark / GPUs via API
public class SparkGpuExample {
public static void main(String[] args) throws Exception {
Nd4j.MAX_ELEMENTS_PER_SLICE = Integer.MAX_VALUE;
Nd4j.MAX_SLICES_TO_PRINT = Integer.MAX_VALUE;
// set to test mode
SparkConf sparkConf = new SparkConf()
.setMaster("local[*]").set(SparkDl4jMultiLayer.AVERAGE_EACH_ITERATION,"false")
.set("spark.akka.frameSize", "100")
.setAppName("mnist");
System.out.println("Setting up Spark Context...");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
.momentum(0.9).iterations(10)
.weightInit(WeightInit.DISTRIBUTION).batchSize(10000)
.dist(new NormalDistribution(0, 1)).lossFunction(LossFunctions.LossFunction.RMSE_XENT)
.nIn(784).nOut(10).layer(new RBM())
.list(4).hiddenLayerSizes(600, 500, 400)
.override(3, new ClassifierOverride()).build();
System.out.println("Initializing network");
SparkDl4jMultiLayer master = new SparkDl4jMultiLayer(sc,conf);
DataSet d = new MnistDataSetIterator(60000,60000).next();
List<DataSet> next = d.asList();
JavaRDD<DataSet> data = sc.parallelize(next);
MultiLayerNetwork network2 = master.fitDataSet(data);
Evaluation evaluation = new Evaluation();
evaluation.eval(d.getLabels(),network2.output(d.getFeatureMatrix()));
System.out.println("Averaged once " + evaluation.stats());
INDArray params = network2.params();
Nd4j.writeTxt(params,"params.txt",",");
FileUtils.writeStringToFile(new File("conf.json"), network2.getLayerWiseConfigurations().toJson());
}
}
19. Turn on GPUs and Spark
<dependency>
<groupId>org.deeplearning4j</groupId>
<artifactId>dl4j-spark</artifactId>
<version>${dl4j.version}</version>
</dependency>
<dependency>
<groupId>org.nd4j</groupId>
<artifactId>nd4j-jcublas-7.0</artifactId>
<version>${nd4j.version}</version>
</dependency>
20. Building Deep Learning Workflows
• We need to get data from a raw format into a baseline raw
vector
– Model the data
– Evaluate the Model
• Traditionally these are all tied together in one tool
– But this is a monolithic pattern
– We’d like to apply the unix principles here
• The DL4J Suite of Tools lets us do this
21. Modeling UCI Data: Iris
• We need to vectorize the data
– Possibly with some per column transformations
– Let’s use Canova
• We then need to build a deep learning model over the data
– We’ll use the DL4J lib to do this
• Finally we’ll evaluate what happened
– This is where Arbiter comes in
22. Canova for Command Line Vectorization
• Library of tools to take
– Audio
– Video
– Image
– Text
– CSV data
• And convert the input data into vectors in a standardized format
– Adaptable with custom input/output formats
• Open Source, ASF 2.0 Licensed
– https://github.com/deeplearning4j/Canova
– Part of DL4J suite
23. Vectorization with Canova
• Setup the configuration file
– Input Formats
– Output Formats
– Setup data types to vectorize
• Setup the schema transforms for the input CSV data
• Generate the SVMLight vector data as the output
– with the command line interface
26. Model UCI Iris From CLI
./bin/canova vectorize -conf /tmp/iris_conf.txt
File path already exists, deleting the old file before proceeding...
Output vectors written to: /tmp/iris_svmlight.txt
./bin/dl4j train –conf /tmp/iris_conf.txt
[ …log output… ]
./bin/arbiter evaluate –conf /tmp/iris_conf.txt
[ …log output… ]
27. Skymind as DL4J Distribution
• Just as Redhat was to Linux
– A distribution of Linux with enterprise grade packaging
• Just as Cloudera/Horton are to Hadoop
– A distribution of Apache Hadoop with enterprise grade packaging
• Skymind is to DL4J
– A distribution of DL4J (+tool suite) with enterprise grade packaging
28. Questions?
Thank you for your time and attention
“Deep Learning: A Practitioner’s Approach”
(Oreilly, October 2015)
Notes de l'éditeur
we plot the learned filter for each hidden neuron, one per column of W. Each filter is of the same dimension as the input data, and it is most useful to visualize the filters in the same way as the input data is visualized. In the cases of image patches, we show each filter as an image patch
we plot the learned filter for each hidden neuron, one per column of W. Each filter is of the same dimension as the input data, and it is most useful to visualize the filters in the same way as the input data is visualized. In the cases of image patches, we show each filter as an image patch
we plot the learned filter for each hidden neuron, one per column of W. Each filter is of the same dimension as the input data, and it is most useful to visualize the filters in the same way as the input data is visualized. In the cases of image patches, we show each filter as an image patch
POLR: Parallel Online Logistic Regression
Talking points:
wanted to start with a known tool to the hadoop community, with expected characteristics
Mahout’s SGD is well known, and so we used that as a base point
API as early adopter entry point
This is how we enable spark execution and gpu integration for the back end
Code is slightly different for Spark job
Code is same linear algebra, no changes, for math / nd4j impl
Next release: we’re expanding the userbase with a CLI front end to the whole thing
The domain expert rarely knows how to code
Want to make it easier for someone who knows “bash” to get involved (still will need some eng help)