Deep Learning with Apache Flink and DeepLearning4J
Flink Forward 2016, Berlin, Germany
Suneel Marthi (@suneelmarthi)

Suneel Marthi - Deep Learning with Apache Flink and DL4J

http://flink-forward.org/kb_sessions/deep-learning-with-apache-flink-and-dl4j/

Deep Learning has become very popular over the last few years in areas such as image recognition, fraud detection, and machine translation. It has proved very useful for handling unstructured data and extracting value from it. A major challenge in building deep learning models has been the high cost of training them. With the advent of distributed frameworks such as Apache Flink and Apache Spark, deep learning models can be trained faster, in parallel, on modern platform architectures. In this talk, we show how to use Apache Flink Streaming with the open source deep learning framework DeepLearning4J to perform large-scale deep learning model training. We will demo a recurrent neural net that is trained for language modeling and then generates text.

Published in: Data & Analytics

  1. Deep Learning with Apache Flink and DeepLearning4J
     Flink Forward 2016, Berlin, Germany
     Suneel Marthi, @suneelmarthi
  2. About me
     • Senior Principal Software Engineer, Office of Technology, Red Hat Inc.
     • Member of the Apache Software Foundation
     • PMC member on Apache Mahout, Apache Pirk, Apache Incubator
     • PMC Chair, Apache Mahout (April 2015 - April 2016)
  3. Outline
     ● What is Deep Learning?
     ● Overview of DeepLearning4J Ecosystem
     ● Deep Learning Workflows
     ● ETL & Vectorization with DataVec
     ● Apache Flink and DL4J
  4. What is Deep Learning?
  5. Handwriting Recognition
     Face Recognition (Facebook)
     Image Generation
     Self-Driving Cars
  6. DL has been very successful with Image Classification
     Dogs vs. Cats
     https://www.kaggle.com/c/dogs-vs-cats
  7. ● Deep Learning is a series of steps for automated feature extraction
       o Based on techniques that have been around for several years
       o Several techniques chained together to automate feature engineering
       o “Deep” due to several interconnected layers of nodes stacked together between the input and the output
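To make the "stacked layers" idea concrete, here is a minimal sketch in plain Java (not the DL4J API). The class name `StackedLayersSketch`, the hand-picked weights, and the choice of ReLU are assumptions made for illustration; a real network learns its weights during training.

```java
import java.util.Arrays;

/** Illustrative only: a forward pass through stacked dense layers with ReLU. */
final class StackedLayersSketch {

    /** One dense layer: out[j] = relu(bias[j] + sum over i of in[i] * weights[i][j]). */
    static double[] denseRelu(double[] in, double[][] weights, double[] bias) {
        double[] out = new double[bias.length];
        for (int j = 0; j < out.length; j++) {
            double sum = bias[j];
            for (int i = 0; i < in.length; i++) {
                sum += in[i] * weights[i][j];
            }
            out[j] = Math.max(0.0, sum); // ReLU clips negative sums to zero
        }
        return out;
    }

    /** Feeds the input through every layer in order; each layer sees the previous layer's output. */
    static double[] forward(double[] input, double[][][] weights, double[][] biases) {
        double[] activation = input;
        for (int layer = 0; layer < weights.length; layer++) {
            activation = denseRelu(activation, weights[layer], biases[layer]);
        }
        return activation;
    }

    public static void main(String[] args) {
        double[][][] weights = {
            {{1.0, -1.0}, {1.0, -1.0}}, // layer 0: 2 inputs -> 2 hidden units
            {{2.0}, {5.0}}              // layer 1: 2 hidden units -> 1 output
        };
        double[][] biases = {{0.0, 0.0}, {1.0}};
        System.out.println(Arrays.toString(forward(new double[] {1.0, 2.0}, weights, biases))); // prints [7.0]
    }
}
```

The hidden layer here turns the raw input into intermediate features (3.0 and 0.0), and the next layer works only on those features, which is the sense in which depth automates feature extraction.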
  8. “Deep learning will make you acceptable to the learned; but it is only an obliging and easy behaviour, and entertaining conversation, that will make you agreeable to all companies”
     - James Burgh
  9. Popular Deep Neural Networks
     ● Deep Belief Networks
       o Most popular architecture
     ● Convolutional Neural Networks
       o Successful in image classification
     ● Recurrent Networks
       o Time series Analysis
       o Sequence Modelling
  10. Deep Learning in Enterprise
      ● Ability to work with small and big data easily
        o Don’t want to change tooling because we moved to Hadoop
      ● Ability to not get caught up in things like vectorization and ETL
        o Need to focus on better models
        o Understanding your data is very important
      ● Ability to experiment with lots of models
  11. DeepLearning4J
  12. ● “The Hadoop of Deep Learning”
        o Command line driven
        o Java and Scala APIs
        o Apache License 2.0
      ● Java implementation
        o Parallelization
        o GPU support
          - Support for multi-GPU per host
      ● Runtime Neutral
        o Local, Spark, Flink
        o AWS
  13. DL4J Suite of Tools
      ● DeepLearning4J
        o Main library for deep learning
      ● DataVec
        o Extract, Transform, Load (ETL) and Vectorization library
      ● ND4J
        o Linear Algebra framework
        o Swappable backends (JBLAS, GPUs)
        o Think NumPy on the JVM
      ● Arbiter
        o Model evaluation, Hyperparameter Search and testing platform
  14. DL4J: DataVec for Data Ingest and Vectorization
      ● Uses an Input/Output format
      ● Supports all major types of Input data (Text, Images, Audio, Video, SVMLight)
      ● Extensible for Specialized Input Formats
      ● Interfaces with Apache Kafka
  15. DL4J: ND4J
      ● Scientific computing library on JVM (think NumPy on JVM)
      ● Supports N-dimensional vector computations
      ● Supports GPUs via CUDA and native JBLAS
  16. Learning Progressive Layers
  17. Deep Learning Workflows
  18. Data Ingestion and Munging
      ● Data Ingestion and storage.
      ● Data cleansing and transformation.
      ● Split the dataset into Training, Validation and Test Data sets
      - Apache Flink DataSet API for Data Ingestion and Transformation
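The split step can be sketched in plain Java as follows. The slides use the Apache Flink DataSet API for this stage; the version below is only an in-memory illustration, and the class name `DatasetSplitSketch`, the 60/20/20 fractions, and the fixed seed are assumptions, not taken from the talk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Illustrative only: shuffle rows and cut them into train / validation / test parts. */
final class DatasetSplitSketch {

    /**
     * Returns three lists (train, validation, test). The input list is not modified.
     * Whatever is left after the train and validation fractions goes to the test set.
     */
    static <T> List<List<T>> split(List<T> rows, double trainFrac, double validFrac, long seed) {
        if (trainFrac < 0 || validFrac < 0 || trainFrac + validFrac > 1.0) {
            throw new IllegalArgumentException("fractions must be non-negative and sum to at most 1");
        }
        List<T> shuffled = new ArrayList<>(rows);
        Collections.shuffle(shuffled, new Random(seed)); // fixed seed keeps the split reproducible

        int size = shuffled.size();
        int trainEnd = Math.min(size, (int) Math.round(size * trainFrac));
        int validEnd = Math.min(size, trainEnd + (int) Math.round(size * validFrac));

        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainEnd)));
        parts.add(new ArrayList<>(shuffled.subList(trainEnd, validEnd)));
        parts.add(new ArrayList<>(shuffled.subList(validEnd, size)));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            rows.add(i);
        }
        List<List<Integer>> parts = split(rows, 0.6, 0.2, 42L);
        System.out.println("train=" + parts.get(0).size()
                + " validation=" + parts.get(1).size()
                + " test=" + parts.get(2).size()); // prints train=6 validation=2 test=2
    }
}
```

Shuffling before cutting matters for data such as Iris, where the file is ordered by class and an unshuffled split would put whole classes into a single partition.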
  19. DL Model Building
      ● Build Deep Learning Network and Train with Training Data
      ● Parameter Averaging
      ● Test and Validate the Model
      ● Repeat until satisfied
      ● Persist and Deploy the Model in Production
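Parameter averaging, as named in the list above, can be illustrated with a small plain-Java sketch: each worker trains its own copy of the model on a slice of the data, and the copies are combined by taking the element-wise mean of their parameter vectors. The class name `ParameterAveragingSketch` and the flat `double[]` representation are assumptions for illustration; this is not the DL4J implementation.

```java
import java.util.Arrays;
import java.util.List;

/** Illustrative only: element-wise mean of parameter vectors from several workers. */
final class ParameterAveragingSketch {

    /** Returns the element-wise mean of equally sized parameter vectors. */
    static double[] average(List<double[]> workerParams) {
        if (workerParams.isEmpty()) {
            throw new IllegalArgumentException("need at least one parameter vector");
        }
        int n = workerParams.get(0).length;
        double[] mean = new double[n];
        for (double[] params : workerParams) {
            if (params.length != n) {
                throw new IllegalArgumentException("parameter vectors differ in length");
            }
            for (int i = 0; i < n; i++) {
                mean[i] += params[i];
            }
        }
        for (int i = 0; i < n; i++) {
            mean[i] /= workerParams.size();
        }
        return mean;
    }

    public static void main(String[] args) {
        List<double[]> workers = Arrays.asList(
                new double[] {1.0, 2.0},
                new double[] {3.0, 6.0});
        System.out.println(Arrays.toString(average(workers))); // prints [2.0, 4.0]
    }
}
```

In a distributed setting the averaged vector would be sent back to every worker before the next round of training; how often that happens is a tuning choice this sketch leaves out.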
  20. Prediction and Scoring
      Deployed Model used to make predictions against Streaming data
      -- Streaming Predictors using Apache Flink DataStream API
  21. DL4J API Example
      // Four-layer network, 784 -> 250 -> 10 -> 250 -> 784.
      // The output layer has as many units as the input and is trained with MSE loss.
      MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
          .seed(12345)
          .iterations(1)
          .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
          .learningRate(0.05)
          .l2(0.001)
          .list(4)
          .layer(0, new DenseLayer.Builder().nIn(784).nOut(250)
              .weightInit(WeightInit.XAVIER)
              .updater(Updater.ADAGRAD)
              .activation("relu").build())
          .layer(1, new DenseLayer.Builder().nIn(250).nOut(10)
              .weightInit(WeightInit.XAVIER)
              .updater(Updater.ADAGRAD)
              .activation("relu").build())
          .layer(2, new DenseLayer.Builder().nIn(10).nOut(250)
              .weightInit(WeightInit.XAVIER)
              .updater(Updater.ADAGRAD)
              .activation("relu").build())
          .layer(3, new OutputLayer.Builder().nIn(250).nOut(784)
              .weightInit(WeightInit.XAVIER)
              .updater(Updater.ADAGRAD)
              .activation("relu").lossFunction(LossFunctions.LossFunction.MSE)
              .build())
          .pretrain(false).backprop(true)
          .build();
  22. Building Deep Learning Workflows
      ● Flexibility to build / apply the model
        o Local
        o AWS, Spark, Flink (WIP)
      ● Convert data from a raw format into a baseline raw vector
        o Model the data
        o Evaluate the Model
      ● Traditionally all of these are tied together in one tool
        o But this is a monolithic pattern
  23. Load Existing Models in DL4J
      // Rebuild the network configuration from its JSON form.
      String jsonModelConfig = loadTextFileFromDisk( pathToModelJSON );
      MultiLayerConfiguration configFromJson = MultiLayerConfiguration.fromJson( jsonModelConfig );

      // Read the saved parameters from HDFS. newParams is declared outside the
      // try block so that it is still in scope when the network is initialized.
      FSDataInputStream hdfsInputStream_ModelParams = hdfs.open(new Path( hdfsPathToModelParams ));
      INDArray newParams;
      try (DataInputStream dis = new DataInputStream( hdfsInputStream_ModelParams )) {
          newParams = Nd4j.read( dis );
      }

      MultiLayerNetwork network = new MultiLayerNetwork( configFromJson );
      network.init();
      network.setParameters(newParams);
  24. Vectorizing Data - Iris Data Set
      5.1,3.5,1.4,0.2,Iris-setosa
      4.9,3.0,1.4,0.2,Iris-setosa
      4.7,3.2,1.3,0.2,Iris-setosa
      7.0,3.2,4.7,1.4,Iris-versicolor
      vectorized to
      0.0 1:0.1666666666666665 2:1.0 3:0.021276595744680823 4:0.0
      0.0 1:0.08333333333333343 2:0.5833333333333334 3:0.021276595744680823 4:0.0
      0.0 1:0.0 2:0.7500000000000002 3:0.0 4:0.0
      1.0 1:0.9583333333333335 2:0.7500000000000002 3:0.723404255319149 4:0.5217391304347826
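The transformation on this slide looks like min-max normalization of each numeric column followed by SVMLight formatting (`label index:value ...`). The sketch below shows that idea in plain Java; it is an assumption about the technique, not DataVec's code, and the class name `IrisSvmLightSketch` is hypothetical. Run on only the four rows shown, it will not reproduce the slide's digits: with these four rows the second feature of the second row comes out as 0.0, whereas the slide shows 0.5833..., so the slide's values must have been normalized over a larger sample.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: min-max normalize CSV rows and emit SVMLight-style lines. */
final class IrisSvmLightSketch {

    /** Expects rows of numeric fields followed by one string label, all rows the same width. */
    static List<String> vectorize(List<String> csvLines) {
        List<String> out = new ArrayList<>();
        if (csvLines.isEmpty()) {
            return out;
        }
        int rows = csvLines.size();
        double[][] features = new double[rows][];
        int[] labelIds = new int[rows];
        Map<String, Integer> labelIndex = new LinkedHashMap<>(); // label -> id, in order of first appearance

        for (int r = 0; r < rows; r++) {
            String[] fields = csvLines.get(r).trim().split(",");
            int width = fields.length - 1;
            features[r] = new double[width];
            for (int c = 0; c < width; c++) {
                features[r][c] = Double.parseDouble(fields[c]);
            }
            Integer id = labelIndex.get(fields[width]);
            if (id == null) {
                id = labelIndex.size();
                labelIndex.put(fields[width], id);
            }
            labelIds[r] = id;
        }

        // Column-wise minimum and maximum over the rows that were passed in.
        int width = features[0].length;
        double[] min = new double[width];
        double[] max = new double[width];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : features) {
            for (int c = 0; c < width; c++) {
                min[c] = Math.min(min[c], row[c]);
                max[c] = Math.max(max[c], row[c]);
            }
        }

        for (int r = 0; r < rows; r++) {
            StringBuilder line = new StringBuilder().append((double) labelIds[r]);
            for (int c = 0; c < width; c++) {
                double range = max[c] - min[c];
                double scaled = range == 0.0 ? 0.0 : (features[r][c] - min[c]) / range;
                line.append(' ').append(c + 1).append(':').append(scaled); // SVMLight indices start at 1
            }
            out.add(line.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "5.1,3.5,1.4,0.2,Iris-setosa",
                "4.9,3.0,1.4,0.2,Iris-setosa",
                "4.7,3.2,1.3,0.2,Iris-setosa",
                "7.0,3.2,4.7,1.4,Iris-versicolor");
        for (String line : vectorize(sample)) {
            System.out.println(line);
        }
    }
}
```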
  25. DataVec - Command Line Vectorization
      ● Library of tools to vectorize - Audio, Video, Image, Text, CSV, SVMLight
      ● Convert the input data into vectors in a standardized format (SVMLight, Text, CSV etc)
        o Adaptable with custom input/output formats
      ● Open Source, Apache License 2.0
        o https://github.com/deeplearning4j/DataVec
        o Part of DL4J suite
  26. Workflow Configuration (iris_conf.txt)
      (Canova is the earlier name of DataVec; the configuration keys still use it.)
      canova.input.header.skip=false
      canova.input.statistics.debug.print=false
      canova.input.format=org.canova.api.formats.input.impl.LineInputFormat
      canova.input.directory=src/test/resources/csv/data/uci_iris_sample.txt
      canova.input.vector.schema=src/test/resources/csv/schemas/uci/iris.txt
      canova.output.directory=/tmp/iris_unit_test_sample.txt
      canova.output.format=org.canova.api.formats.output.impl.SVMLightOutputFormat
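The configuration above is a flat list of `key=value` pairs, which happens to be readable with the standard `java.util.Properties` class. The sketch below shows one way to inspect such a file from Java; it is not how the Canova command line itself parses the file (the slides do not show that), and the class name `VectorizationConfigSketch` is hypothetical. Only keys that appear on the slide are used.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Illustrative only: read key=value configuration text with java.util.Properties. */
final class VectorizationConfigSketch {

    /** Parses configuration text held in memory; a file could be read with a FileReader instead. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        String conf = String.join("\n",
                "canova.input.header.skip=false",
                "canova.input.format=org.canova.api.formats.input.impl.LineInputFormat",
                "canova.output.directory=/tmp/iris_unit_test_sample.txt",
                "canova.output.format=org.canova.api.formats.output.impl.SVMLightOutputFormat");
        Properties props = parse(conf);

        boolean skipHeader = Boolean.parseBoolean(props.getProperty("canova.input.header.skip", "false"));
        System.out.println("skip header: " + skipHeader);
        System.out.println("output format: " + props.getProperty("canova.output.format"));
    }
}
```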
  27. Iris Canova Vector Schema
      @RELATION UCIIrisDataset
      @DELIMITER ,
      @ATTRIBUTE sepallength NUMERIC !NORMALIZE
      @ATTRIBUTE sepalwidth NUMERIC !NORMALIZE
      @ATTRIBUTE petallength NUMERIC !NORMALIZE
      @ATTRIBUTE petalwidth NUMERIC !NORMALIZE
      @ATTRIBUTE class STRING !LABEL
  28. Model Iris using Canova Command Line
      ./bin/canova vectorize -conf /tmp/iris_conf.txt
      Output vectors written to: /tmp/iris_svmlight.txt
      ./bin/dl4j train -conf /tmp/iris_conf.txt
      [ …log output… ]
      ./bin/arbiter evaluate -conf /tmp/iris_conf.txt
      [ …log output… ]
  29. DL4J + Apache Flink
  30. • Apache Flink support for DL4J: DataVec (in progress)
      • Streaming Predictors using Flink: Kafka (in progress)
      • Possible
  31. Present DL4J – Flink work in progress
      • Support for DL4J: DataVec
      • Streaming Predictions with Apache Flink
      Future Work
      • Flink support for DL4J: Arbiter for Hyperparameter Search
      • Flink support for DeepLearning4J to be able to build MultiLayer DL configurations.
  32. https://github.com/deeplearning4j
  33. Credits
  34. Skymind.io Team
      • Adam Gibson
      • Chris V. Nicholson
      • Josh Patterson
  35. Questions ???