Deep Learning has shown tremendous success, yet it often takes a lot of effort to leverage its power. Existing Deep Learning frameworks require writing a lot of code to work with a model, let alone to do so in a distributed manner.
This webinar is the first in a series in which we survey the state of Deep Learning at scale and introduce Deep Learning Pipelines, a new open-source package for Apache Spark. This package simplifies Deep Learning in three major ways:
1. It has a simple API that integrates well with enterprise Machine Learning pipelines.
2. It automatically scales out common Deep Learning patterns, thanks to Spark.
3. It enables exposing Deep Learning models through the familiar Spark APIs, such as MLlib and Spark SQL.
In this webinar, we will look at the complex problem of image classification using Deep Learning and Spark. Using Deep Learning Pipelines, we will show:
* how to build deep learning models in a few lines of code;
* how to scale common tasks like transfer learning and prediction; and
* how to publish models in Spark SQL.
Build, Scale, and Deploy Deep Learning Pipelines with Ease
1. Build, Scale, and Deploy Deep
Learning Pipelines with Ease
Tim Hunter (Software Engineer)
Sue Ann Hong (Software Engineer)
Jules S. Damji (Spark Community Evangelist)
July 27, 2017
3. Logistics
• We can’t hear you…
• Recording will be available...
• Slides will be available...
• Queue up Questions ….
• Orange Button for Tech Support difficulties...
4. About Databricks
TEAM: Started the Spark project (now Apache Spark) at UC Berkeley in 2009
PRODUCT: Unified Analytics Platform
MISSION: Making Big Data Simple
5. Unified Analytics Platform
Accelerate innovation by unifying data science, engineering and business.
• UNIFIED INFRASTRUCTURE
• UNIFIED EXPERIENCE ACROSS TEAMS
• UNIFIED ANALYTIC WORKFLOWS
7. About Us
• Sue Ann Hong
• Software engineer @ Databricks
• Ph.D. from CMU in Machine Learning
• Contributor to MLlib
• Author of Deep Learning Pipelines
8. About Us
• Tim Hunter
• Software engineer @ Databricks
• Ph.D. from UC Berkeley in Machine Learning
• Very early Spark user
• Contributor to MLlib
• Author of Deep Learning Pipelines, TensorFrames and
GraphFrames
9. Build, Scale, and Deploy Deep
Learning Pipelines with Ease
Tim Hunter (Software Engineer)
Sue Ann Hong (Software Engineer)
July 27, 2017
10. Today
• Deep Learning at scale made easy: the vision
• Processing images with DL Pipelines
• Building simple Deep Learning models with transfer learning
• Model deployment via SQL
Further advanced topics will be covered in our next webinar.
12. What is Deep Learning?
• A set of machine learning techniques that use layers that
transform numerical inputs
• Classification
• Regression
• Arbitrary mapping
• Popular in the 80’s as Neural Networks
• Recently came back thanks to advances in data collection,
computation techniques, and hardware.
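To make "layers that transform numerical inputs" concrete, here is a minimal sketch in plain Python. It is illustrative only: the weights `W1`, `B1`, `W2`, `B2` and the helper names are made up for this example, and a real network would learn its weights during training rather than hard-code them.

```python
def dense_relu(inputs, weights, biases):
    """One layer: a weighted sum per output unit, followed by ReLU."""
    outputs = []
    for unit_weights, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(unit_weights, inputs)) + bias
        outputs.append(max(0.0, total))  # ReLU keeps positive values, zeroes the rest
    return outputs


# Hypothetical, hand-picked weights (assumption: chosen only for illustration).
W1 = [[1.0, -1.0], [0.5, 0.5]]  # layer 1: 2 inputs -> 2 hidden units
B1 = [0.0, -1.0]
W2 = [[2.0, 1.0]]               # layer 2: 2 hidden units -> 1 output
B2 = [0.5]


def forward(x):
    """Pass the input through both layers in sequence."""
    hidden = dense_relu(x, W1, B1)
    return dense_relu(hidden, W2, B2)


print(forward([3.0, 1.0]))
```

Stacking more such layers, and fitting the weights to data, is what the training step of a Deep Learning workflow does.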
13. Success of Deep Learning
• Tremendous success for applications with complex data
• AlphaGo
• Image interpretation
• Automatic translation
• Speech recognition
14. But still requires a lot of effort
• Low level APIs with steep learning curve
• Tedious to distribute computations
• Not well integrated with other enterprise tools
• No exact science around deep learning
• Success requires many engineer-hours
15. Deep Learning in industry
• Currently limited adoption
• Huge potential beyond the industrial giants
• How do we accelerate the road to massive availability?
16. A typical Deep Learning workflow
• Load data (images, text, time series, …)
• Interactive work
• Train
• Select an architecture for a neural network
• Optimize the weights of the NN
• Evaluate results, potentially re-train
• Apply:
• Pass the data through the NN to produce new features or output
17. How can Spark help?
• A lot of libraries available for Deep Learning in Spark
• TensorFlowOnSpark, BigDL, …
• Goes from simple to very advanced
• See our previous webinar for more detail
• Spark is great at scaling out computations
• Distribute the transforms
• Manage the training computation
• Spark MLlib Pipelines
• Simple, concise API to capture the ML workflow
18. Deep Learning Pipelines:
Deep Learning with Simplicity
• Open-source Databricks library:
https://github.com/databricks/spark-deep-learning
• Focuses on ease of use and integration, without sacrificing performance
• Scales out common tasks
• Integrates with Spark APIs
• Primary language: Python
19. Deep Learning Pipelines
• Load data
• Interactive work
• Train
• Evaluate model
• Apply
• Image loading in Spark
• Deploying models in SQL
• Transfer learning
• Distributed tuning
• Distributed prediction
• Pre-trained models
[Slide graphic: check marks (✓) flag the four topics covered in this webinar]
21. Adds support for images in Spark
• ImageSchema, reader, conversion functions to/from numpy
arrays
• Most of the tools we’ll describe work on ImageSchema columns
from sparkdl import readImages
image_df = readImages(sample_img_dir)
22. Applying popular models
• Popular pre-trained models accessible through MLlib
Transformers
from sparkdl import DeepImagePredictor
predictor = DeepImagePredictor(inputCol="image",
                               outputCol="predicted_labels",
                               modelName="InceptionV3")
predictions_df = predictor.transform(image_df)
33. MLlib primer
• MLlib: the machine learning library included with Spark
• Transformer
• Transforms the data: takes a Spark dataframe and appends a new column
• Estimator
• Produces a model (fit)
• Pipeline: sequence of transformers and estimators
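The Transformer / Estimator / Pipeline pattern can be sketched in plain Python. This is a toy illustration, not the MLlib API: rows are dictionaries instead of Spark DataFrames, and the classes `AddLength`, `MeanThreshold` and `ThresholdModel` are hypothetical names invented for this example.

```python
class Transformer:
    """Takes rows and returns rows with a new column appended."""
    def transform(self, rows):
        raise NotImplementedError


class Estimator:
    """Learns from rows and produces a model, which is itself a Transformer."""
    def fit(self, rows):
        raise NotImplementedError


class AddLength(Transformer):
    def transform(self, rows):
        # Append a 'length' column derived from the 'text' column.
        return [dict(row, length=len(row["text"])) for row in rows]


class ThresholdModel(Transformer):
    def __init__(self, threshold):
        self.threshold = threshold

    def transform(self, rows):
        # Append a 'prediction' column: 1 if longer than the learned threshold.
        return [dict(row, prediction=int(row["length"] > self.threshold))
                for row in rows]


class MeanThreshold(Estimator):
    def fit(self, rows):
        # "Training" here is just computing the mean length.
        mean = sum(row["length"] for row in rows) / len(rows)
        return ThresholdModel(mean)


class PipelineModel(Transformer):
    def __init__(self, stages):
        self.stages = stages

    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class Pipeline(Estimator):
    def __init__(self, stages):
        self.stages = stages

    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if isinstance(stage, Estimator):
                stage = stage.fit(rows)   # estimators become fitted models
            fitted.append(stage)
            rows = stage.transform(rows)  # feed the output to the next stage
        return PipelineModel(fitted)


train = [{"text": "a"}, {"text": "abc"}, {"text": "abcde"}]
model = Pipeline([AddLength(), MeanThreshold()]).fit(train)
print(model.transform([{"text": "abcd"}, {"text": "ab"}]))
```

Fitting the pipeline returns a single model that replays every stage, which is what allows a whole workflow to be trained, saved and applied as one object.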
34. Transfer Learning as a Pipeline
MLlib Pipeline: Image Loading → Preprocessing (DeepImageFeaturizer) → Logistic Regression
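A transfer learning pipeline of this shape might look roughly like the sketch below, which chains a `DeepImageFeaturizer` with an MLlib `LogisticRegression`. Several things here are assumptions rather than content from the slides: the function names, the column names `features` and `label`, the `max_iter` default, and the expectation that `train_df` and `test_df` are DataFrames with an `image` column (for example from `readImages`) plus a numeric `label` column. Running it requires Spark and the `sparkdl` package.

```python
def build_transfer_learning_pipeline(model_name="InceptionV3", max_iter=20):
    """Return an unfitted MLlib Pipeline: deep featurizer + logistic regression."""
    # Imports live inside the function so that defining it does not require Spark.
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from sparkdl import DeepImageFeaturizer

    # Pre-trained network with its final layers removed: turns images into features.
    featurizer = DeepImageFeaturizer(inputCol="image",
                                     outputCol="features",
                                     modelName=model_name)
    # Only this small classifier is trained on the new labels.
    classifier = LogisticRegression(featuresCol="features",
                                    labelCol="label",
                                    maxIter=max_iter)
    return Pipeline(stages=[featurizer, classifier])


def train_and_predict(train_df, test_df):
    """Fit the pipeline on labeled images, then score unseen images."""
    model = build_transfer_learning_pipeline().fit(train_df)
    return model.transform(test_df)
```

Because the pre-trained network stays fixed and only the logistic regression is fitted, training needs far less data and compute than learning a deep model from scratch.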
38. Shipping predictors in SQL
Take a trained model / Pipeline, register a SQL UDF usable by
anyone in the organization
registerKerasUDF("my_object_recognition_function",
                 keras_model_file="/mymodels/007model.h5")
In Spark SQL:
select image, my_object_recognition_function(image) as objects
from traffic_imgs
40. Deep Learning without Deep Pockets
• Simple API for Deep Learning, integrated with MLlib
• Scales common tasks with transformers and estimators
• Embeds Deep Learning models in MLlib and SparkSQL
• Early release of Deep Learning Pipelines
https://github.com/databricks/spark-deep-learning
41. Deep Learning Pipelines - future
In progress
• Hyper-parameter tuning for Keras models
• Official image support in Spark
Potential future work
• Scala API
• Text models
• Support for more backends, e.g. MXNet, PyTorch, BigDL
42. Resources
Blog posts & webinars — http://databricks.com/blog
• Deep Learning Pipelines
• GPU acceleration in Databricks
• BigDL on Databricks
• Deep Learning and Apache Spark
Docs for Deep Learning on Databricks — http://docs.databricks.com
• Getting started
• Deep Learning Pipelines Example
• Spark integration