Aran Khanna, Software Engineer, Amazon Web Services at MLconf ATL 2017

High Performance Deep Learning on Edge Devices With Apache MXNet:
Deep network based models are marked by an asymmetry between the large amount of compute power needed to train a model and the relatively small amount needed to deploy the trained model for inference. This is particularly true in computer vision tasks such as object detection or image classification, where millions of labeled images and large numbers of GPUs are needed to produce an accurate model that can then be deployed for inference on low-powered devices with a single CPU. The challenge when deploying vision models on these low-powered devices, though, is getting inference to run efficiently enough to allow near real-time processing of a video stream. Fortunately, Apache MXNet provides the tools to solve this issue, allowing users to create highly performant models with techniques like separable convolutions, quantized weights, and sparsity exploitation, as well as providing custom hardware kernels to ensure inference calculations are accelerated as far as the deployment hardware allows. This is demonstrated through a state-of-the-art MXNet-based vision network running in near real time on a low-powered Raspberry Pi device. We finally discuss how running inference at the edge, together with MXNet's efficient modeling tools, can massively drive down compute costs for deploying deep networks in a production system at scale.

  1. © 2017, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Deep Learning at the Edge With Apache MXNet. Aran Khanna, AI Engineer, AWS Deep Learning (GRT Intern). Amazon AI
  2. Amazon AI
  3. What Do These Have in Common?
  4. Deep Neural Networks: Inputs → Outputs
  5. …At The Edge: Inputs → Outputs
  6. Deep Neural Networks At The Edge
  7. Overview: Motivating Problems in DL at the Edge | Why Apache MXNet | From the Metal to the Models with MXNet | DL at the Edge with AWS
  8. Why The Edge, When We Have the Cloud? (edge vs. cloud)
  9. Why The Edge, When We Have the Cloud? Latency
  10. Why The Edge, When We Have the Cloud? Latency, Connectivity
  11. Why The Edge, When We Have the Cloud? Latency, Connectivity, Cost
  12. Why The Edge, When We Have the Cloud? Latency, Connectivity, Cost, Privacy/Security
  13. Motivating Examples • Real Time Filtering (Neural Style Transfer)
  14. Motivating Examples • Industrial IoT (Out of Distribution/Anomaly Detection)
  15. Motivating Examples • Robotics (Object Detection and Recognition)
  16. Motivating Examples • Autonomous Driving Systems
  17. Amazon AI: Artificial Intelligence In The Hands Of Every Developer. Infrastructure: GPU, CPU, IoT, Mobile. Engines: MXNet, TensorFlow, Caffe, Theano, PyTorch, CNTK. Platforms: Amazon ML, Spark & EMR, Kinesis, Batch, ECS. Services: Rekognition (Vision), Polly (Speech), Lex (Chat)
  18. Amazon AI: Artificial Intelligence In The Hands Of Every Developer. Infrastructure: GPU, CPU, IoT, Mobile. Engines: MXNet, TensorFlow, Caffe, Theano, PyTorch, CNTK
  19. Overview: Motivating Problems in DL at the Edge | Why Apache MXNet | From the Metal to the Models with MXNet | DL at the Edge with AWS
  20. Deep Learning Frameworks
  21. Apache MXNet | Differentiators: Flexible (Mixed Programming API), Portable (Runs Everywhere), Performance (Near Linear Scaling)
  22. Apache MXNet | Differentiators: Flexible (Mixed Programming API), Portable (Runs Everywhere), Performance (Near Linear Scaling)
  23. Apache MXNet | Flexible Programming
      IMPERATIVE NDARRAY API:
      >>> import mxnet as mx
      >>> a = mx.nd.zeros((100, 50))
      >>> b = mx.nd.ones((100, 50))
      >>> c = a + b
      >>> c += 1
      >>> print(c)
      DECLARATIVE SYMBOLIC EXECUTOR:
      >>> net = mx.symbol.Variable('data')
      >>> net = mx.symbol.FullyConnected(data=net, num_hidden=128)
      >>> net = mx.symbol.SoftmaxOutput(data=net, name='softmax')
      >>> texec = mx.module.Module(net)
      >>> # The module must be bound and initialized before it can execute
      >>> texec.bind(data_shapes=[('data', (100, 50))],
      >>>            label_shapes=[('softmax_label', (100,))])
      >>> texec.init_params()
      >>> texec.forward(mx.io.DataBatch(data=[c], label=[mx.nd.zeros((100,))]))
      >>> texec.backward()
  24. Apache MXNet | Differentiators: Flexible (Mixed Programming API), Portable (Runs Everywhere), Performance (Near Linear Scaling)
  25. Apache MXNet | Efficient Scaling [chart: scaling efficiency vs. number of GPUs (1 to 256) for Inception v3, ResNet, and AlexNet, tracking the ideal linear line at roughly 88% efficiency]
  26. Apache MXNet | Differentiators: Flexible (Mixed Programming API), Portable (Runs Everywhere), Performance (Near Linear Scaling)
  27. Apache MXNet | On Mobile Devices https://mxnet.incubator.apache.org/how_to/smart_device.html
  28. Apache MXNet | On IoT Devices mxnet.incubator.apache.org/get_started/install.html
  29. Apache MXNet | Community: Most Open (accepted into the Apache Incubator), Best on AWS (optimized for deep learning on AWS)
  30. Apache MXNet | Community [chart: contribution counts, 0 to 40,000; Amazon at 35% of contributions, outpacing Torch, Theano, and CNTK; as of 3/30/17]. Diverse contributors: Yutian Li (Stanford), Nan Zhu (MSFT), Liang Depeng (Sun Yat-sen U.), Xingjian Shi (HKUST), Tianjun Xiao (Tesla), Chiyuan Zhang (MIT), Yao Wang (AWS), Jian Guo (TuSimple), Yizhi Liu (Mediav), Sandeep K. (AWS), Sergey Kolychev (Whitehat), Eric Xie (AWS), Tianqi Chen (UW), Mu Li (AWS), Bing Su (Apple), plus Apple, Tesla, Microsoft, NYU, MIT, Stanford, and lots of others
  31. Apache MXNet | Apple CoreML: pip install mxnet-to-coreml
  32. Apache MXNet | Easy to Get Started http://gluon.mxnet.io/
  33. Overview: Motivating Problems in DL at the Edge | Why Apache MXNet | From the Metal to the Models with MXNet | DL at the Edge with AWS
  34. What Are the Challenges at the Edge?
  35. The Metal: Heterogeneity. In the Cloud: X86_64; CUDA GPU
  36. The Metal: Heterogeneity. In the Cloud: X86_64; CUDA GPU. At the Edge: X86_64, X86_32, ARM, AArch64, Android, iOS; OpenCL GPU, CUDA GPU, Metal GPU; NEON DSP, Hexagon DSP; custom accelerators, FPGA
  37. The Metal: Performance Gap. Low End: Raspberry Pi 3 (32-bit ARMv7, ARM NEON, 1GB RAM). High End: NVIDIA Jetson (ARM AArch64, 128 CUDA cores, 8GB RAM)
  38. The Metal: The Problem
  39. How Can We Adapt Our Models?
  40. The Models: Where Is Our Cost? Convolutions are expensive
  41. The Models: Where Is Our Cost? Models are generally over-parameterized
  42. Cheaper Convolutions: Winograd. Convolution in the time domain equals pointwise multiplication in the frequency domain. Handled under the hood in MXNet via integrations with NNPACK, CUDA, etc.
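      To make the slide's claim concrete, here is a minimal NumPy sketch (illustrative only, not MXNet internals): a circular convolution computed directly in the spatial domain matches the FFT route of transform, pointwise multiply, inverse transform.
      >>> import numpy as np
      >>> n = 64
      >>> x = np.random.rand(n)   # input signal
      >>> k = np.random.rand(n)   # kernel, zero-padded to the same length
      >>> # Direct circular convolution in the spatial domain: O(n^2) multiplies
      >>> direct = np.array([sum(x[j] * k[(i - j) % n] for j in range(n)) for i in range(n)])
      >>> # Frequency-domain route: FFT, pointwise multiply, inverse FFT: O(n log n)
      >>> via_fft = np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k)))
      >>> np.allclose(direct, via_fft)
      True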
  43. Cheaper Convolutions: Separable Convolutions. Good for devices that can't run lots of multiplications in parallel. Convolve separately over each depth channel of the input, then merge channels with 1x1 convolutions (see the code on the next slide).
  44. Depth Separable Convolutions in MXNet
      >>> import mxnet as mx
      >>> # Layer hyperparameters (left undefined on the original slide);
      >>> # depthwise means one convolution group per input channel
      >>> num_group, num_filter = 32, 32
      >>> kernel, stride, pad = (3, 3), (1, 1), (1, 1)
      >>> x = mx.sym.Variable('x')
      >>> w = mx.sym.Variable('w')
      >>> b = mx.sym.Variable('b')
      >>> # Explicit form: slice input and weights per group, convolve each
      >>> # slice independently, then concatenate the outputs
      >>> xslice = mx.sym.SliceChannel(data=x, num_outputs=num_group, axis=1)
      >>> wslice = mx.sym.SliceChannel(data=w, num_outputs=num_group, axis=0)
      >>> bslice = mx.sym.SliceChannel(data=b, num_outputs=num_group, axis=0)
      >>> y_sep = mx.sym.Concat(*[mx.sym.Convolution(data=xslice[i], weight=wslice[i], bias=bslice[i], num_filter=num_filter//num_group, kernel=kernel, stride=stride, pad=pad) for i in range(num_group)])
      >>> # Equivalent built-in form: grouped convolution via num_group
      >>> y = mx.sym.Convolution(data=x, weight=w, bias=b, num_filter=num_filter, num_group=num_group, kernel=kernel, stride=stride, pad=pad)
  45. Fewer Parameters: Quantization. Good for devices with hardware to accelerate low-precision operations. Map activations into lower bit-width buckets and multiply with quantized weights.
  46. Quantization in MXNet
      >>> import mxnet as mx
      >>> # Float range the values will be mapped from
      >>> min0 = mx.nd.array([0.0])
      >>> max0 = mx.nd.array([1.0])
      >>> a = mx.nd.array([[0.1392, 0.5928], [0.6027, 0.8579]])
      >>> # Quantize to uint8, then map back to float32
      >>> quantized_a, min1, max1 = mx.nd.contrib.quantize(a, min0, max0, out_type='uint8')
      >>> dequantized_a = mx.nd.contrib.dequantize(quantized_a, min1, max1, out_type='float32')
  47. Fewer Parameters: Weight Pruning. Prune unused weights during training. Good at high sparsity for devices with fast sparse multiplication.
  48. Weight Pruning in MXNet
      >>> # Assume we have defined a model and training data set; 'sparsesgd'
      >>> # is a pruning optimizer that zeroes small weights on a schedule
      >>> model.fit(train,
      >>>           eval_data=val,
      >>>           eval_metric='acc',
      >>>           num_epoch=10,
      >>>           optimizer='sparsesgd',
      >>>           optimizer_params={'learning_rate': 0.1,
      >>>                             'wd': 0.004,
      >>>                             'momentum': 0.9,
      >>>                             'pruning_switch_epoch': 5,
      >>>                             'weight_sparsity': 0.8,
      >>>                             'bias_sparsity': 0.0})
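      One illustrative way to check the sparsity the schedule actually achieved (our own addition; it assumes `model` is the Module trained in the snippet above):
      >>> args, _ = model.get_params()
      >>> for name, w in args.items():
      ...     print(name, (w == 0).sum().asscalar() / w.size)  # fraction of zeroed weights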
  49. Weight Pruning in MXNet
  50. Fewer Parameters: Efficient Architectures. SqueezeNet: AlexNet accuracy with 50x fewer parameters. Good for devices with low RAM that can't hold all the weights of a larger model in memory concurrently.
  51. Efficient Architectures in MXNet https://mxnet.incubator.apache.org/model_zoo/
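      As a concrete illustration, a pretrained SqueezeNet can be pulled from the model zoo in a few lines (a minimal sketch; the Gluon API route and the input shape are our own choices, not prescribed by the slide):
      >>> import mxnet as mx
      >>> from mxnet.gluon.model_zoo import vision
      >>> net = vision.squeezenet1_0(pretrained=True)       # roughly 5 MB of weights vs ~240 MB for AlexNet
      >>> x = mx.nd.random.uniform(shape=(1, 3, 224, 224))  # stand-in for a preprocessed image batch
      >>> net(x).softmax().topk(k=5)                        # top-5 class indices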
  52. Fewer Parameters: Tensor Decompositions. CVPR paper at arxiv.org/abs/1706.00439; code at https://github.com/tensorly/tensorly
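      Here is a minimal sketch of the idea with TensorLy (the library linked above); the layer shape and rank are our own illustrative choices, and function names can vary between TensorLy versions:
      >>> import numpy as np
      >>> import tensorly as tl
      >>> from tensorly.decomposition import parafac
      >>> weights = np.random.rand(64, 32, 3, 3)            # conv weights: (out_ch, in_ch, kH, kW)
      >>> factors = parafac(tl.tensor(weights), rank=16)    # CP decomposition into rank-16 factors
      >>> approx = tl.cp_to_tensor(factors)                 # low-rank reconstruction
      >>> # Storage drops from 64*32*3*3 = 18,432 values to 16*(64+32+3+3) = 1,632
      >>> np.linalg.norm(approx - weights) / np.linalg.norm(weights)  # relative reconstruction error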
  53. Table of Model Optimization Techniques
                                   Winograd  Separable                Tensor        Sparsity      Weight
                                   Conv.     Conv.      Quantization  Contractions  Exploitation  Sharing
      CPU Acceleration             +         ++         =             ++            +             +
      GPU Acceleration             +         +          +             +             =             +
      Model Size                   =         =          -             -             -             -
      Model Accuracy               =         -          -             -             -             -
      Specialized HW Acceleration  +         +          ++            +             +             +
  54. Edge Model Optimization Benefits The Cloud. Models with fewer parameters often generalize better; tricks from the edge can be applied in the cloud; pre-processing with edge models decreases compute load in the cloud
  55. Overview: Motivating Problems in DL at the Edge | Why Apache MXNet | From the Metal to the Models with MXNet | DL at the Edge with AWS
  56. The Challenge For Artificial Intelligence: SCALE. Data: PBs of existing data, new data created on AWS, aggressive migration. Training: tons of GPUs, elastic capacity, pre-built images. Prediction: tons of GPUs and CPUs, serverless, at the edge on IoT devices
  57. AWS Tools for Deep Learning: p2 instances (up to 40k CUDA cores); Deep Learning AMI (pre-configured for deep learning); CFN Template (launch a deep learning cluster)
  58. AWS Deep Learning AMI: One-Click Deep Learning. Kepler, Volta & Skylake; Apache MXNet (and others); Python 2/3 notebooks & examples
  59. https://aws.amazon.com/amazon-ai/amis/
  60. AWS IoT and AWS Greengrass
  61. Manage and Monitor Models on The Fly: captured data is uploaded to AWS as tagged data; requests escalate to an AI service or to a custom model on P2; models are deployed and managed from the cloud
  62. Local Learning Loop: poorly classified data is used to fine-tune the model with accurate classification, producing an updated model
  63. Getting Started with MXNet at the Edge + AWS IoT http://amzn.to/2h6kPvY
  64. Running AI In Production on AWS Today
  65. We're Hiring!
  66. Thank You! Aran Khanna – arankhan@amazon.com (GRT Intern)
