© 2019 Synopsys
5+ Techniques for Efficient
Implementation of Neural
Networks
Bert Moons
Synopsys
May 2019
© 2019 Synopsys
Introduction --
Challenges of Embedding
Deep Learning
© 2019 Synopsys
Neural Network accuracy comes at a high cost in terms of model
storage and operations per input feature
3 Major challenges
Introduction – Challenges of embedded deep learning
Many embedded applications require real-time operation on high-
dimensional, large input data from various input sources
Many embedded applications require support for a variety of
networks: CNNs for feature extraction, RNNs for sequence modeling
© 2019 Synopsys
3 Major challenges
Introduction – Challenges of embedded deep learning
1. Many operations per pixel
2. Process a lot of pixels in real-time
3. A large variation of different algorithms
© 2019 Synopsys
Classification accuracy comes at a cost
Introduction – Challenges of embedded deep learning
[Chart: best reported top-5 accuracy on IMAGENET-1000 [%] for conventional machine learning, deep learning, and human performance]
Neural network accuracy comes at the cost of a high workload per input pixel,
huge model sizes, and large bandwidth requirements
© 2019 Synopsys
Computing on large input data
Introduction – Challenges of embedded deep learning
[Chart: relative input size — IMAGENET (1X), FHD (40X), 4K (160X)]
Embedded applications require
real-time operation on large input frames
© 2019 Synopsys
Massive workload in real-time applications
Introduction – Challenges of embedded deep learning
[Chart: Top-1 IMAGENET accuracy [%] vs. # operations per ImageNet image (1 GOP to 1 TOP axis) for MobileNet V2, ResNet-50, GoogleNet and VGG-16 — 1 GOP to 10 GOP per IMAGENET image]
[Chart: Top-1 IMAGENET accuracy [%] vs. # operations per second (1 GOP/s to 1 TOP/s axis) for 6 cameras at 30 fps, Full HD — 5-to-180 TOPS @ 30 fps, FHD, ADAS]
© 2019 Synopsys
5+ Techniques to reduce the DNN workload
Introduction – Challenges of embedded deep learning
A. Neural Networks are error-tolerant
1. Linear post-training 8/12/16b quantization
2. Linear trained 2/4/8 bit quantization
3. Non-linear trained 2/4/8 bit quantization through clustering
B. Neural Networks have redundancies and are over-dimensioned
4. Network pruning and compression
5. Network decomposition: low-rank network approximations
C. Neural Networks have sparse and correlated intermediate results
6. Sparsity and correlation based feature map compression
© 2019 Synopsys
A. Neural Networks Are Error-
Tolerant
9
© 2019 Synopsys
The benefits of quantized number representations
5 Techniques – A. Neural Networks Are Error-Tolerant
8 bit fixed is 3-4x faster, 2-4x more efficient than 16b floating point
                                16b float         8b fixed          4b fixed
Energy consumption per unit     ~16               ~6-8              ~2-4
Processing units per chip,
classification time per chip    O(1) rel. fps     O(16) rel. fps    O(256) rel. fps
Relative accuracy               100% (no loss)    99%               50-95%

* [Choi,2019]
© 2019 Synopsys
Linear post-training quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
Convert floating point pretrained models to Dynamic Fixed Point
[Diagram: fixed point vs. dynamic fixed point bit layouts. Plain fixed point shares a single system-wide exponent across all values; dynamic fixed point splits the values into groups (Group 1, Group 2, ...) that each carry their own shared exponent.]
* [Courbariaux,2014]
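As an illustration, a minimal NumPy sketch of this conversion, assuming one shared power-of-two exponent per tensor (per-group exponents would apply the same routine to slices of the tensor); the 8-bit mantissa width and the grouping are illustrative choices, not the exact scheme of [Courbariaux,2014].

```python
import numpy as np

def to_dynamic_fixed_point(x, bits=8):
    """Quantize a float tensor to dynamic fixed point: one shared power-of-two
    exponent for the whole group, plus a signed `bits`-wide mantissa per value."""
    max_abs = np.max(np.abs(x)) + 1e-12
    # smallest exponent such that max_abs still fits in the signed integer range
    exponent = int(np.ceil(np.log2(max_abs / (2**(bits - 1) - 1))))
    scale = 2.0 ** exponent
    mantissa = np.clip(np.round(x / scale),
                       -2**(bits - 1), 2**(bits - 1) - 1).astype(np.int8)
    return mantissa, exponent

def from_dynamic_fixed_point(mantissa, exponent):
    return mantissa.astype(np.float32) * (2.0 ** exponent)

# post-training conversion of one pretrained layer's weights
w = np.random.randn(64, 32).astype(np.float32)
m, e = to_dynamic_fixed_point(w)
print("max quantization error:", np.abs(w - from_dynamic_fixed_point(m, e)).max())
```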
© 2019 Synopsys
Linear post-training quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
Dynamic Fixed-Point Quantization allows running neural networks with 8
bit weights and activations across the board
[Table: per-network accuracy, 32 bit float baseline vs. 8 bit fixed point]
* [Nvidia,2017]
© 2019 Synopsys
Linear post-training quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
How to optimally choose: dynamic exponent groups, saturation
thresholds, weight and activation exponents?
Min-max scaling throws away small values. A saturation threshold better represents
small values, but clips large values.
* [Nvidia,2017]
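As a hedged sketch of how such a saturation threshold might be chosen and applied: a simple percentile clip over calibration activations stands in for the KL-divergence search described in [Nvidia,2017], and all names and parameters below are illustrative.

```python
import numpy as np

def calibrate_threshold(activations, percentile=99.99):
    """Pick a saturation threshold from calibration data; a percentile clip is a
    simple stand-in for the KL-divergence-based search used by TensorRT."""
    return np.percentile(np.abs(activations), percentile)

def quantize_with_threshold(x, threshold, bits=8):
    """Symmetric linear quantization: values beyond the threshold saturate, while
    values below it get finer resolution than plain min-max scaling would give."""
    qmax = 2**(bits - 1) - 1
    scale = threshold / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8), scale

# calibration pass: run a few batches, collect activations, pick the threshold
calib_activations = np.random.laplace(0.0, 1.0, size=100_000).astype(np.float32)
t = calibrate_threshold(calib_activations)
q, s = quantize_with_threshold(calib_activations, t)
```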
© 2019 Synopsys
Linear trained quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
Floating point models are a bad initializer for low-precision fixed-point.
Trained quantization from scratch automates heuristic-based optimization.
Quantize weights and activations with straight-through estimators, allowing back-propagation, and train the saturation range for activations.
[Diagram: forward pass applies the quantizer; backward pass uses the straight-through estimator]
* PACT, Parametrized Clipping Activation [Choi,2019]
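A minimal PyTorch sketch of this idea: round on the forward pass, pass gradients straight through on the backward pass, and learn the activation clipping range alpha as in PACT. The class names, bit-width and hyper-parameters are illustrative, not the training recipe from [Choi,2018].

```python
import torch
import torch.nn as nn

class RoundSTE(torch.autograd.Function):
    """Round to the quantization grid in the forward pass; identity gradient backward."""
    @staticmethod
    def forward(ctx, x):
        return torch.round(x)
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output            # straight-through estimator

class PACTQuantReLU(nn.Module):
    """PACT-style activation: clip to a learned range [0, alpha], then quantize."""
    def __init__(self, bits=4, alpha=6.0):
        super().__init__()
        self.levels = 2**bits - 1
        self.alpha = nn.Parameter(torch.tensor(alpha))   # trained saturation range
    def forward(self, x):
        # equivalent to clamp(x, 0, alpha) but keeps a gradient path to alpha
        y = 0.5 * (torch.abs(x) - torch.abs(x - self.alpha) + self.alpha)
        scale = self.alpha / self.levels
        return RoundSTE.apply(y / scale) * scale

# drop-in replacement for ReLU during quantization-aware training
act = PACTQuantReLU(bits=4)
x = torch.randn(8, 16, requires_grad=True)
act(x).pow(2).mean().backward()       # gradients reach both x and alpha via the STE
```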
© 2019 Synopsys
Linear trained quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
Good accuracy down to 2b
Graceful performance degradation
[Chart: relative benchmark accuracy vs. float baseline (y-axis roughly 0.85 to 1.05) for CIFAR10, SVHN, AlexNet, ResNet18 and ResNet50 at full precision and with 5b, 4b, 3b and 2b quantization]
* [Choi,2018]
© 2019 Synopsys
Non-linear trained quantization – codebook clustering
5 Techniques – A. Neural Networks Are Error-Tolerant
Clustered, codebook quantization can be optimally trained.
This only reduces bandwidth; computations are still performed in floating point.
* [Han,2015]
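A minimal sketch of codebook quantization using k-means clustering of the weights; the centroid fine-tuning (retraining) step of [Han,2015] is omitted, and the library choice and bit-width are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def codebook_quantize(weights, bits=4):
    """Cluster weights into 2**bits centroids; store per-weight indices plus the
    float codebook. Bandwidth shrinks, but MACs still run on float centroid values."""
    k = 2**bits
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(weights.reshape(-1, 1))
    indices = km.labels_.astype(np.uint8).reshape(weights.shape)   # `bits` per weight
    codebook = km.cluster_centers_.ravel()                         # k float values
    return indices, codebook

w = np.random.randn(256, 128).astype(np.float32)
idx, cb = codebook_quantize(w, bits=4)
w_hat = cb[idx]        # reconstructed weights used at compute time
```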
© 2019 Synopsys
B. Neural Networks Are Over-
Dimensioned & Redundant
17
© 2019 Synopsys
Pruning Neural Networks
5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant
Pruning removes unnecessary connections in the neural network. Accuracy
is recovered through retraining the pruned network
* [Han,2015]
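A minimal sketch of magnitude-based pruning with a single global threshold; the iterative prune-and-retrain schedule of [Han,2015] is left out, and the sparsity target is an illustrative assumption.

```python
import numpy as np

def magnitude_prune(weights, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of weights and return the mask;
    the mask stays fixed while the surviving weights are retrained."""
    threshold = np.quantile(np.abs(weights), sparsity)
    mask = (np.abs(weights) > threshold).astype(weights.dtype)
    return weights * mask, mask

w = np.random.randn(512, 512).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.5)
# during retraining, gradients of pruned connections are masked as well: grad *= mask
```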
© 2019 Synopsys
Low Rank Singular Value Decomposition (SVD) in DNNs
5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant
Many singular values are small and can be discarded
* [Xue,2013]
A = U Σ Vᵀ
A ≅ U′ Σ′ V′ᵀ = N U  (keep only the largest singular values and fold the factors into two smaller matrices)
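A small NumPy sketch of this factorization applied to a fully connected layer; the rank and layer sizes are illustrative, not the operating points of [Xue,2013].

```python
import numpy as np

def low_rank_factorize(W, rank):
    """Approximate an m x n weight matrix by two smaller factors,
    so one layer y = W @ x becomes two layers y = N @ (M @ x)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    N = U[:, :rank] * S[:rank]     # m x rank  (U' Sigma')
    M = Vt[:rank, :]               # rank x n  (V'^T)
    return N, M

W = np.random.randn(2048, 512).astype(np.float32)
N, M = low_rank_factorize(W, rank=64)
# original layer: 2048*512 ~ 1.05M MACs; factorized: 64*512 + 2048*64 ~ 0.16M MACs
print("relative error:", np.linalg.norm(W - N @ M) / np.linalg.norm(W))
```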
© 2019 Synopsys
Low Rank Canonical Polyadic (CP) decomp. in CNNs
5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant
Convert a large convolutional filter into a triplet of smaller filters
* [Astrid,2017]
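A hedged PyTorch sketch of the resulting layer structure only: the CP factors of the pretrained kernel would be computed offline (e.g. with a tensor decomposition library) and copied into these layers, and the channel counts and rank below are illustrative rather than taken from [Astrid,2017].

```python
import torch.nn as nn

def cp_decomposed_conv(c_in, c_out, k, rank):
    """Replace one c_in -> c_out, k x k convolution with a triplet of cheaper ones:
    1x1 channel reduction, k x k spatial filtering on `rank` channels (grouped, so
    each channel is filtered independently), and a 1x1 channel expansion."""
    return nn.Sequential(
        nn.Conv2d(c_in, rank, kernel_size=1, bias=False),              # c_in*rank MACs/px
        nn.Conv2d(rank, rank, kernel_size=k, padding=k // 2,
                  groups=rank, bias=False),                            # rank*k*k MACs/px
        nn.Conv2d(rank, c_out, kernel_size=1, bias=False),             # rank*c_out MACs/px
    )

# e.g. a 256 -> 256, 3x3 convolution (~590k MACs per pixel) approximated at rank 64
block = cp_decomposed_conv(256, 256, k=3, rank=64)   # ~33k MACs per pixel
```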
© 2019 Synopsys
Basic example: Combining SVD, pruning and clustering
5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant
11x model compression in a phone-recognition LSTM
[Chart: LSTM compression rate (roughly 1x to 11x) for Base, P, SVD, SVD+P, P+C and SVD+P+C]
P = Pruning
SVD = Singular Value Decomposition
C = Clustering / Codebook Compression
* [Goetschalckx,2018]
© 2019 Synopsys
C. Neural Networks have
Sparse & Correlated
Intermediate Results
22
© 2019 Synopsys
Sparse feature-map compression
5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps
Feature map bandwidth dominates in modern CNNs
[Chart: bandwidth in MobileNet-V1 [MB], coefficient BW vs. feature-map BW, broken down per layer type (1x1, 32; 3x3 DW, 32; 1x1, 64; ...); feature-map bandwidth dominates]
© 2019 Synopsys
Sparse feature-map compression
5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps
ReLU activation introduces 50-90% zero-valued numbers in intermediate
feature maps
Example (8b features, before and after ReLU activation):
  -5   4  12           0   4  12
 -10   0  17   --->    0   0  17
  -1   3   2           0   3   2
© 2019 Synopsys
Sparse feature-map compression
5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps
Hardware support for multi-bit Huffman encoding allows up to 2x
bandwidth reduction in typical networks.
Zero-runlength encoding as in [Chen, 2016],
Huffman-encoding as in [Moons, 2017]
Example: the 3x3 feature map above (9 values x 8b = 72b uncompressed) under a simple variable-length code:
  zero value       -> 2'b00
  value < 16       -> 2'b01, 4'b WORD
  other non-zero   -> 1'b1, 8'b WORD
encodes in 41b < 72b.
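A tiny Python sketch that only counts the bits of the symbol scheme above on this example; bitstream packing and the per-network Huffman tables of [Moons,2017] are omitted.

```python
def encoded_bits(values):
    """Bit count for the variable-length code above: zero -> 2b,
    small non-zero (< 16) -> 2b prefix + 4b word, otherwise -> 1b prefix + 8b word."""
    total = 0
    for v in values:
        if v == 0:
            total += 2
        elif 0 < v < 16:
            total += 2 + 4
        else:
            total += 1 + 8
    return total

fmap = [0, 4, 12, 0, 0, 17, 0, 3, 2]                      # the 3x3 example above
print(encoded_bits(fmap), "bits vs", 8 * len(fmap))       # 41 bits vs 72 bits
```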
© 2019 Synopsys
Correlated feature-map compression
5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps
Intermediate features in the same channel-plane are highly correlated
Intermediate feature maps in ReLU-less YOLOv2
[Images: two example feature maps, one at scale 1 and one at scale 9]
© 2019 Synopsys
Correlated feature-map compression
5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps
Super-linear correlation based extended bit-plane compression allows
feature-map compression even on non-sparse data
* [Cavigelli,2018]
Example: neighbouring values 16, 20, 20, 20, 28, 28, 28, 99 are strongly correlated.
Delta values: 0, 4, 0, 0, 8, 0, 0, 71 — these are then split into a zero-value stream and a non-zero-value stream for compact encoding.
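A simplified sketch of the idea, assuming delta encoding along a row followed by a zero/non-zero split; the actual extended bit-plane scheme of [Cavigelli,2018] is more elaborate than this.

```python
import numpy as np

def delta_then_split(row):
    """Delta-encode neighbouring (highly correlated) values, then split the deltas
    into a 1-bit-per-element zero map and a stream of the non-zero values."""
    deltas = np.diff(row, prepend=row[:1])      # first delta is taken vs. itself -> 0
    zero_map = (deltas == 0).astype(np.uint8)   # cheap to store: 1 bit per element
    nonzero = deltas[deltas != 0]               # only these values need full words
    return zero_map, nonzero

row = np.array([16, 20, 20, 20, 28, 28, 28, 99])
zmap, nz = delta_then_split(row)    # zmap -> [1 0 1 1 0 1 1 0], nz -> [4 8 71]
```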
© 2019 Synopsys
Correlated feature-map compression
5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps
Correlated compression outperforms sparsity-based compression
[Chart: compression rate (roughly 1x to 3x) for MobileNet, ResNet-50, YOLO V2 VOC and VGG-16, sparsity-based vs. correlation-based]
© 2019 Synopsys
Conclusion –
Bringing it All Together
© 2019 Synopsys
A first-order analysis on ResNet-50
5 Techniques – Conclusion: bringing it all together
A first-order energy model for Neural Network Inference
Assume:
• Quadratic energy scaling / MAC
when going from 32 to 8 bit.
• Linear energy saving / read-write in DDR/SRAM
when going from 32 to 8 bit
• 50% of coefficients zero when pruning
• 50% compute reduction under decomposition
• 50% of activations can be compressed
* [Han,2015]
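A toy version of such a model is sketched below. The per-operation energies and the weight/activation traffic split are placeholder assumptions, so the resulting ratios illustrate how the listed assumptions compose rather than reproduce the exact percentages on the next two slides.

```python
# Illustrative per-operation energies for 32-bit data (placeholder values, in pJ)
E_MAC, E_SRAM, E_DDR = 4.0, 10.0, 200.0

def frame_energy(macs, sram_acc, ddr_acc, bits=32, compute_scale=1.0,
                 weight_scale=1.0, act_scale=1.0, weight_frac=0.5):
    s = bits / 32.0
    e_mac = macs * E_MAC * s * s * compute_scale                     # quadratic in word width
    mem_scale = weight_frac * weight_scale + (1 - weight_frac) * act_scale
    e_mem = (sram_acc * E_SRAM + ddr_acc * E_DDR) * s * mem_scale    # linear in word width
    return e_mac + e_mem

# O(3.6G) MACs, O(1 GB) SRAM and O(65 MB) DDR traffic per frame (32b words assumed)
args = dict(macs=3.6e9, sram_acc=250e6, ddr_acc=16e6)
base = frame_energy(**args)                                               # 32b float
q8   = frame_energy(**args, bits=8)                                       # A. 8b fixed
dp   = frame_energy(**args, bits=8, compute_scale=0.5, weight_scale=0.5)  # B. decomp. + pruning
fc   = frame_energy(**args, bits=8, compute_scale=0.5, weight_scale=0.5,
                    act_scale=0.5)                                        # C. feature-map compression
print([round(x / base, 2) for x in (base, q8, dp, fc)])
```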
© 2019 Synopsys
A first-order analysis on ResNet-50
5 Techniques – Conclusion: bringing it all together
When all model data is stored in DRAM, an optimized ResNet-50 is 10x
more efficient than its plain 32b counterpart
O(65 MB) DDR / frame, O(1 GB) SRAM / frame, O(3.6G) MACs / frame
[Chart: relative energy consumption — 32b float: 100%, A. 8b fixed: 22%, B. Decomposition + Pruning: 16%, C. Featuremap Compression: 11% (10x)]
© 2019 Synopsys
A first-order analysis on ResNet-50
5 Techniques – Conclusion: bringing it all together
In a system with sufficient on-chip SRAM, optimized ResNet-50 is 12.5x
more efficient than its plain 32b counterpart
O(0 MB) DDR / frame, O(1 GB) SRAM / frame, O(3.6G) MACs / frame
[Chart: relative energy consumption — 32b float: 100%, A. 8b fixed: 15%, B. Decomposition + Pruning: 9%, C. Featuremap Compression: 8% (12.5x)]
© 2019 Synopsys
For More Information
Visit the Synopsys booth for
demos on Automotive ADAS,
Virtual Reality & More
33
EV6x Embedded Vision
Processor IP with Safety
Enhancement Package
• Thursday, May 23
• Santa Clara Convention Center
• Doors open 8 AM
• Sessions on EV6x Vision Processor IP, Functional Safety, Security, OpenVX…
• Register via the EV Alliance website or at Synopsys Booth
Join Synopsys’ EV Seminar on Thursday
Navigating Embedded Vision at the Edge
[Best Processor award]
© 2019 Synopsys
References
34
[Han, 2015, 2016] https://arxiv.org/abs/1510.00149 ; https://arxiv.org/abs/1602.01528
[Xue, 2013] https://www.microsoft.com/en-us/research/wp-content/uploads/2013/01/svd_v2.pdf
[Nvidia, 2017] http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf
[Choi, 2018, 2019] https://arxiv.org/abs/1805.06085 ; https://www.ibm.com/blogs/research/2019/04/2-bit-precision/
[Goetschalckx, 2018] https://www.sigmobile.org/mobisys/2018/workshops/deepmobile18/papers/Efficiently_Combining_SVD_Pruning_Clustering_Retraining.pdf
[Astrid, 2017] https://arxiv.org/abs/1701.07148
[Moons, 2017] https://ieeexplore.ieee.org/abstract/document/7870353
[Chen, 2016] http://eyeriss.mit.edu/
[Cavigelli, 2018] https://arxiv.org/abs/1810.03979
[Courbariaux, 2014] https://arxiv.org/pdf/1412.7024.pdf
Embedded Vision Summit
Bert Moons --
5+ Techniques for Efficient Implementations
of Neural Networks
May 2019
© 2019 Synopsys
THANK YOU
35
© 2019 Synopsys
5+ Techniques for Efficient
Implementation of Neural
Networks
Bert Moons
Synopsys
May 2019
© 2019 Synopsys
Introduction --
Challenges of Embedding
Deep Learning
© 2019 Synopsys
Neural Network accuracy comes at a high cost in terms of model
storage and operations per input feature
3 Major challenges
Introduction – Challenges of embedded deep learning
© 2019 Synopsys
Neural Network accuracy comes at a high cost in terms of model
storage and operations per input feature
3 Major challenges
Introduction – Challenges of embedded deep learning
Many embedded applications require real-time operation on high-
dimensional, large input data from various input sources
© 2019 Synopsys
Neural Network accuracy comes at a high cost in terms of model
storage and operations per input feature
3 Major challenges
Introduction – Challenges of embedded deep learning
Many embedded applications require real-time operation on high-
dimensional, large input data from various input sources
Many embedded applications require support for a variety of
networks: CNN’s in feature extraction, RNN’s in sequence modeling
© 2019 Synopsys
Neural Network accuracy comes at a high cost in terms of model
storage and operations per input feature
3 Major challenges
Introduction – Challenges of embedded deep learning
Many embedded applications require real-time operation on high-
dimensional, large input data from various input sources
Many embedded applications require support for a variety of
networks: CNN’s in feature extraction, RNN’s in sequence modeling
1. Many operations per pixel
2. Process a lot of pixels in real-time
© 2019 Synopsys
Neural Network accuracy comes at a high cost in terms of model
storage and operations per input feature
3 Major challenges
Introduction – Challenges of embedded deep learning
Many embedded applications require real-time operation on high-
dimensional, large input data from various input sources
Many embedded applications require support for a variety of
networks: CNN’s in feature extraction, RNN’s in sequence modeling
1. Many operations per pixel
2. Process a lot of pixels in real-time
3. A large variation of different algorithms
© 2019 Synopsys
Neural Network accuracy comes at a high cost in terms of model
storage and operations per input feature
3 Major challenges
Introduction – Challenges of embedded deep learning
Many embedded applications require real-time operation on high-
dimensional, large input data from various input sources
Many embedded applications require support for a variety of
networks: CNN’s in feature extraction, RNN’s in sequence modeling
1. Many operations per pixel
2. Process a lot of pixels in real-time
3. A large variation of different algorithms
© 2019 Synopsys
Classification accuracy comes at a cost
Introduction – Challenges of embedded deep learning
Conventional Machine
Learning
Deep
Learning
Human
Bestreportedtop-5accuracy
onIMAGENET-1000[%]
Neural network accuracy comes at a cost of a high workload per input pixel
and huge model sizes and bandwidth requirements
© 2019 Synopsys
Computing on large input data
Introduction – Challenges of embedded deep learning
4KFHDIMAGENET
1X 40X 160X
Embedded applications require
real-time operation on large input frames
© 2019 Synopsys
Massive workload in real-time applications
Introduction – Challenges of embedded deep learning
1GOP 1TOP
Top-1IMAGENETaccuracy[%]
70
75
65
Single
ImageNet
Image
1GOP to 10GOP per IMAGENET image
# operations / ImageNet image
MobileNet V2
ResNet-50
GoogleNet
VGG-16
© 2019 Synopsys
Massive workload in real-time applications
Introduction – Challenges of embedded deep learning
1GOP 1TOP
Top-1IMAGENETaccuracy[%]
70
75
65
Single
ImageNet
Image
1GOP to 10GOP per IMAGENET image
# operations / ImageNet image1GOP/s 1TOP/s
Top-1IMAGENETaccuracy[%]
70
75
65
6 Cameras
30fps
Full HD
Image
5-to-180 TOPS @ 30 fps, FHD, ADAS
# operations / second
MobileNet V2
ResNet-50
GoogleNet
VGG-16
© 2019 Synopsys
5+ Techniques to reduce the DNN workload
A. Neural Networks are
error-tolerant
Introduction – Challenges of embedded deep learning
C. Neural Networks have
sparse and correlated
intermediate results
B. Neural Networks have
redundancies and
are over-dimensioned
© 2019 Synopsys
5+ Techniques to reduce the DNN workload
A. Neural Networks are
error-tolerant
Introduction – Challenges of embedded deep learning
1. Linear post-training 8/12/16b quantization
2. Linear trained 2/4/8 bit quantization
3. Non-linear trained 2/4/8 bit quantization
through clustering
C. Neural Networks have
sparse and correlated
intermediate results
B. Neural Networks have
redundancies and
are over-dimensioned
© 2019 Synopsys
5+ Techniques to reduce the DNN workload
A. Neural Networks are
error-tolerant
Introduction – Challenges of embedded deep learning
1. Linear post-training 8/12/16b quantization
2. Linear trained 2/4/8 bit quantization
3. Non-linear trained 2/4/8 bit quantization
through clustering
C. Neural Networks have
sparse and correlated
intermediate results
B. Neural Networks have
redundancies and
are over-dimensioned
4. Network pruning and compression
5. Network decomposition: low-rank network
approximations
© 2019 Synopsys
5+ Techniques to reduce the DNN workload
A. Neural Networks are
error-tolerant
Introduction – Challenges of embedded deep learning
1. Linear post-training 8/12/16b quantization
2. Linear trained 2/4/8 bit quantization
3. Non-linear trained 2/4/8 bit quantization
through clustering
C. Neural Networks have
sparse and correlated
intermediate results
B. Neural Networks have
redundancies and
are over-dimensioned
4. Network pruning and compression
5. Network decomposition: low-rank network
approximations
6. Sparsity and correlation based feature map
compression
© 2019 Synopsys
A. Neural Networks Are Error-
Tolerant
52
© 2019 Synopsys
The benefits of quantized number representations
5 Techniques – A. Neural Networks Are Error-Tolerant
8 bit fixed is 3-4x faster, 2-4x more efficient than 16b floating point
Energy consumption
per unit
Processing
units per chip
Classification
time per chip
* [Choi,2019]
16b float 8b fixed
O(1)
relative fps
O(16)
relative fps
4b fixed
O(256)
relative fps
~ 16 ~ 6-8 ~ 2-4
Relative accuracy 100%
no loss
99% 50-95%
© 2019 Synopsys
Linear post-training quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
Convert floating point pretrained models to Dynamic Fixed Point
0 1 1 0 1
1 0 0 1 1 1 0 1 0 1
0 1 1 1 0 0 0 0 1 0
0 0 1 0 1 1 0 0 1 1
1 1 1 0 0 1 0 1 0 0
0 1 1 0 1
1 0 0 1 1 1 0 1 0 1
0 1 1 1 0 0 0 0 1 0
0 0 1 0 1 1 0 0 1 1
1 1 1 0 0 1 0 1 0 0
Fixed Point Dynamic Fixed Point
0 1 1 0 1System Exponent Group 1 Exponent
Group 2 Exponent
* [Courbariaux,2019]
© 2019 Synopsys
Linear post-training quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
Dynamic Fixed-Point Quantization allows running neural networks with 8
bit weights and activations across the board
32 bit float baseline 8 bit fixed point
* [Nvidia,2017]
© 2019 Synopsys
Linear post-training quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
How to optimally choose: dynamic exponent groups, saturation
thresholds, weight and activation exponents?
Min-max scaling throws away small values A saturation threshold better represents
small values, but clips large values
* [Nvidia,2017]
© 2019 Synopsys
Linear trained quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
Floating point models are a bad initializer for low-precision fixed-point.
Trained quantization from scratch automates heuristic-based optimization.
Quantizing weights and activations with straight-
through estimators, allowing back-prop
Forward Backward
* PACT, Parametrized Clipping
Activation [Choi,2019]
© 2019 Synopsys
Linear trained quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
Floating point models are a bad initializer for low-precision fixed-point.
Trained quantization from scratch automates heuristic-based optimization.
Quantizing weights and activations with straight-
through estimators, allowing back-prop +
Train saturation range
for activations
Forward Backward
* PACT, Parametrized Clipping
Activation [Choi,2019]
© 2019 Synopsys
Linear trained quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
Floating point models are a bad initializer for low-precision fixed-point.
Trained quantization from scratch automates heuristic-based optimization.
Quantizing weights and activations with straight-
through estimators, allowing back-prop +
Train saturation range
for activations
Forward Backward
* PACT, Parametrized Clipping
Activation [Choi,2019]
© 2019 Synopsys
Linear trained quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
Floating point models are a bad initializer for low-precision fixed-point.
Trained quantization from scratch automates heuristic-based optimization.
Quantizing weights and activations with straight-
through estimators, allowing back-prop +
Train saturation range
for activations
Forward Backward
* PACT, Parametrized Clipping
Activation [Choi,2019]
© 2019 Synopsys
Linear trained quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
Floating point models are a bad initializer for low-precision fixed-point.
Trained quantization from scratch automates heuristic-based optimization.
Quantizing weights and activations with straight-
through estimators, allowing back-prop +
Train saturation range
for activations
Forward Backward
* PACT, Parametrized Clipping
Activation [Choi,2019]
*
© 2019 Synopsys
Linear trained quantization
5 Techniques – A. Neural Networks Are Error-Tolerant
Good accuracy down to 2b
Graceful performance degradation
0.85
0.9
0.95
1
1.05
CIFAR10 SVHN AlexNet ResNet18 ResNet50
full precision 5b 4b 3b 2b
* [Choi,2018]
Relativebenchmark
accuracyvsfloatbaseline*
© 2019 Synopsys
Non-linear trained quantization – codebook clustering
5 Techniques – A. Neural Networks Are Error-Tolerant
Clustered, codebook quantization can be optimally trained.
This only reduces bandwidth, computations are still in floating point.
* [Han,2015]
© 2019 Synopsys
B. Neural Networks Are Over-
Dimensioned & Redundant
64
© 2019 Synopsys
Pruning Neural Networks
5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant
Pruning removes unnecessary connections in the neural network. Accuracy
is recovered through retraining the pruned network
* [Han,2015]
© 2019 Synopsys
Low Rank Singular Value Decomposition (SVD) in DNNs
5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant
Many singular values are small and can be discarded
* [Xue,2013]
𝑨 = 𝑼 𝜮 𝑽 𝑻
𝑨 ≅ 𝑼′𝜮′𝑽′ 𝑻 = 𝑵𝑼
© 2019 Synopsys
Low Rank Canonical Polyadic (CP) decomp. in CNNs
5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant
Convert a large convolutional filter in a triplet of smaller filters
* [Astrid,2017]
© 2019 Synopsys
Basic example: Combining SVD, pruning and clustering
5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant
11x model compression in a phone-recognition LSTM
0
2
4
6
8
10
12
Base P SVD SVD+P P+C SVD+P+C
LSTMCompressionRate
(P)
(C)
* [Goetschalckx,2018]
P = Pruning
SVD = Singular Value Decomposition
C = Clustering / Codebook Compression
© 2019 Synopsys
C. Neural Networks have
Sparse & Correlated
Intermediate Results
69
© 2019 Synopsys
Sparse feature-map compression
5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps
Feature map bandwidth dominates in modern CNNs
0
2
4
6
8
10
12
Coefficient
BW
Feature Map
BW
BWinMObileNet-V1[MB]
1x1, 32
3x3DW, 32
1x1, 64
32
© 2019 Synopsys
Sparse feature-map compression
5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps
ReLU activation introduces 50-90% zero-valued numbers in intermediate
feature maps
-5 4 12
-10 0 17
-1 3 2
0 4 12
0 0 17
0 3 2
ReLU activation
8b Features 8b Features
© 2019 Synopsys
Sparse feature-map compression
5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps
Hardware support for multi-bit Huffman encoding allows up to 2x
bandwidth reduction in typical networks.
Zero-runlength encoding as in [Chen, 2016],
Huffman-encoding as in [Moons, 2017]
0 4 12
0 0 17
0 3 2
8b Features
72b
Huffman Features
zero 2’b00
<16 2’b01, 4’b WORD
nonzero 1’b1, 8’b WORD
41b < 72b
© 2019 Synopsys
Correlated feature-map compression
5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps
Intermediate features in the same channel-plane are highly correlated
Intermediate featuremaps in ReLU-less YOLOV2
An example featuremap
scale1
An example featuremap
scale9
© 2019 Synopsys
Correlated feature-map compression
5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps
Super-linear correlation based extended bit-plane compression allows
feature-map compression even on non-sparse data
* [Cavigelli,2018]
0,4,0,0,0,8,0,0,71
Zero Values
Non-Zero
Values
Correlated Values: 16, 20, 20, 20, 28, 28, 28, 99
Delta Values: 0, 4, 0, 0, 8, 0, 0, 71
© 2019 Synopsys
Correlated feature-map compression
5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps
Correlated compression outperforms sparsity-based compression
0
0.5
1
1.5
2
2.5
3
Mobilenet ResNet-50 Yolo V2 VOC VGG-16
CompressionRate
Sparsity-based Correlation-based
© 2019 Synopsys
Conclusion –
Bringing it All Together
© 2019 Synopsys
A first-order analysis on ResNet-50
5 Techniques – Conclusion: bringing it all together
A first-order energy model for Neural Network Inference
Assume:
• Quadratic energy scaling / MAC
when going from 32 to 8 bit.
• Linear energy saving / read-write in DDR/SRAM
when going from 32 to 8 bit
• 50% of coefficients zero when pruning
• 50% compute reduction under decomposition
• 50% of activations can be compressed
*[Han,2015]
*
© 2019 Synopsys
A first-order analysis on ResNet-50
5 Techniques – Conclusion: bringing it all together
When all model data is stored in DRAM optimized ResNet-50 is 10x
more efficient than its plain 32b counterpart
O(65MB) DDR / frame O(1GB) SRAM / frame O(3.6G) MACS / frame
10x
100%
22%
16%
11%
0%
20%
40%
60%
80%
100%
120%
32b float A. 8b fixed B. Decomposition
+ Pruning
C. Featuremap
Compression
RelativeEnergy
Consumption
© 2019 Synopsys
A first-order analysis on ResNet-50
5 Techniques – Conclusion: bringing it all together
In a system with sufficient on-chip SRAM, optimized ResNet-50 is 12.5x
more efficient than its plain 32b counterpart
O(0MB) DDR / frame O(1GB) SRAM / frame O(3.6G) MACS / frame
100%
15%
9% 8%
0%
20%
40%
60%
80%
100%
120%
32b float A. 8b fixed B. Decomposition
+ Pruning
C. Featuremap
Compression
RelativeEnergy
Consumption
12.5x
© 2019 Synopsys
For More Information
Visit the Synopsys booth for
demos on Automotive ADAS,
Virtual Reality & More
80
EV6x Embedded Vision
Processor IP with Safety
Enhancement Package
• Thursday, May 23
• Santa Clara Convention Center
• Doors open 8 AM
• Sessions on EV6x Vision Processor IP, Functional Safety, Security, OpenVX…
• Register via the EV Alliance website or at Synopsys Booth
Join Synopsys’ EV Seminar on Thursday
Navigating Embedded Vision at the Edge
B E S T P R O C E S S O R
© 2019 Synopsys
References
81
[Han,2015,2016]
https://arxiv.org/abs/1510.00149
https://arxiv.org/abs/1602.01528
[Xue, 2013]
https://www.microsoft.com/en-us/research/wp-
content/uploads/2013/01/svd_v2.pdf
[Nvidia, 2017]
http://on-demand.gputechconf.com/gtc
/2017/presentation/s7310-8-bit-inference-with-
tensorrt.pdf
[Choi, 2018, 2019]
https://arxiv.org/abs/1805.06085
https://www.ibm.com/blogs/research/2019/04/2-bit-
precision/
[Goetschalckx, 2018]
https://www.sigmobile.org/mobisys/2018/workshops/
deepmobile18/papers/Efficiently_Combining_SVD_Pru
ning_Clustering_Retraining.pdf
[Astrid,2017]
https://arxiv.org/abs/1701.07148
[Moons,2017]
https://ieeexplore.ieee.org/abstract/document/78703
53
[Chen,2016]
http://eyeriss.mit.edu/
[Cavigelli, 2018]
https://arxiv.org/abs/1810.03979
[Courbariaux, 2014]
https://arxiv.org/pdf/1412.7024.pdf
Embedded Vision Summit
Bert Moons --
5+ Techniques for Efficient Implementations
of Neural Networks
May 2019
© 2019 Synopsys
THANK YOU
82

More Related Content

More from Edge AI and Vision Alliance

“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
Edge AI and Vision Alliance
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
Edge AI and Vision Alliance
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
Edge AI and Vision Alliance
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
Edge AI and Vision Alliance
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
Edge AI and Vision Alliance
 
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
Edge AI and Vision Alliance
 
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
Edge AI and Vision Alliance
 
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ..."Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
Edge AI and Vision Alliance
 
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
Edge AI and Vision Alliance
 
“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI
“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI
“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI
Edge AI and Vision Alliance
 

More from Edge AI and Vision Alliance (20)

“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
“Computer Vision in Sports: Scalable Solutions for Downmarkets,” a Presentati...
 
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
“Detecting Data Drift in Image Classification Neural Networks,” a Presentatio...
 
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
“Deep Neural Network Training: Diagnosing Problems and Implementing Solutions...
 
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
“AI Start-ups: The Perils of Fishing for Whales (War Stories from the Entrepr...
 
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
“A Computer Vision System for Autonomous Satellite Maneuvering,” a Presentati...
 
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
“Bias in Computer Vision—It’s Bigger Than Facial Recognition!,” a Presentatio...
 
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
“Sensor Fusion Techniques for Accurate Perception of Objects in the Environme...
 
“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara“Updating the Edge ML Development Process,” a Presentation from Samsara
“Updating the Edge ML Development Process,” a Presentation from Samsara
 
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
“Combating Bias in Production Computer Vision Systems,” a Presentation from R...
 
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
“Developing an Embedded Vision AI-powered Fitness System,” a Presentation fro...
 
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
“Navigating the Evolving Venture Capital Landscape for Edge AI Start-ups,” a ...
 
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
“Advanced Presence Sensing: What It Means for the Smart Home,” a Presentation...
 
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
“Tracking and Fusing Diverse Risk Factors to Drive a SAFER Future,” a Present...
 
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
“MIPI CSI-2 Image Sensor Interface Standard Features Enable Efficient Embedde...
 
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
“Introduction to the CSI-2 Image Sensor Interface Standard,” a Presentation f...
 
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
“Practical Approaches to DNN Quantization,” a Presentation from Magic Leap
 
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ..."Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
"Optimizing Image Quality and Stereo Depth at the Edge," a Presentation from ...
 
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
“Using a Collaborative Network of Distributed Cameras for Object Tracking,” a...
 
“A Survey of Model Compression Methods,” a Presentation from Instrumental
“A Survey of Model Compression Methods,” a Presentation from Instrumental“A Survey of Model Compression Methods,” a Presentation from Instrumental
“A Survey of Model Compression Methods,” a Presentation from Instrumental
 
“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI
“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI
“Reinventing Smart Cities with Computer Vision,” a Presentation from Hayden AI
 

Recently uploaded

Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Recently uploaded (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 

"Five+ Techniques for Efficient Implementation of Neural Networks," a Presentation from Synopsys

  • 1. © 2019 Synopsys 5+ Techniques for Efficient Implementation of Neural Networks Bert Moons Synopsys May 2019
  • 2. © 2019 Synopsys Introduction -- Challenges of Embedding Deep Learning
  • 3. © 2019 Synopsys Neural Network accuracy comes at a high cost in terms of model storage and operations per input feature 3 Major challenges Introduction – Challenges of embedded deep learning Many embedded applications require real-time operation on high- dimensional, large input data from various input sources Many embedded applications require support for a variety of networks: CNN’s in feature extraction, RNN’s in sequence modeling
  • 4. © 2019 Synopsys 3 Major challenges Introduction – Challenges of embedded deep learning 1. Many operations per pixel 2. Process a lot of pixels in real-time 3. A large variation of different algorithms
  • 5. © 2019 Synopsys Classification accuracy comes at a cost Introduction – Challenges of embedded deep learning Conventional Machine Learning Deep Learning Human Bestreportedtop-5accuracy onIMAGENET-1000[%] Neural network accuracy comes at a cost of a high workload per input pixel and huge model sizes and bandwidth requirements
  • 6. © 2019 Synopsys Computing on large input data Introduction – Challenges of embedded deep learning 4KFHDIMAGENET 1X 40X 160X Embedded applications require real-time operation on large input frames
  • 7. © 2019 Synopsys Massive workload in real-time applications Introduction – Challenges of embedded deep learning 1GOP 1TOP Top-1IMAGENETaccuracy[%] 70 75 65 Single ImageNet Image 1GOP to 10GOP per IMAGENET image # operations / ImageNet image1GOP/s 1TOP/s Top-1IMAGENETaccuracy[%] 70 75 65 6 Cameras 30fps Full HD Image 5-to-180 TOPS @ 30 fps, FHD, ADAS # operations / second MobileNet V2 ResNet-50 GoogleNet VGG-16
  • 8. © 2019 Synopsys 5+ Techniques to reduce the DNN workload A. Neural Networks are error-tolerant Introduction – Challenges of embedded deep learning 1. Linear post-training 8/12/16b quantization 2. Linear trained 2/4/8 bit quantization 3. Non-linear trained 2/4/8 bit quantization through clustering C. Neural Networks have sparse and correlated intermediate results B. Neural Networks have redundancies and are over-dimensioned 4. Network pruning and compression 5. Network decomposition: low-rank network approximations 6. Sparsity and correlation based feature map compression
  • 9. © 2019 Synopsys A. Neural Networks Are Error- Tolerant 9
  • 10. © 2019 Synopsys The benefits of quantized number representations 5 Techniques – A. Neural Networks Are Error-Tolerant 8 bit fixed is 3-4x faster, 2-4x more efficient than 16b floating point Energy consumption per unit Processing units per chip Classification time per chip * [Choi,2019] 16b float 8b fixed O(1) relative fps O(16) relative fps 4b fixed O(256) relative fps ~ 16 ~ 6-8 ~ 2-4 Relative accuracy 100% no loss 99% 50-95%
  • 11. © 2019 Synopsys Linear post-training quantization 5 Techniques – A. Neural Networks Are Error-Tolerant Convert floating point pretrained models to Dynamic Fixed Point 0 1 1 0 1 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 1 1 1 0 0 1 0 1 0 0 0 1 1 0 1 1 0 0 1 1 1 0 1 0 1 0 1 1 1 0 0 0 0 1 0 0 0 1 0 1 1 0 0 1 1 1 1 1 0 0 1 0 1 0 0 Fixed Point Dynamic Fixed Point 0 1 1 0 1System Exponent Group 1 Exponent Group 2 Exponent * [Courbariaux,2019]
  • 12. © 2019 Synopsys Linear post-training quantization 5 Techniques – A. Neural Networks Are Error-Tolerant Dynamic Fixed-Point Quantization allows running neural networks with 8 bit weights and activations across the board 32 bit float baseline 8 bit fixed point * [Nvidia,2017]
  • 13. © 2019 Synopsys Linear post-training quantization 5 Techniques – A. Neural Networks Are Error-Tolerant How to optimally choose: dynamic exponent groups, saturation thresholds, weight and activation exponents? Min-max scaling throws away small values A saturation threshold better represents small values, but clips large values * [Nvidia,2017]
  • 14. © 2019 Synopsys Linear trained quantization 5 Techniques – A. Neural Networks Are Error-Tolerant Floating point models are a bad initializer for low-precision fixed-point. Trained quantization from scratch automates heuristic-based optimization. Quantizing weights and activations with straight- through estimators, allowing back-prop + Train saturation range for activations Forward Backward * PACT, Parametrized Clipping Activation [Choi,2019] *
  • 15. © 2019 Synopsys Linear trained quantization 5 Techniques – A. Neural Networks Are Error-Tolerant Good accuracy down to 2b Graceful performance degradation 0.85 0.9 0.95 1 1.05 CIFAR10 SVHN AlexNet ResNet18 ResNet50 full precision 5b 4b 3b 2b * [Choi,2018] Relativebenchmark accuracyvsfloatbaseline*
  • 16. © 2019 Synopsys Non-linear trained quantization – codebook clustering 5 Techniques – A. Neural Networks Are Error-Tolerant Clustered, codebook quantization can be optimally trained. This only reduces bandwidth, computations are still in floating point. * [Han,2015]
  • 17. © 2019 Synopsys B. Neural Networks Are Over- Dimensioned & Redundant 17
  • 18. © 2019 Synopsys Pruning Neural Networks 5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant Pruning removes unnecessary connections in the neural network. Accuracy is recovered through retraining the pruned network * [Han,2015]
  • 19. © 2019 Synopsys Low Rank Singular Value Decomposition (SVD) in DNNs 5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant Many singular values are small and can be discarded * [Xue,2013] 𝑨 = 𝑼 𝜮 𝑽 𝑻 𝑨 ≅ 𝑼′𝜮′𝑽′ 𝑻 = 𝑵𝑼
  • 20. © 2019 Synopsys Low Rank Canonical Polyadic (CP) decomp. in CNNs 5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant Convert a large convolutional filter in a triplet of smaller filters * [Astrid,2017]
  • 21. © 2019 Synopsys Basic example: Combining SVD, pruning and clustering 5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant 11x model compression in a phone-recognition LSTM 0 2 4 6 8 10 12 Base P SVD SVD+P P+C SVD+P+C LSTMCompressionRate (P) (C) * [Goetschalckx,2018] P = Pruning SVD = Singular Value Decomposition C = Clustering / Codebook Compression
  • 22. © 2019 Synopsys C. Neural Networks have Sparse & Correlated Intermediate Results 22
  • 23. © 2019 Synopsys Sparse feature-map compression 5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps Feature map bandwidth dominates in modern CNNs 0 2 4 6 8 10 12 Coefficient BW Feature Map BW BWinMObileNet-V1[MB] 1x1, 32 3x3DW, 32 1x1, 64 32
  • 24. © 2019 Synopsys Sparse feature-map compression 5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps ReLU activation introduces 50-90% zero-valued numbers in intermediate feature maps -5 4 12 -10 0 17 -1 3 2 0 4 12 0 0 17 0 3 2 ReLU activation 8b Features 8b Features
  • 25. © 2019 Synopsys Sparse feature-map compression 5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps Hardware support for multi-bit Huffman encoding allows up to 2x bandwidth reduction in typical networks. Zero-runlength encoding as in [Chen, 2016], Huffman-encoding as in [Moons, 2017] 0 4 12 0 0 17 0 3 2 8b Features 72b Huffman Features zero 2’b00 <16 2’b01, 4’b WORD nonzero 1’b1, 8’b WORD 41b < 72b
  • 26. © 2019 Synopsys Correlated feature-map compression 5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps Intermediate features in the same channel-plane are highly correlated Intermediate featuremaps in ReLU-less YOLOV2 An example featuremap scale1 An example featuremap scale9
  • 27. © 2019 Synopsys Correlated feature-map compression 5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps Super-linear correlation based extended bit-plane compression allows feature-map compression even on non-sparse data * [Cavigelli,2018] 0,4,0,0,0,8,0,0,71 Zero Values Non-Zero Values Correlated Values: 16, 20, 20, 20, 28, 28, 28, 99 Delta Values: 0, 4, 0, 0, 8, 0, 0, 71
  • 28. © 2019 Synopsys Correlated feature-map compression 5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps Correlated compression outperforms sparsity-based compression 0 0.5 1 1.5 2 2.5 3 Mobilenet ResNet-50 Yolo V2 VOC VGG-16 CompressionRate Sparsity-based Correlation-based
  • 29. © 2019 Synopsys Conclusion – Bringing it All Together
  • 30. © 2019 Synopsys A first-order analysis on ResNet-50 5 Techniques – Conclusion: bringing it all together A first-order energy model for Neural Network Inference Assume: • Quadratic energy scaling / MAC when going from 32 to 8 bit. • Linear energy saving / read-write in DDR/SRAM when going from 32 to 8 bit • 50% of coefficients zero when pruning • 50% compute reduction under decomposition • 50% of activations can be compressed *[Han,2015] *
  • 31. © 2019 Synopsys A first-order analysis on ResNet-50 5 Techniques – Conclusion: bringing it all together When all model data is stored in DRAM optimized ResNet-50 is 10x more efficient than its plain 32b counterpart O(65MB) DDR / frame O(1GB) SRAM / frame O(3.6G) MACS / frame 10x 100% 22% 16% 11% 0% 20% 40% 60% 80% 100% 120% 32b float A. 8b fixed B. Decomposition + Pruning C. Featuremap Compression RelativeEnergy Consumption
  • 32. © 2019 Synopsys A first-order analysis on ResNet-50 5 Techniques – Conclusion: bringing it all together In a system with sufficient on-chip SRAM, optimized ResNet-50 is 12.5x more efficient than its plain 32b counterpart O(0MB) DDR / frame O(1GB) SRAM / frame O(3.6G) MACS / frame 100% 15% 9% 8% 0% 20% 40% 60% 80% 100% 120% 32b float A. 8b fixed B. Decomposition + Pruning C. Featuremap Compression RelativeEnergy Consumption 12.5x
  • 33. © 2019 Synopsys For More Information Visit the Synopsys booth for demos on Automotive ADAS, Virtual Reality & More 33 EV6x Embedded Vision Processor IP with Safety Enhancement Package • Thursday, May 23 • Santa Clara Convention Center • Doors open 8 AM • Sessions on EV6x Vision Processor IP, Functional Safety, Security, OpenVX… • Register via the EV Alliance website or at Synopsys Booth Join Synopsys’ EV Seminar on Thursday Navigating Embedded Vision at the Edge B E S T P R O C E S S O R
  • 34. © 2019 Synopsys References 34 [Han,2015,2016] https://arxiv.org/abs/1510.00149 https://arxiv.org/abs/1602.01528 [Xue, 2013] https://www.microsoft.com/en-us/research/wp- content/uploads/2013/01/svd_v2.pdf [Nvidia, 2017] http://on-demand.gputechconf.com/gtc /2017/presentation/s7310-8-bit-inference-with- tensorrt.pdf [Choi, 2018, 2019] https://arxiv.org/abs/1805.06085 https://www.ibm.com/blogs/research/2019/04/2-bit- precision/ [Goetschalckx, 2018] https://www.sigmobile.org/mobisys/2018/workshops/ deepmobile18/papers/Efficiently_Combining_SVD_Pru ning_Clustering_Retraining.pdf [Astrid,2017] https://arxiv.org/abs/1701.07148 [Moons,2017] https://ieeexplore.ieee.org/abstract/document/78703 53 [Chen,2016] http://eyeriss.mit.edu/ [Cavigelli, 2018] https://arxiv.org/abs/1810.03979 [Courbariaux, 2014] https://arxiv.org/pdf/1412.7024.pdf Embedded Vision Summit Bert Moons -- 5+ Techniques for Efficient Implementations of Neural Networks May 2019
  • 36. © 2019 Synopsys 5+ Techniques for Efficient Implementation of Neural Networks Bert Moons Synopsys May 2019
  • 37. © 2019 Synopsys Introduction -- Challenges of Embedding Deep Learning
  • 38. © 2019 Synopsys Neural Network accuracy comes at a high cost in terms of model storage and operations per input feature 3 Major challenges Introduction – Challenges of embedded deep learning
  • 39. © 2019 Synopsys Neural Network accuracy comes at a high cost in terms of model storage and operations per input feature 3 Major challenges Introduction – Challenges of embedded deep learning Many embedded applications require real-time operation on high- dimensional, large input data from various input sources
  • 40. © 2019 Synopsys Neural Network accuracy comes at a high cost in terms of model storage and operations per input feature 3 Major challenges Introduction – Challenges of embedded deep learning Many embedded applications require real-time operation on high- dimensional, large input data from various input sources Many embedded applications require support for a variety of networks: CNN’s in feature extraction, RNN’s in sequence modeling
  • 41. © 2019 Synopsys Neural Network accuracy comes at a high cost in terms of model storage and operations per input feature 3 Major challenges Introduction – Challenges of embedded deep learning Many embedded applications require real-time operation on high- dimensional, large input data from various input sources Many embedded applications require support for a variety of networks: CNN’s in feature extraction, RNN’s in sequence modeling 1. Many operations per pixel 2. Process a lot of pixels in real-time
  • 42. © 2019 Synopsys Neural Network accuracy comes at a high cost in terms of model storage and operations per input feature 3 Major challenges Introduction – Challenges of embedded deep learning Many embedded applications require real-time operation on high- dimensional, large input data from various input sources Many embedded applications require support for a variety of networks: CNN’s in feature extraction, RNN’s in sequence modeling 1. Many operations per pixel 2. Process a lot of pixels in real-time 3. A large variation of different algorithms
  • 43. © 2019 Synopsys Neural Network accuracy comes at a high cost in terms of model storage and operations per input feature 3 Major challenges Introduction – Challenges of embedded deep learning Many embedded applications require real-time operation on high- dimensional, large input data from various input sources Many embedded applications require support for a variety of networks: CNN’s in feature extraction, RNN’s in sequence modeling 1. Many operations per pixel 2. Process a lot of pixels in real-time 3. A large variation of different algorithms
• 44. © 2019 Synopsys Classification accuracy comes at a cost Introduction – Challenges of embedded deep learning [Chart: best reported top-5 accuracy on IMAGENET-1000 [%] for conventional machine learning, deep learning and human performance] Neural network accuracy comes at a cost of a high workload per input pixel and huge model sizes and bandwidth requirements
• 45. © 2019 Synopsys Computing on large input data Introduction – Challenges of embedded deep learning [Diagram: relative pixel count per frame – IMAGENET 1X, FHD 40X, 4K 160X] Embedded applications require real-time operation on large input frames
• 46. © 2019 Synopsys Massive workload in real-time applications Introduction – Challenges of embedded deep learning [Chart: top-1 IMAGENET accuracy [%] (65-75) vs # operations per ImageNet image (1 GOP to 1 TOP) for MobileNet V2, GoogleNet, ResNet-50 and VGG-16] 1GOP to 10GOP per IMAGENET image
• 47. © 2019 Synopsys Massive workload in real-time applications Introduction – Challenges of embedded deep learning [Charts: top-1 IMAGENET accuracy [%] vs # operations per ImageNet image (single image, 1 GOP to 1 TOP) and vs # operations per second (6 cameras, 30 fps, Full HD) for MobileNet V2, GoogleNet, ResNet-50 and VGG-16] 5-to-180 TOPS @ 30 fps, FHD, ADAS
  • 48. © 2019 Synopsys 5+ Techniques to reduce the DNN workload A. Neural Networks are error-tolerant Introduction – Challenges of embedded deep learning C. Neural Networks have sparse and correlated intermediate results B. Neural Networks have redundancies and are over-dimensioned
  • 49. © 2019 Synopsys 5+ Techniques to reduce the DNN workload A. Neural Networks are error-tolerant Introduction – Challenges of embedded deep learning 1. Linear post-training 8/12/16b quantization 2. Linear trained 2/4/8 bit quantization 3. Non-linear trained 2/4/8 bit quantization through clustering C. Neural Networks have sparse and correlated intermediate results B. Neural Networks have redundancies and are over-dimensioned
  • 50. © 2019 Synopsys 5+ Techniques to reduce the DNN workload A. Neural Networks are error-tolerant Introduction – Challenges of embedded deep learning 1. Linear post-training 8/12/16b quantization 2. Linear trained 2/4/8 bit quantization 3. Non-linear trained 2/4/8 bit quantization through clustering C. Neural Networks have sparse and correlated intermediate results B. Neural Networks have redundancies and are over-dimensioned 4. Network pruning and compression 5. Network decomposition: low-rank network approximations
  • 51. © 2019 Synopsys 5+ Techniques to reduce the DNN workload A. Neural Networks are error-tolerant Introduction – Challenges of embedded deep learning 1. Linear post-training 8/12/16b quantization 2. Linear trained 2/4/8 bit quantization 3. Non-linear trained 2/4/8 bit quantization through clustering C. Neural Networks have sparse and correlated intermediate results B. Neural Networks have redundancies and are over-dimensioned 4. Network pruning and compression 5. Network decomposition: low-rank network approximations 6. Sparsity and correlation based feature map compression
• 52. © 2019 Synopsys A. Neural Networks Are Error-Tolerant 52
• 53. © 2019 Synopsys The benefits of quantized number representations 5 Techniques – A. Neural Networks Are Error-Tolerant 8 bit fixed is 3-4x faster, 2-4x more efficient than 16b floating point * [Choi,2019] Lower energy consumption per unit allows more processing units per chip and a shorter classification time: 16b float – relative energy per unit ~16, O(1) relative fps, 100% relative accuracy (no loss); 8b fixed – relative energy per unit ~6-8, O(16) relative fps, 99% relative accuracy; 4b fixed – relative energy per unit ~2-4, O(256) relative fps, 50-95% relative accuracy
• 54. © 2019 Synopsys Linear post-training quantization 5 Techniques – A. Neural Networks Are Error-Tolerant Convert floating point pretrained models to Dynamic Fixed Point [Diagram: in plain Fixed Point all values share a single System Exponent; in Dynamic Fixed Point the values are split into groups, each carrying its own exponent (Group 1 Exponent, Group 2 Exponent)] * [Courbariaux,2014]
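To make the dynamic fixed-point idea concrete, below is a minimal NumPy sketch (not Synopsys tooling): each group of values shares one power-of-two exponent, chosen so that the group's largest magnitude still fits an 8-bit signed mantissa. The function name and the group size of 64 are illustrative assumptions.

```python
import numpy as np

def to_dynamic_fixed_point(x, num_bits=8, group_size=64):
    """Quantize a tensor to dynamic fixed point: values are split into
    groups, and each group shares one power-of-two exponent chosen so
    that its largest magnitude just fits the num_bits signed mantissa."""
    qmax = 2 ** (num_bits - 1) - 1
    flat = x.ravel()
    groups = np.array_split(flat, max(1, flat.size // group_size))
    dequant, exponents = [], []
    for g in groups:
        # smallest integer exponent e with max|g| / 2**e <= qmax
        e = int(np.ceil(np.log2(np.abs(g).max() / qmax + 1e-30)))
        q = np.clip(np.round(g / 2.0 ** e), -qmax - 1, qmax)
        dequant.append(q * 2.0 ** e)     # value represented by the fixed-point word
        exponents.append(e)
    return np.concatenate(dequant).reshape(x.shape), exponents

w = np.random.randn(4, 64).astype(np.float32)
w_dfp, exps = to_dynamic_fixed_point(w)   # each group gets its own exponent
```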
• 55. © 2019 Synopsys Linear post-training quantization 5 Techniques – A. Neural Networks Are Error-Tolerant Dynamic Fixed-Point Quantization allows running neural networks with 8 bit weights and activations across the board [Table: per-network accuracy, 32 bit float baseline vs 8 bit fixed point] * [Nvidia,2017]
• 56. © 2019 Synopsys Linear post-training quantization 5 Techniques – A. Neural Networks Are Error-Tolerant How to optimally choose dynamic exponent groups, saturation thresholds, and weight and activation exponents? Min-max scaling throws away small values; a saturation threshold better represents small values, but clips large values * [Nvidia,2017]
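As a rough illustration of this trade-off, a minimal NumPy sketch of symmetric post-training quantization with a saturation threshold; the percentile-based clipping heuristic is an illustrative choice, not the calibration method of [Nvidia,2017].

```python
import numpy as np

def quantize_linear(x, num_bits=8, clip_percentile=99.9):
    """Symmetric linear quantization with a saturation threshold:
    instead of scaling to the absolute maximum (min-max), clip at a
    percentile of |x| so small values keep resolution, at the cost of
    saturating rare outliers."""
    threshold = np.percentile(np.abs(x), clip_percentile)
    qmax = 2 ** (num_bits - 1) - 1             # 127 for 8 bit
    scale = max(threshold, 1e-12) / qmax
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int32)
    return q, scale                            # dequantize as q * scale

w = np.random.randn(4096).astype(np.float32)  # stand-in for pretrained weights
q, scale = quantize_linear(w)
print("max abs error:", np.abs(w - q * scale).max())
```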
• 57. © 2019 Synopsys Linear trained quantization 5 Techniques – A. Neural Networks Are Error-Tolerant Floating point models are a bad initializer for low-precision fixed-point. Trained quantization from scratch automates heuristic-based optimization. Quantizing weights and activations with straight-through estimators, allowing back-prop Forward Backward * PACT, Parametrized Clipping Activation [Choi,2019]
• 58. © 2019 Synopsys Linear trained quantization 5 Techniques – A. Neural Networks Are Error-Tolerant Floating point models are a bad initializer for low-precision fixed-point. Trained quantization from scratch automates heuristic-based optimization. Quantizing weights and activations with straight-through estimators, allowing back-prop + Train saturation range for activations Forward Backward * PACT, Parametrized Clipping Activation [Choi,2019]
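Below is a rough NumPy sketch of the PACT-style forward/backward pair described on these slides; in a real framework the backward pass would be generated by autograd, and the function names and the 2-bit default are illustrative only.

```python
import numpy as np

def pact_forward(x, alpha, num_bits=2):
    """Forward pass: clip activations to [0, alpha] (PACT), then
    quantize uniformly to 2**num_bits - 1 steps."""
    steps = 2 ** num_bits - 1
    scale = alpha / steps
    return np.round(np.clip(x, 0.0, alpha) / scale) * scale

def pact_backward(grad_out, x, alpha):
    """Backward pass with a straight-through estimator: round() is
    treated as identity, so only the clipping contributes gradients.
    Activations inside [0, alpha] pass the gradient to x; saturated
    activations pass it to the trainable clipping threshold alpha."""
    grad_x = grad_out * ((x > 0.0) & (x < alpha))
    grad_alpha = np.sum(grad_out * (x >= alpha))
    return grad_x, grad_alpha
```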
• 62. © 2019 Synopsys Linear trained quantization 5 Techniques – A. Neural Networks Are Error-Tolerant Good accuracy down to 2b Graceful performance degradation [Chart: relative benchmark accuracy vs float baseline (0.85-1.05) for CIFAR10, SVHN, AlexNet, ResNet18 and ResNet50 at full precision, 5b, 4b, 3b and 2b] * [Choi,2018]
• 63. © 2019 Synopsys Non-linear trained quantization – codebook clustering 5 Techniques – A. Neural Networks Are Error-Tolerant Clustered, codebook quantization can be optimally trained. This only reduces bandwidth; computations are still performed in floating point. * [Han,2015]
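As an illustration of codebook quantization, a small k-means sketch in NumPy (not the trained clustering of [Han,2015]): each weight is replaced by an index into a shared 16-entry codebook, so only the codebook and the indices need to be stored and moved, while the reconstructed values used for compute remain floating point.

```python
import numpy as np

def cluster_weights(w, num_bits=4, iters=20):
    """Non-linear quantization by k-means: map every weight to the
    nearest of 2**num_bits shared centroids (the codebook)."""
    k = 2 ** num_bits
    flat = w.ravel()
    centroids = np.linspace(flat.min(), flat.max(), k)   # linear init
    for _ in range(iters):
        idx = np.argmin(np.abs(flat[:, None] - centroids[None, :]), axis=1)
        for c in range(k):
            if np.any(idx == c):
                centroids[c] = flat[idx == c].mean()
    return centroids, idx.reshape(w.shape)

w = np.random.randn(64, 64).astype(np.float32)
codebook, idx = cluster_weights(w)        # 16 floats + 4-bit indices per weight
w_hat = codebook[idx]                     # reconstructed floating-point weights
```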
• 64. © 2019 Synopsys B. Neural Networks Are Over-Dimensioned & Redundant 64
  • 65. © 2019 Synopsys Pruning Neural Networks 5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant Pruning removes unnecessary connections in the neural network. Accuracy is recovered through retraining the pruned network * [Han,2015]
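A minimal magnitude-pruning sketch, assuming the simplest criterion (drop the smallest absolute values); the retraining step that recovers accuracy is only indicated in the comment.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude fraction of the weights and
    return the mask, so retraining can keep pruned weights at zero."""
    k = int(sparsity * w.size)
    threshold = np.sort(np.abs(w), axis=None)[k]
    mask = np.abs(w) >= threshold
    return w * mask, mask

w = np.random.randn(256, 256).astype(np.float32)
w_pruned, mask = magnitude_prune(w, sparsity=0.5)
print("remaining non-zeros:", mask.mean())
# typical loop: prune -> retrain with the mask applied -> prune further
```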
• 66. © 2019 Synopsys Low Rank Singular Value Decomposition (SVD) in DNNs 5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant Many singular values are small and can be discarded * [Xue,2013] A = U Σ Vᵀ; keeping only the largest singular values gives A ≅ U′ Σ′ V′ᵀ, i.e. one large layer matrix is replaced by the product of two much smaller matrices
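A small NumPy sketch of the SVD trick: one large fully-connected weight matrix is truncated to rank r and replaced by two thinner layers. The layer size and rank are arbitrary example values.

```python
import numpy as np

def svd_compress(W, rank):
    """Replace one m x n weight matrix by two low-rank factors,
    i.e. one large layer by two thinner ones: y = W x  ->  y = U_r (V_r x)."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]          # m x r, singular values folded in
    V_r = Vt[:rank, :]                    # r x n
    return U_r, V_r

W = np.random.randn(1024, 1024)
U_r, V_r = svd_compress(W, rank=128)      # 1.05M params -> 0.26M (~4x smaller)
x = np.random.randn(1024)
rel_err = np.linalg.norm(W @ x - U_r @ (V_r @ x)) / np.linalg.norm(W @ x)
```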
• 67. © 2019 Synopsys Low Rank Canonical Polyadic (CP) decomp. in CNNs 5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant Convert a large convolutional filter into a triplet of smaller filters * [Astrid,2017]
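The benefit can be estimated with simple MAC counting. The sketch below assumes one common CP-style factorization into a 1x1, a Kx1, a 1xK and a 1x1 convolution; the exact factorization in [Astrid,2017] may differ, and the layer dimensions and rank are hypothetical.

```python
def conv_macs(c_in, c_out, k, h, w):
    """MACs for a standard k x k convolution producing an h x w output."""
    return c_in * c_out * k * k * h * w

def cp_macs(c_in, c_out, k, h, w, rank):
    """MACs after a rank-R CP-style factorization into a chain of
    1x1, kx1, 1xk and 1x1 convolutions."""
    return (c_in * rank + rank * k + rank * k + rank * c_out) * h * w

full = conv_macs(256, 256, 3, 56, 56)
low = cp_macs(256, 256, 3, 56, 56, rank=64)
print("speed-up for this hypothetical layer:", full / low)
```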
• 68. © 2019 Synopsys Basic example: Combining SVD, pruning and clustering 5 Techniques – B. Neural Networks are Over-Dimensioned & Redundant 11x model compression in a phone-recognition LSTM [Chart: LSTM compression rate (up to 12x) for Base, P, SVD, SVD+P, P+C and SVD+P+C; P = Pruning, SVD = Singular Value Decomposition, C = Clustering / Codebook Compression] * [Goetschalckx,2018]
  • 69. © 2019 Synopsys C. Neural Networks have Sparse & Correlated Intermediate Results 69
• 70. © 2019 Synopsys Sparse feature-map compression 5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps Feature map bandwidth dominates in modern CNNs [Chart: bandwidth in MobileNet-V1 [MB] – coefficient BW vs feature map BW; diagram of a 1x1 (32), 3x3 depthwise (32), 1x1 (64) layer sequence]
• 71. © 2019 Synopsys Sparse feature-map compression 5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps ReLU activation introduces 50-90% zero-valued numbers in intermediate feature maps. Example 8b feature map [-5 4 12; -10 0 17; -1 3 2] becomes [0 4 12; 0 0 17; 0 3 2] after ReLU
• 72. © 2019 Synopsys Sparse feature-map compression 5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps Hardware support for multi-bit Huffman encoding allows up to 2x bandwidth reduction in typical networks. Zero-runlength encoding as in [Chen, 2016], Huffman-encoding as in [Moons, 2017]. Example code: zero → 2'b00; non-zero value <16 → 2'b01 + 4-bit word; other non-zero → 1'b1 + 8-bit word. The 9 x 8b = 72b feature map [0 4 12 0 0 17 0 3 2] encodes in 41b < 72b
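The 41-bit figure can be reproduced with a few lines of Python; the code table is taken from the slide, the function itself is only an illustration.

```python
def encoded_bits(features):
    """Bit cost of the 3-symbol code above: a zero costs 2 bits, a small
    non-zero value (<16) costs 2 + 4 bits, any other value 1 + 8 bits.
    Raw storage is 8 bits per feature."""
    bits = 0
    for v in features:
        if v == 0:
            bits += 2
        elif v < 16:
            bits += 2 + 4
        else:
            bits += 1 + 8
    return bits

fmap = [0, 4, 12, 0, 0, 17, 0, 3, 2]          # post-ReLU example above
print(encoded_bits(fmap), "bits vs", 8 * len(fmap), "bits raw")   # 41 vs 72
```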
• 73. © 2019 Synopsys Correlated feature-map compression 5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps Intermediate features in the same channel-plane are highly correlated [Images: intermediate feature maps in ReLU-less YOLOv2, example feature maps at scale 1 and scale 9]
• 74. © 2019 Synopsys Correlated feature-map compression 5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps Super-linear correlation based extended bit-plane compression allows feature-map compression even on non-sparse data * [Cavigelli,2018] Example: correlated values 16, 20, 20, 20, 28, 28, 28, 99 become delta values 0, 4, 0, 0, 8, 0, 0, 71 – mostly zeros and small non-zero values
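A minimal sketch of the delta step that correlation-based compression builds on; this is not the extended bit-plane scheme of [Cavigelli,2018], only the intuition behind it.

```python
import numpy as np

def delta_encode(row):
    """Exploit spatial correlation: store the first value plus the
    differences between horizontal neighbours. Correlated data turns
    into mostly zeros and small deltas, which compress well even when
    the original feature map has no exact zeros (e.g. without ReLU)."""
    row = np.asarray(row, dtype=np.int32)
    return row[0], np.diff(row)

first, deltas = delta_encode([16, 20, 20, 20, 28, 28, 28, 99])
# deltas -> [4, 0, 0, 8, 0, 0, 71]: a sparsity- or entropy-based coder
# can now pack these much more tightly than the original values.
```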
• 75. © 2019 Synopsys Correlated feature-map compression 5 Techniques – C. Neural Networks have Sparse, Correlated Intermediate Feature Maps Correlated compression outperforms sparsity-based compression [Chart: compression rate (up to ~3x), sparsity-based vs correlation-based, for MobileNet, ResNet-50, YOLO V2 VOC and VGG-16]
  • 76. © 2019 Synopsys Conclusion – Bringing it All Together
• 77. © 2019 Synopsys A first-order analysis on ResNet-50 5 Techniques – Conclusion: bringing it all together A first-order energy model for Neural Network Inference Assume: • Quadratic energy scaling per MAC when going from 32 to 8 bit • Linear energy saving per read/write in DDR/SRAM when going from 32 to 8 bit • 50% of coefficients zero when pruning • 50% compute reduction under decomposition • 50% of activations can be compressed * [Han,2015]
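A rough script version of this first-order model is sketched below. The per-operation energy constants are illustrative placeholders (textbook orders of magnitude), not measured Synopsys numbers, and letting feature-map compression halve only the SRAM traffic is our simplifying assumption.

```python
# Illustrative 32-bit energies in pJ per MAC / DDR access / SRAM access.
E_MAC, E_DDR, E_SRAM = 3.1, 640.0, 5.0

def resnet50_energy(bits=32, pruned=False, decomposed=False, fmap_comp=False,
                    ddr_bytes=65e6, sram_bytes=1e9, macs=3.6e9):
    """First-order energy estimate following the assumptions above."""
    ddr_acc, sram_acc = ddr_bytes / 4, sram_bytes / 4   # 32-bit accesses
    e_mac = E_MAC * (bits / 32) ** 2        # quadratic scaling per MAC
    e_ddr = E_DDR * (bits / 32)             # linear scaling per access
    e_sram = E_SRAM * (bits / 32)
    if pruned:        # 50% of coefficients zero: fewer MACs, less weight traffic
        macs *= 0.5
        ddr_acc *= 0.5
    if decomposed:    # 50% compute reduction from low-rank decomposition
        macs *= 0.5
    if fmap_comp:     # assumption: compressed activations halve on-chip traffic
        sram_acc *= 0.5
    return macs * e_mac + ddr_acc * e_ddr + sram_acc * e_sram

base = resnet50_energy()
opt = resnet50_energy(bits=8, pruned=True, decomposed=True, fmap_comp=True)
print("relative energy of the optimized network:", opt / base)
```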
• 78. © 2019 Synopsys A first-order analysis on ResNet-50 5 Techniques – Conclusion: bringing it all together When all model data is stored in DRAM, the optimized ResNet-50 is 10x more efficient than its plain 32b counterpart. O(65MB) DDR / frame, O(1GB) SRAM / frame, O(3.6G) MACs / frame [Chart: relative energy consumption – 32b float 100%, A. 8b fixed 22%, B. Decomposition + Pruning 16%, C. Featuremap Compression 11% (10x)]
• 79. © 2019 Synopsys A first-order analysis on ResNet-50 5 Techniques – Conclusion: bringing it all together In a system with sufficient on-chip SRAM, the optimized ResNet-50 is 12.5x more efficient than its plain 32b counterpart. O(0MB) DDR / frame, O(1GB) SRAM / frame, O(3.6G) MACs / frame [Chart: relative energy consumption – 32b float 100%, A. 8b fixed 15%, B. Decomposition + Pruning 9%, C. Featuremap Compression 8% (12.5x)]
• 80. © 2019 Synopsys For More Information Visit the Synopsys booth for demos on Automotive ADAS, Virtual Reality & More 80 Join Synopsys' EV Seminar on Thursday: Navigating Embedded Vision at the Edge • Thursday, May 23 • Santa Clara Convention Center • Doors open 8 AM • Sessions on EV6x Vision Processor IP, Functional Safety, Security, OpenVX… • Register via the EV Alliance website or at the Synopsys Booth EV6x Embedded Vision Processor IP with Safety Enhancement Package
• 81. © 2019 Synopsys References 81 [Han, 2015, 2016] https://arxiv.org/abs/1510.00149 , https://arxiv.org/abs/1602.01528 [Xue, 2013] https://www.microsoft.com/en-us/research/wp-content/uploads/2013/01/svd_v2.pdf [Nvidia, 2017] http://on-demand.gputechconf.com/gtc/2017/presentation/s7310-8-bit-inference-with-tensorrt.pdf [Choi, 2018, 2019] https://arxiv.org/abs/1805.06085 , https://www.ibm.com/blogs/research/2019/04/2-bit-precision/ [Goetschalckx, 2018] https://www.sigmobile.org/mobisys/2018/workshops/deepmobile18/papers/Efficiently_Combining_SVD_Pruning_Clustering_Retraining.pdf [Astrid, 2017] https://arxiv.org/abs/1701.07148 [Moons, 2017] https://ieeexplore.ieee.org/abstract/document/7870353 [Chen, 2016] http://eyeriss.mit.edu/ [Cavigelli, 2018] https://arxiv.org/abs/1810.03979 [Courbariaux, 2014] https://arxiv.org/pdf/1412.7024.pdf Embedded Vision Summit Bert Moons -- 5+ Techniques for Efficient Implementation of Neural Networks May 2019