This document surveys approaches to neural network compression, a key enabler for running AI models on resource-constrained edge devices. It describes how techniques such as quantization, pruning and weight sharing, tensor decomposition, and network transformation reduce model size and computational cost by approximating weights and filters. The goal is to compress pre-trained models for more efficient inference while preserving accuracy; combining multiple compression methods can yield further reductions in model complexity.
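As a concrete illustration of one of the techniques named above, the sketch below shows uniform symmetric post-training quantization of a weight matrix to int8. This is a minimal example, not the document's specific method: the random matrix `w` stands in for a trained layer's weights, and the single-scale scheme is one of several quantization variants.

```python
import numpy as np

def quantize_int8(weights):
    """Map float weights into int8 via a single symmetric scale factor."""
    scale = np.max(np.abs(weights)) / 127.0
    q = np.round(weights / scale).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover a float approximation of the original weights."""
    return q.astype(np.float32) * scale

# Hypothetical weight matrix standing in for a pre-trained layer.
rng = np.random.default_rng(0)
w = rng.standard_normal((64, 64)).astype(np.float32)

q, s = quantize_int8(w)
w_hat = dequantize(q, s)

# int8 storage is 4x smaller than float32, and the rounding error
# of each reconstructed weight is bounded by half the scale step.
print(q.nbytes, w.nbytes)
print(float(np.max(np.abs(w - w_hat))))
```

Because each weight is rounded to the nearest quantization step, the reconstruction error is bounded by `s / 2`, which is the sense in which quantization "approximates weights" while shrinking the model by 4x here.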