Machine Learning approaches at video compression

Roberto Iacoviello
RAI - Radiotelevisione Italiana
Centre for Research, Technological Innovation and
Experimentation (CRITS)

Machine Learning is like sex in high school.
Everyone is talking about it, a few know what to
do, and only your teacher is doing it
There are two or three topics around AI.
Ethics, which sounds to me like: if we don't teach ethics to the machines, Skynet will kill us all.
Academic papers, full of mathematics and different notations. After you read them you feel like: OK, and now?
Then there is real life: sometimes it is good and sometimes it is bad.

Dear old typical hybrid block-based approach
Many new tools in VVC (Versatile Video Coding). Over 30 years the MPEG group has developed many useful standards, all based on the same scheme. Now the group is heading towards new horizons: neural networks.

Two approaches:
• Non-video approach: coded representation of neural networks
• Neural network video approach, in two variants:
  • Conservative (one to one): replace one MPEG block with one deep learning block
  • Disruptive (end to end): replace the entire MPEG chain

Non-video approach: coded representation of neural networks
Scope: representation of weights and parameters, no architecture
N18162 Marrakech

Non-video approach: coded representation of neural networks
Coded representation of the weight matrix

Coded representation of neural networks
Represent different artificial neural networks
Enable faster inference
Enable use under resource limitations

Use cases
• Inference may be performed on a large number of devices
• The NNs used in an application can be improved incrementally
• Limitations in terms of processing power and memory
• Several apps would need to store on the device the
same base neural network multiple times
W17924 Macao
Type | Parameters' size
Media content analysis | From a few KB to several hundred MB
Translation app | Currently around 200 MB
Compact Descriptors for Video Analysis (CDVA) | About 500-600 MB

MPEG Use cases
• UC10 Distributed training and evaluation of neural networks
for media content analysis
• UC11 Compact Descriptors for Video Analysis (CDVA)
• UC12 Image/Video Compression
• UC13 Distribution of neural networks for content processing
W17924 Macao

Methods
• Dropping connections
• Dropping layers
• Replacing convolutions with lower-dimensional ones (matrix decomposition)
• Changing stride in convolutions without increasing output size
• Quantization (rate-distortion based)
• Quantization using a codebook
• Entropy coding
Summary: cut something somewhere
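
As an illustration of the first item, "dropping connections", here is a minimal sketch of magnitude-based pruning in plain Python. It is not taken from the MPEG documents: the function name `prune_by_magnitude`, the `sparsity` parameter and the toy weight matrix are all assumptions made for this example.

```python
def prune_by_magnitude(weights, sparsity):
    """Zero out roughly `sparsity` (0..1) of the weights, smallest magnitude first.

    Hypothetical helper for illustration. Ties at the threshold are all
    dropped, so slightly more than the requested fraction may be removed.
    """
    magnitudes = sorted(abs(w) for row in weights for w in row)
    n_drop = int(len(magnitudes) * sparsity)
    if n_drop == 0:
        return [row[:] for row in weights]
    threshold = magnitudes[n_drop - 1]
    return [[0.0 if abs(w) <= threshold else w for w in row] for row in weights]


# Toy 2 x 3 weight matrix (made-up values).
weights = [[0.8, -0.05, 0.3],
           [0.01, -0.6, 0.02]]
pruned = prune_by_magnitude(weights, 0.5)
# The three smallest-magnitude connections are now zero; a sparse matrix
# like this is cheaper to store once entropy coding is applied.
```

In practice the pruned network is usually retrained to recover accuracy; that step is outside this sketch.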

Methods (M47704, Geneva)
• Uniform Quantization
• Sequential Quantization
• Nonuniform Quantization
• Low-Rank Approximation
[Figure: sequential quantization. Original Weight (32 bits); Quantization Stage 1: Quantization 1 (10 bits), DeQuantization 1; Quantization Stage 2: Quantization 2 (8 bits) gives the Compressed Model, DeQuantization 2 is applied for inference]
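
The two-stage chain in the figure (32 bits to 10 bits to 8 bits) can be sketched with plain uniform quantization. This is only a sketch under stated assumptions: the slide does not say which quantizer is used or whether retraining happens between the stages, and the names `quantize`, `dequantize` and the sample weights are hypothetical.

```python
def quantize(values, bits, lo, hi):
    """Uniformly map floats in [lo, hi] to integer codes in [0, 2**bits - 1]."""
    levels = (1 << bits) - 1
    step = (hi - lo) / levels  # assumes hi > lo
    return [min(levels, max(0, round((v - lo) / step))) for v in values]


def dequantize(codes, bits, lo, hi):
    """Map integer codes back to floats on the same uniform grid."""
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + c * step for c in codes]


# Made-up "original" weights (32-bit floats in a real model).
weights = [-1.0, -0.33, 0.0, 0.4, 1.0]
lo, hi = min(weights), max(weights)

# Stage 1: quantize to 10 bits, then dequantize.
stage1 = dequantize(quantize(weights, 10, lo, hi), 10, lo, hi)

# Stage 2: quantize the stage-1 result to 8 bits; these codes are what
# the compressed model would store.
codes8 = quantize(stage1, 8, lo, hi)

# At inference time the stored codes are dequantized back to floats.
restored = dequantize(codes8, 8, lo, hi)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The reconstruction error is bounded by half a 10-bit step plus half an 8-bit step, which is the price paid for storing one byte per weight instead of four.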
[Figure: low-rank approximation. A W x H Conv followed by ReLU is replaced by a W x 1 Conv and a 1 x H Conv followed by ReLU]
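
To show why splitting one 2-D convolution into a column convolution and a row convolution saves parameters, here is a small sketch in plain Python. It assumes the full kernel is exactly rank 1 (the outer product of the two small kernels) and leaves out the ReLU; for a general kernel the decomposition is only an approximation. The helper `conv2d` and all values are illustrative.

```python
def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation of two nested lists (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + a][j + b] * kernel[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


col = [[1], [2], [1]]   # 3 x 1 kernel: 3 parameters
row = [[1, 0, -1]]      # 1 x 3 kernel: 3 parameters
# Full 3 x 3 kernel built as the outer product of the two: 9 parameters.
full = [[c[0] * r for r in row[0]] for c in col]

# Small integer test image (made-up values), so the comparison is exact.
image = [[(3 * i + 5 * j) % 7 for j in range(6)] for i in range(5)]

direct = conv2d(image, full)                  # one 3 x 3 convolution
separable = conv2d(conv2d(image, col), row)   # 3 x 1 followed by 1 x 3
same_result = direct == separable

params_full = len(full) * len(full[0])        # 9
params_separable = len(col) + len(row[0])     # 6
```

For a k x k kernel the saving grows from k*k to 2*k parameters per filter, which is where the compression comes from.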

Methods
• “Importance” estimation step
• With proper retraining under the constraint of fixed-point weights, the model's precision can be very close to that of the floating-point model
• Quantize the coefficients with different precision for different layers

Video approach: Conservative
Neural Network based Filter for Video Coding
Core Experiment 13 on neural network based filter for video coding (JVET-N0840-v1)
Investigate the following problems:
• The impact of the NN filter position in the filter chain
• The generalization capability of the NN: performance change when the test QP is not the same as the training QP

CE13-2.1: Convolutional Neural Network Filter (CNNF) for Intra Frame
JVET-N0169
Over VTM-4.0, All Intra
Configuration | Y | U | V | EncT | DecT
DF+CNNF+SAO+ALF | -3.48% | -5.18% | -6.77% | 142% | 38414%
CNNF+ALF | -4.65% | -6.73% | -7.92% | 149% | 37956%
CNNF | -4.14% | -5.49% | -6.70% | 140% | 38411%
Pay attention to the decoding time: a DecT of about 38000% means roughly 380 times the reference.

CE13-2.1: Convolutional Neural Network Filter (CNNF) for Intra Frame
JVET-N0169
[Figure: CNNF architecture]
Inputs: Normalized Y/U/V and Normalized QP Map, joined by Concat
Layers: Conv1 (5,5,64), Conv2 (3,3,64), Conv3 (3,3,64), Conv4 (3,3,64), Conv5 (3,3,64), Conv6 (3,3,64), Conv7 (3,3,64), Convolution8 (3,3,1), Summation
Legend: ConvM (N,N,K) = Convolution (N,N,K) + ReLU; N: kernel size; K: kernel number
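
To get a feeling for the size of this filter, the layer list above can be turned into a rough parameter count. This is a back-of-the-envelope sketch, not a figure from JVET-N0169: it assumes two input channels (one normalized colour plane plus the normalized QP map) and one bias per output channel, neither of which is stated on the slide.

```python
def conv_params(kernel_h, kernel_w, in_channels, out_channels, bias=True):
    """Number of trainable parameters of one 2-D convolution layer."""
    weights = kernel_h * kernel_w * in_channels * out_channels
    return weights + (out_channels if bias else 0)


IN_CHANNELS = 2  # assumption: one Y/U/V plane concatenated with the QP map

# (N, N, K) per layer as listed on the slide: Conv1, Conv2..Conv7, Convolution8.
layers = [(5, 5, 64)] + [(3, 3, 64)] * 6 + [(3, 3, 1)]

total_params = 0
channels = IN_CHANNELS
for n_h, n_w, k in layers:
    total_params += conv_params(n_h, n_w, channels, k)
    channels = k  # the output of one layer feeds the next

# Under these assumptions: 3264 + 6 * 36928 + 577 = 225409 parameters.
```

A model of this size is small by deep learning standards, yet every reconstructed sample still has to pass through eight convolution layers, which is consistent with the decoding-time figures reported for this proposal.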

CE13-1.1: Convolutional neural network loop filter
JVET-N0110-v1
Over VTM-4.0, Random Access
Y | U | V | EncT | DecT
-1.36% | -14.96% | -14.91% | 100% | 142%

Neural Network based Filter for Video Coding
JVET-N_Notes_dD
Each category investigated the following problems:
• The impact of the NN filter position in the filter chain: there is always an objective gain
• The generalization capability of the NN: results indicate that the difference is minor
What MPEG decided at the March meeting (25/3/2019):
The performance/complexity trade-off indicates that NN technology is currently not mature enough to be included in a standard.
As I said… sometimes life is bad

Neural Network for Video Coding: Conclusion
PERFORMANCE IS NOTHING WITHOUT COMPLEXITY
The trade-off matters

Neural Network Video approach: Disruptive
Videos are temporally highly redundant.
No deep image compression can compete with state-of-the-art video compression, which exploits this redundancy.
Optical Flow

Optical Flow
• In computer vision tasks, optical flow is widely used to exploit temporal relationships
• Learning-based optical flow methods can provide accurate motion information at pixel level
• Only artificial/synthetic data sets

DVC: An End-to-end Deep Video Compression Framework
• Learning-based optical flow estimation is used to obtain the motion information and reconstruct the current frame
• End-to-end deep video compression model that jointly learns motion estimation, motion compression, and residual compression
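
The idea of "use motion to predict the current frame, then code only the residual" can be sketched in a few lines of plain Python. This is a deliberately simplified illustration, not the DVC implementation: the flow here is a hand-written integer displacement per pixel instead of a learned sub-pixel flow, the residual is kept lossless, and the names `warp`, `previous`, `current` and `flow` are hypothetical.

```python
def warp(reference, flow):
    """Backward warping: predicted[y][x] = reference[y + dy][x + dx].

    `flow[y][x]` is an integer (dx, dy) pair; coordinates are clamped
    at the frame borders.
    """
    h, w = len(reference), len(reference[0])
    predicted = []
    for y in range(h):
        row = []
        for x in range(w):
            dx, dy = flow[y][x]
            src_y = min(h - 1, max(0, y + dy))
            src_x = min(w - 1, max(0, x + dx))
            row.append(reference[src_y][src_x])
        predicted.append(row)
    return predicted


# Toy 3 x 4 frames: a bright column moves one pixel to the right,
# and one sample also changes value (9 -> 8).
previous = [[0, 0, 9, 0], [0, 0, 9, 0], [0, 0, 9, 0]]
current = [[0, 0, 0, 9], [0, 0, 0, 9], [0, 0, 0, 8]]

# Every pixel of the current frame is fetched from one pixel to its left.
flow = [[(-1, 0)] * 4 for _ in range(3)]

predicted = warp(previous, flow)
residual = [[c - p for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, predicted)]

# The decoder rebuilds the frame from the prediction plus the residual.
reconstructed = [[p + r for p, r in zip(p_row, r_row)]
                 for p_row, r_row in zip(predicted, residual)]
```

After motion compensation the residual is almost entirely zero, so it costs far fewer bits than the frame itself; in a learned codec both the flow and the residual would additionally be compressed by neural networks.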

Framework
MPEG / NN: Architecture² = Architecture of NN Architectures
[Figures: the framework diagram, with one component highlighted per slide]
• Optical Flow Net
• Motion Compression: MV Encoder and Decoder Network
• Motion Compensation Network
• Residual Encoder Net
• Bit Rate Estimation Net

Loss Function
DVC: An End-to-end Deep Video Compression Framework
• The whole compression system is end-to-end optimized with rate-distortion optimization
• Just one end-to-end formula that jointly learns motion estimation, motion compression, and residual compression
• Rate terms labelled on the formula: residual entropy and motion entropy
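
For reference, the rate-distortion objective behind this slide can be written out as follows. The notation is recalled from the DVC paper rather than copied from the slide, so the symbols should be read as assumptions: $x_t$ is the current frame, $\hat{x}_t$ its reconstruction, $d$ a distortion measure such as mean squared error, $\hat{m}_t$ and $\hat{y}_t$ the quantized motion and residual representations, $H(\cdot)$ the estimated number of bits, and $\lambda$ the trade-off weight.

```latex
\lambda D + R
  \;=\; \lambda \, d\!\left(x_t, \hat{x}_t\right)
  \;+\; \Bigl( \underbrace{H(\hat{m}_t)}_{\text{motion entropy}}
  \;+\; \underbrace{H(\hat{y}_t)}_{\text{residual entropy}} \Bigr)
```

Because every term is differentiable (or is made so during training), a single loss can update the motion estimation, motion compression and residual compression networks together.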

Advantages of Neural Networks
• Excellent content adaptivity
• Improved coding efficiency by leveraging distant samples
• Neural networks can represent both texture and features well
• The whole compression system is end-to-end optimized

Rai R&D: what we are doing
• End-to-end chain
• Issues:
  • Residual compression

New EBU Distribution Codecs activity
Please join the EBU Video Group, we'll have a lot of fun!
https://tech.ebu.ch/video

Roberto Iacoviello
roberto.iacoviello@rai.it
Thank you for your attention
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/
On your left is the reinforcement learning part: this is the reward if you contact me.
