Machine Learning approaches to
video compression
Roberto Iacoviello
RAI - Radiotelevisione Italiana
Centre for Research, Technological Innovation and
Experimentation (CRITS)
Machine Learning is like sex in high school.
Everyone is talking about it, a few know what to
do, and only your teacher is doing it
There are two or three recurring topics around AI. Ethics, which sounds to me like: if we don't teach ethics to the machines, Skynet will kill us all. Academic papers, full of mathematics and different notations: after you read them you feel like, OK, and now? And then there is real life: sometimes it is good and sometimes it is bad.
The dear old typical hybrid block-based approach
Many new tools in VVC (Versatile Video Coding). In 30 years the MPEG group has developed many useful standards, but all based on the same scheme. Now the group is moving towards new horizons: neural networks.
Two approaches:
• Non-video approach: coded representation of neural networks
• Neural network video approach:
  – Conservative (one-to-one): replace one MPEG block with one deep learning block
  – Disruptive (end-to-end): replace the entire MPEG chain
Non-video approach: coded representation of
neural networks
Scope: representation of weights and parameters, not of the architecture
N18162 Marrakech
Coded representation of neural networks
Represent different artificial neural networks
Enable faster inference
Enable use under resource limitations
Use cases
• Inference may be performed on a large number of devices
• The NNs used in an application can be improved incrementally
• Limitations in terms of processing power and memory
• Several apps would otherwise need to store the same base neural network on the device multiple times
W17924 Macao
Type                                            Parameter size
Media content analysis                          From a few KB to several hundred MB
Translation app                                 Currently around 200 MB
Compact Descriptors for Video Analysis (CDVA)   About 500-600 MB
MPEG Use cases
• UC10 Distributed training and evaluation of neural networks
for media content analysis
• UC11 Compact Descriptors for Video Analysis (CDVA)
• UC12 Image/Video Compression
• UC13 Distribution of neural networks for content processing
W17924 Macao
Methods
Summary: cut something somewhere
• Dropping connections (a toy sketch follows below)
• Dropping layers
• Replacing convolutions with lower-dimensional ones (matrix decomposition)
• Changing stride in convolutions without increasing output size
• Quantization (rate-distortion based)
• Quantization using a codebook
• Entropy coding
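To make the first bullet concrete, here is a minimal NumPy sketch (not taken from any MPEG contribution) of magnitude-based connection dropping: the weights with the smallest absolute values are zeroed out. The drop_fraction parameter and the layer shape are purely illustrative.

import numpy as np

def prune_connections(weights, drop_fraction=0.5):
    """Drop (zero out) the connections with the smallest-magnitude weights."""
    threshold = np.quantile(np.abs(weights), drop_fraction)
    mask = np.abs(weights) >= threshold
    return weights * mask, mask

# Illustrative example: prune half of the connections of a random fully-connected layer
w = np.random.randn(64, 128).astype(np.float32)
w_pruned, mask = prune_connections(w, drop_fraction=0.5)
print("fraction of connections kept:", mask.mean())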
• Uniform Quantization
• Sequential Quantization
• Nonuniform Quantization
• Low-Rank Approximation
M47704, Geneva
Methods
[Diagram: two-stage weight quantization pipeline]
Original weights (32-bit) → Quantization stage 1 (10-bit) → Dequantization 1 → Quantization stage 2 (8-bit) → Compressed model → Dequantization 2 (for inference)
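A minimal NumPy sketch of the two-stage pipeline above (32-bit weights → 10-bit → 8-bit, with dequantization for inference). The bit depths follow the diagram, but the simple min/max uniform quantizer is only illustrative; the actual MPEG proposals use rate-distortion-driven quantization.

import numpy as np

def uniform_quantize(w, bits):
    """Uniformly quantize w to 2**bits levels; return integer indices, scale and offset."""
    levels = 2 ** bits - 1
    w_min, w_max = float(w.min()), float(w.max())
    scale = (w_max - w_min) / levels
    indices = np.round((w - w_min) / scale).astype(np.int32)
    return indices, scale, w_min

def dequantize(indices, scale, w_min):
    return indices * scale + w_min

w32 = np.random.randn(10000).astype(np.float32)   # original 32-bit weights
q1, s1, o1 = uniform_quantize(w32, bits=10)        # quantization stage 1
w10 = dequantize(q1, s1, o1)                       # dequantization 1
q2, s2, o2 = uniform_quantize(w10, bits=8)         # quantization stage 2 -> compressed model
w8 = dequantize(q2, s2, o2)                        # dequantization 2 (for inference)
print("MSE vs. original weights:", float(np.mean((w32 - w8) ** 2)))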
[Diagram: a W × H convolution replaced by a W × 1 convolution followed by a 1 × H convolution, each followed by a ReLU]
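A minimal PyTorch sketch of the decomposition in the diagram: one W × H convolution replaced by a W × 1 convolution plus a 1 × H convolution, each followed by a ReLU. The channel count and kernel sizes are illustrative, not taken from any specific proposal.

import torch
import torch.nn as nn

channels, W, H = 64, 3, 3   # illustrative sizes

# Original W x H convolution
full = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=(W, H), padding=(W // 2, H // 2)),
    nn.ReLU(),
)

# Lower-dimensional replacement: W x 1 conv + ReLU, then 1 x H conv + ReLU
separable = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=(W, 1), padding=(W // 2, 0)),
    nn.ReLU(),
    nn.Conv2d(channels, channels, kernel_size=(1, H), padding=(0, H // 2)),
    nn.ReLU(),
)

x = torch.randn(1, channels, 32, 32)
print(full(x).shape, separable(x).shape)                        # same output size
full_params = sum(p.numel() for p in full.parameters())
sep_params = sum(p.numel() for p in separable.parameters())
print("parameters:", full_params, "vs.", sep_params)            # the decomposition uses fewer weights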
• “Importance” estimation step
• With proper re-training of the model under fixed-point weight constraints, the model's accuracy can be very close to that of the floating-point model
• Quantize the coefficients with different precision for different layers
Video approach: Conservative
Neural Network based Filter for Video Coding
Core Experiment 13 on neural network based filter for video coding investigates the following problems:
• The impact of the NN filter position in the filter chain
• The generalization capability of the NN: how performance changes when the test QP is not the same as the training QP
JVET-N0840-v1
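For context, a toy PyTorch sketch of what a convolutional in-loop filter looks like: a small residual CNN that takes the reconstructed (luma) frame and predicts a correction. This is not the actual CE13 network; the depth and channel width are purely illustrative.

import torch
import torch.nn as nn

class ToyCNNLoopFilter(nn.Module):
    """Toy residual CNN filter for a reconstructed single-channel frame."""
    def __init__(self, channels=32, depth=4):
        super().__init__()
        layers = [nn.Conv2d(1, channels, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU()]
        layers += [nn.Conv2d(channels, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, recon):
        # Predict a correction and add it back to the reconstruction
        return recon + self.body(recon)

recon = torch.rand(1, 1, 64, 64)          # reconstructed block, normalized to [0, 1]
filtered = ToyCNNLoopFilter()(recon)      # output fed to the rest of the filter chain
print(filtered.shape)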
CE13-2.1: Convolutional Neural Network Filter (CNNF) for Intra Frame
JVET-N0169
Over VTM-4.0, All Intra:
Configuration       Y        U        V        EncT   DecT
DF+CNNF+SAO+ALF    -3.48%   -5.18%   -6.77%    142%   38414%
CNNF+ALF           -4.65%   -6.73%   -7.92%    149%   37956%
CNNF               -4.14%   -5.49%   -6.70%    140%   38411%
Pay attention to the decoding time!
CE13-1.1: Convolutional Neural Network Loop Filter
JVET-N0110-v1
Over VTM-4.0, Random Access:
Y        U         V         EncT   DecT
-1.36%  -14.96%   -14.91%    100%   142%
Each category investigated the following problems:
• The impact of the NN filter position in the filter chain: there is always an objective gain
• The generalization capability of the NN: results indicate that the difference is minor
Neural Network based Filter for Video Coding
JVET-N_Notes_dD
What MPEG decided at the March 2019 meeting (25/3/2019):
the performance/complexity trade-off indicates that the NN technology is currently not mature enough to be included in a standard.
As I said… sometimes life is bad.
Neural Network Video approach: Disruptive
Videos are temporally highly redundant.
No deep image compression scheme can compete with state-of-the-art video compression, which exploits this redundancy.
Optical Flow
In computer vision tasks, optical flow is widely used to exploit temporal relationships.
Learning-based optical flow methods can provide accurate motion information at pixel level, although they are trained only on artificial/synthetic data sets.
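As an illustration of how pixel-level motion information is used, here is a minimal PyTorch sketch of motion compensation with a dense flow field: the previous frame is warped towards the current one by bilinear sampling. The zero flow is a placeholder; in practice it would come from a learned optical flow network.

import torch
import torch.nn.functional as F

def warp(prev_frame, flow):
    """Warp prev_frame (N, C, H, W) with a dense flow field (N, 2, H, W)."""
    n, _, h, w = prev_frame.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    base = torch.stack((xs, ys), dim=0).float().unsqueeze(0).expand(n, -1, -1, -1)
    coords = base + flow                                  # displaced pixel coordinates
    # Normalize to [-1, 1] and reshape to (N, H, W, 2) for grid_sample
    gx = 2.0 * coords[:, 0] / (w - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (h - 1) - 1.0
    grid = torch.stack((gx, gy), dim=-1)
    return F.grid_sample(prev_frame, grid, align_corners=True)

prev = torch.rand(1, 3, 64, 64)     # previously decoded frame
flow = torch.zeros(1, 2, 64, 64)    # placeholder flow (would come from a flow network)
pred = warp(prev, flow)             # motion-compensated prediction of the current frame
print(pred.shape)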
• Learning-based optical flow estimation is used to obtain the motion information and reconstruct the current frame
• An end-to-end deep video compression model jointly learns motion estimation, motion compression, and residual compression
DVC: An End-to-end Deep Video Compression
Framework
MPEG NN
Architecture² = architecture of NN architectures
MV Encoder and Decoder Network
Motion Compensation Network
Residual Encoder Net
Bit Rate Estimation Net
Loss Function
The whole compression system is end-to-end optimized with a single rate-distortion formula that jointly learns motion estimation, motion compression, and residual compression:
λ · D + R = λ · d(x_t, x̂_t) + R_motion + R_residual
where d(·) is the distortion between the original and the reconstructed frame, R_motion is the entropy (estimated bits) of the compressed motion information, and R_residual is the entropy of the compressed residuals.
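A minimal PyTorch sketch of that single rate-distortion objective, assuming the estimated motion and residual bit rates come from (hypothetical) bit-rate estimation modules; λ and the bit-rate values here are illustrative trade-off numbers, not taken from the DVC paper.

import torch
import torch.nn.functional as F

def rd_loss(current, reconstructed, motion_bpp, residual_bpp, lam=256.0):
    """Joint RD loss: lambda * D + R_motion + R_residual."""
    distortion = F.mse_loss(reconstructed, current)   # D: distortion of the decoded frame
    rate = motion_bpp + residual_bpp                  # R: estimated bits per pixel
    return lam * distortion + rate

# Illustrative stand-ins; in the full model these come from the networks above
current = torch.rand(1, 3, 64, 64)
reconstructed = torch.rand(1, 3, 64, 64, requires_grad=True)
motion_bpp = torch.tensor(0.02, requires_grad=True)     # estimated motion bit rate
residual_bpp = torch.tensor(0.08, requires_grad=True)   # estimated residual bit rate

loss = rd_loss(current, reconstructed, motion_bpp, residual_bpp)
loss.backward()   # one formula: gradients reach the motion and residual branches jointly
print(float(loss))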
Advantages of Neural Networks
• Excellent content adaptivity
• Improved coding efficiency by leveraging samples from far away
• Neural networks can represent both texture and features well
• The whole compression system is end-to-end optimized
RAI R&D: what we are doing
End-to-end chain
Issues: residual compression
New EBU Distribution Codecs activity
Please join the EBU Video Group, we'll have a lot of fun!
https://tech.ebu.ch/video
Machine Learning approaches to
video compression
Roberto Iacoviello
roberto.iacoviello@rai.it
Thank you for your attention
This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0
Unported License
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/
On your left there is reinforcement learning, which means: this is the reward if you contact me.