
Machine Learning approaches at video compression

10 Dec 2019

  1. Machine Learning approaches at video compression Roberto Iacoviello RAI - Radiotelevisione Italiana Centre for Research, Technological Innovation and Experimentation (CRITS)
  2. Machine Learning is like sex in high school: everyone is talking about it, a few know what to do, and only your teacher is doing it. There are two or three recurring topics around AI. Ethics, which sounds to me like: if we don't teach ethics to the machines, Skynet will kill us all. Academic papers full of mathematics and different notations; after you read them you feel like: OK, and now? Then there is real life: sometimes it is good and sometimes it is bad.
  3. The dear old typical hybrid block-based approach. There are many new tools in VVC: Versatile Video Coding. In 30 years the MPEG group has developed many useful standards, but all based on the same scheme. Now the group is moving towards new horizons: neural networks.
  4. Two approaches:
      NON-video approach: coded representation of neural networks
      Neural network video approach, which can be:
       - Conservative, one-to-one: replace one MPEG block with one deep learning block
       - Disruptive, end-to-end: replace the entire MPEG chain
  5. Non-video approach: coded representation of neural networks Scope: representation of weights and parameters, not the architecture (N18162, Marrakech)
  6. Non-video approach: coded representation of neural networks Coded representation of weight matrix
  7. Coded representation of neural networks  Represent different artificial neural networks  Enable faster inference  Enable use under resource limitations
  8. Use cases (W17924, Macao)
     • Inference may be performed on a large number of devices
     • The NNs used in an application can be improved incrementally
     • Limitations in terms of processing power and memory
     • Several apps would need to store the same base neural network on the device multiple times
     Type                                          | Parameter size
     Media content analysis                        | From a few KB to several hundreds of MB
     Translation app                               | Currently around 200 MB
     Compact Descriptors for Video Analysis (CDVA) | About 500-600 MB
  9. MPEG Use cases • UC10 Distributed training and evaluation of neural networks for media content analysis • UC11 Compact Descriptors for Video Analysis (CDVA) • UC12 Image/Video Compression • UC13 Distribution of neural networks for content processing W17924 Macao
  10. Methods
      Dropping connections
      Dropping layers
      Replacing convolutions with lower-dimensional ones
      Matrix decomposition
      Changing stride in convolutions without increasing output size
      Quantization (rate-distortion based)
      Quantization using a codebook
      Entropy coding
     Summary: cut something somewhere
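One of the methods on this slide, replacing a convolution with lower-dimensional ones, can be made concrete by counting parameters. A minimal sketch in pure Python; the layer sizes (5x5 kernel, 64 input and output channels, 64 intermediate channels) are illustrative choices, not taken from the deck:

```python
def conv_params(kh, kw, c_in, c_out):
    """Number of multiplicative weights of a kh x kw convolution (bias ignored)."""
    return kh * kw * c_in * c_out

# A full 5x5 convolution with 64 input and 64 output channels.
full = conv_params(5, 5, 64, 64)

# The same receptive field from a 5x1 convolution followed by a 1x5 one
# (a low-rank decomposition with 64 intermediate channels).
separable = conv_params(5, 1, 64, 64) + conv_params(1, 5, 64, 64)

print(full, separable)  # the decomposed version uses 2.5x fewer weights
```

With these sizes the full convolution has 102,400 weights versus 40,960 for the decomposed pair, which is why this shows up in the methods list alongside pruning and quantization.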
  11. Methods (M47704, Geneva)
     • Uniform quantization
     • Sequential quantization
     • Non-uniform quantization
     • Low-rank approximation
     Sequential quantization pipeline: original weights (32 bits) → quantization stage 1 (10 bits) with dequantization 1 → quantization stage 2 (8 bits) with dequantization 2 (for inference) → compressed model.
     Low-rank approximation: a W x H convolution is replaced by a W x 1 convolution followed by a 1 x H convolution, each followed by a ReLU.
  12. Methods
     • An "importance" estimation step
     • With proper re-training of the model under the constraint of fixed-point weights, the model's precision can be very close to that of the floating-point model
     • Quantize the coefficients with different precision for different layers
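The quantization idea above can be sketched as plain uniform quantization with a per-layer bit width; the following toy pure-Python version (made-up weights, symmetric signed quantizer) is only an illustration, not the scheme from the MPEG contribution:

```python
def quantize(weights, bits):
    """Uniformly quantize a list of float weights to signed integers.

    Returns the integer codes and the scale needed to dequantize.
    """
    max_abs = max(abs(w) for w in weights) or 1.0
    levels = (1 << (bits - 1)) - 1          # e.g. 127 representable steps for 8 bits
    scale = max_abs / levels
    return [round(w / scale) for w in weights], scale

def dequantize(codes, scale):
    """Reconstruct approximate float weights for inference."""
    return [c * scale for c in codes]

layer = [0.31, -0.07, 0.95, -0.52]          # hypothetical layer weights
q8, s8 = quantize(layer, 8)                 # this layer gets 8 bits
recon = dequantize(q8, s8)
err = max(abs(a - b) for a, b in zip(layer, recon))
print(q8, err)   # worst-case error stays within half a quantization step
```

Giving different `bits` values to different layers is exactly the "different precision for different layers" point: sensitive layers keep more bits, robust ones fewer.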
  13. Video approach: Conservative Neural Network based Filter for Video Coding (JVET-N0840-v1) Core Experiment 13 on neural network based filter for video coding investigates the following problems:  The impact of the NN filter position in the filter chain  The generalization capability of the NN: performance change when the test QP is not the same as the training QP
  14. CE13-2.1: Convolutional Neural Network Filter (CNNF) for Intra Frame (JVET-N0169)
     Over VTM-4.0, All Intra:
     Configuration   | Y      | U      | V      | EncT | DecT
     DF+CNNF+SAO+ALF | -3.48% | -5.18% | -6.77% | 142% | 38414%
     CNNF+ALF        | -4.65% | -6.73% | -7.92% | 149% | 37956%
     CNNF            | -4.14% | -5.49% | -6.70% | 140% | 38411%
     Pay attention to the decoding time.
  15. CE13-2.1: Convolutional Neural Network Filter (CNNF) for Intra Frame (JVET-N0169)
     Inputs: normalized Y/U/V and a normalized QP map, concatenated.
     Conv1 (5,5,64) → Conv2 (3,3,64) → Conv3 (3,3,64) → Conv4 (3,3,64) → Conv5 (3,3,64) → Conv6 (3,3,64) → Conv7 (3,3,64) → Conv8 (3,3,1) → summation with the input.
     Notation: ConvM (N,N,K) is a convolution with kernel size N and K kernels, followed by a ReLU.
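Counting the weights of a filter like this makes the later complexity discussion concrete. A rough sketch, assuming the concatenated input (one colour plane plus the QP map) gives 2 input channels and ignoring biases; the channel assumption is mine, not stated on the slide:

```python
# (kernel_h, kernel_w, out_channels) for Conv1..Conv8 as listed on the slide
layers = [(5, 5, 64)] + [(3, 3, 64)] * 6 + [(3, 3, 1)]

c_in, total = 2, 0   # assumed inputs: one plane + normalized QP map
for kh, kw, c_out in layers:
    total += kh * kw * c_in * c_out   # weights of this convolution
    c_in = c_out                      # next layer reads this layer's output

print(total)  # multiplicative weights in the whole filter
```

Under these assumptions the filter has roughly 225k weights, and every one of them is multiplied at every pixel of every frame on the decoder side, which is where the huge decoding times in the tables come from.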
  16. CE13-1.1: Convolutional neural network loop filter (JVET-N0110-v1)
     Over VTM-4.0, Random Access:
     Y      | U       | V       | EncT | DecT
     -1.36% | -14.96% | -14.91% | 100% | 142%
  17. Neural Network based Filter for Video Coding (JVET-N_Notes_dD)
     Each category investigated the following problems:
      The impact of NN filter position in the filter chain: there is always an objective gain
      The generalization capability of the NN: results indicate that the difference is minor
     What MPEG decided at the March meeting (25/3/2019): the performance/complexity trade-off indicates that the NN technology is currently not mature enough to be included in a standard.
     As I said… sometimes life is bad.
  18. Neural Network for Video Coding: Conclusion PERFORMANCE IS NOTHING WITHOUT COMPLEXITY The trade-off matters
  19. Neural Network Video approach: Disruptive
     Videos are temporally highly redundant
     No deep image compression can compete with state-of-the-art video compression, which exploits this redundancy
     Optical Flow
  20. Optical Flow  In computer vision tasks, optical flow is widely used to exploit temporal relationships  Learning-based optical flow methods can provide accurate motion information at pixel level  Trained only on artificial/synthetic datasets
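The role of optical flow in exploiting temporal redundancy can be sketched: given a per-pixel flow field, the current frame is predicted by warping the previous one, and only the residual needs coding. A toy pure-Python version with nearest-neighbour sampling and an arbitrary (dy, dx) flow convention; real learned codecs such as DVC use differentiable bilinear warping instead:

```python
def warp(prev, flow):
    """Backward-warp frame `prev` using a per-pixel flow field.

    prev: 2D list of pixel values.
    flow[y][x] = (dy, dx): the pixel at (y, x) in the current frame is
    fetched from (y + dy, x + dx) in `prev` (nearest neighbour, borders clamped).
    """
    h, w = len(prev), len(prev[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            sy = min(max(int(round(y + dy)), 0), h - 1)  # clamp to frame
            sx = min(max(int(round(x + dx)), 0), w - 1)
            out[y][x] = prev[sy][sx]
    return out

prev = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
flow = [[(0, -1)] * 3 for _ in range(3)]   # uniform one-pixel motion to the right
pred = warp(prev, flow)                     # rows shifted right, border repeated
```

A one-pixel uniform shift is exactly the kind of motion a block-based codec describes with one motion vector; pixel-level learned flow generalizes this to arbitrary per-pixel motion.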
  21. SpyNet
  22. • Learning based optical flow estimation is utilized to obtain the motion information and reconstruct the current frame • End-to-end deep video compression model that jointly learns motion estimation, motion compression, and residual compression DVC: An End-to-end Deep Video Compression Framework
  23. DVC: An End-to-end Deep Video Compression Framework MPEG + NN: Architecture^2 = an architecture of NN architectures
  24. DVC: An End-to-end Deep Video Compression Framework Optical Flow Net
  25. DVC: An End-to-end Deep Video Compression Framework Motion Compression
  26.  MV Encoder and Decoder Network DVC: An End-to-end Deep Video Compression Framework
  27. DVC: An End-to-end Deep Video Compression Framework Motion Compensation Network
  28. DVC: An End-to-end Deep Video Compression Framework Residual Encoder Net Bit Rate Estimation Net
  29. DVC: An End-to-end Deep Video Compression Framework: Loss Function
      The whole compression system is end-to-end optimized: rate-distortion optimization
      Just one end-to-end formula that jointly learns motion estimation, motion compression, and residual compression: lambda * distortion + rate, where the rate is the sum of the motion entropy and the residuals entropy
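The single rate-distortion objective described here can be sketched numerically: lambda * D + R, with D a distortion measure (MSE here) and R the estimated bits for motion plus residuals. A toy illustration; the lambda value and the bit counts below are invented placeholders standing in for DVC's learned entropy models:

```python
def mse(a, b):
    """Distortion D: mean squared error between original and reconstruction."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def rd_loss(original, reconstructed, motion_bits, residual_bits, lam):
    """L = lambda * D + R, with R = motion entropy + residual entropy (in bits)."""
    return lam * mse(original, reconstructed) + motion_bits + residual_bits

orig  = [10.0, 12.0, 11.0, 13.0]   # toy "original frame" samples
recon = [10.5, 11.5, 11.0, 13.5]   # toy reconstruction after the full chain
loss = rd_loss(orig, recon, motion_bits=120.0, residual_bits=300.0, lam=256.0)
print(loss)
```

Because every stage (flow estimation, motion compression, residual compression) is differentiable, gradients of this one scalar flow back through the whole chain, which is what "end-to-end optimized" means on the slide.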
  30. Advantages of Neural Networks  Excellent content adaptivity  Improved coding efficiency by leveraging samples from far away  Neural networks can represent both texture and features well  The whole compression system is end-to-end optimized
  31. RAI R&D: what we are doing  End-to-end chain  Issues:  Residual compression
  32. New EBU Distribution Codecs activity Please join the EBU Video Group (https://tech.ebu.ch/video), we'll have lots of fun!
  33. Machine Learning approaches at video compression Roberto Iacoviello roberto.iacoviello@rai.it Thank you for your attention This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/ On the left there is reinforcement learning, which means: this is the reward if you contact me.